libcuda.so - The NVIDIA CUDA Driver Library
libcudart.so - The NVIDIA CUDA Runtime Library
libcublas.so - The NVIDIA cuBLAS Library
libcusparse.so - The NVIDIA cuSPARSE Library
libcusolver.so - The NVIDIA cuSOLVER Library
libcufft.so, libcufftw.so - The NVIDIA cuFFT Libraries
libcurand.so - The NVIDIA cuRAND Library
libnppc.so, libnppi.so, libnpps.so - The NVIDIA CUDA NPP Libraries
libnvvm.so - The NVIDIA NVVM Library
libdevice.so - The NVIDIA libdevice Library
libcuinj32.so, libcuinj64.so - The NVIDIA CUINJ Libraries
libnvToolsExt.so - The NVIDIA Tools Extension Library
libcuda.so is the CUDA Driver API library, used for low-level CUDA programming.
libcudart.so is the CUDA Runtime API library, used for high-level CUDA
programming on top of the CUDA Driver API.
The cuBLAS library is an implementation of BLAS (Basic Linear Algebra
Subprograms) on top of the NVIDIA CUDA runtime. It allows the user to access
the computational resources of an NVIDIA Graphics Processing Unit (GPU), but does
not auto-parallelize across multiple GPUs.
To use the cuBLAS library, the application must allocate the required matrices
and vectors in the GPU memory space, fill them with data, call the sequence of
desired cuBLAS functions, and then copy the results from the GPU memory
space back to the host. The cuBLAS library also provides helper functions for
writing data to and retrieving data from the GPU.
The cuSPARSE library contains a set of basic linear algebra subroutines used for
handling sparse matrices. It is implemented on top of the NVIDIA CUDA runtime
(which is part of the CUDA Toolkit) and is designed to be called from C and
C++. The library routines can be classified into four categories:
* Level 1: operations between a vector in sparse format and a vector in dense
format
* Level 2: operations between a matrix in sparse format and a vector in dense
format
* Level 3: operations between a matrix in sparse format and a set of vectors in
dense format (which can also usually be viewed as a dense tall matrix)
* Conversion: operations that allow conversion between different matrix formats
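To make the categories above concrete, the following host-side sketch shows a
Conversion step (dense matrix to a sparse format) followed by a Level 2
operation (sparse matrix times dense vector). It is plain Python and does not
call cuSPARSE. The compressed sparse row (CSR) layout and the names
dense_to_csr and csr_matvec are assumptions chosen for illustration only.

```python
def dense_to_csr(dense):
    """Conversion: turn a dense row-major matrix into CSR arrays.

    Returns (values, col_idx, row_ptr). row_ptr[i]:row_ptr[i + 1] is the
    slice of values/col_idx that belongs to row i.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Level 2: multiply a CSR matrix by a dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        # Only the stored (nonzero) entries of row i are visited.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 example matrix and dense vector.
A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 0]]
x = [1, 2, 3]

values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, x)
print(values, col_idx, row_ptr)  # [1, 2, 3, 4, 5] [0, 2, 2, 0, 1] [0, 2, 3, 5]
print(y)                         # [7, 9, 14]
```

A Level 3 operation would apply the same row traversal to several dense
vectors at once, which is why the set of vectors can be viewed as a dense tall
matrix.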
The cuSOLVER library contains LAPACK-like functions for dense and sparse linear
algebra, including linear solvers, least-squares solvers, and eigenvalue solvers.
The NVIDIA CUDA Fast Fourier Transform (FFT) product consists of two separate
libraries: cuFFT and cuFFTW. The cuFFT library is designed to provide high
performance on NVIDIA GPUs. The cuFFTW library is provided as a porting tool to
enable users of FFTW to start using NVIDIA GPUs with a minimum amount of
effort.
The FFT is a divide-and-conquer algorithm for efficiently computing discrete
Fourier transforms of complex or real-valued data sets. It is one of the most
important and widely used numerical algorithms in computational physics and
general signal processing. The cuFFT library provides a simple interface for
computing FFTs on an NVIDIA GPU, which allows users to quickly leverage the
floating-point power and parallelism of the GPU in a highly optimized and
tested FFT library.
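The divide-and-conquer idea can be sketched on the host in a few lines. The
example below is a recursive radix-2 Cooley-Tukey FFT in plain Python, checked
against a direct discrete Fourier transform. It is an illustration only: it
does not call cuFFT, it handles only power-of-two lengths, and it is not a
description of how cuFFT is implemented. The function names fft and naive_dft
are chosen for this sketch.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Divide: transform the even- and odd-indexed halves separately.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # Conquer: combine the two half-size transforms with twiddle factors.
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def naive_dft(x):
    """Direct O(n^2) discrete Fourier transform, used as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# Hypothetical real-valued test signal of length 8.
signal = [1, 2, 3, 4, 0, 0, 0, 0]
max_error = max(abs(a - b) for a, b in zip(fft(signal), naive_dft(signal)))
print(max_error < 1e-9)  # True
```

The recursion halves the problem at every level, which is what reduces the
cost from the O(n^2) of the direct sum to O(n log n).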
The cuRAND library provides facilities that focus on the simple and efficient
generation of high-quality pseudorandom and quasirandom numbers. A
pseudorandom sequence of numbers satisfies most of the statistical properties
of a truly random sequence but is generated by a deterministic algorithm. A
quasirandom sequence of n-dimensional points is generated by a deterministic
algorithm designed to fill an n-dimensional space evenly.
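The difference between the two kinds of sequence can be seen in a small
host-side sketch. It uses Python's standard random module as a stand-in
pseudorandom generator and the base-2 van der Corput sequence as a
one-dimensional quasirandom sequence. Neither is claimed to be a generator
that cuRAND uses, and the helper names pseudorandom and van_der_corput are
chosen for this illustration.

```python
import random


def pseudorandom(seed, count):
    """Deterministic pseudorandom numbers in [0, 1): same seed, same output."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


def van_der_corput(n, base=2):
    """n-th element of the van der Corput sequence (digits of n mirrored)."""
    q, denom = 0.0, 1.0
    while n:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q


# A pseudorandom stream is reproducible because the algorithm is deterministic.
print(pseudorandom(42, 3) == pseudorandom(42, 3))  # True

# A quasirandom stream fills the interval evenly: the first 8 points land in
# 8 different bins of width 1/8.
points = [van_der_corput(i) for i in range(1, 9)]
bins = sorted(int(p * 8) for p in points)
print(points[:4])  # [0.5, 0.25, 0.75, 0.125]
print(bins)        # [0, 1, 2, 3, 4, 5, 6, 7]
```

Eight pseudorandom draws would usually leave some of those bins empty and put
several points in others, which is the practical distinction the paragraph
above describes.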
NVIDIA NPP is a library of functions for performing CUDA-accelerated processing.
The initial set of functionality in the library focuses on imaging and video
processing and is widely applicable for developers in these areas. NPP will
evolve over time to encompass more of the compute-heavy tasks in a variety of
problem domains. The NPP library is written to maximize flexibility, while
maintaining high performance.
NPP can be used in one of two ways:
* A stand-alone library for adding GPU acceleration to an application with
minimal effort. Using this route allows developers to add GPU acceleration to
their applications in a matter of hours.
* A cooperative library for interoperating with a developer’s GPU code
Either route allows developers to harness the massive compute resources of
NVIDIA GPUs, while simultaneously reducing development times.
The NVVM library is used by NVCC to compile CUDA binary code to run on NVIDIA
GPUs.
The libdevice library is a collection of NVVM bitcode functions that implement
common functions for NVIDIA GPU devices, including math primitives and
bit-manipulation functions. These functions are optimized for particular GPU
architectures, and are intended to be linked with an NVVM IR module during
compilation to PTX.
libcuinj32.so and libcuinj64.so are the CUDA internal libraries for profiling,
used by nvprof and the Visual Profiler.
libnvToolsExt.so is the NVIDIA Tools Extension Library.
For more information, please see the online documentation at
©2013 NVIDIA Corporation. All rights reserved.