Notes and code from my CUDA learning journey
Book followed: CUDA by Example: An Introduction to General-Purpose GPU Programming
In this chapter, the authors traced the history of parallel processing, from the first dual-core processors to today's multi-core CPUs (8-core, 16-core, 32-core, and beyond). They highlighted how CPU clock speeds have grown from about 1 MHz in the early 1980s to between 1 GHz and 4 GHz today, roughly three orders of magnitude faster than the earliest personal computers.
However, as computational tasks like tumor detection or molecular simulations became increasingly complex, CPUs alone struggled to handle the workload. This challenge gave rise to GPU computing and the advent of CUDA (Compute Unified Device Architecture), which revolutionized how GPUs were used.
Initially, GPUs were designed strictly for graphics-intensive tasks, and using them for anything else meant recasting the problem as a rendering task through graphics APIs such as OpenGL or DirectX and shading languages such as GLSL or Microsoft HLSL. CUDA, introduced by NVIDIA, changed this paradigm by allowing developers to program GPUs directly using CUDA C, a language built on C/C++ but extended with GPU-specific functionality.
The authors also discussed real-world applications that demonstrated CUDA’s transformative potential.
This chapter emphasized the significance of CUDA in democratizing high-performance computing, showcasing how GPUs became instrumental in solving problems previously thought to require supercomputers.
Installation of CUDA Toolkit: the installer is available from NVIDIA's CUDA Toolkit download page.
Snapshots of Installation Process:
Image_1
Image_2
This chapter focused entirely on setting up the CUDA Toolkit. It provided a detailed guide to installing CUDA on each supported platform.
We started with a simple “Hello, World!” program and learned about two types of programs: host and device. The host code runs on the CPU, while the device code runs on the GPU.
The host code looks just like standard C:
#include <stdio.h>  // printf is declared in <stdio.h>, not <iostream>
int main(void) {
    printf("Hello World 👋!\n");
    return 0;
}
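The version with device (GPU) code looks like this:
#include <stdio.h>  // printf is declared in <stdio.h>, not <iostream>
// Empty kernel: compiled for the GPU and launched from the host
__global__ void kernel(void) {
}
int main(void) {
    kernel<<<1, 1>>>();  // launch 1 block containing 1 thread
    printf("Hello World 👋!\n");  // still printed by the host (CPU)
    return 0;
}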
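The kernel above is empty, so the greeting is still printed by the CPU. As a minimal sketch of the host/device split, the variant below prints from the GPU as well. It assumes a GPU that supports device-side printf; the names hello_from_gpu and launch_hello are my own and not from the book.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: runs on the GPU. Output from device-side printf is buffered
// and shown on the host's stdout once the host synchronizes with the device.
__global__ void hello_from_gpu(void) {
    printf("Hello World from the GPU!\n");
}

// Hypothetical helper (not from the book): launches the kernel with one block
// of one thread and returns the resulting CUDA status.
cudaError_t launch_hello(void) {
    hello_from_gpu<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();  // reports launch-time errors
    if (err != cudaSuccess) {
        return err;
    }
    return cudaDeviceSynchronize();        // wait for the kernel to finish
}

// Host code: runs on the CPU.
int main(void) {
    cudaError_t err = launch_hello();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Hello World from the CPU!\n");
    return 0;
}
```

Kernel launches are asynchronous, which is why the helper calls cudaDeviceSynchronize before returning; without it the program could exit before the GPU message is flushed. The file can be compiled with nvcc from the CUDA Toolkit.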
To work with kernel code (files with a .cu extension) in VS Code, install the Nsight Visual Studio Code Edition extension. The code itself is compiled with nvcc, the compiler that ships with the CUDA Toolkit.
If you encounter an error about cl.exe not being found, add the directory that contains it to the PATH system environment variable (the MSVC version number in the path depends on your Visual Studio installation):
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64
The compiler itself is located at:
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64\cl.exe
__global__ void kernel(void) defines a function called kernel that runs on the GPU. The __global__ keyword indicates that this function is called from the host (CPU) and executes on the device (GPU).
<<<1, 1>>> specifies the execution configuration for the kernel: the first parameter is the number of thread blocks and the second is the number of threads per block.
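To make the execution configuration more concrete, here is a minimal sketch in which every launched thread writes its own global index into an array, so the host can count how many threads actually ran. The names record_index and count_threads are hypothetical helpers of mine, not from the book, and the sketch assumes a working CUDA device.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes its global index from its block number, the block size,
// and its position inside the block, then stores that index in the output array.
__global__ void record_index(int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

// Hypothetical helper: launches `blocks` blocks of `threads` threads each and
// returns how many array slots were filled correctly, or -1 on a CUDA error.
int count_threads(int blocks, int threads) {
    int n = blocks * threads;
    int *d_out = nullptr;
    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) {
        return -1;
    }
    // Fill every byte with 0xFF so each int starts as -1 (slot not yet written).
    cudaMemset(d_out, 0xFF, n * sizeof(int));

    record_index<<<blocks, threads>>>(d_out);

    // cudaMemcpy waits for the kernel to finish before copying the results back.
    std::vector<int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        return -1;
    }

    int written = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] == i) {
            ++written;
        }
    }
    return written;
}

int main(void) {
    printf("<<<1, 1>>> ran %d thread(s)\n", count_threads(1, 1));
    printf("<<<2, 4>>> ran %d thread(s)\n", count_threads(2, 4));
    return 0;
}
```

With <<<2, 4>>> the launch consists of 2 blocks of 4 threads, so 8 threads run in total, each with a distinct index from 0 to 7. With <<<1, 1>>>, as in the Hello World kernel, only a single thread runs.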