My learning resources:
- Books:
  - CUDA by Example: An Introduction to General-Purpose GPU Programming, by Jason Sanders and Edward Kandrot
  - Programming Massively Parallel Processors (PMPP), 4th Edition, by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj
Phase 1: CUDA Basics & Memory (Days 01 - 20)
| Day | Curriculum & Highlights | Action |
| --- | --- | --- |
| 01-05 | Setup, Hello World, Vector Add, Image Conversion, MatMul Intro. | View |
| 06-10 | Warps, Divergence, Occupancy, Memory Hierarchy. | View |
| 11-15 | Tiling, Dynamic Tiling, Coalescing, Thread Coarsening. | View |
| 16-20 | 1D/2D Convolutions, Constant Memory, Halo Cells, Caching. | View |
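
As a rough illustration of the Days 01-05 material, here is a minimal vector-add sketch. It is not taken from the repository: the names `vecAdd`, `runVecAdd`, and `check`, the block size of 256, and the test data are all assumptions chosen for the example.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// One thread per element; the bounds check covers the last, partially filled block.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical driver: runs vecAdd on n elements and compares with a CPU result.
bool runVecAdd(int n) {
    std::vector<float> hA(n), hB(n), hC(n);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f * i; hB[i] = 2.0f * i; }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    check(cudaMalloc(&dA, bytes), "cudaMalloc dA");
    check(cudaMalloc(&dB, bytes), "cudaMalloc dB");
    check(cudaMalloc(&dC, bytes), "cudaMalloc dC");
    check(cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice), "copy A");
    check(cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice), "copy B");

    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    check(cudaGetLastError(), "vecAdd launch");
    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost), "copy C");

    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    for (int i = 0; i < n; ++i) {
        if (std::fabs(hC[i] - (hA[i] + hB[i])) > 1e-5f) return false;
    }
    return true;
}

int main() {
    bool ok = runVecAdd(1 << 20);
    std::printf("vecAdd %s\n", ok ? "matches CPU result" : "MISMATCH");
    return ok ? 0 : 1;
}
```

With `nvcc` installed, this should build with something like `nvcc vec_add.cu -o vec_add` (the file name is arbitrary).
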
Phase 2: Parallel Algorithms (Days 21 - 40)
| Day | Curriculum & Highlights | Action |
| --- | --- | --- |
| 21-25 | Stencil Ops, Parallel Histograms, Atomic Ops, Privatization. | View |
| 26-30 | Sum/Max Reduction, Shared Memory Optimizations, Prefix Scan. | View |
| 31-35 | Kogge-Stone, Brent-Kung, Hierarchical Scan. | View |
| 36-40 | Parallel Merge, Co-ranks, Radix Sort Implementation. | View |
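
To make the scan topics from Days 31-35 concrete, below is a minimal single-block Kogge-Stone inclusive scan. It is a sketch, not the repository's code: `koggeStoneScan`, `runScan`, and the block size `BLOCK = 256` are assumptions, and the input must fit in one block (a hierarchical scan would be needed beyond that).

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // assumed block size; input length must be <= BLOCK

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Kogge-Stone inclusive scan within one block of BLOCK threads.
// At each step, thread t adds the value `stride` positions to its left.
__global__ void koggeStoneScan(const int* in, int* out, int n) {
    __shared__ int buf[BLOCK];
    int t = threadIdx.x;
    buf[t] = (t < n) ? in[t] : 0;  // pad unused slots with the identity
    for (int stride = 1; stride < BLOCK; stride *= 2) {
        __syncthreads();           // all writes from the previous step are visible
        int addend = (t >= stride) ? buf[t - stride] : 0;
        __syncthreads();           // all reads finish before anyone overwrites
        buf[t] += addend;
    }
    if (t < n) out[t] = buf[t];
}

// Hypothetical driver: scans up to BLOCK integers on the GPU.
std::vector<int> runScan(const std::vector<int>& h) {
    int n = static_cast<int>(h.size());
    if (n == 0 || n > BLOCK) return {};  // outside what this sketch supports

    size_t bytes = h.size() * sizeof(int);
    int *dIn = nullptr, *dOut = nullptr;
    check(cudaMalloc(&dIn, bytes), "cudaMalloc dIn");
    check(cudaMalloc(&dOut, bytes), "cudaMalloc dOut");
    check(cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice), "copy in");

    koggeStoneScan<<<1, BLOCK>>>(dIn, dOut, n);
    check(cudaGetLastError(), "koggeStoneScan launch");

    std::vector<int> result(n);
    check(cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost), "copy out");
    cudaFree(dIn); cudaFree(dOut);
    return result;
}

int main() {
    std::vector<int> scanned = runScan({1, 2, 3, 4});
    for (int v : scanned) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```

The two barriers per step keep reads and writes of `buf` from racing. A double-buffered variant would trade extra shared memory for one barrier per step.
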
Phase 3: Sparse Matrices, Graphs & ML Ops (Days 41 - 60)
| Day | Curriculum & Highlights | Action |
| --- | --- | --- |
| 41-45 | COO, CSR, ELL, Hybrid ELL-COO, JDS Formats. | View |
| 46-50 | Parallel BFS, Vertex/Edge Centric Traversal, Frontiers. | View |
| 51-55 | CNN Intro, Forward Pass, Backpropagation Kernels. | View |
| 56-60 | PyCUDA Integration, Matrix Inversion, Batch/Layer Norm. | View |
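
For the sparse formats in Days 41-45, a CSR sparse matrix-vector product with one thread per row might look like the sketch below. The kernel name `spmvCsr`, the helpers `toDevice`, `check`, and `runSpmvCsr`, and the 4x4 example matrix are all illustrative assumptions, not the repository's implementation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// y = A * x with A in CSR form; each thread handles one row.
__global__ void spmvCsr(int numRows, const int* rowPtr, const int* colIdx,
                        const float* vals, const float* x, float* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < numRows) {
        float sum = 0.0f;
        // Nonzeros of this row live in [rowPtr[row], rowPtr[row + 1]).
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
            sum += vals[k] * x[colIdx[k]];
        }
        y[row] = sum;
    }
}

// Hypothetical helper: allocate device memory and copy a host vector into it.
template <typename T>
T* toDevice(const std::vector<T>& h) {
    T* d = nullptr;
    check(cudaMalloc(&d, h.size() * sizeof(T)), "cudaMalloc");
    check(cudaMemcpy(d, h.data(), h.size() * sizeof(T), cudaMemcpyHostToDevice),
          "cudaMemcpy H2D");
    return d;
}

// Hypothetical driver using a small example matrix:
//   [1 0 2 0]
//   [0 3 0 0]
//   [0 0 0 0]
//   [4 0 5 6]
std::vector<float> runSpmvCsr() {
    const int numRows = 4;
    std::vector<int> rowPtr = {0, 2, 3, 3, 6};
    std::vector<int> colIdx = {0, 2, 1, 0, 2, 3};
    std::vector<float> vals = {1, 2, 3, 4, 5, 6};
    std::vector<float> x = {1, 2, 3, 4};
    std::vector<float> y(numRows, 0.0f);

    int* dRowPtr = toDevice(rowPtr);
    int* dColIdx = toDevice(colIdx);
    float* dVals = toDevice(vals);
    float* dX = toDevice(x);
    float* dY = toDevice(y);

    int threads = 128;  // assumed block size
    int blocks = (numRows + threads - 1) / threads;
    spmvCsr<<<blocks, threads>>>(numRows, dRowPtr, dColIdx, dVals, dX, dY);
    check(cudaGetLastError(), "spmvCsr launch");
    check(cudaMemcpy(y.data(), dY, numRows * sizeof(float), cudaMemcpyDeviceToHost),
          "cudaMemcpy D2H");

    cudaFree(dRowPtr); cudaFree(dColIdx); cudaFree(dVals); cudaFree(dX); cudaFree(dY);
    return y;
}

int main() {
    for (float v : runSpmvCsr()) std::printf("%g ", v);
    std::printf("\n");
    return 0;
}
```

One thread per row is the simplest mapping, but rows of very different lengths leave threads idle. That load imbalance is one motivation for the ELL, hybrid, and JDS formats listed in the table.
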
Phase 4: Triton, cuDNN & Advanced MRI (Days 61 - 80)
| Day | Curriculum & Highlights | Action |
| --- | --- | --- |
| 61-65 | MHA Triton, CNN Inference, cuDNN Integration, LeNet-5. | View |
| 66-70 | MRI Reconstruction, FFT Kernels, Dynamic Parallelism. | View |
| 71-75 | Tensara Prep: Softmax, GEMM with Bias/ReLU, Inclusive Scan. | View |
| 76-80 | Thrust, 4D Tensor MatMul, Swish Activation, RMS Norm. | View |
Phase 5: Competition & Final Completion (Days 81 - 100)
| Day | Curriculum & Highlights | Action |
| --- | --- | --- |
| 81-85 | Softplus, 1D Conv, KL-Divergence, High-Performance ReLU. | View |
| 86-90 | Layer Norm (4D), Tri-MatMul, Symmetric MatMul, GEMM+. | View |
| 91-95 | Triplet Margin Loss, GELU, MSE Loss Performance. | View |
| 96-100 | Sigmoid Performance, 2D Pooling, Challenge Completion. | View |
Project Highlights
- Comprehensive Coverage: From CUDA basics to advanced deep learning and transformer architectures.
- Hands-on Code: Every day includes working CUDA code, with a focus on practical, high-performance GPU programming.
- Modern Deep Learning: Includes CNNs, RNNs, attention mechanisms, normalization, and more.
- Performance Optimization: Profiling, memory management, and multi-GPU strategies.
What’s Next?
Stay tuned for more advanced CUDA explorations, real-world projects, and deep dives into GPU-powered AI!
Follow this repository for future updates and bonus content.