100 Days of CUDA

My learning resources:

Books:
- CUDA by Example: An Introduction to General-Purpose GPU Programming, by Jason Sanders and Edward Kandrot
- Programming Massively Parallel Processors (PMPP), 4th Edition, by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj

Phase 1: CUDA Basics & Memory (Days 01 - 20)

Day	Curriculum & Highlights	Action
01-05	Setup, Hello World, Vector Add, Image Conversion, MatMul Intro.	View
06-10	Warps, Divergence, Occupancy, Memory Hierarchy.	View
11-15	Tiling, Dynamic Tiling, Coalescing, Thread Coarsening.	View
16-20	1D/2D Convolutions, Constant Memory, Halo Cells, Caching.	View
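
As a taste of the Day 01-05 material, a vector-add kernel might look like the minimal sketch below. The names `vecAddKernel` and `vectorAdd` are illustrative assumptions, not taken from this repository, and CUDA error checking is left out for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One thread per element. The bounds check guards the last block,
// which may contain more threads than remaining elements.
__global__ void vecAddKernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper: copy inputs to the device, launch, copy back.
// Return codes of the CUDA calls are ignored here to keep the sketch short.
std::vector<float> vectorAdd(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;

    const size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // ceiling division
    vecAddKernel<<<blocks, threads>>>(dA, dB, dC, n);

    // A blocking cudaMemcpy waits for the kernel on the default stream.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

Calling `vectorAdd({1, 2, 3}, {10, 20, 30})` should return the element-wise sums. The block size of 256 is an arbitrary but common starting point, not a tuned value.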

Phase 2: Parallel Algorithms (Days 21 - 40)

Day	Curriculum & Highlights	Action
21-25	Stencil Ops, Parallel Histograms, Atomic Ops, Privatization.	View
26-30	Sum/Max Reduction, Shared Memory Optimizations, Prefix Scan.	View
31-35	Kogge-Stone, Brent-Kung, Hierarchical Scan.	View
36-40	Parallel Merge, Co-ranks, Radix Sort Implementation.	View
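
For the reduction topics in Days 26-30, one possible shape of a shared-memory sum reduction is sketched below. It assumes a power-of-two block size and combines per-block partial sums with `atomicAdd`. The names `sumReduceKernel`, `sumReduce`, and `BLOCK` are illustrative, not this repository's code, and error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // assumed block size; must be a power of two

// Each block loads up to BLOCK elements into shared memory, reduces them
// with a tree of halving strides, and adds its partial sum to *out.
__global__ void sumReduceKernel(const float* in, float* out, int n) {
    __shared__ float partial[BLOCK];
    const int tid = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + tid;

    // Out-of-range threads contribute the additive identity.
    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();  // every thread reaches this barrier each iteration
    }

    if (tid == 0) {
        atomicAdd(out, partial[0]);
    }
}

// Hypothetical host wrapper returning the sum of all elements.
float sumReduce(const std::vector<float>& data) {
    const int n = static_cast<int>(data.size());
    if (n == 0) return 0.0f;

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));  // all-zero bytes represent 0.0f

    const int blocks = (n + BLOCK - 1) / BLOCK;
    sumReduceKernel<<<blocks, BLOCK>>>(dIn, dOut, n);

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Because floating-point addition is not associative and the order of the atomic updates is unspecified, results for general inputs can differ slightly from a sequential sum. Inputs whose partial sums are exactly representable, such as a vector of ones, are unaffected.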

Phase 3: Sparse Matrices, Graphs & ML Ops (Days 41 - 60)

Day	Curriculum & Highlights	Action
41-45	COO, CSR, ELL, Hybrid ELL-COO, JDS Formats.	View
46-50	Parallel BFS, Vertex/Edge Centric Traversal, Frontiers.	View
51-55	CNN Intro, Forward Pass, Backpropagation Kernels.	View
56-60	PyCUDA Integration, Matrix Inversion, Batch/Layer Norm.	View
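
To make the CSR format from Days 41-45 concrete, here is a minimal sketch of sparse matrix-vector multiplication with one thread per row. The names `spmvCsrKernel`, `spmvCsr`, and `toDevice` are hypothetical helpers introduced for illustration, and error checking is omitted.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// CSR stores nonzeros row by row: rowPtr[r]..rowPtr[r+1] indexes the
// entries of row r in colIdx (column indices) and values.
__global__ void spmvCsrKernel(int numRows, const int* rowPtr,
                              const int* colIdx, const float* values,
                              const float* x, float* y) {
    const int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < numRows) {
        float sum = 0.0f;
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
            sum += values[k] * x[colIdx[k]];
        }
        y[row] = sum;
    }
}

// Hypothetical helper: allocate device memory and upload a host vector.
template <typename T>
T* toDevice(const std::vector<T>& h) {
    T* d = nullptr;
    cudaMalloc(&d, h.size() * sizeof(T));
    cudaMemcpy(d, h.data(), h.size() * sizeof(T), cudaMemcpyHostToDevice);
    return d;
}

// Hypothetical host wrapper computing y = A * x for a CSR matrix A.
std::vector<float> spmvCsr(const std::vector<int>& rowPtr,
                           const std::vector<int>& colIdx,
                           const std::vector<float>& values,
                           const std::vector<float>& x) {
    assert(!rowPtr.empty() && colIdx.size() == values.size());
    const int numRows = static_cast<int>(rowPtr.size()) - 1;
    std::vector<float> y(numRows, 0.0f);
    if (numRows == 0) return y;

    int* dRowPtr = toDevice(rowPtr);
    int* dColIdx = toDevice(colIdx);
    float* dValues = toDevice(values);
    float* dX = toDevice(x);
    float* dY = nullptr;
    cudaMalloc(&dY, numRows * sizeof(float));

    const int threads = 128;
    const int blocks = (numRows + threads - 1) / threads;
    spmvCsrKernel<<<blocks, threads>>>(numRows, dRowPtr, dColIdx, dValues, dX, dY);

    cudaMemcpy(y.data(), dY, numRows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dRowPtr);
    cudaFree(dColIdx);
    cudaFree(dValues);
    cudaFree(dX);
    cudaFree(dY);
    return y;
}
```

For the 2x3 matrix with rows (1, 0, 2) and (0, 3, 0), the CSR arrays are `rowPtr = {0, 2, 3}`, `colIdx = {0, 2, 1}`, and `values = {1, 2, 3}`. One thread per row is the simplest mapping, but rows with very different lengths leave threads idle, which is the usual motivation for the ELL and JDS formats listed above.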

Phase 4: Triton, cuDNN & Advanced MRI (Days 61 - 80)

Day	Curriculum & Highlights	Action
61-65	MHA Triton, CNN Inference, cuDNN Integration, LeNet-5.	View
66-70	MRI Reconstruction, FFT Kernels, Dynamic Parallelism.	View
71-75	Tensara Prep: Softmax, GEMM with Bias/ReLU, Inclusive Scan.	View
76-80	Thrust, 4D Tensor MatMul, Swish Activation, RMS Norm.	View

Phase 5: Competition & Final Completion (Days 81 - 100)

Day	Curriculum & Highlights	Action
81-85	Softplus, 1D Conv, KL-Divergence, High-Performance ReLU.	View
86-90	Layer Norm (4D), Tri-MatMul, Symmetric MatMul, GEMM+.	View
91-95	Triplet Margin Loss, GELU, MSE Loss Performance.	View
96-100	Sigmoid Performance, 2D Pooling, Challenge Completion.	View

Project Highlights

- Comprehensive Coverage: From CUDA basics to advanced deep learning and transformer architectures.
- Hands-on Code: Every day features real CUDA code, with a focus on practical, high-performance GPU programming.
- Modern Deep Learning: Includes CNNs, RNNs, attention mechanisms, normalization, and more.
- Performance Optimization: Profiling, memory management, and multi-GPU strategies.

What’s Next?

Stay tuned for more advanced CUDA explorations, real-world projects, and deep dives into GPU-powered AI!
Follow this repository for future updates and bonus content.