My notes and code documentation for my CUDA learning journey
Okay, so yesterday I explored the different types of memory available on the GPU, such as global memory (large but slow) and shared memory (small but fast). That laid the foundation for the tiling concept: the threads of a block cooperatively load small tiles of the input into shared memory and reuse them from there, which cuts the number of global memory accesses and improves computational efficiency.
Example case:
Mds for $M$ and Nds for $N$). Click here to view the full implementation of tiled matrix multiplication.
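To make the role of the `Mds` and `Nds` shared-memory tiles concrete, here is a minimal sketch of a tiled matrix multiplication kernel. It is not the linked implementation. It assumes square, row-major `width x width` matrices, and the names `TILE_WIDTH`, `tiledMatMulKernel`, and `tiledMatMul` are hypothetical ones picked for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE_WIDTH 16  // assumed tile size (hypothetical); tune for your GPU

// Computes P = M * N for square, row-major width x width matrices.
__global__ void tiledMatMulKernel(const float* M, const float* N, float* P, int width) {
    // One tile of M and one tile of N live in fast shared memory per block.
    __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Nds[TILE_WIDTH][TILE_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE_WIDTH + ty;  // row of P this thread computes
    int col = blockIdx.x * TILE_WIDTH + tx;  // column of P this thread computes

    float acc = 0.0f;
    int numTiles = (width + TILE_WIDTH - 1) / TILE_WIDTH;

    for (int ph = 0; ph < numTiles; ++ph) {
        int mCol = ph * TILE_WIDTH + tx;
        int nRow = ph * TILE_WIDTH + ty;

        // Each thread loads one element of each tile; out-of-range slots get 0
        // so widths that are not a multiple of TILE_WIDTH still work.
        Mds[ty][tx] = (row < width && mCol < width) ? M[row * width + mCol] : 0.0f;
        Nds[ty][tx] = (nRow < width && col < width) ? N[nRow * width + col] : 0.0f;
        __syncthreads();  // wait until the whole tile is loaded

        // Partial dot product using only shared memory.
        for (int k = 0; k < TILE_WIDTH; ++k)
            acc += Mds[ty][k] * Nds[k][tx];
        __syncthreads();  // wait before the tiles are overwritten in the next phase
    }

    if (row < width && col < width)
        P[row * width + col] = acc;
}

// Hypothetical host wrapper: copies inputs to the device, launches the kernel,
// and copies the result back. Returns false if any CUDA call fails.
bool tiledMatMul(const std::vector<float>& M, const std::vector<float>& N,
                 std::vector<float>& P, int width) {
    size_t bytes = static_cast<size_t>(width) * width * sizeof(float);
    float *dM = nullptr, *dN = nullptr, *dP = nullptr;

    bool ok = cudaMalloc(&dM, bytes) == cudaSuccess &&
              cudaMalloc(&dN, bytes) == cudaSuccess &&
              cudaMalloc(&dP, bytes) == cudaSuccess &&
              cudaMemcpy(dM, M.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dN, N.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 block(TILE_WIDTH, TILE_WIDTH);
        int blocks = (width + TILE_WIDTH - 1) / TILE_WIDTH;
        dim3 grid(blocks, blocks);
        tiledMatMulKernel<<<grid, block>>>(dM, dN, dP, width);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(P.data(), dP, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dM);
    cudaFree(dN);
    cudaFree(dP);
    return ok;
}
```

With this layout, each element of $M$ and $N$ is read from global memory once per tile phase and then reused `TILE_WIDTH` times from shared memory, which is where the savings over the naive kernel come from. To try it, fill two `std::vector<float>` buffers of size `width * width`, size the output vector the same way, and call `tiledMatMul(M, N, P, width)`.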