100 Days of CUDA

My Notes and codes documentation for CUDA learning journey

View the Project on GitHub Firojpaudel/100_days_of_CUDA

Summary of Day 91:

Kernels of the day:

  1. Symmetric Matrix Multiplication:

Kernel: 1

Click Here to redirect towards code

[!note]

  • Performance: $4.8 \text{ TFLOPs}$
  • Runtime: $166.12 \text{ ms}$
  • GPU: A100-80GB

Kernel: 2

Click Here to redirect towards code

[!note]

  • Performance: $8.5 \text{ TFLOPs}$
  • Runtime: $93.81 \text{ ms}$
  • GPU: A100-80GB

[!caution] For some reason, I could not get it working on H100 GPUs. It failed in last test case. I’ll try to optimize that in the future

  1. Gemm with bias and RELU:

Click Here to redirect towards code

[!important] I went through HAMDI’s code for this. I just could not get it right on first time. Will try to give my own touch some day. I optimized his code basically

[!note]

  • Performance: $10.6 \text{ TFLOPs}$
  • Runtime: $1.43 \text{ ms}$
  • GPU: A100-80GB