100 Days of CUDA

My notes and code documentation for my CUDA learning journey

View the Project on GitHub Firojpaudel/100_days_of_CUDA

Summary of Day 86:

Improved my previous vector addition kernel so that it reaches a higher throughput (more GFLOPS).

Click here to go to the code.

[!note]

  • Performance: $277.82 \text{ GFLOPS}$
  • Runtime: $0.72 \text{ ms}$
  • GPU: NVIDIA H100
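
The linked kernel is not reproduced here. As a rough illustration of one common way to speed up vector addition, the sketch below uses `float4` vectorized loads and stores inside a grid-stride loop, with a scalar loop for the leftover elements. Whether the actual Day 86 kernel uses this technique is an assumption; the names `vecAddFloat4` and `runVectorAddCheck`, the block size of 256, and the test data are all hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel (name and layout are assumptions, not the author's code).
// Each thread adds float4 chunks in a grid-stride loop; a scalar loop then
// handles the last n % 4 elements.
__global__ void vecAddFloat4(const float* __restrict__ a,
                             const float* __restrict__ b,
                             float* __restrict__ c,
                             size_t n) {
    const size_t tid    = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    const size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    const size_t n4     = n / 4;  // number of complete float4 chunks

    // Pointers returned by cudaMalloc are aligned well beyond 16 bytes,
    // so reinterpreting them as float4 is safe here.
    const float4* a4 = reinterpret_cast<const float4*>(a);
    const float4* b4 = reinterpret_cast<const float4*>(b);
    float4*       c4 = reinterpret_cast<float4*>(c);

    for (size_t i = tid; i < n4; i += stride) {
        const float4 x = a4[i];
        const float4 y = b4[i];
        c4[i] = make_float4(x.x + y.x, x.y + y.y, x.z + y.z, x.w + y.w);
    }
    // Scalar tail: at most three elements remain.
    for (size_t i = 4 * n4 + tid; i < n; i += stride) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: runs the kernel on n > 0 elements and compares
// the result with a host-side sum. Returns false on any CUDA error or mismatch.
bool runVectorAddCheck(size_t n) {
    std::vector<float> ha(n), hb(n), hc(n, 0.0f);
    for (size_t i = 0; i < n; ++i) {
        ha[i] = static_cast<float>(i % 1000);
        hb[i] = 0.5f;
    }
    const size_t bytes = n * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;

    bool ok = cudaMalloc(&da, bytes) == cudaSuccess &&
              cudaMalloc(&db, bytes) == cudaSuccess &&
              cudaMalloc(&dc, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const unsigned threads = 256;  // assumed block size
        const size_t work = (n / 4 > 0) ? n / 4 : 1;
        const unsigned blocks = static_cast<unsigned>((work + threads - 1) / threads);
        vecAddFloat4<<<blocks, threads>>>(da, db, dc, n);
        // The blocking device-to-host copy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da);  // cudaFree(nullptr) is a no-op
    cudaFree(db);
    cudaFree(dc);

    for (size_t i = 0; ok && i < n; ++i) {
        ok = (hc[i] == ha[i] + hb[i]);
    }
    return ok;
}

int main() {
    const size_t n = (1u << 20) + 3;  // not a multiple of 4, to exercise the tail loop
    std::printf("vector add %s\n", runVectorAddCheck(n) ? "matches host" : "FAILED");
    return 0;
}
```

Vector addition is memory-bound, so wider loads mainly help by issuing fewer memory instructions per byte moved. The sketch only checks correctness and makes no claim about reproducing the figures in the note above.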

Next, wrote a ReLU kernel:

Click here to go to the code.

[!note]

  • Performance: $450.33 \text{ GFLOPS}$
  • Runtime: $0.18 \text{ ms}$
  • GPU: NVIDIA H100
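
For readers without access to the linked code, a minimal elementwise ReLU kernel might look like the sketch below. It is not the author's implementation: the names `reluKernel` and `runReluCheck`, the block size of 256, and the test data are hypothetical, and the kernel is a plain grid-stride loop with no further tuning.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative elementwise ReLU: out[i] = max(in[i], 0).
// The grid-stride loop lets any grid size cover all n elements.
__global__ void reluKernel(const float* __restrict__ in,
                           float* __restrict__ out,
                           size_t n) {
    const size_t tid    = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    const size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    for (size_t i = tid; i < n; i += stride) {
        out[i] = fmaxf(in[i], 0.0f);
    }
}

// Hypothetical host helper: runs the kernel on n > 0 elements and compares
// the result with a host-side ReLU. Returns false on any CUDA error or mismatch.
bool runReluCheck(size_t n) {
    std::vector<float> hin(n), hout(n, -1.0f);
    for (size_t i = 0; i < n; ++i) {
        hin[i] = static_cast<float>(i % 7) - 3.0f;  // values cycle through -3 .. 3
    }
    const size_t bytes = n * sizeof(float);
    float *din = nullptr, *dout = nullptr;

    bool ok = cudaMalloc(&din, bytes) == cudaSuccess &&
              cudaMalloc(&dout, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(din, hin.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const unsigned threads = 256;  // assumed block size
        const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
        reluKernel<<<blocks, threads>>>(din, dout, n);
        // The blocking device-to-host copy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hout.data(), dout, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(din);  // cudaFree(nullptr) is a no-op
    cudaFree(dout);

    for (size_t i = 0; ok && i < n; ++i) {
        const float expected = hin[i] > 0.0f ? hin[i] : 0.0f;
        ok = (hout[i] == expected);
    }
    return ok;
}

int main() {
    const size_t n = 1000003;  // arbitrary size that is not a multiple of the block size
    std::printf("relu %s\n", runReluCheck(n) ? "matches host" : "FAILED");
    return 0;
}
```

Like vector addition, ReLU reads and writes each element once, so its speed is bounded by memory bandwidth rather than arithmetic. The same `float4` vectorization idea could be applied here, though whether the linked kernel does so is not stated.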