100 Days of CUDA

My notes and code documentation for my CUDA learning journey

View the Project on GitHub Firojpaudel/100_days_of_CUDA

Summary of Day 96:

Today’s kernel: MSE Loss

Click here to view the code.

Note:

  1. GPU: H100
    • Performance: $5.41 \text{ TFLOPS}$
    • Runtime: $0.06 \text{ ms}$
  2. GPU: L40S
    • Performance: $6.41 \text{ TFLOPS}$
    • Runtime: $0.08 \text{ ms}$
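
For reference, MSE loss over $N$ elements is $\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2$. Below is a minimal sketch of one common way to compute it on the GPU: a grid-stride loop, a shared-memory block reduction, and one `atomicAdd` per block. This is not the linked code and not the implementation that produced the numbers above. The names `mse_partial_sums`, `mse_loss`, and `kBlockSize`, as well as the block size and the cap on the grid size, are assumptions made for illustration. It also assumes `float` inputs, $N > 0$, and skips error checking for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical block size; the tree reduction below assumes a power of two.
constexpr int kBlockSize = 256;

// Each thread accumulates squared differences over a grid-stride loop,
// the block reduces those partial sums in shared memory, and thread 0
// adds the block's result to *sum_sq with a single atomicAdd.
__global__ void mse_partial_sums(const float* pred, const float* target,
                                 float* sum_sq, size_t n) {
    __shared__ float partial[kBlockSize];

    float local = 0.0f;
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        float d = pred[i] - target[i];
        local += d * d;
    }
    partial[threadIdx.x] = local;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int offset = blockDim.x / 2; offset > 0; offset >>= 1) {
        if (threadIdx.x < offset) {
            partial[threadIdx.x] += partial[threadIdx.x + offset];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        atomicAdd(sum_sq, partial[0]);
    }
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel, and returns the mean of the squared differences. Assumes n > 0.
float mse_loss(const float* h_pred, const float* h_target, size_t n) {
    float *d_pred = nullptr, *d_target = nullptr, *d_sum = nullptr;
    size_t bytes = n * sizeof(float);

    cudaMalloc(&d_pred, bytes);
    cudaMalloc(&d_target, bytes);
    cudaMalloc(&d_sum, sizeof(float));
    cudaMemcpy(d_pred, h_pred, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_target, h_target, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_sum, 0, sizeof(float));

    // Cap the grid (arbitrary illustrative limit); the grid-stride loop
    // covers any elements beyond blocks * kBlockSize.
    size_t needed = (n + kBlockSize - 1) / kBlockSize;
    int blocks = needed > 1024 ? 1024 : (int)needed;

    mse_partial_sums<<<blocks, kBlockSize>>>(d_pred, d_target, d_sum, n);

    // This blocking copy on the default stream also waits for the kernel.
    float sum = 0.0f;
    cudaMemcpy(&sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_pred);
    cudaFree(d_target);
    cudaFree(d_sum);
    return sum / (float)n;
}
```

Calling `mse_loss(pred, target, n)` with two host arrays of length `n` returns the scalar loss. Because the per-block sums are combined with floating-point `atomicAdd`, the summation order can vary between runs, so results on large inputs may differ in the last few bits. A tuned kernel would likely replace the shared-memory tree with warp shuffles and vectorized loads.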