100 Days of CUDA

My notes and code documentation for my CUDA learning journey

View the Project on GitHub Firojpaudel/100_days_of_CUDA

Summary of Day 76:

Trying out cumulative product today.

\[\text{output}[i] = \prod_{j=0}^{i} \text{input}[j]\]

Assumption: the input and output vectors have the same size, i.e., $N$.

$1^{st}$ approach: Naive (simplest) kernel:

Click Here to redirect to the code.

The performance of this one was very, very bad.

[!note] The best it could reach was:

  • Performance: $0.03 \text{ GFLOPs}$
  • Runtime: $0.03 \text{ ms}$
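
For reference, here is a minimal sketch of what a naive kernel for the formula above could look like. It is not the linked code; it assumes one thread per output element, with thread $i$ multiplying `input[0]` through `input[i]` on its own. The names `cumprod_naive` and `cumprod_naive_host` are made up for illustration, and CUDA error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Naive inclusive cumulative product (illustrative sketch, not the linked code).
// Thread i walks input[0..i] and multiplies everything itself.
__global__ void cumprod_naive(const float* input, float* output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 1.0f;  // 1 is the multiplicative identity
    for (int j = 0; j <= i; ++j) {
        acc *= input[j];
    }
    output[i] = acc;
}

// Hypothetical host helper: copies data over, launches the kernel, copies back.
// Return codes of the CUDA calls are ignored here for brevity.
std::vector<float> cumprod_naive_host(const std::vector<float>& h_in) {
    int n = static_cast<int>(h_in.size());
    std::vector<float> h_out(n);
    if (n == 0) return h_out;

    size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    cumprod_naive<<<blocks, threads>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For the input `[1, 2, 3, 4]` this gives `[1, 2, 6, 24]`. In a kernel of this shape thread $i$ does $i + 1$ multiplications, so the total work grows as $O(N^2)$, and threads in the same warp finish at different times. That is one likely reason such a kernel performs badly.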

$2^{nd}$ approach: Multikernel approach

Click Here to redirect to the code.

[!note]

  • Performance: $5.5 \text{ GFLOPs}$
  • Runtime: $0.10 \text{ ms}$
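
A multi-kernel cumulative product is commonly split into three launches: scan each block in shared memory, scan the per-block totals, then scale every block by the product of all blocks before it. The sketch below follows that pattern as an assumption; the linked code may be organized differently. `BLOCK`, the kernel names, and `cumprod_multikernel_host` are hypothetical, the second phase uses a single thread purely for simplicity, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // threads per block (assumed tile size)

// Phase 1: inclusive cumulative product inside each block (Hillis-Steele scan).
__global__ void block_cumprod(const float* in, float* out,
                              float* block_totals, int n) {
    __shared__ float tile[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK + tid;

    tile[tid] = (i < n) ? in[i] : 1.0f;  // pad the tail with the identity
    __syncthreads();

    for (int offset = 1; offset < BLOCK; offset <<= 1) {
        float left = (tid >= offset) ? tile[tid - offset] : 1.0f;
        __syncthreads();  // everyone has read before anyone writes
        tile[tid] *= left;
        __syncthreads();  // everyone has written before the next read
    }

    if (i < n) out[i] = tile[tid];
    if (tid == BLOCK - 1) block_totals[blockIdx.x] = tile[tid];
}

// Phase 2: turn block totals into exclusive prefix products, in place.
// A single thread is enough for a sketch; a real version could scan in parallel.
__global__ void scan_block_totals(float* block_totals, int num_blocks) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        float running = 1.0f;
        for (int b = 0; b < num_blocks; ++b) {
            float total = block_totals[b];
            block_totals[b] = running;  // product of all blocks before b
            running *= total;
        }
    }
}

// Phase 3: multiply each element by the product of the preceding blocks.
__global__ void apply_block_prefix(float* out, const float* block_prefix, int n) {
    int i = blockIdx.x * BLOCK + threadIdx.x;
    if (i < n) out[i] *= block_prefix[blockIdx.x];
}

// Hypothetical host helper that runs the three phases back to back.
std::vector<float> cumprod_multikernel_host(const std::vector<float>& h_in) {
    int n = static_cast<int>(h_in.size());
    std::vector<float> h_out(n);
    if (n == 0) return h_out;

    int num_blocks = (n + BLOCK - 1) / BLOCK;
    size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    float* d_totals = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMalloc(&d_totals, num_blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // Launches on the default stream run in order, so no extra sync is needed.
    block_cumprod<<<num_blocks, BLOCK>>>(d_in, d_out, d_totals, n);
    scan_block_totals<<<1, 1>>>(d_totals, num_blocks);
    apply_block_prefix<<<num_blocks, BLOCK>>>(d_out, d_totals, n);

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_totals);
    return h_out;
}
```

Compared with a one-thread-per-prefix kernel, each block here does $O(B \log B)$ multiplications for a tile of $B$ elements instead of $O(B^2)$, at the cost of two extra launches. Note that the multiplication order differs from a sequential loop, so floating-point results can differ slightly from a CPU reference on general inputs.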

Tried other approaches, but idk why they're failing. Will look into this tomorrow!