My Notes and codes documentation for CUDA learning journey
*Exercises from Chapter-14
###
\begin{bmatrix}
1 & 0 & 7 & 0 \\
0 & 0 & 8 & 0 \\
0 & 4 & 3 & 0 \\
2 & 0 & 0 & 1 \\
\end{bmatrix}
Represent it in each of the following formats:
Solution:
Okay, so let’s represent first into COO format:
In COO; three 1D arrays:
valuerowIdxcolIdx
\begin{bmatrix}
1 & 0 & 7 & 0 \\
0 & 0 & 8 & 0 \\
0 & 4 & 3 & 0 \\
2 & 0 & 0 & 1 \\
\end{bmatrix} \longrightarrow
\begin{array}{ll}
\text{value:} & 1 \quad 7 \quad 8 \quad 4 \quad 3 \quad 2 \quad 1 \\
\text{rowIdx:} & 0 \quad 0 \quad 1 \quad 2 \quad 2 \quad 3 \quad 3 \\
\text{colIdx:} & 0 \quad 2 \quad 2 \quad 1 \quad 2 \quad 0 \quad 3
\end{array}
Will implement kernel in a while
Onto CSR:
\begin{bmatrix}
1 & 0 & 7 & 0 \\
0 & 0 & 8 & 0 \\
0 & 4 & 3 & 0 \\
2 & 0 & 0 & 1 \\
\end{bmatrix} \longrightarrow
\begin{array}{ll}
\text{rowPtrs:} & 0 \quad 2 \quad 3 \quad 5 \quad 7^* \\
\text{colIdx:} & 0 \quad 2 \quad 2 \quad 1 \quad 2 \quad 0 \quad 3 \\
\text{value:} & 1 \quad 7 \quad 8 \quad 4 \quad 3 \quad 2 \quad 1
\end{array}
Didn’t get what I did here?: Click Here to revise
Okay, so let’s go into ELL next:
\begin{bmatrix}
1 & 0 & 7 & 0 \\
0 & 0 & 8 & 0 \\
0 & 4 & 3 & 0 \\
2 & 0 & 0 & 1 \\
\end{bmatrix} \xrightarrow{\text{Group nonZeros by Row + Pad rows to equal size}}
\begin{bmatrix}
0 & 2 \\
2 & * \\
1 & 2 \\
0 & 3 \\
\end{bmatrix}
\begin{bmatrix}
1 & 7 \\
8 & * \\
4 & 3 \\
2 & 1 \\
\end{bmatrix}
\begin{bmatrix}
0 & 2 \\
2 & * \\
1 & 2 \\
0 & 3 \\
\end{bmatrix}
\begin{bmatrix}
1 & 7 \\
8 & * \\
4 & 3 \\
2 & 1 \\
\end{bmatrix} \xrightarrow{\text{Store padded rows in column-major order}}
\begin{array}{ll}
\text{colIdx:} & 0 \quad 2 \quad 1 \quad 0 \quad 2 \quad * \quad 2 \quad 3 \\
\text{value:} & 1 \quad 8 \quad 4 \quad 2 \quad 7 \quad * \quad 3 \quad 1
\end{array}
ⓘ Note: Column-major scan looks like this:
![]()
And Finally JDS
\begin{bmatrix}
1 & 0 & 7 & 0 \\
0 & 0 & 8 & 0 \\
0 & 4 & 3 & 0 \\
2 & 0 & 0 & 1 \\
\end{bmatrix} \xrightarrow{\text{Group nonZeros by Row }}
\begin{bmatrix}
0 & 2 \\
2 & * \\
1 & 2 \\
0 & 3 \\
\end{bmatrix}
\begin{bmatrix}
1 & 7 \\
8 & * \\
4 & 3 \\
2 & 1 \\
\end{bmatrix}
\begin{bmatrix}
0 & 2 \\
2 & * \\
1 & 2 \\
0 & 3 \\
\end{bmatrix}
\begin{bmatrix}
1 & 7 \\
8 & * \\
4 & 3 \\
2 & 1 \\
\end{bmatrix} \xrightarrow{\text{Sort rows by size }} \begin{bmatrix}
0 & 2 \\
1 & 2 \\
0 & 3 \\
2 & * \\
\end{bmatrix}
\begin{bmatrix}
1 & 7 \\
4 & 3 \\
2 & 1 \\
8 & * \\
\end{bmatrix}
\begin{bmatrix}
0 \\
2 \\
3 \\
1 \\
\end{bmatrix}
\begin{bmatrix}
0 & 2 \\
1 & 2 \\
0 & 3 \\
2 & * \\
\end{bmatrix}
\begin{bmatrix}
1 & 7 \\
4 & 3 \\
2 & 1 \\
8 & * \\
\end{bmatrix}
\begin{bmatrix}
0 \\
2 \\
3 \\
1 \\
\end{bmatrix} \xrightarrow{\text{Store in Column Major Order}}
\begin{array}{ll}
\text{iterPtr:} & 0 \quad 4 \quad 7 \\
\text{colIdx:} & 0 \quad 1 \quad 0 \quad 2 \quad 2^* \quad 2 \quad 3 \\
\text{value:} & 1 \quad 4 \quad 2 \quad 8 \quad 7^* \quad 3 \quad 1
\end{array}
ⓘ FYI:
iterPointerarray is also known as jagged diagonal pointer array.
###
If the information that is provided is not enough to allow an answer, indicate what information is missing.
Solution:
In COO:
For a sparse matrix with $m$ rows, $n$ columns, and $z$ non-zeros, the COO format requires:
values: $z$ integersrowIndices: $z$ integerscolumnIndices: $z$ integersHence, total storage requirement in COO: $3z$ integers.
In CSR: For a sparse matrix with $m$ rows, $n$ columns, and $z$ non-zeros, the CSR format requires:
values: $z$ integerscolumnIndices: $z$ integersrowOffsets: $(m+1)$ integersHence, total storage requirement in CSR: $2z+ (m+1)$ integers.
In ELL: For a sparse matrix with $m$ rows, $n$ columns, and $z$ non-zeros, the ELL format requires:
columnIndices: $ m \times K $ integersvalues: $ m \times K $ integerswhere, $K$ stands for maximum number of non-zeros in any row.
Hence, total storage requirement in ELL: $2m \times K$ integers.
ⓘ Note:
Missing information in ELL: To precisely calculate the storage requirements for ELL, we need to know $K$, the maximum number of non-zeros in any row. Without this information, we cannot provide a definitive answer for the ELL format.
In JDS: For a sparse matrix with $m$ rows, $n$ columns, and $z$ non-zeros, the JDS format requires:
values: $z$ integerscolumnIndices: $z$ integersrow permutation array : $m$ integersIterPointer array): $(\text{number of jagged diagonals} + 1)$ integersHence, the total storage requirement for JDS format is $2z + m + (\text{number of jagged diagonals} + 1)$ integers.
ⓘ Note:
Missing information in JDS: To precisely calculate the storage requirements for JDS, we need to know number of jagged diagonals, which is equal to the maximum number of non-zeros in any row after permutation. Without this information, we cannot provide a definitive answer for the JDS format.
###
Solution:
Click Here to redirect to code implementation.
⚠️ In Code: You could see implementation of thrust quite a bit. Thrust is a powerful parallel algorithms library designed for CUDA and C++ that simplifies and accelerates programming on GPUs (Graphics Processing Units). It provides a high-level interface for performing common parallel operations, making it easier for developers to harness the power of GPU computing without needing to dive into the low-level details of CUDA programming. Below, I’ll explain what Thrust is used for in a clear and structured way.
Click Here to redirect to official documentation of thrust library!
###
Solution:
Click Here to redirect to code implementation.