# 2. Matrix Operations

February 10, 202

## Basic Linear Algebra Subroutines (BLAS)

Basic linear algebra subroutines are a standard library interface for manipulating dense vectors and matrices. There are three levels of BLAS routines.

- Level 1, vector-vector: $O(n)$ flops, $O(n)$ data.
  - e.g., the dot product $x \cdot y = \alpha$
- Level 2, matrix-vector: $O(n^2)$ flops, $O(n^2)$ data.
  - e.g., $Ax = y$
- Level 3, matrix-matrix: $O(n^3)$ flops, $O(n^2)$ data.
  - e.g., $AB = C$

## Locality and memory

Matrices are usually stored as contiguous arrays, in **column major order**. The two costs of operations are

- arithmetic operations (**flops**)
- time spent fetching data (communication)

The cost of communication is much higher than the cost of a floating point operation.

💡 A fetch from main memory takes on the order of 100 cycles.

Memories are optimized for

- spatial locality
  - Accessing data that is stored close together (e.g., sequentially) is fast.
- temporal locality
  - Data accessed recently is fast to access again, because it is still in cache.

## Examples of matrix-vector products

Here is a basic implementation of matrix-vector multiplication.

```julia
function matvec_row(A, x)
    m, n = size(A)
    y = zeros(eltype(x), m)
    for i = 1:m        # for each row of A
        for j = 1:n    # walk across row i
            y[i] += A[i, j] * x[j]
        end
    end
    return y
end
```

This has temporal locality in $y$: the entry $y[i]$ is reused across the entire inner loop.

It also has spatial locality in $x$, which is read sequentially.

However, this is slower than iterating over $j$ then $i$! The column-wise loop order walks $A$ down its columns, which are contiguous in column-major storage, so it has spatial locality in $A$ as well.
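As a sketch, the column-oriented version just swaps the two loops (the name `matvec_col` is ours, not from the notes):

```julia
# Column-oriented matrix-vector product: the inner loop walks down
# column j of A, which is contiguous in Julia's column-major storage.
function matvec_col(A, x)
    m, n = size(A)
    y = zeros(eltype(x), m)
    for j = 1:n        # for each column of A
        for i = 1:m    # walk down column j
            y[i] += A[i, j] * x[j]
        end
    end
    return y
end
```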

### Matrix multiplication

Notice that a BLAS3 operation performs $O(n^3)$ flops on only $O(n^2)$ data, so each entry can participate in $O(n)$ operations. This creates a big opportunity for efficiency.

One way we can try to compute this is by the **inner product** formulation. Suppose we want to compute $C = AB$. Viewing $A$ as an $m \times 1$ block matrix of rows and $B$ as a $1 \times n$ block matrix of columns, each entry is the inner product $c_{ij} = \sum_k a_{ik} b_{kj}$.

We can also try computing an **outer product** formulation: $C = \sum_k a_k b_k^T$, where $a_k$ is column $k$ of $A$ and $b_k^T$ is row $k$ of $B$.
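A minimal sketch of the outer-product formulation (the name `matmul_outer` is ours, not from the notes):

```julia
# Outer-product matrix multiply: C = Σ_k (column k of A)(row k of B).
# Each pass of the outer loop adds one rank-1 update to C.
function matmul_outer(A, B)
    m, r = size(A)
    r2, n = size(B)
    @assert r == r2
    C = zeros(eltype(A), m, n)
    for k = 1:r
        for j = 1:n
            for i = 1:m
                C[i, j] += A[i, k] * B[k, j]
            end
        end
    end
    return C
end
```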

In practice, these can be made much faster by blocking.

## Blocking

Blocking is a way to increase temporal locality by partitioning matrices into smaller submatrices (blocks) that fit in cache. Each block is loaded once and reused many times before it is evicted.
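As a sketch of the idea (the block size `b` and the name `matmul_blocked` are illustrative choices, not from the notes):

```julia
# Blocked matrix multiply: partition A, B, C into b×b tiles so the
# working set of each tile-product fits in cache and gets reused.
function matmul_blocked(A, B; b = 64)
    m, r = size(A)
    _, n = size(B)
    C = zeros(eltype(A), m, n)
    for jj = 1:b:n, kk = 1:b:r, ii = 1:b:m
        # multiply the (ii, kk) tile of A by the (kk, jj) tile of B
        for j = jj:min(jj + b - 1, n)
            for k = kk:min(kk + b - 1, r)
                for i = ii:min(ii + b - 1, m)
                    C[i, j] += A[i, k] * B[k, j]
                end
            end
        end
    end
    return C
end
```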

## Structured matrices

There are a few important classes of structured matrices that we should know:

- Diagonal
- Tridiagonal
- Permutation
  - exactly one nonzero (a 1) per row and column
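As a sketch of why structure pays off, both of these can be stored and applied in $O(n)$ rather than $O(n^2)$ (the helper names below are ours):

```julia
# A diagonal matrix can be stored as just its diagonal d; the product
# D*x is then an elementwise scaling, O(n) flops instead of O(n^2).
diagmul(d, x) = d .* x

# A permutation matrix can be stored as an index vector p; applying it
# is an O(n) reindexing, with no arithmetic at all.
permmul(p, x) = x[p]
```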