Principal Component Analysis (PCA) is a latent-space visualization technique for linear pattern extraction and dimensionality reduction. It transforms high-dimensional data into a lower-dimensional space while preserving as much variance as possible.
For a two-dimensional example mapping onto a line, you can think of PCA as finding the best line to project the data onto, such that the total squared distance from the points to the line is minimized. That "line" is the largest principal component of the distribution:
In the context of linear algebra, the principal component can be thought of as the eigenvector corresponding to the largest eigenvalue of the data's covariance matrix. This eigenvector points in the direction of maximum variance in the data.
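Stated as an equation, the first principal component $v_1$ is the unit vector satisfying

$$\Sigma \, v_1 = \lambda_1 v_1,$$

where $\Sigma$ is the covariance matrix and $\lambda_1$ is its largest eigenvalue; $\lambda_1$ itself equals the variance of the data along $v_1$.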
Computation
The first step in computing Principal Component Analysis is to find the covariance matrix.
This can be calculated by first centering your data: compute the mean of every variable, then subtract that mean from each data point, as in the formula below.
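In symbols (writing $x_i$ for the $i$-th data point and $\bar{x}$ for the vector of per-variable means, notation chosen here for illustration):

$$\tilde{x}_i = x_i - \bar{x}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$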
With the centered data in hand, the next step is to calculate the covariance matrix:
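$$\Sigma = \frac{1}{n - 1} \tilde{X}^{\top} \tilde{X},$$

where $\tilde{X}$ is the centered data matrix with one data point per row; the $n - 1$ denominator gives the standard unbiased sample estimate.

Putting the pieces together, here is a minimal NumPy sketch of the whole pipeline, from centering through projection; the synthetic dataset and all variable names are illustrative assumptions, not part of the original text:

```python
import numpy as np

# Synthetic data: 200 points in 3 dimensions (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])

# Step 1: center the data by subtracting each variable's mean.
X_centered = X - X.mean(axis=0)

# Step 2: compute the covariance matrix (equivalently, np.cov(X.T)).
n = X_centered.shape[0]
cov = (X_centered.T @ X_centered) / (n - 1)

# Step 3: eigendecomposition. eigh is appropriate for a symmetric
# matrix; it returns eigenvalues in ascending order, so we reverse.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# The first principal component: the direction of maximum variance.
first_pc = eigenvectors[:, 0]

# Dimensionality reduction: project onto the top k components.
k = 2
X_reduced = X_centered @ eigenvectors[:, :k]
print(first_pc, X_reduced.shape)
```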