Principal component analysis (PCA) is a dimensionality reduction technique that aims to preserve variability in the data. A variational autoencoder (VAE) (Kingma et al., 2014) is a deep learning approach that can, among other things, generate novel images of handwritten digits and celebrities. So how are these two models related?
To begin, let's explore PCA. We assume we have a dataset $\{x_1, \ldots, x_n\}$, where each $x_i \in \mathbb{R}^p$. In the case of scRNA-seq data, each dimension would correspond to one gene. For simplicity, we assume our data have been mean-centered, i.e., $\frac{1}{n}\sum_{i=1}^n x_i = 0$. PCA learns a sequence of unit vectors $w_1, w_2, \ldots$ such that:
- The vectors are mutually orthogonal.
- The squared distance of the points to the lines in the directions of these vectors is minimized.
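As a minimal numerical sketch of the first property (using NumPy on toy data; the variable names are illustrative), the principal directions can be obtained from an SVD of the centered data matrix, and they come out orthonormal:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # toy data: 100 samples, 5 "genes"
X = X - X.mean(axis=0)          # mean-center each column

# Rows of Vt are the principal directions (unit vectors)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# The directions are mutually orthogonal and unit length
print(np.allclose(Vt @ Vt.T, np.eye(5)))  # True
```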
Let us unpack this. To find the first principal component $w_1$, we solve the following optimization problem:

$$w_1 = \operatorname*{arg\,min}_{\|w\| = 1} \sum_{i=1}^n \left\| x_i - (w^\top x_i)\, w \right\|^2.$$
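This minimization can be checked numerically. The sketch below (NumPy, toy data; `recon_error` is a hypothetical helper name) takes the first right singular vector of the centered data as $w_1$ and verifies that its sum of squared distances to the data is no worse than that of random unit directions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X = X - X.mean(axis=0)

def recon_error(X, w):
    """Sum of squared distances from points to the line spanned by unit vector w."""
    proj = (X @ w)[:, None] * w[None, :]   # project each row onto w
    return np.sum((X - proj) ** 2)

# First principal direction from the SVD
w1 = np.linalg.svd(X, full_matrices=False)[2][0]

# w1 should beat any random unit direction
for _ in range(100):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    assert recon_error(X, w1) <= recon_error(X, w) + 1e-9
```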
Another interpretation of PCA is that it projects the data onto lines (components) such that the variance of the resulting projected points is maximized. To see this, take any unit vector $w$ and decompose each point using the Pythagorean theorem:

$$\|x_i\|^2 = \left\|x_i - (w^\top x_i)\,w\right\|^2 + (w^\top x_i)^2.$$

Summing over $i$, the left-hand side is a constant that does not depend on $w$. Hence minimizing the total squared distance $\sum_i \|x_i - (w^\top x_i)\,w\|^2$ is equivalent to maximizing $\sum_i (w^\top x_i)^2$, which, since the data are mean-centered, is $n$ times the sample variance of the projected points.
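The Pythagorean identity behind this equivalence can be verified directly. A short sketch (NumPy, toy data, an arbitrary unit direction `w`):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
X = X - X.mean(axis=0)

w = rng.normal(size=4)
w /= np.linalg.norm(w)           # any unit direction

scores = X @ w                    # signed lengths of the projections, w^T x_i
residual = X - scores[:, None] * w[None, :]

# Pythagoras, summed over points:
#   sum ||x_i||^2 = (squared distance to the line) + sum (w^T x_i)^2
total = np.sum(X ** 2)
print(np.allclose(total, np.sum(residual ** 2) + np.sum(scores ** 2)))  # True
```

Because the left-hand side is fixed, driving the residual term down necessarily drives the projected-variance term up, which is the claimed equivalence.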