Principal Component Analysis
============================

.. contents::
  :local:
  :depth: 2

Introduction
------------

Principal component analysis is one technique used to take a large set
of interconnected variables and derive a smaller set of new variables
that best represent them for a model. This process of focusing on only
a few variables is called **dimensionality reduction**, and it helps
reduce the complexity of our dataset. At its root, principal component
analysis *summarizes* data.

.. figure:: _img/pca4.png

   Ref: https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues

Motivation
----------

Principal component analysis is extremely useful for deriving an
overall, linearly independent trend from a dataset with many variables.
It allows you to extract important relationships out of variables that
may or may not be related. Another application of principal component
analysis is for display - instead of plotting many different variables,
you can reduce them to just a few principal components and plot those.

Dimensionality Reduction
------------------------

There are two types of dimensionality reduction: feature elimination
and feature extraction.

**Feature elimination** simply involves pruning the features we deem
unnecessary from a dataset. A downside of feature elimination is that
we lose any information the dropped features might have contributed.

**Feature extraction**, however, creates new variables by combining
existing features. At the cost of some simplicity or interpretability,
feature extraction allows you to maintain all important information
held within features.

Principal component analysis deals with feature extraction (rather than
elimination) by creating a set of independent variables called principal
components.

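To make the contrast concrete, here is a minimal sketch of both
approaches on a toy dataset. The data and variable names below are our
own illustration and are not part of the example code referenced later
on this page:

.. code:: python

   import numpy as np
   from sklearn.decomposition import PCA

   # Toy dataset: three correlated features for 100 samples.
   rng = np.random.default_rng(0)
   base = rng.normal(size=(100, 1))
   features = np.hstack([base,
                         2 * base + rng.normal(scale=0.1, size=(100, 1)),
                         rng.normal(size=(100, 1))])

   # Feature elimination: drop a column outright (its information is lost).
   eliminated = features[:, :2]

   # Feature extraction: PCA combines all three columns into two new ones.
   extracted = PCA(n_components=2).fit_transform(features)
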
PCA Example
-----------

Principal component analysis is performed by considering all of our
variables and calculating a set of direction and magnitude pairs
(vectors) to represent them. For example, let's consider the small
dataset plotted below:

.. figure:: _img/pca1.png

   Ref: https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c

Here we can see two of these vectors, represented by the red and green
lines. In this scenario, the red vector has the greater magnitude
because the points are spread across a much larger distance along that
direction than along the green one. Principal component analysis will
use the vector with the greater magnitude to transform the data into a
smaller feature space, reducing dimensionality. For example, the above
graph would be transformed into the following:

.. figure:: _img/pca2.png

   Ref: https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c

By transforming our data in this way, we've ignored a feature that is
less important to our model - that is, variation along the red
dimension will have a greater impact on our results than variation
along the green.

The mathematics behind principal component analysis are left out of
this discussion for brevity, but if you're interested in learning
about them we highly recommend visiting the references listed at the
bottom of this page.

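That said, the core of the computation is compact enough to sketch in a
few lines of NumPy. The data below is our own made-up example, not part
of the referenced code: the principal directions are the eigenvectors
of the covariance matrix, and reducing dimensionality amounts to
projecting onto the strongest of them.

.. code:: python

   import numpy as np

   # A 2D point cloud whose variance lies mostly along one direction.
   rng = np.random.default_rng(0)
   points = rng.normal(size=(100, 2)) @ np.array([[2.0, 0.5], [0.5, 0.5]])

   # Center the data and take the eigenvectors of its covariance matrix;
   # these are the principal directions (like the red and green vectors).
   centered = points - points.mean(axis=0)
   eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
   strongest = eigenvectors[:, np.argmax(eigenvalues)]

   # Project every point onto the strongest direction: 2D becomes 1D.
   reduced = centered @ strongest
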
Number of Components
--------------------

In the example above, we took a two-dimensional feature space and
reduced it to a single dimension. In most scenarios though, you will
be working with far more than two variables. Principal component
analysis can be used to remove just a single feature, but it is often
useful to reduce by several. There are a few strategies you can employ
to decide how many feature reductions to perform:

1. **Arbitrarily**

   This simply involves picking a number of features to keep for your
   given model. This method is highly dependent on your dataset and
   what you want to convey. For instance, it may be beneficial to
   represent your higher-order data on a 2D space for visualization.
   In this case, you would perform feature reduction until you have
   two features.

2. **Percent of cumulative variability**

   Each principal component explains a proportion of the dataset's
   total variance, and the cumulative proportion approaches 1 as more
   components are kept. This method of choosing the number of feature
   reduction steps involves selecting a target cumulative variance
   percentage. For instance, let's look at a graph of cumulative
   variance at each level of PCA for a theoretical dataset:

   .. figure:: _img/pca3.png

      Ref: https://www.centerspace.net/clustering-analysis-part-i-principal-component-analysis-pca

   The above image is called a scree plot, and is a representation
   of the cumulative and individual proportion of variance for each
   principal component. If we wanted at least 80% cumulative variance,
   we would use at least 6 principal components based on this scree
   plot (a code sketch after this list shows how to do this with
   scikit-learn). Aiming for 100% variance is not generally
   recommended; reaching it with fewer components than original
   features simply means your dataset contains redundant data.

3. **Percent of individual variability**

   Instead of adding principal components until we reach a cumulative
   percent of variability, we can instead add components only while
   each new one would still contribute a meaningful amount of
   variability. In the plot above, we might choose to use 3 principal
   components, since the variability contributed by each component
   drops off sharply after that point.

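As a concrete illustration of strategies 2 and 3, the sketch below
shows how scikit-learn exposes the individual and cumulative explained
variance, and how a target percentage can be passed straight to PCA.
The data here is a made-up stand-in for any feature matrix:

.. code:: python

   import numpy as np
   from sklearn.decomposition import PCA

   # Stand-in feature matrix: 200 samples, 10 correlated features.
   rng = np.random.default_rng(0)
   data = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))

   # Fit with all components to inspect the variance profile (scree plot data).
   pca_full = PCA().fit(data)
   individual = pca_full.explained_variance_ratio_             # strategy 3
   cumulative = np.cumsum(pca_full.explained_variance_ratio_)  # strategy 2

   # Or let scikit-learn keep just enough components to reach at least
   # 80% cumulative variance.
   pca_80 = PCA(n_components=0.80).fit(data)
   print(pca_80.n_components_, cumulative)
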
Conclusion
----------

Principal component analysis is a technique for summarizing data, and
it is highly flexible depending on your use case. It can be valuable
for both displaying and analyzing a large number of possibly dependent
variables. Approaches to performing principal component analysis range
from arbitrarily choosing how many components to keep, to keeping
components until a target variance is reached.

Code Example
------------

Our example code, `pca.py`_, shows you how to perform principal component
analysis on a dataset of random x, y pairs. The script goes through a
short process of generating this data and then calls sklearn's PCA class.

.. _pca.py: https://github.com/machinelearningmindset/machine-learning-course/blob/master/code/unsupervised/PCA/pca.py

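For context, the data-generation step might look roughly like the
sketch below; the exact distribution and parameters in the real script
may differ, but the goal is a set of correlated x, y pairs stored in a
points array:

.. code:: python

   import numpy as np

   # Generate correlated (x, y) pairs so one direction carries most of
   # the variance.
   rng = np.random.default_rng(0)
   x = rng.normal(size=200)
   y = 0.5 * x + rng.normal(scale=0.3, size=200)
   points = np.column_stack((x, y))

With the points in hand, the script then fits a two-component PCA:
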
.. code:: python

   from sklearn.decomposition import PCA

   # Find two principal components from our given dataset
   pca = PCA(n_components=2)
   pca.fit(points)

Each step in the process includes helpful visualizations using
matplotlib. For instance, the principal components fitted above are
plotted as two vectors on the dataset:

.. figure:: _img/pca5.png

The script also shows how to perform dimensionality reduction, discussed
above. In sklearn, this is done by simply calling the transform method
once a PCA is fitted, or doing both steps at the same time with
fit_transform:

.. code:: python

   # Reduce the dimensionality of our data using a PCA transformation
   pca = PCA(n_components=1)
   transformed_points = pca.fit_transform(points)

The end result of our transformation is just a single value per point
(its coordinate along the first principal component), though the code
example performs an inverse transformation for plotting the result in
the following graph:

.. figure:: _img/pca6.png

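The inverse step used for that plot might look like this - a minimal
sketch, reusing the pca and transformed_points names from the block
above:

.. code:: python

   # Map the 1D coordinates back into the original 2D space; the
   # reconstructed points all lie along the first principal component.
   reconstructed_points = pca.inverse_transform(transformed_points)
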
References
----------

1. http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
2. https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
3. https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60
4. https://en.wikipedia.org/wiki/Principal_component_analysis
5. https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
6. https://www.centerspace.net/clustering-analysis-part-i-principal-component-analysis-pca