![]() Supplementary qualitative variables (green): Column 13 corresponding to the two athlete-tic meetings (2004 Olympic Game or 2004 Decastar).Supplementary continuous variables (red): Columns 11 and 12 corresponding respectively to the rank and the points of athletes.Supplementary variables: As supplementary individuals, the coordinates of these variables will be predicted also.Active variables (in pink, columns 1:10) : Variables that are used for the principal component analysis.Supplementary individuals (in dark blue, rows 24:27) : The coordinates of these individuals will be predicted using the PCA information and parameters obtained with active individuals/variables.Active individuals (in light blue, rows 1:23) : Individuals that are used during the principal component analysis.Due to this redundancy, PCA can be used to reduce the original variables into a smaller number of new variables ( = principal components) explaining most of the variance in the original variables. This is a categorical (or factor) variable factor. Head(decathlon2.active, 4) # X100m Long.jump Shot.put High.jump X400m X110m.hurdle We start by subsetting active individuals and active variables for the principal component analysis: decathlon2.active <- decathlon2 It can be used to color individuals by groups. In principal component analysis, variables are often scaled (i.e. standardized). This is particularly recommended when variables are measured in different scales (e.g: kilograms, kilometers, centimeters, …) otherwise, the PCA outputs obtained will be severely affected. The goal is to make the variables comparable. Generally variables are scaled to have i) standard deviation one and ii) mean zero. The standardization of data is an approach widely used in the context of gene expression data analysis before PCA and clustering analysis. When scaling variables, the data can be transformed as follow: We might also want to scale the data when the mean and/or the standard deviation of variables are largely different. Where \(mean(x)\) is the mean of x values, and \(sd(x)\) is the standard deviation (SD). The R base function `scale() can be used to standardize the data. It takes a numeric matrix as an input and performs the scaling on the columns. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |