Principal component analysis - Principal component analysis

Japanese: 主成分分析 (しゅせいぶんぶんせき); English: principal component analysis
Abbreviated PCA. A multivariate analysis method for condensing the variation in a large number ($p$) of variables into a smaller number ($m < p$) of components. Applying PCA to an $n$ individuals × $p$ variables data matrix $X$, whose element $x_{ij}$ is the datum for variable $j$ on individual $i$, yields the $r$-th principal component score $f_{ir}$ ($r = 1, \ldots, m$) of each individual $i$ and the loading $a_{jr}$ of each variable $j$ on the $r$-th principal component. For example, we who live in a three-dimensional world cannot draw a scatter plot of five variables, but the main variation of the five variables can be visualized in a scatter plot that uses the first and second principal component scores $f_{i1}$ and $f_{i2}$ as coordinates, and the relationship between the variables and the principal components can be interpreted from $a_{jr}$. Hereafter, each $x_{ij}$ is taken to be a mean-deviation score, so that its mean over individuals $i$ is 0. Formulations of PCA fall broadly into two types.
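As a minimal sketch of the visualization idea in the paragraph above, the following NumPy code centers a hypothetical 100 × 5 random data matrix into mean-deviation scores and computes the first two principal component scores from the eigenvectors of the covariance matrix. Using the eigendecomposition here is an assumption about how to compute the scores, not something this entry specifies, and the function name `pc_scores` is illustrative.

```python
import numpy as np

def pc_scores(X, m=2):
    """Return the centered data and the first m principal component scores.

    X is an n individuals x p variables array. The weights are taken from the
    eigenvectors of the covariance matrix (an assumed, standard computation).
    """
    Xc = X - X.mean(axis=0)                 # mean-deviation scores x_ij
    cov = Xc.T @ Xc / Xc.shape[0]           # covariance matrix n^{-1} X'X
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort in descending order
    W = eigvecs[:, order[:m]]               # weights of the top m components
    return Xc, Xc @ W                       # scores f_ir, one column per component

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # hypothetical 5-variable data
Xc, F = pc_scores(X, m=2)
print(F.shape)  # (100, 2)
```

The two columns of `F` could then be passed to any plotting library as the horizontal and vertical coordinates of the scatter plot.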

The first, due to Hotelling, H. (1933), formulates PCA as finding the weighted sum of variables with maximum variance. That is, the sum of the variables $x_{ij}$ weighted by $w_{jr}$,

$$f_{ir} = \sum_{j=1}^{p} w_{jr} x_{ij},$$

is the $r$-th principal component score of individual $i$. Here, the weight $w_{j1}$ of the first principal component score is the $w_{j1}$ that, under the condition

$$\sum_{j=1}^{p} w_{j1}^{2} = 1,$$

maximizes the variance over individuals of

$$f_{i1} = \sum_{j=1}^{p} w_{j1} x_{ij}$$

(without such a condition, the variance could be made arbitrarily large simply by scaling up the weights). Therefore, $f_{i1}$ is the score that maximizes individual differences. Furthermore, the weight $w_{jr}$ ($r \geq 2$) of each subsequent principal component score is the $w_{jr}$ that maximizes the variance of $f_{ir}$ over individuals under the conditions that $f_{ir}$ is uncorrelated with all preceding principal component scores and that

$$\sum_{j=1}^{p} w_{jr}^{2} = 1.$$
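Hotelling's successive maximization can be illustrated with power iteration plus deflation, a standard numerical technique swapped in here for illustration (the entry does not prescribe an algorithm). This sketch assumes NumPy, already-centered data, and a hypothetical data set whose four variables have clearly different spreads so that the iteration converges quickly; `hotelling_weights` is a hypothetical name. Normalizing `w` after each step enforces the unit sum-of-squares condition, and removing each found direction from the covariance matrix makes the next score uncorrelated with the earlier ones.

```python
import numpy as np

def hotelling_weights(X, m, n_iter=500, seed=0):
    """Successively find unit-norm weights w_r that maximize Var(X @ w_r).

    X must already be centered (mean-deviation scores). Power iteration on
    the covariance matrix finds each maximizer; deflating the matrix after
    each step makes the next score uncorrelated with the earlier ones.
    """
    n, p = X.shape
    C = X.T @ X / n                       # covariance matrix n^{-1} X'X
    rng = np.random.default_rng(seed)
    W = np.zeros((p, m))
    for r in range(m):
        w = rng.normal(size=p)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            w = C @ w
            w /= np.linalg.norm(w)        # keep sum_j w_jr^2 = 1
        W[:, r] = w
        var_r = w @ C @ w                 # variance of f_ir at the maximum
        C = C - var_r * np.outer(w, w)    # remove this direction before the next one
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) * np.array([3.0, 2.0, 1.0, 0.5])  # hypothetical data
X -= X.mean(axis=0)                       # mean-deviation scores
W = hotelling_weights(X, m=2)
F = X @ W                                 # scores f_i1 and f_i2
```

Up to sign, the columns of `W` should agree with the leading eigenvectors returned by `np.linalg.eigh(X.T @ X / len(X))`.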

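Another formulation, due to Pearson, K. (1901), regards PCA as approximating the data by a weighted sum of principal component scores. That is, PCA is formulated as finding the $F$ and $A$ that minimize the residual sum of squares between the data $x_{ij}$ and the sum of the scores $f_{ir}$ weighted by the loadings $a_{jr}$,

$$\sum_{r=1}^{m} a_{jr} f_{ir},$$

namely

$$\sum_{i=1}^{n} \sum_{j=1}^{p} \Bigl( x_{ij} - \sum_{r=1}^{m} a_{jr} f_{ir} \Bigr)^{2} = \lVert X - FA' \rVert^{2}.$$

Here, $F$ is the $n$ individuals × $m$ components principal component score matrix with elements $f_{ir}$, and $A$ is the $p$ variables × $m$ components loading matrix with elements $a_{jr}$. Under the constraint that $A'A$ is the identity matrix, the solution for $f_{ir}$ is

$$f_{ir} = \sum_{j=1}^{p} a_{jr} x_{ij},$$

which has the same form as Hotelling's weighted sum above, with the loadings $a_{jr}$ serving as the weights.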
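The statement about the solution for $f_{ir}$ can be checked numerically. For a fixed loading matrix with $A'A = I$, the least-squares scores are $F = XA$, and any other $F$ increases the residual by exactly $\lVert F - XA \rVert^{2}$. The sketch below assumes NumPy and uses a hypothetical random data set and a random orthonormal $A$ obtained from a QR factorization; `rss` is an illustrative helper name.

```python
import numpy as np

def rss(X, F, A):
    """Residual sum of squares ||X - F A'||^2."""
    return float(np.sum((X - F @ A.T) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])  # hypothetical data
X -= X.mean(axis=0)                       # mean-deviation scores

# Hypothetical loading matrix with orthonormal columns, so A'A = I.
A, _ = np.linalg.qr(rng.normal(size=(4, 2)))
F_opt = X @ A                             # least-squares scores for this A

# Perturbing the scores can only increase the residual.
F_other = F_opt + 0.1 * rng.normal(size=F_opt.shape)
print(rss(X, F_opt, A) <= rss(X, F_other, A))  # True
```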



In addition, when each $x_{ij}$ is a standard score, under the constraint that $n^{-1}F'F$ is the identity matrix, the solution for the loading $a_{jr}$ equals the correlation coefficient between variable $j$ and the $r$-th principal component score.
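A sketch of the standardized case, assuming NumPy and hypothetical correlated data: the scores are scaled so that $n^{-1}F'F = I$, the loadings are chosen so that $FA'$ equals the rank-$m$ SVD approximation of the standardized data, and each loading is then compared with the correlation between the corresponding variable and score. Taking the scores and loadings from the SVD is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(80, 4)) @ rng.normal(size=(4, 4))  # hypothetical correlated data
n = raw.shape[0]
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # standard scores (population SD)

K, s, Lt = np.linalg.svd(Z, full_matrices=False)  # Z = K diag(s) L'
m = 2
F = np.sqrt(n) * K[:, :m]                 # scores satisfying n^{-1} F'F = I
A = Lt[:m].T * s[:m] / np.sqrt(n)         # loadings, so that F A' = K_m Λ_m L_m'

# Correlation between each variable j and each score r.
R = np.array([[np.corrcoef(Z[:, j], F[:, r])[0, 1] for r in range(m)]
              for j in range(Z.shape[1])])
print(np.allclose(A, R))  # True
```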

What makes the latter formulation important is that any matrix can be decomposed into a product of three matrices, $K\Lambda L'$, which is called the singular value decomposition (SVD). Here, $K'K$ and $L'L$ are identity matrices, and $\Lambda$ is a diagonal matrix whose nonnegative diagonal elements (the singular values) are in descending order. Writing the SVD of the data matrix as $X = K\Lambda L'$, Eckart, C. and Young, G. (1936) proved that the $FA'$ minimizing the residual sum of squares $\lVert X - FA' \rVert^{2}$ is $K_m \Lambda_m L_m'$. Here, $\Lambda_m$ is the $m \times m$ diagonal matrix formed from the upper-left elements of $\Lambda$, and $K_m$ and $L_m$ consist of the first $m$ columns of $K$ and $L$, respectively. Note that $X = K\Lambda L'$ implies $n^{-1}X'X = L(n^{-1}\Lambda^{2})L'$, so the diagonal elements of $n^{-1}\Lambda^{2}$ are the eigenvalues of the covariance matrix.
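The Eckart and Young result and the eigenvalue relation can be verified numerically with NumPy's SVD. In this sketch (hypothetical random data, $m = 2$), `np.linalg.svd` returns $K$, the singular values on the diagonal of $\Lambda$, and $L'$; the residual of the rank-$m$ fit should equal the sum of the squared discarded singular values, and the eigenvalues of $n^{-1}X'X$ should equal the squared singular values divided by $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))               # hypothetical data
X -= X.mean(axis=0)                        # centered data matrix
n = X.shape[0]

K, s, Lt = np.linalg.svd(X, full_matrices=False)  # X = K Λ L', s in descending order
m = 2
X_m = K[:, :m] @ np.diag(s[:m]) @ Lt[:m]   # K_m Λ_m L_m'

# Residual of the best rank-m fit equals the sum of the discarded s^2.
resid = np.sum((X - X_m) ** 2)
print(np.isclose(resid, np.sum(s[m:] ** 2)))   # True

# Eigenvalues of the covariance matrix n^{-1} X'X are s^2 / n.
eig = np.linalg.eigvalsh(X.T @ X / n)[::-1]    # reversed to descending order
print(np.allclose(eig, s ** 2 / n))            # True
```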

An extension of PCA is three-way principal component analysis, proposed by Tucker, L.R. (1966) and established from 1980 onward, mainly by Dutch psychometricians. It is a PCA that finds components corresponding to each mode of data arranged in a three-way array. An example of a three-way array is the examinees × conditions × tests array of scores obtained when multiple examinees take multiple tests under multiple conditions; in this case, the examinees, conditions, and tests are the modes. →Factor analysis →Multivariate analysis [Adachi Kohei]
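To make the idea of one set of components per mode concrete, here is a minimal NumPy sketch of a Tucker-type three-way decomposition computed by the higher-order SVD (HOSVD): each mode's component matrix is taken from the leading left singular vectors of the array unfolded along that mode, and a small core array links the three modes. The HOSVD is swapped in as a simple non-iterative illustration and is not presented as the specific algorithm of Tucker (1966) or of the later Dutch work; the array sizes, the numbers of components, and the names `unfold`, `tucker_hosvd`, and `reconstruct` are all hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Matricize a three-way array so that rows index the chosen mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_hosvd(T, ranks):
    """Non-iterative Tucker-type sketch: one component matrix per mode.

    Each component matrix holds the leading left singular vectors of the
    corresponding unfolding; the core array G is T projected onto them.
    """
    A, B, C = [np.linalg.svd(unfold(T, k), full_matrices=False)[0][:, :r]
               for k, r in enumerate(ranks)]
    G = np.einsum('ijk,ia,jb,kc->abc', T, A, B, C)  # core array
    return G, A, B, C

def reconstruct(G, A, B, C):
    """Rebuild the array from the core and the three component matrices."""
    return np.einsum('abc,ia,jb,kc->ijk', G, A, B, C)

rng = np.random.default_rng(0)
# Hypothetical examinees x conditions x tests scores (no preprocessing applied).
T = rng.normal(size=(20, 3, 4))
G, A, B, C = tucker_hosvd(T, ranks=(2, 2, 2))
print(G.shape)  # (2, 2, 2)
```

With as many components as each unfolding's rank, for example `ranks=(12, 3, 4)` here, `reconstruct` returns the original array exactly; smaller ranks give a compressed approximation.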

Source: Latest Psychology Encyclopedia (最新 心理学事典)

