Principal component analysis

Japanese: 主成分分析 (しゅせいぶんぶんせき); English: principal component analysis

Principal component analysis, abbreviated PCA, is a multivariate analysis method that reduces the variation of many (p) variables to a smaller number (m < p) of components. When PCA is applied to a data matrix X of n individuals × p variables, whose element x_ij is the value of variable j for individual i, it yields the r-th principal component score f_ir (r = 1, ..., m) of individual i and the loading a_jr of variable j on the r-th principal component. For example, we who live in a three-dimensional world cannot draw a scatter plot of five variables, but the main variation of the five variables can be visualized in a scatter plot that uses the first and second principal component scores f_i1 and f_i2 as coordinates, and the relationship between the variables and the principal components can be interpreted from a_jr. In what follows, the data x_ij are mean-deviation scores, that is, scores whose mean across individuals is 0. Formulations of PCA fall broadly into two types.
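As a rough illustration of the quantities named above, the following Python sketch centers a data matrix and extracts the first two principal component scores, which could serve as scatter-plot coordinates for five variables. It assumes NumPy, synthetic data, and the normalization A'A = I; the function name `pca_scores_and_loadings` is hypothetical and not part of the source.

```python
import numpy as np

def pca_scores_and_loadings(X, m):
    """Return (F, A): n x m component scores and p x m loadings (assumes A'A = I)."""
    Xc = X - X.mean(axis=0)                      # mean-deviation scores: column means are 0
    K, lam, Lt = np.linalg.svd(Xc, full_matrices=False)
    F = K[:, :m] * lam[:m]                       # scores f_ir for the first m components
    A = Lt[:m].T                                 # loadings a_jr (orthonormal columns)
    return F, A

# Synthetic data: n = 100 individuals, p = 5 correlated variables (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

F, A = pca_scores_and_loadings(X, m=2)
coords = F[:, :2]                                # (f_i1, f_i2) pairs for a 2-D scatter plot
```

Each row of `coords` is one individual, so passing its two columns to any plotting routine gives the two-dimensional view described in the text.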

One formulation, due to Hotelling, H. (1933), defines PCA as finding the weighted sum of variables that has maximum variance. That is, the sum of the variables x_ij weighted by w_jr,

$$ f_{ir} = \sum_{j=1}^{p} w_{jr} x_{ij}, $$

is the r-th principal component score of individual i. The weights w_j1 of the first principal component score are those that, under the constraint

$$ \sum_{j=1}^{p} w_{j1}^{2} = 1, $$

maximize the variance across individuals of

$$ f_{i1} = \sum_{j=1}^{p} w_{j1} x_{ij}. $$

Therefore, f_i1 is the score that maximizes individual differences. The weights w_jr (r ≥ 2) of the second and subsequent principal component scores are chosen so that the scores are uncorrelated with those of all higher-ranked principal components and

$$ \sum_{j=1}^{p} w_{jr}^{2} = 1 $$

holds; under these conditions, w_jr maximizes the variance of f_ir across individuals.
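The maximum-variance formulation can be checked numerically. The sketch below assumes NumPy and synthetic data, and obtains the weights as eigenvectors of the covariance matrix, which is the standard solution to this constrained maximization rather than something stated in the source. It then compares the variance of the first score with that of many random unit-length weight vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # synthetic correlated data
X = X - X.mean(axis=0)                                  # mean-deviation scores

S = X.T @ X / n                                         # covariance matrix (divisor n)
eigvals, eigvecs = np.linalg.eigh(S)                    # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]                       # reorder to descending
eigvals, W = eigvals[order], eigvecs[:, order]          # column r of W holds the weights w_jr

F = X @ W                                               # f_ir = sum_j w_jr x_ij
score_var = F.var(axis=0)                               # variance of each score across individuals

# Random unit-length weight vectors should never give a larger variance than w_j1.
trial = rng.normal(size=(p, 1000))
trial /= np.linalg.norm(trial, axis=0)
trial_var = (X @ trial).var(axis=0)
```

Under these assumptions the score variances equal the eigenvalues of the covariance matrix, and the scores of different components are mutually uncorrelated.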

The other formulation, due to Pearson, K. (1901), regards PCA as approximating the data by a weighted sum of principal component scores. That is, PCA is formulated as finding the scores f_ir and loadings a_jr that minimize the residual sum of squares between the data x_ij and the sum of the scores f_ir weighted by the loadings a_jr,

$$ \sum_{r=1}^{m} f_{ir} a_{jr}, $$

namely

$$ \sum_{i=1}^{n} \sum_{j=1}^{p} \Bigl( x_{ij} - \sum_{r=1}^{m} f_{ir} a_{jr} \Bigr)^{2} = \| X - FA' \|^{2}. $$

Here, F is the n individuals × m components matrix of principal component scores f_ir, and A is the p variables × m components matrix of loadings a_jr. Under the constraint that A'A is the identity matrix, the solution for f_ir coincides with the weighted sum of the first formulation,

$$ f_{ir} = \sum_{j=1}^{p} w_{jr} x_{ij}. $$

In addition, when the x_ij are standard scores, under the constraint that n^{-1}F'F is the identity matrix, the solution for the loading a_jr equals the correlation coefficient between variable j and the r-th principal component score.
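The statement that loadings equal correlations for standardized data can be verified with a short sketch. It assumes NumPy, synthetic data, and a solution built from the singular value decomposition of the standardized matrix, scaled so that n^{-1}F'F is the identity; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 150, 5, 2
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # synthetic correlated data
Z = (X - X.mean(axis=0)) / X.std(axis=0)                # standard scores (divisor n)

K, lam, Lt = np.linalg.svd(Z, full_matrices=False)      # Z = K diag(lam) L'
F = np.sqrt(n) * K[:, :m]                               # scores scaled so that F'F / n = I
A = Lt[:m].T * lam[:m] / np.sqrt(n)                     # matching loadings: F A' is the rank-m fit

# Correlation between variable j and the r-th principal component score.
corr = np.array([[np.corrcoef(Z[:, j], F[:, r])[0, 1] for r in range(m)]
                 for j in range(p)])
```

With this scaling, `A` and `corr` agree entry by entry, which is what the constraint on F buys: the loadings can be read directly as variable-component correlations.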

What matters for the latter formulation is that any matrix can be decomposed into a product of three matrices, KΛL', which is called the singular value decomposition (SVD). Here, K'K and L'L are identity matrices, and Λ is a diagonal matrix whose diagonal elements are in descending order. If the SVD of the data matrix X is written X = KΛL', then, as Eckart, C. and Young, G. (1936) proved, the FA' that minimizes the residual sum of squares ∥X − FA'∥^2 is K_m Λ_m L_m'. Here, Λ_m is the m × m diagonal matrix formed by the upper-left elements of Λ, and K_m and L_m consist of the first m columns of K and L, respectively. Moreover, since X = KΛL' and K'K is the identity matrix, n^{-1}X'X = L(n^{-1}Λ^2)L', so the diagonal elements of n^{-1}Λ^2 are the eigenvalues of the covariance matrix.
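The two SVD facts in this paragraph can be illustrated as follows. The sketch assumes NumPy and synthetic centered data. It forms the rank-m product K_m Λ_m L_m', compares its residual sum of squares with the sum of the squared discarded singular values, and compares the eigenvalues of n^{-1}X'X with the diagonal of n^{-1}Λ^2.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 60, 6, 2
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)                                  # mean-deviation scores

K, lam, Lt = np.linalg.svd(X, full_matrices=False)      # X = K diag(lam) L', lam descending
X_m = K[:, :m] @ np.diag(lam[:m]) @ Lt[:m]              # K_m Lambda_m L_m'

residual = np.sum((X - X_m) ** 2)                       # ||X - FA'||^2 at the optimum
discarded = np.sum(lam[m:] ** 2)                        # squared singular values left out

# Eigenvalues of the covariance matrix, sorted in descending order.
cov_eigvals = np.sort(np.linalg.eigvalsh(X.T @ X / n))[::-1]
```

Any other n × m matrix F and p × m matrix A gives a residual at least as large as `residual`, which is the content of the Eckart and Young result.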

An extension of PCA is three-way principal component analysis, which was proposed by Tucker, L. R. (1966) and established after 1980, mainly by Dutch psychometricians. It is a form of PCA that finds components corresponding to each mode from data arranged in a three-way array. An example of a three-way array is the examinees × conditions × tests array of scores obtained when multiple examinees take multiple tests under multiple conditions; here, examinees, conditions, and tests are the modes. → Factor analysis → Multivariate analysis [Adachi Kohei]
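To make the idea of one component matrix per mode concrete, here is a minimal sketch that uses a truncated higher-order SVD as a simple, non-iterative stand-in. This is an assumption made for illustration: it is not claimed to be the procedure of the cited work, and it is not guaranteed to be the least-squares optimal fit. NumPy, the synthetic array, and the function names `unfold`, `hosvd`, and `reconstruct` are all hypothetical choices.

```python
import numpy as np

def unfold(T, mode):
    """Matricize a three-way array so that `mode` indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated higher-order SVD: one component matrix per mode plus a core array."""
    comps = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        comps.append(U[:, :r])                          # leading r components of this mode
    A, B, C = comps
    core = np.einsum('ijk,ia,jb,kc->abc', T, A, B, C)   # project the data onto the components
    return core, comps

def reconstruct(core, comps):
    """Rebuild the approximation of the array from the core and component matrices."""
    A, B, C = comps
    return np.einsum('abc,ia,jb,kc->ijk', core, A, B, C)

rng = np.random.default_rng(4)
T = rng.normal(size=(8, 3, 4))                          # examinees x conditions x tests (synthetic)
core, comps = hosvd(T, ranks=(2, 2, 2))
T_hat = reconstruct(core, comps)                        # reduced-rank approximation of T
core_full, comps_full = hosvd(T, ranks=T.shape)         # no truncation: reproduces T exactly
```

The three matrices in `comps` play the role of the components for examinees, conditions, and tests, while `core` describes how the components of the different modes combine.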

Source: Latest Psychology Encyclopedia (最新 心理学事典)

