Multivariate analysis (Japanese: 多変量解析, tahenryō kaiseki)
Multivariate analysis refers to methods for analyzing the correlation structure and causal relationships among multiple variables when multiple observations are obtained from each individual of interest in an experiment or survey. In other words, when multiple variables y1, ..., yp are observed for an individual, it is a general term for the statistical methods that treat the vector y = [y1, ..., yp]' combining those variables as a single observational unit, rather than analyzing each variable separately. Multivariate analysis originated in the late 19th and early 20th centuries with the research on regression and correlation by Galton, F. and Pearson, C.; around the same time, Spearman, C. E. conceived factor analysis. Subsequently, Fisher, R. A., Hotelling, H., Rao, C. R., Anderson, T. W., and others extended univariate statistical theory to the multivariate case, and the basic methods were systematized. Later notable developments include the research on structural equation models by Jöreskog, K. G. and others from the late 1960s onward, and the research by Haruo Yanai on projection matrices as a foundation of multivariate analysis.

Multivariate analysis methods can be broadly classified from various perspectives; one of these is classification by purpose, such as dimension reduction, causal analysis, and classification of individuals. Below, the basic methods are described under the assumption that every variable has mean 0.

[Dimension reduction] Let us attach the subscript i for the individual to y and write the data of individual i as yi = [yi1, ..., yip]'. Principal component analysis (PCA) is the method of finding weight vectors wk = [w1k, ..., wpk]' (k = 1, ..., m) such that the weighted composite scores of the variables,

$$f_{ik} = w_{1k}y_{i1} + \cdots + w_{pk}y_{ip} = \mathbf{w}_k'\mathbf{y}_i,$$

summarize the inter-individual variation of the original p variables as well as possible. For example, if the number of weight vectors m is set to 2 for five-dimensional data with p = 5, the distribution of the individuals in the unobservable five-dimensional space can be visualized approximately with a two-dimensional scatter plot of the scores [fi1, fi2]' based on w1 and w2 (k = 1, 2).
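As a supplement to the text, the dimension-reduction step can be sketched numerically. The following minimal example uses NumPy's singular value decomposition of a column-centered data matrix; the synthetic data and the choice m = 2 are illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 5))          # 100 individuals, p = 5 variables
Y = Y - Y.mean(axis=0)                 # center so every variable has mean 0

# Principal component analysis via SVD: the columns of W are the weight
# vectors w_1, ..., w_m; F holds the composite scores f_ik = w_k' y_i.
m = 2
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
W = Vt[:m].T                           # p x m matrix of weight vectors
F = Y @ W                              # n x m principal component scores

print(F[:3])                           # 2-D coordinates usable in a scatter plot
```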
When a different set of variables xi = [xi1, ..., xiq]' is observed from individual i along with yi, weighted composite scores can be formed for each set of variables,

$$f_{ik} = \mathbf{w}_k'\mathbf{y}_i$$

and

$$g_{ik} = \mathbf{v}_k'\mathbf{x}_i.$$
Canonical correlation analysis is the method of finding the wk and vk that maximize the sum over k of the correlation coefficients between these scores; it is positioned as a method that condenses the correlations between the two sets of variables into scores of a small number (m) of dimensions.
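A minimal sketch of canonical correlation analysis, assuming scikit-learn is available; the two synthetic variable sets and the choice of two canonical dimensions are illustrative, not from the article.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))                                   # first variable set, q = 4
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.5 * rng.normal(size=(n, 3))  # second set, p = 3

# Canonical correlation analysis: find weight vectors v_k (for X) and
# w_k (for Y) whose composite scores are maximally correlated.
cca = CCA(n_components=2)
cca.fit(X, Y)
X_scores, Y_scores = cca.transform(X, Y)

for k in range(2):
    r = np.corrcoef(X_scores[:, k], Y_scores[:, k])[0, 1]
    print(f"canonical correlation {k + 1}: {r:.3f}")
```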

[Causal analysis between observed variables] A path diagram is useful for showing causal and predictive relationships between variables (Figure). In a path diagram, observed variables are shown as boxes, unobserved latent variables are enclosed in circles or ellipses, causal relationships between variables are drawn as one-directional arrows, and correlations as two-directional arrows. As illustrated in path diagram (A), the causal relationship in which each element of yi = [yi1, ..., yip]' is a result and xi = [xi1, ..., xiq]' is the cause is modeled, using the error eij, as

$$y_{ij} = a_{j1}x_{i1} + \cdots + a_{jq}x_{iq} + e_{ij} = \mathbf{a}_j'\mathbf{x}_i + e_{ij}.$$
Here, if the coefficient vectors aj = [aj1, ..., ajq]' are collected as the rows of the p × q matrix A = [a1, ..., ap]' and the error vector is written ei = [ei1, ..., eip]', the above model can be expressed as yi = Axi + ei. The analysis that finds the A minimizing the sum of squared errors under this model is called multivariate regression analysis. If the errors are uncorrelated, this solution coincides with the solutions of the multiple regression analyses that minimize, for each outcome j, the sum of squared errors

$$\sum_i e_{ij}^2 = \sum_i \left(y_{ij} - \mathbf{a}_j'\mathbf{x}_i\right)^2$$

of the model yij = a'jxi + eij.
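The agreement between the joint fit and the outcome-by-outcome fits can be checked numerically. The sketch below uses only NumPy least squares on synthetic data; all variable names and dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, p = 100, 3, 2
X = rng.normal(size=(n, q))                          # predictors x_i (mean 0 assumed)
A_true = rng.normal(size=(p, q))
Y = X @ A_true.T + 0.1 * rng.normal(size=(n, p))     # outcomes y_i = A x_i + e_i

# Multivariate regression: one least-squares fit of all p outcomes at once.
A_multi = np.linalg.lstsq(X, Y, rcond=None)[0].T     # p x q estimate of A

# Separate multiple regressions, one per outcome j; the coefficients
# coincide with the joint fit, as the text notes for uncorrelated errors.
A_sep = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(p)]
).T

print(np.allclose(A_multi, A_sep))                   # True
```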

As illustrated in path diagram (B), the analysis that also reflects causal relationships among the elements of yi = [yi1, ..., yip]' and retains only the paths expressing causal relationships is called path analysis; its model can be written as yi = A[yi, xi] + ei. In path analysis, the analysts themselves specify the model expressing the causal relationships between the variables, in other words, how the arrows in the path diagram are connected, and select the model that fits the data well.
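As one possible illustration (the article does not prescribe an estimation procedure), a simple recursive path model x → y1 → y2, with a direct path x → y2, can be estimated equation by equation with ordinary least squares; the variables and coefficients below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)                                   # exogenous cause
y1 = 0.8 * x + rng.normal(scale=0.5, size=n)             # path x -> y1
y2 = 0.6 * y1 + 0.3 * x + rng.normal(scale=0.5, size=n)  # paths y1 -> y2 and x -> y2

# Each endogenous variable gets its own least-squares regression on the
# variables that point at it in the path diagram.
b_y1 = np.linalg.lstsq(x[:, None], y1, rcond=None)[0]
b_y2 = np.linalg.lstsq(np.column_stack([y1, x]), y2, rcond=None)[0]
print(b_y1, b_y2)                                        # estimated path coefficients
```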

[Factor analysis with latent variables] Replacing xi in the multivariate regression model yi = Axi + ei with unobserved latent variables fi gives yi = Afi + ei, the model of exploratory factor analysis. In contrast, as illustrated in path diagram (C), factor analysis carried out under the assumption that the paths are limited to specific variables, that is, that some elements of A are 0, is called confirmatory factor analysis. Developing this further and also allowing causal relationships among the latent variables, a model that can be written as [yi, fi] = Afi + ei is called a structural equation model (SEM). Many models of factor analysis, path analysis, and SEM are collectively referred to as covariance structure analysis because their solutions are commonly based on covariances. However, SEM includes models for means and is not limited to models of the covariance structure. Independent component analysis, developed since the 1990s, is positioned as a multivariate analysis method that regards the factors as mutually independent signal sources and aims to identify them.
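A minimal sketch of exploratory factor analysis, assuming scikit-learn's FactorAnalysis estimator; the synthetic loadings and the number of factors m = 2 are illustrative assumptions, not from the article.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n, p, m = 300, 6, 2
F_true = rng.normal(size=(n, m))                          # latent factors f_i
A_true = rng.normal(size=(p, m))                          # loading matrix A
Y = F_true @ A_true.T + 0.3 * rng.normal(size=(n, p))     # y_i = A f_i + e_i

# Exploratory factor analysis: estimate loadings and unique (error)
# variances; unlike PCA, each variable has its own error variance.
fa = FactorAnalysis(n_components=m)
scores = fa.fit_transform(Y)                              # estimated factor scores
print(fa.components_.T)                                   # p x m estimated loadings
print(fa.noise_variance_)                                 # unique variances of the errors
```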

[Classification of individuals] Methods for statistically determining the group to which an individual belongs, for example classifying patients into one of three groups (healthy, common cold, or hay fever), are collectively called discriminant analysis. In the most basic two-group linear discriminant analysis, it is assumed that group membership is decided by comparing the linear discriminant score

$$f(\mathbf{x}) = w_1 x_1 + \cdots + w_q x_q = \mathbf{w}'\mathbf{x}$$

with a threshold c. The optimal values of w = [w1, ..., wq]' and c are estimated from the data of individuals whose groups are known; the x of an individual whose group is unknown is then substituted into f(x) to decide which of the two groups the individual belongs to. When the groups to be discriminated are not given in advance, the methods that construct groups, that is, that divide individuals into groups so that similar individuals belong to the same group and dissimilar individuals to different groups, are collectively called cluster analysis.
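A minimal sketch of two-group linear discriminant analysis, assuming scikit-learn; the two synthetic groups, the group labels, and the new individual are invented for the example.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
n = 100
X0 = rng.normal(loc=[0.0, 0.0], size=(n, 2))   # group 0 (e.g. "healthy")
X1 = rng.normal(loc=[2.0, 1.0], size=(n, 2))   # group 1 (e.g. "hay fever")
X = np.vstack([X0, X1])
g = np.array([0] * n + [1] * n)                # known group labels

# Estimate w and the threshold c from individuals with known groups.
lda = LinearDiscriminantAnalysis()
lda.fit(X, g)
w, c = lda.coef_[0], -lda.intercept_[0]        # decision rule: f(x) = w'x compared with c

x_new = np.array([[1.5, 0.8]])                 # individual with unknown group
print(int((x_new @ w)[0] > c), lda.predict(x_new))  # both give the same group decision
```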

[Methods related to principal component analysis] Principal component analysis can also be modeled as yi = Bfi + ei using the vector of weighted composite scores fi = [fi1, ..., fim]'. This looks the same as the exploratory factor analysis model yi = Afi + ei, but the two analyses differ because of different assumptions about the error ei. Principal component analysis extended to data in which the observed values of, for example, the three variables "faculty, sex, and desired occupation" are categories such as "engineering faculty, male, technical occupation" is called multiple correspondence analysis, or quantification method type III, and gives vectors that quantify the categories as its solution. The method of finding coordinate values for categories from distance-like data between categories is called multidimensional scaling.
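A minimal sketch of metric multidimensional scaling, assuming scikit-learn's MDS with a precomputed dissimilarity matrix; the four categories and their dissimilarity values are invented for the example.

```python
import numpy as np
from sklearn.manifold import MDS

# Toy dissimilarity matrix between four categories (purely illustrative values).
D = np.array([[0.0, 1.0, 3.0, 4.0],
              [1.0, 0.0, 2.5, 3.5],
              [3.0, 2.5, 0.0, 1.2],
              [4.0, 3.5, 1.2, 0.0]])

# Multidimensional scaling: recover 2-D coordinates whose pairwise
# distances approximate the given dissimilarities.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
print(coords)                                   # one coordinate pair per category
```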

[Multivariate inferential statistics] Among the theoretical distributions that give the probability density with which a random vector [Y1, ..., Yp]' takes the realized value [y1, ..., yp]', a representative one is the multivariate normal distribution, a generalization of the normal distribution. Based on such theoretical distributions, hypothesis tests and interval estimates for the solutions of the methods described above have been devised. In addition, when xi in the multivariate regression model yi = Axi + ei is a vector whose elements of 1 or 0 indicate the group to which an individual belongs, the columns of A become the mean vectors of the groups, and the methods that, based on this model, test hypotheses such as the equality of mean vectors across groups are called multivariate analysis of variance.
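The claim that the columns of A become the group mean vectors under this dummy coding can be checked directly. The sketch below uses only NumPy and synthetic data with three groups; the group sizes and mean vectors are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n_per, p = 50, 3
groups = np.repeat([0, 1, 2], n_per)                   # three known groups
means = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.5, -0.5],
                  [-1.0, 1.0, 0.5]])                   # true group mean vectors
Y = means[groups] + rng.normal(size=(3 * n_per, p))

# Dummy coding: x_i has a 1 in the column of the individual's group.
X = np.eye(3)[groups]

# Least-squares fit of y_i = A x_i + e_i; the columns of the estimated A
# equal the sample mean vectors of the groups, the quantities compared
# in multivariate analysis of variance.
A = np.linalg.lstsq(X, Y, rcond=None)[0].T             # p x 3
group_means = np.stack([Y[groups == g].mean(axis=0) for g in range(3)], axis=1)
print(np.allclose(A, group_means))                     # True
```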
→Factor analysis →Regression analysis →Categorical data analysis →Cluster analysis →Structural equation model →Principal component analysis →Multidimensional scaling [Kohei Adachi]

Figure: Example of a path diagram
Source: Saishin Shinrigaku Jiten (Latest Psychology Encyclopedia)

