Cluster analysis (Japanese: クラスター分析, kurasutā bunseki; English: cluster analysis, clustering)
This refers to a family of methods for classifying surveyed items (variables), individuals, organizations (cases), etc., using statistical information, when they are thought to consist of heterogeneous groups or populations. When classifying individuals, a grouping (classification) is sought, based on the p-variable data xi = [xi1, ..., xip]′ of each individual i (i = 1, ..., n) or on similarity data between individuals, such that similar individuals belong to the same group (cluster) and dissimilar ones belong to different groups. In what follows, when a set of variables is being classified, "individual i" can be read as "variable i". Cluster analysis is the general name for such statistical methods, which are broadly divided into hierarchical and non-hierarchical cluster analysis.

The figure illustrates the principle of hierarchical clustering. The dendrogram on the right, which is the result of analyzing the data x1 = [4, 1]′, x2 = [1, 5]′, x3 = [5, 4]′, x4 = [1, 3]′, and x5 = [5, 1]′ scattered as in the plot on the left, is obtained through the following three steps. (1) Compute the distances between the five points in the scatter plot and merge the closest pair, x1 and x5, into one group C1; this merger appears as the junction C1 in the dendrogram on the right. (2) Take as the representative point of group C1 the centroid of its member points, c1 = 0.5(x1 + x5) = [4.5, 1]′, compute the distances among c1, x2, x3, and x4, and merge the closest pair, x2 and x4, into group C2; this appears as the junction C2 on the right. (3) Compute the distances among C2's representative point c2 = 0.5(x2 + x4), c1, and x3; since x3 and c1 are the closest, merge x3 into C1. This merger appears as the junction C3 on the right.
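The three merge steps above can be sketched as a small centroid-method routine (a minimal illustration written for this article, not taken from the source; the coordinates are those of the example, and all variable names are hypothetical):

```python
import math

# The five data points from the example in the figure.
points = {
    "x1": (4, 1), "x2": (1, 5), "x3": (5, 4), "x4": (1, 3), "x5": (5, 1),
}

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def centroid(pts):
    """Centre of gravity of a list of points."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

# Start with each individual in its own cluster; repeatedly merge the pair
# of clusters whose centroids are closest (the centroid method).
clusters = {name: [p] for name, p in points.items()}
merges = []
while len(clusters) > 1:
    names = list(clusters)
    a, b = min(
        ((p, q) for i, p in enumerate(names) for q in names[i + 1:]),
        key=lambda pq: dist(centroid(clusters[pq[0]]), centroid(clusters[pq[1]])),
    )
    clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    merges.append(a + "+" + b)

print(merges)
```

Running this reproduces the order of the article's three steps: x1 and x5 merge first, then x2 and x4, then x3 joins the {x1, x5} cluster; a fourth, final merge completes the dendrogram.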

Hierarchical analysis is subdivided into several sub-methods according to how steps (2) and (3) above are carried out. The method illustrated by the figure above is called the centroid method; it is characterized by using centroids to compute the distances between a group and an individual and between groups. Other definitions of the inter-group distance include the group average method, which uses the average of the squared distances between individuals belonging to the two groups; the nearest neighbor method, which uses the shortest such distance; the furthest neighbor method, which uses the longest; and Ward's method, which defines the distance between groups A and B as the within-group inter-individual distance of the group obtained by merging A and B, minus those of A and of B, i.e., the increase in within-group inter-individual distance caused by the merger.
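The alternative inter-group distances can be compared on the two groups from the figure's example (a hedged sketch written for this article; the grouping comes from the example above, the code and names are not from the source):

```python
import math
from itertools import product

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

A = [(4, 1), (5, 1)]   # group {x1, x5} from the example
B = [(1, 5), (1, 3)]   # group {x2, x4} from the example

# All pairwise distances between an individual in A and one in B.
pair_dists = [dist(a, b) for a, b in product(A, B)]

single   = min(pair_dists)   # nearest neighbor method: shortest distance
complete = max(pair_dists)   # furthest neighbor method: longest distance
# Group average method as described in the text: average of squared distances.
average_sq = sum(d * d for d in pair_dists) / len(pair_dists)

print(single, complete, average_sq)
```

Each criterion summarizes the same four inter-individual distances differently, which is why the sub-methods can produce different dendrograms on the same data.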

Methods that do not merge individuals or groups hierarchically (sequentially), but instead define a statistically ideal classification by an objective function and optimize it, are collectively called nonhierarchical clustering. In the representative method, the K-means method, the gik that minimizes

  f(gik) = Σ(i=1..n) Σ(k=1..K) gik ∥xi − x̄k∥²

is found. Here, k (= 1, ..., K) indexes the groups; gi1, ..., giK are parameters that equal 1 only for the group to which individual i belongs and 0 for all others; x̄k is the average (centroid) of the data of the individuals belonging to group k; and ∥xi − x̄k∥ is the distance between xi and x̄k. The gik minimizing the objective function f(gik) thus represents the classification that minimizes the sum of squared distances between each individual and the average of the cluster containing it.
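A standard way to search for the minimizing gik is Lloyd's algorithm, which alternates between assigning each individual to its nearest centroid and recomputing each centroid (a minimal sketch written for this article, applied to the example's five points; the function name and parameters are hypothetical):

```python
import math
import random

def k_means(data, K, iters=20, seed=0):
    """Lloyd's-algorithm sketch of the K-means method: alternately set
    g_ik = 1 for the nearest centroid k, then recompute each centroid
    x̄_k as the mean of its members."""
    rng = random.Random(seed)
    centroids = rng.sample(data, K)
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each individual joins its nearest centroid.
        labels = [
            min(range(K), key=lambda k: math.dist(x, centroids[k]))
            for x in data
        ]
        # Update step: each centroid becomes the mean of its cluster.
        for k in range(K):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centroids[k] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids

data = [(4, 1), (1, 5), (5, 4), (1, 3), (5, 1)]  # x1..x5 from the figure
labels, centroids = k_means(data, K=2)
```

With K = 2 on these points, x2 and x4 end up in one cluster and x1 and x5 in the other, matching the lower branches of the dendrogram. Note that the result of K-means can depend on the random initial centroids, since the algorithm only finds a local optimum of f(gik).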

The K-means method does not allow an individual to belong to more than one group. One non-hierarchical method that does, developed in the field of quantitative psychology, is ADCLUS (additive clustering). Based on the similarity data sij between individuals i and j, it finds the binary gik (1 or 0) and continuous weights wk ≥ 0 that minimize

  Σ(i<j) ( sij − Σ(k=1..K) wk gik gjk )².

Its aim is easy to grasp if i and j are called stimuli and group k is called feature k. That is, gik gjk = 1 means that the two stimuli share feature k with weight wk, and ADCLUS seeks to describe the similarities by the sum of the weights wk of the shared features. →Multivariate analysis [Adachi Kohei]
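The ADCLUS objective can be written down directly (a toy sketch written for this article; the 3-stimulus similarity matrix, memberships, and weights below are hypothetical, chosen only to show how shared features reproduce the similarities):

```python
def adclus_loss(S, G, w):
    """Least-squares loss of the ADCLUS model: each similarity s_ij is
    approximated by the sum of the weights w_k over the features
    (clusters) k that stimuli i and j share (g_ik = g_jk = 1)."""
    n = len(S)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            fitted = sum(w[k] for k in range(len(w)) if G[i][k] and G[j][k])
            loss += (S[i][j] - fitted) ** 2
    return loss

# Hypothetical data: stimuli 0 and 1 share feature 0 (weight 0.8),
# stimuli 1 and 2 share feature 1 (weight 0.5); stimulus 1 belongs
# to both features, which K-means would not allow.
S = [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.5],
     [0.1, 0.5, 0.0]]
G = [[1, 0],
     [1, 1],
     [0, 1]]
w = [0.8, 0.5]

print(adclus_loss(S, G, w))
```

Only the pair (0, 2), which shares no feature, contributes to the loss here; an ADCLUS fitting procedure would search over G and w to make such residuals as small as possible.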
Figure: Principle of hierarchical cluster analysis

Source: Saishin Shinrigaku Jiten (Latest Psychology Encyclopedia)

