Voiceprint - Seimon

Japanese: 声紋 - せいもん

A voiceprint is a pattern produced by machine analysis of the characteristics of an individual's voice. Speech production has three main physiological components: exhalation, phonation, and articulation. The airflow produced by exhalation sets the left and right vocal cords vibrating, which generates sound waves (the vocal cord sound source). These waves pass through the pharynx, oral cavity, and nasal cavity (collectively called the vocal tract), which are shaped by the articulatory organs (palate, upper and lower jaws, tongue, lips, nasal septum, etc.), and emerge as speech sounds with resonant characteristics. Individual differences in vocal cord vibration, the vocal tract, and speech movements give these speech sounds their individual characteristics.

The term voiceprint generally refers to the result of analysis with a sound spectrogram (Sonagraph). Depending on the analysis filter, a sound spectrogram takes one of two pattern types: wideband (300 Hz) or narrowband (45 Hz). A wideband filter shows changes related to the vocal tract, such as the formant frequencies of speech and the formant bandwidths. (Formants are the regions of strong energy revealed by frequency analysis; counting from the lowest frequency, they are called the first formant, second formant, and so on.) Individual characteristics of the vocal tract appear mainly in the structure of the higher formants (the third and fourth formants and above) and in the formant bandwidths. A narrowband filter, by contrast, shows the harmonic structure built on the fundamental frequency (pitch frequency) of vocal cord vibration in vowels and similar sounds, so changes related to the vocal cord source can be observed. Individual characteristics of the vocal cord source appear as differences in the average fundamental frequency, in the regular changes of the fundamental frequency, and in the vibration waveform.
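The entry names the average fundamental frequency as one individual characteristic of the vocal cord source, but it does not say how that value is measured. As a minimal sketch, the following uses plain autocorrelation, a common textbook technique swapped in here for illustration and not the method of the source. The function names, the 70 to 400 Hz search range, and the synthetic test signal are all assumptions made for this example.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=70.0, f0_max=400.0):
    """Estimate the fundamental frequency in Hz by autocorrelation.

    The search range (f0_min, f0_max) is an assumed default, not a value
    taken from the encyclopedia entry.
    """
    min_lag = int(sample_rate / f0_max)  # shortest period considered
    max_lag = int(sample_rate / f0_min)  # longest period considered
    if len(samples) <= max_lag:
        raise ValueError("signal too short for the requested f0 range")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        # Longer lags sum fewer terms, which favours the shortest period.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def synthetic_vowel(f0, sample_rate, duration, n_harmonics=3):
    """Build a toy periodic signal: a fundamental plus weaker harmonics."""
    n = int(sample_rate * duration)
    return [
        sum(math.sin(2 * math.pi * f0 * k * i / sample_rate) / k
            for k in range(1, n_harmonics + 1))
        for i in range(n)
    ]


if __name__ == "__main__":
    rate = 8000  # Hz, roughly telephone-band sampling (assumed)
    for true_f0 in (125.0, 250.0):
        signal = synthetic_vowel(true_f0, rate, 0.1)
        print(true_f0, estimate_f0(signal, rate))
```

Real speech is noisy and only roughly periodic, so a practical pitch tracker would add windowing, voicing detection, and smoothing over time. The sketch only shows how a period, and hence a fundamental frequency, can be read off a periodic waveform.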

When voiceprints are compared, those of parents and children, brothers, sisters, twins, and other close relatives turn out to be similar. This is because the inherent length, size, and movements of their phonatory and articulatory organs are similar. Differences between the voices of adults and children, or of men and women, likewise stem mainly from the phonatory and articulatory organs. Female voices generally have a higher pitch (the perceived highness or lowness of a sound) than male voices and are considered harder to analyze, but good results can be obtained for high-pitched voices by using an analysis filter with a 500 Hz bandwidth. When the same speaker changes the pitch or loudness of the voice, the speech waveform changes, but the overall pattern of change shows nearly the same regularity as when the voice is left unchanged. It is also known that disguised speech (speaking with the nose pinched, through a mask, and so on) does not affect the formants of vowels.
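The entry does not explain why a wider filter helps with high-pitched voices. One common rule of thumb, assumed here and not stated in the source, is that the harmonics of a voice are spaced one fundamental frequency apart, so an analysis filter narrower than the fundamental separates individual harmonics, while a wider one blurs them together and leaves the formant structure visible. The sketch below applies that rule to the three filter bandwidths mentioned in the entry; the pitch values of 120 Hz and 350 Hz are illustrative assumptions, and the function names are hypothetical.

```python
def resolves_harmonics(f0_hz, bandwidth_hz):
    """Rule of thumb (assumed): harmonics lie f0 apart, so a filter
    narrower than f0 can separate them."""
    return bandwidth_hz < f0_hz


def visible_structure(f0_hz, bandwidth_hz):
    """Say which structure a filter of this bandwidth mainly shows."""
    if resolves_harmonics(f0_hz, bandwidth_hz):
        return "harmonic structure"
    return "formant structure"


if __name__ == "__main__":
    # Illustrative low and high pitch values (assumed, not from the entry).
    for f0 in (120.0, 350.0):
        # Filter bandwidths named in the entry: narrowband, wideband, 500 Hz.
        for bandwidth in (45.0, 300.0, 500.0):
            print(f0, bandwidth, visible_structure(f0, bandwidth))
```

Under this rule, a 300 Hz filter still acts as a wideband filter for a 120 Hz voice but starts to pick out individual harmonics once the pitch rises above 300 Hz, which is consistent with the entry's advice to use a 500 Hz filter for high-pitched voices.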

In criminal investigations, individual identification most often involves voices transmitted over telephone lines. So that noise other than the telephone speech can be excluded, several recording methods are used: recording directly from the rosette (the telephone line outlet) through a capacitor, recording from the handset with a coupler, recording with a telephone pickup coil, and recording with a dedicated telephone recorder. Voiceprint identification is then carried out on the material recorded in this way. To identify the individual characteristics of a voiceprint, examiners perform listening tests, voiceprint collection, and voiceprint comparison. If the two voiceprints show many reproducible features and many matches at distinctive positions, the two voices are judged to belong to the same person. Individuals can be identified from voiceprints with fairly high probability. More recently, automatic speaker identification by computer has been widely studied, and cases in which speakers were identified automatically have been reported.

[Hideaki Sugie]

Source: Shogakukan Encyclopedia Nipponica

