Etymologically, the term derives from the Greek φωνή (phōnē, "sound") plus a suffix meaning "science," and denotes one of the empirical sciences: the study, from a natural-scientific standpoint, of speech, that is, of the language sounds that humans use as a means of communication.

The process from the production to the reception of speech sounds can be divided roughly into three stages: (1) the speaker produces speech sounds with the so-called speech organs, such as the mouth, nose, and throat; (2) the sounds propagate through the air as sound waves; and (3) the sounds are heard and recognized by the listener's auditory organs. The field is correspondingly divided into three branches: [1] physiological (or articulatory) phonetics, [2] acoustic phonetics, and [3] auditory phonetics.

[1] Physiological phonetics studies which parts of the speech organs are moved, and how, in order to produce the speech sounds used in a given language system; differences such as those shown in [Figure A] are investigated. Moreover, thanks to the development of devices such as the electropalatograph, researchers no longer merely record the static articulatory positions of the speech organs: it has become common to capture the constantly changing articulatory movements themselves dynamically, and this work is yielding many results.

[2] Acoustic phonetics is the field that investigates chiefly the acoustic aspects of speech sounds; it has made great strides since the end of World War II thanks to the development of a wide variety of equipment. The most widely used instrument produces the sound spectrogram shown in the accompanying figure, which allows the frequency and amplitude distribution of speech sounds to be analyzed in a very short time.
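The frequency-and-amplitude analysis a spectrogram provides amounts to a short-time Fourier transform of the waveform. The following Python sketch is illustrative only (the frame length, hop size, and sampling rate are assumptions, not values from any particular spectrograph):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude short-time Fourier transform:
    rows = time frames, columns = frequency bins."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)

# A pure 1 kHz test tone sampled at 8 kHz: its energy should be
# concentrated in the frequency bin nearest 1000 Hz.
sr = 8000
t = np.arange(sr) / sr                    # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)
S = spectrogram(tone)
peak_bin = int(S.mean(axis=0).argmax())   # strongest bin, averaged over time
peak_hz = peak_bin * sr / 256             # bin width = sr / frame_len
```

Each row of `S` is one "slice" of the spectrogram; the frequency resolution is the sampling rate divided by the frame length (here 8000 / 256 = 31.25 Hz per bin), so the test tone lands exactly in bin 32.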
Furthermore, the gestalt formed by the striations and the distribution of light and dark on the record has proved useful not only for discriminating speech sounds but also for identifying individual speakers. In Japan this attracted the attention of the National Research Institute of Police Science and other institutions after the "Yoshinobu-chan" kidnapping case; such a record is now called a "voiceprint," meaning a personal vocal characteristic comparable to a fingerprint, and is used as reference material in criminal investigations.

Meanwhile, advances in computers gave rise to the method of Analysis by Synthesis (abbreviated AbS), which does not merely analyze the given data passively but also employs synthetic sounds in which the predicted elements are combined artificially. The principle of AbS is to approach the true values by iterating a feedback loop of synthesis → comparison → control: the synthetic sound (output) produced by a hypothesized generative model is compared with the analysis data (input), and the model's main parameters are adjusted according to the differences found.

In recent years, devices such as voice-operated automatic package sorting and machines that type out characters exactly as they are spoken have been developed; all of them rest on speech-recognition technology built on the research results described above. Future developments in this field promise immeasurable benefits, for example for people with physical disabilities, who could open doors or switch a television on and change its channels simply by speaking, without leaving their seats.

[3] Auditory phonetics is the youngest branch of phonetics, but listening experiments that make combined use of spectrograms and other tools are being actively conducted.
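The synthesis → comparison → control loop of AbS can be sketched with a deliberately simple generative model: a pure tone whose frequency is the single parameter to be recovered. The model, the search grids, and the two-pass refinement here are illustrative assumptions, not the procedure of any actual AbS system:

```python
import numpy as np

def synthesize(freq, n=800, sr=8000):
    """Output of the hypothesized generative model: a pure tone.
    (A real AbS system would use an articulatory or formant model.)"""
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * freq * t)

def abs_pass(observed, grid):
    """One synthesis -> comparison -> control cycle: synthesize each
    candidate parameter value, compare it with the observed data, and
    return the parameter whose synthetic output differs least."""
    errors = [np.sum((synthesize(f, n=len(observed)) - observed) ** 2)
              for f in grid]
    return float(grid[int(np.argmin(errors))])

observed = synthesize(250.0)                               # analysis data (input)
coarse = abs_pass(observed, np.arange(100.0, 400.0, 5.0))  # coarse control step
fine = abs_pass(observed, np.arange(coarse - 5.0, coarse + 5.0, 0.25))  # refined step
```

The second pass narrows the grid around the first pass's answer, mirroring the iterative "control based on the differences" that the article describes.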
For example, research in the United States has shown that by one month after birth, babies can already discriminate [ba] from [pa] on the basis of voice onset time (VOT), a value obtained by measuring the interval between the release of the closure and the onset of vocal-fold vibration.

[Jyosei Hyakutaro]

Research Methods
Broadly speaking, there are subjective (or auditory) methods and objective (or instrumental) methods. In the former, the investigator introspects in detail on his or her own articulation, verifies the movements of the speech organs, and at the same time trains the ear so as to grasp the articulation of others accurately; observation therefore requires using the eyes and the ears together. The latter records the momentary speech signal with various instruments, and includes analysis and synthesis from the acoustic side as well as a variety of dynamic studies from the physiological side. Rapid progress in instrumentation ensures that the objective method will continue to develop, but given the great constraints of time, space, and cost that its equipment imposes, those who aspire to phonetics would do well first to master the subjective method and then use the objective method in combination with it. To do so, one must above all master phonetic symbols as a means of describing speech, a dynamic phenomenon that changes from moment to moment.
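The VOT measure mentioned above is simply the time from the release of the closure to the onset of voicing. A minimal sketch follows; the 25 ms decision threshold is an assumption modeled on typical English-like values, not a figure from the study cited:

```python
def vot_ms(release_s, voicing_onset_s):
    """Voice onset time in milliseconds: interval from the release of
    the stop closure to the start of vocal-fold vibration."""
    return (voicing_onset_s - release_s) * 1000.0

def classify_stop(vot, threshold_ms=25.0):
    """Short-lag VOT -> voiced [ba]; long-lag VOT -> voiceless [pa].
    The threshold is an illustrative assumption."""
    return "pa" if vot >= threshold_ms else "ba"

# Release at 100 ms; voicing begins 10 ms later -> [ba], 60 ms later -> [pa].
print(classify_stop(vot_ms(0.100, 0.110)))  # ba
print(classify_stop(vot_ms(0.100, 0.160)))  # pa
```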
[Jyosei Hyakutaro]

Units
Just as written language distinguishes units such as sentences, clauses, phrases, and words, spoken language is also assumed to have several units, which form a hierarchical structure roughly as shown in [Figure C]. Suppose we call a unit that places some kind of break in the sound unfolding sequentially along the time axis a "segmental unit," and, conversely, an element that imposes various deformations, such as pitch, loudness, and length, on a continuous stretch of sound a "prosodic unit." The largest segmental unit is then ①, which places a firm break in the flow of breath. However, because an interruption of the breath flow can also arise from mere physiological necessity, such as running out of breath, its actual scale can range from a single speech sound at the minimum to as long as the breath lasts, which makes it very indeterminate. The word "break" is therefore defined as "a natural rest in a normal style of delivery," distinguishing it from a "pause" (see ③ below), which is inserted with a special intention, as in dictation or in correcting a slip of the tongue. Furthermore, since breaks can, broadly speaking, be large or small, we distinguish these as well, calling the former "maai" and the latter "kugiri." Although maai and kugiri may in most cases seem to be distinguished by the duration of the break, the essential difference between them lies rather in whether or not the speaker is in a state of preparation for the next utterance. That is, however long a break may last, if it occurs in mid-utterance and is interpreted as preparation for what follows, it is called a kugiri and is assigned to ②, described below; conversely, a break in which the utterance has reached a provisional state of completion is called a maai, regardless of its length, and is assigned to the level of ①.
At the prosodic level, on the other hand, intonation is considered to attach to this unit. ② is the unit that, segmentally, places a "small" break in the expiratory flow; it is where the kugiri just described, that is, a break made in a state of preparation for the next utterance, is located. Traditionally it has generally been called a stress group, but since that name is inappropriate for languages such as Japanese that lack a stress accent, the author has proposed the term shown here. The difference from ① can be shown concretely by symbolizing a maai as (B) and a kugiri as (A), as in the following example. (From Natsume Soseki's Ten Nights of Dreams.) ③ As with the word anata in the example, this generally refers to the sum of loudness, pitch, tension, and so on, that is, prominence; there are, however, also cases of negative prominence, in which an effect is achieved by lowering the tone of a single sound (or a sequence of sounds) below its surroundings. ④ The syllable is a unit assigned to a certain grouping formed by a single sound (or a sequence of single sounds). Two kinds are distinguished: the "phonological syllable," at the everyday level that anyone can easily pick out, as when counting five, seven, five in composing a Japanese haiku, and the "phonetic syllable," at a level on which scholars cannot agree even when the physiological, acoustic, and auditory aspects are all brought to bear. The "peak" of the accent normally lies at this level. ⑤ The single sound is the smallest unit assumed in phonetics, corresponding to the individual vowels, consonants, and so on. The units assumed above, however, often exert their full effect only when supported by phonology. Phonetics and phonology are therefore inseparably related, like the two sides of a single sheet of paper.
[Jyosei Hyakutaro]

Research History
Human interest in speech has a very long history: the earliest records are Egyptian documents from around 2000 BC that point out stuttering and errors of pronunciation. The Old Testament (Isaiah 32:4) also contains a remark on pronunciation in public speaking. Research on articulation was carried out in ancient India in the fourth century BC, but no remarkable development followed until this work was introduced to Europe in the 19th century. Modern phonetics was grounded in the growth of comparative linguistics that followed the discovery of Sanskrit, and it flowered in Germany in the mid-19th century. Subsequently, through research on both the physiological and the physical sides and the establishment of the International Phonetic Alphabet by the International Phonetic Association, phonology, which provides phonetics with a powerful theoretical foundation, developed in the 1920s. The sound spectrograph, invented in the United States during World War II, advanced the acoustic study of speech sounds, and more recently progress in computers and the development of a wide variety of equipment have made highly precise research possible on both the acoustic and the physiological sides. Under these circumstances, the area in which the greatest development is expected hereafter is research on hearing. In Japan, the Phonetic Society was founded in 1926 (Taisho 15; renamed the Phonetic Society of Japan in 1949), and today organizations such as the Acoustical Society of Japan and the Japan Society of Logopedics and Phoniatrics carry on active research.
[Jyosei Hyakutaro]

Bibliography: Hattori Shiro, Phonetics (1984, Iwanami Shoten) ▽ Jyosei Hyakutaro, supervised by Kindaichi Haruhiko, Phonetics (1982, Apollon Music Industry) ▽ Fujimura Osamu (ed.), supervised by Oizumi Mitsuo, Speech Science (1972, University of Tokyo Press) ▽ Hiki Shizuo (ed.), Speech Information Processing (1973, University of Tokyo Press)

[Figure A] Physiological phonetics (differences in tongue-tip contact) ©Shogakukan
[Figure] Acoustic phonetics (sound spectrogram of the Tokyo dialect of Japanese) ©Shogakukan
[Figure C] Phonetic units ©Shogakukan

Source: Shogakukan Encyclopedia Nipponica