Tone, stress and rhythm in spoken Chinese
汉语口语的声调, 重音, 及韵律
Edited by Hana Třísková 廖敏 主编
The present volume results from The International Workshop, Tone, Stress and Rhythm in Spoken Chinese held in Prague in May 1999. The workshop was jointly organized by the CCK International Sinological Center at the Charles University, and the Oriental Institute of the Academy of Sciences of the Czech Republic.
In comparison to studies on written languages, research on spoken languages does not have such a long history. In recent years we can observe a growing interest especially in suprasegmental features of languages (one of the reasons being the needs of rapidly developing speech technologies). The same holds for Chinese linguistics. The aim of the Prague workshop was to bring together specialists working in this field. The meeting proved that substantial progress has been made in the past years, although the approaches to this subject are diverse. Besides the importance of the topic itself, there were also 'historical' reasons for organizing this event in Prague. The tradition of phonological studies carried out by the Prague Linguistic School reaches back to the 1930s. Furthermore, research on Chinese phonology and phonetics was conducted here in the course of several decades by Prof. Oldřich Švarný, who turned eighty last year. This volume is dedicated to him.
The workshop offered an international context for Švarný's work, which is pioneering in many respects. His research on Mandarin prosody, launched in the early 1950s, got a major impetus during his stay at the University of California at Berkeley in 1969/1970. Švarný carried out an instrumental analysis of fluent Chinese speech in the Phonology Laboratory of Prof. William S. Y. Wang. He experimentally verified several levels of stress in Pekinese and acoustic cues for segmentation. In subsequent research Švarný studied the accentuation of compounds. Relying on broad statistics, he outlined seven 'accentuation types' of disyllabic words and described the major factors conditioning their variability. Švarný's studies on Mandarin prosody resulted in the design of a prosodic transcription based on pinyin. The system has a strong theoretical base and was successfully tested in the teaching process. It should be noted that Švarný's scholarly erudition was always inseparable from his willingness to take up educational responsibilities. Thanks to him, Czech students of Mandarin have at their disposal teaching materials which stand up to theoretical standards in their description of prosody.
A unique feature of all of Švarný's language teaching works is the voluminous exemplificative material available both on tapes and in prosodic transcription. Numerous attempts to mark prosodic features of Mandarin speech for pedagogical purposes were made in the past (e.g. N. A. Speshnev: “Fonetika kitajskogo jazyka”, Leningrad 1970; “Practical Chinese Reader”, Beijing 1988; Wu Jiemin: “Xinbian putonghua jiaocheng”, Hangzhou 1988). However, Švarný is undoubtedly the first to implement a prosodic transcription on such a large scale and in such a systematic way. The ability to employ theoretical findings in pedagogical materials compiled for practical use is one of Švarný's major merits.
At the end of the 1990s, Švarný published an extensive dictionary, “Učební slovník jazyka čínského” (Learning Dictionary of Modern Chinese), in four volumes. This work has two unique features distinguishing it from a standard dictionary. First, entries (i.e. characters in a certain reading) are analyzed into semantic fields (yusus). Every yusu is equipped with numerous examples of free and/or bound usage. The second major objective of the dictionary is to describe the prosody of Mandarin utterances. Prosodic transcripts of 16,000 exemplificative sentences make up an essential part of the dictionary. Prosody is viewed as a means of expressing numerous linguistic functions beyond lexical tones, including accentuation of compounds, sentence stress, focus, sentence intonation etc. This voluminous work has to be considered the outcome of Švarný's lifelong research on Chinese grammar and prosody.
The papers presented at the workshop (sixteen altogether) touched upon the subject of Mandarin prosody from various angles: they dealt with tonal variations in connected speech, speech rhythm and the nature of stress, comparison of accent phenomena across Chinese dialects, rhythm as a stylistic device, intonation, the relationship between prosody and grammar, and prosodic annotation of a speech database. Some contributions offered a historical or a language-teaching perspective on the topic. To make the present volume coherent, the editors decided to select mainly those papers dealing with experimental phonetics. However, it has to be pointed out that the papers not included here brought many new ideas and substantially contributed to the overall success of the workshop.
Human speech is materialized in sound waves. However, the communicative information encoded in acoustic waveforms is extremely complex. Revealing the contribution of particular factors influencing the prosodic shape of Mandarin utterances, and finding proper tools for its description, are among the major research objectives of studies on Chinese prosody. While encouraging results have been achieved in many respects (e.g. the effects of downstep and declination, or the interplay of adjacent tones, are rather well documented), other effects have not yet been thoroughly explained (e.g. stress assignment rules, the interplay between prosody and grammar, pragmatic and emotional functions). The authors of the following pages concentrate on various aspects of the prosody of Mandarin (i.e. of Standard Chinese; only Chang Yueh-chin deals with Taiwan Mandarin): sources of F0 variations of lexical tones (Xu, Shih), rhythm (Cao), links between prosody and grammar (Třísková and Sehnal, Chang, Feng), and the historical development of stress rendering (Endo). The speech materials on which the experimental studies are based are either read speech recorded in laboratory conditions (Shih, Xu, Chang, Třísková and Sehnal), or TV news and broadcasting (Cao).
In all languages, prosodic features are carried by three major acoustic parameters: fundamental frequency, intensity, and duration. However, the specific ways in which they are used to express particular linguistic functions vary. To give an example: while in some languages, for instance in Czech, duration has a distinctive function at the segmental level, in Mandarin the increased duration of a syllable typically signals stress. Another example: while in non-tone languages we are accustomed to attributing F0 modulations primarily to factors rooted at the sentence level (such as sentence intonation), in Mandarin pitch is also used functionally at the lowest prosodic level, the level of syllables, to distinguish the meanings of various yusus. Superficial observers sometimes wrongly assume there is no room for sentence intonation in Mandarin, as both tones and intonation are manifested by pitch changes. However, as Xu and Shih point out, sentence intonation, focus and tones are realized by different aspects of F0 contours (tones are shaped by local F0 contours, while focus and intonation are expressed by pitch range variations). The term 'intonation' is commonly used as a general term covering pitch variations of speech. Xu suggests that there is in fact nothing left for an independent entity called intonation, once the various F0 shaping factors are identified.
Tone in Mandarin is characterized by a set of acoustic features, a distinctive F0 curve being the most striking one. In connected speech, the canonical forms of tones undergo more or less dramatic changes in all acoustic parameters. Thus, one of the research objectives is to find out how lexical tones are realized in utterances, and to disclose the sources of their behavior. Xu and Shih make a substantial contribution to this issue in their papers, both of which focus on F0 variations. Xu sheds light on the complexity of factors affecting the F0 curve, identifying and categorizing them. Lexical tones, prosodic structure, syntax, pragmatics and emotions are listed among the major voluntary factors. Involuntary factors, on the other hand, he defines as the limitations of the articulators. Xu demonstrates how some of the voluntary and involuntary factors interact with one another in producing F0 contours. His experiments deal with three types of effects, related to different linguistic levels: (1) pitch contour variations due to adjacent tones, (2) the interplay of tone and focus, (3) the mechanism of downstep and declination. Xu concludes that to obtain a clear picture of F0 variations in Mandarin, the distinction between communicative intent (reflected in voluntary factors) and involuntary articulatory constraints always needs to be made.
Shih attempts to isolate the effects of individual factors for intonation analysis and data normalization, and to combine them for intonation generation. She draws a hierarchical prosodic structure, where particular layers of intonational effects are rooted in different linguistic levels. As in Xu's paper, various effects are treated separately as additive components contributing to the surface F0 contour. Shih analyzes the segmental effects, rooted at the segment level, and the declination effect, rooted at the sentence level. The results of the experiments encouragingly show that segmental effects are quite predictable. The experiments on the declination effect examined its interaction with sentence length and focus. A concluding experiment on F0 generation was done by summing the various effects. The clear advantage of Shih's model of F0 normalization and generation is its modularity, which makes it possible to explore the effect of particular factors separately and to utilize results obtained from other studies.
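The idea of treating effects as additive layers can be illustrated with a small sketch. It is not the model from Shih's paper: the tone targets, the linear declination slope, the shape of the segmental perturbation and the choice to add the layers in semitones are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

# Placeholder tone targets (start, end) in semitones relative to a speaker's
# mid pitch. These values are invented for illustration, not measured data.
TONE_TARGETS = {1: (4.0, 4.0), 2: (-1.0, 4.0), 3: (-3.0, -5.0), 4: (5.0, -4.0)}


def tone_layer(tone, frac):
    """Local layer: straight line between the tone's start and end targets."""
    start, end = TONE_TARGETS[tone]
    return start + (end - start) * frac


def declination_layer(t, total, drop_st=3.0):
    """Sentence-level layer: assumed linear fall of drop_st semitones."""
    return -drop_st * t / total


def segmental_layer(frac, onset_boost_st):
    """Segment-level layer: onset perturbation fading out by 30% of the syllable."""
    return onset_boost_st * max(0.0, 1.0 - frac / 0.3)


def generate_f0(syllables, ref_hz=200.0, steps=5):
    """Sum the layers for a list of (tone, duration_s, onset_boost_st) tuples.

    Returns a list of (time_s, f0_hz) points.
    """
    total = sum(dur for _, dur, _ in syllables)
    contour, t0 = [], 0.0
    for tone, dur, boost in syllables:
        for i in range(steps):
            frac = i / (steps - 1)
            t = t0 + frac * dur
            st = (tone_layer(tone, frac)
                  + declination_layer(t, total)
                  + segmental_layer(frac, boost))
            contour.append((t, ref_hz * 2 ** (st / 12)))
        t0 += dur
    return contour


def normalize(contour, total, ref_hz=200.0):
    """Subtract the declination layer again; returns (time_s, semitones)."""
    return [(t, 12 * math.log2(hz / ref_hz) - declination_layer(t, total))
            for t, hz in contour]
```

Because each layer is a separate function, one factor can be replaced or removed without touching the others, which is the kind of modularity the paragraph describes. Here `normalize` undoes only the declination layer.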
Speech rhythm is related to both speech production and perception. The perceived rhythmic organization of speech usually corresponds to certain acoustic-phonetic correlates. However, there is no straightforward correspondence between the measured values and the perceived qualities. To paraphrase Xu, we can suggest that there is no independent entity of 'rhythm'; it is just a cover term for all relevant factors contributing to the overall rhythmic percept. Speech rhythm is defined and treated in various ways, and we still lack a generally accepted notion (Švarný's works offer one of the few systematic concepts of rhythm in Mandarin). It is commonly recognized that speech rhythm forms a hierarchy. However, authors differ in the number of hierarchical levels they recognize. Švarný marks two rhythmical levels in his prosodic transcription: 'rhythmical segment' (composed of disyllables and/or odd syllables), and 'colon'. Cao recognizes three hierarchical levels above the syllabic level: 'minor rhythmic unit', 'intermediate rhythmic chunk' and 'major rhythmic group' (corresponding to the prosodic word, prosodic phrase and intonation phrase of metrical phonology). She attempts to find acoustic cues for the boundary markers of these rhythmical units, and the coherence features bonding together their components. Cao's hierarchy of junctures is supported by pitch and duration measurements as well as perception tests. As material she uses TV news and broadcast speech. Mandarin Chinese is traditionally viewed as a stress-timed language with a strong tendency towards isochrony. However, the theory of isochrony as such has its critics. Cao claims that she found no evidence for so-called isochrony in Mandarin (unlike Švarný, who strongly advocates the plausibility of the concept of isochrony for Mandarin). Cao also questions the relationship between prosody and syntax. Like other authors, she concludes that the correspondence between prosody and syntax is not direct.
Třísková and Sehnal approach the issue of rhythm and its relationship with grammar from the angle of corpus annotation and statistical processing. They introduce the PALM software, designed to grasp and analyze the basic rhythmic structure of Mandarin utterances. A small corpus was prosodically transcribed and annotated for various prosodic and grammatical features. Třísková explains the theoretical basis of her prosodic transcription, which was partly inspired by Švarný's system (a simplified version is proposed for pedagogical purposes). A statistical analysis of the annotated database is carried out, observing various combinations of prosodic and/or grammatical features of either syllables or words. Several examples of utilization are offered. Třísková's examples deal with the stress and tone features of syllables depending on speech tempo, while Sehnal focuses on the mutual dependence between the grammatical features of words and their stress features. The PALM project is one of the few labeling systems devised for Mandarin which include prosodically labeled data. The software can be applied to a larger database to study the links between the rhythmical structure of the Mandarin utterance, its grammatical structure and speech tempo.
Feng's paper deals with the historical development of ba sentences, explaining synchronic phenomena with diachronic studies. Prosody is viewed here as an important factor contributing to syntactic changes. Besides the links of stress assignment rules to the syntactic structure of the sentence, Feng also discusses the relationship of these rules to the semantic structure. He suggests that ba sentences spread from poetry to natural speech, changing their structure, semantics and consequently their stress rules in the course of this process. Feng argues that the ba construction first appeared in early Tang poetry. Ba sentences of this type [ba NP V] had the main stress falling on the NP. With further development the structure, and consequently the semantics, of ba constructions changed: the predicate became more complex, expressing a delimitative event. However, delimitation requires the object to be specific. In natural speech, the inevitable result was the loss of stress on the NP. Ba fell out of focus and was reduced to an empty verb. In natural speech this was grammaticalized as a new pattern with an unstressed NP and a stressed predicate.
Chang studies prosodic cues for disambiguation. It is well known that Mandarin Chinese is highly homonymous. This phenomenon has several sources, in particular a restricted choice of syllabic structures, the rarity of polysyllabic words (according to “Xiandai hanyu pinlü cidian” 1986, about 75% of word occurrences in colloquial speech are monosyllabic words), lack of inflection etc. Consequently, phonetic information alone is sometimes not sufficient to distinguish between structurally unambiguous words or phrases. On the other hand, there can be pairs of phrases or words which are structurally ambiguous, and prosodic features of speech can help to disambiguate them. Chang tests both lexically ambiguous phrases and structurally ambiguous phrases. The experimental data showed no significant acoustic difference for lexically ambiguous phrases. For structurally ambiguous phrases, though, she found differences in duration in some types of syntactic structures (while no consistent differences in F0 were discovered). The perception tests showed that if there was an acoustic difference, the subjects could perceive this difference well and use it to disambiguate sentences. Duration proved to be a more robust acoustic cue than F0. If there was no significant acoustic difference, the subjects tended to rely on sentence frequency or word frequency to disambiguate.
Endo offers a historical perspective on the reflection of stress in Mandarin. He presents historical evidence for the existence of the stress phenomenon. Such evidence can be found in old poetry (i.e. in rhyming features), or in transcription materials between Chinese and other languages (e.g. Tibetan, Sanskrit, Khotan, Persian, Korean, and Russian). The data show the existence of stressed and unstressed versions of the pronunciation of the same word. The stress-conditioned phonetic change in many cases led to a phonological change, where doublet readings were codified and eventually written with two different characters. Introducing various transcription sources, Endo shows that stress-related phenomena were not only conveyed in the transcriptions, but were also actively recognized and described by the authors (the earliest description dating back to the Ming dynasty). Other interesting sources are dictionaries and language teaching materials. Endo compares the transcription systems used in several textbooks and other materials (Seidel 1901, Arendt 1918, and Chinese Linguaphone 1928). He points out that the modern dialects also provide a promising source for the reconstruction of the history of stress in Chinese.
Last but not least: the fact that many prosody-related issues do not have a satisfactory solution in linguistic research is reflected in the state of the art of dictionaries, textbooks and the methodology of teaching Chinese as a second language. For instance, one of the issues of Mandarin prosody frequently glossed over by lexicographic works is the variation of stress in compound words. A number of exceptions which attempt to reflect the stress features of compounds can be found, though. Perhaps the earliest example of such a dictionary is the “Russko-kitajskij slovar” (Russian-Chinese dictionary, Isaia 1867) quoted by Endo. More recent works include “Kitajsko-russkij slovar” edited by I. M. Oshanin (Chinese-Russian Dictionary, Moscow 1955) and “Chugoku jiten” by Kuraishi Takeshiro (Chinese-Japanese dictionary, Tokyo 1966). Švarný's dictionary mentioned above is the most recent work to tackle the problem.
If we take a look at language teaching materials, we note that even phenomena which have already been successfully described by linguists often do not find proper treatment in these practical areas. For example, the third tone is traditionally presented in textbooks in its canonical form as high-low-high, instead of being primarily described as a low tone. Insufficient treatment of the changes that citation forms of tones undergo in connected speech regularly puzzles elementary students of Chinese. I recall a liuxuesheng complaining that she had to spend an arduous time at school learning the lexical tones, yet as soon as she walked out of the classroom, she got the impression that the Chinese did not actually speak in tones at all! This little story indicates that there must be something wrong with our teaching methods. A modern methodology of teaching Mandarin phonetics requires more frequent contact between those working in theoretical research and the language teaching community.
The advantage of small-scale workshops and seminars is undoubtedly the chance to become very intense and focused. The organizers trust that the Prague event, hosted within the ancient walls of the Charles University, was such an example. It undoubtedly helped to establish contacts among the foremost researchers engaged in the discipline and provided a distinct perspective on the field. The participants came up with a broad variety of views and new linguistic data. The future task is to integrate them into a systematic framework. The following pages offer an insight into the field from different angles, be it experimental phonetics, studies on grammar, language teaching or historical development. At the same time, hitherto unresolved problems are pointed out. We hope this volume can serve as a stimulus for future research.
Article 文章
Abstract 摘要
This paper discusses various sources of tonal variations in connected speech. It is argued that these sources are better understood when they are viewed as either voluntary or involuntary. Voluntary sources are those stemming from linguistic/paralinguistic demands, and involuntary sources from articulatory constraints. Linguistic/paralinguistic demands represent various communicative functions on the one hand, but are associated with articulation-specific pitch targets and pitch ranges on the other. These pitch targets and pitch ranges are what speakers actually intend to implement in their speech; but such implementation is constrained by the limitations of the articulators that actually produce the fundamental frequency of voice. Observed variations in F0 contours in connected speech thus reflect different levels of linguistic/paralinguistic demands as well as their interaction with various articulatory constraints.
本文讨论影响连续语流中声调变异的各种因素,并试图论证,要理解这些因素最好把它们分为主观和客观因素。主观因素来自于语言功能,客观因素来自于发音器官的局限。语言功能一方面对应于各种交际需求,一方面又跟具体的音高目标和调域相对应。这些音高目标和调域是说话人力图实现的直接目标,但是他们的努力总是受到发音器官的种种局限。因此,我们在连续语流里观察到的基频曲线反映的是不同层次的语言功能跟各种发音局限相互作用的结果。
Abstract 摘要
Tone shapes in connected speech can be drastically different from their canonical citation forms. The variations are conditioned by many different factors: some have local effects and some have global effects. This paper identifies the sources of some effects, examines the scope and magnitude of these effects with experimental data, and explores how the results can be modeled for both F0 generation and data normalization.
连续语句中的声调形态,往往与单字调有相当的出入。这些变化有一定的肇因。有的因素会造成局部的影响。有的因素则有长期的效应。要确切掌握这些因素,做成功的数据仿真并不容易。不过,我们可以利用中文声调的特性,解决一部分的问题。本文讨论两个实验,研究一些因素对声调影响的范畴与幅度,藉以探讨如何处理声调调型的生成及正规化。
Abstract 摘要
This study is concerned with the rhythm of Mandarin Chinese. As the basis of the study, a set of speech materials was selected from TV news and broadcasting. Pitch and duration measurements were made from their spectrograms, and an informal perception test on rhythm unit division was conducted as well. This paper reports some preliminary results obtained here. The description concentrates on rhythmic structure, including the division of rhythmic chunks, the hierarchical organization, the coherence features within rhythmic units and the boundary markers between these units. In addition, some related issues are also discussed in general.
本文研究汉语普通话的节奏问题。研究的主要基础是对电视新闻联播和电台广播话语的实验分析,包括主观听辨试验和客观的音高和时长测量。根据初步的实验结果,重点介绍和讨论以下几方面的内容:
1.节奏组块的划分;
2.节奏的层次结构;
3.节奏单元内部的内聚特征;
4.节奏单元之间的分界标志;
5. 讨论
5.1节奏同话语信息时域分布的关系;
5.2 节奏结构同句法结构的关系;
5.3 节奏组块的分与合的关系;
6. 小结
6.1 汉语普通话的节奏包含韵律词,韵律短语和语调短语三个基本层次;韵律词通常包含2-3 个音节,韵律短语的跨度多数为7±2个音节。
6.2 节奏单元的内聚特征和分界标志,主要体现为语音单元音高的规律性起伏变化和时长的规律性伸缩停延;
6.3 韵律节奏的结构是以句法结构为基础的,但不等于句法结构,因而不能期望完全通过句法结构推导节奏的层次结构。
6.4 语音的节奏看来并不是建立在某种语音成分或语音单元如重音或音节的等间隔出现的基础上,而是建立在语音信息在时间域的规律性分布的基础上,具体表现为一定的韵律现象在一定位置上的规律性出现。这种规律性的出现模式客观上体现了口头话语的层次结构。
Abstract 摘要
The present paper describes the software PALM, designed to grasp and analyze the rhythm of Mandarin utterances. The functions of the software were tested on a small database consisting of 23 sentences recorded in slow tempo and in fast tempo. As a first step, the utterances were prosodically transcribed. The transcription captures stress features and horizontal segmentation. The theoretical foundations of the prosodic transcription are outlined (a simplified version of the transcription is proposed for use in teaching Mandarin as a second language). Transcribed utterances were broken into entries corresponding to syntactic words, then labeled for various features (both prosodic and grammatical). The Query function allows retrieval of instances, either words or syllables, sharing various combinations of features (for words: number of syllables, syntactic function, stress pattern etc.; for syllables: level of stress, tonality etc.). The Count function allows statistical processing of the search results. PALM was designed as a tool for finding links between the rhythmical structure of the Mandarin utterance, its grammatical structure and speech tempo.
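The query-and-count workflow described in this abstract can be pictured with a short sketch. It does not reproduce PALM itself: the record layout, the label values, the three-step stress scale and the functions `query` and `count` are hypothetical, and the four-entry corpus is invented purely to show the mechanics.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    pinyin: str
    tone: int    # 1-4, with 0 standing for the neutral tone
    stress: int  # assumed scale: 0 = unstressed, 1 = weak, 2 = stressed


@dataclass
class Word:
    syllables: List[Syllable]
    function: str  # hypothetical syntactic label
    tempo: str     # "slow" or "fast"

    @property
    def stress_pattern(self):
        return tuple(s.stress for s in self.syllables)


def query(entries, predicate):
    """Return the entries (words or syllables) that satisfy the predicate."""
    return [e for e in entries if predicate(e)]


def count(entries, key):
    """Tally the entries by the feature combination that key extracts."""
    return Counter(key(e) for e in entries)


# Invented toy corpus: the same two words annotated at two tempi.
corpus = [
    Word([Syllable("wo", 3, 1)], "subject", "slow"),
    Word([Syllable("xue", 2, 2), Syllable("xi", 2, 1)], "predicate", "slow"),
    Word([Syllable("wo", 3, 0)], "subject", "fast"),
    Word([Syllable("xue", 2, 2), Syllable("xi", 2, 0)], "predicate", "fast"),
]

# Word-level query: stress patterns of disyllabic words, split by tempo.
disyllables = query(corpus, lambda w: len(w.syllables) == 2)
patterns = count(disyllables, lambda w: (w.tempo, w.stress_pattern))

# Syllable-level query: unstressed syllables, tallied by tone.
all_syllables = [s for w in corpus for s in w.syllables]
unstressed = query(all_syllables, lambda s: s.stress == 0)
tones_of_unstressed = count(unstressed, lambda s: s.tone)
```

Keeping the predicate and the counting key as plain functions lets the same two operations serve both word-level and syllable-level questions, mirroring the two kinds of instances mentioned in the abstract.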
Abstract 摘要
This paper explores the origins of the ba construction in Classical Chinese. It is argued that the disposal ba sentences were born in poetic environments and further evolved in natural speech. It was a result of stress shift in purposive sentences with a poetic structure. Syntactically, the disposal ba sentences originated from a purposive construction involving an Empty Operator Movement. As the last verb became more and more complex, the stress of purposive ba construction began to be shifted to the end of the sentence and the purposive ba construction gradually turned into a delimitative ba construction. Under the pressure of delimitation, the object of ba was forced to be more and more specific. Moreover, it is argued that the ba sentences in modern Chinese could also be analyzed as involving a null operator movement and constrained by prosody. Thus, prosody is one of the most important factors that motivate syntactic changes, and diachronic studies can also provide proper explanations for synchronic facts.
本文探讨把字句的来源,认为处置型把字句托生于诗歌而完成于口语,是在诗律结构中“重音”转移的结果。句法上,处置型把字句源于空运符运作的目的句。随着目的句中最后动词前后成份的日趋复杂,把字句的重音便开始后移。与此同时,目的型把字句也开始向“终结型把字句”转化。在终结句法体态模式的压力之下,把字句的宾语也随之“特指化”。文章最后指出:韵律是促发句法演变的重要因素;而历时研究同样可以为共时现象做出必要和充分的解释。
Abstract 摘要
Results of previous studies examining ambiguous phrases revealed that duration appeared to be the most robust cue in disambiguation in English, while pause was the more powerful cue in Mandarin. The present study investigated from acoustic and perceptual viewpoints how Taiwan Mandarin subjects disambiguate phrases. The experiment was done within a question-answer context. The phonetic realization of three kinds of ambiguous phrases was studied: (1) lexically and syntactically ambiguous phrases with ‘ji’ (how many/several), (2) lexically ambiguous phrases, and (3) syntactically ambiguous phrases. No systematically significant differences in fundamental frequency or in duration were found for lexically ambiguous phrases and syntactically ambiguous phrases. Despite that, we observed that the syllables might have a significant duration difference in some ambiguous phrases. The perception study confirmed that our subjects could perceive this acoustic difference well. Moreover, the acoustic difference coincided with the syntactic boundary. For the phrases in which no significant acoustic difference was found, the rate of correct perception was low and the subjects tended to use the concept of sentence frequency to interpret the ambiguous phrases. We also showed that duration in Taiwan Mandarin might differ across grammatical categories: a word serving as a verb might have a longer duration than the same word serving as a noun.
歧义句相关研究均指出英语中区辨歧义句最重要的声学特征为音长,而在汉语中则是停顿。本研究从声学和听辨的角度探讨了台湾华语语者是如何区辨歧义句的。我们研究 “几” 字句(词汇和句法歧义句),词汇歧义句,句法歧义句等三种形式句子的语音体现。录音是以答问的形式进行的。 结果显示词汇歧义句和句法歧义句在基频和音长上的差异并不显著,但在某些歧义句中,有些音节的音长有着显著的差异。听辨测验的结果也显示台湾华语语者能听辨这些声学差异。而这些有显著声学差异的音节多出现在语法界线上。至于那些音节无显著差异的歧义句,听辨测验的答对率非常的低,且语者会采用语句频率的概念来诠释歧义句。 此外,也发现在台湾华语中不同的词类,它的音长可能不同,如一个词作为动词时,它的音长会比它作为名词时长。
Abstract 摘要
This paper aims to collect phenomena reflecting Chinese stress accent from historical materials as much as possible, and explore its conditioning factors. The paper contains 8 sections: 1. Theme, 2. Pre-Han Period, 3. Tang Dynasty, 4. Yuan Dynasty, 5. Ming Dynasty, 6. Qing Dynasty, 7. The First Half of the 20th Century, and 8. Comparative Study of Modern Dialects.
本文从历代文献中尽量搜集反映汉语轻重音的现象,探索其产生条件。文分8节:第1节说明本文主题,以下各节分别讨论(2)先秦时期,(3)唐代,(4)元代,(5)明代,(6)清代,(7)20世纪前叶,及(8)现代方言里的情况。