Wednesday, October 21, 2009

An Introduction to Taiwanese Speech Notepad

Yu-Chu Chang
 
Taiwanese Speech Notepad is a stand-alone text-to-speech program for people who want to learn Taiwanese through Romanized Taiwanese or Chinese characters. Its copyright was registered in 2001 under the name Modern Literal Taiwanese Reading and Writing System. Because the speech engine and the speech synthesizer are original programs developed by the authors, the system works independently, without installing the MS Speech SDK or the IBM TTS Engine. With a corpus-based multilingual dictionary and a dynamic database structure, Taiwanese Speech Notepad generates speech from Romanized Taiwanese or Chinese text, using pre-recorded natural human voice as well as a prosodic speech synthesizer.
Taiwanese (also called Amoy) is a phonetics-based language. A Taiwanese sentence can be viewed as a set in the mathematical sense: a collection of well-defined, distinct objects called tone groups. Every syllable or word in Taiwanese has two different tones. In the initial or medial position of a word or a tone group, a syllable or word takes its sandhi tone, while the final syllable of a word or a tone group takes its lexical tone. Therefore, a sequence of words is a tone group if and only if a lexical tone falls at its end.
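The positional rule can be sketched in a few lines of Python. This is only an illustration, not the code of Taiwanese Speech Notepad: the function name label_syllables is hypothetical, the tone groups are supplied by hand, and the rule is applied at the level of the tone group only (the last syllable of each group is marked lexical, every other syllable sandhi).

```python
def label_syllables(tone_groups):
    """Mark each syllable as 'sandhi' or 'lexical' by its position in its tone group.

    tone_groups is a list of groups; each group is a list of romanized words
    whose syllables are joined by hyphens (an assumption of this sketch).
    """
    labelled = []
    for group in tone_groups:
        # Flatten the words of one group into syllables, dropping punctuation.
        syllables = [s for word in group for s in word.strip(".,").split("-")]
        last = len(syllables) - 1
        for i, syllable in enumerate(syllables):
            labelled.append((syllable, "lexical" if i == last else "sandhi"))
    return labelled


# Hand-segmented sentence: [A-bi2] [beh khi3 Tai5-pak.]
sentence = [["A-bi2"], ["beh", "khi3", "Tai5-pak."]]
for syllable, tone in label_syllables(sentence):
    print(f"{syllable}: {tone}")
```

Only bi2 and pak, the final syllables of the two groups, keep their lexical tone in this sketch; which sandhi tone the other syllables actually receive is left out, since that mapping is not given here.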
Tone sandhi is a phonological change that occurs in tonal languages such as Mandarin and some other languages spoken in China, in which an individual word changes its tone according to the pronunciation of adjacent words in a sentence. Tone sandhi in Taiwanese, although complex, follows rules. In addition, tones, semantics and syntactic structure are closely related in Taiwanese: a listener can tell exactly what a word means from its tone phase, even among otherwise identical sentences. For example, the tone phase of mua5-a2 (sesame) gives the following two sentences distinct meanings.
Mua5-a2 (lexical tone) tua7 e5 sio-piann2. (The clay oven roll with big sesame.)
Mua5-a2 (context tone) tua7 e5 sio-piann2. (The clay oven roll as big as a sesame.)
Another example is that the tone phase of "tsit-king (the unit of a house)" depends on its context in a sentence.
Tsit-king (lexical tone) u7 tsai-hue. (There are flowers around the house.)
Tsit-king (context tone) u7 tsai-hue e5 tshu3 si7 guan2-tau. (The house with many flowers is mine.)
In general, a word or phrase inserted into a Taiwanese sentence may take a lexical form, a context form, or the form of a whole tone group. Whatever the form of the insertion, the resulting sentence can still be decomposed into tone groups.
The following sentences show the distinctive tone-group structure of Taiwanese through several ways of inserting words into a sentence. The original sentence consists of two tone groups: [A-bi2] [beh khi3 Tai5-pak.] (Amy is going to Taipei.)
Inserting a context word (siunn7): [A-bi2] [siunn7 beh khi3 Tai5-pak.] (Amy wants to go to Taipei.) The number of tone groups does not change.
Inserting a lexical word (pai3-it): [A-bi2] [pai3-it] [beh khi3 Tai5-pak.] (Amy is going to Taipei on Monday.) The number of tone groups increases to three.
Inserting a tone group (tse7 gu5-tshia): [A-bi2] [beh tse7 gu5-tshia] [khi3 Tai5-pak.] (Amy is going to Taipei by ox-cart.) The number of tone groups increases to three.
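The three insertions can be replayed with a small Python sketch. It assumes each word has already been tagged as taking its lexical or its context tone (in the real system that decision is the job of the tone group parser), and it closes a tone group at every lexical-tone word. The function name split_tone_groups and the tuple encoding are hypothetical.

```python
def split_tone_groups(tagged_words):
    """Group (word, mark) pairs into tone groups; a 'lexical' mark closes a group."""
    groups, current = [], []
    for word, mark in tagged_words:
        current.append(word)
        if mark == "lexical":
            groups.append(current)
            current = []
    if current:
        # Trailing context words with no closing lexical word are kept as-is.
        groups.append(current)
    return groups


base = [("A-bi2", "lexical"), ("beh", "context"),
        ("khi3", "context"), ("Tai5-pak", "lexical")]

# Insert a context word, a lexical word, and a whole tone group after A-bi2.
with_context = base[:1] + [("siunn7", "context")] + base[1:]
with_lexical = base[:1] + [("pai3-it", "lexical")] + base[1:]
with_group = base[:1] + [("beh", "context"), ("tse7", "context"),
                         ("gu5-tshia", "lexical"),
                         ("khi3", "context"), ("Tai5-pak", "lexical")]

for name, words in [("original", base), ("context word", with_context),
                    ("lexical word", with_lexical), ("tone group", with_group)]:
    groups = split_tone_groups(words)
    print(name, len(groups), groups)
```

Under these hand-assigned tags the sketch yields two groups for the original and the context-word sentence and three groups for the other two, matching the bracketed segmentations above.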
Because tone sandhi processing is so important to Taiwanese sentences, the tone group parser is one of the most complicated and challenging parts of a Taiwanese text-to-speech system. Taiwanese Speech Notepad has three major components: a tone group parser, a speech synthesizer and a speech engine. In general, implementing the speech synthesizer or the speech engine is a matter of programming technique, whereas the tone group parser relies more on knowledge representation and the application of artificial intelligence. Fortunately, we found a way to extract linguistic expertise from human experts and transfer it into a well-organized knowledge base. A symbol system designed to encode language expertise and heuristic knowledge was attached to a corpus. Each symbol consists of three attributes: default tone value, part of speech and mode mark. The corpus works together with a rule-based sandhi processor to pick the correct tone among the possible tones for each word in a sentence. Through the symbol system and rule inference, homonyms and words with multiple parts of speech, such as ti7 or be2 in the following sentences, can be assigned the correct tone during tone sandhi processing.
Ti7 (chopstick, noun, lexical tone) khng3 ti7 (at/on/in, preposition, context tone) ann2-na5-a2 lai7. (The chopsticks were in a bowl basket.)
Tsit-tsiah be2 (horse, noun, lexical tone) be2 (buy, verb, context tone) beh kah goo7-ban7. (The horse was bought for 50000 bucks.)
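A toy Python version of such a lookup may make the idea concrete. Everything in it is an assumption made for illustration: the Symbol class, the tiny CORPUS table, the "lexical"/"context" encoding of the default tone value, the part-of-speech labels for khng3 and tsit-tsiah, and above all the single selection rule (prefer the noun reading at the start of a sentence or after a measure phrase, otherwise the non-noun reading). The mode mark is not defined in this article, so it is kept as an opaque string. The actual knowledge base and rules of Taiwanese Speech Notepad are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    default_tone: str    # "lexical" or "context" (assumed encoding)
    pos: str             # part of speech
    mode_mark: str = ""  # meaning not specified in the article; left opaque


# Hypothetical corpus entries: each word maps to its candidate symbols.
CORPUS = {
    "ti7": [Symbol("lexical", "noun"), Symbol("context", "preposition")],
    "be2": [Symbol("lexical", "noun"), Symbol("context", "verb")],
    "khng3": [Symbol("context", "verb")],
    "tsit-tsiah": [Symbol("context", "measure")],
}


def pick_symbols(words):
    """Choose one symbol per word with a single toy rule (not the real rule base)."""
    chosen = []
    for word in words:
        candidates = CORPUS[word.lower()]
        previous_pos = chosen[-1].pos if chosen else None
        # Toy rule: expect a noun sentence-initially or after a measure phrase.
        want_noun = previous_pos in (None, "measure")
        matches = [s for s in candidates if (s.pos == "noun") == want_noun]
        chosen.append(matches[0] if matches else candidates[0])
    return chosen


for sentence in (["Ti7", "khng3", "ti7"], ["Tsit-tsiah", "be2", "be2"]):
    for word, symbol in zip(sentence, pick_symbols(sentence)):
        print(word, symbol.pos, symbol.default_tone)
```

With this rule the first ti7 and the first be2 come out as nouns with lexical tone, and the second occurrences as a preposition and a verb with context tone, as in the two example sentences. A real rule base would need far more context than the previous word.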
Through our study of the structure of Taiwanese, we found that its production system is clear and definite. This logical, systematic natural language has been developed and handed down over several thousand years without a writing system of its own. The tone group must be one of the most important elements in Taiwanese language acquisition. While developing Taiwanese Speech Notepad, we found that the tone group is not only the basic syntactic unit of a sentence but also a distinct semantic unit and a prosodic boundary in Taiwanese. This finding suggests that modern Mandarin, which lacks the regularity of tone groups, differs from Taiwanese in this respect.
Taiwanese Speech Notepad is more than a text-to-speech system. It is also a simulator of how tone sandhi within a sentence is acquired in Taiwanese. By adjusting the corpus and the rules, we may come to understand how people handle Taiwanese tone sandhi using long-term and short-term memory. In other words, the Taiwanese tone group parser, which is based on language learning and acquisition theory, offers an experimental environment for simulating knowledge engineering and human thinking. The formation process and derivational structure of Taiwanese may be one of the keys to artificial intelligence research.
The way the brain processes Taiwanese tone sandhi is so remarkable that our simulator has been under construction for the last decade. A work-in-progress version of the artificial tone group parser, which includes a knowledge base and an executable program for Microsoft Windows (XP/Win7), can be obtained for evaluation by sending a request to poirotdavid@yahoo.com.tw.

Latest Information:
http://vikonlab.tripod.com/mltmenu3e.htm (in English)
http://vikonlab.tripod.com/mltmenu3.htm (in Chinese)
http://www.amazon.co.jp/gp/product/4768456316?ie=UTF8&tag=kinyobihondana-22&linkCode=as2&camp=247&creative=1211&creativeASIN=4768456316 (in Japanese)

Sunday, July 5, 2009

Phì-jû Tsia̍h-tsiú (譬喻食酒)



Tâi-uân Hú-siâⁿ Kàu-huē-Pò (台灣府城教會報, Taiwan Prefectural City Church News) (1886)

2008-10-05, adapted and translated by 張東瀛


Tsit-phiⁿ bûn-tsiuⁿ pún-tué sī Pe̍h-Uē-Jī, guân-bûn khan tī Tâi-uân Hú-siâⁿ Kàu-huē-Pò, 1895-Nî, Tsap-Gueh, Tē 127 Hō, 95 ia̍h. Huan-tsò Tâi-Uân-Lô-Má-Jī ê sî-tsūn, uī tio̍h beh hōo gú-im khah tsiap-kīn hiān-tāi Tâi-Gú, ū tsò tām-po̍h-á siu-kái.






Phì-jû Tsia̍h-tsiú

Lán ê sí-tsóo A-tong thâu-tsi̍t-pái khì tsai phô-tô-tshiū, hit-sî Môo-kuí tsiū kīn-uá tshiū-thâu lâi thâi tsi̍t tsiah khóng-tshiok, suà--lo̍h-lâi tsiong I ê hueh khì ak phô-tô ê tshiū-kin. Āu-lâi phô-tô-tshiū seⁿ-ki huat-hio̍h, Môo-kuí koh kīn-uá tshiū-thâu, koh thâi tsi̍t-tsiah kâu, tsiong kâu-hueh ak-lo̍h-khì. Thîng bô luā-kú hit tsâng phô-tô-tshiū ū kiat kué-tsí kui-lia̍p-kui-lia̍p, Môo-kuí tsiū koh lâi, ia̍h koh thâi tsi̍t tsiah sai, iū-koh tsiong sai-hueh ak-lo̍h-khì. Kàu-bué hiah-ê kué-tsí tsha-put-to tsiâu-si̍k, Môo-kuí tsiū koh lâi, tsit-pái thâi tsi̍t tsiah ti, iû-guân tsiong ti-hueh khì ak phô-tô-tshiū.

Guân-tsá ê kāu-tsiú sī tuì phô-tô ê kué-tsí kik--tshut-lâi, sóo-í tsia̍h-tsiú--ê sī ná-teh lim hit sì-tsióng khîm-siù ê pún-sìng. Án-ni tú-á tsia̍h-tsiú liáu ê lâng, I ê bīn tō âng-âng, ū tuā khì-khài ê bôo-iūⁿ, lóng tsin iông-iông tik-ì, káⁿ-ná khóng-tshiok án-ne. Tán-hāu bô luā-kú kàu lio̍h-á-tsuì, tsiū tiô-kha tah-tshiú, thiàu-kuè-lâi thiàu-kuè-khì, ná kâu ê khuán-sit. Āu-lâi i koh khah tsuì, tsin ióng-bíng beh tsio lâng lâi kap i su-iâⁿ. Tsit-tsām tō ná sai kāng-khuán. Kàu-bué-tshiú tuā tsuì, siān-siān lóng bô la̍t, kan-ta ài khì kō thôo-muâi teh khùn. Tse sī ti ê bôo-iūⁿ.


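[Translator's note]
This article was originally written in Pe̍h-Uē-Jī; the original appeared in Tâi-uân Hú-siâⁿ Kàu-huē-Pò (Taiwan Prefectural City Church News), October 1895, No. 127, page 95.
When it was converted to Taiwanese Romanization, a few small changes were made to bring the pronunciation closer to modern Taiwanese.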

A Parable of Drinking (譬喻食酒)

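Our first ancestor Adam went to plant a grapevine for the first time. The Devil then killed a peacock near the foot of the vine and poured its blood onto the vine's roots. Later, when the vine put out branches and leaves, the Devil killed a monkey at the foot of the vine and poured the monkey's blood over it. Before long, when the vine began to bear clusters of fruit, the Devil came again, killed a lion, and again poured the lion's blood over it. Finally, when the fruit was almost ripe, the Devil came once more; this time he killed a pig and, as before, poured the pig's blood onto the vine.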

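In the old days strong liquor was pressed from grapes, so a person who drinks is, as it were, drinking the nature of those four beasts. Thus someone who has just begun to drink is flushed in the face, carries himself like a hero and is pleased with himself, just like a peacock. Before long, when he is slightly tipsy, he waves his arms and stamps his feet and hops back and forth, like a monkey. Then he gets drunker still and, full of courage, goes around picking fights; at this point he is like a lion. In the end he is dead drunk, limp all over, and lies on the ground fast asleep; this is the look of a pig.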




 
This website is sponsored by VIKON Corp.
Copyright July 1996. All rights reserved.

Last update : 2008-08-20
Any suggestions? Please write to:

  vikontony@gmail.com