Details of Database of Faculty members and Researchers

A preliminary survey of the vocabulary used in the audio of educational TV programs for elementary school students

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2111.html#abs-2111-03629

淺井優介, 北村達也, 川村よし子

甲南大学紀要知能情報学編 13 ( 1 ) 67 - 75 2020.7

Joint Work

Publisher：甲南大学

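To provide basic data for creating teaching materials for children who need Japanese language education, we transcribed the audio of NHK E-Tele educational programs for elementary school students and built a prototype vocabulary list. Using speech recognition, we transcribed 17 programs for lower grades (510 minutes in total) and 19 programs for upper grades (570 minutes in total), and then sorted the words that appeared in them in descending order of a usefulness index to create the vocabulary list. The resulting list contained easier words than a vocabulary list built in previous work from a corpus of written language, which demonstrates the effectiveness of the proposed methodology.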

DOI： 10.14990/00003648

Other Link： http://doi.org/10.14990/00003648
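
The ranking step described in the abstract above can be sketched roughly as follows. The abstract does not define the usefulness index, so the score used here (total frequency multiplied by the fraction of programs in which a word occurs) is purely an assumption for illustration, as are the function name `rank_by_usefulness` and the sample words. The sketch also assumes the transcripts have already been split into words.

```python
from collections import Counter


def rank_by_usefulness(programs):
    """Rank words by a hypothetical usefulness score, highest first.

    `programs` is a list of word lists, one list per transcribed program.
    The score (frequency x program coverage) is an assumed stand-in for
    the usefulness index, which the abstract does not specify.
    """
    total_counts = Counter()      # how often each word occurs overall
    program_counts = Counter()    # in how many programs each word occurs
    for words in programs:
        total_counts.update(words)
        program_counts.update(set(words))

    number_of_programs = len(programs)
    scores = {
        word: total_counts[word] * program_counts[word] / number_of_programs
        for word in total_counts
    }
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical, already tokenized transcripts of three programs.
sample_programs = [
    ["みず", "みず", "たべる"],
    ["みず", "はな"],
    ["はな", "はな", "はな"],
]
ranked_words = rank_by_usefulness(sample_programs)
```

Weighting by coverage is one simple way to keep a word that is frequent in only a single program from dominating the list; the index in the actual study may differ.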

System Integration for Component-Based Manzai Robots with Improved Scalability

Tomohiro Umetani, Satoshi Aoki, Tatsuya Kitamura, Akiyo Nadamoto

Journal of Robotics and Mechatronics 32 ( 2 ) 459 - 468 2020.4

Publisher：Fuji Technology Press Ltd.

This paper describes the development of a system that integrates the control systems of Manzai robot duos, which automatically generate Manzai scripts from Internet articles based on given keywords, and improvements to the scalability of the integrated control system. Component-based Manzai robots controlled by RT-Middleware have been developed. However, conventional Manzai robot systems, whose control systems are developed individually, have difficulties in interface integration, system maintenance, and scalability. In this study, we built a Manzai robot system with high reusability, maintainability, and scalability by using the RT components of RT-Middleware to separate the common part from the hardware-dependent part. We also verified the reusability and scalability of the hardware-constrained component groups by implementing the Manzai robot control system on ready-made robots with different types of mechanisms. The implementation results demonstrate the effectiveness of the developed Manzai robot control system.

DOI： 10.20965/jrm.2020.p0459

Occurrence conditions of delayed fundamental frequency fall in Japanese isolated word speech of Tokyo dialect speakers Reviewed

Tatsuya Kitamura, Yuta Amakawa, Hiroaki Hatano

Journal of the Phonetic Society of Japan 23 ( 0 ) 165 - 173 2019.12

Joint Work

Publisher：The Phonetic Society of Japan

In the delayed fundamental frequency (F0) fall, or late fall, phenomenon, the F0 fall occurs on the post-accented mora in Japanese speech. This study conducted a large-scale investigation of the occurrence conditions of the delayed F0 fall for 230 words spoken by 48 Tokyo-dialect Japanese speakers (21 males and 27 females). The results showed that the delayed F0 fall occurred more frequently (1) in female speech than in male speech, (2) in initial-accented words than in middle-accented words, (3) in longer words, and (4) in words in which the accented mora was followed by a mora with a back vowel.

DOI： 10.24467/onseikenkyu.23.0_165

Further observations on a principal components analysis of head-related transfer functions Reviewed

Parham Mokhtari, Hiroaki Kato, Hironori Takemoto, Ryouichi Nishimura, Seigo Enomoto, Seiji Adachi, Tatsuya Kitamura

Scientific Reports 9 7477 2019.5

Joint Work

DOI： 10.1038/s41598-019-43967-0

Survey of Japanese undergraduate and graduate students' self-concept of clumsiness in speech Reviewed

75 ( 3 ) 118 - 124 2019.3

Joint Work

A questionnaire-based survey was conducted on native Japanese undergraduate and graduate students in 15 universities and institutes in Japan to evaluate their feelings of clumsiness while speaking during daily conversation. Responses from 1,831 students without a known history of speech, hearing, or language disorders were analyzed. The results showed that 31.0% of the participants felt "clumsy" or "rather clumsy" while speaking during daily conversation. Analysis by gender revealed that 35.5% of the male students and 24.4% of the female students felt "clumsy" or "rather clumsy" while speaking. The students who had focused on science and maths in high school tended to feel a greater degree of clumsiness in speech than those who had focused on humanities. Students who felt clumsy while speaking tended to think that their speech was often misunderstood. Over 90% of the participants expressed interest in improving their articulation.

DOI： 10.20697/jasj.75.3_118

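Analysis of the usage of a Japanese language education support system that identifies the lesson in which each textbook word first appears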

北村達也

甲南大学紀要知能情報学編 11 ( 2 ) 209 - 215 2019.2

Single Work

Publisher：甲南大学

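We developed a system that automatically identifies the lesson in which each word in a Japanese language textbook first appears (its first-appearance lesson) and surveyed how the system is used. The system received more than 10,000 accesses in the period from April 1 to July 31, 2018, about 90% of which came from within Japan. In a questionnaire survey of 100 users, about 60% of the users were Japanese language teachers by profession, and about 80% answered that the availability of such a system influences their choice of textbook.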

DOI： 10.14990/00003307

Other Link： http://doi.org/10.14990/00003307
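
The core idea of first-appearance lesson detection described above can be illustrated with a minimal sketch. It is not the implementation of the system itself: the function name `first_appearance_lessons`, the data layout, and the sample vocabulary are all hypothetical, and the sketch assumes that each lesson's text has already been segmented into words (Japanese text would normally need a morphological analyzer for that step).

```python
def first_appearance_lessons(lessons):
    """Map each word to the number of the first lesson in which it appears.

    `lessons` maps a lesson number to the list of words in that lesson.
    """
    first_seen = {}
    # Visit lessons in ascending order so the earliest lesson wins.
    for lesson_number in sorted(lessons):
        for word in lessons[lesson_number]:
            # setdefault keeps the existing entry if the word was seen before.
            first_seen.setdefault(word, lesson_number)
    return first_seen


# Hypothetical textbook: lesson number -> words found in that lesson.
sample_lessons = {
    1: ["わたし", "学生", "です"],
    2: ["学生", "本", "これ"],
    3: ["本", "読みます", "わたし"],
}
first_lessons = first_appearance_lessons(sample_lessons)
```

Looking up a word in `first_lessons` then tells a teacher whether learners at a given lesson can be expected to know it already.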

Morphological characteristics of male and female hypopharynx: A magnetic resonance imaging-based study Reviewed

Ju Zhang, Kiyoshi Honda, Jianguo Wei, Tatsuya Kitamura

Journal of the Acoustical Society of America 145 734 2019.2

Joint Work

DOI： 10.1121/1.5089220

Development and evaluation of a biofeedback system using skin vibration during tube phonation Reviewed

川村直子, 北村達也, 城本修

音声言語医学 59 ( 4 ) 334 - 341 2018.9

Joint Work

DOI： 10.5112/jjlp.59.334

Audio-Visual Teaching Aid for Instructing English Stress Timings

Tatsuya Kitamura, Ryo Nagata, Kotaro Funakoshi

甲南大学紀要知能情報学編 11 ( 1 ) 1 - 17 2018.7

Joint Work

Publisher：甲南大学

This study proposed and evaluated an audio-visual teaching aid for teaching the rhythm of spoken English. The teaching aid conveys the stress timing of English through the movements of a circle marker on a PC screen. Native Japanese participants practiced English sentences with and without the teaching aid, and their speech was recorded before and after the practice. Analyses of the recorded speech showed that the teaching aid could improve the learning of English stress timing.

DOI： 10.14990/00003196

Other Link： http://doi.org/10.14990/00003196

Replacement of sensor cables for reducing effects on articulation in the Northern Digital Incorporated's Wave electromagnetic articulography system Reviewed

Tatsuya Kitamura, Yukiko Nota, Michiko Hashi, Hiroaki Hatano

JASA Express Letters 143 ( 3 ) EL154 - EL159 2018.3

Joint Work

This study attempted to improve the five-degrees-of-freedom sensors of the Northern Digital Incorporated's Wave electromagnetic articulography system by replacing their cables with thinner and more flexible cables to reduce interference in articulation. Measurement errors and data loss rates were compared between the original and the proposed sensors. The proposed sensors showed twofold tracking accuracy and data loss rates compared to the original sensors in an experiment using a crank-rocker mechanism. Data loss rates of the proposed sensors increased in articulatory data collection from four speakers. The proposed sensors have been made available commercially.

DOI： 10.1121/1.5025167

Construction of a self-contained, component-driven desktop robot environment

梅谷智弘, 清瀬大貴, 榊原洋之, 青木哲, 北村達也

計測自動制御学会論文誌 54 ( 1 ) 126 - 128 2018

Joint Work

DOI： 10.9746/sicetr.54.126

Scalable Component-Based Manzai Robots as Automated Funny Content Generators

Tomohiro Umetani, Satoshi Aoki, Kazuhiro Akiyama, Ryo Mashimo, Tatsuya Kitamura, Akiyo Nadamoto

Journal of Robotics and Mechatronics 28 ( 6 ) 862 - 869 2016.12

Joint Work

DOI： 10.20965/jrm.2016.p0862

Implicit Communication Robots based on Automatic Scenario Generation using Web Intelligence

MASHIMO Ryo, KITAMURA Tatsuya, UMETANI Tomohiro, NADAMOTO Akiyo

International Journal of Web Information Systems 12 ( 3 ) 312 - 335 2016.9

Joint Work

DOI： 10.1108/IJWIS-04-2016-0017

A text editor with a word classification function based on word lists Reviewed

北村達也

日本語学 ( 8 ) 80 - 87 2016.8

Single Work

Manzai robot system with scalability based on distributed software components

Tomohiro Umetani, Satoshi Aoki, Kazuhiro Akiyama, Ryo Mashimo, Tatsuya Kitamura, Akiyo Nadamoto

2015 International Symposium on Micro-NanoMechatronics and Human Science, MHS 2015 2016.3

Joint Work

Publisher：Institute of Electrical and Electronics Engineers Inc.

This paper describes a scalable manzai robot system developed on the basis of distributed software components. Manzai is a traditional Japanese style of stand-up comedy that is usually performed by two comedians. The manzai robots automatically generate their manzai scripts from web news articles related to keywords given by the audience and from web search results, and then perform the scripts. Each robot is controlled by distributed RT components executed on a Raspberry Pi controller. The RT components control the manzai robots synchronously. The paper focuses on the scalability of the manzai robot system. Experimental results show the feasibility of manzai performance robots with scalable robot system functions.

DOI： 10.1109/MHS.2015.7438343

Observation of the relationship between articulatory movement and palate shape using a magnetic sensor system

北村達也, 能田由紀子, 吐師道子, 波多野博顕, 梅谷智弘

音声言語医学 57 ( 1 ) 52 - 52 2016.1

Joint Work

Publisher：日本音声言語医学会

Human-Robots Implicit Communication based on Dialogue between Robots using Automatic Generation of Funny Scenarios from Web

Ryo Mashimo, Tomohiro Umetani, Tatsuya Kitamura, Akiyo Nadamoto

ELEVENTH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN ROBOT INTERACTION (HRI'16) 327 - 334 2016

Joint Work

Publisher：ASSOC COMPUTING MACHINERY

Numerous studies have examined communication robots that communicate with people, but it is difficult for robots to communicate with people smoothly. We call the communication style based on dialogue between robots "human-robot implicit communication". As described herein, we propose Manzai-robots whose interaction style is human-robot implicit communication based on a scenario generated automatically from web news. Our generated Manzai scenario consists of snappy patter and misunderstanding dialogue based on four kinds of gaps in the structure of funny points. Our aim is for people to feel familiarity through smooth human-robot communication using dialogue between robots based on a Manzai scenario. We conducted three kinds of experiments to assess (1) the effectiveness of automatic creation of Manzai scenarios for the robots, (2) the effectiveness of the Manzai-robots as a medium, and (3) the effectiveness of types of familiarity for Manzai-robots. Based on the results, we measured the familiarity and smooth communication of our Manzai-robots.

Automatic generation of Japanese traditional funny scenario from web content based on web intelligence

Ryo Mashimo, Tomohiro Umetani, Tatsuya Kitamura, Akiyo Nadamoto

17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Proceedings 2015.12

Joint Work

Publisher：Association for Computing Machinery, Inc

Today there is much information and knowledge on the internet, and many studies have examined the extraction of many kinds of knowledge from the internet. In addition, numerous studies have examined entertainment robots that communicate with people, but it is difficult for robots to communicate smoothly with people. We specifically examine communication between robots based on dialogue. Creating a dialogue-based scenario for the robots automatically is difficult because the dialogue requires many kinds of knowledge. We therefore use knowledge from the web to create scenarios automatically. As described herein, we propose a system that generates dialogue scenarios automatically from web news articles in real time. Our system uses the metaphor of Manzai, a traditional Japanese form of humorous comedy. Our generated Manzai scenario consists of snappy patter and misunderstanding dialogue based on the gaps in our structure of funny points. We create communication robots to amuse people with our generated humorous robot dialogue scenarios.

DOI： 10.1145/2837185.2837232

Tongue Movement of Healthy Adults who Feel Clumsy Articulating Alveolar Flaps

TACHIKAWA Wataru, OZAWA Yoshiaki, HASHI Michiko, KITAMURA Tatsuya, NOTA Yukiko

Journal of the Phonetic Society of Japan 19 ( 3 ) 50 - 56 2015.12

Joint Work

Publisher：The Phonetic Society of Japan

Tongue movement of healthy young adults who feel clumsy articulating alveolar sounds in daily conversation was measured using the WAVE speech research system. Tongue blade movements of these speakers during repetitive production of Japanese /ra/ showed reduction of speed and movement range compared with those who feel no clumsiness, despite the absence of any organic or neurological abnormalities. Such differences may suggest underlying differences of fine and rapid motor controls required for smooth speech production which may be related to their awareness of clumsiness in articulation.

DOI： 10.24467/onseikenkyu.19.3_50

Crucial Prosodic Features in Japanese Learners' Pronunciation: Evidence from Naturalness Judgments of Synthetic Speech Reviewed

Rongna A, Ryoko Hayashi, Tatsuya Kitamura

Journal of the Phonetic Society of Japan 19 ( 3 ) 37 - 42 2015.12

Joint Work

The present study reports native speakers' impressions of JFL learners' utterances before and after shadowing/repeating training. Evaluation was also done for prosodically modified synthesized stimuli in order to examine the crucial prosodic cues. The results suggest that both durational patterns and pitch patterns are important for the utterances to be heard as natural Japanese, but durational patterns may be more important. Moreover, shadowing training appears to improve mora-timed rhythm. The results of the present study could provide useful suggestions for developing pronunciation training for Japanese pronunciation and speech education.

DOI： 10.24467/onseikenkyu.19.3_37

Non-contact measurement of facial surface vibration patterns during singing by scanning laser Doppler vibrometer

Tatsuya Kitamura, Keisuke Ohtani

Frontiers in Psychology, section Performance Science 6 2015.11

Joint Work

Publisher：FRONTIERS MEDIA SA

This paper presents a method of measuring the vibration patterns on facial surfaces by using a scanning laser Doppler vibrometer (LDV). The surfaces of the face, neck, and body vibrate during phonation and, according to Titze (2001), these vibrations occur when aerodynamic energy is efficiently converted into acoustic energy at the glottis. A vocalist's vibration velocity patterns may therefore indicate his or her phonatory status or singing skills. LDVs enable laser-based non-contact measurement of the vibration velocity and displacement of a certain point on a vibrating object, and scanning LDVs permit multipoint measurements. The benefits of scanning LDVs are that they do not affect the vibrations of measured objects and that they can rapidly measure the vibration patterns across planes. A case study is presented herein to demonstrate the method of measuring vibration velocity patterns with a scanning LDV. The objective of the experiment was to measure the vibration velocity differences between the modal and falsetto registers while three professional soprano singers sang sustained vowels at four pitch frequencies. The results suggest that pitch frequency may be correlated with vibration velocity. However, further investigations are necessary to clarify the relationships between vibration velocity patterns and phonation status and singing skills.

DOI： 10.3389/fpsyg.2015.01682

Measurement of perceptual speaker similarity for sentence speech in ATR speech database Reviewed

Journal of the Acoustical Society of Japan 71 ( 10 ) 516 - 525 2015.10

Joint Work

DOI： 10.20697/jasj.71.10_516

Improvement of five-degree-of-freedom sensors for Northern Digital Incorporated's Wave speech research system

Tatsuya Kitamura, Yukiko Nota, Michiko Hashi, Hiroaki Hatano

Acoustical Science and Technology 36 ( 4 ) 347 - 350 2015

Joint Work

Publisher：ACOUSTICAL SOC JAPAN

DOI： 10.1250/ast.36.347

Manzai Robots: Entertainment Robots Based on Auto-Created Manzai Scripts from Web News Articles

UMETANI Tomohiro, MASHIMO Ryo, NADAMOTO Akiyo, KITAMURA Tatsuya, NAKAYAMA Hirotaka

J Robot Mechatron 26 ( 5 ) 662 - 664 2014.10

Joint Work

DOI： 10.20965/jrm.2014.p0662

Verification of reproducibility of measurements of skin vibration during singing by scanning laser-Doppler vibrometer Reviewed

55 ( 2 ) 167 - 172 2014.4

Single Work

DOI： 10.5112/jjlp.55.167

Acoustic and articulatory characteristics of English reduced vowels uttered by native speakers of English and Japanese : Analysis based on X-ray microbeam speech production database

Hatano Hiroaki, Kitamura Tatsuya

The Journal of the Acoustical Society of Japan 70 ( 3 ) 106 - 113 2014.3

Joint Work

Publisher：The Acoustical Society of Japan

DOI： 10.20697/jasj.70.3_106

発話観測システムNDI Waveの改良型センサを用いた子音構音の観測

北村達也, 能田由紀子, 波多野博顕, 吐師道子, 西谷実

音声言語医学 55 ( 1 ) 59 - 59 2014.1

Joint Work

Publisher：日本音声言語医学会

Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods

Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 408 - 412 2014

Joint Work

Publisher：ISCA-INT SPEECH COMMUNICATION ASSOC

Acoustic characteristics of the vocal tract have been investigated extensively in the literature using a one-dimensional (1D) acoustic simulation method. Because the 1D method assumes plane wave propagation only, it is recognized to be valid only in the low frequency region (below about 4 or 5 kHz). Recently, a three-dimensional (3D) acoustic simulation method was developed, to obtain more precise acoustic characteristics of the vocal tract. In the present study, from a male's vocal tract shapes, transfer functions were calculated using the 1D and 3D methods and compared with each other to evaluate the valid frequency range of the 1D method. As a result, when acoustic effects of the piriform fossae were considered in the 1D method, the transfer functions agreed with each other up to 7 kHz (ignoring small dips). The 3D method showed that a deep dip was generated at around 8 kHz by the transverse resonance mode in the pharynx. Above this dip frequency, the transfer functions disagreed with each other. Thus, the 1D method is valid up to 7 kHz for this subject. Because this subject has a relatively large vocal tract, in general the upper limit of the valid frequency range could exceed 8 kHz.

Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information

Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 870 - 874 2014

Joint Work

Publisher：ISCA-INT SPEECH COMMUNICATION ASSOC

A highly reproducible method for estimating vocal tract length (VTL) and a text-independent VTL estimation method are proposed, based on a Japanese vowel database spoken by 385 male and female speakers aged 6 to 56 and another vowel database with MRI-based vocal tract shape information. The proposed methods are based on an interference-free power spectral representation and systematic suppression of biasing factors. MRI data are used to calibrate the VTL estimates so that they are expressed in physically meaningful units. These databases are normalized based on the estimated VTL information to provide a reference template, which is used to implement the text-independent VTL estimation method. A prototype system for text-independent estimation of VTL is implemented in MATLAB and runs faster than real time on a PC.

Acoustic interaction between the right and left piriform fossae in generating spectral dips Reviewed

Hironori Takemoto, Seiji Adachi, Parham Mokhtari, Tatsuya Kitamura

Journal of the Acoustical Society of America 134 ( 4 ) 2955 - 2964 2013.10

Joint Work

Publisher：ACOUSTICAL SOC AMER AMER INST PHYSICS

It is known that the right and left piriform fossae generate two deep dips on speech spectra and that acoustic interaction exists in generating the dips: if only one piriform fossa is modified, both the dips change in frequency and amplitude. In the present study, using a simple geometrical model and measured vocal tract shapes, the acoustic interaction was examined by the finite-difference time-domain method. As a result, one of the two dips was lower in frequency than the two independent dips that appeared when either of the piriform fossae was occluded, and the other dip was higher in frequency than the two dips. At the lower dip frequency, the piriform fossae resonated almost in opposite phase, while at the higher dip frequency, they resonated almost in phase. These facts indicate that the piriform fossae and the lower part of the pharynx can be modeled as a coupled two-oscillator system whose two normal vibration modes generate the two spectral dips. When the piriform fossae were identical, only the higher dip appeared. This is because the lower mode is not acoustically coupled to the main vocal tract enough to generate an absorption dip.

DOI： 10.1121/1.4818744

Effects of prosody conversion of Japanese learners' speech on naturalness evaluation

阿栄娜, 林良子, 北村達也

日本音響学会2013年秋季研究発表会講演論文集 425 - 426 2013.9

Joint Work

Naturalness on Japanese pronunciation before and after shadowing training and prosody modified stimuli

Rongna A, Ryoko Hayashi, Tatsuya Kitamura

Proceedings of Interspeech 2013 Satellite workshop on Speech and Language Technology in Education 143 - 146 2013.8

Joint Work

Construction and trial operation of a text and difficulty assessment system for learners of Japanese

川村よし子, 北村達也

Journal CAJLE 14 18 - 30 2013.7

Joint Work

Timing differences in articulation between voiced and voiceless stop consonants: An analysis of cine-MRI data

Masako Fujimoto, Tatsuya Kitamura, Hiroaki Hatano, Ichiro Fujimoto

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 955 - 958 2013

Joint Work

Publisher：ISCA-INT SPEECH COMMUNICATION ASSOC

Laryngeal and supralaryngeal articulators work in coordination to produce speech sounds. In order to study differences in supralaryngeal manifestations of voiced and voiceless consonants, we compared the tongue movement during the minimal pair /agise/ and /akise/ using fast MRI movie scanning techniques. The results showed that the tongue displacement starts earlier in /k/ than in /g/ for many of the speakers of Tokyo Japanese. This agrees with our previous findings for speakers of other dialects. These results suggest that many Japanese speakers actively differentiate supralaryngeal articulation according to the voicing of the consonants, raising the tongue earlier in voiceless ones. This movement is presumably to ensure the voicelessness of the consonant. The present study also supplies evidence for the usefulness of a constructive approach for physical modeling.

Differences in articulatory movement between voiced and voiceless stop consonants Reviewed

Ryosuke O. Tachibana, Tatsuya Kitamura, Masako Fujimoto

Acoustical Science and Technology 33 ( 6 ) 391 - 393 2012.11

Joint Work

DOI： 10.1250/ast.33.391

Measurement of vibration velocity pattern of facial surface during phonation using scanning vibrometer

Tatsuya Kitamura

Acoustical Science and Technology 33 ( 2 ) 126 - 128 2012.3

Single Work

DOI： 10.1250/ast.33.126

A Method for Predicting Stressed Words in Teaching Materials for English Jazz Chants Reviewed

NAGATA Ryo, FUNAKOSHI Kotaro, KITAMURA Tatsuya, NAKANO Mikio

IEICE Trans Inf Syst (Inst Electron Inf Commun Eng) E95.D ( 11 ) 2658 - 2663 2012

Joint Work

Publisher：IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

To acquire a second language, one must develop an ear and tongue for the correct stress and intonation patterns of that language. In English language teaching, there is an effective method called Jazz Chants for working on the sound system. In this paper, we propose a method for predicting stressed words, which play a crucial role in Jazz Chants. The proposed method is specially designed for stress prediction in Jazz chants. It exploits several sources of information including words, POSs, sentence types, and the constraint on the number of stressed words in a chant text. Experiments show that the proposed method achieves an F-measure of 0.939 and outperforms the other methods implemented for comparison. The proposed method is expected to be useful in supporting non-native teachers of English when they teach chants to students and create chant texts with stress marks from arbitrary texts.

DOI： 10.1587/transinf.E95.D.2658
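
For readers unfamiliar with the F-measure reported in the abstract above, the following minimal sketch shows how it is commonly computed from prediction counts. The abstract does not say which variant was used, so the balanced F-measure (F1, the harmonic mean of precision and recall) is an assumption here; the function name and the counts in the example are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives   # words predicted as stressed
    actual = true_positives + false_negatives      # words actually stressed
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 stressed words found correctly,
# 2 predicted wrongly, 2 missed.
example_score = f_measure(true_positives=8, false_positives=2, false_negatives=2)
```

With these counts, precision and recall are both 0.8, so the balanced F-measure is also 0.8.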

Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers

Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 402 - 405 2012

Joint Work

Publisher：ISCA-INT SPEECH COMMUNICATION ASSOC

We conducted quantitative analyses of a magnetic resonance imaging (MRI) database to examine the correlation between physical measures (vocal tract length and body height) and acoustic parameters (pitch and formant frequencies) of vowels. The vocal tract length was measured from MRI data for the five Japanese vowels produced by fifteen male Japanese speakers between the ages of 24 and 55. The acoustic features were computed from vowel sounds recorded during the scan. The vocal tract length showed a weak positive correlation with the speakers' age (correlation coefficient r = 0.51) but not with the speakers' body height (r = 0.08). There were only weaker correlations between the vocal tract length and the first four formant frequencies, except that F1 and F2 of the vowel /e/ showed negative correlations with the vocal tract length (F1: r = -0.65, F2: r = -0.56). The results suggest that the vocal tract length is one of the dominant factors causing individual differences in the formant frequencies for the vowel /e/, which is produced without forming a strong constriction. Furthermore, the pitch frequency was negatively correlated with the body height (r = -0.61).
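
The correlation coefficients r quoted in the abstract above are presumably Pearson product-moment correlations, although the abstract does not say so explicitly. Under that assumption, a minimal sketch of the calculation looks like this; the function name `pearson_r` and the body height and pitch values are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical values: body height in cm and pitch frequency in Hz.
heights_cm = [165.0, 170.0, 172.0, 178.0, 181.0]
pitch_hz = [132.0, 125.0, 128.0, 112.0, 108.0]
height_pitch_r = pearson_r(heights_cm, pitch_hz)
```

A value near -1 indicates that taller speakers in the sample tend to have lower pitch, a value near 0 indicates no linear relationship, and a value near +1 indicates that the two measures rise together.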

Simulation of the coupling between vocal-fold vibration and time-varying vocal tract

Yosuke Tanabe, Parham Mokhtari, Hironori Takemoto, Tatsuya Kitamura

Journal of the Acoustical Society of America 130 ( 4 ) 2441 2011.10

Joint Work

Study of perceptual factors for speaker identification focusing on perceptual similarity of speaker characteristics

Tsuyoshi Izumida, Tatsuya Kitamura

Acoustical Science and Technology 32 ( 5 ) 216 - 219 2011.9

Joint Work

DOI： 10.1250/ast.32.216

Dental imaging using a magnetic resonance visible mouthpiece for measurement of vocal tract shape and dimension

Tatsuya Kitamura, Hironori Nishimoto, Ichiro Fujimoto, Yasuhiro Shimada

Acoustical Science and Technology 32 ( 5 ) 224 - 227 2011.9

Joint Work

Acoustic analysis of the vocal tract during vowel production by finite-difference time-domain method Reviewed

Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura

Journal of the Acoustical Society of America 128 ( 6 ) 3724 - 3738 2010.12

Joint Work

Publisher：ACOUSTICAL SOC AMER AMER INST PHYSICS

The vocal tract shape is three-dimensionally complex. For accurate acoustic analysis, a finite-difference time-domain method was introduced in the present study. By this method, transfer functions of the vocal tract for the five Japanese vowels were calculated from three-dimensionally reconstructed magnetic resonance imaging (MRI) data. The calculated transfer functions were compared with those obtained from acoustic measurements of vocal tract physical models precisely constructed from the same MRI data. Calculated transfer functions agreed well with measured ones up to 10 kHz. Acoustic effects of the piriform fossae, epiglottic valleculae, and inter-dental spaces were also examined. They caused spectral changes by generating dips. The amount of change was significant for the piriform fossae, while it was almost negligible for the other two. The piriform fossae and valleculae generated spectral dips for all the vowels. The dip frequencies of the piriform fossae were almost stable, while those of the valleculae varied among vowels. The inter-dental spaces generated very small spectral dips below 2.5 kHz for the high and middle vowels. In addition, transverse resonances within the oral cavity generated small spectral dips above 4 kHz for the low vowels.

DOI： 10.1121/1.3502470

Visualisation of hypopharyngeal cavities and vocal-tract acoustic modelling

Kiyoshi Honda, Tatsuya Kitamura, Hironori Takemoto, et al.

Computer methods in Biomechanics and Biomedical Engineering 13 ( 4 ) 443 - 453 2010.7

Joint Work

Publisher：TAYLOR & FRANCIS LTD

The hypopharyngeal cavities consist of the laryngeal cavity and bilateral piriform fossa, constituting the bottom part of the vocal tract near the larynx. Visualisation of these cavities with magnetic resonance imaging (MRI) techniques reveals that during speech, the laryngeal cavity takes the form of a long-neck flask and the piriform fossa takes the form of a goblet of varying shapes: the former diminishes greatly in whispering and the latter disappears during deep inhalation. These cavities have been shown to exert significant acoustic effects at higher frequency spectra. In this study, acoustic experiments were conducted on male and female mechanical vocal tracts, and the results showed that the acoustic effects of those cavities determine the frequency spectra above 2 kHz, giving rise to peaks and zeros. An acoustic model of vowel production was proposed with three components: voice source, hypopharyngeal cavities and vocal tract proper, which provides an effective means of controlling voice quality and expressing individual vocal characteristics.

DOI： 10.1080/10255842.2010.490528

A study of brain activities elicited by synthesized emotional voices controlled with prosodic features Reviewed

Yasuhiro Hamada, Tatsuya Kitamura, Masato Akagi

Journal of Signal Processing 14 ( 4 ) 265 - 268 2010.7

Joint Work

Similarity of effects of emotions on the speech organ configuration with and without speaking

Tatsuya Kitamura

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 909 - 912 2010

Single Work

Publisher：ISCA-INT SPEECH COMMUNICATION ASSOC

In this work we propose and verify a hypothesis on emotional speech production: emotions induce physical and physiological changes in the whole body including changes in the configuration and physical/mechanical properties of the speech organs, regardless of whether or not the person is speaking, and as a side effect, this changes the voice quality. To verify this hypothesis, we measured the configuration of the speech organs of professional actors simulating four emotions (neutral, hot anger, joy, and sadness) with and without speaking by magnetic resonance imaging. The results clearly showed that emotions affect the speech organ configuration, and the same tendency of changes in the speech organ configuration was found regardless of whether or not the person was speaking. We also measured electromagnetic articulography data while a participant watched a relaxation or horror movie, and the result implies that emotional changes can deform the speech organ configuration even if the participant does not speak. These results support our hypothesis.

Transfer functions of solid vocal-tract models constructed from ATR MRI database of Japanese vowel production

Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda

Acoustical Science and Technology 30 ( 4 ) 288 - 296 2009.4

Joint Work

The ATR MRI database of Japanese vowel production was used to evaluate the acoustic characteristics of the vocal tract for the five Japanese vowels through the measurements of frequency responses from solid vocal-tract models formed by a stereolithographic technique. The database includes speech sounds as well as volumetric magnetic resonance imaging (MRI) data, but the speech sounds were recorded separately from the acquisition of the MRI data; therefore, their speech spectra are not appropriate for use as the reference for the transfer functions of the vocal tract. A time-stretched pulse signal generated from a horn driver unit was introduced into the physical model at the lips, and the response signals of the models were recorded at the model's glottis. In the measurements, the glottis of the models was sealed with a plastic plate, and the response signals were measured from a small hole in the plate using a probe microphone. This method permits accurate measurement of the transfer functions of the vocal tract under a closed-glottis condition. The resulting transfer functions of the five Japanese vowels provide a benchmark for testing numerical analysis methods that have been used to study vocal-tract acoustics, although the solid wall decreases the frequencies of lower resonances.

DOI： 10.1250/ast.30.288

Resonance characteristics of hypopharyngeal cavities

Kiyoshi Honda, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Seiji Adachi

Journal of the Acoustical Society of America 123 ( 5 ) 3731 2008.7

Joint Work

MRI-Based Study on Morphological and Acoustic Properties of Mandarin Sustained Vowels

WANG Gaowu, KITAMURA Tatsuya, LU Xugang, DANG Jianwu, KONG Jiangping

J Signal Process 12 ( 4 ) 311 - 314 2008.7

Joint Work

Deformation of the hypopharyngeal cavities due to F0 changes and its acoustic effects Reviewed

Hironori Takemoto, Tatsuya Kitamura, Kiyoshi Honda, Shinobu Masaki

Acoustical Science and Technology 2008.4

Joint Work

Single-matrix formulation of a time domain acoustic model of the vocal tract with side branches

MOKHTARI Parham, TAKEMOTO Hironori, KITAMURA Tatsuya

Speech Communication 50 ( 3 ) 179 - 190 2008.3

Joint Work

Publisher：ELSEVIER SCIENCE BV

Although it has been found that the piriform fossae play an important role in speech production and acoustics, the popular time domain articulatory synthesizer of [Maeda, S., 1982. A digital simulation method of the vocal-tract system. Speech Comm. 1 (3–4), 199–229] currently cannot include any more than one side branch to the acoustic tube that represents the main vocal tract. To overcome this limitation, in this paper we extended Maeda's (1982) simulation method, by mathematical reformulation in terms of a single-matrix equation having a system matrix that is both sparse and symmetric. Using vocal tract area functions measured by MRI, the simulation results showed that the piriform fossae suppress the energy in the higher frequencies by introducing spectral zeros around 4–5 kHz, and also tend to lower the second formant of vowels. These spectral changes agree with results produced using a well-tested frequency domain transmission-line method, thus validating our new formulation of the time domain synthesizer. The reformulation can be easily extended to accommodate any number of vocal tract side branches, thus enabling more realistic, physiologically correct acoustic simulation of speech production.

DOI： 10.1016/j.specom.2007.08.001

Effects of acoustic modification on perception of speaker characteristics for sustained vowels

Tatsuya Kitamura, Takeshi Saitou

Acoustical Science and Technology 2007.6

Joint Work

Vocal tract length perturbation and its application to male-female vocal tract shape conversion

Seiji Adachi, Hironori Takemoto, Tatsuya Kitamura, Parham Mokhtari, Kiyoshi Honda

Journal of the Acoustical Society of America 121 ( 6 ) 3874 - 3885 2007.6

Joint Work

Publisher：ACOUSTICAL SOC AMER AMER INST PHYSICS

An alternative and complete derivation of the vocal tract length sensitivity function, which is an equation for finding a change in formant frequency due to perturbation of the vocal tract length [Fant, Quarterly Progress and Status Rep. No. 4, Speech Transmission Laboratory, Kungliga Tekniska Högskolan, Stockholm, 1975, pp. 1-14] is presented. It is based on the adiabatic invariance of the vocal tract as an acoustic resonator and on the radiation pressure on the wall and at the exit of the vocal tract. An algorithm for tuning the vocal tract shape to match the formant frequencies to target values, such as those of a recorded speech signal, which was proposed in Story [J. Acoust. Soc. Am. 119, 715-718 (2006)], is extended so that the vocal tract length can also be changed. Numerical simulation of this extended algorithm shows that it can successfully convert between the vocal tract shapes of a male and a female for each of five Japanese vowels. (c) 2007 Acoustical Society of America.

DOI： 10.1121/1.2730743

Principal components of vocal tract area functions and inversion of speech by linear regression of cepstrum coefficient

Parham Mokhtari, Hironori Takemoto, Tatsuya Kitamura, Kiyoshi Honda

Journal of Phonetics 2007.1

Joint Work

A bone-conduction system for auditory stimulation in MRI Reviewed

Yukiko Nota, Tatsuya Kitamura, Hironori Takemoto, Hiroyuki Hirata, Kiyoshi Honda, Yasuhiro Shimada, Ichiro Fujimoto, Yuko Syakudo, Shinobu Masaki

Acoustical Science and Technology 2007.1

Joint Work

Principal components of vocal-tract area functions and inversion of vowels by linear regression of cepstrum coefficients

Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Kiyoshi Honda

JOURNAL OF PHONETICS 35 ( 1 ) 20 - 39 2007.1

Joint Work

Publisher：ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

This paper addresses the following two hypotheses: (i) vocal-tract area functions of Japanese vowels can be accurately represented by a linear combination of only a few principal components which, furthermore, are similar to those reported in the literature for different languages; and (ii) the principal components' weights can be predicted and area functions thereby accurately estimated from acoustics by linear regression of cepstrum parameters. To test these hypotheses, synchronized acoustic and vocal-tract 3D MRI data were recorded from an adult male Japanese speaker for both sustained and dynamic vowel utterances. The first two principal components explained covariations in vocal-tract shape and length accounting for 94-97% of the total variance, and indeed provided a cross-linguistic validation of the two underlying components of vowel production emergent from the literature. Multiple linear regression models were then evaluated for their accuracy in reconstructing the area functions of the dynamic utterance by predicting the first two PC coefficients, using either carefully measured formants or cepstral coefficients defined in various frequency bands. The best formant-based regression model required all four formants, with a mean adjusted correlation of 0.93 and mean absolute errors of 0.187 cm² in area and 0.131 cm in vocal-tract length. The best cepstrum-based regression model prescribed 24 cepstral coefficients defined in the frequency band 0-4 kHz, with a mean adjusted correlation of 0.92 and mean absolute errors of 0.102 cm² in area and 0.082 cm in vocal-tract length. These results suggest that vowel production features, properly constrained by PCA modeling, can be mapped with sufficient accuracy from easily measured cepstrum parameters. More work is required to reduce the dependence on MRI data, to extend the applicability of these methods to different voice qualities and different speakers, and to select a smaller subset of acoustic parameters for more robust, real-time inversion. (c) 2006 Elsevier Ltd. All rights reserved.

DOI： 10.1016/j.wocn.2006.01.001
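
The two-stage idea in the abstract above (PCA of area functions, then linear regression from cepstral coefficients to the PC weights) can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's procedure: the array sizes, the random data generator, and the train/test split are all assumptions; only the use of two principal components and 24 cepstral coefficients is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not MRI measurements):
# area functions with 40 sections driven by two hidden factors, and
# 24 "cepstral" features that depend linearly on the same factors.
n_frames, n_sections, n_cep = 200, 40, 24
factors = rng.normal(size=(n_frames, 2))
areas = 3.0 + factors @ rng.normal(size=(2, n_sections))
areas += 0.05 * rng.normal(size=areas.shape)
cepstra = factors @ rng.normal(size=(2, n_cep))
cepstra += 0.05 * rng.normal(size=cepstra.shape)

train, test = slice(0, 150), slice(150, None)

# Stage 1: PCA of the training area functions via SVD.
mean_area = areas[train].mean(axis=0)
centered = areas[train] - mean_area
_, sing, vt = np.linalg.svd(centered, full_matrices=False)
pcs = vt[:2]                                  # first two principal components
explained = (sing[:2] ** 2).sum() / (sing ** 2).sum()
weights = centered @ pcs.T                    # PC weights per training frame

# Stage 2: multiple linear regression, cepstra (plus intercept) -> PC weights.
design = np.hstack([cepstra, np.ones((n_frames, 1))])
coef, *_ = np.linalg.lstsq(design[train], weights, rcond=None)

# Inversion: predict PC weights for held-out frames and rebuild the areas.
pred_areas = mean_area + (design[test] @ coef) @ pcs
mae = np.abs(pred_areas - areas[test]).mean()
print(f"variance explained by 2 PCs: {explained:.3f}, held-out MAE: {mae:.3f}")
```

Restricting the regression target to a few PC weights, rather than every section area, is what keeps the mapping from acoustics to vocal-tract shape low-dimensional.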

An MRI-based time-domain speech synthesis system Reviewed

Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Toshio Hirai

Journal of the Acoustical Society of America 120 ( 5 ) 3037 2006.12

Joint Work

Changes in vocal tract resonance during a pitch cycle Reviewed

Tatsuya Kitamura, Seiji Adachi

Journal of the Acoustical Society of America 120 ( 5 ) 3351 2006.12

Joint Work

Measurements of MRI scanning noise by an optical microphone

Kitamura Tatsuya, Masaki Shinobu, Shimada Yasuhiro, Fujimoto Ichiro, Syakudo Yuko, Honda Kiyoshi

The Journal of the Acoustical Society of Japan 62 ( 5 ) 379 - 382 2006.5

Joint Work

DOI： 10.20697/jasj.62.5_379

Investigation of effectiveness to estimate vocal tract transfer functions by FEM

Nishimoto Hironori, Akagi Masato, Kitamura Tatsuya, Suzuki Noriko

The Journal of the Acoustical Society of Japan 62 ( 4 ) 306 - 315 2006.4

Joint Work

DOI： 10.20697/jasj.62.4_306

Acoustic roles of the laryngeal cavity in vocal tract resonance Reviewed

Hironori Takemoto, Seiji Adachi, Tatsuya Kitamura, Parham Mokhtari, Kiyoshi Honda

Journal of the Acoustical Society of America 2006.4

Joint Work

Cyclicity of laryngeal cavity resonance due to vocal fold vibration Reviewed

Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Parham Mokhtari, Kiyoshi Honda

Journal of the Acoustical Society of America 120 ( 4 ) 2239 - 2249 2006.4

Joint Work

Publisher：ACOUSTICAL SOC AMER AMER INST PHYSICS

Acoustic effects of the time-varying glottal area due to vocal fold vibration on the laryngeal cavity resonance were investigated based on vocal tract area functions and acoustic analysis. The laryngeal cavity consists of the vestibular and ventricular parts of the larynx, and gives rise to a regional acoustic resonance within the vocal tract, with this resonance imparting an extra formant to the vocal tract resonance pattern. Vocal tract transfer functions of the five Japanese vowels uttered by three male subjects were calculated under open- and closed-glottis conditions. The results revealed that the resonance appears in the frequency region from 3.0 to 3.7 kHz when the glottis is closed and disappears when it is open. Real spectra estimated from open- and closed-glottis periods of vowel sounds also showed the on-off pattern of the resonance within a pitch period. Furthermore, a time-domain acoustic analysis of vowels indicated that the resonance component could be observed as a pitch-synchronized rise-and-fall pattern of the bandpass amplitude. The cyclic nature of the resonance can be explained as the laryngeal cavity acting as a closed tube that generates the resonance during a closed-glottis period, but damps the resonance off during an open-glottis period.

DOI： 10.1121/1.2335428

Difference in vocal tract shape between upright and supine postures: Observations by an open-type MRI scanner

KITAMURA Tatsuya, TAKEMOTO Hironori, HONDA Kiyoshi, SHIMADA Yasuhiro, FUJIMOTO Ichiro, SYAKUDO Yuko, MASAKI Shinobu, KURODA Kagayaki, OKU-UCHI Noboru, SENDA Michio

Acoustical Science and Technology 26 ( 5 ) 465 - 468 2005.9

Joint Work

DOI： 10.1250/ast.26.465

Individual variation of the hypopharyngeal cavities and its acoustic effects

Tatsuya Kitamura, Kiyoshi Honda, Hironori Takemoto

Acoustical Science and Technology 26 ( 1 ) 16 - 26 2005.1

Joint Work

A method of tooth superimposition on MRI data for accurate measurement of vocal tract shape and dimensions Reviewed

Hironori Takemoto, Tatsuya Kitamura, Hironori Nishimoto, Kiyoshi Honda

Acoustical Science and Technology 25 ( 6 ) 468 - 474 2004.11

Joint Work

Exploring human speech production mechanisms by MRI

Kiyoshi Honda, Hironori Takemoto, Tatsuya Kitamura, Satoru Fujita, Sayoko Takano

IEICE Transactions on Information and Systems E87-D ( 5 ) 1050 - 1058 2004.5

Joint Work

Publisher：IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

Recent investigations using magnetic resonance imaging (MRI) of human speech organs have opened up new avenues of research. Visualization of the speech production system provides abundant information on the physiological and acoustic realization of human speech. This article summarizes the current status of MRI applications with respect to speech research as well as our own experience of discovery and re-evaluation of acoustic events emanating from the vocal tract and physiological mechanisms.

Development of a Japanese reading resource bank using the Internet

Yoshiko Kawamura, Tatsuya Kitamura

6 241 - 255 2001

Joint Work

Development of a reading tutorial system for JSL and JFL learners using the EDR electronic Japanese-English dictionary Reviewed

Yoshiko Kawamura, Tatsuya Kitamura, Rei Hobara

24 7 - 12 2000.8

Joint Work

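Development and evaluation of a Japanese reading support system with a learning-history management function

Tatsuya Kitamura, Yoshiko Kawamura, Jun Uchiyama, Akemi Tera, Manabu Okumura

Japan Journal of Educational Technology 23 ( 3 ) 127 - 133 1999

Joint Work

Publisher：Japan Society for Educational Technology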

Significant cues in spectral envelope of isolated vowels for speaker identification

Kitamura Tatsuya, Akagi Masato

The Journal of the Acoustical Society of Japan 53 ( 3 ) 185 - 191 1997.3

Joint Work

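Publisher：Acoustical Society of Japan

We investigated the frequency band of the spectral envelope of isolated vowels in which speaker individuality appears prominently, and the components within that band that contribute to speaker identification. Listening experiments with stimuli in which specific bands of the spectral envelope were modified yielded a quantitative relationship between spectral envelope modification and the perception of speaker individuality. The results showed the following. (1) Speaker individuality appears across the whole spectral envelope, but more strongly in the high-frequency region. (2) Peaks of the spectral envelope are more important than dips for speaker identification. (3) Regardless of the phoneme, speaker individuality is likely to appear prominently in the band above the peak located near 20 ERB rate (1,740 Hz), and speaker conversion is possible using this band. (4) Speaker individuality is preserved even when the peaks in this band are approximated by triangles.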

DOI： 10.20697/jasj.53.3_185

Speaker individualities in speech spectral envelopes

Tatsuya Kitamura, Masato Akagi

Journal of the Acoustical Society of Japan (E) 16 ( 5 ) 283 - 289 1995.9

Joint Work

The aim of the three psychoacoustic experiments described here was to clarify whether there are speaker individualities in the spectral envelopes, in which frequency bands such individualities exist, and how frequency bands having speaker individualities can be manipulated. The LMA analysis-synthesis system was used to prepare stimuli in which specific frequency bands were varied, and the frequency bands having speaker individualities were estimated experimentally. The results indicate that (1) speaker individualities exist in spectral envelopes, (2) these individualities are mainly at frequencies higher ...

DOI： 10.1250/ast.16.283

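A study of Viterbi best-first search for word speech recognition using discrete-distribution HMMs

Masaki Kohda, Tatsuya Kitamura

The Transactions of the Institute of Electronics, Information and Communication Engineers D-II (Information and Systems II: Information Processing) 77 ( 7 ) 1187 - 1197 1994

Joint Work

Treating HMM-based speech recognition as a graph search problem, earlier studies used beam search techniques to examine pruning based only on the score up to the current node, as well as pruning that also takes into account an estimated score beyond the current node, obtained by recognition with a simpler model as in forward-backward search. Earlier studies also used best-first search techniques to examine practical search methods that do not necessarily insist on strict A* search, such as the stack decoding method, and methods that speed up the search for N-best candidates, such as tree-trellis search. In this paper, we use best-first search to speed up recognition with the HMM Viterbi algorithm, and propose two ways of setting the estimated score: one based on the maximum path score and one that uses simple phoneme HMMs. We show that, when the estimated score is set appropriately, Viterbi best-first search reduces the computation for path expansion, which accounts for the major part of recognition processing, to 1% or less without lowering the recognition rate, so the reduction in computation is very large. The estimated score that uses simple phoneme HMMs is accurate because it takes the temporal ordering into account, but setting it requires a large amount of computation. Considering both the computation for path expansion and the computation for setting the estimated score, the estimated score based on the within-word maximum path score is the best. Because this estimated score satisfies the condition for A* search, the optimal solution is also guaranteed.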
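
The general idea of a best-first Viterbi search guided by an estimated score can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's method: the estimated score used here is a simple per-frame upper bound chosen only because it is easy to show admissible, the random HMM and observation sequence are hypothetical, and the function names are invented for this example.

```python
import heapq
import numpy as np

def viterbi_reference(log_pi, log_a, log_b, obs):
    """Ordinary time-synchronous Viterbi; returns the best log path score."""
    delta = log_pi + log_b[:, obs[0]]
    for o in obs[1:]:
        delta = (delta[:, None] + log_a).max(axis=0) + log_b[:, o]
    return float(delta.max())

def viterbi_best_first(log_pi, log_a, log_b, obs):
    """Best-first Viterbi; returns (best log score, trellis nodes expanded)."""
    n_frames, n_states = len(obs), len(log_pi)
    # Assumed estimated score: for each frame, the largest possible
    # transition-plus-emission gain. Summed over the remaining frames it
    # never underestimates the true remaining score (A* condition).
    step_bound = np.array([(log_a.max(axis=0) + log_b[:, o]).max() for o in obs])
    suffix = np.cumsum(step_bound[::-1])[::-1]
    h = np.append(suffix[1:], 0.0)   # h[t]: bound on the score after frame t

    heap = []                        # min-heap on the negated priority g + h
    for s in range(n_states):
        g = float(log_pi[s] + log_b[s, obs[0]])
        if np.isfinite(g):
            heapq.heappush(heap, (-(g + h[0]), 0, s, g))

    closed = set()
    while heap:
        _, t, s, g = heapq.heappop(heap)
        if (t, s) in closed:
            continue                 # already expanded with a better score
        closed.add((t, s))
        if t == n_frames - 1:
            return g, len(closed)    # first final node popped is optimal
        for s2 in range(n_states):
            g2 = g + float(log_a[s, s2] + log_b[s2, obs[t + 1]])
            if np.isfinite(g2) and (t + 1, s2) not in closed:
                heapq.heappush(heap, (-(g2 + h[t + 1]), t + 1, s2, g2))
    return float("-inf"), len(closed)

# Hypothetical discrete HMM: 5 states, 4 output symbols, 20 frames.
rng = np.random.default_rng(1)
n_states, n_symbols, n_frames = 5, 4, 20
log_pi = np.log(rng.dirichlet(np.ones(n_states)))
log_a = np.log(rng.dirichlet(np.ones(n_states), size=n_states))
log_b = np.log(rng.dirichlet(np.ones(n_symbols), size=n_states))
obs = rng.integers(0, n_symbols, size=n_frames)

score_bf, expanded = viterbi_best_first(log_pi, log_a, log_b, obs)
score_ref = viterbi_reference(log_pi, log_a, log_b, obs)
print(score_bf, score_ref, expanded, n_frames * n_states)
```

The tighter the estimated score, the fewer trellis nodes are expanded before the final frame is reached, which is the trade-off between estimate accuracy and the cost of computing the estimate that the abstract discusses.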