Papers - KITAMURA Tatsuya
-
Effects of increased nasal volume due to topical adrenaline on the voice Reviewed International journal
Oguro, Omura, Uchio, Imagawa, Kitamura, Takemoto, Otori
Journal of Voice 2024.9
-
JASA Express Letters 4 015201 2024.1
-
Vocal tract configuration during imitating voices: A case study for a professional impersonator Reviewed
Tatsuya Kitamura
Acoustical Science and Technology 44 ( 5 ) 407 - 410 2023.9
Authorship:Lead author, Last author, Corresponding author
-
Articulation of Nasal Consonants in the Ikema Dialect of Miyako Ryukyuan of Southern Ryukyu Reviewed
Journal of the Phonetic Society of Japan 27 ( 1 ) 13 - 26 2023.7
Authorship:Last author
-
Development of an IoT cloud system to enable patients to continue voice disorder rehabilitation on their own Reviewed
川村直子, 北村達也
リハビリテーション・エンジニアリング 38 ( 2 ) 95 - 104 2023.5
Authorship:Last author
-
Naoko Kawamura, Tatsuya Kitamura
The Japan Journal of Logopedics and Phoniatrics 64 ( 1 ) 10 - 17 2023.1
Authorship:Last author Publisher:The Japan Society of Logopedics and Phoniatrics
DOI: 10.5112/jjlp.64.10
-
中村紘稀, 北村達也, 梅谷智弘
甲南大学紀要知能情報学編 15 ( 1 ) 45 - 52 2022.9
Authorship:Last author
-
北村達也, 川村よし子
甲南大学紀要知能情報学編 15 ( 1 ) 13 - 24 2022.9
Authorship:Lead author, Corresponding author
-
Survey of speech recording practices during the COVID-19 pandemic Invited
北村 達也, 高野 佐代子, 石本 祐一
日本音響学会誌 78 ( 4 ) 187 - 188 2022.4
-
Changes in facial skin vibration patterns with voice training methods: measurements of speech-language-hearing therapists
川村 直子, 北村 達也
甲南大学紀要 知能情報学編 13 ( 2 ) 111 - 122 2021.2
Joint Work
Publisher:甲南大学
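In voice training conducted as part of rehabilitation for voice disorders (voice rehabilitation), the sensation of facial skin vibration during phonation is emphasized as an indicator of efficient phonation. However, this sensation is based solely on the patient's subjective perception, and at present it is difficult for the supervising speech-language-hearing therapist to grasp the patient's facial skin vibration. Moreover, no reports verifying facial skin vibration during voice training appear to have been published so far. In this study, we therefore report measurements of facial skin vibration velocity patterns during phonation, made with a scanning laser Doppler vibrometer, for speech-language-hearing therapists experienced in voice rehabilitation, using three voice training methods that emphasize the sensation of facial vibration.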
DOI: 10.14990/00003695
Other Link: http://doi.org/10.14990/00003695
-
Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise
CoRR abs/2111.03629 2021
-
淺井 優介, 北村 達也, 川村 よし子
甲南大学紀要 知能情報学編 13 ( 1 ) 67 - 75 2020.7
Joint Work
Publisher:甲南大学
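To provide basic data for creating teaching materials for children who need Japanese language education, we transcribed the audio of NHK E-Tele educational programs for elementary school students and built a prototype vocabulary list. Using speech recognition, we transcribed 17 programs for lower grades (510 minutes in total) and 19 programs for upper grades (570 minutes in total), then ranked the words that appeared in descending order of a usefulness index to create the vocabulary list. The resulting list contained easier words than a vocabulary list built in prior work from a written-language corpus, demonstrating the effectiveness of the proposed methodology.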
DOI: 10.14990/00003648
Other Link: http://doi.org/10.14990/00003648
-
System Integration for Component-Based Manzai Robots with Improved Scalability
Tomohiro Umetani, Satoshi Aoki, Tatsuya Kitamura, Akiyo Nadamoto
Journal of Robotics and Mechatronics 32 ( 2 ) 459 - 468 2020.4
Publisher:Fuji Technology Press Ltd.
This paper describes system developments for integrating control systems of Manzai robot duos that automatically generate Manzai scripts from Internet articles based on given keywords, as well as improvements in the scalability of the integrated control system. Component-based Manzai robots controlled by RT-Middleware have been developed. However, conventional Manzai robot systems, the control systems of which are individually developed, experience some difficulties in interface integration and system maintenance as well as in scalability. In this study, we built a Manzai robot system excellent in reusability, maintainability and scalability by separating the common part from the hardware-dependent part by using the RT components of RT-Middleware. We also verified the reusability and scalability of the hardware-constrained component groups by implementing the Manzai robot control system on ready-made robots with different types of mechanisms. The implementation results demonstrated the effectiveness of the developed Manzai robot control system.
-
Tatsuya Kitamura, Yuta Amakawa, Hiroaki Hatano
Journal of the Phonetic Society of Japan 23 ( 0 ) 165 - 173 2019.12
Joint Work
Publisher:The Phonetic Society of Japan
In a delayed fundamental frequency (F0) fall or a late fall phenomenon, the F0 fall occurs on the post-accented mora in Japanese speech. This study conducted a large-scale investigation of the occurrence conditions of the delayed F0 fall for 230 words of 48 Tokyo-dialect Japanese speakers (21 males and 27 females). The results showed that the delayed F0 fall occurred more frequently (1) in female speech than in male speech, (2) in initial-accented words than in middle-accented words, (3) in longer words, and (4) in words in which the accented mora was followed by a mora with a back vowel.
-
Further observations on a principal components analysis of head-related transfer functions Reviewed
Parham Mokhtari, Hiroaki Kato, Hironori Takemoto, Ryouichi Nishimura, Seigo Enomoto, Seiji Adachi, Tatsuya Kitamura
Scientific Reports 9 7477 2019.5
-
Survey of Japanese undergraduate and graduate students' self-concept of clumsiness in speech Reviewed
75 ( 3 ) 118 - 124 2019.3
Joint Work
A questionnaire-based survey was conducted on native Japanese undergraduate and graduate students in 15 universities and institutes in Japan to evaluate their feelings of clumsiness while speaking during daily conversation. Responses from 1,831 students without known history of speech, hearing, or language disorders were analyzed. The results showed that 31.0% of the participants felt "clumsy" or "rather clumsy" while speaking during daily conversation. Analysis by gender revealed that 35.5% of the male students and 24.4% of the female students felt "clumsy" or "rather clumsy" while speaking. The students who had focused on science and maths in high school tended to feel a greater degree of clumsiness in speech than those who had focused on humanities. Students who felt clumsy while speaking tended to think that their speech was often misunderstood. Over 90% of the participants expressed interest in improving their articulation.
-
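Analysis of the usage of a Japanese language education support system that identifies the lesson in which each textbook word first appears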
北村 達也
甲南大学紀要 知能情報学編 11 ( 2 ) 209 - 215 2019.2
Single Work
Publisher:甲南大学
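We developed a system that automatically identifies the lesson in which each word in a Japanese language textbook first appears (its first-appearance lesson) and surveyed how the system is used. In the period from April 1 to July 31, 2018, the system received more than 10,000 accesses, about 90% of which came from within Japan. A questionnaire survey of 100 users showed that about 60% of them were Japanese language teachers by profession. About 80% of users answered that the availability of such a system affects their choice of textbook.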
DOI: 10.14990/00003307
Other Link: http://doi.org/10.14990/00003307
-
Morphological characteristics of male and female hypopharynx: A magnetic resonance imaging-based study Reviewed
Ju Zhang, Kiyoshi Honda, Jianguo Wei, Tatsuya Kitamura
Journal of the Acoustical Society of America 145 734 2019.2
-
Development and evaluation of a biofeedback system using skin vibration during tube phonation Reviewed
川村直子, 北村達也, 城本修
音声言語医学 59 ( 4 ) 334 - 341 2018.9
-
Audio-Visual Teaching Aid for Instructing English Stress Timings
Tatsuya Kitamura, Ryo Nagata, Kotaro Funakoshi
甲南大学紀要 知能情報学編 11 ( 1 ) 1 - 17 2018.7
Joint Work
Publisher:甲南大学
This study proposed and evaluated an audio-visual teaching aid for teaching the rhythm of spoken English. The teaching aid conveys the stress timing of English through the movements of a circle marker on a PC screen. Native Japanese participants practiced English sentences with and without the teaching aid, and their speech was recorded before and after the practice. Analyses of the recorded speech showed that the teaching aid could improve learning of English stress timing.
DOI: 10.14990/00003196
Other Link: http://doi.org/10.14990/00003196
-
Tatsuya Kitamura, Yukiko Nota, Michiko Hashi, Hiroaki Hatano
JASA Express Letters 143 ( 3 ) EL154 - EL159 2018.3
Joint Work
This study attempted to improve the five-degrees-of-freedom sensors of the Northern Digital Incorporated's Wave electromagnetic articulography system by replacing their cables with thinner and more flexible cables to reduce interference in articulation. Measurement errors and data loss rates were compared between the original and the proposed sensors. The proposed sensors showed twofold tracking accuracy and data loss rates compared to the original sensors in an experiment using a crank-rocker mechanism. Data loss rates of the proposed sensors increased in articulatory data collection from four speakers. The proposed sensors have been made available commercially.
DOI: 10.1121/1.5025167
-
Construction of a self-contained, component-driven desktop robot environment
梅谷智弘, 清瀬大貴, 榊原洋之, 青木哲, 北村達也
計測自動制御学会論文誌 54 ( 1 ) 126 - 128 2018
-
Scalable Component-Based Manzai Robots as Automated Funny Content Generators
Tomohiro Umetani, Satoshi Aoki, Kazuhiro Akiyama, Ryo Mashimo, Tatsuya Kitamura, Akiyo Nadamoto
Journal of Robotics and Mechatronics 28 ( 6 ) 862 - 869 2016.12
-
Implicit Communication Robots based on Automatic Scenario Generation using Web Intelligence
MASHIMO Ryo, KITAMURA Tatsuya, UMETANI Tomohiro, NADAMOTO Akiyo
International Journal of Web Information Systems 12 ( 3 ) 312 - 335 2016.9
-
Manzai robot system with scalability based on distributed software components
Tomohiro Umetani, Satoshi Aoki, Kazuhiro Akiyama, Ryo Mashimo, Tatsuya Kitamura, Akiyo Nadamoto
2015 International Symposium on Micro-NanoMechatronics and Human Science, MHS 2015 2016.3
Joint Work
Publisher:Institute of Electrical and Electronics Engineers Inc.
This paper describes a manzai robot system with scalability that is developed based on the distributed software components. Manzai is a Japanese traditional stand-up comedy that is usually performed by two comedians. The manzai robots generate their manzai scripts based on web news articles related to keywords given by audiences and the searching results on WWW automatically, and then the robots perform the manzai scripts. Each robot is controlled by distributed RT components executed on the Raspberry Pi controller. The RT components control the manzai robots synchronously. The paper focuses on the scalability of the manzai robot system. Experimental results show the feasibility of manzai performance robots with scalability of the functions of the robot systems.
-
Observation of the relationship between articulatory movement and palate shape using a magnetic sensor system
北村 達也, 能田 由紀子, 吐師 道子, 波多野 博顕, 梅谷 智弘
音声言語医学 57 ( 1 ) 52 - 52 2016.1
-
Human-Robots Implicit Communication based on Dialogue between Robots using Automatic Generation of Funny Scenarios from Web
Ryo Mashimo, Tomohiro Umetani, Tatsuya Kitamura, Akiyo Nadamoto
ELEVENTH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN ROBOT INTERACTION (HRI'16) 327 - 334 2016
Joint Work
Publisher:ASSOC COMPUTING MACHINERY
Numerous studies have examined communication robots that communicate with people, but it is difficult for robots to communicate with people smoothly. We call the communication style based on dialogue between robots "human-robot implicit communication". As described herein, we propose Manzai-robots whose interaction style is human-robot implicit communication based on a scenario automatically generated from web news. Our generated Manzai scenario consists of snappy patter and a misunderstanding dialogue based on four kinds of gaps in the structure of funny points. Our purpose is for people to feel familiarity through smooth human-robot communication using dialogue between robots based on a Manzai scenario. We conducted three kinds of experiments to assess (1) the effectiveness of automatic creation of Manzai scenarios for the robots, (2) the effectiveness of the Manzai-robots as a medium, and (3) the effectiveness of types of familiarity for Manzai-robots. Based on the results, we measured the familiarity and smooth communication of our Manzai-robots.
-
Automatic generation of Japanese traditional funny scenario from web content based on web intelligence
Ryo Mashimo, Tomohiro Umetani, Tatsuya Kitamura, Akiyo Nadamoto
17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Proceedings 2015.12
Joint Work
Publisher:Association for Computing Machinery, Inc
Today there is much information and knowledge on the internet, and many studies have examined the extraction of many kinds of knowledge from it. In addition, numerous studies have examined entertainment robots that communicate with people, but it is difficult for robots to communicate smoothly with people. We specifically examine communication between robots based on dialogue. Here, we aim to create a dialogue-based scenario for the robots automatically, which is difficult because dialogue requires knowledge of many kinds. We therefore use knowledge from the web to create scenarios automatically. As described herein, we propose a system that generates dialogue scenarios automatically from web news articles in real time. Our system uses the Manzai metaphor; Manzai is a traditional Japanese form of humorous comedy. Our generated Manzai scenario consists of snappy patter and a misunderstanding dialogue based on the gap of our structure of funny points. We create communication robots to amuse people with our generated humorous robot dialogue scenarios.
-
Tongue Movement of Healthy Adults who Feel Clumsy Articulating Alveolar Flaps
TACHIKAWA Wataru, OZAWA Yoshiaki, HASHI Michiko, KITAMURA Tatsuya, NOTA Yukiko
Journal of the Phonetic Society of Japan 19 ( 3 ) 50 - 56 2015.12
Joint Work
Publisher:Journal of the Phonetic Society of Japan
Tongue movement of healthy young adults who feel clumsy articulating alveolar sounds in daily conversation was measured using the WAVE speech research system. Tongue blade movements of these speakers during repetitive production of Japanese /ra/ showed reduction of speed and movement range compared with those who feel no clumsiness, despite the absence of any organic or neurological abnormalities. Such differences may suggest underlying differences of fine and rapid motor controls required for smooth speech production which may be related to their awareness of clumsiness in articulation.
-
Crucial Prosodic Features in Japanese Learners' Pronunciation: Evidence from Naturalness Judgments of Synthetic Speech Reviewed
Rongna A, Ryoko Hayashi, Tatsuya Kitamura
Journal of the Phonetic Society of Japan 19 ( 3 ) 37 - 42 2015.12
Joint Work
The present study reports native speakers' impressions of JFL learners' utterances before and after shadowing/repeating training. Evaluation was also done for prosodically modified synthesized stimuli in order to examine the crucial prosodic cues. The results suggest that both durational patterns and pitch patterns are important for the utterances to be heard as natural Japanese, but durational patterns may be more important. Moreover, shadowing training appears to improve mora-timed rhythm. The results of the present study could provide useful suggestions for developing pronunciation training for Japanese pronunciation and speech education.
-
Non-contact measurement of facial surface vibration patterns during singing by scanning laser Doppler vibrometer
Tatsuya Kitamura, Keisuke Ohtani
Frontiers in Psychology, section Performance Science 6 2015.11
Joint Work
Publisher:FRONTIERS MEDIA SA
This paper presents a method of measuring the vibration patterns on facial surfaces by using a scanning laser Doppler vibrometer (LDV). The surfaces of the face, neck, and body vibrate during phonation and, according to Titze (2001), these vibrations occur when aerodynamic energy is efficiently converted into acoustic energy at the glottis. A vocalist's vibration velocity patterns may therefore indicate his or her phonatory status or singing skills. LDVs enable laser-based non-contact measurement of the vibration velocity and displacement of a certain point on a vibrating object, and scanning LDVs permit multipoint measurements. The benefits of scanning LDVs originate from the facts that they do not affect the vibrations of measured objects and that they can rapidly measure the vibration patterns across planes. A case study is presented herein to demonstrate the method of measuring vibration velocity patterns with a scanning LDV. The objective of the experiment was to measure the vibration velocity differences between the modal and falsetto registers while three professional soprano singers sang sustained vowels at four pitch frequencies. The results suggest that pitch frequency may be correlated with vibration velocity. However, further investigations are necessary to clarify the relationships between vibration velocity patterns and phonation status and singing skills.
-
Measurement of perceptual speaker similarity for sentence speech in ATR speech database Reviewed
Journal of the Acoustical Society of Japan 71 ( 10 ) 516 - 525 2015.10
-
Improvement of five-degree-of-freedom sensors for Northern Digital Incorporated's Wave speech research system
Tatsuya Kitamura, Yukiko Nota, Michiko Hashi, Hiroaki Hatano
Acoustical Science and Technology 36 ( 4 ) 347 - 350 2015
-
Manzai Robots: Entertainment Robots Based on Auto-Created Manzai Scripts from Web News Articles
UMETANI Tomohiro, MASHIMO Ryo, NADAMOTO Akiyo, KITAMURA Tatsuya, NAKAYAMA Hirotaka
J Robot Mechatron 26 ( 5 ) 662 - 664 2014.10
-
Verification of reproducibility of measurements of skin vibration during singing by scanning laser-Doppler vibrometer Reviewed
55 ( 2 ) 167 - 172 2014.4
-
Hatano Hiroaki, Kitamura Tatsuya
The Journal of the Acoustical Society of Japan 70 ( 3 ) 106 - 113 2014.3
Joint Work
Publisher:The Acoustical Society of Japan
-
Observation of consonant articulation using improved sensors for the NDI Wave speech observation system
北村 達也, 能田 由紀子, 波多野 博顕, 吐師 道子, 西谷 実
音声言語医学 55 ( 1 ) 59 - 59 2014.1
-
Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods
Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 408 - 412 2014
Joint Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
Acoustic characteristics of the vocal tract have been investigated extensively in the literature using a one-dimensional (1D) acoustic simulation method. Because the 1D method assumes plane wave propagation only, it is recognized to be valid only in the low frequency region (below about 4 or 5 kHz). Recently, a three-dimensional (3D) acoustic simulation method was developed, to obtain more precise acoustic characteristics of the vocal tract. In the present study, from a male's vocal tract shapes, transfer functions were calculated using the 1D and 3D methods and compared with each other to evaluate the valid frequency range of the 1D method. As a result, when acoustic effects of the piriform fossae were considered in the 1D method, the transfer functions agreed with each other up to 7 kHz (ignoring small dips). The 3D method showed that a deep dip was generated at around 8 kHz by the transverse resonance mode in the pharynx. Above this dip frequency, the transfer functions disagreed with each other. Thus, the 1D method is valid up to 7 kHz for this subject. Because this subject has a relatively large vocal tract, in general the upper limit of the valid frequency range could exceed 8 kHz.
-
Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information
Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 870 - 874 2014
Joint Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
A highly reproducible method for estimating vocal tract length (VTL) and a text-independent VTL estimation method are proposed, based on a Japanese vowel database spoken by 385 male and female speakers ranging in age from 6 to 56 and another vowel database with MRI-based vocal tract shape information. The proposed methods are based on an interference-free power spectral representation and systematic suppression of biasing factors. MRI data are used to calibrate the VTL estimates so that they are expressed in a physically meaningful unit. These databases are normalized based on the estimated VTL information to provide a reference template, which is used to implement the text-independent VTL estimation method. A prototype system for text-independent estimation of VTL is implemented in MATLAB and runs faster than real time on a PC.
-
Acoustic interaction between the right and left piriform fossae in generating spectral dips Reviewed
Hironori Takemoto, Seiji Adachi, Parham Mokhtari, Tatsuya Kitamura
Journal of the Acoustical Society of America 134 ( 4 ) 2955 - 2964 2013.10
Joint Work
Publisher:ACOUSTICAL SOC AMER AMER INST PHYSICS
It is known that the right and left piriform fossae generate two deep dips on speech spectra and that acoustic interaction exists in generating the dips: if only one piriform fossa is modified, both the dips change in frequency and amplitude. In the present study, using a simple geometrical model and measured vocal tract shapes, the acoustic interaction was examined by the finite-difference time-domain method. As a result, one of the two dips was lower in frequency than the two independent dips that appeared when either of the piriform fossae was occluded, and the other dip was higher in frequency than the two dips. At the lower dip frequency, the piriform fossae resonated almost in opposite phase, while at the higher dip frequency, they resonated almost in phase. These facts indicate that the piriform fossae and the lower part of the pharynx can be modeled as a coupled two-oscillator system whose two normal vibration modes generate the two spectral dips. When the piriform fossae were identical, only the higher dip appeared. This is because the lower mode is not acoustically coupled to the main vocal tract enough to generate an absorption dip.
DOI: 10.1121/1.4818744
-
Effects of prosody conversion of Japanese learners' speech on naturalness evaluation
阿栄娜, 林良子, 北村達也
日本音響学会2013年秋季研究発表会講演論文集 425 - 426 2013.9
Joint Work
-
Naturalness on Japanese pronunciation before and after shadowing training and prosody modified stimuli
Rongna A, Ryoko Hayashi, Tatsuya Kitamura
Proceedings of Interspeech 2013 Satellite workshop on Speech and Language Technology in Education 143 - 146 2013.8
Joint Work
-
Timing differences in articulation between voiced and voiceless stop consonants: An analysis of cine-MRI data
Masako Fujimoto, Tatsuya Kitamura, Hiroaki Hatano, Ichiro Fujimoto
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 955 - 958 2013
Joint Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
Laryngeal and supralaryngeal articulators coordinately work to produce speech sounds. In order to study differences in supralaryngeal manifestations of voiced and voiceless consonants, we compared the tongue movement during a minimal pair /agise/ and /akise/ using the fast scanning techniques of MRI movies. The result showed that the tongue displacement starts earlier in /k/ than in /g/ for many of the speakers of Tokyo Japanese. This agrees with our previous findings using other dialect speakers. These results suggest that many Japanese actively differentiate supralaryngeal articulation according to the voicing of the consonants, raising the tongue earlier in voiceless ones. This movement is presumably to ensure the voicelessness of the consonant. The present study also supplies evidence for the usefulness of a constructive approach for physical modeling.
-
Differences in articulatory movement between voiced and voiceless stop consonants Reviewed
Ryosuke O. Tachibana, Tatsuya Kitamura, Masako Fujimoto
Acoustical Science and Technology 33 ( 6 ) 391 - 393 2012.11
-
Measurement of vibration velocity pattern of facial surface during phonation using scanning vibrometer
Tatsuya Kitamura
Acoustical Science and Technology 33 ( 2 ) 126 - 128 2012.3
-
A Method for Predicting Stressed Words in Teaching Materials for English Jazz Chants Reviewed
NAGATA Ryo, FUNAKOSHI Kotaro, KITAMURA Tatsuya, NAKANO Mikio
IEICE Trans Inf Syst (Inst Electron Inf Commun Eng) E95.D ( 11 ) 2658 - 2663 2012
Joint Work
Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
To acquire a second language, one must develop an ear and tongue for the correct stress and intonation patterns of that language. In English language teaching, there is an effective method called Jazz Chants for working on the sound system. In this paper, we propose a method for predicting stressed words, which play a crucial role in Jazz Chants. The proposed method is specially designed for stress prediction in Jazz chants. It exploits several sources of information including words, POSs, sentence types, and the constraint on the number of stressed words in a chant text. Experiments show that the proposed method achieves an F-measure of 0.939 and outperforms the other methods implemented for comparison. The proposed method is expected to be useful in supporting non-native teachers of English when they teach chants to students and create chant texts with stress marks from arbitrary texts.
-
Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers
Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 402 - 405 2012
Joint Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
We conducted quantitative analyses of a magnetic resonance imaging (MRI) database to examine the correlation between physical measures (vocal tract length and body height) and acoustic parameters (pitch and formant frequencies) of vowels. The vocal tract length was measured from MRI data for the five Japanese vowels produced by fifteen male Japanese speakers between the ages of 24 and 55. The acoustic features were computed from vowel sounds recorded during the scan. The vocal tract length showed a weak positive correlation with the speakers' age (correlation coefficient r = 0.51) but not with the speakers' body height (r = 0.08). There were only weaker correlations between the vocal tract length and the first four formant frequencies, except that F1 and F2 of the vowel /e/ showed negative correlations with the vocal tract length (F1: r = -0.65, F2: r = -0.56). The result suggests that the vocal tract length is one of the dominant factors causing individual differences in the formant frequencies for the vowel /e/, which is produced without forming a strong constriction. Furthermore, the pitch frequency was negatively correlated with the body height (r = -0.61).
-
Simulation of the coupling between vocal-fold vibration and time-varying vocal tract
Yosuke Tanabe, Parham Mokhtari, Hironori Takemoto, Tatsuya Kitamura
Journal of the Acoustical Society of America 130 ( 4 ) 2441 2011.10
Joint Work
-
Study of perceptual factors for speaker identification focusing on perceptual similarity of speaker characteristics
Tsuyoshi Izumida, Tatsuya Kitamura
Acoustical Science and Technology 32 ( 5 ) 216 - 219 2011.9
-
Dental imaging using a magnetic resonance visible mouthpiece for measurement of vocal tract shape and dimension
Tatsuya Kitamura, Hironori Nishimoto, Ichiro Fujimoto, Yasuhiro Shimada
Acoustical Science and Technology 32 ( 5 ) 224 - 227 2011.9
Joint Work
-
Acoustic analysis of the vocal tract during vowel production by finite-difference time-domain method Reviewed
Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura
Journal of the Acoustical Society of America 128 ( 6 ) 3724 - 3738 2010.12
Joint Work
Publisher:ACOUSTICAL SOC AMER AMER INST PHYSICS
The vocal tract shape is three-dimensionally complex. For accurate acoustic analysis, a finite-difference time-domain method was introduced in the present study. By this method, transfer functions of the vocal tract for the five Japanese vowels were calculated from three-dimensionally reconstructed magnetic resonance imaging (MRI) data. The calculated transfer functions were compared with those obtained from acoustic measurements of vocal tract physical models precisely constructed from the same MRI data. Calculated transfer functions agreed well with measured ones up to 10 kHz. Acoustic effects of the piriform fossae, epiglottic valleculae, and inter-dental spaces were also examined. They caused spectral changes by generating dips. The amount of change was significant for the piriform fossae, while it was almost negligible for the other two. The piriform fossae and valleculae generated spectral dips for all the vowels. The dip frequencies of the piriform fossae were almost stable, while those of the valleculae varied among vowels. The inter-dental spaces generated very small spectral dips below 2.5 kHz for the high and middle vowels. In addition, transverse resonances within the oral cavity generated small spectral dips above 4 kHz for the low vowels.
DOI: 10.1121/1.3502470
-
Visualisation of hypopharyngeal cavities and vocal-tract acoustic modelling
Kiyoshi Honda, Tatsuya Kitamura, Hironori Takemoto, et al.
Computer Methods in Biomechanics and Biomedical Engineering 13 ( 4 ) 443 - 453 2010.7
Joint Work
Publisher:TAYLOR & FRANCIS LTD
The hypopharyngeal cavities consist of the laryngeal cavity and bilateral piriform fossa, constituting the bottom part of the vocal tract near the larynx. Visualisation of these cavities with magnetic resonance imaging (MRI) techniques reveals that during speech, the laryngeal cavity takes the form of a long-neck flask and the piriform fossa takes the form of a goblet of varying shapes: the former diminishes greatly in whispering and the latter disappears during deep inhalation. These cavities have been shown to exert significant acoustic effects at higher frequency spectra. In this study, acoustic experiments were conducted for male and female mechanical vocal tracts, with the result that acoustic effects of those cavities determine the frequency spectra above 2 kHz, giving rise to peaks and zeros. An acoustic model of vowel production was proposed with three components: voice source, hypopharyngeal cavities and vocal tract proper, which provides effective means in controlling voice quality and expressing individual vocal characteristics.
-
Yasuhiro Hamada, Tatsuya Kitamura, Masato Akagi
Journal of Signal Processing 14 ( 4 ) 265 - 268 2010.7
Joint Work
-
Similarity of effects of emotions on the speech organ configuration with and without speaking
Tatsuya Kitamura
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 909 - 912 2010
Single Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
In this work we propose and verify a hypothesis on emotional speech production: emotions induce physical and physiological changes in the whole body including changes in the configuration and physical/mechanical properties of the speech organs, regardless of whether or not the person is speaking, and as a side effect, this changes the voice quality. To verify this hypothesis, we measured the configuration of the speech organs of professional actors simulating four emotions (neutral, hot anger, joy, and sadness) with and without speaking by magnetic resonance imaging. The results clearly showed that emotions affect the speech organ configuration, and the same tendency of changes in the speech organ configuration was found regardless of whether or not the person was speaking. We also measured electromagnetic articulography data while a participant watched a relaxation or horror movie, and the result implies that emotional changes can deform the speech organ configuration even if the participant does not speak. These results support our hypothesis.
-
Transfer functions of solid vocal-tract models constructed from ATR MRI database of Japanese vowel production
Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda
Acoustical Science and Technology 30 ( 4 ) 288 - 296 2009.4
Joint Work
The ATR MRI database of Japanese vowel production was used to evaluate the acoustic characteristics of the vocal tract for the five Japanese vowels through the measurements of frequency responses from solid vocal-tract models formed by a stereolithographic technique. The database includes speech sounds as well as volumetric magnetic resonance imaging (MRI) data, but the speech sounds were recorded separately from the acquisition of the MRI data; therefore, their speech spectra are not appropriate for use as the reference for the transfer functions of the vocal tract. A time-stretched pulse signal generated from a horn driver unit was introduced into the physical model at the lips, and the response signals of the models were recorded at the model's glottis. In the measurements, the glottis of the models was sealed with a plastic plate, and the response signals were measured from a small hole in the plate using a probe microphone. This method permits accurate measurement of the transfer functions of the vocal tract under a closed-glottis condition. The resulting transfer functions of the five Japanese vowels provide a benchmark for testing numerical analysis methods that have been used to study vocal-tract acoustics, although the solid wall decreases the frequencies of lower resonances.
DOI: 10.1250/ast.30.288
-
Resonance characteristics of hypopharyngeal cavities
Kiyoshi Honda, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Seiji Adachi
Journal of the Acoustical Society of America 123 ( 5 ) 3731 2008.7
Joint Work
-
MRI-Based Study on Morphological and Acoustic Properties of Mandarin Sustained Vowels
WANG Gaowu, KITAMURA Tatsuya, LU Xugang, DANG Jianwu, KONG Jiangping
J Signal Process 12 ( 4 ) 311 - 314 2008.7
-
Deformation of the hypopharyngeal cavities due to F0 changes and its acoustic effects Reviewed
Hironori Takemoto, Tatsuya Kitamura, Kiyoshi Honda, Shinobu Masaki
Acoustical Science and Technology 2008.4
Joint Work
-
Single-matrix formulation of a time domain acoustic model of the vocal tract with side branches
MOKHTARI Parham, TAKEMOTO Hironori, KITAMURA Tatsuya
Speech Communication 50 ( 3 ) 179 - 190 2008.3
Joint Work
Publisher:ELSEVIER SCIENCE BV
Although it has been found that the piriform fossae play an important role in speech production and acoustics, the popular time domain articulatory synthesizer of [Maeda, S., 1982. A digital simulation method of the vocal-tract system. Speech Comm. 1 (3–4), 199–229] currently cannot include any more than one side branch to the acoustic tube that represents the main vocal tract. To overcome this limitation, in this paper we extended Maeda's (1982) simulation method, by mathematical reformulation in terms of a single-matrix equation having a system matrix that is both sparse and symmetric. Using vocal tract area functions measured by MRI, the simulation results showed that the piriform fossae suppress the energy in the higher frequencies by introducing spectral zeros around 4–5 kHz, and also tend to lower the second formant of vowels. These spectral changes agree with results produced using a well-tested frequency domain transmission-line method, thus validating our new formulation of the time domain synthesizer. The reformulation can be easily extended to accommodate any number of vocal tract side branches, thus enabling more realistic, physiologically correct acoustic simulation of speech production.
-
Effects of acoustic modification on perception of speaker characteristics for sustained vowels
Tatsuya Kitamura, Takeshi Saitou
Acoustical Science and Technology 2007.6
Joint Work
-
Vocal tract length perturbation and its application to male-female vocal tract shape conversion
Seiji Adachi, Hironori Takemoto, Tatsuya Kitamura, Parham Mokhtari, Kiyoshi Honda
Journal of the Acoustical Society of America 121 ( 6 ) 3874 - 3885 2007.6
Joint Work
Publisher:ACOUSTICAL SOC AMER AMER INST PHYSICS
An alternative and complete derivation of the vocal tract length sensitivity function, which is an equation for finding a change in formant frequency due to perturbation of the vocal tract length [Fant, Quarterly Progress and Status Rep. No. 4, Speech Transmission Laboratory, Kungliga Tekniska Högskolan, Stockholm, 1975, pp. 1-14], is presented. It is based on the adiabatic invariance of the vocal tract as an acoustic resonator and on the radiation pressure on the wall and at the exit of the vocal tract. An algorithm for tuning the vocal tract shape to match the formant frequencies to target values, such as those of a recorded speech signal, which was proposed in Story [J. Acoust. Soc. Am. 119, 715-718 (2006)], is extended so that the vocal tract length can also be changed. Numerical simulation of this extended algorithm shows that it can successfully convert between the vocal tract shapes of a male and a female for each of five Japanese vowels. (c) 2007 Acoustical Society of America.
DOI: 10.1121/1.2730743
-
Principal components of vocal tract area functions and inversion of speech by linear regression of cepstrum coefficient
Parham Mokhtari, Hironori Takemoto, Tatsuya Kitamura, Kiyoshi Honda
Journal of Phonetics 2007.1
Joint Work
-
A bone-conduction system for auditory stimulation in MRI Reviewed
Yukiko Nota, Tatsuya Kitamura, Hironori Takemoto, Hiroyuki Hirata, Kiyoshi Honda, Yasuhiro Shimada, Ichiro Fujimoto, Yuko Syakudo, Shinobu Masaki
Acoustical Science and Technology 2007.1
Joint Work
-
Principal components of vocal-tract area functions and inversion of vowels by linear regression of cepstrum coefficients
Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Kiyoshi Honda
Journal of Phonetics 35 ( 1 ) 20 - 39 2007.1
Joint Work
Publisher:ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
This paper addresses the following two hypotheses: (i) vocal-tract area functions of Japanese vowels can be accurately represented by a linear combination of only a few principal components which, furthermore, are similar to those reported in the literature for different languages; and (ii) the principal components' weights can be predicted and area functions thereby accurately estimated from acoustics by linear regression of cepstrum parameters. To test these hypotheses, synchronized acoustic and vocal-tract 3D MRI data were recorded from an adult male Japanese speaker for both sustained and dynamic vowel utterances. The first two principal components explained covariations in vocal-tract shape and length accounting for 94-97% of the total variance, and indeed provided a cross-linguistic validation of the two underlying components of vowel production emergent from the literature. Multiple linear regression models were then evaluated for their accuracy in reconstructing the area functions of the dynamic utterance by predicting the first two PC coefficients, using either carefully measured formants or cepstral coefficients defined in various frequency bands. The best formant-based regression model required all four formants, with a mean adjusted correlation of 0.93 and mean absolute errors of 0.187 cm² in area and 0.131 cm in vocal-tract length. The best cepstrum-based regression model prescribed 24 cepstral coefficients defined in the frequency band 0-4 kHz, with a mean adjusted correlation of 0.92 and mean absolute errors of 0.102 cm² in area and 0.082 cm in vocal-tract length. These results suggest that vowel production features, properly constrained by PCA modeling, can be mapped with sufficient accuracy from easily measured cepstrum parameters. More work is required to reduce the dependence on MRI data, to extend the applicability of these methods to different voice qualities and different speakers, and to select a smaller subset of acoustic parameters for more robust, real-time inversion. (c) 2006 Elsevier Ltd. All rights reserved.
-
An MRI-based time-domain speech synthesis system Reviewed
Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Toshio Hirai
Journal of the Acoustical Society of America 120 ( 5 ) 3037 2006.12
Joint Work
-
Changes in vocal tract resonance during a pitch cycle Reviewed
Tatsuya Kitamura, Seiji Adachi
Journal of the Acoustical Society of America 120 ( 5 ) 3351 2006.12
Joint Work
-
Measurements of MRI scanning noise by an optical microphone
Kitamura Tatsuya, Masaki Shinobu, Shimada Yasuhiro, Fujimoto Ichiro, Syakudo Yuko, Honda Kiyoshi
The Journal of the Acoustical Society of Japan 62 ( 5 ) 379 - 382 2006.5
-
Investigation of effectiveness to estimate vocal tract transfer functions by FEM
Nishimoto Hironori, Akagi Masato, Kitamura Tatsuya, Suzuki Noriko
The Journal of the Acoustical Society of Japan 62 ( 4 ) 306 - 315 2006.4
-
Acoustic roles of the laryngeal cavity in vocal tract resonance Reviewed
Hironori Takemoto, Seiji Adachi, Tatsuya Kitamura, Parham Mokhtari, Kiyoshi Honda
Journal of the Acoustical Society of America 2006.4
Joint Work
-
Cyclicity of laryngeal cavity resonance due to vocal fold vibration Reviewed
Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Parham Mokhtari, Kiyoshi Honda
Journal of the Acoustical Society of America 120 ( 4 ) 2239 - 2249 2006.4
Joint Work
Publisher:ACOUSTICAL SOC AMER AMER INST PHYSICS
Acoustic effects of the time-varying glottal area due to vocal fold vibration on the laryngeal cavity resonance were investigated based on vocal tract area functions and acoustic analysis. The laryngeal cavity consists of the vestibular and ventricular parts of the larynx, and gives rise to a regional acoustic resonance within the vocal tract, with this resonance imparting an extra formant to the vocal tract resonance pattern. Vocal tract transfer functions of the five Japanese vowels uttered by three male subjects were calculated under open- and closed-glottis conditions. The results revealed that the resonance appears in the frequency region from 3.0 to 3.7 kHz when the glottis is closed and disappears when it is open. Real spectra estimated from open- and closed-glottis periods of vowel sounds also showed the on-off pattern of the resonance within a pitch period. Furthermore, a time-domain acoustic analysis of vowels indicated that the resonance component could be observed as a pitch-synchronized rise-and-fall pattern of the bandpass amplitude. The cyclic nature of the resonance can be explained as the laryngeal cavity acting as a closed tube that generates the resonance during a closed-glottis period, but damps the resonance off during an open-glottis period.
DOI: 10.1121/1.2335428
-
Difference in vocal tract shape between upright and supine postures: Observations by an open-type MRI scanner
KITAMURA Tatsuya, TAKEMOTO Hironori, HONDA Kiyoshi, SHIMADA Yasuhiro, FUJIMOTO Ichiro, SYAKUDO Yuko, MASAKI Shinobu, KURODA Kagayaki, OKU-UCHI Noboru, SENDA Michio
Acoustical Science and Technology 26 ( 5 ) 465 - 468 2005.9
-
Individual variation of the hypopharyngeal cavities and its acoustic effects
Tatsuya Kitamura, Kiyoshi Honda, Hironori Takemoto
Acoustical Science and Technology 26 ( 1 ) 16 - 26 2005.1
Joint Work
-
A method of tooth superimposition on MRI data for accurate measurement of vocal tract shape and dimensions Reviewed
Hironori Takemoto, Tatsuya Kitamura, Hironori Nishimoto, Kiyoshi Honda
Acoustical Science and Technology 25 ( 6 ) 468 - 474 2004.11
Joint Work
-
Exploring human speech production mechanisms by MRI
Kiyoshi Honda, Hironori Takemoto, Tatsuya Kitamura, Satoru Fujita, Sayoko Takano
IEICE Transactions on Information and Systems E87-D ( 5 ) 1050 - 1058 2004.5
Joint Work
Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
Recent investigations using magnetic resonance imaging (MRI) of human speech organs have opened up new avenues of research. Visualization of the speech production system provides abundant information on the physiological and acoustic realization of human speech. This article summarizes the current status of MRI applications with respect to speech research as well as our own experience of discovery and re-evaluation of acoustic events emanating from the vocal tract and physiological mechanisms.
-
Development of a Japanese reading resource bank using the Internet
Yoshiko Kawamura, Tatsuya Kitamura
6 241 - 255 2001
-
Yoshiko Kawamura, Tatsuya Kitamura, Rei Hobara
24 7 - 12 2000.8
-
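Development and evaluation of a Japanese reading comprehension support system with a learning-history management function
Tatsuya Kitamura, Yoshiko Kawamura, Jun Uchiyama, Akemi Tera, Manabu Okumura
Japan Journal of Educational Technology 23 ( 3 ) 127 - 133 1999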
-
Significant cues in spectral envelope of isolated vowels for speaker identification
Kitamura Tatsuya, Akagi Masato
The Journal of the Acoustical Society of Japan 53 ( 3 ) 185 - 191 1997.3
Joint Work
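Publisher:Acoustical Society of Japan
We investigated the frequency bands of the spectral envelope of isolated vowels in which speaker individuality appears prominently, and the components in those bands that contribute to speaker identification. Through listening experiments using stimuli in which specific bands of the spectral envelope were modified, we obtained a quantitative relationship between spectral envelope modification and the perception of speaker individuality. The results showed the following. (1) Speaker individuality appears across the whole spectral envelope, but more strongly in the higher frequencies. (2) Peaks of the spectral envelope are more important than dips for speaker identification. (3) Regardless of the phoneme, speaker individuality is likely to appear prominently in the band at and above the peak located near 20 ERB rate (1,740 Hz), and speaker conversion is possible using this band. (4) Speaker individuality is preserved even when the peaks in this band are approximated by triangles.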
-
Speaker individualities in speech spectral envelopes
Tatsuya Kitamura, Masato Akagi
Journal of the Acoustical Society of Japan (E) 16 ( 5 ) 283 - 289 1995.9
Joint Work
The aim of the three psychoacoustic experiments described here was to clarify whether there are speaker individualities in the spectral envelopes, in which frequency bands such individualities exist, and how frequency bands having speaker individualities can be manipulated. The LMA analysis-synthesis system was used to prepare stimuli in which specific frequency bands were varied, and the frequency bands having speaker individualities were estimated experimentally. The results indicate that (1) speaker individualities exist in spectral envelopes, (2) these individualities are mainly at frequencies higher ...
DOI: 10.1250/ast.16.283
-
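A study of Viterbi best-first search in word speech recognition using discrete-distribution HMMs
Masaki Kohda, Tatsuya Kitamura
The Transactions of the IEICE D-II (Information and Systems II: Information Processing) 77 ( 7 ) 1187 - 1197 1994
Joint Work
Treating HMM-based speech recognition as a graph search problem, previous studies have used beam search techniques to examine pruning based only on the score up to the current node, as well as pruning that also takes into account an estimated score beyond the current node, obtained from recognition with a simpler model as in forward-backward search. Using best-first search techniques, practical search methods that do not necessarily adhere to strict A* search, such as stack decoding, and methods that speed up the search for N-best candidates, such as tree-trellis search, have also been examined. In this paper, we use best-first search techniques to speed up recognition by the HMM Viterbi algorithm, and propose two methods for setting the estimated score: one based on the maximum path score and one that uses simple phoneme HMMs. We show that, if the estimated score is set appropriately, Viterbi best-first search reduces the computation for path expansion, which accounts for the major part of recognition processing, to 1% or less without lowering the recognition rate, so the reduction in computation is very large. The estimated score using simple phoneme HMMs is accurate because it takes the temporal ordering into account, but setting it requires a large amount of computation. When both the computation for path expansion and that for setting the estimated score are considered, the estimated score based on the within-word maximum path score is the best. Because this estimated score satisfies the condition for A* search, the optimal solution is also guaranteed.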