Papers - KITAMURA Tatsuya
-
Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods
Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 408 - 412 2014
Joint Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
Acoustic characteristics of the vocal tract have been investigated extensively in the literature using a one-dimensional (1D) acoustic simulation method. Because the 1D method assumes plane wave propagation only, it is recognized to be valid only in the low frequency region (below about 4 or 5 kHz). Recently, a three-dimensional (3D) acoustic simulation method was developed, to obtain more precise acoustic characteristics of the vocal tract. In the present study, from a male's vocal tract shapes, transfer functions were calculated using the 1D and 3D methods and compared with each other to evaluate the valid frequency range of the 1D method. As a result, when acoustic effects of the piriform fossae were considered in the 1D method, the transfer functions agreed with each other up to 7 kHz (ignoring small dips). The 3D method showed that a deep dip was generated at around 8 kHz by the transverse resonance mode in the pharynx. Above this dip frequency, the transfer functions disagreed with each other. Thus, the 1D method is valid up to 7 kHz for this subject. Because this subject has a relatively large vocal tract, in general the upper limit of the valid frequency range could exceed 8 kHz.
-
Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information
Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 870 - 874 2014
Joint Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
A highly reproducible method for estimating vocal tract length (VTL) and a text-independent VTL estimation method are proposed, based on a Japanese vowel database spoken by 385 male and female speakers ranging in age from 6 to 56 and another vowel database with MRI-based vocal tract shape information. The proposed methods are based on an interference-free power spectral representation and systematic suppression of biasing factors. MRI data are used to calibrate the VTL estimates so that they are expressed in physically meaningful units. These databases are normalized based on the estimated VTL information to provide a reference template, which is used to implement the text-independent VTL estimation method. A prototype system for text-independent estimation of VTL is implemented in MATLAB and runs faster than real time on a PC.
-
Acoustic interaction between the right and left piriform fossae in generating spectral dips Reviewed
Hironori Takemoto, Seiji Adachi, Parham Mokhtari, Tatsuya Kitamura
Journal of the Acoustical Society of America 134 ( 4 ) 2955 - 2964 2013.10
Joint Work
Publisher:ACOUSTICAL SOC AMER AMER INST PHYSICS
It is known that the right and left piriform fossae generate two deep dips on speech spectra and that acoustic interaction exists in generating the dips: if only one piriform fossa is modified, both the dips change in frequency and amplitude. In the present study, using a simple geometrical model and measured vocal tract shapes, the acoustic interaction was examined by the finite-difference time-domain method. As a result, one of the two dips was lower in frequency than the two independent dips that appeared when either of the piriform fossae was occluded, and the other dip was higher in frequency than the two dips. At the lower dip frequency, the piriform fossae resonated almost in opposite phase, while at the higher dip frequency, they resonated almost in phase. These facts indicate that the piriform fossae and the lower part of the pharynx can be modeled as a coupled two-oscillator system whose two normal vibration modes generate the two spectral dips. When the piriform fossae were identical, only the higher dip appeared. This is because the lower mode is not acoustically coupled to the main vocal tract enough to generate an absorption dip. (C) 2013 Acoustical Society of America.
DOI: 10.1121/1.4818744
-
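Effects of prosody conversion of Japanese learners' speech on naturalness evaluation
Rongna A, Ryoko Hayashi, Tatsuya Kitamura
Proceedings of the 2013 Autumn Meeting of the Acoustical Society of Japan 425 - 426 2013.9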
Joint Work
-
Naturalness on Japanese pronunciation before and after shadowing training and prosody modified stimuli
Rongna A, Ryoko Hayashi, Tatsuya Kitamura
Proceedings of Interspeech 2013 Satellite workshop on Speech and Language Technology in Education 143 - 146 2013.8
Joint Work
-
Timing differences in articulation between voiced and voiceless stop consonants: An analysis of cine-MRI data
Masako Fujimoto, Tatsuya Kitamura, Hiroaki Hatano, Ichiro Fujimoto
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 955 - 958 2013
Joint Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
Laryngeal and supralaryngeal articulators coordinately work to produce speech sounds. In order to study differences in supralaryngeal manifestations of voiced and voiceless consonants, we compared the tongue movement during a minimal pair /agise/ and /akise/ using the fast scanning techniques of MRI movies. The result showed that the tongue displacement starts earlier in /k/ than in /g/ for many of the speakers of Tokyo Japanese. This agrees with our previous findings using other dialect speakers. These results suggest that many Japanese actively differentiate supralaryngeal articulation according to the voicing of the consonants, raising the tongue earlier in voiceless ones. This movement is presumably to ensure the voicelessness of the consonant. The present study also supplies evidence for the usefulness of a constructive approach for physical modeling.
-
Differences in articulatory movement between voiced and voiceless stop consonants Reviewed
Ryosuke O. Tachibana, Tatsuya Kitamura, Masako Fujimoto
Acoustical Science and Technology 33 ( 6 ) 391 - 393 2012.11
-
Measurement of vibration velocity pattern of facial surface during phonation using scanning vibrometer
Tatsuya Kitamura
Acoustical Science and Technology 33 ( 2 ) 126 - 128 2012.3
-
A Method for Predicting Stressed Words in Teaching Materials for English Jazz Chants Reviewed
NAGATA Ryo, FUNAKOSHI Kotaro, KITAMURA Tatsuya, NAKANO Mikio
IEICE Trans Inf Syst (Inst Electron Inf Commun Eng) E95.D ( 11 ) 2658 - 2663 2012
Joint Work
Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
To acquire a second language, one must develop an ear and tongue for the correct stress and intonation patterns of that language. In English language teaching, there is an effective method called Jazz Chants for working on the sound system. In this paper, we propose a method for predicting stressed words, which play a crucial role in Jazz Chants. The proposed method is specially designed for stress prediction in Jazz chants. It exploits several sources of information including words, POSs, sentence types, and the constraint on the number of stressed words in a chant text. Experiments show that the proposed method achieves an F-measure of 0.939 and outperforms the other methods implemented for comparison. The proposed method is expected to be useful in supporting non-native teachers of English when they teach chants to students and create chant texts with stress marks from arbitrary texts.
-
Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers
Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 402 - 405 2012
Joint Work
Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
We conducted quantitative analyses of a magnetic resonance imaging (MRI) database to examine the correlation between physical measures (vocal tract length and body height) and acoustic parameters (pitch and formant frequencies) of vowels. The vocal tract length was measured from MRI data for the five Japanese vowels produced by fifteen male Japanese speakers between the ages of 24 and 55. The acoustic features were computed from vowel sounds recorded during the scan. The vocal tract length showed a weak positive correlation with the speakers' age (correlation coefficient r = 0.51) but not with the speakers' body height (r = 0.08). There were only weaker correlations between the vocal tract length and the first four formant frequencies except that F1 and F2 of the vowel /e/ show negative correlations with the vocal tract length (F1: r = -0.65, F2: r = -0.56). The result suggests that the vocal tract length is one of the dominant factors causing individual differences in the formant frequencies for the vowel /e/, produced by not forming a strong constriction. Furthermore, the pitch frequency was negatively correlated with the body height (r = -0.61).
-
Simulation of the coupling between vocal-fold vibration and time-varying vocal tract
Yosuke Tanabe, Parham Mokhtari, Hironori Takemoto, Tatsuya Kitamura
Journal of the Acoustical Society of America 130 ( 4 ) 2441 2011.10
Joint Work
-
Study of perceptual factors for speaker identification focusing on perceptual similarity of speaker characteristics
Tsuyoshi Izumida, Tatsuya Kitamura
Acoustical Science and Technology 32 ( 5 ) 216 - 219 2011.9
-
Dental imaging using a magnetic resonance visible mouthpiece for measurement of vocal tract shape and dimension
Tatsuya Kitamura, Hironori Nishimoto, Ichiro Fujimoto, Yasuhiro Shimada
Acoustical Science and Technology 32 ( 5 ) 224 - 227 2011.9
Joint Work
-
Acoustic analysis of the vocal tract during vowel production by finite-difference time-domain method Reviewed
Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura
Journal of the Acoustical Society of America 128 ( 6 ) 3724 - 3738 2010.12
Joint Work
Publisher:ACOUSTICAL SOC AMER AMER INST PHYSICS
The vocal tract shape is three-dimensionally complex. For accurate acoustic analysis, a finite-difference time-domain method was introduced in the present study. By this method, transfer functions of the vocal tract for the five Japanese vowels were calculated from three-dimensionally reconstructed magnetic resonance imaging (MRI) data. The calculated transfer functions were compared with those obtained from acoustic measurements of vocal tract physical models precisely constructed from the same MRI data. Calculated transfer functions agreed well with measured ones up to 10 kHz. Acoustic effects of the piriform fossae, epiglottic valleculae, and inter-dental spaces were also examined. They caused spectral changes by generating dips. The amount of change was significant for the piriform fossae, while it was almost negligible for the other two. The piriform fossae and valleculae generated spectral dips for all the vowels. The dip frequencies of the piriform fossae were almost stable, while those of the valleculae varied among vowels. The inter-dental spaces generated very small spectral dips below 2.5 kHz for the high and middle vowels. In addition, transverse resonances within the oral cavity generated small spectral dips above 4 kHz for the low vowels.
DOI: 10.1121/1.3502470
-
Visualisation of hypopharyngeal cavities and vocal-tract acoustic modelling
Kiyoshi Honda, Tatsuya Kitamura, Hironori Takemoto, et al.
Computer methods in Biomechanics and Biomedical Engineering 13 ( 4 ) 443 - 453 2010.7
Joint Work
Publisher:TAYLOR & FRANCIS LTD
The hypopharyngeal cavities consist of the laryngeal cavity and bilateral piriform fossa, constituting the bottom part of the vocal tract near the larynx. Visualisation of these cavities with magnetic resonance imaging (MRI) techniques reveals that during speech, the laryngeal cavity takes the form of a long-neck flask and the piriform fossa takes the form of a goblet of varying shapes: the former diminishes greatly in whispering and the latter disappears during deep inhalation. These cavities have been shown to exert significant acoustic effects at higher frequency spectra. In this study, acoustic experiments were conducted for male and female mechanical vocal tracts with the result that acoustic effects of those cavities determine the frequency spectra above 2 kHz, giving rise to peaks and zeros. An acoustic model of vowel production was proposed with three components: voice source, hypopharyngeal cavities and vocal tract proper, which provides effective means in controlling voice quality and expressing individual vocal characteristics.
-
Yasuhiro Hamada, Tatsuya Kitamura, Masato Akagi
Journal of Signal Processing 14 ( 4 ) 265 - 268 2010.7
Joint Work
-
Similarity of effects of emotions on the speech organ configuration with and without speaking
Tatsuya Kitamura
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 909 - 912 2010
Single Work
Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC
In this work we propose and verify a hypothesis on emotional speech production: emotions induce physical and physiological changes in the whole body including changes in the configuration and physical/mechanical properties of the speech organs, regardless of whether or not the person is speaking, and as a side effect, this changes the voice quality. To verify this hypothesis, we measured the configuration of the speech organs of professional actors simulating four emotions (neutral, hot anger, joy, and sadness) with and without speaking by magnetic resonance imaging. The results clearly showed that emotions affect the speech organ configuration, and the same tendency of changes in the speech organ configuration was found regardless of whether or not the person was speaking. We also measured electromagnetic articulography data while a participant watched a relaxation or horror movie, and the result implies that emotional changes can deform the speech organ configuration even if the participant does not speak. These results support our hypothesis.
-
Transfer functions of solid vocal-tract models constructed from ATR MRI database of Japanese vowel production
Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda
Acoustical Science and Technology 30 ( 4 ) 288 - 296 2009.4
Joint Work
The ATR MRI database of Japanese vowel production was used to evaluate the acoustic characteristics of the vocal tract for the five Japanese vowels through the measurements of frequency responses from solid vocal-tract models formed by a stereolithographic technique. The database includes speech sounds as well as volumetric magnetic resonance imaging (MRI) data, but the speech sounds were recorded separately from the acquisition of the MRI data; therefore, their speech spectra are not appropriate for use as the reference for the transfer functions of the vocal tract. A time-stretched pulse signal generated from a horn driver unit was introduced into the physical model at the lips, and the response signals of the models were recorded at the model's glottis. In the measurements, the glottis of the models was sealed with a plastic plate, and the response signals were measured from a small hole in the plate using a probe microphone. This method permits accurate measurement of the transfer functions of the vocal tract under a closed-glottis condition. The resulting transfer functions of the five Japanese vowels provide a benchmark for testing numerical analysis methods that have been used to study vocal-tract acoustics, although the solid wall decreases the frequencies of lower resonances.
DOI: 10.1250/ast.30.288
-
Resonance characteristics of hypopharyngeal cavities
Kiyoshi Honda, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Seiji Adachi
Journal of the Acoustical Society of America 123 ( 5 ) 3731 2008.7
Joint Work