Presentations
-
Similarity of speaker individualities of sentence in ATR speech database set C
KAWAMOTO Hiroki, KITAMURA Tatsuya
IEICE technical report. Speech
Event date: 2013.2
We measured the perceptual similarity of speaker individuality for a sentence spoken by twenty male Japanese speakers in ATR speech database set C. Forty participants rated the perceptual similarity of the sentence for pairs of speakers. We obtained inter-speaker distances by multidimensional scaling analysis of the results of the perceptual experiments.
-
Measurement of temporal change of vocal tract volume during production of plosive and fricative consonants
KITAMURA Tatsuya, HATANO Hiroaki
IEICE technical report. Speech
Event date: 2012.11
The volume of the vocal tract of a male speaker during production of voiced and voiceless plosives and fricatives was measured directly from magnetic resonance imaging (MRI) data. Three-dimensional cine-MRI data of three-mora nonsense words were obtained by a synchronized sampling method, and the temporal change of the vocal tract volume was measured while there was a closure or a constriction at the alveolar ridge. The results showed that the volume of the vocal tract for the voiced plosive /d/ increased almost monotonically and was larger than that for the voiceless plosive /t/ throughout the closure section. The maximum value and rise range of the vocal tract volume for the voiced plosive /d/ were greater than those for the voiced fricative /z/.
-
Speaker normalization by local expansion and contraction of the vocal tract
KITAMURA Tatsuya, TAKEMOTO Hironori, ADACHI Seiji
IEICE technical report. Speech
Event date: 2010.2
Vocal tract area functions for the five Japanese vowels of six male speakers were tuned so that their first four formant frequencies were close to those of a target speaker. The vocal tract warping functions were obtained as the relationship between the original and deformed area functions. The results indicate that (1) the warping functions are not linear, (2) the vocal tract lengths of the deformed area functions differ from that of the target speaker, and (3) the shapes of the warping functions of the five vowels are not constant for each speaker.
-
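A6. MRI observation of changes in vocal tract shape caused by emotion (Research presentation, Abstracts of the 23rd General Meeting of the Phonetic Society of Japan, 2009)
KITAMURA Tatsuya
Journal of the Phonetic Society of Japan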
Event date: 2009.12
-
Overview of methods and techniques used in MRI-based speech production studies
TAKEMOTO Hironori, KITAMURA Tatsuya
IEICE technical report. Speech
Event date: 2009.6
MRI is a powerful tool for the study of speech production. Although MRI sequences specifically designed for vocal tract imaging have been developed and everyone can use them, MRI-based speech production studies are small in number. One possible reason is that information about MRI characteristics relating to vocal tract imaging and data processing techniques is limited. In this paper, we provide a brief overview of MRI sequences used in this field, that is, the conventional sequence, phonation synchronized sequence, and movie sequence. Next, we review MRI data processing techniques for each sequence and outline the transmission line model and the finite-difference time-domain method as acoustic analysis methods. Finally, we comment on future prospects concerning MRI-based speech production studies.
-
Acoustic characteristics of solid models based on vowel production MRI data
KITAMURA Tatsuya, TAKEMOTO Hironori, HONDA Kiyoshi
Technical report of IEICE. EA
Event date: 2007.11
"ATR MRI database of Japanese vowel production" was used to evaluate acoustic characteristics of realistic vocal tracts for five Japanese vowels through the measurements of frequency responses from vocal tract solid models formed by a stereo-lithographic technique. An optimized Aoshima's time-stretched pulse signal generated from a horn driver unit was introduced into the solid model at the lip end. The response signals of the models were recorded at the model's glottis. This method permits accurate measurement of acoustic characteristics of the vocal tract including the laryngeal cavity. The results provide a benchmark for testing numerical analysis methods that have been used to study vocal tract acoustics.
-
Analysis of imitated voice produced by a professional impersonator
KITAMURA Tatsuya
IEICE technical report. Speech
Event date: 2007.10
This study is a comparative survey of the voice produced by a professional impersonator imitating a target speaker, conducted in order to explore possible perceptual factors in the similarity of speaker characteristics. The results show that the mean pitch frequency (F0) of the impersonator is approximately 20 Hz higher than that of the target speaker and that the dynamics of the F0 contours of the two speakers closely resemble each other. The DFT spectra of the speakers are quite similar in shape and in the first, third, and fourth formant frequencies. Moreover, the difference between the amplitude levels of the first harmonic (H1) and the second harmonic (H2), a measure of the glottal source characteristics, is close between the speakers. In contrast, the second formant frequency and syllable duration of the imitated voice differ from those of the target voice.
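The H1-H2 measure mentioned in this abstract can be illustrated with a minimal sketch. It is not the analysis procedure of the report: the function names, the synthetic two-harmonic signal, and the assumption that F0 is already known are all hypothetical choices made for illustration.

```python
import cmath
import math


def harmonic_level_db(signal, sample_rate, freq):
    """Level in dB of the single DFT component evaluated at `freq` Hz."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / sample_rate)
              for k, x in enumerate(signal))
    return 20.0 * math.log10(abs(acc) / n)


def h1_minus_h2(signal, sample_rate, f0):
    """Difference in dB between the first and second harmonic levels."""
    return (harmonic_level_db(signal, sample_rate, f0)
            - harmonic_level_db(signal, sample_rate, 2.0 * f0))


# Hypothetical test signal: two harmonics, the second at half the amplitude.
sample_rate = 8000
f0 = 100.0
n_samples = 800  # an integer number of pitch periods, so there is no leakage
signal = [math.sin(2 * math.pi * f0 * k / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 2 * f0 * k / sample_rate)
          for k in range(n_samples)]

h1h2 = h1_minus_h2(signal, sample_rate, f0)
print(round(h1h2, 2))
```

With real speech, F0 would first have to be estimated, and the harmonic peaks would usually be searched for near F0 and 2·F0 rather than read at exact frequencies.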
-
A Method for Measuring Tooth Shape by Magnetic Resonance Imaging Using a Thermoplastic Elastomer Dental Mouthpiece
KITAMURA Tatsuya, HIRATA Hiroyuki, HONDA Kiyoshi, FUJIMOTO Ichiro, SHIMADA Yasuhiro, MASAKI Shinobu, NISHIKAWA Takafumi, FUKUI Kotaro, TAKANISHI Atsuo
IEICE technical report.
Event date: 2007.7
This work proposes a method for measuring tooth shape by magnetic resonance imaging (MRI) using a dental mouthpiece made of a thermoplastic elastomer. Because this material is blended with edible paraffin, it can be imaged with high signal intensity in MRI. In addition, the dental mouthpiece is formed in a vacuum in order to eliminate air bubbles and thereby obtain even MR images. The teeth, on the other hand, are imaged with low signal intensity that contrasts with the dental mouthpiece, thus enabling extraction of tooth shape from the MR images.
-
Vocal tract resonance under open-glottis condition
TAKEMOTO Hironori, KITAMURA Tatsuya, MOKHTARI Parham, ADACHI Seiji, HONDA Kiyoshi
IEICE technical report. Speech
Event date: 2007.3
Using area functions of the five Japanese vowels, glottal opening effects on the transfer function were examined by introducing a glottal impedance. Because the vocal tract resonance approached an open-tube resonance under the open-glottis condition, the first formant frequency increased. The fourth formant induced by the laryngeal cavity was shifted to a higher frequency and damped until it disappeared, because the laryngeal cavity resonance increased in frequency and attenuated. At the other formants, a node of volume velocity appeared at the junction between the laryngeal and pharyngeal cavities, and the vocal tract resonance could therefore be approximated by a closed-tube resonance of the vocal tract excluding the laryngeal cavity.
-
Effects of acoustic modifications on perception of speaker characteristics for sustained vowels
KITAMURA Tatsuya, SAITOU Takeshi
IEICE technical report. Speech
Event date: 2007.3
An interval scale for the contribution of acoustic properties to the perception of speaker identity was measured according to Thurstone's paired-comparison methodology. In the experiments, several acoustic properties of the sustained vowel /a/ uttered by 10 male speakers were modified, and their effects on the perceived closeness of speaker characteristics were investigated. An interval scale for the sound quality of the stimuli was also measured in order to check whether degradation of sound quality affects the results. The results showed that the perceptual contributions, in decreasing order, are the speech spectra in the higher frequency region, the frequency properties of the glottal source, the mean pitch frequency, and the time patterns of the amplitude and pitch frequency. This order indicates that the smaller the intra-speaker variation of a property, the more important it is to the perception of speaker identity. However, there is a strong positive correlation between the interval scales of closeness of speaker characteristics and sound quality of the stimuli, implying that sound quality might have affected the experimental results.
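As a minimal sketch of how paired-comparison judgments can be turned into an interval scale, the following assumes Thurstone's Case V model (equal, uncorrelated discriminal dispersions). The abstract does not state which case was used, and the function name and the count matrix are hypothetical.

```python
from statistics import NormalDist


def thurstone_case_v(wins):
    """Interval scale from paired-comparison counts (Case V assumption).

    wins[i][j] is the number of trials in which stimulus i was chosen
    over stimulus j. Returns one scale value per stimulus, shifted so
    that the lowest value is 0.
    """
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf
    scale = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # the diagonal contributes z = 0
            p = wins[i][j] / (wins[i][j] + wins[j][i])
            p = min(max(p, 0.01), 0.99)  # clip to keep z finite
            z_sum += inv_cdf(p)
        scale.append(z_sum / n)
    lowest = min(scale)
    return [s - lowest for s in scale]


# Hypothetical counts for three stimuli, 10 trials per pair.
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
scale = thurstone_case_v(wins)
print([round(s, 2) for s in scale])
```

The clipping of proportions to [0.01, 0.99] is one common workaround for unanimous judgments, chosen here for simplicity; other treatments exist.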
-
Cyclicity of laryngeal cavity resonance due to vocal fold vibration
KITAMURA Tatsuya, TAKEMOTO Hironori, ADACHI Seiji, MOKHTARI Parham, HONDA Kiyoshi
IEICE technical report. Speech
Event date: 2006.7
Acoustic effects of the time-varying glottal area due to vocal fold vibration on the laryngeal cavity resonance were investigated. Vocal tract transfer functions of the five Japanese vowels uttered by three male subjects were calculated under open- and closed-glottis conditions. The results revealed that the resonance appears in the frequency region from 3.0 to 3.7 kHz when the glottis is closed and disappears when it is open. Real spectra estimated from open- and closed-glottis periods of vowel sounds also showed the on-off pattern of the resonance within a pitch period. The cyclic nature of the resonance can be explained by the laryngeal cavity acting as a closed tube that generates the resonance during a closed-glottis period but damps the resonance during an open-glottis period.
-
Acoustic characteristics of the laryngeal cavity
TAKEMOTO Hironori, ADACHI Seiji, KITAMURA Tatsuya, HONDA Kiyoshi, MOKHTARI Parham
IEICE technical report. Speech
Event date: 2005.5
Resonant mode analysis was performed on area functions of the five Japanese vowels to investigate acoustic properties of the laryngeal cavity. Around the resonance frequency of the laryngeal cavity, a remarkable increase in volume velocity was observed at the junction between the laryngeal and pharyngeal cavities. This suggests that the vocal tract proper (i.e., superior to the laryngeal cavity) resonates like an open tube. In the present study, such resonance was found to occur at the fourth formant. By contrast, the low volume velocities observed at the other formants revealed that at those frequencies the junction could be considered as a closed end, with the vocal tract proper resonating as a closed tube.
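The closed-tube and open-tube behavior described in this abstract can be illustrated with the textbook resonance formulas for a uniform lossless tube. This is a sketch under stated assumptions, not the resonant mode analysis of the report: "closed tube" is read as closed at one end and open at the other, "open tube" as open at both ends, and the tube length and sound speed are hypothetical values.

```python
def closed_open_resonances(length_m, count, sound_speed=350.0):
    """Resonances of a uniform tube closed at one end and open at the other.

    f_n = (2n - 1) * c / (4 * L)
    """
    return [(2 * n - 1) * sound_speed / (4.0 * length_m)
            for n in range(1, count + 1)]


def open_open_resonances(length_m, count, sound_speed=350.0):
    """Resonances of a uniform tube open at both ends.

    f_n = n * c / (2 * L)
    """
    return [n * sound_speed / (2.0 * length_m)
            for n in range(1, count + 1)]


# Hypothetical 15 cm tube standing in for the vocal tract proper.
closed = closed_open_resonances(0.15, 3)
opened = open_open_resonances(0.15, 3)
print([round(f) for f in closed])
print([round(f) for f in opened])
```

Real vocal tracts are not uniform, so these values only indicate how the boundary condition at the junction shifts the resonance pattern.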
-
Measurement of changes of vocal tract shape by F_0 shift
KITAMURA Tatsuya, MOKHTARI Parham
Technical report of IEICE. EA
Event date: 2005.3
Effects of pitch frequency (F_0) shift on vocal tract shape were analyzed by volumetric magnetic resonance imaging (MRI). One male subject performed sustained productions of the Japanese vowels /a/ and /i/ while being asked to match their F_0 to pure tones of 110, 123, 130, 146, and 164 Hz. Comparison of the vocal tract area functions extracted from the MR images revealed that F_0 and the area function of the oral cavity show a strong negative correlation for the vowel /a/, and that F_0 and the area function of the pharyngeal cavity show a negative correlation for the vowel /i/.
-
Acoustic analysis of the vocal tract by FEM with voxel meshing
KITAMURA Tatsuya, TAKEMOTO Hironori, HONDA Kiyoshi
IEICE technical report. Speech
Event date: 2004.11
The finite element method (FEM) is applied to acoustic analysis of the vocal tracts of the five Japanese vowels. Finite element (FE) models were created by meshing vocal tract regions, extracted from volumetric MR images acquired during production of the vowels, with 2×2×2 mm voxel elements (cubic elements). This meshing method converts voxels in MRI volume data into finite elements, so meshing is easy even when the target region has a complex shape. In this study, peak frequencies of transfer functions of the FE models were compared with formant frequencies of speech data. The effects of the inter-dental spaces, the epiglottic vallecula, and the laryngeal tube on transfer functions of the FE models were also investigated. The results show that (1) the peak frequencies of the FE models roughly correspond to the formant frequencies of the speech data except for the vowel /u/, (2) the inter-dental spaces and the epiglottic vallecula cause dips in the transfer functions of the FE models, and (3) the laryngeal tube of the FE model of the vowel /a/ causes the fourth peak in the transfer functions of the FE models.
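The voxel meshing idea, in which each voxel inside the segmented region becomes one cubic element, can be sketched as follows. This is an illustrative sketch only: the function name, the nested-list mask layout, and the node ordering are assumptions and do not come from the report.

```python
def voxels_to_hex_mesh(mask, voxel_size=2.0):
    """Convert a 3-D voxel mask into cubic (hexahedral) elements.

    mask[z][y][x] is truthy for voxels inside the target region.
    Returns (nodes, elements): nodes is a list of (x, y, z) coordinates
    in the same unit as voxel_size, and each element is a tuple of
    8 node indices. Nodes on shared faces are created only once.
    """
    corner_offsets = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                      (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
    node_index = {}
    nodes = []
    elements = []
    for z, plane in enumerate(mask):
        for y, row in enumerate(plane):
            for x, inside in enumerate(row):
                if not inside:
                    continue
                element = []
                for dx, dy, dz in corner_offsets:
                    key = (x + dx, y + dy, z + dz)
                    if key not in node_index:
                        node_index[key] = len(nodes)
                        nodes.append(tuple(v * voxel_size for v in key))
                    element.append(node_index[key])
                elements.append(tuple(element))
    return nodes, elements


# Hypothetical mask: two neighboring voxels along the x axis.
mask = [[[1, 1]]]
nodes, elements = voxels_to_hex_mesh(mask)
print(len(nodes), len(elements))
```

Because every element is the same cube, no geometric fitting is needed, which is why a region with a complex outline is no harder to mesh than a simple one; the cost is a staircase-shaped boundary.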
-
Difference in vocal tract shape between upright and supine postures: Observations by an open-type MR scanner
KITAMURA Tatsuya, TAKEMOTO Hironori, HONDA Kiyoshi, SHIMADA Yasuhiro, FUJIMOTO Ichiro, SYAKUDO Yuko, MASAKI Shinobu, KURODA Kagayaki, OKU-UCHI Noboru, SENDA Michio
IEICE technical report. Speech
Event date: 2004.6
Midsagittal images were collected using an open-type magnetic resonance imaging scanner to examine possible effects of body posture on vowel articulation. Three male speakers performed sustained productions of five Japanese vowels in supine and upright body postures. Comparisons of data between the two conditions revealed that the tongue tends to be more retracted in the supine posture for back vowels, and that the soft palate and lips also showed effects of gravity. In the upright posture, the cervical spine and posterior pharyngeal wall were found to be more anterior relative to the hard palate, which suggests effects of head posture rather than of gravity. Acoustic data demonstrated major spectral differences in the frequency range above 1.5 kHz.
-
Estimation of transfer function of vocal tract extracted from MRI data by FEM
NISHIMOTO Hironori, AKAGI Masato, KITAMURA Tatsuya, SUZUKI Noriko
IEICE technical report. Speech
Event date: 2004.3
Vocal tract transfer functions (VTTFs) of 3-D vocal tract models were estimated using the finite element method (FEM) and the method proposed by Sondhi et al., which uses cross-sectional area functions of the vocal tract (referred to below as the 1-D model). Subjects were two Japanese males with normal vocal tracts and one Japanese male with oral lesions. The number of peaks in the VTTFs and the number of peaks in the spectral envelopes were compared. For the two normal vocal tracts, the number of peaks and the peak frequencies of the VTTFs estimated by the FEM and the 1-D model almost corresponded to those obtained from analysis of the speech waves. For the abnormal vocal tract, the number of peaks of the VTTF estimated by the FEM corresponded to that obtained from analysis of the speech waves, and the peak frequencies were close. In contrast, for the VTTF of the abnormal vocal tract estimated by the 1-D model, even the number of peaks differed from that obtained from analysis of the speech waves. The results indicate that the FEM can estimate a number of peaks and peak frequencies of the VTTF equivalent to those of the analyzed speech for both normal and abnormal vocal tracts, whereas the 1-D model can do so for normal vocal tracts only. This suggests that the FEM is useful for estimating transfer functions of complicated vocal tracts.
-
Comparison of measured and simulated transfer functions of vocal tract model
KITAMURA Tatsuya, NISHIMOTO Hironori, FUJITA Satoru, HONDA Kiyoshi
IEICE technical report. Speech
Event date: 2003.4
The aim of this study is to confirm the accuracy of transfer functions simulated by the finite element method (FEM). Transfer functions of a few simple acoustic tubes were examined using three methods: acoustic measurement, an electric circuit model, and FEM, and the resonance frequencies of these transfer functions were compared. The resonance frequencies obtained by the three methods were almost in agreement for a uniform tube. For replicas of the vocal tract physical models of Chiba and Kajiyama, the resonance frequencies simulated by FEM almost corresponded to those obtained by the electric circuit model. However, resonance peaks of the measured transfer functions were not evident in the frequency range above 3 kHz. This implies that the measurement method used in this study has some problems.
-
Influence of context and word order in the identification of focal prominence in Japanese dialogue
KITAMURA Tatsuya, ITOH Kayo, ITOH Toshihiko, KITAZAWA Shigeyoshi
IEICE technical report. Speech
Event date: 2002.4
This paper studies the influence of prosodic features, context, and word order on the identification of focused clauses in Japanese dialogue, using a psychoacoustic experiment. In the experiment, question and answer speech was used as stimuli. The questions were to create two different contexts in the stimuli, and the answers had focal prominence at different clauses and had different word orders. The experimental results indicate that (1) prosodic characteristics are more significant for focus identification, (2) context has some effect on identification, and (3) it is probable that the word order has some effect on identification.
-
Prosodic phrase labeling based on prosodic features for developing prosodic database
KITAMURA Tatsuya, ITOH Toshihiko, MOCHIZUKI Kazuya, KITAZAWA Shigeyoshi
IEICE technical report. Speech
Event date: 2002.1
A very detailed segmentation of prosodic phrases was carried out in order to construct a Japanese prosodic database. The database, referred to here as "Japanese Multext", contains read-style speech and spontaneous-style speech by three male speakers and three female speakers of the Tokyo dialect. The "prosodic phrase", which we introduced as the unit of segmentation, was defined and regarded as a unit of spoken language perception. For exact segmentation, the wide-band spectrum, the narrow-band spectrum, the fine speech waveform and fundamental frequency shapes, and the amplitude transitions of the higher-order formants were used to enumerate candidate points for segment boundaries. The exact boundary was determined by fine time adjustment in steps of the respective fundamental period of the speech. To maintain the consistency of the segmentation, one person carefully checked all the segments.
-
Three-dimensional analysis of vocal tract using MRI: Cases with tongue and mouth floor resection
KITAMURA Tatsuya, SUZUKI Noriko, SAITO Hiroto, MICHI Ken-ichi, TAKAHASHI Toshiyuki, AKAGI Masato, WAKUMOTO Masahiko
IEICE technical report. Speech
Event date: 2001.3
Magnetic resonance imaging (MRI) techniques were used to investigate the three-dimensional vocal tract shape of patients after tongue and mouth floor resection. The vocal tract shape during production of the vowel /i/ was analyzed. Subjects were two patients and two normal speakers. Vocal tract asymmetry with respect to the midsagittal plane was compared between the patients and the normal speakers. The results show that the patients' vocal tracts have marked asymmetry caused by the surgery. It is possible that the asymmetrical vocal tract shape causes the patients' abnormal voice.