Details of Database of Faculty members and Researchers

Presentations -

Visualization of the speech production system and AI-based analysis

北村達也

近畿音声言語研究会 (大阪大学) 2025.3 近畿音声言語研究会

Event date： 2025.3

Country：Japan

Effects of speech training focused on mouth shape on speech

設楽郁巳, 北村達也, 牧野桃子, 山根典子

日本音響学会春季研究発表会 (拓殖大学) 2024.3 日本音響学会

Event date： 2024.3

Country：Japan

Various visualization methods for the speech production system Invited

北村達也

日本音響学会春季研究発表会 (拓殖大学) 2024.3 日本音響学会

Event date： 2024.3

Country：Japan

A study of temporal lengthening of Japanese vowels and articulatory movements using EMA

白勢彩子, 北村達也

日本音響学会春季研究発表会 (拓殖大学) 2024.3 日本音響学会

Event date： 2024.3

Country：Japan

An attempt to observe vocal fold vibration in speakers with pop-out voices

北村達也, 榊原健一, 内尾紀彦, 山中絢太, 能田由紀子, 天野成昭

日本音響学会春季研究発表会 (拓殖大学) 2024.3 日本音響学会

Event date： 2024.3

Country：Japan

Effects of turn-taking latency and turn-taking signals in dialogue with a communication robot

櫻井裕真, 榎本佐知子, 北村達也, 梅谷智弘

CNR研究会 (オンライン) 2024.1

Country：Japan

A Web search support system using a communication robot

谷川創太郎, 筒井大翔, 山泰斗, 北村達也, 梅谷智弘

第24回計測自動制御学会システムインテグレーション部門講演会 (新潟) 2023.12 計測自動制御学会

Country：Japan

A study of the clinical effects of the voice training support system "Smart Tube"

川村直子, 北村達也, 前川圭子

日本音声言語医学会学術講演会 (倉敷市) 2023.10 日本音声言語医学会

Event date： 2023.10

Country：Japan

On contrast imaging of palate shape using ultrasound imaging

北村達也, 孫静, 林良子, 能田由紀子, 前川喜久雄

日本音響学会秋季研究発表会 (名古屋市) 2023.9 日本音響学会

Event date： 2023.9

Country：Japan

Development of a game app to improve the continuity of voice therapy

村井武人, 川村直子, 北村達也

日本音響学会秋季研究発表会 (名古屋市) 2023.9 日本音響学会

Event date： 2023.9

Country：Japan

Evaluation of the voice training support system "Smart Tube" for patients with voice disorders

川村直子, 北村達也, 前川圭子

日本音響学会秋季研究発表会 (名古屋市) 2023.9 日本音響学会

Event date： 2023.9

Country：Japan

Analysis of vowel vocal tract shapes during production of pop-out voices

相馬, 深澤, 竹本, 北村, 天野

日本音響学会秋季研究発表会 (名古屋市) 2023.9 日本音響学会

Event date： 2023.9

Country：Japan

A study of a voice training method using mouth-corner height as an index

設楽郁巳, 安田奈央, 北村達也, 牧野桃子, 山根典子

日本音声学会全国大会 (札幌市) 2023.9 日本音声学会

Event date： 2023.9

Country：Japan

Rethinking how to make usable speech materials: preparation, recording, analysis, and more

河原英紀, 榊原健一, 水町光徳, 矢田部浩平, 北村達也, 森勢将雅

日本音響学会音声コミュニケーション研究会 (オンライン) 2023.9 日本音響学会

Event date： 2023.9

Country：Japan

Development and Evaluation of an Internet of Things Device and Social Networking Service-Based e-Health System for Home Practice of Straw Phonation

IALP2023 2023.8

Event date： 2023.8

Country：New Zealand

Field evaluation of a public library service robot using cloud AI

榎本佐知子, 筒井大翔, 北村達也, 梅谷智弘

ROBOMEC2023 (名古屋市) 2023.6 日本機械学会ロボティクス・メカトロニクス部門

Event date： 2023.6 - 2023.7

Country：Japan

An attempt to introduce gamification into a tube phonation training support system

村井武人, 北村達也, 川村直子

音学シンポジウム (電気通信大学) 2023.6

Event date： 2023.6

Singing voice range profiling toolbox with real-time interaction and its application to make recording data reusable

Hideki Kawahara, Ken-Ichi Sakakibara, Kohei Yatabe, Mitsunori Mizumachi and Tatsuya Kitamura

SOUND AND MUSIC COMPUTING CONFERENCE 2023 TOGETHER WITH STOCKHOLM MUSIC ACOUSTIC CONFERENCE 2023 (Stockholm) 2023.6

Country：Sweden

Singing voice profiling toolbox with real-time interaction and its application to make recording data reusable

Stockholm Music Acoustics Conference 2023 (SMAC 2023) 2023.6

Event date： 2023.6

Country：Sweden

An attempt to extract voice-quality descriptors for female voice actors

安田茉, 北村達也

日本音響学会音声研究会 2023.3

Differences in facial landmark displacement during sentence utterance depending on speech training experience

安田奈央, 北村達也

日本音響学会春季研究発表会 (オンライン) 2023.3 日本音響学会

Country：Japan

Evaluation of the usefulness of an SNS-based rehabilitation support system for voice disorders

川村直子, 北村達也

日本音響学会春季研究発表会 2023.3

Correction of probe position in observation of articulatory movements with an ultrasound diagnostic device

大山陣, 北村達也, 孫静, 林良子

日本音響学会春季研究発表会 (オンライン) 2023.3 日本音響学会

Country：Japan

A study of a speech training support system that provides feedback on facial movements

設楽郁巳, 関和広, 北村達也

日本音響学会春季研究発表会 (オンライン) 2023.3 日本音響学会

Country：Japan

An attempt to control humanoid manzai robots using RT-Middleware

都出若那, 中村紘稀, 北村達也, 梅谷智弘

計測自動制御学会関西支部・システム制御情報学会シンポジウム 2023.1

Making the campus intelligent with AI dialogue robots

陸鳴宇, 筒井大翔, 梅谷智弘, 北村達也

第15回サイエンスフェア in 兵庫 2023.1

Operational evaluation of a library help desk support robot using spoken dialogue

梅谷智弘, 北村達也

RSJ2022 (東京都) 2022.9 日本ロボット学会

Event date： 2022.9

Country：Japan

Word occurrence frequency in children's educational programs on NHK E-Tele

北村達也, 川村よし子

言語資源ワークショップ2022 2022.9

Perceptual Evaluation of Penetrating Voices through a Semantic Differential Method

Tatsuya Kitamura, Naoki Kunimoto, Hideki Kawahara, Shigeaki Amano

INTERSPEECH 2022 (online) 2022.9 ISCA

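Changes in vocal fold vibration characteristics and vocal tract acoustic characteristics accompanying intra-speaker variation of pop-out voice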

北村達也, 能田由紀子, 榊原健一, 河原英紀, 天野成昭

日本音響学会秋季研究発表会 2022.9 日本音響学会

Country：Japan

A method for measuring the palate and occlusal plane for articulatory movement measurement with a magnetic sensor system

能田由紀子, 北村達也, 竹本浩典, 前川喜久雄

日本音響学会秋季研究発表会 2022.9 日本音響学会

Country：Japan

Feasibility of verifying pop-out attributes by extended speech morphing

河原英紀, 森勢将雅, 榊原健一, 北村達也, 牧勝弘

日本音響学会秋季研究発表会 2022.9 日本音響学会

Country：Japan

Detection of pop-out voices at various signal-to-noise ratios

北原真冬, 田嶋圭一, 米山聖子, 北村達也, 河原英紀, 天野成昭

日本音響学会秋季研究発表会 2022.9 日本音響学会

Country：Japan

Clarifying the features of pop-out voices to improve the reliability of speech information transmission

天野成昭, 河原英紀, 牧勝弘, 北村達也, 山川仁子, 能田由紀子

日本音響学会秋季研究発表会 2022.9 日本音響学会

Country：Japan

A study of the acoustic characteristics of simplified shape models of the nasal cavity and paranasal sinuses

伯田亜海, 竹本浩典, 北村達也

日本音響学会秋季研究発表会 2022.9 日本音響学会

Country：Japan

Contour extraction from rtMRI videos using a two-stage model

藤澤流以, 堀井千陽, 天野沢海, 竹本浩典, 北村達也, 能田由紀子, 前川喜久雄

日本音響学会秋季研究発表会 2022.9 日本音響学会

Country：Japan

Analysis of vocal tract shapes for /k/ in the midsagittal plane of 10 Japanese speakers

天野沢海, 藤澤流以, 竹本浩典, 北村達也, 能田由紀子, 前川喜久雄

日本音響学会秋季研究発表会 2022.9 日本音響学会

Country：Japan

Vocal-Tract Area Functions with Articulatory Reality for Tract Opening International coauthorship

Zhao Zhang, Ju Zhang, Jianguo Wei, Kiyoshi Honda, Tatsuya Kitamura

INTERSPEECH 2022 (online) 2022.9 ISCA

An objective test tool for pitch extractors' response attributes

Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise

INTERSPEECH 2022 (online) 2022.9 ISCA

Word occurrence frequency in children's educational programs on NHK E-Tele

北村達也, 川村よし子

言語資源ワークショップ2022 (オンライン) 2022.8 国立国語研究所

Event date： 2022.8

Country：Japan

Prototype of a compact visual feedback system to support self-practice of tube phonation

川村直子, 北村達也

日本言語聴覚学会 (新潟市) 2022.6 日本言語聴覚学会

Event date： 2022.6

Country：Japan

Recording of children's speech and lip movements during the COVID-19 pandemic

北村達也, 白勢彩子

音学シンポジウム2022 (オンライン) 2022.6

Country：Japan

Prototype of a library help desk support robot using cloud speech APIs

梅谷智弘, 的場瞳, 榎本佐知子, 陸鳴宇, 北村達也

ROBOMECH2022 (札幌市) 2022.6 日本機械学会

Country：Japan

Let's examine speech processing with modulation frequency transfer characteristics and frequency response

河原英紀 (和歌山大学), 矢田部浩平 (東京農工大学), 榊原健一 (北海道医療大学), 北村達也 (甲南大学), 坂野秀樹 (名城大学), 森勢将雅 (明治大学)

音学シンポジウム2022 (オンライン) 2022.6

Country：Japan

An attempt to evaluate the fluency of articulatory movements using jerk

大上舞歌, 北村達也

日本音響学会音声研究会 2022.3

Prototype of a voice practice support system with an SNS-based prompting function

川村直子, 北村達也

日本音響学会音声研究会 2022.3

Analysis of the characteristics of penetrating voices by the semantic differential method

國本尚輝, 北村達也, 河原英紀, 天野成昭

日本音響学会関西支部若手の会 (オンライン) 2021.12

Observation of articulatory movements of speakers who are aware of difficulty in speaking

北村達也, 能田由紀子, 吐師道子

日本音声学会研究例会 2021.12 日本音声学会

Operation of "Smart Tube", a voice rehabilitation support system based on tube phonation

川村直子, 北村達也

日本音響学会関西支部若手の会 (オンライン) 2021.12 日本音響学会

A study of speech processes related to awareness of speaking difficulty in healthy speakers

古田尚久, 北村達也, 林良子, 能田由紀子, 鵜木祐史

日本音響学会聴覚研究会 2021.11

Observation of articulatory movements of disfluency in a person-name recall task

林良子, 孫静, 北村達也, 定延利之

日本音響学会春季研究発表会 (オンライン) 2021.3 日本音響学会

Country：Japan

On a prototype tool for measuring acoustic systems using music and various other audio content

河原英紀（和歌山大）・矢田部浩平（早大）・榊原健一（北海道医療大）・水町光徳（九工大）・北村達也（甲南大）・坂野秀樹（名城大）

応用音響研究会 2021.3

A study of sound pressure calibration using smart devices and similar equipment in speech recording

河原英紀, 榊原健一, 水町光徳, 北村達也

日本音響学会聴覚研究会 2021.3 日本音響学会

Analysis of vowel vocal tract shapes in the midsagittal plane of 10 Japanese speakers

天野沢海, 竹本浩典, 北村達也, 能田由紀子, 前川喜久雄

日本音響学会春季研究発表会 (オンライン) 2021.3 日本音響学会

Country：Japan

A study of sound pressure calibration using smart devices and similar equipment in speech recording

河原英紀, 榊原健一, 水町光徳, 北村達也

日本音響学会聴覚研究会 2021.1

Awareness of speaking difficulty and speech training

北村達也, 川村直子, 能田由紀子, 吐師道子

日本音響学会騒音・振動研究会

Event date： 2019.11

A questionnaire survey on awareness and actual conditions of speaking difficulty

北村達也, 能田由紀子, 吐師道子, 竹本浩典

聴覚研究会資料

Event date： 2018.1

Individual differences in the occurrence frequency of delayed pitch fall (oso-sagari) in word utterances

北村達也, 波多野博顕

音声研究

Event date： 2017.8

A method of measuring articulatory space using NDI Wave speech research system

KITAMURA Tatsuya, NOTA Yukiko, HASHI Michiko, HATANO Hiroaki

IEICE technical report. Speech The Institute of Electronics, Information and Communication Engineers

Event date： 2014.11

The Wave speech research system of Northern Digital Inc. is a type of electromagnetic articulograph that can track the position of small sensors placed on the articulators. In the present study, we measured the individual articulatory space of subjects on the basis of the occlusal plane and the shape of the palate. We made a biteplate with four five-degree-of-freedom sensors, mounted an impression of the upper teeth and the palate on it, and used it to measure the occlusal plane. The xz-plane was defined by the longitudinal and lateral axes of the biteplate, and the y-axis was defined as the surface normal of the xz-plane on the mid-sagittal plane. The shape of the palate was also measured by tracing the palate of the impression with a pen-style sensor or palate probe and was mapped into the articulatory space.

Effects of Shift of Pitch Frequency on Perception of Speaker Individualities

KITAMURA Tatsuya, KAWAMOTO Hiroki

Technical report of IEICE. EA

Event date： 2013.7

This study investigated the effects of shifting the pitch frequency of sentence speech uttered by five male speakers on perceptual speaker identification. The stimuli used in the experiments were re-synthesized speech whose pitch frequency was shifted from -6 to +6 semitones in increments of 2 semitones. In the experiments, participants were asked to judge whether the speakers of the stimuli were the same or not. The results showed that a 2-semitone shift of the pitch frequency does not affect speaker identification. This perceptual characteristic is different from that for the pitch height of the sentence speech.

Improvement of sensors for the NDI Wave speech observation system

KITAMURA TATSUYA, NOTA YUKIKO, HATANO HIROAKI, HASHI MICHIKO, NISHITANI MINORU

情報処理学会研究報告(Web)

Event date： 2013.5

Similarity of speaker individualities of sentence in ATR speech database set C

KAWAMOTO Hiroki, KITAMURA Tatsuya

IEICE technical report. Speech

Event date： 2013.2

We measured the perceptual similarity of speaker individualities for a sentence spoken by twenty male Japanese speakers in ATR speech database set C. Forty participants evaluated the perceptual similarity of the sentence for pairs of speakers. We obtained inter-speaker distances by multidimensional scaling analysis on the basis of the results of the perceptual experiments.

Measurement of temporal change of vocal tract volume during production of plosive and fricative consonants

KITAMURA Tatsuya, HATANO Hiroaki

IEICE technical report. Speech

Event date： 2012.11

The volume of the vocal tract of a male speaker during production of voiced and voiceless plosives and fricatives was measured directly from magnetic resonance imaging (MRI) data. Three-dimensional cine-MRI data of three-mora nonsense words were obtained by a synchronized sampling method, and the temporal change of the vocal tract volume was measured while there was a closure or a constriction at the alveolar ridge. The results showed that the volume of the vocal tract for the voiced plosive /d/ increased almost monotonically, and the volume was larger than that for the voiceless plosive /t/ throughout the closure section. The maximum value and rise range of the vocal tract volume for the voiced plosive /d/ were greater than those for the voiced fricative /z/.

Speaker normalization by local expansion and contraction of the vocal tract

KITAMURA Tatsuya, TAKEMOTO Hironori, ADACHI Seiji

IEICE technical report. Speech

Event date： 2010.2

Vocal tract area functions for the five Japanese vowels of six male speakers were tuned so that their first four formant frequencies were close to those of a target speaker. The vocal tract warping functions were obtained as the relationship between the original and deformed area functions. The results indicate that (1) the warping functions are not linear, (2) the vocal tract lengths of the deformed area functions differ from that of the target speaker, and (3) the shapes of the warping functions of the five vowels are not constant for each speaker.

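A6. MRI observation of changes in vocal tract shape due to emotion (Research presentation, Abstracts of the Phonetic Society's 2009 (23rd) National Convention)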

北村達也

音声研究

Event date： 2009.12

Overview of methods and techniques used in MRI-based speech production studies

TAKEMOTO Hironori, KITAMURA Tatsuya

IEICE technical report. Speech

Event date： 2009.6

MRI is a powerful tool for the study of speech production. Although MRI sequences specifically designed for vocal tract imaging have been developed and everyone can use them, MRI-based speech production studies are small in number. One possible reason is that information about MRI characteristics relating to vocal tract imaging and data processing techniques is limited. In this paper, we provide a brief overview of MRI sequences used in this field, that is, the conventional sequence, phonation synchronized sequence, and movie sequence. Next, we review MRI data processing techniques for each sequence and outline the transmission line model and the finite-difference time-domain method as acoustic analysis methods. Finally, we comment on future prospects concerning MRI-based speech production studies.

Acoustic characteristics of solid models based on vowel production MRI data

KITAMURA Tatsuya, TAKEMOTO Hironori, HONDA Kiyoshi

Technical report of IEICE. EA

Event date： 2007.11

"ATR MRI database of Japanese vowel production" was used to evaluate acoustic characteristics of realistic vocal tracts for five Japanese vowels through the measurements of frequency responses from vocal tract solid models formed by a stereo-lithographic technique. An optimized Aoshima's time-stretched pulse signal generated from a horn driver unit was introduced into the solid model at the lip end. The response signals of the models were recorded at the model's glottis. This method permits accurate measurement of acoustic characteristics of the vocal tract including the laryngeal cavity. The results provide a benchmark for testing numerical analysis methods that have been used to study vocal tract acoustics.

Analysis of imitated voice produced by a professional impersonator

KITAMURA Tatsuya

IEICE technical report. Speech

Event date： 2007.10

This study is a comparative survey of the voice produced by a professional impersonator imitating a target speaker, in order to explore possible perceptual factors of similarity of speaker characteristics. The results show that the mean pitch frequency (F0) of the impersonator is approximately 20 Hz higher than that of the target speaker and that the dynamics of the F0 contours of the two speakers closely resemble each other. The DFT spectra of the speakers are quite similar in their shape and in the first, third, and fourth formant frequencies. Moreover, the difference between the amplitude levels of the first harmonic (H1) and the second harmonic (H2), a measure of the glottal source characteristics, is close between the speakers. In contrast, the second formant frequency and syllable duration of the imitated voice differ from those of the target voice.

A Method for Measuring Tooth Shape by Magnetic Resonance Imaging Using a Thermoplastic Elastomer Dental Mouthpiece

KITAMURA Tatsuya, HIRATA Hiroyuki, HONDA Kiyoshi, FUJIMOTO Ichiro, SHIMADA Yasuhiro, MASAKI Shinobu, NISHIKAWA Takafumi, FUKUI Kotaro, TAKANISHI Atsuo

IEICE technical report.

Event date： 2007.7

This work proposes a method for measuring tooth shape by magnetic resonance imaging (MRI) using a dental mouthpiece made of a thermoplastic elastomer. Because this material is blended with edible paraffin, it is imaged with high signal intensity in MRI. The dental mouthpiece is also formed in a vacuum in order to eliminate air bubbles and thereby obtain even MR images. The teeth, on the other hand, are imaged with low signal intensity that contrasts with the dental mouthpiece, thus enabling extraction of tooth shape from the MR images.

Vocal tract resonance under open-glottis condition

TAKEMOTO Hironori, KITAMURA Tatsuya, MOKHTARI Parham, ADACHI Seiji, HONDA Kiyoshi

IEICE technical report. Speech

Event date： 2007.3

Using area functions of the five Japanese vowels, glottal opening effects on the transfer function were examined by introducing a glottal impedance. Because the vocal tract resonance approached an open-tube resonance under the open-glottis condition, the first formant frequency increased. The fourth formant induced by the laryngeal cavity was shifted to a higher frequency and damped until it disappeared, because the laryngeal cavity resonance increased in frequency and attenuated. At the other formants, a node of volume velocity appeared at the junction between the laryngeal and pharyngeal cavities, and the vocal tract resonance could therefore be approximated by a closed-tube resonance of the vocal tract excluding the laryngeal cavity.

Effects of acoustic modifications on perception of speaker characteristics for sustained vowels

KITAMURA Tatsuya, SAITOU Takeshi

IEICE technical report. Speech

Event date： 2007.3

An interval scale for the contribution of acoustic properties to the perception of speaker identity was measured according to Thurstone's paired-comparison methodology. In the experiments, several acoustic properties of the sustained vowel /a/ uttered by 10 male speakers were modified, and the effects on the perception of closeness of speaker characteristics were investigated. An interval scale for the sound quality of the stimuli was also measured in order to confirm whether degradation of sound quality affects the results. The results showed that, in decreasing order of perceptual contribution, the properties rank as follows: the speech spectrum in the higher frequency region, the frequency properties of the glottal source, the mean pitch frequency, and the time patterns of the amplitude and pitch frequency. This order indicates that the smaller the intra-speaker variation of a property, the more important it is to the perception of speaker identity. However, there is a strong positive correlation between the interval scales of closeness of speaker characteristics and sound quality of the stimuli, implying that sound quality might have affected the experimental results.

Cyclicity of laryngeal cavity resonance due to vocal fold vibration

KITAMURA Tatsuya, TAKEMOTO Hironori, ADACHI Seiji, MOKHTARI Parham, HONDA Kiyoshi

IEICE technical report. Speech

Event date： 2006.7

Acoustic effects of the time-varying glottal area due to vocal fold vibration on the laryngeal cavity resonance were investigated. Vocal tract transfer functions of the five Japanese vowels uttered by three male subjects were calculated under open- and closed-glottis conditions. The results revealed that the resonance appears in the frequency region from 3.0 to 3.7 kHz when the glottis is closed and disappears when it is open. Real spectra estimated from open- and closed-glottis periods of vowel sounds also showed the on-off pattern of the resonance within a pitch period. The cyclic nature of the resonance can be explained as the laryngeal cavity acting as a closed tube that generates the resonance during a closed-glottis period, but damps the resonance off during an open-glottis period.

Acoustic characteristics of the laryngeal cavity

TAKEMOTO Hironori, ADACHI Seiji, KITAMURA Tatsuya, HONDA Kiyoshi, MOKHTARI Parham

IEICE technical report. Speech

Event date： 2005.5

Resonant mode analysis was performed on area functions of the five Japanese vowels to investigate acoustic properties of the laryngeal cavity. Around the resonance frequency of the laryngeal cavity, a remarkable increase in volume velocity was observed at the junction between the laryngeal and pharyngeal cavities. This suggests that the vocal tract proper (i.e., superior to the laryngeal cavity) resonates like an open tube. In the present study, such resonance was found to occur at the fourth formant. By contrast, the low volume velocities observed at the other formants revealed that at those frequencies the junction could be considered as a closed end, with the vocal tract proper resonating as a closed tube.

Measurement of changes of vocal tract shape by F_0 shift

KITAMURA Tatsuya, MOKHTARI Parham

Technical report of IEICE. EA

Event date： 2005.3

Effects of pitch frequency (F_0) shift on vocal tract shape were analyzed by volumetric magnetic resonance imaging (MRI). One male subject performed sustained productions of the Japanese vowels /a/ and /i/ while adjusting his F_0 to pure tones of 110, 123, 130, 146, and 164 Hz. Comparison of the vocal tract area functions extracted from the MR images revealed that F_0 and the area function of the oral cavity show a strong negative correlation for the vowel /a/, and that F_0 and the area function of the pharyngeal cavity show a negative correlation for the vowel /i/.

Acoustic analysis of the vocal tract by FEM with voxel meshing

KITAMURA Tatsuya, TAKEMOTO Hironori, HONDA Kiyoshi

IEICE technical report. Speech

Event date： 2004.11

A finite element method (FEM) is applied to acoustic analysis of the vocal tracts of the five Japanese vowels. Finite element (FE) models were created by meshing vocal tract regions, extracted from volumetric MR images acquired during production of the vowels, with 2×2×2 mm voxel elements (cubic elements). This meshing method converts voxels in MRI volume data into finite elements, so meshing is easy even when the target region has a complex shape. In this study, peak frequencies of transfer functions of the FE models were compared with formant frequencies of speech data. The effects of the inter-dental spaces, the epiglottic vallecula, and the laryngeal tube on transfer functions of the FE models were also investigated. The results show that (1) the peak frequencies of the FE models roughly correspond to the formant frequencies of the speech data except for the vowel /u/, (2) the inter-dental spaces and the epiglottic vallecula cause dips in the transfer functions of the FE models, and (3) the laryngeal tube of the FE model of the vowel /a/ causes the fourth peak in the transfer function of that model.

Difference in vocal tract shape between upright and supine postures: Observations by an open-type MR scanner

KITAMURA Tatsuya, TAKEMOTO Hironori, HONDA Kiyoshi, SHIMADA Yasuhiro, FUJIMOTO Ichiro, SYAKUDO Yuko, MASAKI Shinobu, KURODA Kagayaki, OKU-UCHI Noboru, SENDA Michio

IEICE technical report. Speech

Event date： 2004.6

Midsagittal images were collected using an open-type magnetic resonance imaging scanner to examine possible effects of body posture on vowel articulation. Three male speakers performed sustained productions of five Japanese vowels in supine and upright body postures. Comparisons of data between the two conditions revealed that the tongue tends to be more retracted in the supine posture for back vowels, and that the soft palate and lips also showed effects of gravity. In the upright posture, the cervical spine and posterior pharyngeal wall were found to be more anterior relative to the hard palate, which suggests effects of head posture rather than of gravity. Acoustic data demonstrated major spectral differences in the frequency range above 1.5 kHz.

Estimation of transfer function of vocal tract extracted from MRI data by FEM

NISHIMOTO Hironori, AKAGI Masato, KITAMURA Tatsuya, SUZUKI Noriko

IEICE technical report. Speech

Event date： 2004.3

Vocal tract transfer functions (VTTFs) of 3-D vocal tract models were estimated using the finite element method (FEM) and the method proposed by Sondhi et al., which uses cross-sectional area functions of the vocal tract (the 1-D model). The subjects were two Japanese males with normal vocal tracts and one Japanese male who had oral lesions. The number of peaks of the VTTFs and the number of peaks of the spectral envelopes were compared. For the two normal vocal tracts, the number of peaks and the peak frequencies of the VTTFs estimated by the FEM and the 1-D model almost corresponded to those obtained from analysis of the speech waves. For the abnormal vocal tract, the number of peaks of the VTTF estimated by the FEM corresponded to that obtained from analysis of the speech waves, and the peak frequencies were close. In contrast, for the VTTF of the abnormal vocal tract estimated by the 1-D model, even the number of peaks differed from that obtained from analysis of the speech waves. The results indicate that the FEM can estimate the number of peaks and the peak frequencies of the VTTF in agreement with the analysis results for both normal and abnormal vocal tracts, whereas the 1-D model can do so only for normal vocal tracts. This suggests that the FEM is useful for estimating transfer functions of complicated vocal tracts.

Comparison of measured and simulated transfer functions of vocal tract model

KITAMURA Tatsuya, NISHIMOTO Hironori, FUJITA Satoru, HONDA Kiyoshi

IEICE technical report. Speech

Event date： 2003.4

The aim of this study is to confirm the accuracy of transfer functions simulated by the finite element method (FEM). Transfer functions of a few simple acoustical tubes were examined using three methods: acoustical measurement, an electric circuit model, and FEM, and the resonance frequencies of these transfer functions were compared. The resonance frequencies obtained by the three methods were almost in agreement for a uniform tube. For the replicas of the vocal tract physical models of Chiba and Kajiyama, the resonance frequencies simulated by FEM almost corresponded with those of the electric circuit model. However, the resonance peaks of the measured transfer functions were not evident in the frequency range above 3 kHz. This implies that the measurement method used in this study has some problems.

Influence of context and word order in the identification of focal prominence in Japanese dialogue

KITAMURA Tatsuya, ITOH Kayo, ITOH Toshihiko, KITAZAWA Shigeyoshi

IEICE technical report. Speech

Event date： 2002.4

This paper studies the influence of prosodic features, context, and word order on the identification of focused clauses in Japanese dialogue, using a psychoacoustic experiment. In the experiment, question and answer speech was used as stimuli. The questions were to create two different contexts in the stimuli, and the answers had focal prominence at different clauses and had different word orders. The experimental results indicate that (1) prosodic characteristics are more significant for focus identification, (2) context has some effect on identification, and (3) it is probable that the word order has some effect on identification.

Prosodic phrase labeling based on prosodic features for developing prosodic database

KITAMURA Tatsuya, ITOH Toshihiko, MOCHIZUKI Kazuya, KITAZAWA Shigeyoshi

IEICE technical report. Speech

Event date： 2002.1

A very detailed segmentation of prosodic phrases was carried out in order to construct a Japanese prosodic database. The database, referred to here as "Japanese Multext", contains read-style speech and spontaneous-style speech by three male speakers and three female speakers in the Tokyo dialect. The "prosodic phrase", which we introduced as the unit of segmentation, was defined and regarded as a unit of spoken language perception. For exact segmentation, the wide-band spectrum, the narrow-band spectrum, the fine speech wave and fundamental frequency shapes, and the transition of amplitude of the higher-order formants were used to enumerate the candidate points for the segment boundary. Fine time adjustment in steps of the respective fundamental period of the speech determined the exact boundary. To maintain the consistency of the segmentation, one person carefully checked the entire segmentation.

Three-dimensional analysis of vocal tract using MRI: Cases with tongue and mouth floor resection

Kitamura Tatsuya, Suzuki Noriko, Saito Hiroto, Michi Ken-ichi, Takahashi Toshiyuki, Akagi Masato, Wakumoto Masahiko

IEICE technical report. Speech

Event date： 2001.3

Magnetic resonance imaging (MRI) techniques were used to investigate the three-dimensional vocal tract shape of patients after tongue and mouth floor resection. The vocal tract shape during production of the vowel /i/ was analyzed. The subjects were two patients and two normal speakers. Vocal tract asymmetry with respect to the mid-sagittal plane was compared between the patients and the normal speakers. The results show that the patients' vocal tracts have marked asymmetry caused by the surgery. It is possible that the asymmetrical vocal tract shape causes the patients' abnormal voice.

Acoustic features after tongue and mouth floor resection : Preliminary report

Saito Hiroto, Suzuki Noriko, Kitamura Tatsuya, Akagi Masato, Michi Ken-ichi

IEICE technical report. Speech

Event date： 1999.3

Acoustic characteristics of patients after tongue and mouth floor resection were investigated using computerized maximum likelihood spectrum estimation and compared with those of normal subjects. The following results were obtained: 1) F1 was located at a higher frequency in glossectomees than in normal subjects. 2) In the 2 to 3 kHz range, features of spectrum peaks in glossectomees showed various abnormal patterns, which were not observed in normal subjects. 3) In the 2 to 3 kHz range, variations in the spectral peak with time were observed in glossectomees, which were not observed in normal subjects. These results suggest that the selection of the analysis method is very important for observing the acoustic features of glossectomees.

Significant physical cues for speaker identification in speech spectral envelopes

KITAMURA TATSUYA, AKAGI MASATO

IEICE technical report. Speech

Event date： 1996.3

Significant physical cues for speaker identification of vowels are investigated through psychoacoustic experiments. The stimuli used for the experiments have spectral envelopes modified by the LMA analysis-synthesis system. The results lead to the following conclusions. 1) The peaks in the higher frequency band of the spectral envelopes were more significant than the valleys for speaker identification. 2) Speaker individualities mainly exist in the frequency band higher than the peak around 20 ERB rate (1740 Hz). 3) Even if these peaks were approximated by triangles whose vertex and width were the same as those of the peaks, speaker individuality in the stimuli still remained. This indicates that the frequency and the bandwidth of the peaks are significant for speaker identification.

Frequency Bands Having Speaker Individualities

Kitamura Tatsuya, Takagi Naoko, Akagi Masato

IEICE technical report. Speech

Event date： 1995.7

Frequency bands having speaker individualities in spectral envelopes of vowels are investigated through an ABX test. The stimuli are synthesized by replacing the frequency bands of 0 to 10 ERB rate, 10 to 20 ERB rate, and 20 to 30 ERB rate of a speaker-dependent spectral envelope with those of a spectral envelope with normalized speaker individualities. The experimental results show (1) that speaker individualities exist mainly in the higher frequency band and voice qualities can be controlled using this band of the spectral envelopes, and (2) that the larger the spectral distance of a frequency band between the speaker-dependent spectral envelopes and the normalized spectral envelopes, the easier it is to control the voice qualities.

Speaker individualities in speech spectral envelopes

Kitamura Tatsuya, Akagi Masato

IEICE technical report. Speech

Event date： 1994.3

Speaker individualities in spectral envelopes of vowels are investigated through psychoacoustic experiments. The stimuli used for the experiments are spectral envelopes varied by the LMA analysis-synthesis system. The experimental results show (1) that speaker individualities exist in spectral envelopes, (2) that more detailed information on the spectral envelopes is required for speaker identification than for vowel identification, and (3) that speaker individualities exist markedly in the band above 22 ERB rate (2212 Hz), whereas vowel characteristics exist from 12 ERB rate (603 Hz) to 22 ERB rate. These results suggest that speaker individualities can be controlled independently.

<KITAMURA Tatsuya>
