Papers - Tatsuya Kitamura
-
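Measurement of perceptual inter-speaker similarity for sentence speech in the ATR speech database (peer-reviewed)
Tatsuya Kitamura, 中間隆正, 大村宙, 川元広樹
Journal of the Acoustical Society of Japan 71 ( 10 ) 516 - 525 October 2015
Co-authored
We evaluated the similarity of speaker individuality for sentence speech uttered by 20 male and 20 female speakers from the Kanto region in Set C of the ATR speech database. Speech samples from two speakers of the same sex formed a pair, every combination of speakers was presented to the participants, and the participants rated the similarity of each pair on a five-point scale. The same experiment was then repeated with a different group of participants to confirm the reproducibility of the results. From these results we obtained the perceptual inter-speaker similarity and placed the speakers on a plane using nonmetric multidimensional scaling. The features that correlated highly with the resulting speaker configuration were mean F0, speaker age, and total pause duration for male speakers, and mean F0, utterance duration, and speaker age for female speakers.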
-
Improvement of five-degree-of-freedom sensors for Northern Digital Incorporated's Wave speech research system
Tatsuya Kitamura, Yukiko Nota, Michiko Hashi, Hiroaki Hatano
Acoustical Science and Technology 36 ( 4 ) 347 - 350 2015
-
Manzai Robots: Entertainment Robots Based on Auto-Created Manzai Scripts from Web News Articles
UMETANI Tomohiro, MASHIMO Ryo, NADAMOTO Akiyo, KITAMURA Tatsuya, NAKAYAMA Hirotaka
J Robot Mechatron 26 ( 5 ) 662 - 664 October 2014
-
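Verification of reproducibility in measuring skin vibration during singing with a scanning laser Doppler vibrometer (peer-reviewed)
Tatsuya Kitamura
The Japan Journal of Logopedics and Phoniatrics 55 ( 2 ) 167 - 172 April 2014
Sole author
In this study, a scanning laser Doppler vibrometer was used to measure facial skin vibration during singing several times, and the differences between measurements were evaluated. A laser Doppler vibrometer is a system that directs laser light at an object and measures the object's vibration velocity or displacement from the Doppler effect that the vibration produces in the reflected light. A scanning vibrometer can automatically scan multiple measurement points specified in advance and measure the vibration at each one. Three people with singing training participated in the study. The experiment was conducted in a seated position, and the head was fixed by resting the forehead against the frame of a chin rest. Each participant sang the vowel /a/ continuously at a comfortable pitch, and the skin vibration velocity during the sung interval was measured. A comparison of three measurements showed that the mean squared error was 4.0 dB or less. In addition, the measurement points that deviated by 6 dB from the median of the three measured values accounted for 2.4% of all measurement points, and most of these lay in areas where the laser beam was unlikely to strike the skin perpendicularly.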
DOI: 10.5112/jjlp.55.167
-
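Acoustic and articulatory features of English reduced vowels produced by native speakers of Japanese and English: An analysis based on X-ray microbeam databases
Hiroaki Hatano, Tatsuya Kitamura
Journal of the Acoustical Society of Japan 70 ( 3 ) 106 - 113 March 2014
Co-authored
Publisher: Acoustical Society of Japan
To describe the acoustic and articulatory features of the English reduced vowel /〓/ and to clarify how they differ between native speakers of Japanese and English, we carried out a quantitative analysis of the English vowels /〓,〓,〓,〓/. The analysis used X-ray microbeam databases: 16 English speakers were selected from the "X-ray microbeam speech production database" and 9 Japanese speakers from its Japanese version. Each vowel was extracted from word utterances, and its duration, first and second formant frequencies, and tongue pellet positions were measured. The results are summarized as follows. 1) For both groups of speakers, the duration of /〓/ is shorter than that of /〓,〓/. 2) For /〓/, native English speakers centralize the tongue in both the vertical and front-back directions, whereas native Japanese speakers do so only in the vertical direction. 3) Only native English speakers show articulatory assimilation of /〓/ to the following consonant, which is attributable to the phonological vowel categories of Japanese and English.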
-
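Observation of consonant articulation using improved sensors for the NDI Wave speech research system
Tatsuya Kitamura, Yukiko Nota, Hiroaki Hatano, Michiko Hashi, 西谷 実
The Japan Journal of Logopedics and Phoniatrics 55 ( 1 ) 59 - 59 January 2014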
-
Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods
Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 408 - 412 2014
Co-authored
Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC
Acoustic characteristics of the vocal tract have been investigated extensively in the literature using a one-dimensional (1D) acoustic simulation method. Because the 1D method assumes plane wave propagation only, it is recognized to be valid only in the low frequency region (below about 4 or 5 kHz). Recently, a three-dimensional (3D) acoustic simulation method was developed to obtain more precise acoustic characteristics of the vocal tract. In the present study, from a male's vocal tract shapes, transfer functions were calculated using the 1D and 3D methods and compared with each other to evaluate the valid frequency range of the 1D method. As a result, when acoustic effects of the piriform fossae were considered in the 1D method, the transfer functions agreed with each other up to 7 kHz (ignoring small dips). The 3D method showed that a deep dip was generated at around 8 kHz by the transverse resonance mode in the pharynx. Above this dip frequency, the transfer functions disagreed with each other. Thus, the 1D method is valid up to 7 kHz for this subject. Because this subject has a relatively large vocal tract, in general the upper limit of the valid frequency range could exceed 8 kHz.
-
Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information
Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 870 - 874 2014
Co-authored
Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC
A highly reproducible method for estimating vocal tract length (VTL) and a text-independent VTL estimation method are proposed, based on a Japanese vowel database spoken by 385 male and female speakers ranging in age from 6 to 56 and another vowel database with MRI-based vocal tract shape information. The proposed methods are based on an interference-free power spectral representation and systematic suppression of biasing factors. MRI data are used to calibrate the VTL estimates so that they are expressed in physically meaningful units. These databases are normalized based on the estimated VTL information to provide a reference template, which is used to implement the text-independent VTL estimation method. A prototype system for text-independent estimation of VTL is implemented in MATLAB and runs faster than real time on a PC.
-
Acoustic interaction between the right and left piriform fossae in generating spectral dips (peer-reviewed)
Hironori Takemoto, Seiji Adachi, Parham Mokhtari, Tatsuya Kitamura
Journal of the Acoustical Society of America 134 ( 4 ) 2955 - 2964 October 2013
-
Naturalness on Japanese pronunciation before and after shadowing training and prosody modified stimuli
Rongna A, Ryoko Hayashi, Tatsuya Kitamura
Proceedings of Interspeech 2013 Satellite workshop on Speech and Language Technology in Education 143 - 146 August 2013
Co-authored
-
Timing differences in articulation between voiced and voiceless stop consonants: An analysis of cine-MRI data
Masako Fujimoto, Tatsuya Kitamura, Hiroaki Hatano, Ichiro Fujimoto
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 955 - 958 2013
Co-authored
Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC
Laryngeal and supralaryngeal articulators coordinately work to produce speech sounds. In order to study differences in supralaryngeal manifestations of voiced and voiceless consonants, we compared the tongue movement during a minimal pair /agise/ and /akise/ using the fast scanning techniques of MRI movies. The result showed that the tongue displacement starts earlier in /k/ than in /g/ for many of the speakers of Tokyo Japanese. This agrees with our previous findings using other dialect speakers. These results suggest that many Japanese actively differentiate supralaryngeal articulation according to the voicing of the consonants, raising the tongue earlier in voiceless ones. This movement is presumably to ensure the voicelessness of the consonant. The present study also supplies evidence for the usefulness of a constructive approach for physical modeling.
-
Differences in articulatory movement between voiced and voiceless stop consonants (peer-reviewed)
Ryosuke O. Tachibana, Tatsuya Kitamura, Masako Fujimoto
Acoustical Science and Technology 33 ( 6 ) 391 - 393 November 2012
-
Measurement of vibration velocity pattern of facial surface during phonation using scanning vibrometer
Tatsuya Kitamura
Acoustical Science and Technology 33 ( 2 ) 126 - 128 March 2012
-
A Method for Predicting Stressed Words in Teaching Materials for English Jazz Chants (peer-reviewed)
NAGATA Ryo, FUNAKOSHI Kotaro, KITAMURA Tatsuya, NAKANO Mikio
IEICE Trans Inf Syst (Inst Electron Inf Commun Eng) E95.D ( 11 ) 2658 - 2663 2012
-
Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers
Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 402 - 405 2012
Co-authored
Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC
We conducted quantitative analyses of a magnetic resonance imaging (MRI) database to examine the correlation between physical measures (vocal tract length and body height) and acoustic parameters (pitch and formant frequencies) of vowels. The vocal tract length was measured from MRI data for the five Japanese vowels produced by fifteen male Japanese speakers between the ages of 24 and 55. The acoustic features were computed from vowel sounds recorded during scan. The vocal tract length showed a weak positive correlation with the speakers' age (correlation coefficient r = 0.51) but not with the speaker body height (r = 0.08). There were only weaker correlations between the vocal tract length and the first four formant frequencies except that F1 and F2 of the vowel /e/ show negative correlations with the vocal tract length (F1: r = -0.65, F2: r = -0.56). The result suggests that the vocal tract length is one of the dominant factors causing individual differences in the formant frequencies for the vowel /e/, produced by not forming a strong constriction. Furthermore, the pitch frequency was negatively correlated with the body height (r = -0.61).
-
Simulation of the coupling between vocal-fold vibration and time-varying vocal tract
Yosuke Tanabe, Parham Mokhtari, Hironori Takemoto, Tatsuya Kitamura
Journal of the Acoustical Society of America 130 ( 4 ) 2441 October 2011
Co-authored
-
Study of perceptual factors for speaker identification focusing on perceptual similarity of speaker characteristics
Tsuyoshi Izumida, Tatsuya Kitamura
Acoustical Science and Technology 32 ( 5 ) 216 - 219 September 2011
-
Dental imaging using a magnetic resonance visible mouthpiece for measurement of vocal tract shape and dimension
Tatsuya Kitamura, Hironori Nishimoto, Ichiro Fujimoto, Yasuhiro Shimada
Acoustical Science and Technology 32 ( 5 ) 224 - 227 September 2011
Co-authored