Review Papers (Misc) - Kazuhiro Seki
-
Using Microblog for Syndromic Surveillance
岡村 直人, 関 和広, 上原 邦昭
研究報告自然言語処理(NL) 2011 ( 9 ) 1 - 7 2011.5
Syndromic surveillance takes roughly two forms: one uses information from medical institutions, and the other gathers information from the Web. Web information, such as consumer generated media (CGM), reflects events closer to real time and may therefore be more useful for syndromic surveillance, whose goal is to detect infections early and prevent a wider spread. As a first step toward syndromic surveillance based on social media, this paper experimentally investigates the usefulness of microblogs, specifically Twitter, focusing on influenza. We classify tweets that mention symptoms of influenza and analyze how their volume and its changes over time relate to the number of actually reported influenza cases.
-
吉川幹人, 関和広, 上原邦昭
第73回全国大会講演論文集 2011 ( 1 ) 403 - 404 2011.3
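When searching for information, we often fail to find what we want with a single search and instead search repeatedly while revising the query. A method has been proposed that exploits such "query chains" to efficiently and automatically generate training data associating queries with (non-)relevant documents. However, retrieval based on training examples created from query chains does not work well for queries that do not appear in the training data, which makes it difficult to apply to general Web search. This study attempts to solve the problem by probabilistically sampling training examples in consideration of query similarity. We also extend query chains to obtain more, and higher-quality, training examples. Finally, we verify the effectiveness of the proposed method through evaluation experiments on real data.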
-
Promising Entities Discovery Based on Network Analysis
宮西 大樹, 関 和広, 上原 邦昭
研究報告数理モデル化と問題解決(MPS) 2011 ( 3 ) 1 - 8 2011.2
This paper proposes a framework to predict the future significance or importance of nodes in a network through link prediction. The network can be of any kind, such as a co-authorship network where nodes are authors and co-authors are linked by edges. In this example, predicting significant nodes means identifying authors who will be influential in the future (promising entities). Existing approaches to predicting such significant nodes in a future network typically rely on existing relationships between nodes. However, since such relationships are dynamic and naturally change over time (e.g., new co-authorships continue to emerge), approaches based only on the current state of the network have limited potential to predict the future. In contrast, our proposed approach first predicts future links between nodes with multiple supervised classifiers and then applies the RankBoost algorithm to combine the predictions so that the predicted links lead to more precise predictions of a centrality (significance) measure of our choice. To demonstrate the effectiveness of the proposed approach, a series of experiments is carried out on a co-authorship network extracted from the arXiv (HEP-Th) citation data set. The results show that applying link prediction to node rank prediction yields link prediction with a high AUC and predicts future node ranks more accurately.
-
Comparative Study on Social Tags and Controlled Vocabularies for Biomedical Information Retrieval
QIN Huawei, SEKI Kazuhiro, UEHARA Kuniaki
IEICE technical report 110 ( 400 ) 71 - 76 2011.1
Publisher:The Institute of Electronics, Information and Communication Engineers
This paper focuses on social bookmarks (or social tags) and investigates their utility for information retrieval (IR). The main research question of the present work is "How do social tags compare with conventional, yet reliable, manual indexing from the viewpoint of IR performance?" To answer the question, we look at the biomedical literature and begin by examining basic statistics of social tags from CiteULike in comparison with the Medical Subject Headings (MeSH) annotated in the Medline bibliographic database. Then, using the data, we conduct various experiments in IR settings, which reveal that retrieval performance can be improved by using social tags as additional indices and that the quality of social tags can be measured by the number of CiteULike users who use the same tags.
-
1C1-2 An Ensemble Approach to Blog Distillation
Murasato So, Noguchi Tomoyoshi, Seki Kazuhiro, Uehara Kuniaki
インテリジェントシステム・シンポジウム講演論文集 2011 ( 21 ) 70 - 73 2011
Publisher:日本機械学会
Previous work on blog feed search typically aggregates the contents of blog posts, or the relevance of blog posts belonging to the same site, to find relevant blog sites. As an alternative, the present study assumes that relevant blog sites share certain characteristics and, based on that assumption, proposes a machine learning framework for feed search. More precisely, we adapt an ensemble framework, which combines multiple classifiers or their outputs, and treat retrieval models as pseudo classifiers.
-
A User Agent for Finding Unknown Associations
HAGIMURA Takuya, SEKI Kazuhiro, UEHARA Kuniaki
IEICE technical report 110 ( 42 ) 99 - 103 2010.5
Publisher:The Institute of Electronics, Information and Communication Engineers
In the biomedical domain, a number of researchers have worked on finding potential relationships (hypotheses). This framework could be applied to other domains and, if viable, could support creative thinking by aiding human conception and cogitation. In this paper, we discuss the idea of the "User Agent for Finding Unknown Associations", an information retrieval system equipped with functions to generate and rank syllogistic hypotheses.
-
関 和広, 上原 邦昭
電子情報通信学会技術研究報告. LOIS, ライフインテリジェンスとオフィス情報システム = IEICE technical report. LOIS, Life intelligence and office information systems 110 ( 42 ) 1 - 6 2010.5
Publisher:一般社団法人電子情報通信学会
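As the number of users of social media such as blogs and microblogs (e.g., Twitter) grows, research on extracting and using information from these new media has become active. This study regards social media as metadata for real-world objects and discusses its influence on conventional information retrieval. In particular, this paper focuses on social bookmarks and examines their usefulness for information retrieval through comparison with conventional expert indexing based on controlled vocabularies. More specifically, taking the biomedical literature as the subject, we compare the MeSH index terms (indexing based on a controlled vocabulary) assigned to each article with the social tags assigned through CiteULike, a social bookmarking service, and experimentally investigate their characteristics and usefulness from various viewpoints. The experiments show that social tags function complementarily to MeSH in information retrieval and that retrieval accuracy improves as the coverage of social tags increases.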
-
Gene Functional Annotation by Ortholog-based Hierarchical Classification
KINO YOSHIHIRO, SEKI KAZUHIRO, UEHARA KUNIAKI
IPSJ SIG technical reports 2008 ( 126 ) 107 - 110 2008.12
Publisher:Information Processing Society of Japan (IPSJ)
This paper proposes a novel method for gene functional annotation in the framework of hierarchical classification that uses, as constraints, the known (already annotated) functions of genes orthologous to a given gene. A gene function is a biological property of a gene or of the product it encodes, and is annotated for each gene in model organism databases such as FlyBase and MGI. These gene functions are described using the Gene Ontology (GO), a common vocabulary that enables uniform access to different model organism databases. Our proposed approach exploits the gene functions of orthologous genes as constraints, dynamically creating classifiers from the training data available under those constraints. The effectiveness of the proposed approach is demonstrated in various experiments.
-
Generative Model for Diverse Katakana Variants based on English Phonetic Orthography
HATTORI HIROYUKI, SEKI KAZUHIRO, UEHARA KUNIAKI
IPSJ SIG Notes 2008 ( 17 ) 65 - 68 2008.3
Publisher:Information Processing Society of Japan (IPSJ)
In Japanese orthography, there is often more than one way to spell a phoneme sequence. This is especially true for katakana words, which are typically transliterations from foreign languages. For example, "Los Angeles" can be written as "rosuanjerusu," "rosanzerusu," or "rosuanzerusu" in Japanese, all of which are considered legitimate. This ambiguity becomes a critical problem for automatic processing when those variants need to be associated with the same concept. To deal with the problem, this paper proposes a novel approach to produce katakana variants for a given katakana word based on a generative model that considers the phonetic orthography of the word's original language. The proposed model is empirically evaluated based on the variants it generates. It is also shown that the model is beneficial for information retrieval systems when applied to query expansion.
-
Predicting Implicit Genetic Associations using an IR Model
関和広, MOSTAFA Javed
情報処理学会シンポジウムシリーズ(CD-ROM) 2007 ( 3 ) 1C-3 2007.11
-
Automatic Katakana Variants Generation via English Phonemes
HATTORI HIROYUKI, SEKI KAZUHIRO, UEHARA KUNIAKI
IPSJ SIG Notes 2007 ( 94 ) 59 - 64 2007.9
Publisher:一般社団法人情報処理学会
In information retrieval and other text processing applications, variant notations have long been a problem. For example, "Los Angeles" can be written as "rosuanjerusu," "rosanzerusu," or "rosuanzerusu" in Japanese. It is therefore desirable that a search system consider all of these notations when any one of them is given as a query. Although much research has addressed the problem, previous work typically relied on katakana rewriting rules derived from Japanese corpora or search engine logs, which are apt to suffer from the data sparseness problem. Based on our observation that many katakana variants are influenced by the pronunciation in the source language, this paper proposes a method to automatically generate katakana variants by back-transliterating a katakana word. The proposed method is evaluated on the NTCIR-3 Web retrieval test collection.