Abstract
Many approaches have been proposed to improve information retrieval on text databases, but few investigations capture the contextual aspects of natural-language queries directly. Here, we propose a new approach to retrieving contextual dependencies in Japanese based on a latent topic model. The key idea comes from dependency structures, which capture context in both the database and the queries. We examine experimental results to evaluate the effectiveness of the approach.
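As a rough illustration of how a latent topic model can score a query made of dependency pairs, the following Python sketch ranks documents by the query likelihood log P(q | d) = Σ_{dep ∈ q} log Σ_z P(dep | z) P(z | d), in the spirit of LDA-based document models for retrieval. It is a minimal sketch under stated assumptions, not the authors' method: the topic-dependency distributions `phi`, the document-topic mixtures `theta`, the romanized Japanese pairs and the function names are all hypothetical toy values.

```python
import math

# Hypothetical trained quantities (toy numbers, not from the paper).
# phi[z][dep] = P(dep | z): each topic's distribution over (modifier, head) pairs.
phi = {
    0: {("kuruma", "hashiru"): 0.6, ("ame", "furu"): 0.1, ("hon", "yomu"): 0.3},  # car-run, rain-fall, book-read
    1: {("kuruma", "hashiru"): 0.1, ("ame", "furu"): 0.7, ("hon", "yomu"): 0.2},
}
# theta[doc][z] = P(z | doc): each document's topic mixture.
theta = {
    "doc_a": {0: 0.9, 1: 0.1},
    "doc_b": {0: 0.2, 1: 0.8},
}

def query_log_likelihood(query_deps, doc):
    """log P(q | doc) = sum over query dependencies of log sum_z P(dep | z) P(z | doc)."""
    total = 0.0
    for dep in query_deps:
        p = sum(phi[z].get(dep, 0.0) * theta[doc][z] for z in phi)
        # An unseen dependency gets probability 0; a real system would smooth instead.
        total += math.log(p) if p > 0 else float("-inf")
    return total

def rank(query_deps):
    """Return document ids sorted by decreasing query likelihood."""
    return sorted(theta, key=lambda d: query_log_likelihood(query_deps, d), reverse=True)

print(rank([("ame", "furu")]))
```

For the query [("ame", "furu")] ("rain falls"), doc_b, whose mixture favours topic 1, ranks above doc_a.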
Notes
In other words, a topic does not denote a human-recognizable subject such as politics or airplanes, but rather a kind of cluster put together by some probabilistic measure.
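To make this concrete, the Python sketch below treats a topic as nothing more than a probability distribution over words; any label such as "politics" would be attached afterwards by a human reading its highest-probability words. The vocabulary and weights are hypothetical toy values, and `top_words` is an illustrative helper, not part of the paper.

```python
# A toy, hypothetical topic: a probability distribution over a small vocabulary.
# In LDA-style models a topic is only this weight vector; it carries no label.
topic = {
    "election": 0.30,
    "minister": 0.25,
    "airport": 0.20,
    "vote": 0.15,
    "manga": 0.10,
}

def top_words(dist, k):
    """Return the k highest-probability words of a topic distribution."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]

# Sanity check: the weights form a valid probability distribution.
assert abs(sum(topic.values()) - 1.0) < 1e-9
print(top_words(topic, 3))
```

Here the top words mix political terms with "airport", illustrating that a probabilistic cluster need not match a single human-recognizable subject.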
An early draft of this work appeared as “Context-based Query using Dependency Structures based on Latent Topic Model” in the 2nd International Conference on Model and Data Engineering (MEDI2012), Poitiers, France. We have extended the comparison with several relevant investigations, revised the discussion section and made some other minor changes.
word is a syntax.
One exception is that any predicate should appear as the last verb.
That is, we may generate dependencies based on this probability distribution.
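Generating dependencies from a distribution can be sketched as drawing (modifier, head) pairs independently from P(dependency | topic). The Python example below does this with `random.Random.choices`; the distribution, its romanized Japanese pairs and `sample_dependencies` are hypothetical, and the fixed seed only makes the run reproducible.

```python
import random
from collections import Counter

# Hypothetical distribution P(dependency | topic) over (modifier, head) pairs.
dep_dist = {
    ("kuruma", "hashiru"): 0.5,  # "car" -> "run"
    ("hon", "yomu"): 0.3,        # "book" -> "read"
    ("ame", "furu"): 0.2,        # "rain" -> "fall"
}

def sample_dependencies(dist, n, seed=0):
    """Draw n dependency pairs i.i.d. from the given distribution."""
    rng = random.Random(seed)
    pairs = list(dist)
    weights = [dist[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=n)

samples = sample_dependencies(dep_dist, 1000)
# Empirical frequencies should approach the underlying probabilities as n grows.
print(Counter(samples).most_common())
```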
Here, we assume that the joint probability takes a naive Bayes form.
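Under a naive Bayes assumption the joint probability factorizes as P(z, d_1, ..., d_n) = P(z) ∏_i P(d_i | z), i.e. the observed items are conditionally independent given z. The Python sketch below assumes, for illustration only, that z is a latent topic and the d_i are dependency pairs; the two-topic model and its numbers are hypothetical. It evaluates the joint in log space and normalizes it into a posterior over topics.

```python
import math

# Hypothetical two-topic model (toy numbers, not from the paper).
prior = {"t1": 0.6, "t2": 0.4}  # P(z)
likelihood = {  # P(d | z) over dependency pairs d
    "t1": {("kuruma", "hashiru"): 0.7, ("ame", "furu"): 0.3},
    "t2": {("kuruma", "hashiru"): 0.2, ("ame", "furu"): 0.8},
}

def log_joint(z, deps):
    """Naive Bayes: log P(z, d_1..d_n) = log P(z) + sum_i log P(d_i | z)."""
    return math.log(prior[z]) + sum(math.log(likelihood[z][d]) for d in deps)

def posterior(deps):
    """P(z | d_1..d_n): normalize the joint over all topics via log-sum-exp."""
    logs = {z: log_joint(z, deps) for z in prior}
    m = max(logs.values())
    total = sum(math.exp(v - m) for v in logs.values())
    return {z: math.exp(v - m) / total for z, v in logs.items()}

observed = [("kuruma", "hashiru"), ("kuruma", "hashiru"), ("ame", "furu")]
print(posterior(observed))
```

For these toy numbers the joint probabilities are 0.6 × 0.7 × 0.7 × 0.3 = 0.0882 for t1 and 0.4 × 0.2 × 0.2 × 0.8 = 0.0128 for t2, so the posterior favours t1 (about 0.87).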
The words Clinton, ZeroZero and Ashita appear, where the latter two are names of manga.
Acknowledgments
The authors deeply thank the reviewers of the journal and of the MEDI2012 conference for their helpful comments. Their input made the authors feel as though this investigation had been a joint effort with them.
About this article
Cite this article
Shirai, M., Yanagisawa, T. & Miura, T. Context-Based Query Using Dependency Structures Based on Latent Topic Model. J Data Semant 3, 157–168 (2014). https://doi.org/10.1007/s13740-013-0031-3
DOI: https://doi.org/10.1007/s13740-013-0031-3