Abstract
The goal of a citation recommendation system is to suggest references for a snippet in an article or a book, which is useful for both authors and readers. Citation recommendation can be cast as an information retrieval problem in which the query is the snippet from an article and the relevant documents are the cited articles. In practice, the citation snippet and the cited articles may be described in different terms, which makes the task difficult. Translation models are useful for bridging this vocabulary gap between queries and documents in information retrieval. Such a model can be trained on a collection of query-document pairs, which are assumed to be parallel. However, this training data is noisy: a relevant document usually contains relevant parts along with irrelevant ones. In particular, a citation snippet may mention only some parts of the cited article’s content. To cope with this problem, we propose a method for training translation models on such noisy data, called the position-aligned translation model. This model tries to align the query to the most relevant parts of the document, so that the estimated translation probabilities rely more on those parts. We test the model on a citation recommendation task for scientific papers. Our experiments show that the proposed method significantly improves on previous retrieval methods based on translation models.
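The idea of letting translation probabilities rely more on the relevant parts of a document can be illustrated with a small sketch. This is not the authors' model: the abstract does not give its equations, so the code below assumes a standard translation-based query likelihood, log P(q|d) = sum over query words q of log sum over document words w of t(q|w) P(w|d), and assumes a Gaussian kernel over document positions as the weighting scheme. The function names, the toy translation table, and the kernel parameters are all hypothetical.

```python
import math
from collections import defaultdict


def gaussian_weights(n, center, sigma):
    """Assumed weighting: positions near `center` count more (Gaussian kernel)."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]


def weighted_doc_model(doc_tokens, weights):
    """Estimate P(w|d) where position i contributes weights[i] instead of 1."""
    total = sum(weights)
    model = defaultdict(float)
    for token, weight in zip(doc_tokens, weights):
        model[token] += weight / total
    return dict(model)


def translation_score(query_tokens, doc_model, trans_prob):
    """log P(q|d) = sum_q log sum_w t(q|w) * P(w|d), with a small floor."""
    score = 0.0
    for q in query_tokens:
        p = sum(trans_prob.get((q, w), 0.0) * pw for w, pw in doc_model.items())
        score += math.log(p + 1e-12)  # floor avoids log(0) for unmatched words
    return score


# Toy data (hypothetical): only the first half of the document is relevant.
doc = ["lda", "topic", "model", "cooking", "recipe", "pasta"]
query = ["latent", "topics"]
trans_prob = {  # t(query word | document word), made-up values
    ("latent", "lda"): 0.5,
    ("latent", "topic"): 0.2,
    ("topics", "topic"): 0.8,
}

uniform_model = weighted_doc_model(doc, [1.0] * len(doc))
focused_model = weighted_doc_model(doc, gaussian_weights(len(doc), center=1, sigma=1.0))

uniform_score = translation_score(query, uniform_model, trans_prob)
focused_score = translation_score(query, focused_model, trans_prob)
```

With these toy values, concentrating the probability mass around the relevant position gives the query a higher score than the uniform document model does, which is the effect the abstract describes in words. How the paper actually chooses the aligned positions and estimates the translation table is not stated in the abstract and is not reproduced here.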
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, J., Nie, JY., Lu, Y., Zhao, W.X. (2012). Position-Aligned Translation Model for Citation Recommendation. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_27
DOI: https://doi.org/10.1007/978-3-642-34109-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34108-3
Online ISBN: 978-3-642-34109-0
eBook Packages: Computer Science, Computer Science (R0)