Abstract
We propose a method for acquiring paraphrases of a given sentence from the Web. For example, consider the input sentence “Lemon is a high vitamin c fruit”. Its paraphrases are expressions or sentences that convey the same meaning but differ syntactically, such as “Lemons are rich in vitamin c” or “Lemons contain a lot of vitamin c”. We aim to find sentence-level paraphrases on the noisy Web rather than in domain-specific corpora. By observing the search results for these paraphrases, users can estimate how likely the sentence is to be a fact. We evaluate the proposed method on five distinct semantic relations. Experiments show that our method achieves an average precision of \(60.5\,\%\), compared with \(44.15\,\%\) for the TE/ASE method. In addition, our method acquires three more paraphrases per input than the TE/ASE method.
Notes
- 1.
- 2.
- 3. The acquisition relation exists between two companies such that one company acquired the other.
- 4. The directorOf relation exists between a director and his or her works, e.g. (Steven Spielberg, Saving Private Ryan), (James Cameron, Titanic).
- 5. The leaderOf relation exists between a country and its current leader, e.g. (Barack Obama, U.S.), (Giorgio Napolitano, Italy).
- 6. The ceoOf relation exists between a company and the chief executive officer of that company, e.g. (Tim Cook, Apple), (Mark Zuckerberg, Facebook).
- 7. The founderOf relation exists between a person and the company he or she founded, e.g. (Larry Page, Google).
- 8. Replace entities in e with variables.
- 9.
- 10.
Acknowledgment
This work was supported in part by Grants-in-Aid for Scientific Research (Nos. 24240013, 24680008) from MEXT of Japan.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhao, M., Ohshima, H., Tanaka, K. (2015). Finding Paraphrase Facts Based on Coordinate Relationships. In: Liu, A., Ishikawa, Y., Qian, T., Nutanong, S., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9052. Springer, Cham. https://doi.org/10.1007/978-3-319-22324-7_12
DOI: https://doi.org/10.1007/978-3-319-22324-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22323-0
Online ISBN: 978-3-319-22324-7
eBook Packages: Computer Science (R0)