Feature Combination for Sentence Similarity

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7884)

Abstract

The possible combinations of features traditionally used for sentence similarity amount to a very large feature space. Considering all possible combinations and training a support vector machine on the resulting meta-features in a two-step process significantly improves performance. The proposed method is trained and tested on the SemEval-2012 Semantic Textual Similarity (STS) Shared Task data, outperforming the task’s highest-ranking system.
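
The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details beyond "all possible combinations" and "meta-features", so the reading below (one first-step model per non-empty feature subset, whose predictions become the meta-features for a second-step model) is an assumption. To stay dependency-free, ridge-regularised least squares stands in for the support vector machine, and the toy data, the three feature columns, and all function names are hypothetical.

```python
from itertools import combinations


def fit_ridge(X, y, lam=1e-3):
    """Ridge least squares with an intercept (stand-in for the SVM).

    Returns the weight vector with the bias as its last entry.
    """
    rows = [list(x) + [1.0] for x in X]
    d = len(rows[0])
    # Normal equations: (A^T A + lam * I) w = A^T y
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, d):
            f = A[r][c] / A[c][c]
            for k in range(c, d):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # Back substitution
    w = [0.0] * d
    for c in reversed(range(d)):
        w[c] = (b[c] - sum(A[c][k] * w[k] for k in range(c + 1, d))) / A[c][c]
    return w


def predict(w, x):
    """Linear prediction; zip stops at len(x), so w[-1] is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def feature_subsets(n):
    """All non-empty subsets of feature indices 0..n-1 (2**n - 1 of them)."""
    return [s for k in range(1, n + 1) for s in combinations(range(n), k)]


def meta_features(subsets, base, x):
    """One meta-feature per subset: that subset model's prediction for x."""
    return [predict(w, [x[i] for i in s]) for s, w in zip(subsets, base)]


def train_two_step(X, y):
    subsets = feature_subsets(len(X[0]))
    # Step 1: one model per feature combination.
    base = [fit_ridge([[x[i] for i in s] for x in X], y) for s in subsets]
    # Step 2: a second model trained on the step-1 predictions.
    meta = [meta_features(subsets, base, x) for x in X]
    top = fit_ridge(meta, y)
    return subsets, base, top


def predict_two_step(model, x):
    subsets, base, top = model
    return predict(top, meta_features(subsets, base, x))


# Hypothetical toy data: three similarity scores per sentence pair,
# with gold scores on a 0-5 scale.
X = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.4], [0.5, 0.6, 0.3],
     [0.1, 0.3, 0.2], [0.8, 0.4, 0.9], [0.4, 0.9, 0.5]]
y = [5 * (0.5 * a + 0.3 * b + 0.2 * c) for a, b, c in X]

model = train_two_step(X, y)
print(len(model[0]), "feature combinations")
print(round(predict_two_step(model, X[0]), 2), "vs gold", round(y[0], 2))
```

The number of subsets grows as 2^n - 1, which is why the abstract describes the combined space as very large. This sketch also computes the meta-features on the same data used to fit the first-step models; a stacked setup would normally produce them on held-out folds to avoid leakage.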

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shareghi, E., Bergler, S. (2013). Feature Combination for Sentence Similarity. In: Zaïane, O.R., Zilles, S. (eds) Advances in Artificial Intelligence. Canadian AI 2013. Lecture Notes in Computer Science (LNAI), vol 7884. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38457-8_13

  • DOI: https://doi.org/10.1007/978-3-642-38457-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38456-1

  • Online ISBN: 978-3-642-38457-8

  • eBook Packages: Computer Science, Computer Science (R0)
