Web News Sentence Searching Using Linguistic Graph Similarity

  • Conference paper
Databases and Information Systems (DB&IS 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 615)

Abstract

As the volume of news publications grows each day, so does the need for effective search algorithms. Simple word-based approaches are inherently limited because they ignore much of the information in natural language. In this paper we propose a linguistic approach called Destiny, which uses this information to improve search results. The major difference from approaches that represent text as a bag-of-words is that Destiny represents sentences as graphs, with words as nodes and the grammatical relations between words as edges. The proposed algorithm is evaluated on a custom corpus of user-rated sentences and compared to a TF-IDF baseline; it performs significantly better in terms of Mean Average Precision, normalized Discounted Cumulative Gain, and Spearman’s Rho.
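
To make the contrast with a bag-of-words concrete, the following is a minimal sketch of a sentence stored as a graph with words as nodes and grammatical relations as labeled edges. It is not the Destiny algorithm: the class `SentenceGraph`, the functions `toy_similarity` and `bag_of_words_similarity`, the `edge_weight` parameter, and the relation labels `nsubj` and `dobj` are all assumptions introduced for illustration, and the Jaccard overlap stands in for whatever graph similarity the paper actually defines.

```python
from dataclasses import dataclass, field


@dataclass
class SentenceGraph:
    """Hypothetical container: words are nodes, grammatical relations are labeled edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # tuples of (head, relation, dependent)

    def add_relation(self, head, relation, dependent):
        # Adding an edge also registers both of its endpoints as nodes.
        self.nodes.update((head, dependent))
        self.edges.add((head, relation, dependent))


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bag_of_words_similarity(g1, g2):
    """Word overlap only: grammatical structure is ignored."""
    return jaccard(g1.nodes, g2.nodes)


def toy_similarity(g1, g2, edge_weight=0.5):
    """Illustrative score mixing word overlap with overlap of grammatical relations."""
    node_score = jaccard(g1.nodes, g2.nodes)
    edge_score = jaccard(g1.edges, g2.edges)
    return (1 - edge_weight) * node_score + edge_weight * edge_score


# "dog bites man" versus "man bites dog": same words, different relations.
dog_bites_man = SentenceGraph()
dog_bites_man.add_relation("bites", "nsubj", "dog")
dog_bites_man.add_relation("bites", "dobj", "man")

man_bites_dog = SentenceGraph()
man_bites_dog.add_relation("bites", "nsubj", "man")
man_bites_dog.add_relation("bites", "dobj", "dog")

print(bag_of_words_similarity(dog_bites_man, man_bites_dog))  # identical word sets
print(toy_similarity(dog_bites_man, man_bites_dog))           # lower: no shared edges
```

The two example sentences contain exactly the same words, so a pure word-overlap measure cannot tell them apart, while the edge sets share no relation at all. This is the kind of information the abstract says word-based approaches ignore.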

Acknowledgment

The authors are partially supported by the Dutch national program COMMIT.

Author information

Corresponding author

Correspondence to Flavius Frasincar.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Schouten, K., Frasincar, F. (2016). Web News Sentence Searching Using Linguistic Graph Similarity. In: Arnicans, G., Arnicane, V., Borzovs, J., Niedrite, L. (eds) Databases and Information Systems. DB&IS 2016. Communications in Computer and Information Science, vol 615. Springer, Cham. https://doi.org/10.1007/978-3-319-40180-5_22

  • DOI: https://doi.org/10.1007/978-3-319-40180-5_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40179-9

  • Online ISBN: 978-3-319-40180-5

  • eBook Packages: Computer Science, Computer Science (R0)
