
Building “Bag of Conception” Model Based on DBpedia

  • Conference paper
Advances in Software Engineering (ASEA 2008)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 30)


Abstract

Text classification has been widely used to help users discover useful information on the Internet. However, current text classification systems are based on the “Bag of Words” (BOW) representation, which accounts only for term frequencies in documents and ignores important semantic relationships between key terms. To overcome this problem, previous work has attempted to enrich text representation through manual intervention or automatic document expansion. Unfortunately, the improvement achieved has been very limited, owing to the poor coverage of the dictionaries used and the ineffectiveness of term expansion. Fortunately, DBpedia, which appeared recently, contains rich semantic information. In this paper, we propose a method that compiles DBpedia knowledge into the document representation to improve text classification. The method integrates the rich knowledge of DBpedia into text documents by resolving synonyms and introducing more general and associative concepts. To evaluate the proposed method, we performed an empirical evaluation using an SVM classifier on several real data sets. The experimental results show that our framework, which integrates hierarchical, synonym, and associative relations with traditional text similarity measures based on the BOW model, significantly improves text classification performance.
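The core idea of the abstract, augmenting a BOW vector with concepts obtained by resolving synonyms and adding more general and associated concepts, can be illustrated with a small Python sketch. It is not the authors' implementation. The dictionaries `SYNONYMS`, `BROADER`, and `RELATED` are hypothetical toy stand-ins for DBpedia redirects, category hierarchy, and links; no DBpedia endpoint is queried. The 0.5 weight for associative concepts and the `w:`/`c:` feature prefixes are likewise illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge standing in for DBpedia (not real DBpedia data).
# Surface term -> canonical concept (synonym resolution, like DBpedia redirects).
SYNONYMS = {"car": "Automobile", "automobile": "Automobile", "auto": "Automobile",
            "truck": "Truck"}
# Concept -> more general concept (hierarchical relation).
BROADER = {"Automobile": "Vehicle", "Truck": "Vehicle"}
# Concept -> associated concepts (associative relation).
RELATED = {"Automobile": ["Road_transport"]}
RELATED_WEIGHT = 0.5  # assumed down-weighting for associative concepts


def tokenize(text):
    """Lowercase and split into alphabetic tokens (no stemming or stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Classic BOW: raw term frequencies."""
    return Counter(tokenize(text))


def bag_of_concepts(text):
    """Map terms to concepts, then add their broader and associated concepts."""
    concepts = Counter()
    for term in tokenize(text):
        concept = SYNONYMS.get(term)
        if concept is None:
            continue  # term not covered by the toy knowledge base
        concepts[concept] += 1.0
        if concept in BROADER:
            concepts[BROADER[concept]] += 1.0
        for related in RELATED.get(concept, []):
            concepts[related] += RELATED_WEIGHT
    return concepts


def combined_vector(text):
    """Concatenate word and concept features; prefixes keep the two spaces apart."""
    vec = Counter({"w:" + t: float(n) for t, n in bag_of_words(text).items()})
    vec.update({"c:" + c: w for c, w in bag_of_concepts(text).items()})
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    d1, d2 = "I bought a new car", "The automobile was expensive"
    print(cosine(bag_of_words(d1), bag_of_words(d2)))  # 0.0: no shared words
    print(round(cosine(combined_vector(d1), combined_vector(d2)), 3))  # 0.334
```

The two example documents share no words, so their BOW cosine similarity is 0. After enrichment both map to `Automobile`, `Vehicle`, and `Road_transport`, and the combined vectors become similar. In the setting the abstract describes, such enriched vectors would then be passed to an SVM classifier; that step is omitted here.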




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liao, J., Bai, R. (2009). Building “Bag of Conception” Model Based on DBpedia. In: Kim, Th., Fang, WC., Lee, C., Arnett, K.P. (eds) Advances in Software Engineering. ASEA 2008. Communications in Computer and Information Science, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10242-4_6


  • DOI: https://doi.org/10.1007/978-3-642-10242-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10241-7

  • Online ISBN: 978-3-642-10242-4

  • eBook Packages: Computer Science, Computer Science (R0)
