Building “Bag of Conception” Model Based on DBpedia

Liao, Junhua; Bai, Rujiang

doi:10.1007/978-3-642-10242-4_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 30)

Included in the following conference series:

International Conference on Advanced Software Engineering and Its Applications

Abstract

Text classification has been widely used to help users discover useful information on the Internet. However, current text classification systems are based on the “Bag of Words” (BOW) representation, which accounts only for term frequency in the documents and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich text representation by means of manual intervention or automatic document expansion. Unfortunately, the improvement achieved is very limited, owing to the poor coverage of the dictionaries used and the ineffectiveness of term expansion. DBpedia, which has recently become available, contains rich semantic information. In this paper, we propose a method that compiles DBpedia knowledge into the document representation to improve text classification. It facilitates the integration of the rich knowledge of DBpedia into text documents by resolving synonyms and introducing more general and associative concepts. To evaluate the proposed method, we performed an empirical evaluation using an SVM classifier on several real data sets. The experimental results show that the proposed framework, which integrates hierarchical, synonym, and associative relations with traditional text similarity measures based on the BOW model, improves text classification performance significantly.
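The idea of enriching a BOW vector with concepts can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup tables `SYNONYMS`, `BROADER`, and `RELATED` are hypothetical toy stand-ins for knowledge that would be drawn from DBpedia, and the function names and the weights `broader_weight` and `related_weight` are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy tables standing in for DBpedia-derived knowledge.
SYNONYMS = {"car": "automobile", "auto": "automobile"}          # synonym -> canonical concept
BROADER = {"automobile": ["vehicle"], "truck": ["vehicle"]}     # concept -> more general concepts
RELATED = {"automobile": ["engine"]}                            # concept -> associated concepts


def bag_of_words(text):
    """Plain BOW: term frequencies over whitespace-separated tokens."""
    return Counter(text.lower().split())


def bag_of_concepts(text, broader_weight=0.5, related_weight=0.25):
    """Map terms to canonical concepts, then add weighted general and associated concepts."""
    enriched = Counter()
    for term, freq in bag_of_words(text).items():
        concept = SYNONYMS.get(term, term)      # resolve synonyms to one concept
        enriched[concept] += freq
        for parent in BROADER.get(concept, []):  # hierarchical relation
            enriched[parent] += broader_weight * freq
        for neighbour in RELATED.get(concept, []):  # associative relation
            enriched[neighbour] += related_weight * freq
    return enriched


if __name__ == "__main__":
    print(bag_of_concepts("car auto truck"))
```

In this sketch "car" and "auto" collapse into the single concept "automobile", while "vehicle" gains weight from both "automobile" and "truck", so two documents that share no surface terms can still overlap in the enriched representation. The resulting vectors could then be passed to any standard classifier such as an SVM.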

Author information

Authors and Affiliations

Shandong University of Technology Library, Zibo, 255049, China
Junhua Liao & Rujiang Bai

Editor information

Editors and Affiliations

Hannam University, 133 Ojeong-dong, DaedeokGu, 306-791, Daejeon, South Korea
Tai-hoon Kim
National Chiao Tung University, Hsinchu, Taiwan
Wai-Chi Fang
Korea University, Seoul, South Korea
Changhoon Lee
Mississippi State University, Mississippi State, MS, USA
Kirk P. Arnett

About this paper

Cite this paper

Liao, J., Bai, R. (2009). Building “Bag of Conception” Model Based on DBpedia. In: Kim, Th., Fang, WC., Lee, C., Arnett, K.P. (eds) Advances in Software Engineering. ASEA 2008. Communications in Computer and Information Science, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10242-4_6

DOI: https://doi.org/10.1007/978-3-642-10242-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10241-7
Online ISBN: 978-3-642-10242-4
eBook Packages: Computer Science, Computer Science (R0)
