
Exploration of Document Classification with Linked Data and PageRank

  • Conference paper

Part of the Studies in Computational Intelligence book series (SCI, volume 511)

Abstract

In this article, we present a new approach to document classification using Linked Data and PageRank. Our research focuses on classification methods enhanced by semantic information, which can be obtained from an ontology or from Linked Data; in our case, DBpedia serves as the source of Linked Data. The feature selection method is semantically based, so the resulting features are in a human-readable form that non-professional users can recognize and understand. PageRank is used during the feature selection and generation phase to expand basic features into more general representatives; both feature selection and PageRank processing therefore rely on the network relations obtained from Linked Data. The discovered features can then be used by standard classification algorithms. We present promising results showing that the approach can be applied straightforwardly to two different datasets.
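The idea of expanding basic features into more general representatives with PageRank can be illustrated with a small sketch. It assumes a tiny hand-made directed graph whose edges stand in for DBpedia-style links from a resource to a broader category. The node names, the `pagerank` and `expand_features` helpers, and the rule "replace each basic feature with its highest-ranked linked neighbour" are hypothetical choices made for illustration, not the authors' published procedure. PageRank itself is computed by ordinary power iteration with a damping factor of 0.85, redistributing the rank of nodes without outgoing links uniformly.

```python
# Hypothetical sketch: PageRank over a tiny Linked-Data-style graph, used to
# expand basic features into more general representatives. The graph, helper
# names and selection rule are illustrative assumptions, not the paper's method.
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed graph of (source, target) pairs.

    Rank held by dangling nodes (no outgoing links) is spread uniformly.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Teleportation term plus the uniformly redistributed dangling mass.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def expand_features(basic_features, edges, rank, top_k=1):
    """Map each basic feature to its top_k linked neighbours by PageRank."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
    expanded = {}
    for feature in basic_features:
        # Highest rank first; ties broken alphabetically for determinism.
        candidates = sorted(neighbours[feature], key=lambda c: (-rank[c], c))
        expanded[feature] = candidates[:top_k]
    return expanded


# Hypothetical resource -> broader-category links (DBpedia-style names).
EDGES = [
    ("Python_(programming_language)", "Category:Programming_languages"),
    ("Python_(programming_language)", "Category:Scripting_languages"),
    ("Java_(programming_language)", "Category:Programming_languages"),
    ("Haskell_(programming_language)", "Category:Programming_languages"),
    ("Category:Scripting_languages", "Category:Programming_languages"),
    ("Category:Programming_languages", "Category:Computing"),
    ("Hockey", "Category:Sports"),
    ("Baseball", "Category:Sports"),
]

ranks = pagerank(EDGES)
features = ["Python_(programming_language)", "Java_(programming_language)", "Hockey"]
for feature, general in expand_features(features, EDGES, ranks).items():
    print(feature, "->", general)
```

In this toy graph, `Category:Programming_languages` outranks `Category:Scripting_languages` because it collects more incoming links, so the Python resource is generalized to the former. In a real pipeline the graph would be retrieved from DBpedia and the expanded features would be passed to a standard classifier; both steps are outside the scope of this sketch.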

Keywords

  • Feature Selection
  • Semantic Information
  • Linked Data
  • Feature Selection Method
  • Basic Node

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Corresponding author

Correspondence to Martin Dostal.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Dostal, M., Nykl, M., Ježek, K. (2014). Exploration of Document Classification with Linked Data and PageRank. In: Zavoral, F., Jung, J., Badica, C. (eds) Intelligent Distributed Computing VII. Studies in Computational Intelligence, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-319-01571-2_6

  • DOI: https://doi.org/10.1007/978-3-319-01571-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01570-5

  • Online ISBN: 978-3-319-01571-2

  • eBook Packages: Engineering, Engineering (R0)