Automatic Invocation Linking for Collaborative Web-Based Corpora

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Abstract

Collaborative online encyclopedias or knowledge bases such as Wikipedia and PlanetMath are becoming increasingly popular because of their open access, comprehensive and interlinked content, rapid and continual updates, and community interactivity. To understand a particular concept in these knowledge bases, a reader needs to learn about related and underlying concepts. In this chapter, we introduce the problem of invocation linking for collaborative encyclopedias and knowledge bases, review the state of the art for invocation linking, including the popular linking system of Wikipedia, discuss the problems and challenges of automatic linking, and present the NNexus approach, an abstraction and generalization of the automatic linking system used by PlanetMath.org. The chapter emphasizes both research problems and practical design issues through discussion of real-world scenarios and hence is suitable for both researchers in web intelligence and practitioners looking to adopt the techniques. Below is a brief outline of the chapter.

Problem and Motivation. We first introduce the problem of invocation linking for online collaborative encyclopedias and knowledge bases. An online encyclopedia consists of multiple entries. An invocation link is a hyperlink from a term or phrase in an entry representing a concept to another entry that defines the concept. It allows a reader to easily “jump” to requisite concepts in order to fully understand the current one. We refer to the term or phrase being linked from as the link source and the entry being linked to as the link target. The problem of invocation linking is how to add these invocation links to an online encyclopedia in order to build a semantic concept network.
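To make the terms link source and link target concrete, the following is a minimal Python sketch of fully automatic linking by longest-phrase lookup. It is an illustration only, not the method described in the chapter: the `InvocationLink` record, the `find_links` function, the `concept_index` mapping, the entry identifiers, and the sample sentence are all hypothetical, and the matching assumes plain whitespace-separated text without punctuation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvocationLink:
    source_entry: str  # entry that contains the link source
    start: int         # index of the first word of the link source
    end: int           # index one past the last word of the link source
    target_entry: str  # link target: the entry that defines the concept


def find_links(entry_id, text, concept_index, max_len=4):
    """Scan the text left to right, preferring the longest matching concept label.

    concept_index maps a lowercase concept label to the id of the entry
    that defines it (a hypothetical structure for this sketch).
    """
    words = text.lower().split()
    links = []
    i = 0
    while i < len(words):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            target = concept_index.get(phrase)
            # Never link an entry to itself.
            if target is not None and target != entry_id:
                links.append(InvocationLink(entry_id, i, i + n, target))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return links


# Illustrative data only; labels and entry ids are made up.
concept_index = {"planar graph": "PlanarGraph", "graph": "Graph", "plane": "Plane"}
text = "A plane graph is a planar graph drawn in the plane"
for link in find_links("PlaneGraph", text, concept_index):
    print(link.start, link.end, link.target_entry)
```

Preferring the longest phrase means that "planar graph" is linked as one concept rather than as a link for "graph" alone. The set of all such links over all entries forms the concept network referred to above.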

State of the Art. We review the state of the art for invocation linking in current online encyclopedias and knowledge bases. The existing approaches can be classified mainly into: 1) manual linking, where both the link source and link target are explicitly defined by the user (such as blog software), 2) semi-automatic linking, where the link source is explicitly marked by the user but the link target is determined automatically (such as Wikipedia), and 3) automatic linking, where both the link source and link target are determined automatically. We discuss the representative systems for each approach and illustrate their advantages and disadvantages. We also review potential technologies such as web search and recommender systems and discuss their applicability to invocation linking.

Automatic Invocation Linking. We advocate in this chapter the automatic linking approach, as we believe that the manual and semi-automatic approaches place an unnecessary burden on contributors and, in addition, require writers or other maintainers to continuously re-inspect the entire corpus as it grows and changes. We discuss the challenges and design goals for developing such an automatic linking system, including linking quality, efficiency and scalability, and generalization to multiple corpora.

NNexus Approach. In particular, we present the NNexus system, an automatic linking system that we have developed as an abstraction and generalization of the linking component of PlanetMath (planetmath.org), PlanetPhysics (planetphysics.org), and other sites. We discuss a number of key features and design ideas of NNexus in addressing the challenges for invocation linking. NNexus provides an effective linking scheme utilizing metadata to automatically identify link sources and link targets. It achieves good linking quality with a classification-based link steering approach and an interactive entry filtering component. It achieves good efficiency and scalability through efficient data structures and a mechanism for updating the links between entries that are related to newly defined or modified concepts in the corpus. Finally, its implementation utilizes OWL and has a simple interface, which allows an almost unlimited number of online corpora to interconnect for automatic linking.
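One way to picture classification-based link steering is as a tie-breaker between several entries that define the same label. The sketch below is written under stated assumptions and is not taken from NNexus itself: it assumes a tree-shaped classification scheme given as a hypothetical child-to-parent mapping `parent`, uses unweighted path length as the distance, and all class names and entry identifiers are invented for illustration.

```python
def path_to_root(node, parent):
    """Return the list of classes from node up to the root of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges between classes a and b via their lowest common ancestor."""
    depth_from_b = {n: d for d, n in enumerate(path_to_root(b, parent))}
    for d, n in enumerate(path_to_root(a, parent)):
        if n in depth_from_b:
            return d + depth_from_b[n]
    return float("inf")  # the classes share no ancestor


def steer(source_class, candidates, parent):
    """Pick the candidate target whose class is closest to the source entry's class.

    candidates maps a target entry id to its class (hypothetical structure).
    """
    return min(candidates,
               key=lambda t: tree_distance(source_class, candidates[t], parent))


# Hypothetical classification tree (child -> parent) and candidate targets
# that both define the label "graph".
parent = {
    "graph theory": "combinatorics",
    "combinatorics": "mathematics",
    "analysis": "mathematics",
}
candidates = {"GraphOfAFunction": "analysis", "Graph": "graph theory"}
print(steer("combinatorics", candidates, parent))
```

Here an entry classified under "combinatorics" is steered to the graph-theoretic definition, because that class is one edge away while "analysis" is two edges away.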

Conclusions and Open Issues. We close the chapter by discussing a set of interesting issues and open problems for invocation linking.

Notes

  1. http://www.wikipedia.org

  2. http://www.planetmath.org

  3. http://planetmath.org

  4. Extracted from http://planetmath.org/encyclopedia/PlaneGraph.html

  5. For more information on ontology mapping, we recommend the survey in [6].

  6. Visit PlanetMath on the web at http://www.planetmath.org

  7. Likely this number could exceed 95% with a little bit of targeted effort, and given that these policies have been available on PlanetMath for less than 2 years, the numbers will likely continue to improve on their own.

  8. For their “Expert Voices” service. See http://www.nsdl.org/

  9. http://aux.planetmath.org/nnexus/

References

  1. G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 2005.

  2. Zharko Aleksovski and Michel Klein. Ontology mapping using background knowledge. In K-CAP ’05: Proceedings of the 3rd international conference on Knowledge capture, 2005.

  3. Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. Modern Information Retrieval. ACM Press / Addison-Wesley, 1999.

  4. J. Gardner, A. Krowne, and L. Xiong. NNexus: An automatic linker for collaborative web-based corpora. IEEE Transactions on Knowledge and Data Engineering, 21(6), 2009.

  5. L. Gridinoc, M. Sabou, M. d’Aquin, M. Dzbor, and E. Motta. Semantic browsing with PowerMagpie. In ESWC 2008: 5th European Semantic Web Conference, pages 802–806, 2008.

  6. Yannis Kalfoglou and Marco Schorlemmer. Ontology mapping: The state of the art. In Y. Kalfoglou, M. Schorlemmer, A. Sheth, S. Staab, and M. Uschold, editors, Semantic Interoperability and Integration, number 04391 in Dagstuhl Seminar Proceedings. Internationales Begegnungs- und Forschungszentrum fuer Informatik (IBFI), Schloss Dagstuhl, Germany, 2005.

  7. Jon Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.

  8. J. Kolbitsch and H. Maurer. Community building around encyclopaedic knowledge. Journal of Computing and Information Technology, 14, 2006.

  9. Aaron Krowne. An architecture for collaborative math and science digital libraries. Master’s thesis, Virginia Polytechnic Institute and State University, Blacksburg, VA, 2003.

  10. Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, and Rudi Studer. Semantic wikipedia. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 585–594, New York, NY, USA, 2006. ACM Press.

  11. D. Milne and I. Witten. Learning to Link with Wikipedia. In CIKM ’2008: 17th Conference on Information and Knowledge Management, 2008.

  12. Natalya Fridman Noy and Mark A. Musen. PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, 2000.

  13. S. E. Roberto Tazzoli and Paolo Castagna. Towards a semantic wiki web. In Demo Session at ISWC2004, 2004.

  14. Adam Souzis. Building a semantic wiki. IEEE Intelligent Systems, 20(5):87–91, 2005.

  15. G. Weaver, B. Strickland, and G. Crane. Quantifying the accuracy of relational statements in wikipedia: a methodology. In JCDL ’06: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, 2006.

Acknowledgements

This work has been partially supported by the Google Summer of Code Program. We would also like to thank the editors of the special issue and the anonymous reviewers for their valuable comments that improved this paper.

Author information

Corresponding author

Correspondence to James Gardner.

Copyright information

© 2010 Springer-Verlag London Limited

About this chapter

Cite this chapter

Gardner, J., Krowne, A., Xiong, L. (2010). Automatic Invocation Linking for Collaborative Web-Based Corpora. In: Chbeir, R., Badr, Y., Abraham, A., Hassanien, AE. (eds) Emergent Web Intelligence: Advanced Information Retrieval. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84996-074-8_2

  • DOI: https://doi.org/10.1007/978-1-84996-074-8_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84996-073-1

  • Online ISBN: 978-1-84996-074-8

  • eBook Packages: Computer Science, Computer Science (R0)
