Semantic Services for Wikipedia

Weaving Services and People on the World Wide Web

Abstract

Wikipedia, a killer application of Web 2.0, has embraced collaborative editing to harness collective intelligence. It has many attractive characteristics, such as an entity-based link graph, abundant categorization, and a semi-structured layout, and can therefore serve as an ideal data source for extracting high-quality, well-structured data. In this chapter, we first propose several solutions for extracting knowledge from Wikipedia. We consider not only the relational summaries of articles (infoboxes) but also the article text, from which we extract information semi-automatically using the structured content available. Because this task differs from information extraction from the Web, new problems must be tackled; for example, Wikipedia lacks redundancy, which we address by extending traditional machine learning algorithms to work with little labeled data. Furthermore, we exploit the widespread categories as a complementary way to discover additional knowledge. Benefiting from both structured and textual information, we additionally provide a suggestion service for Wikipedia authoring. To facilitate semantic reuse, our proposal offers users link, category, and infobox content suggestions. The proposed enhancements can help attract more contributors and lighten the burden on professional editors. Finally, we developed an enhanced search system that eases the process of exploiting Wikipedia. To provide a user-friendly interface, it extends faceted search with relation navigation and lets users express complex information needs interactively. To answer queries efficiently, it extends scalable IR engines to index and search both textual and structured information with integrated ranking support.
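The idea of indexing textual and structured information together can be illustrated with a minimal sketch. It is not the chapter's system: the class `HybridIndex`, its methods, and the sample articles are hypothetical, and ranking is left out entirely. It only assumes that keywords and infobox-style attribute/value pairs can share one inverted index, so that a keyword query can be narrowed by facets.

```python
from collections import defaultdict


class HybridIndex:
    """Toy inverted index over text tokens and attribute/value facts (illustrative only)."""

    def __init__(self):
        # Posting lists: key -> set of article titles containing that key.
        self.postings = defaultdict(set)

    def add(self, title, text, facts):
        # Index each lower-cased token of the article text.
        for token in text.lower().split():
            self.postings[("text", token)].add(title)
        # Index each structured fact (e.g. an infobox attribute and its value).
        for attribute, value in facts.items():
            self.postings[("fact", attribute, value)].add(title)

    def search(self, keywords=(), facets=None):
        # Conjunctive query: an article must match every keyword and every facet.
        keys = [("text", k.lower()) for k in keywords]
        keys += [("fact", a, v) for a, v in (facets or {}).items()]
        if not keys:
            return set()
        result = set(self.postings.get(keys[0], set()))
        for key in keys[1:]:
            result &= self.postings.get(key, set())
        return result

    def facet_counts(self, titles, attribute):
        # Count how many of the given articles carry each value of one attribute;
        # a faceted interface would show these counts as refinement options.
        counts = {}
        for key, docs in self.postings.items():
            if key[0] == "fact" and key[1] == attribute:
                n = len(docs & titles)
                if n:
                    counts[key[2]] = n
        return counts


# Hypothetical sample data, not taken from the chapter.
index = HybridIndex()
index.add("Berlin", "capital city of Germany", {"type": "city", "country": "Germany"})
index.add("Munich", "city in Bavaria Germany", {"type": "city", "country": "Germany"})
index.add("Paris", "capital city of France", {"type": "city", "country": "France"})

capitals_in_germany = index.search(keywords=["capital"], facets={"country": "Germany"})
countries_of_cities = index.facet_counts(index.search(keywords=["city"]), "country")
```

Here `capitals_in_germany` contains only the article that matches both the keyword and the facet, and `countries_of_cities` gives the per-country counts a faceted interface could offer as the next refinement step.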

Corresponding author

Correspondence to Haofen Wang.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wang, H., Penin, T., Fu, L., Liu, Q., Xue, G., Yu, Y. (2009). Semantic Services for Wikipedia. In: King, I., Baeza-Yates, R. (eds) Weaving Services and People on the World Wide Web. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00570-1_2

  • DOI: https://doi.org/10.1007/978-3-642-00570-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00569-5

  • Online ISBN: 978-3-642-00570-1

  • eBook Packages: Computer Science, Computer Science (R0)
