Semantic Services for Wikipedia

Wang, Haofen; Penin, Thomas; Fu, Linyun; Liu, Qiaoling; Xue, Guirong; Yu, Yong

doi:10.1007/978-3-642-00570-1_2


Abstract

Wikipedia, a killer application in Web 2.0, has embraced the power of collaborative editing to harness collective intelligence. It features many attractive characteristics, such as an entity-based link graph, abundant categorization, and a semi-structured layout, and can serve as an ideal data source from which to extract high-quality, well-structured data. In this chapter, we first propose several solutions to extract knowledge from Wikipedia. We consider not only information from the relational summaries of articles (infoboxes) but also semi-automatically extract it from the article text using the structured content available. Because extraction from Wikipedia differs from information extraction from the Web, new problems must be tackled, such as the lack of redundancy in Wikipedia, which we address by extending traditional machine learning algorithms to work with few labeled data. Furthermore, we exploit the widespread categories as a complementary way to discover additional knowledge. Benefiting from both structured and textual information, we additionally provide a suggestion service for Wikipedia authoring. With the aim of facilitating semantic reuse, our proposal provides users with facilities such as link, category, and infobox content suggestions. The proposed enhancements can be applied to attract more contributors and lighten the burden on professional editors. Finally, we developed an enhanced search system that eases the process of exploiting Wikipedia. To provide a user-friendly interface, it extends the faceted search interface with relation navigation and lets users express complex information needs interactively. To achieve efficient query answering, it extends scalable IR engines to index and search both the textual and structured information with integrated ranking support.
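As a rough illustration of what extracting knowledge from infoboxes can look like, the following minimal Python sketch turns the `| attribute = value` lines of an infobox into (subject, attribute, value) triples. It is not the chapter's method: the function name `infobox_to_triples`, the regular expressions, and the sample wikitext are assumptions made for this example, and real infoboxes (nested templates, multi-line values, references) need a proper wikitext parser.

```python
import re

def infobox_to_triples(title, wikitext):
    """Hypothetical helper: map `| key = value` lines to (subject, attribute, value)."""
    triples = []
    for line in wikitext.splitlines():
        # Only lines of the form "| attribute = value" are considered.
        match = re.match(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if not match:
            continue
        attribute, value = match.groups()
        # Unwrap wiki links: [[Target|label]] -> Target, [[Target]] -> Target.
        value = re.sub(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", r"\1", value)
        if value:  # skip attributes left empty by the editor
            triples.append((title, attribute, value))
    return triples

# Made-up sample wikitext, for illustration only.
SAMPLE = """{{Infobox university
| name = Shanghai Jiao Tong University
| city = [[Shanghai]]
| country = [[China|People's Republic of China]]
| motto =
}}"""

triples = infobox_to_triples("Shanghai Jiao Tong University", SAMPLE)
for triple in triples:
    print(triple)
```

Linked values are reduced to their link target, so the value of a triple names another article (an entity) rather than display text, which is what makes the infobox usable as relational data.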

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240, Shanghai, China
Haofen Wang, Thomas Penin, Linyun Fu, Qiaoling Liu, Guirong Xue & Yong Yu

Corresponding author

Correspondence to Haofen Wang.

About this chapter

Cite this chapter

Wang, H., Penin, T., Fu, L., Liu, Q., Xue, G., Yu, Y. (2009). Semantic Services for Wikipedia. In: King, I., Baeza-Yates, R. (eds) Weaving Services and People on the World Wide Web. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00570-1_2

DOI: https://doi.org/10.1007/978-3-642-00570-1_2
Published: 17 April 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00569-5
Online ISBN: 978-3-642-00570-1
eBook Packages: Computer Science, Computer Science (R0)
