What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content

Auer, Sören; Lehmann, Jens

doi:10.1007/978-3-540-72667-8_36


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4519)

Included in the following conference series:

European Semantic Web Conference


Abstract

Wikis are an established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating by far the largest encyclopedia solely on the basis of a wiki. Recently, several approaches have been proposed for extending wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are even used, albeit unconsciously, by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g. more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content and propose strategies for quality improvements that require only minor modifications to the wiki systems currently in use.
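To make the idea of extracting statements from template instances more concrete, the following is a minimal sketch in Python. It is not the authors' extraction algorithm: the function names (`parse_template`, `to_triples`, `shared_statements`), the template layout and the sample pages ("Town A", "Town B", "Example City") are all invented for illustration. It assumes a flat template with `key = value` attributes and ignores nested templates and wiki links that contain a pipe character.

```python
import re


def parse_template(wikitext):
    """Return (template_name, {attribute: value}) for the first flat template.

    Assumption: the template has the form {{Name | key = value | ...}} with
    no nested templates and no '|' inside attribute values.
    """
    match = re.search(r"\{\{(.*?)\}\}", wikitext, re.DOTALL)
    if match is None:
        return None, {}
    parts = [part.strip() for part in match.group(1).split("|")]
    name, attributes = parts[0], {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            attributes[key.strip()] = value.strip()
    return name, attributes


def to_triples(page_title, wikitext):
    """Turn each non-empty template attribute into a (subject, predicate, object) triple."""
    _, attributes = parse_template(wikitext)
    return [(page_title, key, value) for key, value in attributes.items() if value]


def shared_statements(triples_a, triples_b):
    """Return the (predicate, object) pairs that two subjects have in common."""
    pairs_a = {(predicate, obj) for _, predicate, obj in triples_a}
    pairs_b = {(predicate, obj) for _, predicate, obj in triples_b}
    return pairs_a & pairs_b


# Invented sample pages; the values are placeholders, not real data.
town_a = to_triples(
    "Town A",
    "{{Infobox Town | name = Town A | twin_town = Example City | population = 100}}",
)
town_b = to_triples(
    "Town B",
    "{{Infobox Town | name = Town B | twin_town = Example City | population = 200}}",
)
print(shared_statements(town_a, town_b))
```

Intersecting the predicate and object pairs of two pages is one simple way to approach a question of the form "what do X and Y have in common?". A real system would more plausibly load the triples into an RDF store and query them there.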



Author information

Authors and Affiliations

Universität Leipzig, Department of Computer Science, Johannisgasse 26, D-04103 Leipzig, Germany
Sören Auer & Jens Lehmann
University of Pennsylvania, Department of Computer and Information Science, Philadelphia, PA 19104, USA
Sören Auer


Editor information

Enrico Franconi, Michael Kifer, Wolfgang May


About this paper

Cite this paper

Auer, S., Lehmann, J. (2007). What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content. In: Franconi, E., Kifer, M., May, W. (eds) The Semantic Web: Research and Applications. ESWC 2007. Lecture Notes in Computer Science, vol 4519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72667-8_36


DOI: https://doi.org/10.1007/978-3-540-72667-8_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72666-1
Online ISBN: 978-3-540-72667-8
eBook Packages: Computer Science, Computer Science (R0)
