Abstract
Wikis are an established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating by far the largest encyclopedia on the basis of a wiki alone. Recently, several approaches have been proposed for extending wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are even used, albeit unconsciously, by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g., more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content and propose strategies for quality improvements that require only minor modifications of the wiki systems currently in use.
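The idea of extracting information from template instances can be illustrated with a minimal sketch. It is not the authors' extraction algorithm: the sample markup, the function name `extract_triples`, and the regular expression are assumptions made for illustration, and the output is a list of plain (subject, predicate, object) tuples rather than serialized RDF. Nested templates and wiki links containing `|` are not handled.

```python
import re

# Hypothetical template instance, written for this example only.
SAMPLE = """{{Infobox Town
| name = Innsbruck
| country = Austria
| state = Tyrol
}}"""

# Matches a flat (non-nested) template: {{Name | key = value | ...}}
# and captures everything after the first pipe.
TEMPLATE_RE = re.compile(r"\{\{[^|{}]+\|(.*?)\}\}", re.DOTALL)


def extract_triples(page_title, wikitext):
    """Turn each key = value attribute of a template instance into a
    (subject, predicate, object) tuple, using the page title as subject."""
    triples = []
    for match in TEMPLATE_RE.finditer(wikitext):
        for part in match.group(1).split("|"):
            if "=" not in part:
                continue  # skip positional (unnamed) parameters
            key, value = part.split("=", 1)
            key, value = key.strip(), value.strip()
            if key and value:
                triples.append((page_title, key, value))
    return triples


if __name__ == "__main__":
    for triple in extract_triples("Innsbruck", SAMPLE):
        print(triple)
```

Under these assumptions, each named template attribute becomes one statement about the page, which is the kind of structured content the abstract refers to.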
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Auer, S., Lehmann, J. (2007). What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content. In: Franconi, E., Kifer, M., May, W. (eds) The Semantic Web: Research and Applications. ESWC 2007. Lecture Notes in Computer Science, vol 4519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72667-8_36
DOI: https://doi.org/10.1007/978-3-540-72667-8_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72666-1
Online ISBN: 978-3-540-72667-8
eBook Packages: Computer Science, Computer Science (R0)