Building semantic kernels for cross-document knowledge discovery using Wikipedia

Yan, Peng; Jin, Wei

doi:10.1007/s10115-016-0973-5

Regular Paper
Published: 05 August 2016

Volume 51, pages 287–310 (2017)

Knowledge and Information Systems

Peng Yan¹ & Wei Jin¹

324 Accesses
2 Citations

Abstract

Research into text mining has progressed over the past decade. One of the main challenges now is how to take advantage of outside knowledge in the discovery process. In this work, to address the limitations of the traditional bag-of-words model and to expand the search scope beyond the document collections at hand, we present a new text mining approach that incorporates Wikipedia as background knowledge. Various semantic kernels are built from the extensive knowledge derived from Wikipedia and applied to the search scenario of detecting potential semantic relationships between topics. We demonstrate the effectiveness of our approach by comparing it with competitive baselines, as well as with alternative solutions in which only part of the Wikipedia resources (e.g., the Wiki-article contents or the associated Wiki-categories) is considered.
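To make the contrast with the bag-of-words model concrete, the sketch below uses the general proximity-matrix form of a semantic kernel from the kernel-methods literature, k(x, y) = x P Pᵀ yᵀ, where P is a term-by-term proximity matrix. It is an illustration under stated assumptions, not necessarily the exact construction used in this paper: the three-term vocabulary, the proximity values, the names `P_CONTENT` and `P_CATEGORY`, and the linear blend in `combined_proximity` are all hypothetical stand-ins for relatedness that would be estimated from Wiki-article contents and Wiki-categories.

```python
import numpy as np

# Term order in every vector and matrix (illustrative only):
# index 0 = "puma", 1 = "cougar", 2 = "car".


def bow_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """Plain bag-of-words kernel: inner product of raw term vectors."""
    return float(x @ y)


def semantic_kernel(x: np.ndarray, y: np.ndarray, P: np.ndarray) -> float:
    """Semantic kernel k(x, y) = x P P^T y^T.

    Each document vector is mapped through the term-proximity matrix P,
    so related terms (not just identical ones) contribute to the score.
    Note that (x P) . (y P) equals x P P^T y^T.
    """
    return float((x @ P) @ (y @ P))


# Hypothetical term-proximity matrices; values are made up for
# illustration and are NOT estimated from Wikipedia.
P_CONTENT = np.array([[1.0, 0.8, 0.1],   # stand-in for article-content relatedness
                      [0.8, 1.0, 0.1],
                      [0.1, 0.1, 1.0]])
P_CATEGORY = np.array([[1.0, 0.9, 0.0],  # stand-in for shared-category relatedness
                       [0.9, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])


def combined_proximity(alpha: float) -> np.ndarray:
    """Hypothetical linear blend of the content- and category-based matrices."""
    return alpha * P_CONTENT + (1.0 - alpha) * P_CATEGORY


if __name__ == "__main__":
    d1 = np.array([1.0, 0.0, 0.0])  # a document that mentions only "puma"
    d2 = np.array([0.0, 1.0, 0.0])  # a document that mentions only "cougar"
    print("bag-of-words: ", bow_kernel(d1, d2))
    print("content only: ", semantic_kernel(d1, d2, P_CONTENT))
    print("category only:", semantic_kernel(d1, d2, P_CATEGORY))
    print("combined:     ", semantic_kernel(d1, d2, combined_proximity(0.5)))
```

Because the two toy documents share no terms, the bag-of-words kernel scores them as completely unrelated, while each semantic kernel assigns a positive similarity through the synonym pair. Swapping in a content-only, category-only, or blended P mirrors, in miniature, the comparison between partial and combined Wikipedia resources.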

Acknowledgments

This research work is supported in part by the NSF Grant (IIS-1452898) and the NSF/North Dakota EPSCoR IIP Seed Grant (EPS-0814442).

Author information

Authors and Affiliations

Department of Computer Science, North Dakota State University, 1320 Albrecht Boulevard, Fargo, ND, 58102, USA
Peng Yan & Wei Jin

Corresponding author

Correspondence to Peng Yan.

About this article

Cite this article

Yan, P., Jin, W. Building semantic kernels for cross-document knowledge discovery using Wikipedia. Knowl Inf Syst 51, 287–310 (2017). https://doi.org/10.1007/s10115-016-0973-5

Received: 26 February 2014
Revised: 02 March 2016
Accepted: 22 July 2016
Published: 05 August 2016
Issue Date: April 2017
DOI: https://doi.org/10.1007/s10115-016-0973-5