
Mining knowledge from text repositories using information extraction: A review

Sadhana

Abstract

There are two approaches to mining text from online repositories. First, when the knowledge to be discovered is expressed directly in the documents to be mined, Information Extraction (IE) alone can serve as an effective text-mining tool. Second, when the documents contain concrete data in unstructured form rather than abstract knowledge, IE can first transform the unstructured data in the document corpus into a structured database, after which state-of-the-art data mining algorithms and tools can identify abstract patterns in the extracted data. This paper reviews several methods related to these two approaches.
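
As an illustration of the second approach, the following minimal Python sketch first extracts structured records from free text and then mines simple association rules from them. It is not a method from the reviewed paper: the toy corpus `DOCS`, the regular expressions, the helper names `extract` and `mine_rules`, and the `min_confidence` threshold are all assumptions made for this example, and a realistic system would likely use a learned extractor rather than hand-written patterns.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; not taken from the reviewed paper.
DOCS = [
    "Seeking developer. Skills: Java, SQL. Location: Austin.",
    "Data engineer wanted. Skills: Python, SQL. Location: Boston.",
    "Analyst role. Skills: Python, SQL, Excel. Location: Austin.",
]

# Hand-written extraction patterns (an assumption for this sketch).
SKILLS_RE = re.compile(r"Skills:\s*([^.]+)\.")
LOCATION_RE = re.compile(r"Location:\s*([^.]+)\.")


def extract(doc):
    """Step 1 (IE): turn one unstructured document into a structured record."""
    skills = SKILLS_RE.search(doc)
    location = LOCATION_RE.search(doc)
    return {
        "skills": sorted(s.strip() for s in skills.group(1).split(",")) if skills else [],
        "location": location.group(1).strip() if location else None,
    }


def mine_rules(records, min_confidence=0.6):
    """Step 2 (mining): find skill rules 'lhs -> rhs' with enough confidence."""
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = set(rec["skills"])
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), n in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = count(lhs and rhs) / count(lhs)
            confidence = n / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(confidence, 2)))
    return sorted(rules)


records = [extract(d) for d in DOCS]
rules = mine_rules(records)
for lhs, rhs, confidence in rules:
    print(f"{lhs} -> {rhs} (confidence {confidence})")
```

The extraction step stands in for the "unstructured data to structured database" transformation, and the rule miner stands in for the downstream data mining tool; either half could be swapped for a more capable component without changing the overall pipeline.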

Author information

Corresponding author

Correspondence to SANDEEP R SIRSAT.

About this article

Cite this article

SIRSAT, S.R., CHAVAN, D.V. & DESHPANDE, D.S.P. Mining knowledge from text repositories using information extraction: A review. Sadhana 39, 53–62 (2014). https://doi.org/10.1007/s12046-013-0197-2

  • DOI: https://doi.org/10.1007/s12046-013-0197-2
