Knowledge Extraction from Natural Language Processing

  • Chapter
Methodologies and Technologies for Networked Enterprises

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7200))

Abstract

This chapter presents a model for knowledge extraction from documents written in natural language. The model relies on a clear distinction between a conceptual level, which models the domain knowledge, and a lexical level, which represents the domain vocabulary. An advanced stochastic model, which combines two well-known approaches in a novel way, stores the mapping between these levels, taking into account the linguistic context of words. This stochastic model is then used to disambiguate the words of each document during the indexing phase. The engine supports simple keyword-based queries as well as natural-language queries. The system can also extend the domain knowledge by means of a production-rule engine. Validation tests indicate that the system extracts concepts with good accuracy, even when the training set is small.
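To make the idea of context-aware disambiguation concrete, here is a minimal sketch. The abstract does not name the two approaches the chapter combines, so this is not the authors' model: it assumes a plain first-order hidden Markov model in which hidden states stand for concepts (conceptual level) and observations are words (lexical level), decoded with the Viterbi algorithm. The concept names, the probabilities, and the `viterbi` helper are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy domain: the word "bank" maps to two different concepts.
CONCEPTS = ["Loan", "FinancialInstitution", "River", "Riverside"]

# Assumed probabilities, invented for this sketch (not taken from the chapter).
START_P = {c: 0.25 for c in CONCEPTS}
EMIT_P = {
    "Loan": {"loan": 1.0},
    "FinancialInstitution": {"bank": 1.0},
    "River": {"river": 1.0},
    "Riverside": {"bank": 1.0},
}
TRANS_P = {
    "Loan": {"Loan": 0.10, "FinancialInstitution": 0.80, "River": 0.05, "Riverside": 0.05},
    "FinancialInstitution": {"Loan": 0.70, "FinancialInstitution": 0.20, "River": 0.05, "Riverside": 0.05},
    "River": {"Loan": 0.05, "FinancialInstitution": 0.05, "River": 0.10, "Riverside": 0.80},
    "Riverside": {"Loan": 0.05, "FinancialInstitution": 0.05, "River": 0.70, "Riverside": 0.20},
}


def viterbi(words, concepts, start_p, trans_p, emit_p, floor=1e-9):
    """Return the most probable concept sequence for `words` under a first-order HMM."""
    if not words:
        return []

    def lg(p):
        # Work in log space; unseen events get a small floor instead of zero.
        return math.log(max(p, floor))

    # best[i][c] = (log-probability of the best path ending in concept c at word i,
    #               previous concept on that path)
    best = [{c: (lg(start_p.get(c, 0.0)) + lg(emit_p[c].get(words[0], 0.0)), None)
             for c in concepts}]
    for w in words[1:]:
        row = {}
        for c in concepts:
            score, prev = max(
                (best[-1][p][0] + lg(trans_p[p].get(c, 0.0)), p) for p in concepts
            )
            row[c] = (score + lg(emit_p[c].get(w, 0.0)), prev)
        best.append(row)

    # Backtrack from the best final concept to recover the whole sequence.
    path = [max(concepts, key=lambda c: best[-1][c][0])]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    path.reverse()
    return path


print(viterbi(["river", "bank"], CONCEPTS, START_P, TRANS_P, EMIT_P))
print(viterbi(["loan", "bank"], CONCEPTS, START_P, TRANS_P, EMIT_P))
```

In this toy setting the same word "bank" is assigned a different concept depending on the word that precedes it, which is the kind of context-dependent word-to-concept mapping the abstract describes. A real system would estimate the probabilities from an annotated training set rather than fix them by hand.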

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sbattella, L., Tedesco, R. (2012). Knowledge Extraction from Natural Language Processing. In: Anastasi, G., Bellini, E., Di Nitto, E., Ghezzi, C., Tanca, L., Zimeo, E. (eds) Methodologies and Technologies for Networked Enterprises. Lecture Notes in Computer Science, vol 7200. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31739-2_10

  • DOI: https://doi.org/10.1007/978-3-642-31739-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31738-5

  • Online ISBN: 978-3-642-31739-2

  • eBook Packages: Computer Science (R0)
