Knowledge Extraction from Natural Language Processing

Sbattella, Licia; Tedesco, Roberto

doi:10.1007/978-3-642-31739-2_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7200)

Abstract

This chapter presents a model for knowledge extraction from documents written in natural language. The model relies on a clear distinction between a conceptual level, which models the domain knowledge, and a lexical level, which represents the domain vocabulary. An advanced stochastic model, which combines two well-known approaches in a novel way, stores the mapping between these levels, taking into account the linguistic context of words. This stochastic model is then used to disambiguate the words of each document during the indexing phase. The engine supports simple keyword-based queries as well as natural language queries. The system can also extend the domain knowledge by means of a production-rule engine. Validation tests indicate that the system extracts concepts with good accuracy, even when the training set is small.

Author information

Authors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano, via L. da Vinci 32, 20133, Milano, Italy
Licia Sbattella
MCPT, Politecnico di Milano, via L. da Vinci 32, 20133, Milano, Italy
Roberto Tedesco

Editor information

Editors and Affiliations

Dip. di Ingegneria dell’Informazione, Università di Pisa, Largo Lucio Lazzarino 1, 56122, Pisa, Italy
Giuseppe Anastasi
Dipartimento di Elettronica ed Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 21, 20133, Milano, Italy
Emilio Bellini, Elisabetta Di Nitto & Letizia Tanca
Deep-SE Group - Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza L. da Vinci, 32, 20133, Milano, Italy
Carlo Ghezzi
Sistemi di Elaborazione dell’Informazione, Università degli Studi del Sannio, Via Traiano 1, 82100, Benevento, Italy
Eugenio Zimeo

About this chapter

Cite this chapter

Sbattella, L., Tedesco, R. (2012). Knowledge Extraction from Natural Language Processing. In: Anastasi, G., Bellini, E., Di Nitto, E., Ghezzi, C., Tanca, L., Zimeo, E. (eds) Methodologies and Technologies for Networked Enterprises. Lecture Notes in Computer Science, vol 7200. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31739-2_10

DOI: https://doi.org/10.1007/978-3-642-31739-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31738-5
Online ISBN: 978-3-642-31739-2
eBook Packages: Computer Science, Computer Science (R0)
