Abstract
This chapter presents a model for knowledge extraction from documents written in natural language. The model relies on a clear distinction between a conceptual level, which models the domain knowledge, and a lexical level, which represents the domain vocabulary. An advanced stochastic model, which combines two well-known approaches in a novel way, stores the mapping between these levels, taking into account the linguistic context of words. This stochastic model is then used to disambiguate the words of documents during the indexing phase. The engine supports simple keyword-based queries as well as natural-language queries. The system can also extend the domain knowledge by means of a production-rule engine. Validation tests indicate that the system extracts concepts with good accuracy, even when the training set is small.
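As a rough illustration of context-sensitive word-to-concept disambiguation, the sketch below decodes the most likely concept sequence for a short word sequence. The abstract does not name the two approaches the chapter combines, so this is not the chapter's model: it assumes a plain first-order hidden Markov model decoded with the Viterbi algorithm, and the concept names, vocabulary, and probabilities are all invented toy values.

```python
import math

def viterbi(words, concepts, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely concept sequence for `words` under a
    first-order HMM (an assumed stand-in for the chapter's model)."""
    def lp(p):
        # Log-probability with a small floor so unseen events are not -inf.
        return math.log(max(p, floor))

    # best[i][c] = (log-prob of the best path ending in concept c at word i,
    #               previous concept on that path)
    best = [{c: (lp(start_p.get(c, 0.0)) + lp(emit_p[c].get(words[0], 0.0)), None)
             for c in concepts}]
    for w in words[1:]:
        prev = best[-1]
        col = {}
        for c in concepts:
            score, arg = max((prev[p][0] + lp(trans_p[p].get(c, 0.0)), p)
                             for p in concepts)
            col[c] = (score + lp(emit_p[c].get(w, 0.0)), arg)
        best.append(col)

    # Backtrack from the best final concept.
    path = [max(concepts, key=lambda c: best[-1][c][0])]
    for col in reversed(best[1:]):
        path.append(col[path[-1]][1])
    path.reverse()
    return path

# Hypothetical toy domain: "bank" is ambiguous between two concepts.
CONCEPTS = ["Finance", "Geography"]
START = {"Finance": 0.5, "Geography": 0.5}
TRANS = {"Finance": {"Finance": 0.8, "Geography": 0.2},
         "Geography": {"Finance": 0.2, "Geography": 0.8}}
EMIT = {"Finance": {"bank": 0.5, "loan": 0.5},
        "Geography": {"bank": 0.5, "river": 0.5}}

if __name__ == "__main__":
    print(viterbi(["river", "bank"], CONCEPTS, START, TRANS, EMIT))
    print(viterbi(["loan", "bank"], CONCEPTS, START, TRANS, EMIT))
```

In this toy setup the word "bank" is equally likely under both concepts, so its neighbour decides the outcome: next to "river" it is assigned to Geography, next to "loan" to Finance.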
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sbattella, L., Tedesco, R. (2012). Knowledge Extraction from Natural Language Processing. In: Anastasi, G., Bellini, E., Di Nitto, E., Ghezzi, C., Tanca, L., Zimeo, E. (eds) Methodologies and Technologies for Networked Enterprises. Lecture Notes in Computer Science, vol 7200. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31739-2_10
DOI: https://doi.org/10.1007/978-3-642-31739-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31738-5
Online ISBN: 978-3-642-31739-2
eBook Packages: Computer Science, Computer Science (R0)