Natural language processing (NLP), also known as computational linguistics, is a broad subject that encompasses technologies for automated processing of natural (human) language. Such processing includes parsing natural-language text, generating natural language as a form of program output, and extracting semantics. I have been working in this field since the 1980s using a wide variety of programming techniques, and I believe that quantitative methods provide better results with less effort for most applications. Therefore, in this chapter, I will cover statistical NLP. Statistical NLP uses statistical or probabilistic methods for segmenting text, determining each word’s likely part of speech, classifying text, and automatically summarizing text. I will show you what I consider to be some of the simplest yet most useful techniques for developing Web 3.0 applications that require some “understanding” of text. (Natural-language generation is also a useful topic, but I won’t cover it here.)
Keywords: Natural Language Processing, Resource Description Framework, Bayesian Classifier, Input Text, Latent Semantic Indexing