Adapting the Naive Bayes Classifier to Rank Procedural Texts

  • Ling Yin
  • Richard Power
Conference paper

DOI: 10.1007/11735106_17

Volume 3936 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Yin L., Power R. (2006) Adapting the Naive Bayes Classifier to Rank Procedural Texts. In: Lalmas M., MacFarlane A., Rüger S., Tombros A., Tsikrika T., Yavlinsky A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg

Abstract

This paper presents a machine-learning approach for ranking web documents according to the proportion of procedural text they contain. By ‘procedural text’ we refer to ordered lists of steps, which are very common in some instructional genres such as online manuals. Our initial training corpus is built by applying simple heuristics to select documents from a large collection, and it contains only a few documents with a large proportion of procedural text. We adapt the Naive Bayes classifier to better fit this less-than-ideal training corpus. The adapted model is compared with several other classifiers in ranking procedural texts using different sets of features, and it is shown to perform well when only highly distinctive features are used.
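The abstract does not say how the Naive Bayes classifier was adapted, so the following is only a reference point: a minimal sketch of a plain, unadapted Bernoulli Naive Bayes model used as a ranker. Each document is reduced to a set of binary features, and documents are sorted by the log posterior odds of the "procedural" class. The feature names (`numbered_list`, `imperative_verb`, `step_word`, `click`), the Laplace smoothing parameter `alpha`, and the toy training data are all hypothetical, and the classifier score stands in for the proportion-of-procedural-text ranking described above.

```python
import math
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[int, float], Dict[int, Dict[str, float]]]


def train_bernoulli_nb(docs: List[Tuple[Set[str], int]], vocab: Set[str],
                       alpha: float = 1.0) -> Model:
    """Estimate class priors and per-feature presence probabilities.

    docs: (feature_set, label) pairs; label 1 = procedural, 0 = not procedural.
    alpha: Laplace smoothing (hypothetical choice, not taken from the paper).
    """
    class_counts = {0: 0, 1: 0}
    feat_counts = {c: {f: 0 for f in vocab} for c in (0, 1)}
    for feats, label in docs:
        class_counts[label] += 1
        for f in feats & vocab:  # ignore features outside the vocabulary
            feat_counts[label][f] += 1
    n = len(docs)
    prior = {c: class_counts[c] / n for c in (0, 1)}
    # Smoothed P(feature present | class) for a binary (Bernoulli) feature.
    cond = {c: {f: (feat_counts[c][f] + alpha) / (class_counts[c] + 2 * alpha)
                for f in vocab} for c in (0, 1)}
    return prior, cond


def log_odds(feats: Set[str], vocab: Set[str], model: Model) -> float:
    """log P(procedural | doc) - log P(not procedural | doc)."""
    prior, cond = model
    score = math.log(prior[1]) - math.log(prior[0])
    for f in vocab:
        p1, p0 = cond[1][f], cond[0][f]
        if f in feats:
            score += math.log(p1) - math.log(p0)
        else:  # the Bernoulli model also uses evidence from absent features
            score += math.log(1 - p1) - math.log(1 - p0)
    return score


def rank(docs: List[Tuple[str, Set[str]]], vocab: Set[str],
         model: Model) -> List[str]:
    """Return document names, most procedural-looking first."""
    ordered = sorted(docs, key=lambda d: log_odds(d[1], vocab, model),
                     reverse=True)
    return [name for name, _ in ordered]


# Toy usage with hypothetical features and labels.
VOCAB = {"numbered_list", "imperative_verb", "step_word", "click"}
TRAIN = [
    ({"numbered_list", "imperative_verb", "step_word"}, 1),
    ({"numbered_list", "imperative_verb"}, 1),
    ({"imperative_verb"}, 0),
    ({"click"}, 0),
    (set(), 0),
]
MODEL = train_bernoulli_nb(TRAIN, VOCAB)
CANDIDATES = [
    ("news", set()),
    ("faq", {"imperative_verb"}),
    ("manual", {"numbered_list", "step_word", "imperative_verb"}),
]
print(rank(CANDIDATES, VOCAB, MODEL))  # ['manual', 'faq', 'news']
```

Ranking by log odds rather than by a hard yes/no label lets a classifier order documents by how strongly they resemble the positive class. Any adjustment for a training set with few strongly procedural documents, as the abstract describes, would have to be added on top of this baseline.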


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ling Yin (1)
  • Richard Power (2)
  1. Natural Language Technology Group (NLTG), University of Brighton, Brighton, United Kingdom
  2. Faculty of Mathematics and Computing, The Open University, Milton Keynes, United Kingdom