Using the αβ-Neighborhood for Adaptive Document Filtering

  • Adrian Fonseca-Bruzón
  • Reynaldo Gil-García
  • Aurora Pons-Porrata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5197)

Abstract

In this paper, we address the problem of adaptive document filtering. Traditionally, a user profile is represented by the centroid of the available examples, on the assumption that the examples are homogeneously distributed around that centroid. However, the examples may be irregularly distributed, with some areas more densely populated than others. In that case the homogeneity assumption may not hold globally, but it may still hold locally. To handle this phenomenon, we introduce a new approach that uses a binary classifier for each user profile and considers more than one document in the classification task. To decide whether a new document is relevant to the user, our approach uses a Nearest Neighbor classifier based on a neighborhood that inspects a sufficiently small area surrounding the new document. Experiments carried out on the TREC-11 collection show the effectiveness of the proposed method.
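The contrast drawn in the abstract between a global centroid profile and a local neighborhood decision can be illustrated with a minimal sketch. It is not the authors' αβ-neighborhood rule, whose definition is not given here. As an assumption for illustration, it substitutes a simple similarity-threshold neighborhood with majority voting. The function names, the `radius` and `threshold` parameters, and the toy term vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(docs):
    """Average of a list of sparse vectors."""
    total = {}
    for d in docs:
        for t, w in d.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(docs) for t, w in total.items()}

def centroid_relevant(doc, relevant_docs, threshold):
    """Global profile: compare the document with the centroid of relevant examples."""
    return cosine(doc, centroid(relevant_docs)) >= threshold

def local_relevant(doc, labeled_docs, radius):
    """Local decision (illustrative assumption, not the paper's rule):
    majority vote among labeled examples whose similarity to doc is >= radius."""
    votes = [label for vec, label in labeled_docs if cosine(doc, vec) >= radius]
    if not votes:
        return False  # no example close enough: treat as not relevant
    return sum(votes) > len(votes) / 2

# Toy data: relevant examples fall in two separate regions ("a" and "b"),
# while a non-relevant example sits between them.
relevant = [{"a": 1.0}, {"b": 1.0}]
labeled = [({"a": 1.0}, True), ({"b": 1.0}, True), ({"a": 1.0, "b": 1.0}, False)]

mixed = {"a": 1.0, "b": 1.0}    # coincides with the non-relevant example
near_a = {"a": 1.0, "c": 0.1}   # close to one relevant region only

for name, doc in [("mixed", mixed), ("near_a", near_a)]:
    print(name,
          "centroid:", centroid_relevant(doc, relevant, threshold=0.8),
          "local:", local_relevant(doc, labeled, radius=0.9))
```

In this toy setting the centroid lies between the two relevant regions, so the global profile accepts the document in the middle and rejects the one near a single region. The local vote looks only at nearby examples and reaches the opposite decision in both cases.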

Keywords

adaptive filtering, nearest neighbor classifier

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Adrian Fonseca-Bruzón (1)
  • Reynaldo Gil-García (1)
  • Aurora Pons-Porrata (1)
  1. Center for Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba
