Improved Document Filtering by Multilevel Term Relations

  • Adrian Fonseca Bruzón
  • Aurelio López-López
  • José E. Medina Pagola
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11896)

Abstract

Humans tend to organize information in documents in a logical and intentional way. This organization, which we call textual structure, is commonly expressed in terms of sections, chapters, paragraphs, or sentences. This structure facilitates the understanding of the content that we want to transmit. However, such structure, in which we usually encode the semantic content of information, is rarely exploited by filtering methods when constructing the user profile. In this work, we propose the use of term relations that consider different context levels to enhance document filtering. We propose methods for obtaining the representation that take into account the imbalance between the documents that satisfy the information needs of users, as well as the Cold Start problem (having scarce information) during the initial construction of the user profile. The experiments carried out allowed us to assess the impact of the proposed representation on the filtering task.

Keywords

Document Filtering, Term Relations, Cold Start, Document structure

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Adrian Fonseca Bruzón (1)
  • Aurelio López-López (1)
  • José E. Medina Pagola (2)

  1. Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
  2. University of Informatic Sciences, Havana, Cuba
