Toward File Consolidation by Document Categorization

  • Abdel Belaïd
  • André Alusse
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)

Abstract

An efficient, adaptive document classification and categorization approach is proposed for creating personal files that correspond to a user's specific needs and profile. Such an approach is needed because search engines are often too general to give a precise answer to a user's request. As we cannot act directly on the search engines' methodology, we propose instead to act on the retrieved documents by classifying and ranking them properly. A classifier combination approach is considered. The classifiers are chosen to be highly complementary, in order to treat all aspects of the query and to present the user, in the end, with a readable and comprehensible result. The application concerns law articles drawn from the European Union database. Law texts are always entangled with cross-references and accompanied by updating files (for application dates, new terms, and new formulations). Our approach finds a real application here, offering the specialist (jurist, lawyer, etc.) a synthetic view of the law related to the requested topic.
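
The abstract does not specify which classifiers are combined or how. As a purely illustrative sketch, and not the authors' method, the idea of re-ranking retrieved documents with two complementary scorers can be written as follows. The function names, the two scorers (a term-frequency cosine similarity and a query-term coverage ratio), and the equal weights are all assumptions introduced for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_score(query, doc):
    """Cosine similarity between raw term-frequency vectors (vector space model)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def coverage_score(query, doc):
    """Fraction of distinct query terms that occur in the document."""
    q_terms, d_terms = set(tokenize(query)), set(tokenize(doc))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rank(query, docs, weights=(0.5, 0.5)):
    """Combine both scores with a weighted sum (hypothetical weights) and sort descending."""
    scored = [
        (weights[0] * cosine_score(query, d) + weights[1] * coverage_score(query, d), d)
        for d in docs
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank("working time directive", docs)` on a list of retrieved texts returns `(score, document)` pairs ordered from most to least relevant under these assumed scorers. A weighted sum is only one of several possible combination rules.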

Keywords

Vector Space Model, Agglomerative Hierarchical Cluster, Document Retrieval, Document Categorization, Automatic Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Abdel Belaïd
    • 1
  • André Alusse
    • 1
  1. LORIA, Vandoeuvre-Lès-Nancy, France
