Scientometrics, Volume 60, Issue 3, pp 445–562

New classification quality estimators for analysis of documentary information: Application to patent analysis and web mapping

  • Jean-Charles Lamirel
  • Claire Francois
  • Shadi Al Shehabi
  • Martial Hoffmann
Article

Abstract

The information analysis process includes a cluster analysis or classification step associated with an expert validation of the results. In this paper, we propose new Recall/Precision measures for estimating the quality of cluster analysis. These measures derive both from Galois lattice theory and from the Information Retrieval (IR) domain. As opposed to classical measures of inertia, their main advantage is that they are independent both of the classification method and of the difference between the intrinsic dimension of the data and that of the clusters. We present two experiments that use the MultiSOM model, an extension of Kohonen's SOM model, as the cluster analysis method. Our first experiment, on patent data, shows how our measures can be used to compare viewpoint-oriented classification methods, such as MultiSOM, with global cluster analysis methods, such as WebSOM. Our second experiment, which is part of the EICSTES EEC project, is an original Webometrics experiment that combines content and link classification starting from a large non-homogeneous set of web pages. This experiment highlights the fact that break-even points between our different Recall/Precision measures can be used to determine an optimal number of clusters for web data classification. The contents of the clusters obtained when using different break-even points are compared to determine the quality of the resulting maps.
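The break-even idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the definitions of the Recall/Precision measures, so the sketch below assumes they have already been computed for several candidate numbers of clusters and only shows one plausible way to pick the break-even point: the cluster count at which the two values are closest. The function name `break_even_cluster_count` and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Mapping, Tuple


def break_even_cluster_count(scores: Mapping[int, Tuple[float, float]]) -> int:
    """Return the cluster count whose recall and precision are closest.

    `scores` maps a candidate number of clusters to a (recall, precision)
    pair. How those two values are computed is outside this sketch.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Minimise the absolute gap between recall and precision;
    # ties are broken in favour of the smaller number of clusters.
    return min(scores, key=lambda k: (abs(scores[k][0] - scores[k][1]), k))


# Illustrative values only (assumed, not results from the paper).
example_scores = {
    25: (0.80, 0.40),
    49: (0.70, 0.55),
    100: (0.62, 0.60),
    144: (0.50, 0.72),
}

best = break_even_cluster_count(example_scores)
print(best)
```

With these made-up values the smallest gap occurs at 100 clusters. Other selection rules are possible, for example interpolating between the two candidate sizes on either side of the crossing.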

Keywords

Classification Method; Information Retrieval System; Class Content; Class Profile; Galois Lattice
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Kluwer Academic Publisher/Akadémiai Kiadó 2004

Authors and Affiliations

  • Jean-Charles Lamirel
    • 1
  • Claire Francois
    • 2
  • Shadi Al Shehabi
    • 1
  • Martial Hoffmann
    • 2
  1. LORIA, Vandoeuvre-lès-Nancy (France)
  2. URI/INIST-CNRS, Vandoeuvre-lès-Nancy (France)
