Data mining in a closed Web environment

Scientometrics

Abstract

Understanding the fabric of relationships building up on the World Wide Web calls for tools that can extract the underlying knowledge. Some of the most interesting relationships are those brought to light by co-linking analysis (the Web analogue of cocitation analysis). We propose such an analysis based on the co-links generated within a closed Web environment, using multivariate statistics (Principal Component Analysis and Multidimensional Scaling) and a connection-based technique (Kohonen's Self-Organizing Maps). The analysis was applied to a generic thematic environment, and the underlying relationships and structures became manifest in the interpretation of the results.
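
As a rough illustration of what a co-link count is, the sketch below tallies, for each pair of sites, how many other sites link to both. It is not the authors' procedure: the function name `colink_counts`, the site names, and the toy link data are all hypothetical, and the abstract does not say how links were collected or how counts were normalized before the multivariate analysis.

```python
from collections import Counter
from itertools import combinations


def colink_counts(outlinks):
    """Count, for each unordered pair of sites, how many sources link to both.

    `outlinks` maps a source site to the list of sites it links to
    (an assumed input format, chosen only for this illustration).
    """
    counts = Counter()
    for source, targets in outlinks.items():
        # Drop self-links and duplicate links; sort so each pair has one order.
        distinct = sorted(set(targets) - {source})
        for a, b in combinations(distinct, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical closed environment of five sites.
outlinks = {
    "site_a": ["site_c", "site_d"],
    "site_b": ["site_c", "site_d", "site_e"],
    "site_c": ["site_d", "site_e"],
}

colinks = colink_counts(outlinks)
for pair, n in sorted(colinks.items()):
    print(pair, n)
```

A symmetric matrix built from counts like these is the kind of input that Principal Component Analysis, Multidimensional Scaling, or a Self-Organizing Map could then be applied to.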



Author information

Corresponding author

Correspondence to Vicente P. Guerrero-Bote.

About this article

Cite this article

Faba-Pérez, C., Guerrero-Bote, V.P. & De Moya-Anegón, F. Data mining in a closed Web environment. Scientometrics 58, 623–640 (2003). https://doi.org/10.1023/B:SCIE.0000006884.08036.73

  • DOI: https://doi.org/10.1023/B:SCIE.0000006884.08036.73
