Abstract
Since their arrival in the 1960s, electronic databases have been an invaluable tool for informetricians. Databases and their delivery mechanisms have provided both the raw data and the analytical tools for many informetric studies. In particular, the citation databases produced by the Institute for Scientific Information have been the key source of data for a whole range of citation-based research. However, the use of online databases also poses many problems and challenges. Most of these arise because databases are designed primarily for information retrieval, and informetric studies represent only a secondary use of the systems. The problems encountered by informetricians include errors or inconsistencies in the data; problems with the coverage, overlap and changeability of the databases; and problems and limitations in the tools provided by database hosts such as DIALOG. For some informetric studies, the only viable solution to these problems is to download the data and perform offline correction and data analysis.
Cite this article
Hood, W.W., Wilson, C.S. Informetric studies using databases: Opportunities and challenges. Scientometrics 58, 587–608 (2003). https://doi.org/10.1023/B:SCIE.0000006882.47115.c6