Abstract
Univariate measures of concentration (or dispersion) can be applied to the description of the citation patterns within a text corpus, and also to the citation links between that corpus and an alternative (possibly contextual) literature. A simple data-flow schema, introduced by Lano as an aid to software design, provides an appropriate data-definitional tool for this purpose. The schema, as applied here, comprises: (1) a matrix of cells containing 0 or 1 values (in its non-diagonal cells) representing within-corpus citations, with the diagonal cells representing the corpus documents; and (2) two associated vectors of cells which record the total numbers of citations that link the corpus documents with an external-to-corpus literature. An initial data exploration based on an application of this schema to a trial document corpus is reported. On this basis, several provisional conjectures are put forward to attract further research on data of this type. These conjectures include: (1) Concentration amongst citations to corpus items from within a young corpus is less than it is amongst citations by corpus items to that corpus; (2) A young literature corpus imports significantly more information from its external world than it exports to it; and (3) Information transfer from and into the contextual literature dominates within-corpus information transfer. The author emphasises that these are conjectures at this stage, not hypotheses.
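The schema described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the author's implementation: the four-document corpus and all counts are invented, the names `cites`, `external_out` and `external_in` are hypothetical, the reading of the two vectors as outgoing and incoming external citations is an assumption, and the Gini coefficient is used as one possible univariate concentration measure (the abstract does not say which measure the paper adopts).

```python
from typing import List


def gini(counts: List[int]) -> float:
    """Gini coefficient of non-negative counts (0 = evenly spread)."""
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(counts)
    # Rank-weighted form of the Gini coefficient.
    weighted = sum((rank + 1) * x for rank, x in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical corpus of 4 documents, oldest first.
# cites[i][j] == 1 means document i cites document j.
# Diagonal cells stand for the documents themselves and are left at 0.
cites = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

# Hypothetical external vectors (assumed interpretation):
external_out = [12, 9, 15, 20]  # citations from each document to outside items
external_in = [3, 1, 0, 0]      # citations to each document from outside items

n = len(cites)
citations_to = [sum(cites[i][j] for i in range(n)) for j in range(n)]  # column sums
citations_by = [sum(row) for row in cites]                             # row sums

within_total = sum(citations_by)
print("Gini, citations to corpus items:", round(gini(citations_to), 3))
print("Gini, citations by corpus items:", round(gini(citations_by), 3))
print("within-corpus links:", within_total)
print("imports:", sum(external_out), "exports:", sum(external_in))
```

The column sums and row sums of the matrix give the two within-corpus distributions compared in conjecture (1), while the totals of the two vectors and of the matrix give the quantities compared in conjectures (2) and (3). Because the data are invented, the printed values say nothing about whether the conjectures hold.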
References
C. L. Borgman (Ed.), Scholarly communication and bibliometrics. Beverly Hills, Sage, 1990.
L. Egghe, R. Rousseau (Eds), Informetrics 89/90: Selection of Papers Submitted for the Second International Conference on Bibliometrics, Scientometrics and Informetrics, London, Ontario, Canada, 5–7 July 1989, Amsterdam, Elsevier, 1990.
S. C. Bradford, Sources of information on specific subjects, Engineering, 137 (1934) 85–86.
B. C. Brookes, Bradford's Law and the bibliography of science, Nature, 224 (1969) 953–955.
A. E. Cawkell, Understanding science by analysing its literature, Information Scientist, 10 (1978) 3–10.
E. Garfield, Citation indexes in sociological and historical research, American Documentation, 14 (1963) 289–291.
E. Garfield, I. E. Sher, R. J. Torpie, The Use of Citation Data in Writing the History of Science, Institute of Scientific Information, Philadelphia, 1964.
W. Goffman, V. A. Newill, Communication and epidemic processes, Proceedings of the Royal Society, A298 (1967) 316–334.
B. C. Griffith et al., The structure of scientific literatures II: Toward a macro- and microstructure for science, Science Studies, 4 (1974) 339–365.
M. Kochen, Integrative Mechanisms in Literature Growth, Westport, Conn., Greenwood, 1974.
J. Margolis, Citation indexing and evaluation of scientific papers, Science, 155 (1967) 1213–1219.
A. Mendez, Some considerations on the retrieval of literature based on citations, Information Scientist, 12 (1978) 67–71.
D. de Solla Price, Networks of scientific papers, Science, 149 (1965) 510–515.
J. Vlachy, Scientometric analyses in physics — where we stand, Czechoslovak Journal of Physics, B36 (1986) 1–13.
P. Zunde, Structural models of complex information sources, Information Storage and Retrieval, 7 (1971) 1–8.
W. Glänzel, A. Schubert, The cumulative advantage function. In: L. Egghe, R. Rousseau (Eds), Informetrics 89/90: Selection of Papers Submitted for the Second International Conference on Bibliometrics, Scientometrics and Informetrics, London, Ontario, Canada, 5–7 July 1989; 139–147. Amsterdam, Elsevier, 1990.
L. Egghe, R. Rousseau, Elements of concentration theory. In: L. Egghe, R. Rousseau (Eds), Informetrics 89/90: Selection of Papers Submitted for the Second International Conference on Bibliometrics, Scientometrics and Informetrics, London, Ontario, Canada, 5–7 July 1989; 97–137. Amsterdam, Elsevier, 1990.
M. H. Heine, Indices of literature dispersion based on qualitative attributes, Journal of Documentation, 34 (1978) 175–188.
A. D. Pratt, A measure of class concentration in bibliometrics, Journal of the American Society for Information Science, 28 (1977) 285–292.
R. J. Lano, A technique for software and system design. New York, North-Holland, 1979. (Republished from his The N² chart. TRW Defense and Space Systems Group, Redondo Beach, Ca., 1977, as its Report TRW-SS-77-04).
Additional information
Revised version of a paper presented to the Fourth International Conference on Bibliometrics, Informetrics and Scientometrics, Berlin, 13–18 September, 1993.
Cite this article
Heine, M.H. The characterisation of text corpora using an input/output schema for citations. Scientometrics 32, 177–194 (1995). https://doi.org/10.1007/BF02016893