The characterisation of text corpora using an input/output schema for citations

Scientometrics

Abstract

Univariate measures of concentration (or dispersion) can be used to describe the citation patterns within a text corpus, and also the citation links between that corpus and an alternative (possibly contextual) literature. For this purpose, a simple data-flow schema introduced by Lano as an aid to software design provides an appropriate data-definitional tool. The schema, as applied here, comprises: (1) a matrix whose non-diagonal cells contain 0 or 1 values representing within-corpus citations, and whose diagonal cells represent the corpus documents; and (2) two associated vectors of cells which record the total numbers of citations that link the corpus documents with an external-to-corpus literature. An initial data exploration based on an application of this schema to a trial document corpus is reported. On this basis, several provisional conjectures are put forward to attract further research on data of this type. These conjectures include: (1) concentration amongst citations to corpus items from within a young corpus is less than it is amongst citations by corpus items to that corpus; (2) a young literature corpus imports significantly more information from its external world than it exports to it; and (3) information transfer from and into contextual literature dominates within-corpus information transfer. The author emphasises that these are conjectures at this stage, not hypotheses.
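The schema described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-document toy corpus, the function names, the reading of the two vectors as "citations made to external literature" and "citations received from external literature", and the choice of the Gini coefficient as the univariate concentration measure (the abstract does not say which measure the author uses). The numbers are invented and are not intended to support or refute any of the conjectures.

```python
from typing import List, Tuple


def citation_matrix(n_docs: int, pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Build an n x n 0/1 matrix; cell [i][j] = 1 means document i cites document j.

    Diagonal cells stand for the documents themselves, so they hold no
    citation value (kept as 0 here purely as a placeholder).
    """
    matrix = [[0] * n_docs for _ in range(n_docs)]
    for citing, cited in pairs:
        if citing == cited:
            raise ValueError("diagonal cells represent documents, not citations")
        matrix[citing][cited] = 1
    return matrix


def citations_made(matrix: List[List[int]]) -> List[int]:
    """Row sums: within-corpus citations made BY each document."""
    return [sum(row) for row in matrix]


def citations_received(matrix: List[List[int]]) -> List[int]:
    """Column sums: within-corpus citations made TO each document."""
    return [sum(col) for col in zip(*matrix)]


def gini(counts: List[int]) -> float:
    """Gini coefficient of a list of non-negative counts.

    0.0 means the counts are spread evenly; larger values mean the total
    is concentrated in fewer items. This is an assumed stand-in for the
    paper's concentration measure.
    """
    n, total = len(counts), sum(counts)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(counts)
    weighted = sum((rank + 1) * x for rank, x in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical toy corpus: (citing, cited) pairs among documents 0..3.
internal_pairs = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 2)]
matrix = citation_matrix(4, internal_pairs)

# Assumed interpretation of the two vectors attached to the matrix.
to_external = [5, 4, 6, 3]    # citations by each corpus document to outside literature
from_external = [2, 0, 1, 0]  # citations to each corpus document from outside literature

received = citations_received(matrix)
made = citations_made(matrix)

print("concentration of citations to corpus items:", gini(received))
print("concentration of citations by corpus items:", gini(made))
print("external citations made vs received:", sum(to_external), sum(from_external))
print("within-corpus citations:", sum(made))
```

Under these assumptions, the first conjecture compares the two Gini values, the second compares the totals of the two vectors, and the third compares those totals with the within-corpus count.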

References

  1. C. L. Borgman (Ed.), Scholarly Communication and Bibliometrics, Beverly Hills, Sage, 1990.

  2. L. Egghe, R. Rousseau (Eds), Informetrics 89/90: Selection of Papers Submitted for the Second International Conference on Bibliometrics, Scientometrics and Informetrics, London, Ontario, Canada, 5–7 July 1989, Amsterdam, Elsevier, 1990.

  3. S. C. Bradford, Sources of information on specific subjects, Engineering, 137 (1934) 85–86.

  4. B. C. Brookes, Bradford's Law and the bibliography of science, Nature, 224 (1969) 953–955.

  5. A. E. Cawkell, Understanding science by analysing its literature, Information Scientist, 10 (1978) 3–10.

  6. E. Garfield, Citation indexes in sociological and historical research, American Documentation, 14 (1963) 289–291.

  7. E. Garfield, I. E. Sher, R. J. Torpie, The Use of Citation Data in Writing the History of Science, Institute for Scientific Information, Philadelphia, 1964.

  8. W. Goffman, V. A. Newill, Communication and epidemic processes, Proceedings of the Royal Society, A298 (1967) 316–334.

  9. B. C. Griffith et al., The structure of scientific literatures II: Toward a macro- and microstructure for science, Science Studies, 4 (1974) 339–365.

  10. M. Kochen, Integrative Mechanisms in Literature Growth, Westport, Conn., Greenwood, 1974.

  11. J. Margolis, Citation indexing and evaluation of scientific papers, Science, 155 (1967) 1213–1219.

  12. A. Mendez, Some considerations on the retrieval of literature based on citations, Information Scientist, 12 (1978) 67–71.

  13. D. de Solla Price, Networks of scientific papers, Science, 149 (1965) 510–515.

  14. J. Vlachy, Scientometric analyses in physics — where we stand, Czechoslovak Journal of Physics, B36 (1986) 1–13.

  15. P. Zunde, Structural models of complex information sources, Information Storage and Retrieval, 7 (1971) 1–8.

  16. W. Glänzel, A. Schubert, The cumulative advantage function. In: L. Egghe, R. Rousseau (Eds), Informetrics 89/90: Selection of Papers Submitted for the Second International Conference on Bibliometrics, Scientometrics and Informetrics, London, Ontario, Canada, 5–7 July 1989, Amsterdam, Elsevier, 1990, 139–147.

  17. L. Egghe, R. Rousseau, Elements of concentration theory. In: L. Egghe, R. Rousseau (Eds), Informetrics 89/90: Selection of Papers Submitted for the Second International Conference on Bibliometrics, Scientometrics and Informetrics, London, Ontario, Canada, 5–7 July 1989, Amsterdam, Elsevier, 1990, 97–137.

  18. M. H. Heine, Indices of literature dispersion based on qualitative attributes, Journal of Documentation, 34 (1978) 175–188.

  19. A. D. Pratt, A measure of class concentration in bibliometrics, Journal of the American Society for Information Science, 28 (1977) 285–292.

  20. R. J. Lano, A Technique for Software and System Design, New York, North-Holland, 1979. (Republished from his The N² Chart, TRW Defense and Space Systems Group, Redondo Beach, Ca., 1977, Report TRW-SS-77-04.)

Additional information

Revised version of a paper presented to the Fourth International Conference on Bibliometrics, Informetrics and Scientometrics, Berlin, 13–18 September, 1993.

About this article

Cite this article

Heine, M.H. The characterisation of text corpora using an input/output schema for citations. Scientometrics 32, 177–194 (1995). https://doi.org/10.1007/BF02016893

  • DOI: https://doi.org/10.1007/BF02016893
