Delineation of the genomics field by hybrid citation-lexical methods: interaction with experts and validation process

Scientometrics

Abstract

In advanced methods for delineating and mapping scientific fields, hybrid methods open a promising path by combining the advantages of word-based and citation-based approaches. One way to validate hybrid approaches is to work in cooperation with experts of the fields under scrutiny. We report here an experiment in the field of genomics, where a corpus of documents was built by a hybrid citation-lexical method and then clustered into research themes. Experts of the field were involved in the various stages of the process: lexical queries for building the initial set of documents (the seed); citation-based extension aiming at reducing silence; and final clustering to identify noise and allow discussion of border areas. The analysis of the experts’ advice shows a high level of validation of the process, which combines a high-precision, low-recall seed, obtained by journal and lexical queries, with a citation-based extension that enhances recall. These findings on the genomics field suggest that hybrid methods can efficiently retrieve a corpus of relevant literature, even in complex and emerging fields.
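The two-step logic described in the abstract (a high-precision lexical seed, then a citation-based extension to raise recall) can be illustrated with a minimal sketch. Everything in it is hypothetical: the toy records, the query terms, the rule that a document joins the corpus when it cites at least `min_links` seed documents, and the expert-judged `relevant` set. The abstract does not specify the actual extension criterion, so this only illustrates the general idea and is not the authors' implementation.

```python
from typing import Dict, Set, Tuple

# Toy bibliographic records: id -> (title, ids of cited documents).
# All records are invented for illustration.
DOCS: Dict[str, Tuple[str, Set[str]]] = {
    "d1": ("comparative genomics of yeast", set()),
    "d2": ("genome sequencing of a plant pathogen", {"d1"}),
    "d3": ("linkage maps and QTL detection", {"d1", "d2"}),
    "d4": ("kinase signaling in cell cycle", {"d2"}),
    "d5": ("enzyme kinetics", set()),
}

def lexical_seed(docs, terms):
    """Select documents whose title contains at least one query term."""
    return {d for d, (title, _) in docs.items()
            if any(t in title.lower() for t in terms)}

def citation_extension(docs, seed, min_links=2):
    """Add non-seed documents citing at least `min_links` seed documents.

    The threshold rule is an assumption made for this sketch.
    """
    return {d for d, (_, cited) in docs.items()
            if d not in seed and len(cited & seed) >= min_links}

def precision_recall(retrieved, relevant):
    """Precision is 1 - noise; recall is 1 - silence."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

seed = lexical_seed(DOCS, ["genomics", "genome"])
extension = citation_extension(DOCS, seed, min_links=2)
corpus = seed | extension

relevant = {"d1", "d2", "d3"}  # hypothetical expert judgement
print("seed:  ", precision_recall(seed, relevant))
print("corpus:", precision_recall(corpus, relevant))
```

In this toy case the seed misses a relevant document whose title contains none of the query terms, and the extension recovers it through its citations to the seed. A lower `min_links` would raise recall further at the risk of admitting noise such as `d4`.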

Notes

  1. Consortium: ADIS (Université Paris-Sud), Lereco (INRA), OST.

  2. See http://dnapatents.georgetown.edu/.

  3. Probabilistic index = observed cell population c(ij) / expected cell population, where expected cell population = c(i.) c(.j) / c(..), i.e. the product of the row and column totals divided by the grand total.

  4. AKM is implemented in the commercial software Neuronav by Diatopie (S. Aubin, www.diatopie.com). It is enhanced with a basic but robust and efficient term-extraction sequence.
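As an illustration of the probabilistic index defined in note 3, the following minimal sketch computes observed/expected ratios for every cell of a contingency table. The function name and the 2x2 example counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import List

def probabilistic_index(table: List[List[float]]) -> List[List[float]]:
    """Return observed / expected for each cell of a contingency table.

    expected(i, j) = c(i.) * c(.j) / c(..), as in note 3.
    """
    row_totals = [sum(row) for row in table]        # c(i.)
    col_totals = [sum(col) for col in zip(*table)]  # c(.j)
    grand_total = sum(row_totals)                   # c(..)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = (row_totals[i] * col_totals[j] / grand_total
                        if grand_total else 0.0)
            # A cell with zero expected count has no defined index.
            out_row.append(observed / expected if expected else float("nan"))
        result.append(out_row)
    return result

# Hypothetical counts: two clusters (rows) by two document origins (columns).
counts = [[30, 10],
          [20, 40]]
index = probabilistic_index(counts)
print(index)
```

Values above 1 mark cells that are over-represented relative to what independence of rows and columns would predict; values below 1 mark under-represented cells.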

Acknowledgments

This work is part of the CSTG project, launched by Antoine Schoen and Bertrand Bellon and supported by ANR. The authors are indebted to Sylvain Aubin (DIATOPIE) for term extraction and clustering. They also thank the panel of experts for their commitment: P. Bessières (INRA), A. Lecharny (CNRS-Evry), M. Pinto (Université Paris XI), J. P. Rousset and M. Dubow (IGMORS-Université Paris XI), P. Wincler (Genoscope). The authors also benefited from the experience in the field of Bérangère Virlon (OST). The authors are solely responsible for the views expressed in this article.

Author information

Corresponding author

Correspondence to Patricia Laurens.

Appendices

Annex I List of core journals

  • ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS
  • BIOINFORMATICS
  • BMC BIOINFORMATICS
  • BMC GENOMICS
  • BRIEFINGS IN BIOINFORMATICS
  • COMPARATIVE AND FUNCTIONAL GENOMICS
  • CURRENT GENOMICS
  • CYTOGENETIC AND GENOME RESEARCH
  • DNA REPAIR
  • DNA RESEARCH
  • DNA SEQUENCE
  • EXPERT REVIEW OF PROTEOMICS
  • GENES CHROMOSOMES & CANCER
  • GENOME
  • GENOME BIOLOGY
  • GENOME RESEARCH
  • GENOMICS
  • JOURNAL OF PROTEOME RESEARCH
  • MAMMALIAN GENOME
  • MOLECULAR & CELLULAR PROTEOMICS
  • MOLECULAR GENETICS AND GENOMICS
  • PHARMACOGENETICS AND GENOMICS
  • PHARMACOGENOMICS
  • PHARMACOGENOMICS JOURNAL
  • PHYSIOLOGICAL GENOMICS
  • PROTEOMICS

Annex II List of themes

The clusters were labelled arbitrarily from M1 to M50 during the clustering process. The most central terms related to each cluster are shown to indicate its thematic content.

Core genomics

Extension share: high

  • M18/Population_genomics
  • M20/Resistance/Resistance_genes/Plant_&_Trout_resistance
  • M31/LOH/Tumor_suppressor/Genome_&_Cancer
  • M32/Marker/RAPD/AFLP/Polymorphism
  • M40/QTL/Trait/Mapping/Polymorphism
  • M47/Species/Phylogeny/Evolutionary_genomics

Extension share: average

  • M3/Plant_genomics/Transgenic_plants
  • M4/DNA_sequence/Satellite
  • M5/Strain/Microbial_genomics
  • M6/Cell_identity_&_Gene_expression
  • M8/Alignment/Bioinformatics
  • M12/Network/Biological_networks/Model
  • M14/Locus/Microsatellite_locus/Polymorphism
  • M15/Cell_line/Tumor/Genome_&_Cancer
  • M16/Spectrometry/Proteomics
  • M22/Human/C-DNA/Gene_annotation
  • M23/Exon/Genomic_organization/Gene_annotation
  • M25/Patient/Disease_genomics/Biomarkers/Pharmacogenomics
  • M27/Evolution/Evolutionary_genomics
  • M28/Cancer/Genome_&_Cancer
  • M35/C-DNA/Transcription/C-DNA_library
  • M36/Polymorphism
  • M43/Mouse/Murine_genomics
  • M44/Expression/Cell_identity_&_Gene_expression
  • M45/LOD/Linkage_analysis/Polymorphism
  • M46/Human/Primate/Gene_annotation/Comparative_genomics
  • M48/C57BL/Congenic_strains/Murine_genomics

Extension share: low

  • M1/Human_genome/Human_genome_project
  • M9/Genome/
  • M10/Comparative_genomic_hybridization/Tumor
  • M11/SNPs/Polymorphism
  • M17/Map/Linkage_maps/Polymorphism
  • M24/System/Systems_biology/Bioinformatics
  • M38/Genome/Genome_sizes

Border themes

Extension share: high

  • M2/Translocation/FISH/leukemia
  • M21/Hybrid/Somatic_hybrids/Fertility

Extension share: average

  • M13/Transcriptional/Saccharomyces_cerevisiae/Transcriptome
  • M26/Virus/Nucleotide_sequence
  • M50/Virus/Virus_replication/Virus_recombination

Noisy, mostly non-genomics

Extension share: high

  • M30/Mutant/Mutagenesis

Extension share: average

  • M7/Enzyme/Escherichia_coli
  • M19/Repair/DNA_damage
  • M29/Promoter/Transcription
  • M33/RNA-/Virus
  • M34/PCR/Methods/applications
  • M37/Cell/DNA_damage
  • M39/DNA/Arrays/Genomic_techniques
  • M41/Signaling/Kinase/MAPK
  • M42/Mutation/Missense_mutation
  • M49/Residue/Amino_acid_sequence

About this article

Cite this article

Laurens, P., Zitt, M. & Bassecoulard, E. Delineation of the genomics field by hybrid citation-lexical methods: interaction with experts and validation process. Scientometrics 82, 647–662 (2010). https://doi.org/10.1007/s11192-010-0177-9
