Creating Datasets for Bioinformation

  • Pandjassarame Kangueane


Data is central to biological knowledge discovery. The data used in discovery is typically specialized to a particular problem in cell and molecular biology, which is generally achieved by creating datasets of a specific nature. Here, we discuss the importance of biological datasets in gleaning information and describe procedures for creating specialized datasets, illustrated with several example datasets for biological knowledge discovery.


Keywords: Data; Dataset; Subset; Source; Derived; Grouping; Class; Features; Analysis; Molecule-specific; Human leukocyte antigen; Peptide; Protein subunit; Interactions; Intron; Intronless; Folding



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Pandjassarame Kangueane, Pondicherry, India
