Creating Datasets

  • Pandjassarame Kangueane


Data are central to biological knowledge discovery. The data used for discovery are usually specific to a particular problem in cell and molecular biology, and this specificity is generally achieved by creating datasets of a defined nature. Here, we discuss the importance of biological datasets in gleaning information and describe procedures for creating specialized datasets. This chapter describes the creation of data subsets for human leukocyte antigen (HLA) peptide binding, HLA–peptide structures, the grouping of HLA class I and class II structures with peptides, protein subunit interactions, homodimers, heterodimers, the grouping of homodimers into folding categories, fusion proteins, and intron-containing and intronless genes in eukaryotes.


Keywords: Data; Dataset; Subset; Source; Derived; Grouping; Class; Features; Analysis; Molecule specific; HLA; MHC; Peptide; Protein subunit; Interactions; Intron; Intronless; Folding


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Biomedical Informatics, Irulan Chandai Annex, Pondicherry, India