Protein Sequence Databases
With the availability of almost 150 completed genome sequences from both eukaryotic and prokaryotic organisms, efforts are now focused on the identification and functional analysis of the proteins encoded by these genomes. The rapidly emerging field of proteomics, the large-scale analysis of these proteins, has started to generate huge amounts of data as a result of the new information provided by the genome projects and by a range of new technologies in protein science. For example, mass spectrometry approaches are being used in protein identification and in determining the nature of posttranslational modifications (1), and large-scale yeast two-hybrid screens provide valuable data about protein-protein interactions (2). These and other methods now make it possible to quickly identify large numbers of proteins in a complex, to map their interactions in a cellular context, to determine their location within the cell, and to analyze their biological activities. Protein sequence databases play a vital role as a central resource for storing the data generated by these efforts and making them freely available to the scientific community. Data from large-scale experiments are often no longer published in the conventional sense but are instead deposited in a database. As a result, protein sequence databases are the most comprehensive resource of information on proteins available to scientists.