Abstract.
Although the sequencing of the human genome is complete, identifying the encoded genes and determining their structures remain major challenges. In this report, we introduce a method that uses full-length mouse cDNAs to complement efforts in carrying out these difficult tasks. A total of 61,227 RIKEN mouse cDNAs (21,076 full-length and 40,151 EST sequences, with some redundancy) were aligned with the draft human sequences. We found 35,141 non-redundant genomic regions that showed a significant alignment with the mouse cDNAs. We analyzed the structures and compositional properties of the regions detected by the full-length cDNAs, including cross-species comparisons, and noted a systematic bias of GENSCAN against exons of small size and/or low GC-content. Of the cDNAs that located the 35,141 genomic regions, 3,217 did not match any sequences of the known human genes or ESTs. Among those 3,217 cDNAs, 1,141 did not show any significant similarity to any protein sequence in the GenBank non-redundant protein database and thus are candidates for novel genes.
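The two-stage narrowing described in the abstract (first dropping cDNAs that match known human genes or ESTs, then dropping those with protein-database similarity) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `AlignedCdna` record, its boolean fields, the `novel_gene_candidates` function, and the toy clone identifiers are all hypothetical, and real similarity calls would come from alignment tools and thresholds that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedCdna:
    """Hypothetical record for a mouse cDNA that aligned to the human genome."""
    clone_id: str
    matches_known_human: bool  # assumed flag: hit against known human genes or ESTs
    matches_nr_protein: bool   # assumed flag: hit in a non-redundant protein database


def novel_gene_candidates(cdnas):
    """Apply the two filters in order and return both intermediate sets."""
    # Stage 1: keep cDNAs with no match to known human genes or ESTs.
    unmatched = [c for c in cdnas if not c.matches_known_human]
    # Stage 2: of those, keep cDNAs with no significant protein similarity.
    candidates = [c for c in unmatched if not c.matches_nr_protein]
    return unmatched, candidates


# Toy input, invented for illustration only.
toy = [
    AlignedCdna("toy-1", matches_known_human=True, matches_nr_protein=True),
    AlignedCdna("toy-2", matches_known_human=False, matches_nr_protein=True),
    AlignedCdna("toy-3", matches_known_human=False, matches_nr_protein=False),
]
unmatched, candidates = novel_gene_candidates(toy)

# Consistency check on the input counts reported in the abstract.
full_length, est, total = 21_076, 40_151, 61_227
counts_consistent = (full_length + est == total)
```

In the reported data, the first stage corresponds to the 3,217 unmatched cDNAs and the second to the 1,141 candidates for novel genes.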
Additional information
Received: 18 January 2001 / Accepted: 17 May 2001
About this article
Cite this article
Kondo, S., Shinagawa, A., Saito, T. et al. Computational analysis of full-length mouse cDNAs compared with human genome sequences. Mammalian Genome 12, 673–677 (2001). https://doi.org/10.1007/s00335-001-2048-4
DOI: https://doi.org/10.1007/s00335-001-2048-4