GENCODE Annotation for the Human and Mouse Genome: A User Perspective

Practical Guide to Life Science Databases

Abstract

The GENCODE project provides comprehensive, high-accuracy annotation of the functional elements in the human and mouse genomes. The annotations are released for the benefit of biomedical and genomic research. In this chapter, we provide a basic user manual, or roadmap, to facilitate the exploration of GENCODE annotation. We give a brief history of GENCODE and the general working principles that the consortium adopts for its annotation, and we introduce a few workflows to guide users in extracting and exploring GENCODE resources for downstream analysis. The chapter is structured as follows. We start by introducing GENCODE from a historical perspective: the needs and objectives that led to its creation, and how it became one of the most reliable sources of functional elements for the human and mouse genomes. We then give an overview of the GENCODE database, covering the different types of annotated genes, their descriptions, basic statistics, and how they were created, with emphasis on the latest four releases. Following this overview, we describe the annotation methods adopted by the GENCODE consortium for both the human and mouse genomes, along with the validation methods. We also present the GENCODE annotation data format fields and their definitions as they appear in the GTF and GFF3 files. We then describe three ways to access GENCODE annotations: via the GENCODE portal, the Ensembl Genome Browser, and the UCSC Genome Browser. We conclude with three use cases showcasing how to explore the GENCODE annotation to answer research questions. Source code, an interactive user guide, and other files are available at https://github.com/smusleh/BookChapterGENCODE.
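As a rough illustration of the GTF data format fields mentioned above, the following minimal Python sketch splits one GTF line into its nine tab-separated columns and unpacks the attribute column into key/value pairs. It is not taken from the chapter or its repository. The column names, the function name `parse_gtf_line`, and the example record (including its identifiers and tag values) are assumptions invented for illustration, not real GENCODE data.

```python
# Column names chosen for this sketch; GTF defines nine tab-separated columns.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]

# Invented example line (NOT a real GENCODE record), built with explicit tabs.
EXAMPLE_LINE = "\t".join([
    "chr1", "HAVANA", "gene", "1000", "2000", ".", "+", ".",
    'gene_id "ENSG_EXAMPLE.1"; gene_type "lncRNA"; gene_name "EXAMPLE1"; '
    'level 2; tag "basic"; tag "hypothetical_tag";',
])


def parse_gtf_line(line):
    """Parse one GTF feature line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))

    record = dict(zip(GTF_COLUMNS[:8], fields[:8]))
    record["start"] = int(record["start"])  # 1-based, inclusive
    record["end"] = int(record["end"])

    # Attributes look like: key "value"; key value; ...
    # A key such as `tag` may repeat, so every key maps to a list of values.
    # Simplification: this split would break on values containing ';'.
    attributes = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes.setdefault(key, []).append(value.strip().strip('"'))
    record["attributes"] = attributes
    return record


record = parse_gtf_line(EXAMPLE_LINE)
print(record["feature"], record["start"], record["end"])
print(record["attributes"]["gene_name"], record["attributes"]["tag"])
```

For whole files, a dedicated parser such as gtfparse or gffutils (both listed in the Notes) is likely a better choice than a hand-written splitter like this one.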

Notes

  1. https://www.sanger.ac.uk/tool/zmap/
  2. https://sonnhammer.sbc.su.se/Dotter.html
  3. https://www.sanger.ac.uk/tool/otter/
  4. https://www.sanger.ac.uk/tool/seqtools/
  5. https://www.gencodegenes.org/pages/tags.html
  6. http://asia.ensembl.org/info/website/upload/gff.html
  7. https://www.gencodegenes.org/pages/data_format.html
  8. https://www.gencodegenes.org/pages/biotypes.html
  9. https://www.gencodegenes.org/pages/tags.html
  10. https://github.com/openvax/gtfparse
  11. https://github.com/daler/gffutils
  12. https://biojava.org/
  13. https://bioperl.org/
  14. https://asia.ensembl.org/info/docs/webcode/mirror/install/ensembl-data.html
  15. http://asia.ensembl.org/info/data/export.html
  16. https://rest.ensembl.org/documentation/info/data
  17. http://asia.ensembl.org/info/data/biomart/biomart_r_package.html
  18. http://asia.ensembl.org/info/data/biomart/biomart_restful.html
  19. http://asia.ensembl.org/info/data/biomart/biomart_perl_api.html
  20. http://genome.cse.ucsc.edu/cgi-bin/hgTables
  21. BEDOPS: the fast, highly scalable and easily-parallelizable genome analysis toolkit (BEDOPS v2.4.39)
  22. https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/gff2bed.html#downloads
  23. http://ftp.ebi.ac.uk/pub/databases/gencode/covid19_trackhub/data/

Acknowledgement

Funding: None

Author information

Corresponding author

Correspondence to Tanvir Alam.

1.1 Electronic Supplementary Material

Supplementary Data 1.1

Interactive user guide highlighting different ways to access GENCODE annotation (HTML 10 kb)

Supplementary Data 1.2

Shell commands for use case 1 (SH 4 kb)

Supplementary Data 1.3

MALAT1 gene and associated transcripts and exons (GFF3 22 kb)

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Musleh, S., Alazmi, M., Alam, T. (2021). GENCODE Annotation for the Human and Mouse Genome: A User Perspective. In: Abugessaisa, I., Kasukawa, T. (eds) Practical Guide to Life Science Databases. Springer, Singapore. https://doi.org/10.1007/978-981-16-5812-9_1
