Abstract
Knowledge graphs are emerging as one of the most popular means for data federation, transformation, integration and sharing, promising to improve data visibility and reusability. Immunogenetics is the branch of life sciences that studies the genetics of the immune system. Although the complexity and interconnectedness of immunogenetics data make knowledge graphs a natural choice for representing immunogenetics entities and their relations, and thus for enabling a wide range of applications, little effort has so far been devoted to building and using such knowledge graphs. In this work, we present the IMGT Knowledge Graph (IMGT-KG), the first FAIR knowledge graph in immunogenetics. IMGT-KG acquires and integrates data from different immunogenetics databases, thereby creating links between them. As a result, IMGT-KG provides access to 79 670 110 triplets with 10 430 268 entities, 673 concepts and 173 properties. IMGT-KG reuses many existing terms from domain ontologies and vocabularies, provides external links to other resources of the same domain, and includes a set of rules that apply Allen's interval algebra to guide inference on nucleotide sequence positions. Such inference allows, for example, reasoning about positions in genomic sequences. IMGT-KG fills the gap between genomic and protein sequences and opens the way to effective queries and integrative immuno-omics analyses. We make IMGT-KG openly and freely available, with detailed documentation and a Web interface for access and exploration.
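The abstract mentions rules that apply Allen's interval algebra to nucleotide sequence positions. The Python below is a hedged sketch of that idea, not IMGT-KG's actual rule set (which is expressed over RDF data and is not reproduced here): it determines which of Allen's 13 basic relations holds between two features. It assumes features carry 1-based inclusive coordinates and converts them to half-open intervals, so that two adjacent features "meet" rather than being "before" one another. The names `Interval`, `from_inclusive` and `allen_relation`, and the coordinates in the usage lines, are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open interval [start, end) on a sequence; requires start < end."""
    start: int
    end: int


def from_inclusive(start: int, end: int) -> Interval:
    """Convert 1-based inclusive feature coordinates (assumed) to a half-open interval."""
    return Interval(start, end + 1)


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the one Allen relation (of 13) that holds from a to b."""
    if a.start >= a.end or b.start >= b.end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on the intervals share at least one position.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Remaining case: partial overlap with all four endpoints distinct.
    return "overlaps" if a.start < b.start else "overlapped-by"


# Hypothetical feature coordinates, chosen only to illustrate the relations.
fr1 = from_inclusive(1, 78)
cdr1 = from_inclusive(79, 114)
v_region = from_inclusive(1, 296)
print(allen_relation(fr1, cdr1))       # meets
print(allen_relation(cdr1, v_region))  # during
print(allen_relation(fr1, v_region))   # starts
```

Returning a relation name keeps the sketch short; in a knowledge graph the same outcome would more naturally be recorded as a triple linking the two features through a property for that relation.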
Notes
- 2.
A feature is a region in a sequence (a succession of nucleotides or amino acids) with coordinates (start and end values) and a label.
- 3.
https://www.imgt.org/imgt-kg/ gives access to the entire IMGT® database.
- 4.
Successions of amino acids.
- 5.
An insertion, a deletion or a substitution of a nucleotide.
- 10.
RDF, RDFS, and OWL.
- 11.
Uniform Resource Identifier.
- 20.
Java Persistence Query Language.
- 21.
A nucleotide sequence consists of many features, each with a position and an IMGT label.
- 24.
We plan to communicate our results and resources to the biological community.
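The notes define a feature as a labelled region with start and end coordinates, and describe a nucleotide sequence as carrying many such features. A minimal Python sketch of that data model follows. The class names, the 1-based inclusive coordinate convention, the accession `TOY-0001` and the toy sequence are all assumptions for illustration; IMGT-KG itself models these entities in RDF, RDFS and OWL, and the feature labels below are used only as example strings with invented coordinates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """A labelled region of a sequence, with 1-based inclusive coordinates (assumed)."""
    label: str
    start: int
    end: int

    def __post_init__(self) -> None:
        # Reject empty or inverted regions.
        if not 1 <= self.start <= self.end:
            raise ValueError(f"invalid coordinates for {self.label}: {self.start}..{self.end}")


@dataclass
class NucleotideSequence:
    """A nucleotide sequence together with its annotated features."""
    accession: str
    bases: str
    features: list[Feature] = field(default_factory=list)

    def add(self, feature: Feature) -> None:
        # A feature must lie within the sequence it annotates.
        if feature.end > len(self.bases):
            raise ValueError(f"{feature.label} ends past the sequence end")
        self.features.append(feature)

    def subsequence(self, label: str) -> str:
        """Return the bases covered by the first feature with the given label."""
        for f in self.features:
            if f.label == label:
                # Shift to 0-based indexing; the inclusive end becomes the slice stop.
                return self.bases[f.start - 1 : f.end]
        raise KeyError(label)


# Toy data, invented for illustration.
toy = NucleotideSequence("TOY-0001", "ATGGCCTTTAAACCCGGG")
toy.add(Feature("FR1-IMGT", 1, 9))
toy.add(Feature("CDR1-IMGT", 10, 15))
print(toy.subsequence("CDR1-IMGT"))  # AAACCC
```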
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sanou, G., Giudicelli, V., Abdollahi, N., Kossida, S., Todorov, K., Duroux, P. (2022). IMGT-KG: A Knowledge Graph for Immunogenetics. In: Sattler, U., et al. The Semantic Web – ISWC 2022. ISWC 2022. Lecture Notes in Computer Science, vol 13489. Springer, Cham. https://doi.org/10.1007/978-3-031-19433-7_36
DOI: https://doi.org/10.1007/978-3-031-19433-7_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19432-0
Online ISBN: 978-3-031-19433-7
eBook Packages: Computer Science, Computer Science (R0)