OntoPPI: Towards Data Formalization on the Prediction of Protein Interactions

  • Yasmmin Cortes Martins
  • Maria Cláudia Cavalcanti
  • Luis Willian Pacheco Arge
  • Artur Ziviani
  • Ana Tereza Ribeiro de Vasconcelos
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1057)


The Linking Open Data (LOD) cloud is a global data space for publishing and linking structured data on the Web, intended to facilitate the integration, exchange, and processing of data. The LOD cloud already includes many datasets from the biological domain. Nevertheless, most datasets about protein interactions do not use metadata standards, so they do not meet the LOD requirements and, consequently, hamper data integration. This problem affects information retrieval, especially with respect to dataset provenance and reuse in further prediction experiments. This paper proposes an ontology to describe and unite the four main kinds of data in a single prediction experiment environment: (i) information about the experiment itself; (ii) descriptions of and references to the datasets used in an experiment; (iii) information about each protein involved in the candidate pairs, that is, the biological information that describes it, which normally involves integration with other datasets; and, finally, (iv) information about the prediction scores, organized by evidence, and the final prediction. We also present case studies that illustrate the relevance of our proposal by showing how queries can retrieve useful information.
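
To make the four kinds of data more concrete, the following is a minimal Python sketch of how they might fit together and be queried. It is not the OntoPPI ontology itself: every class name, field name, identifier, URL, and score below is a hypothetical stand-in chosen for illustration, and plain dataclasses are used in place of an RDF/OWL model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    # (ii) description of, and reference to, a dataset used in an experiment
    name: str
    source_url: str


@dataclass(frozen=True)
class Protein:
    # (iii) biological information describing a protein in a candidate pair
    accession: str
    annotations: tuple = ()


@dataclass
class PairPrediction:
    # (iv) prediction scores organized by evidence, plus the final prediction
    protein_a: Protein
    protein_b: Protein
    evidence_scores: dict
    final_score: float


@dataclass
class Experiment:
    # (i) information about the experiment itself
    label: str
    datasets: list = field(default_factory=list)
    predictions: list = field(default_factory=list)


def pairs_above(experiment, threshold):
    """Return (accession_a, accession_b, final_score) for pairs scoring at least threshold."""
    return [
        (p.protein_a.accession, p.protein_b.accession, p.final_score)
        for p in experiment.predictions
        if p.final_score >= threshold
    ]


def provenance(experiment):
    """Return the names of the datasets an experiment used."""
    return [d.name for d in experiment.datasets]


# Toy data: identifiers, URL, and scores are made up for illustration only.
prot_a = Protein("PROT_A", ("kinase",))
prot_b = Protein("PROT_B", ("phosphatase",))
prot_c = Protein("PROT_C")

exp = Experiment(
    label="demo-run",
    datasets=[Dataset("toy-ppi-set", "https://example.org/toy-ppi-set")],
    predictions=[
        PairPrediction(prot_a, prot_b, {"domain": 0.5, "homology": 1.0}, 0.75),
        PairPrediction(prot_a, prot_c, {"domain": 0.25, "homology": 0.25}, 0.25),
    ],
)

confident_pairs = pairs_above(exp, 0.5)
used_datasets = provenance(exp)
```

Here `pairs_above` and `provenance` play the role of the queries mentioned above: one retrieves candidate pairs by final score, the other recovers which datasets an experiment relied on. In a linked-data setting the same questions would be asked of a triple store rather than of in-memory objects.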


Keywords: Protein interaction ontology; Biological dataset; Linked open data; Prediction experiment



This work was partially funded by CAPES, CNPq, and FAPERJ.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National Laboratory of Scientific Computing, Petrópolis, Brazil
  2. Military Institute of Engineering, Rio de Janeiro, Brazil
