OntoPPI: Towards Data Formalization on the Prediction of Protein Interactions
The Linking Open Data (LOD) cloud is a global data space for publishing and linking structured data on the Web, intended to facilitate the integration, exchange, and processing of data. The LOD cloud already includes many datasets from the biological domain. Nevertheless, most datasets about protein interactions do not use metadata standards. They therefore do not meet the LOD requirements, which hampers data integration. This problem affects information retrieval, especially with respect to dataset provenance and reuse in further prediction experiments. This paper proposes an ontology to describe and unite the four main kinds of data in a single prediction experiment environment: (i) information about the experiment itself; (ii) descriptions of and references to the datasets used in an experiment; (iii) information about each protein involved in the candidate pairs, that is, the biological information that describes the proteins, which normally involves integration with other datasets; and, finally, (iv) information about the prediction scores, organized by evidence, and the final prediction. We also present case studies that illustrate the relevance of our proposal by showing how queries can retrieve useful information.
Keywords: Protein interaction ontology; Biological dataset; Linked open data; Prediction experiment
This work was partially funded by CAPES, CNPq, and FAPERJ.