Analyzing protein interactions and protein functions is crucial for understanding complex biological processes as well as the consequences of aberrant gene products [1]. Protein-Protein Interaction Networks (PPINs) are an invaluable means for scientists to gain a global understanding of interactomes while analyzing individual protein functions [2]. High-throughput experiments and comprehensive literature mining analyses have been used to deliver well-structured data into scientific databases reporting on proteins, their interactions and functions. These repositories are a precious resource for scientists, but they cover only a portion of the proteome and often underrepresent alternative splice forms [3].

Alternative splicing (AS) is a cellular process that produces, from a single gene, different physical variants of a given protein, which may differ in structure or function. This process produces molecular variability and contributes to the complexity of proteomes and their interactomes. Analysis of AS shows that this process is most relevant to molecular regulation processes. In this research, we attempt to identify, from the scientific literature, functional variability linked to alternative splice forms within their PPINs. For this purpose, we gather AS events and analyze the transcript data for 16,826 different genes from the HumanSDB3 database [4, 5]. We collected around 4 million abstracts from NCBI’s PubMed using a rich set of search terms for each individual isoform, built from Gene DB, Swissprot DB and synonym generation. To select the abstracts that are likely to contain interaction data, we use an SVM classifier that combines in-domain features with standard term weights, trained on the BioCreative-II IAS corpus (81.31% F1-measure on the test set). Finally, to extract PPI information from the selected abstracts, we employ a second SVM classifier based on syntactic features, trained on the AIMed corpus (F1-measure of 54.20% in cross-validation). The resulting PPIN comprises a total of 31,819 distinct interactions between 7,161 distinct proteins, of which 5,615 are considered to represent an isoform from HumanSDB3.
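
As an illustration of how network totals of this kind can be derived from extracted protein pairs, the following is a minimal sketch and not the pipeline used in this work. The function name `summarize_network`, the toy identifiers, and the decision to treat interactions as undirected and to skip self-pairs are all assumptions made for the example.

```python
def summarize_network(pairs, isoform_ids):
    """Collapse extracted protein pairs into summary counts for an undirected network.

    pairs: iterable of (protein_a, protein_b) tuples from the extraction step.
    isoform_ids: identifiers of proteins that map to a known isoform (hypothetical input).
    """
    # A frozenset makes (A, B) and (B, A) count as one interaction.
    # Assumption: self-pairs such as (A, A) are skipped in this sketch.
    interactions = {frozenset(p) for p in pairs if p[0] != p[1]}

    # Every protein that takes part in at least one retained interaction.
    proteins = set().union(*interactions) if interactions else set()

    return {
        "interactions": len(interactions),
        "proteins": len(proteins),
        "isoform_proteins": len(proteins & set(isoform_ids)),
    }


# Toy data, not taken from the study.
toy_pairs = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P4", "P4")]
toy_isoforms = {"P1", "P3", "P9"}
summary = summarize_network(toy_pairs, toy_isoforms)
```

On the toy input, the duplicate pair in reversed order is counted once and the self-pair is dropped, so only proteins that appear in a retained interaction contribute to the protein and isoform counts.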

To the best of our knowledge, no genome-wide PPIN for human protein isoforms has been built, nor has the variability of their interactions and functions been analyzed. We are currently analyzing the distribution of functional annotations for all the isoforms, based on GO terms from the literature. Both the PPIN and the functional annotations of the isoforms will be suitable for identifying potential interactions or functional variations resulting from AS. Our findings are linked to the HumanSDB3 database and will be available through a publicly accessible web interface for further use.