Background

Since the beginning of scientific discovery, understanding the causes of disease, pain, and senescence has been a central goal. Over the centuries, the quest for answers has led to giant leaps. It was only in the last century that the discovery of antibiotics freed us from many of the dreaded diseases of the past. Today, we stand on the threshold of a new medical revolution, just as big and far-reaching. Despite all our scientific knowledge, medicine still faces several critical and conflicting challenges. One of these is the transition from a disease-based model to a patient-oriented approach, as much of medicine is still based on symptomatic treatment. Disease classification is routinely derived from different streams of healthcare data, including imaging, pathology, genomics, electrophysiology, and others [1]. Incorporating genetic information helps produce individualized treatment solutions, rather than what works for the average person, and helps identify who is at risk for critical diseases like diabetes, high blood pressure, or cancer. This allows for rapid detection of disease at an early stage, accurate characterization of disease, and preventive measures before the disease even appears. Also, timely discovery and association of genetic variants with diseases can help develop more effective therapy tailored to an individual’s precise genetic makeup and reduce adverse drug reactions. Over time, technological advancements in genomics have revolutionized the field with gene number proposition, genetic mapping, data banks, gene-disease maps, catalogues of human genes and genetic disorders, big data, and next generation sequencing (NGS) [2]. As biological data accumulate at larger scales and at exponential rates, with higher-throughput and lower-cost DNA sequencing technologies, it has become essential to develop innovative, smart, and modern bioinformatics applications to help improve research quality.
New tools provide a progressive understanding of heterogeneous genomics and clinical findings and facilitate increased clinical utilization of information in these databases and translation to healthcare.

The word “Gene” was introduced over 100 years ago [3], and its meaning has progressively evolved in several scientific directions [4,5,6]. A gene is a segment of DNA sequence that carries genetic information defining a biological function and can be transferred from parent to offspring [7, 8]. Most human genes have a discontinuous structure, with the protein-coding regions, or exons, interrupted by non-coding regions, or introns [9, 10]. For some time, many researchers used a broad estimate of more than 50,000 genes, including 21,000 protein-coding genes [11]. However, this number has repeatedly been revised with advancements in genetics and genomics research. A major goal of medical genetics is to identify genes that, when altered, lead to human disease, but not all recognizable DNA sequence alterations result in disease [12]. Most alterations, or mutations, are simple differences called single nucleotide polymorphisms (SNPs) that may not change the expression or coding of a gene, but some specific mutations can change gene instructions and ultimately cause a protein to malfunction, which may lead to disease. If we can identify which genetic variations are associated with specific diseases, we will be better equipped to find new treatments and even cures.

Today, scientists have identified genetic mutations responsible for thousands of conditions, such as cancer, hypertension, and heart disease, that affect millions of people. These associations were not easily deciphered, because they often depend on interactions among dozens of different genes, many of which are influenced by single-gene elements or the environment. To identify the genetic signatures of these complex, common conditions, scientists may have to profile thousands of people, even multiple populations, and not just a few individuals. Nevertheless, studying the genome and epigenome (chemically modified genome) [13] has uncovered fundamentals of the development and progression of human diseases [14], which are characterized as multifactorial, mitochondrial [15], chromosomal [16], and monogenic [17] diseases. Human diseases are classified by the World Health Organization (WHO), which maintains the standard International Classification of Diseases (ICD) codes. With the emergence of next-generation gene sequencing, numerous databases have surfaced for gene annotation, which claim to provide information about genes and link them to related diseases (e.g., Disease Ontology [18], DiseaseEnhancer [19], DISEASES [20], DisGeNET [21], eDGAR [22], GeneCards [23], GTR [24], MalaCards [25], OMIM [26], miR2Disease [27], HGMD [28], DNetDB [29], ClinVar [30], Orphanet, Gene2Function, etc.), and are accessed through web and desktop interfaces. These databases are useful, but none of them provides up-to-date genome and disease data in a standardized format that is accessible through a single application platform.

One platform that has proven to be an efficient tool in several areas, including healthcare, is the smartphone application. Although smart devices have become increasingly popular, there is still no publicly available iOS app that provides unified access to genomic databases with easy navigation and free portable access to genes and related diseases for efficient and robust classification. The reasons could be the extensive heterogeneity of clinical and genomic data collection and management, and the complexity of implementing an Apple mobile app. Developing such a mobile repository can help healthcare providers, researchers, and pharmaceutical companies integrate their health information systems inter-organizationally, develop clinical decision-support systems for disease state management, perform effective comparisons between studies, and quickly identify patients for inclusion in intervention and observational studies. The objective of our research is to create a centralized gene-disease database, which not only stores, organizes, and shares data in a structured and searchable manner but also facilitates data retrieval with a smartphone application.

Implementation

Developing an iOS app is an unorthodox bioinformatics application development process, especially when the app is expected to run on all available iPhone and iPad models with the latest versions of the operating system installed. It is even more complex when the app needs to connect over the internet to external web-based database servers for data acquisition, under stringent security conditions imposed by the host organization. One of the most difficult tasks in implementing an iOS app that connects a mobile interface, via web-programmed modules, to a database server for data exchange is integrating, on a single platform, modules that are developed in different programming languages and processed through different compilers and interpreters. This often leads to complicated logical errors that are hard to resolve.

PROMIS-APP-SUITE (PAS)-Gen (Fig. 1) is an iOS app developed with the Swift programming language, using the Xcode (Version 10.2.1 (10E1001)) integrated development environment for macOS. We designed the human interface of PAS-Gen following Apple’s recommended design principles, which include Aesthetic Integrity, Consistency, Direct Manipulation, Feedback, Metaphors, and User Control. The front end of all the graphical user interfaces (scenes) was designed and connected using Xcode’s built-in Storyboard. The backend of all the screens was programmed in Swift, mainly importing UIKit. The database of PAS-Gen was modelled and implemented within the MySQL database management system, which was publicly hosted via Apache HTTP Server. The PAS-Gen database includes human reference genome data collected from different genomics databases worldwide, including ClinVar [30], GeneCards [23], DISEASES [20], HGMD [28], OMIM [26], GTR [24], CNVD [31], Ensembl [32], GENCODE [33], Novoseek, Swiss-Prot, LncRNADisease, and Orphanet. None of these databases provides a mobile interface. The PAS-Gen design is very flexible and can accommodate new releases and updates of genes and diseases without requiring its users to install a new version (Fig. 1). Dynamic web-based modules (pages) were developed using the PHP scripting language to facilitate data migration between the iOS app screens and the MySQL database server (Fig. 2). The design is based on product line architecture (PLA) [34,35,36], modelled on the Butterfly model [37, 38], with all major modules implemented following software engineering principles, each capable of performing an individual key role and of being integrated into a large-scale project. During development, the performance of PAS-Gen was tested using built-in virtual iPhone and iPad kits, and physical iPhone (8 and XS with pre-installed iOS 12.4) and 3rd generation iPad devices.
The released, currently available version of PAS-Gen was tested and approved by Apple as meeting the expected international standards, which cover architecture, user interaction, system capabilities, visual design, icons and images, windows and views, extensions, etc.
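
The request path described above (app screen, web module, relational database) can be sketched in a few lines. The following is a minimal illustration only, not the PAS-Gen implementation: it uses Python with an in-memory SQLite database as a stand-in for the MySQL server, and the table name `gene`, its column names, and the function names are hypothetical. The three demo rows are illustrative records for genes mentioned in this article.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the gene table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (name TEXT, ensembl_id TEXT, type TEXT, chromosome TEXT)"
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        [
            ("BRCA1", "ENSG00000012048", "protein coding", "17"),
            ("BRCA2", "ENSG00000139618", "protein coding", "13"),
            ("RFWD2", "ENSG00000143207", "protein coding", "1"),
        ],
    )
    return conn


def search_genes(conn, term):
    """Return genes whose name contains `term` (partial-name search)."""
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = (
        term.strip().replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    cursor = conn.execute(
        "SELECT name, ensembl_id, type, chromosome FROM gene "
        "WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        ("%" + escaped + "%",),  # bound parameter, never string concatenation
    )
    return cursor.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in search_genes(demo, "BRCA"):
        print(row)
```

Binding the search term as a query parameter, rather than concatenating it into the SQL text, is one way a web module of this kind can guard against injection when it accepts free-text input from a mobile client.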

Fig. 1

PAS-Gen graphical user interfaces, with examples of Gene, Gene to Disease, and Disease to Gene search results. The PAS-Gen (iPhone XS and 8) screen display includes the About, Register User, Reset Password, Main, Menu, Genomics, Clinical Genomics, Genes, and Genes and Disease interfaces. Example 1 shows a search using the incomplete gene name “BRCA” (BReast CAncer gene), which reveals the protein-coding genes “BRCA1” and “BRCA2” and related details. Example 2 is a search using the keyword “cancer” that presents 6443 genes known to be involved in different kinds of cancers. In example 3, a search for the specific disease “lung cancer” resulted in a total of 11 genes and related diseases. Example 4 demonstrates a search for the gene “RFWD2”, which revealed 17 disease matches, including a protein-coding gene with Ensembl ID “ENSG00000143207” on Chromosome 1 associated with the disease “Autism”. Detailed results are attached in Additional file 1

Fig. 2

PAS-Gen component design, development, and data flow. PAS-Gen is an iOS app developed with the Swift programming language, the Xcode integrated development environment for macOS, the MySQL database management system, the PHP scripting language, and UNIX-based web and database servers

The PAS-Gen graphical interface provides user profile, login, and password management modules, requiring new users to first register by creating an account and then log in with valid credentials. The major reason for requiring users to create a profile is to apply security features to the app, so that usage can be tracked and traced back in case of trouble, such as a breach or violation. In the future, we plan to implement artificial intelligence and machine learning-based features to help users search data of interest based on their search history, and having their profile will be extremely useful in such cases. Moreover, a user email address is required so that users can be informed of major updates to the app and database. On successful login, users are directed to the main menu, which leads to the “Genomics” and “Clinical Genomics” options and their two similarly designed interfaces: “Genes” and “Gene & Disease”. The “Genomics” button leads to the “Genes” interface, which allows users to search for genes only and returns related information, including Gene Name, Ensembl ID, Type, and Chromosome. The “Clinical Genomics” button leads to the “Gene & Disease” interface, which lets users search for related diseases by complete or partial word matching. When searching by disease, note that if the disease name consists of multiple words, an underscore “_” must be used in place of each space or hyphen (e.g., type “Down_Syndrome” for “Down Syndrome” or “Tay_Sachs” for “Tay-Sachs”). PAS-Gen is for non-commercial research and educational use only. It is freely available, exclusively on the App Store for iOS devices, and has been tested and recommended for the iPhone 6, 8, X (XS, MAX), and iPad (2nd and 3rd Generation) mobile devices with iOS version 12.1 or above (Fig. 1).
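
The underscore convention for multi-word disease names can be expressed as a small helper. The sketch below is an illustration under assumptions, not part of PAS-Gen: the function name `to_search_term` is hypothetical, and it simply replaces each run of spaces or hyphens with a single underscore, which reproduces the two examples given above.

```python
import re


def to_search_term(disease_name):
    """Convert a disease name to the underscore form expected by the search box."""
    # Trim the ends, then collapse every run of whitespace or hyphens
    # into one underscore.
    return re.sub(r"[\s\-]+", "_", disease_name.strip())


if __name__ == "__main__":
    for name in ("Down Syndrome", "Tay-Sachs", "Fragile X Syndrome"):
        print(name, "->", to_search_term(name))
```

A client could apply such a conversion automatically before submitting a query, so that users would not need to remember the convention.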

Further download and project-related details are available at the following web site: https://itunes.apple.com/us/app/pas-gen/id1447766164?ls=1&mt=8.

Results

PAS-Gen is an easy-to-use application designed to simplify navigation across the landscape of gene annotation resources through an efficient mobile record search engine, which is based on standardized genes and related diseases and helps users explore multi-purpose clinical and genomics concepts in meaningful ways (Fig. 1). The PAS-Gen database includes a total of 59,293 genes, of which 19,989 are protein-coding and 39,304 are non-protein-coding (processed transcript, lincRNA, antisense, IG C gene, bidirectional promoter lncRNA, polymorphic pseudogene, transcribed unitary pseudogene, transcribed unprocessed pseudogene, transcribed processed pseudogene, sense overlapping, scRNA, noncoding, unprocessed pseudogene, IG V gene, unitary pseudogene, vaultRNA, TR C gene, sense intronic, snRNA, processed pseudogene, TEC, TR V pseudogene, TR V gene, and macro lncRNA) (Table 1). The PAS-Gen database is composed of 98,064 gene-disease combinations reported from 809 distinct sources (combinations of sources for individual gene-disease relationships) and based on 26 types of genes, located on 23 pairs of chromosomes and mitochondrial DNA. These combinations involve 13,216 genes (including aliases), 10,598 genes with distinct Ensembl identifiers, 12,257 distinct diseases, 32,089 combinations with actionable genes, and 8063 cancer-causing genes (Table 2). Here, we present results to help users better understand the data search capabilities of PAS-Gen (Figs. 3, 4, 5, 6); detailed results are included in Additional file 2.
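
To make these statistics concrete, the sketch below shows one way such counts could be derived from a list of gene-disease records. It is an illustration only: the `GeneDisease` record, its field names, and the `summarize` function are hypothetical and do not describe the actual PAS-Gen schema. The constants are the gene totals reported above, and the final assertion checks that the protein-coding and non-protein-coding counts add up to the stated total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneDisease:
    """One gene-disease combination (hypothetical record layout)."""
    gene: str
    ensembl_id: str
    gene_type: str
    chromosome: str
    disease: str


def summarize(records):
    """Count combinations and distinct genes, Ensembl IDs, and diseases."""
    return {
        "combinations": len(records),
        "distinct_genes": len({r.gene for r in records}),
        "distinct_ensembl_ids": len({r.ensembl_id for r in records}),
        # Disease names are compared case-insensitively.
        "distinct_diseases": len({r.disease.lower() for r in records}),
    }


# Gene totals as reported in the text.
PROTEIN_CODING = 19_989
NON_PROTEIN_CODING = 39_304
TOTAL_GENES = 59_293
assert PROTEIN_CODING + NON_PROTEIN_CODING == TOTAL_GENES

if __name__ == "__main__":
    # Illustrative records only, not an extract of the PAS-Gen database.
    demo = [
        GeneDisease("RFWD2", "ENSG00000143207", "protein coding", "1", "Autism"),
        GeneDisease("BRCA1", "ENSG00000012048", "protein coding", "17", "Breast cancer"),
        GeneDisease("BRCA2", "ENSG00000139618", "protein coding", "13", "Breast cancer"),
    ]
    print(summarize(demo))
```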

Table 1 PAS-Gen database description: type and sub-types of genes
Table 2 PAS-Gen database description and statistics
Fig. 3

PAS-Gen (iPhone 8) screenshot examples of gene results (top two shown) from searches for the four most common diseases: a 931 results for Diabetes, b 60 results for Obesity, c 391 results for Schizophrenia, and d 313 results for Autism. Detailed results are attached in Additional file 1

A combination of various genetic and environmental factors leads to the most common diseases [39], e.g., Diabetes [40], Obesity [41], Schizophrenia [42, 43], Autism [44], Heart disease [45, 46], Polydactyly [47, 48], Spina Bifida [49], and Cancer [50]. The most common genetic diseases are Thalassemia [51], Down Syndrome [52], Cystic Fibrosis [53], Sickle Cell Anemia [54], Tay-Sachs disease [55], Fragile X Syndrome [56], Hemophilia [57], and Huntington [58]. Examples of gene search results for some of the most common diseases are shown in Figs. 3 and 4, and those for the most common genetic diseases are shown in Figs. 5 and 6. Search results for gene-disease associations for the most common diseases include 931 results for Diabetes, 60 results for Obesity, 391 results for Schizophrenia, 313 results for Autism, 512 results for Heart and related diseases, 168 results for Polydactyly, 79 results for Spina Bifida, and 6443 results for Cancer (Figs. 3, 4). Search results for gene-disease associations for the most common genetic diseases include 117 results for Thalassemia, 49 results for Down Syndrome, 91 results for Cystic Fibrosis, 18 results for Sickle Cell Anemia, 16 results for Tay-Sachs disease (Tay-Sachs is generally hyphenated; when searching with PAS-Gen, it is recommended to use an underscore instead), 31 results for Fragile X Syndrome, 64 results for Hemophilia, and 81 results for Huntington (Figs. 5 and 6).

Fig. 4

PAS-Gen screenshot examples of gene results (top two shown) from searches of the most common diseases: a 512 Heart and related diseases, b 168 results for Polydactyly, c 79 results for Spina Bifida, and d 6443 results for Cancer. Detailed results are attached in Additional file 1

Fig. 5

PAS-Gen screenshot examples of gene results (top two shown) for searches of common genetic diseases: a 117 results for Thalassemia, b 49 results for Down syndrome, c 91 results for Cystic Fibrosis, and d 18 results for Sickle Cell Anemia. Detailed results are attached in Additional file 1

Fig. 6

PAS-Gen screenshot examples of gene results (top two shown) for searches of common genetic diseases: a 16 results for Tay-Sachs disease, b 31 results for Fragile X Syndrome, c 64 results for Hemophilia, and d 81 results for Huntington. Detailed results are attached in Additional file 1

Discussion

We are entering the era of personalized medicine, in which an individual’s genetic makeup will eventually determine how a doctor tailors therapy to that individual. Therefore, it is critical to understand the genetic basis of common diseases (e.g., which genes and genetic variants contribute to disease phenotypes). Human diseases are at the heart of extensive research encompassing genomics, bioinformatics, systems biology, and systems medicine. To gain new insight into disease taxonomy, etiology, and pathogenesis, it is important to understand how diseases are related to each other [29]. In the past, various efforts have been made to decipher diseases in order to facilitate predictive diagnosis and thereby guide treatment [39]. These efforts include drawing disease relationships using clinical manifestations [59,60,61,62], healthcare records [63,64,65,66], images and data generated using wearable technology and artificial intelligence [67,68,69,70], and information encapsulated within related genes [71, 72], proteins [73], signaling [74] and metabolic pathways [75], microRNA [76], chemo-centric views [77], phenotypic characteristics, and microbes [78]. Multiomics approaches (genome, transcriptome, proteome, metabolome, microbiome, and epigenome) are becoming increasingly common with the advancement of high-throughput technologies. A key challenge in this realm is NGS interpretation. Scientists face the daunting task of identifying candidate genes that are relevant to their biological system of interest. Most often, the researcher has direct knowledge of only a few, if any, candidate genes. The clinical interpretation of the significance of specific gene variants can be unique to a patient. Variability in the interpretation of sequence variants is due, in part, to the lack of standard curated information to support clinical decision-making.

The underlying assumption here is that a database with smart distillation and broad coverage of genes and SNPs, linked to classified diseases and drugs through their descriptions and IDs (e.g., ICD and NDC), can support both clinical and research environments [6]. Currently, multiple databases must be investigated to assess the potential significance of even one sequence variant. This is a cumbersome, time-consuming, and increasingly unfeasible process for identifying and reporting variants in actionable genes, because there is no standard centralized platform for connecting genes to their disease phenotypes [79]. Such a database must not be redundant and should include only human reference genome and disease-based information collected from valid sources available worldwide. It is very important to give interested users efficient, user-friendly, easily navigable, free, and portable access to the database using platforms that have proven to be efficient tools in several areas, including healthcare. In this manuscript, we present the design and development of an iOS application for exploring genes and diseases, in support of medical research toward the implementation of precision medicine.

The greatest strength of our approach is that it helps unearth the biological roots of complex and rare diseases by providing a mobile search mechanism for known and authentic genes that have been associated with their respective diseases. PAS-Gen aims to benefit every type of user (e.g., researchers, medical practitioners, life science students, and even patients) with easy one-touch browsing, saving time otherwise spent scanning through genes and developing gene-disease lists for a research study [6]. To harness the power of reported genes, our presented solution can serve as a state-of-the-art mobile application. In the future, we plan to extend the scope of this project by curating and adding more genes, classified diseases, and their relationships to the PAS-Gen database; implementing data science and visualization features for analytics; and implementing data classification based on actionable genes, e.g., the actionable genes approved by the American College of Medical Genetics and Genomics (ACMG) [80] and MSK-IMPACT [81]. We are also extending the scope of our project by adding germline and somatic mutations, especially those maintained by the Genome-Wide Association Studies (GWAS) [82] and the Catalog of Somatic Mutations in Cancer (COSMIC) [83, 84]. We aim to integrate and annotate our genomics (genes and variants) and clinical (diseases, drugs, and their code sets) databases to help clinicians directly interpret a patient’s genomic profile and collaborate with scientists to translate variant data into therapy. Furthermore, we are interested in advancing the graphical user interface of PAS-Gen by implementing machine learning techniques that help users intelligently search data of interest based on their personal preferences and search history.

Conclusions

Gene-disease data are highly significant at every level of biological research and healthcare. However, inconsistencies and limitations in gene annotation and in the specificity of disease classification terminologies add complexity, and the lack of an efficient, integrative, searchable system makes it difficult to comprehend the underlying implications. We offer PAS-Gen to the biomedical research community with a social pledge to educate individuals by providing them with an interactive app to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and easy browsing. The gene-disease querying ability offered by PAS-Gen provides the user with an important knowledge discovery tool, just a click away from any location. PAS-Gen is an exclusively academic application, founded on genomic, clinical, and scientific knowledge and modern technology, that supports healthcare by enabling scientific data retrieval using efficient mobile-based tools.