Background

Hypertension is a major risk factor for stroke, heart failure and ischemic heart disease. Despite the large amount of research recently performed in this area, the pathogenesis of human hypertension remains elusive. Thus, hypertension is classified as “essential” in 95 to 99% of cases [1]. Essential hypertension (EH) is viewed as a consequence of the interaction between environmental factors and genetic background. Data from animal models and from human twin and family studies indicate that approximately 30%-60% of blood pressure (BP) variation is caused by genetic factors [2, 3]. Furthermore, association studies and linkage analyses have identified many causal or susceptibility genes related to EH. BP is a highly regulated quantity, affected by a multitude of physiological systems that ultimately integrate to maintain BP at levels that secure adequate blood perfusion of all tissues [4]. BP variation is a consequence of altered activity in signal transduction pathways and of interactions among complex intra- and intercellular processes. As biochemical processes are governed by proteins, we propose that protein–protein interactions (PPIs), especially those involving the proteins encoded by these causal or susceptibility genes, are extremely important in orchestrating BP variation.

In recent years, topological analyses have been applied to molecular networks, including protein interaction networks, whose nodes are proteins linked to each other via physical interactions [5]. In this study, we aimed to identify the important proteins and biological regulatory pathways involved in EH, and to explore the molecular connectivity between these pathways, through topological analysis of the PPI network derived from the proteins encoded by causal or susceptibility genes for EH. Degree and betweenness are two fundamental measures in network theory. Degree measures how many neighbors a node is directly connected to, while betweenness measures how often a node occurs on the shortest paths between other nodes [6]. In a PPI network, nodes with high degree are defined as hub proteins and nodes with high betweenness are defined as bottleneck proteins; both are regarded as key proteins [6]. Yu and colleagues likened protein networks to transportation networks, in which proteins with high betweenness resemble heavily used intersections, such as those leading to major highways or bridges [7]. In this study, we used the proteins with high betweenness centrality (BC) to identify the important genes and their related signaling pathways in the regulation of BP.

Methods

The research method used in this study consisted of seven main steps. Step one: extraction of the candidate genes associated with EH from the literature using the PolySearch text mining system. Step two: retrieval of protein interactions from the STRING database. Step three: construction of the PPI network and extraction of the giant component from the extended network. Step four: topological analysis of the PPI network. Step five: extraction of the high BC nodes from the giant network to create a backbone network. Step six: construction of a subnetwork consisting of all shortest paths between the candidate genes in the giant network. Step seven: validation of the backbone network and of NOS3 as the central protein.

Extraction of genes associated with essential hypertension from the literature

We searched for candidate genes associated with EH using the PolySearch text mining system, which produces a list of concepts relevant to the user’s query by analyzing multiple information sources including PubMed, OMIM, DrugBank and Swiss-Prot. It covers many types of biomedical concepts including diseases, genes/proteins, drugs, metabolites, SNPs, pathways and tissues [8]. The query type was ‘Disease-Gene/Protein Association’ and the query keyword was ‘essential hypertension’. The PolySearch system returned 1435 literature records. To check the accuracy, we manually confirmed whether the retrieved genes are associated with essential hypertension. Finally, a total of 69 candidate genes were obtained (Table 1).

Table 1 The list of genes extracted from the literature databases showing association with essential hypertension

Scanning protein-protein interactions

The candidate genes listed in Table 1 were converted to seed proteins. We obtained PPIs from the STRING database, a precomputed database for the exploration of protein–protein interactions. STRING version 9.0, the newest version at the time of the study, covers approximately 2.5 million proteins from 630 different organisms [9].

Construction of the PPI network and extraction of the giant component from the extended network

We constructed an extended network consisting not only of the seed proteins but also of their direct PPI neighbors and the interactions among these proteins. The network was constructed using Pajek [10], a highly versatile program for the analysis, manipulation and visualization of large networks. In this study, the extended network comprised a giant component and two small separate components, each derived from a single seed protein. Because this study aimed to explore the mechanism of EH at the system level, and because the two small components contain too few nodes to include any node with a large BC value, only the giant network and its network-theoretic parameters were analyzed. For convenience of analysis and processing, we extracted the giant network from the extended network.
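As an illustration only, the extraction of the giant component can be sketched in Python as follows. The study itself used Pajek; the adjacency-dictionary representation, the function names and the toy network below are assumptions made for this sketch and are not taken from the original analysis.

```python
from collections import deque

def connected_components(adjacency):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components

def giant_component(adjacency):
    """Return the adjacency dictionary restricted to the largest component."""
    largest = max(connected_components(adjacency), key=len)
    return {node: adjacency[node] & largest for node in largest}

# Hypothetical toy network: one large component and two isolated pairs.
toy_network = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"},
    "X": {"Y"}, "Y": {"X"},
    "P": {"Q"}, "Q": {"P"},
}
giant = giant_component(toy_network)
print(sorted(giant))
```

In this toy case the two isolated pairs play the role of the two small components, and only the four-node component is retained for further analysis.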

Topological analysis of protein interaction network

Node properties including connectivity degree (k), betweenness centrality (BC) and closeness centrality (CC) were used to evaluate the nodes in the network; k and BC in particular are two fundamental parameters in network theory [6, 11]. Degree (k), the most basic characteristic of a node in a network, is defined as the number of adjacent links, i.e. the number of interactions that connect a protein to its neighbors. BC is the fraction of shortest paths that pass through a node, and thus measures how often a node occurs on the shortest paths between other nodes. Shortest paths are determined by measuring the length of all geodesics from or to the vertices in the network. A node with high BC has great influence over what flows in the network. BC may play a major role as a global property, since it is a useful indicator for detecting bottlenecks in a network. Closeness centrality (CC) is defined as the inverse of the average length of the shortest paths to or from all other nodes in the graph, and indicates the topological center of the network. Global topological measurements used to characterize a network include average degree, mean shortest path length and diameter [6]. Average degree (<k>) is the mean of the degree values of all nodes in a network. Mean shortest path length (mspl) is the average number of steps needed to connect every pair of nodes through their shortest path. Diameter (D) is the longest of all shortest paths. In this study, node properties and global network measurements were calculated with Pajek.
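The definitions above can be made concrete with a minimal Python sketch. It is not the Pajek implementation used in this study: BC is computed here with Brandes' algorithm and normalized by the number of ordered pairs of other nodes, CC is taken as (n − 1) divided by the sum of distances, and the graph is assumed to be connected, unweighted and undirected with at least two nodes. The function names and the toy path graph are hypothetical.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def betweenness_centrality(adjacency):
    """Normalized BC (Brandes' algorithm) for an unweighted, undirected graph."""
    bc = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order, dist = [], {source: 0}
        n_paths = dict.fromkeys(adjacency, 0)
        n_paths[source] = 1
        predecessors = {node: [] for node in adjacency}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        dependency = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                bc[w] += dependency[w]
    n = len(adjacency)
    scale = (n - 1) * (n - 2)  # number of ordered pairs of other nodes
    return {node: (value / scale if scale else 0.0) for node, value in bc.items()}

def network_summary(adjacency):
    """Per-node k and CC, plus <k>, mspl and D, for a connected graph."""
    n = len(adjacency)
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    closeness, total, diameter = {}, 0, 0
    for node in adjacency:
        dist = bfs_distances(adjacency, node)
        closeness[node] = (n - 1) / sum(dist.values())
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    return {
        "degree": degree,
        "closeness": closeness,
        "average_degree": sum(degree.values()) / n,
        "mspl": total / (n * (n - 1)),
        "diameter": diameter,
    }

# Hypothetical toy path graph A-B-C-D-E (not data from this study).
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
bc = betweenness_centrality(path)
summary = network_summary(path)
print(max(bc, key=bc.get), summary["diameter"], summary["mspl"])
```

In the toy path graph the middle node C lies on the largest share of shortest paths and is also closest on average to all other nodes, so it is both the bottleneck and the topological center.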

Searching high BC nodes to create a backbone network

In this study, we viewed the PPI network that maintains blood pressure homeostasis as a transportation network. In this view, the proteins with high BC are the heavily used intersections, and these proteins together with the links between them make up a backbone network. The cutoff for high BC was set at the top 5% of all nodes in the network [12, 13]. These high BC nodes and the links between them were extracted from the giant network to create the backbone network. BC was originally introduced to measure the centrality of the nodes in a network. By definition, most of the shortest paths in a network pass through the nodes with high BC. These nodes function as bottlenecks that control the communication among the other nodes in the network.
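A minimal sketch of this selection step is given below, assuming that BC values have already been computed and that the 5% cutoff is rounded up to a whole number of nodes (which gives 27 for a network of 535 nodes). The function name, the tie-breaking by node name and the toy data are assumptions of this sketch, not details reported for the Pajek workflow.

```python
def backbone_network(adjacency, bc_scores, top_percent=5):
    """Return (nodes, edges) of the subgraph induced by the top-BC nodes."""
    n_top = (len(adjacency) * top_percent + 99) // 100  # integer ceiling
    # Rank by decreasing BC; ties are broken by node name (an assumption).
    ranked = sorted(bc_scores, key=lambda node: (-bc_scores[node], node))
    nodes = set(ranked[:n_top])
    # Keep only the links whose two endpoints are both high BC nodes.
    edges = {frozenset((u, v)) for u in nodes for v in adjacency[u] if v in nodes}
    return nodes, edges

# 5% of 535 nodes, rounded up.
print((535 * 5 + 99) // 100)

# Hypothetical toy path graph A-B-C-D-E with its normalized BC values.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
bc_scores = {"A": 0.0, "B": 0.5, "C": 2 / 3, "D": 0.5, "E": 0.0}
nodes, edges = backbone_network(path, bc_scores, top_percent=40)
print(sorted(nodes), [sorted(edge) for edge in edges])
```

The toy call uses a 40% cutoff only so that the five-node example yields a non-trivial backbone of two nodes and one link.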

Construction of a subnetwork consisting of all shortest paths between the candidate genes

Even in the giant network, some pairs of candidate genes are not directly connected. In order to construct a subnetwork in which all genes associated with EH are connected, directly or indirectly, through the minimum number of nodes, we identified all shortest paths between every pair of candidate genes. The shortest paths between the candidate genes were calculated with Pajek. The subnetwork consists of the nodes on these paths.
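The construction can be sketched as follows, as an illustration under stated assumptions rather than a description of the Pajek procedure. The sketch relies on the fact that, in an undirected graph, a node v lies on some shortest path between s and t exactly when d(s, v) + d(v, t) = d(s, t). The function names and the toy network are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adjacency, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def shortest_path_subnetwork(adjacency, seeds):
    """Nodes lying on at least one shortest path between any pair of seeds."""
    seeds = list(seeds)
    dist = {seed: bfs_distances(adjacency, seed) for seed in seeds}
    nodes = set()
    for s, t in combinations(seeds, 2):
        if t not in dist[s]:
            continue  # the two seeds are in different components
        d_st = dist[s][t]
        for v, d_sv in dist[s].items():
            if v in dist[t] and d_sv + dist[t][v] == d_st:
                nodes.add(v)
    return nodes

# Hypothetical toy network: seeds S1, S2, S3; C is a pendant node off A.
toy = {
    "S1": {"A", "B"}, "A": {"S1", "S2", "C"}, "B": {"S1", "S2"}, "C": {"A"},
    "S2": {"A", "B", "D"}, "D": {"S2", "S3"}, "S3": {"D"},
}
subnetwork = shortest_path_subnetwork(toy, ["S1", "S2", "S3"])
print(sorted(subnetwork))
```

Both A and B are kept because each lies on a shortest path between S1 and S2, whereas the pendant node C lies on no shortest path between seeds and is dropped.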

Validation of the backbone network and of NOS3 as the central protein

In order to validate the robustness of the backbone network and of NOS3 as the central protein, we constructed test networks using only a subset of the 69 genes as initial seeds. The initial seed genes were determined by omitting from 1 to 7 (10% of 69) genes. If 3 genes are omitted, there are 314364 (69 × 68 × 67) ordered ways to choose them. Therefore, the omitted genes were selected randomly when 3 or more genes were omitted. However, considering the importance of NOS3 to our conclusion, NOS3 was always among the omitted genes whenever 2 or more genes were omitted. The exact omission scheme was as follows. When 1 gene was omitted, there were 69 combinations, because each of the 69 genes was omitted once. When 2 genes were omitted, there were 68 combinations (NOS3 plus each of the other 68 genes). When 3 to 7 genes were omitted, the omitted genes were NOS3 plus other genes selected randomly, and 30 random selections were made for each number of omitted genes. In total, 287 test networks (69 + 68 + 5 × 30) were constructed (Additional file 1), and the BC values of the nodes in these networks were calculated with Pajek. The 27 nodes with the highest BC values were then determined in each test network. We tested the robustness of the backbone network and of NOS3 as the central protein by calculating the frequency with which NOS3 was the node with the largest BC value and the accuracy of the backbone nodes in the test networks. The accuracy of the backbone was estimated as the fraction of the top 27 BC nodes in the test networks that agree with the nodes in the backbone network described in step five.
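The omission scheme and the accuracy measure can be sketched in Python as shown below. This is a hedged illustration: the gene names other than NOS3 are placeholders, the random seed is arbitrary, and the way accuracy was averaged over test networks is not specified in the text, so the function simply scores a single test network.

```python
import random

def omission_sets(genes, central="NOS3", max_omitted=7, n_random=30, seed=0):
    """Enumerate the sets of seed genes omitted to build the test networks."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    others = [gene for gene in genes if gene != central]
    sets = [frozenset([gene]) for gene in genes]             # omit 1 gene
    sets += [frozenset([central, gene]) for gene in others]  # omit 2 genes
    for n_omitted in range(3, max_omitted + 1):              # omit 3 to 7 genes
        for _ in range(n_random):
            sets.append(frozenset([central] + rng.sample(others, n_omitted - 1)))
    return sets

def backbone_accuracy(backbone_nodes, test_top_nodes):
    """Fraction of a test network's top-BC nodes that are also in the backbone."""
    shared = set(test_top_nodes) & set(backbone_nodes)
    return len(shared) / len(test_top_nodes)

# Placeholder gene list: NOS3 plus 68 hypothetical names, 69 in total.
genes = ["NOS3"] + [f"GENE{i:02d}" for i in range(1, 69)]
sets = omission_sets(genes)
print(len(sets))
print(backbone_accuracy({"A", "B", "C", "D"}, ["A", "B", "C", "X"]))
```

With 69 genes the enumeration yields 69 + 68 + 5 × 30 = 287 omission sets, matching the number of test networks reported above.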

Results

Protein-protein interaction network

The extended network includes one giant network and two separate small networks, derived from the seed proteins CYBA (cytochrome b-245, alpha polypeptide) and PSMA6 (proteasome subunit, alpha type, 6), respectively (Figure 1). The giant network consisted of 535 nodes connected via 2572 edges (Figure 2). The backbone network consisted of 27 nodes connected via 39 edges (Figure 3). The measurements characterizing these networks are listed in Table 2: number of nodes (N), average degree (<k>), diameter (D) and mean shortest path length (mspl). The largest degree in the giant network is 43, while its average degree is 7.61. The network is characterized by a small number of highly connected nodes, while most of the other nodes have few connections. This indicates that the giant network is similar to other human PPI networks [14].

Figure 1

Overview of the extended network. The extended network includes one giant network and two separate small networks, derived from the seed proteins CYBA (cytochrome b-245, alpha polypeptide) and PSMA6 (proteasome subunit, alpha type, 6), respectively. Labeled nodes are seed proteins converted from the candidate genes listed in Table 1, while unlabeled nodes are their neighbors retrieved from the STRING database.

Figure 2

The topology of the giant network. The giant network extracted from the extended network is the biggest component in the extended network. The size of nodes corresponds to their BC values.

Figure 3

The topology of the backbone network. The backbone network consists of 27 nodes with high BC values. The size of nodes corresponds to their BC values.

Table 2 General topological measurements of the networks

Key nodes in the PPI network

In this study, nodes with large degree or high BC were viewed as key nodes, and the top 5% of all nodes in the network was used as the cutoff for both large degree and high BC. Of the 535 nodes, 27 have high BC (Table 3), 28 have large degree (Table 4), 13 have both high BC and large degree (Table 5) and 14 have high BC only (Table 6). In order to discern their roles in the network, these nodes are highlighted in different colors and sizes (Figure 2). KNG1 (kininogen 1) is the hub protein with the largest degree, while NOS3 (nitric oxide synthase 3) is the bottleneck protein with the highest BC. NOS3 also has the highest CC value, which indicates that NOS3 is located at the centre of the network.

Table 3 The list of high BC nodes and their CC values
Table 4 The list of large degree nodes and their CC values
Table 5 The list of proteins with both high BC and large degree and their functions
Table 6 The list of proteins only with high BC and their functions

The signaling pathways in the high BC network and the cross-talk between them derived from the backbone network

The backbone network consists of 27 high BC nodes, whose size corresponds to their BC value, and the 39 links between them (Figure 3). Even without calculating BC and CC values, it is apparent that NOS3 is located at the centre of the backbone network, with the highest BC value and the largest degree. NOS3 has 8 neighbors: SIRT1, CAT, AKT1, IFNG, TNF, KNG1, REN and CALM1. These proteins represent the SIRT1 pathway, the antioxidant system, the AKT pathway, the inflammatory system (IFNG and TNF), the kallikrein-kinin system, the renin-angiotensin system and the calcium signaling pathway. The details of the other proteins in the backbone network are not presented here.

Subnetwork consisting of all shortest paths between the candidate genes

This subnetwork consists of 93 nodes: 60 seed proteins, 20 large BC nodes, 7 nodes that are both seed proteins and large BC nodes, and 6 proteins that are neither large BC nodes nor seed proteins (Figure 4). NOS3 has the highest BC value, and the top 27 BC nodes in this subnetwork coincide well with the 27 nodes in the backbone network. Only 6 of these proteins are not in the list of 27 nodes with large BC values in the giant network: TGFBR2, AGT, ACE2, GNAQ, HSD11B2 and KCNJ1 (Table 7).

Figure 4

The subnetwork consisting of all shortest paths between the genes associated with essential hypertension. The candidate genes are connected by all shortest paths in the giant network. The size of nodes corresponds to their BC values and there are 6 yellow nodes without large BC (6 outside 27).

Table 7 The list of top 27 BC nodes in the subnetwork consisting mainly of candidate genes

The robustness of the backbone network and of NOS3 as the central protein

Across the test networks, 7 different genes appeared as the node with the largest BC value: CALM1, HIF1A, IFNG, KNG1, NOS3, NPY and REN (Additional file 1). Although NOS3 was not among the initial seed genes in most test networks, it was the node with the largest BC value in 211 of the 287 test networks (Table 8 and Figure 5). The accuracy of the backbone is 0.80344 (Table 8). Both the accuracy of the backbone and the frequency of NOS3 as the node with the largest BC value decrease rapidly once the number of omitted genes reaches 3 (Table 8 and Figure 6).

Figure 5

The frequency of the nodes with the largest BC value in the test networks grouped by the number of omitted genes.

Figure 6

The accuracy of the backbone grouped by the number of the omitted genes.

Table 8 Frequency of nodes with the largest BC value and accuracy of backbone in the 287 test networks

Discussion

Although a large number of studies have been completed on EH and many causal or susceptibility genes related to EH have been reported, its pathogenesis remains elusive. We propose that the proteins encoded by these genes determine BP level through the interactions between them. The purpose of this study was to analyze the contribution of these proteins to the pathogenesis of EH and to discover other key proteins cooperating with them by means of topological analyses. As two fundamental measures in network theory, degree and betweenness have been widely used to evaluate the proteins in different disease-associated PPI networks, although some newer parameters have been derived from them [12, 28–30]. We likewise used degree and betweenness as the main parameters to evaluate the nodes in the PPI network.

In this study, 69 genes were identified as causative or susceptibility genes involved in EH. The network derived from the seed proteins converted from these genes consists of a giant network and two separate small networks (Figure 1). Only two seed proteins (CYBA and PSMA6) are separated from the giant network, which suggests that the PPIs between these proteins orchestrate BP variation. Some genes must have been missed by the literature search, new causative or susceptibility genes for EH remain to be discovered, and some nodes in the network may even be false, resulting from false interactions. However, as reviewed by Gipsi Lima-Mendez and Jacques van Helden [14], biological networks are tolerant to node deletion, and new nodes preferentially link to nodes with large degree. In other words, biological networks are robust to random alteration of nodes but sensitive to hub removal.

In the giant network, there are 28 proteins with large degree and 27 proteins with high BC, 13 of which have both large degree and high BC (Tables 3, 4, 5 and Figure 2). In order to disentangle the effects of betweenness and degree, Yu and co-workers divided all proteins in a given network into four categories [7]: nonhub–nonbottlenecks (small degree and low BC); hub–nonbottlenecks (large degree but low BC); nonhub–bottlenecks (small degree but high BC); and hub–bottlenecks (large degree and high BC). Han et al. distinguished two subtypes among the highly connected proteins: hub–bottlenecks tend to be date hubs, whereas hub–nonbottlenecks tend to be party hubs. Party hubs interact with most of their partners simultaneously, whereas date hubs bind different partners at different times or locations [15]. We believe that further verification of the spatiotemporal behaviour of these proteins will help to identify drug targets and biomarkers for EH. KNG1, which has the largest degree, ranks fifth in the list of high BC proteins, while NOS3, which has the highest BC, ranks fifth in the list of large degree proteins. KNG1, representing the kallikrein-kinin system, and NOS3, representing the endothelial NO system, both act mainly as vasodilators in the regulation of BP. To some extent, we can cautiously speculate that EH originates from a failure of systemic or local vasodilatation at the right time and in the right place.
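The four-category scheme of Yu and co-workers can be illustrated with a short Python sketch. It assumes that hubs and bottlenecks are each defined as the top fraction of nodes by degree and by BC, with the cutoff rounded up and ties broken by node name; these choices, the function names and the toy values are assumptions of the sketch and are not taken from [7] or from the Pajek analysis.

```python
def top_nodes(scores, top_percent):
    """Return the top-scoring nodes, with the cutoff rounded up."""
    n_top = (len(scores) * top_percent + 99) // 100  # integer ceiling
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(ranked[:n_top])

def classify_nodes(degree, bc, top_percent=5):
    """Label each node as hub/nonhub and bottleneck/nonbottleneck."""
    hubs = top_nodes(degree, top_percent)
    bottlenecks = top_nodes(bc, top_percent)
    labels = {}
    for node in degree:
        hub_part = "hub" if node in hubs else "nonhub"
        bc_part = "bottleneck" if node in bottlenecks else "nonbottleneck"
        labels[node] = f"{hub_part}-{bc_part}"
    return labels

# Hypothetical degree and BC values for eight nodes (not data from this study).
degree = {"A": 6, "B": 5, "C": 2, "D": 2, "E": 1, "F": 1, "G": 1, "H": 1}
bc = {"A": 0.5, "B": 0.05, "C": 0.4, "D": 0.02, "E": 0.0, "F": 0.0, "G": 0.0, "H": 0.0}
labels = classify_nodes(degree, bc, top_percent=25)
print(labels["A"], labels["B"], labels["C"], labels["D"])
```

The toy call uses a 25% cutoff so that an eight-node example produces one node in each of the three key categories; with the 5% cutoff of this study the same function would be applied to the degree and BC values of the 535 nodes in the giant network.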

NOS3, which has the largest CC value, is located at the centre of both the giant network and the backbone network derived from the high BC proteins, which highlights the significant role of the NO system in maintaining BP homeostasis. In this study, the backbone network centered on NOS3 acts as a signaling highway that regulates BP variation (Figure 3). The proteins within it are key intersections. The intersections directly linked to NOS3 include SIRT1, CAT, AKT1, IFNG, TNF, KNG1, REN and CALM1. It has been reported that SIRT1 promotes endothelium-dependent vasodilatation by targeting NOS3 for deacetylation, leading to enhanced nitric oxide (NO) production [16]. A recent study has shown that production of NO, stimulated by caloric restriction, increases SIRT1 expression; this study suggests that eNOS (NOS3) may be involved in the regulation of SIRT1 expression in murine white adipocytes [17]. Although H2O2 is not directly involved in NO synthesis, H2O2/CAT stimulates NO synthase activity [18]. As one of the major cardiovascular enzymatic antioxidants, CAT points to a role of oxidative stress in hypertension [19]. Akt regulates the activity of NOS3 via phosphorylation at Ser1177, thereby regulating NO production and vasodilation [20]. It has been estimated that Akt kinase has over 9000 possible substrates [21]. Evidence regarding the role of the inflammatory system (TNF, IFNG) and the renin-angiotensin system (REN) in BP regulation and their interaction with NOS3 is widely available. After release from its precursor KNG1, kinin regulates NOS3 by activating two distinct G protein-coupled receptors, B2R and B1R [22]. CALM1 activates NO synthesis in NOS3 through a conformational change of the flavin mononucleotide domain from its shielded electron-accepting state to a new electron-donating state [23]. These proteins represent the SIRT1 pathway, the antioxidant system, the AKT pathway, the inflammatory system, the kallikrein-kinin system, the renin-angiotensin system and the calcium signaling pathway. Their roles in BP regulation and their interactions with the NO system have been reported in many studies [23–27].

The backbone network presents a clear visual overview of the important genes and related regulatory pathways for BP and of the crosstalk between them. In order to further confirm the role of NOS3 and the other proteins in the backbone network, we constructed a subnetwork consisting of all shortest paths between the candidate genes (Figure 4). In this subnetwork there are only 6 proteins that are neither seed proteins converted from candidate genes nor nodes with large BC values in the giant network. In other words, the large BC nodes connect and integrate the seed proteins well. NOS3 again has the highest BC value, and the top 27 BC nodes in this subnetwork coincide well with the 27 nodes with large BC values in the giant network.

To test how robust the conclusions of this work are to changes in the initial seed genes, 287 test networks were constructed by omitting several initial seed genes. Although NOS3 was not among the initial seed genes in most test networks, it was the node with the largest BC value in 211 of the 287 test networks. KNG1, REN, NPY, CALM1, HIF1A and IFNG followed NOS3, with frequencies of 50, 10, 6, 5, 4 and 1, respectively (Table 8, Figure 5). All 7 of these proteins are nodes with both high BC and large degree in the original network (Table 5). The accuracy of the backbone is 0.80344 (Table 8). Both the accuracy of the backbone and the frequency of NOS3 as the node with the largest BC value decrease rapidly once the number of omitted genes reaches 3 (Table 8 and Figure 6). This may suggest that the central position of NOS3 and the composition of the backbone network depend on each other.

Conclusion

Most of the seed proteins associated with EH (67 of 69) and their PPI neighbours are connected in a single giant network. The backbone network provides a clear overview of the important genes, their related regulatory pathways for BP and the crosstalk between them. The backbone network is robust against changes in the initial seed genes. Our findings suggest that blood pressure variation is orchestrated by an integrated PPI network centered on NOS3.