Background & Summary

Cancer is a leading cause of death and a significant barrier to increasing life expectancy worldwide1. Cancer treatment has improved in the past few decades, but chemotherapy remains its mainstay. Multidrug resistance is a major problem associated with anticancer chemotherapy2,3. Data show that 90% of cancer deaths can be attributed to multidrug resistance4. Because they differ structurally from small-molecule compounds, bioactive peptides have received much attention and are regarded as alternative candidates for multidrug-resistant cancer therapy5,6,7.

Anticancer peptides (ACPs) are biologically active peptides with antitumor activities that exist widely in a variety of organisms, including mammals, amphibians, insects, plants, and microorganisms. ACPs have many advantages in the treatment of tumours, such as low molecular weight (compared to protein-based therapy), simple structure, high anticancer activity, high selectivity, fewer side effects, easy modification, and a lower likelihood of inducing resistance8,9. Although ACPs have been extensively studied, their mechanisms of action are not fully understood. At present, the known mechanisms of ACPs mainly include inhibition of tumour cell proliferation or migration10,11, inhibition of tumour blood vessel formation12, lysis of cancer cells13, and induction of cancer cell apoptosis14. In addition, peptides can serve as targeted therapeutic agents that directly bind specific cancer cells or cancer-related biomarkers, and as carriers linked to traditional anticancer drugs15,16.

Despite the enormous potential of peptides in cancer therapeutics, there are relatively few databases dedicated to cancer therapy peptides. Most ACP information is dispersed across bioactive peptide databases, such as DRAMP17, APD18, DBAASP19, HORDB20, CPPsite21, and SATPdb22, which mainly focus on antimicrobial peptides or hormones. CancerPPD23 is a known database for annotating ACPs and anticancer proteins; however, its data have not been updated since 2015. Many antimicrobial peptide databases also store information about the anticancer activity of some antimicrobial peptides, but they do not contain detailed annotation of ACPs. For example, they do not fully provide information on the cancer cells or molecular targets of ACPs, nor do they include peptide drugs. Therefore, we constructed an open, comprehensive database of cancer therapy peptides, DCTPep, that includes not only traditional ACPs but also peptides with targeted effects in cancer therapeutics. DCTPep can be freely accessed and downloaded from http://dctpep.cpu-bioinfor.org/.

Developing targeted therapies that selectively act on cancer cells has always been an ideal approach for cancer treatment. One promising class of targeted therapy is drug conjugates, in which a targeting carrier is joined to a chemotherapy drug or cytotoxic agent through a linker, as in antibody-drug conjugates (ADCs) and peptide-drug conjugates (PDCs)24. Currently, the drug conjugates most commonly used in clinical cancer treatment are ADCs. However, with the increasing presence of peptides in clinical practice, PDCs have also emerged. PDCs have the potential to overcome the limitations of ADCs, owing to advantages such as smaller molecular weight and ease of synthesis25. To date, only two PDCs, 177Lu-dotatate (DCTPepD0013) and Melflufen (DCTPepD0108), have been approved for clinical cancer treatment, and Melflufen has since been withdrawn from the market by the FDA. Nevertheless, many PDCs are in clinical development for cancer or about to enter clinical trials, and their potential cannot be ignored. Peptides play a crucial role as carriers in PDCs. Therefore, DCTPep not only focuses on collecting ACPs but also emphasizes the collection of cancer-targeted peptides. The carrier peptides in PDCs include cell-penetrating peptides (CPPs) and cell-targeting peptides (CTPs)26. The classification field in the database follows a similar scheme, with the categories cell-penetrating peptides, cancer-targeting peptides, and targeted peptide conjugates.

Figure 1 and Table 1 present a comparison of the DCTPep datasets with the ACP datasets in other peptide databases. Compared to DBAASP, CancerPPD and SATPdb, DCTPep possesses over 3000 unique entries. DCTPep provides a large amount of cancer therapy peptide data, including clinically relevant peptide drugs curated in the drug library, filling gaps in existing data and assisting the design and screening of novel cancer therapeutic peptides. In particular, the targeted peptide data offer more options for PDC design. To better understand the mechanisms of action of cancer therapy peptides, we have added target annotations and collected over 60 targets for these peptides that are not included in other ACP databases. The dataset is freely available to all via the web, without login or registration, and is not password protected. We believe that DCTPep will become a valuable resource for the development of novel bioactive peptides, particularly in the field of cancer therapeutics.
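The count of unique entries can be illustrated with plain set operations on peptide sequences. The following is a minimal sketch, not the procedure used to build Fig. 1: the function names are hypothetical, the sequences are toy placeholders, and exact string matching after case and whitespace normalisation is an assumption about how two entries are judged identical.

```python
def normalise(seq):
    """Upper-case a sequence and drop whitespace so comparisons are exact."""
    return "".join(seq.split()).upper()


def unique_to(target, *others):
    """Return the sequences found in `target` but in none of `others`."""
    return target - set().union(*others)


# Toy placeholder sequences, not real database content.
dctpep = {normalise(s) for s in ["AAKK", "KLLK", "RGD"]}
cancerppd = {normalise(s) for s in ["aakk"]}
dbaasp = {normalise(s) for s in ["KL LK"]}

# Only "RGD" is absent from both comparison sets.
print(sorted(unique_to(dctpep, cancerppd, dbaasp)))
```

The same set differences and intersections give every region of a Venn diagram; peptides carrying chemical modifications would need a richer identity key than the bare sequence.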

Fig. 1 Venn diagram illustrating the numbers of overlapping and non-overlapping peptide sequences related to cancer therapy in DCTPep, CancerPPD, SATPdb and DBAASP.

Table 1 Comparison of peptides related to cancer therapy in DCTPep with other peptide databases (data as of 2023.12.20).

Methods

Data collection and compilation

To develop DCTPep, extensive searches were conducted on published articles, patents, and public databases. DCTPep data are stored in two sub-libraries: a peptide library and a drug library. The inclusion criteria for the peptide library were as follows: 1. The amino acid sequence is reported; 2. The sequence is the mature peptide, without precursor and signal regions; 3. The sequence does not exceed 100 amino acids in length; 4. The peptide exhibits anticancer/antitumor activity or targets specific molecules/biomarkers overexpressed in cancer cells; 5. Alternatively, the peptide is a cell-penetrating peptide that can enhance the delivery of drugs into cancer cells. The inclusion criteria for the drug library were similar: 1. The entry is a peptide, a peptide derivative, or an amino acid derivative related to cancer treatment; 2. The entry has entered clinical research or been approved by the FDA, EMA or HC.
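The peptide library criteria can be expressed as a simple filter. The sketch below is illustrative only: curation in DCTPep was manual, and the `Candidate` record, its field names, and the activity tags are hypothetical stand-ins for what a curator would record.

```python
from dataclasses import dataclass

MAX_LENGTH = 100  # criterion 3: at most 100 amino acids

# Hypothetical tags covering criterion 4 (activity or targeting)
# and criterion 5 (cell-penetrating delivery).
QUALIFYING_TAGS = {"anticancer", "cancer-targeting", "cell-penetrating"}


@dataclass
class Candidate:
    sequence: str          # reported sequence; empty string if not reported
    is_mature: bool        # True if precursor and signal regions are removed
    activities: frozenset  # curated activity tags (hypothetical vocabulary)


def meets_peptide_criteria(c):
    """Apply inclusion criteria 1-5 for the peptide library."""
    has_sequence = bool(c.sequence)                  # criterion 1
    short_enough = len(c.sequence) <= MAX_LENGTH     # criterion 3
    relevant = bool(QUALIFYING_TAGS & c.activities)  # criterion 4 or 5
    return has_sequence and c.is_mature and short_enough and relevant


ok = Candidate("RGD", True, frozenset({"cancer-targeting"}))
too_long = Candidate("A" * 101, True, frozenset({"anticancer"}))
print(meets_peptide_criteria(ok), meets_peptide_criteria(too_long))
```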

To collect peptide data, keyword searches were run in academic search engines such as Google Scholar, Web of Science, PubMed, and Google Patents. The keywords included “ACP”, “antiangiogenic peptides”, “cancer therapy peptide”, “cancer targeted peptide”, and “peptide conjugates”. After research papers, patents, and clinical research literature had been collected, data were extracted manually. In addition to the cancer therapy peptide information extracted from the literature, we included other peptide-related information (such as three-dimensional structures) from UniProt27, PDB28, and other databases. The physicochemical properties of the peptides were calculated using the Expasy ProtParam server (https://web.expasy.org/protparam/, accessed in March 2024) and SciDBMaker29.

The data in the drug library mainly originated from the portal websites of drug regulatory authorities and organisations in several countries and regions. They were supplemented by the drug databases DrugBank30, PubChem31, NCI Thesaurus32 and Global Substance Registration System (GSRS)33. Relevant entries were retrieved by searching these websites and databases with keywords such as “peptides and their derivatives”, “amino acids and their derivatives”, and “anticancer”.

Structural prediction and evaluation

Because peptide and protein structures are difficult to determine experimentally, most of the peptides lack experimentally determined structures. AlphaFold34 was used to predict the potential 3D structures of DCTPep peptides. Default AlphaFold parameters were used: each peptide was modeled as a monomer, and the multiple sequence alignment (MSA) was built with the full_dbs setting (all genetic databases)34. Five structures were generated for each peptide, and the structure with the highest score was selected based on the predicted local distance difference test (pLDDT)34. To evaluate the reliability of AlphaFold-predicted peptide structures, 30 peptides with experimentally determined structures were selected and their structures were predicted by AlphaFold. The difference between each predicted structure and the experimentally determined structure was calculated as the Root-Mean-Square Deviation (RMSD)35. Given two conformations, α and β, of N residues, let rα,i and rβ,i be the respective coordinates of their residues at position i, for i = 1, …, N. The RMSD between α and β is defined by Eq. (1):

$$\mathrm{RMSD}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({r}_{\alpha ,i}-Q{r}_{\beta ,i}\right)}^{2}}\tag{1}$$

where Q is the unitary rotation matrix that optimally aligns the vectors. Disulfide bonds were also examined to see whether AlphaFold predicts them correctly. Whatcheck36 and Procheck37 were used to assess the quality of the predicted structures. Whatcheck36 evaluates multiple parameters of the input structure, such as bond lengths, bond angles, and torsion angles. Procheck37 assesses the stereochemical quality of the input structure and provides various graphical outputs. The Ramachandran plot38 was used to evaluate the plausibility of each structure: the combinations of the backbone dihedral angles Φ (phi) and Ψ (psi) are expected to fall in the most favored and allowed regions (core regions) of the plot. Ideally, over 90% of the residues in a protein structure should have Φ-Ψ angles in these core regions37.
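As an illustration of Eq. (1), the sketch below evaluates the RMSD for a rotation Q that is supplied by the caller. It does not search for the optimal Q (a superposition method such as the Kabsch algorithm would be needed for that, and is not shown), and it assumes both coordinate sets have already been centred on a common origin. The function names and toy coordinates are hypothetical.

```python
import math

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def apply_rotation(q, r):
    """Multiply the 3x3 matrix q by the 3-vector r."""
    return tuple(sum(q[i][j] * r[j] for j in range(3)) for i in range(3))


def rmsd(coords_a, coords_b, q=IDENTITY):
    """Evaluate Eq. (1) for two equal-length lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for r_a, r_b in zip(coords_a, coords_b):
        rotated = apply_rotation(q, r_b)
        # Squared Euclidean distance between r_a and Q r_b.
        total += sum((x - y) ** 2 for x, y in zip(r_a, rotated))
    return math.sqrt(total / len(coords_a))


# Toy example: beta is alpha rotated by -90 degrees about the z axis,
# so rotating it back by +90 degrees superimposes the two exactly.
alpha = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
beta = [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0)]
rot_z_90 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

print(rmsd(alpha, beta, rot_z_90))  # 0.0 after the aligning rotation
print(rmsd(alpha, beta))            # larger without alignment
```

Restricting the input to Cα coordinates gives a Cα RMSD of the kind commonly reported for predicted versus experimental structures.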

Data Records

The DCTPep datasets are available at Figshare39 and contain the following files: All_information (annotation information table for peptide library entries), peptideactivity (activity annotation of peptide library entries), peptidedrug (annotation information table for the active ingredients of drug library entries), marketpeptide (annotation of approved drug preparations for drug library entries), clinicalpeptide (annotation of clinical peptide information for drug library entries), peptide_library_all (peptide library data in FASTA format) and prediction pdb (compressed archives of predicted structures). The architecture of DCTPep is shown in Fig. 2. DCTPep contains a total of 6214 peptide entries, of which 6106 are stored in the peptide library and 108 are stored in the drug library (DCTPepD), involving over 60 targets and over 380 cancer cell lines.
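The FASTA-format file can be read with a few lines of standard-library code. The sketch below parses FASTA text into a mapping from header to sequence; the header layout shown (an entry code followed by a free-text description) and the sequences are assumptions for illustration, since the exact header format of peptide_library_all is not described here.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy input with hypothetical headers and placeholder sequences.
example = """>entry_A example peptide
AAKK
LLKK
>entry_B
RGD
"""

peptides = parse_fasta(example)
for name, seq in peptides.items():
    print(name, len(seq))
```

To read the downloaded file instead of the toy string, the text would be obtained with `open(path).read()`, where `path` is wherever the file was saved locally.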

Fig. 2 Architecture of the datasets in DCTPep.

Table 2 lists the annotation fields of the peptide library. Each entry in the peptide library consists of the following sections: general information, activity information, structural information, physicochemical information, literature information, and links. The peptide library includes cancer therapeutic peptides such as traditional ACPs and cancer-targeted peptides. Low cytotoxicity and low hemolytic activity are also important criteria in developing peptide-based drugs. Therefore, in addition to anticancer activity and targets, the activity information includes cytotoxicity and hemolytic activity. All annotation information was manually extracted from the literature, and the corresponding paper or patent source is provided. The physicochemical information was calculated with ProtParam and SciDBMaker29. Because different databases may emphasize different information about the same peptide, DCTPep provides the corresponding entry codes in other peptide databases.

Table 2 Peptide library data annotation field list.

The drug library includes peptide drugs that have been approved or are in the clinical research stage. Table 3 lists the annotation fields of the drug library. Each entry consists of four sections: general information, structural information, external codes, and drug approval. The external codes identify the drug entry in other public databases, allowing users to obtain more comprehensive information from other sources. Approved drug formulations and clinical information can be found in the drug approval section. A total of 28 approved anticancer peptide drugs and 80 peptides in various clinical trial stages are included in the drug library.

Table 3 Drug library data annotation field list.

Technical Validation

AlphaFold demonstrated unprecedented accuracy in the 14th Critical Assessment of protein Structure Prediction (CASP14)34. The study conducted by McDonald et al.40 also indicated that AlphaFold can accurately predict the structures of peptides that contain α-helices or β-sheets or are rich in disulfide bonds. To evaluate the accuracy of AlphaFold on our data, the structures of 30 ACPs with experimentally determined structures were predicted by AlphaFold.

Table 4 and Fig. 3 display the comparison between predicted and experimental structures, including RMSD and disulfide bond positions. The results indicate that the predicted structures have high accuracy. The deviations between the predicted and experimental structures are small, with an average Cα (α-carbon atom) RMSD of 1.621 Å. For structures containing disulfide bonds, AlphaFold can accurately predict the positions of the disulfide bonds. The predicted structures of some peptides can be obtained directly from the AlphaFold Protein Structure Database41, for example, AF-P82393-F1 (DCTPep00006) and AF-P80400-F1 (DCTPep00097).

Table 4 Comparison between predicted structures and experimental structures.
Fig. 3 Alignment and superimposition plot of predicted structures and experimental structures. Predicted structures: helix-orange, strand-green, turn-magenta, Cys-dark cyan; Experimental structures: helix-red, strand-yellow, turn-blue, Cys-black.

pLDDT is an important parameter for assessing the confidence of predictions34. Although pLDDT alone cannot fully establish the accuracy of a predicted peptide structure, it reflects that accuracy to some extent. DCTPep integrates the Mol* Viewer42 to display the predicted structures, where the pLDDT of each residue can be visualized in the structure43 (Fig. 4).
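Per-residue pLDDT values can also be read directly from a predicted structure file, because AlphaFold writes them into the B-factor column of its PDB output. The sketch below extracts the value on each Cα atom and ranks candidate models by their mean. Ranking by mean Cα pLDDT is an assumption about how a "highest score" could be computed, the function and model names are hypothetical, and the toy lines are generated by a helper rather than taken from a real prediction.

```python
def ca_plddt(pdb_text):
    """Return the B-factor (pLDDT) of every CA atom in fixed-width PDB text."""
    scores = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, columns 61-66 the B-factor.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def mean_plddt(pdb_text):
    """Average pLDDT over CA atoms."""
    scores = ca_plddt(pdb_text)
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores) / len(scores)


def best_model(models):
    """Return the name of the model with the highest mean CA pLDDT."""
    return max(models, key=lambda name: mean_plddt(models[name]))


def toy_ca_line(serial, plddt):
    """Build a fixed-width ATOM record for a toy alanine CA atom at the origin."""
    return (
        "ATOM".ljust(6) + f"{serial:5d}" + " " + " CA " + " " + "ALA" + " " + "A"
        + f"{serial:4d}" + " " * 4
        + f"{0.0:8.3f}" * 3 + f"{1.0:6.2f}" + f"{plddt:6.2f}"
    )


# Two hypothetical candidate models for one peptide.
models = {
    "model_1": "\n".join([toy_ca_line(1, 90.0), toy_ca_line(2, 80.0)]),
    "model_2": "\n".join([toy_ca_line(1, 70.0), toy_ca_line(2, 95.0)]),
}
print(best_model(models))  # model_1 (mean 85.0 versus 82.5)
```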

Fig. 4 Example of the predicted structure of DCTPep00001 shown in the Mol* Viewer.

The quality of the predicted structures was assessed using Whatcheck36 and Procheck37 (Table 5), and the results indicate that the predicted structures are reliable. The average Whatcheck error rate is 11.52%, which is relatively low. In the Ramachandran plots generated by Procheck, the average occupancy of the core regions is 95.11%, and only DCTPep00623 has a low core-region occupancy. The average occupancy of the disallowed regions is 0.26%, and only DCTPep00267 has a residue (a single one) in the disallowed regions. These errors are within an acceptable range.

Table 5 The results of predicted structures in Whatcheck and Procheck.