Background

Oral squamous cell carcinoma (OSCC) is the most common malignant neoplasm arising in the mucosa of the oral cavity, which includes subsites such as the buccal mucosa, alveolus (upper and lower), tongue, palate, and lip [1]. Head and neck cancer accounts for more than 550,000 cases worldwide annually [2]. Oral cancers are more common in the Indian subcontinent, while cancer of the laryngopharynx is more common in other populations [3]. Overall, 57.5% of global oral cancers occur in Asia, especially in India. Oral cancer accounts for 30% of all cancers in India, and 60 to 80% of patients there present with advanced disease, compared with 40% in developed countries. This suggests a lack of awareness and a need for markers for early identification [4].

Almost all of these malignancies are squamous cell carcinomas (SCCs), which historically in the developed world were associated mostly with alcohol and tobacco consumption, with the combination of the two producing a synergistic increase in risk. However, over the past 20 years, investigators have found a growing proportion of patients with head and neck squamous cell carcinoma (HNSCC) who have human papillomavirus (HPV)-positive tumors; these develop in younger people and in those with little or no intake of tobacco and alcohol. The association with HPV is stronger in the oropharynx than in the oral cavity [5].

A better understanding of the steps leading to carcinogenesis will enable the identification and prediction of malignant progression at an earlier stage of OSCC. Cancer signifies a deviation from the normal signaling network toward dysregulated cellular proliferation. When the functional state of proteins linked to various pathways is altered, the equilibrium of the signaling network may shift to enhance the survival of the affected cells or reduce their apoptosis [6]. Searching for such proteins is the main purpose of cancer proteomics. Proteins, being the common molecules that participate in cellular function, are often affected by disease, by the response to treatment, and by the disease-free state. Development of novel protein biomarkers of OSCC through proteomics can help in early cancer diagnosis, treatment, and prognosis.

Material and methods

This systematic review was performed as per PRISMA guidelines. A bibliographic search was performed for studies published up to August 2021, using PubMed, the Cochrane database, Google Scholar, the National Library of Medicine, SpringerLink, and Science Open. The keywords used were “proteomic biomarkers,” AND “head and neck cancer,” AND “oral cancer.” The detailed search strategy for PubMed is provided in Additional file 1. All relevant studies assessing proteomic characteristics of oral cancer and precancers were considered for analysis. Abstracts, incomplete articles, non-comparative studies, and articles in languages other than English were excluded. Only studies in humans were included; studies on cell lines and animals were excluded.

The review also discusses proteomics-based techniques used to identify proteins that are altered in the disease process, in response to treatment, or with disease stage and course; such information could be used to individualize therapy. The research findings in the review are highlights from articles focusing on proteomic approaches to the diagnosis and detection of oral cancer and on the identification of biomarkers through proteolytic analysis carried out using mass spectrometry, 2D electrophoresis, and other proteomic techniques.

Results

The search retrieved 304 articles in English, of which 112 are included in this systematic review (Fig. 1). Review articles were excluded, but the two meta-analyses published on the subject are discussed. The included articles were categorized under the subsections enumerated below, followed by a list of all protein biomarkers identified and a brief description of their importance.

Fig. 1 PRISMA flowchart of studies included in the systematic review

Biomarker discovery—a proteomic approach

A biomarker is “a measurable indicator of a specific biological state relevant to the risk of contraction, presence or the stage of disease.” Biomarkers can be clinically used to screen, diagnose, and monitor the activity of disease and to assess therapeutic response [7]. “An ideal biomarker should be sensitive, specific, cost-effective, and robust against situational variability and should have added value beyond that of current standards [8].”
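The sensitivity and specificity mentioned in the definition above can be made concrete with a small calculation. The following is a minimal sketch, assuming a candidate biomarker has been tested against a reference diagnosis; the function name and the patient counts are hypothetical and are not taken from any study cited in this review.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: diseased subjects with a positive marker
    fn: diseased subjects with a negative marker
    tn: healthy subjects with a negative marker
    fp: healthy subjects with a positive marker
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy subject")
    sensitivity = tp / (tp + fn)  # fraction of diseased subjects detected
    specificity = tn / (tn + fp)  # fraction of healthy subjects correctly cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only (not data from a cited study)
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=80, fp=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these illustrative counts, the marker would detect 90% of patients with disease and correctly classify 80% of healthy subjects.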

Biomarkers can be carbohydrates, DNA, mRNAs, proteins, or small molecules such as metabolites and other cellular molecules [9]. Predictive biomarkers lead to detection of abnormalities that cause the development of OSCC [10], while prognostic biomarkers help in predicting the response to therapy and the prognosis of the patient. Nucleic acid-based microsatellite analysis and tumor-specific aberrant promoter methylation have been used as markers to detect tumor-specific alterations in body fluids and somatic cells of patients with OSCC [11, 12]; this article, however, focuses on protein biomarkers.

Genome sequencing has produced a wealth of information during the last two decades. Following this, attention turned to proteins, which are the biomolecules translated from genes and which govern overall cellular processes. It is proposed that genes exert their actions through proteins to cause diseases, including malignancies. Mechanisms such as alternative splicing and post-translational modifications of proteins (e.g., phosphorylation, glycosylation, acetylation, and proteolytic cleavage) contribute to a human proteome that comprises more than half a million proteins [13, 14], in comparison with about 22,000 protein-coding genes [15]. Proteins are important cellular molecules that participate in cellular processes and even control the synthesis of DNA and its transcription, so proteomic techniques can provide greater insight into cellular physiology and molecular biology. Biomarkers are of extreme importance and can be utilized either alone or in combination with other biomarkers. The available tools are able to identify the quality, quantity, and structural modification of proteins, as well as their subcellular localization [16]. However, most of these findings require validation.

These protein biomarkers can be secreted by tumors and hence can be differentially expressed compared with normal tissue. Fresh tissue is generally required to study translation, while paraffin-embedded tissue can be used for cellular localization and the study of expression. Apart from serum, these proteins can be estimated in other body fluids such as urine, saliva, and sputum; however, their quantity may vary according to their secretion by the tumor.
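Differential expression between tumor and normal tissue is commonly summarized as a log2 fold change. The sketch below illustrates that idea under simple assumptions: abundances are positive, already normalized, and compared by group means. The function name and the abundance values are hypothetical, and this is not the quantification method of any particular study in this review.

```python
import math
from statistics import mean


def log2_fold_change(tumor, normal):
    """Log2 ratio of mean protein abundance in tumor versus normal samples.

    Positive values indicate higher abundance in tumor, negative values
    indicate lower abundance, and 0 indicates no difference in the means.
    """
    mean_tumor, mean_normal = mean(tumor), mean(normal)
    if mean_tumor <= 0 or mean_normal <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(mean_tumor / mean_normal)


# Hypothetical normalized abundances of one protein (arbitrary units)
tumor_samples = [8.0, 9.0, 7.0]    # mean 8.0
normal_samples = [2.0, 2.5, 1.5]   # mean 2.0
print(log2_fold_change(tumor_samples, normal_samples))
```

Here the mean abundance is four times higher in tumor than in normal tissue, which corresponds to a log2 fold change of 2. In practice, a statistical test and correction for multiple comparisons would accompany such a ratio.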

Before being analyzed by mass spectrometry (MS), the sample undergoes preliminary separation, enrichment, or fractionation of its proteins. The enrichment techniques include one-dimensional polyacrylamide gel electrophoresis (1D-PAGE) and two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), among others [17].

Liquid chromatography coupled to tandem MS (LC–MS/MS) is used to identify and quantify proteins from human tissues. The method is based on interactions between protein, peptide, and column: the sample is first separated by liquid chromatography and then identified by MS.

A mass spectrometer (MS) has three main components: an ionization source, a mass analyzer, and an ion detector [18, 19]. The most common ion sources used are electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). These sources produce ions from the sample, which are then analyzed in the mass spectrometer. The main mass analyzers used in proteomics are quadrupole (Q), time of flight (TOF), ion trap, and Fourier transform ion cyclotron resonance (FT-ICR). Cellular localization and quantification are usually done by immunohistochemistry and ELISA; these are also used for validation of protein biomarkers.
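What the mass analyzer actually measures is the mass-to-charge ratio (m/z) of each ion. As a minimal sketch, the code below computes the monoisotopic mass of an unmodified peptide from standard amino acid residue masses and the m/z expected for protonated ions of a given charge, as produced by ESI. The function names and the example sequence "PEPTIDE" are illustrative assumptions, and post-translational modifications are ignored.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565   # one H2O is added for the free N- and C-termini
PROTON_MASS = 1.007276   # mass of each charge-carrying proton


def peptide_mass(sequence):
    """Monoisotopic mass (Da) of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def mass_to_charge(mass, charge):
    """m/z of the [M + zH]z+ ion for a neutral mass and a positive charge z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mass + charge * PROTON_MASS) / charge


# Hypothetical example sequence, chosen only for illustration
mass = peptide_mass("PEPTIDE")
for z in (1, 2):
    print(f"z={z}: m/z = {mass_to_charge(mass, z):.4f}")
```

For this sequence the neutral mass is about 799.36 Da, so the singly charged ion appears near m/z 800.37 and the doubly charged ion near m/z 400.69. Database search engines match such observed values against theoretical values computed from protein sequences.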

OSCC biomarkers

As there is considerable variation in protein expression, there is a variety of potential biomarkers of OSCC. These can be broadly classified into (i) tissue-based biomarkers, (ii) secretomes (plasma, saliva, blood, or other secretions), and (iii) autoantibodies.

Potential biomarkers

Tissue-based biomarkers in head and neck cancer (Table 1)

The majority of the selected biomarkers investigated are tissue-based biomarkers identified using different approaches; they are summarized in Table 1. The approaches employed include LC-MS, RPLC-MS, SELDI-TOF MS, 2D DGE, iTRAQ, and 2DLC. Further, the results were verified using IHC, PCR, and western blot techniques as described above.

Table 1 Potential protein tissue biomarkers of head and neck cancers

Serum/plasma biomarkers/saliva/secretome (Table 2)

The majority of the selected serum/plasma-based biomarkers, identified using different approaches, are summarized in Table 2. Only a few important ones are discussed.

Table 2 Potential protein biomarkers of head and neck cancers: Serum/plasma/saliva/secretome

Epidermal growth factor receptor (EGFR)

EGFR is an important member of the family of membrane-bound tyrosine kinase receptors activated in tumor cells of epithelial origin. This receptor regulates cellular growth, proliferation, apoptosis, differentiation, migration, and secretion of certain proteins [94]. High EGFR expression has been observed in OSCC, suggesting that uncontrolled growth may be mediated by abnormal EGFR expression [82, 124].

Vitamin D-binding protein

Vitamin D-binding protein is a secreted protein that transports vitamin D sterols in serum and prevents polymerization of actin. Its level was significantly low in OSCC plasma. Vitamin D-binding protein has been used as a biomarker for breast cancer, thyroid cancer, and lung cancer [83]. In oral cancer, it has not been found to be increased in human plasma; however, higher concentrations are observed in mouse plasma [83]. Tung et al. (2013) [82] found vitamin D-binding protein to be reduced in OSCC plasma; these results suggest differential regulation in different species.

Fibrinogen (alpha/beta/gamma chain)

Plasma fibrinogen is commonly measured to assess blood coagulation and has been reported as an angiogenic and metastatic predictor in many tumors [50, 84]. High serum fibrinogen levels have been observed in OSCC patients [50]. The fibrinogen beta chain is a blood-borne glycoprotein that functions in inflammatory responses; it has shown elevated expression in OSCC samples [50]. The fibrinogen gamma chain is the gamma component of fibrinogen and has a major function in hemostasis. It can be considered a tumor marker, as the protein shows significantly higher expression in OSCC samples than in healthy ones [50].

Carcinoembryonic antigen (CEA)

CEA is a glycoprotein produced by the cells of the gastrointestinal tract during embryonic development and is involved in cell adhesion. Salivary and serum levels of CEA were found to be higher in patients with malignant tumors than in healthy controls [97]. It has been reported that salivary CEA was significantly higher in patients with oral-maxillofacial cancer and with benign tumors than in normal persons (P < 0.01) [97]. Thus, salivary CEA has some value in distinguishing malignant from benign tumors and in assisting clinical diagnosis and the monitoring of prognosis and treatment efficacy in cancer [97].

Autoantibodies (Table 3)

The majority of the selected autoantibody-based biomarkers, identified using different approaches, are summarized in Table 3. A few important ones are discussed herein.

Table 3 Potential protein biomarkers of head and neck cancers: autoantibodies

p53 autoantibody

p53 antibodies are found in the serum and saliva of patients whose tumor tissues show overexpression of p53. Detection is straightforward because these antibodies can be measured in saliva [125].

HSP70 autoantibody

Heat shock proteins (HSPs) are frequently overexpressed in tumor cells. Autoantibodies directed against HSP70 can discriminate risk status between healthy individuals and patients with tumors. Their level increases from healthy controls to SCC, suggesting that these autoantibodies might be used as both an early marker and a screening risk marker for SCC [126].

Discussion

Development of OSCC is a multistep process. Field cancerization is one of the hallmarks of oral cancer, wherein the whole mucosa of the oral cavity and upper aerodigestive tract undergoes molecular changes and is susceptible to developing cancer. A change in the protein expression profile can be a manifestation of field cancerization, and hence its identification is an important biomarker for predicting the risk of development of cancer, a second primary, or recurrence of OSCC.

Tobacco and alcohol consumption are the major independent risk factors for the development of HNSCC, and they act synergistically when combined [128]. The risk of developing oral cancer is 3 to 9 times greater in those who smoke and drink than in those who do neither [6, 128]. The upper aerodigestive tract is the first to make contact with the harmful components of tobacco, such as polycyclic aromatic hydrocarbons (PAHs), nitrosamines, aromatic amines, and aldehydes, that are responsible for malignant transformation [129]. The metabolism of these chemicals occurs in two phases. In phase 1, reduction and oxidation reactions occur in the cytochrome P-450 system, producing reactive and toxic substances. This oxidative stress induces glutathione S-transferase transcription to eliminate the toxic substances [130]. The toxic metabolites produce genetic instability and mutations and may initiate carcinogenesis. In phase 2, after glucuronidation, sulfation, methylation, and conjugation reactions, the toxic agents are inactivated, become water soluble, and are excreted [131]. Mutations of p53 have been found to occur more frequently in tobacco and alcohol users [132], suggesting that inactivation of the p53 tumor suppressor gene may play an important role in tobacco-induced carcinogenesis.

Infection with human papillomavirus (HPV) is another risk factor, especially for oropharyngeal cancer. This dsDNA virus has a 7 kb genome with a number of early and late genes that encode proteins. Only a subset of the more than 100 known HPV subtypes are oncogenic, high-risk types. HPV encodes the E6 and E7 oncoproteins, which inactivate p53 and Rb, respectively, leading to failure of tumor suppressor mechanisms [133]. A few HPV-associated biomarkers have also been identified.

An association of OSCC with polymorphisms in genes encoding human enzymes involved in the metabolism of toxic substances has also been reported [134]; such polymorphisms affect the individual’s susceptibility to the noxious effects of these substances. Patients with Fanconi anemia (FA) are predisposed to develop OSCC [135]. Fanconi anemia is a recessive genetic disorder caused by a biallelic mutation in a member of the FA/BRCA pathway [136]. These cancers usually develop at a young age [137]. Another predisposing factor for cancers of the hypopharynx is Plummer-Vinson (also called Paterson-Kelly) syndrome, which results from iron deficiency [138].

Arroyo et al. [139], in a recent meta-analysis, found 11 biomarkers, of which 4 were included in the meta-analysis. Of these, only carcinoembryonic antigen (CEA) and the soluble fragment of cytokeratin 19 (CYFRA21) were found to be significantly associated with oral cancer. Kasradze [140], in a review, found 44 relevant proteins. Of these, 14-3-3γ, extracellular matrix metalloproteinase inducer, and PA28γ were found to be the most significant. Other studies reported only the number of proteins differentially expressed, without any identification [141, 142, 143, 144]. Li et al. [145] identified differential protein expression in oral cancer patients with or without lymph node metastasis. Levels of PF4V1 and F13A1 correlated with the number of lymph nodes. Immunoglobulin (Ig) kappa chain C region and isoform 2 of fructose bisphosphate aldolase A were found to be increased in tobacco users; however, these markers are not yet validated [146]. Other investigators found the serpin family of proteins to be overexpressed in tobacco users [147], while some reported only the number of proteins with differential expression [148].

OSCC occurs as a consequence of proto-oncogene activation or tumor suppressor gene inactivation. Promoter hypermethylation is an example of an indirect mechanism [149]. The three main alterations in gene function that occur in OSCC are (1) inactivation of the p53 tumor suppressor gene, (2) inactivation of the cyclin-dependent kinase (CDK) inhibitor p16, and (3) overexpression of epidermal growth factor receptor (EGFR); however, mutations in the EGFR gene occur at very low frequencies.

Inactivation of p53 tumor suppressor gene

p53 has a role in maintaining genomic stability, cell-cycle progression, cell differentiation, DNA repair, and apoptosis, and hence is aptly called the “guardian of the genome.” Mutations, deletions, and binding with viral proteins can produce p53 dysfunction [150]. Such dysfunction is found in approximately 50% of OSCC tumors and is one of the most common events in cancer development [151] (Fig. 2).

Fig. 2 Line diagram showing p53-mediated downstream signaling pathway

Inactivation of cyclin-dependent kinase (CDK) inhibitor p16

CDKs are important molecules responsible for regulation of the cell cycle. A number of these proteins have been identified, and some of them can be targeted.

The function of CDKs is regulated by a number of genes, such as p16 and the retinoblastoma gene. The effect is brought about by regulating phosphorylation during the G1 to S phase transition, through inhibition of CDK 4 and 6 [152]. The formation of the CDK 4-6/cyclin D complex is inhibited by the p16 and p21 genes (Fig. 2), leading to cell cycle arrest. Downregulation of these proteins is often associated with OSCC [153]. Regulation of the phosphorylation of the retinoblastoma gene product by p16, p21, cyclin D, and CDK leads to cell cycle arrest, DNA repair, and apoptosis if repair fails (Fig. 2).

Overexpression of EGFR

EGFR promotes epidermal cell growth and regulates cell proliferation, while its regulation of metastasis and angiogenesis contributes to the development of OSCC. Overexpression of EGFR protein therefore leads to increased tumor proliferation. EGFR ligand binding results in a molecular cascade that includes receptor-linked tyrosine kinase activation and other downstream pathways. The EGFR family has four types of receptors, which can form homodimers or heterodimers, in which two identical or two different members bind to produce a dimer. EGFR controls many pathways, and its overexpression is associated with increased carcinogenesis [98, 127, 153, 154]. For targeted therapy, however, it is the mutations that are normally assessed, and mutant EGFR with exon 19 to 21 mutations is often targeted with tyrosine kinase inhibitors.

The biomarker data presented in this article show that this is still a new field: although many markers have been identified, little work has been done on validating them so far. Further, the data show differences in the proteomic profile between continents and also between subsites. There is also a difference between the tissue and secretome profiles, with more inflammatory markers seen in saliva. The validation of diagnostic and prognostic biomarkers is a long-drawn process, and more proteomic research is needed to identify better markers that will improve the diagnosis and prognostication of patients.

Conclusion

Proteomic and genomic characterization of tumors is essential for the identification of biomarkers of carcinogenesis, therapeutics, prognosis, progression, and metastasis. Such characterization has frequently been used in many tumors, while its role in others is still under investigation. OSCC is an uncommon tumor in the West but is common in South East Asia; hence, very little work has been done on it. Recent evidence shows p53 and ras mutations to be common, and tumors with these mutations have a poorer prognosis than those without them. Further work on proteomics will help identify more markers of carcinogenesis, prognosis, and therapeutic significance and will help identify newer targets.