Abstract
Pistacia chinensis subsp. integerrima is a medicinal plant well known for gall formation and widely used in Ayurveda to treat various systemic conditions, including chronic disorders and respiratory problems. Characterization of the P. integerrima genome will aid the study of Pistacia genes and pathways relevant to therapeutic applications. To understand the biological characteristics of this plant and to gain genetic insight into the biosynthesis of its natural compounds, the whole genome of P. integerrima and its leaf transcriptome were sequenced using Illumina sequencing technology. The sequenced genome was functionally annotated, and gene prediction was performed with an integrated genome annotation workflow. Pathway analysis was carried out using the KEGG database. We obtained a draft genome assembly of 462 Mb with an N50 of 16,145 bp. A total of 39,452 genes were predicted, of which 18,492 were supported by RNA or protein evidence. We characterized genes involved in the biosynthetic pathways of plant secondary metabolites such as flavonoids and terpenoids. We also identified noncoding RNAs of the miR397 and miR828 families, which mainly target laccase (LCA) and MYB proteins, respectively. Phylogenetic analysis showed that P. integerrima is genetically closer to P. vera. This study explores the whole-genome information of P. integerrima, which will provide genomic insight for future omics studies and serve as a valuable resource for the molecular characterization of medicinal compounds.
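The N50 reported above is a contiguity statistic: the sequence length at which contigs or scaffolds of that length or longer together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the procedure used in the study. The function name `n50` and the toy lengths are illustrative assumptions.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    # Ignore non-positive entries and sort from longest to shortest.
    ordered = sorted((n for n in lengths if n > 0), reverse=True)
    if not ordered:
        raise ValueError("no positive sequence lengths supplied")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the assembly size is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half")


# Toy example with made-up lengths (hypothetical, not data from the study).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # prints 8
```

In the toy example the lengths sum to 54, and the three longest sequences (10, 9 and 8) together reach 27, so the N50 is 8. For a real assembly, the lengths would be read from the assembled FASTA file.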
Acknowledgements
The authors acknowledge Dr N. B. Brindavanam for his support at every stage of this study, right from its inception; Bengaluru Genomics Centre for support in sequencing and data analysis; and Dabur India Ltd. and Dr Sasibhushan Vedula for their support at various levels.
Author information
Contributions
MG designed the field experiments, data generation and analysis. PN and MG were involved in sample collection. NB and AB were involved in authentic sample collection. MG and PV were involved in initiating and heading the project. SNH was involved in data analysis, interpretation and drafting the manuscript. MG, PN, SK, NB and PV were involved in editing the manuscript and providing inputs.
Additional information
Corresponding editor: Durgadas P. Kasbekar
About this article
Cite this article
Hegde, S.N., Begum, N., Bhatt, A. et al. De novo genome assembly and annotation of gall-forming medicinal plant Pistacia chinensis subsp. integerrima (J. L. Stewart ex Brandis) Rech. f.. J Genet 101, 51 (2022). https://doi.org/10.1007/s12041-022-01391-w