An integrated study fusing systems biology and machine learning algorithms for genome-based discrimination of IPF and NSIP diseases: a new approach to the diagnostic challenge

  • Application of soft computing
Soft Computing

Abstract

Idiopathic pulmonary fibrosis (IPF) and nonspecific interstitial pneumonia (NSIP) are the two most prevalent types of idiopathic interstitial pneumonia. IPF and NSIP, often grouped as chronic interstitial pneumonia, must be differentiated from other forms of idiopathic interstitial pneumonia, yet distinguishing IPF from NSIP on radiographic imaging is challenging. In this work, we propose a novel approach to this clinical diagnostic challenge: distinguishing IPF from NSIP and from healthy individuals through a comprehensive systems biology analysis of existing microarray datasets. A search of the Gene Expression Omnibus (GEO) database identified two microarray datasets containing normal, IPF, and NSIP samples; the second dataset was used to further validate the prediction models trained on the first. After data preparation and normalization, the gene expression profiles were analyzed to determine the differentially expressed genes (DEGs). We then constructed protein–protein interaction networks from the prioritized, statistically significant DEGs, performed module analysis, and identified candidate biomarkers. The highest-priority DEGs were also used for Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway and gene ontology (GO) enrichment analyses. Using the Kaplan–Meier approach, we performed three separate assessments of the gene biomarkers' effect on patient survival. The identified genes were validated both with several classification models and against published experimental work on the target genes. Comparing IPF to normal, NSIP to normal, and IPF to NSIP yielded a total of 32 distinct genes. This set was derived from seven modules (14 genes), six modules (7 genes), and eight modules (13 genes), together with three further genes (C6, C5, and STAT1). The GO and KEGG pathway enrichment analyses identified the associated biological processes, cellular components, and molecular functions. For overall survival (OS), fast progression (FP), and post-progression survival (PPS), the Kaplan–Meier analysis showed that 27, 16, and 13 of the 32 genes, respectively, were significant. The identified biomarkers also yielded high performance in the machine learning classification models, and findings in the scientific literature support each gene biomarker discovered for IPF, NSIP, and other lung-related conditions. The 32-mRNA signature shows promise as a gene set for IPF and NSIP and as a driver for treatments, with the potential to predict and manage patients' survival rates accurately.
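
The survival assessments mentioned in the abstract rest on the Kaplan–Meier (product-limit) estimator. The abstract does not say which software or settings the authors used, so the following is only a minimal sketch of the standard estimator in plain Python. The function name `kaplan_meier` and the follow-up data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time point
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Every patient (event or censored) leaves the risk set at their own time
    leaving = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d / n_at_risk)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times in months (illustration only)
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 1, 0, 1, 0])
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

In a gene-level analysis, patients would typically be split into high- and low-expression groups for a given biomarker, a curve estimated for each group, and the two curves compared with a log-rank test. That comparison step is not shown here.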

Availability of data and material

Not applicable.

Acknowledgements

The authors thank the Research Office of Tabriz University of Medical Sciences for its support (Grant no. 64907).

Funding

Funding was provided by Tabriz University of Medical Sciences (Grant no. 64907).

Author information

Contributions

BS: conceptualization, supervision, and project administration; EA and BS: data curation, methodology, and investigation; EA, SA, and SD: formal analysis; EA and BS: writing (original draft); EA, BS, SA, and SD: writing (review and editing). All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Siavoush Dastmalchi or Babak Sokouti.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval and consent to participate

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Amjad, E., Asnaashari, S., Dastmalchi, S. et al. An integrated study fusing systems biology and machine learning algorithms for genome-based discrimination of IPF and NSIP diseases: a new approach to the diagnostic challenge. Soft Comput 28, 5721–5749 (2024). https://doi.org/10.1007/s00500-023-09364-6

  • DOI: https://doi.org/10.1007/s00500-023-09364-6
