Data Transformation for Clustering Utilization for Feature Detection in Mass Spectrometry

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2022)

Abstract

Feature detection and peak detection are among the first steps of mass spectrometry data processing. These data come in large volumes, so the processing must be optimized to avoid overload. State-of-the-art clustering algorithms cannot perform feature detection directly for several reasons. The first issue is the volume of the data; the second is the disparity in sampling frequency between the MZ and RT axes. Here we show a data transformation that allows clustering algorithms to be used without redefining their kernels. The data are first pre-clustered to obtain regions that can be processed independently. We then transform the data so that the numerical differences between consecutive points are the same along both axes. We applied a set of clustering algorithms to each region to find the features and compared the results with the GridMass peak detector. These findings may facilitate better utilization of 2D clustering methods as feature detectors for mass spectra.
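The axis transformation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the version here is an assumption: each axis is divided by the median gap between its consecutive distinct values, so that one sampling step is roughly 1.0 in both MZ and RT. The function names (`median_step`, `rescale`, `cluster`), the toy data, and the simple radius-based connected-components clustering are all hypothetical stand-ins, not the authors' implementation or the clustering algorithms they evaluated.

```python
from statistics import median


def median_step(values):
    """Median gap between consecutive distinct values along one axis."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return 1.0  # degenerate axis: leave it unscaled
    return median(b - a for a, b in zip(distinct, distinct[1:]))


def rescale(points):
    """Map (mz, rt) points so one sampling step is about 1.0 on both axes.

    Assumption: dividing by the median step is one plausible way to equalize
    the spacing; the paper may use a different transformation.
    """
    mz_step = median_step([p[0] for p in points])
    rt_step = median_step([p[1] for p in points])
    return [(mz / mz_step, rt / rt_step) for mz, rt in points]


def cluster(points, eps):
    """Label points by connected components of the 'within eps' graph.

    A stand-in for a generic distance-based clustering algorithm.
    """
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j, q in enumerate(points):
                if labels[j] == -1:
                    dx = points[i][0] - q[0]
                    dy = points[i][1] - q[1]
                    if dx * dx + dy * dy <= eps * eps:
                        labels[j] = current
                        stack.append(j)
        current += 1
    return labels


# Hypothetical toy data: two features, m/z sampled every 0.001, RT every 1.0.
raw = [(100.000 + 0.001 * i, 10.0 + j) for i in range(3) for j in range(3)]
raw += [(100.050 + 0.001 * i, 10.0 + j) for i in range(3) for j in range(3)]

raw_labels = cluster(raw, eps=1.5)              # RT spacing dominates the distance
scaled_labels = cluster(rescale(raw), eps=1.5)  # both axes now comparable

print(len(set(raw_labels)), len(set(scaled_labels)))
```

On the raw coordinates the two features merge, because a 0.05 difference in m/z is negligible next to a 1.0 step in RT. After rescaling, the same radius separates them, which is the effect the transformation is meant to achieve with an unmodified Euclidean distance.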

Acknowledgment

We make the test dataset and proof-of-concept available at 10.5281/zenodo.6337968.

The authors thank the Research Infrastructure RECETOX RI (No LM2018121), financed by the Ministry of Education, Youth and Sports, and the Operational Programme Research, Development and Innovation project CETOCOEN EXCELLENCE (No CZ.02.1.01/0.0/0.0/17_043/0009632) for supportive background.

Author information

Corresponding author

Correspondence to Vojtech Barton.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Barton, V., Skutkova, H. (2022). Data Transformation for Clustering Utilization for Feature Detection in Mass Spectrometry. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham. https://doi.org/10.1007/978-3-031-07802-6_24

  • DOI: https://doi.org/10.1007/978-3-031-07802-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-07801-9

  • Online ISBN: 978-3-031-07802-6

  • eBook Packages: Computer Science, Computer Science (R0)
