Abstract
The Python programming language is becoming a promising tool for data analysis in various fields. However, little attention has been paid to using Python in analytical chemistry, even though recent advances in instrumental analysis demand robust and reliable data analysis. To address the difficulty of accurate analysis, multivariate analysis, or chemometrics, has been widely applied to many kinds of data obtained by instrumental analysis. The present work reviews the potential usefulness of Python for chemometrics and related fields in chemistry. Many practical tools for chemometrics, e.g., principal component analysis (PCA), partial least squares (PLS), and support vector machines (SVM), are included in scikit-learn, a machine learning (ML) library for Python. Other useful libraries, such as pyMCR for multivariate curve resolution (MCR) and 2Dpy for two-dimensional correlation spectroscopy (2D-COS), can be obtained from GitHub. For these reasons, a computational environment for chemometrics is easily constructed in Python.
Cite this article
Morita, S. Chemometrics and Related Fields in Python. ANAL. SCI. 36, 107–111 (2020). https://doi.org/10.2116/analsci.19R006
DOI: https://doi.org/10.2116/analsci.19R006