Abstract
The heterogeneous data produced in agricultural supply chains can be divided into three main systems: (i) product identification and traceability, related to identifying production batches and locations of the product throughout the supply chain; (ii) environmental monitoring, considering environmental variables during production, storage, and transportation; and (iii) process monitoring, related to the data describing the production processes and inputs used. Labeling the data from these different systems can improve decision-making, traceability, and coordination in the chains; nevertheless, it is a labor-intensive task. The objective of this chapter was to evaluate whether unsupervised machine learning techniques can identify patterns in the data, cluster it, and generate labels for an unlabeled agricultural supply chain dataset. A dataset was generated by merging seven datasets containing information from the three systems, and the k-means and self-organizing map (SOM) models were evaluated for clustering the data and generating labels. The use of principal component analysis (PCA) together with the k-means model was also evaluated, and several supervised and unsupervised learning metrics were computed. The SOM model with the Gaussian neighborhood function provided the best results, with an F1-score of 0.91 and a more clearly defined cluster map. A series of recommendations for the use of unsupervised learning techniques on supply chain data is discussed. The methodology used in this chapter can be applied to other supply chains and to other unsupervised machine learning research. Future work involves improving the dataset and implementing other clustering models and dimensionality reduction techniques.
Notes
- 1. For a thorough description of the steps of each clustering model, we refer the reader to the work by Mehta et al. [17].
- 4. Data preprocessing and processing techniques focus on eliminating faulty data, dealing with missing values, and preparing the data to be used by the knowledge extraction and pattern recognition models. For an in-depth review of data processing and preprocessing techniques, the reader is referred to the work by Van den Broeck et al. [28]. If the reader is interested in processing time series data, please refer to the work by Wang and Wang [29]. If the reader is interested in processing natural language data, please refer to the work by Sun et al. [30].
- 5. Data fusion can be described as the combination of multiple datasets or data sources to improve the quality of the information [31]. It is important to note that these datasets may contain different features or variables. For an in-depth review of the different techniques used for data fusion, the reader is referred to the work by Castanedo [31].
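As a minimal illustration of feature-level data fusion (the batch identifier, column names, and values below are hypothetical, not taken from the chapter's datasets), two sources carrying different variables can be aligned on a shared key with pandas:

```python
import pandas as pd

# Hypothetical example: two sources sharing a batch identifier but
# carrying different features (traceability vs. environmental data).
traceability = pd.DataFrame({
    "batch_id": [1, 2, 3],
    "location": ["farm_a", "farm_b", "farm_a"],
})
environment = pd.DataFrame({
    "batch_id": [1, 2, 3],
    "temperature_c": [22.5, 19.0, 24.1],
})

# Feature-level fusion: align records on the shared key so each row
# combines variables from both systems.
fused = traceability.merge(environment, on="batch_id", how="inner")
print(fused.columns.tolist())  # ['batch_id', 'location', 'temperature_c']
```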
- 6. Data normalization is an essential step for most machine learning models, especially for artificial neural networks. According to Singh and Singh [32], it can be defined as transforming the features to a specific range. Two of its main advantages are faster model training and avoiding problems caused by features with very distinct value ranges. For more information on the use of data normalization for classification, please refer to the work by Singh and Singh [32].
- 7. For implementation aspects of the standard scaler technique, the reader is referred to the scikit-learn library documentation: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
- 8. For implementation aspects of the MinMax scaler technique, the reader is referred to the scikit-learn library documentation: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
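A small sketch of both scaling techniques with scikit-learn (the toy feature matrix below is illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy feature matrix with very different value ranges (e.g. a
# temperature in degrees Celsius and a batch weight in kilograms).
X = np.array([[20.0, 1000.0],
              [25.0, 1500.0],
              [30.0, 2000.0]])

# Standard scaler: zero mean and unit variance per feature.
X_std = StandardScaler().fit_transform(X)

# MinMax scaler: each feature rescaled to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std.mean(axis=0))   # ~[0, 0]
print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0, 0] [1, 1]
```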
- 9. Agglomerative clustering is a commonly used variation of the hierarchical clustering models. Initially, each data point is considered a cluster. The algorithm then joins neighboring clusters, using a linkage method to evaluate the dissimilarity between them and to choose which clusters should be aggregated [12]. A dendrogram can then be used to visualize the results of each step of the algorithm.
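A minimal agglomerative clustering sketch with scikit-learn, using illustrative two-dimensional data (the points and the Ward linkage choice are assumptions for the example, not the chapter's configuration):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated groups of points (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# Ward linkage merges, at each step, the pair of clusters whose union
# least increases the within-cluster variance.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # cluster ids are arbitrary, e.g. [1 1 1 0 0 0]
```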
- 10. For an in-depth description of the different methods that are commonly used to estimate the optimal number of clusters for hierarchical clustering models, please refer to the work by Zambelli [33].
- 11. For a detailed analysis of the k-means model, the reader is referred to the works by Jain [9] and Steinley [10]. For implementation aspects of the k-means model, the reader is referred to the scikit-learn library tutorials on its implementation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html and https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html#sphx-glr-auto-examples-cluster-plot-kmeans-digits-py
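A minimal k-means sketch with scikit-learn (the data and hyperparameters below are illustrative, not the chapter's actual settings):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic groups.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [8.0, 8.0], [8.1, 8.2], [7.9, 8.1]])

# k-means++ seeding (the scikit-learn default) followed by Lloyd's
# iterations; n_init repeats the run and keeps the lowest-inertia result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)
```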
- 13. For a detailed analysis of the PCA method, as well as examples of uses, please refer to the work by Jolliffe and Cadima [35]. For implementation aspects related to PCA, the reader is referred to the scikit-learn library tutorials on its implementation: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html and https://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html#sphx-glr-auto-examples-decomposition-plot-pca-iris-py
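Since the chapter evaluates PCA together with k-means, a minimal sketch of that combination (on synthetic data with two latent factors, an illustrative assumption) could look like:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# 100 samples, 5 correlated features: a synthetic stand-in for a
# merged supply chain dataset with redundant variables.
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3))])

# Reduce to 2 principal components, then cluster in the reduced space.
pipeline = make_pipeline(
    PCA(n_components=2),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

pca = pipeline.named_steps["pca"]
# With only 2 latent factors, 2 components capture nearly all variance.
print(pca.explained_variance_ratio_.sum())
```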
- 14. For a detailed analysis of the SOM model, the reader is referred to the works by Kohonen [11], Yin [36], and Liu and Weisberg [37]. For implementation aspects of the SOM model, the reader is referred to the MiniSom library tutorials on its GitHub repository: https://github.com/JustGlowing/minisom
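As a minimal NumPy sketch of the idea (not the MiniSom implementation; the grid size, sigma, and learning rate are illustrative choices, not the chapter's hyperparameters), one SOM training step with a Gaussian neighborhood function can be written as:

```python
import numpy as np

rng = np.random.default_rng(42)
grid_x, grid_y, n_features = 5, 5, 3
# Each neuron on the 5x5 grid holds a weight vector in feature space.
weights = rng.random((grid_x, grid_y, n_features))
coords = np.stack(
    np.meshgrid(np.arange(grid_x), np.arange(grid_y), indexing="ij"),
    axis=-1,
)

def train_step(x, weights, sigma=1.0, lr=0.5):
    # Best matching unit (BMU): the neuron whose weights are closest to x.
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighborhood: update strength decays with the squared
    # grid distance to the BMU.
    grid_dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))
    # Pull every neuron toward x, weighted by the neighborhood function.
    return weights + lr * h[..., None] * (x - weights)

x = rng.random(n_features)
new_weights = train_step(x, weights)
```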
- 16. For implementation aspects as well as descriptions of the supervised learning metrics used in this chapter, please refer to the scikit-learn library documentation: https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics
- 17. For implementation aspects as well as descriptions of the supervised and unsupervised clustering metrics used in this chapter, please refer to the scikit-learn library documentation on clustering performance evaluation: https://scikit-learn.org/stable/modules/clustering.html#clustering-evaluation
- 21. For an in-depth analysis of classification metrics, including the F1-score, we refer the reader to the work by Goutte and Gaussier [42]. For implementation purposes, the reader can refer to the scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
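A short sketch of computing macro-averaged precision, recall, and F1-score with scikit-learn (the true and predicted labels below are hypothetical):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical cluster-derived predictions vs. manually assigned labels.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]

# Macro-averaging weights each class equally, which matters when the
# label distribution is imbalanced.
print(precision_score(y_true, y_pred, average="macro"))
print(recall_score(y_true, y_pred, average="macro"))
print(f1_score(y_true, y_pred, average="macro"))
```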
- 25. For an in-depth exploration of the data imbalance problem, refer to the work by Kotsiantis et al. [45].
- 26. The strategy of partially labeling the dataset and then using the partially labeled subset to train a model that predicts the labels for the whole dataset is a form of semi-supervised learning. For a detailed description of semi-supervised learning models, please refer to the work by Bagherzadeh and Asil [46].
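A minimal sketch of this partial-labeling strategy (synthetic data; the choice of a k-NN classifier is an illustrative assumption, not the chapter's method):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Synthetic dataset: two groups of 50 points each.
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(3, 0.3, (50, 2))])

# Suppose the first 10 points of each group were labeled manually.
X_labeled = np.vstack([X[:10], X[50:60]])
y_labeled = np.array([0] * 10 + [1] * 10)

# Train on the small labeled subset, then propagate labels to all rows.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_labeled, y_labeled)
y_full = clf.predict(X)  # labels for the whole dataset
```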
- 27. For a full list of papers that used the MiniSom implementation of SOM, we refer the reader to its official GitHub repository: https://github.com/JustGlowing/minisom
References
Chopra, S., & Meindl, P. (2013). Supply chain management: Strategy, planning, and operation (5th ed., 528pp). Pearson Education.
Corella, V. P., Rosalen, R. C., & Simarro, D. M. (2013). SCIF-IRIS framework: A framework to facilitate interoperability in supply chains. International Journal of Computer Integrated Manufacturing, 26(1–2), 67–86.
Huang, C. C., & Lin, S. H. (2010). Sharing knowledge in a supply chain using the semantic web. Expert Systems with Applications, 37(4), 3145–3161.
Pang, Z., Chen, Q., Han, W., & Zheng, L. (2015). Value-centric design of the internet-of-things solution for food supply chain: Value creation, sensor portfolio and information fusion. Information Systems Frontiers, 17(2), 289–319.
Verdouw, C. N., Vucic, N., Sundmaeker, H., & Beulens, A. (2013). Future internet as a driver for virtualization, connectivity and intelligence of agri-food supply chain networks. International Journal on Food System Dynamics, 4(4), 261–272.
Verdouw, C. N., Wolfert, J., Beulens, A. J. M., & Rialland, A. (2016). Virtualization of food supply chains with the internet of things. Journal of Food Engineering, 176, 128–136.
Verdouw, C., Sundmaeker, H., Tekinerdogan, B., Conzon, D., & Montanaro, T. (2019). Architecture framework of IoT-based food and farm systems: A multiple case study. Computers and Electronics in Agriculture, 165(104939), 1–26.
Harris, I., Wang, Y., & Wang, H. (2015). ICT in multimodal transport and technological trends: Unleashing potential for the future. International Journal of Production Economics, 159, 88–103.
Jain, A. K. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651–666.
Steinley, D. (2006). K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59(1), 1–34.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59–69.
Hartigan, J. A. (2000). Statistical clustering. In International Encyclopedia of the Social and Behavioral Sciences (pp. 15014–15019). Yale University. https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.1277
Lance, G. N., & Williams, W. T. (1967). A general theory of classificatory sorting strategies: 1. Hierarchical systems. The Computer Journal, 9(4), 373–380.
Ghahramani, Z. (2003). Unsupervised learning. In Summer school on machine learning (pp. 72–112). Springer.
Liakos, K., Busato, P., Moshou, D., Pearson, S., & Bochtis, D. (2018). Machine learning in agriculture: A review. Sensors, 18(8), 1–29.
Arthur, D., & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding. In Proceedings of the 18th annual ACM-SIAM symposium on discrete algorithms (pp. 1027–1035). ACM.
Mehta, P., Shah, H., Kori, V., Vikani, V., Shukla, S., & Shenoy, M. (2015). Survey of unsupervised machine learning algorithms on precision agricultural data. In 2015 international conference on innovations in information, embedded and communication systems (ICIIECS) (pp. 1–8). DRDO.
Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd international conference on knowledge discovery and data mining (pp. 226–231). AAAI Press.
Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Record, 28(2), 49–60.
Gowda, K. C., & Ravi, T. V. (1995). Divisive clustering of symbolic objects using the concepts of both similarity and dissimilarity. Pattern Recognition, 28(8), 1277–1282.
Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2(2), 139–172.
Ramesh, V., Ramar, K., & Babu, S. (2013). Parallel k-means algorithm on agricultural databases. International Journal of Computer Science Issues (IJCSI), 10(1), 710–713.
Kind, M. C., & Brunner, R. J. (2014). SOMz: Photometric redshift PDFs with self-organizing maps and random atlas. Monthly Notices of the Royal Astronomical Society, 438(4), 3409–3421.
Samsonova, E. V., Kok, J. N., & IJzerman, A. P. (2006). TreeSOM: Cluster analysis in the self-organizing map. Neural Networks, 19(6–7), 935–949.
Jeong, K. S., Hong, D. G., Byeon, M. S., Jeong, J. C., Kim, H. G., Kim, D. K., & Joo, G. J. (2010). Stream modification patterns in a river basin: Field survey and self-organizing map (SOM) application. Ecological Informatics, 5(4), 293–303.
Ruß, G., & Kruse, R. (2011). Exploratory hierarchical clustering for management zone delineation in precision agriculture. In Industrial conference on data mining (pp. 161–173). Springer.
Mingoti, S. A., & Lima, J. O. (2006). Comparing SOM neural network with fuzzy c-means, k-means and traditional hierarchical clustering algorithms. European Journal of Operational Research, 174(3), 1742–1759.
Van den Broeck, J., Cunningham, S. A., Eeckels, R., & Herbst, K. (2005). Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Med, 2(10), e267, pp.966–970.
Wang, X., & Wang, C. (2019). Time series data cleaning: A survey. IEEE Access, 8, 1866–1881.
Sun, W., Cai, Z., Li, Y., Liu, F., Fang, S., & Wang, G. (2018). Data processing and text mining technologies on electronic medical records: A review. Journal of Healthcare Engineering, 2018, 1–9.
Castanedo, F. (2013). A review of data fusion techniques. The Scientific World Journal, 2013, 1–19.
Singh, D., & Singh, B. (2019). Investigating the impact of data normalization on classification performance. Applied Soft Computing, 105524, 1–23.
Zambelli, A. E. (2016). A data-driven approach to estimating the number of clusters in hierarchical clustering. F1000Research, 5, 1–11.
Kodinariya, T. M., & Makwana, P. R. (2013). Review on determining number of cluster in k-means clustering. International Journal of Advance Research in Computer Science and Management Studies, 1(6), 90–95.
Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374, 1–16.
Yin, H. (2008). The self-organizing maps: Background, theories, extensions and applications. In Computational intelligence: A compendium (pp. 715–762). Springer.
Liu, Y., & Weisberg, R. H. (2011). A review of self-organizing map applications in meteorology and oceanography. In Self-organizing maps: Applications and novel algorithm design (pp. 253–272). InTech. https://books.google.com.br/books?hl=en&lr=&id=k-SgDwAAQBAJ&oi=fnd&pg=PA253
Vettigli, G. (2013). MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map. Available at: https://github.com/JustGlowing/minisom/. Accessed on: October 15th, 2020.
Scribner, E. A., Battaglin, W. A., Dietze, J. E., & Thurman, E. M. (2003). Reconnaissance data for glyphosate, other selected herbicides, their degradation products, and antibiotics in 51 streams in nine Midwestern States (Open-File Report 03–217, 102pp). U.S. Geological Survey.
Silva, A. M. D., Degrande, P. E., Suekane, R., Fernandes, M. G., & Zeviani, W. M. (2012). Impacto de diferentes níveis de desfolha artificial nos estádios fenológicos do algodoeiro. Revista de Ciências Agrárias, 35(1), 163–172.
Omer, S. O., Abdalla, A. W. H., Mohammed, M. H., & Singh, M. (2015). Bayesian estimation of genotype-by-environment interaction in sorghum variety trials. Communications in Biometry and Crop Science, 10, 82–95.
Goutte, C., & Gaussier, E. (2005). A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In European conference on information retrieval (ECIR) (pp. 345–359). Springer.
Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: An overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 86–97.
Mohamad, I. B., & Usman, D. (2013). Standardization and its effects on k-means clustering algorithm. Research Journal of Applied Sciences, Engineering and Technology, 6(17), 3299–3303.
Kotsiantis, S., Kanellopoulos, D., & Pintelas, P. (2006). Handling imbalanced datasets: A review. GESTS International Transactions on Computer Science and Engineering, 30(1), 25–36.
Bagherzadeh, J., & Asil, H. (2019). A review of various semi-supervised learning models with a deep learning and memory approach. Iran Journal of Computer Science, 2(2), 65–80.
Mansha, S., Babar, Z., Kamiran, F., & Karim, A. (2016). Neural network based association rule mining from uncertain data. In International conference on neural information processing (pp. 129–136). Springer.
Stoean, C., Stoean, R., Becerra-García, R. A., García-Bermúdez, R., Atencia, M., García-Lagos, F., Velázquez-Pérez, L., & Joya, G. (2019). Unsupervised learning as a complement to convolutional neural network classification in the analysis of saccadic eye movement in spino-cerebellar ataxia type 2. In International work-conference on artificial neural networks (pp. 26–37). Springer.
Riese, F. M., Keller, S., & Hinz, S. (2020). Supervised and semi-supervised self-organizing maps for regression and classification focusing on hyperspectral data. Remote Sensing, 12(1), 1–23.
Acknowledgments
This work was supported by Itaú Unibanco S.A. through the Itaú Scholarship Program, at the Centro de Ciência de Dados (C2D), Universidade de São Paulo, Brazil, by the National Council for Scientific and Technological Development (CNPq), and also by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001.
© 2022 Springer Nature Switzerland AG
Cite this chapter
Silva, R.F., Mostaço, G.M., Xavier, F., Saraiva, A.M., Cugnasca, C.E. (2022). Use of Unsupervised Machine Learning for Agricultural Supply Chain Data Labeling. In: Bochtis, D.D., Moshou, D.E., Vasileiadis, G., Balafoutis, A., Pardalos, P.M. (eds) Information and Communication Technologies for Agriculture—Theme II: Data. Springer Optimization and Its Applications, vol 183. Springer, Cham. https://doi.org/10.1007/978-3-030-84148-5_11
Print ISBN: 978-3-030-84147-8
Online ISBN: 978-3-030-84148-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)