
1 Introduction

1.1 Problem

In industrial laundry technology, the sorting of dirty laundry according to washing categories has so far either not been carried out at all, been done by scanning barcodes and RFID chips, or been performed manually by employees under safety measures. The latter requires human contact with the soiled and often contaminated laundry, which places a considerable physical and psychological burden on the employees and also involves a substantial health risk. This problem arises in almost all industrial laundries (several hundred companies in Germany), so support in this problem area would mean a major improvement in working conditions for many thousands of employees.

However, the application of AI in the laundry industry is currently very problematic. As in other industries, the term AI has extremely negative connotations because of a general fear that AI will destroy jobs. Employees and their representatives are therefore skeptical about the introduction of AI. This skepticism is largely justified and must be addressed as part of a company's AI introduction strategy.

In some areas, however, the skepticism is also based on ignorance and misconceptions about AI and its practical applications. The actual goal of AI in the work process, and the actual changes it brings about, must be made clear. Educating employees through further training is thus a central prerequisite for the successful introduction of AI.

1.2 Objective and Solution Approach

The objective is to minimize the contact of working people with soiled and possibly contaminated laundry by automatically classifying the laundry delivered to the laundries so that it can be treated without direct human contact.

Based on an existing automatic machine for laundry separation, a solution for laundry sorting according to washing programs, primarily based on camera images and AI, is to be developed.

In addition, it must be taken into account that different laundries wash different types of laundry, so a human-machine interaction module for intuitively adapting the system to the concrete requirements of a specific laundry shop will also be co-developed using "Active Learning" methods.

Moreover, the trained artificial neural network is not to function as a "black box"; rather, current approaches of "Explainable Artificial Intelligence" are to be adapted and further developed to meet the needs of employees in medium-sized companies and to increase their acceptance of AI.

This creates a new division of labor between humans and AI-controlled machines. This process of introducing AI control is to be supported by a qualification process that prepares and trains employees who are unaccustomed to cooperating with an AI system. The focus here is not only on technical training, but also on conveying a basic understanding of how this AI system works and of the knowledge it has acquired, in order to minimize reservations about the use of this AI-based system.

In summary, adding a classification component to the wash chain of an industrial laundry improves working conditions in the area of laundry sorting. This benefits the employees in industrial laundries and makes it easier for the mostly medium-sized companies to recruit workers for this not particularly popular area of work.

2 Related Work

2.1 Developing a Human-centred AI in an Industrial Setting

By taking over routine tasks (such as information search), the workload of humans (e.g., machine operators) can be reduced and value creation increased by focusing on core tasks. In the form of intelligent assistance systems, AI applications can support humans in specific tasks. To do this, the assistance systems analyze the current situation and make predictions. They are context-sensitive and can successively interact better with humans in a natural way (e.g., understanding speech) [11]. However, it is important to note that production conditions are subject to temporal variations, such as changing workloads and aging sensor systems. The data generated under these conditions is not stationary; it is subject to drift, i.e., the observed data distribution differs fundamentally from the data used to train the systems. Efficient adaptive fusion architectures, where the learning process can be controlled by detected drift, are needed to facilitate human decision making in such problems while reducing complexity [4, 5].

Furthermore, humans can adapt the AI to minimize the drift. To accomplish this, however, it is necessary to make the AI's decisions understandable even to people with little technical expertise. To break the black-box character of modern AI models, a number of methods have been developed in recent years under the keyword "Explainable Artificial Intelligence" (XAI) to visualize the learned models of neural networks and make them interpretable [12, 17]. In addition, various methods of active learning (AL) have been developed for these networks, which actively incorporate human expertise into training and thus reduce the amount of hand annotation required for huge datasets, an effort that is particularly daunting for small and medium-sized companies [13, 16]. These methods can also increase users' understanding of, and therefore trust in, such systems.

2.2 Laundry Classification

Classification of textiles and cloth has been the subject of research in various fields, including computer vision, machine learning, and textile engineering. Two popular open datasets exist, FashionMNIST and DeepFashion, both focusing on classifying the type of the clothing object. Their pictures show the garments frontally and neither bent nor creased, i.e., in a far less complex setting than the one considered in this project.

Due to the global increase in textile waste, various governments have issued regulations for the recycling of textiles [19,20,21], spurring research on fabric classification. Kampouris et al. [10] used geometry to classify fabrics based on their surface textures and reflectance, while Sonawane et al. [15] employed a convolutional neural network (CNN) to recognize different types of clothing fabrics. Additionally, recent advances have achieved good classification rates based on hyperspectral data [3].

3 Methods

In this chapter, we first discuss the design of the project. Afterwards, we describe the AI methods as well as the methods that support and simplify working with the AI.

3.1 Experimental Setup (AreaScan)

First, a system was designed for a test bed, which is to be used for the recognition and classification of individual pieces of laundry. The concept is shown in Fig. 1.

Fig. 1. Design of the identification process. The recognition module is shown on the left. When a piece of laundry passes over the conveyor belt, an image is stored. This image is then processed by neural networks, which predict the individual attributes of the laundry item (right) in order to decide on the correct washing category.

In the middle of this module is a gray conveyor belt, over which the laundry pieces travel through the module one by one. The speed of the conveyor belt is 0.6 m/s. A FRAMOS D415e depth camera with a resolution of \(1280 \times 720\) pixels is mounted above the conveyor belt. To filter out variations in illumination, the conveyor belt surface is shielded from above, below, and the sides by walls and by the conveyor belt itself. Lighting is provided by two LED strips in front of and behind the camera.

For the recording, the laundry pieces are separated and dropped onto the conveyor belt at three different positions. This simulates the dropping by a gripper arm, which in the future will take over the separation and dropping of the laundry pieces. The laundry then travels along the conveyor belt and reaches the camera's capture area. When the camera detects a piece of laundry, several images of it are taken at regular intervals. This is necessary because some laundry items are too large for the recording area. A disadvantage of this methodology is that the laundry item is photographed from only one side and may be scrunched up due to the dropping. However, characteristics such as color, type, and material should still be recognizable. Soiling that is visible only on the underside, however, cannot be captured from above and accordingly does not appear in the pictures. This restriction had to be accepted because many laundries lack the space to hang up the laundry so that it could be photographed from both sides. A possible solution could be a mechanism that turns the piece of laundry over, whereupon another picture is taken.

In the next step, the relevant classes are annotated manually. These categories are: color (white, black, red tones, blue tones, ...), type (shirt, trousers, gown, ...), degree of contamination (none, strong, infectious, ...), kind of soiling (ink, mildew, blood), damage (none, chemical, mechanical), and washing temperature (\(30^{\circ}\), \(60^{\circ}\), and \(90\,(75)^{\circ}\)). The classes are deliberately fine-grained, so that a regrouping into coarser sorting classes remains easily possible.
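To make the annotation schema concrete, the following is a minimal sketch of how such a multi-attribute label record could be represented; the class lists and field names are illustrative assumptions, not the project's actual data model.

```python
from dataclasses import dataclass

# Illustrative class lists; not the project's exhaustive taxonomy.
COLORS = ["white", "black", "red_tones", "blue_tones"]
TYPES = ["shirt", "trousers", "gown"]
SOILING = ["none", "ink", "mildew", "blood"]
DAMAGE = ["none", "chemical", "mechanical"]

@dataclass
class LaundryAnnotation:
    """One annotated image with one label per category (hypothetical schema)."""
    image_path: str
    color: str          # one of COLORS
    garment_type: str   # one of TYPES
    soiling: str        # one of SOILING
    damage: str         # one of DAMAGE
    temperature: int    # washing temperature in degrees

sample = LaundryAnnotation("img_0001.png", "white", "shirt", "none", "none", 60)
```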

Adapted Setup (LineScan) When the module was exhibited and tested at a different location, the segmentation, and therefore the classification, was at times massively affected by the different lighting conditions despite these measures. The observations made during the annotation and segmentation process therefore serve as the basis for several optimizations of the recognition module.

The goal is to eliminate both the interfering objects and the variance caused, e.g., by different lighting conditions. The interior should be shielded from the outside as much as possible, have a homogeneous, uniformly colored background, and be well illuminated. Since the laundry pieces vary in length and the depth data did not improve the classification, we decided that a line scan camera was a reasonable adaptation. The line of the camera is aligned so that it does not contain any interfering objects such as black blocks or holes. In addition, it is then only necessary to paint the walls in the area of the line to match the conveyor belt color and to adjust the illumination. Using this setup and presorted laundry, two people were able to take and annotate 5,430 pictures over 6 days, a speedup of factor 4.3 in terms of the number of laundry items passing the camera compared to the first setup.

Based on feedback on our first working demonstrator and discussions with experts in the washing industry, we also adapted the categories over time. Color and soiling were each extended by a few classes and are now treated as multi-label classification problems. For the type, we only added a few new classes. We also added a new category "washing color", describing the color group with which the laundry item is to be washed.

3.2 AI in Image Processing

We focus on convolutional neural networks (CNNs) [7], which are nowadays used in almost all areas of computer vision due to their strong performance. These are artificial neural networks whose design makes them particularly suitable for image data.

To simplify the handling of these methods for the employees, we additionally employ methods of active learning and of explainable artificial intelligence.

The way in which the neurons are arranged and connected is called the architecture of the artificial neural network. So far in this project, we have considered architectures that achieved the best performance on the ImageNet dataset at the time of their publication [18]. For classification, the CNN architectures VGGNet [14] and DenseNet [6] were initially implemented to gain first insights. This decision was based on the low complexity of VGGNet and the parameter efficiency of DenseNet. For each of these architectures, one network per category was trained and optimized in order to keep the complexity as low as possible initially.
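As an illustration of the "one network per category" setup, a minimal PyTorch sketch could look as follows; the torchvision variants (VGG16, DenseNet-121) and the class counts are assumptions for demonstration, not the project's actual configuration.

```python
import torch.nn as nn
from torchvision import models

def make_classifier(arch: str, num_classes: int, pretrained: bool = True) -> nn.Module:
    """Build one single-category classifier with a replaced output head."""
    if arch == "vgg16":
        net = models.vgg16(weights=models.VGG16_Weights.DEFAULT if pretrained else None)
        net.classifier[-1] = nn.Linear(net.classifier[-1].in_features, num_classes)
    elif arch == "densenet121":
        net = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT if pretrained else None)
        net.classifier = nn.Linear(net.classifier.in_features, num_classes)
    else:
        raise ValueError(f"unknown architecture: {arch}")
    return net

# One network per category; class counts here are illustrative placeholders.
nets = {
    "color":   make_classifier("densenet121", num_classes=8),
    "type":    make_classifier("vgg16", num_classes=10),
    "soiling": make_classifier("densenet121", num_classes=4),
}
```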

As we identified that the networks learned incorrect shortcuts for the classification (e.g., they learned that if the black blocks were visible at the edge of the image, the item tended to be a small piece of laundry, which in almost all cases was colorful), we also trained a PerturbGAN [1] and a U-Net [8] to extract the laundry object from the background. This way, only the area of the piece of laundry is used for the classification.
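A minimal sketch of how a predicted segmentation could be used to suppress background shortcut cues before classification; the thresholding scheme is an assumption, and the project's actual pipeline may differ.

```python
import torch

def mask_background(image: torch.Tensor, seg_logits: torch.Tensor,
                    threshold: float = 0.5) -> torch.Tensor:
    """Zero out background pixels using a U-Net's foreground prediction.

    image:      (B, 3, H, W) input batch
    seg_logits: (B, 1, H, W) raw U-Net outputs for the laundry/foreground class
    """
    mask = (torch.sigmoid(seg_logits) > threshold).float()  # binary foreground mask
    return image * mask  # background becomes zero, removing edge/block cues
```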

3.3 Active Learning

The required annotation of this data adds to the effort, and it would have to be redone specifically for each laundry shop due to their different customers. It therefore makes sense to develop strategies that reduce the number of annotated samples required or simplify the annotation. Such algorithms belong to the field of active learning.

The authors of the CEAL algorithm [16] (cost-effective active learning) propose to adopt the high-certainty classifications obtained by the neural network on unlabeled data and to use them, in addition to the already annotated data, for retraining the same network. This process is illustrated in Fig. 2 and repeated until either a satisfactory recognition performance of the neural network is achieved or the complete dataset is annotated. Such human-in-the-loop concepts can also increase acceptance and understanding.
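A simplified sketch of one CEAL iteration follows, using entropy as the uncertainty measure and a fixed confidence threshold; `predict_proba`, `oracle_annotate`, and `train` are hypothetical project-specific helpers, and the original algorithm additionally decays the threshold over iterations [16].

```python
import numpy as np

def ceal_round(model, labeled, unlabeled, k_uncertain=100, delta=0.05):
    """One simplified CEAL iteration: query uncertain samples for human
    annotation, pseudo-label confident ones, then retrain.
    (Removing queried samples from the pool is omitted for brevity.)
    """
    probs = model.predict_proba(unlabeled)            # (N, C) class probabilities
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

    # a) most informative/uncertain samples go to the human annotator
    uncertain_idx = np.argsort(entropy)[-k_uncertain:]
    labeled += oracle_annotate(unlabeled, uncertain_idx)

    # b) high-certainty samples receive their predicted class as pseudo-label
    certain_idx = np.where(entropy < delta)[0]
    pseudo = [(unlabeled[i], probs[i].argmax()) for i in certain_idx]

    # c) retrain on human labels plus pseudo-labels; the pseudo-labels are
    #    discarded again after the update, as in Wang et al. [16]
    train(model, labeled + pseudo)
    return labeled
```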

3.4 Explainable AI

As already mentioned, the decision path of deep neural networks is difficult to understand. Artificial neural networks usually contain millions of parameters and connections, which makes it very difficult for a human to keep track of them. This is why CNNs are also called black-box models. Explainable AI (XAI) methods try to shed light into this black box so that the neural network's decisions can be understood. In this project, we first looked at layer-wise relevance propagation (LRP) [2], as it produces an easy-to-understand heatmap on the input image indicating how much each pixel contributed to the final classification outcome.

Fig. 2. The CEAL paradigm gradually feeds samples from the unlabeled dataset into a neural network (a). Then, the selection criteria for the certain images and for the most informative/uncertain images are applied to the classification results of the neural network (b). After adding user-annotated uncertain images to the annotated dataset (c) and the pseudo-annotations of non-user-annotated certain images (d), the model is further updated/trained (e). Adapted from Wang et al. [16].

LRP considers the inputs and outputs of the network. Intuitively, it uses the network weights and the neuron activations from the forward pass to propagate the output back through the network to the input layer. This yields, for each input pixel, its relevance to the predicted class. From this it can be deduced which pixels the network uses as positive (red), slightly positive (yellow), or neutral (green) indicators for the respective classification. Some LRP explanations in the context of this project are shown in Fig. 3.
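For illustration, the relevance redistribution for a single fully connected layer under the common LRP-\(\epsilon\) rule could be sketched as follows; this is a simplified sketch, and bias handling as well as the specific LRP variant used in the project may differ.

```python
import numpy as np

def lrp_epsilon(activations, weights, relevance_out, eps=1e-6):
    """LRP-epsilon rule for one fully connected layer (bias omitted for brevity).

    activations:   (I,)   inputs a_i of this layer from the forward pass
    weights:       (I, J) weight matrix w_ij
    relevance_out: (J,)   relevance assigned to the layer's outputs
    returns:       (I,)   relevance redistributed onto the inputs
    """
    z = activations @ weights                      # pre-activations z_j
    z = z + eps * np.sign(z)                       # stabilizer against division by zero
    contrib = activations[:, None] * weights       # contributions a_i * w_ij
    return (contrib / z[None, :]) @ relevance_out  # conservative redistribution
```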

4 Results

We created two datasets, consisting of 9,405 (AreaScan) and 5,430 (LineScan) images. For each image, information exists about the color and type, and in most cases also about the soiling, the damage, and the material. For the AreaScan dataset, segmentation was considered before classification, so we present those results first.

4.1 Segmentation

First, 5,000 of the images were segmented by hand, as the unsupervised PerturbGAN did not provide good results. Subsequently, we iteratively trained a U-Net, predicted segmentations for the remaining images, and either accepted or rejected them until all images were segmented. Through this active-learning-oriented process, 2,091 segmentations were created automatically. Without deducting the effort for accepting/rejecting, this saved about \(22.23\%\) (here specifically 32 hours) of working time. The segmentation results are shown in Tab. 1.

Tab. 1. Accuracy and IoU on the test dataset of the best model for the AreaScan dataset. For comparison, we calculated the segmentation by computing the difference to an image of an empty conveyor belt, as well as the output of the GrabCut [9] algorithm without corrections.
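For reference, the IoU metric reported in Tab. 1 can be computed on binary masks as follows; this is the standard definition, not project-specific code.

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union for two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return intersection / union if union > 0 else 1.0  # both empty: perfect match
```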

4.2 Classification

Dataset AreaScan In this dataset, 6,855 images (\(72.89\%\)) show laundry items originating from nursing homes and 2,550 images (\(27.11\%\)) show workwear. Since only for color, type, and soiling could the classes be determined with certainty for more than \(90\%\) of the images, and since the dataset is already rather small in general, the focus was initially placed on these three categories.

For both CNN architectures, a hyperparameter optimization was performed for each category combined with each dataset variant (unprocessed and segmented). We initially restricted ourselves to the learning rate, the batch size, and the use of a pre-trained network. The best scores on the test dataset are listed in Tab. 2.

Tab. 2. Accuracy on the test dataset of the best model for the AreaScan dataset

Dataset LineScan Here, using the 5,430 images, we also performed a hyperparameter optimization for both CNN architectures for each category. Since, from the creation of this dataset onward, we treated the classification of color as well as soiling as a multi-label problem, we monitored the \(\text{F}_{1}\) score for these categories and switched to a sigmoid activation function for the output. Furthermore, height equalization was applied as preprocessing, and data augmentation strategies such as random rotation and vertical/horizontal flipping were used during training. The best scores on the test dataset are listed in Tab. 3.

Tab. 3. Accuracy (type, washing color) and \(\text {F}_{1}\) score (color, soiling, damage) on the test dataset of the best model for the LineScan dataset
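A minimal PyTorch sketch of the multi-label change described above, combining the mentioned augmentations with a sigmoid-based output; all parameter values are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Augmentations as described above; the exact parameters are illustrative.
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
])

# Multi-label heads (color, soiling): independent sigmoid per class instead
# of a softmax; BCEWithLogitsLoss applies the sigmoid internally for
# numerical stability.
criterion = nn.BCEWithLogitsLoss()

def multilabel_predict(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Per-class decision, allowing several labels per image."""
    return (torch.sigmoid(logits) > threshold).int()
```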

4.3 Explainable AI

In addition, the LRP algorithm has been implemented and can be applied to any network architecture. A visualization of the color classification explanations, along with the certainty of the respective neural network, is shown in Fig. 3. Looking at the classifications with high certainty (a-b, \(>98\%\)), it is noticeable that only regions of the laundry piece were used for the classification. In the less certain range (c, \(80-90\%\)), the laundry piece was used for the most part, but areas at the edge were also used. In a relatively uncertain classification (d, \(<70\%\)), the laundry piece was obviously not recognized, and only edge pixels and pixels at the transition between conveyor belt and wall were used.

Fig. 3. Original AreaScan pictures (left) and the color classification including the LRP explanation (right). The relevance of each pixel for the classification (above) is color coded (green – neutral, red – very important).

4.4 Output of the System

Finally, Fig. 4 shows the complete output of the system, including the certainties for each class and visualizations, for a randomly chosen piece of laundry.

Fig. 4. Original images with the segmentations of the trained U-Net and the outputs of the most probable classes of a trained neural network per category. The most probable class is highlighted in yellow. In addition, the relevance of the individual input pixels for the classification is calculated using LRP and displayed as a heatmap (green – neutral, red – very important).

5 Discussion

The CNNs were able to achieve good classification accuracies for all categories, even though the AreaScan dataset is fairly small. By using LRP, unwanted shortcut strategies could be detected early, which is why networks were additionally trained for segmentation. Since the unsupervised segmentation approach did not give good results, segmentations had to be created manually at first. Through the successful use of CEAL for creating the segmentations, annotation time could be saved. The resulting segmentation performance of the U-Net is very good. However, the classification could not be improved by using the segmented images; nevertheless, the shortcut was eliminated.

Since, on the one hand, the laundry items were not always completely visible in the images and, on the other hand, tests revealed some optimization possibilities, the setup was adapted, among other things by using a line scan camera. In addition, the recording process was further optimized, which quickly led to the creation of a second, larger dataset (LineScan). On this dataset, the classification performance of the first trainings is already very good for some classes. In the next steps, CEAL will be used to further increase the size of the LineScan dataset and to continue training models. In parallel, further user studies are planned.

6 Conclusion

This paper proposes a human-centred system for sorting laundry based on deep learning models. The system uses convolutional neural networks to classify laundry items based on their visual features, and layer-wise relevance propagation to explain the model's predictions in a human-understandable way. The proposed system offers a promising approach to a user-friendly laundry sorting system that takes human needs and preferences into account, which will be examined in a subsequent study.