1 Introduction

As the phrase “food, clothing, and shelter” suggests, eating satisfies one of the three basic human needs. Nutrition-education studies that survey participants’ daily eating behavior have shown that increasing interest in, awareness of, and knowledge about food, nutrition, and cooking throughout life is effective for health [1, 2]. Accordingly, texture, an important factor in how humans perceive the taste of food, has been extensively researched in the food field in recent years [3,4,5,6,7] so that daily meals can be enjoyed more. There are two main methods for evaluating food texture: sensory evaluation [8, 9] and physical property evaluation [10, 11]. Sensory evaluation is a statistical analysis of texture based on qualitative indicators that rely on the five human senses. Physical property evaluation is a numerical interpretation based on quantitative indices obtained from compression, tension, and crushing tests. Both methods contribute to the development of food products that appeal to a wide range of tastes and offer new textures. This study focuses on quantitative texture evaluation, in which the evaluation index depends strongly on the softness of the food. Among quantitative texture evaluation studies, one notable approach combines machine learning with time-series data obtained by crushing food products in a compression testing machine in order to classify the food. Yoshida et al. showed that three types of commercial potato chips can be classified with approximately 80% accuracy using time-series data recorded by a load cell sensor during crushing [12]. This suggests that combining existing quantitative analyzers with machine learning can yield a new texture evaluation method.

Physical reservoir computing [13], which combines physical properties with sensors, is a new computational framework that can exploit tactile sensation. Sudo et al. embedded a piezoelectric sensor in an owl made of soft materials and applied the acquired data to machine learning, demonstrating that the touched area can be determined with high accuracy. Physical reservoir computing is derived from the recurrent neural network: it replaces the nonlinear transformation performed by the reservoir in reservoir computing with a physical system, which makes it suitable for fast machine learning on time-series data. Such computation is faster and consumes less power than a software implementation [14, 15]. In particular, the ability to process time-series data acquired from physical systems in real time has raised expectations for industrial applications, and various studies have been reported [16,17,18,19,20]. Physical reservoirs have been realized in a wide range of fields, including electrical and electronic systems, optical systems, and biological and biomedical systems. Among them, material-mechanical systems [21, 22] use softness, that is, deformation under an applied force, as a computational resource. This approach has a wide range of applications because soft bodies exhibit many degrees of freedom and nonlinear behavior that conventional mathematical models in robotics cannot handle.

This study proposes a new soft machine that can classify even minute differences between food products with high accuracy by combining physical reservoir computing with texture evaluation using soft matter, that is, soft substances such as polymers. We aim to establish it as a food texture evaluation device that improves the reproducibility of cooking and mapping in future food production. To achieve highly accurate recognition, we focused on the advanced texture recognition capability of the human oral structure. As is well known, food texture is recognized through chewing by a combination of organs with different elastic moduli, such as the teeth and tongue. If these functions can be mimicked, classification may be performed with higher accuracy than in related studies. We therefore propose a tactile recognition robot whose end-effector, mounted on a robotic arm, combines several polymeric materials in a tooth model, with a tactile sensor embedded inside each material. Using a robotic arm instead of a conventional compression tester allows compression by pinching with 3D-modeled parts that resemble oral structures, and controlling the servo motors of the end-effector lets the robot bite like a living creature, which is promising for achieving more human-like tactile detection. The proposed soft machine was used to classify hard and soft snacks on the market. We examined which shapes and hardnesses of snacks were easy or difficult to classify. We also examined whether the robot could imitate the mouth’s perception of an object from multiple perspectives by comparing classification with combined materials against classification with only one material.

2 Design and fabrication of Gel Biter

2.1 Materials and sensing based on the characteristics of each material

Figure 1 shows a schematic and the external appearance of Gel Biter, the soft robot proposed in this research. The name combines “gel,” a viscous state of matter between solid and liquid, and “Biter,” which expresses chewing. As described in Sect. 1, Gel Biter consists of a soft-matter artificial mouth attached to a robot arm; the mouth is a tooth model made of polymer materials with different physical properties, with tactile sensors embedded inside.

Fig. 1 Configuration of Gel Biter

Piezoelectric film sensors (TE Connectivity Ltd.) were attached inside each part of the artificial mouth to extract contact information. The piezoelectric effect, in which an electric charge is generated when pressure is applied, is used to measure the vibrations that spread from the tooth surface to the entire mouth when Gel Biter bites an object. We believe that this system is close to the human sense of touch [23], which enables us to recognize mechanical interactions with the outside world and to feel the contact, shape, texture, and hardness of objects. In the sensing procedure, the data acquired from the piezoelectric sensors are sent to an Arduino, a single-board microcontroller, where the data range is transformed by a normalization process for use in machine learning; the normalized data are then sent to a Python script via serial communication. Finally, the data are used for classification learning in Python, as described below (Fig. 2). Because the signal values obtained from the piezoelectric sensors are quite small, an A/D converter was inserted to perform analog-to-digital conversion.

Fig. 2 Data acquisition process
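
The range transformation and the hand-off to Python described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the converter resolution or the serial message layout, so the 10-bit full scale (`adc_max=1023`), the comma-separated line format, and the function names are hypothetical. In the system described here the normalization runs before the serial transfer; it is written in Python only to show the arithmetic.

```python
def normalize_counts(counts, adc_max=1023):
    """Scale raw converter counts into the range [0.0, 1.0].

    adc_max is an assumed full-scale value (10-bit); adjust to the real converter.
    """
    return [min(max(c, 0), adc_max) / adc_max for c in counts]


def parse_sample(line, n_sensors=3, adc_max=1023):
    """Parse one text line such as '512,300,41' (assumed format) into floats.

    One value per sensor: upper teeth, lower teeth, tongue.
    Returns None for malformed lines so that a reader loop can skip them.
    """
    parts = line.strip().split(",")
    if len(parts) != n_sensors:
        return None
    try:
        counts = [int(p) for p in parts]
    except ValueError:
        return None
    return normalize_counts(counts, adc_max)


# Example with a made-up line, as it might arrive over the serial port.
sample = parse_sample("1023,0,512\n")
```

Returning `None` for incomplete lines is a design choice that keeps a partially transmitted line at start-up from entering the training data.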

First, as a hard polymer material, polylactic acid (PLA) resin, which is widely used as a filament for 3D printers, was used to create the upper and lower tooth models. Next, as a soft polymer material, silicone, a two-component mixture whose hardness can be adjusted, was used to imitate the gums: silicone cured in a mold was placed on the bottom surface of the lower tooth model. Finally, as a soft and viscous polymer material, Wizard Gel [24], a self-healing hydrogel with “high toughness, high elasticity, and drying resistance”, was used to create a tongue model. Piezoelectric sensors are attached inside these three parts to obtain the contact information described above, so that Gel Biter simultaneously generates three completely different signal waveforms when an object is chewed. The fact that different waveforms are obtained from different parts is related to the elastic modulus of each material, and previous research has shown that the softer the object, the higher the data value obtained [25]. The materials used in this study were therefore subjected to compression testing. Applying Hooke’s law to the stress–strain curves in Fig. 3 shows that the materials rank from softest to hardest as Wizard Gel, silicone, and PLA.

Fig. 3 Stress–strain curve and Young’s modulus of adopted polymer materials (PLA, silicone, hydrogel)
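
As a rough illustration of how a Young’s modulus can be read from a stress–strain curve, Hooke’s law (stress = E × strain) can be fitted to the initial linear region by least squares. The sketch below assumes that region ends at 5% strain (`max_strain=0.05`), which is a hypothetical cutoff. The numbers are synthetic placeholders, not measurements from Fig. 3.

```python
def youngs_modulus(strain, stress, max_strain=0.05):
    """Least-squares slope of stress versus strain in the initial linear region.

    max_strain is an assumed upper limit of the Hookean region.
    """
    points = [(e, s) for e, s in zip(strain, stress) if e <= max_strain]
    if len(points) < 2:
        raise ValueError("need at least two points in the linear region")
    n = len(points)
    mean_e = sum(e for e, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    sxx = sum((e - mean_e) ** 2 for e, _ in points)
    sxy = sum((e - mean_e) * (s - mean_s) for e, s in points)
    if sxx == 0:
        raise ValueError("strain values must not all be equal")
    return sxy / sxx


# Synthetic, perfectly linear curves in arbitrary units (placeholders only).
strain = [0.00, 0.01, 0.02, 0.03, 0.04]
moduli = {
    "stiff_sample": youngs_modulus(strain, [2.0 * e for e in strain]),
    "soft_sample": youngs_modulus(strain, [0.5 * e for e in strain]),
}
# A lower modulus means a softer material, so sort ascending.
softest_first = sorted(moduli, key=moduli.get)
```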

2.2 Multiple soft matter reservoir computing

The purpose of this study is to confirm whether materials with different elastic moduli can be utilized as physical reservoirs and whether objects can be perceived from multiple perspectives, as in the human oral structure. Figure 4 shows the reservoir section and computing model of Gel Biter. The material and shape of the teeth in the reservoir section were varied to verify how the combination of different data acquired from the sensors affects the accuracy of machine learning.

Fig. 4 Framework of physical reservoir computing (PRC) in Gel Biter

To prepare the data for machine learning, raw data were acquired from the upper teeth, lower teeth, and tongue in one batch and then denoised and smoothed by low-pass filtering based on the Fourier transform. Peak extraction and feature creation were performed in Python scripts. Figure 5 shows the waveforms after this series of processes.

Fig. 5 An example of the waveform and peak extraction obtained for each material. Marker X is the peak, and the gray background is the peak extraction area. Left: overall view. Right: enlarged view of the peak extraction area
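
The denoising and peak-extraction steps can be illustrated with a minimal standard-library sketch. It is not the authors’ script: the cutoff (`keep_bins`), the peak threshold (`min_height`), and the synthetic test signal are all assumptions chosen for the demonstration. A direct O(n²) Fourier transform keeps the example self-contained.

```python
import cmath
import math


def dft(samples):
    """Discrete Fourier transform in its direct O(n^2) form."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def low_pass(samples, keep_bins):
    """Zero every frequency bin above keep_bins, then transform back."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins k and n - k hold the same frequency (positive and negative).
        if min(k, n - k) > keep_bins:
            spectrum[k] = 0j
    return inverse_dft(spectrum)


def find_peaks(samples, min_height):
    """Indices of local maxima that reach at least min_height."""
    return [
        i
        for i in range(1, len(samples) - 1)
        if samples[i - 1] < samples[i] >= samples[i + 1] and samples[i] >= min_height
    ]


# Synthetic demo: one slow cycle plus a fast ripple standing in for noise.
n = 64
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [s + 0.3 * math.sin(2 * math.pi * 20 * t / n) for t, s in enumerate(slow)]
smoothed = low_pass(noisy, keep_bins=3)
peaks = find_peaks(smoothed, min_height=0.5)
```

For recordings of realistic length, a fast Fourier transform (for example `numpy.fft`) and `scipy.signal.find_peaks` would be the more usual tools.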

A feature value is a quantified characteristic of an object; in object recognition, it is an important element for recognizing a specific object, such as a face or person, in an image [26]. Creating features from time-series data from scratch means computing a large number of candidates, such as the maximum value, minimum value, average value, and number of peaks, which results in a huge amount of computation. In this study, we used tsfresh [27], which automatically creates features from time-series data. We also used three classifiers: a nonlinear support vector machine (SVM) [28], a pattern recognition method that solves classification and regression problems; k-nearest neighbors (K-NN) [29], which classifies a sample by majority vote among its nearest training samples; and random forest (RF) [30], an ensemble learning method that builds a strong model from many decision trees. Their accuracies were then compared. k-fold cross-validation was used instead of the usual holdout method to improve generalization performance and to confirm the validity of the analysis itself. Figure 6 briefly summarizes the flow of the machine-learning part of Gel Biter described above.

Fig. 6 Gel Biter’s machine learning flow
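
The flow from features to cross-validated accuracy can be sketched in a few lines. This is a simplified stand-in, not the pipeline used in the study: four hand-written features (maximum, minimum, mean, number of peaks) replace the tsfresh feature set, a small k-nearest-neighbors classifier replaces the three classifiers, and the waveforms and labels are synthetic. All function names are hypothetical.

```python
import math
from collections import Counter


def basic_features(wave):
    """Maximum, minimum, mean, and number of local peaks of one waveform."""
    n_peaks = sum(
        1 for i in range(1, len(wave) - 1) if wave[i - 1] < wave[i] >= wave[i + 1]
    )
    return [max(wave), min(wave), sum(wave) / len(wave), float(n_peaks)]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(x, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(xs, ys, n_folds=5, k=3):
    """Mean validation accuracy; sample i is held out in fold i % n_folds."""
    scores = []
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        valid = [i for i in range(len(xs)) if i % n_folds == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in valid)
        scores.append(hits / len(valid))
    return sum(scores) / n_folds


def synthetic_wave(amplitude, index):
    """Two sine cycles whose amplitude varies slightly with index (made-up data)."""
    scale = amplitude * (1 + 0.01 * index)
    return [scale * math.sin(2 * math.pi * t / 20) for t in range(40)]


waves = [synthetic_wave(1.0, i) for i in range(10)] + [
    synthetic_wave(0.2, i) for i in range(10)
]
labels = ["hard"] * 10 + ["soft"] * 10
features = [basic_features(w) for w in waves]
accuracy = k_fold_accuracy(features, labels, n_folds=5, k=3)
```

Every sample is used for validation exactly once, which is what distinguishes k-fold cross-validation from a single holdout split.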

3 Subtle texture identification of sweets and snacks

In this study, we examined the accuracy with which the commercial sweets and snacks shown in Fig. 7 can be classified using Gel Biter. Four experiments, I, II(a), II(b), and III, were conducted in sequence so that the shape and hardness of the snacks to be classified became more similar as the experiments progressed. In Experiment III, the classification targets were individual crackers of the same type; the purpose was to determine whether tiny differences in the shape and hardness of each cracker could be detected. We set up a program that makes the robot chew an object 20 times at 1-s intervals, and collected training and validation data by having the robot chew each object for five sets, for a total of 100 chews per object. The acquired data were passed to the classifiers to check the classification accuracy. No parameter tuning was performed in this study because we wanted to take full advantage of the differences in the acquired waveforms caused by the elastic (Young’s) modulus, which differs from one material to another. Table 1 shows the hardness of each sweet and snack, as measured using a durometer.

Fig. 7 Sweets and snacks to be classified: Class I (a–e), five types with distinctly different hardness, texture, and ingredients; Class II (A) (f–j), five types of rice crackers as hard snacks; Class II (B) (k–o), five types of cream sandwiches as soft sweets; Class III (p–t), five crackers of the same type

Table 1 Hardness of the items in Fig. 7a–t measured with a Type E durometer (“-” if not measurable) [10]

3.1 Shape and material dependence in Gel Biter

Human teeth chew and grind food in the mouth for swallowing. Different teeth therefore have different functions: shovel-like incisors for cutting food, canine teeth for tearing, and molars for grinding. In other words, dental contact is a combination of surface, line, and point contact. As basic research, we created upper and lower teeth with surface, line, and point contact shapes in addition to the normal tooth shape (Fig. 8) and examined how the contact shape affects Gel Biter’s sensing and classification results, using only the data from the two sensors in the upper and lower teeth and excluding the tongue. In addition, we examined the material dependency of the upper and lower tooth reservoirs to determine how the combination of PLA and silicone affects the classification results. As mentioned above, Wizard Gel was used as the tongue material in this study, but PLA and silicone were also examined in parallel as tongue reservoirs for comparison. The Class III snacks in Fig. 7 were used as the classification targets, and the respective accuracies are summarized in Tables 2, 3, and 4.

Fig. 8 Tooth models with different contact shapes

Table 2 Comparison of accuracy with change in shape (Use Stage III group snacks)
Table 3 Comparison of upper and lower tooth reservoir combinations (Use Stage III group snacks)
Table 4 Comparison of accuracy with change in shape (Use Stage III group snacks)
Fig. 9 Classification results of sweets and snacks in each class in Fig. 7 for the three classification methods

3.2 Classification accuracy of sweets and snacks by Gel Biter

In this section, we examine whether Gel Biter can chew and classify the sweets and snacks shown in Fig. 7. As described in Sect. 2.2, the acquired data were divided into five parts for cross-validation, the average accuracy was calculated, and the results were visualized as heat maps. The average accuracies per experiment were SVM = 100.0%, K-NN = 100.0%, and RF = 100.0% for Experiment I; SVM = 93.7%, K-NN = 91.6%, and RF = 92.6% for Experiment II(a); SVM = 85.6%, K-NN = 80.4%, and RF = 80.4% for Experiment II(b); and SVM = 83.3%, K-NN = 74.0%, and RF = 84.4% for Experiment III. The heat maps in Fig. 9 show the results in detail. A heat map visualizes a large amount of multidimensional data using colors so that relationships can be seen at a glance. Here, the vertical axis represents the object actually bitten by Gel Biter, that is, the correct label, whereas the horizontal axis represents the label predicted by the learning model. Each cell represents the correctness of the answers: the stronger the red color, the higher the accuracy, and the stronger the blue color, the lower the accuracy.
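
The table behind such a heat map is a confusion matrix with true labels on the rows and predicted labels on the columns. A minimal sketch follows; the labels and predictions are invented for illustration and are not results from this study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def row_normalize(matrix):
    """Convert counts to per-true-class fractions, the values a heat map colors."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Invented example with three classes and six bites.
classes = ["p", "q", "r"]
true_labels = ["p", "p", "q", "q", "r", "r"]
predicted = ["p", "p", "q", "r", "r", "r"]
counts = confusion_matrix(true_labels, predicted, classes)
fractions = row_normalize(counts)
accuracy = overall_accuracy(counts)
```

Off-diagonal cells show which pairs of items are confused with each other, which the single average accuracy cannot show.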

3.3 Classification accuracy of sweets and snacks dependent on the sensing material of Gel Biter

In the previous section, we evaluated classification by combining all the waveforms acquired from three different sites: the upper teeth (PLA), lower teeth (silicone), and tongue (gel). As described in Sect. 2.1, the data values and peak values obtained differ for each part of the mouth, even when Gel Biter chews the same object. The purpose of this study is to clarify whether objects can be perceived from multiple perspectives using multiple polymer materials, that is, whether advanced object recognition can be achieved by integrating different signal waveforms. Therefore, by selecting in the program which sensors to use, we compared the accuracy of each material combination pattern, such as the accuracy when only one sensor is used and the accuracy when two sensors are combined (Table 5). The SVM, which had the highest level of accuracy in the previous section, was selected as the classifier for this section. As in the previous section, no parameter tuning was applied to the classifier (Fig. 9).

Table 5 Comparison of classification accuracy between sweets and snacks by combination of sensing materials (using SVC)
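
Selecting sensors in the program amounts to enumerating subsets of the three sensing parts and concatenating the features of the chosen parts. The sketch below shows one possible way to do this; the sensor names and the feature values are hypothetical.

```python
from itertools import combinations

# Hypothetical identifiers for the three sensing parts described in the text.
SENSORS = ("upper_pla", "lower_silicone", "tongue_gel")


def sensor_subsets(sensors=SENSORS):
    """All non-empty combinations: single sensors, pairs, and all three."""
    return [
        subset
        for size in range(1, len(sensors) + 1)
        for subset in combinations(sensors, size)
    ]


def combine_features(per_sensor_features, subset):
    """Concatenate the feature vectors of the selected sensors, in subset order."""
    combined = []
    for name in subset:
        combined.extend(per_sensor_features[name])
    return combined


subsets = sensor_subsets()

# Made-up feature vectors for one bite.
one_bite = {
    "upper_pla": [1.0, 2.0],
    "lower_silicone": [3.0],
    "tongue_gel": [4.0],
}
teeth_only = combine_features(one_bite, ("upper_pla", "lower_silicone"))
```

Three sensors give seven patterns in total, and each pattern can be passed to the same classifier for comparison.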

4 Discussion

First, the results of Sect. 3.1 suggest that the most accurate results are obtained with the normal tooth shape, which provides multiple types of contact, and that tooth structures combining various shapes are suitable for the perception of texture. Next, in the comparison of upper and lower tooth reservoirs, the highest accuracy was obtained with the silicone-silicone combination, which provided the clearest waveform data; the accuracy was also high when PLA was used for the upper teeth and silicone for the lower teeth, suggesting that the material combination may influence the classification accuracy. In the comparison of tongue reservoirs, the order of accuracy was Wizard Gel > silicone > PLA, suggesting that the softness of materials can be utilized for machine learning.

The results for each stage are presented in Sect. 3.2. Stage I classifies sweets whose shapes and hardness are clearly different, as humans can readily tell by seeing and eating them; accordingly, all the classifiers achieved 100% accuracy. Stage II(a), which classifies crackers of different types, is less accurate than Stage I but still achieves more than 90% accuracy. According to Table 1, most of the crackers have similar hardness, except for (f), but (i) and (j) have distinctive shapes compared to the other crackers, and we assume that Gel Biter was able to detect this difference. In Stage II(b), the classification of different types of chocolate pies is slightly less accurate at approximately 80%, but this is the same level of accuracy as in a previous study [12]. A detailed look at Fig. 9c shows that (k) and (l), which have chocolate-coated surfaces, can be identified with high accuracy, but the accuracy is lower for (m), (n), and (o), which have soft and spongy surfaces. At present, it is difficult for Gel Biter to classify foods that are soft and whose surfaces have no noticeable characteristics. Finally, Stage III, in which crackers of the same type are classified according to only minuscule differences, was found to be classifiable in approximately 90% of cases, exceeding previous studies. Although a highly accurate classification was obtained overall, the features created automatically by tsfresh were used without selection, so some of them may not have contributed to accuracy at all or may even have reduced it. Sorting the overall feature set by importance and removing unnecessary features may lead to further improvements in accuracy. Overall, the results of this study show that accuracy tends to be higher in the order of Stage I > II(a) > III > II(b), and that the softer the object, the lower the accuracy.
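
One simple way to sort features by importance is sketched below. The text does not specify an importance measure, so a between-class to within-class variance ratio is swapped in purely for illustration; the feature names and values are invented.

```python
def variance_ratio(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = sum(values) / len(values)
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        # No spread inside any class: perfectly separating or constant feature.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels, names):
    """Feature names ordered from most to least discriminative."""
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scored.append((variance_ratio(column, labels), name))
    return [name for _, name in sorted(scored, key=lambda p: p[0], reverse=True)]


# Invented data: "maximum" separates the two classes, "ripple" does not.
rows = [[1.0, 0.5], [1.1, 0.1], [0.2, 0.4], [0.3, 0.2]]
labels = ["a", "a", "b", "b"]
ranking = rank_features(rows, labels, ["maximum", "ripple"])
```

Keeping only the top-ranked features before training is the reduction step suggested above; how many to keep would need to be validated on held-out data.
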
Here, we compare these results with evaluation results obtained using another method. Using a creep meter, which is used to evaluate food texture, we analyzed the breaking strength of each rice cracker and chocolate pie in Stages II(a) and II(b) to determine the differences between the snacks (Fig. 10). First, the test results for each type of cracker in Stage II(a) differ from each other, which is consistent with the good classification accuracy of Gel Biter. In Stage II(b), as with the rice crackers, the waveforms differed between types of chocolate pie, but those of (m) and (n) were very similar. The heat map in Fig. 9c shows that the decrease in accuracy is related to this result, which explains the lower accuracy in Stage II(b). Finally, the waveforms of the same type of crackers in Stage III show that the shape and hardness of each cracker vary subtly, and Gel Biter was able to detect this with high precision. Much about the data derived from Gel Biter remains unknown at this stage. Therefore, we would like to verify that the data obtained from Gel Biter are accurate by utilizing existing measurement devices for food products.

Fig. 10 Waveforms for each sweet and snack obtained from breaking strength analysis using a creep meter

In addition, Table 5 shows that, among the material combinations, the accuracy was lowest when the gel was used alone and highest when the three parts were integrated. These results indicate that a wide range of foods can be identified by capturing objects from multiple perspectives using Gel Biter. However, other results show cases where high accuracy is achieved with only one part or with two parts combined and, conversely, cases where the accuracy does not change significantly when two parts are combined; thus, simply increasing the number of sensors does not necessarily lead to a proportional increase in accuracy. In addition, as shown in Fig. 5, the waveform obtained from the gel tongue is smaller than those of the other parts. We think that two factors are directly related to its lower accuracy: discharge affecting the waveform, and the fact that, unlike in the other parts, vibrations spreading from the teeth are not transmitted directly to the tongue when an object is bitten. As countermeasures, it may be necessary to attach masking tape or waterproof tape to the sensor to prevent discharge, add a tongue-pressing or licking action when biting, and consider other materials suitable for the bottom.

5 Conclusion

In this study, a soft-matter artificial mouth that mimics the structure of the human oral cavity was created by incorporating multiple polymer materials, and the vibrations generated during actual biting were acquired together as waveform data using piezoelectric film sensors. We fed the data into three types of classifiers and conducted four experiments to verify whether classification was possible and whether the objects could be captured from multiple perspectives. As a result, the maximum accuracy in every stage exceeded the approximately 80% reported in the previous study, and the classification was also able to take advantage of the characteristics of the materials. However, issues remain, such as the decrease in accuracy in Stage II(b) and the location and operation of the gel tongue. In the future, as part of our pursuit of the human oral structure, we would like to investigate combinations of other materials that were not used in this study and improve the appearance of Gel Biter and the way it chews to make it more human-like, which may lead to more realistic data acquisition and improved accuracy. In addition, we will attempt to establish a new kind of robot by making the most of the specifications of the robot arm. Based on the relationship between polymeric materials and accuracy in detecting even the tiniest differences in food, we will examine its performance as an evaluation device to improve the reproducibility of food preparation in the future.