1 Introduction

Oracle bone inscriptions are among the earliest pictographic scripts in the world. Dating back to the Shang Dynasty [1], they were carved by the ancient Chinese on animal bones and tortoise shells to record the present and divine the future. Because oracle characters were gradually abandoned and many oracle samples were lost over the long course of history, only a very limited number of oracle bone inscriptions survive today, and most of the characters are incompletely preserved. What is worse, oracle bone inscriptions were created by people of different ethnic groups in different regions and written on shell and bone plates of various shapes and curvatures, so the characters are hard to recognize and to distinguish from each other.

Early on, archaeologists [1] could identify some widely used and easily recognizable characters. Later, recognition models trained on annotated oracle characters allowed researchers [2,3,4] to identify some new characters. However, the ultimate goal of fully understanding the oracle inscription system remains far out of reach, since many oracle pieces are still undeciphered. Even for deciphered characters, usually only a handful of samples have been collected [5], far too few to train a powerful recognition model. To go further, we formulate oracle character recognition as a Few-Shot Learning (FSL) [6, 7] problem, that is, recognizing oracle characters when only limited data are available.

FSL is a popular machine learning topic that aims to train a model with only a few available samples. According to how they address the shortage of training data, FSL algorithms can be roughly classified into the following categories: (1) learning-to-learn, or meta-learning, algorithms [8,9,10,11,12,13,14] train the few-shot model to learn from a few examples by simulating the few-shot scenario during training; (2) data-augmentation algorithms [5, 15, 16] directly expand the training set by generating new data from the few training samples; (3) semi-supervised algorithms [17,18,19,20,21,22,23] have access to an additional unlabeled dataset and use this unlabeled knowledge to help train the few-shot model.

However, most FSL algorithms assume that a related large labeled dataset is available for pre-training the few-shot model. Since no such dataset exists for oracle characters, these algorithms are not suitable for oracle character recognition. Existing approaches that tackle oracle character recognition within the FSL pipeline mainly focus on data augmentation, including hierarchical representations [3], decomposing characters into structure and texture components [24], and converting characters into stroke vectors [5]. Nevertheless, they still fail to overcome the shortage of training data: they under-utilize structural information, and their mining of stroke information is limited and costly.

Fig. 1. The oracle character ‘Ci’ (left column) is made up of two parts: an eight-like character (middle column) and a knife-like radical (right column). After the FFD transformation (from top to bottom), the strokes are distorted but the overall structure remains unchanged, so the character ‘Ci’ can still be recognized.

In this paper, we propose a new data augmentation approach based on Free Form Deformation (FFD) [25], which was originally used for non-rigid registration [26]. FFD deforms an image by manipulating an underlying mesh of control points [25] and computes the displacement of each pixel individually. When FFD is applied to oracle character images, each pixel of a stroke moves according to its own displacement rule, which distorts the strokes and corrupts local information. Meanwhile, the displacement rules of adjacent pixels are generally similar, which keeps radicals consistent and global information stable (see Fig. 1 for an example). By corrupting local information while maintaining global information, FFD preserves the radical features of oracle bone inscriptions while randomly distorting strokes, making the augmented images more representative.

With this FFD Augmentor, we tackle few-shot oracle character recognition by using the online Free Form Deformation algorithm as our augmentor to generate a set of augmented samples from each annotated training instance of every category. Because the generated training data are of high quality and diversity, the few-shot model can now be trained from scratch on them in a standard supervised manner. To better show the effectiveness of our FFD Augmentor, we select a powerful few-shot training method, EASY [27], which is composed of widely used training modules rather than specially designed algorithms, and train the model on the generated data in a standard pipeline; this ensures that our algorithm can serve as a general module for the oracle recognition task. Extensive experiments on the benchmark datasets Oracle-FS [5] and HWOBC [28] verify the effectiveness of our algorithm. We further conduct experiments on a sketch dataset [29] to show that the effectiveness of our augmentor is not limited to oracle characters.

The main contributions of our work are as follows:

  1. To the best of our knowledge, we are the first to apply a non-rigid transformation, namely FFD, to data augmentation.

  2. Our generated training data are diverse and informative enough that the deep model can be trained from scratch on them, without the help of an additional large unlabeled or labeled dataset.

  3. We demonstrate the effectiveness of our approach through comprehensive experiments on the Oracle-FS and HWOBC datasets, achieving a significant accuracy improvement over competitors.

2 Related Works

2.1 Oracle Character Recognition

Here we mainly survey machine learning algorithms for oracle character recognition. Conventional studies [30,31,32] regarded characters as undirected graphs and used their topological properties as features for graph isomorphism. Guo et al. [3] proposed a hierarchical representation and released a dataset of hand-printed oracle characters from 261 categories.

In the deep learning era, neural networks gradually became the mainstream recognition method. Yang et al. [33] studied several network architectures and suggested using the Gabor method for feature extraction. Zhang et al. [34] used a nearest-neighbor classifier to reject unseen categories and to configure new ones. Other models, such as capsule networks [4] and generative models [24], have also been proposed for this task.

However, few researchers have focused on few-shot oracle character recognition. Orc-Bert [5] converts character images into stroke vectors and learns stroke features from large unlabeled source data. The method requires large unlabeled source data, and its augmented stroke data yield only average performance compared with the benchmark. In contrast, we show that a single oracle character image is informative enough to train the recognition model with our proposed data augmentation algorithm, without pre-training on related large datasets.

2.2 Few-Shot Learning

Few-Shot Learning (FSL) [35, 36] aims to train a machine learning model with only a few training samples. Gidaris et al. [6] created an attention-based few-shot classification weight generator with a small number of gradient steps. MAML [8] searched for the best initial weights to accelerate learning, reducing the risk of overfitting. Chen et al. [37] applied self-supervised learning in a generalized embedding network to provide a robust representation for downstream tasks. LEO [38] reduced complexity by learning a low-dimensional model embedding and classified with a nearest-neighbor criterion. Chen et al. [39] generated multiple features at different scales and selected the most important local representations across the entire task under complex backgrounds.

In this paper, we use the EASY [27] framework, which combines several simple approaches from the above literature, such as backbone training and feature vector projection, and reaches state-of-the-art performance without a huge computation cost.

2.3 Data Augmentation Approaches

Traditional data augmentation methods mainly focus on rigid transformations such as flipping [40], rotation [9], shearing [41], and cropping [40]. Blending or occluding [42] parts of images is also widely used, but these operations require expert domain knowledge to avoid disrupting critical information. In the few-shot setting, some efficient data augmentation algorithms, such as AutoAugment [43], are not applicable because large-scale labeled data are lacking.

Unlike a natural image, a character image can be regarded as intermediate between text and image: although it is an image, it retains textual features, such as the ability to be reconstructed into a sequence of strokes. Han et al. [5] captured the stroke order information of Chinese characters and used a pre-trained Sketch-Bert model to augment few-shot labeled oracle characters. Yue et al. [44] designed a dynamic dataset augmentation method using a Generative Adversarial Network to address data imbalance. However, none of the state-of-the-art data augmentation approaches addresses oracle bone character recognition from the perspective of the overall structure.

As the ancestor of Chinese characters, oracle bone script is at least a logographic writing system, and its upper-level orthographic units, such as radicals, should carry richer information than individual strokes. Simply adding noise to strokes cannot enhance, and may even weaken, the model’s ability to recognize radical components.

2.4 Non-Rigid Transformation

Non-rigid transformation is a critical technique in image registration, which aims to find the optimal transformation from one input image to another [26]. In the medical field, non-rigid transformation has been studied extensively because of its essential role in analyzing medical changes over time. Existing non-rigid transformation techniques include AIR [45], Diffeomorphic Demons [46], and FFD [47].

FFD is a commonly used algorithm for image registration [25]. The registration from a moving image to a fixed image is modeled as the combined motion of a global affine transformation and a B-spline-based free-form deformation. Compared with rigid transformations, FFD has more degrees of freedom and can therefore better model the motion between two images. When applied to data augmentation, this flexibility also helps enlarge the training set.

3 Methodology

3.1 Problem Formulation

Here we define the few-shot oracle character learning problem without related large datasets for pre-training, so it degenerates to standard supervised learning. We are given a labeled dataset \(\mathcal {D}\) with category set \(\mathcal {C}\), \(|\mathcal {C}|=n\). In a k-shot learning task, our augmentor and classifier only have access to k annotated training instances per category. We randomly sample k and q images for each category \(C_i\in \mathcal {C}, i=1,\dots ,n\) to construct the training set \(\mathcal {S}\) and the evaluation set \(\mathcal {Q}\), respectively. We train on \(\mathcal {S}\), aim to generalize to \(\mathcal {Q}\), and use accuracy on \(\mathcal {Q}\) as the evaluation metric.
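To make the split concrete, the following minimal Python sketch shows one way \(\mathcal {S}\) and \(\mathcal {Q}\) could be drawn, assuming the labeled data are available as a list of (image, label) pairs. The helper name `make_fewshot_split` and the fixed seed are illustrative assumptions, not part of the original method. Each category contributes k training and q evaluation images drawn without replacement, so the two sets never overlap.

```python
import random
from collections import defaultdict


def make_fewshot_split(samples, k, q, seed=0):
    """Split (image, label) pairs into a k-shot training set S and a q-per-class evaluation set Q.

    Hypothetical helper for illustration; every category must have at least k + q images.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image, label in samples:
        by_class[label].append(image)

    train, evaluation = [], []
    for label, images in sorted(by_class.items()):
        if len(images) < k + q:
            raise ValueError(f"category {label!r} has only {len(images)} images")
        picked = rng.sample(images, k + q)  # draw without replacement, so S and Q are disjoint
        train += [(img, label) for img in picked[:k]]
        evaluation += [(img, label) for img in picked[k:]]
    return train, evaluation
```

With k = 1 and q = 20 on 200 categories, as in the Oracle-FS setting of Sect. 4.1, this yields 200 training images and 4,000 evaluation images.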

3.2 Overview of Framework

As shown in Fig. 2, our data augmentation method, FFD Augmentor, works as follows. For each image, we first create a local coordinate system by splitting the whole image into local patches with a grid. The grid vertices serve as control points that define the local positions of the neighboring pixels. Random offsets are then generated to shift the control points, which in turn shifts the neighboring pixels. Because the control points are shifted randomly, the whole image is deformed and its local information is destroyed.

Fig. 2. Illustration of our FFD Augmentor. We generate random offsets for each control point and construct the deformed image by recomputing the world coordinates of each vertex of the image.

Using our proposed FFD Augmentor, we generate several augmented samples for each training image and store them to expand the training set. With this expanded training set, the few-shot model can be trained from scratch.

3.3 FFD Augmentor

Although there are few studies on the composition of oracle characters, oracle characters, as the ancestors of Chinese characters [48], intuitively retain similar characteristics; for example, radicals contain richer information than strokes. This motivates us to design a data augmentation algorithm that generates local diversity while preserving global structure.

In non-rigid registration, researchers [25, 49, 50] apply FFD to achieve this goal by calculating a displacement rule for each pixel individually. When FFD is applied to oracle character images, each pixel of a stroke moves according to its own displacement rule, which distorts the strokes and corrupts local information. On the other hand, because these displacements are continuous and follow physical rules, the displacement rules of two adjacent pixels are generally similar. Globally, the relative positions of radicals remain consistent and the character’s structure is well preserved (see Fig. 1 for an illustration). Hence, in this paper we adopt FFD to generate new training data for the few-shot oracle character recognition problem.

Free Form Deformation. As oracle characters can be represented as grayscale images, we implement the 2D version of B-spline-based Free Form Deformation [25]. Specifically, for an oracle character grayscale image \(\boldsymbol{x}\in \mathbb {R}^{h\times w}\), we design a two-dimensional mapping

$$\begin{aligned} \textrm{T}: (x_{1},x_{2})\quad \rightarrow \quad (x_{1}^{'},x_{2}^{'}), \end{aligned}\tag{1}$$

to simulate the non-rigid transformation. We decompose \(\textrm{T}\) into a global deformation mapping and a local deformation mapping:

$$\begin{aligned} \textrm{T}(x_{1},x_{2})=\textrm{T}_{\text {global}}(x_{1},x_{2})+\textrm{T}_{\text {local}}(x_{1},x_{2}). \end{aligned}\tag{2}$$

The global deformation mapping is a simple affine transformation defined as:

$$\begin{aligned} \textrm{T}_{\text {global}}(x_{1}, x_{2})=\begin{pmatrix} \theta _{11} & \theta _{12} \\ \theta _{21} & \theta _{22} \end{pmatrix} \begin{pmatrix} x_{1} \\ x_{2} \end{pmatrix} +\begin{pmatrix} \theta _{13} \\ \theta _{23} \end{pmatrix}, \end{aligned}\tag{3}$$
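As a small illustration, Eq. (3) can be applied to a batch of coordinates with NumPy as below. This sketch assumes that coordinates are stored row-wise as \((x_1, x_2)\) pairs and that the six parameters \(\theta\) are packed into a 2×3 array \([A \mid b]\); the function name `global_affine` is hypothetical.

```python
import numpy as np


def global_affine(coords, theta):
    """Apply the affine map of Eq. (3) to every row of `coords`.

    coords: array-like of shape (N, 2) holding (x1, x2) pairs.
    theta:  2x3 array [[t11, t12, t13], [t21, t22, t23]], i.e. linear part A and translation b.
    """
    coords = np.asarray(coords, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Row-wise A @ x + b: (N, 2) @ (2, 2)^T, then broadcast-add the translation column.
    return coords @ theta[:, :2].T + theta[:, 2]
```

For instance, `global_affine([[1, 0]], [[0, -1, 0], [1, 0, 0]])` rotates the point (1, 0) by 90° to (0, 1), while the identity matrix with a zero translation leaves every coordinate unchanged.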

while the local deformation mapping is the major concern of our algorithm. Specifically, we first distribute a series of grid points over the image at a spacing determined by the predetermined number of patches. Denoting the domain of the oracle character grayscale image as \(\varOmega =\{(x_1,x_2)\mid 0\le x_1\le X_1, 0\le x_2 \le X_2\}\), we split it by control points \(\Phi =\{\phi _{i,j}\}\) into patches of size \(n_1\times n_2\), where \(n_i\) is the distance between adjacent control points in the i-th dimension. We then define the local deformation mapping as a tensor product of cubic B-spline functions:

$$\begin{aligned} \textrm{T}_{\text {local}}(x_{1},x_{2})=\sum _{l=0}^{3}\sum _{m=0}^{3}B_{l}(u)B_{m}(v)\phi _{i+l,j+m}, \end{aligned}\tag{4}$$

where \(i=\left\lfloor x_1 / n_{1}\right\rfloor -1,\ j=\left\lfloor x_2 / n_{2}\right\rfloor -1,\ u=x_1 / n_{1}-\left\lfloor x_1 / n_{1}\right\rfloor ,\ v=x_2 / n_{2}-\left\lfloor x_2 / n_{2}\right\rfloor \), and the cubic B-spline basis functions are defined as:

$$\begin{aligned} \begin{gathered} B_{0}(u)=\frac{(1-u)^{3}}{6} , B_{1}(u)=\frac{3 u^{3}-6 u^{2}+4}{6} , \\ B_{2}(u)=\frac{-3 u^{3}+3 u^{2}+3 u+1}{6} , B_{3}(u)=\frac{u^{3}}{6}. \end{gathered} \end{aligned}\tag{5}$$
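The following NumPy sketch evaluates the basis functions of Eq. (5) and the double sum of Eq. (4) at a single pixel. For illustration only, it assumes the control points are stored in an array padded by one entry so that the index \(i=-1\) is valid, i.e. `phi[a + 1, b + 1]` holds \(\phi_{a,b}\); the helper names are hypothetical. The test checks two standard properties of cubic B-splines: the four basis functions sum to 1 for every \(u\), and when the control points sit undisplaced at their grid positions, Eq. (4) returns the pixel's own location. Because the resulting mapping is a smooth function of the pixel position, adjacent pixels receive similar displacements once the control points are shifted.

```python
import numpy as np


def bspline_basis(u):
    """Cubic B-spline basis B_0..B_3 of Eq. (5); u may be a scalar or an array in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    ])


def local_mapping(x1, x2, phi, n1, n2):
    """Evaluate T_local of Eq. (4) at pixel (x1, x2).

    phi: control points of shape (P1, P2, 2), padded so that phi[a + 1, b + 1] is phi_{a,b}.
    n1, n2: spacing between adjacent control points in each dimension.
    """
    i = int(np.floor(x1 / n1)) - 1
    j = int(np.floor(x2 / n2)) - 1
    u = x1 / n1 - np.floor(x1 / n1)
    v = x2 / n2 - np.floor(x2 / n2)
    bu, bv = bspline_basis(u), bspline_basis(v)
    out = np.zeros(2)
    for l in range(4):          # 4 x 4 neighbourhood of control points
        for m in range(4):
            out += bu[l] * bv[m] * phi[i + l + 1, j + m + 1]
    return out
```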

Augmentation then comes from randomly applying offsets within a predefined range \(O=[O_{\min }, O_{\max }]\) to shift the control points. For a specific grid point \(\phi _{i,j}=(x_{\phi _i},x_{\phi _j})\), we randomly draw its shift within the offset range:

$$\begin{aligned} \textrm{T}: (x_{\phi _i},x_{\phi _j})\quad \rightarrow \quad (x_{\phi _i}+\varDelta x_{\phi _i},x_{\phi _j}+\varDelta x_{\phi _j}),\quad \varDelta x_{\phi _i},\varDelta x_{\phi _j} \in O. \end{aligned}\tag{6}$$

Then, for each pixel \((x_1, x_2)\) in the image, we calculate its deformed location by Eq. (4) using the control points shifted by Eq. (6). After the displacement rules for all pixels are determined, we re-sample the image pixel by pixel according to these rules to achieve the non-rigid deformation. If a transformed pixel coordinate falls outside the image, its grayscale value is set to 255. Finally, we fill the unvisited pixels (empty holes) in the generated image by bilinear interpolation [51].
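The procedure above can be sketched end to end in NumPy as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the global affine term of Eq. (2) is omitted, so all deformation comes from the shifted control points; offsets are drawn uniformly from \([-O_{\max}, O_{\max}]\), as described in Sect. 4.3; and, instead of forward-mapping pixels and then filling holes, it uses the common backward-mapping variant in which every output pixel samples the input at its deformed location with bilinear interpolation, so no holes arise. Locations outside the image receive the value 255, matching the rule above. All function names are hypothetical.

```python
import numpy as np


def bspline_basis(u):
    """Cubic B-spline basis B_0..B_3 of Eq. (5); u may be an array."""
    return np.array([
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    ])


def bilinear_sample(img, y, x, fill=255.0, eps=1e-6):
    """Bilinearly sample img at real-valued (y, x); points outside the image get `fill`."""
    h, w = img.shape
    # The small tolerance absorbs floating-point round-off at the image border.
    inside = (y >= -eps) & (y <= h - 1 + eps) & (x >= -eps) & (x <= w - 1 + eps)
    y, x = np.clip(y, 0, h - 1), np.clip(x, 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0, x0] * (1 - dx) + img[y0, x1] * dx
    bottom = img[y1, x0] * (1 - dx) + img[y1, x1] * dx
    return np.where(inside, top * (1 - dy) + bottom * dy, fill)


def ffd_augment(img, num_patches=5, max_offset=11.0, rng=None):
    """Return one randomly FFD-deformed copy of a grayscale image (white background = 255)."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    n1, n2 = h / num_patches, w / num_patches  # control-point spacing per dimension

    # Control grid with indices -1..num_patches+1 (num_patches + 3 points per dimension),
    # stored so that phi[a + 1, b + 1] is the control point with grid index (a, b).
    a = np.arange(-1, num_patches + 2)
    phi = np.stack(np.meshgrid(a * n1, a * n2, indexing="ij"), axis=-1)
    # Eq. (6): shift every control point by offsets drawn uniformly from [-max_offset, max_offset].
    phi = phi + rng.uniform(-max_offset, max_offset, size=phi.shape)

    # Eq. (4), vectorized over all pixels: weighted sum of the 4 x 4 nearby control points.
    x1, x2 = np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij")
    f1, f2 = np.floor(x1 / n1).astype(int), np.floor(x2 / n2).astype(int)
    bu, bv = bspline_basis(x1 / n1 - f1), bspline_basis(x2 / n2 - f2)
    loc = np.zeros((h, w, 2))
    for l in range(4):
        for m in range(4):
            loc += (bu[l] * bv[m])[..., None] * phi[f1 + l, f2 + m]

    # Backward mapping: output pixel (x1, x2) takes the input value at its deformed location.
    return bilinear_sample(img, loc[..., 0], loc[..., 1])
```

For example, `[ffd_augment(img, num_patches=5, max_offset=11.0) for _ in range(30)]` would produce 30 deformed copies with the setting reported in Sect. 4.1. Setting `max_offset=0` leaves the image unchanged, which corresponds to the ablation baseline described in Sect. 4.3.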

Augmentor. As mentioned in Sect. 1, non-rigid transformation destroys local information while maintaining global information. By training on multiple FFD-augmented samples, the model extracts and learns the structural information of oracle characters instead of relying on particular strokes for classification. This is critical for few-shot oracle character recognition, since it alleviates the bias and overfitting caused by the limited number of training samples.

Algorithm 1 gives the detailed pseudo-code of our FFD Augmentor. Ablation studies on the number of FFD-augmented training samples and the choice of FFD hyperparameters are presented in Sect. 4.3.

Algorithm 1. FFD Augmentor.
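The offline expansion step of the augmentor can be sketched as below. The sketch assumes that the k-shot training set is a list of (image, label) pairs and that `deform` is any function returning one randomly FFD-deformed copy of an image (for example, an implementation of Eqs. (4)–(6)). Both `expand_training_set` and the `keep_original` flag are hypothetical; the text does not state whether the original images are kept alongside the augmented ones.

```python
import numpy as np


def expand_training_set(train_set, deform, num_aug=30, keep_original=True, seed=0):
    """Build an FFD-expanded training set offline (illustrative sketch).

    train_set: list of (image, label) pairs, i.e. the k-shot set S.
    deform:    callable (image, rng) -> deformed image; stands in for one random FFD draw.
    """
    rng = np.random.default_rng(seed)
    expanded = []
    for image, label in train_set:
        if keep_original:  # assumption: keep the un-augmented image as well
            expanded.append((image, label))
        # Each call draws fresh random control-point offsets, so the copies differ.
        expanded.extend((deform(image, rng), label) for _ in range(num_aug))
    return expanded
```

With 3 training images for a category and 30 augmented samples per image, the category ends up with 90 augmented images, matching the example in Sect. 4.3.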

3.4 Training with FFD Augmentor

To show the effectiveness of the FFD Augmentor, we combine it with a popular training algorithm, Ensemble Augmented-Shot Y-shaped Learning (EASY) [27].

Specifically, we test our algorithm on several widely used CNN architectures: ResNet-12 [52], ResNet-18, ResNet-20, and WideResNet [53]. On the FFD-augmented training set, we also apply standard data augmentation strategies, including cropping, flipping, and color jittering. During training, each mini-batch is divided into two parts: the first part is fed to the standard classifier with the feature augmentation strategy Manifold-MixUp [54]; the second part is rotated and fed to both heads (the classification head and the rotation-prediction head). For details on training the EASY model, we refer readers to the original paper [27].
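For readers unfamiliar with EASY, the batch split described above can be pictured with the NumPy sketch below. It is only an illustration, not EASY's code: the Beta parameter `alpha`, the use of all four 90° rotations per image, and the helper names are assumptions made for this sketch. The mixing is shown on an arbitrary feature array because Manifold-MixUp mixes hidden representations inside the network rather than input pixels.

```python
import numpy as np


def split_easy_batch(images, labels, rng, alpha=2.0):
    """Split one mini-batch into a mixup half and a rotation half (illustrative only).

    images: array of shape (B, H, W); labels: array of shape (B,) with class indices.
    """
    half = len(images) // 2
    mix_x, mix_y = images[:half], labels[:half]
    rot_x, rot_y = images[half:], labels[half:]

    # Mixup half: one Beta-distributed coefficient and a random partner for every sample.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(half)

    # Rotation half: every image appears four times (0, 90, 180, 270 degrees); the rotation
    # label feeds the rotation head and the class label feeds the classifier.
    rotated = np.concatenate([np.rot90(rot_x, k, axes=(1, 2)) for k in range(4)])
    rot_labels = np.repeat(np.arange(4), len(rot_x))
    class_labels = np.tile(rot_y, 4)
    return (mix_x, mix_y, perm, lam), (rotated, class_labels, rot_labels)


def manifold_mixup(features, onehot, perm, lam):
    """Mix hidden features and one-hot targets of the mixup half with the same coefficient."""
    mixed_features = lam * features + (1 - lam) * features[perm]
    mixed_targets = lam * onehot + (1 - lam) * onehot[perm]
    return mixed_features, mixed_targets
```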

4 Experiments

We conduct extensive experiments to validate the effectiveness of our FFD Augmentor and provide ablation studies to analyze each part of our algorithm.

4.1 Experimental Settings

Table 1. Accuracy (%) of oracle character recognition on Oracle-FS under all three few-shot settings with a ResNet18 classifier, for Orc-Bert and for EASY with and without the FFD Augmentor. Because ResNet18 is the only classifier we share with Orc-Bert, we compare with their best-performing method, i.e., the Orc-Bert Augmentor with point-wise displacement on ResNet18.
Table 2. Accuracy (%) of oracle character recognition on Oracle-FS under all three few-shot settings with different architectures. The Basic model is the plain model without any augmentation.
Table 3. Accuracy (%) of oracle character recognition on HWOBC under the 1-shot setting with different architectures.

Datasets. We evaluate our FFD Augmentor on Oracle-FS [5] and HWOBC [28]. Oracle-FS contains 200 oracle character categories. We run experiments under 3 few-shot settings, k-shot for \(k=1,3,5\), where only k labeled training samples are available per category. For evaluation, we randomly select 20 instances per category to construct the test set. HWOBC consists of 3881 oracle character categories, each containing 19 to 25 image samples. We randomly selected 200 categories and split each into a 1-shot training set and a 15-sample test set. Because our model's accuracy in the 1-shot setting is already high, we did not test other k-shot settings.

Competitors. We mainly compare our results with Orc-Bert [5], the SOTA algorithm for few-shot oracle character recognition. Orc-Bert masks some strokes, predicts them with a pre-trained model, and adds noise to each stroke to generate multiple augmented images. We also train EASY [27] both with and without the FFD Augmentor for comparison.

Implementation Details. We implement the FFD Augmentor and the training methods in PyTorch [55]. We train for 100 epochs with a batch size of 64. Unless otherwise specified, we follow the default hyper-parameters of EASY. The FFD Augmentor uses 5 patches, a max offset of 11, and 30 augmented samples. For 1-shot and 3-shot, we use a cosine-scheduled learning rate starting at 0.1, while for 5-shot we use a learning rate of 0.01. Images are resized to \(50 \times 50\).
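The settings above can be collected as follows, together with a cosine learning-rate schedule. The dictionary keys are hypothetical names. The sketch assumes the common cosine-annealing formula \(\eta_t = \eta_{\min} + \tfrac{1}{2}(\eta_0 - \eta_{\min})(1 + \cos(\pi t / T))\) with \(\eta_{\min}=0\), evaluated per epoch, and also applies it to the 5-shot rate of 0.01; the text does not say whether the 5-shot rate is scheduled, nor whether EASY steps the schedule per epoch or per iteration. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` implements the same closed form.

```python
import math

# Settings reported in Sect. 4.1 (key names are hypothetical).
CONFIG = {
    "epochs": 100,
    "batch_size": 64,
    "image_size": (50, 50),
    "ffd_num_patches": 5,
    "ffd_max_offset": 11,
    "ffd_num_augmented": 30,
    "base_lr": {1: 0.1, 3: 0.1, 5: 0.01},  # keyed by k in the k-shot setting
}


def learning_rate(epoch, k_shot, cfg=CONFIG):
    """Cosine-annealed learning rate at a 0-based epoch, decaying from base_lr towards 0."""
    base = cfg["base_lr"][k_shot]
    return 0.5 * base * (1 + math.cos(math.pi * epoch / cfg["epochs"]))
```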

4.2 Evaluation of FFD Augmented Training

Table 1 shows that our FFD-based data augmentor outperforms the state-of-the-art method by more than 30% under all few-shot settings. Table 2 further shows that our data augmentation method plays a decisive role in improving accuracy.

On Oracle-FS, our 1-shot accuracy reaches \(76.5\%\) for all the classifiers, outperforming EASY without augmentation by 20%. Our 3-shot accuracy reaches \(93.42\%\), exceeding the accuracy of all 5-shot models without FFD augmentation. In the 5-shot setting, our model reaches \(97.59\%\) on WideResNet.

On the HWOBC dataset, the effect of our data augmentation is even more prominent. We compare the results with EASY and with conventional data augmentation. As Table 3 shows, our FFD Augmentor improves the 1-shot accuracy from \(65.24\%\) to \(99.52\%\).

4.3 Further Analysis of FFD Augmentor

Visualization. Fig. 3 provides more visualizations of oracle characters and their FFD-augmented images. For all kinds of oracle characters, the FFD Augmentor consistently generates diverse and informative images, so it can provide many realistic augmented images to alleviate the lack of training data in few-shot oracle character recognition.

Fig. 3. Examples of oracle character images and their FFD-augmented samples.

Ablation Study. In this part, we conduct further experiments on our FFD Augmentor, covering the min-max offset, the number of patches, and the number of augmented samples. These experiments use ResNet18. To make the results more reliable, each hyperparameter experiment is run twice, and we report the average accuracy of the two runs.

(1) Max Offset Value: In our FFD Augmentor, each random offset is drawn from a uniform distribution over the interval between the minimum and maximum offset values. The maximum offset is a hyper-parameter, and the minimum offset is set to its negative; together they bound the movement range of the control points. The closer they are, the smaller the deformation of the image. In our experiments, we vary the maximum offset from 0 to 15. A max offset of 0 means that no free-form transformation is applied to the original image, which serves as an ablation of our data augmentor.

(2) Num of Patches: The number of patches determines the number of control points in the grid. The more patches in the FFD transformation, the more control points there are, and the more complex the deformation can be. We test 3, 5, and 7 patches. Considering both the effectiveness of the transformation and the time overhead, we did not test larger numbers. The time to generate the augmented dataset is proportional to the square of the number of patches. When the number of patches becomes large, the augmentation effect on the image is no longer noticeable, while generation takes considerable time, far exceeding the training cost.

Fig. 4. Illustration of the effect of the number of patches and the maximum offset.

Fig. 5. Left: accuracy of different combinations of patch number and max offset. Right: effect of the number of augmented samples on model accuracy.

Taking Fig. 4 as an example, when both the max offset and the number of patches are small, the deformation is close to a rigid transformation and the overall shape of the character remains unchanged. When both become too large, however, the deformation is so complex that the overall structure is severely damaged and the oracle character becomes hard to identify. Since the max offset and the number of patches influence the deformation in different ways, we study the trade-off between them. The results are shown in Fig. 5. As the max offset increases, accuracy rises for all patch numbers, but it starts to decrease once the offset exceeds a certain threshold. The top three (patch number, max offset) combinations are (3, 11), (5, 15), and (7, 11).

Num of Augmented Samples. We then vary the number of augmented samples to determine whether more augmented samples lead to better performance. Here, this number refers to the augmented images generated for each training image; for example, in the 3-shot setting, 30 augmented samples per image yield 90 augmented images for each category.

Intuitively, more augmented samples should give higher accuracy. However, the experimental results (see Fig. 5) show that, because the number of original samples is limited, too many augmented images lead to overfitting, i.e., test accuracy drops while the training loss keeps decreasing. Two FFD combinations show an increasing-then-decreasing trend, and for the combination of 3 patches and a max offset of 15 the gain diminishes. With 30 augmented samples, the combination of 5 patches and a max offset of 11 reaches the highest accuracy.

Besides, the computation time of data augmentation is a crucial factor. FFD is a time-intensive transformation whose cost grows with the image size and the number of patches. This drawback matters less in few-shot learning, where only a small number of images need to be augmented. Our FFD Augmentor takes about 0.4 to 0.5 s to generate each 50\(\times \)50 image with 5 patches. Because data augmentation is expensive, we balance performance against computation time. Combining all the results above, we find that with 5 patches, a max offset of 11, and 30 augmented samples, our model achieves the best 1-shot performance of \(78.9\%\).
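To put these timings in perspective, a rough estimate of the offline augmentation time on Oracle-FS follows directly from the numbers above, assuming 200 categories, 30 augmented samples per training image, and sequential generation at 0.4 to 0.5 s per image with no parallelism:

$$\begin{aligned} N_{\text{aug}} &= 200 \times k \times 30, \qquad t_{\text{aug}} \approx N_{\text{aug}} \times (0.4 \text{ to } 0.5)\ \text{s}, \\ k=1:&\quad N_{\text{aug}} = 6000,\quad t_{\text{aug}} \approx 2400 \text{ to } 3000\ \text{s}\ (40 \text{ to } 50\ \text{min}), \\ k=5:&\quad N_{\text{aug}} = 30000,\quad t_{\text{aug}} \approx 12000 \text{ to } 15000\ \text{s}\ (\approx 3.3 \text{ to } 4.2\ \text{h}). \end{aligned}$$

The estimate grows linearly with both k and the number of augmented samples per image.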

Table 4. Top-1 Accuracy under different learning rates in the 5-shot task.
Table 5. Top-1 accuracy under different numbers of support samples without FFD augmentation (1-shot).

Learning Rate. The learning rate affects both the convergence speed and the accuracy of the model. A low learning rate may cause the model to overfit the training set or converge too slowly, while a high learning rate may prevent convergence. We study the influence of the learning rate under different k-shot settings. As shown in Table 4, for k = 5, a learning rate of 0.01 achieves the highest accuracy.

Num of Augmented Samples Using Random Crop. We also vary the number of image samples produced by random cropping and flipping before backbone training. The results in Table 5 show that accuracy is highest when this number is 10, and larger numbers risk overfitting.

4.4 Applicability to Other Problems

Although our paper targets oracle character recognition, the proposed augmentor has much broader applications. To demonstrate its versatility, we report additional experiments on sketch recognition, a task that takes a sketch s as input and predicts its category label c. We ran a toy experiment on the sketch dataset [29], which contains 20,000 unique sketches evenly distributed over 250 object categories, using the same dataset setting as before. As Table 6 shows, adopting the FFD Augmentor improves accuracy by 15% over EASY without FFD augmentation. Further applications of the augmentor will be explored in future work.

Table 6. Accuracy (%) of sketch recognition on the sketch dataset. The FFD Augmentor uses 5 patches, a max offset of 11, and 30 augmented samples.

5 Conclusion

In this study, we address oracle character recognition with only a few labeled training samples. We propose FFD Augmentor, a new data augmentation tool for few-shot oracle recognition based on free form deformation, a method commonly used in image registration. FFD Augmentor generates a series of augmented images by applying random FFD to the original images, which are then used to train the classifier. Extensive experiments in three few-shot scenarios support the efficacy of our FFD Augmentor. Our generated training data are informative enough that the deep model can be trained from scratch on them, without any additional large unlabeled dataset for pre-training. Our approach has broad prospects in written character recognition, and a wider range of applications will be explored in future studies.