1 Introduction

Measuring and classifying packages is a fundamental activity in logistics, crucial for determining how each package will be handled, stored, and delivered, and the costs associated with these operations. Various technologies are available to automate this task. Devices using light (e.g., [17]) or ultrasonic sensors (e.g., [27]) can measure the distance to the package faces along each axis, enabling accurate measurements of rectangular boxes with high resolution and fast processing speeds (less than one second per package). For irregular shapes, devices equipped with sensor arrays, such as light beams (e.g., [28]), are also available. However, these devices tend to be more expensive and slower, as the package must move alongside the sensing device while being measured. All of the previous systems require specific installation and operational procedures, such as controlled spaces where boxes must be aligned with specific marks before sizing. A more flexible alternative is the use of 3D cameras (e.g., [29, 36]) based on stereo-vision, time-of-flight (ToF), Light Detection and Ranging (LiDAR), and other technologies (refer to [44]). These cameras can detect the shapes of objects within their line of sight by processing images and fitting the observations to predefined candidate object models. They operate indoors at fast speeds but come at higher costs than the approaches mentioned previously.

Cost-effective alternatives include traditional vision-based measurement solutions, but these may incur uncertainties due to calibration issues [22], or face operational challenges in dynamic or cluttered processing environments. These challenges encompass movement, varying lighting conditions, and obstructions, limiting their deployment to controlled and homogeneous spaces. The integration of deep learning-based image analysis (see [25] and references therein) has enabled object detection and classification in less controlled environments by fitting the package to a set of candidate shapes. However, these systems require large training datasets, even when adapting pre-existing models. Although some public datasets with package images are available (e.g., [1, 35]), they primarily focus on box detection rather than class/size recognition.

In addition to the technologies previously mentioned, this work explores the application of Ultra High Frequency (UHF) Radio Frequency Identification (RFID) systems for classification and sizing tasks. RFID technology is widely utilized in warehouses and logistics facilities. In RFID systems, a reader can query nearby tags for their identification. Tags utilize the power from the reader’s signal to energize themselves and respond to queries through backscattering. Originally conceived for item tracking, RFID has evolved to facilitate a variety of remote sensing tasks [9], which can be performed either by implementing specialized tags capable of executing sensing operations and transmitting supplementary data to the reader, or by post-processing the tags’ signals in the reader.

Examples of the first type include works [24, 30]. In [24], the authors develop an RFID tag capable of measuring and transmitting the pH of athletes’ sweat. Similarly, the work in [30] designs a tag for measuring and relaying information about object vibrations and tilt to the reader.

The second approach is based on collecting low-level information from the tags’ transmissions (typically, received signal strengths, RSSs, and phases) and implementing remote sensing tasks by correlating this information with the target magnitude. For example, in [39], researchers utilize the phases of a tag’s backscattered signals to estimate temperature. Another example, described in [31], introduces a method to measure soil moisture based on the RSS of a tag’s backscattered signal. It is also possible to analyze responses from multiple tags. For instance, in [8], the authors deploy tags on the walls of a room and correlate the variance in the tags’ signal RSSs with the number of people in that room. Another strategy involves examining the temporal evolution of tags’ responses. For example, [46] detects drivers’ fatigue by analyzing time-series readings from tags placed in a hat, and [40] introduces a method to deduce breathing periods by analyzing both RSS and phase time-series from backscattered signals of several tags.

Building on these concepts, this work proposes a novel RFID-sensing application that utilizes low-level information from the RFID identification process as a signature to classify package types among a set of candidates (similarly to computer vision systems), premised on the assumption that each package will be labeled with or contain multiple RFID-tagged items. In this work, RSS statistics are used as the main information for the signature, under the hypothesis that the RSS range and variance should correlate with the maximal distance among tags, which, in turn, should depend on the package type and size. Moreover, additional statistics from the interrogation process, such as frame count or total reading time, are also integrated into the signature as they can be indicative of the package type. For example, a higher frame count or prolonged reading time may suggest multiple interrogation attempts to identify the most distant tags, implying larger package sizes.

The proposed approach provides useful classification capabilities without incurring additional hardware costs, thus opening the door to various applications, such as unattended package classification, ensuring adherence to package manifests, and identifying packages with unusual distribution that require alternative processing. Furthermore, unlike the previously discussed methods, which require direct line of sight, RFID offers a unique advantage in measuring packages even when they are not directly accessible, such as when they are inside other containers.

A supervised machine learning model based on shallow Artificial Neural Networks (ANNs) has been considered as the predictive structure that relates the identification signature to the package class. Moreover, classes have been sorted by size to keep the sizing error minimal (since classification errors will be more likely between neighbor types). The main challenge in this scheme is to create a suitable training strategy that does not require gathering large datasets from real scenarios, which would be prohibitive in practice. To that end, a two-stage transfer learning strategy has been adopted. It consists of building first a base model, which is trained using a synthetic dataset derived from a simulator adjusted to closely match the real scenario, and then a fine-tuned model, where a small set of experimental samples obtained from the actual setup is used to recalibrate the base model.

To explore the operation of this strategy, we start by describing related works in Section 2 and by defining reference scenarios in Section 3. These scenarios encompass typical elements found in standard RFID installations. The performance of the classifier defined in Section 3.4 is analyzed in Section 5 using training data gathered using the simulator described in Section 4. Section 6 explores the impact of deviations from the reference scenario and the application of transfer learning methodologies to correct this issue. Section 7 describes the application of this method in a real testbed aimed at mail classification tasks. Finally, Section 8 provides the concluding remarks.

Fig. 1 Reference scenario. The gate is composed of one dislocated antenna pair. Packages with RFID-tagged items arrive at the gate and are interrogated. The package placement, size, and number of tagged items are random. The reader implements a regular interrogation procedure using Framed Slotted Aloha (FSA) with 16 slots per frame. The signature from the package interrogation is used to predict the type and size of the package. Figures in the last row show examples of the top view configuration for two packages. The sizes, position, and orientation have noticeable changes, as may happen in realistic scenarios.

2 Related work

Multiple machine learning predictors have been explored in various RFID sensing studies. ANNs have been utilized in works such as [38] and [4], while Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) structures, have been examined in [46] and [2]. Deep learning architectures have also been employed, for example in [45] (refer to survey [12] for details). The majority of these studies rely on extensive real datasets to develop the predictive models. Akin to our case, some works on RFID sensing, like [41], employ transfer learning from synthetic to experimental scenarios to mitigate data scarcity.

Regarding the specific application of package size/class estimation, we are not aware of any previous proposals in the literature dealing with this challenge, aside from our preliminary model presented in a prior conference paper [34]. Nevertheless, some works have tackled different aspects of package characterization. For instance, in [3] authors describe how to perform a 3D reconstruction of RFID-tagged packages to determine their orientation and stacking, using the phases of the backscattered signals. Moreover, a method for identifying the direction of goods passing through an RFID gate in a warehouse was introduced in [2]. RFID technology was also applied by Li et al. [20] for locating packages on shelves, with the approach being extended in [21] to include a drone-based reader. Other research focusing on package location includes [7, 13, 15, 23, 26, 43], aimed at determining the relative or absolute position of tagged items or estimating their pose, as investigated in [33]. Additionally, the concept of interrogation signatures was examined by Khadka et al. [19], who studied the unique physical-layer identification signature of passive UHF RFID tags and its implications for tag holder’s privacy.

In summary, while numerous studies have demonstrated the versatility of RFID to create innovative solutions beyond simple identification tasks, its application to package class/size estimation remains largely unexplored in the literature.

3 Reference scenarios

Our target scenario is composed of a single reader with one pair of bistatic dislocated antennas, as shown in Fig. 1. The gate has dimensions: d (distance between antennas) ranging from 2 to 3 meters depending on the configuration, \(h_a\) (antenna height) of 2.5 m, and \(h_b\) (package placement height) of 1 m (these are normal setups in RFID installations). Packages with UHF RFID-tagged items inside arrive at the reading area for identification, and their location, orientation, and other parameters are subject to a random component. Three increasingly complex setups have been considered:

  1. Ideal scenario. In this case, packages are always placed in the same reading spot (x in Fig. 1), resting on the largest face, and without any kind of rotation (the L edge is parallel to the dashed line connecting the RX and TX antennas, see Figs. 1b-c). The number of tags and their position inside the package are random (see Section 3.1). Tags always rest on the horizontal plane and parallel to the edge W of the package (see Figs. 1b-c). Although simplistic, this scenario may occur in practice, e.g., when packages are automatically placed on a conveyor belt and the interrogation process is triggered by a photodetector that always stops the belt at the same spot.

  2. Simple scenario. A slight amount of randomness is assumed for the packages in this scenario. Packages are placed near, but not exactly on, the reading spot. The actual position \(x'\) is given by adding independent random variables \(\mathcal {N}(0,\frac{L}{10})\) to both the X and Y axes. The packages are assumed to rest on their largest face, but are now only roughly aligned with the TX-RX line (\(\phi \sim \mathcal {N}(0,\frac{\pi }{12})\) radians, see Fig. 1). As in the ideal scenario, the number of tags and their position inside the package are random, but the orientation of the whole bunch may vary and be either vertical or horizontal (all tags are coherently aligned). This scenario may correspond to a semi-automatic placement process, with some small differences among packages.

  3. Complex scenario. In this scenario, more intense changes occur in the position (the position displacement is \(\mathcal {N}(0,\frac{L}{5})\) for both axes) and in the orientation (\(\phi \sim \mathcal {N}(0,\frac{\pi }{6})\)). In addition, packages are randomly rotated (i.e., they can rest on any face). The number of tags and their position are random, as in the previous scenarios, but now each particular tag orientation is selected independently (tags are not coherently aligned). This case would correspond to a scenario where packages are manually prepared and positioned.
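As an illustration of how the three setups differ, the following minimal sketch draws the random placement of one package. It makes several assumptions that are not stated in the text: the second argument of \(\mathcal {N}(0,\cdot )\) is read as a standard deviation, the largest face is labeled "LW", and the names `SCENARIOS` and `sample_placement` are hypothetical. Tag orientation is not modeled here.

```python
import math
import random

# Scenario parameters taken from the text. ASSUMPTION: the second argument of
# N(0, .) is treated as a standard deviation; the text does not specify it.
SCENARIOS = {
    "ideal": {"pos_frac": None, "phi_sigma": None, "any_face": False},
    "simple": {"pos_frac": 1 / 10, "phi_sigma": math.pi / 12, "any_face": False},
    "complex": {"pos_frac": 1 / 5, "phi_sigma": math.pi / 6, "any_face": True},
}


def sample_placement(scenario, length, rng):
    """Return (dx, dy, phi, resting_face) for a package with longest edge `length` [m]."""
    cfg = SCENARIOS[scenario]
    if cfg["pos_frac"] is None:
        # Ideal scenario: fixed reading spot and no rotation.
        dx = dy = phi = 0.0
    else:
        sigma = cfg["pos_frac"] * length
        dx = rng.gauss(0.0, sigma)  # displacement along X [m]
        dy = rng.gauss(0.0, sigma)  # displacement along Y [m]
        phi = rng.gauss(0.0, cfg["phi_sigma"])  # rotation [rad]
    # ASSUMPTION: "LW" denotes the largest face; in the complex scenario the
    # package may rest on any of its three distinct faces.
    face = rng.choice(["LW", "LH", "WH"]) if cfg["any_face"] else "LW"
    return dx, dy, phi, face


if __name__ == "__main__":
    rng = random.Random(0)
    for name in SCENARIOS:
        print(name, sample_placement(name, 0.5, rng))
```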

3.1 Package types and distribution of the number of tags

As stated above, the number of tags inside each package is considered random in the reference scenario. To represent a realistic situation, the expected number of tags in a package is assumed to be proportional to the package volume. Three possible sets of package sizes have been considered, summarized in Table 1: (i) a small set with 4 types of packages (4P), (ii) a medium set with 8 types (8P), and (iii) a large set with 16 types (16P). The sizes correspond to products from the company UK packaging (Footnote 1). The tag distribution is given by \(\max \{1,P(\lambda )\}\), where \(P(\lambda )\) is a Poisson random variable with rate \(\lambda \). This distribution guarantees that all packages have at least one tag. In addition, a uniform density of 100 tags/m\(^3\) has been considered, which yields the mean number of tags per package shown in the last column of Table 1.
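The tag-count distribution can be sketched as follows. This is only an illustration: the package dimensions in the usage example are hypothetical (they are not taken from Table 1), and the Poisson sampler uses Knuth's multiplication method, which is adequate for the small rates implied by a density of 100 tags/m\(^3\).

```python
import math
import random

TAG_DENSITY = 100.0  # tags per cubic metre, as stated in the text


def poisson(lam, rng):
    """Draw a Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_tag_count(length, width, height, rng):
    """Number of tags in a package: max{1, P(lambda)}, lambda = density * volume."""
    lam = TAG_DENSITY * length * width * height
    return max(1, poisson(lam, rng))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 0.4 x 0.3 x 0.2 m package, i.e., lambda = 2.4 tags.
    print([sample_tag_count(0.4, 0.3, 0.2, rng) for _ in range(10)])
```

Note that, because of the \(\max \) operation, the mean of this distribution is \(\lambda + e^{-\lambda }\), which is slightly above \(\lambda \) for small packages.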

Table 1 Package dimensions (L/W/H) [m] and mean number of tags per package (\(\lambda \)) [tags]
Fig. 2 Data representation for the simple, \(d=2.5\) meters, 8P and 16P cases (top and bottom rows, respectively). Package types are indicated with different colors. The first column shows the package type versus the number of tags read during the interrogation process. Second and third columns show some features of the model (number of frames and the RSS difference) versus the number of tags read. Last column considers the two previous features in a 3D visualization

3.2 RFID interrogation

Tag identification is performed using the well-known Framed Slotted Aloha (FSA) anti-collision protocol, which resolves situations where several tags respond simultaneously. In FSA, the interrogation process involves multiple frames, each divided into slots. In a given frame, each non-identified tag selects a random slot to communicate its identity. If no collision occurs in that slot, the reader acknowledges the tag ID. Tags that do not receive an acknowledgment attempt to identify themselves in subsequent frames. In this way, FSA reduces the probability of collisions and improves the efficiency of the system. The number of slots allocated at each interrogation round (frame) is considered fixed and equal to 16, which is a common setting in commercial readers. During this interrogation process, different data and statistics can be collected. The following ones have been considered in this work:

  (i) Number of tags read,

  (ii) Total interrogation time,

  (iii) Total number of interrogation frames required,

  (iv) Average RSS of singleton slots (slots where a tag response can be correctly decoded),

  (v) Average RSS of slots with collisions (slots where a tag response is detected but cannot be decoded),

  (vi) Minimum RSS of singleton slots, and

  (vii) Maximum RSS of singleton slots.

Depending on which features are used, three possible information models have been defined:

  • Tag model. It uses only the first feature (number of tags read). It is the simplest model that can be created. It serves as the baseline reference to compare with other models.

  • Basic model. It comprises standard information collected by the RFID readers, features (i) to (iii). Similar variables are available in off-the-shelf RFID readers such as Impinj (Footnote 2) or Alien (Footnote 3).

  • Full model. It comprises all the features, (i) to (vii), and should provide the best performance. Low-level data is available in some commercial readers. For example, RF phase angle, Doppler frequency, and peak RSS can be obtained in Impinj models (Footnote 4) and have been used in research works related to object sensing (e.g., [42, 43]). Another option to collect this information is to use custom software-defined radio (SDR) readers, as in the testbed developed in this work (see Section 7).
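The relation between the seven features and the three information models can be expressed as nested feature subsets. In the sketch below, the feature names and the `signature` helper are hypothetical labels introduced for illustration; only the grouping (i), (i) to (iii), and (i) to (vii) comes from the text.

```python
# Feature names are illustrative; the order follows items (i) to (vii).
FEATURES = [
    "tags_read",          # (i)
    "total_time",         # (ii)
    "frames",             # (iii)
    "avg_rss_singleton",  # (iv)
    "avg_rss_collision",  # (v)
    "min_rss_singleton",  # (vi)
    "max_rss_singleton",  # (vii)
]

# Each information model is a prefix of the feature list.
INFO_MODELS = {
    "tag": FEATURES[:1],
    "basic": FEATURES[:3],
    "full": FEATURES[:],
}


def signature(record, model):
    """Build the input vector of one interrogation record for a given model."""
    return [record[name] for name in INFO_MODELS[model]]


if __name__ == "__main__":
    # Placeholder values, only meant to show the vector lengths.
    record = {name: float(k) for k, name in enumerate(FEATURES)}
    for model in INFO_MODELS:
        print(model, len(signature(record, model)))
```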

3.3 Data representation

Figure 2 shows a representation of the dataset for the simple, \(d=2.5\) meters, 8P and 16P cases, obtained using the simulator described in the next section. As can be seen in the figure, the classification task is challenging. If only the number of tags read is considered (tag model), it is not possible to correctly classify the package types, since there are significant overlaps in the range of read tags for each type of package, as shown in Figs. 2a and e. When more features are added, e.g., the number of interrogation frames or the difference between the maximum and minimum RSS, the classification can be improved (see second and third columns of Fig. 2, respectively). For instance, a greater RSS difference or a higher number of frames indicates that packages are larger (higher class). This effect is particularly noticeable with the RSS difference, since it clearly separates small and medium types. Larger packages are still difficult to tell apart. However, if all the features in the example are used (3D representations in the last column of Fig. 2), the largest classes can be better separated for small RSS differences (below 2 nW), since they tend to require a higher number of interrogation frames. In summary, the classification task is challenging, and adding features to the information models can lead to notable accuracy improvements.

Table 2 ANN layouts
Table 3 Simulator configuration
Table 4 Training dataset: interrogation process sample

3.4 Predictive system

The predictive problem addressed in this work is categorized as supervised classification, for which various learning structures are aptly suited. ANNs were selected due to their inherent flexibility and prevalent use in similar applications. Various ANN layouts, described in Table 2, were tested, including configurations with identical layouts but incorporating a 20% link dropout between layers to enhance generalization. These layouts are henceforth referred to as L1-L4, and L1D-L4D when dropout is included. The number of inputs matches the chosen information model: tag, basic, or full. The output layer comprises 16 nodes (some may be deactivated during training), representing the possible candidate types, and employs a softmax activation function to select the class with the highest probability. The multi-class cross-entropy serves as the loss function, as is common practice for such problems. In addition, since package types are sorted by size, as shown in Table 1, the size error is kept small (misclassifications occur more frequently between neighbor classes, as discussed in Section 5).
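For readers less familiar with the output stage, the following sketch shows the standard softmax activation and the multi-class cross-entropy loss for a single sample, written with the Python standard library only. It illustrates the general definitions, not the Keras implementation used in this work; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert output-layer activations into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Multi-class cross-entropy of one sample with a one-hot target."""
    return -math.log(probs[true_class])


def predict(logits):
    """Index of the class with the highest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    logits = [0.0] * 16  # 16 output nodes, one per candidate type
    logits[5] = 3.0
    print(predict(logits), cross_entropy(softmax(logits), 5))
```

With 16 equiprobable outputs the loss equals \(\ln 16\), which is the value an untrained classifier would be expected to start from.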

Datasets were generated using the simulator, which is detailed in the next section. For each scenario (ideal, simple, and complex), the dataset includes 20000 samples, uniformly distributed among the different package types (i.e., 5000, 2500, and 1250 packages per class for the 4P, 8P, and 16P cases, respectively).

Algorithm 1 Box interrogation simulation.

The ANN was implemented using Keras over TensorFlow. In addition to dropout, over-fitting during training was mitigated using an early stopping mechanism with a patience parameter set to 100 epochs, acting on a validation dataset. Accuracy results were computed using the repeated holdout cross-validation resampling procedure. This procedure operates as follows. At each repetition, it:

  1. Selects a random partition of 80% of the data for training, 10% for validation, and 10% for testing,

  2. Trains the model on the training set, using early stopping on the validation set,

  3. Evaluates the model using the testing set, and

  4. Adds the accuracy sample to compute the mean and the confidence level.

This procedure is repeated until the mean accuracy lies, according to a t-test, within a ±2.5% confidence interval with a confidence level over 99% (using at least 20 samples). ANN training is performed using the minibatch gradient descent method with a learning rate of 0.01 and a batch size of 16.
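The stopping rule of the repeated holdout procedure can be sketched as below. Several points are assumptions made for the sake of a self-contained example: `run_once` is a hypothetical callable that trains and evaluates the model on one partition and returns the test accuracy in percent, the ±2.5% interval is read as an absolute half-width in percentage points, and a normal quantile is used as an approximation of the Student-t quantile mentioned in the text (the approximation is slightly optimistic for small sample counts).

```python
import math
import random
import statistics


def split_80_10_10(n, rng):
    """Random partition of n sample indices into train/validation/test."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = n * 8 // 10
    n_val = n // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def repeated_holdout(run_once, n, rng, half_width=2.5, level=0.99,
                     min_runs=20, max_runs=1000):
    """Repeat holdout runs until the CI of the mean accuracy is narrow enough."""
    # ASSUMPTION: normal quantile in place of the Student-t quantile.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    accuracies = []
    while len(accuracies) < max_runs:
        train, val, test = split_80_10_10(n, rng)
        accuracies.append(run_once(train, val, test))
        if len(accuracies) >= min_runs:
            sem = statistics.stdev(accuracies) / math.sqrt(len(accuracies))
            if z * sem <= half_width:
                break
    return statistics.mean(accuracies), len(accuracies)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for training: a noisy accuracy around 80%.
    fake_run = lambda train, val, test: 80.0 + rng.gauss(0.0, 1.0)
    print(repeated_holdout(fake_run, 20000, rng))
```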

4 RFID gate simulator

To construct the learning dataset, the model described in the previous section has been simulated in a fully detailed UHF RFID gate simulator. A simulated setup has the advantage of allowing a rich set of data to be gathered, which could be prohibitive to collect in a real testbed. To obtain reliable results, this simulator implements a comprehensive channel model where the link budgets include:

  • The power-up and the backscatter links between antennas.

  • Fading due to multipath propagation between the tag and reader antennas, to include clutter effects.

  • Efficiency reduction due to the materials where the tags are attached.

  • The difference in load states during tags’ modulation, which reduces the amount of the reflected power towards the reader.

Table 3 summarizes the main parameters and characteristics of this simulator. At decoding, we consider bit error rate due to Additive White Gaussian Noise [6], and Miller [10] coding. Tag and reader antennas include gain variations due to their relative orientations [5, 16]. Besides, the simulator comprises a detailed implementation of the FSA tag anti-collision ISO 18000-6C protocol, including capture effect and tag outage computations. The simulations assume that tags are attached to cardboard material. Different materials (metal, plastic, aluminum, etc.) would affect the radio-electric characteristics of the tag, such as the radiation pattern of the antenna, and the values of the tags’ impedance. The simulator can be adapted to other materials with estimations of losses, and radio-electric changes in the tags [14].

The interrogation process is performed in the simulator for each package, using FSA with a fixed frame length of 16 slots. The process finishes when a given number of stop frames (7 were used in the simulator) are received totally empty. This experiment has been repeated 20000 times (each with a random package configuration) to construct the datasets for training the predictive system. The simulation process for each package is summarized in Alg. 1 and receives the corresponding power-up and backscattering link path losses for each reader-tag pair as inputs. These path losses are computed using a line-of-sight channel model and adding Rician fading with a 3 dB factor.
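A heavily simplified version of this interrogation loop is sketched below. It is not the simulator of this work: the channel model, capture effect, and bit errors are omitted, and a single hypothetical parameter `p_respond` stands in for tags that fail to answer in a frame (e.g., due to outage). The sketch also assumes that the stop frames must be consecutive, which the text does not state explicitly.

```python
import random


def fsa_inventory(n_tags, rng, slots=16, stop_frames=7, p_respond=1.0):
    """Simplified FSA inventory; returns (tags_read, frames_used).

    frames_used counts frames up to the last non-empty one, mirroring the
    fact that trailing empty stop frames are not stored in the dataset.
    """
    pending = n_tags   # tags not yet identified
    frames = 0         # frames issued so far
    last_useful = 0    # index of the last non-empty frame
    empty_run = 0      # consecutive totally empty frames
    while empty_run < stop_frames:
        frames += 1
        counts = [0] * slots
        for _ in range(pending):
            if rng.random() < p_respond:  # tag is powered and answers
                counts[rng.randrange(slots)] += 1
        # Singleton slots are decoded and acknowledged.
        pending -= sum(1 for c in counts if c == 1)
        if all(c == 0 for c in counts):
            empty_run += 1
        else:
            empty_run = 0
            last_useful = frames
    return n_tags - pending, last_useful


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (1, 5, 20):
        print(n, fsa_inventory(n, rng))
```

With `p_respond` below 1, empty frames can appear in the middle of the inventory and some tags may remain unread, which is the kind of behavior the frame-count and reading-time features are meant to capture.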

Table 5 Training dataset: frame information sample


Table 6 Accuracy [%] / Dimensional error [%] obtained for the best performing layout. L1 (blue), L2 (peach), L3 (white), L4 (gray), Ties (purple)

An example of the statistics collected is shown in Tables 4 and 5. The former summarizes, in each row, the high-level statistics for the interrogation procedure of each package, and the latter provides low-level statistics for each interrogation frame (note that the interrogation procedure for each package usually comprises multiple frames). For example, the first package was inventoried in the first 12 frames in the simulator (see Table 4). Note that in this example, frames 10 and 11 are empty (see Table 5), but since the number of stop frames is higher, the reader continues the interrogation. In frame 12, a last tag is identified (the empty stop frames afterward are not stored in the table). In addition, Table 4 also stores the random conditions (e.g., size, tags, orientation, etc.) under which the test has been performed, together with information about the real number of tags in the package and whether or not all tags have been read. This information is not provided to the machine learning model, but is stored for analysis purposes.

5 Results

Results have been computed using the repeated holdout cross-validation resampling procedure outlined in Section 3.4. Table 6 summarizes the average accuracies obtained for each scenario/model/gate configuration (parameter d represents the gate width, see Table 3) with the best performing ANN layout for each configuration. Experiments showed important differences between these layouts, with L2 being the best performing. The average difference between the best and worst-performing layouts is 2.9%, and it reaches 11.9% for the simple 8P scenario with \(d=2.5\) m, where the L4D layout (worst) gets 69.6% accuracy compared to L2 (best), which gets 81.5%.

Fig. 3 (a), (b) Confusion matrices for simple (left) and complex (right) 8P scenarios. (c), (d) Confusion matrices for simple (left) and complex (right) 16P scenarios. The actual classes are the row indices, while the predicted ones are the column indices

Regarding the absolute results, the classifier operates better for scenarios with less randomness, a smaller number of package candidates, and using the full information model. This is consistent with the expected performance. For example, for the 4P case, \(d=2.5\) m, simple scenario, the accuracy virtually reaches 100%, whereas for the 8P case it achieves nearly 88% and drops to 82% for the 16P case. If the scenario configuration is more difficult (complex case), the accuracy drops in all cases with respect to the ideal and simple ones, but it can still reach about 99.5%, 80.9%, and 57.5% for the 4P, 8P, and 16P configurations, respectively. In addition, the gate dimensions also affect the results (accuracy is reduced by about 7% when comparing the best and the worst cases). The use of better information models yields improvements in all the experiments. For instance, in the simple-16P case, the full model improves accuracy by at least 17% with respect to the tag model. These improvements are smaller in the complex setup but still significant (around 7% compared to the other information models). Since the tag model constitutes our baseline estimation, the previous results demonstrate that this estimator can be notably improved using additional data from the RFID signature, validating the hypothesis proposed in this work.

A deeper insight into the results also indicates the “smooth” behavior of the predictor (mistakes occur most likely between similar classes). Figure 3 shows the confusion matrices for the best performing ANN layout, both for the simple and complex scenarios, for the 8P and 16P cases. These matrices were computed by averaging the results on the test datasets with the repeated holdout procedure. Types are mainly mistaken for the most similar ones (as can be seen in Table 1, similar-sized packages have neighboring types). This characteristic reduces the absolute size estimation error (see Section 5.1). Errors are more noticeable in the larger packages, due to the higher variance of the signature data (caused, in turn, by a higher number of tags and longer distances inside the package). A higher variance makes it easier to mistake similar packages. This effect can also be seen in Fig. 2. For example, looking at the number of tags read or the number of frames used in the interrogation process, the possible input range is wider for larger packages (e.g., for the package type s15 in Fig. 2f, the possible frame number values range from 7 to beyond 50, while for the intermediate-size or small packages this range is much narrower). This effect also causes a noticeable overlap between the largest packages in Figs. 2c and g. The overlap can be corrected (to some extent) by the predictive model, as shown in the confusion matrices.

5.1 Size estimation error

In order to compute the size estimation error, let \(p_{ij}\) be the classification ratio for the i-th row, j-th column of the confusion matrix, and W, H, and L the package dimensions shown in Table 1. The dimensional error relative to the true package size, \(\varepsilon ^\%\), is defined as:

$$\begin{aligned} \varepsilon ^\% {=} \sum _{i,j} p_{ij} \frac{\sqrt{(W_i - W_j)^2 + (H_i - H_j)^2 + (L_i-L_j)^2}}{W_j+H_j+L_j} \end{aligned}$$
Fig. 4 Model accuracy comparison by scenario and model info versus the training dataset size for \(d=2.5\) m. Simple scenario configuration for 8P (a) and 16P (b). Complex scenario configuration for 8P (c) and 16P (d)

These errors are also shown in Table 6. Using the full information model, the error is below 3% for the 16P simple scenario, while it increases to 6.1% for the complex one. The error rises to 8.8% and 7.6% for the tag and basic models, respectively, in the simple scenario, and to 8% and 8.1% in the complex one. In conclusion, it is possible to enhance the package size estimation by using extended data of the RFID interrogation signature.
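The dimensional error defined above can be evaluated directly from a confusion matrix, as in the following sketch. It follows the expression exactly as printed (including the normalization by the dimensions of class j) and assumes that the entries \(p_{ij}\) are normalized so that the whole matrix sums to one; that normalization, the function name, and the two package sizes in the usage example are assumptions, not values from Table 1.

```python
import math


def dimensional_error(p, dims):
    """Dimensional error from a confusion matrix.

    p[i][j] : classification ratio (row i = actual, column j = predicted),
              ASSUMED to sum to 1 over the whole matrix.
    dims[k] : (W, H, L) of package type k, in metres.
    """
    err = 0.0
    for i, (wi, hi, li) in enumerate(dims):
        for j, (wj, hj, lj) in enumerate(dims):
            dist = math.sqrt((wi - wj) ** 2 + (hi - hj) ** 2 + (li - lj) ** 2)
            err += p[i][j] * dist / (wj + hj + lj)
    return err


if __name__ == "__main__":
    # Two hypothetical package types and a 90%-accurate classifier.
    dims = [(0.1, 0.2, 0.3), (0.2, 0.4, 0.6)]
    p = [[0.45, 0.05],
         [0.05, 0.45]]
    print(f"{100 * dimensional_error(p, dims):.2f} %")
```

Because the distance term vanishes on the diagonal, only misclassifications contribute, and confusions between neighboring (similar-sized) types contribute less than confusions between distant ones.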

6 Transfer learning

Implementing the package type predictor requires experimental data or accurate scenario simulations. The latter option is challenging, since real cases could be subject to many unknown variations that affect the final RFID interrogation performance. Therefore, using actual data seems to be the only suitable alternative. However, obtaining and labeling a dataset as large as the one used in the previous section is very time-consuming or outright infeasible. A first alternative could be to reduce the dataset to a size that is achievable in practice. To study this idea, the accuracy results for the simple and complex scenarios with \(d=2.5\) m using the full information model have been computed using increasingly larger datasets (from 20 to 1000 records). The experiment has been repeated 100 times for each dataset size, selecting a random dataset (from the total 20000-record dataset) at each run. Figure 4 shows the average and the worst-case-scenario (wcs) results for this experiment. In addition, these figures also show (red horizontal line) the maximum accuracy in each scenario, reported in Table 6 and obtained by training the ANN with the complete 20000-record dataset. Each figure has been computed with the best-performing ANN layout corresponding to that specific scenario-information model pair.

Results reveal that more than 1000 records are necessary to achieve an average accuracy within a 5% interval of the upper limit accuracy in the simple scenario, and more than 500 in the complex one. Moreover, the wcs performance is worse than the average one, and at least 5000 samples (not shown in the figure) are needed to achieve a 10% interval goal. In short, the training dataset should include many samples (each obtained with randomized package setups) to guarantee reasonable accuracy. However, even datasets with 1000 records could be cumbersome or infeasible to obtain in practice.

Fig. 5 Transfer learning results, (a) TL1, (b) TL2, and (c) TL3

One way to overcome the previous issue is to rely on transfer learning (TL), a well-known technique to address problems with reduced training datasets (e.g., [32, 37]). In our TL approach, the ANN is first trained with a large set of synthetic data (e.g., from a reference simulated scenario), and later, a small sample dataset obtained with the actual system is used to fine-tune the prediction network.

To study this idea we have analyzed the performance of three TL cases. In all of them the reference scenario has been the 8P simple scenario with d \(=\) 2.5 m. The three TL cases analyzed are:

  • TL1, derived from the reference scenario, but assuming different gate dimensions: \(d=2.25\) m, \(h_a=2.25\) m, \(h_b=1.2\) m.

  • TL2, derived from the reference scenario, but considering a different package set (appending s10 and s13 from Table 1 to the regular 8P types).

  • TL3, derived from the reference scenario, but considering new random conditions for the package placement: (i) \(x'=x-(1/4,0)+\Delta x\), where \(\Delta x\) is a random displacement whose length is Gaussian with a mean of one-eighth of the longest edge of the package (L) and whose direction is uniformly distributed, and (ii) a random rotation angle with zero mean and standard deviation \(\frac{\pi }{10}\).
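The TL3 placement perturbation can be illustrated with a short sketch. Several details are assumptions made only for this example: the standard deviation of the displacement length is not stated in the text (the hypothetical parameter `sigma_len` defaults to L/16 here), the rotation angle is taken as Gaussian, and the function name is illustrative.

```python
import math
import random


def perturb_placement(x, longest_edge, rng, sigma_len=None):
    """Return (x', rotation) following the TL3 description.

    x            -- nominal (x, y) position of the package
    longest_edge -- length L of the longest package edge
    sigma_len    -- std of the displacement length (assumed, not from the text)
    """
    if sigma_len is None:
        sigma_len = longest_edge / 16.0  # assumption for illustration only
    # Gaussian length with mean L/8; a negative draw simply flips the
    # direction, which is harmless because the direction is uniform.
    length = rng.gauss(longest_edge / 8.0, sigma_len)
    direction = rng.uniform(0.0, 2.0 * math.pi)
    dx = (length * math.cos(direction), length * math.sin(direction))
    # x' = x - (1/4, 0) + dx
    x_new = (x[0] - 0.25 + dx[0], x[1] + dx[1])
    # Rotation angle with zero mean and standard deviation pi/10.
    rotation = rng.gauss(0.0, math.pi / 10.0)
    return x_new, rotation
```

For instance, `perturb_placement((1.0, 2.0), 0.8, random.Random(1))` returns one randomized position and rotation for a package whose longest edge is 0.8 m.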

To implement and test the TL approach, the ANN is first trained (base model) using the whole 20000-sample dataset for the simple 8P scenario. Then, the ANN weights are frozen in all but the last layer, and an additional fine-tuning step is performed with a reduced learning rate of 0.001, using a new dataset obtained specifically for the TL scenario. The ANN layout used for the TL experiments is L4D with sigmoid activation in the output layer, instead of the original softmax activation, since TL has performed better with this configuration. As in previous sections, the repeated holdout cross-validation resampling procedure is used. At each repetition, the TL dataset (which contains 20000 records) is subsampled to the target size (up to 300 samples, reflecting the limited adjustment that can be performed in an existing facility). Then, this reduced dataset is divided into training (80%), validation (10%), and test (10%) datasets, which are used to train and evaluate the model. Note that the number of samples of each package type can differ in this case, since the dataset is randomly drawn. However, their expected number is the same (e.g., if the dataset is reduced to 100 samples, then the expected number of packages of each type for the 4P, 8P, and 16P cases would be 25, 12.5, and 6.25, respectively).
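The fine-tuning step (all layers frozen except the last one, sigmoid outputs, learning rate 0.001) can be sketched in plain Python. This is a toy stand-in under stated assumptions, not the L4D network used in the experiments: a single tanh layer represents all frozen base-model layers, the loss is binary cross-entropy, and every function name is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_features(x, w_hidden):
    # Frozen part: one tanh layer standing in for all base-model layers.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]


def output_probs(h, w_out):
    # Trainable last layer with one sigmoid unit per package type.
    return [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in w_out]


def loss(data, w_hidden, w_out):
    # Mean binary cross-entropy against one-hot targets.
    total = 0.0
    for x, y in data:
        p = output_probs(hidden_features(x, w_hidden), w_out)
        for t, pk in zip(y, p):
            pk = min(max(pk, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            total -= t * math.log(pk) + (1 - t) * math.log(1.0 - pk)
    return total / len(data)


def fine_tune_last_layer(data, w_hidden, w_out, lr=0.001, epochs=50):
    """Update only the output weights; w_hidden is read but never modified."""
    w_out = [row[:] for row in w_out]  # keep the base model untouched
    for _ in range(epochs):
        for x, y in data:
            h = hidden_features(x, w_hidden)
            p = output_probs(h, w_out)
            for k, row in enumerate(w_out):
                err = p[k] - y[k]  # gradient of the loss w.r.t. the logit
                for j in range(len(row)):
                    row[j] -= lr * err * h[j]
    return w_out
```

Because only `w_out` changes, the features learned on the large synthetic dataset are preserved, and the small TL dataset only has to adjust the mapping from those features to the package types.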

Figures 5a-c show the average and worst-case accuracies obtained for the TL approach and the average accuracy obtained by an ANN trained only with the new TL dataset (“New” curve). The figures also show the accuracy achievable in the TL scenario using the best-performing ANN for the reference scenario (“No TL” line).

Fig. 6 Experimental testbed setup

Fig. 7 Cardboard packages used in the testbed. (a) Envelope, 0.36/0.30/0.01 (L/W/H) [m]. (b) Poster tube, 0.9/0.055 (L/Base Diameter) [m]. (c) Large box, 0.47/0.34/0.26 (L/W/H) [m]. (d) Small box, 0.22/0.205/0.075 (L/W/H) [m]

TL outperforms the base setup with a small training dataset in all cases. For example, in TL1 with 100 samples, TL achieves roughly a \(+\)10% improvement over the base setup, and nearly \(+\)20% with 300 samples. In TL2, TL provides more than a \(+\)15% improvement with 300 samples, while in TL3 the improvement is smaller but still significant (\(+\)7% with 300 samples). In comparison, an ANN trained only with the new data performs drastically worse than the retrained network, which leads by \(+\)30% for TL1 and TL3 and \(+\)15% for TL2 with 300 samples, and by \(+\)40% for TL1, \(+\)30% for TL2, and \(+\)50% for TL3 with 100 samples. Overall, these results support the TL approach.

7 Experimental testbed

Building upon the preceding results, this section presents a real case study to validate the 2-stage learning approach under actual conditions. To this end, an experimental testbed was set up using the Ettus B210 Universal Software Radio Peripheral (USRP), a software-defined radio (SDR) platform. The Ettus B210 allows precise control over the RFID interrogation signals and accurate capture of the backscattered signals for subsequent processing. The interrogation software was based on the implementation by Nikos Kargas (footnote 5), described in depth in [18]. This implementation allows the collection of the signature variables proposed in Section 3. Figure 6 displays the test setup. Similar to the system presented in Section 4, it consists of a bistatic pair of antennas aimed at a target area where packages are loosely positioned, akin to the simple scenario described in Section 3. For the experiments, four specific types of packages were selected, shown in Fig. 7: (i) an envelope, (ii) a poster tube, (iii) a large box, and (iv) a small box. In this case, for feasibility and unlike the scenarios studied through simulation, the tags were placed directly on the package surface. For each type of package, 25 interrogation traces were taken using 2 tags, 25 with 4 tags, and 25 with 6 tags, totaling 300 samples. The tags were placed at random spots on a regular 3x3 grid on the package surface for the envelope and the boxes, and uniformly for the poster tube. As can be seen in the examples from Fig. 7, the points were located a short distance from the package edges (approximately 1-2 cm). In all experiments, tags from the Confident U8_7014 model (footnote 6) were used. It should be highlighted that, in a significant number of tests, not all the tags on a package were read, as some of them could not be activated due to their position.

In order to obtain an initial ANN model, a simulation suited to the test scenario was also carried out, using an RFID gate with the size indicated in Fig. 6. In this simulation, the parameters from Table 3 were adjusted to fit the scenario configuration. An operating frequency of 910 MHz and a B210 transmission gain of 60 dB were selected. Tags were positioned as described in the previous paragraph, and 1000 interrogation traces were obtained for each configuration following the random variations proposed for the simple scenario. The model trained on these traces achieves an average accuracy of 97.5% when tested on simulated data, similar to the levels achieved in Section 5.

After this stage, the transfer-learning method was applied to the actual data obtained with the experimental setup. As in Section 6, 10% of the samples were reserved for ANN validation, another 10% for final testing, and the remaining 80% for model training. The transfer-learning training was repeated 10 times (each with a random subset of the actual samples), and the averaged accuracy results are provided in Table 7 for each tagging size. In this case, and unlike the results presented in previous sections, it has not been assumed that the number of tags depends on the package size, so the classification is performed only with the low-level parameters of the RFID signature. The last cell in the table reports the result when the number of tags is assumed to vary with the type of package. Large boxes are assumed to carry 6 tags with a probability of 75%, or 4 tags with a probability of 25%. For the poster tube, the possible cases are 6-tags (25%) and 4-tags (75%). For the small box: 4-tags (75%) or 2-tags (75%). And, finally, for the envelope: 4-tags (75%) or 2-tags (25%). It can be observed that assuming these conditions substantially improves on the worst fixed case studied (the 2-tags configuration). In general, the results are satisfactory, reaching accuracies that may enable applications that do not mandate high measurement accuracy but do require a reasonable knowledge of the object’s size.
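The 80/10/10 repeated holdout procedure described above can be sketched as follows. The callable `fit_and_score` is a hypothetical placeholder for the fine-tuning and evaluation step, and the rounding of the split sizes is an assumption of this example.

```python
import random
from statistics import mean


def holdout_split(samples, rng, train_frac=0.8, val_frac=0.1):
    """Shuffle a copy of the samples and cut it into train/validation/test."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (about 10%)
    return train, val, test


def repeated_holdout(samples, fit_and_score, repetitions=10, seed=0):
    """Average the score over several independent random splits.

    fit_and_score(train, val, test) is a hypothetical callable that
    fine-tunes the model and returns its accuracy on the test part.
    """
    rng = random.Random(seed)
    scores = [fit_and_score(*holdout_split(samples, rng))
              for _ in range(repetitions)]
    return mean(scores)
```

With 100 samples, each repetition yields 80 training, 10 validation, and 10 test samples, so the test accuracy of a single split moves in steps of 10%; averaging over repetitions smooths this granularity.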

Table 7 Accuracy results for the testbed

8 Conclusions

In this work, a novel capability for RFID gates has been proposed: the determination of the class and size of packages containing tagged items. By using features derived from the tags’ interrogation process, an ANN implementation achieves moderate accuracy and small dimensional error. This method does not require any hardware beyond existing RFID gates and can provide valuable information for cargo management. Moreover, transfer learning emerges as a viable approach for real scenarios: it starts from an approximate simulated configuration and subsequently refines it using a limited set of samples from actual gate operation, as demonstrated under both simulated and experimental conditions. This validates the applicability and robustness of the proposed RFID signature method in real-world scenarios, aimed at cases where high measurement accuracy is not paramount, but a reasonable estimation of object size is sufficient.

Future work will consider new features for the interrogation signature and exploit multiple readings of the same tags to enhance the model’s accuracy, in addition to conducting further tests on prototypes.