# Real-time multitarget tracking for sensor-based sorting

## Abstract

Utilizing parallel algorithms is an established way of increasing performance in systems that are bound to real-time restrictions. Sensor-based sorting is a machine vision application for which firm real-time requirements need to be respected in order to reliably remove potentially harmful entities from a material feed. Recently, employing a predictive tracking approach using multitarget tracking in order to decrease the error in the physical separation in optical sorting has been proposed. For implementations that use hard associations between measurements and tracks, a linear assignment problem has to be solved for each frame recorded by a camera. The auction algorithm can be utilized for this purpose, which also has the advantage of being well suited for parallel architectures. In this paper, an improved implementation of this algorithm for a graphics processing unit (GPU) is presented. The resulting algorithm is implemented in both an OpenCL and a CUDA based environment. By using an optimized data structure, the presented algorithm outperforms recently proposed implementations in terms of speed while retaining the quality of output of the algorithm. Furthermore, memory requirements are significantly decreased, which is important for embedded systems. Experimental results are provided for two different GPUs and six datasets. It is shown that the proposed approach is of particular interest for applications dealing with comparatively large problem sizes.

## Keywords

Linear assignment problem, Sensor-based sorting, Parallel algorithm, Graphics processing unit

## 1 Introduction

*accept-or-reject* decision. An impression of a sensor-based sorting system is provided in Fig. 1. Typical setups include opto-pneumatic separators [1], which consist of optical sensors for perceiving the material and compressed air nozzles for physically separating it. Ordinarily, systems utilize scanning sensors, such as line scan cameras, which allow two-dimensional images of objects passing through the field of vision to be generated. For the purpose of transportation, conveyor belts, slides, or chutes are commonly used.

A general challenge in sensor-based sorting lies in minimizing the delay between perception and separation of the material. This delay mainly exists due to the required processing time of the evaluation system employed. In the case of optical sorting, the evaluation is implemented in terms of several image processing tasks. For instance, systems handle the task of preprocessing the input data, extracting regions of the input image that contain objects, calculating features for the individual objects, and classifying them accordingly. Minimizing this delay is a necessity for good sorting quality. An increased delay results in increased imprecision in predicting the location of an object when reaching the separation stage.

Bipartite graph matching, as performed in order to find the assignments between measurements and tracks, is a well-studied problem. In the field of computer vision, numerous applications exist in which such assignment tasks need to be solved. It is also used in scheduling tasks. Bipartite graph matching can be regarded as a linear assignment problem, and various algorithms to solve it exist [5]. When dealing with applications that are required to have real-time capabilities, the problem becomes even harder since corresponding solvers typically pose a high computational burden on the system whenever many correspondences are to be found.

In this paper, a real-time multitarget tracking algorithm for the computer vision task of sensor-based sorting is presented. For this purpose, an enhanced solver for the linear assignment problem is considered. More precisely, a fast realization of the auction algorithm on a graphics processing unit (GPU) is proposed. The algorithm is implemented both using the OpenCL framework, which allows running the code on numerous state-of-the-art GPUs, and using CUDA. This allows utilization of two GPUs for experiments, further revealing the performance of the implementations. The CUDA-based implementation is based on the code published with [6]. The enhanced algorithm includes two specific improvements which significantly increase processing speed and also decrease memory usage, which is of particular interest for embedded systems. In order to demonstrate the success of this method, its applicability in multitarget tracking used for an evaluation system as included in sensor-based sorting is shown. It is also compared with other recent work in the field.

This paper is organized as follows. Following this introduction, Sect. 2 briefly reviews the linear assignment problem and the auction algorithm. Related work in the field of sensor-based sorting and fast implementations of the auction algorithm is then reviewed in Sect. 3. A general description of the multitarget tracking system considered in this paper is provided in Sect. 4. In Sect. 5, the improvements proposed in this paper are presented. These are further subject to experimental evaluation, which is provided in Sect. 6. Lastly, a conclusion is provided in Sect. 7.

## 2 Problem formulation

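Let *N* be the number of workers and *M* the number of tasks. For cases when \(|N| \ne |M|\), the smaller set is typically expanded such that \(|N| = |M|\) holds true. Furthermore, let \(x_{i,j} = 1\) in case worker *i* is assigned to task *j* and \(x_{i,j} = 0\) otherwise. The cost of assigning worker *i* to task *j* is given by \(a_{i,j}\). Then, the optimization problem is formulated as

\[ \min \sum_{i} \sum_{j} a_{i,j} \, x_{i,j} \quad \text{subject to} \quad \sum_{j} x_{i,j} = 1 \;\; \forall i, \quad \sum_{i} x_{i,j} = 1 \;\; \forall j, \quad x_{i,j} \in \{0, 1\}. \tag{1} \]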

In the auction algorithm [7], *N* persons are competing over *M* objects by bidding for them. The algorithm further requires that \(M \ge N\) holds true. Whenever this is not the case, the sets are swapped.

Let \(p_j\) denote the current price of object *j*. During the bidding phase, each person finds the object of maximum value as given by \(j_i \in \hbox {argmax}_{j} \, \{ a_{i,j}-p_j \}\), where \(a_{i,j}\) is the utility of object *j* to person *i*. It is important to note that \(a_{i,j}\) now denotes a utility instead of a cost, which represents the reformulation from a minimization problem to a maximization problem. Having identified \(j_i\), the person offers a bid denoted as \(\gamma _i\) in Eq. (2). Furthermore, \(v_i\) denotes the value of the most preferred object and \(w_i\) of the second most preferred object:

\[ \gamma _i = v_i - w_i + \epsilon , \qquad v_i = \max _{j} \, \{ a_{i,j} - p_j \}, \qquad w_i = \max _{j \ne j_i} \, \{ a_{i,j} - p_j \}. \tag{2} \]

The set of persons having bid on object *j* is denoted as *P*(*j*). During the assignment phase, each object *j* determines the person having submitted the highest bid as given by \(i_j \in \hbox {argmax}_{i \in P(j)} \, \gamma _i\). The new price of the object is then set to \(p_j + \gamma _{i_j}\).

In order to represent the ownership between a person and an object as well as the current price, an \(N \times N\) matrix is typically used. Generally, the algorithm provides an approximation, since it is only guaranteed to find the optimal solution if \(\epsilon < {1}/{N}\) and \(a_{i,j} \in \mathbb {N}\) hold.
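The interplay of the bidding and the assignment phase can be illustrated with a small sequential sketch. It is written in Python for readability and is not the GPU implementation discussed in this paper; the function name `auction`, the list-of-lists utility matrix, and the zero initial prices are assumptions made for this example. It further assumes that there are at least two objects and at least as many objects as persons.

```python
def auction(utility, eps):
    """Jacobi-style auction: all unassigned persons bid, then objects pick winners.

    utility[i][j] is the utility a_{i,j} of object j to person i.
    Returns owner_of, where owner_of[j] is the person holding object j (or None).
    Assumes len(utility) <= len(utility[0]) and at least two objects.
    """
    n_persons, n_objects = len(utility), len(utility[0])
    price = [0.0] * n_objects          # p_j, assumed to start at zero
    owner_of = [None] * n_objects      # object j -> person
    object_of = [None] * n_persons     # person i -> object

    while any(obj is None for obj in object_of):
        # Bidding phase: each unassigned person i computes its bid increment.
        bids = {}  # object j -> list of (gamma_i, i), i.e. the set P(j)
        for i in range(n_persons):
            if object_of[i] is not None:
                continue
            values = [utility[i][j] - price[j] for j in range(n_objects)]
            j_best = max(range(n_objects), key=values.__getitem__)
            v = values[j_best]                                          # best value
            w = max(values[j] for j in range(n_objects) if j != j_best)  # second best
            bids.setdefault(j_best, []).append((v - w + eps, i))

        # Assignment phase: each object that received bids goes to the highest bidder.
        for j, offers in bids.items():
            gamma, winner = max(offers)
            previous = owner_of[j]
            if previous is not None:
                object_of[previous] = None   # the previous owner must bid again
            owner_of[j] = winner
            object_of[winner] = j
            price[j] += gamma
    return owner_of
```

For example, `auction([[10, 9], [10, 2]], 0.25)` lets both persons bid on object 0 in the first iteration. Person 1 wins because its bid increment is larger, and person 0 then obtains object 1 in the second iteration.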

## 3 Related work

In this section, recent work from the field of sensor-based sorting and parallel strategies for solving the linear assignment problem is briefly reviewed.

### 3.1 Sensor-based sorting

Sensor-based sorting is typically used in the fields of food processing [8], waste management [1], and sorting of industrial minerals [9]. The process can be subdivided into the tasks of feeding, preparation, presentation, examination, discrimination, and physical separation [10, 11]. Preparation and presentation are realized by means of certain transport mechanisms, for instance conveyor belts, slides, or chutes. Usually, the goal is to unscramble the individual objects and achieve ideal flow control in terms of having all objects move at a defined velocity. During the examination, a task-specific sensor, possibly in combination with an appropriate illumination device, acquires the data [12, 13]. For the purpose of discrimination, data analysis is performed. For small, cohesive materials, physical separation is performed using an array of compressed air nozzles [14].

Conventional systems typically utilize scanning sensors. Recently, applying an area scan camera in place of a line scan camera has been proposed [2, 3]. The goal is to decrease the error in physical separation by selecting the nozzle(s) as well as the point in time to trigger the nozzle based on the results of a predictive tracking approach. In [15], it is shown that information derived from tracked objects can also be exploited for discrimination of products and hence be used to decrease the detection error. The challenge of respecting real-time requirements regarding multitarget tracking in sensor-based sorting is addressed in [16]. The authors propose a framework that dynamically selects an appropriate algorithm to solve the linear assignment problem based on the current system load. However, only homogeneous hardware is assumed, and all algorithms are executed on a conventional CPU.

### 3.2 Parallel strategies for solving the linear assignment problem

Although this work focuses on the auction algorithm, it is worth mentioning that numerous algorithms for solving the linear assignment problem exist. An overview and comparison of some of the methods are provided in [5].

Due to its suitability for parallel processing, an implementation of the auction algorithm on a GPU is proposed in [6]. The authors present an implementation based on CUDA. They experimentally evaluate their implementation, comparing it with a sequential CPU implementation using a computer vision task, namely correspondence matching of 3D points. In [17], the authors present further improvements of the implementation. By splitting the \(N \times N\) matrix into two unidimensional arrays, one holding the object prices and the other one the bids, they lower memory usage. Results are presented in terms of memory usage, showing how especially problems involving huge datasets benefit from the approach.

An implementation of the auction algorithm on a field-programmable gate array (FPGA) is presented in [18]. The authors compare their experimental results with those presented in [6] and claim to achieve results ten times faster for certain problem sizes.

Parallel versions of two variants of the Hungarian algorithm, a different method for solving the linear assignment problem, are considered in [19]. Because it is implemented using CUDA, the implementation is tailored to NVIDIA devices. Also, the authors state that their implementation supports multi-GPU versions and consider up to 16 GPUs.

## 4 Multitarget tracking in sensor-based sorting

As has been mentioned, sensor-based sorting systems are typically designed according to a sorting task at hand. Materials to be sorted may strongly vary in terms of size, ranging from only a few millimeters, e.g., seeds, up to several centimeters, for instance minerals. For reasons of efficiency, the throughput of the system should generally be as high as possible while respecting quality requirements. Consequently, with respect to multitarget tracking, as many as thousands of objects may need to be tracked simultaneously. For systems using a conveyor belt for transportation, the applied belt speed may also vary. Typically, it is configured within a range of 1–5 \(\mathrm {m\,s}^{-1}\). Also, the image resolution needs to be sufficient for detecting the smallest possible characteristic that is of importance for material characterization. In many cases, this characteristic measures only a fraction of a millimeter.

In many cases, the data retrieved by the sensor of a sorting system can be represented as an image, e.g., in optical sorting. Therefore, the measurements that serve as the input for the tracking algorithm need to be detected in the image data. This task is handled by various image processing routines, such as filtering and segmentation, which are required during data evaluation in order to characterize the individual objects. The centroids of the extracted objects then represent a set of unlabeled measurements for a received image in our case.

In our system, multitarget tracking can be subdivided into four tasks, namely *state estimation*, *gating*, *association*, and *internal state management* [16]. For the purpose of *state estimation*, a standard Kalman filter is used. Assuming the assignments between the measurements and the tracks are given, we refine our knowledge about the positions of the particles by performing a Kalman filter update step for each individual particle. For the prediction step of the Kalman filter, we assume that a constant velocity model can be used to approximate the motion of the particles. The state variables are given by the *x* and *y* position of the measurement and the velocities in both directions, i.e., \(v_x\) and \(v_y\). The prediction step has a complexity of *O*(*n*), where *n* denotes the number of current target tracks, which is expected to be (almost) equal to the number of measurements. The prediction yields the approximate positions of the individual particles in the next time step, which then serve as the input for the *gating* and the *association* step at the next time step. The goal of *gating* is to partition the search space prior to association and hence split the problem into several smaller subproblems. This also allows for parallel processing during association. However, in this paper, gating is not considered. During *association*, the predictions of the existing tracks need to be matched to the current measurements. This requires solving a linear assignment problem. Various algorithms exist, and they differ both in terms of computational complexity and whether they guarantee optimal results. In this paper, we focus on the auction algorithm for this task.
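As an illustration of the prediction step under the constant velocity model, the following Python sketch propagates the state \([x, y, v_x, v_y]\) and its covariance of a single track by one time step. The function names, the time step `dt`, and the process noise matrix are assumptions made for this example; the sketch is not taken from the implementation evaluated in this paper. Since each track is handled independently, applying the function to all *n* tracks is linear in *n*.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def predict_constant_velocity(state, cov, dt, process_noise):
    """Kalman prediction step for one track with state [x, y, v_x, v_y]."""
    # State transition matrix of the constant velocity model.
    f = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    # Predicted mean: position advances by velocity * dt, velocity is kept.
    new_state = [sum(f[r][c] * state[c] for c in range(4)) for r in range(4)]
    # Predicted covariance: F P F^T + Q.
    f_p_ft = matmul(matmul(f, cov), transpose(f))
    new_cov = [[f_p_ft[r][c] + process_noise[r][c] for c in range(4)] for r in range(4)]
    return new_state, new_cov
```

The predicted positions `new_state[0]` and `new_state[1]` are the quantities that enter the association step of the next frame.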

*N* and *M* denote the predictions of the existing tracks and the measurements of the current frame. Initially, *M* contains the measurements (objects) and *N* the predictions (persons). The bidding kernel is implemented from the point of view of the persons, the assignment kernel from the point of view of the objects. Thus, the corresponding quantity indicates the problem size. However, due to the restriction that \(|M| \ge |N|\) must hold true, the sets are possibly swapped. In our scenario, the distance between a measurement obtained and the prediction of a track is used to determine which measurement is to be associated with which track. The algorithm considered here utilizes a *cutoff* distance denoted as \(d_{\text {max}}\). Whenever the distance between the position \(\underline{M}_i\) of measurement *i* and the position of the current prediction \(\underline{N}_j\) of track *j* exceeds this distance, the utility is set to zero.
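The cutoff rule can be sketched in Python as follows. Setting the utility to zero beyond \(d_{\text {max}}\) follows the description above. The utility used inside the cutoff, namely \(d_{\text {max}}\) minus the Euclidean distance, is an assumption made for this example so that closer measurements receive a higher utility; the function name and the tuple-based positions are likewise hypothetical.

```python
import math


def utility_matrix(predictions, measurements, d_max):
    """Utility of each measurement (object) to each predicted track (person).

    Pairs farther apart than d_max get utility 0, as described in the text.
    For closer pairs this sketch ASSUMES the utility d_max - distance, so that
    nearer measurements are preferred.
    """
    rows = []
    for px, py in predictions:
        row = []
        for mx, my in measurements:
            distance = math.hypot(mx - px, my - py)
            row.append(0.0 if distance > d_max else d_max - distance)
        rows.append(row)
    return rows
```

With one prediction at the origin, measurements at (3, 4), (0, 1), and (10, 0), and a cutoff of 6, only the first two measurements obtain a utility greater than zero.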

Furthermore, an asymmetric problem is considered, i.e., it is not required that \(|N| = |M|\) holds true. In our scenario, tracks for which no measurement yielded a utility greater than 0 are regarded as tracks to be erased. In the context of sensor-based sorting, this means that an object has left the observed area. However, due to possible occlusions, collisions, or poor object detection, tracks are not deleted immediately. Instead, a scoring system is applied. More precisely, each new track is assigned an initial score. Whenever a measurement is assigned to a track, the score is increased until it reaches a defined maximum score. Likewise, the score is decreased for frames in which no measurement was assigned to the track. When the score drops below zero, the track is finally deleted. Measurements that have not been assigned to any track are regarded as new tracks. These can be objects that just entered the observed area or measurements of tracks that have already been deleted. A more complex strategy for creation and deletion of tracks that takes the position of a measurement inside the observed area into account is presented in [4], but not considered in this paper.
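The scoring scheme for track deletion can be sketched in a few lines of Python. The class and method names are hypothetical; the default values correspond to those stated for the experiments in Sect. 6.1 (initial score 5, increase 2, decrease 1, maximum 10).

```python
class TrackScore:
    """Score-based track lifetime management as described in the text."""

    def __init__(self, initial=5, increase=2, decrease=1, maximum=10):
        self.score = initial
        self.increase = increase
        self.decrease = decrease
        self.maximum = maximum

    def update(self, measurement_assigned):
        """Apply the outcome of one frame; return True if the track is to be deleted."""
        if measurement_assigned:
            # Reward an assignment, but never exceed the maximum score.
            self.score = min(self.score + self.increase, self.maximum)
        else:
            # Penalize a frame without an assigned measurement.
            self.score -= self.decrease
        # The track is deleted once the score drops below zero.
        return self.score < 0
```

With these defaults, a newly created track survives five consecutive frames without an assigned measurement and is deleted in the sixth.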

## 5 Enhanced implementation of the auction algorithm

In this section, the proposed implementation of the auction algorithm for multitarget tracking is presented. It contains two improvements that increase the speed of the algorithm and/or lower memory usage. These improvements are integrated into both an OpenCL and a CUDA implementation, for which results are presented in Sect. 6. The implementation mainly consists of two kernels handling the bidding and assignment phase, respectively, and a kernel containing both phases as well as the convergence test. The latter is a necessity for the improvement proposed in Sect. 5.2.

### 5.1 Replacing the bidding matrix by one 1D array

*Person ID* wins.

*K* in correspondence with \(d_{\text {max}}\). In order to do so, the upper bound of the possible bid increment can be calculated such that

*c*, Eq. (5) holds, where *K* indicates the scaling factor.

*K* is chosen such that the number of available bits suffices for *c* for a given \(d_{\text {max}}\) and \(\epsilon \).

This approach is advantageous for several reasons. Firstly, memory usage, and therefore the amount of data that potentially requires transfer to GPU memory, is significantly reduced, which can be quantified as follows. In cases when \(|N| \approx |M|\), the complete bidding matrix consists of \(|N|^2\) entries. Applying the improvements from [17], the number of entries can be decreased to 2|*N*|. The improvement proposed in this paper reduces the amount of required memory to |*N*|. This is particularly advantageous for embedded systems, which are often used in sensor-based sorting. Furthermore, compared to [17], the proposed approach allows utilization of atomic functions instead of locking two fields whenever an ownership is to be updated. More precisely, when using a mutex, at least one atomic function call would be necessary to acquire the lock, followed by at least two read/write operations, and another atomic call to release the lock. In our case, only one atomic function call is necessary. Also, fewer fields require resetting, i.e., setting to zero, between the iterations. Another advantage lies in avoiding inefficient memory access. The proposed approach enables memory coalescing, which is not possible when using a bidding matrix, because either the bidding or the assignment kernel (depending on whether the sets were swapped) requires all values from a column of the matrix to be accessed in one time step. These elements are then not located consecutively in memory. Lastly, identifying the highest bid during the assignment phase is no longer required, since the information is directly stored in the corresponding field.
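The idea of keeping one field per object that is updated by a single atomic operation can be illustrated with the following Python sketch. The concrete encoding is an assumption made for this example and is not claimed to match the implementation evaluated in this paper: the bid increment is scaled by a factor `scale` and rounded to an integer, stored in the upper bits of the field, and the person ID is stored in the lower `ID_BITS` bits. The call to `max` merely stands in for an atomic maximum operation on the GPU.

```python
ID_BITS = 16  # hypothetical: number of low bits reserved for the person ID


def pack_bid(bid_increment, person_id, scale):
    """Pack a scaled integer bid (high bits) and a person ID (low bits) into one integer."""
    scaled = int(round(bid_increment * scale))
    return (scaled << ID_BITS) | person_id


def unpack_bid(packed, scale):
    """Recover (bid increment, person ID) from a packed field."""
    return (packed >> ID_BITS) / scale, packed & ((1 << ID_BITS) - 1)


def place_bid(bids, object_id, bid_increment, person_id, scale):
    """Stand-in for a single atomic max on the field belonging to object_id."""
    bids[object_id] = max(bids[object_id], pack_bid(bid_increment, person_id, scale))
```

Because the bid occupies the most significant bits, the maximum of the packed values always belongs to the highest bid. In this sketch, equal bids are resolved in favor of the higher person ID purely as a consequence of the encoding, and a field value of zero represents an object without bids, so resetting the array between iterations amounts to setting every field to zero.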

### 5.2 Synchronization on the GPU

An implementation of the auction algorithm requires the bidding phase to be completed for all persons before the assignment phase may start. Likewise, the assignment phase must be completed before the convergence test can be run. Consequently, this requires synchronization steps between each of the phases.

OpenCL enables on-GPU synchronization of *work-items* which are part of the same *work-group*. We propose to exploit this property, such that no synchronization handled by the CPU is required. However, due to the restriction that all *work-items* need to be part of the same *work-group*, the extent to which this improvement can be used depends on the hardware. Yet, it is important to note that this information can be retrieved at run time, and hence whether to enable this feature can be decided dynamically for each time step. Likewise, CUDA supports the concept of several *threads*, which may be part of the same *block*. The aforementioned synchronization procedures are realized in the same way.
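The per-frame decision can be expressed as a simple comparison, sketched here in Python. The function name and the returned labels are hypothetical; in a real system, the maximum work-group size would be queried from the device at run time.

```python
def synchronization_mode(problem_size, max_work_group_size):
    """Decide per frame where the phases of the auction are synchronized.

    Returns "on-gpu" if every work-item (thread) fits into one work-group
    (block), so that a single kernel can run bidding, assignment, and the
    convergence test with barriers in between. Otherwise returns "host",
    meaning separate kernel launches synchronized by the CPU.
    """
    return "on-gpu" if problem_size <= max_work_group_size else "host"
```

For instance, with the maximum work-group sizes reported in Sect. 6.1, a frame with 800 objects could use on-GPU synchronization on the Titan X (1024) but not on the HD 530 (256).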

## 6 Test methodology and experimental results

In this section, the environment used for experimentation and corresponding results are presented.

### 6.1 Setup and datasets

For *Spheres 1*, spheres with a diameter of \(5 \hbox { mm}\) were simulated, while for *Spheres 2*, the diameter was reduced to \(2.5 \hbox { mm}\) in order to simulate an even higher throughput in terms of the number of objects. This eventually leads to varying problem sizes considered in the linear assignment problem.

Table 1 Summary of the datasets considered for the experimental evaluation

Name | Source | Sample rate (Hz) | No. of objects per frame |
---|---|---|---|
Pepper corn 1 | Camera | 220 | |
Pepper corn 2 | Camera | 220 | |
Spheres 1 | Simulation | 100 | |
Spheres 2 | Simulation | 100 | |
Spheres and plates | Simulation | 100 | |
Cylinders | Simulation | 200 | |

Generally, the parameters for the scoring system as described in Sect. 4 need to be set carefully. However, with respect to the datasets obtained via simulation, it is important to note that the data is noise free. Therefore, occlusions and missing measurements do not occur. Although collisions might occur between the objects, the input data for the tracking system is already reduced to the centroids of the objects and two centroids can never occupy the same point in space. Below, an initial track score of 5, an increase of 2, decrease of 1, and maximum score of 10 are considered without further variation.

All experiments presented were run on an Intel Core i7-6700 with 16 GB DDR4-2133 RAM. The operating system was Microsoft Windows 10. The CUDA code was run on a dedicated graphics card, namely an NVIDIA Titan X with Pascal architecture. CUDA compute capability version 3.5 was used. With respect to the OpenCL implementation, results are presented for both the aforementioned GPU and an integrated one, namely an Intel HD 530. For OpenCL, version 1.2 was used. The maximum work-group size for the Titan X GPU is 1024, and for the Intel HD 530 GPU, it is 256. For all experiments, the maximum possible work-group size was chosen, i.e., either the problem size or the maximum size supported by the hardware. It is important to note that for the improvement discussed in Sect. 5.2 to apply, it is a necessity that the problem size does not exceed the possible work-group size.

In addition to the absolute times reported, speedup values are used in the remainder for comparison with a reference. The values reported are defined as \({t_{\text {reference}}}/{t_{\text {proposed}}}\). As a reference, the implementation published with [6] is used. For the CUDA-based code, the original sources are used, and for OpenCL, a port of these sources.

### 6.2 Experimental results

Table 2 Overview of the tracking quality for the datasets obtained from simulation

Dataset | Total objects | Errors |
---|---|---|
Spheres 1 | 12134 | 130 |
Spheres 2 | 29693 | 19 |
Spheres and plates | 3599 | 5 |
Cylinders | 4412 | 0 |

For *Spheres and plates*, rather low speedups ranging from 1.04 for the CUDA-based implementation up to 1.11 utilizing OpenCL and the HD 530 graphics chip are obtained. However, for the dataset with the highest average number of measurements, namely *Spheres 2*, a speedup of 1.4 is reported for CUDA on the Titan X, 1.45 for OpenCL on the Titan X, and even 1.7 for OpenCL on the HD 530. Also, from the description of the datasets as provided in Table 1, it can be observed that our improvements are most effective for comparatively large problem sizes. The latter can further be observed from Fig. 6. Here, two examples of the required time per frame are provided.

Table 3 Speedup values compared with an implementation without the proposed optimization

Dataset | CUDA (Titan X) | OpenCL (Titan X) | OpenCL (HD 530) |
---|---|---|---|
Pepper corn 1 | 1.04 | 1.21 | 1.27 |
Pepper corn 2 | 1.10 | 1.20 | 1.13 |
Spheres 1 | 1.38 | 1.36 | 1.52 |
Spheres 2 | 1.40 | 1.45 | 1.70 |
Spheres and plates | 1.04 | 1.09 | 1.11 |
Cylinders | 1.12 | 1.20 | 1.28 |

*Spheres* datasets. Considering the *Spheres 2* dataset, the ratio is reduced by 26 percentage points in the CUDA implementation running on the Titan X graphics card and by as much as 31 percentage points using OpenCL on the same hardware. With respect to the HD 530 hardware, it becomes clear that large problem sizes cannot be handled under the given time constraint. Further, it is important to note that these numbers are based on the average processing time per frame for the individual datasets. From Fig. 6, it becomes clear that the limit of 5 ms of processing time cannot be met for every individual frame.

Table 4 Ratio of the average required and the available processing time for the auction algorithm considering a camera operating at 200 Hz

Dataset | CUDA Titan X: no optimization (%) | CUDA Titan X: proposed (%) | OpenCL Titan X: no optimization (%) | OpenCL Titan X: proposed (%) | OpenCL HD 530: no optimization (%) | OpenCL HD 530: proposed (%) |
---|---|---|---|---|---|---|
Pepper corn 1 | 13 | 13 | 18 | 15 | 33 | 26 |
Pepper corn 2 | 20 | 18 | 20 | 20 | 41 | 36 |
Spheres 1 | 43 | 32 | 48 | 36 | 120 | 79 |
Spheres 2 | 91 | 65 | 100 | 69 | 288 | 170 |
Spheres and plates | 7 | 7 | 9 | 9 | 10 | 8 |
Cylinders | 8 | 7 | 10 | 8 | 21 | 17 |
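The ratios above relate the average processing time of the auction algorithm to the time available per frame. The arithmetic can be made explicit with a short Python sketch; the function names are hypothetical, and the value of 3.25 ms used in the example is derived from the reported ratio of 65 % rather than taken from a measurement.

```python
def frame_budget_ms(frame_rate_hz):
    """Time available per frame in milliseconds for a given camera frame rate."""
    return 1000.0 / frame_rate_hz


def budget_ratio_percent(avg_time_ms, frame_rate_hz):
    """Share of the per-frame time budget consumed by the average processing time."""
    return 100.0 * avg_time_ms / frame_budget_ms(frame_rate_hz)
```

A camera operating at 200 Hz leaves 5 ms per frame, so an average processing time of 3.25 ms corresponds to a ratio of 65 %. Any ratio above 100 % indicates that the time constraint is violated on average.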

## 7 Conclusion

In this paper, improvements to a GPU-based implementation of the auction algorithm were proposed that result in lower memory usage as well as increased speed. Regarding the latter, it was demonstrated that the approach outperforms conventional implementations of the algorithm. In the best case, it performs 1.7 times as fast and the geometric average of the speedup is 1.24 when averaged over all platforms. With respect to the different hardware considered, the geometric average of the speedup using CUDA is 1.17 and 1.25 for OpenCL when run on the Titan X GPU and even 1.32 using OpenCL in combination with the HD 530 GPU. Also, it was shown that, especially for huge problem sizes, the proposed approach can support fulfilling firm real-time requirements. This was further elaborated in the example of data analysis in sensor-based sorting.
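The geometric averages quoted above can be reproduced from the speedup values reported in Sect. 6.2. The following Python sketch shows the computation; the dictionary layout and the function name are chosen for this example only.

```python
import math

# Speedup values per platform as reported in Sect. 6.2, in the order
# Pepper corn 1, Pepper corn 2, Spheres 1, Spheres 2, Spheres and plates, Cylinders.
SPEEDUPS = {
    "CUDA, Titan X": [1.04, 1.10, 1.38, 1.40, 1.04, 1.12],
    "OpenCL, Titan X": [1.21, 1.20, 1.36, 1.45, 1.09, 1.20],
    "OpenCL, HD 530": [1.27, 1.13, 1.52, 1.70, 1.11, 1.28],
}


def geometric_mean(values):
    """Geometric mean computed as the exponential of the mean logarithm."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Geometric mean per platform and over all 18 values, rounded to two decimals.
per_platform = {name: round(geometric_mean(v), 2) for name, v in SPEEDUPS.items()}
overall = round(geometric_mean([v for vals in SPEEDUPS.values() for v in vals]), 2)
```

The geometric mean is the appropriate average here because speedups are ratios; averaging them arithmetically would overweight the large values.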

Regarding future work, the aim is to focus on particularly challenging situations in terms of computational burden. The experimental results presented reveal that although the run time can be significantly reduced and acceptable processing times can be achieved on average, the real-time requirement cannot be fulfilled for certain individual frames. This problem may be tackled by dynamically adapting \(\epsilon \) such that it becomes more likely that fewer auction iterations are required. Also, a hard threshold regarding the maximum number of allowed iterations may be introduced. However, it is important to note that the output quality of the auction algorithm does not increase monotonically over the iterations. Therefore, the approach would result in a loss of quality, which is not the case for the approach presented in this paper.

## Notes

### Acknowledgements

IGF project 18798 N of research association Forschungs-Gesellschaft Verfahrens-Technik e.V. (GVT) was supported by the AiF under a program for promoting the Industrial Community Research and Development (IGF) by the Federal Ministry for Economic Affairs and Energy on the basis of a resolution of the German Bundestag.

## References

- 1. Kępys, W.: Opto-pneumatic separators in waste management. Inżynieria Mineralna **17** (2016)
- 2. Pfaff, F., Baum, M., Noack, B., Hanebeck, U.D., Gruna, R., Längle, T., Beyerer, J.: TrackSort: Predictive tracking for sorting uncooperative bulk materials. In: IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), IEEE, pp. 7–12 (2015)
- 3. Pfaff, F., Pieper, C., Maier, G., Noack, B., Kruggel-Emden, H., Gruna, R., Hanebeck, U.D., Wirtz, S., Scherer, V., Längle, T., et al.: Improving optical sorting of bulk materials using sophisticated motion models. tm-Tech. Mess. **83**(2), 77–84 (2016)
- 4. Pfaff, F., Pieper, C., Maier, G., Noack, B., Kruggel-Emden, H., Gruna, R., Hanebeck, U.D., Wirtz, S., Scherer, V., Längle, T., Beyerer, J.: Simulation-based evaluation of predictive tracking for sorting bulk materials. In: 2016 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pp. 511–516 (Sept 2016)
- 5. Dell’Amico, M., Toth, P.: Algorithms and codes for dense assignment problems: the state of the art. Discret. Appl. Math. **100**(1–2), 17–48 (2000)
- 6. Vasconcelos, C.N., Rosenhahn, B.: Bipartite graph matching computation on GPU. In: International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, Springer, pp. 42–55 (2009)
- 7. Bertsekas, D.P.: The auction algorithm: a distributed relaxation method for the assignment problem. Ann. Oper. Res. **14**(1), 105–123 (1988)
- 8. Narendra, V., Hareesha, K.: Prospects of computer vision automated grading and sorting systems in agricultural and food products for quality evaluation. Int. J. Comput. Appl. **1**(4), 1–9 (2010)
- 9. Lessard, J., de Bakker, J., McHugh, L.: Development of ore sorting and its impact on mineral processing economics. Miner. Eng. **65**, 88–97 (2014)
- 10. Kleiv, R.A.: Pre-sorting of asymmetric feeds using collective particle ejection. Physicochem. Probl. Miner. Process. **48**(1), 29–38 (2012)
- 11. Batchelor, A., Ferrari-John, R., Katrib, J., Udoudo, O., Jones, D., Dodds, C., Kingman, S.: Pilot scale microwave sorting of porphyry copper ores: part 1-Laboratory investigations. Miner. Eng. **98**, 303–327 (2016)
- 12. Cubero, S., Aleixos, N., Moltó, E., Gómez-Sanchis, J., Blasco, J.: Advances in machine vision applications for automatic inspection and quality evaluation of fruits and vegetables. Food and Bioprocess Technol. **4**(4), 487–504 (2011)
- 13. Gruna, R., Beyerer, J.: Feature-specific illumination patterns for automated visual inspection. In: Proceedings IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Graz, Austria (May 2012)
- 14. Ferreira, T., Sesmat, S., Bideaux, E., Sixdenier, F.: Experimental analysis of air jets for sorting applications. In: 8th FPNI Ph.D. Symposium on Fluid Power, American Society of Mechanical Engineers, V001T01A007 (2014)
- 15. Maier, G., Pfaff, F., Pieper, C., Gruna, R., Noack, B., Kruggel-Emden, H., Längle, T., Hanebeck, U.D., Wirtz, S., Scherer, V., Beyerer, J.: Improving material characterization in sensor-based sorting by utilizing motion information. In: OCM 2017 - Optical Characterization of Materials, KIT Scientific Publishing (2017, in press)
- 16. Maier, G., Pfaff, F., Pieper, C., Gruna, R., Noack, B., Kruggel-Emden, H., Längle, T., Hanebeck, U.D., Wirtz, S., Scherer, V., Beyerer, J.: Fast multitarget tracking via strategy switching for sensor-based sorting. In: 2016 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pp. 505–510 (Sept 2016)
- 17. Vasconcelos, C.N., Rosenhahn, B.: Bipartite graph matching on GPU over complete or local grid neighborhoods. In: International Work-Conference on Artificial Neural Networks, Springer, pp. 425–432 (2011)
- 18. Zhu, P., Zhang, C., Li, H., Cheung, R.C., Hu, B.: An FPGA-based acceleration platform for auction algorithm. In: IEEE International Symposium on Circuits and Systems, IEEE, pp. 1002–1005 (2012)
- 19. Date, K., Nagi, R.: GPU-accelerated Hungarian algorithms for the Linear Assignment Problem. Parallel Comput. **57**, 52–72 (2016)
- 20. Pieper, C., Maier, G., Pfaff, F., Kruggel-Emden, H., Wirtz, S., Gruna, R., Noack, B., Scherer, V., Längle, T., Beyerer, J., et al.: Numerical modeling of an automated optical belt sorter using the discrete element method. Powder Technol. **301**, 805–814 (2016)
- 21. Cundall, P.A., Strack, O.D.: A discrete numerical model for granular assemblies. Geotechnique **29**(1), 47–65 (1979)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.