Synonyms
Definition
The testing and evaluation of biometric systems is a complex task. Its difficulties include choosing the number and type of individuals who will participate in testing, determining the amount of data to collect, and selecting an appropriate set of individuals from whom to collect biometric data.
Introduction
The assessment of a biometric system’s matching performance is an important part of evaluating such a system. A biometric implementation is an ongoing process and as such will be treated as a process in the sense of Hahn and Meeker [1]. Thus, any inference regarding that process will be analytic in nature rather than enumerative as delineated by Deming [2]:
An enumerative study has for its aim an estimate of the number of units of a frame that belong to a specified class. An analytic study has for its aim a basis for action on the cause-system or the process, to improve product of the future.
Here the focus is on determining the amount and type of data necessary for assessing the current matching performance of a biometrics system.
The matching performance measures most commonly considered important are the false match rate (FMR) and the false non-match rate (FNMR). An important part of designing a test of a biometrics system is determining, before data collection is complete, how much testing will be done. Calculations that explicitly determine the amount of biometric data to be sampled are described below. As with any calculation of this kind, some estimates about the nature of the process must be made beforehand; without them, it is not possible to determine the amount of data to collect. These sample size calculations are derived to achieve a specified level of sampling variability. It is important to recognize that any data collection process has other potential sources of variability.
Selection of the individuals from whom biometric samples will be taken is another difficulty, because the samples must be representative of the matching and decision-making process. The goal of any data collection should be to take a sample that is as representative as possible of the process about which inference will be made. Ideally, some probabilistic mechanism should be used to select individuals from a targeted population. In reality, because of limitations of time and cost, this is difficult to do, and the result is often a convenience sample, Hahn and Meeker [1].
Test Size Calculation
Determining the amount of biometric information to collect is an ongoing concern for the evaluation of a biometrics system. Several early attempts to address this problem include those by Wayman [3, 4] as well as the description in Mansfield and Wayman [5] of the “Rule of 3” and the “Rule of 30”. The former is due to several authors including Louis [6] as well as Jovanovic and Levy [7], while the latter, the so-called Doddington’s Rule, is due to Doddington et al. [8]. Mansfield and Wayman note that neither of these approaches is satisfactory since they assume that error rates are due to a “single source of variability”, which is not generally the case with biometrics. Ten enrolment-test sample pairs from each of a hundred people is not statistically equivalent to a single enrolment-test sample pair from each of a thousand people, and will not deliver the same level of certainty in the results.
Effectively, the use of either the “Rule of 3” or the “Rule of 30” requires the assumption that the decisions used to estimate error rates are uncorrelated. More recently, Schuckers [9] provided a method for dealing with the issue of the dual sources of variability and the resulting correlations that arise from this structure.
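To make the independence assumption concrete, the following is a minimal sketch of the “Rule of 3” as it is commonly stated: if no errors are observed in N independent trials, an approximate 95% upper confidence bound on the error rate is 3/N. The function name and the comparison with the exact bound are illustrative choices, not taken from the references above.

```python
def rule_of_three_bound(n_trials, confidence=0.95):
    """Upper confidence bounds on an error rate when zero errors are
    observed in n_trials trials that are assumed to be independent.

    Returns (exact, approx): the exact bound solves
    (1 - p) ** n_trials = 1 - confidence, and approx is 3 / n_trials,
    which is intended for the default 95% confidence level.
    """
    exact = 1.0 - (1.0 - confidence) ** (1.0 / n_trials)
    approx = 3.0 / n_trials
    return exact, approx


exact, approx = rule_of_three_bound(300)
print(f"exact bound: {exact:.5f}, rule-of-3 bound: {approx:.5f}")
```

Both bounds hold only when every decision is independent of every other one, which is exactly the assumption that fails when several decisions come from the same comparison pair.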
The calculation given below determines the number of comparison pairs, n, from which samples need to be taken. Define a comparison pair, similar to the enrolment-test sample pair of Mansfield and Wayman [5], as a pair of possibly identical individuals from whom biometric data or images have been taken and compared. If the two individuals are the same, call the comparison pair a genuine one. If the two individuals are distinct, call the comparison pair an imposter one. To obtain a sample size calculation from this information, it is necessary to specify some estimates of the process parameters before the data collection is complete. It is worth noting that most other biological and medical disciplines use such calculations on a regular basis and that the U.S. Food and Drug Administration requires them for clinical trials. Approaches to carrying this out are discussed below.
The correlation structure assumed here is based on the idea that there will be correlations between decisions made on the same comparison pair but not between decisions made on different comparison pairs. Thus, conditional upon the error rate, there is no correlation between decisions on the ith comparison pair and decisions on the i′th comparison pair when i ≠ i′. The degree of correlation is summarized by ρ. This is not the typical Pearson’s correlation coefficient; rather, it is the intra-class correlation or, here, the intra-comparison pair correlation. More details can be found in Schuckers [10].
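The effect of ρ on the information in a test can be sketched with the usual variance inflation factor for intra-class correlation, 1 + (m − 1)ρ, where m is the number of decisions per comparison pair. That form, the assumption of an equal m for every pair, and the function name below are assumptions made for illustration.

```python
def effective_decisions(n_pairs, m, rho):
    """Approximate number of independent decisions that carry the same
    information as n_pairs comparison pairs with m decisions each.

    Assumes an equal number of decisions per pair and the standard
    variance inflation factor 1 + (m - 1) * rho.
    """
    inflation = 1.0 + (m - 1) * rho
    return n_pairs * m / inflation


# Both designs below yield 1000 decisions in total.
print(effective_decisions(100, 10, 0.4))   # ten decisions from each of 100 pairs
print(effective_decisions(1000, 1, 0.4))   # one decision from each of 1000 pairs
```

Under these assumptions, ten decisions from each of a hundred pairs are worth far fewer than a thousand independent decisions, which is the point Mansfield and Wayman make about the two designs not being statistically equivalent.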
Illustration of the use of (7)
α | B | γ | m | ρ | n
---|---|---|---|---|---
0.05 | 0.005 | 0.01 | 10 | 0.4 | 700
0.05 | 0.01 | 0.01 | 10 | 0.4 | 175
0.01 | 0.005 | 0.01 | 10 | 0.4 | 1,209
0.05 | 0.005 | 0.02 | 10 | 0.4 | 1,386
0.05 | 0.005 | 0.01 | 5 | 0.4 | 792
0.05 | 0.005 | 0.01 | 10 | 0.1 | 290
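The tabulated values of n can be reproduced with a short calculation. Equation (7) is not shown in this excerpt, so the sketch below assumes it has the usual form for a confidence interval on a correlated proportion, n = ⌈z² γ(1 − γ)(1 + (m − 1)ρ) / (m B²)⌉, with z the 1 − α/2 standard normal quantile. It also assumes that α is the significance level, B the desired margin of error, and m the number of decisions per comparison pair. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def comparison_pairs_needed(alpha, B, gamma, m, rho):
    """Number of comparison pairs for a 100(1 - alpha)% interval of
    half-width B around an error rate gamma, with m decisions per pair
    and intra-comparison pair correlation rho (assumed form of (7))."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    inflation = 1.0 + (m - 1) * rho
    n = z ** 2 * gamma * (1.0 - gamma) * inflation / (m * B ** 2)
    return math.ceil(n)


# Rows of the table: (alpha, B, gamma, m, rho)
rows = [
    (0.05, 0.005, 0.01, 10, 0.4),
    (0.05, 0.01, 0.01, 10, 0.4),
    (0.01, 0.005, 0.01, 10, 0.4),
    (0.05, 0.005, 0.02, 10, 0.4),
    (0.05, 0.005, 0.01, 5, 0.4),
    (0.05, 0.005, 0.01, 10, 0.1),
]
for row in rows:
    print(row, comparison_pairs_needed(*row))
```

Under these assumptions the six rows give 700, 175, 1209, 1386, 792 and 290, matching the table. The rows also show the sensitivities: halving the precision requirement (doubling B) cuts n by a factor of four, while a lower ρ or a smaller error rate γ reduces n roughly in proportion.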
Equation (7) gives the number of comparison pairs to test directly when γ = FNMR. It is less direct when interest centers on γ = FMR. This is because for FNMR the number of comparison pairs translates to the number of individuals, while for FMR the number of comparison pairs is not proportional to the number of individuals. If all cross-comparisons are used to estimate FMR, then one can replace n with n∗(n∗ − 1) in (7), where n∗ is the number of individuals that need to be tested.
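The substitution for FMR can be sketched as follows: given a required number of imposter comparison pairs n, find the smallest number of individuals n∗ for which n∗(n∗ − 1) is at least n. The function name is hypothetical, and the sketch assumes that every ordered cross-comparison between distinct individuals is used.

```python
import math


def individuals_for_fmr(n_pairs):
    """Smallest n_star such that n_star * (n_star - 1) >= n_pairs,
    assuming all ordered cross-comparisons between distinct
    individuals are used to estimate FMR."""
    # Start at or just below the positive root of x * (x - 1) = n_pairs,
    # then step up until the requirement is met.
    n_star = (1 + math.isqrt(1 + 4 * n_pairs)) // 2
    while n_star * (n_star - 1) < n_pairs:
        n_star += 1
    return n_star


print(individuals_for_fmr(700))
```

Because the number of cross-comparisons grows roughly with the square of the number of individuals, far fewer individuals are needed for FMR than for FNMR at the same n.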
Sample Selection
Once the number of individuals to be selected is determined, another important step is to specify the target population of individuals to whom statistical inference will be made. Having done so, a sample would ideally be drawn from that group. However, this is often not possible. The next course of action is to specify a sample that is as demographically similar to the target population as possible. The group of individuals that will compose the sample is often referred to as the “volunteer crew” or simply the “sample crew”, Mansfield and Wayman [5]. The more similar the sample crew is to the target population, the more probable it is that the estimates based upon the sample crew will be applicable to the target population. Often the sample crew is chosen to be a convenience sample, Hahn and Meeker [1]. Methodology for best selecting the sample crew is an open area of research in biometrics.
Summary
Testing and evaluation of biometric devices is a difficult undertaking. Two crucial elements of this process are the selection of the number of individuals from whom to collect data and the selection of those individuals. The number of individuals to test can be calculated from (7). To obtain this number, some process quantities need to be specified. These specifications can be based on previous studies, pilot studies, or qualified approximations. Selection of the “crew” for a study is a difficult process. Ideally, a sample drawn from the target population is best, but a demographically similar “crew” is often more attainable. The inference from a demographically similar crew can be improved by the use of poststratification.
Related Entries
Influential Factors to Performance