1 Introduction

The human body, its behaviours and interactions, and any material or clothing attached to it are rich sources of information for identification. These sources can be used for feature-based recognition in surveillance and for retrieval of a probe from a larger group present in a gallery or database [45, 64, 98, 133].

The set of features used can be intrusive, non-intrusive or both. This survey focuses on non-intrusive features, as they provide seamless recognition and retrieval [119]. However, both types of features are categorized as biometrics [29].

The field of soft biometrics has emerged in recent years as a potential alternative to traditional biometrics [29]. There are multiple reasons behind this, such as the non-intrusive nature of the features or traits [45], independence at modality and feature level [188], the presence of a semantic description for each individual trait [49] and, finally, a seamless method for recognition and retrieval [46]. However, the field still faces several challenges, and there are a number of gaps that need to be filled before it can be declared a replacement for traditional biometrics.

Due to the importance of the topic and recent advancements, several surveys have attempted to summarize the key developments in the field of soft biometrics. In [19], the goal is to explore and analyze the effectiveness of facial soft biometrics over traditional face recognition systems in challenging realistic environments where variation in pose, expression, occlusion etc. is higher. The objective of using facial soft biometrics is to facilitate face recognition systems or to develop a standalone facial soft biometrics recognition system. The paper summarizes techniques describing different types of facial soft biometrics such as facial marks, geometric measures, color, gender and ethnicity, along with the feature extraction and classification models used for this purpose. In [76], two commercial off-the-shelf systems were used to extract six facial soft biometrics, i.e. gender, age, ethnicity, moustache, beard and glasses. As a prerequisite step, multiple techniques for facial soft biometrics extraction, fusion of soft and hard biometrics, and search space reduction were compared. In another work [71], techniques to extract facial soft biometrics, i.e. ethnicity, eye color, hair color, skin color and gender, were explored and compared for continuous authentication during an online session. The system includes a traditional biometrics authentication process.

The field of soft biometrics is not limited to the facial region; it extends beyond the face to the body, including limbs, and to material attributes such as clothing and accessories. In [46], a bag of soft biometrics from three modalities, i.e. face, body and clothing, is presented. This collection also defines the nature of the attribute value, permanence, discrimination power and suggestiveness for each soft biometric. One of the most comprehensive surveys of the soft biometrics domain [45] discusses how soft biometrics transformed from an ancillary component of hard biometrics systems into standalone soft biometrics systems. It discusses the different soft biometrics modalities, a small set of features in each modality, a taxonomy, and techniques for gender, age and ethnicity recognition published before 2016.

The field of soft biometrics is gaining more and more attention over traditional biometrics due to several factors, including its non-intrusive authentication nature. Standalone experimental soft biometrics systems have been developed for unconstrained environments. Our survey takes a comprehensive approach to the evaluation and analysis of soft biometrics systems developed in the latter half of the current decade, following on from [45]. The survey highlights the effectiveness of soft biometrics over traditional biometrics based upon certain factors, redefines the structure of modalities and provides a bag of soft biometrics that is more than 10 times larger, whereas the authors of [46] focus only on a limited number of traits. The survey also describes annotation methods and types, and summarizes the datasets annotated for the evaluation of soft biometrics systems. To support the development of robust soft biometrics systems, the survey analyzes four critical factors affecting individual soft traits. A comparison of the most recent standalone soft biometrics applications developed for authentication using modalities from the whole human body, rather than the face only [19, 71, 76], is part of the survey too. The most recent estimation techniques for the global soft biometrics, i.e. gender, age and ethnicity, are compared both in a hybrid framework and independently. Moreover, soft biometrics estimation and classification using selfie and ocular images is also covered. Finally, a list of open challenges and possible solutions based upon the analysis in the survey is presented as a key contribution.

1.1 Soft biometrics: non-intrusive biometrics

Typically, intrusive features are fingerprints [37], retina scans [186] etc. They are referred to as traditional or hard biometrics [188]. Non-intrusive features are called soft biometrics. A large number of non-intrusive features are present in the whole human body. These are demographic, global, anthropometric, material, behavioural, medical etc. [45, 188]. This set of soft biometric features can be estimated or extracted from the whole human body, not from the face or head only. The whole human body includes limbs, clothing, style etc. [133].

The origins of soft biometrics can be traced back to the 19th century, when the Bertillon system was used for criminal identification based on a suspect's physical description [134]. The attributes used to quantify the physical description were termed anthropometric measurements, which include head length, head breadth, length of the middle finger, length of the left foot and length of the cubit. These attributes were classified as body geometry and face geometry, both of which were supplemented with a mug shot of the individual. A detailed review of soft biometric features derived from the Bertillon system can be found in [45, 46, 133].

Despite the success of the Bertillon system, such features proved difficult to generalise, and in the early part of the 20th century the attention of the scientific community turned to hard biometrics, with fingerprinting still the primary source of person identification within criminal justice systems. Until recently, the study of soft biometrics was considered only an ancillary research domain of biometrics, with research focused on hybrid identification frameworks that investigated the fusion of intrusive and non-intrusive features. Such a fusion framework performs identification in a group [213] or continuous authentication during an online session [14, 149]. The overall aim was better recognition or retrieval [214].

One of the major reasons for using soft biometrics alone is seamless recognition and retrieval; this type of system is called a standalone soft biometrics system. Seamless recognition relies on non-intrusive features, which are easier to estimate even in unconstrained environments. That is why it is essential to look for non-intrusive features present in the different modalities of the human body.

1.2 Soft biometrics: modalities and features

Similar to traditional biometrics, soft biometrics are a set of features extracted from the whole human body [71].

The human face or head, its appearance and structure, and the body including limbs are permanent modalities [214]. Any covering material, such as clothing, is a temporary modality [134]. Fig. 1 presents an overview of soft biometrics modalities and traits. The global, face and body modalities are permanent, while clothing is temporary. Each modality contains features or traits. The number of features in each modality is potentially unbounded and depends upon the scenario. To build understanding, we present a few soft traits from each modality; a comprehensive list of soft traits can be found in Tables 4, 5 and 6.

Fig. 1 Soft Biometrics Modalities and Features Present in Human Body

1.3 Soft biometrics: semantic description

Each modality of the human body thus contains a rich set of soft traits, and there is a need to provide a semantic description for each soft trait. The semantic description is the real-world definition of a soft trait, and it mostly depends upon the scenario, either recognition or retrieval [49].

The way human beings recognize each other in real life is interesting to study. Suppose a person is male, short, thin-faced and wearing a black coat. This is the semantic description for the soft traits gender, height, face type and clothing. It is the categorical method of describing an individual semantically. Using such attribute values, an individual can also be retrieved from a group [134].

The other way of identification is to compare a subject against a set of subjects, for example, comparing a probe against each subject in the gallery or database using soft traits [134]. In soft biometrics, both categorical and comparative methods of semantic description are in practice. These are referred to as qualitative descriptions [166].

To find a match for the probe in the gallery, or to identify a subject in a surveillance scenario, we need some sort of quantitative description of soft traits. To date, soft biometrics-based recognition and retrieval has focused more on qualitative descriptions, i.e. categorical or comparative. On the other hand, there are quantitative methods such as anthropometric and geometric measurements. These are hard to estimate in real-world scenarios, but the experimental outcomes are good enough [87]. Indeed, the quantitative method is a much better solution for improving recognition or retrieval accuracy.

Table 1 presents possible semantic descriptions for a few soft traits. Consider, for example, the height of a human subject. It can be described using all three types of annotation, i.e. categorical, comparative and absolute, which matches real-world observation. Gender is categorical only, either male or female, which again reflects real-world observation. However, the Inter Eye Distance is not categorical but comparative and absolute; in real-world perception, it is hard to define in categorical terms.

Table 1 Some soft traits and possible semantic descriptions
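
The three description types discussed for height can be illustrated with a minimal Python sketch. The function names, the centimetre thresholds and the tolerance are hypothetical values chosen only for illustration; they are not taken from the surveyed works.

```python
# Illustrative sketch: deriving categorical and comparative descriptions
# from an absolute height measurement. All thresholds are assumptions.

def categorical_height(height_cm, short_below=160.0, tall_from=180.0):
    """Map an absolute height (cm) to a categorical label."""
    if height_cm < short_below:
        return "short"
    if height_cm >= tall_from:
        return "tall"
    return "medium"


def comparative_height(probe_cm, other_cm, tolerance_cm=2.0):
    """Describe the probe relative to another subject."""
    difference = probe_cm - other_cm
    if abs(difference) <= tolerance_cm:
        return "same"
    return "taller" if difference > 0 else "shorter"


probe_height_cm = 155.0      # absolute description (a measurement)
gallery_height_cm = 175.0

probe_category = categorical_height(probe_height_cm)
probe_vs_gallery = comparative_height(probe_height_cm, gallery_height_cm)
```

Here the absolute value is the measurement itself, the categorical label places the subject in a pre-defined class, and the comparative label is only meaningful relative to a second subject.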

1.4 Soft biometrics: a seamless solution for recognition

By definition, “Soft Biometrics traits are physical, behavioural or adhered human characteristics, classifiable in pre–defined human compliant categories. These categories are, unlike in the classical biometric case, established and time–proven by humans with the aim of differentiating individuals. In simple words, the soft biometric traits instances are created in a natural way, used by humans to distinguish their peers” [11, 46].

One of the major reasons to adopt soft biometrics is authentication in a seamless manner, as depicted in [14, 132, 149], for example, during an online session or exam. On the other hand, the set of soft biometric features is increasing in number. This is another motivation for the development of standalone soft biometrics recognition or retrieval systems [45].

Applications of standalone soft biometrics recognition or retrieval have been tested over the years in surveillance [164, 180], social robotics [47], IoT [195], social media and mobile authentication [72, 129]. These factors are a strong motive to use soft biometrics for seamless authentication. Such diverse applications underline the importance of adopting soft biometrics in public spaces for enhanced security.

A conceptual framework for the implementation of a soft biometrics based recognition system is presented in Fig. 2. The soft traits that can be estimated include height, spectacles, ethnicity, gender and age. Some of these are permanent while others are temporary.

Fig. 2 A Conceptual Framework of Seamless Recognition

1.5 Contributions and organization of paper

In this study, we focus on research carried out for the development of standalone soft biometrics recognition or retrieval systems only. There is particular emphasis on literature from 2015 onwards. The latest surveys [45, 133] have already covered the emergence of soft biometrics, its historical perspective, types of modalities, features and fusion with hard biometrics in detail.

We have therefore summarized what has changed in soft biometrics research since the previous surveys. This also includes a discussion of several critical challenges. To provide a comprehensive view of the field, we have identified key problems and success stories too. This survey thus offers a more up-to-date analysis of the field. It makes the following contributions.

  1. This paper summarizes the datasets used in soft biometrics based recognition or retrieval, including their volume, subject and environmental diversity, annotation methods and annotation types. We also built what is, to the best of our knowledge, the largest novel collection of soft biometric features, drawn from different research experiments.

  2. To improve overall recognition or retrieval performance, the study analyzes four critical factors: attribute correlation, distance, attribute permanence score and discrimination power. These factors directly affect overall recognition or retrieval performance.

  3. The paper also compares both modality and feature level fusion frameworks from different research experiments. It reports that fusion of certain modalities and features improves performance.

  4. Finally, the most recent hybrid soft biometrics based recognition or retrieval systems are compared using a multi-scale criterion. The paper also compares three global traits, i.e. gender, age and ethnicity, using the same multi-scale criterion, both in a hybrid framework and independently.

In our opinion, the aspects stated above affect the overall performance of any soft biometrics system, and it is critical to address them first. That is why we decided to explore and summarize the techniques tackling these challenges. Finally, we explored and listed the open challenges present in the field. Recommendations to cope with these challenges are discussed too.

The rest of the paper is organized as follows. The datasets, annotation methods, annotation types and bag of soft traits are presented in Section 2. Section 3 discusses key factors affecting soft traits. A comparison of feature and modality level fusion is also part of the same section. A multi-scale comparison of recognition or retrieval techniques for hybrid soft biometrics systems is presented in Section 4, together with the multi-scale comparison of global traits. In Section 5, we identify the open challenges present in the field and present possible recommendations. Section 6 concludes the paper by summarizing its key contributions.

2 Soft biometrics: datasets

It is a prerequisite to understand what type of dataset is used in a soft biometrics application, along with the dataset annotation method and type. Equally important is the diversity of the dataset, i.e. the number of different individuals, their number of image or video sessions, the session gap, the gender, age and ethnicity ratios, and the number of modalities and features used. Finally, the recording environment, constrained or unconstrained, must be considered [106].

These are the critical properties to consider. The accuracy of a standalone soft biometrics recognition system depends on these properties of a dataset. Before discussing dataset diversity, we first discuss the types of annotation processes and their outcomes.

2.1 Annotation processes and types

Various kinds of methods are practiced in research to annotate soft traits. The goal of each method is to improve recognition, and the outcome of each method is a set of labels for a soft trait, as in [76]. Table 2 summarizes the relationship between different types of annotations and methods. It is important to remember that the purpose of annotations is matching after automated estimation from a dataset, or recognition in surveillance.

Table 2 Annotations methods and types

Since the emergence of soft biometrics research, categorical and comparative annotations have been the most common outcomes of annotation methods such as expert opinion and crowdsourcing. In expert opinion, the entire dataset is annotated by an individual expert [113].

In crowdsourcing, which is a richer form of annotation in terms of annotators, a very large number of people from diverse backgrounds [219] perform the annotations. Each annotator is given training beforehand, but lack of experience and expertise can be an issue. Both categorical and comparative annotations are based on human perception, a qualitative measure.

For example, categorizing individuals or comparing them by their height yields qualitative terms under either the categorical or the comparative method. In soft biometrics research, a large number of datasets are annotated using these two methods and one or both types of annotations [173].

To improve soft biometrics based recognition, it is critical to annotate human body soft traits in quantitative terms. Methods to measure various geometrical attributes of the human body are needed for dataset annotation. Moreover, it is essential to explore estimation methods for surveillance scenarios too [87]. In a broader sense, these measured features used for the annotation of datasets are called absolute annotations or anthropometric features. They are geometric measurements of the human body with roots in history, i.e. the Bertillonage system for suspect identification [65].

As stated earlier, the most critical factor determining the accuracy of a soft biometrics recognition system is the diversity present in the dataset. To the best of our knowledge, no dataset has yet been developed specifically for the evaluation of soft biometrics recognition systems. However, different image or video based facial [104] and pedestrian datasets have been annotated [106]. These datasets cover the whole human body, including limbs and clothes. The Southampton University Tunnel dataset and its variants [7, 184] are good initiatives in this direction. In our work, we summarize these datasets along with their distinguishing properties in Table 3.

Table 3 Datasets: Attributes Count, Annotation Process, Type and Volume. Abbreviation: Respondents (R-nts), Responses (R-ses), Categorical (Cat), Comparative (Com), Subjects (Sub), Instances (Inst), Attribute (Att.), Crowdsource (C-Sou), Expert Opinion (E-Opi)

In [118], a newer method called super fine attribute annotation is proposed using crowdsourcing. The soft biometrics gender, age and ethnicity are re-annotated for the well-known Pedestrian Attribute (PETA) image dataset [48]. This large-scale dataset is a combination of 10 re-identification datasets. Each time, respondents were given an image and a 5-scale visual prototype for each trait, and were asked to match the image with the visual prototype. The 5-scale annotation type was categorical. This is perhaps the largest and most reliable annotation performed to date.

Although crowdsourcing is considered a more reliable way of annotating a dataset, it is not feasible for very large datasets, as the time and effort increase with dataset volume. Before crowdsourcing, expert opinion was the way to annotate datasets. It is done by an expert for a smaller number of soft traits and on small datasets. For example, in [194] and [193, 198], expert experience is exploited for soft traits-based dataset annotation.

It is evident from Table 3 that in all the expert opinion-based annotation scenarios, the numbers of distinct individuals and their images are in the few hundreds, except in [193], where it is about 1700 and therefore consumed a lot of time and effort. Moreover, in expert opinion, only absolute and categorical annotations are performed. The expert opinion method reflects human perception: the expert describes the individual in front of them in qualitative or quantitative terms. There is only one recognition scenario [76], using the Labelled Faces in the Wild (LFW) dataset, in which a larger dataset of images from everyday life is available, but the number of features to be annotated is very small, i.e. 6.

Overall, crowdsourcing has been the most dominant way of annotating datasets. The annotations are performed for both types, i.e. categorical and comparative. Several datasets, such as the Southampton Multibiometric Tunnel DB [193] and [115], Soton Gait DB [184], LFW-MS4 [91], and their subsets and modified versions [174], have been annotated using both categorical and comparative annotations. However, the annotation method, the number of respondents, the number of responses received and the number of soft biometrics annotated vary between them.

It is important to note that most of these datasets cover whole human body modalities, i.e. face, body including limbs, clothing etc. In 9 different recognition experiments [7, 9, 81, 95, 97, 116, 121, 165, 166] on 5 different datasets, the complete dataset or a subset is annotated using crowdsourcing for evaluation. They all contain balanced gender and ethnicity ratios across varying age groups.

The number of respondents involved in the annotation process is surprisingly varied, ranging from fewer than 100 to more than 3000. The outcome is a very large number of annotations received for distinct features, as in [116] and [7]. Importantly, these datasets cover almost the whole human body [95, 97]. That is why the amount of time and effort needed to annotate these large-scale datasets is much higher, despite the annotations being qualitative.

2.2 A novel collection/bag of soft traits

The datasets presented in Table 3 are used in multiple research experiments, ranging from recognition to retrieval [19]. Annotating the dataset is a preliminary step before using it for a specific type of research experiment. Consequently, the list of soft traits used in these experiments is very long.

In our study, we report a collection of more than 170 soft traits used in one or more research experiments, with one or more annotation types and processes. To the best of our knowledge, this is the largest novel collection of soft traits to date, larger than those in [46, 165, 209] and [97]. These soft traits are from different permanent and temporary modalities of the human body.

Soft traits from the temporary modality, i.e. clothing, are covered in Table 4, while Tables 5 and 6 cover the permanent modalities, i.e. face or head and body. The occurrence of a particular soft trait in various types of research experiments shows its potential to become part of a standalone soft biometrics recognition system.

Table 4 Soft traits from temporary modality (clothing/material)
Table 5 Soft traits from permanent modality (face or head)
Table 6 Soft traits from permanent modality (body)

We have also explored and summarized the types of annotations used for each soft trait, whether in a single experiment or in more than one. No annotation type can be perfect when compared with real-world observation. However, the number of different annotation types used for a specific soft trait not only shares the experience of different scenarios but also opens a dimension for evaluating each annotation type in real-world terms.

It is evident from Tables 4, 5 and 6 that 35 attributes are used in different research experiments more than three times and up to 6 times, while almost four times as many attributes are used twice or less. Leaving aside the traits used twice or less, we still have a large group of soft biometrics to consider as potential candidates for a more accurate standalone soft biometrics system. A higher occurrence of a soft trait in research experiments is one parameter for considering it a potential candidate in recognition. However, several soft traits are used only once but have been evaluated in challenging recognition scenarios. That is why it would be wise to consider the scenario before choosing any particular soft trait.

3 Soft biometrics: critical factors affecting individual soft traits

The accuracy of a standalone soft biometrics recognition system depends on several properties of its traits. These are genuine concerns associated with traits, although techniques are being developed to cope with them. We discuss the following four critical factors, i.e. attribute correlation, distance, permanence or stability and discrimination power, along with feature or modality level fusion. These factors directly affect the overall performance of any soft biometrics recognition or retrieval system.

3.1 Attribute correlation

In the previous section, we explored and summarized commonly annotated soft biometrics datasets. More importantly, a large collection of commonly used soft traits was presented. This collection of features is highly significant for the development of improved recognition systems, which is a main objective of our research too.

With this main objective in mind, we decided to explore and summarize the techniques for finding correlation between certain attributes, as performed in several earlier research experiments. Finding correlation between soft traits not only limits the size of the feature set but also allows correlated features to work in conjunction for better recognition. There are several techniques for finding correlation between features; we discuss a few of them.

The most commonly used technique for finding correlation between features is the Pearson Correlation Coefficient [26]. It measures the linear association between two variables and ranges from -1 to +1. A value of zero indicates no linear association, values approaching +1 indicate an increasingly strong positive association, and values approaching -1 indicate an increasingly strong negative association.
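
As a minimal sketch of how such a coefficient can be computed between two soft traits, the following Python snippet implements the standard Pearson formula directly. The binary annotations for eight subjects are hypothetical and only serve to illustrate a strongly correlated pair and an uncorrelated pair; they are not data from [76] or any other surveyed experiment.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical binary annotations (1 = male / trait present, 0 = otherwise).
gender = [1, 1, 1, 1, 0, 0, 0, 0]
beard = [1, 1, 1, 0, 0, 0, 0, 0]
glasses = [1, 0, 1, 0, 1, 0, 1, 0]

r_gender_beard = pearson(gender, beard)      # strong positive association
r_gender_glasses = pearson(gender, glasses)  # no linear association
```

With these toy labels the Gender-Beard coefficient is about 0.77, while Gender-Glasses is 0, matching the interpretation of the scale given above.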

To improve the accuracy of a recognition system, the correlation between two or more soft biometrics can be a significant input. For example, a male adult may have a moustache, whereas females and children most likely do not. It is evident from research that a certain amount of correlation exists between two or more soft biometrics, and that this correlation is significant for more accurate recognition. For instance, in [76], the Pearson Correlation Coefficient is computed between 6 soft biometrics, i.e. moustache, beard, glasses, ethnicity, age and gender.

To elaborate the concept, we picked three different pairs of soft traits from four distinct experiments and summarized the outcome of the correlation analysis using the Pearson method in Fig. 3. Samples of highly correlated attributes, i.e. Gender-Beard, moderately correlated attributes, i.e. Age-Glasses, and negatively correlated attributes, i.e. Age-Gender, from [76] are shown. Similarly, the Pearson Correlation Coefficient is used in another experiment but on a different dataset of 23 soft traits, comprising global traits and traits from the head and body regions, for 58 distinct subjects [193]. The correlation outcome for three sample pairs of soft traits, i.e. Ethnicity-Skin Color, Gender-Ethnicity and Gender-Hair Length, is presented in Fig. 3 too.

Fig. 3 Pairs of Soft Biometrics from Head and Body (Higher Positive Correlation to Higher Negative Correlation)

The high positive correlation between Ethnicity-Skin Color also holds in terms of real-world observation, while the slight negative correlation of Gender-Ethnicity can be ignored. However, it is important to note that there is a mismatch between the computed correlation and real-world observation in the case of Gender-Hair Length. The computation suggests that females have longer hair, but this is not true in every part of the world. This is one dimension of diversity in the proposed dataset that we intend to highlight.

In two more related experiments [7, 194], the Pearson Correlation Coefficient is computed on a higher number of soft biometrics, i.e. 24 in each. The Southampton Biometric Tunnel DB was used in the first experiment, while the ATVS Forensic DB and MORPH DB were used in the second. The objective was to observe the correlation behaviour as the number of attributes increases. From the analysis, the most significant correlations were found between Figure-Face Width and Jaw Shape-Face Structure.

Interestingly, some higher correlation exists between Figure and body structures, as people observe in the real world too. However, certain traits show independence, perhaps leading towards discrimination power, such as Eye Size-Face Shape and Eyebrows Distance-Eyes Size, whose correlation coefficients tend towards the high negative, i.e. -0.6 and -1.0 respectively. This high negative correlation reflects real-world perception too. Only a very small positive correlation was found between Skin Color-Eyebrow Length and Face Size-Nose Size. The latter pair can be considered for joint feature estimation, as it looks proportional in the real world too.

In two more similar experiments, the Pearson Correlation Coefficient was computed using 21 and 17 clothing soft biometrics [95, 97], as shown in Fig. 4. In both cases, it was proposed that each attribute with a correlation coefficient higher than 0.5 is significant for recognition. It was observed that clothing attributes of the upper and lower body have higher correlation. Similarly, Skin Exposure and Clothing Season showed a higher positive correlation. The latter does not look true in terms of real-world observation. Moreover, the correlation between Style Category-Tattoos and Upper-Lower Color Scheme is not high enough to consider.

Fig. 4 Pairs of Soft Biometrics from Clothing (Higher Positive Correlation to Higher Negative Correlation)

The last pairs in both experiments yield a higher negative correlation. It is generally held that a higher coefficient means two or more attributes are more likely to be present simultaneously in one subject, while a lower correlation means the opposite. It is important to note that higher and lower are specific to a dataset: these coefficients change as the size of the dataset changes.

Like the Pearson Correlation Coefficient analysis, Kendall’s method [102] is also used to measure the strength of association between two variables or features. The scale used was from 0.1 to 1, where 0.1 indicates the least association and 1 indicates the highest association.
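
For readers unfamiliar with rank-based association, the following is a minimal Python sketch of Kendall's tau-b, the variant that corrects for tied ranks, which are common in ordinal soft-trait labels. Which variant and scaling the surveyed experiments used is not stated here, so the choice of tau-b is an assumption; note also that tau-b itself ranges from -1 to +1, whereas the 0.1 to 1 scale above is the one reported for the surveyed experiment. The ordinal labels below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (handles ties) of two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x = tied_y = 0
    # Compare every pair of subjects once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1      # both traits order the pair the same way
        elif dx * dy < 0:
            discordant += 1      # the traits order the pair oppositely
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one sequence is constant")
    return (concordant - discordant) / denominator


# Hypothetical ordinal labels (1 = lightest ... 5 = darkest) for six subjects.
skin_colour = [1, 2, 2, 3, 4, 5]
hair_colour = [1, 2, 3, 3, 4, 5]

tau_skin_hair = kendall_tau_b(skin_colour, hair_colour)
```

In this toy example 13 of the 15 subject pairs are concordant and none are discordant, giving a tau-b of 13/14, i.e. a strong association between the two traits.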

In an experiment to find the correlation between 12 soft biometrics [116], Kendall’s correlation method was used, as shown in Fig. 5. The attributes were gender, height, age, weight, figure, arm thickness, leg thickness, muscle build, chest size, skin colour, hair colour and hair length. A higher correlation is found between skin colour and hair colour, meaning that darker-skinned subjects have darker hair, which also looks true in real-world observation. Moreover, gender and height have a moderate association, while skin colour and chest size tend towards the least association. The latter two associations also reflect real-world perception.

Fig. 5 Pairs of Soft Biometrics from Head and Body (Highly Correlated to Non-Correlated)

3.2 Impact of distance in recognition

As with other vision-based recognition systems, distance affects the accuracy of estimating different soft biometrics. In an open recognition environment, it becomes a bigger challenge. We have investigated several research experiments that address this challenge. Table 7 presents the performance impact on soft traits estimation at three different distances.

Table 7 Effect of distance on soft traits estimation - lower the equal error rate (EER), higher the accuracy (Acc)

It is clear from Table 7 that distance is an important factor when estimating soft traits. For instance, in [81], a newer dataset of three modalities of the human body, i.e. face, body and clothing attributes, is developed. The dataset contains 10 features from each modality, and each feature is captured for every individual at three different distances, i.e. far, medium and close.

The key objective of this experiment is to investigate how distance influences recognition. To measure the sensitivity of each soft biometric to distance, the Pearson Correlation Coefficient [26] is applied to each trait in three groups, i.e. Far-Medium, Far-Close and Medium-Close. It was observed that most clothing traits and many body traits are less sensitive to varying distance, while facial traits are highly sensitive. In simple words, clothing and body traits are easier to measure from a far distance than facial traits.
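The pairwise comparison described above can be sketched as follows. This is only an illustrative outline: the function names and the sample annotation values are assumptions, not the procedure or data of [81]. A coefficient close to 1 for a pair of distances suggests that the trait is annotated consistently across them, i.e. it is less sensitive to distance.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one sequence is constant
    return cov / sqrt(var_x * var_y)


def distance_sensitivity(far, medium, close):
    """Correlate one trait's annotations across the three distance pairs."""
    return {
        "Far-Medium": pearson(far, medium),
        "Far-Close": pearson(far, close),
        "Medium-Close": pearson(medium, close),
    }


# Hypothetical labels of one trait for five subjects at each distance.
far = [1, 2, 3, 4, 5]
medium = [1, 2, 3, 5, 4]
close = [2, 1, 3, 4, 5]
scores = distance_sensitivity(far, medium, close)
```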

Earlier, in a similar work [193], recognition accuracy was analysed while the distance between subject and camera changed continuously. Although the proposed soft biometrics system worked in conjunction with a face recognition system, the input was at three levels, i.e. far, medium and close. A total of 23 features were used, divided into three categories, i.e. head, body and global. The recognition was evaluated solely on the basis of soft biometrics. Again, the bodily traits form the largest set and presented a lower EER at far distance.

To recognize gender from both modalities, i.e. face and body, a comparison is performed at three different distances: close, medium and far [75]. The objective was to analyse from which modality gender is easier to estimate at far distance. The study found that, at far distance, gender is estimated more accurately from the body than from the face; however, a full-body image must be available.

3.3 Permanence or stability score and discrimination power

To develop a standalone soft biometric recognition or retrieval system, it is critical to find a set of permanent and discriminating features. Any recognition system benefits from a limited and highly relevant set of features, and the same is true for soft biometrics recognition systems. However, the features should have a high permanence score and discrimination power. Several experiments use mathematical and statistical methods to compute these properties for a soft trait. Table 8 presents several techniques for computing permanence score and discrimination power.

Table 8 Permanence/stability score and discrimination power of soft traits

In [9], a subset of the LFW-MS4 image dataset is used with comparative annotations. The Pearson Correlation Coefficient [26] is applied in both visual and semantic space, and the permanence score is computed for 24 facial attributes. It is observed that several attributes have more permanence in visual space than in semantic space.

In another experiment, statistical methods, i.e. mean and mode, are applied to two well-known facial datasets, i.e. ATVS Forensic DB and MORPH DB [194]. Each of these datasets has 32 continuous and 24 discrete attribute annotations. The permanence score is computed for each type of attribute. Moreover, the discrimination power of each continuous and discrete annotated attribute is also computed, using a mathematical formulation of the ratio between inter- and intra-subject variability.
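The exact formulation used in [194] is not reproduced here. As a rough sketch of the idea, a ratio between inter- and intra-subject variability can be written as the variance of the per-subject means divided by the average within-subject variance. The function name, the choice of population variance and the sample values below are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def discrimination_power(samples_by_subject):
    """Ratio of inter-subject to intra-subject variability for one trait.

    samples_by_subject maps a subject id to that subject's repeated
    measurements of the same trait (for example across sessions).
    """
    subject_means = [mean(values) for values in samples_by_subject.values()]
    inter = pvariance(subject_means)  # spread between subjects
    intra = mean([pvariance(values) for values in samples_by_subject.values()])
    if intra == 0:
        return float("inf")  # perfectly stable trait within every subject
    return inter / intra


# Hypothetical repeated measurements of one continuous facial attribute.
measurements = {"s1": [10, 12], "s2": [20, 22]}
power = discrimination_power(measurements)
```

A larger ratio means that the trait varies more between subjects than within a subject, which is the property a discriminating trait should have.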

The permanence score and discrimination power of 23 soft biometrics from different modalities, i.e. face or head, body and global, are also computed. The statistical mode and the ratio between inter- and intra-subject variability are measured. The experiment was performed on the Southampton Multibiometric Tunnel DB [193], which has comparative annotations.

3.4 Soft biometrics: attributes and modalities fusion analysis

Soft biometrics is now a very large set encompassing features from the face or head, the body including limbs, clothing etc. [134]. Moreover, these traits are also rich in terms of taxonomy, i.e. demographic, geometric or anthropometric, medical, material and behavioural [45]. Earlier, a few soft biometrics were used to support traditional biometric recognition systems, which is called a fusion framework [213]. However, standalone soft biometrics systems have now started to evolve, and multiple types of soft biometrics recognition systems are being developed using different fusion architectures [201].

With this in mind, we investigated fusion frameworks built entirely on soft biometrics modalities and traits. We also explored which soft biometrics improve the performance of the overall fusion-based recognition system. Table 9 presents a comparative analysis of fusion frameworks.

Table 9 Fusion vs single feature/modality performance analysis, abbreviations: accuracy (Acc), equal error rate (EER), rank-1 identification rate (R-1 Id-R), Face/Head (F/H)

Many fusion frameworks have been developed for estimation based on soft traits. The fusion was at the modality level, the feature level, or both. For example, in [81], soft biometric traits from 3 different modalities, i.e. face or head, body and clothing, are fused for improved recognition. There were 30 attributes in total, 10 from each modality. First, the attributes from each modality were used independently for recognition on an image dataset. Then, three different fusion frameworks, i.e. Bayes theory, Likelihood Ratio Test (LRT) and Support Vector Machine-LRT, were used.

It was observed that all three fusion frameworks provided much better recognition than the individual modalities. Recognition also improved when images were taken at varying distances. For the fusion score computation, a vector was built for each modality using the mean and standard deviation of each attribute. On a similar dataset, with the same number and type of features, different fusion methods, i.e. PCA, LDA, gCCA and Sg-CCA, were applied. The results are presented in the form of equal error rate [80].

In another approach, shape and skin colour from the face and height and weight from the body are integrated in a fusion framework for person identification [18]. After estimation from the image, each of the four attributes was first tested for identification independently and then sequentially. Finally, a fusion framework of three attributes, i.e. Facial Shape, Height and Weight, was tested for identification using five different fusion methods. Each of the five methods achieved a rank-1 identification rate of at least 80 percent, while fuzzy logic was the most successful with 88 percent.
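For readers unfamiliar with the metric, the rank-1 identification rate is the fraction of probes whose closest gallery entry belongs to the correct identity. The sketch below assumes one feature vector per gallery identity and uses Euclidean distance as the matcher; the function names and sample vectors are hypothetical and do not reproduce the fusion methods of [18].

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank1_identification_rate(probes, gallery, distance=euclidean):
    """Fraction of probes whose nearest gallery identity is the true one.

    probes  : list of (true_id, feature_vector) pairs
    gallery : dict mapping identity -> enrolled feature_vector
    """
    hits = 0
    for true_id, vector in probes:
        predicted = min(gallery, key=lambda gid: distance(vector, gallery[gid]))
        hits += predicted == true_id
    return hits / len(probes)


# Hypothetical (shape, height in m, weight in kg) vectors, for illustration only.
gallery = {"a": [0.2, 1.70, 65.0], "b": [0.8, 1.85, 90.0]}
probes = [("a", [0.25, 1.71, 66.0]), ("b", [0.7, 1.84, 88.0])]
rate = rank1_identification_rate(probes, gallery)
```

In practice the attributes would be normalized before matching so that one attribute, such as weight, does not dominate the distance.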

To emphasise the significance of the fusion framework, gender was estimated from the face and body modalities separately and then from a fusion of both [75]. It was observed that when the framework takes both face and body images into account, the accuracy increases by about 4 to 5 percent, even at far distance. Similarly, 2 features from the face and 6 features from the body were used as single modalities. First the independent modalities and then a fusion framework were tested on the CASIA Gait and FACES datasets. The fusion proved better than the individual modalities [73].

In another approach, the shape, orientation and size of facial traits were exploited for recognition using images [194]. The experiments were conducted on two datasets, i.e. ATVS DB and MORPH DB. A total of 32 attributes with continuous values and 24 with discrete values were used, both independently and in a fusion framework. The results indicate that continuous features are better for recognition, as they have a lower equal error rate.

In another work, three modalities of the whole human body, i.e. face, body and clothing, were considered for soft biometric trait estimation in a recognition task [134]. Experiments were carried out to evaluate the performance of the fusion framework against the independent modalities. It was concluded that fusion of modalities gives a lower equal error rate than the individual ones. In both fusion frameworks tested, attributes with comparative labels gave a lower equal error rate than categorical ones. In addition, the fusion of face-body-clothing has a lower equal error rate than the fusion of body-clothing.

4 Soft biometrics: comparison of recognition and retrieval systems

One of the biggest problems in developing a standalone soft biometrics recognition system is the accurate extraction of soft traits from different human body modalities. This becomes more challenging in surveillance scenarios. The extracted value is also called a visual description, i.e. the machine description of a soft trait: an estimated or extracted value of a trait from an image or video.

Soft trait estimation followed by recognition amounts to detecting a specific person in an uncontrolled environment. These object detection techniques can be single or multi-stage [139, 140, 141]. Table 10 compares different approaches to soft biometrics based recognition or retrieval for detecting a specific person in an uncontrolled environment.

Table 10 Overview of soft biometrics recognition or retrieval systems

To explore and understand the milestones achieved, a large range of feature estimation and classification approaches are compared. Earlier, we explored a rich set of soft traits; now, the vision-based feature estimation and classification methods are summarized. The goal in each case is matching, for either recognition or retrieval.

Soft biometrics-based recognition has its roots in traditional facial recognition systems but focuses only on non-intrusive facial features, and the recognition should be performed in a seamless manner. We do not go into the details of facial recognition systems here; instead, we focus on experiments that directly propose models for seamless recognition. These non-intrusive features are not limited to the face but cover the whole human body, clothing etc.

In a recognition experiment using facial soft biometrics from images, a component-based approach is applied. It first localizes facial landmarks, then constructs components based on these landmarks, and finally generates a vector of visual features from these components. Active Shape Models (ASMs) [43], Active Appearance Models (AAMs) [42] and Constrained Local Models (CLMs) [44, 175] are used for landmark localization and facial component segmentation. Lastly, visual features are estimated using GIST [135]. The experiments were carried out on the LFW-MS4 image dataset, and an equal error rate of 12.71% was recorded, which indicates a high accuracy level.

In another experiment, recognition is performed using facial images from two different datasets, ATVS Forensic DB and MORPH DB [194]. This time, 21 facial landmarks are annotated manually by an expert and 11 attributes are computed automatically using geometric measurements. Then, from these manual and automatic attributes, a new set of 24 discrete facial attributes is generated. Finally, facial recognition is performed using different similarity measures such as Euclidean, Hamming and Mahalanobis. The output is presented in the form of equal error rate.
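Since many of the results in this section are reported as equal error rate, a minimal sketch of how it can be obtained from matching scores may be helpful. The sketch assumes distance-type scores (lower means more similar) and a simple threshold sweep over the observed scores; the function name and the sample scores are hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine and impostor distance scores.

    A comparison is accepted when its distance is <= the threshold.
    FRR is the share of genuine comparisons rejected, FAR the share of
    impostor comparisons accepted. The EER is taken at the threshold
    where the two rates are closest.
    """
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        frr = sum(score > threshold for score in genuine) / len(genuine)
        far = sum(score <= threshold for score in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical distances between attribute vectors.
genuine_scores = [0.1, 0.2, 0.3, 0.6]    # same-subject comparisons
impostor_scores = [0.4, 0.7, 0.8, 0.9]   # different-subject comparisons
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these sample scores, one genuine comparison in four is rejected and one impostor comparison in four is accepted at the balancing threshold. A lower value indicates better separation between genuine and impostor comparisons.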

Similar to the earlier experiment, anthropometric features are used for recognition in [152]. A total of 19 geometric and appearance features, from the face region down to the shoulders, are used. The FERET DB and AR DB are used for experimentation. First, facial landmarks are localized in a semi-automated way using the MATLAB getpts function. Then, horizontal, vertical, linear and non-linear measurements are obtained using Euclidean distance and the spline curves available in MATLAB; these measurements are the features. Classification at the decision and feature levels was performed using Euclidean-distance similarity matching and AdaBoost.

Moving beyond the face, re-identification and retrieval from a dataset is performed using 11 bodily anthropometric features [128]. The human silhouette, which is considered very discriminative in identification applications, is used for shape context description. First, shape context is extracted using a custom-built shape context descriptor. A mathematical cost function is then formulated for shape context feature matching and feature vector generation. Finally, linear regression is applied for re-identification.

Again using the human silhouette, but from single-shot images, biometrics such as shoulder width, height, arm length, hip width, hair colour and body complexion are estimated [198]. The Southampton Multi-Biometric Tunnel dataset is used for this purpose. Images taken far from the camera are used to extract the person silhouette and key points. Then, Euclidean distance is used to extract features automatically, and a support vector machine is used for classification. An accuracy chart for all 7 attributes is reported, with accuracy ranging from intermediate to very good.

Similarly, along with the silhouette, skin and hair colour from the face or head and height and gait cycle from the body are estimated using spatial segmentation. A support vector machine is then used for recognition [73]. The experiments are performed on the CASIA and FACES datasets. The rank-1 identification rate is computed for the individual modalities and for the fusion framework.

In a slightly different approach, human height and a couple of anthropometric attributes are estimated from a single image [25]. The method combines techniques from projective and single-view geometry with prior statistical knowledge of human anthropometry. It is tested on 96 frontal images.

Similarly, human height and shoulder breadth are estimated from a single monocular image for re-identification across multiple cameras [24]. The focus was on improving the accuracy of landmark localization in the 2D image, which is a big source of error in the overall system and becomes even harder when converted to 3D. Circular measurements are easier to take in 3D. The height is therefore measured as the Euclidean distance between the top of the head and the feet, while the shoulder is mapped as an ellipse for measurement.
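The two geometric measurements mentioned above can be illustrated with a short sketch. It is not the method of [24]: the landmark coordinates are made up, and the elliptical measurement uses Ramanujan's well-known approximation for the perimeter of an ellipse as one plausible way to turn the two semi-axes into a single shoulder measurement.

```python
from math import dist, pi, sqrt


def body_height(head_top, feet):
    """Height as the Euclidean distance between two 3D landmarks."""
    return dist(head_top, feet)


def ellipse_perimeter(semi_major, semi_minor):
    """Ramanujan's approximation of the perimeter of an ellipse."""
    a, b = semi_major, semi_minor
    return pi * (3 * (a + b) - sqrt((3 * a + b) * (a + 3 * b)))


# Hypothetical 3D landmarks in metres (x, y, z).
height = body_height((0.0, 1.75, 0.0), (0.0, 0.0, 0.0))

# Hypothetical shoulder cross-section: half-width 0.22 m, half-depth 0.11 m.
shoulder_girth = ellipse_perimeter(0.22, 0.11)
```

For equal semi-axes the approximation reduces to the circumference of a circle, which gives a quick sanity check.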

The work above shows that both the face or head and the body are used for recognition and retrieval in soft biometrics, independently or in conjunction. A recent, novel experiment performs head-body matching using anthropometric features from the face or head and the body. It is a two-way matching, and both modalities are able to perform recognition independently. The experiment was performed on two well-known datasets, i.e. Long Distance Heterogenous Face (LDHF) and HumanID DB. Five anthropometric features were computed from the head region and 15 from the body. The framework is called a dual pathway for head-body matching. To extract the anthropometric features, different body parts were segmented and physical measurements, i.e. distances, were computed; these measurements are the features. Cosine similarity and Euclidean distance were used for matching on the set of 20 features.
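The two matching measures named above can be sketched in a few lines. The feature vectors here are hypothetical and shortened to three values for readability, whereas the experiment used 20 anthropometric features.

```python
from math import sqrt


def euclidean_distance(a, b):
    """Smaller value means the two feature vectors are more alike."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Value in [-1, 1]; larger means the vectors point in a more similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_gallery(probe, gallery):
    """Gallery identities ordered from best to worst cosine match."""
    return sorted(gallery,
                  key=lambda gid: cosine_similarity(probe, gallery[gid]),
                  reverse=True)


# Hypothetical anthropometric vectors (arbitrary units).
gallery = {"p1": [1.0, 2.0, 3.0], "p2": [3.0, 1.0, 0.5]}
ranking = rank_gallery([1.1, 2.1, 2.9], gallery)
```

Cosine similarity ignores the overall scale of a vector, while Euclidean distance does not, which is one reason the two measures can rank a gallery differently.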

Apart from the face or head and the body, there is another set of soft biometrics which are material, i.e. clothing. Although these attributes are significant mainly in short-term tracking and retrieval from a database, it is important to study them too. In soft biometrics-based retrieval experiments, face or head, body and clothing attributes are used for recognition in [81, 193]. In [81], a set of 10 attributes from each region, i.e. face, body and clothing, are combined for recognition, while in [193], 7 attributes from the head and 13 from the body are selected. Age, ethnicity and gender are put in the global traits category. Categorical annotation is used in [81] for clothing attributes, and comparative annotation for face and body in [193]. The feature estimation methods were the Elo rating system and the Tanh-estimator, respectively. For classification or matching, the first used Bayes, LRT and SVM-LRT, while the second is based on similarity score measurement using Mahalanobis distance.
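The two estimation methods named above follow standard formulas, sketched below in generic form. This is not the configuration used in [81] or [193]: the K-factor of 32, the starting rating of 1500 and the use of the plain mean and standard deviation (instead of robust Hampel estimates) in the tanh normalization are assumptions made for illustration.

```python
from math import tanh
from statistics import mean, pstdev


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """One Elo update from a pairwise comparison of subjects A and B.

    score_a is 1.0 if A was judged to show more of the trait than B
    (for example, taller), 0.0 if less and 0.5 if judged the same.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


def tanh_normalise(values):
    """Map raw scores into (0, 1) with the tanh-estimator formula."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.5 for _ in values]
    return [0.5 * (tanh(0.01 * (v - mu) / sigma) + 1.0) for v in values]


# Two subjects start at the same rating; A is judged taller than B.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
normalised = tanh_normalise([1.0, 2.0, 3.0])
```

Repeating the Elo update over many pairwise comparisons yields a relative measurement per subject and trait, which can then be normalized before matching.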

One experiment specifically addresses clothing traits, including overall, upper- and lower-body clothing, exposure, season, contrast etc. The ultimate objective is identification or retrieval. A total of 17 clothing attributes are used by [97] on the Soton Gait dataset. A HAAR classifier, a pre-trained model for skin detection, and a 5-scale colour map for clothing colour and brightness detection are used. Moreover, local binary patterns are used for clothing pattern attributes such as contrast. All 17 attributes were annotated categorically, and a subset of 10 attributes was also annotated comparatively. The retrieval results are presented in the form of equal error rate.

Table 10 covers around 30 different research experiments relevant to soft biometrics based recognition. It is a comprehensive review of approaches developed in recent years, including details of the datasets used, the number of soft traits, the feature estimation methods, the classification or retrieval approaches and the outcomes. This information is useful for future research.

4.1 Global traits

Generally, it has been observed that many researchers declare gender, age and ethnicity to be derivative soft biometrics and categorize them as global traits. In recent years, a large number of experiments have estimated these traits from images or video captured in constrained or unconstrained environments. We therefore explore and summarize recent successful research outcomes for global traits estimation. Our analysis includes both types of approaches, i.e. hybrid and independent.

4.2 Hybrid approaches based on gender-age-ethnicity

To recognize individuals in constrained or unconstrained scenes, the three global traits are used in combination. The task of recognition is performed on image or video datasets.

The hybrid recognition model of gender, age and ethnicity is used for recognition in multiple research experiments. A large number of datasets and various feature estimation, classification and deep learning methods are used for this purpose. The hybrid recognition models are of two types, i.e. gender-age-ethnicity and gender-age. Tables 11 and 12 present a comparison of both types of hybrid models.

Table 11 Overview of hybrid approaches based on gender-age-ethnicity
Table 12 Overview of hybrid approaches based on gender-age

The first and most common hybrid model for global traits estimation is composed of gender, age and ethnicity. Many different models have been developed over the years to estimate these three features using different datasets. The feature estimation techniques range from measurements based on face and body landmarks to colour features. For classification or retrieval, support vector machines and deep learning based methods like VGG-16 are used. Table 11 presents a summary of the most recent research along with its outcomes.

The second common hybrid model for global traits estimation includes gender and age. It omits ethnicity, and no specific reason is given for this. Ethnicity serves to distinguish populations, while gender and age are specific to each individual.

The gender and age based models developed over the years are varied too, using many different kinds of feature estimation and classification or retrieval methods. The most common feature estimation techniques include raw pixel processing, HAAR features, local binary patterns, texture features and biologically inspired features. Classification again includes simple classifiers, such as variants of the support vector machine, and many different deep learning methods like ML-Net, CNN-ELM, Attention Networks and Deep-CNN. This is an active research field, and the models are discussed in Table 12.

4.3 Independent approaches for recognition or retrieval based on gender, age and ethnicity

Contrary to hybrid approaches discussed above, the three global traits i.e. gender, age and ethnicity are used independently as well. The goal is to perform recognition or retrieval.

As already discussed, there may be scenarios where a global trait is used independently for recognition or retrieval. It is therefore important to analyse each global trait individually across different research experiments. Tables 13, 14 and 15 summarize the individual approaches, comparing global traits using multi-scale criteria.

Table 13 Overview of gender based recognition or retrieval
Table 14 Overview of age based recognition or retrieval
Table 15 Overview of ethnicity based recognition or retrieval

Table 13 contains a long list of experiments performing recognition or retrieval using only one global trait, i.e. gender. This is an active research area, explored using many different kinds of datasets. The list of feature estimation and classification techniques tested is rich too. It includes landmark localization using OpenPose, local binary patterns, histogram of gradients, and aesthetic, intensity-based and texture features. For classification or retrieval, variants of the support vector machine, nearest-neighbour techniques like K-NN and deep learning based methods like CNN and ResNet are used.

Similar to gender, age is estimated individually from unconstrained image or video scenes. Among the three global traits, age is the one used most often for recognition.

The outcome of age estimation can be of two types: it can be presented as overall accuracy or in the form of an age group. Table 14 presents an overview of the most recent age estimation methods, where a large range of diverse datasets is used. Raw pixels, appearance features, landmarks, local binary patterns etc. are used for feature estimation. Classification is again performed using variants of the support vector machine and different deep learning methods. The outcome of each experiment is presented in the last column.

The third global trait, i.e. ethnicity, is used less often for recognition or retrieval than gender and age. One reason is that it applies at a higher level of abstraction, i.e. to distinguish populations. Also, there is no specialized list of ethnicity datasets, and it is hard to collect a multi-ethnicity dataset.

As with gender and age, the techniques used for ethnicity feature estimation range from raw pixels to landmarks and local binary patterns. Classification is mostly performed using deep learning methods like VGG-16. Table 15 presents a comparison of ethnicity recognition techniques.

5 Open challenges and recommendations

The previous sections presented a thorough study of recent soft biometrics research. It is now clear that several important steps are needed to develop a robust and seamless recognition system using soft biometrics. These are open challenges in the field that need to be addressed. Presenting a list of challenges and recommendations is one of the main objectives of this survey.

5.1 Design or development of benchmark dataset

The development of any practical soft biometrics recognition system requires evaluation on a specially designed, challenging dataset. To date, several face and pedestrian datasets like PETA [48], variants of LFW [91, 104], MORPH [199] and ATVS Forensic DB [167] have been used, alone or in combination, to evaluate soft biometrics systems. None of these datasets covers all the modalities of the human body; usually only the face or the body is in focus during recording. Also, the number of images or videos of distinct individuals is a few hundred, except for PETA and LFW, and neither of those covers all the modalities. Another challenge is the lack of multiple sessions per subject and of information about the time lapse between sessions, if there are any. More importantly, diversity in terms of recording environment, lighting conditions, gender, ethnicity, viewing angle etc. is not fully catered for.

A preliminary step is therefore to design a single dataset that covers all the modalities of the human body from different viewing angles, has multiple sessions spanning a longer period of time and includes selfie images [155]. The Southampton University Tunnel Dataset [121, 179], its variants [174] and the Soton Gait Dataset [184] are commendable steps in this direction. However, they consist of a few hundred subjects, have few sessions with little time gap and are recorded in a controlled environment; in short, they are low in diversity.

5.2 Methods for quantitative annotations

Like traditional biometrics, the soft biometrics recognition process involves matching automatically estimated feature values against annotated ones [27]. Annotation of the dataset is therefore the next step after developing a diverse dataset. As discussed earlier, categorical [113] and comparative [219] annotations are the most widely used methods. The annotation is obtained through expert opinion or crowdsourcing.

It is important to note that both categorical and comparative annotations provide a qualitative value for a soft trait. One is well suited to short-term tracking, while the other is useful for feature-based retrieval on a small dataset [10]. This is a major limitation when performing recognition in an unconstrained environment.

This is the main reason for using a quantitative method of feature annotation, like the Bertillonage system from the 19th century, i.e. anthropometrics [65]. A quantitative method of annotation provides an absolute value for each trait of each individual subject, and this absolute value is also highly discriminating. In certain experiments, such as [177, 194, 204], anthropometric and geometric measurements of the human body are used for recognition. However, these experiments are performed on smaller datasets with limited diversity. There is therefore a great need first to explore and develop tools like Bertillon's for the quantitative annotation of datasets, followed by the development of automated techniques for estimation in surveillance.

5.3 Feature selection

In our work, we reported a collection of soft biometric features from the whole human body, i.e. face or head, body including limbs, and clothing. To the best of our knowledge, this collection is about 10 times larger than earlier ones such as [46, 165, 209] and [97]. These traits are used in various research experiments to perform different recognition or retrieval tasks.

This is a huge collection of more than 170 soft biometrics. How to select the features that are most significant for recognition or retrieval is now an open question [143]. Such features can be identified in multiple ways, for example by the occurrence of a feature in various experiments, the type of experiment, the application domain, the weighted significance in a specific scenario, the permanence or stability score of a particular trait and the discrimination power of each trait. These properties of an individual trait can be calculated on a comprehensive dataset.

It is also important to note that several traits are estimated using all three annotation types, and others using one or two. The question is in which scenario a specific annotation type is used and how accurate it has been. There should therefore be a mechanism for evaluating the annotation type for each trait, and this mechanism should reflect real-world observation.

5.4 Development of techniques for improved automated estimation of soft biometrics

To improve recognition accuracy, we have also investigated four critical factors affecting overall performance. These four factors are attribute correlation [26, 102], permanence score [9], discrimination power [193, 194] and distance [81]. These are directly linked to each soft biometrics and can be defined at trait level.

Better handling of these factors for each trait will result in improved recognition. It is therefore highly recommended that each trait in this larger collection be tested for these four factors. This can only be done on a large and diverse dataset after annotation. The higher the values of the four factors for a soft trait, the stronger its case for being part of a soft biometrics system. In this way, a smaller set of soft biometrics will provide better recognition or retrieval.

5.5 Development of feature and modality level fusion framework

In soft biometrics research, the human body consists of modalities, and these modalities contain features. Accordingly, we found two kinds of fusion scenarios in various research experiments: 1) feature level, based on permanence and discrimination power [18, 194], and 2) modality level [80, 81]. Different kinds of mathematical and statistical operations are used to perform feature or modality level fusion. It has been predominantly observed that fusion performs better than the independent features or modalities. According to our analysis, fusion frameworks contain a large number of features from each modality. It is therefore better to select a few features which are permanent [9] and discriminating [193, 194]. Then, for improved recognition, different sets of fusion should be compared at multiple levels and in different scenarios.

5.6 Techniques for improved hybrid recognition

We also compared various approaches to soft biometric feature estimation and classification or retrieval, tested on various datasets. A large number of methods, like raw pixel processing [81], LBP [150], AAM/ASM [9], landmark estimation [25, 194] and Mask R-CNN [67], are used for feature estimation. Classification or retrieval is performed using methods like similarity score or probe-gallery match [97, 193], Euclidean distance [152], Bayes [25] and SVM regression [9]. Moreover, it may be worthwhile to test clustering techniques for improved recognition [142].

Future soft biometrics systems will be hybrid in terms of modalities and features. That is why developing a comprehensive dataset is essential. Testing these feature estimation and classification or retrieval techniques on such a dataset will then be effective. In this way, we can move towards an improved hybrid recognition system using soft biometrics.

5.7 Techniques for improved global traits based recognition

As discussed earlier, gender, age and ethnicity [118, 213] are studied as independent soft biometrics. We therefore explored and summarized this trio of features both in hybrid frameworks and independently. The feature estimation techniques for this trio are varied, ranging from image processing based methods [4] to wavelets. Some commercial applications [108], PCA [99] etc. are used for classification or retrieval.

In our opinion, these global traits have weighted significance in recognition using soft biometrics. They can be used as independent sub-systems or as an internal component of any recognition system. It is highly recommended to develop a standalone system using this trio of soft biometrics. More specifically, gender should be mapped as a binary class problem, while age and ethnicity should be mapped as multi-class problems. The class size should be as small as possible.

6 Summary and concluding remarks

This paper provides a comprehensive analysis of soft biometrics approaches for recognition and retrieval purposes. As the overview of existing work shows, the development of a robust and highly accurate soft biometrics recognition system remains a challenging task. The main issues that need to be resolved include the creation of a diverse dataset and the development of effective methods for classification, feature selection and quantitative annotation. As these areas have a primary effect on overall performance in recognition tasks, it is of utmost importance to direct future research in these directions in order to deliver robust soft biometrics-based recognition systems.