1 Introduction

Throughout history, cosmetics and skincare products have played many roles, for example in preparing for battle, in religious ceremonies and burial rituals, or for beauty enhancement and self-care [1]. After millennia, the way cosmetic and skincare products are developed and produced has changed significantly, but their importance has not diminished [2]. Their contemporary and primary function is to give us characteristics that make us aesthetically appealing from the subjective point of view of most other people [2, 3]. In terms of cosmetic skincare, this means enhancing our beauty through the maintenance of overall skin health [4, 5]. In the modern world, appearance is of great importance. Enhancing it through cosmetic skincare products and procedures contributes to psychological and emotional well-being, emphasizing (facial) aesthetics and creating a desired impression on others [2, 3, 6, 7]. Whether it is a bright lipstick, a simple moisturizer or a state-of-the-art anti-aging skincare solution, cosmetics boost self-confidence and self-esteem [6, 8]. All this suggests that beauty is an important attribute of contemporary humans [3].

Advances in science and the development of modern chemical technologies have contributed to cosmetic skincare products with improved efficacy, safety and sustainability. Today's cosmetic skincare industry, the sector of the beauty and personal care industry that focuses on the development, production, and marketing of skincare products designed to improve the appearance and health of the skin, is driven by innovation, and its offerings are constantly evolving. Recently, digital tools have entered the field of cosmetics, skincare and beauty care. These tools, often referred to as artificial intelligence (AI), including machine learning (ML), affect the services offered to the consumer [9]. Digital tools in the field of beauty technology and cosmetic skincare include state-of-the-art techniques for collecting data on the characteristics of skin and hair. These tools interpret complex, multidimensional datasets and provide recommendations for a product or treatment [9,10,11]. These advanced approaches contribute to a truly personalized customer journey and shopping experience. However, as AI has learned to perform a range of tasks that until recently were left to humans, it has also inherited a number of negative characteristics of the human mind. These include unjustified bias and discrimination against individuals, particularly in the context of non-conformity with global beauty standards [12, 13].

In 2016, the beauty contest Beauty.AI was held among women, with the winners selected by AI. It turned out that almost all of the winners had white skin [14]. This raised the serious issue of bias in AI technologies in the beauty industry. Ruha Benjamin has addressed this issue in her book "Race After Technology" [15], in which she develops the concept of the "New Jim Code". It describes how discriminatory practices and biases are embedded in technological systems, perpetuating racial hierarchies and reinforcing social injustices. She discusses how racial bias can emerge in different ways, such as through biased data used for training algorithms or biased design and development processes. The author argues that these biases are not accidental or unintentional but are rather a product of systemic racism and unequal power dynamics. The use of demographically balanced databases (at least in terms of race, age and sex) is therefore important for the development of facial analysis systems [16]. Bias also occurs in AI robotic systems, for instance robots used in the medical field [17]. Examples include the da Vinci Surgical System, hospital delivery robots for supplies and medication, medical drones that deliver drugs and blood to remote areas, and medical robots with learning algorithms or direct programming. An ethical framework for decision-making by such robots is needed, given the impact of bias on care delivery and triage situations [17]. Multiple other domains are prone to bias as well. Voice-dictation software and integrated speech-recognition technology recognize and understand the voices of women less well than those of men [17]. There is also gender bias in search engines, resulting in postings for high-paying technical jobs being shown to men rather than women [17]. Algorithmic plagiarism filters used in academia may be biased against second-language students, who can face greater challenges paraphrasing words and phrases effectively than native speakers [18].

The current article summarizes data on bias in AI in the cosmetic skincare industry, a topic that, to the best of our knowledge, has been poorly researched. The problem is particularly prevalent in the dermatological and cosmetic literature, including dermatology textbooks, where racial bias is reflected in the underrepresentation of images of minorities relative to the general population, which can lead to inequalities in skin health care [19, 20]. In parallel, the skincare industry has focused its clinical testing mainly on pale or white skin, known as Fitzpatrick skin types I–III [21]. As a result, only an estimated 4% of participants in these trials had brown or black skin (Fitzpatrick skin types V and VI) [22].

The work aims to reveal potential sources of AI bias within the cosmetic skincare industry using published data. These sources of bias have been categorized based on the stage of the AI lifecycle in which they may occur: biases associated with target setting, biases tied to data acquisition and annotation, biases in modeling, biases during validation and evaluation, and biases during deployment and monitoring. For convenience, the main definitions in the field of AI bias research are also presented. To justify the systematization of knowledge about AI bias in different industries, current data on AI legal regulations are provided, highlighting the increased public attention to this issue.

2 What is bias? Main definitions

According to Ntoutsi et al. [23], AI bias is defined as "the inclination or prejudice of a decision made by an AI system which is for or against one person or group, especially in a way considered to be unfair". With regard to AI in cosmetic skincare, this can be modified as follows: bias is the tendency of an AI system to predict outcomes for a particular group of individuals that differ from the actual outcomes (the ground truth), such that the decision is made for or against a person or group, especially in a way that is considered unfair.

According to this definition, bias occurs when the collection or treatment of data is inaccurate, but it only leads to discrimination when its effects are perceived as unfair and therefore unacceptable. Whether something is fair or not depends on whether a bias leads to an outcome that is inconsistent with the values of the society we live in. It is important to distinguish between objective and subjective bias [24]. Objective bias refers to a situation where the measurement or decision-making process is biased, regardless of the intention or motivation of the decision-maker. In such cases, the bias stems from the method used to collect data, including technical deficiencies or the improper use of a measurement device [25], or from the decision-making procedure, rather than from an individual's decision making. Subjective bias, on the other hand, refers to a situation where the bias is based on personal beliefs, attitudes, or stereotypes embedded in the decision-making system, which may result in unfair treatment of individuals or groups. This type of bias is often rooted in historical or systemic factors and can result in discrimination against certain groups or individuals, even if this is not intentional [26]. It is also important to distinguish between intentional and unintentional bias [13]; the latter is more common in AI systems [27].
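To make this definition concrete, the short sketch below computes the mean signed error of a system's predictions separately per demographic group; a systematic offset for one group would constitute bias in the sense defined above. All data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical predicted vs. ground-truth skin-quality scores
# for individuals belonging to two demographic groups.
df = pd.DataFrame({
    "group":      ["A", "A", "A", "B", "B", "B"],
    "prediction": [7.1, 6.8, 7.4, 5.2, 5.0, 5.6],
    "truth":      [7.0, 6.9, 7.3, 6.1, 5.9, 6.4],
})

# Mean signed error per group: a systematic offset for one group
# (here, group B is consistently under-scored) signals bias.
df["error"] = df["prediction"] - df["truth"]
print(df.groupby("group")["error"].mean())
```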

Although the list is not exhaustive, the greatest risks for bias in the cosmetic skincare industry relate to ethnicity, mainly due to differences in skin tone; gender, due to insufficient consideration of hormonal changes; age, which can give rise to a desire for an unrealistic and unattainable youthful appearance; and overall physical appearance [28,29,30].

As digital transformation in general and AI in particular evolve, it becomes clear that alongside the undeniable opportunities, there is also a dark side [31]. This manifests itself in negative, harmful or unintended outcomes of AI, which fall into different dimensions, including fairness, transparency, accountability, robustness and safety, as well as data governance. Sensu Mikalef et al. [31], these categories are to be understood as follows. Fairness can be defined as the ethical and equitable treatment of individuals or groups in the development and deployment of AI algorithms. Transparency is making the outcomes and decision-making processes of AI algorithms understandable and interpretable. Accountability refers to the responsibility and liability associated with the development, deployment, and use of AI. Robustness and safety are closely related concepts concerned with ensuring the responsible and reliable use of AI. Robustness primarily focuses on the ability of AI systems to perform effectively and reliably under various conditions and to deal with unforeseen challenges, errors, or disruptions. Safety, on the other hand, is a broader concept that includes ethical considerations and measures to prevent harm caused by AI systems. Finally, data governance refers to the set of practices, policies, and measures implemented to ensure the responsible management, quality, privacy, and security of data throughout an AI project.

3 Legal regulation of bias in AI

Different countries can define different ethical frameworks and regulations for AI [13]. Ethical frameworks in AI regulation may differ based on a country's cultural, legal, social, and political context. These frameworks often reflect the values, priorities, and concerns of the respective country. Thus, while there are global discussions on AI ethics and principles, the actual ethical frameworks and regulations for AI can vary significantly between countries [13].

Several regions and countries, including the EU, Australia, Japan, the UK, the US, Canada, and Singapore, have introduced national standards and practices for the responsible use of AI. This aspect of AI is also regulated by international organizations, such as UNESCO, the G20, the G7, and the OECD [13]. To ensure both safety and the protection of fundamental rights in AI applications, the European Union is working on an AI Act [32], which consists of a set of harmonized rules for AI applications. Standards are also being developed by ISO/IEC and the IEEE Standards Association [13, 27, 33]. In particular, the IEEE Standard on Algorithmic Bias Considerations has been proposed as a possible set of ethical design standards [27, 33]. The IEEE P7001™ Standards Project for Transparency of Autonomous Systems [27] and IEEE P7003™ [33] have been created; they aim to certify AI algorithms that meet transparency requirements in order to avoid bias [27]. Regulating bias in AI is also important at the corporate level. The use of AI in sustainable decision-making should be guided by ethical principles and legal frameworks that ensure fairness, transparency, and accountability [34]. However, very few national AI strategies address human rights [13]. The regulatory requirements for AI relate to human supervision, training data, recordkeeping, information provision, robustness and accuracy [13].

The adoption of AI carries specific risks, in particular that it can have adverse effects on the fundamental rights, safety, and health of individuals if not applied correctly [35]. Different countries have different criteria and procedures to assess these risks [13]. The EU has developed the AI Act (entitled "Laying Down Harmonised Rules On Artificial Intelligence (Artificial Intelligence Act) And Amending Certain Union Legislative Acts"), which describes classification criteria for AI applications based on the risk of their implementation [36]. According to this act, the degree of risk is not a quantitative assessment but rather compliance with specific qualitative criteria [35]. The AI Act classifies applications of AI into the following categories [35, 36]. (i) Applications that pose an unacceptable risk, encompassing misuse of AI for harmful purposes or applications that violate EU values, such as social scoring and biometric facial recognition in public places; these are all prohibited. This category does not apply to cosmetic skincare/beauty tech, because it does not cause intentional harm to individuals or society, does not violate human rights and does not violate the fundamental values and principles of the EU. (ii) High-risk applications pertain to AI systems that may have harmful effects on individuals' safety, well-being, or fundamental rights. They are subject to specific legal criteria in terms of data governance, documentation and record keeping, transparency, human oversight, robustness, accuracy and security. A complete list of criteria and requirements for their approval, as well as documentation, is described in the AI Act [36] or in the specialized literature, e.g. [35]. As a brief illustration, AI applications for skincare that process sensitive information, such as health or biometric data, fall into the high-risk category. Such AI applications may be subject to stricter regulations and requirements under the EU AI Act [36]. Given the large amount of information, a complete listing of these criteria and their analysis in the context of the cosmetic skincare industry and beauty tech is not possible here and would be a topic for a separate paper. (iii) AI systems that are classified as limited-risk applications require transparency, and individuals exposed to them must be informed of their operation [36]. The EU AI Act does not provide specific criteria for limited-risk AI applications in the same detailed manner as for high-risk AI systems. Whether an AI application falls into the limited-risk category depends on various factors, and it is advisable to consult the EU AI Act and relevant guidelines or regulations for specific guidance on this classification [35, 36]. (iv) Minimal risk includes all other AI systems, which can be applied within the EU without additional regulatory obligations beyond those already in place [36]. All AI systems developed for the cosmetic skincare industry and/or beauty tech applications that are not classified as high-risk will obviously be classified as minimal or no risk. Even though the AI Act proposes stringent regulation only for high-risk AI systems, it should not be assumed that negative impacts of low-risk applications can be ignored [13].

4 Possible sources of bias in cosmetic skincare industry

4.1 Key stages of the AI algorithm development in cosmetic skincare industry

AI refers to the ability of a computer system to mimic human cognitive functions. ML is a component of AI and is defined as the use of mathematical models that enable a computer to learn without direct instructions [37, 38]. This article primarily focuses on bias in ML technologies as an important part of AI. ML-based applications are integrated into all segments of the cosmetic skincare industry chain, from basic research to customized services [9, 10]. These ML applications include systems designed to capture individual characteristics of skin and hair from images and instrumental devices, or to collect information from online questionnaires, possibly combined with geolocation data. On the one hand, this complex digital approach leads to a meaningful, well-founded outcome, such as an accurate skin diagnosis or product recommendation [9, 10]. On the other hand, it carries the risk of bias and discrimination [12, 13, 27, 33].

The development of ML algorithms involves a set of sequential stages. This process is often iterative and is called the ML or AI lifecycle [38]. It makes sense to discuss the risks of ML-related biases in the skincare industry for each of the key stages in the development of an ML application (Fig. 1). The goal, or target, of an algorithm is defined during the initial stage of development. Next, the algorithm is trained with a set of data. The trained algorithm should then be validated to understand how it performs on various groups. Finally, ML products must be monitored after they are released. If bias is detected at any stage of the AI lifecycle, appropriate action should be taken to correct or eliminate the bias, potentially affecting other stages of the lifecycle. This should minimize the impact of biases on the performance of the algorithm; a minimal sketch of this iterative loop is given after Fig. 1.

Fig. 1

Key stages of the AI lifecycle. Consecutive stages are indicated by numbers (1–5). The bold arrow represents the main workflow sequence of one AI lifecycle iteration. A step back due to bias detection can be taken from any stage to any previous step (dashed lines). Points of bias intrusion into the AI lifecycle are represented in text frames. AI artificial intelligence
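The iterate-detect-correct character of this lifecycle can be illustrated with a minimal, self-contained sketch. The example below uses synthetic data and a deliberately simple correction (re-weighting samples of the disadvantaged group); it is a schematic illustration of the loop, not a prescribed remediation method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: one skin feature, a binary outcome and a group attribute.
X = rng.normal(size=(400, 1))
group = rng.integers(0, 2, size=400)
y = (X[:, 0] + 0.8 * group > 0).astype(int)  # outcome depends on the group

def accuracy_gap(model):
    """Absolute accuracy difference between the two groups (stage 4 check)."""
    accs = [model.score(X[group == g], y[group == g]) for g in (0, 1)]
    return abs(accs[0] - accs[1]), accs

weights = np.ones(len(y))
for iteration in range(5):                       # bounded lifecycle iterations
    model = LogisticRegression().fit(X, y, sample_weight=weights)  # stage 3
    gap, accs = accuracy_gap(model)              # stage 4: validation
    print(f"iteration {iteration}: accuracy gap = {gap:.3f}")
    if gap < 0.05:
        break                                    # acceptable: deploy and monitor
    # Step back: up-weight the disadvantaged group before retraining.
    weights[group == int(np.argmin(accs))] *= 1.5
```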

4.2 Biases related to target setting

The first stage of algorithm development comprises the definition of the goal, or target, of the algorithm (Fig. 1). This target is the attribute predicted by the AI from the input data, which can be complex in size, content and structure. Bias arises when the target is not aligned with the true value, whether intentionally or not. This is illustrated below with the concept of beauty, as it is directly related to the cosmetic skincare industry.

4.2.1 Beauty definition bias

Beauty definition bias may have multiple causes. One cause relates to cultural differences in the perception of beauty. As mentioned above, beauty is defined as a combination of qualities that make a person aesthetically appealing from the subjective point of view of most other people in a certain cultural environment [4]. Traditional cultures are characterized by a wide variety of beauty "ideals", which emphasize the physical characteristics of certain ethnic groups. For instance, in Japanese culture, the ideal representation of female beauty is embodied by the geisha. Women strive to emulate her playful and charming personality, as well as her physically desirable features, such as light skin, petite facial features, an oval face, and long, healthy-looking hair. In Indian cultures, beauty is symbolized by a colorful sari, a nose-ring, and a bindi (red dot or piece of jewelry) worn on the forehead. In the Wodaabe community in Central Africa, beauty is assessed based on the height of the neck and the whiteness of the eyes and teeth [39]. Globalization and the blurring of intercultural boundaries have led to the establishment of international beauty standards. General standards of beauty in mass culture are characterized by (i) youthfulness (looking young, with smooth skin, bright eyes, and a toned, fit body), (ii) symmetry (faces and bodies that are more symmetrical are often perceived as more attractive), (iii) clear, flawless skin, (iv) slimness and (v) "Western" features (lighter skin, a narrow nose, a pronounced jawline, etc.) [40,41,42,43]. Bias, leading to discrimination, may occur if these international standards ignore some of the above-mentioned features prominent in various cultures. Another undesirable consequence is the pursuit of sometimes unrealistic beauty ideals, a phenomenon that is very difficult to rectify. Even at the risk of their own health, people use products to lighten their skin, color their hair [41, 42, 44, 45] and apply products or treatments in pursuit of smooth and uniformly pigmented skin [43, 44, 46, 47]. There is some optimism in this regard, as in recent years the universality of beauty standards has begun to give way to features specific to different ethnicities and cultures [43, 47, 48]. In addition, traditional conceptualizations of beauty are being broken down, and beauty today is based much more on self-expression and inclusiveness.

In ML terms, the multidimensionality of beauty leads to a great variety of potential target variables. An ill-considered choice of the target variables in a beauty tech AI tool, and of the corresponding target values, carries a potential risk of bias, leading to discrimination primarily based on race and ethnicity, but also on age, sex and gender. Biophysical differences in skin properties among people from different ethnic backgrounds are a particular cause of bias. As an example, the process of skin aging, caused by both intrinsic and extrinsic factors, differs among people of Caucasian, East-Asian, Hispanic and African-American descent [49, 50]. Dark-skinned individuals are generally believed to have firmer and smoother skin than lighter-skinned individuals of the same age; however, aging in the form of mottled pigmentation, wrinkles, and skin laxity does occur. Asian and black skin has a thicker and more compact dermis than white skin, which likely contributes to the lower incidence of facial rhytids in Asians and blacks. These ethnicity-related differences must be taken into account in the development of an AI application to avoid discrimination based on ethnicity.
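One way to account for such differences at the target-setting stage, sketched below with synthetic numbers, is to define the target relative to group-specific reference values rather than a single global norm. The groups, ages and wrinkle scores are purely illustrative.

```python
import pandas as pd

# Hypothetical wrinkle-severity scores (0-10) for two ethnic groups.
df = pd.DataFrame({
    "ethnicity": ["caucasian"] * 4 + ["east_asian"] * 4,
    "age":       [30, 40, 50, 60, 30, 40, 50, 60],
    "wrinkle":   [2.0, 3.5, 5.5, 7.0, 1.2, 2.0, 3.2, 4.5],
})

# Biased target: deviation from one global mean ignores that the baseline
# wrinkle severity differs between groups.
df["global_dev"] = df["wrinkle"] - df["wrinkle"].mean()

# Fairer target: deviation from the mean of the person's own group.
df["group_dev"] = df["wrinkle"] - df.groupby("ethnicity")["wrinkle"].transform("mean")
print(df)
```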

4.3 Biases related to acquisition and annotation

4.3.1 Sampling bias

If the input data are skewed, the likelihood of sampling bias increases [12]. Most cosmetic skincare studies are conducted on panelists with a light skin tone, with a significant underrepresentation of people with skin of color. Besides unbalanced (image) datasets related to pigmentation, there may also be imbalances in race, ethnicity, sex, gender and age [51, 52]. AI algorithms that have used such an unbalanced dataset for their development or training will have limited applicability, and their outcome will not be representative of the general population. In this regard, the publication of ethnic-specific [53,54,55] and ethnic-balanced [47, 56,57,58] studies is most welcome. Oversimplified face datasets used for training are a main source of AI bias [16]. Other characteristics to consider when avoiding sampling bias in the cosmetic skincare industry and/or beauty tech include living environment (rural versus urban) and socioeconomic status. Careful mapping of demographic characteristics allows for the composition of a balanced study population and minimizes the likelihood of sampling bias. In this connection, projects such as the Atlas of Beauty [59] are especially promising. The Atlas of Beauty is a set of photographs of women of different ethnicities and ages collected around the world. Such balanced collections of reference images help both experts and AI developers assess subjects fairly across boundaries of ethnicity, skin color, and gender. The images in the "Atlas of Beauty" were not labeled in the conventional sense but were presented as visual representations of the diverse beauty of women from around the world. In contrast, annotated facial images have been collected and presented in the six volumes of L'Oréal's Skin Aging Atlases [57]. These skin atlases make it possible to evaluate or predict the general aging of the face and assist clinicians, experts and AI developers in the assessment of facial aging across boundaries of ethnicity.
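A simple, automatable safeguard is to audit the demographic composition of a dataset against reference proportions for the target population before training. The sketch below uses hypothetical counts per Fitzpatrick skin type and assumed reference proportions; both are illustrative only.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical counts per Fitzpatrick skin type (I-VI) in a training set.
observed = np.array([400, 350, 150, 60, 25, 15])

# Assumed reference proportions for the target population (illustrative).
reference = np.array([0.15, 0.20, 0.25, 0.20, 0.12, 0.08])
expected = reference * observed.sum()

# Goodness-of-fit test: a very small p-value indicates the sample deviates
# from the reference population, flagging possible sampling bias.
stat, p = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p:.2e}")
```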

4.3.2 Confounding bias

Confounding bias, also known as confounding, is a type of bias that occurs when the relationship between an exposure (the measurement of the independent variable) and an outcome (the target or dependent variable) is distorted or obscured by the presence of one or more other variables [60]. This can lead to underfitting, as one or more relevant features are not included in the analysis. As an example, the assessment of wrinkle-related parameters can be used to determine a perceived age. However, bias will occur if parameters other than wrinkles contribute to a person's perceived age, e.g., gray hair. If hair color is not included in the annotation, this form of confounding bias may lead to unfairness [61]. In a study of the effect of the hand hygiene of medical staff on the rate of infections in an emergency intensive care unit, hospital grade, patients' age, disease type, disease severity, and underlying diseases were confounding factors in the relationship between hand hygiene compliance and the hospital-acquired infection rate [62]. In a study of the effect of ambient air pollution on inflammatory acne, a list of confounding factors with potential acne-promoting effects was taken into account. However, this set was considered non-exhaustive, suggesting that the outcome may still be biased due to confounding. The authors mentioned this limitation of the study as a possibility of "residual confounding bias" [63].
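The gray-hair example can be made concrete. In the synthetic sketch below, perceived age depends on both wrinkles and hair grayness; a model that omits grayness attributes part of its effect to wrinkles (omitted-variable confounding). All coefficients and data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500
grayness = rng.uniform(0, 1, n)                    # confounder
wrinkles = 0.6 * grayness + rng.normal(0, 0.2, n)  # correlated with confounder
perceived_age = 30 + 10 * wrinkles + 15 * grayness + rng.normal(0, 2, n)

# Model ignoring the confounder: the wrinkle coefficient is inflated.
m1 = LinearRegression().fit(wrinkles.reshape(-1, 1), perceived_age)
print("wrinkles only:", m1.coef_)                  # well above the true 10

# Model including grayness recovers the true wrinkle effect (~10).
m2 = LinearRegression().fit(np.column_stack([wrinkles, grayness]), perceived_age)
print("wrinkles + grayness:", m2.coef_)
```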

4.3.3 Measurement bias

Often, AI (including ML) algorithms are used for estimating optical, mechanical and other properties of the skin. These features can be expressed in various ways, such as skin color or tone (red, yellow or black), melanin index, brightness, trans-epidermal water loss, evenness, roughness or elasticity. All these features are collected by special devices, such as spectrometers, colorimeters and skin conductometers, and expressed in numerical values. Measurement bias refers to different types of bias that result from the measurement itself. The validity of a measurement can be divided into three distinct types: construct validity, content validity, and criterion validity [64]. Construct validity is the extent to which different devices that should theoretically measure the same construct are actually correlated. An example can be found in a paper by Wang et al. [65]. In this study, the repeatability of skin color data obtained with a CM700d spectrometer (Konica Minolta, Japan) and a PhotoResearch PR650 telespectroradiometer (Photo Research Inc., Chatsworth, CA, USA) differed between ethnic groups and was instrument dependent, which strongly suggests poor construct validity. Content validity specifies the extent to which the data generated by an instrument truly reflect the construct to be measured. As an example, skin redness is often used as a proxy for sensitive and inflamed skin. However, skin redness has been shown to depend on, among other things, nutritional status, tissue perfusion, ethnicity and body site [66]. As such, skin redness cannot always be bidirectionally associated with sensitive skin, and such an association needs to be validated during training and implementation of the technology. Another example of content validity relates to the use of facial images, including selfies, to estimate skin quality related features. Differences in camera performance and lighting conditions affect image quality. More importantly, built-in post-processing algorithms, such as sharpening, white balance and color correction, often alter images of people with light or dark skin tones differently. This leads to pictures that do not accurately represent people of color, and can lead to bias when these pictures are used for automatic facial skin feature detection. Google introduced the "Real Tone" feature in its latest smartphone, which uses AI to take better photos of people with darker skin tones [67]. AI is used here to correct an error that has a long history in filmmaking and photography. The third type is criterion validity, which refers to the degree to which a measure relates to an outcome that is considered valid, i.e., the "gold standard". This basically reflects the level of accuracy of the method. In the context of bias in the evaluation of skin features, this is of less importance.
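Construct validity can be screened numerically by correlating readings of the same attribute from two devices within each demographic group, in the spirit of the comparison by Wang et al. [65]. The sketch below uses synthetic readings from two hypothetical instruments; a markedly weaker inter-device correlation in one group hints at instrument-dependent measurement bias.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)

# Simulate two instruments measuring the same skin-color attribute; one
# instrument is assumed to be noisier for darker skin (illustrative only).
for group, noise in [("light_skin", 0.1), ("dark_skin", 0.6)]:
    true_value = rng.uniform(20, 60, 100)
    device_a = true_value + rng.normal(0, 0.1, 100)
    device_b = true_value + rng.normal(0, noise, 100)
    r, _ = pearsonr(device_a, device_b)
    print(f"{group}: inter-device correlation r = {r:.3f}")
# A lower correlation for one group suggests poor construct validity there.
```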

4.3.4 Label bias

Numerical values of skin properties, mostly obtained through measurement devices, are often used for classification purposes. This approach is prone to bias, as the same numerical scale cannot be applied to different individuals. Numerous examples can be given. The relationship between skin redness (measured as the a* value of the CIE L*a*b* color space) and erythema is complex and depends on the pigmentation level, for instance due to increased vascularization or melanin content [68]. Trans-epidermal water loss is considered a proxy for skin barrier function, hydration, stratum corneum thickness, and skin sensitivity, but is also known to depend on sex [69], age [70], body site [71], and ethnicity [72]. Sebum level and skin oiliness also depend on these parameters [73]. Another example is UV-induced skin fluorescence, which allows the measurement of porphyrin-related fluorescence as a measure of acne severity [74]. The intensity and spectral composition of skin fluorescence strongly depend on the ethnicity-specific skin tone due to optical shielding by melanin [75]. In all of these examples, labeling and classification may be inaccurate if these relevant variables are not taken into account.
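The redness example can be illustrated with a toy comparison between a single global a* threshold and a baseline-relative one. The baseline values and cutoffs below are hypothetical and serve only to show how a global scale mislabels the group with the higher baseline.

```python
import pandas as pd

# Hypothetical a* (redness) readings with group-specific baselines.
df = pd.DataFrame({
    "group":    ["light", "light", "dark", "dark"],
    "baseline": [10.0, 10.0, 14.0, 14.0],   # typical non-inflamed a* per group
    "a_star":   [12.5, 16.0, 15.0, 19.5],
})

# Global threshold: mislabels a normal reading in the higher-baseline group.
df["label_global"] = df["a_star"] > 13.0

# Baseline-relative threshold: flags the same *increase* in every group.
df["label_relative"] = (df["a_star"] - df["baseline"]) > 3.0
print(df)
```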

A special type of bias relates to labels assigned by human visual assessment. This has been reported for age prediction based on facial pictures. The labeling strongly depends on culture, ethnic background, beliefs, age differences and preconceptions of an assessor [54, 61, 76,77,78]. As a consequence, not all annotators or experts grade pictures in the same way, leading to bias and a low inter-rater reliability. Clear annotation manuals with straightforward and transparent instructions are essential to reach valid and unbiased conclusions. A study by Ganel et al. [79] showed that the bias related to human assessment is transferable to AI algorithms for automated age prediction, causing poor performance.
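Before human-assigned labels are used for training, inter-rater agreement can be quantified. Below is a minimal sketch using Cohen's kappa as implemented in scikit-learn, with hypothetical age-group grades from two annotators.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical age-group labels assigned to the same 10 facial images
# by two annotators with different cultural backgrounds.
rater_1 = ["20s", "30s", "30s", "40s", "50s", "30s", "40s", "60s", "20s", "50s"]
rater_2 = ["20s", "30s", "40s", "40s", "60s", "30s", "50s", "60s", "30s", "50s"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")
# Low agreement suggests the annotation manual needs clearer, more
# transparent instructions before the labels are used for training.
```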

4.3.5 Negative set bias

An accurate prediction or evaluation by an ML algorithm requires the identification of a positive example, i.e., a condition where the feature of interest is present. A negative example then corresponds to a condition in which the feature of interest is absent or not sufficiently present. This also includes an unequal distribution of the feature of interest, such as the overrepresentation of high-severity symptoms of a skin disease [80]. Negative set bias refers to a situation in which those negative examples, commonly termed the "rest of the world", are insufficiently present in the dataset. It affects the performance and accuracy of ML models. Just consider a wrinkle study in which the training dataset predominantly includes samples with moderate to severe wrinkle concerns: the algorithm will struggle to accurately classify or quantify wrinkles that are less pronounced. Negative set bias can also be introduced into ML algorithms developed for more complex conditions, such as acne. Acne is a non-unidirectional skin disorder that has four distinct stages, with regular outbreaks occurring in different parts of the body and at different ages [81]. The evaluation by an ML-based system can only be accurate if the images used for training capture every aspect of the development and regression of acne.
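A first line of defense is simply to inspect the label distribution and to compensate for an underrepresented negative class during training, as in the synthetic sketch below using scikit-learn's balanced class weighting. Reweighting mitigates but does not replace collecting a genuinely representative negative set.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Hypothetical wrinkle dataset: ~90% moderate/severe (1), ~10% mild (0).
y = (rng.uniform(size=300) < 0.9).astype(int)
X = rng.normal(size=(300, 4)) + y[:, None]      # synthetic features

print("label distribution:", Counter(y))        # reveals the skewed negative set

# 'balanced' reweights samples inversely to class frequency during training,
# so the scarce negative examples are not overwhelmed.
model = LogisticRegression(class_weight="balanced").fit(X, y)
```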

4.3.6 Other types of biases related to acquisition and annotation

A special case of data acquisition related bias comes from the use of questionnaires and surveys. Social desirability bias is a type of response bias in which survey respondents tend to answer questions in a manner that will be viewed favorably by others. Questions about financial condition and buying habits are sensitive to this [13]. In Western cultures, tanned skin is associated with high social status and the ability to travel to warm countries, while in Eastern countries pale skin is the beauty ideal [41, 82]. Also, low social status and non-prestigious professions can be associated with rough and cracked skin on the hands. All these features can be detected by AI algorithms. Besides social desirability bias, there are other types of bias in surveys, such as non-representative samples, non-response bias, acquiescence bias, order-effect bias and primacy/recency bias. A detailed description of these biases is beyond the scope of this paper.

4.4 Biases related to modeling

The modeling phase is the part of the AI lifecycle that consists of feature engineering, model training and model selection. Feature engineering is the process of extracting features from raw data for use in model training. Some features might represent sensitive information that can introduce biases, impact model performance and, as a result, discriminate against a particular subgroup of people [38].
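In practice, feature engineering can include an explicit audit step: dropping declared sensitive attributes and screening the remaining features for proxies that still encode them. The sketch below uses synthetic data in which a hypothetical "postcode_index" feature acts as a proxy for ethnicity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 1000

# Synthetic feature table; 'postcode_index' is an (assumed) proxy for ethnicity.
ethnicity = rng.integers(0, 2, n)
df = pd.DataFrame({
    "hydration":      rng.normal(50, 10, n),
    "postcode_index": ethnicity * 5 + rng.normal(0, 1, n),
    "age":            rng.uniform(18, 70, n),
})

sensitive = pd.Series(ethnicity, name="ethnicity")

# Screen each engineered feature for correlation with the sensitive attribute.
for col in df.columns:
    r = np.corrcoef(df[col], sensitive)[0, 1]
    print(f"{col}: correlation with ethnicity = {r:+.2f}")
# Highly correlated features act as proxies and may reintroduce the
# sensitive information even after the attribute itself is dropped.
```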

4.4.1 Objective optimization bias

The objective of an ML-based system is a mathematical function that is optimized during training so that the distance between the calculated and actual values of the target is minimized. The choice of the objective is a crucial step in creating an unbiased algorithm [38].
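One way to make the objective itself bias-aware, shown schematically below with synthetic data, is to add a penalty on the difference between group-wise mean residuals to the usual squared-error loss. The penalty weight lambda and the crude random search are illustrative design choices, not a recommended optimizer, and this is only one of many formulations in the fairness literature.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = rng.normal(size=(n, 2))
group = rng.integers(0, 2, n)
y = X @ np.array([2.0, -1.0]) + 0.5 * group + rng.normal(0, 0.1, n)

lam = 10.0  # fairness penalty weight (design choice)

def objective(w):
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    # Penalize differing mean residuals between the two groups.
    gap = abs(residual[group == 0].mean() - residual[group == 1].mean())
    return mse + lam * gap

# Crude random search, just to show the objective being optimized.
best_w, best_val = None, np.inf
for _ in range(5000):
    w = rng.normal(size=2) * 3
    val = objective(w)
    if val < best_val:
        best_w, best_val = w, val
print("weights:", best_w, "objective:", round(best_val, 3))
```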

It was reported that the objective function affected the correctness of the results of BeautyGAN, an algorithm for transferring makeup from one face image to another using a generative adversarial network (GAN), a widely used ML approach [83]. The performance of this GAN was compared with that of other algorithms for the same task [84, 85]. Objective optimization bias was detected: an inadequate objective function led to incorrect transfer, e.g., black eye shadow was transferred as blue on some but not all faces.

4.5 Biases related to validation and evaluation

At the validation step, the effectiveness of prediction is assessed using a test dataset. This dataset should meet the same requirements as the training dataset (it should be balanced and correctly labeled). Thus, both the training and validation datasets are prone to the same types of biases and require the same kind of monitoring and critical analysis.
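A practical consequence is that validation metrics should be reported per subgroup rather than only in aggregate, as in the minimal sketch below with hypothetical predictions grouped by Fitzpatrick skin type; the aggregate accuracy hides a large gap between subgroups.

```python
import pandas as pd

# Hypothetical validation results with a demographic attribute attached.
df = pd.DataFrame({
    "skin_type": ["I-III"] * 5 + ["V-VI"] * 5,
    "y_true":    [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    "y_pred":    [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
})

# Aggregate accuracy (0.70) conceals perfect accuracy for one group
# and only 0.40 for the other.
correct = df["y_true"] == df["y_pred"]
print(f"overall accuracy: {correct.mean():.2f}")
print(correct.groupby(df["skin_type"]).mean())
```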

4.6 Biases related to deployment and monitoring

4.6.1 Post-production biases

Once an ML-based system is integrated into a workflow, it needs to be continuously monitored. Because an ML-based system is expected to adapt itself over time, it is necessary to monitor its reliability, accuracy, cost, safety, and impact. As part of this post-production monitoring, errors and biases in the evaluation and prediction of the real world must be identified [86]. Another type of bias may arise from the continuous changes and evolutions that are so characteristic of our society [87]. These can lead to concept drift, where the relationships between measurements, target variables and functions change over time. If these (social) changes are not incorporated into the procedure, this can give rise to bias and discrimination. Evaluation of the "user" and his/her operations is also part of post-production monitoring. This requires regular assessment of each user and his/her knowledge of the subject to ensure proper implementation and use of AI-based applications [86].
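Distribution shift in model outputs can be flagged automatically, for example by comparing a current window of prediction scores against a reference window with a two-sample Kolmogorov-Smirnov test, as in the synthetic sketch below; the score distributions and the alert threshold are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(6)

# Model output scores captured at deployment time (reference window)
# and during a later monitoring window where behavior has shifted.
reference_scores = rng.normal(0.60, 0.10, 1000)
current_scores = rng.normal(0.68, 0.12, 1000)

stat, p = ks_2samp(reference_scores, current_scores)
if p < 0.01:  # alert threshold is a design choice
    print(f"possible drift: KS = {stat:.3f}, p = {p:.1e} -- trigger review")
```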

5 Conclusions and perspectives

The issue of discrimination through the use of AI systems has been recognized, and several initiatives have already been taken by major players inside and outside the cosmetic skincare industry and/or beauty tech. In these fields, there is a strong focus on embracing diversity and inclusion. Traditionally protected attributes are skin color, age, ethnicity and gender. In this paper, we listed major sources of bias in ML applications for the cosmetic skincare industry and/or beauty tech. These biases can be introduced at various stages of the AI lifecycle. We advocate a structured approach in which the classification of biases is important and each step in the development of an AI system is carefully evaluated. The final goal of such an evaluation is the deployment of an ML algorithm in which inclusivity is fostered and encouraged and groups are treated alike. If an unacceptable bias is nevertheless identified, the model needs to be refined.

It must be mentioned that AI is not the culprit. It is a mirror of how society views individuals and groups of people. As such, these technological developments are a reflection of human thinking and our subjective vision of society. Finally, it can even be argued that AI applications reveal biases that characterize our society and that would otherwise remain hidden. Smart AI systems can then be designed not only to detect biases, but also to eliminate them. AI has enormous potential for good that we absolutely must harness. The narrative around that vision should express optimism and hope, rather than fear.