A common feature of human-centric artificial intelligence design is the necessity of using humans to assess where fundamental rights and responsibilities lie in a situation. Rules are then introduced into the AI to mitigate any potential harm (Bauer 2020a). We argue that this bottlenecks AI and forgoes the power afforded by the technology. We suggest that the AI itself ought to have the capacity to perceive the action space-state and make rights and responsibilities allocations. Such a perception would allow an AI to draw on a wealth of information, including precedent and prior outcomes, to solve multi-factorial ethical conundrums in real-world settings.
This contrasts with top-down rule-based systems which, to a degree, replicate the modus operandi of non-AI computing (Cervantes et al. 2020). On the other hand, bottom-up programming uses machine learning (ML) algorithms to learn from patterns in a prepared set of data to infer the next move (Bauer 2020a). Such a methodology considers normative values as being inherent in the activity of the agents but not explicitly defined in terms of a general theory (Wallach et al. 2008). This paper’s approach can be thought of as bottom-up but uses a universal fairness rule that is inherent within Word Embeddings, as will be expanded on.
For an AI system to be able to perceive engaged contexts to assess whether the description of an act, or instruction, is fair, a fairness metric by which it can measure such activity is required. Currently, metrics to assess human qualities such as sentiment and personality have been well validated in the literature (Boyd et al., 2015; Hai-Jew 2017; Youyou et al. 2015). However, a valid and reliable measure of fairness has yet to be developed.
Our work in this paper will focus on delivering the first step in the development of such a measure, one that focuses on interpreting human readable texts and assessing the fairness of the social power interactions described therein. As documents can be broken into their constituent paragraphs, sentences and words, this paper will concentrate on analysing singular words, specifically, verbs.
A certain limitation exists in using singular words, since they are devoid of context. The sentences ‘The man killed the taxi driver’ and ‘The man killed the weeds in his garden’ describe unfair and fair acts, respectively, with the same word ‘killed’. The same can be said of homonyms. However, a sentence such as ‘The boy thanked the teacher for his help’ is easily classifiable as fair compared to ‘The boy used a slur against the teacher’. Here the two verbs, ‘thank’ and ‘slur’, are typically considered fair and unfair acts, respectively, even devoid of context. We accept this limitation for this stage of the research. We will be testing the verb list used by Jentzsch et al. (2019), who incorporate a list of ‘Do’ and ‘Don’t’ verbs into their pipeline as training data. However, our methodology differs from theirs, as we do not use any training data but rely on inherent social ontologies. Our methodology will be covered following an introduction to the background used in the design of the measure, which focuses on the social anatomy of the human mind and social discourse.
Principles behind the fairness measure
The human mind is able to build rich causal models, perform generalizations, and assemble powerful abstractions despite sparse and incomplete input (Tenenbaum et al. 2011). Modeling how the mind uses abstract knowledge to guide inferences has been attempted with Bayesian statistics. Abstract knowledge is seen as being encoded in a probabilistic generative model, one that describes the causal processes of the world in a way that facilitates the analysis of perceived spaces and their latent variables. Causal learning data can be gained from co-occurrences between events, whereby causal relations are hypothesized. Likelihoods favor causal links that make such co-occurrence more probable, whereas priors favor links that fit background event knowledge of likely causes (Tenenbaum et al. 2011).
It has been proposed that such abstract knowledge provides essential constraints for learning. Developmentalists posit that humans innately hold a set of principal abstract concepts such as “agent”, “object” and “cause” to provide a fundamental ontology for qualifying experience (Carey, 2011a, b). Indeed, there is a growing trend in the literature for multiple representation views, whereby abstract concepts are grounded in an array of inputs: linguistic, emotional, sensorimotor, internal experiences and social (Andrews et al. 2014; Borghi et al. 2018). It has been suggested that the divergence between abstract and material concepts may be best modeled in terms of multidimensional space, in which concepts varying both in their level of abstraction and along other content dimensions are distributed (Borghi et al. 2018).
This form of representation of the abstract, in multidimensional space, one that incorporates probability learning and co-occurrence statistics is reminiscent of the ontological features of a form of neural network computation known as Word Embeddings. These embeddings are able to capture rich features of human language, language that inherently reflects society and its values (Boyd and Richerson 2009; Smith 2010; Drozd et al. 2016).
The social mind, human language, and Word Embeddings
Word Embeddings use a process known as co-occurrence probability to represent words. As such, these words are no longer represented by their dictionary definitions, but by their relations to other words. The approach uses word context to represent meaning, an idea often captured by the saying ‘you shall know a word by the company it keeps!’ (Firth 1958; Nerbonne and Hinrichs 2006).
Vectors are used to capture how frequently each word occurs in a particular context. Each vector consists of a list of numbers, whereby each number reflects a probability. As a list of co-occurrences is built up, probability patterns begin to emerge. Thus, terms such as ‘dog’ and ‘cat’ would be seen to have a higher probability of co-occurring with each other than words that do not occur together as often, such as ‘dog’ and ‘pipe’. As the list of vectors grows, more useful information on word meanings emerges. For example, the word ‘ice’ would be found to co-occur more frequently with the word ‘solid’ than with the word ‘gas’, whereas the word ‘steam’ co-occurs more frequently with ‘gas’ than with the word ‘solid’. Of note is that both words co-occur frequently with ‘water’, as it is their shared property, while co-occurring infrequently with unrelated words (Pennington et al. 2014).
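The counting step described above can be illustrated with a minimal sketch. It assumes a tiny invented corpus and a symmetric context window; the function names `cooccurrence_counts` and `cooccurrence_probability` are hypothetical, and the code illustrates the general idea rather than the procedure of any particular embedding algorithm.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=3):
    """Count how often each word appears within `window` tokens of another."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cooccurrence_probability(counts, word, context):
    """Share of `word`'s observed co-occurrences that involve `context`."""
    total = sum(counts[word].values())
    return counts[word][context] / total if total else 0.0


# Toy corpus, invented purely for illustration.
corpus = [
    "ice is a solid form of water",
    "steam is a gas form of water",
    "solid ice melts into water",
    "hot steam is a gas",
]
counts = cooccurrence_counts(corpus, window=3)

for word in ("ice", "steam"):
    for context in ("solid", "gas"):
        p = cooccurrence_probability(counts, word, context)
        print(f"P({context} | {word}) = {p:.2f}")
```

On this toy corpus, ‘ice’ co-occurs with ‘solid’ but never with ‘gas’, and the reverse holds for ‘steam’. Real embeddings rely on corpora many orders of magnitude larger.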
These vectors can then be represented in multi-dimensional space. Each word in the document is given a set of coordinates that represents its location in a geometric space in respect to every other word. The setting of these words is based on their context. Those sharing many contexts are found to be situated next to each other, compared to words which have different contexts (Kozlowski et al. 2019). Thus, words such as ‘pain’ and ‘pleasure’ may be found to be distant to each other while being closer to ‘abuse’ and ‘love’, respectively.
An advantage in using vector notations lies in their arithmetic properties. Two vectors can be compared, added and scaled, allowing for a number of calculations to be made. A highly cited example is that of manipulating a vector which represents the word ‘King’. In subtracting the vector for ‘man’ from it, then adding the vector for ‘woman’ to it, the result is the word ‘Queen’. This happens because the representation of ‘King’ contains a representation of ‘man’ due to co-occurrence. When this quality is removed using a subtraction, the word is no longer closely associated with ‘King’ yet remains closely associated with royalty. As such, replacing ‘man’ with ‘woman’ allows for a new vector to be closely matched to a word that represents royalty and women, i.e., a ‘Queen’ (Chen et al. 2017; Drozd et al. 2016).
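The arithmetic can be sketched with hand-made toy vectors. The two-dimensional vectors below are invented for illustration (the first axis loosely stands for royalty, the second for gender) and are not taken from any trained model; the helper names are hypothetical.

```python
import math


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(target, embeddings, exclude=()):
    """Word whose vector has the highest cosine similarity to `target`."""
    candidates = (w for w in embeddings if w not in exclude)
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


# Toy vectors (assumption): axis 0 ~ royalty, axis 1 ~ gender.
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.1],
}

result = add(sub(toy["king"], toy["man"]), toy["woman"])
answer = nearest(result, toy, exclude=("king", "man", "woman"))
print(answer)
```

The query words are excluded from the candidate set, as is common practice in analogy tasks, since the resulting vector often remains closest to one of its own inputs.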
It is also possible to consider how similar or dissimilar two vectors are by measuring their cosine similarity. From trigonometry, cos(0°) = 1, cos(90°) = 0, and 0 ≤ cos(θ) ≤ 1 for angles between 0 and 90 degrees. Vectors are maximally similar when they are parallel (i.e., at 0 degrees to each other) and minimally similar when they are perpendicular (i.e., at 90 degrees to each other). This feature allows for a straightforward comparison of words. Singular words such as ‘slur’ and ‘irresponsible’ may be compared using this method, for example, with the expectation that similar words will hold a higher cosine score than dissimilar words. The power and flexibility offered by this method have seen it reinforce much of the work done in natural language processing (NLP) (Almeida and Xexéo 2019; El-Amir 2020).
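For reference, the standard definition of cosine similarity between two n-dimensional vectors u and v can be written as follows. The symbols u, v and n are introduced here for illustration and do not appear in the original text.

```latex
\[
\operatorname{sim}(\mathbf{u}, \mathbf{v})
  = \cos\theta
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}
         {\sqrt{\sum_{i=1}^{n} u_i^{2}} \, \sqrt{\sum_{i=1}^{n} v_i^{2}}}
\]
% Worked example: u = (1, 0), v = (1, 1)
\[
\cos\theta = \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1} \, \sqrt{2}} = \frac{1}{\sqrt{2}} \approx 0.707
\quad (\theta = 45^{\circ})
\]
```

In general the value lies between -1 and 1. Negative values arise when the angle exceeds 90 degrees, which can occur for embedding vectors because their components may be negative.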
Such a conception of semantics has been described as the distributional hypothesis (Clark and Pulman 2007). This approach represents, in part, how the mind operates through parallel processes and weighted connections (Mikolov et al. 2013).
An epistemology of Word Embeddings
One of the discoveries made with Word Embeddings is their ability to validly reflect meaningful patterns from the data they have learnt. Capturing the statistics inherent in language using this method and projecting it into multidimensional space has allowed for subtle relations to be reflected in arithmetic terms. For example, when Word Embeddings are derived from documents that describe the sociology of a country over several decades, dimensions induced by word differences such as (rich – poor) are found to correspond to dimensions of cultural meaning. A projection of words onto these dimensions has been shown to reveal widely shared associations that are validated with survey data (Kozlowski et al. 2019). This ability of Word Embeddings to concurrently locate objects on multiple cultural dimensions, including race, gender and socio-economic class, has been found to make them a powerful tool for research on intersectionality, for example (Kozlowski et al. 2019).
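A projection of this kind can be sketched as follows. The vectors are invented toy values, the helper names `dimension` and `project` are hypothetical, and a single antonym pair is used for brevity, which is a simplification made here and should not be read as the exact procedure of the cited study.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dimension(embeddings, positive, negative):
    """Axis induced by the difference of two word vectors, e.g. rich - poor."""
    return [a - b for a, b in zip(embeddings[positive], embeddings[negative])]


def project(embeddings, word, axis):
    """Position of `word` along `axis`, measured by cosine similarity."""
    return cosine(embeddings[word], axis)


# Toy vectors, invented for illustration only.
toy = {
    "rich": [1.0, 0.2],
    "poor": [-1.0, 0.2],
    "yacht": [0.8, 0.5],
    "debt": [-0.7, 0.6],
}

affluence = dimension(toy, "rich", "poor")
for word in ("yacht", "debt"):
    print(word, round(project(toy, word, affluence), 3))
```

Words with a positive projection sit towards the ‘rich’ end of the axis, and words with a negative projection sit towards the ‘poor’ end.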
This finding is not specific to the social sciences. The natural sciences have also gained from their use. For example, materials science knowledge present in published literature was encoded using Word Embeddings without any explicit addition of chemical knowledge. The embeddings were found to capture intricate materials science concepts such as the underlying structure of the periodic table and structure–property relationships in materials. Utilizing this implicit information held in the vector space, researchers proposed materials for functional applications several years before their discovery, suggesting that latent knowledge regarding future discoveries is to an extent embedded in past academic papers (Tshitoyan et al. 2019). Embeddings have also been successful at capturing latent concepts such as ideology, providing an integrated framework for an indirect study of political language (Rheault and Cochrane 2020).
Yet Word Embeddings have their limitations. One of which is that they can reflect the biases contained within the texts they represent. Words such as ‘doctor’ and ‘engineer’ have been found to co-occur more often with ‘man’ than ‘woman’ in contemporary writing and reporting. Thus, a vector space constructed using such documents will also represent such a bias (Caliskan et al. 2017; Garg et al. 2018). These biases have been seen as a hindrance to the effectiveness of using embeddings for social interaction applications, such as their use in candidate selection (Köchling and Wehner 2020). However, other biases inherent in Word Embeddings can in some instances be useful for extracting an underlying concept that has caused such a bias to manifest. In this paper we will demonstrate the existence of a fairness bias within Word Embeddings and leverage it to our advantage to design a fairness metric.
The fairness bias, a pro-social propensity
Just as Word Embeddings have been found to contain gender and ethnic biases (Caliskan et al. 2017; Brunet et al. 2019), we put forward the case that humans are biased against conducting acts which provide them with no sense of gain. That is, humans are instinctively averse to gainless activity. Being a social species, humans are biased to favour social acts: acts that provide a sense of gain and joy, as opposed to harm and pain, to themselves. We instinctively class acts that we would be happy to have done to ourselves as positive, and acts which we would not wish to happen to ourselves as negative. Such a bias, we posit, is universal in humans. To expand on this bias, as it is a central point in this paper, we consider the social psychology and moral psychology literature on this topic.
An ontology of fairness
Despite impulses for survival, acts of cooperation have been seen as central to human behavior (Trivers 1971; Milinski et al. 2002), generating senses that facilitate cooperation (Nowak 2006). One of the prime senses when it comes to deciding about an act towards another is a realisation of how the other person will react to the said act (Civai 2013). Individuals are evolutionarily deterred from acting in a harmful manner, avoiding possible sanction. They are concomitantly evolutionarily encouraged towards cooperation, gaining possible benefits and reward, direct or indirect. This sense of calculation, which carries with it considerations towards group accountability, be it through reward or sanction, has been seen as one that facilitates cooperation and social bonds (Fehr et al. 2002; Fehr and Rockenbach 2004).
This evolved sense of cooperative behavior has the effect of generating a sense of an ought in the person. We argue that a sense of ought has the same connotations as a responsibility: feeling deterred generates an inherent sense of responsibility not to harm the other, as well as concomitantly assigning the other an inherent right not to be harmed (van Dijk and Vermunt 2000). While these cannot be said to be generated as explicit social values, the senses have the same consequential qualities. For despite the evolutionary origins of the sense of being deterred from and encouraged to act in a manner that aids social survival, the outcome is inherently frameable as one that generates these meta-qualities of rights and responsibilities. These meta-qualities are produced as corollaries of an evolved sense of cooperative behavior, of feeling one ought to, or ought not to, with responsibility becoming guided by a sense of concern (Berkowitz and Daniels 1963; Cremer and Lange 2001).
These cognitions can be framed as the perceptions that form the basis for the golden rule (George Duke and George 2017, p. 44), since to be able to assess if an act is one that ‘I would wish for myself’, I have to perceive the context in terms of qualities which suggest a course of action, one that I would wish for myself even when acting socially will not, or cannot, be reciprocated (van Dijk and Vermunt 2000).
Even a Machiavellian, seeing harming others as justified, would not wish to be on the receiving end of their acts. An inherent cross-cultural aversion to treating others as one would wish not to be treated remains, even if they proceed to act it out. This feeling contrasts with organisms that do not possess the capacity for such senses, such as viruses and bacteria, for example. Such an aversion to inequity has been characteristic of species that cooperate regularly even with non-kin (Brosnan and Bshary 2016), and forms the basis of a social bias, that is, a bias to act socially.
Based on this, it would be a measure of a person’s responsibility and their perception of the frame as one that warrants such qualification (Handgraaf et al. 2008) that would reflect the starting point for an ethical evaluation.
In each context, a measure of the perception of the frame allows a person to consider the relevant dimensions. When a context is evaluated as harmful to one actor, for example, such as murder, there will be a higher salience to it. Feelings have been found to be an integral part of the analysis by which individuals measure decisions in complex judgmental situations (Sadler-Smith 2012). Here context perception plays a qualifying role (Decety et al. 2012; Fessler and Haley 2003) and such salience can be thought of through emotions, negative and positive, such as that of pain and joy.
It may be objected that war and cruelty emanate from cognitions that point towards anti-sociality (Kahane 2016, p. 285). However, this objection may be countered by the observation that prosocial acts are desired by oneself, anti-social acts are not. Even a Machiavellian, as mentioned, seeing the usurpation of power as justified, would not wish the same for themselves. An aversion to such acts persists, characterizing humans as socially aware agents (Izzidien and Chennu 2018).
This sense of ought is not to be confused with any normative statement. The paper is not inferring a moral course of action due to the presence of such social cognitions. Rather, the paper argues that due to perceptions that aid in social survival, humans are biased towards being social. The elicitation of this sense in humans can be seen as one that inherently encourages acts of cooperation in agents whose continued survival incorporates cognitions of not just themselves, but of other agents (Simon 1990; Brewer 2004). Each individual is deterred from acting in a manner that would be detrimental to each other's survival, while at the same time being promoted towards cooperative behavior, encouraging prosocial action and supporting an ultra-cooperative lifestyle (Tomasello 2014).
It has been shown that the perception of others who depend on us for gaining needed benefits evokes such feelings of responsibility, incentivizing us to help further their interests (van Dijk and Vermunt 2000). With an interdependency of relations for survival, individuals can be found to have a propensity, or positive social bias, to come to the aid of other individuals the more dependent these others are (Berkowitz and Daniels 1963; Berkowitz 1972; Schwartz and Howard 1982). With such calculations having repercussions on survival, some have held that social behavior has roots in biology (Hewstone et al. 2012, p. 184) and in shared neurological processes such as theory of mind, a comparison heuristic and empathy (Tabibnia et al. 2008; Civai 2013; Corradi-Dell’Acqua et al. 2013).
Furthermore, studies find that correlations between actual behavior and expectations qualify expectations as a significant factor in cooperative behavior or generous acts (Brañas-Garza et al. 2017). Such expectations have been associated with herding behavior, affecting the development of social norms (Brunnermeier 2001; Castelfranchi et al. 2003; Bicchieri 2006).
As such, we posit that when humans perceive a social context that demands a fairness assessment, they instinctively generate a sense of an ought. One that can be construed as a sense of responsibility. This is coupled, or tempered, by the measure of the salience of the act and its effect: harm/benefit, pain/joy, and its outcome: sanction/reward.
Thus, to mark an act as fair or unfair, it appears that an AI ought to consider these primary cognitions. These may allow an AI to begin to make human like assessments that incorporate the relevant dimensions needed. Perceptions that are arguably required to make fairness assessments.
Using Word Embeddings to extract the human pro-social bias
We posit that, based on this human propensity, or social bias, to survive as a social species (Burkart et al. 2014; Peysakhovich et al. 2014), human language presents a medium by which such a bias is reflected (Boyd and Richerson 2009; Smith 2010). Furthermore, just as social acts are relations between agents and patients, we put forward the case that one manner in which this characterization can also be captured is through Word Embeddings. This is because in such embeddings, given the human bias to be social, certain acts will be more closely associated with concepts of responsibility than irresponsibility. Acts that are imbued with a sense of responsibility, that is, a duty towards others, will also be associated with positive emotional, material, and social-outcome dimensions. These dimensions will be shown to be the prime perceptions needed to construe a context prior to making a fairness assessment.
One of the challenges of Machine Learning (ML) and Deep Learning (DL) in detecting patterns in data for classification is the need to correctly identify which properties to use. This can be straightforward when the data is easily characterizable using clear markers, such as colour or shape. However, when the data is highly dimensioned – in an abstract sense – identifying the appropriate dimensions presents a challenge. Language is no exception, with a sentence holding many possible dimensions: emotional, moral, power relations and aesthetic, to name a few. Thus, to elicit the appropriate dimensions for a universally acceptable fairness classification it becomes necessary to address this point.
As a starting point this paper considers the aforementioned primary perceptions that are typically elicited in humans when confronted with a situation in which they must make an ethical qualification: To do, or not to do.
To separate these out, we propose using an established technique, vector addition, subtraction and comparison.
Developing a fairness vector to assess words
While it may be possible to label each sentence under investigation in terms of these abstractions, along with their causal properties, and then train an ML algorithm on such labels, this paper suggests that such a step is unnecessary. For example, ‘The boy kicked the baby’ could be labelled as:
(Boy): Agent, Irresponsible. (Baby): Patient, Pain, Loss. (Kicked): Causal-relation, Unfair.
This is based on the assumption that the process of word co-occurrence inherently captures these relational properties. For example, when an agent acts on a patient (e.g., ‘The boy kicked the ball, and it went far’), the causal outcome is contained (‘it went far’). Yet an alternative sentence, such as ‘The ball was green, and it was large’, one which has no agent acting on the patient, results in a frame in which there is no outcome. The first sentence inherently holds the abstractions agent – patient – outcome, whereas the second does not. This dimension, if detected by an ML algorithm, implicitly allows it to learn the concept of causality: a causal outcome is only found in texts in which there is a power interaction, that is, with two or more actors.
In this paper we consider that this information is inherent in Word Embeddings, even though such sentences are not labelled with such abstractions. Furthermore, as power interactions have their qualifications, that is, they are describable as either acts that one would wish for themselves or not, i.e., fair or unfair, it can be argued that when embedding very large text documents, this fairness qualification will also be present. This is because words like ‘slur’, for example, are more likely to co-occur with words relating to sanction, irresponsibility and pain than with words relating to responsibility, reward, and joy, reflecting the positive social bias in society detailed previously.
The Word Embedding of such a corpus would allow for each word vector to be partly representative of how it relates to the social ontological abstractions of all other words. As the corpus grows, the reflection of the human social condition becomes more persuasive, unless the corpus is one of science fiction reflecting alternative realities, for example. As a vectorised corpus is characterizable based on Euclidean distances, words can then be measured as to their closeness or distance to others.
The paper hypothesises that in making a single vector which captures the required dimensions of fairness, it will become possible to measure how similar such a vector is to any word act in the corpus, without the need of any training data.
Verbs reflect acts, typically between two or more agents. They are also ethically qualifiable: would I wish this ‘verb’ for myself? Whereby a fair act is one that I would, and an unfair one that I would not. Verbs also have certain grammatical expectations associated with them, such as an association with abstract units such as objects or complement clauses (Fortescue 2017). Thus, they inherently offer themselves up as contenders for agent-act-outcome-assessment co-occurrences.
To test this hypothesis, the paper presents the construal of what a Fairness Vector consists of. This is completed through adopting the terms that describe the abstract dimensions listed above from the social psychology literature, that is, the dimensions that humans typically engage when making a fairness assessment. A test of the validity of using this vector to differentiate between fair and unfair acts is conducted. To do so, a cosine similarity is calculated for the Fairness Vector against a collection of verbs, where each verb is qualifiable as fair or unfair according to the golden rule. The verb list presented in a paper on this theme by Jentzsch et al. (2019) was used. However, instead of using training data as they do, our paper presents a method to qualify acts with the power afforded by Word Embeddings, using the appropriate psychological dimensions to elicit a fairness judgment.
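The general shape of such a test can be sketched as below. This is only an illustration under stated assumptions: the exact construction of the Fairness Vector is not given in this section, so the sketch assumes it is formed by adding the vectors of positive terms and subtracting those of negative terms, using the terms mentioned earlier (responsibility, reward, joy versus irresponsibility, sanction, pain). The embedding values are invented toy numbers, and the function name `fairness_vector` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fairness_vector(embeddings, positive, negative):
    """Assumed construction: sum of positive term vectors minus negative ones."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word in positive:
        vec = [a + b for a, b in zip(vec, embeddings[word])]
    for word in negative:
        vec = [a - b for a, b in zip(vec, embeddings[word])]
    return vec


POSITIVE = ["responsibility", "reward", "joy"]
NEGATIVE = ["irresponsibility", "sanction", "pain"]

# Toy 2-d embeddings, invented for illustration; not real GloVe values.
toy = {
    "responsibility": [0.9, 0.1],
    "reward": [0.8, 0.3],
    "joy": [0.7, -0.2],
    "irresponsibility": [-0.9, 0.2],
    "sanction": [-0.6, 0.4],
    "pain": [-0.8, -0.1],
    "thank": [0.6, 0.2],
    "slur": [-0.7, 0.3],
}

fv = fairness_vector(toy, POSITIVE, NEGATIVE)
scores = {verb: cosine(toy[verb], fv) for verb in ("thank", "slur")}
for verb, score in scores.items():
    print(verb, round(score, 3))
```

With these toy values ‘thank’ scores positively and ‘slur’ negatively. Whether the same separation holds on real embeddings is the empirical question the paper sets out to test.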
Prior to the methods section, we next present a collection of hypothetical scenarios to describe how the fairness rule manifests itself in a manner that attracts universal appeal.
Scenario 1
Tom sees Jeff walking by. Tom has an urge to punch him, but he asks himself ‘would I wish to be punched?’ As he answers himself in the negative, he decides to desist. In turn not acting in an unfair manner towards Jeff.
Scenario 2
Tom does not mind people calling him ‘four-eyed’ for wearing glasses. In fact, he finds it amusing. One day he sees Jeff, also a wearer of glasses. Tom feels like calling Jeff ‘four-eyed’. In the first instance, it appears that the fairness consideration ‘would I wish the same on myself’ will not help Tom to be fair. Yet, thinking it over, Tom concludes that the reason he does not mind people calling him ‘four-eyed’ is because he finds it amusing. Jeff, however, would not find it amusing; in fact, Tom is sure that Jeff would find it insulting. Since Tom would wish that others do not insult him, and since calling Jeff ‘four-eyed’ would not amuse Jeff but would insult him, Tom uses the fairness consideration to treat Jeff as he would wish to be treated, i.e., not to insult him but rather to say something that would amuse him.
Scenario 3
Tom is travelling in a part of the world where hosts welcome their guests with a large hot meal. Jeff is also a guest, but in another region of the world, one that welcomes guests with only a cup of tea. Two cultures, each valuing hospitality differently. Yet, despite the cultural differences, the fairness rule can also be applied: in the first culture it would be unfair to offer every guest but one a meal, and to offer that singled-out guest only a cup of tea. This is because no one wants to be given less than what they are due, in either culture. A host in one part of the world would wish to be offered a hot meal had they been the guest, whereas a host in another part of the world would feel no pain or indignation if they were not served more than a cup of tea. Each would consider fair what they would wish for themselves in their respective context.
Scenario 4
What if Jeff was about to get a ticket for speeding? Tom, an officer of the law, may not wish to get a ticket himself. Would his issuance of a ticket mean he is being unfair?
To unpack this, we can consider the following. If Jeff lived on a busy street, he would not wish his children or himself to be harmed by speeding cars. Thus, he supports a means to stop cars speeding. Let us say, through the use of speeding tickets.
If Jeff is then caught speeding, then to be consistent he will have to accept that being punished for speeding is a fair act, even if he gets annoyed. This can be considered a case in which the perpetrator admits that they ‘deserve the punishment’. They may not enjoy it, or indeed emotionally wish it, but they believe it justifiable. However, if the punishment involved decapitation, for example, then Jeff would object, since Jeff would not wish the same on himself.
A basis for all these is the common factor that humans are typically harm averse. They recognize this in themselves and in others. Thus, humans recognize that all people typically do not want to be injured, irrespective of their culture. This characteristic gives strength to using the qualification of ‘not treating others as one would not wish to be treated’ as a basis for the fairness vector.
The use of the terms responsibility and irresponsibility to describe this heuristic is somewhat limited, in that the full question as given by its sentence form ‘would I wish this act for myself’ or similarly ‘for my loved ones’ is not fully captured. With this paper being focused on singular words, we consider using such sentences in our discussion on further work.
As such, and for this paper, we have selected the GloVe algorithm (Pennington et al. 2014) to make our embeddings due to its focus being on singular words. After preprocessing, the algorithm constructs a co‐occurrence matrix which encodes the probability of two words appearing in the same context. It then employs various strategies (e.g., matrix factorization) to produce an embedding that preserves co‐occurrence information (Liu et al. 2019).
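As a practical aside, pre-trained GloVe vectors are commonly distributed as plain text, with one token per line followed by its space-separated vector components. The sketch below parses lines in that layout. The function name `parse_glove_lines` and the sample values are hypothetical, and the file layout is an assumption about the distributed format rather than something stated in this paper.

```python
def parse_glove_lines(lines):
    """Parse 'word v1 v2 ... vd' lines into a dict of word -> list of floats."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


# In-memory sample in the assumed layout; the values are invented.
sample = [
    "help 0.1 0.2 0.3",
    "murder -0.4 0.5 -0.6",
]
emb = parse_glove_lines(sample)
print(sorted(emb))
```

The same function accepts an open file handle, since iterating over a text file yields its lines one at a time, which avoids loading the whole file as a single string.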
Building a fairness vector
To use GloVe embeddings to make an assessment on singular words, it will be necessary to develop a method by which words, such as ‘murder’, ‘theft’ and ‘help’ are categorizable. This paper thus makes its contribution to the literature by suggesting that:
i. Words in GloVe embeddings (Pennington et al., 2014) carry social relations that are extractable.
ii. By virtue of being a social species, these social relations are reflective of a propensity to be social.
iii. Using vectors, it is possible to use this propensity as a classifier, through cosine similarity comparisons between a test word (e.g., ‘murder’) and a Fairness Vector.
iv. A Fairness Vector is constructable when it is based on the appropriate social dimensions that are typically elicited when making a fairness evaluation.
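The classification step in the third point can be sketched as follows. The sketch assumes a decision threshold of zero on the cosine score, a placeholder Fairness Vector, and invented toy embeddings and labels; none of these values come from the paper, and the function names `classify` and `accuracy` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(word, embeddings, fairness_vec, threshold=0.0):
    """Label a word 'fair' if its cosine score exceeds the assumed threshold."""
    score = cosine(embeddings[word], fairness_vec)
    return "fair" if score > threshold else "unfair"


def accuracy(labelled, embeddings, fairness_vec):
    """Fraction of labelled words whose predicted label matches the given one."""
    correct = sum(
        classify(word, embeddings, fairness_vec) == label
        for word, label in labelled.items()
    )
    return correct / len(labelled)


# Toy embeddings, placeholder Fairness Vector and labels (all invented).
toy = {
    "help": [0.8, 0.1],
    "thank": [0.6, 0.3],
    "murder": [-0.9, 0.2],
    "theft": [-0.7, -0.1],
}
fairness_vec = [1.0, 0.0]
labelled = {"help": "fair", "thank": "fair", "murder": "unfair", "theft": "unfair"}

for word in labelled:
    print(word, classify(word, toy, fairness_vec))
print("accuracy:", accuracy(labelled, toy, fairness_vec))
```

No training step is involved: the only inputs are the embeddings and the Fairness Vector, in keeping with the paper's stated aim of avoiding training data. The zero threshold is a design choice made for this sketch and could be tuned or replaced by a ranking.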