Agents continually evaluate their performances to measure progress toward goals, assess the efficacy of action, and learn. Some evaluation criteria focus on the quality of agentic processing itself, for example, regarding its speed and procedural fidelity. Other criteria concern outcomes or end states, for instance, whether specific goals are met and preferences realized. Almost all theories of agency assume evaluative mechanisms of this kind, including how agents learn from performance feedback and feedforward. The agentic metamodels presented in Chap. 2 include these features as well. They illustrate how agents generate behavior performances (BP) and evaluate such performances (EP). The intra-cyclical evaluation of ongoing performance triggers feedforward updates (FF), while inter-cyclical evaluation of outcomes leads to feedback updating (FB), assuming some evaluation criteria and sensitivity to variance.
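To make this cycle concrete, the following minimal sketch shows intra-cyclical evaluation driving feedforward adjustment within a performance, and inter-cyclical evaluation of the outcome driving feedback recalibration of the criterion, gated by sensitivity to variance. It is purely illustrative and not part of the Chap. 2 metamodels; all names, numbers, and update rules are invented.

```python
# Illustrative sketch of one performance cycle: behavior performance (BP),
# evaluation of performance (EP), feedforward (FF), and feedback (FB).
# All names, numbers, and update rules are invented for illustration.

class Agent:
    def __init__(self, goal=10.0, sensitivity=0.5, step=1.0):
        self.goal = goal                # evaluation criterion (hypothetical)
        self.sensitivity = sensitivity  # sensitivity to outcome variance
        self.step = step                # current action parameter

    def perform(self, cycles=5, ticks=10):
        state = 0.0
        for _ in range(cycles):
            state = 0.0
            for _ in range(ticks):                # one ongoing performance
                state += self.step                # BP: generate behavior
                error = self.goal - state         # EP: intra-cyclical evaluation
                self.step += 0.1 * error / ticks  # FF: adjust during the cycle
            variance = self.goal - state          # EP: inter-cyclical evaluation
            if abs(variance) > self.sensitivity:  # only variance above threshold
                self.goal *= 0.95                 # FB: recalibrate the criterion
        return state

print(Agent().perform())
```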

Evaluation of performance is therefore central to theories of agency. Consider social cognitive theories. From this perspective, an agent’s evaluation of performance—as a type of self-reaction—is central to learning, future goal setting, task engagement, value experience, and the development of self-efficacy in specific task domains (Bandura, 1997). Similarly, the evaluation of performance plays a major role in the detection of self-discrepancy, that is, a person’s sense of whether they achieve their preferred or ideal self-states, and what adjustments they make in response (Higgins, 1987). Other psychologists theorize that evaluation of performance is central to planning and goal setting, to a person’s self-evaluation, and even to the development of a coherent personality and sense of identity (Ajzen, 2002; Cervone, 2005).

Comparable processes occur at the level of collective agency. Groups, organizations, and institutions all evaluate their processes and outcomes to assess effectiveness, improve procedures, and formulate plans, as well as to learn and adapt. In addition, evaluative processes support modal cohesion, shared goal setting, interpersonal relationships, and the management of organizations, while a negative evaluation of performance exposes problems and conflicts, triggering adaptation and other corrective actions (Cyert & March, 1992). Equally, within institutions, the evaluation of performance and subsequent feedforward and feedback play critical roles in reinforcing or updating collective procedures and systems (Scott, 2014).

Problematics of Evaluation

An important problematic is shared among these fields of study. Within each discipline, scholars debate the potential variance of evaluation criteria. For example, they debate whether criteria are fixed and stable or vary from situation to situation, and whether criteria are detailed and specific or broad and general. Earlier chapters of this book review similar debates about criteria in problem-solving and cognitive empathizing. In all chapters, my argument defends a “persons in context” perspective, which suggests that evaluation criteria will be contingent and variable to some degree, depending on the context and type of functioning. From this perspective, criteria are activated, chosen, or formulated to fit the situation. They are rarely, if ever, fixed and universal, although this does not imply loose relativism. It does imply, however, that different criteria are activated or not, then upregulated or downregulated, depending on the context, its problems, and the agent’s position and priorities.

Comparable debates occur in other areas of psychology, for example, regarding the evaluation of self-efficacy and self-evaluation. For instance, Bandura (2015) insists that self-efficacy is specific to task domains, and hence some evaluation criteria will be domain specific too. That said, broad criteria apply if goals and actions are themselves broad. For example, an agent could evaluate her or his self-efficacy in life planning, which might cut across numerous other activity domains (Conway et al., 2004). The main significance of these distinctions is that inflexible, limited performance criteria may distort evaluation and impede learning, whereas variable, multiple criteria allow for more flexible and appropriate evaluations, sensitive to context.

Similar processes occur at group and collective levels. For example, studies show that some features of collectives can be relatively stable over time, owing to imprinting and isomorphism within institutional fields, and deeply embedded cultural norms (Hannan et al., 2006; Marquis, 2003). Collectives then reference such criteria in the evaluation of performance. At the same time, studies also show that collectives reference adaptive criteria which reflect changing contexts, goals, and commitments, plus different levels of sensitivity to variance (Hu et al., 2011). Moreover, such variability mitigates the negative effects of low evaluations of performance. Instead of lingering in a state of perceived failure, agents recalibrate their goals and aspirations, thereby enhancing the potential for better evaluations in the future. In fact, studies show that collectives which combine contextual embeddedness with adaptive aspirations—that is, both long- and short-term perspectives—tend to be more successful in sustained goal pursuit (Dosi & Marengo, 2007).
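The adaptive aspirations cited above have a standard formalization in the behavioral tradition: the aspiration level is an exponentially weighted average of the prior aspiration and realized performance. The sketch below illustrates how a high weight on the past preserves embedded criteria, while a low weight lets criteria track recent outcomes; parameter names and values are illustrative.

```python
# A common formalization of adaptive aspirations, in the spirit of the
# behavioral theory of the firm (Cyert & March, 1992): the aspiration level
# is an exponentially weighted average of the prior aspiration and realized
# performance. Parameter names and values are illustrative.

def update_aspiration(aspiration, performance, alpha=0.7):
    """alpha near 1 preserves embedded criteria; near 0 tracks recent outcomes."""
    return alpha * aspiration + (1 - alpha) * performance

aspiration = 100.0
for performance in [80.0, 85.0, 95.0, 90.0]:
    aspiration = update_aspiration(aspiration, performance)
    shortfall = aspiration - performance  # positive shortfall triggers search
    print(round(aspiration, 1), round(shortfall, 1))
```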

Impact of Digitalization

Not surprisingly, the evaluation of performance is deeply impacted by digitalization. Capabilities are expanding, allowing for more ambitious goals and higher expectations of performance. Digitalization also provides new, more precise means to evaluate performance, including through rapid intra-cyclical, feedforward mechanisms. Performances can be evaluated continuously, in real time, which enables adaptation and enhancement during action cycles, prior to outcome generation. Evaluation is thus partly mediated by entrogenous, performative action generation. To illustrate, every time a person searches the internet, background systems adapt the process in real time, helping to guide search in one direction or the other, curating preferences and goals (Carmon et al., 2019). And if preferences and goals shift, so will criteria of evaluation. Comparable processes are critical for digitalized expert systems, in which performances are constantly evaluated and refined. However, as in other domains, there can be unintended consequences. Like self-regulation, the digitally augmented evaluation of performance is vulnerable to extreme divergence or convergence. Digitalization therefore brings significant opportunities and risks to the evaluation of performance.

7.1 Theoretical Perspectives

Evaluation of performance has always been central to the study of human thought and action. Apart from anything else, this reflects the fact that purposive goal pursuit is central to civilized humanity. To achieve goals, it is necessary to monitor and assess performance, issuing rewards and sanctions, while updating goals and strategies. This happens at individual, group, and collective levels. For example, business organizations update their strategies and issue dividends, contingent on the evaluation of performance. Similarly, public institutions embody the collective evaluation of performance in political and legal systems. At the same time, evaluative criteria vary between cultures and periods of technological evolution. Within premodern contexts, for example, evaluation of performance focused on conformity and docility with respect to deeply encoded norms. Criteria were fixed and prescriptive in most contexts. By contrast, modernity elevates autonomous self-regulation at every level of performance. Modern criteria of evaluation are therefore more expansive and adaptive.

Evaluation of Individual Performance

At the individual level, evaluation of performance maps onto the cognitive-affective processing units (PU) incorporated into the metamodels in Chap. 2, as identified by Mischel and Shoda (1998). First, some criteria reference encodings of self and the world, meaning how phenomena are classified, stored, and processed in memory. These criteria could help to assess the realism and relevance of a performance, for example, whether the problem addressed is adequately representative of observable reality. Second, other evaluation criteria will reference distinct beliefs and expectations and help to assess the reasonableness and utility of a performance, which are central concerns for microeconomics. Third, criteria may reference agents’ goals, values, and commitments, for example, assessing whether an outcome meets standards of efficacy, fairness, and honesty, or conforms to precepts of faith. Fourth, some evaluation criteria will reference affective states, such as the degree of perceived empathy exhibited by a performance, plus the affective state of the assessor, for example, assessing whether a performance makes a person feel happy or sad, calm or anxious. And fifth, some evaluative criteria reference competencies and self-regulatory plans, including a core orientation toward attaining gains versus avoiding losses. Evaluation then asks whether the performance aligns with the criteria incorporated into the self-regulatory scheme and the preferred cycle rate. For example, does the performance use vigilance means to prevent pain and losses, or eagerness means to attain pleasure and gains (Higgins, 2005)?

Describing evaluation criteria in these terms brings an important feature of human psychology to the fore. To begin with, recall the argument presented in Chap. 3, regarding the variable upregulation and downregulation of psychosocial processes in response to internal and external contingencies. That is, different cognitive-affective processes may be more or less salient, upregulated, or downregulated (Mischel & Shoda, 2010). Notably, as the preceding paragraph suggests, the same process can explain the adaptation of evaluation criteria. As different psychosocial processes upregulate or downregulate, so do the associated evaluation criteria. For example, sometimes an agent will reference performance criteria based primarily on goals and values but may not invoke affect. In other situations, the exact opposite could be the case. In relatively mundane situations, many psychosocial factors and criteria will be downregulated, because the agent is relatively docile and satisfied by procedural controls; criteria of evaluation will be habitual and routine. Alternatively, an agent may feel weakly motivated, weakly engaged, and only superficially committed to the expected outcome. On the other hand, sometimes most types of criteria are upregulated, because the situation engages the agent on many psychosocial dimensions (Bandura, 2016). Now the agent is highly motivated and engaged, eager and excited, and committed to the preferred means and outcome.

The main consequence of these distinctions is that evaluation criteria are contextual and variable as well. They can be more or less salient and active, upregulated or downregulated, depending on the task domain, the specific situation, and the agent’s psychosocial condition. This further entails the variability of sensitivity to outcome variance. As the agent moves between contexts, different internal processes are activated, evaluation criteria become more or less salient, and the agent’s sensitivity to outcome variance changes in tandem (Kruglanski et al., 2015). For example, in purely routine performance, sensitivity to variance is relatively low, while in very deliberate goal pursuit, sensitivity to variance is high. Clearly, these contextual dynamics play an important role in the evaluation of performance.
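One way to picture this contextual regulation is as a weighting scheme over the five types of criteria identified above, with sensitivity to variance shifting in tandem. The sketch below is a simplification under stated assumptions: the contexts, weights, and aggregation rule are all invented for illustration.

```python
# Hypothetical weighting scheme over the five types of criteria discussed
# in this section. Contexts, weights, and the aggregation rule are invented;
# the point is only that salience and sensitivity shift with context.

CRITERIA = ["encodings", "expectations", "goals_values", "affect", "plans"]

CONTEXT_WEIGHTS = {
    "routine":     {c: 0.1 for c in CRITERIA},  # most criteria downregulated
    "deliberate":  {c: 1.0 for c in CRITERIA},  # criteria broadly upregulated
    "value_laden": {"goals_values": 1.0, "affect": 0.2, "encodings": 0.3,
                    "expectations": 0.3, "plans": 0.3},  # selective salience
}

def evaluate(scores, context):
    weights = CONTEXT_WEIGHTS[context]
    total = sum(weights[c] * scores[c] for c in CRITERIA)
    sensitivity = sum(weights.values()) / len(CRITERIA)  # shifts in tandem
    return round(total, 2), round(sensitivity, 2)

scores = {c: 0.8 for c in CRITERIA}
print(evaluate(scores, "routine"))     # low weighting, low sensitivity
print(evaluate(scores, "deliberate"))  # high weighting, high sensitivity
```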

Evaluation of Collective Performance

The evaluation of performance is equally important in modern theories of collective agency, particularly theories of institutions and organizations. Moreover, if one assumes a contextual perspective, then collective evaluation criteria will also be activated or deactivated, upregulated or downregulated, in a dynamic fashion, depending on the situation and task domain. Collective sensitivity to outcome variance will be adaptive as well (Fiedler et al., 2011). Studies demonstrate the importance of these effects for adaptive fitness in institutions (Scott & Davis, 2007) and business organizations (Teece, 2014). There is a positive relationship between evaluative flexibility and performance.

Furthermore, the types of evaluation criteria previously identified for individuals, also apply to collectives. First, collective criteria reference encoded categories and procedures, especially when representing problems and categorizing features of the world. Second, collective criteria reference shared beliefs and expectations, for example, when assessing causal relationships and consequences. Third, collective criteria reflect goals and values, exemplified by the adaptive aspiration levels of behavioral theories of organization, and reasoned expectations in classical theory (Cyert & March, 1992). Fourth, collective criteria reference shared affect when evaluating emotional climate and psychological safety (Edmondson, 2018). And fifth, collective criteria often reference shared self-regulatory plans, especially in the evaluation of collective self-efficacy and competencies (Bandura, 2006).

Nevertheless, it is important to acknowledge that some social scientists hold different views. Some argue for more stable, universal criteria in the evaluation of collective performance. In effect, they argue or assume that some evaluative criteria are universal and invariant. Karl Marx (1867), for example, defended universal criteria based on class and capital. More recent economists also propose universal criteria, albeit citing different mechanisms, such as rational or adaptive expectations (e.g., Friedman, 1953; Muth, 1961). Likewise in sociology, Lévi-Strauss (1961) argued that collective performance can be universally evaluated in terms of social structure. Rawls (1996, 2001) provides a further example in legal and political theory. He is inspired by Kantian idealism to argue for universal principles of justice as fairness, which he claims all rational persons and communities should adopt.

All the theories mentioned above are strong and influential, providing at least some universal criteria for the evaluation of performance. However, as noted earlier, many regard such assertions as problematic and contingent at best (Giddens, 1984; Sen, 1999). That said, most would accept the practical utility of treating some criteria as if they were universal, in appropriate situations. Normative models of this kind help to clarify evaluation within defined contexts and provide unambiguous guidance. Ideals are practical and useful, in this regard. Debate will no doubt continue about their ontological status and degree of variability.

7.2 Impact of Digitalization

Evaluation of performance is deeply impacted by digitalization. Most notably, agents’ capabilities expand greatly at every level, allowing for more ambitious goals and expectations, higher levels of sensitivity to variance, and more exacting criteria of performance evaluation, with the potential for deeper, faster learning. Augmented agents will also be capable of the dynamic supervision of evaluation, by upregulating or downregulating different criteria. However, as in other areas of agentic functioning, digitalization could also have negative effects, owing to the potential divergence or convergence of human and artificial processes. Similar to self-regulation, major issues arise regarding the complexity and rate of evaluation, and for the same reasons. Other factors matter as well, but I will focus on rates and schemes once again, because they are central challenges for augmented agents.

Complexity of Evaluation

As noted previously, artificial agents are hypersensitive in evaluation, meaning they can detect very minor variations. Criteria are often precise and exacting. This is critical in complex, technical systems. However, given this capability, artificial agents easily over-discriminate in the evaluation of performance, going beyond what is necessary and appropriate. For example, a simplified heuristic may be perfectly adequate, but the artificial agent applies very discriminating criteria to the evaluation of the performance. This results in wasted time and resources, plus the overfitting of evaluative models, leading to less diverse and less adaptive future performances. For this reason, computer scientists research how to avoid inappropriate complexity and overfitting in the evaluation of performance (Zhang et al., 2018). Once again, adaptive supervision is key.
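The underlying logic can be expressed as a simple selection rule: adopt a more discriminating evaluative model only when its accuracy gain justifies the added complexity. The following sketch is illustrative only; the candidate models, scores, and tolerance are invented, and real systems would rely on principled checks such as held-out validation.

```python
# Illustrative guard against over-discriminate evaluation: adopt a more
# complex evaluative model only if its accuracy gain exceeds a tolerance.
# Candidates, scores, and the tolerance are invented assumptions.

def choose_evaluator(candidates, tolerance=0.02):
    """candidates: (name, accuracy, complexity) tuples, sorted by complexity."""
    best = candidates[0]                   # start with the simplest model
    for cand in candidates[1:]:
        if cand[1] - best[1] > tolerance:  # adopt complexity only when it
            best = cand                    # buys a real accuracy gain
    return best

candidates = [
    ("heuristic", 0.90, 1),   # simple rule of thumb
    ("regression", 0.91, 5),  # marginal gain: rejected
    ("deep_model", 0.97, 50), # clear gain: adopted
]
print(choose_evaluator(candidates))  # -> ('deep_model', 0.97, 50)
```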

By contrast, human agents are often relatively insensitive and employ heuristic means in the evaluation of performance. This can be for good reasons too. Habitual and routine performances, particularly, may be appropriately evaluated using heuristic means. Likewise, simple rules frequently work best in highly turbulent, dynamic environments, where both information and time are lacking (Sull & Eisenhardt, 2015). In fact, hypersensitivity to variance impedes performance in such contexts, though the opposite is often true in complex, technical task domains, where accurate evaluation of performance is critical. Human agents are therefore trained to be more sensitive to variance in specific domains. Moreover, when this training is successful, procedures are deeply encoded as evaluative habit or routine. Yet these procedures can persist, even when digitally augmented capabilities transcend prior limits. People continue striving for greater sensitivity in the evaluation of performance, even as digitalization already achieves exactly this.

In summary, artificial agents must work to avoid overly discriminate, complex, and hypersensitive evaluation, while humans must try to avoid overly indiscriminate, simplified, and insensitive evaluation. If supervision fails in these respects, persistent artificial hypersensitivity may combine with persistent human insensitivity. Evaluation of performance would then be complex and highly discriminating in artificial respects, but simple and far less discriminating from a human perspective. The overall result will be gaps and discontinuities: augmented evaluation of performance will be discontinuous, ambiactive, and potentially dysfunctional.

Rates of Evaluation

Additional challenges derive from divergent rates of evaluation. As in self-regulation, artificial agents can evaluate very quickly, hyperactively, especially in real time. Once again, this is advantageous in complex, technical domains. However, in other situations, it can lead to excessive evaluation, cycling too fast and too frequently. For example, the agent might evaluate and adjust environmental controls at great speed, outpacing human physiology and need. Such evaluations would overcorrect and be an inefficient use of resources. By comparison, human agents are often relatively sluggish in evaluation. They cycle at behavioral and cultural rates, and often appropriately so. Many human performances may neither benefit from nor deserve rapid evaluation. Cycling too fast could truncate exploration and generate outcomes too quickly, leading to premature judgment and less creativity (Jarvenpaa & Valikangas, 2020; Shin & Grant, 2020). In fact, this prompts deliberate efforts to slow or stagger the evaluation of performance in some contexts. The goal becomes delayed or provisional judgment, allowing for iterative exploration and evaluation. Design and innovation processes exhibit this approach (Smith & Tushman, 2005).
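Such deliberate staggering has a direct computational analogue: judgment is withheld until a minimum interval has elapsed, so exploration continues between evaluations. The sketch below is illustrative; the class, interval, and criterion are invented.

```python
# Illustrative staggered evaluator: judgment is withheld until a minimum
# interval has elapsed, so exploration continues between evaluations.
# The class, interval, and criterion are invented for illustration.

import time

class StaggeredEvaluator:
    def __init__(self, min_interval_s=5.0):
        self.min_interval_s = min_interval_s
        self._last_eval = float("-inf")

    def maybe_evaluate(self, performance, criterion):
        now = time.monotonic()
        if now - self._last_eval < self.min_interval_s:
            return None                  # defer judgment; keep exploring
        self._last_eval = now
        return performance >= criterion  # provisional, not final, verdict

evaluator = StaggeredEvaluator()
print(evaluator.maybe_evaluate(0.8, 0.7))  # first call evaluates: True
print(evaluator.maybe_evaluate(0.9, 0.7))  # too soon: None (deferred)
```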

However, the opposite is often true in urgent and competitive situations, where rapid evaluation can be a source of advantage and adaptive fitness. In these domains, human agents are trained to accelerate the evaluation of performance, to become more active. Moreover, when such training is successful, rapid evaluative procedures are encoded as habit or routine. Once again, however, these procedures tend to persist despite the fact that digitally augmented capabilities transcend prior limits. People continue striving to speed up evaluation of performance, even as artificial processing accelerates. When this happens in augmented agency, humans may remain relatively sluggish, while artificial processes are increasingly hyperactive. The overall process will therefore be dyssynchronous, again ambiactive, and potentially dysfunctional.

Summary of Augmented Evaluation

Based on the foregoing discussion, we can now summarize the main features of evaluation of performance by augmented agents, at least with respect to rates of evaluation and the complexity of evaluative schemes and criteria. First, regarding human processes of evaluation, criteria reference cognitive-affective processing units: core encodings, beliefs and expectations, goals, values, and commitments, affective and empathic states, plus competencies and self-regulatory plans. These criteria can be activated or deactivated, upregulated or downregulated, and precise or approximate, depending on the context and type of performance. At the same time, humans possess limited evaluative capabilities, especially in complex, technical task domains. Trade-offs are therefore common, especially between the rate and complexity of evaluation, because human agents cannot maximize both at the same time. Consequently, human agents either accelerate simpler evaluative processes or decelerate more complex processes. Deliberate effort is required to supervise these effects, especially in task domains which require more rapid or discriminate evaluation. This will typically entail the activation and upregulation of some human processes and criteria, and the acceleration of specific cycle rates, to achieve better fit with the task at hand.

Second, consider artificial evaluative processes. As noted earlier, these systems are capable of extremely rapid, highly discriminating evaluation, especially using intra-cyclical means, which is fully appropriate in complex, technological activity domains. Artificial evaluation therefore tends toward hyperactivity and hypersensitivity. Trade-offs are less common, because artificial agents can achieve high rates and levels of discrimination, potentially maximizing both at the same time. In consequence, however, deliberate supervision is required to avoid unnecessary overevaluation of performance, especially when collaborating with human agents. This will typically involve the deactivation, downregulation, or deceleration of some artificial evaluative processes, so they are better aligned with human and ecological processes.
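The asymmetry between these two summaries can be captured in a toy constraint: human evaluation operates under a fixed processing budget, forcing a trade-off between rate and complexity, whereas artificial evaluation can approach high values on both dimensions. The budgets and numbers below are invented for illustration.

```python
# Toy illustration of the asymmetry summarized above: human evaluation
# operates under a fixed processing budget, forcing a rate-complexity
# trade-off, whereas artificial evaluation can score highly on both.
# All budgets and numbers are invented.

def feasible(rate, complexity, budget):
    return rate * complexity <= budget

HUMAN_BUDGET, ARTIFICIAL_BUDGET = 10.0, 10_000.0

print(feasible(rate=2.0, complexity=4.0, budget=HUMAN_BUDGET))         # True
print(feasible(rate=8.0, complexity=8.0, budget=HUMAN_BUDGET))         # False: trade-off
print(feasible(rate=80.0, complexity=80.0, budget=ARTIFICIAL_BUDGET))  # True: both high
```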

Risks therefore arise for the evaluation of performance by augmented agents. If processes are poorly supervised, artificial hypersensitivity and hyperactivity could combine with relatively insensitive, sluggish human processes of evaluation. Evaluation of performance could become highly dyssynchronous, discontinuous, and ambiactive, meaning it simultaneously stimulates (activates or upregulates) and suppresses (deactivates or downregulates) evaluative sensitivities, criteria, and cycle rates. Three main outcomes are likely. First, the combined system of evaluation could be ambiactive and conflicted, with human and artificial processes both upregulated and diverging from each other. Second, one agent might dominate the other, and the combined system will be extremely convergent. In particular, artificial evaluation could outrun and overwhelm human processing, or strong human inputs could distort and interrupt artificial processing. Third, in very complex performances, all three types of distortion may occur, and evaluation will be extremely divergent in some respects but convergent in others. In each scenario, the overall result will be dysfunctional evaluation of performance, undermining adaptive learning, reducing self-efficacy, and weakening other agentic functions which rely on evaluation.

7.3 Metamodels of Evaluation

The foregoing analysis suggests at least four different metamodels of evaluation, in terms of the upregulation or downregulation of human and artificial processes. These are depicted in Fig. 7.1. First, human and artificial evaluative processes may be active and upregulated (quadrant 1). Both agents are therefore stimulated, cycling and discriminating as best they can. Evaluation will be deliberate, effortful, and often precise. However, evaluation is also vulnerable to divergence and conflict, because both types of agent are upregulated but have markedly different capabilities. The corresponding pattern of augmented supervision is shown by segment 9 in Fig. 2.6. The resulting evaluations are more likely to be dyssynchronous and discontinuous, and hence highly ambiactive. Second, human evaluative processing may be active and upregulated, while aspects of artificial processing are deactivated or downregulated (quadrant 2). In this scenario, human sluggishness and insensitivity are more likely to intrude, like segment 7 in Fig. 2.6. Evaluations of performance will tend to be moderately dyssynchronous and discontinuous, and hence moderately ambiactive, although there is a risk of over-convergence when humans dominate. Third, human evaluative processing may be deactivated or downregulated, while artificial evaluative processing is active and upregulated (quadrant 3). Artificial hyperactivity and hypersensitivity are now more dominant, like segment 3 in Fig. 2.6. Resulting evaluations are likely to be moderately dyssynchronous and discontinuous, owing to the greater activation of artificial processes and the relative passivity of human processes. This produces moderately ambiactive evaluations, although there is a risk of over-convergence when artificial agents dominate. Fourth, both human and artificial evaluative processes may be deactivated or downregulated (quadrant 4), meaning both are suppressed. Such evaluations will be purely procedural, habitual, or routine. Evaluative processes are less discriminating, cycle without effort, and focus on maintaining control, like segment 1 in Fig. 2.6. For this reason, evaluations are more likely to be continuous and synchronous, lowly ambiactive, and functional.

Fig. 7.1 Augmented evaluation of performance (a matrix depicting four combinations of upregulated and downregulated artificial evaluation of performance with upregulated and downregulated human evaluation of performance)

In the following sections, I illustrate the four metamodels summarized in Fig. 7.1 in greater detail. As in the previous chapter, the metamodels focus on the internal dynamics of augmented agents, showing the interaction of human and artificial collaborators. Hence, in this section, I refer to human and artificial evaluative processes as distinct inputs. Either type of evaluative process (human or artificial) can be upregulated and active, or downregulated and latent. It is also important to note that any metamodel can be appropriate and effective, depending on the context. The challenge for supervision is to maximize metamodel fit.
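Before turning to the details, the quadrant logic of Fig. 7.1 can be restated in a compact sketch. The labels paraphrase the outcomes discussed above; the function and wording are illustrative, not part of the figure itself.

```python
# Compact restatement of the quadrant logic in Fig. 7.1. The labels
# paraphrase the outcomes discussed above; wording is illustrative.

def metamodel(human_up: bool, artificial_up: bool) -> str:
    if human_up and artificial_up:
        return "Q1: both upregulated; highly ambiactive, risk of divergence"
    if human_up:
        return "Q2: human-led; moderately ambiactive, human over-convergence risk"
    if artificial_up:
        return "Q3: artificial-led; moderately ambiactive, AI over-convergence risk"
    return "Q4: both downregulated; routine, lowly ambiactive, convergent"

for h in (True, False):
    for a in (True, False):
        print(metamodel(h, a))
```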

Overall Downregulated Processes

In the first of these metamodels, both types of evaluative processes are deactivated or downregulated, as summarized in quadrant 4 of Fig. 7.1. Distinctions are less exacting, and many potential criteria are latent. Sensitivity to variance will be equally subdued. The risk of evaluative divergence is therefore low because evaluation tends to be procedural, habitual, and routine. Scenario 7.2A in Fig. 7.2 illustrates such a system. Only a subset of processes is shown, however, to highlight the patterns of downregulation and upregulation. At this point, the reader should recall the generative metamodel of augmented agency in Fig. 2.3. It consists of three successive phases: situational inputs (SI); cognitive-affective processing units (PU) including referential commitments (RC); and behavioral performative outputs (BP). The same phases are depicted in 7.2A. Within each phase there are processes indicated by small circles. Some are shaded, meaning they are digitalized. Others are not shaded, indicating they are fully human and not digitalized. Notably, in 7.2A, many of the small circles—both digitalized and human—have dashed borders, which means they are downregulated and latent. Only a few have unbroken borders, indicating they are upregulated and active. And this is the case for each major stage of the process, that is, for sensory perception, cognitive-affective processing, and behavior performance. Therefore, evaluation of performance is based on a reduced set of criteria, most often reflecting procedural consistency and control. For this reason, the augmented agent will only be sensitive to variance when minimal criteria are not met (see Wood & Rünger, 2016). Typically, the resulting evaluation of performance will be continuous and synchronous, lowly ambiactive, coherent, and functional.

Fig. 7.2 Overall upregulated or downregulated processes (a diagram depicting the movement of SP from SI and AG to BP in the completely upregulated and downregulated processes)

In summary, when both components of an augmented evaluative process—human and artificial—are largely deactivated or downregulated, their evaluative processes are more likely to be convergent and routine. Benefits follow for characteristics which depend on the evaluation of performance. These include self-efficacy, coordinated goal setting, self-discrepancy, the stability of identity, and general psychosocial coherence. However, potential benefits are modest in this scenario, owing to the downregulation of many psychosocial processes. In any case, to guarantee these effects, augmented agents will require methods of supervision which appropriately deactivate and downregulate, or activate and upregulate, evaluative processes in specific task domains.

Overall Upregulated Processes

In other cases, evaluative processing is activated and upregulated for both human and artificial agents, summarized in quadrant 1 of Fig. 7.1 and shown in greater detail by 7.2B. All the small circles now have solid, unbroken borders, which indicates they are active. This includes human processes depicted by unshaded circles, and the digitalized processes, which are shaded circles. Therefore, sensory perception of the stimulus environment, cognitive-affective processing, and behavior performances are all highly discriminated. Many of the artificial processes will be rapid and intra-cyclical as well, although typically hidden from human consciousness. Hence, the evaluation of performance is based on a complex set of performance criteria, often reflecting deliberate, purposive goal pursuit, requiring calculative, effortful means. For the same reason, evaluation will be sensitive to outcome variance, in both human and artificial terms. Augmented agents will therefore struggle to coordinate evaluation, which could be very discontinuous and dyssynchronous, and therefore highly ambiactive.

Significant risks therefore follow. Highly ambiactive evaluations will tend to weaken self-efficacy, undermine future goal setting, and often lead to ambiguous learning and, in extreme cases, psychosocial incoherence. For example, imagine a clinician who decides to override the advice of an expert system. This might reinforce the clinician’s personal self-efficacy, but it would likely erode her trust in the expert system. At the same time, the expert system would report an error or failure because of the override and might flag the clinician as a risk. When combined, their divergent evaluations would likely undermine their future collaboration. Mutual cognitive empathy will suffer as well. To restore trust and confidence, both human and artificial agents would require significant changes to their individual and shared supervisory functions. Not surprisingly, computer scientists are developing such applications already (Miller & Brown, 2018).

Upregulated Artificial Processes

Other situations will combine the downregulation of human evaluative processes with the upregulation of artificial ones. These scenarios are summarized in quadrant 3 of Fig. 7.1, and further detail is shown by 7.3A in Fig. 7.3. While the human process is minimally activated, the artificial process is highly active. Once again, downregulation is indicated by the dashed borders of small circles, and in 7.3A, more of the unshaded human processes are dashed and latent. Therefore, evaluation will be largely based on artificial processes. Once again, many of the artificial processes will be rapid and intra-cyclical, and thus hidden from consciousness and perception. In consequence, the augmented agent will be hypersensitive and hyperactive from the artificial perspective, but relatively insensitive and sluggish in human terms.

Fig. 7.3 Partially upregulated and downregulated processes (a diagram depicting the movement of SP from SI and AG to BP in the partially upregulated and downregulated processes)

In such a scenario, the risk of evaluative divergence is moderate, because downregulated human processes are less likely to diverge from, and conflict with, artificial ones. The typical result is that the evaluation of performance is only moderately discontinuous and dyssynchronous, and therefore moderately ambiactive. This could occur in autonomous vehicles, for example. Artificial agents will identify risks and evaluate conditions rapidly and precisely, in ways which human passengers will habitually accept and may not even monitor (Kamezaki et al., 2019). The artificial components are highly activated, rapidly evaluating, and precise, while the passenger processes information slowly and simply. This will be fully appropriate, given the circumstances.

Upregulated Human Processes

The final scenario combines the upregulation of human processes with the downregulation of artificial processes. This type of system is summarized in quadrant 2 of Fig. 7.1 and detailed by 7.3B in Fig. 7.3. More human processes now have solid borders, and more shaded artificial processes are dashed. In this scenario, the risk of evaluative divergence and ambiactivity is again moderate. This is because human evaluations of performance are likely to be deliberate and detailed, whereas artificial evaluation will be relatively automated and procedural. Similarly, the augmented agent will be sensitive to outcome variance from the human point of view, but relatively insensitive from the artificial perspective. The overall result complements the preceding scenario. For example, consider the situation in which a teacher evaluates students’ online assignments. An artificial agent in the learning management system (LMS) may routinely evaluate timeliness and authorship, while the teacher reads the work fully, to form a detailed assessment. The artificial agent could routinely assess an assignment as on time and authentic. However, the teacher might evaluate the work as poor after careful reading, despite it being on time and authentic. The overall result is that processing is moderately discontinuous and dyssynchronous, and hence moderately ambiactive, which may also be fully appropriate, given the circumstances.

Summary of Augmented Evaluation of Performance

Each of the metamodels just presented shows that digital augmentation could significantly accelerate and/or complicate the evaluation of performance, depending on which processes are activated and upregulated, or deactivated and downregulated, and how well they are supervised. If supervision is effective, and the agent maximizes metamodel fit, then evaluation of performance will be timely, accurate, and a valuable source of insight. However, if supervision is poor, there are major risks. First, there are risks of evaluative divergence, ambiactivity, and dysfunction, when both types of evaluative processing are upregulated. Second, evaluation could be overly convergent and dysfunctional, if one type of process inappropriately dominates the other. Moreover, these risks will only increase, as artificial agents become more powerful and ubiquitous.

7.4 Implications for Other Fields

As this chapter explains, the evaluation of performance is fundamental to theories of human agency, at individual, group, and collective levels. When evaluation is positive, agents develop self-efficacy, plus a sense of autonomy and fulfillment. Even when performance falls short, agents can learn, strive to overcome, and feel positively engaged. Not surprisingly, these effects are deeply related to the functions considered in earlier chapters: all agentic modalities evaluate performances; they also evaluate the results of problem-solving and cognitive empathizing; and self-regulatory capabilities develop through evaluative feedback and feedforward, triggered by evaluation of performance. Other implications warrant consideration as well.

Augmented Performance

In fact, owing to digitalization, the nature of agentic performance itself is changing. As previous sections explain, digitally augmented capabilities will transform performance and its evaluation. Aspirations and expectations will rise, and outcomes will often improve to match them. Yet many artificial processes will be inaccessible to consciousness and beyond human sensitivity. Hence, people could mistake artificially generated success as their own and develop an illusion of self-efficacy and control. For example, consider the human driver of a semi-autonomous vehicle. The person may feel very self-efficacious, even having a sense of mastery. However, much of the performance will be owing to the capability of the artificial agents which operate beyond the driver’s consciousness and perception (Riaz et al., 2018). The person’s perceived locus of control may be equally misleading, posing significant operational risks (Ajzen, 2002).

Major questions therefore arise for evaluation of performance by augmented agents. To begin with, what should be supervised and evaluated in augmented performance, at what rate and level of detail, and by which agent? And more specifically, how will augmented agents supervise the activation and upregulation, or deactivation and downregulation, of different evaluation criteria? Answering these questions will require rethinking agentic performance itself, at least when it is highly digitalized. Most fundamentally, augmented agentic performance must be understood as a highly collaborative process. From this perspective, future research should investigate the collaborative dynamics and consequences of evaluation. It should ask how digitally augmented agents will develop a shared sense of self-efficacy and engagement in this area of functioning.

The Nature of Evaluation

In addition to rethinking the nature of agentic performance, digitalization prompts new questions about widely assumed processes of evaluation. In modern social and behavioral theories, the evaluation of performance is often conceived in linear terms, as the assessment of fully completed processing cycles, that is, as the inter-cyclical evaluation of outcomes and end states (Argote & Levine, 2020; Gavetti et al., 2007). In such theories, sensitivity to variance is triggered at the completion of performance cycles and especially by variation from outcome expectations or aspirations, which then leads to feedback, adaptation, and learning.

However, in a digitalized world, evaluation becomes increasingly dynamic, intra-cyclical, and a source of real-time, intelligent adjustment and learning. Entrogenous mediation comes to the fore, as intelligent sensory perception, performative action generation, and contextual learning. Via these mechanisms, artificial feedforward will rapidly update evaluative criteria during processing, thereby recalibrating the evaluation of performance itself. In highly digitalized domains, therefore, the evaluation of performance will no longer be linear. Rather, it will be increasingly generative, as it already is in deep learning systems and artificial neural networks (e.g., Goddard et al., 2016).
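To illustrate the contrast with linear, inter-cyclical evaluation, the fragment below recalibrates the evaluative criterion itself by feedforward during the cycle, rather than only after outcomes are complete. It is a minimal sketch under stated assumptions; the update rate and values are invented.

```python
# Illustrative fragment of generative, intra-cyclical evaluation: the
# criterion itself is recalibrated by feedforward during the cycle, not
# only after outcomes complete. Update rate and values are invented.

def generative_evaluation(observations, criterion=1.0, rate=0.3):
    verdicts = []
    for obs in observations:
        verdicts.append(obs >= criterion)      # evaluate against current criterion
        criterion += rate * (obs - criterion)  # FF: criterion tracks the unfolding
    return verdicts, round(criterion, 3)       # performance in real time

print(generative_evaluation([0.8, 1.1, 1.4, 1.2]))
```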

Further important questions then arise. To begin with, as the evaluation of performance becomes increasingly dynamic, key processes will be less accessible to consciousness. We must then ask, under which conditions will human beings sense self-congruence, self-discrepancy, and self-efficacy? Might these states become increasingly opaque to consciousness and, if so, a potential source of psychosocial incoherence and disturbance, or at least poorly aligned with contextual reality? In parallel, highly digitalized processes could diminish the human experience of self-congruence and self-discrepancy, with consequences for self-awareness and self-regulation. If this were to happen, might too much be taken for granted? People may feel less responsible and disengage (Bandura, 2016). Civility and fairness would also be at risk. To sustain engagement, augmented agents may need to simulate the experience of obstruction, incongruence, and discrepancy, deliberately creating positive friction, as some propose (e.g., Hagel et al., 2018).

Implications for Collectives

Related implications arise for collective evaluation of performance, and especially by digitally augmented organizations and institutions. In each context, digitalization accelerates and complicates collective performance and its evaluation. Here, too, there are major risks. Artificial processes might outpace and override established methods of collective evaluation. For example, digitalized methods could displace public debate and negotiated consensus in political assessment. Granted, evaluations may become more precise and prompt. However, such changes would likely erode the bases of communal trust, collective choice, and participatory decision-making. Evaluation would be digitally determined, often hidden from sight, over-discriminate, and over-complete (Chen, 2017). Indeed, studies already point to these effects (see Hasselberger, 2019).

Furthermore, the speed and scale of augmented evaluation could homogenize collectivity, by demoting the role of human traditions and commitments. If this occurs, digitalization will erode cultural diversity and smother alternative ways of assessing the world. Civilized humanity would be depleted and arguably less adaptive because diverse aspirations feed forward, by planting alternative potentials which the future can reap. Having a richer set of possible futures enhances adaptive flexibility, as the cultural equivalent of biodiversity. Others fear that digitalized surveillance will be used to dominate and control, assigning rewards and sanctions based on the invasive evaluation of performance (Zuboff, 2019). For this reason, many oppose the public use of facial recognition technologies and worry about the future misuse of brain-machine engineering. At the extreme, state actors could exploit digitalization to manipulate the evaluation of performance and predetermine outcomes (Osoba & Welser, 2017). All are examples of the dilemmas illuminated in this chapter: poorly supervised collaboration between human and artificial agents, in which myopic priors and digital capabilities combine to produce ambiactive, dysfunctional evaluations of performance.

All that said, through the evaluation of performance, human beings sense their progress, value, and worth. If things go well, they are engaged, develop self-efficacy, feel a sense of achievement, and plan their next steps. At the individual level, these processes support the development of purposive goal setting, a sense of autonomous identity, adaptive learning, as well as the coherence of personality. Similarly, at the collective level, the evaluation of performance supports collective self-efficacy, identity, and coherence. All these functions could be enhanced or endangered, by augmented agents’ evaluation of performance. Genuine benefits are possible, if supervision maximizes metamodel fit, balancing the rates and complexity of human and artificial processing. Poor supervision, on the other hand, will lead to ambiactive, dysfunctional evaluations. Learning would suffer too, as the following chapter explains.