1 Introduction

Artificial intelligence (AI) has been ubiquitously adopted by Fortune 500 companies in their quest to leverage big data insights to optimize various aspects of their businesses [53, 76, 127]. In parallel, competition has increased organizational pressure to create competitive advantage from AI initiatives in terms of speed, efficiency, effectiveness, expense optimization, profitability, and ROI [13, 17, 44]. Scholars have shown that AI can be effective for corporations in myriad ways, for example, in predicting mortgage loan payments [109], fighting fraud [16], preventing adversarial security breaches [108], optimizing employee hiring [59], and automating virtual agent customer service [2].

We define responsible AI (RAI) as the ability to implement AI and machine learning (ML) [40, 79] models that transparently explain data inputs and predicted outputs while maintaining fairness (i.e., mitigating bias and harm). RAI focuses on ensuring the ethical, transparent, and accountable use of AI technologies in a manner consistent with user expectations, organizational values, and societal laws and norms [41]. RAI is also often framed as a set of principles for organizations, such as fairness, explainability, privacy, and security, which are present in the key literature [43, 54, 62, 95, 115]. Fairness, defined in this context as equity for the individual stakeholders engaged in the given interaction, is the foundation of RAI in that it aims to remove bias from the AI decision process, following the definition of bias from the Equal Credit Opportunity Act (ECOA) (i.e., prohibiting discrimination in any aspect of a credit transaction) [57]. So defined, fairness and bias are negatively related: as bias rises, fairness declines [72]. Explainability and transparency provide a platform for evaluating fairness because if the AI process is explainable, organizations are more apt to address bias removal [24, 74, 104]. Other key principles concern data privacy and data security, encompassed by data management and data quality, which are paramount to the integrity of the AI process [80, 97]. There are also RAI principles related to management, accountability, and governance; however, an overarching and cohesive principle with which to measure the significance of these items is missing. We introduce a new category named ‘organizational commitment’ encompassing accountability, governance, financial investment, leadership, diversity, humanity, culture, employee engagement, and training.

We neither discount accountability nor consider governance unimportant as an RAI principle. While accountability and governance are necessary RAI principles and present in nearly all the related literature, we argue that they are not sufficient to ensure that the leadership commitment, culture, and financial investment needed to properly deploy RAI are present [89]. The rationale for folding accountability and governance into the organizational commitment principle is to enable the instrument to measure a succinct list of key principles while ensuring a leadership focus on RAI. Organizational commitment encompasses and supports accountability by incorporating a culture of RAI and awareness training programs in mature RAI deployments [28]. The ‘organizational commitment’ principle may be the most critical component to focus on in developing RAI capabilities [21, 106].

RAI is critical for managing the complex web of compliance with ESG (environmental, social, and governance) obligations and business regulations [22, 78, 113]. While endeavoring to achieve competitive advantage and associated ROI [103], organizations must possess mature RAI programs that enable them to balance the tension between optimizing AI for accuracy and supporting fairness to serve an equitable, social purpose [12, 111].

With AI still in an initial adoption phase, RAI is only partially implemented in various organizational processes, affording limited opportunity to explain, interpret, and understand the nature of RAI [64, 65]. Given the nascent state of AI capability development, there are scant public references on investment costs for implementing AI governance programs, which can range from simple auditing [93] to fully mature RAI programs [3, 15, 66, 128]. However, the increasing rate of AI adoption has bolstered governance pressure to ensure that AI/ML has a core set of principles incorporated into organizational values [21, 38].

Most of the extant literature focuses on the impact of more narrow forms of AI (e.g., credit underwriting, automated customer service virtual agents, and employee hiring processes) or on productivity and profit [5, 9, 23, 116]. Scholars have noted various methods to assess RAI maturity [8, 11, 125]. For the purposes of our research, RAI maturity is represented by possessing the capabilities to address fairness and transparency in various organizational processes. Although a few works review inherent biases [96], describe RAI frameworks [125], or discuss RAII (Responsible AI Institute) certifications, these works do not deal with RAI measurement. The absence of an instrument to measure firm-level RAI maturity is a critical gap in the pertinent academic literature. Such an instrument will accelerate empirical research on RAI and aid firms in uniformly implementing RAI. We address this gap by developing an instrument to assess RAI maturity and argue that firms exhibiting robust RAI maturity should score highly on our RAI instrument.

Through the development of a survey instrument to measure RAI maturity, we reviewed the RAI programs and capabilities that banks possess to manage the powerful risks and benefits of AI in credit lending. We contribute to the literature in four ways. First, we introduce a novel instrument with which to measure RAI maturity by building on the extant RAI principles literature. Second, within the instrument, in addition to incorporating commonly accepted principles from the literature, we introduce a new RAI principle named ‘organizational commitment’. Third, we created a supplementary proxy instrument leveraging archival data to support the construct and convergent validity focus of our research. Lastly, we analyze RAI and ESG and conceptually separate these related, but different, measures in support of providing evidence for discriminant validity.

2 A prefatory note on the banking industry context

When employing the power of AI, corporations risk introducing biases into automated decision-making, which attracts the scrutiny of regulatory agencies [27, 112]. Regulators enforce the Fair Credit Reporting Act (FCRA) and the Equal Credit Opportunity Act (ECOA) for fair lending. Moreover, the Supervisory Guidance on Model Risk Management (SR 11-7) defines the Federal Reserve's (Fed's) governance of the technology models used in fair lending. This regulatory oversight is similar in nature to the EU's (European Union's) 2018 GDPR (General Data Protection Regulation). Bias in data and algorithms can pervade technical capabilities in various ways, including errors, oversights, or unintended consequences, with the concern that AI deployed at large scale can adversely impact certain groups and individuals [82, 96]. There are different types of data to consider for fairness in credit decisioning. For example, there are credit bureau and application data about the potential borrower, as well as alternate data that generally possesses additional attributes that may be leveraged by the creditor [67, 84]. While there are risks of potential bias in both types of data, the larger risk lies in alternate data [61]. This rapid advancement and adoption of AI is taxing the capacity of banks to leverage the emerging technology while ensuring compliance with regulatory measures [47, 73].

Banks earn a significant amount of their profit through credit lending: underwriting mortgages, providing auto loans, and issuing credit cards [1]. Each credit lending underwriting decision effectively assesses the risk that the individual borrowing the money will not repay the loan [92, 123]. Credit risk can be defined as the risk of potential loss to the bank if a borrower fails to meet its obligations (interest and principal payments) and is the single biggest risk for a bank [73].

Decision makers involved in the credit lending process rely on AI analysis tools to balance profitability with fairness in credit lending decisions [13]. AI technology is more computationally efficient than traditional logistic regression-based credit systems and enables banks to perform advanced analytics on each potential borrower at near-instantaneous speed. One of the key data elements used in the analysis is the credit score managed by FICO (Fair Isaac Co.) [71], which has raised alarms about fair lending bias risk among borrowers as well as activist groups [30].
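To make the underwriting mechanics concrete, the following minimal sketch shows how a logistic (sigmoid) credit score might map applicant attributes to an estimated default probability. The feature names, weights, intercept, and approval cutoff are illustrative assumptions, not values from any bank's model; real systems are trained on historical repayment data.

```python
import math

def logistic_default_probability(features, weights, intercept):
    """Estimate a probability of default from applicant features.

    `features` and `weights` are dicts keyed by illustrative
    attribute names; a real model's coefficients come from training.
    """
    z = intercept + sum(weights[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) link

# Hypothetical weights: a higher credit score lowers estimated risk,
# a higher debt-to-income ratio raises it.
weights = {"credit_score": -0.01, "debt_to_income": 2.0}
applicant = {"credit_score": 720, "debt_to_income": 0.35}
p_default = logistic_default_probability(applicant, weights, intercept=4.0)
approve = p_default < 0.2  # illustrative approval cutoff
```

The near-instantaneous speed noted above follows from the arithmetic: scoring one applicant is a handful of multiplications and one exponential, so millions of applicants can be scored per second.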

3 Responsible AI (RAI)

RAI is about being responsible for the power of AI [41]. Coeckelbergh [33] conceptualizes RAI as AI Ethics. RAI is acutely relevant when human decision-making is delegated to AI [127]. RAI focuses on ensuring the ethical, transparent, and accountable use of AI in a manner consistent with fairness to stakeholders, as well as upholding organizational values and societal expectations [21, 81].

One of the key aspects of governing AI, or determining whether AI is “responsible”, is understanding the transparency and interpretability of the algorithm and model, referred to as explainable AI (XAI) [74, 104]. The design must engender trust and provide explainable transparency for the results from the data, model, and algorithm [120]. By providing clarity into the governance of AI components, RAI allows organizations to innovate responsibly to realize the transformative potential of AI [105].

Another key element of RAI is care in the data lifecycle (data selection, data collection, and data management) such that bias does not creep into the overall system [98]. RAI is most critically needed where there is potential for bias or errors in the data, models, algorithms, or training data, and where proper governance controls must be ensured [31]. AI results should be examined for bias and provide evidence of fairness [113]. This includes cases where bias exists but firms can offer counterfactual explanatory evidence to support the AI decision [91].

Lastly, monitoring the solution deployment is important to ensure that the results generated match the intended design and adhere to the established governance [31, 63]. Deploying AI in the organization involves a few key elements, such as the abilities to maintain control, demonstrate fairness, ensure responsibility, and practice accountability for the capabilities [14, 41]. As part of ongoing maintenance, an effective RAI program needs components such as performance drift monitoring [77], operational bias review [113], and model training [3].
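Performance drift monitoring of the kind referenced above is commonly implemented in credit modeling with the population stability index (PSI), which compares the score distribution at deployment with the current one. The sketch below is a minimal illustration under assumed score-bin proportions, not the authors' monitoring method.

```python
import math

def population_stability_index(expected, actual, eps=1e-4):
    """PSI between two score distributions given as bin proportions.

    Common rule of thumb in credit model monitoring: PSI < 0.1 is
    stable, 0.1-0.25 warrants watching, > 0.25 warrants investigation.
    """
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        psi += (a - e) * math.log(a / e)
    return psi

# Illustrative binned score proportions: at deployment vs. today.
baseline = [0.10, 0.20, 0.40, 0.20, 0.10]
current  = [0.12, 0.22, 0.36, 0.20, 0.10]
psi = population_stability_index(baseline, current)
drifted = psi > 0.25  # investigation threshold (rule of thumb)
```

In an operational program this check would run on a schedule against production scores, feeding the governance review when the threshold is breached.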

Our review of RAI is built upon key research of RAI principles [18, 29, 38, 89, 115]. In terms of works that aggregate RAI principles, Jobin et al. [62] list 84 sources and summarize RAI principles into 11 categories, comprising the most comprehensive and authoritative summary available. Another extensive summary was published in matrix form [54], listing 20 RAI principles. A work containing a similar matrix, summarizing both principles (listing 12 RAI principles) and fairness toolkits, was published by the IFC EM Compass [95]. Lastly, another work published 8 RAI principle categories and provided detailed rationale for each, a technique that we leverage in our instrument explanation [43].

4 A measurement instrument for RAI

The survey instrument for measuring the maturity of RAI capabilities is presented in Table 1, which contains the RAI maturity instrument in panel A and a brief measure of ESG, for the purposes of subsequent validity testing, in panel B. The instrument we advance is structurally similar to a bias governance questionnaire instrument [32] and follows a known instrument development methodology [117]. The detailed rationale and key references are listed in Table 2.

Table 1 RAI instrument structure
Table 2 RAI instrument attributes

We performed an analysis of the most common RAI principles, which served as the foundation for the initial categorization of the instrument. This resulted in five categories (i.e., organizational commitment, transparency, fairness, data management, and security) for the RAI instrument. Three of the categories (transparency, fairness, security) align directly with Jobin et al. [62], Hagendorff [54], Myers and Nejkov [95], and Fjeld et al. [43]. The other two categories (organizational commitment and data management) partially align with existing principles. We incorporate the ‘governance’ and ‘accountability’ principles into a new principle called ‘organizational commitment’, as we argue that companies with strong executive leadership support of RAI will have more mature RAI programs [106]. Components such as leadership focus, financial investment, accountability, and culture are inherent in the organizational commitment category. In addition, financial investment, employee training, diversity priority, humanity focus, and financial ROI are central to governance and accountability within organizational commitment. RAI must include diverse participation to ensure that AI systems will meet their societal and ethical principles [41]. Measuring these attributes in the ‘organizational commitment’ category is a novel contribution to the RAI principles standard and differentiates our instrument from other RAI assessment frameworks [4]. The following sections describe the instrument categories and provide rationale for the components, supported by references to the relevant literature.

4.1 Organizational commitment

There are evolving models of how organizations will manage the collaboration of AI and human capabilities [36]. Understanding how to leverage RAI in an organization requires a broader integration of the social environment within which the AI operates [31]. Banks that possess high organizational commitment to RAI will have a Center of Excellence (COE) team focused on responsible AI, allocate and measure significant financial investments, conduct formal training programs, incorporate a decision framework for AI, maintain meaningful engagement from top executives, and focus on a culture of RAI [94]. Elements of organizational commitment were present in other summaries of RAI principles [43, 54, 62, 95], such as ‘accountability’, ‘governance’, ‘culture’, ‘diversity’, and ‘humanity’, which we incorporated into this new RAI principle and category to measure the focus of the organization [21, 106].

4.2 Transparency

Transparency (including interpretability and explainability) remains one of the most important areas within RAI governance [23, 31]. Firms are challenged to manage AI models that are explainable, interpretable, and understandable [18]. Banks that possess strong explainability and transparency will have a formal explainability governance process, a regulatory sandbox, model audit controls, and model drift monitoring; potentially use knowledge graphs and model cards; and explore advanced ML techniques such as SHAP and LIME for explainability. Transparency was present in each of the other referenced principles summaries [43, 54, 62, 95].

4.3 Fairness

Bias is inherent in human cognition and an unavoidable characteristic of data collected from human processes [41]. Bias manifests itself in scenarios where the result or action of a decision is perceived as unfair [7]. Though not equally relevant to credit underwriting, criminal records, bill payment history, education history, and residential address are examples of data that may contain or lead to bias [33]. As a result of this recognized potential for bias and discrimination [99], the regulatory agencies are focused on the Fair Credit Reporting Act (FCRA) and the Equal Credit Opportunity Act (ECOA). SR (Supervision and Regulation) 11-7 (Supervisory Guidance on Model Risk Management) from the Fed and the Office of the Comptroller of the Currency (OCC) explicitly defines the required risk management around credit lending. Banks concerned with fairness in credit lending will possess capabilities that focus on robust policy review, mitigating proxy discrimination, assessing training data management, understanding to what extent humans [58] are involved in the decision-making process, legal and regulatory considerations, and action plans for when the models and algorithms run amok. Fairness was also present in each of the other referenced summaries [43, 54, 62, 95].
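As one hedged illustration of how a fairness review might screen lending outcomes, the sketch below computes an adverse impact ratio against the EEOC “four-fifths” rule of thumb. The group labels and approval outcomes are hypothetical, and such a screen is only a first-pass indicator, not a legal determination under ECOA or FCRA.

```python
def approval_rate(decisions):
    """Fraction approved, where decisions are 1 (approved) or 0."""
    return sum(decisions) / len(decisions)

def adverse_impact_ratio(protected, reference):
    """Ratio of approval rates between groups.

    The 'four-fifths rule' flags ratios below 0.8 as potential
    disparate impact -- an illustrative screen, not a legal finding.
    """
    return approval_rate(protected) / approval_rate(reference)

# Hypothetical approval outcomes (1 = approved) for two groups.
group_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # reference group: 70% approved
group_b = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]  # protected group: 40% approved
ratio = adverse_impact_ratio(group_b, group_a)
flagged = ratio < 0.8  # four-fifths threshold
```

A flagged ratio would typically trigger the policy review and proxy-discrimination analysis described above rather than an automatic conclusion of bias.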

4.4 Data management

AI systems use data generated through everyday life, mirroring varied attributes that make it susceptible to containing bias [41]. Data privacy is one of the most important aspects of responsible data management [119]. Managing privacy while performing ML techniques that do not expose personally identifiable information (PII) is the aim of differential privacy capabilities [42]. Banks concerned with good data management ethics and governance will focus on data privacy as well as differential privacy; leverage data pipeline tools and DataOps, exploratory data analysis (EDA) for pre-modeling, knowledge graphs for visual representation of data, and big data lakes; generate synthetic data (though nascent and sometimes debated) [26]; enable the right to be forgotten; and have Chief Data Officer (CDO) involvement [80]. In terms of other references to this principle and category, there was partial alignment with Jobin et al. [62] and Fjeld et al. [43] in terms of data privacy, partial alignment with Myers and Nejkov [95], who focused on data validation, and no clear alignment with Hagendorff [54].
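The differential privacy capability mentioned above can be sketched minimally with the Laplace mechanism applied to a count query. The toy records, the predicate, and the epsilon value below are illustrative assumptions; production systems use vetted libraries and careful privacy accounting rather than a hand-rolled sampler.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sample from the Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon, rng):
    """Epsilon-DP count: a count query has sensitivity 1, so Laplace
    noise with scale 1/epsilon yields epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(7)  # fixed seed only to make the sketch reproducible
incomes = [42_000, 87_000, 63_000, 120_000, 55_000]  # toy records
noisy = dp_count(incomes, lambda x: x > 60_000, epsilon=1.0, rng=rng)
# The true count is 3; `noisy` differs by Laplace noise of scale 1.
```

The point of the mechanism is that the released answer does not reveal whether any single individual's record is in the data, which is what lets analysts query PII-bearing data without exposing it.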

4.5 Security

Elements of security that prevent intrusion and guard vulnerabilities are paramount to ensuring fairness, safety, and privacy in RAI [97]. In making the capabilities of AI explainable and transparent, maintaining user trust and individual privacy is critical [119]. Malicious actors may insert data into training or production environments, resulting in a data poisoning attack [60]. In terms of security elements, banks focus on capabilities that mitigate intrusions, ensure data encryption, establish processes for handling models should they become infected or fall into unintended possession, maintain the ability to control the algorithms, and apply dedicated security to production runtime environments. Security was present in the other referenced summaries [43, 54, 95]; however, the concept was listed in terms of privacy in Jobin et al. [62].

5 Validity and reliability assessment

To validate the RAI instrument, the RAI trait and the ESG trait must each be measured with two different methods drawing on two different sets of data. Since there were no other extant RAI scores, we derived a secondary (proxy) instrument by coding publicly available archival data (Table 3). In addition, for the purpose of testing discriminant validity in multiple ways, we added an ESG trait survey panel to our instrument interviews to collect the ESG (instrument) score; additional ESG scores were obtained from Sustainalytics. Panel A collected data from the banks for the key categories and attributes of the RAI (instrument) score, and Panel B collected data for the ESG (instrument) score. Evaluating the RAI and ESG traits with the same method satisfied the multi-trait mono-method requirement.

Table 3 Proxy RAI instrument

5.1 Data description

The data sample for the instrument was collected from 48 of the 56 top US banks listed in “ADVRatings—Top 50 Banks in America”, representing more than 85% of the population of large banks and a significant portion of credit lending (credit cards, mortgages, and auto loans). The top banks are typically public companies, are highly regulated by the Fed and OCC, and have ample resources to deploy mature RAI. Banks are motivated on this topic as they face social pressure for fairness, making them a good proxy for the current state of the industry's RAI capabilities. The data about the banks (collected in early 2022) comprised two RAI data elements (the RAI (instrument) score and RAI (proxy) score) as well as two ESG data elements (the ESG (Sustainalytics) score and ESG (instrument) score).

5.2 Validity

Measurement instruments can be evaluated for different kinds of validity. While face validity (which assesses whether the instrument appears to measure what it intends) and content validity (which assesses whether the instrument adequately covers the construct) are supported by subjective evidence and argument, construct validity (which assesses the soundness of the concept design) must be tested and confirmed empirically. We conducted validation interviews with executives from over 40 banks related to the model risk management (MRM) function to obtain rich insights to enhance the instrument. Interview questions, categories, and/or detailed measurement attributes that were deemed irrelevant or were missing critical components were addressed in the final version of the instrument. Our instrument attained face validity and content validity through this process.

To test construct validity, we leveraged the multi-trait multi-method (MTMM) matrix approach introduced by Campbell and Fiske [25]. In this approach, construct validity (i.e., both convergent and discriminant validity) is supported if the following two conditions are satisfied:

(i) The correlation for a given construct (i.e., mono-trait) scored through two different instruments (i.e., multi-method) exceeds both (a) the correlation between different constructs (i.e., multi-trait) assessed through the same instrument (i.e., mono-method) and (b) the correlation between different constructs (i.e., multi-trait) computed through different instruments (i.e., multi-method).

(ii) The correlation between different constructs (i.e., multi-trait) computed through the same instrument (i.e., mono-method) exceeds the correlation between different constructs (i.e., multi-trait) computed through different instruments (i.e., multi-method).

To test for convergent and discriminant validity, we correlated the two RAI score traits (RAI (instrument) and RAI (proxy)) with the two ESG score traits (ESG (instrument) and ESG (Sustainalytics)). To satisfy the requirement for convergent validity, the RAI instrument must be significantly correlated with a conceptually similar construct (i.e., the RAI (proxy) score for the purposes of this study).

For discriminant validity, the variables must not be as highly correlated with a seemingly related but conceptually different construct (e.g., an ESG score of any format). The conceptually different construct can be measured with the same method (i.e., the ESG (instrument) score) or a different method (i.e., the ESG (Sustainalytics) score). In the first case, the test is for the same method but different traits, accomplished by correlating the RAI (instrument) score with the ESG (instrument) score. In the second case, the test is for different traits and different methods, accomplished by correlating the RAI (instrument) score with the ESG (Sustainalytics) score or the RAI (proxy) score with the ESG (instrument) score. In addition, to satisfy the second condition, the test confirms that different traits measured with the same method correlate more highly than different traits measured with different methods.

In the MTMM matrix in Table 4, the study displays correlations of RAI (instrument) scores and RAI (proxy) scores (mono-trait multi-method), RAI (instrument) scores and ESG (Sustainalytics) scores (multi-trait multi-method), RAI (instrument) scores and ESG (instrument) scores (multi-trait mono-method), and ESG (instrument) scores and ESG (Sustainalytics) scores (mono-trait multi-method).

Table 4 MTMM matrix

We find that the mono-trait multi-method correlation between the RAI (instrument) score and the RAI (proxy) score is high (r = 0.882), demonstrating strong evidence of convergent validity. We then test the first condition and compare this mono-trait multi-method correlation (r = 0.882) with the multi-trait mono-method correlation between the RAI (instrument) score and the ESG (instrument) score (r = 0.553), satisfying case (i)(a) of discriminant validity. Next, we compare the mono-trait multi-method correlation (r = 0.882) with the multi-trait multi-method correlations between the RAI (instrument) score and the ESG (Sustainalytics) score (r = 0.135) and between the RAI (proxy) score and the ESG (instrument) score (r = 0.398), satisfying case (i)(b) of discriminant validity. Lastly, we test the second condition (ii) and compare the multi-trait mono-method correlation (RAI (instrument) score and ESG (instrument) score; r = 0.553) with the multi-trait multi-method correlations (r = 0.135 and r = 0.398), comprehensively satisfying the criteria for discriminant validity. The MTMM analysis demonstrates clear evidence of construct validity (concept design accuracy), highlighting both the convergent validity (how closely a test relates to other tests of the same construct) and the discriminant validity (the extent to which a test does not relate to tests of a different construct) of the RAI instrument.
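The MTMM comparisons in this section reduce to simple inequality checks. The sketch below encodes conditions (i) and (ii) using the correlations reported in the text; the variable names are ours for illustration, not taken from the study's analysis code.

```python
# Correlations reported in the study (Table 4).
r_mono_trait_multi_method = 0.882        # RAI(instrument) vs RAI(proxy)
r_multi_trait_mono_method = 0.553        # RAI(instrument) vs ESG(instrument)
r_multi_trait_multi_method = [0.135,     # RAI(instrument) vs ESG(Sustainalytics)
                              0.398]     # RAI(proxy) vs ESG(instrument)

# Condition (i): the convergent (mono-trait multi-method) correlation
# must exceed (a) the same-method different-trait correlation and
# (b) every different-method different-trait correlation.
condition_i = (r_mono_trait_multi_method > r_multi_trait_mono_method
               and all(r_mono_trait_multi_method > r
                       for r in r_multi_trait_multi_method))

# Condition (ii): different traits measured with the same method must
# correlate more highly than different traits with different methods.
condition_ii = all(r_multi_trait_mono_method > r
                   for r in r_multi_trait_multi_method)

construct_validity_supported = condition_i and condition_ii
```

Both conditions hold for the reported values, which is exactly the chain of comparisons walked through in the paragraph above.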

5.3 Reliability

With the RAI instrument receiving support for construct (convergent and discriminant) validity, we utilized two different statistical techniques to ensure that the RAI instrument contained the most relevant measurement elements and satisfied internal consistency reliability.

First, we computed Cronbach’s alpha (which measures the internal consistency of items in a survey scale) on the RAI instrument with a goal value of α > 0.7. This test was conducted on each of the five instrument categories to assess the degree of cohesion of the attributes within each category. For the RAI instrument data, Cronbach’s alpha is 0.898 for organizational commitment (seven items), 0.951 for explainability (seven items), 0.931 for fairness (six items), 0.947 for data management (nine items), and 0.892 for security (five items), providing strong support for internal consistency reliability.
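For readers implementing a similar reliability check, a minimal Cronbach's alpha computation is sketched below on toy response data. The respondents and items are invented for illustration; only the 0.7 threshold follows the study.

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows = respondents, columns = items.

    alpha = (k / (k - 1)) * (1 - sum(item variances) / total variance),
    using population variance consistently throughout.
    """
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(item_scores[0])                 # number of items
    columns = list(zip(*item_scores))       # per-item score columns
    item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Toy data: 4 respondents x 3 items on a 1-5 scale (illustrative only).
responses = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 2, 1],
]
alpha = cronbach_alpha(responses)
meets_threshold = alpha > 0.7  # goal value used in the study
```

In practice this would be run once per instrument category, exactly as the study reports per-category alphas.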

Second, we employed confirmatory factor analysis (CFA) (which assesses how well the measured variables represent the constructs) with a goal factor loading threshold of 0.5 [75]. We computed the CFA for the entire RAI instrument data as well. The factor loadings for the 31 items range from 0.638 to 0.873, with all items exceeding the 0.5 threshold. Overall, the Cronbach’s alpha and CFA statistics offer convincing evidence of the internal consistency reliability of our RAI measurement instrument.

For the RAI proxy measure, we engaged two raters and instructed each rater separately to interpret the data and record 1, 0.5, or 0 depending on the evidence found for the attribute. Each rater coded 384 items (i.e., 8 items per bank for 48 banks), reviewing and judging, for each attribute, whether the evidence met the criteria to record a 1, 0.5, or 0. The coding value of 1 was recorded when the full criteria of RAI were met; 0.5 was recorded when AI was present but RAI was not; 0 was recorded when neither RAI nor AI was present. The initial inter-rater agreement was 97.7%, and even after accounting for chance agreement between the two raters [34], the Cohen’s kappa coefficient (which measures inter-rater reliability for categorical items) is 0.965 (p < 0.001). A Cohen’s kappa coefficient above 0.60 indicates an acceptable level of inter-rater reliability [69]. The very high kappa coefficient achieved in this study provides strong evidence for the inter-rater reliability of the RAI proxy measure.
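A minimal Cohen's kappa computation consistent with the 1/0.5/0 coding scheme described above can be sketched as follows; the two raters' codes are invented toy data, not the study's 384-item coding.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes.

    kappa = (observed agreement - expected chance agreement)
            / (1 - expected chance agreement)
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy codings in the study's 1 / 0.5 / 0 scheme (illustrative only).
rater_a = [1, 1, 0.5, 0, 0,   1, 0.5, 0.5, 1, 0]
rater_b = [1, 1, 0.5, 0, 0.5, 1, 0.5, 0.5, 1, 0]
kappa = cohens_kappa(rater_a, rater_b)
acceptable = kappa > 0.60  # threshold cited in the study [69]
```

Here 9 of 10 codes agree (90%), but kappa lands lower (about 0.85) because it discounts the agreement the raters would reach by chance, which is the correction the study applies to its 97.7% raw agreement.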

6 Discussion and conclusions

AI is a powerful and rapidly evolving technology that many corporations are adopting, creating a race between reaping the benefits of the capability and addressing the RAI governance for fair AI deployment [45]. RAI has also become a critical topic over the past decade in conjunction with a focus on responsible business [68] and on DE&I (diversity, equity, and inclusion) as well as fairness, driven by banks’ ESG (environmental, social, and governance) agendas [85, 86]. Due to the profitability of credit lending (mortgages, auto loans, and credit cards) for banks, there is a constant push for innovation and efficiency [1, 13]. This study addresses a gap in the industry by introducing a new measurement instrument with which banks can now assess the maturity of their RAI capabilities.

6.1 Theoretical contributions

The key contribution of this paper is the introduction of a statistically valid and reliable RAI measurement instrument that organizations can deploy to assess the RAI maturity of their AI capabilities. The study incorporated the categories of explainability, fairness, data management, and security based on the ‘referenced ranking score’ from the analysis of RAI principles, a comprehensive review of the major RAI references from Jobin et al. [62], Hagendorff [54], Myers and Nejkov [95], and Fjeld et al. [43], as well as published RAI principles from 33 AI-focused organizations listed in the Appendix.

Notably, as a second contribution, we add a novel RAI principle and category named ‘organizational commitment’, incorporating elements of accountability, culture, strategy, investment, and decision-making, which we believe are paramount for organizations in their quest to leverage ethical algorithms and develop mature RAI capabilities [82, 90, 106]. This new principle contributes to the literature as an instrument category, bringing focus to leadership around ethics in technology and a commitment to RAI, amplifying the extant principles of accountability and governance [21].

A third contribution is the creation of an additional assessment tool, the RAI (proxy) instrument, which reviews key indicators of leadership focus on RAI. This contributes to the literature by enabling a different method of assessing RAI maturity. The RAI (proxy) instrument also lends itself to scaling widely: because it is based on publicly available archival data, automated content analysis tools could scour the internet and assess various companies for their RAI maturity.

Lastly, a fourth contribution is the distinction between RAI and ESG evidenced by the results of the MTMM analysis. RAI and ESG are seemingly related, and researchers may indeed argue that RAI is part of ESG [85, 86]. The two share the fact that both are increasingly subject to formal regulation, with legal implications for firms that deviate from the requirements [22, 28]; however, our research draws a clear distinction. The validity tests of the instrument, in which ESG was measured with the same method, show a relatively lower correlation between the RAI (instrument) score and the ESG (instrument) score (multi-trait mono-method) than between the RAI (instrument) score and the RAI (proxy) score, establishing a clear contrast between RAI maturity and ESG. In our research, we found that nearly all banks published an ESG report, yet only a handful published their RAI principles. This potential paradox highlighted in our findings may motivate researchers to explore the relatedness of RAI and ESG in more detail.

6.2 Applied implications

With the introduction of the RAI instrument, banks can assess the degree of maturity of their RAI. The ability to measure RAI enables multiple implications, especially given recent innovation around generative AI that exemplifies the power of the technology. First, the new instrument will enable banks to highlight investments and capabilities in their RAI programs for customer acquisition. This implication matters because, for financial borrowers seeking an unbiased and fair credit lending process, a bank may use its RAI maturity score to craft positive messaging about fairness in lending in its marketing and advertising.

Second, we posit that RAI will increase in importance due to additional regulatory governance over AI usage and considerations for liability [22, 27, 124, 128]. There is broad speculation that formal regulation will be introduced to mandate transparency in the form of explainability and model auditing processes (i.e., algorithm audits) [66]. We do not assert that banks have ignored the ethical risks to date; in fact, banks have attempted to create bias-free application processes and creditworthiness evaluation data [72]. However, the introduction of alternate data increases the potential for bias because of the additional data attributes considered [61]. This raises the question of whether RAI merely subjects banks to further regulatory scrutiny. While traditional scoring has been efficient for the credit process, we argue that RAI is indeed advocating additional transparency through explainability, thus illuminating the discussion [27, 78]. In addition, with momentum building for the 2022 Algorithmic Accountability Act sponsored by US Senators Wyden and Booker, and other regulatory measures such as “truth, fairness and equity in AI” [22], banks would be well advised to proactively build these explanatory capabilities ahead of regulatory requests. The RAI instrument can serve as a communication tool for banks and regulators to align on maturity assessments and action plans to enhance fairness in credit lending.

Third, several bank stakeholders could benefit from leveraging the RAI instrument. Similar to ESG and CSR [50], RAI could influence how both institutions and individuals invest [85, 86]. RAI could also shape how investment research analysts write about stocks, because environmental and social impacts are expected to influence stakeholders. In terms of fairness and SRI (socially responsible investing) principles, there appear to be synergies in how social issues can impact investments [46]. Following the ESG precedent, investors could research a bank’s RAI score to determine its investment worthiness, and the analysts who cover its stock could leverage the RAI score to improve accountability. From a strategic growth perspective, focusing on fairness in credit lending enables a bank to leverage its RAI assessment score to advertise alignment with its ESG and CSR statements.

Fourth, the implication for borrowers is significant: with the industrialization of an RAI instrument, banks can both comply with regulatory definitions of fairness in lending [114] and optimize profitability, resulting in more borrowers receiving loans. Making decisions responsibly is key to future leadership [102], and the ability to assess RAI capabilities will serve executives well.

Lastly, we employed a CMM (capability maturity model) analysis to generalize the maturity of the banking industry and reported a mean of 53.74% across the distribution of banks. The most frequent CMM level was “Operational” maturity, which Gartner and Panetta [48] describe as “AI in production, creating value by e.g., processing optimization or product/service innovations”. With AI becoming ever more pervasive, leadership decisions for responsible business will increasingly be data driven through AI [118]. We argue that banks will invest significantly in AI to improve business process efficiency and productivity, in turn driving a continued focus on RAI [49]. This lens on the data illustrates the state of RAI maturity in the banking industry in 2022, creating a call to action for banks to enhance their RAI maturity.

6.3 Limitations and future research

We develop a novel RAI instrument and leverage it to assess the maturity of RAI capabilities in banks. The survey was customized for the banking industry, since the validity and reliability tests were conducted using banking industry survey and archival data. Because we review RAI programs specifically in the context of banks’ credit lending, we focus on the ML of the credit underwriting algorithms, models, and data associated with credit lending decisions. This is a limitation, since in its current form the instrument is not generalizable to other industries; however, with some minor modifications it could be made more generic or tailored to other industries, as other instruments have originated in one domain and later been generalized in this way.

While we conducted a comprehensive review of the RAI principles included in the instrument including a careful review of Jobin et al. [62], Hagendorff [54], Myers and Nejkov [95] and Fjeld et al. [43], it is certainly possible that other principles could be incorporated that are deemed more relevant to a tool of this nature.

The RAI instrument survey collected data from MRM (model risk manager) bank executives as a self-assessment score rather than through researcher interrogation of the actual environment, which could be perceived as a limitation. A similar limitation exists in the RAI (proxy) instrument: the study coded public archival data into eight categories chosen by the corresponding researcher, and different categories could plausibly have been used to record the data.

Another potential area for critique of the instrument is the scoring calculation we designed. Researchers may argue that some principles are more important than others: transparency and explainability may be deemed the most important instrument categories and deserve a higher weight [20, 35], while others may contend that the ‘organizational commitment’ category should bear more weight [21, 106]. Our testing did not find evidence for weighting one category or attribute more heavily than another; therefore, we concluded that the RAI instrument should maintain equal weighting across all categories and attributes.
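Under equal weighting, the overall maturity score reduces to a simple mean of the category scores. A minimal sketch, assuming hypothetical category scores on a 0-100 scale (the category names below are shorthand, not the instrument's exact labels):

```python
# Hypothetical category scores (0-100) for one bank; the names are
# illustrative shorthand for the instrument's categories.
category_scores = {
    "fairness": 70.0,
    "explainability": 60.0,
    "privacy": 80.0,
    "security": 75.0,
    "organizational_commitment": 50.0,
}

def rai_score(scores: dict) -> float:
    """Equal-weighted RAI maturity score: the simple mean of all
    category scores, with no category privileged over another."""
    return sum(scores.values()) / len(scores)

overall = rai_score(category_scores)
print(overall)  # 67.0
```

A weighted variant would replace the mean with a weighted sum, but as noted above, our testing found no evidence to justify unequal weights.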

The future research opportunities for this RAI instrument are significant. First, the instrument could be generalized beyond banking. Second, the RAI instrument score could become a standard independent variable for future research to predict other aspects of companies, for example its correlation with corporate financial metrics, ESG-CSR scores [104], brand reputation indices [101], or TMT diversity [107]. Third, studies could examine how AI is shaping decision-making around capability investment, for example by comparing how banks invest in fraud detection AI capabilities with how they invest in RAI capabilities.

Finally, if responsibility becomes a significant criterion for investing in, partnering with, or buying from firms, RAI could become a key indicator, with a possible contagion effect toward a higher standard for responsible business. In fact, a body of research on building responsibility into the design of key decision-making processes and capabilities is already underway with next-generation firms [87, 118].