Introduction

Whether planning for retirement or borrowing for an investment property, the increasing complexity of financial decision-making underscores the need for personal financial advice. Unfortunately, the high cost of personal advice and a lack of confidence in engaging with financial advisors keep it out of reach for many, see Robb et al. (2012). This lack of confidence can be attributed to low levels of financial literacy and a general distrust of financial (advice) services. A panel study by Egan et al. (2019) substantiates this lack of trust. Tracking the universe of US financial advisors (Investment Adviser Representatives, or IARs) over a decade from 2005 to 2015, they find that 7% of IARs have a misconduct record. Half of them lost their job, but many of those who did were re-employed in the same sector within a year. Even more concerning, many of the re-employing firms (Registered Investment Advisers, or RIAs) ‘specialise’ in misconduct and target vulnerable clients. Consistent with those findings, Limbach et al. (2023) document a persistent decline (from 1978 to 2016) in trust in US finance professionals. Over that period, the US public lost trust in many service professions, but not to the same extent as in financial services.Footnote 1 Systemic conflicts of interest and other recurring ethical misconduct diminished trust in the financial advice profession. For example, the Financial Crisis Inquiry Commission (2011) found increasing instances of ethical misconduct both before and after the 2008 financial crisis, further diminishing trust in US financial services. More recently, a 2019 surveyFootnote 2 of active (i.e. current) and prospective advice clients finds increasing advice aversion following the 2018 government inquiry into misconduct allegations in Australia’s financial (advice) services. The active and prospective clients expressed a similar decline in trust across the financial advisory sector. However, active clients continued to trust their own advisor. Convincing those active clients to switch from traditional to robo-advice may be as difficult as convincing first-time prospective clients to seek robo-advice.

Technology-enabled financial advice is still relatively young. Financial spreadsheeting entered the advice sector lexicon in the late 1980s, supporting advisors with portfolio optimisation software–the assisted (also known as the analog) phase of financial advice. The history of digital financial robo-advice only starts in 1998, with Nobel laureate William Sharpe leading the development of software for retirement planning and fund picking, implementing his (and others’) financial theories of portfolio optimisation and simulation in practice–the augmented phase. Discontent with recurring misconduct in the financial advice sector, Sharpe and his co-founders established Financial EnginesFootnote 3–arguably the first true robo-advisor–with a fiduciary duty to its advice clients. The pace picked up with the launch of US robo-advisories Betterment and Wealthfront in 2008, joined by Charles Schwab’s intelligent portfolios in 2015 and Vanguard’s digital advisor in 2020–as robo-advice evolved from a portfolio optimisation model supporting a human advisor to a financial decision maker that no longer requires any human intervention–the autonomous phase, where all decisions, including implementation of the financial advice plan, are made by the AI robo-advisor.

With a steady supply of aspiring robo-advisors in the US and globally, the adoption of robo-advice has increased significantly. However, the uptake by those who would benefit most has been disappointing. Low-income, financially excluded, and often financially illiterate households now have access to affordable personalised advice but do not use it. According to Deloitte’s (2023) report, they rely instead on (un-registered and un-licensed) advice from social media peers–believing that advice from registered financial advisors requires them to make a substantial investment in a financial plan and portfolio. The attraction of low-cost robo-advice seems offset by structural factors impeding trust in, and understanding of, financial advice. By reducing cost and improving financial returns and wellbeing, robo-advice should have become mainstream, but it has not yet achieved this.

Despite this ongoing financial advice exclusion, the Deloitte report claims that robo-advice has opened the door to the democratisation of finance by making quality personalised financial advice accessible at minimal cost. Supported by low-cost digital infrastructure, robo-advice removes the need for human intermediation (financial advisors, planners and brokers), see Harshman et al. (2005). Indicative estimates of the cost of personalised robo-advice range from 25 to 35 basis points (bps), compared with an average of 100 bps for traditional human advice.
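To make the fee gap concrete, the following minimal sketch computes the annual advice fee at those indicative fee levels; the portfolio balance is hypothetical and chosen only for illustration.

```python
# Illustrative only: annual advice fee at the indicative fee levels quoted above.
def annual_fee(balance: float, fee_bps: float) -> float:
    """Annual advice fee on a portfolio balance, with the fee quoted in basis points."""
    return balance * fee_bps / 10_000

balance = 100_000  # hypothetical portfolio balance
for label, bps in [("robo-advice (low)", 25), ("robo-advice (high)", 35), ("traditional advice", 100)]:
    print(f"{label:>20}: {bps:>3} bps -> ${annual_fee(balance, bps):,.0f} per year")
```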

Robo-advice has a clear cost advantage over traditional advice, but is there also a difference in performance? D’Acunto et al. (2019) find that adopting robo-advice can have opposing effects on client returns, depending on the client’s level of diversification before adoption. Formerly under-diversified investors increase their portfolio diversification in terms of both the number of stocks they hold and the market-adjusted volatility of their portfolio. They display higher performance in terms of market-adjusted trade returns and market-adjusted portfolio returns. By contrast, investors who are already well diversified before adoption do not change their diversification. They trade more, but this does not translate into better performance. Robo-advice delivers better (or at least the same) diversification, is more likely to find an optimal portfolio which meets client suitability and risk aversion, and finds a better (or equally good) efficient portfolio.

Switching to, or joining, robo-advice will take more than a consideration of cost and return. Adopting robo-advice requires a high level of trust: trust in chatbots, in algorithms, in autonomous decision-making, in fair and un-biased treatment, in cyber-security, etc. All of these trust issues (or ethical concerns) need to be resolved for robo-advice to be trusted. This initial trust is established on the advice client’s first encounter with a robo-advisor–at which time the client has the least knowledge about the ethical conduct of the robo-advisor. In a cross-country survey of young retail investors, Nourallah (2023) finds that initial trust in robo-advisors is positively correlated with the young retail investor’s propensity to trust. But the strength of that relationship is moderated by financial, social and behavioural factors such as saving tendency and return performance expectancy. Where these factors are at low levels, initial trust is low.

Figure 1–a variation on Nourallah’s conceptual trust model–illustrates the pathway from a client’s general trust propensity to initial trust in, and adoption of, robo-advice. While cost and return are obvious explanatory variables, other behavioural factors like individualism and effort expectancy are also tested. One missing factor is financial literacy. Trusting the complexity of (robo-)advice requires a minimum level of financial literacy. A study by Lachance and Tang (2012) finds that initial trust in financial advisors first increases with financial literacy while confidence is building–then decreases once the increasingly confident client starts to recognise poor ethical performance by the financial advisory sector. Figure 1 also differs from Nourallah by recognising the urgent need for robo-advice regulatory licensing and monitoring, tailored to the specific ethical concerns of robo-advice. The improved transparency will increase client (initial) trust.

Fig. 1 A conceptual model for building trust in financial robo-advice

Once adopted, continuing trust in robo-advice is built on sustained ethical conduct over time. Unfortunately, ethical conduct and its translation into trust are difficult to observe, verify and quantify. We do know that ethical misconduct reduces trust and imposes costs on the client. In particular, the cost of due diligence and ongoing monitoring erodes the low-cost advantage of robo-advice. Leaving the (initial) trust verification to individual clients is unsatisfactory. It is neither efficient, as every client would have to invest in their own due diligence, nor reliable, as individual clients do not have access to all relevant information and lack the auditing skills to make an informed assessment of ethical conduct and trust. Regulatory intervention is therefore urgently needed, recognising the financial vulnerability of robo-advice clients and the cyber-risk(s) they are exposed to. In a study of the ethical concerns of AI auditing, Munoko et al. (2020) urge regulatory guidance and oversight for emerging (AI) technology in its early stages. That builds initial trust when it is most needed. But it also requires forecasting the ethical concerns that may arise but have not yet materialised. The next part of this paper will discuss what a regulatory and ethical framework could look like for robo-advice. In an early study (well before AI robo-advice), Khalil (1993) lists three reasons for ethical concerns about AI expert systems: their lack of human intelligence, their lack of emotions/values, and their intentional or accidental bias. Thirty years later, these concerns have not yet been fully resolved despite calls for urgent action by Scherer (2016) and Haenlein et al. (2022).

To address the apparent lack of initial trust in robo-advice, this paper proposes a regulatory framework based on licensing and monitoring of ethical conduct. Identifying and classifying the ethical concerns in robo-advice, we introduce four main dimensions (ethics gateways): competition, competence, bias, and safety-net. These four ethics gateways capture the ethical principles underpinning trust in robo-advice. For that reason, robo-advice regulation needs to expand the set of traditional licensing criteria (like educational attainment, a clean misconduct record, etc.) with the ethics licensing criteria identified here for each gateway. Successful robo-advisory applicants will then be issued a provisional AI Robo-Advice License (AIRAL). To maintain this license, the robo-advisor needs to be monitored annually on its ethical conduct. This involves tracking forty ethics performance metrics–ten per gateway–generating a score for each of the four ethics gateways. The gateway scorecards and an overall aggregate rating will be made public to active and prospective advice clients, allowing them to make an informed choice of robo-advisor. The paper concludes with a discussion of the robustness of the ethics scoring model, model selection issues, and the correlation of ethics scores with advice performance.

Ethical Gateways

In a perfectly competitive market for financial advice, complete information allows clients to distinguish high-trust from low-trust robo-advisors. Unfortunately, this market is less than perfectly competitive, exposing clients to uncertainty about the reputation and ethical conduct of providers. At present, it is not clear whether AI advisory services will make the financial advice market more competitive in the long run. We do know that the robo-advice newcomers need initial support to compete successfully with a concentrated market of traditional advice providers. That support could include (temporarily) waiving certain regulatory requirements. Unfortunately, through adverse selection, lowering the regulatory bar would increase client choice but also attract low-trust robo-advisors.

Telling the high-trust from the low-trust advisors is already challenging with human advisors, but it is even harder with (new entrants providing) robo-advice. Due to the impenetrable nature of their algorithms and the lack of a track record or reputation, it is almost impossible for informationally disadvantaged clients to verify the true–rather than advertised–ethics of any particular robo-advisor. Spence’s (1973) signalling model for overcoming information asymmetry in the market for labour can be applied to the market for robo-advice. Here, the robo-advisor could give a signal validating its ethical conduct to the client. The signal/credential could be an affiliation with a reputable financial institution, ‘borrowing’ its reputation for ethical conduct. Or it could be recognition of competency through educational attainment in AI ethics. The client then evaluates the credibility of the signal.

In the absence of credible signals of ethical conduct, we propose a regulatory intervention built around four ethical gateways. Crossing the gateways requires exceeding a minimum level of ethical conduct in each of them. Even then, and even among Registered Investment Advisers (RIAs) registered and licensed with the Securities and Exchange Commission (SEC), only time will reveal the RIA’s and IAR’s actual ethical conduct, possibly only after a probation phase. To avoid ethical slippage and moral hazard, we advocate an annual ethical scorecard publishing the observed ethical conduct of robo-advisors. Improving the transparency of ethical conduct allows advice clients to avoid or abandon low-trust robo-advisors. Poorly scoring robo-advisors may also be sanctioned by the regulator, by requiring them to show cause, suspending the license, or even revoking it.

An annual review–an online, virtual audit–of the AI ethical performance metrics should be a minimum condition for maintaining the AI Robo-Advice License (AIRAL). Compilation and verification of the ethical scorecard should be assigned to an independent and un-conflicted auditor appointed by, and reporting to, the regulator. While there would be a regulatory cost, the audit would harness the benefits of robo-advice while managing its ethical concerns. High-trust robo-advisors will benefit from improved client trust, increasing their market share. Low-trust robo-advisors will be incentivised to improve their ethical performance. The information contained in the four gateway scores and the aggregate rating will help robo-advisors focus on areas for improvement to bridge the gap with higher ranked robo-advisors.

Competition

The first gateway considers the impact on the competitive environment for financial advice. Financial deregulation has transformed the banking sector, leaving it concentrated among global conglomerate institutions. The market for financial advice is similarly concentrated but also includes many (very) small operators. A concentrated market is typically characterised by high cost and limited client choice–with little incentive for the market leaders to innovate. Robo-advice has the potential to challenge this status quo.

Jung et al. (2018) document the progressive development of robo-advice. Early versions of robo-advice complemented face-to-face advice as a spreadsheeting tool supporting the human advisor (the assisted phase). This first phase of digital technology allowed the human advisor to focus on client preferences, communication, interpretation, and explanation. But these contextual human roles have now been challenged by the introduction of AI in digital technology. This is AI-2.0 in Zheng et al. (2010), also known as the augmented phase. Using smart online surveys, the robo-advisor has the ability to learn about a client’s risk appetite and financial capacity. And by recording a client’s interview conducted by a chatbot,Footnote 4 facial and voice pattern recognition will soonFootnote 5 remove the need for a human advisor to read a client’s visual cues revealing risk attitude, aspirations, anxieties, comfort and happiness. Combining the client profile (soft data) with the client’s financial data (hard data) and financial market data (forecasts, volatilities, and scenarios), the robo-advisor designs, decides and implements the optimal personalised financial plan (the autonomous phase).

With each new phase of digital technology, the quality of personalised advice improves. But that comes at a cost. Competition is steadily eroded as successful robo-advisory startups are acquired by financial conglomerates before they go to market. These dependent robo-advisors–either acquired or launched by traditional financial institutions (e.g. Vanguard, Schwab, BlackRock)–outnumber the truly independent robo-advisors. That will become more evident with the development of autonomous AI, which requires a significant upfront investment affordable only for the large financial institutions. AI-3.0 will be out of reach for startups, and the financial institutions will pass the cost on to clients. Early empirical evidence by Abraham et al. (2019) suggests an already concentrated market structure in robo-advice, with a single provider (Vanguard) accounting for 25% of US assets under management in 2018–three times the share of the next provider (Charles Schwab). That level of market concentration increases the risk of anti-competitive behaviour, with ethical concerns including higher prices, less client choice, a lack of innovation, insufficient cyber-security provisions, conflicted interests, lockup clauses, cross-selling of client data, etc.

Competence

The second gateway considers advice clients’ increasing knowledge deficit with robo-advice. Financial decision-making is increasingly complicated and challenging, even for those considered financially literate. Complex new features have been inserted into traditional financial assets merely to differentiate them for monopolistic competition purposes, see Shen and Turner (2018) and Baker and Dellaert (2018a, p. 748) on using complexity to take advantage of clients. Increased complexity makes the client increasingly reliant on the advisory experts.

Of course, human financial advisors may also fail to fully understand the complexities of financial advice. Collateralised Debt Obligations, securitised assets, and complex compound and correlated derivatives are just a few examples where informed decision-making failed due to a lack of understanding of the fundamentals. Add to that irrational and occasionally erratic client behaviour, particularly when confronted with manifold uncertainties, and it seems obvious to leave decision-making to AI. But in handing over the reins of complexity, are the financial experts also losing their ability to break down complex problems into intuitive parts that can be explained to, and understood by, clients? The ‘black-box’ nature of AI will make that very difficult. At a minimum, it requires competence in financial advice, competence in high-level computer/data science, and competence in communication with clients. A qualitative robo-advisory user study by Zhu et al. (2023) finds that the client experience is often characterised by a lack of transparency and by incomprehensible information. Because of such incompetence, clients distrust the robo-advice. Aspiring robo-advisors will then find it difficult to signal their competence due to a lack of track record in an already crowded market.

Robo-advisors have a moral imperative to educate and advise clients about the ‘inner workings’ of the algorithmic black box and the embedded risks of financial assets, transactions and services. Explaining financial complexity requires advanced layers of professional competence: communicating both the basics of robo-advice and the specifics of the financial advice tailored to the client. Educational requirements for licensing–e.g. the compulsory Series 65 exam administered by FINRA (Financial Industry Regulatory Authority), a Self-Regulatory Organisation (SRO)–should reflect these new competencies. Carlander et al. (2018) find that competence is the main driver of trust in financial advice. Licensing of competence matters. Even more so for the programmers of robo-advice platforms, where trust in personal competency has been supplemented by trust in process competency.

It is worth noting that simply increasing the amount of information disclosed to clients would be counterproductive, as a client’s bounded rationality will lead to information overload, see Brunnermeier and Oehmke (2009). For that reason, rather than trying to explain what happens inside the AI algorithm, Koh et al. (2015) propose a client communication work-around based on a complexity-rating framework. Algorithmic complexity arises from the number of structural layers, the number of embedded derivative structures, the availability and use of known valuation models, the number of scenarios, and the transparency and ease of understanding. As complexity increases risk, risk-averse clients should be advised to steer clear of high-complexity AI robo-advice.
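A minimal sketch of how such a complexity rating could be operationalised is shown below. The driver names follow the factors listed above, but the 1–5 scale, the equal weighting, and the example scores are hypothetical.

```python
# Hypothetical complexity-rating sketch: each driver is scored 1 (low) to 5 (high)
# and combined into a single rating that can be disclosed to clients.
COMPLEXITY_DRIVERS = [
    "structural_layers",        # number of structural layers in the product/advice
    "embedded_derivatives",     # number of embedded derivative structures
    "valuation_model_opacity",  # availability and use of known valuation models (higher = less known)
    "scenario_count",           # number of scenarios the algorithm evaluates
    "opacity_to_client",        # transparency and ease of understanding (higher = less transparent)
]

def complexity_rating(scores: dict[str, int]) -> float:
    """Average the 1-5 driver scores into an overall complexity rating (equal weights assumed)."""
    return sum(scores[d] for d in COMPLEXITY_DRIVERS) / len(COMPLEXITY_DRIVERS)

example = {
    "structural_layers": 4,
    "embedded_derivatives": 5,
    "valuation_model_opacity": 3,
    "scenario_count": 2,
    "opacity_to_client": 4,
}
print(f"Complexity rating: {complexity_rating(example):.1f} / 5")  # 3.6: steer risk-averse clients away
```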

Buckley et al. (2021) note that the black-box algorithmic complexity of robo-advice cannot be an excuse for unethical outcomes. Nor should responsibility be shifted to external regulators. For the ethical conduct of robo-advice, the Board of Directors of the robo-advisory firm needs to take personal responsibility for due diligence and explainability, assured and supported by a robust internal governance structure. But note that procedures and codes of ethics alone cannot make up for inevitable poor outcomes, creating a trust discount. Martin and Waldman (2023) propose governance with an appeal process that is robust against disappointing outcomes (say, a decline in the value of a robo-advice financial plan). The option to appeal creates a legitimacy dividend (or trust dividend) for robo-advisory use of algorithmic decision-making under uncertainty–not unlike a warranty.

Bias

The third gateway considers bias in financial advice. To assure fairness and equity for all clients, the AI robo-advice algorithms need to be un-biased. That is, not prejudiced but strictly impartial. We know that traditional human financial advice can be subject to a variety of discriminatory, behavioural, cultural, and irrational human-induced biases–some of them severe and overt. A study by Pethig and Kroenung (2023) indicates that women are more likely to trust an algorithm because they are routinely stigmatised by human advisors regarding their mathematics, coding, and computer science ability.

Human advisors argue that they actively un-bias advice, but an external auditing study by Mullainathan et al. (2012) shows that human advisors amplify biases rather than remove them. Robo-advisors claim that robo-advice is entirely un-biased because no human intermediary is involved. But that is not quite true. First, in ‘training’ the algorithm, robo-advice programmers use data from an existing (often their own) client base. The composition of that data is therefore a reflection of past human biases. Worse than that, with stratified sampling, the algorithmic training can amplify the original biases. And second, there is the possibility that the programmers import their own biases. They may bring different biases, as programmers tend to have different character traits from traditional financial advisors.

To trust financial robo-advice to be fair and equitable, the AI algorithm needs to be freed from all its possible biases. A 2019 report by the New Zealand Productivity CommissionFootnote 6 discusses the various types of bias that may appear in robo-advice algorithms: deliberate bias, historic bias, accidental bias, unexpected bias, cultural bias, gender bias, human cognitive bias, etc.

Ideally, the AI algorithms should be made available to clients (or an independent client advocate) to verify the value and flaws of the robo-advice. Of course, robo-advisors prefer not to disclose their trade secrets in an external audit for bias. For competitive reasons and the protection of Intellectual Property (IP) rights, mandatory disclosure of the algorithmic code seems unlikely and possibly counterproductive. Compulsory algorithmic open access could stifle innovation. A possible solution is a self-regulatory approach whereby the robo-advisors assume responsibility for self-reporting an internal audit to the licensing authority, see Turner Lee et al. (2019). An internal audit would avoid excessive oversight (“tightly prescribed algorithms for robo-advice”) but require the robo-advisor to keep records of the advice ‘black box’ and thus leave an audit trail to reconstruct the advice given in case of dispute, see Baker and Dellaert (2018b, p. 27).

Even under this light-touch regulation, as a minimum condition for licensing, robo-advisors need to run bias detection scenarios, and “these would be standard (but changing and secret) individual scenarios”, see Baker and Dellaert (2018b, p. 27). The robo-advisor should then report any biases, and the steps taken to remove them, to the regulator. This should be part of the initial licensing audit but should also be monitored over time for slippage and newly emerging bias. To do this in a coordinated manner, a comprehensive framework is recommended to identify, monitor, and eradicate AI bias, as in Klein (2020).
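A minimal sketch of what such a scenario test could look like is given below: it feeds matched hypothetical client profiles–identical except for one protected attribute–through the advice function and flags any material difference in the recommendation. The advice function, the profile fields, and the tolerance threshold are all illustrative placeholders, not a description of any actual robo-advisor.

```python
# Hypothetical bias-detection scenario: matched client profiles that differ only in a
# protected attribute should receive (near-)identical recommendations.
from dataclasses import dataclass, replace

@dataclass
class ClientProfile:
    age: int
    income: float
    risk_score: int      # e.g. 1 (very cautious) to 10 (very aggressive)
    gender: str          # protected attribute: should not drive the recommendation

def recommended_equity_share(client: ClientProfile) -> float:
    """Placeholder for the robo-advisor's black-box recommendation."""
    base = min(0.9, 0.3 + 0.06 * client.risk_score)
    # An unintended dependence on a protected attribute, of the kind the audit should catch:
    return base - (0.05 if client.gender == "female" else 0.0)

def scenario_test(base_client: ClientProfile, attribute: str, values: list[str], tolerance: float = 0.01):
    """Compare recommendations across matched profiles; flag any gap above the tolerance."""
    outcomes = {v: recommended_equity_share(replace(base_client, **{attribute: v})) for v in values}
    gap = max(outcomes.values()) - min(outcomes.values())
    return outcomes, gap, gap > tolerance

outcomes, gap, flagged = scenario_test(
    ClientProfile(age=40, income=80_000, risk_score=6, gender="male"), "gender", ["male", "female"]
)
print(outcomes, f"gap={gap:.3f}", "FLAG for regulator report" if flagged else "ok")
```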

Safety-net

The fourth gateway considers the client’s safety-net when things go wrong, accidentally or intentionally. In the first instance, we would look for provisions made by the robo-advice provider as part of its internal governance process. Respondents in a survey study by Arthur and Owen (2019, p. 9) give three reasons why a start-up robo-advisory firm would be motivated to voluntarily contribute make-good provisions: to realise growth expectations, to maintain corporate standing (reputation), and out of pride (self-fulfilment). By choosing fiduciary status from its early beginnings, William Sharpe’s Financial Engines embodied these motivations to self-regulate. Signalling trust through self-regulation is particularly important for emerging technologies, where the difficulty of verifying trust would otherwise shield low-trust robo-advisors.

This first layer of the safety-net is therefore self-regulation.Footnote 7 Comprehensive self-regulation demonstrates integrity, instils client trust, and avoids moral hazard. But self-regulation is insufficient on its own, could be costly, is vulnerable to free-riding behaviour and adverse selection, and could be anti-competitive. A government-appointed regulatory authority is therefore needed to look after market integrity, market conduct, client interest, and enforcement of legal requirements. A regulatory authority is most visible to the public when enforcing compliance with the law and taking remedial action like prosecuting misconduct, revoking a license, imposing a penalty, or accepting a settlement. Misconduct is defined by law, which does not necessarily cover all forms of ethical misconduct. Targeting vulnerable clients (as found by Egan et al., 2019) is an example where the law falls short of ethical expectations.

But regulators also act pre-emptively to avoid more broadly defined misconduct:

  • By performing due diligence prior to granting a financial advice license.

At a minimum, robo-advisors need to comply with the same regulations as traditional human financial advisors, including the licensing rules, see Gurrea-Martinez and Wan (2021). In the US, robo-advisors must be RIAs. All the normalFootnote 8 due diligence checks apply (legal compliance, risk register, verification of provider identity, material information disclosure, governance framework etc.) in the approval process for licensing RIAs. Beyond traditional financial advice regulation, most financial services regulators (including the SECFootnote 9) give ‘guidance’ (that is, guidelines, recommendations, and opinions) to robo-advisory firms. Best practice would include the following regulatory interventions of relevance to robo-advice.

  • By rejecting or conditioning anti-competitive merger and takeover proposals.

Where there is a risk that competition fractures or fails, transparency disappears, and complexity reigns, regulators need to step in. Their role is to assure fair and equitable access to a level playing field, and to provide an opportunity for aggrieved clients to seek recourse and redress. As AI services are still in their infancy, incumbent financial institutions will seek out the most promising robo-advisors, buying them outright or under license as an affiliate entity. The ensuing market power can manifest itself in many ways. To avoid these behaviours while still approving mergers and acquisitions, the regulators can ask for enforceable undertakings and commitments.

  • By designing digital finance literacy training modules.

Financial regulators offer comprehensive financial literacy training, but this is mostly aimed at high school students, teaching them basic (and very important) financial concepts like risk, compound returns, and diversification. Unfortunately, those childhood foundations will have diminished with time, just when adults start making crucial financial decisions–like purchasing a home, investing in education, or planning for retirement. When a digital finance literacy course is most needed, the curriculum and delivery are left to the (conflicted) finance industry. Yet the ease of access to digital finance and AI services makes it imperative that clients are fully informed before engaging with AI. Rather than regulator-provided financial literacy training, the AI providers could be compelled to educate new clients using a regulator-approved curriculum to mitigate the conflict of interest.

  • By auditing cyber-risk in privacy, identity, and data security provisions.

Digitally sharing personal data (including comprehensive records of financial data) significantly reduces any information asymmetry about a new client’s creditworthiness. Sharing those data has also made it easier for clients to shop around for better deals, as incumbent financial service providers can no longer ‘monopolise’ a captive client’s credit track record. Unfortunately, it has also created opportunities for hackers to misappropriate private data at scale. While cyber-security is part of any corporate regulator’s due diligence process, AI robo-advice amplifies cyber-risk, with privacy concerns, data security, and identity theft as major ethical concerns. Take robo-advice’s ability to link multiple client databases to construct a multi-faceted client profile. Improved insight into the client profile will improve the suitability of personalised advice, but that needs to be traded off against the ethical concerns of data integrity, cyber-attacks, and privacy violations in accessing proprietary client data. Morey et al. (2015) note that intelligent technology gathers comprehensive data and ‘personalises’ the data to deliver services that appeal to clients. But those clients would be concerned that the same information could be used by third parties.

The systemic nature of malicious cyber-attacks requires sector-wide pro-active risk mitigation coordinated by the regulator. Regular stress-testing, system checks, and public awareness campaigns are some of the features of cyber-security regulation, but shifting all responsibility to a regulator creates moral hazard. And leaving clients exposed to, and even responsible for, cyber-attacks could be construed as unethical–as the financial services providers stand to benefit most from access to client databases unlocked by digital technology. Most jurisdictions now require mandatory privacy compliance (e.g. with the EU General Data Protection Regulation–GDPR) regulating access to, and manipulation of, client databases. A related ethical concern is the monetisation of “big data”, the practice of selling robo-advice user data to third parties. Robo-advisors can only justify this practice if they can demonstrate it to be in clients’ best interest, obtain client consent, and make opting out the default setting.

Finally, AI is particularly adept at creating a client profile from diverse (un)structured databases. The ability to include facial and voice recognition (from voice recording to voice cloning) significantly increases the risk of identity theft–resulting in scams and fraud, and inevitably a loss of trust in (digital) identity altogether.

What Makes Robo-Advice Special?

The four ethical gateway domains also feature in the ethical assessment of traditional human financial advice, see Inderst and Ottaviani (2012). Digital delivery of financial services does not fundamentally change the hallmarks of ethical misconduct like conflicts of interest, commissions, and cancellation of contract practices. But robo-advice brings some significant differences.

First, Argandoña (2020, p. 19) notes that:

“…the ethics of [robo-advice] is … the same ethics of traditional finance [advice]. [F]airness in dealings with customers, equity in offering products or services to [clients], and respect for [clients]’ autonomy are duties that all must obey. … [financial practitioners]’ duties do not change. But the characteristics of [robo-advice]–abundance of data, complexity, relations, speed, etc.–create new demands”.

Second, in each gateway domain, the ethical concern for robo-advice is AI-amplified. That is most obvious for cyber-risk and identity theft, but it also features in market share concentration and the required digital competencies.

Third, AI might give us a better understanding of the ethical concerns of ‘taking out the human advisor’. This is particularly relevant for the bias domain. We are aware of the existence of biases in human advice, but most advice clients ignore the implications. More sophisticated clients use heuristics to account for bias impact. With AI robo-advice, it is feasible to run controlled experiments (based on typical client profiles distinguished by age, gender, social and cultural cohorts) to identify bias impact. Likewise for the safety-net domain. Experimenting with AI data analytics in a regulatory sandbox will help design an optimal regulatory regime which allows for elements of self-regulation. For the competence domain, the focus is on requiring and developing new skills for the future financial advice workforce.

And fourth, when AI progresses from assisted to augmented to autonomous robo-advice, who is responsible for the consequences if things go wrong, accidentally or unintentionally? The emergence of corporations shifted accountability from personal to institutional. But can you sue a robot, or even an algorithm? Does accountability rest with the licensed robo-advisoryFootnote 10? And what if AI goes rogue in making autonomous decisions? Tóth et al. (2022) develop a framework for accountability of AI robots with decision-making discretion. Accountability could then be widely distributed, including the programmer, the code maintainer, the robo-advisory firm, the management of that firm, industry standards, and the regulatory agency. Tóth’s study suggests not a single locus of accountability, but clusters arising from different concentrations of responsibility. For example, if there is a systemic problem with the robo-advice code, then the weight of accountability will rest with the designers, programmers, and maintainers of the algorithm, see Martin (2019).

Exhibit 1 Ethical principles for AI

Mapping Principles to Gateways to Scorecard

Having established how the ethical concerns of robo-advice differ from those of traditional advice, we can describe the four ethical gateways (competition, competence, bias, safety-net) in terms of widely adopted AI ethical principles, as illustrated in Exhibit 1.

Adopting these principles when applying for a robo-advice license may be construed as signalling ethical conduct. In the absence of a track record, there is significant information asymmetry, making it impossible for prospective clients to verify such trust signals, see Spence (1973). Figure 2 therefore proposes a regulatory framework of progressive resolution of the information asymmetry–from intent (for new entrants seeking a license), to commitment (for provisional licensees on probation), to ongoing reputation (for maintenance of license). Clearing the gateways unlocks a provisional AI Robo-Advice License (AIRAL) followed by a probation phase, and subsequent periodic (annual) monitoring of their ethical performance.

Fig. 2 Robo-Advice regulatory framework. Figure 2 illustrates the pathway from ethical licensing to ethical monitoring. It is a regulatory overlay of Fig. 1’s trust model

Based on the four gateway domains, we design a balanced scorecard, as introduced by Kaplan and Norton (1992), that will enable advice clients to make an ethically informed choice tailored to their specific circumstances. Publication of the scorecard will improve transparency of the (likely) ethical conduct of new and established robo-advisors. High performing robo-advisors can use their score to signal their ethics to current and prospective clients. Poorly performing robo-advisors will become ‘visible’ and must improve their compliance with ethical standards. The scorecard would also allow the licensing regulator to monitor the licensees’ ethical performance over time. If a licensee is at risk of no longer meeting the gateway standards, then the regulator could initially ask it to show cause. And if it falls below the line, its license could be suspended or revoked. Market forces and licensing penalties then either diminish demand for low-scoring AI financial advisors or put them on a path of ethical improvement.

To populate the scorecard, we begin with a comprehensive list of ethical performance metrics organised by gateway. Table 1 lists forty metrics closely aligned with the principles underpinning each of the four ethical gateways. The metrics provide a balanced scorecard along the lines of Bieker and Waxenberger (2001) as they includeFootnote 11: short-term (SN-9) and long-term (BC-8) outcomes; internally (C1-10) and externally (C1-3) focused outcomes; quantitative (BC-5) and qualitative (BC-6) outcomes; and leading (SN-8) and lagging (C2-6) indicators of effectiveness. Robo-advisors can score 10 points in each gateway, for a maximum of 40 points in aggregate. Higher scores, better ethics, higher trust.
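A minimal sketch of how the scorecard could be computed is shown below. The gateway names and the 10-points-per-gateway scale follow the text; the per-metric 0–1 scoring and the example values are hypothetical.

```python
# Hypothetical AIRAL ethics scorecard: ten metrics per gateway, each scored 0-1,
# summed into a gateway score out of 10 and an aggregate rating out of 40.
GATEWAYS = ["competition", "competence", "bias", "safety_net"]

def gateway_score(metric_scores: list[float]) -> float:
    """Sum ten 0-1 metric scores into a gateway score out of 10."""
    assert len(metric_scores) == 10 and all(0.0 <= m <= 1.0 for m in metric_scores)
    return round(sum(metric_scores), 1)

def scorecard(metrics_by_gateway: dict[str, list[float]]) -> dict[str, float]:
    """Gateway scores plus the aggregate rating out of 40."""
    scores = {g: gateway_score(metrics_by_gateway[g]) for g in GATEWAYS}
    scores["aggregate"] = round(sum(scores.values()), 1)
    return scores

# Example: a provider scoring well on competence but poorly on the safety-net gateway.
example = {
    "competition": [0.8] * 10,
    "competence":  [0.9] * 10,
    "bias":        [0.7] * 10,
    "safety_net":  [0.4] * 10,
}
print(scorecard(example))  # {'competition': 8.0, 'competence': 9.0, 'bias': 7.0, 'safety_net': 4.0, 'aggregate': 28.0}
```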

Table 1 Ethical performance metrics

The example in Table 2 illustrates the scorecard for a sample of four hypothetical AI robo-advice providers. Rather than just distilling their ethical performance into a single aggregate ethics score–out of 40, or 0 to 5 stars–we propose that the scorecard also reveal the score on each of the four gateway dimensions. Prospective and active clients should consider the trade-offs between the four gateway dimensions and understand the ethical risks of relevance to them (Fig. 3).

Table 2 Ethics scorecard

Fig. 3 Ethics scorecard visualised

Robustness of the Ethics Scoring Model

Of course, the forty ethical performance metrics are not (all) uncorrelated–some mitigating, some amplifying the ethics score. Nor are these metrics equally influential. There may well be an ethical hierarchy hidden among them. Principal component analysis (PCA) or multi-factor analysis (MFA) may reveal a much smaller set of components explaining > 90% of the variation in the dataset. Unfortunately, in tracking performance in a small(er) number of components or factors, we may lose explainability. Including orthogonal rotation (generating uncorrelated factors) or non-orthogonal rotation (generating potentially correlated factors) may deliver better insight into the ethical groupings (domains). Cluster analysis may also be useful.
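As a minimal sketch of such a dimension-reduction exercise–assuming a panel of licensed robo-advisors scored on all forty metrics were available–the following uses simulated data in place of real scorecards:

```python
# Illustrative PCA on a simulated panel of robo-advisor ethics metrics (40 metrics per provider).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_providers, n_metrics = 200, 40
# Simulated data with a few latent "ethics factors" driving correlated metrics.
latent = rng.normal(size=(n_providers, 4))
loadings = rng.normal(size=(4, n_metrics))
X = latent @ loadings + 0.5 * rng.normal(size=(n_providers, n_metrics))

X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_components_90 = int(np.searchsorted(cum_var, 0.90) + 1)
print(f"Components needed to explain 90% of variance: {n_components_90} of {n_metrics}")
```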

As the consequences of misclassification–a suspended or revoked license–may be severe, we need to test the robustness of the scoring model. That is, find the predictive model that minimises type 1 error (predicting ethical misconduct when there is none) and type 2 error (rejecting ethical misconduct when it is present). Our ethics score framework can best be compared with long-established credit scoring technology–where the weights on the regressor metrics are estimated in econometric models that predict the likelihood of ethical misconduct. This literature dates to the 1960s, exemplified by Altman’s (1968) study using discriminant analysis to predict the likelihood of bankruptcy, better known as the Z-score. Nowadays, financial distress or creditworthiness is more commonly estimated by logistic regression analysis, see Dumitrescu et al. (2021) for a concise history of credit scoring models. The advantage of logistic analysis is its simplicity in generating marginal effects of each predictor metric on the probability of ethical misconduct.
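By analogy with credit scoring, a minimal sketch of such a misconduct-prediction model is given below. Since no labelled misconduct dataset for robo-advisors exists yet, the gateway scores and the misconduct outcomes are simulated purely for illustration.

```python
# Illustrative logistic regression: predicting ethical misconduct from gateway scores.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
n = 1000
# Simulated gateway scores (0-10) and a misconduct indicator that is more likely when scores are low.
scores = rng.uniform(0, 10, size=(n, 4))  # competition, competence, bias, safety-net
logit = 2.0 - 0.25 * scores.sum(axis=1) + rng.normal(scale=1.0, size=n)
misconduct = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(scores, misconduct, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
pred = model.predict_proba(X_test)[:, 1] > 0.5  # default decision threshold
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print(f"type 1 errors (false alarms): {fp}, type 2 errors (missed misconduct): {fn}")
print("sign of each gateway's marginal effect:", np.sign(model.coef_).ravel())
```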

Further Work

Before we conclude, we raise three open questions. First, how does the ethics score correlate with the performance score of robo-advice (based on comparative risk, return, and cost)? Is a focus on ethics a costly distraction in the pursuit of prosperity? The higher the ethics score, the lower the performance? Or is paying attention to ethics value-enhancing, perhaps even indispensable with AI robo-advice? And even with positive correlation, is there evidence of diminishing returns to ethical compliance?

Second, how predictable is ethical misconduct? The answer, of course, depends on the quality of, and the information contained in, the scoring regressors. Regardless, both the regressors and the dependent variable are measured with error. The model predicts the probability of ethical misconduct. At what probability level does misconduct become likely, a near-certainty, a guarantee? There is also the issue that actual misconduct often goes unobserved, unlike default in a credit scoring model. And if we do observe it, how do we classify it? Some types of misconduct are much more serious than minor transgressions.
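The choice of probability cut-off is itself a policy decision: a lower threshold catches more misconduct but generates more false alarms. The minimal sketch below, with simulated predicted probabilities and outcomes, illustrates that trade-off.

```python
# Illustrative threshold choice: how the type 1 / type 2 error trade-off shifts with the cut-off.
# p is the model's predicted probability of misconduct; y is the (noisy) observed outcome.
import numpy as np

rng = np.random.default_rng(2)
p = rng.beta(2, 8, size=5000)                    # simulated predicted probabilities
y = (rng.uniform(size=5000) < p).astype(int)     # simulated outcomes consistent with p
for threshold in (0.2, 0.5, 0.8):
    flag = p > threshold
    type1 = np.mean(flag & (y == 0))             # flagged, but no misconduct observed
    type2 = np.mean(~flag & (y == 1))            # not flagged, but misconduct occurred
    print(f"threshold {threshold:.1f}: type 1 rate {type1:.3f}, type 2 rate {type2:.3f}")
```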

Third, the ethics scoring model is based on the assumption that the AI robo-advice sector is about to transform from augmented to autonomous AI and that clients need to choose from an increasing list of aspiring providers. But how “artificially intelligent” is robo-advice at present? Can it learn, make decisions, justify, argue, communicate, and operate autonomously? Undoubtedly, robo-advice can evaluate a much bigger investment universe, perhaps combine that with other informative data universes, and update all that information much faster into expected asset values today … but that does not eliminate client disappointment in the future when the return on the robo-advice plan falls short of expectations. AI does not remove future uncertainty; it measures it more accurately and faster. All the more important, then, for the robo-advisory firm to build a trust dividend by signalling ethical conduct regardless of financial uncertainty.

Concluding Discussion

Insider trading, conflicted interests, fraud, incompetence, data breaches … all instances of ethical misconduct that undermine any attempt to improve public trust in financial advice providers. Before robo-advice, this trust was vested in financial institutions and in the human advisors. We have yet to see whether the shift in client trust from human to AI algorithm will deliver a more transparent, fair, and just outcome for robo-advice clients. We argue that robo-advice will require a shift in trust from personal to process, which will bring new ethical challenges and reinforce existing ones, see Lander and Kooning (2013). These need to be kept in check by a process-driven regulatory regime, testing for evidence of market power, incompetence, bias, and regulatory inertia. Regulators need to accept that a one-size-fits-all financial advice services regulation will not be adequate. Hanisch et al. (2023), for example, propose distinct governance models for each phase of robo-advice, i.e. analog, augmented or autonomous. The relative importance of ethical concerns about accountability, bias, or algorithmic complexity then depends on how advanced the AI technology is. Traditional financial advice regulation will not be adequate to capture the evolving ethical concerns of the AI embedded in robo-advice. Tóth et al. (2022) illustrate how accountability for the ethical conduct of robo-advice gets assigned to the relevant stakeholders, expressing the need for regulatory adaptability. The regulatory framework for robo-advice requires constant monitoring of AI’s learning mechanism for the emergence of new and recurring ethical concerns, see Zhou et al. (2022).

Clients of personalised financial advice rely on regulators to ‘augment’ trust in robo-advice through licensing and monitoring. The high proportion of recurring misconduct instances documented by Egan et al. (2019) explains the generalised decline in trust despite regulatory oversight. Without rigorous licensing and monitoring, the market for financial advice will remain less than perfectly competitive: characterised by anti-competitive behaviour, ethical misconduct, and a lack of transparency, and neither fair nor just. To deliver on its promise of democratising finance, the robo-advice sector needs to play its part in self-regulation. That includes the AI programmers, who need to find a way to incorporate ethics into algorithms that are geared towards designing optimal financial plans, see Telkamp and Anderson (2022).

Robo-advisors’ promise to improve competition in financial advice should not be taken at face value. In the context of imperfect markets with significant information asymmetry, clients are looking for trust credentials before engaging with a robo-advisor. We argue that we first need to identify the manifestations of possible ethical misconduct in robo-advice and then design a regulatory framework that measures, monitors, and captures misconduct. Readily available digital technology and easy-to-use internet platforms have lowered the bar for entry to financial advice, and open access AI is driving down the cost of providing personal financial advice. In this paper, we considered the new and existing ethical concerns this brings and conclude that we need regulatory settings that keep robo-advice competitive, competent, un-biased, safe, and trustworthy.