The literature review identified 82 success factors that were qualitatively consolidated in an iterative process into five broad categories and 17 groups. The meaning of each category is provided in Table 3. Table 4 identifies the success factors and references them according to success category and group. The ethical principles are shown next to each success factor based on mapping them to the 11 ethical categories from Ryan and Stahl . Figure 2 visualizes the intersection between the success factors and the ethical principles, each column being one of the 11 ethical categories. The colors represent the percentage and the numbers the count of success factors that address the principle. The results describe the practical requirements for success with AI development and usage based on the moral issues and ethical principles found in the literature.
4.1 Project Governance
The scope definition document, or problem statement, defines the aims and rationale for the algorithmic system . The requirements for the system, the moral issues, and all aspects of the project are impacted by the algorithm’s context (e.g., country, industry sector, functional topic, and use case). Trust is context-dependent since systems can work in one context but not another; thus, the scope should act as a contract that reveals the algorithm’s goal and the behavior that can be anticipated . Furthermore, a clearly defined scope protects against spurious claims and the misapplication or misuse of the system.
Next, AI ethical principles argue that AI systems should be developed to do good or benefit someone or society as a whole (beneficence) and should avoid harming others (non-maleficence) [12, 47]. Bondi, Xu, Acosta-Navas, and Killian  propose that the communities impacted by the algorithms should be active participants in AI projects. The community members should have some say in granting the project a moral standing based on whether it benefits the community or makes it more vulnerable. While these arguments were made in relation to “AI for social good” projects, community engagement is relevant for all types of AI projects. Finally, rules should be established to manage conflict-of-interest situations within the team or when the values of the system conflict with the interests or values of the users [43, 44].
A responsibility assignment matrix defines roles and responsibilities within a project, distinguishing between persons or organizations with responsibility and accountability . Accountability ensures a task is satisfactorily done, and responsibility denotes an obligation to perform a task satisfactorily, with transparency in reporting outcomes, corrective actions, or interactive controls [47,48,49]. Both responsibility and accountability assume a degree of subject matter understanding and knowledge and should include moral, legal, and organizational responsibilities [12, 47]. The project organization should promote a diverse working environment, involving various stakeholders and people from numerous backgrounds and disciplines and promoting the exchange and cooperation across regions and among organizations [12, 54]. Furthermore, the project team needs members with specialized skills and knowledge to process data and accomplish the design and development of the algorithms; the quality of the team’s skills and expertise impacts the usability of the algorithms and the governance policies . Standards and policy guidelines should be used to build consensus and provide understanding on the relevant issues, such as algorithmic transparency, accountability, fairness, and ethical use [14, 38, 51, 53].
Systematic record-keeping is the mechanism for retaining logs and other documents with contextual information about the process, decisions, and decision-making from project inception through system operations [12, 45, 51, 56]; the various types of records are listed as individual success factors. Disclosure records are logs of disclosures and the processes for disclosure: what was actually released, how the information was compiled, how it was delivered, in what format, to whom, and when [45, 51, 52]. Bonsón, Lavorato, Lamboglia, and Mancini  found that some European publicly listed companies disclose their AI ethical approaches and facts about their AI projects in their annual and sustainability reports. Procurement records are contractual arrangements, tender documents, design specifications, quality assurance measures, and other documents that detail the suppliers and the relevant due diligence .
The risk assessment records identify the potential implications and risks of the system, including legality and compliance, discrimination and equality, impacts on basic rights, ethical issues, and sustainability concerns . However, model risk assessment should be an active process of identifying and avoiding or accepting risks, changing the likelihood of a risk occurring or of its consequences, sharing risk responsibility, or retaining the risk as an informed decision . Retraining and fine-tuning models, adding new components to original solutions (i.e., wrappers), and building functionally equivalent models (i.e., copies) are three methods suggested by Unceta, Nin, and Pujol  for mitigating AI risks.
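As one illustration of the wrapper approach to risk mitigation, a new component can be added around an unchanged model rather than retraining it. The sketch below is a hedged, minimal example under assumed names (the class, threshold, and stand-in model are illustrative, not taken from Unceta, Nin, and Pujol): it defers low-confidence predictions to a human reviewer, i.e., it shares risk responsibility rather than eliminating risk.

```python
class AbstainWrapper:
    """Wrap an existing scoring function without retraining it,
    deferring low-confidence cases to human review (a 'wrapper'
    style mitigation; names and threshold are illustrative)."""

    def __init__(self, predict_proba, threshold=0.8):
        self.predict_proba = predict_proba  # original model, left unchanged
        self.threshold = threshold          # minimum confidence to decide

    def predict(self, x):
        p = self.predict_proba(x)           # probability of the positive class
        if max(p, 1 - p) < self.threshold:
            return "refer_to_human"         # risk shared with a human reviewer
        return int(p >= 0.5)

# usage with a stand-in model that returns a fixed probability
model = AbstainWrapper(lambda x: 0.55)
decision = model.predict(None)              # low confidence -> deferred
```

The original model remains auditable on its own, while the wrapper changes the likelihood and consequences of erroneous automated decisions.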
Ethics policies should include ethical principles with guidelines and rules for implementation and to verify and remedy any violations; the ethics policies should be shareable externally with the public or public authorities [2, 12, 14, 51]. Ethics training should cover the practical aspects of addressing ethical principles [2, 12, 51, 53]. An independent official such as an ombudsman, or a whistleblower process, should be available to hear or investigate ethical or moral concerns or complaints from team members [12, 42, 53]. Finally, professional membership in an association or standard organization (e.g., ACM, IEEE) that provides standards, guidelines, practices on ethical design, development, and usage activities should be encouraged and supported .
Algorithm auditing is a method that reveals how algorithms work. Testing algorithms based on issues that should not arise and making inferences from the algorithms’ data is a technique for auditing complex algorithms [1, 2, 7, 45, 53, 58, 59]. Audit finding records document the audit, the basis or other reasons it was undertaken, how it was conducted, and any findings . Audit response records document remediations and subsequent actions or remedial responses based on audit findings [2, 45].
Algorithmic impact assessments investigate aspects of the system to uncover the impacts of the systems and propose steps to address any deficiencies or harm [42, 53, 57]. Certification ensures that people or institutions comply with regulations and safeguards and punishes institutions for breaches; it offers independent oversight by an external organization [38, 53, 59].
4.2 Product Quality
Source Data Qualities.
Data accessibility refers to data access and usage in the algorithm creation process. Several regulations and laws constrain how data may be accessed, processed, and used in analytical processes. Thus, a legal agreement covering use of the data should be in place, and the confidentiality of personal data should be preserved [1, 7, 51, 52, 60, 61]. Data transparency reveals the source of the data collected, including the context or purpose of the data collection, application, or sensors (or users who collected the data), and the location(s) where the data are stored [6, 38, 51, 62,63,64,65,66]. The reviewability framework  recommends maintaining data collection records that include details on their lifecycle: purpose, creators, funders, composition, content, collection process, usage, distribution, limitations, maintenance, and data protection and privacy concerns [2, 45, 51, 62, 63]. Datasheets for datasets by Gebru, et al.  provide detailed guidance on document content.
Training Data Qualities.
Data quality and relevance refer to possessing data that are fit for purpose. The quality challenges relating to training data include the diversity of the data collected and used, how well it represents the future population, and the representativeness of the sample [45, 62, 63]. Individuals are entitled to physical and psychological safety when interacting with and processing data, i.e., interaction safety [7, 12, 38, 51, 54, 61, 74]. Kang, Chiu, Lin, Su, and Huang  recommend that, in addition to labelling data points, human experts annotate representations and rules to optimize models for human interpretation. Equitable representation applies to data and people. For data, it means having enough data to represent the whole population for whom the algorithm is being developed while also considering the needs of minority groups such as disabled people, minors (under 13 years old), and ethnic minorities. For people, it means, for example, including representatives from minority groups or their advocates in the project governance structures or teams that design and develop algorithms [38, 66,67,68,69]. Model training records should document the training workflow, model approaches, predictors, variables, and other factors; datasheets for datasets by Gebru, et al.  and model cards by Mitchell, et al.  provide a framework for the documentation.
Model and Algorithm Qualities.
Algorithm transparency refers to using straightforward language to provide clear, easily accessible descriptive information (including trade secrets) about the algorithms and data and explanations for why specific recommendations or decisions are relevant. The need for end-users to understand and explain the decisions produced influences the algorithm, data, and user interface transparency requirements [1, 6, 38, 51, 52, 58, 66, 71, 75]. For example, transparency may be needed for compatibility between human and machine reasoning, for the degree of technical literacy needed to comprehend the model, or to understand the model’s relevance or processes . Here it is worth noting that the method and technique used to create the model influences its explainability and transparency . Models can be interpretable by design, provide explanations of their internals, or provide post-hoc explanations of their outputs. Mariotti, Alonso, and Confalonieri  provide a framework for analyzing a model’s transparency. Chazette, Brunotte, and Speith  provide a knowledge framework that maps the relationship between a model’s non-functional requirements and their influence on explainability.
Equitable treatment means eliminating discrimination and differential treatment, whereby similarly situated people are given similar treatment. In this context, discrimination should not be equated to prejudice based on race. It is based on forming groups using “statistical discrimination” and refers to anti-discrimination and human rights protections [1, 2, 12, 66, 67, 74].
Model qualities include consistency, accuracy, interpretability, and auditability; there are no legal standards for acceptable error rates or ethical designs. Consistency means receiving the same results given the same inputs, as non-deterministic effects can occur based on architectures with opaque encodings or imperfect computing environments . Accuracy is how effectively the model provides the desired output with the fewest mistakes (e.g., false positives, error rates) [6, 7, 38, 66, 68, 69]. Overconfidence is a common modeling problem that occurs when the model’s average confidence level exceeds its average accuracy . Interpretability refers to how the model is designed to provide reliable and easy-to-understand explanations of its predictions [6, 12, 56]. Auditability refers to how the algorithm is transparent or obscured from an external view to allow other parties to monitor or critique it [2, 63, 66].
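The overconfidence criterion above can be computed directly: compare the model's average stated confidence with its average accuracy on held-out data. The minimal sketch below uses an illustrative function name and toy values; a positive gap flags an overconfident model.

```python
def confidence_gap(confidences, correct):
    """Overconfidence indicator: mean predicted confidence minus mean
    accuracy over a set of predictions. A positive gap means the model
    claims more certainty than its results justify."""
    avg_conf = sum(confidences) / len(confidences)
    avg_acc = sum(correct) / len(correct)   # correct: 1 if prediction was right
    return avg_conf - avg_acc

# three predictions made with high confidence, but only one was correct
gap = confidence_gap([0.9, 0.8, 0.7], [1, 0, 0])   # positive -> overconfident
```

In practice the same comparison is usually binned by confidence level (a reliability diagram), but the headline gap already captures the definition used here.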
Model validation is the execution of mechanisms to measure or validate the models for adherence to defined principles and standards, effectiveness, performance in typical and adverse situations, and sensitivity. Validation should include bias testing, i.e., an explicit attempt to identify any unfair bias, avoid human subjectivity that introduces individual and societal biases, and reverse any biases detected. Models can be biased based on a lack of representations in the training data or how the model makes decisions, e.g., the selected input variables. The model outcomes should be traceable to input characteristics [2, 38, 40, 59, 72, 73].
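As one concrete form of bias testing, selection rates for a favorable outcome can be compared across groups. This particular check (a demographic-parity comparison) is a common technique rather than one prescribed by the cited sources; the function names and toy data below are illustrative.

```python
def selection_rates(outcomes, groups):
    """Per-group rate of favorable outcomes (1 = favorable decision)."""
    rates = {}
    for g in set(groups):
        selected = [o for o, grp in zip(outcomes, groups) if grp == g]
        rates[g] = sum(selected) / len(selected)
    return rates

def parity_gap(outcomes, groups):
    """Largest difference in selection rates between any two groups.
    A large gap flags the model for closer bias review; it does not by
    itself prove unfair treatment."""
    r = selection_rates(outcomes, groups).values()
    return max(r) - min(r)

# toy validation data: six decisions over two groups
gap = parity_gap([1, 1, 0, 1, 0, 0], ["a", "a", "a", "b", "b", "b"])
```

Tracing a large gap back to input characteristics (e.g., which variables drive the differing rates) is then the traceability step described above.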
The reviewability framework suggests maintaining model validation records that contain details on the model and how it was validated, including dates, version, intended use, factors, metrics, evaluation data, training data, quantitative analyses, ethical considerations, caveats and recommendations, or any other restrictions [45, 71]. Model cards by Mitchell, et al.  provide detailed guidance on the content.
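A model validation record of this kind can be captured as a simple structured object. The sketch below uses the fields listed above; the class and field names are illustrative and do not reproduce the exact model-card schema of Mitchell, et al.

```python
from dataclasses import dataclass, field

@dataclass
class ModelValidationRecord:
    """Minimal model-card-style validation record; fields follow the
    items listed in the reviewability framework (names illustrative)."""
    model_name: str
    version: str
    validation_date: str
    intended_use: str
    factors: list = field(default_factory=list)   # relevant groups/conditions
    metrics: dict = field(default_factory=dict)   # e.g., {"accuracy": 0.91}
    evaluation_data: str = ""
    training_data: str = ""
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

# hypothetical record for an assumed model
record = ModelValidationRecord(
    model_name="credit-scoring", version="1.3",
    validation_date="2024-05-01",
    intended_use="pre-screening only, not final decisions",
    metrics={"accuracy": 0.91, "false_positive_rate": 0.04},
)
```

Serializing such records alongside each released model version keeps the validation evidence reviewable after the fact.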
User Interface (UI) Qualities.
Expertise is embodied in a model in a generalized form that may not be applicable in individual situations. Thus, human intervention is the ability to override default decisions [1, 6, 47]. Equitable accessibility ensures usability for all potential users, including people with disabilities; it considers the ergonomic design of the user interface [12, 38, 78]. Front-end transparency designs should meet transparency requirements and not unduly influence, manipulate, confuse, or trick users [52, 60, 64, 66, 77]. Furthermore, dynamic settings or parameters should consider the context to avoid individual and societal biases such as those created by socio-demographic variables . App-Synopsis by Albrecht  provides detailed guidance on the recommended description of algorithmic applications.
The system and architecture quality may impact the algorithm’s outcomes, introduce biases, or result in indeterminate behavior. Default choices (e.g., where thresholds are set and defaults specified) may introduce bias in the decision-making. Specifically, the selected defaults may be based on the personal values of the developer. The availability, robustness, cost, and safety capabilities of the software, hardware, and storage are essential to algorithm development and use . Decisions on methods and the parallelism of processes may cause system behavior that does not always produce the same results when given the same inputs. Obfuscated encodings may make it difficult to process the results or audit the system. The degree of automation may limit the user’s choices [7, 56]. Security safeguards allow technology, processes, and people to resist accidental, unlawful, or malicious actions that compromise data availability, authenticity, integrity, and confidentiality [2, 52, 53, 79].
The reviewability framework suggests that systems should provide a technical logging process, including mechanisms to capture the details of inputs, outputs, and data processing and computation . The framework also recommends retaining technical deployment and operations records, including installation procedures, hardware, software, network, storage provisions or architectural plans, system integration, security plans, logging mechanisms, technical audit procedures, technical support processes, and maintenance procedures . Hopkins and Booth  highlight the need for model and data versioning and metadata to evaluate data, direct model changes over time, document changes, align the product with the end documentation, and act as institutional knowledge.
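A technical logging process of this kind can be sketched as a thin wrapper that appends each model call's inputs, output, and a timestamp to an append-only JSON-lines file. The decorator, file name, and format below are assumptions for illustration, not part of the reviewability framework.

```python
import functools
import json
import time

def log_predictions(logfile):
    """Decorator: record every call's inputs, output, and timestamp to a
    JSON-lines file so predictions remain reviewable and auditable."""
    def wrap(predict):
        @functools.wraps(predict)
        def inner(features):
            output = predict(features)
            entry = {"ts": time.time(), "inputs": features, "output": output}
            with open(logfile, "a") as fh:       # append-only technical log
                fh.write(json.dumps(entry) + "\n")
            return output
        return inner
    return wrap

@log_predictions("predictions.jsonl")
def score(features):
    # stand-in model: flags feature sums above a fixed threshold
    return sum(features) > 1

result = score([0.7, 0.6])
```

In a production system the same idea would extend to model version identifiers and secure, tamper-evident storage, per the deployment records discussed above.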
Data and Privacy Protections.
Data governance includes the practices for allocating authority and control over data, including the way it is collected, managed, and used; it should consider the data development lifecycle of the original source and training data [63, 80]. Data retention policy specifies the time and obligations for keeping data; personal data should be retained for the least amount of time possible [12, 52].
Privacy safeguards include processes, strategies, guidelines, and measures to protect and safeguard data privacy and implement remedies for privacy breaches [1, 6, 7, 38, 51, 53, 60]. Informed consent is the right of individuals to be informed of the collection, use, and repurposing of their personal data [6, 7, 12, 38, 52, 53]. The legal and regulatory rules covering consent vary by region and usage purposes. Personal data control means giving people control of their data [1, 6, 60], while confidentiality concerns protecting and keeping data and proprietary information confidential [7, 12, 38, 56, 65, 74]. Data encryption, data anonymization, and privacy notices are examples of privacy measures [1, 6, 7, 38, 51, 53, 60]. Data anonymization involves applying rules and processes that randomize data so that an individual is not identifiable and cannot be found through combining data sources. Data protection principles do not apply to anonymous information [38, 51, 53, 60]. Data encryption is an engineering approach to securing data with electronic keys.
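As a minimal sketch of one such measure, a direct identifier can be replaced by a keyed hash before data are used for analysis. Strictly speaking this is pseudonymization rather than full anonymization, since whoever holds the key (or suitable auxiliary data) could re-identify records; the identifier and key below are illustrative.

```python
import hashlib
import hmac

def pseudonymize(identifier, secret_key):
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).
    Deterministic: the same identifier and key yield the same token,
    so records can still be joined without exposing the raw value.
    Note: pseudonymization, not anonymization - the key holder can
    re-identify records, so data protection rules may still apply."""
    return hmac.new(secret_key, identifier.encode(), hashlib.sha256).hexdigest()

token = pseudonymize("jane.doe@example.com", b"rotate-me-regularly")
```

Using a secret key (rather than a plain hash) prevents an attacker from re-identifying tokens by hashing a list of candidate identifiers, which is why key management and rotation matter here.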
4.3 Usage Qualities
System Transparency and Understandability. Stakeholder-centric communication considers the explainability of the algorithm to the intended audience. Scoleze Ferrer Paulo, Galvão Graziela Darla, and de Carvalho Marly  note that there are different audiences for model and algorithm explanations: executives and managers who are accountable for regulatory compliance, third parties that check for compliance with laws and regulations, expert users with domain-specific knowledge, people affected by the algorithms who need to understand how they will be impacted, and model designers and product owners who research and develop the models. Thus, explanations should be comprehensible and transmit essential, understandable information rather than legalistic terms and conditions, even for complex algorithms [2, 6, 12, 52, 65, 81, 82]. Interpretable models refer to having a model design that is reliable, understandable, and facilitates the explanation of predictions by expert users [6, 12]. Choices allow users to decide what to do with the model results, maintaining a human-in-the-loop for a degree of human control [6, 12, 38, 47, 53, 56, 83].
Expertise is embodied in a generalized form that may not be applicable in individual situations, so specialized skills and knowledge may be required to choose among alternatives. Consequently, professional expertise, staff training and supervision, and on-the-job coaching may be necessary to ensure appropriate use and decision quality [56, 84]. Similarly, onboarding procedures are required to orient users on the system usage, their responsibilities, and system adjustments needed to address confidence levels, adjust thresholds, or override decisions . Interaction safety refers to ensuring physical and psychological safety for the people interacting with AI systems . Problem reporting is a mechanism that allows users to discuss and report concerns such as bugs or algorithmic biases .
The complaint process means having mechanisms to identify, investigate, and resolve improper activity or receive and mediate complaints . Quality controls detect improper usage or under-performance. Improper usage occurs when the system is used in a situation for which it was not originally intended [38, 47]. Monitoring is a continual process of surveying the system’s performance, environment, and staff for problem identification and learning . Staff monitoring identifies absent or inadequate content areas, identifies systematic errors, anticipates and prevents bias, and identifies learning opportunities. System monitoring verifies how the system behaves in unexpected situations and environments. Model values or choices become obsolete and must be reviewed or refreshed through an algorithm renewal process [38, 42, 53].
The reviewability framework recommends retaining usage, consequence, and process deployment records. Usage records contain model inputs and the outputs of parameters, operational records at the technical (systems log) level, and usage instructions [45, 51]. Consequence records document the quality assurance processes for a decision and log any actions taken to affect the decision, including failures or near misses [45, 59]. Logging and recording decision-making information are appropriate means of providing traceability. Process deployment records document relevant operational and business processes, including workflows, operating procedures, manuals, staff training and procedures, decision matrices, operational support, and maintenance processes .
Awareness is educating the public about the existence and the degree of automation, the underlying mechanisms, and the consequences [2, 6, 53]. Access and redress are ways to investigate and correct erroneous decisions, including the ability to contest automated decisions by, e.g., expressing a point-of-view or requesting human intervention in the decision [1, 2, 6, 12, 53, 65, 85]. Decision accountability is knowing who is accountable for the actions when decisions are taken by the automated systems in which the algorithms are embedded [2, 12, 53, 66, 85]. Privacy and confidentiality are the activities to protect and maintain confidential information of an identified or identifiable natural person [7, 12, 38, 56, 65, 74].
4.4 Benefits and Protections
Intellectual property rights consist of the ownership of the design of the models, including the indicators. Innovation levels have to be balanced against the liability and litigation risks for novel concepts [38, 53]. Financial gains include increased revenues from the sale or licensing of models through license or service fees [38, 74]; cost reductions from making faster, less expensive, or better decisions ; or improved efficiency from reducing or eliminating tasks . Furthermore, proven successful models, concepts, algorithms, or businesses can attract investment funds . Investment funds are needed to finance project resources and activities .
Intellectual property protection is achieved by hiding the algorithm’s design choices, partly or entirely, and establishing clear ownership of AI artifacts (e.g., data, models) [38, 53, 58]. Data and algorithm transparency and auditing requirements should be considered in deciding what to reveal . Further, model development has environmental impacts and energy costs. The environmental impacts occur because training models may be energy-intense, consuming computing energy whose carbon emissions are comparable to those of a trans-American flight [12, 69]. The energy costs from computing power and electricity consumption (for on-premise or cloud-based services) are relevant for training models [12, 69]; for an incremental increase in accuracy, the cost of training a single model may be extreme (e.g., a 0.1 increase in accuracy for 150,000 USD) . Cost efficiency occurs when acquiring and using information is less expensive than the costs involved if the data were absent . Project efficiency evaluates the project management’s success in meeting stakeholder requirements for quality, schedule, and budget [18, 20].
Legal safeguards include protection against legal claims or regulatory issues that arise from algorithmic decisions [2, 52]—limiting liability or risk of litigation for users and balancing risks from adaptations and customizations with fear of penalties or liability in situations of malfunction, error, or harm [12, 38]. Regulatory and legal compliance involves meeting the legal and regulatory obligations for collecting, storing, using, processing, profiling, and releasing data and complying with other laws, regulations, or ordinances [7, 30, 45, 51, 53, 57, 59, 65].
4.5 Societal Impacts
Civil rights and liberties protection secures natural persons’ fundamental rights and freedoms, including the right to data protection and privacy and to have opinions and decisions made independently of an automated system [12, 60]. To ensure such rights, the product and usage qualities enumerated in their respective success groups must be implemented (e.g., equitable treatment, accessibility, choices, privacy, and confidentiality) [12, 47, 60]. Finally, AI systems may introduce new job structures and patterns, eliminate specific types of jobs, or change the way of working. Thus, programs may be required to address employee and employment-related issues [12, 53].
Environmental sustainability is supported by limiting environmental impacts and reducing energy consumption in model creation [12, 69].