1 Introduction

1.1 Background

The increasing availability of software and data services on the Internet has expanded the options for designing automated and semi-automated service compositions for application developers and users. When selecting and combining Web services, the quality of service (QoS) is considered a crucial factor. QoS-aware service composition involves defining general QoS attributes such as cost, response time, reputation, and availability [52], which are important for evaluating the non-functional quality of atomic and composite services. Since the early 2000s, QoS-aware service composition has been one of the most active research topics in service-oriented computing. In previous studies, various approaches have been proposed for computing QoS based on multiple attributes [1, 8, 14, 43, 52], focusing on the optimization of the overall non-functional quality of composite services.

On the other hand, application-specific quality (functional QoS attributes) may also be crucial in many real-world services. For instance, in translation services, users are primarily concerned with translation quality rather than general attributes. Hence, it is necessary to prioritize the optimization of translation quality while also considering non-functional QoS attributes. However, certain crucial functional QoS attributes may not consistently fulfill users’ needs due to limitations specific to the application; e.g., it is not always feasible for a machine translation service to deliver flawless translation results to users. This perspective has become extremely important with the spread of artificial intelligence services and machine learning applications in smart cities.

To address the above issue, the integration of Web services and human activities has emerged as a potential solution. While human activities have been extensively studied in the area of business process management, they have primarily been examined from an organizational or resource perspective [41, 56]. These studies have focused on situations where tasks cannot be automated and require human intervention. Since the late 2000s, the rise of crowdsourcing and cloud computing environments has sparked interest in combining human activities with existing services and applications [21, 22].

1.2 Approach

We aim to practice and analyze the effects of composing human activities and Web services in real-world scenarios. The human activities in this research involve both crowd workers and professionals. Specifically, we consider human activities from a QoS perspective, which was largely neglected in previous research.

We start by conducting empirical studies on designing and implementing human-in-the-loop service composition. Since 2006, we have been working on the Language Grid [16, 18, 39, 40], a service-oriented language infrastructure, which serves as the foundation for our research on service composition. A good example in the language service domain is that translation work can be done by composing various language services on the Language Grid, monolingual crowd workers, and bilingual professionals. In 2010, we conducted a small pilot experiment on translating a digital camera manual and found it promising to combine Web services and human activities [27, 34]. In the following years, we increased the scale of the experiment, designed human-in-the-loop composite services for supporting localization processes [31, 32], and implemented human-in-the-loop applications for real-world multilingual activities [28,29,30, 33].

On the other hand, we realize that it is necessary to provide theoretical foundations for designing and optimizing human-in-the-loop service composition. Therefore, we need to model the human activities and analyze how the composite services could achieve optimal performance with human activities. To achieve this goal, we propose theoretical crowdsourcing workflow models, use translation tasks to study human activities, and simulate the optimal service workflow under various situations [11,12,13].

This monograph reports our research efforts on designing and analyzing human-in-the-loop service compositions, in both practical aspects and theoretical aspects.

1.3 Structure of This Chapter

Section 2 introduces a motivating example of translation service design to illustrate the necessity of designing and implementing human-in-the-loop composite services in real-world applications. The section also defines various patterns of combining human activities and Web services.

Section 3 presents a large-scale experiment on the composition of human activities and Web services in the field of language translation. The study considers both the functional and non-functional QoS attributes. The experiment results demonstrate that the inclusion of human activities in service processes introduces diversity compared to traditional processes that only involve Web services. Additionally, the study analyzes the impact of human activities on the QoS of service processes. The findings also indicate that high-quality human activities can significantly enhance various QoS attributes of service processes, while low-quality human activities may have negative effects on these processes.

Section 4 focuses on the design of human-in-the-loop composite services, considering the uncertainties associated with real-world services and users’ requirements. The section proposes a service design approach, which includes phases such as observation, modeling, implementation, and analysis. The section also presents a field study on the design of multi-language communication services to demonstrate the effectiveness of the proposed service design approach.

Section 5 proposes theoretical approaches to modeling and optimizing the crowdsourcing workflow. Experiments under various situations yield results consistent with existing studies in the research community of crowdsourcing.

Section 6 describes the related work on human activities in service composition, user-centered composite service design, and crowdsourcing workflow models.

Section 7 concludes this monograph by summarizing the contributions of our work on human-in-the-loop service composition and discussing future directions.

2 Human-in-the-Loop Service Composition

2.1 A Language Service Composition Example

To illustrate the research issue, we present a case study in the field of language translation. Specifically, we examine two methods of achieving language translation: human translation and machine translation. To provide flexible language services, we have developed the Language Grid, a service-oriented intelligence platform [16, 17]. The Language Grid collects language resources from various sources such as the Internet, universities, research labs, and companies. These resources are then encapsulated as atomic Web services with standardized interfaces. We have also created a series of composite services using these atomic language services. Furthermore, it is also possible to encapsulate human activities as Web services on the Language Grid [31]. Within the Language Grid, multiple QoS attributes are managed for language services, including general attributes such as response time and cost, as well as application-specific attributes like translation quality [34]. In the domain of language services, the application-specific QoS attributes, particularly translation quality, are of utmost importance. Previous evaluations of translations have focused on adequacy and fluency [34]. Adequacy refers to the extent to which the translation effectively conveys the information present in the original text, while fluency pertains to the degree to which the translation adheres to the grammar of the target language.

Given that users have varying QoS requirements for language services, it is necessary to provide different atomic services or composite services with different QoS for the same function. In the Language Grid, language services are categorized into several classes, with multiple atomic services or composite services provided for different QoS requirements within each class. For instance, the translation service class includes an atomic machine translation service, a two-hop machine translation service, a machine translation service combined with a bilingual dictionary, and so on. By creating a composite machine translation service that incorporates services such as morphological analysis and a dictionary, the functional QoS can be enhanced compared to using the atomic machine translation service alone. However, despite the availability of various types of services, there are still limitations in terms of functional QoS attributes. For example, machine translation services, even when combined with dictionaries or other services for QoS improvement, cannot achieve perfect fluency and adequacy. This means that service-based processes may not always meet users’ requirements. While a composite translation service may be suitable for fulfilling QoS requirements in online multilingual chatting, it may be challenging to use a purely service-based process for writing business documents or translating product operation manuals.

To address both the functional and non-functional QoS of translation services, we conducted a preliminary experiment that aimed to integrate human activities and Web services [34]. However, we discovered that human resources can also become a bottleneck if they are not readily available. As a solution, we propose the incorporation of crowdsourcing into the service process.

2.2 Composition of Web Services and Human Activities

Given an established service process, human activities can be incorporated in various ways: replacing an atomic service or subprocess, establishing a selective control relationship with a service or subprocess, or pre-processing the input or post-processing the output of an atomic service or subprocess, either fully or partially. This approach can also be applied to integrate human activities into a process that already consists of both human activities and Web services. To enhance the QoS, we propose several fundamental patterns for introducing a human activity (or human service) into a service process. These fundamental patterns can also be combined to address more complicated scenarios.

  • Complete substitution: a human activity \(h_{i}\) is used to substitute a service \(s_{i}\) (or a subprocess) completely.

  • Partial substitution: a human activity \(h_{i}\) is used to form a selective control relationship with a service \(s_{i}\) (or a subprocess) under a certain condition.

  • Pre-processing: a human activity \(h_{i}\) is used to pre-process the input of a service \(s_{i}\) (or a subprocess).

  • Partial pre-processing: a human activity \(h_{i}\) is used to pre-process the input of a service \(s_{i}\) (or a subprocess) under a certain condition.

  • Post-processing: a human activity \(h_{i}\) is used to post-process the output of a service \(s_{i}\) (or a subprocess).

  • Partial post-processing: a human activity \(h_{i}\) is used to post-process the output of a service \(s_{i}\) (or a subprocess) under a certain condition.

In the context of machine translation services, the functional QoS attributes that are relevant are fluency and adequacy. In cases where the service process itself fails to meet the user’s QoS requirement, there are several alternatives for introducing human activities. These alternatives include: (1) completely substituting the machine translation service process with human activity for translation, referred to as complete substitution; (2) incorporating a human activity for pre-editing the source sentence within the original service process, such as modifying long sentences or reordering words to facilitate easier translation, known as pre-processing; (3) introducing a human activity for post-editing the translation result, such as enhancing fluency by a monolingual user, when the original service process fails to satisfy the user’s QoS requirement, referred to as partial post-processing; and (4) combining the human activities of pre-editing and post-editing to enhance the QoS of the original service process, which involves a combination of pre-processing and post-processing patterns.
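The six patterns above can be sketched as higher-order functions over text-processing steps. This is an illustrative model only: the `Step` and `Cond` types and all function names are our own, not an API of the Language Grid.

```python
from typing import Callable

# Hypothetical types: a Step maps text to text (a Web service or a human
# activity); a Cond decides whether a "partial" variant fires.
Step = Callable[[str], str]
Cond = Callable[[str], bool]

def complete_substitution(s: Step, h: Step) -> Step:
    # h replaces s (or a subprocess) entirely
    return h

def partial_substitution(s: Step, h: Step, use_h: Cond) -> Step:
    # selective control relationship: route to h when the condition holds
    return lambda x: h(x) if use_h(x) else s(x)

def pre_processing(s: Step, h: Step) -> Step:
    # h pre-processes the input of s
    return lambda x: s(h(x))

def partial_pre_processing(s: Step, h: Step, cond: Cond) -> Step:
    # pre-process only when the condition holds on the input
    return lambda x: s(h(x)) if cond(x) else s(x)

def post_processing(s: Step, h: Step) -> Step:
    # h post-processes the output of s
    return lambda x: h(s(x))

def partial_post_processing(s: Step, h: Step, cond: Cond) -> Step:
    # post-process only when the condition holds on the service output
    def run(x: str) -> str:
        y = s(x)
        return h(y) if cond(y) else y
    return run
```

For example, `post_processing(machine_translate, monolingual_edit)` would express a process in which a monolingual user edits every machine translation result, while the partial variant edits only the results that satisfy a given condition.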

3 Empirical Study on Human-in-the-Loop Translation Services

3.1 Experiment Design

To examine the impact of the composition of Web services and human activities on QoS, a comprehensive experiment is conducted focusing on language translation. The translation procedures employed in this experiment are constructed based on the patterns outlined in Sect. 2.2. Within the language service domain, QoS encompasses both non-functional attributes (such as cost and time) and functional attributes (specifically, the quality of translation, i.e., the adequacy of the translation result). To assess the effectiveness of combining human activities with Web services, a three-step experimental design is devised:

  • Step 1 (CMT): Use a composite machine translation service that integrates three atomic services (a machine translation service, a morphological analysis service, and a dictionary service).

  • Step 2 (CMT+Mono): Incorporate human activities involving partial post-processing into CMT. The human activities are conducted by monolingual users for post-editing a specific portion of the CMT-generated translation results, with the condition that monolingual users can understand the machine translation results.

  • Step 3 (CMT+Mono+Bi): Incorporate human activities of post-processing into CMT+Mono. The human activities are conducted by bilingual users to confirm the correctness of the post-editing results in CMT+Mono as well as translating the unmodified parts in CMT+Mono. The whole flow is shown in Fig. 1.

Fig. 1
Translation process composed of Web services and human activities (Step 3: CMT+Mono+Bi). The output of the composite Web service proceeds to monolingual post-editing when its adequacy is 3 or more, and to bilingual human translation otherwise.
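The routing logic of the three-step design can be sketched as follows. All callables (`cmt`, `adequacy`, `mono_post_edit`, `bi_confirm`, `bi_translate`) are hypothetical placeholders; only the adequacy threshold of 3 comes from the experiment design.

```python
# Sketch of the Step 3 (CMT+Mono+Bi) routing shown in Fig. 1.
def step3_translate(sentence, cmt, adequacy, mono_post_edit,
                    bi_confirm, bi_translate, threshold=3):
    draft = cmt(sentence)                 # composite machine translation (CMT)
    if adequacy(draft) >= threshold:      # understandable to a monolingual user
        edited = mono_post_edit(draft)    # monolingual post-editing (Step 2)
        return bi_confirm(edited)         # bilingual confirmation (Step 3)
    return bi_translate(sentence)         # bilingual translation of the rest
```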

In this experiment, the Language Grid provides a range of essential Web services, such as machine translation services, morphological analysis services, and dictionary services. These Web services are constructed by wrapping language resources that are originally provided by various organizations.

  • Machine translation services: JServer service (language pairs used in the experiment: Japanese (ja) \(\leftrightarrow \) English (en), Japanese (ja) \(\leftrightarrow \) Korean (ko), Japanese (ja) \(\leftrightarrow \) Simplified Chinese (zh-CN), and Japanese (ja) \(\leftrightarrow \) Traditional Chinese (zh-TW)) provided by Kodensha Co., Ltd.; GoogleTranslate service (language pairs: English (en) \(\leftrightarrow \) Traditional Chinese (zh-TW)) provided by Google; and WebTranser service (language pairs: English (en) \(\leftrightarrow \) German (de), English (en) \(\leftrightarrow \) French (fr), English (en) \(\leftrightarrow \) Spanish (es), and English (en) \(\leftrightarrow \) Portuguese (pt)) provided by Cross Language Inc.

  • Morphological analysis services: Mecab Japanese morphological analysis service provided by NTT Communication Science Laboratories, and TreeTagger English morphological analysis service provided by University of Stuttgart.

  • Dictionary services: dictionary service for Business, University, and Temple provided by Kyoto Information Card System LLC, Ritsumeikan University, and the Kodaiji Temple.

The experiment incorporates two types of human activities. Monolingual users are involved in post-editing machine translation results, while bilingual users engage in translation and post-editing of results produced by monolingual users. To examine the impact of human activities on the QoS of service processes, we employ two distinct configurations of human activities as follows:

  • Crowd workers for monolingual human activities: Crowd workers are selected from a list of numerous registered foreign student users at Kyoto University, Japan. The sole prerequisite is that the registered user is a native speaker of the language in which post-editing is needed. Consequently, the quality of human activities conducted by the monolingual crowd workers cannot be predicted during the experiment.

  • Professionals for bilingual human activities: Since the translation/confirmation tasks have stringent criteria for participation, only registered users who possess expertise in two languages required for the tasks are eligible. Consequently, the experiment ensures the inclusion of bilingual users who can deliver high-quality translations.

Table 1 shows the 14 service processes employed in the translation experiment. Each process follows the three steps outlined in Sect. 3.1. For instance, Process (1) in Table 1 pertains to the translation of business-related documents from Japanese to English. This process comprises a total of 551 process instances, each representing the translation of a Japanese sentence into an English sentence. Consequently, there are 551 translation subtasks in Process (1). The composite translation service used for Process (1) relies on three atomic services on the Language Grid: the JServer Japanese-English machine translation service, the business bilingual dictionary service, and the Mecab Japanese morphological analysis service. Human activities include post-editing tasks for English monolingual users and translation/post-editing tasks for Japanese-English bilingual users.

Table 1 Translation processes used in the experiments that combine Web services (MT: machine translation service; Dic: bilingual dictionary service; MA: morphological analysis service) and human activities (Mono: monolingual human activity; Bi: bilingual human activity)

3.2 Experiment Results

We perform a series of measurements to examine the impact of human activities on the QoS in service processes.

  • Evaluation of the functional QoS in terms of translation adequacy, as well as the non-functional QoS attributes such as execution time and cost.

  • Examination of the correlation between the functional and non-functional QoS.

  • Analysis of the impact of variations in human activities on the QoS attributes.

To assess the quality of human activities, we establish three indices for monolingual users: submission rate, acceptance rate, and completion rate. These indices are defined exclusively for monolingual users because the quality of bilingual users is assured throughout the experiments, as outlined in Sect. 3.1. Consequently, the submission rate, acceptance rate, and completion rate can be considered \(100\%\) for bilingual users in this experiment.

  • Monolingual Submission Rate (MSR): the proportion of post-edited results among all machine translation results for monolingual users in Step 2.

  • Monolingual Acceptance Rate (MAR): the proportion of successfully accepted post-edited results among all submitted results for monolingual users in Step 3.

  • Monolingual Completion Rate (MCR): the proportion of completed post-edited (submitted and accepted) results among all the machine translation results for monolingual users in Step 3, which is determined by \(MCR=MSR\times MAR\).
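As a minimal illustration of the relationship \(MCR=MSR\times MAR\) (the example rates below are hypothetical, not measured values from Table 2):

```python
# MCR = MSR x MAR: the share of machine translation results that are both
# post-edited (submitted) and accepted.
def completion_rate(msr: float, mar: float) -> float:
    return msr * mar

# e.g., 70% of results are post-edited and 80% of those are accepted:
mcr = completion_rate(0.70, 0.80)   # MCR of 0.56
```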

To investigate the impact of human activities on the execution time (duration) of the service process, we assess the following items:

  • Monolingual Work Time (MWT): execution time of the monolingual human activities.

  • Bilingual Work Time (BWT): execution time of the bilingual human activities.

  • Total Work Time (TWT): summation of monolingual work time (MWT) and bilingual work time (BWT), which is determined by \(TWT=MWT+BWT\).

  • Common Work Time (CWT): execution time when the process is a purely human translation process.

  • Time Reduction Rate (TRR): the extent to which the execution time is reduced in comparison to the conventional human translation process, which is determined by \(TRR=1-\frac{TWT}{CWT}\).
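The time metrics above can be combined in a few lines. The per-page work times below are hypothetical values in minutes, chosen only to show the arithmetic:

```python
# TWT = MWT + BWT and TRR = 1 - TWT/CWT.
def time_reduction_rate(mwt: float, bwt: float, cwt: float) -> float:
    twt = mwt + bwt            # total work time of the composite process
    return 1.0 - twt / cwt     # reduction vs. a purely human process

# e.g., 30 min monolingual + 40 min bilingual vs. 100 min purely human:
trr = time_reduction_rate(30.0, 40.0, 100.0)   # TRR of about 0.30
```

A negative TRR indicates that the composite process is slower than purely human translation, which indeed happens for half of the 14 processes (Sect. 3.2.1).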

Table 2 presents the results of the above indices for all 14 processes conducted in the experiments. The results indicate significant variations in the quality of monolingual human activities and in execution time across the different processes.

Table 2 Measurements of the human-in-the-loop service processes

3.2.1 Effects of Human Activities on Execution Time

Figures 2 and 3 provide an analysis of the correlation between time reduction rate (TRR), monolingual submission rate (MSR), and monolingual completion rate (MCR). The data presented are based on the average translation workload of a single A4-size page, which is approximately 700 Japanese characters or 400 English words. The involvement of human activities in the translation process results in a reduction in execution time for half of the 14 processes, while the other half experiences an increase in execution time compared to a purely human translation process. The findings also indicate that a high monolingual submission rate (MSR) does not necessarily lead to a high time reduction rate (TRR). However, there is a trend suggesting that a higher monolingual completion rate (MCR) is associated with a greater time reduction rate (TRR). Additionally, it appears challenging to reduce execution time when the monolingual submission rate (MSR) is relatively high but the monolingual completion rate (MCR) is low (e.g., Process (5), Process (8), and Process (9)). This difficulty arises from the significant time wasted in dealing with low-quality submissions by monolingual users that are not accepted.

Fig. 2
Relationship between time reduction rate (TRR) and monolingual submission rate (MSR)

Fig. 3
Relationship between time reduction rate (TRR) and monolingual completion rate (MCR)

3.2.2 Effects of Human Activities on Cost

To investigate the impact of human activities on the cost of executing the service process, a series of measurements are conducted. In this experiment, bilingual users and monolingual users are paid at rates of US$ 50.00 and US$ 5.00 per A4-size page, respectively. However, in cases where the results are not accepted, the payment to the monolingual users is reduced by half.

  • Monolingual Work Cost (MWC): cost of monolingual human activities, which is calculated by \(MWC=5.00\times (MCR+\frac{1}{2}(MSR-MCR))\).

  • Bilingual Work Cost (BWC): cost of bilingual human activities, which is calculated by \(BWC=50.00\times (1-MCR)\).

  • Total Work Cost (TWC): summation of the cost of monolingual human activities and bilingual human activities, which is determined by \(TWC=MWC+BWC\).

  • Common Work Cost (CWC): cost when the process is a purely human translation process, and \(CWC=50.00\).

  • Cost Reduction Rate (CRR): the cost reduction percentage in comparison to a purely human translation process, which is calculated by \(CRR=1-\frac{TWC}{CWC}\).
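As a sanity check of the cost model, the following sketch recomputes the CRR from MSR and MCR. The MCR of \(89.59\%\) is the value reported for Process (3) in Sect. 3.2.2; the MSR value is our assumption for illustration, under which the model yields a CRR close to the reported \(80.41\%\).

```python
# Cost model of Sect. 3.2.2: monolingual users earn US$5.00 per page (half
# pay for rejected submissions); bilingual users earn US$50.00 per page.
def cost_reduction_rate(msr: float, mcr: float) -> float:
    mwc = 5.00 * (mcr + 0.5 * (msr - mcr))   # monolingual work cost
    bwc = 50.00 * (1.0 - mcr)                # bilingual work cost
    twc = mwc + bwc                          # total work cost
    return 1.0 - twc / 50.00                 # CRR vs. CWC = US$50.00

# MCR = 89.59% is reported for Process (3); MSR = 0.94 is assumed.
crr = cost_reduction_rate(0.94, 0.8959)      # about 0.8041
```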

Fig. 4
Relationship between execution cost (monolingual work cost (MWC), bilingual work cost (BWC), total work cost (TWC)) and monolingual completion rate (MCR)

Figure 4 illustrates the correlation between the cost (monolingual work cost (MWC), bilingual work cost (BWC), total work cost (TWC)) and the monolingual completion rate (MCR). The findings indicate that employing a composite process involving both human activities and Web services can effectively reduce translation costs compared to relying solely on human translation. This supports the analysis conducted in our previous preliminary experiments [34]. The reason is that part of the work in a purely human translation process is delegated to Web services and to lower-cost monolingual users. Additionally, the results demonstrate that the cost reduction rate (CRR) increases as the monolingual completion rate (MCR) rises. An extremely successful example is Process (3), which achieves a cost reduction rate (CRR) of \(80.41\%\) thanks to the high quality of its monolingual human activity, with a monolingual completion rate (MCR) of \(89.59\%\).

3.2.3 Effects of Human Activities on Relations of QoS Attributes

To analyze the impact of variations in human activities on the QoS attributes, we have classified the 14 processes into three groups according to their monolingual completion rate (MCR). This metric serves as a direct indicator of the quality of monolingual human activities.

  • Low-quality monolingual activity group: Process (2), (5), (8), (10).

  • Medium-quality monolingual activity group: Process (1), (6), (7), (9), (11).

  • High-quality monolingual activity group: Process (3), (4), (12), (13), (14).

Figures 5 and 6 examine the correlation between functional QoS attributes, specifically translation quality, and non-functional QoS attributes, namely execution time and cost. The analysis compares different steps (Step 1 to Step 3 from left to right in each subgraph of Figs. 5 and 6) for all 14 processes in the experiment. The findings indicate that both execution time and cost increase as the steps progress from Step 1 to Step 3, showing that achieving higher functional QoS requires more time and cost. Step 1, which solely involves Web services, incurs negligible cost and execution time compared to the other steps. However, the functional QoS achieved in Step 1 is also limited. In contrast, Step 2 and Step 3, which prioritize high functional QoS, entail significantly higher cost and execution time.

Fig. 5
Relationship between cost and translation quality

Fig. 6
Relationship between execution time and translation quality

The results in Figs. 5 and 6 also demonstrate that the quality of human activities has varying effects on the QoS attributes of composite services. Specifically, composite services in the low-quality monolingual activity group incur significant costs in improving functional quality from Step 2 to Step 3, resulting in only marginal cost savings compared to purely human processes (valued at US$ 50). Furthermore, these composite services require more execution time in Step 3 than purely human processes (100 min). Conversely, composite services in the high-quality monolingual activity group can enhance functional QoS with minimal cost and execution time from Step 2 to Step 3. Consequently, variations in the quality of human activities significantly influence QoS attributes. These results suggest the need to develop quality control models for human activities to ensure high QoS in composite services.

3.3 Discussion

Although the example used in this study falls into the language service domain, it is important to note that the issue of service-based processes not always meeting users’ requirements due to limitations in functional QoS attributes is prevalent in other domains, such as various artificial intelligence (AI) services in smart cities, ranging from object detection to voice recognition. To address both functional QoS and non-functional QoS attributes in such service processes, the integration of human activities and Web services can be considered a promising approach. By combining human activities and Web services, the variety of service implementations can be expanded. In cases where Web service-based processes exhibit limited functional QoS, the introduction of human activities can enhance functional QoS to varying degrees based on users’ requirements. Similarly, in purely human processes, the incorporation of Web services, even with limited functional QoS, can enhance efficiency and improve non-functional QoS.

In this empirical investigation, our primary objective is to examine the impact of human activities on both functional and non-functional QoS. Consequently, we have chosen to utilize only a limited number of the service composition patterns defined in Sect. 2.2. Nevertheless, it is crucial to consider the appropriate application of various patterns for introducing human activities in different situations, considering users’ requirements, because the effect of human activities on the QoS of service processes may vary depending on the specific pattern employed. In the language translation example, the analysis of the QoS effects of different patterns can be used for the service design of field-based multi-language communication [28].

4 Human-in-the-Loop Service Design for Supporting Real-World Multilingual Activities

In the previous section, we described our research efforts on analyzing non-functional and functional QoS in human-in-the-loop service composition by using a pre-designed language translation service process. In this section, we will report our study of designing human-in-the-loop composite services for real-world applications, where there are numerous variations of combining human activities and Web services.

4.1 Designing Composite Services for Real-World Applications

To design human-in-the-loop composite services in the real world, there are several significant issues that need to be addressed. Firstly, the performance of services may vary due to the dynamic nature of service environments [31], resulting in inherent uncertainty in QoS [49]. This uncertainty poses challenges in designing composite services based on QoS. This issue becomes even more challenging when considering the combination of human activities and Web services. Secondly, when multiple QoS attributes are associated with services, it is often difficult to optimize all of these attributes simultaneously due to the presence of anti-correlated relationships among them [2]. For example, improving the quality of translation in a multi-language communication service might result in a significant increase in cost. Therefore, it is necessary to design composite services based on users’ requirements.

We present an example of a multi-language communication service design project, the YMC (Youth Mediated Communication)-Viet project, which aims to assist Vietnamese farmers in accessing agricultural knowledge from Japanese experts [28,29,30, 33]. The YMC-Viet project was conducted in collaboration with the Ministry of Agriculture and Rural Development of Vietnam (MARD) as a model initiative for providing ICT assistance to developing nations. Due to the low literacy rate among farmers in rural areas, literate youths, who are the children of these farmers, serve as intermediaries between the Japanese experts and the Vietnamese farmers. This project was implemented in Thien My Commune, Tra On District, Vinh Long Province, Vietnam, over four seasons from 2011 to 2014, involving 15–30 families of farmers in each season. The YMC-Viet project facilitates communication between Japanese experts and Vietnamese youths through an online tool called the YMC system [44, 45], in which human-in-the-loop composite services are embedded. This system supports multiple languages and allows Vietnamese youths to send field data and questions. The Japanese experts receive these data and questions and respond in Japanese, which is then translated into Vietnamese by the system and delivered back to the youths. The key challenge is to design a multi-language communication service that maximizes the effectiveness of the YMC system.

To design the multi-language communication service, we utilize the Language Grid as the platform for language service composition. Figure 7 illustrates a part of the services available for the YMC-Viet project. With various language resources available on the Internet, such as machine translators, multi-language dictionaries, and parallel texts, users can now design language services to suit their own requirements [34, 39]. However, challenges arise when dealing with the uncertain quality of different language services. For instance, estimating the quality of a machine translation service is always a difficult task. Therefore, it is crucial to develop an approach for designing composite services that can effectively handle the QoS uncertainty.

Fig. 7

Available language services for multi-language agricultural support (cited from [32])

Based on the available services depicted in Fig. 7, several alternative composite services can be employed to support multi-language communication between Japanese and Vietnamese. These alternatives include: (1) a composite machine translation service that integrates Japanese-English machine translation and English-Vietnamese machine translation, (2) a composite Japanese-Vietnamese machine translation service that incorporates an agriculture dictionary, (3) a composite translation service that combines Japanese-Vietnamese machine translation with Vietnamese post-editing by human translators, and so on. However, determining the optimal composite service is challenging due to the uncertain quality of translation services, as previously discussed. Consequently, it is imperative to consider how to design an appropriate composite service that meets users’ requirements. Furthermore, it is likely that a combination of human activities and Web services will be necessary, thereby further complicating the service design process.

4.2 Service Design Process

To address the complex challenges posed by factors such as the QoS uncertainty, the composition of human activities and Web services, and the diverse requirements of users, it is imperative to adopt an iterative service design methodology for composite services prior to their implementation and deployment in the real world. In this regard, it is natural to assess the QoS of the composite services and users’ satisfaction throughout the entire design process.

In this study, we propose a user-centered participatory service design approach to address these challenges. Participatory design has previously been suggested for community informatics [9] and multi-agent systems [19]; its application in service-oriented computing, particularly user-centered design for service composition, is also expected to be effective in addressing the issues above. The proposed service design process includes the following phases:

  • Observation: Investigate and/or update the information of available Web services and human services, establish QoS criteria, and understand users’ QoS requirements for service design.

  • Modeling: Utilize a user-centered approach to identify the most suitable candidate human-in-the-loop composite service that can effectively meet the QoS requirements of users [30].

  • Implementation: Implement the composite service model defined in the previous phase. To facilitate the improvement of system implementations, participatory simulations are conducted prior to their deployment in real-world settings [28].

  • Analysis: Evaluate the implemented service by analyzing the log data of QoS based on the defined evaluation criteria. The findings from this analysis will offer valuable insights and knowledge that can be applied to refine the composite service in subsequent design iterations.

4.3 Experiment, Result and Analysis

We use the YMC-Viet project to illustrate the effectiveness of our proposed approach for human-in-the-loop composite service design [30]. Key elements during the service design process in the YMC-Viet project are as follows.

  • Services for composition. To implement the multi-language communication service, a range of atomic services and composite services are utilized. Table 3 shows a list of Web services provided by the Language Grid and human services used.

  • QoS attributes and QoS data. As previously discussed, QoS within the language service domain encompasses both non-functional attributes, such as translation cost and execution time, as well as functional attributes, such as translation quality. In this study, we have also focused on cost, execution time, and translation quality as the primary QoS attributes. Given the absence of QoS data prior to conducting field experiments, we estimated the QoS ranges for various composite services by simulations.

  • Users’ requirements. The user requires that the translation quality should exceed 4.0 and the cost should be reduced to below 50% of a purely human translation service.

Table 3 List of web services and human services for multi-language communication service design (cited from [30])
Table 4 Composite service processes designed in the YMC-Viet project
Fig. 8

Change of service processes and QoS values with participatory service design in the YMC-Viet project

The user-centered participatory service design approach was employed in the design of the multi-language communication service during the first two seasons’ experiments. The iterative participatory design result, ranging from process P1 to P5, is presented in Table 4. The parallel text service, which was utilized from process P2 to P5, is omitted from Table 4 for simplicity. Figure 8 provides an overview of the QoS values associated with each process outlined in Table 4. Moreover, the refinement of composite service design is depicted, with four iterations observed throughout the experiment: from P1 to P2, from P2 to P3, from P3 to P4, and from P4 to P5. Composite service P5 successfully met the users’ requirements and was adopted as the optimal composite service. P5 combines several human-in-the-loop patterns defined in Sect. 2, including pre-processing and post-processing. As a result, P5 was selected as the composite service model for the implementation of the multi-language communication tool (YMC system) during the field experiment following the second season in 2012. The details of the composite service refinements are described in [29, 30].

The service process implemented in the YMC-Viet project yielded positive outcomes. There are two possible reasons. Firstly, the service process employed in this project was relatively straightforward and not overly complex. Secondly, we were able to leverage valuable insights gained from a prior study that focused on analyzing the QoS in human-in-the-loop language services. These valuable insights played a crucial role in reducing the number of potential composite service models at the initial stage of the project. On the other hand, it is imperative to devise effective techniques for optimizing human-in-the-loop services in situations where the service composition is intricate or when a novel application domain is introduced.

5 Analyzing Crowdsourcing Workflow Models

5.1 Crowdsourcing Workflows

In the previous section, we described the design and implementation of the human-in-the-loop service workflow for multi-language activities. In such workflows, human services performed through crowdsourcing are an attractive source of language services. Since the early 2010s, crowdsourcing has been utilized for a range of open-ended tasks, including writing, design, and translation. One of the advantages of crowdsourcing is its flexibility compared to machine services. However, when it comes to open-ended tasks like translation, the quality of the output from an individual worker cannot be guaranteed due to the varying abilities of crowdsourcing workers. To ensure the desired level of quality, requesters often create a workflow in which the output of one crowd worker is refined incrementally by other workers. While the significance of crowdsourcing workflows has been acknowledged in previous research [24], a comprehensive understanding of the general characteristics of such workflows is still lacking.

Collaboration among workers in crowdsourcing has primarily relied on two processes: the iterative process and the parallel process. In an iterative process, one worker’s task is improved upon by other workers in a continuous manner [24]. On the other hand, crowdsourcing is inherently a parallel process, where multiple workers execute the same task and the final result is determined through voting or other means [25, 37]. Studies on iterative and parallel processes in crowdsourcing workflows have revealed two main findings: (1) the diversity of crowd workers plays a significant role [37], and (2) prior results can negatively impact quality if subsequent workers are led astray in difficult iterative tasks [35]. Previous research has focused on analyzing workflows for specific tasks and has not provided a comprehensive understanding of crowdsourcing workflows. While there have been studies on optimizing workflows, these works have mainly concentrated on optimizing fixed workflow structures such as the number of iterations or degree of parallelism.

To optimize the utilization of crowdsourcing, two key challenges must be addressed. Firstly, it is crucial to develop a mechanism that gives task requesters an accurate estimate of the utility of crowdsourcing; such an estimate would help them decide whether to use crowdsourcing before submitting an actual request. Secondly, an intuitive interface needs to be designed that enables users to request tasks easily.

Consider the scenario in which a requester intends to utilize crowdsourcing for a translation task. The crowdsourcing platform offers a pool of available workers, but the requester cannot determine the suitability of a worker until the task is completed. Since relying on a single worker may not guarantee translation quality, it is important to establish a translation workflow that involves multiple workers performing improvement tasks. In each iteration, a worker enhances the best result from the previous iteration. However, the requester aims to achieve the best outcome while considering the trade-off between cost and quality. Furthermore, the requester needs to decide whether to request a task based on the predicted cost and quality before posting it. Therefore, it is crucial to develop a model that encompasses crowd workers, tasks, and requester utility to gain a comprehensive understanding of crowdsourcing performance in general.

5.2 Modeling Iterative and Parallel Processes

To gain a comprehensive understanding of the crowdsourcing workflow, it is necessary to construct a model that can effectively estimate the utility of the workflow composed of iterative and parallel processes. This model is defined by several key factors, including the distribution of abilities among crowd workers, the level of difficulty associated with the task, and the preferences of the requester.

5.2.1 Workers

It is expected that workers with high abilities will produce high-quality results. For the sake of simplicity, we assume that the quality of a task’s execution is solely determined by the ability of the worker who performed the task. Given that the ability of a worker is not known prior to task execution, we employ a beta distribution to model the distribution of worker abilities. Probability density function f(x|a, v) is given by Eq. (1).

$$\begin{aligned} f(x|a,v)=\textrm{Beta}\left( \frac{a}{\textrm{min}(a,1-a)v},\frac{(1-a)}{\textrm{min}(a,1-a)v}\right) \end{aligned}$$
(1)

Here \(a \in (0,1)\) is the normalized value of the average ability of the workers in the crowdsourcing platform. \(v \in (0,1)\) is a parameter that determines the variance in worker ability. When v is near 0, the variance approaches 0. When v is near 1, the variance approaches the highest variance with average worker ability of a. The model extends the previous work [10] by modifying a parameter that describes the variance of worker ability.
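As an illustration of Eq. (1), the following Python sketch (the function names are our own; we use the standard library's random.betavariate) maps the parameters a and v to the shape parameters of the beta distribution and draws worker abilities from it:

```python
import random

def ability_params(a, v):
    # Map the average ability a and variance knob v (both in (0, 1))
    # to the alpha/beta shape parameters of Eq. (1).
    s = min(a, 1 - a) * v
    return a / s, (1 - a) / s

def sample_ability(a, v, rng=random):
    # Draw one worker ability from Beta(alpha, beta); its mean is exactly a.
    alpha, beta = ability_params(a, v)
    return rng.betavariate(alpha, beta)
```

For instance, with a = 0.7 the sample mean of many draws stays close to 0.7, while v controls how widely individual abilities spread around it.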

5.2.2 Workflows

An open-ended task is completed through repeated improvement tasks, and is thus referred to as an iterative process. In an iterative process, high-quality results are achieved as each new worker improves on prior work. However, there are also instances where multiple workers improve the same task simultaneously, known as a parallel process. Examples of improvement tasks implemented as iterative and parallel processes are reported by Little et al. [35].

We formally define a crowdsourcing workflow as \(w=(p_1, \dots , p_n)\), where n is the number of improvement tasks in the iterative process and \(p_i (1 \le i \le n)\) is the number of workers that execute the ith improvement task in parallel. As a result, the total number of workers in the workflow is given by \({\displaystyle m=\sum \nolimits _{i=1}^n p_i}\).

After each iteration, the best result will be automatically selected. In the case that none of the results have better quality than the input of the improvement task, the input will be designated as the best result.

5.2.3 Improvement Task

Various tasks possess varying levels of difficulty. We assign a parameter \(d \in [0,1]\) to quantify the improvement difficulty of a task. If the improvement difficulty d is 0, the improvement task is extremely easy; in contrast, a task with \(d=1\) is extremely challenging to improve. For example, if the task involves adding a missing caption to an illustration, d would be close to 0, as it is relatively simple for a new worker to improve the quality by providing additional information. Conversely, if the task involves improving the illustration itself, d may approach 1, since it is generally very difficult to improve the output of another designer. For most other types of tasks, such as translation improvement, the value of d lies between 0 and 1. Given the improvement difficulty d of a task, we use the function \(q'(a, q)\) to define the quality of the outcome after executing the improvement task once, where a represents the worker's ability and q denotes the quality of the input result of the current improvement task.

$$\begin{aligned} q'(a,q) = q+(1-q)a-q(1-a)d \end{aligned}$$
(2)

The equation presented above represents the summation of three distinct components. The first component represents the original quality, denoted as q, of the input result for the current improvement task. The second component signifies the increase in quality that occurs following the execution of the improvement task. Lastly, the third component represents the penalty in quality that arises if the improvement fails. We will further explain the second and third components in more detail. If the original quality of the input result is q, then the remaining potential for quality improvement is \(1-q\). The second component, \((1-q)a\), indicates that the extent of improvement is proportional to the worker’s ability, denoted as a. Conversely, \(q(1-a)\) represents the likelihood of improvement failure. When the original quality is high or the worker’s ability is low, the probability of improvement failure increases. The inclusion of the improvement difficulty d in the multiplication of the third component is justified by the fact that a larger value of d corresponds to a higher likelihood of quality deterioration. In other words, tasks with greater improvement difficulty are more prone to a decrease in quality. In the scenario where the improvement task is carried out by a single worker, the expected value of the quality after executing the improvement task is denoted as \(q'(a,q)\), as the expected value of the worker’s ability is a.
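Equation (2) translates directly into code; a minimal Python sketch (the function name is ours):

```python
def improved_quality(a, q, d):
    # Eq. (2): new quality = old quality q, plus a gain proportional to the
    # remaining headroom (1 - q) and the worker's ability a, minus a penalty
    # q * (1 - a) * d incurred when the improvement fails on a hard task.
    return q + (1 - q) * a - q * (1 - a) * d
```

For example, a worker of ability 0.8 raises a result of quality 0.5 to 0.85 when d = 0.5; with d = 0, quality can never decrease.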

Next, we will elucidate the quality improvement achieved through parallel processing. When multiple workers engage in the improvement task simultaneously, the outcome with the highest quality is selected. Consequently, the quality of the outcome is that achieved by the worker with the maximum ability in the iteration. Let p denote the number of workers involved in the improvement task in the current iteration. The maximum ability among these p workers (\(a_p^{\textrm{max}}\)) is estimated as the expected value of the distribution of the maximum (Eq. (4)). Here, F(x|a, v) represents the cumulative distribution function of f(x|a, v), and \(I_x(y,z)\) denotes the regularized beta function, given by Eq. (3).

$$\begin{aligned} I_x(y,z) = \frac{\displaystyle \int ^{x}_{0}t^{y-1}(1-t)^{z-1}dt}{\textrm{Beta}(y,z)} \end{aligned}$$
(3)
$$\begin{aligned} a_p^{\textrm{max}}&= \int ^{1}_{0}x\,\frac{d}{dx}F(x|a,v)^p\,dx \\ &= \left[ xF(x|a,v)^p\right] ^{1}_{0}-\int ^{1}_{0}F(x|a,v)^p\,dx \nonumber \\ &= 1-\int ^{1}_{0}I_x\left( \frac{a}{\textrm{min}(a,1-a)v},\frac{(1-a)}{\textrm{min}(a,1-a)v}\right) ^p dx \nonumber \end{aligned}$$
(4)

Taking \(a_p^{\textrm{max}}\) as a, the quality obtained by parallel processing with p workers will be \(q'(a_p^{\textrm{max}},q)\).
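Rather than evaluating Eq. (4) in closed form, \(a_p^{\textrm{max}}\) can also be approximated by Monte Carlo simulation using only the standard library; the following sketch (names are ours) is one such approximation:

```python
import random

def max_ability_mc(a, v, p, n=20000, seed=0):
    # Monte Carlo estimate of a_p^max in Eq. (4): the expected best ability
    # among p workers drawn from the beta model of Eq. (1).
    s = min(a, 1 - a) * v
    alpha, beta = a / s, (1 - a) / s
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += max(rng.betavariate(alpha, beta) for _ in range(p))
    return total / n
```

As expected, the estimate equals a for p = 1 and grows toward 1 as the parallelism p increases.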

5.2.4 Utility

The objective function for workflow optimization is the utility U that the requester obtains from executing a workflow. Previous studies have assessed the utility of a workflow by considering both the quality of the task and the cost of execution [10, 20]. In this study, utility is defined as the weighted sum of quality (Q) and cost (C) [48]. The requester's preference is represented by the weight \(\beta \) assigned to quality; the weight assigned to cost is thus \(1-\beta \).

$$\begin{aligned} U= \beta Q+ (1-\beta ) C \end{aligned}$$
(5)

\(Q \in [0,1]\) can be obtained from the predicted quality of workflow w. The cost score, \(C \in [0,1]\), is the normalized value given by Eq. (6), where m represents the number of workers and M the predefined maximum number of workers; C is largest when few workers are used and decreases by 1/M with each additional worker, so fewer workers yield a better cost score. It is important to note that the total cost is determined solely by the number of workers and is not affected by whether they are arranged iteratively or in parallel.

$$\begin{aligned} C=\frac{M-m}{M} \end{aligned}$$
(6)
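Equations (5) and (6) combine into a one-line utility score; a minimal Python sketch (the function name and the example value of M are ours):

```python
def utility(quality, m, beta, M=20):
    # Eqs. (5)-(6): weighted sum of quality Q and the normalized cost score
    # C = (M - m) / M, which shrinks by 1/M with every worker added.
    C = (M - m) / M
    return beta * quality + (1 - beta) * C
```

For instance, with Q = 0.8, m = 5, \(\beta = 0.5\), and M = 20, the utility is 0.5 × 0.8 + 0.5 × 0.75 = 0.775.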

5.3 Workflow Optimization

5.3.1 The Search Algorithm

Based on the process model presented above, it is possible to predict the utility of a given workflow. Because the number of potential workflows grows exponentially (with m workers there are \(2^{m-1}\) possible workflows, one for each composition of m), it is crucial to employ an efficient search strategy for workflow optimization. In this regard, we propose a search algorithm that identifies the maximum expected value of utility from a limited search space.
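The exponential size of this search space is easy to verify: each workflow corresponds to a composition of the number of workers m, that is, a choice of cut points among the m − 1 gaps between workers. A short Python sketch (names are ours):

```python
from itertools import product

def all_workflows(m):
    # Enumerate every workflow (p_1, ..., p_n) with p_i >= 1 summing to m,
    # i.e. the compositions of m. Each of the m - 1 gaps between workers is
    # either a cut (start a new iteration) or not (more parallelism),
    # giving 2 ** (m - 1) workflows in total.
    flows = []
    for cuts in product([False, True], repeat=m - 1):
        flow, part = [], 1
        for cut in cuts:
            if cut:
                flow.append(part)
                part = 1
            else:
                part += 1
        flow.append(part)
        flows.append(tuple(flow))
    return flows
```

With m = 5, for example, the enumeration yields 16 workflows, from the fully parallel (5,) to the fully iterative (1, 1, 1, 1, 1).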

We assume that the cost of the workflow is proportional to the number of crowd workers involved. Therefore, when the quality is fixed, the utility of the workflow will monotonically decrease as the number of workers increases. On the other hand, the quality will monotonically improve with an increase in the number of workers. Although there may be occasional failures in the improvement tasks, it is assumed that the result with superior quality is selected when comparing the input and output of an improvement task. Therefore, an increase in the number of workers does not result in a decline in quality. Based on these assumptions, we can see that excessively increasing the number of workers will lead to a decrease in utility, as quality always has an upper limit. That is why there exists an optimal workflow that can maximize utility.

The proposed algorithm for identifying the optimal workflow is presented as Algorithm 1. It operates on a state space in which each workflow is a state. The initial state, \(w = (1)\), consists of a single improvement task performed by one crowd worker and is stored in the state set OPEN. The state space is searched by expanding the contents of OPEN. The expansion procedure expand is outlined in Algorithm 2: given a workflow w, it returns the set W of all workflows generated by adding one crowd worker to w. The function utility in Algorithm 1 takes a workflow w and returns its predicted utility. The search algorithm stores in OPEN only those workflows \(w'\) from the expansion of w whose utility exceeds that of w. This ensures that the search begins from the center of the utility crater and terminates at its rim, avoiding the horizon effect in the state space.

Algorithm 1

Searching Optimal Workflow search

Algorithm 2

Expanding Workflow expand
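The search procedure can be sketched in Python as follows. This is a simplified reconstruction under stated assumptions, not the authors' exact listing: all names are ours, and by default predicted_quality uses the no-variance case \(a_p^{\textrm{max}} = a\), so parallelism adds cost without quality gain.

```python
def improved_quality(a, q, d):
    # Eq. (2): expected quality after one improvement by an ability-a worker.
    return q + (1 - q) * a - q * (1 - a) * d

def predicted_quality(w, a, d, max_ability=None):
    # Predicted quality of workflow w = (p_1, ..., p_n). max_ability(p)
    # should return a_p^max; by default we take the no-variance case where
    # a_p^max = a for every p. The better of input and output is kept.
    f = max_ability or (lambda p: a)
    q = 0.0
    for p in w:
        q = max(q, improved_quality(f(p), q, d))
    return q

def expand(w):
    # Algorithm 2: all workflows obtained by adding one crowd worker to w,
    # either widening an existing iteration or appending a new one.
    grown = {w[:i] + (w[i] + 1,) + w[i + 1:] for i in range(len(w))}
    grown.add(w + (1,))
    return grown

def search(a, d, beta, M):
    # Algorithm 1: start from w = (1,), keep in OPEN only expansions that
    # strictly improve utility, and stop when no expansion does.
    def utility(w):
        m = sum(w)
        return beta * predicted_quality(w, a, d) + (1 - beta) * (M - m) / M
    best = (1,)
    open_set, closed = {best}, set()
    while open_set:
        w = open_set.pop()
        closed.add(w)
        for w2 in expand(w):
            if w2 in closed or sum(w2) > M:
                continue
            if utility(w2) > utility(w):
                open_set.add(w2)
                if utility(w2) > utility(best):
                    best = w2
    return best
```

For instance, with a = 0.5, d = 0, \(\beta = 0.9\), and M = 10, the sketch settles on six sequential improvement tasks, after which the fixed 1/M cost of a further worker outweighs the remaining quality gain.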

5.3.2 Optimality

Here we will discuss the optimality of the workflow search algorithm (Algorithm 1) for crowdsourcing tasks. In a crowdsourcing workflow that consists of iterative and parallel processes, the search algorithm begins with an initial workflow state containing only one crowd worker and gradually expands the state space by adding one crowd worker at each epoch. The search algorithm terminates when the workflow state with the highest utility reaches the optimal workflow based on the given assumptions.

To prove the termination of our search algorithm, we show that the utility gain from adding a worker monotonically decreases as workers are added. Let the expected quality and cost of workflow w with m crowd workers be q and c, respectively. First, we show that the incremental quality monotonically decreases when one crowd worker is added, whether through iteration or parallelism. If the additional worker extends the iteration, the incremental quality is \(\Delta q = a(1-q)-(1-a)qd\), where a and d are constants under the assumption that the additional worker has expected ability a. Since q monotonically increases with each iteration, \(a(1-q)\) monotonically decreases and \((1-a)qd\) monotonically increases, so the incremental quality \(\Delta q\) monotonically decreases. If instead the additional worker increases parallelism, the quality increment depends on the increment \(\Delta a\) of the expected maximum worker ability. Since the expected maximum ability is calculated using the regularized beta function, which satisfies \(I_x(y,z) \le 1\), and the increment of the expected maximum of a beta distribution shrinks as it approaches 1, \(\Delta a\) monotonically decreases as m increases, and \(\Delta q\) again monotonically decreases. Second, the cost increment \(\Delta c\) is constant when one crowd worker is added, so the normalized cost score C decreases by the fixed amount 1/M each time. As utility is the weighted sum of quality and cost, the incremental utility therefore decreases and eventually turns negative.

In summary, the incremental utility monotonically decreases and eventually becomes negative at a certain point. Therefore, the search algorithm terminates under the given assumptions. Furthermore, since the expansion of the workflow state space stops when the incremental utility becomes negative, the workflow state with the maximum utility is obtained when the search algorithm terminates.
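The diminishing return under iteration can also be checked numerically; a small Python sketch (function names are ours) traces the per-iteration gain \(\Delta q\) under Eq. (2):

```python
def improved_quality(a, q, d):
    # Eq. (2): expected quality after one improvement task.
    return q + (1 - q) * a - q * (1 - a) * d

def quality_increments(a, d, steps):
    # Gain contributed by each successive improvement task, starting from
    # q = 0. The increments shrink geometrically toward zero as q approaches
    # the fixed point q* = a / (a + (1 - a) * d) of Eq. (2).
    q, deltas = 0.0, []
    for _ in range(steps):
        nq = improved_quality(a, q, d)
        deltas.append(nq - q)
        q = nq
    return deltas
```

With a = 0.6 and d = 0.5, the gains fall from 0.6 to 0.12 to 0.024 and so on, each one a fifth of the previous, so the accumulated quality converges to the fixed point 0.75.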

It should be noted that the above discussion does not guarantee an optimal solution when increasing crowd workers in real-world crowdsourcing tasks. Instead, the model can calculate the optimal workflow based on predetermined values. However, if the optimal solution search can be conducted efficiently, we can gain insights into the characteristics of crowdsourcing workflows and utilize this knowledge in the design of real-world crowdsourcing tasks.

5.3.3 Analysis of Optimal Workflows

Based on the established model and its optimization algorithm, it is possible to make estimations regarding the utility of each workflow under different parameter settings. In this monograph, we mainly report the experiment that examines the optimal workflows and their utility for different parameter settings. The details of the experiment that compares the performance of iterative and parallel processing methods are described in [11].

Table 5 Optimal workflows in different variations of v and d

We use the proposed search algorithm to obtain optimal workflow w for various combinations of parameters. Furthermore, we calculate the utility of each workflow w. The specific parameter settings used in the experiments are as follows:

 

  • Average ability of workers \(a \in (0,1)\): varied from 0.1 to 0.9 in steps of 0.2.

  • Variance of worker ability \(v \in (0,1)\): varied from 0.1 to 0.9 in steps of 0.2.

  • Improvement difficulty \(d \in [0,1]\): 0 (low), 0.5 (middle), and 1 (high).

  • Preference of the requester over quality \(\beta \): 0.1, 0.5, and 0.9.

 

Table 5 and Fig. 9 present the findings of optimal workflows and their utilities under different settings of the variance of worker ability and improvement difficulty of tasks. The results indicate that as the variance of worker ability increases, optimal workflows tend to exhibit greater parallelism. Additionally, the parallelism of optimal workflows also tends to increase with higher levels of improvement difficulty. The utility of optimal workflows demonstrates an upward trend as improvement difficulty decreases. However, it is worth noting that the utility of optimal workflows can also increase with higher levels of worker ability variance, even in cases where improvement difficulty is high. This is because workflows with a high degree of parallelism are more likely to be optimal solutions, and worker ability becomes more influential when the variance of worker ability is high.

Fig. 9

Utilities of the optimal workflow in different variations of v and d

Table 6 and Fig. 10 present the outcomes of optimal workflows and their utilities under different variations of the average worker ability and quality preference of the requester. The findings indicate that the optimal workflows exhibit the highest level of parallelism when the average worker’s ability is at the intermediate level (i.e., \(a=0.5\)). Additionally, as the average worker’s ability deviates from the intermediate level (either higher or lower), the degree of parallelism in the optimal workflows decreases and iterative improvement becomes more effective. Not surprisingly, optimal workflows involve a larger number of workers when the requester places a high emphasis on quality (i.e., cost has low importance). Furthermore, the utility of optimal workflows is more influenced by the average worker’s ability when the requester prioritizes quality.

Table 6 Optimal workflows in different variations of a and \(\beta \)
Fig. 10

Utilities of the optimal workflow in different variations of a and \(\beta \)

The above analysis can also provide an explanation for previous research findings. For instance, Kittur et al. demonstrated the significance of having a diverse pool of crowd workers in a parallel process [25]. Kamar et al. proposed that increasing the number of crowd workers is an effective strategy, particularly when the cost is relatively low [20]. Further, Little et al. revealed that prior work with poor quality can have a negative effect on the overall quality of the workflow if the crowdsourcing task is difficult [35].

5.4 Implementing Crowdsourcing Workflow Models

Based on the proposed crowdsourcing workflow model and optimization method, we implement a system that facilitates the utilization of workflows for both task requesters and task interface developers [12].

The system consists of two modules: the workflow management module and the task interface module. The workflow management module calculates the optimal workflow by considering the average and variance of workers’ abilities derived from past execution results, as well as an estimation of task difficulty. Requesters can select a workflow that they deem reasonable based on the predicted values of quality and cost. On the other hand, the task interface module is designed to cater to the needs of both requesters and workers.

While the implementation of this module may vary depending on the specific task, communication between requesters and workers remains a common feature across all tasks. The system receives input data through the task interface and communicates with the workflow management module. It is worth noting that the proposed system can be customized to suit typical translation tasks and other applications.

6 Related Work

6.1 Human Activities in Service Composition

Service composition has been a significant topic in the field of service-oriented computing for the past two decades. Various approaches, such as Petri nets, AI planning, formal models, and semantic approaches, have been proposed for service composition [14, 38, 43]. Zeng et al. introduce a multidimensional QoS model for service composition, considering attributes such as execution price, duration, reputation, successful execution rate, and availability [52]. In our work, we consider QoS attributes from both the non-functional aspects and functional aspects. Similarly, Canfora et al. consider application-specific QoS attributes along with general non-functional QoS [6]; they use an image processing workflow as an example, where resolution and color depth are considered application-specific QoS attributes. However, their work primarily focuses on overall QoS computing, while our work addresses the QoS optimization in human-in-the-loop service composition.

Human activities have been studied in the context of workflow management. Zhao et al. propose a formal model of human workflow based on BPEL4People specifications, which uses communicating sequential processes (CSP) to model a human workflow [54]. However, their model does not cover the composition of human activities and Web services. Other research has explored human workflow from the perspectives of organization management [56] and resource management [41]. Moreover, crowdsourcing has emerged as a promising approach for cost-effective task execution since the early 2010s. For instance, crowdsourcing translation has been proposed for building corpora in natural language processing, with a focus on quality management [3, 50]. While these studies discuss the possibility of replacing professional human translators with non-professional crowd workers, our research explores the integration of Web services and human activities to analyze the effects on QoS of composite services.

6.2 User-Centered Composite Service Design

Research on QoS-aware service composition has traditionally assumed that composite services are given in advance. The primary focus is then to select the most suitable set of atomic services based on QoS optimization [7, 15, 36, 46, 52, 53, 55]. Our research differs from previous studies in that we focus on designing composite services in real-world scenarios rather than selecting atomic services for given composite services.

Moreover, most of the previous work overlooks the challenges of handling QoS issues in real service composition environments. Firstly, there are situations where certain QoS attributes cannot be aggregated for composite services. For example, it is difficult to calculate the translation quality of a composite translation service by simply aggregating its component atomic services (e.g., machine translation service, morphological analysis service, dictionary service). Secondly, when multiple QoS attributes are present, maximizing all of them is challenging due to potential anti-correlated relations [2]. Thirdly, QoS values vary with the context of different service invocations, which is known as QoS uncertainty [49]. These issues become even more challenging in the human-in-the-loop composite service design. Therefore, a user-centered service design methodology is crucial when designing composite services, which is the focus of our work [28, 29, 30, 33].
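The second issue can be made concrete with a small sketch: when attributes are anti-correlated (e.g., translation quality versus cost), no single candidate composition maximizes all attributes, and design reduces to choosing among Pareto-optimal candidates. The helper functions and all numbers below are illustrative assumptions, not data from our experiments.

```python
def dominates(a, b):
    """a dominates b if a is at least as good on every attribute and
    strictly better on at least one (higher is better for all attributes)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates not dominated by any other candidate."""
    return [c for i, c in enumerate(candidates)
            if not any(dominates(o, c) for j, o in enumerate(candidates) if j != i)]

# Hypothetical (quality, -cost) pairs for candidate compositions;
# cost is negated so that "higher is better" holds for both attributes.
cands = [(0.9, -5.0), (0.7, -2.0), (0.6, -4.0), (0.8, -3.0)]
front = pareto_front(cands)
```

Here the third candidate is dominated (lower quality and higher cost than another option) and drops out, while the remaining three represent genuine trade-offs that only the user's requirements can resolve.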

6.3 Crowdsourcing Workflow Models

Crowdsourcing workflows are commonly employed to enhance the quality of challenging tasks. They were originally proposed to complete tasks whose quality cannot be guaranteed by a single worker. Quality control of classification or voting tasks through multiple workers can be regarded as a parallel-processing workflow [42]. On the other hand, iterative improvement processes have been proposed to deal with open-ended tasks. Several workflow processes have been proposed to address quality control in specific tasks. For example, Soylent utilizes the Find-Fix-Verify crowd programming pattern to improve output quality by dividing word processing tasks into generation and review stages [5]. Zaidan and Callison-Burch propose a crowdsourcing translation workflow that achieves high-quality translations by aggregating multiple translations, redundantly editing them, and selecting the best results using machine learning [50].
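The Find-Fix-Verify pattern can be sketched schematically as follows. The worker functions are hypothetical placeholders standing in for crowd tasks, not Soylent's actual interface; the sketch only shows the staged structure of the pattern.

```python
from collections import Counter

def find_fix_verify(text, find_workers, fix_workers, verify_workers, agree=2):
    """Schematic Find-Fix-Verify: independent workers flag patches (Find),
    other workers propose rewrites for agreed-on patches (Fix), and a third
    group votes to keep or reject each rewrite (Verify)."""
    # Find: keep only patches flagged by at least `agree` workers.
    flagged = Counter(p for w in find_workers for p in w(text))
    patches = [p for p, n in flagged.items() if n >= agree]
    # Fix: collect candidate rewrites for each agreed-on patch.
    fixes = {p: [w(text, p) for w in fix_workers] for p in patches}
    # Verify: apply the rewrite approved by the most verifiers.
    result = text
    for patch, candidates in fixes.items():
        votes = Counter(c for c in candidates
                        for v in verify_workers if v(patch, c))
        if votes:
            best, _ = votes.most_common(1)[0]
            result = result.replace(patch, best)
    return result
```

Separating generation (Fix) from review (Verify) is the key design choice: it prevents a single low-quality worker from silently degrading the output, which is the failure mode the pattern was introduced to address.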

Translation is used as a typical example throughout our work; it has also been a subject of study in the context of crowdsourcing. Zaidan and Callison-Burch demonstrate the feasibility of crowdsourcing translation through a sequence of tasks in which workers create translation drafts, edit translated sentences, and vote to select the best translation [50]. Ambati et al. propose a combination of active learning and crowdsourcing translation to improve the quality of statistical machine translation [3]. Additionally, Aziz et al. develop and investigate a crowdsourcing-based tool for post-editing machine translations and evaluating their quality [4].

Moreover, various tools have been developed to manage the crowdsourcing of complex tasks. TurKit, for instance, is a toolkit designed for prototyping and exploring algorithmic human computation [35]. CrowdForge decomposes and recomposes complex crowdsourcing tasks based on the MapReduce model [25]. Turkomatic supports task decomposition by crowd workers [26]. CrowdWeaver is a system that visually manages complex tasks and allows task decompositions to be revised during execution [23]. These tools for modeling and managing workflows align with our objective of enhancing the understanding of crowdsourcing workflows. In contrast to such systems, our study provides a theoretical framework for workflow design in crowdsourcing and offers insights into the design of human-in-the-loop services as well.

7 Conclusion

This monograph summarized our research efforts on designing and analyzing human-in-the-loop service compositions. The main contributions are as follows:

  • We studied composite services that compose human activities and Web services, considering both the functional and non-functional QoS attributes. To comprehensively analyze how human activities affect the QoS in such composite services, we conducted extensive experiments in the field of language translation services. Our findings indicated that the integration of human activities and Web services introduces diversity into conventional service processes. Our analysis also revealed that high-quality human activities can significantly enhance various QoS attributes of service processes, whereas low-quality human activities may have negative effects on service processes.

  • We conducted an empirical study on designing human-in-the-loop composite services, considering the uncertain nature of real-world services and the need to satisfy users’ QoS requirements. We proposed an iterative participatory service design process that consists of the phases of observation, modeling, implementation, and analysis. Then, we used a field study of multi-language communication service design to illustrate the effectiveness of our approach.

  • We proposed theoretical approaches to understanding crowdsourcing workflows, using complex translation tasks as an example. We modeled workers and tasks and computed optimal workflows. To confirm the feasibility of this model, we conducted computational experiments that calculated the optimal workflow under various parameter settings. The experimental results were also consistent with existing research. Although this study mainly focused on human activities, there is potential to incorporate the proposed crowdsourcing workflow optimization techniques into human-in-the-loop service design.

The research presented in this monograph was carried out during the 2010s. In recent years, the emergence of cloud computing, edge computing, Internet of Things (IoT), artificial intelligence (AI), and machine learning (ML) has led to a substantial growth in the variety of service types and available services on the Internet. This development has had a significant impact on the research community of service composition.

On the other hand, the increasing demand for advanced intelligent applications in smart cities has highlighted the importance of the human-in-the-loop design methodology, particularly in the fields of IoT, AI, and ML [47, 51]. We expect that the insights gained from our previous research on human-in-the-loop service composition can contribute to these emerging fields.