
1 Introduction and Background

Over the last decade, software companies have significantly changed the way they operate, using platforms as a form of open innovation to expand their markets and stakeholder base, and software usage has increased significantly over the same period. These platforms serve as the foundation for software ecosystems (SECOs), where the platform provider, also known as the keystone organization, collaborates and innovates with other software vendors [1, 2]. Software ecosystems are complex and dynamic systems consisting of software components, platforms, and developers that interact with each other [1]. Companies such as HubSpot, Salesforce, Xero, Slack, Shopify, and Wix have thrived on integration, marketplaces, innovation, and the other qualities that characterize a thriving ecosystem [3].

Various operating system-specific application stores, marketplaces, public review websites, and keystone platforms like Shopify provide user feedback in the form of reviews [23]. Developers rely on this feedback to make informed decisions and prioritize their actions [5]. Recently, there has been growing interest in examining user reviews to extract knowledge about software products and identify areas for improvement. Although previous studies have identified problems and concerns through user reviews [6], our study focuses on reviews that are specific to software ecosystems, since analyzing ecosystems remains a challenge [7].

Several studies have identified various problems in SECOs, such as coordination problems [8], vendor lock-in [9], interoperability issues [10], and project management [11]. The challenges of SECO research include understanding the complex interactions and selection of various stakeholders [12], developing effective governance mechanisms [13], designing appropriate business models [1], and requirements elicitation [24]. The use of Natural Language Processing (NLP) and user review mining has become a popular research topic in software engineering due to the increasing importance of user feedback in software development [14, 15]. This approach involves analyzing user reviews to extract useful information, such as feature requests, bug reports, and user opinions. Work similar to ours has focused on identifying privacy themes from user feedback [16] and classifying advertisement-related reviews [17].

However, analyzing software ecosystem reviews is difficult due to multiple feedback channels and the complex interplay with third-party applications. It can be hard to distinguish if feedback is for a single partner application, multiple applications, or the core platform [18]. Platform providers must rely on partners to gather feedback and make it accessible. The distinction between the core product and partner apps might become unclear, making it challenging for end users to provide feedback and platforms to analyze feedback [19]. To further our understanding of end-user challenges and their mitigation strategies in SECOs, we ask the following research questions:

  • RQ1: What are the different problems faced by end-users in software ecosystems?

  • RQ2: How has the amount of end user feedback in software ecosystems changed over time?

  • RQ3: Are there recommended strategies for mitigating end-user challenges in software ecosystems?

1.1 Research Contributions

Our study provides several contributions. First, we introduce a method for researchers to work with user feedback in SECOs and distinguish SECO-related reviews. Second, we shed light on six areas of end-user concerns in software ecosystems and provide an array of discussion topics and feedback for each area. Third, we reveal how SECO-related feedback has grown over time, which shows the increasing need for studies in this space. Finally, we provide recommendations for developers and owners of software platforms to address these problems and try to prevent them from occurring. The study’s two-part design enhances understanding of end-user concerns and industrial perspectives on software ecosystems, and it can guide platform design toward better ecosystem management and sustainability, given the key roles keystones play in a platform’s success [1, 3, 20].

2 Methodology

We used a mixed-method study design, as summarized and illustrated in Fig. 1.

Fig. 1. Research Design Summary

2.1 SECO Platforms and Dataset Curation

First, we identified 15 popular SECO platforms based on characteristics that define a SECO, such as integration, innovation, interoperability, marketplace, software as a service (SaaS), and integration platform as a service (iPaaS) [1, 2, 3, 21], in addition to the well-defined classification of software ecosystems [4] into software platforms, service platforms, and software standards. We further expanded on the “service platform” class by categorizing platforms according to service sector, selecting one or two platforms for each sector to serve as a baseline for retrieving similar platforms. We picked e-commerce platforms (Shopify and WooCommerce), CRM tools (HubSpot, ZenDesk, and MailChimp), Software as a Service (SaaS) (Salesforce and Xero), Communications Platforms (Slack and Teams), Payment Integration software (Square Up), Integration Platform as a Service (iPaaS) solutions (Zapier), development platforms (Wix and WordPress), and Human Resources Integration Platforms (Bamboo HR).

Table 1. User Feedback Collection

We retrieved applications from mobile application stores (Google Play and App Store) with search queries (regex = “software” + “as a service/platform /ecosystems/integration”) and by retrieving platforms “similar” to the 15 identified baseline platforms using the Python libraries mentioned below. A total of 283 platforms were identified, but only 139 of them were used for analysis because they had SECO-relevant reviews (which we discuss next). We collected user feedback from the sources shown in Table 1, scraping 2,455,285 user reviews in total. The reviews were obtained through manual web scraping of TrustPilot, the google-play-scraper and app-store-scraper Python libraries for the Google and Apple app stores respectively, Kaggle for Shopify store reviews, and directly from organizations. We combined everything into a single dataset with the attributes ‘source’, ‘platform’, ‘review content’, ‘review date’, and ‘developer response’.
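As an illustration of the final merging step, the sketch below maps raw records from different sources onto the five attributes listed above. It is only a minimal sketch: the per-source field names in `FIELD_MAPS` and the helper `normalize` are hypothetical, chosen for the example, and do not describe the scripts actually used in the study.

```python
# Hypothetical mapping from each source's raw field names to the unified schema.
# The raw field names here are assumptions made for illustration only.
FIELD_MAPS = {
    "google_play": {
        "review_content": "content",
        "review_date": "at",
        "developer_response": "replyContent",
    },
    "trustpilot": {
        "review_content": "text",
        "review_date": "date",
        "developer_response": "reply",
    },
}


def normalize(raw, source, platform):
    """Convert one raw review record into the unified five-attribute schema."""
    field_map = FIELD_MAPS[source]
    return {
        "source": source,
        "platform": platform,
        "review_content": (raw.get(field_map["review_content"]) or "").strip(),
        "review_date": str(raw.get(field_map["review_date"]) or ""),
        # Stays None when the developer did not respond.
        "developer_response": raw.get(field_map["developer_response"]),
    }


def build_dataset(batches):
    """Merge (source, platform, raw_records) batches into one list of rows."""
    rows = []
    for source, platform, raw_records in batches:
        rows.extend(normalize(raw, source, platform) for raw in raw_records)
    return rows
```

Keeping the source-specific knowledge in one mapping table means that adding another feedback channel only requires a new entry rather than a new code path.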

2.2 Identifying SECO-Related Reviews

To manually determine whether a review is SECO-related, we read reviews in detail to understand the context of the user comments, and we employed pair coding and Cohen’s kappa coefficient [22] in the process. The classification was further refined by utilizing SECO-related keywords such as “platforms,” “integration,” “API,” “ecosystems,” “plugins,” and “sync.” These keywords were instrumental in distinguishing SECO reviews from non-SECO reviews and were manually validated based on contextual understanding. For instance, reviews containing contextual clues such as integration issues, third-party app names, and plugin names were classified as SECO-related. Conversely, reviews that lacked explicit SECO-related terminology, such as those discussing poor app performance or usability issues, were classified as non-SECO reviews. Some reviews, like “the platform constantly crashes on my older iPhone..”, that at first appeared to be SECO-related were also classified as irrelevant, as they do not describe a specific challenge regarding use of the platform but rather make a generic comment about compatibility.

We began by creating a subset of 500 random reviews, ensuring an equal distribution of reviews across each rating from 1 to 5. A second coder and one of the authors independently labeled the same 500 reviews over 5 iterations of 100 reviews each, yielding an agreement score that increased with each iteration and saturated at 0.81, indicating a high level of agreement. Having built a shared understanding of what a “SECO-related review” is, we then split 6000 random reviews (1200 for each rating from 1 to 5) for labeling. Upon combining the initial 500 reviews and the 6000 labeled reviews, a total of 848 SECO-related reviews were identified. Reviews like “Nothing but issues with this platform. You change a setting and it doesnt work on *third-party app name*, fix it on *plugin name* and the platform changes it back!! Terrible Customer service dont help much, just tell you to speak to *platform name*! Who say its an integration issue. Wasted two days trying to integrate this and would have been quicker doing it all manually!” were marked as SECO reviews, whereas reviews like “Its a very useless app. It cannot run in normal internet speed. It’s a lot of confusion to use this app. It buffers a lot while attending class” were marked as not relevant.
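For readers unfamiliar with the agreement measure, the following is a minimal sketch of how Cohen’s kappa can be computed for two coders who labeled the same reviews. The function name `cohen_kappa` and the toy labels are illustrative; the paper does not state which implementation was used.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items on which both coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # Both coders used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Toy example: 1 = SECO-related, 0 = not relevant (made-up labels).
coder_1 = [1, 1, 0, 0]
coder_2 = [1, 0, 0, 0]
print(round(cohen_kappa(coder_1, coder_2), 2))
```

Because kappa discounts the agreement expected by chance, it is a stricter measure than the raw percentage of matching labels, which matters here since SECO-related reviews are a small minority of the sample.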

We then trained an XGBoost classifier [25] on the 6500 labeled reviews using a standard 80:20 train-test split. The model achieved 0.97 accuracy, 0.99 precision, 0.80 recall, and a 0.89 F1-score, indicating high accuracy and reliability [26]. After applying this classifier to the 2.4 million reviews, we were left with 40,261 SECO-related reviews from 139 platforms. Table 1 shows a breakdown of the reviews retained from each source.
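The relationship between the reported metrics can be checked with a short calculation. The sketch below defines precision, recall, and F1 from confusion counts and recomputes F1 from the rounded precision and recall above; the helper names are illustrative and the confusion counts in the test are made up.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 from the rounded values reported in the text.
print(f1_score(0.99, 0.80))
```

With the rounded inputs this gives roughly 0.885, which is consistent with the reported 0.89 once rounding of precision and recall is taken into account. The gap between precision and recall suggests that the classifier rarely mislabels a general review as SECO-related but misses about one in five truly SECO-related reviews.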

2.3 Manual Multi-class Labeling

From the 40,261 SECO-related reviews, we selected a rating-balanced set of 2000 reviews for manual labeling and subsequently labeled 3000 more. We listed 6 common SECO issue themes and performed single-label, multi-class, manual classification following a well-practiced card-sorting technique [27]. Relevant keywords were created by observing term frequencies using a TF-IDF vectorizer [28] and by manual observation. Categories and their keywords included: Integration: integration, API, plugin, sync; Customer Support: customer, support, representative, speak; Design & Complexity: interface, confusing, easy, hard, design, customization; Privacy & Security: privacy, security, beware, fake, scam, login, authentication, password; Cost & Pricing: price, cost, refund, expensive, charge, buy, payment, credit, card, merchant, money; Performance & Compatibility: device, phone, slow, responsive, frequent, audio, video, crash, desktop, web, mobile, quality. We used these keywords to label the 3000 additional reviews. A review was assigned to a class with high confidence when at least 2 of the class keywords were present in the review. If fewer than two keywords matched, at least one keyword needed to match for the review to be assigned to that class. If none of the keywords matched, the review was classified as ‘Other’. We manually verified 200 randomly selected reviews and observed that all of them accurately represented SECO-related concerns, without any major overlap between categories, when filtered with at least 2 matching keywords.
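The keyword rule described above can be sketched as follows. The keyword lists are taken from the text; the tokenization, the tie-breaking order (first category in the dictionary wins), and the name `label_review` are assumptions made for this example rather than details reported in the study.

```python
import re

# Keyword lists as given in the text, lowercased for matching.
KEYWORDS = {
    "Integration": {"integration", "api", "plugin", "sync"},
    "Customer Support": {"customer", "support", "representative", "speak"},
    "Design & Complexity": {"interface", "confusing", "easy", "hard", "design",
                            "customization"},
    "Privacy & Security": {"privacy", "security", "beware", "fake", "scam", "login",
                           "authentication", "password"},
    "Cost & Pricing": {"price", "cost", "refund", "expensive", "charge", "buy",
                       "payment", "credit", "card", "merchant", "money"},
    "Performance & Compatibility": {"device", "phone", "slow", "responsive", "frequent",
                                    "audio", "video", "crash", "desktop", "web",
                                    "mobile", "quality"},
}


def label_review(text):
    """Return (category, confidence) using the two-keyword / one-keyword rule."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    matches = {cat: len(tokens & words) for cat, words in KEYWORDS.items()}
    # Assumption: ties are broken by dictionary order, which the paper does not specify.
    best = max(matches, key=matches.get)
    if matches[best] >= 2:
        return best, "high"
    if matches[best] == 1:
        return best, "low"
    return "Other", "none"


print(label_review("The API sync keeps failing after every update"))
```

Exact token matching is a simplification: inflected forms such as “crashes” or “charged” would not match without stemming, so a practical version would normalize words first.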

2.4 SECO Challenges Classifier and Analysis Method

We used XGBoost as the primary classification model to classify reviews into the different categories. The dataset of 5000 labeled reviews was preprocessed using widely used features of the NLTK toolkit. We performed a train-test split with the frequently used ratio of 80:20. We used precision, recall, and F1-score as evaluation metrics to measure the performance [26] of the model in the different categories. The XGBoost model achieved an accuracy of 0.93, with a macro-average precision of 0.92, recall of 0.89, and F1-score of 0.90, as shown in Table 2, which indicates that the model was able to classify the reviews into the different categories with very high accuracy. To validate the performance of the model, we manually verified a sample of 50 reviews from each category, which resulted in an accuracy of 91 percent. We also compared the XGBoost model’s performance with similar classification models. XGBoost performed best with an accuracy of 0.93, while Linear SVC and Random Forest achieved accuracies of 0.84 and 0.82, respectively. These results suggest that XGBoost is effective for classifying reviews into the different categories.
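Since the reported precision, recall, and F1-score are macro averages, the sketch below shows how such averages are computed in the multi-class case: each metric is calculated per category and then averaged with equal weight. The function `macro_scores` and the toy labels are illustrative only.

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    count = len(labels)
    return sum(precisions) / count, sum(recalls) / count, sum(f1s) / count


# Toy example with two made-up categories.
truth = ["Integration", "Integration", "Pricing", "Pricing"]
predicted = ["Integration", "Pricing", "Pricing", "Pricing"]
print(macro_scores(truth, predicted))
```

Macro averaging gives small categories the same weight as large ones, which is relevant here because the category sizes differ considerably.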

Table 2. Classification Report

We applied the classifier to the 40,261 software ecosystem reviews. We identified the most relevant and frequently occurring terms (also referred to as features) using a set of negative reviews for each category. The set of negative reviews belonging to each category was selected using VADER Sentiment, keeping reviews with a negativity score of over 0.4. The features present in those reviews were extracted using TF-IDF. In Eq. 1, t is a term (word), d is a document, D is the corpus (collection of documents), ‘tf’ is the term frequency, and ‘idf’ is the inverse document frequency [28].

$$\begin{aligned} \text {tf-idf}(t,d,D) = \text {tf}(t,d) \cdot \text {idf}(t,D) \end{aligned} \tag{1}$$
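Equation 1 can be illustrated with a small, dependency-free sketch. The exact tf and idf variants are not specified in the text, so this example assumes relative term frequency and the unsmoothed idf(t, D) = ln(N / df(t)); library implementations such as scikit-learn’s TfidfVectorizer use a smoothed variant by default and will give different values.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """tf-idf(t, d, D) = tf(t, d) * idf(t, D) for tokenized documents.

    Assumes tf = relative frequency of the term in the document and
    idf = ln(N / df), where df is the number of documents containing the term.
    """
    if not doc:
        return 0.0
    tf = Counter(doc)[term] / len(doc)
    df = sum(term in d for d in corpus)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Toy corpus of three tokenized reviews (made-up data).
corpus = [["api", "error", "api"], ["slow", "app"], ["api", "sync"]]
print(tf_idf("api", corpus[0], corpus))   # frequent in the review, common in the corpus
print(tf_idf("slow", corpus[1], corpus))  # rarer in the corpus, so weighted higher
```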

The reviews were preprocessed by removing non-English words and stop words and then tokenizing the text. We then performed a chi-squared analysis to measure the association between each feature and its corresponding label. Chi-squared analysis is a popular method not only for hypothesis validation but also for feature selection and for computing the association between features and their labels [29]. It can be implemented using the formula in Eq. 2, where \(\chi ^2\) is the chi-squared statistic, n is the number of categories, \(O_i\) is the observed frequency in category i, and \(E_i\) is the expected frequency in category i.

$$\begin{aligned} \chi ^2 = \sum _{i=1}^{n}\frac{(O_i - E_i)^2}{E_i} \end{aligned} \tag{2}$$
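As a worked instance of Eq. 2, the sketch below computes the statistic for a contingency table of observed counts, deriving the expected counts under independence from the row and column totals. Framing the feature-category association as a 2×2 table (feature present or absent versus review in or out of the category) is an assumption made for this example; the paper does not describe its exact tooling, and the counts are made up.

```python
def chi_squared(observed):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if feature and category were independent.
            e = row_totals[i] * col_totals[j] / total
            if e > 0:
                statistic += (o - e) ** 2 / e
    return statistic


# Rows: feature present / absent; columns: review in category / not in category.
table = [[30, 10],
         [20, 40]]
print(chi_squared(table))
```

A larger value indicates that the feature appears in the category far more (or less) often than chance would predict, which is why the statistic can be used to rank candidate pain points.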

2.5 Interviews

Having identified these challenges, we also conducted qualitative research through semi-structured interviews [30] to derive and articulate a set of mitigation strategies. Four platform executives were selected for the interviews based on their roles, positions, and platform profiling (anonymized as P1, P2, etc.) as shown in Table 3. The selection used purposive sampling [31]. The interviewees were asked questions about monitoring user feedback, ensuring seamless integration, recommended strategies for solving challenges, managing an evolving marketplace of vendors, and other questions relating to the findings from RQ1.

Table 3. Interviewee Profile

The interviews were conducted following ethical principles, including informed consent, confidentiality, and privacy, as per a university-approved research ethics application. The data collected from the interviews were transcribed, sorted, and analyzed using a thematic analysis approach [32], which enabled us to identify and analyze the themes and patterns in the data related to how companies identify and address issues related to software ecosystems through user feedback.

3 Findings and Discussion

3.1 Distribution

Out of 40,261 reviews, ‘Integration’ has the highest proportion of software ecosystem reviews at 28.85% with a 4.26/5 median rating. ‘Customer Support’ is the second highest category at 17.67% with a 3.72/5 median rating, followed by ‘Design and Complexity’ at 8.35% with a 4.47/5 rating. ‘Privacy and Security’ has the lowest rating at 2.87/5 with 4% of the reviews, ‘Cost and Pricing’ has 6.74% with a 3.67/5 rating, and ‘Performance and Compatibility’ has the lowest proportion of reviews at 2.80% with a 3.78/5 median rating. SECO reviews not fitting into any of the six categories were classified as ‘Other’, which accounts for 31.58% of the reviews and leaves room for future work to introduce additional categories.

3.2 RQ1: End-User Pain-Points in SECOs

In this section, we present the findings from reviews for all classified areas of SECO issues. In order to extract the pain-points (features), we performed the following set of operations: Let C be a set of reviews with respective category IDs, where review \(r_i\) has a sentiment score \(s_i \in \{\text {positive score}, \text {negative score}\}\). Let \(C = \{(l_i, R_i) \mid i = 1,2,\dots ,n,\ s_i = \text {negative score} > 0.50\}\) be the set of negative reviews. Let \(L = \{l_1,l_2,\dots ,l_n\}\) be the set of categories present in C. Define \(R_i = \{r_j \mid r_j \in R_i \text { and } s_j = \text {negative}\}\) as the set of negative reviews belonging to category \(l_i\). Define \(\text {TF-IDF}_c : R_c \rightarrow F\), where \(F = \{(r,f) \mid r \in R, f \in W\}\) is the set of review features for all reviews in C. Let \(F_l' = \{f \mid (r,f) \in \text {TF-IDF}_c(R), r \in R_l\}\) be the set of features present in reviews of category l. Let \(\chi ^2 (f,l)\) be a statistical measure of association between feature f and category l. Then, the set of categories and their top 100 features with a \(\chi ^2 (f,l)\) is given by:

\((\text {Labels}, (\text {feature}, \text {score}))_{[1,100]} = \{(l, F_l', \chi ^2 (f,l)) \mid l \in C, f \in F_l', \chi ^2 (f,l)\}\).
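Read procedurally, the definitions above amount to two steps: keep the negative reviews of each category, then rank each category’s features by their association score and keep the top 100. The sketch below shows these two steps under simplifying assumptions: the record fields (`category`, `neg`, `text`) and both function names are hypothetical, and the per-feature scores are taken as given rather than computed.

```python
from collections import defaultdict


def negative_reviews_by_category(reviews, threshold=0.50):
    """Group the texts of reviews whose negativity score exceeds the threshold."""
    grouped = defaultdict(list)
    for review in reviews:
        if review["neg"] > threshold:
            grouped[review["category"]].append(review["text"])
    return dict(grouped)


def top_features(scores, k=100):
    """For each category, keep the k features with the highest association score."""
    return {
        category: sorted(feature_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
        for category, feature_scores in scores.items()
    }


# Made-up records and scores for illustration only.
reviews = [
    {"category": "Integration", "neg": 0.7, "text": "api error again"},
    {"category": "Integration", "neg": 0.2, "text": "works fine"},
]
scores = {"Integration": {"api error": 12.0, "sync": 3.5, "plugin": 7.1}}
print(negative_reviews_by_category(reviews))
print(top_features(scores, k=2))
```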

Integration. The first category of pain points in software ecosystems is related to integration, with the most common issues being problems with integration and a “lack” of integration altogether. These are followed by “cross-platform issues”, “API errors”, and “API key” problems. Users are frustrated with the difficulty of integrating different software components and systems, which leads to inefficiencies and lost productivity. One of the most common integration complaints is regarding “Facebook API” errors. Similarly, integration errors with “Google API” caused issues with SEO and other critical aspects of online business. Another common integration issue mentioned in the data is the lack of “PayPal integration”. “Mailchimp integration” and “Outlook Integration” are other common issues that cause problems with email marketing campaigns. Several of the pain points in this category are related to specific platforms, such as “Android integration”. The pain points related to integration in software ecosystems can have significant impacts on software architecture [33].

Customer Support. The second category of pain points in software ecosystems is related to customer support, as extracted from SECO-related reviews. The top pain point in this category is “worst customer service”, followed by “impossible to reach”, “service joke/rude”, and “speak English”, indicating significant dissatisfaction among users with the customer service provided by the software ecosystem. Other pain points include difficulty reaching customer support and poor quality of service. Customers seem to prefer speaking to “real humans” over “chat”. Poor customer service could result in lost customers and damage to the organization’s reputation. Platforms may need to invest in better support channels to ensure that users and third-party developers have access to the help they need. Overall, the problems identified suggest that users are dissatisfied with the customer support provided by the platforms in a variety of ways.

Design and Complexity. In our study, the most frequent pain point in the user experience category is around the topic of “bad user interface”. This can be evaluated in several ways from previously established theories [34] and our own findings such as problems in “sorting” and “ads”. Some of the other topics provide more specific examples of what users find challenging about the software interface. For example, the “mobile app interface” topic showed that users have difficulty with software that is primarily mobile-based. The “web interface” related reviews mentioned that users find web-based software challenging to navigate. Additionally, “interface slow” and “lags” indicate that users have problems with the performance of the software. Issues such as “desktop interface” and “other app easy” indicate that users have trouble with desktop-based software and that they may compare it unfavorably to other, more user-friendly applications. The topics in this category suggest that users find software with bad or confusing user interfaces frustrating and difficult to use, which can lead to decreased productivity, innovation, and satisfaction with the software.

Privacy and Security. Privacy and security are critical concerns for most software users, especially in the e-commerce platform realm [35]. Users are often hesitant to trust a platform with their personal and sensitive information [36], and the reviews in this category reflect that. The features discussed in this category include “possible scams”, “fake apps”, and “fake reviews”, all of which suggest that users are worried about the legitimacy of the platform and the third-party apps they are using. Some important pain points in this category were “impossible login” and “keeps asking for passwords”, indicating that users are struggling to access their accounts. An interesting issue topic identified is “data mining”, showing that users are concerned about how platforms are mining their personal data. Other pain points in this category relate to user authentication and security measures. The issue topic of marketplace scammers suggests that users are worried about fraudulent third-party marketplace sellers on the platform. Platforms that can address these concerns and implement robust security measures by clearly stating policies, increased lucidity, and readability are likely to have happier and more trusting users [37].

Cost and Pricing. Pricing is an important characteristic of an ecosystem marketplace [38, 39]. This category focuses on the cost and pricing structures of SECOs. The main pain points raised by users were related to “losing money”, “issues with credit card payments”, and “expensive fees”. The reasons for this were “unexpected charges”, “hidden fees”, and ineffective “refund policies”. The pain point “credit card” had a significant association score, indicating that users had issues with their card payments. The pain point “waste money” indicated that users felt that they were spending money on a product that was not worth the cost. Other pain points related to cost and pricing include “refund impossible”, “prices expensive”, “fees expensive”, and “charged accounts”. These issues suggest that users lost trust in the company, were dissatisfied with the pricing and fees associated with the platforms and their services, and had difficulty obtaining refunds or finding affordable alternatives.

Performance and Compatibility. Although companies increasingly choose cross-platform development over native development [40], the most significant pain points in this final category seem to be “web interface” and “device version”, followed closely by the topics “multiple devices” and “loss connection”. These pain points suggest that users are experiencing sync and connectivity issues across web, desktop, and mobile versions of the platform. Another common topic in this category is “mobile website”, suggesting that users are having difficulty accessing and using the software ecosystem on their mobile devices. The pain point “loss data work” suggested that users are experiencing data loss or data corruption while using the software ecosystem. Other pain points in this category included “video audio quality”, “lost quality”, “iPhone iPad issue”, “don’t trust app”, “phone horrible”, “buggy slow”, “app crashes constantly”, “web version”, “loss clients”, “phone laptop”, “sort problem”, and “messed website”. These pain points suggest that users are experiencing issues with the overall functionality and reliability of the software ecosystem, causing them to lose trust in platforms and, in some instances, causing businesses to lose clients.

3.3 RQ2: Growth in SECO Feedback Over-Time

We analyzed the change in SECO-related review numbers over time by mapping the reviews from January 2013 to December 2022. We grouped the reviews by month and counted the number of reviews in each month. We calculated the median count for all categories. Reviews from before 2013 and from 2023 were discarded due to their insignificance in number.

Fig. 2. Change in SECO reviews over time

We can observe from Fig. 2 that there has been a significant rise in software ecosystem reviews in the last decade, with reviews regarding SECOs starting to grow significantly from 2016 onwards. The number of SECO reviews increased from 51 in 2013 to 4,610 in 2022, with the highest growth occurring between 2016 and 2020. In 2020, the number of reviews increased by 130.08 percent over 2019, but in 2022 it declined by 26.75 percent compared to the previous year. The average growth rate from 2018 to 2022 was 258.11 percent. From our interviews, we confirmed that platform organizations faced an increasing demand for integration tools and customer support during the COVID-19 pandemic.
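The counting and growth-rate calculation behind these figures can be sketched as follows. This is a minimal illustration assuming review dates are available as ISO-formatted strings (YYYY-MM-DD); the function names and the example counts are made up and are not the study’s data.

```python
from collections import Counter


def monthly_counts(dates):
    """Count reviews per month from ISO date strings (YYYY-MM-DD)."""
    return Counter(date[:7] for date in dates)


def yearly_counts(dates):
    """Count reviews per calendar year from ISO date strings."""
    return Counter(int(date[:4]) for date in dates)


def growth_rates(counts):
    """Percentage change in review count relative to the previous recorded year."""
    years = sorted(counts)
    return {
        year: (counts[year] - counts[prev]) / counts[prev] * 100
        for prev, year in zip(years, years[1:])
        if counts[prev]
    }


# Made-up yearly counts for illustration only.
example = {2019: 200, 2020: 500, 2021: 400}
print(growth_rates(example))
```

A negative value simply means fewer reviews than in the preceding year, so a decline is easier to read as “decreased by x percent” than as a negative growth rate.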

3.4 RQ3: Mitigation Strategies for Platforms

Here, we present the findings from our interviews with platform owners in the form of recommendations. The interviewees also fully validated the challenges discussed earlier.

API First Approach. Application Programming Interface (API) first development is a strategy that focuses on building the API first before allowing third-party developers to make an integration request. This prevents organizations from having to implement one-off integration specific to the developer request. For example, the VP of Engineering from P2 said “...small startups have an API first mentality. It’s in the DNA of the company that they’re building an API so that they don’t run into one-off issues.”, which potentially addresses the most talked about API-related end-user concerns such as “lacks integration”.

User and Developer Communities. Mitigating customer support and other end-user problems in a software ecosystem requires actively engaging the user community, supporting developers, continuously improving the platform, and fostering collaboration and partnerships. These strategies help address issues, enhance the user experience, and align with evolving integration requirements, as quoted by P4’s CTO, “..an ecosystem doesn’t thrive if there’s no community for all the stakeholders..”.

Third-Party App Control. Platform owners should mitigate security and financial risks and issues in their ecosystem by implementing a strict vetting process, continuously monitoring and auditing third-party apps, incentivizing safe and high-quality apps through pricing strategies, and providing developer support and resources. Platform P3’s advocate says “If somebody had essentially abandoned all supported their app and they would be removed from our marketplace” which ensures compliance and monitoring in the marketplace.

Feedback-Driven Approach. To effectively mitigate design, complexity, and performance issues, adopting a feedback-driven approach is a valuable strategy for platform owners. Implementing tracking tools, actively soliciting and carefully prioritizing feedback, incorporating user and developer input into the development process, and maintaining transparent communication channels are advisable. As the CTO of P4 mentioned, “We monitor user interactions within the apps. We get notices of, like rage clicks, things like that, where they go.”

Cross-Platform Development. Platform owners should prioritize cross-platform development and utilize progressive web apps (PWAs) to enhance the platform’s accessibility and provide a consistent user experience across different devices. To quote P1’s CTO, “We would consider like a cross-platform Progressive Web App To make everything work with mobile devices across the board”. By extending the platform’s reach and maintaining competitiveness through cross-platform development, platform owners can attract a wider audience and mitigate platform-specific user issues.

Documentation and Guidelines. Platform owners should prioritize comprehensive documentation, accessibility, quality and security guidelines, and developer support in optimizing the utilization of the platform’s API. By providing clear instructions, easy access, and assistance to developers, platform owners can foster a collaborative and productive developer community, resulting in high-quality integration and improved platform success, as P3’s advocate said, “It starts with having really clear ATP documentation. I think having that publicly available, they start first ideating about the process.”

User Data Management. By providing transparent policies, establishing efficient incident response processes, prioritizing user privacy, and adhering to relevant regulations, platform owners can foster trust, protect user information, and mitigate potential risks associated with data breaches or non-compliance. For example, P1’s CTO said, “We don’t hold the client information in our databases for any longer than, you know, The lifetime of an order which is the lifecycle of the data.”, and P2’s VP of engineering mentioned “Good user data management practice such as streamlined SSO authentication is a good practice to resolve integration as well as privacy issues”, meaning platform owners must ensure that third-party applications delete user data when it is no longer needed, and secure authentication practices must be implemented.

4 Implications

This study represents a first large-scale investigation of end-user challenges in software ecosystems. We presented a method for identifying user feedback that distinguishes SECO-related reviews from general reviews by using methods explained in Sect. 2.2. We also identified that integration issues, customer support, the complexity of design and user interface, issues with privacy and security, pricing issues, and platform compatibility are problem areas in software ecosystems, as well as a set of recommendations to mitigate these challenges. This study has significant implications for SECO researchers, highlighting unexplored end-user challenges and the lack of prior research. The temporal growth of SECO-related reviews, particularly during the COVID-19 pandemic, underscores the dynamic nature of SECOs. The study’s recommendations offer actionable guidance for both researchers and industry stakeholders.

5 Threats to Validity

The study’s results may be influenced by the varied quality and accuracy of data from different sources and by the limited number of interviews. The user feedback, mainly from mobile app reviews, may not fully represent all users across various software platforms. The data, although extensive, was selectively scraped from certain platforms, potentially limiting its applicability to diverse software ecosystems, especially open-source software. The identification of software ecosystem-related issues was crucial to the analysis and is therefore a potential threat to construct validity. However, we consider the pair-coding approach with inter-rater agreement the most suitable way of initially establishing what a SECO review is. In addition, we manually inspected the results of the automated classification to confirm its accuracy, alongside the strong evaluation results of the classifier.

6 Conclusion and Future Work

This study provides a valuable contribution to the existing knowledge of end-user concerns and the industrial perspective on software ecosystems. By identifying key issues and providing recommendations in several aspects of a SECO platform, our findings can guide platforms in designing and fostering better ecosystems. The methods and techniques used in this study can serve as methodological guidance for future research in this space.

Future work could expand the scope of the study to include more ecosystem platforms and user reviews. The two machine learning classifiers could be further refined to improve their accuracy, first in identifying which feedback is SECO-related, and second in categorizing SECO reviews according to the proposed problem areas. Additional problem categories could be identified and analyzed. The effectiveness of the suggested mitigation strategies could be evaluated through implementation and user feedback. Longitudinal studies could be conducted to track the changes in user challenges and developer responses over time.