1 Introduction

Communities that enable open collaboration rely on their collaborators [8, 19]; having active contributors is essential for sustainability. Enhancing online community engagement is one approach to motivating members to contribute [2] and to improving usability. Engaged users are motivated and perceive themselves to be in control of the interaction [17]. Engagement is a category of user experience characterized by the attributes of feedback, challenge, positive affect, aesthetic and sensory appeal, attention, variety/novelty, interactivity, and perceived user control [17]. Feedback is the engagement attribute related to the need for collaborative awareness of what is happening in the online environment. It helps to establish common ground [6], a shared background of understanding that supports user interaction. Because users need to understand how their actions affect the system and the other community members with whom they interact [5], notifications are usually designed to provide current and relevant information efficiently [4, 5].

In remote collaboration, it is difficult to know where individuals are currently focusing their attention. People often fail to realize when common ground is non-existent or insufficient during online collaboration [4, 5].

A variety of features have been developed to allow people to maintain online awareness of information of interest, for instance, notification systems. Knowledge gained from notifications can help users plan future tasks, interact socially with others, and complete simple tasks in a timely manner [5]. Notifications improve awareness of what is happening in the system, but does their use increase engagement with open collaboration online communities?

In this paper, we studied the effects of notifications on user engagement in an open collaborative system. The research objective was to understand whether notifications would affect users' engagement with an open collaboration system for architecture image sharing (http://arquigrafia.org.br). The notification was the first feature introduced with the intent of contributing to engagement, seeking to improve the feedback on each action the user performs in the system and, consequently, to support awareness of what happens in the collaborative environment.

2 Awareness and Notification

People working collaboratively must establish and maintain awareness of one another's intentions, actions, and results. Connecting individuals, peers, and social groups as part of their own feedback loops with technology has great potential for learning, motivation, and creativity [3].

Notification systems are typically triggered by users' task events, as in mail alerts and status updates. They therefore support awareness of collaborators' presence, tasks, and actions, helping to keep people aware of events beyond their current interactions. In many cases, the notification functionality supports collaborative awareness [5].

Carroll [4] presents a conceptual model of communities based on community identity; participation and awareness; and social networks, in which participation and awareness are directly related to engagement. Users need notification systems that keep them informed about (i) what is happening to the objects they care about; (ii) what actions are being taken to access or modify those objects; and (iii) who is performing these actions. The relevant information may be a discrete event or a series of events [5, 16].

The success of notification systems depends on supporting users' attention as they shift between tasks, while simultaneously allowing them to evaluate the utility of a notification by accessing additional information. Notifications should ideally cause minimal distraction from the user's primary task [13]. However, some notification systems are designed to attract users' attention and get them to perform a task, such as reminding them of a commitment. Examples of notification systems include instant messaging, system and user status updates, e-mail alerts, and news [14].

The benefit of a notification depends on the content of the message, its structure and style, and the relationships between messages. The benefit may also vary across users and situations. Therefore, the ratings users give a notification can differ completely from its perceived benefit [21].

According to Sousa et al. [20], personal and business relationships are built on systems that aggregate a variety of contexts and configurations, establishing new interaction scenarios that bring technology, applications, and users together into a common space. This space becomes an aggregator of individuals and actions that enables certain behaviors, such as sharing, the formation of new connections, and proposals for learning and participation by each individual involved [12].

In this context, Millen and Patterson [15] argue that shared online spaces need to be designed to support social engagement, and that notification is a particularly important design feature. In addition, prolonged silence in the system has a negative impact, as it exposes the inactivity of the community. Daily, ongoing activities are important to sustain community participation, and it is important that members become aware of this activity.

3 Related Work

A large body of literature seeks to describe the factors that contribute to specific online behaviors, such as the frequency of participation measured by message posts [15]. Carroll et al. [5] studied a virtual school system to identify key aspects of awareness in collaborative situations, understand usability issues, and explore how notification systems can be designed. When analyzing integrated event logs, they found that the interaction flow with notification systems affects users' ability to collaborate and to maintain awareness in the system. As a result, the authors presented notification design strategies to better support collaborative activities.

Vastenburg et al. [21] present the results of a controlled laboratory study in which ten participants performed routine household activities and subjectively assessed factors expected to influence the acceptability of notifications. All user activities and notifications were controlled. The results showed that adjusting the intrusion level of a message may improve the acceptability of notifications and that users' activities at the time of notification do not influence acceptability.

Millen and Patterson [15] investigated the effects of email notification on social engagement based on activity logs. They concluded that users are almost twice as likely to return to the site when they receive a notification alert. They also found evidence that increasing the number of messages contained in the alert is useful for promoting community engagement.

McCrickard et al. [13] evaluated the use of animated text in secondary displays of notification systems, looking for a balance between attention and utility. They described two empirical investigations focused on three often conflicting design objectives: interruption of primary tasks, reaction to specific notifications, and comprehension of information over time. The researchers concluded that the slow fade appeared to be the best of the secondary display animation types tested.

Our research focuses on the analysis of engagement before and after the insertion of a notification feature, during the development of a collaborative system that faces user engagement problems.

4 Online Field Experiment

Online field experiments, often called A/B testing, are built into the context of an online community under study. They neither allow direct manipulation of the treatment nor require random assignment of subjects to control or treatment conditions. In general, online field experiments select a random sample of an online community's population for participation, divide the participants into groups, and then observe or measure the participants' outcomes [18].

The use of online field experiments has grown substantially in recent years, mostly in industry, in a world in which the traces of social interactions are increasingly available online [18]. They are popular in multiple fields, such as computer science, economics, public finance, industrial organization, human-computer interaction, computer-supported collaborative work, and e-commerce [7].

The overall goal of our online field experiment is to investigate whether notifications increase engagement in open collaboration online communities. In particular, we planned the experiment in the context of the Arquigrafia online community.

Arquigrafia is a public, nonprofit digital collaborative community dedicated to disseminating architectural images, with special attention to Brazilian architecture (www.arquigrafia.org.br). The main objective of the community is to contribute to the study, teaching, research, and diffusion of architectural and urban culture by promoting collaborative interactions among people and institutions.

Arquigrafia needs to foster a community around architecture images and information. The analysis of subjective architectural aspects of images will only produce relevant results when a critical mass of users engages in building a collective intelligence on architecture and urbanism. For this reason, it is a suitable project in which to carry out the experiment.

We use the GQM approach [1] to document our goal. Therefore, we state the overall experimental goal as:

Analyze the notification feature

for the purpose of its evaluation

with respect to its effectiveness in increasing users' engagement in open collaboration online communities

from the viewpoint of the researcher

in the context of the Arquigrafia open collaboration online community.

We thus aim to answer the following research question (RQ): Do notifications increase engagement in open collaboration online communities?

Engagement is best analyzed through a set of interrelated metrics that are combined to form a whole. The relative proportion, or importance, of each of these metrics varies depending on the type of business under consideration [22]. These metrics can be aggregated into an engagement score (a minimal per-user representation of the metrics is sketched after the list):

  • Recency is the time gap between the user's last visit and the present.

  • Frequency is the number of user accesses to the system.

  • Duration is how long a user spends in each session.

  • Virality is how many other users a given user influences to engage with the shared object.

  • Ratings are user evaluations in terms of quality, quantity, or some combination of both.
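As an illustration only (the paper does not prescribe a data schema), the sketch below shows one minimal way these five metrics could be represented per user and aggregated into period-level averages; all field and function names are assumptions, not part of the Arquigrafia system.

```python
from dataclasses import dataclass

@dataclass
class EngagementRecord:
    """Per-user engagement metrics for one observation period (illustrative schema)."""
    user_id: int
    recency_days: float        # days elapsed since the previous visit (lower is better)
    frequency_per_day: float   # accesses per day, at most one counted per hour of the day
    duration_seconds: float    # average time spent per access
    virality: int              # accesses arriving via social posts or notifications
    ratings: int               # comments and evaluations involving the user's content


def engagement_summary(records: list[EngagementRecord]) -> dict[str, float]:
    """Aggregate per-user metrics into simple period-level averages."""
    if not records:
        return {}
    n = len(records)
    return {
        "recency_days": sum(r.recency_days for r in records) / n,
        "frequency_per_day": sum(r.frequency_per_day for r in records) / n,
        "duration_seconds": sum(r.duration_seconds for r in records) / n,
        "virality": sum(r.virality for r in records) / n,
        "ratings": sum(r.ratings for r in records) / n,
    }
```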

These metrics are used to measure user engagement with the system before (Period 1) and after (Period 2) the insertion of the new notification feature. The overall period considered (Period 1 + Period 2) ran from 14 June 2015 to 10 August 2015.

Table 1 describes the periods considered in this experiment.

Table 1. Periods before and after inserting the notification feature

The recency metric was obtained as the difference, in days, between the last and the second-to-last access. Even if a user used the system more than once during a day, the recency counts only 1 day if he/she had also accessed the system on the previous day. Thus, the lower the recency, the greater the user's interest in returning to the system within a short time.

For this experiment, recency was calculated from the beginning of each period; otherwise, Period 2 would be penalized with higher recency values than Period 1. Data is available only since action logging was introduced on June 14, the start of Period 1; therefore, we have no previous-access data for many users who accessed the system in Period 1. For this reason, we balanced both periods by starting the recency calculation at the beginning of each period.
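A minimal sketch of this recency computation, assuming each user's access timestamps have already been reduced to distinct access days within the period (the authors' analysis was performed in R and SQL; this Python fragment is illustrative only):

```python
from datetime import date
from typing import Optional

def recency_days(access_days: list[date]) -> Optional[int]:
    """Gap in whole days between the two most recent access days within a period.

    Returns None when the user has a single access day in the period, since no
    previous visit exists to compare against (the case of most Period 1 users).
    """
    days = sorted(set(access_days))
    if len(days) < 2:
        return None
    return (days[-1] - days[-2]).days
```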

The frequency metric was calculated in two ways: as the number of accesses to the system at different moments of the day (frequency per day) and as the number of days on which each user accessed the system (frequency in days). For the first case, if a user accessed the system five times within one hour, we count only one access, because all of them happened within a single hour of the analyzed day. The frequency per day is therefore the number of accesses in the period divided by the number of days in each period (14 days); the maximum is 24 hours per day * 14 days = 336 accesses per period. For the frequency in days, we count the number of days on which a user accessed the system in each period.
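The two frequency variants described above can be computed as sketched below, assuming a list of access timestamps per user and per period; the bucketing by hour reproduces the at-most-one-access-per-hour rule (function and parameter names are ours, not the system's):

```python
from datetime import datetime

def frequency_metrics(accesses: list[datetime], period_days: int = 14) -> tuple[float, int]:
    """Return (frequency per day, frequency in days) for one user in one period.

    Repeated accesses within the same hour of the same day count once, so the
    per-period maximum is 24 * period_days (336 accesses for a 14-day period).
    """
    distinct_hours = {(t.date(), t.hour) for t in accesses}
    distinct_days = {t.date() for t in accesses}
    return len(distinct_hours) / period_days, len(distinct_days)
```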

The duration metric records the length of each user access in seconds, computed as the difference between the last and the next-to-last access timestamps. As with the frequency metric, the duration is also grouped by day and hour. The duration result for each period is the average of a user's access durations in that period.
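One plausible reading of this duration computation is sketched below, under the assumption that the events logged within the same day and hour bound a single access and that its duration is the span between the first and last of those events; since the paper does not spell out the exact formula, this interpretation is our own:

```python
from datetime import datetime
from itertools import groupby

def average_duration_seconds(events: list[datetime]) -> float:
    """Average access duration for one user in one period, in seconds.

    Events are grouped by (day, hour); within each group the duration is taken
    as the span between the first and last logged event.  A group with a single
    event contributes 0 s, matching the 'nearly 0 s' accesses reported later.
    """
    events = sorted(events)
    durations = []
    for _, group in groupby(events, key=lambda t: (t.date(), t.hour)):
        bucket = list(group)
        durations.append((bucket[-1] - bucket[0]).total_seconds())
    return sum(durations) / len(durations) if durations else 0.0
```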

The virality metric was computed from accesses to the system via posts on the Facebook and Google+ social networks and from the pages accessed through the notification functionality. This metric allows a deeper analysis of the impact of notifications, as it tracks users accessing content they would otherwise not have come across.

For the ratings metric, we considered the comment and evaluation functions of the system. The comment functionality allows users to express their opinions about shared content and may add new data or value to content in the system. The evaluation functionality is an area where the user analyzes the shared content by means of quantitative parameters.
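Counting the virality and ratings events could look like the sketch below; the event dictionary keys ('referrer', 'action') and their values are hypothetical, since the actual log schema is not described in the paper:

```python
def virality_and_ratings(events: list[dict]) -> tuple[int, int]:
    """Count virality and ratings events for one user in one period (illustrative)."""
    virality = sum(
        1 for e in events
        if e.get("referrer") in {"facebook", "google_plus", "notification"}
    )
    ratings = sum(
        1 for e in events
        if e.get("action") in {"comment", "evaluate_photo"}
    )
    return virality, ratings
```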

As in other studies on online engagement [9, 10], we purposefully designed the experiment as an online field experiment, conducted in a real, existing open collaboration system rather than in a laboratory setting.

Event logs were added to the system to collect real usage data. User actions were logged into .log files. We wrote an algorithm (in the Java programming language) to convert each row of the .log files into data organized in a .csv file. These files, in turn, were converted into SQL queries to insert their content into a MySQL database. To retrieve the notification engagement metrics, it was necessary to account for external events from the development process and code updates, as well as events involving new users accessing the system, such as new users logging in because of the usability tests.
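The authors implemented this conversion in Java; the Python sketch below illustrates the same .log-to-.csv step under the assumption of a hypothetical tab-separated line format (timestamp, user_id, action, target), since the actual log layout is not given:

```python
import csv

def logfile_to_csv(log_path: str, csv_path: str) -> None:
    """Convert a raw action log into a CSV file for later loading into MySQL."""
    with open(log_path, encoding="utf-8") as log, \
         open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["timestamp", "user_id", "action", "target"])
        for line in log:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 4:          # skip malformed or incomplete lines
                writer.writerow(fields)
```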

In Period 1, the system did not have the notification feature and ran the first version of the action logging system. In Period 2, the system had a new notification feature, which was inserted after a remote usability test conducted between June 29 and July 14 that focused on other features.

The notification feature inserted in Period 2 was an improvement over a previous version. In this release, notifications related to the same object shared by an author (system user) are grouped. For instance, in the previous version, if ten users commented on a single piece of content shared by a certain user, the system displayed ten different notifications to the author; in the second version, a single notification appears, informing the author that ten users commented on that content. In both versions, the notified content could be accessed from the notification interface.
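The grouping behavior of the second release can be summarized by the sketch below, which collapses per-event notifications into one entry per notified object and action; field names such as 'object_id', 'action', and 'actor' are illustrative, not taken from the system:

```python
from collections import defaultdict

def group_notifications(raw_notifications: list[dict]) -> list[dict]:
    """Collapse per-event notifications into one entry per (object, action) pair.

    Ten comments on the same shared photo thus yield a single notification
    reporting how many distinct users commented, as described above.
    """
    grouped: dict[tuple, set] = defaultdict(set)
    for n in raw_notifications:
        grouped[(n["object_id"], n["action"])].add(n["actor"])
    return [
        {"object_id": obj, "action": action, "actor_count": len(actors)}
        for (obj, action), actors in grouped.items()
    ]
```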

Originally, 33,855 events of logged-in users were analyzed: 1,422 events from 89 users in Period 1 and 32,433 events from 1,096 users in Period 2. We believe that the increase in users in Period 2 was a consequence of the remote usability test. For this reason, we removed the data of users who accessed the system during the test period, aiming to withdraw the influence of the usability test from the analysis.

In addition, to build a comparable data set, we removed the data of users who appeared in only one of the two periods. We therefore analyzed the behavior of the same users in both periods, each lasting 14 days. After data cleaning, each period has 31 users; Period 1 has 321 events and Period 2 has 11,654 events performed by the same users as in Period 1. We were thus able to compare the behavior of two homogeneous user groups, pre- and post-implementation of the notification feature (Period 1 and Period 2, respectively).

The relevant information was retrieved from the MySQL database and exported to CSV files, which served as data sources for the R tool, where we performed the statistical analysis. We adopted a significance level of 0.05: if the p-value is less than 0.05, there is evidence that the data sets differ significantly. The Shapiro-Wilk normality test rejected the hypothesis that the data from Periods 1 and 2 come from a normal population. Therefore, we used the Wilcoxon rank sum test, a non-parametric statistical hypothesis test, to perform the data analysis.
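The authors performed these tests in R; an equivalent procedure in Python (using SciPy, whose shapiro and ranksums functions implement the Shapiro-Wilk and Wilcoxon rank sum tests) would look like the sketch below, taking one metric's per-user values for each period as input:

```python
from scipy import stats

def compare_periods(values_p1: list[float], values_p2: list[float],
                    alpha: float = 0.05) -> dict:
    """Shapiro-Wilk normality check per period, then Wilcoxon rank sum comparison."""
    _, p_normal_1 = stats.shapiro(values_p1)
    _, p_normal_2 = stats.shapiro(values_p2)
    _, p_value = stats.ranksums(values_p1, values_p2)
    return {
        "period1_normal": p_normal_1 >= alpha,
        "period2_normal": p_normal_2 >= alpha,
        "p_value": p_value,
        "significant_difference": p_value < alpha,
    }
```

Note that R and SciPy may differ slightly in tie handling and continuity corrections, so p-values reproduced this way are approximations of those reported in the paper.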

The analysis was divided into the two periods, before and after the insertion of the notification feature. We created MySQL views for each period to facilitate retrieving metric values for the users who accessed the system in that period. We also generated boxplots from the statistical analysis to facilitate visualization of the results. Our SQL scripts enable standardized and automated retrieval of the metrics, allowing the analysis to be replicated at any time.

The notification feature implemented in the system was intended to show users their status as well as the status of their objects in the system, for example, notifying a user that people have commented on content he/she shared. The feature also lets users know who has started following them, which can promote the expansion of their contacts. Note that the notification interface can be viewed by any logged-in user, regardless of whether he/she has notifications at that time; in that case, the user receives a message saying that he/she has no notifications yet, which is an indication that he/she needs to perform actions in the system. The next sections present the results for each of the five engagement metrics considered.

5 Results and Discussion

Table 2 presents the statistical results for the frequency and duration metrics. For the recency, virality, and ratings metrics, there is not enough evidence to compare Periods 1 and 2 statistically. The results in Table 2 are discussed in further detail in the next subsections.

Table 2. Statistical results for the experiment.

5.1 Frequency and Duration Metrics

For the frequency per day, in Period 1 the 31 users had an average frequency between 0.07 and 0.64 accesses per day. In Period 2, the 31 users accessed the system between 0.14 and 21.35 times per day. The average number of accesses per day was 0.14 for Period 1, with a standard deviation (SD) of 0.17, and 4.22 for Period 2, with a standard deviation of 6.51. The p-value for the comparison between periods was 2.229e-11 (Wilcoxon rank sum test). Figures 1(a) and (b) summarize users' average number of accesses per day over the 14 days of each period.

For the frequency in days, in Period 1, 30 users had a frequency of 1 access day and only 1 user had a frequency of 4 access days. In Period 2, 27 users accessed the system on between 9 and 14 days and 4 users on between 2 and 4 days. The average number of access days is 1.09 for Period 1, with a standard deviation of 0.53, and 11.35 for Period 2, with a standard deviation of 3.56. The p-value for the comparison between periods was 8.146e-13 (Wilcoxon rank sum test). Figures 2(a) and (b) summarize the number of access days over the 14 days of each period. Our data suggest that, for the same 31 users, Period 2 showed an improvement in both types of access frequency after the insertion of the notification feature.

For the duration metric, in Period 1, 22 users had an average duration of nearly 0 s, which represents only access to the home page without taking any action in the system; 2 users had an average duration between 11 and 77 s; 5 users between 462.41 and 937.66 s; and 2 users between 1134 and 1875 s. In Period 2, 10 users had an average duration of nearly 0 s (up to 0.09 s); 2 users between 0.5 and 2.51 s; 9 users between 21.42 and 65.29 s; 6 users between 124.29 and 226.03 s; and 4 users between 1255.80 and 1326.34 s. Period 1 had an average access duration of 211.05 s with a standard deviation of 445.91.

Fig. 1. Average number of accesses per day for Periods 1 and 2.

Fig. 2. Number of access days for Periods 1 and 2.

Fig. 3. Average duration in seconds for Periods 1 and 2.

For Period 2, the average duration was 210.75 s with a standard deviation of 425.47. The comparison between Period 1 and Period 2 resulted in a p-value of 0.01308 for the duration metric (Wilcoxon rank sum test). Although the duration in seconds remained small in Period 2 for users' data without outliers (up to 226.03 s), there was an increase in short accesses, as if users were checking something new in the system, which is directly related to the notification informing them of novelties. In the duration metric, the outlier data from Period 1 increased the average duration but did not represent most users of that period: 70% of the users had an average duration of nearly 0 s in Period 1, whereas in Period 2, 32% of the users had the same near-zero duration (up to 0.09 s), or 38% if durations between 0.5 and 2.51 s are also considered. The results are summarized in Figs. 3(a) and (b). Our data suggest that there was a slight improvement in the duration metric after the insertion of the notification feature, especially when the variation in duration is analyzed among the same users considered in both periods, as shown in Fig. 3.

5.2 Recency, Virality and Ratings Metrics

In Periods 1 and 2, the virality from Facebook and Google+ was 0. For the virality from notifications, six users accessed other system pages starting from the notification feature. The average virality from notifications was 2 in Period 2, ranging from 2 to 3 accesses to other pages from the notification interface among the six users who received notifications. For the ratings metric from comments, Period 1 had no comments while Period 2 had 1 user comment. For the ratings metric from photo evaluations, neither period had any evaluations.

It is worth noting that the pages considered for the virality metric are the system content pages that users accessed from a notification, such as users' profile pages. The notification view is available to all logged-in users, but only users who received comments on their shared content or who gained new followers received notifications. The user who shared the content is notified with the identification of which user acted and what action he/she performed (e.g., posting a comment).

Considering the periods under analysis, access to pages from the notification interface started in Period 2. Only the six users who received notifications about users who followed them could access pages from the notification interface. These six users are among those with the longest durations and highest frequencies. However, the notification had no influence on the virality metric from accesses through Facebook or Google+, nor on the ratings from comments and evaluations of system content.

For the recency metric, in Period 1, thirty users accessed the system only once, and they have no previous access data because action logging started at the beginning of Period 1; therefore, their recency cannot be calculated. Only 1 user had a recency of 1 day. However, since the access dates ranged between June 20 and June 27, it is possible to conclude that the recency of the thirty users was higher than 6 days in Period 1. In Period 2, the 31 users had recencies between 1 and 3 days, 24 of them equal to 1 day. Therefore, our data suggest that Period 2 showed an improvement in users' recency.

5.3 Discussion

The analysis of the engagement metrics was performed in an open collaboration system that depends on user interaction so that urban architectural elements can be jointly shared and analyzed by the community. For this reason, engagement is a central concept for the sustainability of the system. However, Arquigrafia was facing user engagement problems, which is why we chose it for our study.

There were no other activities in the system that could have led to increases in users' frequency, duration, or recency. The numbers of uploads, ratings, and comments on photos were too small to influence user behavior. The only external event was the usability test between Periods 1 and 2, but all users who accessed the system during the test period were removed from the analysis. For this reason, our data suggest that the notification feature improved the frequency and the recency of the same group of users from Period 1 to Period 2, and that it caused a slight improvement in the duration of users' accesses to the system.

The results agree with the conclusions of Millen and Patterson [15], who found that the presence of a notification service resulted in increased site activity, especially total sessions per day. In addition, Millen and Patterson reported that ongoing daily activities are important to sustained participation in a community and that members need to be made aware of these activities. According to Carroll et al. [5], notification systems provide common ground essential for collaborative work, since they affect users' ability to collaborate and to maintain awareness of the system. Consequently, notification systems can increase user participation and awareness, which are directly related to engagement.

One limitation of the experiment was the sample size: each group was analyzed with data collected over a 14-day period, with 31 users in each period. This approach of starting small is also encouraged by Kohavi et al. [11]. Despite this limitation, we believe our approach is valuable because it provides insights into the research question before investing in larger studies.

6 Conclusion

We conducted a 28-day online field experiment in an open collaboration community. Our study analyzed the relationship between the implementation of a notification feature and behavioral outcomes of user engagement. We aimed to investigate whether notifications increase users' engagement in the context of Arquigrafia, a digital collaborative community dedicated to the diffusion of architectural images. The main original contribution of this paper is to explore, in a real setting, the engagement of two homogeneous user groups, pre- and post-implementation of a notification feature. We measured users' engagement using the recency, frequency, duration, virality, and ratings metrics.

There was a significant improvement in users' frequency and recency and a slight improvement in duration after the insertion of the notification feature, considering the same users in both periods. For the virality from notifications, there were changes in the behavior of users who accessed the notification interface, but the notification had no influence on the virality metric from accesses via Facebook or Google+, nor on the ratings from comments and evaluations of system content. Regarding our research question (Do notifications increase engagement in open collaboration online communities?), our results indicate an improvement in users' engagement, as four of the five engagement metrics showed positive results.

This work points to the need for future studies. The next step is to conduct a large-scale online field experiment to test hypotheses about the relationship between the notification feature and user engagement, which would also increase the generalizability of the results. Finally, future studies may evaluate additional features that might influence users' engagement in open collaboration communities.