Cybersecurity Discussions in Stack Overflow: A Developer-Centred Analysis of Engagement and Self-Disclosure Behaviour

Stack Overflow (SO) is a popular platform among developers seeking advice on various software-related topics, including privacy and security. As for many knowledge-sharing websites, the value of SO depends largely on users' engagement, namely their willingness to answer, comment or post technical questions. Still, many of these questions (including cybersecurity-related ones) remain unanswered, putting the site's relevance and reputation into question. Hence, it is important to understand users' participation in privacy and security discussions to promote engagement and foster the exchange of such expertise. Objective: Based on prior findings on online social networks, this work elaborates on the interplay between users' engagement and their privacy practices in SO. Particularly, it analyses developers' self-disclosure behaviour regarding profile visibility and their involvement in discussions related to privacy and security. Method: We followed a mixed-methods approach by (i) analysing SO data from 1239 cybersecurity-tagged questions along with 7048 user profiles, and (ii) conducting an anonymous online survey (N=64). Results: About 33% of the questions we retrieved had no answer, whereas more than 50% had no accepted answer. We observed that "proactive" users tend to disclose significantly less information in their profiles than "reactive" and "unengaged" ones. However, no correlations were found between these engagement categories and privacy-related constructs such as Perceived Control or General Privacy Concerns. Implications: These findings contribute to (i) a better understanding of developers' engagement towards privacy and security topics, and (ii) the design of strategies promoting the exchange of cybersecurity expertise in SO.


INTRODUCTION
The last decade has put privacy in the spotlight of software development, as new legal frameworks emerged to safeguard people's data protection rights and promote responsible engineering practices. One clear example is the EU General Data Protection Regulation (GDPR) [24], which has introduced strong legal provisions seeking to compel software companies to comply with a set of privacy principles including transparency, fairness, and informed consent. More recently, as the software industry moves towards the development of Artificial Intelligence (AI) applications, a new regulatory framework is in sight [12], promising to strengthen the protection and governance of personal data in AI systems. In turn, companies and organisations have been urged to adopt privacy-by-design practices to comply with current regulations. Nevertheless, this has also raised questions and concerns among software developers on how to effectively translate these legal provisions and privacy principles into technical solutions [30].
Question-Answer (Q&A) platforms are a valuable resource for both experienced and junior programmers seeking support in their software development tasks. Stack Overflow (SO) [33] is among the largest Q&A platforms in which developers participate in discussions related to performance issues, bugs, and code workarounds [4]. Given the increasing importance of cybersecurity in software engineering, a large number of questions regarding privacy, security, and data protection have been posed and addressed by SO users. Particularly, issues related to GDPR compliance, privacy policies, and access control are some of the most popular privacy-related discussions in SO [22,38]. Still, privacy- and security-related topics receive little attention in comparison to others such as data science, big data, and mobile operating systems. Although this suggests low engagement towards cybersecurity discussions within the SO community, it also reveals an overall tendency among software developers to overlook privacy and security aspects of their code [6,15,28].

Motivation
Developers play a key role in embedding privacy and security principles into the core architecture of information systems [15]. However, many often fail to create secure software solutions that successfully preserve users' privacy and data protection rights [15,28]. Over the last years, a growing body of research has leveraged SO's dataset to identify and characterise cybersecurity trends among software practitioners. Prior work has investigated developers' motivations [22], knowledge gaps [38], and concerns towards privacy and security [21]. However, "answer-hungry" questions (i.e., questions remaining unanswered or unresolved) are still a common phenomenon and an ongoing issue within Q&A websites [14]. Since SO is a community frequented by more than 100 million developers per month [31], users' commitment towards timely and high-quality answers becomes critical for the platform's reputation and success.
Former research has sought to understand users' motivations (and amotivations) when it comes to participation in Q&A forums [2,10,41]. Yet, little effort has been made to characterise users' engagement in cybersecurity discussions in SO, that is, to provide evidence and actionable information about community members participating actively (or not) in such exchanges. Individuals' engagement in Online Social Networks (OSNs) like Facebook has been extensively investigated from the perspective of privacy concerns. Such research has analysed the connection between users' self-disclosure decisions (e.g., the amount of private information they reveal inside profiles and posts) and their engagement in these platforms (e.g., number and quality of OSN posts) [9,17,36]. Overall, such research has not only contributed to a better understanding of users' privacy concerns and practices but has also paved the road for the development of user-centred technologies, that is, methods and tools aiming to support and guide users' interaction in OSN environments [27].
However, to the best of our knowledge, the role of privacy-related behaviour has not been closely investigated within Q&A platforms like SO. Particularly, the interplay between developers' self-disclosure practices and their engagement in discussion threads has not yet been explored through the lens of privacy and security benchmarks.

Contribution and Research Questions
SO is a valuable resource for developers seeking advice about multiple aspects of software development. Given the increasing importance of cybersecurity in software engineering, it becomes necessary to foster engagement among its users towards privacy- and security-related discussions. Hence, this work aims at contributing to ongoing research in SO by investigating the interplay between users' self-disclosure decisions and their engagement in cybersecurity discussions. All in all, the research questions (RQs) this paper seeks to answer are:
• RQ1: Is users' self-disclosure behaviour associated with their engagement in cybersecurity discussions? Prior studies in OSNs (in general) and Q&A platforms (in particular) have shown correlations between users' engagement and self-disclosure practices (e.g., [2,17,39]). Hence, this RQ aims at zooming into developers' decisions regarding profile visibility and their participation in discussions about privacy and security. Particularly, it seeks to investigate whether different self-disclosure patterns exist across SO users who involve themselves actively in such discussions, and those who do not.
• RQ2: Are privacy-related constructs associated with users' engagement in cybersecurity discussions? As with RQ1, former studies have delved into the relation between psychological constructs (e.g., perceived risks and control) and people's engagement within OSNs (e.g., [16,36]). The purpose of this RQ is to examine whether such correlations also take place in SO but regarding users' participation in discussions about privacy and security.
To answer these RQs, we have followed a mixed-methods approach combining the analysis of data collected from an online survey and information retrieved from SO user profiles. The results of our analysis show significant differences in the self-disclosure practices (i.e., with regard to profile visibility) of users contributing actively to discussions about data protection and information security, and those who do not. These findings not only contribute to a better understanding of users' engagement in such discussions, but also to solutions addressing "answer-hungry" questions in Q&A platforms, particularly the elaboration of incentive strategies and recommender systems promoting the exchange of cybersecurity expertise in SO.
Paper Structure. Section 2 discusses related work and gives an overview of the paper's theoretical background. Section 3 describes the methodology employed for the study in terms of data collection, aggregation, and survey design. Section 4 reports the results of our analysis, and Section 5 discusses them. Section 6 summarises limitations and threats to validity. Section 7 concludes this work.

BACKGROUND AND RELATED WORK
A growing amount of literature has zoomed into cybersecurity discussions in SO and engagement patterns in OSNs. This section summarises related work elaborating on privacy and security insights gathered through SO. Alongside, we discuss research addressing privacy concerns as a rationale for users' engagement and self-disclosure behaviour in OSNs.

Cybersecurity Discussions in SO
Given the Q&A affordances available within SO, this platform has been widely used as a proxy for understanding the cybersecurity concerns and practices of software engineers [13,21,22,38]. For instance, Lopez et al. [22] conducted a qualitative analysis of SO discussion threads to understand the type of security support developers seek and provide online. Their findings suggest that security-related discussions in SO are rich in terms of technical help but also regarding developers' personal values and attitudes such as trust, fear, and sense of responsibility. In a follow-up article [21], the authors gathered further insights on how security knowledge is built and fostered within the SO community. Overall, their results show that developers often tend towards security-related discussions within the context of technical solutions provided by others. In line with this, Tahaei et al. [38] applied natural language processing techniques to unveil topics emerging within privacy-related questions. The outcome of such an analysis showed that privacy policies, access control, and encryption are among the main privacy topics addressed by SO members.
At its core, SO is a peer-production community where knowledge is built from the interaction between developers seeking to clarify each other's technical inquiries [29]. Hence, users' participation and engagement are of utmost importance for the sustained development of the platform and the expertise crafted within it. Moreover, timely answers to questions are critical to the platform's efficiency and, thus, to its popularity. Nonetheless, prior research has systematically reported that many questions in SO receive little attention or even remain unanswered/unresolved (up to 30% by May 2022 [32]). As SO is a catalyst for developers' technical concerns and best practices, it is essential to understand the factors contributing to or impairing users' participation in the platform. Prior work has tried to explain why some questions remain unanswered and even proposed machine learning models for predicting whether specific questions will be addressed or not [3]. Still, the low engagement and the lack of answers to specific questions (including privacy- and security-related ones) remain open issues [14]. Hence, there is a call for empirical evidence to (i) help characterise users' engagement in cybersecurity discussions and (ii) elaborate strategies for boosting their participation in such discussions.

Insights from Online Social Networks
Factors influencing people's participation in OSNs have been thoroughly investigated through the lens of privacy concerns. Moreover, prior work has closely analysed users' privacy practices, often accounting for correlations between OSN engagement and self-disclosure behaviour. Staddon et al. [36], for instance, observed strong associations between privacy concerns and users' engagement on Facebook using an online survey. Their findings revealed that individuals expressing concerns about their privacy also report spending less time on the platform and sharing less content. Hence, they concluded that privacy concerns might play a significant role in people's engagement in OSNs. In line with this, a study by Choi and Sung [9] showed that privacy concerns are closely associated with active Instagram use (e.g., sharing content and interacting more with others) and people's selection of a particular OSN platform over others (e.g., Instagram over Snapchat). Alongside, research has systematically reported evidence on the so-called "privacy paradox", showing offsets between users' concerns and engagement in OSNs [18]. Such evidence suggests that, despite expressing privacy concerns, people still join OSNs and disclose significant amounts of personal information.
When it comes to engagement in Q&A platforms, Kayes et al. [17] investigated the interplay between users' privacy concerns and their participation in Yahoo! Answers. By considering changes in profile visibility as manifestations of privacy concerns, the authors unveiled correlations between users' self-disclosure behaviour and their platform contributions. Overall, they observed that users with a private profile contribute more often and with better content to the platform than those with a public one. Such findings can contribute substantially to the elaboration of Q&A recommendation approaches. For instance, one could leverage profile visibility for routing unresolved questions to those users who are more likely to answer them [17]. Surprisingly, similar concerns and practices have not been thoroughly investigated in SO despite its Q&A and social network affordances. Moreover, to the best of our knowledge, the relationship between engagement in cybersecurity topics and self-disclosure practices has not yet been investigated from a developer-centred perspective.

METHODOLOGY
We conducted a two-stage empirical study to identify nuances in the self-disclosure practices of users participating actively in cybersecurity discussions, and those who do not. For this, we created a dataset from 7048 SO profiles corresponding to engaged and unengaged users during the first stage of the study. This dataset was then leveraged in the second stage to conduct an anonymous online survey. Both stages are described in detail in the following subsections.

Data Collection
To identify users concerned with cybersecurity topics, we first conducted an analysis of privacy- and security-related conversations in SO. Such an analysis consisted of identifying cybersecurity-relevant conversation threads through their corresponding user-assigned tags. For this, we used SO's Tag Explorer to define tag sets, which were used thereafter to mine relevant conversations. Particularly, a set of topic tags plus two language tags were employed in the identification of cybersecurity-relevant discussions.
We included privacy, privacy-policy, security, code-access-security, data-security, network-security, and gdprconsentform as topic tags. Additionally, r and python were used as language tags given the increasing popularity of these languages within the data science community [23]. Thereby, we sought to narrow down the scope of the study mainly to data science practitioners as they are prone to handle sensitive data (e.g., medical records, biometric data, demographics). Furthermore, their cybersecurity practices can have a great impact on automated decision-making systems (e.g., biases, discrimination).

Discussions Dataset (D1).
Each topic tag was explored in combination with each language tag, resulting in 14 tag searches. To maximise the size of the dataset, we did not include additional restrictions such as time of posting, the existence of an accepted answer, upvotes, or downvotes. Both search and extraction were executed through an R-based mining package for the StackExchange API. We conducted fourteen independent searches (i.e., one per tag combination) using the search/advanced endpoint and a tag filter provided by the API itself. By the end of the mining process, a total of 1239 questions/posts were retrieved from SO (Figure 1).
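The tag combinations described above can be enumerated with a few lines of code. The following is a minimal Python sketch, not the R-based tooling used in the study; the function name tag_searches is hypothetical, and the sketch only lists the fourteen (topic, language) pairs without issuing any API request.

```python
from itertools import product

# Topic and language tags as listed in the text.
TOPIC_TAGS = [
    "privacy", "privacy-policy", "security", "code-access-security",
    "data-security", "network-security", "gdprconsentform",
]
LANGUAGE_TAGS = ["r", "python"]


def tag_searches(topic_tags, language_tags):
    """Return one (topic, language) pair per independent search."""
    return list(product(topic_tags, language_tags))


searches = tag_searches(TOPIC_TAGS, LANGUAGE_TAGS)
for topic, language in searches:
    print(f"{topic} + {language}")
```

Each pair would then be passed to whichever client is used to query the search/advanced endpoint.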
Questions posted in SO can be answered or commented on by other platform members. The main difference is that comments ask for clarification, whereas answers describe a suitable solution. One question can trigger several answers and comments (to the main question or to others' answers) from other SO users interested in the discussion topic. Therefore, such comments and answers are also relevant for identifying SO profiles corresponding to individuals who engage in cybersecurity discussions. Consequently, answers and comments associated with each of the 1239 questions were also mined and included in a discussions dataset D1. After this additional mining process, D1 contained 1239 questions, 2558 comments to questions, 1811 answers, and 2373 comments to answers.

Profiles Dataset (D2).
The information contained in D1 allowed us to identify the SO ids of those users who have either posted a question, provided an answer, or posted a comment deemed as cybersecurity-relevant. Overall, 3591 unique ids were retrieved, of which only 17 corresponded to users with fully private SO profiles. The remaining 3574 ids were used to mine the public information disclosed in their profiles through the StackExchange API (i.e., via the users/{ids} endpoint).
The email addresses of some of these users were also mined using the GitHub (GH) URL available in their profiles (email addresses are never included in SO profile pages). This step was necessary to recruit participants afterwards for the online survey. This complementary mining process was executed using the R package gh, resulting in 457 unique email addresses corresponding to engaged users. Such information was included in the profiles dataset D2 along with the rest of the profile information extracted from SO.
In order to populate D2 with profile information from unengaged users, we first estimated a representative sample size for such a subgroup. For this, we ran a query to determine how many users have participated in each language tag using Stack Exchange's Data Explorer (SEDE). The result of this query gave 46038 users for the r tag, and 777587 for the python tag. Next, we mined the profile information from a representative sample of these two groups with a 99% confidence level and a margin of error of 3%. Such information was mined directly from the users/{ids} endpoint, ensuring that the corresponding SO ids were not already part of the engaged group, and were not repeated across each language.
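The text does not state which formula was used to derive the sample sizes. As an illustration only, the sketch below assumes Cochran's formula with a finite population correction and a conservative proportion of 0.5; the function name sample_size and its parameters are hypothetical. The sizes it returns need not match the final profile counts reported in the paper, since ids already in the engaged group or repeated across languages were excluded.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.99, margin=0.03, proportion=0.5):
    """Cochran's sample size with finite population correction (assumed method)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Correction for a finite population of the given size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Population sizes per language tag, as reported in the text.
r_sample = sample_size(46038)
python_sample = sample_size(777587)
print(r_sample, python_sample)
```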
Overall, we obtained 1830 Python users and 1645 R users (3475 in total). These results were merged into the D2 dataset, using an additional variable to indicate whether this information corresponds to engaged or unengaged users.
As with the engaged profiles, we collected the email addresses of 413 unengaged users via GH (Figure 1).

Data Aggregation
We parsed the information collected in both datasets to compute two variables of interest: (i) the amount of information users disclose in their profiles, and (ii) their engagement in cybersecurity discussions. The following subsections describe these variables plus an additional analysis we conducted to understand self-disclosure through display names.
3.2.1 Amount of Self-Disclosure. SO allows users to include the following information in their profiles: display name (with a maximum of 30 characters), location (as a text field), title (available in the profile, but merged into the display name when using the API), about me (HTML-friendly text box of up to 3000 characters), a website link, links to Twitter and GitHub profiles, and a profile picture (if not used, the system assigns a randomised avatar). To compute a metric reflecting the amount of personal information revealed in a profile, we assigned a normalised variable (i.e., ranging from 0 to 1) to each field except for the title. The value for each particular variable was estimated as follows:
• We gave each link (website, Twitter and GitHub) a value of 1 if it was filled in the user's profile, and 0 if not.
• The location variable was calculated in the same way as the links (i.e., 1 if it was completed and 0 if not). Since users can obfuscate this field (e.g., by using nicknames or aliases), we conducted a card sorting analysis to estimate the reliability of this coding schema. From this analysis, we concluded that location information could be considered accurate if present.
• The variable corresponding to the display name was computed as the proportion of used characters over the total available (30 characters). As with location, we completed another card sorting analysis to obtain further reliability insights. Once again, we concluded that the information present in this field could be considered accurate. Both card-sorting analyses can be found in the paper's Replication Package.
• The profile image was retrieved as a URL during the data collection process. To determine whether an image corresponds to a custom or a default one, we compared its URL against a collection of Gravatar URLs (Gravatar pictures are frequently used as default in SO profiles). Using regular expressions, we assigned a value of 0 to those profile pictures found in the Gravatar database. Otherwise, they were considered as custom and given a value of 1.
• The about me field can have up to 3000 characters allowing HTML formatting. The HTML tags were removed through an R script, and the proportion of used characters was calculated to determine the corresponding disclosure value of this field. This approach assumes that, as more characters are included, more personal information is being revealed.
These normalised variables were aggregated into another variable named $SD_{amount}$ quantifying the amount of personal information disclosed in a SO profile:
$$SD_{amount} = \frac{SD_{sum}}{SD_{max}}$$
where $SD_{max}$ corresponds to the maximum number of disclosable attribute values (7 in total), and $SD_{sum}$ to the summation of each normalised variable.
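The coding scheme above can be illustrated with a short Python sketch. It assumes that the aggregate is the sum of the seven normalised variables divided by their maximum (7). The dictionary keys, the helper names, and the regex-based tag stripping are hypothetical stand-ins for the R scripts used in the study, and the picture check is reduced to a boolean flag rather than a comparison against Gravatar URLs.

```python
import re

MAX_NAME_CHARS = 30     # display name limit stated in the text
MAX_ABOUT_CHARS = 3000  # about me limit stated in the text
N_ATTRIBUTES = 7        # maximum number of disclosable attribute values


def strip_html(text):
    """Remove HTML tags with a simple regex (assumed approximation)."""
    return re.sub(r"<[^>]+>", "", text)


def disclosure_score(profile):
    """Aggregate the normalised profile variables into a value in [0, 1]."""
    # Binary variables: the three links and the location field.
    values = [1.0 if profile.get(key) else 0.0
              for key in ("website", "twitter", "github", "location")]
    # Proportion of used characters in the display name.
    name = profile.get("display_name") or ""
    values.append(min(len(name), MAX_NAME_CHARS) / MAX_NAME_CHARS)
    # Proportion of used characters in the about me field, tags removed.
    about = strip_html(profile.get("about_me") or "")
    values.append(min(len(about), MAX_ABOUT_CHARS) / MAX_ABOUT_CHARS)
    # Custom picture (1) versus default avatar (0).
    values.append(1.0 if profile.get("custom_picture") else 0.0)
    return sum(values) / N_ATTRIBUTES


# Invented example profile, for illustration only.
example = {
    "display_name": "Ada Lovelace",
    "location": "London",
    "website": "https://example.org",
    "github": "https://github.com/example",
    "about_me": "<p>" + "x" * 300 + "</p>",
    "custom_picture": True,
}
print(round(disclosure_score(example), 3))
```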

3.2.2 Engagement in Cybersecurity Discussions. We classified users as engaged or unengaged according to their participation, computing the number of cybersecurity-relevant questions a user has posted (#Q), the number of answers provided to such questions (#A), and the number of corresponding comments. The latter was divided into comments to cybersecurity questions (#C_Q) and comments to cybersecurity answers (#C_A). Overall, if the sum #Q + #A + #C_Q + #C_A was greater than 0, the user was classified as engaged and, otherwise, as unengaged. We also classified engaged users as proactive or reactive according to their tendency towards starting new discussion threads. Particularly, we considered as proactive those users who post at least as many questions as comments and answers combined, that is, cases where #Q ≥ #A + #C_Q + #C_A. Conversely, users posting more comments and answers than cybersecurity questions were classified as reactive, that is, when #Q < #A + #C_Q + #C_A.
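The classification rule above translates directly into code. The following Python sketch is illustrative; the function name classify_user and its parameter names are hypothetical, and the four counts are assumed to be already computed per user.

```python
def classify_user(questions, answers, question_comments, answer_comments):
    """Classify a user as unengaged, proactive, or reactive.

    The arguments are counts of cybersecurity-relevant questions, answers,
    comments to questions, and comments to answers for a single user.
    """
    reactions = answers + question_comments + answer_comments
    if questions + reactions == 0:
        return "unengaged"
    # Proactive: at least as many questions as answers and comments combined.
    return "proactive" if questions >= reactions else "reactive"


print(classify_user(0, 0, 0, 0))
print(classify_user(2, 1, 0, 1))
print(classify_user(1, 2, 0, 0))
```

Note that, under this rule, a user with equal numbers of questions and reactions falls into the proactive group.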

Survey Structure
To complement the analysis of profile information and discussion threads, we conducted an online survey within a subgroup of SO users. In particular, we aimed at measuring psychological constructs and antecedents to better understand developers' concerns and behaviour regarding cybersecurity. The questionnaire consisted of an introductory part and two main sections:
i. The introductory section provided information about the aim of the study along with the conditions for participation/withdrawal (participation was voluntary, and people were given a chance to withdraw at any time). We also included the contact details of the authors in case of further questions and enquiries.
ii. After accepting the survey's terms and conditions, participants were forwarded to the first part of the questionnaire. This part included questions eliciting demographic information (e.g., participants' gender, education level, and current work status) along with their prior experience in software development (e.g., years working with R or Python).
iii. The second part included a set of questions measuring the following constructs: general privacy concerns (GPC), privacy concerns on social threats (PCS), privacy concerns on organisational threats (PCO), perceived privacy risk (RSK), perceived control (PC), and self-disclosure (SD). We used well-established constructs and scales previously elaborated and validated by other authors (i.e., GPC by Buchanan et al. [8] and the rest by Krasnova et al. [19]). All questions were closed-ended and measured using a 6-point Likert scale to increase the responses' reliability. We also included an attention question at the end of this section to identify careless respondents and preserve the quality of the results [20].
This survey was assessed and approved by an Ethics Committee, and is also available in the Replication Package.
Population & Sampling. The survey was distributed through Qualtrics in April/May 2021 using the email addresses collected during the mining process (Section 3.1.2). We gathered 69 responses, out of which five were filtered through the "attention control" question. The remaining 64 responses were considered for the corresponding analysis.

RESULTS
We conducted several statistical analyses over the information collected from SO and the responses obtained through the online survey. We conducted a t-test followed by an ANOVA test to identify significant differences in the self-disclosure practices of engaged and unengaged users. The results of these tests were complemented afterwards with an analysis of the survey data.

Privacy and Security Discussions (SO Q&A data)
A total of 1239 cybersecurity-related questions were collected from SO using the StackExchange API (as explained in Section 3.1.1). As shown in Table 1, around 67% of these questions had at least one answer (answered), and about 47% received an answer considered adequate by the user who asked the question (accepted). Another 58% had a positive score (i.e., a positive difference between up-votes and down-votes), whereas 39% of the questions received at least one comment. SO also allows experienced community members to close questions that are either off-topic or may need further clarification. We observe that around 8% of the questions in our dataset fall into this category.

Following the same user categories investigated in Section 4.2, we conducted a one-way ANOVA test to analyse the privacy-related constructs elicited in the second part of the survey (i.e., GPC, PCS, PCO, RSK, PC, and SD). From the 64 participants, 33 were classified as unengaged, 8 as proactive, and 23 as reactive. Prior to conducting the test, we assessed the reliability of the employed scales by calculating their corresponding Cronbach's Alpha coefficient. In all the cases, such a value was higher than 0.7, suggesting a high internal consistency within each scale's items.
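For readers unfamiliar with the reliability check mentioned above, Cronbach's Alpha for a scale with $k$ items is commonly computed as $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $\sigma_i^2$ is the variance of item $i$ and $\sigma_T^2$ the variance of participants' total scores. The Python sketch below illustrates this standard formula; the function name and the response matrix are invented for illustration and are not survey data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's Alpha for rows of item scores (one row per participant)."""
    k = len(responses[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented 6-point Likert responses for a three-item scale.
toy_responses = [
    [1, 2, 1],
    [3, 3, 4],
    [5, 5, 6],
    [2, 2, 2],
    [6, 5, 6],
]
alpha = cronbach_alpha(toy_responses)
print(alpha > 0.7)
```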
Table 4 also summarises the outcome of the one-way ANOVA for each construct measured. We found no significant differences in any of these constructs across proactive, reactive, and unengaged users. This was also the case when conducting a t-test for a two-group classification (i.e., engaged and unengaged).
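As an illustration of the test applied here, the sketch below computes the F statistic of a one-way ANOVA from first principles: the ratio of between-group to within-group mean squares. The function name and the three groups of scores are invented for illustration. Obtaining a p-value additionally requires the F distribution, for which a statistics library such as SciPy (scipy.stats.f_oneway) could be used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA over lists of scores."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented construct scores for three user categories.
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(toy_groups)
print(f_stat)
```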

DISCUSSION
This section discusses the results of our study and provides answers to the paper's research questions. We also elaborate on the implications of our findings within the area of developer-centred security, namely the elaboration of strategies for boosting the participation of SO users in cybersecurity discussions.

Engagement and Self-Disclosure Behaviour (RQ1)
Our findings suggest that SO users with a tendency towards starting cybersecurity discussions disclose significantly less information in their profiles than those without such a tendency (Section 4.2). Similar observations were made by Kayes et al. [17] in a study about people's engagement in the Q&A platform Yahoo! Answers. The authors found correlations between users' self-disclosure behaviour (i.e., profile visibility preferences) and the frequency and quality of their contributions.
Particularly, individuals with a more restrictive profile tend to contribute more and with better content than those with a public one. Furthermore, such users also showcase higher retention levels (i.e., average time interval between contributions) and have a higher perception of answer quality.
On the other hand, our results also show that reactive users not only reveal more profile information than proactive ones, but also more than unengaged ones. Such a finding is to some extent aligned with prior research on identity formation in Q&A platforms. To a certain extent, participation in SO is driven by users' need for recognition within the platform, that is, in terms of points and badges that users can assign to each other based on the perceived quality of their contributions [41]. For instance, a study conducted by Adaji and Vassileva [2] showed that high-quality questions are frequently posted by users with complete profile information. Vargo and Matsubara [39] also made similar observations and concluded that profile visibility tends to decrease over time. Hence, we could assume that reactive users may also be driven by reputation or recognition when deciding whether to disclose more personal information inside their profiles.

Engagement and Privacy-Related Constructs (RQ2)
Unlike the results obtained from the users' profile information (Section 4.2), the analysis conducted over the survey data showed no significant differences in the elicited constructs (i.e., GPC, PCS, PCO, RSK, and PC) across unengaged, proactive, and reactive users (Section 4.3). We hypothesise that this can be related to the relatively good reputation of SO in terms of privacy and data protection, as opposed to OSNs like Facebook. Unlike the latter, SO has not received the attention of mainstream media due to major data-breach scandals or privacy violations. Hence, the role of privacy concerns and perceived risks may not be significant for users' participation and engagement within the platform.
The differences observed in self-disclosure behaviour were not reflected by its survey counterpart (i.e., the SD variable). Although such results may look inconsistent, prior research has also found discrepancies between people's reported and actual privacy behaviour. As mentioned in Section 2.2, this is often referred to as the "privacy paradox", a phenomenon frequently observed among users of OSNs. Our findings suggest traces of this paradox among SO users, especially when contrasting the outcome of the survey analysis with that of the users' profiles. Still, further research is necessary to determine whether the reported privacy behaviour outperforms the actual one across the three user categories. It would be of special interest to understand whether and to what extent the privacy paradox manifests among SO users, and how it relates to their overall engagement.

Implications and Recommendations
As privacy and security flaws in information systems grow steadily, it is very important to promote the exchange of privacy and security knowledge among software practitioners.To a large extent, the SO community is encompassed by early-career developers seeking for support and guidance in their engineering practices [22].Hence, it plays a key role in the dissemination and synthesis of cybersecurity expertise.However, our results show an apparent deficit in terms of answers to privacy-and security-related questions (Section 4.1).This can not only cause dissatisfaction to those asking such questions, but also damage the platform's value and usefulness in this regard.Having identified nuances in the self-disclosure behaviour across different user groups can be used to foster the exchange of privacy and security expertise.For instance, profile information could be leveraged to motivate the participation in cybersecurity discussions among SO users, by rooting pending questions to those users who are more likely to answer them (e.g., those with a less visible profile).Moreover, closed questions could be assigned to these users for further clarification, and thus increase their resolution chances.Such an approach could also contribute to existing Q&A recommender systems and frameworks (e.g., [40]) seeking to match forthcoming questions to potential respondents.That is, by incorporating profile visibility as a feature of their question-user matching algorithms.
Similarly, our results could be used to elaborate incentive strategies targeting unengaged individuals, for example by delivering cybersecurity suggestions to those SO users having a more visible profile. This approach is illustrated in Fig. 4, where a (hypothetically) unengaged user receives such suggestions as she seeks advice about an issue that is not cybersecurity-related. Here, suggestions come in the form of privacy- and security-related entries in the Overflow Blog [35], a website curated by SO that gathers essays, opinion articles, and podcasts about computer programming. Using different persuasive styles to approach certain user groups could further improve the chances of engagement and behaviour change [5,26]. For example, unengaged users could be nudged using a more authoritarian style (e.g., "Microsoft and other big tech companies urge developers to engage in cybersecurity training!"), whereas a consensual one could be applied to proactive and reactive users (e.g., "Many across the SO community agree: Cybersecurity training is essential for software developers!"). Likewise, differentiated training content (e.g., access to customised documentation and software artefacts) could be offered to each user group based on a further assessment of their technical skills.
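The routing idea discussed above, in which profile visibility serves as one feature of a question-user matching algorithm, can be sketched as follows. This is only an illustrative sketch and not the method of [40] or of any existing recommender: the `Candidate` structure, the number of profile fields, the tag-overlap measure, and the weights `w_tags` and `w_privacy` are all hypothetical assumptions. Visibility is approximated here as the fraction of filled-in profile fields, and less visible profiles receive a higher score, in line with the observation that proactive users tend to disclose less.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical view of an SO user for routing purposes."""
    user_id: int
    tags_answered: set = field(default_factory=set)  # tags the user has answered before
    visible_fields: int = 0                          # number of filled-in profile fields
    total_fields: int = 5                            # assumed number of profile fields


def visibility(c: Candidate) -> float:
    """Fraction of profile fields that are filled in (0.0 to 1.0)."""
    return c.visible_fields / c.total_fields


def score(c: Candidate, question_tags: set,
          w_tags: float = 0.7, w_privacy: float = 0.3) -> float:
    """Combine tag overlap with (1 - visibility); weights are arbitrary assumptions."""
    if question_tags:
        overlap = len(c.tags_answered & question_tags) / len(question_tags)
    else:
        overlap = 0.0
    return w_tags * overlap + w_privacy * (1.0 - visibility(c))


def rank_candidates(candidates, question_tags, top_n=3):
    """Return the top_n candidates, highest score first."""
    ranked = sorted(candidates, key=lambda c: score(c, question_tags), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    users = [
        Candidate(1, {"security", "python"}, visible_fields=1),
        Candidate(2, {"security", "python"}, visible_fields=5),
        Candidate(3, set(), visible_fields=0),
    ]
    for u in rank_candidates(users, {"security", "python"}):
        print(u.user_id, round(score(u, {"security", "python"}), 2))
```

In practice, such weights would have to be learned or validated empirically; the sketch only shows where a visibility feature could enter the ranking.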

LIMITATIONS AND THREATS TO VALIDITY
To a certain extent, the results of our study are subject to limitations related to its experimental design. In particular, the following construct, external, and internal threats may affect the validity of our findings and conclusions:
Construct threats stem from the degree to which scales, constructs, and instruments measure the properties they intend to [25]. Within the scope of our study, one construct threat arises from the approach employed to compute users' amount of self-disclosure. Profiles are not the only means to reveal private information in SO, as users can also disclose personal data inside questions, answers, or comments. However, we conducted our analysis exclusively over SO profiles as they are already adequate and extensive sources of self-disclosure evidence.
Another construct threat relates to the approach we followed to characterise users' engagement in SO. Indeed, engagement can also take a passive form, where a member (often referred to as a "lurker") may not contribute actively to a discussion but may still read it and take advantage of its knowledge. We left passive engagement out of the scope of this work as it cannot be determined from the information in our dataset. Still, future research will seek to characterise lurkers and their interaction patterns regarding cybersecurity discussions.
Regarding the psychological constructs elicited during the online survey, we have assessed their reliability by computing the corresponding Cronbach's Alpha coefficient. As mentioned in Section 4.3, we obtained values higher than 0.7 in all cases, suggesting a high internal consistency of these survey instruments.
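For reference, Cronbach's Alpha for a scale with k items is commonly computed as α = k/(k−1) · (1 − Σ item variances / variance of the total score). The following is a minimal sketch of that standard formula, assuming sample variances and a complete response matrix (one row per respondent, one column per item); the function name and the example scores are hypothetical and are not taken from our survey data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's Alpha for a list of rows (respondents) of item scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in responses):
        raise ValueError("all respondents must answer the same number of items")

    # Sample variance of each item (column) across respondents.
    item_var_sum = sum(variance(col) for col in zip(*responses))
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical 5-point Likert answers: 4 respondents, 3 items.
    sample = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 2, 3]]
    print(round(cronbach_alpha(sample), 3))
```

Values above 0.7 are conventionally read as acceptable internal consistency, which is the threshold referred to above.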
External threats refer to conditions that may affect the generalisability of the study results [11]. In our case, this relates to the discussions and profile samples we extracted from SO. Since the selection of cybersecurity questions was guided by the tags users assign to them, we may have included wrongly tagged questions in our analysis or missed some untagged ones. Nonetheless, since the SO community of curators often addresses such problems, we assumed the posts we retrieved were accurately labelled.
Another external threat to validity stems from the different sample sizes between Python and R discussions.To minimise this threat, we treated both samples as one without conducting any analysis on each specific language.
Likewise, having gathered the email addresses only through GitHub can also be seen as an external threat since it directly impacts the survey's sample size (we sent the survey only to those users whose emails we collected via GitHub). Nevertheless, we managed to gather enough contact details using this approach and distributed the survey to a fair number of potential respondents.
Internal threats relate to factors that may influence the independent variables of the study in terms of causality [11].
In this work, we have analysed the connection between users' self-disclosure practices and their engagement in cybersecurity discussions. However, as mentioned in Section 5.1, both self-disclosure and engagement practices can be influenced by users' need for recognition and popularity within the platform, among other intrinsic and extrinsic factors.
Hence, we acknowledge that our study is observational and, as such, cannot be leveraged to draw causal conclusions given the lack of controlled experimental ground truth data.

CONCLUSIONS AND FUTURE WORK
Secure software development largely depends on practitioners' abilities to detect and address potential cybersecurity threats. Still, prior work has shown that many consider security and privacy as secondary aspects of software projects [1].
Given the increasing popularity of Q&A platforms like SO, it is important to characterise and foster the exchange of cybersecurity expertise among their users in order to shape privacy- and security-savvy communities.

Fig. 1. Mining process followed to extract and generate both datasets.

Table 4. One-way ANOVA Test (profile and survey data).

Table 5. Survey Self-Reported Demographic Data.