Introduction

The act of tweeting, i.e. publishing short texts on X, formerly known as Twitter, can be considered somewhat different from the act of retweeting (or reposting), i.e. forwarding an original tweet (or post) that someone else has posted earlier. Writing an original tweet takes some effort and engagement with the topic, while retweeting can be done with a single click, requiring less engagement with the topic and the original source. Yet tweeting and retweeting are often treated as similar events, and for altmetric purposes they are usually neither analyzed nor counted separately. In some cases, one could argue, the number of retweets might be more important than the number of original tweets. Celebrities and social media influencers depend on the engagement of their followers to share and disseminate their message even further. On the other hand, disseminating new knowledge about scientific discoveries might benefit more from higher numbers of original tweets and true engagement, as retweeting does not require any engagement with, or even knowledge about, the original discovery or source. Users might also have different motivations for tweeting and retweeting scientific publications. Mohammadi et al. (2018) investigated motivations to engage with scientific tweets, finding that 85% of participants in their survey retweet to disseminate research; 42% do it because they want to let the author know that they have found the research interesting; 20% do it to save it for later access; and 15% to let the author know that they have read the article. This research investigates differences between the acts of tweeting and retweeting scientific articles and discusses the implications of the findings for the reliability and applicability of Twitter for research impact assessment.

Due to a variety of different affordances in tweets, it is not well known which types of events are more valuable for research impact assessment. Twitter uptake of publications is usually counted from the number of raw tweets, i.e., all tweets and retweets mentioning a specific scientific publication, which may include tweets and retweets sent by automated bot accounts or tweets that could be considered spam. It is also unclear whether automatically created or disseminated tweets should be considered as valuable as original human-created tweets from an impact assessment perspective. The overall aim of this study is to investigate differences between original tweets and retweets, and to discuss the implications of the findings for research impact assessment. For this purpose, we investigate the involvement of different user types, the usage of various types of textual and media features, and user engagement events in tweets and retweets, and then make comparisons across fields and over time. The overall aim of this research can be summarized in the following research questions:

  1. How do tweeting and retweeting differ between different Twitter user groups?

  2. Are specific content features of Twitter posts more prevalent in either tweets or in retweets?

  3. How do tweeting and retweeting change over time?

  4. How do tweeting and retweeting differ across different fields of science?

Background

Earlier studies have extensively investigated tweeting and retweeting of scientific publications. The results have shown great variations both between tweeting and retweeting, and between different fields of science. Ke et al. (2017) discovered high proportions of retweeting, as they found 66% (30,204) of the investigated 45,867 scientists retweeting. Holmberg and Thelwall (2014) also suggested that scholars are retweeting scientific products more than typical Twitter users. Haustein (2019) investigated tweets and retweets in 14 fields, discovering that overall 50% were retweets. The proportion of retweets with links to scientific articles has been found to vary across fields, as Physics (30%), Engineering and Technology (33%) and Arts (39%) had fewer retweets than Biology (54%), Social Sciences (53%) and Health and Humanities (both 52%) (Haustein, 2019). Didegah et al. (2018) investigated 6388 tweets to 300 articles in five broad fields identifying between 21.5% retweets in Life and Earth Sciences to 68.4% in Physical Science and Engineering. Similarly, Holmberg and Thelwall (2014) discovered that scientists retweeted between 22% in Digital Humanities and 42% in Biochemistry. Arroyo-Machado et al. (2021) found that original tweets constitute about 44% of all tweets (vs. 56% retweets) to Microbiology and Information and Library Science publications from 2012 to 2018. Hassan et al. (2019) found retweets to count for 39% (4061) of the investigated total of 10,345 tweets to 820 articles in Library and Information Science. Maleki (2018) identified only 30% retweets among tweets linking to Astrophysical Journal articles. Na (2015) found that retweets accounted for 19% of a total of 2016 tweets to Psychology articles, while later, Ye and Na (2018) found the retweets to account for 56.8% of the tweets linking to a set of Psychology publications. Tur-Viñes et al. (2018) found that 38.5% of tweets to 30 Spanish journals in Communication were retweeted. 
The results from earlier studies vary greatly, possibly depending on the chosen dataset and/or time of the study, but they all point to differences between tweeting and retweeting.

Some of the earlier studies have only reported the extent of original tweets to publications, such as the study of Díaz-Faes et al. (2019), which reported the mean original tweets by users with scientific tweets (5.36 tweets, median = 0). Other studies only report extent of retweets, such as the study of 162 publications in the journal Cell, reporting a median of 17.5 retweets (Cui et al., 2018). Fang et al. (2022) reported varying extent of retweets across fields, from 0.95 mean retweets in Physical Sciences and Engineering to 2.07 in Life and Earth Sciences, and 2.18 in Social Sciences and Humanities. Fang et al. (2020) introduced two indicators: Degree of Originality (proportion of original tweets to an article) and Degree of Concentration (proportion of retweets to an article). They reported about 28% median Originality and 20% median Concentration per article for 1154 highly tweeted publications with over 2.6 million tweets. More recently, Fang et al. (2022) analyzed different actions connected to about seven million unique original tweets linking to over two million WoS-indexed publications. They found that about 36% of original tweets were disseminated through about 13.5 million retweets, and 9% through 1.2 million so called quote retweets. Only 2% of the original tweets were disseminated through all these different forms of dissemination. Although liking was a very common event on majority of original tweets (52%), the overlap of retweeting and liking the same articles was fairly common (30%) (Fang et al., 2022), possibly suggesting that the actions are relatively similar.

The so-called reply and quote tweets contain, in addition to the original tweet content, some original content by the tweeters and can thus be considered as original tweets rather than retweets. Reply tweets can technically be identified by their connection to a conversation thread that links to a scientific publication. Kumar et al. (2019) discovered that 6% to 8% of the tweets linking to scientific publications were reply tweets in Medicine, Environmental Science and Chemistry. Didegah et al. (2018) found between 4.8% of tweets in Life and Earth Sciences and 24.5% in Mathematics and Computer Science to be replies.

Addressing other users by writing their @username in the tweet has been found to be more common than the use of reply tweets. Hassan et al. (2019) found 81% (8373) of a total of 10,345 tweets to 820 articles in Library and Information Science to contain @username mentions. Kumar et al. (2019) concluded that 8.3% of the tweets in Medicine contained @username mentions, 7.9% contained a summary of the research results, 4.2% contained some praise or criticism-related sentiments, and 1.3% expressed the tweeters’ desire to access the article. However, the majority of tweets to scientific articles tend to be neutral (76%) (Hassan et al., 2020).

Thelwall et al. (2013) found that between 17% of tweets linking to the journal Science and 87% linking to Nature only contained a mention of the title of the research publication. Didegah et al. (2018) reported that 92.4% of tweets in Life and Earth Sciences did not have any additional content beyond the title and article hyperlink. In Physical Science and Engineering and in Mathematics and Computer Science the proportion of tweets without any additional content were 16.6 and 57.6%, respectively. In a study of tweets to publications in the institutional repository of Georgia Southern University, Sergiadis (2018) found that 48% of the tweets mentioned only the title, 14% contained a summary of the research and 15.5% indicated some opinions of and interest towards the research. Kumar et al. (2019) also reported that majority of the tweets in Medicine (74.3%) and Chemistry (65.7%) do not contain any additional text other than the title of and a link to the article. Htoo and Jin-Cheon (2017) identify four themes of text in tweets containing scientific content and across academic and non-academic individuals and organizations, with majority of the tweets meant for sharing rather than discussion, promotion or access. These results show that often the tweets and retweets do not contain any additional content beyond the title of the shared scientific article and a hyperlink to the full article.

Results from recent studies on including media content in tweets that link to scientific publications suggest that adding visuals to tweets can significantly enhance users' engagement with them. Hoffberg et al. (2020) conducted a randomized controlled trial comparing the dissemination of tweets with visual abstracts to text-only tweets linking to 50 PubMed articles, finding that visual abstracts had significantly higher numbers of impressions, retweets and link clicks. Oska et al. (2020) conducted a prospective case–control analysis of 40 articles in the American Journal of Nephrology. The researchers tweeted the articles in three different formats: text only, with a key figure, and with a visual abstract. The results showed that visual abstracts attracted twice as many views as the tweets with key figures and text only, and that they gained five times more user engagement than the text-only tweets and 3.5 times more than the tweets containing a key figure.

Methods

To answer the research questions, in April 2023 we extracted all X (Twitter.com) posts (including both original tweets and retweets) linking to PLOS ONE articles. Because all PLOS papers are classified only as multidisciplinary in Scopus, we adopted and modified the classification made by Dimensions and used by altmetric.com (Australian and New Zealand Standard Research Classification 2020, ANZSRC) to assign subject fields to each article (Table 1). For the analysis we used the first subject field assigned to each paper, to avoid duplicate papers occurring in different fields when sampling for manual content analysis. For content analysis, this study used two datasets of tweets mentioning papers published in PLOS ONE:

Table 1 Population and random sample sizes of original tweets and retweets across fields in Dataset 2

Dataset 1

For the manual analysis of content and user type we aimed for a 95% confidence level, which required us to manually check 371 tweets and 371 retweets. The analysis of Dataset 1 focused on primary coding and categorization of both textual and media content features and of the types of Twitter users posting them.
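Sample sizes of this kind are commonly derived from a sample-size formula for estimating a proportion. The following is a minimal sketch, assuming Cochran's formula with a finite population correction, a 5% margin of error and maximum variability (p = 0.5). The source does not state which formula or margin was used, and the function name and the population size in the usage example are hypothetical values chosen for illustration only.

```python
import math
from statistics import NormalDist


def cochran_sample_size(population=None, confidence=0.95, margin=0.05, p=0.5):
    """Sample size needed to estimate a proportion (illustrative sketch).

    population: total number of items, or None for an unbounded population.
    margin:     assumed margin of error (not given in the source).
    p:          assumed proportion; 0.5 gives the most conservative size.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for an unbounded population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction shrinks the requirement for small populations.
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    # Hypothetical population of 10,000 posts, for illustration only.
    print(cochran_sample_size(population=10_000))
```

The required sample shrinks only slowly as the population grows, which is why samples of a few hundred items are typical for populations in the tens of thousands.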

Dataset 2

To enhance the analysis and statistical power in examining differences in the proportion of features utilized in original tweets versus retweets, we broadened our sample size by creating Dataset 2. Dataset 2 consists of 5898 randomly sampled tweets (2866 original tweets and 3032 retweets) for content analysis across specific subject categories. The sampling was performed as complementary to dataset 1 and only for the fields with fewer than 100 cases to ensure over 90% confidence interval per subject area (Table 1). This also helped to improve representation of other fields in the sampling, as the random sample in dataset 1 was skewed towards Medicine and Health Sciences and Biological Sciences (70%) due to larger population of tweets from these. This data was then used for making comparisons across two broader subject areas of Science, Technology, Engineering and Medicine (STEM) and Social Sciences and Humanities (SS&H).

Table 2 shows the volume of original tweets and retweets and distribution of random samples over time based on publication year of tweeted papers.

Table 2 Population and random sample sizes of original tweets and retweets over publication year in Dataset 2

Twitter user categorization

Twitter user categorization was done in an exploratory way by analyzing the content of profile descriptions. The profile descriptions were systematically classified manually by two coders, identifying frequently appearing words or specifically descriptive content (Hsieh & Shannon, 2005). This qualitative content analysis used inductive reasoning, allowing for the themes and categories to emerge from the classification process of user profiles (Zhang & Wildemuth, 2009), although user classification in Twitter is rather complex due to users using it for both personal and professional reasons (Bowman, 2015). The initial classification led to identification of 24 user types. A simplified two-level unit-function classification of Twitter users (inspired by many Twitter and Facebook user classification studies, e.g., Mohammadi et al., 2020) was incorporated (Table 3) to manually code user types in Dataset 1. At unit-level the classification identifies individuals, organization and other accounts (bots and unassigned); at function-level academic, professional, science communicator and other accounts (public individuals, bots, and unassigned). This classification resulted in nine combined unit and function categories.

Table 3 Twitter user categories and types of users included in each category

Tweet content categorization

Dataset 1 was used to code the content based on text and media features. In terms of content features, three categories were assigned: (1) only paper hyperlink and/or title; (2) inclusion of #hashtag, @username or other minor additions; (3) everything else that constitutes communication and human-like sentences (any phrase or sentence other than the title of the tweeted paper). Additionally, tweets and retweets were checked for hashtags and @user mentions, and for being Reply or Quote tweets. In Twitter, Quote tweets are a form of retweet, but as they include an original contribution from the user and can themselves be retweeted, we identify them as original tweets, while retweets of Quote tweets are counted in the retweet category.

All tweets and retweets were opened in a web browser to analyze the content features. This approach was taken because the Twitter API provides media content only as shortened hyperlinks. These shortened hyperlinks could lead to the linked paper, webpages, quoted tweets, images, or videos. After checking the tweets and retweets, three main codes were assigned for media content: (1) image, (2) video, (3) article link preview (i.e. small snippets of the article displayed from the publisher website and containing part of the paper title). Image content was then further classified as (a) from the paper (figures or snapshots of article content), (b) not from the paper (related to the subject of the paper or related to the author, e.g., while presenting at a conference), and (c) journal cover or database logo. The majority of the hyperlinks to the papers were not shown in publisher URL preview mode, and not all previews contained a thumbnail figure from the paper. If the article link preview contained a thumbnail figure, it was coded as an image.

Statistical analysis

Power Analysis

Differences in proportions between tweets and retweets were examined using the Pearson chi-squared test for independent samples, with a pooled standard deviation. Power analysis was used to assess the probability of detecting an effect when it exists. The sample sizes (for both datasets) and a 95% confidence level (α = 0.05) were used for the power estimates of the proportions. Power analysis was used to estimate the impact of the independent samples of original tweets and retweets on the observed proportions of content features, which were dummy-coded. This study uses the power value to gauge how reliably the observed differences between the two groups can be detected. In the present study, a power of 0.8 or above means that, at the 95% confidence level, a true difference of the observed size would be detected in 80% or more of cases, i.e. a false negative (type II error) is unlikely. The odds ratio was calculated to show the strength of the association between tweet type (tweet or retweet) and each content feature. Values close to 1 indicate that the odds of a content feature being present are about the same in tweets and in retweets.
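The three quantities described above can be illustrated with a short sketch. It assumes a Pearson chi-squared test on a 2×2 table without continuity correction and a normal-approximation power estimate for a two-sided test of two independent proportions. The source does not state which software or exact formulas were used, so the function name is hypothetical and the results need not match the paper's tables. The counts in the usage example are reconstructed from the rounded percentages reported for Dataset 2 (about 45% of 2866 tweets and about 63% of 3032 retweets containing communication sentences) and are approximate.

```python
import math
from statistics import NormalDist


def compare_proportions(x1, n1, x2, n2, alpha=0.05):
    """Chi-squared test, odds ratio and approximate power for two proportions.

    x1 of n1 items in group 1 (e.g. tweets) and x2 of n2 items in group 2
    (e.g. retweets) contain the feature. Assumes both groups contain items
    with and without the feature in total, so no table margin is zero.
    """
    a, b = x1, n1 - x1          # group 1: feature present / absent
    c, d = x2, n2 - x2          # group 2: feature present / absent
    n = n1 + n2

    # Pearson chi-squared statistic for a 2x2 table (1 degree of freedom).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Odds of the feature in group 1 relative to group 2.
    odds_ratio = (a * d) / (b * c) if b * c else math.inf

    # Normal-approximation power of the two-sided test at level alpha.
    nd = NormalDist()
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / n
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    se_alt = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    diff = abs(p1 - p2)
    power = (nd.cdf((diff - z_crit * se_null) / se_alt)
             + nd.cdf((-diff - z_crit * se_null) / se_alt))

    return {"chi2": chi2, "p_value": p_value,
            "odds_ratio": odds_ratio, "power": power}


if __name__ == "__main__":
    # Approximate counts reconstructed from rounded percentages in the text.
    print(compare_proportions(1290, 2866, 1910, 3032))
```

An odds ratio below 1 in this layout means the feature is less likely in the first group (tweets) than in the second (retweets).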

Proportion non-zero

Proportion non-zero was used as an estimate of the share of tweets and retweets with non-zero content features, with a 95% confidence interval (Thelwall, 2017). For each compared feature, a lack of overlap between the confidence intervals (error bars) of tweets and retweets is considered a significant difference, and overlapping confidence intervals are considered a non-significant difference.
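The overlap rule can be sketched as follows. This is an illustration only: it assumes a Wilson score interval, which is one common choice for proportions, whereas the source does not specify the interval type computed by the software. The function names are hypothetical, and the counts in the usage example are reconstructed from rounded percentages reported in the text.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a proportion (assumed method)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


if __name__ == "__main__":
    # Approximate counts for communication sentences in Dataset 2.
    tweets_ci = wilson_interval(1290, 2866)
    retweets_ci = wilson_interval(1910, 3032)
    # Non-overlapping intervals are read as a significant difference.
    print(tweets_ci, retweets_ci, intervals_overlap(tweets_ci, retweets_ci))
```

Comparing intervals for overlap is a conservative rule: two intervals can overlap slightly even when a direct test of the difference would be significant.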

Field equalized proportion non-zero

The field-equalized, world-normalised proportion non-zero (EMNPC) was used to compare the prevalence of various content features in tweets and retweets between STEM and SS&H fields, with values above or below 1 indicating that a feature was used more or less than the world average (= 1). Both the proportion non-zero and the EMNPC analyses were conducted with Webometric Analyst (lexiurl.wlv.ac.uk) (Thelwall, 2017).
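One reading of this indicator, assumed here rather than taken from the text, is that each field contributes equally: the per-field proportions non-zero of a group are averaged and divided by the average of the per-field proportions of the whole set (the "world"). The sketch below implements that reading only. The function name, field names and proportions are hypothetical, and the calculation in Webometric Analyst may differ in detail.

```python
def emnpc(group_props, world_props):
    """Field-equalized normalised proportion non-zero (assumed definition).

    group_props: {field: proportion non-zero within the group in that field}
    world_props: {field: proportion non-zero within the whole set in that field}
    Only fields present in both mappings are used, each with equal weight.
    """
    fields = sorted(set(group_props) & set(world_props))
    if not fields:
        raise ValueError("no fields in common")
    group_mean = sum(group_props[f] for f in fields) / len(fields)
    world_mean = sum(world_props[f] for f in fields) / len(fields)
    if world_mean == 0:
        raise ValueError("world proportion is zero in every field")
    # 1 means the group uses the feature as often as the world average.
    return group_mean / world_mean


if __name__ == "__main__":
    # Hypothetical proportions of posts containing an image, per field.
    group = {"Field A": 0.30, "Field B": 0.10}
    world = {"Field A": 0.20, "Field B": 0.20}
    print(emnpc(group, world))
```

Averaging per-field proportions before taking the ratio keeps fields with many tweets from dominating the comparison.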

Findings

Use of Different features in Tweets and Retweets by Various Twitter Users (Dataset 1)

In answer to the first research question, Fig. 1 shows the proportion of tweets and retweets posted by different Twitter user categories, and Table 4 shows the power of the differences between the proportions of tweets and retweets posted by each user type. The results show that Individuals account for about half of the tweets (55%) but almost 81% of the retweets, with academic individuals and other individuals accounting for 37.5% and 30.5% of the retweets, respectively. Bots and Organizations both have a significantly higher contribution to tweets (20.2% and 19.1%, respectively) than to retweets (1.6% and 12.1%). The estimated power (> 0.8) and odds ratios in Table 4 and the 95% confidence intervals in Fig. 1 suggest that the differences between the proportions of users tweeting and retweeting are significant for Individuals, especially academic individuals and other individuals (with a higher contribution to retweets), and for Bots (with a lower contribution to retweets), but not for the other categories.

Fig. 1

Proportion of tweets and retweets posted by user categories

Table 4 Proportion of various Twitter users’ contribution to tweets and retweets (Dataset 1) and the estimated power (α = 0.05) and odds ratios of proportion differences

Content features in Tweets and Retweets

To answer the second research question, we analyzed the presence of specific content features in the tweets and retweets. Table 5 shows the extent of different content features that appeared in tweets and retweets in dataset 1 (both n = 371). Table 6 shows how various Twitter user categories incorporate various content features in their tweets and retweets.

  • Communication, human-like sentences. In terms of text content (Table 6), majority of original tweets contain Communication sentences produced by Individuals (about 32%), that are mainly Academic Individuals (about 18%) and that belong to Academia in general (21%), but with significantly greater extent on retweeting Communication sentences (65%, 30.5% and about 28%, respectively).

  • Only # or @ added. Posting academic articles that contain hashtags and @user mentions is significantly more frequent among Individuals (about 13%), and Academia in general (about 10%), compared to their retweeting these (about 6% and 4%, respectively).

  • Only hyperlink and/or title. Tweets by Individuals and Bots containing only title and/or hyperlink to academic articles were more frequent than any other user category (10.5% and 10%, respectively), with only Individuals retweeting them almost at the same rate (9.7%). Amongst Individuals only Academic Individuals tend to retweet posts with only title or hyperlink (6.7%) significantly more than they tweet (2.4%).

  • Images. In terms of media content (Table 6), Individuals (about 10%) and Academia (about 7%) tweet images along with paper more than any other user category. However, Individuals retweet (about 46%) about five times more than they tweet (about 10%) images, while the difference is larger for Professional Individuals (7% vs. 0.8%, power > 0.9), and Other Individuals (about 19% vs. 3%, power = 1) than Academic Individuals (approximately 20% vs. 6%, power = 1). Academia in general (including individuals and organizations) includes images three times more in retweets (about 22%) than in tweets (about 7%).

  • Paper URL Preview. The paper snippet previews from publisher website are significantly more apparent in retweets by Individuals (about 30%) and Academic Individuals (15%) compared to their tweets (7.3 and 3.5%, respectively). Use of Paper URL previews by various user categories follows a similar pattern to image uptake in terms of significance of differences between tweets and retweets.

Table 5 Proportion of (Original) tweets and retweets in dataset 1 (both n = 371) containing various content features and power estimate (α = 0.05) and odds ratio of proportion differences
Table 6 Proportion of various Twitter users’ contribution to (Original) tweets and retweets across text and media content features

To provide further answer to the second research question, a sample of tweets and retweets (dataset 2) were examined for their content features, resulting in an increased estimated power when comparing proportion differences across original tweets and retweets (Table 7) for majority of the content features. Figure 2 gives the proportion of various content features at 95% confidence interval illustrating the significance of difference in proportions between tweets and retweets, using error bars.

Table 7 Proportion of (Original) tweets and retweets in dataset 2 containing various content features and power estimate (α = 0.05) and odds ratio of proportion differences
Fig. 2

Comparison between Original Tweets (n = 2866) and Retweets (n = 3032) in proportion of various content features in dataset 2. Colored factors indicate significantly higher proportion in the tweet type

Text Content

Although communication, human-like sentences appear in 45% of tweets, they are significantly more common in retweets, at about 63%. Over a third of tweets contain just a paper hyperlink and/or title (36%), compared to just below one-quarter of the retweets (23.8%). Overall, minor additions to content are less common, but more frequent in tweets (15%) than in retweets (6%). Hashtags are present in about a quarter of both tweets and retweets, but @user mentions are slightly but significantly more common in retweets than in tweets (about 27% vs. 23%, power > 0.9). Replies are roughly half as common among retweets as among original tweets (approximately 8% vs. 14%). Similarly, quote tweets are less prevalent among retweets than among original tweets (9.5% vs. 12.6%).

Media Content

Media content appears almost twice as often in retweets (42%) as in tweets (24%). The majority of the media are images. Video sharing in tweets linking to papers is rare (0.5%), although video content is somewhat more common in retweets (1.8%). Images from papers shared in tweets (11.5%), including both figures and snapshots of content (9% and 2.4%, respectively), constitute the majority of media content shared, and such content is about twice as common in retweets (about 22%; 17.1% and 4.8%, respectively). Paper link previews appear in about 16% of tweets but are significantly more common in retweets, at about 24%, especially when a figure thumbnail is shown (5.4% of tweets vs. 13.4% of retweets). Based on the power estimates, there is no statistically significant difference between tweets and retweets in the inclusion of images that are not from the scientific paper (both about 6%).

Comparing Trends in Usage of Content Feature in Tweets and Retweets

To answer the third research question, we analyzed changes in the inclusion of different content features in tweets and retweets over time. Figure 3 gives an overview of the changes in proportion of tweets and retweets containing various content features at 95% CI. Both tweets and retweets are independent samples and are not connected, except that they belong to the same journal, hence any alignment of patterns is not necessarily connected to the samples.

Fig. 3

Comparing usage of various text and media content in Tweets and Retweets over time based on papers’ publication years

Text Content Trends

Almost all years show a significantly higher proportion in tweets than in retweets for the categories “only paper hyperlink and/or title” and “hashtag, @user mention or minor additions”. Both text patterns fluctuate but show an overall drop, from about 40% to 20% in original tweets and from about 20% to around 5% in retweets. In contrast, communication, human-like sentences fluctuate but show an overall increase in both tweets (from about 20% in 2016 to 60% in 2022) and retweets (from about 50% in 2015 to 90% in 2022). Retweets have often been significantly more likely to be inaccessible (deleted, limited or sent from suspended accounts) than original tweets. Hashtag usage shows a consistent pattern across tweets and retweets over time, whereas the use of @user mentions has increased in retweets more recently while remaining stable over time in tweets (around 22%). Both reply and quote tweets are significantly more common among tweets (at about 20%) than among retweets (about 10%).

Media Content Trends

Images are the only feature that shows a consistent rise over time as well as a significantly higher presence in retweets (rising from about 30% in 2014 to 80% in 2022) compared to tweets (from about 10% in 2013 to 50% in 2022). However, figures from papers are only slightly more common in retweets than in tweets and have increased only slightly over time. Paper link previews show a persistent rise in tweets, from about 10% in 2013 to about 50% in 2022. Retweets previously contained more link previews than tweets, but this trend reversed from 2020 (about 35%), with the proportion dropping to 30% in 2022.

Content features in Tweets and Retweets across Subject Fields (Dataset 2)

To answer the fourth research question, we included the subject fields of the tweeted and retweeted scientific articles in our analysis. Figure 4 gives an overall comparison of tweets and retweets across STEM and SS&H fields for various content features, using the field-normalized proportion containing each content feature. The comparisons need to be made relative to the value 1, where values above 1 indicate significantly higher usage than the world average and values below 1 significantly lower usage. A more detailed report across 22 subject fields (eleven each for STEM and SS&H) is provided in Table 8 (for text content) and Table 9 (for media content). In the supplementary material, Figs. 5–26 provide the field-specific factors involved in tweets and retweets.

Fig. 4

Comparing field equalized proportion (EMNPC) original tweets and retweets across STEM and SS&H fields

Table 8 Proportion of Original tweets (retweets) in dataset 2 containing various textual content features across 22 disciplines and underlined when power of estimates for proportion differences based on Pearson Chi-Squared test was larger than 0.8 (α = 0.05)
Table 9 Proportion of (Original) tweets and retweets in dataset 2 containing various media content features across 22 disciplines and underlined when power of estimates for proportion differences based on Pearson Chi-Squared test was larger than 0.8 (α = 0.05)

Text Content across STEM and SS&H. Both “only paper hyperlink and/or title” and “hashtag, @user mention or minor additions” occurred significantly more often than the world average in original tweets to STEM papers (5 and 4 fields, respectively) and significantly less often in retweets to SS&H papers (2 and 5 fields, respectively). The reverse pattern is seen for “Communication, human-like sentences”, with significantly higher occurrence in retweets to SS&H papers (5 fields) and significantly lower occurrence in tweets to STEM papers (7 fields). Retweets to SS&H papers were significantly more likely to be inaccessible, in contrast to original tweets to STEM papers. Usage of @user mentions is also above the world average in retweets to SS&H papers and below the world average in original tweets to STEM papers. Hashtag use is significantly above the world average for STEM fields and below the world average for SS&H fields in both tweet types. Replies are significantly above the world average in all original tweets and below the world average in retweets to STEM papers. Quotes are significantly above the world average in original tweets to SS&H papers and below it in retweets to STEM papers.

Media Content across STEM and SS&H. Publisher link previews occurred significantly above the world average in retweets in SS&H (3 fields) and below the world average in all original tweets (4 fields). Any media content (16 fields), including any image (15 fields) and paper figures (6 fields), appears in retweets in both subject areas. Videos were exceptionally more prevalent than the world average in retweets to STEM (only in Physical Sciences, see Table 9). Snapshots from paper content are prevalent in retweets to SS&H (especially in Philosophy and Religious Studies, see Table 9). Images not from the paper appear above average in retweets to STEM papers, but not significantly so in any narrower field. Database logos or journal cover photos were significantly above average in original tweets to STEM fields. Images in thumbnail previews were significantly more prevalent in retweets to SS&H papers (3 fields) and less prevalent in all original tweets (7 fields). Retweets to STEM fields contained more tweets with users tagged in the image than the world average, and original tweets in both areas had fewer of them than the world average.

Discussion

This study investigated how different user types tweet and retweet on Twitter (presently known as X), and whether certain types of content features were more common in either tweets or retweets across different fields of science and over time.

Answering the first research question, and confirming some of the findings from earlier studies, the results showed that the contribution of different user types differs between tweets and retweets. Individual users account for a clearly larger share of retweets than of original tweets (85% of retweets vs. 62% of tweets). This disparity is significantly more apparent for academic individuals (38% of retweets vs. 27% of tweets) and other individuals (31% vs. 20%), while bots retweet significantly less than they tweet. Future research investigating bots in an altmetric context might therefore benefit from focusing on original tweets only. The findings indicate that tweets and retweets differ in many ways, possibly reflecting different kinds of impact or attention, and therefore they should be treated separately when considered as a measure of impact or attention. This has clear implications for the applicability of altmetrics for research assessment and for the use of altmetric indicators in general. Researchers, publishers, and others involved in either scientometrics research or research assessment should thus avoid treating all tweets as equal.

Some of the findings confirm those from earlier studies. For instance, the proportion of quote tweets was consistent across original tweets and retweets (in dataset one 8.9 and 8.1%, respectively) and with previous observations (9% reported by Fang et al., 2022). Similarly, the proportion of replies in original tweets (7.5%) was very close to the results in Fang et al. (2022). While our results were very similar to some of the earlier studies, the findings also point to variation across subject fields (answering the fourth research question) that needs to be recognized and considered if the data would be used for altmetric purposes. Future research could investigate the reasons for the observed differences.

Answering the second research question, the results showed significant differences in the text and media content of original tweets and retweets. Communicative sentences appeared up to twice as often in retweets as in tweets and grew over time (answering the third research question), suggesting that the majority of retweets (up to about 80%) are likely to be the result of actual human interaction. It would also appear that personal commenting, rather than just sharing the title of a scientific article, invites more engagement and thus wider dissemination and reach. Our results also showed that media content and hyperlink previews are significantly more common in retweets, with image usage showing consistent growth in both tweets (up to 50%) and retweets (up to 80%). The inclusion of such content in tweets appears to attract more engagement in the form of retweeting, thus increasing the reach of the tweeted research papers. As this finding casts some doubt on the usefulness of tweets and retweets as indicators of impact or attention, future research could further investigate the motivations for tweeting and retweeting, and the influence that specific characteristics of the tweet content or of the scientific paper itself may have on the attention it receives on Twitter.

Conclusion

This research investigated differences in user engagement and in the use of text and media content in tweets and retweets that link to scientific publications. The findings indicate significant differences, showing that the Twitter reach of scientific publications depends heavily on the chosen strategies for including communicative and visual content, such as images, figures, and other visual media, as tweets and retweets with visual content gain more attention and therefore wider reach. The gained attention may thus not be due to the significance of the research results or reflect public interest in the research, at least not alone. It is therefore clear that not all tweets are equal and that they should not all be treated alike for altmetric purposes. Further research is needed to identify how these differences between tweets and retweets are associated with the conventional impact of research and what the implications of the occurrence of different events are for research impact evaluation.