1 Objective

1.1 Background and data rationale

Social media has emerged as a valuable resource for dissecting foreign policy dynamics, offering insights into public opinion, diplomatic interactions, and the communication of policy decisions [1, 2, 3]. Among social media platforms (SMPs), Twitter stands out due to its concise nature and rapid information dissemination, making it a prime source for real-time foreign policy insights [4, 5]. Recognizing this, we introduce the TFPsocialmedia dataset, tailored for scholars interested in Turkish Foreign Policy and social media analysis for international politics.

1.2 Dataset overview

TFPsocialmedia comprises 180,302 tweets from 3597 accounts across 27 countries, focusing on political actors and commentators. It enables analysis of the communication networks and public perceptions surrounding Turkish Foreign Policy (TFP). The dataset spans 2007 to 2023 and supports statistical, network, and text analysis techniques. It documents how Turkey is portrayed and perceived within the global Twittersphere.

1.3 Research applications

The TFPsocialmedia dataset has been employed in the study by Mehmetcik et al. Beyond this, the dataset allows exploration of political discourse patterns, sentiment trends, and crisis response dynamics. By connecting Twitter trends with foreign policy events, researchers can examine how online discourse relates to developments in foreign policy [6].

1.4 Future directions

Our resource-rich dataset bolsters the study of TFP and extends to broader applications in international relations research. Its insights into communication networks and public perception dynamics offer valuable context for policy analysis. The TFPsocialmedia dataset’s ongoing updates and diverse account selection promise continued relevance and depth for researchers exploring multifaceted aspects of TFP.

In summary, the TFPsocialmedia dataset, illustrated by its application in a notable study, facilitates nuanced exploration of TFP perceptions and communication dynamics. As an adaptable and expanding resource, it holds potential to enrich foreign policy analysis within broader regional and global contexts.

1.5 Data description

The TFPsocialmedia dataset comprises tweets related to the keyword “Turkey,” sourced from a meticulously curated list of 3597 user accounts on Twitter. These accounts encompass individuals, organizations, and news outlets relevant to Turkey-related subjects. To ensure data relevance, the academic Twitter API was employed, utilizing the stream-by-user account option coupled with a keyword filter.


Data Collection Process:

  • User account selection: A comprehensive list of relevant Twitter user accounts was compiled, representing key stakeholders discussing Turkish Foreign Policy.

  • Stream-by-user account option: Leveraging the academic API, tweets from selected user accounts were streamed in real time, ensuring a continuous feed of their content.

  • Keyword filter: To refine the dataset’s focus, only tweets containing the keyword “Turkey” were retained, ensuring alignment with the research topic.

  • Data collection: The matched tweets were collected and stored, resulting in a dataset of over 220,000 tweets. Data collection began in July 2022 and has been ongoing since then.

  • Data cleaning: A rigorous data cleaning process eliminated irrelevant data. For instance, a text processing and filtering script separated unrelated “turkey” content, leaving only tweets relevant to Turkey the country. Because the data is continually updated, the dataset requires continuous monitoring and refinement.

  • Final dataset: The cleaned dataset encompasses more than 180,000 tweets, including attributes such as tweet text, date, and user information.

  • Updates: Our working process involves collecting tweets weekly, using specific time-range options. In practical terms, every new search starts from the point where the previous one concluded. This approach keeps data collection, data cleaning, and further analysis systematic and continuous over time.
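
The keyword filter, cleaning, and weekly update steps above can be sketched as follows. This is a minimal illustration only, not the script used to build the dataset: the cue-word list, the function names, and the fixed seven-day window are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical cue words hinting at the bird or the dish rather than the country.
# The actual cleaning script may use entirely different rules.
NON_COUNTRY_CUES = {"thanksgiving", "roast", "stuffing", "gravy", "sandwich", "recipe"}

KEYWORD = re.compile(r"\bturkey\b", re.IGNORECASE)
WORD = re.compile(r"[a-z']+")


def mentions_keyword(text: str) -> bool:
    """Keyword filter: keep only tweets containing the word 'Turkey'."""
    return KEYWORD.search(text) is not None


def is_about_country(text: str) -> bool:
    """Cleaning heuristic: drop keyword matches that contain a non-country cue word."""
    if not mentions_keyword(text):
        return False
    words = set(WORD.findall(text.lower()))
    return not (words & NON_COUNTRY_CUES)


def next_window(previous_end: datetime, days: int = 7):
    """Weekly update: the new search starts where the previous one ended."""
    return previous_end, previous_end + timedelta(days=days)


tweets = [
    "Turkey and Greece resume talks on the Aegean.",
    "Best roast turkey recipe for Thanksgiving!",
    "EU summit ends without a statement.",
]
kept = [t for t in tweets if is_about_country(t)]
print(kept)  # only the first tweet survives both steps

start, end = next_window(datetime(2022, 7, 1, tzinfo=timezone.utc))
print(start.date(), end.date())  # 2022-07-01 2022-07-08
```

A cue-word list of this kind is easy to audit but will miss ambiguous cases, which is one reason continuous monitoring of the cleaned data remains necessary.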

1.6 Data accessibility and transparency

The dataset, along with the code and calculations, is publicly accessible in a Figshare repository [7]. This transparent approach adheres to reproducible social data science practices. Sharing these resources fosters openness and collaboration within the research community, facilitating validation, extension, and deeper insights. The data’s availability promotes robustness and quality in social data science research (Table 1).

Table 1 Overview of data files/data sets

2 Limitations

  • Language bias: The dataset primarily features English-language tweets, potentially introducing a language bias that could limit its representativeness across non-English-speaking countries. While English serves as a prevalent international language, this limitation might hinder a complete global landscape depiction.

  • Account dominance: The dataset is dominated by accounts from the United States (861 accounts), European institutions (778 accounts), and the United Kingdom (429 accounts). This dominance could result from the English-language focus and the political and economic significance of these entities, potentially skewing the representation of other regions.

  • Selection bias: The dataset’s account selection process leans toward politically relevant accounts such as politicians, state officials, and political commentators. This selection bias could underrepresent alternative voices and grassroots perspectives, influencing the overall balance of viewpoints presented.

  • Twitter’s structure: Due to the dynamic nature of Twitter and the potential deletion of older tweets over time, our dataset may have better coverage for the periods in which it was collected compared to earlier periods.
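
To put the account dominance noted above in proportion, the short calculation below expresses the three reported group counts as shares of the 3597 accounts. It assumes that the three groups do not overlap and that all of them are counted within the 3597 total; the variable and function names are illustrative.

```python
TOTAL_ACCOUNTS = 3597  # total accounts reported for the dataset

# Account counts reported in the limitations list.
group_counts = {
    "United States": 861,
    "European institutions": 778,
    "United Kingdom": 429,
}


def share(count: int, total: int = TOTAL_ACCOUNTS) -> float:
    """Percentage of all accounts represented by `count`."""
    return 100 * count / total


shares = {name: round(share(n), 1) for name, n in group_counts.items()}
combined = round(share(sum(group_counts.values())), 1)

for name, pct in shares.items():
    print(f"{name}: {pct}%")  # 23.9%, 21.6%, 11.9%
print(f"Combined: {combined}%")  # 57.5%
```

Under these assumptions, the three groups together account for a little over half of all accounts, which should be kept in mind when comparing regions.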

2.1 Mitigation efforts

  • To address language bias, ongoing efforts involve incorporating non-English tweets using translation tools. This expansion aims to enhance global representation and increase the dataset’s inclusivity.

  • While acknowledging selection bias, the dataset’s focus on specific account categories aims to ensure data consistency and reliability, mitigating misinformation. Interpretation of findings should consider the potential influence of selection bias.

As part of our commitment to transparency, we actively acknowledge and address these limitations, striving to improve the dataset’s comprehensiveness and relevance for diverse research inquiries.