Reflections on study abroad: a computational linguistics approach
Study abroad and the associated sociocultural experience have been a subject of substantial interest to social science scholars and university administrators. Shedding novel light on the phenomenon, we draw on a corpus of student-authored reflective essays and apply machine learning methods for analysis of text-as-data to examine the features and the determinants of salient themes emphasized by students in their study abroad reflections. Our analysis identifies 18 different topics spanning the domains of distinctly cultural cognition, interaction with people, physical environment, and personal change. Specifics of the experience such as duration and location, timing of reflections, and observable student characteristics including gender, major, academic performance, extracurricular involvement, and socioeconomic status are all important determinants of students’ reflections. Different factors, however, matter differently with respect to students’ emphases on particular topics, a finding indicative of the complex nature of the study abroad experience.
Keywords: Study abroad; Reflections; Culture; Text-as-data; Machine learning; Structural topic model
The demand for international education and the volume of international student mobility have been steadily increasing over the past decades and are currently at an all-time high. Studying abroad is nowadays deemed a one-of-a-kind opportunity for cultural immersion and an ideal preparation for future productive engagement in the increasingly intercultural and globally interdependent world. Consequently, study abroad has become an integral part of higher education both in the USA and internationally.
Study abroad, however, is an expensive endeavor. Spurred by internal budgetary pressures as well as external requirements imposed by higher education accreditation bodies, administrators and scholars across college and university campuses have become increasingly interested in systematic appraisals of study abroad programs and their effects on students’ cultural, academic, and personal competencies (see, e.g., Anderson and Lawton). As a result, a growing volume of research scattered across social sciences, humanities, and policy literature on international education has attempted to characterize and assess students’ study abroad experiences.1
With regard to substantive findings, the resultant copious yet still largely inchoate body of research is strongly suggestive of the potential of the study abroad experience to exert an effect on a wide range of student outcomes such as intercultural communication skills, foreign language proficiency, appreciation of cross-cultural differences, and interest in international affairs [2, 3, 6, 7, 32, 33, 37]. In addition, time spent studying abroad has been shown to be associated with an elevated interest in scholarly pursuits, increased self-awareness, enhanced ability to cope with challenges, greater willingness to engage in prosocial behavior, and better informed career choice considerations [15, 25, 26]. As a further step in this line of inquiry, the literature has attempted to investigate how particular student outcome measures correlate with study abroad experience-related and student-specific variables, such as study abroad duration and student characteristics [3, 9, 37]. While there in general exist few settled findings, and even fewer comprehensive analyses of the role of potentially pertinent factors, existing evidence indicates that both experience- and student-specific traits can matter for study abroad outcomes.
Methodologically, the existing study abroad literature has predominantly relied on data generated on the basis of closed-ended surveys (among many others, see, e.g., Carlson and Widaman, Basow and Gaugler, Terzuolo) or qualitative interviews (e.g., Mendelson, Dolby, Walsh). The resulting analyses, combined with increasingly careful application of statistical methods, have notably improved our understanding of students’ study abroad experiences. Yet, reliance on closed-ended surveys and interview-based methods also renders the resulting studies inherently vulnerable to the well-known methodological drawbacks of those methods. Closed-ended surveys, for example, are sensitive to researchers’ subjective decisions concerning specific topical emphases as reflected in the design of the survey questions and can be plagued by participants’ response biases. Qualitative interviews, on the other hand, are resource-intensive and consequently often result in small sample sizes with adverse implications for the statistical power of ensuing studies.
Therefore, in addition to further addressing challenges associated with the use of existing methods, scholarship on the study abroad experience and its ramifications would benefit from incorporating new methodological tools for analysis of large textual datasets that are gradually becoming an integral part of the methodological toolkit of social scientists and humanities scholars [11, 14, 24, 31]. The application of such tools in the context of study abroad research promises to, on the one hand, alleviate (at least some of) the concerns with current methodological approaches and, at the same time, complement the existing research in terms of the nature of generated insights.
In the present paper, we follow this line of reasoning in an attempt to provide a hitherto unexplored perspective on students’ study abroad experience and cultural immersion. Instead of relying on closed-ended survey questionnaires or small-scale interviews, we use novel computational methods for analysis of text-as-data to quantitatively examine a comparatively large textual corpus of mandatory, open-ended reflection essays authored by a group of students who are enrolled at a selective US liberal arts college and who recently studied abroad. Our analysis first identifies groupings of salient themes, referred to as ‘topics’, about the study abroad experience as emphasized by students in their own words. The topics are uncovered by an unsupervised machine learning algorithm. They are thus devoid of human researchers’ preconceived notions about the study abroad experience that may affect the design of closed-ended survey questionnaires and the conduct of qualitative interviews. We then use the estimated topics in a regression-like framework to investigate whether, and if so how, students’ emphases on particular topics vary systematically with a comparatively broad range of observable experience and student characteristics available in our data. The resulting analysis casts a novel light on the question of what factors shape study abroad experiences. More generally, our investigation illustrates how machine learning-based computational techniques applied to text-as-data can be utilized to investigate the features and the determinants of cultural, societal, and personal reflections.
To examine students’ study abroad reflections, we estimate a topic model, a machine learning-based statistical tool for analysis of large textual corpora. As a class of generative probability models, topic models require a researcher to postulate a model of the data generating process and then use the data to determine the most likely values for the parameters within the model. To estimate the parameter values, topic models view texts as ‘bags of words’ and then apply an unsupervised machine learning algorithm that exploits the co-occurrence of words across documents to classify groups of words that tend to co-occur.
The resulting ‘topics’ are formally conceptualized as probability distributions over corpus vocabulary, while documents (chunks of text, for us student essays) are modeled as mixtures of topics.2 The name of each topic is assigned by the researcher upon scrutiny of key words most closely associated with the topic and after reading of the documents that feature a given topic particularly prominently. The topics themselves, however, are solely a product of model estimation. They are not obtained by matching words and documents to concrete thematic issues specified by the researcher prior to estimation (as they would be in a supervised model).
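The idea that topics are probability distributions over the vocabulary, and documents are mixtures of topics, can be illustrated with a toy calculation. The sketch below is not part of the original analysis: the vocabulary, the two topics, and the mixture weights are all invented for illustration, and the function name `word_probabilities` is hypothetical.

```python
# Toy illustration: topics are probability distributions over a vocabulary,
# and a document is a mixture of topics. All numbers are made up.
vocabulary = ["cultur", "food", "meal", "bike", "citi"]

# Each topic assigns a probability to every vocabulary word (each list sums to 1).
topics = {
    "food_topic": [0.20, 0.40, 0.30, 0.00, 0.10],
    "city_topic": [0.30, 0.00, 0.00, 0.40, 0.30],
}

# One document's topic proportions (they sum to 1).
doc_mixture = {"food_topic": 0.75, "city_topic": 0.25}


def word_probabilities(mixture, topics):
    """Return the document-level word distribution implied by the mixture."""
    n_words = len(next(iter(topics.values())))
    probs = [0.0] * n_words
    for name, share in mixture.items():
        for i, p in enumerate(topics[name]):
            probs[i] += share * p
    return probs


doc_word_probs = word_probabilities(doc_mixture, topics)
for word, p in zip(vocabulary, doc_word_probs):
    print(f"{word}: {p:.3f}")
```

In this toy document, ‘food’ receives the highest probability because the food-oriented topic carries three quarters of the mixture weight. Estimation runs in the opposite direction: the topic distributions and the mixture weights are inferred from observed word counts.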
Topic modeling is of course not a substitute for careful reading and nuanced interpretation of text. As a complement to conventional textual analysis, topic models are particularly suitable for analyses of large textual corpora when the principal aim of the analysis is to provide a macroscopic guide to the salient themes emphasized in a corpus. With the emergence of ‘big data’ and researchers’ interest in text-as-data methods, the use of topic models has become increasingly common across a broad range of academic disciplines. The Latent Dirichlet Allocation (LDA) model in particular has been fruitfully applied by both social scientists and humanities scholars [14, 16, 17, 23]. We use the Structural Topic Model (STM; Roberts et al. [27, 28]), a recent innovation that, unlike the LDA model, integrates document-level data into the analysis (see, e.g., Lucas et al., Farrell, Law, Grajzl and Murrell).3 The incorporation of metadata into the analysis both produces the best available estimates of the topics and, importantly, allows the researcher to examine the effect of metadata covariates on topical prevalence.4
In our analysis, we first estimate a set of topics, the principal themes in the corpus as identified by the unsupervised machine learning algorithm and not readily apparent to a human reader of many disparate documents (student essays). We then make use of the defining characteristic of the STM, the inclusion of metadata covariates, to investigate the effect of metadata covariates on students’ emphases on particular topics in a regression-like framework. The resulting statistical analysis allows us to address a central question for assessment of study abroad programs: What experience- and student-specific factors determine students’ perceptions of their study abroad experience, and in what way?
Our textual corpus consists of study abroad reflective essays written by students at a selective private liberal arts college, located in the South Atlantic region of the USA. About 22% of students at the university under our consideration choose to study abroad at some point during their undergraduate career.5 Upon expressing their intention to study abroad, students complete an internal application and meet with a study abroad advisor who assists them with selecting a suitable program. Prior to departing abroad, approved students undergo a comprehensive online as well as in-person orientation about studying and living abroad.
Each deployed study abroad student is formally required to produce two reflective essays about their study abroad experience. (Our analysis is therefore not subject to the selective response bias that often plagues studies utilizing data collected on the basis of voluntary surveys or interviews.) The essay requirements are made clear to students prior to departing for study abroad. The students are mandated to turn in their essays soon after returning to campus.
The first essay (early reflection) must be completed during the first week after arrival at the study abroad location. Students are expected to write a brief (approximately 250–500-word) commentary on any notable cultural and related experiences that they underwent upon their arrival at the study abroad location, taking into account the cultural goals they had set for themselves. The second essay (ex-post reflection) is a longer (roughly 1000-word) reflection to be completed no sooner than during the last 2 weeks of students’ study abroad. In that reflection, students are asked to review and reflect upon the first essay (early reflection) and comment on how the time away might have changed their initial cultural and other impressions. Students are further requested to comment on their cultural goals, how their time abroad changed them, and how their experience abroad is expected to affect their future on-campus experience.
Table: Length of reflection essays, in words (rows: early reflection essays; ex-post reflection essays; all reflection essays).
We imported the corpus into R for pre-estimation processing using R’s stm package. We processed the corpus using the textProcessor function to convert the text to lowercase, apply the Porter stemming algorithm, and remove stop words (common natural language words with very little meaning, such as ‘a’, ‘and’, ‘the’, etc.) as well as numbers and punctuation. The resulting dataset consists of 71,042 word tokens.
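The preprocessing steps described above can be sketched in a few lines. The following is a minimal Python illustration under stated assumptions, not the stm package’s textProcessor function: the stop-word list is a tiny invented sample, the example sentence is made up, and Porter stemming is deliberately omitted because a faithful stemmer is too long for a short sketch.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "i", "my", "was"}


def preprocess(text):
    """Lowercase, drop digits and punctuation, and remove stop words."""
    text = text.lower()
    # Keep only ASCII letters and whitespace; numbers and punctuation become
    # spaces. (Accented letters would also be dropped by this simple pattern.)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


essay = "In my first 2 weeks in Seville, I learned the rhythm of the city!"
tokens = preprocess(essay)

# The 'bag of words' keeps word counts and discards word order.
bag_of_words = Counter(tokens)
print(tokens)
```

With stemming added, tokens such as ‘learned’ would be reduced to stems such as ‘learn’, which is why the word lists reported for the estimated topics contain truncated forms like ‘cultur’ and ‘experi’.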
With student essays conceptualized as mixtures of topics, the prevalence of a particular topic will tend to vary across essays and students. Because different essays pertain to different timing of reflections and are authored by different students, who come from heterogeneous backgrounds and who studied abroad in different locations and for varied lengths of time, one would like the data generating process that underpins the computational identification of the topics to let topical prevalence vary with available essay- and student-level metadata. This is exactly what STM allows for, enabling the researcher to incorporate metadata into the estimation of topics and subsequently assess the relationship between topical prevalence and the metadata.
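To make the notion of relating topical prevalence to metadata concrete, the sketch below regresses made-up topic shares on a single binary covariate (essay timing) using ordinary least squares. This is a deliberate simplification and an assumption on our part: it treats the topic shares as known, whereas the STM incorporates covariates into the estimation itself and accounts for uncertainty in the estimated shares. All data values and the helper name `ols_slope_intercept` are hypothetical.

```python
from statistics import mean

# Hypothetical estimated shares of one topic in six essays (made-up numbers).
essays = [
    {"timing": "early", "topic_share": 0.30},
    {"timing": "early", "topic_share": 0.20},
    {"timing": "early", "topic_share": 0.25},
    {"timing": "ex-post", "topic_share": 0.10},
    {"timing": "ex-post", "topic_share": 0.05},
    {"timing": "ex-post", "topic_share": 0.15},
]


def ols_slope_intercept(x, y):
    """Simple one-regressor OLS: return (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Dummy variable: 1 for ex-post reflections, 0 for early reflections.
x = [1.0 if e["timing"] == "ex-post" else 0.0 for e in essays]
y = [e["topic_share"] for e in essays]

slope, intercept = ols_slope_intercept(x, y)
print(f"early mean share: {intercept:.3f}, ex-post shift: {slope:+.3f}")
```

With a single binary covariate, the intercept equals the mean topic share among early reflections and the slope equals the difference in mean share between ex-post and early reflections. Adding further covariates turns this into a multiple regression of the same form.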
Reflection essay timing: timing of student reflections. Values: early, ex-post.
Year of study abroad: calendar year of studying abroad. Values: 2017, 2018.
Length of study abroad: number of days spent studying abroad. Min. 40, max. 251, mean 114, std. dev. 18.
Country of study abroad: Argentina, Australia, Brazil, Chile, China, Costa Rica, Czech Republic, Denmark, France, Germany, Greece, Hungary, India, Nepal, Indonesia, Ireland, Italy, Japan, Jordan, Nepal, New Zealand, Peru, Russia, Rwanda, Singapore, South Africa, Spain, Sweden, UAE, Uganda, UK.
World region of study abroad: Africa, Asia, Eastern Europe, Western Europe, Central America, South America, Oceania.
Gender: male, female.
Part of university where the student has declared the choice of (first) major: commerce school, the college.
Binary variable for whether the student has a second major or not: has second major, does not have second major.
Cumulative GPA prior to studying abroad: min. 2.97, max. 4.00, mean 3.52, std. dev. 0.25.
Greek life affiliation: binary variable for whether the student is affiliated with Greek life organizations or an independent. Values: Greek, independent.
Varsity athlete status: binary variable for whether the student is a varsity athlete or not. Values: varsity athlete, not varsity athlete.
Binary variable for whether the student is a domestic (US) student or a foreign student (on an F-1 visa or a resident alien): foreign, domestic.
Binary variable for whether the student is a recipient of financial aid from the university or not: on financial aid, not on financial aid.
Binary variable for whether the student is a QuestBridge student or not: QuestBridge student, not QuestBridge student.
With each student in the sample having completed both required essays, exactly half of the essays are early reflections and the other half ex-post reflections. The mean length of time that student-authors spend studying abroad is 114 days, with the minimum and the maximum equal to 40 and 251 days, respectively. 71% of the students in our sample studied abroad in year 2017 and the rest studied abroad in 2018.
The students studied in 31 different countries covering all of the world’s major regions. The countries where students studied most commonly are Spain, Italy, Australia, UK, and Denmark. Western Europe is thus the most widely visited world region. One percent (two students) studied in multiple world regions.
Consistent with several prior study abroad studies (see, e.g., Stroud, Dotta), a disproportionately large number of students (68%) in our sample are females. 63% are majoring in business, accounting, economics, or politics (majors offered in the commerce school) and the remaining students in humanities, sciences, the arts, or social sciences other than economics and politics (majors offered in the college). 32% of students have a second major. The mean cumulative GPA as measured prior to studying abroad is 3.52, with standard deviation equal to 0.25. 81% of the students are affiliated with Greek life organizations (fraternities and sororities), a proportion broadly consistent with the overall membership in Greek life organizations on campus as a whole. 18% of students are varsity athletes, a percentage lower than the percentage for campus as a whole.
12% of students are on a student visa or have a resident alien status, a number that slightly exceeds the campus-wide proportion. 41% of students are recipients of some amount of financial aid and 2% (three students) are recipients of the QuestBridge scholarship for exceptional students from low-income families. The sample proportions of students on financial aid and QuestBridge scholars are lower than the campus-wide proportions of financial aid recipients and QuestBridge students, respectively.
The complete dataset (anonymized textual corpus and corresponding metadata) that we utilize in our analysis is not publicly available. For replication and further research purposes, it is available from the authors upon reasonable request.
What do students emphasize in their reflections?
Choosing the number of topics
An important modeling decision in estimating an STM is the choice of the number of topics to be estimated. There exists no definitive approach to determining the optimal number of topics for a corpus. The literature advocates the use of both computational statistical measures and human judgment (Roberts et al. [27:1068–1070; 29]). We first estimated models featuring between 5 and 30 topics. We compared the resulting models based on measures of goodness of fit (in particular held-out likelihood and size of residuals). We then narrowed our focus to the subset of estimated models that fit the data particularly well [29, 35, 38]. We contrasted these models based on the average scores on semantic coherence (a measure of internal consistency of topics) and exclusivity (a measure of the extent to which topics can be differentiated one from another). This allowed us to identify models that are not strictly dominated by other models based on the average semantic coherence and exclusivity scores. We then carefully inspected the estimated topics for a small set of models on the resulting semantic coherence-exclusivity frontier. We ended up selecting the model with 18 topics.
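The step of retaining only models that are not dominated on average semantic coherence and exclusivity amounts to computing a Pareto frontier. The sketch below illustrates that step with invented scores; the candidate topic counts, the score values, and the function name `pareto_frontier` are all assumptions for illustration and do not reproduce the values behind the selection described above.

```python
# Hypothetical average scores for candidate models (all numbers made up).
# Higher is better on both dimensions.
candidates = [
    {"k": 10, "coherence": -55.0, "exclusivity": 9.20},
    {"k": 14, "coherence": -60.0, "exclusivity": 9.50},
    {"k": 18, "coherence": -62.0, "exclusivity": 9.70},
    {"k": 22, "coherence": -70.0, "exclusivity": 9.60},
    {"k": 26, "coherence": -75.0, "exclusivity": 9.75},
]


def pareto_frontier(models):
    """Return models that no other model dominates.

    Model o dominates model m if o is at least as good on both scores
    and strictly better on at least one of them.
    """
    frontier = []
    for m in models:
        dominated = any(
            o["coherence"] >= m["coherence"]
            and o["exclusivity"] >= m["exclusivity"]
            and (
                o["coherence"] > m["coherence"]
                or o["exclusivity"] > m["exclusivity"]
            )
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return frontier


frontier_ks = [m["k"] for m in pareto_frontier(candidates)]
print(frontier_ks)
```

In this invented example the 22-topic model drops out because the 18-topic model scores higher on both dimensions. The frontier only narrows the field; choosing among the surviving models still requires reading the estimated topics.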
Topics and top words for estimated 18-topic STM
1. Comparing Cultures
Highest Prob cultur, london, peopl, differ, also, american, much, countri, citi, experi, learn, mani, british, one, interest, time, danish, class, semest, student, live, studi, environment, state, interact, divers, like, howev, can, understand
FREX london, environment, german, germani, british, danish, divers, applic, influenc, psycholog, varieti, freiburg, recycl, behavior, viewpoint, event, anticip, sustain, urban, exhibit, interact, berlin, reaction, westminst, design, respect, dessert, emphas, york, issu
2. Food Culture
Highest Prob spanish, spain, time, cultur, differ, eat, peopl, speak, also, sevilla, languag, day, famili, live, citi, meal, first, week, host, use, much, learn, dinner, get, spaniard, food, even, realli, think, abroad
FREX spaniard, spain, spanish, madrid, sevilla, lunch, sevill, eat, siesta, schedul, host, meal, dinner, mom, lifestyl, mother, andalusian, carmen, bread, tapa, semana, toledo, speak, santa, slower, argentina, dialect, feria, famili, late
3. Social Habits
Highest Prob peopl, time, danish, dane, denmark, can, get, cultur, much, like, also, realli, first, citi, way, bike, copenhagen, see, one, still, around, will, just, day, know, thing, famili, make, differ, state
FREX dane, denmark, danish, bike, copenhagen, smoke, babi, birthday, hygg, welfar, tax, host, nightlif, fashion, metro, children, young, age, drunk, belong, implement, drink, lane, happiest, januari, cozi, destin, dress, guarante, trust
4. Immersing in New Culture
Highest Prob rome, peopl, differ, cultur, life, time, italian, way, thing, new, place, experi, live, home, realiz, much, like, will, itali, first, citi, take, one, now, can, week, get, languag, american, learn
FREX rome, itali, roman, italian, trastever, chilean, lifestyl, sandwich, memori, shop, superfici, cabot, john, wash, siena, groceri, lack, european, acquir, money, miss, eastern, sunset, oppos, simpl, rack, piazza, simplic, valu, adjust
5. Work Culture & Experience
Highest Prob work, cultur, peopl, time, experi, learn, australia, class, differ, australian, think, also, one, day, abroad, will, student, first, understand, studi, life, new, mani, like, much, lot, countri, realli, way, see
FREX australia, rwanda, internship, australian, aborigin, collabor, irish, work, project, costa, late, team, assign, ngo, rica, lab, cathol, infrastructur, environ, compani, surf, colleg, rwandan, corpor, cowork, indigen, group, relax, creat, campus
6. Indigenous People & Land
Highest Prob new, zealand, cultur, time, maori, place, learn, differ, life, home, ive, much, peopl, abroad, like, make, studi, way, countri, state, see, back, student, mani, take, feel, one, howev, citi, think
FREX maori, zealand, dunedin, kiwi, auckland, island, otago, pacif, geolog, flat, land, farm, laid-back, outdoor, meat, new, cook, pakeha, refresh, recognit, landscap, protect, popul, sourc, mara, earth, groceri, respect, degre, healthi
7. Social Divides
Highest Prob dubai, peopl, differ, time, countri, uae, cultur, student, south, one, class, also, live, friend, howev, experi, arab, middl, women, life, jordan, see, american, mani, studi, new, abroad, first, like, east
FREX uae, dubai, arab, jordan, islam, segreg, durban, african, gender, east, south, middl, uganda, cape, aud, africa, debbi, femal, triniti, color, irish, women, lebanon, traffic, men, homestay, market, cast, kampala, ireland
8. Arrival & First Impressions
Highest Prob first, one, arriv, week, time, cultur, street, differ, citi, feel, even, day, walk, italian, american, experi, apart, student, scotland, new, florenc, town, seem, peopl, just, thing, immedi, abroad, almost, andrew
FREX valencia, scotland, duomo, andrew, confront, immedi, pull, crowd, varanasi, driver, town, tree, florenc, scottish, thirti, foot, rural, hot, anxieti, board, nyu, apart, arriv, highland, kathmandu, bus, bother, began, saturday, space
9. History & Art
Highest Prob class, cultur, histori, learn, experi, also, art, studi, abl, abroad, travel, differ, citi, time, scotland, andrew, florenc, univers, new, student, countri, peopl, understand, made, opportun, one, take, much, place, mani
FREX andrew, scotland, art, scottish, medic, colosseum, florenc, histori, highland, ancient, edinburgh, medicin, artist, modern, mosqu, wlu, museum, connect, entranc, sevilla, monument, lectur, scenic, debat, scienc, undergradu, castl, ruin, cathedr, knowledg
10. Institutions & Prosperity
Highest Prob chines, china, cultur, citi, differ, first, shanghai, one, languag, experi, time, peopl, week, mani, also, roommat, western, state, unit, expect, countri, nation, life, understand, live, econom, learn, feel, even, can
FREX shanghai, chines, china, singapor, beij, singaporean, western, roommat, inde, econom, skin, edit, cet, commerci, growth, toilet, undoubt, strict, incom, bed, whiten, govern, freedom, subway, achiev, room, prosper, selfi, nation, million
11. Exploring the Surroundings
Highest Prob sydney, australia, australian, time, student, differ, experi, cultur, much, univers, week, citi, one, first, get, live, like, abroad, new, studi, travel, day, peopl, mani, american, also, come, walk, know, will
FREX sydney, australian, australia, cairn, reef, aussi, bag, thailand, starbuck, univers, melbourn, mountain, beach, rainforest, hub, startup, coffe, surf, bay, opera, hall, hostel, outdoor, dive, music, backpack, uni, snorkel, although, intern
Highest Prob thing, realli, time, experi, one, get, peopl, like, citi, just, abroad, much, home, friend, new, will, think, travel, place, learn, class, differ, live, first, day, way, studi, even, also, back
FREX sweden, flight, florenc, term, honest, cancel, amsterdam, dublin, plane, realli, sinc, sat, pretti, stockholm, homesick, switzerland, love, glad, weekend, didnt, ice, figur, harder, anyth, fun, got, boss, cool, airport, nervous
Highest Prob class, student, learn, like, differ, citi, get, live, first, cultur, week, studi, realli, peopl, program, professor, much, abroad, also, one, school, feel, thing, even, sinc, can, mani, new, time, milan
FREX milan, math, budapest, hungari, hungarian, engin, edinburgh, low, professor, unsw, station, transport, navig, milanes, materi, scienc, third, nutrit, letter, greek, admit, argentina, mathemat, athen, sinc, librari, teach, min, program, unlik
14. Discovering Society & Friends
Highest Prob czech, first, peopl, pragu, friend, one, differ, learn, week, experi, languag, program, get, realli, time, cultur, will, even, citi, though, day, like, new, make, abl, also, much, place, use, food
FREX czech, pragu, republ, communism, buddi, bara, teacher, check, intens, tram, pari, card, nazi, post-communist, bed, program, cet, charl, regim, michael, alarm, server, orient, camp, strang, czechoslovakia, though, restaur, ethnic, communist
15. Conversing with People
Highest Prob time, cultur, peopl, will, much, russian, one, can, languag, learn, friend, back, differ, citi, even, experi, also, polit, like, world, feel, semest, abroad, convers, countri, japan, studi, made, greec, state
FREX russian, russia, japan, japanes, templ, greec, waiter, greek, preserv, athen, restroom, wont, west, custom, servic, surfac, literatur, partner, convers, tip, barcelona, waitress, water, afraid, hill, mediterranean, architectur, olya, rubl, polit
16. Relating to People
Highest Prob peopl, time, differ, life, thing, cultur, live, will, much, experi, like, feel, know, place, learn, way, one, can, think, just, come, want, countri, friend, greec, mani, see, world, also, understand
FREX chile, greec, india, soccer, game, cusco, aborigin, barcelonan, poverti, indian, peru, play, career, generous, sport, barcelona, cuba, catalonia, catalan, sexual, kid, refer, stadium, seen, path, movement, relationship, happier, context, valu
17. Personal Growth
Highest Prob abroad, time, experi, learn, studi, cultur, will, abl, differ, new, one, friend, first, french, student, semest, life, also, countri, travel, much, washington, lee, take, back, way, mani, class, goal, visit
FREX thank, pari, independ, abroad, climb, franc, busi, lee, washington, teach, classroom, french, aix, european, matur, bless, engag, intern, achiev, term, econom, latin, lesson, reward, alon, contin, union, set, polit, enorm
18. Coping with Challenges
Highest Prob french, like, peopl, time, cultur, one, learn, experi, way, differ, week, much, semest, walk, say, mani, franc, languag, feel, american, abroad, made, will, food, first, situat, friend, take, part, live
FREX nant, housem, french, bath, indonesia, mistak, franc, balines, bali, homeless, romain, oxford, thesi, hello, mother, tutor, honor, phase, terror, bakeri, linguist, smoke, macron, brief, croissant, verb, relish, situat, unlock, porch
We present two distinct word lists of the 30 most important words for each topic. The highest probability (highest prob) words are those that are most frequent for a given topic, but also non-exclusive and hence may be featured as highest probability words for multiple topics (e.g., ‘cultur’, ‘abroad’, ‘experi’). In contrast, FREX words reflect two criteria: they are on the one hand frequent for a given topic (as highest probability words) and at the same time relatively exclusive to that topic, with our choice of relative weights assigned to the former and the latter criteria equal to 0.25 and 0.75, respectively. As such, FREX words are particularly informative for purposes of identifying topic names and distinguishing between topics.8
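The contrast between highest probability words and FREX words can be made concrete with a small calculation. The sketch below follows the general idea of FREX as a weighted harmonic mean of a word’s within-topic frequency rank and exclusivity rank, with weights 0.25 and 0.75 as stated above. It is an illustrative approximation under our own assumptions, not the stm package’s code: the two-topic word probabilities are invented, and the function names `ecdf_ranks` and `frex_scores` are hypothetical.

```python
vocabulary = ["cultur", "spain", "tapa", "bike"]

# Made-up word probabilities for two topics (each row sums to 1).
beta = [
    [0.50, 0.30, 0.15, 0.05],  # topic 0
    [0.50, 0.05, 0.05, 0.40],  # topic 1
]


def ecdf_ranks(values):
    """Empirical CDF value of each entry: share of entries <= that entry."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def frex_scores(beta, topic, w_exclusivity=0.75):
    """Weighted harmonic mean of exclusivity and frequency ranks per word."""
    freq = beta[topic]
    # Exclusivity: the topic's share of a word's total probability across topics.
    excl = [freq[v] / sum(row[v] for row in beta) for v in range(len(freq))]
    f_rank = ecdf_ranks(freq)
    e_rank = ecdf_ranks(excl)
    return [
        1.0 / (w_exclusivity / e + (1.0 - w_exclusivity) / f)
        for e, f in zip(e_rank, f_rank)
    ]


scores = frex_scores(beta, topic=0)
by_prob = sorted(vocabulary, key=lambda w: -beta[0][vocabulary.index(w)])
by_frex = sorted(vocabulary, key=lambda w: -scores[vocabulary.index(w)])
print("highest prob:", by_prob)
print("FREX:", by_frex)
```

In this toy example ‘cultur’ is the most probable word in topic 0 but falls to third place under FREX because it is equally probable in the other topic, while ‘spain’ and ‘tapa’ move up because they are concentrated in topic 0.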
The topics summarized in Table 3 therefore provide a macroscopic, statistical, machine learning-based overview of the central ideas emphasized by the students in their study abroad reflections. Inspection of Table 3 suggests that the emphasized reflections entail one or more of the following broad groups of themes: commentaries entailing distinctly cultural perceptions, thoughts on interactions with people, observations on the physical environment, and considerations of a more personal nature. Further scrutiny of Table 3 reveals that, congruent with Fig. 1, cultural elements permeate nearly all of the topics; indeed, ‘cultur’ is among the top 30 highest probability words for 17 out of 18 topics. In what follows, we briefly discuss each topic in turn, justifying the assigned topic names. The Supplementary Appendix further justifies the choice of topic names and illustrates the underlying ideas expressed by the students by providing sample quotes from essays featuring a particular topic most prominently.
The first among seven topics that are distinctly centered on cultural reflections is Comparing Cultures. This topic entails comparative discussions of cultural diversity, variety, and different outlooks on life (FREX words include ‘divers’, ‘varieti’, ‘viewpoint’) in a wide range of local contexts (‘german’, ‘british’, ‘danish’), often with reference to either American culture (‘american’ is among top-ranked highest prob words) or to other cultures. The city of London is highlighted as a particularly prominent example of a place rich in cultural diversity. In contrast, Denmark and Germany are perceived as countries with a distinct environmentally oriented culture (‘environment’, ‘recycl’, ‘sustain’), especially when contrasted with the USA.
Food Culture encapsulates students’ reflections on food, eating habits, and corresponding lifestyle that they have observed during their study abroad (top-ranked highest prob words include ‘cultur’, ‘eat’, ‘food’, ‘meal’). Documents featuring this topic most prominently discuss food in a Spanish and Latin American context. Students reflect on eating schedules, food choices, as well as lifestyle and traditions that include gatherings of the entire family for meals during important holidays (FREX words include ‘siesta’, ‘famili’, ‘santa’, ‘semana’, ‘feria’). A number of reflections appear in the context of students’ interaction with their host families (FREX words include ‘host’, ‘mother’).
Social Habits features a description of local social habits and customs that readily stood out in students’ perceptions of the prevailing culture at their study abroad location. Many of the essays featuring this topic most prominently reflect on Denmark, but top-ranked essays also include observations from other countries and world regions. Consistent with the top-ranked FREX words for this topic, students discuss the use of bicycles as a common means of transportation, the approach to attending to young children, the tradition of celebrating birthdays, the lack of conversations among people on the metro, the practice of spending a lot of time with one’s family, local smoking and drinking habits, the vibrant city nightlife, and fashion styles.
The next topic, Immersing in New Culture, encapsulates reflective discussions of how an initially unfamiliar place with its distinct cultural norms and customs eventually became the students’ new home, resulting in an embrace of the local lifestyle (‘home’, ‘new’, ‘lifestyl’, ‘life’ are featured among the top-ranked highest prob or FREX words) and immersion into the local culture. In essays featuring this topic most prominently, students reflect on choosing to “live as the Romans do”, “adjust as opposed to stick to the old ways”, and “assimilate into a new culture” with respect to a variety of contexts, including shopping, ordering of meals, strolling through local markets (piazzas), having to dry clothes on a rack (rather than rely on a dryer), using local currency (as opposed to credit cards), and gradually gaining an understanding of local values, such as the appeal of slowing down one’s lifestyle.
Work Culture & Experience features students’ reflections on work culture at places they visited and on work experience that they gained during their study abroad. ‘work’, ‘culture’, ‘experi’ are all among the top-ranked highest prob words. This topic encapsulates students’ cultural insights based on work conducted in both in-class and out-of-class settings, such as during internships and in a lab. Reflections center on how different people and collaborators they have interacted with approach completing assignments and engage in group and team work.
Indigenous People & Land is heavily focused on indigenous, in particular Maori, culture (FREX words include ‘maori’ and ‘mara’ as stem of marae, the focal point of Maori village community) and on the relationship between indigenous groups and local population (FREX words include ‘pakeha’, the Maori word for white New Zealander of European descent; as well as ‘respect’, ‘recognit’). The discussion of indigenous culture is often linked to reflections on importance of land and preservation (‘land’, ‘landscape’, ‘protect’ are among top FREX words). Multiple students tied their thoughts expressed in this topic to the US treatment of American Indian populations.
Social Divides is about the many dimensions of social cleavage that the students have witnessed during their study abroad. Students comment and reflect on the manifestations of observed social segregation with respect to race, gender, caste, and social class (highest prob and FREX words include ‘segreg’, ‘gender’, ‘color’, ‘cast’). Many of the contexts that feature this topic prominently are from Islamic nations, hence ‘islam’ is among top-ranked FREX words. The discussion of (unequal) treatment of men and women, however, is also prominently featured in multiple essays discussing European countries.
The next six topics entail observations on the physical environment and the society. Cultural reflections are featured in a number of these topics, although often less evidently so than in the first seven topics. Arrival & First Impressions encompasses reflections upon arrival at the study abroad location (highest prob words include ‘first’, ‘arriv’, ‘immedi’) and the corresponding observations of the physical and social environment (FREX words include ‘driver’, ‘town’, ‘rural’, ‘bus’, ‘walk’, ‘crowd’, as well as notable tourist locations such as ‘duomo’). Students further reflect on settling in at their new place (‘apart’ among FREX words is stem of apartment) as well as the associated emotional responses (highest prob words include ‘feel’ and FREX words include ‘anxieti’).
History & Art is unmistakably about reflections on the history and the art of the places where the students were deployed (‘histori’ and ‘art’ are among both highest prob and FREX words). These observations stem from guided tours and students’ independent visits to notable locations (including museums), as well as from the history and history of art classes that they took in their study abroad program (while ‘medic’ among FREX words refers to the Medici dynasty from Florence, ‘medicin’ refers to observations made by pre-med students).
Institutions & Prosperity entails observations about societal institutions and economic prosperity, often in a comparative context vis-à-vis the Western world (hence ‘western’ among highest prob and FREX words; ‘skin’ and ‘edit’ refer to Chinese youth’s desire to edit selfies via whitening one’s skin in order to appear western). Multiple top-ranked highest prob and FREX words depict the functioning and prosperity of the economy (‘econom’, ‘commerc’, ‘growth’, ‘income’, ‘prosper’), as well as the existence of an advanced metro system (‘subway’) and types of toilets (‘toilet’) as indicators of economic development. At the same time, the students comment on the role and involvement of the government (‘govern’) and the conceptualizations of freedom.
Exploring the Surroundings contains comments based on students’ exploration of the city and the wider surroundings of their study abroad location. The availability of coffee shops is one example of such exploration (hence ‘coffe’ and ‘starbuck’ among FREX words). This topic further features students’ description of the many outdoor activities they had the opportunity to participate in (e.g., backpacking, diving, snorkeling) and the places they were able to visit (e.g., rainforest, beach, mountains, coral reef, opera hall).
The topic we named Travel is about students’ reflections on the logistics and the feelings associated with travel (‘travel’ is among highest prob words and FREX words include ‘flight’, ‘cancel’, ‘plane’, ‘airport’, ‘fun’, ‘love’, ‘nervous’, ‘homesick’). Students further emphasize the unique opportunity during their study abroad to travel to multiple locations over long weekends (hence ‘weekend’ among FREX words). Importantly, Travel is the only topic for which ‘cultur’ is not among the top 30 highest prob words.
Academics is unambiguously about students’ academic experience, narrowly defined. Top-ranked highest prob and FREX words for this topic include ‘class’, ‘student’, ‘learn’, ‘program’, ‘school’, ‘professor’, as well as ‘math’, ‘scienc’, ‘engin’, ‘librari’. This topic encapsulates students’ reflections on the nuances of academic programs they participated in and the academic culture as it pertains to the conduct of classes, professors’ teaching styles, use of facilities, and academic interactions with fellow students.
The following three topics are about different dimensions of students’ interaction with people. Discovering Society & Friends is a topic where students reflect on their early learning about the society of their study abroad location. Such early learning (‘orient’ as the stem of orientation is among top-ranked FREX words) often took place upon discovering an initial set of friends soon after arrival at the study abroad location. [‘friend’ is among top-ranked highest prob words and the stem of buddy (‘buddi’) is among the top-ranked FREX words; further FREX words include first names of persons (Bara and Michael).] These individuals introduced students to the basic characteristics of their society, such as the Czech Republic’s pre-communist, communist, and post-communist history and ethnic makeup. Essays that feature this topic prominently further refer to study abroad experience in Spain, France, and Germany.
Conversing with People depicts the many acts of holding conversations, a key channel of interaction with people. This is the only topic where ‘convers’ is among both highest prob and FREX words. This topic entails students’ descriptions of interactions with staff in the restaurants and customer service (‘waiter’, ‘waitress’ are among FREX words), as well as with and among new acquaintances and friends (‘friend’ is among highest prob words and ‘olya’ among FREX words captures the name of a person, Olya, with whom a student has had many conversations). The necessity to understand and speak the local language facilitates such interaction (‘languag’ is among top highest prob words and ‘russian’, ‘japanes’, ‘greek’ are among top-ranked FREX words) and allows one to learn about the society and culture.
In contrast, Relating to People captures ways of connecting with people at a deeper level, emphasizing the forming of relationships. Indeed, while ‘people’ is the top-ranked highest prob word, ‘relationship’ is among top-ranked FREX words. This topic is not focused on language and conversation per se, but rather on the many contexts within which students related to different individuals they have met. A commonly mentioned context is sports, with students commenting on how they played soccer or other games with local children or visited a sports match (FREX words include ‘soccer’, ‘game’, ‘play’, ‘stadium’, ‘kid’). It is through relating to people that students also learned about social and cultural issues such as poverty and importance of local identity, respectively (‘povert’, ‘catalan’ and ‘barcelonan’ are all among FREX words).
The final two topics in the corpus are students’ personal reflections on how the study abroad experience affected them as individuals. Personal Growth encapsulates students’ thoughts on ways in which studying abroad allowed them to mature and become more independent as well as reflect on lessons learned, goals achieved, and life more broadly (highest prob and FREX words include ‘abroad’, ‘time’, ‘experi’, ‘learn’, ‘independ’, ‘mature’, ‘life’, ‘lesson’, ‘reward’). In describing their growth as a person, many students express gratitude to individuals whom they met and who facilitated their experience (‘thank’ and ‘bless’ are among FREX words), as well as comment on how their personal growth will facilitate their on-campus engagement upon their return.
The last topic, Coping with Challenges, refers to the variety of challenges that the students faced and the approaches they took in striving to overcome them. This is the only topic where ‘situat’ (the stem of situation) is among both highest prob and FREX words. A further noteworthy FREX word is ‘mistak’. Essays featuring this topic most prominently highlight the initial lack of mastery of the local language (hence ‘french’ is among both highest prob and FREX words) that presented a challenge in both academic and social settings. The notion of experience (‘experi’ is among highest prob words) in this topic can have either a negative connotation, such as when a student was harassed by a homeless person (in one context), or a positive connotation, such as when interaction with a homeless person (in another context) allowed a student to overcome grief due to the passing of a parent. Students further illuminate stressful situations such as having to resolve differences with a housemate (hence ‘romain’ and ‘smoke’ among FREX words), the lack of comfort in Indonesian bathrooms (hence ‘bali’ and ‘indonesia’ among FREX words), concerns about terrorism and political unrest (hence ‘terror’ and ‘macron’ among FREX words), and even the challenges involved in trying to figure out the local custom when shopping in a bakery (hence ‘bakeri’ and ‘croissant’ among FREX words).
Assessing the role of experience- and student-specific factors
We next make use of the defining characteristic of the STM, namely the inclusion of metadata covariates into the estimation of topics, to examine the association between topical prevalence (a measure of the degree of authors’ emphasis on particular topics) and metadata covariates. We are thereby able to address the question of whether, and if so how, the students’ emphases on specific topics, summarized in Table 3, vary systematically with observable experience-specific factors and student-level characteristics.
Multiple theoretical perspectives on the processes of sociocultural adjustment, social learning, and development of intercultural competence suggest that study abroad-related perceptions, experiences, and outcomes may vary with participant- and experience-specific characteristics [3, 21, 32, 37]. We would thus expect both the student’s background and the nature of his or her study abroad experience to influence the student’s perceptions of the study abroad experience, and hence the extent of emphasis on particular topics. In the absence of comparable empirical studies we, however, refrain from articulating specific ex ante hypotheses concerning the relationship between particular metadata covariates and students’ emphases on individual topics. Instead, we let the data speak for themselves and then rely on an inductive approach to summarize the gist of our findings based on the obtained empirical evidence.
To conduct the analysis, upon estimating our 18-topic STM, we make use of the estimateEffect function available in R’s stm package to estimate regressions with essay-level proportion devoted to a topic as the outcome and metadata variables as covariates (see Roberts et al. [29, 30]). We present our results in the form of figures. Specifically, we plot mean differences in estimated topic proportions for two different values (a ‘treatment’ and a ‘control’) of a given document-level covariate of interest.9 We display the point estimates and the corresponding 95% confidence intervals. For easier readability, we customize the horizontal axis for each figure.
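To make the plotted quantity concrete, the following is a minimal Python sketch of a difference in mean topic proportions between a 'treatment' and a 'control' group, together with a normal-approximation 95% confidence interval. It is an illustration only and does not reproduce estimateEffect, which fits a regression with covariates and can account for uncertainty in the estimated topic proportions. The function name `mean_difference_ci` and the sample proportions are hypothetical and not taken from our data.

```python
import math
import statistics


def mean_difference_ci(treated, control, z=1.96):
    """Return (difference, lower, upper) for mean(treated) - mean(control).

    Uses a two-sample normal approximation with unequal variances.
    This is a simplification: it ignores other covariates and any
    uncertainty in the topic proportions themselves.
    """
    diff = statistics.mean(treated) - statistics.mean(control)
    standard_error = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, diff - z * standard_error, diff + z * standard_error


# Hypothetical essay-level proportions devoted to a single topic.
treated = [0.12, 0.08, 0.15, 0.10, 0.11]  # e.g., covariate equal to 1
control = [0.05, 0.07, 0.04, 0.06, 0.08]  # e.g., covariate equal to 0

diff, lower, upper = mean_difference_ci(treated, control)
print(f"difference = {diff:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero corresponds to what the text describes as a statistically significant effect on topical prevalence at the 5% level.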
The resulting machine learning-based statistical analysis is informative about which factors, and in what way, may influence students’ perceptions of their study abroad experience. However, we caution against readily interpreting our results as purely causal in nature. Despite the wide range of covariates that we include in our analysis, there may exist omitted variables that are correlated with our metadata covariates and at the same time exert an effect on students’ perceptions of various study abroad experiences. Moreover, the sample of students who choose to study abroad is likely a non-random sample of all students (see, e.g., Goldstein and Kim, Stroud). (Indeed, as noted in the description of metadata above, the composition of our sample with respect to various student characteristics does not fully reflect the composition of the entire student body at the university of our consideration.) If so, the unobservables that influence students’ perceptions of their study abroad experience may be correlated with unobservables that determine whether students opt to study abroad in the first place (an example may be student-specific extent of extraversion), a scenario leading to classic sample selection bias.
With these caveats in mind, we proceed as follows. We first illuminate the role of the experience-specific covariates. We then turn to examining the role of student-specific socio-demographic factors and other observable student characteristics.
Timing of reflections
Length of time spent studying abroad
We next examine whether, and if so how, the length of time spent abroad matters for students’ emphases on specific topics. Evidence in the scarce literature on the subject suggests that the length of time may exert an effect on a variety of study abroad outcomes (see, e.g., Dwyer). Because students were expected to record early reflections during their first week abroad, early reflections should not be influenced by the total length of time spent abroad. In exploring the effect of the length of time spent abroad, we therefore condition the analysis only on students’ ex-post reflection essays. We model the length of time abroad with a simple binary variable that splits the sample of student-authors into a subsample of authors for whom the length of time abroad exceeds the sample median value (114 days) and a subsample for whom the length of time abroad is smaller than the sample median value.
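The median split described above can be sketched in a few lines of Python. The durations below are hypothetical and were chosen only so that their median equals the 114 days reported in the text; the function name `above_median_indicator` is likewise illustrative. Assigning observations exactly at the median to the lower group is an assumption of this sketch, not a documented choice.

```python
import statistics


def above_median_indicator(days_abroad):
    """Return the sample median and a 0/1 indicator per observation.

    The indicator is 1 when the length of stay strictly exceeds the
    sample median and 0 otherwise (ties go to 0 by assumption).
    """
    median = statistics.median(days_abroad)
    indicator = [1 if days > median else 0 for days in days_abroad]
    return median, indicator


# Hypothetical lengths of stay, in days.
days = [21, 35, 98, 114, 120, 133, 150]

median, long_stay = above_median_indicator(days)
print(median, long_stay)
```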
Study abroad location
As noted in the discussion of our empirical approach, the non-random character of students’ choices with respect to study abroad programs and locations prevents us from being able to ascertain to what extent these associations capture the causal effect of the study abroad location versus the effect of unobserved student characteristics that determine where a student chooses to study abroad. In other words, it is certainly possible that studying in less developed regions of the world renders a student comparatively more attentive to the many manifestations of the social cleavages that are on average less apparent in other parts of the world. Alternatively, however, students who are inherently relatively more receptive to social issues may be particularly eager to study abroad in locations where societal divides are especially ubiquitous.
Four broad conclusions can be drawn on the basis of the above-surveyed evidence on the effect of metadata covariates on topical prevalence. First and foremost, the timing of reflections, experience-specific factors, and student characteristics in general clearly exhibit an effect on students’ study abroad reflections. In other words, our empirical results suggest that students’ study abroad experience is critically shaped both by the study abroad environment and by the student’s individual background.
Second, different experience- and student-specific factors affect the emphasis on particular topics differently. For example, the emphasis on observations about the many dimensions of societal cleavages (Social Divides) is significantly influenced both by study abroad location and by student’s socioeconomic status. Neither study abroad location nor student’s socioeconomic status, however, exhibits an effect on topical prevalence of Arrival & First Impressions. The prevalence of the latter topic is in turn significantly determined by the timing of reflections, a metadata covariate that, together with having studied in Western Europe, exhibits a statistically significant effect on topical prevalence for the largest number of topics (altogether seven).
Third, the particular observable student characteristic that exhibits an effect on topical prevalence for the largest number of topics (altogether six) is, interestingly, student’s broad choice of academic major. In the absence of more detailed student-level controls, this finding is at least in part likely explained by existence of unobservable student characteristics that shape both the student’s choice of major as well as his or her perceptions while studying abroad.
Fourth, the prevalence of all but two of the 18 estimated topics is statistically significantly shaped by at least one experience-related or student-specific factor. Only the prevalence of the topics Conversing with People and Travel is not statistically significantly related to specific values of any of the metadata covariates that we explored. The emphasis on two further topics, Comparing Cultures and Coping with Challenges, is statistically significantly influenced by a single respective covariate. These findings are consistent with the interpretation that especially travel and holding conversations with people, but also engaging in cultural comparisons and encountering challenges, are ubiquitous elements of virtually any study abroad experience, regardless of the location, timing of reflections, and student’s background. Among these topics, Travel in particular is the second most prominent topic in the corpus (see Fig. 2).
In this paper, we have taken a new route to analyzing students’ study abroad experiences, a subject of direct interest to scholars across multiple social science disciplines as well as international education practitioners. Drawing on a corpus of mandatory essays authored by students at a selective, private liberal arts college in the USA in order to reflect on their study abroad experiences, we have applied tools for quantitative analysis of text-as-data to characterize the salient themes emphasized in students’ reflections.
Our analysis uncovers 18 different topics that span multiple domains, including reflections on culture, observations on the physical environment, interaction with people, as well as comments on personal challenges and change. We then demonstrate that both the specifics of the study abroad experience, such as the length of time spent abroad and the study abroad location, and the deployed student’s background, including his or her socioeconomic status, are important determinants of the student’s emphasis on particular topics and, thus, define his or her study abroad experience. Furthermore, different experience- and student-specific factors affect students’ emphases on particular topics differently. Our analysis thereby provides a unique insight into the complex nature of the study abroad experience and the web of factors that influence it.
Future work should attempt to address issues of causality, as well as examine to what extent our findings apply to study abroad students from other universities and colleges. We hope that our application of computational methods for analysis of text-as-data as a thus far unexplored lens for investigation of study abroad experiences will stimulate further research on study abroad, international education, and intercultural immersion more generally.
A Scopus search on 'study abroad' appearing in the title, abstract, or keywords identifies more than 2000 published contributions, with the vast majority of publications dated after year 2008. An analogous search using Google Scholar reveals many more works. For a necessarily limited set of sample contributions and further references, see, e.g., Carlson and Widaman, Ryan and Twibell, Dwyer, Rundstrom Williams, Hadis, Anderson et al., Paige et al., Collentine, Norris and Gillespie, Basow and Gaugler, and Terzuolo.
For example, the words 'dog' and 'bark' will appear more often in a topic about dogs, 'cat' and 'purr' in a topic about cats, while 'pet' and 'vet' may appear roughly equally in both. Documents feature multiple topics in different proportions. A document that is 20% about cats and 80% about dogs will tend to feature four times as many dog words as cat words.
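The arithmetic behind this example can be checked with a short Python sketch. The per-topic word probabilities below are hypothetical values chosen for illustration; only the 80%/20% mixture comes from the example itself. The sketch mixes the two topic word distributions by the document's topic proportions and compares the resulting share of dog-specific words with that of cat-specific words.

```python
# Hypothetical word distributions for two topics; each sums to 1.
topic_words = {
    "dogs": {"dog": 0.4, "bark": 0.4, "pet": 0.1, "vet": 0.1},
    "cats": {"cat": 0.4, "purr": 0.4, "pet": 0.1, "vet": 0.1},
}

# Document-level topic proportions from the example in the text.
document_mix = {"dogs": 0.8, "cats": 0.2}


def expected_word_shares(mix, topics):
    """Mix the topic word distributions by the document's topic shares."""
    shares = {}
    for topic, weight in mix.items():
        for word, prob in topics[topic].items():
            shares[word] = shares.get(word, 0.0) + weight * prob
    return shares


shares = expected_word_shares(document_mix, topic_words)
dog_share = shares["dog"] + shares["bark"]
cat_share = shares["cat"] + shares["purr"]
ratio = dog_share / cat_share
print(f"dog-word share / cat-word share = {ratio:.1f}")
```

Under these assumed distributions the shared words 'pet' and 'vet' receive the same expected share regardless of the mixture, which is why they are uninformative about which topic dominates a document.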
See https://www.structuraltopicmodel.com for an updated list of published applications of STM.
For an exposition of the formal statistical structure of the STM and computational aspects of estimation, see Roberts et al.
This figure does not include shorter spells abroad as part of regular coursework offered by resident faculty.
We dropped four essays of students who by the time of completion of our data collection had not yet turned in both the early and the ex-post reflection essay in the required format.
In studying the word lists, it is important to keep in mind that STM-based estimates of topics are driven by correlations across documents in the occurrence of words. Thus, estimated word lists will also contain words that are on their own not particularly informative about the core ideas underlying a topic. (For example, 'can' and 'will' are among highest probability words for several topics.) Indeed, this is an aspect of STM that human readers cannot easily match. An author's use of a topic might rely on specific combinations of words in patterns that a human reader might find hard to discern.
Roberts et al. [30: 12] note that to implement the regressions, "…the topic model should contain at least all the covariates contained in the estimateEffect regression". Accordingly, the set of metadata variables that we utilize to estimate the effect of specific metadata covariates on topical prevalence conceptually coincides with the set of metadata covariates that we utilize to estimate the topics. Practically, to estimate the effects associated with categorical and numeric variables that take on multiple values (such as, e.g., Region and GPA; see Table 2), we define and utilize in the analysis corresponding binary variables that highlight the effects of interest (e.g., Africa vs. other regions; above median GPA vs. below median GPA).
We are grateful to Mark Rush and Marc Conner for making this project possible. Griffin Noe provided excellent research assistance. An anonymous reviewer offered valuable comments and suggestions on an earlier draft of the manuscript.
Compliance with ethical standards
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
- 1. Anderson, P. H., & Lawton, L. (2011). Intercultural development: study abroad vs. on-campus study. Frontiers: The Interdisciplinary Journal of Study Abroad, 21, 86–108.
- 5. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
- 9. Dwyer, M. M. (2004). More is better: The impact of study abroad program duration. Frontiers: The Interdisciplinary Journal of Study Abroad, 10, 151–163.
- 11. Gentzkow, M., Kelly, B. T., & Taddy, M. (2017). Text as data. Journal of Economic Literature. https://www.aeaweb.org/articles?id=10.1257/jel.20181020 (forthcoming).
- 15. Hadis, B. F. (2005). Why are they better students when they come back? Determinants of academic focusing gains in the study abroad experience. Frontiers: The Interdisciplinary Journal of Study Abroad, 11, 57–70.
- 18. Institute for International Education. (2018). A World on the Move: Trends in Global Student Mobility. New York, NY: Institute for International Education (IIE).
- 19. Law, D. S. (2016). Constitutional archetypes. Texas Law Review, 95, 153–243.
- 22. Mendelson, V. G. (2004). ‘Hindsight is 20/20’: student perceptions of language learning and the study abroad experience. Frontiers: The Interdisciplinary Journal of Study Abroad, 10, 43–63.
- 29. Roberts, M. E., Stewart, B. M., & Tingley, D. (2016). stm: R package for structural topic models. Journal of Statistical Software. https://cran.r-project.org/web/packages/stm/vignettes/stmVignette.pdf (forthcoming).
- 30. Roberts, M. E., Stewart, B. M., Tingley, D., & Benoit, K. (2018). Package ‘stm’. Reference manual, version January 28, 2018. https://cran.r-project.org/web/packages/stm/stm.pdf.
- 35. Taddy, M. A. (2012). On estimation and selection for topic models. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, pp. 1184–1193.
- 36. Terra Dotta. (n.d.). Tackling the Gender Gap in Study Abroad. http://www.terradotta.com/articles/article-Tackling-The-Gender-Gap-In-Study-Abroad-3-15.pdf.
- 38. Wallach, H. M., Murray, I., Salakhutdinov, R., & Mimno, D. (2009). Evaluation methods for topic models. In ICML ‘09 Proceedings of the 26th Annual International Conference on Machine Learning, pp. 1105–1112.