Elites are those who can impose upon others the duty of filling out questionnaires … those whose sociology no one dares to write (Schmitt 1991:181, authors’ translation)


For elite studies to be publicly relevant they need to engage and convince. To convince, the definition and demarcation of the elite must be theoretically sound and methodologically transparent, and to engage, we need detail and depth in the descriptions. We should strive to depict elites in their relations to each other, describe the ties they form in their affiliations, and lay out their careers and their characteristics. In this contribution, our ambition is to outline a quantitative research framework that meets this need for thick description: a comprehensive prosopographical approach, based on relational sociology.

By prosopography we refer to a particular approach to data collection and conceptualisation, with a long tradition in the field of historical research. Doing prosopography is the construction of “a collective biography, describing the external features of a population group that the researcher has determined has something in common (profession, social origins, geographic origins, etc.)” (de Ridder-Symoens, quoted in Verboven et al. 2007, p. 39). That is, “an inquiry into the common background characteristics of a group of [...] actors by means of a collective study of their lives”, i.e. “to establish a universe to be studied, and then to ask a uniform set of questions - about birth and death, marriage and family, social origins and inherited economic position, place of residence, education, amount and sources of personal wealth, occupation, religion, experience of office and so on” (Stone 1971, p. 46).

In elite studies that follow the tradition of C. Wright Mills and Pierre Bourdieu, the combination of relational sociology and prosopography is common in the studies of fields, networks, and careers. Here we will briefly introduce three quantitative varieties of relational analysis that are well suited for prosopographical data: multiple correspondence analysis, social network analysis and sequence analysis. We argue that these three methods strengthen and corroborate each other and, when combined into a single research design, deliver an engaging and convincing description of the constellations of power and elites. First we discuss the problem of defining elites, and how data analysis can be introduced in that step. Then we introduce the three types of relational analysis, before finally introducing in detail the different data formats required by these analyses.

Defining the Population, an All but Trivial Question

When conducting elite prosopographies, the recurring problem in all relational sociology, that of boundary specification (Emirbayer 1997), becomes particularly pertinent. Who is part of the elite, the field or the network? It is important to reflect thoroughly on what makes a group. What is the ‘thing’ they have in common? Those at the top of a field are often qualitatively different from those just a few steps down the ladder, and as a consequence any description relying on an overly inclusive sample will miss how the top distinguish themselves from the riff-raff.

Within the methodological tradition of elite studies, population boundaries are often specified with either a positional, reputational or decisional approach (Hoffmann-Lange 2018). The positional approach, which remains the most widely used, identifies individuals at the top of formal hierarchies regarded by the researcher to have the resources meriting inclusion in elites. This could for instance be top managers in the largest corporations in a society. The key issue for the positional approach is how to select the organisations, and who and how many to select from each organisation. This may be particularly difficult when the hierarchy is not easily measured, for instance in culture or media. The reputational approach uses either the scholar, key insiders or general surveys to assess key players in a community. Thus, the primary challenge for the reputational approach is how to select those who select, and to ensure that their preconceptions of power do not hinder the identification of the power of those who work in more subtle or naturalized ways. The decisional approach relies on the scholar identifying those involved in a political process. With the decisional approach, the main obstacle is selecting, and getting access to data on, the relevant decisions and struggles. The most central decisions in society are not always up for debate and might even have been settled ages ago. Similarly, it is not always necessary for powerful agents to be highly active to have their interests taken into account by those active in the decision process.

Common to these three approaches is a relatively high vulnerability to more or less ad hoc decisions (Larsen and Ellersgaard 2017). Even the demarcation of a somewhat straightforward group, e.g. top business leaders, involves ad hoc decisions. Should we select the top 100 or 200 corporations? And by turnover, balance, equity or prestige? How many CEOs? Do we include chairmen? And what about owners or CFOs? First and foremost it is important not to rely naively on official categories and registers, since the fields and groups under investigation rarely, if ever, follow the same logic of demarcation. These questions require a reflexive movement back and forth between data and hypotheses, similar to a hermeneutic circle in which qualitative knowledge of the field informs and is informed by data (Bourdieu 2005, 99). We propose here that this process can be further guided and substantiated by systematic use of relational methods (Larsen and Ellersgaard 2017; see also Knoke 1993) using patterns of formal interaction to identify key players at the core of elite networks.

The increasing availability of large data sources creates the pitfall of defining the studied group too broadly. There are both analytical and practical reasons for limiting the sample to be studied and thus drawing boundaries at an exclusive level. Analytically, because elites, as Mills (1956, 11) puts it, are those who ‘sit on the same terrace’ and form a group of relative peers. Instead of giving in to the lure of big data, i.e. using inclusive registers as an occasion for increasing sample sizes, we propose that the researcher uses the registers to create highly select groups. That is, by first extracting large datasets of, for instance, corporate boards, then using relational methods, like social network analysis, to identify the relevant group, and then, with no hesitation, throwing away the rest of the sample.

Relational Methods

We propose to use the wealth of data available to put description at center stage (see Savage and Burrows 2007). Simple descriptive tables of the gender, nationality, education or social background of an elite group are often telling in their own right, particularly if used in a longitudinal or comparative framework (see Hartmann 2007), as demonstrated by the impact of Thomas Piketty’s (2014) work (see Savage 2014).

Let us demonstrate here, using our analysis of the Danish elite network as an example, how these simple but compelling descriptions can be supplemented by more structural analyses of the elite group, e.g. of the system of relationships between their different social characteristics, of their social relations, and of their career trajectories. As mentioned, three specific methods are well suited for these purposes and supplement each other very well: multiple correspondence analysis, social network analysis and sequence analysis.

Multiple correspondence analysis (MCA) is, in short, a way to represent the relationships between a large number of variables in a multidimensional space. Both variable categories and the individuals, or any appropriate unit of analysis, can then be projected onto this space (Hjellbrekke 2018). MCA is closely associated with the work of Pierre Bourdieu as a way to construct and analyse social spaces and fields (Lebaron 2009). Recently, MCA has been used to construct spaces on prosopographical populations such as central bank directors (Lebaron and Dogan 2012), French and Norwegian elites (Denord et al. 2018; Hjellbrekke et al. 2007) and top Swiss (Bühlmann et al. 2012), Indian (Naudet et al. 2018), and Danish CEOs (Ellersgaard et al. 2013).

MCA offers a range of analytical opportunities. Variables or cases not included in the construction of the space can be projected post hoc as supplementary variables (or cases) onto the constructed space. Studying how ‘external’ variables structure the space is a strong explorative, as well as corroborative, feature. Furthermore, the logic structuring the positions of particular subgroups within a space can be further explored in a class-specific analysis (Chiche and Le Roux 2010).

MCA is a strong technique for handling prosopographical data (Broady 2002), first of all because it lets us grasp a lot of information simultaneously, but also, and especially, because it has a robust way of handling incomplete or non-response data. As Stone (1971: 58) notes, prosopographical data will almost inevitably be incomplete. The issue of keeping variables with missing data in the analysis, but giving no weight to the missing category, is handled well by specific MCA (Le Roux and Rouanet 2010). Another strength of MCA is that it allows the researcher to use the associations between imperfect indicators of a social phenomenon, e.g. a form of capital, to tease out that information in the constructed space. In our analysis of the social space of the Danish power elite, 43 variables with 193 categories were used to construct the space. Combining different measures, levels of economic capital were clearly an important factor in the analysis although we could not gather income or wealth data (analysis forthcoming, see Lunding 2017 for an earlier version). As shown in Fig. 5.1, estate value, house type (including the size of the land), combined with positions in different types of corporations in the Danish corporate register (CVR), follow a logic closely associated with economic capital on the vertical axis.

Fig. 5.1 The Danish field of power. (Source: Lunding et al. 2020)

Using prosopographical data for Social Network Analysis (SNA) requires the researcher to incorporate relational data already in the design and data collection steps. Rather than only gathering attribute data of individuals, data must tell how each member of the population is connected, or related, to each other. A relation may take multiple forms, spanning from sharing characteristics such as educational background, over kinship ties to interaction or shared affiliation networks (Borgatti et al. 2009).

Social Network Analysis allows for analysis of relations on many different levels within the elite population. From analysing the overall structure and level of connectivity in the population, to identifying the most central individuals, to finding subgroups based on community detection algorithms (Keller 2018), social network analysis allows the researcher to describe and explore relations within a defined population. For instance, social network analysis has been used to explain the particular role played by the Medici family in the multiplex Florentine elite networks of the Renaissance (Padgett and Ansell 1993) and to explore the changing relationships in American corporate networks leading to a less cohesive and politically inept business elite in the US.

As stated earlier, network analysis may offer empirical guidance for the definition of the population. In Fig. 5.2, we show how the 423 people who form the core of the Danish elite network, a network containing 37,750 individuals holding 56,325 positions, are connected. In addition, the color denotes the sector of their primary organisational affiliation, showing us a cohesive elite core, which nonetheless clusters around the key sectors in the Danish power elite.

Fig. 5.2 The network of the core of the Danish power elite. (Source: Larsen and Ellersgaard 2018)

The career trajectories of elites are often of particular interest. Not only do these show ‘how the elite got to the top’, they also shed light on which types of organisational experiences are given high value in elite groups. For this purpose Sequence Analysis (SA) provides the opportunity to not only map out careers, but also take the particular ordering and tempo of careers into account (Jäckle and Kerby 2018), for instance the speed with which CEOs have climbed the corporate ladder. This, again, requires the researcher to prepare already in the design and data collection phases of the study, as the temporal ordering of all cases in states must be recorded.

Sequence analysis of a prosopographical elite population was pioneered by Abbott and Hrycak’s (1990) study of eighteenth-century German chamber musicians and Blair-Loy’s (1999) study of executive women in finance. Recently, it has been used to map and describe careers of e.g. bankers (Araujo 2017), federal judges (Jäckle 2016), members of parliament (Ohmura et al. 2018) and top CEOs (Koch et al. 2017). For the Danish power elite described above we constructed a multichannel data set covering the year-to-year main affiliation of each member in six different states: sector, subsector, organisation size, organisational stability (time in current organisation), position in organisational hierarchy and geographical location. Based on optimal matching and Ward clustering, this allowed us to identify 10 ideal typical pathways into the Danish power elite, as illustrated in Fig. 5.3.
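To give a sense of what optimal matching computes, the following is a minimal single-channel sketch in Python. It is not the procedure used for the Danish data: the three careers, the state labels and the costs (insertion/deletion 1, substitution 2) are assumptions chosen purely for illustration, and a real analysis would combine several channels and pass the resulting distance matrix on to a clustering routine such as Ward's method.

```python
def om_distance(seq_a, seq_b, indel=1.0, sub=2.0):
    """Optimal matching distance between two state sequences.

    The distance is the cheapest set of insertions, deletions and
    substitutions that turns seq_a into seq_b. The default costs are
    illustrative assumptions, not values taken from the chapter.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel          # delete everything in seq_a
    for j in range(1, cols):
        dist[0][j] = j * indel          # insert everything in seq_b
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel,                        # deletion
                dist[i][j - 1] + indel,                        # insertion
                dist[i - 1][j - 1] + (0.0 if same else sub),   # match or substitution
            )
    return dist[-1][-1]


# Hypothetical year-by-year sector sequences for three people.
careers = {
    "A": ["state", "state", "business", "business"],
    "B": ["state", "business", "business", "business"],
    "C": ["science", "science", "science", "state"],
}

# Pairwise distance matrix, the usual input for a clustering step.
distance_matrix = {
    (p, q): om_distance(careers[p], careers[q])
    for p in careers for q in careers
}
```

In this toy example careers A and B end up close to each other, while C is far from both, which is the kind of information a clustering of pathways builds on.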

Fig. 5.3 The 10 pathways to the Danish power elite. (Source: Ellersgaard et al. 2019)

Relational Data

These relational methods require three different types of data. We will now demonstrate how they are found and collected.

Social Network Data

Data on social networks can take two forms: either a simple list of direct connections (like Hans → Sofia) or a list of affiliation memberships (Sofia → Board of Unilever) from which the direct connections can be derived. If conceptually possible, it is often preferable to collect networks as affiliation networks. Affiliation networks can be constructed in various ways. It might sometimes be possible to rely on already gathered sets of ‘big data’, e.g. compositions of corporate boards. When relying on ‘big data’, it is of great importance to go through the laborious and informed process of ensuring data quality (Heemskerk et al. 2018). Data on affiliation networks can also be gathered manually. A common approach is to create long lists of all organizations of interest, e.g. the largest corporations, the top state offices etc., before ‘scraping’ the web pages or archives of these organizations for all affiliations of relevance: boards, committees, advisory boards, etc. Another approach is the snowball sample. Here we start with a select set of agents. From their CVs all their affiliations and organizations are gathered, and the process is repeated for the new agents appearing in these affiliations. The snowball sample is much faster to collect because you do not need lists of organizations and you do not collect the isolated and disconnected affiliations. But you obviously lose the global structure of the network with a snowball sample, and it does not necessarily have a natural boundary.
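The step from affiliation memberships to direct connections can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: apart from Hans, Sofia and the Board of Unilever, which echo the examples above, all names and affiliations are hypothetical, and the function name `project_to_ties` is ours.

```python
from collections import defaultdict
from itertools import combinations

# Affiliation list: one (person, affiliation) pair per membership.
# "Mette" and "Advisory Committee X" are hypothetical.
memberships = [
    ("Sofia", "Board of Unilever"),
    ("Hans", "Board of Unilever"),
    ("Hans", "Advisory Committee X"),
    ("Mette", "Advisory Committee X"),
    ("Sofia", "Advisory Committee X"),
]


def project_to_ties(memberships):
    """Derive direct person-to-person ties from shared affiliations.

    Returns a dict mapping each (person, person) pair, in alphabetical
    order, to the set of affiliations the two people share.
    """
    members = defaultdict(set)
    for person, affiliation in memberships:
        members[affiliation].add(person)

    ties = defaultdict(set)
    for affiliation, people in members.items():
        for a, b in combinations(sorted(people), 2):
            ties[(a, b)].add(affiliation)
    return dict(ties)


ties = project_to_ties(memberships)
```

Keeping the set of shared affiliations on each tie, rather than only a yes/no connection, is what later makes it possible to ask through which types of affiliations two people are connected.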

Affiliation data can then be used to explore and create variables covering which sectors or types of affiliations individuals are engaging in. In turn, for each individual, the centrality (Freeman 1979), positions in core or peripheral groups in the network (Larsen and Ellersgaard 2017) or in certain clusters or communities (see e.g. Heemskerk et al. 2013; Palla et al. 2005), can be calculated. The network data can also be used to assess whether or not, and through what types of affiliations, individuals of the prosopography are connected. Furthermore, the sociometric distance between all individuals covered in the prosopography may be calculated.
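As a minimal sketch of two such measures, the Python code below computes normalised degree centrality and sociometric distances (shortest path lengths) from a list of direct ties. The tie list and names are hypothetical, and degree is only one of several centrality measures; dedicated network libraries offer many more.

```python
from collections import deque

# Hypothetical undirected ties between members of a population.
tie_list = [
    ("Hans", "Sofia"),
    ("Sofia", "Mette"),
    ("Mette", "Jens"),
    ("Sofia", "Jens"),
    ("Jens", "Karen"),
]


def adjacency(tie_list):
    """Build a neighbour set for every person appearing in the ties."""
    adj = {}
    for a, b in tie_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Share of the other people each person is directly tied to."""
    n = len(adj)
    if n < 2:
        return {person: 0.0 for person in adj}
    return {person: len(nbrs) / (n - 1) for person, nbrs in adj.items()}


def distances_from(adj, source):
    """Sociometric distance from source to every reachable person.

    Uses breadth-first search; unreachable people are simply absent
    from the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adj[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


adj = adjacency(tie_list)
centrality = degree_centrality(adj)
from_hans = distances_from(adj, "Hans")
```

Running `distances_from` once per person yields the full matrix of sociometric distances for the population.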

Biographical Data

Biographical data is collected in the familiar format of ‘one row one individual’, with several columns of attributes, like age, position, gender, place of birth etc. When only non-systematic collections of biographical data are available, data must be gathered row wise, one individual at a time. If more systematic sources are available, data can be collected column wise, i.e. you collect data on several people on the same variable. The feasible size of the data that can be collected is highly dependent on the number of variables that need to be collected by hand in a row wise fashion. Column wise data collection is often orders of magnitude faster, but might require programming skills.

Biographical data sets have two sets of variables; raw and coded data. The raw data columns are long text strings that are then coded into several variables. The coded data are simple, single value columns with values ready or nearly ready for quantitative analysis. By keeping both the raw strings and coded variables, the basis of a given variable is transparent - in this way improving reproducibility - and it allows for subsequent recoding. The raw text strings are often imported from biographical databases like Who’s Who (see Priest 1982) or scraped from websites such as LinkedIn or Wikipedia. These strings can be very long and contain many variables such as gender, age, first position etc. They are then separated either by hand or with the help of text string manipulation tools such as regular expressions.
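The raw-plus-coded layout could look roughly like the Python sketch below. The entry format, the identifiers `P001` and `P002`, the people and the regular expressions are all hypothetical; real sources such as Who’s Who need patterns tailored to their own conventions, and usually some hand correction afterwards.

```python
import re

# Hypothetical raw biographical strings, keyed by a person identifier.
raw_entries = {
    "P001": "Smith, John; born 1951 in Cape Town; MSc Economics 1976; "
            "Head of Sales, Coca-Cola 1986-1996",
    "P002": "Jensen, Sofia; born 1963 in Aarhus; Board member, Unilever 2001-2009",
}

# Patterns written for the made-up format above.
BORN = re.compile(r"born (\d{4}) in ([^;]+)")
DEGREE = re.compile(r"\b(BSc|MSc|MBA|PhD)\b")


def code_entry(raw):
    """Code one raw string into simple variables, keeping the raw text.

    Anything the patterns cannot find is stored as None, so missing
    information stays visible instead of being guessed.
    """
    born = BORN.search(raw)
    degree = DEGREE.search(raw)
    return {
        "raw": raw,
        "birth_year": int(born.group(1)) if born else None,
        "birth_place": born.group(2).strip() if born else None,
        "degree": degree.group(1) if degree else None,
    }


coded = {person_id: code_entry(raw) for person_id, raw in raw_entries.items()}
```

Because the raw string travels with the coded values, a later recoding, say a finer classification of degrees, only requires a new pattern and a rerun.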

Other important sources of biographical data are news and portrait articles. These can often be downloaded in full from newspaper databases. These articles are saved in a corpus of searchable text files. Automatically extracting articles is not a straightforward process, though. The researcher needs to carefully craft individual search strings for each person. Simply searching for portraits of John Smith will give us a lot of irrelevant articles. Searching, however, for John Smith AND director AND Unilever in the period 1990 to 1997 might give us good results. If you save the individual search strings as a variable these can be fed into web scrapers and you might construct the strings by combining several variables like position, name and organization.
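Constructing the search strings from existing variables might be sketched as follows in Python. The quoting and the `AND` operator are assumptions about the query syntax, which differs between newspaper databases, and the rows are hypothetical apart from the John Smith example above.

```python
def build_search_string(name, position, organisation):
    """Combine three variables into one boolean query.

    Each term is wrapped in double quotes so that multi-word names are
    searched as phrases. This syntax is an assumption; check what the
    database in question actually supports.
    """
    return " AND ".join(f'"{term}"' for term in (name, position, organisation))


# One row per person, with the period in which the position was held.
people = [
    {"name": "John Smith", "position": "director",
     "organisation": "Unilever", "from_year": 1990, "to_year": 1997},
    {"name": "Sofia Jensen", "position": "chair",
     "organisation": "Hypothetical Bank", "from_year": 2001, "to_year": 2009},
]

# Save the search string as a variable alongside the other columns.
for row in people:
    row["search_string"] = build_search_string(
        row["name"], row["position"], row["organisation"]
    )
```

Storing the query and the period on the row documents exactly how each article was found, which helps when irrelevant hits have to be weeded out by hand.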

In the era of digitalization a lot of previously hard-to-get-to archival material becomes increasingly accessible online. This means that historical data can be gathered in a less, although still, time consuming way. Census lists, church books or parish registers may provide genealogical data and sometimes even historical attribute data on the social position of parents or grandparents etc.

In some cases online sources have free or commercialized APIs, making data collection easy. If not, building web scrapers can speed up the data collection. Remember though, that while web scrapers are efficient when they are successful, the researcher still might have to weed out irrelevant data by hand.

Even with CVs, biographies and portrait articles, the researcher will face a fair amount of missing information. The more famous a person is the easier it is to find information about them so, in biographical data, missing data is always skewed (Tables 5.1 and 5.2).

Table 5.1 Data types and data sources

Sequence Data

Data on career sequences are collected in the spell format. A sequence consists of spells and they have three types of values: Identity, state and period. You could think of a spell as someone (identity), doing something (state) from one date to another (period). A spell can have multiple states at the same time. Sequences made up of spells with multiple states are termed multi-channel or multi-stream. A career sequence is often multi-channel and a single spell could look like this:

Name: John Smith (identity), Position: Head of Sales (state), Organisation: Coca-Cola (identity), Organisation size: Very Large (state), Place: Cape Town (state), Start: 01-10-1986 (period), End: 05-08-1996 (period).

The whole career then consists of several rows with spells, and individuals would not necessarily have the same number of spells. From one channel it is possible to derive new channels like the rhythm of organizational change, the number of geographical locations or the level of the position. It is also possible to calculate the duration of a certain state. The same project could have several sequences for the same population: career, housing, network positions etc. While these sequences cannot be collected in the same matrix they can be compared in the later analysis.
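One possible way to hold spells in code is sketched below in Python. The first spell reproduces the John Smith example, reading the dates as day-month-year, which is an assumption; the second spell, the field names and the derived measures are hypothetical illustrations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Spell:
    """One row of spell data: who, doing what, and when."""
    person: str             # identity
    position: str           # state
    organisation: str
    organisation_size: str  # state
    place: str              # state
    start: date             # period
    end: date               # period

    def duration_days(self):
        """Length of the spell in days."""
        return (self.end - self.start).days


spells = [
    Spell("John Smith", "Head of Sales", "Coca-Cola", "Very Large",
          "Cape Town", date(1986, 10, 1), date(1996, 8, 5)),
    # Hypothetical earlier spell for the same person.
    Spell("John Smith", "Sales Manager", "Hypothetical Ltd", "Medium",
          "Durban", date(1980, 1, 1), date(1986, 9, 30)),
]


def derived_measures(spells, person):
    """Derive simple summary measures from one person's spells."""
    own = sorted((s for s in spells if s.person == person),
                 key=lambda s: s.start)
    changes = sum(
        1 for previous, current in zip(own, own[1:])
        if previous.organisation != current.organisation
    )
    return {
        "n_spells": len(own),
        "n_places": len({s.place for s in own}),
        "organisation_changes": changes,
    }


summary = derived_measures(spells, "John Smith")
```

Because each person simply contributes as many rows as they have spells, careers of very different length fit in the same table.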

The researcher should consider whether the individual spells are allowed to overlap, e.g. if a person can have two CEO positions at the same time. Similarly, it is important to decide whether a gap, i.e. a set of years where we have no registration, is treated as missing, as inactivity or as an independent state like “unemployed”. What counts as missing is important when you count the number of distinct states in a sequence and when calculating the distances between two sequences.
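Before deciding how to treat them, overlaps and gaps first have to be found. The Python sketch below does this for one person's spells; the periods are hypothetical, end dates are assumed to be inclusive, and the function name is ours.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def find_overlaps_and_gaps(periods):
    """Locate overlapping and uncovered stretches in one person's spells.

    periods is a list of (start, end) date tuples with inclusive ends.
    Returns two lists of (start, end) tuples: overlaps and gaps.
    """
    if not periods:
        return [], []
    ordered = sorted(periods)
    overlaps, gaps = [], []
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start <= latest_end:
            # The new spell begins before the covered period has ended.
            overlaps.append((start, min(end, latest_end)))
        elif start > latest_end + ONE_DAY:
            # At least one day has no registration at all.
            gaps.append((latest_end + ONE_DAY, start - ONE_DAY))
        latest_end = max(latest_end, end)
    return overlaps, gaps


# Hypothetical career with one overlap and one gap.
periods = [
    (date(1980, 1, 1), date(1985, 12, 31)),
    (date(1984, 1, 1), date(1986, 6, 30)),
    (date(1988, 1, 1), date(1990, 12, 31)),
]

overlaps, gaps = find_overlaps_and_gaps(periods)
```

Whether each detected gap is then coded as missing, as inactivity or as a state of its own remains a substantive decision that the code cannot make for the researcher.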

When collecting sequences the researcher will often collect several channels at the same time but one person at a time, often on the basis of a CV. For elite populations it is often possible to find relatively complete CVs. However, elite members might omit low status positions in their CV and, as with biographical data, it is considerably easier to find data on the famous.

Concluding Remarks

In this chapter, we have argued that prosopographical data collection, combined with relational quantitative methods, offers a unique possibility to understand the social structure of elites. We stress, however, that the quality of an analysis based on prosopographical data rests upon a well-defined, theoretically relevant population, on access to credible and multifaceted sources, and on a data collection done with the particular data formats in mind, e.g. those needed for social network analysis or sequence analysis.

In order to move beyond the derogatory or celebratory images of elites from spontaneous sociology, doing prosopography offers an empirical approach to address the particular role elites play in a societal context. Based on a combination of the craftsman’s knowledgeable care of data and the creativity to use this data in novel ways to not only describe, but also map the relationship between elite groups, prosopographical data and relational methods allow the researcher to highlight relations that matter among the people that matter. Lastly, by naming the powerful and allowing the reader to follow the powerful in the visual representations made possible by multiple correspondence analysis, social network analysis and sequence analysis, the prosopographical researcher produces results that are not only more intuitive to interpret but also aesthetically pleasing. Using names and plotting these allows readers to engage more, and more critically, with results, allowing elite studies to become public sociology. Putting names on power, and gathering and analysing prosopographical data, is a way of indirectly imposing on elites the duty of filling out questionnaires and writing their sociology.

Names and Identities: Tying It All Together

For all relational types of data you should expect to spend a considerable amount of time on identity resolution (Keats-Rohan 2007, p. 151). Is the Sofia at Unilever the same as Sofia at Barclays? The severity of this problem differs dramatically between contexts. English working class names like John Smith are notoriously difficult, while Danish upper class names are fairly unique. Similarly, people do not use the same name or the same spelling across data sources and affiliations. Often the official name will be longer, e.g. Dick Cheney is short for Richard Bruce Cheney. In order to resolve the name matching problem the researcher needs to collect as much extra, even otherwise redundant, data for each entry as possible. For affiliations, like boards, make sure to collect the small biographical descriptions that often accompany the list of board members. But even with this data the researcher is often forced to make what amount to qualified guesses. Furthermore, you should not expect persons and organizations to have the same name across different sources. As a result, each person and organization needs a unique identifier across all datasets and a list of aliases; otherwise you will not be able to merge and cross reference between affiliation networks, sequence data and biographical data.
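A unique identifier with a hand-curated alias list might be organised along the lines of this Python sketch. The identifiers, the Sofia Jensen aliases, the records and the function names are hypothetical; only the Cheney names come from the example above.

```python
# Hand-curated alias list: one identifier per person, many names.
aliases = {
    "P001": {"Dick Cheney", "Richard Bruce Cheney", "Richard Cheney"},
    "P002": {"Sofia Jensen", "Sofia M. Jensen"},
}


def normalise(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def build_lookup(aliases):
    """Map every normalised alias to its identifier.

    Raises an error if two identifiers claim the same alias, since that
    is an unresolved identity problem that needs a human decision.
    """
    lookup = {}
    for person_id, names in aliases.items():
        for name in names:
            key = normalise(name)
            if key in lookup and lookup[key] != person_id:
                raise ValueError(
                    f"Alias {name!r} is claimed by {lookup[key]} and {person_id}"
                )
            lookup[key] = person_id
    return lookup


def resolve(records, lookup, name_field="name"):
    """Attach identifiers to records; return (resolved, unresolved)."""
    resolved, unresolved = [], []
    for record in records:
        person_id = lookup.get(normalise(record[name_field]))
        if person_id is None:
            unresolved.append(record)   # left for manual inspection
        else:
            resolved.append({**record, "person_id": person_id})
    return resolved, unresolved


lookup = build_lookup(aliases)

# Hypothetical records from two different sources.
board_rows = [
    {"name": "Dick Cheney", "affiliation": "Hypothetical Board A"},
    {"name": "John Smith", "affiliation": "Hypothetical Board A"},
]
cv_rows = [{"name": "Richard Bruce  Cheney", "position": "CEO"}]

board_resolved, board_unresolved = resolve(board_rows, lookup)
cv_resolved, cv_unresolved = resolve(cv_rows, lookup)
```

Records that end up in the unresolved list are not dropped or guessed at; they are the ones for which the extra biographical detail collected alongside each entry is needed.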

Identity resolution, or name matching, is best done by hand. While it may seem tempting to use algorithms for fuzzy name matching, the quality is often low. At best, algorithms may help in the hand coded process. For larger datasets it is often impossible to solve identity problems completely. For sequence and biographical data the problem will rarely affect the substantial arguments of the analysis. This is not the case for network analysis, and considerable care should be given to the most central agents. Be wary if John Smith is a central agent. But on the other hand the risk of erring on the side of caution is equally problematic. Having split Richard Cheney from Dick Cheney will not just affect the centrality of Dick Cheney, but of all those he is connected to, and if it is a central agent it could impact the global structure of the network. One approach is to ensure a higher data quality for central agents. Here, analysis, data collection and data quality control are a back and forth process.
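As an illustration of algorithms helping, rather than replacing, the hand coded process, the Python sketch below uses the standard library's `difflib.SequenceMatcher` to list similar-looking name pairs for manual review. The threshold of 0.8 and the list of names are assumptions, and nothing is merged automatically.

```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_pairs(names, threshold=0.8):
    """Suggest name pairs that look alike, for a human to check.

    Similarity is difflib's ratio on lower-cased names. The threshold is
    an arbitrary illustrative choice; every suggested pair still needs a
    manual decision.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda pair: -pair[2])


names = [
    "Richard Cheney",
    "Richard B. Cheney",
    "Dick Cheney",
    "John Smith",
    "Jon Smith",
]

suggestions = candidate_pairs(names)
flagged = {(a, b) for a, b, _ in suggestions}
```

The example shows both weaknesses at once: John Smith and Jon Smith are flagged although they may well be different people, while Dick Cheney is never paired with Richard Cheney because the nickname shares too few characters with the official name.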