A Move Forward: Exploring National Identity Through Non-linear Principal Component Analysis in Germany

In research on national identity, scholars have developed a wide variety of approaches to measure and better understand this ubiquitous yet complex concept. To date, most of these approaches have been theory-driven, while only a very few have been data-driven. In this article, we aim to contribute to the latter by introducing a new data-driven method that has not been applied yet—that of non-linear principal component analysis (NLPCA). In contrast to other commonly used methods such as factor analysis, NLPCA distinguishes itself by making relatively few assumptions about the data and by allowing for greater flexibility when discovering underlying dimensions of such a complex concept as national identity. Drawing on the 2013 ISSP National Identity module, our analysis focuses on the case of Germany, also taking into account Western and Eastern Germany. Running an NLPCA, we find four dimensions that cover the multidimensionality of national identity: nationalistic attitudes, national pride and attachment, cosmopolitan beliefs, and membership criteria defining national belonging. This article contributes to the empirical debate on measuring national identity by suggesting a new and flexible methodological approach that better grasps the concept’s complexity and which we believe can move empirical research on national identity forward in and beyond Germany.


Introduction
In recent years, a growing body of research on national identity has emerged (e.g. Hjerm 1998;Miller 2000;Li and Brewer 2004;Huddy and Khatib 2007;Kunovich 2009;Theiss-Morse 2009;Ariely 2011Ariely , 2019Ariely , 2020Wright et al. 2012;Schildkraut 2014;Hanson and 1 3 and Khatib 2007; Choi et al. 2021). What is more, it is also far from uncommon to employ these closeness items to measure both national identity and related, yet fundamentally distinct concepts such as patriotism (e.g. Huddy and Del Ponte 2019 on national identity; Citrin et al. 2001 and Ariely 2020 on patriotism), further muddying the waters.
Seeking to provide more analytical clarity in this discourse, this article makes three contributions to empirical scholarship on national identity. First, going beyond existing studies dealing with the measurement of this concept (e.g. Bostock and Smith 2001;Latcheva 2011;Wright et al. 2012), we introduce a novel methodological approach by which one can better grasp the multidimensionality of national identity and which we believe can move empirical research in that field forward. Thus, we explicitly focus on the measurement of national identity itself, and do not, for instance, build on other literature strands that mainly investigate the impact of national identity on other political attitudes (e.g. Hjerm 1998;Blank and Schmidt 2003;Huddy and Khatib 2007;Kunovich 2009;Berg and Hjerm 2010;Miller and Ali 2014;Gustavsson and Stendahl 2020;Mader et al. 2020) or the relationship with other kinds of identities such as ethnic identities (e.g. Masella, 2013;Citrin and Sears, 2014;Molina et al. 2015;Choi 2021). We contend that both the conceptual and the methodological inconsistencies by which the extant scholarship is marked mainly result from pursuing a deductive approach based on unspecific concepts and unclear theories. Thus, in order to contribute to addressing these challenges, we offer an alternative way by conducting a data-driven approach that has not been applied yet: a non-linear principal component analysis (NLPCA). To date, only a very few studies have relied on a similar, that is an inductive, approach to national identity by running a latent class analysis (e.g. Alemán and Woods 2018;Ditlmann and Kopf-Beck 2019). However, in contrast to these scholars, to our knowledge, this is the first study in that field that uses NLPCA.
Second, this investigation is distinct from existing accounts in that it does not consider only specific items that, for instance, revolve solely around the importance of certain membership criteria such as the meaning of ancestry or language (e.g. Hjerm 1998;Kunovich 2009;Ha and Jang 2015;Helbling and Reeskens 2016;Ariely 2020) or rest on a single item concerning the closeness one feels towards the nation (e.g. Huddy and Del Ponte 2019; Gustavsson and Stendahl 2020). Instead, we integrate 58 variables of the ISSP 2013 item battery in the analysis in order to better cover the multidimensionality of the concept. To our knowledge, there is just one similar study on nationalism, by Bonikowksi (2017), that included 23 variables of the ISSP in a latent class analysis. Yet, while insightful and valuable, we seek to go beyond this study in three ways: first, we apply NLPCA which allows for much greater flexibility, making fewer assumptions than LCA. Second, we integrate 58 and thus more items in our analysis. Lastly, we investigate the multidimensionality of national identity, i.e. national belonging and attachment, and thus a fundamentally distinct concept than nationalism, commonly defined as the idealisation of one's nation. 1 Third, we concentrate on Germany and thus investigate a case that-apart from very few exceptions (e.g. Mader et al. 2020)-has not been covered at large in previous scholarship. To date, aside from the investigations of Ditlmann and Kopf-Beck (2019), both inductive-oriented studies (e.g. Alemán and Woods 2018) and deductive-oriented studies (e.g. Hjerm 1998;Kunovich 2009;Reeskens and Hooghe 2010;Miller and Ali 2014;Helbling and Reeskens 2016;Ariely 2019Ariely , 2020Huddy and DelPonte 2019;Choi et al. 2021) were cross-country analyses. In terms of single-country analyses, most articles have focused on the US (e.g. Citrin et al. 2001;Li and Brewer 2004;Huddy and Khatib 2007;Theiss-Morse 2009;Wright et al. 2012;Schildkraut 2014;Molina et al. 2015;Hanson and O'Dwyer 2018), or other countries such as South Korea (e.g. Ha and Jang 2015) or Spain (e.g. Hierro and Rico 2019). The case of Germany is particularly special in many respects: First, previous scholarship has shown that Germans have a rather complicated relationship towards their country which is mainly caused by the atrocities of World War II committed under the national socialist regime, resulting, amongst other things, in a comparatively low degree of nation-related attitudes, i.e. national pride or nationalism (e.g. Blank and Schmidt 2003;Miller-Idriss and Rothenberg 2012;Miller and Ali 2014;Ditlmann and Kopf-Beck 2019). Second, Germany has undergone a remarkable transition from a primarily ethnic to a more civic conception of nationhood (e.g. Bonikowski 2017;Ditlmann and Kopf-Beck 2019;Mader et al. 2020;Filsinger et al. 2020). While the latter is still the predominant narrative of the nation, both notions co-exist and are articulated in public discourse, as in most other countries. Third, Germany is also an interesting case insofar as it was once divided and later on reunified in the early 1990s. Also, accounting for the differences in both political and cultural terms between Eastern and Western Germany, we not only investigate Germany as a whole but also take this divide into consideration, underscoring the context sensitivity of our analysis. This article will proceed as follows. First, we will briefly present the existing various measurement approaches and highlight their empirical shortcomings. Second, we will describe the main idea of NLPCA and stress its advantages in comparison to other commonly used methods such as principal component analysis (PCA) or factor analysis (e.g. Citrin et al. 2001;Kunovich 2009;Latcheva 2011;Ariely 2011;2020;Ha and Jang 2015;Filsinger et al. 2020;Mader et al. 2020). Third, we will report the results of our analysis and the degree to which they relate to the concept of national identity as it is currently used. In addition, we will provide an empirical application of our results, studying various relationships between national identity and several socio-demographic variables. After discussing the broader relevance of our findings, we will close with two major suggestions aimed to improve future empirical research on national identity.

Current measures of national identity
Seeking to structure the multitude of operationalizations in the extant literature, we identify four different approaches (Table 1):

Importance of membership criteria
A dominant line of research employs items tapping the importance of certain membership criteria defining the national boundaries (e.g. Blank and Schmidt 2003;Kunovich 2009;Berg and Hjerm 2010;Ha and Jang 2015;Helbling and Reekens 2016;Hanson and O'Dwyer 2018;Ariely 2020;Mader et al. 2020;Filsinger et al. 2020). As such, Blank and Schmidt (2003, p. 296) define national identity as the "importance of national affiliation as well as the subjective significance of an inner bond with the nation" and apply items asking how important it is for someone to be a citizen of the Federal Hierro and Rico (2019), Ariely (2020), Huddy and Khatib (2007), Molina et al. (2015), Fleischmann et al. (2019), Carey (2002) Republic of Germany, to possess German citizenship, and to have "inner ties" with Germany. In a similar vein, Helbling and Reeskens (2016, p. 746) assess the "criteria individuals use as 'symbolic boundaries' (…) to distinguish 'us' from 'them"'. Drawing on the ethnic-civic dimension, they use various indicators to see how important it is for someone to have certain membership criteria such as the importance of place of birth, citizenship, ancestry, religion, the ability to speak the language, and the respect for laws.
In addition, they ask if someone "feels" their nationality and whether they have lived in the country for a long time. Finally, scholars such as Huddy and Khatib (2007) and Johnston et al. (2010) consider national identity as a type of social identity and investigate how "important" it is to be their nationality (here either American or Canadian).

Pride
Another approach measures national identity by means of pride. For instance, Elkins and Sides (2007) define national identity as a kind of "loyalty to a nation-state in a very general sense", and they apply a single item known as the general national pride (GNatPr) item, asking, "How proud are you of being [ NATIONALITY]?". Likewise, scholars such as Shayo (2009), use the same logic. In a similar vein, some studies such as the one of Pehrson et al. (2009; for similar approaches see also Carey 2002) combine the general national pride item with another item relating to closeness. Yet, despite its widespread usage, this item entails some shortcomings worth mentioning: To begin with, Ha and Jang (2015) provide evidence that national identity and pride are related, yet distinct concepts. Put differently, one can identify with one's nation but does not necessarily need to express pride in it, and vice versa. Moreover, recent work has suggested that national identity is, by definition, not based on pride but is also constituted by collective memories people feel ashamed of ). On a more general note, Meitinger (2018, p. 428) criticises the "vagueness" of the national pride item, indicating that it is a questionable indicator for measuring national identity (for a further discussion see Miller and Ali 2014). In light of this equivocation, scholars such as Dekker et al. (2003) advocate for distinguishing pride and national identity. Likewise, Smith and Kim (2006) see pride as a consequence rather than as an indicator for someone's national identity.

Closeness
Conceiving of national identity as a means of "[…] being similar to some people and different (in perceived or actual terms) from others" (Hjerm 1998, p. 452), i.e. feeling closer to co-nationals than to others (e.g. Triandafyllidou 1998), national identity is often measured by asking for the degree of how close someone feels towards his or her nation (e.g. Huddy and Khatib 2007;Huddy and Del Ponte 2019;Choi et al. 2021). Likewise, Transue (2007) approaches national identity by looking at how well one identifies with a certain group of people and asks respondents how "close" they feel to other Americans to gauge their degree of identity. Yet, when looking at these measures more closely, it is far from uncommon to see this item being used for measuring related, yet distinct concepts such as patriotism (e.g. Citrin et al. 2001;Ariely 2020) or national attachment (e.g. Gustavsson and Stendahl 2020), blurring the meanings of these three constructs.

Single items and mixed items
Apart from these three approaches, a wide array of studies use only a single indicator to measure someone's national identity. For instance, Hierro and Rico (2019) simply ask respondents to what degree they identify with their country (in their case, Spain). Apart from such single-item approaches, national identity is commonly measured with a limited number of different items. For instance, following Kosterman and Feshbach's scale of patriotism (1989), Molina et al. (2015) use the following items: "I have a great love for my country"; "I find the sight of the American flag very moving"; "I am proud to be an American"; and "I feel very warmly toward my country". However, in doing so, they tend to conflate national identity and patriotism, underscoring the lack of conceptual clarity by which this line of research is marked (for a discussion on the distinction between national identity and patriotism see Theiss-Morse 2009). In a similar vein, Fleischmann et al. (2019) use items referring to both "belonging" to a country and the "feeling" of being a part of the country. Given this multitude of measures, one can preliminary conclude that "[…]no single measure has become dominant in the field of national identity" (Miller and Ali 2014, p. 244). In light of the high amount of disagreement on its operationalization, we claim that national identity is a multifaceted concept that cannot be neatly captured with a few items. Dealing with this complexity, we therefore feel the need for employing a method that can reveal the various underlying dimensions with the lowest number of restrictions possible. Seeking to fulfil these requirements, this study applies a data-driven approach using NLPCA.

The creation and the original ideas behind the national identity module of the ISSP
In this study, we draw on the 2013 "National Identity" module of the ISSP (ISSP Research Group 2015), one of the most famous and most often used datasets in the field of national identity, covering a wide range of questions about the nation and thus of our object of investigation. The ISSP is a cross-national survey based on representative samples from over 30 countries, which has been administered annually since 1985.
Given the inductive nature of our procedure, and ensuring a context sensitive approach, it is important to be aware of the original ideas and assumptions that formed the creation of the ISSP. Thus, most importantly, prior to administering the first ISSP wave in 1995, the survey designers failed to agree on a common definition of national identity. What is more, various methodological issues occured, leading to possible "moral, ethical and political problems" (Svallfors 1996, p. 131). As a result, while being first proposed in 1990, the ISSP suffered from a high amount of uncertainty still in 1994. In line with the conceptual dissent, items referring to symbols such as raising the flag or singing the national anthem were eliminated, underscoring the disagreement among the designers on a common list of indicators. Lastly, the final module encompassed around 70 items covering seven sub-fields of national identity: localism, national identity, globalism, minorities and immigration, special demographics, standard demographics, and European integration (Svallfors 1996). Notably, the second sub-field-also called national identity-was intended to find out "who" makes a nation (Svallfors 1996, p. 132). Given this wide variety of items, scholars have used this module to measure concepts the module neither intended to cover nor its designers originally had in mind (Heath et al. 2009, p. 296), such as patriotism and xenophobia (e.g. Ariely 2020).

Data
Drawing on the 2013 "National Identity" module of the ISSP for the case of Germany, we aimed to run a context-sensitive analysis. That is, we not only focused on Germany as a whole but also took into account both Western and Eastern Germany, resulting in the creation of three versions of the data set: one for the country as a whole containing 1716 respondents, and two for Western and Eastern Germany, with 1183 and 533 respondents respectively. Note though, that we choose Germany in this paper solely for illustrative purposes-the proposed method holds as well for other countries. Thus, we also carried out a time series cross sectional analysis on six countries-Germany, Spain, Great Britain, Ireland, Norway and Sweden-covering a time period between 1995 and 2013. Here, we found that NLPCA can yield useful scales for these cases as well (see further Appendix A).

Method: non-linear principal component analysis
To reveal the underlying structure of our data, it is necessary to reduce the number of dimensions it contains. To do so, we look at the relationships among all the dimensions that are present in the data and see if we can combine and reduce them to a smaller number. The new dimensions we create in this way are then known as latent dimensions that underlie our data.

Background: Why NLPCA?
One of the most common ways to find these latent dimensions is by running a PCA. Yet, while useful and popular, PCA makes several assumptions that are often overlooked or even violated. For instance, the data is required to be at least an interval level of measurement or higher. However, in the social sciences, we are confronted with a categorical (nominal or ordinal) level of measurement in most cases. Dealing with these assumptions, scholars often treat such data as if it were numerical, assuming that both categories have the correct order and that the distances between them are equal. Yet, there is no guarantee that this is the case, making the application of PCA rather questionable (Blasius and Thiessen 2012). In addition, regular PCA rests on the premise that there are solely linear relationships between the variables. However, there is no reason that this assumption is fulfilled in the ISSP data set. Given those two restrictions, we contend that an NLPCA is the more appropriate technique in this case (Gifi 1990;Meulman et al. 2004). First, by means of optimal scaling, it transforms the categories into numerical values which can then be used as in a regular PCA (Linting et al. 2007, p. 338). As such, the more the results of NLPCA and PCA are alike, the more valid the assumption of treating the data as numerical would have been (Blasius and Thiessen 2012, pp. 133-38). Advocating for a data-driven approach and given that most variables in the ISSP 2013 data are categorical (ordinal), we apply an NLPCA in this study. To do so, we will use the SPSS CATEGORIES module, which is based on the work of Gifi (1990). As such, it aims to minimize a restricted loss function using an alternating least squares (ALS) algorithm to find the optimal scores.

Application
Applying NLPCA, we followed the guidelines set out by Linting and van der Kooij (2012) that cover the number of dimensions, the removal of outliers and missing data, the level of measurement, and eventual rotation. We now discuss each of these in turn.

Dimensions
Unlike in PCA, we have to set the number of dimensions we wish to extract in advance, as the solutions in NLPCA are not nested (Linting et al. 2007). In doing this, we ran an NLPCA eight times, starting with a single dimension and increasing this number by 1 with each iteration. We then compared the scree-plots that resulted from this and looked at which number of dimensions we most often saw as an "elbow". Here, we found that this happened in all three instances at four dimensions. We then looked at the variance accounted for (VAF) to see what part of the variance these dimensions explain (Table 2). In each case, the first dimension handled around half of the total variance explained-around 20%. The second dimension then captured a further quarter, and the third and fourth dimensions together captured another quarter. In total, the four dimensions explained 37.93% of the variance for Germany as a whole, 40.69% for Eastern Germany and 37.69% for Western Germany.

Outliers and missing data
Removing outliers, we took the standardized component scores for each of the respondents and eliminated those that were higher than 3 or lower than − 3. In all three data sets, there were only several cases in which this was necessary. In most cases, this was a result of a too high degree of missingness or of the respondent giving the same response to all the items. Checking for missing data, we used the "passive" option available for NLPCA. This option made the algorithm use an observation if it had a value on a variable and ignore it if it did not. This is possible as NLPCA is not based on a correlation matrix but on the data itself.

Level of measurement
As NLPCA is able to work with any measurement level of data, we first had to establish the measurement level of the variables in our data by plotting their original categories against their optimal scores. In the resulting transformation graphs, the data is at least ordinal if it shows a monotonic non-linear pattern. If not, this indicates that the data is most likely of a nominal level. This most often showed up in the form of plateaus in the graphs-also known as "ties". These occurred when the optimal values were the same for increasing categories. An example of this takes place in item 60 ("Strong patriotic feelings can lead to intolerance in Germany") in the data for Eastern Germany. Here, there were ties for the first three categories. This indicates that for the respondents, there was no real difference between them. Thus, instead of a five-point scale, respondents treated the response options in a binary fashion. While individual instances of ties are not a problem, they become one if they occur too often (Blasius and Gower 2005). Also, ties are less of a problem if the categories concerned contain only a few of the total number of cases. Here, we found this to be the case in all instances where we found ties. As such, we assumed an ordinal level of measurement for all our variables.

Rotation
Like regular PCA, NLPCA allows for various forms of rotation. Here, we opted for an oblique (oblimin) rotation, as we deemed it likely that the different dimensions we would find correlated. To justify this assumption, we ran an NLPCA with oblimin rotation and looked at the correlations between the dimensions (Table 3). Here, we found that in all cases there was a moderate correlation between the first and fourth dimensions (0.26, 0.26 and 0.33) and the fourth and second dimensions (0.35, 0.31 and 0.31). As such, we opted to use the rotation. Doing so also meant that our interpretation would depend on two instead of on a single matrix. Apart from the pattern matrix that holds the loadings, we would also have to interpret the structure matrix that holds the correlations between the variables and the dimensions. From these two matrices, we first removed all those loadings in the pattern matrix that were < 0.4. Then, we also removed any values where the corresponding value in the structure matrix was < 0.4. In cases of cross-loadings, we took that value in the pattern matrix for which the value in the structure matrix was the highest. As such, any remaining cross-loadings were the result of similar values in the structure matrix. For the rest of this paper, we will refer to the values in the pattern matrix only.

Results
We now turn to the discussion of the dimensions (Table 4). After running the analysis for each of the three cases, we observed that all shared a similar structure. On the first dimension, we find items related to nationalistic attitudes, such as the belief in the intrinsic superiority of one's nation accompanied by the unconditional allegiance to it (e.g. Kosterman and Feshbach 1989;Blank and Schmidt 2003;Davidov 2009;Mußotter 2021), 2 on the second we find items referring to various aspects of "positive" national pride as well as the attachment that one feels towards their country; on the third dimension we find items relating to "negative" national pride and a broad range of cosmopolitan attitudes; on the fourth dimension we find items that cover both ascriptive and elective traits one ought to have to be seen by others as 89 -a true German. We will now discuss each of these in more depth, referring to the topics to which the items belong in the original codebook, as described by Harkness and Scholz (1996). As for the first dimension, we find that it contains all the items that relate to attitudes on foreigners and foreign cultural presence and four of the items that relate to international issues. Also, it contains item 8 on how "close" one feels to Europe-suggesting that many respondents relate the idea of Europe to foreign issues-and item 21 ("People should support their country even if the country is in the wrong"). As most items have a formulation that views one's own country as "better" than others, we see this as capturing the chauvinism of the respondent. That this is the first dimension is of interest, as it is rarely covered by most theories.
The second dimension then captures all except two of the pride items and also items 19 and 20 on whether the country is better than others, and items 59 and 61 on a positive assessment of patriotic feelings. As such, this dimension comes closest to that understanding of national identity taken by the "pride" approach. Note though that this only refers to positive versions of pride and not to its negative equivalents. Thus, the only two "pride" items it does not capture, "I am often less proud of [COUNTRY] than I would like to be" and "The world would be a better place if [COUNTRY NATIONALITY] acknowledged [COUNTRY'S] shortcomings", refer to negative mentions of pride and are captured by the third dimension.
Of particular interest is the third dimension. While it only loads nine items, their loadings are all high, and the VAF of the dimension is roughly equal to that of dimension 4. The best way to interpret this dimension is as a negative version of the second dimension. Thus, where that dimension captured positive feelings toward the nation, this dimension captures those that are negative. As such, it contains the two negative pride items 23 and 24. It also contains those items that favour international solutions over national ones. Thus, it contains item 18 ("There are some things about Germany today that make me feel ashamed of Germany") and item 36 ("For certain problems, like environment pollution, international bodies should have the right to enforce solutions"). Both of these items express scepticism of the nation being able to carry out the solutions by itself, instead of preferring an international solution.
Finally, the fourth dimension bundles both the importance and the closeness approach. Thus, it includes all items from the identification category and the items dealing with what makes one a true German. The only exception is item 8 ("How close do you feel to: Europe?"), which loads on the first dimension.
Looking at both Eastern and Western Germany separately, we find that their structures deviate only a little from each other. In Western Germany, item 8 on closeness to Europe now loads with the other closeness items on dimension 4. This suggests that at least here respondents see "Europe" as part of the nation, and not as an international topic. Also in Western Germany, item 53 ("Legal immigrants to Germany who are not citizens should have the same rights as German citizens") is now no longer part of the first but the third dimension. Thus, respondents relate this item with negative views towards the nation. As for East Germany, we find that all but one of the closeness items now load on the second, pride dimension. This suggests that for those respondents the location they currently live in is part of the pride they feel towards their nation.

Empirical application
After generating the four scales for Germany, we will now turn to a brief empirical application. In doing this, we seek to show how our scales relate to those currently in use, and especially investigate the relationship between our measures and two socio-demographic variables, that are both widely considered as the main predictors: age and the level of education of the respondent (e.g. Theiss-Morse 2009).
Previous scholarship has largely substantiated that higher levels of education are more likely to be associated to lower levels of national identity (e.g. Theiss-Morse 2009;Kunovich 2009;Huddy and Khatib 2007;Bekhuis et al. 2014;Gustavsson and Stendahl 2020; for a more differentiated view see Huddy and Del Ponte 2019). Given this evidence, the following hypothesis is tested: H1: The higher the level of education of the respondent, the lower the level of national identity.
Likewise, there is strong evidence that older people are more likely to display higher levels of national identity than younger people (e.g. Theiss-Morse 2009;Kunovich 2009;Huddy and Khatib 2007;Bekhuis et al. 2014;Huddy and Del Ponte 2019;Hierro and Rico 2019;Gustavsson and Stendahl 2020). Thus, the following hypothesis is tested: H2: The older the respondent, the higher the level of national identity. For both cases, we measure the level of identity as the position on the additive scale created by each of the four sets of items. Besides, for age, we use the age of the respondent as given by themselves, and for education their years of schooling. Here, the average age was 49.3 (sd = 17.6) and the years of schooling 12.4 (sd = 3.8). In order to test both hypotheses, we run a simple additive model in which we regress the age and years of education of the respondent on each of the scales. Table 5 shows the standardised coefficients of this analysis. Note that the way we constructed the scales means that higher values stand for a higher degree of national identity.
Concerning education, we find significant correlations for Scales 1, 2, and 4, consistent with hypothesis 1. In contrast, the Scale 3 capturing negative feelings towards the nation, does not empirically support the hypothesis. We find that Scale 4, targeting both importance and closeness items, is the only one empirically substantiating hypothesis 2, showing that a higher age corresponds with a higher level of national identity. As such, for both cases, while there are correlations, they are not equal across the board.
At this point, the differences between the scales displayed bolster our main claim and thus the need to treat the construction of national identity scales with caution. That is, a different combination of items and thus a different operationalization of this construct implies the potential to reject or empirically support a relationship without the underlying facts changing. Also, it shows that relationships scholars considered as a solid empirical finding in the field of national identity as a whole, may only hold for parts of it, highlighting the limitations inherent in the scales employed. Therefore, it is reasonable to assume that NLPCA can lead one to gain a better and more nuanced understanding of the nature of

Discussion
In this article, we introduced a novel data-driven approach to better study the complexity of national identity that has not been applied yet in this field. Going beyond existing studies that tended to neglect the multidimensionality of the concept and thus focus on a limited part of it, we opted for a method that "finds" all the underlying dimensions within the data and thus better reveals the complexity of the concept under investigation. Using NLPCA, we yielded four distinct underlying elements that national identity consists of: nationalistic attitudes, national pride and attachment, cosmopolitan beliefs, and membership criteria defining national belonging. Put differently, one can safely conclude that national identity is the overarching dimension of these related, yet distinct aspects all revolving around the same point of reference-the nation. Adopting this approach, we contributed to the growing empirical scholarship on national identity in three ways: arguing for a data-driven method that has not been applied yet to this field, using the full data set instead of just a few items, and contributing to the understanding of national identity in Germany, a case that has not yet been covered at large. Exploring national identity by running an NLPCA, we contributed to addressing the challenges by which the extant empirical scholarship is marked. Despite all the remarkable advantages, existing studies, mainly theory driven, are limited as they measure only one aspect of national identity, focusing on pride, closeness, or membership criteria defining national belonging. Applying NLCPA, this investigation is distinct from these approaches as it relies on a data-driven method that seeks to find all of the underlying patterns in the data. Allowing for non-linear relations between the variables and dealing with both ordinal and nominal data, NLPCA has two crucial advantages in contrast to other methods such as PCA that are commonly used in empirical scholarship on national identity.
Fully acknowledging the multidimensionality of national identity, we decided to integrate all the items contained in the National Identity module of the ISSP and not just a few, as is commonly done in current scholarship. By including all the items of the ISSP battery, we were able to provide a more complete picture of the data. In addition, we did not come across any inherent data problems in generating additive scales from the items instead of using only a single item. Yielding a four-dimensional structure, we were able to empirically approach this complexity by employing NLCPA.
Concerning our single-case analysis, we found only limited differences between Eastern and Western Germany, making the perceived East-West divide less significant. As a result, it is reasonable to conclude that both parts share a relatively similar understanding of what national identity is made up of.
Apart from these strengths, there are also some limitations of this data-driven approach that need to be mentioned. As Benoit and Laver (2012) note, the results of an inductive analysis are often confusing. In addition, when applying NLPCA, scholars need to make many small (and seemingly arbitrary) decisions, i.e. when to include a variable as either ordinal or nominal, or whether or what criteria to maintain when dealing with outliers. As Gelman and Loken (2014) warn, taking such decisions can easily lead to getting lost in a "garden of forked paths" where multiple small decisions can lead to widely different outcomes. As such, it is important to be aware of the potential effects of each of the decisions, as well as to provide reasoning why it is taken. This is especially relevant when one cannot be sure about the underlying quality of the data. While for our case, we deem the quality of the ISSP data to be good enough to build scales from, we are well aware of the criticisms that have been raised against it (e.g. Blasius and Thiessen 2012). Given that the conclusions that one draws from a data-driven approach are fully dependent on that data, ensuring that the quality of the data is in order is advisable.
Yet, despite these limitations, we believe in the merits of this data-driven approach and close with two recommendations for future research. First, we think that NLPCA can move empirical scholarship on national identity forward. Showing that it can be applied not only for the case of Germany but in other countries such as Great Britain or Spain, we also highlighted the generalizability of this method. Therefore, on a more general note, we call for more reflection not only on the way we conceptualise it but also, and in particular, how we measure national identity. Given the multitude of different indicators applied in existing accounts, there is a considerable lack or empirical agreement that may warrant reconsideration. Put differently: if national identity is measured, amongst other things, by love of country, the concept risks losing its meaning in both substantial and empirical terms. Second, given the four different dimensions we found, this study highlights the need to conceive of national identity as a multidimensional instead of a unidimensional concept. Thus, we call for using proper methodological approaches that do justice to this complexity.