Introduction

Buying a home or ensuring an income for one’s retirement are just a couple of examples of the many situations that individuals will face during their lives and which require basic financial knowledge to make sound decisions. Currently, the ability to make informed, aware, and efficient financial decisions seems to be particularly important. Some recent trends converge to demonstrate a real need to promote and improve individuals’ financial literacy, especially in some countries, such as Italy (Coppola et al. 2017). As pointed out by analyses conducted in Italy and elsewhere (see The European House-Ambrosetti, Consorzio PattiChiari 2007; Grifoni and Messy 2012; Lusardi and Mitchell 2014), compared to past generations, people live longer (and live longer in retirement), which clearly suggests that there is a need to effectively manage money to achieve lifelong financial security. In this regard, another very recent trend is the greater personal responsibility that individuals have over pension planning, and the anxiety that they associate with it (Nicolini 2017). In Italy, the willingness to handle financial information is less pronounced for women, seniors and the unemployed (sometimes considered as “more vulnerable groups” in terms of poverty risk). Additionally, a low level of financial knowledge, financial anxiety and a lack of interest in financial issues show a negative correlation (Linciano et al. 2017). Taken together, these factors create challenges that might be more easily faced with the aid of financial education.

It is currently acknowledged that the design of effective educational interventions needs to be preceded by a thorough assessment of the initial or baseline level of financial literacy. This initial step is crucial for two reasons. First, a sound assessment approach helps in correctly identifying the educational gaps or biases to be addressed by the education programs. Second, this initial assessment represents an indispensable prerequisite for evaluating the success and impact of the defined interventions. Thus, assessing financial literacy is a challenge that needs to be undertaken by any organization, either public or private, that wants to engage in financial education at any level.

To date, the issue of assessing financial literacy has concentrated on the conceptual definition of the latent variable called “financial literacy”. Huston (2010) and Remund (2010) provide a thorough literature review that helps frame the issue of defining the concept of what financial literacy is or should be. Indeed, financial literacy has been variably defined as specifically referring to a form of knowledge (e.g., Hilgert et al. 2003), the ability to apply that knowledge (e.g., Mandell 2007), and good financial behavior (e.g., Moore 2003). The methods used to measure financial literacy vary quite substantially according to the different conceptual definitions adopted. In fact, without a consensual definition, financial literacy has been measured dissimilarly across studies. The construct either focuses on a few financial issues or covers a wide variety of financial topics, including debt, insurance, spending, budgeting, inflation, investments, and saving for retirement. Analogously, the number of questions used to assess financial literacy levels also varies widely, ranging from 3 to 45 items. Across studies, both performance tests (usually multiple-choice questionnaires) and self-report methods have been employed to measure financial literacy. Performance tests are mainly knowledge-based (e.g., Mandell 2007), while self-reports tend to assess perceived knowledge. More recently, tests have been designed to gauge both objective knowledge and perceived knowledge. In general, considerable progress has been achieved in the design of surveys aimed at identifying individual levels of financial literacy through the effort made by the OECD and its International Network on Financial Education (INFE) to develop and promote a common questionnaire based on the experience of a large number of previous rigorous national and international surveys. 
The OECD/INFE questionnaire and the underlying approach were described in OECD INFE (2011) and discussed in great detail by Kempson (2009). The questionnaire has been used in many countries (Atkinson and Messy 2012) to collect comparable data, including Italy (ABI-PattiChiari 2014).

In contrast, until very recently, the process of data analysis (i.e., of analyzing the information obtained through the questionnaires) has been less explored. Both bivariate and multivariate techniques are usually applied. In general, the responses to the proposed questions are simply summed to generate a score of financial literacy, which typically ranges between zero and the maximum number of correct answers. More recent studies have applied factor analysis (van Rooij et al. 2011). It is widely acknowledged, however, that more work is needed to develop rigorous psychometric analysis (Knoll and Houts 2012). Leveraging the Italian experience in assessing financial literacy at the national level, this paper critically reviews data analysis approaches used to evaluate financial literacy, and proposes a new method to gauge this latent construct in order to obtain a valid and reliable index that is able to capture educational needs in a manner that is as accurate and targeted as possible.

The remainder of this paper is organized as follows. The second section provides an overview of the OECD–INFE international survey on financial literacy and the main results obtained when it was run in Italy in 2013. Section three describes the sample used in the current study and outlines our approach to data analysis. Section four presents the empirical results. The final section summarizes the main results and draws some conclusions.

Background: the OECD–INFE international survey on financial literacy

In 2011 the OECD promoted a financial literacy survey within the framework of the INFE (OECD INFE 2011). The aim of the project was to collect information on the level of financial literacy among member countries, fully consistent with OECD recommendations to guarantee the cross-country comparability of financial literacy indicators. The findings of the first pilot study based on this approach are illustrated in Atkinson and Messy (2012).

Italy joined the survey in 2013 by means of a public–private partnership led by “PattiChiari,” a consortium of Italian banks committed to promoting market transparency and financial education. The questionnaire used to assess the respondents’ level of financial literacy closely followed the OECD INFE guidelines to measure financial literacy across countries. It included both a core questionnaire and a set of supplementary questions aimed at investigating issues such as the ability to properly access and use financial information and to plan for retirement, which were considered to be of interest for a better description of the Italian population’s level of financial literacy.

Following the OECD approach detailed in Atkinson and Messy (2012), the pieces of information provided in the core sections of the questionnaire were used to define three indicators of financial literacy: a financial behavior index (FBI), financial attitude index (FAI), and financial knowledge index (FKI). Furthermore, by exploiting the supplementary questions of the survey, two further indicators were created: a financial familiarity index (FFI; measuring the knowledge and usage of financial products and services), and financial planning index (FPI; focusing on the respondents’ ability to plan for retirement). The exact definitions and statistical descriptions of all indicators are reported in ABI-PattiChiari (2014) and Baglioni et al. (2018).

The FBI was obtained as an additive indicator, ranging from zero to nine, based on the answers to questions focusing on the financial decisions of the respondent, and assigning a score to each answer that increased with the quality of each decision (i.e., “savvy” financial behavior). The questions covered the consistency of the respondent’s purchases with respect to budget constraints, the ability to meet payment deadlines and to maintain an adequate financial budget, the quality of savings and the choices of financial products, and the ability to commit to long-term financial planning.

The FAI provided a measure of the respondents’ propensity to save. The index was built by assessing the individual’s attitude toward saving for the future, as well as the perception of the tradeoff between current and future spending. The index was obtained by means of categorical scores (between 1 and 5) with higher values indicating a higher propensity to save.

Following Lusardi and Mitchell (2011), the FKI was based on the number of correct answers given by the respondent to questions addressing simple financial concepts such as the role of inflation, the ability to compute simple and compound interest rates, the relationship between risk and return, and the notion of portfolio diversification.

The two supplementary indicators provided a closer look into the respondent’s familiarity with financial products and the ability to plan for retirement. The FFI was based on the respondent’s knowledge and usage of fifteen financial instruments, ranging from bank accounts and credit cards to mutual funds, stocks and shares, and insurance products.

The FPI aimed to establish the respondents’ awareness of the necessity to plan savings in advance in order to smooth consumption over the entire life cycle. The index was based on three questions assessing the existence of a financial budget at the household level, familiarity with supplementary pension funds, and familiarity with other forms of long-term savings to support retirement income.

To obtain a comprehensive measurement of financial literacy, the indexes described above were then aggregated, building on the approach detailed by Atkinson and Messy (2012). Indeed, the authors highlight that “financial literacy is a combination of knowledge, attitude and behavior, and so it makes sense to explore these three components in combination […] by adding the scores together” (Atkinson and Messy 2012, p. 39). The financial literacy index was therefore built as a simple average of the three indexes describing an individual’s financial behavior, financial attitude and financial knowledge. Along with this first financial literacy index, a second and more comprehensive financial literacy index was computed. This included all five elementary indexes depicted above; i.e., the three indexes suggested in the OECD guidelines plus the two indexes obtained from the supplementary questions.
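The aggregation step can be illustrated with a minimal sketch. The text states only that the overall index is a simple average of the elementary indexes; rescaling each index to the unit interval before averaging is an assumption made here so that indexes with different ranges contribute equally. The function names and the respondent's scores are hypothetical.

```python
from statistics import mean

def rescale(score, low, high):
    """Map a raw sub-index score onto [0, 1] given its theoretical range."""
    return (score - low) / (high - low)

def literacy_index(sub_scores, ranges):
    """Simple average of the rescaled sub-index scores.

    Rescaling before averaging is an assumption of this sketch,
    not a step documented in the text.
    """
    return mean(rescale(sub_scores[name], *ranges[name]) for name in sub_scores)

# Ranges: FBI 0-9 and FAI 1-5 as described above; FKI 0-6 follows from
# the six knowledge items discussed in the Results section.
RANGES = {"FBI": (0, 9), "FAI": (1, 5), "FKI": (0, 6)}

# Made-up scores for a single hypothetical respondent.
respondent = {"FBI": 6, "FAI": 3, "FKI": 3}
index = literacy_index(respondent, RANGES)
```

The more comprehensive five-index version would follow the same pattern once the ranges of the FFI and FPI are added to the dictionary.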

Once the elementary and comprehensive indexes had been computed, they were subsequently used to analyze their relationship with the usual set of sociodemographic and economic characteristics of the respondents to obtain a description of the determinants of the level of financial literacy of the Italian population.

Two main methods were applied: ordered probit and ordinary least squares (OLS) regressions (Footnote 1), and classification and regression tree (CART) analysis (Footnote 2). The former are traditional forms of analysis that estimate the relationships between a dependent variable (which could be a categorical and ordinal variable or continuous variable) and a set of independent variables. The latter is a non-parametric regression and classification method originally introduced by Breiman et al. (1984). It allows the simultaneous identification of significant covariates impacting on the dependent variable of interest (in our case, the financial literacy indexes) and significant clusters (in our case, of individuals) that exhibit relevant differences with respect to the dependent variable, and homogeneous characteristics with respect to the explanatory variables considered.

In other words, using probit or OLS regressions, the researcher obtains a causal relationship (significance and relevance of impact) between the level of financial literacy and the sociodemographic explanatory variables under investigation. With CART analysis, the researcher is also able to split the sample into relevant and homogeneous clusters that exhibit differences in their sociodemographic characteristics with respect to their (similar) level of financial literacy.
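To make the splitting logic of CART concrete, the following sketch shows a single step of a regression tree: it searches all covariates and thresholds for the binary split that minimizes the within-group sum of squared errors. It is an illustration under simplifying assumptions (numeric covariates, one split only, no pruning), not the procedure used in the survey analysis; the data and all names are made up.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the group mean (node impurity)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, target, covariates):
    """Return (covariate, threshold, impurity) of the best binary split."""
    best = None
    for cov in covariates:
        # Candidate thresholds: every observed value except the largest,
        # so that both child nodes are non-empty.
        for threshold in sorted({r[cov] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[cov] <= threshold]
            right = [r[target] for r in rows if r[cov] > threshold]
            impurity = sse(left) + sse(right)
            if best is None or impurity < best[2]:
                best = (cov, threshold, impurity)
    return best

# Made-up data: years of education drive the score, gender (0/1) does not.
rows = [
    {"education": 8, "gender": 0, "score": 2.0},
    {"education": 8, "gender": 1, "score": 2.2},
    {"education": 13, "gender": 0, "score": 3.9},
    {"education": 13, "gender": 1, "score": 4.1},
    {"education": 18, "gender": 0, "score": 4.0},
    {"education": 18, "gender": 1, "score": 4.2},
]
split = best_split(rows, "score", ["education", "gender"])
```

A full tree applies the same search recursively within each resulting group, which is how the clusters described in this section arise: a covariate may matter in one branch and be irrelevant in another.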

These two diverse approaches to data analysis produce different results concerning the main determinants of the aggregate level of financial literacy as well as its elementary factors (knowledge, behavior, attitude).

Tables 5 and 6 and Figs. 3, 4 and 5 in Appendix 1 present these differences. The tables include the results of ordered probit and OLS regressions applied to each elementary index and to the aggregate indicators of financial literacy (Footnote 3). Figures 3, 4 and 5 illustrate the results of applying CART analysis. The differences are immediately apparent in two ways. First, the influencing covariates are not necessarily the same. Second, with CART analysis it is possible to identify different clusters of individuals with respect to the same variable that regression analysis highlighted as being relevant in influencing their level of financial literacy.

For instance, with respect to the FFI (Fig. 3), the CART analysis resulted in ten different clusters of individuals, initially identified with respect to their participation in the labor force. For those in the labor force, what was then relevant in explaining their familiarity with financial products was their level of income, followed by the area of residence for low-income individuals. For inactive individuals (in search of a job but also retirees and students), by contrast, the second most important explanatory variable was the area of residence, followed by education for those living in southern Italy; however, marital status was more important for individuals living in northern Italy. The probit regression applied to the FFI, in turn, resulted in a larger number of influencing factors beyond those highlighted by the CART analysis; i.e., gender, age, and direct involvement in the financial decisions of the household. In addition, regression analysis suggested to policy makers that all individuals sharing the same characteristics (for instance, living in southern Italy) are potentially identical targets for the same education program, showing the same deficit in financial literacy. In contrast, CART analysis showed that there are at least three very different clusters among individuals residing in southern Italy (those active in the job market, those not working with low education, and those inactive with a higher level of education), suggesting that individuals not working and with a low level of education comprise the target group most in need of financial education programs.

Similarly, considering the financial knowledge of the respondents (Fig. 4), CART analysis identified gender as the first discriminant factor in the sample of respondents, followed by education and income. As before, the regression analysis identified, with no scale of priority, a larger number of significant covariates, including age and having an active role in financial decision making within the household, in addition to gender, level of income and level of education (as in the CART analysis). In the specific case of the FKI, it is important to underline that targeting women as a single homogenous cluster in need of financial education is, again, a choice with potentially limited effects. Indeed, women with a higher level of educational attainment (university degree), who are married or cohabitating, and who are in the labor force earning medium- to high-level salaries, show, on average, degrees of financial knowledge that are similar to those attained by men. Those who are in greater need of receiving educational support on basic financial issues are women with limited education or those not in the labor force.

Considering the global financial literacy index proposed by the OECD INFE guidelines (see Fig. 5), the OLS regression identified gender, age, involvement in the financial decisions of the household, marital status, level of education, income and area of residence as relevant determinants of the individual’s level of financial literacy. CART analysis restricted the number of relevant covariates, highlighting that the level of education was the first discriminatory variable to define individuals with lower and higher levels of financial literacy. Next, among individuals who had attained tertiary education, the level of income was the second most important discriminatory variable, which helped to identify a specific cluster with a high educational level but a low-income level. Once the level of income was considered, the area of residence was important for medium-income individuals, whereas gender became relevant for high-income individuals. On the other side of the “tree”—individuals with educational attainment up to the secondary level—the geographical area of residence, first, and age, second, were found to be discriminating factors. In summary, a financial education program targeting women as a homogenous (and vulnerable) cluster in need of educational support would not take into account the fact that only highly educated, high-income women are in need of such a specific program, whereas all other women can be addressed by finance programs targeting men with, for example, a low income and lower educational attainments.

So far, by critically reviewing the outcomes of one of the most comprehensive surveys conducted in Italy, we have shown that, according to the diverse statistical methods used to analyze the same financial literacy indexes, different insights about the level of financial literacy of individuals can be revealed, which have important implications for policies (including the design of education programs) aimed at improving this literacy.

A further step toward understanding a latent variable such as financial literacy could come from the adoption of statistical approaches that are able to provide information on the reliability and validity of the measures used. In the next section, we apply a well-known psychometric technique, item response theory (IRT), in an area where such techniques are not often applied; i.e., financial literacy measurement. To the best of our knowledge, only a few studies have explored the viability of these models for assessing financial literacy (Bongini et al. 2012, 2015; Knoll and Houts 2012; Despard and Chowa 2014).

Methods

Sample

The proposed procedure for data handling was applied to a sample comprising 1247 Italian residents of at least 18 years of age who were reached via computer-assisted telephone interviewing (CATI). The sample was obtained by appropriate stratification across several dimensions (gender, age, geographical area, and municipality size).

Table 1 shows the distribution of the sample relative to the main sociodemographic variables. The respondents’ average age is approximately 50 years. Almost 60% are married or cohabiting, and 42.3% are employed. The median family income declared by the respondents is approximately 1900 euros. Regarding the educational level of the respondents, approximately 21% received only primary education, 29% secondary education (lower level), 31% secondary education (upper level), and 11% tertiary education. In terms of the geographic composition of the sample, 46.5% of the respondents live in the northern region of the country; approximately 31.01% reside in small municipalities (up to 10,000 inhabitants); and 23.5% in large cities (above 100,000 inhabitants).

Table 1 Sample distribution

Statistical analysis: item response theory

The issue of the most appropriate way to measure literacy has attracted increasing attention in educational research over the last two decades. One important aim in measurement is to build tests with high validity and reliability. The two most popular frameworks in educational measurement are classical test theory (CTT) and item response theory (IRT) (Hambleton and Jones 1993). In general, CTT has dominated the area of standardized testing because of its weak assumptions and its easy interpretation. Indeed, the indexes proposed by the OECD approach and discussed above rely on CTT. Despite these features, CTT has been criticized since the score on a test is not an absolute characteristic of the respondent. In fact, it depends on the content of the test. Moreover, the difficulty of the items may vary depending on the sample of respondents who take a specific test. It is therefore difficult to compare the data of respondents between different tests. For these reasons, IRT was originally developed to overcome the problems with CTT.

The specific feature that makes IRT models increasingly popular in many areas of research is the presence of a metric that considers both the test’s difficulty and the respondent’s specific abilities. IRT aims to measure one or more ordinal/quantitative latent variables on a metric level of measurement, and it is fit to quantify aspects such as ability and personal traits. For these reasons, it has been widely adopted in educational research and psychometrics, where researchers develop and design exams, maintain banks of items for exams, and measure the items’ difficulties for successive versions of exams by the use of IRT (Bond and Fox 2007; Goldstein 1979). For example, in computerized adaptive testing (CAT), the respondents respond to items that are optimally selected to assess their attitude or abilities. The respondents may receive no common items. IRT helps to select the items for a respondent and to measure the scores across different subsets of items. For instance, several aptitude tests need IRT to estimate the abilities of the respondents, such as the Armed Services Vocational Aptitude Battery, the Scholastic Aptitude Test (SAT), and the Graduate Record Examination (GRE). Several individual intelligence tests adopt IRT to manage the tests, such as the Woodcock–Johnson Psycho-Educational Battery, the Differential Ability Scales, and the Stanford-Binet test (Embretson and Reise 2013). Furthermore, several researchers have applied IRT to personality trait measurements (Reise and Waller 1990), as well as to attitude measurements and behavioral ratings (Engelhard and Wilson 1996).

The Program for International Student Assessment (PISA) surveys have adopted IRT models since 2000 (Liu et al. 2008). Moreover, personal properties or item characteristics can be included in IRT models to explain person or item effects, obtaining explanatory item-response models (De Boeck and Wilson 2004). Until very recently, the analysis of financial literacy has relied only on CTT. To the best of our knowledge, only a few studies have used IRT in this domain (Knoll and Houts 2012; Bongini et al. 2012, 2015; Despard and Chowa 2014).

In general, IRT models convert raw scores into linear and reproducible measurements. Two properties of an IRT model must be checked to ensure its validity: unidimensionality and local independence. Unidimensionality requires that the items of a questionnaire share a common primary construct (i.e., that they all measure financial literacy). Local independence requires that the responses to the items are statistically independent of one another within any subpopulation of respondents whose members are homogeneous with respect to the latent trait measured, regardless of other characteristics (for instance, gender or race).

According to IRT models, an individual’s response to an item is determined by his/her level of knowledge (alternatively, ability or trait) of the latent variable under investigation (e.g., financial literacy), and by the level of difficulty of the given item. IRT models define the score (number of items answered correctly) of a particular respondent as a probability function of his/her ability and item difficulty. One way of expressing IRT models is in terms of the probability that an individual with a particular trait will correctly answer an item that has a particular level of difficulty, as expressed in the following formula:

$$P\left(X_{pik} = 1 \mid \theta_{p}, \beta_{ik}\right) = \frac{e^{\theta_{p} - \beta_{ik}}}{1 + e^{\theta_{p} - \beta_{ik}}} \tag{1}$$

In Formula (1), $X_{pik}$ refers to the response $X$ made by the $p$-th individual to the $i$-th item ($k$ refers to the possible level of the $i$-th item; see Footnote 4); $\theta_{p}$ refers to the level of financial literacy (ability) of the $p$-th individual; and $\beta_{ik}$ is the level of financial literacy (difficulty) required to reach level $k$ of the $i$-th item. In addition, we let $\beta_{i}$ denote the average level of financial literacy for the $i$-th item.
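As a minimal numerical sketch of Formula (1), the function below evaluates the response probability for a given ability and difficulty, both expressed on the same logit scale. The function name and the example values are illustrative assumptions; only the functional form comes from the formula.

```python
import math

def rasch_probability(theta, beta):
    """Probability of a correct response under Formula (1).

    theta: ability of the respondent (logits)
    beta:  difficulty of the item or item level (logits)
    """
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))

# A respondent whose ability equals the item difficulty has an even chance.
p_matched = rasch_probability(0.0, 0.0)

# The same respondent facing an easier item (lower beta) is more likely
# to answer correctly.
p_easy = rasch_probability(0.0, -0.99)
```

Because the probability depends only on the difference between ability and difficulty, persons and items can be placed on a single common scale.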

A typical representation of IRT is an “item map” where the item difficulties can be placed like points along a line and the person’s ability as a point along the same line. In Fig. 1, we apply this method to the data underlying the FKI described in the previous section.

Fig. 1 Item-person map for the FKI (CART procedure)

To answer our central research question—i.e., whether survey outcomes are sensitive to the data handling method employed—we applied IRT analysis to the survey data, checking the two properties of unidimensionality and local independence. Our aims are, firstly, to test whether the selected items were indeed measuring the same latent construct (i.e., an individual’s financial knowledge, financial attitude, financial behavior, and level of financial literacy); and, secondly, to ensure the local independence property by assessing whether the instrument is measuring the specific object. Our third aim is to analyze the attributes of the items and the respondents on the same scale, via the item-person map, to convey easy-to-read information about the distribution of the respondents and the chosen items.

Results

Table 2 displays the misfit indexes (Wright 1999; Bond and Fox 2007) for our three elementary indexes (financial knowledge, financial attitude, and financial behavior) and for the overall latent variable of financial literacy. As the term implies, a misfit is an observation that cannot fit into the overall structure of the questionnaire and is an indicator of how well the data conform to the IRT model parameters. In this work, we used the index based on the average value of the squared residuals (MNSQ). Two types of fit statistics are addressed by the MNSQ: infit (the weighted average of the squared residuals) and outfit (the unweighted average of the squared residuals) (Bond and Fox 2007). Guidelines vary according to test, item and respondent characteristics, but for general purposes, an MNSQ value in the interval [0.5–1.5] means that the item is “productive” for the measurement. In contrast, for values greater than 2.0, the item is considered degrading for the measurement (Linacre 2006). Almost all the items used for computing the three sub-indexes are “productive for measurement”, which means that they do not distort the measure under investigation; i.e., they all measure the same latent construct. In the case of the FBI and the overall financial literacy index, the test confirmed that all but three items are coherent with, and suited to measuring, the specific latent variable. The remaining three items, however, did not degrade the measurement system; thus, they can be maintained in the questionnaire. In summary, we can confirm that the items proposed in the OECD/INFE questionnaire are indeed good measures of one latent variable and can be used together.
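The two mean-square statistics can be sketched for a single dichotomous item as follows. The sketch assumes that person abilities and the item difficulty have already been estimated and uses the standard Rasch definitions: the outfit is the unweighted mean of the squared standardized residuals, and the infit weights each squared residual by the response variance. All names and data are illustrative, and the label returned for values outside the two thresholds quoted above is an assumption of this sketch.

```python
import math

def rasch_probability(theta, beta):
    """Probability of a correct response for ability theta, difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def item_fit(responses, thetas, beta):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    sq_resid, variances, z_sq = [], [], []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, beta)
        w = p * (1.0 - p)                 # model variance of the response
        sq_resid.append((x - p) ** 2)     # squared raw residual
        variances.append(w)
        z_sq.append((x - p) ** 2 / w)     # squared standardized residual
    infit = sum(sq_resid) / sum(variances)   # information-weighted mean square
    outfit = sum(z_sq) / len(z_sq)           # unweighted mean square
    return infit, outfit

def classify(mnsq):
    """Apply the thresholds quoted in the text."""
    if 0.5 <= mnsq <= 1.5:
        return "productive"
    if mnsq > 2.0:
        return "degrading"
    return "other"  # ranges not discussed in the text

# Two made-up respondents of average ability facing an item of average
# difficulty; one answers correctly and one does not.
infit, outfit = item_fit([1, 0], [0.0, 0.0], 0.0)
```

Values close to 1 indicate that the observed responses vary about as much as the model predicts.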

Table 2 The measure of the items for FBI, FAI, FKI, and financial literacy index

A second relevant property that needs to be assessed is local independence, which ensures that the instrument is measuring the specific object. For this purpose, we apply principal component analysis to the standardized residuals (Smith 2002). Table 3 shows the standardized residual variance decomposition for our set of indexes.

Table 3 Standardized residual variance of the indexes

The raw variances of the empirical model explained by each index closely match the expected raw variances (modeled). Moreover, because the modeled values for the three indexes are in the interval [50–60%], the measurement scales can be considered fairly good. Regarding the overall financial literacy index, which exhibits a value greater than 80%, the measurement scale is considered excellent. Furthermore, for the three indexes, the unexplained variances in the 1st contrast demonstrate that the instrument is good (Fisher 2007) since they fall within the required interval [5–10%]. Given that the value of unexplained variances for the global financial literacy index is less than 3%, the measurement instrument is excellent. In summary, we confirm, on solid statistical grounds, that the items used to build the four indexes meet the required unidimensionality and local independence properties and are appropriate to define the level of financial literacy of an individual.

The third aim of our analysis was to assess the attributes of the items and respondents at the same time via the item-person map. Figure 1 presents the item-person map for the FKI. Maps produced by IRT models can be used to quickly communicate complex information and do so in a presentational format that can be easily understood. Indeed, without an IRT metric, we would have been unable to measure the respondents’ ability and the questions’ difficulty on the very same scale. In raw-score terms, the difficulty of each financial knowledge item ranges between 0 and 1247 (i.e., the whole sample): 0 applies when no respondent answers the question correctly, and 1247 applies when the whole sample answers it correctly. Conversely, a person’s ability is measured on a scale ranging from 0 to 6: 0 corresponds to a person who was unable to answer any item correctly, and 6 applies to a person who answered the whole set of questions correctly. Therefore, the two raw metrics are not directly comparable.
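The two raw metrics can be illustrated with a small sketch. It assumes a made-up response matrix with four respondents and three items (the survey itself has 1247 respondents and six knowledge items); the function name is hypothetical.

```python
def raw_scores(matrix):
    """Return raw person scores and raw item counts from a 0/1 matrix.

    matrix[p][i] is 1 if respondent p answered item i correctly, else 0.
    """
    person_scores = [sum(row) for row in matrix]      # range: 0..number of items
    item_counts = [sum(col) for col in zip(*matrix)]  # range: 0..number of persons
    return person_scores, item_counts

# Made-up responses: rows are respondents, columns are items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
person_scores, item_counts = raw_scores(responses)
```

Person scores are bounded by the number of items and item counts by the number of respondents, so the two sets of numbers live on different scales, which is the limitation that the common IRT metric removes.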

The IRT item-person map shown in Fig. 1 orders the level of financial knowledge of the respondents (left-hand side), and the difficulty of the multiple-choice questions (right-hand side). The questions at the top of the scale were more difficult to answer; hence the test becomes easier further down the scale. The individuals with the least financial ability (at the bottom of the scale) had difficulty even with the easiest concepts (e.g., the relationship between risk and return), whereas the individuals with the most financial literacy (at the top of the scale) had no difficulty performing any of the activities implied by the questions. In particular, the respondents on the upper left-hand side can be said to be “better” or “smarter” than the items on the lower right-hand side, which means that these easier items were not difficult enough to challenge highly proficient individuals. On the other hand, the items on the upper right-hand side outsmarted the individuals on the lower left-hand side, which implies that these difficult items were beyond the level of ability possessed by those respondents. Items 4 and 6 were the easiest and most difficult to answer, respectively. This conclusion is also supported by the frequency distribution of the answers given to the six items concerning the construct of financial knowledge. Table 4 lists the percentage of correct answers for the six items.

Table 4 Percentage of correct answers for the six items of the FKI

The relevant contribution of IRT lies in the fact that the map reproduces directly the frequency distribution of the respondents with respect to their financial knowledge (ability) and the position of the items with respect to their difficulty in the financial knowledge construct. The unit of measurement of difficulty and ability is the same. For instance, item 4, with a difficulty equal to −0.99, was correctly answered by 85.3% of the respondents. This is equivalent to saying, ‘85.3% is the proportion of respondents who have an ability greater than −0.99.’

Having confirmed that the items were correctly chosen, and having investigated the relationship between difficulty and ability, a researcher is subsequently provided with a number of statistical methods to further investigate the socioeconomic characteristics of the respondents in relation to the IRT measure. For instance, it might be useful to evaluate whether a specific subgroup (defined by age, gender, or education) is disadvantaged or advantaged with respect to single items (numeracy problems, behavioral aspects, or attitude issues) and the whole issue under investigation (financial literacy). Differential item functioning (DIF) is a method that can uncover such differences, as explored by Bongini et al. (2012, 2015), who found a gender gap among university students on a single item (but not one that referred to the whole construct of financial literacy). Alternatively, one can include IRT measures and the socioeconomic characteristics of the respondents in a latent regression model, which provides a powerful framework to detect and analyze group differences that considers the characteristics of both items and individuals simultaneously (De Boeck and Wilson 2004).
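
As a rough illustration of the idea behind DIF, the sketch below computes a Mantel-Haenszel common odds ratio for a single item. This is one widely used DIF screen and is not necessarily the procedure adopted by Bongini et al.; the function names, group labels, and counts are all hypothetical. Respondents are first matched on their total score, and within each score stratum the odds of a correct answer are compared between a reference group and a focal group. A ratio close to 1 suggests no DIF, while a ratio well above 1 suggests that the item favors the reference group among equally able respondents.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item across total-score strata.

    records: iterable of (total_score, group, correct), where group is
    "ref" or "focal" and correct is 0 or 1.
    """
    # For each stratum and group, store counts as [wrong, right].
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for score, group, correct in records:
        strata[score][group][correct] += 1

    num = 0.0
    den = 0.0
    for cell in strata.values():
        a = cell["ref"][1]    # reference group, correct
        b = cell["ref"][0]    # reference group, wrong
        c = cell["focal"][1]  # focal group, correct
        d = cell["focal"][0]  # focal group, wrong
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")


def expand(score, group, n_right, n_wrong):
    """Build individual records from aggregate counts (illustrative helper)."""
    return [(score, group, 1)] * n_right + [(score, group, 0)] * n_wrong


# Hypothetical counts for two total-score strata; not survey data.
records = (
    expand(2, "ref", 8, 2) + expand(2, "focal", 5, 5)
    + expand(4, "ref", 9, 1) + expand(4, "focal", 7, 3)
)
odds_ratio = mantel_haenszel_odds_ratio(records)
print(f"Mantel-Haenszel odds ratio: {odds_ratio:.2f}")
```

Matching on the total score is what separates DIF from a simple difference in group means: the comparison is made between respondents with the same overall performance.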

Finally, CART analysis can be applied to the IRT measure. In this study, we applied CART analysis to the overall financial literacy index in order to compare the results obtained for the same construct (financial literacy) when it is measured through two different methods, CTT (Fig. 5) and IRT (Fig. 2). It is immediately apparent that the same approach applied to two different ways of constructing the same latent variable delivers different results with respect to relevant clusters differing in their level of financial literacy. In other words, depending on how we handle financial literacy data, through CTT or IRT models, we end up with dissimilar outcomes about who needs more financial education.

Fig. 2 Segmentation of the Italian population with respect to the aggregate indicator of financial literacy, as defined by the IRT model (CART procedure)

Discussion/conclusion

The present paper aimed to provide insight into how to improve the procedures for analyzing data that describe a latent variable such as financial literacy, leveraging the recent Italian national survey based on the approach proposed by the OECD through its INFE. As underlined in the introduction, assessing the baseline level of financial literacy represents an indispensable prerequisite to the design of effective education programs; that is, interventions that successfully address specific target groups with particular educational needs. The evidence provided in this paper shows, firstly, that different methods of analysis applied to the same measure of financial literacy deliver different results; and, secondly, that the same method of analysis applied to different measures of financial literacy also delivers different results. Consequently, we can state that the method of data analysis is crucial for the subsequent step of devising successful education programs in the field of financial literacy among different target groups. In particular, our findings show that adopting a specific method of data analysis delivers results that would not be obtained by adopting an alternative method, thus indicating that different approaches cannot be considered interchangeable. These findings suggest further improvements to the process of financial literacy evaluation, which we summarize here.

First, CTT has long been shown to be outdated as a means of defining people’s level of financial literacy. A basic test should be integrated into more sophisticated models in which the difficulty of the items and the ability of the respondents are considered. From this perspective, IRT helps to define, for every possible level of item difficulty, a weighted score that corresponds to the respondents’ level of ability, opinion, or feeling. Moreover, when the assumptions of IRT are met, its estimates of the item parameters are independent of the sample. A respondent should show the same ability regardless of the set of items adopted; conversely, a given item should have the same difficulty regardless of the respondents.

Second, applying alternative and more sophisticated methods of data analysis to financial literacy data enables researchers to target specific population groups. Instead of assuming that sharing the same personal characteristics among individuals (e.g., gender) necessarily means sharing the same financial literacy needs, the results of our CART analysis suggest that women should not be considered a homogeneous group in terms of their level of financial literacy. Consequently, policy makers cannot treat “women” as a potentially identical target of the same education program; rather, they should differentiate and develop specific programs depending on the different cluster to which women belong (e.g., educational level, residential area). In this regard, with the goal of targeting people (especially the more financially vulnerable ones) in an ever more detailed and precise way, the data analysis methods used in this study might offer the possibility to also include other individual non-cognitive characteristics such as personality traits, which were recently proven to be a fundamental aspect of financial behavior. For example, research has investigated conscientiousness (Roa et al. 2017) and impulsivity (Baldi et al. 2013; Iannello et al. 2015; Bongini et al. 2015), as well as the social roles of respondents, such as homemakers vs. financial workers (Croson and Gneezy 2009; Dwivedi et al. 2015).

Many of the studies carried out in Italy to date have focused on financial knowledge (the cognitive aspect of financial literacy). However, future research should perhaps carry out more in-depth analysis of soft skills, such as the confidence to be proactive and a willingness to take investment risks, rather than content knowledge. For example, in a meta-analysis carried out by Fernandes et al. (2014), measured knowledge of financial facts had a weak relationship to financial behavior in econometric studies that controlled for omitted variable bias. As pointed out by many authors (e.g., Worthington 2006; Nicolini 2017), financial literacy should be tested against an individual’s needs and the context in which they live, not against a large set of available financial products and services, since consumers will never need or use most of these products and services. The assumption here is that an individual’s financial literacy should not be measured in “a linear sense” but, rather, with respect to the set of knowledge that is necessary to deal with the specific financial needs, desires, expectations and fears of specific groups of consumers. From this perspective, “financial literacy becomes a multidimensional construct, with an individual being knowledgeable in certain domains (e.g., investing) while showing a deep lack of knowledge in others (e.g., borrowing)” (Nicolini 2017, p. 35). However, a lack of knowledge in a specific area is not considered a very critical gap if the individual is not called to make financial decisions in that area.

In line with this view, we are aware that the questionnaire used in our study involves some aspects that warrant critique. That questionnaire was a version of the OECD/INFE questionnaire tailored to a national survey and, as pointed out by Robson and Splinter (2015), one problem with national surveys is that there is no clear way to assess individual responses and micro-level changes over time in regard to behavior. As for studies that use the questionnaire to provide reliable information on what people do in the financial domain, another limitation is that the questionnaire tests behavior with self-assessed questions that deal with financial problems and tasks which may not be realistic for every respondent. In fact, it focuses primarily on one aspect of an individual’s financial capability (see Footnote 5), attaching less importance to the context. Further research should take into account social and contextual issues, as suggested by some institutions that promote financial inclusion and financial wellbeing (e.g., CYFI 2012; CFPB 2015), and by authors who are critical of mainstream approaches to financial literacy and work with people living on low incomes (e.g., Landvogt 2006; Rinaldi 2016).

To conclude, our research suggests that financial literacy research should be open to new and alternative approaches to measurement, while being aware that different data analysis methods can produce different results. Therefore, different types of analysis are called for. Additionally, researchers should be clear about why one method is to be preferred to another, and why one set of results is more useful than another. This sort of information would be useful to policy makers who are keen to design more efficient and more effective financial education programs for target groups.