The ecological fallacy in a time series context: evidence from Spanish regional unemployment rates


The ecological fallacy (EF) is a common problem that regional scientists must deal with when using aggregated data in their analyses. Although a large number of studies have considered different aspects of this problem, little attention has been paid to the potential negative effects of the EF in a time series context. Using Spanish regional unemployment data, this paper shows that EF effects are observed not only at the cross-section level but also in a time series framework. The empirical evidence shows that analytical regional configurations are less susceptible to time effects than both normative and random regional configurations, while normative configurations are an improvement over random ones.

Figs. 1–10 (not reproduced in this text)


Notes

  1. Nomenclature des Unités Territoriales Statistiques (NUTS) is the geographical system established by Eurostat for the production of regional statistics within the European Union. According to Eurostat, “normative regions are the expression of a political will; their limits are fixed according to the tasks allocated to the territorial communities, to the sizes of population necessary to carry out these tasks efficiently and economically, or according to historical, cultural and other factors” (Eurostat, 2006).

  2. We decided against using the global Moran statistic because of the relatively small number of geographical units considered.

  3. Note the difference between cluster and region. A cluster does not satisfy spatial contiguity constraints, whereas a region does.

  4. The main characteristic of seeded regions is that each region is the result of selecting one area (seed area) to which other neighboring areas are assigned. This methodology was first proposed by Vickrey (1961) for solving districting problems.

  5. See Gordon (1999) for more information about other heterogeneity measures in classification models.

  6. The objective function values of the two-stage k-means procedure have been expressed in terms of Eq. 2 to facilitate comparisons, as have the objective function values for the NUTS aggregations.

  7. See the Appendix for a description of how the Theil index is decomposed into its within-group and between-group components.

  8. Data on unemployment rates for the different levels of aggregation are freely available on the Spanish Instituto Nacional de Estadística’s website:

  9. To our knowledge, the only previous work that has considered this issue is Rey (2001).

  10. These simulations also take into account the nested structure of the two scales: every solution for 15 regions has a corresponding nested solution for 6 regions.
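Notes 3, 4 and 10 rely on three ideas: a region (unlike a cluster) must be spatially contiguous, a seeded region grows outward from a seed area by absorbing neighbouring areas, and each simulated 15-region solution nests inside a 6-region one. The Python sketch below illustrates all three on a hypothetical lattice of areas with rook adjacency. It is a generic random seeded-growth procedure written for illustration, not the authors' simulation code; the function names, the uniform random choice of seeds and of frontier assignments, and the strategy of building the coarse level by merging neighbouring fine regions are all assumptions.

```python
import random
from collections import deque


def is_contiguous(areas, adjacency):
    """True if `areas` forms a connected subgraph of `adjacency` (a region, not just a cluster)."""
    areas = set(areas)
    if not areas:
        return False
    start = next(iter(areas))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search restricted to the candidate areas
        a = queue.popleft()
        for b in adjacency[a]:
            if b in areas and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen == areas


def seeded_regions(adjacency, k, rng):
    """Hypothetical helper: grow k contiguous regions from k random seed areas.

    Assumes the adjacency graph is connected and k <= number of areas.
    """
    nodes = sorted(adjacency)
    seeds = rng.sample(nodes, k)
    label = {s: r for r, s in enumerate(seeds)}
    while len(label) < len(nodes):
        # candidate moves: (unassigned area, region of one of its assigned neighbours)
        frontier = [(a, label[b]) for a in nodes if a not in label
                    for b in adjacency[a] if b in label]
        area, region = rng.choice(frontier)
        label[area] = region
    return [[a for a in nodes if label[a] == r] for r in range(k)]


def nested_regions(adjacency, k_fine, k_coarse, rng):
    """Hypothetical helper: a k_fine partition plus a k_coarse partition that nests it."""
    fine = seeded_regions(adjacency, k_fine, rng)
    owner = {a: r for r, reg in enumerate(fine) for a in reg}
    # two fine regions are neighbours if any of their areas are neighbours
    region_adj = {r: set() for r in range(k_fine)}
    for a, nbrs in adjacency.items():
        for b in nbrs:
            if owner[a] != owner[b]:
                region_adj[owner[a]].add(owner[b])
    # grow coarse regions over the region graph, so each one is a union of fine regions
    groups = seeded_regions({r: sorted(n) for r, n in region_adj.items()}, k_coarse, rng)
    coarse = [sorted(a for r in g for a in fine[r]) for g in groups]
    return fine, coarse


def grid_adjacency(rows, cols):
    """Rook adjacency for a hypothetical rows-by-cols lattice of areas numbered row by row."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            adj[r * cols + c] = [rr * cols + cc
                                 for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                                 if 0 <= rr < rows and 0 <= cc < cols]
    return adj
```

For example, `nested_regions(grid_adjacency(6, 5), 15, 6, random.Random(42))` returns a 15-region partition of 30 hypothetical areas together with a 6-region partition in which every fine region lies inside exactly one coarse region. Growing the coarse level over the graph of fine regions, rather than re-partitioning the areas from scratch, is what guarantees the nesting.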

References

  • Alonso J, Izquierdo M (1999) Disparidades regionales en el empleo y el desempleo. Papeles de Economía Española 80:79–99

  • Arbia G (1986) The modifiable areal unit problem and the spatial autocorrelation problem: towards a joint approach. Metron 44:391–407

  • Batty M, Sikdar PK (1982) Spatial aggregation in gravity models. Environ Plan A 14:377–822

  • Bentolila S, Jimeno J (1995) Regional unemployment persistence: Spain 1976–1994. C.E.P.R. Discussion paper no. 1259

  • Blanchard O, Jimeno JF (1995) Structural unemployment: Spain versus Portugal. Am Econ Rev 85:212–218

  • Commission of the European Communities, Eurostat, Unit A4 GISCO (1997) Geographical information systems in statistics. SUP.COM 95, Lot 115

  • Cressie N (1993) Statistics for spatial data. Wiley, New York

  • Duque JC (2004) Design of homogeneous territorial units. A methodological proposal and applications. PhD thesis, University of Barcelona, Spain

  • Duque JC, Church RL (2004) A new heuristic model for designing analytical regions. In: North American Meeting of the International Regional Science Association, Seattle

  • Duque JC, Ramos R, Suriñach J (2006) Supervised regionalization methods: a survey. mimeo

  • Eurostat (2006) Nomenclature of territorial units for statistics—NUTS. Statistical regions of Europe. (06/19/2006)

  • Fischer MM (1980) Regional taxonomy—a comparison of some hierarchic and non hierarchic strategies. Reg Sci Urban Econ 10:503–537

  • Glover F (1977) Heuristic for integer programming using surrogate constraints. Decis Sci 8:156–166

  • Glover F (1989) Tabu search. Part I. ORSA J Comput 1:190–206

  • Glover F (1990) Tabu search. Part II. ORSA J Comput 2:4–32

  • Gordon AD (1996) A survey of constrained classification. Comput Stat Data Anal 21:17–29

  • Gordon AD (1999) Classification, 2nd edn. Chapman & Hall, Boca Raton

  • Gotway CA, Young LJ (2002) Combining incompatible spatial data. J Am Stat Assoc 97:632–648

  • Greenland S, Morgenstern H (1989) Ecological bias, confounding, and effect modification. Int J Epidemiol 18:269–274

  • Jimeno JF, Bentolila S (1998) Regional unemployment persistence (Spain, 1976–94). Labour Econ 5:25–51

  • López-Bazo E, del Barrio T, Artís M (2002) The regional distribution of Spanish unemployment: a spatial analysis. Pap Reg Sci 81:365–389

  • López-Bazo E, del Barrio T, Artís M (2005) Geographical distribution of unemployment in Spain. Reg Stud 39:305–318

  • MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics and probability, vol. 1. University of California Press, Berkeley, pp. 281–297

  • Marimon R, Zilibotti F (1998) ‘Actual’ versus ‘virtual’ employment in Europe. Is Spain different? Eur Econ Rev 42:123–153

  • Martin D, Nolan A, Tranmer M (2001) The application of zone-design methodology in the 2001 UK census. Environ Plan A 33:1949–1962

  • Moran P (1948) The interpretation of statistical maps. J R Stat Soc B 10:243–251

  • Murtagh F (1985) A survey of algorithms for contiguity-constrained clustering and related problems. Comput J 28:82–88

  • Norman P, Rees P, Boyle P (2003) Achieving data compatibility over space and time: creating consistent geographical zones. Int J Popul Geogr 9:365–386

  • Openshaw S (1973) A regionalisation algorithm for large datasets. Comput Appl 3–4:136–147

  • Openshaw S (1977) A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modeling. Trans Inst Br Geographers 2:459–472

  • Openshaw S (1984) The modifiable areal unit problem. Concepts and techniques in modern geography, vol. 38. GeoBooks, Norwich

  • Openshaw S, Rao L (1995) Algorithms for reengineering 1991 census geography. Environ Plan A 27:425–446

  • Openshaw S, Wymer C (1995) Classifying and regionalizing census data. In: Openshaw S (ed) Census users handbook. Geo Information International, Cambridge, UK, pp. 239–270

  • Piantadosi S, Byar DP, Green SB (1988) The ecological fallacy. Am J Epidemiol 127:893–904

  • Rey S (2001) Spatial analysis of regional income inequality. REAL discussion paper 01-T9

  • Richardson S (1992) Statistical methods for geographical correlation studies. In: Elliott P, Cuzick J, English D, Stern R (eds) Geographical and environmental epidemiology: methods for small area studies. Oxford University Press, New York, pp. 181–204

  • Richardson S, Stucker L, Hemon D (1987) Comparison of relative risks obtained in ecological and individual studies: some methodological considerations. Int J Epidemiol 16:111–120

  • Robinson WS (1950) Ecological correlations and the behavior of individuals. Am Sociol Rev 15:351–357

  • Theil H (1967) Economics and information theory. Rand McNally and Company, Chicago

  • Vickrey W (1961) On the prevention of gerrymandering. Political Sci Q 76:105–110

  • Wise SM, Haining RP, Ma J (1997) Regionalization tools for exploratory spatial analysis of health data. In: Fischer MM, Getis A (eds) Recent developments in spatial analysis: spatial statistics, behavioural modelling, and computational intelligence. Springer, Berlin Heidelberg New York, pp. 83–100

  • Wise SM, Haining RP, Ma J (2001) Providing spatial statistical data analysis functionality for the GIS user: the SAGE project. Int J Geogr Inf Sci 15:239–254

  • Yule GU, Kendall MG (1950) An introduction to the theory of statistics, 14th edn. Griffin, London

Acknowledgements


The authors wish to thank three anonymous referees and Serge Rey, E. López-Bazo and E. Pons for their helpful comments and suggestions on previous versions of this paper, and Philip Stephens for editing. The usual disclaimer applies. Financial support from the CICYT SEJ2005-04348/ECON project is gratefully acknowledged.

Author information


Corresponding author

Correspondence to Juan Carlos Duque.

Appendix: The Theil index

The Theil index has been computed as follows:

$$ T = \sum_{p=1}^{n} \frac{u_{p}}{U} \log\left[\frac{u_{p}/U}{1/n}\right] $$

where \(n\) is the number of provinces (47), \(u_{p}\) is the unemployment rate of province \(p\), and \(U = \sum_{p=1}^{n} u_{p}\) represents the Spanish unemployment rate.
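As a minimal sketch, assuming natural logarithms (the appendix writes log without a base) and a plain Python list of provincial rates, the index above can be computed as follows; `theil_index` is a hypothetical name, not part of any library, and the rates in the usage note are made up.

```python
import math


def theil_index(rates):
    """Theil index as written above: T = sum_p (u_p/U) * log[(u_p/U) / (1/n)], U = sum_p u_p.

    Natural logarithms are an assumption; the appendix does not state the base.
    """
    n = len(rates)
    U = sum(rates)
    T = 0.0
    for u_p in rates:
        share = u_p / U                       # u_p / U
        if share > 0:                         # convention: 0 * log(0) = 0
            T += share * math.log(share * n)  # log[(u_p/U) / (1/n)]
    return T
```

With equal rates, such as `theil_index([10.0, 10.0, 10.0])`, the index is 0; if all unemployment were concentrated in one of n provinces, it would reach its maximum, log n.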

Overall inequality can be completely and perfectly decomposed into a between-group component \(T_{g}\) and a within-group component \(T_{wg}\):

$$ T = T_{g} + T_{wg} $$

with

$$ T_{g} = \sum_{i=1}^{m} \frac{U_{i}}{U} \log\left[\frac{U_{i}/U}{n_{i}/n}\right] $$

where \(i\) indexes the \(m\) regions, \(n_{i}\) is the number of provinces in region \(i\), and \(U_{i}\) is the unemployment rate in region \(i\), defined analogously to \(U\) as \(U_{i} = \sum_{p=1}^{n_{i}} u_{ip}\), and

$$ T_{wg} = \sum_{i=1}^{m} \frac{U_{i}}{U} \sum_{p=1}^{n_{i}} \frac{u_{ip}}{U_{i}} \log\left[\frac{u_{ip}/U_{i}}{1/n_{i}}\right] $$

where each provincial unemployment rate is indexed by two subscripts: \(i\) for the unique region to which the province belongs, and \(p\), which runs from 1 to \(n_{i}\) within each region.
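The decomposition can be checked numerically. The sketch below assumes, as for \(U\), that \(U_{i}\) is the sum of the provincial rates in region \(i\) (this is what makes \(T = T_{g} + T_{wg}\) hold exactly) and uses natural logarithms; the function names and the rates in the usage note are hypothetical.

```python
import math


def theil(values):
    """T = sum (v/V) * log[(v/V) / (1/n)] with V = sum(values); natural log assumed."""
    n, V = len(values), sum(values)
    return sum((v / V) * math.log(v / V * n) for v in values if v > 0)


def theil_decomposition(groups):
    """Return (T, T_g, T_wg) for provincial rates u_ip grouped by region i.

    Assumption: U_i is the sum of the provincial rates in region i, mirroring U.
    """
    rates = [u for g in groups for u in g]
    n, U = len(rates), sum(rates)
    T_g = T_wg = 0.0
    for g in groups:
        n_i, U_i = len(g), sum(g)
        if U_i == 0:
            continue                            # an all-zero region contributes nothing
        w = U_i / U                             # U_i / U
        T_g += w * math.log(w / (n_i / n))      # between-group term
        T_wg += w * theil(g)                    # inner sum over p = 1..n_i
    return theil(rates), T_g, T_wg
```

For two made-up regions, `theil_decomposition([[8.0, 12.0], [20.0, 25.0, 30.0]])` returns a total T equal to T_g + T_wg up to floating-point rounding. When every province within a region has the same rate, T_wg is 0 and all inequality is between regions.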

About this article

Cite this article

Duque, J.C., Artís, M. & Ramos, R. The ecological fallacy in a time series context: evidence from Spanish regional unemployment rates. J Geograph Syst 8, 391–410 (2006).
