Abstract
The ecological fallacy (EF) is a common problem that regional scientists face when using aggregated data in their analyses. Although many studies consider different aspects of this problem, little attention has been paid to the potential negative effects of the EF in a time series context. Using Spanish regional unemployment data, this paper shows that EF effects are observed not only at the cross-section level but also in a time series framework. The empirical evidence shows that analytical regional configurations are the least susceptible to time effects relative to both normative and random regional configurations, while normative configurations are an improvement over random ones.
Notes
Nomenclature des Unités Territoriales Statistiques (NUTS) is the geographical system established by Eurostat for the production of regional statistics within the European Union. According to Eurostat, “normative regions are the expression of a political will; their limits are fixed according to the tasks allocated to the territorial communities, to the sizes of population necessary to carry out these tasks efficiently and economically, or according to historical, cultural and other factors” (Eurostat, 2006).
We discarded the use of the global Moran statistic due to the relatively low number of geographical units considered.
Note the difference between cluster and region. A cluster does not satisfy spatial contiguity constraints, whereas a region does.
The main characteristic of seeded regions is that each region is the result of selecting one area (seed area) to which other neighboring areas are assigned. This methodology was first proposed by Vickrey (1961) for solving districting problems.
See Gordon (1999) for more information about other heterogeneity measures in classification models.
The objective function values of k-means two stages have been expressed in terms of Eq. 2 in order to facilitate comparisons. The objective function values for NUTS aggregations are also expressed in terms of Eq. 2.
See the Appendix for a description of the decomposition of the Theil index into its within and between components.
Data on unemployment rates for the different levels of aggregation are freely available on the Spanish Instituto Nacional de Estadística’s website: http://www.ine.es.
To our knowledge, the only previous work that has considered this issue is Rey (2001).
These simulations also take into account the nested configuration of both scales. Thus, every solution for 15 regions has its nested solution for 6 regions.
References
Alonso J, Izquierdo M (1999) Disparidades regionales en el empleo y el desempleo. Papeles de Economía Española 80:79–99
Arbia G (1986) The modifiable areal unit problem and the spatial autocorrelation problem: towards a joint approach. Metron 44:391–407
Batty M, Sikdar PK (1982) Spatial aggregation in gravity models. Environ Plan A 14:377–822
Bentolila S, Jimeno J (1995) Regional unemployment persistence: Spain 1976–1994. C.E.P.R. Discussion paper no. 1259
Blanchard O, Jimeno JF (1995) Structural unemployment: Spain versus Portugal. Am Econ Rev 85:212–218
Commission of the European Communities, Eurostat, Unit A4 GISCO (1997) Geographical information systems in statistics. SUP.COM 95, Lot 115
Cressie N (1993) Statistics for spatial data. Wiley, New York
Duque JC (2004) Design of homogeneous territorial units. A methodological proposal and applications. PhD thesis, University of Barcelona, Spain
Duque JC, Church RL (2004) A new heuristic model for designing analytical regions. In: North American Meeting of the International Regional Science Association, Seattle
Duque JC, Ramos R, Suriñach J (2006) Supervised regionalization methods: a survey. mimeo
Eurostat (2006) Nomenclature of territorial units for statistics—NUTS. Statistical regions of Europe. http://europa.eu.int/comm/eurostat/ramon/nuts/home_regions_en.html (06/19/2006)
Fischer MM (1980) Regional taxonomy—a comparison of some hierarchic and non hierarchic strategies. Reg Sci Urban Econ 10:503–537
Glover F (1977) Heuristic for integer programming using surrogate constraints. Decis Sci 8:156–166
Glover F (1989) Tabu search. Part I. ORSA J Comput 1:190–206
Glover F (1990) Tabu search. Part II. ORSA J Comput 2:4–32
Gordon AD (1996) A survey of constrained classification. Comput Stat Data Anal 21:17–29
Gordon AD (1999) Classification, 2nd edn. Chapman & Hall, Boca Raton
Gotway CA, Young LJ (2002) Combining incompatible spatial data. J Am Stat Assoc 97:632–648
Greenland S, Morgenstern H (1989) Ecological bias, confounding, and effect modification. Int J Epidemiol 18:269–274
Jimeno JF, Bentolila S (1998) Regional unemployment persistence (Spain, 1976–94). Labour Econ 5:25–51
López-Bazo E, del Barrio T, Artís M (2002) The regional distribution of Spanish unemployment: a spatial analysis. Pap Reg Sci 81:365–389
López-Bazo E, del Barrio T, Artís M (2005) Geographical distribution of unemployment in Spain. Reg Stud 3:305–318
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics and probability, vol. 1. University of California Press, Berkeley, pp. 281–297
Marimon R, Zilibotti F (1998) ‘Actual’ versus ‘virtual’ employment in Europe. Is Spain different? Eur Econ Rev 42:123–153
Martin D, Nolan A, Tranmer M (2001) The application of zone-design methodology in the 2001 UK census. Environ Plan A 33:1949–1962
Moran P (1948) The interpretation of statistical maps. J R Stat Soc B 10:243–251
Murtagh F (1985) A survey of algorithms for contiguity-constrained clustering and related problems. Comput J 28:82–88
Norman P, Rees P, Boyle P (2003) Achieving data compatibility over space and time: creating consistent geographical zones. Int J Popul Geogr 9:365–386
Openshaw S (1973) A regionalisation algorithm for large datasets. Comput Appl 3–4:136–147
Openshaw S (1977) A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modeling. Trans Inst Br Geographers 2:459–472
Openshaw S (1984) The modifiable areal unit problem. Concepts and techniques in modern geography, vol. 38. GeoBooks, Norwich
Openshaw S, Rao L (1995) Algorithms for reengineering 1991 census geography. Environ Plan A 27:425–446
Openshaw S, Wymer C (1995) Classifying and regionalizing census data. In: Openshaw S (ed) Census users handbook. Geo Information International, Cambridge, UK, pp. 239–270
Piantadosi S, Byar DP, Green SB (1988) The ecological fallacy. Am J Epidemiol 127:893–904
Rey S (2001) Spatial analysis of regional income inequality. REAL discussion paper 01-T9
Richardson S (1992) Statistical methods for geographical correlation studies. In: Elliot P, Cuzick J, English D, Stern R (eds) Geographical and environmental epidemiology: methods for small area studies. Oxford University Press, New York, pp. 181–204
Richardson S, Stucker L, Hemon D (1987) Comparison of relative risks obtained in ecological and individual studies: some methodological considerations. Int J Epidemiol 16:111–120
Robinson WS (1950) Ecological correlations and the behavior of individuals. Am Sociol Rev 15:351–357
Theil H (1967) Economics and information theory. Rand McNally and Company, Chicago
Vickrey W (1961) On the prevention of gerrymandering. Political Sci Q 76:105–110
Wise SM, Haining RP, Ma J (1997) Regionalization tools for exploratory spatial analysis of health data. In: Fischer MM, Getis A (eds) Recent developments in spatial analysis: spatial statistics, behavioural modelling, and computational intelligence. Springer, Berlin Heidelberg New York, pp. 83–100
Wise SM, Haining RP, Ma J (2001) Providing spatial statistical data analysis functionality for the GIS user: the SAGE project. Int J Geogr Inf Sci 15:239–254
Yule GU, Kendall MG (1950) An introduction to the theory of statistics, 14th edn. Griffin, London
Acknowledgments
The authors wish to thank three anonymous referees and Serge Rey, E. López-Bazo and E. Pons for their helpful comments and suggestions about previous versions of this paper, and Philip Stephens for editing. The usual disclaimer applies. Financial support is gratefully acknowledged from the CICYT SEJ2005-04348/ECON project.
Appendix: The Theil index
The Theil index has been computed as follows:
\( T = {\sum\limits_{p = 1}^n {\frac{{u_{p} }} {U}\log {\left[ {\frac{{u_{p} /U}} {{1/n}}} \right]}} }, \)
where n is the number of provinces (47), \( u_{p} \) is the provincial unemployment rate indexed by p, and U represents the Spanish unemployment rate, \( U = {\sum\nolimits_{p = 1}^n {u_{p} } }. \)
Overall inequality can be completely and perfectly decomposed into a between-group component \( T^{'}_{{\text{g}}} \) and a within-group component \( T^{{\text{W}}}_{{\text{g}}} \). Thus: \( T = T^{'}_{{\text{g}}} + T^{{\text{W}}}_{{\text{g}}} \), with
\( T^{'}_{{\text{g}}} = {\sum\limits_{i = 1}^m {\frac{{U_{i} }} {U}\log {\left[ {\frac{{U_{i} /U}} {{n_{i} /n}}} \right]}} }, \)
where i indexes regions, with \( n_{i} \) representing the number of provinces in region i and \( U_{i} \) the unemployment rate in region i, and
\( T^{{\text{W}}}_{{\text{g}}} = {\sum\limits_{i = 1}^m {\frac{{U_{i} }} {U}{\sum\limits_{p = 1}^{n_{i} } {\frac{{u_{{ip}} }} {{U_{i} }}} }\log {\left[ {\frac{{u_{{ip}} /U_{i} }} {{1/n_{i} }}} \right]}} }, \)
where each provincial unemployment rate is indexed by two subscripts: i for the unique region to which the province belongs, and p, which, within each region, goes from 1 to \( n_{i} \).
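As an illustration only, the decomposition can be checked numerically with a minimal Python sketch. It assumes that \( U_{i} \) is the sum of the provincial rates in region i (mirroring the definition of U as a sum), that all rates are strictly positive, and that natural logarithms are used. The function name `theil_decomposition` and the sample rates and region labels are hypothetical and are not taken from the paper's data.

```python
import math
from collections import defaultdict


def theil_decomposition(rates, regions):
    """Return (T, T_between, T_within) for provincial rates grouped by region.

    rates   -- strictly positive provincial unemployment rates u_p
    regions -- region label for each province, same length as rates
    """
    n = len(rates)
    U = sum(rates)  # U is the sum of the provincial rates, as in the appendix

    # Overall Theil index: sum_p (u_p / U) * log[(u_p / U) / (1 / n)]
    total = sum((u / U) * math.log((u / U) / (1 / n)) for u in rates)

    # Collect the provincial rates belonging to each region
    by_region = defaultdict(list)
    for u, r in zip(rates, regions):
        by_region[r].append(u)

    between = 0.0
    within = 0.0
    for members in by_region.values():
        n_i = len(members)
        U_i = sum(members)  # assumption: regional total is the sum of its provinces
        # Between-group term: (U_i / U) * log[(U_i / U) / (n_i / n)]
        between += (U_i / U) * math.log((U_i / U) / (n_i / n))
        # Within-group term: (U_i / U) * sum_p (u_ip / U_i) * log[(u_ip / U_i) / (1 / n_i)]
        within += (U_i / U) * sum(
            (u / U_i) * math.log((u / U_i) / (1 / n_i)) for u in members
        )
    return total, between, within


# Hypothetical example: five provinces grouped into two regions
rates = [0.10, 0.20, 0.15, 0.25, 0.30]
regions = ["A", "A", "B", "B", "B"]
T, T_between, T_within = theil_decomposition(rates, regions)
print(T, T_between, T_within)
```

Under these assumptions the two components add up to the overall index for any partition of the provinces, so the same rates can be regrouped under different regional configurations and the between and within shares compared.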
Cite this article
Duque, J.C., Artís, M. & Ramos, R. The ecological fallacy in a time series context: evidence from Spanish regional unemployment rates. J Geograph Syst 8, 391–410 (2006). https://doi.org/10.1007/s10109-006-0033-x
DOI: https://doi.org/10.1007/s10109-006-0033-x