Abstract
This paper addresses the simultaneous regrouping of regions and sectors when analyzing the relative sectorial specialization of regions and the relative regional concentration of sectors. An automatic two-mode clustering algorithm is proposed, built on a concept of overall localization that corresponds to the discrepancy between an actual two-way contingency table (regions \(\times \) sectors) and a hypothetical table reflecting independence between regions and sectors. The procedure identifies similar regions (respectively sectors) according to their relative sectorial (respectively regional) structure. The algorithm significantly reduces the size of the original table and obtains an optimal collapsed table with a low level of information loss vis-à-vis the degree of overall localization. The properties and results of the algorithm are discussed through two applications, namely Argentina and Brazil.
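To make the notions of overall localization and information loss concrete, the following minimal sketch computes a discrepancy between a small regions-by-sectors table and its independence counterpart, then collapses the table according to given groupings. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the discrepancy measure, so a chi-square-type measure divided by the grand total is assumed here, and the function names `independence_table`, `discrepancy` and `collapse`, as well as the example counts, are hypothetical.

```python
import numpy as np


def independence_table(counts):
    """Hypothetical table with the same margins as `counts` under independence."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    col_totals = counts.sum(axis=0, keepdims=True)
    return row_totals @ col_totals / counts.sum()


def discrepancy(counts):
    """ASSUMED measure: chi-square statistic divided by the grand total."""
    counts = np.asarray(counts, dtype=float)
    expected = independence_table(counts)
    return float(((counts - expected) ** 2 / expected).sum() / counts.sum())


def collapse(counts, row_groups, col_groups):
    """Sum the rows and columns that share a group label (labels 0, 1, 2, ...)."""
    counts = np.asarray(counts, dtype=float)
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)
    # One-hot membership matrices: rows x row-groups and columns x column-groups.
    row_membership = np.eye(row_groups.max() + 1)[row_groups]
    col_membership = np.eye(col_groups.max() + 1)[col_groups]
    return row_membership.T @ counts @ col_membership


# Made-up counts: 3 regions x 3 sectors. Regions 0 and 1 have the same
# sectorial profile (1:2:3), region 2 differs.
table = np.array([[10, 20, 30],
                  [20, 40, 60],
                  [30, 10, 5]])

full = discrepancy(table)
# Merging the two regions with identical profiles leaves the measure unchanged.
merged_regions = collapse(table, [0, 0, 1], [0, 1, 2])
loss_regions = full - discrepancy(merged_regions)
# Merging sectors 0 and 1, whose regional profiles differ, loses information.
merged_sectors = collapse(table, [0, 1, 2], [0, 0, 1])
loss_sectors = full - discrepancy(merged_sectors)
```

With this assumed measure, grouping regions whose profiles coincide costs nothing, whereas grouping units with different profiles lowers the discrepancy retained in the collapsed table. A search over groupings would aim to keep that loss small while shrinking the table.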
Funding
Michel Mouchart gratefully acknowledges financial support from the IAP research network, Grant No. P6/03 of the Belgian government (Belgian Science Policy). Both authors gratefully acknowledge the financial support of FOP, which promoted intercontinental cooperation.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Special thanks are due to Vicente N. Donato for the impetus he gave to the development of the topic of this paper. Dominique Peeters also deserves particular gratitude for a series of comments that led to substantial improvements over a previous version of this paper. The help of Fernando Valli was instrumental in shaping the algorithm developed in this paper and is deeply acknowledged. Also highly appreciated and gratefully acknowledged are the comments of technical analysts who took part in seminars where this paper was presented, namely at the Competitiveness and Innovation Division of the Inter-American Development Bank (IADB) in Madrid (Spain) and Washington (USA), at the Structural Policies and Innovation Unit of the OECD Development Centre in Paris (France) and at the Farnesina of the Ministero degli Affari Esteri e della Cooperazione Internazionale (MAECI) in Rome (Italy).
Cite this article
Haedo, C., Mouchart, M. Two-mode clustering through profiles of regions and sectors. Empir Econ 63, 1971–1996 (2022). https://doi.org/10.1007/s00181-022-02201-z
DOI: https://doi.org/10.1007/s00181-022-02201-z