Two-mode clustering through profiles of regions and sectors

Haedo, Christian; Mouchart, Michel

doi:10.1007/s00181-022-02201-z

Published: 10 March 2022

Empirical Economics, Volume 63, pages 1971–1996 (2022)

Abstract

This paper is concerned with simultaneously regrouping regions and sectors when analyzing the relative sectorial specialization of regions and the relative regional concentration of sectors. An automatic two-mode clustering algorithm is proposed with a view toward a concept of overall localization, corresponding to a discrepancy between an actual two-way contingency table (regions \(\times\) sectors) and a hypothetical table reflecting independence between regions and sectors. The procedure identifies similar regions (respectively sectors) according to their relative sectorial (respectively regional) structure. The algorithm significantly reduces the size of the original table and obtains an optimal collapsed table with a low level of information loss vis-à-vis the degree of overall localization. The properties and results of the algorithm are discussed through two applications, to Argentina and Brazil.
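The idea of measuring overall localization as a discrepancy from independence, and of collapsing the table with little loss of that discrepancy, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state which discrepancy measure the paper uses, so the Kullback-Leibler divergence from the independence table is an assumption here, and the function names `localization` and `merge_rows` as well as the counts are hypothetical.

```python
import math

def localization(table):
    """KL divergence (in nats) between the observed joint shares and the
    independence table built from the row and column margins.
    Assumed measure, chosen for illustration only."""
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    div = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing to the sum
                div += (n / total) * math.log(n * total / (row_tot[i] * col_tot[j]))
    return div

def merge_rows(table, a, b):
    """Collapse rows a and b (two regions) into one by summing their counts."""
    merged = [x + y for x, y in zip(table[a], table[b])]
    rest = [row for k, row in enumerate(table) if k not in (a, b)]
    return rest + [merged]

# Hypothetical employment counts: 3 regions (rows) x 3 sectors (columns).
counts = [
    [40, 10, 10],
    [20, 5, 5],    # same sectoral profile as the first region
    [5, 30, 25],   # a clearly different profile
]

full = localization(counts)
same_profile = localization(merge_rows(counts, 0, 1))
diff_profile = localization(merge_rows(counts, 0, 2))

print("share retained, similar regions merged:   ", same_profile / full)
print("share retained, dissimilar regions merged:", diff_profile / full)
```

Under this assumed measure, merging two regions with proportional sectoral profiles leaves the divergence unchanged, while merging regions with different profiles lowers it. Merging columns (sectors) works symmetrically on the transposed table.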

Funding

Michel Mouchart gratefully acknowledges financial support from the IAP research network, Grant No. P6/03, of the Belgian government (Belgian Science Policy). Both authors gratefully acknowledge the financial support of FOP, which promoted intercontinental cooperation.

Author information

Authors and Affiliations

Fundación Observatorio PyME (FOP), Ciudad Autónoma de Buenos Aires, Argentina
Christian Haedo
Institut de Statistique, Biostatistique et Sciences Actuarielles (ISBA), Université catholique de Louvain, Louvain-la-Neuve, Belgium
Michel Mouchart

Corresponding author

Correspondence to Christian Haedo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

Special thanks are due to Vicente N. Donato for the impetus he gave to the development of the topic of this paper. Dominique Peeters also deserves particular gratitude for a series of comments that led to substantial improvements over a previous version of this paper. The help of Fernando Valli was instrumental in shaping the algorithm developed in this paper and is deeply acknowledged. Comments from technical analysts participating in seminars where this paper was presented are also highly appreciated and gratefully acknowledged, namely at the Competitiveness and Innovation Division of the Inter-American Development Bank (IADB) in Madrid (Spain) and Washington (USA), at the Structural Policies and Innovation Unit of the OECD Development Centre in Paris (France) and at the Farnesina of the Ministero degli Affari Esteri e della Cooperazione Internazionale (MAECI) in Rome (Italy).

About this article

Cite this article

Haedo, C., Mouchart, M. Two-mode clustering through profiles of regions and sectors. Empir Econ 63, 1971–1996 (2022). https://doi.org/10.1007/s00181-022-02201-z

Received: 24 February 2021
Accepted: 03 January 2022
Published: 10 March 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s00181-022-02201-z
