Two-mode clustering through profiles of regions and sectors

Empirical Economics

Abstract

This paper is concerned with simultaneously regrouping regions and sectors when analyzing the relative sectorial specialization of regions and the relative regional concentration of sectors. An automatic two-mode clustering algorithm is proposed with a view toward a concept of overall localization, corresponding to a discrepancy between an actual two-way contingency table (regions \(\times\) sectors) and a hypothetical table reflecting independence between regions and sectors. This procedure identifies similar regions (respectively sectors) according to their relative sectorial (respectively regional) structure. The algorithm significantly reduces the size of the original table and obtains an optimal collapsed table with a low level of information loss vis-à-vis the degree of overall localization. The properties and results of the algorithm are discussed through two applications, namely Argentina and Brazil.
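The idea of collapsing a table with little loss of overall localization can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not name the discrepancy measure, so the Kullback-Leibler divergence between the normalized table and the product of its margins is used here purely as an assumption, and the counts, the function names `overall_localization` and `merge_rows`, and the 3 \(\times\) 3 layout are all hypothetical.

```python
from math import log

def overall_localization(table):
    """KL divergence between the normalized table and the product of its
    margins (assumed discrepancy measure; zero under exact independence)."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    div = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:
                p = n / total                                # observed share
                q = row_sums[i] * col_sums[j] / total ** 2   # share under independence
                div += p * log(p / q)
    return div

def merge_rows(table, i, k):
    """Collapse rows i and k into a single row by summing their cell counts."""
    merged = [a + b for a, b in zip(table[i], table[k])]
    return [merged] + [row for idx, row in enumerate(table) if idx not in (i, k)]

# Hypothetical counts: 3 regions (rows) x 3 sectors (columns).
counts = [
    [10, 20, 30],
    [20, 40, 60],  # same sectoral profile as region 0, at twice the size
    [50, 10, 5],
]

full = overall_localization(counts)
# Information lost when merging two regions with identical profiles.
loss_same = full - overall_localization(merge_rows(counts, 0, 1))
# Information lost when merging two regions with different profiles.
loss_diff = full - overall_localization(merge_rows(counts, 0, 2))
```

Under this assumed measure, merging regions 0 and 1 leaves the divergence unchanged up to floating-point error because their profiles are proportional, whereas merging regions 0 and 2 lowers it. A clustering procedure of the kind described would favor the first merge.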

Funding

Michel Mouchart gratefully acknowledges financial support from IAP research network Grant No. P6/03 of the Belgian government (Belgian Science Policy). Both authors gratefully acknowledge the financial support of FOP, which promoted intercontinental cooperation.

Author information

Corresponding author

Correspondence to Christian Haedo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Special thanks are due to Vicente N. Donato for the impetus he gave to the development of the topic of this paper. Dominique Peeters also deserves particular gratitude for a series of comments that led to substantial improvements of a previous version of this paper. The help of Fernando Valli was instrumental in shaping the algorithm developed in this paper and is deeply acknowledged. Comments from technical analysts participating in seminars where this paper was presented are also highly appreciated and gratefully acknowledged, namely at the Competitiveness and Innovation Division of the Inter-American Development Bank (IADB) in Madrid (Sp) and Washington (USA), at the Structural Policies and Innovation Unit of the OECD Development Centre in Paris (Fr) and at the Farnesina of the Ministero degli Affari Esteri e della Cooperazione Internazionale (MAECI) in Roma (It).

About this article

Cite this article

Haedo, C., Mouchart, M. Two-mode clustering through profiles of regions and sectors. Empir Econ 63, 1971–1996 (2022). https://doi.org/10.1007/s00181-022-02201-z

  • DOI: https://doi.org/10.1007/s00181-022-02201-z
