Conceptual and Empirical Arguments for a Language Feature: Evidence from Language Mixing

  • Barbara E. Bullock
  • Almeida Jacqueline Toribio
Part of the Studies in Natural Language and Linguistic Theory book series (SNLT, volume 95)


This chapter points to the relevance of Herschensohn’s Constructionist view of second language acquisition for the study of bilingual language mixing. In elaborating Constructionism, Herschensohn (2000) argues that the assembling of the lexicon and its attendant features constitutes the major task of the learner. The articulation of the bilingual lexicon is also invoked in formulating the Functional Head Constraint (Belazi et al. 1994), which characterizes patterns of switching in proficient bilinguals and in second language learners by appeal to the matching and checking of features, including language. While the validity of the language feature has been disputed, we underscore the positive consequences of tagging lexical items with a language label, as we move towards recruiting computational tools for effectively exploiting bilingual corpora. We provide evidence of the benefits of language tagging in quantifying language mixing profiles and in classifying bilingual phenomena such as code-switching versus borrowing.


Keywords: Borrowing; Code-switching; Matrix language; Constructionism; Functional Head Constraint; Determiner Phrase (DP)


  1. Adamou, E. 2016. A corpus-driven approach to language contact: Endangered languages in a comparative perspective, vol. 12. Walter de Gruyter GmbH & Co KG.
  2. Adel, H., K. Kirchhoff, D. Telaar, N.T. Vu, T. Schlippe, and T. Schultz. 2014. Features for factored language models for code-switching speech. In SLTU, 32–38.
  3. Appel, R., and P. Muysken. 1987. Language contact and bilingualism. London; Baltimore, Md., USA: Edward Arnold.
  4. Baldwin, T., and M. Lui. 2010. Language identification: The long and the short of the matter. In Human language technologies: The 2010 annual conference of the North American chapter of the Association for Computational Linguistics, 229–237. Association for Computational Linguistics.
  5. Bandi-Rao, S., M. den Dikken, and J. MacSwan. 2014. Light switches: On v as a pivot in codeswitching, and the nature of the ban on word-internal switches. In Grammatical theory and bilingual codeswitching, 161–183. Cambridge, MA: MIT Press.
  6. Barman, U., A. Das, J. Wagner, and J. Foster. 2014. Code mixing: A challenge for language identification in the language of social media. In Proceedings of the first workshop on computational approaches to code switching, 13–23.
  7. Beebe, L.M. 1977. The influence of the listener on code-switching. Language Learning 27 (2): 331–339.
  8. Belazi, H.M., E.J. Rubin, and A.J. Toribio. 1994. Code switching and X-Bar theory: The functional head constraint. Linguistic Inquiry 25 (2): 221–237.
  9. Barnett, R., E. Codó, E. Eppler, M. Forcadell, P. Gardner-Chloros, R. van Hout, and M. Sebba. 2000. The LIDES coding manual: A document for preparing and analyzing language interaction data, version 1.1, July 1999. International Journal of Bilingualism 4 (2): 131–271.
  10. Bhatt, R.M. 2014. Argument licensing in optimal switches. In Grammatical theory and bilingual codeswitching, ed. J. MacSwan, 135–158. Cambridge, MA: MIT Press.
  11. Blokzijl, J., M. Deuchar, and M. Couto. 2017. Determiner asymmetry in mixed nominal constructions: The role of grammatical factors in data from Miami and Nicaragua. Languages 2 (4): 20.
  12. Brysbaert, M., and B. New. 2009a. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41 (4): 977–990.
  13. Brysbaert, M., and B. New. 2009b. Subtlexus: American word frequencies. http://subtlexus.lexique.org.
  14. Bullock, B.E., and A.J. Toribio. 2004. Introduction: Convergence as an emergent property in bilingual speech. Bilingualism: Language and Cognition. Special Issue: Bilingualism and Linguistic Convergence 7 (2): 91–93.
  15. Bullock, B.E., and A.J. Toribio. 2018. Sociolinguistics of bilingualism. In An introduction to bilingualism: Principles and processes, 2nd ed., ed. J. Altarriba and R.R. Heredia, 300–316. New York, N.Y.: Lawrence Erlbaum Associates.
  16. Bullock, B.E., G. Guzmán, V. Sharath, J. Serigos, and A.J. Toribio. 2018a. Predicting the presence of a Matrix Language in code-switching. In Proceedings of the third workshop on computational approaches to linguistic code-switching, 68–75. Association for Computational Linguistics.
  17. Bullock, B.E., G.A. Guzmán, J. Serigos, and A.J. Toribio. 2018b. Should code-switching models be asymmetric? In Interspeech 2018, 2534–2538.
  18. Byers-Heinlein, K. 2014. Languages as categories: Reframing the “One Language or Two” question in early bilingual development. Language Learning 64 (s2): 184–201.
  19. Cantone, K.F., and N. Müller. 2008. Un nase or una nase? What gender marking within switched DPs reveals about the architecture of the bilingual language faculty. Lingua 118 (6): 810–826.
  20. Cavnar, W.B., and J.M. Trenkle. 1994. N-gram-based text categorization. In Proceedings of third annual symposium on document analysis and information retrieval, vol. 48113, no. 2, 161–175.
  21. Çetinoğlu, O., S. Schulz, and N.T. Vu. 2016. Challenges of computational processing of code-switching. Presented at the EMNLP, Austin, TX.
  22. Van Coetsem, F. 1988. Loan phonology and the two transfer types in language contact. Dordrecht: Foris Publications.
  23. Das, A., and B. Gambäck. 2014. Identifying languages at the word level in code-mixed Indian social media text.
  24. Deuchar, M. 2010. BilingBank Spanish-English Miami Corpus.
  25. Di Sciullo, A.-M., P. Muysken, and R. Singh. 1986. Government and code-mixing. Journal of Linguistics 22 (1): 1–24.
  26. Diab, M., and A. Kamboj. 2011. Feasibility of leveraging crowd sourcing for the creation of a large scale annotated resource for Hindi English code switched data: A pilot annotation. DTIC Document.
  27. Dijkstra, T., and W.J.B. Van Heuven. 2002. The architecture of the bilingual word recognition system: From identification to decision. Bilingualism: Language and Cognition 5 (3): 175–197.
  28. Dubois, S., and S. Noetzel. 2005. Intergenerational pattern of interference and internally-motivated changes in Cajun French. Bilingualism 8 (2): 131–143.
  29. Finkel, J.R., T. Grenager, and C. Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs Sampling. In Proceedings of the 43rd annual meeting on association for computational linguistics, 363–370. Association for Computational Linguistics.
  30. Franco, J.C., and T. Solorio. 2007. Baby-steps towards building a Spanglish language model. In International conference on intelligent text processing and computational linguistics, 75–84. Springer.
  31. Francom, J.C. 2013. ACTIV-ES: A novel Spanish-language corpus for linguistic and cultural comparisons between communities of the Hispanic world.
  32. Francom, J., M. Hulden, and A. Ussishkin. 2014. ACTIV-ES: A comparable, cross-dialect corpus of ‘everyday’ Spanish from Argentina, Mexico, and Spain. In LREC, 1733–1737.
  33. Goh, K.-I., and A.-L. Barabási. 2008. Burstiness and memory in complex systems. EPL (Europhysics Letters) 81 (4): 48002.
  34. Green, D.W. 1998. Mental control of the bilingual lexico-semantic system. Bilingualism: Language and Cognition 1 (2): 67–81.
  35. Grefenstette, G. 1995. Comparing two language identification schemes. In 3rd international conference on statistical analysis of textual data, 11–13.
  36. Guzmán, G.A., J. Serigos, B.E. Bullock, and A.J. Toribio. 2016. Simple tools for exploring variation in code-switching for linguists. In EMNLP 2016, 2–20.
  37. Guzmán, G.A., J. Ricard, J. Serigos, B. Bullock, and A.J. Toribio. 2017a. Moving code-switching research toward more empirically grounded methods. In CDH2017: Corpora in the Digital Humanities, 1–9.
  38. Guzmán, G., J. Ricard, J. Serigos, B.E. Bullock, and A.J. Toribio. 2017b. Metrics for modeling code-switching across corpora. In Proceedings of the Interspeech 2017, 67–71.
  39. Herring, J.R., M. Deuchar, M.C.P. Couto, and M.M. Quintanilla. 2010. ‘I saw the madre’: Evaluating predictions about codeswitched determiner-noun sequences using Spanish–English and Welsh–English data. International Journal of Bilingual Education and Bilingualism 13 (5): 553–573.
  40. Herschensohn, J. 2000. The second time around: Minimalism and L2 acquisition, vol. 21. John Benjamins Publishing.
  41. Jake, J.L., C. Myers-Scotton, and S. Gross. 2002. Making a minimalist approach to codeswitching work: Adding the matrix language. Bilingualism: Language and Cognition 5 (1): 69–91.
  42. Jarvis, S., and A. Pavlenko. 2007. Crosslinguistic influence in language and cognition. New York and London: Routledge.
  43. Joshi, A.K. 1982. Processing of sentences with intra-sentential code-switching. In Proceedings of the 9th conference on Computational linguistics, vol. 1, 145–150. Academia Praha.
  44. Jurgens, D., Y. Tsvetkov, and D. Jurafsky. 2017. Incorporating dialectal variability for socially equitable language identification. In Proceedings of the 55th annual meeting of the Association for Computational Linguistics, Volume 2: Short papers, vol. 2, 51–57.
  45. Khattab, G. 2013. Phonetic convergence and divergence strategies in English-Arabic bilingual children. Linguistics 51 (2): 439–472.
  46. King, B., and S. Abney. 2013. Labeling the languages of words in mixed-language documents using weakly supervised methods. In Proceedings of NAACL-HLT, 1110–1119.
  47. Köppe, R., and J.M. Meisel. 1995. Code-switching in bilingual first language acquisition. In One speaker, two languages: Cross-disciplinary perspectives on code-switching, ed. L. Milroy and P. Muysken, 276–301. New York: Cambridge University Press.
  48. Li, Y., and P. Fung. 2012. Code-switch language model with inversion constraints for mixed language speech recognition. In COLING, 1671–1680.
  49. Li, Y., and P. Fung. 2013. Improved mixed language speech recognition using asymmetric acoustic model and language model with code-switch inversion constraints. In 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), 7368–7372. IEEE.
  50. Liceras, J.M., R.F. Fuertes, S. Perales, R. Pérez-Tattam, and K.T. Spradlin. 2008. Gender and gender agreement in bilingual native and non-native grammars: A view from child and adult functional–lexical mixings. Lingua 118 (6): 827–851.
  51. Lignos, C., and M. Marcus. 2013. Toward web-scale analysis of codeswitching. In Proceedings of annual meeting of the Linguistic Society of America.
  52. López, L. (ed.). 2018. Code-switching: Theoretical questions, experimental answers. A festschrift presented to Kay González-Vilbazo presented by his colleagues and students. Amsterdam: John Benjamins.
  53. Lui, M., and T. Baldwin. 2012. langid.py: An off-the-shelf language identification tool. In Proceedings of the ACL 2012 system demonstrations, 25–30. Association for Computational Linguistics.
  54. Lui, M., J.H. Lau, and T. Baldwin. 2014. Automatic detection and language identification of multilingual documents. Transactions of the Association for Computational Linguistics 2: 27–40.
  55. MacSwan, J. 2000. The architecture of the bilingual language faculty: Evidence from intrasentential code switching. Bilingualism: Language and Cognition 3 (1): 37–54.
  56. MacSwan, J. 2004. Code switching and grammatical theory. In The handbook of bilingualism, ed. T.K. Bhatia and W.C. Ritchie, 283–311. Oxford: Blackwell.
  57. MacSwan, J. 2014. Grammatical theory and bilingual codeswitching. Cambridge, MA: MIT Press.
  58. Maharjan, S., E. Blair, S. Bethard, and T. Solorio. 2015. Developing language-tagged corpora for code-switching tweets. In LAW@NAACL-HLT, 72–84.
  59. Mahootian, S. 1993. A null theory of code switching. Dissertation, Northwestern University, Evanston, IL.
  60. Mahootian, S., and B. Santorini. 1996. Code switching and the complement/adjunct distinction. Linguistic Inquiry 27 (3): 464–479.
  61. Matras, Y., and J. Sakel. 2007. Grammatical borrowing in cross-linguistic perspective, vol. 38. Walter de Gruyter.
  62. Mougeon, R., T. Nadasdi, and K. Rehner. 2005. Contact-induced linguistic innovations on the continuum of language use: The case of French in Ontario. Bilingualism: Language and Cognition 8 (2): 99–115.
  63. Muysken, P. 2000. Bilingual speech: A typology of code-mixing. Cambridge: Cambridge University Press.
  64. Muysken, P. 2002. Computation and storage in language contact. In Storage and computation in the language faculty, ed. E. Nooteboom, F. Weerman, and F. Wijnen, 157–179. Dordrecht: Kluwer.
  65. Myers-Scotton, C. 1993. Dueling languages: Grammatical structure in codeswitching. Oxford: Oxford University Press (Clarendon Press).
  66. Myers-Scotton, C. 2002. Contact linguistics: Bilingual encounters and grammatical outcomes. Oxford: Oxford University Press.
  67. Nguyen, D.-P., and A.S. Dogruoz. 2013. Word level language identification in online multilingual communication. Association for Computational Linguistics.
  68. Parafita Couto, M.C., and H. Stadthagen-González. 2017. El book or the libro? Insights from acceptability judgments into determiner/noun code-switches. International Journal of Bilingualism.
  69. Petrov, S., D. Das, and R. McDonald. 2011. A universal part-of-speech tagset. ArXiv:1104.2086.
  70. Pfaff, C.W. 1979. Constraints on language mixing: Intrasentential code-switching and borrowing in Spanish/English. Language 55 (2): 291–318.
  71. Poplack, S., D. Sankoff, and C. Miller. 1988. The social correlates and linguistic processes of lexical borrowing and assimilation. Linguistics 26 (1): 47–104.
  72. Poplack, S., L. Zentz, and N. Dion. 2012. What counts as (contact-induced) change. Bilingualism: Language and Cognition 15 (2): 247–254.
  73. Post, R.E. 2010. Code-switching in the determiner phrase: A comparison of Tunisian Arabic-French and Moroccan Arabic-French switching. Master’s Thesis, University of Texas at Austin.
  74. R Core Team. 2014. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
  75. Radford, A., T. Kupisch, R. Köppe, and G. Azzaro. 2007. Concord, convergence and accommodation in bilingual children. Bilingualism: Language and Cognition 10 (3): 239–256.
  76. Ritter, E. 1991. Two functional categories in noun phrases: Evidence from Modern Hebrew. In Perspectives on phrase structure: Heads and licensing, ed. S. Rothstein, 37–62. San Diego: Academic Press.
  77. Roeper, T. 1999. Universal bilingualism. Bilingualism: Language and Cognition 2 (3): 169–186.
  78. Rosner, M., and P.-J. Farrugia. 2007. A tagging algorithm for mixed language identification in a noisy domain. In Interspeech, 190–193.
  79. Rubin, E.J., and A.J. Toribio. 1995. Feature-checking and the syntax of language contact. In Amsterdam studies in the theory and history of linguistic science series, vol. 4, 177–177.
  80. Sankoff, D., and S. Poplack. 1980. A formal grammar for code-switching. In CUNY Working Papers, vol. 8. Centro de Estudios Puertorriqueños.
  81. Schulz, S., and M. Keller. 2016. Code-switching Ubique Est: Language identification and part-of-speech tagging for historical mixed text. In Proceedings of the 10th SIGHUM workshop on language technology for cultural heritage, social sciences, and humanities, Proceedings of LaTeCH, 43–51.
  82. Sharma, A., S. Gupta, R. Motlani, P. Bansal, M. Srivastava, R. Mamidi, and D.M. Sharma. 2016. Shallow parsing pipeline for Hindi-English code-mixed social media text. ArXiv:1604.03136.
  83. Sibun, P., and J.C. Reynar. 1996. Language identification: Examining the issues.
  84. Solorio, T., E. Blair, S. Maharjan, S. Bethard, M. Diab, M. Gohneim, et al. 2014. Overview for the first shared task on language identification in code-switched data. In Proceedings of the first workshop on computational approaches to code switching, 62–72.
  85. Solorio, T., and Y. Liu. 2008a. Learning to predict code-switching points. In Proceedings of the conference on empirical methods in natural language processing, 973–981. Association for Computational Linguistics.
  86. Solorio, T., and Y. Liu. 2008b. Part-of-speech tagging for English-Spanish code-switched text. In Proceedings of the conference on empirical methods in natural language processing, 1051–1060. Association for Computational Linguistics.
  87. Spradlin, K., J. Liceras, and R. Fernández-Fuertes. 2003. The grammatical features spell-out hypothesis as a diagnostic for bilingual competence. Paper presented at the 4th International Symposium on Bilingualism (ISB4), Arizona State University, April 30–May 3, 2003.
  88. Thomason, S.G., and T. Kaufman. 1988. Language contact, creolization, and genetic linguistics. Berkeley, CA: University of California Press.
  89. Toribio, A.J. 2001. On the emergence of bilingual code-switching competence. Bilingualism: Language and Cognition 4 (3): 203–231.
  90. Toribio, A.J. 2018. The future of code-switching research. In Code-switching: Theoretical questions, experimental answers. A festschrift presented to Kay González-Vilbazo presented by his colleagues and students, ed. L. López, 257–267. Amsterdam: John Benjamins.
  91. Treffers-Daller, J. 2005. Brussels French une fois: Transfer-induced innovation or system-internal development? Bilingualism: Language and Cognition 8 (2): 145–157.
  92. Vu, N.T., H. Adel, and T. Schultz. 2013. An investigation of code-switching attitude dependent language modeling. Statistical Language and Speech Processing 297–308.
  93. Weinreich, U. 1953. Languages in contact: Findings and problems. New York; The Hague: Linguistic Circle of New York; Mouton.
  94. Yamaguchi, H., and K. Tanaka-Ishii. 2012. Text segmentation by language using minimum description length. In Proceedings of the 50th annual meeting of the association for computational linguistics: Long papers, vol. 1, 969–978. Association for Computational Linguistics.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Barbara E. Bullock (1)
  • Almeida Jacqueline Toribio (2)

  1. Department of French and Italian, University of Texas at Austin, Austin, USA
  2. Department of Spanish and Portuguese, University of Texas at Austin, Austin, USA
