Abstract
To explore the neurocognitive mechanisms underlying the human language faculty, cognitive scientists use artificial languages to control the language learning environment more precisely and to study selected aspects of natural languages. Artificial languages applied in cognitive studies are usually designed ad hoc, to probe only a specific hypothesis, and they include a miniature grammar and a very small vocabulary. The aim of the present study is the construction of BLISS, an artificial language incorporating both syntax and semantics. Of intermediate complexity, BLISS mimics natural languages in having a vocabulary, syntax, and some semantics, defined as a degree of non-syntactic statistical dependence between words. Using information-theoretic measures, we quantify dependencies between words in BLISS sentences, as well as differences between the distinct models we introduce for semantics. While modeling English syntax in its basic version, BLISS can be easily varied in its internal parametric structure, thus allowing studies of the relative learnability of different parameter sets.
Notes
Treebank-3 release of the Penn Treebank project.
Acknowledgments
We are grateful to Mayur Nikam and Mohammed Katran, who helped develop early versions of BLISS, and to Giuseppe Longobardi for linguistic advice and encouragement.
Appendix
Full Grammar of BLISS
All the grammar rules and the lexicon of BLISS are shown in Table 4.
Proof of Eq. 9
Having analyzed the relation between the prior and posterior probabilities of a word in our models (Eq. 9), we generated corpora with the desired overall word frequencies (the word frequencies of the semantics models were adjusted to match those of the No-Semantics model). Note, however, that there are constraints on generating sentences with arbitrary word frequencies: in Eq. 9, the numerator cannot be negative, because the output is a probability measure. Therefore, the attainable posterior probability of a word is constrained by the parameter g and by the pairwise statistics in P_s(w_j | h_e), which here were derived from Shakespeare.
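The non-negativity constraint can be illustrated with a small sketch. Since Eq. 9 is not reproduced here, the mixture form below, the function name `adjusted_prior`, and the variable names are all illustrative assumptions, not the paper's actual equation:

```python
def adjusted_prior(p_target, g, p_s_avg):
    """Illustrative only: ASSUMING a posterior of the mixture form
        p(w | h) = g * p_s(w | h) + (1 - g) * p_0(w)
    (a guessed shape for Eq. 9), the prior reproducing a target overall
    word frequency would be
        p_0(w) = (p_target(w) - g * avg_h p_s(w | h)) / (1 - g),
    and the numerator must stay non-negative for p_0 to be a probability.

    p_target : desired overall frequency of the word
    g        : strength of the context-dependent (semantic) component
    p_s_avg  : p_s(w | h) averaged over contexts h
    """
    numerator = p_target - g * p_s_avg
    if numerator < 0:
        # The target frequency is unreachable for this g and these
        # pairwise statistics: the constraint discussed in the text.
        raise ValueError("numerator of the adjusted prior is negative; "
                         "reduce g or change the pairwise statistics")
    return numerator / (1 - g)
```

For example, with a target frequency of 0.1, g = 0.5, and an average context-dependent probability of 0.05, the adjusted prior is 0.15; lowering the target to 0.01 makes the numerator negative and the target unreachable.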
Length of BLISS Sentences
The probability distribution of the length of sentences generated by the BLISS grammar is shown in Fig. 10. To calculate mutual information values between words appearing in different positions of a sentence, we need to work with sentences of the same length. Since the average sentence length is about 5, and to obtain at least 5 distinct triple mutual information measures, we chose length 7. Thus, for the results involving mutual information and triple mutual information in this paper, we used sentences of at least 7 words.
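The length distribution and the 7-word cutoff can be computed straightforwardly from a generated corpus. A minimal sketch (the function names and the toy corpus are illustrative, not BLISS output):

```python
from collections import Counter

def length_distribution(sentences):
    """Empirical probability distribution of sentence lengths (in words)."""
    counts = Counter(len(s.split()) for s in sentences)
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())}

def filter_min_length(sentences, min_len=7):
    """Keep only sentences with at least `min_len` words, as done for the
    mutual information and triple mutual information measures."""
    return [s for s in sentences if len(s.split()) >= min_len]

# Toy corpus standing in for BLISS-generated sentences.
corpus = [
    "the cat sees the dog",
    "a man gives the boy a red ball today",
    "the girl who sleeps eats bread slowly now",
]
dist = length_distribution(corpus)          # {5: 1/3, 8: 1/3, 9: 1/3}
long_only = filter_min_length(corpus, 7)    # keeps the last two sentences
```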
Size of Model-generated Corpora
A technical question to be addressed is how many sentences are needed to generate sufficient statistics. To answer it, we produced up to 40 million sentences and calculated several of the measures used in this paper.
In Fig. 11, the mutual information between words in neighboring positions in a sentence, averaged over positions, I(n;n−1), was measured for each model and for corpora of different sizes: 1, 10, 20, and 40 million sentences of at least 7 words. As shown, 10 million sentences are enough to capture the pairwise statistics in the corpora, regardless of the presence of syntax or semantics.
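The pairwise measure I(n;n−1) is the standard mutual information between the word at position n−1 and the word at position n, estimated from co-occurrence counts and averaged over positions. A minimal plug-in estimator (function names are illustrative; the paper may use a different estimator or correction):

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of MI (in bits) between two variables,
    from a list of observed (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with empirical probabilities
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def avg_neighbor_mi(sentences, length=7):
    """Average I(n; n-1) over positions 1..length-1, using the first
    `length` words of each sentence; shorter sentences are skipped."""
    tokenized = [s.split()[:length] for s in sentences
                 if len(s.split()) >= length]
    mis = [mutual_information([(t[i - 1], t[i]) for t in tokenized])
           for i in range(1, length)]
    return sum(mis) / len(mis)
```

Perfectly coupled word pairs over a two-word alphabet give 1 bit; independent pairs give 0.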
Figure 12 shows the result for the average triple mutual information among words, I(n;n−2,n−1), which requires the largest samples of all the measures we applied. As illustrated, increasing the corpus size from 20 to 40 million sentences does not appreciably change the results for the models with syntax (Subject–Verb, Verb–Subject, Exponential, and No-Semantics), while some changes remain for the models without syntax, which need larger samples.
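The triple measure I(n;n−2,n−1) can be read as the mutual information between the word at position n and the joint variable formed by the two preceding words. A plug-in sketch under that reading (illustrative; the paper's exact definition of triple mutual information should be checked against its Methods):

```python
import math
from collections import Counter

def triple_mutual_information(triples):
    """Plug-in estimate of I(X; Y, Z) in bits, i.e. the MI between X and
    the joint variable (Y, Z), from observed (y, z, x) triples.
    Here (y, z) are the words at positions n-2 and n-1, and x the word
    at position n."""
    n = len(triples)
    joint = Counter(triples)
    p_yz = Counter((y, z) for y, z, _ in triples)
    p_x = Counter(x for _, _, x in triples)
    mi = 0.0
    for (y, z, x), c in joint.items():
        # p(y,z,x) * log2( p(y,z,x) / (p(y,z) p(x)) )
        mi += (c / n) * math.log2(c * n / (p_yz[(y, z)] * p_x[x]))
    return mi
```

Because the joint variable (Y, Z) ranges over word pairs, the contingency table is far larger than in the pairwise case, which is why this measure needs the biggest corpora.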
In sum, corpora with 20 million sentences, of at least 7 words each, were used for the calculation of the mutual information and triple mutual information in this study.
For the KL-divergence results, we also need sufficient word-pair statistics, as for the mutual information values. Hence, we used corpora of at least 20 million sentences for the KL-divergence calculation as well, but without the 7-word constraint.
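A KL-divergence between the word-pair (bigram) distributions of two corpora can be sketched as follows. The flooring of unseen pairs with a small `eps` is an illustrative smoothing choice, not necessarily the one used in the paper:

```python
import math
from collections import Counter

def kl_divergence_pairs(corpus_p, corpus_q, eps=1e-12):
    """D_KL(P || Q) in bits between the word-pair (bigram) distributions
    of two corpora. Pairs seen in P but absent from Q are floored at
    `eps` so the divergence stays finite (illustrative smoothing)."""
    def pair_dist(sentences):
        pairs = Counter()
        for s in sentences:
            words = s.split()
            pairs.update(zip(words, words[1:]))
        total = sum(pairs.values())
        return {p: c / total for p, c in pairs.items()}

    P = pair_dist(corpus_p)
    Q = pair_dist(corpus_q)
    return sum(p * math.log2(p / Q.get(pair, eps)) for pair, p in P.items())
```

Two identical corpora give a divergence of 0; corpora with disjoint bigrams give a large positive value dominated by the `eps` floor.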
Cite this article
Pirmoradian, S., Treves, A. BLISS: an Artificial Language for Learnability Studies. Cogn Comput 3, 539–553 (2011). https://doi.org/10.1007/s12559-011-9113-4