BLISS: an Artificial Language for Learnability Studies

Abstract

To explore the neurocognitive mechanisms underlying the human language faculty, cognitive scientists use artificial languages to control the language learning environment more precisely and to study selected aspects of natural languages. Artificial languages used in cognitive studies are usually designed ad hoc to probe only a specific hypothesis, and they comprise a miniature grammar and a very small vocabulary. The aim of the present study is the construction of BLISS, an artificial language incorporating both syntax and semantics. Of intermediate complexity, BLISS mimics natural languages in having a vocabulary, syntax, and some semantics, as defined by a degree of non-syntactic statistical dependence between words. Using information-theoretic measures, we quantify dependencies between words in BLISS sentences, as well as differences between the distinct models we introduce for semantics. While its basic version models English syntax, BLISS can easily be varied in its internal parametric structure, thus allowing studies of the relative learnability of different parameter sets.

Acknowledgments

We are grateful to Mayur Nikam and Mohammed Katran, who helped develop early versions of BLISS, and to Giuseppe Longobardi for linguistic advice and encouragement.

Author information

Correspondence to Sahar Pirmoradian.

Appendix

Full Grammar of BLISS

All the grammar rules and the lexicon of BLISS are shown in Table 4.

Table 4 The full BLISS PCFG (probabilistic context-free grammar)
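As an illustration of how such a probabilistic context-free grammar is used generatively, the sketch below samples sentences from a toy PCFG by repeatedly choosing one production per non-terminal according to the rule probabilities. The rules, probabilities, and vocabulary in the sketch are placeholders of our own, not the actual BLISS grammar of Table 4.

```python
import random

# Toy PCFG: {non-terminal: [(right-hand side, probability), ...]}.
# These rules and words are illustrative placeholders, NOT the BLISS grammar.
TOY_PCFG = {
    "S":    [(["NP", "VP"], 1.0)],
    "NP":   [(["Det", "N"], 0.7), (["Name"], 0.3)],
    "VP":   [(["V", "NP"], 0.6), (["V"], 0.4)],
    "Det":  [(["the"], 0.6), (["a"], 0.4)],
    "N":    [(["dog"], 0.5), (["ball"], 0.5)],
    "Name": [(["john"], 1.0)],
    "V":    [(["sees"], 0.5), (["takes"], 0.5)],
}

def expand(symbol, rng):
    """Recursively expand a symbol, sampling one production at each step."""
    if symbol not in TOY_PCFG:          # terminal: emit the word itself
        return [symbol]
    productions = TOY_PCFG[symbol]
    rhs = rng.choices([r for r, _ in productions],
                      weights=[p for _, p in productions], k=1)[0]
    return [word for s in rhs for word in expand(s, rng)]

def generate_sentence(rng):
    return " ".join(expand("S", rng))

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(generate_sentence(rng))
```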

Proof of Eq. 9

$$
\begin{aligned}
P_{\rm post} &= \sum_{h} P(h)\, P(w_j \mid h)\\
&= \sum_{h} P(h)\left( (1-g)\, P_{\rm prior}(w_j) + g\, P_s(w_j \mid h) \right)\\
&= \sum_{h_{ie}} P(h_{ie})\, P_{\rm prior}(w_j) + \sum_{h_{e}} P(h_{e}) \left( (1-g)\, P_{\rm prior}(w_j) + g\, P_s(w_j \mid h_e) \right)\\
&= \left(\sum_{h_{ie}} P(h_{ie}) + (1-g) \sum_{h_{e}} P(h_{e})\right) P_{\rm prior}(w_j) + g \sum_{h_{e}} P(h_{e})\, P_s(w_j \mid h_{e})
\end{aligned}
$$
(10)

where the sum over heads h is split into the two subsets h_e and h_ie; for heads in h_ie, P(w_j | h_ie) reduces to P_prior(w_j), so only the h_e terms contribute the semantic component.

After analyzing the relation between the prior and posterior probabilities of a word in our models (Eq. 9), we generated corpora with the desired overall word frequencies (the word frequencies of the semantics models were adjusted to those of the No-Semantics model). Note, however, that there are constraints on generating sentences with arbitrary word frequencies: in Eq. 9 the numerator cannot be negative, because the output is a probability measure. The attainable posterior probability of a word is therefore constrained by the parameter g and by the pairwise statistics in P_s(w_j | h_e), which here have been derived from Shakespeare.
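As a concrete illustration of this adjustment, the sketch below inverts Eq. 10 to obtain the prior probability each word needs in order to reach a target overall (posterior) frequency, and it flags targets that would violate the non-negativity constraint just discussed. The function and variable names are illustrative assumptions of ours, not taken from the BLISS implementation.

```python
# A minimal sketch, assuming Eq. 10 above:
#   P_post(w) = c * P_prior(w) + g * sum_{h_e} P(h_e) * P_s(w | h_e),
#   with c = sum_{h_ie} P(h_ie) + (1 - g) * sum_{h_e} P(h_e).
# All names (p_heads_eff, p_s, ...) are illustrative, not from the paper.

def prior_for_target_posterior(p_post, g, p_heads_eff, p_heads_ineff, p_s):
    """Return the P_prior(w) that reproduces the target frequencies p_post.

    p_post        : dict word -> desired overall (posterior) frequency
    p_heads_eff   : dict head h_e  -> P(h_e)
    p_heads_ineff : dict head h_ie -> P(h_ie)
    p_s           : dict (word, h_e) -> P_s(word | h_e), e.g. estimated from text
    """
    c = sum(p_heads_ineff.values()) + (1.0 - g) * sum(p_heads_eff.values())
    prior = {}
    for word, target in p_post.items():
        semantic_part = g * sum(p_he * p_s.get((word, h_e), 0.0)
                                for h_e, p_he in p_heads_eff.items())
        numerator = target - semantic_part
        if numerator < 0.0:
            # The constraint discussed in the text: this target frequency
            # is not reachable for the chosen g and pairwise statistics.
            raise ValueError(f"unreachable target frequency for {word!r}")
        prior[word] = numerator / c
    return prior
```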

Length of BLISS Sentences

The probability distribution of the length of sentences generated by the BLISS grammar is shown in Fig. 10. To calculate mutual information values between words appearing in different positions of a sentence, we need to work with sentences of the same length. Given that the average sentence length is about 5 words, and in order to obtain at least 5 distinct triple mutual information measures, we chose a length of 7. Thus, for the results involving mutual information and triple mutual information in this paper, we used sentences of at least 7 words.

Fig. 10

Probability distribution of sentence lengths in a corpus produced by the Subject–Verb model. Other models with grammar show similar distributions
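A minimal sketch of the corresponding selection step, assuming each sentence is stored as a list of word tokens: it estimates the empirical length distribution and keeps sentences of at least 7 words, truncated to their first 7 positions (the truncation is our assumption; the text only states that sentences of at least 7 words were used).

```python
from collections import Counter

def length_distribution(corpus):
    """Empirical distribution of sentence lengths, in words.

    corpus: iterable of sentences, each a list of word tokens.
    """
    counts = Counter(len(sentence) for sentence in corpus)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

def keep_long_sentences(corpus, min_len=7):
    """Keep sentences of at least min_len words, truncated to the first
    min_len positions so that all retained sentences align by position."""
    return [sentence[:min_len] for sentence in corpus if len(sentence) >= min_len]
```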

Size of Model-generated Corpora

A technical question to be addressed is how many sentences are needed to generate sufficient statistics. To answer this question, we produced up to 40 million sentences and calculated several of the measures used in this paper.

In Fig. 11, the mutual information between words in neighboring positions in a sentence, averaged over positions, I(n;n−1), was measured for each model and with corpora of different sizes: 1, 10, 20, and 40 million sentences of at least 7 words. As shown, 10 million sentences are enough to capture the pairwise statistics in the corpora, regardless of whether the model includes syntax or semantics.

Fig. 11

Average mutual information I(n;n−1), also averaged over all positions, versus the number of sentences in the corpus. The mutual information between words is measured for each model and with corpora of different sizes: 1, 10, 20, and 40 million sentences of at least 7 words. As shown, 10 million sentences are enough to capture the pairwise statistics in the corpora, regardless of syntax or semantics
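The sketch below shows one way to compute this quantity from such a corpus, using a plug-in (maximum-likelihood) estimate of the mutual information between the words at positions n and n−1, averaged over positions. The estimator and the restriction to the first 7 positions are our assumptions and may differ from what was used for the figures.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def average_neighbor_mi(corpus, length=7):
    """I(n;n-1) averaged over positions n = 1 .. length-1.

    corpus: list of sentences, each a list of at least `length` words;
    only the first `length` positions are used.
    """
    values = [mutual_information([(s[n], s[n - 1]) for s in corpus])
              for n in range(1, length)]
    return sum(values) / len(values)
```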

Figure 12 shows the results for the average triple mutual information among words, I(n;n−2,n−1), which requires the largest sample among all the measures we have applied. As illustrated, increasing the size of the corpus from 20 to 40 million sentences does not appreciably change the results for the models with syntax (Subject–Verb, Verb–Subject, Exponential, and No-Semantics), while we see some changes for the models without syntax, which need larger samples.

Fig. 12

Average triple mutual information I(n;n−1,n−2), again averaged across positions, versus the number of sentences in the corpus. The three-way mutual information requires the largest sample among all the measures we applied but, as illustrated, increasing the size from 20 to 40 million sentences does not change it considerably for the models with syntax (Subject–Verb, Verb–Subject, Exponential, and No-Semantics). We still see some change for the models without syntax, which need larger corpora due to their greater word-to-word variability
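Reading I(n;n−2,n−1) as the mutual information between the word at position n and the joint pair of words at the two preceding positions, a sketch reusing the mutual_information helper from the previous block could look as follows; this interpretation of the notation is our assumption.

```python
def average_triple_mi(corpus, length=7):
    """I(n;n-2,n-1): mutual information between the word at position n and
    the joint pair of words at positions n-2 and n-1, averaged over
    n = 2 .. length-1 (5 positions for length 7).
    Reuses mutual_information() from the previous sketch."""
    values = [mutual_information([(s[n], (s[n - 2], s[n - 1])) for s in corpus])
              for n in range(2, length)]
    return sum(values) / len(values)
```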

In sum, corpora with 20 million sentences, of at least 7 words each, were used for the calculation of the mutual information and triple mutual information in this study.

For the KL-divergence results, we also need sufficient word-pair statistics, as for the mutual information values. Hence, we used corpora of at least 20 million sentences for the KL-divergence calculation as well, but without the 7-word constraint.
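For completeness, a minimal sketch of a KL-divergence between two word-frequency distributions, with a small floor on the reference distribution to guard against zero probabilities; the smoothing choice is ours, not necessarily the one used in the paper.

```python
from math import log2

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in bits, for distributions given as dicts word -> probability.
    Probabilities missing from q are floored at eps (a smoothing choice)."""
    return sum(pw * log2(pw / max(q.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0.0)
```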

Cite this article

Pirmoradian, S., Treves, A. BLISS: an Artificial Language for Learnability Studies. Cogn Comput 3, 539–553 (2011). https://doi.org/10.1007/s12559-011-9113-4
