Table 5 Linguistic details and available parts of the Bible corpus with approximate translation date (linguistic data and translation dates from Lewis et al. 2014)

From: A massively parallel corpus: the Bible in 100 languages

ISO 639-3 Language Family Genus Subgenus Speakers Script Full Parts Year
acu Achuar-Shiwiar Jivaroan    5,000 Latin N NT 1981
afr Afrikaans Indo-European Germanic West 5,000,000 Latin Y   1953
agr Aguaruna Jivaroan    38,300 Latin N NT 1973
ake Akawaio Carib Northern East-West Guiana 4,500 Latin N NT 2010
als Albanian Indo-European Albanian Tosk 3,000,000 Latin Y   1993
amh Amharic Afro-Asiatic Semitic South 17,500,000 Ethiopic N NT 1840
amu Amuzgo Oto-Manguean Amuzgoan   23,000 Latin N NT 1973
arb Arabic Afro-Asiatic Semitic Central 206,000,000 Arabic Y   1865
hye Armenian Indo-European Armenian   6,400,000 Armenian N Parts 1883
djk Aukan Creole English based Atlantic 15,500 Latin N NT 1999
bsn Barasana-Eduria Tucanoan Eastern Tucanoan Central 1,890 Latin N NT 2001
eus Basque Basque    700,000 Latin N NT 1855
bul Bulgarian Indo-European Slavic South 9,000,000 Cyrillic Y   1864
cjp Cabécar Chibchan Talamanca   8,840 Latin N NT 1993
cak Cakchiquel Mayan Quichean Greater Quichean 132,000 Latin N NT 1931
cni Campa (Asháninka) Arawakan Maipuran Southern Maipuran 26,100 Latin N NT 1972
kbh Camsá Equatorial (?)    4,770 Latin N NT 1990
ceb Cebuano Austronesian Malayo-Polynesian Philippine 15,800,000 Latin Y   1917
cha Chamorro Austronesian Malayo-Polynesian Chamorro 92,000 Latin N Parts 2007
chr Cherokee Iroquoian Southern Iroquoian   16,400 Cherokee N NT 1850
chq Chinantec (Quiotepec) Oto-Manguean Chinantecan   8,000 Latin N NT 1983
cmn Chinese Sino-Tibetan Sinitic Chinese 840,000,000 Chinese Y   1874
cop Coptic Afro-Asiatic Egyptian   Extinct Coptic N NT 1716
hrv Croatian Indo-European Slavic South 5,500,000 Latin Y   1831
ces Czech Indo-European Slavic West 9,500,000 Latin Y   1380
dan Danish Indo-European Germanic North 5,500,000 Latin Y   1550
dik Dinka Nilo-Saharan Eastern Sudanic Nilotic 450,000 Latin N NT 2006
eng English Indo-European Germanic West 328,000,000 Latin Y   1611
epo Esperanto Constructed    1,000 Latin Y   1900
est Estonian Uralic Finno-Ugric Finno-Permic 1,000,000 Latin Y   1739
ewe Ewe Niger-Congo Atlantic-Congo Volta-Congo 2,250,000 Latin N NT 1911
pes Farsi (Persian) Indo-European Indo-Iranian Iranian 22,000,000 Arabic Y   1838
fin Finnish Uralic Finno-Ugric Finno-Permic 5,000,000 Latin Y   1776
fra French Indo-European Italic Romance 58,000,000 Latin Y   1776
gla Gaelic (Scottish) Indo-European Celtic Insular 67,000 Latin N Parts 1801
gbi Galela West Papuan North Halmahera Galela-Loloda 79,000 Latin N NT 2002
deu German Indo-European Germanic West 90,300,000 Latin Y   1545
ell Greek Indo-European Greek Attic 13,000,000 Greek Y   1840
guj Gujarati Indo-European Indo-Iranian Indo-Aryan 45,500,000 Gujarati N NT 1823
hat Haitian Creole Creole    7,700,000 Latin Y   1985
heb Hebrew Afro-Asiatic Semitic Central 5,300,000 Hebrew Y   1599
hin Hindi Indo-European Indo-Iranian Indo-Aryan 180,000,000 Devanagari Y   1818
hun Hungarian Uralic Finno-Ugric Ugric 12,500,000 Latin Y   1590
isl Icelandic Indo-European Germanic North 230,000 Latin Y   1863
ind Indonesian Austronesian Malayo-Polynesian Malayo-Sumbawan 23,100,000 Latin Y   1974
ita Italian Indo-European Italic Romance 61,700,000 Latin Y   1649
jai Jakalteko Mayan Kanjobalan-Chujean Kanjobalan 77,700 Latin N NT 1979
jpn Japanese Japonic    122,000,000 Kanji Y   1883
quc K’iche’ Mayan Quichean-Mamean Greater Quichean 1,900,000 Latin N NT 1995
kab Kabyle Afro-Asiatic Berber Northern 3,100,000 Latin N NT 2011
kan Kannada Dravidian Southern Tamil-Kannada 35,300,000 Kannada Y   1831
kor Korean Altaic (?)    66,300,000 Hangul Y   1911
lat Latin Indo-European Italic Latino-Faliscan Extinct Latin Y   400
lav Latvian Indo-European Baltic Eastern 1,500,000 Latin N NT 1689
lit Lithuanian Indo-European Baltic Eastern 3,100,000 Latin Y   1735
dop Lukpa Niger-Congo Atlantic-Congo Volta-Congo 50,000 Latin N NT 2009
plt Malagasy Austronesian Malayo-Polynesian Greater Barito 7,520,000 Latin Y   1835
mal Malayalam Dravidian Southern Tamil-Kannada 35,400,000 Malayalam Y   1841
mam Mam Mayan Quichean-Mamean Greater Mamean 200,000 Latin N NT 1993
glv Manx Indo-European Celtic Insular 77,000 Latin N Parts 1773
mri Maori Austronesian Malayo-Polynesian Central-Eastern 60,000 Latin Y   1858
mar Marathi Indo-European Indo-Iranian Indo-Aryan 68,000,000 Devanagari Y   1821
mya Myanmar (Burmese) Sino-Tibetan Tibeto-Burman Lolo-Burmese 32,300,000 Myanmar Y   1835
nhg Nahuatl (Tetelcingo) Uto-Aztecan Southern Uto-Aztecan Aztecan 3,500 Latin N NT 1980
nep Nepali Indo-European Indo-Iranian Indo-Aryan 11,100,000 Devanagari Y   1914
nor Norwegian Indo-European Germanic North 4,600,000 Latin Y   1904
ojb Ojibwa Algic Algonquian Central 20,000 Aboriginal Syllabics N NT 1988
pck Paite (Chin) Sino-Tibetan Tibeto-Burman Kuki-Chin-Naga 78,800 Latin Y   1971
pol Polish Indo-European Slavic West 36,600,000 Latin Y   1975
por Portuguese Indo-European Italic Romance 178,000,000 Latin Y   1751
pot Potawatomi Algic Algonquian Central 1,300,000 Latin N Parts 1844
kek Q’eqchi’ Mayan Quichean-Mamean Greater Quichean 400,000 Latin Y   1988
quw Quichua Quechuan Quechua II B 20,000 Latin N NT 1972
rmn Romani Indo-European Indo-Iranian Indo-Aryan 710,000 Latin N NT 2008
ron Romanian Indo-European Italic Romance 23,400,000 Latin Y   1928
rus Russian Indo-European Slavic East 143,000,000 Cyrillic Y   1876
srp Serbian Indo-European Slavic South 7,000,000 Latin Y   1804
jiv Shuar (Jivaro) Jivaroan    46,700 Latin N NT 2010
slk Slovak Indo-European Slavic West 4,610,000 Latin Y   1832
slv Slovene Indo-European Slavic South 1,730,000 Latin Y   1584
som Somali Afro-Asiatic Cushitic East 8,340,000 Latin Y   1979
spa Spanish Indo-European Italic Romance 328,000,000 Latin Y   1569
swh Swahili Niger-Congo Atlantic-Congo Volta-Congo 788,000 Latin N NT 1891
swe Swedish Indo-European Germanic North 8,300,000 Latin Y   1917
arc Syriac Afro-Asiatic Semitic Central Extinct Syriac N NT 464
shi Tachelhit Afro-Asiatic Berber Northern 3,000,000 Arabic N NT 2010
tgl Tagalog Austronesian Malayo-Polynesian Philippine 23,900,000 Latin Y   1905
ttq Tamajaq (Tuareg) Afro-Asiatic Berber Tamasheq 640,000 Latin N Parts 1979
tel Telugu Dravidian South-Central Telugu 69,600,000 Telugu Y   1854
tha Thai Tai-Kadai Kam-Tai Be-Tai 20,300,000 Thai Y   1883
tur Turkish Altaic Turkic Southern 50,000,000 Latin Y   1827
ukr Ukrainian Indo-European Slavic East 37,000,000 Cyrillic N NT 1903
ppk Uma Austronesian Malayo-Polynesian Celebic 20,000 Latin N NT 1996
usp Uspanteco Mayan Quichean-Mamean Greater Quichean 3,000 Latin N NT 1999
vie Vietnamese Austro-Asiatic Mon-Khmer Viet-Muong 68,600,000 Latin Y   1934
wal Wolaytta Afro-Asiatic Omotic North 1,230,000 Ethiopic N NT 1981
wol Wolof Niger-Congo Atlantic-Congo Atlantic 4,000,000 Latin N NT 1988
xho Xhosa Niger-Congo Atlantic-Congo Volta-Congo 7,800,000 Latin Y   1859
dje Zarma Nilo-Saharan Songhai Southern 2,350,000 Latin Y   1990
zul Zulu Niger-Congo Atlantic-Congo Volta-Congo 9,980,000 Latin N NT 1883