Language names and codes

Languages often have many names. Spanish, for example, is also called Castellano, Español, Espanjalainen, and Spansk, among others. The situation is complicated for indigenous languages, many of which have not been well studied. It may be difficult to decide how to divide some language families into separate languages, especially in cases of dialect continuums.

At AILLA, we use the official name or spelling for a language, if there is one. Names for Mayan languages, for example, are defined by the Mayan Language Academy. If there is no such standard, we use the name given to us by the depositor.

Language Codes

The International Organization for Standardization (ISO) defines a set of three-letter codes for all the languages of the world. This set has many flaws, especially for minority languages, but some such set is needed to support computerized searches. It would not be feasible to search for narratives in Spanish if you had to include all the variant names in your search. It is much simpler and more effective to search for narratives in spa.
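The benefit of searching by code rather than by name can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the lookup table, the record layout, and the function names are hypothetical and do not describe AILLA's actual search system. The code `spa` is the ISO 639-3 identifier for Spanish.

```python
# Hypothetical lookup table: every variant name points at a single code.
VARIANT_TO_CODE = {
    "spanish": "spa",
    "castellano": "spa",
    "español": "spa",
    "espanjalainen": "spa",
    "spansk": "spa",
}

# Hypothetical catalog records, each tagged with one language code.
RECORDS = [
    {"title": "Narrative A", "language_code": "spa"},
    {"title": "Narrative B", "language_code": "kui"},
    {"title": "Narrative C", "language_code": "spa"},
]


def code_for(name):
    """Return the code for a language name, or None if the name is unknown."""
    return VARIANT_TO_CODE.get(name.strip().casefold())


def search_by_code(records, code):
    """Return the titles of all records tagged with the given code."""
    return [r["title"] for r in records if r["language_code"] == code]


# Any variant name leads to the same code, so one query finds every record.
titles = search_by_code(RECORDS, code_for("Castellano"))
```

With this arrangement, a user who types any of the variant names reaches the same set of records, and the catalog itself only ever stores the code.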

The ISO codes were originally designed by the Summer Institute of Linguistics. They are published in the Ethnologue. The Ethnologue has a code for nearly every language listed in Kaufman 1994a&b. Unfortunately, there are too many codes for some languages, or more precisely, some languages have been divided into too many subvarieties. For example, a language with many speakers, like Zapotec, may have codes for every town in which the language is spoken. On the other hand, there may be too few codes for a given linguistic situation. For example, there is only one code (kui) for both Kalapalo and Kuikuro, although the speakers of those languages consider themselves to be distinct cultures and communities. There are separate codes for all the Scandinavian languages, which technically form a linguistic region involving dialect continuums. Those codes reflect socio-political considerations, not linguistic ones.

Nevertheless, the codes are necessary, however flawed they may be. We hope that AILLA's community of speakers and researchers will help refine these lists over time. We will be happy to assist in submitting revisions to the ISO. In the meantime, we have two strategies for dealing with cases in which the existing codes are poorly mapped onto actual languages. First, we ask our depositors to choose the closest ISO code and to write a brief note about the issue for their collection page, if necessary. When the code changes, the old code will be maintained as part of the metadata at AILLA. Second, if choosing an existing ISO code is simply not possible due to the complexity of the linguistic situation, we use a code for the language family instead.

The ISO standard does not include codes for language families. We use the codes devised by Anthony Aristar and his team for MultiTree. These are not official, but they are a well-considered and consistent set.

Language family codes are especially useful in areas with complex language continuums like Mexico. The Mixtecan languages surveyed by Kathryn Josserand provide an excellent example. This survey was conducted in order to provide basic data for beginning to understand the relationships among varieties of Mixtecan. There are 52 codes for Mixtecan languages, one for most -- but not all -- of the towns in Mexico where a variety of Mixtecan is spoken. These codes did not correspond very well to Dr. Josserand's analysis, however. We could not consistently assign ISO codes to the recordings in this collection. These languages are referred to collectively in Mexico as 'Mixtecan.' We decided to follow that practice and label all the materials with the language family code, MIXT.
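The labeling policy described in this section (use the closest ISO code with an optional note, fall back to a family code when no ISO code fits, and keep superseded codes in the metadata) can be sketched as follows. This is only an illustration under stated assumptions: the `LanguageLabel` class, its fields, and both functions are hypothetical names, not AILLA's metadata schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageLabel:
    """Hypothetical metadata record for the language of a resource."""
    code: str                        # code currently used for searching
    is_family_code: bool = False     # True when no single ISO code fits
    note: str = ""                   # depositor's note about the code choice
    previous_codes: List[str] = field(default_factory=list)


def assign_label(iso_code: Optional[str], family_code: str,
                 note: str = "") -> LanguageLabel:
    """Prefer the closest ISO code; otherwise fall back to the family code."""
    if iso_code:
        return LanguageLabel(code=iso_code, note=note)
    return LanguageLabel(code=family_code, is_family_code=True, note=note)


def update_code(label: LanguageLabel, new_code: str,
                is_family_code: bool = False) -> LanguageLabel:
    """Switch to a revised code while keeping the old one in the metadata."""
    if new_code != label.code:
        label.previous_codes.append(label.code)
        label.code = new_code
        label.is_family_code = is_family_code
    return label


# The Mixtecan collection: no ISO code could be assigned consistently,
# so the family code MIXT is used instead.
mixtecan = assign_label(None, "MIXT", note="ISO codes do not fit the survey")
```

Keeping `previous_codes` as a list means a resource stays findable under a code that has since been revised.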

Language Family Tables

There are two tables presenting information about language names, dialects, and language families:

Meso-American Languages (including northern Mexico)

South American Languages (including the Caribbean islands)


The information in these tables is derived from the sources listed in the references. The backbone of the tables is the information in Kaufman 1994a and 1994b. The family trees, however, are greatly simplified, along the lines of Campbell (1997). We leave out intermediate nodes, such as Eastern Otomanguean, since the goal here is to facilitate searching the archive for related resources, not to present a definitive analysis of the language families. We refer interested readers to the cited works for greater depth of information.

The tables are bilingual. There are five columns in each table:

| Nombre / Name | Nombres variantes / Variant names | Padre / Parent | (Código de) Idioma / Language (Code) | (Código de) País / Country (Code) |
|---|---|---|---|---|
| Huasteco | [Te:nek], Wasteko, Huastecano | Mayance | HVA | MEX |
| Palantla | (Tierra baja ~ Lowland) | Chinanteco | CPA | |
| Mixteco | Misteko | Mixtecano | MIX | MEX |
| Norteño | Northern | Mixteco | | |
| Central | Central | | | |
| Sureño | Southern | | | |
| | (Juxtlahuaca Oeste, | | JMX | |
| | Yutanduchi, | | MAB | |
| | Itundujia, | | MCE | |

Name: This is the name that we will use for this language or family on AILLA's interfaces. Family names are always in boldface. Sometimes a boldface name in this column is a name that is also commonly used for a single language, like Mixteco. These names are treated as family names whenever Kaufman lists one or more sub-languages under that heading. This means that the group contains more than one language, linguistically speaking, each of which may have several distinct dialects. The language vs. dialect situation for some "languages," like Mixteco and Zapoteco, is complicated: there are many mutually intelligible dialects, some dialects that are not mutually intelligible, not enough information, and not enough different names to go around. Kaufman often provides sub-groupings, like Northern, Central, and Southern; we list these in the Name column, because that is their linguistic level in the family tree. It is up to the speakers and to history to decide how to sort these complex groups into languages and give them names.

Variant names: This column contains a lot of information for some languages:

  1. A list of variant names, beginning with the autonym -- a name used by speakers, which often means something like 'human language' -- presented in brackets. Different spellings or translations of the same name are separated by ~. (Note: some of the autonyms use characters that may not be included in a font that you have on your computer.)
  2. Names of dialects are listed in parentheses. Variant spellings or translations of the same dialect name are separated by ~. A dialect that has its own Language Code goes on a separate line. Kaufman lists dialects for some languages; these are always listed first, and matched to Ethnologue codes whenever possible. Then all the varieties that have unique codes are listed, one per line. These may or may not actually be separate dialects.
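The notation used in this column (brackets for autonyms, parentheses for dialects, ~ between spellings of the same name) is regular enough to be read by a program. The following is a minimal sketch, assuming that each cell is self-contained with balanced brackets and parentheses and that separate names are divided by commas. The function names and the returned dictionary layout are hypothetical.

```python
import re


def split_list(text):
    """Split on commas into names, then on ~ into spellings of each name."""
    return [
        [alt.strip() for alt in item.split("~") if alt.strip()]
        for item in text.split(",")
        if item.strip()
    ]


def parse_variant_cell(cell):
    """Separate autonyms, dialect names, and other variant names in a cell."""
    autonyms, dialects = [], []
    for inner in re.findall(r"\[([^\]]*)\]", cell):    # [autonym]
        autonyms.extend(split_list(inner))
    for inner in re.findall(r"\(([^)]*)\)", cell):     # (dialect)
        dialects.extend(split_list(inner))
    # Whatever remains outside brackets and parentheses is a variant name.
    rest = re.sub(r"\[[^\]]*\]|\([^)]*\)", "", cell)
    return {
        "autonyms": autonyms,
        "dialects": dialects,
        "variants": split_list(rest),
    }


huasteco = parse_variant_cell("[Te:nek], Wasteko, Huastecano")
palantla = parse_variant_cell("(Tierra baja ~ Lowland)")
```

Each name is returned as a list of its alternative spellings, so "Tierra baja" and "Lowland" stay grouped as one dialect rather than being counted as two. Cells whose parentheses continue onto later rows, as in some Mixteco entries, would need to be joined before parsing.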

Parent: The parent of the language in the first column is given here. The root of a family tree can be identified by the fact that its Parent is 0, and the whole line is in boldface.

Language Code: These are described above.

Country code: The countries of the world are also identified by three-letter codes, which are defined by the International Organization for Standardization in ISO 3166. In the tables, the country code is given at as high a level in the language family tree as possible. For example, all the Mixe-Zoquean languages are spoken in Mexico, so the country code appears in the row that is the root of that family tree. But some Mayan languages are spoken in Mexico, and some in Guatemala, so there is a country code given for each individual Mayan language.
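Because a country code may be stated only once, high in the tree, finding the country for a particular row means walking up the Parent column until a code is found. The sketch below assumes a simple dictionary layout for the table; the two Mixteco rows follow the sample table above, while the "Example" rows and all function names are hypothetical.

```python
ROOT_PARENT = "0"  # the Parent value that marks the root of a family tree

# Each row: name -> its parent and its country code (None when left blank).
TABLE = {
    "Mixteco": {"parent": "Mixtecano", "country": "MEX"},
    "Norteño": {"parent": "Mixteco", "country": None},
    # Hypothetical rows: a root with no country code and one child.
    "ExampleFamily": {"parent": ROOT_PARENT, "country": None},
    "ExampleLanguage": {"parent": "ExampleFamily", "country": "GTM"},
}


def is_root(name, table=TABLE):
    """A row is the root of a family tree when its Parent is 0."""
    return table[name]["parent"] == ROOT_PARENT


def resolve_country(name, table=TABLE):
    """Return the nearest country code at or above this row, or None."""
    seen = set()  # guards against accidental cycles in the Parent column
    while name in table and name not in seen:
        seen.add(name)
        row = table[name]
        if row["country"]:
            return row["country"]
        name = row["parent"]
    return None


country = resolve_country("Norteño")  # inherited from the Mixteco row
```

The walk stops when it reaches a row with a code, a parent that is not in the table, or the root marker, so a family spread over several countries simply returns None at the family level.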

AILLA is a joint effort of the LLILAS Benson Latin American Studies and Collections, the Department of Linguistics, and the Digital Library Services Division of the University Libraries at the University of Texas at Austin.
AILLA is also grateful for support from the National Endowment for the Humanities and the National Science Foundation.
