Language families are groups of languages said to be “genetically” related on the basis of having a common ancestor, or protolanguage. Such languages share features and vocabulary items, although these similarities are hardly apparent when comparing two languages as seemingly distinct as, say, English and Hindi (both members of the Indo-European family). As speakers move apart, systematic changes accumulate over time, until the languages come to differ greatly from each other. Families are further subdivided into branches: groups of languages that diverged from each other only after splitting from the family’s common ancestor. The languages within a branch share more similarities with each other than with languages belonging to other branches of the family.
Historical linguists normally use the comparative method to reconstruct a protolanguage. By examining several related languages for cognates, words that resemble one another because of their common descent, linguists can postulate the original forms from which the cognates arose. The method relies on basic vocabulary such as pronouns, kinship terms, body parts, and lower numbers, which are the terms most resistant to change. Though this method has provided significant insight into the genetic links between different languages, it is important to remember that similarities can also arise from borrowing, linguistic universals, or chance.
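One step of the comparative method, looking for sounds that line up regularly across cognates, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the word lists use ordinary spellings rather than phonetic transcriptions, only the first letter of each word is compared, and the names `COGNATES` and `initial_correspondences` are hypothetical. Real comparative work aligns whole words and rules out borrowing and chance before proposing a reconstruction.

```python
from collections import Counter

# Illustrative cognate sets in simplified spelling (not phonetic transcription).
COGNATES = {
    "father": {"English": "father", "Latin": "pater", "Sanskrit": "pitar"},
    "foot": {"English": "foot", "Latin": "ped", "Sanskrit": "pad"},
    "fish": {"English": "fish", "Latin": "piscis"},
}


def initial_correspondences(cognates, lang_a, lang_b):
    """Count how often each pair of word-initial letters lines up."""
    counts = Counter()
    for forms in cognates.values():
        # Skip meanings that lack a form in either language.
        if lang_a in forms and lang_b in forms:
            counts[(forms[lang_a][0], forms[lang_b][0])] += 1
    return counts


if __name__ == "__main__":
    print(initial_correspondences(COGNATES, "English", "Latin"))
```

In this small sample, English initial f lines up with Latin initial p in every set. A pattern that recurs across many words, rather than in a single pair, is the kind of evidence that points to common descent instead of chance resemblance.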
The classification of languages into families has been controversial, especially when mega-family relationships uniting entire families are proposed. The two opposing camps regarding such proposals are:
1. The clumpers, more often called lumpers (who tend to favor grouping many languages into a few families)
2. The splitters (who concede relationships only on the strongest evidence and prefer many separate families).
Because of such disagreements, and other factors such as insufficient research into certain languages, there is no established number of families. Ethnologue, a language database maintained by the Summer Institute of Linguistics, lists over 120 distinct families, some with only a single language. Some languages cannot be classified with any others and are labeled isolates, languages without any known relatives.
Perhaps the most familiar and most studied language family is Indo-European, to which English belongs, and for which many Proto-Indo-European words have been reconstructed. Research into this family has spanned nearly two centuries, yielding reconstructed terms and a proposed homeland in the Russian steppes. The protolanguage was probably morphologically complex, with an intricate case system as seen in Greek, Latin, and Sanskrit and maintained in several modern languages. Languages belonging to this family extend throughout most of Europe, parts of the Near East, and into India.
There are several branches:
- The Germanic Branch includes English, German, Frisian, Dutch, Afrikaans, Swedish, Danish, and Norwegian. The extinct East Germanic sub-branch is represented by written manuscripts in Gothic.
- The languages belonging to the Italic Romance Branch are descendants of Latin, and include Italian, Spanish, Portuguese, French, Romanian, Catalan, and a few additional languages.
- The Celtic Branch was once distributed throughout a wide swath of Europe, but is now mostly confined to the British Isles and Brittany in northwestern France. Languages include Cornish, Manx, Breton, Irish Gaelic, Scottish Gaelic, and Welsh.
- The Balto-Slavic Branch is further divided into the Baltic languages (including Latvian, Lithuanian, and the extinct Old Prussian) and the Slavic languages, including Russian, Ukrainian, Bulgarian, Serbian, Croatian, Czech, and Polish.
- Greek comprises its own branch, but the modern language differs greatly from the Classical Greek spoken in antiquity.
- Other single-language branches include Armenian, spoken in Turkey and Armenia, and Albanian, spoken in Albania and the surrounding areas.
- The Indo-Iranian Branch includes the Iranian languages Farsi (or modern Persian, spoken in Iran), Kurdish, and Pashto, among others. Its Indo-Aryan sub-branch includes the vast majority of the languages of India, many of which trace back to the vernacular on which Sanskrit was based. These include the widely spoken Hindi/Urdu, Bengali, Gujarati, Marathi, Punjabi, and many more.
Two entire branches of the Indo-European family are extinct: Anatolian, whose most prominent member was Hittite, and Tocharian, subdivided into Tocharian A and B, which was spoken in the first millennium AD in Chinese Turkestan, now known as Xinjiang.
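The family, branch, and sub-branch structure described for Indo-European is a tree, and it can be sketched as a nested mapping. The following Python fragment is a partial illustration only: it lists a handful of the languages named in this section, and the names `INDO_EUROPEAN` and `path_to` are hypothetical.

```python
# A partial tree built only from languages named in this section.
INDO_EUROPEAN = {
    "Germanic": ["English", "German", "Dutch", "Swedish"],
    "Italic": ["Italian", "Spanish", "French", "Romanian"],
    "Celtic": ["Irish Gaelic", "Welsh", "Breton"],
    "Balto-Slavic": {
        "Baltic": ["Latvian", "Lithuanian"],
        "Slavic": ["Russian", "Polish", "Czech"],
    },
    "Indo-Iranian": {
        "Iranian": ["Farsi", "Kurdish", "Pashto"],
        "Indo-Aryan": ["Bengali", "Gujarati", "Marathi"],
    },
}


def path_to(language, tree, trail=()):
    """Return the tuple of branch names leading to a language, or None."""
    if isinstance(tree, list):
        # A list is a leaf level holding language names.
        return trail if language in tree else None
    for name, subtree in tree.items():
        found = path_to(language, subtree, trail + (name,))
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(path_to("Polish", INDO_EUROPEAN))
```

Two languages whose paths share a longer prefix, such as Polish and Russian, split from each other more recently than two that share only the family itself, such as Polish and English.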
The Uralic language family is found mainly in the northwestern part of the Eurasian continent and divides into two branches: Finno-Ugric and Samoyedic.
To the former belong Finnish and Estonian, spoken in northern Europe, as well as Hungarian, spoken in Hungary in central Europe. Hungarian is the most widely spoken member of the Uralic language family. Also included in this grouping are the Sami languages, spoken in northern Scandinavia and northwestern Russia.
The Samoyed Branch encompasses the languages of the Samoyeds, who sparsely inhabit Siberia and arctic Russia. These languages include Nenets, Selkup, and others. One feature found in most Uralic languages is negative conjugation, in which a negative auxiliary rather than the main verb is conjugated. Additional traits common to many but not all of the Uralic languages are vowel harmony, SOV word order (with varying degrees of flexibility), and spatial case markings.
The Caucasian languages are found in the area around the Caucasus Mountains. A number of languages in this region belong to the Indo-European or Altaic language families, but nearly forty defy such classification and make up three language families. The Kartvelian or Southern Caucasian family includes Georgian, the Caucasian language with the most native speakers. The two additional families are Northwest Caucasian (with Kabardian and Abkhaz among others) and Northeast Caucasian (including Chechen and Kuri). Though the languages have often been studied comparatively and share some features, such as a rich consonant inventory and ergativity, they are generally seen as distinct families rather than branches of a single Caucasian family.
The languages of the Altaic family cover a large area throughout northern and central Asia. The three main branches are Turkic, Mongolian, and Tungusic.
Turkish is the best-known member of the Turkic group, which also includes Azerbaijani, Turkmen, Uzbek, Kazakh, and several other languages spoken throughout central Asia. The Turkic languages bear a close linguistic resemblance to one another, and often exhibit a verb-final word order, agglutinating morphology, and vowel harmony.
The Mongolian Branch includes Mongolian itself as well as others such as Buryat, Dagur, and Dongxiang, languages located primarily in Mongolia and China.
To the Tungusic Branch belong Evenki, Lamut, Nanai, and Manchu, located in Siberia and northern China. Mongolian and Tungusic languages share with the Turkic languages an SOV word order and vowel harmony.
It is to the Altaic language family that Japanese and Korean are sometimes assigned. However, this classification remains very controversial, and no strong genetic relationships with other languages have been shown. The two are alternatively considered isolates, or related to each other. Both are widely spoken and have an SOV word order.
The term Paleosiberian refers to a grouping of languages based not on genetic relationships but on geographical proximity. These are the languages of the sparsely inhabited northeastern parts of Siberia. Four language families are recognized under the general term Paleosiberian: the Luoravetlan languages include Chukchi, Koryak, and Kerek; the Yeniseian and Yukaghir groups each consist of a single surviving language, Ket and Yukaghir, respectively; and Gilyak comprises its own group.
The Dravidian languages are found mostly in southern India. The most prominent members of this family are Tamil, Telugu, Kannada, and Malayalam, each of which has many millions of speakers. Many other Dravidian languages are spoken in this area as well, but do not match these four in number of speakers. Dravidian languages have a large number of cases, and use an agglutinating, suffixing morphology. The basic word order is SOV. Because of the proximity of the Dravidian languages to the Indo-Aryan languages in the more northern parts of India, the two language families have influenced each other through the borrowing of words and features.
The Sino-Tibetan family contains two main branches: Sinitic and Tibeto-Burman.
To the Sinitic Branch belong all of the Chinese languages, sometimes referred to as dialects despite often being mutually unintelligible. Mandarin and Cantonese are the most widely spoken of these.
The Tibeto-Burman languages are spoken in southern China and other countries of Southeast Asia. Tibetan and Burmese are the most prominent members of this branch, which also includes several other languages. A final branch sometimes included in this family is the Miao-Yao group, represented chiefly by Miao or Hmong, and Yao or Mien. The classification of this group remains controversial, and it is sometimes considered a separate family. Most of the Sino-Tibetan languages are tonal, have SOV word order, and have a mostly agglutinating morphology.
The Tai or Kadai language family includes many of the languages of Southeast Asia. Thai, the language of Thailand, belongs to the southwestern branch, as does Lao. Several other languages spoken throughout Myanmar/Burma, Thailand, Laos, and parts of Cambodia, Vietnam, and China are included in this family. Examples include Zhuang, Shan, Dong, and Isan. Many of these languages feature mostly monosyllabic words, tones, and an SVO word order.
Also spoken in parts of southern Asia are the Austro-Asiatic languages.
The Mon-Khmer Branch includes Khmer, spoken in Cambodia; Mon, spoken in Myanmar/Burma and Thailand; and Vietnamese, although there is some dispute about the latter. The other branch, Munda, includes several languages spoken throughout parts of India and the Nicobar Islands. Two prominent members of this group are Mundari and Santali.
Afro-Asiatic languages are found throughout northern Africa and into the Middle East. This is a large family with several branches. One large branch is Semitic, with Hebrew, Arabic, and Amharic as its best-known members. Arabic dialects, which often differ greatly from one another, are spoken throughout several countries in the Middle East. Egyptian Coptic once formed its own branch, but is now extinct. Berber languages are spoken in northern Africa, and include Kabyle, Tamashek, and other Tuareg languages.
Other branches are Cushitic (including Dabarre, Somali, and Oromo), Chadic (with Hausa), and Omotic (with Wolaytta, among others). Features common to many of these languages include pharyngealized or glottalized consonants, the use of prefixes to conjugate verbs, internal inflections, and a distinction between masculine and feminine gender.
Niger-Congo languages are spread throughout western and sub-Saharan Africa. This is a very large family, encompassing over a thousand languages.
The well-known Bantu languages, widely spoken throughout the southern half of Africa, are a sub-group within this family and include Kikongo, Xhosa, Shona, Zulu, and Swahili, a lingua franca of eastern Africa. But the Niger-Congo family encompasses many other languages as well, such as the Kordofanian languages, the Gur languages, the Atlantic group (including Wolof), the Kwa group (with Akan, Ewe, and Ijo), the Mande group (including Bambara and Malinke), and the Adamawa-Ubangi languages. Most of the family’s languages are tonal and have several noun classes.
Nilo-Saharan languages are found in two groupings, one around the base of the Nile, and the other at the base of the Chari River. The Eastern Sudanic group includes Maasai, Kalenjin, Luo, Dinka, Nubian and Nuer, among others, while Lugbara, Lendu and Ngambai are grouped with others into the Central Sudanic Branch. Additional groupings include Komuz, Saharan, and Songhai.
The Khoisan languages, found in southern Africa primarily in the Kalahari Desert, comprise the smallest of Africa’s language families. Nonetheless, they are well known for the click consonants that characterize them. Such sounds are very rare in the world’s languages, although Xhosa, Zulu, and several other southern Bantu languages have borrowed some of them from the Khoisan languages. Examples of Khoisan languages include Nama, !Kung, and Juǀ’hoansi.
The Austronesian language family, found throughout the Pacific in Malaysia, Polynesia, Indonesia, Micronesia, and many other islands in the area, has over 1200 languages.
One large branch of this family is known as Malayo-Polynesian and includes such languages as Javanese, Malay, Tagalog, and Chamorro, among many others. Other members of this family are the indigenous languages of Taiwan, Maori, Samoan, Tongan, and Fijian. Many Austronesian languages, including Malagasy (spoken in Madagascar), have both inclusive and exclusive first person pronouns; have verb-initial or verb-second word order; and use reduplication as a grammatical marker. Several languages have very simple phonemic systems: Hawaiian has only 13 phonemes.
Although Austronesian languages are spoken on the island of New Guinea, this small landmass is home to some 650 additional languages that comprise their own family, called Indo-Pacific or Papuan. The rough terrain of the island made it possible for groups of people to live in relative proximity while remaining isolated, which allowed their speech to diverge into separate languages. Indeed, there may be languages yet to be discovered. A common feature is switch reference, in which each dependent clause within a sentence is marked for whether or not it shares the subject of the independent clause. It has also been suggested that Tasmanian (extinct) and Andamanese may belong to this family.
The Australian Aboriginal languages form their own family. Many of the members of this group have a very small number of speakers, and several others have already become extinct. Tiwi, Warlpiri, and Djingili are among those with the most speakers. Dyirbal is notable for its avoidance style, a lexicon used around taboo relatives that differs greatly from the ordinary lexicon. Overall, many Australian languages lack fricative consonants and make no voicing distinction, but they distinguish an unusually large number of places of articulation for stops and nasals.
The classification of the languages in the Americas has often been disputed. However, some families are well established and accepted.
Furthest north is the Eskimo-Aleut family, extending from the tip of Siberia to Greenland.
The Na-Dené languages are spoken in northwestern Canada as well as an area in the southwestern U.S. Some of the languages in this family are Navajo and related Apache languages, Haida, and Tlingit.
The Macro-Algonquian group refers to a large number of languages spread throughout central and eastern Canada and the U.S., including Cheyenne, Cree, Mohican, Ojibwa, and Lenape (Delaware), among others.
The Siouan languages are found in several areas in the central and eastern parts of the U.S. and include Dakota, Crow and Winnebago.
Mohawk, Susquehannock (Western Pennsylvania) and Oneida are members of the Iroquoian languages.
Spanning North and Central America are the Hokan family, the Uto-Aztecan family (with Hopi and Nahuatl), and the Penutian group (with Chinook and Nez Perce).
The many Mayan languages spoken throughout Mexico and Guatemala include Yucatec Maya, Itza’, Quiche, and Tzotzil. The varieties of Quiche share among them over a million speakers.
Within South America, the language categories are even more difficult to determine, and a great deal of linguistic diversity remains in the region despite the widespread use of Indo-European languages. Some language families include Carib, spoken north of the Amazon; Macro-Ge, found throughout Brazil; and Panoan, in Brazil, Bolivia, and Paraguay. These three are sometimes subsumed under a single family called Ge-Pano-Carib. Hixkaryana, a Carib language, is known for being the first language discovered to demonstrate an OVS word order. The Chibchan family is found in several areas from Central America to Brazil and Chile, and includes the Yanomami languages.
The Andean-Equatorial group, likewise widespread, includes Aymara, Quechua (with its many varieties), and Guarani, a Tupi language that shares official status in Paraguay. Not all of these relationships are agreed upon, and Tupi, Quechuan, and Aymaran are sometimes considered separate families of their own rather than branches of a common family. Many languages have become extinct and several more are nearing such a state.
Finally, a number of languages defy any attempt to assign them to a particular family. Some well-known isolates include Basque, spoken in northern Spain and southern France but bearing no relationship to the Indo-European languages; Ainu, spoken in Japan; and Gilyak, in northeast Russia. Additionally, creole and pidgin languages, which develop through contact between two or more distinct languages, remain difficult to classify. Some examples include Tok Pisin, Haitian Creole, Malay, Sango, and Kituba. These languages may, over time, converge with their lexifier language, or may further diverge into mutually unintelligible tongues.
It is important to remember that the number of languages in a family is not an indication of the number of speakers. A family with many languages may have relatively few speakers. While the Austronesian and Niger-Congo families have the greatest number of languages, the Indo-European and Sino-Tibetan families boast the most speakers. The study of language families remains important for understanding how languages come to share certain features and yet change over time into distinct tongues.
Comrie, Bernard (ed.). 1987. The World’s Major Languages. New York: Oxford University Press.
Crystal, David. 2010. The Cambridge Encyclopedia of Language. 3rd ed. Cambridge: Cambridge University Press.
Lewis, M. Paul (ed.). 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/.