One of the typologically puzzling things about Arabic, and Semitic languages in general, is that /i/ and /u/ very often contrast with /a/, but hardly ever with each other. This is usually an indication that the two are allophones, but that explanation cannot hold if the vowels can't freely interchange and are perceived as separate vowels.
Although, as far as I am aware, this is an issue in the whole of Semitic, I am most familiar with Arabic, so I'll stick to examples from that language.
Of course, there is one extremely productive pattern of 'minimal pairs' of vowels in the form of case endings.
Nom. rajul-un
Gen. rajul-in
Acc. rajul-an
So, sure, they seem quite phonemic in that context. But what I find puzzling is that in stem formations u and i do not normally contrast.
To further research this I have made a table of the distribution of Arabic vowels in CVCVC roots. The table looks as follows:
| V1 \ V2 | a | i | u | ā | ī | ū |
|---|---|---|---|---|---|---|
| a | + | + | + | + | + | + |
| i | + | - | - | + | - | - |
| u | + | - | + | + | - | + |
| ā | - | + | - | - | - | - |
| ī | - | - | - | - | - | - |
| ū | - | - | - | - | - | - |
Several notes can be made about this table. I shaded the entry CaCiC, since it is difficult. The only word I can think of is malik 'king' (although there are doubtless more). Some people will probably know that this word is related to Hebrew mĕlĕḵ, which paradoxically points to a CVCC root. Is malik perhaps from *malk with an epenthetic vowel? It is very reminiscent of Dutch melk 'milk', which many people in fact pronounce [ˈmɛ.lǝk] rather than [ˈmɛlk].
Another strange thing is that, of the long vowels, only ā can occur in V1 position, and exclusively when it is followed by the vowel i. Could it be that CaCiC is indeed from *CaCC, and that CāCiC represents the original *CaCiC?
If this were true, the table of vowel distribution would look a lot more elegant.
| V1 \ V2 | a | i | u | ā | ī | ū |
|---|---|---|---|---|---|---|
| a | + | + | + | + | + | + |
| i | + | - | - | + | - | - |
| u | + | - | + | + | - | + |
There is an enormous problem with this reductionist approach, though. The vowel pattern CāCiC is associated with the meaning of a nomen agentis. It is quite productive: from the verb kataba 'to write' we can form kātib 'writer'. That would be fine, if it weren't that Hebrew has this exact same pattern. Hebrew has the verb sāfăr 'to count' beside sôfēr 'scribe, writer' (lit. 'counter'; ô < *ā, ē < *i). If we assume that CāCiC is from *CaCiC, this must have been a common shift for Arabic, Hebrew and, I've been told, also Aramaic. Could someone with knowledge of Akkadian or the Ethiopian languages let me know whether this pattern exists there, and whether it has CāCiC or CaCiC?
So, after the discussion of CaCiC, let's continue with the vowel table. Maybe not completely surprising, but when it comes to which vowel combinations are allowed, Arabic disregards vowel length. CiCiC isn't allowed, whether the second i is long or not. The same goes for the other disallowed vowel combinations. I wonder what this implies. I have no experience with languages that have long vowels and limitations on their distribution, so I'm not sure what scenario is typologically plausible.
It is good that I made this table, for it has shown me some stuff that I was previously unaware of. I was under the impression that the distribution of u and i was identical, but I have found absolutely no examples of words with CiCiC, while CuCuC is in fact quite a common plural formation. As I knew before writing this, combinations of i and u in one root are impossible, which is mysterious. It almost looks like a sort of 'vowel disharmony', if I may coin that term.
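The observations about the table can also be checked mechanically. What follows is only a minimal sketch in Python: the dictionary is transcribed by hand from the first table, and the helper names (`quality`, `mixes_i_and_u`, `disharmony_holds`, `length_blind_after_short_v1`) are made up for this illustration. The length check is deliberately restricted to rows with a short V1, because the ā row has CāCiC but no CāCīC.

```python
# Attested ("+") vowel patterns in CVCVC, transcribed from the first table.
# Keys are V1; values are the V2 vowels marked "+" in that row.
ATTESTED = {
    "a": {"a", "i", "u", "ā", "ī", "ū"},
    "i": {"a", "ā"},
    "u": {"a", "u", "ā", "ū"},
    "ā": {"i"},
    "ī": set(),
    "ū": set(),
}

# Map each long vowel to its short counterpart.
SHORTEN = {"ā": "a", "ī": "i", "ū": "u"}


def quality(vowel):
    """Return the vowel quality, ignoring length."""
    return SHORTEN.get(vowel, vowel)


def mixes_i_and_u(v1, v2):
    """True if one vowel is i/ī and the other is u/ū."""
    return {quality(v1), quality(v2)} == {"i", "u"}


def attested_patterns():
    """List every (V1, V2) pair marked "+" in the table."""
    return [(v1, v2) for v1, v2s in ATTESTED.items() for v2 in sorted(v2s)]


def disharmony_holds():
    """True if no attested pattern combines an i-vowel with a u-vowel."""
    return not any(mixes_i_and_u(v1, v2) for v1, v2 in attested_patterns())


def length_blind_after_short_v1():
    """True if, after short V1, long and short V2 are allowed alike."""
    for v1 in ("a", "i", "u"):
        for long_v, short_v in SHORTEN.items():
            if (short_v in ATTESTED[v1]) != (long_v in ATTESTED[v1]):
                return False
    return True


if __name__ == "__main__":
    print("i and u never co-occur:", disharmony_holds())
    print("V2 length irrelevant after short V1:", length_blind_after_short_v1())
```

Both checks only restate what the table already says; they add no evidence beyond it, but they make it easy to re-test the claims if an entry in the table turns out to be wrong.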
I had written a large post proposing a fourth Proto-Semitic vowel *ǝ that would be affected by its surroundings, but would often simply surface as a or i. But once I put the distribution into a table, I became uncertain whether such a proposal would be feasible, and threw away most of that post.
It is true that i and also u sometimes have schwa-like properties. If malik indeed comes from *malk, that's obviously an example, but there are even more readily available examples in the form of the 'alif al-waṣl. When an Arabic word starts with a CC cluster, a vowel is placed in front of the first consonant to make the cluster pronounceable. For example, *sm 'name' becomes (i)sm. When a vowel precedes it, this vowel is lost again; it is purely epenthetic. When the root contains no vowels, or an a or i, the value of the 'alif al-waṣl is i. But if the following vowel is a u, the 'alif al-waṣl is also u, as in *drus > (u)drus 'learn!'. This is in fact an example of vowel harmony. Some nouns violate this rule, though, like (i)mru' 'man'. Another strange thing is that the a in the definite article (a)l behaves just like 'alif al-waṣl, except that it is always a in isolated pronunciation. Nevertheless it is quite obvious that this 'alif al-waṣl must have come from a subphonemic *ǝ.
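The regular part of the 'alif al-waṣl rule is simple enough to write down as a tiny function. This is just a sketch in Python under loud assumptions: it works on a plain romanised string, the names `first_vowel` and `add_wasl` are invented for the illustration, the form ftaḥ 'open!' is an extra example that I added, and the exceptions mentioned above ((i)mru' and the article (a)l) are deliberately not handled.

```python
# Romanised vowels; anything else in the string is treated as a consonant.
VOWELS = ("a", "i", "u", "ā", "ī", "ū")


def first_vowel(stem):
    """Return the first vowel in the stem, or None if there is none."""
    for char in stem:
        if char in VOWELS:
            return char
    return None


def add_wasl(stem):
    """Prefix the prosthetic vowel to a stem that starts with CC.

    Regular rule only: u before a following u/ū, otherwise i.
    Exceptions such as (i)mru' and the article (a)l are NOT covered.
    """
    prosthetic = "u" if first_vowel(stem) in ("u", "ū") else "i"
    return prosthetic + stem


if __name__ == "__main__":
    for stem in ("sm", "drus", "ftaḥ"):
        print(stem, "->", add_wasl(stem))
```

The function only models the isolated pronunciation; the loss of the vowel after a preceding vowel would need the context of the previous word, which this sketch does not take into account.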
Another example of a *ǝ is the i that is often used to break up clusters within a sentence; the apocopate verb especially often needs an extra i placed between its final consonant and the following word.
If there was a *ǝ in the middle of words, would that help to explain the distribution of the vowels? It might. If we assume that all i were in fact *ǝ, we would understand why CiCuC and CuCiC do not occur, since the u would have caused the *ǝ to become a u. But it still does not explain why CiCiC and CiCīC do not occur, unless we assume that *ǝ and *ī turned a preceding *ǝ into a. Such an explanation is entirely ad hoc. Although it might be true, there is no indication that it was like that, and we would need comparative evidence to prove it.
So to conclude, Arabic gives quite strong indications that i was in fact a *ǝ that was heavily affected by its surroundings, rather than an *i. This does not increase or decrease the number of phonemic vowels, but it may help us understand the vocalic patterns of Arabic better.
There is no conclusive evidence, though, that i was *ǝ; one would have to look at deeper genetic relations (Afro-Asiatic? Maybe only Berbero-Semitic?). I do feel that one should probably position this *ǝ in Proto-Semitic times, if it exists. Hebrew vowel distribution is, as far as I can see, quite similar to that of Arabic.
I hope to soon dive into correspondences between Arabic and Berber verbal morphology with this hypothesis that i should be interpreted as a *ǝ. But before that I should probably look at Arabic verbal morphology first, since I've only considered nouns of the type CVCVC so far. The vowel distribution in the verbal morphology becomes quite a bit more difficult, though.
That's right: after three years and a bit, I am now officially a Bachelor of Arts in Comparative Indo-European Linguistics. Yay me, and yay for shameless self-promotion!
So I finished my final Bachelor thesis with a score of 9/10, that is to say, pretty damn good. And therefore I shall treat you guys to this goodness: my thesis on Consonant Gradation in the Indo-European Verb.
I am sure that it will lead to loads of discussion, because there is a lot to discuss, and even more is uncertain. But I am willing to discuss it all; it's an exciting subject. So enjoy!
[EDIT] Due to issues with RapidShare, I have now uploaded my thesis to Mediafire (thanks Tropylium!). Please let me know if anyone runs into issues.
Recently I've been taking a class on fieldwork. In this class we have an informant who speaks Minangkabau, a Malay dialect spoken by about 5 million people around Padang (which was recently hit by quite a severe earthquake).
Last week it was my turn to elicit some words and sentences from the informant, and one of the things that was elicited was the word for 'leg', kaki. This word struck me as odd, but I had no idea why.
Just now my mom came in. It's a cold evening, and she had cold feet.
I told her: wow, wat heb je kouwe kakkies 'wow, you have cold feet'. And then it struck me why the word had seemed so familiar: kakkies is a Bargoens word for 'feet'. And yes, this is indeed a loanword from Indonesian!
As the title says, I am often perplexed by Afro-Asiatic. I've learned some Arabic and Hebrew, followed a class on comparative Semitic, I have a (hardly looked at) book on Egyptian, and I'm currently following a class on Riffian Berber and general Berber linguistics.
Studying these languages, it seems silly to deny that Proto-Afro-Asiatic must have existed. So I won't. But what always puzzles me is that the 'proof' for Afro-Asiatic is quite the opposite of the kind of proof we find for Indo-European.
Lexical items that are cognate are extremely hard to find in Afro-Asiatic. This is quite the opposite of Indo-European, where lexical items were the first things to draw attention to a relation between the languages.
But the morphology of Afro-Asiatic is disturbingly similar. Things like the -t suffix for the feminine are obvious, but even the personal endings of verbs are surprisingly similar across Afro-Asiatic.
This is completely unlike Indo-European. Sure, Sanskrit and Greek are grammatically almost clones of each other, but I make no secret of my belief that the relation between Sanskrit and Greek is a lot closer than some people claim. When you try to reconstruct a uniform image of the verbal system, or even the morphology, by comparing Sanskrit to, say, Germanic, stuff gets a lot more confusing.
And then we're only talking about Germanic and Sanskrit. The time depth of Indo-European is a LOT less than that of Afro-Asiatic. Is there something inherent in the structure of these languages that makes morphemes more resistant to change? That seems odd; structurally you could argue that Indo-European at an early stage (but post-syncope) was quite similar to the Afro-Asiatic languages.
Of course this 'morphological but not lexical' change resistance is more of a 'feeling' I get than anything I ever measured. So maybe I'm wrong about this. Maybe Afro-Asiatic is just as innovative in the morphological department as Indo-European, but just a whole lot more innovative in the lexical department.
This is me just rambling to the point that it's appallingly unscientific, but I guess it'll set some of your brains into motion, and that'll be enough. :-P
The 19th of October: that will be the date on which I defend my Bachelor thesis on Consonant Gradation in the Verbal System of Proto-Indo-European.
This defense is open to the public. If any reader happens to be around and wants to come, they are invited to leave a comment, and I'll provide more information.
After that I can officially call myself Bachelor of Arts in Comparative Indo-European Linguistics, which is kind of cool.
Hey guys! Long time no see. My Bachelor thesis was eating a lot of time, combined with work on the Greek Etymological Dictionary and me just simply enjoying my holiday. But I'm back, with this word that has been bothering me for some time now.
The word Skt. sthā- 'to stand' is, apart from its double representation of the laryngeal, quite straightforward. If we look at its causative, though, something really funny happens. Usually a causative is formed by giving the root lengthened grade (from PIE *o in open syllables) and adding the suffix -aya-. Roots ending in a vowel, though, would end up as **sthā-aya-, which is a rather unfortunate cluster of vowels. To remedy this, Sanskrit puts a -p- between the root and the suffix, resulting in sthāpaya- 'to cause to stand; to stop'.
Why a p? This is not at all a natural transitional consonant to put there. A y would be a lot more likely (and is quite common practice in Sanskrit). Since it cannot readily be understood on phonetic grounds, there are two other options: either the Vedic people were feeling funny and thought it would be nice to come up with a completely nonsensical transition sound, or it is archaic.
As a historical linguist, I feel compelled to further research the archaic option. Indo-European has certain elements behind certain stems called 'stem-extensions'. These are always simple consonants like *k, *p or *u. The function of these stem-extensions has always been a bit mysterious. A nice example is the root *(s)ker- 'to cut', as found in Dutch scheren 'to shave', beside *(s)ker-p-, which we find in Old English sceorfan 'to bite'.
I believe that this p that shows up in Sanskrit might give us an indication of the original function of the *p-stem-extension. Maybe originally this was a way to form causatives from verbal stems, which was later replaced by the common textbook causative formation. A nice note to add is that Anatolian is indeed unfamiliar with the textbook causative formation, so there's some indication that it's recent.
While most p-causatives in Vedic Sanskrit occur after laryngeal-final roots, there are a few verbs that show this p even though they do not end in a vowel/laryngeal. These are r̥- 'to go', ar-p-áya- 'cause to go', and kṣi- 'to dwell', kṣe-p-áya- 'cause to dwell'.
All in all, Sanskrit seems to give a strong indication that the *p-stem-extension is an old causative formation. Now we must look to see if there are any other words in other languages that seem to support this idea. Germanic *(s)ker- 'to shave/cut' ~ *(s)ker-p- 'to bite' might be seen as a reflex of this, though the difference is rather more intensive than causative.
There is lots more to say about these stem extensions, and I'm nowhere near done figuring them out. There's some really odd stuff going on with the voicing of these extensions, for example. They seem to become pre-glottalised sometimes for no apparent reason.
As a final little side-note, sthāpaya- looks surprisingly much like the Dutch verb stoppen 'to stop'. I don't buy the commonly cited Latin etymology stupere (it wouldn't explain why Dutch and English both have the vowel o rather than u, or English u and Dutch o). It can hardly be cognate either, since the vowels would be wrong, and Dutch p points to PIE *b, which is very odd to have in the first place. So until I make any significant breakthrough on this bizarre word (which even if it is from Latin has a difficult reconstruction), I'll consider it completely unrelated.
The other day I had a discussion about the Dutch verb willen 'to want'. It is a funny verb, because it formally has two past tenses: wou and wilde.
I was watching a movie in which the form wou was used in the subtitles, and the person I was watching it with pointed out that it looked silly and was incorrect. She claimed that wilde was the correct formal form. Luckily our lovely language hasn't been prescriptivised to the level that a perfectly correct form like wou is deemed incorrect, but it does show how people feel about it. Even I tend to avoid wou when writing formal letters.
The funniest thing is, wou is the historically correct form. willen belongs to a small class of funny Germanic verbs that are ja-verbs in the present, but behave as normal verbs in the preterite. So, willen goes back to *wiljan, while its past tense is a perfectly normal Germanic preterite *wal. In other words, it's a strong verb.
In general, though, ja-presents are weak verbs, while those without a suffix are strong, and this is the reason why it was changed to wilde. For example, rillen 'to shake' has a past tense rilde, from *riljan and *riliða, where in the preterite the *j-suffix shows up in its vocalised form *i. By analogy with this class of verbs, a secondary preterite of willen was easily made, making the verb regular rather than irregular.
What I find remarkable is that more 'formal' language generally tends to be a bit more archaic, but in this case people seem to prefer an analogically levelled form over a form that precedes it by well over 1000 years.
Not sure if any of you ever saw this, but there's a band called The Magnetic Fields who did a song on Ferdinand de Saussure, which is cool enough for me to justify posting it here.
She-wolves and goddesses in Sanskrit are an odd bunch. You have two types of ī-stems in Sanskrit (and also in Indo-European): the hysterodynamic and the proterodynamic ones.
vṛkī- 'she-wolf' is one of the hysterodynamic ones (which are quite rare).
nom. vṛkīs ( < *-iH-s)
acc. vṛkyam ( < *-iH-ém)
gen. vṛkyas ( < *-iH-ós )
devī- 'goddess' is proterodynamic.
nom. devī (< *-iH)
acc. devī-m (< *-iH-m)
gen. devyās (< *-iéH-s)
The most striking thing about this is that two perfectly feminine words, perfectly animate and all, have two different inflections, and on top of that, one takes the nominative marker *-s while the other doesn't.
I'm imagining that at some earlier Indo-European stage some cluster *Hs must have assimilated, or something along those lines. But I have not quite figured out how these paradigms would work pre-syncope. And rather than leaving you all in the dark, I thought I'd post this, and see if any readers have bright ideas about where the nom. sg. *-s comes from, or why it is absent.
Beekes doesn't reconstruct it for PIE as far as I can tell. But then we would have to assume quite a bizarre analogy. Any thoughts are welcome!
One of the great annoyances about the Dutch language is that the definition of the 'correct' standard language is rather different from what we actually speak. This has to do with the standardisation of the Dutch language when the first Bible translation was introduced. Dutch was morphed into some sort of mixture between Latin and Dutch, giving rise to new case forms and constructions previously unheard of in Dutch.
By now the distinction between masculine and feminine is finally on its way out, and writing cases was abolished some time ago too. Nevertheless some things persist. Some people insist on distinguishing a dative and an accusative third person pronoun, hun and hen (I'm not even sure which of the two is which). These were originally just two dialectal variants of the same word, but were assigned to two different cases to facilitate a more accurate representation of the Greek language.
Another, far more common 'correction' that is made to people's speech concerns the comparative.
In English we would write the following sentence:
He doesn't have more children than me.
'than me': it is perfectly normal to use 'me' here, which is what all other Germanic languages do, except for 'correct' Dutch. We're supposed to say:
Hij heeft niet meer kinderen dan ik.
IK, nominative! Why? Because apparently you're supposed to fill in the rest of the sentence as follows:
Hij heeft niet meer kinderen dan ik heb.
or in English: He doesn't have more children than I have.
But English has no problem changing pronominal case here, so why should we? And when we look at actual spoken Dutch, we indeed find:
Hij heeft niet meer kinderen dan mij.
Just as we would expect. I had long suspected that 'dan ik' must have been an Early Modern Dutch prescriptivist innovation, and as it turns out, I'm right. In Middle Dutch texts we find this sentence, written in 1200 AD:
Hine hadde niet meer kinder dan mi
He-NEG had NEG more children than me.
So, the 'dan ik' construction is historically wrong. This never seems to convince prescriptivists, though. And even if the construction weren't historically wrong, why would anyone say that something that 90% of the population says is 'incorrect'? By which standard are you measuring language? Isn't language defined by the people who speak it? If it isn't, then what does tell us what language is? Clearly language itself can't be used, since according to these prescriptivists it has no authority over what language is. Do they really think grammar books come falling from the sky through some divine intervention?
There's an enormous contradiction here. I believe language should be spoken the way it is spoken, not the way some 17th century theologian would like to see us speak some pseudo-Latin-Dutch hybrid monster.