American in Spain

How complicated is your language?

January 26, 2008

Like living abroad, learning a second language leads to a lot of introspection about your own language: the rules that govern it (or don't), and how and why those rules came to be. Also, as a member of a bilingual household, I find it interesting which words and phrases we typically say in one language or the other. Some phrases are just easier and shorter in one language than in the other.

I recently asked one of my blogrollmates, who earns her living translating books from French to English, which language requires more words and pages to convey the same information. Her answer was that a French document will have about 30% more words than its English counterpart. I suspect the same is true for Spanish, although most of the extra words will be small function words (prepositions, articles, and object pronouns such as de, la, a, se, lo, te, me, nos) that English often does without. German, on the other hand, might have far fewer words but many more letters, thanks to its long compound nouns.

I think it would be interesting to see some real data on this topic. Surely all languages could be ranked by terseness, expressiveness, succinctness, or whatever you want to call it. Perhaps such data could be gleaned by analyzing translated literature from Project Gutenberg or a similar collection.
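As a rough sketch of how that ranking might work, assuming you already have the same passage in several languages as plain strings (for example, pulled by hand from parallel translations), you could count words and letters in each and compare them to the English version. The language keys ("en", "es") and the word-splitting regex are my own simplifying assumptions, not a real linguistic definition of a "word." Reporting both a word ratio and a letter ratio matters because, as with German compounds, a language can use fewer words but more letters.

```python
import re
from dataclasses import dataclass

# Assumption: a "word" is a run of Unicode letters, optionally joined by
# ASCII apostrophes (so "friend's" counts as one word).
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


@dataclass
class Stats:
    words: int
    letters: int


def text_stats(text: str) -> Stats:
    """Count words (per WORD_RE) and alphabetic characters in a text."""
    words = WORD_RE.findall(text)
    letters = sum(ch.isalpha() for ch in text)
    return Stats(words=len(words), letters=letters)


def terseness_ratios(translations: dict[str, str],
                     baseline: str = "en") -> dict[str, tuple[float, float]]:
    """Return (word ratio, letter ratio) of each text relative to the baseline.

    A ratio above 1.0 means that translation is wordier (or longer in
    letters) than the baseline version of the same passage.
    """
    base = text_stats(translations[baseline])
    ratios = {}
    for lang, text in translations.items():
        s = text_stats(text)
        ratios[lang] = (s.words / base.words, s.letters / base.letters)
    return ratios


# Toy example with a single phrase; a real comparison would need whole
# aligned books, not one sentence.
ratios = terseness_ratios({
    "en": "My friend's house.",
    "es": "La casa de mi amigo.",
})
# Rank languages from tersest to wordiest by word ratio.
ranking = sorted(ratios, key=lambda lang: ratios[lang][0])
print(ratios)
print(ranking)
```

With a single toy phrase the numbers mean little; the interesting results would only come from running this over entire translated books.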

The other day I was installing Microsoft Word 2008. Since I had recently been pondering this stuff, one part of the installation caught my eye: choosing which "proofing tools" (spell checker, thesaurus, etc.) to install. Check out the enormous differences in how much space these dictionaries take up. Surely there must be some correlation between the number of mebibytes needed to check a language's spelling and grammar and that language's general complexity. Obviously Microsoft will have spent more time and effort getting some languages right than others, but I think the size works nicely as a rough measure of spelling and grammatical complexity.

Proofing tool sizes for various languages

It turns out that German is the most complex and Portuguese is the simplest European language.

...at least to the extent that this is a decent measure.