How Many Words Do You Need To Learn To Speak A Language?

Learning a language, at least in its initial phase, is all about learning new vocabulary. And one of the first questions you may want to ask yourself to cut this overwhelming task down to size is: how many words do you really need to learn in order to understand and speak a foreign language? The answer may not be as formidable as you think.

How many words are there in a language?

Counting the number of words in a language is a pain in the butt.

First of all, every dictionary has a different number of words. For example, Merriam-Webster has proudly announced that it has 470,000 entries, while the Oxford English Dictionary seems to have "just" 171,476 words. However, both insist that English has around a million distinct words.

"Why aren't they in dictionaries?" you may ask. The answer is simple: these words include all sorts of slang, dialectisms (region-specific words), technical terms, and names of chemical compounds such as benzo[a]pyrene and methylene diphenyl diisocyanate, which you probably don't want to hear about.

The same applies to other languages. The Trésor de la langue française lists 539,413 distinct entries for French. However, the cunning Trésor actually includes a bunch of inflected forms like "belle/belles," which certainly don't add anything to the meaning. Nevertheless, the estimated number of French words is around 1,200,000. The same goes for Spanish: the Diccionario de la lengua española has just 88,000 entries, while the real spoken language out there in the whole Hispanosphere is ten times richer. So let's give up on counting.

So, how many words do you need for your everyday linguistic needs? No more than a few thousand.

Even Shakespeare used no more than 31,534 different words in all his plays and poems.

So you're fine.

How many words you need to speak some French: a case study

The study in question analyzed a corpus of 312,000 words of spoken French. Is that a lot or not? (For a quick reference, that's roughly two-thirds the length of The Lord of the Rings.)

In any case, the study revealed that just 8,000 of those 312,000 words were different. And more than half of these 8,000 words (4,564) came up no more than three times.

But, hey, that's only half of the story.

On the other hand, the 38 most frequent words showed up exactly 157,943 times. In other words, 50.6% of the French speech in the corpus consisted of just 38 words. Here's the famous list:

The 38 most frequent words in the French language, from L'Élaboration du français fondamental (Dedier, 1964)

As you can see, there's not much meaning in any of these 38 words. Personally, I see a bunch of morphemes: prepositions, pronouns, and a couple of verbs. Just dry, sheer grammar.

Still, a round of applause to Dr. Dedier (1964) for this meticulous work.

What's important, however, is what we can get out of this data. Here are two conclusions:

  1. You need just 8,000 words to keep going for days and nights. Come on, 312,000 words is roughly a 30-hour audiobook.

  2. Just 38 words make up half of everything you say.
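Both conclusions rest on the difference between tokens (running words, the 312,000) and types (different words, the 8,000). As a rough illustration, here's a minimal Python sketch that counts both, measures the share of speech covered by the top-N words, and counts rare words. The function names and the toy sample sentence are made up for this example; the sentence is not data from the study.

```python
from collections import Counter


def coverage_of_top_words(tokens: list[str], top_n: int) -> float:
    """Share of running words (tokens) accounted for by the top_n most frequent word types."""
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


def rare_types(tokens: list[str], max_count: int = 3) -> int:
    """Number of word types that occur no more than max_count times."""
    return sum(1 for count in Counter(tokens).values() if count <= max_count)


# Hypothetical toy sample (not from the study); tokens are counted exactly as written.
sample = "je ne sais pas si tu sais ce que je veux dire mais je pense que oui".split()

print(len(sample))                                  # tokens: 17
print(len(set(sample)))                             # types: 13
print(round(coverage_of_top_words(sample, 3), 3))   # top 3 types ("je", "sais", "que"): 0.412
print(rare_types(sample, max_count=1))              # types seen only once: 10

# The same ratio applied to the corpus figures quoted above:
print(round(157_943 / 312_000 * 100, 1))            # 50.6
```

Swap `sample` for a real tokenized transcript and `coverage_of_top_words(tokens, 38)` yields the same kind of figure as the 50.6% above. Note that this sketch counts surface forms as-is, so "sais" and "sait" would be two different words; a real study has to decide how to handle inflections.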


What is the core vocabulary?

Another French linguist, Paul Rivenc, conducted a similar study of (again) spoken French, but he presented his findings in a more beautiful way:

This "sun" is basically a representation of a language from the lexical point of view. Vocabulary was found to consist of five major parts:

Morphemic nucleus

As the previous study showed, the language nucleus consists of a very small number of grammatical words: determiners (a, the), auxiliary verbs (have, be, do), pronouns of all sorts (I, you, his, mine), prepositions (in, at), and so on. These words show up in every single sentence because you can't build a grammatical sentence without them.

However, even a hundred of these nucleus morphemes won't let you express a meaningful thought. Without an inflow of fresh words, you will hit the ceiling fairly soon: "This is mine, not yours. Got it?" is about as far as they take you.

Core vocabulary

Only 1000-2000 words show up frequently enough to be counted as the core of a language.

However, not all high-frequency words are "service" ones. Those notorious grammatical words make up just 25% of the core. The rest is distributed among nouns (40%), verbs (22%), and adjectives (11%). These words are indispensable: you won't get through a day without using them.

Core vocabulary consists of simply popular, useful words that appear a lot in our speech. They are not specific at all. They refer only to very general, vague ideas like "to like", "to want", "to have", "to come", or "to work", or, in the case of nouns, to fairly common things like "day", "thing", "home", "water", "life", or "people". People are a fairly common thing in this life, aren't they?

By the way, all the nasty stuff, like irregular verbs and the remnants of old declensions, shows up at this level.

So if something is irregular, damn, you have to learn it.

Frequent vocabulary (Rays)

Here, things become complicated.

"Ray" vocabulary slips from our mouths all the time, but... but it's domain-specific. Do you know how many words you need, for example, to ask for directions, to buy stuff in a grocery store, or to order something edible in a foreign language? The range of things you have to know how to name is really immense:

  • Types of transport: a car, a bus, a taxi;

  • Directions: left, right, and a Toronto-specific "North-West";

  • Time: tomorrow, Saturday, in a week;

  • Edible stuff: 1 kg of tomatoes, chicken breasts, ice-cream;

  • Restaurant-specific: please, vegetarian, deep-fried.

And so on; just think about it! Of course, you don't need to know how to say "vegetarian" in French when you're lost in Lyon. But the time will come, my friend, when you get hungry.

The "ray" vocab penetrates every possible domain of our lives, and it's very easy to have gaps at this level. Why? Because it's estimated to comprise around 4,000-5,000 words, and that's a lot when you're a beginner.

Semi-specialized vocabulary

"A man with a scant vocabulary will almost certainly be a weak thinker. The richer and more copious one’s vocabulary and the greater one’s awareness of fine distinctions and subtle nuances of meaning, the more fertile and precise is likely to be one’s thinking. Knowledge of things and knowledge of the words for them grow together. If you do not know the words, you can hardly know the thing."

― Henry Hazlitt, Thinking as a Science

That’s the central problem for all language learners. Basic vocabulary doesn't give us much flexibility of expression, and more refined vocabulary takes a lot of time to learn. Meanwhile, some native speakers look at us as if we were funny creatures with undeveloped speech. Not all of them, thankfully.

In any case, if you want to pass for a learned person in a foreign country, you have to use high-end words and expressions. This level consists of around 25,000 words that mostly duplicate meanings your basic vocabulary already covers: all those synonyms of "good" and "bad", the 79th and 132nd meanings of "to go", Latin derivations, and so on.

As you can guess, the only way to memorize all this vocabulary is to read, and read a lot.

Technical vocabulary

However, high in the sky, there's another level, and most of us never reach it, even in our native language.

These other 200,000-300,000 highly technical, or scientific, words are so domain-specific that they almost never overlap. Linguists attack you with formidable terms like "suprasegmental properties" or "degemination"; mathematicians, with "manifold" and "quaternions"; nuclear scientists, with "antineutrino" and "photomultipliers". I hope that's enough.

There is absolutely no chance for this vocabulary to be used in everyday speech. Still, you will come across it in technical books on the relevant subject.

How many words do you need for a new language?

You can't learn them all; that much is evident.

We don't even know 50% of the words in our mother tongue, let alone in a foreign language. Tools like Test Your Vocab illustrate this nicely. And that's fine.

The 80/20 Principle

Learning a language is not just cramming new words for the rest of your life. As the Pareto principle goes, "20% of efforts give 80% of results," so the smart thing to do is to focus on that productive 20% instead of wasting time on the remaining 80% of effort, which brings only 20% of the results. And since an average adult native speaker knows around 20,000-35,000 words, your 20% target would be 4,000-7,000 words.

If we refer back to Rivenc’s vocabulary sun, we see that the core and frequent vocabulary together comprise around 5,000-7,000 words. These are the words you want to concentrate on.

General Service Lists

So, you've found out how many words you need to learn. Now, how do you do this?

You have to know your enemy: you can't learn thousands of words without knowing which words they are. And the perfect tool here is the General Service List. Briefly, the GSL is a list of the 2,000 most frequent words in English, compiled by Dr. West in 1953.

Wow, that’s an old one, you might say. And I would totally agree, as would hundreds of linguists and lexicostatisticians. Fortunately, a number of people went beyond simply saying that West’s list is outdated.

Dr. Charles Browne and his team published an updated (and slightly expanded) version of the GSL in 2013. This New General Service List includes 2,800 words carefully selected from an immense 273-million-word written corpus of English, and these words account for about 92% of the words in typical English texts (on general topics, of course).

Go multilingual

I can sense your question already.

“But I don’t need a list of English words, I need one for [insert your target language here]!!!”

In that case, check out the blog of Hermit Dave, who created frequency word lists for 39 languages based on the OpenSubtitles.org database. Most lists include more than 50,000 words; the Russian one, for example, contains 450,029 unique words. And since the data comes from subtitles, the lists reflect something close to authentic spoken language.
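Once you have one of these lists, a natural next question is how many of its top words you need to cover, say, 80% or 95% of everything said. Here's a minimal Python sketch for that. It assumes each line of the file holds a word and its count separated by whitespace (for example, "de 500"); check the actual files before relying on that format. The function names and the tiny sample list are hypothetical.

```python
import io
from collections.abc import Iterable


def parse_frequency_list(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Parse lines assumed to look like 'word count' into (word, count) pairs, most frequent first."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        pairs.append((parts[0], int(parts[1])))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs


def words_needed(pairs: list[tuple[str, int]], target: float) -> int:
    """Smallest number of top-ranked words whose combined counts reach `target` (0 to 1) of all tokens."""
    total = sum(count for _, count in pairs)
    running = 0
    for rank, (_, count) in enumerate(pairs, start=1):
        running += count
        if running / total >= target:
            return rank
    return len(pairs)


# Hypothetical mini list in the assumed format; real lists run to tens of thousands of lines.
toy_file = io.StringIO("de 500\nla 300\net 100\nchat 60\nmaison 40\n")
pairs = parse_frequency_list(toy_file)

print(words_needed(pairs, 0.80))  # 2  ("de" + "la" = 800 of 1,000 tokens)
print(words_needed(pairs, 0.95))  # 4  (960 of 1,000 tokens)
```

To run it on a downloaded list, pass an open file object instead of `toy_file`, e.g. `with open(path, encoding="utf-8") as f: pairs = parse_frequency_list(f)`. One caveat: coverage is measured against the total of the list itself, so a truncated list (say, only the top 50,000 words) slightly overstates coverage compared with the full subtitle corpus.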

GSL's 2,000 words fall well short of the number of words you need to function in real life. Nevertheless, you won’t be able to build a wide vocabulary without this foundation, so I recommend starting here. As you progress, you'll naturally pick up new vocabulary through reading, listening, and real-time conversations.

