The Flua Index — a language level you can say out loud

Abstract. We propose the Flua Index: the percentage of everyday spoken sentences in a language that a learner fully understands. We argue it is a better way to talk about language comprehension than CEFR letters ("I'm A2") or vocabulary counts ("I know 2,000 words"): a layperson understands it instantly, it is a continuous scale rather than six coarse bands, and above all it is checkable. We define the metric, ground the sentence-level design in the lexical-coverage literature, describe how it is estimated, and state its limitations plainly, since honesty is the entire point. (The "we" is me and an app, but every paper needs a we.)

Here is a confession that will sound like modesty and isn't: I understand about 40% of everyday spoken French.

Notice what that sentence just did. You know exactly what it means. You don't need a rubric, a framework, or a footnote. Out of the next hundred sentences of ordinary French conversation, about forty will land on me whole, and sixty will have at least one hole in them. You could — and this is the part I care about most — sit me down, play me a hundred sentences, and check.

No other way of talking about language ability survives that test. This essay is about why, and about the metric Flua now puts at the center of its progress screen: the Flua Index.

The problem with "I'm B1"

The standard answer to "how's your French?" is a CEFR level — the Council of Europe's six bands, A1 to C2. (The US government's ILR scale, 0 to 5, has the same shape and the same problems.) CEFR is genuinely useful for what it was built for: giving schools, examiners, and visa offices a shared rubric. As a way for one human to tell another human how much of a language reaches them, it fails three times over.

It's opaque. Say "I'm B1" to anyone outside the language-learning bubble and you have communicated nothing. The official descriptor — "can understand the main points of clear standard input on familiar matters" — is a sentence written by a committee, for a committee. Nobody's aunt has ever nodded at it.

It's coarse. Six bands cover the entire distance from tourist to translator, so each band swallows hundreds of hours. You can work hard for six months and still, truthfully, be "A2" — the label doesn't move even though you did. A measurement that can't register half a year of progress is a bad measurement and a worse motivator.

It's unverifiable in practice. Most CEFR levels in the wild are self-assessed, which is to say vibes. The certified ones come from an exam that measured exam-shaped skills on one afternoon, and the certificate doesn't decay even though your language does. Either way, the number can't be audited by the person you're telling it to.

The runner-up is worse: counting words

The other number people reach for is a vocabulary count: "I know 2,000 words." This one at least sounds empirical. It overstates in two compounding ways.

First, knowing a word on a flashcard list is not understanding it live. In real speech the word conjugates, contracts, elides, and arrives at speed inside grammar — list-knowledge is the entry ticket to comprehension, not comprehension. Second, comprehension is wildly non-linear in vocabulary. Word frequencies follow a Zipf distribution: the most common thousand words do enormous work, and each thousand after that does less. Two learners who "know 2,000 words" can live in different languages depending on which two thousand. A unit that is both inflated and non-linear is not a unit; it's marketing.
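
To make the non-linearity concrete, here is a rough sketch under loud assumptions: word frequency is taken to be exactly proportional to 1/rank (an idealized Zipf law), and the language is assumed to have a hypothetical 50,000-word vocabulary. Real corpora deviate from both, so the specific percentages mean little; only the shape matters.

```python
def coverage(ranks, vocab_size=50_000):
    """Share of running words covered by knowing the words at `ranks`,
    assuming frequency is proportional to 1/rank (idealized Zipf law).
    Both the exponent of 1 and the vocabulary size are assumptions."""
    ranks = set(ranks)
    if any(r < 1 or r > vocab_size for r in ranks):
        raise ValueError("ranks must lie between 1 and vocab_size")
    total = sum(1.0 / r for r in range(1, vocab_size + 1))
    return sum(1.0 / r for r in ranks) / total


# Diminishing returns: each extra thousand words adds less coverage.
for known in (1_000, 2_000, 3_000):
    print(known, round(coverage(range(1, known + 1)), 3))

# "Which two thousand" matters: the same count, very different coverage.
most_common_2000 = coverage(range(1, 2_001))
rarer_2000 = coverage(range(3_001, 5_001))
print(round(most_common_2000, 3), round(rarer_2000, 3))
```

Under these assumptions the 2,000 most common words cover many times more running speech than 2,000 words drawn from further down the list, which is the sense in which two "2,000-word" learners can live in different languages.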

The Flua Index, defined

The Flua Index is the estimated percentage of everyday spoken sentences in a language that you fully understand. Every word in that definition is a design decision:

Everyday spoken — measured against a corpus of what people actually say: conversation-register language, of the kind derived from subtitles and transcribed dialogue. Not textbook dialogues, not newspaper prose, not Proust. The corpus is the definition, so it has to be named and held fixed.
Sentences — the unit is a whole sentence, never a word count. The next section is about why this is the load-bearing choice.
Fully understand — every word in the sentence is within your reach; no gap that breaks the meaning. There is no partial credit, because partial credit is how numbers learn to flatter you.

So "I understand 40% of everyday spoken French" cashes out as: of a hundred sentences sampled from real everyday French, as they actually occur, about forty land with nothing missing.

Why whole sentences: the 95–98% problem

The counterintuitive core of the metric is that it counts sentences, not words — and here the research literature does the heavy lifting.

A consistent finding in comprehension research is that you need to know almost all the words before connected language becomes comfortable. Laufer (1989) put the minimum lexical coverage for adequate reading comprehension at 95%; Hu and Nation (2000) found that unassisted, comfortable comprehension wants about 98%; Nation (2006) worked out how much vocabulary that actually demands, and van Zeeland and Schmitt (2013) found the same threshold shape holds for listening. The precise cutoffs are debated; the shape is not.

Sit with what that means for per-word scores: "I understand 85% of the words" sounds nearly done, and is experienced as drowning. At 85% coverage roughly one word in seven is missing, so most sentences longer than a few words still have a hole in them, and the hole is frequently the word carrying the meaning. Comprehension isn't lived one word at a time. It's lived sentence by sentence: a sentence is roughly the smallest unit of speech you can actually use, whether to act on, laugh at, or answer.
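
The arithmetic behind that gap can be sketched in a few lines. The simplifying assumption, which is mine and not the literature's, is that each word in a sentence is known independently with probability equal to the per-word coverage. Real unknown words cluster by topic, so treat this as a back-of-envelope illustration rather than a model.

```python
def p_sentence_clear(word_coverage: float, length: int) -> float:
    """Chance that every word in a sentence of `length` words is known,
    assuming each word is known independently with probability
    `word_coverage` (a simplification; real gaps are not independent)."""
    if not 0.0 <= word_coverage <= 1.0:
        raise ValueError("word_coverage must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return word_coverage ** length


# Per-word coverage versus share of sentences with no hole at all.
for word_coverage in (0.85, 0.95, 0.98):
    for length in (5, 8, 12):
        share = p_sentence_clear(word_coverage, length)
        print(f"coverage {word_coverage:.0%}, {length} words: "
              f"{share:.0%} of sentences fully clear")
```

Under this assumption, 85% word coverage leaves only about a quarter of eight-word sentences fully clear, while 98% leaves the large majority clear. That is the threshold shape seen from the sentence side.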

Counting fully-understood sentences bakes the threshold literature directly into the unit. A sentence only scores when your coverage of it is total. That is why a Flua Index reads lower than the dashboards of streak apps — and why it tracks what living inside the language actually feels like. My 40% doesn't feel like "almost half fluent." It feels like catching whole exchanges when they're simple and losing the thread at a fast dinner table. Which is exactly right. An honest metric should feel true when you walk around inside it.

How Flua estimates it

Flua is a spaced-repetition system that teaches through whole sentences, which means it observes something most apps don't: the live state of every word you've met — when you saw it, whether you recalled it, how reliably, how recently. From that review history it maintains a model of which words are currently within your reach.

The index falls out of crossing that model with the everyday-speech corpus: a corpus sentence clears the bar when every word in it is within reach, and the Flua Index is the share of sentences that clear, weighted by how often such sentences actually occur in everyday speech. (The same machinery pointed at one specific book instead of the everyday corpus yields my favorite side stat — "you can currently read 61% of Le Petit Prince" — but the everyday corpus is the flagship, because everyday speech is the thing you moved countries to understand.)
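
The crossing step can be sketched as follows. This is a minimal illustration under stated assumptions, not Flua's actual code: the `CorpusSentence` type, its `weight` field, and the `flua_index` function are hypothetical names, the set of within-reach words is taken as given (how it is derived from review history is not shown), and tokenization is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusSentence:
    words: tuple[str, ...]  # already tokenized (assumed done upstream)
    weight: float           # how often this sentence occurs (hypothetical field)


def flua_index(corpus: list[CorpusSentence], within_reach: set[str]) -> float:
    """Frequency-weighted percentage of corpus sentences in which every
    word is within reach. No partial credit: one missing word and the
    sentence contributes nothing."""
    total = sum(s.weight for s in corpus)
    if total <= 0:
        raise ValueError("corpus must have positive total weight")
    cleared = sum(
        s.weight for s in corpus
        if all(word in within_reach for word in s.words)
    )
    return 100.0 * cleared / total


# Toy corpus with made-up weights, for illustration only.
toy_corpus = [
    CorpusSentence(("on", "y", "va"), 5.0),
    CorpusSentence(("je", "ne", "sais", "pas"), 3.0),
    CorpusSentence(("tu", "viens", "demain"), 2.0),
]
reach = {"on", "y", "va", "je", "ne", "sais", "pas", "tu"}
print(flua_index(toy_corpus, reach))  # the third sentence fails on two words
```

Swapping `toy_corpus` for the sentences of a single book would give the per-book figure from the same function, which is why the corpus has to be named alongside the number.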

This is an estimate, and I want to be precise about the ways it can be wrong:

It gates on words, not grammar. The model verifies lexical reach; it doesn't separately verify that you parse every construction. This is softened by the everyday register (short, grammatically plain sentences) and by the fact that you acquired each word inside full sentences to begin with — but at the margins it's a source of optimism, and we count it as one.
Card speed is not street speed. Recognizing a known word in fast connected speech (French liaison is a hazing ritual) is harder than recognizing it on a card. Early on, the index runs a little ahead of your ears; with listening exposure the two should converge. (Interestingly, van Zeeland and Schmitt found that listening may tolerate slightly lower coverage than reading, so the direction of this bias isn't as obvious as it seems.)
The corpus is the claim. 40% of everyday conversation, not 40% of legal French or of teenage banter. Change the corpus and you change the number — which is why the corpus stays fixed and named, and why the index says which French it's measuring.

I'll take a well-grounded estimate with stated error sources over a certified letter grade from one afternoon eighteen months ago. But the estimate has to keep saying it's an estimate — the moment a number like this starts rounding itself up, it's worth nothing.

What the number buys you

You can say it out loud — to your mother, your landlord, a stranger — and be understood on the first pass.
It moves. It's continuous, so a good week is visible. The difference between 38 and 43 is real and yours; the difference between "A2" and "A2" is six months of invisible work.
It can be checked. Sample real sentences, count what lands. A number you can audit is a number you can trust — and the only kind worth publishing next to your name.
It compares. Same corpus recipe, same meaning for the number, across learners and across languages.
It measures the goal, not a proxy. Not exam performance, not streaks, not XP — the actual fraction of the actual language that actually reaches you.
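
The "sample and count" check is itself a measurement with sampling error, so it helps to see how tight it is. The sketch below uses a Wilson score interval, which is my choice of method and not something Flua specifies, and it assumes the test sentences were drawn at random from the same everyday corpus.

```python
import math


def wilson_interval(landed: int, sampled: int, z: float = 1.96):
    """Approximate 95% interval (for z = 1.96) on the true share of
    sentences understood, given `landed` fully understood sentences
    out of `sampled` randomly drawn ones."""
    if sampled <= 0 or not 0 <= landed <= sampled:
        raise ValueError("need 0 <= landed <= sampled and sampled > 0")
    p = landed / sampled
    denom = 1.0 + z * z / sampled
    center = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled
                         + z * z / (4 * sampled ** 2)) / denom
    return center - half, center + half


low, high = wilson_interval(landed=40, sampled=100)
print(f"40 of 100 landed: plausible true share {low:.0%} to {high:.0%}")
```

A hundred sentences is enough to confirm "about 40%" against someone claiming 70%, but not enough to tell 38 from 43; auditing a five-point move would need a much larger sample.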

The flip side is the feature: it will be lower than your streak app has been telling you. A number that can't disappoint you can't inform you either.

Say the true number out loud

CEFR asks where you rank. A vocab counter asks how much you've collected. The Flua Index asks the only question the dinner table asks: how much of this reaches you, right now? One is a caste system, one is a trophy shelf, and one is a progress bar.

I'm at 40. The gap between 40 and 100 isn't an embarrassment to be rounded away — it's the roadmap, and I get to watch it close in single-digit increments instead of waiting years for a letter to change. So when someone asks how my French is going, I don't say "A2," and I don't say "2,000 words." I say I understand about 40% of everyday spoken French — and by the time you read this, that sentence should already be slightly out of date.

References

Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? In C. Laurén & M. Nordman (Eds.), Special Language: From Humans Thinking to Thinking Machines (pp. 316–323). Multilingual Matters.
Hu, M., & Nation, I. S. P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403–430.
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59–82.
van Zeeland, H., & Schmitt, N. (2013). Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension? Applied Linguistics, 34(4), 457–479.
Council of Europe (2001). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge University Press.
Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley.