Examples — AI as Model Stress-Tester

Each walkthrough below shows the complete Model Stress-Tester method. The student describes their mental model of how something works, AI presents scenarios that test where the model breaks, the student works through each scenario to identify the weakness, and then revises the model. The key principle: AI does not replace your model with a better one. It attacks your model so you can reinforce it yourself. The cognitive work of revision is where the learning lives.

Subject 01 · Mathematics

Mathematics — Probability

Stress-Testing a Model of Probability

The student has studied basic probability — coin flips, dice rolls, independent events, and simple calculations. They feel they have a good intuitive grasp. They describe their mental model to AI and ask it to find scenarios where the model breaks.

Stress-Test Session — Probability

Student

Here's my mental model of how probability works:

Probability is the chance that something will happen, expressed as a number between 0 and 1. You calculate it by dividing the number of favorable outcomes by the total number of possible outcomes. For example, the probability of rolling a 3 on a die is 1/6 because there's 1 favorable outcome out of 6 total.

If two events are independent — like flipping a coin twice — you multiply their probabilities. So the chance of getting heads twice in a row is 1/2 × 1/2 = 1/4.

If events are mutually exclusive — like rolling a 3 OR a 5 — you add the probabilities. So 1/6 + 1/6 = 2/6 = 1/3.

Please stress-test this model. Give me scenarios where it would make an incorrect prediction or fail to account for something important. Don't give me the answers — just describe the scenarios and let me work through them.
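The three rules in the student's starting model can be checked by brute-force enumeration of small sample spaces. The sketch below is illustrative only: the helper name `probability` is hypothetical, and it assumes every outcome in the sample space is equally likely, which is exactly the assumption the student's model relies on.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """Favorable outcomes divided by total outcomes (equally likely outcomes assumed)."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


die = [1, 2, 3, 4, 5, 6]
two_flips = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')

# Rule 1: favorable over total.
p_three = probability(lambda roll: roll == 3, die)

# Rule 2: independent events multiply (1/2 * 1/2).
p_two_heads = probability(lambda flips: flips == ("H", "H"), two_flips)

# Rule 3: mutually exclusive events add (1/6 + 1/6).
p_three_or_five = probability(lambda roll: roll in (3, 5), die)

print(p_three, p_two_heads, p_three_or_five)  # 1/6 1/4 1/3
```

Enumeration agrees with the multiplication and addition rules here because the outcomes are equally likely and the events are independent or mutually exclusive. The stress-test scenarios probe what happens when those conditions do not hold.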

AI — Stress-Test Scenarios

Scenario 1: A disease affects 1 in 1,000 people. A test for the disease is 99% accurate, meaning it correctly returns a positive result for 99% of people who have the disease and correctly returns a negative result for 99% of people who do not. A person tests positive. Using your model, what is the probability that they actually have the disease?

Scenario 2: You're on a game show. There are three doors. Behind one is a car; behind the other two are goats. You pick Door 1. The host — who knows what's behind every door — opens Door 3 to reveal a goat. The host then asks: do you want to switch to Door 2? Using your model, should you switch? Does switching change your probability of winning?

Scenario 3: A couple has had four daughters in a row. They are expecting a fifth child. A friend says: "They're due for a boy — after four girls, the probability of a boy must be higher than 50%." Using your model, is the friend correct?

Working Through the Scenarios

The student attempts each scenario using their current model. This is the critical step — the effortful attempt to apply the model is where the model's weaknesses become visible to the student, not just to the teacher.

Student Works Through Scenarios

Student

Scenario 1: My first instinct is to say 99%, because the test is 99% accurate. But let me think more carefully... If 1 in 1,000 people has the disease, and I test 1,000 people: about 1 person actually has the disease and would test positive. But of the 999 healthy people, 1% would get a false positive — that's about 10 people. So roughly 11 people test positive, but only 1 actually has the disease. The probability of actually having the disease given a positive test is only about 1/11, which is roughly 9%.

My model completely failed here. It doesn't account for base rates — the fact that the disease is rare means false positives dominate. My model has no concept of conditional probability or Bayes' theorem.
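As a check on the arithmetic above, the same reasoning can be written as a direct application of Bayes' theorem. This is a minimal sketch: the function name `posterior_given_positive` and its parameter names are illustrative, and it assumes the 99% figure applies both to people who have the disease and to people who do not, as the scenario states.

```python
from fractions import Fraction


def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


posterior = posterior_given_positive(
    prevalence=Fraction(1, 1000),   # 1 in 1,000 people has the disease
    sensitivity=Fraction(99, 100),  # positive for 99% of people with the disease
    specificity=Fraction(99, 100),  # negative for 99% of people without it
)

print(posterior)                   # 11/122
print(round(float(posterior), 4))  # 0.0902
```

The exact value, 11/122, matches the student's rounded estimate of about 1/11, or roughly 9%.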

Scenario 2: My initial model says it doesn't matter: there are two doors left, so it's 50/50 either way. But... when I picked Door 1, I had a 1/3 chance of being right and a 2/3 chance of being wrong. The host opening Door 3 doesn't change the 1/3 chance that my first pick was right. It takes the 2/3 chance that the car is behind one of the other two doors and concentrates all of it on Door 2. So switching gives me a 2/3 chance of winning, and staying gives me 1/3. I should switch.

My model failed because it assumed that the host's action reset the probabilities. It didn't account for the fact that the host's choice was constrained — the host knew what was behind the doors and was forced to reveal a goat, which is new information that changes the distribution.
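The 1/3 versus 2/3 split can be confirmed by enumerating every case rather than trusting intuition. The sketch below makes two assumptions that the scenario leaves open: the player's first pick is treated as uniformly random (the scenario fixes Door 1, which is equivalent by symmetry), and when the host has two goat doors to choose from, the host picks between them with equal probability. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def monty_hall_win_probabilities():
    """Return (P(win if staying), P(win if switching)) by exact enumeration."""
    stay_wins = Fraction(0)
    switch_wins = Fraction(0)
    for car, pick in product(DOORS, repeat=2):
        weight = Fraction(1, 9)  # each (car, pick) pair is equally likely
        # The host must open a door that is neither the player's pick nor the car.
        host_options = [door for door in DOORS if door != pick and door != car]
        for opened in host_options:
            branch = weight / len(host_options)
            # Switching means taking the one door that is neither picked nor opened.
            switched = next(door for door in DOORS if door not in (pick, opened))
            if pick == car:
                stay_wins += branch
            if switched == car:
                switch_wins += branch
    return stay_wins, switch_wins


stay, switch = monty_hall_win_probabilities()
print(stay, switch)  # 1/3 2/3
```

The constraint on `host_options` is the piece the student's original model lacked: the host never opens the car door, so the opened door carries information.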

Scenario 3: This one I think I can handle. Each birth is an independent event. Like a coin, a birth doesn't remember what happened before. The probability of a boy is still roughly 50%, regardless of the previous four daughters. The friend is committing the gambler's fallacy: the belief that past outcomes influence future independent events.

My model actually handles this one correctly, because I have the concept of independence. But it only works because this scenario fits neatly into my model. The first two scenarios showed that my model breaks whenever events are not independent or when new information changes the probabilities.
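The independence argument in Scenario 3 can also be checked by enumeration. The sketch below is an idealization: it assumes each birth is a girl or a boy with exactly equal probability and that births are independent, whereas the text says only "roughly 50%". The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def p_boy_after_four_girls():
    """P(fifth child is a boy | first four are girls), with equally likely, independent births."""
    families = list(product("GB", repeat=5))  # all 32 equally likely birth orders
    four_girls_first = [f for f in families if f[:4] == ("G", "G", "G", "G")]
    boy_fifth = [f for f in four_girls_first if f[4] == "B"]
    return Fraction(len(boy_fifth), len(four_girls_first))


print(p_boy_after_four_girls())  # 1/2
```

Conditioning on four girls leaves exactly two equally likely families, GGGGG and GGGGB, so the earlier births carry no information about the fifth.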

Revise and Re-Test

The student now updates their mental model to account for the weaknesses the stress-test revealed, then describes the updated model.

Revised Model

Student — Revised Model

My original model covers basic probability — favorable outcomes over total outcomes, independence, and mutual exclusivity. That's the foundation, but it's only adequate for simple scenarios.

My model now needs to include:

Conditional probability: the probability of an event given that another event has occurred, which can be radically different from the unconditional probability. The disease-test scenario showed that a 99% accurate test can produce a positive result that only carries a 9% chance of being true, because the base rate of the disease is so low.

Bayesian reasoning: updating probabilities when new information arrives, rather than treating each calculation as if it starts from scratch. The Monty Hall problem showed that new information (the host's choice) changes the probability distribution even though no physical event changed.

And my model correctly handles the gambler's fallacy through the independence concept, but I need to be more careful about when independence actually holds and when it doesn't.

What This Example Demonstrates

The student's original model was correct for the class of problems it was designed for: basic probability with independent events and equally likely outcomes. AI's stress-test scenarios were chosen specifically to target the model's boundaries: conditional probability, information updating, and the gambler's fallacy. Two of the three scenarios broke the model; the third showed that it still holds where independence genuinely applies. The student now holds a model that is not just patched but structurally expanded: they know where basic probability applies and where more sophisticated tools (Bayesian reasoning, conditional probability) are required. That awareness of a model's boundaries is itself one of the most valuable kinds of knowledge.

Subject 02 · Biology

Biology — Ecology

Stress-Testing a Model of Ecosystem Balance

The student has studied ecosystems — food chains, predator-prey relationships, and the idea that ecosystems tend toward balance. They describe their mental model and ask AI to stress-test it.

Stress-Test Session — Ecosystem Balance

Student

My mental model of how ecosystems maintain balance:

Ecosystems are self-regulating. Predators keep prey populations in check, and prey availability controls predator populations. If rabbits increase, foxes have more food and increase too, which brings rabbit numbers back down. This creates a natural balance — populations fluctuate, but the system stays stable over time.

Energy flows from the sun through producers (plants) to consumers (herbivores, then carnivores). Each level has fewer organisms because energy is lost at each transfer. This pyramid structure keeps the ecosystem balanced.

Stress-test this model. Give me scenarios where it fails — don't give answers, just the scenarios.

AI — Stress-Test Scenarios

Scenario 1: In 1995, wolves were reintroduced to Yellowstone National Park after being absent for 70 years. Your model would predict that the wolves simply reduced the elk population (their main prey). But widely cited accounts describe effects that went far beyond predator-prey dynamics, including changes in the course of the park's rivers. Apply your model: how could adding wolves to an ecosystem change the behavior of rivers?

Scenario 2: Starfish are predators in tidal ecosystems. Your model suggests removing a predator would allow prey populations to increase, which sounds like a simple rebalancing. But when researchers removed starfish from a stretch of coastline, the entire ecosystem collapsed — the number of species dropped from 15 to 8 within a year. Your model predicts more prey. The reality was fewer species overall. What does your model miss?

Scenario 3: Invasive species — like the cane toad in Australia or the zebra mussel in the Great Lakes — often enter an ecosystem and devastate it, with populations exploding uncontrollably. Your model says ecosystems are self-regulating. Why doesn't the self-regulation work for invasive species? What does their success reveal about the limits of "balance"?

Student Works Through Scenarios

Student

Scenario 1: Wolves eat elk — that's the direct effect my model predicts. But wolves might also change elk behavior — elk might avoid certain areas where wolves hunt, like riverbanks. If elk stop grazing near rivers, vegetation might grow back along the banks. More vegetation means stronger root systems holding soil in place, which means less erosion, which could literally change how rivers flow. So the wolves didn't just reduce elk numbers — they changed elk behavior, which changed vegetation patterns, which changed the physical landscape. My model only accounts for population numbers, not behavioral effects. The actual system has cascading indirect effects that flow through the ecosystem in ways a simple predator-prey model doesn't predict.

Scenario 2: My model says removing a predator lets prey increase. But if the starfish was eating the most dominant competitor species — keeping it in check — then removing the starfish lets that dominant species take over and outcompete everything else. The starfish wasn't just controlling prey numbers; it was maintaining diversity by preventing any single species from monopolizing the resources. My model doesn't account for keystone species — organisms whose influence is disproportionate to their population size because they regulate the competitive balance of the entire community.

Scenario 3: Self-regulation works because predators, parasites, competitors, and diseases have co-evolved with the prey species over long periods. The checks and balances are the product of evolutionary history. An invasive species arrives without any of those co-evolved controls. Nothing has adapted to eat it, compete with it, or parasitize it. My model assumes that the regulatory mechanisms are always present. They aren't — they're the product of specific evolutionary relationships, and when a species enters a system where those relationships don't exist, the regulation fails. "Balance" isn't a property of ecosystems in general — it's the product of specific, evolved, and sometimes fragile relationships.

Revised Model

Student — Revised Model

My original model was too simple: ecosystems are not just food chains with numerical balance. They are webs of direct and indirect interactions — including behavioral effects and trophic cascades that can reshape physical landscapes. Some species (keystone species) have disproportionate influence on ecosystem structure, and their removal can collapse diversity rather than simply shifting numbers. And "balance" is not an inherent property of ecosystems — it is the product of co-evolved relationships that developed over evolutionary time. When those relationships are disrupted (by removal of keystones or introduction of invasive species), the self-regulation my model assumed simply does not function. Ecosystems can be resilient, but they are not automatically self-correcting.

What This Example Demonstrates

The student entered with the most common textbook model of ecosystems — populations regulate each other through predator-prey dynamics, and the system tends toward balance. AI's scenarios targeted three specific weaknesses: the absence of indirect/behavioral effects (trophic cascades), the absence of disproportionate species influence (keystone species), and the assumption that balance is inherent rather than co-evolved (invasive species). Each scenario forced the student to discover a qualitatively new dimension of ecological thinking that their original model could not account for. The revised model is not just bigger — it is structurally different, treating ecosystems as complex webs of evolved relationships rather than simple mechanical balances.
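The feedback loop in the student's starting model (more rabbits, then more foxes, then fewer rabbits) resembles the classic Lotka-Volterra predator-prey equations. The text does not name that model; it is brought in here purely as an illustration, and every parameter value below is an arbitrary assumption rather than ecological data. The sketch integrates the equations with a hand-written fourth-order Runge-Kutta step.

```python
def lotka_volterra_step(prey, predators, dt,
                        growth=1.0, predation=0.1,
                        conversion=0.075, mortality=1.5):
    """Advance the Lotka-Volterra equations by one RK4 step of size dt."""
    def rates(x, y):
        # Prey grow on their own and are eaten; predators grow by eating and die off.
        return (growth * x - predation * x * y,
                conversion * x * y - mortality * y)

    k1 = rates(prey, predators)
    k2 = rates(prey + 0.5 * dt * k1[0], predators + 0.5 * dt * k1[1])
    k3 = rates(prey + 0.5 * dt * k2[0], predators + 0.5 * dt * k2[1])
    k4 = rates(prey + dt * k3[0], predators + dt * k3[1])
    new_prey = prey + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    new_predators = predators + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return new_prey, new_predators


def simulate(prey=10.0, predators=5.0, dt=0.01, steps=3000):
    """Return the list of (prey, predators) states, starting state included."""
    history = [(prey, predators)]
    for _ in range(steps):
        prey, predators = lotka_volterra_step(prey, predators, dt)
        history.append((prey, predators))
    return history


history = simulate()
prey_values = [state[0] for state in history]
predator_values = [state[1] for state in history]

# With these parameters the equilibrium is prey = 1.5 / 0.075 = 20, predators = 1.0 / 0.1 = 10.
print(min(prey_values) < 20 < max(prey_values))          # True
print(min(predator_values) < 10 < max(predator_values))  # True
```

Both populations cycle around the equilibrium, which is the "fluctuating but stable" picture the student began with. Everything the three scenarios exposed (behavioral effects, keystone species, the absence of co-evolved controls) lies outside these two equations.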

Subject 03 · History

History — Comparative History

Stress-Testing a Model of How Revolutions Happen

The student has studied the French Revolution in detail and has developed a mental model of how revolutions happen based on that case study. They describe their model to AI for stress-testing.

Stress-Test Session — Revolutions

Student

My model of how revolutions happen, based on studying the French Revolution:

Revolutions occur when: (1) the existing government is oppressive and out of touch with the people, (2) there is severe economic hardship that affects the common people, (3) there is a large inequality gap between the ruling class and everyone else, (4) new ideas about rights and governance spread and inspire people to demand change, and (5) the government is too weak or rigid to reform itself in time.

The pattern is: oppression + economic crisis + new ideas + government weakness = revolution. The revolution itself involves the masses rising up, overthrowing the old order, and establishing a new system based on the new ideas.

Stress-test this model. Find historical cases where it makes the wrong prediction.

AI — Stress-Test Scenarios

Scenario 1: The Glorious Revolution of 1688 in England. A king was removed from power and replaced by a new monarch with a completely different system of governance — parliamentary supremacy was established, fundamentally changing the political structure. But there was no mass uprising, no storming of fortresses, and minimal violence. Apply your model: does it predict this kind of revolution?

Scenario 2: The Meiji Restoration of 1868 in Japan. The centuries-old shogunate was overthrown and Japan transformed from a feudal society into a modern industrial state within a single generation — one of the most dramatic political and social transformations in history. But this revolution was led from above by a faction of the ruling class, not by the oppressed masses. Does your model account for an elite-driven revolution?

Scenario 3: North Korea today. By your model's criteria (oppressive government, severe economic hardship, massive inequality between the ruling class and everyone else, a government too rigid to reform itself), a revolution should have occurred long ago. Why hasn't it? What does your model fail to account for?

Student Works Through Scenarios

Student

Scenario 1: My model predicts revolution as a mass uprising — crowds in the streets, violence, the common people overthrowing the elite. The Glorious Revolution doesn't fit at all. It was orchestrated by elites — by Parliament and powerful nobles who invited a foreign monarch to replace the existing one. The "masses" were largely uninvolved. So my model is built on one type of revolution (popular uprising) and doesn't account for elite coups or negotiated transfers of power that are still revolutionary in their outcome — fundamentally changing the system of governance.

Scenario 2: The Meiji Restoration is even more challenging for my model. The revolution wasn't driven by the oppressed — it was driven by a faction within the ruling class who saw that Japan needed to modernize to avoid being colonized by Western powers. Their motivation wasn't equality or rights — it was national survival and competitive strength. My model assumes revolutions come from below, driven by the suffering of the masses. The Meiji case shows that revolutions can come from above, driven by a strategic vision of the elite. The outcome was still transformative, but the mechanism is completely different from my French Revolution template.

Scenario 3: This is where my model fails most clearly. Nearly all the conditions I listed are present in North Korea: oppression, economic hardship, inequality, government rigidity. But no revolution has occurred. What's missing? I think it's the capacity for collective action. The North Korean state controls information so completely that people cannot organize, cannot access alternative ideas, cannot communicate freely, and face immediate severe punishment for any dissent. My model lists the conditions but doesn't account for the state's capacity to suppress the very mechanisms through which revolution becomes possible: communication, organization, and the spread of alternative ideas. A government that is strong enough in its repressive capacity can prevent revolution indefinitely, even when the other conditions are present.

Revised Model

Student — Revised Model

My original model was built from a single case study and assumed that all revolutions follow the French pattern — mass popular uprising driven by oppression, economic crisis, new ideas, and government weakness. The stress-test revealed three critical limitations:

Revolutions can be elite-driven (Glorious Revolution, Meiji Restoration), not just popular uprisings. The mechanism and the motivation can be entirely different — strategic modernization or factional power transfer rather than popular liberation.

The conditions in my model are not sufficient on their own. Oppression and economic hardship do not automatically produce revolution. Revolution also requires the capacity for collective action (communication, organization, access to alternative ideas), which a sufficiently repressive state can prevent.

My model was a template built from one example and applied universally. A better model treats revolutions as a category with significant internal variation — united by the outcome (fundamental change in governance) but diverse in mechanism, motivation, and participants. The question is not just "do the conditions exist?" but "who has the capacity and motivation to act on them, and through what mechanism?"

What This Example Demonstrates

The student committed a common intellectual error: building a general model from a single case study. Their model of "how revolutions happen" was really a model of "how the French Revolution happened," dressed up as a universal theory. AI's stress-test selected historical cases that broke the model in three different ways: an elite-negotiated revolution, an elite-driven modernization revolution, and a case where nearly all the conditions were met but no revolution occurred. Each failure forced the student to add a new dimension to their model. The result is not just a better model of revolutions. It is a lesson in how dangerous it is to generalize from a single case, which is itself a meta-skill that applies to every subject.

Subject 04 · Language Learning

Language Learning — German

Stress-Testing a Model of German Word Order

The student has been studying German and has learned the basic word order rules. They describe their mental model and ask AI to stress-test it with sentences that their model would struggle to predict.

Stress-Test Session — German Word Order

Student

My model of German word order:

In main clauses, the verb comes second. The subject usually comes first, but other elements can go first as long as the verb stays in second position. For example: "Ich gehe morgen ins Kino" (I go tomorrow to the cinema) — subject first, verb second. Or: "Morgen gehe ich ins Kino" (Tomorrow go I to the cinema) — time phrase first, verb still second, subject moves after the verb.

In questions, the verb comes first: "Gehst du ins Kino?" (Go you to the cinema?)

That's my model — verb second in statements, verb first in questions.

Stress-test this. Give me German sentence structures where my model makes the wrong prediction.

AI — Stress-Test Scenarios

Scenario 1: Consider this sentence: "Ich weiß, dass er morgen ins Kino geht." (I know that he tomorrow to the cinema goes.) Your model says the verb goes second. In the "dass" clause, the verb "geht" is at the end. Apply your model — why does it fail here, and what rule are you missing?

Scenario 2: Consider a sentence with a modal verb: "Ich kann morgen ins Kino gehen." (I can tomorrow to the cinema go.) There are two verbs — "kann" and "gehen." Your model says the verb goes second. Which verb goes second? Where does the other one go? What is the pattern?

Scenario 3: Consider a separable prefix verb: "Ich rufe dich morgen an." (I call you tomorrow up — meaning: I'll call you tomorrow.) "Anrufen" is one verb, but it has split into two pieces — "rufe" in second position and "an" at the end. Your model has the verb going second, but here the verb is in two places simultaneously. How does your model handle this?

Scenario 4: Now combine scenarios 1 and 2: "Ich weiß, dass er morgen ins Kino gehen kann." (I know that he tomorrow to the cinema go can.) In the subordinate clause, both verbs go to the end — but in the reverse order from the main clause. What pattern is emerging?

Student Works Through Scenarios

Student

Scenario 1: In the "dass" clause, the verb goes to the end. My model only covered main clauses. There must be a different rule for subordinate clauses — clauses introduced by conjunctions like "dass," "weil," "obwohl." In those clauses, the conjugated verb goes to the end instead of second position. My model was incomplete — it only described one type of clause.

Scenario 2: "Kann" is in second position — that follows my rule. But "gehen" (the infinitive) goes to the end. So my model needs to specify: the conjugated verb (the one that changes with the subject) goes in second position. Any additional verbs — infinitives, participles — go to the end of the clause. My model said "the verb goes second" but didn't account for sentences with multiple verbs.

Scenario 3: The separable prefix verb splits — the conjugated part ("rufe") goes to second position, and the prefix ("an") goes to the end. It's like the verb creates a frame around the rest of the sentence. This is related to what I saw in Scenario 2 — non-conjugated verb elements go to the end. The prefix is acting like a second verb element that follows the same end-position rule.

Scenario 4: In the subordinate clause, both verbs go to the end, but in reverse order: "gehen kann" instead of "kann ... gehen." In the main clause, the conjugated verb comes early (in position 2) and the infinitive comes last. In the subordinate clause, the infinitive comes just before the conjugated verb, which comes last, like a mirror image. The conjugated verb always wants to be at the boundary: second position in main clauses, final position in subordinate clauses.

I'm seeing a deeper pattern now: German word order isn't just "verb second." It's about a verbal frame — the conjugated verb and other verb elements create a bracket around the middle of the sentence, and the type of clause (main versus subordinate) determines whether that bracket opens at position two or closes at the end.

Revised Model

Student — Revised Model

German word order is not simply "verb second." It is a bracket structure (Satzklammer):

In main clauses, the conjugated verb sits in second position, and all other verb elements (infinitives, participles, separable prefixes) go to the end — creating a frame around the middle field where objects, adverbs, and other elements sit.

In subordinate clauses (introduced by conjunctions like "dass," "weil," "wenn"), all verb elements move to the end, with the conjugated verb in the very last position.

In yes/no questions, the conjugated verb moves to first position, but the end-frame rule still holds for other verb elements.

The underlying principle is that German has designated verb positions — a left bracket (position 1 or 2) and a right bracket (end) — and the type of clause determines which bracket the conjugated verb occupies. Everything else fills the middle field. This frame model explains main clauses, subordinate clauses, modal verbs, separable prefixes, and combined structures — all as expressions of a single underlying pattern.
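The bracket model above can be sketched as a small function that assembles a clause from its fields. This is a deliberately simplified illustration, not a German grammar: the function name `build_clause` and its parameters are hypothetical, it covers only the clause types discussed in this example, and it assumes the caller supplies words that are already inflected, capitalized, and sorted into the right fields.

```python
def build_clause(kind, finite, middle, right_bracket=(), prefield=None, conjunction=None):
    """Assemble a German clause according to the sentence-bracket (Satzklammer) pattern.

    kind          -- "main", "question" (yes/no), or "subordinate"
    finite        -- the conjugated verb
    middle        -- list of middle-field elements (subject, objects, adverbs)
    right_bracket -- non-conjugated verb elements: infinitives, participles, separable prefixes
    prefield      -- the single element before the verb in a main clause
    conjunction   -- the subordinating conjunction, e.g. "dass"
    """
    if kind == "main":
        if prefield is None:
            raise ValueError("a main clause needs a prefield element")
        words = [prefield, finite, *middle, *right_bracket]  # conjugated verb in position 2
    elif kind == "question":
        words = [finite, *middle, *right_bracket]            # conjugated verb in position 1
    elif kind == "subordinate":
        if conjunction is None:
            raise ValueError("a subordinate clause needs a conjunction")
        words = [conjunction, *middle, *right_bracket, finite]  # conjugated verb last
    else:
        raise ValueError(f"unknown clause kind: {kind}")
    return " ".join(words)


print(build_clause("main", "kann", ["morgen", "ins Kino"], ["gehen"], prefield="Ich"))
# Ich kann morgen ins Kino gehen
print(build_clause("main", "rufe", ["dich", "morgen"], ["an"], prefield="Ich"))
# Ich rufe dich morgen an
print(build_clause("subordinate", "kann", ["er", "morgen", "ins Kino"], ["gehen"], conjunction="dass"))
# dass er morgen ins Kino gehen kann
print(build_clause("question", "Gehst", ["du", "ins Kino"]))
# Gehst du ins Kino
```

The only thing that changes between clause types is where `finite` lands. The middle field and the right bracket keep the same order, which is the single underlying pattern the revised model describes.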

What This Example Demonstrates

The student entered with a rule that handles the simplest German sentences correctly but fails the moment complexity increases. AI's stress-test scenarios were carefully sequenced — each one introducing one new complication (subordinate clauses, multiple verbs, separable prefixes, and combinations) so the student could build their understanding incrementally. By the end, the student had discovered the Satzklammer (sentence bracket) — the fundamental structural principle of German word order — through their own analysis of the patterns. They didn't memorize a rule. They discovered a system. That discovery will make every subsequent German sentence they encounter more predictable, because they now understand the underlying architecture rather than a surface-level rule.