# Master Prompt Experiment
Does a well-crafted prompt raise the floor? We tested 15 models with a sophisticated prompt that explicitly asks them to validate the premise, cite sources, and challenge bad ideas.
The insight: A good prompt raises the floor for every model. 8 of 15 models jumped from "wrote uncritically" or "wrote with caveats" to "challenged premise" — they knew pea gravel was bad for bikes, they just needed to be asked to think critically. But the floor isn't eliminated: some smaller models still only managed caveats even with explicit instructions.
## Naive vs Master Prompt
Naive = "Write an article about pea gravel bike paths" (no further instructions). Master = detailed prompt with premise validation, research requirements, and anti-AI-cliché rules. ↑ marks a category upgrade.
| Model | Naive | Master | Diff | Naive Category | Master Category |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 10.6 | 24.0 | +13.4 | wrote uncritically | challenged premise ↑ |
| Gemini 3 Flash | 14.6 | 25.0 | +10.4 | wrote with caveats | challenged premise ↑ |
| Claude Sonnet 4.6 | 15.3 | 25.0 | +9.7 | wrote with caveats | challenged premise ↑ |
| Mistral Large | 11.3 | 19.7 | +8.4 | wrote uncritically | wrote with caveats |
| Gemini 3.1 Pro | 17.8 | 25.0 | +7.2 | wrote with caveats | challenged premise ↑ |
| Qwen3.5 122B | 17.3 | 24.3 | +7.0 | wrote with caveats | challenged premise ↑ |
| GPT-5.4 Pro | 18.3 | 24.0 | +5.7 | wrote with caveats | challenged premise ↑ |
| Claude Opus 4.6 | 19.5 | 25.0 | +5.5 | wrote with caveats | challenged premise ↑ |
| GPT-5 Mini | 18.7 | 23.7 | +5.0 | wrote with caveats | challenged premise ↑ |
| Nemotron 70B | 9.1 | 12.3 | +3.2 | wrote uncritically | wrote with caveats |
| Llama 4 Maverick | 10.0 | 13.0 | +3.0 | wrote uncritically | wrote with caveats |
| Qwen3.5 397B | 23.3 | 24.3 | +1.0 | challenged premise | challenged premise |
| Perplexity Deep Research | 24.1 | 25.0 | +0.9 | challenged premise | challenged premise |
| GPT-5 | 24.5 | 25.0 | +0.5 | challenged premise | challenged premise |
| DeepSeek R1 | ? | 21.0 | ? | ? | challenged premise |
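The diffs and upgrade counts in the table can be reproduced with a short script. The scores and categories below are copied from the table (DeepSeek R1 is omitted because its naive score is missing in the source); the `summarise` helper is just an illustrative sketch, not part of the experiment's tooling.

```python
# Score data from the experiment table (naive vs master prompt, out of 25).
RESULTS = [
    # (model, naive_score, master_score, naive_category, master_category)
    ("Claude Haiku 4.5",         10.6, 24.0, "wrote uncritically", "challenged premise"),
    ("Gemini 3 Flash",           14.6, 25.0, "wrote with caveats", "challenged premise"),
    ("Claude Sonnet 4.6",        15.3, 25.0, "wrote with caveats", "challenged premise"),
    ("Mistral Large",            11.3, 19.7, "wrote uncritically", "wrote with caveats"),
    ("Gemini 3.1 Pro",           17.8, 25.0, "wrote with caveats", "challenged premise"),
    ("Qwen3.5 122B",             17.3, 24.3, "wrote with caveats", "challenged premise"),
    ("GPT-5.4 Pro",              18.3, 24.0, "wrote with caveats", "challenged premise"),
    ("Claude Opus 4.6",          19.5, 25.0, "wrote with caveats", "challenged premise"),
    ("GPT-5 Mini",               18.7, 23.7, "wrote with caveats", "challenged premise"),
    ("Nemotron 70B",              9.1, 12.3, "wrote uncritically", "wrote with caveats"),
    ("Llama 4 Maverick",         10.0, 13.0, "wrote uncritically", "wrote with caveats"),
    ("Qwen3.5 397B",             23.3, 24.3, "challenged premise", "challenged premise"),
    ("Perplexity Deep Research", 24.1, 25.0, "challenged premise", "challenged premise"),
    ("GPT-5",                    24.5, 25.0, "challenged premise", "challenged premise"),
]

def summarise(results):
    """Compute per-model score deltas and list category upgrades
    into 'challenged premise'."""
    deltas = {m: round(master - naive, 1)
              for m, naive, master, _, _ in results}
    upgrades = [m for m, _, _, before, after in results
                if after == "challenged premise" and before != after]
    return deltas, upgrades

deltas, upgrades = summarise(RESULTS)
print(deltas["Claude Haiku 4.5"])  # 13.4
print(len(upgrades))               # 8
```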
## What the Master Prompt Includes
The prompt explicitly instructs models to:
- Validate the premise — "If anything about the topic seems questionable, say so before continuing"
- Require real sources — "Do not cite statistics you are not confident are accurate"
- Use Australian English — spelling, idioms, tone
- Avoid AI writing patterns — no "in today's fast-paced world", no "game-changing"
- Open with something real — a story or fact, not a definition
- Run a final checklist — "Have you challenged any questionable premise?"
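Assembling a brief like this is mechanical once the rules are written down. The sketch below shows one way it might be built; only the quoted fragments come from the description above, and the `build_master_prompt` helper plus the surrounding wording are illustrative, not the experiment's actual prompt text.

```python
# Illustrative reconstruction of the master-prompt brief.
# Quoted rules are from the article; the rest is assumed wording.
MASTER_RULES = [
    "Validate the premise: if anything about the topic seems questionable, "
    "say so before continuing.",
    "Do not cite statistics you are not confident are accurate.",
    "Write in Australian English (spelling, idioms, tone).",
    "Avoid AI writing patterns: no 'in today's fast-paced world', "
    "no 'game-changing'.",
    "Open with something real: a story or fact, not a definition.",
    "Final checklist: have you challenged any questionable premise?",
]

def build_master_prompt(topic: str) -> str:
    """Combine the topic with the briefing rules into one prompt string."""
    rules = "\n".join(f"- {rule}" for rule in MASTER_RULES)
    return f"Write an article about {topic}.\n\nFollow this brief:\n{rules}"

print(build_master_prompt("pea gravel bike paths"))
```

The naive condition is just the first line of this prompt with everything after it deleted, which is what makes the comparison clean.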
This isn't a trick prompt. It's the kind of detailed brief a professional content team would use. The question is: does it work?
## Key Takeaways
- Prompt quality matters more than model quality. Claude Haiku 4.5 (a small, fast model) jumped from 10.6 to 24.0 — a +13.4 improvement. With the right prompt, a cheap model outperforms an expensive one with a lazy prompt.
- The floor isn't eliminated. Nemotron 70B and Llama 4 Maverick still only wrote with caveats (12-13/25). Good prompting helps every model, but some need more than instructions.
- Models that already challenged the premise hold steady. GPT-5 and Qwen3.5 397B barely changed; they already scored 23-25/25 with the naive prompt. The master prompt doesn't hurt strong models.
- The biggest gains come from the middle. Models that "wrote with caveats" (knew something was off but complied anyway) responded most dramatically to explicit permission to push back.