Citation Score™ methodology.
Four components — Engine Coverage, On-Page Extractability, Citation Velocity, and Recommendation History. The metric layer above the Five-Stage Citation Hierarchy.
By Jonathan Landman · Published · v3.9.4
The 60-second answer
The Wiele Citation Score™ is the instrumented measure of how often AI answer engines cite a brand against a named competitor set. The score has four components — Engine Coverage, On-Page Extractability, Citation Velocity, and Recommendation History — each scored separately and rolled into a composite tracked month over month.
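A minimal sketch of that roll-up, assuming equal component weights and 0–100 component scales (neither is fixed on this page); the function and field names are illustrative, not Wiele's internal implementation:

```python
# Hypothetical roll-up of the four components into the composite Citation Score.
# Equal weights are an assumption; the published spec does not fix the weighting.
COMPONENTS = ("engine_coverage", "on_page_extractability",
              "citation_velocity", "recommendation_history")

def composite_citation_score(components: dict[str, float],
                             weights: dict[str, float] | None = None) -> float:
    """Roll four 0-100 component scores into one 0-100 composite."""
    weights = weights or {name: 1 / len(COMPONENTS) for name in COMPONENTS}
    return sum(components[name] * weights[name] for name in COMPONENTS)

# Tracked month over month, e.g.:
# composite_citation_score({"engine_coverage": 62, "on_page_extractability": 71,
#                           "citation_velocity": 40, "recommendation_history": 55})
```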
Brands enter Wiele's measurement system through the Five-Stage Citation Hierarchy framework — Entity Resolution, Source Authority, Structured Extractability, Freshness, Recommendation History. Citation Score™ is the metric layer; the hierarchy is the methodology layer; the Citation Score™ subscription is the recurring product.
Stage 0 prerequisite. Before a page enters the hierarchy at Stage 1, it clears the Stage 0 Self-Evidence ≥ 7/10 gate (Krug's Trunk Test, six 5-second answers). Pages below 7/10 are rewritten before any hierarchy work begins. The rubric lives in §2 below.
§1 · Engine Coverage
Which AI answer engines does the score track? Coverage breadth is a baseline signal of monitoring rigour — a Citation Score™ that tracks only ChatGPT misses the engines a buyer might actually use on the day they decide. Wiele's coverage spans ten engines at Authority tier: ChatGPT, Perplexity, Google AI Overviews, Gemini, Microsoft Copilot, Claude, Grok, You.com, Brave Search, and DeepSeek. Coverage stratifies by tier — five engines at Starter, eight at Pro, ten at Authority.
The score for §1 is the proportion of in-scope engines on which the brand appears within the top-three cited sources across the named prompt set, weighted by engine traffic share. A brand that wins citation share on ChatGPT but loses Perplexity is structurally different from one that splits coverage evenly across both — and the score reflects that.
How Wiele scores it. Monthly engine runs against a fixed prompt panel of 30 to 100 queries per brand (sized by tier). The panel is locked at engagement start to keep month-over-month comparison honest. Each engine's weight reflects the engine's estimated share of relevant buyer-side AI search traffic.
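A hedged sketch of the §1 calculation. The per-engine in/out rule (how a brand "appears within the top-three cited sources" across the panel) and the traffic-share weights are assumptions here; only the weighted-coverage shape comes from the text above:

```python
# Hedged sketch of the Engine Coverage score: proportion of in-scope engines on
# which the brand made the top-three cited sources, weighted by traffic share.
def engine_coverage_score(appeared_top3: dict[str, bool],
                          traffic_share: dict[str, float]) -> float:
    """appeared_top3[engine] is True when the brand reached the top-three cited
    sources on that engine across the prompt panel; traffic_share should sum to 1."""
    return sum(share for engine, share in traffic_share.items()
               if appeared_top3.get(engine, False))

# Illustrative three-engine panel: wins ChatGPT and Perplexity, misses Gemini.
score = engine_coverage_score(
    {"chatgpt": True, "perplexity": True, "gemini": False},
    {"chatgpt": 0.5, "perplexity": 0.3, "gemini": 0.2},
)  # -> 0.8
```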
§2 · On-Page Extractability
How well a page is structured for direct extraction by AI answer engines. Inputs: schema completeness, heading hierarchy clarity, definitive-opener presence, paragraph-block scannability, semantic-HTML correctness.
Sub-metric — Prototype Match Score
How closely the page matches the prototypical exemplar of the content category it claims to occupy ("AEO audit content," "agency pricing page," "citation-method explainer"). Anchored in two sources: Jones et al. (2015), the 50ms first-impression study in which prototypicality, moderate visual complexity, and processing fluency drove memorable encoding; and Loken's Handbook prototypicality theory (brands ARE categories; the prototypical exemplar wins on attention, attitude accessibility, and automatic choice).
Rubric (10-point scale).
| Score | Description |
|---|---|
| 0–3 | Atypical or chaotic. Page does not register as the category it claims; first-impression encoding fails; LLM extraction misroutes the page to an adjacent or wrong category. |
| 4–6 | Partial match. Some prototypical elements present (heading shape, schema bundle), but layout or copy violates category expectations. LLM extraction succeeds but at higher cost; competitor pages outrank on prototype-match basis. |
| 7–9 | High prototypicality. Page reads as a strong exemplar of its category. LLM extraction is fast and high-confidence; first-impression encoding succeeds within Jones et al.'s 50ms window. |
| 10 | Reference exemplar. The page IS the category. Competitors implicitly cite the structure. LLM training data treats it as the prototypical case. (Wiele targets this for every flagship surface.) |
How Wiele scores it. Manual rubric against the canonical-exemplar set maintained in our Intelligence Library. A Citation Score™ Authority-tier subscriber receives a Prototype Match audit on every flagship page once per quarter; Pro receives it twice a year; Starter receives the score but not the audit narrative.
Sub-metric — Self-Evidence Score (Stage 0 rubric)
Self-Evidence is the Stage 0 prerequisite gate: can a human reader answer Krug's Trunk Test in 5 seconds (site identity, page identity, primary sections, current location, CTA, buyer-journey position)? The score is 1–10. Pages below 7 are rewritten before hierarchy work begins; pages at 7–9 enter Stage 1; pages at 10 are reference exemplars and feed our internal Prototype Match training set.
The processing-fluency signals that drive 50ms first-impression encoding in human cognition drive LLM extraction confidence in the same direction — which is why Self-Evidence and Prototype Match cluster together in §2 rather than living on separate scoring axes.
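A minimal sketch of the Stage 0 gate under the rubric above. The six Trunk Test checks are the ones listed in this section; how a reviewer converts them into the 1–10 score stays a manual judgement, and only the ≥ 7 threshold is taken from the spec:

```python
from dataclasses import dataclass

# The six Trunk Test checks named in this section; pass/fail is a manual judgement.
TRUNK_TEST_CHECKS = ("site identity", "page identity", "primary sections",
                     "current location", "CTA", "buyer-journey position")

@dataclass
class Stage0Assessment:
    self_evidence_score: int  # 1-10, assigned by a reviewer against the rubric

    def clears_gate(self) -> bool:
        # Stage 0 gate: >= 7/10 enters Stage 1; anything lower is rewritten first.
        return self.self_evidence_score >= 7

# Stage0Assessment(self_evidence_score=6).clears_gate()  -> False (rewrite first)
# Stage0Assessment(self_evidence_score=8).clears_gate()  -> True  (enters Stage 1)
```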
§3 · Citation Velocity
The rate at which a brand picks up new citations across the engine set, measured per engine-run. Velocity catches trajectory changes that absolute citation share misses — a brand at 12 percent share that climbed from 5 percent in two months is a different story from one stuck at 12 percent for six.
Velocity is the early-signal metric for Stage 3 (Structured Extractability) interventions. Schema sprints, definitive-opener rewrites, and FAQ pattern adoption typically show up in velocity inside one to two engine runs — months before the absolute share number moves enough to be statistically clean.
How Wiele scores it. Month-over-month delta in citation count per engine, normalised by prompt-panel size. Positive velocity sustained over a quarter feeds the Recommendation History compounding loop (§4); flat or negative velocity triggers a Stage 3 audit on the highest-traffic cited pages.
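A minimal sketch of the velocity calculation as described: month-over-month delta in citation count per engine, normalised by prompt-panel size. The dictionary shapes and engine names are illustrative assumptions:

```python
# Sketch of Citation Velocity: month-over-month delta in citation count per engine,
# normalised by prompt-panel size.
def citation_velocity(current: dict[str, int],
                      previous: dict[str, int],
                      panel_size: int) -> dict[str, float]:
    """Per-engine velocity: (this run's citations - last run's) / prompts in the panel."""
    engines = current.keys() | previous.keys()
    return {e: (current.get(e, 0) - previous.get(e, 0)) / panel_size for e in engines}

# A 50-prompt panel where Perplexity climbs while ChatGPT stays flat:
# citation_velocity({"chatgpt": 12, "perplexity": 9},
#                   {"chatgpt": 12, "perplexity": 4}, 50)
# -> {"chatgpt": 0.0, "perplexity": 0.1}
```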
§4 · Recommendation History
Engines that have cited a source before are statistically more likely to cite it again. Recommendation History is the compounding loop and the reason early citation matters disproportionately — every accepted citation strengthens the next one.
§4 tracks two derived signals: the count of distinct prompts on which a brand was cited in the prior 90-day window (breadth), and the count of repeat citations on the same prompt across consecutive engine runs (depth). Depth without breadth indicates a single captured prompt; breadth without depth indicates volatile citation eligibility. Both rising together is the compounding signal.
How Wiele scores it. Rolling 90-day window across the prompt panel. Authority-tier engagements receive a quarterly Recommendation History narrative — which prompts became repeat-citation eligible, which slipped, and which compounded against the named competitor set. Pro receives it twice a year; Starter receives the breadth/depth pair but not the narrative.
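A hedged sketch of the breadth/depth pair over the rolling 90-day window. The event shape is an assumption, and depth is approximated as prompts cited on more than one engine run inside the window rather than strictly consecutive runs:

```python
from collections import Counter
from datetime import date, timedelta

# Sketch of the breadth/depth pair. Each event is (engine-run date, prompt id, cited?);
# this event shape is assumed for illustration.
def breadth_and_depth(events: list[tuple[date, str, bool]],
                      as_of: date, window_days: int = 90) -> tuple[int, int]:
    cutoff = as_of - timedelta(days=window_days)
    cited = {(run, prompt) for run, prompt, was_cited in events
             if was_cited and run >= cutoff}
    # Breadth: distinct prompts cited at least once inside the window.
    breadth = len({prompt for _, prompt in cited})
    # Depth (proxy): prompts cited on more than one engine run inside the window.
    runs_per_prompt = Counter(prompt for _, prompt in cited)
    depth = sum(1 for n in runs_per_prompt.values() if n > 1)
    return breadth, depth
```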
Sources and methodology refresh
The Citation Score™ methodology is refreshed quarterly against the canon library maintained by Wiele's research operator. The load-bearing anchors for the v3.9.4 specification:
Krug — Don't Make Me Think. Trunk Test + Self-Evidence ≥ 7/10 gate (§2 sub-metric).
Jones et al. (2015) — 50ms first-impression encoding study. Prototypicality + moderate visual complexity + processing fluency as the dominant predictors (§2 sub-metric).
Loken Handbook — prototypicality theory; brands as categories; the prototypical exemplar wins (§2 sub-metric).
Whalen — Design for How People Think (2019). Six Minds pre-deploy gate, complementary to Self-Evidence scoring.
Next refresh: 2026-08-14 (quarterly cycle). Changes to the rubric are versioned and disclosed to active subscribers ahead of the run in which they take effect.
The next step
Instrument the lift.
The Citation Score™ subscription runs the engine panel month over month and surfaces the four-component score against your named competitor set. Three tiers, monthly in GBP, three-month minimum.

