Stage 3 — How to engineer pages AI answer engines actually quote
Stage 3 of the Five-Stage Citation Hierarchy — Structured Extractability — is the highest-leverage stage. Schema bundle, content pattern, antipatterns, and the Wiele extractability checklist.
By Jonathan Landman · Published · 14 min read
The 60-second answer
Stage 3 of the Five-Stage Citation Hierarchy is Structured Extractability — whether an AI answer engine can pull a clean, attributable answer block off your page. It is the highest-leverage stage in the hierarchy because it responds to sprint-scale engineering work, not multi-year authority compounding.
The work is five things done in concert: a definitive opener inside the first 200 words, a clean H2 / H3 hierarchy, the right structured-data wrappers (Article · FAQPage · HowTo · Person · Breadcrumb), table-or-list layouts where the data permits, and on-page source attribution. Brands that ship the bundle inside a two-to-four-week sprint typically see 30 to 50 percent citation-share lift on Stage-3-sensitive prompts within a quarter.
Why Stage 3 is the highest-leverage stage.
Each stage of the Five-Stage Citation Hierarchy responds to engineering work on a different timescale. Stage 1 — Entity Resolution — is binary: solved or unsolved. Once your knowledge-graph footprint is clean, the lift is unlocked, but the ceiling is capped at "the engine can find you." Stage 2 — Source Authority — compounds over years. Tier-1 placements and named-author publishing build the substrate, but no agency engagement converts a six-month-old domain into a Wikipedia-grade source in a quarter. Stage 4 — Freshness — is a maintenance discipline; staying current keeps citation eligibility alive, but freshness is the rent you pay, not the building. Stage 5 — Recommendation History — is the long-game moat; early citation begets later citation, but the loop only opens once Stages 1 through 4 are landed.
Stage 3 is different. The schema markup, content pattern, and structural choices that make a page extractable can ship in two to four weeks. The engines see the change on the next crawl, IndexNow accelerates discovery, and citation-share lift shows up in the next monthly engine run. It is the only stage where a small, focused engineering sprint delivers a measurable trajectory change inside a single quarter. Every Wiele Signal Audit output prioritises Stage 3 fixes near the top of the 30-day roadmap, because the cost-to-impact ratio is the highest in the hierarchy.
The leverage cuts both ways. Sites that neglect Stage 3 silently underperform — they have decent authority, good content, and even strong founder voice, but the engines pass them over for competitors who simply made their answers easier to quote. That neglect is the most common failure pattern Wiele sees in pre-engagement audits across premium agencies and B2B firms. The five elements that follow are the entire intervention.
The schema bundle.
Five Schema.org types do the heavy lifting at Stage 3. The engines lean on JSON-LD; alternative encodings (microdata, RDFa) work in principle but JSON-LD is the format every major engine documents first and supports most reliably. Inline the schemas in the page's rendered HTML — not in a hydration payload a non-JS crawler will miss.
Article
The wrapper
Every methodology piece, brief, case study, or thought-leadership page should wrap in an Article schema. The minimum fields are headline, description, datePublished, dateModified, author, and image. Article is the surface where founder voice (Stage 5 of the hierarchy) gets attributed; engines that resolve a credible author tend to up-weight the cited passage. Article also tells the engine the page is editorial, not promotional — which matters when the engine is choosing between a vendor blog and an analyst report.
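As a sketch of the minimum field set — the dates, URLs, and image path below are placeholders, not Wiele's production markup — an inlined Article payload looks like this:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Stage 3 — How to engineer pages AI answer engines actually quote",
  "description": "Schema bundle, content pattern, antipatterns, and the extractability checklist.",
  "datePublished": "2025-01-15",
  "dateModified": "2025-02-01",
  "author": { "@type": "Person", "name": "Jonathan Landman" },
  "image": "https://example.com/og/stage-3.png"
}
</script>
```

Note the author field references a Person — the same entity the Person schema below resolves in full.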
FAQPage
The answer harvester
FAQPage is the most under-shipped Stage 3 lever in the agency category. When a page genuinely answers eight to ten buyer-decision questions, FAQPage schema hands the engine a pre-extracted Q&A block ready to quote verbatim. The two anti-patterns to avoid are using FAQPage for marketing fluff (engines penalise promotional FAQ) and letting the schema content drift from the on-page FAQ instead of mirroring it exactly (engines flag the mismatch). Match the schema content to the visible content exactly, and reserve FAQ for genuine buyer questions, not objection-handling.
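A minimal FAQPage payload with one entry — the question and answer here are drawn from this brief for illustration, and the text field must mirror the visible on-page answer word for word:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is Stage 3 of the Five-Stage Citation Hierarchy?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Stage 3 is Structured Extractability — whether an AI answer engine can pull a clean, attributable answer block off your page."
      }
    }
  ]
}
</script>
```

Each additional buyer question becomes another Question object in the mainEntity array.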
HowTo
The procedural wrapper
HowTo applies when a page documents a procedure — a checklist, a sequence of steps, a methodology in stages. The Five-Stage Citation Hierarchy itself is the kind of content HowTo was designed for. Each step gets a name, text, and optionally an image. Engines reward HowTo because the structure maps cleanly to procedural prompts, the ones starting with "how do I" or "what's the process for." Apply HowTo on methodology pages, audit-roadmap pages, and any page that walks a buyer through a sequence.
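A HowTo sketch with two steps — the step names and text are illustrative, adapted from the checklist later in this brief, not a prescribed payload:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "Run the extractability checklist against a page",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Check the opener",
      "text": "Confirm the first 100 to 200 words contain a quotable, third-person answer to the page's implied question."
    },
    {
      "@type": "HowToStep",
      "name": "Audit the heading tree",
      "text": "Verify a two-level H2 / H3 hierarchy with semantic headings and no skipped levels."
    }
  ]
}
</script>
```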
Person
The author authority signal
Person schema on the founder or named author is the entity-disambiguation lever. Without it, the engine sees Organization-attributed content; with it, the engine sees a named human whose authority is independently verifiable via sameAs links to LinkedIn, X, and any other recognised profiles. Critical detail: the Person.sameAs must point to the human's personal profiles, not the company's. Pointing Person.sameAs at the company LinkedIn collapses the human and the brand into one entity, dilutes citation attribution, and undercuts the founder-voice signal — see the five signals breakdown for the wider context.
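A Person payload illustrating the correct shape — the profile URLs below are hypothetical placeholders; the point is that every sameAs entry targets the human's own profiles, never the company's:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jonathan Landman",
  "url": "https://example.com/about/jonathan-landman",
  "sameAs": [
    "https://www.linkedin.com/in/example-personal-profile",
    "https://x.com/example_personal_handle"
  ]
}
</script>
```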
Breadcrumb
The structural context
BreadcrumbList is the quiet workhorse. It tells the engine where the page sits in the site hierarchy, which lets the engine reason about authority transfer from the parent surface (a /citation-brief index page lifts each individual brief; a /systems/ai-visibility lifts each child page). BreadcrumbList is also the schema Google's SERP enrichment leans on for the breadcrumb display under blue links. Cheap to ship, runs on every page, no excuse to skip it.
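A two-level BreadcrumbList sketch — domain and paths are illustrative, mirroring the /citation-brief hierarchy described above:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Citation Briefs",
      "item": "https://example.com/citation-brief"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Stage 3 — Structured Extractability",
      "item": "https://example.com/citation-brief/stage-3-structured-extractability"
    }
  ]
}
</script>
```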
The extractability content pattern.
Schema is necessary but not sufficient. The content itself has to be shaped for extraction. Five elements compose the Wiele extractability pattern, and they reinforce one another — the bundle works because every element nudges the engine toward the same conclusion: this page contains a clean, attributable answer to the prompt.
The definitive opener. The first 100 to 200 words must contain a quotable answer to the page's implied question. Not a tease, not a setup, not a marketing hook — the answer itself, written in the third person, framed as if it could be lifted verbatim into an AI response. The 60-second-answer block at the top of every Wiele Citation Brief is the canonical implementation. Engines that scan a page in milliseconds will pull the first extractable block they find; if your first block is "we believe" and a competitor's is "the five signals are…", the engine picks the competitor.
A predictable H2 / H3 hierarchy. Two levels deep, semantic headings, no skipped levels. Every H2 names a substantive section; every H3 names a substantive subsection. Engines map the heading tree onto a passage-retrieval index — the cleaner the tree, the more accurately the engine locates the right passage to cite. Marketing-style headings (questions framed for click-bait, all-caps slogans) confuse the retrieval; declarative headings outperform them across every engine class.
Table-or-list layouts where the data permits. Pricing comparisons, feature breakdowns, multi-step procedures, multi-option trade-offs — anything that benefits from rows-and-columns or numbered-list structure should ship as a table or list, not as prose. Engines extract tables and lists almost atomically; prose containing the same data extracts less reliably. The cost is the discipline of structuring information; the reward is preferential citation.
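As an illustration of the prose-to-table move, the stage-timescale comparison from earlier in this brief could ship as markup like this (rows paraphrased for brevity):

```html
<table>
  <thead>
    <tr><th>Stage</th><th>Lever</th><th>Timescale</th></tr>
  </thead>
  <tbody>
    <tr><td>2 — Source Authority</td><td>Tier-1 placements, named-author publishing</td><td>Compounds over years</td></tr>
    <tr><td>3 — Structured Extractability</td><td>Schema bundle + content pattern</td><td>Two-to-four-week sprint</td></tr>
    <tr><td>4 — Freshness</td><td>Update cadence</td><td>Ongoing maintenance</td></tr>
  </tbody>
</table>
```

The same facts in a paragraph extract less reliably; the table gives the engine rows it can lift atomically.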
Internal anchor links. Every H2 should resolve to a stable URL fragment (e.g. /citation-brief/stage-3-structured-extractability#the-schema-bundle). Anchors let the engine cite a sub-section URL rather than the whole page — finer-grained attribution that survives content updates and lifts citation precision. Cheap to implement; Wiele's mdx-components wire this automatically via rehype-slug.
On-page citations and source attribution. When the page asserts a fact, it names the source — the Schema.org spec, Google Search Central documentation, public engine guidance, or the brand's own primary research. On-page citations signal to the engine that the page itself participates in the citation graph, which compounds with Stage 2 (Source Authority) and Stage 5 (Recommendation History) over time.
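In rendered HTML, the anchor pattern is just an id slug on each heading — the slugs below mirror this page's section titles for illustration; rehype-slug generates them automatically from the heading text:

```html
<h2 id="the-schema-bundle">The schema bundle</h2>
<!-- ...section content... -->
<h2 id="the-extractability-content-pattern">The extractability content pattern</h2>
```

Any engine or reader can then cite /citation-brief/stage-3-structured-extractability#the-schema-bundle directly.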
What NOT to do.
Five recurring Stage 3 antipatterns Wiele sees across pre-engagement audits — each one is a quiet citation-share leak.
Burying the answer three scrolls deep. The page eventually answers the question, but only after a 600-word setup, a brand origin story, and a manifesto on AI search. Engines that pull the first extractable block from the first 200 words never see the actual answer. Lead with it; expand around it.
Schema injected via client-side JavaScript. Some implementations inject JSON-LD via a runtime script tag after hydration. Non-JS crawlers (including the AI-engine crawlers that prioritise speed over execution) miss the schema entirely. Inline the JSON-LD in the rendered HTML response — server-side or static-rendered, not hydrated.
FAQPage stuffed with marketing FAQ. Schema.org defines FAQPage for genuine question-and-answer content. Pages that wrap promotional CTAs ("Why choose us?" "How long until I see results?") in FAQPage schema get flagged by Google's rich-result eligibility filter and silently dropped from the answer set. Reserve FAQPage for buyer-decision questions with substantive answers.
H1 + paragraph-only structure. A 3,000-word page with a single H1 and no H2 / H3 breakdown is a wall of prose with no extraction handles. The engine can't locate the right passage; the founder voice is uniform but un-quotable. Two levels of semantic heading hierarchy, minimum, for any page over 800 words.
No author attribution. The page is unsigned, no Person schema, no byline, no bio link. Founder voice (Stage 5 of the hierarchy) evaporates — the engine can't weigh the author's authority because the author doesn't exist as an entity. Name the author; ship Person schema with the correct sameAs; link to a bio that the engine can crawl.
The Wiele extractability checklist.
Run this against any page that you want cited. Ten items, two minutes per page, ownable as a deliverable. Wiele engagements ship a populated version of this checklist as part of every Signal Audit; the version below is the methodology you can apply yourself.
- The first 100 to 200 words contain a quotable, third-person answer to the page's implied question.
- H2 / H3 hierarchy is two levels deep, semantic, no skipped levels.
- Article schema wraps the page (or WebPage / Service / Product as the page type dictates).
- FAQPage schema is present if and only if the page contains genuine buyer Q&A.
- HowTo schema is present if the page documents a procedure.
- Person schema names the author with correct sameAs (personal profile, not company).
- BreadcrumbList schema is present and matches the visible breadcrumb.
- JSON-LD is inlined in the rendered HTML response — visible in view-source, not injected via JavaScript.
- Tables or lists are used wherever the data permits comparison or enumeration.
- Every H2 resolves to a stable anchor URL via an id slug (e.g. #the-schema-bundle).
If a page fails three or more of these, it is leaking Stage 3 citation share — and the lift from fixing it is usually visible within one to two monthly engine runs.
Methodology & sources.
Stage 3 patterns observed across the Wiele AI Citation Tracker dataset (private, anonymised) over 18 months of weekly engine runs. Engine-specific guidance triangulated against each provider's public documentation:
- Schema.org — Article, FAQPage, HowTo, Person, BreadcrumbList type specifications
- Google Search Central — Structured data guidelines and Rich Results eligibility
- Google AI Overviews — Documented opt-in mechanics via the robots meta tag
- Bing Webmaster — IndexNow protocol and structured-data recommendations
- OpenAI ChatGPT search — Cited-source format observed across 240+ panel queries
- Perplexity — Public methodology on source weight and citation attribution
- Wiele Citation Tracker dataset — 18 months · 12 client cohorts · weekly engine runs across 10 engines
- The Five-Stage Citation Hierarchy (Citation Brief #001) and the five citation signals framework (Wiele Labs) provide the broader methodological context
Every claim above is reproducible from public sources or Wiele's instrumented engine-run dataset. Engagement clients receive the named-competitor Stage-3 lift trace inside the Citation Score™ dashboard. The full prompt panel, source-level citation logging, and methodology rubric are published at /trust.
Stage 3 is where the citation game gets won fastest. If you want a populated extractability scorecard against your live site — and the 30-day roadmap to close the gaps — start with a Signal Audit, or instrument the lift over time with the Citation Score™ Authority retainer. Brief #001 covers the full Five-Stage Citation Hierarchy; the AI Visibility system page details how Stage 3 sits inside the wider Wiele engagement shape.
Questions on this brief.
The next step
Start with a Signal Audit.
A diagnostic that maps your citation graph, entity baseline, and authority gaps — plus a 30-day implementation roadmap. The fastest way to know where you stand inside the answer economy.

