Stage 3 — How to engineer pages AI answer engines actually quote
Stage 3 of the Five-Stage Citation Hierarchy — Structured Extractability — is the highest-leverage stage. Schema bundle, content pattern, antipatterns, and the Wiele extractability checklist.
By Jonathan Landman · Published · 14 min read
The 60-second answer
Stage 3 of the Five-Stage Citation Hierarchy is Structured Extractability — whether an AI answer engine can pull a clean, attributable answer block off your page. It is the highest-leverage stage in the hierarchy because it responds to sprint-scale engineering work, not multi-year authority compounding.
The work is five things done in concert: a definitive opener inside the first 200 words, a clean H2 / H3 hierarchy, the right structured-data wrappers (Article · FAQPage · HowTo · Person · Breadcrumb), table-or-list layouts where the data permits, and on-page source attribution. Brands that ship the bundle inside a two-to-four-week sprint typically see 30 to 50 percent citation-share lift on Stage-3-sensitive prompts within a quarter.
Why Stage 3 is the highest-leverage stage.
Each stage of the Five-Stage Citation Hierarchy responds to engineering work on a different timescale. Stage 1 — Entity Resolution — is binary: solved or unsolved. Once your knowledge-graph footprint is clean, the lift is unlocked, but the ceiling is capped at "the engine can find you." Stage 2 — Source Authority — compounds over years. Tier-1 placements and named-author publishing build the substrate, but no agency engagement converts a six-month-old domain into a Wikipedia-grade source in a quarter. Stage 4 — Freshness — is a maintenance discipline; staying current keeps citation eligibility alive, but freshness is the rent you pay, not the building. Stage 5 — Recommendation History — is the long-game moat; early citation begets later citation, but the loop only opens once Stages 1 through 4 are landed.
Stage 3 is different. The schema markup, content pattern, and structural choices that make a page extractable can ship in two to four weeks. The engines see the change on the next crawl, IndexNow accelerates discovery, and citation-share lift shows up in the next monthly engine run. It is the only stage where a small, focused engineering sprint delivers a measurable trajectory change inside a single quarter. Every Wiele Signal Audit output prioritises Stage 3 fixes near the top of the 30-day roadmap, because the cost-to-impact ratio is the highest in the hierarchy.
The leverage cuts both ways. Sites that neglect Stage 3 silently underperform — they have decent authority, good content, and even strong founder voice, but the engines pass them over for competitors who simply made their answers easier to quote. That neglect is the most common failure pattern Wiele sees in pre-engagement audits across premium agencies and B2B firms. The five elements that follow are the entire intervention.
The schema bundle.
Five Schema.org types do the heavy lifting at Stage 3. The engines lean on JSON-LD; alternative encodings (microdata, RDFa) work in principle but JSON-LD is the format every major engine documents first and supports most reliably. Inline the schemas in the page's rendered HTML — not in a hydration payload a non-JS crawler will miss.
Article
The wrapper
Every methodology piece, brief, case study, or thought-leadership page should wrap in an Article schema. The minimum fields are headline, description, datePublished, dateModified, author, and image. Article is the surface where founder voice (Stage 5 of the hierarchy) gets attributed; engines that resolve a credible author tend to up-weight the cited passage. Article also tells the engine the page is editorial, not promotional — which matters when the engine is choosing between a vendor blog and an analyst report.
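As a sketch of the minimum field set — the dates, URLs, and image path below are placeholders, not Wiele's production markup — an inlined Article payload looks like this:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Stage 3 — How to engineer pages AI answer engines actually quote",
  "description": "Schema bundle, content pattern, antipatterns, and the extractability checklist.",
  "datePublished": "2025-01-15",
  "dateModified": "2025-02-01",
  "author": { "@type": "Person", "name": "Jonathan Landman" },
  "image": "https://example.com/og/stage-3.png"
}
</script>
```

Note the author field references a Person — the same entity the Person schema below resolves in full.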
FAQPage
The answer harvester
FAQPage is the most under-shipped Stage 3 lever in the agency category. When a page genuinely answers eight to ten buyer-decision questions, FAQPage schema hands the engine a pre-extracted Q&A block ready to quote verbatim. The two anti-patterns to avoid are using FAQPage for marketing fluff (engines penalise promotional FAQ) and letting the schema content drift from the on-page FAQ instead of mirroring it exactly (engines flag the mismatch). Match the schema content to the visible content exactly, and reserve FAQ for genuine buyer questions, not objection-handling.
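A minimal FAQPage payload with one entry — the question and answer here are drawn from this brief for illustration, and the text field must mirror the visible on-page answer word for word:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is Stage 3 of the Five-Stage Citation Hierarchy?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Stage 3 is Structured Extractability — whether an AI answer engine can pull a clean, attributable answer block off your page."
      }
    }
  ]
}
</script>
```

Each additional buyer question becomes another Question object in the mainEntity array.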
HowTo
The procedural wrapper
HowTo applies when a page documents a procedure — a checklist, a sequence of steps, a methodology in stages. The Five-Stage Citation Hierarchy itself is the kind of content HowTo was designed for. Each step gets a name, text, and optionally an image. Engines reward HowTo because the structure maps cleanly to procedural prompts, the ones starting with "how do I" or "what's the process for." Apply HowTo on methodology pages, audit-roadmap pages, and any page that walks a buyer through a sequence.
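A HowTo sketch with two steps — the step names and text are illustrative, adapted from the checklist later in this brief, not a prescribed payload:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "Run the extractability checklist against a page",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Check the opener",
      "text": "Confirm the first 100 to 200 words contain a quotable, third-person answer to the page's implied question."
    },
    {
      "@type": "HowToStep",
      "name": "Audit the heading tree",
      "text": "Verify a two-level H2 / H3 hierarchy with semantic headings and no skipped levels."
    }
  ]
}
</script>
```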
Person
The author authority signal
Person schema on the founder or named author is the entity-disambiguation lever. Without it, the engine sees Organization-attributed content; with it, the engine sees a named human whose authority is independently verifiable via sameAs links to LinkedIn, X, and any other recognised profiles. Critical detail: the Person.sameAs must point to the human's personal profiles, not the company's. Pointing Person.sameAs at the company LinkedIn collapses the human and the brand into one entity, dilutes citation attribution, and undercuts the founder-voice signal — see the five signals breakdown for the wider context.
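A Person payload illustrating the correct shape — the profile URLs below are hypothetical placeholders; the point is that every sameAs entry targets the human's own profiles, never the company's:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jonathan Landman",
  "url": "https://example.com/about/jonathan-landman",
  "sameAs": [
    "https://www.linkedin.com/in/example-personal-profile",
    "https://x.com/example_personal_handle"
  ]
}
</script>
```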
Breadcrumb
The structural context
BreadcrumbList is the quiet workhorse. It tells the engine where the page sits in the site hierarchy, which lets the engine reason about authority transfer from the parent surface (a /citation-brief index page lifts each individual brief; a /systems/ai-visibility lifts each child page). BreadcrumbList is also the schema Google's SERP enrichment leans on for the breadcrumb display under blue links. Cheap to ship, runs on every page, no excuse to skip it.
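A two-level BreadcrumbList sketch — domain and paths are illustrative, mirroring the /citation-brief hierarchy described above:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Citation Briefs",
      "item": "https://example.com/citation-brief"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Stage 3 — Structured Extractability",
      "item": "https://example.com/citation-brief/stage-3-structured-extractability"
    }
  ]
}
</script>
```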
The extractability content pattern.
Schema is necessary but not sufficient. The content itself has to be shaped for extraction. Five elements compose the Wiele extractability pattern, and they reinforce one another — the bundle works because every element nudges the engine toward the same conclusion: this page contains a clean, attributable answer to the prompt.
The definitive opener. The first 100 to 200 words must contain a quotable answer to the page's implied question. Not a tease, not a setup, not a marketing hook — the answer itself, written in the third person, framed as if it could be lifted verbatim into an AI response. The 60-second-answer block at the top of every Wiele Citation Brief is the canonical implementation. Engines that scan a page in milliseconds will pull the first extractable block they find; if your first block is "we believe" and a competitor's is "the five signals are…", the engine picks the competitor.
A predictable H2 / H3 hierarchy. Two levels deep, semantic headings, no skipped levels. Every H2 names a substantive section; every H3 names a substantive subsection. Engines map the heading tree onto a passage-retrieval index — the cleaner the tree, the more accurately the engine locates the right passage to cite. Marketing-style headings (questions framed for click-bait, all-caps slogans) confuse the retrieval; declarative headings outperform them across every engine class.
Table-or-list layouts where the data permits. Pricing comparisons, feature breakdowns, multi-step procedures, multi-option trade-offs — anything that benefits from rows-and-columns or numbered-list structure should ship as a table or list, not as prose. Engines extract tables and lists almost atomically; prose containing the same data extracts less reliably. The cost is the discipline of structuring information; the reward is preferential citation.
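As an illustration of the prose-to-table move, the stage-timescale comparison from earlier in this brief could ship as markup like this (rows paraphrased for brevity):

```html
<table>
  <thead>
    <tr><th>Stage</th><th>Lever</th><th>Timescale</th></tr>
  </thead>
  <tbody>
    <tr><td>2 — Source Authority</td><td>Tier-1 placements, named-author publishing</td><td>Compounds over years</td></tr>
    <tr><td>3 — Structured Extractability</td><td>Schema bundle + content pattern</td><td>Two-to-four-week sprint</td></tr>
    <tr><td>4 — Freshness</td><td>Update cadence</td><td>Ongoing maintenance</td></tr>
  </tbody>
</table>
```

The same facts in a paragraph extract less reliably; the table gives the engine rows it can lift atomically.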
Internal anchor links. Every H2 should resolve to a stable URL fragment (e.g. /citation-brief/stage-3-structured-extractability#the-schema-bundle). Anchors let the engine cite a sub-section URL rather than the whole page — finer-grained attribution that survives content updates and lifts citation precision. Cheap to implement; Wiele's mdx-components wire this automatically via rehype-slug.
On-page citations and source attribution. When the page asserts a fact, it names the source — the Schema.org spec, Google Search Central documentation, public engine guidance, or the brand's own primary research. On-page citations signal to the engine that the page itself participates in the citation graph, which compounds with Stage 2 (Source Authority) and Stage 5 (Recommendation History) over time.
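In rendered HTML, the anchor pattern is just an id slug on each heading — the slugs below mirror this page's section titles for illustration; rehype-slug generates them automatically from the heading text:

```html
<h2 id="the-schema-bundle">The schema bundle</h2>
<!-- ...section content... -->
<h2 id="the-extractability-content-pattern">The extractability content pattern</h2>
```

Any engine or reader can then cite /citation-brief/stage-3-structured-extractability#the-schema-bundle directly.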
What NOT to do.
Five recurring Stage 3 antipatterns Wiele sees across pre-engagement audits — each one is a quiet citation-share leak.
Burying the answer three scrolls deep. The page eventually answers the question, but only after a 600-word setup, a brand origin story, and a manifesto on AI search. Engines that pull the first extractable block from the first 200 words never see the actual answer. Lead with it; expand around it.
Schema injected via client-side JavaScript. Some implementations inject JSON-LD via a runtime script tag after hydration. Non-JS crawlers (including the AI-engine crawlers that prioritise speed over execution) miss the schema entirely. Inline the JSON-LD in the rendered HTML response — server-side or static-rendered, not hydrated.
FAQPage stuffed with marketing FAQ. Schema.org defines FAQPage for genuine question-and-answer content. Pages that wrap promotional CTAs ("Why choose us?" "How long until I see results?") in FAQPage schema get flagged by Google's rich-result eligibility filter and silently dropped from the answer set. Reserve FAQPage for buyer-decision questions with substantive answers.
H1 + paragraph-only structure. A 3,000-word page with a single H1 and no H2 / H3 breakdown is a wall of prose with no extraction handles. The engine can't locate the right passage; the founder voice is uniform but un-quotable. Two levels of semantic heading hierarchy, minimum, for any page over 800 words.
No author attribution. The page is unsigned, no Person schema, no byline, no bio link. Founder voice (Stage 5 of the hierarchy) evaporates — the engine can't weigh the author's authority because the author doesn't exist as an entity. Name the author; ship Person schema with the correct sameAs; link to a bio that the engine can crawl.
The Wiele extractability checklist.
Run this against any page that you want cited. Ten items, two minutes per page, ownable as a deliverable. Wiele engagements ship a populated version of this checklist as part of every Signal Audit; the version below is the methodology you can apply yourself.
- The first 100 to 200 words contain a quotable, third-person answer to the page's implied question.
- H2 / H3 hierarchy is two levels deep, semantic, no skipped levels.
- Article schema wraps the page (or WebPage / Service / Product as the page type dictates).
- FAQPage schema is present if and only if the page contains genuine buyer Q&A.
- HowTo schema is present if the page documents a procedure.
- Person schema names the author with correct sameAs (personal profile, not company).
- BreadcrumbList schema is present and matches the visible breadcrumb.
- JSON-LD is inlined in the rendered HTML response — visible in view-source, not injected via JavaScript.
- Tables or lists are used wherever the data permits comparison or enumeration.
- Every H2 resolves to a stable anchor URL via an id slug (e.g. #the-schema-bundle).
If a page fails three or more of these, it is leaking Stage 3 citation share — and the lift from fixing it is usually visible within one to two monthly engine runs.
Methodology & sources.
Stage 3 patterns observed across the Wiele AI Citation Tracker dataset (private, anonymised) over 18 months of weekly engine runs. Engine-specific guidance triangulated against each provider's public documentation:
- Schema.org — Article, FAQPage, HowTo, Person, BreadcrumbList type specifications
- Google Search Central — Structured data guidelines and Rich Results eligibility
- Google AI Overviews — Documented opt-in mechanics via the robots meta tag
- Bing Webmaster — IndexNow protocol and structured-data recommendations
- OpenAI ChatGPT search — Cited-source format observed across 240+ panel queries
- Perplexity — Public methodology on source weight and citation attribution
- Wiele Citation Tracker dataset — 18 months · 12 client cohorts · weekly engine runs across 10 engines
- The Five-Stage Citation Hierarchy (Citation Brief #001) and the five citation signals framework (Wiele Labs) provide the broader methodological context
Every claim above is reproducible from public sources or Wiele's instrumented engine-run dataset. Engagement clients receive the named-competitor Stage-3 lift trace inside the Citation Score™ dashboard. The full prompt panel, source-level citation logging, and methodology rubric are published at /trust.
Stage 3 is where the citation game gets won fastest. If you want a populated extractability scorecard against your live site — and the 30-day roadmap to close the gaps — start with a Signal Audit, or instrument the lift over time with the Citation Score™ Authority retainer. Brief #001 covers the full Five-Stage Citation Hierarchy; the AI Visibility system page details how Stage 3 sits inside the wider Wiele engagement shape.
Questions on this brief.
The next step
Start with a Signal Audit.
A diagnostic that maps your citation graph, entity baseline, and authority gaps — plus a 30-day implementation roadmap. The fastest way to know where you stand inside the answer economy.

