The Role of Structured Data in Generative AI
Structured data plays an important supporting role in enhancing generative AI’s understanding and outputs, even if it doesn’t directly influence AI rankings. It helps AI systems differentiate entities like brand, price, and reviews, ensuring accurate and contextually rich AI-generated summaries. Structured data also feeds knowledge graphs that generative AI models rely on. For instance, Bing Chat uses Schema.org data to enhance its knowledge panels. Structured data is also applied in Retrieval-Augmented Generation (RAG) setups to retrieve accurate information from knowledge bases, ensuring consistency and reliability in model responses. Content generation tools such as Jasper and HubSpot’s AI content assistant can parse structured data to generate accurate product descriptions, FAQs, or summaries.
Structured data is written as in-page markup on the page it describes. Avoid creating blank pages solely to hold structured data, and avoid marking up information that is not visible to the user. The Rich Results Test is a useful tool for validating structured data.
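As an illustration of what such in-page markup can look like, the following minimal sketch builds a Schema.org Product description as JSON-LD and wraps it in the script element that would sit in the page's HTML. The helper names (`product_jsonld`, `to_script_tag`) and the sample product values are hypothetical and chosen only for the example; the properties used (brand, offers, aggregateRating) are standard Schema.org vocabulary, but which fields a given search feature requires should be checked against its documentation.

```python
import json


def product_jsonld(name, brand, price, currency, rating, review_count):
    """Build a Schema.org Product description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",  # price as a two-decimal string
            "priceCurrency": currency,
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }


def to_script_tag(data):
    """Wrap JSON-LD in the script element that is embedded in the page."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    markup = product_jsonld("Trail Runner 2", "ExampleBrand", 89.5, "USD", 4.6, 212)
    print(to_script_tag(markup))
```

The markup should only describe values that are also visible on the page itself, in line with the guidance above.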
Challenges and Limitations of Structured Data in AI
Despite its benefits, using structured data in generative AI presents challenges. Not all generative AI models can parse Schema.org markup or JSON-LD without preprocessing, which may require additional steps. Improper implementation of Schema.org markup, such as missing fields or outdated schema types, can lead to misinterpretation or complete disregard by AI systems. Some AI systems primarily train on large text corpora and may not inherently prioritize structured data unless explicitly fine-tuned for it.
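The preprocessing step mentioned above can be sketched roughly as follows: pull the JSON-LD blocks out of a page's HTML, skip any that fail to parse, and flag items that lack expected fields. This is a simplified sketch using only the Python standard library. The `REQUIRED` table is a hypothetical rule set invented for the example, not an official list of required properties.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return the parsed JSON-LD items found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    items = []
    for block in parser.blocks:
        try:
            parsed = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is simply skipped
        items.extend(parsed if isinstance(parsed, list) else [parsed])
    return items


# Hypothetical rule set for the example; not an official requirements list.
REQUIRED = {"Product": ["name", "offers"]}


def missing_fields(item):
    """List the expected fields that are absent from a JSON-LD item."""
    item_type = item.get("@type") if isinstance(item, dict) else None
    if not isinstance(item_type, str):
        return []
    return [field for field in REQUIRED.get(item_type, []) if field not in item]
```

A pipeline could then pass only the items with no missing fields to the model, or log the gaps so the markup can be fixed at the source.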
SEO vs. SGE: The Difference
One author puts it succinctly: the “currency” of Google search was links, relevant content, smart keyword use, and references, while the “currency” of large language models is “mentions” (specifically, words that appear frequently near other words) across the training data. Crawlers and bots remain key to both SEO and SGE, with OpenAI using Bing’s bot data for its model. SEO focuses on URLs, links, and keywords, whereas SGE is about prediction and consumes vast amounts of data for training purposes. A goal for SGE is therefore to be mentioned on many different sites, not just in passing but in substantive features on sites the LLM treats as sources of training data. However, models can be purposefully “biased” by company and political dynamics; for example, OpenAI prefers search results that rank better on Bing.com. LLM companies are also rumored to prefer those who allow them to train on their data.
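To make the idea of “mentions” as words that appear near other words more concrete, here is a toy sketch that counts which words occur within a few tokens of a brand name across a set of texts. It is only an illustration of co-occurrence counting and should not be read as how any LLM is trained. The function name, the window size, and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def cooccurrence_counts(texts, target, window=5):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    target = target.lower()
    for text in texts:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start, end = max(0, i - window), i + window + 1
            # Neighbors on both sides, excluding the target occurrence itself.
            counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts


if __name__ == "__main__":
    # Made-up sentences standing in for pages from different sites.
    pages = [
        "Acme makes the best espresso machines",
        "For espresso, many reviewers mention Acme",
    ]
    print(cooccurrence_counts(pages, "Acme").most_common(3))
```

In this toy setting, a brand that is mentioned next to a topic word on more pages accumulates a higher count for that word, which is the intuition behind pursuing mentions across many sources.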
Understanding AI Overviews and When They’re Triggered
AI Overviews (AIOs) are summaries that appear at the top of search pages. To rank well in Google’s AIOs, the focus should be on creating high-quality, authoritative content that directly addresses user queries and aligns with Google’s E-E-A-T standards. Prioritizing clear, concise language, proper formatting, and internal linking improves skimmability and user experience.
Optimizing content for search intent, leveraging structured data, and ensuring a fast, mobile-friendly website are also crucial. Not all searches have an AI Overview.
There are patterns in when AIOs appear for certain keywords. AIO presence increased for “best” keywords (50% uptick), “what is” questions (approximately 20%), and “how to” queries (about 15%). Increases were also seen for “symptoms of” searches (roughly 12%), keywords related to data governance, analytics, and cloud technologies (approximately 10%), and “treatment”-related queries (about 10%). Together, these changes suggest that AIOs are appearing more often for informational and process-oriented searches.
Decreases in AIO presence were noted for “Vs” comparisons (approximately 20% decrease, possibly to avoid hallucinations), brand-specific queries (15% fewer keywords), and general product categories (about 14% fewer keywords). Lifestyle-related queries also saw approximately 12% fewer AIOs, and product model numbers or technical specifications displayed roughly 10% fewer AIOs. AIO presence was reduced by approximately 25% for basic tech support queries and approximately 15% for general wellness queries. These decreases suggest that AIOs may not add value in certain contexts where other search features are present or when answers are volatile. The changes overall suggest AIOs are more likely to be triggered by users seeking in-depth, specialized information, particularly in medical, technological, and financial domains, rather than for everyday consumer decisions or basic troubleshooting.
Niches that get the most SGE snippets are casual, everyday topics with extensive content, such as Food and Beverage, Business, and Relationships, where AI can generate satisfying answers. Google takes extra care when generating content around sensitive or high-stakes topics, which often involve YMYL sites and require thorough fact-checking.
A graph illustrates how often AIOs appear based on the number of words in a query, showing that longer queries generally trigger AIOs more frequently. For example, single-word queries have AIOs 12.03% of the time, while eight-word queries show AIOs 32.11% of the time.
SGE Snippet Preferences and Ranking
Google tends to refer to the top organic results when selecting links for SGE snippets. Authoritative, popular sites are favored when picking links for SGE snippets, and these sites come from various categories, rarely being niche-specific, though niche sites may appear if closely matched to the query. Forum sites like Quora and Reddit frequently appear in SGE snippets due to their discussions and first-hand experiences. Google also includes Google Maps links in SGE snippets for local search queries. Small websites may have difficulty appearing in SGE snippets as Google mainly picks authoritative sites with many backlinks and keywords.

Best Practices for SGE and SGE Ranking
The report outlines best practices for SGE and SGE ranking, covering technical aspects, content, tracking, and foundational SEO hygiene.
Technical Best Practices:
- LLM Crawlers and Bots: Ensure Bing’s bot and OAI-SearchBot are not excluded in your robots.txt file, and allow ChatGPT-User and GPTBot. While you can exclude these bots, there is a rumor that OpenAI “prefers” those who allow it to use website data in SGE results.
- Structured Data: Check your structured data for your website and all content. Use Schema where it matters, focusing on enhancing traditional search features like rich snippets and FAQs. Optimize structured data for Knowledge Graphs, ensuring it is clean, complete, and properly implemented. Experiment with how structured data impacts your site’s visibility in tools like Bing Chat or SGE.
- Content Structure: Structure content explicitly as questions and direct answers to increase the likelihood of it being surfaced by Google’s AI models.
- Technical SEO: Maximize technical SEO for improved crawling of on-page content, as Google’s AI models still rely on crawling a site’s content. Ensure pages return an HTTP 200 (success) status. Pages must have indexable content (supported file types) and must not violate spam policies.
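The crawler check from the first item in this list can be sketched with Python's standard `urllib.robotparser` module. The sketch below parses robots.txt rules from a string and reports which of the bots named above would be blocked from a given URL. The agent list, the `blocked_agents` helper, the `https://example.com/` URL, and the sample rules are assumptions for the example; "bingbot" is used as the user-agent token for Bing's crawler.

```python
from urllib.robotparser import RobotFileParser

# Bots named in the list above; "bingbot" is assumed as the token for Bing's bot.
AI_AGENTS = ["bingbot", "OAI-SearchBot", "ChatGPT-User", "GPTBot"]


def blocked_agents(robots_txt, url="https://example.com/", agents=AI_AGENTS):
    """Return the agents that the given robots.txt rules block from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks GPTBot and allows everyone else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_agents(sample))  # prints ['GPTBot']
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the deployed file instead of a string, which is closer to what a crawler would actually see.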
Content Best Practices:
- Topic Overview Pages: Create comprehensive topic overview content that covers the entire user journey, from initial research to final purchasing decisions, to position those pages as prime sources for Google’s AI.
- Keywords: Leverage long-tail keywords, as SGE encourages detailed queries. Align content with the intent of keywords using search tools.
- SERP Analysis: Analyze the SERP for chosen keywords before writing content to understand what makes top results rank, their depth, and how to create better content.
- Natural Language: Write content in natural, conversational language, avoiding overly formal tone or “fluffy” content.
- AI-Friendly Content: Focus on value-driven, well-structured, and authoritative content that AI can easily find and summarize. Use bullet points and logical structures (H2s, H3s, H4s), and avoid overly long sentences. Ensure content is information-rich and provides value to the LLM. Review and revise any existing “AI-unfriendly” content.
- Search Journey Tool: Use search journey tools to understand the series of queries a user may take on a topic, informing keyword targeting.
Tracking and Monitoring:
- AI Overview Tracking: Track search volume for queries that currently trigger AI Overviews to reveal content gaps and optimization opportunities, prioritizing high-value terms.
- Ranking Tools: Use tools like Otterly.AI to check SGE rankings.
Foundational SEO Hygiene:
- Basics: Update your Google Business Profile with accurate contact details and operating hours. Ensure good reviews on Google, Reddit, etc. Maintain consistency across directories for names, addresses, and phone numbers. Maintain SEO best practices. Don’t ignore local SEO and local reviews, as LLMs use these for training data more than traditional search engines. Track both SEO and LLM/SGE rankings.
- Content and Metadata: Update content regularly to ensure relevance, accuracy, and usefulness (E-E-A-T). Include keywords in titles, headings, alt text, image file names, and descriptions. Ensure compliance with structured data requirements for various result types. You must have a favicon, site name, domain, breadcrumb, and visible URL configured.
Broader Visibility in LLM Training Data:
- Estimate AIO Presence: Estimate what percentage of the time users searching for your products or services will get an AI Overview, and invest resources accordingly. Note that ranking in an AIO isn’t the same as ranking in ChatGPT, which only produces AI-generated answers in its own interface.
- Pursue Mentions: Since ChatGPT pulls from extensive internet training data, if you want to appear in results like “top restaurants in San Francisco,” ensure you are mentioned on review sites, in magazine articles, Reddit posts, and cultural publications. Improving SGE means pursuing other avenues that serve as “inputs” for SGE. The more your business appears in multiple sources, the more it increases the LLM’s “prediction” percentage that you should be in said results.
- Identify LLM Data Sources: Understand your business and industry to identify where LLMs would look for data or on which data they have been trained. You can ask multiple language models about their training data. Tools like SparkToro or BuzzSumo can help analyze where people end up for their searches, which can then be used to ask ChatGPT if those sources are used in training data.
- Featured Status: Pursue featured status on high-authority Q&A and information sites like Quora and Reddit, as they are frequently cited in Google’s AI overviews.
- Partnerships: Pursue partnerships with generative AI players like OpenAI and Google, as their partnership teams are looking to connect with large players in their markets. It’s worth having conversations to understand how to rank in their results, especially if you have data of interest for training their models. OpenAI has a data partnership form.
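For the "Estimate AIO Presence" item above, one simple way to arrive at a percentage is to weight each tracked query by its search volume and sum the volume of the queries that currently show an AI Overview. The sketch below assumes you already have such a list from a rank-tracking tool. The `Keyword` structure, the queries, and the volumes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    query: str
    monthly_searches: int
    triggers_aio: bool


def aio_share(keywords):
    """Fraction of total search volume that lands on queries showing an AIO."""
    total = sum(k.monthly_searches for k in keywords)
    if total == 0:
        return 0.0
    with_aio = sum(k.monthly_searches for k in keywords if k.triggers_aio)
    return with_aio / total


# Hypothetical tracking data; real figures would come from your own tooling.
keywords = [
    Keyword("how to descale an espresso machine", 4000, True),
    Keyword("best espresso machine", 5000, True),
    Keyword("acme espresso x200", 1000, False),
]

if __name__ == "__main__":
    print(f"{aio_share(keywords):.0%}")  # prints 90%
```

Weighting by volume rather than counting keywords keeps a handful of high-traffic queries from being drowned out by many low-traffic ones when deciding how much to invest.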
Conclusion
In conclusion, the evolving landscape of search, particularly with the rise of Search Generative Experience (SGE) and Large Language Models (LLMs), necessitates a strategic shift in content optimization. While traditional SEO focused on links and keywords, the new “currency” of LLMs is “mentions” across vast training data, emphasizing the importance of qualitative features and broad visibility. Structured data, though not a direct ranking factor, plays a crucial supporting role in enhancing AI’s contextual understanding, feeding knowledge graphs, and enabling accurate content generation. However, challenges such as parsing issues, quality of implementation, and model-specific behaviors must be addressed for effective utilization of structured data.
AI Overviews (AIOs) are becoming increasingly prevalent for detailed, specialized queries, particularly in medical, technological, and financial domains, while appearing less for basic troubleshooting or product comparisons. To thrive in this environment, content creators must prioritize high-quality, authoritative, and E-E-A-T-aligned content that is clear, concise, and well-structured. Technical SEO remains vital for ensuring content crawlability and rendering, alongside the critical step of ensuring LLM crawlers are not excluded. Strategic content development should involve creating topic overview pages, leveraging long-tail keywords, aligning content with user intent, and writing in natural language. Beyond on-page optimization, broadening visibility through mentions on review sites, magazine articles, and Q&A platforms like Quora and Reddit, and even pursuing partnerships with generative AI players, will be instrumental in influencing LLM training data and improving SGE rankings. Ultimately, a holistic approach that combines foundational SEO hygiene with a keen understanding of AI’s consumption patterns will be key to success in the evolving search landscape.




