XML Sitemap: The Complete Guide

What to include, what to leave out, how to submit to Google and Bing, and how to validate your sitemap — from setup to ongoing maintenance.

What an XML sitemap does

An XML sitemap is a file that lists the URLs you want search engines to discover and index. It is not a guarantee of indexing — Google decides for itself what to index — but it is the most direct way to tell Google which pages exist on your site and which you consider important.

Sitemaps are particularly useful for large sites, new sites with few external links, sites with pages that are not well-linked internally, and any site where content is updated frequently.

What a valid sitemap looks like

The minimum valid XML sitemap contains the XML declaration, the urlset root element, and one or more url entries each containing a loc element:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about/</loc>
    <lastmod>2026-03-15</lastmod>
  </url>
</urlset>

Only loc is required. The optional fields:

  • lastmod: The date the page was last meaningfully changed. Google uses it as a recrawl signal only when it is consistently and verifiably accurate. Use W3C Datetime format, a profile of ISO 8601 (YYYY-MM-DD, optionally with a time and timezone). Only update this when content actually changes, not on every deploy.
  • changefreq: A hint about update frequency. Google ignores this value. Not worth maintaining carefully.
  • priority: A relative priority from 0.0 to 1.0 (default 0.5), meaningful only relative to other URLs on your own site. Google ignores this value as well. Not worth maintaining carefully.
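If you generate the file yourself, the structure above can be sketched in a few lines of TypeScript. This is a minimal sketch, not a complete generator: the SitemapEntry shape and the function names are hypothetical, and it emits only loc and lastmod because the other optional fields are ignored by Google.

```typescript
// Hypothetical entry shape for this sketch; not part of any library.
interface SitemapEntry {
  loc: string;    // absolute canonical URL
  lastmod?: Date; // date of the last meaningful content change
}

// Entity-escape the five characters the Sitemaps protocol requires.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Format a Date as YYYY-MM-DD, using UTC.
function toLastmod(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Build the sitemap XML as a single string.
function buildSitemap(entries: SitemapEntry[]): string {
  const urls = entries.map((entry) => {
    const lines = [`    <loc>${escapeXml(entry.loc)}</loc>`];
    if (entry.lastmod) {
      lines.push(`    <lastmod>${toLastmod(entry.lastmod)}</lastmod>`);
    }
    return `  <url>\n${lines.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```

Escaping matters in practice: a URL containing a raw & would otherwise make the whole file invalid XML.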

What to include in your sitemap

Your sitemap should contain the canonical URL for every page you want indexed. The rule of thumb: if a page should appear in search results, it belongs in the sitemap.

  • Homepage
  • All published blog posts and articles
  • All live product pages
  • All live category pages
  • Core landing pages (services, features, pricing, about, contact)
  • Guide and resource pages

What to leave out

A sitemap full of noindex pages, redirects, and thin content tells Google your sitemap cannot be trusted and reduces its usefulness.

  • Pages with noindex tags: If you do not want the page indexed, do not include it in the sitemap. Including noindex pages sends contradictory signals and can make Google treat the file as a less reliable source.
  • Redirect URLs: Sitemaps should contain only final destination URLs. Do not include URLs that redirect to other pages.
  • 4xx and 5xx URLs: Only include pages that return 200 HTTP responses.
  • Non-canonical URLs: If a page has a canonical tag pointing to a different URL, use the canonical URL in the sitemap, not the page URL.
  • Low-value parameter URLs: Filtered, sorted, and session-based URLs should be excluded. Include only the clean base URL.
  • Paginated pages: Include page 1 of a paginated series. Exclude pages 2, 3, etc. unless they have distinct indexable content.
  • Admin, login, and account pages: These should not be indexed.
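The exclusion rules above can be expressed as a single filter. The sketch below assumes you already have a record per page from your own crawler or CMS export; the PageRecord shape and function names are hypothetical. It also treats every URL with a query string as a low-value parameter URL, a simplification you may need to relax.

```typescript
// Hypothetical page record; field names are assumptions for this sketch.
interface PageRecord {
  url: string;
  status: number;     // HTTP status returned when the URL is requested directly
  noindex: boolean;   // true if a robots meta tag or X-Robots-Tag says noindex
  canonical?: string; // value of rel="canonical", if present
}

// Return the reason a page should be left out, or null if it belongs in the sitemap.
function exclusionReason(page: PageRecord): string | null {
  if (page.status >= 300 && page.status < 400) return "redirect";
  if (page.status !== 200) return "non-200 status";
  if (page.noindex) return "noindex";
  if (page.canonical && page.canonical !== page.url) return "non-canonical";
  // Simplification: any query string counts as a parameter URL.
  if (page.url.includes("?")) return "parameter URL";
  return null;
}

// Keep only the URLs that pass every check.
function sitemapUrls(pages: PageRecord[]): string[] {
  return pages.filter((page) => exclusionReason(page) === null).map((page) => page.url);
}
```

Returning a reason rather than a boolean makes it easy to log why each URL was dropped.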

Sitemap index files

The Sitemaps protocol limits a single sitemap file to 50,000 URLs and 50 MB uncompressed. Larger sites should use a sitemap index file that references multiple individual sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
  </sitemap>
</sitemapindex>

Splitting sitemaps by content type also makes it easier to monitor indexing rates for different sections of your site in Google Search Console.
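For a site that outgrows one file, the split can be automated. The following is a rough sketch with hypothetical function names: it enforces only the 50,000-URL limit (not the file size limit) and assumes the child sitemap URLs you pass in contain no characters that need XML escaping.

```typescript
const MAX_URLS_PER_SITEMAP = 50000; // URL limit from the Sitemaps protocol

// Split a flat URL list into chunks that each fit in one sitemap file.
function chunkUrls(urls: string[], size: number = MAX_URLS_PER_SITEMAP): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Build the sitemap index XML for a list of child sitemap URLs.
// Assumption: the URLs are generated by you and need no XML escaping.
function buildSitemapIndex(sitemapLocs: string[]): string {
  const items = sitemapLocs.map(
    (loc) => `  <sitemap>\n    <loc>${loc}</loc>\n  </sitemap>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...items,
    "</sitemapindex>",
  ].join("\n");
}
```

Each chunk would then be written to its own file and the resulting file URLs passed to buildSitemapIndex.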

How to generate your sitemap

WordPress

Yoast SEO, Rank Math, and All in One SEO all generate XML sitemaps automatically. WordPress has also shipped a built-in sitemap at /wp-sitemap.xml since version 5.5. The plugin-generated sitemaps give you more control over what is included and excluded.

Shopify

Shopify generates a sitemap automatically at /sitemap.xml. It includes products, collections, blog posts, and pages. You cannot edit the sitemap directly, but you can control which pages are included by setting their indexability in Shopify's admin.

Next.js

Next.js App Router supports sitemap generation natively. Create an app/sitemap.ts file that exports a function returning an array of sitemap entries:

import type { MetadataRoute } from 'next'

export default function sitemap(): MetadataRoute.Sitemap {
  return [
    {
      url: 'https://example.com',
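      // Use the date the content last changed; new Date() would report
      // the build or request time instead.
      lastModified: new Date('2026-05-01'),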
      changeFrequency: 'weekly',
      priority: 1,
    },
  ]
}

Static sites

Static site generators (Hugo, Jekyll, Astro, Eleventy) typically include sitemap plugins or built-in sitemap generation. Check your framework's documentation for the recommended approach.

How to submit your sitemap

robots.txt

Add a Sitemap: directive to your robots.txt file, using the full absolute URL. Crawlers that support the directive, including Googlebot and Bingbot, will discover your sitemap automatically:

Sitemap: https://example.com/sitemap.xml
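To confirm the directive is present and readable, you can extract it the way a crawler would. This is a minimal sketch with a hypothetical function name; it takes the robots.txt text as a string, so fetching the file is left to you.

```typescript
// Extract every "Sitemap:" URL from the text of a robots.txt file.
// The field name is matched case-insensitively and "#" comments are dropped.
function sitemapsFromRobots(robotsTxt: string): string[] {
  const found: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim();
    const match = /^sitemap\s*:\s*(\S+)/i.exec(line);
    if (match) {
      found.push(match[1]);
    }
  }
  return found;
}
```

A robots.txt file may list more than one sitemap, which is why the function returns an array.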

Google Search Console

In Search Console, go to Sitemaps under the Indexing section. Enter your sitemap URL and click Submit. Search Console then shows you the status of your sitemap: when it was last read, how many URLs Google discovered in it, and any errors found. The linked page indexing report shows how many of those URLs are indexed.

Bing Webmaster Tools

Submit your sitemap in Bing Webmaster Tools under Sitemaps. Bing's index feeds ChatGPT and Grok search results, so submission here matters beyond Bing's own traffic.

Common sitemap mistakes

Including noindex pages

The most common mistake. If a URL is in your sitemap but carries a noindex tag, you are sending contradictory signals. Google may deprioritise your sitemap as a discovery tool.

Including redirect URLs

After a migration or URL restructure, old URLs often end up in the sitemap. Crawlers follow the redirect, which wastes crawl budget and signals that your sitemap is not properly maintained.

Not updating after content changes

Deleted pages that remain in the sitemap return 404s. New pages that are not added to the sitemap may take longer to be discovered. Keep your sitemap in sync with your live content.

Sitemap blocked by robots.txt

If your robots.txt blocks the path where your sitemap lives, crawlers cannot read it. Verify that your sitemap URL is accessible.
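One way to verify this is to test the sitemap path against your robots.txt rules. The sketch below is deliberately simplified and uses hypothetical names: it reads only the groups that apply to User-agent: *, uses plain prefix matching with the longest match winning, and does not handle the * and $ wildcards that real crawlers support.

```typescript
interface Rule {
  allow: boolean;
  path: string;
}

// Collect Allow/Disallow rules from groups that include "User-agent: *".
function rulesForAllAgents(robotsTxt: string): Rule[] {
  const rules: Rule[] = [];
  let groupApplies = false; // does the current group include "User-agent: *"?
  let lastWasAgent = false; // consecutive User-agent lines share one group
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim();
    const match = /^([A-Za-z-]+)\s*:\s*(.*)$/.exec(line);
    if (!match) continue;
    const field = match[1].toLowerCase();
    const value = match[2].trim();
    if (field === "user-agent") {
      groupApplies = lastWasAgent ? groupApplies || value === "*" : value === "*";
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (groupApplies && (field === "allow" || field === "disallow") && value !== "") {
      rules.push({ allow: field === "allow", path: value });
    }
  }
  return rules;
}

// Longest matching prefix wins; Allow wins a tie; no matching rule means allowed.
function isBlocked(robotsTxt: string, path: string): boolean {
  let best: Rule | null = null;
  for (const rule of rulesForAllAgents(robotsTxt)) {
    if (!path.startsWith(rule.path)) continue;
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best !== null && !best.allow;
}
```

For example, with a rule of Disallow: /maps/, the path /maps/sitemap.xml is reported as blocked while /sitemap.xml is not.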

How Crawly validates your sitemap

Crawly cross-references your sitemap against the crawl results on every run. It flags:

  • Sitemap URLs that return non-200 status codes
  • Sitemap URLs blocked by robots.txt
  • Sitemap URLs with noindex tags
  • Sitemap URLs that redirect (rather than returning 200 directly)
  • Pages not in the sitemap that are discovered through crawling

This gives you a live view of sitemap health after every crawl, without needing to wait for Search Console to report issues.
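The general idea of cross-referencing a sitemap against crawl results can be sketched as a set comparison. This is an illustration only, not Crawly's implementation: the CrawlResult and SitemapReport shapes and the auditSitemap function are hypothetical, and the sketch covers three of the checks listed above.

```typescript
// Hypothetical crawl result; field names are assumptions for this sketch.
interface CrawlResult {
  url: string;
  status: number;   // HTTP status returned when the URL is requested directly
  noindex: boolean;
}

interface SitemapReport {
  non200: string[];             // listed in the sitemap, but not a 200 response
  noindex: string[];            // listed in the sitemap, but marked noindex
  missingFromSitemap: string[]; // found by crawling, but not listed
}

function auditSitemap(listedUrls: string[], crawl: CrawlResult[]): SitemapReport {
  const inSitemap = new Set(listedUrls);
  const report: SitemapReport = { non200: [], noindex: [], missingFromSitemap: [] };
  for (const page of crawl) {
    if (!inSitemap.has(page.url)) {
      report.missingFromSitemap.push(page.url);
    } else if (page.status !== 200) {
      report.non200.push(page.url);
    } else if (page.noindex) {
      report.noindex.push(page.url);
    }
  }
  return report;
}
```

Because redirects return a 3xx status, they land in the non200 bucket here rather than in a separate list.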
