
What is an XML Sitemap and Do You Need One?

An XML sitemap lists every URL you want search engines to index. Here is what to include, what to leave out, how to submit it, and the mistakes to avoid.

16 May 2026 · 6 min read

An XML sitemap is a file that lists every URL on your website you want search engines to index. You submit it to Google and Bing directly, so their crawlers get a complete map of your site instead of having to discover every page through internal links alone.

The file usually lives at yoursite.com/sitemap.xml and contains a structured list of URLs alongside optional metadata: when each page was last modified, how frequently it changes, and its relative priority.

Why does an XML sitemap matter for SEO?

Search engines find pages by following links. But not every page on your site is well-linked. New pages, orphaned pages, or pages buried deep in your navigation may take weeks or months for Googlebot to discover through crawling alone.

A sitemap accelerates discovery. When you submit one to Google Search Console or Bing Webmaster Tools, the search engine immediately knows every URL you want indexed. It still decides which pages to crawl and index on its own terms, but it has the full picture from day one.

For most small to medium sites, a sitemap does not dramatically change how many pages get indexed. The benefit grows with site size and complexity: ecommerce stores with thousands of products, news publishers with rapid content velocity, and sites with weak internal link structures all gain more from a sitemap than a simple ten-page website.

What should be in your XML sitemap?

Include only the pages you want indexed. The sitemap is a signal to Google about what matters on your site. Filling it with low-quality or non-canonical pages sends a mixed message.

Include:

  • All canonical, indexable pages
  • Your most important landing pages, product pages, and blog posts
  • Pages you actively want to rank in search results

Exclude:

  • Pages with a noindex tag (contradictory to include them)
  • Paginated pages beyond page one (unless each has unique indexable content)
  • Duplicate pages that have a canonical pointing elsewhere
  • Thank-you pages, checkout confirmation pages, admin pages
  • URLs with parameters that create duplicate content

A sitemap full of noindex pages or URLs whose canonical points elsewhere creates confusion and wastes Googlebot's crawl budget on pages you do not want it to visit.
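The include and exclude rules above can be sketched as a small filter. This is a minimal illustration, not a definitive implementation: the Page record and its field names (status, noindex, canonical) are assumptions made for the example, and a real crawl export will be shaped differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical crawl record; the field names are assumptions for this sketch."""
    url: str
    status: int = 200                # HTTP status code returned for the URL
    noindex: bool = False            # True if the page carries a robots noindex
    canonical: Optional[str] = None  # canonical URL declared by the page, if any


def belongs_in_sitemap(page: Page) -> bool:
    """Return True only for live, indexable, self-canonical pages."""
    if page.status != 200:
        return False  # redirects and error pages stay out
    if page.noindex:
        return False  # noindex contradicts sitemap inclusion
    if page.canonical is not None and page.canonical != page.url:
        return False  # duplicate whose canonical points elsewhere
    return True


pages = [
    Page("https://www.example.com/"),
    Page("https://www.example.com/thank-you/", noindex=True),
    Page("https://www.example.com/shoes/?sort=price",
         canonical="https://www.example.com/shoes/"),
    Page("https://www.example.com/old-page/", status=301),
]

# Only the pages that pass every rule are kept for the sitemap.
sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
```

Keeping the rules in one function means the same check can be reused when generating the sitemap and when auditing an existing one.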

What does an XML sitemap look like?

A basic sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2026-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/about/</loc>
    <lastmod>2026-03-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

The only required field is <loc>: the full URL of each page. <lastmod>, <changefreq>, and <priority> are optional, and Google has stated that it ignores <changefreq> and <priority> values. <lastmod> is the most useful of the three, provided it reflects the actual date content was meaningfully updated (not just a system timestamp from a CMS rebuild).
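If your CMS does not produce a sitemap for you, the format is simple enough to generate with a short script. The sketch below uses Python's standard library; the build_sitemap function and its (url, lastmod) input shape are assumptions for illustration. It writes only <loc> and, when available, <lastmod>.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a <urlset> document from (url, lastmod) pairs.

    lastmod may be None, in which case the element is omitted.
    Returns the serialized XML as UTF-8 bytes.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, lastmod in entries:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL on output.
        ET.SubElement(url_el, "loc").text = url
        if lastmod is not None:
            ET.SubElement(url_el, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://www.example.com/", "2026-05-01"),
    ("https://www.example.com/about/", None),
])
```

Writing xml_bytes to a file named sitemap.xml at the site root would give a file in the same shape as the example above.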

Sitemap index files

Large sites use a sitemap index file: a single XML file that points to multiple individual sitemaps, each covering a section of the site.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
</sitemapindex>

Each individual sitemap can contain up to 50,000 URLs and must be no larger than 50 MB uncompressed. For very large sites, splitting sitemaps by content type makes issues easier to diagnose: if blog posts are being indexed poorly, isolating them in their own sitemap lets you track the problem separately from the rest of the site.
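As a rough sketch of how the 50,000-URL limit plays out, the script below splits a long URL list into chunks and builds an index that points at one file per chunk. The file naming pattern (sitemap-products-1.xml and so on) and the generated product URLs are hypothetical, and the 50 MB size limit is not checked here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Build a <sitemapindex> document listing each child sitemap URL."""
    root = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for sitemap_url in sitemap_urls:
        sitemap_el = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = sitemap_url
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Hypothetical product URLs, generated only to exercise the split.
product_urls = [f"https://www.example.com/products/{n}/" for n in range(120_000)]
chunks = chunk_urls(product_urls)

# One index entry per chunk; each chunk would be written to its own file.
index_xml = build_sitemap_index(
    f"https://www.example.com/sitemap-products-{i}.xml"
    for i in range(1, len(chunks) + 1)
)
```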

How to submit a sitemap to Google

Two methods:

1. Google Search Console: go to Sitemaps under the Indexing section, enter your sitemap URL, and submit. Google will report the number of URLs submitted and how many it has indexed. The gap between submitted and indexed is worth monitoring.

2. robots.txt: add a Sitemap directive to your robots.txt file:

Sitemap: https://www.example.com/sitemap.xml

This lets any crawler that reads your robots.txt find the sitemap, even without a Search Console submission.

Both methods work. Using both is sensible.
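To confirm the robots.txt directive is readable, you can parse the file the way a crawler would. This minimal sketch uses Python's built-in urllib.robotparser (the site_maps() method needs Python 3.8 or later) on an inline sample; the robots.txt content shown is an assumed example, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, inlined so the example runs without network access.
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns None when no Sitemap line is present, so fall back to [].
declared_sitemaps = parser.site_maps() or []
```

An empty declared_sitemaps list would mean crawlers cannot discover the sitemap from robots.txt.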

Common XML sitemap mistakes

Including noindex pages

If a page has <meta name="robots" content="noindex">, including it in your sitemap creates a contradiction: you are telling Google to index it via the sitemap and not to index it via the meta tag. Google will generally respect the noindex tag, but the conflict is sloppy and wastes crawl budget.

Outdated URLs

If you have restructured URLs or deleted pages, your sitemap should reflect the current state of the site. A sitemap full of 301-redirecting or 404ing URLs wastes Googlebot's time and signals poor site hygiene.

Not updating the sitemap after publishing new content

If your CMS does not generate the sitemap dynamically, new pages will not appear in it until you regenerate and re-submit. Dynamic sitemaps generated by your CMS are preferable for sites that publish frequently.

Using relative URLs

All URLs in a sitemap must be absolute, including the protocol: https://www.example.com/page/, not /page/.
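A quick way to catch relative URLs before they reach the sitemap is to check each entry for a scheme and a host. The helper name is_absolute_url is an assumption for this sketch; urlsplit and urljoin come from Python's standard library.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_url(url: str) -> bool:
    """True when the URL has an http(s) scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Relative paths can be resolved against the site root before writing them out.
fixed = urljoin("https://www.example.com/", "/page/")
```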

How to check whether your sitemap is valid

Visit yoursite.com/sitemap.xml directly. If the browser displays a formatted XML document, the file is reachable and well-formed, though that alone does not confirm every entry follows the sitemap protocol. If you get a 404 or a blank page, the sitemap is either missing or at a different path.
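For a check that goes a little further than eyeballing the file, a short script can confirm the basics. The sketch below is a partial check under stated assumptions, not a full schema validation: it takes the sitemap as bytes (for example, the body of an HTTP response), and the check_sitemap name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000


def check_sitemap(xml_bytes):
    """Return a list of problems found in a <urlset> sitemap (empty if none)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != NS + "urlset":
        problems.append(f"unexpected root element: {root.tag}")

    urls = root.findall(NS + "url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit")

    # Every <url> entry needs a non-empty <loc>.
    for position, url_el in enumerate(urls, start=1):
        loc = url_el.find(NS + "loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> entry {position} has no <loc>")
    return problems
```

An empty list means none of these specific problems were found; it does not guarantee the URLs themselves are live or indexable.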

Google Search Console shows which URLs from your sitemap have been indexed versus discovered but not indexed. If a large proportion of submitted URLs are not being indexed, the issue is usually one of: content quality, canonical conflicts, or pages blocked by robots.txt.

For a complete view of which pages on your site are indexable, which have canonical issues, and which are being blocked, run a site crawl with Crawly. The crawl surfaces every page's indexability status alongside its sitemap inclusion, so you can identify pages that are in your sitemap but should not be, and pages that are missing from it but should be included.
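Whichever tool produces the crawl data, the comparison itself comes down to two set differences. The sketch below assumes you already have the sitemap URLs and the list of indexable URLs as plain Python sets; both sets are made-up examples and do not reflect any particular tool's export format.

```python
# URLs currently listed in the sitemap (example data).
sitemap_urls = {
    "https://www.example.com/",
    "https://www.example.com/about/",
    "https://www.example.com/old-page/",
}

# URLs a crawl found to be live, indexable, and self-canonical (example data).
indexable_urls = {
    "https://www.example.com/",
    "https://www.example.com/about/",
    "https://www.example.com/blog/new-post/",
}

# In the sitemap but not indexable: candidates for removal.
should_remove = sorted(sitemap_urls - indexable_urls)

# Indexable but missing from the sitemap: candidates for inclusion.
should_add = sorted(indexable_urls - sitemap_urls)
```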


An XML sitemap is one of the simplest technical SEO fundamentals. Done correctly it accelerates discovery; done poorly it sends conflicting signals to Google. Run a free site crawl to audit which of your pages are indexable and ready to include.
