What is a Sitemap Index?
A sitemap index file groups several individual sitemaps under a single file, using an XML format similar to that of a regular sitemap file. The syntax is almost the same, but instead of listing the URLs of your pages, it lists the URLs of your sitemaps.
If you plan to build a website with multiple subdirectories, each with its own sitemap, you should use a sitemap index to tie those sitemaps together.
Using Sitemap Index files (to group multiple sitemap files)
You can provide multiple sitemap files, but each one must contain no more than 50,000 URLs and be no larger than 50 MB (52,428,800 bytes). You may compress your sitemap files with gzip to reduce bandwidth; however, each file must still be no larger than 50 MB once uncompressed. To list more than 50,000 URLs, you must create multiple sitemap files.
If you provide multiple sitemaps, you should list each of them in a sitemap index file. A sitemap index file may list no more than 50,000 sitemaps, must be no larger than 50 MB (52,428,800 bytes), and can be compressed. You can have more than one sitemap index file. The XML format of a sitemap index file is very similar to that of a sitemap file.
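As a rough illustration of these limits, the Python sketch below splits a list of URLs into groups of at most 50,000 and checks a serialized file against the 50 MB cap. The function names and the example URLs are hypothetical; only the two limits come from the text above.

```python
from typing import Iterator, List

MAX_URLS_PER_SITEMAP = 50_000    # URL limit per sitemap file
MAX_SITEMAP_BYTES = 52_428_800   # 50 MB, measured on the uncompressed file


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` entries."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def within_size_limit(xml_bytes: bytes) -> bool:
    """Check the uncompressed size of a serialized sitemap against the 50 MB cap."""
    return len(xml_bytes) <= MAX_SITEMAP_BYTES


# Hypothetical input: 120,000 page URLs on an example site.
urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
chunks = list(chunk_urls(urls))
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```

Each group would then be written to its own sitemap file, and those three files would be the entries of the sitemap index.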
The Sitemap index file must:
- Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
- Include a <sitemap> entry for each Sitemap as a parent XML tag.
- Include a <loc> child entry for each <sitemap> parent tag.
The optional <lastmod> tag is also available for Sitemap index files.
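The requirements above can be sketched in Python with the standard library's ElementTree module. This is only one possible way to generate the file: the function name and the example entries are hypothetical, and the namespace URI is the one used by the sitemaps.org 0.9 schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Build a sitemap index from (loc, lastmod) pairs; lastmod may be None."""
    # Root element: <sitemapindex> ... </sitemapindex>
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        # One <sitemap> parent per sitemap file, with a required <loc> child.
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        # <lastmod> is optional, so it is only written when a value is given.
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    # Serialize as UTF-8 bytes with an XML declaration.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Hypothetical entries, for illustration only.
index_bytes = build_sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", "2004-10-01T18:23:17+00:00"),
    ("http://www.example.com/sitemap2.xml.gz", None),
])
print(index_bytes.decode("utf-8"))
```

ElementTree escapes special characters in the text values when it serializes them, so the URLs can be passed in unescaped.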
Note: A Sitemap index file can only specify sitemaps that are found on the same site as the sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com. As with sitemaps, your sitemap index file must be UTF-8 encoded.
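A quick way to catch violations of this same-site rule before publishing is to compare the host part of each URL. The sketch below is a simplified check with a hypothetical helper name; comparing the scheme as well as the host is an assumption on top of the rule as stated here.

```python
from urllib.parse import urlsplit


def same_site(index_url: str, sitemap_url: str) -> bool:
    """Return True when both URLs share the same scheme and host (and port)."""
    a, b = urlsplit(index_url), urlsplit(sitemap_url)
    # Assumption: scheme is compared too. Host names are case-insensitive.
    return (a.scheme, a.netloc.lower()) == (b.scheme, b.netloc.lower())


index_url = "http://www.yoursite.com/sitemap_index.xml"
print(same_site(index_url, "http://www.yoursite.com/sitemap1.xml"))        # True
print(same_site(index_url, "http://www.example.com/sitemap1.xml"))         # False
print(same_site(index_url, "http://yourhost.yoursite.com/sitemap1.xml"))   # False
```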
Sample XML Sitemap Index
The following example shows a Sitemap index that lists two Sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
Note: Sitemap URLs, like all values in your XML files, must be entity escaped.
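If you write the XML by hand or with string templates, entity escaping can be done with Python's standard library, as in the minimal sketch below. The helper name and the example URL are hypothetical.

```python
from xml.sax.saxutils import escape


def escape_sitemap_url(url: str) -> str:
    """Entity-escape a URL for use inside an XML element."""
    # escape() handles &, < and > by default; quotes are added explicitly.
    return escape(url, {"'": "&apos;", '"': "&quot;"})


print(escape_sitemap_url("http://www.example.com/sitemap.xml?type=a&page=1"))
# http://www.example.com/sitemap.xml?type=a&amp;page=1
```

XML libraries that build the document from a tree usually perform this escaping for you, so the helper is mainly useful when the file is assembled as plain text.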
Sitemap Index XML Tag Definitions:
| Tag | Required | Description |
|---|---|---|
| <sitemapindex> | required | Encapsulates information about all of the Sitemaps in the file. |
| <sitemap> | required | Encapsulates information about an individual Sitemap. |
| <loc> | required | Identifies the location of the Sitemap. This location can be a Sitemap, an Atom file, RSS file or a simple text file. |
| <lastmod> | optional | Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value for the lastmod tag should be in W3C Datetime format. By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index, i.e. a crawler may only retrieve Sitemaps that were modified since a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites. |
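To make the incremental fetching idea concrete, here is a sketch of how a crawler might select only the Sitemaps modified after a given date. It is not how any particular search engine works: the function names and the sample index are hypothetical, and the date parser handles only complete dates and date-times, not every W3C Datetime form.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_w3c_datetime(value):
    """Parse YYYY-MM-DD or a full date-time; treat missing offsets as UTC."""
    dt = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def sitemaps_modified_since(index_xml, cutoff):
    """Return the <loc> values whose <lastmod> is later than `cutoff`."""
    root = ET.fromstring(index_xml)
    locs = []
    for sitemap in root.findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", namespaces=NS)
        if loc is None:
            continue  # malformed entry without a location
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        # Without a lastmod there is nothing to compare, so fetch to be safe.
        if lastmod is None or parse_w3c_datetime(lastmod) > cutoff:
            locs.append(loc.strip())
    return locs


# Hypothetical index with two entries.
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemap1.xml.gz</loc><lastmod>2004-10-01T18:23:17+00:00</lastmod></sitemap>
  <sitemap><loc>http://www.example.com/sitemap2.xml.gz</loc><lastmod>2005-01-01</lastmod></sitemap>
</sitemapindex>"""

cutoff = datetime(2004, 12, 1, tzinfo=timezone.utc)
print(sitemaps_modified_since(index_xml, cutoff))
# ['http://www.example.com/sitemap2.xml.gz']
```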
Validating your Sitemap Index Files:
The following XML schema defines the elements and attributes that can appear in your Sitemap index file. You can download the schema from the link below:
For Sitemap index files: http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd
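Full validation against the schema needs an XSD-aware tool. As a lighter first pass, the sketch below uses only the Python standard library to check the basic shape of an index: the root element and one non-empty <loc> per <sitemap>. It is not a substitute for schema validation, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_index_structure(xml_text):
    """Return a list of problems; an empty list means the basic shape looks right."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NS_URI}}}sitemapindex":
        problems.append(f"unexpected root element: {root.tag}")
    for i, child in enumerate(root, start=1):
        if child.tag != f"{{{NS_URI}}}sitemap":
            problems.append(f"entry {i}: unexpected element {child.tag}")
            continue
        locs = child.findall(f"{{{NS_URI}}}loc")
        if len(locs) != 1 or not (locs[0].text or "").strip():
            problems.append(f"entry {i}: needs exactly one non-empty <loc>")
    return problems


# Hypothetical index whose second entry is missing its <loc>.
sample = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemap1.xml.gz</loc></sitemap>
  <sitemap><lastmod>2005-01-01</lastmod></sitemap>
</sitemapindex>"""
print(check_index_structure(sample))
# ['entry 2: needs exactly one non-empty <loc>']
```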