As its name suggests, a sitemap is a map of a website, and both search engines and visitors need one. For search engines, a sitemap is a list of page addresses stored in a special file. For visitors, it is a set of links to categories and subcategories laid out as a simple navigation tree. The main function of a sitemap is to make any section of the site easy to find and index. A sitemap helps search engines understand the structure of a resource so that they can index it correctly. That is why webmasters give search engines information about the site's content in the form of an XML file.
A sitemap is an essential element of any site. Without this file, it is much harder to promote a web resource successfully and attract high traffic. A sitemap also makes the site more comfortable for visitors to use. Let's dive deeper into the two kinds of sitemap and what each one is for.
People “Need” the HTML Version
The HTML version of the sitemap is the one best suited to users. It is intuitive and easy to understand. How do you create it? There are two ways:
- You can create the HTML version of the sitemap manually, that is, write out the structure of your site as a list of URLs on a single HTML page. You then upload this page to the hosting where the site is located and link to it from the site;
- The second method is automatic. You can generate an HTML version of the sitemap using an online service or a dedicated program. All you need to do is enter the full domain name of the resource you need a sitemap for, and you get a ready-made file that you can upload to the site's hosting.
Large sites are hard to index in full, so a sitemap is essential for large e-commerce projects, online stores, and news outlets. However, building one manually is impractical once the number of pages runs into the tens of thousands, so for such sites it is worth turning to the second method.
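To make the idea of an HTML sitemap concrete, here is a minimal sketch in Python of how a script might render a navigation tree of categories and pages as a single HTML page. The function name `build_html_sitemap`, the `SECTIONS` data, and the example.com URLs are hypothetical and used only for illustration; a real generator would take the structure from the site's database or a crawl.

```python
from html import escape

def build_html_sitemap(sections):
    """Render a {category: [(title, url), ...]} mapping as a nested HTML list."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<head><title>Sitemap</title></head>",
        "<body>",
        "<h1>Sitemap</h1>",
        "<ul>",
    ]
    for category, pages in sections.items():
        lines.append(f"  <li>{escape(category)}")
        lines.append("    <ul>")
        for title, url in pages:
            # escape() turns &, <, > and quotes into entities so the markup stays valid.
            lines.append(f'      <li><a href="{escape(url)}">{escape(title)}</a></li>')
        lines.append("    </ul>")
        lines.append("  </li>")
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines)

# Hypothetical site structure, for illustration only.
SECTIONS = {
    "Shop": [
        ("Shoes", "https://example.com/shop/shoes"),
        ("Bags & Cases", "https://example.com/shop/bags?sort=new&page=1"),
    ],
    "Blog": [("News", "https://example.com/blog/news")],
}

page = build_html_sitemap(SECTIONS)
```

The resulting string can be saved as, say, `sitemap.html` and linked from the site footer.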
Search Engines “Need” the XML Version
Search engines should see your site the way you want them to, in the way that benefits you from an SEO point of view. For this, you need a special sitemap.xml file (XML stands for Extensible Markup Language). So what is an XML sitemap, and what information should it cover? An XML sitemap complements the other files and directives that regulate site indexing, such as robots.txt and the meta tags that prohibit indexing of selected pages. As with the HTML version, you can generate a small sitemap with a free online generator, but this option is not always suitable. To get an accurate sitemap, we recommend that you create and manage it yourself.
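For readers who have not seen one, the sketch below builds a small sitemap.xml with Python's standard `xml.etree.ElementTree` module. The `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the sitemaps.org protocol; the function name `build_sitemap_xml`, the URLs, and the date are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_xml(entries):
    """Build sitemap XML from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as & when serializing.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

# Hypothetical entries, for illustration only.
entries = [
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/shop?cat=shoes&page=2", None),
]

xml_bytes = build_sitemap_xml(entries)
```

Writing `xml_bytes` to a file named `sitemap.xml` in the site root gives a file in the format search engines expect. Leaving the escaping to the XML library avoids having to handle it by hand.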
Why Do You Need a Sitemap.xml?
Search engines use a sitemap to find new documents on the site (both HTML documents and media content) that are not reachable through navigation but need to be crawled. A link to a document in sitemap.xml does not guarantee that the page will be crawled or indexed, but the file helps large sites get indexed more fully. In addition, data from the XML sitemap is used to determine canonical pages when they are not explicitly specified with a rel=canonical tag.
Sitemap.xml is essential for sites where:
- Some sections are not available through the navigation menu;
- There are many isolated pages or poorly connected pages;
- You use development technologies that are poorly supported by search engines (for example, Ajax, Flash, or Silverlight);
- There are so many pages on the site that the search crawler is quite likely to miss new content.
If you decide to work on the XML file, then you need to know all the technical specs of the document:
- Sitemap.xml is an XML text file;
- Each sitemap can contain a maximum of 50,000 URLs and be no larger than 50 MB;
- You can compress the sitemap.xml file (for example, with gzip) to reduce its size and speed up its transfer. Note that the 50 MB limit still applies to the uncompressed file;
- The location of the sitemap determines the set of URLs that can be included in it. A sitemap that covers the pages of the entire site should be located in the site root. If the sitemap is located in a folder, then all the URLs it lists must be in that folder or deeper;
- All the addresses in the sitemap.xml must be absolute;
- The maximum URL length is 2048 characters;
- Special characters in URLs (such as ampersands or quotation marks) must be escaped as XML entities, for example & written as &amp;amp;;
- Pages listed in the sitemap should return a 200 status code;
- URLs listed in the sitemap should not be blocked in the robots.txt file.
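Several of the requirements above are easy to check automatically before a sitemap is published. The sketch below is one possible approach, not an official validator: the function names, the constants, and the example.com URLs are assumptions made for illustration, and the limits are the ones quoted in the list above.

```python
import gzip
from urllib.parse import urlsplit

MAX_URLS = 50_000                 # URLs per sitemap
MAX_BYTES = 50 * 1024 * 1024      # 50 MB, measured on the uncompressed file
MAX_URL_LENGTH = 2048             # characters per URL

def check_sitemap_urls(sitemap_url, urls):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(urls) > MAX_URLS:
        problems.append(f"too many URLs: {len(urls)} > {MAX_URLS}")
    sm = urlsplit(sitemap_url)
    # Folder holding the sitemap, e.g. "/catalog/" for "/catalog/sitemap.xml".
    base_path = sm.path.rsplit("/", 1)[0] + "/"
    for url in urls:
        parts = urlsplit(url)
        if not parts.scheme or not parts.netloc:
            problems.append(f"not absolute: {url}")
            continue
        if len(url) > MAX_URL_LENGTH:
            problems.append(f"too long ({len(url)} chars): {url[:60]}...")
        if (parts.scheme, parts.netloc) != (sm.scheme, sm.netloc):
            problems.append(f"different host or scheme: {url}")
        elif not (parts.path or "/").startswith(base_path):
            problems.append(f"outside sitemap folder {base_path}: {url}")
    return problems

def within_size_limit(xml_bytes):
    """The size limit applies to the uncompressed sitemap."""
    return len(xml_bytes) <= MAX_BYTES

# Hypothetical input, for illustration only.
problems = check_sitemap_urls(
    "https://example.com/catalog/sitemap.xml",
    [
        "https://example.com/catalog/shoes",           # fine
        "/catalog/bags",                               # relative URL
        "https://example.com/blog/news",               # outside /catalog/
        "https://example.com/catalog/" + "a" * 2100,   # over the length limit
    ],
)

sample_xml = b"<urlset></urlset>"          # stand-in for a real sitemap body
compressed = gzip.compress(sample_xml)     # what would be served as sitemap.xml.gz
```

Here `problems` ends up with three entries, one for each of the commented faulty URLs. The size check is run on the bytes before `gzip.compress`, since compression does not lift the limit.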
If you follow all the above recommendations, you can help ensure proper site indexing and remove unnecessary URLs.
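One of the checks listed above, that sitemap URLs are not blocked in robots.txt, can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the URLs, and the helper name `blocked_urls` are hypothetical. The standard-library parser also applies simple path-prefix rules, so it may not interpret every rule exactly as a given search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

sitemap_urls = [
    "https://example.com/shop/shoes",
    "https://example.com/cart/checkout",
    "https://example.com/search?q=boots",
]

blocked = blocked_urls(ROBOTS_TXT, sitemap_urls)
```

Any URL that ends up in `blocked` should either be removed from the sitemap or unblocked in robots.txt, depending on whether the page is meant to be indexed.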