Although search crawlers improve by leaps and bounds every day, organic discovery remains a slow and tedious process. Large, popular websites publish hundreds of pages of content and upload huge amounts of digital media, making discovery by search crawlers difficult.
Even with the most sophisticated crawlers, organic discovery takes time and loads of patience, and even when content is found it often lacks the context and keywords that would help it rank. Webmasters have long wished they could point search crawlers at the pages they want crawled, and would happily supply keywords and context along the way. This is not an impossible dream: you can use a sitemap to submit a list of content URLs for search engines to crawl. If you have not followed this simple procedure yet, now is the time to make a beginning. Here is a brief tutorial to get you started:
Know your XML sitemaps
There are different types of sitemaps, each with a specific role, and you need to know the difference between them. Sitemaps date back to the mid-2000s: Google introduced them in 2005, other search engines quickly followed, and a common, industry-supported XML schema was published in 2006.
Sitemaps are not designed for human consumption; they are read by search engines, and that is what differentiates them from web pages. Placing a URL in a sitemap is a hint to the search engine, although most people believe it is a command. This means not every URL in your sitemap will get indexed, but it is still worth listing a URL in the hope that it is discovered. Search engines first crawl a URL and only then decide whether to index it, so a sitemap entry should be looked at as a request rather than a command.
More often than not, sitemaps are a jumble of non-standard, invalid code that search engines struggle to read. Search engines also have more difficulty managing URLs that return HTTP 301, 302, or 404 responses than those that return HTTP 200.
Keep in mind that Bing has been known to check the number of non-200 links in a sitemap and to abandon the whole file if they exceed 1 percent of the URLs submitted. We are not sure whether this practice still continues; we will return to this topic later.
Unlike robots.txt files, sitemaps do not have a standardized name and file location, so search engines cannot find them by default. A robots.txt file, by contrast, is always read by search engines when crawlers visit the site.
To get around this problem you need to submit your sitemap properly. An easy way is to place a reference to the sitemap in your robots.txt file; the most reliable method, however, is to submit your sitemap via Bing or Google Webmaster Tools.
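The robots.txt reference is a single line using the Sitemap directive. For example, assuming your sitemap is published at https://www.example.com/sitemap.xml (a hypothetical URL used here for illustration), you would add:

```
Sitemap: https://www.example.com/sitemap.xml
```

The directive takes a full, absolute URL, and you can list more than one Sitemap line if you maintain several sitemap files.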
You surely should have a Webmaster Tools account already. A Webmaster Tools account also helps reveal any errors in your sitemap submissions, which supports your site's indexation efforts.
An XML sitemap file does not need a specific name; you can call it anything you want, and you don't have to store it at the site root. However, it must be a UTF-8 encoded text file, which essentially means URLs containing special characters must use entity escaping so that search engines can parse them. Sitemaps can be served uncompressed as .xml files or in compressed gzip form.
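To illustrate entity escaping, suppose a page URL contains an ampersand, such as the hypothetical https://www.example.com/page?item=1&desc=shoes. Inside the sitemap's XML the ampersand must be escaped as &amp;amp; or the file will not parse:

```xml
<url>
  <loc>https://www.example.com/page?item=1&amp;desc=shoes</loc>
</url>
```

The same rule applies to the other XML special characters, such as < and >, which become &amp;lt; and &amp;gt;.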
The XML sitemap protocol defines certain XML tags, some mandatory and some optional, that allow webmasters to describe each page: its URL, its date of last modification, its expected content change frequency, and its rated priority relative to the other pages listed in the sitemap.
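Putting those tags together, a minimal sitemap looks like the sketch below (the URL and dates are hypothetical; only <loc> is mandatory, while <lastmod>, <changefreq>, and <priority> are optional):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2013-06-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Each additional page gets its own <url> element inside the same <urlset>.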
The optional tags are of little value to search, although Bing gives weight to the <priority> tag when allocating crawl budget. This does not mean that assigning a high value to every priority tag will benefit you. Be judicious and tell the search engines which URLs on your website are really valuable.
The one big aspect of XML sitemaps you should be aware of is their size limits. A single XML sitemap can be at most 10 MB and contain at most 50,000 URL entries. This limitation might be a problem for enterprise-level sites, but they have the option of a sitemap index file, which can reference up to 50,000 sitemaps, each of which can in turn list 50,000 URLs. That allows for a possible 2.5 billion links. Pretty big by today's standards!
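A sitemap index file uses the same schema namespace but wraps <sitemap> entries instead of <url> entries. A minimal sketch, with hypothetical file names, looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
    <lastmod>2013-06-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-articles.xml</loc>
  </sitemap>
</sitemapindex>
```

You submit the index file once, and the search engine fetches each child sitemap it references.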
XML sitemaps feed the web index of search engines, which is perhaps the most important index. Please note, however, that it is not the only index that matters.
In the coming blogs we will discuss HTML sitemaps, RSS feeds, news sitemaps, video sitemaps, mobile sitemaps, and image sitemaps, and see how to build a sitemap.