A Guide to Robots.txt, XML Sitemaps, and Indexing Your Site for Google
robots.txt is a small, plain text file that is part of your website. Its job is to tell the search engine spiders (those bits of software that crawl the web and index your site) which parts of your website they are, and aren’t, allowed to look at.
If you’re using a Content Management System (CMS) like WordPress, Typo3, Concrete5, or any of the others on the market, there will be parts of your website’s structure (the CMS admin area, for example) that the spiders should not visit. Correct website architecture, the use of tools like XML sitemaps, and correct configuration of the robots.txt file are all factors in how Google ranks your website.
What Does the Robots.txt File Do?
The robots.txt file gives instructions to the search engine robots that analyze your website; it implements what is known as the Robots Exclusion Protocol. Thanks to this file, you can prohibit some robots (also called “crawlers” or “spiders”) from exploring and indexing your site.
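Here is a simple, permissive example of the kind you might place at the root of your site (example.com is a placeholder for your own domain):

    User-agent: *
    Disallow:

    Sitemap: https://www.example.com/sitemap.xml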
This robots.txt file allows all bots to index all content and provides them with a link to the website’s XML sitemap.
So far, so good… but what happens if your web developer (or you) make a mistake?
One of the most destructive errors I’ve seen is when a developer sets the robots.txt on a test site to disallow all crawling (some CMSs label this a “discourage search engines” mode and add a “noindex” tag to the pages as well). This effectively tells the search engines not to crawl the site or add it to the index.
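In its simplest form, that blocking instruction looks like this:

    User-agent: *
    Disallow: /

That single slash disallows every path on the site for every crawler.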
Then, the site goes live… and nobody remembers to change robots.txt.
To the “naked eye”, looking at your website through a normal browser, you will have no idea that anything is wrong. But to a search engine, your site has just become effectively invisible. Worse, if you were already indexed, you have just told the search engines to remove you from the index.
How to Check Your Robots.txt
You can view the file directly by visiting yourdomain.com/robots.txt in any browser, check it using Google Search Console, or ask your web developer to verify that everything is as it should be.
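If you would rather test it programmatically, here is a minimal sketch using Python’s standard urllib.robotparser module (the domain and paths are placeholders; swap in your own):

    from urllib import robotparser

    # Point the parser at the live robots.txt file (placeholder domain).
    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()  # fetches and parses the file

    # Ask whether a given crawler may fetch a given page.
    print(rp.can_fetch("*", "https://www.example.com/"))
    print(rp.can_fetch("Googlebot", "https://www.example.com/admin/"))

If can_fetch returns False for pages you expect to rank, something in your robots.txt is blocking them.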
Remember that a robots.txt issue need not be sitewide – you may find pages or sections of your website that are blocked whilst others are fine.
An SEO Guide to XML Sitemaps
An XML sitemap is a computer-generated map of your website (in eXtensible Markup Language) that tells search engines about every page on your website. As well as telling the search engines where each page is, your XML sitemap can also tell them how important a page is and when it was last updated.
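A minimal sitemap for a two-page site might look like this (the domain, dates, and priorities are placeholder values):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/</loc>
        <lastmod>2024-01-15</lastmod>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>https://www.example.com/about/</loc>
        <lastmod>2023-11-02</lastmod>
        <priority>0.8</priority>
      </url>
    </urlset>

The loc tag is the page address, lastmod records when the page last changed, and priority hints at its relative importance within your own site.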
So, do you need one?
Search engines, particularly Google, won’t index content that they cannot reach by navigating to it from your homepage. It doesn’t need to be linked directly from your homepage, but if you can’t trace a route of clicks from your homepage to a particular page, then it is unlikely to appear in the index.
But that’s not the important thing about an XML sitemap…
How to Check If Your Site is Indexed
The important thing about an XML sitemap is that you can use it with tools like Google’s own “Search Console” to check that your website is being indexed.
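For a quick, rough check you can also use Google’s site: search operator; the result counts are approximate, but it shows which of your pages Google currently has in its index. Type this into a normal Google search (with your own domain in place of example.com):

    site:example.com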
How Often Does Google Re-index Websites?
If you point Search Console at your site’s XML sitemap, you can get a measure of how many of the pages listed in the sitemap are actually being indexed.
This is a great way to spot indexation problems like orphaned pages – pages that have become disconnected from the main navigation and are therefore not being indexed anymore.
Check, or ask your web developer to check, the following things:
1. Check your CMS is generating a sitemap (the sketch after this list shows one way to verify it).
2. Check there is a reference, such as the Sitemap directive in your robots.txt file or metadata in your page header, telling search engines where to find your XML sitemap.
3. Check your XML sitemap is connected to Search Console.
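For the first check, here is a minimal sketch that fetches a sitemap and lists the URLs it contains, using only Python’s standard library (the sitemap URL is a placeholder):

    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    # Fetch the sitemap (placeholder URL - use your own domain).
    SITEMAP_URL = "https://www.example.com/sitemap.xml"
    with urlopen(SITEMAP_URL) as response:
        tree = ET.parse(response)

    # Sitemap files use the sitemaps.org namespace.
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    urls = [loc.text for loc in tree.getroot().findall("sm:url/sm:loc", ns)]

    print(f"{len(urls)} URLs listed in the sitemap")
    for url in urls[:10]:  # show a sample
        print(url)

Comparing this list against the pages Search Console reports as indexed is a simple way to spot the orphaned-page problems described above.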
Video: Google for Webmasters Tutorial on Crawling and Indexing
Thanks for reading my guide to robots.txt and XML sitemaps.