Robots Txt XML Sitemap SEO Guide


A Guide to Robots.txt, XML Sitemaps, and Indexing Your Site for Google

robots.txt is a small, plain text file that’s a part of your website. Its job is to tell the search engine spiders (those bits of software that crawl the web and index your site) what parts of your website they are, and aren’t, allowed to look at.

If you’re using a Content Management System (CMS) like WordPress, Typo3, Concrete5, or any of the others on the market, there will be parts of your website’s structure where the CMS does not want the spiders to go. Correct website architecture, tools like XML sitemaps, and a correctly configured robots.txt file all help Google crawl your site, and that in turn affects how it ranks.

What Does the Robots.txt File Do?

The robots.txt file gives instructions to the search engine robots that analyze your website; it implements the robots exclusion protocol. With this file, you can bar some robots (also called “crawlers” or “spiders”) from exploring and indexing your site.

A minimal robots.txt file allows all bots to crawl all content and gives them a link to the website’s XML sitemap.
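As a rough illustration, the sketch below holds such an allow-everything file in a string and reads it with Python's standard `urllib.robotparser` module. The domain `www.example.com` and the sitemap URL are placeholders, not values from a real site, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler may fetch everything (an empty
# Disallow blocks nothing), and the Sitemap line advertises the XML sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given crawler may fetch a given URL under these rules.
allowed = parser.can_fetch("Googlebot", "https://www.example.com/any/page.html")

# List the sitemap URLs the file advertises (None if there are none).
sitemaps = parser.site_maps()

print(allowed)   # True
print(sitemaps)  # ['https://www.example.com/sitemap.xml']
```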

Website Indexing

So far, so good… but what happens if your web developer (or you) makes a mistake?

One of the most destructive errors I’ve seen is when a developer sets the robots.txt on a test site to block all crawlers (a blanket “Disallow: /” rule, often loosely called “no-index”). This effectively tells the search engines not to crawl the site, which keeps its content out of the index.

Then, the site goes live… and nobody remembers to change robots.txt.

To the “naked eye”, looking at your website through a normal browser, you will have no idea that anything is wrong. But to a search engine, your site has just become effectively invisible. Worse, if you were already indexed, you have just told the search engine to stop crawling you, and your pages can drop out of the index.
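A pre-launch check for this mistake might look like the sketch below. It is a heuristic under stated assumptions: the function name `blocks_whole_site` and the two sample files are made up for illustration, the check only asks whether the homepage is crawlable, and Python's built-in parser uses simple prefix matching that may differ from how a particular search engine interprets the same rules.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules stop user_agent from fetching the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # If even "/" is off limits, the crawler is shut out of the whole site.
    return not parser.can_fetch(user_agent, "https://www.example.com/")


# Hypothetical test-site file that was never changed before launch.
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"

# Hypothetical live file that only hides the CMS admin area.
LIVE_ROBOTS = "User-agent: *\nDisallow: /wp-admin/\n"

print(blocks_whole_site(STAGING_ROBOTS))  # True
print(blocks_whole_site(LIVE_ROBOTS))     # False
```

Running a check like this as part of the go-live routine is one way to make sure nobody has to remember the file by hand.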

How to Check Your Robots.txt

You can check your robots.txt using Google Search Console or ask your web developer to verify that everything is as it should be.

Remember that a robots.txt issue need not be sitewide – you may find pages or sections of your website that are blocked whilst others are fine.
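To look for partial blocks, you can test a list of important URLs against the rules rather than just the homepage. The sketch below assumes a hypothetical rule set and URL list; `blocked_urls` is an illustrative helper, not part of any tool mentioned here.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the subset of urls that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules: the admin area is hidden on purpose, but the
# products section has been blocked as well, probably by accident.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /products/
"""

URLS = [
    "https://www.example.com/",
    "https://www.example.com/blog/first-post",
    "https://www.example.com/products/widget",
]

print(blocked_urls(ROBOTS_TXT, URLS))
# ['https://www.example.com/products/widget']
```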


SEO Guide to XML Sitemap

An XML sitemap is a computer-generated map of your website (in eXtensible Markup Language) that tells search engines about every page on your website. As well as telling the search engines where it is, your XML sitemap can also tell the search engines how important a page is and when it was last updated.
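To make the format concrete, here is a minimal sketch that builds a two-page sitemap with Python's standard library. The page list is invented for illustration; `loc`, `lastmod`, and `priority` are the sitemap protocol's elements for a page's address, last update, and relative importance. Search engines are free to ignore these hints.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of dicts with loc, lastmod, priority."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]            # where the page is
        ET.SubElement(url, "lastmod").text = page["lastmod"]    # when it last changed
        ET.SubElement(url, "priority").text = page["priority"]  # relative importance
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; a CMS would normally generate this list itself.
PAGES = [
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15", "priority": "1.0"},
    {"loc": "https://www.example.com/about/", "lastmod": "2023-11-02", "priority": "0.5"},
]

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```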

So, do you need one?

Search engines, particularly Google, won’t index content that they cannot reach by navigating to it from your homepage. It doesn’t need to be linked directly from your homepage, but if you can’t trace a route of clicks from your homepage to a particular page, then it is unlikely to appear in the index.

But that’s not the important thing about an XML sitemap…

How to Check If Your Site is Indexed

The important thing about an XML sitemap is that you can use it with tools like Google’s own “Search Console” to check that your website is being indexed.


If you point Search Console at your site’s XML sitemap then you can get a measure of how many pages from the list contained in the sitemap are actually being indexed.

This is a great way to spot indexation problems like orphaned pages – pages that have become disconnected from the main navigation and are therefore not being indexed anymore.
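The idea of an orphaned page can be sketched as a graph search: follow links outward from the homepage, then flag any sitemap URL that was never reached. The link map and URL list below are invented for illustration; on a real site they would come from a crawl and from the sitemap itself.

```python
from collections import deque


def reachable_from(home, links):
    """Breadth-first search: every page reachable by clicking from home."""
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_pages(home, links, sitemap_urls):
    """Sitemap URLs with no route of clicks leading to them from home."""
    return sorted(set(sitemap_urls) - reachable_from(home, links))


# Hypothetical internal links: page -> pages it links to.
LINKS = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/first-post/"],
    "/old-landing-page/": ["/"],  # links out, but nothing links to it
}

SITEMAP_URLS = ["/", "/blog/", "/about/", "/blog/first-post/", "/old-landing-page/"]

print(orphaned_pages("/", LINKS, SITEMAP_URLS))  # ['/old-landing-page/']
```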

Check, or ask your web developer to check, the following things:

1. Check your CMS is generating a sitemap.

2. Check that search engines are told where to find your XML sitemap, for example through a Sitemap line in your robots.txt.

3. Check your XML sitemap is connected to Search Console.
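The first two checks can be roughly automated once you have the text of both files. The sketch below works on hypothetical file contents held in strings; the function names and the `www.example.com` URLs are assumptions for illustration. The third check has to be done inside Search Console itself.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace prefix mapping for the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Check 1: parse the sitemap and return the page URLs it lists."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def advertised_sitemaps(robots_txt):
    """Check 2: return the sitemap URLs announced in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() needs Python 3.8+


# Hypothetical file contents; on a real site you would download these.
SITEMAP_XML = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/about/</loc></url>"
    "</urlset>"
)
ROBOTS_TXT = "User-agent: *\nDisallow:\n\nSitemap: https://www.example.com/sitemap.xml\n"

urls = sitemap_urls(SITEMAP_XML)
announced = advertised_sitemaps(ROBOTS_TXT)
print(f"Sitemap lists {len(urls)} pages")
print(f"robots.txt announces: {announced}")
```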


Thanks for reading my guide to robots.txt and XML sitemaps.
