To explain robots.txt, we have to go back to the basics of what a page really is. Think about real books with real pages. When we pick up a book, we see the white pages and the letters, but we don’t see the many processes it took to make the paper and the many processes it took to make the ink.
In the same way, we see a web page and most times do not even realize that it is just the final product of many processes. Behind every web page on the internet are lines of code, files, and folders saved on a server.
The text, the layout, the videos, and the files on every webpage are all stored in directories on those servers, which we never see.
A robots.txt is one of those files. It is a very simple text file on a website that functions as a set of instructions for the web crawlers search engines send out.
A robots.txt file is like a sign put up in an office: “No visitors allowed beyond here.” It lists the parts of your site you do not want search engine crawlers to visit.
When visitors walk into an office and see a sign like “No visitors allowed beyond here,” they turn around and go the other way. Search engines do the same thing.
When crawlers access your site, hunting for relevant content, and encounter a robots.txt file, they immediately understand which areas they are not supposed to enter. They stop crawling those areas and move on.
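As a concrete illustration, a minimal robots.txt file might look like this (the `/admin/` path is a hypothetical example, not from any real site):

```
User-agent: *
Disallow: /admin/
```

The `User-agent: *` line addresses every crawler, and the `Disallow` line tells them all to stay out of the `/admin/` directory.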
What does this mean for SEO?
Well, putting a robots.txt file on your website means you can do the following:
1. No Crawling of Duplicate Content
By now, you should already know that search engines penalize duplicate content. Just as you most likely wouldn’t purchase the same item twice, search engines try not to put the same content up twice on their results pages.
You can avoid this penalty by using robots.txt. With robots.txt, you can tell search engines which duplicate content they are not supposed to reach.
Since search engines can’t crawl these pages, they cannot penalize you for them.
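For instance, if a site served printer-friendly duplicates of its articles under a separate path, a rule like this might keep crawlers away from them (the `/print/` path is a hypothetical example):

```
User-agent: *
Disallow: /print/
```

Crawlers that respect robots.txt would then only see the canonical versions of the articles.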
2. No Crawling of Private Pages
Sometimes, we may have sections on our websites that we do not want search engines to find, either because they are private or because we simply don’t want them showing up on the search engine results page. We can block search engines from crawling such pages, which does SEO a lot of good. (Keep in mind that robots.txt only blocks crawling; a page that other sites link to can still end up indexed, so truly sensitive content needs password protection or a noindex directive.)
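A sketch of what that could look like, assuming a hypothetical `/internal-reports/` section:

```
User-agent: *
Disallow: /internal-reports/
```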
3. Stop Images from Appearing in Search Results
When we do not want images or videos to be found by crawlers, we can add instructions for them to the robots.txt file. Search engines then have fewer files to look at and can focus their crawl on just the pages that matter.
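For example, Google’s image crawler identifies itself as `Googlebot-Image`, so a rule like the following (with a hypothetical `/photos/` directory) would ask it to skip those files:

```
User-agent: Googlebot-Image
Disallow: /photos/
```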
4. No Server Overload
Putting up a robots.txt file helps prevent server overload, because crawlers no longer request every part of the site on each visit. Creating a robots.txt file is good practice for SEO.
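If you want to check how a well-behaved crawler would interpret your rules before publishing them, Python’s standard-library `urllib.robotparser` can evaluate a robots.txt file; the rules and URLs below are hypothetical examples:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block every crawler from /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A URL under /private/ is blocked; everything else is allowed.
print(parser.can_fetch("*", "https://example.com/private/report.html"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post.html"))       # True
```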