One of the best ways to get your content discovered on the web is through search engines. To understand the content on your site, a search engine uses a spider, or web crawler. The crawler requests as many pages as it can, within the limits set by the site’s robots.txt configuration file. The crawled pages are then categorised based on their contents. When a user searches for a term, the search engine may identify a match between that term and keywords in your page.
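To make the robots.txt step concrete, here is a minimal sketch of how a well-behaved crawler might check whether it is allowed to fetch a URL. It uses Python’s standard `urllib.robotparser` module. The robots.txt contents, the `ExampleBot` name, the `may_crawl` helper and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, parsed from a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/articles/seo",
        "https://example.com/private/drafts",
    ):
        print(url, may_crawl(ROBOTS_TXT, "ExampleBot", url))
```

With these rules the article URL is allowed and the URL under /private/ is not. Note that robots.txt is only a request: it is up to the crawler to honour it.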
Because web traffic drives ad revenue, an industry called search engine optimisation, or SEO, has sprung up. SEO basically involves learning how search engines prioritise pages to include in search results and then tuning a website so that it is ranked more favourably. A better ranking draws more traffic and thus more ad revenue.
Most SEO techniques end up benefitting, or at least being neutral to, the end user’s experience. For example, serving content over HTTPS is an SEO technique. This helps to drive encryption around the web and increases user privacy and security. Including keywords in meta tags helps the search engine to understand the key contents of the page. To the end user, though, this doesn’t really make any difference.
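As a rough illustration of the meta tag point, the sketch below shows how a crawler could read a page’s keywords using Python’s standard `html.parser` module. The `MetaKeywordParser` class, the `extract_meta_keywords` helper and the sample HTML are hypothetical. This is not a description of how any particular search engine works.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when an attribute has no value.
        if (attributes.get("name") or "").lower() != "keywords":
            return
        content = attributes.get("content") or ""
        for keyword in content.split(","):
            keyword = keyword.strip().lower()
            if keyword:
                self.keywords.append(keyword)


def extract_meta_keywords(html: str) -> list[str]:
    """Return the keywords declared in the page's meta tags, in order."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


# Made-up sample page.
SAMPLE_HTML = """
<html><head>
<title>What is cloaking?</title>
<meta name="keywords" content="SEO, cloaking, web crawlers">
</head><body><p>Visible article text.</p></body></html>
"""

if __name__ == "__main__":
    print(extract_meta_keywords(SAMPLE_HTML))
```

The keywords never appear in the visible page, which is why they make no difference to a human reader.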
Unfortunately, there are also some “black hat SEO” techniques. These aren’t related to computer security black hats, as in bad guy hackers. They do, however, exploit how the system works to their own advantage. One of these tricks is called cloaking.
Cloaking
You would generally assume that if two people visit the same web page, they get the same content. Where user-related data is involved, the precise content might differ, but everyone should see the same news article, or their own social media feed, and so on. A search engine is expected to have the same experience. It won’t be signed in to websites, so it won’t index your personal feeds, but it should see any public content in just the same way you do.
A search engine can only prioritise results based on the content that it sees. Cloaking is the black hat SEO technique of changing the content that the search engine sees. A heavily optimised page is shown to the search engine in order to achieve a high relevance ranking. Actual human visitors, however, are served different content. While this can have some legitimate use cases, it is typically employed not to benefit the end user, but to benefit the search rankings.
For example, if the content of a page is in a non-text format, a search engine can’t understand it. In this case, it might be reasonable to serve the crawler a transcribed version or summary notes. This could legitimately benefit the end user, helping to increase the rankings of genuinely relevant content that would otherwise be hard to categorise.
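The mechanism described in this section comes down to branching on who is asking. The sketch below simulates that in memory, with no real network requests. `cloaked_site` picks its response from the User-Agent header, and `serves_different_content` is a naive check that compares what a crawler and a browser would receive. Every function name, the page contents and the shortened User-Agent strings are assumptions for illustration. Matching on substrings such as “googlebot” is only one way a site might try to recognise a crawler.

```python
# Illustrative substrings that appear in some well-known crawler User-Agents.
CRAWLER_TOKENS = ("googlebot", "bingbot")


def looks_like_crawler(user_agent: str) -> bool:
    """Guess whether a request comes from a crawler, based on its User-Agent."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def cloaked_site(user_agent: str) -> str:
    """Hypothetical cloaking site: crawlers and humans get different pages."""
    if looks_like_crawler(user_agent):
        return "<h1>In-depth guide to home coffee roasting</h1>"
    return "<h1>Win a free phone! Click here</h1>"


def honest_site(user_agent: str) -> str:
    """Hypothetical honest site: everyone gets the same page."""
    return "<h1>In-depth guide to home coffee roasting</h1>"


def serves_different_content(site, crawler_ua: str, browser_ua: str) -> bool:
    """Naive cloaking check: does the page change with the User-Agent?"""
    return site(crawler_ua) != site(browser_ua)


# Shortened, illustrative User-Agent strings.
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

if __name__ == "__main__":
    print(serves_different_content(cloaked_site, CRAWLER_UA, BROWSER_UA))
    print(serves_different_content(honest_site, CRAWLER_UA, BROWSER_UA))
```

An exact comparison like this is far too strict for real pages, which often differ between requests for innocent reasons such as timestamps or personalised elements. It is only meant to show the idea.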
Spamdexing
Unfortunately, cloaking is typically used as a form of spamdexing and black hat SEO. Knowing that search engines look for certain SEO features, a site can serve the crawler a heavily optimised page with no real content, just huge numbers of keywords that make it look relevant. When a user visits, they don’t see the same thing at all. Often they are redirected to different content that isn’t actually relevant.
Tip: Spamdexing is a portmanteau of “spam” and “indexing”. It refers to an array of methods to illegitimately improve search engine ranking.
Of course, the user’s experience doesn’t matter to the site owner. Once the user hits their site and is served with ads, the owner is already generating ad revenue.
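One way to picture a keyword-stuffed page is to measure how much of its text is a single keyword. The sketch below computes that share for two made-up snippets. The `keyword_density` helper and both sample texts are hypothetical, and this is not a claim about how search engines actually detect spamdexing.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Return the fraction of words in text that are exactly keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


# Made-up example texts.
STUFFED = "cheap flights cheap flights cheap cheap flights deals cheap"
NATURAL = "We compare cheap flights from several airlines every day"

if __name__ == "__main__":
    print(round(keyword_density(STUFFED, "cheap"), 3))
    print(round(keyword_density(NATURAL, "cheap"), 3))
```

The stuffed snippet is more than half one keyword, while the natural sentence mentions it once in nine words.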
Conclusion
Cloaking is a form of black hat SEO. It involves identifying requests coming from search engine crawlers and then serving them heavily optimised pages. These pages are completely distinct from the content that is served to human users. This difference means that the search engine doesn’t have an accurate picture of the actual site that will be delivered to the user and so can’t correctly rank the result. Cloaking is typically done with the intent of increasing the search engine ranking, even if the actual content users see is not relevant or useful.