OK, so you’ve got yourself an XML Sitemap, and you’ve told Google where to find your sitemap using Google Webmaster Tools. So far, you’re well on your way to optimizing your WordPress site.
The next thing in your journey to the ultimate in WordPress SEO is telling the Search Engines what they need to index.
The first thing you need to understand is that there are certain things Search Engines like, and certain things Search Engines DO NOT like.
As a sub-point, we need to understand that Search Engines discover the content of your site by “crawling” it.
So what is “crawling”? In a nutshell, “crawling” is when a Search Engine arrives at your page and follows the links it finds there. When it arrives at the next page, it does the same thing, and so on, and so on. Each new page it reaches gets “indexed”, meaning it is added to the search engine’s (very large) database of indexed pages. So, if you have a WordPress site with 5 Pages and 50 Posts, search engines will find and index those Pages and Posts by following the links that point to them.
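To make the idea concrete, here is a toy sketch of that crawl loop in PHP. It assumes a hypothetical site stored as an in-memory array that maps each URL to the URLs it links to; a real crawler fetches pages over HTTP and parses the links out of the HTML, which this sketch skips. The function name crawl() and the sample URLs are made up for illustration.

```php
<?php
// Toy model of crawling. ASSUMPTION: the site is a hypothetical in-memory
// map of page URL => list of URLs that page links to. Real crawlers fetch
// pages over HTTP; this only shows the "follow links, index new pages" loop.

/**
 * Breadth-first crawl starting at $start.
 * Returns URLs in the order they were first reached ("indexed").
 *
 * @param array<string, string[]> $links
 * @return string[]
 */
function crawl(array $links, string $start): array
{
    $indexed = [];               // pages added to our toy "index"
    $queue   = [$start];         // pages waiting to be visited
    $seen    = [$start => true]; // pages already queued, to avoid loops

    while ($queue) {
        $url = array_shift($queue);
        $indexed[] = $url;       // "index" the page we just arrived at

        // Follow each link on this page, queueing pages not seen yet.
        foreach ($links[$url] ?? [] as $next) {
            if (!isset($seen[$next])) {
                $seen[$next] = true;
                $queue[] = $next;
            }
        }
    }

    return $indexed;
}

// Hypothetical site: nothing links to /orphan/.
$site = [
    '/'               => ['/about/', '/hello-world/'],
    '/about/'         => ['/'],
    '/hello-world/'   => ['/', '/category/news/'],
    '/category/news/' => ['/hello-world/'],
    '/orphan/'        => [],
];

print_r(crawl($site, '/'));
```

Notice that /orphan/ never shows up: nothing links to it, so the crawl never reaches it. That is why the links pointing at your content matter.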
For a more in-depth explanation of crawling, see Wikipedia’s article on web crawlers.
The second thing you need to understand is that you have the ability to control what content on your site gets indexed by the crawlers.
And with WordPress, the process is very simple.
But we first need to understand why certain content shouldn’t be indexed by search engines.
Duplicate Content Penalty
Google has been very public about penalizing what it calls “duplicate content”. When two or more pages on the same or different domains or subdomains contain substantial blocks of content that “either completely match other content or are appreciably similar”, a penalty of some sort (usually a lower ranking for the duplicate content) can come into effect.
If you want to rank highly for your targeted keywords, then being penalized by Google is the last thing you want, even if the penalty is minor.
You may be thinking, “But I don’t have any duplicate content on my site”.
Oh, really?
What most people don’t know is that WordPress automatically creates what are called “Archives” of posts. These “Archives” are generated based on things such as categories, dates, tags, authors, etc. Search engines treat each one of these archives as another page of content. And since many posts fall into multiple archives, a single post excerpt can show up in multiple locations throughout your archives.
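Here is a quick illustration of how one post multiplies across archives. The post data, the archive_urls() function, and the URL patterns are all hypothetical: they resemble typical WordPress “pretty permalink” archive URLs, but your site’s actual URLs depend on its permalink settings.

```php
<?php
// Illustration only. ASSUMPTION: hypothetical post data and archive URL
// patterns resembling WordPress pretty permalinks; real URLs depend on
// your permalink settings.

/**
 * List the archive pages on which one post's excerpt could appear.
 *
 * @param array{categories: string[], tags: string[], date: string, author: string} $post
 * @return string[]
 */
function archive_urls(array $post): array
{
    $urls = [];
    foreach ($post['categories'] as $cat) {
        $urls[] = "/category/$cat/";
    }
    foreach ($post['tags'] as $tag) {
        $urls[] = "/tag/$tag/";
    }
    // Date archives: year, month, and day.
    [$y, $m, $d] = explode('-', $post['date']);
    $urls[] = "/$y/";
    $urls[] = "/$y/$m/";
    $urls[] = "/$y/$m/$d/";
    // Author archive.
    $urls[] = "/author/{$post['author']}/";
    return $urls;
}

$post = [
    'categories' => ['seo', 'wordpress'],
    'tags'       => ['meta-tags', 'duplicate-content'],
    'date'       => '2010-03-15',
    'author'     => 'admin',
];

// Every one of these archive URLs could show the same excerpt.
print_r(archive_urls($post));
```

With two categories and two tags, that single post’s excerpt could appear on eight archive pages, on top of the post’s own page.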
You have to stop this! Thankfully, this is not hard to do.
There is a robots META tag that you can place in your theme’s header.php file (inside the <head> section) that tells the search engines not to index the content found on these archive pages. And with a little WordPress conditional tag magic, we can tell Google to index only the content we want indexed.
<?php if(is_home() || is_single() || is_page()) { echo '<meta name="robots" content="index,follow" />'; } else { echo '<meta name="robots" content="noindex,follow" />'; } ?>
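What this little bit of code does is tell the search engines, “If this is the homepage, a single post, or a Page, then you are allowed to index it. If not, then do NOT index it.” In both cases, the “follow” part still lets the crawlers follow the links on the page, so your Posts can still be discovered through your archives.
And if, for whatever reason, you want the search engines to index your homepage, Posts, Pages, and Category Archives, then all you have to do is add is_category() to the condition: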
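If you would like to check the logic outside WordPress, one option is to move the decision into a small function. This is a sketch, not part of WordPress: robots_meta_tag() is a hypothetical helper, and the booleans below stand in for what is_home(), is_single(), and is_page() would return on a live site.

```php
<?php
// Sketch: the noindex decision as a plain, testable function.
// ASSUMPTION: robots_meta_tag() is a hypothetical helper, not a WordPress
// function. In a theme you might call it like:
//   echo robots_meta_tag(is_home() || is_single() || is_page());

function robots_meta_tag(bool $indexable): string
{
    // "follow" stays in both branches so links are still followed.
    $directive = $indexable ? 'index,follow' : 'noindex,follow';
    return '<meta name="robots" content="' . $directive . '" />';
}

// Simulated page types; on a live site these come from conditional tags.
$pages = [
    'homepage'    => ['home' => true,  'single' => false, 'page' => false],
    'single post' => ['home' => false, 'single' => true,  'page' => false],
    'tag archive' => ['home' => false, 'single' => false, 'page' => false],
];

foreach ($pages as $label => $is) {
    $indexable = $is['home'] || $is['single'] || $is['page'];
    echo str_pad($label, 14), robots_meta_tag($indexable), PHP_EOL;
}
```

Keeping the decision in one function means that adding Category Archives later is a one-line change to the condition you pass in.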
<?php if(is_home() || is_single() || is_page() || is_category()) { echo '<meta name="robots" content="index,follow" />'; } else { echo '<meta name="robots" content="noindex,follow" />'; } ?>
So now, your archive pages won’t expose you to the duplicate content penalty! Your site is one step closer to being Search Engine Optimized!
Next time, we’ll cover how to optimize your Permalinks and Permalink structure for keyword targeting. Why not go ahead and Subscribe to this blog and get this series delivered to you daily? I promise, you’ll be glad you did!