Keeping Pace With The Google Penguin Updates

September 14, 2012

Over the last few months, Google has made some major changes to how your website is indexed and ranked. Issues that were considered very minor for years now play a more important role in the value each page is given, including:

  • The total number of errors compared to the total number of pages
  • Temporary URLs – Pages without a permanent address
  • Pages that can be opened with multiple versions of the URL
  • The overall order of tags on a site


Google now treats the regular reading of all the pages on your site as a process with a budget. When its servers spend more time running into 404 errors and duplicate content than they spend indexing content that is where it is supposed to be, that budget is out of balance and the way your site is ranked will reflect it. With the news that errors do not go away on their own, it’s become more important than ever to prevent search engines from seeing anything other than real content.

How do we accomplish this? Proper use of one file in particular is key – the innocuous file in your web directory called robots.txt. Using that file, you can tell crawler bots (whether they belong to Google, Bing, Yahoo or any other search engine) what you’d prefer they ignore on your site. It’s important, however, that none of the items you block with your site’s robots file appear in the sitemap for your Ecommerce site, so make sure that any system you use for producing sitemaps is compatible with robots.txt files.

Another very important piece of the puzzle is the .htaccess file. This file lets you tell not only search engines, but everyone and everything coming to your site, how to interact with certain URLs. If you have a page that’s doing well and is a good source of incoming traffic, but you need to rename or move it, you can use the .htaccess file to make sure that anyone going to the old URL ends up where they should. Not only that, but with the right status (what’s called a 301 permanent redirect) you can tell search engines to replace the URL they have saved with the new one!
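To make both ideas concrete, here is a minimal Python sketch. It checks a list of sitemap URLs against a robots.txt file, then builds the redirect lines you would paste into .htaccess. Everything specific in it is our own assumption for illustration: the example.com addresses, the blocked paths, and the helper names blocked_sitemap_urls and redirect_rules. The redirect lines use Apache’s mod_alias Redirect directive, so they only apply if your host runs Apache with that module enabled and allows .htaccess overrides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a store; the blocked paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="*"):
    """Return the sitemap URLs that robots.txt tells user_agent not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


def redirect_rules(moves):
    """Build one 'Redirect 301' line per old path -> new URL pair."""
    return [f"Redirect 301 {old} {new}" for old, new in moves.items()]


if __name__ == "__main__":
    # Hypothetical sitemap entries; two of them collide with the rules above.
    sitemap = [
        "https://www.example.com/products/blue-widget",
        "https://www.example.com/cart/view",
        "https://www.example.com/search?q=widget",
    ]
    for url in blocked_sitemap_urls(ROBOTS_TXT, sitemap):
        print("In sitemap but blocked by robots.txt:", url)

    # Hypothetical page move: the old address should permanently point to the new one.
    moves = {"/old-widget.html": "https://www.example.com/products/blue-widget"}
    print("\n".join(redirect_rules(moves)))
```

Any URL the first function reports should either come out of the sitemap or be unblocked in robots.txt. The lines printed by the second function are meant to be reviewed by hand before they go anywhere near a live .htaccess file.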

Sometimes the system you are using isn’t going to be compatible on its own with the way search engines want to see data on your site. Take two systems we work with on a daily basis, for example – ProStores and WordPress. When a page is removed from either of these systems, its address will still load content when visited. A search engine is at the very least looking for a 404 error message saying that the page no longer exists, but in ProStores and WordPress a search engine will get a page that checks out as OK. The search engine doesn’t understand the text on the page saying that the product, category or post is gone. It just thinks to itself, “Hmmm, this page and all of these other pages say exactly the same thing – I should penalize this website for having too much of the same exact content”. In those cases, there’s another trick to use – the Meta Robots tag. A search engine will understand that the page isn’t meant for them to keep if it finds the following in the <head> section of the page:

<meta name="robots" content="noindex,follow">

There are a few different versions of the content part of that tag, but for our purposes this is the version we want to use. The trick, system by system, is figuring out how to make sure that this tag appears on the right pages and only the right pages. Under no circumstances would you want something like this on every page in your site because it’s telling search engines “read all the links on the page and follow them for more information, but don’t keep a copy of this page in your records and don’t include it with search results”.
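One way to confirm the tag appears on the right pages and only the right pages is to audit the HTML your system actually serves. The Python sketch below is a rough illustration rather than a finished tool: the sample pages, the has_noindex and audit helpers, and the idea of keeping a list of which addresses should stay indexed are all our own assumptions.

```python
from html.parser import HTMLParser

# Hypothetical page bodies standing in for what the store would serve.
REMOVED_PAGE = (
    '<html><head><meta name="robots" content="noindex,follow"></head>'
    "<body>This product is no longer available.</body></html>"
)
LIVE_PAGE = "<html><head><title>Blue Widget</title></head><body>Buy now</body></html>"


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html):
    """True if the page carries a robots meta tag that includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def audit(pages):
    """pages maps URL -> (html, should_be_indexed); returns URLs tagged wrongly."""
    problems = []
    for url, (html, should_be_indexed) in pages.items():
        # A problem is noindex on a page we want kept, or no noindex on one we don't.
        if has_noindex(html) == should_be_indexed:
            problems.append(url)
    return problems


if __name__ == "__main__":
    pages = {
        "/products/blue-widget": (LIVE_PAGE, True),
        "/products/discontinued": (REMOVED_PAGE, False),
        "/products/mistagged": (REMOVED_PAGE, True),
    }
    print("Pages with the wrong robots tag:", audit(pages))
```

In practice the HTML would come from fetching each address on your own site. The check itself stays the same: removed pages should carry noindex, and every page you want in search results should not.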

All in all, Google and other search engines are starting to care a bit more than they used to about the way your site actually works and about the issues they need to deal with when trying to understand the content you have available on the web. If you make the job easier on them, they’ll reward you with a better ranked website or Ecommerce site. If you’d like us to review your site to see what we can do to help get it caught up, consider purchasing an SEO Analysis from our store and we can let you know what we recommend to avoid penalization and make sure your content is ranked properly.

Of course, if you’d just like to get the basics taken care of, we’d be happy to redo your site’s robots.txt file, add the tags mentioned above and make some basic adjustments to ensure Google and other search engines are happy with the way your site works. Just follow this link to add development time to the shopping cart and check out. We’ll get in touch right away to get the ball rolling so your site can start seeing the benefits as soon as possible!

Sean has been programming since first learning BASIC back in 1990. He has worked in the website development industry since its inception and has been working with IntuitSolutions since January 2006. Prepared for a career in systems administration by Drexel University, Sean is our Server Administrator and he works hard to make sure our equipment is efficient and reliable. He has been working with PHP and other powerful, web-friendly languages since the ’90s.