Google: The Duplicate Content Myth
Greg Grothaus of Google's Search quality team posted a video on the Google Webmaster Central Blog dispelling the duplicate content penalty myth. The video is a reproduction of a talk he gave at the Search Engine Strategies conference in San Jose last month on Duplicate Content and Multiple Site Issues.
In the video, Greg explains that Google does not automatically penalize you for having duplicate content on your web site, as many have believed. The myth likely took hold because of Google's search feature that hides similar pages from the user. However, it has never been Google's intention to penalize well-meaning webmasters who happen to have multiple copies of the same page by accident. Anyone who has been developing web sites for a while (dynamic ones especially) will tell you that it's quite common for several different URLs to serve the same content. Greg crystallizes this with the following example:
These URLs are all different:
The URLs are all slightly different, but they all display the home page of example.com, which is obviously not an attempt at deceptive duplication. Google, in its infinite wisdom, understands this and will even attempt to pick the best URL and combine all of the extras into one listing in search results.
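Under the hood, what Google is doing here amounts to a URL-normalization step. The sketch below is purely illustrative (Google's actual canonicalization logic is far more sophisticated, and the specific rules and example URLs here are my own assumptions, not Greg's), but it shows how several surface-level variants can be collapsed onto one canonical form using Python's standard library:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url):
    """Collapse common URL variants onto one canonical form.

    Illustrative rules only: lowercase the host, strip a leading
    "www.", treat default index pages as the root path, and drop
    the query string (e.g. session IDs). Real crawlers are far
    more careful about which query parameters are safe to drop.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    netloc = netloc.lower()
    if netloc.startswith("www."):
        netloc = netloc[4:]
    if path in ("", "/index.html", "/home.asp"):
        path = "/"
    return urlunsplit((scheme, netloc, path, "", ""))

# Four hypothetical variants of the same home page:
variants = [
    "http://example.com",
    "http://www.example.com/",
    "http://example.com/index.html",
    "http://example.com/?sessionid=abc123",
]
# All four normalize to a single canonical URL.
print({canonicalize(u) for u in variants})  # {'http://example.com/'}
```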
However, just because there is no penalty for duplicate content does not give us an excuse to become lazy about keeping our URLs and URL-rewriting techniques as clean as possible. You are still at a major disadvantage if people are linking to different copies of the same page: the link juice that could be captured by one single URL on your web site is instead dispersed among two or more. Greg rightly points out that if you have two identical pages at slightly different URLs, and 10 people link to one and 10 to the other, your listing will have half the rank from incoming links that it should. This is called dilution of link popularity. Beyond the linking problem, multiple URLs can also produce user-unfriendly URLs in search results, as well as inefficient crawling by search engines: you want them digging for new content, not re-reading the same thing.
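On the site side, the usual remedy is to pick one canonical URL and permanently redirect the duplicates to it, so that all incoming links (and link juice) consolidate on a single page. Here is a minimal WSGI sketch of that idea; the duplicate paths and canonical target are hypothetical, and a real site would typically do this in the web server's rewrite rules instead:

```python
# Minimal WSGI app that 301-redirects known duplicate paths to the
# canonical URL so incoming links consolidate on one page.
# The duplicate paths and canonical target below are hypothetical.
CANONICAL = "/"
DUPLICATES = {"/index.html", "/home.asp", "/default.htm"}

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in DUPLICATES:
        # 301 tells browsers and crawlers the move is permanent,
        # so search engines transfer the old URL's credit.
        start_response("301 Moved Permanently", [("Location", CANONICAL)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<h1>Home page</h1>"]
```

Run it under any WSGI server (e.g. `wsgiref.simple_server`) and a crawler requesting `/index.html` is sent straight to `/`, leaving only one URL for people to link to.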