What is duplicate content, and why do I need a checker?
Duplicate content, according to Google Webmaster Tools, occurs when large blocks of copy match across two or more domains. Duplicate content can also occur within the same domain, hurting a website’s SEO efforts.
Duplicate content checker tools can help marketing teams keep duplicate content to a minimum and identify canonicalization issues within a single website.
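To give a rough sense of what "large blocks of copy match" can mean in practice, here is a minimal sketch that compares two pieces of text by their overlapping five-word sequences. This is only an illustration under simple assumptions; it is not how any particular checker works, and the function names `shingles` and `similarity` are made up for this example.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 5) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "Our handmade oak tables are built to order in our own workshop."
    suspect = "Our handmade oak tables are built to order by a small local team."
    print(f"Overlap score: {similarity(original, suspect):.2f}")
```

A score near 1.0 suggests the two texts share most of their wording, while a score near 0.0 suggests they share little. Where to draw the line between "similar" and "duplicate" is a judgment call.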
Why is it bad?
Not all instances of duplicate content are malicious or bad. In fact, many instances of duplicate content are due to technical issues: forgetting to use the noindex tag on syndicated articles, using placeholder pages, or using redirects inappropriately. While these examples are not considered “good”, they do not involve someone stealing your copy and publishing it on their own website.
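As an illustration of the first technical issue, the sketch below checks whether a page's HTML contains a robots meta tag with the noindex directive. It uses only Python's standard library. The names `RobotsMetaParser` and `has_noindex` are invented for this example, and the check is deliberately narrow: it ignores other ways of keeping a page out of the index, such as HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    syndicated = '<head><meta name="robots" content="noindex, follow"></head>'
    print(has_noindex(syndicated))
```

Running a check like this over your syndicated articles is one way to spot pages where the tag was forgotten.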
What can I do?
The first step to solving a duplicate content issue is to find it with a duplicate content checker and determine whether it’s harmful. Here are some handy tips that can help you get started on the right path:
After reviewing a variety of providers, I found Copyscape to be the top choice for website and blog publishers. It offers a free version to new users and paid options for additional features. One upgraded version is called Copysentry; this service scans the web on a regular basis and emails you when duplicate content is found.
Users may enter a web page URL to find copies of that page. They offer a “batch” check service, where users can simply upload their .xml sitemap or URL.
3 Take action
If someone has stolen your content and published it on their own website, you have several options:
- Ask the site owner to noindex the page, or take it down.
- Submit a DMCA takedown notice to the site owner and then, if you get no answer, to the hosting company. You may hire a service to assist with this process, or you may submit the notice on your own. Google also has a resolution page dedicated to this subject, but it may be faster to go straight to the hosting company.
- How do you find the hosting company? You’ll have to do a little research to find this information:
Use WHOIS to find the nameserver information. Type the nameserver information into Google, and look at the results for information on the hosting company. For example, if stabletransit is listed as the nameserver, and you type that into Google, you will find that Rackspace is the hosting company.
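If you have already copied the WHOIS record for a site into a text file, the nameserver step can be sketched in a few lines. The record below is invented for illustration (the domains are placeholders), and the assumption that nameservers appear on lines starting with "Name Server:" or "nserver:" holds for many registries but not all of them. The function names are hypothetical as well.

```python
import re

# Made-up WHOIS output for illustration only; real records vary by registry.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE-COPYCAT.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLE-HOST.NET
Name Server: NS2.EXAMPLE-HOST.NET
"""


def nameservers(whois_text: str) -> list:
    """Pull the nameserver hostnames out of raw WHOIS text."""
    found = re.findall(
        r"^\s*(?:Name Server|nserver):\s*(\S+)",
        whois_text,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    return sorted({ns.lower().rstrip(".") for ns in found})


def search_terms(whois_text: str) -> list:
    """Reduce each nameserver to its last two labels, a naive guess at
    the domain worth typing into a search engine."""
    return sorted({".".join(ns.split(".")[-2:]) for ns in nameservers(whois_text)})


if __name__ == "__main__":
    print(nameservers(SAMPLE_WHOIS))
    print(search_terms(SAMPLE_WHOIS))
```

The last-two-labels shortcut is a simplification that will be wrong for domains under suffixes such as .co.uk, so treat the result as a starting point for your search rather than an answer.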
If the bulk of duplicate content issues exists within your own website, start cleaning up your site with suggestions from Webmaster Tools.
Finally, if you are looking for a more in-depth article on the topic of duplicate content, I suggest this post on the Technical SEO Moz blog.