Removing Stolen Content From Google

It’s not uncommon to find sites stealing your content in its entirety, with no link back to the original source, whilst plastering ads around the page to make money off your hard work. Getting these plagiarised sites removed from Google used to be a hassle: the copyright owner had to file a DMCA complaint and then mail or fax (you remember those technologies, right?) it to Google.

Good news though: I found some stolen content today, and it seems the times have changed, with Google finally allowing content takedown requests to be submitted over the web. This move has been long overdue in my opinion, as a majority of these stolen-content sites make their money through Google’s AdSense program. With the old mail and fax system it was normally easier just to firewall off the network the offending site was hosted on, to stop them scraping your content, and move on with life.

Here are some tips on what to do if you suspect someone is stealing your content for their own gains without any attribution:

Identify possible stolen content

A lot of the time you can identify potential content theft from your web stats. If you see an image that is being requested a lot more often than the post it appears in, chances are it’s being hotlinked from elsewhere. Looking at the referring URLs is also a good way to spot possible thieves. You can also enter your web site URL at Copyscape and see if it can identify any duplicate content for you.
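If your web stats package doesn't break image requests down by referrer, the same check can be roughed out straight from the access log. The sketch below assumes the log is in the common "combined" format used by Apache and nginx; the function name, the list of image extensions and the example host names are all my own placeholders, so adjust them to suit your setup.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumes the "combined" log format:
# ... "GET /path HTTP/1.1" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)
# Hypothetical list of extensions to treat as images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def external_image_referrers(log_lines, own_hosts):
    """Count image requests per referring host that is not one of own_hosts."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # not a GET/HEAD request, or not in the expected format
        path = urlparse(match.group("path")).path.lower()
        if not path.endswith(IMAGE_EXTENSIONS):
            continue  # only image requests are interesting here
        host = urlparse(match.group("referer")).hostname
        # An empty or "-" referer has no hostname and is skipped.
        if host and host not in own_hosts:
            counts[host] += 1
    return counts
```

Feeding it the lines of your access log along with a set of your own host names, then calling most_common() on the result, gives a list of outside sites ordered by how many of your images they pulled. A host near the top that isn't a search engine or feed reader is worth a closer look.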

Get the content taken down

Look up the IP address of the offending site using the nslookup command, e.g.:

nslookup offendingsite.com

This should return something similar to:

Server:        61.88.88.88
Address:    61.88.88.88#53

Non-authoritative answer:
Name:    www.enunix.com
Address: 74.82.173.217
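If you end up doing this for more than a couple of sites, it can be handy to pull the answer out of that output with a script. Here is a minimal sketch, assuming output laid out like the sample above; the function name is my own, and other nslookup versions may format things differently. Note the first Address line is the resolver you queried, not the offending site, so it needs to be skipped.

```python
def answer_addresses(nslookup_output):
    """Return the addresses listed in the answer section of nslookup output.

    The Server/Address pair at the top describes the resolver that was
    queried, so only Address lines after the answer begins are collected.
    """
    addresses = []
    in_answer = False
    for line in nslookup_output.splitlines():
        stripped = line.strip()
        if stripped.lower().endswith("answer:") or stripped.startswith("Name:"):
            in_answer = True
        elif in_answer and stripped.startswith(("Address:", "Addresses:")):
            value = stripped.split(":", 1)[1].strip()
            # Drop any "#port" suffix, as seen on the resolver line.
            addresses.append(value.split("#", 1)[0])
    return addresses
```

If you would rather skip the parsing altogether, Python's standard library socket.gethostbyname() does the lookup directly and returns the address as a string.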

Now perform a whois on the IP address returned to find out what network the site is hosted on. This can be done either on the Linux command line using the whois tool or by using a web-based tool. This will give you some information on the network similar to:

#
# Query terms are ambiguous.  The query is assumed to be:
#     "n 74.82.173.217"
#
# Use "?" to get help.
#

#
# The following results may also be obtained via:
# http://whois.arin.net/rest/nets;q=74.82.173.217?showDetails=true&showARIN=true
#

American Registry for Internet Numbers NET74 (NET-74-0-0-0-0) 74.0.0.0 - 74.255.255.255
Take 2 Hosting, Inc. T2H-NET4-2 (NET-74-82-160-0-1) 74.82.160.0 - 74.82.191.255

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#

Here we can see Take 2 Hosting is responsible for this IP range, so you can then Google the company name to find out how to contact them and report the offending content. If you’re lucky, the company’s whois record will even include an email address for reporting abuse originating from their network, making this step easier. You can also do a whois on the domain of the site itself to find the registration details, though 9 times out of 10 those in the content theft line of work will use a domain registrar with a private whois service that obscures the registrant’s contact details.
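The whois summary usually lists more than one network, from the registry's huge parent block down to the hosting company's own allocation, and the one you want is the smallest range that contains the IP. That choice can be sketched as below. It assumes summary lines shaped like the two in the ARIN output above (organisation, handle, net ID in brackets, then the range); the function name and the regular expression are my own guesses at that layout, not anything ARIN documents, and other registries format their output differently.

```python
import ipaddress
import re

# Matches lines like:
# Take 2 Hosting, Inc. T2H-NET4-2 (NET-74-82-160-0-1) 74.82.160.0 - 74.82.191.255
# The range separator may be a plain hyphen or an en dash.
NET_LINE = re.compile(
    r"^(?P<org>.+?) (?P<handle>\S+) \((?P<net>NET-[\d-]+)\) "
    r"(?P<start>[\d.]+) [-\u2013] (?P<end>[\d.]+)$"
)


def most_specific_network(whois_text, ip):
    """Return (organisation, handle) for the smallest listed range holding ip."""
    target = ipaddress.ip_address(ip)
    best = None
    for line in whois_text.splitlines():
        match = NET_LINE.match(line.strip())
        if not match:
            continue  # comment lines and blanks don't match the layout
        start = ipaddress.ip_address(match.group("start"))
        end = ipaddress.ip_address(match.group("end"))
        if not (start <= target <= end):
            continue
        size = int(end) - int(start)
        if best is None or size < best[0]:
            best = (size, match.group("org"), match.group("handle"))
    return None if best is None else best[1:]
```

Run against the output above with 74.82.173.217, this picks the Take 2 Hosting line rather than the registry's parent block, which is the network whose abuse contact you are after.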

Report infringement to Google

Reporting the offending content to Google is a crucial step: first, because they can remove the site from the search listings to limit its traffic, and second, because most content thieves make their money from AdSense, so Google can ban their AdSense account. To report offending content, head to this form and request its removal.