Stolen Content

AussieDave

24 years & still going!
Joined
Nov 28, 2013
Messages
4,978
Reaction score
3,518
I would caution against blocking out an entire country. One of my biggest high rollers is a player from Russia.

I've been vetting all traffic through my sites for yonks. China and a few others get a complete blanket block. One of the most noticeable flags is an outdated browser: Firefox 40.0 and earlier versions are used heavily by scrapers etc. So just blocking those outdated browser versions has a huge impact. Also block bogus Google, Bing etc. UAs, which is another trick these scumbags use.
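
A rough sketch in Python of those two checks (the cut-off version, function names and exact rules are illustrative, not the poster's actual setup; genuine Googlebot/bingbot traffic is verified with the reverse-DNS check both engines publish):

import re
import socket

def is_outdated_firefox(user_agent, cutoff=40):
    # Flag UAs claiming Firefox 40 or older - heavily used by scrapers per the post above
    m = re.search(r"Firefox/(\d+)", user_agent or "")
    return bool(m) and int(m.group(1)) <= cutoff

def is_fake_search_bot(ip, user_agent):
    ua = (user_agent or "").lower()
    if "googlebot" not in ua and "bingbot" not in ua:
        return False                      # not claiming to be a search engine at all
    try:
        host = socket.gethostbyaddr(ip)[0]
        forward = socket.gethostbyname_ex(host)[2]
    except OSError:
        return True                       # no (or broken) reverse DNS: treat as bogus
    genuine_domain = host.endswith((".googlebot.com", ".google.com", ".search.msn.com"))
    return not (genuine_domain and ip in forward)   # real bots resolve back to the same IP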

As far as SE caches go, I don't allow those. I know Google follows that instruction, not sure about Bing etc. But since doing that, plus the other security layers, it has severely impacted scrapers etc stealing my content. And it's stopped other crap in its tracks, too.
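
The instruction being referred to is the noarchive directive, sent either as a robots meta tag or as an X-Robots-Tag response header. A minimal sketch using Flask (the framework is an assumption; any framework or the web server config can set the same header):

from flask import Flask

app = Flask(__name__)

@app.after_request
def no_archive(response):
    # Ask search engines not to keep a cached copy of any page they crawl
    response.headers["X-Robots-Tag"] = "noarchive"
    return response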
 

footballaffiliate

Affiliate Guard Dog Member
Joined
Jan 10, 2013
Messages
404
Reaction score
149
How do you actually safeguard something like that? I mean, unless you can determine with 100% accuracy that it's a scraper, I could see this having a knock-on effect which could backfire and cause you to lose ranking. I believe Google would see this as cloaking... just sayin

Some of our stats are updated daily, so it's in a scraper's interest to keep coming back and getting the latest data. Therefore we've written code to detect patterns, like the time between clicks, that identify visits which cannot be human and are more likely a bot. You can also block an IP or scramble its data for a period of time and then lift the block, so the data is useless to a scraper, but in the long term you're not banning IPs that might be used by legitimate users in the future.
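
A minimal sketch of the time-between-clicks idea in Python (the thresholds, window size and block duration are invented for illustration, not the actual values used here):

import time
from collections import defaultdict, deque

recent_hits = defaultdict(lambda: deque(maxlen=20))   # ip -> last request timestamps
blocked_until = {}                                    # ip -> unix time the block expires

def record_hit(ip, now=None):
    now = now or time.time()
    hits = recent_hits[ip]
    hits.append(now)
    # Humans rarely sustain sub-second clicks; 20 requests averaging <1s apart looks like a bot
    if len(hits) == hits.maxlen and (hits[-1] - hits[0]) / (len(hits) - 1) < 1.0:
        blocked_until[ip] = now + 6 * 3600            # block (or serve scrambled data) for 6 hours

def is_blocked(ip, now=None):
    now = now or time.time()
    return blocked_until.get(ip, 0) > now             # expired blocks fall away automatically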

Other things to look for: HTTP headers (block known scraping software in your .htaccess file), click frequency, click patterns (are visitors going through every page and querystring methodically?), and the visitor's country. Combined, these factors allow you to make a reasonable judgement as to whether someone is a scraper. You can even set limits by country: more than 100 visits to a page in 2 hours from China is not allowable, but from another country, perhaps yes.
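
A sketch of per-country, per-page limits along those lines (it assumes a GeoIP lookup already gives you the visitor's country code, and the window and limits are just the example numbers from the post):

import time
from collections import defaultdict, deque

WINDOW = 2 * 3600                               # 2-hour sliding window
LIMITS = defaultdict(lambda: 1000, {"CN": 100}) # e.g. 100 hits/page from China, 1000 elsewhere

page_hits = defaultdict(deque)                  # (country, page) -> timestamps inside the window

def over_country_limit(country, page, now=None):
    now = now or time.time()
    hits = page_hits[(country, page)]
    while hits and hits[0] < now - WINDOW:
        hits.popleft()                          # drop requests that fell out of the window
    hits.append(now)
    return len(hits) > LIMITS[country]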

The only way a person can scrape our data now is to visit very infrequently, from different countries, different IP addresses, visiting pages randomly etc etc in order to look like a human. Even then, we can detect patterns in visits across IP addresses to see if there is some methodical scraping of all data combinations for that page.
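
One way that cross-IP detection could look, sketched in Python (the page names, combination counts and thresholds are invented; the idea is simply to notice when one page's querystring combinations are being covered too completely, regardless of which IPs the requests come from):

from collections import defaultdict

seen = defaultdict(set)               # page -> set of (params, ip) pairs requested
TOTAL_COMBINATIONS = {"/stats": 500}  # assumed: you know how many valid combos each page has

def looks_like_distributed_scrape(page, params, ip, threshold=0.8):
    seen[page].add((params, ip))
    distinct_params = {p for p, _ in seen[page]}
    distinct_ips = {i for _, i in seen[page]}
    coverage = len(distinct_params) / TOTAL_COMBINATIONS.get(page, float("inf"))
    # Flag when most combinations have been pulled and the hits are spread across many IPs
    return coverage > threshold and len(distinct_ips) > 10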

Once a scraper knows they can't get reliable information from you, they'll go elsewhere; they can't run a website of their own when they're constantly disrupted from gathering their stolen content.

I hate scrapers; that's why we go to great lengths to stop them. We pay good money for our data and have invested a lot in code that transforms it into something useful.

The other thing to do is to keep reporting these IP addresses to online abuse databases like abuseipdb.com.
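
abuseipdb.com has a REST API, so reporting can be automated. A sketch using Python's requests library (check their current docs for the endpoint, category codes and rate limits before relying on this; the key is a placeholder):

import requests

ABUSEIPDB_KEY = "your-api-key-here"

def report_scraper(ip, comment="Persistent content scraping"):
    resp = requests.post(
        "https://api.abuseipdb.com/api/v2/report",
        headers={"Key": ABUSEIPDB_KEY, "Accept": "application/json"},
        data={"ip": ip, "categories": "19", "comment": comment},  # 19 = "Bad Web Bot"
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()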
 