Free IP Blocking - Cool Site - Even writes .htaccess for you!

Aussie-Dave

Former AGD Member
Joined
Nov 24, 2007
Messages
684
Reaction score
3
Hi all,

You work your butt off, get listed in numerous number 1 spots on Google, even get site-wide links, and then you're targeted by site scrapers. It's enough to make you go crazy.

Well, if blocking IP after IP seems like a chore, here's a solution I found that I'm using now.

Most of my sites are Australian facing. Those that are not are still only suited to players from certain countries. But blocking entire country ranges, let alone locating them for free, is a pain.

Well, here is your answer:

http://www.countryipblocks.net/

It even writes the .htaccess for you and other formats too.
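
For anyone who hasn't seen one, a generated country block is essentially a long list of deny lines. Below is a minimal sketch in the older Apache 2.2 syntax used elsewhere in this thread. The three ranges are reserved documentation addresses standing in for real country ranges; they are placeholders, not actual output from the site.

```apache
# Sketch of a country block in .htaccess (Apache 2.2 syntax).
# The ranges below are RFC 5737 documentation blocks used as
# placeholders; paste the real CIDR list for the country in their place.
order allow,deny
allow from all
deny from 192.0.2.0/24
deny from 198.51.100.0/24
deny from 203.0.113.0/24
```

With "order allow,deny", the allow line is evaluated first and any matching deny line then overrides it, so everyone gets in except the listed ranges.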


Cheers

Dave
 

Daera

Affiliate Guard Dog Member
Joined
Oct 16, 2008
Messages
291
Reaction score
0
That's a great site!

I was wondering how to get the country block IPs and info a while back, and Bonustreak's partner referred me there.

Thanks for posting that. Very handy for forum owners dealing with countries that have very high rates of spam/fraud.
 

Perc

Affiliate Guard Dog Member
Joined
Aug 24, 2010
Messages
195
Reaction score
19
My site was scraped this morning, from these IPs in Israel:

192.114.71.13
82.80.249.244
82.80.230.228

Add those to your bad IP list.
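
If you keep your bad IP list in .htaccess, blocking those three addresses could look roughly like the sketch below. It assumes the older Apache 2.2 access syntax; the commented-out section is the Apache 2.4 equivalent, and which one applies depends on what your host runs.

```apache
# Apache 2.2 syntax: let everyone in, then deny the listed scraper IPs
order allow,deny
allow from all
deny from 192.114.71.13
deny from 82.80.249.244
deny from 82.80.230.228

# Apache 2.4 equivalent (use this INSTEAD of the lines above, not with them):
# <RequireAll>
#     Require all granted
#     Require not ip 192.114.71.13
#     Require not ip 82.80.249.244
#     Require not ip 82.80.230.228
# </RequireAll>
```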
 

Guard Dog

Guard Dog
Staff member
Joined
Dec 13, 2006
Messages
11,228
Reaction score
3,144
What a pain in the ass. I have never worried about scrapers, but I'd bet I have lost a ton of traffic from them :(
 

Perc

Affiliate Guard Dog Member
Joined
Aug 24, 2010
Messages
195
Reaction score
19
Well, I think I figured out how to block most of the bad bots and scrapers via .htaccess. There are probably more up-to-date agents that need to be added, and some old ones can probably be trimmed, but it should at least keep out the bad automated bots or scraping scumbags that aren't trying too hard.

Here's a bit of my .htaccess:

Code:
ErrorDocument 403 /403.html

RewriteEngine On 
RewriteBase /

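# NEVER BLOCK THE ERROR PAGE ITSELF, OR A BLOCKED BOT CAN'T BE SERVED THE 403 DOCUMENT
RewriteCond %{REQUEST_URI} !^/403\.html$

# IF THE UA STARTS WITH THESE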
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) [NC,OR]
 
# STARTS WITH WEB
RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) [NC,OR]
 
# ANYWHERE IN UA -- GREEDY REGEX
RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ [NC]

# ISSUE 403 / SERVE ERRORDOCUMENT
# "^" MATCHES EVERY REQUEST PATH, INCLUDING AN EMPTY ONE
RewriteRule ^ - [F,L]

<Files 403.html>
order allow,deny
allow from all
</Files>

<Files .htaccess>
order deny,allow
deny from all
</Files>

Might be a little over the top, but unless you have a very high traffic site, it's not supposed to cause a noticeable performance hit.

This is an easy-to-find list, so anyone trying hard enough would just name their agent differently, but this should at least help.

To the vets, is this just an exercise in futility? Is there any point putting much effort into keeping them out, or is it easier to just react when/if your scraped site/content is published elsewhere? Or ban IPs when you notice it happening?
 

Aussie-Dave

Former AGD Member
Joined
Nov 24, 2007
Messages
684
Reaction score
3
Perc said:
To the vets, is this just an exercise in futility? Is there any point putting much effort into keeping them out, or is it easier to just react when/if your scraped site/content is published elsewhere? Or ban IPs when you notice it happening?

I wasn't bothering, but with all the BS out there these days, not to mention those damn WP plugins used for evil, I think you have to keep on top of it.

Been using a similar bot identifier but yours looks more current, so I've swapped it with mine. Thanks :)

I'm annoyed right now with sites being able to pull Alexa stats. It's OK if it's for good, but 9 times out of 10 it's being used for evil. Tonight I found a site pulling that info using my meta data, etc., and running Google ads.

So now I have to go into WP functions and screw around with the dynamically created robots.txt file to add a Disallow rule for Alexa. Should fix the problem.
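
If editing the WordPress-generated robots.txt turns out to be awkward, a different technique is to refuse the crawler at the server. This is a sketch of that alternative, not the robots.txt approach described above, and it assumes Alexa's crawler identifies itself with the user agent ia_archiver; check your access logs before relying on it.

```apache
# Alternative to a robots.txt Disallow: return 403 to the Alexa crawler.
# ASSUMPTION: the crawler's user agent contains "ia_archiver".
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC]
RewriteRule ^ - [F]
```

If RewriteEngine On already appears earlier in the same .htaccess, leave that line out here. Note that a robots.txt rule is only a request the crawler may honour, whereas the 403 is enforced by the server.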

Most of this stuff is coming from Russia, so I've IP banned the whole country.


Cheers

:)

Dave
 

mattsgame

Affiliate Guard Dog Member
Joined
Sep 27, 2010
Messages
166
Reaction score
5
Aussie-Dave said:
Hi all,

You work your butt off, get listed in numerous number 1 spots on Google, even get site-wide links, and then you're targeted by site scrapers. It's enough to make you go crazy.

Well, if blocking IP after IP seems like a chore, here's a solution I found that I'm using now.

Most of my sites are Australian facing. Those that are not are still only suited to players from certain countries. But blocking entire country ranges, let alone locating them for free, is a pain.

Well, here is your answer:

http://www.countryipblocks.net/

It even writes the .htaccess for you and other formats too.


Cheers

Dave


Could someone please explain to me what a site scraper is? Or point me to a link that I could read for myself?

Thanks all, never heard of this term before.

Cheers
Matt
 

Perc

Affiliate Guard Dog Member
Joined
Aug 24, 2010
Messages
195
Reaction score
19
A scraper will basically download your whole website, or just pictures/text, to copy all your work and do whatever they want with it.
 

Aussie-Dave

Former AGD Member
Joined
Nov 24, 2007
Messages
684
Reaction score
3
Always a good idea to use hotlinking protection. It stops your images from being embedded on other sites, which also steals your hosting bandwidth.

Everything in this thread so far is for Apache, which is what most Unix/Linux hosts run.

If you're hosting on a Windows server running IIS, you can't use a .htaccess file.
Sorry, I have no idea what you do on a Windows server.

This is what I use in my .htaccess to prevent hotlinking.

Code:
RewriteEngine on
# Allow empty referers (direct requests, browsers that send none)
RewriteCond %{HTTP_REFERER} !^$
# Allow your own domain, with or without www, over http or https
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yourdomain\.com(/.*)?$ [NC]
# Forbid image requests arriving from any other referer
RewriteRule \.(jpe?g|png|gif)$ - [F,NC]

Only use the RewriteEngine on directive once per .htaccess file.



Cheers

:)

Dave
 