Blocking bots and reducing server load

Looking through the server logs, you can find many bots that hit your site day after day and put a heavy load on the server. Blocking unneeded bots is a logical step in the growth of a large project.
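A minimal sketch of that log inspection, assuming the common Apache/Nginx "combined" log format in which the User-Agent is the last double-quoted field. The function name `count_user_agents` and the sample lines are illustrative and do not come from a real log:

```python
import re
from collections import Counter

# In the "combined" log format the User-Agent is the last double-quoted field.
# This pattern is an assumption about the log layout; adjust it for custom formats.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each User-Agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample lines (made up for this example).
sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /b HTTP/1.1" 200 256 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
]

# Show the busiest agents first.
top_agents = count_user_agents(sample_log).most_common(10)
for agent, hits in top_agents:
    print(hits, agent)
```

For a real log, pass an open file object instead of the sample list; the agents at the top of the output are the first candidates for blocking.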

The other side of the coin is competitors researching your project through services such as Ahrefs, Semrush, Serpstat, Linkpad and others. If you do SEO promotion with PBN networks, these services make it very easy to trace your entire network and then report it to Google, which can get the whole network of sites banned. To prevent this, take the time to close your sites to the robots of these services.

Blocking has other benefits too, such as partial protection against content theft and against the preparation of DDoS and hacker attacks, which usually begin with a preliminary analysis of the site carried out by some of the bots listed below.

Popular bots that create server load

The list of bots that create server load is expanded and updated periodically:

  • Java
  • NjuiceBot
  • Gigabot
  • Scrapy
  • Baiduspider
  • SeznamBot
  • crawler
  • JS-Kit
  • HybridBot
  • Voyager
  • PostRank
  • DomainCrawler
  • SemrushBot
  • MegaIndex.ru
  • ltx71
  • SurveyBot
  • AhrefsBot
  • Exabot
  • Aport
  • CCBot
  • DotBot
  • GetIntent\ Crawler
  • ia_archiver
  • SurveyBot
  • larbin
  • Butterfly
  • libwww
  • bingbot
  • Wget
  • SWeb
  • LinkExchanger
  • Soup
  • GrapeshotCrawler
  • WordPress
  • DnyzBot
  • spbot
  • DeuSu
  • MLBot
  • InternetSeer
  • BUbiNG
  • FairShare
  • Yeti
  • Birubot
  • YottosBot
  • gold\ crawler
  • Linguee
  • Ezooms
  • lwp-trivial
  • Purebot
  • kmSearchBot
  • SiteBot
  • CamontSpider
  • ptd-crawler
  • HTTrack
  • suggybot
  • ttCrawler
  • Nutch
  • msnbot
  • msnbot-media
  • Slurp
  • Zeus
  • Abonti
  • aggregator
  • AhrefsBot
  • Aport
  • asterias
  • Baiduspider
  • BDCbot
  • Birubot
  • BLEXBot
  • BUbiNG
  • BuiltBotTough
  • Bullseye
  • BunnySlippers
  • Butterfly
  • ca\-crawler
  • CamontSpider
  • CCBot
  • Cegbfeieh
  • CheeseBot
  • CherryPicker
  • coccoc
  • CopyRightCheck
  • cosmos
  • crawler
  • Crescent
  • CyotekWebCopy/1\.7
  • CyotekHTTP/2\.0
  • DeuSu
  • discobot
  • DittoSpyder
  • DnyzBot
  • DomainCrawler
  • DotBot
  • Download Ninja
  • EasouSpider
  • EmailCollector
  • EmailSiphon
  • EmailWolf
  • EroCrawler
  • Exabot
  • ExtractorPro
  • Ezooms
  • FairShare
  • Fasterfox
  • FeedBooster
  • Foobot
  • Genieo
  • GetIntent\ Crawler
  • Gigabot
  • gold\ crawler
  • GrapeshotCrawler
  • grub\-client
  • Harvest
  • hloader
  • httplib
  • HTTrack
  • humanlinks
  • HybridBot
  • ia_archiver
  • ieautodiscovery
  • Incutio
  • InfoNaviRobot
  • InternetSeer
  • IstellaBot
  • Java
  • Java/1\.
  • JamesBOT
  • JennyBot
  • JS-Kit
  • k2spider
  • Kenjin Spider
  • Keyword Density/0\.9
  • kmSearchBot
  • larbin
  • LexiBot
  • libWeb
  • libwww
  • Linguee
  • LinkExchanger
  • LinkextractorPro
  • linko
  • LinkScan/8\.1a Unix
  • LinkWalker
  • lmspider
  • LNSpiderguy
  • ltx71
  • lwp-trivial
  • lwp\-trivial
  • magpie
  • Mata Hari
  • MaxPointCrawler
  • MegaIndex
  • memoryBot
  • Microsoft URL Control
  • MIIxpc
  • Mippin
  • Missigua Locator
  • Mister PiX
  • MJ12bot
  • MLBot
  • moget
  • MSIECrawler
  • msnbot
  • msnbot-media
  • NetAnts
  • NICErsPRO
  • Niki\-Bot
  • NjuiceBot
  • NPBot
  • Nutch
  • Offline Explorer
  • OLEcrawler
  • Openfind
  • panscient\.com
  • PostRank
  • ProPowerBot/2\.14
  • ProWebWalker
  • ptd-crawler
  • Purebot
  • Python\-urllib
  • QueryN Metasearch
  • RepoMonkey
  • Riddler
  • RMA
  • Scrapy
  • SemrushBot
  • serf
  • SeznamBot
  • SISTRIX
  • SiteBot
  • sitecheck\.Internetseer\.com
  • SiteSnagger
  • Serpstat
  • Slurp
  • SnapPreviewBot
  • Sogou
  • Soup
  • SpankBot
  • spanner
  • spbot
  • Spinn3r
  • SpyFu
  • suggybot
  • SurveyBot
  • suzuran
  • SWeb
  • Szukacz/1\.4
  • Teleport
  • Telesoft
  • The Intraformant
  • TheNomad
  • TightTwatBot
  • Titan
  • toCrawl/UrlDispatcher
  • True_Robot
  • ttCrawler
  • turingos
  • TurnitinBot
  • UbiCrawler
  • UnisterBot
  • Unknown
  • uptime files
  • URLy Warning
  • User-Agent
  • VCI
  • Vedma
  • Voyager
  • WBSearchBot
  • Web Downloader/6\.9
  • Web Image Collector
  • WebAuto
  • WebBandit
  • WebCopier
  • WebEnhancer
  • WebmasterWorldForumBot
  • WebReaper
  • WebSauger
  • Website Quester
  • Webster Pro
  • WebStripper
  • WebZip
  • Wget
  • WordPress
  • Wotbox
  • wsr\-agent
  • WWW\-Collector\-E
  • Yeti
  • YottosBot
  • Zao
  • Zeus
  • ZyBORG
  • ahrefsbot
  • ahrefs
  • qwantify
  • qwant
  • semrushbot
  • semrush
  • dotbot
  • mj12bot
  • Detectify
  • dotbot
  • Riddler
  • LinkpadBot
  • BLEXBot
  • FlipboardProxy
  • aiHitBot
  • trovitBot
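Some names in the list repeat, sometimes only with different capitalization (for example AhrefsBot and ahrefsbot). Since both `SetEnvIfNoCase` and robots.txt user-agent matching ignore case, such repeats add nothing. A minimal sketch of cleaning the list before pasting it into a config, using a hypothetical helper `dedupe_bot_names`:

```python
def dedupe_bot_names(names):
    """Drop repeats while keeping the first spelling and the original order."""
    seen = set()
    unique = []
    for name in names:
        cleaned = name.strip()
        key = cleaned.casefold()  # compare names case-insensitively
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


# A short excerpt of the list, including repeated entries.
bots = ["AhrefsBot", "SemrushBot", "DotBot", "ahrefsbot", "semrushbot", "dotbot", "MJ12bot"]
print(dedupe_bot_names(bots))
```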

Let me know in the comments whether I should describe what each of the robots above belongs to (service name and other details).

How to block AhrefsBot, SemrushBot, MJ12bot and other bots?

I know of 2 methods for blocking bad bots:

  1. Blocking access in the .htaccess file, located in the site root (recommended!).
  2. Blocking in the robots.txt file, also located in the site root. This only works for bots that choose to honor robots.txt.

If you know other methods, be sure to write in the comments!

Blocking bad bots via .htaccess (recommended)

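# BEGIN Bad Bot Blocker
# Each line marks a request as bad_bot if its User-Agent matches (case-insensitive, anywhere in the string).
SetEnvIfNoCase User-Agent "Abonti|aggregator|AhrefsBot|Aport|asterias|Baiduspider|BDCbot|Birubot|BLEXBot|BUbiNG|BuiltBotTough|Bullseye|BunnySlippers|Butterfly|ca\-crawler|CamontSpider|CCBot|Cegbfeieh|CheeseBot|CherryPicker|coccoc|CopyRightCheck|cosmos|crawler|Crescent|CyotekWebCopy/1\.7|CyotekHTTP/2\.0" bad_bot
SetEnvIfNoCase User-Agent "DeuSu|discobot|DittoSpyder|DnyzBot|DomainCrawler|DotBot|Download Ninja|EasouSpider|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|Exabot|ExtractorPro|Ezooms|FairShare|Fasterfox|FeedBooster|Foobot|Genieo|GetIntent\ Crawler|Gigabot|gold\ crawler|GrapeshotCrawler|grub\-client" bad_bot
SetEnvIfNoCase User-Agent "Harvest|hloader|httplib|HTTrack|humanlinks|HybridBot|ia_archiver|ieautodiscovery|Incutio|InfoNaviRobot|InternetSeer|IstellaBot|Java|Java/1\.|JamesBOT|JennyBot|JS-Kit|k2spider|Kenjin Spider|Keyword Density/0\.9|kmSearchBot" bad_bot
SetEnvIfNoCase User-Agent "larbin|LexiBot|libWeb|libwww|Linguee|LinkExchanger|LinkextractorPro|linko|LinkScan/8\.1a Unix|LinkWalker|lmspider|LNSpiderguy|ltx71|lwp-trivial|magpie|Mata Hari|MaxPointCrawler|MegaIndex|memoryBot|Microsoft URL Control|MIIxpc|Mippin|Missigua Locator|Mister PiX|MJ12bot|MLBot|moget|MSIECrawler|msnbot|msnbot-media" bad_bot
SetEnvIfNoCase User-Agent "NetAnts|NICErsPRO|Niki\-Bot|NjuiceBot|NPBot|Nutch|Offline Explorer|OLEcrawler|Openfind|panscient\.com|PostRank|ProPowerBot/2\.14|ProWebWalker|ptd-crawler|Purebot|PycURL|Python\-urllib|QueryN Metasearch|RepoMonkey|Riddler|RMA" bad_bot
SetEnvIfNoCase User-Agent "Scrapy|SemrushBot|serf|SeznamBot|SISTRIX|SiteBot|sitecheck\.Internetseer\.com|SiteSnagger|Serpstat|Slurp|SnapPreviewBot|Sogou|Soup|SpankBot|spanner|spbot|Spinn3r|SpyFu|suggybot|SurveyBot|suzuran|SWeb|Szukacz/1\.4" bad_bot
SetEnvIfNoCase User-Agent "Teleport|Telesoft|The Intraformant|TheNomad|TightTwatBot|Titan|toCrawl/UrlDispatcher|True_Robot|ttCrawler|turingos|TurnitinBot|UbiCrawler|UnisterBot|Unknown|uptime files|URLy Warning|User-Agent|VCI|Vedma|Voyager" bad_bot
SetEnvIfNoCase User-Agent "WBSearchBot|Web Downloader/6\.9|Web Image Collector|WebAuto|WebBandit|WebCopier|WebEnhancer|WebmasterWorldForumBot|WebReaper|WebSauger|Website Quester|Webster Pro|WebStripper|WebZip|Wget|WordPress|Wotbox|wsr\-agent|WWW\-Collector\-E|Yeti|YottosBot|Zao|Zeus|ZyBORG" bad_bot
# Apache 2.2 syntax; Apache 2.4 accepts it only with mod_access_compat enabled.
Deny from env=bad_bot
# END Bad Bot Blocker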
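`SetEnvIfNoCase` treats its pattern as a case-insensitive regular expression that may match anywhere in the header, so short tokens in the alternation can catch more than intended. One rough way to check a pattern before deploying it is to run it against sample User-Agent strings. The sketch below uses Python's `re` module as a stand-in for Apache's regex engine (an approximation that should hold for simple alternations like these), with a shortened pattern and made-up sample strings; `is_blocked` is a hypothetical helper:

```python
import re

# Shortened excerpt of the alternation; a longer pattern behaves the same way.
BAD_BOT_PATTERN = re.compile(r"AhrefsBot|SemrushBot|MJ12bot|Wget|Java", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the pattern matches anywhere in the User-Agent string."""
    return BAD_BOT_PATTERN.search(user_agent) is not None


samples = [
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "wget/1.21",  # matched despite the lowercase spelling
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
    "ExampleApp/1.0 (JavaScript client)",  # hypothetical agent caught by the broad "Java" token
]
for ua in samples:
    print(is_blocked(ua), ua)
```

Broad tokens such as Java, crawler or WordPress deserve a second look for this reason: they also match any longer name that contains them.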

[Figure: reduction in server load after blocking bots in .htaccess]

Blocking bad robots via robots.txt

# Block the listed bots from the whole site
User-agent: Abonti
User-agent: aggregator
User-agent: AhrefsBot
User-agent: Aport
User-agent: asterias
User-agent: Baiduspider
User-agent: BDCbot
User-agent: bingbot
User-agent: Birubot
User-agent: BLEXBot
User-agent: BUbiNG
User-agent: BuiltBotTough
User-agent: Bullseye
User-agent: BunnySlippers
User-agent: Butterfly
User-agent: CamontSpider
User-agent: CCBot
User-agent: Cegbfeieh
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: coccoc
User-agent: CopyRightCheck
User-agent: cosmos
User-agent: crawler
User-agent: Crescent
User-agent: DeuSu
User-agent: discobot
User-agent: DittoSpyder
User-agent: DnyzBot
User-agent: DomainCrawler
User-agent: DotBot
User-agent: Download Ninja
User-agent: EasouSpider
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: EroCrawler
User-agent: Exabot
User-agent: ExtractorPro
User-agent: Ezooms
User-agent: FairShare
User-agent: Fasterfox
User-agent: FeedBooster
User-agent: Foobot
User-agent: Genieo
User-agent: Gigabot
User-agent: GrapeshotCrawler
User-agent: Harvest
User-agent: hloader
User-agent: httplib
User-agent: HTTrack
User-agent: humanlinks
User-agent: HybridBot
User-agent: ia_archiver
User-agent: ieautodiscovery
User-agent: Incutio
User-agent: InfoNaviRobot
User-agent: InternetSeer
User-agent: IstellaBot
User-agent: Java
User-agent: JamesBOT
User-agent: JennyBot
User-agent: JS-Kit
User-agent: k2spider
User-agent: Kenjin Spider
User-agent: kmSearchBot
User-agent: larbin
User-agent: LexiBot
User-agent: libWeb
User-agent: libwww
User-agent: Linguee
User-agent: LinkExchanger
User-agent: LinkextractorPro
User-agent: linko
User-agent: LinkScan/8.1a Unix
User-agent: LinkWalker
User-agent: lmspider
User-agent: LNSpiderguy
User-agent: ltx71
User-agent: lwp-trivial
User-agent: magpie
User-agent: Mata Hari
User-agent: MaxPointCrawler
User-agent: MegaIndex
User-agent: MegaIndex.ru
User-agent: memoryBot
User-agent: Microsoft URL Control
User-agent: MIIxpc
User-agent: Mippin
User-agent: Missigua Locator
User-agent: Mister PiX
User-agent: MJ12bot
User-agent: MLBot
User-agent: moget
User-agent: MSIECrawler
User-agent: msnbot
User-agent: msnbot-media
User-agent: NetAnts
User-agent: NICErsPRO
User-agent: NjuiceBot
User-agent: NPBot
User-agent: Nutch
User-agent: Offline Explorer
User-agent: OLEcrawler
User-agent: Openfind
User-agent: PostRank
User-agent: ProWebWalker
User-agent: ptd-crawler
User-agent: Purebot
User-agent: QueryN Metasearch
User-agent: RepoMonkey
User-agent: Riddler
User-agent: RMA
User-agent: Scrapy
User-agent: SemrushBot
User-agent: serf
User-agent: SeznamBot
User-agent: SISTRIX
User-agent: SiteBot
User-agent: SiteSnagger
User-agent: Serpstat
User-agent: Slurp
User-agent: SnapPreviewBot
User-agent: Sogou
User-agent: Soup
User-agent: SpankBot
User-agent: spanner
User-agent: spbot
User-agent: Spinn3r
User-agent: SpyFu
User-agent: suggybot
User-agent: SurveyBot
User-agent: suzuran
User-agent: SWeb
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: Telesoft
User-agent: The Intraformant
User-agent: TheNomad
User-agent: TightTwatBot
User-agent: Titan
User-agent: toCrawl/UrlDispatcher
User-agent: True_Robot
User-agent: ttCrawler
User-agent: turingos
User-agent: TurnitinBot
User-agent: UbiCrawler
User-agent: UnisterBot
User-agent: Unknown
User-agent: uptime files
User-agent: URLy Warning
User-agent: User-Agent
User-agent: VCI
User-agent: Vedma
User-agent: Voyager
User-agent: WBSearchBot
User-agent: Web Image Collector
User-agent: WebAuto
User-agent: WebBandit
User-agent: WebCopier
User-agent: WebEnhancer
User-agent: WebmasterWorldForumBot
User-agent: WebReaper
User-agent: WebSauger
User-agent: Website Quester
User-agent: Webster Pro
User-agent: WebStripper
User-agent: WebZip
User-agent: Wget
User-agent: WordPress
User-agent: Wotbox
User-agent: Yeti
User-agent: YottosBot
User-agent: Zao
User-agent: Zeus
User-agent: ZyBORG
User-agent: ahrefs
User-agent: qwantify
User-agent: qwant
User-agent: semrush
User-agent: Detectify
User-agent: LinkpadBot
User-agent: FlipboardProxy
User-agent: aiHitBot
User-agent: trovitBot
Disallow: /
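Rules like these can be sanity-checked offline before uploading the file. The sketch below feeds a short excerpt to Python's standard `urllib.robotparser`; the excerpt and the `example.com` URL are illustrative. Keep in mind that robots.txt is only a request: a crawler that ignores the file is not stopped by it, which is why the .htaccess method is the recommended one.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt: two bots sharing one rule, and one bot in its own group.
robots_txt = """\
User-agent: AhrefsBot
User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Listed bots are denied everywhere; agents that are not listed stay allowed.
ahrefs_allowed = parser.can_fetch("AhrefsBot", "https://example.com/page")
mj12_allowed = parser.can_fetch("MJ12bot", "https://example.com/page")
google_allowed = parser.can_fetch("Googlebot", "https://example.com/page")
print(ahrefs_allowed, mj12_allowed, google_allowed)
```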

Usage example