- This topic has 8 replies, 4 voices, and was last updated 2 years, 1 month ago by Leo.
March 2, 2022 at 8:58 pm #2140256 Clare
Hi there.
I’ve been contacted by my hosting service provider regarding my site’s ‘high resource usage’. My site is pretty small. I thought it was related to my images, so I changed them all, but apparently the issue persists. My contact has come back and thinks it could be related to a Facebook bot (I have inserted code on my site). He provided a snippet of data which showed that it visited my site 25 times in one day, even though I don’t have any ads currently running. He has provided the following code that he thinks might help:
– Add rules to .htaccess to block common bad bots:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(Adsbot|Ahrefs|MJ12bot|Seznam|Baiduspider|Yandex|SemrushBot|DotBot|spbot).*$ [NC]
RewriteRule .* - [F,L]
– Set up a crawl delay in robots.txt to inform bots that read it to rate limit requests. Create a robots.txt file with:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 30
I have a few questions:
1. Any idea if this would help?
2. Where would I put this code?
3. Are there any security or privacy issues with the bot?
4. Should I just remove the code when no ads are running? If I did, would this resolve the issue, or are there other bots that might visit uninvited?
I understand this question doesn’t really relate to theme issues, but wondered if you knowledgeable peeps could help?
Thanks
Clare
March 2, 2022 at 9:35 pm #2140286 Elvin (Staff, Customer Support)
Hi Clare,
For Q1:
It will, if you add the correct user agent. I’m not sure what user agent Facebook uses, though, so I can’t verify whether the exact code you provided is correct.
For Q2:
.htaccess and robots.txt can be opened if you have FTP access. FTP access simply means opening your site’s files and folders. .htaccess and robots.txt are usually in the root folder of your server. If you’re having trouble finding them, you may have to ask your hosting provider for support.
For Q3:
Potentially, yes. Bots gather information about your site. This is actually a big issue with social media sites because they gather information about you and sell it to big companies.
For Q4:
I’m not sure how you can programmatically determine if an ad is running from its script. The best way to go is to set up your robots.txt (it’s currently empty) and .htaccess to at least curb what they can do.
March 2, 2022 at 10:29 pm #2140321 Clare
Thanks Elvin. Do you know what the code would be to block bots entirely? Thanks
March 2, 2022 at 10:57 pm #2140334 Elvin (Staff, Customer Support)
“Thanks Elvin. Do you know what the code would be to block bots entirely? Thanks”
It’s not within my expertise, unfortunately. While I understand what has to be done, the specifics of the procedure elude me.
I’m not sure if facebookexternalhit is the user agent to be used, but if it is, you can use these articles as a basis for its usage:
https://kinsta.com/blog/wordpress-robots-txt/
https://yoast.com/ultimate-guide-robots-txt/
March 3, 2022 at 1:22 am #2140419 Longinos
Hi
Here are my 5 cents on this question.
1.- Facebook is not excluded by the code you provided; you need to add Facebook to the pattern:
^.*(Adsbot|Ahrefs|MJ12bot|Seznam|Baiduspider|Yandex|SemrushBot|DotBot|spbot|Facebook).*$
That said, if you share a post on Facebook, the Facebook bot MUST be able to see your site to extract the info it needs for the shared post. The Facebook bot (or Twitter bot) doesn’t crawl your site; it only fetches the shared page to extract info for the post (image, publisher, date…). If you block it, your shared post will appear without any info, only a URL.
2.- Crawl-delay, as far as I know, is not a standard, so some bots ignore it. Googlebot, for example, does not obey it; some other crawlers, such as Bing’s, do.
3.- Yes, you can block all bots (well, not all, but most of them) using:
^.*((B|b)ot|(C|c)rawl).*$
But, again, this blocks ALL bot types, including the Google and Bing bots, so your site will be delisted from these search engines.
4.- Having, say, 200 requests/day from bots is like having 200 requests/day from users. If your site has high resource usage with this number of visits per day then there are, I think, other problems: a misconfiguration of some type (mostly in the database), or your hosting plan must be upgraded to support more users.
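Before putting either pattern into .htaccess, it can help to see which visitors it would actually catch. Below is a minimal Python sketch, not a definitive test: the three patterns are copied from this thread, while the sample User-Agent strings and the helper names is_blocked and report are assumptions made for illustration. Your own server logs are the real source of truth.

```python
import re

# Patterns copied from the thread.
HOST_PATTERN = r"^.*(Adsbot|Ahrefs|MJ12bot|Seznam|Baiduspider|Yandex|SemrushBot|DotBot|spbot).*$"
WITH_FACEBOOK = r"^.*(Adsbot|Ahrefs|MJ12bot|Seznam|Baiduspider|Yandex|SemrushBot|DotBot|spbot|Facebook).*$"
BLOCK_ALL = r"^.*((B|b)ot|(C|c)rawl).*$"

# Sample User-Agent strings (assumptions for illustration; check your own logs).
SAMPLE_AGENTS = {
    "facebook": "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
    "google": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "ahrefs": "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0",
}


def is_blocked(pattern: str, user_agent: str, ignore_case: bool = True) -> bool:
    """Return True if the pattern matches the User-Agent string.

    ignore_case=True imitates the [NC] flag on the RewriteCond line.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, user_agent, flags) is not None


def report(pattern: str) -> dict:
    """Map each sample agent name to whether the pattern would block it."""
    return {name: is_blocked(pattern, ua) for name, ua in SAMPLE_AGENTS.items()}


if __name__ == "__main__":
    for label, pattern in [
        ("host's list", HOST_PATTERN),
        ("with Facebook", WITH_FACEBOOK),
        ("block all", BLOCK_ALL),
    ]:
        print(label, report(pattern))
```

With these sample strings, the host's original list does not touch the Facebook fetcher, and adding Facebook to the list does. The "block all" pattern catches Googlebot but still misses facebookexternalhit, because that string contains neither "bot" nor "crawl".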
March 3, 2022 at 1:35 am #2140435 Clare
Thanks Elvin. I’ll look into it. Thanks so much for your input.
March 3, 2022 at 1:46 am #2140448 Elvin (Staff, Customer Support)
You can try @Longinos’ suggestion for #1 (it’s for .htaccess).
And really consider what he mentioned in #4. (This is related to what he mentions in #2. I’m not sure whether Facebook’s crawler obeys Crawl-delay, but it’s most likely not the case.)
You may really have to upgrade your plan, if only to get some sort of smart/intelligent caching for requests from bots.
No problem. 😀
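On the Crawl-delay point, here is a small hedged sketch of how the robots.txt rules quoted in the first post read to a parser, using Python's standard urllib.robotparser module. The example.com URLs and the choice of facebookexternalhit as the agent name are assumptions for illustration. Note that this only shows what the file declares; whether a given bot honours it is entirely up to that bot.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content suggested by the hosting provider in the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 30
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Delay (in seconds) that a well-behaved bot is asked to wait between requests.
delay = rules.crawl_delay("facebookexternalhit")

# example.com is a placeholder domain (assumption), not the site from the thread.
admin_allowed = rules.can_fetch("facebookexternalhit", "https://example.com/wp-admin/")
home_allowed = rules.can_fetch("facebookexternalhit", "https://example.com/")

if __name__ == "__main__":
    print("crawl delay:", delay)
    print("may fetch /wp-admin/:", admin_allowed)
    print("may fetch /:", home_allowed)
```

Because the rules sit under User-agent: *, they apply to any bot name you pass in. A bot that ignores robots.txt will not be slowed down by this file at all, which is why the .htaccess rule and the hosting plan are still worth looking at.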
March 6, 2022 at 1:35 pm #2144639 Clare
Thanks Longinos and Elvin. From what you’ve both said, it sounds like there might be more to it than adding a bit of code. I think I need a web developer, as this is beyond my skill set and I just don’t have the time to investigate what might or might not work. Do either of you know where I can find a list of WordPress developers? Thanks so much, Clare.
March 6, 2022 at 1:59 pm #2144653 Leo (Staff, Customer Support)
Please see the Custom WordPress Development suggestion at the bottom of this page:
https://generatepress.com/what-support-includes/