[Support request] Facebook bot issues

    #2140256
    Clare

    Hi there.

    I’ve been contacted by my hosting provider regarding my site’s ‘high resource usage’. My site is pretty small. I thought it was related to my images, so I changed them all, but apparently the issue persists. My contact has come back and thinks it could be related to a Facebook bot (I have inserted code on my site). He provided a snippet of data showing that it visited my site 25 times in one day, even though I don’t have any ads currently running. He has provided the following code that he thinks might help:

    – Add rules to .htaccess to block common bad bots:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^.*(Adsbot|Ahrefs|MJ12bot|Seznam|Baiduspider|Yandex|SemrushBot|DotBot|spbot).*$ [NC]
    RewriteRule .* - [F,L]

    – Set up a crawl delay in robots.txt to ask bots that read it to rate-limit their requests. Create a robots.txt file with:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Crawl-delay: 30

    I have a few questions:
    1. Any idea if this would help?
    2. Where would I put this code?
    3. Are there any security or privacy issues with the bot?
    4. Should I just remove the code when no ads are running and if I did this would this resolve the issue or are there other bots that might visit uninvited?

    I understand this question doesn’t really relate to theme issues, but wondered if you knowledgeable peeps could help?

    Thanks
    Clare

    #2140286
    Elvin
    Staff
    Customer Support

    Hi Clare,

    For Q1:

    It will if you add the correct userAgent. I’m not sure what UA FB uses though so I can’t verify if the exact code you provided is correct.

    For Q2:
    .htaccess and robots.txt can be opened if you have FTP access, which simply lets you open your site’s files and folders.

    .htaccess and robots.txt are usually in the root folder of your site. If you’re having trouble finding them, you may have to ask your host for support.

    For Q3:
    Potentially, yes. Bots gather information about your site. This is actually a big issue w/ social media sites because they gather information about you and sell it to big companies.

    For Q4:
    I’m not sure how you can programmatically determine if an ad is running from its script. The best way to go is by setting up your robots.txt (it’s empty) and .htaccess to at least curb what they can do.

    A wise man once said:
    "Have you cleared your cache?"

    #2140321
    Clare

    Thanks Elvin. Do you know what the code would be to block bots entirely? Thanks

    #2140334
    Elvin
    Staff
    Customer Support

    Thanks Elvin. Do you know what the code would be to block bots entirely? Thanks

    It’s not within my expertise, unfortunately. While I understand what has to be done, the specifics of the procedure elude me.

    I’m not sure if facebookexternalhit is the user agent to use, but if it is, you can use these articles as a basis for its usage: https://kinsta.com/blog/wordpress-robots-txt/
    https://yoast.com/ultimate-guide-robots-txt/
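
    To see what a robots.txt file actually asks of a given user agent, it can be parsed locally. The sketch below is only an illustration: it feeds the robots.txt from the first post into Python's standard-library parser and queries it for facebookexternalhit, which is assumed (not confirmed in this thread) to be the user agent in question. It shows what the file requests, not whether any bot honors it.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content as suggested by the hosting provider in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 30
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Assumed user agent name; it has no group of its own, so the "*" group applies.
AGENT = "facebookexternalhit"

print(rules.can_fetch(AGENT, "/"))           # is the home page allowed?
print(rules.can_fetch(AGENT, "/wp-admin/"))  # is the admin area allowed?
print(rules.crawl_delay(AGENT))              # requested delay in seconds
```

    Note that Python's parser applies the first matching rule in file order, so it may not evaluate the Allow line for admin-ajax.php the way the major crawlers would.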


    #2140419
    Longinos

    Hi,
    Here are my 5 cents on this question.

    1.- Facebook is not excluded by the code you provided; you need to add Facebook to the chain:
    ^.*(Adsbot|Ahrefs|MJ12bot|Seznam|Baiduspider|Yandex|SemrushBot|DotBot|spbot|Facebook).*$
    That said, if you share a post on FB, the FB bot MUST visit your site to extract the info it needs for the shared post. The Facebook bot (or Twitter bot) doesn't crawl your site; it only fetches the page to extract info for your shared post (image, publisher, date...). If you block it, your shared post will show without any info, only a URL.

    2.- Crawl-delay, as far as I know, is not a standard, so some bots ignore it. Googlebot, for one, does not obey it, while some others, such as Bingbot, do.

    3.- Yes, you can block all bots (well, not all, but most of them) using:
    ^.*((B|b)ot|(C|c)rawl).*$
    But, again, this blocks ALL bot types, including the Google and Bing bots, so your site will be delisted from those search engines.

    4.- Having, say, 200 requests/day from bots is like having 200 requests/day from users. If your site has high resource usage with this number of visits/day, then there are, I think, other problems: a misconfiguration of some type (mostly the database), or your hosting plan needs to be upgraded to support more users.
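
    Before putting any of these patterns into .htaccess, they can be tried against a few User-Agent strings. This is a rough sketch, not a substitute for testing on the server: it uses Python's re module with re.IGNORECASE standing in for Apache's [NC] flag, and the sample User-Agent strings are assumed typical values, so check your own logs for the real ones.

```python
import re

# Patterns from this thread. Apache's [NC] flag is emulated with re.IGNORECASE.
HOST_LIST = "Adsbot|Ahrefs|MJ12bot|Seznam|Baiduspider|Yandex|SemrushBot|DotBot|spbot"
HOST_PATTERN = re.compile(rf"^.*({HOST_LIST}).*$", re.IGNORECASE)
WITH_FACEBOOK = re.compile(rf"^.*({HOST_LIST}|Facebook).*$", re.IGNORECASE)
BROAD_PATTERN = re.compile(r"^.*((B|b)ot|(C|c)rawl).*$")  # item 3, as written

# Sample User-Agent strings (assumed typical values, not taken from the host's data).
SAMPLES = {
    "facebook": "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "browser": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
}


def is_blocked(pattern: re.Pattern, user_agent: str) -> bool:
    """Return True if the rewrite condition would match this User-Agent."""
    return pattern.search(user_agent) is not None


for name, agent in SAMPLES.items():
    print(
        name,
        "host list:", is_blocked(HOST_PATTERN, agent),
        "with Facebook:", is_blocked(WITH_FACEBOOK, agent),
        "broad:", is_blocked(BROAD_PATTERN, agent),
    )
```

    With these sample strings, the host's list does not match the Facebook fetcher until "Facebook" is added, and the broad pattern catches Googlebot. The broad pattern does not match the sample facebookexternalhit string, which contains neither "bot" nor "crawl".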

    #2140435
    Clare

    Thanks Elvin. I’ll look into it. Thanks so much for your input.

    #2140448
    Elvin
    Staff
    Customer Support

    You can try @Longinos’ suggestion for #1 (it’s for .htaccess).

    And really consider what he mentioned in #4. (This ties in with what he said in #2: I’m not sure whether Facebook’s bot obeys Crawl-delay, but most likely it doesn’t.)
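
    One way to check the point in #4 is to count requests per user agent in the server's access log. A minimal sketch, assuming the log uses Apache's "combined" format (User-Agent as the last quoted field); the log lines below are made up for illustration and are not the data the host sent.

```python
import re
from collections import Counter

# Assumes the "combined" log format, where the User-Agent is the last quoted field.
AGENT_AT_END = re.compile(r'"([^"]*)"\s*$')

# Made-up sample lines for illustration only (documentation IP range).
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2022:06:25:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "facebookexternalhit/1.1"',
    '203.0.113.5 - - [10/Mar/2022:07:25:03 +0000] "GET /about/ HTTP/1.1" 200 4210 "-" "facebookexternalhit/1.1"',
    '203.0.113.9 - - [10/Mar/2022:08:02:44 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '203.0.113.20 - - [10/Mar/2022:09:15:10 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]


def count_user_agents(lines):
    """Count requests per User-Agent string, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = AGENT_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
    print(hits, agent)
```

    If the totals are in the tens or low hundreds per day, the bots are probably not the real cause of the resource usage.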

    You may really have to upgrade your plan, maybe to one that at least has some sort of smart caching for requests from bots.

    No problem. 😀


    #2144639
    Clare

    Thanks Longinos and Elvin. From what you’ve both said, it sounds like there might be more to it than adding a bit of code. I think I need a web developer, as this is beyond my skill set and I just don’t have the time to investigate what might or might not work. Do either of you know where I can find a list of WordPress developers? Thanks so much, Clare.

    #2144653
    Leo
    Staff
    Customer Support

    Please see the Custom WordPress Development suggestion at the bottom of this page:
    https://generatepress.com/what-support-includes/
