Site and robots
Your Salesforce Site is for Customers, not for bots!
You have created a pretty Salesforce site to manage your business, and you want it to be cost effective. The platform enforces limits on CPU time, bandwidth, and pages viewed per day; if you reach those limits, you will probably have to pay to raise them (for instance, by moving from an Enterprise Edition to an Unlimited Edition).
That resource consumption should be useful, which means targeted at your expected visitors. Did you know that most visits are not real human visits?
The web is not a beautiful place populated only by friendly people. A large part of it is driven by machines. These "bots" (robots) download pages for good or bad reasons, and every page they fetch from your site eats into your available resources. The real issue is the ratio between humans and bots: on a website with no protection in place, you will get more traffic from bots than from humans.
How do you optimize resources? The first step is to prevent crawling by bad bots. Of course, you will have to decide what counts as "good" and "bad". For instance, Google, Bing, Baidu and a few others crawl the web so that you appear in search results; don't block them, as they bring you real visitors. On the other hand, some bots crawl the web to collect content that will be resold: they consume your resources and you get nothing in return, so stop them. There are even bots that harvest email addresses from your pages, or probe for security weaknesses (such as a form that is not protected by a captcha). You absolutely need to block those.
The quick win is that Salesforce provides a simple, standard way to tell bots they are not welcome: a file called "robots.txt" (the file is shared by all your Salesforce sites). You just have to define a list of user agents and the access rules that apply to them. Keep in mind that the really bad bots don't read this file, so it won't block them.
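To make the syntax concrete, here is a minimal sketch (Googlebot is Google's real crawler; the scraper name is made up for illustration). Each group names one or more user agents, followed by the paths they may not fetch; an empty Disallow value means nothing is off limits:

# Let Google's crawler index everything
User-agent: Googlebot
Disallow:

# Block a hypothetical content scraper from the whole site
User-agent: SomeScraperBot
Disallow: /

# Every other bot: full access
User-agent: *
Disallow: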
By default, Salesforce blocks all bots on non-production orgs (Developer Edition, etc.). For your production org, you absolutely need to define a robots.txt. The syntax is quite simple, but the content is not easy to define: how do you know which robots to list? The following content is a Visualforce page that you add to your org and then reference in your Salesforce site configuration, and voila! Taking five minutes to do this can save you a lot of money.
<apex:page contentType="text/plain" showHeader="false">
User-agent: 008
User-agent: AhrefsBot
User-agent: aipbot
User-agent: Alexibot
User-agent: AlvinetSpider
User-agent: Amfibibot
User-agent: Antenne Hatena
User-agent: antibot
User-agent: ApocalXExplorerBot
User-agent: asterias
User-agent: BackDoorBot/1.0
User-agent: BecomeBot
User-agent: Biglotron
User-agent: BizInformation
User-agent: Black Hole
User-agent: BLEXBot
User-agent: BlowFish/1.0
User-agent: BotALot
User-agent: BruinBot
User-agent: BuiltBotTough
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: CatchBot
User-agent: ccubee
User-agent: ccubee/3.5
User-agent: Cegbfeieh
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0
User-agent: CherryPickerSE/1.0
User-agent: Combine
User-agent: ConveraCrawler
User-agent: ConveraMultiMediaCrawler
User-agent: CoolBot
User-agent: CopyRightCheck
User-agent: cosmos
User-agent: Crescent
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: DimensioNet
User-agent: discobot
User-agent: DISCo Pump 3.1
User-agent: DittoSpyder
User-agent: dotbot
User-agent: Drecombot
User-agent: DTAAgent
User-agent: e-SocietyRobot
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: envolk
User-agent: EroCrawler
User-agent: EverbeeCrawler
User-agent: ExtractorPro
User-agent: Flamingo_SearchEngine
User-agent: Foobot
User-agent: FDSE
User-agent: g2Crawler
User-agent: genieBot
User-agent: gsa-crawler
User-agent: Harvest/1.5
User-agent: hloader
User-agent: HooWWWer
User-agent: httplib
User-agent: HTTrack
User-agent: HTTrack 3.0
User-agent: humanlinks
User-agent: Igentia
User-agent: InfoNaviRobot
User-agent: Ipselonbot
User-agent: IRLbot
User-agent: JennyBot
User-agent: JikeSpider
User-agent: Jyxobot
User-agent: KavamRingCrawler
User-agent: Kenjin Spider
User-agent: larbin
User-agent: LexiBot
User-agent: libWeb/clsHTTP
User-agent: LinkextractorPro
User-agent: LinkScan/8.1a Unix
User-agent: linksmanager
User-agent: LinkWalker
User-agent: lmspider
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: Mata Hari
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: minibot(NaverRobot)/1.0
User-agent: Mister PiX
User-agent: MJ12bot
User-agent: MLBot
User-agent: moget
User-agent: moget/2.1
User-agent: MS Search 4.0 Robot
User-agent: MS Search 5.0 Robot
User-agent: MSIECrawler
User-agent: MyFamilyBot
User-agent: Naverbot
User-agent: NetAnts
User-agent: NetAttache
User-agent: NetMechanic
User-agent: NetResearchServer
User-agent: NextGenSearchBot
User-agent: NICErsPRO
User-agent: noxtrumbot
User-agent: NPBot
User-agent: Nutch
User-agent: NutchCVS
User-agent: Offline Explorer
User-agent: OmniExplorer_Bot
User-agent: Openfind
User-agent: OpenindexSpider
User-agent: OpenIntelligenceData
User-agent: PhpDig
User-agent: pompos
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: psbot
User-agent: QuepasaCreep
User-agent: QueryN Metasearch
User-agent: Radian6
User-agent: R6_FeedFetcher
User-agent: R6_CommentReader
User-agent: RepoMonkey
User-agent: RMA
User-agent: RufusBot
User-agent: SBIder
User-agent: schibstedsokbot
User-agent: ScSpider
User-agent: SearchmetricsBot
User-agent: semanticdiscovery
User-agent: SemrushBot
User-agent: Shim-Crawler
User-agent: ShopWiki
User-agent: SightupBot
User-agent: silk
User-agent: sistrix
User-agent: sitebot
User-agent: SiteSnagger
User-agent: SiteSucker
User-agent: Slurp
User-agent: Sogou web spider
User-agent: sosospider
User-agent: SpankBot
User-agent: spanner
User-agent: Speedy
User-agent: Sproose
User-agent: Steeler
User-agent: suggybot
User-agent: SuperBot
User-agent: SuperBot/2.6
User-agent: suzuran
User-agent: Szukacz/1.4
User-agent: Tarantula
User-agent: Teleport
User-agent: Telesoft
User-agent: The Intraformant
User-agent: TheNomad
User-agent: Theophrastus
User-agent: TightTwatBot
User-agent: Titan
User-agent: toCrawl/UrlDispatcher
User-agent: TosCrawler
User-agent: TridentSpider
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: turingos
User-agent: turnitinbot
User-agent: twiceler
User-agent: Ultraseek
User-agent: UrlPouls
User-agent: URLy Warning
User-agent: Vagabondo
User-agent: VCI
User-agent: Verticrawlbot
User-agent: voyager
User-agent: voyager/1.0
User-agent: Web Image Collector
User-agent: WebAuto
User-agent: WebBandit
User-agent: WebBandit/3.50
User-agent: WebCopier
User-agent: webcopy
User-agent: WebEnhancer
User-agent: WebIndexer
User-agent: WebmasterWorldForumBot
User-agent: webmirror
User-agent: WebReaper
User-agent: WebSauger
User-agent: website extractor
User-agent: Website Quester
User-agent: Webster Pro
User-agent: WebStripper
User-agent: WebStripper/2.02
User-agent: WebZip
User-agent: Wget
User-agent: WikioFeedBot
User-agent: WinHTTrack
User-agent: WWW-Collector-E
User-agent: Xenu Link Sleuth/1.3.8
User-agent: xirq
User-agent: yacy
User-agent: YRSPider
User-agent: ZeBot
User-agent: ZeBot_www.ze.bz
User-agent: Zeus
User-agent: Zookabot
Disallow: /
Sitemap: https://www.adminbooster.com/sitemap.xml
</apex:page>
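A couple of usage notes: the Sitemap URL above points to this blog's own sitemap, so replace it with your site's sitemap (or drop the line if you don't have one). Once the Visualforce page is saved and referenced in your site configuration, you can check the result by requesting /robots.txt at the root of your site's domain in a browser: per the robots exclusion standard, that is the only place crawlers will look for it, and it should return the plain-text directives above.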