
Open Source: Configuring Apache – Don’t Succumb To The “Slashdot Effect”


Like many techno-geeks, I host my LAMP website on a cheap ($150) computer over my residential broadband connection. I have often wondered what would happen if my site were linked on Slashdot or Digg. Specifically, would my setup survive the “Slashdot Effect?” Even a Pentium at 100 MHz can saturate a T1’s worth of bandwidth, and my upload speed is capped (supposedly) at 384 kbps (roughly 48 KB/s), so the server itself should easily be able to handle that. My bandwidth will be saturated before the server is incapacitated; at least, that’s the idea.

The machine I use for my web server is a $150 PC that I bought from Fry’s one day (I buy their $150 PCs whenever they’re in stock). Here are the relevant specs on my little server:

CPU: AMD Athlon 2600+
RAM: 512MB
Hard Drive: 40GB 7200RPM
Software: Debian Linux, MySQL, Apache, PHP, WordPress

There is additional software installed on this machine because it is also used as a desktop computer. However, none of that software is important for the purposes of this article.

The RAM has been upgraded since I purchased the machine from Fry’s because it originally came with 128MB, which is a little low for my tastes. The only other upgrade was a new CPU fan, and that was out of personal preference; the default fan was just too loud.

Below are some directives from my httpd.conf and some general recommendations that I think are vital to helping you survive a good Slashdotting on low-budget hardware (a consolidated snippet follows the list).

  • MaxKeepAliveRequests 0
    The KeepAlive directive in httpd.conf allows persistent connections to the web server, so that a new connection does not have to be initiated for each request. Setting MaxKeepAliveRequests to 0 allows an unlimited number of requests per connection, which makes sense if you think about it: why allow persistent connections but then cut them off after an arbitrary number of requests?
  • KeepAliveTimeout 15
    Because persistent connections are allowed, it is important that they not be kept open indefinitely. This directive closes a connection after 15 seconds of inactivity.
  • MinSpareServers 15
    This is the minimum number of spare server processes you want running at any given time. That way, when multiple simultaneous requests arrive, there are already child processes running to handle them. Setting this number too high wastes system resources; setting it too low leaves the system slow to absorb bursts.
  • MaxSpareServers 65
    Same idea as above, but this is the maximum number of idle child processes kept around at any given time; spares beyond this are killed off.
  • StartServers 15
    This is the number of server processes Apache starts initially. As more servers are spawned to handle requests, a minimum of 15 spare servers will be maintained, up to the maximum of 65.
  • MaxClients 500
    This is the maximum number of simultaneous clients that can connect to the server at any given time. Setting this number too low will lock users out of the server under normal traffic; setting it too high will let the server become so overloaded that all the requests time out anyway. I think 500 is about right for most people’s needs (though see the comments below for important caveats about RAM).
  • MaxRequestsPerChild 100000
    Sets the maximum number of requests each child process will handle before it is replaced. This mostly guards against memory leaks and similar mishaps, but it is important nonetheless. Setting it too low causes child processes to be recycled for no real reason, slowing the site down. It could be set to 0 (unlimited), but that would negate the protection against real problems like memory leaks.
  • HostnameLookups Off
    This prevents Apache from doing a reverse DNS lookup on every visitor’s IP address. I’m pretty sure it’s off by default, but if it’s on in your httpd.conf, I recommend turning it off.
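For convenience, here is the whole tuning block in one place. Treat it as a sketch rather than a drop-in file: directive availability and defaults vary between Apache 1.3 and 2.x builds, and as a commenter notes below, KeepAlive ships Off on some builds, so it is set explicitly here.

# Persistent connections: unlimited requests per connection,
# but close any connection idle for more than 15 seconds.
KeepAlive On
MaxKeepAliveRequests 0
KeepAliveTimeout 15

# Process pool (prefork MPM): start 15 children, keep between
# 15 and 65 idle spares, and allow up to 500 simultaneous clients.
StartServers 15
MinSpareServers 15
MaxSpareServers 65
MaxClients 500

# Recycle each child after 100,000 requests to contain slow memory leaks.
MaxRequestsPerChild 100000

# Skip the reverse DNS lookup that would otherwise happen on every request.
HostnameLookups Off

One caveat from the comments: on many Apache 2 builds, raising MaxClients past 256 also requires a matching ServerLimit directive (or, on older versions, a recompile), so check your error log after restarting.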

Those are the main directives in my httpd.conf that I pay attention to for traffic handling purposes. Besides tweaking Apache, I also do other things to help ensure that my site doesn’t get too overwhelmed.

I minimize graphics on my site and use CSS instead (where I can). This is pretty easy with WordPress, depending on which theme you use. I stay away from themes with a lot of images, and I tend not to put any in my posts either; they’re just too much of a drain on bandwidth, especially if you have a lot of traffic. On top of all that, I don’t really like seeing graphics when I go to other sites. Most of the time they just get in the way of the information.

As for static pages, they aren’t much of an option for me. Everything in WordPress is pulled dynamically from the database unless certain plug-ins are installed, and since my upload speed is the main bottleneck in my setup, static pages aren’t really a factor. However, if you have a faster upload speed, then a cache of static pages would speed things up for you.
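I haven’t needed this myself, but if you want Apache itself to serve cached copies of rendered pages, a rough sketch (assuming an Apache 2.x build with mod_cache and mod_disk_cache loaded; the cache path here is just an illustration) looks like this:

# Cache rendered pages on disk so repeat hits skip PHP and MySQL entirely.
CacheRoot /var/cache/apache2/disk_cache
CacheEnable disk /
# Serve a cached copy for at most an hour before regenerating it.
CacheMaxExpire 3600

The WordPress plug-ins I alluded to accomplish much the same thing at the application layer, writing rendered pages out as static files.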

Another thing that will help with bandwidth if you submit a link to one of the larger sites (Digg, Slashdot, etc.) is to use CoralCDN. CoralCDN is essentially a caching/proxy service that reduces the drain on your bandwidth. All you have to do to use it is append “.nyud.net:8090” (without the quotes) to the hostname of any link you submit. All requests for that link will then be automatically routed through CoralCDN.
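For example, a link to a (hypothetical) page on your site would be submitted like this:

http://example.com/article.html               (direct: every hit eats your upload)
http://example.com.nyud.net:8090/article.html (the same page through Coral's cache)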

Those are just a few things you can do to help keep your server from being killed by Slashdot or Digg. The Apache configuration changes are important, but so is having a simple site that is light on graphics and other bandwidth-intensive content. I’m sure there are many other things that can be done, and I don’t claim to be an expert in this field (hence the general recommendations). So far, these measures have helped my site stay up under some heavy traffic, but I have yet to be Slashdotted (thankfully?). If anyone has recommendations on additional precautions I can take, I’m more than happy to hear them. If this gets onto Digg, Slashdot, etc., it will be a good test of everything I’ve mentioned. We’ll have to wait and see.

Originally posted on PoliticalApathy.com



  • http://shawnwilsher.com Shawn Wilsher

    For Coral, you need to append '.nyud.net:8090', as opposed to what you said.

    Source

  • http://adrake.blogdns.com Adam Drake

    Sorry about the typo, you’re correct, it will be changed ASAP.

  • http://www.saiyine.com Saiyine

    Can’t believe that you have an upload capped at 384, a 2800+ server and don’t even mention mod_gzip.

  • http://uadmin.blogspot.com James Dickens

    Sorry to break this to you like this, but preparing for a slashdotting is more about content than CPU power, especially if you have a slow connection. If you want to survive a /.’ing, remove all large pictures. Even the smallest boxes can survive if they are only serving text. Take your standard web page: one or two screens long, complete with all the code overhead, it is about 5K. If you have 384 KB/s of upload, that is enough for nearly 60 users hitting your site per second, and yes, any box faster than a Pentium can fill a 10 Mbit link. The problem with being slashdotted is the graphics: one average-size JPEG at 300×300 can be 50K, and a screenshot can be 150 KB. You can see that once you have 10 people accessing your site, things start to degrade. Of course, 10 people may be fine if they are patient, since each would be getting 3.8 KB/s, but most likely someone will decide this is too slow and hit reload after grabbing 25% of the file, while the server may not notice and keeps the old connection open for a while as another one is made.

    I have a friend who has survived numerous slashdottings; he sees 25,000 hits a day on a 440 MHz machine, and the machine is idle doing it. He even has graphics (not really large ones), and he has a 10 Mbit link to the net.

  • http://www.paulisageek.com Paul Tarjan

    You got dugg talking about how to respond to being dugg. Wonderful!

  • http://adrake.blogdns.com Adam Drake

    Paul:

    I wouldn’t say I got dugg as badly as some do, because the machine is up. However, I did reboot it because I couldn’t get in with SSH, and when I walked over to the keyboard everything was frozen. It’s working fine now even though the traffic is more intense, so I’m not sure what happened. The CPU has been 90% idle, so I think (as I said in my article) that bandwidth will be the problem.

    I’m sure before the end of the day the folks from digg will succeed :)

  • http://www.bitstorm.org/edwin/en/ Edwin Martin

    Your CPU can be 90% idle *and* your server can still be exhausted.

    That’s because the major bottleneck in (web)servers is the hard disk.

    I don’t know of any tool to measure the load on a hard disk. Does anyone?

    Another tip: use static pages as much as possible, then cached pages, and only as a last resort dynamically generated pages. In dynamically generated pages, keep SQL queries as few as possible, and optimize every query you do run. The database is often the slowest part of a web server with dynamic content.

  • frankie

    I can’t be bothered to read your comment policy before I post a comment.

  • http://adrake.blogdns.com Adam Drake

    Edwin:
    I use iostat to get an overview of HDD activity; the output is more than detailed enough for me. I hear there is another utility called watch that is handy alongside it, but I’ve never used it.
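    For example (assuming your distribution ships iostat in the sysstat package):

    # Extended per-device I/O statistics, refreshed every 5 seconds
    iostat -x 5
    # Or re-run a plain report on a loop with watch
    watch -n 5 iostat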

    Frankie:
    If you’re talking about comments on my site, I’m a bit careful with that because of spam. Apologies for any inconvenience it may have caused you.

  • http://www.stockalicious.com theCreator

    Just use lighttpd

  • http://adrake.blogdns.com Adam Drake

    theCreator:
    Trying out lighttpd is my next project. It (evidently) works great with Rails, and it’s possible to run WordPress on it as well (link).

  • NOT an admin

    MaxKeepAliveRequests: because if you don’t limit the number of requests per connection, I can tie up a whole lot of your client slots indefinitely.

  • NOT an admin

    MaxClients 500: that one you have to be careful with. You need to run tests where you actually consume the maximum number of clients, to see whether you start to swap and/or hit an I/O-wait bottleneck before you get there, which you almost certainly will with many types of dynamic pages running 500 clients. For example, if you have PHP pages averaging 8 MB of allocation, 500 clients means you’re using 4 GB just for PHP.

  • http://www.addictz.org Mike

    I set MaxClients to 500 and get an error message about ServerLimit being set too low, at 256, for that MaxClients. I added the line ServerLimit 500, but Apache still isn’t reading it and gives me the same error about ServerLimit being set to 256.

  • http://adrake.blogdns.com Adam Drake

    Mike:
    Did you add a line so that you now have two MaxClients directives? If so, get rid of one of them and just keep the one that says MaxClients 256.

    If that’s not the problem, I don’t know how to help you. The Apache package provided in the default apt-get repository doesn’t have an issue with it. What distribution of Linux are you using?

  • http://adrake.blogdns.com Adam Drake

    err…

    “…and just keep the one that says MaxClients 256.”

    should read

    “…and just keep the one that says MaxClients 512.”

    That’s what I get for reading your post and typing at the same time.

  • http://www.limk.com/english/ Wrzl

    If we are talking about a LAMP site, the bottleneck is the database queries; Apache comes after that. Once you have been slashdotted, make a static version of the page and upload your files (images, videos, maybe CSS and JS) to a free hosting service, or several. That will keep you alive and save your bandwidth costs.

  • http://blog.subverted.net Wade Mealing

    Alternatively, you could set up a reverse proxy. This doesn’t even have to be on your own site.

    I’ve documented it here ( http://blog.subverted.net/?p=351 ); I decided to have something in place before the next time I get slashdotted.

  • http://www.ksl.com/ Devin

    I think some of the advice here should be adjusted:

    1) DO NOT LEAVE KeepAliveTimeout AT 15!!! By taking it down to 2 or 3, you will easily triple the number of requests your server can handle (assuming your processor is fast enough). Most people don’t realize this, but you are slitting your own throat by letting clients tie up valuable Apache processes for 15 seconds even though they are just sitting there. When a slashdotting comes down the wire, you need every available Apache process to be working as much as possible. Don’t believe me? When the slashdotting does come, open up your /server-status page and look at the status of the processes: 95% of them will be “K”, meaning they’re just sitting there, looking stupid and consuming valuable memory.

    2) The previous comment about Apache complaining about MaxClients being above 256 is correct; you have to change a value in the Apache source code to get it above that. Some distributions may or may not do that, so make sure you check your error_log when starting Apache to see whether increasing MaxClients will work.

    3) Don’t put MaxClients at 500 unless you know what you are doing. “I think 500 is about right for most people’s needs” is wrong, and could end up killing your site. Although it is partially dependent on your CPU, the major factor here is typically how much available RAM you have. Comment #13 by Not an Admin was exactly correct: if each Apache process uses 8 MB of memory, then you’ll need multiple gigabytes of free memory. If you don’t have enough memory for all of those processes, you’ll start swapping and be completely hosed. As a very rough guideline, assume that each Apache process will consume anywhere from 2 to 8 MB of memory. Divide that into your *available* RAM, and you’ll get a much better idea of what value to use for MaxClients. (A worked example follows at the end of this comment.)

    4) The article didn’t mention this at all, but you should remove any Apache modules that you don’t need. Every module you remove will slim down Apache at least a little, allowing you to slowly inch up that MaxClients value. This isn’t as valuable as you might think (due to the way Linux will share the module’s non-writable memory between processes), but it will help some.

    5) As mentioned by Saiyine: if you still have extra CPU cycles to spare, use mod_gzip — it will free up a bit of bandwidth, allowing you to service even more requests. Even better would be to pre-compress static pages on disk, but many people don’t have that luxury.

    6) Do NOT, under any condition, perform an SQL query on every hit to your web site. Trust me, you will kill yourself if you have to do even a light SQL query under Slashdotting conditions. Either don’t use a database, or find a way of caching SQL results so you don’t hit the database every single time someone views a page.

    There are many more things that you could do to improve performance, but this should get people thinking along the right lines. The primary lesson to learn is: memory is usually your most valuable resource when dealing with Apache — use it as efficiently as possible.
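    To make the math in point 3 concrete, here is a worked example (the figures are purely illustrative, not measurements from any particular box):

        512 MB total RAM
      - ~150 MB for the OS, MySQL, and filesystem buffers
      = ~360 MB available for Apache
        360 MB / ~6 MB per child  =>  MaxClients of roughly 60

    That is a long way from 500, which is why you measure before you tune.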

  • http://sablog.com/ Shanti

    And if all else fails, next time try Lighttpd.

    =)

    Do a few Google searches for lighttpd vs. Apache benchmarks and you’ll see why.

  • http://www.burek.co.yu Ivan Minic

    Great stuff!

  • http://www.oneunified.net Stargrazer

    There is one other trick, but I can’t recall the attribute. Basically, Linux updates a ‘last accessed’ attribute on a file every time it is accessed; the key is to turn this off. That eliminates a bunch of irrelevant disk writes, leaving only disk reads, which the OS can cache, so you get a sort of double improvement in hard drive utilization.

  • http://www.tranquillo.net/ Carl P. Corliss

    That attribute is ‘noatime’, and it can be added to the options section of your /etc/fstab file. On a different topic but a similar line (filesystem attributes): if you have a separate partition for /tmp, you should set it to nodev,noexec,nosuid as well. Doing so adds a bit more protection against hackers dropping a zombie bot in your /tmp and running it.
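    For example, hypothetical /etc/fstab entries (device names and filesystem types will differ on your box):

    # mount the web root with atime updates off; lock down /tmp
    /dev/hda2  /var/www  ext3  defaults,noatime              0 2
    /dev/hda3  /tmp      ext3  defaults,nodev,noexec,nosuid  0 2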

    Cheers!,


    Carl

  • mikey

    This article is pretty much useless, except for the comments made by Devin.

  • http://www.thebluesmokeband.com Brian Sorrell

    Devin’s comments are very good. I think that Mikey speaks too glibly about the usefulness of this article: even if the httpd.conf information is not particularly helpful, the point about page size is a good one. Keeping things small is absolutely essential; I wish more designers would appreciate this.

    The point about caching (or, more pointedly, about NOT running queries for every request) is importantly correct, in my estimation. Without doing any math here, I suspect an IDE drive spinning at 7200 RPM will not be able to keep up with the I/O that would result from the vast quantity of requests that /. generates.

    Anyway, it’s always valuable to have a discussion about these sorts of things, so thanks for this Adam!

    Cheers,

    Brian.

  • http://www.GearHack.com/ Chieh Cheng

    Forgot to mention that KeepAlive defaults to off:

    #
    # KeepAlive: Whether or not to allow persistent connections (more than
    # one request per connection). Set to “Off” to deactivate.
    #
    KeepAlive Off

    At least it does in Server version: Apache/2.0.40
    Server built: Sep 4 2002 17:20:34

  • MikeTA

    A really useful one I set during a slashdotting is the Apache Expires directive. Set this for directories that contain (mostly) static files like images, so that the browser is told the file expires, say, one week in the future. If your pages have lots of little images (bullet markers, that sort of stuff), this is a big win.
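    For example, with mod_expires loaded (the directory path and lifetime are just illustrations):

    <Directory "/var/www/images">
        ExpiresActive On
        # tell browsers they may reuse anything in here for a week
        ExpiresDefault "access plus 1 week"
    </Directory>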

  • http://www.micheas.net/ Micheas

    Apache 2 and lighttpd seem to benchmark fairly close to each other (as in, both sides can produce reproducible benchmarks in their own favor). There are plenty of reasons to avoid a monoculture, so I would not discourage using lighttpd; it seems to be a solid server. But if you have a well-tuned Apache 2 server, you probably won’t see much difference in performance between them. A well-tuned Zope server can outperform a badly tuned Apache 1.3 server, and Apache 2 can scream if you spend some time tuning and benchmarking it. (Personally, I have a soft spot for servers like dhttpd, fnord, roxen, micro-httpd, lighttpd, mzscheme, bozohttpd, cherokee, mathopd, boa, yaws, aolserver, thttpd, caudium, and zope. But I still use Apache a fair amount; it is easy to set up, secure, and administer, in addition to being well supported by web apps.)

    Tuning your web server, shutting off all unneeded extensions, making the content as static as possible, and caching everything you can will get you a lot of the way toward withstanding heavy traffic. After that, you are going to have to benchmark your changes, figure out where your bottleneck is, and go from there.

  • http://www.teledyn.com mrG

    +1 on that endorsement of Devin’s comments as the sane alternative to the advice in the original article, only I would add one further tip:

    Do not use .htaccess files in your most travelled directories, as this costs Apache extra filesystem lookups for every page and graphic. Turn overrides off and set your config in httpd.conf (or conf.d) instead.
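    Concretely, something like this in httpd.conf (the path is an example):

    # With overrides off, Apache stops checking for .htaccess files
    # on every request under this tree.
    <Directory "/var/www">
        AllowOverride None
    </Directory>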

    Also a caveat: when I added that Coral URL suffix to this post’s address, the form post would no longer work. That’s logical if you think about it (no site should accept form data that’s not from its own URL), but it could really trip you up if you weren’t expecting it.

    Also, I notice the many endorsements of mod_gzip. I thought mod_gzip was dead! I can’t find any currently maintained code, and what I can find is a royal pain to install and doesn’t like Apache 2.

    I use mod_deflate instead: nearly the same function, as easy as a single line of config to enable, and pre-included in most Linux distro bundles.
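    The single line I mean is along these lines (choose MIME types to taste):

    AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css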

    Another noteworthy item from the original article: is it my imagination, or did the author neglect to give us the URL of that Slashdot-proof $150 server? Did I just miss it? …

  • http://justaddwater.dk/ Watson

    I have a server on a fast connection (100 Mbit/s). The disks are very fast SCSI disks in a RAID 5 setup, and the CPU is about 800 MHz. But I only have 256 MB of memory!

    What is the best performance-tuning approach in this scenario, where the disks, internet connection, and CPU are fast but RAM is low?

  • http://adrake.blogdns.com Adam Drake

    Watson:

    Option 1: Use Lighttpd; it has a very low memory footprint and is faster than Apache in many implementations.

    Option 2: If you must use Apache, make sure you load only essential modules, don’t set your MaxClients too high, etc. You can look at some sites that are geared towards that kind of thing, but in the end your best bet is to incorporate the things that work at those sites and see how your memory usage pans out. It will take some tweaking but the payoff will be worth it.

  • Nikesh

    How did this person get qualified to write an article on Apache?

  • mauricev

    There doesn’t seem to be any reference to what the “ideal” worker MPM values should be.

    (These are the defaults)

    StartServers 2
    MaxClients 150
    MinSpareThreads 25
    MaxSpareThreads 75
    ThreadsPerChild 25
    MaxRequestsPerChild 0

  • http://www.anthonycargile.com/ Anthony Cargile

    I was a Slashdot-effect victim recently. My website, thecoffeedesk.com, had a great story and got slashdotted. Here’s how it went with five load-balancing lighttpd web servers behind an Apache 2 front end with an optimized configuration (using the proxy module that does load balancing):

    First, the server load during the peak of the hits (around 12,000 users per second) never rose above 0.1 (according to htop), and the reason is that the 2 Mbps of bandwidth buckled almost instantly.

    Once the bandwidth was hosed, the servers had nothing to worry about, since over half the clients couldn’t even get through. But the ones that did hardly affected the cluster, which, given a very high number of images per page and multiple database hits per page (WordPress), is quite astounding. And the boxes aren’t all that great; their processor speeds and RAM are:
    Server 1 (front end): 2x 800 MHz, 768 MB
    Server 2: 2x 800 MHz, 186 MB
    Server 3: 1.6 GHz, 512 MB
    Server 4: 2.4 GHz, 1 GB
    Server 5: 1.6 GHz, 384 MB (RAMBUS, too)

    So the machines aren’t optimal, but they’re in a perfect layer-4 cluster, and if the bandwidth were higher they could have easily handled the load with very little latency. I disabled mod_gzip (or whatever it is called now) on account of cluster confusion and processing difficulties. One last thing: server 1 is also the MySQL database server, NFS server, and NIS server for the other nodes, so on top of every reverse-proxied request it was also getting hit several more times (this will be fixed in the future). In its defense, it has two NICs, one for internet requests only and one for internal DB/NFS/NIS requests, and again, as the most closely monitored of the systems, its load never topped a 0.1 average, with 50 days of uptime (still understated, because kexec resets the uptime count).

    That’s just my 2 cents. If you visit thecoffeedesk.com you can see the server ID at the bottom, or just append #server to the URL. And in the future, a text-only backup on Coral is overdue.

    Thanks,
    Anthony

  • http://thecoffeedesk.com/news/ Anthony

    Actually, we just got slashdotted again, and this time you could not even notice a speed difference. The biggest thing I did was plug the server directly into the gateway, and bam, no more bandwidth problems on the 2 Mbps up line. Just make sure you have a firewall enabled on the front-end server. That, combined with the specs above, an additional server thrown in, some whitespace trimming on the pages, and a little conversion to static content, made the site perform beautifully at thousands of hits per second. As I said, no difference in speed at all!