Stress Testing: How to Determine if a Web Server Crashed or Hung

What is Stress Testing and Why You Need It

Stress testing allows you to measure your web application’s robustness and reliability beyond normal load. It is typically conducted to ensure the server does not crash under high load as well as to expose crashes or hangs due to concurrency issues, resource exhaustion, deadlocks and other conditions.

Some Expected Failure Conditions Under Stress

The load generated during stress testing is usually so high that it may cause the server to crash or hang:

  • A crash is the result of a critical error or an unhandled exception. The crashed server becomes entirely inaccessible or returns error messages instead of normal content.
  • A hang means the server is still up and running but unresponsive because it is stuck somewhere. For example, it could be waiting on a database call that does not return in time, blocked by a deadlock that prevents it from processing new requests, or spinning in an infinite loop.

Crashed and hung servers can typically be brought back up by restarting them. Hung servers may also get back to normal when the cause of the hang has been eliminated.
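
The difference between the two conditions can be checked from the client side with a simple probe. The Python sketch below is an illustration only and is not part of LoadComplete: the function name classify_probe, its labels and the 5-second default timeout are assumptions. It treats a refused or dropped connection as a likely crash, a 5xx reply as a server-side error, and a timeout as a likely hang. A timeout can also be caused by network problems, so treat the result as a hint and confirm it against the server logs.

```python
import socket
import urllib.error
import urllib.request


def classify_probe(url: str, timeout_s: float = 5.0) -> str:
    """Send one GET request and return a rough label for the server state.

    Labels (hypothetical, chosen for this sketch):
      "ok"           - a non-5xx response was received
      "server_error" - the server answered with a 5xx status
      "hung"         - no answer arrived within timeout_s
      "down"         - the connection was refused or dropped
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            response.read()
            return "ok"
    except urllib.error.HTTPError as exc:
        # The server is reachable but returned an error status.
        return "server_error" if exc.code >= 500 else "ok"
    except urllib.error.URLError as exc:
        # Connection-level failure; a timeout while connecting lands here.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "hung"
        return "down"
    except (socket.timeout, TimeoutError):
        # The connection was accepted, but no response came back in time.
        return "hung"
    except OSError:
        # For example, the connection was reset before a response arrived.
        return "down"
```

Calling a probe like this at regular intervals during the test run gives a rough timeline of when the server stopped answering and in what way.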

General Procedure for Stress Testing

Stress test conditions are best achieved by starting with a light load and then increasing it gradually over a period of time. Unlike a sudden burst in traffic, which will likely be difficult for the server to initially handle, an increasing load will let the server “warm up” properly before the heavy load. The test should then run until anticipated peak load conditions are reached, or until the server crashes or stops responding.
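
To make the idea of a gradual increase concrete, here is a minimal Python sketch of a stepwise ramp-up schedule. It is not LoadComplete code: the function users_at and all of its parameters are hypothetical names used only to show how the number of simulated users could grow in fixed steps until it reaches a cap.

```python
def users_at(elapsed_s: float, start_users: int, step_users: int,
             step_duration_s: float, max_users: int) -> int:
    """Return how many simulated users should be active at a given time.

    The load starts at start_users, grows by step_users after every
    step_duration_s seconds, and never exceeds max_users.
    """
    if elapsed_s < 0 or step_duration_s <= 0:
        raise ValueError("elapsed_s must be >= 0 and step_duration_s > 0")
    steps_completed = int(elapsed_s // step_duration_s)
    return min(start_users + steps_completed * step_users, max_users)


if __name__ == "__main__":
    # Example: start with 10 users, add 10 every 60 seconds, cap at 100.
    for t in (0, 59, 60, 300, 3600):
        print(t, users_at(t, 10, 10, 60, 100))
```

A longer step duration gives the server more time to stabilize at each level, which makes it easier to tell which load level first caused trouble.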

With SmartBear LoadComplete, creating a stress test includes the following steps:

  1. Record one or more application usage scenarios for use in the stress test. A good approach is to start with a single scenario, to reduce complexity and simplify debugging, or you can mix a few (generally up to 4) representative scenarios to diversify the generated load and uncover possible concurrency issues. Note that using many different scenarios at once is not recommended, as it will be harder to troubleshoot any problems.
  2. Verify each of the captured scenarios to ensure they play back properly.
  3. Set up a test based on the scenarios. The key settings are as follows:
    1. Define the load profile for your test. For example, you could use the predefined stepwise ramp-up profile, or define a custom load curve (such as peaks). Typically, each load increase step should be long enough for the server to stabilize before the next increase.
    2. Define virtual user groups and their characteristics (such as browser and connection speed), and select the scenarios for them to run. Set the total (that is, maximum) number of virtual users either to match your server’s anticipated peak load or to greatly exceed its normal operating capacity.

Once the stress test is configured, you can run it and monitor the server behavior.

Analyzing Stress Testing Results

LoadComplete allows you to monitor various server metrics in real time during the test run, and also generates reports after the test run completes.

The key metrics that may indicate problems include the following:

  • Response time, time to first byte, time to last byte. Unusually slow responses can indicate anomalies and potential problems on the server side, such as resource-intensive database queries.
  • Errors and warnings. These usually start to occur when the load exceeds your server’s maximum operating capacity, and become more frequent as the load grows further. Errors can include those returned by the server in the response headers, such as 500 Internal Server Error, 503 Service Unavailable and other codes in the 5xx range, as well as those reported by the testing tool, such as Connection closed.

    Figure 1. A spike of errors before the server crash

  • Response throughput (Kb/s or Mb/s). This shows how much data is being sent from the server and tends to go down in case of failures.

    For example, when the server crashes and the error rate increases, throughput decreases because error responses are small in size. The graph above (Figure 1) illustrates these symptoms from 0:25 to 0:30.

    When the server hangs and stops responding, throughput drops to near zero, because typically no responses are received at all.

  • Server-side metrics (such as CPU and memory usage). These help you understand how the server is doing under the load. Flat, zero-value graphs for all metrics can also indicate a server hang, because in that state no metrics can be retrieved from the server.
  • Quality of Service errors. These let you detect violations of your defined performance thresholds.

By correlating these metrics, you can see how well the server handled abnormal load, and conclude whether such load eventually caused the server to fail. For a more in-depth view into the server state during the failure, however, you will need to analyze the server logs. They will also help you identify the exact cause of failures.
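
As a rough illustration of this kind of correlation, the Python sketch below combines error counts, timeouts and throughput into a single verdict. Everything in it is an assumption made for the example: the IntervalStats structure, the diagnose function and the threshold values are hypothetical and do not come from LoadComplete. Reasonable thresholds depend on your application, and any verdict should still be confirmed against the server logs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class IntervalStats:
    """Counters for one reporting interval (hypothetical structure)."""
    requests_sent: int
    error_responses: int  # 5xx replies and tool-reported connection errors
    timeouts: int         # requests that received no response at all
    bytes_received: int


def diagnose(intervals: Sequence[IntervalStats],
             baseline_bytes_per_interval: float,
             error_ratio_limit: float = 0.5,
             timeout_ratio_limit: float = 0.9,
             throughput_drop_factor: float = 0.5) -> str:
    """Classify a window of intervals as a possible crash, hang, or neither."""
    sent = sum(i.requests_sent for i in intervals)
    if sent == 0:
        return "no data"

    error_ratio = sum(i.error_responses for i in intervals) / sent
    timeout_ratio = sum(i.timeouts for i in intervals) / sent
    avg_bytes = sum(i.bytes_received for i in intervals) / len(intervals)
    throughput_dropped = (
        avg_bytes < throughput_drop_factor * baseline_bytes_per_interval
    )

    # Hang: almost nothing comes back and throughput has collapsed.
    if timeout_ratio >= timeout_ratio_limit and throughput_dropped:
        return "possible hang"
    # Crash: mostly small error responses, so throughput falls as well.
    if error_ratio >= error_ratio_limit and throughput_dropped:
        return "possible crash"
    return "no failure detected"
```

The hang check runs first because a hung server produces few or no error responses, so the two conditions rarely overlap; when neither matches, the window is reported as showing no failure.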

Interested in doing stress testing? Download a free trial of LoadComplete and try it today, or register for a free webinar.

