Debugging an API Performance Problem from the Real World
I have seen companies survive with too few customers, then fall apart precisely when apparent success arrives: paying users start to depend on a product, response times wobble and stumble, and suddenly things become slow, ugly, and out of control.
Real-life examples illustrate a few principles that will help you get through both organization-wide performance crises and small daily slowdown puzzles.
Why am I so wary of growing beyond capacity? Bad as a sales slowdown is, it's survivable: it only takes enough cash to outlast it, and that cash is best regarded as an investment in practicing good operations and service. Cash can come from anywhere. A slowdown is also an opportunity to stay in close contact with your best customers and refine your offerings.
When an API can't keep up with demand, it poisons the attitude of the customers you worked so hard to acquire.
Developers think of software as an asset, but any realistic accounting shows that customer trust far outweighs the value of any particular implementation. Code can be rewritten; regaining client confidence takes longer and is harder to control.
What is special about APIs?
Savvy programmers have already published several good books on performance remedies, and a few paragraphs here can't replace those resources. What I can do, though, is pass on tips that the API community too often neglects.
The first tip is to look at an API from two angles: how is it like other programming, and how does it differ? An API is a software product, so all our usual techniques for diagnosis and repair apply: profiling, source inspection, logging, decoration with assertions, talking to the bear (or rubber duck, depending on your culture), divide-and-conquer, the scientific method, and so on.
API work differs from most application programming, though, and is more like embedded or system-level coding, in that its customers are more likely to be machine processes than humans. One practical consequence: while load on a website read by humans can easily jump by a factor of ten or a hundred during peaks, use of an API can explode by a factor of a million or more when a rogue client appears.
Simple, robust schemes to throttle demanding consumers and to share resources among users who differ astronomically in their demands are crucial API design requirements.
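A per-client token bucket is one simple scheme of this kind. The sketch below is a minimal, single-process illustration rather than a production limiter: the class names, the `capacity` and `rate` parameters, and the injectable `clock` are assumptions made for this example. A real deployment would usually keep buckets in shared storage and guard them against concurrent access.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity      # start full so a new client can burst
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # caller rejects the request, e.g. with HTTP 429


class PerClientLimiter:
    """One bucket per client key, so a rogue client drains only its own budget.

    Not thread-safe: a multi-threaded server would need a lock around allow().
    """

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[client_id].allow()


# Usage with a fake clock, so the example is deterministic.
now = [0.0]
limiter = PerClientLimiter(capacity=5, rate=1.0, clock=lambda: now[0])
granted = sum(limiter.allow("rogue") for _ in range(100))
print(granted)                    # 5: the burst allowance, then throttled
print(limiter.allow("polite"))    # True: other clients are unaffected
now[0] = 2.0
print(limiter.allow("rogue"))     # True: two tokens have refilled
```

Because each client has its own bucket, a client that suddenly sends vastly more traffic than usual is held to `rate` requests per second without starving anyone else; `capacity` and `rate` could also be set per customer tier.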
Combinatorics of API testing
Automated API load testing is a necessary part of any solution; human intuition simply doesn't scale to the enormous multipliers that crop up in API work. Clarity about what testing can do is equally important, and it starts with remembering that testing helps us find and eliminate errors, not prove that they're absent.
This insight matters most in the combinatorics of testing. I was recently involved with two sophisticated systems that passed all their functional tests (end-to-end verification of the equivalent of "2 + 2 = 4"), load tests (adequate responsiveness over appropriate ranges of load), unit tests, and so on. Both went into production ... and problems appeared immediately.
The two systems were utterly different in many respects: different organizations, different programming languages, different customer bases, and so on. Both times, though, an untested combination of conditions caught our teams by surprise.
In the first, responsiveness was great: the system updated documents perfectly in real time ... with inaccurate data. Our system was clever enough not to let load degrade performance, but load did introduce a race condition that effectively corrupted the computation we had so carefully tested.
The second, completely unrelated system was perfectly accurate at all loads ... but certain combinations of computations took so long that clients timed out. Again, we had tested the system under loads that far exceeded production conditions, but not the particular combinations of calculations that triggered system resource constraints.
Our thorough testing beforehand paid off, of course: we knew a lot about where the problems weren't. Finding the real errors wasn't easy, but we were quick enough at it to patch together solutions that kept customers whole.
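Even a crude automated load test reports what intuition can't, such as how tail latency moves as concurrency rises. The sketch below assumes the call under test can be wrapped in a zero-argument Python function; `run_load_test` and `fake_api_call` are hypothetical names, and a real test would replace the stub with actual requests to the API (or use a dedicated load-testing tool).

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_load_test(call: Callable[[], None], requests: int,
                  concurrency: int) -> Dict[str, float]:
    """Issue `requests` calls (at least 2) from `concurrency` threads; summarize latency in seconds."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(requests)))

    # 99 cut points p1..p99; "inclusive" interpolates only within the observed range.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(latencies)}


def fake_api_call() -> None:
    """Hypothetical stand-in for one request to the API under test."""
    time.sleep(0.001)


if __name__ == "__main__":
    # Report how the percentiles shift as concurrency grows.
    for level in (1, 8, 32):
        print(level, run_load_test(fake_api_call, requests=200, concurrency=level))
```

Reporting p95 and p99 alongside the median matters because clients with tight timeouts experience the tail, not the typical request.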
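The account above doesn't say what the race was, so here is a hypothetical illustration of the classic form that load tends to expose: a lost update, where two requests read the same value and the second write silently discards the first. The `Document` class and its methods are invented for this sketch, and a lock is only one possible fix; database transactions or optimistic version checks are common alternatives.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Document:
    """Hypothetical shared record that concurrent API requests update."""

    def __init__(self) -> None:
        self.total = 0
        self._lock = threading.Lock()

    def add_unsafe(self, amount: int) -> None:
        current = self.total            # read
        # Any work here widens the window in which another request can interleave.
        self.total = current + amount   # write: may overwrite a concurrent update

    def add_safe(self, amount: int) -> None:
        with self._lock:                # the read-modify-write now runs as one unit
            self.total += amount


# Replaying add_unsafe's two steps for two requests, in the order load can produce:
doc = Document()
read_a = doc.total          # request A reads 0
read_b = doc.total          # request B reads 0
doc.total = read_a + 5      # A writes 5
doc.total = read_b + 7      # B writes 7, discarding A's update
print(doc.total)            # 7, although 12 was expected

# With the lock, concurrent updates are never lost.
doc = Document()
with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(1000):
        pool.submit(doc.add_safe, 1)
print(doc.total)            # 1000
```

Functional tests that send one request at a time never produce the bad interleaving, which is why such bugs tend to surface only when correctness is checked under load.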
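One way to make combinations explicit is to enumerate them and run each under a time budget. In this sketch the option names, `fake_compute`, and the 20 ms budget are all hypothetical; a real matrix would come from the API's actual parameters. The full Cartesian product grows multiplicatively, and pairwise (all-pairs) selection is a common way to trim it, at the risk of missing exactly the kind of three-way interaction simulated here.

```python
import itertools
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical request options; a real API would have many more.
OPTIONS: Dict[str, list] = {
    "aggregation": ["none", "daily", "hourly"],
    "currency_conversion": [False, True],
    "history_years": [1, 10],
}


def build_matrix(options: Dict[str, list]) -> List[Dict[str, object]]:
    """Every combination of option values: the full Cartesian product."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in itertools.product(*options.values())]


def find_slow_combinations(run: Callable[..., object], options: Dict[str, list],
                           budget_s: float) -> List[Tuple[Dict[str, object], float]]:
    """Run each combination once and report those exceeding the time budget."""
    slow = []
    for case in build_matrix(options):
        start = time.perf_counter()
        run(**case)
        elapsed = time.perf_counter() - start
        if elapsed > budget_s:
            slow.append((case, elapsed))
    return slow


def fake_compute(aggregation: str, currency_conversion: bool, history_years: int) -> None:
    """Stand-in computation: only one specific combination is expensive."""
    if aggregation == "hourly" and currency_conversion and history_years == 10:
        time.sleep(0.05)


print(len(build_matrix(OPTIONS)))   # 12 combinations (3 x 2 x 2)
for case, elapsed in find_slow_combinations(fake_compute, OPTIONS, budget_s=0.02):
    print(case, f"{elapsed:.3f}s")
```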
Is your team equipped to handle API performance problems?
No single slogan guarantees API performance success. It requires a combination of strong design, disciplined and sophisticated testing, careful monitoring, and a smart team able to respond to any problems that leak through all the other protections in place. "Hot spots" that plague many organizations include:
- Clumsy database access, often through an ORM that hides the performance consequences of the queries it generates. Make sure you have a data expert whose performance advice you can trust.
- Testing that neglects combinations of circumstances. Load testing, for instance, needs its own plan for test data.
- Misapplied effort. The best teams don't waste resources on pointless tests; they combine testing, inspection, monitoring, and other techniques in a balanced way that sniffs out problems as early as possible.
Cut yourself some slack: don’t over-promise API capabilities based on incomplete testing, nor design a system so delicate that it collapses when a surprise arrives. API work always brings us surprises.
Ensuring API Speed & Performance
According to SmartBear's State of API 2016 Report, less than 10% of API performance problems are resolved within 24 hours. One of the best ways to avoid such problems is to test and monitor your API's performance.
Whether you’re launching an API of your own or are concerned about the third-party APIs that power your applications, you need to understand how your APIs are performing. In this eBook, Ensuring API Speed & Performance: A Guide to API Testing and Monitoring for the Connected World, we look at two of the most important processes for ensuring your API's performance both during development and in production: API load/performance testing and API monitoring.
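Monitoring can start as simply as timing a synthetic request on a schedule and alerting when a rolling percentile crosses a target. The sketch below is a hypothetical, in-process illustration: `LatencyMonitor`, `probe`, the window size, and the 0.5 s objective are assumptions, and in production the probe would typically run from outside your own infrastructure and feed an alerting system.

```python
import statistics
import time
from collections import deque
from typing import Callable, Deque, Optional


class LatencyMonitor:
    """Keep the most recent `window` latencies; flag a breach when p95 exceeds `slo_s`."""

    def __init__(self, window: int, slo_s: float, min_samples: int = 20) -> None:
        self.samples: Deque[float] = deque(maxlen=window)   # oldest samples fall off
        self.slo_s = slo_s
        self.min_samples = max(2, min_samples)              # quantiles() needs 2+ points

    def record(self, latency_s: float) -> None:
        self.samples.append(latency_s)

    def p95(self) -> Optional[float]:
        if len(self.samples) < self.min_samples:   # too few points to judge
            return None
        # n=20 gives 19 cut points; the last one is the 95th percentile.
        return statistics.quantiles(self.samples, n=20, method="inclusive")[-1]

    def breached(self) -> bool:
        p = self.p95()
        return p is not None and p > self.slo_s


def probe(monitor: LatencyMonitor, call: Callable[[], None]) -> None:
    """One synthetic check: time a request and record the result."""
    start = time.perf_counter()
    call()
    monitor.record(time.perf_counter() - start)


# Usage with simulated latencies: 90 fast responses, then a run of slow ones.
monitor = LatencyMonitor(window=100, slo_s=0.5)
for _ in range(90):
    monitor.record(0.1)
print(monitor.breached())   # False
for _ in range(10):
    monitor.record(2.0)
print(monitor.breached())   # True
```

Alerting on a rolling percentile rather than on a single slow check avoids paging on one-off blips, at the cost of reacting somewhat later to a real degradation.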
In this guide, you will learn:
- The benefits of each of these API performance strategies
- Implementing API load testing and monitoring
- Using testing and monitoring together
- Finding the right tools for ensuring API performance