Beware of claims worded like that.
Whenever I see “...improve performance 1000%,” I have to do extra work to decode what the author has encoded in his tidy numerical package with a percent-sign bow. The two performance improvement formulas that make sense to me are these:
- Improvement = (b – a)/b, where b is the response time of the task before repair, and a is the response time of the task after repair. This formula expresses the proportion (or percentage, if you multiply by 100%) of the original response time that you have eliminated. It can’t be bigger than 1 (or 100%) without invoking reverse time travel.
- Improvement = b/a, where b and a are defined exactly as above. This formula expresses how many times faster the after response time is than the before one.
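To make the difference between the two formulas concrete, here is a minimal Python sketch. The function names `fraction_eliminated` and `speedup_factor` are labels invented for this illustration, and the 10 s and 0.1 s timings are hypothetical.

```python
def fraction_eliminated(before: float, after: float) -> float:
    """(b - a) / b: proportion of the original response time eliminated."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


def speedup_factor(before: float, after: float) -> float:
    """b / a: how many times faster the task is after the repair."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Hypothetical task: 10 s before the repair, 0.1 s after.
b, a = 10.0, 0.1
print(f"{fraction_eliminated(b, a):.0%} of response time eliminated")
print(f"{speedup_factor(b, a):.0f}x faster")
```

The same pair of measurements yields two very different-looking numbers, which is why a bare percentage needs decoding before it means anything.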
Any time you see a ‘%’ character, beware: you’re looking at a ratio. The principal benefit of ratios is also their biggest flaw. A ratio conceals its denominator. That, of course, is exactly what ratios are meant to do (it’s called normalization), but it’s not always good to normalize. Here’s an example. Imagine two SQL queries A and B that return the exact same result set. Which is better: query A, with a 90% hit ratio on the database buffer cache, or query B, with a 99% hit ratio?
Query | Cache hit ratio |
---|---|
A | 90% |
B | 99% |
As tempting as it might be to choose the query with the higher cache hit ratio, the correct answer is...
There’s not enough information given in the problem to answer. It could be either A or B, depending on information that has not yet been revealed. Here’s why. Consider the two distinct situations listed below. Each situation matches the problem statement. For situation 1, the answer is: query B is better, because it performs the same number of cache lookups with fewer misses. But for situation 2, the answer is: query A is better, because it does far less overall work. Without knowing more about the situation than just the ratio, you can’t answer the question.
Situation 1

Query | Cache lookups | Cache hits | Cache hit ratio |
---|---|---|---|
A | 100 | 90 | 90% |
B | 100 | 99 | 99% |

Situation 2

Query | Cache lookups | Cache hits | Cache hit ratio |
---|---|---|---|
A | 10 | 9 | 90% |
B | 100 | 99 | 99% |
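
The two situations can be replayed in a few lines of Python to show what the ratio hides. This is only a sketch: the `CacheStats` class and its field names are invented for the illustration, and the numbers are the ones from the tables above.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    lookups: int  # the denominator that the ratio conceals
    hits: int

    @property
    def hit_ratio(self) -> float:
        return self.hits / self.lookups

    @property
    def misses(self) -> int:
        return self.lookups - self.hits


situations = {
    "Situation 1": {"A": CacheStats(100, 90), "B": CacheStats(100, 99)},
    "Situation 2": {"A": CacheStats(10, 9), "B": CacheStats(100, 99)},
}

for name, queries in situations.items():
    for query, stats in queries.items():
        print(
            f"{name}, query {query}: hit ratio {stats.hit_ratio:.0%}, "
            f"lookups {stats.lookups}, misses {stats.misses}"
        )
```

Printing the lookups next to the ratio is the whole point: the hit ratios are identical across the two situations, but in situation 2 query A does a tenth of the lookups that query B does.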
Because a ratio hides its denominator, it’s insufficient for explaining your performance results to people (unless your aim is intentionally to hide information, which I’ll suggest is not a sustainable success strategy). It is still useful to show a normalized measure of your result, and a ratio is good for that. I didn’t say you shouldn’t use them. I just said they’re insufficient. You need something more.
The best way to think clearly about performance improvements is to state the before and after values first, with the ratio added as a parenthetical bit of extra information, as in:
- I improved response time of T from 10 s to 0.1 s (99% reduction).
- I improved throughput of T from 42 t/s to 420 t/s (10-fold increase).
Even authors who give you b and a have a nasty habit of leaving off the T, which is far worse even than leaving off the before and after numbers, because it implies that using their magic has improved the performance of every task on the system by exactly the same proportion (either p% or n-fold), which is almost never true. That is because it’s rare for any two tasks on a given system to have “similar” response time profiles (defining similar in the proportional sense). For example, imagine the following two quite dissimilar profiles:
Task A

Response time | Resource |
---|---|
100% | Total |
90% | CPU |
10% | Disk I/O |

Task B

Response time | Resource |
---|---|
100% | Total |
90% | Disk I/O |
10% | CPU |
No single component upgrade can have equal performance improvement effects upon both these tasks. Making CPU processing 2× faster will cut task A’s response time by 45% and task B’s by 5%. Likewise, making Disk I/O processing 10× faster will cut task A’s response time by 9% and task B’s by 81%.
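
The arithmetic behind those percentages is simple enough to check in Python. The sketch below assumes the simplest possible model: a component that becomes n times faster shrinks its own share of response time to 1/n of what it was, and nothing else changes. The function name `reduction_after_upgrade` is invented for the illustration. Under that model the disk upgrade eliminates 0.90 × (1 − 1/10) = 81% of task B’s response time.

```python
def reduction_after_upgrade(
    profile: dict[str, float], resource: str, factor: float
) -> float:
    """Fraction of total response time eliminated when `resource`
    becomes `factor` times faster and everything else stays the same."""
    share = profile.get(resource, 0.0)
    return share * (1 - 1 / factor)


# Response time profiles from the two tables, as fractions of the total.
task_a = {"CPU": 0.90, "Disk I/O": 0.10}
task_b = {"CPU": 0.10, "Disk I/O": 0.90}

for resource, factor in [("CPU", 2), ("Disk I/O", 10)]:
    for name, profile in [("A", task_a), ("B", task_b)]:
        saved = reduction_after_upgrade(profile, resource, factor)
        print(
            f"{resource} {factor}x faster: "
            f"task {name} response time drops by {saved:.0%}"
        )
```

The same upgrade produces a very different result for each task, because the result depends entirely on how much of that task’s response time the upgraded resource accounted for in the first place.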
For a vendor to claim any noticeable, homogeneous improvement across the board on any computer system containing tasks A and B would be an outright lie.