Friday, April 3, 2009

Cary on Joel on SSD

Joel Spolsky's article on Solid State Disks is a great example of a type of problem my career is dedicated to helping people avoid. Here's what Joel did:
  1. He identified a task needing performance improvement: "compiling is too slow."
  2. He hypothesized that replacing spinning-rust disk drives (thanks mwf) with solid-state flash drives would make compiling faster. (Note that Joel stated his "goal was to try spending money, which is plentiful, before [he] spent developer time, which is scarce.")
  3. So he spent some money (which is, um, plentiful) and some of his own time (which is apparently less scarce than that of his developers) replacing a couple of hard drives with SSDs. If you follow his Twitter stream, you can see that he started on it 3/25 12:15p and wrote about having finished at 3/27 2:52p.
  4. He was pleased with how much faster the machines were in general, but he was disappointed that his compile times underwent no material performance improvement.
Here's where Method R could have helped. Had he profiled his compile times to see where the time was being spent, he would have known before the upgrade that SSD was not going to improve response time. Given his results, his profile for compiling must have looked like this:
100%  Not disk I/O
  0%  Disk I/O
----  ------------
100%  Total
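To make the arithmetic behind a profile like that concrete, here is a minimal Python sketch, assuming you already have per-component timings for the task. The `ProfileLine` type, the function names, and the 300-second figure are hypothetical, made up for illustration; they are not Joel's measurements. The point is the upper bound: making a component faster can only remove the time that component actually contributes to the response time (Amdahl's law).

```python
from dataclasses import dataclass


@dataclass
class ProfileLine:
    component: str   # e.g. "Disk I/O" or "Not disk I/O"
    seconds: float   # time this component contributed to the task's response time


def format_profile(lines):
    """Render a response-time profile in the same layout as the table above."""
    total = sum(line.seconds for line in lines)
    rows = []
    # Largest contributor first, so the biggest opportunity is at the top.
    for line in sorted(lines, key=lambda l: l.seconds, reverse=True):
        pct = 100.0 * line.seconds / total if total else 0.0
        rows.append(f"{pct:3.0f}%  {line.component}")
    rows.append("----  ------------")
    rows.append(f"{100 if total else 0:3d}%  Total")
    return rows


def best_case_response_time(lines, component, speedup):
    """Response time if only `component` became `speedup` times faster."""
    return sum(
        line.seconds / speedup if line.component == component else line.seconds
        for line in lines
    )


# Hypothetical numbers shaped like the profile above: no time spent on disk I/O.
compile_profile = [ProfileLine("Not disk I/O", 300.0), ProfileLine("Disk I/O", 0.0)]
print("\n".join(format_profile(compile_profile)))
print(best_case_response_time(compile_profile, "Disk I/O", speedup=10.0))
```

With a profile like this one, even an infinitely fast disk leaves the response time unchanged. Had disk I/O been, say, 120 of those hypothetical 300 seconds, a 10x faster disk could have cut the compile at best to 192 seconds, and you would have known that before buying anything.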
I'm not judging whether he wasted his time by doing the upgrade. By his own account, he is pleased at how fast his SSD-enabled machines are now. But if, say, the compiling performance problem had been survival-threateningly severe, then he wouldn't have wanted to expend two business days' worth of effort upgrading a component that was destined to make zero difference to the performance of the task he was trying to improve.

So, why would someone embark upon a performance improvement project without first knowing exactly what result he should be able to expect? I can think of some good reasons:
  • You don't know how to profile the thing that's slow. Hey, if it's going to take you a week to figure out how to profile a given task, then why not spend half that time doing something that your instincts all say is surely going to work?
  • Um, ...
OK, after trying to write them all down, I think it really boils down to just one good reason: if profiling is too expensive (that is, you don't know how, or it's too hard, or the tools to do it cost too much), then you're not going to do it. I don't know how I'd profile a compile process on a Microsoft Windows computer. It's probably possible, but I can't think of a good way to do it. It's all about knowing; if you knew how to do it, and it were easy, you'd do it before you spent two days and a few hundred bucks on an upgrade that might not give you what you wanted.
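On a POSIX system, at least, a crude first cut at such a profile can be cheap. The Python sketch below is an illustration, not Method R tooling, and it would not run on Joel's Windows machines, because Python's `resource` module is POSIX-only. It times a command with the wall clock and compares that against the CPU time the command's processes consumed. If CPU time accounts for nearly all of the elapsed time, a faster disk has little to work with. The `coarse_profile` name and the demo command are hypothetical, and the "not CPU" bucket lumps disk, network, and every other kind of waiting together, so it bounds disk I/O time from above rather than measuring it.

```python
import resource  # POSIX only; not available on Windows
import subprocess
import sys
import time


def coarse_profile(cmd):
    """Run `cmd` to completion and split its elapsed time into CPU vs. everything else.

    Returns (wall_seconds, cpu_seconds, other_seconds). "Other" is whatever the
    wall clock saw that the CPU counters did not: disk I/O, network, locks, etc.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User + system CPU time of the command (and any descendants it waited for).
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # A parallel build can burn more CPU seconds than wall seconds; clamp at zero.
    other = max(wall - cpu, 0.0)
    return wall, cpu, other


if __name__ == "__main__":
    # Hypothetical demo command; substitute your own build command here.
    demo = [sys.executable, "-c", "sum(range(10**6))"]
    wall, cpu, other = coarse_profile(demo)
    print(f"CPU: {cpu:.2f}s  not CPU: {other:.2f}s  total: {wall:.2f}s")
```

This is the same "Not disk I/O vs. disk I/O" question as the profile earlier in this post, only coarser: it can tell you that a faster disk cannot help, but not which part of the CPU time to attack instead.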

I do know that in the Oracle world, it's not hard anymore, and the tools don't cost nearly as much as they used to. There's no need anymore to upgrade something before you know specifically what's going to happen to your response times. Why guess... when you can know.

2 comments:

Joel Garry said...

I think what this really demonstrates is the difficulty of defining the problem when it spans more than a single task - and deconstructing problems to single tasks may be quite misleading.

In that other Joel's example, the compile time issue was well-defined, but not the main business issue, which of course was not well-defined, and perhaps never will be or even needs to be: "make computers go faster." Method R would have been fine and appropriate for the compiler speed issue, and still is. Maybe he'll see your post and give it a shot.

I'm sure you've seen this all the time, as I have: the business complains, say, that exports take too long. The first thing you notice as an Oracle professional is that they think exports are backups. Is it appropriate to start using tools to figure out why the exports are too slow, or should you first define a more important issue: that you are perhaps about to work on a system with no proper backups?

I think you need a step 0 in Method R: Is there even a capability to properly evaluate which task is most important? Is the system in a realistic ballpark? Does the business even care about what they've asked for?

So, will he ever get the compile times faster?


Cary Millsap said...

It's all about context and relevance, and the talent for understanding these attributes is rare and precious.