
Friday, April 24, 2009

The Most Common Performance Problem I See

At the Percona Performance Conference in Santa Clara this week, the first question an audience member asked our panel was, "What is the most common performance problem you see in the field?"

I figured, being an Oracle guy at a MySQL conference, this might be my only chance to answer something, so I went for the mic. Here is my answer.
The most common performance problem I see is people who think there's a most-common performance problem they should be looking for, instead of measuring to find out what their performance problem actually is.
It's a meta answer, but it's a meta problem. The biggest performance problems I see, and the ones I see most often, are not problems with machines or software. They're problems with people who don't have a reliable process of identifying the right thing to work on in the first place.

That's why the definition of Method R doesn't mention Oracle, or databases, or even computers. It's why Optimizing Oracle Performance spends the first 69 pages talking about red rocks and informed consent and Eli Goldratt instead of Oracle, or databases, or even computers.

The most common performance problem I see is that people guess instead of knowing. The worst cases are when people think they know because they're looking at data, but they really don't know, because they're looking at the wrong data. Unfortunately, every case of guessing that I ever see is this worst case, because nobody in our business goes very far without consulting some kind of data to justify his opinions. Tim Cook from Sun Microsystems pointed me yesterday to a blog post that gives a great example of that illusion of knowing when you really don't.

Friday, April 3, 2009

Cary on Joel on SSD

Joel Spolsky's article on Solid State Disks is a great example of a type of problem my career is dedicated to helping people avoid. Here's what Joel did:
  1. He identified a task needing performance improvement: "compiling is too slow."
  2. He hypothesized that converting from spinning rust disk drives (thanks mwf) to solid state, flash hard drives would improve performance of compiling. (Note here that Joel stated that his "goal was to try spending money, which is plentiful, before [he] spent developer time, which is scarce.")
  3. So he spent some money (which is, um, plentiful) and some of his own time (which is apparently less scarce than that of his developers) replacing a couple of hard drives with SSD. If you follow his Twitter stream, you can see that he started on it 3/25 12:15p and wrote about having finished at 3/27 2:52p.
  4. He was pleased with how much faster the machines were in general, but he was disappointed that his compile times underwent no material performance improvement.
Here's where Method R could have helped. Had he profiled his compile times to see where the time was being spent, he would have known before the upgrade that SSD was not going to improve response time. Given his results, his profile for compiling must have looked like this:
100%  Not disk I/O
  0%  Disk I/O
----  ------------
100%  Total
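To make the arithmetic explicit: Amdahl's Law says that making a component s times faster improves a task's overall response time only in proportion to the fraction f of the task's response time spent on that component. Here's a minimal sketch in Python; the figures are the hypothetical profile above, not Joel's actual measurements:

# Best-case effect of a component upgrade, predicted from a response-time
# profile via Amdahl's Law. All figures here are hypothetical.

def predicted_speedup(f, s):
    """f: fraction of response time spent on the component (0 to 1);
    s: how many times faster the upgrade makes that component (s >= 1)."""
    return 1.0 / ((1.0 - f) + f / s)

# Joel's compile profile: 0% of response time was disk I/O, so even a
# vastly faster disk leaves compile time unchanged.
print(predicted_speedup(0.00, 10.0))  # 1.0 -- no improvement at all

# Contrast: a task that spends 80% of its time on disk I/O.
print(predicted_speedup(0.80, 10.0))  # ~3.57 -- task runs 3.57x faster

The profile tells you the answer before you spend the money.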
I'm not judging whether he wasted his time by doing the upgrade. By his own account, he is pleased at how fast his SSD-enabled machines are now. But if, say, the compiling performance problem had been survival-threateningly severe, then he wouldn't have wanted to expend two business days' worth of effort upgrading a component that was destined to make zero difference to the performance of the task he was trying to improve.

So, why would someone embark upon a performance improvement project without first knowing exactly what result he should be able to expect? I can think of some good reasons:
  • You don't know how to profile the thing that's slow. Hey, if it's going to take you a week to figure out how to profile a given task, then why not spend half that time doing something that your instincts all say is surely going to work?
  • Um, ...
OK, after trying to write them all down, I think it really boils down to just one good reason: if profiling is too expensive (that is, you don't know how, or it's too hard, or the tools to do it cost too much), then you're not going to do it. I don't know how I'd profile a compile process on a Microsoft Windows computer. It's probably possible, but I can't think of a good way to do it. It's all about knowing; if you knew how to do it, and it were easy, you'd do it before you spent two days and a few hundred bucks on an upgrade that might not give you what you wanted.
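For what it's worth, on a Unix-like system even a coarse profile can settle the disk question before you spend the money. Here's a minimal sketch, assuming Python 3 on Linux or similar ("make" is just a placeholder for whatever build command you actually run); it splits a build's elapsed time into CPU and everything else, and "everything else" is the only bucket a faster disk could possibly shrink:

import os
import subprocess
import sys
import time

# Run a build and split its elapsed time into CPU vs. everything else
# (disk waits, scheduling, etc.). Needs a Unix-like OS: os.times() only
# reports children's CPU there. "make" is a placeholder for your build.
cmd = sys.argv[1:] or ["make"]

t0 = time.time()
c0 = os.times()
subprocess.run(cmd, check=False)
elapsed = max(time.time() - t0, 1e-9)
c1 = os.times()

cpu = ((c1.children_user - c0.children_user)
       + (c1.children_system - c0.children_system))
other = max(elapsed - cpu, 0.0)  # waits of all kinds, including disk I/O
# Note: a parallel build (make -j) can show CPU exceeding elapsed time.

print(f"{cpu / elapsed:6.1%}  CPU")
print(f"{other / elapsed:6.1%}  Not CPU (waits, including disk I/O)")
print(f"{'100.0%':>6}  Total ({elapsed:.1f} s)")

If the "Not CPU" line is near zero, you have your answer: the disks aren't your problem, and no disk upgrade will fix it.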

I do know that in the Oracle world, it's not hard anymore, and the tools don't cost nearly as much as they used to. There's no need anymore to upgrade something before you know specifically what's going to happen to your response times. Why guess... when you can know.

Monday, June 9, 2008

Why Guess? When You Can Know

The comment from plαdys on my post about flash drives and databases is a great entry into exactly the right conversation to be having. He wrote:

What about Data Warehouse type databases? Lots of full table scans, less use of cache (not sure if that's true)...

The traditional approach to having this conversation is for a few enthusiastic participants to argue over what happens "most of the time" or what "can happen." I used to participate in conversations like that at conferences, on panels, and in email. Nowadays, people have conversations like this in newsgroups and blogs.

I'll submit to you that the conversation about what "can happen" and subsequent arguments about what happens "most of the time" are irrelevant. Here's why. Imagine that a conversation like that converged to a state where one person was able to argue successfully that "in precisely 99% of cases, some proposition P is true." That never happens, but go with me for a second; imagine it did.

So, now, is P true for you? Most people seem to assume that the things that happen to them are like the typical things that happen to most people. Most people would think, "If P is that common, then surely P is true for me." Maybe it is. But then again, maybe it's not. The most likely situation is that P is true for some tasks running within your system, but it's not true for others. Whether P is true for the most important tasks on your system is simply a game of chance if you're like most people who have this conversation.

My point is: Why guess? When you can know. When it comes to Amdahl's Law, you should be able to know exactly how much of an individual business task's response time is being consumed on the component of your system that you're thinking about upgrading. If you want to see an example of what it looks like, look at our Profiler page.
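As a sketch of what a task-level profile looks like (with made-up component names and numbers, not output from our Profiler), imagine you've measured where one business task's response time went:

# A response-time profile for ONE business task: where its seconds went.
# Component names and figures are invented for illustration.
profile = {
    "CPU": 1.2,
    "disk I/O": 7.4,
    "network": 0.3,
    "lock waits": 0.6,
}

total = sum(profile.values())
for component, seconds in sorted(profile.items(),
                                 key=lambda kv: kv[1], reverse=True):
    print(f"{seconds / total:6.1%}  {seconds:5.1f} s  {component}")
print(f"{'100.0%':>6}  {total:5.1f} s  Total")

# Amdahl's Law then answers the upgrade question for THIS task: a faster
# disk can save at most 7.4 of these 9.5 seconds; a faster network can
# save almost nothing.

A profile like this makes the "most of the time" debate moot: you know what's true for the task in front of you.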

With Oracle systems, though, the traditional approach is not to think that way. The traditional approach is to look at measurements of the resources that comprise a system (its CPUs, its memory, its disks, its network), not measurements of the business tasks that the company owning the machine is spending its money to perform.

My whole point is that if you look at the response time of your business's most important tasks (that's what Method R is all about), then you don't have to care about debates over other people's systems, or about whether your system is typical enough to follow other people's advice. You won't have to guess about any of that, because you'll know the specific nature and needs of your own system, regardless of whether it happens to be like anyone else's.

Stop guessing. You can know. But you have to be willing to look at performance from the outside in, from the perspective of the task being processed, not from the traditional inside-out perspective of the resources doing the work.

Wednesday, April 16, 2008

Stop Guessing

Here are two new YouTube videos featuring Anjo Kolk and me. These were filmed at the Miracle SQL Server Open World event held April 3, 4, and 5 at Lalandia in Rødby, Denmark.

Mogens Nørgaard introduced these videos with the statement, "May I present the new Miracle TV concept called SlowTV.... where people don't have to stare into the lens while trying to remember what they were talking about, but can roam freely with their eyes, just like in real life."