Friday, April 24, 2009

The Most Common Performance Problem I See

At the Percona Performance Conference in Santa Clara this week, the first question an audience member asked our panel was, "What is the most common performance problem you see in the field?"

I figured, being an Oracle guy at a MySQL conference, this might be my only chance to answer something, so I went for the mic. Here is my answer.
The most common performance problem I see is people who think there's a most-common performance problem that they should be looking for, instead of measuring to find out what their actual performance problem actually is.
It's a meta answer, but it's a meta problem. The biggest performance problems I see, and the ones I see most often, are not problems with machines or software. They're problems with people who don't have a reliable process of identifying the right thing to work on in the first place.

That's why the definition of Method R doesn't mention Oracle, or databases, or even computers. It's why Optimizing Oracle Performance spends the first 69 pages talking about red rocks and informed consent and Eli Goldratt instead of Oracle, or databases, or even computers.

The most common performance problem I see is that people guess instead of knowing. The worst cases are when people think they know because they're looking at data, but they really don't know, because they're looking at the wrong data. Unfortunately, every case of guessing that I ever see is this worst case, because nobody in our business goes very far without consulting some kind of data to justify his opinions. Tim Cook from Sun Microsystems pointed me yesterday to a blog post that gives a great example of that illusion of knowing when you really don't.

Monday, April 13, 2009

Maxine Johnson

I want to introduce you to Maxine Johnson, assistant manager of men's sportswear at Nordstrom Galleria Dallas. The reason I think Maxine is important is because she taught my son and me about customer service. I met her several months ago. I still have her card, and I'm still grateful to her. Here's what happened.

A few months ago, my wife and I were in north Dallas with some time to spare, and I convinced her to go with me to pick out one or two pairs of dress slacks. I felt like I was wearing the same pants over and over again when I traveled, and I could use an extra pair or two. We usually go to Nordstrom for that, and so we did again. After some time, I had two pairs of trousers that we both liked, and so we had them measured for hemming and picked them up a few days later.

A week or two passed, and then I packed a pair of my new pants for a trip to Zürich. I put them on in the hotel the first morning I was supposed to speak at an event. On my few-block walk from the hotel to the train station, I caught my reflection in a store window, and—hmmp—my pants were just not... really... quite... long enough. Every step, the whole cuff would come way up above the tops of my shoes. I stopped and tugged them down, and then they seemed alright, but then as soon as I started walking again, they'd ride back up and look too short.

They weren't bad enough that anyone said anything, but I was a little self-conscious about it. I kept tugging at them all day.

When I hung them back up in my closet at home, I noticed that when I folded them over the hanger, they didn't reach as far as the other pants that I really liked. Sure enough, when I lined up the waists, these new pants were about an inch shorter than my favorite ones that I had bought at Nordstrom probably four years ago.

Now, pants at Nordstrom cost a little more than maybe at a lot of other places, but they're worth to me what I pay for them because they're nice, and they last a long time. But these new ones made me feel bad, because they were just a little bit off. I could already foresee a future of two new pairs of slacks hanging in my closet for years, never really making the starting rotation because they're just a little bit off, but never making the garage sale pile, either, because they had cost too much.

My wife agreed. They were shorter than the others. They were shorter than they should be. I needed to get them fixed.

Now, this is the part I always hate. Having made the decision, the next step is that step where you take the thing back and try to get the problem fixed. I hate that part. My wife doesn't mind it so much, but these were my pants, and so I was the one that had to go back and put them on so someone could fix them. I really dreaded it though, because I knew that the only way they could fix those pants was to take off the cuff.

It's late in the evening by the time my wife helps me build up a little head of steam, and we both decide (well, she decides, but she's right) that tonight is the perfect night for me to go on a 20-mile drive across town to Nordstrom to get my pants fixed. As a matter of fact, it'd be good if my older boy went with me. That makes it a little more fun, because he's good company for me.

It's late enough by now that before I could leave, I had to phone ahead, just to make sure the store was still open. A nice lady answered the phone. I said my name and told the nice lady that I was having some trouble with some slacks I had bought a few weeks ago, and how late did they stay open? She told me to come right on over.

So my boy and I got into the car, and I drove right on over.

A half hour later, I walked into the store, thankful that the doors were still open, carrying two pairs of slacks on a hanger, with my son walking beside me. A smiling nice lady approached me as I entered the men's department. "Mr. Millsap?" Yes, I am. It surprises me anytime someone remembers my name from that one phase of the conversation where I say real fast, "My name is Cary Millsap, and blah blah blah blah blah," and tell my whole story. The person on the phone hadn't asked me again what my name was. She had caught it in the blur at the beginning of my story.

She proceeded to explain to me what was going to happen. I was going to try on the slacks in the dressing room. The tailor would be there waiting for me. She and the tailor would look them over. If there was enough fabric to make them longer, then they'd do that tonight. If there weren't, then she was going to find two new pairs of slacks for me, and the tailor would have them ready for me tomorrow. If for any reason, those didn't work, then she'd keep preparing new trousers for me until I was satisfied.

Mmm, ok. I was probably grinning a little bit by now, because this was pretty fantastic news. I wasn't going to have to get my pants de-cuffed. I was still a little nervous, though, that when I came out of the dressing room, everyone was going to look at me like, "So what's the problem? I don't see any problem. Those are long enough."

When I came out, Maxine Johnson crossed her arms, put her hand to her chin, shook her head a little, and immediately said something to the effect of, "Oh my, no. That won't do at all." So she brought me two new pairs, which I tried on, and which the tailor measured for me. She gave me a reclaim ticket for the next day. As usual, I had missed her name when she introduced herself as I first entered the men's department. (As you probably already figured out, I have a bad habit of not paying enough attention to that part of the conversation that I think of as "the blur.") I did have the good sense to ask for her business card, which is why I know her name is Maxine Johnson.

My boy and I talked the whole ride home that what we had seen that night had been some real, first-class retail customer care right there, and that we all knew where we'd be buying my next pairs of pants. When I had gotten into the car an hour or so before, I had been very apprehensive about what might happen. I had been especially nervous about how I'd perform during the proving-what's-wrong part of the project. But Maxine Johnson put me completely at ease during my experience. She didn't just do the right thing, she did it in such a manner that I felt glad the whole problem had happened. Here's the thing:
Maxine Johnson made me feel like it was not just ok that I brought the pants back for repair, she made me feel like she was delighted by the opportunity to show me what Nordstrom could do for me under pressure.
I hope that the way Maxine Johnson made me feel is the way that my employees and I make our customers feel. I hope it's the way my children make their customers feel someday when they go to work.

Thank you, Maxine Johnson. Thank you.

Wednesday, April 8, 2009

What would you do with 8 disks?

Yesterday, David Best posted this question at Oracle-L:
If you had 8 disks in a server what would you do? From watching this list I can see alot of people using RAID 5 but i'm wary of the performance implicatons. (http://www.miracleas.com/BAARF/)

I was thinking maybe RAID 5 (3 disks) for the OS, software and
backups. RAID 10 (4 disks + 1 hot spare) for the database files.

Any thoughts?
I do have some thoughts about it.

There are four dimensions in which I have to make considerations as I answer this question:
  1. Volume
  2. Flow
  3. Availability
  4. Change
Just about everybody understands at least a little bit about #1: the reason you bought 8 disks instead of 4 or 16 has something to do with how many bytes of data you're going to store. Most people are clever enough to figure out that if you need to store N bytes of data, then you need to buy N + M bytes of capacity, for some M > 0 (grin).

#2 is where a lot of people fall off the trail. You can't know how many disks you really need to buy unless you know how many I/O calls per second (IOPS) your application is going to generate. You need to ensure that your sustained IOPS rate on each disk will not exceed 50% (see Table 9.3 in Optimizing Oracle Performance for why .5 is special). So, if a disk drive is capable of serving N 8KB IOPS (your disk's capacity for serving I/O calls at your Oracle block size), then you better make sure that the data you put on that disk is so interesting that it motivates your application to execute no more than .5N IOPS to that disk. Otherwise, you're guaranteeing yourself a performance problem.

Your IOPS requirement gets a little trickier, depending on which arrangement you choose for configuring your disks. For example, if you're going to mirror (RAID level 1), then you need to account for the fact that each write call your application makes will motivate two physical writes to disk (one for each copy). Of course, those write calls are going to separate disks, and you better make sure they're going through separate controllers, too. If you're going to do striping with distributed parity (RAID level 5), then you need to realize that each "small" write call is going to generate four physical I/O calls (two reads, and two writes to two different disks).
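Here's a rough sketch of that arithmetic in Python. The function names and the workload numbers are made up for illustration. The write multipliers are the ones described above (2 for mirroring, 4 for "small" RAID level 5 writes), each read is assumed to cost one physical I/O, and cache is ignored entirely.

```python
import math

# Physical writes generated per application write call, per the text above.
# Each read is assumed to cost exactly one physical I/O.
WRITE_MULTIPLIER = {"raid1": 2, "raid5": 4}


def physical_iops(read_iops, write_iops, raid_level):
    """Physical I/O calls per second that reach the drives (no cache)."""
    return read_iops + write_iops * WRITE_MULTIPLIER[raid_level]


def drives_needed(read_iops, write_iops, raid_level, drive_iops, max_util=0.5):
    """Fewest drives that keep sustained utilization at or below max_util,
    assuming the load spreads evenly across all drives."""
    load = physical_iops(read_iops, write_iops, raid_level)
    return math.ceil(load / (drive_iops * max_util))


if __name__ == "__main__":
    # Hypothetical workload: 300 reads/s and 200 writes/s from the
    # application, on drives that can each serve 150 IOPS.
    for level in ("raid1", "raid5"):
        print(level, drives_needed(300, 200, level, drive_iops=150))
```

This counts drives for flow only. Whatever number you get for volume (dimension #1) still applies, and you need the larger of the two.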

Of course, RAID level 5 caching complicates the analysis at low loads, but for high enough loads, you can assume away the benefits of cache, and then you're left with an analysis that tells you that for write-intensive data, RAID level 5 is fine as long as you're willing to buy 4× more drives than you thought you needed. ...Which is ironic, because the whole reason you considered RAID level 5 to begin with is that it costs less than buying 2× more drives than you thought you needed, which is why you didn't buy RAID level 1 to begin with.

If you're interested in RAID levels, you should peek at a paper I wrote a long while back, called Configuring Oracle Server for VLDB. It's an old paper, but a lot of what's in there still holds up, and it points you to deeper information if you want it.

You have to think about dimension #3 (availability) so that you can meet your business's requirements for your application to be ready when its users need it. ...Which is why RAID levels 1 and 5 came into the conversation to begin with: because you want a system that keeps running when you lose a disk. Well, different RAID levels have different MTBF and MTTR characteristics, with the bottom line being that RAID level 5 doesn't perform quite as well (or as simply) as RAID level 1 (or, say 1+0 or 0+1), but RAID level 5 has the up-front gratification advantage of being more economical (unless you get a whole bunch of cache, which you pretty much have to, because you want decent performance).

The whole analysis—once you actually go through it—generally funnels you into becoming a BAARF Party member.

Finally, dimension #4 is change. No matter how good your analysis is, it's going to start degrading the moment you put your system together, because from the moment you turn it on, it begins changing. All of your volumes and flows will change. So you need to factor into your analysis how sensitive to change your configuration will be. For example, what % increase in IOPS will require you to add another disk (or pair, or group, etc.)? You need to know in advance, unless you just like surprises. (And you're sure your boss does, too.)
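One way to put a number on that sensitivity is sketched below. The function name and the figures are hypothetical, and it assumes the load spreads evenly across drives and that the 50% utilization ceiling from dimension #2 is the limit you care about.

```python
def growth_headroom(current_iops, drives, drive_iops, max_util=0.5):
    """Fractional IOPS growth the current drives can absorb before
    sustained per-drive utilization passes max_util.

    A negative result means the configuration is already over the limit.
    """
    ceiling = drives * drive_iops * max_util
    return ceiling / current_iops - 1.0


if __name__ == "__main__":
    # Hypothetical: 8 drives at 150 IOPS each, serving 480 physical IOPS today.
    print(f"{growth_headroom(480, 8, 150):.0%} growth before more drives are needed")
```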

Now, after all this, what would I do with 8 disks? I'd probably stripe and mirror everything, like Juan Loaiza said. Unless I was really, really (I mean really, really) sure I had a low write-rate requirement (think "web page that gets 100 lightweight hits a day"), in which case I would consider RAID level 5. I would make sure that my sustained utilization for each drive is less than 50%. In cases where it's not, I would have a performance problem on my hands. In that case, I'd try to balance my workload better across drives, and I would work persistently to find any applications out there that are wasting I/O capacity (naughty users, naughty SQL, etc.). If neither of those actions reduced the load by enough, then I'd put together a justification/requisition for more capacity, and I would brace myself to explain why I thought 8 disks was the right number to begin with.

Friday, April 3, 2009

Cary on Joel on SSD

Joel Spolsky's article on Solid State Disks is a great example of a type of problem my career is dedicated to helping people avoid. Here's what Joel did:
  1. He identified a task needing performance improvement: "compiling is too slow."
  2. He hypothesized that converting from spinning rust disk drives (thanks mwf) to solid state, flash hard drives would improve performance of compiling. (Note here that Joel stated that his "goal was to try spending money, which is plentiful, before [he] spent developer time, which is scarce.")
  3. So he spent some money (which is, um, plentiful) and some of his own time (which is apparently less scarce than that of his developers) replacing a couple of hard drives with SSD. If you follow his Twitter stream, you can see that he started on it 3/25 12:15p and wrote about having finished at 3/27 2:52p.
  4. He was pleased with how much faster the machines were in general, but he was disappointed that his compile times underwent no material performance improvement.
Here's where Method R could have helped. Had he profiled his compile times to see where the time was being spent, he would have known before the upgrade that SSD was not going to improve response time. Given his results, his profile for compiling must have looked like this:
100%  Not disk I/O
  0%  Disk I/O
----  ------------
100%  Total
I'm not judging whether he wasted his time by doing the upgrade. By his own account, he is pleased at how fast his SSD-enabled machines are now. But if, say, the compiling performance problem had been survival-threateningly severe, then he wouldn't have wanted to expend two business days' worth of effort upgrading a component that was destined to make zero difference to the performance of the task he was trying to improve.
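The prediction a profile lets you make before the upgrade is simple arithmetic. Here is a minimal sketch in Python; the function name and the 10× speedup are assumptions for illustration, and the second profile is invented to show what a disk-bound task would look like by contrast.

```python
def predicted_response_time(profile, component, speedup):
    """Response time after making one profile component `speedup` times
    faster, leaving every other component unchanged."""
    total = sum(profile.values())
    return total - profile[component] + profile[component] / speedup


if __name__ == "__main__":
    # The profile shown above, in percent of response time.
    compile_profile = {"Not disk I/O": 100.0, "Disk I/O": 0.0}
    # An invented, disk-bound profile for comparison.
    disk_bound_profile = {"Not disk I/O": 20.0, "Disk I/O": 80.0}

    for name, prof in (("compile", compile_profile),
                       ("disk-bound", disk_bound_profile)):
        after = predicted_response_time(prof, "Disk I/O", speedup=10)
        print(f"{name}: 100.0 before, {after:.1f} after")
```

With the compile profile, no disk speedup changes the total at all, however large it is. That's the answer you want before you buy the drives.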

So, why would someone embark upon a performance improvement project without first knowing exactly what result he should be able to expect? I can think of some good reasons:
  • You don't know how to profile the thing that's slow. Hey, if it's going to take you a week to figure out how to profile a given task, then why not spend half that time doing something that your instincts all say is surely going to work?
  • Um, ...
Ok, after trying to write them all down, I think it really boils down to just one good reason: if profiling is too expensive (that is, you don't know how, or it's too hard, or the tools to do it cost too much), then you're not going to do it. I don't know how I'd profile a compile process on a Microsoft Windows computer. It's probably possible, but I can't think of a good way to do it. It's all about knowing; if you knew how to do it, and it were easy, you'd do it before you spent two days and a few hundred bucks on an upgrade that might not give you what you wanted.

I do know that in the Oracle world, it's not hard anymore, and the tools don't cost nearly as much as they used to. There's no need anymore to upgrade something before you know specifically what's going to happen to your response times. Why guess... when you can know.

Tuesday, March 31, 2009

Last call for C. J. Date course

Note added 3 April 2009: When I wrote this post, we were counting down toward 2 April as the date for our preliminary go/no-go decision. That date is now behind us, and we have made the preliminary decision to Go. We are accepting further enrollments. —Cary Millsap
Thursday 2 April 2009 is our last call for enrollment in C. J. Date's course, "How to write correct SQL, and know it: a relational approach to SQL." I'm looking forward to this course more eagerly than anything I've attended in the past ten years, ...maybe twenty.

SQL and I never really got along too well. When I first joined Oracle Corporation in 1989, I was new to relational databases. I had done one hierarchical database project in college. I enjoyed the project ok, but it wasn't something I ever wanted to do again. When I joined Oracle, I didn't know much about relational technology or SQL. In my formative first couple of years at Oracle, though, I just never learned to like the SQL language. Prior to my Oracle career, I designed languages and wrote compilers for a living. From a language design standpoint, it just seemed that SQL (at least "Oracle SQL") could have become something really cool, but it didn't. For Oracle to treat an empty string as NULL, for example, is a decision which I still can't believe made it into the light of day...

I had a lot of respect over the years for the people I met who knew how to make SQL do what they wanted it to do. Dominic Delmolino was one of the first people I ever met who could make SQL do things I had no idea it could do. I'm still amazed when I see the things that Tom Kyte can do with SQL. I was never one of the SQL people.

Lex de Haan is the first person I ever met who really revealed to me what my problem was. A few years ago, Lex delivered a Miracle presentation in Rødby, Denmark, that dropped my jaw. He explained a better way to write an application with SQL. He showed how to write a completely unambiguous specification using a language I understood, predicate calculus ("this set equals that set," that kind of thing). He then showed how to implement that specification in SQL.

Here's the problem, though. SQL doesn't implement many of the set-theory/predicate-calculus operations that I expect. I'm not looking at Lex's notes as I write this, so I'll show you an example outlined recently by Toon Koppelaars, Lex's coauthor on the brilliant book called Applied Mathematics for Database Professionals (Expert's Voice).

In SQL, there's no "set equality" operator. That's right, although SQL is a set processing language, it has no operator for testing whether one set A equals another set B. But set equality "A = B" can be rewritten as "(A is-a-subset-of B) and (B is-a-subset-of A)".

Unfortunately, SQL doesn't have an is-a-subset-of operator either. But "A is-a-subset-of B" can be rewritten into "A minus B = the-empty-set".

But SQL also lacks the concept of an empty set. The way to express that is to test whether the cardinality of a set is zero, as in "count(*)=0".
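Here's a small sketch of that chain of rewrites, run from Python against an in-memory SQLite database so that it's self-contained. The table names, the column name x, and the helper sets_equal are all made up for the example. SQLite spells the difference operator EXCEPT, where Oracle spells it MINUS.

```python
import sqlite3


def sets_equal(conn, table_a, table_b):
    """True when two single-column tables hold the same set of values.

    A = B  is rewritten as  (A minus B) is empty  and  (B minus A) is empty,
    and "is empty" is tested with count(*) = 0.
    Table names are interpolated directly, so pass trusted names only.
    """
    sql = f"""
        select (select count(*) from (select x from {table_a}
                                      except
                                      select x from {table_b})) = 0
           and (select count(*) from (select x from {table_b}
                                      except
                                      select x from {table_a})) = 0
    """
    return bool(conn.execute(sql).fetchone()[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table a (x integer); insert into a values (1), (2), (3);
        create table b (x integer); insert into b values (3), (2), (1);
        create table c (x integer); insert into c values (1), (2);
    """)
    print(sets_equal(conn, "a", "b"))  # same members, different order
    print(sets_equal(conn, "a", "c"))  # c is a proper subset of a
```

Three rewrites to say "A = B" is a lot of ceremony, but each step is mechanical, and that's what makes it checkable.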

Over the course of an hour-long presentation, Lex showed me a dozen or so operators that are missing from SQL, which we really need for expressing our intentions clearly in SQL. He put structure around the negative feelings I had toward the language. And then he showed an equivalent translation for each missing operator that could be implemented in SQL, which invested back into the language a new power. That's the trick that caused my jaw to fall. In Lex's presentation, the game of writing applications in SQL went from this:
  1. Implement complex thoughts in crappy language that requires me to record my thoughts in a format that doesn't much resemble my thinking.
  2. Worry whether the implementation was really right.
...to this:
  1. Record complex thoughts using a language designed well to record exactly such thoughts.
  2. Translate the specification of the program into SQL, using translation patterns.
Since our predicate calculus expressions were explicit enough to be provable, and since we could prove the correctness of the translations we were using to move from our specification to our SQL, we could actually then prove the correctness of our SQL. It was a beam of hope that developers could actually write correct applications ...and know it!

That's the first day I ever got excited thinking about SQL.

So, on April 27–29 in Dallas, I'll get a chance to enter the next phase of that thinking. On top of that, the message will be delivered by Chris Date, who I really enjoyed at the Hotsos Symposium earlier this month, and who is one of the pioneers who invented the whole world our careers live in. I'm looking forward to it. It should be an interesting classroom, with Chris Date in the front and Karen Morton, Jeff Holt, and some others with me in the back. I hope you won't miss the opportunity.

Like I said, the final day to sign up is this Thursday 2 April 2009. I know that economic times are tough these days, but this is a one-of-a-kind education event that I believe will deliver lasting value to everyone who goes.

Friday, February 27, 2009

Dad, do I really need math?

My kids are pretty good about their math homework. They seem to enjoy it for the most part. It wasn't always that way. When the going gets tough, the natural human response, it seems, is to quit. So at times in our kids' school careers, their Mom and I have had to hang tough with them to try to make them do their homework. (The credit here belongs to their Mom.)

I remember when I was in school, the prevailing attitude in the classroom was, "When are we ever going to need to know this?" The much sadder one was, "My Mom and Dad said that I'm never going to need to know this stuff."

I couldn't have told you, when I was 10 years old, that I'd need to understand queueing theory one day in order to finish an Oracle project I had to do for Fidelity Investments. Or that I'd be able to win a Jim Sundberg autographed World Series baseball by using the distributive law of multiplication in my head while he was showing 400 people how Gaylord Perry liked his signs on the mound. It didn't matter to me, because I just had faith that there was a good reason I was supposed to learn everything I could in school. Having that particular faith was no accident.

I don't remember my Mom and Dad ever forcing me into doing math. I knew, of course, that it was My Job to do as well as I could in school ('A's are loafing unless they're '100's). But I don't remember ever feeling forced.

One of the things I fondly remember my Dad doing with me was glide slope calculation. Dad flew for many years for United Airlines. He retired as a 767 captain a long time ago. One of his priorities as a professional was to conserve fuel for his employer. It used to bug him when a pilot would constantly monkey around with the throttle during the approach to a landing. My Dad told me his goal on approach was to dial back the power one time at cruise altitude, at the very beginning of the descent, and then never touch it again until he turned on the thrust reversers after touchdown.

So he played this game with me, especially on car rides, because it was a 30-minute drive each day to where I went to grade school. He'd give me the altitude we were at and the altitude we needed to descend to, and either a time limit or the number of miles outbound we were. Then he'd ask me to calculate the sink rate in my head. He put triangles into my brain that I could see every time he asked me a question like that, and I'd hatch on it with him until we came up with the right sink rate. Or he would ask me things like, if the nose is pointing to heading 026, then what heading is our tail pointed at. So he put circles into my brain, too.

Every once in a while—oh, and I loved this—he would give me a paper flight plan form, with dozens of tiny cells to fill in, and I would fill them all in. I was 6 or 7 when we were doing that. I of course didn't know how to do it correctly, but I filled it all in anyway. Whenever I was really worried about doing it "right," I'd ask my Dad, and he'd tell me the kinds of things I should write down and which cells I should write them in.

You know the biggest value of that flight planning experience? It was that I couldn't wait to find out in school someday what point meant. You know, as in "three point five." I remember the day in class when a teacher finally taught us about decimal points. I felt sooo cool because now I knew what "three point five" actually meant.

My Dad did things with me that got me interested and excited about doing math, all on my own, without making me feel like I was being punished by it. Thus the abundance of wonderful opportunities that I have today is largely a continuing gift from him. I hope that another gift he gave me is the ability to be a good enough dad myself for my own kiddos, but of course I worry that I'm not doing it enough, or well enough. Telling stories about it helps remind me how important it is.

What reminded me of all this is a little document called "A Short Course in Human Relations," autographed by Bobby Bragan. It sits here in the foyer of our Method R office. I see it every single time I walk through our door. You've probably heard the following statement:
Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you should be perfectly comfortable.
Bobby Bragan said that; I think it was in 1963. It is a classic illustration of skew, which is vitally important to my career. Bobby Bragan, though, is an American hero for lots of good reasons. You should read about him.

Well, one night a few years ago, I got to watch Bobby Bragan speak to a small group. His talk was fascinating. He brought a huge box of stuff up to the podium with him, and he warmed up with a game. He opened by pulling something out of the box and saying whoever can answer this riddle gets the prize. The first one was something like, "What has eighteen legs and two breasts?" Shocker, right? The answer was The Supreme Court. Whoever said that, Bobby Bragan tossed him the first prize of the night.

Pretty deep into his speech, he must have given out twenty prizes to people. Not me. I either didn't know the answer, or I didn't say it loud enough or fast enough. I watched prize after prize go out, until he brought out this autographed document called "A Short Course in Human Relations." He read it aloud. It was an important part of his speech. And then he asked the question that went with it: "Nine ballplayers come out of the dugout before each game, and each ballplayer shakes the hand of every teammate. How many handshakes is that?" The voice that said "thirty-six" was mine. I was doggone lucky that Bobby Bragan had asked a bunch of baseball players a math question, and right on the prize that I really wanted, too.

Math. You really never know when you're going to need it.

Friday, February 20, 2009

Dang it, people, they're syscalls, not "waits"...

So many times, I see people get really confused about how to attack an Oracle performance problem, resulting in thoughts that look like this:
I don’t understand why my program is so slow. The Oracle wait interface says it’s just not waiting on anything.
The confusion begins with the name "wait event." I wish Oracle hadn't called them that. I wish instead of WAIT in the extended SQL trace output, they had used the token SYSCALL. Ok, that's seven bytes of trace data instead of just four, so maybe OS instead of WAIT. I wish that they had called v$session_wait either v$session_syscall or v$session_os.

Here's why. First, realize that an Oracle "wait event" is basically the instrumentation for one operating system subroutine call ("syscall"). For example, the Oracle event called db file sequential read: that's instrumentation for a pread call on our Linux box. On the same system, a db file scattered read covers a sequence of two syscalls: _llseek and readv (that's one reason why I said basically at the beginning of this paragraph). The event called enqueue: that's a semtimedop call.

Second, the word wait is easy to misinterpret. To the Oracle kernel developer who wrote the word WAIT into the Oracle source code, the word connoted the duration that the code path he was writing would have to "wait" for some syscall to return. But to an end-user or performance analyst, the word wait has lots of other meanings, too, like (to name just two):
  1. How long the user has to wait for a task to complete (this is R in the R = S + W equation from queueing theory).
  2. How long the user's task queues for service on a specific resource (this is W in the R = S + W equation from queueing theory).
The problem is that, as obvious and useful as these two definitions seem, neither one of them means what the word wait means in an Oracle context, which is:
wait n. In an Oracle context, the approximate response time of what is usually a single operating system call (syscall) executed by an Oracle kernel process.
That's a problem. It's a big problem when people try to stick Oracle wait times into the W slot of mathematical queueing models. Because they're not W values; they're R values. (But they're not the same R values as in #1 above.)
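To see how far apart R and W can be, here's a sketch using the M/M/1 queueing model. That model is my assumption for the illustration (the argument above needs only R = S + W), and the 5 ms service time is hypothetical. In M/M/1, R = S / (1 - ρ), where ρ is utilization.

```python
def mm1_response_time(service_time, utilization):
    """R for an M/M/1 queue: R = S / (1 - rho), valid for 0 <= rho < 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


def queueing_delay(service_time, utilization):
    """W in R = S + W: the part of R spent queued rather than being served."""
    return mm1_response_time(service_time, utilization) - service_time


if __name__ == "__main__":
    s = 0.005  # hypothetical 5 ms service time for one read call
    for rho in (0.1, 0.5, 0.9):
        r = mm1_response_time(s, rho)
        w = queueing_delay(s, rho)
        print(f"rho={rho:.1f}  R={r * 1000:.1f} ms  W={w * 1000:.1f} ms")
```

If a timed syscall behaved like this, the duration Oracle reports for it would be the R column, service time included. Putting that number into a model's W slot counts S twice.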

But that's a digression from a much more important point: I think the word wait simply confuses people into thinking that response time is something different than what it really is. Response time is simply how long it takes to execute a given code path.
To understand response time, you have to understand code path.
This is actually the core tenet that divides people who "tune" into two categories: people who look at code path, and people who look at system resources.

Here's an example of what code path really looks like, for an Oracle process:
begin prepare (dbcall)
  execute Oracle kernel code path (mostly CPU)
  maybe make a syscall or two (e.g., "latch: library cache")
  maybe even make recursive prepare, execute, or fetch calls (e.g., view resolution)
end prepare
maybe make a syscall or two (e.g., "SQL*Net message...")
begin execute (another dbcall)
  execute Oracle kernel code path
  maybe make some syscalls (e.g., "db file sequential read" for updates)
end execute
maybe make a syscall or two
begin fetch (another dbcall)
  execute Oracle kernel code path (acquire latches, visit the buffer cache, ...)
  maybe make some syscalls (e.g., "db file...read")
end fetch
make a syscall or two
The trick is, you can't see this whole picture when you look at v$whatever within Oracle. You have to look at a lot of v$whatevers and do a lot of work reconciling what you find, to come up with anything close to a coherent picture of your code path.

But when you look at the Oracle code path, do you see how the syscalls just kind of blend in with the dbcalls? It's because they're all calls, and they all take time. It's non-orthogonal thinking to call syscalls something other than what they really are: just subroutine calls to another layer in the software stack. Calling all syscalls waits diminishes the one distinction that I think really actually is important; that's the distinction between syscalls that occur within dbcalls and the syscalls that occur between dbcalls.
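As a toy illustration of that point, here's a sketch that treats every step of a code path as just a call with a duration, and keeps only the one distinction I care about: within a dbcall or between dbcalls. The data are invented (the event names are the ones used in the outline above), and this does not parse real trace files.

```python
from collections import defaultdict

# A made-up code path: (kind, name, seconds, inside_dbcall).
# "dbcall" rows carry only the time spent in Oracle kernel code itself;
# "syscall" rows are flagged by whether they ran within or between dbcalls.
CODE_PATH = [
    ("dbcall", "prepare", 0.002, True),
    ("syscall", "latch: library cache", 0.001, True),
    ("syscall", "SQL*Net message...", 0.150, False),
    ("dbcall", "execute", 0.010, True),
    ("syscall", "db file sequential read", 0.040, True),
    ("syscall", "SQL*Net message...", 0.120, False),
    ("dbcall", "fetch", 0.020, True),
    ("syscall", "db file sequential read", 0.060, True),
]


def profile(code_path):
    """Total seconds per (where, name), largest contributor first."""
    totals = defaultdict(float)
    for kind, name, seconds, inside in code_path:
        where = "within dbcall" if inside else "between dbcalls"
        totals[(where, name)] += seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (where, name), seconds in profile(CODE_PATH):
        print(f"{seconds:7.3f}s  {where:15}  {name}")
```

Nothing in the loop cares whether a row is a dbcall or a syscall. They're all calls, and they all take time.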

It's the reason I like extended SQL trace data so much: it lets me look at my code path without having to spend a bunch of extra time trying to compose several different perspectives of performance into a coherent view. The coherent view I want is right there in one place, laid out sequentially for me to look at, and that coherent view fits what the business needs to be looking at, as in...

Scene 1:
  • Business person: Our TPS Report is slow.
  • Oracle person: Yes, our system has a lot of waits. We're working on it.
  • (Later...) Oracle person: Great news! The problem with the waits has been solved.
  • Business person: Er, but the TPS Report is still slow.
Scene 2:
  • Business person: Our TPS Report is slow.
  • Oracle person: I'll look at it.
  • (Later...) Oracle person: I figured out your problem. The TPS Report was doing something stupid that it didn't need to do. It doesn't anymore.
  • Business person: Thanks; I noticed. It runs in, like, only a couple seconds now.