Tuesday, August 15, 2017

Words I Don’t Use, Part 5: “Wait”

The fifth “word I do not use” is the Oracle technical term wait.

The Oracle Wait Interface

In 1991, Oracle Corporation released some of the most important software instrumentation of all time: the “wait” statistics that were implemented in Oracle 7.0. Here’s part of the story, in Juan Loaiza’s words, as told in Nørgaard et al. (2004), Oracle Insights: Tales of the Oak Table.
This stuff was developed because we were running a benchmark that we could not get to perform. We had spent several weeks trying to figure out what was happening with no success. The symptoms were clear—the system was mostly idle—we just couldn’t figure out why.

We looked at the statistics and ratios and kept coming up with theories, the trouble was that none of them were right. So we wasted weeks tuning and fixing things that were not the problem. Finally we ran out of ideas and were forced to go back and instrument the code to figure out what the problem was.

Once the waits were instrumented the problem was diagnosed in minutes. We were having “free buffer” waits because the DBWR was not writing blocks fast enough. It’s amazing how hard that was to figure out with statistics, and how easy it was to figure out once the waits were instrumented.

...In retrospect a lot of the names could be greatly improved. The wait interface was added after the freeze date as a “stealth” project so it did not get as well thought through as it should have. Like I said, we were just trying to solve a problem in the course of a benchmark. The trouble is that so many people use this stuff now that if you change the names it will break all sorts of tools, so we have to leave them alone.
Before Juan’s team added this code, the Oracle kernel would show you only how much time its user calls (like parse, exec, and fetch) were taking. The new instrumentation, which included a set of new fixed views like v$session_wait and new WAIT lines in our trace files, showed how much time Oracle’s system calls (like reads, writes, and semops) were taking.

The Working-Waiting Model

The wait interface begat a whole new mental model about Oracle performance, based on the principle of working versus waiting:
Response Time = Service Time + Wait Time
In this formula, Oracle defines service time as the CPU time consumed by your Oracle session (the duration Oracle spent working), and wait time as the sum of the durations of your Oracle wait events (the duration Oracle spent waiting). Of course, response time in this formula means the duration spent inside the Oracle Database kernel.
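To make the formula concrete, here is a tiny Python sketch with entirely made-up numbers (none of these figures come from a real system): it adds one session’s CPU time to the durations of its wait events to get the time spent inside the database kernel.

```python
# Hypothetical numbers, purely to illustrate R = S + W for one Oracle session.
cpu_seconds = 42.0                        # "service time": CPU consumed by the session

wait_events = {                           # "wait time": durations of the session's wait events
    "db file sequential read":   31.5,
    "log file sync":              4.2,
    "SQL*Net message to client":  0.3,
}

service_time = cpu_seconds
wait_time = sum(wait_events.values())
response_time = service_time + wait_time  # time spent inside the Oracle kernel

print(f"R = {service_time:.1f} s (working) + {wait_time:.1f} s (waiting) = {response_time:.1f} s")
```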

Why I Don’t Say Wait, Part 1

There are two reasons I don’t use the word wait. The first is simply that it’s ambiguous.

The Oracle formula is okay for talking about database time, but the scope of my attention is almost never just Oracle’s response time—I’m interested in the business’s response time. And when you think about the whole stack (which, of course, you do; see “holistic”), there are events we could call wait events all the way up and down:
  • The customer waits for an answer from a user.
  • The user waits for a screen from the browser.
  • The browser waits for an HTML page from the application server.
  • The application server waits for a database call from the Oracle kernel.
  • The Oracle kernel waits for a system call from the operating system.
  • The operating system’s I/O request waits to clear the device’s queue before receiving service.
  • ...
If I say waits, the users in the room will think I’m talking about application response time, the Oracle people will think I’m talking about Oracle system calls, and the hardware people will think I’m talking about device queueing delays. Even when I’m not.

Why I Don’t Say Wait, Part 2

There is a deeper problem with wait than just ambiguity, though. The word wait invites a mental model that actually obscures your thinking about performance.

Here’s the problem: waiting sounds like something you’d want to avoid, and working sounds like something you’d want more of. Your program is waiting?! Unacceptable. You want it to be working. The connotations of the words working and waiting are unavoidable. It sounds like, if a program is waiting a lot, then you need to fix it; but if it’s working a lot, then it is probably okay. Right?

Actually, no.

The connotations “work is virtuous” and “waits are abhorrent” are false connotations in Oracle. One is not inherently better or worse than the other. Working and waiting are not accurate value judgments about Oracle software. On the contrary, they’re not even meaningful; they’re just arbitrary labels. We could just as well have been taught to say that an Oracle program is “working on disk I/O” and “waiting to finish its CPU instructions.”

The terms working and waiting really just refer to different subroutine call types:

  • “Oracle is working” means “your Oracle kernel process is executing a user call.”
  • “Oracle is waiting” means “your Oracle kernel process is executing a system call.”

The working-waiting model implies a distinction that does not exist, because these two call types have equal footing. One is no worse than the other, except by virtue of how much time it consumes. It doesn’t matter whether a program is working or waiting; it only matters how long it takes.

Working-Waiting Is a Flawed Analogy

The working-waiting paradigm is a flawed analogy. I’ll illustrate. Imagine two programs that consume 100 seconds apiece when you run them:

  • Program A: 98 seconds executing system calls (waiting) + 2 seconds executing user calls (working) = 100 seconds total.
  • Program B: 98 seconds executing user calls (working) + 2 seconds executing system calls (waiting) = 100 seconds total.

To improve program A, you should seek to eliminate unnecessary system calls, because that’s where most of A’s time has gone. To improve B, you should seek to eliminate unnecessary user calls, because that’s where most of B’s time has gone. That’s it. Your diagnostic priority shouldn’t be based on your calls’ names; it should be based solely on your calls’ contributions to total duration. Specifically, conclusions like, “Program B is okay because it doesn’t spend much time waiting,” are false.
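If it helps, here is the same point as a few lines of Python, using the made-up Program A and Program B numbers from the example above: rank each program’s call types purely by their contribution to total duration, ignoring whether the label says working or waiting.

```python
# Made-up profiles from the example above: seconds by call type.
profiles = {
    "Program A": {"system calls (waiting)": 98, "user calls (working)": 2},
    "Program B": {"user calls (working)": 98, "system calls (waiting)": 2},
}

for program, calls in profiles.items():
    total = sum(calls.values())
    print(program)
    # Diagnostic priority: sort by contribution to total duration, not by name.
    for call_type, seconds in sorted(calls.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {seconds:5.1f} s  {seconds / total:6.1%}  {call_type}")
```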

A Better Model

I find that discarding the working-waiting model helps people optimize better. Here’s how you can do it. First, understand the substitute phrasing: working means executing a user call; and waiting means executing a system call. Second, understand that the excellent ideas people use to optimize other software are excellent ideas for optimizing Oracle, too:
  1. Any program’s duration is a function of all of its subroutine call durations (both user calls and system calls), and
  2. A program is running as fast as possible only when (1) its unnecessary calls have been eliminated, and (2) its necessary calls are running at hardware speed.
Oracle’s wait interface is vital because it helps us measure an Oracle program’s complete execution duration—not just Oracle’s user calls, but its system calls as well. But I avoid saying wait to help people steer clear of the incorrect bias introduced by the working-waiting analogy.

Wednesday, March 3, 2010

RobB's Question about M/M/m

Today, user RobB left a comment on my recent blog post, asking this:
I have some doubts about how valid an M/M/m model is in a typical database performance scenario. Taking the example from the wait chapter of your book where you have a 0.49 second (service_time) query that you want to perform in less than a second, 95 percent of the time. The most important point here is that the assumption of an exponential distribution for service_time immediately states that about 13% of the queries will take more than 2X (average service time), with the most likely value, going the other way, being near 0 seconds. From this assumption alone, it is immediately clear that it is impossible to meet the design criteria without looking at anything else. From your article and the link to the Kendall notation, wouldn’t an M/D/m model be more appropriate when looking at something like SQL query response time? Something like M/M/m seems more suited to queueing at the supermarket, for example, and probably many other ‘human interactive’ scenarios compared to a single sub-component of an IT system.
Here’s my answer, part 1.

RobB,

First, I believe your observation about the book example is correct. It is correct that if service times are exponentially distributed, then about 13% (13.5335%, more precisely) of those times will be 2S or longer, where S is the mean service time. So in the problem I stated, it would be impossible to achieve subsecond response time in more than about 87% of executions, even if there were no competing workload at all. You’re right: you don't need a complicated model to figure that out. You could get that straight from the CDF of the exponential distribution.
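For anyone who wants to verify that arithmetic, here is the exponential-CDF calculation in a few lines of Python. The 0.49-second mean comes from the book example; nothing here is Oracle-specific.

```python
import math

S = 0.49                               # mean service time from the book example, in seconds

# P(T > 2S) for an exponential distribution is e^(-2), independent of S.
p_over_2s = math.exp(-2)               # ~0.1353: about 13.5% of service times are 2S or longer

# Best case with zero queueing: P(T <= 1 second) when the mean is 0.49 s.
p_subsecond = 1 - math.exp(-1 / S)     # ~0.87

print(f"P(T > 2S)            = {p_over_2s:.4%}")
print(f"P(T <= 1 s | S=0.49) = {p_subsecond:.4%}")
```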

However, I think the end of the example provides significant value, where it demonstrates how to use the M/M/m model to prove that you're not going to be able to meet your design criteria unless you can work the value of S down to .103 seconds or less (Optimizing Oracle Performance Fig 9-26, p277). I’ve seen lots of people argue, “You need to tune that task,” but until M/M/m, I had never seen anyone be able to say what the necessary service time goal was, which of course varies as a function of the anticipated arrival rate. A numerical goal is what you need when you have developers who want to know when they’re finished writing code.
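For readers who want to experiment with that kind of modeling, here is a rough Python sketch of the M/M/m machinery. To keep it short, it computes only the mean response time R = f(λ, μ, m) via the Erlang C formula, not the 95th-percentile criterion the book example actually uses, and the arrival rate and server count below are made up purely for illustration.

```python
from math import factorial

def erlang_c(arrival_rate, service_rate, servers):
    """Probability that an arriving request must queue in an M/M/m system."""
    a = arrival_rate / service_rate           # offered load, in Erlangs
    rho = a / servers                          # per-server utilization; must be < 1
    if rho >= 1:
        raise ValueError("system is unstable: utilization >= 1")
    top = a**servers / factorial(servers)
    bottom = (1 - rho) * sum(a**k / factorial(k) for k in range(servers)) + top
    return top / bottom

def mean_response_time(arrival_rate, service_rate, servers):
    """Mean R = S + W for an M/M/m queue, where S = 1/mu."""
    wq = erlang_c(arrival_rate, service_rate, servers) / (servers * service_rate - arrival_rate)
    return 1 / service_rate + wq

# Made-up workload: 10 requests per second arriving at m = 8 servers.
lam, m = 10.0, 8
for S in (0.49, 0.25, 0.103):                  # candidate mean service times, in seconds
    print(f"S = {S:5.3f} s  ->  mean R = {mean_response_time(lam, 1 / S, m):.3f} s")
```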

With regard to whether real-life service times are really exponentially distributed, you’ve got me wondering now, myself. If service times are exponentially distributed, then for any mean service time S, there’s a 9.5% probability that a randomly selected service time will be less than .1S (in Mathematica, CDF[ExponentialDistribution[1/s],.1s] is 0.0951626 if 0.1s > 0). I’ve got to admit that at the moment, I’m baffled as to how this kind of distribution would model any real-life service process, human, IT, or otherwise.

On its face, it seems like a distribution that prohibits service times smaller than a certain minimum value would be a better model (or perhaps, as you suggest, even fixed service times, as in M/D/m). I think I’m missing something right now that I used to know, because I remember thinking about this previously.

I have two anecdotal pieces of evidence to consider.

One, nowhere in my library of books dedicated to the application of queueing theory to modeling computer software performance (that’s more than 6,000 pages, over 14 inches of material) do Kleinrock, Allen, Jain, Gunther, Menascé, et al. mention an M/D/m queueing system. That’s no proof that M/D/m is not the right answer, but it implies that an awful lot of thinking has gone into the application of queueing theory to software applications without anyone deciding that M/D/m is important enough to write about.

Two, I’ve used M/M/m before in modeling a trading system for a huge investment management company. The response time predictions that M/M/m produced were spectacularly accurate. We did macro-level testing only, comparing response times predicted by M/M/m to actual response times measured by Tuxedo. We didn’t check to see whether service times were exponentially distributed, because the model results were consistently within 5% of perfect accuracy.

Neither of these is proof, of course, that M/M/m is superior in routine applicability to M/D/m. One question I want to answer is whether an M/D/m system would provide better or worse performance than a similar M/M/m system. My intuition is leaning in favor of believing that the M/M/m system would give better performance. If that’s true, then M/M/m is an optimistic model compared to M/D/m, which means that if a real-life system is M/D/m and an M/M/m model says it’s not going to meet requirements, then it assuredly won’t.

I did find a paper online by G. J. Franx about M/D/m queueing. Maybe that paper contains an R=f(λ,μ) function that I can use to model an M/D/m system, which would enable me to do the comparison. I’ll look into it.

Then there’s the issue of whether M/M/m or M/D/m is a more appropriate model for a given real circumstance. The answer to that is simple: test your service times to see if they’re exponentially distributed. The Perl code in Optimizing Oracle Performance, pages 248–254 will do that for you.
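That Perl code isn’t reproduced here, but for readers working in other languages, here is one rough way to run a similar check in Python, assuming NumPy and SciPy are available. It is a plain Kolmogorov–Smirnov screen, not the book’s method.

```python
import numpy as np
from scipy import stats

def looks_exponential(service_times, alpha=0.05):
    """Crude screen: Kolmogorov-Smirnov test of the sample against an
    exponential distribution whose mean is estimated from the sample.
    (Strictly, estimating the parameter from the data calls for a
    Lilliefors-style correction, so treat this only as a rough check.)"""
    times = np.asarray(service_times, dtype=float)
    mean = times.mean()
    statistic, p_value = stats.kstest(times, "expon", args=(0, mean))
    return p_value > alpha, statistic, p_value

# Made-up example: 1,000 service times that really are exponential with mean 0.49 s.
rng = np.random.default_rng(seed=1)
sample = rng.exponential(scale=0.49, size=1000)
print(looks_exponential(sample))
```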

Friday, February 20, 2009

Dang it, people, they're syscalls, not "waits"...

So many times, I see people get really confused about how to attack an Oracle performance problem, resulting in thoughts that look like this:
I don’t understand why my program is so slow. The Oracle wait interface says it’s just not waiting on anything.
The confusion begins with the name "wait event." I wish Oracle hadn't called them that. I wish that instead of WAIT in the extended SQL trace output, they had used the token SYSCALL. OK, that's seven bytes of trace data instead of just four, so maybe OS instead of WAIT. And I wish they had called v$session_wait either v$session_syscall or v$session_os.

Here's why. First, realize that an Oracle "wait event" is basically the instrumentation for one operating system subroutine call ("syscall"). For example, the Oracle event called db file sequential read: that's instrumentation for a pread call on our Linux box. On the same system, a db file scattered read covers a sequence of two syscalls: _llseek and readv (that's one reason why I said basically at the beginning of this paragraph). The event called enqueue: that's a semtimedop call.

Second, the word wait is easy to misinterpret. To the Oracle kernel developer who wrote the word WAIT into the Oracle source code, the word connoted the duration that the code path he was writing would have to "wait" for some syscall to return. But to an end-user or performance analyst, the word wait has lots of other meanings, too, like (to name just two):
  1. How long the user has to wait for a task to complete (this is R in the R = S + W equation from queueing theory).
  2. How long the user's task queues for service on a specific resource (this is W in the R = S + W equation from queueing theory).
The problem is that, as obvious and useful as these two definitions seem, neither one of them means what the word wait means in an Oracle context, which is:
wait n. In an Oracle context, the approximate response time of what is usually a single operating system call (syscall) executed by an Oracle kernel process.
That's a problem. It's a big problem when people try to stick Oracle wait times into the W slot of mathematical queueing models. Because they're not W values; they're R values. (But they're not the same R values as in #1 above.)
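To see how far apart those R and W values can be, here is a made-up single-device example in a few lines of Python, using plain M/M/1 arithmetic (the 5 ms service time and 60% utilization are invented purely for illustration).

```python
# Hypothetical disk modeled as M/M/1: 5 ms mean service time, 60% busy.
S = 0.005                      # mean service time at the device, in seconds
rho = 0.60                     # device utilization

W = S * rho / (1 - rho)        # mean time spent queueing for the device: 7.5 ms
R = S + W                      # mean time from I/O request to completion: 12.5 ms

# An Oracle "db file sequential read" elapsed time approximates R (the whole
# syscall), not W. Feeding it into a model's W slot double-counts the service time.
print(f"S = {S * 1000:.1f} ms, W = {W * 1000:.1f} ms, R = {R * 1000:.1f} ms")
```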

But that's a digression from a much more important point: I think the word wait simply confuses people into thinking that response time is something different than what it really is. Response time is simply how long it takes to execute a given code path.
To understand response time, you have to understand code path.
This is actually the core tenet that divides people who "tune" into two categories: people who look at code path, and people who look at system resources.

Here's an example of what code path really looks like, for an Oracle process:
begin prepare (dbcall)
  execute Oracle kernel code path (mostly CPU)
  maybe make a syscall or two (e.g., "latch: library cache")
  maybe even make recursive prepare, execute, or fetch calls (e.g., view resolution)
end prepare
maybe make a syscall or two (e.g., "SQL*Net message...")
begin execute (another dbcall)
  execute Oracle kernel code path
  maybe make some syscalls (e.g., "db file sequential read" for updates)
end execute
maybe make a syscall or two
begin fetch (another dbcall)
  execute Oracle kernel code path (acquire latches, visit the buffer cache, ...)
  maybe make some syscalls (e.g., "db file...read")
end fetch
make a syscall or two
The trick is, you can't see this whole picture when you look at v$whatever within Oracle. You have to look at a lot of v$whatevers and do a lot of work reconciling what you find, to come up with anything close to a coherent picture of your code path.
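By contrast, a resource profile falls almost directly out of extended SQL trace data. Here is a rough Python sketch using a made-up, heavily simplified trace excerpt (assuming 10g-style fields, where c, e, and ela are in microseconds); real trace files contain more line types and version-dependent fields.

```python
import re
from collections import defaultdict

# Made-up, heavily simplified extended SQL trace excerpt.
trace = """\
PARSE #1:c=1000,e=1500,...
EXEC #1:c=2000,e=2300,...
WAIT #1: nam='db file sequential read' ela= 8000 ...
FETCH #1:c=4000,e=12500,...
WAIT #1: nam='SQL*Net message from client' ela= 250000 ...
"""

dbcall = re.compile(r"^(PARSE|EXEC|FETCH)\s+#\d+:c=(\d+),e=(\d+)")
wait   = re.compile(r"^WAIT\s+#\d+:\s+nam='([^']+)'\s+ela=\s*(\d+)")

profile = defaultdict(int)                    # microseconds by contributor
for line in trace.splitlines():
    if m := dbcall.match(line):
        profile[f"{m.group(1)} CPU"] += int(m.group(2))   # CPU within the dbcall
    elif m := wait.match(line):
        profile[m.group(1)] += int(m.group(2))             # syscall elapsed time

for name, usec in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{usec / 1e6:9.6f} s  {name}")
```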

But when you look at the Oracle code path, do you see how the syscalls just kind of blend in with the dbcalls? It's because they're all calls, and they all take time. It's non-orthogonal thinking to call syscalls something other than what they really are: just subroutine calls to another layer in the software stack. Calling all syscalls waits diminishes the one distinction that I think really actually is important; that's the distinction between syscalls that occur within dbcalls and the syscalls that occur between dbcalls.

It's the reason I like extended SQL trace data so much: it lets me look at my code path without having to spend a bunch of extra time trying to compose several different perspectives of performance into a coherent view. The coherent view I want is right there in one place, laid out sequentially for me to look at, and that coherent view fits what the business needs to be looking at, as in...

Scene 1:
  • Business person: Our TPS Report is slow.
  • Oracle person: Yes, our system has a lot of waits. We're working on it.
  • (Later...) Oracle person: Great news! The problem with the waits has been solved.
  • Business person: Er, but the TPS Report is still slow.
Scene 2:
  • Business person: Our TPS Report is slow.
  • Oracle person: I'll look at it.
  • (Later...) Oracle person: I figured out your problem. The TPS Report was doing something stupid that it didn't need to do. It doesn't anymore.
  • Business person: Thanks; I noticed. It runs in, like, only a couple seconds now.