Cary Millsap: Profiler

Showing posts with label Profiler. Show all posts

Friday, February 27, 2015

What happened to “when the application is fast enough to meet users’ requirements?”

On January 5, I received an email called “Video” from my friend and former employee Guðmundur Jósepsson from Iceland. His friends call him Gummi (rhymes with “do me”). Gummi is the guy whose name is set in the ridiculous monospace font on page xxiv of Optimizing Oracle Performance, apparently because O’Reilly’s Linotype Birka font didn’t have the letter eth (ð) in it. Gummi once modestly teased me that this is what he is best known for. But I digress...

His email looked like this:

It’s a screen shot of frame 3:12 from my November 2014 video called “Why you need a profiler for Oracle.” At frame 3:12, I am answering the question of how you can know when you’re finished optimizing a given application function. Gummi’s question is, «Oi! What happened to “when the application is fast enough to meet users’ requirements?”»

Gummi noticed (the good ones will do that) that the video says something different than the thing he had heard me say for years. It’s a fair question. Why, in the video, have I said this new thing? It was not an accident.

When are you finished optimizing?

The question in focus is, “When are you finished optimizing?” Since 2003, I have actually used three different answers:

When are you are finished optimizing?

When the cost of call reduction and latency reduction exceeds the cost of the performance you’re getting today.
Source: Optimizing Oracle Performance (2003) pages 302–304.

When the application is fast enough to meet your users’ requirements.
Source: I have taught this in various courses, conferences, and consulting calls since 1999 or so.

When there are no unnecessary calls, and the calls that remain run at hardware speed.
Source: “Why you need a profiler for Oracle” (2014) frames 2:51–3:20.

My motive behind answers A and B was the idea that optimizing beyond what your business needs can be wasteful. I created these answers to deter people from misdirecting time and money toward perfecting something when those resources might be better invested improving something else. This idea was important, and it still is.

So, then, where did C come from? I’ll begin with a picture. The following figure allows you to plot the response time for a single application function, whatever “given function” you’re looking at. You could draw a similar figure for every application function on your system (although I wouldn’t suggest it).

Somewhere on this response time axis for your given function is the function’s actual response time. I haven’t marked that response time’s location specifically, but I know it’s in the blue zone, because at the bottom of the blue zone is the special response time R_T. This value R_T is the function’s top speed on the hardware you own today. Your function can’t go faster than this without upgrading something.

It so happens that this top speed is the speed at which your function will run if and only if (i) it contains no unnecessary calls and (ii) the calls that remain run at hardware speed. ...Which, of course, is the idea behind this new answer C.

Where, exactly, is your “requirement”?

Answer B (“When the application is fast enough to meet your users’ requirements”) requires that you know the users’ response time requirement for your function, so, next, let’s locate that value on our response time axis.

This is where the trouble begins. Most DBAs don’t know what their users’ response time requirements really are. Don’t despair, though; most users don’t either.

At banks, airlines, hospitals, telcos, and nuclear plants, you need strict service level agreements, so those businesses invest into quantifying them. But realize: quantifying all your functions’ response time requirements isn’t about a bunch of users sitting in a room arguing over which subjective speed limits sound the best. It’s about knowing your technological speed limits and understanding how close to those values your business needs to pay to be. It’s an expensive process. At some companies, it’s worth the effort; at most companies, it’s just not.

How about using, “well, nobody complains about it,” as all the evidence you need that a given function is meeting your users’ requirement? It’s how a lot of people do it. You might get away with doing it this way if your systems weren’t growing. But systems do grow. More data, more users, more application functions: these are all forms of growth, and you can probably measure every one of them happening where you’re sitting right now. All these forms of growth put you on a collision course with failing to meet your users’ response time requirements, whether you and your users know exactly what they are, or not.

In any event, if you don’t know exactly what your users’ response time requirements are, then you won’t be able to use “meets your users’ requirement” as your finish line that tells you when to stop optimizing. This very practical problem is the demise of answer B for most people.

Knowing your top speed

Even if you do know exactly what your users’ requirements are, it’s not enough. You need to know something more.

Imagine for a minute that you do know your users’ response time requirement for a given function, and let’s say that it’s this: “95% of executions of this function must complete within 5 seconds.” Now imagine that this morning when you started looking at the function, it would typically run for 10 seconds in your Oracle SQL Developer worksheet, but now after spending an hour or so with it, you have it down to where it runs pretty much every time in just 4 seconds. So, you’ve eliminated 60% of the function’s response time. That’s a pretty good day’s work, right? The question is, are you done? Or do you keep going?

Here is the reason that answer C is so important. You cannot responsibly answer whether you’re done without knowing that function’s top speed. Even if you know how fast people want it to run, you can’t know whether you’re finished without knowing how fast it can run.

Why? Imagine that 85% of those 4 seconds are consumed by Oracle enqueue, or latch, or log file sync calls, or by hundreds of parse calls, or 3,214 network round-trips to return 3,214 rows. If any of these things is the case, then no, you’re absolutely not done yet. If you were to allow some ridiculous code path like that to survive on a production system, you’d be diminishing the whole system’s effectiveness for everybody (even people who are running functions other than the one you’re fixing).

Now, sure, if there’s something else on the system that has a higher priority than finishing the fix on this function, then you should jump to it. But you should at least leave this function on your to-do list. Your analysis of the higher priority function might even reveal that this function’s inefficiencies are causing the higher-priority function’s problems. Such can be the nature of inefficient code under conditions of high load.

On the other hand, if your function is running in 4 seconds and (i) its profile shows no unnecessary calls, and (ii) the calls that remain are running at hardware speeds, then you’ve reached a milestone:

if your code meets your users’ requirement, then you’re done;
otherwise, either you’ll have to reimagine how to implement the function, or you’ll have to upgrade your hardware (or both).

There’s that “users’ requirement” thing again. You see why it has to be there, right?

Well, here’s what most people do. They get their functions’ response times reasonably close to their top speeds (which, with good people, isn’t usually as expensive as it sounds), and then they worry about requirements only if those requirements are so important that it’s worth a project to quantify them. A requirement is usually considered really important if it’s close to your top speed or if it’s really expensive when you violate a service level requirement.

This strategy works reasonably well.

It is interesting to note here that knowing a function’s top speed is actually more important than knowing your users’ requirements for that function. A lot of companies can work just fine not knowing their users’ requirements, but without knowing your top speeds, you really are in the dark. A second observation that I find particularly amusing is this: not only is your top speed more important to know, your top speed is actually easier to compute than your users’ requirement (…if you have a profiler, which was my point in the video).

Better and easier is a good combination.

Tomorrow is important, too

When are you are finished optimizing?

When the cost of call reduction and latency reduction exceeds the cost of the performance you’re getting today.

When the application is fast enough to meet your users’ requirements.

When there are no unnecessary calls, and the calls that remain run at hardware speed.

Answer A is still a pretty strong answer. Notice that it actually maps closely to answer C. Answer C’s prescription for “no unnecessary calls” yields answer A’s goal of call reduction, and answer C’s prescription for “calls that remain run at hardware speed” yields answer A’s goal of latency reduction. So, in a way, C is a more action-oriented version of A, but A goes further to combat the perfectionism trap with its emphasis on the cost of action versus the cost of inaction.

One thing I’ve grown to dislike about answer A, though, is its emphasis on today in “…exceeds the cost of the performance you’re getting today.” After years of experience with the question of when optimization is complete, I think that answer A under-emphasizes the importance of tomorrow. Unplanned tomorrows can quickly become ugly todays, and as important as tomorrow is to businesses and the people who run them, it’s even more important to another community: database application developers.

Subjective goals are treacherous for developers

Many developers have no way to test, today, the true production response time behavior of their code, which they won’t learn until tomorrow. ...And perhaps only until some remote, distant tomorrow.

Imagine you’re a developer using 100-row tables on your desktop to test code that will access 100,000,000,000-row tables on your production server. Or maybe you’re testing your code’s performance only in isolation from other workload. Both of these are problems; they’re procedural mistakes, but they are everyday real-life for many developers. When this is how you develop, telling you that “your users’ response time requirement is n seconds” accidentally implies that you are finished optimizing when your query finishes in less than n seconds on your no-load system of 100-row test tables.

If you are a developer writing high-risk code—and any code that will touch huge database segments in production is high-risk code—then of course you must aim for the “no unnecessary calls” part of the top speed target. And you must aim for the “and the calls that remain run at hardware speed” part, too, but you won’t be able to measure your progress against that goal until you have access to full data volumes and full user workloads.

Notice that to do both of these things, you must have access to full data volumes and full user workloads in your development environment. To build high-performance applications, you must do full data volume testing and full user workload testing in each of your functional development iterations.

This is where agile development methods yield a huge advantage: agile methods provide a project structure that encourages full performance testing for each new product function as it is developed. Contrast this with the terrible project planning approach of putting all your performance testing at the end of your project, when it’s too late to actually fix anything (if there’s even enough budget left over by then to do any testing at all). If you want a high-performance application with great performance diagnostics, then performance instrumentation should be an important part of your feedback for each development iteration of each new function you create.

My answer

So, when are you finished optimizing?

When the cost of call reduction and latency reduction exceeds the cost of the performance you’re getting today.
When the application is fast enough to meet your users’ requirements.
When there are no unnecessary calls and the calls that remain run at hardware speed.

There is some merit in all three answers, but as Dave Ensor taught me inside Oracle many years ago, the correct answer is C. Answer A specifically restricts your scope of concern to today, which is especially dangerous for developers. Answer B permits you to promote horrifically bad code, unhindered, into production, where it can hurt the performance of every function on the system. Answers A and B both presume that you know information that you probably don’t know and that you may not need to know. Answer C is my favorite answer because it is tells you exactly when you’re done, using units you can measure and that you should be measuring.

Answer C is usually a tougher standard than answer A or B, and when it’s not, it is the best possible standard you can meet without upgrading or redesigning something. In light of this “tougher standard” kind of talk, it is still important to understand that what is optimal from a software engineering perspective is not always optimal from a business perspective. The term optimized must ultimately be judged within the constraints of what the business chooses to pay for. In the spirit of answer A, you can still make the decision not to optimize all your code to the last picosecond of its potential. How perfect you make your code should be a business decision. That decision should be informed by facts, and these facts should include knowledge of your code’s top speed.

Thank you, Guðmundur Jósepsson, of Iceland, for your question. Thank you for waiting patiently for several weeks while I struggled putting these thoughts into words.

Thursday, June 18, 2009

Profiling with my Boy

We have an article online called "Can you explain Method R so even my boss could understand it?" Today I'm going to raise the stakes, because yesterday I think I explained Method R so that an eleven year-old could understand it.

Yesterday I took my 11 year-old son Alex to lunch. I talked him into eating at one of my favorite restaurants, called Mercado Juarez, over in Irving, so it was a half hour in the car together, just getting over there. It was a big day for the two of us because we were very excited about the new June 17 iPhone OS 3.0 release. I told him about some of the things I've learned about it on the Internet over the past couple of weeks. One subject in particular that we were both interested in was performance. He likes not having to wait for click results just as much as I do.

According to Apple, the new iPhone OS 3.0 software has some important code paths in it that are 3× faster. Then, upgrading to the new iPhone 3G S hardware is supposed to yield yet another 3× performance improvement for some code paths. It's what Philip Schiller talks about at 1:42:00 in the WWDC 2009 keynote video. Very exciting.

Alex of course, like many of us, wants to interpret "3× faster" as "everything I do is going to be 3× faster." As in everything that took 10 seconds yesterday will take 3 seconds tomorrow. It's a nice dream. But it's not what seeing a benchmark run 3× faster means. So we talked about it.

I asked him to imagine that it takes me an hour to do my grocery shopping when I walk to the store. Should I buy a car? He said yes, probably, because a car is a lot faster than walking. So I asked him, what if the total time I spent walking to and from the grocery store was only one minute? Then, he said, having a car wouldn't make that much of a difference. He said you might want a car for other reasons, but he wouldn't recommend it just to make grocery shopping faster.

Good.

I said, what if grocery shopping were taking me five hours, and four of it was time spent walking? "Then you probably should get the car," he told me. "Or a bicycle."

Commit.

On the back of his menu (photo above: click to zoom), I drew him a sequence diagram (A) showing how he, running Safari on an iPhone 3G might look to a performance analyst. I showed him how to read the sequence diagram (time runs top-down, control passes from one tier to another), and I showed him two extreme ways that his sequence diagram might turn out for a given experience. Maybe the majority of the time would be spent on the 3G network tier (B), or maybe the majority of the time would be spent on the Safari software tier (C). We talked about how if B were what was happening, then a 3× faster Safari tier wouldn't really make any difference. Apple wouldn't be lying if they said their software was 3× faster, but he really wouldn't notice a performance improvement. But if C were what was happening, then a 3× faster Safari tier would be a smoking hot upgrade that we'd be thrilled with.

Sequence diagrams, check. Commit.

Now, to profiles. So I drew up a simple profile for him, with 101 seconds of response time consumed by 100 seconds of software and 1 second of 3G (D):

Software  100
3G          1
-------------
Total     101

I asked him, if we made the software 2× faster, what would happen to the total response time? He wrote down "50" in a new column to the right of the "100." Yep. Then I asked him what would happen to total response time. He said to wait a minute, he needed to use the calculator on his iPod Touch. Huh? A few keystrokes later, he came up with a response time of 50.5.

Oops. Rollback.

He made the same mistake that just about every forty year-old I've ever met makes. He figured if one component of response time were 2× faster, then the total response time must be 2× faster, too. Nope. In this case, the wrong answer was close to the right answer, but only because of the particular numbers I had chosen.

So, to illustrate, I drew up another profile (E):

Software    4
3G         10
-------------
Total      14

Now, if we were to make the software 2× faster, what happens to the total? We worked through it together:

Software    4    2
3G         10   10
------------------
Total      14   12

Click. So then we spent the next several minutes doing little quizzes. If this is your profile, and we make this component X times faster, then what's the new response time going to be? Over and over, we did several more of these, some on paper (F), and others just talking.

Commit.

Next step. "What if I told you it takes me an hour to get into my email at home? Do I need to upgrade my network connection?" A couple of minutes of conversation later, he figured out that he couldn't answer that question until he got some more information from me. Specifically, he had to ask me how much of that hour is presently being spent by the network. So we did this over and over a few times. I'd say things like, "It takes me an hour to run my report. Should I spend $4,800 tuning my SQL?" Or, "Clicking this button takes 20 seconds. Should I upgrade my storage area network?"

And, with just a little bit of practice, he learned that he had to say, "How much of the however-long-you-said is being spent by the whatever-it-was-you-said?" I was happy with how he answered, because it illustrated that he had caught onto the pattern. He realized that the specific blah-blah-blah proposed remedies I was asking him about didn't really matter. He had to ask the same question regardless. (He was answering me with a sentence using bind variables.)

Commit.

Alex hears me talk about our Method R Profiler software tool a lot, and he knows conceptually that it helps people make their systems faster, but he's never known in any real detail very much about what it does. So I told him that the profile tables are what our Profiler makes for people. To demonstrate how it does that, I drew him up a list of calls (F), which I told him was a list of visits between a disk and a CPU. ...Just a list that says the same thing that a sequence diagram (annotated with timings) would say:

D 2
C 1
D 6
D 4
D 8
C 3

I told him to make a profile for these calls, and he did (H):

Disk   20
CPU     4
---------
Total  24

Excellent. So I explained that instead of adding up lists in our head all day, we wrote the Profiler to aggregate the call-by-call durations (from an Oracle extended SQL trace file) for you into a profile table that lets you answer the performance questions we had been practicing over lunch. ...Even if there are millions of lines to add up.

The finish-up conversation in the car ride back was about how to use everything we had talked about when you fix people's performance problems. I told him the most vital thing about helping someone solve a performance problem is to make sure that the operation (the business task) that you're analyzing right now is actually the most important business task to fix first. If you're looking at anything other than the most important task first, then you're asking for trouble.

I asked him to imagine that there are five important tasks that are too slow. Maybe every one of those things has its response time dominated by a different component than all the others. Maybe they're all the same. But if they're all different, then no single remedy you can perform is going to fix all five. A given remedy will have a different performance impact upon each of the five tasks, depending on how much of the fixed thing that task was using to begin with.

So the important thing is to know which of the five profiles it is that you ought to be paying attention to first. Maybe one remedy will fix all five tasks, maybe not. You just can't know until you look at the profiles. (Or until you try your proposed remedy. But trial-and-error is an awfully expensive way to find out.)

Commit.

It was a really good lunch. I'll look forward to taking my 9-year-old (Alex's little brother) out the week after next when I get back from ODTUG.

Monday, June 9, 2008

Why Guess? When You Can Know

The comment from plαdys on my post about flash drives and databases is a great entry into exactly the right conversation to be having. He wrote:

What about Data Warehouse type databases? Lots of full table scans, less use of cache (not sure if that's true)...

The traditional approach to having this conversation is for a few enthusiastic participants to argue over what happens "most of the time" or what "can happen." I used to participate in conversations like that at conferences, on panels, and in email. Nowadays, people have conversations like this in newsgroups and blogs.

I'll submit to you that the conversation about what "can happen" and subsequent arguments about what happens "most of the time" are irrelevant. Here's why. Imagine that a conversation like that converged to a state where one person was able to argue successfully that "in precisely 99% of cases, some proposition P is true." That never happens, but go with me for a second; imagine it did.

So, now, is P true for you? Most people seem to assume that the things that happen to them are like the typical things that happen to most people. Most people would think, "If P is that common, then surely P is true for me." Maybe it is. But then again, maybe it's not. The most likely situation is that P is true for some tasks running within your system, but it's not true for others. Whether P is true for the most important tasks on your system is simply a game of chance if you're like most people who have this conversation.

My point is: Why guess? When you can know. When it comes to Amdahl's Law, you should be able to know exactly how much response time of an individual business task is being consumed upon the component of your system that you're thinking about upgrading. If you want to see an example of what it looks like, look at our Profiler page.

With Oracle systems, though, the traditional approach is not to think that way. The traditional approach is to look at measurements upon the resources that comprise a system (its CPUs, its memory, its disks, its network), not measurements upon the business tasks that the company who owns the machine is spending its money to perform.

My whole point is that if you look at the response time of your business's most important tasks (that's what Method R is all about), then you don't have to care about conversations about other people's systems, or whether your system is typical enough to follow other people's advice. You won't have to guess about stuff like that, because you'll know the specific nature and needs of your system, regardless of whether your system happens to be like anyone else's.

Stop guessing. You can know. But you have to be willing to look at performance from the outside in, from the perspective of the task being processed, not from the traditional inside-out perspective of the resources doing the work.

Wednesday, June 4, 2008

Flash Drives and Databases

I learned today about "Sun to embed flash storage in nearly all its servers." This is supposed to be good news for database professionals all over because, flash storage "...consumes one-fifth the power and is a hundred times faster [than rotating disk drives]."

Hey-hey!

Of course, flash storage is going to cost a little more. Well, I'm not sure, maybe a lot more. But, according to the article:

John Fowler, the head of Sun’s servers and storage division, said at a press conference in Boston Tuesday. “The fact that it’s not the same dollars per gigabyte is perfectly okay.”

Alright, I understand that. Get more, pay more. I'm still on board.

But I predict that a lot of people who buy flash storage are going to be disappointed. Here's why.

We all know now that flash storage is a hundred times faster than rotating disk drives. (Says so right in the article. And consumes one-fifth the power.) We all also "know" that databases are I/O intensive applications. (The article says that, too. But everybody already "knew" that anyway.)

The problem that's going to happen is the people (1) who have a slow database application, (2) who assume that their application is slow because of the I/O it is doing, (3) whose application doesn't really spend much time doing I/O at all (whether it does a "lot" of I/O is irrelevant), and (4) who buy flash storage specifically on the hope that after the installation, their database application will "be 100x faster" (because, of course, the flash storage is 100x faster than the storage it is replacing).

See the problem?

Think about Amdahl's Law: improving the speed of a component will help a user's performance only in proportion to the duration for which that user used that component in the first place. Here's an example. Imagine this response time profile:

Total response time: 100 minutes (100%)
Time spent executing OS read calls: 5 minutes (5%) (e.g., db file sequential read)
Time spent doing other stuff: 95 minutes (95%)

Now, so how much time will you save if you upgrade your disk drives to a technology that's 100x faster. The answer is that the new "Time spent executing OS read calls" will be .05 minutes, right? Well, maybe. Let's go with that for a moment. If that were true, then how much time will you save? You'll save 4.95 minutes, which is 4.95% of your original response time. Your application won't be 100x faster (or, equivalently, 99% faster), it'll be 4.95% faster.

The users in this story aren't going to be happy with this if they're thinking that the result of an expensive upgrade is going to be 100x faster performance. If they're expecting 1-minute performance and get 95.05-minute performance instead, they're going to be, um, disappointed.

Now, reality is probably not even going to be that good. Imagine that those 5 minutes our user spent in the 100-minute original application response time was consumed executing 150,000 distinct Oracle db file sequential read calls (which map to 150,000 OS read calls). That makes your single-call I/O latency 0.002 seconds per call (300 seconds divided by 150,000 calls).

That's pretty good, but it's a normal enough latency these days on today's high-powered SAN devices. If you think about rotating disk drives, then 0.002 seconds per call is mind-blowingly excellent. But I/O latencies of 0.002 seconds or better don't come from disk drives, they come from the cache that's sitting in these SANs. The read calls that result in physical disk access are taking much longer, 0.005 seconds or more. An average latency of 0.002 is possible because so many of those read calls are being fulfilled from cache.

And the flash drive upgrades aren't going to improve the latency of those calls being fulfilled from cache.

So, to recap, the best improvement you'll ever get by upgrading to flash drives is a percentage improvement that's equivalent to the percentage of time you spent before the upgrade actually making I/O calls. If a lot of your I/O calls are satisfied by reads from cache to begin with, then upgrading to flash drives will help you less than that.

The biggest performance problem most people have is that they don't know where their users' time is going. They know where their system's time is going, but that doesn't matter. What people need to see is the response time profiles of the tasks that the business regards as the most important things it needs done. That's the cornerstone of what Method R (both the method and the company) is all about.

Flash drives might help you. Maybe a lot. And maybe they'll help you a little, or maybe not at all. If you can't see individual user response times, then you'll have to actually try them to find out whether they'll be good for you or not (imagine cash register sound here).

We built our Profiler software so that when we manage Oracle systems, we can see the users' response times and not have to guess about stuff like this. When you can see your response times, you don't have to guess whether a proposed upgrade is going to help you. You'll know exactly whom will be helped, and you'll know by how much.