Friday, July 11, 2008

So how do you FIX the problems that "Performance as a Service" helps you find?

I want to respond carefully to Reuben's comment on my Performance as a Service post from July 7. Reuben asked:
i can see how you actually determine for a customer where the pain points are and you can validate user remarks about poor performance. But i don't see from your post how you are going to attack the problem of fixing the performance issue.

i would be most interested in hearing your thoughts on that. I wonder if you guys are going to touch the actual code behind the "order button" you described.

Under the service model I described in the Performance as a Service post, a client using our performance service would have several choices about how to respond to a problem. They could contract our team as consultants; they could use someone else; they could do it themselves; or of course, they could choose to defer the solution.

Attacking the problem of fixing the performance issue is actually the "easy" part. Well, it's not necessarily always easy, but it's the part that my team have been doing over and over again since the early 1990s. We use Method R. Once we've measured the response time in detail with Oracle's extended SQL trace function, we know exactly where the task is spending the end user's time. From there, I think it's fair to say we (Jeff, Karen, etc.) are pretty skilled at figuring out what to do next.
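To make "where the task is spending the end user's time" concrete, here is a minimal sketch of a response time profile in Python. It is not a parser for Oracle trace files and it is not our tooling; it assumes the timed events have already been extracted as (component, seconds) pairs, and the component names and numbers are hypothetical.

```python
from collections import defaultdict


def response_time_profile(events):
    """Aggregate (component, seconds) pairs into rows sorted by total time, largest first.

    Each row is (component, total_seconds, percent_of_response_time).
    """
    totals = defaultdict(float)
    for component, seconds in events:
        totals[component] += seconds

    grand_total = sum(totals.values())
    rows = []
    for component, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * seconds / grand_total if grand_total else 0.0
        rows.append((component, seconds, share))
    return rows


# Hypothetical timed events for one slow task (not real trace data).
events = [
    ("disk read", 4.0),
    ("CPU", 1.5),
    ("network round trip", 0.5),
    ("disk read", 2.0),
    ("CPU", 2.0),
]

for component, seconds, share in response_time_profile(events):
    print(f"{component:<20} {seconds:6.2f}s {share:5.1f}%")
```

The point of sorting by contribution is that the top row tells you where to look first, so nobody has to guess.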

Sometimes, the root cause of a performance problem requires manipulation of the application source code, and sometimes it doesn't. If you do diagnosis right, you should never have to guess which one it is. A lot of people wonder what happens if it's the code that needs modification, but the application is prepackaged, and therefore the source code is out of your control. In my experience, most vendors are very responsive to improving the performance of their products when they're shown unambiguously how to improve them.

If your application is slow, you should be eager to know exactly why it's slow, and equally eager whether you wrote the code yourself or someone else wrote it for you. To avoid collecting the right performance diagnostic data for an application because you're afraid of what you might find out is like taking your hands off the wheel and covering your eyes when a child rides his bike out in front of the car that you're driving. Information about performance problems has significant time value. Even if someone else's code is the reason for your performance problem (or whatever truth you might be afraid of learning), you need to know it as early as possible.

The SLA Manager service I talked about is so important because the most difficult part of using Method R is usually the data collection step. The difficulty is almost always more political than technical. It's overcoming the question, "Why should we change the way we collect our data?" I believe the business value of knowing how long your computer system takes to execute tasks for your business is important enough that it will get people into the habit of measuring response time. ...Which is a vital step in solving the data collection problem that's at the heart of every persistent performance problem I've ever seen. I believe the data collection service that I described will help remove the most important remaining barrier to highly manageable application software performance in our market.

Thursday, July 10, 2008

Christian Antognini's new book: Troubleshooting Oracle Performance

I learned from a friend yesterday that Chris Antognini's new book, Troubleshooting Oracle Performance, is available now. I just checked at Amazon, and the product is listed as temporarily out of stock. That's good: it means people are buying them up.

If you're an Oracle application developer, get one. If you're an Oracle database administrator, get one for yourself and a couple more for your developer friends.

I hope he sells a million of them.

Jonathan Lewis and I each wrote a foreword for Chris after seeing the work he had put into this project. Here's mine...

My Foreword for Chris's Book

I think the best thing that has happened to Oracle performance in the past ten years is the radical improvement in the quality of the information you can buy now at the bookstore.

In the old days, the books you bought about Oracle performance all looked pretty much the same. They insinuated that your Oracle system inevitably suffered from too much I/O (which is, in fact, not inevitable) or not enough memory (which they claimed was the same thing as too much I/O, which also isn’t true). They’d show you loads and loads of SQL scripts that you might run, and they’d tell you to tune your SQL. And that, they said, would fix everything.

It was an age of darkness.

Chris’s book is a member of the family tree that has brought to us, …light. The difference between the darkness and the light boils down to one simple concept. It’s a concept that your mathematics teachers made you execute from the time when you were about ten years old: show your work.

I don’t mean “show and tell,” where someone claims he has improved performance at hundreds of customer sites by hundreds of percentage points [sic], so therefore he’s an expert. I mean show your work, which means documenting a relevant baseline measurement, conducting a controlled experiment, documenting a second relevant measurement, and then showing your results openly and transparently so that your reader can follow along and even reproduce your test if he wants to.

That’s a big deal. When authors started doing that, Oracle audiences started getting a lot smarter. Since the year 2000, there has been a dramatic increase in the number of people in the Oracle community who ask intelligent questions and demand intelligent answers about performance. And there’s been an acceleration in the drowning-out of some really bad ideas that lots of people used to believe.

In this book, Chris follows the pattern that works. He tells you useful things. But he doesn’t stop there. He shows you how he knows, which is to say he shows you how you can find out for yourself. He shows his work.

That brings you two big benefits. First, showing his work helps you understand more deeply what he’s showing you, which makes his lessons easier for you to remember and apply. Second, by understanding his examples, you can understand not just the things that Chris is showing you, but you’ll also be able to answer additional good questions that Chris hasn’t covered. …Like what will happen in the next release of Oracle after this book has gone to print.

This book, for me, is both a technical and a “persuasional” reference. It contains tremendous amounts of fully documented homework that I can reuse. It also contains eloquent new arguments on several points about which I share Chris’s views and his passion. The arguments that Chris uses in this book will help me convince more people to do the Right Things.

Chris is a smart, energetic guy who stands on the shoulders of Dave Ensor, Lex de Haan, Anjo Kolk, Steve Adams, Jonathan Lewis, Tom Kyte, and a handful of other people I regard as heroes for bringing rigor to our field. Now we have Chris’s shoulders to stand on as well.

―Cary Millsap
10 April 2008

Monday, July 7, 2008

Performance as a Service

I've mentioned already that, for the second time in ten years, I'm starting a business. It's a lot easier nowadays than it was back in 1999. I know; it's supposed to be easier the second time you do something, but what I mean is different from that. It's just a lot easier to start a business now than it used to be.

Take email for example. I remember the trauma of having to buy and build a server, install Linux on it, find a location for it, install Sendmail, figure out how to manage that, eventually hire someone to manage it, buy email client software for everyone (in our case, Microsoft Outlook), eventually decide that we wanted to use Microsoft Exchange instead of Sendmail, and then keep on top of hardware and software maintenance for everything we had bought, all in an environment where prices and technology and requirements were continuously variable. It took nearly a whole full-time person just to figure out which options we should be thinking about.

Jeff Holt did most of this work for us in my first start-up almost ten years ago. Now, when you think of how many people in the world there are who can set up email, and compare that to how few people in the world there are who can do what Jeff can do with an Oracle database, you realize that the opportunity cost of having Jeff fiddle with email is ludicrously high. But in 1999, the only other option I knew about was to spend a bunch of cash to hire a separate person to do it instead of Jeff.

Today, you pay $50 to Google for a whole year's worth of Gmail service for each employee you have, and that's it. Ok, there's a half hour or so of configuration work you have to do to get your own domain name in your email addresses. But for way less than one month's rent, you've got email for your company for a whole year that works every time, all the time, from anywhere. All you need is a browser to access it, and even that is free these days.

I can tell you the same kind of story for web hosting, bug tracking, backup and recovery, HR and payroll, accounting, even for sales. The common thread here is that there are a lot of things you have to do as a business that have nothing whatsoever to do with what your business really does, which is the content that your people are really passionate about providing to the market. Today, it's economically efficient to let specialty firms do things for you that ten years ago, you wouldn't have considered letting someone else do.

...Which brings me to what we do. My company, Method R Corporation, does performance for a living. Specifically, Oracle software performance. We know how to make just about any Oracle-based software go faster, and we can do it quicker than you probably think. And we can teach people how to do it just like we do. We even sell the tools we use, which make it a lot easier to do what we do. It works. Read the testimonials at our Profiler page for some evidence of what I mean.

So here's a really important question for our company: Why would a telco or a manufacturer or a transportation company or a financial services company (or even a computer software manufacturer) want to invest as much in learning about Oracle performance as the people at Method R have? The answer is that a lot of companies just don't.

I love the field of software performance. I love it; it's my life's work. But most people don't. There are a lot of business owners and even software developers out there who just don't love thinking about software performance. I get that. Hey, I happen not to love thinking about software security. I know it's necessary, and I want it; I just don't want to have to think about it. I think most people regard software performance the same way: want it, need it even, don't want to think about it.

What if software performance were something, like Gmail, that just worked, and the only time you had to think about it was when you wrote a little check to make sure you could continue not having to think about it? I think there's a real business model there.

So here's what we're doing.

The people here at Method R have created a software package that we call our SLA Manager. "SLA" stands for "Service Level Agreement." It is software that tracks the response times (the durations that your end-users care about) of the business tasks that you mark as the most important things you want to watch. For example, if your application's "Book Order" function is something that's important to you, we can measure all 10,436 of your "Book Order" button clicks that happened yesterday. Our SLA Manager could tell you how long every single one of those clicks took. We can report information like, "Only 92.7% of those clicks were fulfilled in 3 seconds or less (not 99% like you wanted)." Of course, we can see trends in the data (that is, we can see your performance problems before your users can), and so on.
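The arithmetic behind a statement like "only 92.7% of those clicks were fulfilled in 3 seconds or less" is simple once the response times have been measured. Here is a minimal sketch in Python, assuming the measurements are already available as a list of durations in seconds. The function name `sla_attainment` and the sample data are hypothetical; this is not the SLA Manager's actual code.

```python
def sla_attainment(durations_s, threshold_s):
    """Return the percentage of executions that finished within threshold_s seconds."""
    if not durations_s:
        raise ValueError("no measurements to report on")
    within = sum(1 for d in durations_s if d <= threshold_s)
    return 100.0 * within / len(durations_s)


# Hypothetical response times (seconds) for ten "Book Order" clicks.
book_order_clicks = [0.8, 1.2, 2.9, 3.0, 3.4, 0.5, 7.1, 1.1, 2.2, 0.9]
target_pct = 99.0

attained = sla_attainment(book_order_clicks, threshold_s=3.0)
status = "met" if attained >= target_pct else "missed"
print(f"{attained:.1f}% of clicks finished in 3 seconds or less "
      f"(target {target_pct:.0f}%: {status})")
```

The hard part, of course, is not this calculation. It's collecting a trustworthy duration for every single click in the first place.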

So, our value proposition is this: We'll install some data collection software at your site. We'll instrument some of the business tasks that you want to make sure never have performance problems. We'll show you exactly what we're doing so there's no need to fear whether we're messing anything up for you. For example, we'll show you how to turn all our stuff off with the flick of a switch in case you ever get into a debate with one of your software vendors over the impact our measurements might have upon your system.

We'll periodically transfer data from your site to ours, where we'll look at your performance data. We'll charge a small fee for that. The people looking at your data will be Cary Millsap, Jeff Holt, Karen Morton, ...people like that.

Remember: we're not looking at your actual transactions; all we're going to see is how many you do and how long they take.

We'll report regularly to you on what we see, and we'll make recommendations when we see opportunities for improvement. How much or how little help you want will be your decision. If you ever do want us to help you fix a performance problem with one of the tasks that we've helped you instrument, we'll be able to provide quick answers because we have the tools that work with the instrumentation we installed.

Another part of our service will be regularly scheduled knowledge transfer sessions, where the same people I've mentioned already will be available to you. Whether the events are public or private, remote or on-site, ...that will depend on the level of service you want to purchase. We'll tailor these sessions to your needs. We'll be in tune with those needs because of the data we'll be collecting.

If this business model sounds attractive to you, then I hope you'll drop us a note at info at method-r dot com. If it doesn't sound attractive, then we're eager to know how we could make the idea more appealing.

Tuesday, July 1, 2008

Multitasking: Productivity Killer

A couple of years ago, I read Joel Spolsky's article "Human Task Switches Considered Harmful," and it resonated mightily. The key take-away from that article is this: Never let people work on more than one thing at once. Amen. The nice thing about Joel's article is that it explains why in a very compelling way.

Last week, a good friend emailed me a link to an article by Christine Rosen called "The Myth of Multitasking," which goes even further. It quotes one group of researchers at the University of California at Irvine, who found that workers took an average of twenty-five minutes to recover from interruptions such as phone calls or answering e-mail and return to their original task.

So it's not just me.

The "benefits" of human multitasking are an illusion. Looking or feeling busy is no substitute for accomplishment.

Here's a passage from the Rosen article that might get your attention, if I haven't already:

...Research has also found that multitasking contributes to the release of stress hormones and adrenaline, which can cause long-term health problems if not controlled, and contributes to the loss of short-term memory.

Translation: Trying too hard to do the information overload thing makes you sick, and it makes you stupid.

For as long as I can remember, I've hated the times I've been "forced" to multitask, and I've loved those segments of my life when I've been free to lock down on a train of thought for hours at a time. I believe deep down that multitasking is bad—at least for me—and literature like the two articles I've discussed here supports that feeling in a compelling way.

Here's a checklist of decisions that I resolve to implement myself:
  • When you need to sit down and write, whether it's code or text, close your door, and turn off your phone and your email. (Or just work the 10pm-to-4am shift like I did with Optimizing Oracle Performance.)
  • When you're in a classroom, if you're really trying to learn something, turn off your email and your browser.
  • When you're managing someone, make sure he's working on one thing at a time. It's obviously important that this one thing should be the right thing to be working on. But it's actually worse to be working on two things than working on just one wrong thing. Read Spolsky. You'll see.