Monday, June 9, 2008

Why Guess? When You Can Know

The comment from plαdys on my post about flash drives and databases is a great entry into exactly the right conversation to be having. He wrote:

What about Data Warehouse type databases? Lots of full table scans, less use of cache (not sure if that's true)...

The traditional approach to having this conversation is for a few enthusiastic participants to argue over what happens "most of the time" or what "can happen." I used to participate in conversations like that at conferences, on panels, and in email. Nowadays, people have conversations like this in newsgroups and blogs.

I'll submit to you that the conversation about what "can happen" and subsequent arguments about what happens "most of the time" are irrelevant. Here's why. Imagine that a conversation like that converged to a state where one person was able to argue successfully that "in precisely 99% of cases, some proposition P is true." That never happens, but go with me for a second; imagine it did.

So, now, is P true for you? Most people seem to assume that the things that happen to them are like the typical things that happen to most people. Most people would think, "If P is that common, then surely P is true for me." Maybe it is. But then again, maybe it's not. The most likely situation is that P is true for some tasks running within your system, but it's not true for others. Whether P is true for the most important tasks on your system is simply a game of chance if you're like most people who have this conversation.

My point is: Why guess? When you can know. When it comes to Amdahl's Law, you should be able to know exactly how much response time of an individual business task is being consumed upon the component of your system that you're thinking about upgrading. If you want to see an example of what it looks like, look at our Profiler page.

With Oracle systems, though, the traditional approach is not to think that way. The traditional approach is to look at measurements upon the resources that comprise a system (its CPUs, its memory, its disks, its network), not measurements upon the business tasks that the company who owns the machine is spending its money to perform.

My whole point is that if you look at the response time of your business's most important tasks (that's what Method R is all about), then you don't have to care about conversations about other people's systems, or whether your system is typical enough to follow other people's advice. You won't have to guess about stuff like that, because you'll know the specific nature and needs of your system, regardless of whether your system happens to be like anyone else's.

Stop guessing. You can know. But you have to be willing to look at performance from the outside in, from the perspective of the task being processed, not from the traditional inside-out perspective of the resources doing the work.

Wednesday, June 4, 2008

Flash Drives and Databases

I learned today about "Sun to embed flash storage in nearly all its servers." This is supposed to be good news for database professionals all over because, flash storage "...consumes one-fifth the power and is a hundred times faster [than rotating disk drives]."

Hey-hey!

Of course, flash storage is going to cost a little more. Well, I'm not sure, maybe a lot more. But, according to the article:
John Fowler, the head of Sun’s servers and storage division, said at a press conference in Boston Tuesday. “The fact that it’s not the same dollars per gigabyte is perfectly okay.”
Alright, I understand that. Get more, pay more. I'm still on board.

But I predict that a lot of people who buy flash storage are going to be disappointed. Here's why.

We all know now that flash storage is a hundred times faster than rotating disk drives. (Says so right in the article. And consumes one-fifth the power.) We all also "know" that databases are I/O intensive applications. (The article says that, too. But everybody already "knew" that anyway.)

The problem that's going to happen is the people (1) who have a slow database application, (2) who assume that their application is slow because of the I/O it is doing, (3) whose application doesn't really spend much time doing I/O at all (whether it does a "lot" of I/O is irrelevant), and (4) who buy flash storage specifically on the hope that after the installation, their database application will "be 100x faster" (because, of course, the flash storage is 100x faster than the storage it is replacing).

See the problem?

Think about Amdahl's Law: improving the speed of a component will help a user's performance only in proportion to the duration for which that user used that component in the first place. Here's an example. Imagine this response time profile:
Total response time: 100 minutes (100%)
Time spent executing OS read calls: 5 minutes (5%) (e.g., db file sequential read)
Time spent doing other stuff: 95 minutes (95%)
Now, so how much time will you save if you upgrade your disk drives to a technology that's 100x faster. The answer is that the new "Time spent executing OS read calls" will be .05 minutes, right? Well, maybe. Let's go with that for a moment. If that were true, then how much time will you save? You'll save 4.95 minutes, which is 4.95% of your original response time. Your application won't be 100x faster (or, equivalently, 99% faster), it'll be 4.95% faster.

The users in this story aren't going to be happy with this if they're thinking that the result of an expensive upgrade is going to be 100x faster performance. If they're expecting 1-minute performance and get 95.05-minute performance instead, they're going to be, um, disappointed.

Now, reality is probably not even going to be that good. Imagine that those 5 minutes our user spent in the 100-minute original application response time was consumed executing 150,000 distinct Oracle db file sequential read calls (which map to 150,000 OS read calls). That makes your single-call I/O latency 0.002 seconds per call (300 seconds divided by 150,000 calls).

That's pretty good, but it's a normal enough latency these days on today's high-powered SAN devices. If you think about rotating disk drives, then 0.002 seconds per call is mind-blowingly excellent. But I/O latencies of 0.002 seconds or better don't come from disk drives, they come from the cache that's sitting in these SANs. The read calls that result in physical disk access are taking much longer, 0.005 seconds or more. An average latency of 0.002 is possible because so many of those read calls are being fulfilled from cache.

And the flash drive upgrades aren't going to improve the latency of those calls being fulfilled from cache.

So, to recap, the best improvement you'll ever get by upgrading to flash drives is a percentage improvement that's equivalent to the percentage of time you spent before the upgrade actually making I/O calls. If a lot of your I/O calls are satisfied by reads from cache to begin with, then upgrading to flash drives will help you less than that.

The biggest performance problem most people have is that they don't know where their users' time is going. They know where their system's time is going, but that doesn't matter. What people need to see is the response time profiles of the tasks that the business regards as the most important things it needs done. That's the cornerstone of what Method R (both the method and the company) is all about.

Flash drives might help you. Maybe a lot. And maybe they'll help you a little, or maybe not at all. If you can't see individual user response times, then you'll have to actually try them to find out whether they'll be good for you or not (imagine cash register sound here).

We built our Profiler software so that when we manage Oracle systems, we can see the users' response times and not have to guess about stuff like this. When you can see your response times, you don't have to guess whether a proposed upgrade is going to help you. You'll know exactly whom will be helped, and you'll know by how much.

The Magic of VMs

Something that Faan said in a comment to one of my posts stimulated a memory I’d like to share. In that post, I mentioned that I’m kind of interested in trying Microsoft Outlook 2007, but I’m too chicken to do it, because I don’t have enough faith that if I didn’t end up wanting to buy it, I’d be able to uninstall it without gorping up my Outlook 2003 installation, which I still rely upon.

He mentioned that a good way to evaluate a product without having that product go mad through your production data is to use virtual machine software, like VMware. In my estimation, this is brilliant.

And that’s where the memory comes in. On my most recent trip to Europe, I had some time with my good friend Carel Jan Engel. Among the many stories we traded, Carel Jan gave me an excellent solution to the age-old problem of the awful transition period you have to go through when you replace your laptop computer.

In the Old Days, when you got a new computer, you had to install all the stuff that used to be on your old computer onto your new computer. This typically required me to spend weeks with both laptops sitting in front of me, so I could have access to all the license keys and so forth that I needed to install everything onto my new machine. Then there was the issue of re-customizing all your toolbars and everything that makes your apps yours.

Carel Jan excitedly told me the story of how he had just bought himself a new laptop, and all he had to do was bundle up the old Windows VM from his old machine, and copy it to his new machine. Presto! No more laptop upgrade purgatory. Brilliant.

Looks like I’ll have one more purgatory to survive, and, if I do things right, that will be the end of it for this lifetime.

Syncing..., Part 2

I've learned a lot about syncing my iPhone from the comments I received on my prior post about syncing. Here's a summary:
  • Plaxo is cool, but it just doesn't do what I need. It doesn't put appointments into my iPhone stand-alone Calendar application. ...Which means that when I'm in Europe and don't want to pay roaming charges for data, I'm not going to get an alert on my iPhone when a Google Calendar-entered appointment comes due.
  • GooSync looks interesting (e.g., "Sync multiple Google calendars"), but I'd have to pay for the Premium Account option to see if it would work. With so many things I've tried not working, and with plenty of other things occupying my time, my internal barrier to entry is too high to try this one.
  • It looks like Synthesis AG has interesting plans for an iPhone SyncML client, but it looks like that would give me less of a "solution" per se, and more of a basis for a new programming project that I could do myself with the GData APIs that Tony mentioned. I'm not interested in doing this myself as a project.
  • Dominic sent me an interesting article that does really get to the point of what I want, but it requires jail-breaking (i.e., voiding the warranty) on my iPhone. It didn't take me much introspection to figure out my position on this: I'm not a jail-breaking kind of guy.
The best solution appears to be to wait a few weeks and see what happens as a result of the scheduled-for-June release of iPhone G3 or whatever they'll call it, which is supposed to also take advantage of some mass of developers out there writing new apps for the new iPhone SDK. So, I'll wait and see what happens here in the next few weeks.