Monday, June 9, 2008

Why Guess? When You Can Know

The comment from plαdys on my post about flash drives and databases is a great entry into exactly the right conversation to be having. He wrote:

What about Data Warehouse type databases? Lots of full table scans, less use of cache (not sure if that's true)...

The traditional approach to having this conversation is for a few enthusiastic participants to argue over what happens "most of the time" or what "can happen." I used to participate in conversations like that at conferences, on panels, and in email. Nowadays, people have conversations like this in newsgroups and blogs.

I'll submit to you that the conversation about what "can happen" and subsequent arguments about what happens "most of the time" are irrelevant. Here's why. Imagine that a conversation like that converged to a state where one person was able to argue successfully that "in precisely 99% of cases, some proposition P is true." That never happens, but go with me for a second; imagine it did.

So, now, is P true for you? Most people seem to assume that the things that happen to them are like the typical things that happen to most people. Most people would think, "If P is that common, then surely P is true for me." Maybe it is. But then again, maybe it's not. The most likely situation is that P is true for some tasks running within your system, but it's not true for others. Whether P is true for the most important tasks on your system is simply a game of chance if you're like most people who have this conversation.

My point is: Why guess? When you can know. When it comes to Amdahl's Law, you should be able to know exactly how much of an individual business task's response time is being consumed by the component of your system that you're thinking about upgrading. If you want to see an example of what that looks like, look at our Profiler page.

With Oracle systems, though, the traditional approach is not to think that way. The traditional approach is to look at measurements of the resources that comprise a system (its CPUs, its memory, its disks, its network), not measurements of the business tasks that the company that owns the machine is spending its money to perform.

My whole point is that if you look at the response time of your business's most important tasks (that's what Method R is all about), then you don't have to care about conversations about other people's systems, or whether your system is typical enough to follow other people's advice. You won't have to guess about stuff like that, because you'll know the specific nature and needs of your system, regardless of whether your system happens to be like anyone else's.

Stop guessing. You can know. But you have to be willing to look at performance from the outside in, from the perspective of the task being processed, not from the traditional inside-out perspective of the resources doing the work.

Wednesday, June 4, 2008

Flash Drives and Databases

I learned today about "Sun to embed flash storage in nearly all its servers." This is supposed to be good news for database professionals all over, because flash storage "...consumes one-fifth the power and is a hundred times faster [than rotating disk drives]."

Hey-hey!

Of course, flash storage is going to cost a little more. Well, I'm not sure, maybe a lot more. But, according to the article:
John Fowler, the head of Sun’s servers and storage division, said at a press conference in Boston Tuesday. “The fact that it’s not the same dollars per gigabyte is perfectly okay.”
Alright, I understand that. Get more, pay more. I'm still on board.

But I predict that a lot of people who buy flash storage are going to be disappointed. Here's why.

We all know now that flash storage is a hundred times faster than rotating disk drives. (Says so right in the article. And consumes one-fifth the power.) We all also "know" that databases are I/O intensive applications. (The article says that, too. But everybody already "knew" that anyway.)

The problem is going to happen to the people (1) who have a slow database application, (2) who assume that their application is slow because of the I/O it is doing, (3) whose application doesn't really spend much time doing I/O at all (whether it does a "lot" of I/O is irrelevant), and (4) who buy flash storage specifically in the hope that after the installation, their database application will "be 100x faster" (because, of course, the flash storage is 100x faster than the storage it is replacing).

See the problem?

Think about Amdahl's Law: improving the speed of a component will help a user's performance only in proportion to the duration for which that user used that component in the first place. Here's an example. Imagine this response time profile:
  • Total response time: 100 minutes (100%)
  • Time spent executing OS read calls (e.g., db file sequential read): 5 minutes (5%)
  • Time spent doing other stuff: 95 minutes (95%)
So how much time will you save if you upgrade your disk drives to a technology that's 100x faster? The answer is that the new "Time spent executing OS read calls" will be 0.05 minutes, right? Well, maybe. Let's go with that for a moment. If that were true, then how much time would you save? You'd save 4.95 minutes, which is 4.95% of your original response time. Your application won't be 100x faster (that would mean a 99% reduction in response time); its response time will drop by just 4.95%.
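The arithmetic above is easy to check in a few lines of Python. This is only a sketch of the Amdahl's Law calculation using the numbers from the example profile; the function name and its parameters are made up for illustration.

```python
def response_time_after_upgrade(total, component, speedup):
    """Return the new total response time when only one component gets faster.

    total     -- original response time of the task
    component -- the part of that time spent in the upgraded component
    speedup   -- how many times faster the upgraded component becomes
    """
    if not 0 <= component <= total:
        raise ValueError("component time must be between 0 and total")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Time outside the component is unchanged; only the component shrinks.
    return (total - component) + component / speedup


# Numbers from the example profile, in minutes.
total_minutes = 100.0
read_minutes = 5.0

new_total = response_time_after_upgrade(total_minutes, read_minutes, 100.0)
saved = total_minutes - new_total

print(f"new response time: {new_total:.2f} minutes")
print(f"time saved: {saved:.2f} minutes ({saved / total_minutes:.2%} of the original)")
print(f"overall speedup: {total_minutes / new_total:.3f}x")
```

Try changing read_minutes to 95.0 to see how different the answer is for a task that really does spend most of its time reading.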

The users in this story aren't going to be happy with this if they're thinking that the result of an expensive upgrade is going to be 100x faster performance. If they're expecting 1-minute performance and get 95.05-minute performance instead, they're going to be, um, disappointed.

Now, reality is probably not even going to be that good. Imagine that those 5 minutes our user spent in the 100-minute original application response time were consumed executing 150,000 distinct Oracle db file sequential read calls (which map to 150,000 OS read calls). That makes your single-call I/O latency 0.002 seconds per call (300 seconds divided by 150,000 calls).

That's pretty good, but it's a normal enough latency these days on today's high-powered SAN devices. If you think about rotating disk drives, then 0.002 seconds per call is mind-blowingly excellent. But I/O latencies of 0.002 seconds or better don't come from disk drives; they come from the cache that's sitting in these SANs. The read calls that result in physical disk access are taking much longer, 0.005 seconds or more. An average latency of 0.002 seconds is possible because so many of those read calls are being fulfilled from cache.

And the flash drive upgrades aren't going to improve the latency of those calls being fulfilled from cache.
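Here's a rough sketch of how much that cache effect could matter. The 150,000 calls, the 0.002-second average, and the 0.005-second physical read latency come from the example above. The cache-hit latency of 0.0005 seconds is purely an assumption I've picked for illustration, as is the idea that every call is either a clean cache hit or a clean physical read. Your real numbers will be different.

```python
def physical_read_fraction(avg_latency, cache_latency, disk_latency):
    """Fraction of read calls that must go to disk to produce avg_latency."""
    if not cache_latency < disk_latency:
        raise ValueError("cache latency must be lower than disk latency")
    if not cache_latency <= avg_latency <= disk_latency:
        raise ValueError("average must lie between cache and disk latency")
    return (avg_latency - cache_latency) / (disk_latency - cache_latency)


calls = 150_000
avg_latency = 0.002      # seconds per call, from the example
disk_latency = 0.005     # seconds per physical read, from the example
cache_latency = 0.0005   # seconds per cache hit: ASSUMPTION, not a measurement
flash_speedup = 100.0    # the "hundred times faster" claim

fraction = physical_read_fraction(avg_latency, cache_latency, disk_latency)
disk_calls = calls * fraction
cache_calls = calls - disk_calls

old_io_seconds = calls * avg_latency
# Only the physical reads get faster; cache hits cost what they cost before.
new_io_seconds = (cache_calls * cache_latency
                  + disk_calls * disk_latency / flash_speedup)
saved_minutes = (old_io_seconds - new_io_seconds) / 60

print(f"calls served from disk: {fraction:.1%}")
print(f"I/O time before: {old_io_seconds:.1f} s, after: {new_io_seconds:.1f} s")
print(f"time saved: {saved_minutes:.3f} minutes of the original 100")
```

Under these assumptions, the saving comes out smaller than the 4.95 minutes you'd compute by pretending that every read call got 100x faster.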

So, to recap, the best improvement you'll ever get by upgrading to flash drives is a percentage improvement that's equivalent to the percentage of time you spent before the upgrade actually making I/O calls. If a lot of your I/O calls are satisfied by reads from cache to begin with, then upgrading to flash drives will help you less than that.

The biggest performance problem most people have is that they don't know where their users' time is going. They know where their system's time is going, but that doesn't matter. What people need to see is the response time profiles of the tasks that the business regards as the most important things it needs done. That's the cornerstone of what Method R (both the method and the company) is all about.
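If you have never seen a response time profile, the idea is simple: take the timed calls made on behalf of one task, group them by what they were doing, and sort by where the time went. The sketch below shows only that idea. It is not how our Profiler works, and the timed calls in it are hypothetical values I chose so that the result mirrors the 5%/95% split from the example above.

```python
from collections import defaultdict


def response_time_profile(timed_calls):
    """Group (name, seconds) pairs by name.

    Returns rows of (name, total_seconds, call_count, percent_of_total),
    sorted so the biggest time consumer comes first.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, seconds in timed_calls:
        totals[name] += seconds
        counts[name] += 1

    response_time = sum(totals.values())
    rows = []
    for name, seconds in totals.items():
        percent = 100.0 * seconds / response_time if response_time else 0.0
        rows.append((name, seconds, counts[name], percent))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical timed calls for ONE task (name, seconds). Illustration only.
timed_calls = [
    ("db file sequential read", 0.002),
    ("other", 0.030),
    ("db file sequential read", 0.003),
    ("other", 0.065),
]

profile = response_time_profile(timed_calls)
for name, seconds, count, percent in profile:
    print(f"{name:<26} {seconds:8.3f} s {count:6d} calls {percent:6.1f}%")
```

The important design choice is that the input is the calls made for one task that the business cares about, not everything that happened on the system during some time interval.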

Flash drives might help you. Maybe a lot. And maybe they'll help you a little, or maybe not at all. If you can't see individual user response times, then you'll have to actually try them to find out whether they'll be good for you or not (imagine cash register sound here).

We built our Profiler software so that when we manage Oracle systems, we can see the users' response times and not have to guess about stuff like this. When you can see your response times, you don't have to guess whether a proposed upgrade is going to help you. You'll know exactly who will be helped, and you'll know by how much.

The Magic of VMs

Something that Faan said in a comment to one of my posts stimulated a memory I’d like to share. In that post, I mentioned that I’m kind of interested in trying Microsoft Outlook 2007, but I’m too chicken to do it, because I don’t have enough faith that if I didn’t end up wanting to buy it, I’d be able to uninstall it without gorping up my Outlook 2003 installation, which I still rely upon.

He mentioned that a good way to evaluate a product without having that product go mad through your production data is to use virtual machine software, like VMware. In my estimation, this is brilliant.

And that’s where the memory comes in. On my most recent trip to Europe, I had some time with my good friend Carel Jan Engel. Among the many stories we traded, Carel Jan gave me an excellent solution to the age-old problem of the awful transition period you have to go through when you replace your laptop computer.

In the Old Days, when you got a new computer, you had to install all the stuff that used to be on your old computer onto your new computer. This typically required me to spend weeks with both laptops sitting in front of me, so I could have access to all the license keys and so forth that I needed to install everything onto my new machine. Then there was the issue of re-customizing all your toolbars and everything that makes your apps yours.

Carel Jan excitedly told me the story of how he had just bought himself a new laptop, and all he had to do was bundle up the old Windows VM from his old machine, and copy it to his new machine. Presto! No more laptop upgrade purgatory. Brilliant.

Looks like I’ll have one more purgatory to survive, and, if I do things right, that will be the end of it for this lifetime.

Syncing..., Part 2

I've learned a lot about syncing my iPhone from the comments I received on my prior post about syncing. Here's a summary:
  • Plaxo is cool, but it just doesn't do what I need. It doesn't put appointments into my iPhone stand-alone Calendar application. ...Which means that when I'm in Europe and don't want to pay roaming charges for data, I'm not going to get an alert on my iPhone when a Google Calendar-entered appointment comes due.
  • GooSync looks interesting (e.g., "Sync multiple Google calendars"), but I'd have to pay for the Premium Account option to see if it would work. With so many things I've tried not working, and with plenty of other things occupying my time, my internal barrier to entry is too high to try this one.
  • It looks like Synthesis AG has interesting plans for an iPhone SyncML client, but it looks like that would give me less of a "solution" per se, and more of a basis for a new programming project that I could do myself with the GData APIs that Tony mentioned. I'm not interested in doing this myself as a project.
  • Dominic sent me an interesting article that does really get to the point of what I want, but it requires jail-breaking my iPhone (i.e., voiding the warranty). It didn't take me much introspection to figure out my position on this: I'm not a jail-breaking kind of guy.
The best solution appears to be to wait a few weeks and see what happens as a result of the scheduled-for-June release of iPhone G3 or whatever they'll call it, which is supposed to also take advantage of some mass of developers out there writing new apps for the new iPhone SDK. So, I'll wait and see what happens here in the next few weeks.

Tuesday, May 27, 2008

Syncing iCal feeds with my iPhone: Not

Here's something I need to do, but I don't know how: sync an iCal feed with the calendar application on my iPhone, without a Mac, and without upgrading Outlook 2003 to Outlook 2007. Here's the whole story.

First, I travel. Sometimes, a lot. And I have a lot of appointment managing to do. I feel very disoriented whenever I don't have my itinerary available to me, in my pocket. I also need my schedule on my laptop, where I can see a whole month in one view. Of course, the schedule in my pocket and the schedule on my laptop need to be synchronized.

I own an iPhone. It's the first cellphone I've truly loved since the first Nokia I bought back in the 1990s. I love it. My iPhone syncs with Microsoft Outlook 2003 on my Dell laptop. I don't own Outlook 2007. I don't do email in Outlook anymore. I use Gmail, both at home and at work. And on my iPhone.

I do still use Outlook, though, for calendars and contacts. That's because I don't know a better way. I need read/write access to my calendar and contacts lists on my PC, and I certainly don't want the last copy of my calendar and contacts to be stored on a device that goes with me everywhere I go and that could easily be lost or stolen.

A better place to store calendar data is on the web. That way, I can share it with people (without other people, we wouldn't need calendars at all!), and I can access it from whatever computer happens to be available. Enter Google Calendar.

I want to love Google Calendar, but I can't. I love the idea of it, but I don't love it because I can't sync it with my iPhone. There's no direct hookup between Google Calendar and my iPhone. Yes, I know I can see my Google Calendar from my iPhone, and I remember being able to do some limited form of calendar editing from my iPhone, but that's not good enough. I need alerts when I'm not connected. I need my information stored locally within the calendar application on my iPhone.

I think the solution is supposed to be Google Calendar Sync. It syncs information from Google Calendar to Outlook 2003, where I can sync to my iPhone. But Google Calendar Sync doesn't work on my laptop. I keep getting error code 1008. I looked it up, and Google says they're working on it, but there's no relief today. Additionally, even if I could get Google Calendar Sync to work (I had it working a couple of months ago), it still doesn't do what I need it to do. That's because Google Calendar Sync syncs only my primary Google Calendar to Outlook. More on that in a minute.

Now, let me describe one of the world's very coolest web applications ever: TripIt.com. TripIt is wholly excellent. Imagine this. Book a flight at American Airlines, a hotel room at Hilton, and a rental car at Hertz. You get three confirmation email messages as a result. In the old days, you might have spent some of your time transcribing the information from those messages into Outlook or whatever. Or maybe you paid someone good money to transcribe them for you.

With TripIt, all you do is forward your three confirmation email messages to plans@tripit.com. And then all your itinerary information gets structured automatically into a complete, single itinerary that you can access on the web. You can print that itinerary on a page or two, stuff it into your briefcase, and have everything you need: flight times, rental car and hotel confirmation numbers, weather forecasts, pertinent local maps, ..., everything. 

That's not even the best part. The best part is that TripIt creates an iCal feed that Google Calendar can pick up automatically.

So let me recap. You book travel however you like. You forward the confirmation mails to plans@tripit.com. (It just occurred to me that you could even do this automatically with Gmail filters). And then Google Calendar picks up your whole itinerary automatically.
 
But for me, that's where the joy ends. Because even when Google Calendar Sync does work (remember the 1008 error causes it not to), it syncs only your primary calendar. It doesn't sync a secondary calendar obtained through an iCal feed, so it doesn't sync my TripIt calendar.

So, here's what my process looks like now:
  1. Book travel.
  2. Forward the confirmation emails to plans@tripit.com.
  3. Print my TripIt unified itinerary for my briefcase.
  4. Type my itinerary into Outlook (or onto my iPhone). If the travel spans more than just a couple of time zones, then I enter the itinerary at mytimetraveler.com (which does my time zone arithmetic for me), and then I download the Outlook itinerary record from the web page. Back when I could get Google Calendar Sync to work, copying my TripIt calendar records to my primary calendar was an option, but not a good one.
  5. Sync my iPhone with my laptop.
Here's what I wish my process looked like:
  1. Book travel.
  2. Forward the confirmation emails to plans@tripit.com.
  3. Print my TripIt unified itinerary for my briefcase. (Or not.)
  4. Sync my iPhone with my laptop.
I've actually considered upgrading to Microsoft Outlook 2007, which, I understand, knows about iCal feeds. It might be able to sync my TripIt data with my iPhone. But I think the price tag is too high to pay for that one feature. And I'm not even assured that it will work. I know Microsoft has a 60-day free trial, but I'm worried that Outlook 2003 won't ever work right again if I try 2007 and don't like it.

Another option I've considered is replacing my laptop with a MacBook Pro. As tempted as I am by that idea, I'm not going to do that right now, and I'm not sure whether it would actually solve my problem anyway. Would it?

I hope there's a solution that I can implement with minimal expense, and with the hardware I own today. If there is, I sure haven't found it yet. I'd love to hear from you if you have a helpful opinion.

Thursday, May 22, 2008

Karen Morton

Today I’ve added Karen Morton’s blog to my Blog list. I met her a few years back at a course I helped teach in Tennessee. She generously says that the course changed her life, and she has since changed mine.

Recently, Karen helped me found Method R Corporation. She’s our director of education and consulting. Many of you have met Karen already in a classroom.

Karen is an excellent teacher (that means more than “excellent instructor”), and she’s just one of those rare people who, when she says she’ll do something, it’s as good as a COMMIT. She is also one of the best SQL optimizers I know, on top of being a pioneer and first-rate practitioner of the techniques Jeff and I talk about in Optimizing Oracle Performance.

She has already taught me many things, and I’m eager to watch what she will have to say online.

Friday, May 16, 2008

May 28 seminar, Minneapolis

Today I'm making preparations for another public event: this one is a one-day Performance Seminar I'll conduct in the Minneapolis area for Speak-Tech on May 28. In the morning, I'll do a "Why you can't see your real performance problems" session, and in the afternoon, I'll do "Measure once, cut twice (no, really)," which I discussed briefly here yesterday.

I'm looking forward to a lot of audience interaction on this one. We should have plenty of time on the 9:00am-4:30pm agenda for discussion.