Cary Millsap: BAARF party

Yesterday, David Best posted this question at Oracle-L:

If you had 8 disks in a server what would you do? From watching this list I can see alot of people using RAID 5 but i'm wary of the performance implicatons. (http://www.miracleas.com/BAARF/)

I was thinking maybe RAID 5 (3 disks) for the OS, software and
backups. RAID 10 (4 disks + 1 hot spare) for the database files.

Any thoughts?

I do have some thoughts about it.

There are four dimensions in which I have to make considerations as I answer this question:

Volume
Flow
Availability
Change

Just about everybody understands at least a little bit about #1: the reason you bought 8 disks instead of 4 or 16 has something to do with how many bytes of data you're going to store. Most people are clever enough to figure out that if you need to store N bytes of data, then you need to buy N + M bytes of capacity, for some M > 0 (grin).

#2 is where a lot of people fall off the trail. You can't know how many disks you really need to buy unless you know how many I/O calls per second (IOPS) your application is going to generate. You need to ensure that your sustained IOPS rate on each disk will not exceed 50% (see Table 9.3 in Optimizing Oracle Performance for why .5 is special). So, if a disk drive is capable of serving N 8KB IOPS (your disk's capacity for serving I/O calls at your Oracle block size), then you better make sure that the data you put on that disk is so interesting that it motivates your application to execute no more than .5N IOPS to that disk. Otherwise, you're guaranteeing yourself a performance problem.

Your IOPS requirement gets a little trickier, depending on which arrangement you choose for configuring your disks. For example, if you're going to mirror (RAID level 1), then you need to account for the fact that each write call your application makes will motivate two physical writes to disk (one for each copy). Of course, those write calls are going to separate disks, and you better make sure they're going through separate controllers, too. If you're going to do striping with distributed parity (RAID level 5), then you need to realize that each "small" write call is going to generate four physical I/O calls (two reads, and two writes to two different disks).

Of course, RAID level 5 caching complicates the analysis at low loads, but for high enough loads, you can assume away the benefits of cache, and then you're left with an analysis that tell you that for write-intensive data, RAID level 5 is fine as long as you're willing to buy 4× more drives than you thought you needed. ...Which is ironic, because the whole reason you considered RAID level 5 to begin with is that it costs less than buying 2× more drives than you thought you needed, which is why you didn't buy RAID level 1 to begin with.

If you're interested in RAID levels, you should peek at a paper I wrote a long while back, called Configuring Oracle Server for VLDB. It's an old paper, but a lot of what's in there still holds up, and it points you to deeper information if you want it.

You have to think about dimension #3 (availability) so that you can meet your business's requirements for your application to be ready when its users need it. ...Which is why RAID levels 1 and 5 came into the conversation to begin with: because you want a system that keeps running when you lose a disk. Well, different RAID levels have different MTBF and MTTR characteristics, with the bottom line being that RAID level 5 doesn't perform quite as well (or as simply) as RAID level 1 (or, say 1+0 or 0+1), but RAID level 5 has the up-front gratification advantage of being more economical (unless you get a whole bunch of cache, which you pretty much have to, because you want decent performance).

The whole analysis—once you actually go through it—generally funnels you into becoming a BAARF Party member.

Finally, dimension #4 is change. No matter how good your analysis is, it's going to start degrading the moment you put your system together, because from the moment you turn it on, it begins changing. All of your volumes and flows will change. So you need to factor into your analysis how sensitive to change your configuration will be. For example, what % increase in IOPS will require you to add another disk (or pair, or group, etc.)? You need to know in advance, unless you just like surprises. (And you're sure your boss does, too.)

Now, after all this, what would I do with 8 disks? I'd probably stripe and mirror everything, like Juan Loaiza said. Unless I was really, really (I mean really, really) sure I had a low write-rate requirement (think "web page that gets 100 lightweight hits a day"), in which I would consider RAID level 5. I would make sure that my sustained utilization for each drive is less than 50%. In cases where it's not, I would have a performance problem on my hands. In that case, I'd try to balance my workload better across drives, and I would work persistently to find any applications out there that are wasting I/O capacity (naughty users, naughty SQL, etc.). If neither of those actions reduced the load by enough, then I'd put together a justification/requisition for more capacity, and I would brace myself to explain why I thought 8 disks was the right number to begin with.

Cary Millsap

Wednesday, April 8, 2009

What would you do with 8 disks?