Wednesday, April 8, 2009

What would you do with 8 disks?

Yesterday, David Best posted this question at Oracle-L:
If you had 8 disks in a server, what would you do? From watching this list I can see a lot of people using RAID 5, but I'm wary of the performance implications. (http://www.miracleas.com/BAARF/)

I was thinking maybe RAID 5 (3 disks) for the OS, software and
backups. RAID 10 (4 disks + 1 hot spare) for the database files.

Any thoughts?
I do have some thoughts about it.

There are four dimensions I have to consider as I answer this question:
  1. Volume
  2. Flow
  3. Availability
  4. Change
Just about everybody understands at least a little bit about #1: the reason you bought 8 disks instead of 4 or 16 has something to do with how many bytes of data you're going to store. Most people are clever enough to figure out that if you need to store N bytes of data, then you need to buy N + M bytes of capacity, for some M > 0 (grin).

#2 is where a lot of people fall off the trail. You can't know how many disks you really need to buy unless you know how many I/O calls per second (IOPS) your application is going to generate. You need to ensure that your sustained utilization for each disk will not exceed 50% (see Table 9.3 in Optimizing Oracle Performance for why .5 is special). So, if a disk drive is capable of serving N 8KB IOPS (your disk's capacity for serving I/O calls at your Oracle block size), then you better make sure that the data you put on that disk is so interesting that it motivates your application to execute no more than .5N IOPS to that disk. Otherwise, you're guaranteeing yourself a performance problem.
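To make that arithmetic concrete, here's a minimal sketch in Python. The per-disk IOPS capability and the workload figure are hypothetical assumptions, not measurements, and the function name is mine, purely for illustration:

    import math

    def min_disks_for_flow(required_iops, disk_iops_capacity, max_utilization=0.5):
        # Smallest number of drives that keeps every drive at or below the
        # utilization ceiling (.5, per the discussion above).
        usable_iops_per_disk = disk_iops_capacity * max_utilization
        return math.ceil(required_iops / usable_iops_per_disk)

    # Hypothetical numbers: an application generating 1,200 IOPS against
    # drives that can each serve about 150 8KB IOPS.
    print(min_disks_for_flow(1200, 150))   # -> 16 drives, for flow alone,
                                           #    no matter how few bytes you store

Notice that the flow requirement, not the volume requirement, can end up dictating the drive count.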

Your IOPS requirement gets a little trickier, depending on which arrangement you choose for configuring your disks. For example, if you're going to mirror (RAID level 1), then you need to account for the fact that each write call your application makes will motivate two physical writes to disk (one for each copy). Of course, those write calls are going to separate disks, and you better make sure they're going through separate controllers, too. If you're going to do striping with distributed parity (RAID level 5), then you need to realize that each "small" write call is going to generate four physical I/O calls (two reads, and two writes to two different disks).
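Here's that bookkeeping as a small sketch. The read/write mix is a hypothetical assumption; the penalty factors are the ones just described (two physical writes per mirrored write, four physical I/Os per small RAID level 5 write):

    def backend_iops(app_reads, app_writes, raid_level):
        # Physical I/Os per second the drives must absorb for a given
        # application read/write rate, using the small-write penalties
        # described above.
        if raid_level == "1":    # mirroring: each write lands on both copies
            return app_reads + 2 * app_writes
        if raid_level == "5":    # small write: read data, read parity,
                                 # write data, write parity
            return app_reads + 4 * app_writes
        return app_reads + app_writes    # plain striping, no redundancy

    # Hypothetical mix: 800 reads/s and 400 writes/s from the application.
    print(backend_iops(800, 400, "1"))   # -> 1600 physical IOPS
    print(backend_iops(800, 400, "5"))   # -> 2400 physical IOPS

Those back-end physical I/O rates, not the application's own call rate, are what have to fit under the .5 utilization ceiling on each drive.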

Of course, RAID level 5 caching complicates the analysis at low loads, but for high enough loads, you can assume away the benefits of cache, and then you're left with an analysis that tells you that for write-intensive data, RAID level 5 is fine as long as you're willing to buy 4× more drives than you thought you needed. ...Which is ironic, because the whole reason you considered RAID level 5 to begin with is that it costs less than buying 2× more drives than you thought you needed, which is why you didn't buy RAID level 1 to begin with.
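To see where the 4× (and the 2×) comes from for a write-heavy workload, here's the same arithmetic with made-up numbers; the 150-IOPS drive and the all-small-write load are assumptions for illustration only:

    import math

    DISK_IOPS = 150      # hypothetical per-drive capability at 8KB
    CEILING   = 0.5      # utilization ceiling from the discussion above
    writes    = 1000     # hypothetical workload: all small writes, per second

    def drives_needed(physical_iops):
        return math.ceil(physical_iops / (DISK_IOPS * CEILING))

    print(drives_needed(1 * writes))   # no redundancy:             14 drives
    print(drives_needed(2 * writes))   # RAID 1, 2 I/Os per write:  27 drives (about 2x)
    print(drives_needed(4 * writes))   # RAID 5, 4 I/Os per write:  54 drives (about 4x)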

If you're interested in RAID levels, you should peek at a paper I wrote a long while back, called Configuring Oracle Server for VLDB. It's an old paper, but a lot of what's in there still holds up, and it points you to deeper information if you want it.

You have to think about dimension #3 (availability) so that you can meet your business's requirements for your application to be ready when its users need it. ...Which is why RAID levels 1 and 5 came into the conversation to begin with: because you want a system that keeps running when you lose a disk. Well, different RAID levels have different MTBF and MTTR characteristics, with the bottom line being that RAID level 5 doesn't perform quite as well (or as simply) as RAID level 1 (or, say 1+0 or 0+1), but RAID level 5 has the up-front gratification advantage of being more economical (unless you get a whole bunch of cache, which you pretty much have to, because you want decent performance).
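For a rough feel for the availability side, here's a sketch using the standard back-of-the-envelope mean-time-to-data-loss approximations from the RAID literature (they're not from this post, and the drive MTBF and repair-window figures are hypothetical):

    def mttdl_mirror(mtbf_hours, mttr_hours):
        # Approximate mean time to data loss for a mirrored pair: data is
        # lost only if the surviving copy fails during the repair window.
        return mtbf_hours ** 2 / (2 * mttr_hours)

    def mttdl_raid5(mtbf_hours, mttr_hours, group_size):
        # Approximation for one parity group of group_size drives: any
        # second failure during the rebuild window loses data.
        return mtbf_hours ** 2 / (group_size * (group_size - 1) * mttr_hours)

    # Hypothetical drives: 500,000-hour MTBF, 8-hour repair/rebuild window.
    print(mttdl_mirror(500000, 8))      # ~1.6e10 hours
    print(mttdl_raid5(500000, 8, 8))    # ~5.6e8 hours, roughly 28x worse

And that comparison flatters RAID level 5, because a parity rebuild typically takes longer than a mirror resync, which stretches the repair window further.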

The whole analysis—once you actually go through it—generally funnels you into becoming a BAARF Party member.

Finally, dimension #4 is change. No matter how good your analysis is, it's going to start degrading the moment you put your system together, because from the moment you turn it on, it begins changing. All of your volumes and flows will change. So you need to factor into your analysis how sensitive to change your configuration will be. For example, what % increase in IOPS will require you to add another disk (or pair, or group, etc.)? You need to know in advance, unless you just like surprises. (And you're sure your boss does, too.)
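One way to answer that question in advance (again with made-up numbers; the function is just an illustration) is to compute how much growth a drive can absorb before it crosses the 50% ceiling:

    def growth_headroom_pct(current_disk_iops, disk_iops_capacity, ceiling=0.5):
        # Percentage the flow to this drive can grow before it crosses the
        # utilization ceiling and forces you to add another disk (or pair,
        # or group).
        headroom = ceiling * disk_iops_capacity - current_disk_iops
        return 100.0 * headroom / current_disk_iops

    # Hypothetical: a drive currently serving 60 IOPS out of a 150-IOPS capability.
    print(round(growth_headroom_pct(60, 150)))   # -> 25: a 25% increase in flow
                                                 #    is all this drive can absorb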

Now, after all this, what would I do with 8 disks? I'd probably stripe and mirror everything, like Juan Loaiza said. Unless I was really, really (I mean really, really) sure I had a low write-rate requirement (think "web page that gets 100 lightweight hits a day"), in which case I would consider RAID level 5. I would make sure that my sustained utilization for each drive is less than 50%. In cases where it's not, I would have a performance problem on my hands. In that case, I'd try to balance my workload better across drives, and I would work persistently to find any applications out there that are wasting I/O capacity (naughty users, naughty SQL, etc.). If neither of those actions reduced the load by enough, then I'd put together a justification/requisition for more capacity, and I would brace myself to explain why I thought 8 disks was the right number to begin with.

7 comments:

Tony said...

Amen.

I am currently running RAID 5 because when I was getting help from my server admins during initial configuration, I was too weak to fight for 0+1. When asked, I first said RAW for ASM (frowning ensued), so on the second pass I asked for 0+1. The response was that RAID 5 works really well with databases (he was referring to SQL Server). Because I am spineless, I ended up with a RAID 5 configuration and am constantly stuck with I/O problems. My EM console is full of ADDM reports about I/O bottlenecks and general disk read/write issues. I desperately wish I were 1) a bigger man when the original issue came up, and 2) able to reconfigure the server for 0+1.

The servers we replaced were running ASM and I never had any troubles with I/O. Now, even the "simple" queries have issues.

Cary Millsap said...

Amen...

Don Burleson said...

BAARF

That was very true a decade ago, but it's now ancient stuff, the fodder of myths . . .

Today, vendors have overcome the RAID-5 write penalty and there are RAID5 storage devices that can accommodate high update databases without the high "write penalty" latency.

One such example is the Hitachi TagmaStore RAID5 "Universal Storage Platform" with a quarter terabyte RAM cache!

Because this on-board RAM cache is so huge, the database can write to disk and move on; the parity is calculated from the data in cache asynchronously. So unless the cache is over-extended, the database will not suffer the RAID 5 write penalty.

Cary Millsap said...

All RAID level 5 implementations suffer from the small write penalty. It's just a matter of whether you can afford to put as much cache between you and that penalty as you need to satisfy your IOPS requirements. Some people can't.

The thing that has always puzzled me about RAID level 5 implementations: it's a technology invented to save money compared to implementing RAID level 1. ...But to make it work, you have to have lots of software and memory to go with it. In my opinion, the cost and complexity of the implementation takes most of the benefit out of the technology.

You get some benefits with RAID level 1 that RAID level 5 can't provide, too, like the ability to peel off test systems with a third mirror. RAID level 1 gives much better performance under conditions of partial outage than level 5 gives, and level 1 MTBF is better than level 5 MTBF. I just think a customer gets less real net benefit with RAID level 5 than he's often led to believe.

Too many people still get nasty surprises because they don't sufficiently understand their own IOPS and availability requirements and how they relate to the various technologies on the menu.

Unknown said...

One thing that drives me nuts about some of the current implementations: they are so complex that when you run into performance issues, no one can figure it out. Not your on-site storage folks. Not your Unix folks. Not the folks who write the flavor of volume management software you are using. Not the HBA folks. Not the folks that set up the networking for your SAN. Not the folks that manufactured the frames. You wind up dumping everything and going to RAID 1+0 on a new batch of disks after everyone is out of breath from arguing.

Then again, people appear to like surprises - it's much more expensive to plan ahead and do the work up front, isn't it?

Hrumpf.

Cary Millsap said...

Dallas,

I guess that's what Cracker Jack, RAID level 5, and the Spanish Inquisition all have in common:

...The element of surprise.

Bob said...

I use SAME on all eight if possible. I've seen many RAID 5 arrays in this configuration, but with cache and rebuild limitations. With ASM, I might *test* two disks in RAID 1 for the OS and give the other six disks to ASM. ASM redundancy will lose half the space, just like SAME. Having only eight disks limits the ASM options, so this ASM alternative is a low-end storage solution. SAME and RAID 5 might not be as easily expanded as ASM, so the performance and sizing requirements will help decide.