Friday, April 5, 2013

NoSQL and Oracle, Sex and Marriage

At last week’s Dallas Oracle Users Group meeting, an Oracle DBA asked me, “With all the new database alternatives out there today, like all these open source NoSQL databases, would you recommend for us to learn some of those?”

I told him I had a theory about how these got so popular and that I wanted to share that before I answered his question.

My theory is this. Developers perceive Oracle as being too costly, time-consuming, and complex:
  • An Oracle Database costs a lot. If you don’t already have an Oracle license, it’s going to take time and money to get one. On the other hand, you can just install Mongo DB today.
  • Even if you have an Oracle site-wide license, the Oracle software is probably centrally controlled. To get an installation done, you’re probably going to have to negotiate, justify, write a proposal, fill out forms, ...you know, supplicate yourself to—er, I mean negotiate with—your internal IT guys to get an Oracle Database installed. It’s a lot easier to just install MySQL yourself.
  • Oracle is too complicated. Even if you have a site license and someone who’s happy to install it for you, it’s so big and complicated and proprietary... The only way to run an Oracle Database is with SQL (a declarative language that is alien to many developers) executed through a thick, proprietary, possibly even difficult-to-install layer of software like Oracle Enterprise Manager, Oracle SQL Developer, or sqlplus. Isn’t there an open source database out there that you could just manage from your operating system command line?
When a developer is thinking about installing a database today because he needs one to write his next feature, he wants something cheap, quick, and lightweight. None of those constraints really sounds like Oracle, does it?

So your Java developers install this NoSQL thing, because it’s easy, and then they write a bunch of application code on top of it. Maybe so much code that there’s no affordable way to turn back. Eventually, though, someone will accidentally crash a machine in the middle of something, and there’ll be a whole bunch of partway finished jobs that die. Out of all the rows that are supposed to be in the database, some will be there and some won’t, and so now your company will have to figure out how to delete the parts of those jobs that aren’t supposed to be there.

Because now everyone understands that this kind of thing will probably happen again, too, the exercise may well turn into a feature specification for various “eraser” functions for the application, which (I hope, anyway) will eventually lead to the team discovering the technical term transaction. A transaction is a unit of work that must be atomic, consistent, isolated, and durable (that’where this acronym ACID comes from). If your database doesn’t guarantee that every arbitrarily complex unit of work (every transaction) makes it either 100% into the database or not at all, then your developers have to write that feature themselves. That’s a big, tremendously complex feature. On an Oracle Database, the transaction is a fundamental right given automatically to every user on the system.

Let’s look at just that ‘I’ in ACID for a moment: isolation. How big a deal is transaction isolation? Imagine that your system has a query that runs from 1 pm to 2 pm. Imagine that it prints results to paper as it runs. Now suppose that at 1:30 pm, some user on the system updates two rows in your query’s base table: the table’s first row and its last row. At 1:30, the pre-update version of that first row has already been printed to paper (that happened shortly after 1 pm). The question is, what’s supposed to happen at 2 pm when it comes time to print the information for the final row? You should hope for the old value of that final row—the value as of 1 pm—to print out; otherwise, your report details won’t add up to your report totals. However, if your database doesn’t handle that transaction isolation feature for you automatically, then either you’ll have to lock the table when you run the report (creating an 30-minute-long performance problem for the person wanting to update the table at 1:30), or your query will have to make a snapshot of the table at 1 pm, which is going to require both a lot of extra code and that same lock I just described. On an Oracle Database, high-performance, non-locking read consistency is a standard feature.

And what about backups? Backups are intimately related to the read consistency problem, because backups are just really long queries that get persisted to some secondary storage device. Are you going to quiesce your whole database—freeze the whole system—for whatever duration is required to take a cold backup? That’s the simplest sounding approach, but if you’re going to try to run an actual business with this system, then shutting it down every day—taking down time—to back it up is a real operational problem. Anything fancier (for example, rolling downtime, quiescing parts of your database but not the whole thing) will add cost, time, and complexity. On an Oracle Database, high-performance online “hot” backups are a standard feature.

The thing is, your developers could write code to do transactions (read consistency and all) and incremental (“hot”) backups. Of course they could. Oh, and constraints, and triggers (don’t forget to remind them to handle the mutating table problem), and automatic query optimization, and more, ...but to write those features Really Really Well™, it would take them 30 years and a hundred of their smartest friends to help write it, test it, and fund it. Maybe that’s an exaggeration. Maybe it would take them only a couple years. But Oracle has already done all that for you, and they offer it at a cost that doesn’t seem as high once you understand what all is in there. (And of course, if you buy it on May 31, they’ll cut you a break.)

So I looked at the guy who asked me the question, and I told him, it’s kind of like getting married. When you think about getting married, you’re probably focused mostly on the sex. You’re probably not spending too much time thinking, “Oh, baby, this is the woman I want to be doing family budgets with in fifteen years.” But you need to be. You need to be thinking about the boring stuff like transactions and read consistency and backups and constraints and triggers and automatic query optimization when you select the database you’re going to marry.

Of course, my 15-year-old son was in the room when I said this. I think he probably took it the right way.

So my answer to the original question—“Should I learn some of these other technologies?”—is “Yes, absolutely,” for at least three reasons:
  • Maybe some development group down the hall is thinking of installing Mongo DB this week so they can get their next set of features implemented. If you know something about both Mongo DB and Oracle, you can help that development group and your managers make better informed decisions about that choice. Maybe Mongo DB is all they need. Maybe it’s not. You can help.
  • You’re going to learn a lot more than you expect when you learn another database technology, just like learning another natural language (like English, Spanish, etc.) teaches you things you didn’t expect to learn about your native language.
  • Finally, I encourage you to diversify your knowledge, if for no other reason than your own self-confidence. What if market factors conspire in such a manner that you find yourself competing for an Oracle-unrelated job? A track record of having learned at least two database technologies is proof to yourself that you’re not going to have that much of a problem learning your third.

14 comments:

Anonymous said...

Great post, particularly about the business risk of discovering the value of transactions after the need for them has been made obvious by events.

I think you've also (intentionally or not) made a compelling argument that any organization with an Oracle site license should at least be proactively installing Oracle XE on every developer workstation, if they're not ready to make provision-on-demand Oracle databases available to their developers sans red tape.

NO LONGER ACTIVE said...
This comment has been removed by the author.
Steve said...

I wholeheartedly agree. The worst thing that can happen is that you learn something that may not be relevant to your current work. It will still help any DBA to learn new technology that is out there. You may have the option to lobby for its use at your company, increasing your influence and experience, or to work with it elsewhere.

If you don't mind (and please feel free to delete this comment if you do), I wanted to link an article I wrote on this called DBA, Grow Thyself - Moving and Shaking in the Era of Data Dominance (though I like your title FAR better!). There is never anything wrong with broadening your experience and branching out within your organization.

Of course, there may be some time wasters out there; something you take time to learn and the technology fades away. But chances are you will still gain useful experience. Data is the hottest topic in the business world right now, and any database professional should be willing to invest time in new ways of storing it even when they disagree with the tech.

Joel Garry said...

I think you are being too easy on developers. It's not because they think it's too costly etc., (although they do); it's because they want to be in the forefront of technology. Nothing wrong with that, except for the wheel-reinventions you mention. This tends to be ignored by evangelists.

So you wind up with management consultants talking about the positive effects of "disruptors" in the business, or whatever the paradigm of the day is someone read about in Harvard Business Review, and everyone drinks the kool-aid. Companies spend big bucks developing bespoke systems for ten times what they could have bought off the shelf, for no business advantage. Assuming the project doesn't just plain fail.

In the end, it's a management issue of strategic planning, or lack thereof.

For any given Oracle person, yes, it is beneficial to understand these things and broaden one's horizons. It's difficult to find a balance though, because you have to put in thousands of hours of work to really get competent at it. That's a big risk to a career, especially if the life cycle of the product is on the order of a decade. Not to say there isn't an upside to that - it can be quite lucrative to be one of the few experts left of an obsolete product, especially if it interfaces with Oracle. Until it isn't.

I knew a guy who had a sports car called an NSX, and he called it a NoSex (because it was too smooth and reliable, lacking the "soul" of, say, a Ferrari).

Good post!

Aris Prassinos said...

I don't think this should be viewed as an either/or question.
There are certain applications that may better belong in the nosql realm.
Or better yet, some applications may use a hybrid or polyglot approach to combine the relational and nosql approaches.
If you don't mind this shameless plug, I will be discussing such an application at KSCOPE13.
http://kscope13.com/component/seminar/seminarslist#Storing Billions of Images in a Hybrid Relational and NoSQL Database Using Active Data Guard and Ora

Iggy Fernandez said...

It appears that application developers have belatedly but correctly concluded that an RDBMS is not the best tool for every application. For example, relational algebra, relational calculus, and SQL are not the best tools for graph problems. As another example, weblogs are non-transactional and don’t benefit from the ACID properties of the RDBMS. Amazon created the Dynamo key-value store for a highly specific use case. From the Dynamo white paper: “Customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados. … There are many services on Amazon’s platform that only need primary-key access to a data store. … Simple read and write operations to a data item that is uniquely identified by a key. Data is stored as binary objects (i.e., blobs) identified by unique keys. No operations span multiple data items and there is no need for relational schema. … The operation environment is assumed to be non-hostile and there are no security related requirements such as authentication and authorization.”

Cary Millsap said...

Today, I corrected the paragraph beginning with, “Let’s look at just that ‘I’ in ACID for a moment: isolation.” In the original post, I had erroneously placed Oracle’s multi-version read-consistency feature into the ‘C’ (consistency) category. Thank you to my friends who politely and discreetly pointed out my error.

Mladen Gogala said...

In my experience, the movement away from Oracle is not driven so much by its complexity, but by its price.PostgreSQL, MySQL and MongoDB are not there yet. They cannot provide a platform which would be usable both for OLTP and DW applications,there are issues with ACID,
data loads, platform support etc, etc. Issues with LSD (ACID) can be resolved by installing a transaction manager, like System T (aka "Tuxedo") or CICS, the latter being the oldest and the most successful one of them all, but that only adds to the complexity.
The real danger for Oracle does not come from the fairly immature open source technologies like MongoDB or MySQL, it comes from its arch-nemesis known as IBM. DB2 is much cheaper, has all the same features and capabilities and is VERY compatible with Oracle. I am a consultant and I started encountering DB2 ever more frequently. That's the real danger: a database just like Oracle, but the vendor decided to drop the price and go after Oracle's market share. Oracle will have some interesting competition to win. If it loses the battle, it will be the end of it, the company will have to shrink significantly, just like MS which was reduced from an 800 LBS gorilla, a near perfect monopoly, to just one among the IT companies, not nearly as powerful as Apple or Google. That has happened before. Remember DEC and Ken Olsen?

Anonymous said...

Great points here, though I seldom see complete ignorance of transactions (I see plenty of immature and untested reactions to them - e.g. every operation is a transaction). What I've seen, heard, and experienced myself is a visceral reaction to Oracle sales and licensing policies. "IT" (admins and ops) is charged with safeguarding and plying a company's investment in Oracle to maximize ROI. Developers are charged with pushing application updates faster, maintaining (or creating) a competitive advantage and increasing conversions. It's a recipe for ongoing tension and derision that IaaS, PaaS, DBaaS and even BaaS allow developers to avoid. The larger enterprises (FiServ included) we work with prefer a services-based model because of the promised efficiency - not the cost savings many champion.

Today things need to be easier and Oracle simply cannot react quickly enough; the point of noSQL isn't K/V access - their BDB solution missed the point (so did MySQL's cluster API implementation/hack). Heroku got easy right for Ruby, node, etc developers. Amazon's OpsWorks is moving in that direction, and Rackspace has recently offered a specific stack to retain (capture) mobile developers being wooed by the likes of Firebase and Parse. Our company (akiban.com) provides a relational database with SQL+noSQL access, ACID guarantees and an operator-based framework for performance and scalability. To another commenter's point we're far younger than Oracle, but our customers and users understand what they need and how important it is to allow their developers to focus on growing their technology, their user base, and their business - not the volumes of IT red tape and Oracle's coffers.

Ross said...

@MLaden

"..PostgreSQL, MySQL and MongoDB are not there yet...."

While I am pretty sure you meant this in the sense of "taken as a whole" (in which case, point granted), I believe you'll find PostgreSQL has stronger ANSI SQL compliance than Beloved Oracle. A bit of a niggling point for most but in this sense is very much "there yet".

John Brady said...

I'm late to this (catching up on blogs after a week away), and agree with your points. I generally liken behind the scenes infrastructure software to plumbing in a house - very important to have right and working, but most people only ever see the bits that poke out from the wall and are unaware of the other stuff linking it all together.

Hot water and power showers are great features, but none of it works right unless the plumbing behind the scenes has been done properly.

Yes, plumbing is not "sexy" as a technical topic, but it is "essential". Just as you point out the ACID properties and transactional consistency and integrity are essential to a real world database.

Unfortunately there are a great many developers focussed on the externally visible features (user interface and business logic) who are not aware of the behind the scenes plumbing that is necessary to deliver all of that.

Agile is another thing that if done right can be a good thing, but when done badly leaves behind a mess. I've seen Agile used as an excuse to ignore what I call the plumbing issues, until they try to deploy the application into production and suddenly it does not scale at all because the plumbing was ignored.

John

L. Jenssen said...

Great post Cary. And thanks for your comments on my presentation at allthingsoracle.com today!

Every year I attend the JavaZone seminar in Norway. Iẗ́s a big Java conference. The last couple of years there has been a lot of talks/presentations about NoSQL databases. None on Oracle. This year Oracle was not even represented with a stand. I think this is kind of strange knowing that Oracle bought Sun (which is behind Java).

On the last years conference I was standing talking to some developers. As usually I introduce myself as an Oracle ressource. Then one of the developers replied: "We don't need Oracle anymore. We use NoSql databases". This made me start thinking. Really? I did not know much about NoSQL databases, so I did not start to argue. But the same day I bought several books about NoSQL databases. Next time I meet the developer, I surely have some questions for him :-)

After reading and testing some NoSQL databases (for instance MongoDB) I do understand why some developers have a preferrence towards NoSQL database. But I also understand that this is not a "one or the other" questions. I really see a lot of cases where NoSQL databases would be preferable also in Bank & Finance (where I work). For instance doing application logging. This kind of data might not have the same ACID requirements. For me NoSQL is Not Only SQL databases.

But If I take the developers side, I do understand that they don't like to be dependent of a peripheral DBA. A DBA that doesn't really speak their language and understand their needs. I took me about 5 minuttes to install the mongoDB software on my Ubuntu laptop, and in 5 more minuttes I've written some code against it.

So understanding why, we need to figure out how we can make Oracle more attractive to our developers. What can I do for my developers? How can I make my developers life easier? For instance make a Vagrant image of a VirtualBox implementation with an Oracle intallation. This way my developers can have an oracle database available localy on their laptop in a matter of minutes. Mapped towards a local port, as it was running locally next to their Eclipse (Java GUI).

The NoSQL databases are selling their way into the developers hearts. Oracle resources needs to start "selling" the Oracle database. Then it's not enough to start talking about installation, patching, backup, etc. The developers couldn't care less. DBA needs to talk the developer language. Well ... that's my honest opinion :-)

Cary Millsap said...

Thank you, Lasse. Excellent points.

—Cary

Colin 't Hart said...

IMHO, and it's just an opinion, developers like NoSQL databases because it frees them from having to think up front about the problems they're supposed to be solving and just dive right in and start building something.

Conversely, a relational database requires a lot more thought up front, something many developers are loathe to do.

Additionally, relational databases offer a high level of abstraction which many developers seem to want to ignore choosing instead for more basic building blocks -- probably because that's all they know or want to use (not developed here etc).