Cary Millsap: developers

Friday, April 5, 2013

NoSQL and Oracle, Sex and Marriage

At last week’s Dallas Oracle Users Group meeting, an Oracle DBA asked me, “With all the new database alternatives out there today, like all these open source NoSQL databases, would you recommend for us to learn some of those?”

I told him I had a theory about how these got so popular and that I wanted to share that before I answered his question.

My theory is this. Developers perceive Oracle as being too costly, time-consuming, and complex:

An Oracle Database costs a lot. If you don’t already have an Oracle license, it’s going to take time and money to get one. On the other hand, you can just install Mongo DB today.
Even if you have an Oracle site-wide license, the Oracle software is probably centrally controlled. To get an installation done, you’re probably going to have to negotiate, justify, write a proposal, fill out forms, ...you know, supplicate yourself to—er, I mean negotiate with—your internal IT guys to get an Oracle Database installed. It’s a lot easier to just install MySQL yourself.
Oracle is too complicated. Even if you have a site license and someone who’s happy to install it for you, it’s so big and complicated and proprietary... The only way to run an Oracle Database is with SQL (a declarative language that is alien to many developers) executed through a thick, proprietary, possibly even difficult-to-install layer of software like Oracle Enterprise Manager, Oracle SQL Developer, or sqlplus. Isn’t there an open source database out there that you could just manage from your operating system command line?

When a developer is thinking about installing a database today because he needs one to write his next feature, he wants something cheap, quick, and lightweight. None of those constraints really sounds like Oracle, does it?

So your Java developers install this NoSQL thing, because it’s easy, and then they write a bunch of application code on top of it. Maybe so much code that there’s no affordable way to turn back. Eventually, though, someone will accidentally crash a machine in the middle of something, and there’ll be a whole bunch of partway finished jobs that die. Out of all the rows that are supposed to be in the database, some will be there and some won’t, and so now your company will have to figure out how to delete the parts of those jobs that aren’t supposed to be there.

Because now everyone understands that this kind of thing will probably happen again, too, the exercise may well turn into a feature specification for various “eraser” functions for the application, which (I hope, anyway) will eventually lead to the team discovering the technical term transaction. A transaction is a unit of work that must be atomic, consistent, isolated, and durable (that’where this acronym ACID comes from). If your database doesn’t guarantee that every arbitrarily complex unit of work (every transaction) makes it either 100% into the database or not at all, then your developers have to write that feature themselves. That’s a big, tremendously complex feature. On an Oracle Database, the transaction is a fundamental right given automatically to every user on the system.

Let’s look at just that ‘I’ in ACID for a moment: isolation. How big a deal is transaction isolation? Imagine that your system has a query that runs from 1 pm to 2 pm. Imagine that it prints results to paper as it runs. Now suppose that at 1:30 pm, some user on the system updates two rows in your query’s base table: the table’s first row and its last row. At 1:30, the pre-update version of that first row has already been printed to paper (that happened shortly after 1 pm). The question is, what’s supposed to happen at 2 pm when it comes time to print the information for the final row? You should hope for the old value of that final row—the value as of 1 pm—to print out; otherwise, your report details won’t add up to your report totals. However, if your database doesn’t handle that transaction isolation feature for you automatically, then either you’ll have to lock the table when you run the report (creating an 30-minute-long performance problem for the person wanting to update the table at 1:30), or your query will have to make a snapshot of the table at 1 pm, which is going to require both a lot of extra code and that same lock I just described. On an Oracle Database, high-performance, non-locking read consistency is a standard feature.

And what about backups? Backups are intimately related to the read consistency problem, because backups are just really long queries that get persisted to some secondary storage device. Are you going to quiesce your whole database—freeze the whole system—for whatever duration is required to take a cold backup? That’s the simplest sounding approach, but if you’re going to try to run an actual business with this system, then shutting it down every day—taking down time—to back it up is a real operational problem. Anything fancier (for example, rolling downtime, quiescing parts of your database but not the whole thing) will add cost, time, and complexity. On an Oracle Database, high-performance online “hot” backups are a standard feature.

The thing is, your developers could write code to do transactions (read consistency and all) and incremental (“hot”) backups. Of course they could. Oh, and constraints, and triggers (don’t forget to remind them to handle the mutating table problem), and automatic query optimization, and more, ...but to write those features Really Really Well™, it would take them 30 years and a hundred of their smartest friends to help write it, test it, and fund it. Maybe that’s an exaggeration. Maybe it would take them only a couple years. But Oracle has already done all that for you, and they offer it at a cost that doesn’t seem as high once you understand what all is in there. (And of course, if you buy it on May 31, they’ll cut you a break.)

So I looked at the guy who asked me the question, and I told him, it’s kind of like getting married. When you think about getting married, you’re probably focused mostly on the sex. You’re probably not spending too much time thinking, “Oh, baby, this is the woman I want to be doing family budgets with in fifteen years.” But you need to be. You need to be thinking about the boring stuff like transactions and read consistency and backups and constraints and triggers and automatic query optimization when you select the database you’re going to marry.

Of course, my 15-year-old son was in the room when I said this. I think he probably took it the right way.

So my answer to the original question—“Should I learn some of these other technologies?”—is “Yes, absolutely,” for at least three reasons:

Maybe some development group down the hall is thinking of installing Mongo DB this week so they can get their next set of features implemented. If you know something about both Mongo DB and Oracle, you can help that development group and your managers make better informed decisions about that choice. Maybe Mongo DB is all they need. Maybe it’s not. You can help.
You’re going to learn a lot more than you expect when you learn another database technology, just like learning another natural language (like English, Spanish, etc.) teaches you things you didn’t expect to learn about your native language.
Finally, I encourage you to diversify your knowledge, if for no other reason than your own self-confidence. What if market factors conspire in such a manner that you find yourself competing for an Oracle-unrelated job? A track record of having learned at least two database technologies is proof to yourself that you’re not going to have that much of a problem learning your third.

Thursday, June 7, 2012

An Organizational Constraint that Diminishes Software Quality

One of the biggest problems in software performance today occurs when the people who write software are different from the people who are required to solve the performance problems that their software causes. It works like this:

Architects design a system and pass the specification off to the developers.
The developers implement the specs the architects gave them, while the architects move on to design another system.
When the developers are “done” with their phase, they pass the code off to the production operations team. The operators run the system the developers gave them, while the developers move on to write another system.

The process is an assembly line for software: architects specialize in architecture, developers specialize in development, and operators specialize in operating. It sounds like the principle of industrial efficiency taken to its logical conclusion in the software world.

In this waterfall project plan,
architects design systems they never see written,
and developers write systems they never see run.

Sound good? It sounds like how Henry Ford made a lot of money building cars... Isn’t that how they build roads and bridges? So why not?

With software, there’s a horrible problem with this approach. If you’ve ever had to manage a system that was built like this, you know exactly what it is.

The problem is the absence of a feedback loop between actually using the software and building it. It’s a feedback loop that people who design and build software need for their own professional development. Developers who never see their software run don’t learn enough about how to make their software run better. Likewise, architects who never see their systems run have the same problem, only it’s worse, because (1) their involvement is even more abstract, and (2) their feedback loops are even longer.

Who are the performance experts in most Oracle shops these days? Unfortunately, it’s most often the database administrators, not the database developers. It’s the people who operate a system who learn the most about the system’s design and implementation mistakes. That’s unfortunate, because the people who design and write a system have so much more influence over how a system performs than do the people who just operate it.

If you’re an architect or a developer who has never had to support your own software in production, then you’re probably making some of the same mistakes now that you were making five years ago, without even realizing they’re mistakes. On the other hand, if you’re a developer who has to maintain your own software while it’s being operated in production, you’re probably thinking about new ways to make your next software system easier to support.

So, why is software any different than automotive assembly, or roads and bridges? It’s because software design is a process of invention. Almost every time. When is the last time you ever built exactly the same software you built before? No matter how many libraries you’re able to reuse from previous projects, every system you design is different from any system you’ve ever built before. You don’t just stamp out the same stuff over and over.

Software is funny that way, because the cost of copying and distributing it is vanishingly small. When you make great software that everyone in the world needs, you write it once and ship it at practically zero cost to everyone who needs it. Cars and bridges don’t work that way. Mass production and distribution of cars and bridges requires significantly more resources. The thousands of people involved in copying and distributing cars and bridges don’t have to know how to invent or refine cars or bridges to do great work. But with software, since copying and distributing it is so cheap, almost all that’s left is the invention process. And that requires feedback, just like inventing cars and bridges did.

Don’t organize your software project teams so that they’re denied access to this vital feedback loop.