Cary Millsap: 2016

Thursday, May 19, 2016

Messed-Up App of the Day: Tables of Numbers

Quick, which database is the biggest space consumer on this system?

Database                  Total Size   Total Storage
-------------------- --------------- ---------------
SAD99PS                    635.53 GB         1.24 TB
ANGLL                        9.15 TB         18.3 TB
FRI_W1                       2.14 TB         4.29 TB
DEMO                         6.62 TB        13.24 TB
H111D16                      7.81 TB        15.63 TB
HAANT                         1.1 TB          2.2 TB
FSU                          7.41 TB        14.81 TB
BYNANK                       2.69 TB         5.38 TB
HDMI7                      237.68 GB       476.12 GB
SXXZPP                     598.49 GB         1.17 TB
TPAA                         1.71 TB         3.43 TB
MAISTERS                   823.96 GB         1.61 TB
p17gv_data01.dbf            800.0 GB         1.56 TB

It’s harder than it looks.

Did you come up with ANGLL? If you didn’t, then you should look again. If you did, then what steps did you have to execute to find the answer?

I’m guessing you did something like I did:

Skim the entire list. Notice that HDMI7 has a really big value in the third column.
Read the column headings. Parse the difference in meaning between “size” and “storage.” Realize that the “storage” column is where the answer to a question about space consumption will lie.
Skim the “Total Storage” column again and notice that the wide “476.12” number I found previously has a GB label beside it, while all the other labels are TB.
Skim the table again to make sure there’s no PB in there.
Do a little arithmetic in my head to realize that a TB is 1000× bigger than a GB, so 476.12 is probably not the biggest number after all, in spite of how big it looked.
Re-skim the “Total Storage” column looking for big TB numbers.
The biggest-looking TB number is 15.63 on the H111D16 row.
Notice the trap on the ANGLL row: only three significant digits are showing in the “18.3” figure, which looks physically the same size as the three-digit figures “1.24” and “4.29” directly above and below it. Realize that 18.3 (which should have been rendered “18.30”) is an order of magnitude larger.
Skim the column again to make sure I’m not missing another such number.
The answer is ANGLL.

That’s a lot of work. Every reader who uses this table to answer that question has to do it.

Rendering the table differently makes your readers’ (plural!) job much easier:

Database          Size (TB)  Storage (TB)
----------------  ---------  ------------
SAD99PS                 .64          1.24
ANGLL                  9.15         18.30
FRI_W1                 2.14          4.29
DEMO                   6.62         13.24
H111D16                7.81         15.63
HAANT                  1.10          2.20
FSU                    7.41         14.81
BYNANK                 2.69          5.38
HDMI7                   .24           .48
SXXZPP                  .60          1.17
TPAA                   1.71          3.43
MAISTERS                .82          1.61
p17gv_data01.dbf        .80          1.56

This table obeys an important design principle:

The amount of ink it takes to render each number is proportional to its relative magnitude.

I fixed two problems: (i) now all the units are consistent (I have guaranteed this feature by adding a unit label to the header and deleting all labels from the rows); and (ii) I’m showing the same number of digits after the decimal point for each number. Now, you don’t have to do arithmetic in your head, and now you can see more easily that the answer is ANGLL, at 18.30 TB.
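If a script produces your table, you can automate both fixes. Here’s a minimal Python sketch. It assumes the raw values arrive as strings like “635.53 GB” and that 1 TB = 1000 GB, which is the convention the rewritten table appears to use. The function names `to_tb` and `fmt_tb` are hypothetical.

```python
# Decimal (SI) multipliers relative to one terabyte; assumes 1 TB = 1000 GB.
UNIT_TO_TB = {"GB": 1e-3, "TB": 1.0, "PB": 1e3}


def to_tb(text: str) -> float:
    """Convert a string such as '635.53 GB' or '1.1 TB' to terabytes."""
    number, unit = text.split()
    return float(number) * UNIT_TO_TB[unit]


def fmt_tb(value: float) -> str:
    """Render a TB value with exactly two decimal places, dropping the
    leading zero for values below 1 (e.g. 0.48 -> '.48')."""
    s = f"{value:.2f}"
    return s[1:] if s.startswith("0.") else s


rows = [
    ("SAD99PS", "635.53 GB", "1.24 TB"),
    ("ANGLL", "9.15 TB", "18.3 TB"),
    ("HDMI7", "237.68 GB", "476.12 GB"),
]

# One unit label in the header, none in the rows, fixed decimal places.
print(f"{'Database':<16}  {'Size (TB)':>9}  {'Storage (TB)':>12}")
print(f"{'-' * 16}  {'-' * 9}  {'-' * 12}")
for name, size, storage in rows:
    print(f"{name:<16}  {fmt_tb(to_tb(size)):>9}  {fmt_tb(to_tb(storage)):>12}")
```

Right-aligning every number with the same count of decimal places is what makes the ink proportional to magnitude.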

Let’s go one step further and finish the deal. If you really want to make it as easy as possible for readers to understand your space consumption problem, then you should sort the data, too:

Database          Size (TB)  Storage (TB)
----------------  ---------  ------------
ANGLL                  9.15         18.30
H111D16                7.81         15.63
FSU                    7.41         14.81
DEMO                   6.62         13.24
BYNANK                 2.69          5.38
FRI_W1                 2.14          4.29
TPAA                   1.71          3.43
HAANT                  1.10          2.20
MAISTERS                .82          1.61
p17gv_data01.dbf        .80          1.56
SAD99PS                 .64          1.24
SXXZPP                  .60          1.17
HDMI7                   .24           .48

Now, your answer comes at a glance. Think back to the comprehension steps that I described above. With the table here, you only need to:

Notice that the table is sorted in descending numerical order.
Comprehend the column headings.
The answer is ANGLL.

As a reader, you have executed far less code path in your brain to completely comprehend the data that the author wants you to understand.
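Sorting is just as easy to automate. This minimal Python sketch assumes the values have already been normalized to TB as floats. The rows are copied from the table above, and the variable names are illustrative.

```python
# (database, size_tb, storage_tb), already normalized to terabytes.
rows = [
    ("SAD99PS", 0.64, 1.24),
    ("ANGLL", 9.15, 18.30),
    ("FRI_W1", 2.14, 4.29),
    ("DEMO", 6.62, 13.24),
    ("H111D16", 7.81, 15.63),
    ("HAANT", 1.10, 2.20),
    ("FSU", 7.41, 14.81),
    ("BYNANK", 2.69, 5.38),
    ("HDMI7", 0.24, 0.48),
    ("SXXZPP", 0.60, 1.17),
    ("TPAA", 1.71, 3.43),
    ("MAISTERS", 0.82, 1.61),
    ("p17gv_data01.dbf", 0.80, 1.56),
]

# Sort on the storage column, biggest consumer first, so the answer to
# "which database uses the most space?" is always the first row.
by_storage = sorted(rows, key=lambda row: row[2], reverse=True)

for name, size_tb, storage_tb in by_storage:
    print(f"{name:<16}  {size_tb:>9.2f}  {storage_tb:>12.2f}")
```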

Good design is a matter of consideration. And even conservation. If spending 10 extra minutes formatting your data better saves 1,000 readers 2 minutes each, then you’ve saved the world 1,990 minutes of wasted effort.
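Here’s the arithmetic behind that claim as a tiny Python function. The function and parameter names are only illustrative.

```python
def net_minutes_saved(author_minutes: float, readers: int,
                      minutes_per_reader: float) -> float:
    """Reader time saved by better formatting, minus the author's extra effort."""
    return readers * minutes_per_reader - author_minutes


print(net_minutes_saved(author_minutes=10, readers=1_000, minutes_per_reader=2))
```

The author’s cost is paid once, but the savings scale with the audience, so the payoff grows with every additional reader.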

But good design is also a very practical matter for you personally. If you want your audience to understand your work, then make your information easier for them to consume—whether you’re writing email, proposals, reports, infographics, slides, or software. It’s part of the pathway to being more persuasive.

Friday, May 13, 2016

Fail Fast

Among movements like Agile, Lean Startup, and Design Thinking these days, you hear the term fail fast. The principle of failing fast is vital to efficiency, but I’ve seen project managers and business partners be offended or even agitated by the term fail fast. I’ve seen it come out like, “Why the hell would I want to fail fast?! I don’t want to fail at all.” The implication, of course: “Failing is for losers. If you’re planning to fail, then I don’t want you on my team.”

I think I can help explain why the principle of “fail fast” is so important, and maybe I can help you explain it, too.

Software developers know about fail fast already, whether they realize it or not. Yesterday was a prime example for me. It was a really long day. I didn’t leave my office until after 9pm, and then I turned my laptop back on as soon as I got home to work another three hours. I had been fighting a bug all afternoon. It was a program that ran about 90 seconds normally, but when I tried a code path that should have been much faster, I could let it run 50 times that long and it still wouldn’t finish.

At home, I ran it again and left it running while I watched the Thunder beat the Spurs, assuming the program would finish eventually, so I could see the log file (which we’re not flushing often enough, which is another problem). My MacBook Pro ran so hard that the fan compelled my son to ask me why my laptop was suddenly so loud. I was wishing the whole time, “I wish this thing would fail faster.” And there it is.

When you know your code is destined to fail, you want it to fail faster. Debugging is hard enough as it is, without your stupid code forcing you to wait an hour just to see your log file, so you might gain an idea of what you need to go fix. If I could fail faster, I could fix my problem earlier, get more work done, and ship my improvements sooner.
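One way to make a program fail faster is to give it a time budget and flush its log as it goes. That way, a runaway code path dies quickly and leaves a diagnostic, instead of spinning for an hour. This is a minimal Python sketch of that idea, not the program described above. The `TimeBudget` class, the `BudgetExceeded` exception, and the `run_job` loop are all hypothetical stand-ins.

```python
import logging
import sys
import time

# logging.StreamHandler flushes after every record, so the log stays
# readable even if the process has to be killed partway through.
logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                    format="%(asctime)s %(message)s")
log = logging.getLogger("failfast")


class BudgetExceeded(RuntimeError):
    """Raised when a job runs past its time budget."""


class TimeBudget:
    """Hypothetical helper that tracks a wall-clock deadline."""

    def __init__(self, seconds: float) -> None:
        self.deadline = time.monotonic() + seconds

    def expired(self) -> bool:
        return time.monotonic() > self.deadline

    def check(self, where: str) -> None:
        """Fail immediately, with context, once the budget is used up."""
        if self.expired():
            raise BudgetExceeded(f"time budget exceeded at {where}")


def run_job(steps: int, budget_seconds: float) -> int:
    """Stand-in for real work: each loop iteration is one unit of the job."""
    budget = TimeBudget(budget_seconds)
    for i in range(steps):
        budget.check(f"step {i}")  # die now rather than spin for an hour
        log.info("finished step %d", i)
    return steps
```

Take a program that normally runs in about 90 seconds. With a hypothetical budget of twice its normal runtime, such as `run_job(steps, budget_seconds=180)`, a bad code path would fail within three minutes, and the log would show how far it got.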

But how does that relate to wanting my business idea to fail faster? Well, imagine that a given business idea is in fact destined to fail. When would you rather find out? (a) In a week, before you invest millions of dollars and thousands of hours into the idea? Or (b) In a year, after you’ve invested millions of dollars and thousands of hours?

I’ll take option (a) a million times out of a million. It’s like asking if I’d like a crystal ball. Um, yes.

The operative principle here is “destined to fail.” When I’m fixing a reported bug, I know that once I create a reproducible test case for that bug, my software will fail. It is destined to fail on that test case. So, of course, I want my process of creating the reproducible test case, my software build process, and my program execution itself to all happen as fast as possible. Even better, I wish I had come up with the reproducible test case a year or two ago, so I wouldn’t be under so much pressure now. Because seeing the failure earlier—failing fast—will help me improve my product earlier.

But back to that business idea... Why would you want a business idea to fail fast? Why would you want it to fail at all? Well, of course, you don’t want it to fail, but it doesn’t matter what you want. What if it is destined to fail? It’s really important for you to know that. So how can you know?

Here’s a little trick I can teach you. Your business idea is destined to fail. It is. No matter how awesome your idea is, if you implement your current vision of some non-trivial business idea that will take you, say, a month or more to implement, not refining or evolving your original idea at all, your idea will fail. It will. Seriously. If your brain won’t permit you to conceive of this as a possibility, then your brain is actually increasing the probability that your idea will fail.

You need to figure out what will make your idea fail. If you can’t find it, then find smart people who can. Then, don’t fear it. Don’t try to pretend that it’s not there. Don’t work for a year on the easy parts of your idea, delaying the inevitable hard stuff, hoping and praying that the hard stuff will work its way out. Attack that hard stuff first. That takes courage, but you need to do it.

Find your worst bottleneck, and make it your highest priority. If you cannot solve your idea’s worst problem, then get a new idea. You’ll do yourself a favor by killing a bad idea before it kills you. If you solve your worst problem, then find the next one. Iterate. Shorter iterations are better. You’re done when you’ve proven that your idea actually works. In reality. And then, because life keeps moving, you have to keep iterating.

That’s what fail fast means. It’s about shortening your feedback loop. It’s about learning the most you can about the most important things you need to know, as soon as possible.

So, when I wish you fail fast, it’s a blessing, not a curse.

Monday, March 7, 2016

Loss Aversion and the Setting of DB_BLOCK_CHECKSUM

Within Accenture Enkitec Group, we have recently been discussing the Oracle db_block_checksum parameter and how difficult it is to get clients to set it to a safer setting.

Clients are always concerned about the performance impact of features like this. Several years ago, I met a lot of people who had—in response to some expensive advice with which I strongly disagreed—turned off redo logging with an underscore parameter. The performance they would get from doing this would set the expectation level in their mind, which would cause them to resist (strenuously!) any notion of switching this [now horribly expensive] logging back on. Of course, it makes you wish that it had never even been a parameter.

I believe that the right analysis is to think clearly about risk. Risk is a non-technical word in most people’s minds, but in finance courses they teach that risk is quantifiable as a probability distribution. For example, you can calculate the probability that a disk will go bad in your system today. For disks, it’s not too difficult, because vendors do those calculations (MTTF) for us. But the probability that you’ll wish you had set db_block_checksum=full yesterday is probably more difficult to compute.
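Here’s a minimal sketch that makes “risk is quantifiable” concrete for the disk case. It assumes the common simplification that each disk’s lifetime is exponentially distributed, with a constant failure rate of 1/MTTF, and that disks fail independently. The MTTF and disk count below are made-up illustrative values, not vendor figures.

```python
import math

HOURS_PER_DAY = 24.0


def p_failure(mttf_hours: float, window_hours: float) -> float:
    """Probability that one disk fails within the window, assuming an
    exponential lifetime with mean mttf_hours (constant failure rate)."""
    return 1.0 - math.exp(-window_hours / mttf_hours)


def p_any_failure(disks: int, mttf_hours: float, window_hours: float) -> float:
    """Probability that at least one of `disks` independent disks fails."""
    p_none = (1.0 - p_failure(mttf_hours, window_hours)) ** disks
    return 1.0 - p_none


# Illustrative only: 200 disks, each with a hypothetical 1,000,000-hour MTTF.
print(f"{p_any_failure(200, 1_000_000, HOURS_PER_DAY):.4%} chance of a failure today")
```

The corruption case is harder precisely because there is no vendor-supplied MTTF to plug into a formula like this.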

From a psychology perspective, customers would be happier if their systems had db_block_checksum set to full or typical to begin with. Then in response to the question,

“Would you like to remove your safety net in exchange for going between 1% and 10% faster? Here’s the horror you might face if you do it...”

...I’d wager that most people would say no, thank you. They will react emotionally to the idea of their safety net being taken away.

But with the baseline of its being turned off to begin with, the question is,

“Would you like to install a safety net in exchange for slowing your system down between 1% and 10%? Here’s the horror you might face if you don’t...”

...I’d wager that most people would answer no, thank you, even though this verdict is opposite to the one I predicted above. They will react emotionally to the idea of their performance being taken away.

Most people have a strong propensity toward loss aversion. They tend to prefer avoiding losses over acquiring gains. If they already have a safety net, they won’t want to lose it. If they don’t have the safety net they need, they’ll feel averse to losing performance to get one. It ends up being a problem more about psychology than technology.

The only tools I know to help people make the right decision are:

Talk to good salespeople about how they overcome the psychology issue. They have to deal with it every day.
Give concrete evidence. Compute the probabilities. Tell the stories of how bad it is to have insufficient protection. Explain that any software feature that provides a benefit is going to cost some system capacity (just like a new report, for example), and that this safety feature is worth the cost. Make sure that when you size systems, you include the incremental capacity cost of switching to db_block_checksum=full.
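Here’s one way to fold that cost into a sizing estimate, as a minimal Python sketch. The 1% to 10% range is borrowed from the hypothetical questions above, not from a measurement. The sketch also assumes that the overhead is extra work per unit of load, so required capacity scales as baseline × (1 + overhead). The 40-core baseline is made up.

```python
def sized_capacity(baseline: float,
                   overhead_low: float = 0.01,
                   overhead_high: float = 0.10) -> tuple[float, float]:
    """Return (low, high) capacity estimates once a feature that adds
    between overhead_low and overhead_high of extra work is enabled.

    Assumes the overhead scales linearly with load, so required capacity
    is simply baseline * (1 + overhead).
    """
    return baseline * (1 + overhead_low), baseline * (1 + overhead_high)


# Hypothetical example: a workload that needs 40 CPU cores today.
low, high = sized_capacity(40.0)
print(f"plan for {low:.1f} to {high:.1f} cores with db_block_checksum=full")
```

If you budget the safety feature up front, nobody experiences it as a loss later.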

My teammates get it, of course, because they’ve lived the stories, over and over again, in their roles on the corruption team at Oracle Support. You can get it, too, without leaving your keyboard. If you want to see a fantastic and absolutely horrifying short story about what happens if you do not use Oracle’s db_block_checksum feature properly, read David Loinaz’s article now.

When you read David’s article, you are going to see heavy quoting of my post here in his intro. He did that with my full support. (He wrote his article when my article here wasn’t an article yet.) If you feel like you’ve read it before, just keep reading. You really, really need to see what David has written, beginning with the question:

If I’ve never faced a corruption, and I have good backup strategy, my disks are mirrored, and I have a great database backup strategy, then why do I need to set these kinds of parameters that will impact my performance?

Enjoy.

Friday, January 8, 2016

The “Two Spaces After a Period” Thing

Once upon a time, I told my friend Chet Justice why he should start using one space instead of two after a sentence-ending period. I’m glad I did.

Here’s the story.

When you type, you’re inputting data into a machine. I know you like feeling like you’re in charge, but really you’re not in charge of all the rules you have to follow while you’re inputting your data. Other people—like the designers of the machine you’re using—have made certain rules that you have to live by. For example, if you’re using a QWERTY keyboard, then the ‘A’ key is in a certain location on the keyboard, and whether it makes any sense to you or not, the ‘B’ key is way over there, not next to the ‘A’ key like you might have expected when you first started learning how to type. If you want a ‘B’ to appear in the input, then you have to reach over there and push the ‘B’ key on the keyboard.

In addition to the rules imposed upon you by the designers of the machine you’re using, you follow other rules, too. If you’re writing a computer program, then you have to follow the syntax rules of the language you’re using. There are alphabet and spelling and grammar rules for writing in German, and different ones for English. There are typographical rules for writing for The New Yorker, and different ones for the American Mathematical Society.

A lot of people who are over about 40 years old today learned to type on an actual typewriter. A typewriter is a machine that uses rods and springs and other mechanical elements to press metal dies with backwards letter shapes engraved onto them through an inked ribbon onto a piece of paper. Some of the rules that governed the data input experience on typewriters included:

You had to learn where the keys were on the keyboard.
You had to learn how to physically return the carriage at the end of a line.
You had to learn your project’s rules of spelling.
You had to learn your project’s rules of grammar.
You had to learn your project’s rules of typography.

The first two rules listed here are physical, but the final three are syntactic and semantic. Just like you wouldn’t press the ‘A’ key to make a ‘B’, you wouldn’t use the strings “definately” or “we was” to make an English sentence.

On your typewriter, you might not have realized it, but you did adhere to some typography rules. They might have included:

Use two carriage returns after a paragraph.
Type two spaces after a sentence-ending period.
Type two spaces after a colon.
Use two consecutive hyphens to represent an em dash.
Make paragraphs no more than 80 characters wide.
Never use a carriage return between “Mr.” and the proper name that follows, or between a number and its unit.

The rules were different for different situations. For example, when I wrote a book back in the mid-1980s, one of the distinctive typography rules my publisher imposed upon me was:

Double-space all paragraph text.

They wanted their authors to do this so that their copyeditor had plenty of room for markup. Such typography rules can vary from one project to another.

Most people who didn’t write for different publishers got by just fine on the one set of typography rules they learned in high school. To them, it looked like there were only a few simple rules, and only one set of them. Most people had never even heard of a lot of the rules they should have been following, like rules about widows and orphans.

In the early 1980s, I began using computers for most of my work. I can remember learning how to use word processing programs like WordStar and Sprint. The rules were a lot more complicated with word processors. Now there were rules about “control keys” like ^X and ^Y, and there were no-break spaces and styles and leading and kerning and ligatures and all sorts of new things I had never had to think about before. A word processor was much more powerful than a typewriter. If you did it right, typesetting could make your work look like a real book. But word processors revealed that typesetting was way more complicated than just typing.

Doing your own typesetting can be kind of like doing your own oil changes. Most people prefer to just put gas in the tank and not think too much about the esoteric features of their car (like their tires or their turn signal indicators). Most people who went from typewriters to word processors just wanted to type like they always had, using the good-old two or three rules of typography that had been long inserted into their brains by their high school teachers and then committed by decades of repetition.

Donald Knuth published The TeXbook in 1984. I think I bought it about ten minutes after it was published. Oh, I loved that book. Using TeX was my first real exposure to the world of actual professional-grade typography, and I have enjoyed thinking about typography ever since. I practice typography every day that I use Keynote or Pages or InDesign to do my work.

Many people don’t realize it, but when you type input into programs like Microsoft Word, you should follow typography rules including these:

Never enter a blank line (edit your paragraph’s style to manipulate its spacing).
Use a single space after a sentence-ending period (the typesetter software you’re using will make the amount of space look right as it composes the paragraph).
Use a non-breaking space after a non-sentence-ending period (so the typesetter software won’t break “Mr. Harkey” across lines).
Use a non-breaking space between a number and its unit (so the typesetter software won’t break “8 oz” across lines).
Use an en dash—not a hyphen—to specify ranges of numbers (like “3–8”).
Use an em dash—not a pair of hyphens—when you need an em dash (like in this sentence).
Use proper quotation marks, like “this” and ‘this’ (or even « this »).
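If you have text that was typed with typewriter habits, you can apply some of these rules mechanically. This minimal Python sketch handles five of them with regular expressions. The patterns are simplified assumptions. For example, any run of two or more spaces after “.”, “!”, or “?” counts as a sentence break, and only a short, hypothetical list of units is recognized. It will also turn hyphens in phone numbers and ISO dates into en dashes. Review its output rather than trusting it blindly.

```python
import re

NBSP = "\u00a0"    # no-break space
EN_DASH = "\u2013"
EM_DASH = "\u2014"

# A deliberately small, hypothetical list of units to keep glued to numbers.
UNITS = r"(?:oz|lb|kg|g|km|m|cm|mm|GB|TB)"


def retypeset(text: str) -> str:
    # Two or more spaces after sentence-ending punctuation -> one space.
    text = re.sub(r"([.!?]) {2,}", r"\1 ", text)
    # A pair of hyphens -> a real em dash.
    text = text.replace("--", EM_DASH)
    # A hyphen between two digits (a numeric range like "3-8") -> an en dash.
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    # A number, a space, then a unit -> join them with a no-break space.
    text = re.sub(rf"(\d) ({UNITS})\b", rf"\1{NBSP}\2", text)
    # A title abbreviation and the following name -> no-break space between.
    text = re.sub(r"\b(Mr|Mrs|Ms|Dr)\. ", rf"\1.{NBSP}", text)
    return text
```

Proper quotation marks are left out on purpose. Deciding whether a straight quote opens or closes a quotation takes more context than a simple pattern can see.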

Of course, you can choose to not follow these rules, just like you can choose to be willfully ignorant about spelling or grammar. But to a reader who has studied typography even just a little bit, seeing you break these rules feels the same as seeing a sentence like, “You was suppose to use apostrophe's.” It affects how people perceive you.

So, it’s always funny to me when people get into heated arguments on Facebook about using one space or two after a period. It’s the tiniest little tip of the typography iceberg, but it opens the conversation about typography, for which I’m glad. In these discussions, two questions come up repeatedly: “When did the rule change? Why?”

Well, the rule never did change. The next time I type on an actual typewriter, I will use two spaces after each sentence-ending period. I will also use two spaces when I create a Courier font court document or something that I want to look like it was created in the 1930s. But when I work on my book in Adobe InDesign, I’ll use one space. When I use my iPhone, I’ll tap in two spaces at the end of a sentence, because it automatically replaces them with a period and a single space. I adapt to the rules that govern the situation I’m in.

It’s not that the rules have changed. It’s that the set of rules was always a lot bigger than most people ever knew.