Cary Millsap: design

Showing posts with label design. Show all posts

Monday, August 21, 2023

A Design Decision

This week, my team at Method R devoted some time to an enhancement request that required an interesting design decision. This post is about the analysis behind that decision.

The enhancement request was for our flagship product called Method R Workbench. It's an application that people use to mine, manage, and manipulate Oracle trace files.

One of its features, called mrskew, is a tool that allows a Workbench user to filter, group, and sort the raw dbcall and syscall data that Oracle Database processes write to their trace files. You can use mrskew from within the Workbench application, or from a *nix command line.

Here's an example of a mrskew command. It's what you would use to find out how long your program spent reading Oracle blocks from secondary storage. It will show you which blocks it read, how long they took, and how many times each block was read:

mrskew --name='db.*read' \
  --group='sprintf("%8d %8d", $p1, $p2)' \
  x_ora_1492.trc

Here's the output:

sprintf("%8d %8d", $p1, $p2)  DURATION       %  CALLS      MEAN  ...
----------------------------  --------  ------  -----  --------  
                  2        2  0.072918    1.0%     26  0.002805  ...
                 33   698186  0.051940    0.7%      1  0.051940  ...
                 50   339841  0.049261    0.7%      1  0.049261  ...
...

The important thing in this report is the meaning of the $p1 and $p2 variables. The combination of these two variables happens to represent the data block address (the file number and block number) of an Oracle block that was read by some kind of an Oracle read call. It would be nice for the report to tell you that instead of just telling you that the first two columns of numbers are the output of an sprintf function call.

We have a command-line option for that. The ‑‑group-label option lets you assign your own title for the group column. So, with some careful character counting, you could use…

‑‑group-label='    FILE    BLOCK'

…to get exactly the heading you want:

    FILE    BLOCK  DURATION       %  CALLS      MEAN  ...
-----------------  --------  ------  -----  --------  
       2        2  0.072918    1.0%     26  0.002805  ...
      33   698186  0.051940    0.7%      1  0.051940  ...
      50   339841  0.049261    0.7%      1  0.049261  ...
...

That makes sense. Now it's easy to see that Oracle has read one block (file #2, block #2) 26 times, consuming a total of 0.072918 seconds reading it.

The group label fits the output, only because of the careful character counting. The enhancement request was to allow the ‑‑group-label option to take an expression, not just a string. Like this:

--group-label='sprintf("%8s %8s", "FILE", "BLOCK")'

That way, he could print out the header he wanted, perfectly aligned, by just syncing his ‑‑group‑label expression to his ‑‑group expression, without having to count space characters that are literally invisible.

It's a smart idea. The group label option should have been designed that way from the beginning. We eagerly approved the enhancement request and began thinking about the design.

When we thought it through, we ended up with two different ideas about how we could implement this new idea:

Redefine ‑‑group‑label to take an expression instead of a string. mrskew will calculate the value of the expression before printing the column label.
Create a new option, say, ‑‑new‑group‑label, that takes an expression as its argument. And leave ‑‑group‑label as it is.

The first idea is how the enhancement request was worded. The second idea entered our minds because the first idea creates a compatibility problem: if we change the spec of the existing ‑‑group‑label option, it will break some existing mrskew scripts. For example, these will work in Workbench 9.2.5:

--group-label=FILE
--group-label="FILE BLOCK"

But if we redefine ‑‑group‑label to take an expression instead of a string, then these won't work anymore. People will need to quote their string expressions like this:

--group-label='"FILE"'
--group-label='"FILE BLOCK"'

In the end, we decided to redefine the existing option and live with the compatibility breach.

The way we make decisions like this is that we create strenuous arguments for each idea. Here are some of the arguments we considered en route to our decision.

First, the customer experience (cognitive expenditure).

Everyone who participated in the debate had the customer experience foremost in mind. But how can we objectively measure "customer experience"? How do you structure a scientific debate about the superiority of one experience over another?

One way to do it is to measure cognitive expenditure—the amount of mental effort that a user has to invest to get the desired outcome from our software. We want to minimize cognitive expenditure, to maximize a customer's return on investment of effort.

We began by realizing that responding to this enhancement request with one of our two ideas would necessarily force the user into one of two new regimes:

The syntax of ‑‑group-label has changed.
There's a new ‑‑new-group-label option.

In regime 1, our users would have to learn the new syntax. That's a cognitive expenditure. But it's a one-time expenditure, which is good. The new syntax would be consistent with the existing ‑‑group syntax, which is actually a cognitive savings for our users over what we have now. However, if a customer had saved any scripts that used the old syntax, then the customer would have to convert those scripts. That's a cognitive expenditure in a loop (one for each script), which is bad.

In regime 2, our users would have to learn about ‑‑new-group‑label, which is a cognitive expenditure. They'd still have to remember (or relearn) about ‑‑group‑label, too, which is a similar cognitive expenditure as the one in regime 1. They wouldn't have to modify any old scripts, but they would have to make the choice of whether to use ‑‑group‑label or ‑‑new-group‑label, every time they wrote a script in the future. That's another cognitive expenditure in a loop (one for each script), which is bad.

Second, the developer experience (technical debt).

We also need to consider the developer's experience. We don't want to create code that increases technical debt that makes the product unnecessarily difficult to support.

If we redefine ‑‑group-label, there's no long-term effect to worry about. But if we add ‑‑new‑group‑label to the story, I would expect for people to wonder, why are there two such similar group label options, when one (the one that takes an expression) is clearly superior? And why does the inferior one have the better name?

At some point in the future, I envision wanting to clean up the cruft and have just the one group label feature. Naturally, the right name for it would be ‑‑group‑label. But of course, changing the spec that way would introduce a compatibility problem. To make things worse, this would occur in the future when—one would hope, if our business is growing—such a decision would impact even more customers than it would today. So then, why create the cruft in the first place? It'll be a worse problem later than it is now.

The question that really seals the deal, is who will the change really affect? It's really a probability question about customer experiences.

Most users who use the Workbench application will never experience our group label option directly. It's there for everybody to use, but our Workbench has so many predefined reports built into it, most users never need to touch the group label option for themselves. When they do need to modify it, they're usually tweaking a report that we've predefined for them, which is a low–cognitive-expenditure experience.

In the end, Method R Corporation bears almost the entire cost of the ‑‑group‑label redefinition. It required us to revise:

The code for mrskew ‑‑group‑label
The prepackaged actions in Method R Workbench
The mrskew .rc files
The mrskew manual page
The Mastering Oracle Trace Data book
The Mastering Oracle Trace Data course

Most users will experience the benefit of the ‑‑group‑label change, without ever knowing that, once upon a time, it changed. And that's the way we want it. We want the product to be as smart as possible so that our customers get the most out of their investment, both cognitive and financial.

Thursday, May 19, 2016

Messed-Up App of the Day: Tables of Numbers

Quick, which database is the biggest space consumer on this system?

Database                  Total Size   Total Storage
-------------------- --------------- ---------------
SAD99PS                    635.53 GB         1.24 TB
ANGLL                        9.15 TB         18.3 TB
FRI_W1                       2.14 TB         4.29 TB
DEMO                         6.62 TB        13.24 TB
H111D16                      7.81 TB        15.63 TB
HAANT                         1.1 TB          2.2 TB
FSU                          7.41 TB        14.81 TB
BYNANK                       2.69 TB         5.38 TB
HDMI7                      237.68 GB       476.12 GB
SXXZPP                     598.49 GB         1.17 TB
TPAA                         1.71 TB         3.43 TB
MAISTERS                   823.96 GB         1.61 TB
p17gv_data01.dbf            800.0 GB         1.56 TB

It’s harder than it looks.

Did you come up with ANGLL? If you didn’t, then you should look again. If you did, then what steps did you have to execute to find the answer?

I’m guessing you did something like I did:

Skim the entire list. Notice that HDMI7 has a really big value in the third column.
Read the column headings. Parse the difference in meaning between “size” and “storage.” Realize that the “storage” column is where the answer to a question about space consumption will lie.
Skim the “Total Storage” column again and notice that the wide “476.12” number I found previously has a GB label beside it, while all the other labels are TB.
Skim the table again to make sure there’s no PB in there.
Do a little arithmetic in my head to realize that a TB is 1000× bigger than a GB, so 476.12 is probably not the biggest number after all, in spite of how big it looked.
Re-skim the “Total Storage” column looking for big TB numbers.
The biggest-looking TB number is 15.63 on the H111D16 row.
Notice the trap on the ANGLL row that there are only three significant digits showing in the “18.3” figure, which looks physically the same size as the three-digit figures “1.24” and “4.29” directly above and below it, but realize that 18.3 (which should have been rendered “18.30”) is an order of magnitude larger.
Skim the column again to make sure I’m not missing another such number.
The answer is ANGLL.

That’s a lot of work. Every reader who uses this table to answer that question has to do it.

Rendering the table differently makes your readers’ (plural!) job much easier:

Database          Size (TB)  Storage (TB)
----------------  ---------  ------------
SAD99PS                 .64          1.24
ANGLL                  9.15         18.30
FRI_W1                 2.14          4.29
DEMO                   6.62         13.24
H111D16                7.81         15.63
HAANT                  1.10          2.20
FSU                    7.41         14.81
BYNANK                 2.69          5.38
HDMI7                   .24           .48
SXXZPP                  .60          1.17
TPAA                   1.71          3.43
MAISTERS                .82          1.61
p17gv_data01.dbf        .80          1.56

This table obeys an important design principle:

The amount of ink it takes to render each number is proportional to its relative magnitude.

I fixed two problems: (i) now all the units are consistent (I have guaranteed this feature by adding unit label to the header and deleting all labels from the rows); and (ii) I’m showing the same number of significant digits for each number. Now, you don’t have to do arithmetic in your head, and now you can see more easily that the answer is ANGLL, at 18.30 TB.

Let’s go one step further and finish the deal. If you really want to make it as easy as possible for readers to understand your space consumption problem, then you should sort the data, too:

Database          Size (TB)  Storage (TB)
----------------  ---------  ------------
ANGLL                  9.15         18.30
H111D16                7.81         15.63
FSU                    7.41         14.81
DEMO                   6.62         13.24
BYNANK                 2.69          5.38
FRI_W1                 2.14          4.29
TPAA                   1.71          3.43
HAANT                  1.10          2.20
MAISTERS                .82          1.61
p17gv_data01.dbf        .80          1.56
SAD99PS                 .64          1.24
SXXZPP                  .60          1.17
HDMI7                   .24           .48

Now, your answer comes in a glance. Think back at the comprehension steps that I described above. With the table here, you only need:

Notice that the table is sorted in descending numerical order.
Comprehend the column headings.
The answer is ANGLL.

As a reader, you have executed far less code path in your brain to completely comprehend the data that the author wants you to understand.

Good design is a topic of consideration. And even conservation. If spending 10 extra minutes formatting your data better saves 1,000 readers 2 minutes each, then you’ve saved the world 1,990 minutes of wasted effort.

But good design is also a very practical matter for you personally, too. If you want your audience to understand your work, then make your information easier for them to consume—whether you’re writing email, proposals, reports, infographics, slides, or software. It’s part of the pathway to being more persuasive.

Monday, December 22, 2008

Cuts, Thinks About Cutting, Part 2

Thank you all for your comments on my prior post.

To Robyn's point in particular, I believe the processes of rehearsing, visualizing, and planning have extraordinarily high value. Visualization and rehearsal are vital, in my opinion, to doing anything well. Ask pilots, athletes, actors, public speakers, programmers, .... Check out "Blue Angels: A Year in the Life" from the Discovery Channel for a really good example.

But there's a time when the incremental value of planning drops beneath the value of actually doing something. I think the location of that "time" and the concept of obsessiveness are related. I'm learning that it's earlier than I had believed when I was younger.

The one sentence in my original blog post that captures the idea I want to emphasize is, "I'm not advocating that design shouldn't happen; I'm advocating that you not pretend your design is build-worthy before you can possibly know."

Thinking of all this reminds me of a "tree house" I wanted to build when I was about 10 years old. I use quotation marks, because it wasn't meant to have anything to do with a tree at all. Here was the general idea of the design:

My dad was a pilot, so when I was 10, I was too. The front of this thing was going to have an instrument panel and some windows, and I was going to fly this thing everywhere I could think of.

I drew detailed design drawings, and I dug a hole. (If you have children, by the way, make sure they dig a hole sometime before they grow up. Trust me.) My plan called for a 2x4 frame with plywood walls screwed to that frame. It's when I started gathering the 2x4 lumber that my plan fell apart. See, I had naively planned that 2x4 lumber is 2 inches thick by 4 inches wide. It's not. It's 1-1/2 inches thick by 3-1/2 inches wide. That meant I'd have to redraw my whole plan. Using fractions.

I don't remember exactly what happened next, but I do remember exactly what didn't happen. I didn't redraw my whole plan, and I didn't ever build the "tree house." Eventually, I did finally fill in that hole in the yard.

Here are some lessons to take away from that experience:

Not understanding other people's specs well enough is one way to screw up your project.
Actually getting started is a sure way to learn what you need to know about those other people's specs.
You learn things in reality that you just don't learn when you're working on paper.
Not building the "tree house" at all meant that I'd have to wait until I was older to learn that putting wood on (or into) the ground is an impermanent solution.
The problems that you think during the design phase are your big problems may not be your actual big problems.

Friday, December 19, 2008

Cuts, Thinks About Cutting

This xkcd comic I saw today reminded me of a conversation a former colleague at Oracle and I had some years ago. During a break in one of the dozens of unmemorable meetings we attended together, he and I were talking about our wood shops. He had built several things lately, which he had told me about. I was in the planning phase of some next project I was going to do. He observed that this seemed to be the conversation we were always having. He argued that it would be better for me to build something, have the fun of building it, actually use the thing I had built, and then learn from my mistakes than to spend so much of my time hatching on what I was going to do next.

I of course argued that I didn't want to waste my very precious whatever-it-was that I would mess up if I didn't do it exactly right on the first try. We went back and forth over the idea, until we had to go back to our unmemorable meeting. I remember him shaking his head as we walked back in.

Sometime after we sat down, he passed a piece of paper over to me that looked like this:

It said "cuts" with an arrow pointing to him, and "thinks about cutting" with an arrow pointing to me.

Thus was I irretrievably labeled.

He was right. In recent years, I've grown significantly in the direction of acquiring actual experience, not just imagined experience. I'm still inherently a "math guy" who really enjoys modeling and planning and imagining. But most of the things I've created that are really useful, I've created by building something very basic that just works. Most of the time when I've done that, I've gotten really good use out of that very basic thing. Most of the time, such a thing has enabled me to do stuff that I couldn't have done without it.

In the cases where the very basic thing hasn't been good enough, I've either enhanced it, or I've scrapped it in favor of a new version 2. In either case, the end result was better (and it happened earlier) than if I had tried to imagine every possible feature I was going to need before I built anything. Usually the enhancements have been in directions that I never imagined prior to actually putting my hands on the thing after I built it. I've written about some of those experiences in a paper called "Measure once, cut twice (no, really)".

Incremental design is an interesting subject, especially in the context of our Oracle ecosystem, where there is a grave shortage of competent design. But I do believe that a sensible application of the "measure-once, cut-twice" principle improves software quality, even in projects where you need a well-designed relational database model. I'm not advocating that design shouldn't happen; I'm advocating that you not pretend your design is build-worthy before you can possibly know.

Some good places where you can read more about this philosophy include...

Getting Real, a 91-essay e-book written by 37signals.
"Extreme programming: a gentle introduction", an introduction to a discipline for building things, wherein you can embrace the advantages of incremental design.
"Is design dead?", a paper by Martin Fowler that supports the idea that evolutionary design becomes viable in the presence of certain risk-mitigating practices.