It doesn’t take long to appreciate a great software tester. And it doesn’t matter if she’s a manual tester or writes automated tests, because what really matters are the types of tests being run: curious tests. Tests that don’t just discover a bug and quickly document it away in a ticket, along with the state of the whole world at the time of discovery. But instead, tests that try to find the exact circumstances in which the bug occurs.
The more defined those circumstances, the more helpful the ticket is to the developer; ideally, it takes their mind right away to the exact function that is responsible for the bug. In those cases, you can almost see the light bulb go off:
Tester: I’ve only seen the bug on the audio configuration screen, and it usually crashes the app after single-clicking the “source” input, but I’ve seen it a couple of times from the “save” button too. And it seems to only happen after a fresh install on Android 10.
Dev: ohhhh! That’s because the way we handle configuration in Android 10 changed and the file the audio source is saved in doesn’t exist anymore!
This is exactly the kind of dev reaction you want to a bug report. It’s an immediate diagnosis of the problem, which was only made possible by a very well-researched and described bug. But notice how that description could, with changes only to the jargon, have been written by an entomologist:
Entomologist: I’ve only seen the bug on a tiny island off the coast of Madagascar, and it’s usually blue with green spots, but I’ve seen a couple of them with yellow spots too. And it seems to only come out right after sunset in the wet season.
Which is kind of obvious when you think about it, because what do scientists do? They test the software that is our reality. Galileo’s gravity experiment is one of the most famous in history (and likely didn’t happen), but what is it, in software terms? He wanted to know if the rules of our universe took weight into account when pulling things toward the Earth. A previous power user, Aristotle, figured that the heavier a thing was, the faster it would fall. But that user failed to actually do any testing. So thank God that talented testers, like John Philoponus and Simon Stevin, came along and figured out that things mostly fall at the same rate through air, and then bothered to update the documentation.
What Aristotle did was assume the software worked in a certain way. Granted that he didn’t have the requirements to reference, but he probably noticed that you have to kick a heavy ball harder to go the same distance as a lighter ball, and he figured that the Earth kicks all things equally hard. That’s the equivalent of our tester above seeing the “source” input work on Android 9 and not bothering to test it on 10. Or seeing that it worked on the video configuration screen and not bothering to test it on the audio one too.
And that’s okay, because Aristotle was not a tester. He was more like a fanboy blogger. But what testers should be is bona fide scientists, like Simon Stevin, who follow the scientific method:
Ask a question
Form a hypothesis
Make a prediction, based on your hypothesis
Run a test
Analyze the results
In our example with the “source” input, after the tester saw it the first time, she probably did something like this:
“why did it crash?”
“maybe it was because I pressed the ‘source’ input”
“if so, that’ll make it crash again”
Relaunched the app, tried it, it crashed again.
“okay, that was definitely the reason”
Aristotle might stop there and file the bug: “app crashes when using the ‘source’ input”. And the developer would try replicating it on their Android 9 phone and kick the ticket back with “couldn’t replicate”, and that whole cycle would be a waste of time. But our tester asked another question:
“does it crash on this other phone?”
“if it doesn’t, it’s a more nuanced bug”
“I think it’ll crash though”
Tried it on the other phone: it didn’t crash
“what’s different about this phone?”
And she continued the scientific process like that, asking more and more pertinent questions, until the environment that our bug exists in was fully described. Which is exactly what you want in a bug report, because anything less will, in aggregate, be a productivity weevil, wasting both developer and tester time with double replication efforts and conjectures about the tester’s environment and back and forths. A clear, complete bug report does wonders for productivity.
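The narrowing-down described above can be sketched in code. This is only an illustration in Python: the environment dimensions are lifted from the example bug report, while the `crashes` function is a made-up stand-in for actually launching the app and trying it, and `necessary_conditions` is a hypothetical helper name. The idea is to try every combination and report only the settings that every crashing run has in common.

```python
from itertools import product

# Hypothetical environment dimensions, taken from the example bug report.
DIMENSIONS = {
    "android": [9, 10],
    "install": ["fresh", "upgrade"],
    "screen": ["audio", "video"],
    "control": ["source", "save"],
}


def crashes(env):
    """Stand-in for running the real app; an invented model of the bug."""
    return (
        env["android"] == 10
        and env["install"] == "fresh"
        and env["screen"] == "audio"
    )


def necessary_conditions(dimensions, run):
    """Return the settings shared by every environment in which `run` crashes."""
    names = list(dimensions)
    environments = [dict(zip(names, values)) for values in product(*dimensions.values())]
    crashing = [env for env in environments if run(env)]
    if not crashing:
        return {}
    first = crashing[0]
    # Keep a setting only if all crashing environments agree on it.
    return {
        name: first[name]
        for name in names
        if all(env[name] == first[name] for env in crashing)
    }


report = necessary_conditions(DIMENSIONS, crashes)
print(report)
```

In this toy model the "control" setting drops out of the report, which mirrors the tester noticing that the crash comes from the “save” button as well as the “source” input. A real tester rarely gets to try every combination, which is where the creativity and curiosity come in.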
So then, why not just teach all your testers the scientific method? Because it doesn’t work in the real world. We all learn the scientific method, but few of us become scientists. And I imagine that, just like in any profession, a not-insignificant number of scientists aren’t good scientists. Knowing things like the scientific method is necessary, but not sufficient to make a good scientist. You also need creativity, in order to ask the interesting questions, and more importantly, curiosity to keep the process going until its natural conclusion: to uncover the whole plot.
Tangentially, curiosity is a hugely important trait in great developers, too. But for testers, even more so.
This is the third and final part of a series on opera and software. Part 1 explained why they’re related, part 2 explained the development process used to produce an opera, and this part explores what can be applied to the software development process.
But first, the disclaimers: I’ve never worked backstage anywhere else, so I don’t know how other opera houses operate — they could be exactly the same, or wildly different. Also, everything in this article comes from my experience, and nothing is from Sarasota Opera — they don’t know I’m writing this (and hopefully, they’ll be pleased if they do ever come across it). Finally, what I do know of the production process comes from years of observation from the sidelines, from the role of a super, so some stuff in here could be just plain wrong; I’ll try to make it a point to underline where I know my knowledge is shaky, but there might be things I’m more sure about than I should be.
Let’s just dive in by recapping the operatic development process using software terms:
The product manager writes functional specs (the libretto)
The architect (composer) writes technical specs (the sheet music) based on the functional ones
Management chooses a reasonable release date
About a year before the release date, development begins on the infrastructure and wireframes (stage, sets, costumes), congruent to the functional and technical specs.
The infrastructure may well be sourced from elsewhere, or built in-house — whatever makes the most financial and artistic sense
The main components (principal singers and stage management) are identified
With roughly six months to go, the technical lead (conductor) learns the technical specs (sheet music) backwards and forwards and begins work on the main components (the principal singers)
Just as, in software, the architect and the lead developer could be the same person, so in opera the composer and conductor could be the same person: Verdi conducted the first runs of all of his operas, and so have most composers. But after that first run, their operas are conducted just as well by other conductors.
The UX designer (director) begins fleshing out the UI (choreography)
Development begins on smaller components (auditions for smaller parts, chorus)
Support infrastructure is stood up (props, rehearsal rooms, living arrangements)
Management meticulously schedules acceptance testing (rehearsals), including bug fixing time.
A couple of months before release, major development is done (the choreography, principal parts, sets, and costumes are all largely in order), the infrastructure is in place, and developer testing begins (all cast members rehearse at home)
A month before release, developer testing is complete, and development is largely frozen: the infrastructure is ready, and the entire cast is on site and knows their parts and music. Developer testing on the main components has been especially rigorous.
Acceptance testing begins:
The testing phase is the climax of the process. All hands are on deck, for very full days, for the duration.
It takes place in a lightweight mock environment (the off-stage rehearsal room) with mock props and costumes
The mock environment is designed to be easily reset and changed from one test to another, which is not the case for the production environment (the stage with sets)
The architect, UX designer, developers, and management are all present
Each workflow (scene) is thoroughly tested
The testing harness supports starting from any point in a workflow, stopping at any point, and resetting to any point, so that the entire workflow doesn’t have to be run every time, making it possible to easily test every individual feature
The testing harness supports various levels of integration testing, from interaction of only main components with mocked minor components, to full end-to-end testing
Project management keeps track of each test’s outcome against the specs and design
Project management files and tracks enhancement requests and bug reports (changes and corrections to choreography and singing) made by the developers and designers, who update specs and designs as needed
The tech lead and UX designer make tweaks as needed and file bug reports, which are worked on with high urgency
Once the tests are going acceptably well, testing moves from the mock environment and into a real one: the actual stage
More enhancements and fixes are made until the UX designer and developer are happy with the outcome
Beta users (coworkers, friends, early adopters) are brought in for final testing. There are very few, if any, surprises at this point.
The product is released on opening night
A point release or two may quickly follow, with minor changes (in the first couple of performances)
That should all be very familiar, but how does that differ from the process in most software shops? From my experience, it’s the beginning and the end: the design and the testing. Most shops focus on the middle: the development itself, with testing being a necessary evil, and design often being a quick drawing with some boxes and arrows, maybe just in someone’s head.
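The rehearsal-style harness from the list above, the one that can start, stop, and reset at any point in a workflow, could look something like this in software. It is a minimal sketch in Python under my own assumptions: the `Workflow` class, the `make_step` helper, and the step names are all hypothetical and not taken from any real testing tool. It saves a checkpoint of the state before each step, so a later run can pick up from any step that has been reached before.

```python
import copy


class Workflow:
    """Hypothetical harness: ordered, named steps that transform a shared state."""

    def __init__(self, steps, initial_state):
        self.steps = steps  # list of (name, function) pairs
        self.names = [name for name, _ in steps]
        # checkpoints[name] holds a copy of the state just before that step runs
        self.checkpoints = {self.names[0]: copy.deepcopy(initial_state)}

    def run(self, start=None, stop=None):
        """Run from `start` through `stop` (inclusive), resuming from a checkpoint."""
        first = self.names.index(start) if start else 0
        last = self.names.index(stop) if stop else len(self.steps) - 1
        if self.names[first] not in self.checkpoints:
            raise ValueError(f"no checkpoint for {self.names[first]!r} yet")
        state = copy.deepcopy(self.checkpoints[self.names[first]])
        for index in range(first, last + 1):
            _, step = self.steps[index]
            state = step(state)
            if index + 1 < len(self.steps):
                # Remember where the next step starts, so it can be rerun alone.
                self.checkpoints[self.names[index + 1]] = copy.deepcopy(state)
        return state


executed = []  # records which steps actually ran, across all runs


def make_step(label):
    def step(state):
        executed.append(label)
        state["log"].append(label)
        return state
    return step


scene = Workflow(
    [(name, make_step(name)) for name in ["enter", "gasp", "move_upstage", "exit"]],
    {"log": []},
)

full = scene.run()                                      # the whole scene, once
middle = scene.run(start="gasp", stop="move_upstage")   # "take it from the gasp"
```

The second call reruns only two steps, yet the state it returns looks as if the scene had been played from the top up to that point, which is the property that makes it cheap to test each feature on its own.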
Don’t Focus on the Coding
The above process takes probably about 18 months, and the first half of it is all design. Main feature development takes about six months, and testing takes half of that again. So a full two-thirds of the process is design and testing — not development. On top of that, at the end of that six months of development, everything is already well-tested by the developers, with unit tests and rudimentary integration and functional tests. Which means that even less of that six months is actually spent on developing features; maybe as much as half that time. Counting that developer testing, we may be looking at nine months of design, three months of development, and six months of testing.
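As a quick check of that arithmetic, here is a small Python sketch. The variable names are made up for illustration, the month counts are the rough estimates from the paragraph above rather than measured figures, and the "half of development is developer testing" split is the same guess made there.

```python
TOTAL_MONTHS = 18

design = TOTAL_MONTHS / 2        # the first half of the process is design
development = 6                  # main feature development
acceptance_testing = development / 2

# The three phases should account for the whole timeline.
assert design + development + acceptance_testing == TOTAL_MONTHS

# First pass: design plus acceptance testing, as a share of the whole.
first_pass_share = (design + acceptance_testing) / TOTAL_MONTHS

# Second pass: assume half of "development" is really developer testing.
developer_testing = development / 2
coding = development - developer_testing
testing = acceptance_testing + developer_testing

breakdown = {"design": design, "coding": coding, "testing": testing}
for name, months in breakdown.items():
    print(f"{name}: {months:g} months, {months / TOTAL_MONTHS:.0%} of the cycle")
```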
At this point, you may be aghast at the idea of spending less than 20% of a release cycle coding, and 50% of it designing. The important thing to remember is that designing is programming. The difference is that it’s at a higher level, and it consumes almost all of the research time. It’s a more efficient, and more disciplined, approach than starting to code with only a vague plan, and changing direction as problems arise, pausing to learn new technologies, finding the right widget library, and then realizing that the protocol you started with doesn’t support one of your main features as you wrote it, which means going back to the drawing board.
That going back and re-coding, that’s what wastes a lot of time. That’s why the opera has so much design up-front and why a release date isn’t even chosen until the detailed design is complete, and thoroughly understood. And the opera is not alone, because every disciplined engineering profession does the same. I’ve never worked in civil engineering, but I’m pretty sure that every iron nail in a highway overpass is designed and placed before one worker arrives on site. A big design up front — not coming down a waterfall, but rather in a process that is flexible and agile — documented in great functional specs, is the hallmark of the timely and efficient building of anything.
When done well and thoroughly, it front-loads the vast majority of the variability. If you spend the first half of the cycle coming up with a great design, the last half should be an exercise in typing. The great screenwriter Aaron Sorkin said “You have to think about what you’re going to write before you write it”. Software works exactly the same way. The more specific and detailed your thinking, and the design document that comes out of that, the fewer surprises there will be in the development and testing phases.
This means the design should address gotchas head on: every technology used should be well-understood, it should be vetted to make sure it works like the architect thinks, and the design document should be as detailed as possible, with APIs, flowcharts, state machines and anything else that can reduce risk and variability in the second half of the cycle. Plus, that document should be very readable, and even entertaining, so that the variety of audiences that read it (devs, managers, QA, tech writers, etc) don’t fall asleep doing so.
If I’ve taken one thing away from my opera experience, it’s the importance of a thorough design up front, in which literally every note is written before a singer is hired.
If there are two things I can take away from the opera, the second is definitely the outsized scope of testing. At the opera, testing is not only built into every part of the process — from design to development to release — but everyone is also laser-focused on it. Conductors play their music on the piano to make sure it doesn’t just sound good in their head. Singers practice for hours a day with coaches, who you can think of as the developers of singing. They essentially run unit tests all day long, making changes until the output (the voice) matches the spec that is the sheet music.
And once rehearsals start, everything falls in place, for the most part, with ease. There are always changes because a singer turned out to be taller than the director imagined, so he has to be placed further downstage; and three people in the chorus constantly put the emphasis on the wrong syllable of some Italian word they’ve never heard of, so the maestro has to explain Italian pronunciation yet again.
Enhancements and bugs are built into the acceptance testing phase, but they’re assumed to be minor. Easily changed and corrected, with relatively little effort from the developer. You don’t find out that the soprano can’t hit the high note at this stage, because she auditioned long before. You don’t find out that the set design is all wrong, because an intricate miniature replica was built first. If the main singer gets the flu, there’s a cover waiting to step in, having shadowed the singer at every rehearsal. Everything has been vetted, pre-tested, and every contingency addressed so that there are no show stoppers. Even without any changes at this phase, the opera would be good. The changes here are the little details that make it great: a flawless, amazing performance.
To get there, the testing has to happen in design and during development first. The acceptance testing has to involve all the roles in the production. And it has to be properly planned, with ample time scheduled for it, so that its success is the number one priority for everyone during that time.
By now, I’ve been to dozens of operas, many on opening nights, many in the audience and many backstage or on-stage, and I can count on one finger the number of major blunders I’ve experienced. It was about the fifth performance of a production of Turandot, when the king repeated the same line twice. People that weren’t paying attention to the Italian wouldn’t have even noticed, but someone else played the part the rest of the run.
I’ve seen many more small issues, like people dropping prop coins or papers, walking out the wrong exit, or forgetting to take a jacket off. But the lack of issues in general is astounding, to say nothing of the ability to release on time to the day, and on budget, year after year after year. All of that comes solely from the company’s ability to front-load risk, which then lets them plan the release with certainty, and devote ample time to testing and refactoring, so that the final product is an absolute gem.
This is the second of a three-part series on opera and software. Part 1 explained why they’re related, this explains the process used to produce an opera, and the last part explores what can be applied to the software development process.
Before I go any further, some disclaimers are in order: I’ve never worked backstage anywhere else, so I don’t know how other opera houses operate — they could be exactly the same, or wildly different. Also, everything in this article comes from my experience, and nothing is from Sarasota Opera — they don’t know I’m writing this (and hopefully, they’ll be pleased if they do ever come across it). Finally, what I do know of the production process comes from years of observation from the sidelines, from the role of a super, so some stuff in here could be just plain wrong; I’ll try to make it a point to underline where I know my knowledge is shaky, but there might be things I’m more sure about than I should be.
The stuff that happens early on is what I’m least sure about, because I generally start working on an opera during off-stage rehearsals, which is essentially the “acceptance testing” phase. But from what I’ve gathered, this is what happens before then:
The libretto is written. This is like a movie script and contains the dialogue and stage direction. For most operas performed today, this was written long before your grandparents were born.
The libretto is set to music. The composer takes the dialogue (and the gaps therein) and creates music that sounds nice to people who like operas. Again, this has usually been done at some point in the 1800s, but even for new operas, it would be at least somewhat before anything else happens. And now you have an opera — at least, on paper.
The opera is slotted into a pre-existing Friday opening, about 18 months in the future. By the time one season starts, the next season’s roster of operas and their exact release dates have already been made public.
Set and costume design are started. This is one of the longest phases, because a set may have to be created from scratch, and sets are big and can get complex. They actually take up most of the backstage area. Renting them is an option, if the size and shape of the stage is pretty standard. Costumes also may have to be made — sewn in the opera’s costume shop — but sourcing ready-made ones is time consuming too, since an opera with a full chorus and a bunch of settings could easily require over 50 costumes. And sizing is a thing to deal with on top of it all.
About a year before opening, the principal singers and stage management are hired and contracted for the work, which actually starts some months later. There are usually about six principal singers and maybe a couple of more minor roles.
The conductor starts learning the music and hires the orchestra.
The director starts choreography design, figuring out what happens in each scene: where, when, and how people enter; what path they walk; how they act, what they physically do; and where, when, and how they exit the stage. Some of this is dictated by the libretto, but most is not.
A couple of months later, general auditions take place for the chorus, which is about two dozen people.
A few months before opening, props start to be created or sourced: swords, treasure chests, torches, faux violins. The set and costumes are done, but not tailored.
The rehearsal schedule is completed — for two full months of rehearsals. Since four operas are being produced at the same time, only parts of each opera are rehearsed in each session (usually one particular act or scene), and rooms and the stage have to be scheduled so that rehearsals can happen in parallel. Also, the chorus usually appears in all four operas, so the scheduling has to take into account that they can’t be in two places at once.
A couple of months before, the cast is all set and the various design work is mostly done. The cast starts rehearsing the music on their own.
About one month before, the cast arrives on site (they come from all over the country, and internationally) and rehearsals actually begin.
This is the point where I enter. By the time of my first rehearsal, the principal actors have been rehearsing for weeks, and the larger chorus for some days. Principals have something like 10x the stage time of the chorus, so they have a lot more to work on.
The rehearsals early on are held in a large rehearsal room instead of the stage, to minimize labor costs associated with setting the stage. Instead, the floor of the room is marked with tape outlines of the sets, with different colors indicating different scenes. So Act 1 might be in red tape and outline some stairs and a couch, while Act 2 might be in blue tape and outline the same stairs, plus two doors and a table with chairs. When you rehearse Act 2, you walk around the blue tape and ignore the red tape.
There are generally a lot of new people, so everyone gets a name tag in the beginning. In each rehearsal, in addition to the performers, the conductor is present, along with a pianist, the director, a stage manager, and two assistant stage managers: one for stage left, one for stage right. Principals all get cover singers, to put the bus factor at 2, and their covers are also there.
Since the choreography is already designed by now, the director tells everyone what’s going to happen in the scene. No one is in costume, except principals might wear something a drunk person might think is their costume, to get used to walking around and singing with, say, a sword or a heavy coat. The singers have already memorized their music, and performed it in front of a maestro and gotten notes on what to change.
An important thing to understand is that everything in an opera is cued by music, and the music is cued by the maestro. She moves her baton at a certain tempo, and all of the singers and orchestra musicians keep that tempo. When she stops, they stop and when she starts, they start. If they start to lead or lag her, she stops and yells at them. “Can everyone see me? Can you, Jeremy? Because if you can, why are you half a measure behind?” The conductor is essentially the system clock, for a multi-processor system. Or maybe an NTP server.
Because of this, the cast’s actions are also cued by the music and it follows that the director’s directions are usually given relative to some piece of music: “when Isabella gasps, you three move upstage, and you go put your hand on her shoulder. Then, on the beat after ‘da dada da dun’ everyone but the Don exits the stage. Ok, let’s take it from two bars past 62”. It’s amazing to me that the singers can start from any part of the entire opera, with a second’s notice.
Stage management keeps a giant binder with the sheet music for every role and marks where in it things are supposed to happen — like stage entrances and exits. After everyone verbally understands what’s supposed to happen, we run that part of the scene, with piano music. Here usually, someone omits doing something — walking somewhere, facing someone, fake laughing — something got forgotten or misunderstood in the space from the director’s mouth to the cast, so we reset and run again. Sometimes, the director, having seen it run with actual people, decides to change where people stand or face or who they fake banter with, so we reset and run again. Later, when we rehearse on stage with the actual set, sometimes things are tweaked even more. And if the singing is off, the maestro stops the music, yells admonishments, we reset and run again.
Once the director and maestro are happy with the way things are running, they layer in any additional people in another rehearsal. For example, sometimes we have rehearsals just with the supers, or the supers and chorus but not principals, for various scheduling reasons. And when everyone knows their parts well enough, they’re all brought in to the same rehearsals.
The off-stage rehearsals are generally spread over three weeks, with each scene being rehearsed about three times, depending on the complexity. Once in a rare while, things might be going so poorly that they might add rehearsals; as an unpaid volunteer doing this for 20 hours a week on top of your day job, that’s what you really don’t want.
Finally, it’s about three weeks before opening, and we get to rehearse on-stage! There are only a handful of rehearsals left at this point, but most of them are of the entire opera, start to finish:
A couple of rehearsals with only the piano accompaniment still
A piano dress rehearsal — meaning still only the piano, but full costumes for everyone, though only principals get makeup
A rehearsal with the orchestra but no costumes
Finally two full dress rehearsals with the orchestra.
These last two are run exactly as if they were live performances, with beta testing users being brought into the audience, in the form of friends, family, and big shot donors.
During these on-stage rehearsals, the lighting people are also testing the colors and intensities to make sure they create the proper environment and mood, the costume and makeup designers check to make sure everything looks good from the audience, and the director walks around the whole opera house to make sure various vantage points can all clearly see the important action.
For the last rehearsal, they generally bring in a couple of classes of middle or high school students, and again: it’s run exactly as it will be on opening night, with PA announcements, supertitles, intermissions, curtains, etc.; even the bows at the end are rehearsed. The only differences with respect to a real performance are that the conductor might still stop the music if things are going too badly, she gives notes to the orchestra at the beginning of each act, and the director hovers around the whole house looking for final tweaks to make.
After opening night, things are mostly on rails. The director might issue notes with small things the cast is doing wrong or she’d like to change, someone might need musical coaching if they’re too loud or soft or whatever, but by and large everything is set at that point, and it all just runs like clockwork for a dozen performances. That is actually the inflection point for when it gets really boring. After the jitters of opening night, it all becomes so routine that there’s not much excitement in it anymore, but that’s good, because then you can finally focus on the performance.
In the next and final part, we’re going to talk about how all of this applies to software development.
Over the past decade, I’ve done eight seasons with the Sarasota Opera. The usual reaction I get is either “Wow, I didn’t know you could sing!” or “I could totally see you as an opera singer!” I’m never quite sure if that second one is meant to be a compliment, but in either case: I can’t hold on to a note any better than to a cat. So instead of a Pavarotti-type, I’m what’s called a supernumerary, which is Latin for “extra numbers” — a non-singing extra, normally just shortened to “super”. (Because I’m male, they usually refer to me as a “super man”, which is awesome.) Also, Sarasota Opera is a very professional house — which is actually the reason for this essay — and only hires phenomenally qualified singers, who’ve formally trained for years. I, on the other hand, still haven’t figured out what a “clef” really does.
My parts are usually some kind of guard or soldier off in the background, my only reason for existence being to underline the importance of other people by escorting them on and off stage — or because they’re prisoners. Sometimes the parts are beefier, like a very cool executioner in the degradation scene in Jérusalem, or a nimble thief that scales the wall to kidnap Gilda in Rigoletto. My last role was a guy in a band in La Wally, faux-playing a faux-violin on top of a table, while trying to keep time with the actual violins in the orchestra pit, during a lively party that, of course, ends in disaster.
At this point, you might be thinking “this is neat and all, but what does it have to do with software?” — and that’s a perfectly valid thought. The answer is that over these long years, I’ve noticed some very interesting parallels between the production of an opera and the production of software. From my first day there back in 2010, Sarasota Opera amazed me with its clockwork precision, streamlined efficiency, and amazing courtesy and professionalism toward everyone, from principal singers, to lowly supers. And every year but this one, they deliver five operas: all on-time (same Friday every year), all impeccably sung, directed, and designed, generally getting great reviews, and involving a crew of several dozen people over a timeline of more than a year.
Can the average software company make the same claim? Apple and Google can: iOS and Android are released on a yearly cadence, like clockwork, and for the most part, they are masterful works with serious bugs being rare. However, in my experience, most other software releases are late and buggy. They’re probably buggy because they’re late and were rushed out the door, and they’re late because things didn’t go according to plan. So how does the opera — in essence a small, non-profit business — stay on plan and deliver such consistent and high quality results? What parts of that are and aren’t applicable to software development? And how much does the operatic world even have in common with that of software?
To answer the last question first, a lot:
Both follow the same process: design, development, testing, narrow release, wide release
Both produce highly technical works of art. Music, orchestration, choreography, and stage production are all, of course, highly technical. And if you don’t think software is a work of art, just think about how important good design is to user adoption.
Both have audiences that are intolerant of bugs
Both are staffed by highly trained professionals
Both have administrative staff managing “the talent”, plus support staff making everything actually work.
Both have the same constraints: deadlines, budgets and headcounts
Both require a great deal of planning and design in order to be done well
Both have a number of specialized departments that work in concert (pun intended): developers, testers, DevOps, ProdOps, UX, and product management on one side vs singers, orchestra, props, costumes, makeup, lighting, set design, and stagehands on the other.
Both have hundreds of components interacting with each other, even in parallel. Opera is generally massively parallel, especially if counting the orchestra.
Both are run from compiled code: musical notation is conceptually the same thing as binary code, with each singer or musician being the processor that executes the instructions for their particular role
There are more, but ten is a nice round number. And hopefully the reason this essay exists makes more sense now. If you’re still keen, the next part talks about how an opera is produced and finally in the third part, what can be applied to the software development process — which is that projects should focus on design and testing, rather than coding. Oh wait… did that just spoil the ending? Because if not, you should also know that the other two parts are much longer, too.