The Case for the Diagnostics Team

I recently watched a lecture by Kevin Hale, who co-founded a startup named WuFoo back in 2006, grew it over five years to millions of customers, and sold it to SurveyMonkey for $35M. He subsequently became a partner at Y Combinator for several years. The lecture was about making products people love, and one of the points he made was around WuFoo’s obsession with the customer:

  1. Each team member had a turn in the customer support rotation
  2. Their response time to customer support issues was a few minutes during the day, and a little longer at night
  3. They hand-wrote personalized thank you cards to random customers weekly
  4. Even though their business (form creation) was dry, the website was designed to be fun and warm, not business-y

It’s a great 45-minute video, and absolutely worth watching; it’s embedded down at the end. But what really drew my attention was that first point above, about everyone doing a customer support rotation. And that’s because at Voalte, which also had a customer obsession, we took a similar approach that we called The Diagnostics Team.

Voalte is a communication platform for hospital staff

The team was like the cast of House: expert detectives in their domain who could tackle the hairiest problems, sometimes getting that “eureka!” moment from the unlikeliest of events. I/O throughput was our lupus.

The mission was a take on the support rotation, but with some twists:

  1. The team handled “Tier 4” support issues: the kind of stuff where a developer with source code knowledge was needed because the previous three tiers couldn’t figure out the issue.
  2. It was cross-functional, so that each codebase (Erlang backend, iOS, Android, JavaScript) was represented on the team
  3. The rotation was 6 months
  4. The team priorities were:
    1. Any urgent issues
    2. Code reviews, with a support and maintainability point of view
    3. Any customer-reported bugs
    4. Proactive log analysis, to find bugs before they’re noticed in the field
    5. Trivial, but noticeable bugs that would never get prioritized by the product teams
  5. Team members nominally did at least one customer visit during those 6 months

The model worked really well, and I think the team is still around, two acquisitions later, at Baxter. It wasn’t perfect (we never got that good at proactive log analysis while I was there, and customer visits ebbed and flowed depending on priorities and budgets) but overall, we hit the goals. “And what were those goals?” you ask. I’m glad you asked!

Cast of House, season 1

Remove uncertainty from the product roadmap

This was the main reason I pitched the idea of a Diagnostics team. After our initial release of Voalte Platform, we were constantly getting team members pulled off of product roadmap work in order to take a look at some urgent issue that a high profile customer was complaining about. And you could never tell how long they’d be gone: a day? a week? 3 weeks? How long does it take to find your keys? And if we had a couple of these going on at the same time, it would derail an entire release train.

The thinking was that a dedicated team to handle those issues, while costly, would probably cost less than the revenue lost to release delays, and would also save us money in the long run by preventing urgent issues.

And it worked: our releases became a lot more predictable. Not perfect of course, but a big improvement.

Keep a focus on customer needs and pain-points

Our customers were hospitals and we wanted to make sure things worked well in our app, because lives were literally on the line. Having a team that was plugged in to the voice of the customer meant that fewer complaints fell through the cracks of prioritization exercises. And while the Diagnostics team generally didn’t build features, once in a while they did, if the feature fixed a big pain point.

Being Tier 4 support, though, is one major way in which this differed from WuFoo’s model, because the team wasn’t as exposed to the Tier 1 issues that were known to the frontline customer support people. When developers hear about a frustrating bug for the 4th time, they tend to just go ahead and fix it. But if they’re only exposed to that bug via a monthly report, it won’t frustrate them as much.

Our ideal here, though, was to crush the big rocks, improve operational excellence so that no more big rocks formed, and then let the team focus on the pebbles. We had varying success on this, depending on the codebase.

The other prong was customer visits. Each developer would pick a hospital and arrange a ~2 day visit. The hospital would generally assign them a buddy, and they would get the ground truth both from that buddy and by walking around to as many nurses’ stations as possible and asking them about the app.

Most of the time, the nurses wouldn’t have anything to say. When they did, most of the time it was some known problem. But like 10% of the time, it would be revelatory: some weird issue because they tapped a combination of buttons we’d never thought of, or used a feature in a completely different way than we intended. And we’d write debriefs of the visit after the fact to share with the team.

No matter what was learned on the trip though, the engineers came back with a renewed sense of purpose and empathy for the customer, not to mention a much better understanding of how hospital staff work and use the product.

The House version of customer visits was rotations in the free clinic.
Great supercut on how not to act on your customer visits.

Improve the quality of the codebase over time

One of the things we were worried about in creating this team was that it would disconnect the developers on the product teams from the consequences of their actions. They’d release all kinds of bugs into the field, never be responsible for fixing them, and so never improve. This was part of the reason we wanted Diagnostics to be a rotation. (Though, it ended up mostly not being a rotation, but more on that later.)

Our main tactic to prevent this problem was to make the Diagnostics team a specific and prominent part of the code review process. Part of the team’s remit was to review every PR for the codebase they worked in, and look for any potential pitfalls around quality and maintainability. Yes, those are already supposed to be facets of every code review, but:

  1. The Diagnostician would have a better sense of what doesn’t work, and
  2. They would have more of a stake in preventing problematic code from seeing the light of day

Build expertise around quality and maintainability

To our great surprise, at the end of the team’s very first 6 month rotation, half of the members wanted to stay on indefinitely. They found the detective work not only interesting, but also varied in its breadth and depth, and fulfilling in a way that feature work just isn’t.

We debated whether to allow long-term membership on the team, because we did want to expose all of the team members to this kind of work. But ultimately, we decided that the experience these veterans would build would be more valuable to the effort, especially when combined with them sharing that experience through code reviews and other avenues.

Over the years, they got exposed to more and more issues reported by customers (which are the ones that matter most), and they developed an intuition about what bothers customers most and what kinds of mistakes cause those issues. They also developed a sense of which programming patterns cause problems for the Diagnosticians themselves, both in terms of monitoring and observability, which they need in order to diagnose issues easily, and in terms of refactoring code to fix problems. And they learned what characteristics problematic components have in common.

That’s the kind of insight from which arises the most valuable part of the return on investment: preventing painful tech debt and convoluted bugs from ever getting shipped. It more than makes up for the cost of the team.

Two More Annoying iOS 11 Bugs

So far, iOS 11 is maybe the buggiest release Apple has put out: it’s had eight updates to fix a bunch of issues, like the infamous A ⍰ bug, and while you’d think they’d have ironed most serious things out by now (11.2.1), I ran into two more, and verified them with Apple support on the phone.

I’m posting them here because I scoured the Internet for answers on them and got nowhere, which is why I resorted to talking to Apple on the actual phone, and hopefully this will help you avoid doing the same thing. I’ll also update this post if and when they get fixed.

Bug the First: Edit and Share buttons are grayed out on some photos

This one is straightforward, super annoying, and mind-boggling: I can’t edit or share some small minority of photos. There seems to be no rhyme or reason as to which photos. The guy I talked to at Apple couldn’t find the issue in their database, even though there’s at least one thread on it on discussions.apple.com (though it’s been erroneously closed).

Workaround: import the photo into an app that’ll let you re-save it. I used Camera+.

Bug the Second: On the first iCloud backup, you can’t choose what to (not) back up

Let’s say you have 5GB of Photos and you wanna do an iCloud backup of just the phone settings, because the photos are backed up already in some other way, and you got a new phone that you want to transfer your settings to. The settings themselves should be around 200MB.

Assuming you haven’t wasted money on iCloud and your limit is still 5GB, even if you have nothing in iCloud, when you turn on Backup it tells you there’s not enough space in iCloud, because it’s trying to upload your photos also.

That’s it: you’re stuck. There is no way to get past this without buying more iCloud space.

In previous versions of iOS, you could switch off Photos, for instance, as well as iMessage and everything else, so you didn’t end up uploading the entire phone’s contents to the cloud. And you can still do this, on a backup-by-backup basis, but only — and this is the maddening part — if you already have a backup in iCloud.

Otherwise, there’s no way to control what goes up there or even see what the Next Backup Size will be.

Choosing what to back up was so easy in iOS 10, and I was so incredulous that it wasn’t in iOS 11, that I went to the Apple Store so they could show me what pathway I was missing. After that went nowhere, I brought it up with the guy on the phone, and he confirmed this is how things are now.

There’s a StackExchange question out there about this, but the problem seems to be too nuanced to have made it onto Apple’s radar.

Workaround: use an iTunes backup to restore the settings onto the new phone.

I’ll update this post if and when these things get fixed.