Designing Business Experiments

As the COVID crisis drags on, forcing all of us to make assumptions rather than rely on facts, a question I get a lot is how to design checkpoints or experiments to learn more about our new reality. Here, I’ll introduce some design principles and describe four examples that you might find helpful as you navigate your own situation.

Before moving on to the examples, a few design principles.

Spell Out Your Hypothesis

As in a scientific experiment, the first principle of designing a business experiment is to spell out the hypothesis, guess, or assumption you want to test and the metrics that will tell you what you learned. I see far too many people rushing willy-nilly into designing surveys or interview protocols without clearly identifying what they hope to learn from them and what evidence they’ll need to support their conclusions. You can think of this as choosing a good dependent variable: the outcome you actually care about and intend to measure.

For instance, years ago, Ian MacMillan and I designed a survey for a division of General Electric that made a lot of deliveries by truck. There were some 40 branches in the truck delivery network, serving hundreds of customers. So we designed this great survey in which we asked customers how satisfied they were with the service they were receiving. Two of the questions were about the timeliness of deliveries and about their accuracy: in other words, how quickly the delivery was made and whether the shipment contained what the customer had ordered. Customers reported that they weren’t that satisfied with timeliness and were borderline happy with most of the other business elements.

So we did what seemed logical, and encouraged the company to invest in improving speed-to-deliver. Six months later, the program was implemented and everybody stood back to hear the applause. Unfortunately, sales didn’t budge. So we took the heretical step of actually going and talking to some customers. “Well,” they reported, “slow deliveries aren’t such a problem if you let us know in advance, but when the stock is wrong it causes major operational headaches, and that would lead us to look for an alternative source of supply.” We were stunned — we had made the assumption that satisfaction would correlate with purchasing behavior.

This is the problem — we think that the relationship between customer satisfaction and something we care about, like willingness to pay or loyalty, is a straight-line relationship, like this:

The difficulty is that it typically isn’t. Instead, there’s a curvilinear relationship between the variables that looks a lot more like this:

What you’ll find is that if you are too low on satisfaction, that can create an enraging situation for the customer. But incrementally improving satisfaction often takes you into a “zone of indifference” where customers are happier, but they aren’t buying more, providing you with more word of mouth, or otherwise helping you out strategically. To get real results, you have to get all the way over to the right, where satisfaction is high enough that the payoff picks up sharply and customers reward you for it.
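To make the two shapes concrete, here is a minimal sketch of the straight-line assumption next to a three-zone curve. The 0-to-1 satisfaction scale, the `low` and `high` thresholds, and the payoff units are all assumptions chosen for illustration, not measurements from the survey described here.

```python
def linear_response(satisfaction: float) -> float:
    """Straight-line assumption: every gain in satisfaction pays off equally."""
    return satisfaction


def curvilinear_response(satisfaction: float, low: float = 0.3, high: float = 0.8) -> float:
    """Hypothetical three-zone payoff on a 0-1 satisfaction scale.

    Below `low` the payoff is negative (the customer is enraged), between
    `low` and `high` it is flat (the zone of indifference), and above `high`
    it rises sharply. The thresholds are illustrative assumptions.
    """
    if not 0.0 <= satisfaction <= 1.0:
        raise ValueError("satisfaction must be between 0 and 1")
    if satisfaction < low:
        return -(low - satisfaction) / low  # -1 at zero, climbing to 0 at `low`
    if satisfaction <= high:
        return 0.0  # happier customers, but no extra payoff
    return (satisfaction - high) / (1.0 - high)  # 0 at `high`, 1 at the top


# Moving from 0.4 to 0.7 looks like progress on the straight line,
# but earns nothing on the curve.
print(linear_response(0.7) - linear_response(0.4))
print(curvilinear_response(0.7) - curvilinear_response(0.4))
```

If your own data looks anything like the second function, budget spent moving customers around inside the flat zone is unlikely to show up in sales.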

Back to the drawing board on the survey. This time, we asked relative questions — relative to our competitors, or relative to your expectations, how was our performance? And we included the variable we really cared about, which was “what percentage of your ordering in the next six months will you plan to place with us?”

This time, we got much more actionable data. It turned out that customers felt our group was at just about the same level of performance as competitors on timeliness, but not nearly as good on accuracy, which allowed us to direct budget to where it would create a competitive difference.

Don’t Lead the Witness

Have you ever had this experience? You have an encounter with a car dealer or an internet provider, say, and they ask permission to send you a follow-up survey. You say yes, and the survey comes to you with the “excellent” column conveniently filled in, along with the suggestion that, if you had any experience other than “excellent,” you contact the service provider or, worse, not send in the survey at all! Or the service person working with you positively begs for a high rating.

What is going on here is the product of clumsy efforts, often by headquarters, to find out what is really going on in the field. If headquarters lands on its field operations people like a ton of bricks whenever customers report less-than-stellar experiences, then those people have absolutely no incentive to pass along honest responses. If you really want to learn via this feedback process, you have to be willing to hear the answers!

I see this all the time: projects that really should be discontinued (and everybody knows it) get more funding because the boss believes in them. Or data that provide a counter-narrative are ignored. Or issues are covered over because they might reveal underlying problems.

Here is where the groundbreaking work of Amy Edmondson on psychological safety is so important. Her research has shown that when people feel “safe” about speaking up with a data point or opinion that is different from others in a group, the group’s decisions are better and its performance rises. How do you create a psychologically safe space? She suggests three behaviors: 1) articulate that “any one of us could have a valuable idea” and mean it; 2) ask questions and really listen to the answers; and 3) respond in a way that shows interest. As she says, one of your goals if you are a team leader is to “go on a treasure hunt for the genius of your team.” Brilliant.

Another way of leading the witness is to ask people outright whether they would buy something. Most of us are spectacularly unaware of what causes us to make a purchase, and so our answers to such questions are fundamentally misleading. More on this below.

Use the Right Data Sources

The difficulty I often see here is that people over-rely on focus groups and surveys, without thinking through what the right source of a particular answer might be. You need a relevant sample, ideally one that reflects your intended future interactions. You also need to think through the entire “job to be done” of your potential customer, recognizing that even if your offering works as intended, other elements of the experience can be unsatisfying.

A great example of this is the ultimately failed decades-long quest to get consumers to adopt three-dimensional television. Sparked partially by the vast success of the 2009 3D movie “Avatar,” the entire television manufacturing community managed to convince itself that 3D television was the next big thing in viewing (and the next big thing in profits). Despite very limited evidence that consumers would flock to devices that required them to wear special glasses to watch their programming, many manufacturers rushed 3D sets into the market, often requiring customers who wanted to buy high-end models to purchase the 3D version. Of course, a few early adopters were probably enthusiastic, but as a mass-market product, the technology failed (see also: quadraphonic sound).

Four Case Studies

With that behind us, let’s have a look at four interesting examples of experiments that were used to learn essential lessons for success.

1. Buffer and the smoke test

Buffer is a service that allows people to space out social posts without having to pre-determine the timing. Joel Gascoigne, Buffer’s co-founder, got the idea from his own frustration about how clunky it was to try to tweet more consistently.

He wanted to test whether anybody else shared his frustration. So he built what is sometimes called a “smoke test,” an offer of a product that doesn’t exist yet, in the form of a very simple two-page website. The first page pitched “Tweet More Consistently with Buffer.” If a user clicked on it, they were taken to a second page, with the heading “Hello, You Caught Us Before We’re Ready” and a place for people to enter email addresses if they were interested.

Most people weren’t, but there was a small population that would like to solve that problem, too. A third web page was created in between the first two to test pricing hypotheses — and again, as Joel says, most people weren’t interested in paying, but enough were that it convinced him to build the product. Subsequent decisions involved how complex to make it and how many platforms to support — he ended up keeping it very simple and supporting only Twitter, at first, in 2010. The experiment worked! Buffer’s 2018 revenue was $18,346,077, not bad for a website that started as smoke.

2. Peter Koomen at Optimizely on A/B Testing

Peter Koomen, co-founder and president of a company called Optimizely, and co-author of a book on A/B testing, describes how it was used in the Obama campaign and how it has subsequently been used on other kinds of digital interfaces. As he describes in this presentation at Columbia’s BRITE conference, the Obama campaign faced a thorny challenge with its digital marketing. As visitors to the various campaign websites made their way from landing on the website to signing up for an email list to being emailed solicitations for support, the organizers noticed a pattern. The campaign was effective at getting people to the website, and at getting those who signed up for the email list to become paying donors or otherwise get involved. The weakness was that website traffic wasn’t readily converting into email signups. And nobody knew why.

So they did an A/B test, in which visitors are offered different versions of the same website to determine which version works best at achieving the desired outcome (in this case, signing up for the email list). Testing both the landing page imagery and the sign-up button led to some interesting results. On the button side, they tested four statements:

  • Sign up

  • Learn more

  • Join us now

  • Sign up now

On the imagery side, they tested several different images, including a talking-head piece, a family image, an inspirational speech, and a rally in Springfield. They used both static images and videos, with everyone on the campaign assuming the videos would do a better job of attracting customers.

The results? Contrary to expectations, viewers responded most favorably to the “learn more” button combined with static family imagery. And this was no small difference: the “winning” combination outperformed the original by about 40 percent. With some extrapolation, Koomen suggests that the impact could have accounted for as much as an extra $60 million in donations to the campaign.
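If you run a test like this yourself, the arithmetic behind “which version won, and is the difference real?” is short. Below is a minimal sketch using a standard two-proportion z-test, which is my choice for illustration and not necessarily the method the campaign or Optimizely used. The function name `compare_variants` and the visitor and signup counts are hypothetical; they were picked only so that the relative lift comes out near the 40 percent mentioned above.

```python
import math


def compare_variants(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float, float]:
    """Return (relative lift of B over A, z statistic, two-sided p-value)."""
    if min(n_a, n_b) <= 0 or conv_a <= 0:
        raise ValueError("need visitors in both versions and at least one conversion in A")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    lift = (rate_b - rate_a) / rate_a  # improvement relative to the original
    # Pooled conversion rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes, so the test is undefined")
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return lift, z, p_value


# Hypothetical traffic: 10,000 visitors per version, 800 vs. 1,120 signups.
lift, z, p_value = compare_variants(conv_a=800, n_a=10_000, conv_b=1_120, n_b=10_000)
print(f"lift={lift:.0%}, z={z:.2f}, p={p_value:.2g}")
```

A small p-value says the gap is unlikely to be noise. It says nothing about whether signing up for the email list was the right outcome to measure, which takes us back to the first design principle.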

3. Alberto Savoia, pretotyping, and the skin-in-the-game caliper

Alberto Savoia was Google’s first engineering director and was part of the team that created the AdWords platform that proved to be so successful for the company. In a terrific book called The Right It: Why So Many Ideas Fail and How to Make Sure Yours Succeed, he goes through a process he calls “pretotyping.” While the book is too full of great ideas to cover here, two are worth a special mention.

The first is reversing the typical process of entrepreneurial product evaluation, which is to build an offering of some kind and then go out and assess interest. Instead, Savoia suggests, why not get an expression of interest first, and then build what customers are willing to pay for? I thought that was very smart.

The other idea that is worth a special mention is what he calls the skin-in-the-game caliper. The insight here is that the best feedback for any idea you have is likely to come from those who have “skin in the game” — in other words, who are investing their resources along with you, echoing Nassim Nicholas Taleb’s warning not to take advice from people who will be unaffected by the outcome of a collective decision. He creates a table that you can use to assign point values to different types of evidence of market acceptance. This table is copied from page 153 of his book.

Type of Evidence | Examples | Skin-in-the-game points
Opinion (regular people or experts) | “Great idea” | 0
Encouragement or discouragement | “Go for it!” | 0
Throwaway or fake email address or phone number | bogusemail@spam.com, (123) 555-1212 | 0
Comments or likes on social media | “This idea sucks”, “Thumbs up or Thumbs down” | 0
Surveys, polls, interviews online or off | “How likely are you to buy ____ on a scale of 1-5” | 0
A validated email address with the explicit understanding that it will be used for product updates and information | Give us your email to receive updates about __________ | 1
A validated phone number with the explicit understanding that you will be called for product updates and information | Give us your phone number so we can call you about our product | 10
Time commitment | Come to a 30-minute product demonstration | 30 (1 point per minute)
Cash deposit | Pay $50 to be on the waiting list | 50 (1 point per dollar)
Placing an order | Pay $250 to buy one of the first 10 units when available | 250 (1 point per dollar)
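To see how the point values in the table add up across a real pile of evidence, here is a minimal scoring sketch. The per-unit points come from the table; the category keys, the function name `skin_in_the_game_score`, and the sample tallies are hypothetical and are not taken from Savoia’s book.

```python
# Points per unit of evidence, following the table above.
POINTS_PER_UNIT = {
    "opinion": 0,
    "encouragement": 0,
    "fake_contact": 0,
    "social_comment_or_like": 0,
    "survey_response": 0,
    "validated_email": 1,     # per address
    "validated_phone": 10,    # per number
    "minutes_committed": 1,   # per minute of customer time
    "dollars_deposited": 1,   # per dollar
    "dollars_ordered": 1,     # per dollar
}


def skin_in_the_game_score(evidence: dict[str, float]) -> float:
    """Sum the points for a tally of evidence, keyed by category name."""
    unknown = set(evidence) - set(POINTS_PER_UNIT)
    if unknown:
        raise KeyError(f"unknown evidence types: {sorted(unknown)}")
    return sum(POINTS_PER_UNIT[kind] * amount for kind, amount in evidence.items())


# Hypothetical tally for one idea.
tally = {
    "social_comment_or_like": 200,  # 200 likes are still worth nothing
    "validated_email": 40,
    "validated_phone": 3,
    "minutes_committed": 60,        # two people at a 30-minute demo
    "dollars_deposited": 50,        # one $50 waiting-list deposit
}
print(skin_in_the_game_score(tally))  # 180
```

Notice that the 200 likes contribute nothing, while a single $50 deposit outweighs all 40 validated email addresses.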

What I appreciate about Savoia’s approach is that he is pretty brutal about all the things we can easily gather that convince us an idea has merit, and pretty demanding of the proof we accept that things will work out.

4. Test parts of a model before building the whole business — Netflix

While the origin story of Netflix has it that the concept for the company was born in a single flash of insight, in which Reed Hastings ran up a big late fee bill on the rental of “Apollo 13,” the reality is a little more complicated. As Netflix co-founder Marc Randolph describes it, “I had no idea what would work and what wouldn’t. In 1997, all I knew was that I wanted to start my own company, and that I wanted it to involve selling things on the internet. That was it.” The ideas the two played around with included everything from personalized shampoo to custom sporting goods potentially to be sold over the internet. Both Randolph and Hastings were working at a firm called Pure Atria, and it looked as though they were both going to be made redundant as the result of a merger.

Out of a positive soup of potential ideas came the notion of offering customers a video rental service with the just-emerging DVD technology behind it. To see if it was even feasible, Randolph reportedly mailed a CD to Hastings, and when it arrived safely in one piece, concluded that the DVD-by-mail business was technically feasible. Note that the experiment involved just the test of a part of the business, not the build-out of the whole thing.

Things didn’t go too well for Netflix at first, as it basically aped the business model of physical stores, offering one-time rentals and video sales. It turned out customers weren’t particularly interested, given the immediate gratification that in-store experiences offered. One experiment it tried was to offer DVDs as a subscription service, with people able to order several movies in a queue at a time. To the founders’ surprise, that model worked well.

In an even more interesting twist, the founders had always expected that content delivery by Netflix would involve streaming, but streaming took a lot longer to arrive than either expected. By the time enough households finally got enough bandwidth to make commercial streaming feasible, many were already Netflix customers, and converting them over proved pretty straightforward (with some very famous stumbles along the way).

A few thoughts in closing. It is almost impossible to get feedback from customers on something they have never experienced, so prototyping, creating experiments, and otherwise putting customers in a real situation are invaluable. Good ideas and awful ideas look almost identical at birth. Sometimes it takes a fair amount of trial and error to hit on a business model that can work.

But we’ve learned a lot about how to do experiments, and new tools make it easier and cheaper than ever.