If you’re under the impression that you already understand how your customers behave, or you know the best way of designing a web page, you're unlikely to ever test that theory (or even recognise that it is just a theory). We all carry a bias and naturally seek confirmation of what we already believe to be true. So convinced by our understanding, we often miss the opportunity to see the truth.
Usually, making assumptions and being resistant to change means that the marketing standard is "we are doing this right" when the reality is that there are always better ways of doing things. Even small changes can have a huge influence on the experience your users have.
Last month I got to see Derren Brown live (once again) at the Mayflower in Southampton. I even got to within a few metres of his awesome self thanks to some exuberant audience participation, which I’m glad to say I wasn’t personally involved in.
I realise that reads slightly creepy, but as I say, I’m a huge fan.
I could spend ages listing all the shows of his that I’ve re-watched over and over, from the time he managed to influence an innocent member of the public into pushing a stranger off a building to the time he turned several ordinary, everyday people into opportunist bank robbers in “The Heist”.
But one of my very favourites was “The System”. It was all about horse racing.
Derren claimed he had developed a system for predicting the winner of a horse race with 100% accuracy. In fact, he guaranteed it to a young woman called Khadisha.
Khadisha was contacted by an anonymous tipster (Derren) claiming to have developed a guaranteed system for predicting a seemingly impossible series of events. She was asked to look out for the result of the 9:20 at Wolverhampton, having been told that a particular horse would win. Indeed, the horse that "The System" predicted would win, won.
The series of predictions continued.
The second time Khadisha was encouraged to place a bet on a horse that “The System” said would win. Again, as you would expect, the horse won, and Khadisha collected the first of her winnings.
This happened two more times, and each time “The System” got it right, even though things that could never have been predicted happened during the course of a race (such as the leading horse falling just before the final hurdle).
In spite of all these things, Khadisha won four times in a row. After the fourth win Derren revealed that he was the one who’d developed “The System” (much to her surprise, as you would imagine; I would have freaked).
Derren then asked her to make one final bet, betting everything she had, including money borrowed from friends, family and the bank.
After the bet was placed (but before that final race ran) Derren revealed how "The System" worked…
Khadisha was not the only person who was using “The System”. In fact, she was one of 7,776 people split into 6 groups of 1,296 people each. It's a pyramid.
When Khadisha was first contacted she was told the winner of a specific race would be one particular horse… as were 7,775 other people. Khadisha happened to be in the group that was assigned to the horse that did win. "The System" didn’t predict the winner; "The System" assigned people to one of six possible winners, one of which would eventually go on to win.
Each time a race was run, 5 in 6 people lost. They were contacted with a simple message: “Sorry, it seems the system doesn’t work after all”.
But the other 1,296 people moved on to race 2. The same happened again, with this group ultimately reduced to 216, then 36, then 6. After race 4, which Khadisha won, she was one of the last people standing out of 1,296… and now she's put £4000 on a race because she believes that "The System" works... and it’s absolute rubbish.
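The arithmetic behind the pyramid is easy to check. Here is a minimal Python sketch, assuming every race has six runners and the remaining people are split evenly across them; the function name `survivors_after_each_race` is made up for illustration and has nothing to do with the show.

```python
def survivors_after_each_race(participants: int, horses: int, races: int) -> list[int]:
    """Return how many people are still 'winning' before and after each race."""
    remaining = participants
    history = [remaining]
    for _ in range(races):
        # Everyone left is split evenly across the horses,
        # and only the group backing the winning horse carries on.
        remaining //= horses
        history.append(remaining)
    return history


print(survivors_after_each_race(7776, 6, 4))
# → [7776, 1296, 216, 36, 6]
```

No prediction is involved at any point: whichever horse wins, one sixth of the group is guaranteed to have been "told" the right answer.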
What Derren playfully and dramatically demonstrated to us is the effect of confirmation bias: the natural tendency to seek out information and signals that confirm our beliefs, rather than falsify them.
We are pattern-matching animals that seek familiarity and confirmation. Let’s start with a few examples.
Newspapers have a habit of playing on our prejudices. Let’s take the EU referendum for starters. It’s certainly one of the most recent examples of people’s preconceptions being reaffirmed by an authority, especially as we frequently see the referendum turn into a one-issue debate.
Then there are conspiracy theories. However wildly unlikely the theory, theorists naturally seek out evidence that supports their belief, rather than looking to disprove it.
- Paul McCartney died and was replaced by an impersonator
- World leaders are lizards
- The Phantom Time Hypothesis (the year is actually 1715AD)
Then there’s the placebo effect: the remarkable phenomenon in which a placebo (a fake treatment) can sometimes improve a patient's condition simply because the person expects it to be helpful. If you’ve ever read Bad Science by Ben Goldacre, you’ll be more than familiar with it.
The point being, Confirmation Bias shows just how limiting (or influential) our own perspective can be, even if it’s not rooted in any fact at all. It also shows us how damaging our perspective can be when making decisions when we are biased, which everyone naturally is.
Btw, if you disagree, that’s because you have a Bias Blind Spot; a bias that allows you to view yourself as less biased than other people.
Published in Management Science, research by collaborators from a number of US universities revealed that believing you are less biased than your peers (or that you are a fair judge of your own expertise, for example) has detrimental consequences, such as making you worse at judging whether advice is useful.
Ultimately, if you're not conscious of confirmation bias - in yourself and others - you could be making one of the following mistakes.
You are not testing
If you’re under the impression that you already have the answers, such as how your customers behave or what’s the best way of designing a particular web page, you are unlikely to ever test that theory (or even recognise that it is just a theory).
So convinced by our understanding, we often miss the opportunity to see the truth.
"Best practice" is too often confused with "most familiar"— Chris Cherrett (@chrischerrett) May 27, 2016
A good example would be a website carousel.
You might believe that carousels are good because other websites use them (Bandwagon Effect), and whenever someone tells you that they like one of the pictures on your homepage, this further confirms your belief that the carousel is a good thing. But this in no way proves anything about the effectiveness of the feature.
There are so many usability issues with carousels that I don’t need to list them here, but somehow they persist because the evidence that suggests they do not perform well is outweighed by personal preference.
Carousels don’t work because customers aren’t paying attention, and it’s essential that the evidence drives both design and business decisions.
With this in mind, and in order to drive long term success, we need to do two things:
- Question why something works the way it does
- Create and run tests to find the answer
Every design is a hypothesis
When building a website, you can’t research forever. You can certainly (and absolutely should) spend a great deal of time validating your assumptions and driving your design with evidence, but eventually it has to launch.
No matter how much research has gone into it, on the day of launch the design is still a hypothesis (even the requirements are a hypothesis, so the design is a hypothesis of a hypothesis... MIND BLOWN). It's the most-informed version of our website that we have today. We now have the opportunity to validate our decisions with real customers (and in all likelihood, more than we've had up until now).
The launch of the website is not the end of the build, but the start of the project. You now have a new hypothesis, and a project that attempts to prove or disprove that hypothesis should get underway.
Before we can do any form of testing we have to define what success looks like and what we’re measuring the performance of that design against. There are a ton of great tools out there, many of which we use ourselves, to produce variations of a web page and see which one outperforms the others, once we’ve established our baseline.
Now that we’ve talked about bias, the reasons that might lead us not to test, and the justification for why we absolutely must test, it’s time to delve a little deeper into the anatomy of a popular testing technique and discover how you could make it smarter and ultimately more effective.
I’m going to make my own assumption and guess that you’re already familiar with the concept of A/B tests. Nothing new there. But what I do want to discuss is how we can reduce the perceived risk and expense of testing that might stop organisations from doing it; the things that make testing that bit harder to implement, or that bit more difficult to instil as part of a culture of testing.
As we know, for every A/B test there are a number of variations (let’s say three), each of which is served to an equal segment of traffic. We know that ultimately only one of our variations will be crowned the winner, which means that during the course of the test (irrespective of volume) roughly 67% of our participants (real customers) are subjected to an under-performing variant.
It is only at the end of the test that we will look at the results and determine that winner. Therefore, we need to achieve two things that will improve upon this method:
- Reduce the time it takes to deliver your best performing variant to the majority of the traffic
- Reduce the risk of lost conversions because users aren’t being served the better variant
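To put a number on that exposure, here is a small Python sketch. It assumes an even traffic split and exactly one winning variant; the helper name `share_on_losing_variants` is purely illustrative.

```python
def share_on_losing_variants(num_variants: int) -> float:
    """Fraction of visitors who see a non-winning variant under an even split."""
    # Assumes traffic is divided equally and exactly one variant is the winner.
    return (num_variants - 1) / num_variants


for k in (2, 3, 5):
    share = share_on_losing_variants(k)
    print(f"{k} variants: {share:.0%} of visitors see a losing variant")
```

With three variants that works out at about two-thirds of visitors, and the share only grows as you add more variants to the test.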
We need to fail fast, learning quickly what doesn’t work and adapting. In other words, we need to monitor which variant is performing best and use it as much as we can. Let’s take this example.
We have three variants, one of which (between you and me) we know to be the best one. However, from the start, we have to assume they are all good because we don’t know anything about them.
The rule here is that rather than distributing the traffic evenly, we’ll always display the variant with the highest average reward (conversion rate).
First off, with all variants being equal, we’ll serve the first variant, but we find that it doesn’t convert. That’s 0 rewards for 1 visitor. Because that reduces the conversion rate to 0% we serve one of the other variants.
This time, by fluke, the other “bad” variant actually converts. This keeps the conversion rate at 100%, so we use it again, until it fails to convert, which happens on the next occasion. Our conversion rate for this variant is now at 50%.
One final step. We now serve our “best” variant (which is still at 100%), but it fails to convert, dropping the conversion rate to 0%. This simplified example gives us insight into a new problem here.
Our rules say that we must always show the best performing variant. However, this “bad” variant will always have a conversion rate above 0%, even if it never converts again. Unexpectedly, the rule leaves us stuck with a bad variant.
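The walk-through above can be replayed in a few lines of Python. This is only a sketch under some assumptions of mine: the variant names (`bad_1`, `bad_2`, `best`) are made up, untested variants are treated as converting at 100%, ties go to the first variant listed, and the visitor outcomes are scripted to match the story rather than simulated.

```python
def greedy_choice(stats):
    """Pick the variant with the highest observed conversion rate."""
    def rate(name):
        conversions, visitors = stats[name]
        # Untested variants are optimistically assumed to convert at 100%.
        return conversions / visitors if visitors else 1.0
    # max() returns the first of any tied variants, so ties go to the earliest one.
    return max(stats, key=rate)


def replay(outcomes):
    """Serve one variant per scripted outcome (True = the visitor converted)."""
    stats = {"bad_1": [0, 0], "bad_2": [0, 0], "best": [0, 0]}
    served = []
    for converted in outcomes:
        variant = greedy_choice(stats)
        stats[variant][0] += int(converted)  # conversions
        stats[variant][1] += 1               # visitors
        served.append(variant)
    return served, stats


# The four visitors from the walk-through, then six more who never convert.
served, stats = replay([False, True, False, False] + [False] * 6)
print(served)
print(stats)
```

After the first four visitors, every later visitor is sent to `bad_2`: its rate shrinks from 50% towards zero but never reaches it, so the two variants sitting at 0% (including the one we know to be best) are never shown again.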
The purpose of this illustration, a play on the multi-armed bandit problem, is to demonstrate that the need to explore (rather than exploit 100% of the time) is at the very core of how designers, business owners and marketers can deliver better results through their website.
An epsilon-greedy (ε-greedy) algorithm is one that serves the variant that has worked best in the past 90% of the time, whilst giving the remaining 10% to any other variant, just in case it’s better.
Let’s take this example below.
90% of the time we’re serving the best variant. However, the next customer to hit the page fails to convert, meaning the conversion rate drops to 40% (2 out of 5 hits converted).
Again, the next customer doesn’t convert. Under the rule we outlined previously, the algorithm would have persisted with the “bad” variant no matter how often users failed to convert. However, because we’re giving 10% of our traffic to a random variant, even when there is a better performing variant on record, our genuinely better variant still gets a chance to claw its way back.
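For the curious, here is a rough Python sketch of an epsilon-greedy chooser. It is not a production implementation, and a few things are my own assumptions: the function names, the conversion rates in the usage example, and the choice to pick the exploration variant uniformly from all variants (the current leader included), which is the common textbook form.

```python
import random


def epsilon_greedy_choice(stats, epsilon=0.1, rng=random):
    """Explore a random variant with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: any variant may be picked, even the current leader.
        return rng.choice(list(stats))

    def rate(name):
        conversions, visitors = stats[name]
        # Untested variants are optimistically assumed to convert at 100%.
        return conversions / visitors if visitors else 1.0

    # Exploitation: serve the variant with the best record so far.
    return max(stats, key=rate)


def simulate(true_rates, visitors=10_000, epsilon=0.1, seed=42):
    """Run simulated visitors against hypothetical true conversion rates."""
    rng = random.Random(seed)
    stats = {name: [0, 0] for name in true_rates}
    for _ in range(visitors):
        variant = epsilon_greedy_choice(stats, epsilon, rng)
        converted = rng.random() < true_rates[variant]
        stats[variant][0] += int(converted)  # conversions
        stats[variant][1] += 1               # visitors
    return stats


# Hypothetical conversion rates, purely for illustration.
results = simulate({"bad_1": 0.02, "bad_2": 0.03, "best": 0.10})
for name, (conversions, visits) in results.items():
    print(f"{name}: {conversions} conversions from {visits} visitors")
```

Because a slice of traffic always goes to exploration, no variant is ever written off for good after one unlucky visitor, which is exactly the trap the purely greedy rule fell into.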
There’s plenty of discussion out there about the pros and cons of multi-armed bandit algorithms. Some say it’s not great because it takes longer to reach statistical significance, but I feel statistical significance and the fundamental answer to the question “which variation is better” are less important than the actual performance of the website. Wouldn’t it be better to serve the best of what you have now, whilst also allowing yourself to explore and improve at the same time?
I think so. Let me know what you think on Twitter.