Video

Speaker/s name

Jon May

Description

Speaker:
Jon May, author of Send Better Emails

Video URL

https://vimeo.com/661634635

Transcript

Andrew Bonar 0:00
Are we live? Yes. I'd like to welcome everyone who's joined us online, because by all accounts there are probably ten times more people online than there are here. Thank you very much to those who managed to make it so early this morning. I understand a lot of you had a really, really good night last night, so some people will be watching the replays and might get this later. I'm glad you had a good night last night, and hopefully you'll be able to make it. But yeah, the room's filling up now that everyone's got the messages, so that's great. The first session is Jon May. Jon works at the RAC, but he also runs two or three little side gigs of his own, which I'm sure he's going to talk about. One of the things he's done recently is publish a book, Send Better Emails, which is a fantastic read. I'm still trying to finish the book, because you only get halfway through a chapter before he sends you to a bunch of PDFs that help you even more. If nothing else, I realized that I've spent 20 years in email and Jon May still has a heap that he can teach me. So if there's lots that I've learned from Jon, I know that everybody else has something they can learn from Jon. Everyone does a little bit of email marketing, even if they don't call themselves an email marketer. Everyone's using email, and Jon is certainly a fantastic person to tell you how to send your emails better. So without further ado, Jon.

Jon May 1:38
Thank you, Andrew. Can you hear me? Yeah, thank you, Andrew. So today I'm going to be talking about A/B testing, all about A/B. We're not going to cover multivariate testing, because you need to have huge data sets for that, and A/B probably applies more here. I thought we'd stick with the basics, a bit of a crash course in A/B testing, if you will. So my presentation is called Making Better, Faster, Data-Led Decisions with A/B Testing, and there's a lot to that. We always want to make better decisions, we want to make them in the least amount of time humanly possible, and we want to use data-led decision making, and I'll come on to what that is. I appreciate that you see A/B testing and everybody snoozes like a cat, or some of them are still snoozing in their hotel rooms. But it really is an interesting area of email, and it's only in the last year that I've really gotten into it. You don't have to have a statistics degree to fully understand what it is and how to get the best out of it, but it certainly helps. So I'm going to go through a bit of background on how it works and some of the key things people need to know when they're doing it, and then there are some resources at the end for you. So this is me, wearing the same shirt. This is my nice shirt, so it's exactly the same thing. You can follow me on Twitter at Jon, J-O-N, Does Emails. And they always say, you know, add a quirky fact about yourself. So in lockdown, in March this year, about nine months ago, I started cooking online on TikTok. I don't know how many people in the room know very much about TikTok. I didn't know anything about TikTok until I started, and now, I mean, this is a little bit, I don't know, like 220,000 followers on just me cooking dinner. So that's my claim to fame: I cook on TikTok.
So the RAC, obviously, in Britain: it's the UK's largest breakdown provider. If your car breaks down, a chap like this, or a woman, we have quite a few female patrols now, will turn up and try to fix your car. They fix four out of five at the roadside, and if they can't, they'll tow you to a local garage. So that's the UK's biggest breakdown provider, with over 13 million customers. The RAC and the AA are the two big players in the space, and we pretty much hold about the same each. But yeah, 13 million is quite a lot. No pressure. So what we're going to get through today is how data helps us make decisions, and why to do A/B testing and why not to. There's a pretty convincing argument that sometimes it's not the best approach, and I know Email Geeks always kind of splits when someone asks about testing: quite a few people go, you should never do it, or you shouldn't do it unless you've got an absolutely outstanding reason to. Then how to build a test theory, or hypothesis, and we'll come to that in just a bit, and how to actually analyze those results. Most ESPs suck at telling you which one won and the context behind it; it's just "this one won" and it's rolled out. But with a few more key details, it could really help. And there are some free downloads; I'll give you the link at the very end. So, how data helps us make decisions. There are three types of how we make decisions using data. Data driven: the data tells you what to do. Your weather app says, bring an umbrella today, so you bring an umbrella. Data informed: it's presenting the data, and the human has to do all of the decision making. It goes, there is a 75% chance of rain today, and it's up to the human what to do with that. I mean, I live in the UK; 75% is a little low for us, so I'm not taking an umbrella.
But if you live in a country where that's a bit high, you might take one, so it's all about that personal level of risk. And then data led is combining either lots of different types of information, or giving a suggestion and the right amount of information. So: it'll rain, but only when you're in meetings today. And then it gives a little bit more information: 75% chance between about 3 and 5pm. It has given you just about enough information and a suggestion. And then maybe, should I take an umbrella? Probably not. But it's combining those different data sources together.

In an ESP, data driven would look like your email platform saying: roll out version B, it won hands down, just roll it out, no problem. So you'd probably select version B as your winner. Data informed goes: sales are up 15%. And you're like, actually, yeah, 15%, that's loads of money, B. But if you get a little bit more data, it says there actually isn't enough data to say. It's a little bit small here, so apologies: version A sold six products, version B sold seven products. So yes, it's up 15%, but there's nowhere near enough data to be able to actually say. So it's probably a bit of a question mark: more data needed. So why should we do A/B tests? Lots of convincing arguments; more money is probably the best. But it helps us make better data-led decisions. That's the idea of not data-informed but data-led decision making. It helps us make those decisions a little bit faster. It helps turn opinions into facts, and I'll come on to that in just a second. You know, "fill up your subject line with emojis, it really works." You see articles like "Emojis increase open rates by 50%." Yeah, for that specific audience. But until you do your test on your own list, your own audience, you don't really know. And especially B2C versus B2B, there are so many differences that you need to test all those ideas on your own list and your own audience. It helps internal stakeholders be heard. Not a great reason for doing it, but we've got quite a lot of internal stakeholders. The RAC has 4,000 members of staff, and there is a lot of debate about how we should be doing stuff. So when people think, oh yeah, we want to try this out, it helps us to go, actually, we tested it, and it didn't work. So it helps us set aside some of the things that maybe we weren't really feeling.
And instead of internal stakeholders, this is the HiPPO, and the HiPPO is the highest paid person's opinion. This is normally your client, or your boss, or someone slightly higher on the organizational chart. If they go, yeah, let's do that, it's difficult to say no, because it's your client and they're paying you, so you've got to do it. But A/B testing can certainly help you go, actually, yeah, let's test that. Let's turn that opinion into a fact. And that's what we're trying to do to avoid the HiPPO. We can learn specific things about our audience, and each audience is different. At the RAC we've got 13 million customers, and that's going to be very different from, say, a supermarket, because the age is skewed and the demographics are different. So A/B testing helps us understand our own audience a little bit more, and it solely applies to that audience, or segments within it. It can help us try out unconventional ideas, a little bit wacky. If you've got something like, yeah, we're going to change the background color to red, why not? You could try it out on a very small population size. So it helps us try out unconventional ideas where otherwise you might go, actually, it's probably too much of a risk, we're going to lose loads of money. So maybe we just try it out on a small pot first. And it reduces the overall risk of failure: if you're doing an A/B test, you can test on a smaller group of people, and if it doesn't go well, you haven't lost all that much. But equally, there's a good case for when we shouldn't do A/B tests. When there's not enough data or subscribers: I see quite a lot of people who've got a list of maybe a couple of hundred people, and they're multivariate testing the life out of it. There's just not enough people or not enough data to come to any kind of result that's going to be meaningful.
It doubles the creative required: if you're sending out version A and version B, you've got to make a version A and a version B. Especially for teams that are streamlined, or where you've got to get it out the door today, it's challenging. Obviously other people have been talking about email design systems, which are going to speed up the process. But if you're going to send out two versions, you actually need to create two different emails. Not enough resources or staff: this is not just in creating it, but in analyzing it. The RAC has a data analytics team, and they're booked up for weeks in advance. So there's no point doing a test on huge datasets unless there's somebody competent to know what they're looking at, to be able to analyze it and actually give you a recommendation.

And testing for the sake of testing. I like testing, but we test on probably our most impactful campaigns. We don't test on all campaigns, because it would just generate so much work. We have maybe 160 campaigns live at any one time; to try to test and analyze all of those, frankly, would need another two or three people on the team just to analyze the results. So it's the ones that have got the biggest room for improvement, I think. And sometimes, instead of doing two versions, just focus on making one of them better. If you're strapped for time, cash, or people resource, just making one thing better can make the whole program feel much better, as opposed to doing two half-assed emails and shoving them out. So, a test theory, or the formal term is hypothesis, but people hear "hypothesis" and immediately fall asleep. So I call it a test theory, just to try and liven it up and democratize the words. The example I always give is: I think that [an item to test] will increase, decrease, or not change [some kind of metric], and I'll come on to a few examples in a second, by, and then I bracket them, a little bit or a lot. You don't want to say, yeah, it's going to increase it by 7.2%, because that's a stab in the dark unless you've got loads of data beforehand. Because, and then a reason. Now, the reason is not strictly necessary at this moment in time, but when you come back in six months' time and look back and think, why did we do that test, what was the thinking behind it, some of that reasoning can definitely help. An example that we've actually used is: I think that image personalization, where we put maps in emails, will increase the click rate, and that's the metric that we're using.
For each specific test the metrics will be different, and I'll come on to metrics in just a second. By a bit: it's not going to be 150%, but it'll do a good amount. Because customers can see and review their local garages on a map before they click and continue. And it definitely does increase it. But it's good to understand that reasoning, because when I come back to it 6, 12, 24 months down the line, or when I hand it off to someone else in the team, they've got an understanding of why I thought it was going to do that, and then ultimately what happened. Another example would be: I think that adding value, so instead of discounting by 25% you add 25% on for free, will increase our overall revenue, because we're not discounting by quite a lot, we're not cutting our own legs off. Discounting can hurt what we call the average revenue per unit, or AOV, average order value. If you're adding extra, people are still getting the same kind of discount; it's just applied at the other end. So, a few ideas. Probably no email novices in the room, as I mentioned, but a few ideas you might think about. Does sending a cross-sell of related products improve overall revenue? The answer is probably yes. Do monthly emails through an annual subscription increase the retention rate? That's something we spend a lot of time on. Obviously we're trying to acquire customers all the time, but the big money is in keeping the customers we've already got, so how can we do that in a more effective way? Does a post-purchase sequence lower the return rate, especially for, say, products where there has to be physical shipping? Especially this time of year, there are always delivery delays. If you keep in contact with them all the way through, does that reduce the complaint rate?
Does discounting in savings emails improve the overall number of orders, even if that hurts the revenue? I suppose that's more for the business to understand, or to decide the importance they place on that.
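
The test theory template from the talk ("I think that [item] will [increase / decrease / not change] [metric] by [a bit / a lot], because [reason]") can be captured as a small record so the reasoning is still readable months later. This is only a sketch: the class name `Theory` and its field names are assumptions made for illustration, not something the speaker showed.

```python
from dataclasses import dataclass


@dataclass
class Theory:
    """One test theory, written down so it can be reread months later."""
    item: str       # what is being changed, e.g. maps in emails
    direction: str  # "increase", "decrease", or "not change"
    metric: str     # the single metric the test is judged on
    size: str       # "a bit" or "a lot"; deliberately not a precise figure
    reason: str     # why this outcome is expected

    def sentence(self) -> str:
        # "not change" has no size, so the "by ..." part is dropped.
        amount = "" if self.direction == "not change" else f" by {self.size}"
        return (f"I think that {self.item} will {self.direction} "
                f"{self.metric}{amount}, because {self.reason}.")


# The maps example from the talk, filled into the template.
maps = Theory(
    item="image personalization with maps",
    direction="increase",
    metric="the click rate",
    size="a bit",
    reason="customers can review local garages on a map before they click",
)
print(maps.sentence())
```

Keeping the size as "a bit" or "a lot" follows the advice in the talk not to guess a precise percentage up front.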

And, for example, with delivery notifications: does SMS on delivery improve, or rather lower, that complaint rate from people saying "I didn't know it was going to turn up"? So, combining lots of different things together. These are just a few ideas to get started, if you haven't. So, email creative metrics. These are the metrics that we'll all be used to on a day-to-day basis: delivery rate, open rate, and I put a little asterisk there for iOS 15, click rate, unsubscribe rate. We also use what we call the click quality score, which is the ratio between clicks and unsubscribes, and that helps us understand the sentiment. Clicks through to a website or a landing page or CTA we see as positive engagement, and unsubscribes we see as, quote unquote, negative engagement. We look at the ratio between them and plot it as a percentage from zero to 100, so that we can compare all the campaigns together and say, actually, these are the worst performing ones; we either need to cut them, have a look in more detail, or focus on those to start with. Whereas the click rate might be a bit deceiving, because some emails might have more or bigger CTAs than others. Then email program metrics as a whole. We go with list growth net, as opposed to gross, because that takes out the unsubscribes. Average days since last engagement: if it starts to creep up, is there something you can do about that? Average days on the list, from their birth to their, quote unquote, death on the list. The win-back rate: if they're unengaged, how many people you can get re-engaging back into the program. Sunset rate, a similar idea, and abuse rate. So there are quite a few different things to measure the email program as a whole.
I'm not going to talk about deliverability; I know very little about it, other than, you know, if you're not sending out spam, you're probably going to see lower abuse rates. That's probably something we can predict before we get there. Financial metrics: these are probably the most business orientated. Profit or revenue, average order value, items per order. The RAC is mostly an insurance product, so you only buy one; you tend not to buy several insurance products for the same car. Customer lifetime value, average orders per customer. Especially in different types of e-commerce, where you add things to baskets: if we reduce the overall discounting, could we actually increase the number of orders or the revenue associated? So those are the more hardcore cash numbers. And then business or product metrics, depending on what kind of business it is: length of time in an app, if it's a SaaS product, number of active users, retention rate, and can email, or an email program, impact any of these? Rebuy rate, cancellation rate, returns rate, complaints. I won't read them all off. So once we've got this, we go, yeah, I think that this is going to have a big impact on our retention rate, let's do it. And you send them all out, and you get the results from the ESP, and ESPs notoriously suck at doing this. Klaviyo has started to put statistical significance banners in their A/B testing, and was one of the first ESPs to do that. It's actually very helpful for, I won't say email marketers, but say generalist marketers who don't understand some of this stuff. You get all this data back. Great. What does it mean?
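
The click quality score mentioned among the creative metrics is described only as the ratio between clicks and unsubscribes, plotted from zero to 100. One way to read that, sketched below, is the share of clicks among all clicks plus unsubscribes. The exact formula is an assumption, and the function name `click_quality_score` is hypothetical.

```python
from typing import Optional


def click_quality_score(clicks: int, unsubscribes: int) -> Optional[float]:
    """Share of positive engagement (clicks) among clicks + unsubscribes, 0-100.

    Assumed formula: the talk only says the score is a clicks-to-unsubscribes
    ratio shown as a percentage between 0 and 100.
    """
    total = clicks + unsubscribes
    if total == 0:
        return None  # no engagement at all, so there is nothing to score
    return 100.0 * clicks / total


# Two hypothetical campaigns: the second gets more clicks in absolute terms
# but also far more unsubscribes, so it scores lower.
print(click_quality_score(clicks=90, unsubscribes=10))
print(click_quality_score(clicks=300, unsubscribes=200))
```

Because the score ignores how many CTAs an email has, it allows campaigns to be ranked against each other in a way the raw click rate does not.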

So at the RAC we use a few trust checks, just to make sure that the data is not deceiving us. You can always lie with statistics, but you want to make sure that at least the lies are manageable, I guess. The trust checks: first, is there enough data? If we're doing an A/B test, we want at least 100, I say conversions, but if our metric is the click rate, then a click is considered the conversion. So you want at least 100 on either side, because otherwise any jump up or down is too small to give any kind of real result. Reliable data: if you repeat the test, are you going to get the same result again and again? Or was it just that one day you sent it out and, wow, conversions went up 400%, and the next day it went back to normal? Could you repeat it, so it's not a fluke, not just a bit of a chancer? The lift is bigger than the background variance. Now, I don't think it's particularly controversial to say you should probably at some point do a test where you send out exactly the same email to both groups, and one of them will be higher than the other. They never even out; it's just gremlins of the universe. You will always have a variance. We have a variance of between 0.6% and 1.6%, and it's weird. So if a version wins but it's still within that variance, it could just be the background radiation of the cosmos, rather than anything you've done in the test, that's making it bigger. No technical problems: that's having what's called a guardrail metric, to be like, let's make sure the delivery rate is the same in both of these, because if one of them is having issues with that, that's going to undermine the whole validity of the test itself.
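
Two of the trust checks above, "at least 100 conversions on either side" and "the lift is bigger than the background variance", can be sketched in a few lines of Python. The function names are hypothetical, and treating the 1.6% background variance as a relative difference between the two rates is an assumption; the talk does not say whether the figure is relative or absolute.

```python
def enough_data(conversions_a: int, conversions_b: int, minimum: int = 100) -> bool:
    """True when both variants reached the minimum number of conversions."""
    return min(conversions_a, conversions_b) >= minimum


def relative_lift(rate_a: float, rate_b: float) -> float:
    """Relative change of B against A, e.g. 0.25 means B is 25% higher."""
    return (rate_b - rate_a) / rate_a


def beats_background(rate_a: float, rate_b: float, background: float = 0.016) -> bool:
    """True when the lift is larger than the noise seen in an A/A test.

    The default of 0.016 takes the upper end of the 0.6% to 1.6% range quoted
    in the talk and assumes it is a relative difference.
    """
    return abs(relative_lift(rate_a, rate_b)) > background


# The "six products versus seven products" case fails the data check outright.
print(enough_data(6, 7))
# A 4.00% versus 4.04% click rate is a 1% lift, inside the background noise.
print(beats_background(0.0400, 0.0404))
```

The background figure is specific to each sender, so it would need to be measured with an A/A send on your own list rather than copied.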
And we're going to come on to confidence in a second. 95% statistical confidence, or statistical significance, but we call it confidence, is pretty much the gold standard. If you've not got enough data, you will have to go lower, and if you've got more data, you can go higher, but 95% is generally considered the gold standard. And why do we use 95%? Well, 50% is a coin toss, heads or tails; you're just stabbing in the dark. 70% is a guess. You'd be like, yeah, it's Valencia, it's probably not going to rain today. That's a guess. 80% is an educated guess: based on the last few days, it's probably less likely to rain. I'm obsessed with the weather; it's just a British thing. 90% is a pretty good guess, but if you were to repeat it, every now and again it's not actually going to work. At 95% you're getting pretty sure. And when you get to 99%, you are sure. I suppose that's the burden of beyond reasonable doubt in the justice system, where you want to make sure you're 99% sure. So, guardrail metrics. It's a really boring thing to think about: oh yeah, I want to make sure that the test is technically valid. But if we slide off that guardrail metric, the whole test is to be voided. It doesn't happen to us very much, but it does happen, and it's very important to go, actually, whatever these results say, it's not quite right and we need to void the whole thing. If it looks a bit weird, have a look at why. It's a metric unrelated to the test that you're doing. So if you've got open, click, conversion, you'd normally pick, I suppose, a step ahead, to be unrelated. And it helps check the technical veracity of the campaign. ESPs will, quote unquote, randomly select people, and computers are very bad at randomness.
And I don't know why, but it's quite obvious in ESPs: it just looks ever so slightly skewed one way or the other. So it's a good way to check that it has done as good a job of randomness as it possibly could have. So say: does a percentage saving improve the number of orders? It's two subject lines: one, "Product only £9.99 a year", or "Save 50% on an annual plan". So one's a price point, one's a percentage, and that's what I'm testing. I'd probably use the number of bounces as the guardrail. If the number of bounces was very different on either one of those, there's probably something wrong with my deliverability, at which point the whole test is void, because there's no point in going, actually, yeah, this one won, but oh, also, all the others went to the spam folder, oops. So, the A/A test. We've got our A/B tests, where you literally have the random fork: 50% go to one, 50% go to the other. The A/A test is exactly the same, you just send the same email to both; that should have been in there before. But doing all the stats, trying to work it all out and doing all the differentiation, it's hard.
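
The talk does not say how its confidence figure is calculated. A common choice for comparing two conversion rates is a two-sided, pooled two-proportion z-test, with "confidence" read as one minus the p-value. The sketch below uses that swapped-in technique with only the Python standard library; the function names, the 1,000 recipients per variant, and the bounce counts are all assumptions for illustration.

```python
import math


def confidence(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided confidence (1 - p-value) that the two rates really differ.

    Uses a pooled two-proportion z-test, an assumed method rather than the
    one used by the speaker's tool.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0  # no variation at all, so nothing can be concluded
    z = abs(rate_b - rate_a) / std_err
    # For a standard normal, 1 - two-sided p-value equals erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))


def guardrail_ok(bounces_a: int, sent_a: int, bounces_b: int, sent_b: int,
                 threshold: float = 0.95) -> bool:
    """True when the bounce rates are NOT confidently different."""
    return confidence(bounces_a, sent_a, bounces_b, sent_b) < threshold


# Six sales versus seven, assuming 1,000 recipients per variant.
print(confidence(6, 1000, 7, 1000) >= 0.95)
# A bounce guardrail with hypothetical counts on a 25,000 / 25,000 split.
print(guardrail_ok(100, 25000, 105, 25000))
```

The guardrail check reuses the same test in the opposite direction: the primary metric should clear 95%, while the guardrail metric should not.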

This is a live feed of how I felt when I started getting into it. So I've built a little free tool to try and help a bit more. There's a Google Sheets version, and then later next year there'll be an actual version of it, it's a bit small on the screen, apologies, where you can literally just type in a couple of numbers and it will tell you the result, and if there are any problems with it, it will highlight that maybe that's a reason for investigation. It's called Junction. Being a motoring organization, we're not all that imaginative. At the RAC we actually did this; ours is named a bit more technically, but in Junction.email we've made it a bit simpler. So, like, possible sample ratio mismatch: some of these things that I hated having to know. But when you do, you want to make sure that if you've got an A/B test and you're sending it to 50,000 people, it goes to 25,000 each. You don't want to send 30,000 to one and 20,000 to the other, because you'll skew your analysis. And being a motoring organization, we put loads of road signs in, because why not? So we've got our trust checks on the way through: sample ratio match; guardrail match; there is a winner, that's always very good, and if the B didn't outperform the A, there's something going wrong; we've got confidence; and there's enough data. Once all five of these checkboxes are ticked, for us that's a winner. If any one of those is a maybe, we'll have a look and see why. So, what we have covered. We've had a little bit of thought about how data helps us make decisions, why we do A/B testing and why we shouldn't, how to build a test theory or hypothesis, and a little bit on how to analyze some of those results. And there'll be a free download for you to help with that.
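
A sample ratio mismatch check like the 25,000 / 25,000 example can be sketched as a chi-square goodness-of-fit test with one degree of freedom. This is an assumed method, not necessarily what the Junction tool does; the function name `srm_p_value` and the 0.001 alarm threshold are illustrative choices.

```python
import math


def srm_p_value(sent_a: int, sent_b: int, expected_share_a: float = 0.5) -> float:
    """p-value that the observed split is consistent with the intended split.

    Chi-square goodness of fit with one degree of freedom; for one degree of
    freedom the p-value equals erfc(sqrt(chi2 / 2)).
    """
    total = sent_a + sent_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((sent_a - expected_a) ** 2 / expected_a
            + (sent_b - expected_b) ** 2 / expected_b)
    return math.erfc(math.sqrt(chi2 / 2))


ALARM = 0.001  # assumed threshold; a very small p-value suggests a broken split

# The even split from the talk passes; the 30,000 / 20,000 split does not.
print(srm_p_value(25000, 25000) >= ALARM)
print(srm_p_value(30000, 20000) >= ALARM)
```

A failed check points at the send itself (list selection, suppression, or the ESP's randomization) and is a reason to investigate before reading any of the other results.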
So if you go to junction.email, slash, hash Valencia, it's a unique URL for all the slides and the download, and you can access that at your leisure. So yes: I think that [item to test] will increase or decrease [the metric] by a bit, because, and then the reason. I think the reason is really very helpful for future you to understand, actually, why the hell did we do that in the first place? So thank you ever so much for your time, and I'll hand back over to Andrew.

Andrew Bonar 25:20
Thank you so much, Jon.