
In theory, A/B testing is fantastic: B2B marketers compare two versions of an ad, landing page, email or other copy, changing just one variable. Then they look to see which version resonates better with customers.

But in practice, a lot of outside variables in today’s digital landscape make it hard to do a clean A/B test.

Marketing channels can introduce all sorts of factors into A/B tests that have little to do with what you want to measure and learn about. Rather than let that slow you down, embrace it. The machine learning (ML) and artificial intelligence (AI) built into many of today’s channels and tools can help your campaigns perform far better than completely manual management ever could.

The channels in which B2B marketers place ads are highly dynamic: too dynamic, in fact, to maintain the strict, controlled environment a textbook A/B test requires. Rather than trying to make an A/B test perfect, lean into the dynamic nature of the channel and test broader messaging or creative ideas. Here’s how to do it.

More Digital Messages, More Digital Complexity

Digital advertising platforms add new levels of ML and AI automation every day, including the dynamic placement of ad assets (formerly extensions), headline rotation, dynamic keyword insertion and much more.

The problem with leaning into ML and automated tools within platforms is that they strip away the narrow controls that ad testing relies on.

Let’s take search ads as an example. Say we want to test “Ad A” vs. “Ad B” to determine which version may generate a higher click-through rate (CTR). Consider the following scenarios:

  1. “Ad A” may always appear with two headlines while “Ad B” appears more often with three.
  2. “Ad A” may always appear with sitelink assets while “Ad B” appears more often with image assets. 

Each of these scenarios raises new, more difficult questions about how to evaluate the A/B test results. Comparing the CTR of Ad A to Ad B without considering the impact of those scenarios could leave you missing highly relevant variables. And collecting a statistically significant pool of data to fairly evaluate Ads A and B while accounting for all of these extra factors takes extensive time and budget.
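To put numbers on that problem, consider the standard two-proportion z-test often used to compare CTRs. The Python sketch below uses purely hypothetical click and impression counts; notice that even with thousands of impressions per variant, the gap shown is nowhere near significant, and that’s before any of the extra factors above are accounted for.

```python
from math import sqrt
from statistics import NormalDist

def ctr_z_test(clicks_a, imps_a, clicks_b, imps_b):
    """Two-proportion z-test on CTR; returns the z statistic and two-sided p-value."""
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (p_a - p_b) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: Ad A gets 120 clicks on 8,000 impressions, Ad B 150 on 8,500
z, p = ctr_z_test(120, 8_000, 150, 8_500)
print(f"z = {z:.2f}, p = {p:.2f}")  # p ≈ 0.18 here, so the CTR gap is not yet significant
```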

Even though ad assets (like sitelinks, callouts and images) compound the variability of A/B test results and muddy their interpretation, I would never encourage marketers to shy away from them. I’d much rather they appear with our ads than not. Not only do assets allow you to present your brand in a more elevated way, but studies show they greatly increase CTR and conversion rates.

Consider another scenario: Ad A may always appear in position 1 while Ad B may always appear in position 2. 

In this scenario, the extra variable isn’t tied to one of Google’s automated features or ad assets. Even when we don’t employ those features, variables rooted in the highly dynamic nature of Google search engine results pages (SERPs) will always be present.

All this complexity makes strict A/B testing very difficult. But B2B marketers should embrace this. Let’s face it: the search algorithms engineered within Google and Microsoft for rapidly collecting, computing and analyzing data are faster than any of us could ever be. Google knows far more about our audience in that auction moment than we could predict. And holding back from these dynamic elements in order to protect your testing may very well limit your search ad potential.

Finding Certainty in the Sea of Social Posts

Many of the same challenges exist with social media.

Conventional wisdom suggests that B2B marketers A/B test social posts by, for example, changing the color of images to determine which may best resonate with audiences. But today, you can’t control where a user sees your ad. It might be displayed next to two unrelated posts with content or design that doesn’t interest the user, prompting them to scroll past.

And A/B testing doesn’t always explain an audience’s reaction to your social posts. If color 1 wins overall but appears more often for managers, while color 2 appears more often for directors, is the difference driven by the color or by the audience? (Answer: you can’t tell from the topline numbers alone.)
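One practical way to spot this kind of confounding is to break results down by audience segment before declaring a winner. Below is a minimal pandas sketch with hypothetical numbers; if the winning color flips between managers and directors, the audience mix, not the creative, is likely driving the topline result.

```python
import pandas as pd

# Hypothetical results export: clicks and impressions by creative color and audience
results = pd.DataFrame({
    "variant":     ["color_1", "color_1", "color_2", "color_2"],
    "audience":    ["managers", "directors", "managers", "directors"],
    "impressions": [6_000, 1_500, 1_800, 5_200],
    "clicks":      [90, 30, 36, 70],
})
results["ctr"] = results["clicks"] / results["impressions"]

# Compare variants within each audience segment, not just overall
print(results.pivot(index="audience", columns="variant", values="ctr").round(4))
```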

Placement can also vary by platform. A video ad on Meta could appear in stories, reels, a newsfeed or the video section on Facebook. If one ad variation serves in one of these placements more often than other variations, you could have skewed A/B test results. 

This lack of certainty can be frustrating. However, even if you could more tightly control where ads appear on social platforms, new challenges would crop up. Once again, you’d find yourself needing a large budget and/or a lot of time to collect a statistically significant number of clicks.
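To get a rough sense of what statistical significance actually costs, the standard two-proportion sample-size formula provides a ballpark. The Python sketch below assumes a hypothetical 2% baseline CTR and a 10% relative lift you hope to detect; the required impressions add up quickly once you multiply by every variant and placement.

```python
from math import ceil
from statistics import NormalDist

def impressions_per_variant(base_ctr, relative_lift, alpha=0.05, power=0.8):
    """Rough impressions needed per variant to detect a relative CTR lift."""
    p1 = base_ctr
    p2 = base_ctr * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical: 2% baseline CTR, hoping to detect a 10% relative lift
print(impressions_per_variant(0.02, 0.10))  # roughly 80,000 impressions per variant
```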

Three Steps to Keep A/B Test Results Insightful

None of this is meant to suggest that B2B marketers should stop A/B testing. It remains a valuable methodology for learning about business audiences by comparing what messages, designs and/or formats they prefer. Let’s just be more efficient and refresh our approach to digital platforms by following these three steps:

1.    Determine what you want to learn before testing. For example:

  • Video versus images
  • Pain point A versus pain point B
  • Benefit A versus benefit B
  • Audience A versus audience B

There’s no right answer here. So many elements of an ad or message can be tested. But it may help to consider what you plan to do with your learnings, so you can take actionable steps once your test is complete.

2.    Build campaigns and ads to answer these bigger questions.

  • Don’t restrict yourself to only testing a single difference among ad variations. Google, Bing, LinkedIn and Meta will introduce their own differentiating factors anyway. It’s OK to change more than just a single word, headline or sentence. 
  • Instead of A/B testing, make it A/B/C/D/E/F/G testing. Focus on crafting a set of great ads, even if that requires large differences among variations or 10 different variations (a quick way to compare several variants at once is sketched below).
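If you do run many variants at once, one way to evaluate them is a single chi-square test of homogeneity before digging into pairwise comparisons. The sketch below uses scipy with hypothetical click and impression counts for seven variants; a small p-value suggests at least one variant genuinely stands out.

```python
from scipy.stats import chi2_contingency

# Hypothetical clicks and impressions for seven ad variants (A through G)
clicks      = [120, 150, 98, 133, 171, 110, 144]
impressions = [8_000, 8_500, 7_600, 8_200, 9_100, 7_900, 8_400]

# Contingency table: one row per variant, columns are clicks vs. non-clicks
table = [[c, n - c] for c, n in zip(clicks, impressions)]

# Chi-square test of homogeneity: do all variants share the same underlying CTR?
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.3f}")
```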

3.    Don’t forget the outside variables that could impact your test results. 

  • Be sure to double-check any factors that may have skewed the data (a quick delivery-skew check is sketched after this list), such as:
    • How often did assets (formerly extensions) appear with your search ads?
    • Where did your video ads run?
    • Which audiences saw which ads more often?
  • Don’t avoid the variables that could ultimately improve your performance, but consider the impact they may have had on your test when analyzing results.
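As a concrete example of that delivery-skew check, the pandas sketch below (hypothetical impression counts for two Meta video variants) compares each variant’s share of impressions by placement; large gaps mean the variants weren’t really competing on a level playing field.

```python
import pandas as pd

# Hypothetical delivery report: impressions by ad variant and Meta placement
log = pd.DataFrame({
    "variant":     ["A", "A", "A", "B", "B", "B"],
    "placement":   ["stories", "reels", "feed", "stories", "reels", "feed"],
    "impressions": [4_000, 1_000, 5_000, 1_500, 6_000, 2_500],
})

# Each variant's share of its own impressions by placement; big gaps = skewed delivery
shares = log.pivot(index="placement", columns="variant", values="impressions")
print((shares / shares.sum()).round(2))
```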

Completing these steps can surface the insights that help you make better decisions based on real data about, and from, your target audience. Even with all the new complexities, A/B testing is far better than guessing.

Remember, the goal of testing is to ultimately find the ways in which your ads can achieve the most success. It’s OK to lean into what works, even if it’s not the most scientific approach.