When Bots start doing Human Work, but Adobe Analytics still treats them like Bots…

Lukas Oldenburg
Published in The Bounce
4 min read · Oct 25, 2023


A couple of months ago, Adobe migrated the (hardly useful) Bot Reports to the Analysis Workspace interface. Maybe they also updated some of the Bot Filtering logic along the way, because the number of identified Bots seems higher than in the past. Or maybe it is just another real Bot rush.

Still, Adobe’s built-in Bot Filtering options are in most cases far from enough to significantly tackle your Bot problems. As I have explained in this article and in the Superweek presentation with David Hermann, the site in question filters out 95% of the Bots client-side already, because we focus on “preventative” methods, while still using some “reactive” methods (a Virtual Report Suite with Bot Segments) for those that slip through. We do this because filtering out Bots reactively (after they have already been tracked) incurs massive costs and additional slowness. It also keeps Bot traffic away from all those pixels, UX and A/B testing tools, etc.
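To make the “preventative” idea concrete, here is a minimal client-side sketch. The isLikelyBot() heuristics shown are illustrative examples only, not our actual detection logic:

```typescript
// Minimal sketch of the "preventative" approach: decide client-side,
// BEFORE any tracking request is sent, whether this session looks like a Bot.
// The individual checks are examples, not a real production rule set.

function isLikelyBot(): boolean {
  const ua = navigator.userAgent;
  // Cheap client-side signals (illustrative only):
  if (/HeadlessChrome|PhantomJS|bot|crawler|spider/i.test(ua)) return true;
  if (navigator.webdriver) return true; // flag set by automation frameworks
  if (!navigator.languages || navigator.languages.length === 0) return true;
  return false;
}

function trackPageView(sendBeacon: () => void): void {
  if (isLikelyBot()) {
    // The Bot never reaches Adobe Analytics, pixels, A/B testing tools etc.,
    // so it causes no billable Server Calls, no skewed data, no extra slowness.
    return;
  }
  sendBeacon(); // e.g. the actual Adobe Analytics page view call
}
```

The point of doing this before the request fires is exactly the cost argument above: a Hit that is never sent is never billed.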

Even after that, a significant number of Bot Hits that “slip through” remains every day. As you can see, Adobe captures only a fraction:

[Chart: Bot Hits identified by Adobe (blue) vs. by our own Bot detection]

Now the problem is that we don’t know if it is actually a “fraction” of “our” Bots. In other words: What Adobe considers Bots does not necessarily overlap with the Bots that our own systems filter out!

So Adobe finds its own Bots based on the common but simplistic IAB Bot list, which relies on User Agent / IP logic. But being on that list does not always mean they are really Bots!
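For illustration, this is roughly what such User Agent / IP list matching boils down to; the BotRule shape and the sample entries are my assumptions, not Adobe’s actual implementation:

```typescript
// Sketch of IAB-style list matching: a Hit is flagged as a Bot purely
// because its User Agent or IP matches a list entry, regardless of behavior.

interface BotRule {
  name: string;
  uaPattern?: RegExp; // match on a User Agent pattern
  ipPrefix?: string;  // match on an IP range, simplified to a prefix here
}

const botRules: BotRule[] = [
  { name: "Headless Chrome", uaPattern: /HeadlessChrome\// },
  { name: "Some Crawler", ipPrefix: "66.249." }, // e.g. a known crawler range
];

function matchBotRule(userAgent: string, ip: string): string | null {
  for (const rule of botRules) {
    if (rule.uaPattern?.test(userAgent)) return rule.name;
    if (rule.ipPrefix && ip.startsWith(rule.ipPrefix)) return rule.name;
  }
  return null; // not on the list, which still proves nothing either way
}
```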

Adobe is more transparent than other vendors (GA? PiwikPro? Amplitude?) about their Bot Filtering. But unfortunately, they still do not give us access to any data besides Page Names and “Bot Names” to analyze what they consider to be Bots:

That means you cannot see the eVars or Events that the Bots produced (unless you go through the raw clickstream feed, I guess, which is always a project of its own). Their Bot detection logic is thus a black box and not actionable: we cannot draw any conclusions from it to improve our own Bot filtering, e.g. to not even let some of those Bots through to AA (after all, Adobe bills us for these Hits…). It would, for example, be interesting for me to see which of these Headless Chrome browsers were let through by our client-side logic as “probably human”. And which of them might actually not be Bots after all…
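If you do want to go down the raw clickstream route, a starting point could look like the sketch below. It assumes you have already joined the Data Feed with its column-header lookup into a headered TSV file; the file name and the choice of post_evar12 as a stand-in for a login-status eVar are assumptions:

```typescript
// Hypothetical sketch: digging through a raw clickstream (Data Feed) export
// to see which eVars / Events the supposed "Bot" Hits actually carried.
import { readFileSync } from "fs";

// Assumed: a TSV where the first line holds the column names.
const [header, ...rows] = readFileSync("hit_data_with_headers.tsv", "utf8")
  .trimEnd()
  .split("\n")
  .map((line) => line.split("\t"));

const col = (name: string) => header.indexOf(name);

// All Hits whose User Agent marks them as Headless Chrome:
const headlessHits = rows.filter((r) =>
  r[col("user_agent")]?.includes("HeadlessChrome/")
);

// Which login states / Events did these supposed Bots produce?
for (const hit of headlessHits) {
  console.log(hit[col("post_evar12")], hit[col("post_event_list")]);
}
```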

And this is where, as Vulfpeck fans would say, “it gets funkier”…

While debugging why some purchases were missing in Adobe Analytics even though we clearly sent them to AA, we found that these users were using Headless Chrome browsers (User Agent “Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/93.0.4577.0 Safari/537.36”, if that matters). Adobe’s Bot rules go by User Agent and treat everything with “HeadlessChrome/” in it as a Bot. As we can see, their Bots are almost all “HeadlessChrome/” browsers:

We, however, have proof that some of these Headless Chrome browsers are humans, because they log in and even purchase; one of the “users” is an internal user of the company whom we use for tests (that is how we found the proof). But there might also be others using scripted Bots to make purchases. While it is definitely debatable whether we should want such “users” in our reports (I remember Myriam Jessier bringing up this valuable point in the post-presentation discussion at Superweek), my view is that the technical attributes should be secondary as long as what the Bot does “looks human” (see my article on human signals). So we do want to track Bots’ purchases, logins and so forth if they give us such “human signals”.

However, people (and Adobe) still see Bot Filtering mainly as an exercise in finding the right User Agents, IP addresses and Network Domains… That helps, but it is far from enough. Adobe needs to be less simplistic here and allow for a human-signal/whitelisting logic. We of course don’t want to switch off AA’s Bot Filtering entirely, but that is currently the only option if we want to see these users in the reports again; and then every Headless Chrome browser gets through. There needs to be a more intelligent logic in place, e.g. Adobe clients should be able to include a signal in the tracking request that “whitelists” a certain Hit as a “human” Hit so that Adobe’s Bot Filtering logic is circumvented.

This could be done flexibly in an interface where I could, for example, say that if the eVar which contains the login status equals “logged_in”, or if the Hit contains a “purchase” Event, the Bot Filters should not apply. Or, more simply, by sending an additional request parameter that signals whitelisting.
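A minimal sketch of what such a whitelisting signal could look like on the sending side; note that the bot_whitelist context data flag is purely hypothetical and does not exist in Adobe Analytics today:

```typescript
// Wishful thinking as code: tag a Hit as "human" based on behavioral signals
// (login, purchase) so that an IAB-style UA/IP match would be overridden.
// The "bot_whitelist" flag is a hypothetical parameter, not a real AA feature.

interface TrackingPayload {
  events?: string;                      // e.g. "purchase,event5"
  contextData: Record<string, string>;
}

function applyHumanWhitelist(
  payload: TrackingPayload,
  loginStatus: string // value of the eVar that holds the login status
): TrackingPayload {
  const isPurchase = payload.events?.includes("purchase") ?? false;
  const isLoggedIn = loginStatus === "logged_in";
  if (isPurchase || isLoggedIn) {
    payload.contextData["bot_whitelist"] = "human"; // hypothetical signal
  }
  return payload;
}
```

The design choice here is the same one argued above: behavioral (“human”) signals outrank technical attributes like the User Agent.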

With more and more Bots doing human work, simplistic User Agent / IP-based filtering logic will become more and more problematic.

Do you want to read my content right in your mailbox, without Medium’s reading restrictions for non-paying users, immediately after I publish it? Subscribe! I will not use your contact details for anything but this purpose.


Digital Analytics Expert. Owner of dim28.ch. Creator of the Adobe Analytics Component Manager for Google Sheets: https://bit.ly/component-manager