How to Avoid Bad Data

As many marketers know, data can be overwhelming. In today’s advertising and marketing ecosystem, we have more data to work with than ever before, and it can be daunting. However, good quality data is imperative for accuracy and effectiveness in digital advertising. So, while it may seem intimidating, it’s important to understand what the data you’re using represents, and how to know you’re getting exactly the data you’ve paid for.

Accuracy vs. Precision

At Blis, we have rather high standards for data when it comes to accuracy and precision. “Accuracy” and “Precision” are words you’ll hear often to describe location data, and while they matter equally in our eyes, they’re definitely not the same.

Accuracy refers to the correctness of the data in relation to its description. If the data says it represents US mobile users, and it does in fact reflect actual lat/long information for US mobile users, then it’s accurate. That’s very helpful if you plan to run a campaign targeting Americans. It’s less helpful if you’re targeting women who go to a particular chain of nail salons every week in the St. Louis metro area.

Precision refers to how specifically targeted the data is. Data that can narrowly target women who go to the Ten Spot Nail Salon on Main Street in Springfield, Missouri every week is precise data. That’s data that can be used to build a behavioral profile.

However, if data says it represents women who visit the nail salon in Springfield, but it actually includes men and women who visit businesses within 1,000 meters of the nail salon, it’s not really accurate, is it? Apart from not being limited to women, it also may include people who patronize the coffee shop next door, the dry cleaner across the street, and the supermarket around the corner. If your campaign is intended for women in that neighborhood who get weekly manicures, that data, even though it’s off by less than 1,000 meters, will not perform very well in your campaign.

That’s why Blis insists on data that is both precise and accurate.
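To make the nail salon example concrete, here is a minimal sketch of how the radius around a location changes who ends up in the audience. The coordinates, the 50-meter radius, and the function names are assumptions chosen for illustration; they are not Blis’s actual method or thresholds.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(point, center, radius_m):
    """True if `point` lies within `radius_m` meters of `center`."""
    return haversine_m(*point, *center) <= radius_m


# Hypothetical coordinates, for illustration only.
salon = (37.2090, -93.2923)
visitor_at_salon = (37.2091, -93.2923)        # roughly 11 m north of the salon
visitor_at_supermarket = (37.2150, -93.2923)  # roughly 670 m north of the salon

for name, point in [("salon visitor", visitor_at_salon),
                    ("supermarket visitor", visitor_at_supermarket)]:
    print(name,
          "| in 1,000 m radius:", within_radius(point, salon, 1000),
          "| in 50 m radius:", within_radius(point, salon, 50))
```

With the 1,000-meter radius both visitors are counted as salon visitors; with the tighter radius only the person who was actually at the salon is.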

Bad data is…

For us, bad data doesn’t necessarily mean inaccurate or fraudulent data. It refers to data that is simply too imprecise. As in the example above, imprecise data can wreak havoc on a programmatic campaign.

To help marketers execute the most successful campaigns possible, we work hard to identify misrepresented and imprecise data. There are a number of filters we use to catch it. We’ll dive more deeply into each of these in subsequent posts, but some of the “red flags” we look for include:

Centroids
Lack of precision in lat/long data
No country code/Wrong country code
Repetition of uniques
Not enough data
The Equator Test/Greenwich Test
Bad publisher name
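As a rough illustration of what a few of these red flags might look like in code, here is a minimal sketch. It is not Blis’s implementation: the record fields (`lat`, `lon`, `country`), the four-decimal threshold, the centroid table, and the reading of the Equator/Greenwich test as "latitude or longitude is exactly zero" are all assumptions made for this example.

```python
# Assumed threshold: 4 decimal places is roughly 11 m of latitude.
MIN_DECIMALS = 4

# Hypothetical centroid table with a single example entry (an approximate
# center point of the contiguous US). A real table would be much larger.
KNOWN_CENTROIDS = {(39.8283, -98.5795)}


def decimal_places(coord: str) -> int:
    """Number of digits after the decimal point in a coordinate string."""
    _, _, frac = coord.partition(".")
    return len(frac)


def red_flags(record: dict, expected_country: str) -> list:
    """Return the names of the red flags raised by one location record.

    Coordinates are expected as strings so the reported precision is kept.
    """
    flags = []
    lat_s, lon_s = record["lat"], record["lon"]
    lat, lon = float(lat_s), float(lon_s)

    country = record.get("country")
    if not country:
        flags.append("no_country_code")
    elif country != expected_country:
        flags.append("wrong_country_code")

    if min(decimal_places(lat_s), decimal_places(lon_s)) < MIN_DECIMALS:
        flags.append("low_precision")

    # Assumed meaning of the Equator/Greenwich test: a zero coordinate.
    if lat == 0.0 or lon == 0.0:
        flags.append("equator_or_greenwich")

    if (round(lat, 4), round(lon, 4)) in KNOWN_CENTROIDS:
        flags.append("centroid")

    return flags


sample = {"lat": "0.0", "lon": "0.0", "country": ""}
print(red_flags(sample, expected_country="US"))
```

Each check is cheap on its own; the point of the sketch is that every flag reduces to a simple, automatable rule that can run against each incoming record.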

It would be impossible for humans to manage all these filters and capture every unreliable data point. Blis works with hundreds of thousands of publishers and manages over 40 billion impressions every day. We rely on computers and proprietary algorithms to detect and flag patterns of unnatural human behavior across all those publishers and all that data. After all, it’s computers and algorithms generating fake data and attempting to pass it off as real; what better way to fight that fire than with fire of our own?

If you look at visualisations of bad data, it’s easy to tell apart from reliable data. Human behavior, plotted out on a map, generally looks irregular and unpredictable. Bad data looks different visually; it generates straight lines or repetitive patterns. It clashes with what we would expect of human behavior. We can easily spot a single app passing a load of lat/long data that is being advertised as high-quality GPS data.

When we find bad data…

With so much data from so many sources, we have to look at the big picture to determine if there is actual bad data coming in, versus the occasional anomaly. To that end, we look at two weeks’ worth of data, from all sources, at once. That makes it easier to discern patterns. A single day of data from one publisher doesn’t tell us whether that publisher can consistently deliver healthy data the next day. However, looking at fourteen days at a time, we can see whether data has been consistently good over a two-week period, and feel that the publisher can be trusted to deliver quality data. When we do catch bad data, that publisher is temporarily removed from the supply set and monitored. When our confidence is restored, they can return to the pool.
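The two-week review described above could be sketched along these lines. Only the fourteen-day window comes from the text; the per-day "bad share" metric, the 5% threshold, the tolerance of one anomalous day, and the status names are assumptions for illustration.

```python
WINDOW_DAYS = 14        # from the text: two weeks of data at a time
MAX_BAD_SHARE = 0.05    # assumed: a day is "bad" if over 5% of its records are flagged
MAX_BAD_DAYS = 1        # assumed: tolerate one occasional anomaly per window


def publisher_status(daily_bad_share):
    """Classify a publisher from its daily share of flagged records.

    `daily_bad_share` is a list of floats between 0 and 1, oldest day first.
    """
    window = daily_bad_share[-WINDOW_DAYS:]
    if len(window) < WINDOW_DAYS:
        return "monitoring"  # not enough history to judge yet
    bad_days = sum(1 for share in window if share > MAX_BAD_SHARE)
    if bad_days <= MAX_BAD_DAYS:
        return "trusted"
    return "suspended"  # removed from the supply set until the window recovers


print(publisher_status([0.01] * 14))               # consistently clean
print(publisher_status([0.01] * 13 + [0.50]))      # one anomalous day
print(publisher_status([0.50] * 5 + [0.01] * 9))   # a run of bad days
```

Because the window slides forward each day, a suspended publisher in this sketch returns to "trusted" on its own once the bad days age out of the last fourteen.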

While we have higher standards than most players in the industry when it comes to the quality of our geo data, the industry is right behind us. The IAB and the Mobile Marketing Association are making earnest efforts to educate marketers about the important questions they need to ask about their data. We’re leading the pack, but there are efforts from around the ecosystem to educate and protect marketers who rely on clean, quality data.

Stay tuned for more of this series, and read about Smart Pin, the proprietary technology that filters out bad data and keeps our data set pristine.

Amy Fox

Product Director | Blis

Amy is responsible for high-level product strategy and development alongside the release of new revenue streams and products to the market. As one of the original Blis employees, Amy has grown her career over the last few years from an entry level role in partner relationships to heading up both Operations and Product sequentially.
