How Blis Filters Bad Data
As we’ve underscored in previous posts in this series, bad data is an ongoing concern in our industry. While it isn’t necessarily deliberate or fraudulent, quality issues make an enormous percentage of available data unusable for advertising purposes. At Blis, we’re incredibly vigilant about ensuring the quality of the data we pass on to our clients. Ultimately, only about 20 percent of the data collected meets our stringent standards.
Part of our approach is that all data is “guilty until proven innocent.” Because we understand the negative impact bad data can have on campaign results, we use several filters to weed out anything that may be inaccurate or phony. The majority of that bad data is identified by three filters that we covered earlier in this series:
- Centroids: When visualized data falls into grids, straight lines or symmetrical shapes on a map, this is the result of broad-reaching centroids, which may deliver generally accurate but highly imprecise data.
- Precision: Blis considers lat/long data with three decimal places or fewer too imprecise to deploy in programmatic campaigns.
- Uniques: When curiously large volumes of data originate from a single lat/long, a space that is only square millimeters in size, that’s a red flag.
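To make the Precision and Uniques checks more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Blis’s actual implementation: the function names, the record layout (device ID, latitude string, longitude string), the choice to count distinct devices per coordinate, and the `max_devices` threshold are all hypothetical. Coordinates are kept as strings because converting them to floats would lose the number of decimal places that was actually received.

```python
from collections import defaultdict


def decimal_places(coord: str) -> int:
    """Count the digits after the decimal point in a plain decimal string."""
    _, _, fraction = coord.partition(".")
    return len(fraction)


def is_precise(lat: str, lon: str, min_places: int = 4) -> bool:
    """Precision check: three decimal places or fewer is treated as too coarse.

    Requiring BOTH values to pass is an assumption made for this sketch.
    """
    return decimal_places(lat) >= min_places and decimal_places(lon) >= min_places


def crowded_points(records, max_devices: int = 100):
    """Uniques check: return the exact lat/long pairs shared by too many devices.

    `records` is an iterable of (device_id, lat, lon) tuples.
    `max_devices` is a hypothetical threshold, not a published Blis value.
    """
    devices_by_point = defaultdict(set)
    for device_id, lat, lon in records:
        devices_by_point[(lat, lon)].add(device_id)
    return {
        point
        for point, devices in devices_by_point.items()
        if len(devices) > max_devices
    }
```

In practice the threshold would need tuning against real traffic, since a busy venue can legitimately produce many devices in a small area, though rarely at the very same coordinate down to the last decimal place.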
These filters alone remove the majority of bad information from the pool, but we also scan for other patterns that may indicate data does not meet our quality standards.
- VPN: For a variety of reasons, users may access the internet through a virtual private network, or VPN. When I’m abroad, for example, I will use a VPN app to access television shows that can only be viewed in the UK. The VPN app makes it appear as if my phone is still in the UK, so I’m able to watch my shows. While that’s convenient for me as a user, it does create inaccurate data regarding my whereabouts. Fortunately, it’s easy for Blis algorithms to detect VPN access, so data that arrives through a VPN is not permitted into our pool.
- Equator or Greenwich Test: Lat/long data, as we’ve seen, is numeric and carries between three and eight decimal places. However, if one of the two values is zeroed out, it would mean that the user is sitting either on the equator (zero latitude) or on the Greenwich Meridian (zero longitude). While it’s entirely possible for someone to be on the Greenwich Meridian, since it passes through London, if we receive a series of bid requests where the lat or long is zeroed out in every request, we can assume it’s not accurate. This is typically due to a bug of some kind rather than fraud, but regardless, it renders that data both inaccurate and unusable.
- No Country Code or Mismatched Country Code: Sometimes, we’ll receive data with no country code attached. These requests will often show a lat/long in the middle of the ocean or outside any known territory boundary. In these cases, we’ll assume the carrier or publisher doesn’t know where the user is, and consider that data unusable. When there’s a mismatch between the country code and the lat/long, we once again know that the data is not right, and consider it untrustworthy and unusable in most cases. Interestingly, these mismatches can occasionally be used to target tourists. For example, if a phone’s country code is the UK but the lat/long places the user in the US, we can generally assume that this is a UK resident visiting America.
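The Equator or Greenwich test and the country code check can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of Blis’s production logic: the function names, the `min_requests` cutoff, and the `ROUGH_BOXES` table are all hypothetical, and the bounding boxes are approximate values chosen only for the example. A real system would use proper country boundary data rather than rectangles.

```python
from typing import List, Optional, Tuple

# Hypothetical, very rough bounding boxes: (min_lat, max_lat, min_lon, max_lon).
# Illustrative only; real country borders are not rectangles.
ROUGH_BOXES = {
    "GB": (49.9, 60.9, -8.7, 1.8),
    "US": (24.5, 49.4, -125.0, -66.9),  # contiguous states only
}


def has_zeroed_axis(coords: List[Tuple[float, float]], min_requests: int = 5) -> bool:
    """Flag a series of requests whose latitude or longitude is zero every time.

    A single request on the Greenwich Meridian can be genuine, so the check
    only fires once at least `min_requests` (a hypothetical cutoff) are seen.
    """
    if len(coords) < min_requests:
        return False
    all_lat_zero = all(lat == 0.0 for lat, _ in coords)
    all_lon_zero = all(lon == 0.0 for _, lon in coords)
    return all_lat_zero or all_lon_zero


def country_check(country_code: Optional[str], lat: float, lon: float) -> str:
    """Compare a request's country code against where its lat/long falls."""
    if not country_code:
        return "missing"  # no country code attached: treat as unusable
    box = ROUGH_BOXES.get(country_code)
    if box is None:
        return "unknown"  # country not covered by this toy table
    min_lat, max_lat, min_lon, max_lon = box
    inside = min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    return "match" if inside else "mismatch"
```

A `"mismatch"` result is not automatically discarded in this sketch. As noted above, a UK country code paired with a US location may point to a traveler, so a caller could route those records to a separate review step instead of dropping them.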
While these filters are tight enough to prevent ~75 percent of bad data from getting through, there are additional, proprietary filters Blis uses to ensure our clients are getting the cleanest data available. For additional assurance, SmartPin brings transparency to the process, allowing clients to access and visualize the data for themselves.
Blis is committed to delivering the cleanest, most precise, and most accurate location data on the market. We understand that scale is sacrificed in the process: less data means less scale, of course. However, we’re confident in the quality. We know the targets are good. And we’re fairly sure that you’d rather have a trim and efficient campaign that hits exactly the right targets than a huge, scattershot campaign that hits none of them.
Better data yields better results, period.