Originally published March 1st, 2017
Updated April 1st, 2020
Geolocating Carmen Sandiego
Have you ever heard of the game Where in the World is Carmen Sandiego? Even if you haven’t, you can pretty much infer that the premise is to find someone by searching around the globe. That was the ’80s, though. Today, we use the Internet to search for things or to obtain a location.
If a company wants to know where a website visitor or app user is, it uses geolocation data. Sometimes, a site will even display a little nugget of Internet familiarity like this one:
- “This website would like to know your location.”
Prompts like this gauge (or advise of said gauging) a user’s geolocation. They are not required, however. Or maybe they are. It all depends on how the website visitor’s geolocation is obtained.
What is Geolocation?
If you don’t know what geolocation is, and — at this point — you’re too afraid to ask, it refers to the geographical (latitudinal and longitudinal) location of an Internet-connected device. Not your location, mind you, but the location of whatever electronic medium is being used to access the Internet.
So, if you leave your phone in your car and go for an hour-long run in silence (like some kind of animal), your geolocation history for that hour is the physical location of your car (according to your phone). Conversely, because your fitness tracker traveled with you the whole time on your wrist, its geolocation history for that hour is wherever you ran … maybe. Probably.
How It Works (Or Doesn’t) – Geolocation Data Types
1. Device-Based Data Collection
The usefulness of geolocation on mobile devices (like smartphones, tablets, laptops, smart watches, and fitness trackers) is fairly intuitive. (How far do I have to drive to experience the joy of mosquitoes? Did I run uphill long enough to eat a pound of gummy bears?)
Device-based data collection relies on GPS and cellular networks, so it’s more accurate in places with more people, where cell towers sit closer together and triangulation is tighter. The lower the population density, however, the lower the accuracy. In these instances, there are usually delays or gaps in the data. The margin for error increases, but hopefully not to the point where the family minivan, following its GPS, goes over the river and through the woods to a national park instead of Grandma’s house. Hopefully.
As long as location-based services are enabled and you have a GPS chip and a cell network signal, you can access (and be accessed by) these services for finding your general location through GPS-tower-device triangulation. Obviously, giving Internet services access to this information raises privacy issues. Therefore, for device-based data collection:
- Users have to allow location detection on each device (and for each application).
- Websites have to ask for a visitor’s location.
- As of Chrome 50, the HTML Geolocation API works only over secure website connections (as denoted by https:// in the URL, instead of http://).
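The three conditions above can be read as a single gate. The sketch below is a hypothetical Python check, not a browser API: the function name, its parameters, and the example URLs are assumptions made purely for illustration. It only models the rule that a device-based lookup goes ahead when the page is served over https://, the site has asked, and the user has said yes.

```python
from urllib.parse import urlparse


def may_use_device_location(page_url: str, site_asked: bool, user_allowed: bool) -> bool:
    """Return True only when all three conditions from the list hold.

    Hypothetical helper for illustration; real browsers enforce these
    rules themselves before handing coordinates to a page.
    """
    served_securely = urlparse(page_url).scheme == "https"
    return served_securely and site_asked and user_allowed


print(may_use_device_location("https://example.com/stores", True, True))   # True
print(may_use_device_location("http://example.com/stores", True, True))    # False
print(may_use_device_location("https://example.com/stores", True, False))  # False
```

Folding everything into one yes-or-no answer mirrors what the visitor experiences: if any one condition is missing, the site simply gets no device location.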
2. Server-Based Data Collection
The other geolocation method uses server-based data collection tied to your device’s IP address through a Wi-Fi or Ethernet connection. IP addresses are stored in databases where physical locations are associated with those IPs, mapped by years of data mining. This data is sold by third-party servicers, which means accuracy is only as good as the servicer’s data. Whenever the value of the data is based on accuracy but the source of the data is based on availability, the integrity of the data becomes suspect.
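To make the lookup itself concrete, here is a minimal sketch of how an IP-to-location table might be queried. Everything in it is an assumption for illustration: the two rows use IP ranges reserved for documentation, the locations are made up, and a real servicer’s database would be far larger and structured differently.

```python
import ipaddress

# Hypothetical rows standing in for a servicer's database. The networks are
# reserved documentation ranges, and the locations are made up.
IP_LOCATION_ROWS = [
    (ipaddress.ip_network("192.0.2.0/24"), "Portland"),
    (ipaddress.ip_network("198.51.100.0/24"), "Vancouver"),
]


def lookup_location(ip: str):
    """Return the location mapped to the network containing ip, or None."""
    address = ipaddress.ip_address(ip)
    for network, location in IP_LOCATION_ROWS:
        if address in network:
            return location
    return None  # the database has no record for this address


print(lookup_location("192.0.2.44"))   # Portland
print(lookup_location("203.0.113.9"))  # None
```

The second call shows the weak spot: when a range is missing or stale, the table has nothing reliable to say about that visitor.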
Consider the three credit bureaus:
- Each receives its data from creditors’ databases, public records, etc.
- Each has its own criteria for that data (how long ago to look, where to look, how often to refresh data, which data to auto-refresh, and so forth).
- Each uses its own databases to store all of that gathered data.
When you consider the sprawl of information sources that just three credit bureaus tap into, it’s improbable that each will house identical information.
IP-based location databases are no different, except there are so many more of them. But, like the credit bureaus, their servicers also have their own criteria about how the data was mined, allowing them to provide custom geolocation solutions. For instance, one popular solution provider’s data comes from servicers that employ “user-entered” inquiry, which is a direct approach for retrieving information, such as by simply asking visitors to enter their addresses into a form. When the information is analyzed against the same or similar location responses (supporting data), as well as vetted through location algorithms (more supporting data), it’s considered accurate, or — at least — as accurate as the available data allow.
What does that mean? If enough incorrect information is entered, or not enough information is available, the databases guess. So, that’s it: IP geolocation accuracy is based on the amount of data (and supporting data) relating to a specific location, as well as the timeliness of that data acquisition through third-party servicer databases. This is why, when trying to determine the geolocation of Gravitate’s office (based on my laptop’s IP address over Wi-Fi), the results were different: Some servicers indicated Portland; others Vancouver.
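One hedged way to cope with servicers that disagree is to ask several and see how much they agree. The sketch below assumes the answers have already been collected into a dictionary; the provider names and the function are hypothetical, and a simple vote is only one possible approach, not something any particular servicer is known to use.

```python
from collections import Counter


def consensus_city(answers: dict):
    """Return (city, share) for the most common answer, or (None, 0.0).

    share is the fraction of providers with data that gave that answer.
    Ties are not resolved in any meaningful way.
    """
    cities = [city for city in answers.values() if city]  # drop providers with no data
    if not cities:
        return None, 0.0
    city, votes = Counter(cities).most_common(1)[0]
    return city, votes / len(cities)


# Hypothetical answers for one IP address from four made-up providers.
answers = {
    "provider_a": "Portland",
    "provider_b": "Vancouver",
    "provider_c": "Vancouver",
    "provider_d": None,
}
city, share = consensus_city(answers)
print(city, round(share, 2))  # Vancouver 0.67
```

A low share is a signal in itself: it suggests the underlying data for that address is thin or out of date.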
IP geolocation, for all intents and purposes, is more accurate the broader the area it is asked to identify. In the United States, IP geolocation is 90-something percent accurate (that number varies, depending on the source database) at the country level. At the city level, the accuracy drops to between 50 and 70 percent. Given this, IP geolocation is best used for broader location detection categories, like a website visitor’s country. Naturally, when accuracy (and even data access) is that hit-or-miss, privacy isn’t a huge concern, which is why websites don’t have to request permission for your location when using it.
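To put those percentages in perspective, the arithmetic below estimates how many visitors out of 1,000 would be placed incorrectly at each level. The 95 percent figure is an assumption standing in for “90-something,” the visitor count is arbitrary, and the function name is hypothetical.

```python
def expected_misplaced(visitors: int, accuracy_percent: int) -> int:
    """Estimate how many visitors are located incorrectly at a given accuracy."""
    return visitors * (100 - accuracy_percent) // 100


VISITORS = 1000  # arbitrary sample size for illustration

print(expected_misplaced(VISITORS, 95))  # 50   country level (95 is an assumed figure)
print(expected_misplaced(VISITORS, 50))  # 500  city level, low end of the range
print(expected_misplaced(VISITORS, 70))  # 300  city level, high end of the range
```

Under these assumptions, a city-level guess misplaces roughly six to ten times as many visitors as a country-level one.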
3. Combined Data Collection
There are caveats to using either type of geolocation, of course. Naturally, you need visitors to give their permission if you are using device-based detection, which is the most accurate and the best suited for city-specific location information. Server-based detection, which is the least invasive and best suited for country-specific information, can return misleading data if the visitor’s IP address is routed through a proxy server (e.g., a VPN). In this instance, the IP address is mapped to the proxy server’s location, not the visitor’s. Therefore, because either type of data collection can fail, a website will sometimes incorporate both types, with one serving as a fallback for the other, on the theory that some data is better than none for providing the best user experience.
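The fallback idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the two lookups are passed in as plain functions, a declined permission prompt is modeled as a PermissionError, and every name here is hypothetical.

```python
from typing import Callable, Optional

Lookup = Callable[[], Optional[dict]]


def locate_visitor(device_lookup: Lookup, ip_lookup: Lookup) -> Optional[dict]:
    """Try the device-based lookup first, then fall back to the IP-based one."""
    try:
        result = device_lookup()
    except PermissionError:
        result = None  # the visitor declined the location prompt
    if result is not None:
        return {**result, "source": "device"}
    result = ip_lookup()
    if result is not None:
        return {**result, "source": "ip"}
    return None  # neither method produced a location


def declined_prompt() -> Optional[dict]:
    """Hypothetical device lookup where the visitor says no."""
    raise PermissionError("visitor declined the location prompt")


def ip_country() -> Optional[dict]:
    """Hypothetical IP lookup that only knows the country."""
    return {"country": "US"}


print(locate_visitor(declined_prompt, ip_country))  # {'country': 'US', 'source': 'ip'}
```

Tagging each result with its source lets the rest of the site decide how much to trust it, for example by showing city-specific content only when the source is the device.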