5 data sources used by IP geolocation services providers

Understanding how IP geolocation works

IP geolocation technology has become a crucial component of digital content and commerce management systems. It is widely used for location-based features like personalisation, localisation, targeting or security. Major digital systems like CMS, CRM, eCommerce, streaming platforms and ad platforms depend on IP geolocation for location data.

However, the IP geolocation market is flooded with service providers of every kind, ranging from free to premium. Why is there such a wide gap in the market?

The main reason is that it is very easy to set up an IP geolocation service using freely available public data. This ease comes at the price of unreliable accuracy and service. A poor general understanding of how IP geolocation works, and of the challenges involved, does not help either.

An IP address doesn't inherently have a location property. Instead, IP geolocation service providers use various data sources and complex algorithms to attach location data to an IP address. The accuracy of the result depends entirely on the validity and authenticity of those data sources.

Furthermore, most IP addresses that people use to surf the internet aren't static. Your own IP address at home might change frequently, depending on the policy set by your ISP. So even if you determine the location of an IP address accurately, that location might no longer be valid the next time you look it up.

Accurate IP geolocation services are therefore those that combine several data sources, use complex algorithms and update their databases frequently.

Let's look at some of the key data sources used to identify the location of an IP address.

Freely Available Public Data

IANA (Internet Assigned Numbers Authority), the global organisation that allocates and maintains IP address numbers, delegates this responsibility to five Regional Internet Registries (RIRs), each serving its own region.

  • AFRINIC: Africa

  • APNIC: Asia/Pacific

  • ARIN: Canada, USA, and some Caribbean Islands

  • LACNIC: Latin America and some Caribbean Islands

  • RIPE NCC: Europe, the Middle East, and Central Asia

All the registration data associated with a block of IP addresses are open and publicly available. The main data points are:

  • IP Range: The range of IP address blocks owned.

  • Organisation Name: The name of the company that owns this block.

  • Organisation Address: The official address of the company.

[Figure: sample WHOIS data from APNIC]

This data is the foundation for every IP geolocation service provider because it officially verifies who owns an IP address. Many free IP geolocation services use only this data to approximate the location of an IP address.

However, registry data alone cannot be treated as the real location where an IP address is used. Often, the address in the registry is that of the company's headquarters, so the IP address is more than likely being used somewhere else. This is why free IP geolocation services are not very accurate.
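A registry-only lookup of this kind can be sketched in a few lines. This is a minimal illustration, not how any particular provider works: the `REGISTRY` records, the organisation names and the `registry_lookup` helper are all made up, and the addresses come from ranges reserved for documentation.

```python
import bisect
import ipaddress

# Hypothetical registry records: (first IP, last IP, organisation, registered country).
# The addresses are documentation ranges; the organisations are invented.
REGISTRY = [
    ("192.0.2.0", "192.0.2.255", "Example Telecom HQ", "AU"),
    ("198.51.100.0", "198.51.100.255", "Example Hosting Ltd", "US"),
    ("203.0.113.0", "203.0.113.255", "Example ISP Pty Ltd", "AU"),
]

# Convert addresses to integers and sort by range start so we can binary-search.
_RANGES = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), org, country)
    for first, last, org, country in REGISTRY
)
_STARTS = [r[0] for r in _RANGES]


def registry_lookup(ip):
    """Return (organisation, country) of the registered range containing ip, or None."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range whose start is <= the address.
    index = bisect.bisect_right(_STARTS, value) - 1
    if index < 0:
        return None
    start, end, org, country = _RANGES[index]
    if value <= end:
        return org, country
    return None
```

Note that the answer is only the registered owner and the owner's country. Every address in a range gets the same result, wherever it is actually in use.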

Collaboration with ISPs and data centres

In order to improve the accuracy of their geolocation data, IP geolocation service providers sometimes partner with ISPs to retrieve the location data of where the IP addresses are allocated.

Further, many ISPs voluntarily share their allocation data (in the form of geofeeds) with IP geolocation service providers to improve accuracy. You might wonder what ISPs gain from this arrangement.

When we consume online content on sites like Netflix or play games on PlayStation or Xbox, these applications depend on IP geolocation services to deliver a personalised experience to their end consumers. When inaccurate geolocation data degrades that experience, the ISPs end up receiving a lot of complaints from consumers. It is therefore very important for ISPs to ensure that IP geolocation companies recognise their IP blocks accurately.

However, it is impossible to obtain a geofeed from every ISP and company in the world. Further, no universally adopted standard or mandatory rule requires ISPs and other companies to disclose where they have allocated their IP addresses. Hence, it is difficult to depend solely on these partnerships.
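To make the idea of a geofeed concrete, here is a minimal sketch of reading one. It assumes the simple CSV layout described in RFC 8805 (prefix, country code, region code, city, postal code); the feed content, `parse_geofeed` and `geofeed_lookup` are illustrative only.

```python
import csv
import ipaddress

# Made-up geofeed content: prefix, country code, region code, city, postal code.
SAMPLE_GEOFEED = """\
# Example ISP geofeed (hypothetical data)
192.0.2.0/25,AU,AU-NSW,Sydney,
192.0.2.128/25,AU,AU-VIC,Melbourne,
2001:db8::/32,NZ,NZ-AUK,Auckland,
"""


def parse_geofeed(text):
    """Parse geofeed text into a list of (network, country, region, city) tuples."""
    entries = []
    # Drop blank lines and comment lines before handing the rest to the CSV reader.
    lines = [line for line in text.splitlines() if line.strip() and not line.startswith("#")]
    for row in csv.reader(lines):
        row = [field.strip() for field in row] + [""] * 4  # pad missing optional fields
        try:
            network = ipaddress.ip_network(row[0], strict=False)
        except ValueError:
            continue  # skip malformed prefixes instead of failing the whole feed
        entries.append((network, row[1], row[2], row[3]))
    return entries


def geofeed_lookup(entries, ip):
    """Return the entry with the most specific (longest) prefix containing ip, or None."""
    address = ipaddress.ip_address(ip)
    matches = [e for e in entries if e[0].version == address.version and address in e[0]]
    return max(matches, key=lambda e: e[0].prefixlen, default=None)
```

With this sample feed, `geofeed_lookup(parse_geofeed(SAMPLE_GEOFEED), "192.0.2.200")` returns the Melbourne entry. The feed is only as good as the ISP's own records, which is why providers treat it as one input among several.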

Crowdsourced: user-submitted data

Today, it is quite easy for anyone to access their geo-coordinates and share them with websites or mobile apps. Many websites today provide the ability for users to share their exact location coordinates to deliver personalised experiences.

It might seem that data collected directly from end users should be the most accurate. However, even user-submitted data has some drawbacks:

  • GPS devices are not always accurate, especially in areas where the signal strength and the coverage of a GPS signal are weak.

  • Coordinates obtained on the web through the HTML5 Geolocation API often rely on nearby Wi-Fi signals, so they can also suffer from inaccuracies caused by weak signal and coverage.

  • Due to the nature of IP address allocation, the location of most IP addresses keeps changing. Hence, the data collected may no longer be valid the next time.

  • The popularity of VPN services makes it increasingly difficult to identify a user's real IP address, which can result in erroneous data points.

  • Lastly, the volume of data obtained from this method is limited compared to other data sources.
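One common way to blunt these drawbacks is to aggregate many reports per address block rather than trust any single one. The sketch below is an assumption about how that could look, not a description of a real provider's pipeline: it groups reports by IPv4 /24 prefix, takes the median coordinate, and ignores blocks with too few reports. `aggregate_by_prefix` and `MIN_SAMPLES` are hypothetical names.

```python
import ipaddress
from collections import defaultdict
from statistics import median

MIN_SAMPLES = 3  # assumed threshold; below this we do not trust the estimate


def aggregate_by_prefix(samples, prefix_len=24):
    """Group (ip, lat, lon) samples by IPv4 prefix and return a median position per prefix."""
    grouped = defaultdict(list)
    for ip, lat, lon in samples:
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        grouped[network].append((lat, lon))

    estimates = {}
    for network, points in grouped.items():
        if len(points) < MIN_SAMPLES:
            continue  # too few reports to outvote a bad GPS fix or a VPN user
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        # The median is less sensitive to a few wild outliers than the mean.
        estimates[network] = (median(lats), median(lons))
    return estimates
```

A plain median of longitudes misbehaves near the 180° meridian, and a production system would also weight recent reports more heavily, since addresses get reassigned.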

Reverse DNS data

The next option is to use the reverse DNS data of an IP address. Just as forward DNS returns the IP address for a domain name, reverse DNS can reveal the domain name associated with an IP address.

This textual data is not mandatory and has no standard format. Network engineers therefore often use it to encode information for internal purposes.

Some of this encoded data can, however, reveal geographic information about the IP address. For example, the hostname below suggests that the IP address is located in San Jose.

p1-0-0.sanjose1.br2.bbnplanet.net

However, reverse DNS has many limitations:

  • It is not a mandatory data point and hence isn't available for every interface.

  • With no standard naming convention, hostnames can be misleading and are difficult to parse and interpret.

  • City names can be ambiguous, as cities in different countries often share the same or similar names.

Nonetheless, it is a useful data point that can be used to enhance existing IP geolocation data.
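Extracting such a hint from a hostname might look like the following sketch. The `CITY_HINTS` table and the `city_hint_from_hostname` function are hypothetical; a real table would be far larger and would need rules for each operator's naming scheme.

```python
import re

# Hypothetical hint table; a real one would hold thousands of city names,
# airport codes and operator-specific abbreviations.
CITY_HINTS = {
    "sanjose": "San Jose",
    "newyork": "New York",
    "london": "London",
}


def city_hint_from_hostname(hostname):
    """Return the first known city hint found in the hostname labels, or None."""
    for label in hostname.lower().split("."):
        token = re.sub(r"[^a-z]", "", label)  # e.g. "sanjose1" becomes "sanjose"
        if token in CITY_HINTS:
            return CITY_HINTS[token]
    return None
```

For the hostname above, `city_hint_from_hostname("p1-0-0.sanjose1.br2.bbnplanet.net")` returns `"San Jose"`, but the sketch cannot tell which San Jose is meant, which is exactly the ambiguity described in the list. In practice the hostname would come from a PTR lookup, for example via Python's `socket.gethostbyaddr`.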

BGP (Border Gateway Protocol) Data

BGP is the protocol that networks use to exchange routing information across the internet. The routes it advertises act as a global routing directory that steers internet packets across the globe. Every time we request data on the internet, these routes are used to deliver our request to the right destination.

BGP data can be used to identify or verify the network or router responsible for delivering packets to the destination IP address. Unlike WHOIS data, BGP data shows the actual network or company that is using the IP address. Hence, it can improve the accuracy of geolocation data.

In addition to supporting location data, BGP data can identify IP addresses that are not announced on the global internet, and for which geolocation is therefore not valid. This can be essential in detecting fraudulent connections.
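Both uses come down to finding the most specific announced prefix that covers an address. The sketch below assumes a small, made-up snapshot of a routing table (`ANNOUNCED`); the prefixes are documentation ranges and the AS numbers come from the range reserved for documentation. `origin_asn` is a hypothetical helper.

```python
import ipaddress

# Hypothetical snapshot of announced prefixes mapped to their origin AS number.
ANNOUNCED = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
    ipaddress.ip_network("198.51.100.128/25"): 64498,
}


def origin_asn(ip):
    """Return the origin ASN of the most specific announced prefix covering ip, or None."""
    address = ipaddress.ip_address(ip)
    covering = [net for net in ANNOUNCED if net.version == address.version and address in net]
    if not covering:
        return None  # not announced: no route exists, so a location claim is suspect
    # Routers prefer the longest (most specific) matching prefix.
    best = max(covering, key=lambda net: net.prefixlen)
    return ANNOUNCED[best]
```

A linear scan is fine for three prefixes. A real routing table holds hundreds of thousands of them, so a production lookup would use a prefix tree instead.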

Other data sources

IP geolocation has long attracted interest from computer science researchers because it is an interesting problem. Many research papers have been published that use various techniques to identify the location of an IP address.

As a result, the research community has proposed many experimental methods, such as measuring the time delay of internet packets and correlating it with physical distance. However, the internet is complex and constantly changing, which makes such methods unreliable for commercial use.
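The delay-based idea rests on a simple physical bound: a signal cannot travel faster than light, so a measured round-trip time caps how far away a host can be. The sketch below assumes signals in optical fibre travel at roughly two thirds of the speed of light in vacuum; the function name and the factor are illustrative assumptions.

```python
SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond
FIBRE_FACTOR = 2 / 3  # assumed: signals in optical fibre travel at roughly 2/3 of c


def max_distance_km(rtt_ms):
    """Upper bound on the distance to a host given a measured round-trip time."""
    one_way_ms = rtt_ms / 2
    return one_way_ms * SPEED_OF_LIGHT_KM_PER_MS * FIBRE_FACTOR
```

A round-trip time of 10 ms therefore places the host within roughly 1,000 km of the measuring point. The bound is loose because queuing delays and indirect cable routes inflate the measured time, which is part of why these methods struggle in commercial use.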

Conclusion

So far we have looked at some of the most popular data sources and methods used to identify users' locations from their IP addresses. The accuracy of the resulting data depends on how thoroughly these data sources are used and on the underlying algorithms.

As mentioned earlier, cheap and free services are unreliable because they perform simple lookups on publicly available data. Unfortunately, many developers build their web personalisation and customisation on such services. Many WordPress and other CMS plugins also depend on these unreliable free services.

Furthermore, many IP geolocation companies use the same data source to provide their service. MaxMind's free IP geolocation database is the most popular because anyone can download it and start using it. This has resulted in many copycats and white-label products in the market with various pricing plans and branding. Many end users may never realise that they are using the same data source at the back end.

If you are building a simple website, low-accuracy data would not make much of a difference to your business. However, if you are building an e-commerce platform or advanced marketing automation platform, having accurate IP geolocation can have a huge impact on your business.

Some of the reputable companies that provide IP geolocation services using their own advanced technology and data sources are BigDataCloud, Digital Element, MaxMind and Neustar.