most accurate geo-IP source to build country-based access lists

Martin_T · June 8, 2015, 2:11pm

Hi,

let's say that I need to build an ACL that blocks all the IPv4
traffic from Sweden. I considered the following solutions:

1) RIR statistics files
(ftp://ftp.ripe.net/ripe/stats/RIR-Statistics-Exchange-Format.txt),
accessible for example at ftp://ftp.apnic.net/pub/stats/. However,
those files contain only the allocations and assignments made by the
registry producing the file, not any sub-assignments by other agencies
(for example NIRs or LIRs). This means that the information is not very
accurate. Another problem I found is that when an inetnum object has
several country fields, only the first one is used. In addition,
even the RIR statistics exchange format document says:

cc = ISO 3166 2-letter country code, and the enumerated variances of

{AP,EU,UK}

These values are not defined in ISO 3166 but are widely used.

The cc value identifies the country. However, it is not specified
if this is the country where the addresses are used.
There are no rules defined for this value.
It therefore cannot be used in any reliable way to map IP addresses
to countries

2) MaxMind products. These supposedly rely on user input (for example,
MaxMind purchases user data from ISPs or content providers) and, based
on personal experience, default to RIR data if no more accurate
source is available. If anyone can add detail here, then
please do so.

3) Use the iptables geoip module, but it turns out that it uses the MaxMind database:

root@VM-host:~# grep -Hsi maxmind $(dpkg -L xtables-addons-common)
/usr/lib/xtables-addons/xt_geoip_build:# Converter for MaxMind CSV database to binary, for xt_geoip
/usr/lib/xtables-addons/xt_geoip_dl: http://geolite.maxmind.com/download/geoip/database/GeoIPv6.csv.gz \
/usr/lib/xtables-addons/xt_geoip_dl: http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip;
root@VM-host:~#

4) In theory geofeeds (http://tools.ietf.org/html/draft-google-self-published-geofeeds-02)
would be a nice solution, but as I understand the RFC, it would work
for my example only if all the IP address users provided
their geofeeds and there were a centralized database to query.

5) Use prefix AS path. However, there seems to be no reliable way to
determine source country based on information in BGP routing tables.

Are there any other possibilities to geolocate IPv4 addresses with
higher accuracy?

regards,
Martin

Dobbins_Roland · June 8, 2015, 2:14pm

There is no direct relationship between logical network topology and geopolitical boundaries.

Mans_Nilsson · June 8, 2015, 2:23pm

> Are there any other possibilities to geolocate IPv4 addresses with
> higher accuracy?

There are three levels of untruth: (in increasing order of falseness)

1. No, mom, I did not eat the pie.

2. "There are no Russian soldiers in Crimea"

3. IP Geolocation

David_Hofstee · June 8, 2015, 2:36pm

4. There are no Russian IPs in Crimea?

David Hofstee

Deliverability Management
MailPlus B.V. Netherlands (ESP)


Alan_Buxey · June 8, 2015, 2:37pm

Hi,

> 2. "There are no Russian soldiers in Crimea"

eh? we know there are as it got annexed last year. I think you meant

"There are no Russian soldiers in Ukraine" ?

alan

John_McCormac · June 8, 2015, 2:56pm

On 08/06/2015 15:11, Martin T wrote:
> Hi,
>
> let's say that I need to build an ACL where I block all the IPv4
> traffic from Sweden. I considered following solutions:
>
> 1) RIR statistics
> files(ftp://ftp.ripe.net/ripe/stats/RIR-Statistics-Exchange-Format.txt)
> accessible for example at ftp://ftp.apnic.net/pub/stats/. However,
> those files contain allocations and assignment made by the registry
> producing the file and not any sub-assignments by other agencies(for
> example NIR, LIR). This means that this information is not very
> accurate. Another problem which I found out is that in case of inetnum
> object has many country fields, the first one is used. In addition,
> even the RIR statistics exchange format document says that:
>

It is a very difficult problem because IP ranges change and are split or redelegated. This means that even a reasonably current database will contain some data that is out of date.

I mapped all websites in com/net/org/biz/info/mobi and the new gTLDs last year. While these are simply websites, the rise of VPN services and Tor has made blocking at a country level somewhat problematic. You may get many of the IPs associated with the country but you will not get them all.

At a brute force country level it is possible to use the Delegated ranges lists but that runs into the problem where IP ranges are subnetted and allocated to other countries. This happens more with hosting service providers than with ISPs. There is also the Adjacent Markets effect, where a provider operates in geographically close markets and its largest IP range encompasses all the country-level allocations. This problem typically recurs every time a large transnational cable TV/ISP acquires a new range of IPs while online services such as Netflix are still waiting for the IP range lists to update. The cable ISP's users generally appear, to the online services, as being in another country.
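The brute-force approach can be sketched as follows. This is only an illustration: it assumes the pipe-separated record layout of the RIR statistics exchange format (registry|cc|type|start|value|date|status), where the value field of an ipv4 record is an address count rather than a prefix length. The function name and the sample records are hypothetical, and the addresses come from documentation space, not from real Swedish delegations.

```python
import ipaddress

def country_networks(lines, country):
    """Yield IPv4 networks that a delegated-stats file attributes to `country`."""
    for line in lines:
        fields = line.strip().split("|")
        # Record lines: registry|cc|type|start|value|date|status[|extensions]
        # Header and summary lines fail one of these checks and are skipped.
        if len(fields) < 7 or fields[2] != "ipv4" or fields[1] != country:
            continue
        first = ipaddress.IPv4Address(fields[3])
        # "value" is a count of addresses and is not always a power of two,
        # so one record may expand into several CIDR blocks.
        last = first + int(fields[4]) - 1
        yield from ipaddress.summarize_address_range(first, last)

# Made-up sample records (assumption: not taken from any real file).
sample = [
    "2|ripencc|20150608|3|19830705|20150608|+0000",
    "ripencc|*|ipv4|*|3|summary",
    "ripencc|SE|ipv4|192.0.2.0|256|20100101|allocated",
    "ripencc|SE|ipv4|198.51.100.0|768|20110101|assigned",
    "ripencc|NO|ipv4|203.0.113.0|256|20120101|allocated",
]

for net in country_networks(sample, "SE"):
    print(net)
```

The result inherits every limitation described above: the country code is whatever the registry recorded, and sub-assignments to other countries are invisible.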

> 4) In theory geofeeds(http://tools.ietf.org/html/draft-google-self-published-geofeeds-02)
> would be a nice solution, but as I understand the RFC, it would work
> for my example only in case all the IP address users would provide
> their geofeed and there is a centralized database to query.

The idea of all IP address users submitting their data is nice in theory but it runs into much the same problem as submission based web directories. Most users are either unaware of the existence of such projects or have no interest in doing so.
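For the networks that do publish a feed, consuming it is straightforward. Here is a minimal sketch, assuming the comma-separated layout described in the draft (prefix, country, region, city, postal code, with "#" starting a comment line). The function name and the sample feed are hypothetical.

```python
import csv
import ipaddress

def geofeed_prefixes(text, country):
    """Return the IPv4 prefixes that a geofeed file places in `country`."""
    rows = csv.reader(
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    result = []
    for row in rows:
        # Columns assumed from the draft: prefix, country, region, city, postal code.
        if len(row) < 2 or row[1].strip().upper() != country:
            continue
        net = ipaddress.ip_network(row[0].strip())
        if net.version == 4:  # the question in this thread is about IPv4 only
            result.append(net)
    return result

# Hypothetical feed contents using documentation prefixes.
sample_feed = """\
# geofeed published by one (imaginary) network operator
192.0.2.0/25,SE,SE-AB,Stockholm,
192.0.2.128/25,NO,NO-03,Oslo,
2001:db8::/32,SE,,,
"""

for net in geofeed_prefixes(sample_feed, "SE"):
    print(net)
```

This only covers prefixes whose holders chose to publish, which is exactly the coverage problem described above.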

> Are there any other possibilities to geolocate IPv4 addresses with
> higher accuracy?

There is, but it is seriously labour- and resource-intensive, as it would require a working model of a country's network infrastructure. Basically it uses a combination of IP data and IP mapping using route tracing. There were some US patents published on it a few years ago (I think that Google may have been one of the patentees).

Regards...jmcc

Blake_Hudson · June 8, 2015, 5:43pm

Have you thought about application layer tests - e.g. is the client's character set/language set to Swedish? Has the user identified himself/herself/henself as living in or being from Sweden?

--Blake

Alan_Buxey · June 8, 2015, 9:10pm

Hi,

> Have you thought about application layer tests - e.g. is the
> client's character set/language set to Swedish? Has the user
> identified himself/herself/henself as living in or being from
> Sweden?

...just waiting for someone to suggest checking their web cookies
to see what area they've got defined in adultfriendfinder or whatever...

alan

Bacon_Zombie · June 8, 2015, 10:21pm

Tinder would be more accurate since it uses the phone's GPS.

You could also cross check what subreddits they are subscribed to.

Martin_T · June 9, 2015, 9:11am

John,

> At a brute force country level it is possible to use the Delegated
> ranges lists but that runs into the problem where IP ranges are
> subnetted and allocated to other countries.

Yeah.

In addition, to illustrate the point in my initial post, sometimes
inetnum objects contain more than one "country" attribute and only the
first country code is inserted into the RIR delegated list. For example:

$ for deleg in $(wget -qO - ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest |
    grep ipv4 | cut -d '|' -f 4 | tail -10000); do
    [[ $(whois -rh whois.ripe.net -T inetnum "$deleg") = *country:*country:* ]] && echo "$deleg"
  done

193.104.217.0
193.110.48.0
193.111.228.0
193.218.114.0
194.33.109.0
194.34.64.0
194.42.56.0
194.150.168.0
194.153.74.0
195.14.23.0
195.39.208.0
195.85.254.0
195.95.150.0
195.158.230.0
$
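The same check can be done on already-fetched whois output, which also shows which codes are in conflict. This is a rough sketch, assuming the usual "attribute: value" layout of RPSL-style objects. The function name and the sample object are hypothetical, not a real RIPE database entry.

```python
def country_codes(whois_text):
    """Return every country code listed in one RPSL-style object, in order."""
    codes = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "country":
            # Drop any trailing "# comment" and normalise the case.
            codes.append(value.split("#")[0].strip().upper())
    return codes

# Made-up object for illustration only.
sample_object = """\
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-NET
country:        SE
country:        NO
status:         ASSIGNED PI
"""

codes = country_codes(sample_object)
print(codes, "ambiguous" if len(set(codes)) > 1 else "single")
```

An ACL builder could use this to flag ranges for manual review instead of silently trusting the first code.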

Blake,

> Have you thought about application layer tests - e.g. is the client's
> character set/language set to Swedish? Has the user identified
> himself/herself/henself as living in or being from Sweden?

Unfortunately I need this at the network layer, i.e. it should work for
other traffic besides HTTP/HTTPS.

Anyway, thanks for all the replies!

Martin

Joe_Abley2 · June 9, 2015, 3:13pm

I would say that a perfectly accurate mapping of addresses to anything geographical (with more accuracy than "it's within the observed universe, somewhere") is unlikely ever to exist, except by accident and for short periods of time. The inaccuracy and lack of authoritative sources of data is one reason; constant uncoordinated reconfiguration is another. You need to decide how accurate your mapping needs to be (and figure out how to measure that, if accuracy is important).

Another part of the problem is framing the question in a useful way: a universal solution seems intractable when the following questions are answered differently (but accurately) by different people who have different needs.

Is a device in Uganda connected via satphone to a router in France in Uganda, or France?

Is a network in Fiji that can't talk to any other networks in Fiji without leaving the island but is one layer-3 hop away from Australia in Fiji, or Australia?

Does the source address of a packet always identify the device that sent the packet?

If I'm in region A and you're in region A, and you route within the region to me but my replies leave the region on the way back, are we in the same region from my perspective? How about yours?

Even: if I'm in region A but I'm using a DNS resolver in region B, am I in region A or region B?

Joe

Dave_Sparro · June 10, 2015, 11:29am

Years ago, when meeting with the lawyers to talk about the need to block
access to a list of websites, I was coming from the technical side and
talking about how all of our possible solutions were incomplete and easily
circumvented by our users. The lawyers' response was to explain the
concept of good faith effort. The main point was that we needed to "do
something." We'd be in pretty good shape liability-wise as long as we made
an attempt. Getting back to the point of the question: I'd find the
cheapest/easiest way to implement a somewhat effective GeoIP block, and say
that you've done something.
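One low-effort way to turn whatever prefix list you end up with into rules might look like the sketch below. It assumes plain iptables with source-address matching, which is my choice for illustration rather than anything prescribed in this thread. The function name, the chain default, and the prefixes are hypothetical placeholders, not real Swedish ranges.

```python
import ipaddress

def drop_rules(cidrs, chain="INPUT"):
    """Render one iptables DROP command per network after merging neighbours."""
    nets = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
    return [f"iptables -A {chain} -s {net} -j DROP" for net in nets]

# Placeholder prefixes from documentation address space.
for rule in drop_rules(["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24"]):
    print(rule)
```

Merging adjacent prefixes first keeps the rule count down. The commands are only printed here, so they can be reviewed before anything is applied.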