Do ISPs collect and analyze traffic of users?

Michael_Thomas · May 15, 2023, 10:59pm

And maybe try to monetize it? I'm pretty sure that they can be compelled to do that, but do they do it for their own reasons too? Or is this way too much overhead to be doing en masse? (I vaguely recall that netflow, for example, can make routers unhappy if there is too much "flow").

Obviously this is likely to depend on local laws but since this is NANOG we can limit it to here.

Mike

Dave_Phelps · May 16, 2023, 1:39am

I think it’s safe to assume they are selling such data.

https://www.techdirt.com/2021/08/25/isps-give-netflow-data-to-third-parties-who-sell-it-without-user-awareness-consent/

https://www.vice.com/en/article/dy3z9a/fbi-bought-netflow-data-team-cymru-contract

Matthew_Petach2 · May 16, 2023, 4:45am

From the second article:

“Team Cymru’s products can also include data such as URLs visited, cookies, and PCAP data”

Really? From Netflow?

I admit, I’m perhaps a little behind on the latest netflow whiz-bangs,
but I’ve never seen a netflow record type that included HTTP cookies
or PCAP data before.

Certainly, the products listed on the Team Cymru website don’t make any mention
of including cookies or PCAP data, at least not from what I’ve been able to
ascertain from digging through their product listing.

Is there some secret “off the menu” product that allows one to purchase a
data feed that includes cookies and PCAP data?

Matt

Mailman · May 16, 2023, 8:10am

Take your pick from the "latest" ~2009 IPFIX Information Elements:

https://www.iana.org/assignments/ipfix/ipfix.xhtml

One can stuff almost anything in there.

Now if one should, and if one is allowed to.....

There is a reason why the marketing companies that control the general Internet moved the browser to HTTPS and are trying to move users onto their VPNs/CDNs: the ISP can no longer modify the data to alter or remove an ad in flight, and can no longer easily see what people are even contacting. Visibility goes to the ad network and not the ISP (which is mostly a good thing, but not so much operationally).

Greets,
Jeroen

LouD · May 16, 2023, 11:35am

ISPs capture traffic samples in both directions:
upstream at aggregation points, downstream at ingress, and your DNS queries, but the last part everyone knows.
Some of the most expensive gear is used to sample and aggregate that data.

Rishi_Panthee · May 15, 2023, 11:05pm

I’ve got Akvorado and netflow to identify where traffic comes in/goes to so we can improve our peering and make less traffic go via transit. I did see an article about Team Cymru selling netflow data from ISPs to governments though. Here is the FBI’s Contract to Buy Mass Internet Data

Rishi Panthee
Ryamer LLC
Https://ryamer.com
rishipanthee@ryamer.com

Tom_Beecher · May 16, 2023, 12:54pm

Two simple rules for most large ISPs.

If they can see it, as long as they are not legally prohibited, they’ll collect it.
If they can legally profit from that information, in any way, they will.

Now, their privacy policies will always include lots of nice-sounding clauses, such as ‘We don’t see your personally identifiable information’. This of course allows them to sell ‘anonymized’ sets of that data, which sounds great, except that, as researchers have proven, it’s pretty trivial to scoop up multiple discrete anonymized data sets and cross-reference them to identify individuals. Netflow data may not be as directly ‘valuable’ as other types of data, but it can be used in the blender too.

Information is the currency of the realm.
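
To make the cross-referencing point concrete, here is a minimal sketch of a linkage attack on toy data. Everything in it is hypothetical: the field names (`zip3`, `device`, `peak_hour`), the hashed IDs, and the people. It only illustrates the general idea that two data sets sharing a few coarse attributes can be joined to put names on "anonymized" rows; it is not a description of what any particular ISP or broker does.

```python
# Hypothetical toy data: no single table holds both a name and a flow record.
flows = [  # "anonymized" export: hashed subscriber ID plus coarse attributes
    {"sub_hash": "a91f", "zip3": "941", "device": "roku", "peak_hour": 21},
    {"sub_hash": "07c2", "zip3": "941", "device": "xbox", "peak_hour": 23},
    {"sub_hash": "5de0", "zip3": "100", "device": "roku", "peak_hour": 21},
]
marketing = [  # separate data set obtained elsewhere, with names attached
    {"name": "Alice", "zip3": "941", "device": "roku", "peak_hour": 21},
    {"name": "Bob", "zip3": "100", "device": "roku", "peak_hour": 21},
    {"name": "Carol", "zip3": "941", "device": "ps5", "peak_hour": 23},
]

# Attributes present in both data sets (assumed for illustration).
QUASI_IDS = ("zip3", "device", "peak_hour")


def reidentify(anon_rows, named_rows, keys=QUASI_IDS):
    """Map an anonymized ID to a name when its attributes match exactly one named row."""
    index = {}
    for row in named_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = {}
    for row in anon_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the row is re-identified
            matches[row["sub_hash"]] = names[0]
    return matches


print(reidentify(flows, marketing))
```

The more data sets go into the blender, the more attributes are available as join keys, and the more rows end up with a unique match.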

Tom_Beecher · May 16, 2023, 1:07pm

I did see an article about Team Cymru selling netflow data from ISPs to governments though.

Team Cymru sold the same thing to the FBI Cyber Crimes division that any of us could purchase if we wanted to pay for it.

Josh_Luthman · May 16, 2023, 1:41pm

Our ISP does not collect (nor obviously sell) customer information/traffic. People volunteer all of their information on Facebook/Twitter/etc already, I’m not sure I see a concern.

Mailman · May 16, 2023, 2:35pm

+1 to what Josh writes below. I would also differentiate between mobile networks (service provisioned to individual devices & often carrier s/w on the device) and wireline networks (home devices behind a router/gateway/NAT).

I just don’t think sale of data is a business for wireline ISPs. If it were - given most companies are public - you’d see it in SEC 10-K filings and on earnings calls. Indeed, they’d be required to talk about it with investors if it were a material revenue stream. I see none of that. Rather, the focus is on subscription revenue. If you want to know about data monetization - focus on services you don’t pay for…

Jason

Saku_Ytti1 · May 16, 2023, 2:55pm

I can't tell what large is. But I've worked for enterprise ISPs and
consumer ISPs, and none of the shops I worked for had the capability to
monetise the information they had. And the information they had was
increasingly low resolution. Infrastructure providers are notoriously
bad even at monetising their infra.

I'm sure some do monetise. But generally service providers are not
interesting and don't have active shareholders, so there is very little
pressure to make more money; hence firesales happen all the time, as
infrastructure is increasingly seen as a liability, not an asset. They
are generally boring companies and internally no one has an incentive
to monetise data, as it wouldn't improve their personal compensation.
And regulations like GDPR create problems people would rather not
solve unless pressured.

Technically, most people started 20 years ago with some netflow
sampling ratio, and they still use the same sampling ratio, despite
many orders of magnitude more packets. That means the share of flows
captured used to be far higher than it is today; today only very few
flows are seen in typical applications, and netflow is largely used
for volumetric DDoS detection and high-level ingressAS=>egressAS
metrics.

Hardware on offer increasingly does IPFIX as if it were sflow, that
is, zero cache, with records exported immediately after sampling,
because you'd need something like 1:100 or higher resolution to have
any significant luck in hitting the same flow twice. PTX has stopped
supporting flow-cache entirely because of this: at the sampling rate
where a cache would do something, the cache would overflow.
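
A rough way to see why a flow cache stops paying off at coarse sampling rates: the sketch below computes the chance that a flow gets sampled at least twice. It assumes independent random 1:N packet sampling (real routers may sample deterministically), and the 1000-packet flow size and the function name are made up for illustration.

```python
def p_at_least_two(flow_packets: int, sample_n: int) -> float:
    """Probability that random 1:sample_n sampling picks two or more packets of one flow."""
    p = 1.0 / sample_n
    # Binomial terms for "no packet sampled" and "exactly one packet sampled".
    p_zero = (1 - p) ** flow_packets
    p_one = flow_packets * p * (1 - p) ** (flow_packets - 1)
    return 1.0 - p_zero - p_one


# Hypothetical 1000-packet flow at a few sampling ratios.
for n in (100, 1000, 10000):
    print(f"1:{n} -> {p_at_least_two(1000, n):.4f}")
```

Under these assumptions the probability falls off quickly as the sampling ratio gets coarser, so a cache entry is rarely updated a second time and exporting each sample immediately loses very little.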

Of course there are other monetisation opportunities via mechanisms
other than data on the wire, like DNS.

michael.brooks · May 16, 2023, 4:44pm

First NANOG post, the topic compels me to chime in.

For me, the question also implies that user-side we are attempting to scrub any of the data we volunteer on social media (or other) platforms. I am careful about what I volunteer up to the Internetz, and have been since my first AOL floppy experience… So, the question of do the ISPs collect data is particularly important because regardless of how careful I am to anonymize my own contribution to my “online profile,” Tom’s assessment is the bleakest possible picture for anyone attempting to limit the data set which represents us.

Michael_Thomas · May 16, 2023, 6:54pm

Given the pervasiveness of TLS these days, even if they could get it off the remaining unencrypted data I'm not sure it would have a lot of value.

Mike

Michael_Thomas · May 16, 2023, 6:57pm

Why would there be a difference between wireless and wired?

Mike

Michael_Thomas · May 16, 2023, 7:03pm

And with DoH, that doesn't sound like a very long term opportunity.

Mike

Matthew_Petach2 · May 16, 2023, 7:44pm

[…]
I admit, I’m perhaps a little behind on the latest netflow whiz-bangs,
but I’ve never seen a netflow record type that included HTTP cookies
or PCAP data before.

Take your pick from the “latest” ~2009 IPFIX Information Elements:

https://www.iana.org/assignments/ipfix/ipfix.xhtml

One can stuff almost anything in there.

Now if one should, and if one is allowed to…

Wow.

Thank you, Jeroen, I was indeed a bit out of date.
Thank you for the pointer!

(For those in the same boat as I, here’s the relevant portion that clearly points out that yes, you can export the entire packet if you so desire):

313 | ipHeaderPacketSection | octetArray | default | current

This Information Element carries a series of n octets from the IP header of a sampled packet, starting sectionOffset octets into the IP header.

However, if no sectionOffset field corresponding to this Information Element is present, then a sectionOffset of zero applies, and the octets MUST be from the start of the IP header.

With sufficient length, this element also reports octets from the IP payload. However, full packet capture of arbitrary packet streams is explicitly out of scope per the Security Considerations sections of [RFC5477] and [RFC2804].
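
For anyone curious what a collector could do with such an octetArray, here is a minimal sketch. It assumes IPv4 and a sectionOffset of zero, does not validate the checksum, and the function name and sample bytes are invented for illustration; it is not taken from any real collector.

```python
import socket
import struct


def parse_ipv4_section(octets: bytes) -> dict:
    """Decode the fixed IPv4 header and return whatever octets follow the header."""
    if len(octets) < 20:
        raise ValueError("need at least 20 octets for an IPv4 header")
    ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum, src, dst = struct.unpack(
        "!BBHHHBBH4s4s", octets[:20]
    )
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 header")
    header_len = (ver_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < 20:
        raise ValueError("invalid IHL")
    return {
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "protocol": proto,
        "ttl": ttl,
        "total_length": total_len,
        # With a long enough section this is the start of the IP payload.
        "beyond_header": octets[header_len:],
    }


# Hypothetical sample: 20-octet IPv4 header followed by an 8-octet UDP header.
sample = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
    socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"),
) + bytes.fromhex("0035003500080000")

print(parse_ipv4_section(sample))
```

The point is only that once the exported section is longer than the header, the remainder is payload, which is how header sampling can shade into packet capture.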

Thanks!

Matt
(still learning after all these years.)

Mark_Tinka4 · May 17, 2023, 4:06am

I tend to agree.

ISPs are, generally, terrible at evolving beyond selling bandwidth.

While there might be some ISPs that are able to monetize the data they collect - to whatever degree that monetization is useful - I'd hazard that the majority don't do this because it requires a different mindset that most ISPs simply don't have.

Mark.

Mailman · May 17, 2023, 2:38pm

Why would there be a difference between wireless and wired?

Service provisioning in a mobile network is at the device level and tied to an individual vs. at a home shared across many devices & people. So just starting off there is more visibility to say X traffic is related to Y person. Then there’s location data to know roughly where that person/device is traveling. Also most carriers have software installed on the device as part of the provisioning/authentication function and I think there are historical cases where that provided some visibility into other apps on the device. In any case, it seems the most value (to advertisers & data brokers) is in the location data and I think that’s where all the scrutiny on MNOs has been recently.

JL

Mathews_Robert · May 17, 2023, 4:05pm

For those who may have a broader interest in the topic of user/subscriber information collection by ISPs…

Eight Years Holding ISPs to Account in Latin America: A Comparative Outlook of Victories and Challenges for User Privacy
By Veridiana Alimonti
May 12, 2023

All the best –

Justin_M_Streiner2 · May 19, 2023, 12:27pm

There are already so many different ways that organizations can find out all sorts of information about individual users, as others have noted (social media interactions, mobile location/GPS data, call/text history, interactions with specific sites, etc), that there probably isn’t much incentive for many providers to harvest data beyond what is needed for troubleshooting and capacity planning. Plus, gathering more data - potentially down to the level of packet payload - is not an easy problem to solve (read: expensive) and doesn’t scale well at all. 100G links are very common today, and 400G is becoming so. I doubt that many infrastructure providers would be able to justify the major investments in extra infrastructure to support this, for a revenue stream that likely wouldn’t match that investment, which would make such an investment a net loss.
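
As a back-of-the-envelope illustration of the scaling problem, the sketch below estimates the storage needed to keep every packet on one link for a day. It counts one direction only, and the 50% average utilization is an assumption picked for the example, not a measured figure.

```python
def capture_bytes_per_day(link_gbps: float, utilization: float = 1.0) -> float:
    """Bytes per day needed to store every packet on a link at the given average utilization."""
    bytes_per_second = link_gbps * 1e9 / 8 * utilization
    return bytes_per_second * 86_400  # seconds in a day


for gbps in (100, 400):
    petabytes = capture_bytes_per_day(gbps, utilization=0.5) / 1e15
    print(f"{gbps}G at 50% average utilization: {petabytes:.2f} PB/day")
```

Multiply that by the number of links in a network and by any useful retention period, and the cost side of the equation gets large quickly.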

Content providers - particularly social media platforms - have a somewhat different business model, but those providers already have many different ways to harvest and sell large troves of user data.

Thank you
jms