[Nanog] P2P traffic optimization Was: Lies, Damned Lies, and Statistics [Was: Re: ATT VP: Internet to hit capacity by 2010]

Laird_Popkin · April 23, 2008, 10:30pm

I would certainly view the two strategies (reverse engineering network information and getting ISP-provided network information) as being complementary. As you point out, for any ISP that doesn't provide network data, we're better off figuring out what we can to be smarter than 'random'. So while I prefer getting better data from ISPs, that's not holding us back from doing what we can without that data.

ISPs have been very clear that they regard their network maps as being proprietary for many good reasons. The approach that P4P takes is to have an intermediate server (which we call an iTracker) that processes the network maps and provides abstracted guidance (lists of IP prefixes and percentages) to the p2p networks that allows them to figure out which peers are near each other. The iTracker can be run by the ISP or by a trusted third party, as the ISP prefers.

- Laird Popkin, CTO, Pando Networks
mobile: 646/465-0570

Christopher_Morrow · April 23, 2008, 11:47pm

> I would certainly view the two strategies (reverse engineering network
> information and getting ISP-provided network information) as being
> complementary. As you point out, for any ISP that doesn't provide network
> data, we're better off figuring out what we can to be smarter than
> 'random'. So while I prefer getting better data from ISPs, that's not
> holding us back from doing what we can without that data.

ok, that sounds better, or more reasonable, or at least not
immediately doomed to blockage; 'more realistic', even.

> ISPs have been very clear that they regard their network maps as being
> proprietary for many good reasons. The approach that P4P takes is to have
> an intermediate server (which we call an iTracker) that processes the
> network maps and provides abstracted guidance (lists of IP prefixes and
> percentages) to the p2p networks that allows them to figure out which
> peers are near each other. The iTracker can be run by the ISP or by a
> trusted third party, as the ISP prefers.

What's to keep the iTracker from becoming the new 'Napster megaserver'? I
suppose if it just trades map info or lookups (a la DNS lookups) and
nothing about torrent/share content, things are less sensitive from a
privacy perspective, and it's less of a single point of failure from the
network's perspective.

Latency requirements seem interesting for this as well... at least
depending on the model for sharing the mapping data. I'd think that a
lookup model would serve the client base better (instead of downloading
many large map files in order to determine the best peers to use).
There's also the question of which part of the network graph, and which
perspective, to use for the client -> peer locality mapping.

It's interesting, at least.

Thanks!
-Chris

(also, as an aside, your mail client seems to be making each paragraph
one long unbroken line... which drives at least pine and gmail a bit
bonkers...and makes quoting messages a much more manual process than
it should be.)

Michael_Holstein · April 24, 2008, 1:30pm

> ISPs have been very clear that they regard their network maps as being
> proprietary for many good reasons. The approach that P4P takes is to have
> an intermediate server (which we call an iTracker) that processes the
> network maps and provides abstracted guidance (lists of IP prefixes and
> percentages) to the p2p networks that allows them to figure out which
> peers are near each other. The iTracker can be run by the ISP or by a
> trusted third party, as the ISP prefers.

Won't this approach (using an ISP-managed intermediate) ultimately end up
being co-opted by the lawyers for the various industry "interest groups"
and thus be ignored by the p2p users?

Cheers,

Michael Holstein
Cleveland State University

Mike_Gonnason · April 24, 2008, 1:38pm

This idea is what I am concerned about. Until the whole copyright mess
gets sorted out, wouldn't these iTracker supernodes be a goldmine of
logs for copyright lawyers? They would have a great deal of
information about what exactly is being transferred, by whom and for
how long.

-Mike Gonnason

Keith · April 24, 2008, 1:48pm

The iTracker just helps the nodes talk to each other more efficiently.
All the iTracker does is talk to a p2p tracker and provide network
topology; it has no caching, file information, or user information.

Keith O'Neill
Pando Networks


michael.dillon · April 24, 2008, 1:52pm

> Won't this approach (using an ISP-managed intermediate)
> ultimately end up being co-opted by the lawyers for the
> various industry "interest groups"
> and thus be ignored by the p2p users?

To bring this back to network operations, it doesn't much
matter what lawyers and end users do. The bottom line is that
if P2P traffic takes up too much bandwidth at the wrong points
of the network or the wrong times of day, then ISPs will do things
like blocking it, disrupting connections (Comcast), and traffic
shaping (artificial congestion). The end users will get slower
downloads as a result.

Or, everybody can put their heads together, make something that
works for ISPs operationally, and give the end users faster
downloads. The whole question is how to multicast content over
the Internet in the most cost-effective way.

--Michael Dillon

Alex_Harrowell · April 24, 2008, 2:24pm

> This idea is what I am concerned about. Until the whole copyright mess
> gets sorted out, wouldn't these iTracker supernodes be a goldmine of
> logs for copyright lawyers? They would have a great deal of
> information about what exactly is being transferred, by whom and for
> how long.

A good point about the approach of announcing a list of prefixes and
preference metrics, rather than doing lookups for each peer individually, is
that the supernode's logs will only tell you who used a p2p client at all;
nothing about what they did with it.
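To make the announce-versus-lookup distinction concrete: in the announce model the client fetches the whole prefix table once and classifies candidate peers locally, so no per-peer query ever reaches the supernode. This is a minimal sketch under stated assumptions; the table layout, the PID names, the documentation-range prefixes, and the `build_table`/`pid_for` helpers are all hypothetical and not part of P4P.

```python
import ipaddress

# Hypothetical announced guidance: PID name -> IP prefixes in that PID.
# The client downloads this whole table once; it never asks the server
# about individual peers, so the server's logs cannot list them.
ANNOUNCED_PREFIXES = {
    "pid-1": ["192.0.2.0/24", "198.51.100.0/25"],
    "pid-2": ["203.0.113.0/24"],
}


def build_table(announced):
    """Parse the announced prefix strings into network objects once."""
    return {
        pid: [ipaddress.ip_network(prefix) for prefix in prefixes]
        for pid, prefixes in announced.items()
    }


def pid_for(ip, table):
    """Return the PID whose prefix contains ip, or None if no prefix matches.

    Uses longest-prefix match, so a more specific announcement wins.
    An IPv6 address never matches an IPv4 prefix (membership is False)."""
    addr = ipaddress.ip_address(ip)
    best_pid, best_len = None, -1
    for pid, networks in table.items():
        for net in networks:
            if addr in net and net.prefixlen > best_len:
                best_pid, best_len = pid, net.prefixlen
    return best_pid


if __name__ == "__main__":
    table = build_table(ANNOUNCED_PREFIXES)
    for peer in ["192.0.2.17", "203.0.113.5", "8.8.8.8"]:
        print(peer, pid_for(peer, table))
```

The only thing the supernode would see in this model is one fetch of the table per client, which is the logging property described above.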

If you have to lookup each peer, the log would be enough to start building a
social graph of the p2p network, which would be a good start towards knowing
who to send the nastygram to. Reading the following description of the P4P
group's current approach, this looks like it's what they're doing:

> The approach that P4P takes is to have an intermediate server (which we
> call an iTracker) that processes the network maps and provides abstracted
> guidance (lists of IP prefixes and percentages) to the p2p networks that
> allows them to figure out which peers are near each other.

Michael_Holstein · April 24, 2008, 3:50pm

> Or, everybody can put their heads together, make something that
> works for ISPs operationally, and give the end users faster
> downloads. The whole question is how to multicast content over
> the Internet in the most cost-effective way.

This will work as long as the "optimization" strategy is content-agnostic.

p2p users want their content
netops want efficient utilization
lawyers want logfiles

You can have 2 out of 3.

Cheers,

Michael Holstein
Cleveland State University

Eric_Osterweil · April 24, 2008, 3:59pm

> The iTrackers just helps the nodes to talk to each other in a more
> efficient way, all the iTracker does is talk to another p2p tracker and
> is used for network topology, has no caching or file information or
> user information..

After reading the P4P paper, it seems like the iTrackers have some
large implications. Off the top of my head:
- The paper says, "An iTracker provides... network status/topology..."
  Doesn't it seem like you wouldn't want to send this to P2P clients? Is
  the "PID" supposed to preserve privacy here? I have some doubts about
  how well the PID helps after exposing ASN and LOC.
- As a P2P developer, wouldn't I be worried about giving the iTracker
  the ability to tell my clients that their upload/download capacity is
  0 (or just above)? It seems like iTrackers are allowed to control P2P
  clients completely w/ this recommendation, right? That would be very
  useful for an ISP, but a very dangerous DoS vector to clients.

These are just a couple of the thoughts that I had while reading.

Eric

Laird_Popkin · April 24, 2008, 4:24pm

Interesting discussion. Comments below:

>> The iTrackers just helps the nodes to talk to each other in a more
>> efficient way, all the iTracker does is talk to another p2p tracker and
>> is used for network topology, has no caching or file information or
>> user information..
>
> After reading the P4P paper, it seems like the iTrackers have some
> large implications. Off the top of my head:
> - The paper says, "An iTracker provides... network status/topology..."
>   doesn't it seem like you wouldn't want to send this to P2P clients?
>   Is the "PID" supposed to preserve privacy here? I have some doubts
>   about how well the PID helps after exposing ASN and LOC.

The PID is an identifier of a POP, which is really just a grouping
mechanism telling the P2P network that all of the nodes with IP
addresses that match a list of prefixes are in "the same place" in
network terms. The definition of "the same place" is up to the ISP:
it can be a metro area, a region, or even a local loop or cable head
end, depending on how much the ISP wants to localize traffic. The PID
is an arbitrary string sent by the ISP, so it could be a number, the
name of a city, etc., depending on how much the ISP wants to reveal.
PIDs are tied to an ASN, but of course any IP can be mapped to its ASN
easily, so that's not revealing new information.

The information that the iTracker sends to the p2p network is:
    - ASN (which is public)
    - PID (e.g. "1234" or "New York")
    - For each PID, a list of IP prefixes that identify users in the PID
    - A weight matrix of how much the ISP wants peers to connect
      between each pair of PIDs. For example, if the PIDs were cities, the
      weights might be something like "NYC to Philadelphia 30%, NYC to
      Chicago 25%, NYC to LA 2%", and so on. Or if the PIDs are
      'anonymized' then it could be something like "123 to 456 30%, 123 to
      876 25%, 123 to 1432 2%" and so on.
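One way a tracker might consume a weight matrix like this is to split a client's connection budget across PIDs in proportion to the client's row. The sketch below is an assumption about how that could work, not the P4P algorithm: the `allocate` helper, the floor-then-random-fill rule, and the NYC-to-NYC weight (invented so the row sums to 100) are all hypothetical.

```python
import random

# Hypothetical weight row in the shape described above (percentages).
# The NYC->NYC entry is invented so the row sums to 100.
WEIGHTS = {
    "NYC": {"NYC": 43, "Philadelphia": 30, "Chicago": 25, "LA": 2},
}


def allocate(src_pid, candidates, n_conns, weights, rng):
    """Choose up to n_conns peers for a client located in src_pid.

    candidates maps PID -> list of peer addresses currently in the swarm.
    Each PID gets the floor of its proportional share, with weights
    renormalized over PIDs that actually have candidates; slots left over
    (rounding, short pools, or no guidance at all) are filled at random."""
    row = {
        pid: w
        for pid, w in weights.get(src_pid, {}).items()
        if w > 0 and candidates.get(pid)
    }
    total = sum(row.values())
    chosen = []
    if total > 0:
        for pid, w in row.items():
            quota = int(n_conns * w // total)  # floor of proportional share
            pool = candidates[pid]
            chosen.extend(rng.sample(pool, min(quota, len(pool))))
    # Fill any remaining slots uniformly from peers not yet chosen.
    remaining = [p for pool in candidates.values() for p in pool if p not in chosen]
    chosen.extend(rng.sample(remaining, min(n_conns - len(chosen), len(remaining))))
    return chosen


if __name__ == "__main__":
    swarm = {
        "NYC": ["n1", "n2", "n3", "n4", "n5"],
        "Philadelphia": ["p1", "p2", "p3"],
        "Chicago": ["c1", "c2", "c3"],
        "LA": ["l1", "l2"],
    }
    print(allocate("NYC", swarm, 10, WEIGHTS, random.Random(0)))
```

Renormalizing over PIDs that actually have peers means a missing or all-zero row degrades to ordinary random selection rather than to no peers at all.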

> - As a P2P developer, wouldn't I be worried about giving the iTracker
>   the ability to tell my clients that their upload/download capacity is
>   0 (or just above)? It seems like iTrackers are allowed to control
>   P2P clients completely w/ this recommendation, right? That would be
>   very useful for an ISP, but a very dangerous DoS vector to clients.

It's important to keep in mind that P4P doesn't control the P2P
network, it's just an additional source of data provided to the P2P
Trackers (for example) in addition to whatever else the P2P network
already does, helping the p2p network make smarter peer assignments.
But P4P doesn't tell p2p clients what to do, or give the ISP any
control over the P2P network. Specifically, if the P4P data from one
ISP is bad, the P2P network can (and presumably will) choose to ignore
it.
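Ignoring bad P4P data presupposes that the tracker checks what it receives before acting on it, which is also where the DoS concern gets handled. A minimal sketch, assuming the matrix arrives as nested dicts of non-negative weights keyed by PID; the specific checks and the uniform-blending factor `alpha` are illustrative choices, not anything P4P specifies.

```python
import math


def usable_guidance(weights, known_pids):
    """Return True only if an ISP's weight matrix looks sane enough to use.

    Hypothetical checks: every PID referenced is one the ISP published
    prefixes for, every weight is a finite non-negative number, and every
    row keeps at least one positive weight, so the guidance can never
    steer a client toward zero peers."""
    if not weights:
        return False
    for src, row in weights.items():
        if src not in known_pids or not row:
            return False
        for dst, w in row.items():
            if dst not in known_pids:
                return False
            if isinstance(w, bool) or not isinstance(w, (int, float)):
                return False
            if not math.isfinite(w) or w < 0:
                return False
        if sum(row.values()) <= 0:
            return False
    return True


def blended_row(row, alpha=0.5):
    """Mix a validated row with a uniform spread over the same PIDs.

    alpha=1.0 follows the ISP exactly; alpha=0.0 ignores it. Values in
    between let the guidance bias peer choice without dictating it."""
    total = sum(row.values())
    share = 1.0 / len(row)
    return {pid: alpha * w / total + (1 - alpha) * share for pid, w in row.items()}
```

A tracker using this would fall back to its normal peer selection whenever `usable_guidance` returns False.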

> These are just a couple of the thoughts that I had while reading.

I appreciate your taking the time. This is a good discussion.

> Eric

>> Keith O'Neill
>> Pando Networks
>>
>> Mike Gonnason wrote:
>>
>>>>> ISPs have been very clear that they regard their network maps
>>>>> as being proprietary for many good reasons. The approach that
>>>>> P4P takes is to have an intermediate server (which we call an
>>>>> iTracker) that processes the network maps and provides
>>>>> abstracted guidance (lists of IP prefixes and percentages) to
>>>>> the p2p networks that allows them to figure out which peers are
>>>>> near each other. The iTracker can be run by the ISP or by a
>>>>> trusted third party, as the ISP prefers.
>>>>
>>>> Won't this approach (using an ISP-managed intermediate)
>>>> ultimately end up being co-opted by the lawyers for the various
>>>> industry "interest groups" and thus be ignored by the p2p users?
>>>>
>>>> Cheers,
>>>>
>>>> Michael Holstein
>>>> Cleveland State University
>>>
>>> This idea is what I am concerned about. Until the whole copyright
>>> mess gets sorted out, wouldn't these iTracker supernodes be a
>>> goldmine of logs for copyright lawyers? They would have a great deal
>>> of information about what exactly is being transferred, by whom and
>>> for how long.

The P2P network doesn't provide this kind of information to the
iTracker.

We're comparing two models, 'generic' and 'tuned per swarm'.

In the 'generic' model, the P2P network is given one weight matrix,
based purely on the ISP's network. In this model, the P2P network
doesn't provide any information to the iTracker at all - they just
request an updated weight matrix periodically so that when the ISP
changes network structure or policies it's updated in the P2P network
automatically.
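In the 'generic' model the only traffic is a periodic refresh of one matrix. A sketch of that refresh on the tracker side, assuming a hypothetical zero-argument `fetch` callable that stands in for the actual iTracker request (this thread does not specify the wire protocol) and an arbitrary one-hour default interval. Keeping the last good copy when a refresh fails is one way to avoid making the iTracker a hard single point of failure.

```python
import time


class GuidanceCache:
    """Hold one ISP's weight matrix and re-fetch it every refresh_s seconds.

    fetch is a hypothetical zero-argument callable standing in for the real
    request to the iTracker; it sends nothing about swarms or users."""

    def __init__(self, fetch, refresh_s=3600.0, clock=time.monotonic):
        self._fetch = fetch
        self._refresh_s = refresh_s
        self._clock = clock
        self._matrix = None  # None means "no guidance yet"
        self._fetched_at = None

    def matrix(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._refresh_s
        if stale:
            try:
                self._matrix = self._fetch()
            except Exception:
                # iTracker unreachable: keep the last good matrix (or None,
                # which callers treat as "use normal peer selection") and
                # wait a full interval before trying again.
                pass
            self._fetched_at = now
        return self._matrix
```

Injecting `clock` keeps the class testable without sleeping; in production the default `time.monotonic` is unaffected by wall-clock changes.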

In the 'tuned per swarm' model, the P2P network provides information
about peer distribution of each swarm's peers (e.g. there are seeds in
NYC and downloaders in Chicago). With this information, the iTracker
can provide a 'tuned' weight matrix for each swarm, which should in
theory be better. This is something that we're going to test in the
next field test, so we can put some numbers around it. This model
requires more communications, and exposes more of the p2p network's
information to the ISP, so it's important to be able to quantify the
benefit to decide whether it's worth it.
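For the 'tuned per swarm' model, one way to limit what the P2P network exposes is to send only aggregates. A minimal sketch, assuming the P2P tracker has already mapped each peer's address to a PID and reports per-PID counts of seeds and downloaders for a swarm; the report format and the `swarm_summary` helper are hypothetical, and how the swarm itself is identified to the iTracker is left open here.

```python
from collections import Counter


def swarm_summary(peers):
    """Summarize one swarm as per-PID seed/downloader counts.

    peers is an iterable of (pid, is_seed) pairs produced by the P2P
    tracker. In this sketch only the counts, not peer addresses, would
    be sent to the iTracker."""
    seeds, downloaders = Counter(), Counter()
    for pid, is_seed in peers:
        (seeds if is_seed else downloaders)[pid] += 1
    return {
        pid: {"seeds": seeds[pid], "downloaders": downloaders[pid]}
        for pid in sorted(set(seeds) | set(downloaders))
    }


if __name__ == "__main__":
    # Laird's example: seeds in NYC and downloaders in Chicago.
    print(swarm_summary([("NYC", True), ("Chicago", False), ("Chicago", False)]))
    # {'Chicago': {'seeds': 0, 'downloaders': 2}, 'NYC': {'seeds': 1, 'downloaders': 0}}
```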

BTW, if this discussion is getting off topic for the NANOG mailing
list, we can continue the discussion offline. Does anyone think that
we should do so?

>>> -Mike Gonnason

_______________________________________________
NANOG mailing list
NANOG@nanog.org
http://mailman.nanog.org/mailman/listinfo/nanog

Laird Popkin
CTO, Pando Networks
520 Broadway, 10th floor
New York, NY 10012

laird@pando.com
c) 646/465-0570