Reducing Usenet Bandwidth

Like many Internet settlement schemes, this seems not to make much sense. If
a person reads USENET for many years enjoying all of its wisdom, why should
he get a free ride? And why should the people who supply that wisdom have to
pay to do so? A USENET transaction is presumed to benefit both parties, or
else they wouldn't have configured their computers to make that transaction.

Well, the idea wasn't exactly fully formed, and you've taken it in a
direction that doesn't match what I was thinking. I am definitely
*not* thinking at the granularity of "users." I've heard of users, and
their hunger for pornography, MP3s, and pirated copies of Word, but
this isn't about them. It's about sites that want to offer USENET to
these "users," and the ever-increasing cost to play in the global
USENET pool.

The topic being discussed is how to reduce USENET bandwidth. One way
to do that is to pass pointers around instead of complete articles. If
the USENET distribution system flooded pointers to articles instead of
the articles themselves, sites could "self-tune" their spools to the
content that their readers (the "users") found interesting, fetching
articles from a source that offered to actually spool them, either by
pre-fetching or fetching on demand, while still having access to the
"total accumulated wisdom" of USENET. And maybe that wisdom wouldn't
need to be reposted every week, because sites could also offer value
to their "users" on the publishing side by offering to publish their
content longer.
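
To make that concrete, here's a rough sketch of what a flooded pointer
and a self-tuning spool might look like. Everything in it - the record
fields, the host names, the fetch logic - is invented for
illustration; it's the shape of the idea, not a protocol:

    # Sketch of pointer flooding: the pointer record is flooded
    # everywhere, the article body is fetched only where wanted.
    from dataclasses import dataclass

    @dataclass
    class Pointer:
        message_id: str   # globally unique, as in NNTP today
        newsgroups: str   # e.g. "comp.lang.c"
        size: int         # bytes, so a site can budget before fetching
        sources: list     # hosts that have offered to spool the body

    # Stand-in for remote sites; really an NNTP ARTICLE request.
    remote_spools = {
        "spool-a.example": {"<abc@injector.example>": "article body ..."},
    }

    def fetch_from(host: str, message_id: str) -> str:
        return remote_spools[host][message_id]

    local_spool = {}   # message_id -> body; the "self-tuned" part

    def article_for(ptr: Pointer):
        # Serve locally if pre-fetching or an earlier reader pulled it;
        # otherwise chase the pointer back to a site spooling the body.
        if ptr.message_id in local_spool:
            return local_spool[ptr.message_id]
        for host in ptr.sources:
            try:
                body = fetch_from(host, ptr.message_id)
                local_spool[ptr.message_id] = body
                return body
            except KeyError:
                continue          # try the next offering site
        return None               # aged out everywhere

    ptr = Pointer("<abc@injector.example>", "alt.example", 16,
                  ["spool-a.example"])
    print(article_for(ptr))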

It would be helpful if that last decision - how much to publish -
didn't have global impact the way it does now. When someone injects a
copy of Word into the USENET distribution system today, everyone's
disk and bandwidth cost is incurred immediately. If a pointer were
flooded instead, the upfront cost would be lower, and article-transfer
cost could (arguably) more closely match a site's level of demand,
rather than other sites' willingness to supply.
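
Some purely illustrative back-of-envelope numbers (mine, not
measurements):

    # Flood a 50 MB binary to 10,000 full-feed sites, versus flooding
    # a 1 KB pointer everywhere and letting the (say) 200 sites whose
    # readers actually want it fetch the body. All figures invented.
    article = 50 * 2**20                      # bytes
    pointer = 1024
    sites, interested = 10_000, 200

    flood_cost   = article * sites            # ~524 GB moved, always
    pointer_cost = pointer * sites + article * interested   # ~10.5 GB
    print(round(flood_cost / pointer_cost))   # ~50x less traffic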

This would have the effect of letting sites with different levels of
willingness to expend resources still play in the global USENET game
with, in theory, greater access to information. It would, again in
theory, allow sites that don't necessarily have the benefit of a lot
of resources to spool articles to leverage access to sites that do
(thus the comment about the cost being incurred by the publisher, or,
more appropriately, those willing to publish). The primary benefit, I
think, is that sites that publish poorly - allow a lot of trash to be
posted - could do so without poisoning the village green for others. A
downstream spool could implement policy by choosing whether to
pre-fetch on a site-by-site basis, rather than having to tune its
spool on a group-by-group basis, and the information would all still
be there. The incentive to publish quality information is that
downstream sites are more willing to pre-fetch from you, lowering your
bandwidth costs.
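
A per-site pre-fetch policy might be no more than something like this
(the reputation numbers and threshold are invented for illustration):

    # Sketch: decide per injecting site, not per group, whether to
    # pre-fetch a pointed-to article or wait for reader demand.
    site_reputation = {
        "news.well-run.example": 0.9,   # little trash posted here
        "open-relay.example":    0.1,   # mostly spam and warez
    }

    def should_prefetch(injecting_site: str, size: int) -> bool:
        score = site_reputation.get(injecting_site, 0.5)  # unknown: neutral
        if size > 10 * 2**20:       # big binaries wait for demand
            return False
        return score >= 0.5         # pre-fetch from reputable publishers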

There are, of course, a thousand devils in the details, like how to
chase down an article when you don't necessarily have it and don't
necessarily know who might still be offering a copy. Some of the
problems in that vein that appeared insurmountable ten years ago
might have solutions in current "peer-to-peer" networking technologies
(thus the off-hand comment about Napster).
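
Purely to make the shape of that problem concrete, here is the lookup
such an overlay would have to answer; the in-memory table is a
stand-in, not any real system's API:

    # Sketch: Message-ID -> hosts still offering a copy. A DHT-style
    # index, the kind of thing post-Napster systems build, would sit
    # behind these two calls in real life.
    index = {}

    def register(message_id: str, host: str) -> None:
        # A spool announces that it still holds a copy.
        index.setdefault(message_id, []).append(host)

    def lookup(message_id: str) -> list:
        # Return hosts currently offering to serve this article.
        return index.get(message_id, [])

    register("<abc@injector.example>", "spool-a.example")
    print(lookup("<abc@injector.example>"))   # ['spool-a.example']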

Users, in theory, would not really see anything different from what
they see today. Underneath the covers, though, (a) some articles might
take longer to fetch than others, and (b) there'd be less trash
distributed globally. I don't envision reducing the hunger of "users"
for pornography, MP3s, or pirated copies of Word. Maybe we don't need
to incur so much cost transmitting them to and storing them at
thousands of sites around the net each week, though.

Stephen

Just to chime in, the idea of passing pointers around has come up in
several different forms over the years, and combined with
pre-fetching/caching it seems like a very good idea; it's a pity that
no one has really tried it, given that many of the tools already exist
(disclaimer: I've been dabbling with writing something using existing
protocols, on and off, for some time).

So far (inspired by the loss of a ClariNet newsfeed) I've written a
silly little personal NNTP server based on an SQL back-end that
collects RSS pointers to website "articles". The server stores the
"pointers" (URLs) and retrieves the article directly from the remote
webserver (or a local HTTP cache) on demand (currently it spits out
HTML, ugh). The header info (overview) is composed from the RSS data
(unfortunately quite sparse in most cases), and the "headline" fetcher
script builds a "score" for each article based on my predefined
criteria.
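
For flavor, the guts of the collector are small. This is only a sketch
with an invented schema - feedparser and sqlite3 stand in for whatever
one actually builds on - but it's roughly the moving parts described
above:

    # Store RSS items as overview rows plus a URL; fetch the body only
    # when a reader asks for it.
    import sqlite3, urllib.request
    import feedparser   # pip install feedparser

    db = sqlite3.connect("spool.db")
    db.execute("""CREATE TABLE IF NOT EXISTS articles (
                    msgid TEXT PRIMARY KEY, subject TEXT, author TEXT,
                    date TEXT, url TEXT, score INT)""")

    def score(entry) -> int:
        # Stand-in for the "predefined criteria" headline scorer.
        return 1 if "usenet" in entry.get("title", "").lower() else 0

    def collect(feed_url: str) -> None:
        for e in feedparser.parse(feed_url).entries:
            row = (e.get("id", e.get("link", "")), e.get("title", ""),
                   e.get("author", ""), e.get("published", ""),
                   e.get("link", ""), score(e))
            db.execute("INSERT OR IGNORE INTO articles VALUES (?,?,?,?,?,?)",
                       row)
        db.commit()

    def body(msgid: str) -> bytes:
        # On-demand retrieval straight from the remote webserver.
        (url,) = db.execute("SELECT url FROM articles WHERE msgid=?",
                            (msgid,)).fetchone()
        return urllib.request.urlopen(url).read()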

I've found at least one mailing-list archive that produces RSS files
for each mailing list, which can be similarly used to populate a
"newsgroup". So mailing lists provide a simple way of creating a
"publishing" method other than a website.

In all these cases, "replying" to a message is a bit involved, but not
impossible.

So, as far as I can tell, all the tools exist to implement a new set
of newsgroups based on existing protocols that avoids the massive bit
movement of Usenet while preserving the highly efficient news-reading
mechanisms of news readers.

Anyone have a sample, open-source, robust NNTP server implemented in
Java or perl, built on an SQL back-end with an article-population
mechanism?
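
(Not the robust server I'm asking for, but to show how little protocol
sits in the way, here's a toy fragment - in Python rather than Java or
perl, and covering almost none of the NNTP command set:)

    # Minimal NNTP-ish responder: enough to show where ARTICLE/XOVER
    # handlers would consult the SQL back-end sketched above.
    import socketserver

    class NNTPHandler(socketserver.StreamRequestHandler):
        def handle(self):
            self.wfile.write(b"200 toy pointer-server ready\r\n")
            for line in self.rfile:
                cmd = line.decode("ascii", "replace").strip().upper()
                if cmd == "QUIT":
                    self.wfile.write(b"205 bye\r\n")
                    return
                # ARTICLE/XOVER would hit the articles table here.
                self.wfile.write(b"500 command not recognized\r\n")

    if __name__ == "__main__":
        with socketserver.TCPServer(("localhost", 1119), NNTPHandler) as srv:
            srv.serve_forever()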

Adi

I'm a bit behind on reading the NANOG list, so excuse the late reply.

If we can really build such a beast, this would be extremely cool. The
method of choice for publishing free information on the Net is the WWW
these days. But it doesn't work very well, since there is no direct
relationship between a URL and the published text or file. So people
will use a "far away" URL because they don't know the same file can be
found much closer, and URLs tend to break after a while.
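
One way out - Freenet, mentioned below, takes it to its logical
conclusion - is to name content by a digest of its bytes, so the name
can't dangle or lie about what it points to. A minimal sketch of the
idea, not any particular system's scheme:

    # A location-independent name derived from the content itself: any
    # mirror serving bytes with this digest is "the same" document, so
    # the nearest copy is as good as the far-away one.
    import hashlib

    def content_name(data: bytes) -> str:
        return "sha256:" + hashlib.sha256(data).hexdigest()

    doc = b"the published text or file"
    print(content_name(doc))   # stable, whichever URL served it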

I've thought about this for quite a while, and even written a good
deal of my ideas down. If you're interested:

http://www.muada.com/projects/usenet.txt

Iljitsch van Beijnum

You might want to have a look at the Freenet project, if you haven't.
Some time ago there were ideas floating around about NNTP over
Freenet. It gets quite close to what you are describing (modulo
freedom ;)

  http://freenetproject.org

This is the art of content delivery and caching. And the nice thing is
that, depending on which technology you use, the person who wants the
material closer to the end user pays. If that's the end user, then use
a cache with WCCP. If that's the content owner, use a cache with
either an HTTP redirect or (Paul, forgive me) a DNS hack, either of
which can be tied to the routing system. In either case there is,
perhaps, a more explicit economic model than netnews. It's not to say
there *isn't* an economic model with netnews. It's just that it
doesn't make as much sense as it did (see smb's earlier comment).
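
For the HTTP-redirect flavor, a toy sketch (the client-prefix map is
invented; real deployments derive locality from the routing system):

    # Toy redirector: send each client to a cache "near" it.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CACHES = {"10.": "http://cache-west.example"}   # prefix -> cache
    DEFAULT = "http://origin.example"

    class Redirector(BaseHTTPRequestHandler):
        def do_GET(self):
            client = self.client_address[0]
            base = next((c for p, c in CACHES.items()
                         if client.startswith(p)), DEFAULT)
            self.send_response(302)
            self.send_header("Location", base + self.path)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), Redirector).serve_forever()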

Eliot

> This is the art of content delivery and caching.

Actually _delivery_ is only part of the problem: it assumes the content is
available, people know enough about it to be able to decide they want it,
and they know where it is and how to request it. Obviously, delivery is an
important aspect of the whole process to optimize, since it takes a lot of
bandwidth, depending on the type of content. But distributing the
meta-information is even harder, and potentially more expensive. The
failure of Usenet to effectively do it demonstrates this: because
selection is pretty much impossible, you have to deliver everything to a
place very near the potential user, even the stuff that is of no interest
to any user.

> And the nice thing is
> that, depending on which technology you use, the person who wants the
> material closer to the end user pays. If that's the end user, then use a
> cache with WCCP. If that's the content owner, use a cache with either an
> HTTP redirect or (Paul, forgive me) a DNS hack, either of which can be tied
> to the routing system. In either case there is, perhaps, a more explicit
> economic model than netnews. It's not to say there *isn't* an economic
> model with netnews. It's just that it doesn't make as much sense as it did
> (see smb's earlier comment).

In reality, people don't want to think about it. How much am I willing to
pay for the privilege of posting this message to NANOG? And you to read
it? If we both apply the hourly rate we bill our customers to the time we
spend on it (because we could have been doing work that actually pays
money instead), probably more than we realize. On the other hand, if I had
to cough up some money right here, right now to post this, I probably
wouldn't.

> This is the art of content delivery and caching.

> Actually _delivery_ is only part of the problem: it assumes the content is
> available, people know enough about it to be able to decide they want it,
> and they know where it is and how to request it. Obviously, delivery is an
> important aspect of the whole process to optimize, since it takes a lot of
> bandwidth, depending on the type of content. But distributing the
> meta-information is even harder, and potentially more expensive.

The only expensive part about dealing with the meta-information is
adapting existing technology to point at URLs. Passing the
meta-information around is several orders of magnitude less expensive
than passing the actual files themselves. Managing the meta-data
(i.e., CPU/memory) is negligible compared to the cost and latency of
retrieving that data. The retrieval of XOVER information and such
scales well. That's been the lesson of the last few years. And content
delivery networking mechanisms can be used for it.
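
For instance, pulling a group's overview data with Python's standard
nntplib costs kilobytes where the bodies would cost megabytes (the
host name below is a placeholder):

    # Headers-only pass over the last ~100 articles in a group.
    import nntplib   # removed in Python 3.13; fine on 3.12 and earlier

    s = nntplib.NNTP("news.example.net")
    resp, count, first, last, name = s.group("comp.lang.python")
    resp, overviews = s.over((max(first, last - 99), last))
    for artnum, over in overviews:
        print(artnum, over.get(":bytes"), over["subject"])
    s.quit()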

> The
> failure of Usenet to effectively do it demonstrates this: because
> selection is pretty much impossible, you have to deliver everything to a
> place very near the potential user, even the stuff that is of no interest
> to any user.

To call USENET a failure is a bit of a stretch. But its scaling point
is on the side of the reader. That is solved today through the
stronger search capabilities of various search engines that were
merely a gleam in several people's eyes, even as late as 1990. It
wasn't until WAIS came about that USENET became more searchable, and
then the economics began to shift. Not that people didn't try to do
selection. Brad Templeton had a semi-automated feedback mechanism with
ClariNet that he gave away to his customers.

> In reality, people don't want to think about it. How much am I willing to
> pay for the privilege of posting this message to NANOG? And you to read
> it? If we both apply the hourly rate we bill our customers to the time we
> spend on it (because we could have been doing work that actually pays
> money instead), probably more than we realize. On the other hand, if I had
> to cough up some money right here, right now to post this, I probably
> wouldn't.

Nobody wants to think about it until real money is involved. And while
text discussions don't involve a whole lot of real money, this
conversation was started by people who are tired of transporting warez
and having to pay for it.

Eliot