While this thread is slowly drifting, I disagree with your assertion that so
much of web traffic is cacheable (NLANR's caching effort, if I remember
correctly, only got around a 60% request hit rate in the cache, pooled over a
large number of clients, and that is probably close to the real percentage of
cacheable content on the net). If anything, the net is moving to be *more*
dynamic.
The problem is that web sites are putting unrealistic expires on images and
HTML files because they're being driven by ad revenue. I doubt that any of
the US-based commercial websites are interested in losing the entries in
their hit logs. Caching is the type of thing that is totally broken by
session IDs (at sites like amazon.com and cdnow).
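To make the session-id point concrete, here's a rough Python sketch of the
check a shared cache effectively ends up making; the names and heuristics are
mine, not anything a real cache implements verbatim. Once a session token
shows up in the URL, or the response carries an Expires that is already in
the past, the object is useless to every other client behind that cache:

    import time
    from urllib.parse import urlparse, parse_qs
    from email.utils import parsedate_to_datetime

    # Illustrative set of query keys that mark a per-user URL; not exhaustive.
    SESSION_KEYS = {"session-id", "sessionid", "sid"}

    def is_shareable(url, headers):
        """Rough heuristic: can a shared cache reuse this response for other clients?"""
        query_keys = {k.lower() for k in parse_qs(urlparse(url).query)}
        if query_keys & SESSION_KEYS:
            return False      # per-user URL: a hit for one client helps nobody else
        expires = headers.get("Expires")
        if expires:
            ttl = parsedate_to_datetime(expires).timestamp() - time.time()
            if ttl <= 0:
                return False  # "unrealistic expires": stale the moment it arrives
        return True

    # e.g. is_shareable("http://www.example.com/item?session-id=123-456", {}) -> False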
The only way caching is going to be truly viable in the next 5 years is
either for a commercial company to step in and work with commercial content
providers (which is happening now), or for webserver software vendors to
work with content companies on truly embracing a hit reporting protocol.
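By a hit reporting protocol I mean something as simple as the cache batching
up the hits it serves locally and pushing the counts back to the origin, so
the content company keeps its log entries without every request having to
miss the cache. This is purely hypothetical; the endpoint and format below
are made up for illustration:

    import json
    from collections import Counter
    from urllib.request import Request, urlopen

    hits = Counter()   # URL -> times the cache answered without touching the origin

    def record_hit(url):
        hits[url] += 1

    def report_hits(endpoint="http://origin.example.com/cache-hits"):
        """Push aggregated hit counts back to the origin (hypothetical endpoint)."""
        if not hits:
            return
        body = json.dumps(dict(hits)).encode()
        req = Request(endpoint, data=body,
                      headers={"Content-Type": "application/json"})
        urlopen(req)   # origin folds these into its own hit logs for the ad people
        hits.clear()

Batching keeps the reporting traffic small; the point is just that the origin
still gets its counts for objects served out of the cache.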
So basically, my assertion is that L4 caching on any protocol will not work
if the content provider is given any control over TTLs and metrics. The only
way web caching *really* works is when people get aggressive and ignore the
expire tags, taking a network administrator's point of view rather than a
content company's. From what I remember, that was the only way some
Australian ISPs were able to make very aggressive caching work for them.
Further, the more you rely on L4 implementations for caching, the more open
you seem to be to broken implementations... although that is a broad
statement.
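As a concrete example of getting aggressive with the expire tags: the cache
throws away the origin's Expires and picks its own freshness lifetime,
typically some fraction of the object's age since Last-Modified, clamped
between a floor and a ceiling the administrator chooses. A rough sketch of
that kind of policy (the numbers are made up):

    import time
    from email.utils import parsedate_to_datetime

    MIN_TTL = 3600        # floor: hold everything at least an hour, whatever the origin says
    MAX_TTL = 7 * 86400   # ceiling: nothing stays fresh longer than a week
    LM_FRACTION = 0.2     # treat 20% of the object's age since Last-Modified as fresh

    def admin_ttl(headers):
        """Freshness lifetime picked by the cache admin, ignoring the origin's Expires."""
        last_modified = headers.get("Last-Modified")
        if last_modified:
            age = time.time() - parsedate_to_datetime(last_modified).timestamp()
            ttl = age * LM_FRACTION
        else:
            ttl = MIN_TTL
        return max(MIN_TTL, min(ttl, MAX_TTL))

The TTL decision here sits entirely with the administrator running the box,
not with the content provider, which is exactly the arrangement the content
companies won't agree to.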