Digex transparent proxying

I went searching through my email (since Digex support claimed customers
had been notified of Digex's intent to start transparent proxying) and
found that I did get "a message" that I'd missed while out of town for
Linux Expo. Here's the message:

It might also be time for content providers of time-sensitive data on the web
to redirect requests coming from Digex' proxy harvest machines to a web page
that says something along the lines of "Digex has intercepted your web request
and directed it through their web caching system. This impacts the
time-sensitive data at this site. Hence you cannot access this site in this
manner. Please contact Digex at [insert contact info here] and politely ask
Digex to stop intercepting your web requests. When Digex removes this
interception mechanism and permits you to connect directly to this site again
without any interference or interception, you will again have full access.
Please understand we cannot offer full access without a direct, unintercepted
connection. Otherwise we cannot maintain the quality and timeliness of the
data this web site provides due to interference by a third party." And so
forth. Just an idea really.

Or does this hijacking mechanism NOT use harvesting techniques that can be
detected by the source IP address?

Blah. Tis late, I must be babbling incoherently.

Aaron out.
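For anyone who wants to try it, the redirect idea above is only a few lines of
work. A rough sketch in Python (the 192.0.2.0/24 block is a placeholder, not
Digex's real address space -- you'd plug in whatever addresses the interception
actually comes from, assuming it is visible in the source IP at all):

    # Sketch only: turn away requests arriving from a caching proxy's address
    # range and explain why.  The 192.0.2.0/24 block is a placeholder, NOT
    # Digex's real address space.
    from ipaddress import ip_address, ip_network
    from wsgiref.simple_server import make_server

    PROXY_NETS = [ip_network("192.0.2.0/24")]     # hypothetical harvest machines

    NOTICE = (b"Your request was intercepted by your provider's web cache.\n"
              b"This site serves time-sensitive data and cannot guarantee its\n"
              b"freshness through a third-party proxy.  Please ask your provider\n"
              b"to stop intercepting port 80 traffic.\n")

    def app(environ, start_response):
        client = ip_address(environ.get("REMOTE_ADDR", "0.0.0.0"))
        if any(client in net for net in PROXY_NETS):
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Cache-Control", "no-cache")])
            return [NOTICE]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"normal content\n"]

    if __name__ == "__main__":
        make_server("", 8080, app).serve_forever()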

> It might also be time for content providers of time-sensitive data on the web
> to redirect requests coming from Digex' proxy harvest machines to a web page
> that says something along the lines of "Digex has intercepted your web request
> and directed it through their web caching system. This impacts the
> time-sensitive data at this site. Hence you cannot access this site in this
> manner. Please contact Digex at [insert contact info here] and politely ask
> Digex to stop intercepting your web requests. When Digex removes this
> interception mechanism and permits you to connect directly to this site again
> without any interference or interception, you will again have full access.
> Please understand we cannot offer full access without a direct, unintercepted
> connection. Otherwise we cannot maintain the quality and timeliness of the
> data this web site provides due to interference by a third party." And so
> forth. Just an idea really.

Hey, I like it, but I don't know how many content providers would be
willing to do something like this. In fact some content providers with
low-bandwidth connections may actually be happy with Digex. I am glad I am not
a Digex customer, because I would now be looking for a new provider.

> Or does this hijacking mechanism NOT use harvesting techniques that can be
> detected by the source IP address?

> Blah. Tis late, I must be babbling incoherently.

Nah, I think it is a good idea; I just don't think that many people will
put in the effort to make it happen.


Nathan Stratton Telecom & ISP Consulting
www.robotics.net nathan@robotics.net

In article <35935600.FF54EBFE@infowest.com>,

Except when listening to the "proper" headers has been turned off by the
cache (which some of these caching products support or will support).

bye,
ken emery

Any content provider savvy enough to put that up could just as easily
put in the proper headers to tag their content as cacheable or
uncacheable or cacheable until time X, and that would be better for
*everyone* involved.

yes, but they don't, and they won't, not ever. expecting other people to do
the right thing is just not a hallmark of good operational practice, nor is
it a NANOG tradition.

> dialup users, corporate customers and downstream ISPs who
> utilize those links. Cached content is delivered accurately

"Accurately" is a bone of contention. We've all seen what caching can do to
time-sensitive web sites.

"go ahead, make my day." as the principal architect of a caching product
which served up yesterday's newspaper when today's newspaper was wanted, for
pretty much all of 1997, to about two dozen hardy "early adopters" (you can
tell who they are by the flaming arrows sticking out of their backs), i'm now
perfectly willing to challenge you to find any content, no matter how broken
or mangled or missing its headers are, that a Web Gateway Interceptor can't
serve correctly.

and that's without any central registry of "bad content" such as that
described in the post here from inktomi earlier in the day.

i'm not saying this out of a desire to use NANOG for product advertising; my
last caching product pretty much served its purpose by showing that this kind
of caching doesn't win back enough bandwidth if correctness is the goal, and
the product has no distribution channel at all. i can't make money by arguing
about transparent caching on NANOG -- trust me.

so really i'm just flaming. correctness *can* be maintained. content
creators will *not* do the right thing. anybody who says otherwise is
welcome to tell me a URL they think my WGI can't serve and also show me
mail from a content creator saying "thanks for telling me that i was
doing the wrong thing, i'll fix my Expires: headers immediately!"

In those instances, though, the implementer of the cache is the fool - not
the manufacturer - like anything, the operator/implementor of a caching
product should have some understanding of WTF they are doing - sadly you
are right, Ken - there undoubtedly are people out there who will say "Hey,
this will let me cache it longer - kewl beans!" - and will suffer the
consequences of their decision - but that doesn't really mean that the
product should not afford the operator the option of fine-tuning their
performance.

Or detect a proxy and refuse service (which, if I was doing this, is exactly
what I would do).

Not to mention those who derive advertising revenue from "views" and have no
way to measure them in a cached environment. If I was doing that, I would
also deny service to proxy servers and display a nice message telling the
user to remove the proxy or bitch about its forced use.
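For completeness, the obvious way to "detect a proxy" from the content side is
to look for the headers a well-behaved proxy adds. A sketch (function name is
mine), with the caveat that a transparent interceptor which adds neither Via
nor X-Forwarded-For will sail straight through -- which is exactly the
objection raised later in the thread:

    # Sketch only: flag requests that appear to have come through a proxy,
    # judging by the headers a well-behaved proxy adds.  A transparent cache
    # that adds neither Via nor X-Forwarded-For will not be caught this way.
    def looks_proxied(headers):
        """headers: a dict of HTTP request headers, any capitalisation."""
        names = {k.lower() for k in headers}
        return "via" in names or "x-forwarded-for" in names

    if __name__ == "__main__":
        print(looks_proxied({"Host": "example.com"}))             # False
        print(looks_proxied({"Via": "1.0 cache.example.net"}))    # True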

Karl, that is silly - in a lot less time and effort than you would put
into setting up your redirect, you could have just used a correct Expires
header and had your content (or at least just the stuff that you want to
remain dynamic) avoid the cache. It serves you as the designer as much as
it serves the end user (speedier loads). If your page comes up quickly,
chances are the person will venture around a bit longer and click more
links in the 20 minutes that they have before their date picks them up
etc... It's common sense.
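A minimal sketch of what "a correct Expires header" amounts to in practice --
stamp the static stuff with Last-Modified/Expires so a cache can hold it and
revalidate with If-Modified-Since, and mark only the genuinely dynamic pieces
uncacheable (helper names are mine):

    # Sketch only: cacheable responses get Last-Modified/Expires so a cache
    # can hold them and revalidate with If-Modified-Since; genuinely dynamic
    # pieces are marked so they always come back to the origin.
    import time
    from email.utils import formatdate

    ONE_DAY = 24 * 60 * 60

    def cacheable_headers(mtime, ttl=ONE_DAY):
        """Headers for content a cache may keep for `ttl` seconds."""
        return [("Last-Modified", formatdate(mtime, usegmt=True)),
                ("Expires", formatdate(time.time() + ttl, usegmt=True))]

    def dynamic_headers():
        """Headers for content that must never be served from a cache."""
        return [("Cache-Control", "no-cache"),
                ("Expires", formatdate(0, usegmt=True))]    # already expired

    if __name__ == "__main__":
        print(cacheable_headers(time.time()))
        print(dynamic_headers())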

As soon as that happens widely the cache is useless (or worse) and therefore
people who use them have a reason to ignore the Expires headers.

Sure if "everything" on the site is tagged as dynamic or is pre-expired
then what is the point of caching at all - but "everything" doesn't need
to be for caching and hit-stats to coexist - just count your hits on one
item, just leave those items that ABSOLUTELY must be dynamic be dynamic -
and items that should be updated daily - should have the correct expires
date set - so that IMS can work to their advantage.
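One way to do the "count your hits on one item" trick: keep the page itself
cacheable and hang the counting off a single tiny object that is marked
uncacheable, so every real page view still reaches the origin for that one
hit. A sketch (URL and layout invented):

    # Sketch only: the page and its images stay cacheable; the "view" count
    # hangs off one tiny uncacheable object.
    from wsgiref.simple_server import make_server

    # a commonly used minimal 1x1 transparent GIF
    PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!"
             b"\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
             b"\x00\x02\x02D\x01\x00;")

    hits = 0

    def app(environ, start_response):
        global hits
        if environ["PATH_INFO"] == "/counter.gif":
            hits += 1
            start_response("200 OK", [("Content-Type", "image/gif"),
                                      ("Cache-Control", "no-cache")])
            return [PIXEL]
        page = (b"<html><body>cacheable page"
                b'<img src="/counter.gif" width="1" height="1">'
                b"</body></html>")
        start_response("200 OK", [("Content-Type", "text/html")])
        return [page]

    if __name__ == "__main__":
        make_server("", 8080, app).serve_forever()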

That doesn't work, since the meta-tag is on the PAGE, not the element (which
technically doesn't exist and is a figment of the server's imagination)

Therefore, any page which has a non-cacheable element (ie: an ad or
time-sensitive data) must be marked non-cachable.

Congratulations - you just specified that any shtml or asp page must be
marked non-cachable, along with any time-sensitive or advertiser-sponsored
page.

If that actually happens, then the proxy server operators will start
shutting off recognition of the headers, and now we're right back where
we started, along with the performance problems that this causes (forcing
the traffic through a proxy server for EACH access actually HURTS
performance, not helps it).

> That doesn't work, since the meta-tag is on the PAGE, not the element (which
> technically doesn't exist and is a figment of the server's imagination)

That is why it must first be parsed by the web server.
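What "parsed by the web server" would mean in practice is roughly this: lift
the http-equiv meta tags out of the HTML and promote them to real HTTP
headers, since a cache generally never looks inside the document, only at the
headers. A sketch:

    # Sketch only: promote <meta http-equiv="..."> tags to real HTTP headers.
    from html.parser import HTMLParser

    class MetaHeaders(HTMLParser):
        def __init__(self):
            super().__init__()
            self.headers = []

        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                a = dict(attrs)
                if "http-equiv" in a and "content" in a:
                    self.headers.append((a["http-equiv"], a["content"]))

    def headers_from_html(html):
        parser = MetaHeaders()
        parser.feed(html)
        return parser.headers

    if __name__ == "__main__":
        doc = ('<html><head><meta http-equiv="Expires" '
               'content="Thu, 02 Jul 1998 00:00:00 GMT"></head><body></body></html>')
        print(headers_from_html(doc))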

> Therefore, any page which has a non-cacheable element (ie: an ad or
> time-sensitive data) must be marked non-cachable.

No sireee, Karl - not the entire page, but the specific element that is
time-sensitive.

> Congratulations - you just specified that any shtml or asp page must be
> marked non-cachable, along with any time-sensitive or advertiser-sponsored
> page.

Yikes - Karl, there are many different items on the page - they do not all
have to have the same attributes - a cgi, an asp, a banner ad, or maybe
some other item on the page can be marked dynamic while the rest of the
elements can be cached.

> If that actually happens, then the proxy server operators will start
> shutting off recognition of the headers, and now we're right back where
> we started, along with the performance problems that this causes (forcing
> the traffic through a proxy server for EACH access actually HURTS
> performance, not helps it).

Yeah assuming that everyone who owns/operates a proxy/cache is a total and
utter moron.

> Sure if "everything" on the site is tagged as dynamic or is pre-expired
> then what is the point of caching at all - but "everything" doesn't need
> to be for caching and hit-stats to coexist - just count your hits on one
> item, just leave those items that ABSOLUTELY must be dynamic be dynamic -
> and items that should be updated daily - should have the correct expires
> date set - so that IMS can work to their advantage.

more aggressive caching techniques would necessitate that the tags be
ignored anyway, and the "dynamic" content would still be cached. expires
etc are only useful if the caching box decides to honour them.

some of the content providers in the US would turn over in their graves if
they knew what people who pay heaps more $$$ for traffic are doing to
their web sites :). we have done some funky stuff with content (read:
caching of dynamic pages, even .asps, "tokenized" HTML, dropping realaudio
UDP etc etc etc).

> more aggressive caching techniques would necessitate that the tags be
> ignored anyway, and the "dynamic" content would still be cached. expires
> etc are only useful if the caching box decides to honour them.

The tags are usually ignored by default - the html tags, I mean - unless it
is parsed into HTTP headers, most caches will ignore it and you do have a
problem. The majority of caches I've seen have a built-in exclusion for the
usual "always dynamic" items - cgi, "?" in path, asp... .
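The sort of built-in exclusion meant here, roughly (the particular patterns
are just the usual suspects, not any specific product's list):

    # Sketch only: the kind of built-in "always dynamic" exclusion most caches
    # ship with -- skip caching when the URL smells like generated content.
    from urllib.parse import urlsplit

    DYNAMIC_HINTS = ("/cgi-bin/", ".cgi", ".asp", ".shtml")

    def worth_caching(url):
        parts = urlsplit(url)
        if parts.query:                          # a "?" in the URL
            return False
        path = parts.path.lower()
        return not any(hint in path for hint in DYNAMIC_HINTS)

    if __name__ == "__main__":
        print(worth_caching("http://example.com/images/logo.gif"))       # True
        print(worth_caching("http://example.com/cgi-bin/counter.cgi"))   # False
        print(worth_caching("http://example.com/page.asp?id=3"))         # False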

> some of the content providers in the US would turn over in their graves if
> they knew what people who pay heaps more $$$ for traffic are doing to
> their web sites :). we have done some funky stuff with content (read:
> caching of dynamic pages, even .asps, "tokenized" HTML, dropping realaudio
> UDP etc etc etc).

Yup - you folk in Australia view caching as the norm - not the enemy. Go
figure...

This seems like more of a marketing issue than an operational issue. I see
transparent proxying as another product offering in which you would have two
options.

Option 1 -- Utilize transparent proxying by applying policy-based routing.
This causes the backbone providers to use less bandwidth, so the customer's
cost is reduced (i.e., the highest cost is the backbone, so apply a discount
for less bandwidth use). If contracts prior to the placement of the proxy are
in place, the customer would have to provide explicit approval (and realize a
reduced cost) before this policy would be applied.

Option 2 -- Apply no policy to the end user's connection. This is typically
what "in place" contracts had in mind. If I agree to pay you $xxxx.xx/month
for a connection before the cache was put in place, I should still receive
that connection unaltered.

BTW, I am in favor of transparent proxying...but not if I don't control it.
:wink:

D.

Derek Elder
US Web / Gray Peak Technologies
Network Engineering
212-548-7468
Pager - 888-232-5028
delder@graypeak.com
http://www.usweb.com
A Strategic Partner for the
Information Age.

> Or detect a proxy and refuse service (which, if I was doing this, is exactly
> what I would do).

"Go ahead, make my day." If you can detect the proxy box I used to sell via
MII, and refuse service to it, I will post a retraction right here.

> Not to mention those who derive advertising revenue from "views" and have no
> way to measure them in a cached environment.

"No way"? How about:

rfc2227.txt -- Simple Hit-Metering and Usage-Limiting for HTTP.
  J. Mogul, P. Leach. October 1997. (Format: TXT=85127 bytes)
  (Status: PROPOSED STANDARD)
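Very loosely, the idea in RFC 2227 is that the cache counts the hits it serves
and reports the count back to the origin when it revalidates. The sketch below
only illustrates that bookkeeping; the Connection/Meter negotiation and the
exact directive set the RFC requires are omitted, and the header shown is a
simplified stand-in, not the RFC's literal syntax:

    # Sketch only: the bookkeeping behind hit-metering -- the cache counts the
    # hits it served and reports the count to the origin when it revalidates.
    # The negotiation and exact directives of RFC 2227 are omitted; the header
    # below is a simplified stand-in.
    from collections import defaultdict

    class MeteredCache:
        def __init__(self):
            self.uses = defaultdict(int)        # hits served from cache, per URL

        def serve_from_cache(self, url):
            self.uses[url] += 1

        def revalidation_headers(self, url):
            """Headers to attach to the conditional GET sent to the origin."""
            count = self.uses.pop(url, 0)
            return [("Meter", "count=%d/0" % count)]

    if __name__ == "__main__":
        cache = MeteredCache()
        for _ in range(5):
            cache.serve_from_cache("http://example.com/story.html")
        print(cache.revalidation_headers("http://example.com/story.html"))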

> If I was doing that, I would also deny service to proxy servers and
> display a nice message telling the user to remove the proxy or bitch about
> its forced use.

Playboy.COM did something like that to @Home last year for a similar reason.