RE: Spam with no purpose?

> Deepak Jain wrote:
> Can someone explain to me (publicly or privately) why someone
> would send spam with no product to sell, no position to pitch,
> nothing except text designed to get by a spam filter -- without
> even HTML to KNOW it got by a spam filter..

Likely two different goals here:

1. Reduce the efficiency of Bayesian-like filters: the trouble with this
kind of email is that it a) is of sufficient length, b) contains only
"real" words, and c) contains none of the words regularly used by
spammers, such as the v. word.

It's a lose-lose situation for the spam engine:

- If this message is marked as spam, it increases the likelihood of
false positives, as the message shares several features with real
email in spam-measuring metrics such as length, percentage of real
words, etc.

- If this message is marked as legit, it reduces the catching ability
of the spam engine, as it shares similar patterns with a spam that would
be essentially the same text altered to carry spam content.
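
To make the lose-lose point concrete, here is a rough sketch of a
per-token score of the kind a Bayesian-like filter might keep. It is a
simplified illustration, not the formula of any particular filter; the
class name TokenStats, the demo() helper and the sample messages are all
made up for this example.

```python
from collections import Counter


class TokenStats:
    """Toy per-token statistics (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of legit messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def prob(self, token):
        """Share of the token's evidence that points to spam (0.5 = unseen)."""
        s = self.spam[token] / self.n_spam if self.n_spam else 0.0
        h = self.ham[token] / self.n_ham if self.n_ham else 0.0
        if s + h == 0:
            return 0.5
        return s / (s + h)


def demo(label_salad_as_spam):
    """Train on ham, obvious spam, then word-salad mail labelled either way."""
    stats = TokenStats()
    for _ in range(10):
        stats.train("meeting agenda attached", is_spam=False)
        stats.train("cheap pills online", is_spam=True)
    for _ in range(5):
        # The word salad mixes an everyday word with filler words.
        stats.train("meeting garden window river", is_spam=label_salad_as_spam)
    return stats.prob("meeting"), stats.prob("garden")


# Salad marked as spam: the everyday word "meeting" starts to look spammy.
print(demo(label_salad_as_spam=True))
# Salad marked as legit: the filler word "garden" now looks perfectly clean,
# so a later spam padded with the same filler is pulled toward "legit".
print(demo(label_salad_as_spam=False))
```

With these made-up numbers, labelling the salad as spam moves "meeting"
from 0.0 to 0.25, while labelling it as legit leaves "garden" at 0.0.
Either label costs the filter something.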

You can bet that it won't be long until we see such messages that not
only use only dictionary words, but furthermore are constructed with a
valid grammar (and still mean nothing). One of the next fronts in spam
detection is based on grammatical correctness. What we are looking at is
the eternal battle between the shield and the weapon: as soon as someone
invents a new shield, someone else develops a new weapon that will
pierce it.

2. It might be a statistical probe:
Spammer xyz has a list of 1 million addresses, out of which 500,000 are
invalid and bounce. By using a return address that actually works, the
spammer can count the bounces and so indirectly measure the efficiency
of Bayesian outsmarting strategies.

First the spammer sends a spam that will be blocked by the majority of
spam-detection engines (even the dumber ones) by including correctly
spelled, well-known spam words in both subject and text. You know, the
stuff that promises to put a foot in your pants that features
"always-on" service.

Let's say that our spammer gets 150,000 bounces out of this one. The
math is simple: out of 500,000 potential bounces they got only 150,000,
which means that 350,000 were blocked by spam engines before the
non-existing-user bounce could be generated.

Then, our spammer sends the kind of email you referred to, to the same
list, and measures the bounce rate one more time. If this time he gets
450,000 bounces, it means that only 50,000 out of the 500,000 potential
bounces have been blocked by spam engines, which in turn means that the
same email, slightly altered, will reach a large part of its targets.

This is a simplified view, as bounce rate alone is not a valid
measurement of outsmarting strategies, but correlating two or three
metrics of that kind gives a reasonably precise picture of which
spamming techniques still work, and which have become a waste of
bandwidth.
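
For what it's worth, the bounce arithmetic above fits in a few lines.
This is only a sketch using the figures from this message; the function
names are made up, and it assumes (as the simplified view does) that
every missing bounce was eaten by a filter.

```python
def blocked_before_bounce(potential_bounces, observed_bounces):
    """Mail to invalid addresses that never bounced, assumed to be filtered."""
    return potential_bounces - observed_bounces


def pass_rate(potential_bounces, observed_bounces):
    """Fraction of the probe that got past the filters far enough to bounce."""
    return observed_bounces / potential_bounces


INVALID_ADDRESSES = 500_000   # potential bounces on the 1 million list

# Probe 1: obvious spam words in subject and text.
print(blocked_before_bounce(INVALID_ADDRESSES, 150_000))   # 350000
print(pass_rate(INVALID_ADDRESSES, 150_000))               # 0.3

# Probe 2: dictionary-word filler with nothing to sell.
print(blocked_before_bounce(INVALID_ADDRESSES, 450_000))   # 50000
print(pass_rate(INVALID_ADDRESSES, 450_000))               # 0.9
```

In this example the filler message passes about three times as often as
the obvious spam (0.9 against 0.3), which is the signal the spammer is
after.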

Michel.

When someone dies, it's a tragedy.
When millions die, it's a statistic.
    -- Josef Stalin --

> Deepak Jain wrote:
> Can someone explain to me (publicly or privately) why someone
> would send spam with no product to sell, no position to pitch,
> nothing except text designed to get by a spam filter -- without
> even HTML to KNOW it got by a spam filter..

I'm surprised you only got it now. I have been receiving emails like
that for probably at least a year.

> Likely two different goals here:
>
> 1. Reduce the efficiency of Bayesian-like filters: the trouble with this
> kind of email is that it a) is of sufficient length, b) contains only
> "real" words, and c) contains none of the words regularly used by
> spammers, such as the v. word.

Have to agree, this is the foremost reason.

It's interesting, however, that spammers are doing it not in their own
company's specific interest but in the interest of the spamming
industry in general.

> You can bet that it won't be long until we see such messages that not
> only use only dictionary words, but furthermore are constructed with a
> valid grammar (and still mean nothing).

I already saw it. Right now it's just random phrases being put together
and not yet an entire text. And somewhere (actually several years ago)
I read of an AI program capable of creating complete stories when it is
given some key phrases to start with; I would not be surprised if the
same or similar algorithms began to be used.

Personally I do not believe that Bayesian filtering (or text filtering
in general) is the way to fight spam; there is too much chance of
false positives along the way (and it is only increasing, as is evident
from what is discussed in this thread). It's better to focus on
authentication of the source and on trust mechanisms for legitimate
mail senders. Spammers have the problem that they are often operating
against the law or the policies of their providers, so they have to
try to hide their identity. The mechanisms they use for that can be
identified and the loopholes closed as much as possible.

Good Bayesian filters do not score on single words alone; they also
score on "phrases" (i.e. multiple words). Random strings of words will
result in neutral scores (presuming those words are also used in
non-spam), while the phrases will score slightly higher. Re-used
gibberish (i.e. apparently random) strings of words will result in
"phrases" from that gibberish having high scores.

Also, a good bayesian filter should prune its database regularly of
phrases (including one word phrases) that have not had their score
updated recently, further reducing "pollution" by random words and
phrases.
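
As a rough illustration of the two points above (phrase scoring and
regular pruning), here is a minimal sketch. It is not how any
particular filter does it: the PhraseDB class, the two-word phrase
choice, the day-number timestamps and the 30-day cutoff are all
assumptions made up for this example.

```python
from collections import Counter


def phrases(text):
    """Single words plus adjacent two-word phrases from a message."""
    words = text.lower().split()
    pairs = [" ".join(p) for p in zip(words, words[1:])]
    return set(words) | set(pairs)


class PhraseDB:
    """Toy phrase database (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0
        self.last_update = {}   # phrase -> day number of last training hit

    def train(self, text, is_spam, day):
        counts = self.spam if is_spam else self.ham
        for p in phrases(text):
            counts[p] += 1
            self.last_update[p] = day
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def score(self, phrase):
        """0.5 is neutral (unseen); closer to 1.0 means more spam-like."""
        s = self.spam[phrase] / self.n_spam if self.n_spam else 0.0
        h = self.ham[phrase] / self.n_ham if self.n_ham else 0.0
        return 0.5 if s + h == 0 else s / (s + h)

    def prune(self, today, max_age_days=30):
        """Drop phrases whose score has not been updated recently."""
        stale = [p for p, d in self.last_update.items()
                 if today - d > max_age_days]
        for p in stale:
            self.spam.pop(p, None)
            self.ham.pop(p, None)
            del self.last_update[p]
        return len(stale)


db = PhraseDB()
db.train("orchid tunnel velvet", is_spam=True, day=0)   # one-off random words
for day in range(36, 41):
    db.train("walk by the river", is_spam=False, day=day)
    db.train("purple river seventeen lamp", is_spam=True, day=day)  # re-used

print(db.score("river"))          # also seen in legit mail: near neutral
print(db.score("purple river"))   # re-used gibberish phrase: high
removed = db.prune(today=40)      # the one-off words and phrases are dropped
print(removed, db.score("orchid"))
```

The word "river" shows up in legit mail too, so it stays close to
neutral, while the re-used phrase "purple river" scores high. The
one-off random words age out at the next prune and go back to neutral.
The message totals are not adjusted when pruning, which a real filter
would probably want to handle.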

noise is just noise. the spam specific stuff will still be
statistically significant, hopefully.

regards,