Randy Bush <email@example.com> writes:
>> this would be a straight sample, before filtering, ip address
>> blocking, etc.
> i realize this is difficult, as all of us go through much effort to
> reject this stuff as early as possible. but it will be a sample
> unbiased by your filtering techniques.
How do you classify email as spam without adding bias?
You can always claim bias.
There's often been debate, even in the anti-spam community, about what
"spam" actually means. The meaning has repeatedly been diluted over the
years, to a point where some now define it merely as "that which we do
not want," an attitude supported in code by some service providers who
now sport great big "Easy Buttons" (with apologies to any office supply
chain) labelled "This Is Spam."
Even so, there's some complexity - users making typos, for example.
However, the easiest way to avoid bias is to look for a mail stream that
has no valid recipients. There will be, of course, someone who will
disagree with me that mail sent to an address that hasn't been valid in
years, and whose parent domain was unresolvable in DNS for at least a
year, is spam. However, it's as unbiased as I can reasonably imagine being.
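
As a rough illustration of that selection rule, here is a minimal Python
sketch. The address, the domain, the dates, and the two-year reading of
"in years" are all made up for the example, and the names (RETIRED,
DOMAIN_DARK, in_sample) are hypothetical. It consults recorded history
instead of doing live DNS lookups.

```python
from datetime import date, timedelta

# Hypothetical records: the date each address stopped being valid, and the
# period during which its parent domain did not resolve in DNS.
RETIRED = {"old.user@example.net": date(2001, 6, 1)}
DOMAIN_DARK = {"example.net": (date(2002, 1, 1), date(2003, 3, 1))}

MIN_RETIRED = timedelta(days=2 * 365)  # assumed reading of "in years"
MIN_DARK = timedelta(days=365)         # "unresolvable for at least a year"


def in_sample(rcpt, today):
    """Return True if mail to rcpt belongs in the no-valid-recipient sample."""
    rcpt = rcpt.strip().lower()
    retired_on = RETIRED.get(rcpt)
    # The address must be known, and must have been invalid long enough.
    if retired_on is None or today - retired_on < MIN_RETIRED:
        return False
    domain = rcpt.rpartition("@")[2]
    dark = DOMAIN_DARK.get(domain)
    if dark is None:
        return False
    start, end = dark
    # The parent domain must have been unresolvable for at least a year.
    return end - start >= MIN_DARK
```

Nothing about the message content is inspected here; only the recipient
history decides membership, which is the point of the approach.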