Data Mining/Crawling through a Mailing List


A bit off topic, but I am looking for a tool that could crawl through the
NANOG (or other) mailing list archives and pick out the most common
discussion topics. If anyone is aware of such a tool, please let me know.


That tool will have its work cut out for it... ;)

Dump it all into Hadoop and run a word cloud analysis :3.

Honestly, it sounds like a cool idea, and I'm sure someone has worked on it before, but I don't know of anything off the top of my head.
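
For a rough first pass that does not need Hadoop, something like the following might be enough. It is only a minimal sketch, and it assumes the archive has already been downloaded as a local mbox file, which may or may not match how a given list publishes its archives. It groups messages by normalized Subject line to find the busiest threads and counts the words in those subjects. The function names, the stopword list, and the prefix pattern are all made up for illustration.

```python
import mailbox
import re
from collections import Counter

# Assumed pattern: one leading reply/forward marker ("Re:", "Fwd:", "AW:")
# or one bracketed list tag such as "[NANOG]".
_PREFIX = re.compile(r"^\s*((re|fwd?|aw)\s*:|\[[^\]]*\])\s*", re.IGNORECASE)

# Deliberately tiny, illustrative stopword list.
STOPWORDS = frozenset("a an and are for in is it of on or the to with".split())


def normalize_subject(subject):
    """Reduce a Subject header to a lowercase thread key."""
    # Collapse whitespace (including folded header lines); None becomes "".
    subject = " ".join(str(subject or "").split())
    # Strip prefixes repeatedly, so "Re: [NANOG] Re: foo" becomes "foo".
    while True:
        stripped = _PREFIX.sub("", subject, count=1)
        if stripped == subject:
            break
        subject = stripped
    return subject.lower()


def top_threads(subjects, n=10):
    """Return the n most common normalized subjects with message counts."""
    counts = Counter(normalize_subject(s) for s in subjects)
    counts.pop("", None)  # ignore messages with no usable subject
    return counts.most_common(n)


def top_words(subjects, n=10):
    """Return the n most common non-stopword tokens across all subjects."""
    words = Counter()
    for s in subjects:
        for w in re.findall(r"[a-z0-9]+", normalize_subject(s)):
            if w not in STOPWORDS and len(w) > 1:
                words[w] += 1
    return words.most_common(n)


def subjects_from_mbox(path):
    """Yield the Subject header of every message in a local mbox file."""
    for message in mailbox.mbox(path):
        yield message["subject"]
```

With a downloaded archive saved under a hypothetical name such as nanog.mbox, top_threads(subjects_from_mbox("nanog.mbox")) would list the busiest threads. Counting subjects rather than message bodies keeps quoted replies and signatures from swamping the result, at the cost of missing topics that drift within a thread.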


Were you thinking about parsing NANOG and creating a word-based streamgraph
like this?
The author of that streamgraph did provide some additional information on
the steps he took to create it,
but it may be too long (including attachments) to post directly to NANOG.

Tony Patti
S. Walter Packaging Corp.