Anti-Spam Filtering
IBM Research is developing an enterprise-class anti-spam filter
as part of our overall strategy of attacking the spam problem
on multiple fronts. Our
anti-spam filter, SpamGuru, mirrors this philosophy by
incorporating several different filtering technologies and
intelligently combining their output to produce a single
spamminess rating or score for each incoming message. The use of multiple
algorithms improves the system's effectiveness and makes it
more difficult for spammers to attack. While a spammer may
defeat any single algorithm, SpamGuru can rely on its
remaining algorithms to maintain a high degree of
effectiveness.
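As a rough illustration of how several filter outputs might be merged into one spamminess score, here is a minimal sketch. It assumes each filter reports a score between 0 and 1 and that the scores are merged with a simple weighted average. The function name, the filter names, and the weights are hypothetical, and this is not SpamGuru's actual combination method.

```python
from typing import Dict, Optional


def combine_scores(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> float:
    """Merge per-filter spam scores (each in [0, 1]) into one score.

    A score of None means the filter produced no opinion for this
    message and is skipped. Filters without an explicit weight get 1.0.
    This weighted average is an illustrative assumption only.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, score in scores.items():
        if score is None:
            continue
        weight = weights.get(name, 1.0)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0.0:
        # No filter had an opinion: return a neutral score.
        return 0.5
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical case: a spammer has fooled one filter (score 0.0),
    # but the other two still rate the message as likely spam.
    scores = {"bayes": 0.0, "patterns": 0.9, "near_duplicate": 0.9}
    print(combine_scores(scores, weights={}))
```

In this hypothetical case the combined score stays above the neutral value of 0.5 even though one filter was defeated, which is the robustness argument made above.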
We are using SpamGuru as a testbed for exploring a number
of existing and new technologies for identifying incoming
spam. The main technologies currently under investigation
include:
- JClassifier is a
Bayesian-style text classifier loosely based on Paul Graham's
original design.
- Chung-Kwei applies advanced
pattern matching algorithms developed in IBM's bioinformatics
group to spam detection. This new classification algorithm
can detect complex patterns in messages that go beyond the
simple words or phrases used in most algorithms.
- Plagiarism Detection.
SpamGuru employs research in plagiarism detection to
accurately detect textual variations of known spam to ensure
that simple variations of previously identified spam are also
blocked. SpamGuru's plagiarism detection algorithms have a
very low false-positive rate due to their reliance on a near
match to known spam.
- Spoof Detection. SpamGuru's spoof detection algorithm analyzes DNS and domain records to determine whether the message is likely to have been spoofed or sent from a less reliable SMTP server. SpamGuru’s DNS analysis provides most of the advantages of the MARID MTA authentication record without the need for explicit publication of outgoing mail servers.
- Intelligent Rendering.
SpamGuru's intelligent rendering algorithm analyzes a
message's MIME encoding to extract what the user is likely to
see when reading a message rather than what a spammer wants
the filter to see. In the process, attempts to obfuscate a
message's true content are noted and passed as features to
SpamGuru's classification algorithms.
- Classifier Aggregation. We are investigating a variety of dynamic adaptive techniques for combining the evidence provided by multiple classifiers. The combination yields a single classifier that is both more accurate and more robust than any of its constituents.
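To make the idea of matching textual variations of known spam more concrete, the sketch below uses a generic near-duplicate technique: word shingles compared with Jaccard similarity. This is a stand-in chosen for illustration, not SpamGuru's plagiarism detection algorithm. The function names, the shingle size, and the threshold are all assumptions.

```python
import re
from typing import Iterable, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of k-word shingles in text.

    Text is lowercased and punctuation is ignored. A text shorter
    than k words yields a single shingle holding all of its words.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(
    message: str,
    known_spam: Iterable[str],
    threshold: float = 0.5,  # assumed cutoff, would need tuning
    k: int = 3,
) -> bool:
    """True if message is close to any previously identified spam."""
    msg = shingles(message, k)
    return any(jaccard(msg, shingles(s, k)) >= threshold for s in known_spam)


if __name__ == "__main__":
    known = ["Buy cheap watches now at our online store today"]
    variant = "Buy cheap watches now at our online shop today!!!"
    print(is_near_duplicate(variant, known))
    print(is_near_duplicate("Meeting moved to three pm tomorrow", known))
```

Because a message is flagged only when it closely matches spam that has already been identified, unrelated legitimate mail rarely crosses the threshold, which is consistent with the very low false-positive rate described in the Plagiarism Detection item.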
SpamGuru technology forms the basis of the Intelligent
Mail Filter that is shipping as a technology preview in
Lotus Workplace 2.0, IBM's next-generation messaging and
collaboration framework. Please see the Lotus Workplace web
site for availability and purchasing information.