Exploration of Neuro-Fuzzy Spam Filtering Based on Naive Bayesian Filters
Madhujit Ghosh and Jonathan J. Guernsey
Dr. Devinder Kaur
EECS 6360 - Knowledge Based Systems
Abstract: A text parser was used to calculate the statistical distribution of words within an email body. A neuro-fuzzy system then used this information to classify the email as spam or not spam. In experiments, this spam-detection process was found to be 90% efficient. Given its simplicity and the limited scope of its detection methods, the design compares very favorably with present-day filters. The system could be further improved by incorporating other identifiers of email spam.
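The two stages named in the abstract (a parser that computes word frequencies, and a fuzzy system that consumes them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the hypothetical `SPAM_WORDS` list, the single "share of spam words" feature, and the three triangular membership sets are all assumptions of this sketch, since this section does not specify the system's actual features or membership functions.

```python
import re
from collections import Counter

def word_distribution(body: str) -> dict[str, float]:
    """Relative frequency of each word in an email body."""
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", body.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at a, rising to 1 at the peak b, back to 0 at c.
    Setting a == b or b == c gives a shoulder with membership 1 at that edge."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical list of spam-indicative words; not taken from the paper.
SPAM_WORDS = {"free", "win", "cash", "cheap", "buy"}

def fuzzify_spam_share(body: str, spam_words: set[str] = SPAM_WORDS) -> dict[str, float]:
    """Fuzzify the share of spam-indicative words into low/medium/high memberships."""
    dist = word_distribution(body)
    # Fraction of all words in the body that appear in the spam-word list.
    share = sum(freq for word, freq in dist.items() if word in spam_words)
    return {
        "low": triangular(share, 0.0, 0.0, 0.5),
        "medium": triangular(share, 0.0, 0.5, 1.0),
        "high": triangular(share, 0.5, 1.0, 1.0),
    }
```

For instance, in `fuzzify_spam_share("Win cash now now")` half of the words come from the hypothetical spam list, so the result has full membership in "medium" and zero in "low" and "high". A trained neuro-fuzzy system would typically learn the membership parameters from data rather than fix them by hand as done here.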
Introduction
Spam is unsolicited email on the Internet. A major problem facing Internet users today is the deluge of unwanted and often rude email waiting in their mailboxes every morning. Spammers flood the Internet with many copies of the same message in an attempt to force it on people who would not otherwise choose to receive it. Most spam is commercial advertising, often for dubious products, get-rich-quick schemes, or quasi-legal services. Spam costs the sender very little to send; most of the cost is borne by the recipients or the carriers, mainly in the form of lost productivity and consumed network resources.[5] Current estimates suggest that up to 30% of today's email volume is spam, a figure likely to increase over time. In addition to the unnecessary strain it places on corporate networks, spam frequently contains viruses.
According to Paul Graham in his now well-known Internet article “A Plan for Spam”:
“… they (spammers) have to deliver their message, whatever it is. If we can write software that recognizes their messages, there is no way they can get around that.”
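The software Graham proposes in that article scores a message by combining the spam probabilities p1, …, pn of its individual words as P = (p1·p2·…·pn) / (p1·p2·…·pn + (1 − p1)(1 − p2)…(1 − pn)), which treats the words as independent evidence. A minimal Python sketch of that combining step follows, assuming the per-word probabilities have already been estimated from a training corpus; the function name and the clamping bounds `lo` and `hi` are assumptions of this sketch.

```python
from math import prod

def combined_spam_probability(word_probs: list[float],
                              lo: float = 0.01, hi: float = 0.99) -> float:
    """Naive Bayes combination of per-word spam probabilities."""
    # Clamp each probability into [lo, hi] so neither product becomes zero
    # (an assumption of this sketch that also avoids division by zero).
    ps = [min(hi, max(lo, p)) for p in word_probs]
    spam = prod(ps)                  # product of p_i
    ham = prod(1.0 - p for p in ps)  # product of (1 - p_i)
    return spam / (spam + ham)
```

Calling `combined_spam_probability([0.9, 0.9])` gives about 0.988: two strongly spam-indicative words push the combined score well above what either word gives alone.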
Paul Graham introduced the idea of using Naive Bayesian Filters (as pure discrete sets) in his article.[3] This idea was further developed by I. Androutsopoulos et al. in their art