Abstract
This paper presents the detection of spam and ham messages using several supervised machine learning algorithms, namely the naïve Bayes, support vector machine, and maximum entropy algorithms, and compares their performance in filtering ham and spam messages. As people engage more in web-based activities and companies increasingly share private data, SMS spam has become very common. SMS spam filtering inherits much of its functionality from email spam filtering. Comparing the performance of these supervised learning algorithms, we find that the support vector machine algorithm gives the most accurate results.
Literature Survey
SMS spam detection is a comparatively new research area relative to spam detection in email, social tags, Twitter, and the web. Research on spam detection includes [1], [2], [3], among others; most of this work was conducted after 2011. There are several established email spam detection techniques. SMS spam detection faces challenges that email spam detection does not, such as restricted message size, the use of regional and abbreviated words, and limited header information. These challenges remain to be solved, so there is scope for research in this field, and some work has already been conducted on it. SMS spam filtering falls into several categories: whitelisting and blacklisting, content-based, non-content-based, collaborative approaches, and challenge-response techniques [4], [5], [12], [29]. These techniques are applied on the client side, on the server side, or on both [4]. Several machine learning algorithms, such as Naïve Bayes, Support Vector Machine (SVM), Logistic Regression, Decision Trees, and K-Nearest Neighbor, are used to classify messages as spam or as legitimate SMS, known as ham. The machine learning algorithms, processes, and techniques of spam filtering are discussed in the following subsections.
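As an illustration of content-based classification, the following is a minimal sketch of a multinomial Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is not the implementation evaluated in this paper: the function names (tokenize, train, classify), the whitespace tokenizer, and the four training messages are assumptions made up for the example, and a real filter would be trained on a labelled SMS corpus.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

def tokenize(text):
    # Assumption: lowercase and split on whitespace. Real SMS text, with
    # regional and abbreviated words, would need more careful normalisation.
    return text.lower().split()

def train(messages):
    """Count words per class from (label, text) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for label, text in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        # Log prior: fraction of training messages with this label.
        # (Assumes every label occurs at least once in the training data.)
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data, invented for this example only.
TRAIN = [
    ("spam", "win a free prize now"),
    ("spam", "free entry claim your prize"),
    ("ham", "are we meeting for lunch today"),
    ("ham", "see you at home tonight"),
]

model = train(TRAIN)
print(classify("claim your free prize", model))  # spam
print(classify("lunch at home today", model))    # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, which matters little for short SMS messages but keeps the sketch applicable to longer text.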