Twitter Spam Classification Using Machine Learning Techniques

ABSTRACT
Stream clustering methods have been used repeatedly for spam filtering to categorize incoming messages or tweets into spam and non-spam clusters. These methods assume that each cluster consists of a number of neighboring small (micro) clusters, each with a symmetric distribution. This assumption does not necessarily hold, however, and large micro clusters may have asymmetric distributions. To improve the assignment accuracy of earlier methods in their online phase, we suggest replacing the Euclidean distance with a set of classifiers that assign each incoming sample to the most relevant micro cluster, whatever its distribution. Here, a set of incremental Naïve Bayes (INB) classifiers is trained for micro clusters whose population exceeds a threshold. These INBs can capture both the mean and the boundary of a micro cluster, whereas the Euclidean distance considers only the cluster mean and is inaccurate for large asymmetric micro clusters. In this paper, DenStream is extended with the proposed framework, and the result is called INB-DenStream. To show the effectiveness of INB-DenStream, state-of-the-art methods such as DenStream, StreamKM++, and CluStream were applied to the Twitter datasets, and their performance was measured in terms of purity, general precision, general recall, F1 measure, parameter sensitivity, and computational complexity. The comparison indicated that our method outperforms its rivals on almost all the datasets.
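
The difference between assigning a sample by Euclidean distance and assigning it with a per-micro-cluster classifier can be illustrated with a small sketch. This is not the INB-DenStream implementation from the paper: it assumes a Gaussian Naive Bayes model per micro cluster, updated incrementally with running means and variances, and it assumes both micro clusters are already above the population threshold. The class name `MicroCluster`, the helper functions, and the toy points are all hypothetical.

```python
import math


class MicroCluster:
    """Running per-feature mean and variance for one micro cluster."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per feature

    def update(self, x):
        """Absorb one sample incrementally (Welford's method), storing no history."""
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def log_score(self, x, total, eps=1e-6):
        """Unnormalized log posterior: log prior plus Gaussian log-likelihoods."""
        score = math.log(self.n / total)
        for i, xi in enumerate(x):
            var = self.m2[i] / self.n + eps  # eps keeps the variance positive
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (xi - self.mean[i]) ** 2 / (2 * var)
        return score

    def distance(self, x):
        """Euclidean distance from x to the micro-cluster mean."""
        return math.sqrt(sum((xi - m) ** 2 for xi, m in zip(x, self.mean)))


def assign_euclidean(x, clusters):
    """Index of the micro cluster whose mean is nearest to x."""
    return min(range(len(clusters)), key=lambda k: clusters[k].distance(x))


def assign_inb(x, clusters):
    """Index of the micro cluster with the highest Naive Bayes score for x."""
    total = sum(c.n for c in clusters)
    return max(range(len(clusters)), key=lambda k: clusters[k].log_score(x, total))


# Toy data (hypothetical): one wide micro cluster and one tight micro cluster.
wide = MicroCluster(2)
for p in [(-4.0, 0.1), (-2.0, -0.1), (0.0, 0.0), (2.0, 0.1), (4.0, -0.1)]:
    wide.update(p)

tight = MicroCluster(2)
for p in [(5.9, 0.0), (6.0, 0.1), (6.1, -0.1), (6.0, 0.0), (6.0, 0.0)]:
    tight.update(p)

clusters = [wide, tight]
query = (4.0, 0.0)  # lies inside the spread of the wide cluster
print("Euclidean assignment:", assign_euclidean(query, clusters))
print("Naive Bayes assignment:", assign_inb(query, clusters))
```

In this toy case the query is closer to the mean of the tight cluster, so the Euclidean rule picks it, while the classifier-based rule accounts for each cluster's spread and picks the wide cluster. A per-feature Gaussian model captures spread rather than true asymmetry, so the sketch only illustrates why a mean-only rule can misassign samples near large micro clusters.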

Abstract:

This postgraduate project aims to develop a Twitter spam classification system utilizing machine learning techniques. The existing system lacks efficient spam detection mechanisms, leading to increased instances of spam on Twitter. The proposed system leverages advanced machine learning algorithms to enhance the accuracy of spam detection, providing users with a cleaner and safer Twitter experience.

Existing System:

The current Twitter system relies on basic rule-based filters for spam detection, resulting in a high rate of false positives and negatives. This limitation necessitates the implementation of a more robust and adaptive approach.

Proposed System:

The proposed system employs machine learning algorithms such as Naive Bayes and Support Vector Machines (SVM), together with Natural Language Processing (NLP) techniques, to analyze tweet content and user behavior for accurate spam classification. The system aims to reduce false positives and false negatives, enhancing the overall reliability of spam detection.
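
As a rough illustration of how Naive Bayes can classify tweet text, the following minimal sketch implements a multinomial Naive Bayes classifier with Laplace smoothing using only the Python standard library. It is not the project's actual code: the class `NaiveBayesText`, the `tokenize` helper, and the six training tweets are hypothetical, and a real system would more likely use a library such as scikit-learn on a much larger labeled dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength
        self.class_docs = Counter()             # number of tweets per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_docs[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log-likelihood of each known token, per class."""
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical labeled tweets, for illustration only.
train_texts = [
    "Win a free iPhone now, click here",
    "Free followers!!! click this link",
    "Claim your free prize now",
    "Heading to the lab meeting now",
    "Great talk on stream clustering today",
    "Lunch with the team was fun",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("click here for a free prize"))
print(model.predict("team meeting at the lab today"))
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and the smoothing term keeps a token that appears in only one class from driving the other class's probability to zero.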

System Requirements:

Python programming language
Machine learning libraries (e.g., scikit-learn, TensorFlow)
Twitter API for data retrieval
Web development tools for UI implementation

Algorithms:

Naive Bayes
Support Vector Machines (SVM)
Natural Language Processing (NLP) techniques for analyzing tweet content
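
To make the SVM entry in the list above more concrete, here is a minimal sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It is an assumption-laden illustration, not the project's implementation: the functions `train_linear_svm` and `predict`, the hyperparameter values, and the two-dimensional feature vectors (imagined here as a link count and a promotional-word count per tweet) are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    X is a list of feature vectors, y a list of labels in {+1, -1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The L2 regularizer shrinks the weights slightly on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Sample is inside the margin or misclassified: push it outward.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (spam) or -1 (not spam) from the sign of the decision value."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical features per tweet: (link count, promotional-word count).
X = [(4, 6), (6, 4), (3, 7), (0, 0), (2, 0), (0, 2)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print("weights:", w, "bias:", b)
print("prediction for (5, 5):", predict(w, b, (5, 5)))
print("prediction for (1, 1):", predict(w, b, (1, 1)))
```

The toy data is linearly separable with a wide gap, so a fixed small learning rate is enough here. On real tweet features, a library implementation with proper feature scaling and tuned regularization would be the safer choice.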

Hardware Requirements:

Standard computer with sufficient processing power
Adequate RAM for machine learning model training

Software Requirements:

Python IDE
Web development environment
Twitter API access credentials

Architecture:

The system architecture follows a modular structure with components for data retrieval, preprocessing, machine learning model training, and a web-based user interface for end-user interaction.
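
The modular structure described above could be wired together roughly as follows. This is only a sketch under stated assumptions: `retrieve_tweets` is a stub that returns hard-coded examples instead of calling the Twitter API, `train` uses a simple token-count vote as a stand-in for the Naive Bayes or SVM model, and all names, example tweets, and URLs are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tweet:
    text: str
    label: str  # "spam" or "ham"


def retrieve_tweets() -> List[Tweet]:
    """Data retrieval stage (stub). A real version would query the Twitter API."""
    return [
        Tweet("Free gift cards at http://example.com/win @you", "spam"),
        Tweet("Click http://example.com/deal for free followers", "spam"),
        Tweet("Reading a paper on stream clustering", "ham"),
        Tweet("Coffee with @alice after the seminar", "ham"),
    ]


URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def preprocess(text: str) -> List[str]:
    """Preprocessing stage: mask URLs and mentions, lowercase, tokenize."""
    text = URL_RE.sub(" URL ", text)
    text = MENTION_RE.sub(" MENTION ", text)
    return re.findall(r"[a-z0-9#']+", text.lower())


def train(tweets: List[Tweet]) -> Callable[[str], str]:
    """Training stage: returns a classify(text) function.

    Placeholder model: each token votes with its per-class training count.
    """
    counts = {"spam": Counter(), "ham": Counter()}
    for tweet in tweets:
        counts[tweet.label].update(preprocess(tweet.text))

    def classify(text: str) -> str:
        tokens = preprocess(text)
        spam_votes = sum(counts["spam"][tok] for tok in tokens)
        ham_votes = sum(counts["ham"][tok] for tok in tokens)
        return "spam" if spam_votes > ham_votes else "ham"

    return classify


# The web interface would call classify() for each tweet submitted by a user.
classify = train(retrieve_tweets())
print(classify("free followers http://spam.example"))
print(classify("seminar on clustering"))
```

Keeping each stage behind a plain function makes it possible to swap the stub retrieval for real API calls, or the placeholder model for a trained classifier, without touching the other components.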

Technologies Used:

Python
Flask (for web UI)
scikit-learn, TensorFlow (for machine learning)
Twitter API

Web User Interface:

The web-based user interface provides users with an interactive platform for accessing and analyzing spam classification results. It includes features such as real-time updates and a user-friendly design.
