Project Title: MovieSent Dual Approach Sentiment Analysis
Category: Machine Learning / AI
Name: Muhammad Bilal
Email ID: bilal.saleem@vu.edu.pk
Skype ID: bilalsaleem101
Project Domain / Category
Machine Learning / Natural Language Processing (NLP)
Imagine a bustling film festival where organizers and critics are eager to gauge audience sentiment on the latest releases. With thousands of reviews pouring in from social media and dedicated review sites, manually analyzing each comment becomes impractical. MovieSent – Dual Approach Sentiment Analysis steps in as a powerful tool to automatically assess the sentiment behind movie reviews.
By leveraging both a traditional Logistic Regression model and an advanced LSTM network, the system can quickly determine whether public opinion is positive, negative, or neutral. This dual approach not only provides comprehensive insights into viewer reactions but also enables film studios, critics, and streaming platforms to make informed decisions based on real-time sentiment trends. With an interactive web interface, users can simply submit a review and instantly see the analysis outcome, making MovieSent an indispensable asset in today’s fast-paced entertainment industry.
The functional requirements of this project are given below:
Requirement: Load a dataset containing at least 5,000 movie reviews. Details:
· Use the provided dataset from this Google Drive link: 5000 Movie Reviews Dataset.
· Ensure the dataset includes sentiment labels (e.g., Positive, Negative, or Neutral).
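As a rough sketch of the loading step, the snippet below reads reviews with pandas and checks that every row carries one of the expected sentiment labels. The column names `review` and `sentiment` and the in-memory sample are assumptions made for illustration; the actual dataset file may use different names.

```python
import io

import pandas as pd

# Labels named in the requirement, compared case-insensitively.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def load_reviews(source):
    """Read reviews from a CSV path or file-like object and validate labels.

    Assumes two columns, "review" and "sentiment" (hypothetical names).
    """
    df = pd.read_csv(source)
    missing = {"review", "sentiment"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Normalize label spelling so "Positive" and "positive " compare equal.
    df["sentiment"] = df["sentiment"].astype(str).str.strip().str.lower()
    unknown = set(df["sentiment"]) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return df


# Tiny in-memory stand-in for the real file.
sample_csv = io.StringIO(
    "review,sentiment\n"
    "A moving and beautifully shot film.,Positive\n"
    "Two hours I will never get back.,Negative\n"
    "It was fine. Nothing special.,Neutral\n"
)
reviews = load_reviews(sample_csv)
print(reviews["sentiment"].value_counts())
```

With the real file, `load_reviews` would be called with the path to the downloaded CSV, and `len(reviews) >= 5000` could be checked afterwards.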
Requirement: Clean and verify the dataset. Details:
· Manually review and confirm sentiment labels.
· Save the data in CSV or JSON format.
· Include any additional metadata if available (e.g., review date, movie genre).
Requirement: Normalize and preprocess the raw text data. Details:
· Remove HTML tags, punctuation, and special characters.
· Convert text to lowercase.
· Tokenize reviews into words.
· Remove stopwords and perform stemming/lemmatization.
· Handle missing values and remove duplicate entries.
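The preprocessing steps above can be sketched with the standard library alone. The stopword set here is a small illustrative sample, not a real stopword list, and stemming/lemmatization is left out; in the project those parts would presumably come from NLTK or spaCy.

```python
import html
import re

# Illustrative stopword sample only; a real list would come from NLTK or spaCy.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "it", "was", "this", "that", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")        # HTML tags such as <b> or <br />
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # anything that is not a lowercase letter or whitespace


def preprocess(text):
    """Strip HTML, lowercase, drop punctuation, tokenize, remove stopwords."""
    text = html.unescape(text)          # turn entities such as &amp; into characters
    text = TAG_RE.sub(" ", text)
    text = text.lower()
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = text.split()               # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


def clean_corpus(reviews):
    """Skip missing or empty reviews and drop duplicates after normalization."""
    seen = set()
    cleaned = []
    for review in reviews:
        if review is None or not review.strip():
            continue
        tokens = preprocess(review)
        key = " ".join(tokens)
        if not key or key in seen:
            continue
        seen.add(key)
        cleaned.append(tokens)
    return cleaned


raw = ["<b>This</b> movie was GREAT!!!", None, "this movie was great", "Boring &amp; slow."]
print(clean_corpus(raw))
```

Duplicates are detected after normalization, so two reviews that differ only in markup or capitalization count as the same entry.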
Requirement: Convert text into numerical representations for model input. Details:
· For Logistic Regression: Apply TF-IDF vectorization with n-grams (unigrams, bigrams, and trigrams).
· For LSTM: Tokenize text, pad sequences to a fixed length, and feed them to an Embedding layer (optionally initialized with pre-trained GloVe embeddings).
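The two feature pipelines might look like the sketch below. `TfidfVectorizer` with `ngram_range=(1, 3)` covers unigrams through trigrams. The integer encoding and padding are written in plain Python as a stand-in for the Keras preprocessing utilities; the helper names, the reserved ids (0 for padding, 1 for unknown words), and the choice to pad at the end are assumptions of this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["great movie", "terrible movie", "great acting great story"]

# TF-IDF over unigrams, bigrams, and trigrams for the Logistic Regression model.
vectorizer = TfidfVectorizer(ngram_range=(1, 3))
X_tfidf = vectorizer.fit_transform(docs)
print(X_tfidf.shape)


def build_vocab(tokenized_docs):
    """Map each word to an integer id; 0 is reserved for padding, 1 for unknown words."""
    vocab = {}
    for tokens in tokenized_docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 2)
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(t, 1) for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


# Fixed-length integer sequences for the LSTM model.
tokenized = [d.split() for d in docs]
vocab = build_vocab(tokenized)
X_seq = [encode_and_pad(t, vocab, max_len=5) for t in tokenized]
print(X_seq)
```

In both pipelines the vocabulary should be fitted on the training split only and then reused for the test split, so that no information from the test reviews leaks into the features.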
Requirement: Partition the dataset into training and testing sets. Details:
· Use a 70/30 split, ensuring stratified sampling to preserve class distribution.
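A stratified 70/30 split can be sketched with scikit-learn's `train_test_split`. The synthetic texts, the class proportions, and the `random_state` value are placeholders chosen for the example.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Synthetic stand-in data with an uneven class distribution.
texts = [f"review {i}" for i in range(100)]
labels = ["positive"] * 50 + ["negative"] * 30 + ["neutral"] * 20

# test_size=0.30 gives the 70/30 split; stratify keeps class ratios equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.30, stratify=labels, random_state=42
)

print(Counter(y_train))
print(Counter(y_test))
```

Fixing `random_state` makes the split reproducible, which matters when both models are to be compared on the same test set.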
6. Model Development – Logistic Regression
Requirement: Build a classical sentiment analysis model. Details:
· Train a Logistic Regression classifier using scikit-learn on TF-IDF features.
· Evaluate its performance using accuracy, precision, recall, and F1-score.
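One possible shape for the classical model is a scikit-learn pipeline that chains TF-IDF and Logistic Regression. The toy reviews below are made up for illustration, and macro averaging is an assumption of this sketch; the requirement does not say how the per-class metrics should be averaged.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; the real project uses the 5,000-review dataset.
train_texts = [
    "loved this film", "great acting and story", "wonderful and moving",
    "hated this film", "terrible acting and plot", "boring and slow",
    "it was okay", "average film nothing special", "neither good nor bad",
]
train_labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3
test_texts = ["great and moving story", "boring plot", "okay film"]
test_labels = ["positive", "negative", "neutral"]

# The pipeline fits the vectorizer on the training texts only.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)

accuracy = accuracy_score(test_labels, predictions)
precision, recall, f1, _ = precision_recall_fscore_support(
    test_labels, predictions, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Keeping the vectorizer and classifier in one pipeline also means a single object can be saved and later loaded by the web backend.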
Requirement: Build a deep learning sentiment analysis model using an LSTM network. Details:
· Construct an LSTM-based network that includes an Embedding layer, one or more LSTM layers, and dropout for regularization.
· Compile the model with a loss function that matches the label set (binary cross-entropy for two classes, categorical cross-entropy for three or more) and an optimizer (e.g., Adam).
· Train the model on tokenized and padded data.
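A minimal Keras sketch of such a network follows. The layer sizes, the dropout rate, and the function name `build_lstm_model` are illustrative assumptions. It uses sparse categorical cross-entropy, a variant of categorical cross-entropy that accepts integer class ids instead of one-hot vectors. TensorFlow is imported inside the function so that the definition itself can be loaded without it.

```python
def build_lstm_model(vocab_size, max_len, num_classes=3, embed_dim=64, lstm_units=64):
    """Build and compile an Embedding -> LSTM -> Dropout -> Dense classifier.

    All hyperparameter defaults are placeholders, not tuned values.
    """
    from tensorflow.keras import layers, models  # imported lazily

    model = models.Sequential([
        layers.Input(shape=(max_len,)),                      # padded integer sequences
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        layers.LSTM(lstm_units),
        layers.Dropout(0.5),                                 # regularization
        layers.Dense(num_classes, activation="softmax"),     # one probability per class
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer labels 0..num_classes-1
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as `model.fit(X_train_padded, y_train_ids, validation_split=0.1, epochs=5, batch_size=32)`, where the array names and the epoch and batch settings are again placeholders.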
Requirement: Evaluate and compare both models. Details:
· Generate confusion matrices for each model.
· Compute evaluation metrics (accuracy, precision, recall, F1-score) and analyze results to determine which approach performs better.
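The comparison step could be sketched as below, using scikit-learn's metric functions on each model's test-set predictions. The label order and both prediction lists are made-up placeholders, not results from the project.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

LABELS = ["negative", "neutral", "positive"]

# Placeholder predictions; in the project these come from the two trained models.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model_predictions = {
    "Logistic Regression": ["positive", "neutral", "negative", "negative", "neutral", "positive"],
    "LSTM": ["positive", "positive", "negative", "neutral", "neutral", "neutral"],
}

results = {}
for name, y_pred in model_predictions.items():
    results[name] = {
        # Rows are true labels, columns are predicted labels, both in LABELS order.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=LABELS),
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, labels=LABELS, average="macro", zero_division=0),
    }

for name, metrics in results.items():
    print(name)
    print(metrics["confusion_matrix"])
    print(f"accuracy={metrics['accuracy']:.2f} macro_f1={metrics['macro_f1']:.2f}")

best = max(results, key=lambda n: results[n]["macro_f1"])
print("Higher macro F1:", best)
```

Passing the same `labels` list to every call keeps the matrices of the two models directly comparable cell by cell.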
Requirement: Develop a web application to showcase real-time sentiment analysis. Details:
· Create a backend using Flask (or Django) to serve both models via RESTful API endpoints.
· Build a responsive front-end using HTML/CSS and optionally JavaScript/Bootstrap.
· Allow users to input movie reviews and select the model (Logistic Regression or LSTM) to get predictions.
· Implement error handling and provide clear instructions.
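A minimal Flask sketch of the prediction endpoint is shown below. The route `/predict`, the JSON field names `review` and `model`, and the model keys are assumptions of this example. The predictor is a placeholder that always answers "neutral"; the real backend would load the trained Logistic Regression and LSTM models instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def _placeholder_predict(review):
    """Stand-in for a trained model; always returns 'neutral'."""
    return "neutral"


# Hypothetical registry; the real entries would wrap the loaded models.
MODELS = {
    "logistic_regression": _placeholder_predict,
    "lstm": _placeholder_predict,
}


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify(error="Request body must be a JSON object."), 400

    review = str(payload.get("review", "")).strip()
    model_name = str(payload.get("model", "logistic_regression"))

    # Error handling: reject empty reviews and unknown model names.
    if not review:
        return jsonify(error="Field 'review' must be a non-empty string."), 400
    if model_name not in MODELS:
        return jsonify(error=f"Unknown model '{model_name}'.", choices=sorted(MODELS)), 400

    sentiment = MODELS[model_name](review)
    return jsonify(model=model_name, sentiment=sentiment)
```

The front-end form would send the review text and the selected model to this endpoint and display the returned `sentiment` value, or the `error` message when the status code is 400.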
· Programming Language, Tools, and Libraries:
o Python: Primary language for data processing, model development, and backend services.
o Anaconda: Python distribution platform to manage environments and dependencies.
o Jupyter Notebook: For interactive coding, exploratory data analysis, and prototyping.
o Visual Studio Code (or PyCharm): For writing, debugging, and managing the project code.
o Data Processing: Pandas, NumPy, NLTK, spaCy.
o Machine Learning & Deep Learning: scikit-learn, TensorFlow/Keras.
o Web Development: Flask (or Django) for building the backend API; HTML/CSS and optionally JavaScript/Bootstrap for the front-end.
o Joblib: For model serialization (saving and loading models).
o Git: For version control and collaboration (optional).