MovieSent Dual Approach Sentiment Analysis

Machine Learning / AI

Project Details

Project Information

Project Title: MovieSent Dual Approach Sentiment Analysis

Category: Machine Learning / AI

Semester: Spring 2025

Course: CS619

Complexity: Complex


Project Description

MovieSent Dual Approach Sentiment Analysis

Project Domain / Category

Machine Learning / Natural Language Processing (NLP)

Abstract / Introduction

Imagine a bustling film festival where organizers and critics are eager to gauge audience sentiment on the latest releases. With thousands of reviews pouring in from social media and dedicated review sites, manually analyzing each comment becomes impractical. MovieSent – Dual Approach Sentiment Analysis addresses this by automatically classifying the sentiment expressed in movie reviews.

By combining a traditional Logistic Regression model with an LSTM network, the system classifies each review as positive, negative, or neutral. The dual approach lets users compare a fast, interpretable baseline built on word and phrase frequencies with a deep learning model that takes word order into account, giving film studios, critics, and streaming platforms a fuller picture of viewer reactions on which to base their decisions. Through an interactive web interface, users can submit a review and immediately see the predicted sentiment, making MovieSent a practical tool for today's fast-paced entertainment industry.

Functional Requirements:

The functional requirements of this project are given below:

1.  Data Collection

Requirement: Load a dataset containing at least 5,000 movie reviews. Details:

·    Use the provided dataset from this Google Drive link: 5000 Movie Reviews Dataset.

·    Ensure the dataset includes sentiment labels (e.g., Positive, Negative, or Neutral).
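A minimal loading and validation sketch follows, assuming the dataset is exported as a CSV with a text column named `review` and a label column named `sentiment` (both column names, and the helpers `validate_reviews` and `load_reviews`, are hypothetical). It checks the columns, normalizes label spelling, and enforces the 5,000-review minimum.

```python
import pandas as pd

ALLOWED_LABELS = {"positive", "negative", "neutral"}


def validate_reviews(df, text_col="review", label_col="sentiment", min_rows=5000):
    """Check columns, labels, and size; return a copy with normalized lowercase labels."""
    missing = {text_col, label_col} - set(df.columns)
    if missing:
        raise ValueError(f"Missing column(s): {sorted(missing)}")
    df = df.copy()
    # Normalize spelling so "Positive", " positive " and "POSITIVE" all match
    df[label_col] = df[label_col].astype(str).str.strip().str.lower()
    # Missing labels become the strings "nan"/"none" here and are reported as unexpected
    unknown = set(df[label_col]) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"Unexpected sentiment label(s): {sorted(unknown)}")
    if len(df) < min_rows:
        raise ValueError(f"Expected at least {min_rows} reviews, found {len(df)}")
    return df


def load_reviews(path, **kwargs):
    """Read the CSV export of the dataset and validate it."""
    return validate_reviews(pd.read_csv(path), **kwargs)
```

A dataset that only contains Positive and Negative labels also passes, since the check only rejects labels outside the allowed set.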

2.  Data Preparation

Requirement: Clean and verify the dataset. Details:

·    Manually review and confirm sentiment labels.

·    Save the data in CSV or JSON format.

·    Include any additional metadata if available (e.g., review date, movie genre).
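Manual label review cannot be automated, but drawing a balanced sample for a human to re-check, and saving the verified data, can be. The sketch below assumes a `sentiment` label column; the helper names and the 50-per-class default are illustrative choices, not part of the specification.

```python
from pathlib import Path

import pandas as pd


def sample_for_manual_review(df, label_col="sentiment", per_label=50, seed=42):
    """Shuffle, then take up to `per_label` reviews from each class for a human to re-check."""
    return df.sample(frac=1, random_state=seed).groupby(label_col).head(per_label)


def save_dataset(df, path):
    """Save as CSV or JSON depending on the file extension."""
    path = Path(path)
    if path.suffix == ".csv":
        df.to_csv(path, index=False, encoding="utf-8")
    elif path.suffix == ".json":
        # One JSON object per review; keep non-ASCII characters readable
        df.to_json(path, orient="records", force_ascii=False, indent=2)
    else:
        raise ValueError(f"Unsupported format '{path.suffix}'; use .csv or .json")
```

Any extra metadata columns (e.g., review date or genre) are carried through unchanged by both formats.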

3.  Data Pre-Processing

Requirement: Normalize and preprocess the raw text data. Details:

·    Remove HTML tags, punctuation, and special characters.

·    Convert text to lowercase.

·    Tokenize reviews into words.

·    Remove stopwords and perform stemming/lemmatization.

·    Handle missing values and remove duplicate entries.
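The pre-processing steps above can be sketched as follows, assuming `review`/`sentiment` column names (hypothetical). It uses scikit-learn's built-in English stopword list and NLTK's Porter stemmer (which needs no corpus download); spaCy lemmatization would be a drop-in alternative. One deliberate deviation: negation words are kept, because removing "not" would turn "not good" into "good".

```python
import html
import re

import pandas as pd
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Keep negations: they often decide the sentiment of a phrase
STOP_WORDS = ENGLISH_STOP_WORDS - {"not", "no", "nor", "never"}
_stemmer = PorterStemmer()
_TAG_RE = re.compile(r"<[^>]+>")
_NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_review(text):
    """Return the list of normalized, stemmed tokens for one raw review."""
    text = _TAG_RE.sub(" ", html.unescape(text))     # decode &amp; etc., then strip HTML tags
    text = _NON_LETTER_RE.sub(" ", text.lower())     # lowercase; drop punctuation, digits, symbols
    tokens = text.split()                            # simple whitespace tokenization
    return [_stemmer.stem(t) for t in tokens if t not in STOP_WORDS]


def clean_dataframe(df, text_col="review", label_col="sentiment"):
    """Drop rows with missing text/label and duplicate reviews, then add a clean_text column."""
    df = df.dropna(subset=[text_col, label_col]).drop_duplicates(subset=[text_col])
    df = df.assign(clean_text=df[text_col].map(lambda t: " ".join(clean_review(t))))
    return df.reset_index(drop=True)
```

Missing values and duplicates are handled first so that no effort is spent cleaning rows that will be discarded.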


4.  Feature Extraction

Requirement: Convert text into numerical representations for model input. Details:

·    For Logistic Regression: Apply TF-IDF vectorization with n-grams (unigrams, bigrams, and trigrams).

·    For LSTM: Tokenize the text, pad sequences to a fixed length, and map tokens to vectors with an Embedding layer, optionally initialized with pre-trained GloVe embeddings.
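For the Logistic Regression side, the TF-IDF requirement corresponds to scikit-learn's `TfidfVectorizer` with `ngram_range=(1, 3)`. The `min_df`, `max_features`, and `sublinear_tf` values below are assumed starting points rather than requirements; trigrams enlarge the vocabulary quickly, so some cap is usually needed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def make_tfidf(min_df=2, max_features=50000):
    """TF-IDF over unigrams, bigrams and trigrams."""
    return TfidfVectorizer(
        ngram_range=(1, 3),          # uni-, bi- and tri-grams
        min_df=min_df,               # ignore n-grams seen in fewer than min_df reviews
        max_features=max_features,   # keep only the most frequent n-grams
        sublinear_tf=True,           # use 1 + log(tf) instead of raw term counts
    )
```

The vectorizer should be fitted on the training split only (`fit_transform` on training texts, `transform` on test texts) so that test reviews do not influence the vocabulary or IDF weights.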

5.  Train & Test Data Splitting

Requirement: Partition the dataset into training and testing sets. Details:

·    Use a 70/30 train/test split with stratified sampling to preserve the class distribution in both sets.
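A stratified 70/30 split maps directly onto scikit-learn's `train_test_split` with `stratify` set to the labels. The wrapper name and the fixed seed are illustrative assumptions; the seed simply makes the split reproducible so both models are evaluated on the same test set.

```python
from sklearn.model_selection import train_test_split


def split_reviews(texts, labels, test_size=0.30, seed=42):
    """70/30 stratified split: class proportions are preserved in both parts.

    Returns X_train, X_test, y_train, y_test.
    """
    return train_test_split(texts, labels, test_size=test_size,
                            stratify=labels, random_state=seed)
```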

6.  Model Development – Logistic Regression

Requirement: Build a classical sentiment analysis model. Details:

·    Train a Logistic Regression classifier using scikit-learn on TF-IDF features.

·    Evaluate its performance using accuracy, precision, recall, and F1-score.
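One way to meet this requirement is a scikit-learn `Pipeline` that chains the TF-IDF vectorizer and the classifier, so both are fitted together on training data and can later be saved as a single object. `max_iter=1000` and macro averaging of the metrics are assumptions; macro averaging weights every class equally, which matters if Neutral reviews are rarer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import Pipeline


def build_logreg(min_df=2):
    """TF-IDF (1-3 grams) followed by Logistic Regression, fitted as one unit."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 3), min_df=min_df, sublinear_tf=True)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1}
```

Typical use: `model = build_logreg().fit(X_train, y_train)`, then `evaluate(y_test, model.predict(X_test))`.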

7.  Model Development – LSTM

Requirement: Build a deep learning sentiment analysis model using an LSTM network. Details:

·    Construct an LSTM-based network that includes an Embedding layer, one or more LSTM layers, and dropout for regularization.

·    Compile the model with an appropriate loss function (binary cross-entropy for two classes, categorical cross-entropy for three classes such as Positive/Negative/Neutral) and an optimizer (e.g., Adam).

·    Train the model on tokenized and padded data.
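A minimal Keras sketch of this model follows. It assumes Keras' `TextVectorization` layer for tokenizing and padding (the specification does not name a tokenizer), integer labels such as 0 = negative, 1 = neutral, 2 = positive (an assumed mapping), and illustrative sizes (200 tokens, 100-dimensional embeddings, 64 LSTM units, dropout 0.5). With integer labels, `sparse_categorical_crossentropy` is the integer-label form of categorical cross-entropy. Loading GloVe vectors into the Embedding layer is not shown.

```python
import numpy as np
import tensorflow as tf

SEQ_LEN = 200        # assumed fixed review length in tokens (after padding/truncation)
MAX_TOKENS = 20000   # assumed vocabulary cap
EMBED_DIM = 100      # matches 100-d GloVe vectors, if those are used later


def build_vectorizer(train_texts, max_tokens=MAX_TOKENS, seq_len=SEQ_LEN):
    """Tokenize and pad: maps each review to `seq_len` integer token ids (0 = padding)."""
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=max_tokens,
        output_mode="int",
        output_sequence_length=seq_len,  # shorter reviews are zero-padded, longer ones truncated
    )
    vectorizer.adapt(tf.constant(list(train_texts)))  # build the vocabulary from training text only
    return vectorizer


def build_lstm(vocab_size, num_classes=3, seq_len=SEQ_LEN, embed_dim=EMBED_DIM):
    """Embedding -> LSTM -> Dropout -> softmax over the sentiment classes."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len,), dtype="int64"),
        tf.keras.layers.Embedding(vocab_size, embed_dim, mask_zero=True),  # ignore padding ids
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.5),                                       # regularization
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer labels 0..num_classes-1 -> sparse form of categorical cross-entropy
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training then amounts to `x = vectorizer(tf.constant(texts))` followed by `model.fit(x, labels, validation_split=0.1, epochs=...)`, with the epoch count chosen by watching validation loss.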

8.  Performance Evaluation

Requirement: Evaluate and compare both models. Details:

·    Generate confusion matrices for each model.

·    Compute evaluation metrics (accuracy, precision, recall, F1-score) and analyze results to determine which approach performs better.
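The comparison can be sketched as below: both models are scored on the same test set, a confusion matrix and per-class report are printed for each, and macro F1 is used to pick the better model. Using macro F1 as the deciding metric is an assumption; it guards against a model that scores high accuracy by neglecting a minority class. The `labels` argument fixes the row and column order of the matrices.

```python
from sklearn.metrics import classification_report, confusion_matrix, f1_score


def compare_models(y_true, predictions, labels):
    """predictions: dict of model name -> predicted labels on the same test set.

    Prints a confusion matrix and classification report per model and
    returns (results, best_model_name), ranked by macro F1.
    """
    results = {}
    for name, y_pred in predictions.items():
        cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = predicted
        print(f"=== {name} ===")
        print(cm)
        print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
        results[name] = {
            "confusion_matrix": cm,
            "macro_f1": f1_score(y_true, y_pred, labels=labels,
                                 average="macro", zero_division=0),
        }
    best = max(results, key=lambda n: results[n]["macro_f1"])
    return results, best
```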

9.  Web Interface Integration

Requirement: Develop a web application to showcase real-time sentiment analysis. Details:

·    Create a backend using Flask (or Django) to serve both models via RESTful API endpoints.

·    Build a responsive front-end using HTML/CSS and optionally JavaScript/Bootstrap.

·    Allow users to input movie reviews and select the model (Logistic Regression or LSTM) to get predictions.

·    Implement error handling (e.g., for empty input or an unknown model choice) and provide clear usage instructions on the page.
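A minimal Flask backend sketch follows. The endpoint path `/api/predict`, the JSON fields `review` and `model`, and the model keys `logreg`/`lstm` are hypothetical. To stay independent of how each model is stored, the app factory takes a dict of plain functions that map one review string to a label.

```python
from flask import Flask, jsonify, request


def create_app(predictors):
    """predictors: dict mapping a model name (e.g. "logreg", "lstm") to a function
    that takes one review string and returns a sentiment label string."""
    app = Flask(__name__)

    @app.route("/api/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            payload = {}                      # missing or malformed JSON body
        review = str(payload.get("review", "")).strip()
        model_name = str(payload.get("model", "logreg"))

        # Error handling: reject empty reviews and unknown model names with HTTP 400
        if not review:
            return jsonify(error="Please enter a movie review."), 400
        if model_name not in predictors:
            return jsonify(error=f"Unknown model '{model_name}'. "
                                 f"Choose one of: {sorted(predictors)}"), 400

        label = predictors[model_name](review)
        return jsonify(model=model_name, sentiment=label)

    return app
```

In the real application the functions would wrap the trained models, for example `lambda t: pipeline.predict([t])[0]` for a Logistic Regression pipeline loaded with `joblib.load` (the file name is up to the project); the front-end then sends the review and the selected model to this endpoint and displays the returned label or error message.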

Tools:

·    Programming Language:

o  Python: Primary language for data processing, model development, and backend services.

·    Development Environments / IDEs:

o  Anaconda: Python distribution platform to manage environments and dependencies.

o  Jupyter Notebook: For interactive coding, exploratory data analysis, and prototyping.

o  Visual Studio Code (or PyCharm): For writing, debugging, and managing the project code.

·    Libraries & Frameworks:

o  Data Processing: Pandas, NumPy, NLTK, spaCy.

o  Machine Learning & Deep Learning: scikit-learn, TensorFlow/Keras.

o  Web Development: Flask (or Django) for building the backend API; HTML/CSS and optionally JavaScript/Bootstrap for the front-end.

·    Other Tools:

o  Joblib: For model serialization (saving and loading models).

o  Git: For version control and collaboration (optional).

Supervisor:

Name: Muhammad Bilal

Email ID: bilal.saleem@vu.edu.pk

Skype ID: bilalsaleem101


Languages

  • Python, HTML, CSS, JavaScript

Tools

  • Anaconda, Jupyter Notebook, Visual Studio Code, PyCharm, Pandas, NumPy, NLTK, spaCy, scikit-learn, TensorFlow, Keras, Flask, Django, Bootstrap, Joblib, Git

Project Schedules

Assignment # | Title | Start Date | End Date
1 | SRS Document | Friday, 2 May 2025, 12:00 AM | Thursday, 22 May 2025, 12:00 AM
2 | Design Document | Friday, 23 May 2025, 12:00 AM | Tuesday, 29 July 2025, 12:00 AM
3 | Prototype Phase | Wednesday, 30 July 2025, 12:00 AM | Friday, 12 September 2025, 12:00 AM
4 | Final Deliverable | Saturday, 13 September 2025, 12:00 AM | Monday, 3 November 2025, 12:00 AM
