June 29, 2022

Analyze Twitter’s Reaction to Taylor Swift with HarperDB (Part 1)

Summary of What to Expect

This project extracts tweets that mention "Taylor Swift" and analyzes them in detail using natural language processing.

1. Data Collection:

  • Install the Twint library for Python, which is used to scrape tweets containing the keyword "Taylor Swift."

  • Use Twint to scrape tweets and store them in a CSV file.
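The collection step can be sketched roughly as follows, assuming Twint is already installed (for example with `pip3 install twint`). The helper names `twint_settings` and `scrape_tweets`, the output filename, the language filter, and the tweet limit are illustrative assumptions, not values taken from the article.

```python
def twint_settings(keyword, output_path, limit=1000):
    """Return the Twint configuration values for a keyword search.

    The limit and language are assumed defaults; adjust them as needed.
    """
    return {
        "Search": keyword,        # keyword or phrase to search for
        "Lang": "en",             # assumption: English tweets only
        "Limit": limit,           # approximate number of tweets to fetch
        "Store_csv": True,        # write the results to a CSV file
        "Output": output_path,    # path of the CSV file
        "Hide_output": True,      # do not print every tweet to the console
    }


def scrape_tweets(keyword, output_path, limit=1000):
    """Run a Twint search and store the matching tweets in a CSV file."""
    import twint  # third-party package, imported here so the helper above works without it

    config = twint.Config()
    for name, value in twint_settings(keyword, output_path, limit).items():
        setattr(config, name, value)
    twint.run.Search(config)


# Example call (requires Twint and network access):
# scrape_tweets("Taylor Swift", "taylor_swift_tweets.csv", limit=5000)
```

Keeping the settings in a plain dictionary makes it easy to inspect or adjust the search before any request is made.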

2. Data Preprocessing:

  • Handle null values by dropping unnecessary columns and rows.

  • Identify the primary key for the dataset.

  • Preprocess tweet text by removing hashtags, URLs, mentions, emojis, punctuation, and digits.

  • Tokenize the cleaned tweets and remove stop words.
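As a minimal sketch of the text-cleaning and tokenization steps, the following uses only the Python standard library. The function names, the ASCII-based emoji removal, and the tiny stop-word set are assumptions made for illustration; a fuller stop-word list, such as the one shipped with NLTK, would be a more realistic choice.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")

# Assumption: a deliberately small stop-word set for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "i", "in", "is", "it", "of", "the", "to"}


def clean_tweet(text):
    """Strip URLs, mentions, hashtags, emojis, punctuation, and digits."""
    text = URL_RE.sub(" ", text)               # URLs first, before punctuation is removed
    text = MENTION_HASHTAG_RE.sub(" ", text)   # @mentions and #hashtags
    # Dropping non-ASCII characters removes emojis (and any other non-ASCII text).
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(str.maketrans("", "", string.punctuation + string.digits))
    return " ".join(text.lower().split())      # lowercase and collapse whitespace


def tokenize(text):
    """Clean a tweet, split it on whitespace, and drop stop words."""
    return [token for token in clean_tweet(text).split() if token not in STOP_WORDS]


sample = "Loving the new @taylorswift13 album!!! #Midnights 10/10 😍 https://t.co/abc123"
print(clean_tweet(sample))  # loving the new album
print(tokenize(sample))     # ['loving', 'new', 'album']
```

URLs are removed before punctuation so that a link is deleted as a whole rather than being left behind as stray letters.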

3. Save the cleaned data in a CSV file.

4. Set up HarperDB:

  • Create a HarperDB account and instance.

  • Launch the instance, create a schema, and define tables.

  • Import the cleaned data into HarperDB.
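The same setup can also be scripted instead of done through the web interface. The sketch below assumes the instance accepts JSON operations over HTTP with basic authentication, using the `create_schema`, `create_table`, and `csv_data_load` operations. The schema name `twitter`, table name `tweets`, hash attribute `id`, file name, and the instance URL and credentials in the example are all hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_operations(schema, table, hash_attribute, csv_text):
    """Return the three operations: create schema, create table, load CSV."""
    return [
        {"operation": "create_schema", "schema": schema},
        {
            "operation": "create_table",
            "schema": schema,
            "table": table,
            "hash_attribute": hash_attribute,  # the table's primary key
        },
        {
            "operation": "csv_data_load",
            "action": "insert",
            "schema": schema,
            "table": table,
            "data": csv_text,  # CSV content passed as a string
        },
    ]


def send_operation(url, username, password, operation):
    """POST one operation to the instance and return the decoded JSON reply."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=json.dumps(operation).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (hypothetical URL, credentials, and file name):
# with open("cleaned_tweets.csv", encoding="utf-8") as handle:
#     operations = build_operations("twitter", "tweets", "id", handle.read())
# for operation in operations:
#     print(send_operation("https://your-instance.harperdbcloud.com",
#                          "username", "password", operation))
```

The hash attribute should be whichever column was identified as the primary key during preprocessing; `id` is only a guess here.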

See part two of this series as well.