August 17, 2022

How to Web Scrape Using Python, Snscrape & HarperDB

Summary of What to Expect
  1. Create a HarperDB Account: Sign up on https://harperdb.io/ or sign in at https://studio.harperdb.io/.
  2. Create a HarperDB Cloud Instance: Follow instructions to create a cloud instance for storing and fetching scraped data.
  3. Configure HarperDB Schema and Table: Create a schema (e.g., "data_scraping") and a table (e.g., "tweets") with a hash attribute.
  4. Install Required Packages: Install the HarperDB Python SDK (pip install harperdb) and snscrape (pip install snscrape).
  5. Import Packages: Import necessary packages for Twitter scraping and HarperDB.
  6. Connect to HarperDB Cloud Instance: Connect to the cloud instance using the instance URL, username, and password.
  7. Create Function to Record Scraped Tweets: Define a function to insert scraped data into the "tweets" table.
  8. Scrape Tweets Using snscrape: Use snscrape to scrape tweets based on a search query and save them to the table (a combined sketch of steps 5-8 follows this list).
  9. View the Tweets Table: Access your HarperDB cloud instance to view the scraped data in the "tweets" table.
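
Steps 5 through 8 fit into one short script. The sketch below is a minimal example rather than the article's exact code: the instance URL, credentials, and search query are placeholders, the field names stored per tweet are illustrative, and the tweet attributes used (content, date, user.username) are the commonly used snscrape fields, which can vary between snscrape versions.

    import harperdb
    import snscrape.modules.twitter as sntwitter

    # Step 6: connect to the HarperDB cloud instance.
    # The URL, username, and password are placeholders for your own instance.
    db = harperdb.HarperDB(
        url="https://your-instance.harperdbcloud.com",
        username="your_username",
        password="your_password",
    )

    # Step 7: insert one scraped tweet into the "tweets" table of the
    # "data_scraping" schema created in step 3.
    def record_tweet(tweet):
        db.insert("data_scraping", "tweets", [{
            "tweet_id": tweet.id,          # illustrative field name; the table's hash
                                           # attribute can also be auto-generated by HarperDB
            "date": str(tweet.date),
            "username": tweet.user.username,
            "content": tweet.content,      # attribute name can differ across snscrape versions
        }])

    # Step 8: scrape tweets matching a search query and save each one.
    query = "harperdb since:2022-01-01"    # example query; adjust as needed
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        if i >= 100:                       # cap the number of tweets for this example
            break
        record_tweet(tweet)

After the loop finishes, the scraped rows appear in the "tweets" table, which you can browse in HarperDB Studio as described in step 9.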

Creating Custom Functions with HarperDB (Optional):

  1. Enable Custom Functions: Enable Custom Functions in HarperDB Studio.
  2. Create a Project: Create a project with a specified name, generating necessary files.
  3. Define a Route: Create a route to fetch data from the "tweets" table using SQL.
  4. Access Data via API Endpoint: Send an API request to the defined route to retrieve the data (see the sketch after this list).
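
Once the route is defined, any HTTP client can read from it. The sketch below uses Python's requests library; the Custom Functions URL, project name, and route path are placeholders for the values HarperDB Studio shows for your own project.

    import requests

    # Placeholder Custom Functions URL, project name, and route path; substitute
    # the values displayed for your project in HarperDB Studio.
    url = "https://your-functions-url.harperdbcloud.com/my-project/tweets"

    response = requests.get(url)
    response.raise_for_status()

    # The route is assumed to return the rows of the "tweets" table as JSON.
    for tweet in response.json():
        print(tweet)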