Sometimes you’ll find yourself wanting to use a set of data from a website, but the data won’t be available as an API or in a downloadable format like CSV. In these cases, you may have to write a web scraper to download individual web pages and extract the data you want from within their HTML. This guide will teach you the basics of writing a web scraper using Node.js, and will note several of the obstacles you might encounter during web scraping.
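
To make the "extract the data you want from within their HTML" step concrete, here is a minimal sketch that pulls link targets out of an HTML string using only Node.js built-ins. The function name extractLinks and the regular expression are assumptions made for illustration. A regular expression is a deliberate simplification and will miss unusual markup, so a real HTML parser is likely the more robust choice in practice.

```javascript
// Hypothetical helper: collect the href of every <a> tag in an HTML string.
// Relative links are resolved against baseUrl with the built-in URL class.
function extractLinks(html, baseUrl) {
  const links = [];
  // Matches <a ... href="..."> or <a ... href='...'> (quoted values only).
  const pattern = /<a\b[^>]*?\shref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(pattern)) {
    try {
      links.push(new URL(match[1], baseUrl).href);
    } catch {
      // Ignore href values that cannot be parsed as a URL.
    }
  }
  return links;
}
```

For example, calling extractLinks(html, 'https://example.com/page') on a page containing a link to "/about" would yield the absolute address https://example.com/about in the returned array.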

Fetching Websites using Axios

In our examples, we’ll be using Axios to make our HTTP requests. If you’d prefer something else, like Node Fetch to match the Fetch API until it’s ready in Node.js, that’s fine too. The main features of Axios are:

  • Make XMLHttpRequests from the browser
  • Make HTTP requests from Node.js
  • Supports the Promise API
  • Intercept request and response
  • Transform request and response data
  • Cancel requests
  • Automatic transforms for JSON data
  • Client-side support for protecting against XSRF

Install Axios using:

npm install axios
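
As a minimal sketch of the fetching step, the function below downloads a page with Axios and returns its HTML as a string. The name fetchHtml, the 10-second timeout, and the optional client parameter are assumptions made for illustration. The client parameter exists only so the function can be exercised without network access.

```javascript
// Hypothetical helper: download a page and resolve with its HTML text.
// `client` is any object with an Axios-style get(url, config) method;
// when it is omitted, the real axios module is loaded.
async function fetchHtml(url, client) {
  const http = client || require('axios');
  const response = await http.get(url, {
    timeout: 10000,       // give up after 10 seconds (assumed value)
    responseType: 'text', // ask for the body as text
  });
  return response.data;   // Axios exposes the response body as `data`
}
```

By default Axios rejects the returned promise for status codes outside the 2xx range, so a call such as fetchHtml('https://example.com/') is usually wrapped in try/catch.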

SQLite DB

The sqlite3 module is actively maintained and provides a rich set of features:

  • Simple API for query execution
  • Parameter binding support
  • Control over the query execution flow, with both serialized and parallel modes
  • Comprehensive debugging support
  • Full caching / Blob support
  • SQLite extension support
  • Bundles SQLite as a fallback

Install it using:

npm install sqlite3
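
The sketch below shows one way scraped pages could be stored with the sqlite3 module, using its parameter binding through ? placeholders. The pages table, its columns, and the helper names run, savePage, and openDatabase are all hypothetical and are not taken from the project itself.

```javascript
// Hypothetical schema: one row per downloaded page.
const CREATE_TABLE_SQL =
  'CREATE TABLE IF NOT EXISTS pages (id INTEGER PRIMARY KEY, url TEXT, html TEXT)';

// Wrap sqlite3's callback-style db.run() in a Promise.
function run(db, sql, params = []) {
  return new Promise((resolve, reject) => {
    // A regular function (not an arrow function) is required here because
    // sqlite3 sets `this.lastID` and `this.changes` on the callback context.
    db.run(sql, params, function (err) {
      if (err) reject(err);
      else resolve({ lastID: this.lastID, changes: this.changes });
    });
  });
}

// Store one page. The ? placeholders bind the values instead of
// concatenating them into the SQL string.
async function savePage(db, url, html) {
  await run(db, CREATE_TABLE_SQL);
  const result = await run(
    db,
    'INSERT INTO pages (url, html) VALUES (?, ?)',
    [url, html]
  );
  return result.lastID;
}

// Open a database with the real module; ':memory:' keeps it in RAM.
function openDatabase(filename = ':memory:') {
  const sqlite3 = require('sqlite3');
  return new sqlite3.Database(filename);
}
```

With the module installed, a typical use would be to call openDatabase with a file name of your choosing, await savePage(db, url, html) for each downloaded page, and finish with db.close().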

Dockerfile

Build the image from the directory that contains the Dockerfile using:

docker build -t test/nodeweb:v1 .

GitHub

View on GitHub