Libraries and Lazy Coders - Using npm to Create and Use Node Packages
TIME LIMIT: 4 hours.
NOTE: The material in this chapter relies partly on information and instructions in the previous chapter.
Productive software engineers do not reinvent wheels. They say “Huh, I wonder if anybody invented a software library where I can import a WheelMaking function or class and then make instances of that wheel for my own software?” Although it’s important to have an idea of what a library’s underlying code is doing, there is a great deal to be said for this sort of “black box” functionality, wherein you just import a module of code and then use its API in your custom code. In this chapter, we will begin using libraries in the form of Node.js packages. We will also begin organizing our own code as Node.js packages.
We will recreate last chapter’s HTTP server and data processing functionality, but we will reorganize the code so these two functionalities are split into two separate applications. Each of these applications will be its own Node.js package - named api and etl, respectively (ETL = extract-transform-load). In the process, we will explore packages and package managers.
So, first, what is a package?
Supercharge Your Workflow With Packages and Package Managers
Software libraries like Bootstrap and jQuery have tremendously enabled us up to this point, but we have not been importing them in a very robust way. We have been manually copying and pasting HTML tags with links to specific URLs where we say that code lives.
This presents a host of potential problems. Some problems are external: What if the URL location of that code changes (which will surely necessitate an update to this book)? What if the URL remains active, but is now serving something besides the software library we really wanted? What if the library vendors released a new version and that’s what’s at the URL now? Problems could also be created internally: What if I make a mistake the next time I am copying and pasting a script tag and I use different versions of a library on different pages?
These sorts of problems can be largely solved by using libraries as packages. A package is some coherent group of software files (like a library), coupled with some metadata about what those files are. Common metadata includes things like the software’s name, the version number of the specific files, who wrote the software (the vendor), etc. A package’s metadata also often includes a list of its own dependencies - other packages that the package relies upon. This metadata is stored in a specially formatted file, which can generally be called a manifest file. In Node.js, package.json is the name of the manifest file. Having a package.json file in the root of a directory indicates that that directory is a Node.js package.
So that’s all a package is. A bunch of code files and an additional manifest file describing stuff about those code files. Below, we will explain how you can create your own manifest files with the npm init command.
Where Do These Package Management Systems Come From?
Once a programming ecosystem gets large enough, people start looking for easier ways to distribute, download, and manage third-party software. Eventually, some members begin creating package management systems. Package management systems set standards for how manifest files should be structured. They also often establish package registries, which are centralized organizations where package vendors register their packages and conform to a certain set of standards. Vendors release versions of their packages through the registries. Package consumers (like you!) can thus be assured some level of stability in the code upon which they are building their systems.
JavaScript has the largest package registry in the world. It’s called NPM.
NPM the registry
If you go to npmjs.com, you can see the online home of the official Node.js software registry. As of 2018, the documentation is fairly informative. You can search for packages there. I recommend browsing and searching the site for things like HTTP, authentication, and Bootstrap. Spend a few minutes getting to know the site.
npm the command-line tool
We installed the npm command in /usr/local/bin in the last chapter. npm is the command-line tool for Node.js package management, and it comes with dozens of npm <command> commands to help you. You can run npm help to begin orienting yourself and see a list of available commands.
Coding skills are best learned by example, which is why this chapter has you create two different Node.js packages of your own. Before we dive into the CityBucket scenario, though, let’s quickly review two commands - npm init and npm install.
Initializing Your Own Node.js Packages
To initialize a Node.js project from scratch on the command line:
mkdir web-app
cd web-app
npm init
That npm init command initializes a node package, asking you to declare certain metadata about your project, including project name, keywords (for search, in the event that you publish the package to the NPM registry), repository URL for your code, project homepage, etc. When in doubt, just press enter, and npm will enter a default value for each piece of metadata. Since you can freely edit the manifest file npm init creates, what we enter there doesn’t matter.
When you are finished answering questions, npm init creates package.json. You can go see it in the root of your project. This is where all of the Node Package Registry-related metadata lives. If you’re following along on the command line, open this in your text editor (or run cat package.json on the command line) and take a look.
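If you accepted the default answers, the generated manifest should look roughly like this (your name field will reflect your directory name, and the exact defaults vary a bit by npm version - this is just a sketch):

{
  "name": "web-app",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC"
}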
Installing Third-Party Packages
To install a specific version of a package and save it for future use in your package.json, you would run the npm install command with the --save option:
npm install jquery@3.3.1 --save
If you go look at your package.json, you will see that a new field named dependencies has been nested at the top level. The key-value pairs of dependencies are the names of packages (and their version numbers) that you are saying your package must have in order to work correctly.
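After the install above, the new field should look something like this (npm’s default save prefix is a caret, so it will likely record the version as ^3.3.1, meaning “3.3.1 or any compatible later 3.x release”):

"dependencies": {
  "jquery": "^3.3.1"
}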
Say you were to version control this project and push it to GitHub. If anyone were to clone it, they could cd into its root and run the command
npm install
with no options or parameters, and npm would go ahead and install version 3.3.1 of jQuery (and whatever the hell else other packages you decided to add). This explicit declaration of package dependencies makes software much more organized and easier to work with. The rise of package management systems and package registries is a tremendously important development in the software field.
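That whole flow, sketched with a hypothetical repository URL, is just three commands:

git clone https://github.com/some-user/web-app.git
cd web-app
npm install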
Let’s make some packages!
CityBucket Moves To Service-Oriented Architecture
It’s Monday. The mood around the office is slightly tense in the wake of the Backend Engineer leaving. But you’re not worried – you cranked out that backend work last week. Hell, you got hired as a frontend person – if they have you doing backend, don’t you probably deserve a raise, you’re wondering…
The door opens. The CTO and Lead Engineer walk in.
CTO: Hey, how are you today? Saw that little Node processing pipeline you put together last week. Good stuff.
Lead Engineer: Did you learn anything good from your research spike?
You: Pretty standard parsing and HTTP server work. Though I will say that I think the code is kind of awkward because we didn’t use any frameworks or libraries. The parsing code I wrote only works in a pretty specific instance. The HTTP server was kind of a tangled mess of callbacks and fs code. We should probably use npm packages to greater effect.
CTO: I agree completely.
Lead Engineer: We are going to completely rewrite all of the functionality we developed in the CSV-to-JSON processing script and the HTTP server script. This will be a new project. You can name the repo backend_services, because it’s going to contain all of our non-client software from here on out.
CTO: Yes, we are gravitating towards a service-oriented architecture. This means that we are going to start dividing up our cohesive units of functionality into their own separate applications. If you think about it, your processing and serving scripts really had nothing to do with each other. One was converting data. The other was serving data over HTTP connections. The only thing they had in common was the data. Aside from the fact they’re both JavaScript, it didn’t make sense to have the parsing and server scripts in the same scripts folder.
You: Yes. There were basically three layers to last chapter’s system. First, a storage layer, or database, where we kept data. Second, an ETL layer – we extracted data from the CSV, transformed it with JavaScript into JSON, then loaded it into a JSON file. And third, we had an API layer with an /api/reports endpoint.
CTO: Exactly. And now let’s break these up. In the backend_services repo, let’s begin with these three components.
[Diagram: database, app1, app2]
CTO: So, to put a fine point on it, there will be three directories in your repository to begin with. First, you will have a database folder. This will be the exact same as the data folder in the previous chapter. It will contain raw and processed sub-folders. We are just calling it database instead because…
You: Because 1) it is a database, and 2) we will probably eventually replace it with a more robust storage layer, e.g. a SQL or NoSQL solution, and definitely not just a bunch of folders we are version controlling…
Lead Engineer: Yes. It fits the mental model of where we should take that folder in the future.
CTO: Sounds like you’ve got it.
CTO: The other two folders should be named etl (for extract-transform-load) and api. While a database folder is just a database folder, api and etl are services. Each one of them should be a node package. Does that make sense to you?
You: Yes. When I start this project and create the three folders, I should cd into the etl and api folders and separately run the npm init command in each one. The ETL and API folders should each have their own package.json manifest files. Each separately declares its own dependencies, etc. Any future interaction between the two should be done with module.exports and require statements, i.e. APIs they expose to one another.
CTO: Yes, that is the general idea. In this instance, though, the ETL and API services don’t need to talk to each other. If they did in the future, that would be how we did it. This is how service-oriented architecture works.
The ETL Service
Lead Engineer: Let’s talk about the etl service/package/application. First off, we made a slight change in the raw CSV specifications. From now on, our clients are going to be sending us those report CSVs with headers on them. The data will be in the same format, but there will now be an informational line at the top. Here is the healthcare.csv file we want you to use as dummy data. It belongs in the database/raw folder:
label, value
vaccinations, 87%
dental visits, 33412
eyeglasses issued, 15321
Lead Engineer: See that header info at the top? It tells you what each of the comma-delimited values below it stands for. This makes it much easier to convert to JSON. If you install the Node package, CSVTOJSON, the documentation should tell you how to easily convert this to JSON. Just install and save it to the package.json with npm install csvtojson@2.0.0 --save.
Lead Engineer: Once you’ve got the software installed, just create an index.js file at the application’s root. Use CSVTOJSON and the fs module to do the ETL work. I’ll be surprised if it’s more than 10 or so lines of code. Sound good?
You: I think so – let’s make sure we are on the same page. Is this a relatively accurate diagram of what you’re thinking, structure-wise, for the database and etl folders?
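backend_services/
├── database/
│   ├── raw/
│   │   └── healthcare.csv
│   └── processed/
│       └── healthcare.json   (generated by the ETL service)
└── etl/
    ├── package.json
    ├── index.js
    └── node_modules/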
Lead Engineer: Perfect, yes. The ETL service reads CSVs from the database, transforms the data into JSON, then loads that JSON back into the database.
You: Very straightforward.
The API Service
CTO: OK, now onto the API service/package/application/server.
Lead Engineer: Same setup. cd into the api directory and run npm init to make that directory a Node package.
Lead Engineer: For our server software, we are going to dial up from a library all the way to a framework, the ExpressJS server framework.
You: Which version?
Lead Engineer: npm install express@4 should do.
Lead Engineer: Still just one API endpoint, but I want this one to accept a reportId path variable. It’s easy to register endpoints (or routes) on the Express object; the pseudocode looks something like this:
app.get(
  '/api/reports/:REPORT_ID',
  function (requestObject, responseObject) {
    // --> get file at /database/processed/REPORT_ID.json
    // --> send that file in the response object
  });
Lead Engineer: See the colon in the registered route? That’s how we indicate the path variable. So, using the raw/healthcare.csv dummy data we have above… once you’ve run the ETL script, processed/healthcare.json should be available at http://localhost:4444/api/reports/healthcare.
You: Got it. I understand all of that. Port 4444, though?
Lead Engineer: Ah, yes. Let’s talk about how we’re going to structure the API service. I anticipate that the CityBucket API service will ultimately be deployed in a variety of different ways, so I want to keep the app both flexible and secure. We’re going to use environmental variables to help us configure the Express application.
Lead Engineer: Aside from the node_modules folder, the api package contains four files:
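api/
├── package.json
├── app.js
├── index.js
└── .env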
Lead Engineer: File 1, package.json, is our manifest file.
Lead Engineer: File 2, app.js, is where we instantiate our Express app (const app = express()) and register the API route. However, you do not actually run the server there. Once you have that Express app and route together, use Node’s module.exports object. Set module.exports = app.
Lead Engineer: File 3, index.js, is where the action happens. It’s only four lines of code and looks like this:
require('dotenv').config();
const app = require('./app.js');
app.listen(process.env.API_PORT);
console.log("API listening at http://localhost:" + process.env.API_PORT);
You: What’s going on with that require('dotenv')?
Lead Engineer: Ah yes, that leads us to File 4, .env. This is a file where we store environmental variables (remember the 12 Factor App?). We are just going to start with one. Our .env file is just one line:
API_PORT=4444
You: OK, I get it. We create the Express app and API endpoint in app.js and then export the app via module.exports. We import that app into index.js. We then run the app in that file, configuring it to listen on port 4444. I am assuming that dotenv is a Node package that attaches variables in a .env file to Node’s global process object?
Lead Engineer: Yes. Node’s globals are so helpful. __dirname for the current directory name, process for information about the Node process running on the computer and its environment, console for logging info… And of course: module.exports and require for module exporting and importing, respectively. You should make use of these in your Node code.
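A quick sketch of those globals in action (the values in the comments are illustrative, not guaranteed):

// globals-demo.js - run with: node globals-demo.js
console.log(__dirname);            // absolute path of this script's directory
console.log(process.env.API_PORT); // an environmental variable (undefined unless set)
console.log(process.pid);          // the ID of the running Node process
module.exports = { demo: true };   // what require('./globals-demo.js') would return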
Wrapping Up
Lead Engineer: And as a matter of clean-up work, please add .gitignore files to each of the Node packages you make. We don’t need to be committing the node_modules folder or debug logs.
You: Don’t keep our own copies of node_modules in version control… That makes sense, because it’s the package management system’s job to keep track of package versions.
CTO: Now go forth and spawn services.
Make a Plan
The CTO and Lead Engineer did such a thorough job explaining that it’s easy to make our plan!
We need to make a backend_services git repository and then put three folders in it - database, etl, and api. These three folders represent the three layers/components of our system. We will tackle each component in that sequential order:
First, we will structure our database directory. We will create raw and processed sub-directories. We will upload the healthcare.csv into raw storage.
Second, we will do the extract-transform-load work in the etl directory. We will use the fs core module and the csvtojson node package to convert database/raw/healthcare.csv into database/processed/healthcare.json.
Third, we create and launch the api service. The service will contain an app.js module with an Express server object that has an /api/reports/:report_id endpoint attached to it. index.js will import that app module, configure it with a port number determined from an environmental variable (using the dotenv package and the .env dotfile), then run that server. If the right API call is made, it will serve up the processed JSON.
Let’s get to work.
Setup
Go set up a git repository. Clone it to your Desktop.
cd ~/Desktop
git clone https://github.com/zero-to-code/backend_services
cd backend_services
Now make your three directories:
mkdir database api etl
Now let’s add the code to each directory.
Define the Database And Give It The Dummy Data
Set up your raw and processed folders.
cd database
mkdir raw processed
touch raw/healthcare.csv
Copy the healthcare.csv data into the appropriate file.
That’s it. You’re done here. Save and commit:
cd ..
git status
git add database
git commit -m "Database defined. healthcare.csv dummy data added."
Building the Extract-Transform-Load Service
ETL Package Setup
Time to make your first node package! From the root level of backend_services:
cd etl
touch index.js .gitignore
npm init
It doesn’t really matter how you answer the npm init questions. Just get through them all, then examine the package.json that was created. Add some dependencies to it:
npm install csvtojson@2.0.0 --save
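If the install succeeded, your manifest’s new dependencies field should look roughly like this (npm’s default save prefix may record the version as a caret range):

"dependencies": {
  "csvtojson": "^2.0.0"
}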
Go make sure you have v2.0.0 of CSVTOJSON in your manifest file’s dependencies field now. If you do, let’s commit and continue:
git add .
git commit -m "ETL node package initialized with initial csvtojson@2.0.0 dependency."
ETL Application Code
Now our ten or so lines of extract-transform-load code in etl/index.js:
const fs = require('fs');
const csv = require('csvtojson');

const csvPath = '../database/raw/healthcare.csv';
const jsonFilePath = '../database/processed/healthcare.json';

// Extract the CSV and transform it into a JSON object, then hand it off.
csv()
  .fromFile(csvPath)
  .then(writeJSON);

// Load: write the JSON object into the processed side of the database.
function writeJSON(jsonObj) {
  fs.writeFile(jsonFilePath, JSON.stringify(jsonObj), function (err) {
    if (err) throw err;
  });
}
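If everything works, database/processed/healthcare.json should contain something close to this (assuming csvtojson’s default trimming of whitespace around headers and values):

[{"label":"vaccinations","value":"87%"},{"label":"dental visits","value":"33412"},{"label":"eyeglasses issued","value":"15321"}]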
Wow! That CSVTOJSON package is amazingly easy to use. You import the csv function, which returns an object with a fromFile method on it. When you run the fromFile method and pass it the path of a CSV file that has a header on it (like healthcare.csv) as an argument, the method reads that CSV file into memory and constructs a JSON object exactly like the one we need. fromFile returns an ES6 promise, so when the conversion is complete, the promise’s then method kicks in. The then method takes a function as an argument, and it calls that function with the JSON object the fromFile method constructed. If this sounds complicated, just read through the code a few times and move on. The long and the short of it is that this JSON object is passed to our custom writeJSON function, which sends the JSON to our database/processed file. This has allowed us to skip writing all the messy and bug-prone parsing logic we had in our previous chapter.
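If the promise chain still feels opaque, here is an equivalent sketch using async/await (assuming Node 8 or later; this is an alternative, not what the chapter asks you to write):

const fs = require('fs');
const csv = require('csvtojson');

async function run() {
  // Extract + transform: read the CSV and convert it to a JSON object.
  const jsonObj = await csv().fromFile('../database/raw/healthcare.csv');
  // Load: write the JSON into the processed side of the database.
  fs.writeFileSync('../database/processed/healthcare.json', JSON.stringify(jsonObj));
}

run();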
Test, Iterate, Commit
Test this parsing/ETL script by running node index.js from within the etl directory. Once you’re generating the right database/processed/healthcare.json file, you’ve made it work. Add the .gitignore file the Lead Engineer requested, and you’ll be good to go! It’s just two lines:
node_modules
npm-debug.log
If you now run git status, you will no longer see the node_modules folder. Git is ignoring it.
Now commit. From the etl folder:
git add .
git commit -m "ETL pipeline constructed. Parsing raw/healthcare.csv to processed/healthcare.json. Not yet generalized for other files."
git push origin master
Building the API Service
Package Setup
Time to make your second Node package! From the root level of backend_services:
cd api
touch app.js index.js .env
npm init
Again, it does not matter how you answer the questions. Now install the ExpressJS and DotEnv packages:
npm install express@4 --save
npm install dotenv --save
Go check your package.json manifest file. Make sure your dependencies look correct. If so, go ahead and save:
git add .
git commit -m "API node package initialized with express@4 and dotenv dependencies."
Application Code
Create and configure your API route in app.js:
const fs = require('fs');
const express = require('express');
const app = express();

// Endpoint for GET /api/reports/:report_id requests.
app.get('/api/reports/:report_id', function (request, response) {
  const reportID = request.params.report_id;
  // Use a template string & Node's built-in `__dirname` variable to
  // create the path to the requested JSON data.
  const path = `${__dirname}/../database/processed/${reportID}.json`;
  const report = JSON.parse(fs.readFileSync(path, 'utf-8'));
  response.json(report);
});
// export application
module.exports = app;
There are two new things here: ExpressJS and module.exports. Express is a minimalistic HTTP server framework, and if you read through the code a few times, you will begin to get a sense for its patterns. In short: You create an instance of an Express application: const app = express(). This app object then lets you register routes on it with methods that mimic HTTP methods - GET, PUT, POST, DELETE become app.get, app.post, etc. These methods all take two arguments - the route and a route handler function. This route handler function is a callback function that will be passed two arguments: a request object and a response object. You can then read information about the request from the request object. You use the response object to configure HTTP headers and/or a body, and to actually send the response. When you don’t configure it much, Express defaults to reasonable headers and status codes.
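To illustrate that pattern, here is a hypothetical extra route - not part of the CityBucket spec - that reads a path variable from the request object and explicitly configures the response:

// Hypothetical illustration only - not part of the CityBucket API.
app.get('/api/hello/:name', function (request, response) {
  // Read information about the request from the request object.
  const name = request.params.name;
  // Configure and send the response: an explicit status code plus a JSON body.
  response.status(200).json({ greeting: 'Hello, ' + name });
});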
But we’re not even running the server yet! We use module.exports to export the app.
Import the app into index.js and prepare to run it:
require('dotenv').config();
const app = require('./app.js');
app.listen(process.env.API_PORT);
console.log("API is now listening at http://localhost:" + process.env.API_PORT);
One more step: add the .env file:
API_PORT=4444
Test, Iterate, Commit
From the api folder, you can now run node index.js and, if it’s working, visit http://localhost:4444/api/reports/healthcare and get the /database/processed/healthcare.json data.
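You can also hit the endpoint from a second terminal with curl:

curl http://localhost:4444/api/reports/healthcare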
When you have it working, add the same .gitignore file to the api package as you did in the etl package, and commit:
git status
git add .
git commit -m "First pass at API service package. Needs error handling, e.g. should send 404s when we request routes not present."
Recap
You have now built your first two packages with NodeJS, implemented a service-oriented architecture, and used third-party Node packages. You are prepared to begin leveraging package management systems as part of your developer workflow. Handily, the knowledge about NPM that you gained here is applicable to other package management systems, as such systems all arise to deal with similar problems.
It is also worth noting that, from top to bottom, we refactored the same system we built in the previous chapter into one with the same functionality but neater code. In particular, we embraced componentization – we broke our top-level functionalities into separate services and packages. Within the API service, we further modularized our code with module.exports and require. This sort of componentization is key to success in web programming and, indeed, in any field which involves finding solutions for managing complexity.
Exercises
- Read Chapter 15 of the Linux Book, “Package Management”.
- If you are on a Macintosh, download the Homebrew package manager at https://brew.sh.
- Go find the npmjs.com package pages for CSVTOJSON, Express, and DotEnv. Peruse the documentation for all three.
- We are not currently doing anything for people who query our API for a route that does not exist. Go implement a 404 route in the api/app module.
- Convert the etl/index.js script to scan the database/raw folder for any CSV files. If it finds any, it should try to convert them to JSON.