Libraries and Lazy Coders - Using npm to Create and Use Node Packages
TIME LIMIT: 4 hours.
NOTE: The material in this chapter relies partly on information and instructions in the previous chapter.
Productive software engineers do not reinvent wheels. They say “Huh, I wonder if anybody invented a software library where I can import a WheelMaking function or class and then make instances of that wheel for my own software?” Although it’s important to have an idea of what a library’s underlying code is doing, there is a great deal to be said for this sort of “black box” functionality, wherein you just import a module of code and then use its API in your custom code. In this chapter, we will begin using libraries in the form of Node.js packages. We will also begin organizing our own code as Node.js packages.
We will recreate last chapter’s HTTP server and data processing functionality, but we will reorganize the code so these two functionalities are split into two separate applications. Each of these applications will be its own Node.js package - named api and etl, respectively (ETL = extract-transform-load). In the process, we will explore packages and package managers.
So, first, what is a package?
Supercharge Your Workflow With Packages and Package Managers
Software libraries like Bootstrap and jQuery have tremendously enabled us up to this point, but we have not been importing them in a very robust way. We have been manually copying and pasting HTML tags with links to specific URLs where we say that code lives.
This presents a host of potential problems. Some problems are external: What if the URL location of that code changes (which will surely necessitate an update to this book)? What if the URL remains active, but is now serving something besides the software library we really wanted? What if the library vendors released a new version and that’s what’s at the URL now? Problems could also be created internally: What if I make a mistake the next time I am copying and pasting a script tag and I use different versions of a library on different pages?
These sorts of problems can be largely solved by using libraries as packages. A package is some coherent group of software files (like a library), coupled with some metadata about what those files are. Common metadata includes things like the software’s name, the version number of the specific files, who wrote the software (the vendor), etc. A package’s metadata also often includes a list of its own dependencies - other packages that the package relies upon. This metadata is stored in a specially formatted file, which can generally be called a manifest file. In Node.js, package.json is the name of the manifest file. Having a package.json file in the root of a directory indicates that that directory is a Node.js package.
So that’s all a package is. A bunch of code files and an additional manifest file describing stuff about those code files. Below, we will explain how you can create your own manifest files with the npm init command.
Where Do These Package Management Systems Come From?
Once a programming ecosystem gets large enough, people start looking for easier ways to distribute, download, and manage third-party software. Eventually, some members begin creating package management systems. Package management systems set standards for how manifest files should be structured. They also often establish package registries, which are centralized organizations where package vendors register their packages and conform to a certain set of standards. Vendors release versions of their packages through the registries. Package consumers (like you!) can thus be assured some level of stability in the code upon which they are building their systems.
JavaScript has the largest package registry in the world. It’s called NPM.
NPM the registry
If you go to npmjs.com, you can see the online home of the official Node.js software registry. As of 2018, the documentation is fairly informative. You can search for packages there. I recommend browsing and searching the site for things like HTTP, authentication, and Bootstrap. Spend a few minutes getting to know the site.
npm the command-line tool
We installed the npm command in /usr/local/bin in the last chapter. npm is the command-line tool for Node.js package management, and it comes with dozens of npm <command> commands to help you. You can run npm help to begin orienting yourself and see a list of available commands.
Coding skills are best learned by example, which is why this chapter has you create two different Node.js packages of your own. Before we dive into the CityBucket scenario, though, let’s quickly review two commands - npm init and npm install.
Initializing Your Own Node.js Packages
To initialize a Node.js project from scratch on the command line:
mkdir web-app
cd web-app
npm init
That npm init command initializes a node package, asking you to declare certain metadata about your project, including project name, keywords (for search, in the event that you publish the package to the NPM registry), repository URL for your code, project homepage, etc. When in doubt, just press enter, and npm will enter a default value for each piece of metadata. Since you can freely edit the manifest file npm init creates, what we enter there doesn’t matter.
When you are finished answering questions, npm init creates package.json. You can go see it in the root of your project. This is where all of the Node Package Registry-related metadata lives. If you’re following along on the command line, open this in your text editor (or run cat package.json on the command line) and take a look.
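If you accepted the default answers, the generated manifest should look roughly like this (your name field will reflect your directory name, and the exact defaults vary a bit by npm version - this is just a sketch):

{
  "name": "web-app",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC"
}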
Installing Third-Party Packages
To install a specific version of a package and save it for future use in your package.json, you would run the npm install command with the --save option:
npm install jquery@3.3.1 --save
If you go look at your package.json, you will see that a new field named dependencies has been nested at the top level. The key-value pairs of dependencies are the names of packages (and their version numbers) that you are saying your package must have in order to work correctly.
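After the install above, the new field should look something like this (npm’s default save prefix is a caret, so it will likely record the version as ^3.3.1, meaning “3.3.1 or any compatible later 3.x release”):

"dependencies": {
  "jquery": "^3.3.1"
}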
Say you were to version control this project and push it to GitHub. If anyone were to clone it, they could cd into its root and run the command
npm install
with no options or parameters, and npm would go ahead and install version 3.3.1 of jQuery (and whatever the hell else other packages you decided to add). This explicit declaration of package dependencies makes software much more organized and easier to work with. The rise of package management systems and package registries is a tremendously important development in the software field.
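That whole flow, sketched with a hypothetical repository URL, is just three commands:

git clone https://github.com/some-user/web-app.git
cd web-app
npm install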
Let’s make some packages!
CityBucket Moves To Service-Oriented Architecture
It’s Monday. The mood around the office is slightly tense in the wake of the Backend Engineer leaving. But you’re not worried – you cranked out that backend work last week. Hell, you got hired as a frontend person – if they have you doing backend, don’t you probably deserve a raise, you’re wondering…
The door opens. The CTO and Lead Engineer walk in.
CTO: Hey, how are you today? Saw that little Node processing pipeline you put together last week. Good stuff.
Lead Engineer: Did you learn anything good from your research spike?
You: Pretty standard parsing and HTTP server work. Though I will say that I think the code is kind of awkward because we didn’t use any frameworks or libraries. The parsing code I wrote only works in a pretty specific instance. The HTTP server was kind of a tangled mess of callbacks and fs code. We should probably use npm packages to greater effect.
CTO: I agree completely.
Lead Engineer: We are going to completely rewrite all of the functionality we developed in the CSV-to-JSON processing script and the HTTP server script. This will be a new project. You can name the repo backend_services, because it’s going to contain all of our non-client software from here on out.
CTO: Yes, we are gravitating towards a service-oriented architecture. This means that we are going to start dividing up our cohesive units of functionality into their own separate applications. If you think about it, your processing and serving scripts really had nothing to do with each other. One was converting data. The other was serving data over HTTP connections. The only thing they had in common was the data. Aside from the fact they’re both JavaScript, it didn’t make sense to have the parsing and server scripts in the same scripts folder.
You: Yes. There were basically three layers to last chapter’s system. First, a storage layer, or database, where we kept data. Second, an ETL layer – we extracted data from the CSV, transformed it with JavaScript into JSON, then loaded it into a JSON file. And third, we had an API layer with an /api/reports endpoint.
CTO: Exactly. And now let’s break these up. In the backend_services repo, let’s begin with these three components.
[Diagram: database, app1, app2]
CTO: So, to put a fine point on it, there will be three directories in your repository to begin with. First, you will have a database folder. This will be the exact same as the data folder in the previous chapter. It will contain raw and processed sub-folders. We are just calling it database instead because…
You: Because 1) it is a database, and 2) we will probably eventually replace it with a more robust storage layer, e.g. a SQL or NoSQL solution, and definitely not just a bunch of folders we are version controlling…
Lead Engineer: Yes. It fits the mental model of where we should take that folder in the future.
CTO: Sounds like you’ve got it.
CTO: The other two folders should be named etl (for extract-transform-load) and api. While a database folder is just a database folder, api and etl are services. Each one of them should be a node package. Does that make sense to you?
You: Yes. When I start this project and create the three folders, I should cd into the etl and api folders and separately run the npm init command in each one. The ETL and API folders should each have their own package.json manifest files. Each separately declares its own dependencies, etc. Any future interaction between the two should be done with module.exports and require statements, i.e. APIs they expose to one another.
CTO: Yes, that is the general idea. In this instance, though, the ETL and API services don’t need to talk to each other. If they did in the future, that would be how we did it. This is how service-oriented architecture works.
The ETL Service
Lead Engineer: Let’s talk about the etl service/package/application. First off, we made a slight change in the raw CSV specifications. From now on, our clients are going to be sending us those report CSVs with headers on them. The data will be in the same format, but there will now be an informational line at the top. Here is the healthcare.csv file we want you to use as dummy data. It belongs in the database/raw folder:
label, value
vaccinations, 87%
dental visits, 33412
eyeglasses issued, 15321
Lead Engineer: See that header info at the top? It tells you what each of the comma-delimited values below it stands for. This makes it much easier to convert to JSON. If you install the Node package, CSVTOJSON, the documentation should tell you how to easily convert this to JSON. Just install and save it to the package.json with npm install csvtojson@2.0.0 --save.
Lead Engineer: Once you’ve got the software installed, just create an index.js file at the application’s root. Use CSVTOJSON and the fs module to do the ETL work. I’ll be surprised if it’s more than 10 or so lines of code. Sound good?
You: I think so – let’s make sure we are on the same page. Is this a relatively accurate diagram of what you’re thinking, structure-wise, for the database and etl folders?
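backend_services/
├── database/
│   ├── raw/
│   │   └── healthcare.csv
│   └── processed/
│       └── healthcare.json   (generated by the ETL service)
└── etl/
    ├── package.json
    ├── index.js
    └── node_modules/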
Lead Engineer: Perfect, yes. The ETL service reads CSVs from the database, transforms the data into JSON, then loads that JSON back into the database.
You: Very straightforward.
The API Service
CTO: OK, now onto the API service/package/application/server.
Lead Engineer: Same setup. cd into the api directory and run npm init to make that directory a Node package.
Lead Engineer: For our server software, we are going to dial up from a library all the way to a framework, the ExpressJS server framework.
You: Which version?
Lead Engineer: npm install express@4 should do.
Lead Engineer: Still just one API endpoint, but I want this one to accept a reportId path variable. It’s easy to register endpoints (or routes) on the Express object; the pseudocode looks something like this:
app.get(
  '/api/reports/:REPORT_ID',
  function (requestObject, responseObject) {
    // --> get file at /database/processed/REPORT_ID.json
    // --> send that file in the response object
  });
Lead Engineer: See the colon in the registered route? That’s how we indicate the path variable. So, using the raw/healthcare.csv dummy data we have above… once you’ve run the ETL script, processed/healthcare.json should be available at http://localhost:4444/api/reports/healthcare.
You: Got it. I understand all of that. Port 4444, though?
Lead Engineer: Ah, yes. Let’s talk about how we’re going to structure the API service. I anticipate that the CityBucket API service will ultimately be deployed in a variety of different ways, so I want to keep the app both flexible and secure. We’re going to use environmental variables to help us configure the Express application.
Lead Engineer: Aside from the node_modules folder, the api package contains four files:
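api/
├── package.json
├── app.js
├── index.js
└── .env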
Lead Engineer: File 1, package.json, is our manifest file.
Lead Engineer: File 2, app.js, is where we instantiate our Express app (const app = express()) and register the API route. However, you do not actually run the server there. Once you have that Express app and route together, use Node’s module.exports object. Set module.exports = app.
Lead Engineer: File 3, index.js, is where the action happens. It’s only four lines of code and looks like this:
require('dotenv').config();
const app = require('./app.js');
app.listen(process.env.API_PORT);
console.log("API listening at http://localhost:" + process.env.API_PORT);
You: What’s going on with that require('dotenv')?
Lead Engineer: Ah yes, that leads us to File 4, .env. This is a file where we store environmental variables (remember the 12 Factor App?). We are just going to start with one. Our .env file is just one line:
API_PORT=4444
You: OK, I get it. We create the Express app and API endpoint in app.js and then export the app via module.exports. We import that app into index.js. We then run the app in that file, configuring it to listen on port 4444. I am assuming that dotenv is a Node package that attaches variables in a .env file to Node’s global process object?
Lead Engineer: Yes. Node’s globals are so helpful. __dirname for the current directory name, process for information about the Node process running on the computer and its environment, console for logging info… And of course: module.exports and require for module exporting and importing, respectively. You should make use of these in your Node code.
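A quick sketch of those globals in action (the values in the comments are illustrative, not guaranteed):

// globals-demo.js - run with: node globals-demo.js
console.log(__dirname);            // absolute path of this script's directory
console.log(process.env.API_PORT); // an environmental variable (undefined unless set)
console.log(process.pid);          // the ID of the running Node process
module.exports = { demo: true };   // what require('./globals-demo.js') would return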
Wrapping Up
Lead Engineer: And as a matter of clean-up work, please add .gitignore files to each of the Node packages you make. We don’t need to be committing the node_modules folder or debug logs.
You: Don’t keep our own copies of node_modules in version control… That makes sense, because it’s the package management system’s job to keep track of package versions.
CTO: Now go forth and spawn services.
Make a Plan
The CTO and Lead Engineer did such a thorough job explaining that it’s easy to make our plan!
We need to make a backend_services git repository and then put three folders in it - database, etl, and api. These three folders represent the three layers/components of our system. We will tackle each component in that sequential order:
First, we will structure our database directory. We will create raw and processed sub-directories. We will upload the healthcare.csv into raw storage.
Second, we will do the extract-transform-load work in the etl directory. We will use the fs core module and the csvtojson node package to convert database/raw/healthcare.csv into database/processed/healthcare.json.
Third, we create and launch the api service. The service will contain an app.js module with an Express server object that has an /api/reports/:report_id endpoint attached to it. index.js will import that app module, configure it with a port number determined from an environmental variable (using the dotenv package and the .env dotfile), then run that server. If the right API call is made, it will serve up the processed JSON.
Let’s get to work.
Setup
Go set up a git repository. Clone it to your Desktop.
cd ~/Desktop
git clone https://github.com/zero-to-code/backend_services
cd backend_services
Now make your three directories:
mkdir database api etl
Now let’s add the code to each directory.
Define the Database And Give It The Dummy Data
Set up your raw and processed folders.
cd database
mkdir raw processed
touch raw/healthcare.csv
Copy the healthcare.csv data into the appropriate file.
That’s it. You’re done here. Save and commit:
cd ..
git status
git add database
git commit -m "Database defined. healthcare.csv dummy data added."
Building the Extract-Transform-Load Service
ETL Package Setup
Time to make your first node package! From the root level of backend_services:
cd etl
touch index.js .gitignore
npm init
It doesn’t really matter how you answer the npm init questions. Just get through them all, then examine the package.json that was created. Add some dependencies to it:
npm install csvtojson@2.0.0 --save
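If the install succeeded, your manifest’s new dependencies field should look roughly like this (npm’s default save prefix may record the version as a caret range):

"dependencies": {
  "csvtojson": "^2.0.0"
}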
Go make sure you have v2.0.0 of CSVTOJSON in your manifest file’s dependencies field now. If you do, let’s commit and continue:
git add .
git commit -m "ETL node package initialized with initial csvtojson@2.0.0 dependency."
ETL Application Code
Now our ten or so lines of extract-transform-load code in etl/index.js:
const fs = require('fs');
const csv = require('csvtojson');

const csvPath = '../database/raw/healthcare.csv';
const jsonFilePath = '../database/processed/healthcare.json';

// Extract the CSV and transform it into a JSON object, then hand it off.
csv()
  .fromFile(csvPath)
  .then(writeJSON);

// Load: write the JSON object into the processed side of the database.
function writeJSON(jsonObj) {
  fs.writeFile(jsonFilePath, JSON.stringify(jsonObj), function (err) {
    if (err) throw err;
  });
}
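If everything works, database/processed/healthcare.json should contain something close to this (assuming csvtojson’s default trimming of whitespace around headers and values):

[{"label":"vaccinations","value":"87%"},{"label":"dental visits","value":"33412"},{"label":"eyeglasses issued","value":"15321"}]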
Wow! That CSVTOJSON package is amazingly easy to use. You import the csv function, which returns an object with a fromFile method on it. When you run the fromFile method and pass it the path of a CSV file that has a header on it (like healthcare.csv) as an argument, the method reads that CSV file into memory and constructs a JSON object exactly like the one we need. fromFile returns an ES6 promise, so when the conversion is complete, the promise’s then method kicks in. The then method takes a function as an argument, and it calls that function with the JSON object the fromFile method constructed. If this sounds complicated, just read through the code a few times and move on. The long and the short of it is that this JSON object is passed to our custom writeJSON function, which sends the JSON to our database/processed file. This has allowed us to skip writing all the messy and bug-prone parsing logic we had in our previous chapter.
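If the promise chain still feels opaque, here is an equivalent sketch using async/await (assuming Node 8 or later; this is an alternative, not what the chapter asks you to write):

const fs = require('fs');
const csv = require('csvtojson');

async function run() {
  // Extract + transform: read the CSV and convert it to a JSON object.
  const jsonObj = await csv().fromFile('../database/raw/healthcare.csv');
  // Load: write the JSON into the processed side of the database.
  fs.writeFileSync('../database/processed/healthcare.json', JSON.stringify(jsonObj));
}

run();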
Test, Iterate, Commit
Test this parsing/ETL script by running node index.js from within the etl directory. Once you’re generating the right database/processed/healthcare.json file, you’ve made it work. Add the .gitignore file the Lead Engineer requested, and you’ll be good to go! It’s just two lines:
node_modules
npm-debug.log
If you now run git status, you will no longer see the node_modules folder. Git is ignoring it.
Now commit. From the etl folder:
git add .
git commit -m "ETL pipeline constructed. Parsing raw/healthcare.csv to processed/healthcare.json. Not yet generalized for other files."
git push origin master
Building the API Service
Package Setup
Time to make your second Node package! From the root level of backend_services:
cd api
touch app.js index.js .env
npm init
Again, it does not matter how you answer the questions. Now install the ExpressJS and DotEnv packages:
npm install express@4 --save
npm install dotenv --save
Go check your package.json manifest file. Make sure your dependencies look correct. If so, go ahead and save:
git add .
git commit -m "API node package initialized with express@4 and dotenv dependencies."
Application Code
Create and configure your API route in app.js:
const fs = require('fs');
const express = require('express');
const app = express();

// Endpoint for GET /api/reports/:report_id requests.
app.get('/api/reports/:report_id', function (request, response) {
  const reportID = request.params.report_id;
  // Use a template string & Node's built-in `__dirname` variable to
  // create the path to the requested JSON data.
  const path = `${__dirname}/../database/processed/${reportID}.json`;
  const report = JSON.parse(fs.readFileSync(path, 'utf-8'));
  response.json(report);
});
// export application
module.exports = app;
There are two new things here: ExpressJS and module.exports. Express is a minimalistic HTTP server framework, and if you read through the code a few times, you will begin to get a sense for its patterns. In short: You create an instance of an Express application: const app = express(). This app object then lets you register routes on it with methods that mimic HTTP methods - GET, PUT, POST, DELETE become app.get, app.post, etc. These methods all take two arguments - the route and a route handler function. This route handler function is a callback function that will be passed two arguments: a request object and a response object. You can then read information about the request from the request object. You use the response object to configure HTTP headers and/or a body, and to actually send the response. When you don’t configure it much, Express defaults to reasonable headers and status codes.
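To illustrate that pattern, here is a hypothetical extra route - not part of the CityBucket spec - that reads a path variable from the request object and explicitly configures the response:

// Hypothetical illustration only - not part of the CityBucket API.
app.get('/api/hello/:name', function (request, response) {
  // Read information about the request from the request object.
  const name = request.params.name;
  // Configure and send the response: an explicit status code plus a JSON body.
  response.status(200).json({ greeting: 'Hello, ' + name });
});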
But we’re not even running the server yet! We use module.exports to export the app.
Import the app into index.js and prepare to run it:
require('dotenv').config();
const app = require('./app.js');
app.listen(process.env.API_PORT);
console.log("API is now listening at http://localhost:" + process.env.API_PORT);
One more step: add the .env file:
API_PORT=4444
Test, Iterate, Commit
From the api folder, you can now run node index.js and, if it’s working, visit http://localhost:4444/api/reports/healthcare and get the /database/processed/healthcare.json data.
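You can also hit the endpoint from a second terminal with curl:

curl http://localhost:4444/api/reports/healthcare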
When you have it working, add the same .gitignore file to the api package as you did in the etl package, and commit:
git status
git add .
git commit -m "First pass at API service package. Needs error handling, e.g. should send 404s when we request routes not present."
Recap
You have now built your first two packages with NodeJS, implemented a service-oriented architecture, and used third-party Node packages. You are prepared to begin leveraging package management systems as part of your developer workflow. Handily, the knowledge about NPM that you gained here is applicable to other package management systems, as such systems all arise to deal with similar problems.
It is also worth noting that, from top to bottom, we refactored the same system we built in the previous chapter into one with the same functionality but neater code. In particular, we embraced componentization – we broke our top-level functionalities into separate services and packages. Within the API service, we further modularized our code with module.exports and require. This sort of componentization is key to success in web programming and, indeed, in any field which involves finding solutions for managing complexity.
Exercises
- Read Chapter 15 of the Linux Book, “Package Management”.
- If you are on a Macintosh, download the Homebrew package manager at https://brew.sh.
- Go find the npmjs.com package pages for CSVTOJSON, Express, and DotEnv. Peruse the documentation for all three.
- We are not currently doing anything for people who query our API for a route that does not exist. Go implement a 404 route in the api/app module.
- Convert the etl/index.js script to scan the database/raw folder for any CSV files. If it finds any, it should try to convert them to JSON.