Tuesday, February 19, 2013

assert-type: concise runtime type assertions for Node.js

I recently published my first npm package: assert-type, a library to help with writing concise runtime type assertions in Node.js programs.

Background: An OCaml hacker's year with Node.js

The new DNAnexus platform uses Node.js for several back-end components, so I've had to write a fair amount of JavaScript in the year since I joined. Considering I wrote the majority of my grad school code in OCaml, a language found at the opposite end of Steve Yegge's liberal/conservative axis, this has been quite a large adjustment. Indeed, I frequently find myself encountering certain kinds of silly runtime bugs, and writing especially tedious kinds of unit tests, both of which are largely obviated in a language like OCaml.

So, I still count myself a hardcore conservative. But there's certainly a lot I've enjoyed about Node.js. When requirements evolve, as they always do, JavaScript and Node's "module system" (those are air quotes) will usually offer quick hacks instead of the careful refactoring that might be demanded by a type-safe language. This incurs technical debt, but a lot of times that's a fine tradeoff, especially at a startup. More generally, Node's rapid code/test/deploy cycle is a lot of fun, without all the build process and binary dependency headaches. The vibrancy of the developer community is amazing, as is the speed at which the runtime itself is improving. (There was a period a few years ago when I feared OCaml was dying out entirely, but there's some real momentum building now.)



And, sure, pay down debt when you can, write some tedious unit tests, work through the silly runtime bugs, and in most cases you'll get to something that works every bit as well as you need it to - at which point:

What about those other times?

Rely on Node.js for more than the periphery of your infrastructure, though, and you may eventually find yourself writing JavaScript code for something truly, utterly mission-critical. Here, this conservative has occasionally found himself simply unwilling to ship it without ensuring some measure of type safety. I have an idea of how this code works now, and I've written tests for it as such, but what's going to happen when another engineer with a slightly different idea starts using it next month? Crudely speaking, each argument to my function could have six different types, so with n arguments I've got 6^n possible input signatures to account for right there. A lot of the 6^n - 1 cases I didn't have in mind will result in runtime errors, but some fraction is probably going to lead to subtle coercions with surprising results. (My desk may still show the coffee stain from the spit-take I did upon attempting [8,9,10].sort(), which returns [10,8,9].)
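That sort result is not a bug in the runtime: with no comparator, Array.prototype.sort converts the elements to strings and compares them as strings, so "10" lands before "8". A short illustration, using only the standard library:

```javascript
// With no comparator, sort() compares elements as strings,
// so "10" < "8" < "9" and the numbers come out in "string order".
var lexicographic = [8, 9, 10].sort();
console.log(lexicographic); // [ 10, 8, 9 ]

// Passing a numeric comparator gives the ordering you probably wanted.
var numeric = [8, 9, 10].sort(function (a, b) { return a - b; });
console.log(numeric); // [ 8, 9, 10 ]
```

Nothing throws here, which is exactly the problem: the surprising input sails through and only the output is wrong.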

Here I've got to hunker down and painstakingly design my code to reject all unanticipated inputs before starting some irreversible action. So, using a useful assertion library like should.js, I'll end up with preludes everywhere:

var should = require('should');

function f(a, b, c) {
  arguments.length.should.equal(3);
  a.should.be.a('boolean');
  b.should.be.an.instanceOf(Array);
  c.should.have.keys('x', 'y');
  c.x.should.be.a('number');
  c.y.should.be.a('number');

  // ... the actual work
}

That's some literate programming. Of course, since the checks only happen at runtime, I should also write pretty exhaustive tests to make sure I didn't miss any:

[undefined, null, 0, 1, NaN, {}, [], '', 'a'].forEach(function(a) {
  (function() { f(a, [], {x: 0, y: 0}); }).should.throw();
});

[undefined, null, false, true, 0, 1, NaN, {}, '', 'a'].forEach(function(b) {
  (function() { f(false, b, {x: 0, y: 0}); }).should.throw();
});


and so on. Then, of course, come the tests for the actually interesting inputs that at least have the expected types.

Having done all that, I can ship knowing I've put at least some protection in place against unintended behaviors of this code when it's [mis]used in the future. To be clear, I don't think such exhaustive input validation is called for in all JavaScript code - it's an approach you might want for the relatively risky stuff, like verifying passwords or permanently deleting data, where you really want to eliminate undefined behaviors to the greatest extent possible.

Type assertions based on composable predicates

My new library assert-type started out just to make those runtime type-checks a bit more concise and specific. For example, it provides a syntax to assert that x is an integer with a statement like T(ty.int)(x), with a healthy selection of simple type predicates you can plug in. This is similar to functionality that can be found in many other libraries, just with a slightly more concise syntax (and, I'm sure many would point out, less readable).
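One way to read the T(ty.int)(x) form is "turn a predicate into an assertion". A minimal sketch of that idea, assuming a type is nothing more than a function from a value to a boolean. The T and ty names are borrowed from the syntax above, but this is not assert-type's source, and details such as returning the checked value are assumptions of the sketch.

```javascript
// Hypothetical sketch of the T(pred)(x) idea; not assert-type's source.
// A "type" here is just a predicate: value -> boolean.
var ty = {
  bool: function (x) { return typeof x === 'boolean'; },
  num: function (x) { return typeof x === 'number'; },
  int: function (x) { return Number.isInteger(x); },
  str: function (x) { return typeof x === 'string'; }
};

// T turns a predicate into an assertion: it throws a TypeError on a
// mismatch and returns the value unchanged otherwise (an assumption).
function T(pred) {
  return function (x) {
    if (!pred(x)) {
      throw new TypeError('type assertion failed (got ' + typeof x + ')');
    }
    return x;
  };
}

T(ty.int)(42);      // returns 42
// T(ty.int)(1.5) or T(ty.int)('42') would throw a TypeError
```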

Inspired by the ML and Haskell type systems, I then added ways to compose those simple type predicates to define more complex types. (C++ templates and Java generics also compose like this, but things can get ugly if you really use them in the idiomatic ways of the aforementioned languages.) For example, we can assert that x is a non-empty array of non-empty strings:

T(ty.arr.ne.of(ty.str.ne))(x);

Or, assert that pt is a Cartesian point:

T(ty.obj.of({x: ty.num.finite, y: ty.num.finite}))(pt);

Such composite types can themselves be composed, so we could check for a Cartesian or polar point:

T(ty.or(ty.obj.of({x: ty.num.finite, y: ty.num.finite}),
        ty.obj.of({r: ty.num.pos, theta: ty.num.finite})))(pt);

Or a non-empty array thereof:

T(ty.arr.ne.of(ty.or(ty.obj.of({x: ty.num.finite, y: ty.num.finite}),
                     ty.obj.of({r: ty.num.pos, theta: ty.num.finite}))))(pts);

It's not as nice as, you know, a compiler that can infer these types and prove that your entire program complies with them, but with this framework we can compactly declare type constraints that might require tens of lines of procedural code to check.
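The reason such declarations stay compact is that every combinator takes predicates and returns a new predicate, so they nest without limit. A sketch of combinators in the spirit of ty.arr.ne.of, ty.obj.of and ty.or, using hypothetical plain-function names; these are not assert-type's implementation, and requiring an object to have exactly the listed keys is an assumption of this sketch.

```javascript
// Hypothetical combinators; not assert-type's implementation.
function isFiniteNum(x) { return typeof x === 'number' && isFinite(x); }
function isPosNum(x) { return typeof x === 'number' && x > 0; }

// Non-empty array whose every element satisfies elem.
function nonEmptyArrOf(elem) {
  return function (x) {
    return Array.isArray(x) && x.length > 0 &&
           x.every(function (e) { return elem(e); });
  };
}

// Object with exactly the given keys, each satisfying its predicate.
function objOf(fields) {
  var names = Object.keys(fields).sort();
  return function (x) {
    if (x === null || typeof x !== 'object' || Array.isArray(x)) return false;
    if (Object.keys(x).sort().join(',') !== names.join(',')) return false;
    return names.every(function (k) { return fields[k](x[k]); });
  };
}

// Union: satisfied if any alternative is.
function or() {
  var alts = Array.prototype.slice.call(arguments);
  return function (x) {
    return alts.some(function (p) { return p(x); });
  };
}

var cartesian = objOf({x: isFiniteNum, y: isFiniteNum});
var polar = objOf({r: isPosNum, theta: isFiniteNum});
var points = nonEmptyArrOf(or(cartesian, polar));

points([{x: 1, y: 2}, {r: 1, theta: 0}]);  // true
points([]);                                // false
points([{x: 1, y: NaN}]);                  // false
```

Because the result of each combinator is an ordinary predicate, it can be fed to an assertion wrapper like T, or to another combinator, interchangeably.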

Automatically enforcing function signatures

Lastly, I put in a way to declare the type signature of a function and have it checked automatically at runtime. This declares an int*bool->finite function, which will throw an error if it's called with the wrong number of arguments, or if any argument or the return value doesn't match its declared type:

var moveNuclearControlRods = ty.WrapFun([ty.int, ty.bool], ty.num.finite, 
  function(position, scram) {
    ...
    return coreTemperature;
  });
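Mechanically, a wrapper like this only needs to check the arity and each argument before calling through, then check the result afterward. A sketch of that technique under stated assumptions: wrapFun is a hypothetical name modeled on ty.WrapFun (not the library's code), types are plain predicates, and the function body is a toy stand-in for the elided control logic.

```javascript
// Hypothetical signature enforcement in the spirit of ty.WrapFun.
function wrapFun(argPreds, retPred, fn) {
  return function () {
    // Reject calls with the wrong number of arguments.
    if (arguments.length !== argPreds.length) {
      throw new TypeError('expected ' + argPreds.length +
                          ' arguments, got ' + arguments.length);
    }
    // Check each argument against its declared type.
    for (var i = 0; i < argPreds.length; i++) {
      if (!argPreds[i](arguments[i])) {
        throw new TypeError('argument ' + i + ' has the wrong type');
      }
    }
    // Call through, then check the return value.
    var result = fn.apply(this, arguments);
    if (!retPred(result)) {
      throw new TypeError('return value has the wrong type');
    }
    return result;
  };
}

function isInt(x) { return Number.isInteger(x); }
function isBool(x) { return typeof x === 'boolean'; }
function isFiniteNum(x) { return typeof x === 'number' && isFinite(x); }

// Toy stand-in body (hypothetical).
var moveRods = wrapFun([isInt, isBool], isFiniteNum, function (position, scram) {
  return scram ? 0 : position * 1.5;
});

moveRods(4, false);  // returns 6
```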

The type of a function can even be tested and composed, just like any other type. There are also variations of this wrapper for the Node.js asynchronous calling convention (i.e., functions that return their results through callbacks).
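For the callback convention, the result can only be checked when the callback fires. A sketch of one way to do it, assuming a function whose last argument is a Node-style callback(err, result); wrapAsync is a hypothetical name, not part of assert-type's documented API.

```javascript
// Hypothetical async wrapper: fn takes typed arguments plus a trailing
// Node-style callback(err, result).
function wrapAsync(argPreds, resultPred, fn) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    var callback = args.pop();
    if (typeof callback !== 'function') {
      throw new TypeError('last argument must be a callback');
    }
    if (args.length !== argPreds.length) {
      throw new TypeError('expected ' + argPreds.length +
                          ' arguments before the callback');
    }
    // Arguments can be checked synchronously, before any work starts.
    args.forEach(function (arg, i) {
      if (!argPreds[i](arg)) {
        throw new TypeError('argument ' + i + ' has the wrong type');
      }
    });
    // The result is checked when the callback fires; a mismatch is
    // reported through the callback instead of being thrown.
    args.push(function (err, result) {
      if (err) return callback(err);
      if (!resultPred(result)) {
        return callback(new TypeError('result has the wrong type'));
      }
      callback(null, result);
    });
    fn.apply(this, args);
  };
}

// Usage with a toy asynchronous function (hypothetical).
var asyncDouble = wrapAsync(
  [function (x) { return typeof x === 'number'; }],
  function (r) { return typeof r === 'number'; },
  function (x, cb) { setImmediate(function () { cb(null, x * 2); }); });

asyncDouble(21, function (err, r) { console.log(err, r); });  // prints: null 42
```

Reporting a bad result through the callback, rather than throwing, matters here: by the time the callback runs, the original caller's stack is gone, so a throw would surface as an uncaught exception instead of an error the caller can handle.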

Discussion

There are at least a dozen or two JavaScript packages for writing predicates and assertions on the "atomic" language types, including a few mentioned above. Assert-type goes much farther than most of these, in providing ways to formulate composite types and precisely check them at runtime. The most similar existing Node.js package I found was implement.js, which has some similar functionality but isn't quite as expressive. Adt.js is a largely complementary library for embedding abstract data types, worth a strong look if you're working on symbolic data or manipulating mutable state with a lot of internal constraints.


I believe assert-type has a lot of potential to make the conservative's JavaScript life easier, and there are a number of directions I'd like to flesh out to that end, like variadic functions, polymorphic types, unit test generation, and high-precision error messages. It's already pretty useful now, though, and as always I welcome any feedback.