Mike's Fourth Try

Sunday, March 3, 2013

Why the natural sciences rely on affirming the consequent (and that's OK!)

Last weekend, I discussed a paper by Graur et al., which (among many other criticisms and insults) accused the ENCODE consortium of basing certain interpretations on a fallacy in deductive logic called "affirming the consequent." I pointed out how bizarre that criticism was, because "affirming the consequent" is actually a necessary and justified part of reasoning in the natural sciences.

Many readers seemed to be surprised by and skeptical of this claim, and some probably thought it proof of my insanity. I must, first and foremost, once again beg such skeptics to read Jaynes' outstanding book, Probability Theory: The Logic of Science. The first few chapters are actually available as a free pdf, but the whole book is really worthwhile. If you're an academic, you can probably find the book in your library system.

Understanding, however, that the urging of an apparent madman may not be adequate motivation, I thought I'd try to explain a bit more about why this is actually the case.
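
One way to make the claim concrete is the following sketch in the style of Jaynes' early chapters. It is an illustration under stated assumptions, not a summary of the full argument: the symbols A (a hypothesis) and B (an observation) are illustrative, A is assumed to logically imply B, and both P(A) and P(B) are assumed to be nonzero. Bayes' theorem then gives:

```latex
\begin{align}
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)} \\
            &= \frac{P(A)}{P(B)}
               \qquad \text{since } A \Rightarrow B \text{ gives } P(B \mid A) = 1 \\
            &\ge P(A)
               \qquad \text{since } 0 < P(B) \le 1 .
\end{align}
```

Observing B does not prove A, which is why the step is a fallacy in deductive logic. But it can never make A less plausible, and it makes A strictly more plausible whenever B was not already certain.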

A true story about Big Science

Once, I decided to consult the literature for details about how to perform a certain selection test using PAML. I turned to my officemate Matt and asked if he knew of any papers using it. He suggested three relevant papers, which indeed described details of that test, at least in their supplements. I was an author on two of those papers!

Sunday, February 24, 2013

My thoughts on the immortality of television sets

There's a new GB&E manuscript sensationally blasting a certain widely-reported claim of the 2012 ENCODE Consortium paper, namely that the data generated in that project "enabled us to assign biochemical functions for 80% of the genome." I'm one of 400+ authors on that paper, but I was a bit player - not at all involved in the consortium machinations that resulted in that particular wording, which has proven quite controversial, and has already been discussed/clarified by other authors big and small.

The first author of the new criticism, Dan Graur, is an authority on molecular evolution and authored a popular textbook on that topic (one I own!). The manuscript stridently argues that ENCODE erred in using a definition of "functional element" in the human genome based on certain reproducible biochemical activities, rather than a definition based on natural selection and evolutionary conservation. Interestingly, while the consortium was mostly focused on high-throughput experimental assays to identify the biochemical activities, my modest contributions to ENCODE were entirely based on examining evolutionary evidence, through sequence-level comparative genomics. So, a few comments by a former rogue evolutionary ENCODE-insider:

assert-type: concise runtime type assertions for Node.js

I recently published my first npm package: assert-type, a library to help with writing concise runtime type assertions in Node.js programs.

Background: An OCaml hacker's year with Node.js

The new DNAnexus platform uses Node.js for several back-end components, so I've had to write a fair amount of JavaScript in the year since I joined. Considering I wrote the majority of my grad school code in OCaml, a language found at the opposite end of Steve Yegge's liberal/conservative axis, this has been quite a large adjustment. Indeed, I frequently find myself encountering certain kinds of silly runtime bugs, and writing especially tedious kinds of unit tests, that are both largely obviated in a language like OCaml.

So, I still count myself a hardcore conservative. But there's certainly a lot I've enjoyed about Node.js. When requirements evolve, as they always do, JavaScript and Node's "module system" (those are air quotes) will usually offer quick hacks instead of the careful refactoring that might be demanded by a type-safe language. This incurs technical debt, but a lot of times that's a fine tradeoff, especially at a startup. More generally, Node's rapid code/test/deploy cycle is a lot of fun, without all the build process and binary dependency headaches. The vibrancy of the developer community is amazing, as is the speed at which the runtime itself is improving. (There was a period a few years ago when I feared OCaml was dying out entirely, but there's some real momentum building now.)

Testing OCaml projects on Travis CI

Update (Oct 2013): Anil Madhavapeddy has fleshed this out further.

This evening I spent some time getting unit tests for my OCaml projects to run on Travis CI, a free service for continuous integration on public GitHub projects. Although Travis has no built-in OCaml environment, it's straightforward to hijack its C environment to install OCaml and OPAM, then build an OCaml project and run its tests.

1. Perform the initial setup to get Travis CI watching your GitHub repo (up to and including step two of the Travis CI getting started guide).

2. Add a .travis.yml file to the root of your repo, with these contents:

language: c
script: bash -ex travis-ci.sh

3. Fill in travis-ci.sh, also in the repo root, with something like this:

# OPAM version to install
export OPAM_VERSION=0.9.1
# OPAM packages needed to build tests
export OPAM_PACKAGES='ocamlfind ounit'

# install ocaml from apt
sudo apt-get update -qq
sudo apt-get install -qq ocaml

# install opam
curl -L https://github.com/OCamlPro/opam/archive/${OPAM_VERSION}.tar.gz | tar xz -C /tmp
pushd /tmp/opam-${OPAM_VERSION}
./configure
make
sudo make install
opam init
eval `opam config -env`
popd

# install packages from opam
opam install -q -y ${OPAM_PACKAGES}

# compile & run tests (here assuming OASIS DevFiles)
./configure --enable-tests
make test

4. Add and commit these two new files, and push to GitHub. Travis CI will then execute the tests.

Working examples: ForkWork, yajl-ocaml

Installing OCaml and OPAM adds less than two minutes of overhead, leaving plenty of room for your tests within the stated 15-20 minute time limit for open-source builds. I'm sure the above steps could be used as the basis for an eventual OCaml+OPAM environment built into Travis CI.
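
If the same script is also run somewhere OCaml or OPAM may already be present (a local machine, say), a small guard can skip the redundant install steps. This is a minimal sketch under that assumption and is not part of the script above; the helper name `missing` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original travis-ci.sh):
# succeeds when the named command is NOT found on PATH.
missing() {
  ! command -v "$1" >/dev/null 2>&1
}

# Possible use inside travis-ci.sh: install OCaml from apt only when absent.
# if missing ocaml; then
#   sudo apt-get update -qq
#   sudo apt-get install -qq ocaml
# fi
```

Because the check is used as an `if` condition, it should not trip the `-e` flag that .travis.yml passes to bash.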