Sunday, August 11, 2013

Building OCaml programs in Cloud9 IDE

Cloud9 IDE is one of several new cloud-based products providing the ability to edit, build, test, and deploy code through a collaborative web application. Cloud9 IDE has one especially powerful feature: each workspace (i.e. project or repo) has a Linux home directory persisted along with the code and other settings, and you get a full bash terminal in this directory with modest but usable resource limits. This means it's possible to install and use a full OCaml toolchain inside, much like my previous effort on Travis CI.

I prepared a script to automate the process of installing OCaml and OPAM inside a Cloud9 IDE workspace. Enter the following into the terminal in any workspace:
curl -L | bash -ex
eval $(opam config env)
The OCaml toolchain and OPAM are then ready to go. Here's a screenshot of compiling and running 'Hello, world!':

Sunday, July 28, 2013

The human population harbors 172 mutations per non-lethal genome position. What'll happen to them?

A recent Panda's Thumb post highlighted that, given the size of the human genome, the rate of de novo point mutations, and the total size of the population, every non-lethal position can be expected to vary - meaning that, for every genome position or site, there's very likely at least one person (and usually dozens or more) with a new mutation there, so long as it's non-lethal. It's a trivial calculation and, while we could refine it in various ways, the essential point is clear.

"We are all, regardless of race,
genetically 99.9% the same."

Right or wrong?
Still, let's try to understand this a bit further. First, an equally simple, entirely compatible fact which might attenuate our surprise: the existence of a couple hundred people with new mutations in a certain site leaves about seven billion without a new mutation there. Indeed, at the vast majority of sites, almost all people are homozygous for the same allele - identical by descent from the hominid lineage.
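For readers who want to see the arithmetic, here is a minimal sketch in Python. The inputs are assumed round figures, not necessarily those used in the Panda's Thumb post: a de novo point mutation rate of 1.2e-8 per site per generation per haploid genome, two genome copies per person, and seven billion people. The function names are mine, for illustration only. With these assumptions the expected count comes out at about 168 new mutations per site, in the same range as the 172 in the title.

```python
import math

# Assumed inputs (illustrative round numbers, not taken from the original post)
MUTATION_RATE = 1.2e-8   # new point mutations per site, per haploid genome, per generation
PLOIDY = 2               # each person carries two copies of the genome
POPULATION = 7.0e9       # living people


def expected_new_mutations_per_site(rate, ploidy, population):
    """Expected number of living people carrying a new mutation at one given site."""
    return rate * ploidy * population


def prob_at_least_one(expected):
    """Chance that at least one person carries a new mutation at the site,
    treating the count as Poisson with the given mean (an approximation)."""
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    carriers = expected_new_mutations_per_site(MUTATION_RATE, PLOIDY, POPULATION)
    print("expected carriers per site:", round(carriers))
    print("P(at least one carrier):", prob_at_least_one(carriers))
    print("fraction of people without one:", 1.0 - carriers / POPULATION)
```

The last line makes the attenuating point concrete: under the same assumptions, all but a few hundredths of a millionth of the population carry no new mutation at any particular site.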

In that light, here's a deep question one can ask about all those hundreds of billions of de novo mutations: what will be their ultimate fate? Will they all shuffle through the future human population, making our genome's future evolution look like the reels on a slot machine? Or is it going to be rather more like the pitch drop experiment?

Sunday, May 26, 2013

A taste of molecular phylogenetics in Julia

I've been meaning for some time to try out Julia, the up-and-coming scientific computing language/environment that might eventually give R, MATLAB, Mathematica, and SciPy all a run for their money. Julia feels familiar if you've used those systems, but it has a lot of more modern language features, an LLVM back-end that produces performant machine code, and integrated support for parallel and distributed computing. The project incubated from 2009 to 2012 and, with a strong push from applied math groups at MIT, has been gaining steam quickly in the last year.

As far as I could tell via Google, no phylogenetic sequence analysis code has been written in Julia, so this seemed like an interesting place to start. In this post, I'll build up some toy code for commonly-used models of molecular sequence evolution as continuous-time Markov processes on a phylogenetic tree. This will enable a little implementation of Felsenstein's algorithm.
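To give a flavor of what such code involves, here is a minimal sketch of Felsenstein's pruning algorithm for a single alignment column. It is written in Python rather than Julia and is not the code from the full post. It assumes the Jukes-Cantor model (equal base frequencies, branch lengths in expected substitutions per site) and a hypothetical tree encoding in which a leaf is a base character and an internal node is a list of (child, branch length) pairs.

```python
import math

BASES = "ACGT"


def jc69(t):
    """Jukes-Cantor transition matrix for a branch of length t
    (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial_likelihoods(node):
    """Likelihood of the leaves below `node`, for each possible base at `node`.

    A leaf is a single base character; an internal node is a list of
    (child, branch_length) pairs. This encoding is an assumption of the sketch.
    """
    if isinstance(node, str):
        return [1.0 if base == node else 0.0 for base in BASES]
    result = [1.0] * 4
    for child, branch_length in node:
        p = jc69(branch_length)
        below = partial_likelihoods(child)
        for x in range(4):
            # Sum over the unknown base y at the child end of the branch
            result[x] *= sum(p[x][y] * below[y] for y in range(4))
    return result


def site_likelihood(root):
    """Probability of the observed column, averaging over the root base
    with the uniform equilibrium frequencies of Jukes-Cantor."""
    return sum(0.25 * value for value in partial_likelihoods(root))


if __name__ == "__main__":
    # ((human:0.1, chimp:0.1):0.2, mouse:0.5), one column reading A, A, G
    tree = [([("A", 0.1), ("A", 0.1)], 0.2), ("G", 0.5)]
    print(site_likelihood(tree))
```

The recursion visits each node once, which is what makes the calculation tractable compared with summing over every assignment of bases to internal nodes.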

Thursday, May 23, 2013

Working with cross-species genome alignments on the DNAnexus platform

I've recently been resurrecting some comparative genomics methods I developed in my last year of grad school, but never got to publish. These build on previous work to locate what we called Synonymous Constraint Elements (SCEs) in the human genome: short stretches within protein-coding ORFs that also encode additional, overlapping functional elements - evidenced by a markedly reduced apparent rate of synonymous substitutions in cross-species alignments. The first step in this analysis, and the subject of this post, involves extracting the cross-species sequence alignments of protein-coding ORFs from raw, whole-genome alignments. I hope to write a series of blog posts as I get various other parts of the pipeline going. I'm not exactly sure where it'll go from there, but it's pretty neat stuff I would eventually like to get peer-reviewed!

Sunday, May 19, 2013

Testing MathJax on Blogger

$$\hat{\mathcal{H}} = \frac{4 N_e u}{1+4 N_e u}$$

Followed these instructions, except the place where you can edit the HTML in the Blogger dashboard has moved: it's now under the "Template" tab, then the "Edit HTML" button.

Saturday, May 4, 2013

Lamenting the rise of the bio-brogrammers

A couple years ago the "brogramming" meme spread through Silicon Valley and the broader tech world. One manifestation would feature a hipster-filtered photo of the eponymous brogrammer in Wayfarer sunglasses, popped collar, drink or dumbbell in hand, tapping on his laptop in some incongruous setting. A former colleague of mine featured herself in some of the funniest ones I can remember. Less-tasteful versions included scantily clad women in supporting roles.

The meme started innocently enough, of course: as an expression that the ranks of software engineers aren't just populated by pale, odorous introverts with their glasses held together by tape. But then we thought about the meme and the attitudes that underlie it.

Today, it's de rigueur for tech companies to openly denounce brogramming and its sexist, exclusionary undertones. That's not only the right thing to do but also a matter of survival: it's simply too hard to find top talent, and we cannot risk alienating wide swaths of the pool. The fall of brogramming was, of course, just one step in an ongoing journey, which continues to hit bumps in the road.


Over the last few days, we saw echoes of this attitude in the genomics and bioinformatics community, when a Twitter meme erupted featuring maternal insults phrased in terms of genome size. Most of these were hilarious and innocent. Some were clever but also sexist and degrading - conflating C-value with bra cups, giggling at those TATAs, or interpreting the .bed and .bam file formats as verbs, to call out some unfortunate examples. A few were just horrible.

Thursday, May 2, 2013

Bagram 747 crash video & speculation

An incredible, horrifying video has emerged, apparently showing the 747 cargo plane crash at Bagram Airfield in Afghanistan this past Monday.

Early speculation (and it is speculation) has focused on a shift of the cargo inside the plane, which was reportedly carrying several military vehicles. How could some cargo shifting around possibly cause such an utter catastrophe?

Consider this simple model of how an airplane stays under longitudinal (pitch) control:

Sunday, March 3, 2013

Why the natural sciences rely on affirming the consequent (and that's OK!)

Last weekend, I discussed a paper by Graur et al., which (among many other criticisms and insults) accused the ENCODE consortium of basing certain interpretations on a fallacy in deductive logic called "affirming the consequent." I pointed out how bizarre that criticism was, because "affirming the consequent" is actually a necessary and justified part of reasoning in the natural sciences.

Many readers seemed to be surprised by and skeptical of this claim, and some probably thought it proof of my insanity. I must, first and foremost, once again beg such skeptics to read Jaynes' outstanding book. The first few chapters are actually available as a free PDF, but the whole book is really worthwhile. If you're an academic, you can probably find the book in your library system.

Understanding, however, that the urging of an apparent madman may not be adequate motivation, I thought I'd try to explain a bit more why this is, actually, the case.
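One way to make the claim concrete, in the spirit of Jaynes' treatment of plausible reasoning, is a small Bayes' rule calculation. The sketch below is in Python; the function name and the numbers are mine, chosen only for illustration. If hypothesis A implies observation B, then P(B | A) = 1, and observing B can never make A less plausible. How much more plausible it becomes depends on how surprising B would be if A were false.

```python
def posterior_given_consequent(prior_a, prob_b_given_not_a):
    """P(A | B) by Bayes' rule when A implies B, i.e. P(B | A) = 1.

    prior_a             -- P(A) before seeing B
    prob_b_given_not_a  -- P(B | not A): how likely B is if A is false
    """
    prob_b = prior_a * 1.0 + (1.0 - prior_a) * prob_b_given_not_a
    return prior_a / prob_b


if __name__ == "__main__":
    prior = 0.1
    for alt in (1.0, 0.5, 0.2, 0.01):
        post = posterior_given_consequent(prior, alt)
        print("P(B | not A) =", alt, " P(A | B) =", round(post, 3))
```

Deductively, "A implies B; B; therefore A" is invalid, and the calculation agrees: the posterior reaches 1 only if B is impossible without A. But whenever B is less than certain in the absence of A, seeing B raises the plausibility of A, which is the kind of inference an experimental confirmation relies on.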

Wednesday, February 27, 2013

A true story about Big Science

Once, I decided to consult the literature for details about how to perform a certain selection test using PAML. I turned to my officemate Matt, and asked if he knew of any papers using it. He suggested three relevant papers, which indeed described details of that test, at least in their supplements. I was an author on two of those papers!

Sunday, February 24, 2013

My thoughts on the immortality of television sets

There's a new GB&E manuscript sensationally blasting a certain widely-reported claim of the 2012 ENCODE Consortium paper, namely that the data generated in that project "enabled us to assign biochemical functions for 80% of the genome." I'm one of 400+ authors on that paper, but I was a bit player - not at all involved in the consortium machinations that resulted in that particular wording, which has proven quite controversial, and has already been discussed/clarified by other authors big and small.

The first author of the new criticism, Dan Graur, is an authority on molecular evolution and authored a popular textbook on that topic (one I own!). The manuscript stridently argues that ENCODE erred in using a definition of "functional element" in the human genome based on certain reproducible biochemical activities, rather than a definition based on natural selection and evolutionary conservation. Interestingly, while the consortium was mostly focused on high-throughput experimental assays to identify the biochemical activities, my modest contributions to ENCODE were entirely based on examining evolutionary evidence, through sequence-level comparative genomics. So, a few comments by a former rogue evolutionary ENCODE-insider:

Tuesday, February 19, 2013

assert-type: concise runtime type assertions for Node.js

I recently published my first npm package: assert-type, a library to help with writing concise runtime type assertions in Node.js programs.

Background: An OCaml hacker's year with Node.js

The new DNAnexus platform uses Node.js for several back-end components, so I've had to write a fair amount of JavaScript in the year since I joined. Considering I wrote the majority of my grad school code in OCaml, a language found at the opposite end of Steve Yegge's liberal/conservative axis, this has been quite a large adjustment. Indeed, I frequently find myself encountering certain kinds of silly runtime bugs, and writing especially tedious kinds of unit tests, that are both largely obviated in a language like OCaml.

So, I still count myself a hardcore conservative. But there's certainly a lot I've enjoyed about Node.js. When requirements evolve, as they always do, JavaScript and Node's "module system" (those are air quotes) will usually offer quick hacks instead of the careful refactoring that might be demanded by a type-safe language. This incurs technical debt, but a lot of times that's a fine tradeoff, especially at a startup. More generally, Node's rapid code/test/deploy cycle is a lot of fun, without all the build process and binary dependency headaches. The vibrancy of the developer community is amazing, as is the speed at which the runtime itself is improving. (There was a period a few years ago when I feared OCaml was dying out entirely, but there's some real momentum building now.)

Sunday, February 10, 2013

Testing OCaml projects on Travis CI

Update (Oct 2013): Anil Madhavapeddy has fleshed this out further.

This evening I spent some time getting unit tests for my OCaml projects to run on Travis CI, a free service for continuous integration on public GitHub projects. Although Travis has no built-in OCaml environment, it's straightforward to hijack its C environment to install OCaml and OPAM, then build an OCaml project and run its tests.

1. Perform the initial setup to get Travis CI watching your GitHub repo (up to and including step two of that guide).

2. Add a .travis.yml file to the root of your repo, with these contents:

language: c
script: bash -ex

3. Fill in the shell script that .travis.yml invokes, also in the repo root, with something like this:

# OPAM version to install
export OPAM_VERSION=0.9.1
# OPAM packages needed to build tests
export OPAM_PACKAGES='ocamlfind ounit'

# install ocaml from apt
sudo apt-get update -qq
sudo apt-get install -qq ocaml

# install opam
curl -L${OPAM_VERSION}.tar.gz | tar xz -C /tmp
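pushd /tmp/opam-${OPAM_VERSION}
# build opam from source before installing it
./configure
make
sudo make install
# return to the repo root so the project's own ./configure runs below
popd
opam init
eval `opam config -env`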

# install packages from opam
opam install -q -y ${OPAM_PACKAGES}

# compile & run tests (here assuming OASIS DevFiles)
./configure --enable-tests
make test

4. Add and commit these two new files, and push to GitHub. Travis CI will then execute the tests.

Working examples: ForkWork, yajl-ocaml

Installing OCaml and OPAM adds less than two minutes of overhead, leaving plenty of room for your tests within the stated 15-20 minute time limit for open-source builds. I'm sure the above steps could be used as the basis for an eventual OCaml+OPAM environment built into Travis CI.

Sunday, February 3, 2013

Apartment hunting in Mountain View

Welcome to my fourth try at blogging; the last fell stagnant in 2006. In fact, I meant to start this one a year ago when I first moved here to Silicon Valley, but, better late than never...

I recently sent a friend some advice on apartment hunting in Mountain View, shortly after completing my second housing search here. First things first: it's not cheap. As of this writing, a decent 1br in this area will run $1500-$2000/mo, and a 2br will go for $2000-$2500. It's a bit more expensive in nearby Palo Alto, and a helluva lot worse in San Francisco!

A lot of high-density apartment housing was developed in Mountain View in the '60s and '70s, presumably during the initial rise of Silicon Valley. As a result, most of the stock is of that vintage. Units get renovated from time to time, of course, but this only helps so much. Some warning signs to look for in an apparently nice unit: ungrounded (two-prong) electrical outlets, gravity wall heaters that make a lot of noise warming up and cooling down, lack of a kitchen exhaust fan, several layered coats of paint (usually detectable around doorjambs and windowsills), superficial bathroom renovations involving acrylic slapped on the existing tile and tub, and adjacency to noise sources such as Caltrain/Central Expwy and freeways (101/85).