## Sunday, August 11, 2013

### Building OCaml programs in Cloud9 IDE

Cloud9 IDE is one of several new cloud-based products providing the ability to edit, build, test, and deploy code through a collaborative web application. Cloud9 IDE has one especially powerful feature: each workspace (i.e. project or repo) has a Linux home directory persisted along with the code and other settings, and you get a full bash terminal in this directory with modest but usable resource limits. This means it's possible to install and use a full OCaml toolchain inside, much like my previous effort on Travis CI.

I prepared a script to automate the process of installing OCaml and OPAM inside a Cloud9 IDE workspace. Enter into the terminal in any workspace:

    curl -L https://raw.github.com/mlin/c9-ocaml/master/c9-ocaml.sh | bash -ex
    eval $(opam config env)

The OCaml toolchain and OPAM are then ready to go. Here's a screenshot of compiling and running 'Hello, world!':

## Sunday, July 28, 2013

### The human population harbors 172 mutations per non-lethal genome position. What'll happen to them?

A recent Panda's Thumb post highlighted that, given the size of the human genome, the rate of de novo point mutations, and the total size of the population, every non-lethal position can be expected to vary. That is, for every genome position or site, there's very likely at least one person (and usually dozens or more) with a new mutation there, so long as it's non-lethal. It's a trivial calculation and, while we could refine it in various ways, the essential point is clear.

"We are all, regardless of race, genetically 99.9% the same." Right or wrong?
Still, let's try to understand this a bit further. First, an equally simple, entirely compatible fact which might attenuate our surprise: the existence of a couple hundred people with new mutations in a certain site leaves about seven billion without a new mutation there. Indeed, at the vast majority of sites, almost all people are homozygous for the same allele - identical by descent from the hominid lineage.
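
The back-of-the-envelope arithmetic can be sketched in a few lines of Python. The inputs here are round-number assumptions chosen for illustration, not necessarily the ones used in the Panda's Thumb post: a de novo point mutation rate of 1.2e-8 per site per haploid genome per generation, and a population of 7.1 billion diploid people.

```python
# Assumed inputs (illustrative round numbers, not taken from the original post)
MUTATION_RATE = 1.2e-8  # new point mutations per site, per haploid genome, per generation
POPULATION = 7.1e9      # living people
PLOIDY = 2              # each person carries two copies of each autosomal site


def expected_new_mutations_per_site(mu=MUTATION_RATE, n=POPULATION, ploidy=PLOIDY):
    """Expected number of living people carrying a de novo mutation at one given site."""
    return mu * ploidy * n


per_site = expected_new_mutations_per_site()
# Fraction of the population with a new mutation at that site
fraction_carrying = per_site / POPULATION

print(f"Expected carriers of a new mutation per site: {per_site:.0f}")
print(f"Fraction of the population: {fraction_carrying:.1e}")
```

With these assumed inputs the estimate lands in the same range as the figure in the title, while the fraction of people carrying a new mutation at any one site stays tiny, which is the point of the paragraph above.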

In that light, here's a deep question one can ask about all those hundreds of billions of de novo mutations: what will be their ultimate fate? Will they all shuffle through the future human population, making our genome's future evolution look like the reels on a slot machine? Or is it going to be rather more like the pitch drop experiment?

## Sunday, May 26, 2013

### A taste of molecular phylogenetics in Julia

I've been meaning for some time to try out Julia, the up-and-coming scientific computing language/environment that might eventually give R, MATLAB, Mathematica, and SciPy all a run for their money. Julia feels familiar if you've used those systems, but it has many more modern language features, an LLVM back-end that produces performant machine code, and integrated support for parallel and distributed computing. The project incubated from 2009 to 2012 and, with a strong push from applied math groups at MIT, has been gaining steam quickly in the last year.

As far as I could tell via Google, no phylogenetic sequence analysis code has been written in Julia, so this seemed like an interesting place to start. In this post, I'll build up some toy code for commonly-used models of molecular sequence evolution as continuous-time Markov processes on a phylogenetic tree. This will enable a little implementation of Felsenstein's algorithm.
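
To give a flavor of what Felsenstein's algorithm computes, here is a minimal sketch in Python rather than Julia, so it is not the code developed in the post. It assumes the Jukes-Cantor (JC69) substitution model with a uniform base distribution at the root, and it uses a made-up tree encoding: a leaf is a one-letter string, and an internal node is a list of `(child, branch_length)` pairs. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """P(base j at branch end | base i at branch start) under JC69.

    t is the branch length in expected substitutions per site.
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node):
    """Return [L(A), L(C), L(G), L(T)]: the probability of the observed
    leaves below `node`, conditional on each possible base at `node`."""
    if isinstance(node, str):
        # Leaf: the observed base has likelihood 1, all others 0
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * len(BASES)
    for child, t in node:
        child_l = partial_likelihoods(child)
        for i, parent_base in enumerate(BASES):
            # Sum over the unobserved base at the child end of the branch
            result[i] *= sum(
                jc69_prob(parent_base, child_base, t) * child_l[j]
                for j, child_base in enumerate(BASES)
            )
    return result


def site_likelihood(root):
    """Likelihood of one alignment column, with a uniform prior at the root."""
    return sum(0.25 * l for l in partial_likelihoods(root))


# Example: ((human:0.1, chimp:0.1):0.05, mouse:0.4) with observed bases A, A, G
tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("G", 0.4)]
print(site_likelihood(tree))
```

The recursion visits each node once and sums over the four possible bases at every internal node, which is what keeps the computation linear in the number of leaves instead of exponential.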

## Thursday, May 23, 2013

### Working with cross-species genome alignments on the DNAnexus platform

I've recently been resurrecting some comparative genomics methods I developed in my last year of grad school, but never got to publish. These build on previous work to locate what we called Synonymous Constraint Elements (SCEs) in the human genome: short stretches within protein-coding ORFs that also encode additional, overlapping functional elements - evidenced by a markedly reduced apparent rate of synonymous substitutions in cross-species alignments. The first step in this analysis, and the subject of this post, involves extracting the cross-species sequence alignments of protein-coding ORFs from raw, whole-genome alignments. I hope to write a series of blog posts as I get various other parts of the pipeline going. I'm not exactly sure where it'll go from there, but it's pretty neat stuff I would eventually like to get peer-reviewed!
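
As a rough illustration of what "synonymous substitutions" means in an aligned ORF, the sketch below counts codon differences between two aligned, in-frame sequences and classifies each as synonymous (same amino acid) or nonsynonymous. This is only a naive pairwise count under the standard genetic code, written in Python with hypothetical names; it is not the SCE method itself, and it makes no attempt to estimate substitution rates across a phylogeny.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_differences(seq1, seq2):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn = nonsyn = 0
    for k in range(0, len(seq1), 3):
        c1 = seq1[k:k + 3].upper()
        c2 = seq2[k:k + 3].upper()
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # identical codons, or a gap/ambiguity code
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# GCT -> GCC keeps alanine (synonymous); AAA -> AGA changes lysine to arginine
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

A region where even the synonymous count is unusually low across many species is the kind of signal the paragraph above describes, though detecting it properly requires a statistical model rather than raw counts.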

## Sunday, May 19, 2013

### Testing MathJax on Blogger

$$\hat{\mathcal{H}} = \frac{4 N_e u}{1+4 N_e u}$$

I followed these instructions, except that the place where you edit the HTML in the Blogger dashboard has moved: it's now under the "Template" tab, then the "Edit HTML" button.