We're hiring

  • m8jobs at okcupid.com

Contact

  • max at okcupid.com
  • chris at okcupid.com


What is Tame?

Tame (or "TameJs") is an extension to JavaScript, written in JavaScript, that makes event programming easier to write, read, and edit. Tame is very easy to use in Node and other V8 projects. And it can be dropped into projects where desired - no need to rewrite any existing code.

You can jump into Tame on github, but here you'll find some notes on what we learned at OkCupid over the years and why we think Tame is needed for more ambitious projects. (Control-flow libraries are not good enough!) We've written hundreds of thousands of lines of purely async code at OkCupid, and our code has stayed manageable, even after 8 years of development.

Note Tame is not an attempt to dumb down async programming. It's just a cleaner way to write it. Further, your programs will likely have lower latency; with Tame it's a lot easier to keep parallel calls parallel.

A Simple Scenario

Let's say you're running a hot dating site, and a certain user, "Angel", just looked at another user, "Buffy." And here's one tiny piece of your program:

When Angel views Buffy

  • Figure out their match score
  • Request a new, next match for Angel to look at.
  • Record that Angel stalked Buffy, and get back the last time it happened.
  • Send Buffy an email that Angel just looked at her, but only if:
    1. they're a good match, and
    2. they haven't looked at each other recently.

This isn't very complicated logic. In our pre-async minds, our code looks something like this:

handleVisit : function(angel, buffy) {
	var match_score = getScore(angel, buffy);
	var next_match  = getNextMatch(angel);
	var visit_info  = recordVisitAndGetInfo(angel, buffy);
	if (match_score > 0.9 && ! visit_info.last_visit) {
		sendVisitorEmail(angel, buffy);
	}
	doSomeFinalThings(match_score, next_match, visit_info);
}

But of course each of these calls waits on another server, so in an event-driven program each one needs a callback. Our code ends up like this:

handleVisit : function(angel, buffy) {
  getScore(angel, buffy, function(match_score) {
    getNextMatch(angel, function(next_match) {
      recordVisitAndGetInfo(angel, buffy, function(visit_info) {
        if (match_score > 0.9 && ! visit_info.last_visit) {
          sendVisitorEmail(angel, buffy);
        }
        doSomeFinalThings(match_score, next_match, visit_info);
      });
    });
  });
}

There are other ways we could have written it, defining named callback functions, for example. Either way, it's pretty easy to write. But for an outside reader - or you returning to your own code later - it's difficult to follow and far worse to edit or rearrange. And it's just a simple example. In practice, a full async stack means one path through your code has dozens of calls and callbacks littered across all kinds of unnatural functions you were forced to create. Inserting new calls and rearranging are cumbersome.

We learned this about 6 months in at OkCupid. Our web services started out simple and elegant, like the example above, but the more developers added to them, the more absurd our code got. There were some dark days at OkCupid. Once we started integrating more async code: a distributed cache, a pub/sub system, etc., our code got heinous.

(A note for more experienced devs: control-flow libraries helped us fire our code in parallel, but they wouldn't let us throw async calls into the middle of an existing function without hacking that function in half. Later in this page you'll see an example that's horrible with such libraries.)

But back to our example. Worse than ugliness, we've made a programming mistake. All of those calls are made in serial. getNextMatch, getScore, and recordVisitAndGetInfo all contact different servers, so they should be fired in parallel.
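Doing that by hand in plain JavaScript means counting outstanding callbacks yourself. Here is a rough sketch of what that might look like, without Tame or a control-flow library. The three service functions are hypothetical stand-ins that call back from a timer, and the pending counter and the done callback are our own bookkeeping, not part of the original example.

```javascript
// Hypothetical stand-ins for the three network calls; each calls back later.
function getScore(a, b, cb) { setTimeout(function () { cb(0.95); }, 30); }
function getNextMatch(a, cb) { setTimeout(function () { cb("willow"); }, 10); }
function recordVisitAndGetInfo(a, b, cb) {
  setTimeout(function () { cb({ last_visit: null }); }, 20);
}

function handleVisit(angel, buffy, done) {
  var match_score, next_match, visit_info;
  var pending = 3; // number of calls still outstanding

  // Runs after every callback; only the last one to arrive continues.
  function finish() {
    if (--pending > 0) { return; }
    var send_email = match_score > 0.9 && !visit_info.last_visit;
    done({ score: match_score, next: next_match, email: send_email });
  }

  // All three calls are issued before any of them calls back.
  getScore(angel, buffy, function (s) { match_score = s; finish(); });
  getNextMatch(angel, function (n) { next_match = n; finish(); });
  recordVisitAndGetInfo(angel, buffy, function (v) { visit_info = v; finish(); });
}

handleVisit("angel", "buffy", function (result) {
  console.log(result);
});
```

It works, but the counter, the extra variables, and the finish function are all noise around three lines of real logic.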

So...how does Tame solve this?

var res1, res2;
await {
	doOneThing(defer(res1));
	andAnother(defer(res2));
}
thenDoSomethingWith(res1, res2);

As shown above, Tame introduces 2 keywords, await and defer. They are used in tandem.

await marks a section of code that depends on external events, like network or disk activity, or a timer. An await block contains one or more calls to defer. Calling defer constructs a new deferral. Only once all deferrals inside an await block are fulfilled does control continue past it. defer() behaves much like a normal function; it returns an anonymous function that you give to your async functions. These functions "fulfill" their deferrals by calling the callbacks passed to them.

If your callback function is supposed to take arguments, name them as arguments to defer, and they'll be available after the await block completes.

(For language geeks: await is the only new addition to JavaScript language semantics here. It takes the current program continuation and stores it in a hidden JavaScript object. defer is syntactic sugar that calls into the hidden object. If a deferral is the last to be fulfilled in its parent await block, it reactivates the stored program continuation.)
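To make that concrete, here is a small plain-JavaScript model of the hidden object. It is only a sketch of the idea described above: the name Deferrals, its methods, and the stub functions doOneThing and andAnother are our assumptions for illustration, and the code Tame actually generates may differ.

```javascript
// Hypothetical model of the hidden object behind one await block.
function Deferrals(continuation) {
  this.continuation = continuation; // the code that follows the await block
  this.count = 1;                   // 1 = the await block itself is still open
}

// Called once per defer(); returns the callback handed to the async function.
Deferrals.prototype.defer = function (assign) {
  var self = this;
  self.count++;
  return function () {
    if (assign) { assign.apply(null, arguments); } // store callback arguments
    self.fulfill();
  };
};

// The last fulfillment reactivates the stored continuation.
Deferrals.prototype.fulfill = function () {
  if (--this.count === 0) { this.continuation(); }
};

// Hand-written equivalent of:
//   await { doOneThing(defer(res1)); andAnother(defer(res2)); }
//   thenDoSomethingWith(res1, res2);
function doOneThing(cb) { setTimeout(function () { cb("one"); }, 20); }
function andAnother(cb) { setTimeout(function () { cb("two"); }, 10); }

var res1, res2;
var d = new Deferrals(function () {
  console.log(res1, res2); // prints "one two"
});
doOneThing(d.defer(function (v) { res1 = v; }));
andAnother(d.defer(function (v) { res2 = v; }));
d.fulfill(); // marks the end of the await block
```

Starting the count at 1 and fulfilling once at the end of the block is a design choice in this sketch: it keeps the continuation from firing early if an async function happens to call back synchronously.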

Sound confusing? It's really not when you see it in action. Let's show some example code.

for (var i = 0; i < 10; i++) {
	await { setTimeout (defer (), 100); }
	console.log ("Hello world! " + i);
}

The above code prints "Hello world" 10 times, separated by 100ms. The code looks and feels like threaded code, but it uses preexisting async-style functions. To be clear: this is the setTimeout you know and love, unmodified. TameJS works with all existing async code. setTimeout is expecting a function to execute, and that's what defer() is providing.

What happens if we put two setTimeout calls inside one await?

for (var i = 0; i < 10; i++) {
	await { 
		setTimeout (defer (), 10); 
		setTimeout (defer (), 100); 
	}
	console.log ("Hello world! " + i);
}

Tame's beauty is starting to unfold here: 2 timers are fired at once, and after both have returned the loop continues. So every 100ms it prints "Hello world" and the next number.

Moving the await outside the for loop is acceptable. All 20 timers would fire at once, and this code would let out your feelings in approximately 100ms:

var message = "I'm starting to get turned on.";
await { 
	for (var i = 0; i < 10; i++) {
		setTimeout (defer (), 10); 
		setTimeout (defer (), 100); 
	}
}
console.log (message);

In practice your async functions call back with information, and you want defer to collect that information. Let's use Node's dns.resolve function:

var err, ip;
await { 
	dns.resolve (host, "A", defer (err, ip));
}
if (err) { console.log ("ERROR! " + err); } 
else { console.log (host + " -> " + ip); }

Notice that dns.resolve is expecting a third parameter, a function to call with its result. defer provides a function that collects the results into err and ip.

Tame also lets us name those variables inline, for convenience:

await { 
	dns.resolve (host, "A", defer (var err, ip));
}

And finally, here's our first full Node program, a parallel DNS resolver that looks up all its arguments at once.

var dns = require("dns");

function do_one (ev, host) {
    await dns.resolve (host, "A", defer (var err, ip)); 
    if (err) { console.log ("ERROR! " + err); } 
    else { console.log (host + " -> " + ip); }
    ev();
};

function do_all (lst) {
    await {
        for (var i = 0; i < lst.length; i++) {
            do_one (defer (), lst[i]);
        }
    }
};
do_all (process.argv.slice (2));

All the DNS lookups are parallel, and output is fast:

yahoo.com -> 72.30.2.43,98.137.149.56,209.191.122.70
google.com -> 74.125.93.105,74.125.93.99,74.125.93.104
nytimes.com -> 199.239.136.200
okcupid.com -> 66.59.66.6

If you want to do these DNS resolutions in serial (rather than parallel), the change is trivial: just switch the order of the await and for statements:

function do_all (lst) {
    for (var i = 0; i < lst.length; i++) {
        await {
            do_one (defer (), lst[i]);
        }
    }
}

Back to Angel and Buffy

And finally, here's the Angel & Buffy code, Tamed.

handleVisit : function(angel, buffy) {

	//
	// let's fire all 3 at once
	//

	await {
		getScore (angel, buffy, defer(var score));
		getNextMatch (angel, defer(var next));
		recordVisitAndGetInfo (angel, buffy, defer(var vinfo));
	}

	//
	// they've called back, and now we have our data
	//

	if (score > 0.9 && ! vinfo.last_visit) {
		sendVisitorEmail(angel, buffy);
	}
	doSomeFinalThings(score, next, vinfo);
}

The Tame code isn't just easier to read. Remember, it also returns faster, firing all those calls in parallel. And if you want to change it, say by adding another async call or removing one, you won't be ripping functions apart.

A Real Buffy Example: Looping

This example shows off Tame even better, nesting two loops and doing all kinds of real-world matchmaking.

What To Do When Buffy Hunts Men

  • Request 10 Matches
  • For each one:
    • Get their thumbnail URL from our picture server
      • And ask our vampire server if it's a photo of a vamp
    • If it's not a vamp:
      • Get a personality summary from our personality servers
      • Look up the last time they talked
      • And add it to the soulmates array
  • If we don't have at least 10 soulmates, find some more.

Here's the Tamed solution, heavily commented.

huntMen : function(buffy) {

   var soulmates = [];
 
   while (soulmates.length < 10) {
 
      // Get 10 candidates for Buffy  
      await {  
        getMatches(buffy, 10, defer(var userids));
      }
  
      for (var i = 0; i < userids.length; i++) {
        var u = userids[i];
        await {
          // get their pic from our pic server
          getThumbnail  (u, defer(var thumb));
        }
        await {
          // ask our pic analyzer to review
          isPicAVampire(thumb, defer(var is_vamp));
        }
        if (! is_vamp) {
          await {
            // get 2 more pieces of info
            getPersonality(u, defer(var personality));
            getLastTalked (buffy, u, defer(var last_talked));
          }
          soulmates.push({
            "userid" : u,
            "thumb"  : thumb,
            "last_talked" : last_talked,
            "personality" : personality
          });
        }
      }
   }
 
   //
   // Our function can now continue to do
   // whatever it wants with
   // soulmates...
   //

}

Note we're not providing an untamed version of the above for comparison. (We had a hard time writing it.) If you're a Node programmer and up to the challenge: send us a version of it using a control-flow library of your choice.

How To Use Tame

In Node grab it with npm:

npm install -g tamejs

And just register the .tjs extension:

require ('tamejs').register (); // register the *.tjs suffix
require ("./mylib.tjs");        // then use node.js's import as normal

That's it! Tame will take care of the rest, compiling the tjs file into native JS.

Or to use it from the command line:

tamejs -o <outfile> <infile>
node <outfile> # or whatever you want

How Does It Work?

The key idea behind the TameJs implementation is Continuation-Passing Style (CPS) compilation. On our github page we show a bunch of examples of what we actually do to tamed JS to convert it to real JS.
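To give a flavor of CPS, here is a hand-written conversion of the first "Hello world" loop from earlier on this page. Treat it as a sketch of the idea, not as the compiler's actual output: the function names helloLoop and loop, and the count, delay, log, and done parameters, are ours.

```javascript
// Tamed source being modeled:
//   for (var i = 0; i < 10; i++) {
//     await { setTimeout (defer (), 100); }
//     console.log ("Hello world! " + i);
//   }

function helloLoop(count, delay, log, done) {
  var i = 0;

  // Each pass through the loop becomes one call to loop().
  function loop() {
    if (!(i < count)) { return done(); } // loop condition failed: leave the loop
    setTimeout(function () {
      // Everything after the await block lives in this continuation.
      log("Hello world! " + i);
      i++;
      loop(); // start the next iteration
    }, delay);
  }

  loop();
}

helloLoop(10, 100, console.log, function () {
  console.log("finished");
});
```

The loop body has been cut in two at the await: the part before it issues the async call, and the part after it moves into the callback. That cut is exactly the work Tame does for you.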

FAQ

Is it open-source? What license is it?

Yes, MIT. Fork it, dude.

So you guys have been using Tame for years?

Yes, a C++ version of it. (See the paper published at the 2007 USENIX Annual Technical Conference). Using Tame, OkCupid serves externally over 100 million dynamic HTTP requests every day (over 1,000/second on average), each of which fires off calls to all kinds of other services, literally billions of async calls daily. Everything is Tamed, and we'll never look back.

We've been watching the Node community for a while now, and here are our favorite sites/projects: HowToNode, debuggable, and Nodejitsu, and also the framework & middleware Express and Connect. The programmers at those sites have gotten us to turn our interest to Node. But async programming can fail in language scalability, if not performance scalability. JavaScript is missing native support for this kind of control-flow. (It's worth noting C# just added an await primitive! They're onto us.) We have the experience to see what it does to large-scale projects.

Can I use your C++ version of Tame?

Yes, but unlike TameJs, it requires committing to certain other libraries you might not want (sfslite, libasync). TameJs is designed for general use.

What's wrong with a control-flow library? You know, say Step? Or Seq?

First off, we're in the same boat as Tim Caswell, maker of Step, and James Halliday, maker of Seq. We clearly agree that a system like this is needed. If they're like us, they're sick of seeing code like this on github:

MongoDB Blog Example

That poor developer! Good luck changing that code.

As a quick comparison of Tame and Step, let's say you simply want to read a file and log its text in all caps. In Tame it's very simple:

await fs.readFile (__filename, defer (var err, text)); 
if (err) { throw (err); }
console.log (text.toUpperCase ());

Here's the same in Step - with code taken from Step's website. New functions "capitalize" and "showIt" had to be written.

Step(
  function readSelf() {
    fs.readFile(__filename, this);
  },
  function capitalize(err, text) {
    if (err) throw err;
    return text.toUpperCase();
  },
  function showIt(err, newText) {
    if (err) throw err;
    console.log(newText);
  }
);

Also, unlike the Step code, the Tame version can sit comfortably inside another function. If you want to read a file and then use its text in your code, you don't have to split the containing function in half.

Here's a second example from Step's website:

Step(
  // Loads two files in parallel
  function loadStuff() {
    fs.readFile(__filename, this.parallel());
    fs.readFile("/etc/passwd", this.parallel());
  },
  // Show the result when done
  function showStuff(err, code, users) {
    if (err) throw err;
    console.log(code);
    console.log(users);
  }
)

And Tame:

await { 
   fs.readFile (__filename, defer (var e1, code));
   fs.readFile ("/etc/passwd", defer (var e2, users)); 
}
if (e1) throw e1;
if (e2) throw e2;
console.log (code);
console.log (users);

Also of note: Step is making pretty specific assumptions about how errors are passed.

In general, control-flow libraries aren't bad for simple examples (such as our 1st simple Buffy example), but they are not happy with more complicated flow management (such as our 2nd Buffy example).

Have you heard about StratifiedJS?

Yes - it's also an enhancement of JavaScript and seems to use a similar compilation technique, and we think it's more evidence that the Node community needs something on top of JS.

Do exceptions and Tame play well together?

Sort of. Code with try..catch will compile in all cases, and run as expected when not composed with await blocks. But consider the following snippet of traditional JavaScript code:

try {
   setTimeout (function () { throw new Error ("XX"); }, 10);
} catch (e) {
   console.log ("Caught: " + e);
}

The exception thrown after 10 ms will not be caught, because returning to the base event loop rips apart the call stack. Not surprisingly, this tamed version exhibits the same behavior:

try {
   await setTimeout (defer (), 10);
   throw new Error ("XX");
} catch (e) {
   console.log ("Caught: " + e);
}
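
One common way around this, which the dns.resolve example earlier already relies on, is to report failures through the callback's first argument instead of throwing. A minimal plain-JavaScript sketch, where failLater is a hypothetical function invented for illustration:

```javascript
// Hypothetical async function that reports failure through its callback
// instead of throwing after the call stack is gone.
function failLater(cb) {
  setTimeout(function () { cb(new Error("XX")); }, 10);
}

failLater(function (err, result) {
  if (err) {
    console.log("Caught: " + err.message); // prints "Caught: XX"
    return;
  }
  console.log(result);
});
```

In tamed code the same callback would simply be defer (var err, result), with the error checked after the await.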

I'm still not sold. Ok?

Just try it in an existing project. We hope it will grow on you. Fortunately you don't need to commit to Tame for an entire project.

Latest Release / Issues

It's brand new but working well! See us on github.

Are you guys hiring?

Yes! If you're a Node developer and you'd like to help us build scalable and useful websites and web apps, let us know. We're hiring in NYC. To get in contact, email m8jobs at okcupid.com.