mozregression update

A good while ago I made a regression range finder for Firefox nightlies. It was a Python script intended to make it easier for people to manually find the regression ranges of bugs on the nightlies. The other day, someone actually used it! So I decided to revisit it and fix it up and make it easier to install. There’s more info about it here, but here’s a quick summary:

install with setuptools or pip if you know how to do that, otherwise check out the OS-specific installation instructions

run on the command line:

mozregression --good=2010-03-19 --bad=2010-09-12

Several nightlies will be downloaded and automatically run for you, asking you to check for the existence of the bug in each one, ending up with something like:

Last good nightly: 2010-09-08 First bad nightly: 2010-09-09

Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=36f5cf6b2d42&tochange=8e0fce7d5b49

If you have questions, ask harth on #qa at irc.mozilla.org.


Jetpack: Showing Search Terms in Awesomebar

Note: this is not about searching from the awesomebar.


download addon·code on github

The awesomebar is one of my favorite things about Firefox. Compared to the alternatives in other browsers, it remains the fastest way for me to find where I want to go. Sometimes, however, a site’s url and title aren’t enough to jog my memory. At some point I wished Firefox would do full-text indexing for me. Then I realized that Google was already doing that, and there might be a way to hijack that power: remember what search terms you used to find the site and display them in the awesomebar. You can’t really get better than search terms, because they put the value a site gives in your own words. Of course, you might find the site valuable for other things after viewing it, and then you have no real choice but to manually tag it (or do you?).

I made a quick extension with Jetpack to do just that, implemented in the simplest possible way. It parses the referrer for ‘q=’, and appends that to the page’s user-set title (which is matched on in the awesomebar). It can be very helpful, but it was less helpful than I thought. It would be more helpful if the referrer persisted across link navigation, the referrer didn’t terminate after #, and more sites used ‘q’ to hold the query, but it was an interesting experiment, and I’d love to see more experiments in Firefox inferring tags for webpages.
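The referrer-parsing idea can be sketched roughly as follows. This is not the extension's actual code: it assumes the standard URL API rather than anything Jetpack-specific, and the function names searchTermsFromReferrer and titleWithTerms are made up for illustration.

```javascript
// Pull the search terms out of a referrer URL by looking for a 'q' parameter.
// Returns null when there is no referrer, it can't be parsed, or 'q' is absent.
function searchTermsFromReferrer(referrer) {
  if (!referrer) return null;
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // not a valid absolute URL
  }
  // searchParams only sees the query string, so a 'q' that lives after '#'
  // is invisible here, which is one of the limitations mentioned above.
  const q = url.searchParams.get("q");
  return q ? q.trim() : null;
}

// Append the terms to a page title so they can be matched on later.
function titleWithTerms(title, referrer) {
  const terms = searchTermsFromReferrer(referrer);
  return terms ? title + " (" + terms + ")" : title;
}

console.log(titleWithTerms(
  "mozregression",
  "http://www.google.com/search?q=firefox+regression+range&hl=en"
));
```

Sites that keep the query in a parameter other than 'q' simply return null here, which matches the "less helpful than I thought" experience.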

Jetpack impressions

I enjoyed using the new Jetpack SDK, I’m a big fan. It took more than a couple minutes to get started but the docs were excellent and I love the CommonJS. I had to bust out the Cc but that’s just because the Places API isn’t quite ready yet. I only had to write 42 lines of code. I can confidently say I won’t be making a ‘regular’ addon again.


Javascript Neural Networks

There are tons of neural network implementations out there, but not many in JavaScript. This is pretty surprising given that JavaScript is awesome and neural networks could really benefit from being in the browser. One partial implementation was used to do some sweet Captcha OCR, and my last post was about using them to determine whether to display black or white text over a given background color.

I ended up creating brain, the missing JavaScript neural network library. I tried to make it easy to use. To use it you don’t have to know what a hidden layer is (but you can specify hidden layers if you want), you can also specify input (and expected output, for training) as hashes instead of arrays – good for sparse or labelled input, and you can pass trained networks around in JSON, which is useful with Worker threads.

If you want to find out more about using neural networks from a programmatic perspective, this is a good introduction that just popped up.


Black or White Text? WCAG vs. Neural Networks

The other day I ran into a problem that some developers of color tools probably run into, which is: given a color, do you display white or black text over it? An algorithm based on the WCAG luminosity contrast ratio measure works great on most colors, but can be spotty. After looking at the WCAG formula, it seemed this problem was a good candidate for neural networks.
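For reference, here is a minimal sketch of the WCAG-based baseline, using the relative luminance and contrast ratio definitions from WCAG 2.0. The function names are my own, and picking whichever of black or white gives the higher contrast ratio is one straightforward reading of "an algorithm based on the WCAG measure", not necessarily the exact rule used on the page.

```javascript
// Convert one 0-255 sRGB channel to its linear value (WCAG 2.0 definition).
function linearChannel(c) {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of an [r, g, b] color, from 0 (black) to 1 (white).
function relativeLuminance([r, g, b]) {
  return 0.2126 * linearChannel(r) +
         0.7152 * linearChannel(g) +
         0.0722 * linearChannel(b);
}

// WCAG contrast ratio between two luminances, from 1 to 21.
function contrastRatio(l1, l2) {
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// Pick the text color with the higher contrast against the background.
function textColorFor(background) {
  const l = relativeLuminance(background);
  const withBlack = contrastRatio(l, 0); // black has luminance 0
  const withWhite = contrastRatio(l, 1); // white has luminance 1
  return withBlack >= withWhite ? "black" : "white";
}

console.log(textColorFor([255, 255, 0])); // yellow background
console.log(textColorFor([0, 0, 128]));   // navy background
```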

A couple weeks ago I finally sat down to try out the idea, and made a page where you can train your own neural network when to use black vs. white text (bonus: Worker thread example). After training it on a dozen or so colors, it matches the WCAG algorithm in most cases and does better in others. It’s kind of crazy that you can just throw data at this network and it will learn it. Not every problem can be solved by a neural network, but it’s cool to see it work when the problem fits. Might try out some other classifiers next.

I ended up making a simple JavaScript neural network implementation, you can check out the code on github.


FireFontFamily Extension for Firebug

FireFontFamily Firebug addon highlighting the rendered font in Firebug

addon·github

Highlights which font-family Firefox used from the font-family property. Another product of late-night experiments in font discovery. I think I like having this exposed in Firebug rather than in a separate addon like Context Font.


Context Font

Context Font Firefox addon screenshot

addon·github


Playing Around With js ctypes on Linux

I’m pretty elated about the ctypes module introduced in Firefox 3.6. ctypes.jsm is a module that lets chrome code call functions from shared libraries. This is a big win for a lot of extension developers. The baseline is that you no longer have to create an XPCOM component to call C++ code from javascript.

I had wanted to speed up some calculation-heavy js I was using in my extension by writing it in C++. Writing a C++ XPCOM component would probably force me to take up smoking again, so for my health I decided to ditch the effort…then I found out about ctypes!

Ctypes can help with calling Win API functions and such, and there are some examples of this on the ctypes.jsm wiki page. My use case, however, was loading my own shared library, so I decided to put together a short end-to-end tutorial on how to call your own C code from your extension.

First we write a little C function and put it in add.c:

int add(int a, int b) {
  return a + b;
}

To get a shared library from this code, compile with these commands:

gcc -fPIC -c add.c
gcc -shared -o add.so add.o

Now you have a file called add.so, which we can load from ctypes. Say you put add.so in your addon’s content directory, then your javascript might look something like this:

Components.utils.import("resource://gre/modules/ctypes.jsm");

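function add(a, b) {
  var file = getFile("chrome://myext/content/add.so");
  var lib = ctypes.open(file);
  try {
    // declare the C function: int add(int a, int b)
    var addc = lib.declare("add",
                           ctypes.default_abi,
                           ctypes.int32_t, // return type
                           ctypes.int32_t, // arg1 type
                           ctypes.int32_t  // arg2 type
    );
    return addc(a, b);
  } finally {
    lib.close(); // release the library handle once the call is done
  }
}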


function getFile(chromeURL) {
  // convert the chrome URL into a file URL
  var cr = Components.classes['@mozilla.org/chrome/chrome-registry;1']
           .getService(Components.interfaces.nsIChromeRegistry);
  var io = Components.classes['@mozilla.org/network/io-service;1']
           .getService(Components.interfaces.nsIIOService);
  var uri = io.newURI(decodeURI(chromeURL), 'UTF-8', null);
  var fileURL = cr.convertChromeURL(uri);
  // get the nsILocalFile for the file
  return fileURL.QueryInterface(Components.interfaces.nsIFileURL).file;
}

Now try calling your add function from js. The getFile function isn’t that pretty; hopefully in the future there will be a way to open a library from a chrome url. Also, you can’t use ctypes from Worker threads, which is very sad.


Getting the Color Scheme of a Website Using Canvas and Hierarchical Clustering

For a Firefox addon I’m working on I wanted to grab the color scheme of whatever website the user is viewing. There are a few extensions that do something like this: Colorzilla and Palette Grabber get the colors used by the website’s DOM, and ColorSuckr gets the most common colors from an image. The problem with getting the colors from the DOM is that websites use images and gradients so sometimes you can’t get the overall color scheme just from looking at the CSS colors.

Luckily, you can capture the color of every pixel on a webpage from an extension using the HTML 5 standard canvas. You can draw any image onto a canvas and get an array of pixel values used by the image with the getImageData function. In Firefox, you can actually draw the entire webpage onto a canvas using the drawWindow function (drawWindow is a Firefox-only canvas extension, but you can at least use drawImage in other browsers).

So getting the number of occurrences of each color is as simple as drawing the page to a canvas, looping through each pixel value in the array returned by getImageData, and tallying each color’s frequency in a javascript hash. This is what you get when performing this analysis on the Twitter homepage:

So you can get the colors that occurred most on the page pretty easily. There is, however, one big problem with this. On images and gradients, there are areas of very similar colors that might as well be the same color in the context of the overall theme. As I’ve found out, there are usually over 10,000 different colors on each webpage, so these colors need to be grouped together.
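The tallying step described above can be sketched like this. It assumes the pixel data is a flat RGBA array of the kind getImageData returns in its data property; the helper names tallyColors and topColors are made up for illustration, and the sample array stands in for a real page capture.

```javascript
// Count how often each RGB color occurs in a flat RGBA pixel array.
// Alpha is ignored; keys look like "255,255,255".
function tallyColors(pixels) {
  const counts = {};
  for (let i = 0; i + 3 < pixels.length; i += 4) {
    const key = pixels[i] + "," + pixels[i + 1] + "," + pixels[i + 2];
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Turn the tally into a list of the n most frequent colors.
function topColors(counts, n) {
  return Object.keys(counts)
    .map(key => ({ color: key.split(",").map(Number), count: counts[key] }))
    .sort((a, b) => b.count - a.count)
    .slice(0, n);
}

// Three pixels: two white, one black (stand-in for getImageData(...).data).
const samplePixels = new Uint8ClampedArray([
  255, 255, 255, 255,
  255, 255, 255, 255,
  0, 0, 0, 255
]);
console.log(topColors(tallyColors(samplePixels), 2));
```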

This kind of problem is called clustering, and it comes up a lot in image analysis, computational biology, and other computational disciplines. Two common clustering algorithms are k-means and hierarchical clustering. K-means can be faster than hierarchical clustering, but the problem with k-means is that you have to know what k is before you even start: you have to know exactly how many clusters you want to end up with. That can’t be determined in this situation, so hierarchical clustering is the best bet.

The premise of hierarchical clustering is simple. Each color starts out in its own cluster. On each pass of the algorithm, the two clusters that are most similar (according to a metric you define yourself) are merged. You keep on doing this until there are no more clusters that are similar enough to merge.

I defined the distance of two colors to be the maximum difference between the two colors’ red, green, and blue components. Two colors were ‘similar enough’ if their distance was less than 12 (where each color component ranged from 0 to 255). When two clusters were merged, the new cluster’s representative color was a weighted average of the two clusters’ representative colors. The algorithm worked pretty well for this application, check out the results:

The algorithm takes a long time (a few seconds) even if you just focus on the top few hundred colors, but that’s what Web Workers are for, after all. You can check out the code here. Do you know a faster clustering algorithm? How would you quantify the distance between two colors?
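Here is a rough sketch of the clustering pass described above, using the max-channel distance and the threshold of 12. It is a naive version written for clarity, not the linked code, and it assumes the "weighted average" is weighted by how many pixels each cluster holds.

```javascript
// Distance between two [r, g, b] colors: the largest per-channel difference.
function colorDistance(a, b) {
  return Math.max(
    Math.abs(a[0] - b[0]),
    Math.abs(a[1] - b[1]),
    Math.abs(a[2] - b[2])
  );
}

// colors: array of { color: [r, g, b], count: n }.
// Repeatedly merges the two closest clusters until none are within threshold.
function clusterColors(colors, threshold = 12) {
  let clusters = colors.map(c => ({ color: c.color.slice(), count: c.count }));
  while (clusters.length > 1) {
    let best = null;
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        const d = colorDistance(clusters[i].color, clusters[j].color);
        if (d < threshold && (best === null || d < best.d)) {
          best = { i, j, d };
        }
      }
    }
    if (best === null) break; // nothing left that is similar enough
    const a = clusters[best.i];
    const b = clusters[best.j];
    const total = a.count + b.count;
    const merged = {
      // assumption: average weighted by each cluster's pixel count
      color: a.color.map((v, k) => (v * a.count + b.color[k] * b.count) / total),
      count: total
    };
    clusters = clusters.filter((_, idx) => idx !== best.i && idx !== best.j);
    clusters.push(merged);
  }
  return clusters;
}

console.log(clusterColors([
  { color: [250, 250, 250], count: 3 },
  { color: [255, 255, 255], count: 1 },
  { color: [0, 0, 0], count: 2 }
]));
```

Each pass rescans every pair of clusters, which helps explain why even a few hundred colors take noticeable time.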

Update: after getting some great feedback, I refined the distance measure to be the 3d distance between the two colors in the L*a*b* color space (the CIE76 distance measure). Thanks to the developer of FireColour for the open source L*a*b* conversion code.
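For the curious, a CIE76 distance can be sketched as below. This is not the FireColour conversion code; it is a minimal version that assumes sRGB input and the D65 reference white, using the standard sRGB to XYZ to L*a*b* formulas.

```javascript
// Convert one 0-255 sRGB channel to linear light.
function srgbToLinear(c) {
  const s = c / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Convert an [r, g, b] color to [L, a, b] (assumes sRGB and a D65 white point).
function rgbToLab([r, g, b]) {
  const R = srgbToLinear(r);
  const G = srgbToLinear(g);
  const B = srgbToLinear(b);
  // linear RGB -> XYZ, each divided by the D65 reference white
  const x = (0.4124 * R + 0.3576 * G + 0.1805 * B) / 0.95047;
  const y = (0.2126 * R + 0.7152 * G + 0.0722 * B) / 1.0;
  const z = (0.0193 * R + 0.1192 * G + 0.9505 * B) / 1.08883;
  const f = t => (t > 0.008856 ? Math.cbrt(t) : 7.787 * t + 16 / 116);
  const fx = f(x), fy = f(y), fz = f(z);
  return [116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)];
}

// CIE76: plain Euclidean distance between two colors in L*a*b* space.
function deltaE76(rgb1, rgb2) {
  const [l1, a1, b1] = rgbToLab(rgb1);
  const [l2, a2, b2] = rgbToLab(rgb2);
  return Math.hypot(l1 - l2, a1 - a2, b1 - b2);
}

console.log(deltaE76([250, 250, 250], [255, 255, 255]));
```

Swapping this in for the max-channel distance means the 'similar enough' threshold has to be re-tuned, since the two measures are on different scales.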


Regression Range Finder for Firefox Nightly Builds

regression range finder in action on Windows

project page·github

UPDATE: mozregression has been polished up; see http://harthur.github.com/mozregression

Last week I came across a bug that had snuck into Firefox sometime after 3.0. I went to go find the regression range using hg bisect but quickly realized this wouldn’t work for a regression that occurred so long ago – the dependencies for Linux had changed and building the old source was a pain. So I went to go start pinning down the range using the mozilla-central nightlies. This usually takes a couple hours and I was tired of doing this and miscalculating the bisect steps, so I wrote a python script to do practically all the work for me.

The script takes a ‘good’ date and a ‘bad’ date as arguments and narrows down the range by executing a binary search on the mozilla nightlies. It will download each build, install it, then pop open a new window in the nightly. You do whatever you have to do to verify the bug’s presence, then enter ‘good’ or ‘bad’ into the command prompt depending on whether the bug appeared in that nightly. It will do this a few times to narrow down the range.

When you’ve checked enough nightlies (about log n nightlies, if your initial regression range is n days), you’ll see something like:
Last good nightly: 2009-06-12 First bad nightly: 2009-06-13
Which you can then paste into bugs to make people very happy (-:
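The date bisection itself is simple enough to sketch. The real script is Python and does the downloading and running for you; this is only an illustration of the search, with a made-up bisectNightlies function and an isBad callback standing in for the human answering ‘good’ or ‘bad’.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Format a UTC timestamp as YYYY-MM-DD.
function toDateString(ms) {
  return new Date(ms).toISOString().slice(0, 10);
}

// goodDate/badDate: 'YYYY-MM-DD' strings, parsed as UTC midnight.
// isBad(dateString): returns true if that day's nightly shows the bug.
function bisectNightlies(goodDate, badDate, isBad) {
  let good = Date.parse(goodDate);
  let bad = Date.parse(badDate);
  let checks = 0;
  while (bad - good > DAY_MS) {
    const days = Math.round((bad - good) / DAY_MS);
    const mid = good + Math.floor(days / 2) * DAY_MS;
    checks++;
    if (isBad(toDateString(mid))) {
      bad = mid;  // bug already present: regression is at or before mid
    } else {
      good = mid; // bug not present yet: regression is after mid
    }
  }
  return {
    lastGood: toDateString(good),
    firstBad: toDateString(bad),
    checks: checks
  };
}

// Pretend the bug landed in the 2010-09-09 nightly.
const result = bisectNightlies("2010-03-19", "2010-09-12",
                               date => date >= "2010-09-09");
console.log(result.lastGood, result.firstBad, result.checks);
```

Each check halves the remaining range, which is where the "about log n nightlies" figure comes from.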

Check out the installation instructions. Some future plans include:

* Automatic tests. Using them to find the regression range with no interaction required (targeting Mozmill tests first, then mochitest and the others)
* [done] Other Applications. Run other Mozilla nightlies like Thunderbird (this shouldn’t be too hard because the script uses mozrunner <3)
* Other branches. Not just mozilla-central.
* [done] Mac. Get it working on Mac too.


Configure Apache To Accept Cross-Site XMLHttpRequests on Ubuntu

1. Make sure you have the mod_headers Apache module enabled. To do this, check out /etc/apache2/mods-enabled/ and see if there’s a ‘headers.load’ in there. If there isn’t, then just sudo ln -s /etc/apache2/mods-available/headers.load /etc/apache2/mods-enabled/headers.load (sudo a2enmod headers does the same thing).

2. Add the Access-Control-Allow-Origin header to all HTTP responses. You can do this by adding the line Header set Access-Control-Allow-Origin "*" to the desired <Directory> section in your configuration file (like the /etc/apache2/sites-available/default file). Saying "*" will allow cross-site XHR requests from anywhere. To only accept requests from one origin, give the full origin including the scheme, e.g. "http://www.myothersite.com".

3. Reload the Apache server: sudo /etc/init.d/apache2 reload

Maybe this is really obvious to a lot of people, but it wasn’t to me, so there you go.

