For a while, the context-menu add-on from my previous post seemed to handle Bugzilla bug ids on MXR, in emails, and in similar places well enough. But after re-evaluating my workflow, and some of the nice features on the bugzilla.mozilla.org domain itself, I found linkifying bug ids to be much more useful and productive. Even more useful was adding tooltip descriptions to bug ids. Often I don’t actually want to open the bug; I just want to know what it is about, or refresh my memory a bit. So I made Bug ID Helper, a Firefox/Thunderbird add-on. Bug ID Helper has three basic features:
Linkification – linkifies bug ids in web page text.
Tooltipification – adds descriptive tooltips to bug ids.
Context menu – adds a menu item when a number is selected.
By default, it linkifies and adds tooltip text to every bug id in every web page you load, but there are options to add only tooltips, only links, or to linkify only a whitelist of websites. There are also options (editable from the Add-ons Manager) for different combinations of bug information in the tooltip text, and although the default is bugzilla.mozilla.org, you can change it to bugs.gentoo.org or something else in the preferences. For instance, it would turn text like “Bug # 1389” into a link to that bug.
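To make those options concrete, here is a minimal sketch of how a configurable Bugzilla base URL and a site whitelist might fit together. The preference names (`bugzillaBase`, `linkify`, `tooltips`, `whitelist`) and the helpers `bugUrl` and `shouldProcess` are hypothetical, not the add-on's real preferences. It assumes the tracker uses Bugzilla's standard `show_bug.cgi?id=` URLs, and that the whitelist gates all processing of a page.

```javascript
// Hypothetical preferences; names are illustrative, not the add-on's real keys.
const defaultPrefs = {
  bugzillaBase: "https://bugzilla.mozilla.org",
  linkify: true,   // wrap bug ids in <a> elements
  tooltips: true,  // add descriptive title text
  whitelist: [],   // empty = every site; otherwise only these hostnames
};

// Build the show_bug.cgi URL for a single bug on the configured Bugzilla.
function bugUrl(prefs, bugId) {
  const base = prefs.bugzillaBase.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/show_bug.cgi?id=${encodeURIComponent(bugId)}`;
}

// Decide whether a page should be processed at all.
// Assumption: the whitelist applies to the whole page, not per feature.
function shouldProcess(prefs, pageUrl) {
  if (!prefs.linkify && !prefs.tooltips) return false;
  if (prefs.whitelist.length === 0) return true;
  const host = new URL(pageUrl).hostname;
  return prefs.whitelist.includes(host);
}
```

Since only `bugzillaBase` changes between trackers, the linkification and tooltip code in this sketch never needs to know which Bugzilla instance it is pointing at.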
I like to think that it’s quiet and fast. I’ve had it turned on for a week or so and haven’t really noticed it, which is good. I optimized for the common case (no bug ids on the page), so speed-wise it shouldn’t be noticeable.
Linkification and tooltipification brought up a lot of issues and I ended up learning a lot about DOM traversal and manipulation, XPath, XHR, and regex speed.
To find occurrences of bug ids, I wanted to find all the text in the content that looked something like “bug 2375” or “BUG #31721”. My very first approach was to walk the DOM tree recursively, executing a regex against the content of every text node and descending into every other node that wasn’t in a blacklist of bad nodes (meta, img, applet, etc.). I linkified by snipping out the bug text with the splitText function and replacing each match with a new anchor node. This was nice and intuitive, but also very slow: on pages with thousands of lines of text, or a lot of individual text nodes, the parsing took whole seconds.
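A sketch of that first approach, assuming a browser DOM: `walk` recurses through element children while skipping a blacklist, and `linkifyTextNode` uses `splitText` to carve each match into its own text node before wrapping it in an anchor. The pure helper `segmentBugText` does the offset bookkeeping so it can be checked outside a browser. All three names, the extra blacklist entries beyond meta/img/applet, and the `gi` regex flags are my own assumptions.

```javascript
// Hypothetical blacklist (upper-case nodeNames); the post names meta, img, applet.
const SKIP_NODES = new Set(["META", "IMG", "APPLET", "SCRIPT", "STYLE", "TEXTAREA", "A"]);

// Pure helper: split a string into plain and bug-id segments, i.e. the pieces
// that successive splitText() calls carve out of one text node.
function segmentBugText(text) {
  const re = /(\s|\b|^)bug\s*(\d+)/gi;
  const segments = [];
  let last = 0;
  let m;
  while ((m = re.exec(text)) !== null) {
    const start = m.index + m[1].length; // skip the leading whitespace, if any
    const end = m.index + m[0].length;
    if (start > last) segments.push({ text: text.slice(last, start) });
    segments.push({ text: text.slice(start, end), bugId: m[2] });
    last = end;
  }
  if (last < text.length) segments.push({ text: text.slice(last) });
  return segments;
}

// Browser-only: rewrite one text node in place with splitText().
function linkifyTextNode(textNode, urlForBug) {
  const segments = segmentBugText(textNode.data);
  if (!segments.some((s) => s.bugId)) return; // nothing to linkify
  let node = textNode;
  for (const seg of segments) {
    // Split so that `node` holds exactly seg.text and `rest` holds the remainder.
    const rest = seg.text.length < node.data.length ? node.splitText(seg.text.length) : null;
    if (seg.bugId) {
      const a = node.ownerDocument.createElement("a");
      a.href = urlForBug(seg.bugId);
      node.parentNode.replaceChild(a, node);
      a.appendChild(node);
    }
    if (rest === null) break;
    node = rest;
  }
}

// Browser-only recursive walk; call as walk(document.body, fn).
// nodeType 1 = ELEMENT_NODE, 3 = TEXT_NODE.
function walk(node, urlForBug) {
  if (node.nodeType === 3) {
    linkifyTextNode(node, urlForBug);
  } else if (node.nodeType === 1 && !SKIP_NODES.has(node.nodeName.toUpperCase())) {
    // Copy the child list first: linkifying changes it while we iterate.
    for (const child of Array.from(node.childNodes)) walk(child, urlForBug);
  }
}
```

Every text node in the page goes through the regex here, which is exactly the cost that makes this version slow on large pages.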
Then I found out about XPath. I honestly had no idea. Instead of walking the tree, I queried for all the text nodes in the document (minus ones with bad ancestors) and searched each one in turn for bug id occurrences. This cut a serious amount of overhead from the search and linkification time. Still, some sites took a noticeable amount of time to linkify, so it still wasn’t cool.
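The XPath query can be sketched roughly like this, assuming an HTML (not XHTML) document, where unprefixed XPath name tests match HTML elements. The blacklist and the function names are hypothetical. A snapshot result type is used because iterator results become invalid once the document is modified, and linkification modifies it.

```javascript
// Hypothetical blacklist: elements whose text should never be linkified.
const BAD_ANCESTORS = ["script", "style", "textarea", "a", "applet"];

// Build an XPath 1.0 expression for every text node under <body>
// that has none of the given elements as an ancestor.
function textNodeXPath(badTags) {
  if (badTags.length === 0) return "//body//text()";
  const predicate = badTags.map((tag) => `ancestor::${tag}`).join(" or ");
  return `//body//text()[not(${predicate})]`;
}

// Browser-only: collect the nodes via a snapshot, which (unlike an iterator)
// is not invalidated when the caller mutates the document afterwards.
function collectTextNodes(doc, badTags = BAD_ANCESTORS) {
  const snap = doc.evaluate(
    textNodeXPath(badTags), doc, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < snap.snapshotLength; i++) {
    nodes.push(snap.snapshotItem(i));
  }
  return nodes;
}
```

The blacklist check moves into the query itself, so the add-on-side code no longer recurses or inspects element names.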
Then I looked at the regex. I was looking for occurrences of “bug” paired with a number, with whitespace or word-boundary characters in front of it. My regex looked something like /(\s|\b|^)+bug\s*(\d+)/. After toying with it for a bit, I noticed that removing the test for whitespace in front of the bug id made it dramatically faster, something like this: /bug\s*(\d+)/. This makes sense: as soon as a character is read, if it’s not a ‘b’ there’s no chance of a match. In fact, just removing the ‘+’ quantifier and testing for a single whitespace or boundary character made things fast enough: /(\s|\b|^)bug\s*(\d+)/. I guess there is a lot of whitespace, commas, etc. in text, and catching these at the front of a regex is not the best idea. I would love to know more about how regexes and DFAs actually work, because for some reason I end up using regexes all the damn time; it kind of makes me want to take FLAC in the spring…
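The three patterns can be compared side by side. This sketch checks what each one matches and includes a crude timing helper; `timeRegex` and the filler text are made up for illustration, and actual timings depend on the JavaScript engine, so none are claimed here.

```javascript
// The three patterns from the post (the third without the stray "/").
const slow = /(\s|\b|^)+bug\s*(\d+)/;
const bare = /bug\s*(\d+)/;
const single = /(\s|\b|^)bug\s*(\d+)/;

// Crude timing helper: run re.test(text) n times and return elapsed ms.
function timeRegex(re, text, n = 100) {
  const t0 = Date.now();
  for (let i = 0; i < n; i++) re.test(text);
  return Date.now() - t0;
}

// Made-up filler with lots of whitespace and punctuation but no "bug",
// i.e. the common case of a page with nothing to linkify.
const filler = "lorem,   ipsum  dolor sit amet;   ".repeat(2000);
for (const [name, re] of [["slow", slow], ["bare", bare], ["single", single]]) {
  console.log(name, timeRegex(re, filler), "ms");
}
```

The checks also expose a trade-off: the bare /bug\s*(\d+)/ matches inside words such as “debug 12”, while the single-character prefix rejects them. A plausible explanation for the cost of the ‘+’ is backtracking: in a backtracking engine, a run of whitespace lets the quantified group match in several ways at each starting position before the engine gives up, but treat that as a hypothesis rather than a measurement.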
So after refining my regex and incorporating XPath, things were faster (2 – 10 times faster to be specific). But there were still some webpages that took enormous amounts of time to grovel through and linkify. There was one shopping website with about six thousand text nodes that took over 4 seconds to go through!
Finally, I noticed how fast Firefox’s “Find” functionality was at finding occurrences of words in content. There was one downside: Find (nsIFind) only supported literal text, not regular expressions. But this ended up being fine. While I liked to capture the number in my regex and check for separation characters, I really only needed to examine regions of text that contained the exact string “bug”. And on almost every web page I viewed (like the shopping one) there were no occurrences of that word, so Find would tell me immediately whether I needed to pay attention to the page at all. Find also had the pleasant side effect of returning a DOM Range where the bug text occurred. I could easily extend the endpoints of the range and execute my regex against this very small region of text. Linkifying it then just involved wrapping the range with the surroundContents function. And boy did it speed things up. The average web page now took about 20 ms to grovel through, and I never came across a site that took more than half a second (planet.mozilla.org takes about 200 ms, browser.js on MXR about 500 ms).
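The “find the literal first, then run the regex on a tiny region” idea can be sketched without XPCOM. Here a case-insensitive literal regex, /bug/gi, stands in for nsIFind (whose interface I am not reproducing), a sticky regex checks for the number right at each hit, and a DOM Range with surroundContents does the wrapping. `findBugIds`, `wrapHit`, and `linkifyNode` are hypothetical names; the one-character boundary check mirrors the (\s|\b|^) prefix from earlier.

```javascript
// Scan for the literal word first (stand-in for nsIFind), then look for the
// number only at each hit.
function findBugIds(text) {
  const literal = /bug/gi;         // cheap case-insensitive literal scan
  const atHit = /bug\s*(\d+)/iy;   // sticky: must match exactly at lastIndex
  const hits = [];
  let lit;
  while ((lit = literal.exec(text)) !== null) {
    const i = lit.index;
    // One-character boundary check, like (\s|\b|^): rejects "debug 7".
    if (i > 0 && /\w/.test(text[i - 1])) continue;
    atHit.lastIndex = i;
    const m = atHit.exec(text);
    if (m !== null) {
      hits.push({ start: i, end: i + m[0].length, bugId: m[1] });
      literal.lastIndex = i + m[0].length; // resume after this match
    }
  }
  return hits;
}

// Browser-only: wrap one hit in an <a> using a DOM Range.
function wrapHit(textNode, hit, href) {
  const doc = textNode.ownerDocument;
  const range = doc.createRange();
  range.setStart(textNode, hit.start);
  range.setEnd(textNode, hit.end);
  const a = doc.createElement("a");
  a.href = href;
  range.surroundContents(a);
  return a;
}

// Browser-only: linkify every hit in a text node, last hit first.
function linkifyNode(textNode, urlForBug) {
  const hits = findBugIds(textNode.data);
  for (const hit of hits.reverse()) wrapHit(textNode, hit, urlForBug(hit.bugId));
}
```

Hits are wrapped from last to first because surroundContents splits the text node at the range and leaves the text before the match in the original node, so the earlier offsets stay valid. On a page with no “bug” at all, the literal scan finds nothing and the function returns right away, which is the common case the add-on is tuned for.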
Update: I found out that searching for “bug” with XPath is just as fast as searching for it with nsIFind. The only problem is that XPath doesn’t return the DOM Range where the word is, just the text node, which you have to search again with your own regex to find the match (unless there is some crazy XPath query for it; let me know!). Even so, XPath was easier to work with in this situation, so I switched back to XPath.
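A sketch of the XPath-only version: XPath 1.0 has no lower-case function, so translate() folds “BUG” to “bug” before contains() tests for it, and each returned text node is then searched again with the regex to find where the match actually is. The expression, the ancestor blacklist, and the helper names are assumptions for illustration.

```javascript
// Hypothetical query: text nodes under <body> containing "bug" in any case,
// excluding a few ancestors where links make no sense.
const BUG_TEXT_XPATH =
  "//body//text()[contains(translate(., 'BUG', 'bug'), 'bug')" +
  " and not(ancestor::script or ancestor::style or ancestor::a)]";

// Pure second pass: return every bug id found in one node's text.
function bugIdsIn(text) {
  const re = /(\s|\b|^)bug\s*(\d+)/gi;
  return Array.from(text.matchAll(re), (m) => m[2]);
}

// Browser-only: pair each candidate node with the ids found in it.
function findBugNodes(doc) {
  const snap = doc.evaluate(
    BUG_TEXT_XPATH, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const found = [];
  for (let i = 0; i < snap.snapshotLength; i++) {
    const node = snap.snapshotItem(i);
    const ids = bugIdsIn(node.data);
    if (ids.length > 0) found.push({ node, ids });
  }
  return found;
}
```

The XPath filter is deliberately coarse: it also returns nodes containing “debug” or “Bugzilla”, and the regex pass is what enforces the word boundary and the trailing number.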
I also had a lot of fun with the tooltip and the various ways to get bug information over HTTP, but I’ll save some of that for another post maybe.