Git: Finish features before merging them.

There’s no perfect way to develop software and use source control because projects, teams, and work environments can vary so much; what works for a small office of employees might not for a loose group of part-time contributors spread across many timezones, as many open source projects are.

Jade Rubick is not a fan of long-running feature branches in Git and, if I’m reading this right, argues for merging into master frequently rather than waiting for an entire feature to be implemented. This is supposed to force the team to be aware of all code changes as they occur.

While lack of communication about features in development can certainly be problematic, I think this is a sledgehammer of a solution. Taking this approach to its extreme, it might make sense to have all developers work huddled together so they can say what they’re working on in real-time, or all work on one workstation. My point is that using a workflow with high costs to address a communication deficit is not so great an idea.

My big fear with this workflow is that it eases the flow of incorrect/unwise code into production. Who reviews this code? What if it takes the whole codebase in a bad direction, but no one at the moment has time to realize that? I think the benefits of feature branches/pull requests are just huge:

  1. A branch frees the developer to experiment big and take chances without forcing the rest of the team down their path. The value in some big ideas will not be apparent looking at them piecemeal. Some of this work will lead to great things, some will be tossed away, all of it will be good learning.
  2. Likewise, the PR process can catch incorrect/unwise solutions before they’re merged into the product. This is huge. Some ideas sound great but you only realize 80% into the work that they’re unwise. If that work is sitting in a PR, you just close it and it can live on as a reminder to future devs who get the same idea. If not, you now have the job of shoehorning that code out. On the codebases I work on so many features have been improved/overhauled/abandoned by the review/feedback loop that it seems absolutely crazy to bypass this process. In an async distributed team, I think the PR is basically the perfect code review tool.
  3. PRs provide a great historical and educational record of what changes are involved in providing a certain feature, which files are involved, etc. I’ve found reading pull requests and merge diffs to be just as illustrative as reading source code. If a feature required changes in dozens of files over three weeks, how will I ever piece together the 6 commits out of 100 that were important?
  4. Feature branches make it a lot easier to revert a feature or apply it to another branch. For life on the edge I’ve built upon versions of frameworks with experimental branches merged in. If I regret this I can always generate a revert commit to sync back up with a stable branch.

All workflows have costs and benefits; I just think that the benefits of not merging feature branches until they’re really ready are huge compared with the costs Jade described.

My hunch is there must be better ways to keep a team aware of other work being done on feature branches. For example, open a pull request as soon as the feature branch is created and push to it as you work. That way team members can set aside time to check in on pull requests in progress and provide feedback.

I agree with Jade that feature flags can be a great idea, but that’s mostly orthogonal to source control workflow.

 

JSMin’s classic dilemma: division or RegExp literal

JSMin uses a pretty crude tokenizer-like algorithm based on assumptions of what kind of characters can precede/follow others. In the PHP port I maintain I’ve made numerous fixes to handle most syntax in the wild, but I see no fix for this:

Here’s a very boiled down version of a statement created by another minifier:

if(a)/b/.test(c)?d():e();

Since (a) follows if, we know it’s a condition, and, if truthy, the statement /b/.test(c)?d():e() will be executed.

However, JSMin has no parser and therefore no concept that (a) is an if condition; it just sees an expression. Since lots of scripts in the wild have expressions like (expression) / divisor, we’ve told JSMin to see that as division. So when JSMin arrives at )/, it incorrectly assumes / is the division operator, then incorrectly treats the following / as the beginning of a RegExp literal.
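To make the ambiguity concrete, here is a rough JavaScript sketch of a character-based guess. The function guessSlash is hypothetical and much simpler than anything in JSMin or the PHP port; it only shows that, without a parser, the slash after (a) looks the same in both statements.

```javascript
// Hypothetical, simplified stand-in for a character-based heuristic.
// This is NOT JSMin's actual implementation.
function guessSlash(source, index) {
  // Find the nearest non-whitespace character before the slash.
  let i = index - 1;
  while (i >= 0 && /\s/.test(source[i])) {
    i--;
  }
  const prev = i >= 0 ? source[i] : '';
  // After a value-like character (identifier, digit, closing bracket),
  // assume the slash divides; otherwise assume it opens a RegExp literal.
  return /[\w$)\]]/.test(prev) ? 'division' : 'regexp';
}

const minified = 'if(a)/b/.test(c)?d():e();';
const arithmetic = 'x=(a)/b/c;';

// Both slashes follow ")", so the guess is identical for both statements.
const guessForMinified = guessSlash(minified, minified.indexOf('/'));       // 'division' (wrong here)
const guessForArithmetic = guessSlash(arithmetic, arithmetic.indexOf('/')); // 'division' (right here)
```

Only knowing that (a) is an if condition, which takes a parser, tells the two cases apart.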

Lessons: Don’t run minified code through JSMin*, and don’t use it at all if you have access to a JavaScript/Java runtime.

*Minify already knows to skip JSMin for files ending in -min.js or .min.js.

Using BINARY in a MySQL IN Expression

With the default collations in MySQL (e.g. utf8_general_ci), strings will be matched case-insensitively:

WHERE col IN ('IT', 'Ruby') will match “it” and “ruby”.

For case-sensitive matches, each string must be preceded by the keyword BINARY:

WHERE col IN (BINARY 'IT', BINARY 'Ruby')

Caveat: I would guess that BINARY affects the matching of strings with combining characters (same character, different byte representation).
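To show what I mean by “same character, different byte representation,” here is a small JavaScript illustration (not MySQL, and it does not confirm how MySQL actually behaves). The variable names are just for the example.

```javascript
// "é" can be stored as one precomposed code point or as "e" plus a combining accent.
const composed = '\u00e9';    // UTF-8 bytes: C3 A9
const decomposed = 'e\u0301'; // UTF-8 bytes: 65 CC 81

// Encode a string to its UTF-8 bytes.
const toBytes = (s) => Array.from(new TextEncoder().encode(s));

const composedBytes = toBytes(composed);
const decomposedBytes = toBytes(decomposed);

// A byte-wise comparison sees two different strings...
const sameBytes = composedBytes.join(',') === decomposedBytes.join(',');

// ...even though they are equal after Unicode normalization.
const sameAfterNormalizing = composed.normalize('NFC') === decomposed.normalize('NFC');
```

If BINARY compares byte by byte, as I’m guessing, two values like these would not match each other.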

When Reasonable, Return Early

When you have a function or piece of code that must handle several different cases, I find it much better to eliminate special cases using return or throw at the beginning (“returning early”).

if (!input.isValid()) {
    throw new InvalidInputException();
}

// handle input (20 lines)

Several advantages here:

  1. It frees our mind from worrying about those cases as we read the rest of the function. Programmers, as humans, have limited mental stacks, and if we’re reading a function that must handle several cases, each of those cases must be held in our memory until we see the code that handles them.
  2. It simplifies the remaining code, since it’s handling a smaller set of cases.
  3. It compartmentalizes the code into: “handle special cases, then handle the common case.”
  4. It keeps the code that handles special cases close to the checks for those cases.
  5. The primary piece of code stays at the root indent level.
  6. Diff views stay simple when adding/removing special conditions (it does not introduce big indenting changes across many lines).

In fact there’s (what I consider to be) an anti-pattern that involves checking for special cases early, but then placing the meat of the function in the if block, followed by an else block to handle the special case:

if (input.isValid()) {

    // handle input (20 lines)

} else {
    throw new InvalidInputException();
}

Notice how the handling of the error can now become separated from the cause of the error, and the meat of the code now must be indented. This anti-pattern gets obviously worse the more conditions you must test for:

if (input1.isValid()) {
    if (input2.isValid()) {
        if (input3.isValid()) {
            if (input4.isValid()) {

                // handle input (20 lines)

            } else {
                throw new InvalidInputException('input4 is bad');
            }
        } else {
            throw new InvalidInputException('input3 is bad');
        }
    } else {
        throw new InvalidInputException('input2 is bad');
    }
} else {
    throw new InvalidInputException('input1 is bad');
}

This code is hard to read, hard to edit, and harder to review when changes occur to the conditions:

  1. The invalid handling code appears in the reverse of the order in which the inputs are checked.
  2. The invalid handling code is very far from the conditions (you have to lean on the IDE to see where the braces line up).
  3. The meat of the function starts off 4 indents over.
  4. Adding/removing validity checks is a huge pain that involves re-indenting a bunch of code.
  5. Change diffs can look really messy as indent levels are changed.
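
For comparison, here is a sketch of the same four checks written with early throws. It’s JavaScript, and the handleInputs function, the InvalidInputException class, and the 'handled' return value are made up for the example; the inputs are assumed to be objects with an isValid() method, as in the pseudocode above.

```javascript
// Illustrative exception type mirroring the pseudocode; not a real library class.
class InvalidInputException extends Error {}

function handleInputs(input1, input2, input3, input4) {
  // Each check sits right next to the error it raises, in the order checked.
  if (!input1.isValid()) {
    throw new InvalidInputException('input1 is bad');
  }
  if (!input2.isValid()) {
    throw new InvalidInputException('input2 is bad');
  }
  if (!input3.isValid()) {
    throw new InvalidInputException('input3 is bad');
  }
  if (!input4.isValid()) {
    throw new InvalidInputException('input4 is bad');
  }

  // handle input (20 lines), still at the root indent level
  return 'handled';
}
```

Adding or removing a check here touches only its own three lines.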

Code with a lot of special cases/conditions probably needs to be refactored to separate validation from the processing of the input.

There are certainly situations where returning early is not an obvious win (hence “when reasonable”). This often happens when you have a return value that contains complex state, or some statements must be executed before every return/throw statement.

Apache mod_cache and mod_rewrite: Danger

I solved a very strange bug today where mod_cache kept returning a particular cached file for seemingly every URL on a site.

After setting LogLevel debug, I finally realized that mod_cache does not store content by the request URL, but rather by the final URL after RewriteRules have been processed.

This PHP application was using a common RewriteRule to map most requests to a single script, but the full request URL was not being copied into the rewritten URL (the script was just relying on $_SERVER['REQUEST_URI'] to route the request). The result was that mod_cache considered all URLs the same, and it was only by some magic of session cache-busting headers that the app managed to function at all. Internally, mod_cache was just churning that single cache file over and over.

The simple fix was to make sure the full URL path ended up in the internal URL:

RewriteRule ^(.*)$  index.php/$1 [L,QSA]
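
Why this helps can be shown with a toy model in JavaScript. This is only a sketch of a cache keyed by the rewritten URL, as described above; it is not how mod_cache is implemented, and rewriteWithoutPath, rewriteWithPath, and cacheKeys are hypothetical names.

```javascript
// Hypothetical stand-ins for the two rewrite behaviors.
// Before the fix: every request is rewritten to the same internal URL.
const rewriteWithoutPath = (path) => '/index.php';
// After the fix: the request path is carried into the internal URL.
const rewriteWithPath = (path) => '/index.php/' + path.replace(/^\//, '');

// Toy model: the set of distinct cache keys produced by a list of requests.
function cacheKeys(requestPaths, rewrite) {
  return new Set(requestPaths.map((path) => rewrite(path)));
}

const requests = ['/news', '/about', '/contact'];

const keysBefore = cacheKeys(requests, rewriteWithoutPath); // one shared key for every page
const keysAfter = cacheKeys(requests, rewriteWithPath);     // one key per page
```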

Character Encoding Bug of the Day

Today I had one of those bugs that starts out looking simple and keeps going deeper and deeper. Video service Kaltura has a plugin for Moodle that just stopped working one day (no changes on the server).

  • It’s throwing an exception because an expected element isn’t in the page.
  • Oh, the element’s supposed to be delivered by XHR from the plugin.
  • But the plugin’s code is generating correct markup…
  • Why is Moodle’s function to serialize an array into a JS function call returning null for that markup?
  • json_encode is converting the markup string to null?
  • Because json_encode is choking on invalid UTF-8.
  • Because the markup has a right single quotation mark encoded in Windows-1252 :(
  • And that string is coming from the Kaltura API.

So over 2 years ago someone named a video player Jim’s Test Player and over the weekend Kaltura’s API started returning that single quote in Windows-1252. We removed the quote from the name and the problem disappeared.
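
The actual failure was in PHP’s json_encode, but the underlying problem is easy to reproduce anywhere. Here is a small JavaScript sketch; isValidUtf8 is a helper I made up for the example. In Windows-1252 the right single quotation mark is the single byte 0x92, which on its own is not a valid UTF-8 sequence.

```javascript
// Returns true if the bytes decode as strict UTF-8, false otherwise.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws on malformed input instead of substituting U+FFFD.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// "Jim’s" with the quote as the Windows-1252 byte 0x92.
const windows1252Bytes = Uint8Array.of(0x4a, 0x69, 0x6d, 0x92, 0x73);
// The same text with the quote as UTF-8 (U+2019 is E2 80 99).
const utf8Bytes = Uint8Array.of(0x4a, 0x69, 0x6d, 0xe2, 0x80, 0x99, 0x73);

const windows1252IsValid = isValidUtf8(windows1252Bytes); // false
const utf8IsValid = isValidUtf8(utf8Bytes);               // true
```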

Simpler Masonry + Sortable Working Together

Since jQuery Masonry lays out elements purely by absolute positioning, it does not play well with jQuery UI Sortable. People have posted complex solutions to this problem, but this simpler solution worked for me:

  1. Refresh masonry layout on Sortable’s start, change, and stop events
  2. While dragging, remove from the dragged item the class used to indicate it’s a masonry item

var $c = $('#my_container');
$c.masonry({
    itemSelector: '.masonry-item'
});
$c.sortable({
    start: function (e, ui) {
        ui.item.removeClass('masonry-item');
        $c.masonry('reload');
    },
    change: function (e, ui) {
        $c.masonry('reload');
    },
    stop: function (e, ui) {
        ui.item.addClass('masonry-item');
        $c.masonry('reload');
    }
});

Installing xhprof on XAMPP for OS X Lion

Directions adapted from Ben Buckman.

Download xhprof.

cd path/to/xhprof/.../extension

# If you don't have autoconf... I didn't.
sudo chmod o+w /usr/local/bin  #(brew needs to write a symlink there)
brew install autoconf

sudo /Applications/XAMPP/xamppfiles/bin/phpize

Make sure you have a command-line C compiler. I installed one via Xcode.

sudo MACOSX_DEPLOYMENT_TARGET=10.6 CFLAGS='-O3 -fno-common -arch i386 -arch x86_64' LDFLAGS='-O3 -arch i386 -arch x86_64' CXXFLAGS='-O3 -fno-common -arch i386 -arch x86_64' ./configure --with-php-config=/Applications/XAMPP/xamppfiles/bin/php-config-5.3.1

sudo make

sudo make install

sudo nano /Applications/XAMPP/xamppfiles/etc/php.ini

Add these lines to php.ini:

[xhprof]
extension = xhprof.so
xhprof.output_dir = "/Applications/XAMPP/xhprof-logs"

Restart Apache.

jQuery.Deferred() is pretty easy

I was using an asynchronous file uploader and, for usability, wanted to make sure the upload progress bar was displayed for at least a couple seconds before changing the view. The jQuery.Deferred object made this a breeze, eliminating a bunch of callback/isDone checking mess:

var uploadFinished = $.Deferred(),
    timerFinished = $.Deferred();
$.when(uploadFinished, timerFinished).done(function () {
    changeView();
});

// immediately after starting upload
setTimeout(timerFinished.resolve, 2000);

// in upload completed event handler
uploadFinished.resolve();

The docs make the Deferred object seem a little more complicated than it really is. You don’t have to alter processes to return Deferred/Promise objects; you can just make them and pass them around as needed in a pinch.
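
If jQuery isn’t on the page, the same idea can be sketched with native Promises. This swaps $.when for Promise.all; the deferred and waitForUploadAndMinimumDelay helpers are names I made up for the example, not part of any library.

```javascript
// Minimal stand-in for $.Deferred(): a promise plus its resolve function.
function deferred() {
  let resolve;
  // The executor runs synchronously, so resolve is assigned before we return.
  const promise = new Promise((res) => {
    resolve = res;
  });
  return { promise, resolve };
}

// Calls changeView once BOTH the upload has finished and minimumMs has elapsed.
// Returns the function to call from the upload-completed event handler.
function waitForUploadAndMinimumDelay(minimumMs, changeView) {
  const uploadFinished = deferred();
  const timerFinished = deferred();

  Promise.all([uploadFinished.promise, timerFinished.promise]).then(() => {
    changeView();
  });

  // Start the minimum-display timer immediately.
  setTimeout(timerFinished.resolve, minimumMs);

  return uploadFinished.resolve;
}
```

Call waitForUploadAndMinimumDelay(2000, changeView) right after starting the upload, keep the function it returns, and call that from the upload completed handler.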