String Subtypes for Safer Web Programming

Valid HTML markup involves several different contexts and escaping rules, yet many APIs give no precise indication of which context their string return values are escaped for, or how strings should be escaped before being passed in (let’s not even get into character encoding). Most programming languages only have a single String type, so there’s a strong urge to document function with @param string and/or @return string and move on to other work, but this is rarely sufficient information.

Look at the documentation for WordPress’s get_the_title:

Returns

(string) 
Post title. …

If the title is Stan "The Man" & Capt. <Awesome>, will & and < be escaped? Will the quotes be escaped? “string” leaves these important questions unanswered. This isn’t meant to slight WordPress’s documentation team (they at least frequently give you example code from which you can guess the escaping model); the problem is endemic to web software.

So for better web security—and developer sanity—I think we need a shared vocabulary of string subtypes which can supply this missing metadata at least via mention or annotation in the documentation (if not via actual types).

Proposed Subtypes and Content Models

A basic set of four might help quite a bit. Each should have its own URL to explain its content model in detail, and how it should be handled:

Unescaped
Arbitrary characters not escaped for HTML in any way, possibly including nulls/control characters. If a string’s subtype is not explicit, for safety it should be assumed to contain this content.
Markup
Well-formed HTML markup matching the serialization of a DocumentFragment
TaglessMarkup
Markup containing no literal less-than sign (U+003C) characters (e.g. for output inside title/textarea elements)
AttrValue
TaglessMarkup containing no literal apostrophe (U+0027) or quotation mark (U+0022) characters, for output as a single/double-quoted attribute value

What would these really give us?

These subtypes cannot make promises about what they contain, but are rather for making explicit what they should contain. It’s still up to developers to correctly handle input, character encoding, filtering, and string operations to fulfill those contracts.

The work left to do is to define how these subtypes should be handled and in what contexts they can be output as-is, and what escaping needs to be applied in other contexts.

Obvious Limitations

For the sake of simplicity, these subtypes shouldn’t attempt to address notions of input filtering or whether a string should be considered “clean”, “tainted”, “unsafe”, etc. A type/annotation convention like this should be used to assist—not replace—experienced developers practicing secure coding methods.

RotURL: Rot13 for URLs

RotURL is a simple substitution cipher for encoding/obscuring URLs embedded in other URLs (e.g. in a querystring). Also, common chars that need to be escaped (:/?=&%#) are mapped to infrequently used capital letters, so this generally yields shorter querystrings, too.

/**
 * Rot35 with URL/urlencode-friendly mappings. To avoid increasing size during
 * urlencode(), commonly encoded chars are mapped to more rarely used chars.
 */
function rotUrl($url) {
    return strtr($url,
        './-:?=&%# ZQXJKVWPY abcdefghijklmnopqrstuvwxyz123456789ABCDEFGHILMNORSTU',
        'ZQXJKVWPY ./-:?=&%# 123456789ABCDEFGHILMNORSTUabcdefghijklmnopqrstuvwxyz');
}

rotUrl('https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=Base64#foo')
    == '8MMGLJQQ5EZR9B9G5491ZFI7QRQ9E45SZG8GKM9MC5VxG5391CPcjx51I38WL51I38Vk1L5fdY6FF';
rotUrl(rotUrl($anyUrl)) = $anyUrl;

You could save a few more bytes by encoding the schema (e.g. “h” for http://, “H” for https://). Since your end encoding has to be URL-safe, there’s not much you can do beyond this to compress a URL embedded in a URL.

NetBeans Love & Hate

For those cases where you have to work on remote code, NetBeans‘ remote project functionality seems to put it ahead of other PHP IDEs. It pulls down a tree of files and uploads files that you save. Having a local copy allows it to offer its full code comprehension, auto-complete, and great rename refactoring for “remote” code. In contrast Eclipse allows you to open remote files using Remote System Explorer, but you only get PHP syntax highlighting, not the excellent PDT.

But NetBeans is not all smiles and sunshine. Continue reading  

Helping Netbeans/PhpStorm with Autocomplete/Code-hinting

Where Netbeans can’t guess the type/existence of a local variable, you can tell it in a multiline comment:

/* @var $varName TypeName */

After this comment (and as long as TypeName is defined in your project/project’s include path), when you start to type $varName, Netbeans will offer to autocomplete it, and will offer TypeName method/property suggestions. If you rename the variable with Ctrl+r (rename refactoring), Netbeans will change the comment, too.

I usually forget this syntax because type comes first in @param declarations.

Update: PhpStorm supports a similar syntax, but reversing the type and variable name:

/* @var TypeName $varName */

Shibalike: a PHP emulation of a Shibboleth environment

Update 2011-06-23: All the essential components of Shibalike are complete and the project is hosted on Github. This is currently blueprint-ware, but I’m excited about it.

A co-worker and I have laid out a design for a flexible system that can emulate a working Apache/PHP Shibboleth stack, without requiring any outside resources (e.g. an IdP, mod_shib, shibd). I see this as useful in several cases/for several reasons:

  • Setting up your own IdP for testing would be a pain and a maintenance headache.
  • Depending on your institution, getting attributes approved for release to a new host may be time-consuming or impossible.
  • Shibboleth won’t work on http://localhost/.
  • You want to be able to test/experience a similar sign in process on localhost as users do in production.
  • You want to be able to test your PHP-based shibboleth auth module without a working shib environment.
  • You want to emulate an IdP problem, or allow a secondary auth method to kick in if the IdP is down (without switching auth adapters).
  • You might want to “hardcode” an identity for a unit/integration test
  • You might want to give a select group the ability to login under a testing identity after they authenticate at the real IdP.

Continue reading  

Filtering WordPress Navigation Menus

WordPress 3 introduced native navigation menus, usually created via the wp_nav_menu theme function and the built-in Custom Menu widget.  If you have a menu with a deep hierarchy, you may want to display only the active branch and the submenu directly below the active item (given class “current-menu-item” by WordPress). You can see this menu behavior on the left in the UF College of Education Faculty site.

This is sadly not doable with CSS. I originally did it with Javascript, but the requirements of progressive enhancement required that the whole menu be output before attacking the DOM. Depending on the page, this caused a distracting collapse of the menu during the layout.

The class Coewp_MenuFilter does this menu filtering server-side. Until I wrap this in a plugin, just put it in your theme folder:

// in functions.php
require dirname(__FILE__) . '/MenuFilter.php';
Coewp_MenuFilter::add();

How it works

add() attaches a filter to the “wp_nav_menu” hook, so WordPress passes menu HTML through the class’s filter() method before returning it in wp_nav_menu(). In filter(), the HTML is converted to a DOMDocument object, which is edited using DOM methods (with XPath available, this version was almost a direct port of the jQuery version). After cutting it down, the DOM tree is re-serialized to HTML.

I was really hoping this filtering could be done before the HTML was created, say by subclassing WP’s Walker_Nav_Menu class, but this proved difficult to debug.

Simpler API for Zend’s built-in Firebug Logger

Zend Framework has functionality to send messages to the Firebug console (via Firefox’s FirePHP addon), but if you’re not using the ZF front controller, the API is a bit of a pain. Besides your instance of Zend_Log, you must keep track of a few additional objects just to manually flush the headers after all your logging calls. Since I knew the old FirePHP class didn’t need this follow-up step, I figured I could just flush the headers after each send.

The result is FireLog. On the FireLog instance, calls to methods like log(), info(), warn(), etc. are proxied to an internal Zend_Log instance, while the methods send(), group(), and groupEnd() are proxied to the static methods on Zend_Wildfire_Plugin_FirePhp. In both cases the headers are set immediately using some simple ZF subclassing. Continue reading