Elgg, ElggChat, and Greener HTTP Polling

At my new job, we maintain a site powered by Elgg, the PHP-based social networking platform. I’m enjoying getting to know the system and the development community, but my biggest criticisms are related to plugins.

On the basis of “keeping the core light”, almost all functionality is outsourced to plugins, and you’ll need lots of them. Venturing beyond the “core” plugins (generally solid, but often providing just enough functionality to leave you wanting) is scary because you’re generally tying third-party code into the event system that runs on every request to the site. Nontrivial plugins have to provide a lot of their own infrastructure, which seems to make conflict bugs with other plugins more likely. With Elgg being a small-ish project, non-core plugins tend to end up poorly maintained, which makes upgrading to the latest Elgg version daunting when there have been API changes. Then there’s the matter of deciding in what order your many plugins sit in the chain: order can mean subtle differences in processing, and you just have to shift things around, hoping not to break one thing while fixing another. Those are my initial impressions anyway, and no doubt many other open source systems that rely heavily on plugins have these problems. There’s a lot of great rope to hang yourself with.

Jeroen Dalsem’s ElggChat seems to be the slickest chat mod for Elgg. Its UI more or less mirrors Facebook’s chat, making it instantly usable. It’s a nice piece of work. Now for the bad news (as of version 0.4.5):

  • Every tab of every logged in user polls the server every 5 or 10 seconds. This isn’t a design flaw—all web chat clients must poll or use some form of comet (to which common PHP environments are not well-suited)—but other factors make ElggChat’s polling worse than it needs to be:
  • Each poll action looks up all the user’s friends, existing chat sessions, and messages, and returns all of that in every response. If the user has 20 friends, a table containing all 20 of them is generated and returned every 5 seconds. The visible UI would also become unwieldy, if not unusable.
  • The poll actions don’t use Elgg’s “action token” system (added in 1.6 to prevent CSRFs). This isn’t much of a security flaw, but in Elgg 1.6 it fills your httpd logs with “WARNING: Action elggchat/poll was called without an action token…” If you average 20 logged-in users browsing the site and poll every 10 seconds, that’s 172,800 long, useless error log entries per day (20 users × 8,640 polls each), a sea obscuring the errors you want to see. Double that if you’re polling every 5 seconds.
  • The recent Elgg 1.7 makes the action tokens mandatory so the mod won’t work at all if you’ve upgraded.
  • Dalsem hasn’t updated it for 80 days, I can’t find any public repo of the code (to see if he’s working on it), and he doesn’t respond to commenters wondering about its future.
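
The log-volume figure in the list above is easy to check. A quick sketch, assuming one open tab per user (the tabs_per_user parameter is my own addition for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(users, interval_seconds, tabs_per_user=1):
    """Poll requests per day, which is also the number of warnings logged."""
    return users * tabs_per_user * SECONDS_PER_DAY // interval_seconds


print(polls_per_day(20, 10))  # 172800
print(polls_per_day(20, 5))   # 345600
```

Since every tab polls on its own, users with several tabs open multiply the total accordingly.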

The thought of branching and fixing this myself is not attractive at the moment for a few reasons (one being that our site would arguably be better served by a chat system that doesn’t depend on the Elgg backend, since we have content in other systems, too), but here are some thoughts on it.

Adding the action token is obviously the low-hanging fruit. I believe I read that Facebook reloads the friends and status list only every 3 minutes, which seems reasonable. That would cut most of the poll actions down to simply maintaining chat sessions. Facebook’s answer to the friends-online UI also makes sense: show only active users, not offline ones.
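
To make the 3-minute idea concrete, here is a rough sketch in Python (not ElggChat’s PHP, and not its actual code). The load_sessions and load_friends callables are hypothetical stand-ins for whatever queries the backend runs, and the 180-second constant is just the figure I recalled above:

```python
import time

FRIENDS_REFRESH_SECONDS = 180  # assumed: the 3-minute interval mentioned above


def build_poll_response(last_friends_sent, load_sessions, load_friends, now=None):
    """Build one poll response and return (response, friends_sent_at).

    load_sessions and load_friends are hypothetical callables; each friend
    is assumed to be a dict with an "online" flag.
    """
    now = time.time() if now is None else now
    response = {"sessions": load_sessions()}
    due = (last_friends_sent is None
           or now - last_friends_sent >= FRIENDS_REFRESH_SECONDS)
    if due:
        # Send only friends who are online, skipping offline users.
        response["friends"] = [f for f in load_friends() if f["online"]]
        last_friends_sent = now
    return response, last_friends_sent
```

The caller would keep friends_sent_at per client (in the session, say) and pass it back in on the next poll, so most responses carry chat sessions only.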

“Greener” Polling

Setting aside the ideal of comet connections, one of the worst aspects of polling is the added server load of firing up server-side code and your database for each of those extra (and mostly useless) requests. A much lighter mechanism would be to maintain a simple message queue via a single flat file, accessible via HTTP, for each client. The client would simply poll the file with a conditional XHR GET request, and the httpd would handle this with minimal overhead, returning only the headers of a 304 Not Modified response when the file hasn’t changed.
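
In the browser this would be an XHR, but the conditional GET itself is easy to sketch in Python with the standard library. This is only an illustration: poll_once and its open_url parameter are names I made up (the latter is there so the function can be exercised without a network), and I’m assuming the server sends a Last-Modified header for the poll file, as httpds normally do for static files.

```python
import urllib.error
import urllib.request


def poll_once(url, last_modified=None, open_url=urllib.request.urlopen):
    """Conditionally GET the poll file.

    Returns (body, last_modified). body is None when the server answers
    304 Not Modified, i.e. the poll file has not changed since last time.
    """
    request = urllib.request.Request(url)
    if last_modified:
        # Echo back the Last-Modified value from the previous response.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with open_url(request, timeout=10) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None, last_modified
        raise
```

A polling loop would call this every few seconds, carrying the returned last_modified value forward, and do real work only when body is not None.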

In its simplest form, the poll file would just be an alerting mechanism: To “alert” a client you simply place a new timestamp in its poll file. On the next poll the client will see the timestamp change and immediately make another XHR request to fetch the new data from the server-side script.
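
The server side of that alert could look something like the following sketch. The alert_client name and the poll file path are hypothetical, and writing to a temporary file and renaming it is my own precaution so that a poll arriving mid-write never sees a half-written file:

```python
import os
import tempfile
import time


def alert_client(poll_path, now=None):
    """Place a fresh timestamp in a client's poll file and return it."""
    stamp = str(int(time.time() if now is None else now))
    directory = os.path.dirname(os.path.abspath(poll_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(stamp)
        # mkstemp creates the file owner-only; the httpd must be able to read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, poll_path)  # atomic rename into place
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on failure
        raise
    return stamp
```

Replacing the file also updates its modification time, which is what lets the httpd answer the next conditional GET with a full response instead of a 304.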

Integrating this with ElggChat

In ElggChat, clicking a user creates a unique “chatsession” (I’m calling this “CID”) on the server, and each message sent is destined for a particular CID. This makes each tab in the UI like a miniature “room”, with the ability to host multiple users. You can always open a separate room to have a side conversation, even with the same user.

In the new model, when a chatsession is created you’d update the poll files of both the sender and recipient, adding the CID to each, before returning the CID to the sender. Whenever the files are modified, you only need to keep a subset of recent messages for each CID: just enough to restore the chat context when the user browses to a new page. The advantage is that all the work of maintaining the chat sessions and queues is done only when messages are sent, never during the many poll requests.
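
Here is one way the poll files might be maintained, sketched in Python rather than ElggChat’s PHP. Everything in it is an assumption for illustration: the JSON layout, the function names, and the limit of 10 messages per CID. It also does no locking, so two simultaneous posts to the same file could lose an update; a real version would need to handle that.

```python
import json
import os

KEEP_PER_CID = 10  # assumed: "just enough to restore the chat context"


def read_poll_file(path):
    """Load a poll file, or return an empty structure if it doesn't exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"sessions": {}}


def write_poll_file(path, data):
    """Write via a temporary file so pollers never see a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(data, f)
    os.replace(tmp, path)


def open_session(paths, cid):
    """Add the CID to the poll file of each participant."""
    cid = str(cid)  # JSON object keys are always strings
    for path in paths:
        data = read_poll_file(path)
        data["sessions"].setdefault(cid, [])
        write_poll_file(path, data)


def post_message(paths, cid, sender, text, keep=KEEP_PER_CID):
    """Append a message to each participant's file, dropping older ones."""
    cid = str(cid)
    for path in paths:
        data = read_poll_file(path)
        messages = data["sessions"].setdefault(cid, [])
        messages.append({"from": sender, "text": text})
        data["sessions"][cid] = messages[-keep:]
        write_poll_file(path, data)
```

All of this runs only when a session is opened or a message is posted; the poll requests themselves never touch it.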

Since these poll files would all be sitting in public directories, their filenames would need to contain an unguessable string associated with each user.
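
One possible way to get such a filename, sketched in Python: derive it from the user’s ID with an HMAC keyed by a site-wide secret. The poll_filename function and the site_secret value are hypothetical; the secret would be a random key kept outside the web root.

```python
import hashlib
import hmac


def poll_filename(user_id, site_secret):
    """Derive a stable, hard-to-guess poll file name for a user.

    site_secret is a hypothetical random key (bytes) known only to the server.
    """
    digest = hmac.new(site_secret, str(user_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest() + ".txt"
```

Because the name is derived rather than stored, the server can recompute it whenever it needs to write to a user’s file. An alternative would be to generate a random token per user and store it with the user record. Either way, directory listings would have to be disabled for the poll directory, or the names could simply be read off the index.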