Elgg’s Path Forward

Like many older PHP projects, Elgg has lots of problems with tight coupling, procedural patterns, and untestability, and it has a very web 1.9 model: spit out a full page, add a little Ajax. The good news is that Elgg has a ton of great functionality and ideas embedded in that mess; we have a core team that can often find agreement about dev principles and goals; and we have a new schedule-based release process that ensures hard work going into the product makes it to release more quickly.

Lately I feel like the Elgg core team is excitedly gearing up for a long hike, during which we’ll make tons of hard decisions and churn lots of code remolding Elgg to look more like a modern JavaScript + PHP API framework.

I’m not sure I want to make that hike.

My suspicion is there’s a shorter route around the mountain: some modern framework may be out there whose team has already put in the hard effort of building something close to what we’re aiming for. I think getting there ourselves would take a long time and be filled with tons of wheel-rebuilding, work that won’t be going into improving UX and that provides no cross-project knowledge gain for Elgg devs.

I’m also wondering if we would be wise to ignore our itching about back end code quality for a bit and focus all attention on the front end and on UI/UX problems. As a plugin developer, I certainly see back end design choices that cause problems, but they’re rarely blockers. I spend a lot more time improving the UX and dealing with our incomplete Ajax implementation. The jewel of the 1.9 release isn’t going to be the dependency container and PSR-0 compatible autoloading; it’ll be the responsive Aalborg theme.

For me, back end refactoring work is fun because it’s relatively easy. You’re changing the way the pieces snap together, not necessarily making them work better or solving new problems. It also keeps me in the comfort zone of working mostly with code and people I’m already familiar with. This is OK for a little while but doesn’t push me to grow.

This isn’t to imply that the core team is infected with Not-invented-here. We definitely want to replace as much home-grown code as possible with well-tested alternatives maintained elsewhere. It’s just a hunch I have that this will be a long process involving tons of decisions that have already been made somewhere else.

I’m still having a lot of fun developing for and in Elgg, but I’m itching to pick up something new, and to work in a system that’s already making good use of and establishing newer practices. Hitching Elgg to another project’s wagon seems adventurous.

I also have to vent that the decision to maintain support for PHP 5.2—a branch that ended long-term support 3.5 years ago—during 1.9.x seems disastrously wrong. 1.9 had a long development process during which a significant amount of high-quality, highly-tested, and actively maintained community code was off-the-table because it wasn’t 5.2 compatible. We had to port some things to 5.2 and fix the resulting bugs, and some unit tests are a mess without Closures; just a huge waste of time. Nor could we benefit from the work being done on Drupal or WordPress because both are GPL, as are a lot of other older PHP projects with 5.2-compatible code. PHP 5.2 is still expressive enough to solve most problems without namespaces, Closures, et al., but in 2014 devs don’t want to code with hands tied behind their backs to produce less readable code that will soon have to be refactored. /rant

Unpacking the Access Control Systems in Drupal and Elgg

Elgg’s access control system, which determines what content a user can view, is somewhat limited and very opinionated, with several use cases—access control lists, friends—baked into the core system. In hopes of making this cleaner and more powerful, I’ve been studying Drupal’s access system. (Caveat: My knowledge in this area of Drupal comes mainly from reading code, schema, docs, and two great overviews by Mike Potter and Larry Garfield, so please chime in if I run off the rails.) 

Drupal

Drupal’s system also influences update and delete permissions, but here I’m only interested in the “view” permission. Also, although Drupal has hook_node_access() (a procedural calculation of permissions for a node, like an Elgg entity, that is already in memory), I’m focusing on the systems that craft SQL conditions to fetch only nodes visible to the user. This is critical to get right in the SQL, because if your access control relies on filtering in code after the fetch, you can never predict the number of queries required to generate a list for browsing. In this area, Drupal’s realms/grants API (hook_node_grants()) is extremely powerful.

Realms and Grants in a Nutshell

At a particular time, a user exists in zero or more “realms”: more or less arbitrary labels which may be based on user attributes, roles, associations, the current system state, time…anything. Each realm has been granted (via DB rows) the permission to view individual nodes. So to query, we build up the user’s list of realms and bake it into the query, and the DB returns the nodes granted to at least one of those realms.

E.g., at 2:30 PM today, an anonymous visitor might be in the realms (public, time_afternoon, season_winter*), whereas Mary, who logged in, might exist in the realms (public, logged_in, user_123, friendedby_345, role_developer, team_A, is_over_30, time_afternoon, season_winter). So Mary will likely see more nodes because her queries provide more opportunity to match grant rows. *Note these realms are made-up examples.
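
To make the matching concrete, here is a minimal sketch in Python with SQLite. It is not Drupal code: the node_grant table, its columns, and the visible_titles() helper are hypothetical simplifications (Drupal’s actual node_access table carries additional columns, such as a grant ID and per-operation flags), and the realm names are the made-up ones from above.

```python
import sqlite3

# Hypothetical, simplified schema: one row per (node, realm) "view" grant.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE node_grant (nid INTEGER, realm TEXT);
    INSERT INTO node VALUES (1, 'Welcome'), (2, 'Team A notes'), (3, 'Dev roadmap');
    INSERT INTO node_grant VALUES
        (1, 'public'),
        (2, 'team_A'),
        (3, 'role_developer'),
        (3, 'team_A');
""")


def visible_titles(realms):
    """Return titles of nodes granted to at least one of the given realms."""
    if not realms:
        return []
    placeholders = ", ".join("?" for _ in realms)
    # The user's realm list is baked into the query; the DB does the filtering.
    sql = f"""
        SELECT n.title FROM node n
        WHERE EXISTS (
            SELECT 1 FROM node_grant g
            WHERE g.nid = n.nid AND g.realm IN ({placeholders})
        )
        ORDER BY n.nid
    """
    return [row[0] for row in conn.execute(sql, list(realms))]


anonymous = ["public", "time_afternoon", "season_winter"]
mary = ["public", "logged_in", "user_123", "role_developer", "team_A"]

print(visible_titles(anonymous))
print(visible_titles(mary))
```

The point of the sketch is that the list comes back in one query regardless of how many realms the user is in; only the IN list grows.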

Clearly this is very expressive, but Drupal (maybe for better) doesn’t provide many features out-of-the-box, so (maybe for worse) doesn’t build in many realms; the API is mostly a framework for implementing an access control system on top of added features. Contrib modules appear up to the task of providing realms based on all kinds of things (groups, taxonomies, associations with particular nodes), but it’s hard to collaboratively build an access control system, so these modules apparently don’t work well with each other and non-access modules must be careful to tap into the appropriate systems to keep nodes protected.

(Implementation oddities: The grants are done in the node_access table, which probably should’ve been called “node_grants”, especially because this table is only somewhat related to the hook called “node_access”. Less seriously, depending on your system size, each realm name (VARCHAR) is duplicated for every node/realm combination, so there’s some opportunity for normalization.)

Elgg

If you squint, Elgg’s system is a bit similar. Each entity has an “access level” (a realm), with values like “public”, “logged in”, “private”, “friends”, or values representing access control lists (a group or a subset of your friends like a Google+ circle).

That an entity can have only one realm is of course the biggest (and most painful) difference, but the implementation is also significantly complicated by some realms needing to map to different tables. E.g., Elgg has to ensure “friends” maps to rows in the entity relationships table based upon the owner of the entity, while other access levels map to the ACL table.
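
A rough sketch of why this gets complicated, again in Python with SQLite. The tables, column names, and access-level values below are hypothetical and only loosely modeled on the description above; this is not Elgg’s actual schema or query builder.

```python
import sqlite3

# Illustrative access-level values (assumptions, not taken from Elgg's code).
LOGGED_IN, PUBLIC, FRIENDS = 1, 2, -2

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (guid INTEGER PRIMARY KEY, owner INTEGER, access INTEGER);
    CREATE TABLE relationship (subject INTEGER, name TEXT, object INTEGER);
    CREATE TABLE acl_member (acl INTEGER, user INTEGER);
    INSERT INTO entity VALUES
        (10, 1, 2),    -- public
        (11, 1, -2),   -- owner 1's friends
        (12, 1, 0),    -- private to owner 1
        (13, 2, 100),  -- members of ACL 100
        (14, 2, 1);    -- any logged-in user
    INSERT INTO relationship VALUES (1, 'friend', 3);  -- owner 1 friended user 3
    INSERT INTO acl_member VALUES (100, 3);
""")


def visible_guids(user):
    """Guids the user may view; user=None means an anonymous visitor."""
    clauses = ["e.access = :public"]
    params = {"public": PUBLIC}
    if user is not None:
        params.update(user=user, logged_in=LOGGED_IN, friends=FRIENDS)
        clauses += [
            "e.access = :logged_in",
            "e.owner = :user",  # owners always see their own entities
            # The "friends" level is resolved in the relationship table,
            # keyed on the entity's owner.
            "(e.access = :friends AND EXISTS (SELECT 1 FROM relationship r"
            " WHERE r.subject = e.owner AND r.name = 'friend' AND r.object = :user))",
            # Any other level is resolved in the ACL membership table.
            "EXISTS (SELECT 1 FROM acl_member m"
            " WHERE m.acl = e.access AND m.user = :user)",
        ]
    sql = ("SELECT e.guid FROM entity e WHERE "
           + " OR ".join(clauses) + " ORDER BY e.guid")
    return [row[0] for row in conn.execute(sql, params)]


print(visible_guids(None))
print(visible_guids(3))
```

Each special access level needs its own branch in the WHERE clause, which is the kind of per-case logic a uniform grants table would avoid.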

I imagine a lot of these differences come from Drupal being old as time and having gone through much bigger API reboots, and from Elgg’s access system being targeted to meet the needs of features like friends and user groups, which were built in from the beginning. It’s hard to predict which schema results in faster queries, and the answer will depend on the use case, but I would guess Drupal’s queries are easier to generate and safer to alter.

Conclusions

I think in the long run Elgg would be wise to adopt a realms/grants schema, though I would probably suggest normalizing with a separate “realms” table to hold the name and other useful bits. Elgg group ACLs and friend collections would map directly into realms, but friend relationships would need to be duplicated into realms just like groups have an ACL distinct from the membership relationship. Really I think a grants table could completely replace Elgg’s “entity_relationships” table, since both tables just map one entity to others with a name.
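
As a sketch of what that normalized layout might look like (Python with SQLite; the table names realm and entity_grant, the helper functions, and the realm names are all hypothetical, not an existing Elgg API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each realm name is stored once; other useful bits could live here too.
    CREATE TABLE realm (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    -- One row per (entity, realm): an entity may be granted to many realms.
    CREATE TABLE entity_grant (
        entity_guid INTEGER NOT NULL,
        realm_id    INTEGER NOT NULL REFERENCES realm(id),
        PRIMARY KEY (entity_guid, realm_id)
    );
""")


def realm_id(name):
    """Return the id of the named realm, creating the row if needed."""
    conn.execute("INSERT OR IGNORE INTO realm (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT id FROM realm WHERE name = ?", (name,)
    ).fetchone()[0]


def grant(entity_guid, realm_name):
    """Allow members of the named realm to view the entity."""
    conn.execute(
        "INSERT OR IGNORE INTO entity_grant VALUES (?, ?)",
        (entity_guid, realm_id(realm_name)),
    )


def visible_guids(realm_names):
    """Guids of entities granted to at least one of the given realms."""
    if not realm_names:
        return []
    marks = ", ".join("?" for _ in realm_names)
    sql = f"""
        SELECT DISTINCT g.entity_guid
        FROM entity_grant g JOIN realm r ON r.id = g.realm_id
        WHERE r.name IN ({marks})
        ORDER BY g.entity_guid
    """
    return [row[0] for row in conn.execute(sql, list(realm_names))]


# Entity 20 is shared with a group AND with its owner's friends,
# which a single access level per entity cannot express.
grant(20, "group_7")
grant(20, "friends_of_1")
grant(21, "public")

print(visible_guids(["public", "friends_of_1"]))
```

Keeping a friends_of_1 realm in sync with user 1’s friend relationships is the duplication mentioned above; this sketch leaves that bookkeeping out.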

As for Drupal, I think the docs could more clearly describe realms and grants (unless I’ve totally got this wrong). I’m less sure of the quality of the API that populates/maintains the tables; it looks like the hooks are pretty low-level ways of asking “would you like to dump some rows into node_access?” and it’s not clear how much of the table must be rebuilt or how often this happens.

Elgg Plugin Tip: Make Your Display Queries Extensible With Plugin Hooks

If you’re building an Elgg plugin that executes queries to fetch entities/annotations/etc. for display, odds are someone else will one day need to alter your query, e.g. to change the LIMIT or ORDER BY clauses. Depending on where your query occurs, they may have to override a view, replace an action, replace a whole page handler, or have no choice but to alter your plugin code directly. There’s a better way.

Elgg Core Proposal: New Table “entity_list_items”

[This proposal has been superseded by ElggCollection.]

As of Elgg 1.7.4 there’s no way to define a specific list or ordering of entities for use in an elgg_get_entities query. E.g. one might want to implement:

  • A “featured items” list for a group or user widget. On a group page this could be implemented as “left” and “right” lists for better display control
  • “Sticky” entities in discussion lists, or any other object lists
  • A “favorites” list for each user
