23 Jul

Elgg’s Path Forward

Like many older PHP projects, Elgg has lots of problems with tight coupling, procedural patterns, and untestability; and has a very web 1.9 model: spit out full page, add a little Ajax. The good news is that Elgg has a ton of great functionality and ideas embedded in that mess, we have a core team which often can find agreement about dev principles and goals, and we have a new schedule-based release process that ensures that hard work going into the product makes it to release more quickly.

Lately I feel like the Elgg core team is excitedly gearing up for a long hike, during which we’ll make tons of hard decisions and churn lots of code remolding Elgg to look more like a modern JavaScript + PHP API framework.

I’m not sure I want to make that hike.

My suspicion is there’s a shorter route around the mountain; some modern framework may be out there whose team has already put in the hard effort of building something close to what we’re looking forward to. I think the time it would take us to get there would be long and filled with tons of wheel-rebuilding—work that won’t be going into improving UX and which provides no cross-project knowledge gain for Elgg devs.

I’m also wondering if we would be wise to ignore our itching about back end code quality for a bit and focus all attention on the front end and on UI/UX problems. As a plugin developer, I certainly see back end design choices that cause problems, but they’re rarely blockers. I spend a lot more time improving the UX and dealing with our incomplete Ajax implementation. The jewel of the 1.9 release isn’t going to be the dependency container and PSR-0 compatible autoloading; it’ll be the responsive Aalborg theme.

For me, back end refactoring work is fun because it’s relatively easy. You’re changing the way the pieces snap together, not necessarily making them work better or solving new problems. It also keeps me in the comfort zone of working mostly with code and people I’m already familiar with. This is OK for a little while but doesn’t push me to grow.

This isn’t to imply that the core team is infected with Not-invented-here. We definitely want to replace as much home-grown code as possible with well-tested alternatives maintained elsewhere. It’s just a hunch I have that this will be a long process involving tons of decisions that have already been made somewhere else.

I’m still having a lot of fun developing for and in Elgg, but I’m itching to pick up something new, and to work in a system that’s already making good use of and establishing newer practices. Hitching Elgg to another project’s wagon seems adventurous.

I also have to vent that the decision to maintain support for PHP 5.2—a branch that ended long-term support 3.5 years ago—during 1.9.x seems disastrously wrong. 1.9 had a long development process during which a significant amount of high-quality, highly-tested, and actively maintained community code was off-the-table because it wasn’t 5.2 compatible. We had to port some things to 5.2 and fix the resulting bugs, and some unit tests are a mess without Closures; just a huge waste of time. Nor could we benefit from the work being done on Drupal or WordPress because both are GPL, as are a lot of other older PHP projects with 5.2-compatible code. PHP 5.2 is still expressive enough to solve most problems without namespaces, Closures, et al., but in 2014 devs don’t want to code with hands tied behind their backs to produce less readable code that will soon have to be refactored. /rant

17 Jul

Get rid of variable variable syntax

Uniform Variable Syntax was voted in (almost unanimously) for PHP6 PHP7 and introduces a rare back compatibility break, changing the semantics of expressions like these:

                           // old meaning            // new meaning
1. $$foo['bar']['baz']     ${$foo['bar']['baz']}     ($$foo)['bar']['baz']
2. $foo->$bar['baz']       $foo->{$bar['baz']}       ($foo->$bar)['baz']
3. $foo->$bar['baz']()     $foo->{$bar['baz']}()     ($foo->$bar)['baz']()
4. Foo::$bar['baz']()      Foo::{$bar['baz']}()      (Foo::$bar)['baz']()

IMO the “variable variable” syntax $$name is a readability disaster that we should get rid of. ${$name} is much clearer about what this is (dynamic name lookup) and what it’s doing: $ is the symbol table, and {$name} tells us that we’re finding an entry in it under a key with the value of $name. PHP should deprecate $$name syntax and emit a notice in the next minor version.

It should also deprecate the syntax ->$name and ::$$name. Both are bad news.

Doing so would completely eliminate the first 3 ambiguous expressions above, and the new warning emitted would call out code that would otherwise silently change meaning in PHP6 (one big negative about this RFC).

As for the fourth, Consider these expressions:

A. $foo->prop
B. $foo->prop()
C. $foo->prop['key']()
D. Foo::$prop
E. Foo::$prop()
F. Foo::$prop['key']()

To the chagrin of JavaScript devs, PHP will not let you reference a dynamic property in an execution context, so expression B will try to call a method “prop”, (and fatal if it can’t). For consistency, PHP should fatal on expression E. Currently PHP just does something completely unexpected and insane, looking up a local variable $prop.

Expression C does what you’d expect, accessing the property named “prop”, so expression F should do the same, which is why I think the RFC is a clear step forward at least.

25 Jun

What, when, and how is your IDE telling you?

A programmers.stackexchange question basically asks if there’s an ideal frequency to compile your code. Surely not, as it completely depends on what kind of work you’re doing and how much focus you need at any given moment; breaking to ask for feedback is not necessarily a good idea if you’re plan is solid and you’re in the zone.

But a related topic is more interesting to me: What’s the ideal form of automated information flow from IDE to programmer?

IDEs can now potentially compile/lint/run static analysis on your code with every keystroke. I’m reminded of that when writing new Java code in NetBeans. “OMG the method you started writing two seconds ago doesn’t have the correct return type!!!” You don’t say. And I’ve used an IDE that dumped a huge distracting compiler message in the middle of the code due to a simple syntax error that I could’ve easily dealt with a bit later. I vaguely remember one where the feedback interfered with my typing!

So on one side of the spectrum the IDE could be needling you, dragging you out of the zone, but you do want to know about real problems at some point. Maybe the ideal notification model is progressive, starting out very subtle then becoming more obvious as time passes or it becomes clear you’re moving away from that line.

Anyone seen any unique work in this area?

Stepping back to the notion of when to stop and see if your program works, I think the trifecta of dependency injection, sufficiently strong typing, and solid IDE static analysis has really made a huge difference in this for me. Assuming my design makes sense, I find the bugs tend to be caught in the IDE so that programs are more solid by the time I get to the slower—and more disruptive to flow—cycle of manual testing. YMMV!

11 Jun

On Frameworks

Building any non-trivial app, one of the toughest decisions to make is which development framework to base your work on. And there’s no way around this decision once you realize there’s no such thing as “not using a framework.”

Even if you’re just binding together 3rd party components, a framework will be born out of any development work, and it will have its own tradeoffs to consider.

  • Will it be documented?
  • Will it be tested?
  • Will it have a community of people working on/within it?
  • Will it have a plugin architecture encouraging code reuse?
  • Will you have access to plugins written by people outside your team?
  • Will configuration be easy/reusable?
  • Will it address code/UX scenarios you hadn’t considered?
  • Will you get free bug fixes/features from outside developers?
  • Will it securely handle password and protection against XSS, CSRF, etc.?
  • Will other developers be able to jump into the project?

Developers, plugins, sub-frameworks, tools, and shops all form an ecosystem around a framework that tends to build value as it grows. A strong ecosystem can make up for product weaknesses; a trip through the Drupal and WordPress codebases will have some seasoned developers shuddering at some architectural choices made (long ago), yet that code runs great due to the work of tons of people over time. And, thanks to the wealth of plugins available, and to the continual transfer of knowledge from one developer to the next, devs who are experienced on those platforms can build impressive systems quickly.

On the flip side, there are properties of mature frameworks that are important to consider:

  • They can be slow to change, which may mean maintaining a separate git branch with your own fixes.
  • They rarely bend to support the use cases of individual sites.
  • They tend to have older code that may be harder to integrate with the latest practices.
  • They can be large; any framework that solves a lot of problems out-of-the-box will be. Microframeworks can be a great choice, but their value tends to be limited to plumbing.

Thankfully the ideal framework needn’t be chosen first. You may be better off prototyping something as quickly as possible—by any means—then rebuilding after the requirements are clearer. The danger is the temptation to ship prototypes as production systems, where they grow to become liabilities over time.

27 Apr

Events with Dynamic Handlers, Simplified

Elgg’s event systems are based on registering callables to handle certain event names. As the codebase is still heavily procedural, most event handlers in the core and 3rd party plugins are global functions. Since I’m firmly in the static-code-is-evil camp, I’d much prefer handlers be dynamic method calls, but this introduces some complexity:

  1. Nice plugins allow their event handlers to be unregistered, but dynamic callbacks make this a bit awkward; any plugin that wanted to unregister another plugin’s handler would have to first get a reference to the other plugin’s object. Kind of messy.
  2. This requires proactively creating lots of objects—and loading lots of code—you don’t need in order to create the callbacks.

One way to work around the second issue would be to register callbacks on a proxy objects that lazy-load the real objects on the first method call. This adds complexity, could lead to typehint failures if the proxy gets accessed directly, and doesn’t solve the first issue at all. So I think the event system needs to be extended in a simple way:

We could allow callback strings of the form "service->method" where service is the name of a pre-registered (and lazy-loading) service on Elgg’s DI container. Essentially the event trigger would call $dic->service->method().

This would allow trivially (un)registering these handlers, as well as efficiently lazy-loading the objects without the complexity of proxies.

I’d also suggest never actually binding to the service. This way you could replace a whole set of handlers by replacing the service on the DI container.

6 Apr

Dependency Injection: Ask For What You Need

Despite the scary name, the concept is simple: Your class or function should ask for dependencies upfront. It should not reach out for them. This gives you a few powerful advantages:

1. In unit tests you can pass in mock dependencies to verify/fake behavior.

Imagine your class depends on a database. You really don’t want to require a live database to run your tests, but also you want to be able to test a variety of responses from the database without running a bunch of setup queries. What if your logic depends on the time being around 3AM? You can’t reliably test this if your object pulls time from the environment.

2. Your API doesn’t lie to users about what’s really required to use it.

Imagine if you started cooking a recipe and step 7 read, “Go buy 10 mangos now.” Imagine a Skydive class whose jump() method waited 10 seconds then executed Backpack.getInstance().getParachute.deploy(). “Umm, I’m in the air and the Skydive class didn’t tell me I’d need a backpack…”. Skydive should have required a parachute in the constructor.

3. Your API behaves more predictably.

Imagine a function getNumDays(month). If I pass in February, I usually get 28, but sometimes 29! This is because there’s a hidden dependency on the current year. This function would be useless for processing old data.

Baking the dependencies in

Note that the same principles apply to functions (fine for this exercise, but you should avoid global functions/static methods for all sorts of reasons). Consider this function for baking peanut butter cookies:

function makePBCookies(AlbertsonsPB $pb) {
  // ... 20 lines of setup code
  $egg = new LargePublixEgg();
  $sugar = new DixieSugar('1 cup');
  $chef = Yelp::find('chef');
  // create an oven
  // mix and bake for 10 min
  return $cookies;
}

From the argument list, it isn’t clear you’ll need a large Publix egg on hand. What if you don’t have a Publix in your area? What if you want less sugary cookies? Let’s refactor:

function makePBCookies(Egg $egg, Sugar $sugar, PB $pb) {
  // ... 20 lines of setup code
  $chef = Yelp::find('chef');
  // create an oven
  // mix and bake for 10 min
  return $cookies;
}

Now it’s immediately clear at the top what ingredients you’ll need; you can use any brands you want; and you can even change amounts/sizes to yield all kinds of flavors. This is really flexible, but we can make it even better:

function makePBCookies(Egg $egg, Sugar $sugar, PB $pb, Oven $oven, Chef $chef) {
  $mix = $chef->mix(array($egg, $sugar, $pb));
  return $chef->bake($oven, $mix);
}

In case it wasn’t obvious, it’s now clear we’ll need a chef and oven, and it makes good sense to outsource the oven design because this logic isn’t really specific to peanut butter cookies. A sign that our refactoring is going well is that the function body is starting to read more like a narrative.

Let’s build the ultimate cookie-making function:

function makeCookies(BakeList $list, Chef $chef, Oven $oven, Decorations $decs = null) {
  // BakeList is a composite of recipe & ingredients
  $mix = $chef->mix($list->getIngredients(), $list->getRecipe());
  $cookies = $chef->bake($oven, $mix);
  if ($decs) {
    $cookies = $chef->decorate($cookies, $decs);
  }
  return $cookies;
}

We’ve refactored into several easy-to-test components, each with a specific task. This function is also easy to test because we just need to verify how the dependencies are passed and what methods are called on them. If we wanted to go even further we might notice that cookies come out differently in Denver vs. NYC (there’s a hidden dependency on altitude).

But that’s all dependency injection is: asking for what you need and not relying on sniffing dependencies from the environment. It makes code more flexible, more reusable, and more testable. It’s awesome.

I highly recommend Miško Hevery’s 2008 talks about dependency injection and testability, which really solidified the concepts and their importance for me. [This is an improved version of an article I wrote around that time hosted elsewhere.]

19 Mar

WHA!? Our House is For Sale!

Here’s the one-page site I put together to show off the full quality pics:

house.mrclay.org

2 Mar

San Francisco, do you want to be Manhattan?

Another day, another article on S.F.’s crazy real estate market. Except it’s perfectly rational behavior: The secret is out that S.F. is an awesome place to live with plenty of very high-paying jobs, but also land is scarce and there are lots of development restrictions.

If no policies change, prices—both housing and general commodities—will continue to go up until S.F. is basically another Manhattan; if you want to live there you either must be in the upper class or able to live in a shoebox. The good news is that California is wonderful and people can live happily outside S.F. The city would just need to beef up its transportation infrastructure for an enormous commuting class, and the Bay Area will suffer the environmental implications of that.

If, however, you think there’s value in having residents from a wider range of incomes, you have to be willing to build a ton of new, and very dense, housing, including bulldozing some old areas—not every inch of the city can be treated as a historic artifact. Unfortunately plenty of lefties think that anything that’s good for rich developers must be bad for everyone else, and it just ain’t so. I highly recommend reading Matt Yglesias’s bite-sized ebook The Rent Is Too Damn High, which makes very convincing arguments that loosening development restrictions is a great idea for everyone. Lots of people want to live in S.F., and we should let them. Density is great for the economy and for decreasing the environmental impact of cars and commutes.

Renters are already living very densely packed in “single family” homes, so it’s pretty clear there would be plenty of demand for new apartments in a variety of sizes. The natural opposition to this is going to be existing owners that benefit from rising prices, but certainly a motivated majority (renters) could successfully push for expanded development. But until they realize it’s in their interest, they’ll keep complaining about a variety of things that don’t matter while being slowly forced out of the city.

Gainesville has been increasing the density of housing around the university and it seems to be pretty great to me. Until a few years ago it seemed inevitable that there would be ever increasing sprawl and student traffic, but now a lot more students can live in walking distance.

15 Feb

Unpacking the Access Control Systems in Drupal and Elgg

Elgg’s access control system, which determines what content a user can view, is somewhat limited and very opinionated, with several use cases—access control lists, friends—baked into the core system. In hopes of making this cleaner and more powerful, I’ve been studying Drupal’s access system. (Caveat: My knowledge in this area of Drupal comes mainly from reading code, schema, docs, and two great overviews by Mike Potter and Larry Garfield, so please chime in if I run off the rails.) 

Drupal

Drupal’s system also influences update and delete permissions, but here I’m only interested in the “view” permission. Also, although Drupal has hook_node_access()—a procedural calculation of permissions for a node (like an Elgg entity) already in memory—I’m focusing on the systems that craft SQL conditions to fetch only nodes visible to the user. This is critical to get right in the SQL, because if your access control relies on code, you can never predict the number of queries required to generate a list for browsing. In this area, Drupal’s realms/grants API (hook_node_grants()is extremely powerful.

Realms and Grants in a Nutshell

At a particular time, a user exists in zero or more “realms”; more or less arbitrary labels which may be based on user attributes, roles, associations, the current system state, time…anything. Each realm has been granted (via DB rows) the permission to view individual nodes. So to query, we build up a user’s list of realms, this is baked into the query, and the DB returns nodes matching at least one realm.

E.g., at 2:30 PM today, an anonymous visitor might be in the realms (public, time_afternoon, season_winter*), whereas Mary, who logged in, might exist in the realms (public, logged_in, user_123, friendedby_345, role_developer, team_A, is_over_30, time_afternoon, season_winter). So Mary will likely see more nodes because her queries provide more opportunity to match grant rows. *Note these realms are made up examples.

Clearly this is very expressive, but Drupal (maybe for better) doesn’t provide many features out-of-the-box, so (maybe for worse) doesn’t build in many realms; the API is mostly a framework for implementing an access control system on top of added features. Contrib modules appear up to the task of providing realms based on all kinds of things (groups, taxonomies, associations with particular nodes), but it’s hard to collaboratively build an access control system, so these modules apparently don’t work well with each other and non-access modules must be careful to tap into the appropriate systems to keep nodes protected.

(Implementation oddities: The grants are done in the node_access table, which probably should’ve been called “node_grants”, especially because this table is only somewhat related to the hook called “node_access“. Less seriously—depending on your system size—each realm name (VARCHAR) is duplicated for every node/realm combination, so there’s some opportunity for normalization.)

Elgg

If you squint, Elgg’s system is a bit similar. Each entity has an “access level” (a realm), with values like “public”, “logged in”, “private”, “friends”, or values representing access control lists (a group or a subset of your friends like a Google+ circle).

That an entity can have only one realm is of course the biggest (and most painful) difference, but also the implementation is significantly complicated by some realms needing to map to different tables. E.g. Elgg has to ensure “friends” maps to rows in an entities relationship table based upon the owner of the entity, while also mapping to the ACL table.

I imagine a lot of these differences come from Drupal being old as time with a lot bigger API reboots, and because Elgg’s access system was targeted to meet the needs of features like friends and user groups, which were built-in from the beginning. It’s hard to predict which schema results in faster queries, and will depend on the use case, but Drupal queries I would guess are easier to generate and safer to alter.

Conclusions

I think in the long run Elgg would be wise to adopt a realms/grants schema, though I would probably suggest normalizing with a separate “realms” table to hold the name and other useful bits. Elgg group ACLs and friend collections would map directly into realms, but friend relationships would need to be duplicated into realms just like groups have an ACL distinct from the membership relationship. Really I think a grants table could completely replace Elgg’s “entity_relationships” table, since both tables just map one entity to others with a name.

As for Drupal, I think the docs could more clearly describe realms and grants (unless I’ve totally got this wrong). I’m less sure of the quality of the API that populates/maintains the tables; it looks like the hooks are pretty low-level ways of asking “would you like to dump some rows into node_access?” and it’s not clear how much of the table must be rebuilt or how often this happens.

14 Feb

GitX for Cleaner Commits

A great way to make your pull requests easier to review is to reduce each commit to a particular purpose/functional change. In practice this often means not combining multiple feature additions in a single commit, or not including whitespace changes in a commit that makes some code change.

The command-line git add is just not great for anything but very trivial changes, so I use GitX (the active fork) when I’m on OSX.

Getting to the UI is a quick gitx in terminal and command+2 to switch to the Stage view. Here you can move changes in/out of staging via drag/drop files or by clicking on individual lines in diff views. This lets you run wild and loose during development then, with a scalpel, carefully craft your changes into a logical set of cleaner commits. E.g. if you’ve altering code, unstage all your unrelated whitespace changes and put them in the last commit, or just revert them or git stash to move them to another branch.