
Creeping Dependencies

We had a client issue the other day where a system stopped working because a necessary third-party service just stopped, with little warning. It got me thinking about the nature of dependencies.

When building networked applications, there are necessarily aspects over which you don’t have control. You don’t usually control every mile of the Internet connection that reaches the user, for instance. If it’s a user at home on a shared cable connection and it’s 4pm on a school day, when every kid in the neighborhood has just gotten home to stream video, they’re going to have a slower connection, and there’s not much you can do.

In this case, it was a service dependency, and that can be just as problematic. After all, there are times when you can’t do it all yourself: maybe you need to poll an external database for information, or there’s already a cheap commodity service that will take care of an ancillary need.

But the trick, and one that we didn’t really execute as well as we should have, is to note the risk upfront and do what you can to mitigate it. In our case, we had noted the risk and thought we had some ways to manage it, but when the outage occurred, we immediately saw breakdowns in the process. Paths for escalating the outage notification were unclear, and the error message shown to users was, shall we say, less than graceful.
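To make those two breakdowns concrete, here is a minimal sketch in Python of a wrapper around a dependency call. It is not what we actually ran. The names `call_dependency`, `Result`, `FRIENDLY_MESSAGE`, and the `escalate` callback are all hypothetical, and the retry count is an arbitrary assumption. The idea is simply that a failure does two things on purpose: it notifies one clearly defined escalation path, and it hands the user a readable message instead of a raw error.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical user-facing text; the point is that it is written ahead of time.
FRIENDLY_MESSAGE = "This feature is temporarily unavailable. Please try again shortly."


@dataclass
class Result:
    ok: bool
    value: Optional[str] = None
    user_message: Optional[str] = None


def call_dependency(fetch: Callable[[], str],
                    escalate: Callable[[str], None],
                    attempts: int = 2) -> Result:
    """Call a third-party service, degrading gracefully if it fails.

    fetch    -- performs the real call (assumed to raise on failure)
    escalate -- the single, agreed-upon path for outage notifications
    attempts -- how many times to try before giving up (assumed value)
    """
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return Result(ok=True, value=fetch())
        except Exception as exc:  # any failure of the dependency counts
            last_error = exc
    # Every attempt failed: notify once, then return something presentable.
    escalate(f"dependency failed after {attempts} attempts: {last_error!r}")
    return Result(ok=False, user_message=FRIENDLY_MESSAGE)
```

In use, `fetch` would wrap the real request (with its own timeout) and `escalate` would page or email whoever owns the relationship with the provider. The caller checks `ok` and shows `user_message` when it is false.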

Fortunately, the outage didn’t last all that long, and the upstream provider of the service was very gracious and quick about getting things resolved. But it was a good reminder that every time you create a dependency, whether it’s an AJAX call that relies on a decently fast connection or a threshold service for access, it’s critical to consider how to gracefully handle the situation where it fails.

Because it will. At some point. But in our interconnected world, what’s the alternative?