Most enterprises are sitting on a ticking time bomb, and their business leaders either don’t know it or are not allowing their IT staff to do anything about it. The risk is that major computing systems that run the business every day will stop working because some change was introduced into the environment, usually to address an unrelated issue. For example, it can happen any month when Microsoft releases its latest Windows patch set. But Microsoft is not the only culprit.
I once worked with a manager who told me, “Software rots!” (Parke, I’m giving you credit here in case you ever read this post.) Parke’s maxim was generally a retort to someone whose code had suddenly stopped working for no apparent reason. He knew that something had changed; it was up to the programmer to figure out what it was and why it broke the code. Then, as in today’s computing environments, so many incremental changes were happening at various levels that there was always some cause for every effect and, hence, a reason for every failure.
I call this software atrophy. Atrophy is a term borrowed from biology: Wikipedia defines it as the partial or complete wasting away of a part of the body, and a major cause of atrophy in muscles is lack of use or exercise. I apply the term to software when the technology employed in a system is not kept up to date, which can leave the system incompatible with an ever-changing computing environment. The risk is that Microsoft or some other vendor of software tools and operating systems will patch its product to fix a problem and, in doing so, break a function in a production system that is critical to the enterprise’s operations. It happens all the time.
The proliferation of software development tools aimed at end users is largely responsible for this situation. The number of user-developed systems that have become mission-critical in day-to-day operations is astounding. Often, the authors of these systems have moved on from the organization and the current users only know how to use the system, not how to care for it. When it breaks, it becomes a production problem whether the IT department knows about the system or not.
End users are not the only source of this risk. IT departments have similar issues, particularly with one-off systems they wrote to solve an operational need that the enterprise systems could not address at the time. Unless these systems have been maintained to keep their underlying technologies up to date, the doomsday scenario is only as far away as the next monthly patch set. Typically, the IT staff who developed these systems have moved on, leaving each system without anyone to care for it. The more systems that use the affected old technology, the more carnage that patch will cause.
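To make the idea concrete, here is a minimal sketch of how an inventory could be checked against a patch set. Everything in it is an assumption for illustration: the `System` record, the system names, the owner names, and the technology labels are hypothetical, and a real inventory would come from whatever asset records the enterprise already keeps.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class System:
    """One entry in a hypothetical inventory of user- and IT-built systems."""
    name: str
    technologies: FrozenSet[str]  # labels for the tools/runtimes it depends on
    owner: Optional[str]          # None when the author has moved on
    mission_critical: bool


def exposed_systems(inventory: Iterable[System],
                    patched_technologies: Iterable[str]) -> List[System]:
    """Return the systems that depend on any technology touched by a patch set."""
    patched = set(patched_technologies)
    return [s for s in inventory if s.technologies & patched]


def triage(systems: Iterable[System]) -> List[System]:
    """Sort so that unowned systems come first, then mission-critical ones."""
    return sorted(
        systems,
        key=lambda s: (s.owner is not None, not s.mission_critical, s.name),
    )


# Hypothetical inventory; none of these names come from a real environment.
inventory = [
    System("order-tracker", frozenset({"spreadsheet-macros", "db-driver"}), None, True),
    System("shift-report", frozenset({"desktop-db-runtime"}), "ops-team", False),
    System("invoice-export", frozenset({"db-driver"}), "it-apps", True),
    System("badge-printer", frozenset({"legacy-print-api"}), None, False),
]

# Suppose this month's patch set touches the (hypothetical) "db-driver".
at_risk = triage(exposed_systems(inventory, {"db-driver"}))
for system in at_risk:
    status = "no owner" if system.owner is None else f"owner: {system.owner}"
    print(f"{system.name} ({status})")
```

The sort order is a design choice, not a rule: it simply puts systems with nobody to care for them at the top of the list, since those are the ones least likely to be fixed quickly when they break.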
It is well within the scope of an Enterprise Architecture strategy to identify and manage this type of risk. It’s not sexy, but it is vitally important. It is, in fact, a foundational activity for establishing an Enterprise Architecture program, and one that yields high benefits in the short term. A risk analysis of the probability of a doomsday scenario after a vendor patch set has been applied is a major component of the value proposition for an EA program. This is what business leaders want, so it’s a great way to get established.