The dangers of downtime in government IT

According to Federal Computer Week, the Trademark Electronic Application System managed by the U.S. Patent and Trademark Office (USPTO) was struck by a power outage on Dec. 22. It took almost a week, until Dec. 28, for most major functionality to be restored. Even though the outage mostly fell during a holiday weekend, it shows that network outages are still a clear and present issue that government IT needs to address.

Ensuring that systems stay online come snow, rain, heat, or gloom of night is a difficult prospect for many IT professionals, especially in federal IT, where contracts, mandates and regulations make it difficult to stay up to date and respond to problems when they arise. Unforeseen technical issues and necessary system upgrades for compliance, performance and security abound.

Minutes matter
A study conducted by MeriTalk found that 70 percent of federal workers experienced network outages of longer than 30 minutes. That's a distressingly high proportion – holdups in government IT systems can cause considerable issues affecting the smooth operation of government services. When systems go down, all the groups and individuals who rely on them are left out in the cold.

Outages at any scale can have real effects. In August 2013, Google suffered an outage that took all of the tech giant's myriad services offline for just five minutes. Analytics firm GoSquared found that Web traffic dropped 40 percent during those five minutes, and The Financial Times estimated that Google lost half a million dollars in revenue in that time.

That was just a five-minute outage. The lost revenue might be pocket change for a company such as Google, but outages like the one at the USPTO stretch into days, and each lost hour of downtime can seriously damage both the integrity and the reputation of a service.
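
To put the gap between a five-minute outage and a multi-day one in perspective, the short Python sketch below converts downtime into an availability percentage. It rests on two assumptions that are not in the reports cited here: a 365-day measurement window, and a USPTO outage treated as roughly six full days (Dec. 22 to Dec. 28). The function name is illustrative only.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes, period_days=365):
    """Return the percentage of the period during which the service was up."""
    period_minutes = period_days * MINUTES_PER_DAY
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (1 - downtime_minutes / period_minutes)


# Assumption: a single five-minute outage in one year (the Google example).
five_minute = availability_percent(5)

# Assumption: about six full days of downtime in one year (the USPTO example).
six_day = availability_percent(6 * MINUTES_PER_DAY)

print(f"Five-minute outage: {five_minute:.4f}% availability")
print(f"Six-day outage:     {six_day:.2f}% availability")
```

Under those assumptions, the five-minute outage still leaves availability at roughly 99.999 percent for the year, while six days of downtime pulls it down to about 98.4 percent.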

System outages are a real problem for employees and citizens alike. When internal systems are down, employees can't do their jobs, and when user-facing systems are down, citizens can't access the services they rely upon. Ensuring system reliability means being vigilant: knowing how and where network outages can occur, predicting where an issue might arise, and having a backup plan in case things go wrong.

Uptime urgency
The USPTO outage is fresh in the minds of the public and government employees alike, and it underscores just how important it is for agencies to build strong, working front-end IT systems. Computers and the Web are not niche tools that agencies are only now starting to embrace; they are how virtually all work is done everywhere. Outages in computer networks cause frustration, inefficiency, and mistakes.

That's not to say that agencies aren't aware of the importance of uptime. The MeriTalk study found that federal IT professionals overwhelmingly recognize reliability as a top priority, but only 19 percent of them say they actually have the resources necessary to achieve 100 percent uptime.

The study showed that agencies have only half of the storage, computing power and staff that they need to prevent outages. But it isn't always easy to expand staff or resources. Partnering with a third-party IT provider can save money and time by tapping into an organization that already has experience ensuring network uptime and cybersecurity.

Preventing network outages is a central tenet of good enterprise management. As Network Computing contributor Joel Dolisy pointed out, a large majority of outages are caused by human error, and that means simplification of IT policies and consistent monitoring of network services are crucial to making sure systems stay online.

Outside of human error, network outages are primarily caused by technical issues: configuration and update errors; overstressed components; environmental problems, such as improper climate control; or blackouts, like the one that caused the outage at the Patent and Trademark Office.

Of course, not every agency has the luxury of simplifying its IT systems. Because of this, it's even more important that monitoring be used to ensure the integrity of the systems these departments protect, and that they work with organizations that specialize in IT service management to keep systems running at peak efficiency.
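
As a rough illustration of what consistent monitoring can look like at its simplest, the Python sketch below polls a set of service URLs and reports which ones fail to respond. Everything in it is an assumption made for illustration: the function names, the five-second timeout, and the idea that a service counts as "up" when it returns a non-error HTTP status. A production monitoring setup would add scheduling, alerting and far richer checks.

```python
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def find_down_services(services, probe=http_probe):
    """Return the names of services whose probe fails, in sorted order.

    `services` maps a human-readable name to the URL to check. `probe` is
    any callable that takes a URL and returns True when the service is up,
    which makes it easy to swap in a different kind of check.
    """
    return sorted(name for name, url in services.items() if not probe(url))
```

Calling `find_down_services` with a dictionary such as `{"filing portal": "https://filing.example.gov"}` (a hypothetical address) returns a list of the names that did not respond. Keeping the probe as a separate, replaceable function is a deliberate design choice: the same loop can then run ping checks, database checks or anything else an agency needs to watch.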