No, I meant “Tell them the MESSAGING SERVICE was down”. Not the stock price! Doh!
How embarrassing for such a well-respected Canadian company as RIM to let slip that it hasn’t protected its critical systems adequately. The most telling news came from the long silence, followed by an information shell-game when they did release a statement. Here are the things I find disturbing about the April 17-18 outage in BlackBerry service.
- The outage lasted for 12 hours. This is a long time to leave powerful people like politicians, lawyers, executives, sales people (OK, I’m stretching it…) without a text messaging service that they have come to depend on. It’s good that people depend on you; it’s what we most want. However, it’s a wake-up call that a network can be that vulnerable to a single outage. The company obviously does not appreciate the criticality of their service to their clients, regardless of their Service Level Agreements with carriers. You can point at SLAs all day long, but in the end, you lose trust if you let customers down in this highly visible way. Their expectations were not met.
- The outage was caused by inadequate testing, which is another sign that RIM do not consider it necessary to fully test every software update to their production servers.
- Their supposed roll-back or failover mechanisms did not revert to the previous system state as soon as the problem was discovered.
- The news, when it came, showed little consideration for customers. The explanation seemed carefully constructed to be ambiguous enough that customers might assume the problem was minor, when in fact many assumed the worst. If they were trying to make people feel more secure, it did not appear to have that effect.
Had RIM done a full Business Impact Analysis or Sensitivity Analysis, I believe they would have already identified the service as critical to their operations. That would have led to a proper testing program and to properly tested roll-back and/or failover capabilities, so that a failed software update would not disrupt service for any significant period of time. They would also have recognized that communicating the status of the outage and the expected resolution is the best way to let clients know that it was not a security breach that caused the outage.
While these may not seem like Security Management issues to some, they are tell-tale signs that lead me to believe we could find more glaring security risks being ignored inside the company. An obscure reference to the Tom Peters book, In Search of Excellence, comes to mind, where he explained why Southwest Airlines told him they took such care in cleaning seat back tray tables. He said something like, ‘When an airline’s passengers find their tray tables with dirt on them, they tend to assume the company doesn’t do thorough engine maintenance.’
RIM had a chance to show everyone how proactive, business-like and customer-centric they could be. Instead, through their inaction and then their poor explanation, they unintentionally led us to believe they may have problems under the hood.
The only good thing was, I only noticed the outage about 10 minutes before they announced that the problem had been fixed.

