Amazon Web Services storm outages serve as a warning of cloud risk to businesses

Australian businesses have been warned they need to spread the risk in their cloud computing operations across different regions after the Sydney storms on Sunday knocked out the operations of numerous Amazon Web Services customers.

The ferocious storms that hit NSW left AWS clients including Domino’s Pizza, Foxtel, The Iconic, Stan and Domain without websites or key systems for hours.

The outage served as a warning that moving systems to the cloud, rather than hosting them on-premises, did not remove the risk of costly failures.


The failure represents a major embarrassment for the company, which generated $US2.57 billion in revenue in the latest quarter, largely on the strength of its reputation for reliability.

It meant that customers who had committed all of their systems to AWS were unable to trade from mid-afternoon until as late as Monday morning.

On Monday AWS refused to discuss the reasons behind the outage, instead referring inquiries to an online page showing the status of its data centres.

A spokesman also declined to comment on whether affected businesses would be entitled to compensation, or on what damage the outage had caused to its own business locally.


It is understood that power was lost to a number of systems in its data centre before being restored about 90 minutes later. It then took hours for some systems to be rebooted and brought back online.

Experts said that customers who insisted all their systems and data remain in Australia had been affected, while other customers simply had their systems switched over to Singapore and continued to trade as normal.

IBRS analyst Joe Sweeney said many would use the outage as an example of why businesses should not adopt cloud computing.

However, he said it was more of a wake-up call for businesses to structure their technology so that the failure of one data centre would not be fatal.


“In short, I think the outage is damaging for the entire market – not just AWS,” Mr Sweeney said.

“I think this event is a great opportunity for organisations to rethink what they were trying to get from moving to the cloud. It is also a cautionary tale – ensure your critical applications are built to survive failures of any data centre.”

The outages came as more than 226,000 homes and businesses lost power during the weekend storms. Roads, bridges and public transport were also affected.

The bodies of three men were also discovered in cars caught in floods in separate incidents in the ACT, the NSW Southern Highlands and Sydney’s south-west.

So far, $30 million worth of insurance claims have been made.

Co-founder of event software company Proxima, Dan Nolan, said his company was an AWS client, but was saved from the outage because it had decided to take a multi-availability zone failover approach.

“When we were setting up our infrastructure a bit over a year ago, we took a punt on multi-availability zone failover,” he said.

“It was literally for if this situation ever occurred … the cost otherwise though would have been quite substantial to the business. Having multi-availability zone is a fraction of the cost to having a site go down.”
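The multi-availability zone approach described here can be illustrated with a minimal Python sketch of failover. The sketch tries endpoints in priority order and routes to the first healthy one. The zone and region names, the `Endpoint` type and the `is_healthy` check are hypothetical. A real AWS deployment would normally rely on managed load balancing and DNS health checks rather than hand-rolled code. The optional `allowed_regions` filter models the data-residency choice described earlier, where insisting that everything stay in Australia ruled out switching to Singapore.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Set


@dataclass(frozen=True)
class Endpoint:
    name: str    # hypothetical zone label, e.g. "syd-a"
    region: str  # hypothetical region label, e.g. "sydney"


def select_endpoint(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
    allowed_regions: Optional[Set[str]] = None,
) -> Optional[Endpoint]:
    """Return the first healthy, permitted endpoint in priority order.

    Returns None when every permitted endpoint is down, i.e. the site goes offline.
    """
    for ep in endpoints:
        # Data-residency rule: skip endpoints outside the permitted regions.
        if allowed_regions is not None and ep.region not in allowed_regions:
            continue
        if is_healthy(ep):
            return ep
    return None


# Usage: two Sydney zones plus an offshore fallback (all names hypothetical).
endpoints = [
    Endpoint("syd-a", "sydney"),
    Endpoint("syd-b", "sydney"),
    Endpoint("sg-a", "singapore"),
]
down = {"syd-a"}
print(select_endpoint(endpoints, lambda ep: ep.name not in down))
# Endpoint(name='syd-b', region='sydney')
```

Losing one zone is survivable because a second zone exists. If both Sydney zones fail and `allowed_regions={"sydney"}` is set, the function returns `None`: the residency constraint leaves nowhere to fail over to.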

Mr Nolan said the company had taken heed of lessons learnt in the United States where businesses such as Instagram had suffered major outages in the past.

Proxima was not the only company with a backup strategy in place.

Mobile ordering and payment technology company AirService uses AWS as its main provider, but had another provider ready to go in case of an outage.

“As amazing as AWS is, and we’ve been using it for a long time, we do understand that things can sometimes go wrong,” AirService chief executive Dominic Bressan said.

Telsyte analyst Rodney Gedda said Sunday’s storm showed that cloud computing wasn’t infallible, and that organisations needed to factor in risks when weighing up their strategy. Despite the failure on Sunday, AWS systems are likely to be much more robust than those run by individual organisations.

“No outage is ‘good’, but it’s unlikely this event will damage AWS’s brand too much. All the main cloud players have experienced some form of downtime so Amazon is not alone and events like these are generally accepted by customers,” Mr Gedda said.

“AWS will no doubt perform a post mortem on this event and do what it can to prevent it from happening again … Cloud services won’t magically protect your business from downtime or data loss. You still need to be proactive.”

 


Talking innovation and disruption with Telstra’s CRO

Kate Hughes talks to StrategicRISK about how risk management is helping Australia’s largest telecommunications provider become a global technology player

It’s 6.20am and Kate Hughes’s phone goes off. The chief risk officer for Australia’s largest telecommunications provider, Telstra, has been called to activate the crisis management team to deal with a major outage affecting thousands of customers.

By 7am, an action plan is in place and Hughes can begin her day. But an hour later she receives a report from a whistleblower alleging bad behaviour by a senior executive, which sees her launch an immediate internal investigation through her fraud team. Then, a few hours later, Hughes is alerted to a customer privacy breach, so it’s then a call to the regulators to notify them of the incident.

It’s not even lunchtime, and Hughes has already fielded more incidents than most chief risk officers would see in a month.

Hughes has agreed to an interview with StrategicRISK to discuss how risk management is helping Telstra navigate a strategic business model change from a traditional domestic telecommunications provider to a global technology company.

But first, a history lesson.

Telstra is one of Australia’s most well-known companies. The country’s largest telecommunications provider builds and operates networks around Australia and markets mobile, internet access, pay television and other entertainment products and services.

But the pace of digital change has not been kind to traditional telcos, forcing Telstra, and most of its competitors, to pivot from their historic business model.

Today the company has its sights on being a global technology company.

Last year the company invested almost $1.2b in acquisitions, including a controlling stake in 15 new businesses. It also expanded its reach in Asia through acquiring Pacnet in Singapore and launching TelkomTelstra in Indonesia, and activated new business units such as Telstra Health.

This pace of change, coupled with the profound shift in the way people connect and communicate, means Telstra faces a challenging set of business risks that threaten its ability to achieve its growth ambitions and financial targets.

This is where Hughes comes in.

“Most people say to me, I’ve got one of the most interesting jobs in the company and I would agree that I do. There’s very little that I’m not across, or not involved in, or not able to add value to,” Hughes says. “I get to make decisions about the kind of ladders we use in the field, I get to talk about the risks of having handbrake alarms in some of our cars, and I also get to talk about the risks of technology disruption as it will impact on our strategy to be a world-class technology company.”

The risk function at Telstra has evolved significantly over the past three-and-a-half years under Hughes’s leadership. The 160-strong risk office now looks after the group’s risk management, compliance and privacy functions, as well as its law enforcement capabilities, fraud investigations, enterprise resilience, security, and health, safety and environment arms.

Hughes, who reports to chief financial officer Warwick Bray, admits she is lucky to work for an executive team that takes risk management seriously.

“It’s a privilege to be involved in something that helps our executives make better decisions,” she says.

And with the pace of change that Telstra is facing, that decision-making needs to happen quickly.

“We can be disruptive or we can be disrupted and we’ll probably be both. That’s not necessarily a bad thing. I think disruption creates solid incentive to be more innovative and that’s good,” she says.

Telstra is undergoing a major internal simplification process, driven by the risk of not being able to keep up with younger, more agile, tech start-ups.

“I’m in a meeting every Tuesday morning on this to see what am I doing to help us get there,” Hughes says, adding that she sees the company’s simplification and disruption impetus as an opportunity to show the benefits of risk-based decision making.

“Everything we do requires us to do a risk assessment and that shouldn’t be seen as an onerous, bureaucratic thing, but actually built in to our processes every day.

“Part of the business case is doing a risk management assessment. You don’t tack it on the end, it’s not done at five minutes to midnight, it’s not done once we’ve agreed to everything else … it’s part of the process.

“That is the evolution of risk management – to take it out of the academic, out of the process, and make it much more part of the business conversation so that it actually adds value to the commercial decision-making challenge that your leader has,” she says.

Hughes cites an example involving the head of Telstra’s property division, who had to decide how to allocate his spending on upgrade work at the group’s exchange sites. By applying a safety rating to every exchange, Hughes’s team was able to prioritise which sites should be worked on first.
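The article does not say how the safety ratings were turned into a work plan. One simple reading is "fund the most hazardous sites first until the budget runs out", sketched below in Python. The rating scale, the site names, the costs and the greedy budget rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exchange:
    site: str
    safety_rating: int   # hypothetical scale: 1 = most hazardous, 5 = safest
    upgrade_cost: float  # hypothetical estimated cost of the upgrade work


def prioritise(exchanges: List[Exchange], budget: float) -> List[str]:
    """Fund upgrades in order of safety rating (worst first) until the budget runs out.

    Greedy rule: a site costing more than the remaining budget is skipped,
    and cheaper, lower-priority sites may still be funded after it.
    """
    plan: List[str] = []
    remaining = budget
    # Lowest rating (most hazardous) first; site name breaks ties deterministically.
    for ex in sorted(exchanges, key=lambda e: (e.safety_rating, e.site)):
        if ex.upgrade_cost <= remaining:
            plan.append(ex.site)
            remaining -= ex.upgrade_cost
    return plan


sites = [
    Exchange("A", 3, 100),
    Exchange("B", 1, 200),
    Exchange("C", 4, 50),
    Exchange("D", 1, 400),
]
print(prioritise(sites, budget=500))  # ['B', 'A', 'C']
```

Site D is one of the most hazardous in this example, yet it is skipped because it does not fit in what remains of the budget. A real plan would likely flag such a site for a funding decision rather than pass over it silently.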

Back to where it started

In some ways, Hughes has come full circle in her role at Telstra.

After graduating with a commerce degree with majors in economics and finance, she took up a role at the NSW Treasury. One of the first companies she audited was Telstra, working out of the very same Melbourne offices she sits in today.

She then moved to the Sydney Futures Exchange, where she was responsible for monitoring the open trading floor for rogue or illegal trades during its final year of operation.

“I was one of about four women in a room of 400 men that had some pretty bad behaviours,” Hughes recalls.

From there, she moved to the Australian Securities and Investments Commission (ASIC), the country’s corporate, markets and financial services regulator. And it’s this insider experience which has proved invaluable to Hughes at Telstra – one of the country’s most highly regulated companies.

“One of our big risks is going to be a rapidly changing regulatory environment,” she says. “It will go to things like how we regulate data ownership and data sovereignty in the long term.”

Regulators around the world are struggling to keep up with the implications of new technology – and most are doing so at different paces, not to mention with vastly different strengths of legislative iron fists.

For a company with global expansion plans, this adds a huge layer of complexity.

“How do you grow in those countries where your company’s cloud strategies aren’t going to fit with theirs, for example?” she says.

“[Regulation] has the potential to certainly change how we develop and market products. It’s one of the material risks that we talk to the board about. What you have to get very good at doing is staring over the horizon beyond your normal two to three-year period, out to five to eight years and start to think about what regulation will matter then.”

In a disruptive environment, Hughes also sees the potential for corporates to challenge existing regulation.

“If you look at Uber and Airbnb as two business model challenges, everybody talks about those as being challenging at a business model level, but what for me was most interesting is that they challenged existing regulator models as well. Uber drivers never stopped and said ‘I need a taxi license’. So what would happen to us if we fundamentally changed [current] regulation? We do a lot of black swan thinking about some of those risks,” she says.

Cyber and security challenges

In the nearer term, Australia is set to bring in data loss notification laws which will force companies to advise customers when their details have been unlawfully accessed.

“It’s not going to be a huge issue for us because we’ve always thought long and hard about who we should tell when we’ve had a breach of some kind,” Hughes says.

This stance was put to the test last year. Just two weeks before Telstra’s $697m acquisition of Pacnet was finalised, the Asian telecommunications business was hacked by an unknown third party, which gained complete access to the company’s network, including emails and other administrative systems.

Telstra said it wasn’t told about the breach until after the deal’s completion on 16 April.

In that instance, Hughes says Telstra voluntarily went to eight different regulators about the breach.

“Each one had different expectations about whether or not we would or should tell them,” she says. “We’ve always felt better to be upfront and honest. The worst thing you can do is look like you’re hiding it.”

But Hughes fears that the new breach notification laws could result in consumers getting “notification fatigue”, where they fail to act on important data breaches because they are alerted to them so frequently.

Instead, when it comes to cyber security, Hughes is turning the lens on the company’s employees, who are often considered the weakest link in any cyber security programme.

“We run drills to see if we can trick our employees into doing something that they shouldn’t have,” she says, such as clicking on a link or opening a suspect attachment.

In the first drill, 30% of employees failed. That dropped to 18% in the second round.
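The two reported failure rates can be expressed as a change between drills. The calculation below is simple arithmetic on those two figures. The article does not give the number of employees tested, so no absolute counts can be derived.

```latex
\text{absolute drop} = 30\% - 18\% = 12 \text{ percentage points}
\qquad
\text{relative reduction} = \frac{30\% - 18\%}{30\%} = \frac{12}{30} = 0.4 = 40\%
```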

What’s in a name?

Managing major reputation crises is also something that Hughes is well versed in.

In 2005, she was asked to join a company in the midst of a major corruption scandal that saw it on the front page of the papers for more than 400 consecutive days, and its shareholder value slashed by almost $1bn overnight. That company was the Australian Wheat Board (AWB), which was accused of paying millions of dollars in bribes to Saddam Hussein’s regime in Iraq in exchange for lucrative wheat contracts.

“Part of my job was to build the right internal controls, the right risk processes and the right compliance controls to ensure we never ever did that again,” she says.

For four years, Hughes worked with a new management board to help turn the business around.

“Leadership in good times is always a pleasure. The hardest job you will ever do is lead in tough times when there’s bad news on the front page of the paper and your employees feel embarrassed to work for you,” she says.

Hughes believes reputation isn’t a risk as such, but an “outcome of other things you didn’t do very well”.

Regardless, when you’re an organisation the size of Telstra, reputation is incredibly important.

“This year we have put in place much more formal metrics to measure the impact of our resilience on reputation,” Hughes says.

For example, during network outages, Telstra can map social media mentions against the network issues to give an indication of the importance of resilience to its customers.

“It’s also a really good predictor of consumer behaviour, so how many of these [incidents] does it take before a consumer, one, rings up and complains, two, gives us a negative rating, or three, possibly changes services. That’s critical insightful data that we work with marketing, media and communications teams on,” she says.
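The article does not describe how Telstra maps social media mentions against network issues. The Python sketch below shows one simple, hypothetical way to do it: count the mentions whose timestamp falls inside each outage window. The outage IDs, the dates and the inclusive-window rule are illustrative assumptions, not Telstra data.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# (outage id, start, end); ids and times are hypothetical illustration data
Outage = Tuple[str, datetime, datetime]


def mentions_per_outage(outages: List[Outage], mention_times: List[datetime]) -> Dict[str, int]:
    """Count social media mentions whose timestamp falls inside each outage window (inclusive)."""
    return {
        outage_id: sum(1 for t in mention_times if start <= t <= end)
        for outage_id, start, end in outages
    }


outages = [
    ("outage-1", datetime(2024, 1, 1, 15, 0), datetime(2024, 1, 1, 17, 0)),
    ("outage-2", datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 9, 30)),
]
mentions = [
    datetime(2024, 1, 1, 15, 30),
    datetime(2024, 1, 1, 16, 59),
    datetime(2024, 1, 1, 18, 0),  # after outage-1 ended, so not counted
    datetime(2024, 1, 2, 9, 10),
]
print(mentions_per_outage(outages, mentions))  # {'outage-1': 2, 'outage-2': 1}
```

Counts like these could then be tracked per customer or per region alongside complaints and ratings, in the spirit of the "how many incidents before a customer acts" question Hughes describes.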

Hughes is one of the most passionate advocates for strategic risk management that you will meet. But she’s far from traditional.

“The one thing I rarely say to people is that I’m the chief risk officer; what I often say is I’m an executive at Telstra, because part of my job is not just talking about the risks, but talking about the opportunities. At the end of the day my real job is to make sure that our executives know how to make decisions.

“Helping people consciously choose to take risks is good because it means that they’re doing it utterly informed.”

Hughes says that risk managers must move from talking about the “what” – the list of risks and risk registers – to talking about the “now what”.

“Being the person who forces people to sit through three-hour long risk workshops so we can satisfy ourselves that we’ve got 25 pages of risk registers is an academic exercise that has never sat well with me,” she says.

“Doing [risk management] for the sake of governance, whilst necessary, is not necessarily always valuable. Doing it because it helps [the company] make a better decision, save money, spend it more wisely … and potentially be a disruptor yourself because you’ve found a hole in the market that no one else has, that’s where the real value comes from.”

The changes Australia must make in the digitally disrupted world

If we are to keep up in the new world, I think Australia needs to change. We are besotted with the “corridor of comfort” – we don’t like tall poppies and distrust bankrupts – we like the middle too much.

We need to celebrate our successes and support those who fail. In the words of Winston Churchill: “Success is stumbling from failure to failure with no loss of enthusiasm.”

http://www.afr.com/technology/the-changes-australia-must-make-in-the-digitally-disrupted-world-20150904-gjeym9

Risk Management in a Digital Business

I had the pleasure of a six-month stint with a digital tech start-up, and it looks and feels very different from the company I was at previously. The size, the people, the culture and the operational priorities are all different. But there are common things below the surface: revenue and the bottom line, customer value as a key driver, and back-office support functions such as HR and Finance. What is fundamentally different is the pace at which it operates, its responsiveness, its use of data and its capacity to turn ideas into products at amazing speed (go-to-market, or GTM).

Managing two priorities

The very nature of digital means that you are constantly managing two competing priorities: constant, speedy innovation versus the achievement of long-term strategy. Or, as a respected CIO put it, the growing pains of a teenager who wants to party hard but also to grow up. Balancing tactical and strategic decisions is not a new concept in a traditional environment, but in the new digital age it must be done ‘on the fly’. Much of the short-term innovation (the tactical work) actually helps solidify and reinforce the longer-term strategy, and the ability to pivot quickly becomes a key ingredient for success.

This is where I believe aspects of Agile and Lean methodology are the underlying factor, and I believe it’s an irreversible trend. They help teams deal with constant change, and being agile becomes a way of thinking and a way of working rather than simply a project management methodology.

See my previous posts, ‘How does risk management stay relevant in a fast-evolving digital world?’ and ‘How does risk management stay relevant in a fast-evolving digital world? (Continued)’.

Trial and error and de-risk

Underneath all great innovations is a common thread: meeting a customer’s demand. Because customer needs change all the time, the ability to innovate must adapt to them. What drives the success of a tech company is the concept of the minimum viable product. That means being able to deliver incrementally and quickly, but also to shut down products that don’t meet the need. Forget about elaborate planning, business cases, projections and so on; get your hands dirty and trial, test, de-risk, iterate and refine.

Not all projects survive, because not all of them were good ideas to start with. Baby steps and thin slices make it easier to achieve something tangible and to make decisions, and in the worst case they make failure less painful and costly. The company I was at held half-yearly ‘hack days’, where all employees were encouraged to down tools, collaborate with anyone they liked and work on a mini project to solve whatever problem they felt passionate about. The end result surpassed all expectations: lots of great ideas, lots of happy employees, lots of team building and lots of commercialised products.

Agile culture and risk culture

Arguably, building this sort of culture is harder than building a risk-aware culture. Yet I believe a collaborative, transparent and agile culture will embrace concepts such as risk just like anything else. I’d go further and say that a digital environment will foster a risk-aware culture faster and more effectively than a traditional management environment.

The culture that fosters innovation is ultimately the culture that allows for experimentation and failure. Superficially this does not sound so rosy, and it presents challenges to risk professionals. Failures can result from a lack of risk consideration and can mean that the very risk you wanted to prevent in the first place eventuates. But I believe it can be managed, because the ingredients of a risk-aware culture are already there: collaboration, transparency and a focus on value. In this sort of environment, it’s more likely that whatever decision is made is a risk-informed one, and that the team is conscious of the risks involved in going down the chosen path.

Cross-functional Collaboration

Another driver behind digital success is the deep-rooted concept of cross-functional collaboration. Sales, marketing, finance and parts of IT all become part of an extended team. Bringing all these personalities together is a monumental challenge, because the tech teams are neither motivated nor challenged in the same way as the Sales or Marketing teams. I have been to stand-ups, inceptions, town halls and retros, and everyone has something different to contribute. How risk presents itself amongst all these dynamic people remains an interesting challenge for me. But one thing is for sure: risk people need to be physically present. Only by being present can you demonstrate that you are trying to understand the problem at hand, the thinking, the approach and the solution adopted. From there, perhaps, well-thought-out risk advice can be provided.

Different dynamics in IT

I also happened to spend a lot of time with our IT community. My take is that IT has a totally different definition in a digital environment. With agile, the boundary between IT and the business becomes ever more blurred, and IT is actually the enabler of strategy execution. Nowhere else have I seen IT so value-driven, or collaborating so seamlessly with the rest of the business community. Through delivery leads and technical leads, real collaboration happens and everyone focuses on one common delivery goal: the customer.

In many businesses the IT team is purely an infrastructure and/or operations service function. In digitally centric businesses, however, the IT teams are your conduits to execution, and these IT teams are evolving just as sales and marketing staff are. Even within the IT community itself, evolution is occurring as we speak: concepts such as the Scaled Agile Framework, Scrum, Lean, DevOps and Extreme Programming are being trialled, adapted and matured. Logical organisational structures such as squads, tribes and guilds are created to allow for one thing: better collaboration.

Conclusion

Digital is the buzzword now, and disruption to existing business is what most people fear. As many embark on the digital transformation journey, risk management can play a unique role in making sure the end objective is achieved. Taking a risk view of the whole digital landscape provides a balanced view of the pros and cons. Importantly, it ensures that a technology or digital initiative is not adopted independently of the benefit it can provide. A conversation (risk assessment) on the ‘value’ of moving to digital can save millions that would otherwise have been spent on just another ‘feel good’ solution. Technology is an enabler; risk professionals need to remain objective and remind management that the customer always sits at the heart of the solution and that technology is merely a conduit.

How does risk management stay relevant in a fast-evolving digital world? (Continued)

Let’s start off with a recap

Last time we discussed how an innovative culture represents the most important force behind modern business success, driving innovation in both technology and the business models that apply it. We dived into the concept of agile, and agile thinking as the essential ingredient for building an innovative culture. We compared agile culture and its underlying values, and how they influence the ‘way of work’ in different organisations.

In the context of risk management, the differences in this culture undertone, shared values and the way of work have profound implication to the risks faced by the organisation and how these risks can be best managed. Let’s explore this using project management as a relatively straightforward example.

Maybe I should rename this post ‘How to deal with the inevitable agile disruption?’

Project and project risk management

How an organisation runs and manages its projects says a lot about itself. All projects are subject to constraints within the boundaries of the ‘way of work’ because they must work in tandem with resources already committed – people, funding, processes, priorities, inter-dependencies and timing.

Established companies usually run projects within a clear structure of teams, roles and responsibilities, reporting lines, handoff points and a decision-making regime. Embedded within a formalised PMO, traditional project risk management fits into this linear project lifecycle (usually waterfall). It is anchored by clear action points (or management controls) at key milestones, such as qualitative and quantitative risk assessments, management reviews and approval points.


By comparison, in an agile start-up environment project management is turned on its head. Teams are mostly cross-functional, with significant autonomy in deciding what to deliver as a product or value and how. Non-value-adding activities such as process maps and progress reports are considered ‘waste’, because constant progress renders them outdated by the time they are created.

Having evolved from project management, agile thinking has been called the single most fundamental feature of most modern successful tech companies. In advanced agile organisations, the agile way of working extends beyond project management and becomes ‘the norm’ and ‘business as usual’, reinforced by people initiatives on trust, learning, innovation, sharing and value. I have seen organisations continuously deliver products or value through squads, tribes and guilds without formal reporting lines or delegations of authority.

Traditional project risk management addresses three key risks: time, cost and quality. These are significantly mitigated simply by the approach agile philosophy advocates. Agile teams can start with a simple user story and build a minimum viable product, deployed into production through short, iterative sprints punctuated by regular feedback loops. Agile practitioners will know that I have omitted a whole host of other enablers, prerequisites and tools of agile practice.

So how can risk management stay relevant and add value for a company that is yet to embrace the digital revolution and agile (denial), a company in transition (embrace), or a fully digital/agile organisation (improve)? From my recent experience, these are some conversation starters:

For agile aspirants, the value may lie in:

  • Educating people about the ‘agile way of working’ through a risk lens. A better understanding will lead to better acceptance of change. I was certainly confused by jargon such as lean, kata, Kanban, test-driven development and feature toggles, and by why the team believed ‘wall walks’ were better than a ‘status report’.
  • Providing people who come from an established environment with a risk assessment of ‘going agile’: an objective, independent assessment of how key attributes such as cost, time and quality can be managed in an agile environment. Why is it OK for a project team not to commit to a deadline until much later in the project?

For those in transition, the value may lie in:

  • A risk assessment of how the transition, as a project, can affect and deliver value to the company, informing decisions on the investment in resources, people skills and organisational restructuring needed to maximise the value of ‘going agile’.
  • Leveraging risk knowledge of current processes and practices to identify ‘best value’ opportunities for agile adoption. Being an agile champion and a catalyst for incremental change. Highlighting the positive risks associated with agile and the risks of inaction.
  • Doing what risk managers naturally do best: promoting a collaborative culture that values trust, transparency and challenging the status quo.

For those already agile-equipped, the value may lie in:

  • Providing management with comfort over key risk areas for which conventional risk information is no longer available, particularly for C-suite executives, who usually come from a traditional setting. Gone are hefty business cases, periodic status reports and approval gates; in their place are risk insights from stand-ups, wall walks, demos and retrospectives.
  • Performing qualitative risk assessment in this high-velocity environment, where the emphasis is skewed toward stakeholder feedback that is mostly subjective. Visual risk representations such as a risk burn-down chart can bridge communication between project teams and business stakeholders.
  • Contributing to the management of the most important asset, people: devs, agile coaches, BAs, product owners and senior management. How best to recruit, retain, develop and grow people in an evolving environment? How to facilitate, promote and build a culture that is conducive to the agile philosophy?
  • Providing risk support for the operational risks that come with an ever-moving structure and elevated autonomy. Examples include managing proliferating user access, IT change management (DevOps), security and availability considerations within the development lifecycle, incident response and problem management.
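The risk burn-down chart mentioned above can be sketched in a few lines. The post does not define how it is built, so this is one plausible interpretation. For each sprint it sums the exposure of risks still open, using the common probability × impact scoring, so the plotted line falls as risks are closed. The risk names, the probabilities and the "days of delay" impact unit are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Risk:
    name: str
    probability: float                      # team's estimate, 0 to 1
    impact_days: float                      # hypothetical impact unit: days of delay
    closed_in_sprint: Optional[int] = None  # None while the risk remains open

    @property
    def exposure(self) -> float:
        # A common simple scoring: probability x impact.
        return self.probability * self.impact_days


def burn_down(risks: List[Risk], sprints: int) -> List[float]:
    """Total exposure still open at the end of each sprint, i.e. the points on the chart."""
    series = []
    for sprint in range(1, sprints + 1):
        still_open = [
            r for r in risks
            if r.closed_in_sprint is None or r.closed_in_sprint > sprint
        ]
        series.append(round(sum(r.exposure for r in still_open), 2))
    return series


risks = [
    Risk("vendor API unstable", 0.5, 10, closed_in_sprint=2),
    Risk("data migration fails", 0.2, 20, closed_in_sprint=3),
    Risk("key developer leaves", 0.1, 15),
]
print(burn_down(risks, sprints=4))  # [10.5, 5.5, 1.5, 1.5]
```

A flat tail, as in sprints 3 and 4 here, is itself a conversation starter for a stand-up: it shows residual exposure the team is carrying rather than retiring.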


In true agile fashion, risk managers, whatever their industry and wherever their company is on its disruption journey, must also be digital-ready and agile, embracing trial and error, exploration and continuous improvement to stay relevant and add value. What is your experience?
