Kolton Andrus on "Engineering for Chaos: Preparing for Disaster"

Ben Freeberg:        Hi everyone, this is Expert Open Radio.  I’m your host Ben Freeberg.  We are the hosts of TEDxAsburyPark.  Today we are here with an expert speaker, Kolton Andrus, who is a speaker at this year’s TEDx conference on May 18th.  Welcome Kolton.

Kolton Andrus:        Thank you very much.

Ben Freeberg:        Love to hear a little bit about yourself, how you're going to be pitching your idea and what the talk's about.

Kolton Andrus:        By trade I'm a Chaos Engineer, which sounds cool but let me break that down a little bit. We do Chaos Engineering. It's this kind of counterintuitive idea that we want to go break our systems in order to make them stronger. The analogy I always use when I'm home for the holidays or with my family is that of the flu shot or the vaccine. If you go back 200 years and ask people, "Hey, I'm going to inject you with this disease. Is that cool?" you might have gotten a little bit of pushback.

Kolton Andrus:        That's kind of where we're at on the technical side when we run this by engineers and say, "Hey, we want to break our systems in order to find the weak spots and make them stronger. How do you feel about it?" Some get very excited, some are like, "Whoa, really? Is that a good idea?"

Ben Freeberg:        So then who are you pitching to at those different groups? Is it the engineers themselves or is it someone in management? Who gives the most pushback? Who's the most receptive?

Kolton Andrus:        Yeah, so certainly the messaging is for the engineers. Me and my cofounder were both on-call engineers at Amazon, Netflix, Salesforce, and so we carried the pager. When Amazon broke at two in the morning, I hopped on a call and figured it out with a bunch of other people and fixed it. That's really who the messaging is for. We want to save that pain. We want to prevent those outages. We want to prevent people from getting woken up. We're lazy engineers. We just want the system to work well.

Ben Freeberg:        Are there any types of businesses in particular that have found your approach really exceptionally insightful or engaging?

Kolton Andrus:        Yeah, so we've certainly focused in the software industry, in particular coming from Amazon.com, the e-Commerce world, the financial world, the SaaS (Software as a Service) world, those make a lot of sense. Those are businesses that people expect to be always online and when they're down, they're losing a lot of money. Take Amazon. If they're not taking orders for a minute they could be losing anywhere from $10,000 to greater than $100,000, so it's well worth the time and investment to try to prevent every minute of downtime and the customer pain that comes with them.

Ben Freeberg:        So spending time on that side of e-Commerce, where do you see it within the next few years?

Kolton Andrus:        Yeah. It's been fun to watch the e-Commerce space grow. Amazon definitely pushed on people and you've seen the rest of the market, the rest of the industry change their approach. You know, I think this was a real competitive advantage for Amazon. They used this to have a higher degree of quality and a higher degree of reliability than their competitors. When you go back 10, 15, 20 years people would wait 30 seconds to a minute for a webpage to load.

        I used to get these AOL (America Online) discs in the mail. It would take an hour or two to download. You contrast that with the world we live in today, people get frustrated if they need to wait more than a second or two for their pages to load, and so the bar has just risen. People have higher expectations and it's more important that things work when we need them to.

Ben Freeberg:        So going back to the talk itself, so what are you going to call this talk? What's the title?

Kolton Andrus:        Embracing Chaos.  The gist is this idea that our world has become much more complex, whether it's our software systems, whether it's our government, whether it's our transportation and people are evermore reliant on this technology. When an airline has an outage people are unable to travel for work, they're unable to see their loved ones. It becomes a huge impact to society, and we think about medical technology, we think about government, the ability for people to get help, to be able to get loans, to finance.

        So we live in this world where everything has to work, but it's gotten much more complex. This idea of Chaos Engineering is really about taming that complexity. It's not about causing chaos. This is one of the misnomers. A lot of people think oh, we're going to chaotically affect our environment to understand how it happened. It's kind of the other way around. Our environment is chaotic. We're using this approach to really understand how the pieces fit together and how the failures occur so that we can better understand it, make it more stable, make it more reliable.

Ben Freeberg:        And so how could some of the less technical audience that's listening apply that idea to their day-to-day or their business?

Kolton Andrus:        We're just using the Scientific Method. We have a hypothesis, okay? So there was a big S3 (Amazon Simple Storage Service) outage a couple years ago. S3 is where people store a lot of their documents and data and a lot of people on the Internet went down because that didn't work right. So that's one of our hypotheses, you know? Hey, if we lose this ability to store data in the Cloud, how will our systems react?

        And from that there's some measurements, there's some understanding what would we do instead. Is there a way we could get around that? That thought process really gets us to the point where we're diving into that complex system and we're sussing out the side effects and these little details that may have prevented it.
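
As a rough illustration of the hypothesis-then-measure loop described above, here is a minimal sketch in Python. Everything in it is hypothetical: FlakyStore is a toy stand-in for cloud storage (not a real S3 client), and the hypothesis being tested, "if storage is unavailable, readers still get the last cached copy," is an assumed example rather than one taken from the interview.

```python
class FlakyStore:
    """Hypothetical stand-in for a cloud object store; not a real client."""

    def __init__(self):
        self.available = True
        self._data = {"report.txt": "quarterly numbers"}

    def get(self, key: str) -> str:
        if not self.available:
            # Simulates the outage we inject during the experiment.
            raise ConnectionError("storage unavailable (injected failure)")
        return self._data[key]


def read_document(store: FlakyStore, cache: dict, key: str) -> str:
    """Read from the store; fall back to a cached copy if the store is down."""
    try:
        value = store.get(key)
        cache[key] = value  # remember the last good copy
        return value
    except ConnectionError:
        if key in cache:
            return cache[key]
        return "document temporarily unavailable"


def run_experiment() -> dict:
    """Hypothesis: with storage down, readers still see the cached document."""
    store, cache = FlakyStore(), {}
    baseline = read_document(store, cache, "report.txt")  # steady state
    store.available = False  # inject the failure
    try:
        during_outage = read_document(store, cache, "report.txt")
    finally:
        store.available = True  # always roll the failure back
    return {
        "baseline": baseline,
        "during_outage": during_outage,
        "hypothesis_holds": during_outage == baseline,
    }


if __name__ == "__main__":
    print(run_experiment())
```

The point of the sketch is the shape of the experiment: record normal behavior, inject one failure on purpose, measure what happened, and undo the failure whether or not the hypothesis held.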

Ben Freeberg:        So why now? What's going on with either our world, from the technological standpoint or just in terms of the social side where you think that this is the important time to share this idea? I know you've been working on it for some time and a little bit ahead of the curve, but what's going on today?

Kolton Andrus:        Ten years-plus I've been working on this idea, so a little bit ahead of the curve. On the software side our software's become a lot more complex. There's this concept of microservice architectures and so now we have, think of it almost like a graph. There's all these points and there's all these interconnected relationships. It's not five or 10, it's hundreds of these points.

        So there's two to the one hundred combinations of how things can interact and fail or have these side effects, so that complexity's a big part of it. On the software side we're running less of our own hardware in data centers. The move to the Cloud, people are now trusting Amazon, Microsoft, Google to host and run their infrastructure, but the truth is failure happens at scale often. A failure that could happen one out of 10,000 times, if you have 10,000 machines it could happen every day. Those failures are somewhat unavoidable and so we just need to be able to prepare for them.
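
The numbers in this answer can be checked with a few lines of Python. This is only a sketch under stated assumptions: each of the hundred services is treated as simply "up" or "down," and "one out of 10,000" is read as an independent per-machine, per-day failure probability. Neither assumption comes from the interview, and the function names are made up for illustration.

```python
def up_down_combinations(services: int) -> int:
    """Number of distinct up/down states for a set of services (assumes 2 states each)."""
    return 2 ** services


def expected_failures_per_day(machines: int, p_fail: float) -> float:
    """Average number of failing machines per day (assumes a per-machine daily probability)."""
    return machines * p_fail


def prob_at_least_one_failure(machines: int, p_fail: float) -> float:
    """Chance that at least one machine fails on a given day (assumes independence)."""
    return 1.0 - (1.0 - p_fail) ** machines


if __name__ == "__main__":
    print(up_down_combinations(100))
    print(expected_failures_per_day(10_000, 1 / 10_000))
    print(prob_at_least_one_failure(10_000, 1 / 10_000))
```

Under those assumptions, a fleet of 10,000 machines averages about one failure per day and has roughly a 63% chance of seeing at least one failure on any given day, which is why a "rare" event stops being rare at scale.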

Ben Freeberg:        So what are some of the key milestones in terms of maybe partnerships or "lucky breaks" that propelled you forward and kept you guys going?

Kolton Andrus:        So we had the opportunity to take this idea and approach and build it at Amazon. We had the opportunity to take it to Netflix, build it there, see a lot of value. See really lots of money saved, lots of engineering time saved, so that really got us started. I had the opportunity to get some VC (venture capital) funding. It's kind of a fun story. I got in an argument with a VC (venture capitalist) in the lobby of a conference about why I wasn't going to take money and I was going to bootstrap, and I have five kids, I live in California. It'd be a bit of a financial burden to really bootstrap the kind of company we want to build.

        So I think that was kind of our lucky break. It was an opportunity to find some people that really believed in what we were doing, that were willing to back us and support us and help us build the business and understand how to provide value to our customers.

Ben Freeberg:        That's great. What were the few big things you started now that you had that, a little bit more of the discretionary spending side or that ability, what were some of the big things you put it towards?

Kolton Andrus:        Yeah, it's funny, we came from Amazon. One of the core values there is being frugal so we're very thoughtful about how to spend our money. Again, with a large family we were always budget-conscious. Part of it was just being able to work on it full-time. We quit our day jobs. We were all in on this company. We were going out and building the first version. It allowed us to go talk to the customers, to understand what pain they were facing, what kind of solution they needed.

        Further down the road it let us really build a much bigger team. Our company's almost 50 now and we've grown 4X, 5X last year, so that really lets us lean in and we've gone and found the experts in the space, the people that have felt this pain and recruited them to our cause and to help us make our customers' lives better.

Ben Freeberg:        More on a personal note, so what are some things either other TED Talks, books or ideas that inspire you and propel you that you'd want to share with our listeners?

Kolton Andrus:        There's a few there. Obviously, there's a lot of great TED Talks. It's hard to pick a few. I love Simon Sinek's on The Power Of Why. Why do people care about things?

Ben Freeberg:        I agree.

Kolton Andrus:        That's a favorite of mine and I enjoy his book. On the technical side, Nassim Nicholas Taleb has written a couple of books, Anti-Fragile (Antifragile: Things That Gain from Disorder), Black Swan (The Black Swan: The Impact of the Highly Improbable), Skin In The Game (Skin in the Game: Hidden Asymmetries in Daily Life). Kind of the crux of those is the people that are feeling the pain are the most motivated to fix it, and so often you want to align that pain so that people, you want engineers, we've seen this with this DevOps SRE (Site Reliability Engineering) trend. By making the engineers that write the software be on-call for it and get paged when it breaks, they care a lot more about the quality and preventing those things from occurring.

        Another one I recommend a lot that I've learned on kind of the negotiation business side is a book called Never Split The Difference. Chris Voss used to work for the FBI as a lead hostage negotiator.  He really learned how to negotiate well and lives were on the line. He's a very, very good writer. It's stories about what he learned and how people think and how to influence them in the right way. So I make all my sales team read that book because ...

Ben Freeberg:        What was the name of that one one more time? I want to ...

Kolton Andrus:        Never Split The Difference.

Ben Freeberg:        If the audience listening wants to have a bigger relationship with both yourself and this idea, what are some easy ways they could do it?

Kolton Andrus:        Twitter's a good one for that kind of bite-sized content. I'm @KoltonAndrus. Our website talks a bit about this idea of Chaos Engineering. We've actually, we've spent a lot of time just teaching people about this concept and how to do it well and so we've got a public Slack community people can join through our website.

Ben Freeberg:        Kolton, thank you. Thank you so much for taking the time to be on here with us and then just to the audience, you've been listening to Expert Open Radio. Just a reminder to get your tickets for the largest, highest rated TEDx Conference on the East Coast, TEDxAsburyPark on May 18th, 2019, and you'll get to hear our friend Kolton speak and share some more words of wisdom. So thank you again, Kolton.

Kolton Andrus:        Thank you very much.