On Friday, September 28, 2018, I attended Chaos Conf 2018 at the Alamo Drafthouse in the Mission District of San Francisco. This was Gremlin’s first conference and it was a tremendous success. A sold-out crowd of 350 attendees took part in discussions surrounding the emerging field of chaos engineering. The attendees were mainly DevOps/Ops and software engineers who wanted to grow as agents of chaos. The idea behind the event is to reduce risk by ensuring resilience in applications that are deployed to production. You can find video from this event here on YouTube.
The event was emceed by Ana Medina and Tammy Butow, who kept the day moving well and kept attendees entertained. The talks were extremely high value and featured many leaders in the world of cloud computing, including Microsoft’s own Jessie Frazelle, AWS’s Adrian Cockcroft and Charity Majors of Honeycomb.
Kolton Andrus of Gremlin opened the day by making the case for why Chaos Engineering is so important in distributed systems. What really stood out to me during his talk was this slide:
Bright stuff, right? I agree. That’s why I really enjoyed what he had to say to kick off the morning. With that statement he also announced Gremlin’s Series B round as well as a new application-level chaos tool. I’ve been friendly with him since the days Gremlin was launched, and Kolton and his team have built a really special company.
After Kolton, we heard from the always insightful Adrian Cockcroft. His history of Chaos Engineering, along with the taxonomy associated with the practice, was great to hear. Adrian used a term that reminded me of many situations in which organizations I worked for had made failover and disaster plans that were essentially comparable to the TSA security lines at the airport: security theater. Having that backup datacenter is useless if you don’t actually know how the application you are running is going to recover.
Before lunch, talks followed from Mark McBride of Turbine Labs, Mikolaj Pawlikowski of Bloomberg, Christophe Rochefolle, Vilas Veeraraghavan (Director of Engineering at Walmart Labs), and Ronnie Chen of Twitter. Ronnie’s discussion about chaos and her experience with deep sea diving was riveting and terrifying. If you can find time to watch her talk, I really do recommend it. Ronnie focused on how inexperience, whether in diving or in deploying applications, requires you to focus heavily on the things that could potentially go wrong. Your team should always focus on skilling up the least experienced members and “raise the floor” rather than false bar raising.
After Ronnie, the Gremlin Empresses of Chaos, Ana and Tammy, demoed the Gremlin product for us all. They did what they called “AKS vs EKS”, which was essentially running Gremlin service attacks against Kubernetes clusters hosted on both AWS and Azure. This was a lot of fun and the crowd was really into the video game theme!
Charity Majors, the CTO and founder of Honeycomb.io, gave a great talk about what exactly observability means and why testing in prod isn’t just good, it’s a must. Her style of blunt truth and anecdotes always makes me appreciate that we have her voice in the world of technology. Rather than pull punches or just push a product, Charity tells it like it is: you need to work on ensuring applications will recover in production. Her best quote, in my opinion, was, “without observability, you don’t have chaos engineering. You just have chaos.”
Jessie finished our day with a talk about breaking containers, or at least trying to. Jessie talked about being bored and creating chaos in her free time at Docker. Her talk also included something that really stood out: an admission that making mistakes is part of learning as you write software. It was great stuff, and everyone loved hearing what she had to say. Jessie really is one of Microsoft’s shining stars. I continue to be impressed by all the work she puts out there and am super proud to be on the same team as someone as brilliant as she is.
Chaos Conf was GREAT. I got to spend a ton of time with people who are looking at new ways to create better and more resilient applications. I saw many old friends, met new ones and learned a lot. The end-of-day happy hour was a lot of fun and had some really tasty snacks. Overall it was a fun day and really one of the best new conferences I have ever been to. Great job, great time, great people.
Here are a few tweets from the event I really liked!