{"id":1637,"date":"2021-03-07T21:46:48","date_gmt":"2021-03-07T20:46:48","guid":{"rendered":"https:\/\/benjiweber.co.uk\/blog\/?p=1637"},"modified":"2021-03-07T21:46:50","modified_gmt":"2021-03-07T20:46:50","slug":"we-got-lucky","status":"publish","type":"post","link":"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/","title":{"rendered":"We got lucky"},"content":{"rendered":"\n<p class=\"lead\"><em>&#8220;We got lucky&#8221;<\/em>\u2014it&#8217;s one of those phrases I listen out for during post incident or near-miss reviews. It&#8217;s an invitation to dig deeper; to understand what led to our luck. Was it pure happenstance? \u2026or have we been doing things that increased or decreased our luck?\u00a0\u00a0\u00a0<\/p>\n\n\n\n<p>There&#8217;s a saying of apparently disputed origin: <em>&#8220;Luck is when preparation meets opportunity&#8221;<\/em>. There will always be opportunity for things to go wrong in production. What does the observation <em>&#8220;we got lucky&#8221;<\/em> tell us about our preparation?\u00a0<\/p>\n\n\n\n<style type=\"text\/css\">\nh2 { margin-top: 30px; }\n<\/style>\n\n\n\n<h2>How have we been decreasing our luck?<\/h2>\n\n\n\n<p>What unsafe behaviour have we <a href=\"https:\/\/en.wikipedia.org\/wiki\/Normalization_of_deviance\">been normalising<\/a>? It can be the absence of things that increase safety. What could we start doing to increase our chances of repeating our luck in a similar incident? What will we make time for?&nbsp;<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>&#8220;We were lucky that Amanda was online, she&#8217;s the only person who knows this system. It would have taken hours to diagnose without her&#8221;\u00a0<\/em><\/p><\/blockquote>\n\n\n\n<p>How can we improve collective understanding and ownership?&nbsp;<\/p>\n\n\n\n<p>Post incident reviews are a good opportunity for more of the team to understand, but we don&#8217;t need to wait for something to go wrong. 
Maybe we should dedicate a few hours a week to understanding one of our systems together? What about trying pair programming? Chaos engineering?<\/p>\n\n\n\n<p>How can we make our systems easier to diagnose without relying on those who already have a good mental model of how they work? Without even relying on collaboration? How will we make time to make our systems observable? What would be the cost of &#8220;bad luck&#8221; here? Maybe we should invest some of it in tooling?&nbsp;<\/p>\n\n\n\n<p>If <em>&#8220;we got lucky&#8221;<\/em> implies that we&#8217;d be unhappy with the unlucky outcome, then what do we need to stop doing to make more time for things that can improve safety?\u00a0<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2>How have we been increasing our luck?&nbsp;<\/h2>\n\n\n\n<p>I love the <a href=\"http:\/\/benjiweber.co.uk\/xp\">extreme programming<\/a> idea of looking for what&#8217;s working, and then <a href=\"https:\/\/benjiweber.co.uk\/blog\/2015\/04\/17\/modern-extreme-programming\/\">turning up the dials<\/a>.&nbsp;<\/p>\n\n\n\n<p>Let&#8217;s seek to understand what preparation led to the lucky escape, and think about how we can turn up the dials.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>&#8220;Sam spotted the problem on our SLIs dashboard&#8221;<\/em> <\/p><\/blockquote>\n\n\n\n<p>Are we measuring what matters on all of our services? 
Or was part of <em>&#8220;we got lucky&#8221;<\/em> that it happened to be one of the few services where we <em>happen<\/em> to be measuring the things that matter to our users?\u00a0<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>&#8220;Liz did a developer exchange with the SRE team last month and learned how this worked&#8221; <\/em><\/p><\/blockquote>\n\n\n\n<p>Should we make more time for such exchanges and personal learning opportunities?\u00a0<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><em>&#8220;Emily remembered she was pairing with David last week and made a change in this area&#8221; <\/em><\/p><\/blockquote>\n\n\n\n<p>Do we often pair? What if we did more of it?<\/p>\n\n\n\n\n\n<h2>How frequently do we try our luck?<\/h2>\n\n\n\n<p>If you&#8217;re having enough production incidents to be able to evaluate your preparation, you&#8217;re probably either unlucky or unprepared ;)<\/p>\n\n\n\n<p>If you have infrequent incidents you may be well prepared but it&#8217;s hard to tell. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Chaos_engineering\">Chaos engineering<\/a> experiments are a great way to test your preparation, and practice incident response in a less stressful context. It may seem like a huge leap from your current level of preparation to running automated chaos monkeys in production, but you don&#8217;t need to go straight there.&nbsp;<\/p>\n\n\n\n<p>Why not start with practice drills? You could have a game host who comes up with a failure scenario. You can work up to chaos in production.\u00a0<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2>Dig deeper: what are the incentives behind your luck?<\/h2>\n\n\n\n<p>Is learning incentivised in your team, or is there pressure to get stuff shipped?&nbsp;<\/p>\n\n\n\n<p>What gets celebrated in your team? Shipping things? Heroics when production falls over? 
Or time spent thinking, learning, working together?<\/p>\n\n\n\n<p>Service Level Objectives (SLOs) are often used to incentivise (enough) reliability work vs feature work\u2026if the SLO is under threat we need to prioritise reliability.&nbsp;<\/p>\n\n\n\n<p>I like SLOs, but by the time the SLO is at risk it&#8217;s rather late. Adding incentives to counter incentives risks escalation and stress.&nbsp;<\/p>\n\n\n\n<p>What if instead we removed (or reduced) the existing incentives to rush &amp; sacrifice safety? Remove them, rather than try to counter them with extra incentives for safety? &#x1f914;  <\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;We got lucky&#8221;\u2014it&#8217;s one of those phrases I listen out for during post incident or near-miss reviews. It&#8217;s an invitation to dig deeper; to understand what led to our luck. Was it pure happenstance? \u2026or have we been doing things that increased or decreased our luck?\u00a0\u00a0\u00a0 There&#8217;s a saying of apparently disputed origin: &#8220;Luck is&#8230;  <a href=\"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/\" class=\"more-link\" title=\"Read We got lucky\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":1639,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[21,17],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v14.9 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<meta name=\"description\" content=\"&quot;We got lucky&quot;\u2014it&#039;s one of those phrases I listen out for during post incident or near-miss reviews. It&#039;s an invitation to dig deeper; to understand what led to our luck. Was it pure happenstance? 
\u2026or have we been doing things that increased or decreased our luck?\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"We got lucky - Benji&#039;s Blog\" \/>\n<meta property=\"og:description\" content=\"&quot;We got lucky&quot;\u2014it&#039;s one of those phrases I listen out for during post incident or near-miss reviews. It&#039;s an invitation to dig deeper; to understand what led to our luck. Was it pure happenstance? \u2026or have we been doing things that increased or decreased our luck?\" \/>\n<meta property=\"og:url\" content=\"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/\" \/>\n<meta property=\"og:site_name\" content=\"Benji&#039;s Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-03-07T20:46:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-03-07T20:46:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/benjiweber.co.uk\/blog\/wp-content\/uploads\/2021\/03\/lucky.png\" \/>\n\t<meta property=\"og:image:width\" content=\"937\" \/>\n\t<meta property=\"og:image:height\" content=\"754\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/benjiweber.co.uk\/blog\/#website\",\"url\":\"https:\/\/benjiweber.co.uk\/blog\/\",\"name\":\"Benji&#039;s Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/benjiweber.co.uk\/blog\/?s={search_term_string}\",\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/benjiweber.co.uk\/blog\/wp-content\/uploads\/2021\/03\/lucky.png\",\"width\":937,\"height\":754},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/#webpage\",\"url\":\"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/\",\"name\":\"We got lucky - Benji&#039;s Blog\",\"isPartOf\":{\"@id\":\"https:\/\/benjiweber.co.uk\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/#primaryimage\"},\"datePublished\":\"2021-03-07T20:46:48+00:00\",\"dateModified\":\"2021-03-07T20:46:50+00:00\",\"author\":{\"@id\":\"https:\/\/benjiweber.co.uk\/blog\/#\/schema\/person\/45ecb36b51f4ce99e6929d2d31ca5c09\"},\"description\":\"\\\"We got lucky\\\"\\u2014it's one of those phrases I listen out for during post incident or near-miss reviews. It's an invitation to dig deeper; to understand what led to our luck. Was it pure happenstance? \\u2026or have we been doing things that increased or decreased our luck?\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/benjiweber.co.uk\/blog\/2021\/03\/07\/we-got-lucky\/\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/benjiweber.co.uk\/blog\/#\/schema\/person\/45ecb36b51f4ce99e6929d2d31ca5c09\",\"name\":\"benji\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/benjiweber.co.uk\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/05fb47b31a0b329e1b790074a9b624ef?s=96&d=mm&r=g\",\"caption\":\"benji\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","amp_enabled":true,"_links":{"self":[{"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/posts\/1637"}],"collection":[{"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/comments?post=1637"}],"version-history":[{"count":12,"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/posts\/1637\/revisions"}],"predecessor-version":[{"id":1650,"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/posts\/1637\/revisions\/1650"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/media\/1639"}],"wp:attachment":[{"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/media?parent=1637"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/categories?post=1637"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/benjiweber.co.uk\/blog\/wp-json\/wp\/v2\/tags?post=1637"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}