Addressing the Root Causes of System Instability - Configuration and Change Management
Having addressed compliance and risk management in the third step of the SOC 2 compliance journey we undertook in tandem with Laika, we turned next to the formal nuts and bolts of configuration and change management.
Changes to production systems are often the root cause of system instability. To mitigate this risk, we adopted a change management process as part of our SOC 2 compliance work, documented in our configuration and change management policy.
The draft policy we received from Laika was extremely formal – the kind of thing you’d find in a big enterprise. It defined a change process with layers of approvals and sign-offs. We wanted something much lighter weight that would work with the tools we already use, like Jira, GitHub, and Slack. So we pretty much rewrote the policy.
Our change management process is built around the Jira boards we were already using before we embraced SOC 2. Nothing moves across a Jira board without the whole team seeing it and discussing it daily, so we dropped the up-front approval steps altogether. And because almost all of our infrastructure is managed as code, we rely on our standard code review process to approve changes. We also allow break-glass (emergency) changes to bypass this process, provided they are reviewed retrospectively.
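To make these rules concrete, here is a minimal Python sketch of the checks an audit script might apply to a list of production changes: normal changes need an approving code review, and break-glass changes may skip it but must be reviewed after the fact. The `Change` record, its field names, and the rule that the reviewer must be someone other than the author are illustrative assumptions, not our actual tooling or the exact wording of our policy; in practice the data would come from GitHub pull requests and Jira tickets.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Change:
    # Hypothetical record of one production change (e.g. a merged pull request).
    change_id: str
    author: str
    approvers: List[str] = field(default_factory=list)  # reviewers who approved before merge
    break_glass: bool = False                           # emergency change that skipped review
    retro_reviewer: Optional[str] = None                # who reviewed it after the fact


def is_compliant(change: Change) -> bool:
    """Apply the (assumed) policy rules to a single change."""
    if change.break_glass:
        # Emergency changes may bypass up-front review, but someone other
        # than the author must review them retrospectively.
        return change.retro_reviewer is not None and change.retro_reviewer != change.author
    # Normal changes need at least one approval from someone other than the author.
    return any(a != change.author for a in change.approvers)


def audit(changes: List[Change]) -> List[str]:
    """Return the IDs of changes that violate the policy."""
    return [c.change_id for c in changes if not is_compliant(c)]


if __name__ == "__main__":
    changes = [
        Change("PR-101", "alice", approvers=["bob"]),
        Change("PR-102", "bob", approvers=["bob"]),  # self-approval does not count
        Change("HOTFIX-7", "carol", break_glass=True, retro_reviewer="alice"),
        Change("HOTFIX-8", "dave", break_glass=True),  # still awaiting retrospective review
    ]
    print(audit(changes))  # ['PR-102', 'HOTFIX-8']
```

Running a check like this periodically gives you evidence that the break-glass path is the exception it is meant to be, rather than a quiet way around code review.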
We were already using Jira and GitHub before SOC 2 came along; this policy just pushed us to mature how we use them and to train the team on the correct workflows. We’re a better company for having embraced this discipline, and we didn’t have to accept a lot of heavyweight process to get here.
As we stated at the outset of this series, “The key point to understand is that [SOC 2] certification is about verifying that what you said you’d be doing in your policies is what you’re actually doing. You get to customize your policies to match the way you want to work, as long as it achieves the objectives of SOC 2. Keep that in mind as you’re reading these posts, and considering your own SOC 2 journey. It’s all about right-sizing your process.”