Modernizing Our Infrastructure - Culture First
- “the set of shared attitudes, values, goals, and practices that characterizes an institution or organization” Merriam-Webster
Transformations are hard but worthwhile endeavors. They require the right attitude, values, goals, and practices, all of which make up a team’s culture. A team’s culture can make or break a transformation. Several years ago, I started running, encouraged by a co-worker to participate in a charity 5K. Before I began, I had to change my attitude about running from “I can’t” to “I can.” I already had my goal and had identified the value as improved health. I created a practice schedule that set the days and times I would run. Likewise, one of the first steps in transforming an infrastructure team from traditional ways of working to modern ways of working is a change of culture and a change of attitude from “we can’t do that” to “we can do that.”
Here are some of the culture changes we have made as part of transforming infrastructure teams at American and things we learned along the way.
Silos -> Cross-Functional Teams
Traditionally, infrastructure teams have been organized around specific technologies or silos, with subject matter experts focused on tasks related to a specific technology. This organizational structure encourages barriers between teams and puts artificial boundaries on what the teams feel they can and cannot do. Many get stuck in the mindset I once had about running: there are limits to what you can do, so why bother trying? Why learn about a new technology if you know you will not get the chance to use it?
With running, I found that I improved faster if I varied my exercise by riding my bike a few times each week. It not only worked different muscles but also added variety to my routine. With a cross-functional team, we can provide a similar opportunity to our team members, giving them the chance to learn new skills while keeping the work interesting. The culture changes that enable this model are critical to making the transformation successful.
My first experience with a cross-functional infrastructure team at American was partly driven by necessity, as the team was small but had to support a lot of technologies. We divided the engineers into two squads, making sure each squad had an engineer with a particular skill. To implement the cultural change, we gave engineers time and space to build new skills and the opportunity to reach out to squad mates if they needed help. The variety of technology they could learn also helped keep the work from getting boring and repetitive. Our goal was for any engineer to be able to pick up a task and work it end-to-end. Engineers got the opportunity to use their new skills, and American gained highly valued “T-shaped” engineers.
Oftentimes, infrastructure teams seek to avoid blame for issues, which usually leads to finger pointing and some level of distrust of other teams. Morale and time to deliver suffer in company cultures looking to place the blame. Sometimes, there are days when I do not run as far or as fast as I think I should be able to. If I beat myself up over it, I am less interested in running the next time. If I focus on figuring out what to do differently next time - maybe I was more tired than usual or maybe I ran too soon after eating (like after Thanksgiving dinner!) - then I can correct for that in the future.
Failures are a part of the learning process. If we really want to promote a learning culture, we need a culture where punitive action is not part of the response to failure. Teams should feel safe discussing failure publicly, which allows the lessons learned from these failures to have a broader impact. Applying those lessons improves the system and helps avoid repeating the same failure. In infrastructure, this is done by automating tasks to remove the opportunity for human error, removing complexity, providing more visibility into the health of the infrastructure, or even swapping tools. It may mean an initial slowdown or stoppage of normal delivery work to focus on improving the system. In the long term, it results in happier engineers (engineers do not enjoy fixing the same issue multiple times), stable systems, and faster delivery.
Think Like A Developer
Another important culture change for infrastructure teams is to think like a developer and to treat the infrastructure like any other software product. Many infrastructure engineers are already good at scripting tasks. However, this is more than just scripting. It includes how we build and manage our products and how they are consumed, as well as which products we deploy.
Version, version, version
Developers rely on version control systems for their code, and infrastructure teams should do the same. Start by placing scripts into a repository, then version them as updates are made. Learn about branching and pull requests to allow multiple people to contribute. As you move down the infrastructure-as-code maturity path, options like configuration management, containers, and tools like Terraform all become viable ways to provision and manage the state of systems as code. Use these tools to deploy the infrastructure platforms themselves as well, not just the resources developers consume. Eventually, versioning the infrastructure stack in code will be possible. In some instances, American is far down the infrastructure-as-code path, but in other cases we still have work to do.
Monitoring, Testing and Pipelines, oh my!
How the systems are monitored and deployed is another area where infrastructure teams can learn from developers. Infrastructure monitoring tools often include a wide array of metrics, from CPU to disk and network latency. Much like a runner who starts with a simple stopwatch, then upgrades to a step counter, then eventually GPS-based tracking, improving infrastructure monitoring is an iterative process.
Starting from basic metric-based monitoring, we can expand our monitoring to include synthetic checks, like those applications use, to monitor the end user experience as well. We did this recently with a system to deploy VMs. We use a GitHub Action, using Terraform, to build a VM, wait for the deployment to finish, then destroy it, on a set interval. These types of end-to-end checks provide a way to monitor health and are also easy for us to plug into CI/CD pipelines for testing new versions of objects on our platforms. As we mature into CI/CD pipelines for our infrastructure platforms, we take yet another best practice from developers. The culture of building repeatable, automated testing and deployment using a pipeline removes the tedious manual testing of the past and allows engineers to focus on the more interesting aspects of managing infrastructure. It also makes it easier to open up the platform to more contributors, including those who may not be experts. This helps remove resource constraints and enables cross-functional teams.
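To make the build-wait-destroy check above more concrete, here is a minimal sketch of the idea in Python. It is not the workflow we actually run: the function name `synthetic_vm_check`, the `CheckResult` fields, and the injectable `run` parameter are all hypothetical names chosen for illustration, and the sketch assumes the Terraform CLI is installed and that the working directory contains a configuration defining a single throwaway VM. A scheduler (a GitHub Actions cron trigger, for example) would call it on a set interval.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Callable, Sequence

# A runner takes a command (list of arguments) and returns its exit code.
# Making it injectable lets the check be exercised without touching real infrastructure.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


@dataclass
class CheckResult:
    healthy: bool         # True only if init, apply, and destroy all succeeded
    failed_step: str      # "" when healthy, otherwise the first step that failed
    apply_seconds: float  # how long the VM build took


def synthetic_vm_check(workdir: str, run: Runner = run_command) -> CheckResult:
    """Build a throwaway VM with Terraform, then always try to destroy it."""
    base = ["terraform", f"-chdir={workdir}"]

    if run(base + ["init", "-input=false"]) != 0:
        return CheckResult(False, "init", 0.0)

    # "apply" blocks until the deployment finishes, so timing it measures build time.
    start = time.monotonic()
    apply_ok = run(base + ["apply", "-auto-approve", "-input=false"]) == 0
    apply_seconds = time.monotonic() - start

    # Destroy even when apply failed, so partial resources are not left behind.
    destroy_ok = run(base + ["destroy", "-auto-approve", "-input=false"]) == 0

    if not apply_ok:
        return CheckResult(False, "apply", apply_seconds)
    if not destroy_ok:
        return CheckResult(False, "destroy", apply_seconds)
    return CheckResult(True, "", apply_seconds)
```

A scheduled job could call `synthetic_vm_check("./synthetic-vm")` (the directory name is made up) and exit non-zero when `healthy` is false, so that the scheduler's own failure notifications act as the alert. Because the result also records `apply_seconds`, the same check could feed a build-time metric, and the same function could run as a test stage in a CI/CD pipeline.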
Lastly, many developers use and contribute to open source, from tooling to libraries and more. Infrastructure teams should embrace open source in a similar fashion. Not all infrastructure lends itself to open source, especially hardware. Still, teams should avoid making excuses for not using open source when it is viable and able to meet the requirements. For some, the idea of not having a support number to call is concerning. This is mostly a culture change. Engineers need to know they can and should learn the product well enough to support it internally, and they should be encouraged to give back to the external community as well.
Reach out to your development teams to help with this cultural change and get them involved in giving feedback, guidance, and teaching. On a previous infrastructure team at American, I was fortunate that my peer, the senior manager over the team, was a former developer. His insights were unbelievably valuable in helping us think like developers.
We are still on this journey, and learning and sharing what we learn are key parts of it. We will continue to share what we learn at American as we transform other infrastructure teams.