Guest blog: Sean Luke, chief engineer for reliability engineering at DWP Digital, explains the demands of a discipline that is growing in importance
Last month I wrote a blog post about how we’re building a site reliability engineering (SRE) capability in DWP Digital, suggesting our approach to it is unique. In this post I’ll explain why.
First, a recap: SRE is a discipline often seen as an extension of DevOps – a combination of software development and operational support – with an emphasis on coding, the development of new features and automation. It provides an opportunity to transform the public sector’s use of digital and is now a crucial element of the DWP’s own transformation.
I’ve recently completed research on the non-technical aspects of reliability engineering and looked into the types of skills SREs need in order to be effective in the role. Contact me if you’re interested in the source material I used; I’d be happy to share my reading list.
I’ve also run a small-scale experiment with my team to test some of my theories. When we defined the SRE role we experimented with a more collaborative approach and engaged with a wider set of stakeholders early on in the process. Defining new roles in DWP can be a lengthy process, but we managed to compress the task to about eight weeks from a standing start to a full-blown public recruitment campaign.
The experiment was not only successful but quite revealing about how effective collaboration can be as one of three pillars in speeding up team tasks. I found that taking a wider-than-usual stakeholder group through a workshop, then keeping that group actively involved in the follow-up work, meant that we built a good deal of trust across the group.
Trust is the second of those three pillars, and we established it by forming the collaborative group earlier than usual and keeping it together for far longer than normal. I noticed that the energy levels within the group remained high during all of our follow-up sessions, and I suspect that’s an indicator of how much trust there really is.
The third pillar is experimentation. The group were encouraged to think beyond the established process and focus on the target outcome, and this delivered a sizeable increase in the creativity of the group and a willingness to challenge and adapt existing processes to achieve a better outcome more quickly.
This got me thinking about how important collaboration and non-technical skills will be for building a successful reliability engineering capability. When reading through SRE forums and published experience in the field I discovered many parallels with the research and experiments undertaken to define the SRE role.
I’ve distilled this down to a set of non-technical skills that I think every SRE should work on if they want to be successful. The non-technical skills fall into two broad categories: team skills and human traits.
The team skills include being collaborative and systematic, willingly sharing knowledge, taking personal reputational risks in sharing ideas, communicating effectively, being decisive, continuously learning and being risk aware.
The human traits include remaining calm under pressure, being creative, possessing a growth mindset, being an evangelist, having a passion for technology, having empathy for colleagues and thinking critically.
These skills are complemented by aspiring to be what IT analyst Gartner terms a ‘versatilist’ – a tricky word to master that means someone who has a deep knowledge of a few technical domains and an excellent understanding and knowledge of a few more.
It also helps if you are a natural problem solver, good at estimating and sizing, an effective time manager, good at publishing knowledge, and have a mastery of agile methods.
Although the SRE role is truly technical in nature, we can’t ignore the non-technical skills if we want to be effective, and they need just as much development as the technical ones. The key difference is that the non-technical skills never go out of date; they just keep improving.
I’ve built a number of non-technical skills into our role definition to ensure that SREs here feel empowered to develop them. I’ll be encouraging them to experiment and collaborate in order to build stronger, more capable teams, and for this to become the norm throughout our organisation.
There is a recognition across the industry that human attributes are becoming more important and are increasingly a differentiator of candidates aiming to land new types of role such as SREs. Collaboration, experimentation, and trust are the three change accelerators we need in order to deliver reliability engineering.
Join our team
Be part of one of the largest and most exciting digital transformations in the world. Apply now by visiting our careers website.
Image from GOV.UK, Open Government Licence v3.0