
Google’s reputation as an industry leader in measuring developer productivity is well-earned. The company’s meticulous and comprehensive approach to understanding developer productivity metrics sets a gold standard for the industry.

In a recent interview, Abi Noda (CEO at DX, and a programmer, researcher and advocate for helping leaders measure developer productivity in the right way for their teams) spoke with Google Engineering Productivity researchers Ciera Jaspan and Collin Green.

It’s an extremely open, detailed and fascinating discussion about the path Google has taken with developer experience and where the company is today.

We highly recommend giving it a listen here, but we thought we’d give you a brief summary of some of the key takeaways.

Introduction to Google’s Engineering Productivity Research Team and its Mission

Google’s Engineering Productivity Research team was established to enhance developer productivity through informed decision-making. Unlike the traditional method of relying on educated guesses to determine tooling needs, this team takes a more systematic approach.

The team is diverse, comprising not just software engineers but also UX researchers, behavioural economists, social psychologists and researchers from public health backgrounds.

This blend of contrasting professional backgrounds is pivotal to the team’s success. It allows them to decode the intricate layers of developer behaviour, pairing behavioural research methods from UX research with the scalability and domain expertise of software engineering, and it enables a multi-dimensional, holistic understanding of developer experiences and needs.

“We wanted to create a team that would better understand on-the-ground developer experience so that we could figure out how to improve the tooling, the processes, everything that people do.” – Ciera Jaspan


The Triangulated Dimensions of Productivity

Ciera Jaspan stresses that there isn’t a one-size-fits-all metric for measuring developer productivity, and this is core to Google’s measurement philosophy:

“When we’re measuring developer productivity, we have a general philosophy first. There is no single metric that’s going to get you developer productivity. You have to triangulate on this. We actually do that through multiple axes.”

The first of these axes covers three dimensions: speed, ease and quality.

These three dimensions, while sometimes in tension, provide a holistic view of developer productivity. The approach involves multiple measures, each validated through a mixed-methods process. For instance, the team measures speed from logs, supplements this with self-reported perceptions of speed, and substantiates the findings through diary studies and interviews.

By combining various research methods, such as:

  • diary studies
  • surveys
  • interviews
  • qualitative analysis
  • logs analysis

the team gains further confidence in the data yielded. Triangulating on developer productivity by comparing the outcomes of different research methods ensures that the findings align and provide a comprehensive understanding.
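
To make that triangulation concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than Google’s actual metrics or tooling: it simply checks whether a log-derived speed signal and survey-reported speed tell the same story about the same developers.

```python
# A minimal triangulation sketch: do a log-based speed signal and a
# survey-based one agree? All names and numbers are hypothetical.
from statistics import correlation  # Pearson's r; Python 3.10+

# Hypothetical per-developer measures, aligned by index.
median_build_minutes = [4.2, 9.8, 3.1, 12.5, 6.0]  # from logs
survey_speed_score = [4, 2, 5, 1, 3]               # 1-5 self-report

# Slower builds should track lower perceived speed, so agreement
# between the two lenses shows up as a strong negative correlation.
r = correlation(median_build_minutes, survey_speed_score)
print(f"Pearson r between log and survey measures: {r:.2f}")

if r < -0.5:
    print("The two methods broadly agree; lean on the scalable log metric.")
else:
    print("The lenses disagree; dig deeper with diary studies or interviews.")
```

In practice the comparison would be richer (rank correlations, per-team breakdowns), but the principle is the same: confidence comes from independent methods pointing the same way.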


Unifying Behavioural and Log-Based Data

Rather than treating one source as the ground truth, the research team harmonises behavioural data from sources like diary studies with log-based data.

This approach draws inspiration from psychological research, where multiple observers’ perspectives are weighed equally. Such synergy between behavioural and log data enhances the team’s ability to validate and cross-reference their findings accurately.

“We actually use the approach that psychologists have taken to inter-rater reliability… Are these two lenses telling us about the same world?” – Collin Green

While log-based metrics are scalable and ideal for broad analysis, behavioural methods like diary studies enable a deep dive into a smaller subset of developers, yielding nuanced understanding.
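
The “two lenses” framing maps naturally onto agreement statistics such as Cohen’s kappa, a standard inter-rater reliability measure from psychology. The interview does not say which statistic Google uses, so the sketch below is purely illustrative: it treats the log pipeline and a diary study as two observers labelling the same developers’ weeks.

```python
# Illustrative only: score agreement between two "observers" (a log
# pipeline and a diary study) with Cohen's kappa. Data is invented.
from collections import Counter

def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Each developer's week labelled "fast" or "slow" by each method.
from_logs = ["fast", "slow", "fast", "fast", "slow", "slow"]
from_diary = ["fast", "slow", "fast", "slow", "slow", "slow"]

print(f"kappa = {cohen_kappa(from_logs, from_diary):.2f}")  # kappa = 0.67
# Kappa near 1 suggests the two lenses describe the same world;
# near 0 means agreement is no better than chance.
```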

In addition, behavioural methods serve to measure aspects that are challenging to quantify objectively. Surveys, for instance, aid in gauging technical debt and engineer satisfaction—parameters not easily deduced from logs. These methods bridge the gap between the objectively measurable and the subjective, enhancing the team’s insights and providing a comprehensive view of developer experiences.

“Surveys can help you measure things that you don’t know how to measure objectively… augmenting what we can do objectively with what we can measure subjectively, like flow or satisfaction.” – Collin Green


Cultivating Buy-In Across the Company

A highlight of Google’s approach is the quarterly engineering satisfaction survey. This survey, which has run for over five years, samples about one-third of Google engineers each quarter. It includes structured questions and open-ended prompts that gather both quantitative and qualitative insights. The open-ended responses have proven invaluable, giving tooling teams direct qualitative information about pain points and opportunities for improvement.

However, as you can imagine, this is a huge investment of time, money and other resources. The survey ‘requires engineers’ time, attention and effort’, so surveying a large number of engineers at once would be counter-productive when the aim of all this is to measure and increase productivity. Thus, small groups of engineers are surveyed at a time. And once evidence from two sources matches up, the research team can lean on the more scalable methods.
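
The interview does not describe the sampling mechanism, only that roughly a third of engineers are surveyed per quarter. One plausible way to achieve that while giving each engineer two quarters off between surveys is a stable cohort rotation, sketched here as a hypothetical illustration:

```python
# Hypothetical sketch of rotating survey cohorts: each engineer lands
# in one of three stable cohorts, and one cohort is surveyed per quarter.
import hashlib

def cohort(username: str) -> int:
    """Stable assignment of an engineer to cohort 0, 1 or 2."""
    digest = hashlib.sha256(username.encode()).hexdigest()
    return int(digest, 16) % 3

def sampled_this_quarter(username: str, quarter_index: int) -> bool:
    # Rotate through cohorts 0, 1, 2 across consecutive quarters.
    return cohort(username) == quarter_index % 3

engineers = ["alice", "bob", "carol", "dan", "erin", "frank"]
for q in range(3):
    picked = [e for e in engineers if sampled_this_quarter(e, q)]
    print(f"Quarter {q + 1}: {picked}")
```

Hashing keeps assignments stable without storing state, so each engineer is asked at most once every three quarters.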

Early on, scepticism from engineering leaders towards survey data was addressed by emphasising the aspects that are difficult to measure through other means. Collin highlights the importance of consistency, and of running surveys longitudinally, in establishing credibility over time.

Addressing the challenge of gaining buy-in, Ciera describes their approach with sceptical VPs. They encouraged them to begin with survey data, as logs data alone lacks the context to show whether a number is good or bad. Ciera explains how they guided VPs to identify top problems through the survey results, offering a structured approach to problem-solving. Once the issues are identified, they advise looking at logs data to quantify the extent of the problem.

When survey data aligns with objective measures, it bolsters the confidence in both metrics. This alignment is especially crucial when convincing engineering leaders of the validity of survey-based research – which is no easy feat.

“…people are under the misimpression that it’s very easy to run a good survey, when in fact the easiest thing you can do is run a terrible, terrible survey.” – Collin Green


A Human-Centred Approach to Developer Productivity

Abi praises the paper recently co-authored by Green and Jaspan for its unique insights on the challenges of measuring developer productivity and the necessity of putting the ‘human worker’ at the centre of it.

Collin and Ciera explain that the paper emerged from an accumulation of experiences and discussions. One driving factor was the tendency to rely on existing metrics out of convenience rather than considering a more comprehensive perspective. Another was the desire to reintroduce the human element into productivity measurement.

They caution that ignoring human-related issues can lead to incomplete analyses and skewed outcomes. The paper advocates for a balanced approach that acknowledges the structural solutions for human problems, while recognising the behavioural aspects of productivity. They also emphasise the significance of maintaining empathy and avoiding the oversimplification that can occur when focusing solely on quantifiable metrics.

It seems all too easy, when your head is in the data, to forget that the subjects of all this are human beings and not computers or robots.

“There’s a series of three papers I was reading when I was looking at published research where I got frustrated, because somebody would do some research, for example, to understand hindrances to developer productivity, and then they’d get a bunch of hindrances from developers and they’d toss half of them away. Well, those are fluffy human problems, basically. Set them aside. We’re not going to talk about that. I’m going, ‘Well, no, but these things are tied together. You can’t separate out hindrances to productivity into fluffy human problems that are HR things versus hard tool problems.’” – Ciera Jaspan

The discussion acknowledges that this concept can be hard for the industry to embrace. There is a desire for simple answers and an inclination to opt for familiar metrics. However, a shift in perspective is required: productivity measurement is not solely about numbers but, more importantly, about the individuals, the organisation, and the tools and people they interact with.


Measuring Technical Debt with Surveys

One of Google’s recent notable efforts in this area is its attempt to define and measure technical debt, a concept that often eludes a concrete grasp. The team’s survey-based approach revealed that different types of technical debt hinder developers to varying extents.

While Google explored objective metrics in the codebase to correlate with the survey results, the findings were inconclusive. The conversation delved into the nuanced nature of technical debt, highlighting that it’s not simply about code quality but involves the conscious trade-offs made in development.

“…this points out that human cognition and reasoning play a big role in developer productivity, particularly because the conception of an ideal state of a system and using that imagined state as a benchmark against which the current state can be judged or measured.” – How Google Measures and Manages Technical Debt, Research Paper

The conversation beautifully encapsulated the role of human judgment in software development. The authors discussed how an idealised state, even one driven by perception and expectation, shapes our evaluation of current systems. This concept is not exclusive to technical debt but extends to all facets of software development, reinforcing the critical nature of human intuition in understanding the complexities of the field.

Collin mentions, “Engineers keep saying technical debt is a problem. I don’t even know what they mean.” The team arrived at 10 kinds of technical debt by analysing survey responses and exploring consistent themes. This approach allowed them to identify major challenges and provide actionable insights for different teams.

Ciera reflects on the limitations of using surveys alone and the desire to have objective measures for technical debt. She describes how engineers wanted more specific indicators to identify and address hotspots of technical debt in their codebases. However, the team’s analysis of various objective metrics, such as code quality and lint errors, did not yield meaningful correlations with engineers’ perceptions of technical debt.
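
That null result is easy to picture as a correlation check. The sketch below is invented for illustration (Google’s metrics and data are not public): it compares a hypothetical objective signal, lint errors per thousand lines, against survey-reported technical debt hindrance.

```python
# Hypothetical illustration of correlating an objective code signal
# with survey-reported technical debt; all numbers are invented.
from scipy.stats import spearmanr

lint_errors_per_kloc = [12.0, 3.5, 8.1, 15.2, 4.4, 9.9, 2.1, 11.3]
reported_debt_hindrance = [2, 4, 1, 3, 2, 5, 3, 1]  # 1-5 survey scale

rho, p_value = spearmanr(lint_errors_per_kloc, reported_debt_hindrance)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
# A weak, non-significant rho like this mirrors the team's finding:
# convenient objective metrics didn't track engineers' sense of debt.
```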

Whilst acknowledging that much more exploration into measuring technical debt is needed, this conversation sheds light on the art and science of measuring productivity, reaffirming Google’s position as an industry leader in the field.


Google’s approach to measuring developer productivity serves as an illuminating example for the industry, redefining conventional practices and championing a human-centred, multi-dimensional perspective. The insights shared in this interview emphasise Google’s commitment to continuous improvement and offer a blueprint for others seeking to navigate developer productivity measurement within their own teams.