Recent Articles
Dumb Leadership Mistakes I’ve Made
Ignoring intuition for the sake of “being logical,” data-driven theater, ignoring my role as a business leader, and more.
You will need to make these mistakes for yourself.
I’m sharing here to give you some language around these mistakes, in hopes that it helps you recognise when you make them, and you can accelerate your own learning.
Dismissing intuition. Intuition has a bad reputation of being emotional and subjective. In reality, it’s a summary of millions of interactions and experiences — data points — that you’ve accumulated in your career. Then you are doing some crazy hind-brain computations on all of those data points and coming up with a judgement. There’s a reason that you think that way, so don’t dismiss it. I used to downright ignore intuition in a lot of circumstances because it wasn’t “logical” enough, or even because it felt unfair at times. I’ve worked on strengthening some skills around my intuition that help me harness it more effectively. It comes down to this: show your work. Intuition often relies on cues from body language, pattern matching from past experiences, details in presentation, etc. Get clear on what you observe and how it influences your confidence. Then nail down a clear answer on this critical question: how will I know if I am right?
Data-driven theater. Many times I’ve ignored intuition because being “a data-driven decision maker” was part of my identity. But it was a load of crap. Even if I had access to good data (which I rarely did), I was not in a position to actually use it to make decisions, because that was not part of the muscle of my particular org at that time. As an executive coach, I see this pattern in many leadership teams I work with. Two things can be true: you can want to be a data-driven or data-informed org, but not have access or opportunity to do so. Instead of lying about it (to yourself and others), be honest that you’re making decisions based on mission or vision, but not on data. The biggest mistake here isn’t making decisions without data (though I would encourage you to do otherwise), the mistake is believing you’re data-driven when you really aren’t. It’s hard to become data-driven when you falsely think you already are.
Trying to be smart instead of making other people smart. For a long time, I thought credibility made a person trustworthy. Deep down I knew this couldn’t be true; I’d worked with enough “brilliant assholes” to know that credibility couldn’t be everything. Still, earlier in my leadership career, I relied on expertise as the main way to gain people’s trust. This meant that I operated in solving mode more often than I should have. All of those things I solved could have been (sometimes better) solved by my team, and I took the opportunity away from them to protect my own ego and desire to “stay technical.” This mistake doesn’t just apply to situations with your direct reports. When explaining technical subjects to your peers on the leadership team, your goal is for them to first understand what you are saying, and then second to gain independence with the topic so they can build on the foundations you are helping them build. The point of those conversations is not for them to walk away thinking “wow, they are so smart — they used a bunch of words I didn’t understand!” Focus on explaining the concepts, even if that means giving up precise terminology. This won’t be your last conversation about it, so you don’t need to cover every detail. It’s your job to figure out how to explain it so that it makes sense, not theirs to try to decrypt a bunch of jargon.
Not utilizing experts soon enough. I believe in hiring people who also have the “most everything is figure-out-able” gene. A lot of startups prefer to hire this kind of person too. But brute-forcing your way through a problem costs a lot of time, and time can be even more scarce than money. Look to an expert to answer specific questions for you, or to provide support in an area where you haven’t built up muscle yet. You do not need to hire a full-time employee to do this — that’s another mistake. There are tons of experts out there who work on a project basis. These are one-off costs that save you time.
Not realizing that I’m not an engineering leader. I’m a business leader. Yes, engineering expertise is important, and it’s critical to doing my job well. But I had to stop thinking only about how my decisions would impact engineering, and start thinking about how they impact the product and the business. This is a skill that is underdeveloped in a lot of engineering leaders, simply because it hasn’t been expected of us in the past. Our reality is different now though. Engineering leaders are being asked to show their work and provide data and metrics to show both the impact of their decisions and the performance and contributions of their organizations. Beyond reporting up to your execs and board, being able to articulate the business impact of what you’re doing makes you a better partner to your peer leaders in other parts of the org.
DORA vs SPACE vs DevEx
I get asked a lot about various developer productivity metrics frameworks, so I’ve written a detailed whitepaper about each framework, what their goals are, and when you might consider using them.
It’s long and just looks nicer as a PDF, so the link is below.
What is Technical Debt?
We know it when we see it, but are you sure that everyone on your team is thinking about it in the same way?
“Too much technical debt.”
Whenever I work with a team, this is overwhelmingly the biggest problem cited by leaders and developers alike.
If you’re a developer (or have been one in the past), chances are this resonates with you. And if I ask you what technical debt is, you might pause to think for a bit, but you’ll come up with something.
On a recent project, I asked about 35 people (almost all managers who were previously senior/staff engineers) how their teams classify technical debt. This group had so many common traits – same company, same job – yet no two answers were the same, and there were even some spirited disagreements about the definition of tech debt.
“We know it when we see it.” Most developers (and former developers) have a bit of a spidey-sense when it comes to identifying technical debt. However, this does not serve us well when it comes to communicating the need to address technical debt to outside stakeholders. If we can’t even define it, how can we expect someone else to get on board?
What worries me a bit more about this lack of common definition is that technical debt is being studied more and more (a good thing) and now there is more concrete proof of the negative impact of technical debt – namely slow execution, poor quality, low morale, and even low self-esteem in developers (a bad thing).
It’s hard to choose interventions to prevent these poor outcomes if we don’t know what we’re up against.
Just ask them
Before going any further, I will fall back to the advice I give in almost every scenario: ask your team.
It is somewhat irrelevant if the “academic” definition of technical debt includes or excludes, for example, addressing security issues. If your team perceives security issues as a form of technical debt, it is. It is still going to result in all of the negative effects discussed above.
Talking to your team may seem like an obvious suggestion, or perhaps even silly. But have you ever asked the question “what do you consider to be technical debt?” before asking the question “are you satisfied with the amount of technical debt on our team?” Most teams haven’t, and you might be surprised by the conversation.
Defining technical debt
As with many things in software development, we can look at the pragmatic side of things (what’s going on in practice) or the dogmatic side (what the theory says).
Pragmatism always wins with me, so let’s start there.
My definition: the negative result of intentional or unintentional suboptimal decisions when building software.
Breaking it down a bit more, there are some key variables at play in that definition:
Negative impact or friction
Quality of the decision
Time
And a wildcard: perception of team autonomy
Negative impact
You might choose to take out a loan to buy a house or a car instead of waiting an extra 5, 10, or 25 years to save up cash. Overall, that’s a positive decision for you, but it carries a negative consequence: you’re going to pay more over time. Technical debt also comes with negative consequences, whether it’s slower development times, lower quality, or something else. This negative impact is crucial in defining technical debt. If there’s no friction or pain, that debt is essentially “free money,” and it’s hard to argue for paying it back immediately.
Quality of the decision
The decision leading to that negative result was suboptimal, but it may either be intentional or unintentional. Additionally, your team may not have been responsible for making the suboptimal decisions, yet needs to live with the consequences. I’m sure everyone reading this can think of an example where the business urged a feature to be released on a tight schedule, which contributed to technical debt.
It’s also hard to predict future business needs, which is why I mention both intentional and unintentional decisions as a factor of technical debt. Your team may knowingly need to postpone maintenance work, causing it to increase in scope and complexity once you have the time to get around to it. This is suboptimal but intentional. You may have also made a design choice early on in your product’s life that no longer serves the business, and your team feels like they’re continuously building on top of a system held together with toothpicks and duct tape. These decisions are still suboptimal in hindsight, but the effects were unintentional.
Time
Time plays a role. There are so many projects that we might exclude from our definition of technical debt on day 1, but on day 1000, they are absolutely technical debt. Postponing maintenance, dependency upgrades, or security issues would fall into this category. Across the 35 technical leaders I spoke with, I asked the question, “when does maintenance turn into technical debt?” Some leaders classified all maintenance work, even predictable routine maintenance, as technical debt. The majority settled on a definition similar to “if you wait too long, it becomes technical debt,” which is what I generally agree with. In this definition, we need to relate the suboptimal decision (postponing maintenance work) back to a negative result (lower quality, security risk) in order to consider it technical debt.
Lack of autonomy
There is one wildcard here: perception of team autonomy. I’ve written and spoken about how self-perception matters much more than we think it does when measuring productivity and other factors of software development. Oddly, technical debt is not an exception here, even though it feels like it should be something easier to objectively quantify.
Teams may still be unhappy with the amount of technical debt they have if they feel that they are not able to propose and execute on self-directed projects. In this situation, the definition of technical debt isn’t the one that I mentioned above, but rather a measure of how much control the team perceives they have.
If we don’t have control over what we’re working on, we must be accruing technical debt, because only we would know how to avoid it.
This is a sticky situation to be in as a team. Some leaders have asked me, “how can I help my team advocate for refactoring this piece of architecture that really bothers them? We know it’s the wrong design.” When we go through the evaluation criteria, inquiring about the negative result of not doing it, considering the severity and occurrence of those consequences, sometimes the team comes up empty-handed. There’s no compelling business reason to repay the debt now – and a question might be raised about whether it should even be considered debt if it’s not having a negative impact on delivery or quality. These are uncomfortable conversations to have as a team.
Now, since we know more about the human impact of technical debt, specifically how it can tie into loss of productivity, low morale, and developer turnover, there is potential to develop a stronger argument for taking on these kinds of projects. Even in a cooler market like the one we are presently in, developer attrition is an expensive problem.
What research says
Usually, technical debt is described as a metaphor to compare the phenomenon of suboptimal development conditions to actual financial debt, where one must pay interest. Technical debt does also have real monetary cost, which is why it’s so interesting to businesses. At this point, these costs have been studied more extensively than the human factors, though that is changing in recent years (besides “just ask them,” my second rule is “everyone speaks money”).
I can best summarize the dogmatic definition of technical debt as “suboptimal choices that cost money.”
In practice, my observations are that most development teams, even in small companies, are too far separated from these types of financial models to reason about technical debt in terms of revenue, but absolutely do feel the pain of technical debt in their day-to-day work (even if they can’t quantify it in money).
What you can do next
Trying to reduce technical debt on your team?
Ask your team about what they consider technical debt and develop a common definition
Consider adding better descriptors to things you’ve labeled as technical debt. Is the issue related to security, reliability, scalability, or something else?
Speak money if you can. Can you quantify the impact of the project in cash? (A back-of-the-envelope sketch follows this list.)
Coach your team to get better at advocating for technical debt projects with stakeholders
At the same time, call out when technical debt will be a byproduct of a business decision, and name it as an explicit risk of that decision
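To put the “speak money” step into practice, a back-of-the-envelope calculation is usually enough to start the conversation. Here’s a minimal sketch; every number in it is an illustrative assumption, not a benchmark, so swap in figures from your own team.

```python
# Rough annual cost of a specific piece of technical-debt friction.
# Every value below is an illustrative assumption -- replace with your team's numbers.

engineers_affected = 8       # developers who regularly hit this friction
hours_lost_per_week = 3      # average hours each loses to workarounds, flaky builds, etc.
loaded_hourly_cost = 95      # salary + benefits + overhead, per hour
weeks_per_year = 46          # working weeks after holidays and leave

annual_cost = engineers_affected * hours_lost_per_week * loaded_hourly_cost * weeks_per_year
print(f"Estimated annual cost of this friction: {annual_cost:,.0f}")
# -> Estimated annual cost of this friction: 104,880
```

Even a rough number like this reframes the discussion from “developers are annoyed” to “this costs us six figures a year,” which is the language the rest of the business speaks.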
Setting Clearer Expectations: The Compass vs. Map Method
How can you give your team enough information to be successful without feeling micromanage-y?
A quick technique to help set your teams up for success: are you giving your team a compass or a map?
Managers often provide too little detail when setting expectations. This is the compass method. Let’s say you’re in Italy. You tell your team to head north. They may end up in Switzerland, Liechtenstein, or Austria. They might also keep travelling and end up in Germany, or somewhere in the Arctic Circle. The compass method is just a direction, without a lot of constraints.
Managers sometimes prefer this method of expectation setting because it feels less “micromanage-y.” A clear direction should be enough to get your team where they need to go, but it’s not specific enough to inhibit their own creativity. It also doesn’t tell them how to get there — whether it’s by air, road, or sea.
The map method provides more constraints. Here, I’d give my team a map and tell them “travel north from Rome, via Berlin, and stop in Denmark.” I’m telling them the direction, but I’m also giving important constraints: an intermediate waypoint, plus the minimum result I’m looking for (just over the border into Denmark) and the maximum (stopping before they leave Denmark). There’s less room for interpretation in what to do, and it’s also far less likely that the team misses my expectation.
Using the map method is challenging: you need to know what your lower threshold is. Leaders and managers are routinely pretty crappy at this, because we often don’t have the time (or take the time) to clearly define what we want to see. “I’ll know it when I see it” or “it’s their job to figure it out” are not great ways of approaching clear expectation setting. If you don’t know what you want, how can you expect your team to deliver it?
Real life map vs. compass example
Scenario: your team of engineers works closely with a product manager and the customer success team on big launches.
Compass: “If the project scope changes, the whole project team needs to be informed.”
Map: “If there are material changes to the UI or UX because of technical constraints, the whole project team needs to be informed before those changes are deployed — including PMs and customer success. It’s not enough to just mention them in your PR. We need to discuss it in Slack, and have these discussions as early as possible in case we need to change something.”
You know your team the best. Setting a direction, like in the compass example, might be enough for your team. But don’t count on it: I’ve spent hundreds of hours coaching engineering leaders, and even for very senior teams, people often need more direction than we think they do. Specifically, the key piece of information that helps teams succeed is “what’s not enough.” We get this in the map method, by sharing the minimum result, or by sharing an anti-pattern that may cause the team to think they’re doing fine (i.e. sharing the UX changes in a GitHub PR) but that actually misses the expectation that you have (sharing UX changes early and in Slack so everyone can see them, including non-engineering team members).
Remember that your team can’t read your mind.
Why Meetings Are Boring
That meeting probably could have been an email.
Meetings are boring when there is nothing at stake for the participants.
The “meeting that should have been an email” absolutely exists, and chances are you’ve put one on the calendar (we all have). I’ve even led meetings where I found my own mind wandering, because the content or format wasn’t particularly engaging. I can’t imagine what my team was thinking.
What does it mean when nothing is at stake?
The outcome of the meeting doesn’t have any measurable or meaningful consequence for the participants. They are warm bodies in seats just absorbing information that does not result in a decision, nor does it impact immediate priorities. The information is just… informative.
Status updates, recurring team check-ins, and even all-hands or department-wide meetings often fall into this category. So why do we continue to call them? Lack of trust, mostly. We don’t trust that our teams will get the information unless they have a Zoom window open, looking at slides. When I put it that way, it probably sounds a bit ridiculous — because it is.
I’m not anti-meeting. But I am anti-boring meeting. If you’re still reading, you probably are too. So here’s what I do to avoid them.
Discussion Meetings vs Decision Meetings
We’ve all been given the advice to decline any meeting without an agenda, but I’ve attended plenty of crappy meetings that had a clear agenda. Beyond the sequence of what is going to be talked about, a key expectation is whether the meeting is a decision or discussion meeting.
Decision meetings should be short: 30 minutes max. This is achievable when the discussion required to make the decision happens outside of the meeting. The other important factor here is that someone must be responsible for making a decision. You can try to make the decision democratically first, but for stickier topics, that won’t always work, and someone needs to be responsible. That directly responsible individual should be called out and made obvious to the team early on in the process.
Golden Rule: the bigger the group, the slower the decision-making process. I’ll cover some approaches to deal with that in the Format section below.
Discussion meetings are helpful only when the material is complex enough that it can’t be handled async. Otherwise, we’re squarely in “meeting that could have been a Google Doc” territory. For discussion meetings, create an accompanying doc that has all of the research, resources, and any other materials. The meeting facilitator should send this out with plenty of time for the participants to review and comment — but also with a deadline for when input is due. Based on the async discussion, you’ll be able to identify the topics that warrant live discussion and build the agenda from there. The outcome of this meeting is not a decision, but rather to make sure all of the information is on the table so that the decision maker can make the decision.
Splitting up the discussion and decision allows you to avoid the situation where the impacted team members get into depth while the rest of the team’s eyes glaze over, because they’re only interested in the outcomes. There’s nothing at stake for them in the discussion.
You can use this technique on a micro level as well. For example, for items raised in retro, it can be helpful to understand if the team wants to surface it for discussion and awareness, or if a decision needs to be made.
Format
Switching from one talking head to another is incredibly boring, even if the content may be interesting. Zoom meetings often just mirror an in-person meeting — that is, a bunch of people sitting in a room together — when there are a lot more interesting facilitation techniques that you can use in order to keep people engaged and get to the outcome you want faster.
Small groups
Remember the golden rule: bigger groups mean slower decisions. So make smaller groups. If there’s a decision of sizeable importance, instead of trying to get 8+ people to come to agreement, I might split the team into groups of 2-3 and give them 10 minutes to come up with a recommendation (remember — at this point, all of the information to make the decision has been covered in a discussion meeting). After the time is up, all groups share recommendations with one another. It’s surprising how often this has resulted in all teams coming up with the same recommendation, even with fairly controversial topics.
If they don’t, then you have the opportunity to have more targeted conversations about the disagreements, or skip directly to the decision maker taking their input and making a final call.
Virtual whiteboards
In a discussion meeting, a common failure mode is that teams keep returning to the same points over and over again, or the conversation feels very unbalanced (both in terms of participants as well as content). Using a virtual whiteboard can help keep conversation on track and see where people already agree.
This can be lightweight: on a Miro board, I create a bunch of sticky notes and the team drops in their ideas in a 5-7 minute brainstorming session. Then we categorise the topics, and start either with the common ones first, or the most controversial.
You can also create more structured exercises, like spider charts, quadrants, and a number of different column formats. Depending on the meeting type, a tool like Retrium can be helpful here, or you can roll your own.
Chat, polls, voting, reactions
Zoom and Teams have plenty of built-in features that allow people to give input without having to unmute themselves or wait until a break in the conversation to get some airtime. You can use emoji reactions in Zoom to take polls, utilise chat to get simultaneous input from participants, or even add quizzes, shared notes, and other stuff via Zoom’s app marketplace. Some of this will seem a bit cheesy at the beginning, but it doesn’t hurt to experiment.
Independent work
Just because you’re in a group meeting doesn’t mean that everything has to be done in a group. Sometimes it’s useful to give people 5-10 minutes to read something, draft a diagram, or answer some reflection questions to prep for discussion. Bonus: you can share your computer sound only in a Zoom meeting (Share Screen > Advanced > Computer Audio) and play some worktime music during these times. This is my default worktime song.
All of these different formats keep people’s brains from melting away as they watch talking heads rotate across the screen. Remember: if you’re bored in a meeting, everyone else probably is too.
1 Tip to Help You Stay Out of “Solution Mode” During Coaching Conversations
Staying out of “solution mode” is often one of the harder things to master when you’re coaching your team. Using coaching questions can help you resist the urge to inject your own solutions, and help give the other person space to come to the solution on their own.
Good coaching questions have two criteria:
They can’t be answered with “yes” or “no”
They don’t have an answer embedded in the question.
❌ “Have you tried changing your JIRA ticket template?” A simple yes or no would answer this question, so it’s not a coaching question.
❌ “Do you think they are writing unclear tickets because they don’t understand the problem or just because they don’t have time?” This is less obvious, because it feels open-ended. But in the question itself, you’ve limited the conversation to two options: either this or that.
✅ “Why do you think this keeps happening?” This question fulfils both criteria above.
Here’s my tip.
In every conversation, there’s a pivotal moment where you can fall into the trap of solution mode, or stay out of it. It’s usually the first thing out of your mouth when someone brings you a problem.
So, just for that moment, you need to practice what you’ll say instead.
Maybe it’s something like:
“What have you tried so far?”
“How are you thinking about solving this?”
“What kind of result are you looking for?”
Practice saying this phrase out loud so you can hear how you sound when it comes out of your mouth. This might feel awkward, but it will increase your likelihood of actually using the phrase.
To get better at coaching conversations, you don’t need to memorise lists of coaching questions. All you need to do is memorise one phrase that you’ll say when someone brings a problem to you, instead of going into solution mode by default.
Using Metrics to Measure Individual Developer Performance
Look beyond activity data from GitHub and JIRA and take a look at a practical example of how I use data and evidence to measure individual performance.
“What metrics should leaders use to measure the individual performance of developers on their teams?”
I get asked this question a lot. I’ve asked myself this question before, too – both as a developer and then later as a leader.
A lot of research and “best practice” will tell us that metrics like lines of code, story points completed, or deployment frequency are not appropriate to measure individual performance. This is true. These metrics came out of research and studies to measure different things, like devops maturity, software delivery capability, and overall delivery performance. Applying them to individuals is unfair at best.
But here’s the question we don’t talk about enough:
“What data are you going to use to evaluate my performance?”
This is the question that really counts — coming directly from your team members. There’s a better answer than “best practice says not to use metrics,” which doesn’t spark a lot of trust in a fair performance review.
So if not metrics like PRs and commits, then what? I’ll break down arguments against using common metrics to measure performance of individuals, and then walk you through how I approach developing an evidence-based performance management system.
Software is written by teams, not individuals
Even if a feature or component is owned by an individual, it’s likely that their code depends on systems written by others. This is the main reason why applying team-level metrics to an individual is unfair. An individual cannot fully control their performance within a system where they are just one contributor. If the metrics aren’t satisfactory, they need to be addressed on a team or system level, not an individual level.
Additionally, team practices can vary widely. Lines of code, story points completed, and deployment frequency are examples of team or system performance metrics that are sometimes applied to individuals. And here’s another layer of trickiness: how can you fairly and effectively measure individual performance when teams have different estimation practices? For example, what might be a 3-point story to one team is a 5-point story to another. So is the individual on the team who calls it a 5-point story a better performer, because they close out more points?
Along with this, we often measure the wrong metrics altogether; that is, metrics that we think will give us a strong indication of performance, but have low correlation to impact and outcomes. Abi Noda, CEO and co-founder of DX, talks about the “Flawed Five” metrics that will lead you astray, both on a team level but especially on an individual level: The elusive quest to measure developer productivity - GitHub Universe 2019
Wrong metrics mean losing trust
But is it harmful to use these metrics in order to get a “close enough” understanding of individual performance?
Yes.
A fast way to convince your team that you don’t understand their job is to pick what they perceive as arbitrary and unfair metrics to measure their performance.
Aside from losing trust in you, your team will be concerned about the implications of these metrics. If I pair with someone and therefore don’t commit the code myself, am I penalised? What if I’m working on a design document or coordinating a release? Responsibilities of developers often go beyond data that can be scraped from GitHub. You don’t want metrics that encourage the wrong behaviours.
Gaming the system?
Do you know about New York subway dogs? New York banned dogs from the subway unless they fit in a bag. It’s not hard to imagine what happened next.
Humans are wired to maximise incentives, and we’re also pretty creative. I’m not suggesting that your teams will intentionally start to game the system when it comes to improving these metrics. But it might happen, partly as a function of self-preservation, and partly because you picked the wrong metrics to begin with.
Goodhart’s law states that when a measure becomes a target, it ceases to be a good measure. If you measure a factory based on weight output, expect heavy products. If you measure it based on the number of items produced, expect tiny, tiny products. A practical example: if the number of commits is a target for individual performance, expect to see some very dirty git histories. You’d probably do the same thing.
Another danger comes from metrics that encourage behaviours that have a negative impact on your business. They punish desired behaviour while incentivising damaging behaviour.
Code coverage is a target, so development hours are spent on writing more tests, but the Change Failure Rate stays the same
More story points are pushed out, but maintainability suffers
Your team hits aggressive deadlines, but ⅓ of the team resigns within 2 months
More PRs are closed, but you’re not acquiring new customers
In this case, not only can your team lose trust in your leadership capabilities, but it’s likely that your own leadership team will lose trust in your judgement, as well.
Evidence doesn’t have to mean activity data
So, what to do instead?
It’s reasonable for an individual contributor to ask about the metrics which will be used to evaluate their performance. And it is important to have a transparent answer to this question. But, it doesn’t have to involve activity data from tools like GitHub and JIRA alone. They may tell one part of the story, but it’s unlikely that activity data alone can give you a clear picture of performance across all competencies that you expect from your team.
Evidence doesn’t have to mean activity data.
Instead of looking for a list of metrics to determine how you measure performance, figure out how you want to measure performance and then find metrics that help you measure the stuff that’s important to your company. While engineering roles do share common traits and objectives across companies, there’s not really a one-size-fits-all approach that will definitely fit your company’s objectives.
Work backwards
Time for a practical example. Here’s a job posting for a Senior Ruby on Rails Engineer at Treatwell.
(Side note – a list of over 1,300 companies that are still hiring.)
Looking at the responsibilities listed in the job description, I’ll work through how I arrive at a list of metrics, and other sources of evidence, to evaluate performance.
You’ll work as part of a cross-functional squad, collaborating to deliver incremental, meaningful changes to our customers.
Most software engineering roles have this type of delivery objective as part of their core performance expectations, but each role has different expectations of what’s being delivered. First, we need to break this down into smaller objectives that are more easily supported with evidence. Since this is both the top responsibility and the most complex one, I’ll spend more time breaking it down. I’m going to focus on some keywords here:
Cross-functional, collaborating: working as part of a cross-functional delivery team implies that this role is responsible for more than just writing code; they are responsible for making decisions and delivering business results as an equal partner to product management and design.
Incremental: The team should deliver small changes at a rapid pace. Given the cross-functional nature of the team, competencies like estimation and prioritisation are just as important as pure execution skills.
Meaningful: Simply put, the software should perform its business function. This aligns closely to the Performance (P) category in the SPACE framework, which covers criteria like user adoption but also quality and stability.
Considering only output-based metrics from GitHub and JIRA just isn’t appropriate for the full scope of this role.
So instead, my rough list might start to look like this:
Project on-time delivery, measured by % of projects delivered within +/- 1 week of the forecasted deadline (a rough sketch of this calculation follows the list)
Satisfaction with engineering partnership, measured by feedback from cross-functional partners.
Quality and reliability, measured by incident, bugs, or even customer support ticket volume.
Business performance of features, measured by user adoption and other team-defined usage metrics.
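As an illustration of how the on-time delivery metric above could be computed, here’s a minimal sketch assuming you can export forecasted and actual delivery dates from your project tracker. The project names, dates, and the one-week tolerance are all assumptions for the example.

```python
from datetime import date, timedelta

# Hypothetical export from a project tracker: (project, forecasted deadline, actual delivery).
projects = [
    ("checkout-v2",       date(2024, 3, 1),  date(2024, 3, 4)),
    ("search-relevance",  date(2024, 4, 15), date(2024, 5, 2)),
    ("mobile-onboarding", date(2024, 5, 10), date(2024, 5, 9)),
]

tolerance = timedelta(weeks=1)  # the "+/- 1 week" from the metric definition above

on_time = sum(1 for _, forecast, actual in projects if abs(actual - forecast) <= tolerance)
print(f"On-time delivery: {on_time / len(projects):.0%}")  # -> On-time delivery: 67%
```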
What I’m not measuring is also important.
I’ve made a choice here not to directly measure things like PR count, commits, or even story points, though delivery is part of the role. What this role description emphasised is value delivered to the user. If the value is not there, I might debug why that is with specific activity metrics. I might also look into them if I receive peer feedback that this person is not able to keep up with the pace of development.
For the other areas in the role description, I’ll go through the same process. My brief notes below:
You’ll help your team in designing the system architecture for large scale applications.
Participation in architecture decisions, with consideration for the number of decisions with direct responsibility. This would be a sum measurement, where I just count the number of times it happened.
Outcomes of these decisions, measured by quality of software and ability to deliver on-time (at this point, we start seeing the interconnectedness of some of these performance criteria)
Communication and collaboration, measured by feedback from the engineering team as well as cross-functional partners.
You’ll support and mentor junior team members, helping them create well thought out and robust solutions.
Quality of junior team members’ outcomes, measured by quality of software as outlined above, but filtering by projects where this person played a large role in mentoring and guiding junior team members
Satisfaction with learning opportunities, measured by gathering feedback from junior team members
You’ll help your team identify opportunities to improve their ability to deliver all kinds of changes to their users.
Leadership and participation in retros, post-mortems, and other continuous improvement processes, measured by instances of participation.
You’ll help with the running and maintenance of your team's applications in production.
Operational stability, measured by the quality metrics mentioned above, and also other appropriate team-defined metrics.
This list is already getting a bit long, and this doesn’t include evidence from my own observations yet.
With so many objectives and sources of information, it’s likely that some performance cycles won’t touch on every single one. That’s fine – as long as you plan for it, and make expectations clear about what happens when that’s not the case. Some things might be fine to drop off (like architectural leadership, if there were no large architecture projects during the evaluation period) but not operational stability or project on-time delivery.
Metrics for senior vs. junior roles
The more senior a role is, like the one used in the example above, the more likely it is that the role’s responsibilities focus on strategic outcomes rather than task output.
Whereas a junior engineer will have duties on the task level, a staff engineer is responsible for building systems of software that enable other teams to execute effectively. It may be perfectly reasonable to look at task-level metrics for a junior engineer, but not for that staff engineer.
Treatwell doesn’t have a published career ladder that I can reference here, but chances are that you’ll be looking at a career ladder alongside a job expectations document (and if not, you can find a lot of them for reference on progression.fyi).
The next step with these metrics would be to double-check that they’re also aligned to the role’s seniority and scope. Sometimes it can be the case where job descriptions, career ladders, and performance management processes don’t actually align to each other, but they should all be reinforcing the same things.
For example, looking at Medium’s career ladder for the mobile and web engineering tracks, we see a big difference in scope between criteria for those on Track 1 vs Track 4.
Track 1 examples
Delivers features requiring simple local modifications
Adds simple actions that call server endpoints
Uses CSS appropriately, following style guide
Reuses existing components appropriately
The scope of Track 1 is at the task level, and working within well-defined systems.
By Track 4, the problems are far beyond the task level.
Makes architectural decisions that eliminate entire classes of bugs
Designed and pioneered proto-based model storage
Migrated Android persistence layer to reactive programming
A senior role is responsible for managing whole projects and strategy, while a junior engineer is responsible for managing their tasks. As expectations change, so should the metrics for evaluating performance.
If I were a Track 4 engineer, I would find it a bit silly/annoying if my commit history was taken into consideration for my performance, as long as I was hitting the objectives laid out in my role description.
Using system-level metrics to measure performance of managers and senior technical leaders
But as a Track 4 engineer at Medium, I don’t think it’s unreasonable to use the team’s total number of bugs, or total amount of time spent on bugs, as a success metric for my performance.
And that takes me to one exception: using team metrics to measure individual performance is generally unfair unless it’s the explicit role of the individual to influence the system performance metrics.
Usually, this happens at a staff+ or management level. If your role’s main objective is to support teams by improving performance of CI/CD systems, and you’ve been given resources, time, and autonomy to do so, it’s reasonable that metrics like Change Failure Rate or Build Time would be used to evaluate how effectively you’ve performed your role.
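As a rough illustration of what one of those system-level metrics looks like in practice: DORA defines Change Failure Rate as the share of deployments that cause a failure in production. The sketch below computes it from deployment records; the data and field names are made up for the example, and in reality they’d come from your CI/CD or incident tooling.

```python
# Change Failure Rate: share of deployments that caused a failure in production.
# Hypothetical deployment records -- field names are assumptions for illustration.
deployments = [
    {"id": "d-101", "caused_failure": False},
    {"id": "d-102", "caused_failure": True},
    {"id": "d-103", "caused_failure": False},
    {"id": "d-104", "caused_failure": False},
]

failures = sum(1 for d in deployments if d["caused_failure"])
change_failure_rate = failures / len(deployments)
print(f"Change Failure Rate: {change_failure_rate:.0%}")  # -> Change Failure Rate: 25%
```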
No shortcuts
If you’ve read this far, hoping for a list of metrics that you can grab and start using on your teams, you won’t find one. There’s no shortcut here, but you can eliminate some trial and error by following these principles:
Instead of looking for a list of metrics to determine how you measure performance, figure out how you want to measure performance and then find metrics that help you measure the stuff that’s important to your company.
Focus on outcomes, not output, but you might use output metrics like activity from GitHub and JIRA to debug why outcomes were missed.
It might be appropriate to use team-level metrics to evaluate the performance of senior technical leaders and managers, depending on their scope of responsibility.
Watch out for Goodhart’s Law, or other cases where your metrics may encourage the wrong behaviour.
Metrics are one thing, but not everything. You still need to do the hard work of active performance management, setting expectations, giving regular feedback, and supporting your team.
Democratic Technical Decisions
Two strategies when a democratic decision seems out of reach.
I’ll set the scene: you’re in an architecture meeting, and there are a few proposals being discussed. In the ideal case, one of the proposals is an obvious winner, but that’s not the case today. You get to the decision point of the meeting, and the team is split.
Sometimes as a manager, it’s your job to play the role of tiebreaker, but I’d reach for that tactic as an absolute last resort.
Here are two quick techniques that can help break deadlocks in decision making, while still preserving the democratic spirit of the team.
Who wants pizza?
I like both pizza and sushi. If someone asks if I want pizza or sushi, I’m forced to pick between two options, when I’d really be okay with either.
The trick? Don’t make people choose. They can vote for both.
Who is comfortable with implementing proposal A?
Who is comfortable with implementing proposal B?
Instead of forcing your team to choose, you’ll get a better sense of which proposal has the most consent (even if you don’t get to 100% consensus).
Objections
If your team has a “disagree and commit” kind of culture, they may be quite good at understanding if their disagreement is within normal bounds, or if it’s strong enough to warrant more discussion or block a decision. For teams that have this self-awareness, it can help to ask for strong objections rather than trying to get everyone to unite on a decision.
You can use this technique in combination with the pizza vs. sushi example above.
If most of the team is in favour of proposal B, I might ask if someone wants to share a blocking objection to proposal B.
They can work it out
There might be a case where there is a strong disagreement or objection. Here, you might have to work against your instinct to play referee or tiebreaker. Instead, encourage the disagreeing parties to come up with a path toward a solution.
I might ask things like:
What information do you need to make a confident decision on this?
When will you have a recommendation?
Do we need to adjust any sprint goals in order for you to have time to properly research this?
What support do you need from me?
If the team (or a subset) goes through the exercise and is still unable to reach a decision, then it’s appropriate for you to step in a bit more.
But don’t take away the opportunity to solve a conflict by intervening too early.