Thoughts on Code Reviews π
Why so many teams mess them up, and a modest proposal to do them better.
Hey, Luca here! This is my 2nd article on The Hybrid Hacker π all while Nicola is crafting a brand new, in-depth piece about salary reviews, which will go out next week!
One of the major concerns I have about running a newsletter full-time is to lose some hands-on engineering experience.
I try to compensate by doing a ton of research and talking with as many people as I can, but there is no deny that I now have some detachment from things that, up until a couple of years ago, I used to do every day.
While this is mostly a source of concern, over time I have also found an unexpected flip side.
As long as I was deep in the weeds, I used not to question many engineering practices that seemed standard to me. After all, a startup has its good share of risks already, so why spend a precious innovation token on process? Do not reinvent the wheel, Luca.
The problem with this (perfectly good) advice is that, from time to time, the proverbial wheel needs to be reinvented. Conditions change, tooling gets better, and we eventually find better ways to work.
So, I have found that by being removed from the daily engineering struggle, I feel less attached to the way I used to work, which in turn makes it easier to question things here and there.
Lately, one of my small obsessions is how we do code reviews.
Code reviews look, to me, like an imperfect solution to two supremely important problems:
π Keeping code quality high β which leads to ease of change, fewer defects, less maintenance, you name it.
π Sharing knowledge across the team β which creates alignment, growth, resilience, flexibility, and more.
These are also self-reinforcing goals: shared knowledge helps keep quality high, and, in turn, high quality code is generally crisper and easier to understand for others.
Code reviews address these goals by making code changes inspected by multiple engineers β at least two: the author, and one+ reviewers. This, to me, not only looks fine: it looks like the only possible way.
Still, even if we agree on the above, there is plenty left to figure out, like:
When should reviews be made?
Should all code be reviewed?
What goes into a review?
Who should do reviews?
Today, the most common workflow for code reviews is async + blocking + mandatory, that is:
πΒ Async β a PR is opened, one reviewer (or more) is assigned, and they will perform their review asap.
βΒ Blocking β code isnβt merged and deployed until it passes the review.
πΒ Mandatory β you do the same workflow for all changes.
I believe this is far from ideal, so I spent the last couple of weeks writing down my thoughts about it, and how I think we can do better.
I also spoke with many people in the Refactoring community, which was supremely helpful in getting real-world stories, some of which are quoted throughout the article.
So, this piece is a conversation on the upsides and downsides of code reviews, starting from first principles, and discussing both familiar and unconventional workflows.
Here is the agenda:
πΒ Why code reviews matter β and why your innovation token is probably well spent here.
πΒ The scope of code reviews β what should go into a code review?
π Β Automate / Defer / Pair β my modest proposal for a healthy review process.
π½Β Choosing for your team β how your team's maturity and seniority affects reviews.
Letβs dive in!
πΒ Why code reviews matter
This question looks banal.
However, while the main goals (quality, knowledge sharing) may be obvious, I have found that reviews have second and third-order effects that are trickier to figure out, and worth discussing.
Code reviews are your main (and possibly only) feedback loop on how your team writes code.
This feedback loop not only intercepts defects: it aligns people on practices and culture. From high level engineering principles, down to naming conventions, chances are many of these are not only enforced byβthey are literally born out ofβcode reviews.
So letβs start with what happens when this feedback loop is not in place.
A no-reviews story β
I have worked once as an EM on a team that didnβt review code. When I joined, it was made up of 4 senior engineers: they were all extremely fast and experienced, and they just pushed code to prod all the time.
Things apparently worked, and code quality was good. But there were other, invisible problems:
πͺ Gatekeeping β over time, each engineer had developed their personal areas of ownership, which had become impenetrable to everyone else.
π Inconsistency β different parts of the code had completely inconsistent choices about naming, libraries used, folder structure, and more.
π No docs β since the general expectation was that everyone worked on their own, there was little incentive to write good docs and keep them updated.
Down the line, these problems led to more problems:
Collaboration β any conversation about design, tradeoffs, or estimates, was extremely hard, because everyoneβs ownership was completely siloed.
Hiring β onboarding new engineers, either junior or senior, was a nightmare.
Resource allocation β if we wanted to invest more in a specific product area, it was hard because we couldnβt put more people on it at will.
Key man risk β individual engineers got an outsized importance for the business and, even ifβto their creditβno one ever tried to take advantage of it, it was concerning to me as a manager.
I will stop here and wonβt bother you with how we tried to improve things. This story was to show that the impact of no reviews goes way beyond the βmore bugs in prodβ. In the long run it is detrimental to almost everything you do.
The other extreme β slow reviews π
Now, there is another end of this spectrum which is equally bad. Bad code reviews are usually bad in two ways, often at the same time:
They are slow β they delay release for several hours (or days!)
They are superficial β they donβt really improve the code, nor let knowledge be shared.
The problems with superficial reviews are the same of the no-reviews scenario, so no need to elaborate on that.
Conversely, slow reviews are worth discussing, because I have known many people who apparently have no issue with them, when instead they should.
The problem with slow reviews is that they mess up engineersβ work. Big time.
Once a developer fires up a PR, their work is far from over: they may need to make improvements (based on the review), and, when the feature is released, check that everything works fine in prod through logs and instrumentation.
This is all legit, valuable work.
But including several hours of delay at some point in this process (which you should also multiply by the number of review iterations) makes engineers switch to other tasks, which increases WIP, increases cognitive load, reduces productivity, leads to batched releases, and a whole self-reinforcing cycle of badness.
Waiting just leads to more waiting.
In fact, when people are used to waiting a lot for reviews, they begin to create fewer & bigger PRs, which in turn increases the burden on the reviewer and makes reviews even slower and shallower.
So, how do you do good code reviews? To answer this, letβs figure out what should go in a code review first.
πΒ The scope of code reviews
If we go back to the idea of code reviews as a feedback loop, what should they give feedback on?
In most teams, there are various devices that impact how you write code. Here are a few:
π Principles β coding practices & conventions that you have encoded over time.
πΒ Design docs β where you discuss your design and implementation plan. You may follow a template that enforces good practices.
π€Β Static analysis β where you catch coding standards violations, security vulnerabilities, missing tests, and various kinds of smells.
Ideally, when it comes to quality, you want most of the direction to come from these other stages β as opposed to reviews, which should act as a fallback.
In fact, 1) you donβt want reviews to deal with design issues, as they are expensive to rectify at that stage, and 2) you obviously donβt want human reviewers to look for issues that can be found automatically.
Static analysis tools can do 2 things really well IMO:
Boring easy stuff like aligning brackets and single-to-double quotes (weβve just added a button into our UI that will allow you to generate a patch for all open issues like this in your codebase at once!)
Obscure things that not everybody understands (and thus potentially missed in code review) β a lot of security stuff is like this, for instance not generating a non-http-only cookie β you only know if you know!
β Kendrick Curtis, VP of Engineering at Codacy
Design docs, conventions, and static analysis donβt cover it all, nor are they ever perfect, but by investing in them you can shrink what goes into code reviews considerably, and address many items arguably in better ways β by either automating them, or shifting them left.
So, one of the goals of code reviews should be to continuously reduce their own scope, by allowing engineers to uncover items and rules that can be enforced by other parts of the dev process.
This story by Rado explains it well π
I used to have hard opinions about commits and PR, and over the years, I have softened my stances.
I used to see PR as a bug catcher and code quality. Now, I tend to watch PRs ask for knowledge transfer and paper trails.
For code quality, I rely on linters, team code standards and internal tools. For bugs, on tests, types, and video walkthroughs.
I have settled into trunk-driven development in the last 4-5 years. Most bigger features are split into multiple pull requests and shipped with a feature flag. To merge, one review is required. Often, this is someone who works closely with you on this feature.
β Radoslav Stankov, CTO at Angry Building
So, the TL;DR of this article so far is three things:
Reviews are about 1) improving quality, and 2) sharing knowledge.
Reviews are a crucial practice but they are also easy to mess up.
The scope of reviews is continuously squeezed by good design, principles, and static analysis (among others).
How do we use this to create a solid, healthy review process?
π Β Automate / Defer / Pair
A while ago I wrote about the Ship / Show / AskΒ framework by Rouan Wilsenach, which, for new code, considers three cases:
π’Β ShipΒ β You make the change directly on your mainline, without asking for a code review. This works when you fix unremarkable bugs, add features using established patterns, or do collateral changes like improving docs.
πΒ ShowΒ β You create a PR, run all the CI pipeline, merge it without anyoneβs approval, andΒ thenΒ ask for feedback. This way you release quickly but still create space for discussion. It works in situations where you want to share knowledge but donβt necessarily need feedback, or the feedback is valuable but shouldnβt be blocking.
βΒ AskΒ β You make changes to a branch, open a PR, and wait for the review before merging. This is the standard process most companies adopt today.
The intuition behind Ship / Show / Ask is that the optimal process should use different strategies based on the type of change, rather than imposing the same on everything.
It also encourages engineers to make their own call on the strategy to use, which is empowering.
Ship / Show / Ask has long been a favorite of mine, and over time I tweaked it and developed my own version, which is based on the same three options, but is more opinionated on how to perform them.
I am not good with memorable names, so I just called it Automate / Defer / Pair:
1) Automate π€
When 1) there is no remarkable knowledge to be shared, and 2) the potential for improvements is low, you can probably skip the review and just rely on the other parts of your pipeline to do the job, like static analysis, and of course tests.
Tasks that meet these criteria are never going to be the bulk of your work, but there are still plenty of them in any team: fixing small bugs, adding tests, updating dependencies, small cosmetic changes, and more.
2) Defer βͺοΈ
In a mature and battle-tested dev process, I believe the majority of code changes should be reviewed, but the review shouldnβt be blocking.
In fact, when you think at 1) improving quality, and 2) sharing knowledge, both can be accomplished after the code has been merged.
This works beautifully well in continuous delivery workflows, where you often push changes in disabled state, gated behind a flag. You merge, get an async review, and possibly push improvements later.
3) Pair π―
There are two scenarios in which async, deferred reviews are not ideal:
Correctness is very important β the change is hard to revert and/or has the potential to do big damage: think of db migrations, or payment workflows.
Complexity is high β the PR has thousands of LOCs, or is inherently complex, and the reviewer has little context on the task.
About the first, in my experience itβs hard for a reviewer to truly check for correctness. To do that, they shouldnβt only understand the code: they should check it out on their machine, review tests, and run everything on their own. How many reviewers do this? Few that I know of, because 1) itβs a lot of work, and 2) itβs still not going to be effective, because of all the missing context.
The latter point, about context, is also what makes complex PRs basically unreviewable. Whenever the PR is big, or the code is legitimately elaborate, how can you expect a reviewer to come fresh into the topic and come up with meaningful ideas? All of that in reasonable time?
In all these cases, my preferred solution is pairing.
Pairing on complex / critical reviews solves all these issues. Face-time is high-bandwidth so you get 1) deeper, and 2) faster reviews, plus plenty of collateral benefits which Vic summarized very well just recently: π
For the sake of clarity, you donβt need to pair on the actual development (even though it is probably a good idea) β it is enough to pair on the review.
Still, these kinds of reviews should be reserved for a small portion of your work.
If you feel the majority of changes need to get this treatment, you should probably focus on working in smaller batches, shorter-lived branches, and all the usual advice on doing good PRs, which of course still applies.
πΒ Bottom line β doing the best for your team
This framework isnβt necessarily a good fit for any team.
It is, in a way, aspirational.
Itβs like when we say that you should deploy on Fridays: you should not deploy on Fridays if your systems are not robust enough to prevent nightmarish weekends for your devs. Still, it is good to know that many teams indeed deploy on Fridays, so you can eventually get there, too.
Likewise, I have knownβand worked for a long time inβteams where 1) there was no agreement even on basic design topics, 2) no written conventions whatsoever, and/or 3) testing and static analysis were shallow at best.
In these cases, human + blocking + async reviews have indeed a ton of value, and you should 100% do them. But you should also invest in taking their findings and moving them out of manual reviews, one by one. Can you add that thing to the linter? Can you write down that naming rule? Can you add a todo item to the design checklist?
Counterintuitively, having consistently helpful and valuable reviews is a smell that you can do better in other parts of the process.
Finally, these are not team choices only, but also individual ones.
You can be more cautious with fresh hires and junior engineers β for whom more reviews also equals faster growth β while empowering folks who have been working with you for longer, with a leaner workflow.
This doesnβt mean senior engineers == push without review β it means trusting them with picking the right strategy based on what the situation demands.
π Resources
And thatβs it for today! Here are more articles I wrote in the past if you want to learn more:
π Code Reviews β a previous article I wrote about code reviews workflows where we go through more workflows (e.g. stacked diffs), recommend more tools, and more.
π How to Write Secure Code β static analysis is especially useful with security vulnerabilities. We talked about this and much more, including tool recommendations, in this piece I wrote last year.
π How to Test Software β one of the most popular Refactoring articles ever, where we talk about the true value of testing, different types of tests, strategies, and more.
And thatβs it for today! If you are finding this newsletter valuable, consider doing any of these:
1) π Subscribe to the full versionΒ β if you arenβt already, consider becoming a paid subscriber. You can learn more about theΒ benefits of the paid plan here.
2)Β β€οΈΒ Share itΒ β The Hybrid Hacker lives thanks to word of mouth. Share the article with your team or with someone to whom it might be useful!
I wish you aΒ great week! βοΈ
Luca
βAsk a developer to review 10 lines of code, he'll find 10 issues. Ask him to do 500 lines and he'll say LGTM.β
Thank you for this newsletter issue Luca, I love that you mentioned the fact that having
reviews that are often helpful means improvement might be necessary in other parts of the process