Dev Genius

Coding, Tutorials, News, UX, UI and much more related to development

Embrace the Mono-repo!

Moving your poly-repo to a mono-repo is an act of organization that Marie Kondo would endorse!

In source control, there are generally two extremes of how software teams manage their source:

  1. Poly-repo. Each discrete project or application scope has its own repository. For example, a team might have a separate repository for the front-end and back-end of an app in the simplest case.
  2. Mono-repo. All of the projects for a related application or even an organization are stored in one repository.

In between these extremes are hybrid setups, sometimes called poly-mono-repos.

But for most teams, especially startups, I’ve come to believe that the mono-repo is the way to go. In fact, choosing poly-repo is often the earliest and most costly mistake small teams and startups make when organizing code: this very early decision creates an invisible drag that doesn’t dissipate until someone takes on the big lift of consolidating the repos.

Let’s examine the reasons why and when you should use a mono-repo versus a poly-repo setup for your team.

Background

Mono-repo’s brand has been somewhat maligned in recent history due to the rise of micro-services architectures in place of monolithic architectures.

But equating mono-repo with monolith is a common mistake. Simona Cotin of Microsoft wrote a great piece on this:

[I]t is not hard to see that where we develop our code and what/when we deploy are actually orthogonal concerns. Google, for instance, has thousands of applications in its monorepo, but obviously, all of them are not released together.

So monorepo !== monolith. Quite the contrary, because monorepos simplify code sharing and cross-project refactorings, they significantly lower the cost of creating libs, microservices and microfrontends. So adopting a monorepo often enables more deployment flexibility, and more modularity in application structure.

Teams that have adopted a service-oriented architecture often end up defaulting to poly-repo as a mechanism for segregating application communication boundaries as well as team boundaries.

While this can certainly have some benefits, it comes with tremendous overhead and downsides, especially early on in the development cycle of a novel system.

Again, Cotin’s writing is spot on, drawing a distinction between published and public APIs:

Changing a public API requires us to update the clients of the API, but it’s possible to do it in a single go. In the case of a published API, it’s not possible to do it in a single go, and sometimes not possible to do at all. So changing a published API is a lot more effortful.

Observation: when something is effortful, developers avoid doing it. So developers avoid changing published APIs.

Early in the development lifecycle, developers are still figuring out the system’s architecture and protocols, deciding on what can be shared, and so on.

As a result, the cost of introducing published APIs is a lot higher because the number of times we need to change it is a lot higher.

With poly-repo, regardless of how the pieces of the system are connected — whether through REST, GQL, or a package manager — it always results in the need to work with shared code as a published API (lest you take the tried-and-true route of copying and pasting code…).

Mono-repo: So Old It’s New Again

In the last few years, mono-repo has actually had a sort of renaissance.

There’s a really good talk about how mono-repos work at Google, and many of Microsoft’s repositories have switched to mono-repo as well.

Other notable companies using mono-repo include Facebook, Dropbox, and more.

Despite its image problem and association with monolithic architectures, mono-repo seems to be making a comeback. Let’s take a look at the reasons why.

The “Root” Problem with Poly-repo

At the very core, the friction and pain of the poly-repo is really about the file system.

Primarily, the file system in each developer’s environment is inconsistent in a poly-repo setup: the physical layout on disk does not match the logical layout of the system, so scripting and automation become brittle. We could use a script to set up the environment so that everyone has the same physical layout, but that is one more thing to write and maintain.

Secondly, while microservices and poly-repo are not the same thing, there is often a strong correlation between these two patterns. A team building microservices is more likely to adopt a poly-repo than a mono-repo. The result of this decision is that it breaks the symbol linking between parts of the system in the codebase.

Finally, it requires more work to maintain the versions of each of the repos. Because they all sit in different directories, it often requires extra schleps to keep everything up-to-date and aligned. Of course, it is possible to write a script to do this as well, but the question to ask is why accept that friction?

The friction is a sign that you’re doing something wrong.
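To make the layout point concrete, here is a minimal sketch of how a script can locate any project when everything lives under one root. It assumes a standard clone where a `.git` entry marks the repository root. The project names (`web`, `api`) and the helper names are hypothetical and only for illustration.

```python
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk upward from `start` until a directory containing `.git` is found."""
    for candidate in [start, *start.parents]:
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"no .git entry found at or above {start}")


def project_path(start: Path, project: str) -> Path:
    """Resolve a project directory relative to the single repo root.

    `project` is a hypothetical top-level directory name such as "web" or "api".
    """
    return find_repo_root(start) / project
```

In a mono-repo, a script running anywhere inside the clone can reach `api` from `web` this way on every developer's machine. With a poly-repo, the same script would first have to know (or guess) where each developer cloned the other repository.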

Mono-repo’s Many Benefits

[Figure: using GitHub Actions’ paths syntax to activate workflows only on sub-trees of the repo]

Development Velocity

As Cotin wrote, when you have a published API, updating a reference to it typically involves two steps: a publish step, followed by a refresh step in each consumer to pick up the newly published version.

This is often the case with OpenAPI or gRPC references, for example.

While this makes perfect sense when APIs are relatively stable or actually managed by separate teams within an organization, it rarely makes sense in other cases. It especially doesn’t make sense for startups where everyone is working on a bit of everything. This extra overhead just slows teams down.

In a mono-repo, it becomes very straightforward to script the automatic generation of schemas and downstream clients on every build.
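As a rough sketch of what such a build step might look like, the snippet below renders client functions from a schema that lives in the same repository. The schema shape, the operation names, and `generate_client` are all assumptions made up for this example; a real project would more likely feed an OpenAPI or gRPC definition into an existing generator.

```python
# Hypothetical schema. In a real build this might be emitted by the server
# project a few directories away; the shape here is an assumption.
SCHEMA = {
    "get_user": {"method": "GET", "path": "/users/{user_id}", "params": ["user_id"]},
    "list_orders": {"method": "GET", "path": "/orders", "params": []},
}


def generate_client(schema: dict) -> str:
    """Render Python source with one function per operation in the schema."""
    lines = ["# AUTO-GENERATED on every build; do not edit by hand.", ""]
    for name, op in sorted(schema.items()):
        args = ", ".join(op["params"])
        kwargs = ", ".join(f"{p}={p}" for p in op["params"])
        lines.append(f"def {name}({args}):")
        # Each generated function just returns the request it would make.
        lines.append(f"    path = {op['path']!r}.format({kwargs})")
        lines.append(f"    return ({op['method']!r}, path)")
        lines.append("")
    return "\n".join(lines)
```

Because the server schema and the client sit in one working tree, the build can regenerate the client from the current schema every time, with no publish and refresh cycle in between.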

Code Navigation

One of the problems with poly-repo — that is exacerbated when there is a forced service boundary between the components of each repo — is that code navigation has more friction.

Modern code editors and IDEs are able to follow symbol references across files in a project, but this breaks down when those files are not linked together and loaded together.

For example, imagine a .NET project with two repos for two projects. Without linking the two projects, the editor cannot traverse the linked symbols. If we link through package publishing, we lose the ability to quickly navigate through the codebase and end up relying on very primitive “Find All”.

Mono-repos allow code editors and IDEs to take full advantage of language servers and intelligent navigation of code symbols between projects.

Efficient Refactoring

Because poly-repo breaks symbol linking, it’s difficult to use the powerful refactoring facilities of modern IDEs and instead, those superpowers are restricted to the scope of the individual repositories. In such cases, it becomes a matter of Find All to find and replace references.

Modern tools can easily make consistent global changes to method signatures…when those symbols are linked.

Consistent Versions

When working with poly-repos, each repo has a different version that must be maintained and synchronized. This creates overhead and can even be a source of defects.

As the number of repositories multiplies, the overhead of managing the versions just creates more and more friction. I’ve seen teams resort to writing scripts just to manage synchronization of the most current versions of repos across their poly-repo.

In a mono-repo, a single git pull synchronizes all of the versions of the repository. A single git checkout feat/awesome-feature ensures that your entire system is ready to run the awesome-feature without having to synchronize versions across multiple local repos.
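The difference is easy to see if we just list the commands involved. This is a minimal sketch that only builds the command lists and does not run anything; the repo directory names are hypothetical, and it assumes the feature branch exists under the same name in every repo, which a poly-repo does not guarantee.

```python
def polyrepo_sync_commands(repos: list[str], branch: str) -> list[list[str]]:
    """Commands needed to put every local clone on `branch` and update it."""
    commands = []
    for repo in repos:
        # `git -C <dir>` runs the git command as if started in <dir>.
        commands.append(["git", "-C", repo, "checkout", branch])
        commands.append(["git", "-C", repo, "pull"])
    return commands


def monorepo_sync_commands(branch: str) -> list[list[str]]:
    """The same outcome in a mono-repo: one checkout, one pull."""
    return [["git", "checkout", branch], ["git", "pull"]]
```

For three hypothetical repos (`web`, `api`, `shared`), the poly-repo version yields six commands, any of which can fail independently; the mono-repo version stays at two no matter how many projects are involved.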

Better Workflow

In poly-repos, the workflow around pull requests and code reviews becomes quite onerous when a developer’s work spans multiple repositories. For example, a server API is updated and a corresponding update needs to be made to the front-end.

With poly-repo, this requires going through multiple repositories in GitHub or GitLab or your source collaboration tool of choice to review the code. Maybe you’ll also want to link the PRs together for visibility.

If you’re pulling the branches locally, you’ll need to pull the matching branch in each of the repos.

It’s all just overhead.

In a mono-repo, all of the code review can happen in one repo in GitHub and if you want to test or review the code locally, a single git action will put you into the correct context.

Greater Visibility

Often overlooked is that mono-repos provide teams more visibility and accountability into each other’s work. In poly-repos, the PRs in GitHub would be isolated in each repo and each developer’s work potentially less visible to the others.

With a mono-repo setup, all the PRs flow through one repo and every pull brings down all of the code. This is a great way to improve visibility, encourage adoption of best practices and design patterns, and prevent code that duplicates existing functionality.

When Poly-repo Makes Sense

That’s not to say that poly-repo never makes sense, but rather it shouldn’t be the default for most teams.

Some criteria that favor poly-repo include:

  • Code is worked on by discrete teams. When separate teams with separate processes and practices are working on different parts of the code, there is a stronger case for separate repos. If each team has a different process to follow, then it makes sense that the repo boundary aligns with a process boundary.
  • The surface area of the libraries, APIs, or SDKs is stable. If there is very little volatility in how the functionality of that code is exposed to consumers, poly-repo’s overhead is reduced. Teams that are particularly adept at managing versions of code and data can often write APIs and SDKs in ways that minimize breaking changes.
  • Stability is more important than speed. That friction of poly-repo? It has a bias towards stability because the cost of updating code is higher and, in some cases, that can be a “feature”. A poly-repo can be used as a “stick” to favor stability by making the cost of changing code higher. When speed is secondary to stability, this can be seen as a perk of poly-repo.
  • Code is managed under regulatory constraints. In life sciences where a GxP process is involved, code must be developed under regulatory constraints where versioning is critical. Using a poly-repo paradigm can help reduce the cost of this regulatory constraint (at the cost of development overhead) by allowing one part of the code to change without having to absorb the full cost of regulatory processes (e.g. regression testing). In other words, the repo boundary can be used as a boundary for processes related to regulatory overhead.

In general, the decision to adopt poly-repo is driven by a need to create a boundary (the very boundary that is the source of friction) because that boundary has process-related benefits that outweigh the development-related friction caused by poly-repo.

InfoQ has a great discussion on mono-repo, microservices, and monoliths if you’d like to read more industry thoughts. Alex Noonan, in particular, talks in depth about the journey from mono → poly → mono and what they learned along the way:

You’d go in to change one thing and then tests were breaking somewhere else, and now you have to spend time fixing those tests. Breaking them out into separate repos only made that worse because now you go in and touch something that hasn’t been touched in six months. Those tests are completely broken because you aren’t forced to spend time fixing that.

Is a mono-repo for you? It’s important to look at your processes and workflow with an objective eye and see if some of the root causes of friction are simply because of the poly-repo setup. In the early stages, the mono-repo’s benefits are numerous and can help contribute to velocity.

Mono-repo does require a bit more complexity (just a teensy bit) on the CI/CD side to get it right, as scripts and workflows need to be context-aware, but the effort can pay off in greater productivity and less friction with every line of code.
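Here is a minimal sketch of what "context-aware" can mean in practice: given the files changed in a commit, work out which projects actually need to build. It mirrors the idea behind path filters such as the GitHub Actions paths syntax mentioned earlier, but it is plain Python and not how any particular CI system implements it. The project names and the `DEPENDS_ON` map are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical layout: each key is a top-level directory in the mono-repo,
# mapped to the other top-level directories it depends on.
DEPENDS_ON = {
    "web": ["shared"],
    "api": ["shared"],
    "shared": [],
}


def affected_projects(changed_files: list[str]) -> set[str]:
    """Return the projects whose own files, or whose dependencies, changed."""
    # The first path component tells us which top-level directory was touched.
    touched = {PurePosixPath(f).parts[0] for f in changed_files if PurePosixPath(f).parts}
    affected = set()
    for project, deps in DEPENDS_ON.items():
        if project in touched or touched.intersection(deps):
            affected.add(project)
    return affected
```

With this map, a change under `web/` rebuilds only `web`, a change under `shared/` rebuilds everything that depends on it, and a change to a root-level file such as a README triggers nothing.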

Published in Dev Genius

Written by Charles Chen

Maker of software ▪ Cofounder @ Turas.app ▪ Maker of CodeRev.app ▪ GCP, AWS, Fullstack, AI, Postgres, NoSQL, JS/TS, React, Vue, Node, and C#/.NET
