Choose the right way to organize your code in a Git repository

Montacer Dkhilali
13 min read · May 3, 2021

Introduction

Git is one of the most brilliant pieces of software ever written (arguably Linus’s biggest achievement, even more so than Linux). It is so well designed that it keeps surprising you (well, me at least 🤷‍♂️) with its features and performance.

As a software developer, you will use Git almost every day of your career. You will also create Git repositories, whether for a personal project or with your development team at work. Choosing whether to organize your code in a single Git repository or in several is therefore important, because it affects your workflow from development to deployment.

Most software projects nowadays are split into several packages (Frontends and Backends). We can store the Frontend in one Git repository and the Backend in another (multiple Git repositories), or we can store them all in a Monorepo (a single repository for all packages).

In this article, I will share ideas, drawn from our experience at Breakpoint Technology, on the different ways to organize your code using Git:

  • Multiple repositories
  • Monorepo
  • Submodules
  • Subtree

I am not going to dive deep into these topics. Still, by the end of this article, you will be able to choose the right way to organize your project code. So, let’s git started! 🤘

Multiple Git repositories

Choosing the multiple repositories pattern for your project depends on a variety of factors (project type, project size, number of developers, …). The idea is to store a single logical package of your project in an independent repository.

For example, at Breakpoint Technology, we stored the BP Website in a single repository because it is composed of only one logical component, a React Gatsby project (see Image 1).

Image 1

We also stored the Linkinnov project, which is composed of a Landing page, a Frontend Web App, a Mobile App and many Backend microservices, in this structure (see Image 2):

  • Git repository for the Linkinnov Landing page.
  • Git repository for the Linkinnov Frontend Web App.
  • Git repository for the Mobile App.
  • Many other Git repositories, one for each Backend microservice project …

Image 2

The reason for following this approach is that Linkinnov is a very large project that is composed of many pieces of software, and on each piece we have different developers (Frontend team, Backend team, Mobile team).

When might you want multiple repositories?

  • Your repositories are loosely coupled or decoupled.
  • A developer typically needs only one, or a small subset, of your repositories to develop (for example, a mobile app or a landing page).
  • You typically want to develop the repositories independently, and only need to synchronize them occasionally.
  • Different teams work on different repositories.

When should you avoid using multiple Git repositories ?

  • It can be hard for junior developers to work with multiple repositories: they will spend more time learning the structure of the version control setup before they can start coding. This is particularly difficult for developers who are new to Git.
  • Tracking the entire history of a feature is much more difficult. For instance, when developers need to revert a feature with Git, they have to repeat the same actions in every repository the feature touched.
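
To make the second point concrete, here is a minimal sketch of what tracing one feature across several repositories involves. The repository names (frontend, backend) and the FEAT-42 ticket prefix are hypothetical, and the script builds throwaway local repositories in a temporary directory so that it can run anywhere:

```shell
set -eu

# Hypothetical identity so that commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Two throwaway repositories standing in for real ones.
for repo in frontend backend; do
  git init -q "$work/$repo"
  git -C "$work/$repo" commit -q --allow-empty -m "FEAT-42: add login to $repo"
  git -C "$work/$repo" commit -q --allow-empty -m "Unrelated cleanup in $repo"
done

# The same search has to be repeated in every repository the feature touched.
for repo in frontend backend; do
  git -C "$work/$repo" log --format="$repo: %h %s" --grep='FEAT-42'
done > "$work/matches.txt"

cat "$work/matches.txt"
```

With a Monorepo, the same question is answered by a single git log in one place.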

Hope that makes sense, now let’s discover the famous Git Monorepo !

Monorepo

A Monorepo is a repository that contains more than one logically isolated project (for example, an Ionic Mobile App, a React Web App and a Nest.js Backend project). These projects are most likely unrelated, loosely connected or can be connected by other means (e.g., via dependency management tools). This concept is relatively old and appeared about a decade ago. Google was one of the first companies to adopt this approach for managing its codebase.

The repository should have only one .git/ folder, at the root path of the project.

The repository is large in many ways :

  • Number of commits.
  • Number of branches and/or tags.
  • Number of files tracked.
  • Size of content tracked.

Use case

At Breakpoint Technology, we often use Monorepos for many of our projects (Gelaas, Speachbot, Digirail, Expand, BP-Profiling, …). What these projects have in common is that they are of average size and composed of a Backend package using Node.js and one or more Frontend or Mobile Apps using React or Angular.

We manage the packages using Lerna, a great tool built on top of NPM for managing JavaScript packages. With a few basic commands, Lerna lets us handle semantic versioning, set up build workflows, publish packages, and lint and test code.

The project structure in a Monorepo project using Lerna would look like this (View Image 3 and Image 4) :

Image 3
Image 4

In the lerna.json file we will declare the path to the project packages :

lerna.json

{
  "packages": [
    "packages/crm-client",
    "packages/crm-server",
    "packages/engine"
  ],
  "version": "independent"
}

Now, in the bitbucket-pipelines.yml file, we can use Lerna to perform actions such as linting the code and running unit tests. We published the breakpointtechnology/lerna Docker image on the public Docker Hub registry and referenced it in bitbucket-pipelines.yml so that the pipeline can run lerna run … commands.

When we run lerna run lint, Lerna looks into the package.json file of every package and executes its NPM lint script if one exists. The same goes for unit tests with lerna run test. If any script throws an error, the pipeline fails.

bitbucket-pipelines.yml

image: breakpointtechnology/lerna:latest

definitions:
  caches:
    crm-server: ./packages/crm-server/node_modules/
    crm-client: ./packages/crm-client/node_modules/
    engine: ./packages/engine/node_modules/

pipelines:
  default:
    - step:
        name: Build and test
        caches:
          - node
          - crm-server
          - crm-client
          - engine
        script:
          - lerna bootstrap
          - lerna run lint
          - lerna run test

We also used the power of Docker and Docker Compose to build and run all the packages locally in a very rapid and simple way by running the Makefile command make local.

Makefile

local:
	docker-compose up -d --build --remove-orphans

So, instead of having many repositories, each with its own configuration, we have only one repository, one pipeline, one Docker Compose file and one simple, efficient way to run the project. You still get scalability, the opportunity to separate concerns, and code sharing through common packages. Sounds nice, right? Well, it is. But there are some drawbacks as well. Let’s take a close look at the pros and cons of using Monorepos in the wild.

Why should you use Monorepo ?

  • Easily run the project locally: Taking advantage of the Makefile and Docker Compose, you can run all the project containers locally with a single command.
  • One place to store all configurations and tests: Since everything is located inside one repository, you can configure your CI/CD pipeline once and then reuse the configuration to build all packages before publishing or deploying them.
  • Easily refactor or revert commits: Instead of making a pull request for each repository and figuring out in which order to build your changes, you only need to make one atomic pull request that contains all the commits related to the feature you are working on.

Why should you avoid using Monorepos ?

  • Security: If you want to restrict access to some “packages”, then I believe this is impossible with a Monorepo.
  • Testing or deploying a specific package: How do you run tests for only certain packages? How do you achieve CI/CD for certain packages? With a single pipeline, if a test fails for one package, deployment fails for all the other packages.
  • Higher build time: Because you have a lot of source code in one place, it takes much more time for your pipeline to run everything in order to approve every PR.
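
One common way to soften the last two drawbacks is to test only the packages touched by a change. The sketch below is not the pipeline we used. It is a minimal illustration, assuming the packages/<name>/ layout from the lerna.json shown earlier and a throwaway local repository, of how Git itself can list the affected packages:

```shell
set -eu

# Hypothetical identity so that commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Throwaway monorepo with the three packages named in lerna.json.
for pkg in crm-client crm-server engine; do
  mkdir -p "packages/$pkg"
  echo "initial" > "packages/$pkg/index.js"
done
git add .
git commit -q -m "Initial import"

# A change that only touches one package.
echo "fix" >> packages/crm-server/index.js
git commit -q -am "Fix a bug in crm-server"

# List the package directories touched by the last commit:
# take the second path component of every changed file under packages/.
changed=$(git diff --name-only HEAD~1 HEAD -- packages/ | cut -d/ -f2 | sort -u)
echo "$changed"
```

In a pipeline, that list could drive a loop that runs lint and tests for those packages only. Lerna also documents a --since filter for lerna run, which is worth checking before rolling your own.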

Now, let’s answer these questions in the Git Submodules section.

Git Submodules

Submodules are a core Git feature that allows you to keep a Git repository as a subdirectory of another repository. A Git Submodule is simply a reference to another repository at a particular snapshot in time. Git Submodules enable a Git repository to incorporate and track the version history of external code.

Often a code repository will depend upon some external code. The external code can be directly copied and pasted into the main repository, which brings us back to the Monorepo concept. This method has the downside of losing any upstream changes to the external repository. This is where Git Submodules come in.

A Git Submodule is a record within a host Git repository that points to a specific commit in another, external repository. When a Submodule is added to a repository, a new .gitmodules file is created. The .gitmodules file contains metadata about the mapping between the Submodule project’s URL and its local directory. If the host repository has multiple Submodules, the .gitmodules file has an entry for each Submodule.

When should you use Git Submodules ?

  • When an external component is changing too fast or upcoming changes will break the API, you can lock the code to a specific commit for your own safety.
  • When you are delegating a piece of the project to a third party and you want to integrate their work at a specific time or release. Again this works when updates are not too frequent.
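
As a rough illustration of locking a dependency to a known-good version, the sketch below pins a Submodule to a tag. Everything in it is hypothetical: the lib and app repositories are throwaway local ones created in a temporary directory, the v1.0.0 tag is made up, and the protocol.file.allow override is only needed because the Submodule is added from a local path rather than a hosted URL.

```shell
set -eu

# Hypothetical identity so that commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the external component: a tagged release, then a later commit.
git init -q "$work/lib"
git -C "$work/lib" commit -q --allow-empty -m "Stable release"
git -C "$work/lib" tag v1.0.0
git -C "$work/lib" commit -q --allow-empty -m "Breaking change"

# The host project adds the component as a Submodule...
git init -q "$work/app"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib

# ...then checks out the v1.0.0 tag inside it and records that exact commit.
git -C lib checkout -q v1.0.0
git add lib
git commit -q -m "Pin lib to v1.0.0"

# The host repository stores the pinned commit hash, not a branch name.
pinned=$(git rev-parse HEAD:lib)
echo "lib is pinned at $pinned"
```

Later commits in lib do not affect app until someone deliberately moves the pointer and commits again.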

When should you avoid using Git Submodules ?

  • Git Submodules add complexity to your version control system and you should ensure using Submodules is more of a benefit than that complexity.
  • If you work in a team that uses Git Submodules, everyone must remember to commit and push Submodule changes, to avoid diverging branches.

Demo

Suppose we already have two repositories hosted on GitHub, first-submodule and second-submodule. We would like to create a git-submodules-demo project that imports both of them as Git Submodules (see Image 5).

Image 5

Follow these steps to configure and setup your Git Submodules project :

Create the main repository git-submodules-demo :

> mkdir git-submodules-demo
> cd git-submodules-demo/
> git init

Next we will add first-submodule and second-submodule to this fresh new repo :

> git submodule add https://github.com/montacerdk/first-submodule.git
Cloning into '/home/montacer/projects/workshops/git-submodules-demo/first-submodule'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.
> git submodule add https://github.com/montacerdk/second-submodule.git
Cloning into '/home/montacer/projects/workshops/git-submodules-demo/second-submodule'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.

Git will immediately clone the first-submodule and second-submodule repositories. We can now review the current state of the repository using git status :

> git status
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitmodules
new file: first-submodule
new file: second-submodule

There are now three new staged entries in the main repository: the .gitmodules file and the first-submodule and second-submodule directories. Looking at the contents of .gitmodules shows the new Submodules mapping :

[submodule "first-submodule"]
path = first-submodule
url = https://github.com/montacerdk/first-submodule.git
[submodule "second-submodule"]
path = second-submodule
url = https://github.com/montacerdk/second-submodule.git
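
Because .gitmodules uses the same syntax as other Git configuration files, it can be queried with git config instead of being parsed by hand. A small sketch, which recreates the file above in a temporary directory so that it runs without cloning anything:

```shell
set -eu
dir=$(mktemp -d)
cd "$dir"

# Recreate the .gitmodules file shown above.
cat > .gitmodules <<'EOF'
[submodule "first-submodule"]
  path = first-submodule
  url = https://github.com/montacerdk/first-submodule.git
[submodule "second-submodule"]
  path = second-submodule
  url = https://github.com/montacerdk/second-submodule.git
EOF

# List every Submodule path declared in the file.
paths=$(git config --file .gitmodules --get-regexp '^submodule\..*\.path$')
echo "$paths"

# Look up the URL of a single Submodule.
url=$(git config --file .gitmodules --get submodule.first-submodule.url)
echo "$url"
```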

Now, you can commit changes on the main repository :

> git add .
> git commit -m 'Adding submodules to the main repository'

Once Submodules are properly initialized within the main repository, they can be utilized exactly like stand-alone repositories. This means that Submodules have their own branches and history. When making changes to a Submodule it is important to push Submodule changes.
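
One step this demo does not cover is what a teammate has to do after cloning the main repository: the Submodule directories stay empty until they are initialized. A minimal sketch, using throwaway local repositories in place of the GitHub ones (the names sub, main and plain are hypothetical, and the protocol.file.allow override is only there because the Submodule lives on the local file system):

```shell
set -eu

# Hypothetical identity so that commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-ins for first-submodule and the main repository.
git init -q "$work/sub"
echo "hello" > "$work/sub/README.md"
git -C "$work/sub" add README.md
git -C "$work/sub" commit -q -m "Add README"

git init -q "$work/main"
git -C "$work/main" -c protocol.file.allow=always submodule --quiet add "$work/sub" sub
git -C "$work/main" commit -q -m "Add sub as a submodule"

# A plain clone creates the Submodule directory but leaves it empty.
git clone -q "$work/main" "$work/plain"
ls -A "$work/plain/sub"

# Initializing and updating the Submodules checks out the recorded commit.
git -C "$work/plain" -c protocol.file.allow=always submodule --quiet update --init --recursive
ls -A "$work/plain/sub"
```

Passing --recurse-submodules to git clone performs both steps at once.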

Let’s create some commits on first-submodule and check the status of the main repository :

> git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: first-submodule (new commits)
no changes added to commit (use "git add" and/or "git commit -a")

Executing git status shows that the main repository is aware of the new commits in first-submodule, but it doesn’t go into details. Even git diff on the main repository only shows that the recorded commit hash has changed, not what changed inside the Submodule :

> git diff
diff --git a/first-submodule b/first-submodule
index 1bd7ea6..33fa36e 160000
--- a/first-submodule
+++ b/first-submodule
@@ -1 +1 @@
-Subproject commit 1bd7ea66e729cce2c115e265521be828125a6f70
+Subproject commit 33fa36e538c801cdfc7776aeaddd45b2a63b43f6
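
If the two hashes are not informative enough, git diff accepts a --submodule=log option that lists the Submodule commits between them. A hedged sketch with throwaway local repositories (the names sub and main and the commit messages are made up for the illustration):

```shell
set -eu

# Hypothetical identity so that commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Throwaway Submodule and main repository, added from a local path.
git init -q "$work/sub"
git -C "$work/sub" commit -q --allow-empty -m "Initial commit"
git init -q "$work/main"
cd "$work/main"
git -c protocol.file.allow=always submodule --quiet add "$work/sub" sub
git commit -q -m "Add sub as a submodule"

# Create a new commit inside the Submodule's working copy.
git -C sub commit -q --allow-empty -m "Add a new feature"

# Summarize the Submodule commits instead of only printing two hashes.
summary=$(git diff --submodule=log)
echo "$summary"
```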

Now, we can commit and push changes on both first-submodule and main repository and we are done!

Git Subtree

Git Subtree is an alternative to Git Submodules. It lets you nest one repository inside another as a sub-directory. It is one of several ways Git projects can manage project dependencies.

When you want to use a Subtree, you add it to an existing repository, where the Subtree refers to another repository URL and a branch or tag. The add command copies all the files and the Git history into the main repository locally; it is not just a reference to a remote repository (see Image 6).

Image 6

Why might you want to consider Git Subtree?

  • The sub-project’s code is available right after the clone of the super project is done.
  • Git Subtree does not require users of your repository to learn anything new. They can ignore the fact that you are using Git Subtree to manage dependencies.
  • Git Subtree does not add new metadata files like Git Submodules does (i.e., .gitmodules).
  • Contents of the module can be modified without having a separate repository copy of the dependency somewhere else.

When should you avoid using Git Subtree ?

  • Contributing code back upstream for the sub-projects is slightly more complicated.
  • The responsibility of not mixing super and sub-project code in commits lies with you.
  • Switching the Subtree to another branch requires removing the Subtree directory and adding it again from the other branch, which is a bit complicated.

Difference between Git Submodules and Git Subtree

  • With Git submodules you typically want to separate a large repository into smaller ones. It is a better fit for component-based development, where your main project depends on a fixed version of another component (repository). If you need a change within the Submodule you have to make a commit/push within the Submodule, then reference the new commit in the main repository and then commit/push the changed reference of the main repository.
  • With Git Subtree you integrate another repository in yours, including its history. So after integrating it, the size of your repository is probably bigger. After the integration there is no connection to the other repository, and you don’t need access to it unless you want to get an update. So this strategy is more for code and history reuse.

Demo

Create the Parent repository

In this demo, we will add a Subtree to a parent repository. We start first by linking our parent repository to GitHub after creating some commits :

~/parent-repo> git remote add origin https://github.com/montacerdk/parent-repository.git
~/parent-repo> git push -u origin master

Create the Child repository

We also need to create our child repository that will be added as a Subtree to our parent repository. Also, create some commits and push them to GitHub :

~/child-repo> git remote add origin https://github.com/montacerdk/child-repo.git
~/child-repo> git push -u origin master

Add the child repository to the parent as a Subtree

Now, let’s add the child-repo as a Subtree to our parent repository. To do so, we need to link the child-repo remote to our parent-repo. Then, we will execute the subtree command to clone the child-repo code into the parent-repo as a Subtree :

~/parent-repo> git remote add -f child https://github.com/montacerdk/child-repo.git
~/parent-repo> git subtree add --prefix child child master --squash

As a result, the child-repo is added as a directory in our parent-repo (see Image 7) :

Image 7

We can view the Git history on the parent-repo by running :

git log --all --decorate --oneline --graph

As we can see in Image 8, if we pass --squash to the git subtree add command, Git squashes the whole child-repo commit history into a single commit on the parent-repo. Otherwise, the parent-repo history contains all of the child-repo’s commits.

Image 8
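
The effect of --squash can also be checked by counting commits. The sketch below is an illustration with throwaway local repositories rather than the GitHub ones used above. It assumes Git 2.28 or later (for git init -b) and that git subtree is installed; it ships in Git’s contrib directory, so some distributions package it separately.

```shell
set -eu

# Hypothetical identity so that commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Child repository with three commits on master.
git init -q -b master "$work/child"
for i in 1 2 3; do
  echo "$i" > "$work/child/file$i.txt"
  git -C "$work/child" add .
  git -C "$work/child" commit -q -m "Child commit $i"
done

# Two identical parents: one adds the Subtree squashed, the other keeps the full history.
for parent in squashed full; do
  git init -q -b master "$work/$parent"
  git -C "$work/$parent" commit -q --allow-empty -m "Parent commit"
done
git -C "$work/squashed" subtree add -q --prefix child "$work/child" master --squash
git -C "$work/full" subtree add -q --prefix child "$work/child" master

# Compare how many commits each parent ends up with.
squashed_count=$(git -C "$work/squashed" rev-list --count HEAD)
full_count=$(git -C "$work/full" rev-list --count HEAD)
echo "squashed: $squashed_count commits, full history: $full_count commits"
```

Both parents contain the same files under child/; only the amount of imported history differs.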

Push changes

Using the usual git add, git commit and git push commands, we can push changes made under the child directory to the parent’s origin (parent-repository). To also push them to the child remote, which points to child-repo on GitHub, we need to run this command :

~/parent-repo> git subtree push --prefix child child master
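
The whole round trip can be tried without touching GitHub. In the sketch below, a local bare repository (child.git) plays the role of the hosted child-repo; the seed repository, file names and commit messages are hypothetical, and the script assumes Git 2.28 or later with git subtree installed:

```shell
set -eu

# Hypothetical identity so that commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A bare repository stands in for the hosted child-repo, seeded with one commit.
git init -q --bare "$work/child.git"
git init -q -b master "$work/seed"
echo "v1" > "$work/seed/lib.txt"
git -C "$work/seed" add lib.txt
git -C "$work/seed" commit -q -m "Initial child commit"
git -C "$work/seed" push -q "$work/child.git" master

# The parent pulls the child in as a squashed Subtree.
git init -q -b master "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "Initial parent commit"
git subtree add -q --prefix child "$work/child.git" master --squash

# Edit the child code from inside the parent and commit as usual.
echo "v2" > child/lib.txt
git commit -q -am "Update lib from the parent"

# Send only the commits that touch child/ back to the child repository.
git subtree push --prefix child "$work/child.git" master

pushed=$(git --git-dir="$work/child.git" log --format=%s master)
echo "$pushed"
```

The child repository receives the change as a regular commit on top of its own history, without any trace of the parent’s other files.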

Pull changes

First, start by adding some changes to the child-repo :

~/child-repo> git fetch origin -p
~/child-repo> git pull
~/child-repo> git add . && git commit -m 'Adding some files' && git push

Then we can pull these changes on the parent-repo by running this command :

~/parent-repo> git subtree pull --prefix=child child master --squash

If you added --squash when creating the Subtree, you will always need to add --squash when pulling; otherwise, the pull will fail.

Switch branches

Create a new branch on the child-repo, make some changes and push them to origin/child-repo :

~/child-repo> git checkout -b release/1.0.0
~/child-repo> git add . && git commit -m 'Adding some changes for release' && git push -u origin release/1.0.0

To switch to the release/1.0.0 branch on parent-repo, remove the child folder, commit the removal, and recreate the Subtree from the release/1.0.0 branch :

~/parent-repo> git rm -r child
~/parent-repo> git commit -m 'Switching to release/1.0.0 branch'
~/parent-repo> git subtree add --prefix child child release/1.0.0 --squash

Conclusion

In the table below, you will find a comparison between the different ways of organizing your code using Git :

Git was engineered almost perfectly to fit the needs of any software project requirements regardless of the project size or type.

Each project and every developer has their own set of concerns, tools and workflows. Still, storing code as multiple packages in one repository is a key to true modularity, which is becoming increasingly popular in today’s ecosystem and provides great advantages when working with DevOps tools.

If you want further reference, here are some useful links :
