Part 2: Remote repositories

Everything we have seen in Part 1 is restricted to a single computer - all our changes and history are stored in a repository that lives in a single directory of Sarah's PC.

This is all well and good for personal projects, but Git is designed to do much more. In particular, an important part of Git is working with multiple repositories all over the world, sharing your code with other users in a (mostly) painless way.

There are plenty of reasons why having multiple repositories can be a good idea. Two come immediately to mind:

It makes collaboration easier: if you want to share your work with others, having a central repository that all participants can use in parallel from the comfort of their own computers reduces the friction between teammates.
It lowers the chances of you losing your work: if something happens to your PC, you are safe knowing that there’s a copy of everything on another computer, as up to date as your last synchronization.

In this lesson we'll learn how to synchronize multiple repositories with each other.

First steps

In Part 1 we followed Sarah, who created a repository in a local directory to keep track of her recipes. This repository currently contains a single file, recipe.txt:

Poached egg
===========
Ingredients:
  * 1 egg
  * Salt

Instructions:
  * Submerge the egg in 80°C water with salt for 3-4 minutes.
  * Add salt to taste.

We also mentioned that a repository can be seen as a chain of commits. One way to represent these commits is as a graph in which newer commits are placed at the top; we saw a similar text-only output in git log before. Remember: a commit's identifier is called its hash. We'll see hashes all the time, and we typically only need the first 7 characters.
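As a small reminder of what hashes look like in practice, here is a minimal sketch that asks Git for the full and the abbreviated identifier of a commit. It runs in a throwaway directory, and the author name and e-mail are placeholders chosen only for this example.

```shell
# Throwaway repository so the example does not touch any real project.
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q

# Placeholder identity, needed only so that the commit below succeeds.
export GIT_AUTHOR_NAME=Sarah GIT_AUTHOR_EMAIL=sarah@example.com
export GIT_COMMITTER_NAME=Sarah GIT_COMMITTER_EMAIL=sarah@example.com

echo "Poached egg" > recipe.txt
git add recipe.txt
git commit -q -m "Adds poached egg recipe"

full_hash=$(git rev-parse HEAD)             # the complete identifier
short_hash=$(git rev-parse --short=7 HEAD)  # its first 7 characters
echo "full:  $full_hash"
echo "short: $short_hash"

# The short form is accepted wherever Git expects a commit.
git log --oneline -1 "$short_hash"
```

The exact hash values will differ on every run, because they depend on the commit's content, author, and timestamp.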

Along comes Richard, whom Sarah has invited to collaborate on her recipe. While Git allows many esoteric ways of collaborating, in this tutorial we focus on the most common one: we will create a central repository in GitLab (or GitHub) and use it as the “official” repository. All collaborators will work on their own computers with a local copy of this central repository, and once in a while they’ll synchronize their copies with the central one.

GitHub and GitLab

Have you ever heard of GitHub and/or GitLab? At their core, both of their services are nothing but a convenient way for you to have a Git repository that lives in a public server everyone has access to. Everything we see here applies equally to both of them.

Throughout this tutorial I will use the term "GitLab" for simplicity, but the steps are exactly the same when using GitHub.

Who came first?

If Sarah wants to collaborate with Richard, she needs to create a repository in GitLab. And here we make a quick stop to explain a slightly unintuitive behavior.

When you create a new repository, GitLab will ask whether you want a completely empty repository or whether you’d like GitLab to create a README for you. How to answer depends on what you intend to do next.

If you have not yet written a single line of code (or, in our case, recipe), then you should tell GitLab to create the README file for you. Why? Because the act of adding that single file initializes the repository, and you can start working on it right away. Plus, you can always delete that extra file anyway.

If you already have a local repository (as we have been doing until now) then you don’t want GitLab to create any files for you because you have already initialized your local repository. And trying to mix two already-initialized repositories together is asking for trouble: their histories have no commit in common, so Git has no obvious way to combine them.

Neither of these alternatives is better than the other, but the first one is slightly easier if you are just starting. And you should definitely avoid creating a repository with a README if you already have a local repo.
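To see the kind of trouble we mean, here is a minimal sketch that builds two independently initialized repositories and tries to combine them. Everything happens in a throwaway directory: the "hosted" directory is only a local stand-in for a GitLab repository created with a README, and the names and e-mail are placeholders. The behavior shown assumes Git 2.9 or later.

```shell
demo=$(mktemp -d)
cd "$demo"
# Placeholder identity so that the commits below succeed.
export GIT_AUTHOR_NAME=Sarah GIT_AUTHOR_EMAIL=sarah@example.com
export GIT_COMMITTER_NAME=Sarah GIT_COMMITTER_EMAIL=sarah@example.com

# Repository 1: Sarah's existing local work.
git -c init.defaultBranch=master init -q sarah
echo "Poached egg" > sarah/recipe.txt
git -C sarah add recipe.txt
git -C sarah commit -q -m "Adds poached egg recipe"

# Repository 2: stands in for a hosted repository created with a README.
git -c init.defaultBranch=master init -q hosted
echo "# recipes" > hosted/README.md
git -C hosted add README.md
git -C hosted commit -q -m "Initial commit"

# Fetching works, but the two histories share no common commit,
# so a plain merge is refused.
git -C sarah remote add origin "$demo/hosted"
git -C sarah fetch -q origin
if git -C sarah merge origin/master 2>merge.err; then
    merge_result=merged
else
    merge_result=refused
fi
echo "merge $merge_result: $(cat merge.err)"
```

Git does offer a way out (the --allow-unrelated-histories option of git merge), but starting from an empty remote repository avoids the problem altogether.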

Linking repositories

After creating a GitLab repository and inviting Richard, we have three computers that we need to link together:

Sarah's computer, which already has a working repository,
GitLab's computer, where we have a shiny empty repo, and
Richard's computer, which doesn’t have anything yet.

[Figure: commit graphs for Sarah, GitLab, and Richard]

Since we already have some code, we follow the advice above and create an empty repository, https://gitlab.com/sarah/recipes. Using the URL that GitLab provided for our repo, Sarah runs the following commands:

$ git remote add origin https://gitlab.com/sarah/recipes.git
$ git push -u origin master
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (9/9), 917 bytes | 458.00 KiB/s, done.
Total 9 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), done.
To https://gitlab.com/sarah/recipes.git
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.

The first command is doing a lot, so let's go step-by-step:

A remote repository is a repository that lives in a different directory and/or in a different computer. The command git remote add is letting Git know of the existence of a related repository somewhere in the world.
We can use any name we want for a remote repository, but origin is the conventional name for "the central repository that all the other developers follow". The command git remote -v shows all the remotes our Git repository knows about.
Git identifies repositories via URLs. In this case, we are telling Git that the origin remote repository is located in GitLab's servers.
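The three points above can be tried out safely with the following sketch. It uses a fresh, throwaway repository rather than Sarah's real one; the URL is the one from this tutorial and is never contacted, since git remote add only records the name and the address.

```shell
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q recipes
cd recipes

# Register the remote under the conventional name "origin".
# Nothing is contacted yet; Git only stores the name and the URL.
git remote add origin https://gitlab.com/sarah/recipes.git

# List every remote together with its fetch and push URLs.
git remote -v

# The same information lives in the repository's configuration.
remote_url=$(git config --get remote.origin.url)
echo "origin -> $remote_url"
```

If the address ever changes, git remote set-url origin followed by the new URL updates it without touching the history.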

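The second command, git push, sends the history of our master branch to the remote repository, at which point both repositories are in sync with each other. The -u flag tells Git to remember origin's master as the counterpart of our local master branch, which is why later on we can type git push and git pull without any arguments.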

[Figure: commit graphs for Sarah, GitLab, and Richard]
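If you would like to try the push without a GitLab account, the following sketch replaces the server with a local directory. This is an assumption made purely for the example: central.git is a repository without working files (created with git init --bare) that plays the role of GitLab, and the names and e-mail are placeholders.

```shell
demo=$(mktemp -d)
cd "$demo"
# Placeholder identity so that the commit below succeeds.
export GIT_AUTHOR_NAME=Sarah GIT_AUTHOR_EMAIL=sarah@example.com
export GIT_COMMITTER_NAME=Sarah GIT_COMMITTER_EMAIL=sarah@example.com

# A repository with no working files stands in for the GitLab server.
git -c init.defaultBranch=master init -q --bare central.git

# Sarah's existing local repository with one commit.
git -c init.defaultBranch=master init -q sarah
cd sarah
echo "Poached egg" > recipe.txt
git add recipe.txt
git commit -q -m "Adds poached egg recipe"

# Link the two repositories and send the history across.
git remote add origin "$demo/central.git"
git push -q -u origin master

# After the push both repositories point at the same commit,
# and master remembers origin/master as its upstream.
local_head=$(git rev-parse master)
central_head=$(git -C "$demo/central.git" rev-parse master)
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "local:    $local_head"
echo "central:  $central_head"
echo "upstream: $upstream"
```

The two hashes printed should be identical, which is what "in sync" means here.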

Now it is Richard’s turn. Since he doesn’t have any local code to worry about, he performs a single command, git clone:

$ git clone https://gitlab.com/sarah/recipes.git
Cloning into 'recipes'...
Username for 'https://gitlab.com': richard
Password for 'https://richard@gitlab.com': 
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 9 (delta 1), reused 9 (delta 1), pack-reused 0
Unpacking objects: 100% (9/9), done.

This command does two things in one step. First, it retrieves an entire copy of the remote repository and stores it locally. Then it automatically sets the origin remote to the address from which we copied the repository. At this point we all have the exact same code and are ready to start working with each other.

[Figure: commit graphs for Sarah, GitLab, and Richard]
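The two effects of git clone can be checked with a short sketch. As an assumption for this example, a local directory (central.git) again stands in for GitLab, and it is first seeded with one commit from Sarah; names and e-mail are placeholders.

```shell
demo=$(mktemp -d)
cd "$demo"
# Placeholder identity so that the commit below succeeds.
export GIT_AUTHOR_NAME=Sarah GIT_AUTHOR_EMAIL=sarah@example.com
export GIT_COMMITTER_NAME=Sarah GIT_COMMITTER_EMAIL=sarah@example.com

# Stand-in for the central repository, seeded with one commit by Sarah.
git -c init.defaultBranch=master init -q --bare central.git
git -c init.defaultBranch=master init -q sarah
echo "Poached egg" > sarah/recipe.txt
git -C sarah add recipe.txt
git -C sarah commit -q -m "Adds poached egg recipe"
git -C sarah remote add origin "$demo/central.git"
git -C sarah push -q -u origin master

# Richard needs a single command: clone copies the history
# and records where it came from as the remote "origin".
git clone -q "$demo/central.git" richard

richard_remotes=$(git -C richard remote)
richard_head=$(git -C richard rev-parse HEAD)
sarah_head=$(git -C sarah rev-parse HEAD)
git -C richard remote -v
echo "remotes: $richard_remotes"
echo "same commit as Sarah: $richard_head"
```

Richard never ran git remote add, yet his copy already knows about origin.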

Moving changes around

Let’s say Sarah decides to add a spoon of vinegar to her recipe. She makes the change and commits it:

Poached egg
===========
Ingredients:
  * 1 egg
  * 1 spoon vinegar
  * Salt

Instructions:
  * Submerge the egg in 80°C water with salt and vinegar for 3-4 minutes.
  * Add salt to taste.

$ git commit -a -m "Adds vinegar"
[master 7821cc5] Adds vinegar
 1 file changed, 2 insertions(+), 1 deletion(-)

Since we want Richard to see these changes too, we send our changes to the central repository with the git push command:

$ git push
Username for 'https://gitlab.com': sarah
Password for 'https://sarah@gitlab.com': 
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 300 bytes | 300.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://gitlab.com/sarah/recipes.git
   ca0e2e3..7821cc5  master -> master

These are perfectly normal messages letting us know that everything went well. At this point both Sarah's and GitLab's repositories are in sync with each other, but Richard is one step (or one commit) behind:

[Figure: commit graphs for Sarah, GitLab, and Richard]

Now it is Richard’s turn. When he comes to work the first thing he does is to check whether there are any changes that he should be aware of. In Git terms, he pulls any new changes that might have happened.

$ git pull
Username for 'https://gitlab.com': richard
Password for 'https://richard@gitlab.com': 
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://gitlab.com/sarah/recipes
   ca0e2e3..7821cc5  master     -> origin/master
Updating ca0e2e3..7821cc5
Fast-forward
 recipe.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

A lot has happened here, with the end result that Richard has an up-to-date copy of everything in the repository. Understanding how this happens is key to solving the day-to-day issues that a Git user must deal with once in a while.

First, you should know that git pull is actually two operations rolled into one: git fetch (which brings the latest changes into your repository) and git merge (which integrates those changes with yours).

When git pull ran git fetch on Richard's behalf, Git brought all new commits into his repository, but it didn’t connect them to his own branch yet:

The new commit (or commits, if there's more than one) is floating in the ether - it exists in the local repository, but we haven’t decided what to do with it yet. And if you remember that a commit is a set of changes, we can imagine this floating commit as a set of instructions on how to change a file that hasn't been applied to any file yet. From Richard's point of view, all of his files still look exactly the same.

The next step, git merge, is the one that connects this floating commit with the existing files and updates the position of the HEAD label. This is what brings Richard's repository up to date:

[Figure: commit graphs for Sarah, GitLab, and Richard]
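To watch the two halves of git pull separately, we can run git fetch and git merge by hand. The following is a sketch under the same assumption as before: a local directory (central.git) stands in for GitLab, the recipe is shortened to one line per ingredient, and names and e-mail are placeholders.

```shell
demo=$(mktemp -d)
cd "$demo"
# Placeholder identity so that the commits below succeed.
export GIT_AUTHOR_NAME=Sarah GIT_AUTHOR_EMAIL=sarah@example.com
export GIT_COMMITTER_NAME=Sarah GIT_COMMITTER_EMAIL=sarah@example.com

# Setup: a stand-in central repository, Sarah's repo, and Richard's clone.
git -c init.defaultBranch=master init -q --bare central.git
git -c init.defaultBranch=master init -q sarah
echo "1 egg" > sarah/recipe.txt
git -C sarah add recipe.txt
git -C sarah commit -q -m "Adds poached egg recipe"
git -C sarah remote add origin "$demo/central.git"
git -C sarah push -q -u origin master
git clone -q "$demo/central.git" richard

# Sarah publishes a new commit.
echo "1 spoon vinegar" >> sarah/recipe.txt
git -C sarah commit -q -a -m "Adds vinegar"
git -C sarah push -q

# Step 1: fetch downloads the commit and moves origin/master,
# but Richard's branch and files stay as they were.
git -C richard fetch -q
lines_after_fetch=$(grep -c '' richard/recipe.txt)   # number of lines in the file
behind=$(git -C richard rev-list --count master..origin/master)

# Step 2: merge moves master forward and updates the files.
git -C richard merge -q origin/master
lines_after_merge=$(grep -c '' richard/recipe.txt)

echo "after fetch: $lines_after_fetch line(s), $behind commit(s) behind"
echo "after merge: $lines_after_merge line(s)"
```

Splitting the pull like this is handy when you want to look at incoming commits (for example with git log master..origin/master) before letting them change your files.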

Now that both Sarah and Richard are up to date with each other, Richard can further work on the recipe:

Poached egg
===========
Ingredients:
  * 1 egg
  * 1 spoon vinegar
  * Salt

Instructions:
  * Heat the water to 80°C.
  * Add salt and vinegar to the water.
  * Submerge the egg for 3-4 minutes.
  * Add salt to taste.

$ git commit -a -m "Style improvements"
[master fbe14a8] Style improvements
 1 file changed, 3 insertions(+), 1 deletion(-)

[Figure: commit graphs for Sarah, GitLab, and Richard]

Richard keeps working and adding changes, and once he's done he pushes his changes back to the central repository from where Sarah can pick them up later.

Part 2 quick review

git clone: Makes a local copy of a remote repository.
git push: Sends your local changes to a remote repository. It only works if your local copy is up to date with the remote repository.
git pull: Retrieves changes from a remote repository and merges them with your local version. It's the same as git fetch followed by git merge.
git fetch: Retrieves changes from a remote repository and stores them locally. Doesn't modify your files or your current branch.
git merge: Integrates changes that are already in your repository (for example, commits brought in by git fetch) into your current branch. Modifies your local copy.
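The condition attached to git push in the list above (your copy must be up to date) can be observed with one last sketch. As before, this assumes a local directory (central.git) standing in for GitLab, a shortened recipe, and placeholder names and e-mail.

```shell
demo=$(mktemp -d)
cd "$demo"
# Placeholder identity so that the commits below succeed.
export GIT_AUTHOR_NAME=Sarah GIT_AUTHOR_EMAIL=sarah@example.com
export GIT_COMMITTER_NAME=Sarah GIT_COMMITTER_EMAIL=sarah@example.com

# Setup: a stand-in central repository, Sarah's repo, and Richard's clone.
git -c init.defaultBranch=master init -q --bare central.git
git -c init.defaultBranch=master init -q sarah
echo "1 egg" > sarah/recipe.txt
git -C sarah add recipe.txt
git -C sarah commit -q -m "Adds poached egg recipe"
git -C sarah remote add origin "$demo/central.git"
git -C sarah push -q -u origin master
git clone -q "$demo/central.git" richard

# Richard commits and pushes first.
echo "Salt" >> richard/recipe.txt
git -C richard commit -q -a -m "Adds salt"
git -C richard push -q

# Sarah commits without pulling first: her copy is no longer
# up to date with the central repository, so the push is rejected.
echo "1 spoon vinegar" >> sarah/recipe.txt
git -C sarah commit -q -a -m "Adds vinegar"
if git -C sarah push -q 2>push.err; then
    push_result=accepted
else
    push_result=rejected
fi
echo "push $push_result"
```

Nothing is lost when this happens: Sarah's commit is still safely stored in her local repository, waiting for her to bring her copy up to date.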

This style of collaborative working is fine, but it’s not very exciting - at this point, all we’ve managed is to replicate the concept of sending files back and forth with extra steps. In Part 3 we will see how to work in parallel, including a look at what happens when we both modify the same file at the same time, what happens when we both modify the same line at the same time, and how to deal with editing conflicts.