Part 1: Git basics

Imagine that you are working on an important document. Once it looks good you send a copy to a colleague, who then sends it back with some suggestions for improvements. You keep both files as reference and unify them in a “final” file, only for you to realize that your “final” file has one wrong figure and you need to fix that. After some more back-and-forth, and before you realize it, your Documents directory is full of slightly-different versions of the same file and you have a hard time realizing which one is the “good” one.

A folder full of documents with variations of the same name, such as 'report', 'report_final', and 'report_final2'

Git is designed to solve this problem. Of all the things that you can do with Git, we'll focus on three specific ones:

The ability to store multiple versions of a file across time and to switch easily between them
The ability to work in parallel with other people and to painlessly bring your work together
The ability to keep track of multiple parallel versions of a file that, for some reason, cannot be unified into one yet.

This tutorial is designed to teach you how to use Git to achieve these three objectives.

But first, some technical details

There are two topics that this tutorial doesn't cover: how to install Git, and how to deal with whatever the authentication method of the month in GitHub/GitLab is. For instructions on installing Git there is no better starting point than the official Git website. As for the authentication, this tutorial will simply assume that you use a username and password. For a more detailed guide, I'm afraid you'll have to check what your repository provider has to say.

While there are several good graphical interfaces for Git, this tutorial will only use the command line. Graphical tools are good once you know what you want to do, but they abstract many important concepts that we first need to learn. Once you are done with the tutorial, you could consider using something like TortoiseGit to have a graphical interface .

And finally, there is a lot of complexity to Git, and explaining all of it at once can be overwhelming. For this reason this tutorial will often ask you to ignore certain directories and/or command outputs. But don't worry: by the end of the tutorial all of it will have been explained.

First steps

In this tutorial we will follow Sarah. She works as a chef for a fancy restaurant, and has finally decided to digitalize her recipe book. She would also like to collaborate with other chefs making new recipes, and is particularly interested in keeping multiple versions of the same dishes (such as her vegan paella and her salt-free chicken soup recipes) up-to-date.

The first core concept we’ll learn is a repository, or repo for short. For our purposes, a repository is a special directory that keeps a detailed log of everything that happens to each file in it.

Creating a new repository is easy. All Sarah needs to do is to go into her target directory and type:

$ git init .
Initialized empty Git repository in <your directory>/.git/

This empty directory should now contain a (hidden) directory called .git where Git stores all of its information. Users rarely need to modify files in this directory by hand, and therefore this tutorial will pretend that it doesn't exist.

Now that Sarah has a repository, we can ask Git if there’s anything we should worry about right now with the git status command. This command gives us an overview of what’s going on at any time, and will be very useful throughout this tutorial.

$ git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Unsurprisingly, since we haven’t done anything yet there’s not much for Git to do. Nonetheless, Git lets us know that…

… we are in a so-called branch named "master" (or "main" in some versions). This will be important later, but you can ignore it for now.
… there is nothing to commit and there are no commits yet, and
… we are ready to keep track of files

When we commit (verb) one or more files we are explicitly keeping track of their status at this point in time. We call this snapshot in time a commit (substantive), and there’s no better way to understand them than to see them in action.

Let’s create a new file recipe.txt inside our repository containing Sarah's first recipe:

Poached egg
===========
Ingredients:
  * 1 egg

Instructions:
  * Submerge the egg in boiling water for 3-4 minutes.

If we use git status now, Git will let us know that something has changed:

$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	recipe.txt

nothing added to commit but untracked files present (use "git add" to track)

As you can see, we now have untracked files. An untracked file is a file that is located inside our repository but, at the same time, Git was never told to keep track of its status. If Sarah wants Git to keep track of this file, she needs to use the git add command:

$ git add recipe.txt

It may seem as if nothing happened, but the git status command tells another story:

$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   recipe.txt

With one single git add, we have actually performed two tasks at once:

We added the file recipe.txt to the repository. From now on, Git will keep track of all changes to this file.
We added this file to the so-called "staging area". What this means is that Git will save the current state of this file in our very next commit, unless we explicitly tell it not to do that.

Note: this behavior in which two commands are presented as a single one is common in Git. We will often describe the step-by-step behavior to make sure that no “magic” takes place, but in general you don’t need to think about the individual steps.

Since we just have this one file, we will now create a record of the state of all files added to the "staging area". In Git terms, we say that we will commit our changes to the repo:

$ git commit

Once we type this command, a text editor opens and asks us to type a description of the changes we are saving. I have added Sarah's description at the bottom to make it easier for us to read, but most programmers write it at the top. Feel free to write your description anywhere you want.


# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
#
# Initial commit
#
# Changes to be committed:
#       new file:   recipe.txt
#

Adds the first version of my poached egg recipe

A lot has been written about how to write commit messages. This tutorial uses short messages of no more than one sentence for educational purposes, but I encourage you to dedicate a bit more time. Typically, a one-sentence title followed by an optional, longer description is ideal. I also prefer to start my commits with a verb in present tense.

When writing commit messages remember: they are not intended for you right now, but rather to future you. 5 months from now, which message will be more useful? "Fixes error" or "Corrects under-performing revenue numbers in our February 2019 report"?

Once we write this description, Git lets us know that all went well.

$ git commit
[master (root-commit) e809846] Adds the first version of my poached egg recipe
 1 file changed, 7 insertions(+)
 create mode 100644 recipe.txt

Ignoring the numbers between brackets for the moment, Git is showing us which changes are part of this commit. In this case we added a single file with 7 lines of text in it, but it could have been multiple files. If we now type git status again we’ll notice that Git has gotten less verbose:

$ git status
On branch master
nothing to commit, working tree clean

This is to be expected: all of our changes have been committed to the repository, and therefore there’s not much left to do.

Before we move forward, let’s make a quick recap of all the states in which a file can be - in your daily work you typically only need to think about "tracked" and "modified", but it's good to know the main ones:

                           git add
    +--------------------------------------------------+
    |                                                  |
    |                 file changes                     |
    |             +----------------+                   |
    |             |                |                   |
    |             |                v     git add       v
Untracked     Tracked         Modified -----------> Staged
                  ^                                    |
                  |                                    |
                  +------------------------------------+
                                git commit

A completely new file is untracked. Git doesn’t know anything about this file, and will never remember its changes. This is typically the case for temporary files that you have no interest in remembering and for files that you created just now and haven't added yet. It's also the case for files that you don't want other people to see such as files containing your password. Never add a password to Git unless you want the entire world to know it forever.
If you use the git add command with a file, you move it to the staging area. This means that the current state of the file will be saved the next time we run the git commit command. You typically do this when you are either happy about the state of the file or are about to do something dangerous and would like to have a backup.
After you add a file, its state changes to tracked. Git will remember the entire history of this file since its addition, but only when we tell it to do so via a commit.
If you modify a tracked file, its state changes to modified. This means that Git knows the file has changed since the last time and will tell you so, but it won’t do anything about it until you tell it to. Otherwise Git could save changes you didn't want to keep, and no one wants that.

Let’s see what happens when we modify out recipe (modifications in yellow):

Poached egg
===========
Ingredients:
  * 1 egg

Instructions:
  * Submerge the egg in 80°C water for 3-4 minutes.

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   recipe.txt

no changes added to commit (use "git add" and/or "git commit -a")

Git is now telling us a lot of information, so let’s go step-by-step:

the file "recipe.txt" has been modified.
if we want to keep the changes we made on this file, we can use git add <file> and Git will stage this file to be committed.
if those changes were a mistake and/or we changed our mind, we can go back to the last committed version of the file with git checkout -- <file>. Think of it as an "undo" command for files that haven't been committed yet. Part 5 talks about undoing changes in more detail.
in addition to using “git add”, we can use git commit -a to stage all modified files for commit at once.

Let’s go ahead and commit the current version of the file. We can use the -m flag to give the commit description right now without going through the text editor:

$ git commit -a -m "Lowers the required temperature"
[master 2045685] Lowers the required temperature
 1 file changed, 1 insertion(+), 1 deletion(-)

Git internals

Now that you know how to commit files, it is time to talk about what a commit actually is. The best way to think about them is to picture a commit as a list of all the differences between the previous version and the new one.

Here’s an example: let’s add some salt to our recipe:

Poached egg
===========
Ingredients:
  * 1 egg
  * Salt

Instructions:
  * Submerge the egg in 80°C water with salt for 3-4 minutes.
  * Add salt to taste.

The command git diff can tell us what has changed between the last commit and the current version of our file(s):

$ git diff
diff --git a/recipe.txt b/recipe.txt
index 3b35ae1..2de4951 100644
--- a/recipe.txt
+++ b/recipe.txt
@@ -2,6 +2,8 @@ Poached egg
 ===========
 Ingredients:
   * 1 egg
+  * Salt
 
 Instructions:
-  * Submerge the egg in 80°C water for 3-4 minutes.
+  * Submerge the egg in 80°C water with salt for 3-4 minutes.
+  * Add salt to taste.

This format is rather cryptic, but it’s not difficult to understand. What Git is telling us here is that the file recipe.txt was modified, with some lines going away (those marked with -) and some new lines coming in (those marked with +). Note that when you edit a line Git takes that as if you had erased it entirely and replaced it with a new one. A commit, then, can be understood as this specific set of changes to this specific set of files in our repo.

$ git commit -a -m "Adds salt everywhere"
[master ca0e2e3] Adds salt everywhere
 1 file changed, 3 insertions(+), 1 deletion(-)

Because we are thinking of a commit as a set of changes, a commit only makes sense if we also take into consideration the previous version of the file. And because that version depends on a previous one, we can visualize our commits as a chain going back to our very first file. We can see this chain with the git log command:

$ git log
commit ca0e2e38f38757e704bdc528f661e0d1e07fe23c (HEAD -> master)
Author: Sarah Arleen 
Date:   Tue Sep 7 18:12:49 2021 +0200

    Adds salt everywhere

commit 2045685293363b992885d66332767bf6fb6507a7
Author: Sarah Arleen 
Date:   Tue Sep 7 18:10:45 2021 +0200

    Lowers the required temperature

commit e80984624024e08568f05e50cbbb1f9054ab1fde
Author: Sarah Arleen 
Date:   Tue Sep 7 18:08:03 2021 +0200

    Adds the first version of my poached egg recipe

This command is showing us a lot of information about a commit:

the ID (or “hash”) of the commit in its long form (you can use only the first seven characters),
who made the commit and when,
the description of the commit,
the latest commit also shows a (HEAD -> master) field. HEAD means "this is the commit where your next work will be added", while master refers to our current branch. We will talk about branches in Part 4.

If we want to see an old version of our file, we can use git show and a specific commit hash (or, in this case, the first seven characters of the hash) to go back in time:

$ git show 2045685:recipe.txt
Poached egg
===========
Ingredients:
  * 1 egg

Instructions:
  * Submerge the egg in 80°C water for 3-4 minutes.

These commit hashes are how Git identifies every individual commit. We have seen them before: the short version of the hash is shown in brackets after every commit. We’ll be using these hashes in our more advanced lessons, but otherwise they are yet another detail that we can safely ignore for now.

One tip worth keeping in mind: you can add an easy-to-read tag to a commit and use it instead of its hash:

$ git tag v1.0 2045685
$ git show v1.0:recipe.txt
(same output as above)

If you want to see all existing tags, the command git tag will list them all. These tags will also show up in the output of git log.

$ git tag
v1.0

$ git log
commit ca0e2e38f38757e704bdc528f661e0d1e07fe23c (HEAD -> master)
Author: Sarah Arleen 
Date:   Tue Sep 7 18:12:49 2021 +0200

    Adds salt everywhere

commit 2045685293363b992885d66332767bf6fb6507a7 (tag: v1.0)
Author: Sarah Arleen 
Date:   Tue Sep 7 18:10:45 2021 +0200

    Lowers the required temperature

commit e80984624024e08568f05e50cbbb1f9054ab1fde
Author: Sarah Arleen 
Date:   Tue Sep 7 18:08:03 2021 +0200

    Adds the first version of my poached egg recipe

git show is convenient if we want to see an older version of a single file, but what if we want to see the state of the entire project instead? Here I have good and bad news. The good news are that there is a single command that does exactly that. But you'll have to wait until Part 5, "Revisiting the past", or I risk losing you with too many concepts at once.

Part 1 quick review

git init <directory>: Creates an empty Git repository in the given directory.
git status: Shows information about what has changed in your repository since your last commit.
git add: Tells git to start keeping track of a previously-unknown file.
git commit: Saves the current state of the files in your repository. This snapshot in time receives an identifier called a hash.
git tag: Adds a human-readable label to a specific commit so we don't need to remember it's hash.
git diff: Shows what has changed between two commits. If we don't give it any arguments, it shows what has changed between the last commit and now.
git show: Shows the content of a specific file at a specific point in time.

This concludes our first part. With the commands we’ve seen until now you can keep track of file changes, see their older versions, and add little tags to those versions you may want to revisit in the future.

In Part 2 we’ll talk about how to collaborate with other users, how to work in parallel, and how to solve collaboration issues.

Headers examples

Part 1: Git basics

But first, some technical details

First steps

Git internals

Part 1 quick review