Week 2: Using Git Locally
Learning Objectives:
Use an advanced understanding of Git
Skip the staging area, and delete and move files within Git
Amend and roll back commits
Explain the concept of branching and merging
Create new branches
Use merging to combine branched data
Manage and handle merge conflicts
Advanced Git Interaction
Skipping the Staging Area
If we already know that the current changes are the ones that we want to commit, we can skip the staging step and go directly to the commit. We do this by using the -a flag to the git commit command.
This flag automatically stages every file that's tracked and modified before doing the commit, letting us skip the git add step.
At first, you might think that git commit -a is just a shortcut for git add followed by git commit, but that's not exactly true. git commit -a doesn't work on new files because those are untracked. Instead, git commit -a is a shortcut to stage any changes to tracked files and commit them in one step.
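The difference between tracked and untracked files can be seen in a small experiment. This is a minimal sketch that assumes a throwaway repository created with mktemp; the file names and the identity passed to git config are placeholders, not part of the course material.

```shell
# Work in a throwaway repository so nothing real is touched.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "print('v1')" > tracked.py
git add tracked.py
git commit -q -m "Add tracked.py"

echo "print('v2')" > tracked.py            # modify a tracked file
echo "notes" > untracked.txt               # create a new, untracked file

git commit -q -a -m "Update tracked.py"    # stages and commits tracked.py only
git status --short                         # untracked.txt is still reported as untracked
```

The new file is left out of the commit, so it still needs an explicit git add.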
After the commit, the HEAD indicator is moved to the latest commit.
Git uses the HEAD alias to represent the currently checked out snapshot of your project. This lets you know what the contents of your working directory should be. In this case, the current snapshot is the latest commit in the project. Think about it as a bookmark that you can use to keep track of where you are. Even if you have multiple books to read, the bookmark allows you to pick up right where you left off.
We'll soon learn about branches. In that case, HEAD can be a commit in a different branch of the project.
As a shortcut, it's generally easiest to think of HEAD as a pointer to the current branch, although it can be more powerful than that.
Getting More Information About Our Changes
We've seen how git log shows us the list of commits made in the current Git repository. By default, it prints:
the commit message,
the author, and
the date of the change.
This is useful, but if we need to look at the actual lines that changed in each commit, we can do this with git log -p. The p comes from patch, because using this flag gives us the patch that was created.
If we don't want to scroll down until we find the commit that we're actually interested in, another option is to use the git show command. This command takes a commit ID as a parameter, and will display the information about the commit and the associated patch.
Another interesting flag for git log is the --stat flag. This will cause git log to show some stats about the changes in the commit, like which files were changed and how many lines were added or removed.
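The three ways of inspecting a commit described above can be tried side by side. This is a minimal sketch in a throwaway repository; the file name notes.txt and the commit messages are hypothetical, and --no-pager is used only so the output prints directly instead of opening a pager.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "line two" >> notes.txt
git commit -q -a -m "Extend notes"

git --no-pager log -p -1                   # commit info plus the patch
git --no-pager log --stat -1               # commit info plus per-file change counts

commit_id="$(git rev-parse HEAD)"          # the ID you would normally copy from git log
git --no-pager show "$commit_id"           # info and patch for that one commit
```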
Sometimes it can take a while until we're ready to commit. Imagine you've been working on adding a new complex feature to a script and it requires thorough testing. Before committing it, you need to make sure that it works correctly.
You check that all the test cases are covered, and so on. While doing this, you find bugs in your code that you need to fix.
It's only natural that by the time you get to the commit step you don't really remember everything you changed.
To help us keep track, Git gives us the git diff command. Its output format is equivalent to the diff -u output that we saw in an earlier video.
We can pass a file name as a parameter to see the differences relevant to that specific file instead of all the files at the same time.
Something else we can do to review changes before adding them is to use the -p flag with the git add command. When we use this flag, git will show us the change being added and ask us if we want to stage it or not.
git diff shows only unstaged changes by default. Instead, we can call git diff --staged to see the changes that are staged but not committed.
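To see how staging moves a change from one view to the other, here is a small sketch in a throwaway repository. The file name report.txt is hypothetical; git add -p is left out because it is interactive.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first" > report.txt
git add report.txt
git commit -q -m "Add report"

echo "second" >> report.txt
git --no-pager diff                        # the unstaged change shows up here
git add report.txt
git --no-pager diff                        # now empty: nothing is left unstaged
git --no-pager diff --staged               # the staged change shows up here instead
```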
Deleting and Renaming Files
Let's say that you've decided to clean up some old scripts and want to remove them from your repository.
You can remove files from your repository with the git rm command, which will stop the file from being tracked by Git and remove it from the working directory.
File removals go through the same general workflow that we've seen. So you'll need to write a commit message as to why you've deleted them.
In the example, the commit output reports that the 23 lines in the file are no longer there, and it states that the file itself was deleted.
What if you have a file that isn't accurately named?
You can use the git mv command to rename files in the repository.
The git mv command works in a similar way to the mv command on Linux and so can be used for both moving and renaming.
If our repository included more directories, we could use the same git mv command to move files between directories.
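Removing, renaming, and moving can be combined in one short session. This is a sketch under stated assumptions: the repository is a throwaway one, and old_script.sh, helper.sh, and the scripts directory are hypothetical names.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir scripts
echo "old" > old_script.sh
echo "helper" > helper.sh
git add .
git commit -q -m "Add scripts"

git rm -q old_script.sh                    # deletes the file and stages the removal
git mv helper.sh scripts/disk_helper.sh    # renames and moves in one step
git status --short                         # shows one deletion and one rename
git commit -q -m "Remove old script and move helper into scripts/"
```

Both commands stage their change right away, so only the commit step remains.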
The output of git status is a super useful tool to help us know what's up with our files. It shows us which files have tracked or untracked changes, and which files were added, modified, deleted or renamed.
If there are files that get automatically generated by our scripts, or our operating system generates artifacts that we don't want in our repo, we'll want to ignore them so that they don't add noise to the output of git status.
To do this, we can use the .gitignore file. Inside this file, we'll specify rules to tell git which files to skip for the current repo. In our example, we'll create a .gitignore file containing the name of the file that we want Git to ignore.
Remember that the dot prefix in a Unix-like file system indicates that the file or directory is hidden and won't show up when you do the normal directory listing. That's why we have to use ls -la to see all files.
We've added a .gitignore file to our repo but we haven't committed it yet. This file needs to get tracked just like the rest of the files in the repo.
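The effect on git status can be checked with a small sketch. It assumes a throwaway repository and a hypothetical generated file called debug.log; git check-ignore is used at the end only to confirm that the rule matches.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "print('hi')" > app.py
echo "scratch" > debug.log                 # hypothetical generated artifact
git add app.py
git commit -q -m "Add app"
git status --short                         # debug.log is listed as untracked noise

echo "debug.log" > .gitignore              # one rule: skip this file name
git status --short                         # now only .gitignore itself is listed
ls -la                                     # the dot file only shows up with -a

git add .gitignore                         # the ignore file must be tracked too
git commit -q -m "Ignore debug.log"
git check-ignore -v debug.log              # prints the rule that matched
```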
Advanced Git Cheat Sheet
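Command | Explanation |
---|---|
git commit -a | Stages all tracked, modified files and commits them in one step |
git log -p | Shows the commit log along with the patch created by each commit |
git show | Shows the information about a given commit and its associated patch |
git diff | Is similar to the Linux `diff` command, and can show the differences between the working tree, the staging area, and commits |
git diff --staged | An alias to --cached, this shows the staged changes compared to the named commit (HEAD by default) |
git add -p | Allows a user to interactively review patches to add to the current commit |
git mv | Similar to the Linux `mv` command, this moves or renames a file |
git rm | Similar to the Linux `rm` command, this deletes, or removes a file |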
There are many useful Git cheat sheets online as well. Please take some time to research and study a few.
.gitignore files
.gitignore files are used to tell the git tool to intentionally ignore some files in a given Git repository. For example, this can be useful for configuration files or metadata files that a user may not want to check into the master branch. For more details, see the gitignore page of the Git documentation.
Collections of common file patterns to exclude are also available online.
Undoing Things
Undoing Changes Before Committing
You can change a file back to its earlier committed state by using the git checkout command followed by the name of the file you want to revert.
With that, we've demonstrated how we can use git checkout to revert changes to modified files before they get staged. This command will restore the file to the latest stored snapshot, which can be either committed or staged.
If you need to check out individual changes instead of the whole file, you can do that using the -p flag. This will ask you change by change if you want to go back to the previous snapshot or not.
That's it for undoing unstaged changes.
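Restoring a modified file looks like this in practice. This is a minimal sketch in a throwaway repository with a hypothetical file config.txt; the -- separator is optional here and is added only to make clear that what follows is a file name rather than a branch.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "stable" > config.txt
git add config.txt
git commit -q -m "Add config"

echo "broken edit" > config.txt            # an unstaged change we regret
git status --short                         # config.txt is reported as modified

git checkout -- config.txt                 # restore the last stored snapshot
cat config.txt                             # the committed content is back
```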
What if you added the changes to the staging area already?
We can unstage our changes by using the git reset command. Staging changes that we don't actually intend to commit happens all the time, especially if we use a command like git add *, where the star is a file glob pattern used in Bash that expands to all files. This command will end up adding any change done in the working tree to the staging area.
We can see that this output file, which was supposed to be a temporary file for debugging, has now been staged in our repo but we didn't want to commit it.
Conveniently, the git status command tells us how to unstage the file right there in the output.
The example output mentions the HEAD alias, the current checked out snapshot. So by running the suggested command, we're resetting our changes to whatever's in the current snapshot.
You can use git reset -p to get git to ask you which specific changes you want to reset.
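The accidental staging scenario can be reproduced as follows. This is a sketch that assumes a throwaway repository; tool.py and the temporary output.txt are hypothetical names.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "code" > tool.py
git add tool.py
git commit -q -m "Add tool"

echo "more code" >> tool.py
echo "debug output" > output.txt           # temporary file we never meant to commit
git add *                                  # the glob stages both files
git status --short                         # both changes are staged

git reset -q HEAD output.txt               # unstage only the temporary file
git status --short                         # tool.py stays staged, output.txt is untracked again
```

The file itself is untouched by the reset; only its entry in the staging area is removed.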
Amending Commits
Let's say you just finished committing your latest batch of work, but you've forgotten to add a file that belongs to the same change. You'll want to update the commit to include that change. Or maybe the files were correct, but you realize that your commit message just wasn't descriptive enough. So you want to fix the description to add a link to the bug that you're solving with that commit. What can you do?
We can solve problems like these using the --amend option of the git commit command. When we run git commit --amend, git will take whatever is currently in our staging area and run the git commit workflow to overwrite the previous commit.
The list of added files for this commit now includes both files that we wanted to add. Now that the files have been added, we can also improve our initial commit message which was a bit too short.
Let's save the new description as usual. We've amended our previous commit to include both files and a better message.
You could also just update the message of the previous commit by running the git commit --amend command with no changes in the staging area.
While git commit --amend is okay for fixing up local commits, you shouldn't use it on public commits, meaning those that have been pushed to a public or shared repository. This is because using --amend rewrites the git history, removing the previous commit and replacing it with the amended one. This can lead to some confusing situations when working with other people and should definitely be avoided.
So remember, fixing up a local commit with amend is great and you can push it to a shared repository after you fixed it. But you should avoid amending commits that have already been made public.
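Here is how the forgotten-file case might play out. It is a minimal sketch in a throwaway repository with hypothetical file names; the -m option is used so that no editor opens, whereas the walkthrough above edits the message interactively.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "a" > first.py
echo "b" > second.py
git add first.py
git commit -q -m "Add two files"           # oops: second.py was left out
before="$(git rev-parse HEAD)"

git add second.py                          # stage the forgotten file
git commit -q --amend -m "Add first.py and second.py"
after="$(git rev-parse HEAD)"

git --no-pager log --stat --oneline        # still one commit, now with both files
echo "before: $before"
echo "after:  $after"                      # a different ID: the old commit was replaced
```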
Rollbacks
There are a few ways to roll back commits in Git. For now, we'll focus on using the git revert command. Git revert doesn't just mean undo. Instead, it creates a commit that contains the inverse of all the changes made in the bad commit in order to cancel them out.
For example, if a particular line was added in the bad commit, then in the reverted commit, the same line will be deleted.
This way you get the effect of having undone the changes, but the history of the commits in the project remains consistent leaving a record of exactly what happened.
We can revert the latest commit by using the HEAD alias that we mentioned before. Since we can think of HEAD as a pointer to the snapshot of your current commit, when we pass HEAD to the revert command we tell Git to rewind that current commit.
So once we issue that git revert HEAD command, we're presented with the text editor commit interface that we've all seen before. In this case, we can see that git has automatically added some text to the commit message indicating it's a rollback. The first line mentions that it's reverting the commit we just did called “Add call to disk full function”. The extra description even includes the identifier of the commit that got reverted.
While we could use this description as is, it's usually a good idea to add an explanation of why we're doing the rollback. Remember that the goal of these descriptions is to help our future selves understand why things happen. In this case, we'll explain that the reason for the rollback is that the code was calling a function that wasn't defined. Once we're done entering the description, we can exit and save as usual.
You'll notice the output that we get from the git revert command looks like the output of the git commit command. This is because git revert creates a commit for us.
Let's look at the last two entries in the log using -p and -2 as parameters.
As demonstrated before, the -p parameter lets us see the patch created by the commit, while the -2 parameter limits the output to the last two entries.
So in this log, we can see that when we called revert, git created a new commit that's the inverse of the previous one.
We can see that the original commit shows the lines we added by preceding them with a plus sign.
The same lines show up with a minus sign in the patch of the newer commit, indicating that they were removed.
In this example, we reverted the latest commit in our tree. But what if we had to revert a commit that was done before that?
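The rollback of the latest commit can be sketched like this. The repository is a throwaway one and health.py with its contents is hypothetical; --no-edit is an assumption made to keep the example non-interactive, so it accepts the generated message instead of opening the editor described above.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "check_disk()" > health.py
git add health.py
git commit -q -m "Add disk check"

echo "check_disk_full()" >> health.py      # the bad change: calls an undefined function
git commit -q -a -m "Add call to disk full function"

git revert --no-edit HEAD                  # new commit with the inverse patch
git --no-pager log -p -2                   # the same line appears once with + and once with -
```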
Identifying a Commit
We can target a specific commit by using its commit ID. Commit IDs are those complicated looking strings that appear after the word commit in the log messages.
The commit ID is a 40 character long string. This long jumble of letters and numbers is actually something called a hash, which is calculated using an algorithm called SHA1.
Essentially, what this algorithm does is take a bunch of data as input and produce a 40 character string from the data as the output. In the case of Git, the input is all information related to the commit, and the 40 character string is the commit ID.
Cryptographic algorithms like SHA1 can be really complex, so we won't go too deep into what this means.
Still, you might be wondering, why on earth would you use a long jumble of letters as an ID for a commit, instead of incrementing an integer, like 1, 2, 3, and so on?
To answer that, let's take a quick look at the reason why Git uses a hash instead of a counter, and how that hash is computed.
Although SHA1 is a part of the class of cryptographic hash functions, Git doesn't really use these hashes for security.
Instead, they're used to guarantee the consistency of our repository. Having consistent data means that we get exactly what we expect. This is really useful in distributed systems like Git because everyone has their own repository and is transmitting their own pieces of data.
Computing the hash keeps data consistent because it's calculated from all the information that makes up a commit: the commit message, date, author, and the snapshot taken of the working tree.
The chance of two different commits producing the same hash, commonly referred to as a collision, is extremely small. It'd take a lot of processing power to cause this to happen on purpose.
If you use a hash to guarantee consistency, you can't change anything in the Git commit without the SHA1 hash changing too.
Remember our discussion about fixing commits with the --amend option? Each time we amend a commit, the commit ID will change. This is why it's important not to use --amend on commits that have been made public.
The data integrity offered by the commit ID means that if a bad disk or network link corrupts some data in your repository, or worse, if someone intentionally corrupts some data, Git can use the hash to spot that corruption. It will say, the data you've got isn't the data you expected, something went wrong.
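One way to convince yourself that the ID is derived from the commit's contents is to feed the stored commit data back through Git's own hashing command. This is a sketch in a throwaway repository with a hypothetical file; git cat-file and git hash-object are standard plumbing commands, and the 40-character length assumes the default SHA-1 object format.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "data" > file.txt
git add file.txt
git commit -q -m "Add file"

git cat-file -p HEAD                       # the hashed data: tree, author, date, message
commit_id="$(git rev-parse HEAD)"
recomputed="$(git cat-file commit HEAD | git hash-object -t commit --stdin)"
echo "$commit_id"
echo "$recomputed"                         # same value: the ID is the hash of that data
echo "${#commit_id} characters"

git commit -q --amend -m "Add file (reworded)"
amended_id="$(git rev-parse HEAD)"
echo "$amended_id"                         # changing only the message changes the ID
```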
How can you use commit IDs to specify a particular commit to work with, like during a rollback?
Let's look at the last two entries in our repo using the git log -2 command. Say we realized that we actually liked the previous name of our script, and so we want to revert this commit where we renamed it.
First, let's look at that specific commit using git show. We've copied and pasted the commit ID that we wanted to display, and that works.
Alternatively, we could provide just the first few characters identifying the commit to the command, and Git will be smart enough to guess which commit ID starts with those characters, as long as there's only one matching possibility. Two characters is not enough, but usually four to eight characters will be plenty.
Okay, now that we've seen how we can identify the commit that we want to revert, let's call the git revert command with this identifier.
As usual, this will open an editor where we should add a reason for the rollback. In this case, we'll say that the previous name was actually better.
As we called out before, when we generate the rollback, Git automatically includes the ID of the commit that we're reverting. This is useful when looking at a repo with a complicated history that includes a lot of commits.
Now, once we save and exit the commit message, Git will actually perform the rollback and generate a new commit with its own ID.
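Reverting an older commit by its abbreviated ID could look like the following. This is a sketch under stated assumptions: a throwaway repository, hypothetical script names, git rev-parse --short standing in for copying the ID from the log by hand, and --no-edit used in place of the editor step.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "x" > disk_usage.py
git add disk_usage.py
git commit -q -m "Add disk usage script"
git mv disk_usage.py check_free_space.py
git commit -q -m "Rename script"           # the commit we will want to undo
echo "docs" > README
git add README
git commit -q -m "Add README"

git --no-pager log --oneline -3
short_id="$(git rev-parse --short HEAD~1)" # abbreviated ID of the rename commit
git --no-pager show --stat "$short_id"     # inspect that specific commit first
git revert --no-edit "$short_id"           # roll back only the rename
ls                                         # disk_usage.py is back, README is kept
```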
Git Revert Cheat Sheet
git checkout is effectively used to switch branches and, as shown earlier, to restore files to their latest stored snapshot.
git reset basically resets the repo, throwing away some changes. It’s somewhat difficult to understand, so reading the examples in the documentation may be a bit more useful.
There are some other useful articles online, which discuss more aggressive approaches to resetting the repo.
git commit --amend is used to make changes to commits after-the-fact, which can be useful for making notes about a given commit.
git revert makes a new commit which effectively rolls back a previous commit. It’s a bit like an undo command.
There are a few ways you can roll back commits in Git.
There are some interesting considerations about how git object data is stored, such as the usage of SHA-1.
Branching and Merging
What is a branch?
In Git, a branch at the most basic level is just a pointer to a particular commit. But more importantly, it represents an independent line of development in a project, of which the commit it points to is the latest link in a chain of development history.
The default branch that Git creates for you when a new repository is initialized is called master.
The master branch is commonly used to represent the known good state of a project. When you want to develop a feature or try something new in your project, you can create a separate branch to do your work without worrying about messing up this current working state.
You can merge back into the master branch, when you've got something you like, or discard your changes without negative impact if they don't work out.
Creating New Branches
We can use the git branch command to list, create, delete, and manipulate branches.
Running git branch by itself will show you a list of all the branches in your repository.
We create a new branch by calling git branch with the name of the new branch.
The list we get shows that we're still on the master branch. We can tell because the current branch is indicated in the command's output with an asterisk in a different color.
To switch to a new branch, we'll need to use the git checkout command. We saw earlier how we can use git checkout to restore a modified file back to the latest commit. Checking out branches is similar in that the working tree is updated to match the selected branch, including both the files and the git history.
It might help to remember that we use git checkout to check out the latest snapshot for both files and for branches.
Creating a branch and switching to it immediately is a pretty common task. We can use git checkout -b followed by the name of the new branch to do this.
Now that we have our shiny new branch, let's create a new file in it. We'll create a new Python 3 file that will include the usual shebang line, an empty main function, and a call to that function.
Let's check the last two entries in the log. We see the last two commits in this branch. Notice how next to the latest commit ID, git shows that this is where HEAD is pointing to and that the branch is called even-better-feature.
Next to the previous commit, git shows that both the master and the new feature branches are pointing to that snapshot of the project.
In this way, we can see that the even-better-feature branch is ahead of the master branch.
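The branch creation steps can be sketched end to end. Assumptions are labeled here and in the comments: the repository is a throwaway one, the branch name new-feature is a guess at the earlier example branch, the initial branch is pinned to master with git symbolic-ref so the names match these notes, and the file contents are illustrative.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name used in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

git branch new-feature                     # create a branch (assumed name), stay on master
git branch                                 # the asterisk marks the current branch
git checkout -q -b even-better-feature     # create another branch and switch to it

cat > free_memory.py <<'EOF'
#!/usr/bin/env python3

def main():
    pass

main()
EOF
git add free_memory.py
git commit -q -m "Add an empty free_memory.py"

git --no-pager log --oneline --decorate -2 # HEAD and branch names next to each commit
```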
Working with Branches
We created a new branch different than the master branch and added a commit to it. Let's check out the current status of our repo by calling git status and ls -l.
So we see that we're on a clean working tree in the even-better-feature branch, and that a new free_memory.py file is in our working tree.
Let's now change back to the master branch using git checkout master and then list the latest two commits there.
When we switch to a different branch using git checkout, under the hood, git changes where HEAD is pointing.
The commit from even-better-feature doesn't show up at all, and the latest snapshot is the second entry we've seen before.
Remember that when we switch branches, git will also change files in our working directory or working tree to whatever snapshot HEAD is currently pointing at.
Let's look at the current contents of our directory.
free_memory.py isn't there. This demonstrates that when we switch branches in git, the working directory and commit history will be changed to reflect the snapshot of our project in that branch.
When we check out a new branch and commit on it, those changes will be added to the history of that branch.
One thing to note after all this back and forth, is that each branch is just a pointer to a specific commit in a series of snapshots. It's very easy to create new branches because there isn't any data that needs to be copied around.
When we switch to another branch, we check out a different commit and git updates both HEAD and the contents of our working directory.
We can delete a branch by using git branch -d.
If there are changes in the branch we want to delete that haven't been merged back into the master branch, git will let us know with an error. Helpfully, git also gives us the command to run if we're sure that we want to delete the branch, even if it has unmerged changes.
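Switching back and forth, and the refusal to delete an unmerged branch, can be observed with a short sketch. It assumes a throwaway repository with the initial branch pinned to master; the file contents are illustrative.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name used in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b even-better-feature
echo "#!/usr/bin/env python3" > free_memory.py
git add free_memory.py
git commit -q -m "Add free_memory.py"
ls                                         # README and free_memory.py

git checkout -q master                     # HEAD moves and the working tree follows
ls                                         # free_memory.py is gone from the working tree
git --no-pager log --oneline               # the feature commit is not in this history

# -d refuses because the branch has unmerged changes; -D forces the deletion.
git branch -d even-better-feature || echo "git refused: the branch is not fully merged"
git branch -D even-better-feature
git branch
```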
Merging
A typical workflow for managing branches in Git is to create a separate branch for developing any new features or changes. Once the new feature's in good shape, we merge the separate branch back into the main trunk of code.
Merging is the term that Git uses for combining branch data and history together. We'll use the git merge command, which lets us take the independent snapshots and history of one Git branch, and tangle them into another.
After merging even-better-feature into master, we're still on the master branch, so HEAD points at master. In the log, we can see the even-better-feature and master branches are now both pointing at the same commit.
Git uses two different algorithms to perform a merge:
fast-forward merge
three-way merge
The merge we just performed is an example of a fast-forward merge. This kind of merge occurs when all the commits in the checked out branch are also in the branch that's being merged.
If this is the case, we can say that the commit history of both branches doesn't diverge. In these cases, all Git has to do is update the pointers of the branches to the same commit, and no actual merging needs to take place.
On the other hand, a three-way merge is performed when the history of the merging branches has diverged in some way, and there isn't a nice linear path to combine them via fast-forwarding.
This happens when a commit is made on one branch after the point when both branches split.
In our case, this could have happened if we made a commit on the master branch after creating the other branches.
When this occurs, Git will tie the branch histories together with a new commit, and merge the snapshots at the two branch tips with the most recent common ancestor, the commit before the divergence.
To do this successfully, Git tries to figure out how to combine both snapshots. If the changes were made in different files, or in different parts of the same file, Git will take both changes and put them together in the result. If instead the changes are made on the same part of the same file, Git won't know how to merge those changes, and the attempt will result in a merge conflict.
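Both kinds of merge can be produced in one throwaway repository. This is a sketch under stated assumptions: the initial branch is pinned to master, the branch another-feature and all file names are hypothetical, and --no-edit accepts the generated merge message so that no editor opens.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name used in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

# Fast-forward: master has no commits of its own since the branch was created.
git checkout -q -b even-better-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git checkout -q master
git merge even-better-feature              # only moves the master pointer forward
ff_master="$(git rev-parse master)"
ff_feature="$(git rev-parse even-better-feature)"

# Three-way: both branches get a commit after the split.
git checkout -q -b another-feature
echo "more" > more.txt
git add more.txt
git commit -q -m "Add more"
git checkout -q master
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Add hotfix on master"
git merge --no-edit another-feature        # creates a merge commit with two parents

git --no-pager log --graph --oneline
```

Because the two diverging commits touch different files, Git combines them without a conflict.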
Merge Conflicts
We might find that both the branches we're trying to merge have edits to the same part of the same file. This will result in something called a merge conflict.
Normally, Git can automatically merge files for us. But when we have a merge conflict, it will need a little help to figure out what to do. To see how this would look, let's edit the free_memory.py file in the master branch.
Next, let's check out the even-better-feature branch and make a change in the same place.
Let's check out the master branch again and try to merge the even-better-feature back into it. Git tells us it tried to automatically merge the two versions of the free memory file, but it didn't know how to do it.
We can use git status to get more information about what's going on.
It tells us that we have files that are currently unmerged, and that we need to fix the conflicts or abort the merge if we decide it was a mistake. It also tells us that we need to run git add on each unmerged file to mark that the conflicts have been resolved.
To fix the conflict, let's open up free_memory.py in our text editor. Thankfully, Git has added some information to our files to tell us which parts of the code are conflicting.
The unmerged content of the file at HEAD (remember, in this case, HEAD points to master) is the docstring stating what the main function should do.
The unmerged content of the file in the even-better-feature branch is the call to the print function.
It's up to us to decide which one to keep or if we should change the contents of the file altogether.
In this case, we'll keep both statements and delete the conflict markers. Now that we've fixed the conflict, we'll mark it as resolved by running git add on the file, and then call git status to see how our merge is doing.
The comments that git commit shows us look different than other commits. That's because this is a merge and Git tells us so. It also tells us which file had conflicts which have now been resolved. The commit already has a description saying that it's merging the other branch. This description was automatically created when we called the git merge command. But we can add onto this description if we want.
To see what the commit history looks like now, we'll use a couple of handy options to the git log command: --graph for seeing the commits as a graph, and --oneline to only see one line per commit.
This format helps us better understand the history of our commits and how merges have occurred. We can see the new commit that was added and also the two separate commits that we merged. One coming from the master branch and the other coming from the even-better-feature branch. We can also see that master is pointing to the merge commit but even-better-feature is still pointing to the previous one.
If you want to throw the merge away and start over, you can use the git merge --abort command as an escape hatch. This will stop the merge and reset the files in your working tree back to the previous commit before the merge ever happened.
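The whole conflict workflow can be replayed in a throwaway repository. This is a sketch, not the exact session from the lesson: the contents of free_memory.py are illustrative, the initial branch is pinned to master, and the resolution is written with printf instead of a text editor so that the example runs unattended.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name used in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'def main():\n    pass\n\nmain()\n' > free_memory.py
git add free_memory.py
git commit -q -m "Add free_memory.py"

git checkout -q -b even-better-feature
printf 'def main():\n    print("Everything ok.")\n\nmain()\n' > free_memory.py
git commit -q -a -m "Print everything ok"

git checkout -q master
printf 'def main():\n    """Checks free memory."""\n\nmain()\n' > free_memory.py
git commit -q -a -m "Add docstring"

# Both branches changed the same line, so the merge stops with a conflict.
git merge even-better-feature || git status --short
cat free_memory.py                         # conflict markers surround both versions

# Resolve by keeping both statements and dropping the markers.
printf 'def main():\n    """Checks free memory."""\n    print("Everything ok.")\n\nmain()\n' > free_memory.py
git add free_memory.py                     # mark the conflict as resolved
git commit -q --no-edit                    # conclude the merge with the generated message

git --no-pager log --graph --oneline
```

Running git merge --abort instead of the resolution steps would return the working tree to its state before the merge.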
So by now you know how to create, delete, and switch between branches in Git.
Git Branches and Merging Cheat Sheet
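Command | Explanation |
---|---|
git branch | Lists all the branches in the repository, marking the current one with an asterisk |
git branch <name> | Creates a new branch with the given name |
git branch -d <name> | Deletes a branch; fails with an error if the branch has unmerged changes |
git branch -D <name> | Forces the deletion of a branch, even if it has unmerged changes |
git checkout <branch> | Switches to the given branch, updating HEAD and the working tree |
git checkout -b <branch> | Creates a new branch and switches to it. |
git merge <branch> | Merges the given branch into the currently checked out branch |
git merge --abort | If there are merge conflicts (meaning both branches changed the same part of the same file), --abort can be used to abort the merge action. |
git log --graph --oneline | This shows a summarized view of the commit history for a repo. |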
Module 2 Review
Module 2 Wrap Up: Using Git Locally
We started with some advanced commands, like skipping the staging area, getting additional info about our commits, and being able to delete and rename files in our repository.
After that, we dove into one of the main concepts of version control, the ability to undo things. We looked at how to revert unstaged and staged changes, how to amend commits, and how to perform rollbacks, whether from the latest commit or an older one.
Finally, we cut through the entangled world of branching. We learned how to create, switch to, and delete branches. We looked into how to merge branches, and how to solve pesky merge conflicts when they cause trouble.