Learning Objectives:
Describe the concept of version control and why it is important to use
Utilize the diff and patch commands to automate comparing and editing files
Explain what Git is and its benefits of use
Install Git on a local machine
Utilize Git to create and clone repositories, add code, check the status of code, and commit code
Course Introduction
This course focuses on how to keep track of the different versions of your code and configuration files using version control systems or VCS.
In this course, we'll introduce you to a popular VCS called Git, and show you some of the ways you can use it.
We'll also go through how to set up an account with the service called GitHub, so that you can create your very own remote repositories to store your code and configuration.
By the end of this course, you'll be able to store your code's history in Git, and collaborate with others in GitHub, where you'll also start creating your own portfolio.
On top of search results, here are some great Git resources available online:
Pro Git: This book (available online and in print) covers all the fundamentals of how Git works and how to use it. Refer to it if you want to learn more about the subjects that we cover throughout the course.
Git tutorial: This tutorial includes a very brief reference of all Git commands available. You can use it to quickly review the commands that you need to use.
Before Version Control
Intro to Module 1: Version Control
When trying to manage change in IT, it's super important to have detailed historical information for your organization's configuration files and automation code.
This lets administrators see what was modified and when, which can be critical to troubleshooting.
It also provides a documentation trail that will let future IT specialists know why the infrastructure is the way it is, and it provides a mechanism for undoing a change completely.
Keeping Historical Copies
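Have you ever worked on a project that was developing over time? A common way to keep track of its progress is to save a copy of the files before making changes. That works, but it has a few drawbacks.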
First, you need to remember to make the copy.
Second, you usually make a copy of the whole thing, even if you're only changing one small part.
And third, even if you're emailing your changes to your colleagues, it might be hard to figure out at the end who did what, and more importantly, why they did it.
The principle behind version control is the same. It lets us keep track of the changes in our files.
Diffing Files
We can use the diff command line tool to take two files, or even two directories, and show the differences between them in a few formats.
Example:
We have two files, rearrange1.py and rearrange2.py, which contain two different versions of the same function.
When we call the diff command: diff rearrange1.py rearrange2.py
We get only the lines that differ between the two files.
See the symbols at the beginning of each of those lines? The "<" symbol tells us that the first line was removed from the first file, and the ">" symbol tells us that the second line was added to the second file. In other words, the old line got replaced by the new one.
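To make the notation concrete, here's a minimal Python sketch that produces output in the same style as plain diff, using the standard library's difflib module. The helper names line_range and normal_diff and the sample lines are made up for illustration. This is not how the diff tool itself is implemented, just one way to reproduce the format.

```python
import difflib

def line_range(start, stop):
    """Format a 0-based, half-open range the way diff prints line ranges."""
    if stop == start:
        return str(start)          # empty range: diff names the line before it
    if stop - start == 1:
        return str(start + 1)      # single line
    return f"{start + 1},{stop}"   # first,last (1-based, inclusive)

def normal_diff(old_lines, new_lines):
    """Return the differences between two lists of lines in diff's default style."""
    output = []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # c = changed, d = deleted, a = added
        letter = {"replace": "c", "delete": "d", "insert": "a"}[tag]
        output.append(f"{line_range(i1, i2)}{letter}{line_range(j1, j2)}")
        output.extend(f"< {line}" for line in old_lines[i1:i2])
        if tag == "replace":
            output.append("---")
        output.extend(f"> {line}" for line in new_lines[j1:j2])
    return output

# Hypothetical file contents, one string per line.
old = ["import re", "def greet(name):", "    return 'Hi ' + name"]
new = ["import re", "def greet(name):", "    return 'Hello ' + name"]
print("\n".join(normal_diff(old, new)))
```

The first line printed is a header, 3c3, which says that line 3 of the first file was changed into line 3 of the second. After it come the old line marked with "<", a separator, and the new line marked with ">".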
Example:
Here there are more changes going on. We can see that diff splits the changes into two separate sections.
The section that starts with 5c5,6 shows a line in the first file that was replaced by two different lines in the second file. The numbers at the beginning of this section indicate the line numbers in the first and second files. The c in between the numbers means that a line was changed.
The section that starts with 11a13,15 shows three lines that are new in the second file. The a stands for added, but that block looks a bit strange, doesn't it? It seems like we're adding a return and an if condition, but no body for the if block. What's up with that? To understand this better we can use the -u flag to tell diff to show the differences in another format.
This unified format is pretty different from the one that we saw before. It shows the changed lines together with some context, using the "-" sign to mark lines that were removed, and the "+" sign to mark lines that were added. The extra context lets us better understand what's going on with the change that we're diffing. We can see that the new file actually has a completely new if block that's part of a chain of conditionals that look very similar, and that's why, with the diff output that we saw before, it was a little confusing which lines had been added.
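Python's standard difflib module can generate the same unified format, which is a handy way to see how the pieces fit together. In this sketch, the file names old.py and new.py and the function being compared are hypothetical.

```python
import difflib

# Two hypothetical versions of the same function, one string per line.
old = [
    "def sign(n):",
    "    if n < 0:",
    '        return "negative"',
    '    return "positive"',
]
new = [
    "def sign(n):",
    "    if n < 0:",
    '        return "negative"',
    "    if n == 0:",
    '        return "zero"',
    '    return "positive"',
]

# lineterm="" because our lines carry no trailing newlines.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.py", tofile="new.py", lineterm="")
)
print("\n".join(diff_lines))
```

The output starts with the two file names, then a header of the form @@ -1,4 +1,6 @@ (four lines starting at line 1 in the old file, six lines starting at line 1 in the new one), followed by the context lines and the two added lines marked with "+".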
There are a lot of tools out there to compare files. Diff is the most popular one, but not the only one available. For example, wdiff highlights the words that have changed in a file instead of working line by line like diff does.
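As a rough illustration of the word-by-word idea, the sketch below compares two lines one word at a time with difflib and marks the changes in the [-removed-] {+added+} style that wdiff prints by default. The word_diff helper is hypothetical and much simpler than the real tool.

```python
import difflib

def word_diff(old_line, new_line):
    """Compare two lines word by word and mark removed and added words."""
    old_words, new_words = old_line.split(), new_line.split()
    pieces = []
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(old_words[i1:i2])
            continue
        if i2 > i1:  # words only in the old line
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j2 > j1:  # words only in the new line
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)

marked = word_diff("the quick brown fox", "the slow brown fox")
print(marked)
```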
To help us even more, there are a bunch of graphical tools that display files side by side and highlight the differences by using color. Some examples include meld, KDiff3, and vimdiff.
Applying Changes
Imagine a colleague sends you a script with a bug and asks you to help fix the issue. To make the change clear, you could send them a diff with the change so that they can see what the modified code looks like.
To do this, we typically use a command line like: diff -u old_file new_file > change.diff
As a reminder, the greater than sign redirects the output of the diff command to a file. So with this command, we're generating a file called change.diff with the output of the diff -u command.
The generated file is usually referred to as a diff file or sometimes a patch file. It includes all the changes between the old file and the new one, plus the additional context needed to understand the changes and to apply those changes back to the original file.
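The same idea can be sketched in Python: read both versions, compute the unified diff, and save it to a file. The write_unified_diff helper and the sample file contents are assumptions made for this example. Note that the real diff -u also prints a timestamp after each file name, which this sketch leaves out.

```python
import difflib
import tempfile
from pathlib import Path

def write_unified_diff(old_path, new_path, diff_path):
    """Write the unified diff between two text files to diff_path."""
    old_path, new_path = Path(old_path), Path(new_path)
    old_lines = old_path.read_text().splitlines(keepends=True)
    new_lines = new_path.read_text().splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile=old_path.name, tofile=new_path.name
    )
    Path(diff_path).write_text("".join(diff))

# Try it out in a throwaway directory with made-up contents.
with tempfile.TemporaryDirectory() as workdir:
    old_file = Path(workdir) / "old_file"
    new_file = Path(workdir) / "new_file"
    old_file.write_text("first line\nsecond line\n")
    new_file.write_text("first line\nsecond line, edited\n")
    change_diff = Path(workdir) / "change.diff"
    write_unified_diff(old_file, new_file, change_diff)
    diff_text = change_diff.read_text()

print(diff_text, end="")
```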
There's a command called patch that takes a file generated by diff and applies the changes to the original file. How do we do that? We'll pass the name of the file that we want to patch as the first parameter to the command and then we'll provide the diff file through standard input.
We get one single line that says the file was patched, which means that we've successfully applied the changes.
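To get a feel for what patch does with a diff file, here's a deliberately simplified sketch that applies a single-file unified diff to a list of lines. The apply_unified_diff function is hypothetical and assumes the context lines match the original exactly. The real patch command is more forgiving and can cope with context that has moved or changed slightly.

```python
import difflib
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def apply_unified_diff(original_lines, diff_lines):
    """Apply a single-file unified diff to a list of lines (no line endings)."""
    result = []
    position = 0          # index of the next unread line in original_lines
    in_hunk = False
    for line in diff_lines:
        header = HUNK_HEADER.match(line)
        if header:
            start = int(header.group(1))
            count = 1 if header.group(2) is None else int(header.group(2))
            # A zero-length old range names the line before the insertion.
            hunk_start = start if count == 0 else start - 1
            result.extend(original_lines[position:hunk_start])
            position = hunk_start
            in_hunk = True
        elif not in_hunk:
            continue      # skip the ---/+++ file name lines
        elif line.startswith("+"):
            result.append(line[1:])
        elif line.startswith("-") or line.startswith(" "):
            if position >= len(original_lines) or original_lines[position] != line[1:]:
                raise ValueError(f"hunk does not match at line {position + 1}")
            if line.startswith(" "):
                result.append(line[1:])   # context line: keep it
            position += 1                 # removed line: skip it
    result.extend(original_lines[position:])
    return result

# Hypothetical script with a bug, and its fixed version.
old = ["total = 0", "for n in numbers:", "    total = n", "print(total)"]
new = ["total = 0", "for n in numbers:", "    total += n", "print(total)"]
diff = list(difflib.unified_diff(old, new, "old_file", "new_file", lineterm=""))

patched = apply_unified_diff(old, diff)
print(patched == new)
```

If the original no longer matches the context in the diff, this sketch raises a ValueError instead of trying to find the right place.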
You might be wondering, why go through all this trouble diffing, and patching, and not just send the whole file instead?
There are a few reasons for this:
The main reason is that the original code could have changed. By using a diff instead of the whole file, we can clearly see what they changed, no matter which version they were using. The patch command can detect that there were changes made to the file and will do its best to apply the diff anyway. It won't always succeed, but in many cases it will.
Another reason is structure. In this case we're patching a single small file. But sometimes, you might be modifying a bunch of large files inside of a huge project. Say you are changing four files in a project tree that contains 100 different files, arranged in different directories according to what they do. If you were to send the whole files, you'd need to specify where those files were supposed to be placed. As we called out, we can diff whole directory structures, and in that case the diff file can specify where each changed file should be without having to do any manual juggling.
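Here's a small sketch of that idea in Python: walk two copies of a project tree and produce one combined diff, where each section names the path of the file it belongs to. The diff_trees helper, the project_v1 and project_v2 directories, and their contents are all made up for this example, and the sketch only handles text files that exist in both trees.

```python
import difflib
import tempfile
from pathlib import Path

def diff_trees(old_root, new_root):
    """Return unified diff lines for every file that differs between two trees."""
    old_root, new_root = Path(old_root), Path(new_root)
    output = []
    for old_file in sorted(p for p in old_root.rglob("*") if p.is_file()):
        relative = old_file.relative_to(old_root)
        new_file = new_root / relative
        if not new_file.is_file():
            continue  # simplification: files missing from one side are ignored
        output.extend(difflib.unified_diff(
            old_file.read_text().splitlines(),
            new_file.read_text().splitlines(),
            fromfile=(Path(old_root.name) / relative).as_posix(),
            tofile=(Path(new_root.name) / relative).as_posix(),
            lineterm="",
        ))
    return output

# Build two tiny hypothetical project trees that differ in one file.
with tempfile.TemporaryDirectory() as workdir:
    for version, code in (("project_v1", "print('helo')"), ("project_v2", "print('hello')")):
        root = Path(workdir) / version
        (root / "tools").mkdir(parents=True)
        (root / "docs").mkdir()
        (root / "tools" / "greet.py").write_text(code + "\n")
        (root / "docs" / "README").write_text("unchanged\n")
    tree_diff = diff_trees(Path(workdir) / "project_v1", Path(workdir) / "project_v2")

print("\n".join(tree_diff))
```

Only tools/greet.py shows up in the result, and the file name lines at the top of its section record where in the tree the change belongs.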
Practical Application of diff and patch
Imagine this: a colleague is asking for our help with fixing a script named disk_usage.py.
Before we change anything, let's make a couple of copies of the script. We'll add _original to one copy, which we'll keep unmodified and use for comparison, and _fixed to the other copy, which we'll use to make our fix.
After making the changes, we need to send the fix to our colleague so that they can fix their script.
By calling patch with the diff file, we've applied the changes that were necessary to fix the bugs. Let's check that disk_usage.py now executes successfully.
But this is still a very manual process, which is where version control systems can really help.
diff and patch Cheat Sheet
diff
diff is used to find differences between two files. On its own, it’s a bit hard to use; instead, use it with diff -u to find lines which differ in two files:
diff -u
diff -u is used to compare two files, line by line, and show the differing lines together, with some surrounding context, in a single output. See below:
~$ cat menu1.txt
Menu1:

Apples
Bananas
Oranges
Pears
~$ cat menu2.txt
Menu:

Apples
Bananas
Grapes
Strawberries
~$ diff -u menu1.txt menu2.txt
--- menu1.txt 2019-12-16 18:46:13.794879924 +0900
+++ menu2.txt 2019-12-16 18:46:42.090995670 +0900
@@ -1,6 +1,6 @@
-Menu1:
+Menu:
 
 Apples
 Bananas
-Oranges
-Pears
+Grapes
+Strawberries
Patch
Patch is useful for applying file differences. See the below example, which compares two files. The comparison is saved as a .diff file, which is then patched to the original file!
~$ cat hello_world.txt
Hello World
~$ cat hello_world_long.txt
Hello World

It's a wonderful day!
~$ diff -u hello_world.txt hello_world_long.txt
--- hello_world.txt 2019-12-16 19:24:12.556102821 +0900
+++ hello_world_long.txt 2019-12-16 19:24:38.944207773 +0900
@@ -1 +1,3 @@
 Hello World
+
+It's a wonderful day!
~$ diff -u hello_world.txt hello_world_long.txt > hello_world.diff
~$ patch < hello_world.diff
patching file hello_world.txt
~$ cat hello_world.txt
Hello World

It's a wonderful day!
There are some other interesting patch and diff commands, such as patch -p1 and diff -r!
Check them out in the following references: