Week 3: Working with Remotes
Learning Objectives:
Describe the advantages of using separate branches
Utilize the git rebase command
Describe what GitHub is and how to interact with it
Explain what a remote repository is
Utilize remote repositories, fetch new changes, and update local repositories
Utilize the pull-merge-push workflow to address conflicts
Push remote branches so code can be viewed and tested by collaborators
Explain what rebasing is
Table of Contents:
Introduction to GitHub
Intro to Module 3: Working with Remotes
In this module, we'll learn a load of new things related to GitHub and remote repositories. We'll first talk about what GitHub is and why it matters, and then we'll dive into how to work with GitHub and other remote repositories. Being able to use remote repositories allows us to effectively collaborate with others.
What is GitHub?
Git is a distributed version control system. Distributed means that each developer has a copy of the whole repository on their local machine. Each copy is a peer of the others.
But we can host one of these copies on a server and then use it as a remote repository for the other copies. This lets us synchronize work between copies through this server.
Any of us can create a Git server like this one, and many companies have similar internal services. But if you don't want to set up a Git server yourself and host your repositories, you can use an online service like GitHub.
GitHub is a web-based Git repository hosting service. On top of the version control functionality of Git, GitHub includes extra features like bug tracking, wikis, and task management.
GitHub lets us share and access repositories on the web and copy or clone them to our local computer, so we can work on them.
Other services that provide similar functionality include Bitbucket and GitLab.
GitHub provides free access to a Git server for public and private repositories.
It limits the number of contributors for the free private repositories, and offers an unlimited private repository service for a monthly fee.
We'll be using a free repository for our examples, which is fine for educational use, small personal projects, or open source development.
A word of caution on how you can manage these repos though. If hackers get hold of information about your organization's IT infrastructure, they can use it to try and break into your network. So make sure you treat this information as confidential. For real configuration and development work, you should use a secure and private Git server, and limit the people authorized to work on it.
Once you've created a GitHub account, you can either create your own repos or contribute to repos from other projects.
Visit http://github.com to sign up for their service.
Basic Interaction with GitHub
Once you have your GitHub account, you're ready to create your brand new repository on GitHub.
Going step-by-step.
We'll start by clicking the Create a repository link on the left.
This will take us to the repo creation wizard. The wizard is pretty straightforward.
The first thing we need to do is give a name for our repo. We'll call this repo health-checks.
After that comes a description of what the repo will be used for. We'll say that it'll be used for scripts that check the health of our computers.
Then we need to select whether we want the repo to be public or private. We'll go with private for now.
Finally, the wizard can help us get started with a few initialization files like a README, a .gitignore, or a license file. We'll go with just the README for now, and then create the repo.
Using the wizard, we created the repo and have a fresh remote repository ready to go.
The first step is to create a local copy of the repository. We'll do that by using the git clone command followed by the URL of the repo. GitHub conveniently lets us copy the URL of our repo from the interface so that we don't have to type it.
We're now ready to clone the repo onto our computer. We'll do that by calling git clone and pasting in the URL we copied. To do this, GitHub will ask for our username and password. Just like that, we've downloaded a copy of the remote repository from GitHub onto the local machine.
This means that we can perform all the git actions that we've learned up till now. Since the repo is called health-checks, a directory with that name was automatically created for us and now has the working tree of the repository in it.
So let's change into that directory and look at the contents. Our repo is basically empty. It only has the README file that GitHub created for us. This file is in a special format called Markdown. Let's add a bit more content to it.
We've changed this file. We need to stage the change and commit it. We've seen a couple of different ways to do that. Let's use our shortcuts to do this in just one command.
We've got a remote repository set up on GitHub, so let's use it. We can send our changes to that remote repository by using the git push command, which will gather all the snapshots we've taken and send them to the remote repository. Once again, we're asked for our password. After that, we see a bunch of messages from git related to the push.
When we access our project, we see the contents of the README file. So if we check our repository on GitHub, we should see the updated message.
You've probably noticed that we had to enter our password both when retrieving the repo and when pushing changes to the repo.
There are a couple of ways to avoid having to do this:
One way is to create an SSH key pair and store the public key in our profile so that GitHub recognizes our computer.
Another option is to use a credential helper which caches our credentials for a time window so that we don't need to enter our password with every interaction.
Git already comes with a credential helper baked in. We just need to enable it. We do that by calling git config --global credential.helper cache
Now that we've enabled the credential helper, we'll need to enter our credentials once more. After that, they'll be cached for 15 minutes.
To check this, we can try another git command, git pull which is the command we use to retrieve new changes from the repository. We'll enter our credentials on the first call to the command and they'll be cached, so we won't need to enter them again.
With that, we've seen how to create repositories on GitHub, clone our remote repository, push changes to it, and pull changes from it.
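The steps in this section can be sketched as a short shell session. This is only an illustration under stated assumptions: a local bare repository stands in for GitHub so that no account or password is needed, and the identity, paths, and commit messages are made up for the example.

```shell
set -e
# Hypothetical identity; a local bare repo stands in for GitHub.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
work="$(mktemp -d)"

# Stand-in for the repo creation wizard: a remote holding only a README.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "# health-checks" > README.md
git add README.md
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"

# Clone the remote; with GitHub, this would be the URL copied from the web page.
cd "$work"
git clone -q "$work/remote.git" health-checks
cd health-checks

# Add content to the README, then stage and commit it in one command.
echo "Scripts that check the health of our computers." >> README.md
git commit -q -a -m "Add more information to README"

# Send our snapshots to the remote, then retrieve any newer changes.
git push -q origin master
git pull -q origin master
git log --oneline
```

Against a real GitHub repo, the only differences should be the clone URL and the credential prompts.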
Basic Interaction with GitHub Cheat-Sheet
There are various remote repository hosting sites, such as GitHub, Bitbucket, and GitLab.
Follow the workflow at https://github.com/join to set up a free account, username, and password. After that, these steps will help you create a brand new repository on GitHub.
Some useful commands for getting started:
Command | Explanation & Link |
git clone URL | Git clone is used to clone a remote repository into a local workspace |
git push | Git push is used to push commits from your local repo to a remote repo |
git pull | Git pull is used to fetch the newest updates from a remote repository. This can be useful for keeping your local workspace up to date. |
https://help.github.com/en/articles/caching-your-github-password-in-git
https://help.github.com/en/articles/generating-an-ssh-key
Using a Remote Repository
What is a remote?
When we cloned the newly created GitHub repository, we had our local Git repo interact with a remote repository.
Remote repositories are a big part of the distributed nature of Git collaboration. They let lots of developers contribute to a project from their own workstations, making changes to local copies of the project independently of one another.
When they need to share their changes, they can issue git commands to pull code from a remote repository or push code into one.
There are a bunch of ways to host remote repositories. There are many internet-based Git hosting providers like GitHub, Bitbucket, or GitLab, which offer similar services.
We can also set up a Git server on our own network to host private repositories. A locally hosted Git server can run on almost any platform, including Linux, macOS, or Windows. This has benefits like increased privacy, control, and customization.
Using Git to manage a project helps us collaborate successfully.
Everyone will develop their piece of the project independently in their own local repositories maybe even using separate branches. Occasionally they'll push finished code into a central remote repository where others can pull it and incorporate it into their new developments.
So how does this work?
Alongside the local development branches like master, Git keeps copies of the commits that have been submitted to the remote repository in separate branches.
If someone has updated a repository since the last time you synchronized your local copy, Git will tell you that it's time to do an update.
If you have your own local changes when you pull down the code from the remote repo, you might need to fix merge conflicts before you can push your own changes.
In this way, Git lets multiple people work on the same project at the same time. When pulling new code, it will merge the changes automatically if possible, or it will tell us to perform the integration manually if there are conflicts.
So when working with remotes the workflow for making changes has some extra steps.
We'll still modify, stage, and commit our local changes.
After committing, we'll fetch any new changes from the remote repo, manually merge if necessary, and only then will we push our changes to the remote repo.
Git supports a variety of ways to connect to a remote repository. Some of the most common are using the HTTP, HTTPS and SSH protocols and their corresponding URLs.
HTTP is generally used to allow read only access to a repository. In other words, it lets people clone the contents of your repo without letting them push new contents to it.
Conversely, HTTPS and SSH both provide methods of authenticating users so you can control who gets permission to push.
The distributed nature of the work means that there are no limits to how many people can push code into a repository.
It's a good idea to control who can push code to repos and to make sure you give access only to people you trust.
Web services like GitHub offer a bunch of different mechanisms to control access to repositories.
Some of these are available to the general public while others are only available to enterprise users.
Working with Remotes
When we call git clone to get a local copy of a remote repository, Git sets up that remote repository with the default origin name. We can look at the configuration for that remote by running git remote -v in the directory of the repo.
Here we see the URLs associated with the origin remote. There are two URLs:
one will be used to fetch data from the remote repository
the other one to push data to that remote repo
They'll usually point at the same place. But in some cases, you can have the fetch URL use HTTP for read only access, and the push URL use HTTPS or SSH for access control. This is fine as long as the contents of the repo that you read when fetching are the same that you write to in pushing.
Remote repositories have a name assigned to them. By default, the assigned name is origin.
This lets us track more than one remote in the same Git directory. While this is not the typical usage, it can be useful when collaborating with different teams on projects that are related to each other.
If we want to get even more information about our remote, we can call git remote show origin. We can see the fetch and push URLs that we saw before, and the local and remote branches too. For now we only have a master branch that exists locally and remotely.
Whenever we're operating with remotes, Git uses remote branches to keep copies of the data that's stored in the remote repository.
We could have a look at the remote branches that our Git repo is currently tracking by running git branch -r. These branches are read only.
We can look at the commit history, like we would with local branches, but we can't make any changes to them directly.
To modify their contents, we'll have to go through the workflow we called out before.
First, we pull any new changes to our local branch,
then merge them with our changes
and push our changes to the repo.
We can also use git status to check the status of our changes in remote branches.
Now that we're working with a remote repository, git status gives us additional information. It tells us that our branch is up to date with the origin/master branch, which means that the master branch in the remote repository called origin has the same commits as our local master branch.
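The inspection commands from this section can be tried out with a sketch like the one below. It assumes a throwaway setup in a temporary directory, with a local bare repository playing the role of origin; the identity and file contents are hypothetical.

```shell
set -e
# Hypothetical identity; a local bare repo plays the role of the remote.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
work="$(mktemp -d)"

git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "# health-checks" > README.md
git add README.md
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/local"
cd "$work/local"

git remote -v            # fetch and push URLs for each remote
git remote show origin   # URLs plus the local and remote branches
git branch -r            # read-only remote branches, such as origin/master
git status               # compares master with origin/master
```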
Fetching New Changes
We could always use the GitHub website to browse the changes that were submitted. But we want to learn how to do it by interacting through the command line because you might need to do it this way at your job, and it'll work the same no matter which platform you use to interact with Git.
So first, let's look at the output of the git remote show origin command.
Check out how it says that the local branch is out of date. This happens when there were commits done to the repo that aren't yet reflected locally. Git doesn't keep remote and local branches in sync automatically. It waits until we execute commands to move data around when we're ready.
To sync the data, we use the git fetch command. This command copies the commits done in the remote repository to the remote branches, so we can see what other people have committed.
Fetched content is downloaded to the remote branches on our repository. So it's not automatically mirrored to our local branches.
We can run git checkout on these branches to see the working tree, and we can run git log to see the commit history.
Let's look at the current commits in the remote repo by running git log origin/master.
Looking at this output, we can see that the remote origin/master branch is pointing to the latest commit, while the local master branch is pointing to the previous commit we made earlier on.
Git status helpfully tells us that there's a commit that we don't have in our branch.
It does this by letting us know that our branch is behind the remote origin/master branch. If we want to integrate those changes into our master branch, we can perform a merge operation, which merges the origin/master branch into our local master branch.
To do that, we'll call git merge origin/master. We've merged the changes of the master branch of the remote repository into our local branch.
Git tells us that the code was integrated using fast-forward. It also shows that two files were added, all_checks.py and disk_usage.py. If we look at the log output on our branch now, we should see the new commit.
We see that now our master branch is up to date with the remote origin/master branch.
We can use git fetch like this to review the changes that happened in the remote repository. If we're happy with them, we can use git merge to integrate them into the local branch.
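Here is a minimal sketch of the fetch, review, and merge sequence. It assumes a local bare repository as the remote and a second clone acting as the colleague; the file contents and commit messages are invented for the example.

```shell
set -e
# Hypothetical identity and setup: a bare "remote", our clone, a colleague's clone.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
work="$(mktemp -d)"

git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "# health-checks" > README.md
git add README.md
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/local"
git clone -q "$work/remote.git" "$work/colleague"

# The colleague pushes a commit that adds two files.
(
  cd "$work/colleague"
  echo 'print("running checks")' > all_checks.py
  echo 'print("checking disk usage")' > disk_usage.py
  git add all_checks.py disk_usage.py
  git commit -q -m "Add initial check scripts"
  git push -q origin master
)

cd "$work/local"
git fetch -q                      # updates origin/master, not our master
git log --oneline origin/master   # review what was committed remotely
git status                        # reports that master is behind origin/master
git merge origin/master           # fast-forwards our local master
git log --oneline
```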
Fetching commits from a remote repository and merging them into your local repository is such a common operation in Git that there's a handy command to let us do it all in one action.
Updating the Local Repository
Earlier, we took a look at the basic workflow for working with remotes when we want to fetch the changes manually, merge if necessary, and only then push any changes of our own.
Since fetching and merging are so common, Git gives us the git pull command that does both for us.
Running git pull will fetch the remote copy of the current branch and automatically try to merge it into the current local branch.
If you look closely at this output, you'll see that it includes the output of the fetch and merge commands that we saw earlier.
First, Git fetched the updated contents from the remote repository, including a new branch.
And then it did a fast forward merge to the local master branch.
We'll see that the all_checks file was updated as well.
We can look at the changes by using git log -p -1.
We see that our colleague added a check_disk_full function that includes the code from the other disk_usage.py file that we saw earlier.
So now I'll exit the pager with q.
Let's check out the output of git remote show origin and see what it says about that new branch.
We see that there's a new remote branch called experimental, which we don't have a local branch for yet. To create a local branch for it, we can run git checkout experimental.
When we checked out the experimental branch, Git automatically copied the contents of the remote branch into the local branch. The working tree has been updated to the contents of the experimental branch.
In this last example, we got the contents of the experimental branch together with those of the master branch when we called git pull, which also merged new changes onto the master branch.
If we want to get the contents of remote branches without automatically merging any contents into the local branches, we can call git remote update. This will fetch the contents of all remote branches, so that we can just call checkout or merge as needed.
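The difference between git pull and git remote update can be sketched as follows. As before, this assumes a local bare repository as the remote and a second clone acting as the colleague; experiment.py and the commit messages are hypothetical.

```shell
set -e
# Hypothetical identity and setup: a bare "remote", our clone, a colleague's clone.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
work="$(mktemp -d)"

git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "# health-checks" > README.md
git add README.md
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/local"
git clone -q "$work/remote.git" "$work/colleague"

# The colleague updates master and publishes a new experimental branch.
(
  cd "$work/colleague"
  echo 'print("running checks")' > all_checks.py
  git add all_checks.py
  git commit -q -m "Add all_checks.py"
  git push -q origin master
  git checkout -q -b experimental
  echo 'print("experimental check")' > experiment.py
  git add experiment.py
  git commit -q -m "Start experimental check"
  git push -q origin experimental
)

cd "$work/local"
git pull -q                    # fetch, then merge into the current branch
git remote show origin         # experimental now appears among the remote branches
git checkout -q experimental   # creates a local branch from origin/experimental

# The colleague pushes again; this time we fetch without merging anything.
(
  cd "$work/colleague"
  echo 'print("more experiments")' >> experiment.py
  git commit -q -a -m "Extend experimental check"
  git push -q origin experimental
)
git remote update              # refreshes origin/* and leaves local branches alone
git status                     # experimental is behind origin/experimental
```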
Git Remotes Cheat-Sheet
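Command | Explanation & Links |
git remote | Lists the names of the remote repos |
git remote -v | Lists the remote repos along with their fetch and push URLs |
git remote show <name> | Describes a single remote repo, including its URLs and branches |
git remote update | Fetches the most up-to-date objects from all remotes |
git fetch | Downloads new commits from a remote into the remote branches without merging them |
git branch -r | Lists remote branches; can be combined with other branch arguments to manage remote branches |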
You can also see more in the video Cryptography in Action from the course IT Security: Defense against the digital dark arts.
Solving Conflicts
The Pull-Merge-Push Workflow
What if when we go to push our changes, there are new changes to the remote repo?
To find out, let's start by making a change to our all_checks.py script. Let's stage it and commit it as usual. We'll first use git add -p to look at the changes we made and accept them.
Then we'll create a commit message.
We've made our change, staged it, and committed it. We should be ready to push into the remote repo, except now we have a collaborator also making changes. Let's see what happens when we try running git push.
When we tried to push, Git rejected our change. That's because the remote repository contains changes that we don't have in our local branch, so Git can't fast-forward.
As usual, Git gives us some helpful information along with the error message, especially the part about integrating remote changes with git pull.
This means we need to sync our local branch with the remote repository before we can push. We learned earlier that we can do this with git pull.
Git tried to automatically merge the local and remote changes to all_checks.py, but found a conflict.
Let's first look at the tree of commits on all branches as represented by git log --graph --oneline --all. This graph shows us the different commits and positions in the tree. We can see the master branch, the origin/master branch, and the experimental branch.
The graph indicates that our current commit and the commit in the origin/master branch share a common ancestor, but they don't follow one another.
This means that we'll need to do a three-way merge. To do this, let's look at the actual changes in that commit by running git log -p origin/master.
So our colleague decided to reorder the conditional clauses in the function to match the order that the parameters are passed to the function. They happened to change the same line that we changed, which caused the conflict that Git couldn't resolve.
Let's fix it by editing the file to remove the conflict.
One thing to notice is that Git will try to do all possible automatic merges and only leave manual conflicts for us to resolve when the automatic merge fails. In this case, we can see that the other changes we made were merged successfully without intervention. Only the change that happened in the same line of the file needed our input.
We fixed the conflict here, and the file is short enough that we can very quickly check that there are no other conflicts. For larger files, it might make sense to search for the conflict markers ">>>" in the whole file. This lets us check that there are no unresolved conflicts left.
Nice, now that we've fixed the conflict, we can finish the merge. The editor message shows that it's performing a merge of the remote branch with the local branch. We can add extra information to this message.
Our merge is finally ready, we can try pushing to the remote again.
Yes, after fixing the conflict, we were able to push our work to the remote repo. Let's look at the commit history of the master branch now, by calling git log --graph --oneline.
We see that the latest commit is the merge, followed by the two commits that caused the merge conflict, which are on split paths in our graph.
As we called out before, when Git needs to do a three-way merge, we end up with a separate commit for merging the branches back into the main tree.
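The whole pull-merge-push cycle can be reproduced with a sketch like this one. It assumes a local bare repository as the remote and a second clone acting as the colleague, and the contents of all_checks.py are invented for the example. One caveat: recent Git versions may refuse a plain git pull on diverged branches until you say how to reconcile them, so the sketch passes --no-rebase to ask for a merge explicitly.

```shell
set -e
# Hypothetical identity and setup: a bare "remote", our clone, a colleague's clone.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
work="$(mktemp -d)"

git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
printf 'def main():\n    print("ok")\n' > all_checks.py
git add all_checks.py
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/local"
git clone -q "$work/remote.git" "$work/colleague"

# Both of us change the same line; the colleague pushes first.
(
  cd "$work/colleague"
  printf 'def main():\n    print("All checks passed.")\n' > all_checks.py
  git commit -q -a -m "Reword success message"
  git push -q origin master
)
cd "$work/local"
printf 'def main():\n    print("Everything ok.")\n' > all_checks.py
git commit -q -a -m "Print everything ok"

# The push is rejected because the remote has a commit we don't have.
git push origin master || echo "Push rejected: pull first."

# Pull with an explicit merge; it stops on the conflicting line.
git pull --no-rebase origin master || echo "Merge conflict in all_checks.py"
grep -n -e '<<<<<<<' -e '>>>>>>>' all_checks.py   # locate the conflict markers

# Resolve by writing the final content, then finish the merge and push.
printf 'def main():\n    print("Everything ok.")\n' > all_checks.py
git add all_checks.py
git commit -q --no-edit
git push -q origin master
git log --graph --oneline
```

The final log should show the merge commit on top of the two commits that conflicted.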
Pushing Remote Branches
There are many advantages to creating separate branches:
For example, it might take you a while to finish a new feature and in the meantime, there could be a critical bug that needs fixing in the main branch of the code. By having separate branches, you can fix the bug in the main branch, release a new version and then go back to working on your feature without having to integrate your code before it's ready.
Another advantage of working in separate branches is that you could even release two or more versions out of the same tree. One being the stable version and the other being the beta version.
That way, any disruptive changes can be tested on a few users or computers before they're fully released.
We could create the branch first and then check it out, or we can create it and check it out in one step with git checkout -b followed by the new branch name. Let's open up the file and change it.
We make a change to the code, save it, test it, and commit it, and then do the same for a second change.
The first time we push a branch to a remote repo, we need to add a few more parameters to the git push command. We'll need to add the -u flag to create the branch upstream, which is another way of referring to remote repositories.
We'll also have to say that we want to push this to the origin repo, and that we're pushing the refactor branch.
A new refactor branch has been created in the remote repo, which is what we wanted.
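As a rough sketch, creating the refactor branch and publishing it for the first time could look like this. A local bare repository stands in for the remote, and the file contents and commit messages are hypothetical.

```shell
set -e
# Hypothetical identity; a local bare repo plays the role of the remote.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
work="$(mktemp -d)"

git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo 'print("running checks")' > all_checks.py
git add all_checks.py
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/local"
cd "$work/local"

# Create the branch and check it out in one step, then commit twice.
git checkout -q -b refactor
echo '# refactor step 1' >> all_checks.py
git commit -q -a -m "Refactor step 1"
echo '# refactor step 2' >> all_checks.py
git commit -q -a -m "Refactor step 2"

# First push of this branch: -u creates it upstream and sets up tracking.
git push -u origin refactor
git branch -r    # origin/refactor is now listed next to origin/master
```

After this first push, a plain git push or git pull on the refactor branch should use origin/refactor without extra parameters.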
Rebasing Your Changes
We mentioned that once our branch has been properly reviewed and tested, it can get merged back into the master branch.
This can be done by us or by someone else.
One option is to use the git merge command that we discussed earlier.
Another option is to use the git rebase command. Rebasing means changing the base commit that's used for our branch.
Let's quickly recap what we've learned about merges up till now. As we've seen in a lot of our earlier examples, when we create a branch at a certain point in the repo's history, Git knows the latest commit that was submitted on both branches.
If only one of the branches has new changes when we try to merge them, Git will be able to fast forward and apply the changes.
But if both branches have new changes when we try to merge, Git will create a new merge commit for the three way merge.
The problem with three way merges is that because of the split history, it's hard for us to debug when an issue is found in our code, and we need to understand where the problem was introduced.
By changing the base where our commits split from the branch history, we can replay the new commits on top of the new base. This allows Git to do a fast forward merge and keep history linear.
So how do we do it?
We run the command git rebase, followed by the branch that we want to set as the new base. When we do this, Git will try to replay our commits after the latest commit in that branch.
This will work automatically if the changes are made in different parts of the files, but will require manual intervention if the changes were made in the same parts of the same files.
Let's check out this process by rebasing our refactor branch onto the master branch.
First, we'll check out the master branch and pull the latest changes in the remote repo. Git tells us that it's updated the master branch with some changes that our colleague had made.
At this point, the changes that we have in the refactor branch can no longer be merged through fast forwarding into the master branch. That's because there's now an extra commit in the master that's not present in the refactor.
Let's see how this looks by asking the log command to show us the current graph of all branches.
It might take a bit to follow everything that's going on with this graph. But it can be really useful to understand complex history trees. As you can see, the refactor branch has three commits before the common ancestor that it shares with the current commit at the head of the master branch.
If we merged our branch now, it would cause a three way merge. But we want to keep our history linear. We'll do this with a rebase of the refactor against master.
As usual, Git gives us a bunch of helpful information. It says that it rewound head and replayed our work on top of it. And luckily, everything succeeded.
Let's look at the output of git log --graph --oneline for our branch right now.
Now we can see the master branch and linear history with our list of commits. We're ready to merge our commits back onto the main trunk of our repo and have this fast forwarded.
To do that, we'll check out the master branch and merge the refactor branch.
We're now done with our refactor and can get rid of that branch, both remotely and locally. To remove the remote branch, we'll call git push --delete origin refactor. To remove the local branch, we'll call git branch -d refactor. We can now push changes back into the remote repo.
All right, we've just gone through an example using the git rebase command.
We had a feature branch created against an older commit from master. So we rebased our feature branch against the latest commit from master and then merged the feature branch back into master.
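The rebase-then-merge sequence from this section can be sketched end to end. The setup is an assumption for illustration only: a local bare repository as the remote, a second clone acting as the colleague, and made-up file contents and commit messages.

```shell
set -e
# Hypothetical identity and setup: a bare "remote", our clone, a colleague's clone.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
work="$(mktemp -d)"

git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo 'print("running checks")' > all_checks.py
git add all_checks.py
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/local"
git clone -q "$work/remote.git" "$work/colleague"

# Our feature branch, with two commits, pushed for review.
cd "$work/local"
git checkout -q -b refactor
echo '# refactor step 1' >> all_checks.py
git commit -q -a -m "Refactor step 1"
echo '# refactor step 2' >> all_checks.py
git commit -q -a -m "Refactor step 2"
git push -q -u origin refactor

# Meanwhile, the colleague adds a commit to master.
(
  cd "$work/colleague"
  echo 'print("checking disk usage")' > disk_usage.py
  git add disk_usage.py
  git commit -q -m "Add disk usage check"
  git push -q origin master
)

git checkout -q master
git pull -q                        # master now has the colleague's commit
git log --graph --oneline --all    # refactor and master have diverged
git checkout -q refactor
git rebase master                  # replay our commits on top of master
git checkout -q master
git merge refactor                 # fast-forward, no merge commit
git push --delete origin refactor  # remove the remote branch
git branch -d refactor             # remove the local branch
git push -q origin master
git log --graph --oneline
```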
Another Rebasing Example
In our last example, we used git rebase to rebase a feature branch so that it could be cleanly integrated.
There are many other possible uses of rebase.
One common example is to rebase the changes in the master branch when someone else also made changes and we want to keep history linear.
This is a pretty common occurrence when you're working on a change that's small enough not to need a separate branch and your collaborators just happened to commit something at the same time.
Let's check out how this would work in practice:
First, we'll make a change to our script, save it and commit it.
We want to check if one of our teammates also made a change in the master branch while we were working on our change.
We showed before how to do that by running git pull which will automatically create a three-way merge if necessary.
In this example, we want to look at a different approach to keep our project history linear. So we'll start by calling git fetch, which, you might remember, will put the latest changes into the origin/master branch but won't apply them to our local master branch.
We see that we fetched some new changes. This means that if we tried to merge our changes, we'd end up with a three-way merge. Instead, we'll now run git rebase against origin/master to rebase our changes against those made by our colleague and keep history linear.
We've got a conflict and we'll need to fix it. Git is giving us a lot of info on what it tried to do including what worked, what didn't work and what we can do about it.
Since we asked it to rebase, it tried to rewind our changes and apply them on top of what was in the origin/master branch. The first commit made by our colleague renamed all_checks.py to health_checks.py. Git detected this and automatically merged our changes into the new file name. But when trying to merge our changes with the changes made by our colleague in the file, there was a merge conflict.
The output gives us a bunch of instructions on how to solve this.
We could fix the conflict, skip the conflicting commit, or even abort the rebase completely. In this example, we want to fix the conflict.
We'll start by looking at the current state of the health_checks.py file. We see that while we were adding the connectivity check, our colleague was adding a check for the CPU being constrained. We want both functions in the end result. So let's remove the conflict markers, cleaning up our file.
We now need to add the changes made to the health_checks.py file and continue with the rebase.
Now the rebase has finished successfully. Let's check out the output of git log --graph --oneline to see what the history looks like at this point.
We see that we've applied our change on top of the other changes without needing a three-way merge.
What we did just now to resolve the conflict is very similar to what we did earlier to merge our changes. The difference is that the commit history ended up being linear instead of branching out.
We're now ready to push our new check to the remote repo.
As we called out, keeping history linear helps with debugging especially when we're trying to identify which commit first introduced a problem in our project.
We've now seen two examples of how to use the git rebase command.
one for merging feature branches back into the main trunk of our code
one for making sure that our commits made in the master branch apply cleanly on top of the current state of the master branch
And it doesn't stop there. We can also use git rebase to change the order of the commits or even squash two commits into one.
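The fetch-and-rebase flow from this section, including the conflict, can be sketched as below. The setup is assumed for illustration: a local bare repository as the remote, a second clone acting as the colleague, and hypothetical function names in health_checks.py. The sketch skips the file rename from the walkthrough to stay short, and GIT_EDITOR=true is only there so the session runs without opening an editor.

```shell
set -e
# Hypothetical identity and setup: a bare "remote", our clone, a colleague's clone.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
work="$(mktemp -d)"

git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
printf 'def check_reboot():\n    return False\n' > health_checks.py
git add health_checks.py
git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/local"
git clone -q "$work/remote.git" "$work/colleague"

# The colleague appends a CPU check and pushes it.
(
  cd "$work/colleague"
  printf '\ndef check_cpu_constrained():\n    return False\n' >> health_checks.py
  git commit -q -a -m "Add CPU check"
  git push -q origin master
)

# We append a connectivity check at the same spot and commit locally.
cd "$work/local"
printf '\ndef check_no_network():\n    return False\n' >> health_checks.py
git commit -q -a -m "Add connectivity check"

git fetch -q                      # origin/master now has the CPU check
git rebase origin/master || echo "Conflict: fix health_checks.py"

# Keep both functions and drop the conflict markers.
cat > health_checks.py <<'EOF'
def check_reboot():
    return False

def check_cpu_constrained():
    return False

def check_no_network():
    return False
EOF
git add health_checks.py
GIT_EDITOR=true git rebase --continue
git log --graph --oneline         # linear history, no merge commit
git push -q origin master
```

If the conflict turned out to be too messy, git rebase --abort would return the branch to its state before the rebase.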
Best Practices for Collaboration
It's worth spending some time talking about best practices for collaborating with others.
It's a good idea to always synchronize your branches before starting any work on your own. That way, you minimize the chances of conflicts or the need for rebasing.
Another common practice is to try and avoid having very large changes that modify a lot of different things.
For example, if you are renaming a variable for clarity reasons, you don't want to have code that adds new functionality in the same commit. It's better if you split it into different commits. This makes it easier to understand what's going on with each commit.
On top of that, if you remember to push your changes often and pull before doing any work, you reduce the chances of getting conflicts.
We called out already that when working on a big change, it makes sense to have a separate feature branch.
To make the final merge of the feature branch easier, it makes sense to regularly merge changes made on the master branch back onto the feature branch.
If you need to maintain more than one version of a project at the same time, it's common practice to have the latest version of the project in the master branch and a stable version of the project on a separate branch. You'll merge your changes into the separate branch whenever you declare a stable release.
Whenever we do a rebase, we're rewriting the history of our branch. The old commits get replaced with new commits, so they'll be based on different snapshots than the ones we had before and they'll have completely different hash sums.
This works fine for local changes, but can cause a lot of trouble for changes that have been published and downloaded by other collaborators.
As a general rule, you shouldn't rebase changes that have been pushed to remote repos.
The Git server will automatically reject pushes that attempt to rewrite the history of the branch. It's possible to force Git to accept the change, but it's not a great idea unless you really know what the implications will be.
In our feature branch example, we rebased the branch, merged it into master, and then deleted the old one. That way, we didn't push the rebased changes to the refactor branch, only to the master branch that hadn't seen those changes before.
Having good commit messages is important.
It's already important when you're working alone since good commit messages help the future you understand what's going on, but it's even more important when you're collaborating with others.
Whenever we collaborate with others, there's bound to be some merge conflicts and they can sure be a pain. If I'm dealing with this type of merge conflict, my first step is to work backward and disable everything I've done and then see if the source still works, then I slowly add pieces of code until I hit the problem.
Conflict Resolution Cheat Sheet
Merge conflicts are not uncommon when working in a team of developers, or on Open Source Software. Fortunately, GitHub has some good documentation on how to handle them when they happen:
You can also use git rebase branchname to change the base of the current branch to be branchname
The git rebase command is a lot more powerful. Check out this link for more information.
Module 3 Review
Module 3 Wrap Up: Working with Remotes
Let's review what we've learned about working with remotes in Git.
First, we talked about what GitHub is and what the basic interaction with the service looks like.
Then we discussed how remote repositories and the distributed nature of Git let lots of contributors develop a project independently and at the same time.
We then learned how to pull data down from remote repositories, push our local changes to them, and resolve conflicts that pop up when our local and remote branches are out of sync.
We wrapped up by looking at a complex example of using a feature branch for a refactor of our code and using rebase to make sure that our history stayed linear.