Learning Objectives:
Describe the advantages of using separate branches
Utilize the git rebase command
Describe what GitHub is and how to interact with it
Explain what a remote repository is
Utilize remote repositories, fetch new changes, and update local repositories
Utilize the pull-merge-push workflow to address conflicts
Push remote branches so code can be viewed and tested by collaborators
Explain what rebasing is
Table of Contents:
Introduction to GitHub
Intro to Module 3: Working with Remotes
In this module, we'll learn a load of new things related to GitHub and remote repositories. We'll first talk about what GitHub is and why it matters, and then we'll dive into how to work with GitHub and other remote repositories. Being able to use remote repositories allows us to effectively collaborate with others.
What is GitHub?
Git is a distributed version control system. Distributed means that each developer has a copy of the whole repository on their local machine. Each copy is a peer of the others.
But we can host one of these copies on a server and then use it as a remote repository for the other copies. This lets us synchronize work between copies through this server.
Any of us can create a Git server like this one, and many companies have similar internal services. But if you don't want to set up a Git server yourself and host your repositories, you can use an online service like GitHub.
GitHub is a web-based Git repository hosting service. On top of the version control functionality of Git, GitHub includes extra features like bug tracking, wikis, and task management.
GitHub lets us share and access repositories on the web and copy or clone them to our local computer, so we can work on them.
Other services that provide similar functionality are BitBucket and GitLab.
GitHub provides free access to a Git server for public and private repositories.
It limits the number of contributors for the free private repositories, and offers an unlimited private repository service for a monthly fee.
We'll be using a free repository for our examples, which is fine for educational use, small personal projects, or open source development.
A word of caution on how you can manage these repos though. If hackers get hold of information about your organization's IT infrastructure, they can use it to try and break into your network. So make sure you treat this information as confidential. For real configuration and development work, you should use a secure and private Git server, and limit the people authorized to work on it.
Once you've created a GitHub account, you can either create your own repos or contribute to repos from other projects.
Visit http://github.com to sign up for their service.
Basic Interaction with GitHub
Once you have your GitHub account, you're ready to create your brand new repository on GitHub.
Let's go through it step by step.
We'll start by clicking the Create a repository link on the left.
This will take us to the repo creation wizard. The wizard is pretty straightforward.
The first thing we need to do is give our repo a name. We'll call this repo health-checks.
After that comes a description of what the repo will be used for. We'll say that it'll be used for scripts that check the health of our computers.
Then we need to select whether we want the repo to be public or private. We'll go with private for now.
Finally, the wizard can help us get started with a few initialization files like a README, a .gitignore, or a license file. We'll go with just the README for now, and then create the repo.
Using the wizard, we created the repo and have a fresh remote repository ready to go.
The first step is to create a local copy of the repository. We'll do that with the git clone command followed by the URL of the repo. GitHub conveniently lets us copy the repo's URL from the interface so that we don't have to type it.
We're now ready to clone the repo onto our computer. We'll call git clone and paste in the URL we copied. When we do, we're asked for our GitHub username and password. (GitHub no longer accepts account passwords for Git operations over HTTPS; a personal access token is entered in place of the password.) Just like that, we've downloaded a copy of the remote repository from GitHub onto the local machine.
This means that we can perform all the Git actions that we've learned up till now. Since the repo is called health-checks, a directory with that name was automatically created for us, and it now holds the working tree of the repository.
So let's change into that directory and look at the contents. Our repo is basically empty. It only has the README file that GitHub created for us. This file is in a special format called Markdown. Let's add a bit more content to it.
We've changed this file, so we need to stage the change and commit it. We've seen a couple of different ways to do that. Let's use the git commit -a shortcut to do both in just one command.
We've got a remote repository set up on GitHub, so let's use it. We can send our changes to that remote repository by using the git push command, which will gather all the snapshots we've taken and send them to the remote repository. Once again, we're asked for our password. After that, we see a bunch of messages from Git related to the push.
When we open our project on GitHub, the page shows the contents of the README file. So if we check our repository there now, we should see the updated message.
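The clone, commit, and push steps above can be sketched end to end as follows. This is a minimal sketch rather than the exact session from the walkthrough: a local bare repository (health-checks.git) stands in for the GitHub repo so that it runs offline without credentials, and the identity, README text, and commit messages are placeholders.

```shell
set -e
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the repo the GitHub wizard creates: one commit with a README.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # name the first branch master
echo "# health-checks" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed health-checks.git

# The walkthrough itself: clone, edit, commit, push.
git clone -q health-checks.git health-checks
cd health-checks
echo "Scripts that check the health of my computers" >> README.md
git commit -q -a -m "Add one more line to README.md"
git push -q origin master

# The remote now holds both commits.
git --git-dir="$work/health-checks.git" log --oneline master
```

Against a real GitHub repo, only the clone URL changes; the commit and push commands stay the same.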
You've probably noticed that we had to enter our password both when retrieving the repo and when pushing changes to the repo.
There are a couple of ways to avoid having to do this:
One way is to create an SSH key pair and store the public key in our profile so that GitHub recognizes our computer.
Another option is to use a credential helper which caches our credentials for a time window so that we don't need to enter our password with every interaction.
Git already comes with a credential helper baked in. We just need to enable it. We do that by calling git config --global credential.helper cache
Now that we've enabled the credential helper, we'll need to enter our credentials once more. After that, they'll be cached for 15 minutes.
To check this, we can try another Git command: git pull, which is the command we use to retrieve new changes from the repository. We'll enter our credentials on the first call to the command and they'll be cached, so we won't need to enter them again.
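As a small sketch of the credential helper setting, the commands below enable the cache with a longer time window. It assumes a scratch repository and a repository-level setting so that nothing in your global configuration is touched; the one-hour timeout is only an illustrative value.

```shell
set -e
cd "$(mktemp -d)"
git init -q scratch
cd scratch

# Repository-level setting, so this sketch leaves global config alone.
# Adding --global (as in the text) applies it to every repo instead.
git config credential.helper "cache --timeout=3600"   # keep credentials for one hour

# Show the value Git will use in this repo.
git config --get credential.helper
```

Without the --timeout option, the cache helper falls back to its 15-minute default.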
With that, we've seen how to create repositories on GitHub, clone our remote repository, push changes to it, and pull changes from it.
Basic Interaction with GitHub Cheat-Sheet
There are various remote repository hosting sites, such as GitHub, BitBucket, and GitLab.
Follow the workflow at https://github.com/join to set up a free account, username, and password. After that, these steps will help you create a brand new repository on GitHub.
Some useful commands for getting started:
Command | Explanation |
git clone URL | Git clone is used to clone a remote repository into a local workspace |
git push | Git push is used to push commits from your local repo to a remote repo |
git pull | Git pull is used to fetch the newest updates from a remote repository and merge them into the current branch |
This can be useful for keeping your local workspace up to date.
Using a Remote Repository
What is a remote?
When we cloned the newly created GitHub repository, we had our local Git repo interact with a remote repository.
Remote repositories are a big part of the distributed nature of Git collaboration. They let lots of developers contribute to a project from their own workstations, making changes to local copies of the project independently of one another.
When they need to share their changes, they can issue git commands to pull code from a remote repository or push code into one.
There are a bunch of ways to host remote repositories. There are many internet-based Git hosting providers like GitHub, BitBucket, or GitLab, which offer similar services.
We can also set up a Git server on our own network to host private repositories. A locally hosted Git server can run on almost any platform, including Linux, macOS, or Windows. This has benefits like increased privacy, control, and customization.
Using Git to manage a project helps us collaborate successfully.
Everyone will develop their piece of the project independently in their own local repositories, maybe even using separate branches. Occasionally, they'll push finished code into a central remote repository where others can pull it and incorporate it into their new developments.
So how does this work?
Alongside the local development branches like master, Git keeps copies of the commits that have been submitted to the remote repository in separate branches.
If someone has updated a repository since the last time you synchronized your local copy, Git will tell you that it's time to do an update.
If you have your own local changes when you pull down the code from the remote repo, you might need to fix merge conflicts before you can push your own changes.
In this way, Git lets multiple people work on the same project at the same time. When pulling new code, it will merge the changes automatically if possible, or tell us to perform the integration manually if there are conflicts.
So when working with remotes the workflow for making changes has some extra steps.
We'll still modify, stage, and commit our local changes.
After committing, we'll fetch any new changes from the remote repo, manually merge if necessary, and only then push our changes to the remote repo.
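The extra steps in this workflow can be sketched as below. It is a minimal sketch under stated assumptions: a local bare repository (health-checks.git) stands in for the hosted remote, the two clones named mine and colleague are hypothetical, and the two sides change different files so the merge needs no manual conflict resolution.

```shell
set -e
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"

# Local stand-in for the hosted repo, plus our clone and a colleague's clone.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "# health-checks" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed health-checks.git
git clone -q health-checks.git mine
git clone -q health-checks.git colleague

# The colleague pushes a commit first.
echo "# checks disk usage" > colleague/disk_usage.py
git -C colleague add disk_usage.py
git -C colleague commit -q -m "Add disk_usage.py"
git -C colleague push -q origin master

# We modify, stage, and commit locally as usual.
cd mine
echo "Scripts that check the health of our computers" >> README.md
git commit -q -a -m "Describe the repo in the README"

# Fetch, merge, and only then push.
git fetch -q origin
git merge -q --no-edit origin/master   # different files changed, so no conflict
git push -q origin master
git log --oneline --graph
```

Pushing before the fetch and merge would be rejected here, because the remote master already contains a commit that our local branch lacks.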
Git supports a variety of ways to connect to a remote repository. Some of the most common are using the HTTP, HTTPS and SSH protocols and their corresponding URLs.
HTTP is generally used to allow read-only access to a repository. In other words, it lets people clone the contents of your repo without letting them push new contents to it.
Conversely, HTTPS and SSH both provide methods of authenticating users, so you can control who gets permission to push.
The distributed nature of the work means that there are no limits to how many people can push code into a repository.
It's a good idea to control who can push code to repos and to make sure you give access only to people you trust.
Web services like GitHub offer a bunch of different mechanisms to control access to repositories.
Some of these are available to the general public while others are only available to enterprise users.
Working with Remotes
When we call git clone to get a local copy of a remote repository, Git sets up that remote repository with the default name origin. We can look at the configuration for that remote by running git remote -v in the directory of the repo.
Here we see the URLs associated with the origin remote. There are two URLs:
one will be used to fetch data from the remote repository
the other one to push data to that remote repo
They'll usually point at the same place. But in some cases, you can have the fetch URL use HTTP for read-only access, and the push URL use HTTPS or SSH for access control. This is fine as long as the repo contents that you read when fetching are the same ones that you write to when pushing.
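One way to end up with different fetch and push URLs is git remote set-url with the --push option. The sketch below assumes a hypothetical account name (example-user) and only edits the local configuration, so it never contacts GitHub.

```shell
set -e
cd "$(mktemp -d)"
git init -q health-checks
cd health-checks

# Hypothetical URLs: "example-user" is a placeholder account name.
git remote add origin https://github.com/example-user/health-checks.git
git remote set-url --push origin git@github.com:example-user/health-checks.git

# Lists origin twice: once with its fetch URL, once with its push URL.
git remote -v
```

Both URLs still need to refer to the same repository for fetches and pushes to stay consistent.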
Remote repositories have a name assigned to them. By default, the assigned name is origin.
This lets us track more than one remote in the same Git directory. While this is not the typical usage, it can be useful when collaborating with different teams on projects that are related to each other.
If we want to get even more information about our remote, we can call git remote show origin. We can see the fetch and push URLs that we saw before, and the local and remote branches too. For now we only have a master branch that exists locally and remotely.
Whenever we're operating with remotes, Git uses remote branches to keep copies of the data that's stored in the remote repository.
We can look at the remote branches that our Git repo is currently tracking by running git branch -r. These branches are read-only.
We can look at the commit history, like we would with local branches, but we can't make any changes to them directly.
To modify their contents, we'll have to go through the workflow we called out before.
First, we pull any new changes to our local branch,
then merge them with our changes
and push our changes to the repo.
We can also use git status to see how our local branch compares with its remote branch.
Now that we're working with a remote repository, git status gives us additional information. It tells us that our branch is up to date with the origin/master branch, which means that the master branch in the remote repository called origin has the same commits as our local master branch.
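The inspection commands from this section can be tried together on a fresh clone. This is a sketch that assumes a local bare repository (health-checks.git) standing in for the hosted remote; the identity and file contents are placeholders.

```shell
set -e
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"

# Local stand-in for the hosted repo, then a clone of it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "# health-checks" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed health-checks.git
git clone -q health-checks.git health-checks
cd health-checks

git remote -v            # fetch and push URLs of origin
git remote show origin   # URLs plus the local and remote branches
git branch -r            # remote branches, such as origin/master
git status               # how master compares with origin/master
```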
Fetching New Changes
We could always use the GitHub website to browse the changes that were submitted. But we want to learn how to do it from the command line, because you might need to do it this way at your job, and it'll work the same no matter which platform you use to interact with Git.
So first, let's look at the output of the git remote show origin command.
Check out how it says that the local branch is out of date. This happens when commits have been made to the repo that aren't yet reflected locally. Git doesn't keep remote and local branches in sync automatically; it waits until we execute commands to move data around when we're ready.
To sync the data, we use the git fetch command. This command copies the commits done in the remote repository to the remote branches, so we can see what other people have committed.
Fetched content is downloaded to the remote branches on our repository. So it's not automatically mirrored to our local branches.
We can run git checkout on these branches to see the working tree, and we can run git log to see the commit history.
Let's look at the current commits in the remote repo by running git log origin/master.
Looking at this output, we can see that the remote origin/master branch is pointing to the latest commit, while the local master branch is pointing to the previous commit we made earlier on.
Git status helpfully tells us that there's a commit that we don't have in our branch.
It does this by letting us know that our branch is behind the remote origin/master branch. If we want to integrate the new commits into our master branch, we can perform a merge operation, which merges the origin/master branch into our local master branch.
To do that, we'll call git merge origin/master. We've merged the changes of the master branch of the remote repository into our local branch.
Git tells us that the code was integrated using fast-forward. It also shows that two files were added, all_checks.py and disk_usage.py. If we look at the log output on our branch now, we should see the new commit.
We see that now our master branch is up to date with the remote origin/master branch.
We can use git fetch like this to review the changes that happened in the remote repository. If we're happy with them, we can use git merge to integrate them into the local branch.
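The fetch, review, and merge sequence can be sketched as follows. It assumes a local bare repository (health-checks.git) in place of the hosted remote and a hypothetical colleague clone that pushes one commit; the file contents are placeholders, not the course's actual scripts.

```shell
set -e
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"

# Local stand-in for the hosted repo, plus our clone and a colleague's clone.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "# health-checks" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed health-checks.git
git clone -q health-checks.git mine
git clone -q health-checks.git colleague

# The colleague pushes a commit we do not have yet.
echo "# checks disk usage" > colleague/disk_usage.py
git -C colleague add disk_usage.py
git -C colleague commit -q -m "Add disk_usage.py"
git -C colleague push -q origin master

cd mine
git fetch -q origin               # updates origin/master, not our master
git log --oneline origin/master   # review what was submitted
git status -sb                    # reports that master is behind origin/master
git merge origin/master           # our master has no new commits, so this fast-forwards
git log --oneline -1
```

Until the merge runs, the working tree in mine is unchanged; only the remote branch origin/master has moved.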
Fetching commits from a remote repository and merging them into your local repository is such a common operation in Git that there's a handy command to let us do it all in one action.
Updating the Local Repository
Earlier, we took a look at the basic workflow for working with remotes when we want to fetch the changes manually, merge if necessary, and only then push any changes of our own.
Since fetching and merging are so common, Git gives us the git pull command that does both for us.
Running git pull will fetch the remote copy of the current branch and automatically try to merge it into the current local branch.
If you look closely at this output, you'll see that it includes the output of the fetch and merge commands that we saw earlier.
First, Git fetched the updated contents from the remote repository, including a new branch.
And then it did a fast-forward merge to the local master branch.
We'll see that the all_checks file was updated as well.
We can look at the changes by using git log -p -1.
We see that our colleague added a check_disk_full function that includes the code from the other disk_usage.py file that we saw earlier.
So now I'll exit the pager with q.
Let's check out the output of git remote show origin and see what it says about that new branch.
We see that there's a new remote branch called experimental, which we don't have a local branch for yet. To create a local branch for it, we can run git checkout experimental.
When we checked out the experimental branch, Git automatically copied the contents of the remote branch into the local branch. The working tree has been updated to the contents of the experimental branch.
In this last example, we got the contents of the experimental branch together with those of the master branch when we called git pull, which also merged new changes onto the master branch.
If we want to get the contents of remote branches without automatically merging any contents into the local branches, we can call git remote update. This will fetch the contents of all remote branches, so that we can just call checkout or merge as needed.
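The pull, checkout, and remote update steps can be sketched as below. It assumes a local bare repository (health-checks.git) in place of the hosted remote and a hypothetical colleague clone that pushes to master and creates the experimental branch; the file names other than all_checks.py and all file contents are placeholders.

```shell
set -e
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"

# Local stand-in for the hosted repo, plus our clone and a colleague's clone.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "# health-checks" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed health-checks.git
git clone -q health-checks.git mine
git clone -q health-checks.git colleague

# The colleague adds a commit on master and pushes a new experimental branch.
cd colleague
echo "# runs all checks" > all_checks.py
git add all_checks.py
git commit -q -m "Add all_checks.py"
git push -q origin master
git checkout -q -b experimental
echo "# work in progress" > experiment.py
git add experiment.py
git commit -q -m "Start experimental work"
git push -q origin experimental
cd ../mine

git pull -q                 # fetch, then merge origin/master into master
git branch -r               # origin/experimental is now listed
git checkout experimental   # creates a local branch from origin/experimental
git remote update           # fetch only; no local branch is merged
```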
Git Remotes Cheat-Sheet
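Command | Explanation |
git remote | Lists the remote repositories configured for the repo |
git remote -v | Lists remotes along with their fetch and push URLs |
git remote show <name> | Describes a single remote, including its URLs and branches |
git remote update | Fetches updates for the configured remotes without merging |
git fetch | Downloads new commits from a remote into the remote branches without merging |
git branch -r | Lists remote branches; can be combined with other branch arguments to manage remote branches |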
You can also see more in the video Cryptography in Action from the course IT Security: Defense against the digital dark arts.
Solving Conflicts
The Pull-Merge-Push Workflow
Pushing Remote Branches
Rebasing Your Changes
Another Rebasing Example
Best Practices for Collaboration
Conflict Resolution Cheat Sheet
Merge conflicts are not uncommon when working in a team of developers, or on open source software. Fortunately, GitHub has some good documentation on how to handle them when they happen.
You can also use git rebase branchname to change the base of the current branch to be branchname.
The git rebase command is a lot more powerful. Check out this link for more information.
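A basic git rebase branchname run might look like the sketch below. It assumes a throwaway repository with a hypothetical feature branch and placeholder file names; it shows only the simple case in which the rebased commit applies without conflicts.

```shell
set -e
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"

git init -q rebase-demo
cd rebase-demo
git symbolic-ref HEAD refs/heads/master
echo "base" > README.md
git add README.md
git commit -q -m "Initial commit"

# A feature branch with one commit of its own.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, master gains a commit that feature does not have.
git checkout -q master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"

# Replay feature's commit on top of the latest master.
git checkout -q feature
git rebase -q master
git log --oneline --graph
```

After the rebase, the history of feature is linear: its commit sits directly on top of the newest commit in master, so master could be fast-forwarded to it.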