3. git and GitHub: version control and social open-source development
All code should be under version control to keep track of changes over time, and when it comes to version control, git is the dominant system. A 2018 Stack Overflow survey of developers’ version control use found that close to 90% of developers use git, with the second most popular system being the older Subversion, likely mostly for working with legacy code that still lives in Subversion repositories. git is also the most widely supported system among code hosting services, with GitHub hosting only git repositories and Bitbucket dropping support for the main git alternative, Mercurial, in 2020. Basically, git is now the only game in town.
3.1. Version control
Version control is a system for tracking and managing changes to code as the code develops. Version control stores code in a “repository”, and any changes to the code are logged through a “commit” that lists the files changed and provides a brief description of the changes; the version control software then generates a “diff” with respect to the previous version that gets stored, such that the full history of changes is available for future use.
Opinions differ on how many changes to include in a single commit (which can span multiple files at once), but typically it is best to keep commits as “atomic” as possible, that is, to commit the smallest change that can reasonably be called a change or improvement to the code. Commits are often as small as a single changed line, perhaps improving the documentation or fixing a small bug. When making changes to existing code, it is best to keep commits small, so that any issue can later easily be pinned down to a specific change; such commits should typically consist of a few lines to a few dozen lines of edited code at most. When first implementing a new feature, it may make sense to wait to commit until a draft version of the feature is working, which can lead to a larger commit, but even then it is best to first implement a skeleton of the new feature and then build it out with small changes until the feature is fully implemented.
Early version control systems stored the history of changes in a central location, while each developer only had a copy of the current version of the code; thus, every commit and every query of the code’s history required interaction with the central location (often remote, requiring an internet connection). One could therefore not commit while offline, and even when online the sometimes slow response of the central location impeded quick progress, which often led to bloated commits. One of the great improvements in git is that each copy of the code’s repository contains the full history, leading to a decentralized system where there is no need for a central location. Different copies of a git repository are called “clones” and with git, clones can communicate among themselves without needing to go through a central place. Of course, the current reality is that most git repositories have “main” copies that are stored in online services like GitHub or Bitbucket, with most of the communication between different clones happening through the main hosted copy of the code. Nevertheless, because each clone contains the entire history, you can commit code and investigate the history without interacting with a centralized repository, which is much faster and robust against network interruptions.
In this chapter, I provide a brief overview of the basic git features and commands and discuss how to use GitHub to host your scientific code package. Note that this is not supposed to be an exhaustive guide to using git; many such guides already exist.
3.2. Basic git use
The most basic cycle of git use is a cycle of git pull, git commit, git push. These commands, respectively, pull in the latest changes from the remote main version of the code repository (e.g., hosted on GitHub), commit new changes made to the code, and push the changes in these commit(s) back to the remote main version. If you add git diff for looking at the not-yet-committed current set of changes, git status for interrogating the status of the repository, and git log for looking at the history of the code, you have well over 95% of my typical usage of git.
To get started with git
, you can initialize any directory to be a git
repository using
git init
which initializes an empty git repository. That is, even if the directory already has files in it, these are not automatically added to the git repository; instead, you need to add them yourself. To follow along with this tutorial, you can create, for example, a directory exampy-GITHUBUSERNAME
where you can build your own version of the exampy
package that I discussed in the previous chapter. Keep in mind that while we use git init
here
to get started with git
, typically the way you start a new git
repository is not by running this command-line command, but instead by creating a new repository on GitHub and setting it up online in such a way that you can directly clone a local copy and start pulling, committing, and pushing changes (see below). I don’t think I have run git init
once in the last eight years.
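To see that a freshly initialized repository really is empty, here is a minimal sketch that can be run in a scratch directory. The file name existing.py is only a hypothetical placeholder, and the temporary directory is an assumption made to keep the experiment away from real work.

```shell
# Make a throwaway directory that already contains a file.
workdir=$(mktemp -d)
cd "$workdir"
echo "print('hello')" > existing.py   # hypothetical pre-existing file

git init -q                           # -q only silences the informational output

# The file is reported as untracked ("??"): git knows nothing about it yet.
git status --short
```

The file only becomes part of the repository once it has been added and committed explicitly.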
As discussed above, you should make a commit as soon as enough changes to the code have accrued to make up a reasonable change to the code (again, this could be as simple as fixing a typo in the code, a single character). Simply running
git commit
will open a text editor (set by your git defaults) that allows you to write a message describing the change; this performs a single commit of all changes that are currently staged (e.g., with git add; use git commit -a to include all current changes to all files that git already tracks). You can avoid the use of the text editor by directly specifying the message as
git commit -m "A message describing the atomic change made"
which I personally prefer as a fan of the command line (and of speed!). You can list specific files to only commit changes to those files by adding them to the command as
git commit -m "A message describing the atomic change made" file1.py file2.py
It is good practice to always specify the files you are committing changes for, rather than specifying no files or a folder (which would commit changes to all tracked files in that folder). This way, you don’t accidentally commit changes made to other files that are unrelated to the current commit (we will see below how you can even split up changes in a single file into different commits).
Before you can start committing changes to files, you need to tell git
about the existence of the file in the first place (typically soon after you create the file, in preparation for your first commit of the file). This is done using git add
which you call with
git add file1.py
and you can also list multiple files. Even though you can specify a directory or use wildcards, it is again good practice to always explicitly list all files that you are adding, rather than an entire folder or more, because when adding entire folders you will invariably end up adding files that you did not want to add, and removing them again can be difficult.
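The habit of naming files explicitly in both git add and git commit can be sketched as follows. The file names, the commit message, and the author identity are placeholders chosen for illustration, and everything happens in a throwaway directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Author"        # placeholder identity, local to this repo
git config user.email "author@example.com"

printf 'def square(x):\n    return x**2\n' > file1.py
printf 'scratch notes\n' > notes.txt          # a file we do not want in the repository

git add file1.py                              # declare only file1.py to git
git commit -q -m "Add square function" file1.py

git status --short                            # notes.txt is still listed as untracked
```

Because only file1.py was named, the scratch file never enters the history.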
When your code is centrally hosted (as it should be!), each coding session should start with a git pull
to pull in changes in the remote main repository that have not yet been added to your clone of the repository. If you are the sole developer of a code, this may seem silly, but it is again good practice to always do this such that it becomes muscle memory and because even if you are the sole developer, you are likely to be developing the code on multiple machines (a personal laptop, a
desktop at work, a remote server for running large jobs, …) and this keeps the code in sync. When you have cloned the code from GitHub and are working in the main branch, a simple
git pull
will suffice to pull in remote changes, but in general you can specify both the location of the remote repository and the branch. For example, typically the simple git pull
will be equivalent to
git pull origin main
which tells git
to pull changes from the remote repository referenced as “origin” and to pull changes from the main branch.
After you have made one or more commits, you will want to push these commits back to the remote main repository. Before you do that, it is good practice to again first do git pull
to pull in any changes to the remote repository that may have occurred while you were coding, so you can resolve any conflicts before pushing your own changes (and possibly having them be rejected if there is a conflict). Once you have done this, you push your commits with
git push
which is again typically a shortcut for the full
git push origin main
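The full pull, commit, push cycle across two machines can be tried out without any network access. In the sketch below, a local bare repository stands in for the hosted main copy, and the directory names hub.git, laptop, and desktop are hypothetical; with GitHub, the remote would be an https://github.com/... URL instead.

```shell
workdir=$(mktemp -d)

# A bare repository plays the role of the centrally hosted copy.
git init -q --bare "$workdir/hub.git"
git --git-dir="$workdir/hub.git" symbolic-ref HEAD refs/heads/main

# First machine: create a commit and publish it.
git init -q "$workdir/laptop"
cd "$workdir/laptop"
git symbolic-ref HEAD refs/heads/main         # name the first branch "main" on any git version
git config user.name "Example Author"         # placeholder identity
git config user.email "author@example.com"
git remote add origin "$workdir/hub.git"
echo "print('hello')" > hello.py
git add hello.py
git commit -q -m "Add hello script" hello.py
git push -q -u origin main

# Second machine: clone, commit, and push.
git clone -q "$workdir/hub.git" "$workdir/desktop"
cd "$workdir/desktop"
git config user.name "Example Author"
git config user.email "author@example.com"
echo "print('goodbye')" >> hello.py
git commit -q -m "Also say goodbye" hello.py
git push -q origin main

# Back on the first machine: start the session by pulling.
cd "$workdir/laptop"
git pull -q origin main
git log --oneline                             # now lists both commits
```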
With just these four git
commands you can get most of the basic functionality of git
version control. Further useful basic commands are git diff
, git status
, and git log
. The git diff
command provides a “diff” showing the difference between your clone’s current state and the last commit; thus, it shows the changes you have made since the last commit. Depending on your setup, this diff will simply have ‘-’ lines and ‘+’ lines to show lines that were removed and added (a
change on a single line giving both a removal of the old line and an addition of the edited new line) or they may be colored red and green. Running
git diff
without any additional arguments goes through all files that were changed, but you can look at changes in a single file or in a set of files by specifying them in the call, e.g., as
git diff file1.py
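As a small illustration of what such a diff looks like, the sketch below commits a function with a deliberate mistake and then fixes it without committing. The file contents and the identity are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'def cube(x):\n    return x**2\n' > file1.py
git add file1.py
git commit -q -m "Add cube function" file1.py

# Fix the exponent, but do not commit yet.
printf 'def cube(x):\n    return x**3\n' > file1.py

# Shows one '-' line (old exponent) and one '+' line (new exponent).
git diff file1.py
```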
Running git status
gives a brief summary of the current version of your clone. It prints the branch you are on and whether you are up to date with the remote repository’s same branch or how many commits ahead of the remote repository you are. It also prints files that have changed since the last commit and files contained in your clone that have not been declared to git
(for example, new files before running git add
will show up in that list; after running git add
they will be
listed as newly added). I use git status
a lot to remind myself of what I have been doing since the last commit.
git log
prints a log of the history of changes to the code. Run without any options, it will provide a moderately verbose list of all commits, listing the commit hash (the unique identifier of every commit), the commit’s author and date, and the summary that you provided when running git commit
. But git log
’s output can be highly customized. To get a very succinct listing do
git log --oneline
which will list each commit in an abbreviated manner on a single line. Or use the --pretty=
option to get less or more information, e.g.,
git log --pretty=short
which is similar to the basic output, but does not include the date.
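The status and log commands are easiest to appreciate on a tiny history. The following sketch builds two commits in a throwaway repository (file names and messages are placeholders) and then inspects them.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "first line" > a.txt
git add a.txt
git commit -q -m "Add a.txt" a.txt
echo "second line" >> a.txt
git commit -q -m "Extend a.txt" a.txt
echo "not yet declared to git" > b.txt

git status --short        # b.txt appears as untracked
git log --oneline         # one line per commit, newest first
git log --pretty=short    # hash, author, and summary, but no date
```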
3.3. Branches
A feature of most version control systems, and one that is especially easy to use with git, is the ability to branch off the main development branch of your code to focus on developing a single feature, fixing a single bug, etc. After you are satisfied with the changes on the branch, these changes are merged back into the main development commit history. A crucial part of the implementation of the git software is fast and intelligent algorithms to perform such merges automatically, even when the differences between the feature branch and the main branch are substantial. When git is unable to automatically merge branches, the repository goes into a suspended state until the user manually resolves any conflicts that cannot be handled automatically.
Branches are an incredibly useful feature of git
, especially when combined with forks discussed below, and you should make liberal use of them. Branches allow you to split off things like implementing new features, while still keeping the ability to fix bugs in the main branch without that fix having to wait for the new feature to be ready to go “live”. Branches also allow you to develop new features in the incremental way that you
should implement all of your code (with many commits), without necessarily having to worry at first that the new feature is entirely compatible with the existing code or that it passes all existing tests.
The main branch is called main
. It is good practice to keep this branch as clean as possible, that is, avoid having it be in a state where it contains partially implemented features or bug fixes. The main
branch should always contain a fully working version of your code. Any significant changes to your code should therefore be done in other branches.
To create a new branch, do
git switch -c NEWFEATURE
which creates a branch called NEWFEATURE
(which should be a very brief string describing the new feature, e.g., “add_cube” if you are adding a function to compute the cube of a number) and switches the state of the repository to this branch. In detail, this command is a shorthand for the following two commands
git branch NEWFEATURE
git switch NEWFEATURE
where the first git branch
command creates the branch, while staying on the current branch (e.g., main
) and the git switch
command switches the state of the repository to the new branch. After running this, git status
will report that you are now on the NEWFEATURE
branch. Any commit that you make now is logged in the commit history of the branch, which is the same as that of the branch it branched off from up until the branching point and then starts containing additional
commits. Running
git branch
without any further arguments will show a list of all branches that exist in the local clone of the repository (this is not necessarily the same as the branches that exist in the centrally-hosted repository, if those branches haven’t been checked out in the local repository). To switch between branches, run
git switch SWITCH_TO_BRANCH
where SWITCH_TO_BRANCH
is the name of the branch you want to switch to (e.g., git switch main
to go back to main). This keeps the branch you leave intact; it simply switches the working state of the repository to another branch. This is useful if you are working on a new feature in one branch, but want to fix a bug in another branch. Make sure to commit all changes that you made in a branch before switching to another branch, because otherwise there is a good chance that you will accidentally commit a change meant for the feature branch in the wrong branch!
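The effect of creating and switching branches can be seen in a short experiment. This is only a sketch: the branch name add_cube follows the naming suggestion above, while the file name exampy.py and the identity are assumptions.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main" on any git version
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "def square(x): return x**2" > exampy.py
git add exampy.py
git commit -q -m "Add square function" exampy.py

git switch -q -c add_cube                    # create the branch and switch to it
echo "def cube(x): return x**3" >> exampy.py
git commit -q -m "Add cube function" exampy.py

git branch                                   # lists add_cube (current, marked *) and main
git switch -q main
cat exampy.py                                # on main, only square() is present
```

The cube function is not lost: it lives in the commit history of add_cube and reappears as soon as you switch back.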
Once you are ready to merge the changes in your branch back into the main
branch, you switch back to main
and run the merge command
git switch main
git merge NEWFEATURE
git merge
will attempt to perform the merge automatically, in which case you have to do nothing except to okay a commit that performs the merge (sometimes not even that). If the automatic merge fails, you will get a message like
Auto-merging file.py
CONFLICT (content): Merge conflict in file.py
Automatic merge failed; fix conflicts and then commit the result.
notifying you that the merge has failed and that you have to resolve conflicts between the branches yourself. This is an annoying situation, but it will happen. The failed merge process will leave your files in a state where they record the attempted merge and why it failed; your file.py
in this case will have a section that looks like
<<<<<<< HEAD:file.py
def cube(x):
    return x**3
=======
def newcube(x):
    return x**3.
>>>>>>> NEWFEATURE:file.py
You can then manually resolve these, but it is typically easier to use a tool for this, which you can bring up with
git mergetool
This command will ask you which tool to use (e.g., opendiff
; if you use Visual Studio Code
it will automatically switch to this mode without having to do git mergetool
) and will then open the files with conflicts in sequence in the merge-tool to allow you to resolve the changes, with typical output
This message is displayed because 'merge.tool' is not configured.
See 'git mergetool --tool-help' or 'git help config' for more details.
'git mergetool' will now attempt to use one of the following tools:
opendiff kdiff3 tkdiff xxdiff meld tortoisemerge gvimdiff diffuse diffmerge ecmerge p4merge araxis bc3 codecompare vimdiff emerge
Merging:
file.py
Normal merge conflict for 'file.py':
{local}: modified file
{remote}: modified file
Hit return to start merge resolution tool (opendiff):
Typically, these tools will show the two versions of the file, labeling all sections that need to be merged and flagging those that cannot be merged automatically, alongside the merged version of the file, which you can edit to resolve the merge (either through an option, such as “choose main” or “choose NEWFEATURE”, or by manually editing the merged file). Once you have resolved the conflicts, you need to perform a simple
git commit
without any other arguments (i.e., don’t specify any files) to commit the merge.
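A merge conflict is easy to provoke on purpose, which makes for a safe way to practice resolving one. The sketch below resolves the conflict by hand rather than with a merge tool, so it needs an explicit git add to mark the file as resolved, and it passes -m to git commit only to avoid opening an editor. File contents and identity are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main" on any git version
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'def cube(x):\n    return x*x*x\n' > file.py
git add file.py
git commit -q -m "Add cube function" file.py

# The feature branch edits the function one way ...
git switch -q -c NEWFEATURE
printf 'def newcube(x):\n    return x**3.\n' > file.py
git commit -q -m "Rename to newcube" file.py

# ... while main edits the same lines another way.
git switch -q main
printf 'def cube(x):\n    return x**3\n' > file.py
git commit -q -m "Use the power operator" file.py

git merge NEWFEATURE || echo "merge stopped on a conflict"
grep -c '^<<<<<<<' file.py                   # counts the conflict sections in file.py

# Resolve by hand (here: keep the version from main), then conclude the merge.
printf 'def cube(x):\n    return x**3\n' > file.py
git add file.py
git commit -q -m "Merge NEWFEATURE, keeping cube from main"
```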
Once you have merged a branch’s changes back into the main
branch, you can delete the branch by running
git branch -d NEWFEATURE
If you have performed the merge elsewhere (e.g., on GitHub), this command might complain that the NEWFEATURE
branch contains changes that have not been merged yet, but if you are sure that all is okay, you can force-delete the branch by switching to an uppercase “D”
git branch -D NEWFEATURE
Be careful with this though, because if you accidentally delete a branch that you still need, it will be very difficult to get it back (although, because it’s git
, not necessarily impossible…).
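The difference between the two delete flags, and one way a deleted branch can sometimes be rescued, is sketched below. The rescue assumes that you still know the hash of the branch's last commit (saved here in the shell variable tip) and that git has not yet cleaned up the unreferenced commit; file names and identity are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main" on any git version
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit" file.txt

git switch -q -c NEWFEATURE
echo "feature work" >> file.txt
git commit -q -m "Start new feature" file.txt
tip=$(git rev-parse HEAD)                    # remember the branch's last commit
git switch -q main

git branch -d NEWFEATURE || echo "refused: branch is not merged"
git branch -D NEWFEATURE                     # force the deletion

# The commit itself still exists for now, so the branch can be recreated at it.
git branch NEWFEATURE "$tip"
```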
3.4. Some useful advanced git features
The git
features discussed above will allow you to do most of your day-to-day work with git
version control, but git
has many advanced features. This is not supposed to be an exhaustive guide to all git
features, but in this section I briefly discuss some of the more advanced git
features that I use on a semi-regular basis.
Above, we have used git switch to switch branches, but git switch is a specialized version of the more general command git checkout, which can do much more (for switching branches, git switch and git checkout are equivalent, but creating and switching to a new branch is done with git checkout -b NEWBRANCH instead of git switch -c NEWBRANCH). One often-used invocation is
git checkout -- file.py
which discards all changes in file.py since the previous commit (you can also run it on the entire repository). This is useful when you’ve made a big mess and the easiest way out is to just give up and start over (this happens to me a lot). Again, be careful with this command, because once you discard the changes, it is impossible to get them back. In newer versions of git, you can equivalently run
git restore file.py
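A quick way to convince yourself of what gets discarded is to try it in a scratch repository. In this sketch the file contents and identity are made up, and git restore requires a reasonably recent git (the same versions that provide git switch).

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "def cube(x): return x**3" > file.py
git add file.py
git commit -q -m "Add cube function" file.py

echo "a big mess" >> file.py                 # an uncommitted change we regret
git status --short                           # file.py is listed as modified

git restore file.py                          # same effect as: git checkout -- file.py
git status --short                           # prints nothing: the change is gone for good
```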
Besides checking-out branches, git checkout
can also check-out a previous commit, by specifying the commit’s hash as
git checkout COMMITHASH
where COMMITHASH is the hash (the hexadecimal string like 625123ab491088d6714809648d8a13ae435b7cf8 that you can get from git log or elsewhere). This will leave the repository in a “detached HEAD” state, which doesn’t sound good and indeed isn’t all that good (if you want to actually start making changes, you will have to create a new branch starting from this commit), but it allows you to switch back to an earlier state of the repository and see what it looked like, run tests for the earlier state, etc. That’s often useful when you are trying to figure out where in the commit history something went wrong.
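Visiting an earlier commit and coming back can be sketched as follows. The example looks up the older hash with git rev-parse instead of copying it from git log, which is just a convenience for scripting; file names and identity are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main" on any git version
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "First version" notes.txt
echo "version 2" > notes.txt
git commit -q -m "Second version" notes.txt

first=$(git rev-parse HEAD~1)                # hash of the commit before the latest one
git checkout -q "$first"                     # detached HEAD at the older commit
cat notes.txt                                # shows the file as it was back then

git switch -q main                           # return to the tip of main
cat notes.txt                                # shows the latest version again
```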
If you are working in a branch and have uncommitted changes and you want to switch to another branch (briefly, say) and you really don’t want to commit the uncommitted changes before the switch, you can “stash” them away for future use. For this run
git stash
which stashes all uncommitted changes and reverts the repository back to the previous commit. Then you can switch to another branch without carrying over the uncommitted change. Once you are ready to start work on the uncommitted changes again, switch back to their branch and do
git stash pop
to bring back the uncommitted changes. You can stash multiple sets of uncommitted changes and there is support for listing them etc., but in practice that becomes ugly very quickly, so it is best to use git stash
very sparingly and only for very brief periods of time (e.g., you are in the middle of working on a new feature, someone reports a bug that will just take two minutes to fix, so you switch to a branch to fix the bug before coming back five minutes later to take up the new feature’s
implementation again).
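The two-minute bug fix scenario described above might look like this in practice. It is a sketch under the assumption that the bug fix touches a different file than the unfinished feature; the names feature.py, util.py, and add_cube are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main" on any git version
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "def square(x): return x**2" > feature.py
echo "def helper(): return 1" > util.py
git add feature.py util.py
git commit -q -m "Initial files" feature.py util.py

git switch -q -c add_cube
echo "def cube(x): return x**3  # unfinished" >> feature.py

git stash                                    # tuck the unfinished edit away
git switch -q main
echo "def helper(): return 2" > util.py      # the quick bug fix
git commit -q -m "Fix helper return value" util.py

git switch -q add_cube
git stash pop                                # the unfinished edit is back
```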
Finally, git add
, in addition to adding files to the repository’s list of files, can be used to specify what parts of the current changes to “stage” for the next commit by running
git add -p
which is short for git add --patch
. Run like this, this will start an interactive session that breaks up all of the current set of changes into atomic chunks (called “hunks” for some reason) and asks you whether you want to “stage the hunk” (i.e., add it to the next commit), “not stage the hunk” (i.e., skip it), split the hunk into multiple sub-hunks if you want finer-grained control, or manually edit the way in which the current hunk is staged for the next commit (typing ‘?’ at any time
during the process brings up a helpful explanation of the different options).
Using git add -p
is useful when you have made a lot of changes since the last commit, perhaps because you need many changes to perform a meaningful test of the new implementation, and you want to break it up into multiple small commits for clarity in your repository’s history. In general, it is best to simply make small commits along the way, but if you’ve found yourself making lots of changes since the last commit, git add -p
will help you out in keeping a sane code history.
3.5. Using GitHub to build a community for your code
What is GitHub? GitHub is an online service for hosting software packages that use the git version control system, and it has many additional bells and whistles to help with a package’s development, maintenance, and community interaction. While there is no direct association between git and GitHub (there are other services to host git repositories, like Bitbucket), at this point git and GitHub are heavily associated with one another and it seems to me that the total dominance that git has attained over other similar version-control systems has much to do with the exquisite support for hosting git repositories that GitHub has provided for many years now.
At its most basic, GitHub provides the central location where the main copy of your repository is stored, the location from which you git pull
and to which you git push
changes to the code. As such, it provides a crucial back-up service for your code and a central hub that you can use to keep different copies of your code up-to-date with one another. But GitHub provides many more features than that. For aiding in the development and maintenance of your code, GitHub provides a full online
viewer of your code, arranged as a file system that is a central part of your code’s GitHub website, which allows you to see the latest version of your code as well as the code at any commit in its history. It can also display changes made in each commit in an easy-to-understand format and show you differences between the code at different points in the code’s history. But GitHub’s most important feature is that it provides your code’s public face where users of your code will go to learn about
the code, to find your code’s documentation, to interact with the developer(s), to contribute patches to the code, etc. For many modern code packages, their GitHub page is the public website of the package.
While you can import an existing code repository hosted elsewhere online, typically you start by creating a new repository (log into your GitHub account to access this page). This brings up a page that asks basic information about the code repository that you want to create. First you specify the repository’s name and this should typically be the name of your software package; GitHub does not require names across GitHub to
be unique (only within your own account), but as we have discussed before, using a globally unique name is important. Then you can specify a brief description (which can be easily edited later, but it is good practice to always start with a cogent description) and whether to make the repository public (viewable by all internet users) or private (accessible only to yourself and any explicitly added collaborators). If your intention is for a
wide range of users to use your code, you’ll want to make it public! But even if this is your plan, you may want to start off creating a simple version of your repository in private if you so desire (I don’t judge, as long as you make it public soon 😀). Note that if you are an academic educator or researcher, GitHub has a program that gives you a free “Pro” account, which comes with unlimited private repositories. When you make a repository private,
you can always change it to public later in the repository’s Settings. You can then choose to “Initialize this repository with a README”, which is a good thing to do, because it will create a skeleton GitHub repository that contains a README file in Markdown format (therefore, README.md
) that contains the name and description (and that’s all your repository will contain at this point!). Starting out with a README.md
means that you can then clone the repository to your local machine, and
start adding and committing changes without having to locally git init
an empty repository. You also have the option to add a .gitignore
file (this is a file that contains rules for files that git should largely ignore, e.g., not list as untracked files when you run git status; it contains entries like *.pyc to ignore compiled Python bytecode files; for a Python project, choose the Python version of the .gitignore
file). You can also immediately add a code license
from a list of open-source licenses, which is a good idea. Then hit “Create repository” and you’re done!
If you don’t initialize your repository with a README or any other file, it will be created, but you will have to finalize the initialization of the repository yourself. This is what you do when you have already started the git repository locally using git init and have added and committed some files. In that case, you need to run
git remote add origin https://github.com/GITHUBUSERNAME/REPOSITORYNAME.git
to tell your local repository about the newly created GitHub repository and then do
git push -u origin main
to push your local initialization to GitHub. You can run this command after as many commits as you want, that is, you can even push git
repositories with thousands of commits to a newly-created GitHub repository and the GitHub repository will then contain the entire previous history of the code in the same way as if you had developed it while using GitHub (in that sense, GitHub is simply a viewer of your repository’s commit history).
When you have initialized your GitHub repository with a README.md
file, you typically will create a local copy by running (e.g., for the repository that contains these notes)
git clone https://github.com/jobovy/code-packaging-minicourse.git
The URL here is the standard https://github.com/GITHUBUSERNAME/REPOSITORYNAME.git, but the simplest way to obtain it is to go to your repository’s GitHub page and click the big green “Code” button (formerly labeled “Clone or download”) near the top, which will allow you to copy the URL to your clipboard. As the name implies, git clone creates an exact, full copy of the GitHub repository on your local machine. When you obtain your local copy in this way, the local copy is automatically aware of the central GitHub location of your code, such that commands like git pull and git push work without requiring any immediate further setup.
When you create a branch in your local copy of the repository, you need to tell your local copy how to link up this branch with a branch in the GitHub version of your code. Simply trying to run git pull
in a newly created branch tells you what you have to do here: You can either always (tediously) specify the remote branch as
git pull origin BRANCHNAME
(similar for git push
) where “origin” is a shorthand for the GitHub repository (in general, the central location of your code’s repository) or you can save this information using
git branch --set-upstream-to=origin/BRANCHNAME BRANCHNAME
such that you can again simply do git pull
and git push
and changes will be pulled and pushed to the correct branch on GitHub. Note that if you have not yet pushed a newly created branch to GitHub, the git pull origin BRANCHNAME
command will fail to find the remote branch; in this case, first push the branch with git push origin BRANCHNAME
.
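Linking a new local branch to its remote counterpart can be rehearsed offline. In this sketch a local bare repository named hub.git stands in for GitHub, and add_cube plays the role of BRANCHNAME; both names, the file, and the identity are assumptions made for the example.

```shell
workdir=$(mktemp -d)
git init -q --bare "$workdir/hub.git"        # local stand-in for the GitHub repository
git --git-dir="$workdir/hub.git" symbolic-ref HEAD refs/heads/main

git init -q "$workdir/local"
cd "$workdir/local"
git symbolic-ref HEAD refs/heads/main        # name the first branch "main" on any git version
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git remote add origin "$workdir/hub.git"
echo "def square(x): return x**2" > exampy.py
git add exampy.py
git commit -q -m "Add square function" exampy.py
git push -q -u origin main

# New branch: push it once, then record where it lives on the remote.
git switch -q -c add_cube
echo "def cube(x): return x**3" >> exampy.py
git commit -q -m "Add cube function" exampy.py
git push -q origin add_cube
git branch --set-upstream-to=origin/add_cube add_cube

# From now on the short forms work on this branch.
git pull -q
git push -q
```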
One of GitHub’s most crucial features is that it allows other users to easily create their own copy of your code and host it on GitHub as well, by creating a “fork” of your code. That is, a fork is a copy of your code that is hosted under another user’s account and that is identical to your git repository (including all commits) up to the point at which the fork was made. This allows other users to make changes to your code using git without needing write access to your version of the code: they can push changes to their own version of the code and make these available to other users via GitHub. People who fork your code cannot directly write to your GitHub repository and neither can you write to their fork of your repository (but GitHub prominently links back to the original version). The purpose of most forks is for other users to make changes to the code that will quickly or eventually be merged back into the main GitHub repository, but some codes have forks that are long-lasting and never re-unite with the original repository. To create a fork, navigate to a repository’s GitHub page and click the Fork button at the top right. If during work in a fork you want to merge in subsequent changes made in the original repository, you will need to tell the clone of your fork about the original repository. The original repository is normally called the “upstream” and you add it as a remote repository as, e.g.,
git remote add upstream https://github.com/jobovy/code-packaging-minicourse.git
if you have forked the repository containing these notes and want to merge in changes made in the original repository. Then you can pull in changes from an “upstream” branch with, e.g.,
git pull upstream main
to pull in changes from the upstream main
branch. Note that the “upstream” in this command is simply the shorthand for the URL that you added with the git remote add
command.
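The relationship between an original repository, a fork, and a clone of the fork can be imitated with local bare repositories. In this sketch, original.git and fork.git are hypothetical stand-ins for the two GitHub repositories, and copying one bare repository into the other plays the role of clicking the Fork button.

```shell
workdir=$(mktemp -d)

# The original project's hosted repository, with one commit from its author.
git init -q --bare "$workdir/original.git"
git --git-dir="$workdir/original.git" symbolic-ref HEAD refs/heads/main
git init -q "$workdir/author"
cd "$workdir/author"
git symbolic-ref HEAD refs/heads/main        # name the first branch "main" on any git version
git config user.name "Original Author"       # placeholder identity
git config user.email "author@example.com"
git remote add origin "$workdir/original.git"
echo "course notes" > README.md
git add README.md
git commit -q -m "Add README" README.md
git push -q origin main

# "Fork" the original, then clone the fork as a contributor would.
git clone -q --bare "$workdir/original.git" "$workdir/fork.git"
git clone -q "$workdir/fork.git" "$workdir/myclone"

# Meanwhile, the original project moves on.
echo "more notes" >> README.md
git commit -q -m "Extend README" README.md
git push -q origin main

# In the clone of the fork: register the original as "upstream" and pull from it.
cd "$workdir/myclone"
git remote add upstream "$workdir/original.git"
git pull -q upstream main
git remote                                   # lists origin (the fork) and upstream
```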
The main mechanism for merging changes made in a fork of a repository back into the main repository is through a “pull request”, essentially a request to git pull the changes from the fork into the main repository (although what’s actually done is a git merge). A fork is essentially like a set of branches, where all of the original repository’s branches are present as duplicates of the original’s and users can add additional branches. Merging changes made in a fork is essentially the same as merging changes from a branch as we discussed above, with the only difference being that the fork is hosted remotely. Every GitHub repository has a tab called “Pull requests”, which lists the currently-open and previously-closed pull requests. To initiate a pull request, either go to the main repository’s Pull requests tab and click “New pull request” or go to your fork’s page, which has a Pull request button that initiates the pull request. When you open a pull request, you should give a brief rationale for the change that you are asking to be merged in. It is good practice not to make changes to a fork’s main branch, but to instead create a new branch to implement changes and then initiate a pull request from this branch to the original main branch. For one thing, this will allow the original repository’s owner to check out your fork’s branch more easily if this becomes necessary in the review or merging process.
When you open a pull request, the original code repository’s owners will likely ask you additional questions about the changes and may ask you to edit the changes to abide by the main repository’s coding style, to update the documentation and tests (including, e.g., the log of changes), etc.; GitHub allows this conversation to happen on the page associated with the pull request. Keep in mind that your pull request may be rejected by the code’s owners: maybe it is implementing a new feature that they do not wish to support and maintain in the future (any new feature will entail a maintenance burden that will typically fall on the code’s owners rather than the person implementing it in a fork), or maybe they want more explanation/documentation/tests and you are not willing to provide this. Large pull requests may be difficult to review, so good pull requests are typically small (you may be able to split up a big change into smaller, atomic pull requests if each one can stand on its own). If you are concerned about doing a lot of work that might get rejected, contact the authors through the communication channel(s) that they prefer before you start the work, so you can find out whether they would be amenable to a pull request or not (you may want to do this by opening an Issue [see below] if there is no other obvious way to contact the code owners/maintainers).
Merging pull requests proceeds in the same way as merging between branches in a git
repository, with the main difference being that if the merge can be done automatically, the merge can be done entirely through the GitHub site of your code, with no need to check out the fork’s code on your own machine. If there is a merge conflict, you have to check out the fork’s code and manually merge the changes (although you will likely want to ask the fork’s author to do this on their side, unless it requires
deeper knowledge of the code than the fork’s author can reasonably be expected to have). GitHub has extensive support for helping in the review of pull requests, allowing you to make comments on all of the changes and request additional changes, asking for reviews from particular contributors before approving the changes to be merged, and running any automated tests that you have and reporting their results. If you are expecting to have pull requests be a common way for your code to evolve, it
is essential to have an automated test suite run with continuous integration that covers most of the lines in your code, to protect against unforeseen issues when changes are merged into the main code’s repository.
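When a pull request has to be inspected or merged by hand, one possible sequence of commands on the maintainer's side is sketched below. It is only one way of doing this, not a prescribed procedure: the main repository and the contributor's fork are local stand-ins, and the remote name `contributor`, the branch names `new-feature` and `review-new-feature`, the file `code.txt`, and the demo identity are assumptions for illustration. The demo merge is conflict-free; with a real conflict, `git merge` would stop and ask you to resolve the conflicting files before committing.

```shell
set -eu

# Scratch area; local stand-ins only, no network is used.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Hypothetical identity, used only so that the demo commits can be created.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for the main repository as cloned on the maintainer's machine.
git init --quiet "$workdir/main-repo"
git -C "$workdir/main-repo" symbolic-ref HEAD refs/heads/main
echo "existing code" > "$workdir/main-repo/code.txt"
git -C "$workdir/main-repo" add code.txt
git -C "$workdir/main-repo" commit --quiet -m "Add code"

# Stand-in for a contributor's fork with a change on its own branch.
git clone --quiet "$workdir/main-repo" "$workdir/contributor-fork"
git -C "$workdir/contributor-fork" checkout --quiet -b new-feature
echo "new feature" >> "$workdir/contributor-fork/code.txt"
git -C "$workdir/contributor-fork" commit --quiet -am "Add new feature"

# Maintainer side: register the fork as a remote and fetch its branches.
cd "$workdir/main-repo"
git remote add contributor "$workdir/contributor-fork"
git fetch --quiet contributor

# Check out the fork's branch locally to inspect it and run the tests.
git checkout --quiet -b review-new-feature contributor/new-feature

# Once satisfied, merge the reviewed branch into main.
git checkout --quiet main
git merge --no-edit review-new-feature
```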
Pull requests are the most important social aspect of how your code can grow when it’s hosted on GitHub, but GitHub has many more features for the community of your users. A helpful README file is a great way to introduce your code to your users and READMEs can have a variety of formats that allow nice-looking GitHub sites to be created (don’t have one of those drab pure README
GitHub sites, use a Markdown README.md
or a reStructuredText README.rst
to create an attractive first
impression for your code). Like the “Pull requests” tab, there is also an “Issues” tab that provides a venue for users of your code to report issues with its installation or use. Anybody with a GitHub account can open an Issue, which then goes into a list of open issues to be resolved. Any given issue typically consists of a conversation between the user reporting the issue and the code’s maintainer(s) to figure out the root of the issue and commits that resolve the issue. Each issue has a
unique number that you can reference in code commits as #NUMBER
and GitHub will automatically link this commit to the issue online (you can even close issues through commits, by writing phrases like “fixes #NUMBER” in the commit! But make sure that the commit actually fixes the issue, because otherwise you will have to re-open it). When you are reporting an issue, it is important to write up a useful description of the issue: succinctly explain what the issue is, give the version of the code
that you are using and the version of any other relevant component (e.g., the Python version, your operating system, etc.), and try to create a minimal, reproducible example of the issue which allows the maintainer to quickly reproduce the issue themselves and which can form the basis of a test added to the code’s test suite checking that the issue is and remains resolved. Report any errors in full, using a service like
pastebin to paste large logs (to not clog up the Issues page). When you open an issue, respond promptly to any follow-up questions (don’t open an issue just before going on vacation!) and make sure to close the issue once it has been resolved.
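The “fixes #NUMBER” convention mentioned above only requires that the phrase appears in the commit message. The sketch below shows this in a throwaway local repository; the issue number, the file `script.py`, the commit messages, and the demo identity are all hypothetical. Nothing in this local demo contacts GitHub: as far as I am aware, the link and the automatic closing only happen once such a commit reaches the repository's default branch on GitHub.

```shell
set -eu

# Scratch area; a local repository only, no network is used.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Hypothetical identity, used only so that the demo commits can be created.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init --quiet "$workdir/repo"
cd "$workdir/repo"
git symbolic-ref HEAD refs/heads/main
echo "print('helo')" > script.py
git add script.py
git commit --quiet -m "Add script"

# Hypothetical issue number, for illustration only.
issue_number=12

# Fix the problem and reference the issue in the commit message; the
# "fixes #..." phrase is what GitHub looks for when closing the issue.
echo "print('hello')" > script.py
git commit --quiet -am "Correct greeting text (fixes #${issue_number})"

# Show the resulting history, one commit per line.
git log --oneline
```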
GitHub has many more features than the basic ones that I have discussed here, many of them having to do with the integration with automated documentation and testing tools that I will discuss later in these notes.