Wednesday, September 9, 2009

Maintaining an active local fork with Git

[WARNING: this is not a tutorial - it is a workflow deconstruction. Failure to read the entire article before attempting this operation could conceivably result in lost work. Please make certain you understand the process, and remain aware that this is not a linear sequence of instructions, but rather a description of one of my personal experiences. Thanks! James]



Topical Introduction: Git Operations vs. GitHub.com

I have to say, before anything else, that I approached the task of maintaining a git fork with considerable trepidation. Until now, I had never really maintained a source tree where I wasn't the only one making commits. My previous experience was with an SVN repo, and honestly, I was never very confident that it could handle more than a few users working on the same project without considerable out-of-band cooperation and coordination between those users.

Yesterday I made my first such effort with Git, and though I'm by no means an expert overnight, the whole process went smoothly (including a graceful recovery from a significant misstep!). I am delighted to say that I not only successfully merged a changeset into our local fork, but was also able to make that fork available to others as an upstream source, so they can pull the changeset for testing and potential incorporation into the project's master repository!

Needless to say, I am very excited about how much I was able to accomplish. Read on to see how I was able to get this done.


Git: Preparing a new branch (if necessary)

One of the first things you should do, typically speaking, is establish a working branch (if you haven't already; if you have, skip ahead to 'Introducing new/modified source to the tree', below). This assumes you are already logged into a command shell on the computer that hosts your local fork.

First things first: change your working directory to the top-level directory in the git repository. This allows git to find the repo you're working with, without you having to specify a bunch of location-designation parameters for git on the commandline.

Now, the branch: if you aren't reading this section because you appreciate my mad literary skillz, you're probably reading it because you want to know how to create a branch; and the answer is really simple: you use git's 'branch' subcommand as shown below:

git branch branch-name

This will create a new branch containing exactly the same source (initially) as the branch you currently have checked out. Normally that is 'master', which holds whatever was in the repo at the time you made your local fork via a 'git clone url' command.

Simple, huh? Not so fast! Now you have to check out that branch to actually work with it. This is done with the following command:

git checkout branch-name

Executing this command makes the new branch the 'working' branch. Remember, git is a sort of object-database manager - it has a backstore in .git/ in the top-level directory of the repo, and inside it stores compressed representations of everything on all your branches. This is not as heavy as it sounds - a branch is just a pointer to a commit, unchanged files are shared between commits, and git can pack similar objects together as 'deltas' to save space.
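Here is a minimal sketch of the two steps above, run against a throwaway repository so it can be tried safely. The temporary directory, the README file and the branch name 'my-feature' are all made up for illustration; in your own fork you would only need the last few commands.

```shell
#!/bin/sh
set -eu

# Throwaway repository (an assumption for this sketch); normally you would
# already be in the top-level directory of your own clone.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch 'master'
git config user.name "Example"
git config user.email "example@example.invalid"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Create the branch, then make it the working branch.
git branch my-feature
git checkout -q my-feature

# Ask git which branch is checked out now.
current=$(git rev-parse --abbrev-ref HEAD)
echo "now on: $current"
```

If you prefer, 'git checkout -b my-feature' creates the branch and checks it out in a single step.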

Now we are good to make changes to our new branch. What? You already incorporated your patched/new files into the 'master' branch (like I did)? Not to worry - putting the master branch back in its original state is easy as pie:

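git reset --hard hashnum

hashnum should be the hash value of the last commit that was in the repository when you cloned it - i.e., your local fork's original condition. (Type the bare hash with no '#' in front of it; the shell treats a word starting with '#' as a comment.) The hash value can be obtained using the 'git log' command. Be aware that this command throws away any uncommitted changes in the repository, so make sure your work exists somewhere else first.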
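For the cautious, here is a sketch of the same recovery with one extra safety step that was not part of what I did: parking the mistaken commits on a branch before rewinding master, so nothing becomes unreachable. The throwaway repository, file name and the branch name 'rescued-work' are assumptions for illustration only.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for a fresh clone.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "upstream" > file.txt
git add file.txt
git commit -q -m "State as cloned"
hashnum=$(git rev-parse HEAD)   # in practice, read this hash from 'git log'

# The misstep: new work committed straight onto master.
echo "my patch" >> file.txt
git commit -q -a -m "Work committed to master by mistake"

# Keep the mistaken commit reachable on its own branch...
git branch rescued-work

# ...then rewind master to its original condition.
git reset -q --hard "$hashnum"

git log --oneline master
git log --oneline rescued-work
```

After this, master shows only the original commit, while 'rescued-work' still carries the patch.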

Having reset the repository to its initial condition (assuming you made a similar misstep), you are now ready to start over at the beginning of this section and make a proper branch to work in. This implies something:

you should not be editing code directly in the repository - you should be editing only the 'payload' - a working copy of the code extracted from the repository.

For a deconstruction of the simple process that accomplishes this, see my last blog entry :)




Introducing new/modified source to the tree

If you arrived here by skipping ahead, you probably already know that you need to be in a shell changed into the top-level directory of your repository. If not, you probably should go back and read what you skipped, just for the sake of getting some background info.

Adding source to the tree was a very simple process: after changing branches, I copied every file that had been created or modified in my working source tree into the repository. How best to do this is an area I will be exploring in detail in the near future. I have a couple of questions about this methodology - I'll address them in a later post, but I'll go ahead and ask them now: 1. Are there other (better) ways to go about this? and 2. Under what conditions might this fail?

Copying the changed/added files into the repo is not enough, however - we need to invoke git in a couple of ways to prepare/preview the changes to be committed. First, we need to add the files, as follows:

git add filespec

filespec is a pathspec that points at the new/changed files. If a file already exists in the repo, its modifications are staged for the next commit (an unchanged file is simply left alone); if it is new, it will be added to the repo. Next comes the preview:

git status

This will display a list of new and modified files. You should review them for completeness prior to the next command, which will actually make the commit.
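To make the add-then-preview step concrete, here is a small sketch in a throwaway repository. The file names are invented, and the two 'echo' lines stand in for copying files over from a working source tree. It uses 'git status --short' for a compact listing and 'git diff --cached' to show what is staged.

```shell
#!/bin/sh
set -eu

# Throwaway repository with one file already committed.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
echo "original" > existing.txt
git add existing.txt
git commit -q -m "Initial commit"

# Simulate copying in one modified file and one brand-new file.
echo "changed" >> existing.txt
echo "brand new" > added.txt

# Stage both; the pathspec could also be a directory or '.'.
git add existing.txt added.txt

# Preview: in the short format, 'M' marks a staged modification
# and 'A' marks a staged new file.
preview=$(git status --short)
echo "$preview"

# Summary of exactly what the next commit will contain.
git diff --cached --stat
```

If something shows up that you did not mean to stage, this is the moment to catch it, before the commit is made.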


Making the commit

git commit

and that will wrap up the commit of the changesets to the local repo.
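A plain 'git commit' opens your editor so you can type a commit message. As a sketch, the '-m' flag supplies the message on the command line instead; the repository, file name and message below are placeholders of my own choosing.

```shell
#!/bin/sh
set -eu

# Throwaway repository with one staged file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
echo "new work" > feature.txt
git add feature.txt

# Commit the staged changes with an inline message.
git commit -q -m "Add feature.txt"

# Confirm what was recorded.
subject=$(git log -1 --pretty=format:%s)
echo "last commit: $subject"

# After the commit, nothing should remain staged or modified.
git status --short
```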

Wow, was that easy or what!


GitHub.com: Pushing changes from your local fork to your GitHub.com account

GitHub is a relatively new and powerful website complementing git by providing git users a social metaphor for 'organic' development conducted on git repositories. It is feature-rich and targeted at both the opensource community and companies who employ opensource development methodologies.

You will need to obtain an account and log in there, and then make the proper arrangements for authentication. This is done using ssh/rsa keypairs. Full instructions for establishing an account and configuring it for use are available on their website at http://github.com. I highly recommend it!


Exporting your GitHub-hosted repository: The pull request

The need to use GitHub implies certain things about your work and your repository: you will need it if, for instance, your local fork is used to develop code for a remote project you don't have the authorization to 'push' code to, and you want to make that code available to the maintainers of the remote project for evaluation and potential incorporation into the primary project.

Don't get me wrong, this can be accomplished without the use of GitHub - GitHub just makes it easy to be very communal in your approach. This is a good thing (tm), as git tends to be best used in a very organically organized project, as most opensource projects are.

The way to get your changes on GitHub is excruciatingly simple: you push them. Your account at GitHub will provide two types of interfaces (in the form of urls) to a fork you place there: one for public (anonymous) use, and one for private use. The former requires no authentication whatsoever; the latter employs a set of RSA keys, exactly as you would employ in establishing a secure shell connection. See the GitHub link 'Help with Keys' on the account settings page for details on creating and installing these credentials on your particular operating system platform. Having done so, and assuming you've created a target repository on the site, the push is accomplished as follows:

git push url

where url is the url of the GitHub repository, e.g., git@github.com:JamesStallings/opensim.git
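Typing the full url every time gets old, so here is a sketch of a variation: register the url once under a short remote name, then push a named branch to it. So that it can run without a GitHub account, a local bare repository stands in for the GitHub-hosted one; the remote name 'github', the branch 'my-feature' and the paths are all assumptions for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Stand-in for the repository hosted on GitHub (a local bare repo here).
hosted="$work/hosted.git"
git init -q --bare "$hosted"

# Stand-in for the local fork, with one commit and a topic branch.
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
echo "code" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch my-feature

# Register the url under a short name; with GitHub this would be the
# private (ssh) url of your repository instead of a local path.
git remote add github "$hosted"

# Push the named branch to the remote.
git push -q github my-feature

# List the branches the remote now has.
git ls-remote --heads github
```

Naming the branch explicitly also avoids depending on git's configured default for which branches a bare 'git push url' sends.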



Conclusion

If I have been clear at all, by now you should have a fair understanding of what I did to make this happen. It's not a perfect workflow, more of a description of my first attempt to establish one with git. I am happy to report that the operation seems to be a success. I'll continue asking and answering as many questions as I can about git and supporting technologies - if you have questions, use that comments button and I'll see if I can answer yours too.


In my next topic, a few days down the road, I hope to have some answers about the questions I asked wrt proper ways to import new code into the repository - and perhaps take a stab at a formal workflow tutorial.

Thanks for reading!
James
