[WARNING: this is not a tutorial - it is a workflow deconstruction. Failure to read the entire article before attempting this operation could conceivably result in lost work. Please make certain you understand the process, and remain aware that this is not a linear sequence of instructions, but rather a description of one of my personal experiences. Thanks! James]
Topical Introduction: Git Operations vs. GitHub.com
I have to say, before anything else is said, that I approached the task of maintaining a git fork with considerable trepidation. I've never really maintained a source tree before that I wasn't really the only one making commits - that was an SVN repo, and honestly, I was never very confident that it could handle more than a few users working on the same project in the absence of some considerable out-of-band cooperation and coordination between those users.
Yesterday I made my first such effort with Git, and though I'm by no means an expert overnight, the whole process went smoothly (including a graceful recovery from a significant mis-step!) and I am delighted to say that I have not only completed successfully the process of merging a changeset to our local fork, but I was also able to make that fork available to others as an upstream source for pulling the changeset for testing and potential incorporation to the project's master repository!
Needless to say, I am very excited about how much I was able to accomplish. Read on to see how I was able to get this done.
Git: Preparing a new branch (if necessary)
One of the first things you should do, typically speaking, is establish a working branch (if you haven't already; if you have, skip ahead to 'Introducing new/modified source to the tree', below). This assumes you are already logged into a command shell on the computer that hosts your local fork.
First things first: change your working directory to the top-level directory in the git repository. Git will find the repo from any directory inside it, but starting at the top keeps the paths in the commands that follow simple, and saves you from having to specify a bunch of location-designation parameters for git on the commandline.
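If you are somewhere deep inside the tree and not sure where the top is, git can tell you. This is a minimal sketch in a throwaway repository; the directory names are made up for illustration.

```shell
# Throwaway repository with a nested directory, so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/src/deep"
cd "$repo/src/deep"

# Ask git where the top of the working tree is, and go there.
cd "$(git rev-parse --show-toplevel)"

# The .git directory lives at the top level.
ls -d .git
```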
Now, the branch: if you aren't reading this section because you appreciate my mad literary skillz, you're probably reading it because you want to know how to create a branch; and the answer is really simple: you use the 'branch' subcommand of git as shown below:
git branch branch-name
This will create a new branch, containing the exact same source (initially) as the commit you currently have checked out. In a fresh local fork that is the tip of the 'master' branch: whatever was in the repo at the time you made the fork via a 'git clone url' command.
Simple, huh? Not so fast! Now you have to check out that branch to actually work with it. This is done with the following command:
git checkout branch-name
Executing this command makes the new branch the 'working' branch. Remember, git is a sort of object-database manager - it has a backstore in .git/ in the top-level directory in the repo, and inside it stores compressed representations of every commit on all your branches. This is not as heavy as it sounds - a branch is just a pointer to a commit, unchanged files are shared between commits, and git can pack similar objects together as 'deltas' to save space.
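The two steps can be tried safely in a throwaway repository. This is a sketch only: the branch names 'my-feature' and 'my-other-feature' and the example identity are placeholders, not anything from the real project. It also shows 'git checkout -b', which creates and checks out a branch in one go.

```shell
# Throwaway repository so the sketch cannot touch real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial commit"

# Create the branch, then switch to it (the two steps described above).
git branch my-feature
git checkout -q my-feature

# The same thing in a single step: -b creates the branch and checks it out.
git checkout -q -b my-other-feature

# Show which branch is now the working branch.
current=$(git rev-parse --abbrev-ref HEAD)
echo "$current"
```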
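Now we are good to make changes to our new branch. What? You already incorporated your patched/new files into the 'master' branch (like I did)? Not to worry - putting the master branch back in its original state is easy as pi:
git reset --hard hashnum
hashnum should be the hash value for the last commit in your repository at the time you cloned it - i.e., your local fork's original condition. The hash value can be obtained using the 'git log' command. (Type the bare hash, with no leading '#': the shell treats everything after a '#' as a comment.) Be aware that --hard also throws away uncommitted changes to the working files, so make sure your patched files are safe somewhere else first.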
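Here is a sketch of that recovery in a throwaway repository. The file name, commit messages and the 'rescued-work' branch are all invented for the example; parking the mistaken commit on a branch before resetting is an extra precaution, not something the recovery requires.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "original" > file.txt
git add file.txt
git commit -q -m "state as cloned"

# Remember the hash of the commit we want to return to.
good=$(git rev-parse HEAD)

# The mis-step: a commit made directly on the main branch.
echo "patched" > file.txt
git commit -q -am "oops, committed on the wrong branch"

# 'git log --oneline' lists the hashes; pick the one to go back to.
git log --oneline

# Optional safety net: keep the mistaken commit reachable on its own branch.
git branch rescued-work

# Move the current branch back. --hard also overwrites the working files.
git reset -q --hard "$good"
cat file.txt
```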
Having reset the repository to its initial condition (assuming you made a similar mis-step), you are now ready to start over at the beginning of this section and make a proper branch to work in. This implies something:
you should not be editing code directly in the repository - you should be editing only the 'payload' - a working copy of the code extracted from the repository.
For a deconstruction of the simple process that accomplishes this, see my last blog entry :)
Introducing new/modified source to the tree
If you arrived here by skipping ahead, you probably already know that you need to be in a shell changed into the top-level directory of your repository. If not, you probably should go back and read what you skipped, just for the sake of getting some background info.
Adding source to the tree was accomplished with a very simple process; I copied any files from my working source tree that had been created anew or modified into the branch. How this was accomplished is an area I will be exploring in detail in the near future - for this operation, however, it was very simple. I simply copied the changed/added files from the working source into the repository, after having changed branches. I have a couple of questions about this methodology - I'll address them in a later post, but I'll go ahead and ask them now: 1. are there other (better) ways to go about this? and 2. under what conditions might this turn out to fail?
Copying the changed/added files into the repo is not enough, however - we need to invoke git in a couple of ways to prepare/preview the changes to be committed. First, we need to add the files, as follows:
git add filespec
filespec is a pathspec that points at the new/changed files. New files will be added to the repo; files that already existed have their modifications staged for the next commit, and files that have not changed are silently left alone. Next comes the preview:
git status
This will display a list of new and modified files. You should review them for completeness prior to the next command, which will actually make the commit.
Making the commit
git commit
and that will wrap up the commit of the changesets to the local repo.
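The whole add / preview / commit cycle, sketched in a throwaway repository. The file names, branch name and commit message are placeholders, and the 'echo' lines stand in for copying files over from a working source tree.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "v1" > existing.txt
git add existing.txt
git commit -q -m "initial commit"
git checkout -q -b my-feature

# Simulate copying files in from a working source tree.
echo "v2" > existing.txt      # a file that already existed, now modified
echo "new" > added.txt        # a brand-new file

# Stage both the modification and the new file.
git add existing.txt added.txt

# Preview what the commit will contain (--short gives one line per file).
preview=$(git status --short)
echo "$preview"

# Record the staged changes; -m supplies the message without opening an editor.
git commit -q -m "Import changes from working tree"
```

Plain 'git commit' with no -m opens your editor for the commit message, which is what you get with the command as given above.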
Wow, was that easy or what!
GitHub.com: Pushing changes from your local fork to your GitHub.com account
GitHub is a relatively new and powerful website complementing git by providing git users a social metaphor for 'organic' development conducted on git repositories. It is feature-rich and targeted at both the opensource community and companies who employ opensource development methodologies.
You will need to obtain an account and log in there, and then make the proper arrangements for authentication. This is done using ssh/rsa keypairs. Full instructions for establishing an account and configuring it for use are available on their website at http://github.com. I highly recommend it!
Exporting your GitHub-hosted repository: The pull request
The need to use GitHub implies certain things about your work and your repository: you will need it if, for instance, your local fork is used to develop code for a remote project you don't have the authorization to 'push' code to, and you want to make that code available to the maintainers of that remote project for evaluation and potential incorporation into the primary project.
Don't get me wrong, this can be accomplished without the use of GitHub - GitHub just makes it easy to be very communal in your approach. This is a good thing (tm), as git tends to be best used in a very organically organized project, as most opensource projects are.
The way to get your changes on GitHub is excruciatingly simple: you push them. Your account at GitHub will provide two types of interfaces (in the form of urls) to a fork you place there: one for public (anonymous) use, and one for private use. The former requires no authentication whatsoever; the latter employs a set of rsa keys, exactly as you would employ in the establishment of a secure shell connection. See the GitHub link 'Help with Keys' on the account settings page for details for creating and installing these credentials on your particular operating system platform. Having done so, and assuming you've created a target repository on the site, the push is accomplished as follows:
git push url
where url is the url of the GitHub repository, e.g., git@github.com:JamesStallings/opensim.git
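A sketch of the push that runs without network access or keys: a local bare repository stands in for the one on GitHub. The remote name 'github' and the branch name 'my-feature' are assumptions for the example; with a real account the url would be of the git@github.com:USER/REPO.git form shown above.

```shell
# Stand-in for the repository created on GitHub: a local bare repository.
remote=$(mktemp -d)
git init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial commit"
git checkout -q -b my-feature

# Give the url a short name once, instead of retyping it on every push.
git remote add github "$remote"

# Push the named branch to the remote.
git push -q github my-feature

# Ask the remote which branches it now has.
pushed=$(git ls-remote --heads github my-feature)
echo "$pushed"
```

Naming the branch on the push line makes it explicit which branch is being published, which matters once the local fork holds more than one.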
Conclusion
If I have been clear at all, by now you should have a fair understanding of what I did to make this happen. It's not a perfect workflow, more of a description of my first attempt to establish one with git. I am happy to report that the operation seems to be a success. I'll continue asking and answering as many questions as I can about git and supporting technologies - if you have questions, use that comments button and I'll see if I can answer yours too.
In my next topic, a few days down the road, I hope to have some answers about the questions I asked wrt proper ways to import new code into the repository - and perhaps take a stab at a formal workflow tutorial.
Thanks for reading!
James
Wednesday, September 9, 2009
Thursday, September 3, 2009
Welcome back to me!
It has been a very long time since I made a post to this blog. The reasons are many and varied - among them, that I had little to post about that wasn't some sort of whining or bitching or other negativity. As most of the reasons for this state of affairs were more personal than related to OpenSim or OSGrid, I felt it better that I remain silent than reflect personal issues on either of these splendid efforts.
Indeed, I have been considering passing the blog off to someone else, in the interest of keeping the name in good, productive hands.
That said, I've changed my mind. I do have something to post about after all, that is positive, useful, and perhaps even potentially useful to others! YAY!
So, without further ado, I'll make this new post, and hopefully many more to come, on the topic of the Git source control management system.
First, a little bit about why this topic comes to light at this time.
Recently, much to the dismay of many users (and not a few developers), OpenSim transitioned from the Subversion SCM to the Git SCM. Without getting too deeply into history, Linus Torvalds, originator of the Linux Operating System project, wrote git in an effort to make a better source control system, flexible enough to accommodate the needs of the many thousands of people submitting patches to the Linux kernel, and to make his life (or others', as the case may be) easier when it comes to managing and merging these many patch submissions. As with just about everything Linus does, he pushed the tool to the limits of software technology, and produced entirely new functionalities not previously seen in any SCM, that specifically target organic software development processes within volunteer communities of open source developers.
Whew! good thing I didn't get too deep into history!
So, essentially, until it's a non-issue, have a look here each or every other day or so, for a 'Git Trick of the Day'. And with that, here's today's Git Trick of the Day, presented as a problem coupled with a solution to that problem (I'll try to stick to that format as I accumulate posts):
Problem:
The local git repository, obtained via the 'git clone' operation, contains much more than just the source tree for the project. It also contains a good number of git SCM artifacts, which represent the entire revision history of the project to the time of the clone. This makes it less than optimal as an environment for building and testing the code that resides in the project.
While this might seem an incredible handicap, it is in fact one of the chief sources of power that comes with git.
I don't want to get too far from the problem space I'm addressing here today, so I won't get into a discussion of those intricacies at this point - rather, I will provide a command for extraction of the source tree into a working directory apropos the build+test operation.
Without further ado, here's that command:
git tar-tree HEAD opensimstaging | (cd ~ && tar xf -)
Now, let's dissect that a bit. But first, a little warning is in order: Git is still very much in development, and change is the order of the day - you will need the latest stable revision to use this command (1.6.0.2 at the time of this writing). Further, I'm a Linux user primarily, and I have not yet vetted this procedure on Windows - though I will, in the near future, and in the not-too-distant future I will advise in a comment to this post as to how to accomplish this on that 'other' platform ;)
One last warning with respect to the changing nature of git: this command, even on this late stable revision of the code, is already deprecated! Don't let that concern you though - git is pretty smart, and will 're-write' the command for you to produce one that is more current. I'll add that to comments as well, once I've puzzled it out.
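For what it's worth, the more current command is, as far as I can tell, 'git archive'. The sketch below assumes that equivalence and builds a throwaway repository to try it on; the file names and the temporary target directory (standing in for ~) are invented for the example. Note the trailing slash on --prefix: without it the name is glued onto the front of each file name instead of becoming a directory.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
mkdir src
echo "int main(void) { return 0; }" > src/main.c
git add src
git commit -q -m "initial commit"

# Temporary target directory, standing in for ~ in the original command.
dest=$(mktemp -d)

# Write a tar stream of HEAD to stdout, with every path placed under
# opensimstaging/, and unpack it in the target directory.
git archive --format=tar --prefix=opensimstaging/ HEAD | (cd "$dest" && tar xf -)

# The unpacked tree holds the source only, with no .git directory.
ls -A "$dest/opensimstaging"
```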
Onward!
The git command above does the following things:
Firstly, it executes the git command:
git tar-tree HEAD opensimstaging
What this does is extract an archive-ready copy of the repository 'payload' from HEAD, the commit that is currently checked out. In a fresh clone that is the tip of the default branch, 'master', chronologically current as of the time the git clone command was used to instantiate a local 'fork' or working copy of the repository. It bases this tree in the relative directory 'opensimstaging'; that is, every path in the archive is prefixed with that name. You can feel free to replace 'opensimstaging' in this command with a directory name more appropriate to your local needs.
As the output being produced is in 'tar' (tape archive) format, this is the directory the tree will land in when the archive is unpacked. (Tar files are an associated topic, but a bit out of scope for this post.) Suffice it to say that the tar-format data is not written to a file at all, but to the special unix stream called 'stdout', or standard output. Unless it is redirected, standard output goes to the console - in other words, where things get printed.
This is where the rest of the command comes into play:
| (cd ~ && tar xf -)
Note the first character, the pipe '|' symbol. In unix parlance, this is the command's 'plumbing'. It literally ties the output from one command to the input of another, in this case a tar command.
Note the presence of the parentheses, the 'cd' command, and the double ampersand (&&). This construct has the following effects - the parentheses bind the commands together and run them in a subshell, a child copy of the shell, so the 'cd' takes effect only for the commands inside the parentheses and your own shell stays in the directory it started in. Think of it as a sort of aggregation of the commands within the parentheses, causing the shell to treat them as if they were a single command. The double ampersand joins the commands on either side of it such that the latter does not execute unless the former is successful. It's a sort of error-condition control. If the path requested in the 'cd' command does not exist, causing an error condition and failure of that command as a result, the subsequent 'tar' command will not be launched.
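Both effects are easy to see without involving git or tar at all. In this small sketch the path /no/such/directory is assumed not to exist, and the words 'unpacked' and 'skipped' are just labels for the demonstration.

```shell
start=$(pwd)

# Parentheses run the enclosed commands in a subshell, so the 'cd' inside
# them does not change the directory of the shell you typed the command in.
(cd / && pwd)
after=$(pwd)

# '&&' runs the right-hand command only if the left-hand one succeeded.
# The 'cd' below fails, so the first 'echo' never runs and the fallback does.
result=$( (cd /no/such/directory 2>/dev/null && echo "unpacked") || echo "skipped" )
echo "$result"
```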
Barring any trouble, the sequence is like this: the tar formatted output from the git command will be passed to the tar command joined on the line by the pipe, but only if the encapsulated cd command works. Tar then unpacks the tar archive on git's output stream into the directory specified. Nifty, huh! and the product produced in the target directory is without the git-related objects that make up the repository proper, leaving it 'clean' for purposes of building and testing.
A further benefit is that the 'git pull' command can be used on the original clone to update it, and the process described herein repeated to produce an up-to-date build+test environment, without the overhead of cloning the project a second time from 'origin'.
I've tried to be both brief and thorough here, and I hope I've not added to your confusion - if you have questions, feel free to ask them in comments, or catch me on FreeNode IRC on #osgrid - I'm Hiro_Protagonist both there and on OSGrid.
Thanks for reading, and I hope I have been able to help you feel more comfortable with git!
Cheers
James