Thursday, September 3, 2009

Welcome back to me!

It has been a very long time since I made a post to this blog. The reasons are many and varied - among them, that I had little to post about that wasn't some sort of whining or bitching or other negativity. As most of the reasons for this state of affairs were more personal than related to OpenSim or OSGrid, I felt it better that I remain silent than reflect personal issues on either of these splendid efforts.

Indeed, I have been considering passing the blog off to someone else, in the interest of keeping the name in good, productive hands.

That said, I've changed my mind. I do have something to post about after all, something positive, useful, and potentially even useful to others! YAY!

So, without further ado, I'll make this new post, and hopefully many more to come, on the topic of the Git source control management system.

First, a little bit about why this topic comes to light at this time.

Recently, much to the dismay of many users (and not a few developers), OpenSim transitioned from the Subversion SCM to the Git SCM. Without getting too deeply into history, Linus Torvalds, originator of the Linux kernel, wrote Git in an effort to make a better source control system, flexible enough to accommodate the needs of the many thousands of people submitting patches to the Linux kernel, and to make his life (and the lives of others, as the case may be) easier when managing and merging those many patch submissions. As with just about everything Linus does, he pushed the tool hard, and the result specifically targets the organic software development processes found within volunteer communities of open source developers.

Whew! Good thing I didn't get too deep into history!

So, essentially, until it's a non-issue, have a look here every day or two for a 'Git Trick of the Day'. And with that, here's today's Git Trick of the Day, presented as a problem coupled with a solution to that problem (I'll try to stick to that format as I accumulate posts):


The local git repository, obtained via the 'git clone' operation, contains much more than just the source tree for the project. It also contains a good number of git SCM artifacts, which represent the entire revision history of the project to the time of the clone. This makes it less than optimal as an environment for building and testing the code that resides in the project.

While this might seem an incredible handicap, it is in fact one of the chief sources of power that comes with git.

I don't want to get too far from the problem space I'm addressing here today, so I won't get into a discussion of those intricacies at this point - rather, I will provide a command for extracting the source tree into a working directory suited to building and testing.

Without further ado, here's that command:
git tar-tree HEAD opensimstaging | (cd ~ && tar xf -)

Now, let's dissect that a bit. But first, a little warning is in order: Git is still very much in development, and change is the order of the day - you will need the latest stable revision (as of this writing) to use this command. Further, I'm primarily a Linux user, and I have not yet vetted this procedure on Windows - though I will in the near future, and I will then advise in a comment to this post as to how to accomplish this on that 'other' platform ;)

One last warning with respect to the changing nature of git: this command, even on this late stable revision of the code, is already deprecated! Don't let that concern you though - git is pretty smart, and will 're-write' the command for you to produce one that is more current. I'll add that to comments as well, once I've puzzled it out.


The git command above does the following things:

Firstly, it executes the git command:
git tar-tree HEAD opensimstaging

What this does is produce an archive-ready copy of the repository 'payload' as of HEAD. HEAD is the commit currently checked out in your local repository; in a fresh clone, that is normally the tip of the default branch, 'master', as it stood when the git clone command was run. The tree is rooted in the relative directory 'opensimstaging', so every path in the archive begins with 'opensimstaging/'. Feel free to replace 'opensimstaging' in this command with a directory name more appropriate to your local needs.

Because the output is in 'tar' (tape archive) format, 'opensimstaging' is the directory the tree will land in when the archive is unpacked. (Tar files are an associated topic, but a bit out of scope for this post.) Suffice it to say that git does not write this archive to a file on disk; it writes it to a special unix stream called 'stdout', or standard output. By default stdout is connected to the console - in other words, where things get printed.
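To see that stream without unpacking anything, you can pipe it into tar's listing mode instead. This is only a sketch under a few assumptions: it uses git archive (the newer spelling of the command, given in the comments below), it assumes git and tar are installed, and the scratch repository, its README file, and the 'demo' identity are all made up for the illustration.

```shell
#!/bin/sh
# Sketch: build a throwaway repository so the example touches no real work.
set -e
demo=$(mktemp -d)                 # hypothetical scratch location
cd "$demo"
git init -q repo
cd repo
echo 'hello' > README             # hypothetical file, just to have content
git add README
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'first commit'

# Send the tar stream to stdout and have tar list (t) it rather than
# extract (x) it; 'f -' tells tar to read the archive from stdin.
listing=$(git archive --format=tar --prefix=opensimstaging/ HEAD | tar tf -)
echo "$listing"
```

Every path in the listing starts with the prefix, and nothing from the .git directory appears in it.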

This is where the rest of the command comes into play:
| (cd ~ && tar xf -)

Note the first character, the pipe '|' symbol. In unix parlance, this is the command's 'plumbing'. It literally ties the output from one command to the input of another, in this case a tar command.

Note the presence of the parentheses, the 'cd' command, and the double ampersand (&&). This construct has the following effects. The parentheses group the commands within them and run them together in a subshell, so the shell treats them as a single command on the receiving end of the pipe, and the 'cd' affects only that subshell, not the shell you typed the command into. The double ampersand joins the commands on either side of it such that the latter does not execute unless the former succeeds. It's a sort of error-condition control: if the path requested in the 'cd' command does not exist, that command fails, and the subsequent 'tar' command is not launched.
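Both behaviours are easy to try on their own, with no git involved. A minimal sketch follows; the path /no/such/dir is hypothetical and assumed not to exist, and the echo stands in for the tar command.

```shell
#!/bin/sh
# Parentheses start a subshell, so the cd inside them is temporary.
start=$(pwd)
(cd / && pwd)                     # prints / from inside the subshell
after=$(pwd)                      # the outer shell has not moved

# && runs the right-hand command only when the left-hand one succeeds.
# /no/such/dir is a hypothetical path assumed not to exist.
result=$( (cd /no/such/dir 2>/dev/null && echo 'tar would run') || echo 'tar skipped' )
echo "$result"                    # prints: tar skipped
```

This is why a mistyped target directory leaves you with nothing unpacked, rather than a tree unpacked into wherever you happened to be standing.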

Barring any trouble, the sequence is like this: the tar-formatted output from the git command is passed through the pipe to the tar command, which runs only if the cd command alongside it works. Tar then unpacks the archive from git's output stream (the 'x' means extract, and 'f -' tells tar to read the archive from its standard input) into the directory specified. Nifty, huh! And the product in the target directory is free of the git-related objects that make up the repository proper, leaving it 'clean' for purposes of building and testing.

A further benefit is that the 'git pull' command can be used on the original clone to update it, and the process described herein repeated to produce an up-to-date build+test environment, without the overhead of cloning the project a second time from 'origin'.
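Here is a rough sketch of that update-and-re-export cycle, run end to end against a throwaway 'origin' so it is safe to try. It assumes git and tar are installed and uses the git archive form of the command from the comments below; the scratch directory, the version.txt file, the 'demo' identity, and the export_tree helper are all made up for the illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)                 # hypothetical scratch area
ident='-c user.name=demo -c user.email=demo@example.com'

# A stand-in for the real 'origin'.
git init -q "$work/origin"
cd "$work/origin"
echo 'v1' > version.txt
git add version.txt
git $ident commit -q -m 'v1'

# The one and only clone.
git clone -q "$work/origin" "$work/clone"

# export_tree (hypothetical helper): unpack the clone's HEAD into
# $work/opensimstaging, replacing any earlier export.
export_tree() {
    rm -rf "$work/opensimstaging"
    (cd "$work/clone" && git archive --format=tar --prefix=opensimstaging/ HEAD) |
        (cd "$work" && tar xf -)
}

export_tree
first=$(cat "$work/opensimstaging/version.txt")

# New work lands upstream...
cd "$work/origin"
echo 'v2' > version.txt
git $ident commit -q -a -m 'v2'

# ...so update the existing clone and export again; no second clone needed.
(cd "$work/clone" && git pull -q)
export_tree
second=$(cat "$work/opensimstaging/version.txt")
echo "$first -> $second"
```

Removing the old export before unpacking is a design choice in this sketch: it keeps files deleted upstream from lingering in the build directory.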

I've tried to be both brief and thorough here, and I hope I've not added to your confusion - if you have questions, feel free to ask them in comments, or catch me on FreeNode IRC on #osgrid - I'm Hiro_Protagonist both there and on OSGrid.

Thanks for reading, and I hope I have been able to help you feel more comfortable with git!



coyled said...

The other, newer command you were looking for is git-archive. So your example would be:

git archive --format=tar --prefix=opensimstaging/ HEAD | (cd ~ && tar xf -)

BTW, I think publishing a git tip of the day is a great idea.


James Stallings II said...

Thanks Coyled :)

Watch for a post today that's a real powerhouse of workflow deconstruction using Git in conjunction with a GitHub account - which wouldn't have been possible without ya!

Thanks for the comment and the assistance Dave!