Simone Bruno: git

Sunday, March 7, 2010

Simplicity is the ultimate form of sophistication..

After some years of CVS and SVN experience, it's good to have some relief from a painful world of branches and merges: GIT makes everything simpler. You can alter a repository's history by rebasing your commits, for example. With the "--onto" option it is easy to transplant a line of development from one branch to a completely different one. You can reset your repository with the reset command in three different ways (--hard, --soft and.. mixed!), depending on whether you need to move only the reference at the head of the current branch (--soft), or the git index as well (mixed, thus affecting the staged/cached content ready to be committed), or your working tree too (--hard). The extra layer represented by the cached (or staged; two different words for the same concept, as - more generally - there are always multiple ways to the same truth) content of the index is beautifully managed by these three options (but be careful not to confuse "reset" with "revert", the latter being the dark side, so to speak, of the exotic "cherry-pick" command). Life is much easier now that a symmetrical diff is supported with the intuitive notation "git diff commit1...commit2" (did you notice the THREE dots?). Merges are very easy to perform now: you just need to make sure that your working tree is in sync with your index before starting the job (it's not good to start a merge with a dirty working dir!), and run the merge command. Oh, beware of criss-cross merges, and choose your merge strategy carefully among the following: resolve, recursive, ours, subtree and the powerful octopus merge. And now when working with remote repositories, there's no more room for useless complications, since your local repository contains tracking branches which are mapped to remote branches in the original repository, and these tracking branches (in which you should never run commit or push commands, don't forget it!) are mapped to local development branches using simple and intuitive refspec configurations available in the .git/config file, which git will use whenever you issue a fetch, merge or push command. Anyway, the proliferation of branches and repositories will never add unnecessary complexity to the management of your git-version-controlled projects or your Continuous Integration environments, since it is a commonly adopted best practice in GIT projects to use a depot directory including an authoritative repository which all developers should clone/fetch/pull from and push to (don't call it master repository, or central repository: GIT is a DISTRIBUTED Version Control System!). GIT definitely reminds me of the beautiful formulation of Occam's Razor by Leonardo da Vinci: simplicity is the ultimate form of sophistication :-).
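For readers who lose track of the three reset modes, here is a minimal sketch in Python of which layers each mode touches. It is a toy model and not how git is implemented: the `Repo` class, the `reset` function and the snapshot labels are all hypothetical names invented for this illustration, and each layer is reduced to a plain string.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Repo:
    """Toy model (hypothetical): each layer is just a snapshot label."""
    branch_tip: str  # commit the current branch points at
    index: str       # snapshot staged in the index
    worktree: str    # snapshot present in the working tree


def reset(repo: Repo, target: str, mode: str = "mixed") -> Repo:
    """Return the repo state after resetting to `target` in the given mode."""
    if mode not in {"soft", "mixed", "hard"}:
        raise ValueError(f"unknown reset mode: {mode}")
    # Every mode moves the current branch to the target commit.
    repo = replace(repo, branch_tip=target)
    # mixed and hard also make the index match the target.
    if mode in {"mixed", "hard"}:
        repo = replace(repo, index=target)
    # Only hard overwrites the working tree as well.
    if mode == "hard":
        repo = replace(repo, worktree=target)
    return repo


before = Repo(branch_tip="C2", index="C2+staged", worktree="C2+edits")
for mode in ("soft", "mixed", "hard"):
    print(mode, reset(before, "C1", mode))
```

The point of the sketch is only the nesting: each mode does everything the previous one does, plus one more layer.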

Sunday, February 7, 2010

GIT security model

Git, SHA1 and security
Is the GIT security model dependent on the cryptographic security of the hashing algorithm (SHA1) used by git to generate IDs for GIT objects?
After last year's new progress in breaking the SHA1 algorithm, it is reasonable to look for an answer to this question before deciding to adopt GIT for your software project(s). This was the subject of an interesting discussion I recently had with some colleagues.
There's an interesting post by Linus Torvalds on the Cryptography Mailing List about this subject, dated 25 Apr. 2005.
Basically, it would be very difficult for an attacker who is able to generate a collision to do serious harm by corrupting a GIT object database, because the GIT security model is NOT based on the cryptographic security of the SHA1 hash, but on the fact that (in Linus' words) "git is distributed, which means that a developer should never actually use a public tree for his development".
And, of course, the possibility of corrupting all the existing repositories of all users involved in a project, without anybody noticing it, is quite remote.
The adoption of SHA1: a design flaw?
Even if we do not consider the adoption of SHA1 an issue from the point of view of security (i.e., we agree that the weakness of the SHA1 algorithm does not make life easier for attackers who want to compromise the integrity of a GIT archive), this could still be considered a design flaw, since the IDs of objects are not deterministically unique, but only probabilistically so. My opinion? The probability of an accidental collision between two files in a software project using SHA1 is so low that this will never be a concrete issue for GIT users (thanks to Luca Milanesio, Peter Moore and Stefano Galarraga for your input).
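To put a rough number on "so low", here is a small sketch in Python of the standard birthday-bound approximation. It assumes that SHA1 outputs behave like uniformly random 160-bit values, so it only covers accidental collisions and says nothing about deliberate attacks. The function name `collision_probability` and the figure of ten million objects are hypothetical choices made for this illustration.

```python
import math


def collision_probability(num_objects: int, hash_bits: int = 160) -> float:
    """Birthday-bound approximation of the chance that at least two of
    `num_objects` uniformly random hashes of `hash_bits` bits coincide:
    p ~= 1 - exp(-n(n-1)/2 / 2**bits)."""
    pairs = num_objects * (num_objects - 1) / 2
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


# Hypothetical repository holding ten million objects.
print(collision_probability(10_000_000))
```

Under that assumption the result is many orders of magnitude below anything a project would notice in practice, which is the intuition behind the opinion above.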

Sunday, January 31, 2010

Exploring GIT

I spent this Sunday afternoon reading Version Control with Git, by Jon Loeliger.
Git is the distributed version control system currently used for Linux Kernel development, conceived and developed under the protective wing of Linus Torvalds himself. The key word here is distributed: using GIT, there is no need for constant synchronization with a single, central repository, thus allowing a distributed model for software development. The book is quite interesting as it's different from most tutorials available on the web: the first chapters of the book describe the internal data structures GIT is based on (commits, trees, blobs and tags stored in the GIT 'Object Store'), and the 'staging' mechanism implemented via the GIT 'index'; the main git commands are then explained by referring systematically to these concepts, describing in detail what changes occur to the git object store and git index as different git commands are executed. The advantage of this approach is that it forces the reader to a deeper understanding of what happens behind the scenes when each command runs. Of course, you'll have to spend some hours understanding these concepts before diving into git commands, but I think it's worth spending some more hours initially to properly learn a version control technology rather than spending a lot of hours afterwards running commands without a full understanding of all the implications and consequences. After all, Linus Torvalds himself stated in the GIT mailing list that you can't grasp and fully appreciate the power of GIT without understanding the purpose of the GIT index, which in turn refers to the objects in the GIT Object Store. If you are using GIT and you are not familiar with these concepts.. you should spend some time studying them, and Version Control with Git is a good resource to have a look at.
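As a small taste of the Object Store, here is a sketch in Python of how the ID of a blob can be reproduced by hand: git hashes a short header ("blob", the content length in bytes, a NUL byte) followed by the file content. This assumes the classic SHA1 object format, and `git_blob_id` is a hypothetical helper name used only for this illustration, not part of any git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA1 object ID git assigns to a blob with this content."""
    # Header format: object type, a space, the size in bytes, then a NUL byte.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Should match what `echo 'hello world' | git hash-object --stdin` reports.
print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the ID depends only on the content, two files with identical bytes share a single blob, whatever their names or locations in the tree.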

Simone Bruno