Sunday, February 7, 2010

Git security model

Git, SHA1 and security
Is the GIT security model dependent on the cryptographic security of the hashing algorithm (SHA1) used by git to generate id's for GIT objects?
After last year's progress in breaking the SHA-1 algorithm, it is reasonable to look for an answer to this question before deciding to adopt Git for your software project(s). This was the subject of an interesting discussion I recently had with some colleagues.
There's an interesting post by Linus Torvalds on the Cryptography Mailing List about this subject, dated 25 Apr. 2005.
Basically, it would be very difficult for an attacker to do serious harm by generating a collision in order to corrupt a Git object database, because the Git security model is NOT based on the cryptographic security of the SHA-1 hash, but on the fact that (in Linus' words) "git is distributed, which means that a developer should never actually use a public tree for his development".
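As a concrete aside (my own sketch, not taken from Linus' post): a Git object ID is simply the SHA-1 of a small header plus the object's content; for a blob the header is "blob <size>" followed by a NUL byte. A minimal Java illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class GitBlobId {
    // Compute the SHA-1 id Git assigns to a blob: sha1("blob <size>\0" + content)
    static String blobId(byte[] content) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.UTF_8));
        byte[] digest = sha1.digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Should match: echo "hello" | git hash-object --stdin
        System.out.println(blobId("hello\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```

An attacker wanting to silently replace an object would have to produce different content that hashes to the same ID, which is exactly the collision scenario discussed above.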
And, of course, the possibility of corrupting all the existing repositories of all users involved in a project, without anybody noticing, is quite remote.
The adoption of SHA-1: a design flaw?
Even if we do not consider the adoption of SHA-1 a security issue (i.e., we agree that the weakness of the SHA-1 algorithm does not make life easier for attackers who want to compromise the integrity of a Git archive), it could still be considered a design flaw, since object IDs are unique only probabilistically, not deterministically. My opinion? The probability of a collision between two files in a software project using SHA-1 is so low that this will never be a concrete issue for Git users (thanks to Luca Milanesio, Peter Moore and Stefano Galarraga for your input).
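To put a number on "so low", here is a back-of-the-envelope birthday-bound estimate (my own sketch; the ten-million-object repository size is a made-up figure for illustration):

```java
public class CollisionOdds {
    public static void main(String[] args) {
        double n = 1e7;                  // hypothetical: ten million objects in a repository
        double space = Math.pow(2, 160); // number of distinct SHA-1 values
        // Birthday-bound approximation: probability that at least two of n
        // random 160-bit hashes collide is roughly n*(n-1)/2 / 2^160
        double p = n * (n - 1) / 2 / space;
        System.out.println(p);
    }
}
```

Even with ten million objects, the estimated probability of an accidental collision is on the order of 10^-35, which is vanishingly small compared to, say, the probability of a disk corrupting your data.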

Thursday, February 4, 2010

Jdbc, TCP channels and other funny dudes...

My most affectionate readers already know that I recently worked on a Java tool that automates the migration of database structure and content between different database versions (you know: changes to tables and fields, new indexes, and other database refactoring operations): the tool is called DMT, and I have already written about it in the pages of this blog.
The tool is used by several development teams for all database re-creation, migration and test operations, and since I had not been receiving many complaints lately, I was pretty confident that it was now decently stable and that the planned tests in a production-like environment would run smoothly. Of course, this is not what happened: when DMT was used in a pre-production environment for testing purposes, the DBAs who ran the tests found that it could not complete the migration, hanging indefinitely.
This was the right occasion for me to learn what happens when a TCP channel is closed while a JDBC connection is active over it.

The problem
  1. DMT connects to the Oracle host and sends the SQL statements to be executed through a JDBC connection.
  2. The Oracle server receives the SQL statements and begins executing them. While the statements are running, there is no traffic between DMT and the Oracle server, because DMT simply waits for the server to signal that execution has completed (or that an Oracle exception has occurred).
  3. While the Oracle server is running the SQL statements and DMT is waiting for that signal, a firewall detects that the TCP channel between DMT and the Oracle server is inactive and closes the connection because of a timeout configuration.
  4. When Oracle completes the SQL execution, the TCP connection with DMT is no longer open, so the server cannot tell DMT that the job is done, and DMT keeps waiting for a message forever, like an unlucky man with an unrequited love for a woman.
The solution
The TCP Keep-alive mechanism!
It's possible to enable this mechanism on a JDBC connection simply by adding the parameter ENABLE=BROKEN to the JDBC string used to open the connection: keep-alive "probes" will then be sent over the connection after a period of inactivity, keeping it alive. The JDBC URL will look like the following one:

jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=broken)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=yourhost1)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=yourhost2)(PORT=1521))(LOAD_BALANCE=on)(FAILOVER=on))(CONNECT_DATA=(SERVER=dedicated)(SERVICE_NAME=service.yourcompany.com)(FAILOVER_MODE=(TYPE=session)(METHOD=basic)(RETRIES=10)(DELAY=3))))
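For completeness, here is a minimal sketch of how such a URL might be assembled and used from Java code. The host, port and service names are placeholders, and the actual getConnection call (commented out) requires the Oracle JDBC driver on the classpath:

```java
public class KeepAliveUrl {
    // Build a thin-driver connect descriptor with ENABLE=broken
    // (simplified to a single address; host/port/service are placeholders)
    static String buildUrl(String host, int port, String service) {
        return "jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=broken)"
             + "(ADDRESS=(PROTOCOL=TCP)(HOST=" + host + ")(PORT=" + port + "))"
             + "(CONNECT_DATA=(SERVICE_NAME=" + service + ")))";
    }

    public static void main(String[] args) {
        String url = buildUrl("yourhost1", 1521, "service.yourcompany.com");
        System.out.println(url);
        // With ojdbc on the classpath you would then open the connection as usual:
        // java.sql.Connection conn =
        //     java.sql.DriverManager.getConnection(url, "user", "password");
    }
}
```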

The TCP settings of the host OS determine when to start sending the keep-alive probes (the "keep alive time" parameter), how many unanswered probes it takes to declare the connection dead ("keep alive probes") and the interval between two consecutive probes ("keep alive interval").
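The same mechanism is exposed at the plain-socket level by the standard Java API. This little self-contained sketch (a loopback connection, no Oracle involved) just flips the keep-alive switch and reads it back; when and how often the probes are actually sent is still governed by the OS parameters above:

```java
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        // Local loopback connection, just to demonstrate the socket-level option
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort())) {
            client.setKeepAlive(true); // ask the OS to send keepalive probes on this connection
            System.out.println("keepAlive=" + client.getKeepAlive());
        }
    }
}
```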

Bonus: What does the classic "TCP/IP Illustrated" book say about keepalive?
Many newcomers to TCP/IP are surprised to learn that no data whatsoever flows across an idle TCP connection. That is, if neither process at the ends of a TCP connection is sending data to the other, nothing is exchanged between the two TCP modules. There is no polling, for example, as you might find with other networking protocols. This means we can start a client process that establishes a TCP connection with a server, and walk away for hours, days, weeks or months, and the connection remains up. Intermediate routers can crash and reboot, phone lines may go down and back up, but as long as neither host at the ends of the connection reboots, the connection remains established. [...]
There are times, however, when a server wants to know if the client's host has either crashed and is down, or crashed and rebooted. The keepalive timer, a feature of many implementations, provides this capability.
But be aware: 
Keepalives are not part of the TCP specification. The Host Requirements RFC provides three reasons not to use them: (1) they can cause perfectly good connections to be dropped during transient failures, (2) they consume unnecessary bandwidth, and (3) they cost money on an internet that charges by the packet. Nevertheless, many implementations provide the keepalive timer. 
Hope this helps!