Sunday, December 12, 2010

Dom_12_Dic_2010_18:07:53_CET.log

Playing around with Python messaging, in particular RabbitMQ, Celery and pika, last night until 5 am! I should probably get a life.

Sunday, September 5, 2010

Python and Oracle: you need to start somewhere!

If you need to access an Oracle database using python (actually, to access any database), the first thing you'll need to read is the Python Database API Specification 2.0, which proposes a standard interface to be used by database access libraries. This specification is database agnostic, and various libraries exist which implement it (more or less strictly) for different DBMS.
The library I am using for Oracle access is cx_Oracle, which (almost!) conforms to this specification.
If the Python Database API Specification is too abstract for you and you wish the official cx_Oracle docs contained more examples, what you need is a nice tutorial to introduce you to the library. Then maybe another tutorial to go beyond the basics. After that, the Specification and the cx_Oracle docs will probably make more sense!
Oh, if you want to work with cx_Oracle, you will need to install it first :-). I found this howto by Catherine Devlin very useful. If you work with Ubuntu, you will love it for sure.
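For the impatient, here is what a minimal cx_Oracle session looks like. This is only a sketch: the host, port, SID, credentials and the query are placeholders, and you obviously need a reachable Oracle instance to run it.

```python
# A minimal cx_Oracle sketch -- host, port, SID, credentials and the query
# are placeholders; adapt them to your environment.
import cx_Oracle

# makedsn() builds the TNS connect descriptor for you
dsn = cx_Oracle.makedsn("dbhost.example.com", 1521, "orcl")
connection = cx_Oracle.connect("scott", "tiger", dsn)

cursor = connection.cursor()
# Bind variables (":maxrows") are passed separately, never interpolated
cursor.execute(
    "SELECT table_name FROM user_tables WHERE rownum <= :maxrows",
    {"maxrows": 5},
)
for (table_name,) in cursor:
    print(table_name)

cursor.close()
connection.close()
```

Note the bind-variable syntax: it is part of the DB API style cx_Oracle implements (`paramstyle` "named"), and it spares you both SQL injection and hard parsing on the Oracle side.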
Hope this helps!

PS
I assume that you have some familiarity with Python and Oracle.
If not, I bet this post is not the most interesting you've ever read!

Sunday, May 16, 2010

Model View Controller

Straight to the point: in an MVC architecture, do Model and View need to communicate directly, or should all communication be mediated by the Controller? There are two schools of thought on this topic, and the question often generates funny little religious wars.
The best answer to this question I have ever found is in the Cocoa Design Patterns document, available on the Apple dev center website (by the way, the document is recommended reading on design patterns in general, whatever language, platform or technology you use).

To sum up, the document describes both models: the "traditional model", in which there is direct communication between Model and View, and the "Cocoa model", in which the Controller always mediates between View and Model.

Here is the "Traditional" MVC model:

Here is the "Cocoa" MVC model:


One of the main arguments in defence of the second architecture is that it theoretically improves the reusability of View and Model, because they are totally decoupled.
Read Cocoa Design Patterns for a thorough discussion!
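To make the difference concrete, here is a tiny framework-free Python sketch of the "Cocoa" flavour, where View and Model never reference each other and the Controller mediates in both directions. All class and method names here are invented for illustration, not taken from any framework.

```python
# "Cocoa-style" MVC sketch: View and Model are totally decoupled;
# the Controller mediates in both directions.

class Model:
    def __init__(self):
        self._value = 0
        self._observers = []          # controllers subscribe here, never views

    def subscribe(self, callback):
        self._observers.append(callback)

    def set_value(self, value):
        self._value = value
        for callback in self._observers:
            callback(value)           # notifies the controller, not the view


class View:
    def __init__(self):
        self.displayed = None

    def render(self, text):           # knows nothing about the Model
        self.displayed = text


class Controller:
    def __init__(self, model, view):
        self.model = model
        self.view = view
        model.subscribe(self.model_changed)

    def model_changed(self, value):   # Model -> Controller -> View
        self.view.render(f"value is {value}")

    def user_typed(self, raw):        # View -> Controller -> Model
        self.model.set_value(int(raw))


model, view = Model(), View()
controller = Controller(model, view)
controller.user_typed("42")
print(view.displayed)                 # -> value is 42
```

Because neither Model nor View imports the other, each can be reused with a different counterpart just by writing a new Controller, which is exactly the reusability argument above.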

iPhone application startup: the complexity behind the scenes.

Let's write an iPhone application which displays a single "Hello World" label at the center of the display.
There are plenty of tutorials and books which explain how to do that: usually they show how to use a View-based project template in Xcode, add a label with Interface Builder, and run the application.
Very easy.
But have a look at the set of files generated by Xcode when you choose the View-based template: you will see a main method, an Application Delegate class, a ViewController class, a plist file, and two .xib files which define the main Window and the main View of the application.
Also, you will notice that the main .xib file is referenced in the plist file (as the "main nib"...), and that it is made up of five objects: a UIApplication, a First Responder, the Application Delegate, the View Controller and a Window object. Some references between objects (or, more precisely, "Outlets") are also defined: the UIApplication contains a reference to the Delegate object, which in turn contains references to both the ViewController and the main Window. Finally, the ViewController is the owner of the second .xib file, which defines the actual interface of our application and is connected to our ViewController class by the usual mechanism of Outlets and Actions.
Quite complicated for a simple HelloWorld application!

Fortunately, somebody (Bill Dudney) has already made the effort of putting all the pieces together and explaining very clearly how they fit: Demystifying iPhone App Startup
Really a great post, highly recommended!

Sunday, May 9, 2010

Did I write that code?!?

Preliminary warning: this post is not worth reading for you if you never make mistakes.


Illusory superiority is, according to Wikipedia's definition, "a cognitive bias that causes people to overestimate their positive qualities and abilities and to underestimate their negative qualities, relative to others".
It is that well-known, well-documented phenomenon (also known as the above-average effect) that makes it so easy for us (T)IT professionals to judge and criticize other people's work. It's a fact of life.
In other words, we naturally tend to have a distorted perception of reality when it comes to evaluating the quality and effectiveness of our own work compared to that of our colleagues.
It is extremely therapeutic, in order to come back to reality, to periodically try to spot major flaws in your own work: poor design or shitty code, completely wrong approach to problems, wrong organizational choices. Try to be as cruel as possible (which means: as cruel as you would be if you were analyzing other people's work).
The epiphany will occur when you find yourself wondering, confused: "did I really write this code?" (or "did I really choose this process? did I really propose this solution? did I really ignore this issue?", depending on what your actual work is).
Everybody makes mistakes, and experience allows us to improve by recognizing them; but this virtuous path requires humility, as written in almost all self-improvement books by spiritual gurus, genuinely enlightened guides, or simply people who love making money selling books about this kind of stuff (and they actually do sell them: smart guys, definitely above average).

Tuesday, April 13, 2010

iPhone Apps vs Mobile Web vs Android vs... The winner is...

... Apple!

Yesterday night I attended a Mobile Monday meeting in Düsseldorf about current trends in mobile development, namely native apps (iPhone, Android, ...) vs browser-based web applications.
The meeting did not make me change my mind about the best opportunities the market is offering these days to smart developers. I actually think there's a winner around: Apple and its mobile ecosystem (iPhone, iPod, iPad, ...).
Why? A few reasons here:
  1. Apple's iPhone apps and the App Store proved to be effective, changing the mobile standards and defining a brand new, profitable business model paradigm; the other main actors (including Google, I would say) are making their moves, of course, but they are all, still, behind
  2. Objective-C's popularity is quickly increasing, but there are still not as many good Objective-C developers on the market as there are web developers or Java developers. Have a look at the job offers and you'll quickly be convinced that learning Objective-C is not a waste of time from a career perspective.
  3. The market requires iPhone developers: loads of major companies want a branded iPhone application because it's cool, because it contributes positively to the brand, because the iPhone is de facto the "smart phone" par excellence. An example to make my point clear: http://www.tecnisa.com.br/lp-iphone.html. Tecnisa is one of the major Brazilian construction firms, and they have a branded iPhone app, beautifully advertised on their website. No Android application, did you notice?
  4. Have you ever browsed your huge collection of porn pictures on an iPad (come on, I know you DO have a huge collection of porn pictures, and you DO know who Peter North, Davide Jannello and Jenna Jameson are...)? The truth is: Apple devices are the coolest around (but I'm looking forward to seeing Casio mobile devices!!!).
Prepare to buy my applications on the App Store!

PS
Of course I know you can still make loads of money developing "traditional" web apps, or Android apps, or... whatever. Even developing Python scripts or Java command-line tools for DBAs or sysadmins. Probably even playing a calabash-made harp-lute. It all depends on three factors:
  1. How smart you are
  2. How business oriented you are
  3. How lucky you are!

Monday, March 8, 2010

Psychology (psychiatry?) of the True IT Professional

It's easy to spot the True IT Professional (TITP, capital T is on purpose) among a bunch of fake ones.

First of all, the TITP will never miss an occasion to talk about technology: you mention a language? The TITP already knows it, and in one of his previous professional experiences he has successfully used it to solve almost impossible problems. Or he has just written a book about it, or an article on an authoritative website, or there's a post about it in his blog. At the very least, he has attended a conference about it, or read something about it. The same applies to any framework, technology or method(ology) of software development you can possibly imagine, even if you try with the most abstruse buzzwords: the TITP already knows it, not in a superficial way but with a deep and thorough understanding, and can probably show you an entry in his Curriculum Vitae which documents his rock-solid expertise in the field.
Furthermore, he will very often express an opinion about it, which is in 90% of cases extremely negative, in 10% of cases extremely positive and enthusiastic, but never neutral: "Technology X? It's a mess, and leads to messy code", "Framework Y? Mmh, no doubt: Z is much better, and I can tell you because I worked for many years with this kind of frameworks", "Method(ology) H? Brilliant, the best thing I've ever tried in my professional life. But unfortunately I know many IT guys that are not skilled enough to understand this and keep on using obsolete and ineffective ways to blablabla...".
Expressing an opinion about a technology is a subtle, indirect way to convey to your audience the following message: "I know that!".

The art of indirect messages is a technique that every TITP effortlessly masters, and it is the foundation of the second basic TITP behavioral pattern: the "All other IT guys are professionally inferior to me" one. The TITP will never say "I am the best around". That would be arrogant. But, still, he would really like to say it, because he actually thinks so or, in some cases, simply likes to think so. Instead, he will adopt an indirect strategy and start painting a world full of mistakes committed by other developers and other working teams; he will never run short of criticisms directed at other IT guys, primarily about their technical skills, but not neglecting their "political" skills, their ability to work in a team, and even their lack of sense of humor. The subtle strategy behind this pattern is the following: if I can convey the idea that other people around are professionally less skilled and effective than me, de facto I am successfully conveying the idea that I am the best around (quod erat demonstrandum). Without looking arrogant. Clever.
Ten TITPs out of ten, talking with you, will imply with their words the message "I am the best around".
And you can bet they (we?) all are :-)!

Post Scriptum
A TITP has a blog, and he uses it to show technical proficiency, a brilliant sense of humor and some knowledge of Latin.

Sunday, March 7, 2010

Simplicity is the ultimate form of sophistication...

After some years of CVS and SVN experience, it's good to have some relief from a painful world of branches and merges: GIT makes everything simpler. You can alter a repository's history by rebasing your commits, for example. With the "onto" option it is easy to transplant a line of development from one branch to a completely different one. You can reset your repository with the reset command in three different ways (--hard, --soft and... mixed!), depending on whether you need to change your working tree, or only the git index (thus affecting the staged/cached content ready to be committed), or the reference to the HEAD of the current branch. The extra layer represented by the cached (or staged; two different words for the same concept, as - more generally - there are always multiple ways to the same truth) content of the index is beautifully managed by these three options (but be careful not to confuse "reset" with "revert", the latter being the dark side, so to speak, of the exotic "cherry-pick" command). Life is much easier now that a symmetric diff is supported with the intuitive notation "git diff commit1...commit2" (did you notice the THREE dots?). Merges are very easy to perform now: you just need to make sure that your working tree is in sync with your index before starting the job (it's not good to start a merge with a dirty working dir!), and run the merge command. Oh, beware of criss-cross merges, and choose your merging strategy carefully among the following: resolve, recursive, ours, subtree and the powerful octopus merge. And when working with remote repositories, there's no more room for useless complications, since your local repository contains tracking branches which are mapped to remote branches in the original repository, and these tracking branches (in which you should never run commit or push commands, don't forget it!) are mapped to local development branches using simple and intuitive refspec configurations available in the .git/config file, which will be used by git whenever you issue a fetch, merge or push command. Anyway, the proliferation of branches and repositories will never add unnecessary complexity to the management of your git-version-controlled projects or your Continuous Integration environments, since it is a commonly adopted best practice in GIT projects to use a depot directory including an authoritative repository which all developers should clone/fetch/pull from and push to (don't call it master repository, or central repository: GIT is a DISTRIBUTED Version Control System!). GIT definitely reminds me of Leonardo da Vinci's beautiful formulation of Occam's Razor: simplicity is the ultimate form of sophistication :-).

Sunday, February 7, 2010

GIT security model

Git, SHA1 and security
Is the GIT security model dependent on the cryptographic security of the hashing algorithm (SHA1) used by git to generate IDs for GIT objects?
After new progress last year in breaking the SHA1 algorithm, it is reasonable to try to answer this question before deciding to adopt GIT for your software project(s). This was the subject of an interesting discussion I recently had with some colleagues.
There's an interesting post by Linus Torvalds on the Cryptography Mailing List about this subject, dated 25 Apr. 2005.
Basically, it would be very difficult for an attacker, leveraging the possibility of generating a collision in order to corrupt a GIT object database, to do serious harm, because the GIT security model is NOT based on the cryptographic security of the SHA1 hash, but on the fact that (in Linus' words) "git is distributed, which means that a developer should never actually use a public tree for his development".
And, of course, the possibility of corrupting all the existing repositories of all users involved in a project, without anybody noticing it, is quite remote.
The adoption of SHA1: a design flaw?
Even if we do not consider the adoption of SHA1 an issue from the point of view of security (i.e., we agree that the weakness of the SHA1 algorithm does not make life easier for attackers who want to compromise the integrity of a GIT archive), this could still be considered a design flaw, since the IDs for objects are not deterministically unique, but only probabilistically so. My opinion? The probability of a collision between two files in a software project using SHA1 is so low that this will never be a concrete issue for GIT users (thanks to Luca Milanesio, Peter Moore and Stefano Galarraga for your input).
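If you want to put a number on "so low", the standard birthday-paradox approximation is enough: the probability of at least one accidental collision among n random 160-bit ids is roughly 1 - e^(-n(n-1)/2^161). A quick Python estimate (the one-billion-objects repository is an invented, deliberately extreme example):

```python
# Birthday-paradox estimate of an accidental SHA-1 collision among
# n uniformly random 160-bit ids: p ≈ 1 - exp(-n*(n-1) / 2**161).
import math

def collision_probability(n, bits=160):
    # -expm1(x) == 1 - exp(x), but stays accurate when p is tiny
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))

# Even an absurdly large repository with a billion objects:
p = collision_probability(10**9)
print(p)  # ≈ 3.4e-31 -- negligible for any real project
```

For the probability to reach even 39% you would need about 2**80 objects, which no software project will ever contain; this is why the "probabilistic uniqueness" objection stays theoretical.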

Thursday, February 4, 2010

Jdbc, TCP channels and other funny dudes...

My most affectionate readers already know that I recently worked on a Java tool that automates the migration of database structure and content between different database versions (you know, changes in tables, fields, new indexes, and other database refactoring operations): the tool is called DMT, and I have already spoken about it in the pages of this blog.
The tool is used by several development teams for all operations of database re-creation, migration and testing, and I had not been receiving many complaint emails these days, so I was pretty confident that it was now decently stable and that the planned tests in a production-like environment would run smoothly. Of course, this is not what happened: when DMT was used in a pre-production environment for testing purposes, the DBAs who ran the tests found that it was not able to run the migration, hanging indefinitely without completing the migration task.
This was the right occasion for me to learn what happens when a TCP channel is closed while a jdbc connection is active over it.

The problem
  1. DMT connects to the Oracle host and sends the SQL statements to be executed through a JDBC connection
  2. The Oracle server receives the SQL statements and begins the execution. While the SQL statements are running, there is no traffic between DMT and the Oracle server, because DMT is waiting for a signal from the Oracle server that the execution of the statement has completed (or that an Oracle exception has occurred)
  3. While the Oracle server is running the SQL statements and DMT is waiting for the signal that this process has completed, a firewall detects that the TCP channel between DMT and the Oracle server is inactive and closes the connection because of a timeout configuration.
  4. When Oracle completes the SQL execution, the TCP connection with DMT is no longer open, so it cannot tell DMT that the job is done, and DMT keeps waiting for a message forever, like an unlucky man with an unrequited love for a woman
The solution
The TCP Keep-alive mechanism!
It's possible to enable this mechanism in a JDBC connection simply by adding the parameter ENABLE=BROKEN to the JDBC string used to open the connection; keep-alive "probes" will then be sent over the connection after a period of inactivity, keeping the connection alive. The JDBC URL will look like the following one:

jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=broken)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=yourhost1)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=yourhost2)(PORT=1521))(LOAD_BALANCE=on)(FAILOVER=on))(CONNECT_DATA=(SERVER=dedicated)(SERVICE_NAME=service.yourcompany.com)(FAILOVER_MODE=(TYPE=session)(METHOD=basic)(RETRIES=10)(DELAY=3))))

The TCP settings of the host OS will be used to determine when to start sending the keep-alive probes (the "keep alive time" parameter), how many probes to send before declaring the connection closed (the "keep alive probes" parameter), and the interval between two consecutive probes (the "keep alive interval" parameter) (some further details here).
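For the curious, the same mechanism is visible at the plain socket level, and on Linux the three parameters above can even be overridden per socket. A Python sketch (the TCP_KEEP* constants are Linux-specific, and the values 60/10/5 are just example numbers, not recommendations):

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive; on Linux, also tune the three kernel parameters."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides are Linux-specific; elsewhere the socket simply
    # falls back to the system-wide TCP settings described above.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # "keep alive time"
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # "keep alive interval"
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)      # "keep alive probes"
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
flag = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(flag)  # non-zero when keepalive is enabled
```

With the Oracle thin driver you don't get to touch the socket directly, which is exactly why ENABLE=BROKEN in the connect descriptor is the convenient way to flip that same SO_KEEPALIVE switch.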

Bonus: What does the classic "TCP/IP Illustrated" book say about keepalive?
Many newcomers to TCP/IP are surprised to learn that no data whatsoever flows across an idle TCP connection. That is, if neither process at the ends of a TCP connection is sending data to the other, nothing is exchanged between the two TCP modules. There is no polling, for example, as you might find with other networking protocols. This means we can start a client process that establishes a TCP connection with a server, and walk away for hours, days, weeks or months, and the connection remains up. Intermediate routers can crash and reboot, phone lines may go down and back up, but as long as neither host at the ends of the connection reboots, the connection remains established. [...]
There are times, however, when a server wants to know if the client's host has either crashed and is down, or crashed and rebooted. The keepalive timer, a feature of many implementations, provides this capability.
But be aware: 
Keepalives are not part of the TCP specification. The Host Requirements RFC provides three reasons not to use them: (1) they can cause perfectly good connections to be dropped during transient failures, (2) they consume unnecessary bandwidth, and (3) they cost money on an internet that charges by the packet. Nevertheless, many implementations provide the keepalive timer. 
Hope this helps!

Sunday, January 31, 2010

Exploring GIT

I spent this Sunday afternoon reading Version Control with GIT, by Jon Loeliger.
Git is the distributed version control system currently used for Linux kernel development, conceived and developed under the protective wing of Linus Torvalds himself. The key word here is distributed: using GIT, there is no need for constant synchronization with a single, central repository, thus allowing a distributed model for software development. The book is quite interesting because it's different from most tutorials available on the web: the first chapters describe the internal data structures GIT is based on (commits, trees, blobs and tags stored in the GIT 'Object Store'), and the 'staging' mechanism implemented via the GIT 'index'; the main git commands are then explained by referring systematically to these concepts, describing in detail what changes occur to the git object store and git index as different git commands are executed. The advantage of this approach is that it forces the reader to a deeper understanding of what happens behind the scenes while running each command. Of course, you'll have to spend some hours understanding these concepts before diving into git commands, but I think it's worth spending some more hours initially to properly learn a version control technology than spending many hours afterwards running commands without a full understanding of all the implications and consequences. After all, Linus Torvalds himself stated on the GIT mailing list that you can't grasp and fully appreciate the power of GIT without understanding the purpose of the GIT index, which in turn refers to the objects in the GIT Object Store. If you are using GIT and you are not familiar with these concepts... you should spend some time studying them, and Version Control with GIT is a good resource to have a look at.
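A nice way to convince yourself that the Object Store really is that simple: a blob's id is just the SHA-1 of a short header plus the file content, which you can reproduce in a few lines of Python. The expected id in the comment is, as far as I know, the one `git hash-object` prints for the same input.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # Git stores a blob as b"blob <size>\0" + content,
    # and the object id is the SHA-1 of exactly that byte string.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Should match: echo 'test content' | git hash-object --stdin
print(git_blob_id(b"test content\n"))
```

Trees and commits work the same way (different header tag, content that references other object ids), which is why every git object id transitively fingerprints the whole history beneath it.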