Don’t you hate those days when you think you’ve seen and heard it all only to find out that there is one more thing that is going to make you want to poke your eyes out? Well, today I went past poking my eyes out as the only remedy. Now that I’m thinking about this again, I’m regretting having a scotch at the desk with me. I’m seriously considering trying to wash away the image burned into my retinas by dousing my eyes with said scotch. If that doesn’t work I might have to carve out my eyes and dig around in the empty sockets with a rusty ice pick trying to destroy all the optical nerves. Anyways, enough drama.
Surrogate keys. I’ve worked in a number of databases that have these and in those experiences I’ve used a number of different schemes to generate unique values. One system used an Identity field, another used GUIDs and still another used a managing table that incremented the value every time you requested a new one from it. As bad as that last one sounds, it works fairly well for a system with a small number of concurrent transactions being processed. I probably wouldn’t use the last one again simply because of the scalability issues (I can always dream that my systems will become big one day), but, in all it’s ugliness, it will work. Today I saw uglier.
So this afternoon I’m pawing through a bunch of stored procedures (apparently our timestamps aren’t really timestamps…another story for another day) and I stumble on this block of code where we’re generating a new surrogate key value. After my initial and immediate shock, I had to start asking other people on the team what they knew as being our standard way to generate these values. The first guy I ask confirms what I’ve just seen on the screen. The second guy says “Well…..” to which I respond “Please don’t tell me it’s what I think it is.” and receive a laugh and nod of confirmation. Believing that the first guy burned out all his cognitive capabilities on beer at our last team pub night and the second guy spent too many years exposed to the searing Caribbean sun, I went off to developer number 3 in search of the answer I so dearly wanted to hear. Once again the code on the screen was confirmed to be correct.
It’s at this point that I thought that I had two things left to do. First I needed to submit my inaugural code snippet to The Daily WTF and then I needed to buy a new bottle of scotch (the second part of this has become a code related theme this week). After some second thoughts I decided who really reads The Daily WTF anyways? If you want perversion you go this Edmonton blog and if you want curious information technology you come to the Igloo. So I decided to post it here (there also was the fact I hadn’t posted for quite a while too, but honestly that was down on the list).
I introduce you to my pseudo code version of the standard way that we are told to get surrogate keys.
Do some stuff…
Restart Loop:
Loop
Get Random Number and multiply by 1 billion
Insert to database with value above as the surrogate key
On error go to Restart Loop
End Loop
Carry on doing more stuff…
Yes folks, we are generating a random number and testing it’s uniqueness by inserting to the database and responding to key violations. This would be great if you had a hundred billion available key values and only expected a few thousand entries in the table. Unfortunately though, the more and more entries that are successfully added to the table, the more and more probably it becomes that you will encounter a collision and thus an error. There’s also the fact that more collisions will slow the system down.
The scary thing to me is that this is the enterprise standard for generating surrogate keys and we’re not dealing with a tiny data set in this company. I don’t know how close to being truly random the DB2 rand command is, but even if it is great no computer generated random number generator is truly random. Maybe I’ll be lucky and this won’t become a problem during my tenure. Chances are it will though, that’s just the way it works for me.