I spent some time playing around with different data modeling tools. MS Visio 2007 Professional does a nice job of reverse-engineering all database objects from SQL Server. Unfortunately, to forward-engineer any changes requires the Enterprise edition, which I can’t get at work: so I won’t be using Visio for data modeling. However, the small report it provides at the end of the reverse-engineering process provides some interesting and unexpected insights into my database:
Tables reverse engineered : 46
Check Clauses Reverse Engineered : 12
Primary Keys reverse engineered : 32
Foreign Keys reverse engineered : 25
Indexes reverse engineered : 7
Triggers reverse engineered : 4
Views Reverse Engineered : 3
Procedures reverse engineered : 147
The first thing I notice is that not all of my tables have primary keys. What’s going on?
Looking through the Visio diagram for tables without PK’s, most of them were used for previous data transformation processes and then left there “just in case” I might need them again soon after. These I can safely clean out; much like folders in a GTD system that are no longer needed. On the other hand, a small number are being used for “behind the scenes” application/data debugging and monitoring. These should have primary keys.
I’m surprised by the low number of indexes. That is something worth investigating soon.
I could probably make better use of triggers. There are only 4 – and each of them is very important to the system. For example, one prevents the deletion of any more than one record at a time from the inventory table if the command doesn’t come from our inventory management application. For example, in the early days of the project, a technical manager had admin access to the database but wasn’t very good at SQL and almost wiped out all inventory records – with no backups available! Another trigger logs every change (including a delete) in the inventory_audit table. This is a key part of our data quality procedures – that everything done to an asset is traceable.
I’m happy to see some use of check constraints, although I have a feeling that they are clustered in just 2 or 3 tables; this, too, is worth examining. Joe Celko points out that “Column constraints are also one of the most underused features of SQL, so you will look like a real wizard if you can master them.” (Celko, p.10) [1] – and I’ve got a few that are really pretty cool and go well beyond simple syntax.
Finally, and most surprising, is the almost 150 stored procedures and user defined functions. The most shocking aspect of this large number is that none of these objects are under version control. That is an important deficiency to fix soon. I’m currently searching for a good tool will easily link database objects with Subversion.
[1] Celko, Joe. SQL For Smarties: Advanced SQL Programming. 3rd Ed. New York: Morgan Kaufman Publishers. 2005