Rod Heyd wrote:
Hi Everyone, I've got a little debate that maybe you can all help me settle. I have three tables that I need to join, but there are performance problems with the joins due to some misunderstandings of my predecessor about what's needed to join the tables efficiently. Here's the setup. t1 has a unique key defined on it, call it command_id, t1 has a 1 to many relationship with t2. t2 has t1's command_id as a foreign key constraint, plus a set of one or more instrument_id's. Each command results in one or more instruments taking data. The commanding elements defined here are then set to our instrument to aquirethe data. When the data comes back it is organized into t3 by command_id and instrument_id. So the primary key on t3 looks like this:command_id_instrument_id.
Yuck!
So, now I need to write a query that looks at what was commanded in t1 and t2 and then look for any missing data that has not yet been received. So, I've got a query that looks something like this: SELECT stuff FROM t1 JOIN t2 ON t1.command_id = t2.command_id LEFT JOIN t3 ON t3.data_id = concat(t1.command_id,'_',t2.instrument_id) Now, I think everyone is going to see immediately that the join on t3 is going to have absolutely horrible performance, the join condition here can't take advantage of any indexes since the string function pretty much destroys any hope of that. To make matters worse, the left join is a total killer.
I think the LEFT JOIN isn't that big a deal, if the proper index is there and usable. You can easily find out by comparing the speed of your query below against the speed of the same query without "LEFT".
So my suggestion to solve the performance bottleneck is to add two columns to t3, command_id and instrument_id, and create a combined index on the command_id and instrument_id columns.
Yes, exactly. The combined index should be UNIQUE.
the join condition on t3 then becomes: LEFT JOIN t3 ON t3.command_id = t1.command_id AND t3.instrument_id = t2.instrument_id This works beautifully! The performance of this new join condition is about 480 times faster than the original.
Yes, this is the way this should be done.
Here's the rub. Since there is a unique data_id that already exists which combines the information in both command_id and instrument_id keys, I'm beingtold by our seasoned software developer that I am violating "classic" database design rules against redundant data.
It is true that you now have redundant data. Clearly, you do not need both the two new columns, command_id and instrument_id, and the old column, command_id_instrument_id. Redundant data is a violation of "classic" database design rules, so one or the other has to go if you want to follow the rules.
Which should go? Well, the old column, command_id_instrument_id, combines the answer to two questions, "Which command?" and "Which instrument?", into one column. That is also also a violation of the "classic" database design rules, and a really bad idea. It leads to precisely the sort of problem you are trying to fix. It also makes it difficult to find the rows in t3 which belong to a particular command, or to a particular instrument. Those queries require string matching, and the latter could not use an index. If you want to follow the rules, drop the old column.
In my opinion, this is a minor "violation" at best. We have a good reason for wanting to identify each data segment with our originally defined data_id, but this is not strictly a requirement on the database, it's more of an agreed upon convention that we are going to use to identify the data segments and distribute them to our consumers. From a database stand point, the only requirement is that the data_id be unique. It could be anything as far as the database is concerned, as long as the data_id remains unique, it doesn't matter that it may be overloaded with some "duplicate" information. Any more experienced DBA's than I have an opinion on this?
The strictly correct solution is to replace the old, broken column with your two new columns. You can either make the combined index on the two new columns the PRIMARY KEY, or you can make an AUTO_INCREMENT primary key and define the combination of the two columns as UNIQUE. It is then trivial to
SELECT CONCAT(command_id, '_', instrument_id) ... when you want to display the t3 id using the agreed upon convention.In short, your seasoned software developer is right to want to follow the rules, but the rules dictate replacing the old column with the two new columns.
Thanks!
Michael -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/suppressed
Mail converted by mhonarc 2.6.15
This archive provided courtesy of JSW4.NET, Internet Hosting Services for Small Business.