|
ApexSQL News Product announcements, promotions, discussions, and other ApexSQL Tools news SQL Server 2005 Still has a Dependency Problem One of the new features the has been ported to the engines of pretty much all new 2005x versions of Apex SQL Tools is our new, proprietary dependency parser. One unpleasant surprise with SQL Server 2005 is that it STILL (after 5 years) fails to adequetly handle dependencies. So fixing this for SQL 2005 got tacked onto the end of our dev cycle. This conversation grew out of a request from our support guy, Chuck, to Kamen, our new architect, to better understand dependency parsing, how it works, why we need it and how it might help customers get more accurrate results. I figured it would be good grist for the blog, and I wanted to recycle the effort (especially after we hit 500 words!) One of the thorniest issues DB developers always face is handling object relationships. Keeping your data consistent, and the various bits well tied to each other is among the highest priorities of any relational database. Otherwise we’d probably be using a bunch of linear tables... And “Database Theory 101” would be a single-week course in college. Fortunately (or not) things are much more complex in the real world. Where there are relations between data items, there are also dependency issues. We all know about those – even in the simplest order taking database, you can’t really delete a customer record without also deleting their order records... And this is not always desirable. Logical interconnections in any non-trivial database can – and do – get very complex, and it is easy to make a change in database object X, and thus break Y and Z even though you didn’t touch them at all. So DBAs and developers everywhere have to deal with the problem. In a typical enterprise-level database, touching a column in a single table often ends up affecting fifteen other tables and views, not to mention stored procedures and report quieries. This is where dependency handling comes in. MSSQL is pretty good at handling dependencies - _when_ you issue a DDL or DML statement against it. It will always tell you – and reject the statement – if anything bad is going to happen. Trouble is, you don’t really care for it rejecting your query (or DDL update); you want it to just work, moving the schema from state A to point B while also preserving what data is already there. ApexSQL Diff is all about getting your schema, and/or your data, from state A to state B – and doing so quickly, efficiently and without significant hair loss. That’s why we decided to incorporate robust dependency handling in our complete product line (several other ApexSQL products also benefit from this). Most of you will be familiar with the sysdepends table – and with its shortcomings. It doesn’t carry any foreign key relationship information, any column-level information, and it has a few – how to put it – “known issues” in certain common situations. Many DBAs won’t touch sysdepends with a 5-foot pole when researching potential schema changes. Instead, they’ll generate a full database script and chase bits and pieces information with a their favorite editor’s search function. It’s not difficult to spot the huge potential for mistakes, and the loss of time and money doing this. SQL Server 2005 made some improvements to this process, but not nearly as much as real-world engineers really need. It can now show you object relationships bidirectionally, right from within the Management Console... But it still can’t process column-level information, and has weaknesses for PK / FK relationships, and schema changes. What it basically does under the covers is analyze your database objects at the moment you CREATE or ALTER them – but doesn’t keep up with this analysis when other objects changes. The net result is, whenever things in your database change, users are often back at square one – dependency information is outdated or incomplete. Here at ApexSQL we decided to invest in building SqlDep - our own dependency analysis engine for MSSQL databases. It’s been in the making for a long time, and it proved to be a particularly research-intensive and hi-tech process. The ultimate benefits, though, are substantial: the new wave of our products can now intelligently handle complex database structures better (and way faster) than a human, and can – and will – try to intelligently arrange things in situations that don’t have a 100% correct solution – like having circular dependencies in the object graph, commonly caused by two-way links between tables and views, stored procedures using each other, and others. Specifically for ApexSQL Diff, dependency parsing is used to determine the extra objects that need to be ‘maintained’ when producing a structure synchronization script for a given set of ‘input’ objects that have to be ‘equalized’ across two databases (or, very often, two ‘versions’ of the same database – like Dev and Production) There are no gimmicks or any ‘intermediate data’ that needs to be maintained for SqlDep to work (like using sysdepends). It’s done the way a competent DBA would do it: generate the creation script for every single structural object, inspect it, and deduce any parent/child links between the object and the rest of the database. This involves parsing (or “breaking up and understanding the meaning of”) every single valid Transact-SQL construct. To give you a feeling of the complexity of the grammar, consider this: the number of language constructs in it is several times larger than those in a full-blown Java compiler’s parsing subsystem. In other words, the process looks like this: · Read in the full metadata about a database, including all ‘creation scripts’ for all objects (not only source-level ones such as SPs and user functions, but also everything else) · Parse the T-SQL script for every separate object – just like MSSQL does when you feed it a database script. · Extract all ‘referenced’ objects from the parsed script (every object A has to mention object B somewhere in order to be dependent on B; there is only one exception to that rule and it’s MSSQL 2005’s WITH SCHEMABINDING reverse-order dependency) · Build a graph with all this information and decide in what order to create, drop or alter things, what to detach and reattach in the script (like bound defaults, rules, indexes and such), and how to unfold circular dependencies. The core piece of technology here is the T-SQL parser. T-SQL is an extremely complex superset of ANSI SQL; the only thing comparable in complexity in the database world is Oracle’s PL/SQL. Just ‘scanning’ the pieces of script that make up a MSSQL DB would not help us know which object actually references which. Thus, the parser has some good semantic knowledge on what every single 2000/2005 T-SQL construct and statement means. By knowing the meaning of things in the script, it can understand, for example, whether QZObject refers to a table or view, to an xml schema collection, whether it’s used as a SELECT statement source data, DELETE statement target, or an administrative (e.g. DBCC or ALTER) statement’s target or parameter. SqlDep can determine not only that SProcA depends on TableB and ViewC, but also which columns from B and C are used in A. Thus, it would be possible to drop a column from B and know whether this can possibly affect A or not (after all, A might use a subset of B’s columns). This is what we call ‘column-level dependencies’ – again, a feature not available with sysdepends, either in SQL Server 2000 or 2005. The parser can also be used to ‘rewrite’ object references, e.g. regenerate a SProc’s creation script while preserving all formatting, logic and sequence, but ‘replacing’ references to schemaA.objectB with references to schemaC.objectD – this gives us tremendous potential to move, rename and generally refactor things around in our upcoming products. SqlDep can build ‘execution trees’ for suitable object types – sprocs, user functions and triggers. This feature can be used, for example, in a SQL editor: select ‘view code tree’ or similar for a stored procedure, and the editor shows you a boxed tree with all nested statements that make up the procedure, while putting stuff in branches for IF/THEN/ELSE, FOR and so on. The parser is also a great foundation for all kinds of source analysis features to be built later on; while parsing T-SQL, it could build a list of all variable names used in a SProc, gather all locations where a server-side CURSOR is used, determine which table/view columns don’t seem to be used anywhere (except, of course, with non-MSSQL queries), and so on. The topmost ‘level’ of SqlDep deals with ‘object ordering’. In a nutshell, it deals with establishing the proper order in which individual DB objects should be created or altered when syncing the structures of two databases. This may look simple but is in fact highly non-trivial. Very often DB schemas will have circular dependencies (so the dependency chart of a database is actually a graph, not a tree). Circles have to be ‘unfolded’ somewhere when generating script, and there is no hard and fast rule on where to do this – it’s mostly empirical knowledge. So unlike some other dependency analysis tools, we can not only tell when something’s too convoluted, but can also make an intelligent guess at how to go about tying all the loose ends. Get your developers the tools they need to do the job fast and right - ApexSQL Develper Studio is the ultimate combat multiplier for SQL Developers. 7 Best of Class tools - one download, install and discounted price. Click Here for more info. Labels: Industry News |
© 2010 Apex SQL Tools All Rights Reserved | 1.919.968.8444 | Contact Us | Terms of Use | Privacy Policy
