I’ve been playing around with SQL Server 2008 CTP, exploring the benefits of the new HIERARCHYID datatype, which has been designed to efficiently store depth-first tree structures in SQL Server 2008.
My requirements were to store URI paths such as ones you might find in URLs, whereby each path component is held as a separate record in a table. Each path component refers to a folder in file system, e.g.
URL: http://www.vamosa.com/index/information_solutions/technology_and_products/vamosa_content_migrator.htm URI: path: /index/information_solutions/technology_and_products/vamosa_content_migrator.htm
So traditionally in the database I would end up with parent/child relationship as such
id label parent_id
0 NULL NULL <--- root element
1 index 0
2 information_solutions 1
3 technology_and_products 2
4 vamosa_content_migration.htm 3
HIERARCHYID allows you to perform all sorts of fast hiearchical queries without having to wander up and down the parent/child foreign key relationship. Those are the benefits, however there’s a problem…
At the time of writing the .NET Framework 3.5’s System.Data.SqlTypes namespace does not provide any support for HIERARCHID datatypes, e.g. SqlHierarchId as a class is missing, meaning that if you want to work with the new type you’ll need hide it from your .NET code using stored procedures, and that’s the approach I’ve taken in this post.
The other thing to be aware of is that the HIERARCHYID column is not a foreign key to a record further up the tree. The HIERARCHYID column stores position of the current record within the tree, e.g.
id label hierarchyid_field.ToString() 0 NULL / <--- root element 1 index /1/ 2 information_solutions /1/1/ 3 technology_and_products /1/1/1/ 4 vamosa_content_migrator.htm /1/1/1/1/
The table I’ve created is called URI and is defined as follows:
1 2 3 4 5 6 7 | CREATE TABLE dbo.URI ( Id uniqueidentifier NOT NULL, Label nvarchar(256) NULL, UriHID hierarchyid NOT NULL, -- the magical new datatype CONSTRAINT [PK_URI] PRIMARY KEY ) |
I want to be able call my stored proc as follows:
1 2 3 4 | MyLINQDataContext db = new MyLINQDataContext(); // a previously defined context via .dbml file System.Guid? uri_id=null; // variable to capture the URI.Id of the last node inserted db.InsertURI("/products/vcm/index.html", ref uri_id); // call to stored proc Console.Out.Writeline( uri_id ); // doing something useful with the return result |
Stored procedures aren’t my strongest point by a long stretch so feel free to comment and improve the code, but here’s the InsertURI stored proc:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | CREATE Procedure InsertURI @uri nvarchar(4000), @uri_id uniqueidentifier OUTPUT AS declare @root_uri_id uniqueidentifier declare @path_remainder nvarchar(4000) declare @pos_slash int declare @parent_uri_hid HIERARCHYID SET @parent_uri_hid = HIERARCHYID::GetRoot() -- ensure the root exists SELECT @root_uri_id = u.id FROM URI u WHERE u.label IS NULL IF @root_uri_id IS NULL INSERT INTO URI VALUES (NEWID(), NULL, @parent_uri_hid) SET @path_remainder = @uri SET @pos_slash = charindex( '/', @path_remainder ) while (len(@path_remainder) > 0) begin declare @next_slash int declare @uri_label nvarchar(256) declare @current_uri_hid HIERARCHYID SET @next_slash = charindex( '/', @path_remainder, @pos_slash+1 ) -- determine the next label in the sequence of depth-first node names IF @next_slash > 0 begin SET @uri_label = SUBSTRING( @path_remainder, @pos_slash+1, @next_slash - @pos_slash - 1 ) SET @path_remainder = substring( @path_remainder, @next_slash, len(@path_remainder) ) end else begin SET @uri_label = SUBSTRING( @path_remainder, @pos_slash+1, len(@path_remainder) ) SET @path_remainder = '' end SET @pos_slash = charindex( '/', @path_remainder ) -- determine if current @uri_label exists as child of @parent_uri_hid SET @current_uri_hid = NULL SELECT @current_uri_hid = u.UriHID FROM URI u WHERE u.UriHID.GetAncestor(1) = @parent_uri_hid AND u.Label=@uri_label IF @current_uri_hid IS NULL begin -- the label doesn't exist as a child, hence create it -- first determine the new hierarchyid - new node will be last in row of siblings declare @last_child_uri_hid HIERARCHYID SELECT @last_child_uri_hid = MAX(UriHID) FROM URI u WHERE u.UriHID.GetAncestor(1) = @parent_uri_hid SET @current_uri_hid = @parent_uri_hid.GetDescendant(@last_child_uri_hid, NULL) SET @uri_id = NEWID() INSERT INTO uri VALUES (@uri_id, @uri_label, @current_uri_hid ) end SET @parent_uri_hid = @current_uri_hid end |
The stored proc takes the @uri string and chops it using the ‘/’-separator (orange highlighted), looping through each element in the path (red brown highlighted), first ‘index’ then ‘information_solutions’, etc. For each element it will check wether or not the element exists (red highlighted code), creating it if necessary (purple highlighted). The position of the new node will always be on the ‘far right’ of its siblings (green highlighted code).