I’ve been playing around with SQL Server 2008 CTP, exploring the benefits of the new HIERARCHYID datatype, which has been designed to efficiently store depth-first tree structures in SQL Server 2008.
My requirements were to store URI paths such as ones you might find in URLs, whereby each path component is held as a separate record in a table. Each path component refers to a folder in file system, e.g.
URL: http://www.vamosa.com/index/information_solutions/technology_and_products/vamosa_content_migrator.htm URI: path: /index/information_solutions/technology_and_products/vamosa_content_migrator.htm
So traditionally in the database I would end up with parent/child relationship as such
id label parent_id
0 NULL NULL <--- root element
1 index 0
2 information_solutions 1
3 technology_and_products 2
4 vamosa_content_migration.htm 3
HIERARCHYID allows you to perform all sorts of fast hiearchical queries without having to wander up and down the parent/child foreign key relationship. Those are the benefits, however there’s a problem…
At the time of writing the .NET Framework 3.5′s System.Data.SqlTypes namespace does not provide any support for HIERARCHID datatypes, e.g. SqlHierarchId as a class is missing, meaning that if you want to work with the new type you’ll need hide it from your .NET code using stored procedures, and that’s the approach I’ve taken in this post.
The other thing to be aware of is that the HIERARCHYID column is not a foreign key to a record further up the tree. The HIERARCHYID column stores position of the current record within the tree, e.g.
id label hierarchyid_field.ToString() 0 NULL / <--- root element 1 index /1/ 2 information_solutions /1/1/ 3 technology_and_products /1/1/1/ 4 vamosa_content_migrator.htm /1/1/1/1/
The table I’ve created is called URI and is defined as follows:
CREATE TABLE dbo.URI
(
Id uniqueidentifier NOT NULL,
Label nvarchar(256) NULL,
UriHID hierarchyid NOT NULL, -- the magical new datatype
CONSTRAINT [PK_URI] PRIMARY KEY
)
I want to be able call my stored proc as follows:
MyLINQDataContext db = new MyLINQDataContext(); // a previously defined context via .dbml file
System.Guid? uri_id=null; // variable to capture the URI.Id of the last node inserted
db.InsertURI("/products/vcm/index.html", ref uri_id); // call to stored proc
Console.Out.Writeline( uri_id ); // doing something useful with the return result
Stored procedures aren’t my strongest point by a long stretch so feel free to comment and improve the code, but here’s the InsertURI stored proc:
CREATE Procedure InsertURI
@uri nvarchar(4000),
@uri_id uniqueidentifier OUTPUT
AS
declare @root_uri_id uniqueidentifier
declare @path_remainder nvarchar(4000)
declare @pos_slash int
declare @parent_uri_hid HIERARCHYID
set @parent_uri_hid = HIERARCHYID::GetRoot()
-- ensure the root exists
select @root_uri_id = u.id from URI u where u.label is null
if @root_uri_id is null
insert into URI values (NEWID(), NULL, @parent_uri_hid)
set @path_remainder = @uri
set @pos_slash = charindex( '/', @path_remainder )
while (len(@path_remainder) > 0)
begin
declare @next_slash int
declare @uri_label nvarchar(256)
declare @current_uri_hid HIERARCHYID
set @next_slash = charindex( '/', @path_remainder, @pos_slash+1 )
-- determine the next label in the sequence of depth-first node names
if @next_slash > 0
begin
set @uri_label = SUBSTRING( @path_remainder, @pos_slash+1, @next_slash - @pos_slash - 1 )
set @path_remainder = substring( @path_remainder, @next_slash, len(@path_remainder) )
end
else
begin
set @uri_label = SUBSTRING( @path_remainder, @pos_slash+1, len(@path_remainder) )
set @path_remainder = ''
end
set @pos_slash = charindex( '/', @path_remainder )
-- determine if current @uri_label exists as child of @parent_uri_hid
set @current_uri_hid = NULL
select @current_uri_hid = u.UriHID from URI u where u.UriHID.GetAncestor(1) = @parent_uri_hid and u.Label=@uri_label
if @current_uri_hid is null
begin
-- the label doesn't exist as a child, hence create it
-- first determine the new hierarchyid - new node will be last in row of siblings
declare @last_child_uri_hid HIERARCHYID
SELECT @last_child_uri_hid = MAX(UriHID) FROM URI u WHERE u.UriHID.GetAncestor(1) = @parent_uri_hid
set @current_uri_hid = @parent_uri_hid.GetDescendant(@last_child_uri_hid, NULL)
set @uri_id = NEWID()
insert into uri values (@uri_id, @uri_label, @current_uri_hid )
end
set @parent_uri_hid = @current_uri_hid
end
The stored proc takes the @uri string and chops it using the ‘/’-separator (orange highlighted), looping through each element in the path (red brown highlighted), first ‘index’ then ‘information_solutions’, etc. For each element it will check wether or not the element exists (red highlighted code), creating it if necessary (purple highlighted). The position of the new node will always be on the ‘far right’ of its siblings (green highlighted code).