Storing URIs in SQL Server 2008 using the HIERARCHYID datatype and LINQ-To-SQL

I’ve been playing around with SQL Server 2008 CTP, exploring the benefits of the new HIERARCHYID datatype, which has been designed to efficiently store depth-first tree structures in SQL Server 2008.

My requirements were to store URI paths such as ones you might find in URLs, whereby each path component is held as a separate record in a table. Each path component refers to a folder in file system, e.g.

URL: http://www.vamosa.com/index/information_solutions/technology_and_products/vamosa_content_migrator.htm
URI: path: /index/information_solutions/technology_and_products/vamosa_content_migrator.htm

So traditionally in the database I would end up with parent/child relationship as such

id 	label					parent_id
0	NULL					NULL		<--- root element
1	index					0
2	information_solutions			1
3	technology_and_products			2
4	vamosa_content_migration.htm		3

HIERARCHYID allows you to perform all sorts of fast hiearchical queries without having to wander up and down the parent/child foreign key relationship. Those are the benefits, however there’s a problem…

At the time of writing the .NET Framework 3.5’s System.Data.SqlTypes namespace does not provide any support for HIERARCHID datatypes, e.g. SqlHierarchId as a class is missing, meaning that if you want to work with the new type you’ll need hide it from your .NET code using stored procedures, and that’s the approach I’ve taken in this post.

The other thing to be aware of is that the HIERARCHYID column is not a foreign key to a record further up the tree. The HIERARCHYID column stores position of the current record within the tree, e.g.

id  	label					hierarchyid_field.ToString()
0	NULL					/		<--- root element
1	index					/1/
2	information_solutions			/1/1/
3	technology_and_products			/1/1/1/
4	vamosa_content_migrator.htm		/1/1/1/1/

The table I’ve created is called URI and is defined as follows:

1
2
3
4
5
6
7
CREATE TABLE dbo.URI
(
	Id uniqueidentifier NOT NULL,
	Label nvarchar(256) NULL,
	UriHID hierarchyid NOT NULL,           -- the magical new datatype
	CONSTRAINT [PK_URI] PRIMARY KEY
)

I want to be able call my stored proc as follows:

1
2
3
4
  MyLINQDataContext db = new MyLINQDataContext(); // a previously defined context via .dbml file
  System.Guid? uri_id=null;			  // variable to capture the URI.Id of the last node inserted
  db.InsertURI("/products/vcm/index.html", ref uri_id);	// call to stored proc
  Console.Out.Writeline( uri_id );		 // doing something useful with the return result

Stored procedures aren’t my strongest point by a long stretch so feel free to comment and improve the code, but here’s the InsertURI stored proc:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
CREATE Procedure InsertURI
	@uri nvarchar(4000),
	@uri_id uniqueidentifier OUTPUT
AS
	declare @root_uri_id uniqueidentifier
	declare @path_remainder nvarchar(4000)
	declare @pos_slash int
	declare @parent_uri_hid HIERARCHYID
 
	SET @parent_uri_hid = HIERARCHYID::GetRoot()
 
	-- ensure the root exists
	SELECT @root_uri_id = u.id FROM URI u WHERE u.label IS NULL
	IF @root_uri_id IS NULL
		INSERT INTO URI VALUES (NEWID(), NULL, @parent_uri_hid)
 
	SET @path_remainder = @uri
	SET @pos_slash = charindex( '/', @path_remainder )
	while (len(@path_remainder) > 0)
	begin
		declare @next_slash int
		declare @uri_label nvarchar(256)
		declare @current_uri_hid HIERARCHYID
		SET @next_slash = charindex( '/', @path_remainder, @pos_slash+1 )
 
		-- determine the next label in the sequence of depth-first node names
		IF @next_slash > 0
		begin
			SET @uri_label = SUBSTRING( @path_remainder, @pos_slash+1, @next_slash - @pos_slash - 1 )
			SET @path_remainder = substring( @path_remainder, @next_slash, len(@path_remainder) )
		end
		else
		begin
			SET @uri_label = SUBSTRING( @path_remainder, @pos_slash+1, len(@path_remainder) )
			SET @path_remainder = ''
		end
		SET @pos_slash = charindex( '/', @path_remainder )
		-- determine if current @uri_label exists as child of @parent_uri_hid
		SET @current_uri_hid = NULL
		SELECT @current_uri_hid = u.UriHID FROM URI u WHERE u.UriHID.GetAncestor(1) = @parent_uri_hid AND u.Label=@uri_label
		IF @current_uri_hid IS NULL
		begin
			-- the label doesn't exist as a child, hence create it
			-- first determine the new hierarchyid - new node will be last in row of siblings
			declare @last_child_uri_hid HIERARCHYID
			SELECT @last_child_uri_hid = MAX(UriHID) FROM URI u WHERE u.UriHID.GetAncestor(1) = @parent_uri_hid
			SET @current_uri_hid = @parent_uri_hid.GetDescendant(@last_child_uri_hid, NULL)
 
			SET @uri_id = NEWID()
			INSERT INTO uri VALUES (@uri_id, @uri_label, @current_uri_hid )
		end
		SET @parent_uri_hid = @current_uri_hid
	end

The stored proc takes the @uri string and chops it using the ‘/’-separator (orange highlighted), looping through each element in the path (red brown highlighted), first ‘index’ then ‘information_solutions’, etc. For each element it will check wether or not the element exists (red highlighted code), creating it if necessary (purple highlighted). The position of the new node will always be on the ‘far right’ of its siblings (green highlighted code).
 

 



Leave a Reply