<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>About Sql Server</title>
	<link>http://aboutsqlserver.com</link>
	<description>Database design and development with Microsoft Sql Server</description>
	<pubDate>Wed, 22 Feb 2012 22:59:52 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.1</generator>
	<language>en</language>
			<item>
		<title>Store Custom Fields/Attributes in Microsoft SQL Server Database (Part 2 - Name/Value pairs)</title>
		<link>http://aboutsqlserver.com/2012/02/22/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-2-namevalue-pairs/</link>
		<comments>http://aboutsqlserver.com/2012/02/22/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-2-namevalue-pairs/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 22:58:33 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[SQL Server 2005]]></category>

		<category><![CDATA[SQL Server 2008]]></category>

		<category><![CDATA[T-SQL]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2012/02/22/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-2-namevalue-pairs/</guid>
		<description><![CDATA[Last time we discussed 2 design patterns that can be used when you store custom attributes in a SQL Server database. Today I&#8217;d like to talk about another pattern known as Name/Value pairs, sometimes called Entity-Attribute-Value.
This pattern is very old and well known. Something like that (click on the image to open it in the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://aboutsqlserver.com/2012/02/01/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-1/" target="_blank">Last time</a> we discussed 2 design patterns that can be used when you store custom attributes in a SQL Server database. Today I&#8217;d like to talk about another pattern known as Name/Value pairs, sometimes called Entity-Attribute-Value.</p>
<p>This pattern is very old and well known. It looks something like this (click on the image to open it in a new window):</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic0.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic0.png" height="418" width="368" /></a></p>
<p>I&#8217;m pretty sure that 75% of developers have tried to use it in one way or another. I&#8217;ve seen quite a few different implementations. I even saw one where the entire database consisted of just 2 tables: Objects, with 2 columns (ID and ObjectType), and Attributes, similar to what we saw above except that the value was a string (it was prior to the sql_variant days). And the system even worked - kind of, in development and QA. The funniest thing is that the system had even been sold, and the first customer was a wholesale company that replaced its existing point-of-sale system with it. Poor customers and happy consultants who were hired to &#8220;solve&#8221; the problem.. <img src='http://aboutsqlserver.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>There is one killing factor though - you cannot store more than 8000 bytes in sql_variant, so (max) data types cannot be supported. If this is not a deal breaker, the design looks very flexible (and in fact it is). The general problem here is the attribute access cost. The classic approach produces 1 join per attribute. Something like this:</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic1.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic1.png" height="271" width="412" /></a></p>
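<p>In text form, a join-per-attribute query could look roughly like the sketch below. The table and column names (Articles, ArticleAttributes, AttrName, Value) and the three attributes are assumptions made for illustration; they are not taken from the screenshot.</p>
<pre><code>
-- Hypothetical schema, for illustration only:
--   dbo.Articles(ArticleId, ArticleName)
--   dbo.ArticleAttributes(ArticleId, AttrName, [Value] sql_variant)
select
    a.ArticleId,
    a.ArticleName,
    sz.[Value] as [Size],
    w.[Value] as [Weight],
    c.[Value] as [Color]
from dbo.Articles a
    -- one extra join for every attribute we want to return
    left outer join dbo.ArticleAttributes sz on
        sz.ArticleId = a.ArticleId and sz.AttrName = 'Size'
    left outer join dbo.ArticleAttributes w on
        w.ArticleId = a.ArticleId and w.AttrName = 'Weight'
    left outer join dbo.ArticleAttributes c on
        c.ArticleId = a.ArticleId and c.AttrName = 'Color';
</code></pre>
<p>Every new attribute adds one more copy of the join, which is where both the access cost and the copy/paste mistakes come from.</p>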
<p>It could be an inner join or an outer join, depending on the situation and design. But besides a lot of joins there is another problem. Developers are lazy. Every time they need to write a statement like that, they use cut and paste (see the nice red underline above). And you can imagine the number of errors it could introduce.</p>
<p>Of course, when we talk about a client application, we can select all attributes to the client as a rowset and pivot (remember this magic word) the data there:</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic2.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic2.png" height="115" width="261" /></a></p>
<p>Unfortunately that would not solve the problem when we need to sort/page/filter by the attributes nor, more importantly, help us with reports. And customers demand reports.</p>
<p>I&#8217;m not going to analyze the join-based approach against the criteria we specified. Instead, I&#8217;ll show you how the attribute access cost kills an implementation based on joins. But there is another way. With SQL Server 2005 and above, you can use PIVOT, which is part of T-SQL. So let&#8217;s take a look. First, let&#8217;s create the Articles and ArticleAttributes tables:</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic3.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic3.png" height="947" width="452" /></a></p>
<p>As you probably noticed, I replaced the attribute name with an index. This adds a little more complexity to the code, but at the same time it saves storage space, and we are going to store a lot of records in that table. It&#8217;s the usual &#8220;It depends&#8221; question - is the additional complexity worth it? It is also a very good idea to keep some kind of &#8220;Metadata&#8221; table that stores information about the attributes and their types. This is essential if you store attribute indexes, but it helps even if you store attribute names.</p>
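<p>For readers who cannot open the image, here is a minimal sketch of such a schema, including one possible shape of the metadata table. The column names, data types and constraint names are assumptions (only Articles, ArticleAttributes, AttrIndex and Value appear in the text), so treat it as an illustration rather than the exact script from the download.</p>
<pre><code>
create table dbo.Articles
(
    ArticleId int not null identity(1,1),
    ArticleName nvarchar(64) not null,
    constraint PK_Articles primary key clustered(ArticleId)
);

-- Hypothetical metadata table: maps each attribute index to a name and a type
create table dbo.AttributesMetadata
(
    AttrIndex smallint not null,
    AttrName nvarchar(64) not null,
    AttrType sysname not null,
    constraint PK_AttributesMetadata primary key clustered(AttrIndex)
);

-- One row per (article, attribute); the value is stored as sql_variant
create table dbo.ArticleAttributes
(
    ArticleId int not null,
    AttrIndex smallint not null,
    [Value] sql_variant not null,
    constraint PK_ArticleAttributes
        primary key clustered(ArticleId, AttrIndex)
);
</code></pre>
<p>Clustering on (ArticleId, AttrIndex) keeps all attributes of an article next to each other, which suits queries that read every attribute of a given article.</p>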
<p>Now let&#8217;s populate it with data:</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic3a.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic3a.png" height="864" width="597" /></a></p>
<p>Let&#8217;s enable IO statistics and the execution plan and see how it behaves when we need to access the data. First, the classic approach with joins:</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic4.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic4.png" height="946" width="513" /></a></p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic5.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic5.png" height="66" width="1007" /></a></p>
<p>As you can see, this produces a plan with a lot of joins and quite a lot of IO. Now let&#8217;s reshape the query to use PIVOT.</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic6.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic6.png" height="783" width="579" /></a></p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic7.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic7.png" height="153" width="457" /></a></p>
<p>As you can see, that&#8217;s far, far better. You can play with the shape of the query if you want to change the execution plan - for example, the approach below gives you a nested loop instead of a merge join.</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic8.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic8.png" height="719" width="599" /></a></p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic9.png"><img src="http://dwkor.net/blog/2012-02-16/pic9.png" height="145" width="437" /></a></p>
<p>As you can see, the difference in IO is dramatic.</p>
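<p>As a rough text version of the PIVOT idea, the query could be sketched as follows. It assumes the hypothetical Articles / ArticleAttributes tables with an AttrIndex column, three attributes numbered 1 to 3, and max() as the aggregate (each article has at most one value per attribute, so the aggregate only satisfies the PIVOT syntax). The exact query shapes in the screenshots may differ.</p>
<pre><code>
select
    a.ArticleId,
    a.ArticleName,
    p.[1] as Attr1,
    p.[2] as Attr2,
    p.[3] as Attr3
from dbo.Articles a
    join
    (
        -- turn attribute rows into columns: one output row per ArticleId
        select ArticleId, [1], [2], [3]
        from
            (
                select ArticleId, AttrIndex, [Value]
                from dbo.ArticleAttributes
            ) src
            pivot
            (
                max([Value]) for AttrIndex in ([1], [2], [3])
            ) pvt
    ) p on
        p.ArticleId = a.ArticleId;
</code></pre>
<p>The attributes table is read once, regardless of how many attributes are returned, instead of once per attribute as in the join-based version.</p>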
<p>Let&#8217;s play with a couple of other scenarios. What if we want to search for a specific value in one of the attributes? Well, in that case we need to create an index.</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic10.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic10.png" height="1000" width="564" /></a></p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic11.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic11.png" height="163" width="539" /></a></p>
<p>The biggest problem here is the size of the key. An index key cannot exceed 900 bytes, and Value (which is sql_variant) can go far above that. So we basically have 2 choices: either do not include Value in the index key (or have it as an included column) or, perhaps, use a filtered index and disable the (index) search for some fields. Even if the first option does not look very promising, there is one thing to consider. Are there any other criteria for the search? If all your use cases include some additional columns in the query, it could make sense to push those columns to the Attributes table and make them part of the index. As a real-life example, assume you&#8217;re collecting data and all your queries include a time range. In that case you can push the ATime column to the Attributes table and define the index as (AttrIndex, ATime) include(Value). While it uses a range scan, that could be acceptable because the additional filter on ATime limits the number of records.</p>
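<p>Both options can be sketched as index definitions. The index names, the ATime column on the attributes table, and the choice of attributes 1 and 2 as the &#8220;searchable&#8221; ones are assumptions for illustration. Filtered indexes require SQL Server 2008 or above.</p>
<pre><code>
-- Option 1: keep Value out of the key and carry it as an included column.
-- Assumes a hypothetical ATime column was added to dbo.ArticleAttributes.
create nonclustered index IDX_ArticleAttributes_AttrIndex_ATime
on dbo.ArticleAttributes(AttrIndex, ATime)
include([Value]);

-- Option 2: a filtered index that covers only the attributes that are
-- known to hold small values and need to be searchable.
-- SQL Server is expected to warn about the maximum key length at creation
-- time; an insert or update fails only if an actual key exceeds the limit.
create nonclustered index IDX_ArticleAttributes_Searchable
on dbo.ArticleAttributes(AttrIndex, [Value])
where AttrIndex in (1, 2);
</code></pre>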
<p>Another scenario is sorting/paging. Assume you want to display 1 page of data (10 rows). You can do something like this:</p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic12.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic12.png" height="1036" width="569" /></a></p>
<p><a href="http://dwkor.net/blog/2012-02-16/pic13.png" target="_blank"><img src="http://dwkor.net/blog/2012-02-16/pic13.png" height="158" width="990" /></a></p>
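<p>One way to express that kind of paging in text form is sketched below. It is not necessarily the query from the screenshot: it assumes we sort by the attribute with index 1, uses ROW_NUMBER() to pick the page first, and pivots afterwards. Table names, attribute numbers and variables are hypothetical.</p>
<pre><code>
declare @PageNum int, @PageSize int;
select @PageNum = 1, @PageSize = 10;

-- Number the articles in the order of the attribute we sort by
with SortedIds(ArticleId, RowNum)
as
(
    select
        ArticleId,
        row_number() over (order by [Value], ArticleId)
    from dbo.ArticleAttributes
    where AttrIndex = 1
)
select a.ArticleId, a.ArticleName, p.[1] as Attr1, p.[2] as Attr2
from SortedIds s
    join dbo.Articles a on
        a.ArticleId = s.ArticleId
    join
    (
        select ArticleId, [1], [2]
        from
            (
                select ArticleId, AttrIndex, [Value]
                from dbo.ArticleAttributes
            ) src
            pivot (max([Value]) for AttrIndex in ([1], [2])) pvt
    ) p on
        p.ArticleId = a.ArticleId
where
    s.RowNum between (@PageNum - 1) * @PageSize + 1 and @PageNum * @PageSize
order by s.RowNum;
</code></pre>
<p>Articles that have no value for the sort attribute do not appear in this sketch; whether that is acceptable depends on the use case.</p>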
<p>Let&#8217;s go through the list of criteria for this approach:</p>
<ol>
<li>Multiple schema support. Yes.</li>
<li>Expandability. Yes.</li>
<li>Online schema change. Yes, although some work is required if the customer needs to be able to change the data type of an attribute.</li>
<li>Storage cost. Medium to high, depending on indexes and use cases.</li>
<li>Attribute access overhead. 1 join + PIVOT overhead.</li>
<li>Search-friendly. Yes, with an extra index. Possible issues with large values (900-byte key size limitation).</li>
<li>Sorting/Paging friendly. Same as above.</li>
<li>Upfront knowledge about the data schema. Required. The client needs to know the schema in order to build the PIVOT statement. On the server side, dynamic SQL could be required.</li>
</ol>
<p>And the last one is the biggest limitation of the design. While it offers very good performance, you have to babysit the solution. You need to think about use cases in order to design queries and indexes. You also need to maintain the indexes - you&#8217;ll get excessive fragmentation there.</p>
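<p>To illustrate the dynamic SQL mentioned in the last criterion, here is a minimal sketch that builds the PIVOT column list from a metadata table. The dbo.AttributesMetadata table and its AttrIndex column are assumptions, and the FOR XML PATH concatenation is just one common technique; the downloadable script may do it differently.</p>
<pre><code>
declare @Cols nvarchar(max), @Sql nvarchar(max);

-- Build a list such as [1],[2],[3] from the (hypothetical) metadata table
select @Cols = stuff(
    (
        select N',' + quotename(convert(nvarchar(10), AttrIndex))
        from dbo.AttributesMetadata
        order by AttrIndex
        for xml path('')
    ), 1, 1, N'');

set @Sql = N'
select ArticleId, ' + @Cols + N'
from
    (
        select ArticleId, AttrIndex, [Value]
        from dbo.ArticleAttributes
    ) src
    pivot (max([Value]) for AttrIndex in (' + @Cols + N')) pvt;';

-- If the metadata table is empty, @Cols and @Sql are NULL and
-- there is nothing meaningful to run; check for that in real code.
exec sp_executesql @Sql;
</code></pre>
<p>quotename() is used here so that the column list is always properly bracketed, even though the values are plain numbers in this sketch.</p>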
<p>Next time we will do some performance comparison of the various methods.</p>
<p>The source code is available for <a href="http://dwkor.net/blog/2012-02-16/aboutsqlserver(2012-02-16).sql" target="_blank">download</a>.</p>
<p style="display: inline ! important">P.S. I want to thank Vladimir Zatuliveter (zatuliveter _at_ gmail _dot_com) for his help with preparation of this post.</p>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2012/02/22/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-2-namevalue-pairs/feed/</wfw:commentRss>
		</item>
		<item>
		<title>2012 South FL Code Camp Presentations</title>
		<link>http://aboutsqlserver.com/2012/02/21/2012-south-fl-code-camp-presentations/</link>
		<comments>http://aboutsqlserver.com/2012/02/21/2012-south-fl-code-camp-presentations/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 23:44:23 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2012/02/21/2012-south-fl-code-camp-presentations/</guid>
		<description><![CDATA[..are available for download
That was a great event! And thank you, Gokhan 
]]></description>
			<content:encoded><![CDATA[<p>..are available for <a href="http://aboutsqlserver.com/presentations/" target="_blank">download</a></p>
<p>That was a great event! And thank you, Gokhan <img src='http://aboutsqlserver.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2012/02/21/2012-south-fl-code-camp-presentations/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Store Custom Fields/Attributes in Microsoft SQL Server Database (Part 1)</title>
		<link>http://aboutsqlserver.com/2012/02/01/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-1/</link>
		<comments>http://aboutsqlserver.com/2012/02/01/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-1/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 02:08:58 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[SQL Server 2005]]></category>

		<category><![CDATA[SQL Server 2008]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2012/02/01/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-1/</guid>
		<description><![CDATA[Regardless of how good and flexible the system is, it&#8217;s practically impossible to design it in a way that satisfies all customers. Don&#8217;t get me wrong - if you have an internal development team that works on an internal system, you could be fine. But as soon as you start to sell the solution or, even better, design [...]]]></description>
			<content:encoded><![CDATA[<p>Regardless of how good and flexible the system is, it&#8217;s practically impossible to design it in a way that satisfies all customers. Don&#8217;t get me wrong - if you have an internal development team that works on an internal system, you could be fine. But as soon as you start to sell the solution or, even better, design a hosted solution for multiple customers - you are stuck. There is always some customization involved.</p>
<p>One very common example of customization is the custom attributes a customer wants to store. For example, let&#8217;s think about a shopping cart system and the Article table there. If you spend some time defining Article attributes, you can end up with quite an extensive set: Size, Weight, Dimension, Color.. So far so good. But one day the sales department closes a deal with an auto parts store, and now you have to deal with cylinders, trim types, engines, battery amp-hours and other funny stuff. The next day the company closes a deal with a grocery store and you have to deal with nutrition information.</p>
<p>Unfortunately there is no perfect way to solve the problem. I&#8217;m going to show a few obvious and not-so-obvious design patterns that you can use and outline the scenarios in which those patterns are useful. There will be 3 posts on the subject:</p>
<ol>
<li>Today we will talk about storing attributes in separate columns and about storing them in XML</li>
<li>We will talk about the Name/Value table - there are 2 approaches - very very bad and very very interesting</li>
<li>We will do some performance testing and storage analysis for those approaches.</li>
</ol>
<p>But first, let&#8217;s define the set of criteria we are going to use to evaluate the patterns:</p>
<ol>
<li>Multiple schema support. Can this solution be used in hosted environments where you have multiple different &#8220;custom fields&#8221; schemas? For example, customer-specific attributes in a system that stores data from multiple customers (remember the auto parts shop and the grocery store).</li>
<li>Expandability. Does the solution offer unlimited expandability in terms of the number and types of attributes?</li>
<li>Online schema change. Can the schema be modified online with active users in the system?</li>
<li>Storage cost. How much storage does the solution require?</li>
<li>Attribute access cost.</li>
<li>Search-friendliness. How easy is it to search for a specific value across the attributes?</li>
<li>Sorting/Paging friendliness. How easy is it to sort by a specific attribute and display a specific page based on row number?</li>
<li>Upfront knowledge of the data schema. What does the client need to know about the attributes while selecting data?</li>
</ol>
<p>And before we begin, a disclaimer. I&#8217;m going to show a few patterns, but by no means are they the only solutions available. Every system is unique, and you need to keep your own requirements in mind while choosing the design. A pattern that is not the best in general could be the best one for a specific system and specific requirements.</p>
<p><strong>Pattern 1: One column per attribute</strong></p>
<p>This is probably one of the most common patterns you can find, especially in old systems. There are 3 common ways it gets implemented. In the first one, you predefine a set of custom columns of different types up front, and the customer is limited to that predefined set. Something like this (let&#8217;s call it 1.a):</p>
<p><img src="http://dwkor.net/blog/2012-02-01/pic0.png" height="323" width="279" /></p>
<p>Alternatively, there are systems that dynamically alter the table when the customer needs to create a new attribute. Something like this (let&#8217;s call it 1.b):</p>
<p><img src="http://dwkor.net/blog/2012-02-01/pic1.png" height="404" width="349" /></p>
<p>As you can see, I mentioned that rebuilding the clustered index is a good idea. It reduces the index fragmentation caused by the increase in row size. And if you drop the attribute, you&#8217;d need to <a href="http://aboutsqlserver.com/2010/09/01/hidden-facts-about-table-alteration/" target="_blank">reclaim the space</a>.</p>
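<p>In text form, the 1.b flow could be sketched like this. The Cylinders column and the PK_Articles index name are hypothetical; the point is only the sequence of adding the column, rebuilding after the data is populated, and rebuilding again after a drop to reclaim the space.</p>
<pre><code>
-- Customer creates a new attribute: add a nullable column on the fly
alter table dbo.Articles add Cylinders tinyint null;

-- ... the application populates the new column here ...

-- Rebuild the clustered index to remove the fragmentation
-- introduced by the increased row size
alter index PK_Articles on dbo.Articles rebuild;

-- Customer removes the attribute: dropping the column alone
-- does not give the space back, so rebuild once more
alter table dbo.Articles drop column Cylinders;
alter index PK_Articles on dbo.Articles rebuild;
</code></pre>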
<p>The third variation (let&#8217;s call it 1.c) is very similar to 1.b, with the exception that it stores custom attributes in a separate table:<br />
<img src="http://dwkor.net/blog/2012-02-01/pic2.png" height="654" width="345" /></p>
<p>Let&#8217;s evaluate them:</p>
<ol>
<li>Multiple schema support. It works with multiple schemas if you have a predefined set of attributes (1.a). Of course, in that case all customers are limited by the maximum number of attributes per type, but that could be acceptable in some systems. If you dynamically alter the table (1.b and 1.c), only one schema can be supported. Of course, you can do some tricks - for example, keep multiple tables (one per customer) or reuse the attribute columns created by other customers - but either of those approaches introduces a lot of complexity and management overhead. It&#8217;s simply not worth it.</li>
<li>Expandability. 1.a obviously is not expandable, at least not automatically. 1.b and 1.c offer practically unlimited expandability (subject to SQL Server limitations on the maximum row size and maximum number of columns).</li>
<li>Online schema change. 1.a does not require any physical schema changes. 1.b and 1.c require a SCH-M lock on the table during the alteration (which is basically exclusive table access), and the user must have appropriate rights to execute the ALTER TABLE statement.</li>
<li>Storage cost. 1.a increases the size of the row in the Articles table by the size of all fixed-width data types used by the attributes plus at least 2 bytes per variable-width attribute, regardless of whether the attributes are used or not (<a href="http://aboutsqlserver.com/2010/08/11/how-sql-server-stores-data-extents-data-pages-data-row-for-in-row-data/" target="_blank">see it in more detail</a>). This could be OK as long as the table is not transactional (does not store a lot of data) and we do not go crazy with the total number of predefined attributes, but still - it needs to be considered. <a href="http://aboutsqlserver.com/2010/08/25/why-row-size-matters/" target="_blank">Row size matters</a>. 1.b and 1.c are much more efficient in that regard - attributes are created only when needed.</li>
<li>Attribute access cost. 1.a and 1.b - no overhead at all. The attribute is a regular column in the row. 1.c - there is an extra join between the tables.</li>
<li>Search-friendliness. Generally this introduces a search clause like: <em>where (CustomText1 = @P1) or  (CustomText2 = @P1) ..</em>  Usually those patterns lead to clustered index scans unless there are predicates selective enough to utilize a nonclustered index. So it is more or less a question of whether the system even needs to allow such a search without additional filters on other columns. One other thing to keep in mind - you need to be careful when dealing with various data types and possible conversion errors.</li>
<li>Sorting/Paging friendliness. This pattern is extremely friendly for sorting and paging as long as there are some primary filters that limit the number of rows to sort/page. Otherwise the attribute either needs to be indexed or a scan is involved.</li>
<li>Upfront knowledge about the data schema. Generally not required. While <em>select *</em> is not the best practice, it works perfectly when you need to grab the entire row with the attributes.</li>
</ol>
<p>The biggest benefits of this design are simplicity and low access cost. I would consider it for systems that require a single data schema (boxed product) or when multiple data schemas would work with a limited number of attributes (1.a). While it can cover a lot of systems, in general it&#8217;s not flexible enough. I would also be very careful with this pattern if we need to add attributes to transactional tables with millions or billions of rows. You don&#8217;t want to alter those tables on the fly, nor do you want the storage overhead introduced by predefined attributes.</p>
<p>One other possible option is to use SPARSE columns with 1.a and 1.b. <a href="http://msdn.microsoft.com/en-us/library/cc280604.aspx" target="_blank">SPARSE columns</a> are ordinary columns that are optimized for the storage of NULL values. Technically, that allows you to predefine a bigger set of attributes than with regular columns without increasing the size of the row. It could be very useful in some cases. At the same time, you need to keep in mind that a SPARSE column that stores a non-NULL value takes more space than a regular column. Another important thing is that tables with SPARSE columns cannot be compressed, and compression is another very good way to save storage space.</p>
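<p>A minimal sketch of 1.a with SPARSE columns, assuming SQL Server 2008 or above; the table, column and constraint names are made up for illustration.</p>
<pre><code>
create table dbo.ArticlesSparse
(
    ArticleId int not null,
    ArticleName nvarchar(64) not null,
    -- predefined custom attributes; NULL values take no space in the row
    CustomInt1 int sparse null,
    CustomInt2 int sparse null,
    CustomText1 nvarchar(128) sparse null,
    CustomText2 nvarchar(128) sparse null,
    constraint PK_ArticlesSparse primary key clustered(ArticleId)
);
</code></pre>
<p>SPARSE columns must be nullable, which fits custom attributes that most rows leave empty.</p>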
<p>And a last note about indexing. If you need to support search and/or sort on every attribute, you need to either limit the number of rows to process or index every attribute (or the most commonly used ones). While a large number of indexes is not a very good thing in general, in some systems it&#8217;s perfectly OK (especially with filtered indexes that do not index NULL values), as long as you don&#8217;t have millions of rows in the table or very heavy update activity. Again, this is from the &#8220;It depends&#8221; category.</p>
<p><strong>Pattern 2: Attributes in XML</strong></p>
<p>Well, that&#8217;s self-explanatory :). Something like that:</p>
<p><img src="http://dwkor.net/blog/2012-02-01/pic3.png" height="355" width="313" /></p>
<p>Let&#8217;s dive into the details.</p>
<ol>
<li>Multiple schema support. Not a problem at all. You can store whatever you want, as long as it&#8217;s valid XML.</li>
<li>Expandability. The same. Not a problem at all. Just keep the XML valid and you&#8217;re golden.</li>
<li>Online schema change. Easy. There is no schema on the metadata level. Well, you can, of course, define an XML Schema and it will help with performance, but again, consider the pros and cons of this step.</li>
<li>Storage cost. And now we start to talk about the negative aspects. It uses a good amount of space. XML is basically a LOB. SQL Server does not store it in plain text - there is some minor compression involved - but still, it uses a lot of space. And if we need to index the XML column, it requires even more space.</li>
<li>Attribute access cost. Heh, and this is another big one. It&#8217;s great that SQL Server has built-in XML support, but performance is far from ideal. We will do some performance testing in Part 3 of our discussion, but trust me - shredding XML in SQL Server is slow and CPU intensive.</li>
<li>Search-friendliness. Well, you need to shred it before the search, and that&#8217;s very slow. XML indexes would help, but it&#8217;s still slower than regular columns and introduces a huge storage overhead.</li>
<li>Sorting/Paging friendliness. Same as above. You need to shred the data first.</li>
<li>Upfront knowledge about the data schema. All data is stored in 1 column, so the client does not need to jump through any hoops to access the data. But of course, it needs to know how to parse it.</li>
</ol>
<p>Bottom line - storing custom attributes in XML is the perfect solution in terms of flexibility. Unfortunately it&#8217;s very storage-hungry and, most importantly, very slow. This solution is best when the main goal is simple attribute storage and displaying/editing a very small number of rows on the client - for example, the Article detail page on a web site. If you need to shred, sort, or filter a large number of rows, however, you&#8217;ll have performance issues even with XML indexes.</p>
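<p>To make &#8220;shredding&#8221; concrete, here is a small sketch that stores attributes in an XML column and extracts them with the value() method. The table name, the XML layout and the attribute names are assumptions for illustration only.</p>
<pre><code>
create table dbo.ArticlesXml
(
    ArticleId int not null,
    ArticleName nvarchar(64) not null,
    Attributes xml null,
    constraint PK_ArticlesXml primary key clustered(ArticleId)
);

insert into dbo.ArticlesXml(ArticleId, ArticleName, Attributes)
values(1, N'Battery',
    N'&lt;Attributes&gt;&lt;AmpHours&gt;60&lt;/AmpHours&gt;&lt;Color&gt;Black&lt;/Color&gt;&lt;/Attributes&gt;');

-- Every attribute we return or filter on has to be shredded from the XML
select
    ArticleId,
    Attributes.value('(/Attributes/AmpHours/text())[1]', 'int') as AmpHours,
    Attributes.value('(/Attributes/Color/text())[1]', 'nvarchar(32)') as Color
from dbo.ArticlesXml
where
    Attributes.value('(/Attributes/AmpHours/text())[1]', 'int') &gt;= 50;
</code></pre>
<p>For a single row on a detail page this is perfectly fine; the cost shows up when the same expressions have to run against a large number of rows.</p>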
<p><a href="http://aboutsqlserver.com/2012/02/22/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-2-namevalue-pairs/">Next time</a> we will talk about the Name/Value table, which deserves a separate post.</p>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2012/02/01/store-custom-fieldsattributes-in-microsoft-sql-server-database-part-1/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Locking in Microsoft SQL Server (Part 12 - Lock Escalation)</title>
		<link>http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/</link>
		<comments>http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 17:22:18 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[SQL Server 2005]]></category>

		<category><![CDATA[SQL Server 2008]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/</guid>
		<description><![CDATA[I hope everyone had the great holiday season!  
Today I&#8217;d like us to talk about Lock Escalation in Microsoft SQL Server. We will cover:

What is Lock Escalation?
How Lock Escalation affects the system
How to detect and troubleshoot Lock Escalations
How to disable Lock Escalation

What is Lock Escalation?
All of us know that SQL Server uses row level [...]]]></description>
			<content:encoded><![CDATA[<p>I hope everyone had a great holiday season! <img src='http://aboutsqlserver.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Today I&#8217;d like us to talk about Lock Escalation in Microsoft SQL Server. We will cover:</p>
<ol>
<li>What is Lock Escalation?</li>
<li>How Lock Escalation affects the system</li>
<li>How to detect and troubleshoot Lock Escalations</li>
<li>How to disable Lock Escalation</li>
</ol>
<p><strong>What is Lock Escalation?</strong><br />
All of us know that SQL Server uses row level locking. Let&#8217;s think about the scenario when the system modifies a row. Let&#8217;s create a small table, insert 1 row there and then check the locks we have. As usual, every image is clickable.</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic0.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic0.png" width="392" height="385" /></a></p>
<p>As you can see, there are 4 locks in the picture: a shared (S) lock on the database, i.e. an indication that the database is in use; an intent exclusive (IX) lock on the table (OBJECT), i.e. an indication that one of the child objects (row/key in our case) has an exclusive lock; an intent exclusive (IX) lock on the page, i.e. the same indication about the child object (row/key) exclusive lock; and finally an exclusive (X) lock on the key (row) we just inserted.</p>
<p>Now let&#8217;s insert another row in a different session (let&#8217;s keep the original Session 1 transaction uncommitted).</p>
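<p>For readers who cannot open the image, here is a minimal sketch of this kind of test. The table name dbo.LockDemo and its columns are hypothetical and are not taken from the screenshot; sys.dm_tran_locks is the standard DMV that exposes the locks.</p>
<pre>
-- Hypothetical demo table; the names are illustrative only
create table dbo.LockDemo
(
    ID int not null,
    Value int not null,
    constraint PK_LockDemo primary key clustered(ID)
);
go

begin tran
    insert into dbo.LockDemo(ID, Value) values(1, 1);

    -- Locks held by the current session
    select resource_type, resource_description, request_mode, request_status
    from sys.dm_tran_locks
    where request_session_id = @@spid;

-- The transaction is intentionally left open; run ROLLBACK when you are done
</pre>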
<p><a href="http://dwkor.net/blog/2012-01-11/pic0b.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic0b.png" width="392" height="385" /></a></p>
<p>When we check the locks, we will see that there are 8 locks - 4 per session. Both sessions run just fine and don&#8217;t block each other. Everything works smoothly - that&#8217;s great for concurrency. So far so good. The problem, though, is that every lock takes some memory space (128 bytes on a 64-bit OS and 64 bytes on a 32-bit OS), and memory is not a free resource. Let&#8217;s take a look at another example. I&#8217;m creating a table and populating it with 100,000 rows. Next, I&#8217;m disabling lock escalation on the table (ignore it for now) and clearing all system caches (don&#8217;t do it in production). Now let&#8217;s run a transaction in the repeatable read isolation level and initiate a table scan.</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic1.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic1.png" width="471" height="705" /></a></p>
<p>The transaction is not committed and, as we remember, in the repeatable read isolation level SQL Server <a href="http://aboutsqlserver.com/2011/04/28/locking-in-microsoft-sql-server-part-2-locks-and-transaction-isolation-levels/" target="_blank">holds the locks till the end of the transaction</a>. Now let&#8217;s see how many locks we have and how much memory they use.</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic2.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic2.png" width="612" height="286" /></a></p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic3.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic3.png" width="348" height="324" /></a><br />
As you can see, we now have 102,780 lock structures that take more than 20MB of RAM. And what if we have a table with billions of rows? This is the case when SQL Server starts to use the process called &#8220;Lock Escalation&#8221; - in a nutshell, instead of keeping locks on every row, SQL Server tries to escalate them to the higher (object) level. Let&#8217;s see how it works.</p>
<p>First we need to commit the transaction and clear the cache. Next, let&#8217;s switch lock escalation for the Data table to the AUTO level (I&#8217;ll explain it in detail later) and see what happens if we re-run the previous example.</p>
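<p>A rough sketch of the steps above is shown below; it is not the exact script from the screenshots. It assumes the table is called dbo.Data (the name used later in this post), and it uses the SQL Server 2005/2008 column names of sys.dm_os_memory_clerks.</p>
<pre>
-- Disable lock escalation for the table (SQL Server 2008 and above)
alter table dbo.Data set (lock_escalation = disable);
go

set transaction isolation level repeatable read;

begin tran
    -- The scan acquires shared (S) locks and keeps them until the end of the transaction
    select count(*) from dbo.Data;

    -- Number of lock structures held by the current session
    select count(*) as LockCount
    from sys.dm_tran_locks
    where request_session_id = @@spid;

    -- Memory used by the lock manager.
    -- SQL Server 2012 and above expose pages_kb instead of single_pages_kb/multi_pages_kb
    select type, single_pages_kb, multi_pages_kb
    from sys.dm_os_memory_clerks
    where type = 'OBJECTSTORE_LOCK_MANAGER';
commit
</pre>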
<p><a href="http://dwkor.net/blog/2012-01-11/pic4.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic4.png" width="496" height="645" /></a></p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic5.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic5.png" width="348" height="156" /></a></p>
<p>As you can see, there are just 2 locks and only 1MB of RAM is used (the memory clerk reserves some space). Now let&#8217;s look at what locks we have:</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic6.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic6.png" width="388" height="184" /></a></p>
<p>As you can see, there is the same shared (S) lock on the database, and now we have a new shared (S) lock on the table. No locks on the page/row levels are kept. Obviously, concurrency is not as good as it used to be. Now, for example, other sessions would not be able to update the data in the table - the (S) lock is incompatible with (IX) on the table level. And obviously, if we have lock escalation due to data modifications, the table would hold an exclusive (X) lock - so other sessions would not be able to read the data either.</p>
<p>The next question is when escalation happens. Based on the documentation, SQL Server tries to escalate locks after it acquires at least 5,000 locks on the object. If escalation fails, it tries again after at least 1,250 new locks. The locks are counted on the index/object level. So if a table has 2 indexes, A and B, and you have 4,500 locks on index A and 4,500 locks on index B, the locks would not be escalated. In real life, your mileage may vary - see the example below, where 5,999 locks do not trigger the escalation but 6,999 do.</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic7.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic7.png" width="435" height="499" /></a></p>
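<p>If you want to experiment with the threshold yourself, something along these lines could be used. This is only a sketch: it assumes a dbo.Data table with an ID column and lock escalation enabled on it, and the row count is just a starting point.</p>
<pre>
set transaction isolation level repeatable read;

begin tran
    declare @Cnt int;

    -- Read a fixed number of rows; change 6999 to test other values
    select @Cnt = count(*)
    from (select top (6999) ID from dbo.Data order by ID) as t;

    -- A single OBJECT-level (S) lock instead of thousands of row-level locks
    -- indicates that the locks were escalated
    select resource_type, request_mode, count(*) as LockCount
    from sys.dm_tran_locks
    where request_session_id = @@spid
    group by resource_type, request_mode;
rollback
</pre>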
<p><strong>How does it affect the system?</strong></p>
<p>Let&#8217;s repeat our first small example on a bigger scale. Let&#8217;s run the first session that updates 1,000 rows and check what locks are held.</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic8.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic8.png" width="435" height="449" /></a></p>
<p>As you can see, we have intent exclusive (IX) locks on the object (table) and pages as well as various (X) locks on the rows. If we run another session that updates completely different rows, everything would be just fine. The (IX) locks on the table are compatible, and the (X) locks are not acquired on the same rows.</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic9.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic9.png" width="353" height="191" /></a><br />
Now let&#8217;s trigger lock escalation by updating 11,000 rows.</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic10.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic10.png" width="445" height="297" /></a></p>
<p>As you can see, the table now has an exclusive lock. So if you run the session 2 query from above again, it would be blocked because the (X) lock on the table held by session 1 is incompatible with the (IX) lock from session 2.</p>
<p>When does it affect us? There are 2 very specific situations:</p>
<ol>
<li>Batch inserts/updates/deletes. You&#8217;re trying to import thousands of rows (even from a staging table). If your import session is lucky enough to escalate the lock, none of the other sessions would be able to access the table till the transaction is committed.</li>
<li>Reporting - if you&#8217;re using the repeatable read or serializable isolation levels in order to have consistent data in reports, you can have the (S) lock escalated to the table level and, as a result, writers will be blocked until the end of the transaction.</li>
</ol>
<p>And of course, any excessive locking in the system can trigger it too.</p>
<p><strong>How to detect and troubleshoot Lock Escalations</strong></p>
<p>First of all, even if you have lock escalations, it does not mean that it&#8217;s bad. After all, this is the expected behavior of SQL Server. The problem with lock escalations, though, is that customers usually complain that some queries are running slow. In that particular case, waits due to lock escalations from other processes could be the issue. If we look at the example above where session 2 is blocked, and run a script (as session 3) that analyzes the sys.dm_tran_locks DMV, we&#8217;d see that:</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic11.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic11.png" width="708" height="785" /></a></p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic12.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic12.png" width="928" height="106" /></a></p>
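<p>The script in the screenshot is not reproduced here, but a simplified query in the same spirit could look like this. It is a sketch that only looks at object-level locks; the join between sys.dm_tran_locks and sys.dm_os_waiting_tasks on lock_owner_address = resource_address is the documented way to find blocked requests.</p>
<pre>
select
    tl.request_session_id,
    tl.resource_type,
    case
        when tl.resource_type = 'OBJECT'
        then object_name(tl.resource_associated_entity_id, tl.resource_database_id)
    end as table_name,
    tl.request_mode,
    tl.request_status,          -- GRANT for the blocker, WAIT for the blocked session
    wt.blocking_session_id,
    wt.wait_type,
    wt.wait_duration_ms
from sys.dm_tran_locks tl
    left join sys.dm_os_waiting_tasks wt on
        tl.lock_owner_address = wt.resource_address
where tl.resource_type = 'OBJECT'
order by tl.request_session_id;
</pre>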
<p>I&#8217;m very heavy on wait statistics as the first troubleshooting tool (perhaps heavier than I need to be <img src='http://aboutsqlserver.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ). One of the signs of issues with lock escalations would be a high percentage of intent lock waits (LCK_M_I*) together with a relatively small percentage of regular non-intent lock waits. See the example below:</p>
<p><a href="http://dwkor.net/blog/2012-01-11/pic13.png" target="_blank"><img src="http://dwkor.net/blog/2012-01-11/pic13.png" width="719" height="394" /></a></p>
<p>If the system has a high percentage of both intent and regular lock waits, I&#8217;d focus on the regular locks first (mainly check if queries are optimized). There is a good chance that the intent lock waits are not related to lock escalations.</p>
<p>In addition to DMVs (sys.dm_tran_locks, sys.dm_os_waiting_tasks, sys.dm_os_wait_stats, etc.), there are the Lock Escalation Profiler event and the Lock Escalation extended event that you can capture. You can also monitor performance counters related to locking and create a baseline (always a great idea).</p>
<p>Last but not least, look at the queries. As I <a href="http://aboutsqlserver.com/2011/05/12/locking-in-microsoft-sql-server-part-3-blocking-in-the-system/" target="_blank">mentioned before</a>, in most cases excessive locking happens because of non-optimized queries. And that, of course, can also trigger lock escalations.</p>
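<p>A quick way to compare intent and regular lock waits is to look only at the LCK_M_* wait types. Note that this sketch calculates the percentage within lock waits only, which is not the same calculation as in the screenshot above.</p>
<pre>
select
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    -- Share of this wait type in the total lock wait time
    cast(100. * wait_time_ms / nullif(sum(wait_time_ms) over(), 0) as decimal(5,2))
        as pct_of_lock_waits
from sys.dm_os_wait_stats
where wait_type like 'LCK[_]M[_]%'
order by wait_time_ms desc;
</pre>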
<p><strong>How to disable Lock Escalation</strong></p>
<p>Yes, you can disable lock escalation. But it should be the last resort. Before you do that, please consider other approaches:</p>
<ol>
<li>For data consistency in reporting (repeatable read/serializable isolation levels) - switch to <a href="http://aboutsqlserver.com/2011/08/25/locking-in-microsoft-sql-server-part-8-optimistic-transaction-isolation-levels/" target="_blank">optimistic </a>(read committed snapshot, snapshot) isolation levels</li>
<li>For batch operations, consider either changing the batch size to be below the 5,000 rows threshold or, if that&#8217;s impossible, playing with lock compatibility. For example, have another session that acquires an IS lock on the table while importing data. Or use a partition switch from the staging table if possible</li>
</ol>
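<p>As an illustration of the smaller batch approach, a purge could be split like this. The table, the DateCreated filter and the batch size of 4,000 are assumptions made for the example; staying under the threshold does not guarantee that escalation never happens, so test it with your own data.</p>
<pre>
declare @RowCnt int;
set @RowCnt = 1;

while @RowCnt &gt; 0
begin
    -- Each statement runs in its own autocommitted transaction,
    -- so the locks are released before the next batch starts
    delete top (4000) from dbo.Data
    where DateCreated &lt; '20110101';

    set @RowCnt = @@rowcount;
end
</pre>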
<p>If neither option works for you, please test the system before you disable lock escalation. So:</p>
<p>For both SQL Server 2005 and 2008, you can alter the behavior on the instance level with trace flags 1211 and 1224. Trace flag 1211 disables lock escalation in every case. If there is no memory available for the locks, error 1204 (Unable to allocate lock resource) would be generated. Trace flag 1224 disables lock escalation as long as there is no memory pressure in the system; locks would still be escalated in case of memory pressure.</p>
<p>With SQL Server 2005, trace flags are the only option you have. With SQL Server 2008, you can also specify escalation rules on the table level with the ALTER TABLE SET LOCK_ESCALATION statement. There are 3 available modes:</p>
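<p>For reference, this is how a trace flag can be switched on and off globally at runtime. A flag enabled this way does not survive a restart; to make it permanent you would add it as a -T startup parameter.</p>
<pre>
-- Enable trace flag 1224 globally (-1)
dbcc traceon(1224, -1);

-- Check the status of both trace flags
dbcc tracestatus(1211, 1224);

-- Turn it off again
dbcc traceoff(1224, -1);
</pre>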
<ol>
<li>DISABLE - lock escalation on the specific table is disabled</li>
<li>TABLE (default) - the default behavior of lock escalation - locks are escalated to the table level.</li>
<li>AUTO - locks are escalated to the partition level if the table is partitioned, or to the table level if it is not</li>
</ol>
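<p>A short example of the statement, using the dbo.Data table from this post. The current setting can be checked in sys.tables.</p>
<pre>
-- Valid values are TABLE, AUTO and DISABLE (SQL Server 2008 and above)
alter table dbo.Data set (lock_escalation = auto);

-- Check the current setting
select name, lock_escalation_desc
from sys.tables
where object_id = object_id(N'dbo.Data');
</pre>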
<p>Source code is available for <a href="http://dwkor.net/blog/2012-01-11/aboutsqlserver(2012-01-11).sql">download</a></p>
<p><a href="http://aboutsqlserver.com/2011/09/28/locking-in-microsoft-sql-server-table-of-content/">Table of content</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Sunday T-SQL Tip: select top N using aligned non-clustered index on partitioned table</title>
		<link>http://aboutsqlserver.com/2011/12/18/sunday-t-sql-tip-select-top-n-using-aligned-non-clustered-index-on-partitioned-table/</link>
		<comments>http://aboutsqlserver.com/2011/12/18/sunday-t-sql-tip-select-top-n-using-aligned-non-clustered-index-on-partitioned-table/#comments</comments>
		<pubDate>Sun, 18 Dec 2011 17:25:02 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[SQL Server 2005]]></category>

		<category><![CDATA[SQL Server 2008]]></category>

		<category><![CDATA[T-SQL]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2011/12/18/sunday-t-sql-tip-select-top-n-using-aligned-non-clustered-index-on-partitioned-table/</guid>
		<description><![CDATA[Almost one year ago I blogged about table partitioning in Microsoft SQL Server. I mentioned one specific case when table partitioning hurt the performance - case when you need to select top N rows using aligned non-clustered index. I said that there is no good workarounds for this particular case. Well, there is one. Kind [...]]]></description>
			<content:encoded><![CDATA[<p>Almost one year ago I <a href="http://aboutsqlserver.com/2010/12/22/sql-server-and-table-partitioning-part-2-when-partitioning-is-you-enemy/" target="_blank">blogged</a> about table partitioning in Microsoft SQL Server. I mentioned one specific case when table partitioning hurts performance - the case when you need to select top N rows using an aligned non-clustered index. I said that there are no good workarounds for this particular case. Well, there is one. Kind of.</p>
<p>First, let&#8217;s take a look at the original problem. I adjusted the script I used a year ago a little bit. First, assume we have a non-partitioned table with a clustered index on ID and a non-clustered index on the DateModified column. Let&#8217;s create that table and populate it with some data (if you click on the images below, they will open in a new browser window).</p>
<p><a href="http://dwkor.net/blog/2011-12-18/pic1.png" target="_blank"><img src="http://dwkor.net/blog/2011-12-18/pic1.png" height="587" width="384" /></a></p>
<p>Now let&#8217;s say we need to select the top 100 rows based on the DateModified column. This is quite a typical scenario in production systems when you need to export and/or process the data.</p>
<p><a href="http://dwkor.net/blog/2011-12-18/pic2.png" target="_blank"><img src="http://dwkor.net/blog/2011-12-18/pic2.png" height="609" width="764" /></a></p>
<p>As long as the table is not partitioned, you can see that the plan is very good. Basically, SQL Server looks up the first row in the non-clustered index for the specific DateModified value and does an ordered scan for the first 100 rows. Very efficient. Now, let&#8217;s partition the table based on DateCreated on a quarterly basis.</p>
<p><a href="http://dwkor.net/blog/2011-12-18/pic3.png" target="_blank"><img src="http://dwkor.net/blog/2011-12-18/pic3.png" height="484" width="314" /></a></p>
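<p>For readers who cannot open the image, quarterly partitioning with an aligned non-clustered index could be set up roughly as follows. The function and scheme names (pfData, psData), the boundary dates and the Placeholder column are assumptions for illustration; the real script is in the download at the end of the post.</p>
<pre>
-- Hypothetical quarterly boundaries; the original script may use different dates
create partition function pfData(datetime)
as range right for values ('20110401', '20110701', '20111001');
go

create partition scheme psData
as partition pfData all to ([primary]);
go

-- Assumes dbo.Data does not exist yet
create table dbo.Data
(
    ID int not null,
    DateCreated datetime not null,
    DateModified datetime not null,
    Placeholder char(500) null
) on psData(DateCreated);
go

-- A unique index on a partitioned table must include the partitioning column
create unique clustered index IDX_Data_ID
on dbo.Data(ID, DateCreated)
on psData(DateCreated);

-- Aligned non-clustered index: it uses the same partition scheme as the table
create nonclustered index IDX_Data_DateModified
on dbo.Data(DateModified)
on psData(DateCreated);
</pre>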
<p>And now let&#8217;s run that statement again. As you can see, SQL Server started to use a CI scan with SORT Top N. I explained why it happens in the previous post.</p>
<p><a href="http://dwkor.net/blog/2011-12-18/pic4.png" target="_blank"><img src="http://dwkor.net/blog/2011-12-18/pic4.png" height="616" width="721" /></a></p>
<p>If we force SQL Server to use the index, the plan would be even worse in this particular case.</p>
<p><a href="http://dwkor.net/blog/2011-12-18/pic5.png" target="_blank"><img src="http://dwkor.net/blog/2011-12-18/pic5.png" height="564" width="809" /></a></p>
<p>If you have a huge transactional table and the number of rows with DateModified &gt; ? is relatively small, the plan above could be more efficient than the CI scan, but the SCAN/SORT TOP N would always be there.</p>
<p>Is there a solution to this problem? Well, yes and no. I don&#8217;t know if there is a generic solution that would work in all cases, although if your table has a limited number of partitions and the packet size is not huge, there is one trick you can do.</p>
<p>Let&#8217;s take a look at the picture that shows how the non-clustered index is aligned.</p>
<p><img src="http://dwkor.net/blog/2010-12-22/Pic10.png" height="175" width="289" /></p>
<p>I just copied it from the old post, so the dates are a little bit off. SQL Server cannot use the same efficient plan as with a non-partitioned table or a non-aligned index because the data could reside in different partitions. However, we can still use an ordered index scan within each partition. If we select the top N rows from each partition independently, union them all, sort them all together and grab the top N rows, we will have what we need. And we can do it using the $Partition function. Let&#8217;s take a look:</p>
<p><a href="http://dwkor.net/blog/2011-12-18/pic6.png" target="_blank"><img src="http://dwkor.net/blog/2011-12-18/pic6.png" height="1031" width="561" /></a></p>
<p>Each PData CTE uses the $Partition function, which limits the data search to a single partition so SQL Server can use an ordered index scan there. In fact, it is very similar to what we had when we ran the select against the non-partitioned table. Next, the AllData CTE merges the results from the PData CTEs and sorts them based on DateModified and ID, returning the top 100 rows. The last select joins the data from the main table with the IDs returned from the AllData CTE. One very <strong>important </strong>point I want to stress - as you can see, the PData/AllData CTEs don&#8217;t select all columns from the table but only the columns from the non-clustered index. The data from the clustered index is selected based on the join in the main select. This approach limits the CTE operations to use the index only and avoids unnecessary key lookups there.</p>
<p>If we look at the result set, we can see that the data is basically selected from partitions 3 and 4.</p>
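<p>In text form, the pattern looks roughly like the sketch below. It is shortened to 2 partitions, and the names pfData, PData1/PData2 and the @LastDateModified value are assumptions rather than a copy of the original script; you would add one PData CTE per partition.</p>
<pre>
declare @LastDateModified datetime;
set @LastDateModified = '20110625';

;with PData1(ID, DateModified, DateCreated)
as
(
    -- Top 100 rows from partition 1 only: an ordered scan of the aligned index
    select top (100) ID, DateModified, DateCreated
    from dbo.Data
    where
        DateModified &gt; @LastDateModified and
        $partition.pfData(DateCreated) = 1
    order by DateModified, ID
)
,PData2(ID, DateModified, DateCreated)
as
(
    select top (100) ID, DateModified, DateCreated
    from dbo.Data
    where
        DateModified &gt; @LastDateModified and
        $partition.pfData(DateCreated) = 2
    order by DateModified, ID
)
-- ... repeat for the other partitions and add them to the UNION ALL below
,AllData(ID, DateCreated)
as
(
    -- Merge the per-partition results and keep the overall top 100
    select top (100) ID, DateCreated
    from
    (
        select ID, DateModified, DateCreated from PData1
        union all
        select ID, DateModified, DateCreated from PData2
    ) as u
    order by DateModified, ID
)
-- Key lookups happen here, for the final 100 rows only
select d.*
from dbo.Data d join AllData a on
    d.ID = a.ID and d.DateCreated = a.DateCreated
order by d.DateModified, d.ID;
</pre>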
<p><a href="http://dwkor.net/blog/2011-12-18/pic7.png" target="_blank"><img src="http://dwkor.net/blog/2011-12-18/pic7.png" height="221" width="489" /></a></p>
<p>And now let&#8217;s look at the execution plan.</p>
<p><a href="http://dwkor.net/blog/2011-12-18/pic8.png" target="_blank"><img src="http://dwkor.net/blog/2011-12-18/pic8.png" height="429" width="1186" /></a></p>
<p>As you can see, the red rectangles represent the PData CTEs. There are no key lookups until the very last stage, and those lookups are done only for 100 rows. One other thing worth mentioning is that SQL Server is smart enough to perform the SORT as part of the Concatenation operator and eliminate unnecessary rows there. As you can see, only 1 row is returned as part of PData5 - SQL Server does not bother to get the other 99 rows.</p>
<p>This particular example has the data distributed very evenly (which usually happens with the DateCreated/DateModified pattern). Generally speaking, the cost of the operation is proportional to the number of partitions multiplied by the packet size. So if you have a table with a lot of partitions, this solution would not help much. On the other hand, there are usually some tricks you can use. Even in this particular case, you don&#8217;t need to include PData6 in the select, because that partition is empty. Also, you can put some logic in place - perhaps create another table and store the most recent DateModified value per partition. In that case, you can dynamically construct the select and exclude partitions where data has not been recently modified.</p>
<p>As a disclaimer, this solution is not a silver bullet, especially if you have a lot of partitions and need to select a large data packet. But in some cases it could help. And PLEASE <strong>TEST IT</strong> before you put it into production.</p>
<p>Source code is available for <a href="http://dwkor.net/blog/2011-12-18/aboutsqlserver(2011-12-18).sql">download</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2011/12/18/sunday-t-sql-tip-select-top-n-using-aligned-non-clustered-index-on-partitioned-table/feed/</wfw:commentRss>
		</item>
		<item>
		<title>A few more words about uniquifiers and uniqueness of the Clustered Index</title>
		<link>http://aboutsqlserver.com/2011/11/24/a-few-more-words-about-uniquifiers-and-uniqueness-of-the-clustered-index/</link>
		<comments>http://aboutsqlserver.com/2011/11/24/a-few-more-words-about-uniquifiers-and-uniqueness-of-the-clustered-index/#comments</comments>
		<pubDate>Thu, 24 Nov 2011 18:33:42 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[SQL Server 2005]]></category>

		<category><![CDATA[SQL Server 2008]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2011/11/24/a-few-more-words-about-uniquifiers-and-uniqueness-of-the-clustered-index/</guid>
		<description><![CDATA[Long time ago we discussed that good clustered index needs to be unique, static and narrow. I mentioned that internally SQL Server needs to be able to unique identify every row in the table (think about non-clustered index key lookup operation) and in case, if clustered index is not defined as unique, SQL Server adds [...]]]></description>
			<content:encoded><![CDATA[<p>A long time ago we discussed that a good clustered index needs to be <a href="http://aboutsqlserver.com/2010/11/17/primary-key-and-clustered-index/" target="_blank">unique, static and narrow</a>. I mentioned that internally SQL Server needs to be able to uniquely identify every row in the table (think about the non-clustered index key lookup operation) and, if the clustered index is not defined as unique, SQL Server adds a 4-byte uniquifier to the row. Today I want us to talk about that case in much more detail and see when and how SQL Server maintains that uniquifier and what overhead it introduces.</p>
<p>In order to understand what happens behind the scenes, we need to look at the physical row structure and data on the page. First, I want to admit that the general comment about the 4-byte uniquifier is not exactly correct. In some cases the overhead could be 0 bytes, but in most cases it would be 2, 6 or 8 bytes. Let&#8217;s look at that in more detail. Click on the images below to open them in a new window.</p>
<p>First, let&#8217;s create 3 different tables. Each of them will have only fixed-width columns, 1,000 bytes per row + overhead, which lets us put 8 rows on a page in the best case:</p>
<ul>
<li>UniqueCI - this table has a unique clustered index on the KeyValue column</li>
<li>NonUniqueCINoDups - this table has a non-unique clustered index on the KeyValue column, although we don&#8217;t put any KeyValue duplicates into it</li>
<li>NonUniqueCIDups - this table has a non-unique clustered index on the KeyValue column and will have a lot of duplicates.</li>
</ul>
<p><img src="http://dwkor.net/blog/2011-11-24/pic1.png" height="525" width="467" /></p>
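<p>The table definitions are only shown in the image, so here is a sketch of what they could look like. The column names come from the row dumps later in this post; the char(992) size and the index names are assumptions chosen so that the fixed-width data adds up to 1,000 bytes.</p>
<pre>
-- 4 + 4 + 992 = 1,000 bytes of fixed-width data per row
create table dbo.UniqueCI
(
    KeyValue int not null,
    ID int not null,
    CharData char(992) not null
);
create unique clustered index IDX_UniqueCI on dbo.UniqueCI(KeyValue);

create table dbo.NonUniqueCINoDups
(
    KeyValue int not null,
    ID int not null,
    CharData char(992) not null
);
-- Not declared as unique, even though the data has no duplicates
create clustered index IDX_NonUniqueCINoDups on dbo.NonUniqueCINoDups(KeyValue);

create table dbo.NonUniqueCIDups
(
    KeyValue int not null,
    ID int not null,
    CharData char(992) not null
);
create clustered index IDX_NonUniqueCIDups on dbo.NonUniqueCIDups(KeyValue);
</pre>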
<p>Now let&#8217;s populate some data to those tables.</p>
<p><img src="http://dwkor.net/blog/2011-11-24/pic2.png" height="588" width="475" /></p>
<p>First, let&#8217;s take a look at the physical stats of the clustered indexes. 2 things are interesting: first, the Min/Max/Avg record size, and second, the Page Count.</p>
<p><a href="http://dwkor.net/blog/2011-11-24/pic3.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-24/pic3.png" height="582" width="595" /></a></p>
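<p>The numbers in the screenshot come from sys.dm_db_index_physical_stats. A minimal query for one of the tables could look like this (the record size columns are only populated in the SAMPLED and DETAILED modes):</p>
<pre>
select
    index_level,
    page_count,
    record_count,
    min_record_size_in_bytes,
    max_record_size_in_bytes,
    avg_record_size_in_bytes
from sys.dm_db_index_physical_stats
(
    db_id(),                        -- current database
    object_id(N'dbo.UniqueCI'),     -- table
    1,                              -- index_id = 1 is the clustered index
    null,                           -- all partitions
    'DETAILED'
);
</pre>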
<p>As you can see, the best case scenario, the UniqueCI table, has 1,007 bytes as the Min/Max/Avg record size (again - all columns are fixed width) and uses 12,500 pages. Each page can store 8 rows (1,007 bytes per row * 8 = 8,056 bytes &lt; 8,060 bytes available on the page).</p>
<p>Next, let&#8217;s take a look at the NonUniqueCINoDups table. Even though the clustered index is not unique, the Min/Max/Avg record size and the Page Count are the same as for the UniqueCI clustered index. So, as you can see, in this particular case of a non-unique clustered index, SQL Server does not store the uniquifier for the first (unique) value of the clustered index key. We will look at it in more detail below.</p>
<p>The last one, the NonUniqueCIDups table, is more interesting. As you can see, while the Min record size is the same (1,007 bytes), the Max is 1,015 bytes, and the Avg record size is 1,014.991 - very close to the Max. Basically, the uniquifier is added to all rows with the exception of the first row per unique value. Interestingly enough, even though the uniquifier itself is 4 bytes, the total overhead is 8 bytes.</p>
<p>Another thing worth mentioning is the page count. As you can see, there are 1,786 extra pages (about 14MB of extra storage space). 8 rows don&#8217;t fit on the page anymore. Obviously, this example does not represent a real-life scenario (there are no variable-width columns that can go off-row, etc.), although if you think about non-clustered indexes, the situation is very close to real life. Let&#8217;s create non-clustered indexes and check the stats.</p>
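<p>The effect of those 8 extra bytes on the number of pages can be checked with simple arithmetic. The sketch below uses the same simplified model as this post (8,060 bytes available per page) and assumes 100,000 rows per table, which is inferred from 12,500 pages * 8 rows rather than stated explicitly.</p>
<pre>
select
    8060 / 1007 as RowsPerPage_1007,                    -- 8 rows fit on the page
    8060 / 1015 as RowsPerPage_1015,                    -- only 7 rows fit on the page
    100000 / 8 as Pages_1007,                           -- 12,500 pages
    ceiling(100000 / 7.) as Pages_1015,                 -- 14,286 pages
    ceiling(100000 / 7.) - 100000 / 8 as ExtraPages;    -- 1,786 extra pages
</pre>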
<p><a href="http://dwkor.net/blog/2011-11-24/pic4.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-24/pic4.png" height="713" width="595" /></a></p>
<p>As you can see, in the latter case we almost doubled the size of the non-clustered index leaf row and the storage space for the index. That makes the non-clustered index much less efficient.</p>
<p>Now let&#8217;s take a look at the actual row data to see how SQL Server stores the uniquifier. We need to look at the actual data on the page, so the first step is to find out the page number. We can use the DBCC IND command below. Let&#8217;s find the first page on the leaf level (the one that stores the very first row from the table). Looking at the DBCC IND result set, we need to select the PagePID for IndexLevel = 0 and PrevPageFID = 0.</p>
<p><a href="http://dwkor.net/blog/2011-11-24/pic5.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-24/pic5.png" height="341" width="1120" /></a></p>
<p>Next, we need to run the DBCC PAGE command and provide that PagePID. Both DBCC IND and DBCC PAGE are perfectly safe to run on a production system. One other thing you need to do is enable trace flag 3604 to allow DBCC PAGE to display the result in the console rather than put it into the SQL Server error log.</p>
<p><a href="http://dwkor.net/blog/2011-11-24/pic6.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-24/pic6.png" height="425" width="671" /></a></p>
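<p>In text form, the sequence of commands looks roughly like this. Both commands are undocumented; the database name and the page number below are placeholders that you need to replace with your own values from the DBCC IND output.</p>
<pre>
-- List the pages of the clustered index (index_id = 1).
-- 'YourDatabase' is a placeholder for the database name
dbcc ind('YourDatabase', 'dbo.UniqueCI', 1);

-- Send DBCC PAGE output to the console instead of the error log
dbcc traceon(3604);

-- Parameters: database, file id (PageFID), page id (PagePID), print option.
-- 12345 is a placeholder; print option 3 shows the row details
dbcc page('YourDatabase', 1, 12345, 3);
</pre>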
<p>So let&#8217;s take a look at the row itself. First, we need to remember that DBCC PAGE presents multi-byte values with the least-significant bytes first (for example, the int (4 bytes) value 0x00000001 would be presented as 0x01000000). Let&#8217;s take a look at the actual bytes.</p>
<ul>
<li>Byte 0 (TagByteA) (green underline) - this is the bit mask and in our case 0x10 means that there is a NULL bitmap</li>
<li>Byte 1 (TagByteB) - not important in our case</li>
<li>Bytes 2 and 3 (yellow underline) - fixed-width data size</li>
<li>Bytes 4+ store the actual fixed-width data. You can see the values 0x01 (0x01000000 in reverse order) for KeyValue and ID and &#8216;a..&#8217; for the CharData column.</li>
</ul>
<p>But much more interesting is what we have after the fixed-width data block. Let&#8217;s take a look:</p>
<p><a href="http://dwkor.net/blog/2011-11-24/pic7.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-24/pic7.png" height="276" width="539" /></a></p>
<p>Here we have:</p>
<ul>
<li>Number of columns - 2 bytes (Red underline): 0x0300 in reverse order - 3 columns that we have in the table.</li>
<li> Null bitmap (Blue underline): 1 byte in our case - no nullable columns - 0.</li>
</ul>
<p>So everything is simple and straightforward. Now let&#8217;s take a look at the NonUniqueCINoDups data. Again, first we need to find the page id with DBCC IND and then call DBCC PAGE.<br />
<img src="http://dwkor.net/blog/2011-11-24/pic8.png" height="260" width="421" /></p>
<p>I&#8217;m omitting the first part of the row - it is exactly the same as in the UniqueCI row. Let&#8217;s take a look at the data after the fixed-width block.</p>
<p><a href="http://dwkor.net/blog/2011-11-24/pic9.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-24/pic9.png" height="361" width="529" /></a></p>
<p>As you can see, the number of columns (Red underline) is now 4, which includes the uniquifier that does not take any physical space. And if you think about it for a minute - yes, the uniquifier is a nullable int column that is stored in the variable-width section of the row. SQL Server omits data for nullable variable-width columns that are the last in the variable-width section, which is the case here.</p>
<p>And now let&#8217;s take a look at NonUniqueCIDups rows. Again, DBCC IND, DBCC PAGE.</p>
<p><img src="http://dwkor.net/blog/2011-11-24/pic10.png" height="276" width="419" /></p>
<p>If we look at the variable-width section of the first row in the duplication sequence, it is exactly the same as in NonUniqueCINoDups, i.e. the uniquifier does not take any space.</p>
<p><a href="http://dwkor.net/blog/2011-11-24/pic11.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-24/pic11.png" height="336" width="540" /></a></p>
<p>But let&#8217;s look at the second row.</p>
<p><a href="http://dwkor.net/blog/2011-11-24/pic12.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-24/pic12.png" height="327" width="548" /></a></p>
<p>Again we have:</p>
<ul>
<li>Number of columns - 2 bytes (Red underline): 4</li>
<li>Null bitmap (Blue underline)</li>
<li>Number of variable-width columns - 2 bytes (Green underline) - 0x0100 in reverse order - 1 variable-width column</li>
<li>Offset of variable-width column 1 - 2 bytes (Black underline)</li>
<li>Uniquifier value - 4 bytes (Purple underline)</li>
</ul>
<p>As you can see, it introduces 8 bytes of overhead in total.</p>
<p>To summarize storage-wise - if the clustered index is not unique, then for unique values of the clustered key:</p>
<ul>
<li>There is no overhead if the row doesn&#8217;t have variable-width columns or all variable-width columns are null</li>
<li>There are 2 bytes of overhead (variable-offset array) if there is at least 1 variable-width column that stores a non-null value</li>
</ul>
<p>For non-unique values of the clustered key:</p>
<ul>
<li>There are 8 extra bytes if the row does not have variable-width columns</li>
<li>There are 6 extra bytes if the row has variable-width columns</li>
</ul>
<p>This applies not only to clustered indexes but also to non-clustered indexes that reference the clustered index key values. Well, storage is cheap but IO is not.</p>
<p>Source code is available for <a href="http://dwkor.net/blog/2011-11-24/aboutsqlserver(2011-11-24).sql">download</a></p>
<p>P.S. Happy Thanksgiving! <img src='http://aboutsqlserver.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2011/11/24/a-few-more-words-about-uniquifiers-and-uniqueness-of-the-clustered-index/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Locking in Microsoft SQL Server (Part 11 - Deadlocks due multiple updates of the same row)</title>
		<link>http://aboutsqlserver.com/2011/11/09/locking-in-microsoft-sql-server-part-11-deadlocks-due-multiple-updates-of-the-same-row/</link>
		<comments>http://aboutsqlserver.com/2011/11/09/locking-in-microsoft-sql-server-part-11-deadlocks-due-multiple-updates-of-the-same-row/#comments</comments>
		<pubDate>Thu, 10 Nov 2011 02:23:54 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[SQL Server 2005]]></category>

		<category><![CDATA[SQL Server 2008]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2011/11/09/locking-in-microsoft-sql-server-part-11-deadlocks-due-multiple-updates-of-the-same-row/</guid>
		<description><![CDATA[We all already know that in most cases deadlocks happen due to non-optimized queries. Today I&#8217;d like to show another pattern that could lead to deadlocks. It&#8217;s not something that happens very often, but it&#8217;s worth mentioning.
Let&#8217;s think about the following scenario. Assuming you have the system that collects some data [...]]]></description>
			<content:encoded><![CDATA[<p>We all already know that in most cases deadlocks happen due to <a href="http://aboutsqlserver.com/2011/06/09/locking-in-microsoft-sql-server-part-5-why-do-we-have-deadlocks/" target="_blank">non-optimized queries</a>. Today I&#8217;d like to show another pattern that could lead to deadlocks. It&#8217;s not something that happens very often, but it&#8217;s worth mentioning.</p>
<p>Let&#8217;s think about the following scenario. Assume you have a system that collects some data from the users, and that the data has a few parts that can be processed and saved independently from each other. Also let&#8217;s assume that there is some processing involved - let&#8217;s say there is a raw data part and something the system needs to calculate based on that.</p>
<p>One of the approaches to architecting such a system is to separate those updates and processing into different threads/sessions. It could make sense in some cases - the data is independent, and threads and sessions would update different columns, so even if they start updating the row simultaneously, in the worst case one session would be blocked for some time. There is nothing terribly wrong with that as long as there are no multiple updates of the same row involved. Let&#8217;s take a look.</p>
<p>First, let&#8217;s create the table and populate it with some data:</p>
<p><img src="http://dwkor.net/blog/2011-11-10/pic1.png" height="558" width="291" /></p>
<p>Now let&#8217;s run the first session, open a transaction and update the RawData1 column. Also, let&#8217;s check the plan. This update statement uses a non-clustered index seek/key lookup - keep this in mind, it will be important later.</p>
<p><img src="http://dwkor.net/blog/2011-11-10/pic2.png" height="285" width="524" /></p>
<p>Now let&#8217;s run the second session, which updates a different column on the same row. Obviously this session is blocked - the first session holds an (X) lock on the row.</p>
<p><img src="http://dwkor.net/blog/2011-11-10/pic3.png" height="139" width="335" /></p>
<p>Now let&#8217;s come back to the first session and try to update another column on the same row. This is the same session that holds the (X) lock on the row, so it should not be a problem.</p>
<p><img src="http://dwkor.net/blog/2011-11-10/pic4.png" height="121" width="231" /></p>
<p>But... we have a deadlock.<br />
<img src="http://dwkor.net/blog/2011-11-10/pic5.png" height="67" width="590" /></p>
<p>Why? Let&#8217;s take a look at the deadlock graph (click to open it in a new window).</p>
<p><a href="http://dwkor.net/blog/2011-11-10/pic6.png" target="_blank"><img src="http://dwkor.net/blog/2011-11-10/pic6.png" height="314" width="1134" /></a></p>
<p>On the right we have the first session. This session holds the (X) lock on the clustered index row (PK_Users). When we ran the session 2 statement, that session obtained a (U) lock on the non-clustered index row (IDX_Users_ExternalID), requested a (U) lock on the clustered index row and was blocked by the first session&#8217;s (X) lock. Then, when we ran the second update statement from the first session, it requested a (U) lock on the non-clustered index row and was blocked because the second session still holds its (U) lock there. Classic deadlock.</p>
<p>As you can see, it happened because SQL Server uses a non-clustered index seek/key lookup plan. Without the non-clustered index seek everything would work just fine.</p>
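<p>Since the statements above are shown only as screenshots, here is a rough text sketch of the same sequence. The names PK_Users, IDX_Users_ExternalID, UserId, ExternalId and RawData1 come from this post; the remaining columns (RawData2, ProcessedData1), the data types and the sample values are my assumptions. Whether the deadlock actually reproduces depends on the optimizer choosing the non-clustered index seek.</p>
<pre>
-- Sketch of the scenario. Columns other than UserId, ExternalId and RawData1,
-- as well as all data types and values, are hypothetical.
create table dbo.Users
(
    UserId int not null identity(1,1),
    ExternalId int not null,
    RawData1 varchar(100) null,
    RawData2 varchar(100) null,
    ProcessedData1 varchar(100) null,
    constraint PK_Users primary key clustered(UserId)
);

create unique nonclustered index IDX_Users_ExternalID
on dbo.Users(ExternalId);

insert into dbo.Users(ExternalId, RawData1) values(1001, 'initial');
go

-- Session 1, step 1: finds the row through IDX_Users_ExternalID and
-- keeps the (X) lock on the clustered index row until the transaction ends.
begin tran;

update dbo.Users set RawData1 = 'raw value 1' where ExternalId = 1001;

-- Session 2 (run from another connection while the transaction above is open):
-- acquires (U) on the non-clustered index row, then waits for the clustered index row.
--
--     update dbo.Users set ProcessedData1 = 'calculated' where ExternalId = 1001;

-- Session 1, step 2: needs (U) on the non-clustered index row that session 2 holds.
-- With session 2 waiting, this is the statement that closes the deadlock cycle.
update dbo.Users set RawData2 = 'raw value 2' where ExternalId = 1001;

commit;
</pre>
<p>Run as a single script in one connection, this completes without errors; the deadlock needs the commented session 2 statement to be started from a second connection between the two updates.</p>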
<p>This is quite an interesting scenario, and you can argue that it does not happen often in real life. Well, yes and no. If we think about 2 update statements in a row - yes, usually we don&#8217;t write code that way. But think about stored procedures. If the processing can be done/called from a few different places, you can decide to put the update into a stored procedure. And here you go.</p>
<p>But most importantly, there are triggers. What if you have an AFTER UPDATE trigger and want to update some columns from there? Something like this:</p>
<p><img src="http://dwkor.net/blog/2011-11-10/pic7.png" height="329" width="275" /></p>
<p>Now let&#8217;s run the update statement in the first session.<br />
<img src="http://dwkor.net/blog/2011-11-10/pic8.png" height="150" width="331" /></p>
<p>And in the second session.</p>
<p><img src="http://dwkor.net/blog/2011-11-10/pic9.png" height="241" width="365" /></p>
<p>Deadlock again. You may notice that I used ExternalId in the trigger and, as a result, got the non-clustered index seek/key lookup plan there. It does not make a lot of sense in this scenario - I could use UserId and avoid the problem. So if you have to update the original row from a trigger, be careful and write the query in a way that produces a clustered index seek.</p>
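<p>A minimal sketch of the safer trigger shape could look like this. It is not the trigger from the screenshot: the trigger name, the ProcessedData1 column and the &#8220;calculation&#8221; are hypothetical, and the table definition is only there to make the script self-contained. The point is the join on UserId, the clustered index key, instead of ExternalId.</p>
<pre>
-- Sketch only. The table is created here just so the script can run on its own;
-- column names other than UserId, ExternalId and RawData1 are hypothetical.
if object_id(N'dbo.Users') is null
    create table dbo.Users
    (
        UserId int not null identity(1,1),
        ExternalId int not null,
        RawData1 varchar(100) null,
        RawData2 varchar(100) null,
        ProcessedData1 varchar(100) null,
        constraint PK_Users primary key clustered(UserId)
    );
go

create trigger dbo.trg_Users_AfterUpdate on dbo.Users
after update
as
begin
    set nocount on;

    -- Nothing to recalculate unless the raw data column was part of the update.
    if not update(RawData1)
        return;

    -- Join on the clustered index key (UserId) so the row is located through
    -- the clustered index, where this session already holds the (X) lock.
    update u
    set ProcessedData1 = upper(i.RawData1)   -- placeholder for the real calculation
    from dbo.Users u
        join inserted i on u.UserId = i.UserId;
end
go
</pre>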
<p>Source code is available for <a href="http://dwkor.net/blog/2011-11-10/aboutsqlserver(2011-11-10).sql">download</a></p>
<p><a href="http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/">Part 12 - Lock Escalation</a></p>
<p><a href="http://aboutsqlserver.com/2011/09/28/locking-in-microsoft-sql-server-table-of-content/">Table of content</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2011/11/09/locking-in-microsoft-sql-server-part-11-deadlocks-due-multiple-updates-of-the-same-row/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Sunday T-SQL Tip: Inline vs. Multi-statement Table Valued Functions</title>
		<link>http://aboutsqlserver.com/2011/10/23/sunday-t-sql-tip-inline-vs-multi-statement-table-valued-functions/</link>
		<comments>http://aboutsqlserver.com/2011/10/23/sunday-t-sql-tip-inline-vs-multi-statement-table-valued-functions/#comments</comments>
		<pubDate>Sun, 23 Oct 2011 14:13:33 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[SQL Server 2005]]></category>

		<category><![CDATA[SQL Server 2008]]></category>

		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2011/10/23/sunday-t-sql-tip-inline-vs-multi-statement-table-valued-functions/</guid>
		<description><![CDATA[One of the biggest challenges for the developers who are not familiar with T-SQL is understanding the conceptual difference between client side and T-SQL functions. T-SQL functions look very similar to the functions developed with high-level programming languages. While encapsulation and code reuse are very important patterns there, it could hurt database code badly.
There are 2 [...]]]></description>
			<content:encoded><![CDATA[<p>One of the biggest challenges for developers who are not familiar with T-SQL is understanding the conceptual difference between client-side and T-SQL functions. T-SQL functions look very similar to functions developed with high-level programming languages. While encapsulation and code reuse are very important patterns there, they can hurt database code badly.</p>
<p>There are 2 kinds of functions in Microsoft SQL Server that can return a table result set: the good one and the bad one. Unfortunately the bad one is much easier to use and understand for people who are used to working with high-level languages.</p>
<p>Let&#8217;s take a look. First, let&#8217;s create 2 tables and populate them with data. Don&#8217;t pay much attention to how good the data is or how logically correct the statements are - we&#8217;re talking about performance here.</p>
<p><img src="http://dwkor.net/blog/2011-10-23/pic1.png" height="541" width="342" /></p>
<p><img src="http://dwkor.net/blog/2011-10-23/pic2.png" /></p>
<p>Now let&#8217;s create the multi-statement function and run it. As you can see, the total execution time is 176 milliseconds in my environment.</p>
<p><img src="http://dwkor.net/blog/2011-10-23/pic3.png" height="601" width="471" /></p>
<p>Now let&#8217;s do the inline function. We need to change the original select statement and use <a href="http://aboutsqlserver.com/2010/10/10/sunday-t-sql-tip-apply-operator/" target="_blank">cross apply</a> here. It looks more complex, but in the end the execution time is 106 milliseconds - about 40 percent faster.</p>
<p><img src="http://dwkor.net/blog/2011-10-23/pic4.png" height="568" width="476" /></p>
<p>Now let&#8217;s check the execution plans. As you can see, the first plan (multi-statement) is very simple - CI scan + aggregate. The second one (inline) is a much more complicated execution plan. It is also worth noticing that SQL Server attributes all of the cost to the second plan.</p>
<p><a href="http://dwkor.net/blog/2011-10-23/pic5.png" target="_blank"><img src="http://dwkor.net/blog/2011-10-23/pic5.png" height="347" width="849" /></a></p>
<p>How could that happen? How could a less expensive and simpler plan run slower? The answer is that SQL Server lies - it does not show the multi-statement function executions there at all. Let&#8217;s run the profiler and capture the SP:Starting event.<br />
<a href="http://dwkor.net/blog/2011-10-23/pic6.png" target="_blank"><img src="http://dwkor.net/blog/2011-10-23/pic6.png" height="400" width="667" /></a></p>
<p>As you can see, the multi-statement function introduces an SP call for each row processed. Think about all the overhead related to that. Inline functions work similarly to C++ inline functions - they are &#8220;embedded&#8221; into the execution plan and don&#8217;t carry any SP call overhead.</p>
<p>So the bottom line: don&#8217;t use multi-statement functions if you can avoid them. I&#8217;m going to start a set of posts related to CTEs and will show how you can convert very complex multi-statement functions to inline ones.</p>
<p>Source code is available for <a href="http://dwkor.net/blog/2011-10-23/aboutsqlserver(2011-10-23).sql">download</a></p>
<p><strong>Update (2011-12-18):</strong><br />
As Chirag Shah mentioned in the comments, my example above is not 100% valid: I demonstrated the difference between an inline TVF and a scalar multi-statement function. So let&#8217;s correct that and run the test again. (Image is clickable)</p>
<p><a href="http://dwkor.net/blog/2011-10-23/pic7.png" target="_blank"><img src="http://dwkor.net/blog/2011-10-23/pic7.png" height="919" width="523" /></a></p>
<p>As you can see, the results are even worse. The main point I want to stress: as long as the UDF body has begin/end keywords, SQL Server treats it similarly to a stored procedure. And that hurts.</p>
<p>Source code has been updated to include the last example</p>
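<p>For readers who cannot open the screenshots, the three flavours discussed in this post can be sketched as follows. This is not the downloadable script: the <em>Orders</em> and <em>OrderItems</em> tables and all function names are hypothetical and only illustrate the shape of each function. The first two have a begin/end body; only the last one is inline.</p>
<pre>
-- Hypothetical schema, used only to illustrate the three kinds of functions.
create table dbo.Orders
(
    OrderId int not null,
    OrderDate datetime not null,
    constraint PK_Orders primary key clustered(OrderId)
);

create table dbo.OrderItems
(
    OrderId int not null,
    ItemId int not null,
    Amount money not null,
    constraint PK_OrderItems primary key clustered(OrderId, ItemId)
);
go

-- 1. Scalar function with a begin/end body.
create function dbo.GetOrderTotalScalar(@OrderId int)
returns money
as
begin
    return (select sum(Amount) from dbo.OrderItems where OrderId = @OrderId);
end
go

-- 2. Multi-statement table-valued function: also has a begin/end body
--    and fills a table variable before returning it.
create function dbo.GetOrderTotalMulti(@OrderId int)
returns @Result table (Total money null)
as
begin
    insert into @Result(Total)
        select sum(Amount) from dbo.OrderItems where OrderId = @OrderId;
    return;
end
go

-- 3. Inline table-valued function: a single select with no begin/end,
--    which is expanded into the plan of the calling query.
create function dbo.GetOrderTotalInline(@OrderId int)
returns table
as
return
(
    select sum(Amount) as Total
    from dbo.OrderItems
    where OrderId = @OrderId
);
go

-- Calling each of them for every order:
select o.OrderId, dbo.GetOrderTotalScalar(o.OrderId) as Total
from dbo.Orders o;

select o.OrderId, t.Total
from dbo.Orders o cross apply dbo.GetOrderTotalMulti(o.OrderId) t;

select o.OrderId, t.Total
from dbo.Orders o cross apply dbo.GetOrderTotalInline(o.OrderId) t;
</pre>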
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2011/10/23/sunday-t-sql-tip-inline-vs-multi-statement-table-valued-functions/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Sunday T-SQL Tip: Merge into CTE as the Target</title>
		<link>http://aboutsqlserver.com/2011/10/09/sunday-t-sql-tip-merge-into-cte-as-the-target/</link>
		<comments>http://aboutsqlserver.com/2011/10/09/sunday-t-sql-tip-merge-into-cte-as-the-target/#comments</comments>
		<pubDate>Mon, 10 Oct 2011 01:47:50 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[SQL Server 2008]]></category>

		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2011/10/09/sunday-t-sql-tip-merge-into-cte-as-the-target/</guid>
		<description><![CDATA[If you have spent some time working with Microsoft SQL Server 2008, you should be aware of the Merge statement. This statement not only allows you to insert/update/delete data as part of a single statement (which helps with locking and performance), it also gives you the ability to intercept column values from the source [...]]]></description>
			<content:encoded><![CDATA[<p>If you have spent some time working with Microsoft SQL Server 2008, you should be aware of the <a href="http://technet.microsoft.com/en-us/library/bb510625.aspx" target="_blank">Merge</a> statement. This statement not only allows you to insert/update/delete data as part of a single statement (which helps with locking and performance), it also gives you the ability to <a href="http://aboutsqlserver.com/2010/10/24/sunday-t-sql-tips-inserted-deleted-tables-and-output-clause-part-2-merge-statement/" target="_blank">intercept</a> column values from the source rowset - something you cannot do with the regular OUTPUT clause of insert, update and delete statements.</p>
<p>Today I&#8217;d like to show you another hidden beauty of this statement - the ability to use a CTE as the Target. Basically it lets you execute merge against a subset of the data from the table. There are quite a few cases where it could be beneficial - think about the situation when you need to synchronize the target with a source that contains data only for a subset of the target rows. Confusing? Let&#8217;s look at one real-life example.</p>
<p>Let&#8217;s think about an order entry system and assume that you want to have a cache that stores information about the last 15 orders per customer. Let&#8217;s create the table and populate it with some data.</p>
<p><img src="http://dwkor.net/blog/2011-10-09/pic1.png" height="543" width="364" /></p>
<p>In this example orders are partitioned by customer and sorted by ID, so a bigger ID means a more recent order. As you can see, you have 100 customers with 15 orders each in the cache.</p>
<p><img src="http://dwkor.net/blog/2011-10-09/pic2.png" height="456" width="300" /></p>
<p>Let&#8217;s assume that every day you get the data about the new orders placed into the system. This data contains orders for a subset of the customers (obviously some customers don&#8217;t place orders that day). It could also contain orders from new customers that you don&#8217;t have in the cache. Let&#8217;s create the table:</p>
<p><img src="http://dwkor.net/blog/2011-10-09/pic3.png" height="563" width="396" /></p>
<p>As you can see, in this example we added 10 orders per customer for 21 old customers (CustomerIds from 80 to 100) and also added 10 new customers (CustomerIds from 101 to 110).</p>
<p><img src="http://dwkor.net/blog/2011-10-09/pic4.png" height="436" width="297" /></p>
<p>What we want in the end is to update the cache for existing customers (delete their 10 oldest orders) and add the new customers to the cache. Obviously we don&#8217;t want to touch customers who did not submit any orders during the day.</p>
<p>The Merge statement works perfectly here. However, if we use the Data table as the target, we will have a hard time distinguishing the customers who didn&#8217;t submit any data. Fortunately we can write a CTE that filters out customers who don&#8217;t have any orders today and use it as the target. Let&#8217;s take a look:</p>
<p><img src="http://dwkor.net/blog/2011-10-09/pic5.png" height="663" width="473" /></p>
<p>So, the first CTE - <em>SourceData</em> - does the trick: it filters out everybody who doesn&#8217;t have new orders. This will be our <em>Target</em>. Now let&#8217;s prepare the <em>Source</em>. The first thing we need to do is combine data from the cache with the new data - the <em>MergedData</em> CTE does that. As a result of this CTE we&#8217;ll have all old and new orders combined for the customers who submitted orders today. Next, we need to determine the 15 most recent orders - basically, let&#8217;s number the <em>MergedData</em> rows (using ROW_NUMBER()) per customer by ID in descending order. This is the <em>SortedData</em> CTE. And now we can use the first 15 rows per customer from this CTE as the <em>Source</em>.</p>
<p>The trick is what to do next. If there is an order in <em>SourceData</em> that is not in the <em>Source</em> (top 15 from <em>SortedData</em>), it means the order is old and we need to delete it from the cache. &#8220;<em>When not matched by source</em>&#8221; does that. If an order is in the <em>Source</em> but not in the cache, we need to insert it (&#8220;<em>when not matched by Target</em>&#8221;). Obviously, if an order is in both places, we should ignore it. And now, if you think about the <em>SourceData</em> CTE, which is the Target for the merge, it makes perfect sense. If you used the <em>dbo.Data</em> table there, all orders from the customers who did not submit data today would not be matched by the <em>Source</em> and would be deleted. So using the CTE as the <em>Target</em> takes care of that.</p>
<p>If you look at the data, you&#8217;ll see that new customers (CustomerID &gt; 100) have 10 rows in the cache with IDs starting at 16. Old customers who submitted data today (CustomerID: 80..100) have their last 15 orders, with IDs from 11 to 25. And the data of the old customers who did not submit anything (CustomerID &lt; 80) is intact.</p>
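<p>In text form, the statement described above could be sketched roughly like this. The CTE names and <em>dbo.Data</em> come from this post; the <em>dbo.NewData</em> table name and the reduction of both tables to just (CustomerId, ID) are my simplifications, so treat it as an outline of the technique rather than the exact script from the screenshot.</p>
<pre>
-- Sketch only. dbo.NewData is a hypothetical name for the table with today's orders,
-- and both tables are reduced to the two columns needed for the illustration.
create table dbo.Data
(
    CustomerId int not null,
    ID int not null,
    constraint PK_Data primary key clustered(CustomerId, ID)
);

create table dbo.NewData
(
    CustomerId int not null,
    ID int not null,
    constraint PK_NewData primary key clustered(CustomerId, ID)
);
go

;with SourceData(CustomerId, ID)
as
(
    -- Target of the merge: cached orders only for customers who submitted orders today.
    select d.CustomerId, d.ID
    from dbo.Data d
    where exists
    (
        select * from dbo.NewData n where n.CustomerId = d.CustomerId
    )
)
,MergedData(CustomerId, ID)
as
(
    -- Old and new orders together; assumes new orders are not in the cache yet.
    select CustomerId, ID from dbo.NewData
    union all
    select CustomerId, ID from SourceData
)
,SortedData(CustomerId, ID, RowNum)
as
(
    -- Number orders per customer, most recent (largest ID) first.
    select
        CustomerId, ID,
        row_number() over (partition by CustomerId order by ID desc)
    from MergedData
)
merge into SourceData as Target
using
(
    -- Source: the 15 most recent orders per customer.
    select CustomerId, ID from SortedData where RowNum between 1 and 15
) as Source
    on Target.CustomerId = Source.CustomerId and Target.ID = Source.ID
when not matched by target then
    insert(CustomerId, ID) values(Source.CustomerId, Source.ID)
when not matched by source then
    delete;
</pre>
<p>Because the target is the filtered CTE and not the whole table, the delete branch can only touch rows of customers who appear in today&#8217;s data.</p>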
<p><img src="http://dwkor.net/blog/2011-10-09/pic6.png" height="680" width="296" /></p>
<p>Source code is available for <a href="http://dwkor.net/blog/2011-10-09/aboutsqlserver(2011-10-09).sql">download</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2011/10/09/sunday-t-sql-tip-merge-into-cte-as-the-target/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Locking in Microsoft SQL Server (Table of Content)</title>
		<link>http://aboutsqlserver.com/2011/09/28/locking-in-microsoft-sql-server-table-of-content/</link>
		<comments>http://aboutsqlserver.com/2011/09/28/locking-in-microsoft-sql-server-table-of-content/#comments</comments>
		<pubDate>Wed, 28 Sep 2011 16:41:14 +0000</pubDate>
		<dc:creator>Dmitri Korotkevitch</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://aboutsqlserver.com/2011/09/28/locking-in-microsoft-sql-server-table-of-content/</guid>
		<description><![CDATA[Just to make it simpler to navigate:

Part 1: Major Lock Types
Part 2: Locks and Transaction Isolation Levels
Part 3: Blocking in the system
Part 4: How to detect blocking
Part 5: Deadlocks
Part 6: How to troubleshoot deadlocks
Part 7: (Interlude) - Read Committed - duplicate reading
Part 8: Optimistic isolation levels
Part 9: Optimistic isolation levels - TANSTAAFL!
Part 10: What isolation [...]]]></description>
			<content:encoded><![CDATA[<p>Just to make it simpler to navigate:</p>
<ul>
<li><a href="http://aboutsqlserver.com/2011/04/14/locking-in-microsoft-sql-server-part-1-lock-types/" target="_blank">Part 1: Major Lock Types</a></li>
<li><a href="http://aboutsqlserver.com/2011/04/28/locking-in-microsoft-sql-server-part-2-locks-and-transaction-isolation-levels/" target="_blank">Part 2: Locks and Transaction Isolation Levels</a></li>
<li><a href="http://aboutsqlserver.com/2011/05/12/locking-in-microsoft-sql-server-part-3-blocking-in-the-system/" target="_blank">Part 3: Blocking in the system</a></li>
<li><a href="http://aboutsqlserver.com/2011/05/26/locking-in-microsoft-sql-server-part-4-how-to-detect-blocking/" target="_blank">Part 4: How to detect blocking</a></li>
<li><a href="http://aboutsqlserver.com/2011/06/09/locking-in-microsoft-sql-server-part-5-why-do-we-have-deadlocks/" target="_blank">Part 5: Deadlocks</a></li>
<li><a href="http://aboutsqlserver.com/2011/06/23/locking-in-microsoft-sql-server-part-6-how-to-troubleshoot-deadlocks/" target="_blank">Part 6: How to troubleshoot deadlocks</a></li>
<li><a href="http://aboutsqlserver.com/2011/08/04/locking-in-microsoft-sql-server-part-7-read-committed-duplicate-readings/" target="_blank">Part 7: (Interlude) - Read Committed - duplicate reading</a></li>
<li><a href="http://aboutsqlserver.com/2011/08/25/locking-in-microsoft-sql-server-part-8-optimistic-transaction-isolation-levels/" target="_blank">Part 8: Optimistic isolation levels</a></li>
<li><a href="http://aboutsqlserver.com/2011/09/08/locking-in-microsoft-sql-server-part-9-optimistic-transaction-isolation-levels-tanstaafl/" target="_blank">Part 9: Optimistic isolation levels - TANSTAAFL!</a></li>
<li><a href="http://aboutsqlserver.com/2011/09/26/locking-in-microsoft-sql-server-part-10-what-isolation-level-should-i-choose/" target="_blank">Part 10: What isolation level should I choose?</a></li>
</ul>
<p>Additional:</p>
<ul>
<li><a href="http://aboutsqlserver.com/2011/11/09/locking-in-microsoft-sql-server-part-11-deadlocks-due-multiple-updates-of-the-same-row/" target="_blank">Part 11: Deadlocks due multiple updates of the same row</a></li>
<li><a href="http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/" target="_blank">Part 12: Lock Escalation</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://aboutsqlserver.com/2011/09/28/locking-in-microsoft-sql-server-table-of-content/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

