CHAPTER 7: SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS

…server. For a further discussion of the correct placement of data and application logic, refer back to Chapter 1.

Examples of commonly cited situations in which SQLCLR is perhaps a better choice than T-SQL include manipulation of string or XML data, certain math functions that are provided by dedicated methods in the .NET Base Class Library, and situations where procedural code is more efficient than set-based logic. In order to test the validity of these claims, I decided to set up some simple test cases to compare the relative performance of T-SQL against SQLCLR, which are described in the following sections.

Creating a "Simple Sieve" for Prime Numbers

For this test, I created two simple procedures that return a list of all prime numbers up to a supplied maximum value: one implemented in T-SQL, and one using SQLCLR. The logic of these tests was made as simple as possible: each procedure is supplied with a maximum value that is decremented in a loop, and in each iteration of the loop, the modulo operator is used to determine the remainder when that value is divided by every lesser number. If the remainder of the division is 0 (in other words, we have found a factor), we know that the current value is not a prime number, and the loop moves on to test the next possible value. If the inner loop tests every possible divisor and has not found any factors, then we know the value must be a prime. This kind of "simple sieve" algorithm for finding prime numbers relies on basic mathematical functions and procedural logic, which makes it a good test for comparing the performance of T-SQL and SQLCLR.

Here's the T-SQL implementation:

CREATE PROCEDURE ListPrimesTSQL
(
    @Limit int
)
AS
BEGIN
    DECLARE
        -- @n is the number we're testing to see if it's a prime
        @n int = @Limit,
        -- @m is all the possible numbers that could be a factor of @n
        @m int = @Limit - 1;

    -- Loop descending through the candidate primes
    WHILE (@n > 1)
    BEGIN
        -- Loop descending through the candidate factors
        WHILE (@m > 0)
        BEGIN
            -- We've got all the way to 2 and haven't found any factors
            IF (@m = 1)
            BEGIN
                PRINT CAST(@n AS varchar(32)) + ' is a prime';
                BREAK;
            END

            -- Is this @m a factor of this prime?
            IF (@n % @m) <> 0
            BEGIN
                -- Not a factor, so move on to the next @m
                SET @m = @m - 1;
                CONTINUE;
            END
            ELSE
                BREAK;
        END

        SET @n = @n - 1;
        SET @m = @n - 1;
    END
END;
GO

And here's the SQLCLR implementation, using exactly the same logic:

[SqlProcedure]
public static void ListPrimesCLR(SqlInt32 Limit)
{
    int n = (int)Limit;
    int m = (int)Limit - 1;

    while (n > 1)
    {
        while (m > 0)
        {
            // We've got all the way to 2 and haven't found any factors
            // (n % 1 is always 0, so the else branch below breaks out)
            if (m == 1)
            {
                SqlContext.Pipe.Send(n.ToString() + " is a prime");
            }

            if (n % m != 0)
            {
                // Not a factor, so move on to the next m
                m = m - 1;
                continue;
            }
            else
            {
                break;
            }
        }
        n = n - 1;
        m = n - 1;
    }
}

Note: Clearly, if you actually wanted to get a list of the prime numbers, you would NOT use such a naive approach as this. The example used here is intended to provide a simple procedure that can be implemented consistently across both T-SQL and SQLCLR.
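To gauge the relative performance yourself, a simple timing harness can be wrapped around each procedure. The following is a minimal sketch of my own, assuming that both procedures have already been cataloged in the current database (the CLR version via CREATE PROCEDURE ... EXTERNAL NAME) as dbo.ListPrimesTSQL and dbo.ListPrimesCLR; the limit of 5000 is an arbitrary illustrative value:

-- Minimal timing sketch; procedure names and limit are assumptions.
-- Note that the PRINT/Pipe.Send output is included in the timings.
DECLARE @start datetime2 = SYSDATETIME();

EXEC dbo.ListPrimesTSQL @Limit = 5000;
SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS tsql_ms;

SET @start = SYSDATETIME();

EXEC dbo.ListPrimesCLR @Limit = 5000;
SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS sqlclr_ms;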
I tested each solution several times, supplying different values for the maximum limit from which the loop starts. The average execution time for each solution is shown in the graph illustrated in Figure 7-1.

Figure 7-1. Comparison of prime number sieve implemented in T-SQL and SQLCLR

The results should come as no surprise: since the approach taken relies on mathematical operations in an iterative loop, SQLCLR is always likely to outperform set-based T-SQL. However, you might be surprised by the magnitude of the difference between the two solutions, especially as the number of iterations increases. If we were to compare simple inline, or nonprocedural, calculations, there would likely not be such a stark contrast between the two methods.

Calculating Running Aggregates

Few practical database applications need to produce a list of prime numbers. A more common type of mathematical query operation that might benefit from the use of SQLCLR is one in which you need to calculate a value in a row based on the values of previous rows' data. The most common example of such a linear query is the calculation of aggregates, such as running sums of columns. The typical approach using T-SQL is to make use of a self-join on the table, such as follows:

SELECT
    T1.x,
    SUM(T2.x) AS running_x
FROM T AS T1
INNER JOIN T AS T2 ON T1.x >= T2.x
GROUP BY T1.x;

Unfortunately, the process required to satisfy this query is not very efficient. Assuming that an index exists on the x column of the table, the preceding query generates the execution plan shown in Figure 7-2.

Figure 7-2. A nested index seek used to create a running sum in T-SQL

To sum all of the previous values in the column requires a nested loop containing an index seek. The number of rows returned by this seek grows with every row processed, so the total work grows quadratically. On the first row, the seek must sum only one value, but to find the running sum over a set of 100 rows, 5,050 total rows need to be read (1 + 2 + ... + 100). For a set of 200 rows, the query processor needs to process 20,100 total rows: four times the amount of work required to satisfy the previous query. The performance of this approach to calculating running aggregates therefore degrades rapidly as more rows are added to the table.

An alternative solution, which can yield some significant performance benefits, is to make use of a cursor. There is a commonly held perception in the SQL Server development world that cursors are a bad thing, but they do have valid use cases, and this might be one of them. However, there are a number of good reasons why many developers are reluctant to use cursors, and I'm certainly not advocating their use in general. A better approach is to use SQLCLR to loop through the rows, holding the running values in local variables, and then stream the results back one row at a time via the SqlPipe. An example of such a solution is given in the following code listing:

[Microsoft.SqlServer.Server.SqlProcedure]
public static void RunningSum()
{
    using (SqlConnection conn =
        new SqlConnection("context connection=true;"))
    {
        SqlCommand comm = new SqlCommand();
        comm.Connection = conn;
        comm.CommandText = "SELECT x FROM T ORDER BY x";

        SqlMetaData[] columns = new SqlMetaData[2];
        columns[0] = new SqlMetaData("Value", SqlDbType.Int);
        columns[1] = new SqlMetaData("RunningSum", SqlDbType.Int);

        int RunningSum = 0;

        SqlDataRecord record = new SqlDataRecord(columns);

        SqlContext.Pipe.SendResultsStart(record);

        conn.Open();

        SqlDataReader reader = comm.ExecuteReader();

        while (reader.Read())
        {
            int Value = (int)reader[0];
            RunningSum += Value;

            record.SetInt32(0, Value);
            record.SetInt32(1, RunningSum);

            SqlContext.Pipe.SendResultsRow(record);
        }

        SqlContext.Pipe.SendResultsEnd();
    }
}

I've used this solution on a number of occasions and found it to be very efficient and maintainable, and it avoids the need for any temp tables to hold the running sums, as required by the alternatives. When testing against a table containing 100,000 rows, I achieved an average execution time of 2.7 seconds for the SQLCLR query, compared to over 5 minutes for the T-SQL equivalent.
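For reference, here is a sketch of how a procedure like this might be cataloged and invoked from T-SQL. The assembly name, file path, and class name below are placeholders of my own (the original listing does not show them), so substitute the names used in your own project:

-- Hypothetical deployment script; assembly, path, and class names are assumed
CREATE ASSEMBLY RunningSumExample
FROM 'C:\Assemblies\RunningSumExample.dll'
WITH PERMISSION_SET = SAFE;
GO

CREATE PROCEDURE dbo.RunningSum
AS EXTERNAL NAME RunningSumExample.[StoredProcedures].RunningSum;
GO

-- Streams (Value, RunningSum) pairs back to the caller
EXEC dbo.RunningSum;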
String Manipulation

To compare the performance of string-handling functions between T-SQL and SQLCLR, I wanted to come up with a fair, practical test. The problem is that there are lots of ingenious techniques for working with string data: in T-SQL, some of the best-performing methods use one or more common table expressions (CTEs), CROSS APPLY operators, or number tables, or convert text strings to XML in order to perform nontrivial manipulation of character data. Likewise, in SQLCLR, the techniques available differ considerably depending on whether you use the native String methods or those provided by the StringBuilder class. I decided that, rather than try to define a scenario that required a string-handling technique, the only fair test was to perform a direct comparison of two built-in methods that provide equivalent functionality in either environment. I settled on T-SQL's CHARINDEX and .NET's String.IndexOf(), each of which searches for and returns the position of one string inside another string.

For the purposes of the test, I created nvarchar(max) strings of different lengths, each composed entirely of the repeating character a. I then appended a single character x onto the end of each string, and timed the performance of the respective methods in finding the position of that character over 10,000 iterations.

The following code listing demonstrates the T-SQL method:

CREATE PROCEDURE SearchCharTSQL
(
    @needle nchar(1),
    @haystack nvarchar(max)
)
AS
BEGIN
    PRINT CHARINDEX(@needle, @haystack);
END;

And here's the CLR equivalent:

[SqlProcedure]
public static void SearchCharCLR(SqlString needle, SqlString haystack)
{
    SqlContext.Pipe.Send(
        haystack.ToString().IndexOf(needle.ToString()).ToString()
    );
}

Note that the starting position reported by CHARINDEX is 1-based, whereas the index numbering used by IndexOf() is 0-based. The results of each method will therefore differ, but they will have done the same amount of work to obtain that result.
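As a quick illustration of that off-by-one difference (this snippet is mine, not from the original text):

-- CHARINDEX counts positions from 1...
SELECT CHARINDEX(N'x', N'aaax');  -- returns 4
-- ...whereas "aaax".IndexOf('x') in .NET returns 3, counting from 0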
I tested each procedure as follows, substituting different parameter values for the REPLICATE method to change the length of the string to be searched:

DECLARE @needle nvarchar(1) = 'x';
DECLARE @haystack nvarchar(max);

SELECT @haystack = REPLICATE(CAST('a' AS varchar(max)), 8000) + 'x';

EXEC dbo.SearchCharTSQL @needle, @haystack;

The execution times required for 10,000 runs of each method are shown in Figure 7-3.

Figure 7-3. Comparing performance of CHARINDEX against String.IndexOf()

As with the prime number sieve example given earlier, the logic required for string searching, matching, and replacing is best suited to the highly efficient routines provided by the .NET Base Class Library. If you currently have code logic that relies heavily on T-SQL string functionality, including CHARINDEX, PATINDEX, or REPLACE, I highly recommend that you investigate the alternative options available through SQLCLR; you might be surprised by the performance gain you achieve.

Enhancing Service Broker Scale-Out with SQLCLR

Having discussed some of the theory behind working with SQLCLR and given some isolated performance comparisons, let's now turn our attention to a more detailed example that puts these ideas into practice.

Service Broker is frequently mentioned as an excellent choice for helping to scale out database services. One of the more compelling use cases is a Service Broker service that can be used to asynchronously request data from a remote system. In such a case, a request message would be sent to the remote data service from a local stored procedure, which could do some other work while waiting for the response (the requested data) to come back. There are many ways to architect such a system, and given that Service Broker allows messages to be sent either as binary or XML, I wondered which would provide the best overall performance and value from a code reuse perspective. In the following sections, I'll guide you through my investigations into XML and binary serialization using SQLCLR.

XML Serialization

I started working with the HumanResources.Employee table from the AdventureWorks2008 database as a sample data set, imagining a remote data service requesting a list of employees along with their attributes. After some experimentation, I determined that the FOR XML RAW option is the easiest way to serialize a table in XML format, and I used the ROOT option to make the XML valid:

DECLARE @x xml;
SET @x = (
    SELECT *
    FROM HumanResources.Employee
    FOR XML RAW, ROOT('Employees')
);
GO

XML is, of course, known to be an extremely verbose data interchange format, and I was not surprised to discover that the data size of the resultant XML is 105KB, despite the fact that the HumanResources.Employee table itself contains only 56KB of data. I experimented with setting shorter column names, but it had very little effect on the size and created what I feel to be unmaintainable code.

Next, I set up a trace to gather some idea of the performance of the XML serialization (for more information on traces, refer to Chapter 3). The trace results revealed that the average execution time for the preceding query on my system, averaged over 1,000 iterations, was a decidedly unimpressive 3.9095 seconds per iteration.

After some trial and error, I discovered that XML serialization could be made to perform better by using the TYPE directive, as follows:

DECLARE @x xml;
SET @x = (
    SELECT *
    FROM HumanResources.Employee
    FOR XML RAW, ROOT('Employees'), TYPE
);
GO

This change brought the average time per iteration down slightly, to 3.6687 seconds: an improvement, but still not a very good result.
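If you want to check the serialized size on your own system, one simple approach (my own suggestion, not from the original text) is to cast the xml value to nvarchar(max) and measure it with DATALENGTH; measuring the xml value directly would instead report the size of SQL Server's internal binary XML representation, which can differ from the textual size:

DECLARE @x xml = (
    SELECT *
    FROM HumanResources.Employee
    FOR XML RAW, ROOT('Employees'), TYPE
);

-- Textual size in KB (nvarchar uses 2 bytes per character)
SELECT DATALENGTH(CAST(@x AS nvarchar(max))) / 1024.0 AS text_size_kb;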
XML Deserialization

Even though XML serialization had not yielded impressive performance, I decided to carry on and test deserialization. The first problem was the code required to deserialize the XML back into a table. In order to get back the same table I started with, I had to explicitly define every column for the result set, which made the code quite a bit more complex than I'd hoped for. Furthermore, since the XQuery value syntax does not support the hierarchyid datatype, the values in the OrganizationNode column must be read as nvarchar and then CAST to hierarchyid. The resulting code is as follows:

DECLARE @x xml;
SET @x = (
    SELECT *
    FROM HumanResources.Employee
    FOR XML RAW, ROOT('Employees'), TYPE
);

SELECT
    col.value('@BusinessEntityID', 'int') AS BusinessEntityID,
    col.value('@NationalIDNumber', 'nvarchar(15)') AS NationalIDNumber,
    col.value('@LoginID', 'nvarchar(256)') AS LoginID,
    CAST(col.value('@OrganizationNode', 'nvarchar(256)') AS hierarchyid)
        AS OrganizationNode,
    col.value('@JobTitle', 'nvarchar(50)') AS JobTitle,
    col.value('@BirthDate', 'datetime') AS BirthDate,
    col.value('@MaritalStatus', 'nchar(1)') AS MaritalStatus,
    col.value('@Gender', 'nchar(1)') AS Gender,
    col.value('@HireDate', 'datetime') AS HireDate,
    col.value('@SalariedFlag', 'bit') AS SalariedFlag,
    col.value('@VacationHours', 'smallint') AS VacationHours,
    col.value('@SickLeaveHours', 'smallint') AS SickLeaveHours,
    col.value('@CurrentFlag', 'bit') AS CurrentFlag,
    col.value('@rowguid', 'uniqueidentifier') AS rowguid,
    col.value('@ModifiedDate', 'datetime') AS ModifiedDate
FROM @x.nodes('/Employees/row') x (col);
GO

The next problem was performance. When I tested deserializing the XML using the preceding query, performance went from poor to downright abysmal, averaging 6.8157 seconds per iteration. At this point, I decided to investigate SQLCLR options for solving the problem, focusing on both reuse potential and performance.

Binary Serialization with SQLCLR

My first thought was to return binary serialized DataTables; in order to make that happen, I needed a way to return binary-formatted data from my CLR routines. This of course called for .NET's BinaryFormatter class, so I created a class called serialization_helper, cataloged in an EXTERNAL_ACCESS assembly (required for System.IO access):

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Security.Permissions;
using System.Runtime.Serialization.Formatters.Binary;

public partial class serialization_helper
{
    public static byte[] getBytes(object o)
    {
        SecurityPermission sp = new SecurityPermission(
            SecurityPermissionFlag.SerializationFormatter);
        sp.Assert();

        BinaryFormatter bf = new BinaryFormatter();

        using (System.IO.MemoryStream ms =
            new System.IO.MemoryStream())
        {
            bf.Serialize(ms, o);
            return (ms.ToArray());
        }
    }

    public static object getObject(byte[] theBytes)
    {
        using (System.IO.MemoryStream ms =
            new System.IO.MemoryStream(theBytes, false))
        {
            return (getObject(ms));
        }
    }

    public static object getObject(System.IO.Stream s)
    {
        SecurityPermission sp = new SecurityPermission(
            SecurityPermissionFlag.SerializationFormatter);
        sp.Assert();

        BinaryFormatter bf = new BinaryFormatter();

        return (bf.Deserialize(s));
    }
};
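Cataloging an EXTERNAL_ACCESS assembly requires an extra step compared to a SAFE one. A sketch of the simpler (but less secure) TRUSTWORTHY route follows; the file path is a placeholder of my own, and in production you would more likely sign the assembly and grant EXTERNAL ACCESS ASSEMBLY to a login associated with the signing key instead:

-- Less secure shortcut: the database owner must hold the
-- EXTERNAL ACCESS ASSEMBLY permission; the path is hypothetical
ALTER DATABASE AdventureWorks2008 SET TRUSTWORTHY ON;
GO

CREATE ASSEMBLY serialization_helper
FROM 'C:\Assemblies\serialization_helper.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO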
Use of this class is fairly straightforward: to serialize an object, pass it into the getBytes method. This method first uses an assertion, as discussed previously, to allow SAFE callers to use it, and then uses the binary formatter to serialize the object to a Stream. The stream is then returned as a collection of bytes.

Deserialization can be done using either overload of the getObject method. I found that, depending on the scenario, I might have ready access to either a Stream or a collection of bytes, so creating both overloads made sense instead of duplicating code to produce one from the other. Deserialization also uses an assertion before running, in order to allow calling code to be cataloged as SAFE.

My first shot at getting the data was to simply load the input set into a DataTable and run it through the serialization_helper methods. The following code implements a UDF called GetDataTable_Binary, which uses this logic:

[Microsoft.SqlServer.Server.SqlFunction(
    DataAccess = DataAccessKind.Read)]
public static SqlBytes GetDataTable_Binary(string query)
{
    SqlConnection conn =
        new SqlConnection("context connection = true;");

    SqlCommand comm = new SqlCommand();
    comm.Connection = conn;
    comm.CommandText = query;

    SqlDataAdapter da = new SqlDataAdapter();
    da.SelectCommand = comm;

    DataTable dt = new DataTable();
    da.Fill(dt);

    // Serialize and return the output
    return new SqlBytes(
        serialization_helper.getBytes(dt));
}

This method is used by passing in a query for the table that you'd like to get back in binary serialized form, as in the following example:

DECLARE @sql nvarchar(max);
SET @sql = 'SELECT
    BusinessEntityID,
    NationalIDNumber,
    LoginID,
    OrganizationNode.ToString(),
    OrganizationLevel,
    JobTitle,
    BirthDate,
    [...]
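Since the preceding listing is cut off in this extract, here is a minimal sketch of a complete invocation, using a shortened query of my own in place of the original column list, along with a size check on the serialized result:

-- Hypothetical invocation; the query text is abbreviated, not the book's
DECLARE @sql nvarchar(max) =
    N'SELECT BusinessEntityID, NationalIDNumber, LoginID
      FROM HumanResources.Employee';

DECLARE @data varbinary(max);
SELECT @data = dbo.GetDataTable_Binary(@sql);

-- Compare the binary payload size against the XML sizes measured earlier
SELECT DATALENGTH(@data) / 1024.0 AS binary_size_kb;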