Implementing a Custom Search Engine

Part of the document Beginning ASP.NET 2.0 E-Commerce in C# 2005 From Novice to Professional, Part 3 (pages 53-57)

We’ll start by presenting the custom search feature. Before moving on, it’s good to know the disadvantages of this method as compared with using Full-Text Search:

• Manual searches of the catalog are much slower than SQL Server’s Full-Text Search feature, which uses internal search indexes and features advanced search algorithms. However, this won’t be a problem for your site until you have a large number of visitors and a large products database.

• You can’t easily implement all the features that you could use with SQL Server’s Full-Text Search, such as allowing the visitor to search using Boolean operators (AND, OR). Also, manually implementing advanced features such as searching for similar words adds even more performance penalties.

The good part about implementing the custom search engine is that you get to play a little more with some lesser-known features of SQL Server, so get ready. When it comes to manually searching the catalog, you have some options to choose from, as detailed in the following sections.

Searching Using WHERE and LIKE

The straightforward solution, most widely used in such situations, consists of using LIKE in the WHERE clause of the SELECT statement. The following query returns all the products that have the word “mask” somewhere in their description:

SELECT Name FROM Product WHERE Description LIKE '%mask%'

The percent (%) wildcard is used to specify any string of zero or more characters. Placing it before and after the word to be searched for guarantees that you’ll get all the products whose description contains the word “mask.”
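To try the pattern without a SQL Server instance, the sketch below runs the same query against an in-memory SQLite database from Python. The sample rows and the search_like helper are invented for illustration. SQLite's LIKE only approximates SQL Server's (for example, its case-sensitivity rules differ), so treat this as a demonstration of the wildcard, not of BalloonShop's real data.

```python
import sqlite3

# Hypothetical sample rows; the real Product table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, Description TEXT)")
conn.executemany(
    "INSERT INTO Product (Name, Description) VALUES (?, ?)",
    [
        ("Venetian Mask", "A red mask for carnival"),
        ("Birthday Balloon", "A bright balloon for parties"),
        ("Masked Hero", "Comes with an unmasked figurine"),
    ],
)

def search_like(word):
    """Return names of products whose description contains `word`."""
    pattern = "%" + word + "%"  # % matches any run of zero or more characters
    rows = conn.execute(
        "SELECT Name FROM Product WHERE Description LIKE ? ORDER BY Name",
        (pattern,),
    )
    return [name for (name,) in rows]

matches = search_like("mask")
print(matches)
```

Note that the third row matches as well: LIKE looks for substrings, so “unmasked” counts as containing “mask.”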

This technique—using WHERE and LIKE—was used when building the SearchCatalog stored procedure for the ASP.NET 1.0 (but not in the ASP.NET 1.1) edition of this book, and is also presented in the Product Catalog case study in The Programmer’s Guide to SQL (Apress, 2003).

This is the fastest method for manually searching the catalog. In this edition of the book, we chose to present two other methods, which are not as fast as this one, but provide better search results.

Searching for Product Data in the Search String

This is yet another search strategy that doesn’t provide the best search results, but it’s worth taking a look at. This is the search method used in the Community Starter Kit (CSK).

Note The CSK is a complex, free, and customizable application (provided with full source code) from Microsoft that allows you to build powerful community web sites quickly and easily. By default, the CSK comes with out-of-the-box functionality that supports nine types of content, including articles, books, events, photo galleries, downloads, user polls, and more. It also supports features such as moderation, upload quotas, comments, ratings, newsletters, advertisements, web services, and security. However, at the time of this writing, the CSK hasn’t been updated for ASP.NET 2.0. (You can download the CSK at http://www.asp.net.)

Like BalloonShop, the CSK implements a custom search feature instead of relying on SQL Server’s Full-Text Search, making it possible to use MSDE as the database server.

The typical search method when Full-Text Search is not available is to split the search string into words and then look into the database for these words. However, the search algorithm in the CSK goes the other way round: It takes every word from the searchable database content and verifies whether it exists in the search string. The method doesn’t offer much search flexibility, but it works.

In BalloonShop, the searchable content is the products’ names and descriptions. These are split into separate words called search keys. The search keys are saved into a special data table in the database whenever new content is added or updated in the database.
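The reverse lookup can be sketched in a few lines of Python. This only illustrates the idea as described above and is not the CSK's or BalloonShop's actual code. The function names, the sample products, and the tokenization rule (lowercased runs of letters and digits) are all assumptions.

```python
import re

def split_into_keys(text):
    """Split content into lowercase search keys (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_key_table(products):
    """Map each product id to the keys from its name and description.

    In the approach described above, this table would live in the
    database and be refreshed whenever content is added or updated.
    """
    return {
        pid: split_into_keys(name) | split_into_keys(description)
        for pid, (name, description) in products.items()
    }

def search(key_table, search_string):
    """Return ids of products with at least one key in the search string."""
    wanted = split_into_keys(search_string)
    return sorted(pid for pid, keys in key_table.items() if keys & wanted)

# Hypothetical sample data.
products = {
    1: ("Venetian Mask", "A red mask for carnival"),
    2: ("Birthday Balloon", "A bright balloon for parties"),
}
key_table = build_key_table(products)
print(search(key_table, "red balloon"))
```

Because every stored key is tested against the search string, a product matches as soon as any one of its words appears there, which is why the method offers little control over the results.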

Tip To learn more about the CSK, check out another book I co-authored, Building Websites with the ASP.NET Community Starter Kit (Packt Publishing, 2004). Find more details about it at http://www.CristianDarie.ro/books.html.

These two methods are okay, but in BalloonShop, you’ll implement something even better.

Searching by Counting the Number of Appearances

The disadvantage of searching with LIKE is that the results come back in no meaningful order, because LIKE only tells you whether a row matches, not how well. Today’s smart search engines provide a ranking system that places the results with higher rankings at the top of the search results page.

An intuitive solution for implementing a simple ranking system is to count how many times the words you’re searching for appear in the product’s name or description. Moreover, you can give higher ranking to products that have matching words in their names, rather than in their descriptions.
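As a rough sketch of that ranking idea in Python: the weight of 3 for name matches, the function names, and the sample products are assumptions chosen for illustration, not values taken from BalloonShop.

```python
def count_word(word, text):
    """Count non-overlapping, case-insensitive occurrences of word in text."""
    if not word:
        return 0  # str.count("") would otherwise return len(text) + 1
    return text.lower().count(word.lower())

def rank(search_words, name, description, name_weight=3):
    """Score a product; each name match counts name_weight times as much."""
    score = 0
    for word in search_words:
        score += name_weight * count_word(word, name)
        score += count_word(word, description)
    return score

# Hypothetical sample data.
products = [
    ("Red Balloon", "A red, red balloon"),
    ("Party Mask", "A mask with a red ribbon"),
    ("Blue Kite", "A kite for windy days"),
]
words = ["red", "balloon"]
scored = [(rank(words, name, desc), name) for name, desc in products]
# Highest score first; products that match nothing are dropped.
results = [name for score, name in sorted(scored, reverse=True) if score > 0]
print(results)
```

Here “Red Balloon” scores 9 (two name matches worth 3 each, plus three description matches) and “Party Mask” scores 1, so the balloon is listed first.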

This solution can’t be implemented with LIKE and WHERE. LIKE returns True or False, specifying whether you have a match or not, but can’t tell you how many times a word appears in a phrase.

Because SQL Server doesn’t provide an easy way to count how many times a substring appears in a string, you’ll manually implement this functionality as an SQL Server User-Defined Function.

Note SQL Server User-Defined Functions (UDFs) implement common functionality that can be called from stored procedures. Unlike stored procedures, which can be accessed from client applications, UDFs are created for database internal usage and can be called from stored procedures or from other UDFs. UDFs must return data to the caller function or stored procedure before exiting (their last command must be RETURN) and are not allowed to modify data—their role is to return information.

For the catalog, you’ll create a UDF named WordCount. It will take as parameters two strings and will return a SMALLINT value specifying how many times the first string appears in the second.

The definition of WordCount is

CREATE FUNCTION dbo.WordCount
(@Word VARCHAR(20),
@Phrase VARCHAR(1000))
RETURNS SMALLINT

This looks similar to how you create stored procedures, except that you use FUNCTION instead of PROCEDURE, you need to explicitly specify the user who owns the function (which, in the example, is dbo), and you need to specify the return data type.

Note dbo is a special database user who has full privileges on the database. If you’re not permitted to use dbo, use the username under which you have privileges on the database.

Inside WordCount, the challenge is to find an effective way to count how many times @Word appears in @Phrase, because SQL Server doesn’t provide a function for this (otherwise, you would have used that function instead of creating your own, right?).

The straightforward solution, which involves splitting the phrase where it has spaces (or other delimiter characters) and comparing word by word, is very slow. We’ve found a trick that performs the same functionality about five times faster.

SQL Server provides the REPLACE function, which replaces all occurrences of a substring in a string with another substring. REPLACE doesn’t tell you how many replacements it did, but it returns the modified initial string. REPLACE works much faster than a custom created UDF or stored procedure because it’s an internal SQL Server function.

You’ll use REPLACE to replace the word to search for with a word that is one character longer.

Say you want to count how many times the word “red” appears in “This is a red, red mask.”

Replacing “red” with “redx” generates “This is a redx, redx mask.” The length difference between the initial phrase and the modified phrase tells you how many times the word “red” appears in the initial phrase (nice trick, eh?).

The code that does this appears as follows:

/* @BiggerWord is a string one character longer than @Word */
DECLARE @BiggerWord VARCHAR(21)
SELECT @BiggerWord = @Word + 'x'

/* Replace @Word with @BiggerWord in @Phrase */
DECLARE @BiggerPhrase VARCHAR(2000)
SELECT @BiggerPhrase = REPLACE (@Phrase, @Word, @BiggerWord)

/* The length difference between @BiggerPhrase and @Phrase is the number we're looking for */
RETURN LEN(@BiggerPhrase) - LEN(@Phrase)
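The length-difference trick is easy to check outside SQL Server. The sketch below runs the same expression through SQLite's REPLACE and LENGTH functions from Python. The word_count helper is a hypothetical stand-in for the UDF, and SQLite's REPLACE is case-sensitive, which may differ from SQL Server depending on the collation in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def word_count(word, phrase):
    """Count occurrences of word in phrase using the REPLACE length trick."""
    row = conn.execute(
        # Each replacement adds exactly one character ('x'), so the growth
        # in length equals the number of occurrences.
        "SELECT LENGTH(REPLACE(:phrase, :word, :word || 'x')) - LENGTH(:phrase)",
        {"phrase": phrase, "word": word},
    ).fetchone()
    return row[0]

count = word_count("red", "This is a red, red mask.")
print(count)
```

Keep in mind that the trick counts substrings rather than whole words, so “red” is also found inside “redder.”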

Searching for Similar Words

The implementation shown earlier is fast, but has one drawback: It can’t be used to search for words that are similar to (sound like) the words entered by the visitor. For example, in the current database, searching for “balloon” generates many results, whereas searching for “balloons” generates a single result. This may or may not be what you want; you must decide what works best for your site.

You can change this behavior by changing the WordCount function. However, the version that recognizes similar versions of words is very slow because you can’t use the REPLACE function anymore; instead, you need to manually split the phrase word by word in SQL code, which is a time-consuming process. We’ll present this modified version of WordCount at the end of the chapter for your reference.
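To make the word-by-word approach concrete, here is a minimal Python sketch. It is not the modified WordCount from the end of the chapter. The similarity rule is a swapped-in assumption (a character-overlap ratio from Python's difflib with a threshold of 0.8), used only to show why every word must be examined individually.

```python
import re
from difflib import SequenceMatcher

def is_similar(a, b, threshold=0.8):
    """Assumed similarity rule: character-overlap ratio at or above threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def similar_word_count(word, phrase):
    """Count the words in phrase that are similar to word."""
    word = word.lower()
    # Split on anything that is not a letter or digit, then test word by word.
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return sum(1 for candidate in words if is_similar(word, candidate))

count = similar_word_count("balloons", "Red balloons and one blue balloon")
print(count)
```

With this rule, a search for “balloons” also counts “balloon,” at the price of one comparison per word in the phrase instead of a single REPLACE call.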

Tip You can implement the WordCount function in many ways, so we encourage you to play with it until you get the best results for your solution.
