CHAPTER 14: CONFIGURING AND MANAGING ENTERPRISE SEARCH

That’s it; you are done with your tour of Foundation site search administration. Clearly, there are a lot of positives here; but keep reading. The next section covers SharePoint Server Search and Search Server. As you drool over those features, don’t forget that the Express version of Search Server is free, and you can bolt it right on top of Foundation with ease. Wow: a free solution and a more awesome Search.

SHAREPOINT SERVER AND SEARCH SERVER

This section covers the following products:

➤➤ SharePoint Server 2010 Standard
➤➤ SharePoint Server 2010 Enterprise
➤➤ SharePoint Server 2010 for Internet Sites Standard
➤➤ SharePoint Server 2010 for Internet Sites Enterprise
➤➤ Search Server 2010 Express
➤➤ Search Server 2010

This is the money section of the chapter. Most readers probably have one of the aforementioned products or are bugging their bosses to get one. Foundation Search is great for getting started, but it lacks the level of control you may be hoping for. FAST Search is amazing, but its price tag can be a tough hurdle to overcome in smaller environments, so that leaves you here, in a very nice and comfortable place.

Search Server versus SharePoint Server

A very common question that first pops up in this conversation is “If I have SharePoint Server what do I get by adding Search Server?” The answer is simple: nothing at all. Search Server is only a subset of the functionality available in SharePoint Server and cannot be installed on an existing SharePoint Server installation. An example of a key difference is that SharePoint Server can index Active Directory information about your users after you configure and do a profile import, which is covered in Chapter 17. While Search Server can index SharePoint sites, it does not have a mechanism for doing the profile import from Active Directory, so it is unable to index user information.
We will note similar limitations on Search Server throughout the chapter; otherwise, assume Search Server can perform the covered feature.

The follow-up question is “What is the difference between Search Server and Search Server Express (SSX)?” Again the answer is simple: scale. SSX can only be deployed on one server in the farm. You cannot add more servers to make Search highly available. Search Server can be scaled in the same fashion as SharePoint Server, providing high availability for search and the capability to scale to somewhere in the ballpark of 100 million items. Yikes! Of course, that power comes at a price. Express is free, whereas regular Search Server is not.

Configuration and Scale

In Chapter 3 you took a good look at farm topologies and scale points. Noticeably absent from that chapter was a detailed discussion of Search. That wasn’t author laziness; the Search team at Microsoft chose to build their own tools for configuration of their service application. To access this tool, go into Central Administration > Manage service applications and click on your Search service application. At the bottom of the administration window you will see the screen shown in Figure 14-6.

FIGURE 14-6

Here you can view and modify all of the wonderful Search components. You want scale and high availability? Well, here it comes by the truckload. As indicated in the figure, there are four sections in the Search Application Topology: Admin, Crawl, Index Partition, and Databases. The first three are each addressed in the following sections. The various databases are associated with the various other components, so they are discussed throughout as relevant.

Admin

In the Admin section of this screen you will find the Administration component. This is the boss of Search. It tells all of the other components and servers what to do by managing the topology.
This component cannot be made redundant, but that is okay; if this server is offline, then the rest of the servers will continue serving their role. No changes to the Search topology can be made while this server is offline. This server is responsible for such items as starting crawls, reassigning crawl tasks if it finds a crawler unavailable, and similar tasks. To store all of this information, this component uses the administration database. This database has all of the search configuration information, so when you learn how to create a new crawl rule, this is where you will find it.

A final note about the Admin component: It cannot be readily moved to a different server, so it will live forever on whatever server you first provision it on. This might affect your planning if you are very particular about what is hosted on which server.

Crawl

You might think of the Crawl component as your indexer. This is the piece that will connect to your content, bring it down to the server, generate the index, and extract the necessary metadata. Notice I did not say the crawl component is your index server. This is because one crawl server can host multiple crawl components.

The big change from MOSS 2007 is that the crawler does not store a copy of your index. Instead, the crawler is stateless. It simply marks the content as crawled in the crawl database and then pushes the changes for the index off to the appropriate query server. Additionally, it will take all your search property information and push it off to the property store database.

The Crawl component keeps track of what it needs to crawl and what has been crawled in the crawl database, along with the crawl schedule and other details necessary for crawl operations. And the exciting part: You can have multiple crawlers assigned to the same crawl database.
For you MOSS 2007 fans, this means no more relying on only one index server to build your index; now the sky is the limit regarding how much hardware you can throw at creating the index. Another benefit of the crawler having a dedicated database is it does not add load to the property database while crawling.

By default, if you have more than one crawl database associated with a service application, the load is spread between the databases by host name. Using host distribution rules, it’s possible to specify that a certain host (think content source like http://portal or \\server\share) is specifically tied to a crawl database. And because you assign Crawl components to specific crawl databases, you can now ensure that you have your most powerful crawlers working on that database. You may even choose to have that crawl database on a dedicated SQL Server.

If you have multiple databases and you want to find out what hosts are in what database, you can do that in the crawl log. Details about this cool capability follow later in the chapter.

Index Partition

You just learned about crawlers, and how they create an index but don’t store the actual index. The storage is actually done by the Query component. The Query component is responsible for responding to search queries. When a user on a SharePoint site types “Cow” in the search box and hits Search, the web server hands that off to the Query Component server, more often than not just called the query server. The query server then digs through the index and property database to come up with a list of items for the search. Security trimming then takes place, and finally the web server renders those results back to the user.

If you want to add scale, you can actually divide the index into multiple partitions, or pieces (as described later in this chapter). That way, you can assign each partition to a query server.
For example, if you have one million items in your index prior to partitioning, it might take one second to find your search results. If you divide that into two partitions and put each partition on its own query server, your index still has one million items in it but each query server has only 500,000 items in its partition to look through. Now your query results can be aggregated and returned to your browser in 0.5 seconds. That is how you scale the query servers for faster results.

An important threshold for an index partition is 10 million items, the maximum number supported in a partition. Also, remember that each time you want to introduce a new partition you need to introduce a new query server. Very little is gained, and more than likely you actually will decrease performance, if you have only one query server and you try to break your index up into two partitions with both living on the same query server.

Unlike the crawl databases that are divided up by hosts, the index partitions try to maintain a very close balance. So each item is sent to an index partition based on a hash of its document id. This method provides better scale with query partitions.

Now you have two query servers but each one has half the index (its own partition). Next you need to configure redundancy. Partitions can also have mirrors. The mirror partition can be configured to respond to queries only if the primary partition is unavailable, or it can be a fully functional mirror that responds to queries. The balancing of query traffic is handled by the Search Admin component and is automatic. Typically, your index partition will be served by only one Query component, and configured with a failover mirror.

The final piece here is the property database. This database stores all of the metadata associated with the index partition(s) to which it is connected.
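
As a back-of-the-envelope aid for the index partition discussion above, the following sketch estimates how many partitions an item count calls for under the 10 million item threshold, and shows one way items could be spread evenly by hashing a document id. The function names are hypothetical, and SHA-256 is only a stand-in: the text does not say which hash SharePoint actually uses.

```python
import hashlib
import math

# Threshold quoted in the text: at most 10 million items per index partition.
MAX_ITEMS_PER_PARTITION = 10_000_000


def partitions_needed(total_items: int) -> int:
    """Smallest partition count that keeps every partition within the limit."""
    return max(1, math.ceil(total_items / MAX_ITEMS_PER_PARTITION))


def items_per_partition(total_items: int, partition_count: int) -> int:
    """Approximate items each query server searches when the index is balanced."""
    return math.ceil(total_items / partition_count)


def partition_for(document_id: str, partition_count: int) -> int:
    """Map a document id to a 0-based partition number via a stable hash.

    Assumption: SHA-256 is illustrative only; the real hash is not documented here.
    """
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    print(partitions_needed(25_000_000))      # 3 partitions for 25 million items
    print(items_per_partition(1_000_000, 2))  # 500000, the example from the text
    print(partition_for("doc-42", 2))         # same id always lands in the same partition
```

Keep in mind that the count returned here is only a lower bound on partitions; as the text notes, each additional partition should also get its own query server.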
An index partition is associated with only one property database, but a property database can be connected to multiple index partitions. This SQL Server database can become a bottleneck over time as it grows. If that is the case, you can either move the database to a bigger, badder SQL Server or reduce the number of partitions associated with it.

Adding a Server to the Search Topology

Consider a scenario in which the server farm is fully configured with everything, including SQL Server, running on one machine. Another server, ServerRC, has been purchased, has the same version of SharePoint Server 2010 Enterprise installed, and is added to the farm. The initial configuration wizard has been run on the new server. This started the appropriate services on this server.

To add the second server to your Search topology, follow these steps:

1. Open Central Administration > Application Management > Manage service applications.
2. Find your search service application and open the Manage interface. Remember that Search topology is defined per Search service application if for some reason you have more than one.
3. Scroll down the page and click Modify (refer to Figure 14-3).
4. Click New, and from the drop-down select Crawl Component.
5. For Server, select your new server’s name. For this example, it is ServerRC.
6. For Associated Crawl Database, select the Crawl Database from which you want this crawler to work.
7. If necessary, change the Temporary Location of Index. This location will only be used for creating the index updates before pushing them out, and it should remain relatively small. It will not increase in size as your index grows. Check out Figure 14-7 for an example and then click OK.

FIGURE 14-7

8. You are returned to the Manage Search Topology screen, where you will see Pending creation next to your new component.
Click the Apply Topology Changes button at the bottom of the screen, unless you plan to also add the Query component in the next set of steps. If so, skip this step. A processing screen will appear and process for a few minutes. Once it is complete, you are all set.

You now have configured the two servers to share the load of the one crawl database. The next logical step is to configure your new server to also be a query server. With the second Query component, you will get a second index partition, so you will want to define a mirror for each of your two partitions:

1. Return to the Search administration screen and click the Modify Search Application Topology button.
2. Click New. From the drop-down, select Index Partition and Query Component.
3. For Server, select your new server.
4. For Associated Property Database, choose the database you want this query component to use. You haven’t created any additional ones, so there should only be one item in the list.
5. Location of Index is an important consideration. This is where the physical index files will be stored on the server. Ensure that you have enough storage capacity in your chosen location. If at all possible, this should be on its own dedicated drive.
6. Leave the Set this query component as failover-only option at its default setting of unchecked, as illustrated in Figure 14-8.

FIGURE 14-8

7. After you confirm your settings, click OK. This will automatically create Query component 2.
8. Now that you have the two partitions, you need to set up the mirrors. Hover over Query component 1, click the drop-down, and select Add Mirror.
9. For Server, choose the server that is currently not hosting this partition.
10. Confirm that your Index location is correct. (Remember that the C: drive is a bad place.)
11. Check the box for Set the query component as failover-only.
12. Click OK.
13. Repeat steps 8–12 for Query component 2.
14. You are returned to the Manage Search Topology screen.
You will see Pending creation next to your new component. Click the Apply Topology Changes button at the bottom of the screen. A processing screen will appear and process for a few minutes. Once it is complete you are all set.

Now both servers are participating in serving Search queries and helping to crawl all of the content. You also have solid redundancy. In most environments the preceding actions will be sufficient. You have the capacity to crawl a lot of content in a reasonable amount of time and your Search components are highly available. Note that this does not include SQL Server. It is up to you to implement a high-availability solution for the databases, whether that is SQL Server clustering, taking advantage of the database mirroring support, or some third-party solution.

Scaling Up with Crawl Databases

Fast forward a little bit and your SharePoint deployment demands have increased again. You now want to add the crawling of your very large file server. Because of the size and nature of the data, you expect the crawling burden to be very high, so you choose to add another crawl database running on a dedicated SQL Server. You will also make this a dedicated database.

1. Return to the Search administration screen and click the Modify Search Application Topology button.
2. Click New and select Crawl Database.
3. For Database Server, enter the SQL Server you want to host this database. It can be the same SQL Server the rest of your farm uses, or if you’re trying to add scale because of performance constraints on your current SQL Server, it may be a dedicated SQL Server.
4. Set Database Name to anything you would like.
5. Enable the checkbox for Dedicate this crawl store to hosts as specified in Host Distribution Rules, as shown in Figure 14-9.
6. Leave the other fields as is and click OK.
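
To picture what the dedication checkbox in the steps above implies, here is a minimal routing sketch. It assumes, as the chapter describes, that a host named in a host distribution rule goes to the crawl database tied to it, and that every other host is spread by host name across the non-dedicated databases. The function names, the database names, and the CRC32-based spreading are illustrative assumptions only; the text does not describe how Search actually balances hosts.

```python
import zlib
from urllib.parse import urlparse


def host_of(start_address: str) -> str:
    """Return the bare host name of a URL or UNC path, lowercased, without slashes."""
    if start_address.startswith("\\\\"):
        # UNC path such as \\FileServer\Share: the host is the first segment.
        return start_address[2:].split("\\", 1)[0].lower()
    return (urlparse(start_address).hostname or "").lower()


def crawl_database_for(start_address, rules, shared_databases):
    """Pick a crawl database for a start address.

    rules: host name -> dedicated crawl database (hypothetical names).
    shared_databases: non-dedicated crawl databases that share the remaining hosts.
    """
    host = host_of(start_address)
    normalized = {name.lower(): db for name, db in rules.items()}
    if host in normalized:
        return normalized[host]
    if not shared_databases:
        raise ValueError("no non-dedicated crawl database available for " + host)
    # Assumption: a stable hash of the host name stands in for the real balancing.
    index = zlib.crc32(host.encode("utf-8")) % len(shared_databases)
    return shared_databases[index]


if __name__ == "__main__":
    rules = {"FileServer": "CrawlDB_FileShare"}
    print(crawl_database_for(r"\\FileServer\Share", rules, ["CrawlDB_1"]))
    print(crawl_database_for("http://portal.contoso.com", rules, ["CrawlDB_1"]))
```

The dedicated database is deliberately left out of `shared_databases`, which mirrors the idea that a dedicated crawl store holds nothing except what a rule sends to it.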
FIGURE 14-9

At the bottom of the page you selected the option to Dedicate this crawl store to hosts as specified in Host Distribution Rules. This rule tells the database to not store anything that is not specifically added by a host distribution rule, which you will create in the next section. If you do not make this crawl database a dedicated database, then Search will automatically balance the load in this database with the other crawl database. Don’t forget to click Apply Topology Changes once you are done making updates to your topology.

If you were to now go straight into adding a host distribution rule, you would not see your new crawl database listed. That’s because you have not associated your new crawl database with a crawl component, making it useless. To fix this, you need to follow the previous steps for creating a new crawl component, but this time select the new crawl database you created. Do this on Server1 and ServerRC.

Adding a Content Source and Host Distribution Rule

In these steps you will add a file share content source and then add it to the crawl database you specified earlier:

1. Go to the Search Administration page.
2. On the left side of the page, click Content Sources.
3. Click New Content Source.
4. Specify a Name.
5. For Content Source Type, choose File Shares.
6. For Start Addresses, enter the UNC path to the share(s) you want to crawl, for example, \\FileServer\Share. Note that the search crawl account needs to have “read access” to the share(s) being crawled.
7. For Crawl Settings, the default is normally correct. Crawl the whole share, not just the root folder.
8. For now, leave the crawl schedule set to None. (Crawl schedules are covered later in the chapter.)
9. Content Source Priority gives you the opportunity to mark a content source as high priority. This way, if overlapping content source crawls are taking place, you can specify which should have priority.
10. Skip over Start Full Crawl.
You will do that the old-fashioned way in a moment.
11. Click OK. Figure 14-10 shows a sample configuration.

FIGURE 14-10

Creating a Host Distribution Rule

Now your file share content source is created. Before you start that full crawl, you need to set up your host distribution rule:

1. On the left side of the screen, click Host Distribution Rules.
2. Click the button for Add Distribution Rule.
3. For Hostname, enter FileServer. (Do not use slashes, just the actual host name. For example, if you had a content source of http://portal.contoso.com, your hostname would be portal.contoso.com. FileServer is used as the hostname here to keep up with the previous file share configured for \\FileServer\Share.)
4. From the Distribution Configuration, select the crawl database that you created in the earlier section.
5. Click OK.
6. Click Apply Changes. This will check to determine whether any content must be moved from one crawl database to another to comply with your new rule. If so, you are warned that this takes time and that any active/pending crawls will be paused for the duration of the move. Click the Redistribute Now button when you are ready to commit to the changes.

Starting a Crawl

With all of that done you are now ready to do a crawl of your content sources and watch them split up across the databases:

1. Click Content Sources on the left side of the screen.
2. Hover over File Share (your content source), click the drop-down, and select Start Full Crawl.
3. Click Search Administration on the top left.
4. Now you can get a nice can of Mountain Dew, and sit back and watch the crawler go.

Perfect! Now you have your entire file share in one dedicated crawl database with two dedicated crawlers. Keep in mind that your dedicated crawlers are still on the same crawl server as the other crawlers.
If you needed more scale, you could introduce more servers into the farm, create new crawl components on those servers, and then assign those crawlers to this crawl database and remove the current two. Scaling up is as flexible as Silly Putty.

Matching Crawl Databases to Hosts

For the final trick when it comes to playing with crawl databases, you need to look at the crawl logs:

1. On the left side of the Search Administration page, click Crawl Log.
2. From the top menu bar, click Host Name.

Behold! All of your crawl databases are listed, and each one shows what hosts are included in the database. Take a gander at Figure 14-11. It doesn’t reflect the preceding steps, but rather includes some interesting things to test your knowledge.

[...]

... database, a new index partition is created as a by-product. That’s because the index partition is associated with a specific property database and cannot be changed. This means that you now need to reevaluate your index partitions. For example, the partition you just created doesn’t have a mirror. You need to add a mirror to it. And the old partition is gone but the mirror of that partition is still floating ...

... normal SharePoint Search Web Parts with FAST bolted on top of them. This reduces their development time and your administrative learning curve because the Web Parts have a very familiar feel to them. The second thing to note is that there are no more hidden query objects. In SharePoint 2007, the communication between the Web Parts was not accessible by developers, so if they wanted to add a Search Web Part ...

... public folders: SharePoint knows how to talk to Exchange to index public folders. In addition, Exchange 2007 and 2010 have change logs that SharePoint can access, enabling it to perform true incremental crawls against these sources.

➤➤ Line of business data: This option is similar to the Business Data content source option from SharePoint 2007. If you have an Enterprise license for SharePoint 2010, you can ...
➤➤ Web sites: Non-SharePoint websites can be crawled and indexed by SharePoint Search, and made part of the Search index. For instance, maybe your organization uses SharePoint to host its intranet, but the public-facing Internet site is a traditional website. Because useful information is also posted on the public site, you could set up a crawl source of that website to include in SharePoint Search results ...

... hardware now just works out of the box with no effort on your part.

Query Federation

Query federation enables you to add search results from any OpenSearch-compliant search engine to your SharePoint site. These results appear in a separate Web Part on the right-hand side of the screen and are not intermixed with your SharePoint results. Also, this Web Part is asynchronous by default, which means it will load ...

... within SharePoint. You can crawl all external data sources or select specific data sources to be included in the content source.

➤➤ Custom repository: In SharePoint 2010 you can connect to additional content sources by creating your own custom connectors. Protocol handlers from MOSS 2007 have been deprecated and replaced with these connectors. The best part is that the connector framework is common across SharePoint ...

... hassle and expense of having SharePoint crawl across the WAN. Instead, they set each farm to crawl itself, and then use Search Federation to display results from both farms on the same page. Remember, though, these are two separate sets of results and will not be combined.

Extensible Web Parts

Extensible Web Parts sounds an awful lot like a developer topic, and for the most part it is, but as a good admin ...
previous crawls, set up crawl rules, manage your index, and configure the file types that should be crawled, among other options. The following list outlines the available Crawl settings.

➤➤ Content Sources: SharePoint can’t crawl what it can’t find. Use the Content Sources link to define what SharePoint will be crawling. Lucky for you, SharePoint was nice enough ...

... which includes all your existing SharePoint web applications, as shown in Figure 14-21. (Any web applications added after Crawl is configured are also automatically added to this default source.)

FIGURE 14-21

You can create a new content source by clicking the New Content Source link on the toolbar. You are not limited to crawling SharePoint sites, however. SharePoint 2010 enables you to create six different ...

... carried over to SharePoint 2010 and is available out of the box. Setting up federated locations enables a user’s query to be performed on multiple, alternative sources along with the standard SharePoint search index. Internet search engine results can be incorporated into the results, as well as databases and search scopes defined for SharePoint ...