Designing a Microsoft SharePoint 2010 Infrastructure Vol 1 part 46 ppt

MCT USE ONLY. STUDENT USE PROHIBITED

Designing an Enterprise Search Strategy

Enterprise Search Architecture

Key Points

The search architecture in SharePoint 2010 has changed significantly from previous versions of SharePoint. The Search Service Application (SSA) has three primary components:

• The index server role hosts one or more crawlers, which connect to and index content from various sources, including:
  • SharePoint content
  • Web sites
  • File shares
  • Exchange public folders
  • Lotus Notes content

  Crawlers enumerate the source content and pass text information to the relevant index partition on the query server. The crawler also writes any metadata to the search property database and updates the crawl status in the crawl database.

• The query server role hosts one or more index partitions to provide results to search queries from users. Users enter a search query on a Web Front End (WFE) server, which then directs the query to a query server. The query server retrieves information from the index partition stored on its local file system, and metadata information from the search property database.

• Database servers in the farm host a search administration database, one or more search property databases, and one or more crawl databases for each SSA:
  • The search administration database holds search configuration details such as best bets and index access control lists (ACLs) for security trimming purposes.
  • The search crawl database holds information about the status of crawls, crawled items, and crawl history.
  • The search property database holds information associated with crawled items, such as properties (metadata) and crawl queues.

Note: Unlike in previous versions of SharePoint, the index server role does not hold a copy of any index partitions. The crawler constructs index partitions directly on query servers.
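The division of labor between the crawler, the query server, and the search databases can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not SharePoint code: the class names, the function names, the partitioning scheme, and the whitespace-based tokenization are all hypothetical and chosen only to show where each kind of data ends up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class QueryServer:
    """Hypothetical query server: holds index partitions (here, in memory)."""
    # partition id -> term -> set of item URLs
    partitions: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(set))
    )


@dataclass
class SearchDatabases:
    """Hypothetical stand-ins for the property and crawl databases."""
    property_db: dict = field(default_factory=dict)  # URL -> metadata
    crawl_db: dict = field(default_factory=dict)     # URL -> crawl status


def crawl_item(url, text, metadata, query_server, dbs, partition_count=2):
    """Crawler: text goes to an index partition, metadata and status to the databases."""
    # Assumed partitioning scheme, for illustration only.
    partition = sum(url.encode("utf-8")) % partition_count
    # Simplified tokenization: lowercase and split on whitespace.
    for term in text.lower().split():
        query_server.partitions[partition][term].add(url)
    dbs.property_db[url] = metadata
    dbs.crawl_db[url] = "crawled"


def run_query(term, query_server, dbs):
    """Query server: look up the term in every partition, then attach metadata."""
    hits = set()
    for index in query_server.partitions.values():
        hits |= index.get(term.lower(), set())
    return {url: dbs.property_db[url] for url in sorted(hits)}


query_server = QueryServer()
databases = SearchDatabases()
crawl_item("http://intranet/a.docx", "Quarterly budget report",
           {"author": "Ana"}, query_server, databases)
crawl_item("http://intranet/b.docx", "Budget forecast",
           {"author": "Raj"}, query_server, databases)
print(run_query("budget", query_server, databases))
```

Note that the crawler in this sketch keeps no index of its own, which mirrors the point in the note above: index partitions exist only on the query server.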
The FAST search architecture complements the SharePoint search architecture with a similar mechanism but additional server roles. To implement FAST search, you must implement one or more FAST index server roles and one or more FAST query server roles.

Additional Reading

For more information about SharePoint search architecture, see http://go.microsoft.com/fwlink/?LinkID=167739.

Crawling Content in SharePoint 2010

Key Points

To connect to and crawl content in SharePoint 2010, the index server role and the crawler component must understand how to connect to the content source, and how to read the type of content that is stored at the content source. When the crawler starts the crawl of a content source, it first looks at the URL of the content source and makes the appropriate connection. After connecting to the content source, the following steps occur:

1. As individual items such as Web pages or documents are crawled, the crawler streams metadata, or document properties, into the search property database. The crawler must also understand the formatting and, in some cases, the binary storage format of the item. The crawler uses an iFilter to open files, extract the text, and discard the formatting.

   Note: By default, SharePoint includes iFilters for documents in Microsoft Office 2003, 2007, and 2010; HTML; text files; XML; and TIFF files. To index other file types, you must install additional iFilters.

2. The crawler also performs word breaking, that is, separating the content into discrete words. Word breakers rely on characters such as punctuation or spaces to identify the end of one word and the start of another.

3. Finally, the crawler drops noise words, such as “at,” “the,” and “is,” which are typically not useful to the result set because of the large number of matches.

Controlling Crawl Behavior in SharePoint 2010

Key Points

When you are planning to crawl and index content in SharePoint 2010, you must understand how to configure the crawler component to ensure that:

• Content is up to date.
• The correct content is indexed.
• The crawler does not overload the content source with requests.

Crawl Schedules

You can configure full and incremental crawl schedules in the content source settings. Crawl schedules control how frequently the crawler indexes content, and therefore how current the index is. If you want users to be able to search for the latest content, you should ensure that you perform frequent crawls of the content (either incremental or full crawls). However, more frequent crawls will increase the load on the index, WFE, and database servers. Incremental crawls only index changed content, so an incremental crawl is typically much faster. You may consider performing more frequent incremental crawls to keep the index as up to date as possible.

Note: SharePoint does not automatically index content when users add or upload content to a SharePoint site. SharePoint only indexes content during a full or incremental crawl.

Crawl Rules

You can use crawl rules to specify certain crawl behaviors. For example, you may want to crawl your existing intranet site, which is not based on SharePoint technology, but avoid crawling the template section of the site, which includes empty document templates. You could use a crawl rule to exclude the address of the template site from the index in SharePoint. You can use crawl rules to:

• Exclude or include specific address paths.
• Exclude or include complex URLs (addresses that contain question marks).
• Specify a user account to connect to the address, other than the default account.

In SharePoint 2010, you can also create crawl rules that use regular expressions. Regular expressions describe text that matches a certain format, such as URLs that contain social security numbers.

Crawler Impact Rules

When you are planning to crawl content sources outside the farm where the SSA is configured, you must consider the impact that crawling content will have on the content source itself and the users accessing the content source. If the crawler tries to index all of the content on the public Web site of the organization as fast as possible, the number of requests that the crawler generates could significantly reduce the performance of the public Web servers. This would effectively create a denial of service to other legitimate users.

You can use crawler impact rules to control how many requests the crawler can make for a specific path, and whether the crawler should pause between consecutive requests. In this way, you can “throttle” how quickly the crawler can index a site or location, and how resource-intensive the crawl process is.

Planning Security Accounts for SharePoint Search

Key Points

You must configure SharePoint search with user accounts for the search service application and content access. SharePoint uses the user account that you configure for the search service application as the security context for running the search service. SharePoint search uses the content access account to connect to content sources and crawl content. For this reason, it is important that the content access account has read permission to required content sources.
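Crawl rules and the content access account interact: a rule can exclude an address, or it can name a different account for it. The following is a minimal sketch of that decision, not the SharePoint implementation. The `CrawlRule` class, the `resolve` function, the URLs, and the account names are all hypothetical, and evaluating rules in order with the first match winning is an assumption of this sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class CrawlRule:
    """Hypothetical crawl rule, modeled only for illustration."""
    pattern: str                   # regular expression matched against the URL
    include: bool                  # True = crawl matching URLs, False = exclude them
    account: Optional[str] = None  # optional account that overrides the default


def resolve(url: str, rules: Sequence[CrawlRule],
            default_account: str) -> Tuple[bool, Optional[str]]:
    """Return (should_crawl, account). Assumes the first matching rule wins."""
    for rule in rules:
        if re.search(rule.pattern, url):
            if not rule.include:
                return (False, None)
            return (True, rule.account or default_account)
    # No rule matched: crawl with the default content access account.
    return (True, default_account)


DEFAULT_ACCOUNT = r"CONTOSO\svc-crawl"  # hypothetical default content access account
RULES = [
    # Skip the template section of the intranet site.
    CrawlRule(r"^http://intranet\.contoso\.com/templates/", include=False),
    # Skip URLs that contain something shaped like a social security number.
    CrawlRule(r"\d{3}-\d{2}-\d{4}", include=False),
    # Crawl the partner site with a different account.
    CrawlRule(r"^http://partner\.contoso\.com/", include=True,
              account=r"CONTOSO\svc-partner"),
]

for candidate in (
    "http://intranet.contoso.com/templates/memo.dotx",
    "http://intranet.contoso.com/hr/123-45-6789.aspx",
    "http://partner.contoso.com/specs",
    "http://intranet.contoso.com/news",
):
    print(candidate, resolve(candidate, RULES, DEFAULT_ACCOUNT))
```

In a real farm you configure these rules through the SSA administration pages; the sketch only shows why rule order and the per-rule account matter when you plan content access.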
When you configure the SSA and specify the default content access account, SharePoint automatically grants the default content access account the necessary read permissions to the SharePoint content databases in the same farm.

When you perform a search, SharePoint automatically shows only search results that are relevant and that you have permission to access. This mechanism is called security trimming. For SharePoint to perform security trimming on search results, the content access account must not have administrative permissions on the content sources that it is crawling.

For accessing content sources that are external to SharePoint, the SSA can use the default content access account, or you can specify a different account by using a crawl rule.

Planning Search Federation

Key Points

Search in SharePoint 2010 includes a federated search component. Federated search enables SharePoint search to forward a search query to other, preconfigured search services that use OpenSearch technologies, such as Bing or a different SharePoint farm search instance.

You configure federated search by adding federated locations to the SSA. You can also control whether SharePoint always queries the federated location, or only queries the federated location if the search matches a specific prefix or pattern. This is known as the trigger. For example, you could configure all queries to search Bing in addition to SharePoint search, but queries that start with the words “weather check,” such as “weather check London,” would be sent to a different federated location.

Note: In the previous example, SharePoint search would only send the word London to the federated search location.

You can use federated locations to provide a means for searching other SharePoint search applications. For example, you could use federated search during a migration from a Microsoft Office SharePoint Server 2007 farm to a SharePoint Server 2010 farm to remove the need to crawl the Office SharePoint Server 2007 content from the SharePoint Server 2010 farm.

When you plan your SSAs, you should include planning for the number of federated locations that you should include and the type of trigger that you require for each federated location.
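The prefix trigger in the “weather check London” example can be sketched as follows. This is an illustrative model only: `FederatedLocation`, `route`, and the location names are hypothetical, and the sketch covers just the "always query" and "prefix" trigger types, not pattern triggers.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class FederatedLocation:
    """Hypothetical federated location with an optional prefix trigger."""
    name: str
    trigger_prefix: Optional[str] = None  # None = always query this location


def route(query: str, locations: Sequence[FederatedLocation]) -> Dict[str, str]:
    """Return the query text that each matching location would receive."""
    routed = {}
    for location in locations:
        if location.trigger_prefix is None:
            # Always-query locations receive the full query.
            routed[location.name] = query
            continue
        prefix = location.trigger_prefix.lower() + " "
        if query.lower().startswith(prefix):
            # Prefix-triggered locations receive only the text after the prefix.
            remainder = query[len(prefix):].strip()
            if remainder:
                routed[location.name] = remainder
    return routed


LOCATIONS = [
    FederatedLocation("Bing"),
    FederatedLocation("Weather", trigger_prefix="weather check"),
]

print(route("weather check London", LOCATIONS))
print(route("budget report", LOCATIONS))
```

Modeling each location with its own trigger makes the planning question concrete: for every federated location you list, decide whether it should see all queries or only those that match its trigger.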
