How to Do Everything With Your Scanner- P50 pdf

Chapter 13
Scanning Text Documents Using OCR Software

Copyright 2001 by The McGraw-Hill Companies, Inc.

How To…
■ Understand how OCR software works
■ Recognize the limitations of OCR software
■ Properly prepare your original document for OCR software to read
■ Optimize scanning of text documents

Installing optical character recognition (OCR) software is a bit like teaching your computer to read a page and type the text into your PC. It allows you to scan a document and convert it to a text file you can edit. This is a very useful feature, but the process can be plagued by misread characters and the errors that result. This chapter gives you a few pointers on how to make the process more accurate. Accuracy is the hallmark of effectiveness when scanning text documents.

What OCR Software Does

OCR software can do the following:
■ Capture text images from your scanner
■ Compare the characters in those images to an existing database of characters and identify them
■ Produce output that you can edit

OCR software looks at the millions of tiny dots that make up the characters on a page of text. It sifts through them to find characters that it recognizes, and converts those characters into a new, editable text file.

A Look at the Leading OCR Programs

As of the writing of this book, the leading OCR software is TextBridge Pro. Frequently bundled with scanners, TextBridge Pro converts scanned files into Microsoft Word format. Once a file is in Microsoft Word format, you can convert it for use in other applications, such as Access, Excel, Internet Explorer, Netscape Navigator, and FrontPage. You can learn more about this popular program at the manufacturer’s website at www.scansoft.com, as shown in Figure 13-1. This program sells for about $80, as of the writing of this book.
FIGURE 13-1 TextBridge Pro is a popular OCR program that comes bundled with many scanners.

A more sophisticated version, TextBridge Pro Millennium Business Edition, is intended to bridge the gap between bundled scanner software and a full-blown professional program. It comes with special tools for managing documents in a business setting, and sells for around $500, as of the writing of this book.

ScanSoft also offers another sophisticated program, called OmniPage Pro. In addition to offering basic text-reading and conversion capabilities, this product offers enhanced accuracy and automatic correction features for crooked and damaged pages. It is also better at retaining the original formatting of documents, such as Excel spreadsheets and magazine articles with multiple columns. This software also sells for just under $500, as of the writing of this book.

How OCR Software Works

OCR software uses mathematical algorithms to classify the characters your scanner extracts during the scanning process. Depending on what kind of software you’re using, the program might use one or both of the following methods:

Matrix matching: This process identifies scanned symbols by matching them against character templates stored within the software. It is gradually being replaced by the feature extraction method in newer software programs.

Feature extraction: This method analyzes each character extracted by your scanner using a mathematical algorithm (a formula). For example, the algorithm might describe a circle mathematically, which corresponds to the letter o, or describe other shapes and angles that are common to all letters.

Both of these methods achieve fast results, much faster than having a typist input the documents into a PC. For example, an average typist can input about 60 words per minute.
Your scanner, using OCR software, can extract upwards of 600 words per minute.

Each character identified by the OCR software is assigned a confidence level, depending on how closely the extracted character corresponds to the specifications of the algorithm. Characters that fall below a threshold confidence level are flagged and replaced with a placeholder character, such as ~, which you must correct by hand. Many OCR programs boast an accuracy rate of 90 percent or better. At 90 percent accuracy, counted per character, you might need to hand-correct up to 10 characters in every 100.

Recognize the Limitations of OCR Software

Optical character recognition is designed to read only machine-printed text, sometimes referred to as machine type. It’s important to understand that OCR software cannot read any of the following:
■ Handwriting
■ Data that must be extracted from boxes on forms such as tax returns (although high-end programs offer some capabilities in this area)
■ Bar codes

Additionally, OCR software can be very finicky about the print quality it will convert. Carbon copies, newspapers, poor-quality faxes, and old documents typed on manual typewriters can pose problems for your OCR software.

Prepare Your Original Document for OCR Software to Read

When scanning a document for use with your OCR software, always choose the best original available. Before scanning a paper document, inspect it carefully and attempt to fix the following:

Missing or broken characters: If your document has characters with small breaks or gaps, your eyes might be able to read it, but your OCR software might not. Look closely at the copy (maybe even with a magnifying glass). Try to find a copy with the fewest gaps, or reduce the brightness setting on your scanner so the darker scan closes them up.

Dirt and smudges: You can keep these marks from confusing your scanner and OCR software by covering them with thin white correction tape.
Handwritten notations: Obviously you want to avoid writing on documents destined for OCR conversion. If someone has already done so, cover the notations with white correction tape.

Staple holes and wrinkles: Cover these with correction tape as well.

Facsimile documents: If you plan to scan a facsimile document, ask the sender to transmit it using fine mode rather than standard.

Glossy paper: Glossy paper is harder to scan cleanly. When an original has been printed on glossy paper, a photocopy might work better for OCR conversion.

Don’t try to fix a poor-quality original simply by photocopying it. A poor original just makes a poor photocopy.
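
The matrix matching method and the confidence-level flagging described under How OCR Software Works can be sketched in a few lines of Python. This is only a toy illustration, not how TextBridge Pro or OmniPage Pro is implemented: the 3x3 bitmaps, the names TEMPLATES, match_score, recognize, and recognize_text, and the 0.8 threshold are all assumptions invented for the example.

```python
# Toy matrix matching: compare a scanned glyph to stored templates pixel by pixel.
# All names and values here are hypothetical; real OCR engines use far larger
# bitmaps and more robust comparisons.

# Hypothetical 3x3 character templates (1 = ink, 0 = paper).
TEMPLATES = {
    "o": ((1, 1, 1),
          (1, 0, 1),
          (1, 1, 1)),
    "l": ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    "t": ((1, 1, 1),
          (0, 1, 0),
          (0, 1, 0)),
}

FLAG = "~"  # placeholder substituted for low-confidence characters


def match_score(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    pairs = [(g, t)
             for g_row, t_row in zip(glyph, template)
             for g, t in zip(g_row, t_row)]
    return sum(g == t for g, t in pairs) / len(pairs)


def recognize(glyph, threshold=0.8):
    """Return (character, confidence); the character is FLAG below threshold."""
    best_char, best_score = max(
        ((char, match_score(glyph, tmpl)) for char, tmpl in TEMPLATES.items()),
        key=lambda item: item[1],
    )
    if best_score < threshold:
        return FLAG, best_score
    return best_char, best_score


def recognize_text(glyphs, threshold=0.8):
    """Recognize a sequence of glyphs and join the results into a string."""
    return "".join(recognize(g, threshold)[0] for g in glyphs)


if __name__ == "__main__":
    broken_o = ((1, 1, 1), (1, 0, 1), (1, 1, 0))  # one pixel missing
    smudge = ((1, 0, 0), (0, 0, 1), (0, 1, 1))    # matches nothing well
    print(recognize_text([TEMPLATES["t"], broken_o, smudge]))  # prints "to~"
```

In this sketch a glyph with a single broken pixel still clears the threshold, while the smudge is flagged with ~ for hand correction. That is the same trade-off behind the preparation advice above: small gaps may be tolerated, but dirt and stray marks push characters below the confidence cutoff.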

Date posted: 03/07/2014, 15:20
