Evaluating Document Formats

The most important decision we need to make is what format to deliver the certificate in. Options include paper, ASCII text, HTML, Microsoft Word or another word processor’s format, Rich Text Format, PostScript, and Portable Document Format. Given the ten attributes listed previously, we can consider and compare some of our options.

Paper

Delivering the certificate on paper has some obvious advantages. We retain complete control over the process. We can see exactly what each certificate looks like before sending it to the recipient. We do not need to worry about software or bandwidth, and the certificate could be printed with anti-counterfeiting measures. It would meet all of our needs except for attributes 5 and 6. The certificate could not be created and delivered quickly: postal delivery could take days or weeks, depending on our location and the recipient’s. Each certificate would also cost us anywhere from a few cents to a few dollars in printing and postage, and probably more in handling. Automatic electronic delivery would be cheaper.

ASCII

Delivering documents as ASCII or plain text has some advantages. Compatibility will be no problem. The bandwidth required would be small, so the cost would be very low. The simplicity of the end result makes it very easy to design and very quick for a script to generate.

If we present our visitors with an ASCII file, however, we have very little control over the appearance of their certificate. We cannot control fonts or page breaks. We can only include text, and we have very little control over formatting. We have no control over a recipient’s duplication or modification of the document. This is the method that makes it easiest for the recipient to fraudulently alter her certificate.

HTML

An obvious choice for delivering a document on the Web is HTML. Hypertext Markup Language is specifically designed for this purpose.
As you are no doubt already aware, it includes formatting control and syntax for including objects such as images, and it is compatible (with some variation) with a variety of operating systems and software. It is fairly simple, so it will be both easy to design and quick for a script to generate and deliver.

Drawbacks to using HTML for this application include limited support for print-related formatting such as page breaks, little consistency in the output on different platforms and programs, and variable print quality. In addition, although HTML can reference any type of external element, the browser’s ability to display or use unusual types of element cannot be guaranteed.

Generating Personalized Documents in Portable Document Format (PDF), Chapter 30, page 745 (36 7842 CH30, 3/6/01 3:40 PM)

Word Processor Formats

Particularly for intranet projects, providing documents in a word processor format makes some sense. For an Internet project, however, any proprietary word processor format will exclude some visitors; given its market dominance, Microsoft Word is the sensible choice. Most users will have access either to Word or to a word processor that will try to read Word files. Windows users without Word can download the freeware Word Viewer from http://www.microsoft.com/office/000/viewers.htm

Generating a document as a Microsoft Word document has some advantages. As long as you have a copy of Word, designing a document is easy. We have very good control over the printed appearance of our documents and a lot of flexibility with their contents. You can also make the document relatively difficult for the recipient to modify by telling Word to ask for a password.

Unfortunately, Word files can be large, particularly if they contain images or other complex elements. There is also no easy way to generate them dynamically with PHP.
The format is documented, but it is a binary format, and the documentation comes with license conditions.

Rich Text Format

Rich Text Format, or RTF, gives us most of the power of Word, but the files are easier to generate. We still have flexibility over the layout and formatting of the printed page. We can still include elements such as vector or bitmap images. We can still be fairly sure that the user will see a result similar to ours when viewing or printing the document.

RTF is Microsoft Word’s text format. It is intended as an interchange format for transferring documents between different programs. In some ways, it is similar to HTML: it uses syntax and keywords rather than binary data to convey formatting information, so it is relatively human readable.

The format is well documented. The specification is freely available and can be found here: http://msdn.microsoft.com/library/specs/rtfspec.htm

The easiest way to generate an RTF document is to choose a Save As RTF option in your word processor. Because RTF files contain only text, it is also possible to generate them directly, and existing ones can easily be modified.

Building Practical PHP and MySQL Projects, Part V, page 746

Because the format is documented and freely available, RTF is readable by more software than Word’s binary format. Be aware, though, that users opening a complex RTF file in older versions of Word or in different word processors will often see somewhat different results. Each new version of Word introduces new keywords to RTF, and older implementations will usually ignore controls they do not understand or have chosen not to implement.

From our original list, an RTF certificate would be easy to design using Word or another word processor; able to contain a variety of different elements such as vector and bitmap images; give a high-quality printout; be generated easily and quickly; and be delivered electronically at low cost.
It will work with a variety of applications and operating systems, although with somewhat variable results. On the down side, an RTF document can be easily and freely modified by anybody, which is a problem for a certificate and some other types of document. The file size might also get moderately large for complex documents.

RTF is a good option for many document delivery applications, so we will use it as one option here.

PostScript

PostScript, from Adobe, is a page description language. It is a powerful and complex programming language intended to represent documents in a device-independent way, that is, as a description that will produce consistent results across different devices such as printers and screens. It is very well documented: at least three full-length books are available, as well as countless Web sites.

A PostScript document can contain very precise formatting, text, images, embedded fonts, and other elements. You can easily generate a PostScript document from an application by printing it to a PostScript printer driver. If you were interested, you could even learn to program in it directly.

PostScript documents are quite portable. They will give consistent high-quality printouts from different devices and different operating systems.

There are a couple of significant down sides to using PostScript to distribute documents:

• The files can be huge.
• Many people will need to download additional software to use them.

Most UNIX users will be able to deal with PostScript files, but Windows users will usually need to download a viewer such as GSview, which uses the Ghostscript PostScript interpreter. This software is available for a wide variety of platforms. Although it is available free, we do not really want to force people to download more software.
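To make the idea of programming PostScript directly a little more concrete, here is a minimal sketch in PHP that writes a one-page certificate as a PostScript program. The function names make_certificate_ps() and ps_escape(), the page coordinates, the Times-Roman font, and the wording are illustrative assumptions, not part of this chapter’s project. The sketch also assumes the name is plain ASCII, because the standard PostScript fonts will not render UTF-8 text correctly without re-encoding.

```php
<?php
// Illustrative sketch only: build a one-page PostScript certificate as text.
// Function names, coordinates, font and wording are assumptions.

// Backslash and parentheses are special inside a PostScript string literal,
// so they are escaped with a backslash. strtr() makes a single pass, so the
// backslashes it adds are not escaped a second time.
function ps_escape(string $text): string
{
    return strtr($text, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']);
}

// Returns a complete PostScript program. Coordinates are in points
// (1/72 inch) measured from the bottom-left corner of the page.
function make_certificate_ps(string $name, int $score): string
{
    $lines = [
        '%!PS',
        '/Times-Roman findfont 28 scalefont setfont',
        '72 650 moveto (Certificate of PHP Knowledge) show',
        '/Times-Roman findfont 18 scalefont setfont',
        '72 600 moveto (' . ps_escape("This certifies that $name") . ') show',
        '72 570 moveto (' . ps_escape("achieved a score of $score%") . ') show',
        'showpage',
    ];
    return implode("\n", $lines) . "\n";
}

// Usage: the output can be opened in GSview or converted with ps2pdf.
echo make_certificate_ps('Jo (Tester)', 100);
```

Because the result is just text, generating it is cheap; the drawbacks discussed above (file size for complex documents and the need for a viewer) remain.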
You can read more about Ghostscript at http://www.ghostscript.com/ and download it from http://www.cs.wisc.edu/~ghost/

For our current application, PostScript scores very well for consistent high-quality output, but falls short on most of our other needs.

Portable Document Format

Fortunately, there is a format with most of the power of PostScript, but with significant advantages. The Portable Document Format (also from Adobe) was designed as a way to distribute documents that would behave consistently on different platforms and deliver predictable high-quality output on screen or on paper.

Adobe describes PDF as “the open de facto standard for electronic document distribution worldwide. Adobe PDF is a universal file format that preserves all of the fonts, formatting, colors, and graphics of any source document, regardless of the application and platform used to create it. PDF files are compact and can be shared, viewed, navigated, and printed exactly as intended by anyone with a free Adobe Acrobat Reader.”

PDF is an open format, and its documentation is available from http://partners.adobe.com/asn/developer/technotes.html as well as from many other Web sites and in an official book.

Judged against our desired attributes, PDF looks very good. PDF documents give consistent, high-quality output; are capable of containing elements such as bitmap and vector images; can use compression to create a small file; can be delivered electronically and cheaply; are usable on the major operating systems; and can include security controls.

Working against PDF is the fact that most of the software used to create PDF documents is commercial. A reader is required to view PDF files, but the Acrobat Reader is available free for Windows, UNIX, and Macintosh from Adobe.
Many visitors to your site will already be familiar with the .pdf extension and will most likely already have the reader installed.

PDF files are a good way to distribute attractive, printable documents, particularly ones that you do not want recipients to be able to easily modify. We will look at two different ways to generate a PDF certificate.

Solution Components

To get the system working, we will need to be able to examine users’ knowledge and (assuming that they pass the test) generate a certificate reporting their performance. We will experiment with generating this certificate in three different ways: two using PDF and one using RTF. Let’s look at the requirements of each of these components in some detail.

Question and Answer System

Providing a flexible system for online assessment that allowed a variety of different question types, various media types for supporting information, useful feedback on wrong answers, and clever statistics gathering and reporting would be a complex task on its own. In this chapter, we are mainly interested in the challenge of generating customized documents for delivery over the Web, so we will only build a very simple quiz system.

The quiz does not rely on any special software. It uses an HTML form to ask questions and a PHP script to process the answers. We have been doing this since Chapter 1, “PHP Crash Course.”

Document Generation Software

No additional software is needed on the Web server to generate RTF or PDF documents from templates, but you will need software to create the templates. In order to use the PHP PDF creation functions, you will need to have compiled PDF support into PHP. (We’ll discuss this further in a minute.)

Software to Create RTF Template

You can use the word processor of your choice to generate RTF files. We used Microsoft Word to create our certificate template.
The certificate template is included on the CD-ROM in the Chapter 30 directory. If you prefer another word processor, it would still be a good idea to test the output in Word, as this is the software that the majority of your visitors will be using.

Software to Create PDF Template

PDF documents are a little more difficult to generate. The easiest way is to purchase Adobe Acrobat. This software will let you create high-quality PDFs from various applications. We used Acrobat to create the template file for this project.

To create the file, we used Microsoft Word to design a document. One of the tools in the Acrobat package is Adobe Distiller. Within Distiller, we needed to select a few non-default options: the file must be stored in ASCII format, and compression needs to be turned off, so that the template can later be modified by a script. After these are set, creating a PDF file is as easy as printing.

You can find out more about Acrobat at http://www.adobe.com/products/acrobat/ and buy it either online or from a regular software retailer.

Another option for creating PDFs is the conversion program ps2pdf, which, as the name suggests, converts PostScript files into PDF files. This has the advantage of being free, but it does not always produce good output for documents with images or non-standard fonts. The ps2pdf converter comes with the Ghostscript package mentioned previously.

Obviously, if you are going to create a PDF file this way, you will need to create a PostScript file first. UNIX users will typically use either the a2ps or dvips utilities for this purpose.

If you are working in a Windows environment, you can also create PostScript files without Adobe Distiller, albeit via a slightly more complicated process. You will need to install a PostScript printer driver. For example, you can use the Apple LaserWriter IINT driver.
If you don’t have a PostScript driver installed, you can download one from Adobe at http://www.adobe.com/support/downloads/5672.htm

To create your PostScript file, you will need to select this printer and the Print to File option, typically found in the Print dialog box. Most Windows applications will then produce a file with a .prn extension. This should be a PostScript file, and you should probably rename it with a .ps extension. You should then be able to view it using GSview or another PostScript viewer, or create a PDF file from it using the ps2pdf utility.

Be aware that different printer drivers produce PostScript output of varying quality. You might find that some of the PostScript files you produce give errors when run through the ps2pdf utility. If this happens, we suggest trying a different printer driver.

If you only intend to create a small number of PDF files, Adobe’s online service might suit you. For $9.99 a month, you can upload files in a number of formats and download a PDF file. The service worked well for our certificate, but it does not let you select options that are important for this project: the PDF created will be stored as a binary file and compressed, which makes it very difficult to modify. This service can be found at http://createpdf.adobe.com/

There is a free trial option for this service if you want to test it out. There is also a free FTP-based interface to ps2pdf at the Net Distillery: http://www.babinszki.com/distiller/

Software to Create PDF Programmatically

Support for creating PDF documents is available from within PHP. Two different function libraries are available, with similar intentions. As they rely on external libraries, neither is compiled into PHP by default.

PHP’s PDFlib functions use the PDFlib library, available from http://www.pdflib.com

The ClibPDF functions use the ClibPDF library, available from http://www.fastio.com/

Both these libraries are similar.
They provide an API of functions to generate a PDF document. We have elected to use PDFlib because it seems to be updated and maintained more regularly.

It is worth noting that neither of these libraries is Free Software. Both permit some non-commercial use without charge, but require a license fee if you intend to provide a commercial service using them.

You can see whether PDFlib is already installed on your system by checking the output of the function phpinfo(). Under the heading pdf, you can find out whether PDFlib support is enabled, as well as which version of PDFlib is used.

In order to install PDFlib, you will also need to install the TIFF library, available from http://www.libtiff.org/ and the JPEG library, available from ftp://ftp.uu.net/graphics/jpeg/

On a UNIX system, these pieces of software are installed in the usual way, using configure and make. You will need to recompile PHP with the switch --with-pdflib.

On a Windows server, the easiest way to get PDFlib support is to download one of the unofficial precompiled binaries available on the Web. To test the code from this chapter on a Windows machine, we got a precompiled binary from http://php.weblogs.com/easywindows

Another popular build is available from http://www.php4win.de

Solution Overview

We will produce a system with three possible outcomes. As shown in Figure 30.1, we will ask quiz questions, assess the answers, and then generate a certificate in one of three ways:

• We will generate an RTF document from a blank template.
• We will generate a PDF document from a blank template.
• We will generate a PDF document programmatically via PDFlib.
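The first of these approaches is possible because RTF files contain only text. The following PHP sketch shows one way to fill in such a template: escape each value so that RTF treats it as literal text, then substitute it for a placeholder. The placeholder syntax (<<NAME>>, <<SCORE>>), the function names rtf_escape() and fill_rtf_template(), and the inline template are hypothetical; the chapter’s own rtf.php may work differently. The sketch assumes valid UTF-8 input, that the iconv extension is available, and that each placeholder is stored as contiguous text in the saved template (word processors sometimes split typed text across formatting groups).

```php
<?php
// Hypothetical sketch of RTF template filling; not the chapter's rtf.php.

// Escape text so that it is treated as literal content in an RTF document.
// Assumes the input is valid UTF-8.
function rtf_escape(string $text): string
{
    // Backslash and braces are RTF syntax and must be escaped.
    $text = strtr($text, ['\\' => '\\\\', '{' => '\\{', '}' => '\\}']);

    // Each non-ASCII character becomes one \uN? escape per UTF-16 code unit:
    // N is the unit as a signed 16-bit decimal, and '?' is the fallback
    // character shown by readers that do not understand \u.
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $out = '';
        foreach (unpack('n*', iconv('UTF-8', 'UTF-16BE', $m[0])) as $unit) {
            $out .= '\\u' . ($unit > 32767 ? $unit - 65536 : $unit) . '?';
        }
        return $out;
    }, $text);
}

// Replace every placeholder in a single pass. strtr() does not rescan text
// it has already substituted, so a value containing "<<SCORE>>" is left alone.
function fill_rtf_template(string $template, array $fields): string
{
    $replacements = [];
    foreach ($fields as $placeholder => $value) {
        $replacements[$placeholder] = rtf_escape((string) $value);
    }
    return strtr($template, $replacements);
}

// Usage with an inline template; a real script would more likely read the
// template from a file and send the result to the browser.
$template = '{\rtf1\ansi{\fonttbl{\f0 Times New Roman;}}\f0\fs28 '
          . 'This certifies that <<NAME>> scored <<SCORE>>.\par}';
echo fill_rtf_template($template, ['<<NAME>>' => "Zo\u{00EB} {Tester}", '<<SCORE>>' => '100%']);
```

Escaping before substitution matters here: an unescaped brace or backslash in a visitor’s name would otherwise be read as RTF syntax and could corrupt the document.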
FIGURE 30.1 Our certification system will generate one of three different certificates.

A summary of the files in the certification project is shown in Table 30.1.

TABLE 30.1 Files in the Certification Application

Name                  Type         Description
index.html            HTML page    The HTML form that contains the quiz questions
score.php             Application  Script to assess users’ answers
rtf.php               Application  Script to generate RTF certificate from template
pdf.php               Application  Script to generate PDF certificate from template
pdflib.php            Application  Script to generate PDF certificate using PDFlib
signature.tif         Image        Bitmap image of signature to be included on the PDFlib certificate
PHPCertification.rtf  RTF          RTF certificate template
PHPCertification.pdf  PDF          PDF certificate template

Let’s go ahead and look at the application.

Asking the Questions

The file index.html is straightforward. It needs to contain an HTML form asking the user for his name and for his answers to a number of questions. In a real assessment application, we would most likely retrieve these questions from a database. Here we are focusing on producing the certificate, so we will just hard-code some questions into the HTML.

The name field is a text input. Each question has three radio buttons to allow the user to indicate his preferred answer. The form uses an image button as its submit button.

The code for this page is shown in Listing 30.1.

LISTING 30.1 index.html: HTML Page Containing Quiz Questions

<html>
<body>
<h1 align = center>
<img src = "rosette.gif" alt = "">
Certification
<img src = "rosette.gif" alt = "">
</h1>
<p>You too can earn your highly respected PHP certification
from the world famous Fictional Institute of PHP Certification.
<p>Simply answer the questions below:
<form action = score.php method = post>
<p>Your Name <input type = text name = name>
<p>What does the PHP statement echo do?
<ol>
<li><input type = radio name = q1 value = 1> Outputs strings.
<li><input type = radio name = q1 value = 2> Adds two numbers together.
<li><input type = radio name = q1 value = 3> Creates a magical elf to finish writing your code.
</ol>
<p>What does the PHP function cos() do?
<ol>
<li><input type = radio name = q2 value = 1> Calculates a cosine in radians.
<li><input type = radio name = q2 value = 2> Calculates a tangent in radians.
<li><input type = radio name = q2 value = 3> It is not a PHP function; it is a lettuce.
</ol>
<p>What does the PHP function mail() do?
<ol>
<li><input type = radio name = q3 value = 1> Sends a mail message.
<li><input type = radio name = q3 value = 2> Checks for new mail.
<li><input type = radio name = q3 value = 3> Toggles PHP between male and female mode.
</ol>
<p align = center><input type = image src = "certify-me.gif" border = 0>
</form>
</body>
</html>

The result of loading index.html in a Web browser is shown in Figure 30.2.
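The form posts the name field and the radio-button values q1, q2, and q3 to score.php. score.php itself is not shown in this section, so the following is only a hedged sketch of what the assessment step involves: compare each submitted value with an answer key and turn the count into a percentage. The function name score_quiz() and the percentage scoring are assumptions; the answer key follows Listing 30.1, where option 1 is the correct answer to all three questions.

```php
<?php
// Hypothetical sketch of the assessment step; not the chapter's score.php.
// Answer key taken from Listing 30.1: option 1 is correct for q1, q2 and q3.
// Form values arrive as strings, so the key stores strings too.
const ANSWER_KEY = ['q1' => '1', 'q2' => '1', 'q3' => '1'];

// Returns the percentage of questions answered correctly (0 to 100).
function score_quiz(array $answers): int
{
    $correct = 0;
    foreach (ANSWER_KEY as $question => $right) {
        // An unanswered radio group is simply absent from the posted data.
        if (isset($answers[$question]) && $answers[$question] === $right) {
            $correct++;
        }
    }
    return (int) round(100 * $correct / count(ANSWER_KEY));
}

// Usage: in score.php this might be called as score_quiz($_POST).
echo score_quiz(['name' => 'Jo', 'q1' => '1', 'q2' => '3', 'q3' => '1']), "%\n";
```

With q2 answered incorrectly, the usage line prints 67%. Keeping the answer key in one array means a database-driven version, as suggested above for a real assessment application, would only need to change how that array is built.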