    AWSAccessKeyId = '[AWSAccessKeyId]'
    AWSSecretAccessKey = '[AWSSecretAccessKey]'
    FILENAME = r'D:\Document\PersonalInfoRemixBook\858Xtoc___.pdf'
    BUCKET = 'mashupguidetest'

    from boto.s3.connection import S3Connection

    def upload_file(fname, bucket, key, acl='public-read', metadata=None):
        from boto.s3.key import Key
        fpic = Key(bucket)
        fpic.key = key
        #fpic.set_metadata('source','flickr')
        # metadata defaults to None, so update only when a dict is passed
        if metadata:
            fpic.update_metadata(metadata)
        fpic.set_contents_from_filename(fname)
        fpic.set_acl(acl)
        return fpic

    # set up a connection to S3
    conn = S3Connection(AWSAccessKeyId, AWSSecretAccessKey)

    # retrieve all the buckets
    buckets = conn.get_all_buckets()
    print("number of buckets: %d" % len(buckets))

    # print out the names, creation date, and the XML that represents the ACL
    # of the bucket
    for b in buckets:
        print("%s\t%s\t%s" % (b.name, b.creation_date, b.get_acl().acl.to_xml()))

    # get list of all files for the mashupguide bucket
    print("keys in " + BUCKET)
    mg_bucket = conn.get_bucket(BUCKET)
    keys = mg_bucket.get_all_keys()
    for key in keys:
        print("%s\t%s\t%s" % (key.name, key.last_modified, key.metadata))

    # upload the table of contents to mashupguide bucket.
    metadata = {'author': 'Raymond Yee'}
    upload_file(FILENAME, mg_bucket, 'samplefile', 'public-read', metadata)

    # read back the TOC
    toc = mg_bucket.get_key('samplefile')
    print(toc.metadata)

CHAPTER 16 ■ USING ONLINE STORAGE SERVICES

Summary
From reading this chapter, you should now know how to get started with the Amazon S3 API using PHP and Python. The APIs for other online storage systems are different but will have some conceptual similarity to S3.

Mashing Up Desktop and Web-Based Office Suites
I’ve long been excited about the mashability and reusability of office suite documents (for example, word processor documents, spreadsheets, and slide presentations), the potential of which has gone largely unexploited.
There are many office suites, but in this chapter I’ll concentrate on the latest versions of OpenOffice.org, often called OO.o (version 2.x), and Microsoft Office (2007 and 2003). Few people realize that both these applications not only have programming interfaces but also have XML-based file formats. In theory, office documents using the respective file formats (OpenDocument and Office Open XML) are easier to reuse and generate from scratch than older generations of documents using opaque binary formats. And as you have seen throughout the book, knowledge of data formats and APIs means having opportunities for mashups. For ages, people have been reverse engineering older Microsoft Office documents, whose formats were not publicly documented; however, recombining office suite documents has been made easier, though not effortless, by these new formats.

In this chapter, I will also introduce you to the emerging space of web-based office suites, specifically ones that are programmable. I’ll also briefly cover how to program the office suites.

This chapter does the following:
• Shows how to do some simple parsing of the OpenDocument format (ODF) and Office Open XML documents
• Shows how to create a simple document in both ODF and Office Open XML
• Demonstrates some simple scripting of OO.o and Microsoft Office
• Lays out what else is possible by manipulating the open document formats
• Shows how to program Google Spreadsheets and to mash it up with other APIs (such as Amazon E-Commerce Services)

Mashup Scenarios for Office Suites
Why would mashups of office suite documents be interesting? For one, word processing documents, spreadsheets, and even presentation files hold vast amounts of the information that we communicate to each other. Sometimes they are in narratives (such as documents), and sometimes they are in semistructured forms (such as spreadsheets).
To reuse that information, it is sometimes a matter of reformatting a document into another format. Other times, it’s about extracting valuable pieces; for instance, all the references in a word processor document might be extracted into a reference database. Furthermore, not only does knowledge of the file formats enable you to parse documents, but it allows you to generate documents.

Some use case scenarios for the programmatic creation and reuse of office documents include the following:

Reusing PowerPoint: Do you have collections of Microsoft PowerPoint presentations that draw from a common collection of digital assets (pictures and outlines) and complete slides? Can you build a system of personal information management so that PPT presentations are constructed as virtual assemblages of slides, dynamically associated with assets?

Writing once, publishing everywhere: I’m currently writing this manuscript in Microsoft Office 2007. I’d like to republish this book in (X)HTML, Docbook, PDF, and wiki markup. How would I repurpose the Microsoft Word manuscript into those formats?

Transforming data: You could create an educational website in which data is downloaded to spreadsheets, not only as static data elements but as dynamic simulations. There’s plenty of data out there. Can you write programs to translate it into the dominant data analysis tool used by everyone, which is spreadsheets, whether it is on the desktop or in the cloud?

Getting instant PowerPoint presentations from Flickr: I’d like to download a Flickr set as a PowerPoint presentation. (This scenario seems to fit a world in which PowerPoint is the dominant presentation program. Even if Tufte hates it, a Flickr-to-PPT translator might make it easier to show those vacation pictures at your next company presentation.)

There are many other possibilities. This chapter teaches you what you need to know to start building such applications.
CHAPTER 17 ■ MASHING UP DESKTOP AND WEB-BASED OFFICE SUITES

The World of Document Markup
This chapter focuses on XML-based document markup languages in two dominant groups of office suites: Microsoft Office 2007 and OpenOffice.org. There are plenty of other markup languages, which are covered well on Wikipedia:
• http://en.wikipedia.org/wiki/Document_markup_language
• http://en.wikipedia.org/wiki/List_of_document_markup_languages
• http://en.wikipedia.org/wiki/Comparison_of_document_markup_languages

The OpenDocument Format
ODF is “an open XML-based document file format for office applications to be used for documents containing text, spreadsheets, charts, and graphical elements,” developed under the auspices of OASIS.[1] ODF is also an ISO/IEC standard (ISO/IEC 26300:2006).[2] ODF is used most prominently in OpenOffice.org (http://www.openoffice.org/) and KOffice (http://www.koffice.org/), among other office suites. For a good overview of the file format, consult J. David Eisenberg’s excellent book on ODF, called OASIS OpenDocument Essentials, which is available for download as a PDF (free of charge) or for purchase.[3]

The goal of this section is to introduce you to the issues of parsing and creating ODF files programmatically.

■Note For this section, I am assuming you have OpenOffice.org version 2.2 installed.

A good way to understand the essentials of the file format is to create a simple instance of an ODF file and then analyze it:
1. Fire up OpenOffice.org Writer, type Hello World, and save the file as helloworld.odt.[4]
2. Open the file in a ZIP utility (such as WinZip on the PC). One easy way to do so is to change the file extension from .odt to .zip so that the operating system will recognize it as a ZIP file. You will see that it’s actually a ZIP-format file when you go to unzip it. (See the list of files in Figure 17-1.)

Figure 17-1. Unzipping helloworld.zip.
An OpenDocument file produced by OpenOffice.org is actually in the ZIP format.

1. http://www.oasis-open.org/committees/office/
2. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=43485
3. http://books.evc-cit.info/OD_Essentials.pdf or http://develop.opendocumentfellowship.com/book/
4. http://examples.mashupguide.net/ch17/helloworld.odt

You’ll see some of the files that can be part of an ODF file:
• content.xml
• styles.xml
• meta.xml
• settings.xml
• META-INF/manifest.xml
• mimetype
• Configurations2/accelerator/
• Thumbnails/thumbnail.png

You can also use your favorite programming language, such as Python or PHP, to generate a list of the files. The following is a Python example:

    import zipfile
    z = zipfile.ZipFile(r'[path_to_your_file_here]')
    z.printdir()

This generates the following:

    File Name                                  Modified             Size
    mimetype                                   2007-06-02 16:10:18    39
    Configurations2/statusbar/                 2007-06-02 16:10:18     0
    Configurations2/accelerator/current.xml    2007-06-02 16:10:18     0
    Configurations2/floater/                   2007-06-02 16:10:18     0
    Configurations2/popupmenu/                 2007-06-02 16:10:18     0
    Configurations2/progressbar/               2007-06-02 16:10:18     0
    Configurations2/menubar/                   2007-06-02 16:10:18     0
    Configurations2/toolbar/                   2007-06-02 16:10:18     0
    Configurations2/images/Bitmaps/            2007-06-02 16:10:18     0
    content.xml                                2007-06-02 16:10:18  2776
    styles.xml                                 2007-06-02 16:10:18  8492
    meta.xml                                   2007-06-02 16:10:18  1143
    Thumbnails/thumbnail.png                   2007-06-02 16:10:18   945
    settings.xml                               2007-06-02 16:10:18  7476
    META-INF/manifest.xml                      2007-06-02 16:10:18  1866

You can get the equivalent functionality in PHP with the PHP zip library (see http://us2.php.net/zip):

    <?php
    $zip = zip_open('[path_to_your_file]');
    while ($entry = zip_read($zip)) {
        print zip_entry_name($entry) . "\t" . zip_entry_filesize($entry) . "\n";
    }
    zip_close($zip);
    ?>

This produces the following:

    mimetype                                   39
    Configurations2/statusbar/                 0
    Configurations2/accelerator/current.xml    0
    Configurations2/floater/                   0
    Configurations2/popupmenu/                 0
    Configurations2/progressbar/               0
    Configurations2/menubar/                   0
    Configurations2/toolbar/                   0
    Configurations2/images/Bitmaps/            0
    content.xml                                2776
    styles.xml                                 8492
    meta.xml                                   1143
    Thumbnails/thumbnail.png                   945
    settings.xml                               7476
    META-INF/manifest.xml                      1866

Generating a simple ODF file using OpenOffice.org gives you a basic file from which you can build. However, it’s useful to boil the file down even further because even the simple ODF generated by OO.o contains features that make it difficult to see what’s happening. Let’s pare down the “Hello World” ODF document further.

There are at least two ways to figure out a minimalist instance of an ODF document. One is to consult the ODF specification, specifically the ODF schema, to generate a small instance. OO.o 2.2 uses the ODF 1.0 specification.[5] The specification contains a RELAX NG schema for ODF. RELAX NG (http://relaxng.org/) is a schema language for XML. That is, you can use RELAX NG to specify what elements and attributes can be used in ODF, and in what combination. Schemas, stemming from the http://oasis-open.org page, include the following:
• The schema for office documents, “extracted from chapter 1 to 16 of the specification” (Version 1.0)[6]
• “The normative schema for the manifest file used by the OpenDocument package format” (Version 1.0)[7]
• “The strict schema for office documents that permits only meta information and formatting properties contained in this specification itself” (Version 1.0)[8]

5. http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
6. http://www.oasis-open.org/committees/download.php/12571/OpenDocument-schema-v1.0-os.rng
7.
http://www.oasis-open.org/committees/download.php/12570/OpenDocument-manifest-schema-v1.0-os.rng
8. http://www.oasis-open.org/committees/download.php/12569/OpenDocument-strict-schema-v1.0-os.rng

Instead of taking this approach here, I will show you how to use OO.o and the online ODF Validator (http://opendocumentfellowship.com/validator). The basic approach is to use a bit of trial and error to generate an ODF file and add pieces while feeding it to the ODF Validator to see how far you can distill the file.

Why should you care about minimal instances of ODF (and later OOXML) documents? ODF and OOXML are complicated markup formats. One of the best ways to figure out how to create documents in these formats is to use a tool such as OO.o or Microsoft Office to generate what you want, save the file, unzip the file, extract the section of the document you want, and plug that stuff into a minimalist document that you know is valid. That’s why you’re learning about boiling ODF down to its essence.

The ODF specification (and its RELAX NG schema) should tell you theoretically how to find a valid ODF instance, but in practice, you need to actually feed a given instance to the applications that are the destinations for the ODF documents. OpenOffice.org is currently the most important implementation of an office suite that interprets ODF, making it a good place to experiment.

J. David Eisenberg’s excellent book on ODF, OASIS OpenDocument Essentials, provides an answer to the question of which files are actually required by OO.o:

    The only files that are actually necessary are content.xml and the META-INF/manifest.xml file. If you create a file that contains word processor elements and zip it up and a manifest that points to that file, OpenOffice.org will be able to open it successfully. The result will be a plain text-only document with no styles.
    You won’t have any of the meta-information about who created the file or when it was last edited, and the printer settings, view area, and zoom factor will be set to the OpenOffice.org defaults.

Let’s verify Eisenberg’s assertion. Create an .odt file with the same content.xml as helloworld.odt, listed here:

    <?xml version="1.0" encoding="UTF-8"?>
    <office:document-content
      xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
      xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
      xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
      xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
      xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
      xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
      xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
      xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
      xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
      xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
      xmlns:math="http://www.w3.org/1998/Math/MathML"
      xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
      xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
      xmlns:ooo="http://openoffice.org/2004/office"
      xmlns:ooow="http://openoffice.org/2004/writer"
      xmlns:oooc="http://openoffice.org/2004/calc"
      xmlns:dom="http://www.w3.org/2001/xml-events"
      xmlns:xforms="http://www.w3.org/2002/xforms"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      office:version="1.0">
      <office:scripts/>
      <office:font-face-decls>
        <style:font-face style:name="Tahoma1" svg:font-family="Tahoma"/>
        <style:font-face style:name="Times New Roman"
          svg:font-family="'Times New Roman'"
          style:font-family-generic="roman" style:font-pitch="variable"/>
        <style:font-face style:name="Arial" svg:font-family="Arial"
          style:font-family-generic="swiss" style:font-pitch="variable"/>
        <style:font-face style:name="Arial Unicode MS"
          svg:font-family="'Arial Unicode MS'"
          style:font-family-generic="system" style:font-pitch="variable"/>
        <style:font-face style:name="MS Mincho" svg:font-family="'MS Mincho'"
          style:font-family-generic="system" style:font-pitch="variable"/>
        <style:font-face style:name="Tahoma" svg:font-family="Tahoma"
          style:font-family-generic="system" style:font-pitch="variable"/>
      </office:font-face-decls>
      <office:automatic-styles/>
      <office:body>
        <office:text>
          <office:forms form:automatic-focus="false"
            form:apply-design-mode="false"/>
          <text:sequence-decls>
            <text:sequence-decl text:display-outline-level="0"
              text:name="Illustration"/>
            <text:sequence-decl text:display-outline-level="0"
              text:name="Table"/>
            <text:sequence-decl text:display-outline-level="0"
              text:name="Text"/>
            <text:sequence-decl text:display-outline-level="0"
              text:name="Drawing"/>
          </text:sequence-decls>
          <text:p text:style-name="Standard">Hello World!</text:p>
        </office:text>
      </office:body>
    </office:document-content>

Now edit META-INF/manifest.xml to reference only content.xml and the package root:

    <?xml version="1.0" encoding="UTF-8"?>
    <manifest:manifest
      xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
      <manifest:file-entry
        manifest:media-type="application/vnd.oasis.opendocument.text"
        manifest:full-path="/"/>
      <manifest:file-entry manifest:media-type="text/xml"
        manifest:full-path="content.xml"/>
    </manifest:manifest>

This leaves you with an .odt file that consists of only those two files.
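The two-file package described above can also be assembled with a few lines of code. What follows is a minimal sketch in Python 3, not the method used in this chapter: it relies only on the standard zipfile and xml.etree.ElementTree modules, and it uses a trimmed-down content.xml that declares just the office and text namespaces (an assumption made for brevity, not the full listing). The helper names write_minimal_odt, list_members, and read_paragraphs are hypothetical, and whether a particular office suite opens the trimmed file has not been verified here.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET

# Assumption: a trimmed content.xml with only the namespaces it actually uses.
CONTENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.0">
  <office:body>
    <office:text>
      <text:p>Hello World!</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""

# The manifest lists the package root and content.xml, as in the text.
MANIFEST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
  <manifest:file-entry
      manifest:media-type="application/vnd.oasis.opendocument.text"
      manifest:full-path="/"/>
  <manifest:file-entry manifest:media-type="text/xml"
      manifest:full-path="content.xml"/>
</manifest:manifest>
"""

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def write_minimal_odt(path):
    """Hypothetical helper: zip the two XML members into one .odt file."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("content.xml", CONTENT_XML)
        z.writestr("META-INF/manifest.xml", MANIFEST_XML)


def list_members(path):
    """Return the member names stored in the package, in archive order."""
    with zipfile.ZipFile(path) as z:
        return z.namelist()


def read_paragraphs(path):
    """Return the text of every text:p element found in content.xml."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


if __name__ == "__main__":
    # Write to a temporary directory so the demo does not touch the cwd.
    demo_path = os.path.join(tempfile.mkdtemp(), "helloworld_min.odt")
    write_minimal_odt(demo_path)
    print(list_members(demo_path))
    print(read_paragraphs(demo_path))
```

Building the package in code makes the trial-and-error loop faster: you can change one member, regenerate the file, and feed it to an office suite or a validator again. A more complete package would probably also add a mimetype member, which this sketch deliberately leaves out to mirror the two-file experiment.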
You can download my version of this two-file document.[9] You will find that such a file will load successfully in OpenOffice.org 2.2 and the OpenDocument Viewer,[10] giving credence to the assertion that, in OO.o 2.2 at least, you don’t need any more than content.xml and META-INF/manifest.xml.

■Note You can download and install the OpenDocument Validator[11] or run the online version.[12]

Nonetheless, the OpenDocument Validator doesn’t find the file to be valid; it produces the following error message:

    1. warning does not contain a /mimetype file. This is a SHOULD in OpenDocument 1.0
    2. error styles.xml is missing
    3. error settings.xml is missing
    4. error meta.xml is missing

Since the OpenDocument Validator dies on one of the Fellowship’s test files,[13] you can see there are some unresolved problems with the validator or the test files produced by the OpenDocument Fellowship. Although there is nothing wrong with our minimalist file, it’s a good idea to use a file that has all the major pieces in place. If you insert skeletal styles.xml, settings.xml, and meta.xml files, you can convince the OpenDocument Validator to accept the resulting .odt file as a valid document. Furthermore, you can strip content.xml of extraneous declarations. (Strictly speaking, the namespace declarations are extraneous, but they are useful to have once you start plugging in chunks of ODF.) The resulting ODF text document is what you find here:

http://examples.mashupguide.net/ch17/helloworld_min_odt_2.odt

9. http://examples.mashupguide.net/ch17/helloworld_min_odt_1.odt
10. http://opendocumentfellowship.com/odfviewer
11. http://opendocumentfellowship.com/projects/odftools
12. http://opendocumentfellowship.com/validator
13.
http://testsuite.opendocumentfellowship.com/testcases/General/DocumentStructure/SingleDocumentContents/testDoc/testDoc.odt via http://testsuite.opendocumentfellowship.com/testcases/General/DocumentStructure/SingleDocumentContents/TestCase.html

[...]

    ... xmlns:xlink="http://www.w3.org/1999/xlink"/>

... you to take this approach on spreadsheets and presentations; the ODF for those file formats has a framework similar to that of the text documents.

This example text contains some common elements:
• Headings of level 1 and 2
• Several paragraphs
• An ordered and unordered list
• Text that has some italics and bold and a font change
• A table
• An image

[...]

• I’ll cover OpenDocumentPHP (http://opendocumentphp.org/), which is in the early stages of development.

In the next two subsections, I will show you how to use Odfpy and OpenDocumentPHP.

Odfpy
I’ll first use Odfpy to generate a minimalist document and then to re-create the full-blown ODF text document from earlier in the chapter. To use it, follow the documentation here:

http://opendocumentfellowship.com/files/api-for-odfpy.odt

[...]

    ... xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
      xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
      xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
      xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
      xmlns:xlink="http://www.w3.org/1999/xlink" />