Chapter 1: HTML and the Web

Widgets come in many varieties and are rarely harmful. They run within the browser’s security setup and are generally isolated from your computer’s file system. However, they can cause trouble if they are not well written. The problems include messing up the display of a web page, using up too much of the browser’s resources, or even causing the browser to crash.

Any stand-alone computer application or software program that exchanges information over the Web (Twitter clients, for example) is a user agent. So are the automatic software update programs that come with computer operating systems. So is the online Help feature of Microsoft Word or, for that matter, an Xbox, Nintendo, or PlayStation game console. Many of the apps on a modern smartphone are user agents, sending requests to web servers and using the returned information to do something useful or keep you informed.

Every web browser must provide three basic functions: 1) it must provide a control interface for human users; 2) it must exchange information with other computers; and 3) it must interpret HTML and render a web page. We are primarily interested in this last function: how HTML is understood by a browser and how that determines what is seen on the page.

Many browser makers use the same open-source HTML rendering engines and differ mostly in their user interfaces. As a result, only four browser types cover most Web surfing: Internet Explorer, Mozilla (Firefox, Flock), WebKit (Safari, Chrome), and everything else (mobile phone browsers, legacy versions of IE, and Internet appliances).

As with browsers, several different web servers are in use today, hosting nearly a quarter billion websites in total. By far the most popular web server, according to a November 2009 survey by Netcraft, is Apache, an open-source product from the Apache Software Foundation. It hosts about half of all sites worldwide.
The next most popular web server is Internet Information Services (IIS) from Microsoft, with about one-third of the market. Other web servers include Google Web Server (GWS), which the company uses internally to host its massive search engine and user sites; nginx (pronounced “engine X”), a free, lightweight, high-performance server written by Igor Sysoev; and Qzone, a Chinese web server used by QQ.com to host upward of 20 million blogs under its domain.

When a web server receives a request from a user agent, all it has to do is figure out which file to return. Actually, it is a bit more complicated than that. Apache, for example, has a modular structure with “hooks” that allow a systems administrator to include custom components. Apache analyzes the incoming request, applying defaults and rewriting rules. It determines whether to satisfy the request by returning the contents of a file or by executing a program and returning the output. If the requested resource requires authentication, Apache returns a status code instructing the browser to resubmit the request after prompting for a username/password combination. The HTTP request contains additional information such as the name of the browser or user agent and the preferred language. This enables Apache to provide a different page for mobile users or to substitute a translation of the requested page if one is available.

Web browsers and servers speak many other Internet protocols. Browsers are, in a sense, the Swiss Army knives of Internet clients. Web servers have plug-in interfaces to email, database, FTP, streaming video players, and other services. Web servers can also make requests to each other and serve as mirrors or proxies for each other.

The Web Bestiary

This section contains a lot of acronyms and definitions. Much of the descriptive material is taken from Wikipedia.
In a very real sense, Wikipedia represents the current usage and understanding of these terms by the Web community. I’ve listed them in order of decreasing importance, or rather their likelihood of ever coming up in casual conversation. This list is by no means complete.

• HTML (HyperText Markup Language): The predominant markup language for web pages. It provides a means to create structured documents using semantic tags for such things as headings, paragraphs, lists, links, quotes, and other items. It lets you embed images and other media objects and can be used to create interactive forms.

• CSS (Cascading Style Sheets): The language for describing the presentation (that is, the formatting and layout) of an HTML document. CSS is designed to enable the separation of document content from the details of how it should be presented, including the typography, positioning, colors, and margins. This separation improves content accessibility and provides more flexibility in controlling presentation characteristics.

• JavaScript: An object-oriented scripting language. Although JavaScript has other uses, we are concerned here with client-side JavaScript, the version that runs inside a user’s browser and manipulates HTML page elements. JavaScript code can be embedded within the HTML elements of a web page or imported from a separate file. Not all web pages have JavaScript components, and users can turn off their browsers’ JavaScript engine if they want to. Robots generally ignore JavaScript code as they examine web pages.

• HTTP (Hypertext Transfer Protocol): The set of rules governing how user agents (web browsers and the like) send requests to a web server and how the web server responds to each request. The web server returns a status code and data, or sometimes just the status code when something goes wrong. The familiar 404 error code is returned when the web server cannot find what you are looking for.
There are two primary HTTP request methods. A GET request is typically sent by your browser when you click a link with the intention of going to another web page. A POST request is typically sent when you click a form’s submit button, essentially asking that the web server do something with your input.

• CGI (Common Gateway Interface): A protocol for dynamically generating web pages in response to a GET request or form submission. The term is typically used as an adjective to indicate a server-side process, as in CGI script. CGI programs are typically written in a language such as Perl, Ruby, C, Visual Basic, or Python. Many websites are entirely driven by CGI processes, although the relative number of such sites has probably been declining as newer technologies, such as AJAX and PHP, have become popular.

• AJAX (Asynchronous JavaScript and XML): The most recent versions of JavaScript and other client-side scripting languages contain features that a developer can use to create web pages that make independent HTTP requests to the server while the page is loading or anytime thereafter. AJAX is the set of techniques used to create web pages with elements that can be independently updated with new content in response to a user’s mouse click or some other event without having to reload the entire page. This is how many widgets work.

• XML (eXtensible Markup Language): A set of rules for marking up documents that emphasizes generality and global usability. It is widely used to transmit arbitrarily structured data in mixed client/server environments. XML and HTML are related members of a family of markup languages descended from the Standard Generalized Markup Language (SGML). HTML is an SGML language with a specific Document Object Model (DOM) focused on describing hypertext documents. The two technologies are combined in the XHTML specification.
• JSON (JavaScript Object Notation): Although based on JavaScript, JSON is a language-independent system for representing data objects. It is simpler than XML and is often used as an alternative to XML in AJAX applications to transfer data objects between a server and a script running in a user’s browser.

• CMS (Content Management System): An application program or a package of software tools that facilitates the creation of web pages and automates their maintenance using a Web-based interface for authoring, editing, and administration. The term has broader use beyond the Web. For our purposes, it refers to any site or software that generates web pages from content stored in a database and provides a means of creating, editing, and managing that content without requiring knowledge of HTML, CSS, or FTP. A good CMS permits you to directly enter HTML with the content for finer control of web page presentation. Blogs are a form of content management system.

• Flash (Adobe Flash, formerly Macromedia Flash): A popular multimedia platform for adding animation and interactivity to web pages. Flash is commonly used to create animations, advertisements, and various interactive components; to integrate video into web pages; and to develop rich Internet applications. Some websites are done entirely in Flash. However, this is now considered a poor practice, partly because the content of a Flash site is generally inaccessible to robots.

• PHP (PHP Hypertext Preprocessor): PHP originally stood for Personal Home Page. The PHP Group, the informal organization that currently oversees the development of the language, decided to expand the meaning of PHP a few years ago and gave us the current recursive acronym. PHP is a server-side technology for dynamically generating websites. It is powerful and easy to write but often difficult to read. A PHP file intermixes program logic (PHP statements enclosed in special tags) with HTML markup.
When a request is sent to a web server for a file ending with the .php extension, the web server preprocesses the coded file, executes the PHP instructions, and returns an HTML document to the user’s browser. Many modern Web applications, such as the popular blogging software WordPress, are written in PHP.

• FTP (File Transfer Protocol): An Internet protocol for transferring files from one computer to another, usually using a stand-alone application. Web browsers and page editors also use FTP to upload and download files. Dozens of FTP clients are available. One of the most popular is FileZilla, a free, open-source program that runs on Windows, Macintosh, and UNIX computers.

• jQuery: A library of JavaScript functions (often called a framework) that simplifies the development of dynamic, interactive web pages. It provides a language for selecting DOM elements and giving them complex behaviors. jQuery takes care of cross-browser differences in the DOM and facilitates the use of AJAX. In much the same way that CSS does with web page presentation, jQuery encourages the separation of semantic HTML markup from the descriptions of how HTML elements should respond to events. jQuery makes Web programming fun.

• RSS (Really Simple Syndication): An XML protocol for distributing content. Such distributed content from a website is called a feed and provides an alternative means for users to access the content. Users can subscribe to feeds using a number of stand-alone newsreaders or by using the feed-reading facilities incorporated into their browsers and email clients. Feeds from one website can also be embedded into web pages on another site in a syndicated publishing model. RSS is quite popular but evolved in an ad hoc way and is not a recognized standard. A newer feed protocol called Atom is more robust and follows the applicable standards.
• DNS (Domain Name System): A system for assigning names to computers connected to the Internet or a private network. It translates domain names meaningful to humans into the numerical addresses associated with networking equipment for the purpose of locating these devices worldwide. The Domain Name System can be thought of as the “phone book” for the Internet.

• DOM (Document Object Model): A dictionary and grammar for interpreting HTML. A DOM describes HTML elements and their attributes and properties and how they are used to create web pages. DOMs are published in a form that can be read by both humans and machines. Every web browser has at least one DOM, and most modern browsers conform to DOMs published by the W3C. Yet there are still some differences in browser behavior arising from coding bugs, DOM misinterpretation, and edge conditions where browser behavior is not fully defined.

In this book, whenever you encounter the term DOM, it means the W3C’s draft specification for HTML5 as interpreted by your favorite browser. Your browser may or may not support this or that new HTML5 element when you experiment with the examples given. The same is true of any particular editing tool or environment you like to use. My aim is to present HTML that works reliably across all modern browsers and is pleasing to all user agents.

HTML5 and Web Standards

Over the past two decades, HTML has evolved through several iterations: HTML, HTML2, HTML3, HTML3.2, HTML4, HTML4.01, and XHTML. These changes have been driven by both standards-setting organizations, such as the W3C, and individual software companies, such as Netscape and Microsoft.

HTML5 is the next iteration. Technically, it is not yet a standard, and it will not be for several years. It is the W3C’s working draft for the standard that it will eventually recommend to official standards organizations around the world. Still, browser manufacturers are already adopting HTML5 features.
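
Because a given browser may or may not support a particular HTML5 element, a script can check for a feature before relying on it. The following is a minimal sketch of one common approach, not a definitive implementation: it asks the DOM to create a canvas element and checks whether the result has the canvas-specific getContext method. The function name supportsCanvas and the two stand-in objects are assumptions made for illustration, so that the sketch can run outside a browser.

```javascript
// Sketch: test whether a DOM implementation knows the HTML5 canvas element.
// `doc` is expected to be the browser's `document` object. Passing it in as
// a parameter is an assumption of this sketch, so the function can also be
// exercised outside a browser with a stand-in object.
function supportsCanvas(doc) {
  var el = doc.createElement("canvas");
  // A browser that does not recognize the element still creates a generic
  // element, but that element lacks the canvas-specific getContext method.
  return typeof el.getContext === "function";
}

// Hypothetical stand-ins for two kinds of browser, for illustration only.
var modernDoc = {
  createElement: function (name) {
    return name === "canvas" ? { getContext: function () { return {}; } } : {};
  }
};
var legacyDoc = {
  createElement: function () { return {}; }
};

console.log(supportsCanvas(modernDoc)); // true
console.log(supportsCanvas(legacyDoc)); // false
```

In a browser, the call would be supportsCanvas(document), and a page could fall back to a static image when the result is false.
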
For now, HTML5 is best thought of as a directional guide to good standards of practice in Web design.

New HTML5 elements and attributes provide a richer description of online documents as interactive multimedia spaces. Prior HTML versions (HTML4 and XHTML) are tied to a print metaphor of a page to which interactive capabilities and media support have been added ad hoc. Many pages on the Web are the online equivalent of printed pages. In contrast, HTML5 encourages a broader conception of the Web as a unified, intelligent, interactive, hyperlinked medium.

For online document authors, HTML5 adds new elements to define document sections (the section element) and new section subelements to define page headers (header) and footers (footer). Section headings can be composed of heading groups (hgroup) and can contain the new navigation (nav) element. HTML4 provided only a single division element (div) for these purposes, and coders used id and class attributes to make the distinction in usage. There is a new article element (article) and a means (the aside element) to designate text that’s tangential to the main topic. There is even an element for indicating sarcastic remarks (sarcasm) in the W3C draft specification, but I think this is an inside joke.

For Web developers, the HTML5 draft specification for the first time describes how the browser should expose HTML elements to scripts. Using JavaScript syntax, it describes the methods that scripts may call on document objects. In other words, it describes what commands a given HTML element understands and obeys. Previous HTML specifications referred generally to ECMAScript, a standardized family of languages that includes JavaScript, JScript (Microsoft’s version of JavaScript), and ActionScript (Adobe’s scripting language for Flash). The use of JavaScript in this book is not meant to imply the exclusion of other scripting languages.

Equally exciting is the new HTML5 canvas element.
It provides a bitmap canvas area that scripts can draw on or load images and video into. A canvas element can be used to render graphs, game graphics, or other visual images on the fly. There are also new elements for creating meters (meter) and progress bars (progress), and new element attributes that allow parts of a document to be moved around the page or edited in place and saved across sessions.

Even with all these new features, HTML5 emphasizes simplicity. This is achieved by segregating the description of document content from the descriptions of presentation and interactive behavior. Web authors are encouraged to code the minimal HTML necessary to provide a semantic description of a document. This is what Web Standards is all about: the standards of practice that create web pages that display well on all devices and that are pleasing to everyone and everything that reads them.

Allow me to expand on this last point. Search has changed how we use the Web. Although a work must be read and understood by people, it is just as important that the information that helps people find that work be properly constructed. In other words, a web page must be both robot-friendly and people-friendly.

This dictum of being friendly to everything (within reason) goes beyond just being browser- and robot-friendly. The Web embraces all kinds of devices, including phones, tablets, netbooks, computers, game consoles, and large public video displays, as well as devices for the visually impaired. The Web also embraces all languages and writing systems, including right-to-left languages such as Hebrew and Farsi and ideographic character sets such as Japanese and Chinese.

We are entering the age of the collaborative Web. It is important to think about pleasing the coauthors, contributors, curators, archivists, and translators who will work with your documents long after you write them.

Do We All Have to Learn HTML5 Now?

The short answer is no.
First of all, new versions of the HTML specification do not make older versions obsolete. For example, the first home page I ever created looks the same in Firefox and Chrome today as it did in Mosaic and Arena in 1994. What’s important is the assurance that the web pages we build today will look and function the same in another 15 years. We may update those pages for marketing and aesthetic reasons, but we will not be forced to edit them for technical reasons.

Second, if you already know some HTML, it is not a matter of learning a new language or dialect, but simply of incorporating new elements into your HTML vocabulary.

If you are a content creator/editor using Web-based tools to update web pages and post articles, you need to know that any HTML markup you use in a blog post, press release, or email newsletter will be the same in all your readers’ browsers. It is best for you to stick with the elements and attributes of HTML4 until HTML5 has been more widely adopted and more guidance is forthcoming on how to use the new features.

If you design websites and keep up with tech trends on a regular basis, you will learn from your online resources about browser support for new HTML5 elements, which you can incorporate into your work with appropriate fallbacks and cross-browser testing. Now is the time to play with HTML5, while you reexamine your Web design and development methods. The HTML5 Web is collaborative.

If you manage a Web design company or development shop, your websites are probably sophisticated enough that you already do browser detection. My suggestion is to let one of your programmers become your HTML5 specialist, creating HTML5-aware versions of some of your in-development and existing websites.

Summary

Here are the important points to remember from this chapter:

• HTML is a semantic markup language for online, hypertext-linked documents.

• The Web has a client/server architecture.
Web servers respond to requests from user agents such as web browsers, search robots, and web page editors.

• HTML is supported by many other technologies, the most important of which are Cascading Style Sheets (CSS) for describing the presentation aspects of page elements, and JavaScript for describing element behaviors.

• The Web is global and collaborative. Observing Web standards in creating documents will help others build upon your work.

• HTML5 provides new elements and attributes for Web designers to work with. However, it is still a draft specification and thus should be seen as a guide for future projects, when more support is available.
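
As an illustration of the client/server exchange described in this chapter, the sketch below composes the text of a simple GET request, including the user-agent name and preferred language, and picks apart the status line of a response. It is a simplified sketch under stated assumptions, not a working HTTP client: the function names buildGetRequest and parseStatusLine, the host www.example.com, and the agent name ExampleAgent/1.0 are all hypothetical.

```javascript
// Build the text of a minimal HTTP/1.1 GET request as a user agent might
// send it. The header values passed in below are illustrative only.
function buildGetRequest(host, path, userAgent, language) {
  return [
    "GET " + path + " HTTP/1.1",
    "Host: " + host,
    "User-Agent: " + userAgent,      // the name of the browser or other agent
    "Accept-Language: " + language,  // the preferred language
    "",                              // a blank line ends the header block
    ""
  ].join("\r\n");
}

// Split the first line of a server response into version, code, and reason.
function parseStatusLine(line) {
  var match = /^(HTTP\/\d\.\d) (\d{3}) ?(.*)$/.exec(line);
  if (!match) {
    throw new Error("Not an HTTP status line: " + line);
  }
  return { version: match[1], code: parseInt(match[2], 10), reason: match[3] };
}

var request = buildGetRequest("www.example.com", "/index.html", "ExampleAgent/1.0", "en");
console.log(request.split("\r\n")[0]); // GET /index.html HTTP/1.1

var parsed = parseStatusLine("HTTP/1.1 404 Not Found");
console.log(parsed.code);   // 404
console.log(parsed.reason); // Not Found
```

A real user agent sends many more headers and reads the rest of the response as well; the point here is only that both the request and the status code are short lines of plain text.
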