- Layout: These are tags that can be used to control or modify the layout of your page.
- Redundant: Since user-generated HTML content will be parsed inside your <body> tag, a second copy of any document-level tag is useless at best and harmful at worst. You need to be sure these tags (<html>, <head>, <body>, and the like) are not coming in with the user’s content.
- Script, frame, and style tags: These are tags that allow additional code or embedded languages to run when a page loads and are therefore extremely dangerous. Suspect tags include <script>, <frame>, <iframe>, and <style>.

This is nowhere near an exhaustive list, but it does cover many of the common tags that can be responsible for causing damage to your site. The official HTML and XHTML specifications are available at the web site of the W3C (www.w3.org), also known as the World Wide Web Consortium. For some readers, these documents may be too dry and technical, so another recommendation for quick HTML and XHTML reference is a web site called W3Schools, which can be found at www.w3schools.com.

To be honest, the best and safest solution is to not accept any HTML code, but commercial demands may require you to allow some markup to give users a degree of personalization. Compile a whitelist of tags that will be allowed across your entire application or site. Limits like this need to be documented so every team member, and clients themselves, know what restrictions will always apply when adding object properties and input elements.

Too bad this is only half the problem when allowing markup. Unfortunately, strip_tags() does not modify tag attributes: any tag you allow through keeps every attribute the user gave it. Tag attributes should be taken as seriously as tags themselves, because any attribute could be exploited. An <a> tag’s href attribute could activate some scary JavaScript or a link that’s not acceptable under your terms-of-use policy. What’s worse is that even if you managed to filter href attributes, someone could use an onmouseover attribute that you let get by to run some hidden malicious code. The last thing you want to find
is users hiding code that would allow them to siphon cookies from other users, enabling the hacker to log in to the site disguised as another user. A good solution would be hunting down some reliable regular expressions. Or check out a PHP Extension and Application Repository (PEAR) class called HTML_Safe (available online at http://pear.php.net/package/HTML_Safe), which does a good job of cleaning up tags, links, and tag attributes of your choice.

If you have any plans for accepting JavaScript or similar code from users, you multiply your headaches of safely accepting and rendering user content. Most advertisements use JavaScript in their ad code, and you can’t trust any code from users. People can use JavaScript for malicious attacks on you and your users, and that would suck. You also can’t be sure that everyone’s JavaScript code works as intended. Any snippet of copied JavaScript could potentially break the rendering or other scripts of your pages. Not good for you, and not good for your users. If your application doesn’t need to accept JavaScript code, don’t let it. Starting with plain text or only HTML is more than enough for many users.

Accepting user-generated CSS

Cascading Style Sheets, or CSS, is a W3C recommended standard that helps reduce markup by taking care of all the formatting, styling, and layout for documents. Here, I’ll touch on the main aspects of CSS you need to watch out for when securing your application. Since CSS is popular, you can surf the Web on your own to find more detailed discussion on how CSS can compromise your design or security.

The first two questionable CSS properties are display and visibility: both can be used against any area of the page and easily hide required branding and advertisements. Something like this should be a violation of your usage policies, since it affects the application in an unintended way and hurts advertisement campaigns. Here’s a usable CSS style
declaration that won’t fix or prevent anything, but each line should give you some insight into how that style property would hurt your application or site:

    .fakeEvilStyle, #header, #ads, .ad, #footer, body,
    #content:before, #content:after, #content:hover {
        /* your page will be hidden entirely */
        display: none;
        visibility: hidden;
        /* your page can be rendered offstage
           (top and left only apply to positioned elements) */
        position: absolute;
        top: -1000px;
        left: -1000px;
        /* can add unexpected content to the page */
        content: 'hacked by CSSx0r';
        /* masks your entire site to a 1px by 1px */
        clip: rect(0px, 1px, 1px, 0px);
        /* fake (invisible) image used for some bad reason */
        background-image: url('http://site.com/tracking/you_with/fake.jpg');
    }

    .lessHarmfulStyles, #content, h1, h2, h3, h4, h5, h6, p {
        /* All harmless but when used together they're suspicious */
        width: 0;          /* no width is not what you intended I'm sure */
        height: 0;         /* no height is not what you intended either */
        overflow: hidden;  /* mixing false width/height makes it suspect */
        z-index: -100;     /* moves elements to the "back of the canvas" */
    }

Again, you should think about how each line of the preceding code sample would affect your site if a proficient user has the chance to override your style sheets. As mentioned earlier, just not accepting the input is always the safest move, but you’re going to hear it from someone upstairs that his teenage kid can’t tweak his profile like on some other site. The only real way to get around this is to take a smarter approach than just looking for style properties. You need to ensure your application forces your own CSS declarations over any set by users while tossing out everything people try to slip by.

A good recommendation for this server magic is CSS Parser, a small PHP class that is available freely at www.phpclasses.org/cssparser. PEAR fans can use a package called HTML_CSS, available at http://pear.php.net/package/HTML_CSS. Both provide methods for handling style sheet declarations and can parse
strings and CSS documents to inspect any CSS selector and property.

For CSS cleanup jobs, I use classes like CSS Parser to make sure advertisements and their tags don’t get altered with “harmless” CSS. First, imagine your pages have tags wrapped around each advertisement with their class="advertisement" applied. You’ll use the CSS Parser class to read your official styles and inspect all user input to make sure users aren’t sneaking anything in that could hurt your ads. The following code loads the CSS Parser class, loads your CSS documents, and then prunes unwanted user-generated content:
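
As a library-independent aside, the pruning idea itself can be sketched in plain PHP. This is only an illustration and not the CSS Parser API: the function name pruneUserCss(), the protected-selector list, and the banned-property list are all hypothetical, PHP 7.1 or later is assumed, and the regular expression only copes with flat rule sets. A real filter should rely on a proper parser, since clever selectors or values containing semicolons can slip past a check this simple.

```php
<?php
// Hypothetical helper: none of these names come from CSS Parser or HTML_CSS.

/**
 * Remove risky rules and properties from user-supplied CSS.
 * Only flat "selector { property: value; }" rule sets are understood;
 * anything the pattern cannot match is silently dropped.
 */
function pruneUserCss(string $css, array $protectedSelectors, array $bannedProperties): string
{
    // Strip comments first so they cannot hide declarations from the checks.
    $css = preg_replace('~/\*.*?\*/~s', '', $css);

    $clean = [];
    // Capture every "selector list { declarations }" pair.
    preg_match_all('/([^{}]+)\{([^{}]*)\}/', $css, $rules, PREG_SET_ORDER);

    foreach ($rules as $rule) {
        $selectors = trim($rule[1]);

        // Reject at-rules (such as @import) and stray statements glued to a selector.
        if (strpbrk($selectors, '@;') !== false) {
            continue;
        }

        // Skip the whole rule if it mentions a protected selector (assumed list).
        foreach ($protectedSelectors as $protected) {
            if (stripos($selectors, $protected) !== false) {
                continue 2;
            }
        }

        $kept = [];
        foreach (explode(';', $rule[2]) as $declaration) {
            if (strpos($declaration, ':') === false) {
                continue; // empty or malformed declaration
            }
            [$property, $value] = array_map('trim', explode(':', $declaration, 2));
            if ($property === '' || $value === '') {
                continue;
            }
            if (in_array(strtolower($property), $bannedProperties, true)) {
                continue; // property is on the assumed blacklist
            }
            $kept[] = "    $property: $value;";
        }

        if ($kept) {
            $clean[] = $selectors . " {\n" . implode("\n", $kept) . "\n}";
        }
    }

    return implode("\n", $clean);
}

// Example call: the rule aimed at the ad wrapper is dropped entirely,
// and the banned visibility property is removed from the remaining rule.
$userCss = ".advertisement { display: none; } p { color: red; visibility: hidden; }";
echo pruneUserCss($userCss, ['advertisement'], ['display', 'visibility', 'clip', 'position', 'z-index']);
```

Matching on the bare word "advertisement" rather than ".advertisement" is deliberate in this sketch, so that attribute selectors such as [class=advertisement] are caught too, at the cost of occasionally rejecting an innocent rule.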