Network Security Through Data Analysis
Building Situational Awareness
Michael Collins

Network Security Through Data Analysis
by Michael Collins

Copyright © 2014 Michael Collins. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Andy Oram and Allyson MacDonald
Production Editor: Nicole Shelby
Copyeditor: Gillian McGarvey
Proofreader: Linley Dolby
Indexer: Judy McConville
Cover Designer: Randy Comer
Interior Designer: David Futato
Illustrators: Kara Ebrahim and Rebecca Demarest

February 2014: First Edition

Revision History for the First Edition:
2014-02-05: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449357900 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Network Security Through Data Analysis, the picture of a European Merlin, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-35790-0

[LSI]

Table of Contents

Preface

Part I. Data

1. Sensors and Detectors: An Introduction
   Vantages: How Sensor Placement Affects Data Collection
   Domains: Determining Data That Can Be Collected
   Actions: What a Sensor Does with Data
   Conclusion

2. Network Sensors
   Network Layering and Its Impact on Instrumentation
   Network Layers and Vantage
   Network Layers and Addressing
   Packet Data
   Packet and Frame Formats
   Rolling Buffers
   Limiting the Data Captured from Each Packet
   Filtering Specific Types of Packets
   What If It’s Not Ethernet?
   NetFlow
   NetFlow v5 Formats and Fields
   NetFlow Generation and Collection
   Further Reading

3. Host and Service Sensors: Logging Traffic at the Source
   Accessing and Manipulating Logfiles
   The Contents of Logfiles
   The Characteristics of a Good Log Message
   Existing Logfiles and How to Manipulate Them
   Representative Logfile Formats
   HTTP: CLF and ELF
   SMTP
   Microsoft Exchange: Message Tracking Logs
   Logfile Transport: Transfers, Syslog, and Message Queues
   Transfer and Logfile Rotation
   Syslog
   Further Reading

4. Data Storage for Analysis: Relational Databases, Big Data, and Other Options
   Log Data and the CRUD Paradigm
   Creating a Well-Organized Flat File System: Lessons from SiLK
   A Brief Introduction to NoSQL Systems
   What Storage Approach to Use
   Storage Hierarchy, Query Times, and Aging

Part II. Tools

5. The SiLK Suite
   What Is SiLK and How Does It Work?
   Acquiring and Installing SiLK
   The Datafiles
   Choosing and Formatting Output Field Manipulation: rwcut
   Basic Field Manipulation: rwfilter
   Ports and Protocols
   Size
   IP Addresses
   Time
   TCP Options
   Helper Options
   Miscellaneous Filtering Options and Some Hacks
   rwfileinfo and Provenance
   Combining Information Flows: rwcount
   rwset and IP Sets
   rwuniq
   rwbag
   Advanced SiLK Facilities
   pmaps
   Collecting SiLK Data
   YAF
   rwptoflow
   rwtuc
   Further Reading

6. An Introduction to R for Security Analysts
   Installation and Setup
   Basics of the Language
   The R Prompt
   R Variables
   Writing Functions
   Conditionals and Iteration
   Using the R Workspace
   Data Frames
   Visualization
   Visualization Commands
   Parameters to Visualization
   Annotating a Visualization
   Exporting Visualization
   Analysis: Statistical Hypothesis Testing
   Hypothesis Testing
   Testing Data
   Further Reading

7. Classification and Event Tools: IDS, AV, and SEM
   How an IDS Works
   Basic Vocabulary
   Classifier Failure Rates: Understanding the Base-Rate Fallacy
   Applying Classification
   Improving IDS Performance
   Enhancing IDS Detection
   Enhancing IDS Response
   Prefetching Data
   Further Reading

8. Reference and Lookup: Tools for Figuring Out Who Someone Is
   MAC and Hardware Addresses
   IP Addressing
   IPv4 Addresses, Their Structure, and Significant Addresses
   IPv6 Addresses, Their Structure and Significant Addresses
   Checking Connectivity: Using ping to Connect to an Address
   Tracerouting
   IP Intelligence: Geolocation and Demographics
   DNS
   DNS Name Structure
   Forward DNS Querying Using dig
   The DNS Reverse Lookup
   Using whois to Find Ownership
   Additional Reference Tools
   DNSBLs

9. More Tools
   Visualization
   Graphviz
   Communications and Probing
   netcat
   nmap
   Scapy
   Packet Inspection and Reference
   Wireshark
   GeoIP
   The NVD, Malware Sites, and the C*Es
   Search Engines, Mailing Lists, and People
   Further Reading

Part III. Analytics

10. Exploratory Data Analysis and Visualization
   The Goal of EDA: Applying Analysis
   EDA Workflow
   Variables and Visualization
   Univariate Visualization: Histograms, QQ Plots, Boxplots, and Rank Plots
   Histograms
   Bar Plots (Not Pie Charts)
   The Quantile-Quantile (QQ) Plot
   The Five-Number Summary and the Boxplot
   Generating a Boxplot
   Bivariate Description
   Scatterplots
   Contingency Tables
   Multivariate Visualization
   Operationalizing Security Visualization
   Further Reading

11. On Fumbling
   Attack Models
   Fumbling: Misconfiguration, Automation, and Scanning
   Lookup Failures
   Automation
   Scanning
   Identifying Fumbling
   TCP Fumbling: The State Machine
   ICMP Messages and Fumbling
   Identifying UDP Fumbling
   Fumbling at the Service Level
   HTTP Fumbling
   SMTP Fumbling
   Analyzing Fumbling
   Building Fumbling Alarms
   Forensic Analysis of Fumbling
   Engineering a Network to Take Advantage of Fumbling
   Further Reading

12. Volume and Time Analysis
   The Workday and Its Impact on Network Traffic Volume
   Beaconing
   File Transfers/Raiding
   Locality
   DDoS, Flash Crowds, and Resource Exhaustion
   DDoS and Routing Infrastructure
   Applying Volume and Locality Analysis
   Data Selection
   Using Volume as an Alarm
   Using Beaconing as an Alarm
   Using Locality as an Alarm
   Engineering Solutions
   Further Reading

13. Graph Analysis
   Graph Attributes: What Is a Graph?
   Labeling, Weight, and Paths
   Components and Connectivity
   Clustering Coefficient
   Analyzing Graphs
   Using Component Analysis as an Alarm
   Using Centrality Analysis for Forensics
   Using Breadth-First Searches Forensically
   Using Centrality Analysis for Engineering
   Further Reading

14. Application Identification
   Mechanisms for Application Identification
   Port Number
   Application Identification by Banner Grabbing
   Application Identification by Behavior
   Application Identification by Subsidiary Site
   Application Banners: Identifying and Classifying
   Non-Web Banners
   Web Client Banners: The User-Agent String
   Further Reading

15. Network Mapping
   Creating an Initial Network Inventory and Map
   Creating an Inventory: Data, Coverage, and Files
   Phase I: The First Three Questions
   Phase II: Examining the IP Space
   Phase III: Identifying Blind and Confusing Traffic
   Phase IV: Identifying Clients and Servers
   Identifying Sensing and Blocking Infrastructure
   Updating the Inventory: Toward Continuous Audit
   Further Reading

Index

... organization of data. Data storage and logistics are a critical problem in security analysis; it’s easy to collect data, but hard to search through it and find actual phenomena. Data has a footprint, ... book is about collecting data and looking at networks in order to understand how the network is used. The focus is on analysis, which is the process of taking security data and using it to make