
Cassandra: The Definitive Guide


DOCUMENT INFORMATION

Structure

  • Table of Contents

  • Foreword

  • Preface

    • Why Apache Cassandra?

    • Is This Book for You?

    • What’s in This Book?

    • Finding Out More

    • Conventions Used in This Book

    • Using Code Examples

    • Safari® Enabled

    • How to Contact Us

    • Acknowledgments

  • Chapter 1. Introducing Cassandra

    • What’s Wrong with Relational Databases?

    • A Quick Review of Relational Databases

      • RDBMS: The Awesome and the Not-So-Much

        • Transactions, ACID-ity, and two-phase commit

        • Schema

        • Sharding and shared-nothing architecture

        • Summary

      • Web Scale

    • The Cassandra Elevator Pitch

      • Cassandra in 50 Words or Less

      • Distributed and Decentralized

      • Elastic Scalability

      • High Availability and Fault Tolerance

      • Tuneable Consistency

      • Brewer’s CAP Theorem

      • Row-Oriented

      • Schema-Free

      • High Performance

    • Where Did Cassandra Come From?

    • Use Cases for Cassandra

      • Large Deployments

      • Lots of Writes, Statistics, and Analysis

      • Geographical Distribution

      • Evolving Applications

    • Who Is Using Cassandra?

    • Summary

  • Chapter 2. Installing Cassandra

    • Installing the Binary

      • Extracting the Download

      • What’s In There?

    • Building from Source

      • Additional Build Targets

      • Building with Maven

    • Running Cassandra

      • On Windows

      • On Linux

      • Starting the Server

    • Running the Command-Line Client Interface

    • Basic CLI Commands

      • Help

      • Connecting to a Server

      • Describing the Environment

      • Creating a Keyspace and Column Family

      • Writing and Reading Data

    • Summary

  • Chapter 3. The Cassandra Data Model

    • The Relational Data Model

    • A Simple Introduction

    • Clusters

    • Keyspaces

    • Column Families

      • Column Family Options

    • Columns

      • Wide Rows, Skinny Rows

      • Column Sorting

    • Super Columns

      • Composite Keys

    • Design Differences Between RDBMS and Cassandra

      • No Query Language

      • No Referential Integrity

      • Secondary Indexes

      • Sorting Is a Design Decision

      • Denormalization

    • Design Patterns

      • Materialized View

      • Valueless Column

      • Aggregate Key

    • Some Things to Keep in Mind

    • Summary

  • Chapter 4. Sample Application

    • Data Design

    • Hotel App RDBMS Design

    • Hotel App Cassandra Design

    • Hotel Application Code

      • Creating the Database

        • Loading the schema

      • Data Structures

      • Getting a Connection

      • Prepopulating the Database

      • The Search Application

    • Twissandra

    • Summary

  • Chapter 5. The Cassandra Architecture

    • System Keyspace

    • Peer-to-Peer

    • Gossip and Failure Detection

    • Anti-Entropy and Read Repair

    • Memtables, SSTables, and Commit Logs

    • Hinted Handoff

    • Compaction

    • Bloom Filters

    • Tombstones

    • Staged Event-Driven Architecture (SEDA)

    • Managers and Services

      • Cassandra Daemon

      • Storage Service

      • Messaging Service

      • Hinted Handoff Manager

    • Summary

  • Chapter 6. Configuring Cassandra

    • Keyspaces

      • Creating a Column Family

      • Transitioning from 0.6 to 0.7

    • Replicas

    • Replica Placement Strategies

      • Simple Strategy

      • Old Network Topology Strategy

      • Network Topology Strategy

    • Replication Factor

      • Increasing the Replication Factor

    • Partitioners

      • Random Partitioner

      • Order-Preserving Partitioner

      • Collating Order-Preserving Partitioner

      • Byte-Ordered Partitioner

    • Snitches

      • Simple Snitch

      • PropertyFileSnitch

    • Creating a Cluster

      • Changing the Cluster Name

      • Adding Nodes to a Cluster

      • Multiple Seed Nodes

    • Dynamic Ring Participation

    • Security

      • Using SimpleAuthenticator

      • Programmatic Authentication

      • Using MD5 Encryption

      • Providing Your Own Authentication

    • Miscellaneous Settings

    • Additional Tools

      • Viewing Keys

      • Importing Previous Configurations

    • Summary

  • Chapter 7. Reading and Writing Data

    • Query Differences Between RDBMS and Cassandra

      • No Update Query

      • Record-Level Atomicity on Writes

      • No Server-Side Transaction Support

      • No Duplicate Keys

    • Basic Write Properties

    • Consistency Levels

    • Basic Read Properties

    • The API

      • Ranges and Slices

    • Setup and Inserting Data

    • Using a Simple Get

    • Seeding Some Values

    • Slice Predicate

      • Getting Particular Column Names with Get Slice

      • Getting a Set of Columns with Slice Range

        • Counts

        • Reversed

      • Getting All Columns in a Row

    • Get Range Slices

    • Multiget Slice

    • Deleting

    • Batch Mutates

      • Batch Deletes

      • Range Ghosts

    • Programmatically Defining Keyspaces and Column Families

    • Summary

  • Chapter 8. Clients

    • Basic Client API

    • Thrift

      • Thrift Support for Java

      • Exceptions

      • Thrift Summary

    • Avro

      • Avro Ant Targets

      • Avro Specification

      • Avro Summary

    • A Bit of Git

    • Connecting Client Nodes

      • Client List

      • Round-Robin DNS

      • Load Balancer

    • Cassandra Web Console

    • Hector (Java)

      • Features

      • The Hector API

    • HectorSharp (C#)

    • Chirper

    • Chiton (Python)

    • Pelops (Java)

    • Kundera (Java ORM)

    • Fauna (Ruby)

    • Summary

  • Chapter 9. Monitoring

    • Logging

      • Tailing

      • General Tips

        • Following along

        • Warning signs

    • Overview of JMX and MBeans

      • MBeans

      • Integrating JMX

    • Interacting with Cassandra via JMX

    • Cassandra’s MBeans

      • org.apache.cassandra.concurrent

      • org.apache.cassandra.db

      • org.apache.cassandra.gms

      • org.apache.cassandra.service

        • StorageService

        • StreamingService

    • Custom Cassandra MBeans

    • Runtime Analysis Tools

      • Heap Analysis with JMX and JHAT

      • Detecting Thread Problems

    • Health Check

    • Summary

  • Chapter 10. Maintenance

    • Getting Ring Information

      • Info

      • Ring

        • Range Tokens

    • Getting Statistics

      • Using cfstats

      • Using tpstats

    • Basic Maintenance

      • Repair

      • Flush

      • Cleanup

    • Snapshots

      • Taking a Snapshot

      • Clearing a Snapshot

    • Load-Balancing the Cluster

      • loadbalance and streams

    • Decommissioning a Node

    • Updating Nodes

      • Removing Tokens

      • Compaction Threshold

      • Changing Column Families in a Working Cluster

    • Summary

  • Chapter 11. Performance Tuning

    • Data Storage

    • Reply Timeout

    • Commit Logs

    • Memtables

    • Concurrency

    • Caching

    • Buffer Sizes

    • Using the Python Stress Test

      • Generating the Python Thrift Interfaces

        • Getting Thrift

      • Running the Python Stress Test

    • Startup and JVM Settings

      • Tuning the JVM

    • Summary

  • Chapter 12. Integrating Hadoop

    • What Is Hadoop?

    • Working with MapReduce

      • Cassandra Hadoop Source Package

    • Running the Word Count Example

      • Outputting Data to Cassandra

      • Hadoop Streaming

    • Tools Above MapReduce

      • Pig

      • Hive

    • Cluster Configuration

    • Use Cases

      • Raptr.com: Keith Thornhill

      • Imagini: Dave Gardner

    • Summary

  • Appendix. The Nonrelational Landscape

    • Nonrelational Databases

    • Object Databases

    • XML Databases

      • SoftwareAG Tamino

      • eXist

      • Oracle Berkeley XML DB

      • MarkLogic Server

      • Apache Xindice

      • Summary

    • Document-Oriented Databases

      • IBM Lotus

      • Apache CouchDB

      • MongoDB

      • Riak

    • Graph Databases

      • FlockDB

      • Neo4J

    • Key-Value Stores and Distributed Hashtables

      • Amazon Dynamo

      • Project Voldemort

      • Redis

    • Columnar Databases

      • Google Bigtable

      • HBase

      • Hypertable

      • Polyglot Persistence

    • Summary

  • Glossary

  • Index

Contents

www.it-ebooks.info

Cassandra: The Definitive Guide

Eben Hewitt

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Cassandra: The Definitive Guide
by Eben Hewitt

Copyright © 2011 Eben Hewitt. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Mike Loukides
Production Editor: Holly Bauer
Copyeditor: Genevieve d’Entremont
Proofreader: Emily Quill
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
November 2010: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Cassandra: The Definitive Guide, the image of a Paradise flycatcher, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover™, a durable and flexible lay-flat binding.

ISBN: 978-1-449-39041-9

This book is dedicated to my sweetheart, Alison Brown.
I can hear the sound of violins, long before it begins.

... enormous volumes of data; the fact that it does stands as a monument to the ingenious architecture of the Web. But some of this infrastructure is starting to bend under the weight. In 1966, a company like IBM was in a position to really make people listen to their innovations. They had the problems, and they had the brain power to solve them. As we enter the second decade of the 21st century, we’re starting ...

... for some length of time; that’s the very point of making updates: that they’re there for others to read. However, a more subtle examination might lead us to want to find a way to tune these properties a bit and control them slightly. There is, as they say, no free lunch on the Internet, and once we see how we’re paying for our transactions, we may start to wonder whether there’s an alternative.

Transactions ...

... DB2 database gets its name as the successor to DB1, the product built around the hierarchical data model IMS. IMS was released in 1968, and subsequently enjoyed success in Customer Information Control System (CICS) and other applications. It is still used today. But in the years following the invention of IMS, the new model, the disruptive model, the threatening model, was the relational database. In his ...
... RDBMS, NoSQL. The horse, the car, the plane. They each build on prior art, they each attempt to solve certain problems, and so they’re each good at certain things, and less good at others. They each coexist, even now. So let’s examine for a moment why, at this point, we might consider an alternative to the relational database, just as Codd himself four decades ago looked at the Information Management ...

... through the use of transactions, which require locking some portion of the database so it’s not available to other clients. This can become untenable under very heavy loads, as the locks mean that competing users start queuing up, waiting for their turn to read or write the data. We typically address these problems in one or more of the following ways, sometimes in this order:

• Throw hardware at the problem ...

... and updates in the database, which is exacerbated over a cluster.

• We turn our attention to the database again and decide that, now that the application is built and we understand the primary query paths, we can duplicate some of the data to make it look more like the queries that access it. This process, called denormalization, is antithetical to the five normal forms that characterize the relational ...

CHAPTER 1
Introducing Cassandra

If at first the idea is not absurd, then there is no hope for it. (Albert Einstein)

Welcome to Cassandra: The Definitive Guide. The aim of this book is to help developers and database administrators understand this important new database, explore how it compares to the relational database management systems we’re used to, and help you put ...
... same time, then one of them will have to wait for the other to complete.

Durable
Once a transaction has succeeded, the changes will not be lost. This doesn’t imply another transaction won’t later modify the same data; it just means that writers can be confident that the changes are available for the next transaction to work with as necessary.

On the surface, these properties seem so obviously desirable as ...

... First, the new model was very different from the old model, which it pointedly controverted. It was threatening because it can be hard to understand something different and new. Ensuing debates can help entrench people stubbornly further in their views, views that might have been largely inherited from the climate in which they learned their craft and the circumstances in which they ...

... a day, and in other Web 2.0 applications. The idea here is that you split the data so that instead of hosting all of it on a single server or replicating all of the data on all of the servers in a cluster, you divide up portions of the data horizontally and host them each separately. For example, consider a large customer table in a relational database. The least disruptive thing (for the programming ...
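The sharding excerpt above describes splitting a table horizontally so that each server hosts only a portion of the rows. The excerpt breaks off before the book's own example, so what follows is an independent, minimal sketch under stated assumptions, not the book's method: a hypothetical customer table is range-partitioned into four shards by the initial of the last name, plain Python dictionaries stand in for separate servers, and `SHARD_RANGES`, `shard_for`, and `ShardedCustomerTable` are invented names.

```python
from __future__ import annotations

# Hypothetical illustration of horizontal sharding (range partitioning).
# Plain dicts stand in for separate database servers.

# Assumed layout: each shard owns a contiguous range of last-name initials.
SHARD_RANGES = [("A", "F"), ("G", "M"), ("N", "S"), ("T", "Z")]


def shard_for(last_name: str) -> int:
    """Return the index of the shard that owns this last name."""
    # upper() can return two characters (e.g. "ß" -> "SS"), so trim again.
    initial = last_name.strip()[:1].upper()[:1]
    for index, (low, high) in enumerate(SHARD_RANGES):
        # Only initials A-Z fall inside one of the assumed ranges.
        if low <= initial <= high:
            return index
    raise ValueError(f"no shard owns initial {initial!r}")


class ShardedCustomerTable:
    """Customer rows split horizontally across len(SHARD_RANGES) shards."""

    def __init__(self) -> None:
        self.shards: list[dict[int, dict]] = [{} for _ in SHARD_RANGES]

    def insert(self, customer_id: int, last_name: str, record: dict) -> None:
        # The shard key (last name) decides which "server" stores the row.
        row = {"last_name": last_name, **record}
        self.shards[shard_for(last_name)][customer_id] = row

    def get(self, customer_id: int, last_name: str) -> dict | None:
        # Knowing the shard key lets the lookup touch exactly one shard.
        return self.shards[shard_for(last_name)].get(customer_id)

    def find_by_city(self, city: str) -> list[dict]:
        # A query on any other column has to fan out to every shard.
        return [
            row
            for shard in self.shards
            for row in shard.values()
            if row.get("city") == city
        ]


if __name__ == "__main__":
    table = ShardedCustomerTable()
    table.insert(1, "Adams", {"city": "Austin"})
    table.insert(2, "Nguyen", {"city": "Boston"})
    table.insert(3, "Turner", {"city": "Austin"})
    print(shard_for("Nguyen"))               # 2
    print(table.get(3, "Turner"))            # {'last_name': 'Turner', 'city': 'Austin'}
    print(len(table.find_by_city("Austin")))  # 2
```

Lookups that carry the shard key touch a single shard, while a query on any other column, such as `find_by_city`, has to visit every shard. A range split on initials can also leave shards unevenly loaded when some initials are far more common than others; hashing the key is a common alternative for spreading rows more evenly.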

Posted: 21/02/2014, 19:20
