DevOps: A Software Architect’s Perspective

DevOps promises to accelerate the release of new software features and improve monitoring after systems are placed into operation. However, DevOps has crucial implications for system design and architecture that most previous books ignore. In DevOps: A Software Architect’s Perspective, three world-class software architects address these issues head-on, helping organizations deploy DevOps more efficiently, avoid common problems, and drive more value. The authors begin by reviewing DevOps’ impact on every phase of the development cycle, including build, test, deployment, and post-deployment monitoring and observation. For each phase, they systematically identify issues, tools, team practices, and key tradeoffs associated with preparing for DevOps and using it effectively. Next, they turn to cross-cutting concerns that transcend a single function, offering practical insights into compliance, cloud environments, human and process performance, reliability, repeatability, and more. Throughout, they offer real-world case studies, detailed references, practical examples, and convenient checklists. You’ll find indispensable guidance for addressing key questions like: How can I design new systems to work more effectively with DevOps? How do I address culture and communication problems between Dev and Ops? How do I integrate DevOps with agile methods and TDD? What are the best ways to handle specific issues such as failure detection and upgrade planning?




PART ONE BACKGROUND

CHAPTER 1 What Is DevOps?

1.9 For Further Reading

CHAPTER 2 The Cloud as a Platform

2.1 Introduction

2.2 Features of the Cloud

2.3 DevOps Consequences of the Unique Cloud Features

2.4 Summary

CHAPTER 3 Operations


3.1 Introduction

3.2 Operations Services

3.3 Service Operation Functions

3.4 Continual Service Improvement

3.5 Operations and DevOps

3.6 Summary

PART TWO THE DEPLOYMENT PIPELINE

CHAPTER 4 Overall Architecture

4.1 Do DevOps Practices Require Architectural Change?

4.2 Overall Architecture Structure

4.3 Quality Discussion of Microservice Architecture

4.4 Amazon’s Rules for Teams

4.5 Microservice Adoption for Existing Systems

4.6 Summary

CHAPTER 5 Building and Testing

5.1 Introduction

5.2 Moving a System Through the Deployment Pipeline

5.3 Crosscutting Aspects

5.4 Development and Pre-commit Testing

5.5 Build and Integration Testing

5.6 UAT/Staging/Performance Testing


PART THREE CROSSCUTTING CONCERNS

CHAPTER 7 Monitoring


7.7 Tools

7.8 Diagnosing an Anomaly from Monitoring Data: The Case of Platformer.com

7.9 Summary

CHAPTER 8 Security and Security Audits

8.10 Application Design Considerations

8.11 Deployment Pipeline Design Considerations

8.12 Summary

CHAPTER 9 Other Ilities

9.1 Introduction

9.2 Repeatability

9.3 Performance

9.4 Reliability


CHAPTER 10 Business Considerations

PART FOUR CASE STUDIES

CHAPTER 11 Supporting Multiple Datacenters


CHAPTER 12 Implementing a Continuous Deployment Pipeline for Enterprises

12.1 Introduction

12.2 Organizational Context

12.3 The Continuous Deployment Pipeline

12.4 Baking Security into the Foundations of the CD Pipeline

12.5 Advanced Concepts

12.6 Summary

CHAPTER 13 Migrating to Microservices

13.1 Introduction to Atlassian

13.2 Building a Platform for Deploying Microservices

13.3 BlobStore: A Microservice Example

13.4 Development Process

13.5 Evolving BlobStore

13.6 Summary

PART FIVE MOVING INTO THE FUTURE

CHAPTER 14 Operations as a Process

14.1 Introduction

14.2 Motivation and Overview

14.3 Offline Activities

14.4 Online Activities


14.5 Error Diagnosis

14.6 Monitoring

14.7 Summary

CHAPTER 15 The Future of DevOps

(e.g., the novel The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win) and from the project manager’s perspective (e.g., Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation). In addition, there is a raft of material about cultural change and what it means to tear down barriers between organizational units.

What frustrated us is that there is very little material from the software architect’s perspective. Treating operations personnel as first-class stakeholders and listening to their requirements is certainly important. Using tools to support operations and project management is also important. Yet, we had the strong feeling that there was more to it than stakeholder management and the use of tools.


Indeed there is, and that is the gap that this book intends to fill. DevOps presents a fascinating interplay between design, process, tooling, and organizational structure. We try to answer two primary questions: What technical decisions do I, as a software architect, have to make to achieve the DevOps goals? What impact do the other actors in the DevOps space have on me?

The answers are that achieving DevOps goals can involve fundamental changes in the architecture of your systems and in the roles and responsibilities required to get your systems into production and support them once they are there.

Just as software architects must understand the business context and goals for the systems they design and construct, understanding DevOps requires understanding organizational and business contexts, as well as technical and operational contexts. We explore all of these.

The primary audience for this book is practicing software architects who have been or expect to be asked, “Should this project or organization adopt DevOps practices?” Instead of being asked, the architect may be told. As with all books, we expect additional categories of readers. Students who are interested in learning more about the practice of software architecture should find interesting material here. Researchers who wish to investigate DevOps topics can find important background material. Our primary focus, however, is on practicing architects.

Previewing the Book

We begin the book by discussing the background for DevOps. Part One begins by delving into the goals of DevOps and the problems it is intended to solve. We touch on organizational and cultural issues, as well as the relationship of DevOps practices to agile methodologies.

In Chapter 2, we explore the cloud. DevOps practices have grown in tandem with the growth of the cloud as a platform. The two, in theory, are separable, but in practice virtualization and the cloud are important enablers for DevOps practices.

In our final background chapter, Chapter 3, we explore operations through the prism of the Information Technology Infrastructure Library (ITIL). ITIL is a system of organization of the most important functions of an operations group. Not all of operations are included in DevOps practices but understanding something of the responsibilities of an operations group provides important context, especially when it comes to understanding roles and responsibilities.

Part Two describes the deployment pipeline. We begin this part by exploring the microservice architectural style in Chapter 4. It is not mandatory that systems be architected in this style in order to apply DevOps practices but the microservice architectural style is designed to solve many of the problems that motivated DevOps.

In Chapter 5, we hurry through the building and testing processes and tool chains. It is important to understand these but they are not our focus. We touch on the different environments used to get a system into production and the different sorts of tests run on these environments. Since many of the tools used in DevOps are used in the building and testing processes, we provide context for understanding these tools and how to control them.


We conclude Part Two by discussing deployment. One of the goals of DevOps is to speed up deployments. A technique used to achieve this goal is to allow each development team to independently deploy their code when it is ready. Independent deployment introduces many issues of consistency. We discuss different deployment models, managing distinct versions of a system that are simultaneously in production, rolling back in the case of errors, and other topics having to do with actually placing your system in production.

Part Two presents a functional perspective on deployment practices. Yet, just as with any other system, it is frequently the quality perspectives that control the design and the acceptance of the system. In Part Three, we focus on crosscutting concerns. This begins with our discussion of monitoring and live testing in Chapter 7. Modern software testing practices do not end when a system is placed into production. First, systems are monitored extensively to detect problems, and second, testing continues in a variety of forms after a system has been placed into production.

Another crosscutting concern is security, which we cover in Chapter 8. We present the different types of security controls that exist in an environment, spanning those that are organization-wide and those that are specific to a single system. We discuss the different roles associated with achieving security and how these roles are evaluated in the case of a security audit.

Security is not the only quality of interest, and in Chapter 9 we discuss other qualities that are relevant to the practices associated with DevOps. We cover topics such as performance, reliability, and modifiability of the deployment pipeline.

Finally, in Part Three we discuss business considerations in Chapter 10. Practices as broad as DevOps cannot be adopted without buy-in from management. A business plan is a typical means of acquiring this buy-in; thus, we present the elements of a business plan for DevOps adoption and discuss how the argument, rollout, and measurement should proceed.

In Part Four we present three case studies. Organizations that have implemented DevOps practices tell us some of their tricks. Chapter 11 discusses how to maintain two datacenters for the purpose of business continuity; Chapter 12 presents the specifics of a continuous deployment pipeline; and Chapter 13 describes how one organization is migrating to a microservice architecture.

We close by speculating about the future in Part Five. Chapter 14 describes our research and how it is based on viewing operations as a series of processes, and Chapter 15 gives our prediction for how the next three to five years are going to evolve in terms of DevOps.

Acknowledgments

Books like this require a lot of assistance. We would like to thank Chris Williams, John Painter, Daniel Hand, and Sidney Shek for their contributions to the case studies, as well as Adnene Guabtni, Kanchana Wickremasinghe, Min Fu, and Xiwei Xu for helping us with some of the chapters.


Manuel Pais helped us arrange case studies. Philippe Kruchten, Eoin Woods, Gregory Hartman, Sidney Shek, Michael Lorant, Wouter Geurts, and Eltjo Poort commented on or contributed to various aspects of the book.

We would like to thank Jean-Michel Lemieux, Greg Warden, Robin Fernandes, Jerome Blin, Felipe Cuozzo, Pramod Korathota, Nick Wright, Vitaly Osipov, Brad Baker, and Jim Watts for their comments on Chapter 13.

Addison-Wesley did their usual professional and efficient job in the production process, and this book has benefited from their expertise.

Finally, we would like to thank NICTA and NICTA management. NICTA is funded by the Australian government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program. Without their generous support, this book would not have been written.

Legend

We use four distinct legends for the figures. We have an architectural notation that identifies the key architectural concepts that we use; we use Business Process Model and Notation (BPMN) to describe some processes, Porter’s Value Notation to describe a few others, and UML sequence diagrams for interleaving sequences of activities. We do not show the UML sequence diagram notation here but the notation that we use from these other sources is:

Architecture

FIGURE P.1 People, both individual and groups

FIGURE P.2 Components (runtime entities), modules (code-time collections of entities), and data flow

FIGURE P.3 Specialized entities

FIGURE P.4 Collections of entities

BPMN

We use Business Process Model and Notation (BPMN) for describing events and activities [OMG 11].

FIGURE P.5 Event indications

FIGURE P.6 Activities and sequences of activities

Porter’s Value Chain

This notation is used to describe processes (which, in turn, have activities modelled in BPMN).

FIGURE P.7 Entry in a value chain

Part One: Background

This part provides the necessary background for the remainder of the book. DevOps is a movement that envisions no friction between the development groups and the operations groups. In addition, the emergence of DevOps coincides with the growth of the cloud as a basic platform for organizations, large and small. Part One has three chapters.

In Chapter 1, we define DevOps and discuss its various motivations. DevOps is a catchall term that can cover several meanings, including: having development and operations speak to each other; allowing development teams to deploy to production automatically; and having development teams be the first responders when an error is discovered in production. In this chapter, we sort out these various considerations and develop a coherent description of what DevOps is, what its motivations and goals are, and how it is going about achieving those goals.

In order to understand how certain DevOps practices work, it is necessary to know how the cloud works, which we discuss in Chapter 2. In particular, you should know how virtual machines work, how IP addresses are used, the role of and how to manipulate Domain Name System (DNS) servers, and how load balancers and monitors interact to provide on-demand scaling.

DevOps involves the modifications of both Dev and Ops practices. In Chapter 3, we discuss Ops in its totality. It describes the services that Ops provides to the organization and introduces Ops responsibilities, from supporting deployed applications to enforcing organization-wide security rules.

We begin by defining DevOps and providing a short example. Then we present the motivation for the movement, the DevOps perspective, and barriers to the success of DevOps. Much of the writing on DevOps discusses various organizational and cultural issues. In this first chapter, we summarize these topics, which frame the remainder of the book.

Defining DevOps

DevOps has been classified as “on the rise” with respect to the Gartner Hype Cycle for Application Development in 2013. This classification means that the term is becoming a buzzword and, as such, is ill defined and subject to overblown claims. Our definition of DevOps focuses on the goals, rather than the means.

DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality.

Before we delve more deeply into what set of practices is included, let’s look at some of the implications of our definition.

The quality of the deployed change to a system (usually in the form of code) is important. Quality means suitability for use by various stakeholders including end users, developers, or system administrators. It also includes availability, security, reliability, and other “ilities.” One method for ensuring quality is to have a variety of automated test cases that must be passed prior to placing changed code into production. Another method is to test the change in production with a limited set of users prior to opening it up to the world. Still another method is to closely monitor newly deployed code for a period of time. We do not specify in the definition how quality is ensured but we do require that production code be of high quality.

The definition also requires the delivery mechanism to be of high quality. This implies that reliability and the repeatability of the delivery mechanism should be high. If the delivery mechanism fails regularly, the time required increases. If there are errors in how the change is delivered, the quality of the deployed system suffers, for example, through reduced availability or reliability.

We identify two time periods as being important. One is the time when a developer commits newly developed code. This marks the end of basic development and the beginning of the deployment path. The second time is the deploying of that code into production. As we will see in Chapter 6, there is a period after code has been deployed into production when the code is being tested through live testing and is closely monitored for potential problems. Once the code has passed live testing and close monitoring, then it is considered as a portion of the normal production system. We make a distinction between deploying code into production for live testing and close monitoring and then, after passing the tests, promoting the newly developed code to be equivalent to previously developed code.
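To make the timing in the definition concrete, the following minimal sketch records the three moments described above for a single change: the commit, the deployment into production for live testing, and the promotion to normal production. The class and field names are hypothetical and chosen only for illustration; they do not come from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ChangeRecord:
    """Timestamps for one change. All names here are illustrative assumptions."""
    committed_at: datetime
    deployed_at: Optional[datetime] = None  # live in production, under close monitoring
    promoted_at: Optional[datetime] = None  # accepted as part of normal production

    def time_to_deploy(self) -> Optional[timedelta]:
        """Time from commit to first deployment, or None if not yet deployed."""
        if self.deployed_at is None:
            return None
        return self.deployed_at - self.committed_at

    def time_to_normal_production(self) -> Optional[timedelta]:
        """Time from commit to promotion; the interval the definition aims to shorten."""
        if self.promoted_at is None:
            return None
        return self.promoted_at - self.committed_at


change = ChangeRecord(
    committed_at=datetime(2015, 3, 2, 9, 0),
    deployed_at=datetime(2015, 3, 2, 9, 15),
    promoted_at=datetime(2015, 3, 2, 13, 15),
)
print(change.time_to_deploy())             # 0:15:00
print(change.time_to_normal_production())  # 4:15:00
```

Keeping the deployment and promotion timestamps separate mirrors the distinction drawn above between code that is in production for live testing and code that has been promoted to normal production.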

Our definition is goal oriented. We do not specify the form of the practices or whether tools are used to implement them. If a practice is intended to reduce the time between a commit from a developer and deploying into production, it is a DevOps practice whether it involves agile methods, tools, or forms of coordination. This is in contrast to several other definitions. Wikipedia, for example, stresses communication, collaboration, and integration between various stakeholders without stating the goal of such communication, collaboration, or integration. Timing goals are implicit. Other definitions stress the connection between DevOps and agile methods. Again, there is no mention of the benefits of utilizing agile methods on either the time to develop or the quality of the production system. Still other definitions stress the tools being used, without mentioning the goal of DevOps practices, the time involved, or the quality.

Finally, the goals specified in the definition do not restrict the scope of DevOps practices to testing and deployment. In order to achieve these goals, it is important to include an Ops perspective in the collection of requirements, that is, significantly earlier than committing changes. Analogously, the definition does not mean DevOps practices end with deployment into production; the goal is to ensure high quality of the deployed system throughout its life cycle. Thus, monitoring practices that help achieve the goals are to be included as well.

an operator. Involving operations in the development of requirements will ensure that these types of requirements are considered.

Make Dev more responsible for relevant incident handling. These practices are intended to shorten the time between the observation of an error and the repair of that error. Organizations that utilize these practices typically have a period of time in which Dev has primary responsibility for a new deployment; later on, Ops has primary responsibility.

Enforce the deployment process used by all, including Dev and Ops personnel. These practices are intended to ensure a higher quality of deployments. This avoids errors caused by ad hoc deployments and the resulting misconfiguration. The practices also refer to the time that it takes to diagnose and repair an error. The normal deployment process should make it easy to trace the history of a particular deployment artifact and understand the components that were included in that artifact.

Use continuous deployment. Practices associated with continuous deployment are intended to shorten the time between a developer committing code to a repository and the code being deployed. Continuous deployment also emphasizes automated tests to increase the quality of code making its way into production.

Develop infrastructure code, such as deployment scripts, with the same set of practices as application code. Practices that apply to the development of infrastructure code are intended to ensure both high quality in the deployed applications and that deployments proceed as planned. Errors in deployment scripts such as misconfigurations can cause errors in the application, the environment, or the deployment process. Applying quality control practices used in normal software development when developing operations scripts and processes will help control the quality of these specifications.

Figure 1.1 gives an overview of DevOps processes. At its most basic, DevOps advocates treating Operations personnel as first-class stakeholders. Preparing a release can be a very serious and onerous process. (We describe that in the section “Release Process.”) As such, operations personnel may need to be trained in the types of runtime errors that can occur in a system under development; they may have suggestions as to the type and structure of log files, and they may provide other types of input into the requirements process. At its most extreme, DevOps practices make developers responsible for monitoring the progress and errors that occur during deployment and execution, so theirs would be the voices suggesting requirements. In between are practices that cover team practices, build processes, testing processes, and deployment processes. We discuss the continuous deployment pipeline in Chapters 5 and 6. We also cover monitoring, security, and audits in subsequent chapters.

FIGURE 1.1 DevOps life cycle processes [Notation: Porter’s Value Chain]

You may have some questions about terminology with the terms IT professional, operator, and operations personnel. Another related term is system administrator. The IT professional subsumes the mentioned roles and others, such as help desk support. The distinction in terminology between operators and system administrators has historical roots but is much less true today. Historically, operators had hands-on access to the hardware (installing and configuring hardware, managing backups, and maintaining printers), while system administrators were responsible for uptime, performance, resources, and security of computer systems. Today it is the rare operator who does not take on some duties formerly assigned to a system administrator. We will use the term operator to refer to anyone who performs computer operator or system administration tasks (or both).

Example of Continuous Deployment: IMVU

IMVU, Inc. is a social entertainment company whose product allows users to connect through 3D avatar-based experiences. This section is adapted from a blog written by an IMVU engineer.

IMVU does continuous integration. The developers commit early and often. A commit triggers an execution of a test suite. IMVU has a thousand test files, distributed across 30–40 machines, and the test suite takes about nine minutes to run. Once a commit has passed all of its tests, it is automatically sent to deployment. This takes about six minutes. The code is moved to the hundreds of machines in the cluster, but at first the code is only made live on a small number of machines (canaries). A sampling program examines the results of the canaries and if there has been a statistically significant regression, then the revision is automatically rolled back. Otherwise the remainder of the cluster is made active. IMVU deploys new code 50 times a day, on average.
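The canary decision described above can be sketched in a few lines. The source does not say which statistic IMVU's sampling program uses, so this example assumes a one-sided two-proportion z-test on error rates; the function names, the choice of metric, and the threshold are all illustrative assumptions rather than IMVU's actual implementation.

```python
import math


def is_significant_regression(base_errors: int, base_total: int,
                              canary_errors: int, canary_total: int,
                              z_threshold: float = 2.33) -> bool:
    """One-sided two-proportion z-test (an assumed choice of test).

    Returns True when the canary error rate is significantly higher than the
    baseline rate. A threshold of 2.33 corresponds to roughly a 1% level.
    """
    p_base = base_errors / base_total
    p_canary = canary_errors / canary_total
    pooled = (base_errors + canary_errors) / (base_total + canary_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / canary_total))
    if std_err == 0:
        return False  # identical all-success or all-failure samples: no evidence of a difference
    return (p_canary - p_base) / std_err > z_threshold


def decide_rollout(base_errors: int, base_total: int,
                   canary_errors: int, canary_total: int) -> str:
    """Roll back on a significant regression; otherwise activate the rest of the cluster."""
    if is_significant_regression(base_errors, base_total, canary_errors, canary_total):
        return "rollback"
    return "activate_cluster"


# Baseline: 50 errors in 10,000 requests (0.5%).
print(decide_rollout(50, 10_000, 30, 1_000))  # rollback (canary at 3.0%)
print(decide_rollout(50, 10_000, 6, 1_000))   # activate_cluster (canary at 0.6%)
```

A statistical test, rather than a fixed cutoff, keeps small random fluctuations on a handful of canary machines from triggering unnecessary rollbacks.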


The essence of the process is in the test suite. Every time a commit gets through the test suite and is rolled back, a new test is generated that would have caught the erroneous deployment, and it is added to the test suite.

Note that a full test suite (with the confidence of production deployment) that only takes nine minutes to run is uncommon for large-scale systems. In many organizations, the full test suite that provides production deployment confidence can take hours to run, which is often done overnight. A common challenge is to reduce the size of the test suite judiciously and remove “flaky” tests.

1.2 Why DevOps?

DevOps, in many ways, is a response to the problem of slow releases. The longer it takes a release to get to market, the less advantage will accrue from whatever features or quality improvements led to the release. Ideally, we want to release in a continuous manner. This is often termed continuous delivery or continuous deployment. We discuss the subtle difference between the two terms in Chapters 5 and 6. In this book, we use the term continuous deployment or just deployment. We begin by describing a formal release process, and then we delve more deeply into some of the reasons for slow releases.

Release Process

Releasing a new system or version of an existing system to customers is one of the most sensitive steps in the software development cycle. This is true whether the system or version is for external distribution, is used directly by consumers, or is strictly for internal use. As long as the system is used by more than one person, releasing a new version opens the possibility of incompatibilities or failures, with subsequent unhappiness on the part of the customers.

Consequently, organizations pay a great deal of attention to the process of defining a release plan. The following release planning steps are adapted from Wikipedia. Traditionally, most of the steps are done manually.

1. Define and agree on release and deployment plans with customers/stakeholders. This could be done at the team or organizational level. The release and deployment plans will include those features to be included in the new release as well as ensure that operations personnel (including help desk and support personnel) are aware of schedules, resource requirements are met, and any additional training that might be required is scheduled.

2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other. Everything changes over time, including libraries, platforms, and dependent services. Changes may introduce incompatibilities. This step is intended to prevent incompatibilities from becoming apparent only after deployment. In Chapter 5, we discuss the ways of ensuring all of these compatibilities. Managing dependencies is a theme that will surface repeatedly throughout this book.

3. Ensure that the integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system. There are two parts to this step: The first is to make sure that old versions of a component are not inadvertently included in the release, and the second is to make sure that a record is kept of the components of this deployment. Knowing the elements of the deployment is important when tracking down errors found after deployment. We discuss the details of deployment in Chapter 6.

4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or rolled back, if appropriate. Deployments may need to be rolled back (new version uninstalled, old version redeployed) under a variety of circumstances, such as errors in the code, inadequate resources, or expired licenses or certificates.
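Steps 3 and 4 lend themselves to a small illustration. The sketch below assumes a release is described by a manifest of component checksums, which serves both to detect a stale or altered component before deployment and to keep a record of what was deployed so that a rollback target is always known. The class names, method names, and artifact names are hypothetical and are not taken from any specific configuration management system.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def checksum(content: bytes) -> str:
    """SHA-256 digest of a component's bytes."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class ReleaseManifest:
    """Record of the components in one release (step 3). Names are illustrative."""
    version: str
    components: Dict[str, str]  # component name -> expected SHA-256 digest

    @classmethod
    def build(cls, version: str, artifacts: Dict[str, bytes]) -> "ReleaseManifest":
        return cls(version, {name: checksum(data) for name, data in artifacts.items()})

    def verify(self, artifacts: Dict[str, bytes]) -> List[str]:
        """Return the names of missing, altered, or unexpected components."""
        problems = [name for name, digest in self.components.items()
                    if name not in artifacts or checksum(artifacts[name]) != digest]
        problems.extend(name for name in artifacts if name not in self.components)
        return sorted(problems)


@dataclass
class DeploymentHistory:
    """Tracks deployed releases so that a rollback is possible (step 4)."""
    deployed: List[ReleaseManifest] = field(default_factory=list)

    def deploy(self, manifest: ReleaseManifest, artifacts: Dict[str, bytes]) -> None:
        problems = manifest.verify(artifacts)
        if problems:
            raise ValueError(f"integrity check failed for: {problems}")
        self.deployed.append(manifest)

    def rollback(self) -> ReleaseManifest:
        """Drop the current release and return the one that is redeployed."""
        if len(self.deployed) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.deployed.pop()
        return self.deployed[-1]


v1 = {"app.jar": b"app build 41", "libfoo.so": b"libfoo 1.2"}
v2 = {"app.jar": b"app build 42", "libfoo.so": b"libfoo 1.3"}
m1 = ReleaseManifest.build("1.0", v1)
m2 = ReleaseManifest.build("1.1", v2)

history = DeploymentHistory()
history.deploy(m1, v1)
history.deploy(m2, v2)

# An old library version slipped into the new package: the manifest catches it.
stale = {**v2, "libfoo.so": v1["libfoo.so"]}
print(m2.verify(stale))            # ['libfoo.so']
print(history.rollback().version)  # 1.0
```

Encoding the checks in a tool, rather than in a manual checklist, is one way an agreement on the release process can persist beyond a single release.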

The activities enumerated in this list can be accomplished with differing levels of automation. If all of these activities are accomplished primarily through human coordination then these steps are labor-intensive, time-consuming, and error-prone. Any automation reflects an agreement on the release process whether at the team or organization level. Since tools are typically used more than once, an agreement on the release process encoded into a tool has persistence beyond a single release.

In case you are tempted to downplay the seriousness of getting the deployment correct, you may want to consider recent media reports of upgrade failures with substantial financial costs.

On August 1, 2012, Knight Capital had an upgrade failure that ended up costing (US) $440 million.

On August 20, 2013, Goldman Sachs had an upgrade failure that, potentially, could cost millions of dollars.

These are just two of the many examples that have resulted in downtime or errors because of upgrade failure. Deploying an upgrade correctly is a significant and important activity for an organization and, yet, one that should be done in a timely fashion with minimal opportunity for error. Several organizations have done surveys to document the extent of deployment problems. We report on two of them.

XebiaLabs is an organization that markets a deployment tool and a continuous integration tool. They did a survey in 2013 with over 130 responses. 34% of the respondents were from IT services companies with approximately 10% each from health care, financial services, and telecommunications companies. 7.5% of the respondents reported their deployment process was “not reliable,” and 57.5% reported their deployment process “needs improvement.” 49% reported their biggest challenge in the deployment process was “too much inconsistency across environments and applications.” 32.5% reported “too many errors.” 29.2% reported their deployments relied on custom scripting, and 35.8% reported their deployments were partially scripted and partially manual.

CA Technologies provides IT management solutions to their customers. They commissioned a survey in 2013 that had 1,300 respondents from companies with more than (US) $100 million revenue. Of those who reported seeing benefits from the adoption of DevOps, 53% said they were already seeing an increased frequency of deployment of their software or services and 41% said they were anticipating seeing an increased frequency of deployment. 42% responded that they had seen improved quality of deployed applications, and 49% responded they anticipated seeing improved quality.

Although both surveys are sponsored by organizations with a vested interest in promoting deployment automation, they also clearly indicate that the speed and quality of deployments are a concern to many companies in a variety of different markets.

Reasons for Poor Coordination

Consider what happens after a developer group has completed all of the coding and testing for a system. The system needs to be placed into an environment where:

Only the appropriate people have access to it

It is compatible with all of the other systems with which it interacts in the environment

It has sufficient resources on which to operate

The data that it uses to operate is up to date

The data that it generates is usable by other systems in the environment

Furthermore, help desk personnel need to be trained in features of the new system and operationspersonnel need to be trained in troubleshooting any problems that might occur while the system

is operating The timing of the release may also be of significance because it should not coincidewith the absence of any key member of the operations staff or with a new sales promotion thatwill stress the existing resources

None of this happens by accident, but each of these items requires coordination between the developers and the operations personnel. It is easy to imagine a scenario where one or more of these items are not communicated by the development personnel to the operations personnel. A common attitude among developers is “I finished the development, now go and run it.” We explore the reasons for this attitude when we discuss the cultural barrier to adoption of DevOps. One reason that organizations have processes to ensure smooth releases is that coordination does not always happen in an appropriate manner. This is one of the complaints that motivated the DevOps movement.


Limited Capacity of Operations Staff

Operations staff perform a variety of functions but there are limits as to what they can accomplish or who on the staff is knowledgeable in what system. Consider the responsibilities of a modern operations person as detailed in Wikipedia:

Analyzing system logs and identifying potential issues with computer systems

Introducing and integrating new technologies into existing datacenter environments

Performing routine audits of systems and software

Performing backups

Applying operating system updates, patches, and configuration changes

Installing and configuring new hardware and software

Adding, removing, or updating user account information; resetting passwords, etc

Answering technical queries and assisting users

Ensuring security

Documenting the configuration of the system

Troubleshooting any reported problems

Optimizing system performance

Ensuring that the network infrastructure is up and running

Configuring, adding, and deleting file systems

Maintaining knowledge of volume management tools like Veritas (now Symantec), Solaris ZFS, LVM

Each of these items requires a deep level of understanding. Is it any wonder that when we asked the IT director of an Internet-based company what his largest problem was, he replied “finding and keeping qualified personnel.”

The DevOps movement is taking a different approach. Their approach is to reduce the need for dedicated operations personnel through automating many of the tasks formerly done by operations and having developers assume a portion of the remainder.


1.3 DevOps Perspective

Given the problems we have discussed and their long-standing nature, it is no surprise that there is a significant appeal for a movement that promises to reduce the time to market for new features and reduce errors occurring in deployment. DevOps comes in multiple flavors and with different degrees of variation from current practice, but two themes run consistently through the different flavors: automation and the responsibilities of the development team.

Automation

Figure 1.1 shows the various life cycle processes. The steps from build and testing through execution can all be automated to some degree. We will discuss the tools used in each one of these steps in the appropriate chapters, but here we highlight the virtues of automation. Some of the problems with relying on automation are discussed in Section 1.7.

Tools can perform the actions required in each step of the process, check the validity of actions against the production environment or against some external specification, inform appropriate personnel of errors occurring in the process, and maintain a history of actions for quality control, reporting, and auditing purposes.

Tools and scripts also can enforce organization-wide policies. Suppose the organization has a policy that every change has to have a rationale associated with the change. Then prior to committing a change, a tool or script can require a rationale to be provided by the individual making the change. Certainly, this requirement can be circumvented, but having the tool ask for a rationale will increase the compliance level for this policy.
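As a minimal sketch of such a check, the following Python functions accept a commit message only when it carries a rationale. The `Rationale:` line convention and the function names are assumptions made for illustration, not part of any particular version control tool; in practice a check like this would be wired into whatever pre-commit mechanism the organization's tooling provides.

```python
import re
from typing import Optional, Tuple

# Hypothetical convention: the commit message must contain a line
# of the form "Rationale: <text>" with non-empty text.
RATIONALE_PATTERN = re.compile(r"^Rationale:[ \t]*(\S.*)$", re.MULTILINE)


def extract_rationale(message: str) -> Optional[str]:
    """Return the rationale text from a commit message, or None if absent."""
    match = RATIONALE_PATTERN.search(message)
    return match.group(1).strip() if match else None


def check_commit_message(message: str) -> Tuple[bool, str]:
    """Accept the change only when a non-empty rationale is present."""
    rationale = extract_rationale(message)
    if rationale is None:
        return False, "Rejected: add a line 'Rationale: <why this change is needed>'."
    return True, "Accepted with rationale: " + rationale


if __name__ == "__main__":
    accepted, note = check_commit_message(
        "Fix timeout\n\nRationale: retries masked a real outage"
    )
    print(accepted, note)
```

The check only verifies that some rationale text is present, not that it is meaningful, which matches the point above: the tool raises compliance but can still be circumvented.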

Once tools become central to a set of processes, then the use of these tools must also be managed. Tools are invoked, for example, from scripts, configuration changes, or the operator’s console. Where console commands are complicated, it is advisable to script their usage, even if there is only a handful of commands being used. Tools may be controlled through specification files, such as Chef cookbooks or Amazon CloudFormation—more on these later. The scripts, configuration files, and specification files must be subject to the same quality control as the application code itself. The scripts and files should also be under version control and subject to examination for corrections. This is often termed “infrastructure-as-code.”

Development Team Responsibilities

Automation will reduce the incidence of errors and will shorten the time to deployment. To further shorten the time to deployment, consider the responsibilities of operations personnel as detailed earlier. If the development team accepts DevOps responsibilities, that is, it delivers, supports, and maintains the service, then there is less need to transfer knowledge to the operations and support staff since all of the necessary knowledge is resident in the development team. Not having to transfer knowledge removes a significant coordination step from the deployment process.


1.4 DevOps and Agile

One of the characterizations of DevOps emphasizes the relationship of DevOps practices to agile practices. In this section, we overlay the DevOps practices on IBM’s Disciplined Agile Delivery. Our focus is on what is added by DevOps, not an explanation of Disciplined Agile Delivery. For that, see Disciplined Agile Delivery: A Practitioner’s Approach. As shown in Figure 1.2, Disciplined Agile Delivery has three phases—inception, construction, and transition. In the DevOps context, we interpret transition as deployment.

FIGURE 1.2 Disciplined Agile Delivery phases for each release (Adapted from Disciplined Agile Delivery: A Practitioner’s Guide by Ambler and Lines) [Notation: Porter’s Value Chain]

DevOps practices impact all three phases.

1. Inception phase. During the inception phase, release planning and initial requirements specification are done.

a. Considerations of Ops will add some requirements for the developers. We will see these in more detail later in this book, but maintaining backward compatibility between releases and having features be software switchable are two of these requirements. The form and content of operational log messages impacts the ability of Ops to troubleshoot a problem.

b. Release planning includes feature prioritization but it also includes coordination with operations personnel about the scheduling of the release and determining what training the operations personnel require to support the new release. Release planning also includes ensuring compatibility with other packages in the environment and a recovery plan if the release fails. DevOps practices make incorporation of many of the coordination-related topics in release planning unnecessary, whereas other aspects become highly automated.

2. Construction phase. During the construction phase, key elements of the DevOps practices are the management of the code branches, the use of continuous integration and continuous deployment, and incorporation of test cases for automated testing. These are also agile practices but form an important portion of the ability to automate the deployment pipeline. A new element is the integrated and automated connection between construction and transition activities.


3. Transition phase. In the transition phase, the solution is deployed and the development team is responsible for the deployment, monitoring the process of the deployment, deciding whether to roll back and when, and monitoring the execution after deployment. The development team has a role of “reliability engineer,” who is responsible for monitoring and troubleshooting problems during deployment and subsequent execution.

The advantages of small teams are:

They can make decisions quickly. In every meeting, attendees wish to express their opinions. The smaller the number of attendees at the meeting, the fewer the number of opinions expressed and the less time spent hearing differing opinions. Consequently, the opinions can be expressed and a consensus arrived at faster than with a large team.

It is easier to fashion a small number of people into a coherent unit than a large number. A coherent unit is one in which everyone understands and subscribes to a common set of goals for the team.

It is easier for individuals to express an opinion or idea in front of a small group than in front of a large one.

The disadvantage of a small team is that some tasks are larger than can be accomplished by a small number of individuals. In this case the task has to be broken up into smaller pieces, each given to a different team, and the different pieces need to work together sufficiently well to accomplish the larger task. To achieve this, the teams need to coordinate.

The team size becomes a major driver of the overall architecture. A small team, by necessity, works on a small amount of code. We will see that an architecture constructed around a collection of microservices is a good means to package these small tasks and reduce the need for explicit coordination—so we will call the output of a development team a “service.” We discuss the ways and challenges of migrating to a microservice architecture driven by small teams in Chapter 4 and the case study in Chapter 13 from Atlassian.


Team Roles

We lift two of the roles in the team from Scott Ambler’s description of roles in an agile team.

Team lead. This role, called “Scrum Master” in Scrum or team coach or project lead in other methods, is responsible for facilitating the team, obtaining resources for it, and protecting it from problems. This role encompasses the soft skills of project management but not the technical ones such as planning and scheduling, activities which are better left to the team as a whole.

Team member. This role, sometimes referred to as developer or programmer, is responsible for the creation and delivery of a system. This includes modeling, programming, testing, and release activities, as well as others.

Additional roles in a team executing a DevOps process consist of service owner, reliability engineer, gatekeeper, and DevOps engineer. An individual can perform multiple roles, and roles can be split among individuals. The assignment of roles to individuals depends on that individual’s skills and workload as well as the skills and amount of work required to satisfy the role. We discuss some examples of team roles for adopting DevOps and continuous deployment in the case study in Chapter 12.

Service Owner

The service owner is the role on the team responsible for outside coordination. The service owner participates in system-wide requirements activities, prioritizes work items for the team, and provides the team with information both from the clients of the team’s service and about services provided to the team. The requirements gathering and release planning activities for the next iteration can occur in parallel with the conception phase of the current iteration. Thus, although these activities require coordination and time, they will not slow down the time to delivery.

The service owner maintains and communicates the vision for the service. Since each service is relatively small, the vision involves knowledge of the clients of the team’s service and the services on which the team’s service depends. That is, the vision involves the architecture of the overall system and the team’s role in that architecture.

The ability to communicate both with other stakeholders and with other members of the team is a key requirement for the service owner.


means being on call for services that require high availability. Google calls this role “Site Reliability Engineer.”

Once a problem occurs, the reliability engineer performs short-term analysis to diagnose, mitigate, and repair the problem, usually with the assistance of automated tools. This can occur under very stressful conditions (e.g., in the middle of the night or a romantic dinner). The problem may involve reliability engineers from other teams. In any case, the reliability engineer has to be excellent at troubleshooting and diagnosis. The reliability engineer also has to have a comprehensive grasp of the internals of the service so that a fix or workaround can be applied.

In addition to the short-term analysis, the reliability engineer should discover or work with the team to discover the root cause of a problem. The “5 Whys” is a technique to determine a root cause. Keep asking “Why?” until a process reason is discovered. For example, the deployed service is too slow and the immediate cause may be an unexpected spike in workload. The second “why” is what caused the unexpected spike, and so on. Ultimately, the response is that stress testing for the service did not include appropriate workload characterization. This process reason can be fixed by improving the workload characterization for the stress testing. Increasingly, reliability engineers need to be competent developers, as they need to write high-quality programs to automate the repetitive part of the diagnosis, mitigation, and repair.

Gatekeeper

Netflix uses the steps given in Figure 1.3 from local development to deployment.

from http://techblog.netflix.com/2013/11/preparing-netflix-api-for-deployment.html) [Notation: BPMN]


Each arrow in this figure represents a decision to move to the next step. This decision may be done automatically (in Netflix’s case) or manually. The manual role that decides to move a service to the next step in a deployment pipeline is a gatekeeper role. The gatekeeper decides whether to allow a version of a service or a portion of a service through “the gate” to the next step. The gatekeeper may rely on comprehensive testing results and have a checklist to use to make this decision and may consult with others but, fundamentally, the responsibility for allowing code or a service to move on through the deployment pipeline belongs to the gatekeeper. In some cases, the original developer is the gatekeeper before deployment to production, making a decision informed by test results but carrying the full responsibility.
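When a gate decision is automated, it amounts to evaluating test results and a checklist and either promoting the service or reporting why not. The Python sketch below illustrates that idea only; the `GateInput` fields and the blocking rules are hypothetical and are not taken from Netflix's pipeline or any specific tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GateInput:
    """Evidence examined at a gate; all field names are hypothetical."""
    tests_passed: int
    tests_failed: int
    checklist: Dict[str, bool] = field(default_factory=dict)


def gate_decision(evidence: GateInput) -> List[str]:
    """Return the reasons to block promotion; an empty list opens the gate."""
    reasons = []
    if evidence.tests_failed > 0:
        reasons.append(f"{evidence.tests_failed} test(s) failed")
    if evidence.tests_passed == 0:
        reasons.append("no passing tests recorded")
    # Report unfinished checklist items in a stable (sorted) order.
    for item, done in sorted(evidence.checklist.items()):
        if not done:
            reasons.append(f"checklist item not done: {item}")
    return reasons


if __name__ == "__main__":
    candidate = GateInput(tests_passed=118, tests_failed=2,
                          checklist={"rollback plan": False})
    print(gate_decision(candidate))
```

Returning the list of blocking reasons, rather than a bare yes or no, keeps a record that a human gatekeeper can review where the final responsibility stays with a person.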

Human gatekeepers (not the original developer) may be required by regulators in some industries such as the financial industry.

Mozilla has a role called a release coordinator (sometimes called release manager). This individual is designated to assume responsibility for coordinating the entire release. The release coordinator attends triage meetings where it is decided what is in and what is omitted from a release, understands the background context on all work included in a release, referees bug severity disputes, may approve late-breaking additions, and can make the back-out decision. In addition, on the actual release day, the release coordinator is the point for all communications between developers, QA, release engineering, website developers, PR, and marketing. The release coordinator is a gatekeeper.

DevOps Engineer

Examine Figure 1.2 again with an eye toward the use of tools in this process. Some of the tools used are code testing tools, configuration management tools, continuous integration tools, deployment tools, or post-deployment testing tools.

Configuration management applies not only to the source code for the service but also to all of the input for the various tools. This allows you to answer questions such as “What changed between the last deployment and this one?” and “What new tests were added since the last build?”
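Questions of this kind reduce to comparing two recorded snapshots of everything under configuration management. The following Python sketch shows one way to do that, assuming each snapshot is a mapping from an item name to a version identifier or content hash; the snapshot format, file names, and function name are illustrative assumptions rather than the behavior of any specific tool.

```python
from typing import Dict, List


def diff_snapshots(previous: Dict[str, str],
                   current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {item name: version or content hash} snapshots."""
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(k for k in current
                     if k in previous and current[k] != previous[k])
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Hypothetical snapshots covering source code, tool input, and tests.
    last_deployment = {"service.py": "a1", "deploy.cfg": "b2",
                       "tests/test_api.py": "c3"}
    this_deployment = {"service.py": "a9", "deploy.cfg": "b2",
                       "tests/test_api.py": "c3", "tests/test_load.py": "d4"}
    print(diff_snapshots(last_deployment, this_deployment))
```

Because test files and tool inputs sit in the same snapshot as the source code, the "added" list answers the question about new tests and the "changed" list answers the question about what differs between deployments.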

Tools evolve, tools require specialized knowledge, and tools require specialized input. The DevOps engineer role is responsible for the care and feeding of the various tools used in the DevOps tool chain. This role can be filled at the individual level, the team level, or the organizational level. For example, the organization may decide on a particular configuration management tool that all should use. The team will still need to decide on its branching strategies, and individual developers may further create branches. Policies for naming and access will exist and possibly be automatically enforced. The choice of which release of the configuration management tool the development teams will use is a portion of the DevOps engineer’s role, as are the tailoring of the tool for the development team and monitoring its correct use by the developers. The DevOps engineering role is inherent in automating the development and deployment pipeline. How this role is manifested in an organizational or team structure is a decision separate from the recognition that the role exists and must be filled.


1.6 Coordination

One goal of DevOps is to minimize coordination in order to reduce the time to market. Two of the reasons to coordinate are, first, so that the pieces developed by the various teams will work together and, second, to avoid duplication of effort. The Oxford English Dictionary defines coordination as “the organization of the different elements of a complex body or activity so as to enable them to work together effectively.” We go more deeply into the concept of coordination and its mechanisms in this section.

Forms of Coordination

Coordination mechanisms have different attributes:

Direct—the individuals coordinating know each other (e.g., team members).

Indirect—the coordination mechanism is aimed at an audience known only by its characterization (e.g., system administrators).

Persistent—the coordination artifacts are available after the moment of the coordination (e.g., documents, e-mail, bulletin boards).

Ephemeral—the coordination, per se, produces no artifacts (e.g., face to face meetings, conversations, telephone/video conferencing). Ephemeral coordination can be made persistent through the use of human or mechanical recorders.

Synchronous—individuals are coordinating in real time (e.g., face to face).

Asynchronous—individuals are not coordinating in real time (e.g., documents, e-mail).

Coordination mechanisms are built into many of the tools used in DevOps. For example, a version control system is a form of automated coordination that keeps various developers from overwriting each other’s code. A continuous integration tool is a form of coordinating the testing of the correctness of a build.

Every form of coordination has a cost and a benefit. Synchronous coordination requires scheduling and, potentially, travel. The time spent in synchronous coordination is a cost for all involved. The benefits of synchronous coordination include allowing the people involved to have an immediate opportunity to contribute to the resolution of any problem. Other costs and benefits for synchronous coordination depend on the bandwidth of communication, time zone differences, and persistence of the coordination. Each form of coordination can be analyzed in terms of costs and benefits.

The ideal characteristics of a coordination mechanism are that it is low cost in terms of delay, preparation required, and people’s time, and of high benefit in terms of visibility of the coordination to all relevant stakeholders, fast resolution of any problems, and effectiveness in communicating the desired information.


The Wikipedia definition of DevOps that we mentioned earlier stated that “communication, collaboration, and integration” are hallmarks of a DevOps process. In light of our current discussion of coordination, we can see that too much manual communication and collaboration, especially synchronous, defeats the DevOps goal of shorter time to market.

Team Coordination

Team coordination mechanisms are of two types—human processes and automated processes. The DevOps human processes are adopted from agile processes and are designed for high-bandwidth coordination with limited persistence. Stand-up meetings and information radiators are examples of human process coordination mechanisms.

Automated team coordination mechanisms are designed to protect team members from interference between their own and others’ activities (version control and configuration management systems), to automate repetitive tasks (continuous integration and deployment), and to speed up error detection and reporting (automated unit, integration, acceptance, and live production tests). One goal is to provide feedback to the developers as quickly as possible.

Cross-team Coordination

Examining the release process activities again makes it clear that cross-team coordination is the most time-consuming factor. Coordination must occur with customers, stakeholders, other development teams, and operations. Therefore, DevOps processes attempt to minimize this coordination as much as possible. From the development team’s perspective, there are three types of cross-team coordination: upstream coordination with stakeholders and customers, downstream coordination with operations, and cross-stream coordination with other development teams.

The role of the service owner is to perform upstream coordination. Downstream coordination is accomplished by moving many operations responsibilities to the development team. It is cross-team coordination that we focus on now. There are two reasons for a development team to coordinate with other development teams—to ensure that the code developed by one team works well with the code developed by another and to avoid duplication of effort.

1. Making the code pieces work together. One method for supporting the independent work of different development teams while simplifying the integration of this work is to have a software architecture. An architecture for the system being developed will help make the pieces work together. Some further coordination is still necessary, but the architecture serves as a coordinating mechanism. An architecture specifies a number of the design decisions to create an overall system. Six of these design decisions are:

a. Allocation of responsibilities. In DevOps processes, general responsibilities are specified in the architecture but specific responsibilities are determined at the initiation of each iteration.


b. Coordination model. The coordination model describes how the components of an architecture coordinate at runtime. Having a single coordination model for all elements removes the necessity of coordination about the coordination model.

c. Data model. As with responsibilities, the data model objects and their life cycle are specified in the architecture but refinements may occur at iteration initiation.

d. Management of resources. The resources to be managed are determined by the architecture. The limits on these resources (e.g., buffer size or thread pool size) may be determined during iteration initiation or through system-wide policies specified in the architecture.

e. Mapping among architectural elements. The least coordination is required among teams if these mappings are specified in the architecture and in the work assignments for the teams. We return to this topic when we discuss the architectural style we propose for systems developed with DevOps processes, in Chapter 4.

f. Binding time decisions. These are specified in the overall architecture. Many runtime binding values will be specified through configuration parameters, and we will discuss the management of the configuration parameters in Chapter 5.

2. Avoiding duplication of effort. Avoiding duplication of effort and encouraging reuse is another argument for coordination among development teams. DevOps practices essentially argue that duplication of effort is a necessary cost for shorter time to market. There are two portions to this argument. First, since the task each team has to accomplish is small, any duplication is small. Large potential areas of duplication, such as each team creating their own datastore, are handled by the architecture. Second, since each team is responsible for its own service, troubleshooting problems after deployment is faster with code written by the team, and it avoids escalating a problem to a different team.

1.7 Barriers

If DevOps solves long-standing problems with development and has such clear benefits, why haven’t all organizations adopted DevOps practices? In this section we explore the barriers to their adoption.

Culture and Type of Organization

Culture is important when discussing DevOps. Both across organizations and among different groups within the same organization, cultural issues associated with DevOps affect its form and its adoption. Culture depends not only on your role but also on the type of organization to which you belong.

One of the goals of DevOps is to reduce time to market of new features or products. One of the tradeoffs that organizations consider when adopting DevOps practices is the benefits of reduced time to market versus the risks of something going awry. Almost all organizations worry about risk. The risks that a particular organization worries about, however, depend on their domain of activity. For some organizations the risks of problems occurring outweigh a time-to-market advantage.

Organizations that operate in regulated domains—financial, health care, or utility services—have regulations to which they must adhere and face penalties, potentially severe, if they violate the regulations under which they operate. Even organizations in regulated domains may have products that are unregulated. So a financial organization may use DevOps processes for some products. For products that require more oversight, the practices may be adaptable, for example, by introducing additional gatekeepers. We discuss security and audit issues in Chapter 8.

Organizations that operate in mature and slow-moving domains—automotive or building construction—have long lead times, and, although their deadlines are real, they are also foreseeable far in advance.

Organizations whose customers have a high cost of switching to another supplier, such as Enterprise Resource Planning systems, are reluctant to risk the stability of their operations. The cost of downtime for some systems will far outweigh the competitive advantage of introducing a new feature somewhat more quickly.

For other organizations, nimbleness and fast response are more important than the occasional error caused by moving too fast.

Organizations that rely on business analytics to shape their products want to have shorter and shorter times between the gathering of the data and actions inspired by the data. Any errors that result can be quickly corrected since the next cycle will happen quickly.

Organizations that face severe competitive pressure want to have their products and new features in the marketplace before their competitors.

Note that these examples do not depend on the size of the organization but rather the type of business they are in. It is difficult to be nimble if you have regulators who have oversight and can dictate your operating principles, or if your lead time for a product feature is measured in years, or if your capital equipment has a 40-year estimated lifetime.

The point of this discussion is that businesses operate in an environment and inherit much of the culture of that environment. See Chapter 10 for more details. Some DevOps practices are disruptive, such as allowing developers to deploy to production directly; other DevOps practices are incremental in that they do not affect the overall flow of products or oversight. Treating operations personnel as first-class citizens should fall into this nondisruptive category.

It is possible for a slow-moving organization to become more nimble or a nimble organization to have oversight. If you are considering adopting a DevOps practice then you need to be aware of three things:


1. What other practices are implicit in the practice you are considering? You cannot do continuous deployment without first doing continuous integration. Independent practices need to be adopted prior to adopting dependent practices.

2. What is the particular practice you are considering? What are its assumptions, its costs, and its benefits?

3. What is the culture of your business, and what are the ramifications of your adopting this particular DevOps practice? If the practice just affects operations and development, that is one thing. If it requires modification to the entire organizational structure and oversight practices, that is quite another. The difficulty of adopting a practice is related to its impact on other portions of the organization. But even if the adoption focuses on a single development team and a few operators, it is important that the DevOps culture is adopted by all people involved. A commonly reported way of failing in the adoption of DevOps is to hire a DevOps engineer and think you are done.

Type of Department

One method for determining the culture of an organization is to look at what kinds of results are incentivized. Salespeople who work on commission work very hard to get sales. CEOs who are rewarded based on quarterly profits are focused on the results of the next quarter. This is human nature. Developers are incentivized to produce and release code. Ideally, they are incentivized to produce error-free code but there is a Dilbert cartoon that shows the difficulty of this: The pointy-headed boss offers $10 for every bug found and fixed, and Wally responds, “Hooray, I am going to write me a new minivan this afternoon.” In any case, developers are incentivized to get their code into production.

Operations personnel, on the other hand, are incentivized to minimize downtime. Minimizing downtime means examining and removing causes of downtime. Examining anything in detail takes time. Furthermore, avoiding change removes one of the causes of downtime. “If it ain’t broke, don’t fix it” is a well-known phrase dating back over decades.

Basically, developers are incentivized to change something (release new code), and operations personnel are incentivized to resist change. These two different sets of incentives breed different attitudes and can be the cause of culture clashes.

Silo Mentality

It is easy to say that two departments in an organization have a common goal—ensuring the organization’s success. It is much more difficult to make this happen in practice. An individual’s loyalty tends to be first to her or his team and secondarily to the overall organization. If the development team is responsible for defining the release plan that will include what features get implemented in what priority, other portions of the organization will see some of their power being usurped and, potentially, their customers become unhappy. If activities formerly performed by operations personnel are now going to be performed by developers, what happens to the operations personnel who now have less to do?


These are the normal ebbs and flows of organizational politics, but that does not make them less meaningful or less real.

Personnel Issues

According to the Datamation 2012 IT salary guide, a software engineer earns about 50% more than a systems administrator. So by moving a task from a system administrator (Ops) to a software engineer (Dev), the personnel performing the task cost 50% more. Thus, the time spent performing the task must be cut by a third just to make the performance of the task cost the same amount. A bigger cut is necessary to actually gain time, with automation being the prevalent method to achieve these time savings. This is the type of cost/benefit analysis that an organization must go through in order to determine which DevOps processes to adopt and how to adopt them.
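The "cut by a third" figure follows from simple arithmetic: if the new performer costs 1.5 times as much per hour, the task must take at most 1/1.5 = 2/3 of the original time for the total cost to stay the same. A small Python sketch of that calculation follows; the function names are illustrative, and the model assumes cost is simply hourly rate multiplied by time.

```python
def break_even_time_fraction(cost_ratio: float) -> float:
    """Fraction of the original task time at which total cost is unchanged.

    cost_ratio is the new performer's hourly cost divided by the old one's,
    assuming total cost = hourly cost * time spent.
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return 1.0 / cost_ratio


def required_time_cut(cost_ratio: float) -> float:
    """Fraction by which task time must shrink just to keep cost equal."""
    return 1.0 - break_even_time_fraction(cost_ratio)


if __name__ == "__main__":
    # A software engineer costing about 50% more than a system administrator.
    print(f"{required_time_cut(1.5):.1%}")
```

Any time saving beyond this break-even point is what the organization actually gains from moving the task.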

Developers with a modern skill set are in high demand and short supply, and they also have a heavy workload. Adding more tasks to their workload may exacerbate the shortage of developers.

1.8 Summary

The main takeaway from this chapter is that people have defined DevOps from different perspectives, such as operators adopting agile practices or developers taking operations responsibilities, among others. But one common objective is to reduce the time between the conception of a feature or improvement as a business idea and its eventual deployment to users. DevOps faces barriers due to both cultural and technical challenges. It can have a huge impact on team structure, software architecture, and traditional ways of conducting operations. We have given you a taste of this impact by listing some common practices. We will cover all of these topics in detail throughout the rest of the book.

Some of the tradeoffs involved in DevOps are as follows:

Creation of a need to support DevOps tools. This tool support is traded off against the shortening of the time to market of new functions.

Moving responsibilities from IT professionals to developers. This tradeoff is multifaceted. The following are some of the facets to be considered:

The cost to complete a task from the two groups.

The time to complete a task from the two groups.

The availability of personnel within the two groups.

The repair time when an error is detected during execution. If the error is detected quickly after deployment, then the developer may still have the context information necessary to diagnose it quickly, whereas if the error is initially diagnosed by IT personnel, it may take time before the error gets back to the developer.

Removing oversight of new features and deployment. This tradeoff is between autonomy for the development teams and overall coordination. The efficiencies of having autonomous development teams must outweigh the duplications of effort that will occur because of no overall oversight.

All in all, we believe that DevOps has the potential to lead IT onto exciting new ground, with a high frequency of innovation and fast cycles to improve the user experience. We hope you enjoy reading the book as much as we enjoyed writing it.

You can read about different takes on the DevOps definition from the following sources:

Gartner’s Hype Cycle [Gartner] categorizes DevOps as on the rise: http://www.gartner.com/DisplayDocument?doc_cd=249070

AgileAdmins explains DevOps from an agile perspective: http://theagileadmin.com/what-is-devops/

You can find many more responses from the following recent surveys and industry reports:

XebiaLabs has a wide range of surveys and state of industry reports on DevOps-related topics that can be found at http://xebialabs.com/xl-resources/whitepapers/

CA Technologies’ report gives some insights into businesses’ different understandings of DevOps and can be found at http://www.ca.com/us/collateral/white-papers/na/techinsights-report-what-smart-businesses-know-about-devops.aspx

While some vendors or communities extended continuous integration tools toward continuous deployment, many vendors also released completely new tools for continuous delivery and deployment.

The popular continuous integration tool Jenkins has many third-party plug-ins, including some workflows extending into continuous deployment. You can find some plug-ins from Cloudbees.

The duties of an operator are listed in http://en.wikipedia.org/wiki/DevOps

The 5 Whys originated at Toyota Motors and are discussed in http://en.wikipedia.org/wiki/5_Whys

There are also discussions around whether or not continuous deployment is just a dream [BostInno 11].

Scott Ambler has not only coauthored (with Mark Lines) a book on disciplined agile delivery [Ambler 12], he also maintains a blog from which we adapted the description of the roles in a team [Ambler 15].

Netflix maintains a technical blog where they discuss a variety of issues associated with their platform. Their deployment steps are discussed in [Netflix 13].

Mozilla’s Release Coordinator role is discussed in [Mozilla].

Len Bass, Paul Clements, and Rick Kazman discuss architectural decisions on page 73 and subsequently in Software Architecture in Practice [Bass 13].

The discussion of IMVU is adapted from a blog written by Timothy Fitz [Fitz 09].

2 The Cloud as a Platform

We’ve redefined cloud computing to include everything that we already do … The computer industry is the only industry that is more fashion-driven than women’s fashion … We’ll make cloud computing announcements because if orange is the new pink, we’ll make orange blouses. I’m not going to fight this thing.

—Larry Ellison

2.1 Introduction

The standard analogy used to describe the cloud is that of the electric grid. When you want to use electricity, you plug a device into a standard connection and turn it on. You are charged for the electricity you use. In most cases, you can remain ignorant of the mechanisms the various electric companies use to generate and distribute electricity. The exception to this ignorance is if there is a power outage. At that point you become aware that there are complicated mechanisms underlying your use of electricity even if you remain unaware of the particular mechanisms that failed.

The National Institute of Standards and Technology (NIST) has provided a characterization of the cloud with the following elements:

On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control over or knowledge of the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.

Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, thereby providing transparency for both the provider and consumer of the utilized service.

From the perspective of operations and DevOps, the most important of these characteristics are on-demand self-service and measured (or metered) service. Even though the cloud provides what appear to be unlimited resources that you can acquire at will, you must still pay for their use. As we will discuss, the other characteristics are also important but not as dominant as on-demand self-service and paying for what you use.

Implicit in the NIST characterization is the distinction between the provider and the consumer of cloud services. Our perspective in this book is primarily that of the consumer. If your organization runs its own datacenters, then there may be some blurring of this distinction, but even in such organizations, the management of the datacenters is not usually considered as falling within the purview of DevOps.

NIST also characterizes the various types of services available from cloud providers, as shown in Table 2.1. NIST defines three types of services, any one of which can be used in a DevOps context.

TABLE 2.1 Cloud Service Models

Software as a Service (SaaS). The consumer is provided the capability to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based e-mail), or an application interface. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Platform as a Service (PaaS). The consumer is provided the capability to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

Infrastructure as a Service (IaaS). The consumer is provided the capability to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

We first discuss the mechanisms involved in the cloud, and then we discuss the consequences of these mechanisms on DevOps.

2.2 Features of the Cloud

The fundamental enabler of the cloud is virtualization over hundreds of thousands of hosts accessible over the Internet. We begin by discussing IaaS-centric features, namely, virtualization and IP management, followed by some specifics of PaaS offerings. Then we discuss general issues, such as the consequences of having hundreds of thousands of hosts and how elasticity is supported in the cloud.

Virtualization

In cloud computing, a virtual machine (VM) is an emulation of a physical machine. A VM image is a file that contains a bootable operating system and some software installed on it. A VM image provides the information required to launch a VM (or more precisely, a VM instance). In this book, we use “VM” and “VM instance” interchangeably to refer to an instance, and we use “VM image” to refer to the file used to launch a VM or a VM instance. For example, an Amazon Machine Image (AMI) is a VM image that can be used to launch Elastic Compute Cloud (EC2) VM instances.

When using IaaS, a consumer acquires a VM from a VM image by using an application programming interface (API) provided by the cloud provider for that purpose. The API may be embedded in a command-line interpreter, a web interface, or another tool of some sort. In any case, the request is for a VM with some set of resources: CPU, memory, and network. The resources granted may be hosted on a computer that is also hosting other VMs (multi-tenancy), but from the perspective of the consumer, the provider produces the equivalent of a stand-alone computer.

Creating a Virtual Machine

In order to create a VM, two distinct activities are performed.

The user issues a command to create a VM. Typically, the cloud provider has a utility that enables the creation of the VM. This utility is told the resources required by the VM, the account to which the charges accrued by the VM should be charged, the software to be loaded (see below), and a set of configuration parameters specifying security and the external connections for the VM.

The cloud infrastructure decides on which physical machine to create the VM instance. The operating system for this physical machine is called a hypervisor, and it allocates resources for the new VM and “wires” the new machine so that it can send and receive messages. The new VM is assigned an IP address that is used for sending and receiving messages. We have described the situation where the hypervisor is running on bare metal. It is also possible that there are additional layers of operating system–type software involved, but each layer introduces overhead, and so the most common situation is the one we described.
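The two activities can be illustrated with a toy model: a request object carrying what the user tells the provider's utility, and a placement step in which the infrastructure picks a physical machine with enough spare capacity. Everything here is hypothetical (the class names, the fields, and the first-fit policy); real providers use their own request formats and placement policies, which the text does not describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VMRequest:
    """What the user tells the provider's utility (hypothetical fields)."""
    account: str      # account to which charges accrue
    image: str        # software to be loaded
    cpus: int
    memory_gb: int


@dataclass
class Host:
    """A physical machine running a hypervisor (toy model)."""
    name: str
    free_cpus: int
    free_memory_gb: int
    vms: List[dict] = field(default_factory=list)

    def can_fit(self, req: VMRequest) -> bool:
        return req.cpus <= self.free_cpus and req.memory_gb <= self.free_memory_gb

    def allocate(self, req: VMRequest, ip: str) -> dict:
        # The hypervisor reserves resources and records the VM's IP address.
        self.free_cpus -= req.cpus
        self.free_memory_gb -= req.memory_gb
        vm = {"host": self.name, "ip": ip, "account": req.account, "image": req.image}
        self.vms.append(vm)
        return vm


def place_vm(hosts: List[Host], req: VMRequest, ip: str) -> Optional[dict]:
    """First-fit placement (an assumed policy): use the first host that fits."""
    for host in hosts:
        if host.can_fit(req):
            return host.allocate(req, ip)
    return None  # no capacity anywhere


if __name__ == "__main__":
    hosts = [Host("host-a", free_cpus=2, free_memory_gb=4),
             Host("host-b", free_cpus=8, free_memory_gb=32)]
    request = VMRequest(account="team-1", image="base-image", cpus=4, memory_gb=8)
    print(place_vm(hosts, request, ip="192.0.2.10"))
```

Because several requests can land on the same `Host`, the model also shows multi-tenancy: the consumer sees only the returned VM, not the other VMs sharing the machine.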

Loading a Virtual Machine

Each VM needs to be loaded with a set of software in order to do meaningful work. Part of the software can be loaded as part of the VM image, and part can be loaded by the activated VM itself after launching. A VM image can be created by loading and configuring a machine with the desired software and data, and then copying the memory contents (typically in the form of the virtual hard disk) of the machine to a persistent file. New VM instances from that VM image (software and data) can then be created at will.

The process of creating a VM image is called baking the image. A heavily baked image contains all of the software required to run an application, and a lightly baked image contains only a portion of the software required, such as an operating system and a middleware container. We discuss these options and the related tradeoffs in Chapter 5.
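The difference between the two kinds of image can be seen by asking what is still missing when the VM boots. The sketch below is a toy model with made-up component names; it only computes the set of components that would have to be loaded after launch.

```python
from typing import List


def launch_time_installs(required: List[str], baked: List[str]) -> List[str]:
    """Components the running VM must still load after it has booted."""
    baked_set = set(baked)
    return [component for component in required if component not in baked_set]


# Hypothetical component names, for illustration only.
REQUIRED = ["operating-system", "middleware-container", "application", "app-config"]
HEAVILY_BAKED = ["operating-system", "middleware-container", "application", "app-config"]
LIGHTLY_BAKED = ["operating-system", "middleware-container"]

if __name__ == "__main__":
    print("heavily baked, still to load:", launch_time_installs(REQUIRED, HEAVILY_BAKED))
    print("lightly baked, still to load:", launch_time_installs(REQUIRED, LIGHTLY_BAKED))
```

A heavily baked image leaves nothing to load at launch, whereas a lightly baked image defers part of the work to launch time.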

Virtualization introduces several types of uncertainty that you should be aware of.

Because a VM shares resources with other VMs on a single physical machine, there may be some performance interference among the VMs. This situation may be particularly difficult for cloud consumers as they usually have no visibility into the co-located VMs owned by other consumers.

There are also time and dependability uncertainties when loading a VM, depending on the underlying physical infrastructure and the additional software that needs to be dynamically loaded. DevOps operations often create and destroy VMs frequently for setting up different environments or deploying new versions of software. It is important that you are aware of these uncertainties.

IP and Domain Name System Management

When a VM is created, it is assigned an IP address. IP addresses are the means by which messages are routed to any computer on the Internet. IP addresses, their routing, and their management are all complicated subjects. A discussion of the Domain Name System (DNS) and the persistence of IP addresses with respect to VMs follows.

DNS

Underlying the World Wide Web is a system that translates part of a URL into an IP address. This function concerns the domain name part of the URL (e.g., ssrg.nicta.com.au), which can be resolved to an IP address through the DNS. As a portion of normal initiation, a browser, for example, is provided with the address of a DNS server. As shown in Figure 2.1, when you enter a URL into your browser, it sends that URL to its known DNS server which, in association with a larger network of DNS servers, resolves that URL into an IP address.

FIGURE 2.1 DNS returning an IP address [Notation: Architecture]

The domain name indicates a routing path for the resolution. The domain name ssrg.nicta.com.au, for example, will go first to a root DNS server to look up how to resolve .au names. The root server will provide an IP address for the Australian DNS server where .com names for Australia are stored. The .com.au server will provide the IP address of the nicta DNS server, which in turn provides an IP address for ssrg.

The importance of this hierarchy is that its lower levels (.nicta and .ssrg) are under local control. Thus, the IP address of ssrg within the .nicta server can be changed relatively easily and locally.
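The walk through the hierarchy can be sketched with nested dictionaries standing in for DNS servers. This is a deliberately simplified model, not how a real resolver is implemented: each toy server maps a label either to the next server to ask or to a final address, and the addresses are invented values from the 192.0.2.0/24 range reserved for documentation.

```python
# Each toy "server" maps a label to the next server (a delegation)
# or to a final IP address. All names below the root mirror the
# example in the text; the IP address is made up.
NICTA_SERVER = {"ssrg": "192.0.2.10"}
COM_AU_SERVER = {"nicta": NICTA_SERVER}
AU_SERVER = {"com": COM_AU_SERVER}
ROOT_SERVER = {"au": AU_SERVER}


def resolve(name: str, root: dict = ROOT_SERVER) -> str:
    """Resolve a domain name by walking labels from right to left."""
    node = root
    for label in reversed(name.split(".")):
        if not isinstance(node, dict) or label not in node:
            raise KeyError(f"cannot resolve {name!r} at label {label!r}")
        node = node[label]
    if isinstance(node, dict):
        raise KeyError(f"{name!r} names a zone, not a host")
    return node


if __name__ == "__main__":
    print(resolve("ssrg.nicta.com.au"))
    # Local control: only the nicta server's own entry is changed.
    NICTA_SERVER["ssrg"] = "192.0.2.99"
    print(resolve("ssrg.nicta.com.au"))
```

Changing the entry in `NICTA_SERVER` alters the result without touching the root, .au, or .com.au levels, which is the local control the text refers to.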

Furthermore, each DNS entry has an attribute named time to live (TTL). TTL acts as an expiration time for the entry (i.e., the mapping of the domain name and the IP address). The client or the local DNS server will cache the entry, and that cached entry will be valid for a duration specified by the TTL. When a query arrives prior to the expiration time, the client/local DNS server can retrieve the IP address from its cache. When a query arrives after the expiration time, the IP address has to be resolved by an authoritative DNS server. Normally the TTL is set to a large value; it may be as large as 24 hours. It is possible to set the TTL to as low as 1 minute.
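The caching behavior described above can be sketched as a small class. This is a minimal illustration under stated assumptions: `authoritative_lookup` is a hypothetical callback standing in for a query to an authoritative server, and the clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy client-side DNS cache that honors a per-entry time to live."""

    def __init__(self,
                 authoritative_lookup: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # authoritative_lookup(name) returns (ip_address, ttl_in_seconds).
        self._lookup = authoritative_lookup
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # entry still valid: answer from the cache
        # Entry missing or expired: ask the authoritative source again.
        ip, ttl_seconds = self._lookup(name)
        self._entries[name] = (ip, now + ttl_seconds)
        return ip


if __name__ == "__main__":
    records = {"ssrg.nicta.com.au": ("192.0.2.10", 60.0)}  # made-up record, TTL 1 minute
    fake_now = [0.0]
    cache = TTLCache(lambda name: records[name], clock=lambda: fake_now[0])

    print(cache.resolve("ssrg.nicta.com.au"))   # first query goes upstream
    records["ssrg.nicta.com.au"] = ("192.0.2.20", 60.0)
    fake_now[0] = 30.0
    print(cache.resolve("ssrg.nicta.com.au"))   # within TTL: cached answer
    fake_now[0] = 61.0
    print(cache.resolve("ssrg.nicta.com.au"))   # after TTL: refreshed answer
```

The sketch makes the tradeoff visible: a long TTL reduces queries to the authoritative server, while a short TTL lets a changed address reach clients sooner.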

We will see in our case studies, Chapters 11–13, how the combination of local control and short TTL can be used within a DevOps context.

One further point deserves mention. In Figure 2.1, we showed the DNS returning a single IP address for a domain name. In fact, it can return multiple addresses. Figure 2.2 shows the DNS server returning two addresses.

Title	DevOps: A Software Architect's Perspective
Authors	Len Bass, Ingo Weber, Liming Zhu