DevOps promises to accelerate the release of new software features and improve monitoring after systems are placed into operation. However, DevOps has crucial implications for system design and architecture that most previous books ignore. In DevOps: A Software Architect's Perspective, three world-class software architects address these issues head-on, helping organizations deploy DevOps more efficiently, avoid common problems, and drive more value. The authors begin by reviewing DevOps' impact on every phase of the development cycle, including build, test, deployment, and post-deployment monitoring and observation. For each phase, they systematically identify issues, tools, team practices, and key tradeoffs associated with preparing for DevOps and using it effectively. Next, they turn to cross-cutting concerns that transcend a single function, offering practical insights into compliance, cloud environments, human and process performance, reliability, repeatability, and more. Throughout, they offer real-world case studies, detailed references, practical examples, and convenient checklists. You'll find indispensable guidance for addressing key questions like: How can I design new systems to work more effectively with DevOps? How do I address culture and communication problems between Dev and Ops? How do I integrate DevOps with agile methods and TDD? What are the best ways to handle specific issues such as failure detection and upgrade planning?
PART ONE BACKGROUND
CHAPTER 1 What Is DevOps?
1.9 For Further Reading
CHAPTER 2 The Cloud as a Platform
2.1 Introduction
2.2 Features of the Cloud
2.3 DevOps Consequences of the Unique Cloud Features
2.4 Summary
2.5 For Further Reading
CHAPTER 3 Operations
3.1 Introduction
3.2 Operations Services
3.3 Service Operation Functions
3.4 Continual Service Improvement
3.5 Operations and DevOps
3.6 Summary
3.7 For Further Reading
PART TWO THE DEPLOYMENT PIPELINE
CHAPTER 4 Overall Architecture
4.1 Do DevOps Practices Require Architectural Change?
4.2 Overall Architecture Structure
4.3 Quality Discussion of Microservice Architecture
4.4 Amazon’s Rules for Teams
4.5 Microservice Adoption for Existing Systems
4.6 Summary
4.7 For Further Reading
CHAPTER 5 Building and Testing
5.1 Introduction
5.2 Moving a System Through the Deployment Pipeline
5.3 Crosscutting Aspects
5.4 Development and Pre-commit Testing
5.5 Build and Integration Testing
5.6 UAT/Staging/Performance Testing
6.10 For Further Reading
PART THREE CROSSCUTTING CONCERNS
CHAPTER 7 Monitoring
7.7 Tools
7.8 Diagnosing an Anomaly from Monitoring Data—the Case of Platformer.com
7.9 Summary
7.10 For Further Reading
CHAPTER 8 Security and Security Audits
8.10 Application Design Considerations
8.11 Deployment Pipeline Design Considerations
8.12 Summary
8.13 For Further Reading
CHAPTER 9 Other Ilities
9.1 Introduction
9.2 Repeatability
9.3 Performance
9.4 Reliability
9.10 For Further Reading
CHAPTER 10 Business Considerations
10.6 For Further Reading
PART FOUR CASE STUDIES
CHAPTER 11 Supporting Multiple Datacenters
11.9 For Further Reading
CHAPTER 12 Implementing a Continuous Deployment Pipeline for Enterprises
12.1 Introduction
12.2 Organizational Context
12.3 The Continuous Deployment Pipeline
12.4 Baking Security into the Foundations of the CD Pipeline
12.5 Advanced Concepts
12.6 Summary
12.7 For Further Reading
CHAPTER 13 Migrating to Microservices
13.1 Introduction to Atlassian
13.2 Building a Platform for Deploying Microservices
13.3 BlobStore: A Microservice Example
13.4 Development Process
13.5 Evolving BlobStore
13.6 Summary
13.7 For Further Reading
PART FIVE MOVING INTO THE FUTURE
CHAPTER 14 Operations as a Process
14.1 Introduction
14.2 Motivation and Overview
14.3 Offline Activities
14.4 Online Activities
14.5 Error Diagnosis
14.6 Monitoring
14.7 Summary
14.8 For Further Reading
CHAPTER 15 The Future of DevOps
(e.g., the novel The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win) and from the project manager’s perspective (e.g., Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation). In addition, there is a raft of material about cultural change and what it means to tear down barriers between organizational units.
What frustrated us is that there is very little material from the software architect’s perspective. Treating operations personnel as first-class stakeholders and listening to their requirements is certainly important. Using tools to support operations and project management is also important. Yet, we had the strong feeling that there was more to it than stakeholder management and the use of tools.
Indeed there is, and that is the gap that this book intends to fill. DevOps presents a fascinating interplay between design, process, tooling, and organizational structure. We try to answer two primary questions: What technical decisions do I, as a software architect, have to make to achieve the DevOps goals? What impact do the other actors in the DevOps space have on me?
The answers are that achieving DevOps goals can involve fundamental changes in the architecture of your systems and in the roles and responsibilities required to get your systems into production and support them once they are there.
Just as software architects must understand the business context and goals for the systems they design and construct, understanding DevOps requires understanding organizational and business contexts, as well as technical and operational contexts. We explore all of these.
The primary audience for this book is practicing software architects who have been or expect to be asked, “Should this project or organization adopt DevOps practices?” Instead of being asked, the architect may be told. As with all books, we expect additional categories of readers. Students who are interested in learning more about the practice of software architecture should find interesting material here. Researchers who wish to investigate DevOps topics can find important background material. Our primary focus, however, is on practicing architects.
Previewing the Book
We begin the book by discussing the background for DevOps. Part One begins by delving into the goals of DevOps and the problems it is intended to solve. We touch on organizational and cultural issues, as well as the relationship of DevOps practices to agile methodologies.
In Chapter 2, we explore the cloud. DevOps practices have grown in tandem with the growth of the cloud as a platform. The two, in theory, are separable, but in practice virtualization and the cloud are important enablers for DevOps practices.
In our final background chapter, Chapter 3, we explore operations through the prism of the Information Technology Infrastructure Library (ITIL). ITIL is a system of organization of the most important functions of an operations group. Not all of operations is included in DevOps practices, but understanding something of the responsibilities of an operations group provides important context, especially when it comes to understanding roles and responsibilities.
Part Two describes the deployment pipeline. We begin this part by exploring the microservice architectural style in Chapter 4. It is not mandatory that systems be architected in this style in order to apply DevOps practices, but the microservice architectural style is designed to solve many of the problems that motivated DevOps.
In Chapter 5, we hurry through the building and testing processes and tool chains. It is important to understand these, but they are not our focus. We touch on the different environments used to get a system into production and the different sorts of tests run on these environments. Since many of the tools used in DevOps are used in the building and testing processes, we provide context for understanding these tools and how to control them.
We conclude Part Two by discussing deployment. One of the goals of DevOps is to speed up deployments. A technique used to achieve this goal is to allow each development team to independently deploy its code when it is ready. Independent deployment introduces many issues of consistency. We discuss different deployment models, managing distinct versions of a system that are simultaneously in production, rolling back in the case of errors, and other topics having to do with actually placing your system in production.
Part Two presents a functional perspective on deployment practices. Yet, just as with any other system, it is frequently the quality perspectives that control the design and the acceptance of the system. In Part Three, we focus on crosscutting concerns. This begins with our discussion of monitoring and live testing in Chapter 7. Modern software testing practices do not end when a system is placed into production. First, systems are monitored extensively to detect problems, and second, testing continues in a variety of forms after a system has been placed into production.
Another crosscutting concern is security, which we cover in Chapter 8. We present the different types of security controls that exist in an environment, ranging from those that are organization-wide to those that are specific to a single system. We discuss the different roles associated with achieving security and how these roles are evaluated in the case of a security audit.
Security is not the only quality of interest, and in Chapter 9 we discuss other qualities that are relevant to the practices associated with DevOps. We cover topics such as performance, reliability, and modifiability of the deployment pipeline.
Finally, in Part Three we discuss business considerations in Chapter 10. Practices as broad as DevOps cannot be adopted without buy-in from management. A business plan is a typical means of acquiring this buy-in; thus, we present the elements of a business plan for DevOps adoption and discuss how the argument, rollout, and measurement should proceed.
In Part Four we present three case studies. Organizations that have implemented DevOps practices tell us some of their tricks. Chapter 11 discusses how to maintain two datacenters for the purpose of business continuity; Chapter 12 presents the specifics of a continuous deployment pipeline; and Chapter 13 describes how one organization is migrating to a microservice architecture.
We close by speculating about the future in Part Five. Chapter 14 describes our research and how it is based on viewing operations as a series of processes, and Chapter 15 gives our prediction for how the next three to five years are going to evolve in terms of DevOps.
Acknowledgments
Books like this require a lot of assistance. We would like to thank Chris Williams, John Painter, Daniel Hand, and Sidney Shek for their contributions to the case studies, as well as Adnene Guabtni, Kanchana Wickremasinghe, Min Fu, and Xiwei Xu for helping us with some of the chapters.
Manuel Pais helped us arrange case studies. Philippe Kruchten, Eoin Woods, Gregory Hartman, Sidney Shek, Michael Lorant, Wouter Geurts, and Eltjo Poort commented on or contributed to various aspects of the book.
We would like to thank Jean-Michel Lemieux, Greg Warden, Robin Fernandes, Jerome Blin, Felipe Cuozzo, Pramod Korathota, Nick Wright, Vitaly Osipov, Brad Baker, and Jim Watts for their comments on Chapter 13.
Addison-Wesley did their usual professional and efficient job in the production process, and this book has benefited from their expertise.
Finally, we would like to thank NICTA and NICTA management. NICTA is funded by the Australian government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program. Without their generous support, this book would not have been written.
Legend
We use four distinct legends for the figures. We have an architectural notation that identifies the key architectural concepts that we use; we use Business Process Model and Notation (BPMN) to describe some processes, Porter’s Value Notation to describe a few others, and UML sequence diagrams for interleaving sequences of activities. We do not show the UML sequence diagram notation here, but the notation that we use from these other sources is:
Architecture
FIGURE P.1 People, both individual and groups
FIGURE P.2 Components (runtime entities), modules (code-time collections of entities), and data flow
FIGURE P.3 Specialized entities
FIGURE P.4 Collections of entities
BPMN
We use Business Process Model and Notation (BPMN) for describing events and activities [OMG 11].
FIGURE P.5 Event indications
FIGURE P.6 Activities and sequences of activities
Porter’s Value Chain
This notation is used to describe processes (which, in turn, have activities modelled in BPMN).
FIGURE P.7 Entry in a value chain
Part One: Background
This part provides the necessary background for the remainder of the book. DevOps is a movement that envisions no friction between the development groups and the operations groups. In addition, the emergence of DevOps coincides with the growth of the cloud as a basic platform for organizations, large and small. Part One has three chapters.
In Chapter 1, we define DevOps and discuss its various motivations. DevOps is a catchall term that can cover several meanings, including: having development and operations speak to each other; allowing development teams to deploy to production automatically; and having development teams be the first responders when an error is discovered in production. In this chapter, we sort out these various considerations and develop a coherent description of what DevOps is, what its motivations and goals are, and how it is going about achieving those goals.
In order to understand how certain DevOps practices work, it is necessary to know how the cloud works, which we discuss in Chapter 2. In particular, you should know how virtual machines work, how IP addresses are used, the role of Domain Name System (DNS) servers and how to manipulate them, and how load balancers and monitors interact to provide on-demand scaling.
DevOps involves the modification of both Dev and Ops practices. In Chapter 3, we discuss Ops in its totality. The chapter describes the services that Ops provides to the organization and introduces Ops responsibilities, from supporting deployed applications to enforcing organization-wide security rules.
We begin by defining DevOps and providing a short example. Then we present the motivation for the movement, the DevOps perspective, and barriers to the success of DevOps. Much of the writing on DevOps discusses various organizational and cultural issues. In this first chapter, we summarize these topics, which frame the remainder of the book.
Defining DevOps
DevOps has been classified as “on the rise” with respect to the Gartner Hype Cycle for Application Development in 2013. This classification means that the term is becoming a buzzword and, as such, is ill defined and subject to overblown claims. Our definition of DevOps focuses on the goals, rather than the means.
DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality.
Before we delve more deeply into what set of practices is included, let’s look at some of the implications of our definition.
The quality of the deployed change to a system (usually in the form of code) is important. Quality means suitability for use by various stakeholders including end users, developers, or system administrators. It also includes availability, security, reliability, and other “ilities.” One method for ensuring quality is to have a variety of automated test cases that must be passed prior to placing changed code into production. Another method is to test the change in production with a limited set of users prior to opening it up to the world. Still another method is to closely monitor newly deployed code for a period of time. We do not specify in the definition how quality is ensured, but we do require that production code be of high quality.
The definition also requires the delivery mechanism to be of high quality. This implies that the reliability and repeatability of the delivery mechanism should be high. If the delivery mechanism fails regularly, the time required increases. If there are errors in how the change is delivered, the quality of the deployed system suffers, for example, through reduced availability or reliability.
We identify two time periods as being important. One is the time when a developer commits newly developed code. This marks the end of basic development and the beginning of the deployment path. The second time is the deploying of that code into production. As we will see in Chapter 6, there is a period after code has been deployed into production when the code is being tested through live testing and is closely monitored for potential problems. Once the code has passed live testing and close monitoring, then it is considered as a portion of the normal production system. We make a distinction between deploying code into production for live testing and close monitoring and then, after passing the tests, promoting the newly developed code to be equivalent to previously developed code.
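The two time periods can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming three hypothetical timestamps for a single change (the commit, the deployment for live testing, and the promotion to normal production); the function and variable names are illustrative and are not taken from any tool discussed in this book.

```python
from datetime import datetime, timedelta


def lead_time(committed_at: datetime, reached_at: datetime) -> timedelta:
    """Return the elapsed time from a commit to a later pipeline milestone."""
    if reached_at < committed_at:
        raise ValueError("a milestone cannot precede the commit")
    return reached_at - committed_at


# Hypothetical timestamps for one change (assumed values, for illustration only).
committed = datetime(2015, 3, 2, 9, 0)
deployed_for_live_testing = datetime(2015, 3, 2, 9, 15)
promoted_to_normal_production = datetime(2015, 3, 2, 11, 0)

# Time until the code is in production under live testing and close monitoring.
time_to_deploy = lead_time(committed, deployed_for_live_testing)

# Time until the code is treated as part of the normal production system.
time_to_normal_production = lead_time(committed, promoted_to_normal_production)

print(time_to_deploy, time_to_normal_production)
```

Under the definition given above, the second figure is the one that DevOps practices are intended to reduce, since it runs from the commit to normal production.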
Our definition is goal oriented. We do not specify the form of the practices or whether tools are used to implement them. If a practice is intended to reduce the time between a commit from a developer and deploying into production, it is a DevOps practice whether it involves agile methods, tools, or forms of coordination. This is in contrast to several other definitions. Wikipedia, for example, stresses communication, collaboration, and integration between various stakeholders without stating the goal of such communication, collaboration, or integration. Timing goals are implicit. Other definitions stress the connection between DevOps and agile methods. Again, there is no mention of the benefits of utilizing agile methods on either the time to develop or the quality of the production system. Still other definitions stress the tools being used, without mentioning the goal of DevOps practices, the time involved, or the quality.
Finally, the goals specified in the definition do not restrict the scope of DevOps practices to testing and deployment. In order to achieve these goals, it is important to include an Ops perspective in the collection of requirements, that is, significantly earlier than committing changes. Analogously, the definition does not mean DevOps practices end with deployment into production; the goal is to ensure high quality of the deployed system throughout its life cycle. Thus, monitoring practices that help achieve the goals are to be included as well.
an operator. Involving operations in the development of requirements will ensure that these types of requirements are considered.
Make Dev more responsible for relevant incident handling. These practices are intended to shorten the time between the observation of an error and the repair of that error. Organizations that utilize these practices typically have a period of time in which Dev has primary responsibility for a new deployment; later on, Ops has primary responsibility.
Enforce the deployment process used by all, including Dev and Ops personnel. These practices are intended to ensure a higher quality of deployments. This avoids errors caused by ad hoc deployments and the resulting misconfiguration. The practices also refer to the time that it takes to diagnose and repair an error. The normal deployment process should make it easy to trace the history of a particular deployment artifact and understand the components that were included in that artifact.
Use continuous deployment. Practices associated with continuous deployment are intended to shorten the time between a developer committing code to a repository and the code being deployed. Continuous deployment also emphasizes automated tests to increase the quality of code making its way into production.
Develop infrastructure code, such as deployment scripts, with the same set of practices as application code. Practices that apply to the development of infrastructure code are intended to ensure both high quality in the deployed applications and that deployments proceed as planned. Errors in deployment scripts such as misconfigurations can cause errors in the application, the environment, or the deployment process. Applying quality control practices used in normal software development when developing operations scripts and processes will help control the quality of these specifications.
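As an illustration of the last practice, a deployment configuration can be checked by a small function that is itself kept under version control and unit tested like application code. The sketch below is in Python and is only one possible shape for such a check; the configuration keys (`artifact_version`, `target_environment`, `instance_count`) and the environment names are assumptions made for this example, not part of any particular deployment tool.

```python
# Hypothetical environment names, assumed for this example.
ALLOWED_ENVIRONMENTS = ("staging", "production")

# Hypothetical keys that every deployment configuration must contain.
REQUIRED_KEYS = ("artifact_version", "target_environment", "instance_count")


def validate_deploy_config(config: dict) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required key: {key}")

    if "instance_count" in config:
        count = config["instance_count"]
        # Reject non-integers (including booleans) and counts below one.
        if type(count) is not int or count < 1:
            problems.append("instance_count must be a positive integer")

    if "target_environment" in config:
        env = config["target_environment"]
        if env not in ALLOWED_ENVIRONMENTS:
            problems.append(f"unknown target_environment: {env}")

    return problems


good = {"artifact_version": "1.4.2", "target_environment": "staging", "instance_count": 3}
bad = {"artifact_version": "1.4.2", "target_environment": "prod", "instance_count": 0}

print(validate_deploy_config(good))
print(validate_deploy_config(bad))
```

Running such checks in the same automated test suite as the application means that a misconfiguration is reported before a deployment is attempted, rather than discovered in the environment afterward.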
Figure 1.1 gives an overview of DevOps processes. At its most basic, DevOps advocates treating Operations personnel as first-class stakeholders. Preparing a release can be a very serious and onerous process. (We describe that in the section “Release Process.”) As such, operations personnel may need to be trained in the types of runtime errors that can occur in a system under development; they may have suggestions as to the type and structure of log files, and they may provide other types of input into the requirements process. At its most extreme, DevOps practices make developers responsible for monitoring the progress and errors that occur during deployment and execution, so theirs would be the voices suggesting requirements. In between are practices that cover team practices, build processes, testing processes, and deployment processes. We discuss the continuous deployment pipeline in Chapters 5 and 6. We also cover monitoring, security, and audits in subsequent chapters.
FIGURE 1.1 DevOps life cycle processes [Notation: Porter’s Value Chain]
You may have some questions about the terms IT professional, operator, and operations personnel. Another related term is system administrator. The IT professional subsumes the mentioned roles and others, such as help desk support. The distinction in terminology between operators and system administrators has historical roots but is much less true today. Historically, operators had hands-on access to the hardware (installing and configuring hardware, managing backups, and maintaining printers), while system administrators were responsible for uptime, performance, resources, and security of computer systems. Today it is the rare operator who does not take on some duties formerly assigned to a system administrator. We will use the term operator to refer to anyone who performs computer operator or system administration tasks (or both).
Example of Continuous Deployment: IMVU
IMVU, Inc. is a social entertainment company whose product allows users to connect through 3D avatar-based experiences. This section is adapted from a blog written by an IMVU engineer.
IMVU does continuous integration. The developers commit early and often. A commit triggers an execution of a test suite. IMVU has a thousand test files, distributed across 30–40 machines, and the test suite takes about nine minutes to run. Once a commit has passed all of its tests, it is automatically sent to deployment. This takes about six minutes. The code is moved to the hundreds of machines in the cluster, but at first the code is only made live on a small number of machines (canaries). A sampling program examines the results of the canaries and if there has been a statistically significant regression, then the revision is automatically rolled back. Otherwise the remainder of the cluster is made active. IMVU deploys new code 50 times a day, on average.
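The canary decision described above can be sketched as a small statistical check. The Python code below is an illustration only and is not IMVU's sampling program: it assumes that the metric being compared is the request error rate, that the comparison is a one-sided two-proportion z-test, and that a threshold of 2.33 (roughly a one-sided 1% significance level) counts as a statistically significant regression. All names and numbers are hypothetical.

```python
import math


def canary_regressed(base_errors: int, base_requests: int,
                     canary_errors: int, canary_requests: int,
                     z_threshold: float = 2.33) -> bool:
    """Return True if the canary error rate is significantly higher than the baseline.

    Uses a one-sided two-proportion z-test with a pooled error rate.
    """
    p_base = base_errors / base_requests
    p_canary = canary_errors / canary_requests
    pooled = (base_errors + canary_errors) / (base_requests + canary_requests)
    std_err = math.sqrt(pooled * (1 - pooled)
                        * (1 / base_requests + 1 / canary_requests))
    if std_err == 0:
        # Both groups have identical all-success or all-failure results,
        # so there is no evidence of a difference.
        return False
    z = (p_canary - p_base) / std_err
    return z > z_threshold


def rollout_decision(base_errors: int, base_requests: int,
                     canary_errors: int, canary_requests: int) -> str:
    """Map the statistical check onto the two outcomes described in the text."""
    if canary_regressed(base_errors, base_requests, canary_errors, canary_requests):
        return "roll back"
    return "activate remainder of cluster"


# Hypothetical counts: baseline machines versus canary machines.
print(rollout_decision(100, 100_000, 50, 5_000))  # canary error rate 1% vs 0.1%
print(rollout_decision(100, 100_000, 6, 5_000))   # canary error rate 0.12% vs 0.1%
```

A real sampling program would likely examine several metrics and would need enough canary traffic for the test to be meaningful; the sketch only shows how "statistically significant regression" can drive an automatic rollback.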
The essence of the process is in the test suite. Every time a commit gets through the test suite and is rolled back, a new test is generated that would have caught the erroneous deployment, and it is added to the test suite.
Note that a full test suite (with the confidence of production deployment) that only takes nine minutes to run is uncommon for large-scale systems. In many organizations, the full test suite that provides production deployment confidence can take hours to run and is often run overnight. A common challenge is to reduce the size of the test suite judiciously and remove “flaky” tests.
1.2 Why DevOps?
DevOps, in many ways, is a response to the problem of slow releases. The longer it takes a release to get to market, the less advantage will accrue from whatever features or quality improvements led to the release. Ideally, we want to release in a continuous manner. This is often termed continuous delivery or continuous deployment. We discuss the subtle difference between the two terms in Chapters 5 and 6. In this book, we use the term continuous deployment or just deployment. We begin by describing a formal release process, and then we delve more deeply into some of the reasons for slow releases.
Release Process
Releasing a new system or version of an existing system to customers is one of the most sensitive steps in the software development cycle. This is true whether the system or version is for external distribution, is used directly by consumers, or is strictly for internal use. As long as the system is used by more than one person, releasing a new version opens the possibility of incompatibilities or failures, with subsequent unhappiness on the part of the customers.
Consequently, organizations pay a great deal of attention to the process of defining a release plan. The following release planning steps are adapted from Wikipedia. Traditionally, most of the steps are done manually.
1. Define and agree on release and deployment plans with customers/stakeholders. This could be done at the team or organizational level. The release and deployment plans specify the features to be included in the new release and ensure that operations personnel (including help desk and support personnel) are aware of schedules, that resource requirements are met, and that any additional training that might be required is scheduled.
2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other. Everything changes over time, including libraries, platforms, and dependent services. Changes may introduce incompatibilities. This step is intended to prevent incompatibilities from becoming apparent only after deployment. In Chapter 5, we discuss the ways of ensuring all of these compatibilities. Managing dependencies is a theme that will surface repeatedly throughout this book.
3. Ensure that the integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system. There are two parts to this step: The first is to make sure that old versions of a component are not inadvertently included in the release, and the second is to make sure that a record is kept of the components of this deployment. Knowing the elements of the deployment is important when tracking down errors found after deployment. We discuss the details of deployment in Chapter 6.
4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or rolled back, if appropriate. Deployments may need to be rolled back (new version uninstalled, old version redeployed) under a variety of circumstances, such as errors in the code, inadequate resources, or expired licenses or certificates.
The activities enumerated in this list can be accomplished with differing levels of automation. If all of these activities are accomplished primarily through human coordination, then these steps are labor-intensive, time-consuming, and error-prone. Any automation reflects an agreement on the release process, whether at the team or organization level. Since tools are typically used more than once, an agreement on the release process encoded into a tool has persistence beyond a single release.
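One way to see how part of the release process can be encoded into a tool is a release manifest that records each component's version and content hash. The Python sketch below is a simplified illustration of step 3, under the assumptions that components are available as bytes and that versions are tuples of integers; the component names and function names are hypothetical and do not describe any specific configuration management system.

```python
import hashlib


def build_manifest(components: dict) -> dict:
    """Record the version and SHA-256 digest of every component in a release.

    `components` maps a component name to a (version, content_bytes) pair.
    """
    return {
        name: {"version": version,
               "sha256": hashlib.sha256(content).hexdigest()}
        for name, (version, content) in components.items()
    }


def changed_components(manifest: dict, components: dict) -> list:
    """Return components that are missing or whose content no longer matches."""
    changed = []
    for name, entry in manifest.items():
        if name not in components:
            changed.append(name)
            continue
        _version, content = components[name]
        if hashlib.sha256(content).hexdigest() != entry["sha256"]:
            changed.append(name)
    return changed


def stale_components(previous_manifest: dict, candidate_manifest: dict) -> list:
    """Return components whose version is older than in the previous release."""
    stale = []
    for name, entry in candidate_manifest.items():
        previous = previous_manifest.get(name)
        if previous is not None and entry["version"] < previous["version"]:
            stale.append(name)
    return stale


# Hypothetical releases; "auth-lib" has accidentally gone back to an older version.
release_1 = {"web": ((1, 4, 0), b"web build 1.4.0"),
             "auth-lib": ((2, 1, 0), b"auth build 2.1.0")}
release_2 = {"web": ((1, 5, 0), b"web build 1.5.0"),
             "auth-lib": ((2, 0, 9), b"auth build 2.0.9")}

manifest_1 = build_manifest(release_1)
manifest_2 = build_manifest(release_2)

print(changed_components(manifest_1, release_1))
print(stale_components(manifest_1, manifest_2))
```

The stored manifests also serve as the record of what each deployment contained, which is the information needed when tracking down errors after deployment or when choosing a version to roll back to.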
In case you are tempted to downplay the seriousness of getting the deployment correct, you may want to consider recent media reports of upgrade failures with substantial financial costs.
On August 1, 2012, Knight Capital had an upgrade failure that ended up costing (US) $440 million.
On August 20, 2013, Goldman Sachs had an upgrade failure that, potentially, could cost millions of dollars.
These are just two of the many examples that have resulted in downtime or errors because of upgrade failure. Deploying an upgrade correctly is a significant and important activity for an organization and, yet, one that should be done in a timely fashion with minimal opportunity for error. Several organizations have done surveys to document the extent of deployment problems. We report on two of them.
XebiaLabs is an organization that markets a deployment tool and a continuous integration tool. They did a survey in 2013 with over 130 responses. 34% of the respondents were from IT services companies, with approximately 10% each from health care, financial services, and telecommunications companies. 7.5% of the respondents reported their deployment process was “not reliable,” and 57.5% reported their deployment process “needs improvement.” 49% reported their biggest challenge in the deployment process was “too much inconsistency across environments and applications.” 32.5% reported “too many errors.” 29.2% reported their deployments relied on custom scripting, and 35.8% reported their deployments were partially scripted and partially manual.
CA Technologies provides IT management solutions to their customers. They commissioned a survey in 2013 that had 1,300 respondents from companies with more than (US) $100 million revenue. Of those who reported seeing benefits from the adoption of DevOps, 53% said they were already seeing an increased frequency of deployment of their software or services and 41% said they were anticipating seeing an increased frequency of deployment. 42% responded that they had seen improved quality of deployed applications, and 49% responded they anticipated seeing improved quality.
Although both surveys are sponsored by organizations with a vested interest in promoting deployment automation, they also clearly indicate that the speed and quality of deployments are a concern to many companies in a variety of different markets.
Reasons for Poor Coordination
Consider what happens after a developer group has completed all of the coding and testing for a system. The system needs to be placed into an environment where:
Only the appropriate people have access to it
It is compatible with all of the other systems with which it interacts in the environment
It has sufficient resources on which to operate
The data that it uses to operate is up to date
The data that it generates is usable by other systems in the environment
Furthermore, help desk personnel need to be trained in features of the new system and operations personnel need to be trained in troubleshooting any problems that might occur while the system is operating. The timing of the release may also be of significance because it should not coincide with the absence of any key member of the operations staff or with a new sales promotion that will stress the existing resources.
None of this happens by accident; each of these items requires coordination between the developers and the operations personnel. It is easy to imagine a scenario where one or more of these items are not communicated by the development personnel to the operations personnel. A common attitude among developers is “I finished the development, now go and run it.” We explore the reasons for this attitude when we discuss the cultural barrier to adoption of DevOps. One reason that organizations have processes to ensure smooth releases is that coordination does not always happen in an appropriate manner. This is one of the complaints that motivated the DevOps movement.
Limited Capacity of Operations Staff
Operations staff perform a variety of functions, but there are limits as to what they can accomplish or who on the staff is knowledgeable in what system. Consider the responsibilities of a modern operations person as detailed in Wikipedia:
Analyzing system logs and identifying potential issues with computer systems
Introducing and integrating new technologies into existing datacenter environments
Performing routine audits of systems and software
Performing backups
Applying operating system updates, patches, and configuration changes
Installing and configuring new hardware and software
Adding, removing, or updating user account information; resetting passwords, etc
Answering technical queries and assisting users
Ensuring security
Documenting the configuration of the system
Troubleshooting any reported problems
Optimizing system performance
Ensuring that the network infrastructure is up and running
Configuring, adding, and deleting file systems
Maintaining knowledge of volume management tools like Veritas (now Symantec), Solaris ZFS, LVM
Each of these items requires a deep level of understanding. Is it any wonder that when we asked the IT director of an Internet-based company what his largest problem was, he replied, “finding and keeping qualified personnel”?
The DevOps movement is taking a different approach. Its approach is to reduce the need for dedicated operations personnel by automating many of the tasks formerly done by operations and having developers assume a portion of the remainder.
1.3 DevOps Perspective
Given the problems we have discussed and their long-standing nature, it is no surprise that there is a significant appeal for a movement that promises to reduce the time to market for new features and reduce errors occurring in deployment. DevOps comes in multiple flavors and with different degrees of variation from current practice, but two themes run consistently through the different flavors: automation and the responsibilities of the development team.
Automation
Figure 1.1 shows the various life cycle processes. The steps from build and testing through execution can all be automated to some degree. We will discuss the tools used in each one of these steps in the appropriate chapters, but here we highlight the virtues of automation. Some of the problems with relying on automation are discussed in Section 1.7.
Tools can perform the actions required in each step of the process, check the validity of actions against the production environment or against some external specification, inform appropriate personnel of errors occurring in the process, and maintain a history of actions for quality control, reporting, and auditing purposes.
Tools and scripts also can enforce organization-wide policies. Suppose the organization has a policy that every change has to have a rationale associated with the change. Then prior to committing a change, a tool or script can require a rationale to be provided by the individual making the change. Certainly, this requirement can be circumvented, but having the tool ask for a rationale will increase the compliance level for this policy.
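As a minimal sketch of such a policy check, assume a hypothetical convention in which every commit message must contain a line beginning with "Rationale:" followed by some text. The convention, the function names, and the way the script would be wired into a version control system are illustrative assumptions, not something the policy itself prescribes.

```python
import re

# Hypothetical convention: a commit message must contain a line of the form
# "Rationale: <some non-empty explanation>".
# [ \t]* (rather than \s*) keeps the match on a single line, so an empty
# "Rationale:" line followed by other text is not accepted.
RATIONALE_LINE = re.compile(r"^Rationale:[ \t]*\S.*$", re.MULTILINE)


def has_rationale(commit_message: str) -> bool:
    """Return True if the message carries a non-empty rationale line."""
    return RATIONALE_LINE.search(commit_message) is not None


def check_commit(commit_message: str) -> str:
    """Return a short verdict that a pre-commit script could show the author."""
    if has_rationale(commit_message):
        return "accepted"
    return "rejected: add a line starting with 'Rationale:' explaining the change"


if __name__ == "__main__":
    print(check_commit("Raise pool size\n\nRationale: timeouts under peak load"))
    print(check_commit("Raise pool size"))
```

A check like this can be bypassed, for example by writing a meaningless rationale, which matches the point above: the tool raises compliance, it does not guarantee it.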
Once tools become central to a set of processes, then the use of these tools must also be managed. Tools are invoked, for example, from scripts, configuration changes, or the operator’s console. Where console commands are complicated, it is advisable to script their usage, even if there is only a handful of commands being used. Tools may be controlled through specification files, such as Chef cookbooks or Amazon CloudFormation (more on these later). The scripts, configuration files, and specification files must be subject to the same quality control as the application code itself. The scripts and files should also be under version control and subject to examination for corrections. This is often termed “infrastructure-as-code.”
Development Team Responsibilities
Automation will reduce the incidence of errors and will shorten the time to deployment. To further shorten the time to deployment, consider the responsibilities of operations personnel as detailed earlier. If the development team accepts DevOps responsibilities, that is, it delivers, supports, and maintains the service, then there is less need to transfer knowledge to the operations and support staff since all of the necessary knowledge is resident in the development team. Not having to transfer knowledge removes a significant coordination step from the deployment process.
1.4 DevOps and Agile
One of the characterizations of DevOps emphasizes the relationship of DevOps practices to agile practices. In this section, we overlay the DevOps practices on IBM’s Disciplined Agile Delivery. Our focus is on what is added by DevOps, not an explanation of Disciplined Agile Delivery. For that, see Disciplined Agile Delivery: A Practitioner’s Guide. As shown in Figure 1.2, Disciplined Agile Delivery has three phases: inception, construction, and transition. In the DevOps context, we interpret transition as deployment.
FIGURE 1.2 Disciplined Agile Delivery phases for each release (Adapted from Disciplined Agile Delivery: A Practitioner’s Guide by Ambler and Lines) [Notation: Porter’s Value Chain]
DevOps practices impact all three phases.
1. Inception phase. During the inception phase, release planning and initial requirements specification are done.
a. Considerations of Ops will add some requirements for the developers. We will see these in more detail later in this book, but maintaining backward compatibility between releases and having features be software switchable are two of these requirements. The form and content of operational log messages impact the ability of Ops to troubleshoot a problem.
b. Release planning includes feature prioritization, but it also includes coordination with operations personnel about the scheduling of the release and determining what training the operations personnel require to support the new release. Release planning also includes ensuring compatibility with other packages in the environment and a recovery plan if the release fails. DevOps practices make incorporation of many of the coordination-related topics in release planning unnecessary, whereas other aspects become highly automated.
2. Construction phase. During the construction phase, key elements of the DevOps practices are the management of the code branches, the use of continuous integration and continuous deployment, and incorporation of test cases for automated testing. These are also agile practices but form an important portion of the ability to automate the deployment pipeline. A new element is the integrated and automated connection between construction and transition activities.
3. Transition phase. In the transition phase, the solution is deployed and the development team is responsible for the deployment, monitoring the process of the deployment, deciding whether to roll back and when, and monitoring the execution after deployment. The development team has a role of “reliability engineer,” who is responsible for monitoring and troubleshooting problems during deployment and subsequent execution.
The advantages of small teams are:
They can make decisions quickly. In every meeting, attendees wish to express their opinions. The smaller the number of attendees at the meeting, the fewer the number of opinions expressed and the less time spent hearing differing opinions. Consequently, the opinions can be expressed and a consensus arrived at faster than with a large team.
It is easier to fashion a small number of people into a coherent unit than a large number. A coherent unit is one in which everyone understands and subscribes to a common set of goals for the team.
It is easier for individuals to express an opinion or idea in front of a small group than in front of a large one.
The disadvantage of a small team is that some tasks are larger than can be accomplished by a small number of individuals. In this case the task has to be broken up into smaller pieces, each given to a different team, and the different pieces need to work together sufficiently well to accomplish the larger task. To achieve this, the teams need to coordinate.
The team size becomes a major driver of the overall architecture. A small team, by necessity, works on a small amount of code. We will see that an architecture constructed around a collection of microservices is a good means to package these small tasks and reduce the need for explicit coordination, so we will call the output of a development team a “service.” We discuss the ways and challenges of migrating to a microservice architecture driven by small teams in Chapter 4 and the case study in Chapter 13 from Atlassian.
Team Roles
We lift two of the roles in the team from Scott Ambler’s description of roles in an agile team.
Team lead. This role, called “Scrum Master” in Scrum or team coach or project lead in other methods, is responsible for facilitating the team, obtaining resources for it, and protecting it from problems. This role encompasses the soft skills of project management but not the technical ones such as planning and scheduling, activities which are better left to the team as a whole.
Team member. This role, sometimes referred to as developer or programmer, is responsible for the creation and delivery of a system. This includes modeling, programming, testing, and release activities, as well as others.
Additional roles in a team executing a DevOps process consist of service owner, reliability engineer, gatekeeper, and DevOps engineer. An individual can perform multiple roles, and roles can be split among individuals. The assignment of roles to individuals depends on that individual’s skills and workload as well as the skills and amount of work required to satisfy the role. We discuss some examples of team roles for adopting DevOps and continuous deployment in the case study in Chapter 12.
Service Owner
The service owner is the role on the team responsible for outside coordination. The service owner participates in system-wide requirements activities, prioritizes work items for the team, and provides the team with information both from the clients of the team’s service and about services provided to the team. The requirements gathering and release planning activities for the next iteration can occur in parallel with the conception phase of the current iteration. Thus, although these activities require coordination and time, they will not slow down the time to delivery.
The service owner maintains and communicates the vision for the service. Since each service is relatively small, the vision involves knowledge of the clients of the team’s service and the services on which the team’s service depends. That is, the vision involves the architecture of the overall system and the team’s role in that architecture.
The ability to communicate both with other stakeholders and with other members of the team is a key requirement for the service owner.
means being on call for services that require high availability. Google calls this role “Site Reliability Engineer.”
Once a problem occurs, the reliability engineer performs short-term analysis to diagnose, mitigate, and repair the problem, usually with the assistance of automated tools. This can occur under very stressful conditions (e.g., in the middle of the night or a romantic dinner). The problem may involve reliability engineers from other teams. In any case, the reliability engineer has to be excellent at troubleshooting and diagnosis. The reliability engineer also has to have a comprehensive grasp of the internals of the service so that a fix or workaround can be applied.
In addition to the short-term analysis, the reliability engineer should discover or work with the team to discover the root cause of a problem. The “5 Whys” is a technique to determine a root cause. Keep asking “Why?” until a process reason is discovered. For example, the deployed service is too slow and the immediate cause may be an unexpected spike in workload. The second “why” is what caused the unexpected spike, and so on. Ultimately, the response is that stress testing for the service did not include appropriate workload characterization. This process reason can be fixed by improving the workload characterization for the stress testing. Increasingly, reliability engineers need to be competent developers, as they need to write high-quality programs to automate the repetitive part of the diagnosis, mitigation, and repair.
Gatekeeper
Netflix uses the steps given in Figure 1.3 from local development to deployment.
FIGURE 1.3 (Adapted from http://techblog.netflix.com/2013/11/preparing-netflix-api-for-deployment.html) [Notation: BPMN]
Each arrow in this figure represents a decision to move to the next step. This decision may be made automatically (in Netflix’s case) or manually. The manual role that decides to move a service to the next step in a deployment pipeline is a gatekeeper role. The gatekeeper decides whether to allow a version of a service or a portion of a service through “the gate” to the next step. The gatekeeper may rely on comprehensive testing results and have a checklist to use to make this decision and may consult with others but, fundamentally, the responsibility for allowing code or a service to move on through the deployment pipeline belongs to the gatekeeper. In some cases, the original developer is the gatekeeper before deployment to production, making a decision informed by test results but carrying the full responsibility.
Human gatekeepers (not the original developer) may be required by regulators in some industries such as the financial industry.
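The gate decision described above can be sketched as a small function that combines test results, a checklist, and an optional human sign-off. This is only an illustration under stated assumptions: the GateInput fields, the checklist item names, and the rule that any failed check blocks promotion are hypothetical and are not taken from Netflix’s or any other organization’s pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GateInput:
    """Evidence available to the gatekeeper (all fields are hypothetical)."""
    tests_passed: bool
    checklist: Dict[str, bool] = field(default_factory=dict)
    requires_human_approval: bool = False  # e.g., imposed by a regulator
    human_approved: bool = False


def gate_decision(evidence: GateInput) -> Tuple[bool, List[str]]:
    """Return (allow, reasons); the service moves on only if no reason blocks it."""
    reasons: List[str] = []
    if not evidence.tests_passed:
        reasons.append("test suite failed")
    for item, done in evidence.checklist.items():
        if not done:
            reasons.append(f"checklist item not done: {item}")
    if evidence.requires_human_approval and not evidence.human_approved:
        reasons.append("human gatekeeper approval missing")
    return (not reasons, reasons)


if __name__ == "__main__":
    candidate = GateInput(
        tests_passed=True,
        checklist={"rollback plan written": True, "release notes updated": False},
        requires_human_approval=True,
    )
    allow, reasons = gate_decision(candidate)
    print(allow, reasons)
```

Returning the blocking reasons alongside the verdict keeps the decision auditable, whether the gate is applied automatically or by a person.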
Mozilla has a role called a release coordinator (sometimes called release manager). This individual is designated to assume responsibility for coordinating the entire release. The release coordinator attends triage meetings where it is decided what is in and what is omitted from a release, understands the background context on all work included in a release, referees bug severity disputes, may approve late-breaking additions, and can make the back-out decision. In addition, on the actual release day, the release coordinator is the point for all communications between developers, QA, release engineering, website developers, PR, and marketing. The release coordinator is a gatekeeper.
DevOps Engineer
Examine Figure 1.2 again with an eye toward the use of tools in this process. Some of the tools used are code testing tools, configuration management tools, continuous integration tools, deployment tools, or post-deployment testing tools.
Configuration management applies not only to the source code for the service but also to all of the input for the various tools. This allows you to answer questions such as “What changed between the last deployment and this one?” and “What new tests were added since the last build?”
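To illustrate how versioned tool input makes such questions answerable, here is a minimal sketch that compares two snapshots of a repository. It assumes, hypothetically, that each snapshot is a mapping from file path to a content hash recorded at deployment time; the paths and hash values below are made up for the example.

```python
from typing import Dict, List


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {path: content_hash} snapshots and list what changed."""
    added = sorted(path for path in current if path not in previous)
    removed = sorted(path for path in previous if path not in current)
    modified = sorted(
        path for path in current
        if path in previous and previous[path] != current[path]
    )
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    # Hypothetical snapshots taken at the last deployment and at this one.
    last_deployment = {
        "src/service.py": "a1",
        "deploy/pipeline.yml": "b1",
        "tests/test_login.py": "c1",
    }
    this_deployment = {
        "src/service.py": "a2",           # source code changed
        "deploy/pipeline.yml": "b1",      # tool input unchanged
        "tests/test_login.py": "c1",
        "tests/test_checkout.py": "d1",   # new test added
    }
    changes = diff_snapshots(last_deployment, this_deployment)
    print("Changed since last deployment:", changes["modified"])
    print("New tests:", [p for p in changes["added"] if p.startswith("tests/")])
```

In practice a version control system provides this comparison directly; the sketch only shows why scripts, specifications, and tests have to be in the snapshot for the comparison to be complete.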
Tools evolve, tools require specialized knowledge, and tools require specialized input. The DevOps engineer role is responsible for the care and feeding of the various tools used in the DevOps tool chain. This role can be filled at the individual level, the team level, or the organizational level. For example, the organization may decide on a particular configuration management tool that all should use. The team will still need to decide on its branching strategies, and individual developers may further create branches. Policies for naming and access will exist and possibly be automatically enforced. The choice of which release of the configuration management tool the development teams will use is a portion of the DevOps engineer’s role, as are the tailoring of the tool for the development team and monitoring its correct use by the developers. The DevOps engineering role is inherent in automating the development and deployment pipeline. How this role is manifested in an organizational or team structure is a decision separate from the recognition that the role exists and must be filled.
1.6 Coordination
One goal of DevOps is to minimize coordination in order to reduce the time to market. Two of the reasons to coordinate are, first, so that the pieces developed by the various teams will work together and, second, to avoid duplication of effort. The Oxford English Dictionary defines coordination as “the organization of the different elements of a complex body or activity so as to enable them to work together effectively.” We go more deeply into the concept of coordination and its mechanisms in this section.
Forms of Coordination
Coordination mechanisms have different attributes.
Direct: the individuals coordinating know each other (e.g., team members).
Indirect: the coordination mechanism is aimed at an audience known only by its characterization (e.g., system administrators).
Persistent: the coordination artifacts are available after the moment of the coordination (e.g., documents, e-mail, bulletin boards).
Ephemeral: the coordination, per se, produces no artifacts (e.g., face-to-face meetings, conversations, telephone/video conferencing). Ephemeral coordination can be made persistent through the use of human or mechanical recorders.
Synchronous: individuals are coordinating in real time (e.g., face to face).
Asynchronous: individuals are not coordinating in real time (e.g., documents, e-mail).
Coordination mechanisms are built into many of the tools used in DevOps. For example, a version control system is a form of automated coordination that keeps various developers from overwriting each other’s code. A continuous integration tool is a form of coordinating the testing of the correctness of a build.
Every form of coordination has a cost and a benefit. Synchronous coordination requires scheduling and, potentially, travel. The time spent in synchronous coordination is a cost for all involved. The benefits of synchronous coordination include allowing the people involved to have an immediate opportunity to contribute to the resolution of any problem. Other costs and benefits for synchronous coordination depend on the bandwidth of communication, time zone differences, and persistence of the coordination. Each form of coordination can be analyzed in terms of costs and benefits.
The ideal characteristics of a coordination mechanism are that it is low cost in terms of delay, preparation required, and people’s time, and of high benefit in terms of visibility of the coordination to all relevant stakeholders, fast resolution of any problems, and effectiveness in communicating the desired information.
The Wikipedia definition of DevOps that we mentioned earlier stated that “communication, collaboration, and integration” are hallmarks of a DevOps process. In light of our current discussion of coordination, we can see that too much manual communication and collaboration, especially synchronous, defeats the DevOps goal of shorter time to market.
Team Coordination
Team coordination mechanisms are of two types: human processes and automated processes. The DevOps human processes are adopted from agile processes and are designed for high-bandwidth coordination with limited persistence. Stand-up meetings and information radiators are examples of human process coordination mechanisms.
Automated team coordination mechanisms are designed to protect team members from interference between their own and others’ activities (version control and configuration management systems), to automate repetitive tasks (continuous integration and deployment), and to speed up error detection and reporting (automated unit, integration, acceptance, and live production tests). One goal is to provide feedback to the developers as quickly as possible.
Cross-team Coordination
Examining the release process activities again makes it clear that cross-team coordination is the most time-consuming factor. Coordination must occur with customers, stakeholders, other development teams, and operations. Therefore, DevOps processes attempt to minimize this coordination as much as possible. From the development team’s perspective, there are three types of cross-team coordination: upstream coordination with stakeholders and customers, downstream coordination with operations, and cross-stream coordination with other development teams.
The role of the service owner is to perform upstream coordination. Downstream coordination is accomplished by moving many operations responsibilities to the development team. It is cross-stream coordination that we focus on now. There are two reasons for a development team to coordinate with other development teams: to ensure that the code developed by one team works well with the code developed by another and to avoid duplication of effort.
1. Making the code pieces work together. One method for supporting the independent work of different development teams while simplifying the integration of this work is to have a software architecture. An architecture for the system being developed will help make the pieces work together. Some further coordination is still necessary, but the architecture serves as a coordinating mechanism. An architecture specifies a number of the design decisions to create an overall system. Six of these design decisions are:
a. Allocation of responsibilities. In DevOps processes, general responsibilities are specified in the architecture but specific responsibilities are determined at the initiation of each iteration.
b. Coordination model. The coordination model describes how the components of an architecture coordinate at runtime. Having a single coordination model for all elements removes the necessity of coordination about the coordination model.
c. Data model. As with responsibilities, the data model objects and their life cycle are specified in the architecture but refinements may occur at iteration initiation.
d. Management of resources. The resources to be managed are determined by the architecture. The limits on these resources (e.g., buffer size or thread pool size) may be determined during iteration initiation or through system-wide policies specified in the architecture.
e. Mapping among architectural elements. The least coordination is required among teams if these mappings are specified in the architecture and in the work assignments for the teams. We return to this topic when we discuss the architectural style we propose for systems developed with DevOps processes, in Chapter 4.
f. Binding time decisions. These are specified in the overall architecture. Many runtime binding values will be specified through configuration parameters, and we will discuss the management of the configuration parameters in Chapter 5.
2. Avoiding duplication of effort. Avoiding duplication of effort and encouraging reuse is another argument for coordination among development teams. DevOps practices essentially argue that duplication of effort is a necessary cost for shorter time to market. There are two portions to this argument. First, since the task each team has to accomplish is small, any duplication is small. Large potential areas of duplication, such as each team creating their own datastore, are handled by the architecture. Second, since each team is responsible for its own service, troubleshooting problems after deployment is faster with code written by the team, and it avoids escalating a problem to a different team.
1.7 Barriers
If DevOps solves long-standing problems with development and has such clear benefits, why haven’t all organizations adopted DevOps practices? In this section we explore the barriers to their adoption.
Culture and Type of Organization
Culture is important when discussing DevOps. Both across organizations and among different groups within the same organization, cultural issues associated with DevOps affect its form and its adoption. Culture depends not only on your role but also on the type of organization to which you belong.
One of the goals of DevOps is to reduce time to market of new features or products. One of the tradeoffs that organizations consider when adopting DevOps practices is the benefits of reduced time to market versus the risks of something going awry. Almost all organizations worry about risk. The risks that a particular organization worries about, however, depend on its domain of activity. For some organizations the risks of problems occurring outweigh a time-to-market advantage.
Organizations that operate in regulated domains (financial, health care, or utility services) have regulations to which they must adhere and face penalties, potentially severe, if they violate the regulations under which they operate. Even organizations in regulated domains may have products that are unregulated. So a financial organization may use DevOps processes for some products. For products that require more oversight, the practices may be adaptable, for example, by introducing additional gatekeepers. We discuss security and audit issues in Chapter 8.
Organizations that operate in mature and slow-moving domains (automotive or building construction) have long lead times, and, although their deadlines are real, they are also foreseeable far in advance.
Organizations whose customers have a high cost of switching to another supplier, such as suppliers of Enterprise Resource Planning systems, are reluctant to risk the stability of their operations. The cost of downtime for some systems will far outweigh the competitive advantage of introducing a new feature somewhat more quickly.
For other organizations, nimbleness and fast response are more important than the occasional error caused by moving too fast.
Organizations that rely on business analytics to shape their products want to have shorter and shorter times between the gathering of the data and actions inspired by the data. Any errors that result can be quickly corrected since the next cycle will happen quickly.
Organizations that face severe competitive pressure want to have their products and new features in the marketplace before their competitors.
Note that these examples do not depend on the size of the organization but rather the type of business they are in. It is difficult to be nimble if you have regulators who have oversight and can dictate your operating principles, or if your lead time for a product feature is measured in years, or if your capital equipment has a 40-year estimated lifetime.
The point of this discussion is that businesses operate in an environment and inherit much of the culture of that environment. See Chapter 10 for more details. Some DevOps practices are disruptive, such as allowing developers to deploy to production directly; other DevOps practices are incremental in that they do not affect the overall flow of products or oversight. Treating operations personnel as first-class citizens should fall into this nondisruptive category.
It is possible for a slow-moving organization to become more nimble or a nimble organization to have oversight. If you are considering adopting a DevOps practice then you need to be aware of three things.
1. What other practices are implicit in the practice you are considering? You cannot do continuous deployment without first doing continuous integration. Independent practices need to be adopted prior to adopting dependent practices.
2. What is the particular practice you are considering? What are its assumptions, its costs, and its benefits?
3. What is the culture of your business, and what are the ramifications of your adopting this particular DevOps practice? If the practice just affects operations and development, that is one thing. If it requires modification to the entire organizational structure and oversight practices, that is quite another. The difficulty of adopting a practice is related to its impact on other portions of the organization. But even if the adoption focuses on a single development team and a few operators, it is important that the DevOps culture is adopted by all people involved. A commonly reported way of failing in the adoption of DevOps is to hire a DevOps engineer and think you are done.
Type of Department
One method for determining the culture of an organization is to look at what kinds of results are incentivized. Salespeople who work on commission work very hard to get sales. CEOs who are rewarded based on quarterly profits are focused on the results of the next quarter. This is human nature. Developers are incentivized to produce and release code. Ideally, they are incentivized to produce error-free code but there is a Dilbert cartoon that shows the difficulty of this: The pointy-headed boss offers $10 for every bug found and fixed, and Wally responds, “Hooray, I am going to write me a new minivan this afternoon.” In any case, developers are incentivized to get their code into production.
Operations personnel, on the other hand, are incentivized to minimize downtime. Minimizing downtime means examining and removing causes of downtime. Examining anything in detail takes time. Furthermore, avoiding change removes one of the causes of downtime. “If it ain’t broke, don’t fix it” is a well-known phrase dating back decades.
Basically, developers are incentivized to change something (release new code), and operations personnel are incentivized to resist change. These two different sets of incentives breed different attitudes and can be the cause of culture clashes.
Silo Mentality
It is easy to say that two departments in an organization have a common goal: ensuring the organization’s success. It is much more difficult to make this happen in practice. An individual’s loyalty tends to be first to her or his team and secondarily to the overall organization. If the development team is responsible for defining the release plan that will include what features get implemented in what priority, other portions of the organization will see some of their power being usurped and, potentially, their customers become unhappy. If activities formerly performed by operations personnel are now going to be performed by developers, what happens to the operations personnel who now have less to do?
These are the normal ebbs and flows of organizational politics, but that does not make them less meaningful or less real.
Personnel Issues
According to the Datamation 2012 IT salary guide, a software engineer earns about 50% more than a systems administrator. So by moving a task from a system administrator (Ops) to a software engineer (Dev), the personnel performing the task cost 50% more. Thus, the time spent performing the task must be cut by a third just to make the performance of the task cost the same amount. A bigger cut is necessary to actually gain time, with automation being the prevalent method to achieve these time savings. This is the type of cost/benefit analysis that an organization must go through in order to determine which DevOps processes to adopt and how to adopt them.
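The "cut by a third" figure follows from treating the cost of a task as hourly rate multiplied by time spent. The short sketch below works through that arithmetic; the function names are illustrative, and the model deliberately ignores everything except salary and time.

```python
def break_even_time_fraction(cost_premium: float) -> float:
    """Fraction of the original task time at which total cost is unchanged.

    With cost = rate * time, a performer who costs (1 + cost_premium) times
    as much breaks even when time shrinks to 1 / (1 + cost_premium).
    """
    if cost_premium <= -1.0:
        raise ValueError("cost_premium must be greater than -1")
    return 1.0 / (1.0 + cost_premium)


def relative_cost(cost_premium: float, time_fraction: float) -> float:
    """Cost of the moved task relative to the original cost (1.0 = same)."""
    return (1.0 + cost_premium) * time_fraction


if __name__ == "__main__":
    premium = 0.5  # software engineer earns about 50% more
    fraction = break_even_time_fraction(premium)
    print(f"Break-even time fraction: {fraction:.3f}")   # two thirds of the time
    print(f"Required time cut: {1.0 - fraction:.3f}")    # a cut of one third
    # Halving the time (e.g., through automation) yields a real saving.
    print(f"Relative cost at half the time: {relative_cost(premium, 0.5):.2f}")
```

Any time fraction below the break-even value is where the move starts to pay off, which is why automation is the usual lever.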
Developers with a modern skill set are in high demand and short supply, and they also have a heavy workload. Adding more tasks to their workload may exacerbate the shortage of developers.
1.8 Summary
The main takeaway from this chapter is that people have defined DevOps from different perspectives, such as operators adopting agile practices or developers taking operations responsibilities, among others. But one common objective is to reduce the time between the conception of a feature or improvement as a business idea and its eventual deployment to users. DevOps faces barriers due to both cultural and technical challenges. It can have a huge impact on team structure, software architecture, and traditional ways of conducting operations. We have given you a taste of this impact by listing some common practices. We will cover all of these topics in detail throughout the rest of the book.
Some of the tradeoffs involved in DevOps are as follows:
Creation of a need to support DevOps tools. This tool support is traded off against the shortening of the time to market of new functions.
Moving responsibilities from IT professionals to developers. This tradeoff is multifaceted. The following are some of the facets to be considered:
The cost to complete a task from the two groups
The time to complete a task from the two groups
The availability of personnel within the two groups
The repair time when an error is detected during execution. If the error is detected quickly after deployment, then the developer may still have the context information necessary to diagnose it quickly, whereas if the error is initially diagnosed by IT personnel, it may take time before the error gets back to the developer.
Removing oversight of new features and deployment. This tradeoff is between autonomy for the development teams and overall coordination. The efficiencies of having autonomous development teams must outweigh the duplications of effort that will occur because of no overall oversight.
All in all, we believe that DevOps has the potential to lead IT onto exciting new ground, with high frequency of innovation and fast cycles to improve the user experience. We hope you enjoy reading the book as much as we enjoyed writing it.
1.9 For Further Reading
You can read about different takes on the DevOps definition from the following sources:
Gartner’s Hype Cycle [Gartner] categorizes DevOps as on the rise: http://www.gartner.com/DisplayDocument?doc_cd=249070
AgileAdmins explains DevOps from an agile perspective: http://theagileadmin.com/what-is-devops/
You can find many more responses from the following recent surveys and industry reports:
XebiaLabs has a wide range of surveys and state of industry reports on DevOps-related topics that can be found at http://xebialabs.com/xl-resources/whitepapers/
CA Technologies’ report gives some insights into business’ different understanding of DevOps and can be found at http://www.ca.com/us/collateral/white-papers/na/techinsights-report-what-smart-businesses-know-about-devops.aspx
While some vendors or communities extended continuous integration tools toward continuous deployment, many vendors also released completely new tools for continuous delivery and deployment.
The popular continuous integration tool Jenkins has many third-party plug-ins including some workflows extending into continuous deployment. You can find some plug-ins from Cloudbees.
The duties of an operator are listed in http://en.wikipedia.org/wiki/DevOps
The 5 Whys originated at Toyota Motors and are discussed in http://en.wikipedia.org/wiki/5_Whys
There are also discussions around whether or not continuous deployment is just a dream [BostInno 11]. Scott Ambler has not only coauthored (with Mark Lines) a book on disciplined agile delivery [Ambler 12], he also maintains a blog from which we adapted the description of the roles in a team [Ambler 15].
Netflix maintains a technical blog where they discuss a variety of issues associated with their platform. Their deployment steps are discussed in [Netflix 13].
Mozilla’s Release Coordinator role is discussed in [Mozilla].
Len Bass, Paul Clements, and Rick Kazman discuss architectural decisions on page 73 and subsequently in Software Architecture in Practice [Bass 13].
The discussion of IMVU is adapted from a blog written by Timothy Fitz [Fitz 09].
2 The Cloud as a Platform
We’ve redefined cloud computing to include everything that we already do … The computer industry is the only industry that is more fashion-driven than women’s fashion … We’ll make cloud computing announcements because if orange is the new pink, we’ll make orange blouses. I’m not going to fight this thing.
—Larry Ellison
2.1 Introduction
The standard analogy used to describe the cloud is that of the electric grid. When you want to use electricity, you plug a device into a standard connection and turn it on. You are charged for the electricity you use. In most cases, you can remain ignorant of the mechanisms the various electric companies use to generate and distribute electricity. The exception to this ignorance is if there is a power outage. At that point you become aware that there are complicated mechanisms underlying your use of electricity even if you remain unaware of the particular mechanisms that failed.
The National Institute of Standards and Technology (NIST) has provided a characterization of the cloud with the following elements:
On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.
Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).
Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control over or knowledge of the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.
Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.
Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, thereby providing transparency for both the provider and consumer of the utilized service.
From the perspective of operations and DevOps, the most important of these characteristics are on-demand self-service and measured (or metered) service. Even though the cloud provides what appear to be unlimited resources that you can acquire at will, you must still pay for their use. As we will discuss, the other characteristics are also important but not as dominant as on-demand self-service and paying for what you use.
Implicit in the NIST characterization is the distinction between the provider and the consumer of cloud services. Our perspective in this book is primarily that of the consumer. If your organization runs its own datacenters then there may be some blurring of this distinction, but even in such organizations, the management of the datacenters is not usually considered as falling within the purview of DevOps.
NIST also characterizes the various types of services available from cloud providers, as shown in Table 2.1. NIST defines three types of services, any one of which can be used in a DevOps context.
TABLE 2.1 Cloud Service Models
Software as a Service (SaaS). The consumer is provided the capability to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based e-mail) or an application interface. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS). The consumer is provided the capability to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
Infrastructure as a Service (IaaS). The consumer is provided the capability to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).
We first discuss the mechanisms involved in the cloud, and then we discuss the consequences of these mechanisms on DevOps.
2.2 Features of the Cloud
The fundamental enabler of the cloud is virtualization over hundreds of thousands of hosts accessible over the Internet. We begin by discussing IaaS-centric features, namely, virtualization and IP management, followed by some specifics of PaaS offerings. Then we discuss general issues, such as the consequences of having hundreds of thousands of hosts and how elasticity is supported in the cloud.
Virtualization
In cloud computing, a virtual machine (VM) is an emulation of a physical machine. A VM image is a file that contains a bootable operating system and some software installed on it. A VM image provides the information required to launch a VM (or more precisely, a VM instance). In this book, we use “VM” and “VM instance” interchangeably to refer to an instance. And we use “VM image” to refer to the file used to launch a VM or a VM instance. For example, an Amazon Machine Image (AMI) is a VM image that can be used to launch Elastic Compute Cloud (EC2) VM instances.
When using IaaS, a consumer acquires a VM from a VM image by using an application programming interface (API) provided by the cloud provider for that purpose. The API may be embedded in a command-line interpreter, a web interface, or another tool of some sort. In any case, the request is for a VM with some set of resources: CPU, memory, and network. The resources granted may be hosted on a computer that is also hosting other VMs (multi-tenancy) but from the perspective of the consumer, the provider produces the equivalent of a stand-alone computer.
Creating a Virtual Machine
In order to create a VM, two distinct activities are performed.
The user issues a command to create a VM. Typically, the cloud provider has a utility that enables the creation of the VM. This utility is told the resources required by the VM, the account to which the charges accrued by the VM should be charged, the software to be loaded (see below), and a set of configuration parameters specifying security and the external connections for the VM.
The cloud infrastructure decides on which physical machine to create the VM instance. The operating system for this physical machine is called a hypervisor, and it allocates resources for the new VM and “wires” the new machine so that it can send and receive messages. The new VM is assigned an IP address that is used for sending and receiving messages. We have described the situation where the hypervisor is running on bare metal. It is also possible that there are additional layers of operating system–type software involved but each layer introduces overhead and so the most common situation is the one we described.
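The two activities can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not any provider's actual API or placement policy: the names `VMRequest`, `Host`, and `place_vm` are hypothetical, and the first-fit rule stands in for whatever policy a real cloud infrastructure uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VMRequest:
    """What the user tells the (hypothetical) creation utility."""
    cpus: int
    memory_gb: int
    image: str      # the VM image holding the software to be loaded
    account: str    # the account to which charges accrue


@dataclass
class Host:
    """A physical machine whose hypervisor allocates resources to VMs."""
    name: str
    free_cpus: int
    free_memory_gb: int
    vms: List[VMRequest] = field(default_factory=list)


def place_vm(request: VMRequest, hosts: List[Host]) -> Optional[Host]:
    """Pick the first host with enough free capacity (first-fit, an assumption).

    Returns the chosen host, or None if no host can satisfy the request.
    """
    for host in hosts:
        if host.free_cpus >= request.cpus and host.free_memory_gb >= request.memory_gb:
            # The hypervisor on this host reserves the resources for the new VM.
            host.free_cpus -= request.cpus
            host.free_memory_gb -= request.memory_gb
            host.vms.append(request)
            return host
    return None


hosts = [Host("host-a", free_cpus=2, free_memory_gb=4),
         Host("host-b", free_cpus=8, free_memory_gb=32)]
request = VMRequest(cpus=4, memory_gb=16, image="example-image", account="example-account")
chosen = place_vm(request, hosts)
print(chosen.name if chosen else "no capacity")
```

Because several requests can land on the same host, the sketch also shows where multi-tenancy comes from: the host keeps serving other VMs out of whatever capacity remains.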
Loading a Virtual Machine
Each VM needs to be loaded with a set of software in order to do meaningful work. Part of the software can be included in the VM image and part can be loaded by the activated VM after launching. A VM image can be created by loading and configuring a machine with the desired software and data, and then copying the memory contents (typically in the form of the virtual hard disk) of the machine to a persistent file. New VM instances from that VM image (software and data) can then be created at will.
The process of creating a VM image is called baking the image. A heavily baked image contains all of the software required to run an application and a lightly baked image contains only a portion of the software required, such as an operating system and a middleware container. We discuss these options and the related tradeoffs in Chapter 5.
Virtualization introduces several types of uncertainty that you should be aware of.
Because a VM shares resources with other VMs on a single physical machine, there may be some performance interference among the VMs. This situation may be particularly difficult for cloud consumers as they usually have no visibility into the co-located VMs owned by other consumers.
There are also time and dependability uncertainties when loading a VM, depending on the underlying physical infrastructure and the additional software that needs to be dynamically loaded. DevOps operations often create and destroy VMs frequently for setting up different environments or deploying new versions of software. It is important that you are aware of these uncertainties.
IP and Domain Name System Management
When a VM is created, it is assigned an IP address. IP addresses are the means by which messages are routed to any computer on the Internet. IP addresses, their routing, and their management are all complicated subjects. A discussion of the Domain Name System (DNS), and the persistence of IP addresses with respect to VMs follows.
DNS
Underlying the World Wide Web is a system that translates part of URLs into IP addresses. This function concerns the domain name part of the URL (e.g., ssrg.nicta.com.au), which can be resolved to an IP address through the DNS. As a portion of normal initiation, a browser, for example, is provided with the address of a DNS server. As shown in Figure 2.1, when you enter a URL into your browser, it sends that URL to its known DNS server which, in association with a larger network of DNS servers, resolves that URL into an IP address.
FIGURE 2.1 DNS returning an IP address [Notation: Architecture]
The domain name indicates a routing path for the resolution. The domain name ssrg.nicta.com.au, for example, will go first to a root DNS server to look up how to resolve .au names. The root server will provide an IP address for the Australian DNS server where .com names for Australia are stored. The .com.au server will provide the IP address of the nicta DNS server, which in turn provides an IP address for ssrg.
The importance of this hierarchy is that the lower levels of the hierarchy (.nicta and .ssrg) are under local control. Thus, the IP address of ssrg within the .nicta server can be changed relatively easily and locally.
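The right-to-left walk through the hierarchy can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not a real DNS client: `ROOT_ZONE` and `resolve` are hypothetical names, each label gets its own nested dictionary (so .au and .com are separate steps here), and the addresses come from the 192.0.2.0/24 range reserved for documentation.

```python
# Toy zone tree: each dictionary stands for a DNS server's zone,
# and a string leaf stands for a host's IP address.
ROOT_ZONE = {
    "au": {
        "com": {
            "nicta": {              # this level is under local control
                "ssrg": "192.0.2.10",
            },
        },
    },
}


def resolve(domain, root=ROOT_ZONE):
    """Resolve a domain by walking its labels from right to left.

    Returns (ip_address, zones_consulted). Raises KeyError if a label is unknown.
    """
    zone = root
    consulted = ["root"]
    for label in reversed(domain.split(".")):
        if not isinstance(zone, dict) or label not in zone:
            raise KeyError(f"cannot resolve {label!r} in {domain!r}")
        zone = zone[label]
        if isinstance(zone, dict):
            consulted.append(label)  # another server must be asked next
    if not isinstance(zone, str):
        raise KeyError(f"{domain!r} names a zone, not a host")
    return zone, consulted


print(resolve("ssrg.nicta.com.au"))

# A local change: only the nicta zone is edited; the upper levels are untouched.
ROOT_ZONE["au"]["com"]["nicta"]["ssrg"] = "192.0.2.20"
print(resolve("ssrg.nicta.com.au")[0])
```

The second lookup returns the new address without any change above the nicta level, which is the local-control property the text describes.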
Furthermore, each DNS entry has an attribute named time to live (TTL). TTL acts as an expiration time for the entry (i.e., the mapping of the domain name and the IP address). The client or the local DNS server will cache the entry, and that cached entry will be valid for a duration specified by the TTL. When a query arrives prior to the expiration time, the client/local DNS server can retrieve the IP address from its cache. When a query arrives after the expiration time, the IP address has to be resolved by an authoritative DNS server. Normally the TTL is set to a large value; it may be as large as 24 hours. It is possible to set the TTL to as low as 1 minute. We will see in our case studies, Chapters 11–13, how the combination of local control and short TTL can be used within a DevOps context.
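The caching behavior governed by the TTL can be sketched as follows. This is a minimal illustration rather than how any particular resolver is implemented: `DnsCache` and `authoritative_lookup` are hypothetical names, the 60-second TTL is just the 1-minute value mentioned above, and the clock is injected so the expiry can be shown without waiting.

```python
import time


class DnsCache:
    """Cache name-to-IP mappings until their TTL expires."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup    # function: name -> (ip, ttl_seconds)
        self._clock = clock      # injectable so the example can control time
        self._entries = {}       # name -> (ip, expires_at)

    def resolve(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]     # query arrived before expiration: use the cache
        ip, ttl = self._lookup(name)  # expired or absent: ask the authoritative server
        self._entries[name] = (ip, now + ttl)
        return ip


# Stand-in for an authoritative DNS server; counts how often it is asked.
lookups = []


def authoritative_lookup(name):
    lookups.append(name)
    return "192.0.2.10", 60      # documentation-range address, 1-minute TTL


fake_now = [0.0]
cache = DnsCache(authoritative_lookup, clock=lambda: fake_now[0])

cache.resolve("ssrg.nicta.com.au")   # first query goes to the authoritative server
fake_now[0] = 30.0
cache.resolve("ssrg.nicta.com.au")   # within the TTL: answered from the cache
fake_now[0] = 61.0
cache.resolve("ssrg.nicta.com.au")   # after the TTL: resolved again
print(len(lookups))
```

With a short TTL, a changed address is picked up by clients within about a minute; with a 24-hour TTL, stale addresses can persist for up to a day.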
One further point deserves mention. In Figure 2.1, we showed the DNS returning a single IP address for a domain name. In fact, it can return multiple addresses. Figure 2.2 shows the DNS server returning two addresses.