
Container Solutions

THE CLOUD NATIVE ATTITUDE

Your guide to Cloud Native: what it is, what it’s for, who’s using it and why

ANNE CURRIE

container-solutions.com

ABOUT THIS BOOK

This is a small book with a single purpose, to tell you all about Cloud Native - what it is, what it’s for, who’s using it and why. Go to any software conference and you’ll hear endless discussion of containers, orchestrators and microservices. Why are they so fashionable? Are there good reasons for using them? What are the trade-offs and do you have to take a big bang approach to adoption? We step back from the hype, summarise the key concepts, and interview some of the enterprises who’ve adopted Cloud Native in production.

Take copies of this book and pass them around or just zoom in to increase the text size and ask your colleagues to read over your shoulder. Horizontal and vertical scaling are fully supported.

The only hard thing about this book is you can’t assume anyone else has read it and the narrator is notoriously unreliable.

WHAT DID YOU THINK OF THIS BOOK?

We’d love to hear from you with feedback, or if you need help with a Cloud Native project, email info@container-solutions.com

This book is available in PDF form from the Container Solutions website at http://container-solutions.com

First published in Great Britain in 2017 by Container Solutions Publishing, a division of Container Solutions Ltd. Copyright © Anne Berger (nee Currie) and Container Solutions Ltd 2017. Chapter “Distributed Systems Are Hard” first appeared in The New Stack on 25 Aug 2017. The contents of this article are under Creative Commons Attribution licence creativecommons.org/licenses/by/4.0/

CONTENT

INTRODUCTION What on Earth is Cloud Native?
ONE The Cloud Native Quest
TWO Containers
THREE Dynamic Management
FOUR Microservices
FIVE The Dream of Continuous Delivery
SIX Going Native, Where to Start
SEVEN Distributed Systems Are Hard
EIGHT Revise!
NINE Are Case Studies Ever Useful?
CASE STUDY The Financial Times
CASE STUDY Skyscanner
CASE STUDY ASOS
CASE STUDY Container Solutions
TEN Case Study Analysis
ELEVEN Common Cloud Native Dilemmas
TWELVE Afterword Should Security be One?
SUMMARY The State of the Cloud Nation
REFERENCES

INTRODUCTION

What on Earth is Cloud Native?

According to the Cloud Native Computing Foundation (CNCF), Cloud Native is about scale and resilience or “distributed systems capable of scaling to tens of thousands of self healing multi-tenant nodes” (1). That sounds great for folk like Uber or Netflix who want to hyperscale an existing product and control their operating costs. But is a Cloud Native approach just about power and scale? Is it of any use to enterprises of more normal dimensions?

What about folk that just want to get new products and services to market faster, like the UK’s Financial Times newspaper? Five years ago they were looking for an architectural approach that would let them innovate more rapidly. Did Cloud Native deliver speed for them?

Others, like my own startup Microscaling Systems, wanted to create and test new business ideas without large capital expenditure, starting small with minimal costs. Was Cloud Native a way to reduce bills for us?

Why Does This Book Even Exist?
The Container Solutions team and I wanted to understand what Cloud Native was actually being used for, what it could deliver in reality and what the tradeoffs and downsides were. We interviewed a range of companies who adopted a Cloud Native approach because we wanted to understand what they learned: enterprises like the flight booking unicorn Skyscanner, the international ecommerce retailer ASOS, and the global newspaper The Financial Times. We’ve also built and operated systems ourselves for well over 20 years and many of the brand new ideas coming out of Cloud Native seem oddly familiar.

This book is a distillation of what we gleaned from our conversations with users, vendors, hosting providers, journalists, and researchers. It made us ask ourselves, “what the heck is Cloud Native?” Is it a way to move faster? A powerful way to scale? A way to reduce operational costs or capital expenditure? How can these different aims be achieved with one paradigm? Finally, is it good that Cloud Native can potentially do so much or is that a risk?

With everyone from the ordinary developer to the CTO in mind, this book explores Cloud Native’s multiple meanings and tries to cut through the waffle to identify the right Cloud Native strategy for specific needs. We argue that moving fast, being scalable, and reducing costs are all achievable with a Cloud Native approach but they need careful thought. Cloud Native has huge potential but it also has dangers.

Finally we reflect on what Cloud Native really means. Is it a system of rules or more of a frame of mind? Is it the opposite of Waterfall or the opposite of Agile? Or are those both utterly meaningless questions?

What is Cloud Native? Sounds Like Buzzwords

“Cloud Native” is the name of a particular approach to designing, building and running applications based on cloud (infrastructure-as-a-service or platform-as-a-service) combined with microservice architectures and the new operational tools of continuous integration, containers and orchestrators. The overall objective is to improve speed, scalability and finally margin.

Speed: companies of all sizes now see strategic advantage in being able to move quickly and get ideas to market fast. By this we mean moving from months to get an idea into production to days or even hours. Part of achieving this is a cultural shift within a business, transitioning from big bang projects to more incremental improvements. Part of it is about managing risk. At its best a Cloud Native approach is about de-risking as well as accelerating change, allowing companies to delegate more aggressively and thus become more responsive.

Scale: as businesses grow, it becomes strategically necessary to support more users, in more locations, with a broader range of devices, while maintaining responsiveness, managing costs, and not falling over.

Margin: in the new world of cloud infrastructure, a strategic goal may be to pay for additional resources only as they’re needed – as new customers come online. Spending moves from up-front CAPEX (buying new machines in anticipation of success) to OPEX (paying for additional servers on-demand). But this is not all. Just because machines can be bought just in time does not mean that they’re being used efficiently [14]. Another stage in Cloud Native is usually to spend less on hosting.

At its heart, a Cloud Native strategy is about handling technical risk. In the past, our standard approach to avoiding danger was to move slowly and carefully. The Cloud Native approach is about moving quickly by taking small, reversible and low-risk steps. This can be extremely powerful but it isn’t free and it isn’t easy. It’s a huge philosophical and cultural shift as
well as a technical challenge.

How Does Cloud Native Work?

The fundamentals of Cloud Native have been described as container packaging, dynamic management and a Microservices-oriented architecture, which all sounds like a lot of work. What does it actually mean and is it worth the effort? We believe Cloud Native is actually all about five architectural principles:

• Use infrastructure or platform-as-a-service: run on compute resources that can be flexibly provisioned on demand like those provided by AWS, Google Cloud, Rackspace or Microsoft Azure.
• Design systems using, or evolve them towards, a microservices architecture: individual components are small and decoupled.
• Automate and encode: replace manual tasks with scripts or code, for example using automated test suites, configuration tools and CI/CD.
• Containerize: package processes together with their dependencies, making them easy to test, move and deploy.
• Orchestrate: abstract away individual servers in production using off-the-shelf dynamic management and orchestration tools.

These steps have many benefits but ultimately they are about the reduction of risk. Over a decade ago in a small enterprise I lay awake at night wondering what was actually running on the production servers, whether we could reproduce them and how reliant we were on individuals and their ability to cross a busy street. Then I’d worry about whether we’d bought enough hardware for the current big project. We saw these as our most unrecoverable risks. Finally, I worried about new deployments breaking the existing services, which were tied together like a tin of spaghetti. That didn’t leave much time for imaginative ideas about the future (or sleep).

In that world before cloud, infrastructure-as-code (scripted environment creation), automated testing, containerisation and microservices, we had no choice but to move slowly, spending lots of time on planning, on testing and on documentation. That was
absolutely the right thing to do then to control technical risk. However, the question now is “is moving slowly our only option?” In fact, is it even the safest option any more?

We’re not considering the Cloud Native approach because it’s fashionable – although it is. We have a pragmatic motivation: the approach appears to work well with continuous delivery, provide faster time to value, scale well and be efficient to operate. However, most importantly it seems to help reduce risk in a new way – by going fast but small. It’s that practical reasoning we’ll be evaluating in the rest of this book.

ONE THE CLOUD NATIVE QUEST

In our introduction we defined Cloud Native as a set of tools for helping with three potential objectives:

• Speed: faster delivery for products and features (aka feature velocity or “Time To Value”)
• Scale: maintaining performance while serving more users
• Margin: minimizing infrastructure and people bills

We also implied that Cloud Native strategies have a focus on infrastructure:

• Start with a cloud (IaaS or PaaS) infrastructure
• Leverage new architectural concepts that have infrastructural impact (microservices)
• Use open source infrastructure tools (orchestrators and containers)

We believe Cloud Native is a technique that marries application architecture and operational architecture and that makes it particularly interesting. In this chapter we’re going to talk about the goals we’re trying to achieve with CN: going faster, bigger and cheaper.

The Goals of Speed, Scale & Margin

First of all, let’s define what we mean by these objectives in this context. Right now, the most common desire we’re seeing from businesses is for speed. So that’s where we’ll start.

Speed

In the Cloud Native world we’re defining speed as “Time to Value” or TTV – the elapsed clock time between a valid idea being generated and becoming a product or feature that users can see, use and hopefully pay for. But value doesn’t only mean revenue. For some start-ups, value may be user numbers or votes. It’s whatever the business chooses to care about.

We’ve used the phrase “clock time” to differentiate between a feature that takes a few person-days to deliver but launches tomorrow and a feature that takes one person-day but launches in months’ time. The goal we’re talking about here is how to launch sooner rather than how to minimize engineer hours.

Scale

We all know you can deliver a prototype that supports 100 users far more quickly, easily and cheaply than a fully resilient product supporting 100,000. Launching prototypes that don’t scale well is a sensible approach when you don’t yet know if a product or feature has appeal. There’s no point in overengineering it. However, the point of launching prototypes is to find a product that will eventually need to support those 100,000 users and many more. When this happens your problem becomes scale – how to support more customers in more locations whilst providing the same or a better level of service. Ideally, we don’t want to have to expensively and time-consumingly rewrite products from scratch to handle success (although in some cases that’s the right call).

Margin

It’s very easy to spend money in the cloud. That’s not always a bad thing. Many start-ups and scale-ups rely on the fact that it’s fast and straightforward to acquire more compute resources just by getting out a credit card. That wasn’t an option a decade ago. However, the time eventually comes when folk want to stop giving AWS, Microsoft or Google a big chunk of their profits. At that point their problem becomes how to maintain existing speed and service levels whilst significantly cutting operational costs.

What Type of Business Are You?
But before we jump into choosing an objective let’s consider that a goal is no use unless it’s addressing a problem you actually have and that different companies in different stages of their development usually have different problems.

Throughout this book we’ll be talking about the kinds of business that choose a Cloud Native strategy. Every business is different, but to keep things simple we’re going to generalise to three company types that each represent a different set of problems: the start-up, the scale-up and the enterprise.

The Start-up

A “start-up” in this context is any company that’s experimenting with a business model and trying to find the right combination of product, license, customers and channels. A start-up is a business in an exploratory phase – trying and discarding new features and hopefully growing its user base. Avoiding risky up-front capital expenditure is the first issue, but that’s fairly easily resolved by building in the cloud. Next, speed of iteration becomes their problem, trying various models as rapidly as possible to see what works. Scale and margin are not critical problems yet for a start-up.

A start-up doesn’t have to be new. Groups within a larger enterprise may act like start-ups when they’re investigating new products and want to learn quickly.

There’s an implication here that the business is able to experiment with their business model. That’s easy for internet products and much harder for hardware or on-premise products. For the “speed” aspect of Cloud Native we are primarily describing benefits only available to companies selling software they can update at will. If you can’t update your end product, continuous integration or delivery doesn’t buy you as much, although it can still be of use.

The Scale-up

A scale-up is a business that needs to grow fast and have its systems grow alongside it. They have to support more users in more geographic regions on more devices. Suddenly their problem is scale. They want size, resilience and response times.

Scale is not just about how many users you can support. You might be able to handle 100X users if you accept falling over a lot but I wouldn’t call that proper scaling. Similarly, if you handle the users but your system becomes terribly slow that isn’t successful scaling either. A scale-up wants more users, with the same or better SLA and response times, and doesn’t want to massively increase the size of their operations and support teams to achieve it.

The Enterprise

Finally, we have the grown-up business – the enterprise. This company may have one or many mature products at scale. They will still be wrestling with speed and scale but margin is also now a concern: how to grow their customer base for existing products while remaining profitable. They no longer want to move quickly or scale by just throwing money at the problem. They are worried about their overall hosting bills and their cost per user. Being big, resilient and fast is no longer enough. They also want to be cost effective.

Where to Start?

It’s a good idea to pursue any wide-ranging objective like speed, scale or margin in small steps with clear wins. For example, pursue faster feature delivery for one product first. Then, when you are happy with your progress and delivery, apply what you’ve learned to other products.

It’s a dangerous idea to pursue multiple objectives of Cloud Native simultaneously. It’s too hard. Every Cloud Native project is challenging and, as we’ll read in our case studies, it requires focus and commitment. Don’t fight a war on more than one front.

Your objectives don’t have to be extreme. Company A might be happy to decrease their deployment time from months to days. For Company B, their objective will only be achieved when the deployment time is hours or even minutes. Neither Company A nor Company B is wrong – as long as they’ve chosen the right target for their own business. When it comes to “define your goal” the operative word is “your”.

So, if you’re searching for product fit you are in “start-up” mode and are probably most interested in speed of iteration and feature velocity. If you have a product that needs to support many more users you may be in “scale-up” mode and you’re interested in handling more requests from new locations whilst maintaining availability and response times. Finally, if you are now looking to maximize your profitability you are in “enterprise” mode and you’re interested in cutting your hosting and operational costs without losing any of the speed and scalability benefits you’ve already accrued.

OK, that all sounds reasonable! In the next chapter we are going to start looking at the tools we can use to get there.

TWO CONTAINERS

CASE STUDY CONTAINER SOLUTIONS

Based in London, Amsterdam and Zurich, Container Solutions (CS) was formed in 2014 to provide specialist analysis and engineering around the new technologies of microservices, CI/CD, containers and orchestrators. At around the same time as CS came into existence the term “Cloud Native” gained currency [13]. Since then one of Container Solutions’ key activities has been reviewing production Cloud Native systems and providing feedback on best practice and effective next steps.

According to CTO Pini Reznik, Cloud Native users have changed a great deal in the past few years. “Two or three years ago businesses mostly fell into one of two groups. On one side you had companies who had barely heard of containers. On the other, you had experts who were experimenting heavily or even building in production. These
experts invested significantly, usually with board level buy-in, and created systems for themselves with little or no help. Those were the companies we all learned from.”

Now, however, things have changed. According to Pini and his team it’s common for companies to start experimenting with Cloud Native technologies with low investment, i.e. cheaply, in a bottom-up fashion. This is often initiated by a keen internal technical champion, usually a developer who was inspired to try containers or microservices by a meetup or a conference. Reznik says, “Once this person starts playing with the technology internally other engineers see the attraction, particularly for faster software delivery. They run more experiments, get excited and often decide to try bigger projects.”

CS feel that getting from maverick developer to wider acceptance within a company is easier now. There’s better industry awareness of Cloud Native. Tech leaders hear conversations about it and see market support; it’s no longer such a scary, radical approach.

The good news is that the market is no longer polarised between super-experts (like the FT or Skyscanner) and almost total unawareness. Now that tooling has improved, and toolchain and platform leaders have started to emerge, a whole new group of companies are trying out Cloud Native tentatively, but often with a clear goal in mind or problem they want to solve. According to Dobson, “Cloud Native is no longer just the domain of trendy startups and banks with deep pockets.”

Unfortunately, it’s at this point things can apparently go wrong. Ironically, a common issue is that there’s loads of great stuff you can do with Cloud Native (this book is packed with it). All have potentially big benefits, but many are tough to deliver with lots to learn.

“Companies regularly approach us to say they’ve tried Cloud Native but it failed to deliver. The project became stuck”, says Reznik. “We saw a pattern emerge - as they became more aware of all the possibilities of Cloud Native they found it hard to focus on any one thing. But Cloud Native is difficult so if they didn’t focus they got bogged down on every front.”

CS usually help by encouraging the tech teams to step back and prioritise. There are steps, like minimal automated testing or building a continuous delivery pipeline, that in CS’s experience make later tasks easier. “Companies usually have a sensible wish list for Cloud Native, but we find that the order you deliver features in has a huge impact on the success of the project. We advise teams to build so the project bootstraps itself, i.e. build a minimal base platform that immediately contributes to its own evolution.” In other words, he says, “use the platform to develop the platform.”

Container Solutions’ CEO Jamie Dobson is even more assertive on the subject. “With Cloud Native, if a team are not getting modest ROI quickly they’re probably doing it wrong. A successful CN implementation should immediately start making further development easier and build steadily from there. If that’s not the case, they need to stop and step back. In our experience, they’re probably doing too much, too soon without a firm enough foundation.”

In CS’s view the driver for Cloud Native in most companies now is speed of delivery. Often companies start with microservices or containerisation. However, testing, diagnostics and CI/CD are also vitally important in a Cloud Native system - even more than in a Monolithic one. If that “plumbing” is missed out the project will suffer.

TEN DO THOSE CASE STUDIES TELL US ANYTHING?

OK, we’ve just looked at four case studies. Stepping back, is there anything we can learn from comparing and contrasting them?
Technically

• Everyone I interviewed used the public cloud and gradually moved away from their own homegrown data centres. None of them regret that; in fact, they all seem to be moving further towards cloud and seeking out more managed services to take the load off their engineers. No one appeared unduly worried about lock-in.
• Everyone cited increased development speed as their prime motivator, although for ASOS increased resilience was also a factor in getting started with the cloud. Everyone mentioned the importance of cost but it was secondary to speed and resilience.
• Everyone had a CI/CD pipeline and automated tests to increase development speed.
• Everyone had adopted a microservice-like architecture at least in part of production, again to increase development speed. They were happy with that decision and would continue to build new stuff with the microservice model.
• Most folk still had a monolithic heart that had not gone away but was much less actively developed on.
• Not everyone had adopted containers yet, but everyone who had was pleased with them and had subsequently adopted orchestrators to increase resilience and save hosting costs.

Culturally

For FT and Skyscanner in particular, a Cloud Native approach felt like a cultural shift as much as a technical one. They both had a business-wide, ground-up objective to be agile, creative, individually autonomous and comfortable with change. They both experienced considerable pain getting into Cloud Native technologies so early and they both had to re-tool several times. However, I suspect that the difficulties themselves may have helped them with their cultural goal of building a more resilient and confident workforce.

Later entrants should have an easier time. Our sector’s understanding of the challenges of Cloud Native (CN) has improved enormously in the past few years. The Container Solutions experience suggests that companies are now getting involved with CN successfully without needing such a big financial or cultural investment. However, I suspect that a cultural desire for flexibility and “radical autonomy” will always play a big part in being successful with Cloud Native.

ELEVEN FIVE COMMON CLOUD NATIVE DILEMMAS

Adopting Cloud Native still leaves you with lots of tough architectural decisions to make. In this chapter we are going to look at some common dilemmas faced by folk implementing CN.

Dilemma – Does Size Matter?

A question I often hear asked is “how many microservices should I have?” or “how big should a microservice be?” So, what is better, 10 microservices or 300?

300!

If the main motivation for Cloud Native is deploying code faster then presumably the smaller the microservice the better. Small services are individually easier to understand, write, deploy, and debug. Smaller microservices means you’ll have lots. But surely more is better?

10!

Small microservices are better when it comes to fast and safe deployment, but what about physical issues? Sending messages between machines is maybe 100 times slower than passing internal messages. Monolithic, internal communication is efficient. Message passing between microservices is slower and more services means more messages.

A complex, distributed system of lots of microservices also has counter-intuitive failure modes. Smaller numbers are easier for everyone to grok. Have we got the tools and processes to manage a complicated system that no one can hold in their head? Maybe less is more?

10,000!

Somewhat visionary Cloud Native experts are contemplating not just 300 microservices but 3000 or even 30,000. Serverless platforms like AWS Lambda could go there. There’s a cost for proliferation in latency and bandwidth but some consider that a price worth paying for faster deployment.

However, the problem with very high microservice counts isn’t merely latency and expense. In order to support thousands of microservices, lots of investment is required in engineer education and in standardisation of service behaviour in areas like network communication. Some expert enterprises have been doing this for years but the rest of us haven’t even started.

Thousands of daily deploys also means aggressively delegating decisions on functionality. Technically and organisationally this is a revolution.

Compromise?

Our judgment is distributed systems are hard and there’s lots to learn. You can buy expertise but there aren’t loads of distributed experts out there yet. Even if you find someone with bags of experience it might be in an architecture that doesn’t match your needs. They might build something totally unsuited to your business.

The upshot is your team’s going to have to do loads of on-the-job learning. Start small with a modest number of microservices. Take small steps. A common model is one microservice per team and that’s not a bad way to start. You get the benefit of deployments that don’t cross team boundaries but it restricts proliferation until you’ve got your heads round it. As you build field expertise you can move to a more advanced distributed architecture with more microservices. I like the model of gradually breaking down services further as needed to avoid development conflicts.

Dilemma – Live Free or Die! Freedom vs Constraints

The benefit of small microservices is they’re specialised and decoupled, which leads to faster deployment. However, there’s also cost in the difficulty of managing a complex distributed system, and many diverse stacks in production. Diversity is not without issues.

The big players mitigate this complexity by accepting some operational constraints and creating commonality across their microservices. Netflix use their Hystrix as a common connectivity library for their microservices. Linkerd from Buoyant serves a similar purpose of providing commonality, as does Istio from Google and Lyft. Some companies who used containerisation to remove all environmental constraints from developers have begun re-introducing recommended configurations to avoid fixing the same problem in 20 different stacks.

Our judgement is this is perfectly sensible. Help your developers use common operational tools where there’s benefit from consistency. Useful constraints free us from dull interop debugging.

Dilemma – What Does Success Look Like Anyway?
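To make the idea of judging every deploy against its key metrics concrete, here is a minimal Python sketch. The metric names, the numbers and the 2% noise tolerance are all illustrative assumptions, not anything reported by the companies we interviewed; a real system would pull these values from its own monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One metric observed before and after a deploy (hypothetical structure)."""
    name: str
    before: float
    after: float
    higher_is_better: bool


def relative_change(metric: Metric) -> float:
    """Fractional change from before to after; assumes `before` is non-zero."""
    return (metric.after - metric.before) / metric.before


def assess_deploy(metrics: list[Metric], tolerance: float = 0.02) -> dict[str, str]:
    """Label each metric 'improved', 'regressed' or 'unchanged'.

    `tolerance` is an assumed noise threshold: moves smaller than this
    fraction are treated as unchanged.
    """
    verdicts = {}
    for m in metrics:
        change = relative_change(m)
        if abs(change) <= tolerance:
            verdicts[m.name] = "unchanged"
        elif (change > 0) == m.higher_is_better:
            verdicts[m.name] = "improved"
        else:
            verdicts[m.name] = "regressed"
    return verdicts


# Made-up numbers: response time got faster but conversions fell.
example = [
    Metric("response_time_ms", before=200.0, after=150.0, higher_is_better=False),
    Metric("conversion_rate", before=0.040, after=0.036, higher_is_better=True),
    Metric("hosting_cost_per_day", before=100.0, after=101.0, higher_is_better=False),
]
verdicts = assess_deploy(example)
```

With these made-up numbers, response time counts as improved, conversions as regressed and hosting cost as unchanged: a mixed result that a slow manual check could easily miss.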
Moving fast means quickly assessing if the new world is better than the old one. Devs must know what success looks like for a code deploy: better conversions, lower hosting costs or faster response times, for example.

Ideally, all key metrics would be automatically monitored for every deploy. Any change may have an unforeseen negative consequence (faster response times but lower conversions). Or an unexpected positive one (it fails to cut hosting costs but does improve conversion). We need to spot either.

If checking is manual that becomes the bottleneck in your fast process. So, assessing success is another thing that eventually needs to be encoded. At the moment, however, there’s no winning product for metric monitoring or A/B testing. Most of the folk we talk to are still developing their own tools.

Dilemma – Buy, Hire or Train?

If you want feature velocity then a valuable engineer is one who knows your product and users and makes good judgments about changes. At the extreme end, devs might make changes based only on very high level directions (CTO of the UK’s Skyscanner, Bryan Dove, calls this “radical autonomy”).

Training existing staff is particularly important in this fast-iteration world. If you go for radical autonomy then devs will be making decisions and acting on them. They’ll need to understand your business as well as your tech.

Folk can be bought or hired with skills in a particular tool, but you may need to change that tool. Your hard skills requirements will alter. You’ll need engineers with the soft skills that support getting new hard skills (people who can listen, learn and make their own judgments). In the Cloud Native world, a constructive attitude and thinking skills are much more important than familiarity with any one tool or language. You need to feel new tools can be adopted as your situation evolves.

Dilemma – Serverless or Microservice?
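As a rough illustration of what a stateless function looks like, here is a minimal Python sketch. The handler(event, context) signature follows the convention AWS Lambda uses for Python functions; the event shape and the basket calculation are hypothetical, invented only for this example.

```python
import json


def handler(event, context):
    """Stateless function: the response depends only on the incoming event.

    Nothing is read from or written to local storage, so any number of
    copies can run in parallel. The event shape here is a made-up example.
    """
    items = event.get("items", [])
    total = sum(item["price"] * item["quantity"] for item in items)
    return {
        "statusCode": 200,
        "body": json.dumps({"item_count": len(items), "total": total}),
    }


# Local invocation with a hand-written event; no server or database involved.
response = handler(
    {"items": [{"price": 5, "quantity": 2}, {"price": 3, "quantity": 1}]},
    None,
)
```

Because the function keeps no state of its own, anything that must survive between calls would have to live in a separate stateful service such as a queue or database.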
Serverless aka Function-as-a-Service (like AWS Lambda or Google Cloud Functions or Azure Functions) sounds like the ultimate destiny of a stateless microservice. If a microservice doesn’t need to talk directly with a local database (it’s stateless) then it could be implemented as a function-as-a-service. So why not just do that and let someone else worry about server scaling, backups, upgrades, patches and monitoring? You’d still need to use stateful products like queues or databases for handling your data but they too could be managed services provided by your cloud provider. Then you’d have no servers to worry about.

This world has a high degree of lock-in (con) but little or no ops work (pro). That is pretty attractive. Most folk are trying to reduce their ops work. Serverless plus managed stateful services could do that.

However, it’s still early days for Functions-as-a-Service. At the moment, I suspect there’s a significant issue with this managed world, which is the lack of strong tooling. In the same way that western civilisation rests on the dull bedrock of effective sanitation, modern software development depends on the hygiene factors of code management, monitoring and deployment tools. With Serverless you’ll still need the plumbing of automated testing and delivery. Tools will appear for Serverless environments but I suspect there isn’t a winning toolchain yet to save us from death by a thousand code snippets.

Modern team-based software development needs plumbing. Most folk will have to create their own right now for Function-as-a-Service so it’s probably still for creative pioneers.

TWELVE AFTERWORDS SHOULD SECURITY BE ONE?
This chapter is an interview with the brilliant Sam Newman, author of “Building Microservices”, where we discussed the unique challenges of securing Cloud Native systems and microservice architectures. Sam’s book is a great read for more microservice-meatiness after this book, which is a mere taster. All the intelligent thought in this chapter I entirely attribute to Sam.

Are Microservices Very Secure or Very Insecure?

Unfortunately, the answer to this question is “yes”.

The first thing that struck me when talking to Sam was that I’d written a whole chapter on Microservices architecture and, indeed, a whole book on Cloud Native but I hadn’t once mentioned security. That wasn’t because I don’t care about security or because it’s an innate mystery to me; it just didn’t strike me as a big issue to talk about. How wrong I was! Of course it is! And Sam very succinctly told me why. In a Cloud Native world probably the biggest security challenge is microservices or, more accurately, how to secure a distributed system.

What are Microservices Again?

As Sam puts it, microservices are independently deployable processes. That means in a system of microservices you can start, stop or replace any of them at any time without breaking everything. That’s great for reducing clashes between developers, increasing resilience and improving feature velocity, but for security it’s a double-edged sword. It can enable you to make everything more secure with better defense in depth (hurray!) but if you don’t make a significant effort it can leave you in a much more exposed position than a monolith (damn!).

Hurray, Microservices are Secure!
Security-wise, the good thing about microservices, according to Newman, is that by dividing your system up you can separate data and processes into “highly sensitive or critical” and “less sensitive” groups and put more energy, focus and expenditure into protecting your high sensitivity and critical stuff.

In the olden days of a monolith, everything was together in one place so it all had to be highly protected (or not, as the case may be). Your eggs were all in one basket, which colloquially we tend to disapprove of, although it is not actually an unknown security strategy - WW2 Atlantic convoys very successfully made use of a heavily defended single basket.

Microservices give you more opportunity to layer your defenses (defense in depth) but also more opportunities to fail to do so. I’m sure you’re getting the picture that this advantage isn’t entirely clear cut.

Boo, Microservices are Insecure!

However, Sam also told me the downside of microservices is that by spreading your system out over multiple containers and machines you increase the attack surface. You have more to protect. What kind of attack surfaces are we talking about?
• More machines means more OSes to keep patched for vulnerabilities.
• More containers means more images to refresh for vulnerability patches.
• More inter-machine messages means more communications need to be secured against sniffing (people reading your stuff on the wire) or changing the message payload (man-in-the-middle attacks).
• More service-to-service comms means more opportunity for bad players to start talking to your services masquerading as you.

Basically, microservices are very powerful but also hard. They can improve your security but without careful thought they will probably reduce it. In Sam’s correct judgment, microservice security needs to be considered and planned in from the start.

OK, so what can we do about it?
Threat Modelling

Sam recommends we use a process called “threat modelling”, which helps us analyse potential points of weakness or likely attacks that our distributed, microservice system will have to withstand. One useful technique for threat modelling is thinking up “attack trees” that cover every (often multi-step) way a baddie could possibly attack your system and then putting a cost/difficulty against each attack.

For example: breaking into my house. The lowest attacker-cost way in would be climbing through an open window while I was out (easy). The highest cost way in might be fighting the sabre-toothed tiger on my doorstep (hard). The idea is not to make every attack impossible but to make every attack too costly. Apparently my sabre-toothed tiger was complete overkill; I should just remember to close my windows.

Some attacks are physical (like breaking a window) and some are social (like persuading me to let you in to read a meter). The first you usually battle with tools and code, the second with processes.

Defend, Detect, Respond, Recover

According to Sam, a useful way to think about security and how to handle the attack points you’ve just uncovered with your attack tree is as a four-step process:
• Defend
• Detect
• Respond
• Recover

Defend

So, what tools does he say we have that can secure microservices?

HTTPS

The first and easiest is HTTPS. If any of your microservices communicate over HTTP then stop. Move them to HTTPS. Just because a connection is inside your system perimeter doesn’t mean you can assume it’s safe from snooping. The good news is HTTPS is not as hard as it used to be. There are now great tools and free certificates from Let’s Encrypt, amongst others. HTTPS also doesn’t slow things down anymore because most servers are optimized for encryption.

HTTPS protects the data from being read or tampered with in transit and verifies the callee, but it doesn’t verify the caller. For that you’ll need some form of client-side auth, such as client-side certificates. Don’t have a heart attack, those are also easier than they used to be. Sam says take a look at Lemur from Netflix.

If you are using forms of communication other than REST/HTTP then there are ways to secure those too, but that’s too complicated for this chapter so you’ll have to read more of Sam’s work to find out about that.

Authentication and Authorisation

That covers service-to-service authentication but what about user auth? What is a specific individual user allowed to do within the perimeter of your product? You still need to use OAuth or equivalent to cover that. You’ll also have to consider whether or not services further downstream need to revalidate what a logged in user can do.

Networking

You’ll probably also want to use SDN/network security and policy enforcement to make sure that traffic only ever comes at your services from other services they are allowed to talk to. Defense in depth folks! Policy AND encryption!

Patching

Everyone’s security “open window”, however, is usually patching. You’ve got to keep all your machines and containers patched for vulnerabilities. In a microservice environment you are probably going to end up with too many units to do this manually. You’ll quickly need to automate this process. Look at tools that can help you do so.

Polyglot?

Microservices lend themselves to a best-of-breed or polyglot approach where everyone runs their dream stack. That has security advantages and disadvantages. Commonality is easier to secure until you’ve got your head round everything and automated loads of it. Keeping a few stacks secure and patched is easier than keeping 500. The benefit of diversity, however, is that if your hackers find an exploit then maybe they can take it less far and just compromise one microservice. Pros and cons abound but Sam recommended that you start with a smaller number of stacks and patch them carefully.

Detect

Logs!
And keep your logs for a very long time. Sam points out that the usual demand for logs is from developers diagnosing a field issue from maybe a few days or weeks ago. Intrusion detection might involve investigating problems from a long time earlier than that, so you need to keep logs longer. Look at the ELK stack (Elasticsearch, Logstash and Kibana), for example.

IP-based security appliances or tools that detect unusual behaviour inside or at your perimeter are also very useful.

Respond

The success of your immediate response to an attack is less about tools and more about processes: knowing what to do and then actually doing it. Don’t panic! Don’t ignore it! Have processes that are pre-defined, carefully thought-through, and tested, for acting on attack detection. Don’t wait until the problem happens to work out what to do next because in the heat of the moment you’ll make mistakes.

Recover

This is the bread and butter stuff. Recovery from a security alert is actually just best practice for recovering from any disaster:
• Already have all your data backed up, in multiple locations, with restore from backup tested.
• Already have your whole system recreatable at will (ideally automated build and deploy).
• In the event of an attack, patch as necessary and then burn it all down and restore everything from scratch.

That’s a lot of stuff you have to get in place in advance. Tough, you’re going to have to do it ;-) (that’s me BTW, I’m sure Sam would not be so bossy).

So, Sam’s overall conclusion was that microservices are a hugely powerful tool for letting you build defense in depth, but they also give you loads more opportunities to screw up and leave a window open, so you need to think and plan. I suspect the general advice is “Don’t Panic”. But also “Don’t Ignore it!”
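The attack-tree technique from the Threat Modelling section of this chapter lends itself to a small worked sketch. It assumes each leaf step is given a made-up attacker cost. An OR node (any one child is enough) then costs as much as its cheapest child, and an AND node (all children are needed) costs the sum of its children. The class name, function name, node kinds and all the numbers are hypothetical, chosen only to mirror the house example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AttackNode:
    name: str
    cost: float = 0.0      # attacker cost of a leaf step (arbitrary units)
    kind: str = "leaf"     # "leaf", "or" (any child works), "and" (all children needed)
    children: List["AttackNode"] = field(default_factory=list)


def cheapest_attack(node: AttackNode) -> Tuple[float, List[str]]:
    """Return (cost, steps) for the cheapest way to achieve this node's goal."""
    if node.kind == "leaf":
        return node.cost, [node.name]

    results = [cheapest_attack(child) for child in node.children]

    if node.kind == "or":
        # The attacker picks whichever single option is cheapest.
        cost, steps = min(results, key=lambda r: r[0])
        return cost, [node.name] + steps

    if node.kind == "and":
        # The attacker has to pay for every sub-step.
        total = sum(r[0] for r in results)
        steps = [s for r in results for s in r[1]]
        return total, [node.name] + steps

    raise ValueError(f"unknown node kind: {node.kind}")


# The house from the text, with invented costs.
front_door = AttackNode("get past front door", kind="and", children=[
    AttackNode("fight sabre-toothed tiger", cost=100),
    AttackNode("pick lock", cost=10),
])
house = AttackNode("break into house", kind="or", children=[
    AttackNode("climb through open window", cost=1),
    front_door,
])
# Same house after the cheap fix: the window is closed.
windows_closed = AttackNode("break into house", kind="or", children=[front_door])

if __name__ == "__main__":
    print(cheapest_attack(house))           # cheapest route is the open window
    print(cheapest_attack(windows_closed))  # now the attacker must use the door
```

Comparing the two results shows where defensive effort pays off. Closing the window raises the cheapest attack from 1 to 110 in these invented units, while making the tiger fiercer would not change the first result at all.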
THE END
The State of the Cloud Nation?

Finally, you’ve reached the end. Well done. Finding time to read tech books (even short ones) is surprisingly difficult!

Way back in my somewhat facetious blurb I said this book was horizontally and vertically scalable. Would you scale up a book horizontally by making more copies or vertically by sizing up the text? Actually the tradeoffs (for tradeoffs there are) are surprisingly analogous to those for distributed systems vs monoliths.

There are genuine difficulties with just making more copies of a book (aka horizontal scaling). It requires more resources, there may be licensing issues, and you can’t be sure if anyone you gave a copy to actually read the book, how far through it they are or whether they understood it. That’s the reason we don’t teach kids to read by handing them a copy of a book and walking away. To share a book with young children we choose a big font and read together.

Vertically sizing up text is clearly neither a fast nor a scalable approach to group reading. Usually we distribute copies of the book. What I’m saying, however, is there are always some use cases for monolithic (vertical) scaling approaches and some for distributed (horizontal) ones. Even the absurd example of vertically scaling a book by increasing the text size and reading en masse has a vital use in teaching literacy to kids. There is no one true way to solve every scaling problem.

In the introduction, we said our goal for this book was to understand Cloud Native (CN) - what it is, what it’s being used for and whether it’s actually effective. We did this by talking to companies, thinking about what they told us and considering our own experiences. We tried to show both what we learned and our thought processes so the other half of this partnership (you, dear reader) can form your own judgment.

Our initial definition for Cloud Native came from the Cloud Native Computing Foundation who say that, ideologically, CN systems are container-packaged, dynamically managed, and microservice-oriented (or orientated if you’re a Brit like me). We’d add another two characteristics: they need to be hosted on flexible, on-demand infrastructure (aka “cloud”), and plumbed in with a high degree of automation (automated testing and continuous integration, delivery and deployment).

So, should we conclude that Cloud Native is a five-point checklist?
• If you’re containerised and orchestrated and in the cloud, but not microserviced (like 40% of Cloud66’s customers) then are you not really Cloud Native?
• Or if you’re microserviced, CI/CD and cloud hosted, but not containerised (like ASOS) then are you not genuine?
• If you can’t deploy 10,000 times a day, like Skyscanner will eventually be able to do, are you a Cloud Native failure?

Will you only win at CN if you check all the boxes? We don’t believe so. We think Cloud Nativeness is a spectrum not a value system. Infrared is neither superior nor inferior to ultraviolet; there just happen to be use cases for each (inside a microwave or a discotheque you might have a distinct preference). CN is merely a toolbox of architectural approaches that can be very effective at delivering speed (aka feature velocity), scale and reduced hosting costs. You can use some of the tools, or all of them, or none of them, depending on what you need.
For example, containers are less useful to you on Windows where the tech is less mature, but you might still want to use microservices to get better dev team concurrency. Microservices are less useful to you where a quick Ruby-on-Rails MVP will suffice, but you might still want to containerize and orchestrate to speed up your deployments. You don’t have to adopt all of Cloud Native for it to be useful (although we suspect all successful CN does rely on automation of testing, code management, and delivery processes).

A Philosophy

Cloud Native may not be a value system but there does appear to be a philosophy to it. Everyone we met using CN urgently wanted to move fast and be adaptive to change in their industry, but they didn’t want to break everything they already had. Their old tools and processes depended on moving slowly to manage risk but they wanted to move quickly, so they had to use new ones. They then often used those same tools to cut their hosting bills and to scale but, critically, that was less vital to them than speed and improving their ability to respond and adapt.

We saw that Cloud Native was more of an attitude than a checklist. It was a rejection of the slow, visionary, utopian big bang. It was about embracing an iterative mindset, taking it one small, low-risk step at a time but taking those steps quickly. Cloud Native solutions were often distributed and scalable but that was not generally the point. The point was delivery speed - getting features out faster.

Adopting a Cloud Native attitude seems to mean evolving into a flexible business that embraces new technology, trusts its employees’ judgment and is culturally able to move quickly, be experimental and grasp opportunities.

A Cloud Native attitude doesn’t sound bad to me.

REFERENCES

- Cloud Native Computing Foundation charter https://www.cncf.io/about/charter/ The Linux Foundation, November 2015
- ASOS public revenue data from https://www.asosplc.com/
- About Cloud66 http://www.cloud66.com
- Forbes, David Williams, The OODA loop https://www.forbes.com/sites/davidkwilliams/2013/02/19/what-a-fighterpilot-knows-about-business-the-ooda-loop/#30e3c4963eb6 February 2013
- The Skeptical Inquirer, Prof Richard Wiseman, The Luck Factor http://www.richardwiseman.com/resources/The_Luck_Factor.pdf June 2003
- Skyscanner, Stuart Davidson http://codevoyagers.com/2016/05/02/continuous-integration-where-we-werewhere-we-are-now/ May 2016
- Wikipedia https://en.wikipedia.org/wiki/Orchestration_(computing) September 2016
- The Register, ‘EVERYTHING at Google runs in a container’ https://www.theregister.co.uk/2014/05/23/google_containerization_two_billion/ May 2014
- Ross Fairbanks, Microscaling Systems, Use Kubernetes in Production https://medium.com/microscaling-systems/microscaling-microbadger-8cba7083e2a February 2017
10 - AWS SQS provides reliable message queuing https://ndolgov.blogspot.co.uk/2016/03/aws-sqs-for-reactiveservices.html
11 - Peter Norvig, Moving data between machines http://norvig.com/21-days.html#answers
12 - Husobee, Restful vs RPC https://husobee.github.io/golang/rest/grpc/2016/05/28/golang-rest-v-grpc.html May 2016
13 - Cloud Native early mentions, InformationWeek http://www.informationweek.com/cloud/platform-as-a-service/cloud-native-what-it-means-why-it-matters/d/d-id/1321539 July 2015
14 - Greenpeace report on inefficient use of energy within the IT sector http://www.greenpeace.de/sites/www.greenpeace.de/files/publications/20170110_greenpeace_clicking_clean.pdf January 2017
15 - Sam Newman, The Principles of Microservices http://samnewman.io/talks/principles-of-microservices/ 2015
16 - Wikipedia, Test Automation https://en.wikipedia.org/wiki/Test_automation
17 - UpGuard, Overview of Configuration Tools https://www.upguard.com/articles/the-7-configuration-management-toolsyou-need-to-know July 2017
18 - Bryon Root, The Difference Between CI and CD http://blog.nwcadence.com/continuousintegrationcontinuousdelivery/ August 2014
19 - Docker, What is a Container? https://www.docker.com/what-container 2017
20 - WeaveWorks, Comparing Container Orchestrators https://www.weave.works/blog/comparing-container-orchestration/ November 2016

About the Authors

Anne Currie

Anne Currie has been in the software industry for over 20 years working on everything from large scale servers and distributed systems in the ’90s to early ecommerce platforms in the ’00s to cutting edge operational tech in the ’10s. She has regularly written, spoken and consulted internationally. She firmly believes in the importance of the technology industry to society and fears that we often forget how powerful we are. She is currently working with Container Solutions.

Container Solutions

Based in London, Amsterdam and Zurich, Container Solutions (CS) was formed in 2014 to provide specialist analysis and engineering around the new technologies of microservices, CI/CD, containers and orchestrators.

The contents of this article are under Creative Commons Attribution licence creativecommons.org/licenses/by/4.0/

container-solutions.com
