FACILITATING THE SPREAD OF KNOWLEDGE AND INNOVATION IN PROFESSIONAL SOFTWARE DEVELOPMENT

Service Meshes: Managing Complex Communication within Cloud Native Applications
eMag Issue 63 - Aug 2018

IN THIS ISSUE
• Istio and the Future of Service Meshes
• Service Mesh: Promise or Peril?
• Envoy Service Mesh Case Study: Mitigating Cascading Failure at Lyft
• Increasing Security with a Service Mesh: Christian Posta Explores the Capabilities of Istio
• How to Adopt a New Technology: Advice from Buoyant on Utilising a Service Mesh
• Microservices Communication and Governance Using Service Mesh

FOLLOW US
facebook.com/InfoQ
@InfoQ
google.com/+InfoQ
linkedin.com/company/infoq

CONTACT US
GENERAL FEEDBACK feedback@infoq.com
ADVERTISING sales@infoq.com
EDITORIAL editors@infoq.com

A LETTER FROM THE EDITOR
Daniel Bryant

Modern cloud-native applications often focus on architectural styles such as microservices, function as a service, eventing, and reactivity. Cloud-native applications typically run within virtualized environments (whether this involves sandboxed process isolation, container-based solutions, or hardware VMs), and applications and services are dynamically orchestrated. Although this shift to building cloud-native systems provides many benefits, it also introduces several new challenges, particularly around the deployment of applications and runtime configuration of networking.

Some of these technological challenges have been solved with the emergence of de facto solutions: for example, Docker for container packaging and Kubernetes for deployment and runtime orchestration. However, one of the biggest challenges, implementing and managing dynamic and secure networking, did not initially get as much traction as other problem spaces. Innovators like Calico, Weave, and CoreOS provided early container networking solutions, but it arguably
took the release of Buoyant’s Linkerd, Lyft’s Envoy proxy, and Google’s Istio to really drive engineering interest in this space.

With one of the first uses of the phrase “service mesh” in February 2016, Buoyant CEO William Morgan announced the benefits of technology providing “Twitter-style Operability for Microservices”. Matt Klein, plumber and loaf balancer at Lyft, further extended the concept when announcing the release of the Envoy Proxy in September 2016: “Envoy runs on every host and abstracts the network by providing common features (load balancing, circuit breaking, service discovery, etc.) in a platform-agnostic manner.” Phil Calçado has written an excellent introduction to and history of the service mesh, and I have presented at the microXchg 2018 conference about the fundamental benefits and challenges of using a service mesh.

This emag aims to provide you with a guide to service mesh technology: not simply the technical ins and outs, but also an exploration of why this technology is important, where it is going in the future, and why you may want to adopt it.

The emag begins with Jasmine Jaksic from Google, who provides a guide to Istio and the future of service meshes. She outlines how the Istio service mesh is split into a data plane, built from Envoy proxies that control communication between services, and a control plane that provides policy enforcement, telemetry collection, and more. Jaksic concludes by stating that the long-term vision is to make Istio ambient within the platform.

In the second article, Richard Li discusses the pros and cons of adopting a service mesh in your technology stack, based on his experience as CEO at Datawire. If you have a small number of microservices or a shallow topology, Li says you should consider delaying adoption of a service mesh, and instead evaluate alternative strategies for failure management. If you are deploying a service mesh, be prepared to invest ongoing effort in integrating the
mesh into your software development lifecycle.

Next, Jose Nino and Daniel Hochman examine the mitigation of cascading failure at Lyft, where their Envoy-powered service mesh handles millions of requests per second in production. Today, these failure scenarios are largely a solved problem within the Lyft infrastructure, as every deployed service gets throughput and concurrency protection automatically via use of the Envoy proxy. You will also learn that over the coming months, Lyft engineers will be working in conjunction with the team behind the (self-configuring) concurrency-limits library at Netflix in order to bring a system based on their library into an Envoy L7 filter.

In the fourth article, Christian Posta from Red Hat presents a practical guide to how Istio can facilitate good security practices within a microservice-based system, such as service-to-service encryption and authorization. He argues that these foundational security features pave the road for building zero-trust networks in which you assign trust based on identity as well as context and circumstances, not just because the caller happens to be on the same internal network.

Next, Thomas Rampelberg explores how to best adopt a new technology like a service mesh, and shares his experience from working at Buoyant with Linkerd. His core advice is that planning for failure and understanding the risks along the entire journey will actually set you up for success. The concerns that you collect early on can help with this planning, and each possible risk can be addressed early on so that it doesn’t become an issue.

The emag concludes with a virtual panel hosted by fellow InfoQ editor Srini Penchikala, and includes a series of great discussions from innovators in the service mesh space such as Matt Klein, Dan Berg, Priyanka Sharma, Lachlan Evenson, Varun Talwar, Yuri Shkuro, and Oliver Gould.

The service mesh space is a rapidly emerging technical and commercial opportunity, and although we expect some aggregation or
attrition of offerings over the coming months and years, for the moment, there are plenty of options to choose from (many of which we have covered on InfoQ):
• Istio and Envoy, which are covered in this emag;
• Linkerd (and Linkerd 2, which includes Conduit), which are also covered here;
• Cilium, API-aware networking and security powered by the eBPF kernel features;
• HashiCorp Consul Connect, a distributed service mesh to connect, secure, and configure services across any runtime platform; and
• NGINX (with Istio and nginMesh) or NGINX Plus with Controller.

We hope this InfoQ emag will help you decide if your organisation would benefit from using a service mesh, and if so, that it also guides you on your service mesh journey. We are always keen to publish practitioner experience and learning, so please get in touch if you have a service mesh story to share.

CONTRIBUTORS

Daniel Bryant is leading change within organisations and technology. His current work includes enabling agility within organisations by introducing better requirement gathering and planning techniques, focusing on the relevance of architecture within agile development, and facilitating continuous integration/delivery. Daniel’s current technical expertise focuses on ‘DevOps’ tooling, cloud/container platforms, and microservice implementations.

Jasmine Jaksic works at Google as the lead technical program manager on Istio. She has 15 years of experience building and supporting various software products and services. She is a cofounder of Posture Monitor, an application for posture correction using a 3-D camera. She is also a contributing writer for The New York Times, Wired, and Huffington Post. Follow her on Twitter: @JasmineJaksic

Richard Li is the CEO/co-founder of Datawire, which builds open-source tools for developers on Kubernetes. Previously, Li was VP product and strategy at Duo Security. Prior to Duo, Richard was VP strategy and corporate development at Rapid7. Li has a B.Sc. and M.Eng. from MIT.

Daniel Hochman
is a senior infrastructure engineer at Lyft. He’s passionate about scaling innovative products and processes to improve quality of life for those inside and outside the company. During his time at Lyft, he has successfully guided the platform through an explosion of product and organizational growth. He wrote one of the highest-throughput microservices and introduced several critical storage technologies. Hochman currently leads Traffic Networking at Lyft and is responsible for scaling Lyft’s networking infrastructure internally and at the edge.

Srini Penchikala currently works as a senior software architect in Austin, Tex. Penchikala has over 22 years of experience in software architecture, design, and development. He is also the lead editor for the AI, ML & Data Engineering community at InfoQ, which recently published his mini-book Big Data Processing with Apache Spark. He has published articles on software architecture, security, risk management, NoSQL, and big data at websites like InfoQ, TheServerSide, the O’Reilly Network (OnJava), DevX’s Java Zone, Java.net, and JavaWorld.

Christian Posta (@christianposta) is a chief architect of cloud applications at Red Hat and well known in the community for his writing (Introducing Istio Service Mesh, Microservices for Java Developers). He’s also known as a frequent blogger, speaker, open-source enthusiast, and committer on various open-source projects including Istio, Apache ActiveMQ, Fabric8, etc. Posta has spent time at web-scale companies and now helps companies create and deploy large-scale, resilient, distributed architectures, many of which we now call microservices. He enjoys mentoring, training, and leading teams to be successful with distributed systems concepts, microservices, DevOps, and cloud-native application design.

Jose Nino is the lead for dev tooling and configuration on the Networking team at Lyft. During the nearly two years he’s been at Lyft, Nino has been instrumental in creating systems to scale configuration of Lyft’s Envoy production environment for increasingly large deployments and engineering orgs. He has worked as an open-source Envoy maintainer and has nurtured Envoy’s growing community. More recently, Nino has moved on to scaling Lyft’s network load-tolerance systems. He has spoken about Envoy and related topics at several venues, most recently at KubeCon Europe 2018.

Thomas Rampelberg is a software engineer at Buoyant, which created the Linkerd service mesh. He has made a career of building infrastructure software that allows developers and operators to focus on what is important to them. While working for Mesosphere, he helped create DC/OS, one of the first container orchestration platforms, used by many of the Fortune 500. He has moved to the next big problem in the space: providing insight into what is happening between services, improving reliability between them, and using best practices to secure the communication channels between them.

KEY TAKEAWAYS
• The microservices architectural style simplifies the implementation of individual services. However, connecting, monitoring, and securing hundreds or even thousands of microservices is not simple.
• A service mesh provides a transparent and language-independent way to flexibly and easily automate networking, security, and observation functions. In essence, it decouples development and operations for services.
• The Istio service mesh is split into 1) a data plane built from Envoy proxies that intercepts traffic and controls communication between services and 2) a control plane that supports services at runtime by providing policy enforcement, telemetry collection, and certificate rotation.
• The near-term goal is to launch Istio to 1.0, when the key features will all be in beta (including support for hybrid environments).
• The long-term vision is to make Istio ambient.

ISTIO AND THE FUTURE OF SERVICE MESHES
by Jasmine Jaksic

It wouldn’t be a stretch to say that Istio popularized the concept of a service mesh. Before we get into the details on Istio, let’s briefly dive into what a service mesh is and why it’s relevant.

We all know the inherent challenges associated with monolithic applications, and the obvious solution is to decompose them into microservices. While this simplifies individual services, connecting, monitoring, and securing hundreds or even thousands of microservices is not simple. Until recently, the solution was to string them together using custom scripts, libraries, and dedicated engineers tasked with managing these distributed systems. This reduces velocity on many fronts and increases maintenance costs. This is where a service mesh comes in.

A service mesh provides a transparent and language-independent way to flexibly and easily automate networking, security, and telemetry functions. In essence, it decouples development and operations for services, so a developer can deploy new services as well as make changes to existing ones without worrying about how that will impact the operational properties of their distributed systems. Similarly, an operator can seamlessly modify operational controls across services without redeploying them or modifying their source code. This layer of infrastructure between services and their underlying network is what is usually referred to as a service mesh.

Within Google, we use a distributed platform for building services, powered by proxies that can handle various internal and external protocols. These proxies are supported by a control plane that provides a layer of abstraction between developers and operators and lets us manage services across multiple languages and platforms. This architecture has been battle-tested to handle high scalability and low latency and to provide rich features to every service running at Google.

Illustration: Dan Ciruli, Istio PM

Back in 2016, we decided to build an open-source project for managing microservices that in a lot of ways mimicked what we use within Google,
which we decided to call “Istio”. Istio means “sail” in Greek, and Istio started as a solution that worked with Kubernetes, which in Greek means “helmsman” or “pilot”. It is important to note that Istio is agnostic to deployment environment, and was built to help manage services running across multiple environments.

Around the same time that we were starting the Istio project, IBM released an open-source project called Amalgam8, a content-based routing solution for microservices that was based on NGINX. Realizing the overlap in our use cases and product vision, IBM agreed to become our partner in crime and shelve Amalgam8 in favor of building Istio with us, based on Lyft’s Envoy.

How does Istio work?

Broadly speaking, an Istio service mesh is split into 1) a data plane built from Envoy proxies that intercepts traffic and controls communication between services and 2) a control plane that supports services at runtime by providing policy enforcement, telemetry collection, and certificate rotation.

Proxy

Envoy is a high-performance, open-source distributed proxy developed in C++ at Lyft (where it handles all production traffic). Deployed as a sidecar, Envoy intercepts all incoming and outgoing traffic to apply network policies and integrate with Istio’s control plane. Istio leverages many of Envoy’s built-in features such as discovery and load balancing, traffic splitting, fault injection, circuit breakers, and staged rollouts.

Pilot

As an integral part of the control plane, Pilot manages proxy configuration and distributes service communication policies to all Envoy instances in an Istio mesh. It can take high-level rules (like rollout policies), translate them into low-level Envoy configuration, and push them to sidecars with no downtime or redeployment necessary. While Pilot is agnostic to the underlying platform, operators can use platform-specific adapters to push service discovery information to Pilot.

Mixer

Mixer integrates a rich
ecosystem of infrastructure back-end systems into Istio. It does this through a pluggable set of adapters using a standard configuration model that allows Istio to be easily integrated with existing services. Adapters extend Mixer’s functionality and expose specialized interfaces for monitoring, logging, tracing, quota management, and more. Adapters are loaded on demand and used at runtime based on operator configuration.

Citadel

Citadel (previously known as Istio Auth) performs certificate signing and rotation for service-to-service communication across the mesh, providing mutual authentication as well as mutual authorization. Envoy uses Citadel certificates to transparently inject mutual transport-layer security (TLS) on each call, thereby securing and encrypting traffic using automated identity and credential management. As is the theme throughout Istio, authentication and authorization can be configured with minimal to no changes of service code and will seamlessly work across multiple clusters and platforms.

Why use Istio?
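One small illustration of what a mesh takes off a developer's hands is the weight-based routing behind the rollout policies and staged rollouts mentioned above. The sketch below picks a destination version for each simulated request in proportion to a configured weight. It is not Istio or Envoy code: the function names `pick_version` and `simulate`, the version labels, and the weights are all hypothetical, and a real mesh would apply such a rule inside the sidecar proxy rather than in application code.

```python
import random
from collections import Counter


def pick_version(weights, rng):
    """Pick one destination version in proportion to its weight.

    `weights` maps a (hypothetical) version label to a non-negative weight,
    e.g. {"v1": 90, "v2": 10} sends roughly 10% of requests to v2.
    """
    if not weights or sum(weights.values()) <= 0:
        raise ValueError("at least one version needs a positive weight")
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def simulate(weights, requests, seed=0):
    """Route `requests` simulated calls and count how many hit each version."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(pick_version(weights, rng) for _ in range(requests))


if __name__ == "__main__":
    # Hypothetical staged rollout: shift traffic from v1 to v2 in steps.
    for v2_share in (0, 10, 50, 100):
        counts = simulate({"v1": 100 - v2_share, "v2": v2_share}, requests=10_000)
        print(v2_share, dict(counts))
```

Stepping the `v2` weight from 0 to 100 mimics a gradual migration. Because each request is routed independently, the share of traffic that reaches `v2` tracks the weight only approximately, which is worth remembering when reading rollout dashboards over short time windows.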
Istio is highly modular and is used for a variety of use cases. While it is beyond the scope of this article to cover every benefit, let me provide a glimpse of how it can simplify the day-to-day life of NetOps, SecOps, and DevOps.

Resilience

Istio can shield applications from flaky networks and cascading failures. If you are a network operator, you can systematically test the resiliency of your application with features like fault injection to inject delays and isolate failures. If you want to migrate traffic from one version of a service to another, you can reduce the risk by doing the rollout gradually through weight-based traffic routing. Even better, you can mirror live traffic to a new deployment to observe how it behaves before doing the actual migration. You can use Istio Gateway to load-balance the incoming and outgoing traffic and apply route rules like timeouts, retries, and circuit breakers to reduce and recover from potential failures.

Security

One of the main Istio use cases is securing inter-service communications across heterogeneous deployments. In essence, security operators can now uniformly and at scale enable traffic encryption, lock down the access to a service without breaking other services, ensure mutual identity verification, whitelist services using ACLs, authorize service-to-service communication, and analyze the security posture of applications. They can implement these policies across the scope of a service, namespace, or mesh. All of these features can reduce the need for layers of firewalls and simplify the job of a security operator.

Observability

The ability to visualize what’s happening within your infrastructure is one of the main challenges of a microservice environment. Until recently, individual services had to be extensively instrumented for end-to-end monitoring of service delivery. And unless you have a dedicated team willing to tweak every binary, getting a holistic view of the entire fleet and
troubleshooting bottlenecks can be cumbersome.

With Istio, out of the box, you get visualization of key metrics and the ability to trace the flow of requests across services. This allows you to do things like enable autoscaling based on application metrics. While Istio supports a whole host of providers like Prometheus, Stackdriver, Zipkin, and Jaeger, it is back-end agnostic. If you don’t find what you are look-

KEY TAKEAWAYS
• Be mindful of the impact that adopting a new technology like a service mesh into your production stack has on you and your colleagues, and you can successfully empower stakeholders.
• Be clear about what problem you are solving, and define appropriate acceptance criteria.
• Run experiments that attempt to show how a service mesh can make life better for the various stakeholders.
• Cultivate and educate allies and champions with small-scale demonstrations that show how a service mesh made things better.
• Planning for failure and understanding the risks along the entire journey will set you up for success. The concerns that you collected early on can help with this planning. You can address each possible risk early so that it doesn’t become a problem.
HOW TO ADOPT A NEW TECHNOLOGY: ADVICE FROM BUOYANT FOR A SERVICE MESH
by Thomas Rampelberg

Adopting technology and deploying it into production requires more than a simple snap of the fingers. Making the rollout successful and making real improvements are even tougher.

When you’re looking at a new technology such as service meshes, it is important to understand that the organizational challenges you’ll face are just as important as the technological ones. But there are clear precautions you can take to smooth the road to production.

It is important to identify what problems a service mesh will solve for you. Remember, it isn’t just about adopting technology. Once the service mesh is in production, it needs to provide benefits. This is the foundation of your road to production.

Once you’ve identified the problem that will be solved, it’s time to go into sales mode. No matter how small the price tag, getting a service mesh into production requires investment by many parties. Changes impact co-workers in ways that range from learning new technology to disruption of their mission-critical tasks.

One of my favorite quotes is “No plan survives contact with the enemy” (Helmuth von Moltke the Elder). How many times has a deployment to production surprised you? If we assume that something will go wrong, it is possible to plan for that up front. This is all part of “good” engineering practice. Unfortunately, it is easy to see all the ways something can go right and ignore the ways that it can go wrong, in particular because the ways that something can go wrong are often among the unknowns.

Remember the problem you’re solving?
Now is the time to validate that you will actually solve the problem. Even when you’ve gone through a proof of concept for technology, the chasm between a product website and what that new technology does in the real world can be wide. Before you take small steps towards a complete rollout in production, it is important to take some time and verify that you’re helping instead of hurting.

What problem are you solving?

You can use service meshes to solve many problems. It is important to nail down which problem is the most important one for you. You can use that as acceptance criteria: has this been worth the effort?

Good problems are ones that have multiple possible solutions. If the only solution to a problem is to use a specific technology, you might be seeing the world as if you have a hammer and everything is a nail. The best problems that service meshes solve are the ones that empower microservice owners to do what they do best and not focus on the things that every platform needs to provide: observability, reliability, and security.

Working with microservices is hard. In particular, it can be tough to debug the interactions between services to understand why something is broken. It is entirely possible to solve this problem by working with service owners and getting them to build tools to provide the visibility required. A service mesh would, however, provide the required visibility with less effort from everyone involved. Just think about how empowering it is to let service owners worry about other things and get interaction debugging for as little work as possible.

In a microservice world, each service ends up depending on many others. When one service fails, that one failure can cascade into other services that compose the stack. Services can get stuck in lengthy retry loops that consume resources while they process retry queues. Left unmanaged, what could have been a small isolated failure instead becomes a larger system issue that users insistently complain about. Circuit
breaking provides primitives that can mitigate those lengthy loops and stop a cascading failure in its tracks. Following the theme of empowerment, by using a service mesh to solve this problem, you provide functionality that helps service owners easily build more resilient services.

At some point, the specter of compliance may haunt your doorstep. Auditors will ask for encryption for data in motion. The amount of work required to update and audit every service can be extreme. There are also unfortunate details such as certificate revocation and updates. By using a service mesh, you can make all these problems an operational concern instead of a developmental one. With one way to handle encryption for data in motion instead of the potential hundreds that you can be confronted with in a microservices world, the audit will go more smoothly.

For Houghton Mifflin Harcourt (HMH), an American educational and trade publisher, the problem to solve was developer agility. Robert Allen, HMH’s director of engineering, said: “With Linkerd, a team could continue forward on a work contract and be ahead of the game, and not disrupt their deployment schedule. We could decouple teams more and become a lot more agile. This was a huge benefit.”

Having a concrete problem statement and defining clear acceptance criteria is the first step towards successfully adopting a service mesh in production. You get a tool to use in the next step when selling the value of a service mesh to others, and a way to measure progress as the rollout occurs. There are other problems you could be solving; these common problems come up time and again.

Sell it

Rarely does anyone work alone, and it is unlikely that you can get a service mesh into production without help. If you don’t (or can’t) convince your colleagues that a service mesh is a good idea, the path to production becomes infinitely more difficult, if not downright impossible. Armed with the problem to solve, defined
acceptance criteria, and a clear explanation of the mesh’s value, you have an opportunity to gather allies to your cause. Turning your colleagues into allies creates additional voices to champion the virtues of a service mesh. That sort of organizational buy-in can help you avoid many of the possible missteps farther along the road.

Every stakeholder will have their own concerns. A developer might care about learning new technology and writing integration code that pushes out their current deadlines. Your management team may be concerned about downtime and new business dependencies. It is valuable to talk with all stakeholders and understand their concerns. Their concerns will help you shape the rollout and provide a basis for describing what benefits they’ll receive.

When you’re solving the right problems for your organization, you’re providing benefits to all of the stakeholders. Build a list of benefits and incentives that addresses each concern. This is the fun part! You have the opportunity to explain how solving this problem will empower your colleagues with new tools and capabilities. Just imagine how exciting it is for a security team to understand that there will be consistent encryption between services.

The following table is a sample of stakeholder incentives and concerns.

Stakeholder: Platform engineers
Incentives:
• Unified visibility across all services
• Failure isolation
Concerns:
• Is it reliable?
• Will it introduce complexity?

Stakeholder: Developers/service owners
Incentives:
• Remove complex communication logic from your code
• Easily run parallel versions of a service
Concerns:
• What do I have to change?
• Do I have to learn a new complicated way of doing things?

Stakeholder: Security team
Incentives:
• Consistent application of TLS and authz/authn across services
• Policy
Concerns:
• Will it make things less secure?
• What new attack vectors are introduced?

Stakeholder: Management
Incentives:
• Faster pace of development
• Fewer outages
Concerns:
• What dependencies are we introducing to our business?

Plan for failure

There are “opportunities” for a production rollout to trip up at every step. By planning to encounter challenges at each step, it is possible to make them all go a little more smoothly.

Even before anything gets into production, there are opportunities for failure. Take each step of the road to production with an eye towards the possible risks. They can come from anywhere, and many will be unknown until you start implementing a certain stage of the process.

Focus on the problem that you’re solving and don’t try to boil the ocean. It is tempting to see every possibility and get overly excited, but as a project’s scope increases, so do the risks and time required. By keeping the problem and its acceptance criteria in focus, you have leverage that can help keep scope creep to a minimum and that allows you to move forward with confidence.

Start small

Making incremental progress is important. Sometimes, progress may feel impossible, but there is always a much smaller piece of the larger picture that you can work on first. By separating the larger project into small deliverables, you’re able to remove much of the risk involved with introducing change. Can you imagine everything required to change something significant in production all at once?
Knowing about risks is only half of the battle. You need to budget time to address them. Clearly communicating your plan to deal with risks and involving your co-workers in that plan is a key tactic.

You may feel like it is taking too long to demonstrate value to your stakeholders. It can be hard to understand why the production rollout of a service mesh could take so long. Working in small incremental steps not only addresses risks, it also gives you a chance to communicate clearly each step of the way. Verify that each small step along the road was successful and communicate all progress.

Looking back at what worked well and what didn’t for each incremental step is also an effective strategy for getting your service mesh into production. Retrospectives encourage communication and focus on good diagnostics while you discuss exactly what happened. With the problem being solved at the top of everyone’s mind, clear verification becomes an integrated part of the rollout process.

Planning for failure also means accounting for what is to blame. When something goes wrong, and you’re in the process of changing anything in the system, you’re often the first to get blamed. It isn’t always your fault though! Misunderstood technology regularly gets blamed for things it can’t possibly have caused. Whenever this happens, you have an opportunity to educate your colleagues, explaining exactly what a service mesh does and doesn’t do.

What are the tradeoffs?
It can feel good to try and address every possible risk, but that doesn’t always make sense. Understand the potential impact of each risk. Some risks can be extremely improbable, while having impressive implications. Other risks are likely to happen yet have only minor consequences. Each of these risks has a specific cost associated with mitigation. Once you understand the cost of mitigation and the impact of the risk, it is possible to weigh tradeoffs, make the call, and clearly communicate that. By communicating that you gauged the cost and implications of a particular risk, it is possible to calm stakeholders with specific concerns.

Planning for failure and understanding the risks along the entire journey will set you up for success. The concerns that you collected early on can help with this planning. You can address each possible risk early so that it doesn’t become a problem.

Form3 put Linkerd into production and ended up with something they could rely on. Ed Wilde mentions how it made a huge difference compared with previous systems: “One thing that’s clear with Form3 today is how few errors there are. On our previous system, you’d see a low background level of errors. This was just accepted. The difference with this system is that we just don’t have errors anymore. Linkerd has proven to be a component you can rely on, and it is very solid. We have had no operational issues with it.”

Conclusion

By being mindful of the impact that adopting a new technology in production has on you and your colleagues, you can smoothly and successfully empower stakeholders. The first step is being clear about what problem you’re solving. Pick a real problem that you’re experiencing, define clear criteria that will show it has been fixed, and use that to show how a service mesh has made life better.

Get allies by demonstrating how a service mesh made things better for them. It’s that type of direct help that will get them on your side, and that’s how you grow additional champions. You’ll need to understand their concerns.

You must also accept that change always comes with risk. Understanding this, combined with a clear view of the problem you’re solving and its acceptance criteria, will present opportunities to ally colleagues to your project.

This isn’t a foolproof plan, but by following these steps, you’ll be much better equipped to roll a service mesh out into production in your company. Getting a service mesh into production isn’t just about the technology. It’s about knowing how to empower your colleagues and making them feel like the service mesh gives them superpowers.

KEY TAKEAWAYS
• Service-mesh frameworks are used for handling service-to-service communication and offer a platform to connect, manage, and secure microservices.
• A service mesh helps application developers by taking care of features that require complex coding like routing decisions, which are done at the mesh level, not in the applications.
• It also provides security policies that you can program into the mesh. For example, you can set up a policy that restricts inbound internet traffic to some of the services in the mesh.
• A service mesh like Istio works seamlessly on platforms like Kubernetes, but there are rough edges when using it on other platforms.
• Sidecar proxies let you effectively and reliably decouple your applications from the operational aspects of managing service communication.

MICROSERVICES COMMUNICATION AND GOVERNANCE USING A SERVICE MESH
by Srini Penchikala

InfoQ spoke with a panel of experts in service meshes to learn more about why these frameworks have become critical components of cloud-native architectures.

THE PANELISTS

Matt Klein is a software engineer at Lyft and the architect of Envoy. Klein has been working on operating systems, virtualization, distributed systems, networking, and making systems easy to operate for over 15 years across a variety of companies. Some highlights
include leading the development of Twitter’s C++ L7 edge proxy and working on high-performance computing and networking in Amazon’s EC2.

Dan Berg is a distinguished engineer within the IBM Cloud unit. Berg is responsible for the technical strategy and implementation of the containers and microservices platform available in IBM Cloud. Within this role, Berg has deep knowledge of container technologies, including Docker and Kubernetes, and has extensive experience building and operating highly available cloud-native services. Berg is also a core contributor to the Istio service-mesh project.

Priyanka Sharma is an entrepreneur with a passion for building developer products and growing them through open source communities. Currently at GitLab, she formerly led Open Source Partnerships at LightStep and is a contributor to the OpenTracing project, a CNCF project that provides vendor-neutral APIs for distributed tracing. She serves as an advisor to startups at HeavyBit Industries, an accelerator for developer products. Follow her on Twitter @pritianka.

Lachlan Evenson is a cloud-native evangelist and mercenary. Evenson has spent the last two-and-a-half years working with Kubernetes and enabling cloud-native journeys. He is a believer in open source and is an active community member. Evenson spends his days helping make cloud-native projects run great on Azure.

Oliver Gould is CTO and co-founder of Buoyant, where he leads open-source development efforts for the open service-mesh projects Linkerd and Conduit. Prior to Buoyant, he was a staff infrastructure engineer at Twitter, where he was the tech lead of the Observability, Traffic, and Configuration & Coordination teams. He is the creator of Linkerd and a core contributor to Finagle, the high-volume RPC library used at Twitter, Pinterest, SoundCloud, and many other companies.

Yuri Shkuro is a staff engineer at Uber Technologies, working on distributed tracing, reliability, and performance. Shkuro is the coauthor of the OpenTracing standard (a
CNCF project) and a tech lead for Jaeger, Uber’s open-source distributed tracing system.

Varun Talwar formerly worked on the Google Cloud Platform team and was the founding product manager of the gRPC and Istio projects.

A service mesh is a dedicated infrastructure layer for handling service-to-service communication and offers a platform to connect, manage, and secure microservices. A service mesh makes the communication between microservices flexible and reliable. It provides the critical capabilities needed in distributed-services environments such as resiliency, service discovery, load balancing, encryption, authentication and authorization, and fault tolerance (via service retry and circuit breaking).

InfoQ: Can you define what a service mesh is and tell us what advantages it brings to microservices interaction and governance?

Matt Klein: The two most difficult problems facing microservice practitioners are networking and observability: how do services talk to each other reliably? When things go wrong, how can the problem be quickly determined and either fixed or worked around?
Reliable microservice networking and observability require a multitude of techniques including service discovery, load balancing, timeouts, retries, circuit breakers, health checking, advanced routing, stats, logging, distributed tracing, and more. Historically, most modern architectures have built feature-rich libraries in each language that the organization uses to handle these concerns. This by definition necessitates reimplementing and maintaining a large amount of sophisticated functionality in multiple languages.

The idea behind the service mesh is to use an out-of-process sidecar proxy running alongside every application. This proxy implements all of the sophisticated networking and observability needs for a microservice architecture in one place and at very high performance. Since the proxy implements the required functionality in a dedicated process, it can work alongside any application language. When every application has an associated sidecar proxy and routes all traffic through it, the application itself no longer needs to be aware of the underlying network details and can treat it as an abstraction. This allows application developers to largely focus on business logic, irrespective of the many languages that might be in use within their organization.

Dan Berg: Service mesh is a term used to describe the network of microservices that make up applications and the management of the interactions between them. One example of this is Istio, an open technology that provides a way for developers to seamlessly connect, manage, and secure networks of different microservices, regardless of platform, source, or vendor. A service mesh helps developers to be more productive by moving complex and error-prone logic from the application code out to the mesh. For example, a service mesh manages traffic routing and shaping, ensures secure communication between services, captures network telemetry, and enforces security policy
for all services within the mesh. A service mesh ensures greater resiliency of services with built-in circuit-breaking support, which handles failures in a graceful manner when a service is unable to reach its destination.

Priyanka Sharma: A service mesh is an infrastructure layer for service-to-service communication. It ensures reliable delivery of your messages across the entire system and is separate from the business logic of your services. Service meshes are often referred to as sidecars or proxies. As software fragments into microservices, service meshes go from being nice to have to essential. With a service mesh, not only will you ensure resilient network communications, you can also instrument for observability and control, without changing the application runtime. Service meshes make it easier for organizations to adopt microservices with consistent tooling across engineering teams. Individual developers can focus on their services and let the mesh take care of the network-layer communications as well as the tooling around the microservices.

Lachlan Evenson: A service mesh is a set of applications that enables uniform service-to-service communication in microservice architectures.
Service meshes enable microservice developers and operators to interact with dependent services in a prescribed and expected fashion. This aids in governance by providing a single interface and, as such, a single point of policy enforcement for all communication rather than bespoke or boilerplate implementations.

Varun Talwar: A service mesh is an architectural pattern whereby all service communications and common functions needed by microservices are handled uniformly by a platform layer (outside the code). When a platform layer like this can uniformly implement common network functions like routing and load balancing, resiliency functions like retries and timeouts, security functions like authentication and authorization, and service-level monitoring and tracing, it can significantly ease the job of microservice developers and enable a smart, consistent infrastructure that allows organizations to manage higher abstractions of services (independent of the underlying network and infrastructure).

Yuri Shkuro: The term “service mesh” is rather misleading. In the direct interpretation, it could be used to describe both the network of microservices that make up distributed applications and the interactions between them. However, recently the term has been mostly applied to a dedicated infrastructure layer for handling service-to-service communication, usually implemented as lightweight network proxies (sidecars) that are deployed alongside application code. The application code can treat any other service in the architecture as a single logical component running on a local port on the same host. It frees the application code from having to know about the complex topology of modern, cloud-native applications. It also allows infrastructure teams to focus their energy on implementing advanced features like routing, service discovery, circuit breaking, retries, security, monitoring, etc. in a single sidecar component, rather than supporting them across multiple programming languages and
frameworks typical of modern applications.

Oliver Gould: A service mesh is a dedicated infrastructure layer for making runtime communication between microservices safe, fast, and reliable. At Twitter, we learned that this communication is a critical determinant of the application’s runtime behavior, but if you aren’t explicitly dealing with it, you end up with a fragile, complex system. A service mesh gives an operator the control they need to debug and manage this communication.

InfoQ: The enterprise service bus (ESB) pattern has been popular for several years, especially in service-oriented architecture (SOA) models. How do you contrast the service-mesh pattern to what ESB offers?

Klein: I’m not going to debate the differences between SOA versus microservices or ESB versus service mesh. To be perfectly honest, I think there is very little real difference and the name changes are mostly driven by vendors attempting to differentiate new products. Computing, and engineering in general, are driven by iterative change. In recent years, most SOA/microservice communication has moved to REST and newer, strongly typed IDLs such as Thrift and gRPC. Developers have favored simplicity via direct networking calls from in-process libraries versus centralized message buses. Unfortunately, most in-process libraries in use are not sufficiently solving the operational pain points that come when running a microservice architecture (Finagle and Hystrix/Ribbon are exceptions but require use of the JVM). I view “service mesh” as really just a modern take on the ESB architecture, adapted to the technologies and processes that are now in favor among microservice practitioners.

Berg: At a high level, an ESB and a service mesh appear to be similar in that they manage the communication between a set of services; however, there are fundamental differences. A key difference is that messages are sent to an ESB, which in turn determines which endpoint to send the message to. The ESB is a centralized point for making
routing decisions, performing message transformations, and managing security between services. A service mesh, on the other hand, is a decentralized approach where client-side proxies are programmed via the service-mesh control plane to manage routing, security, and metrics gathering. Thus, the service mesh pushes key responsibilities out to the application versus encapsulating the functionality within a centralized system such as an ESB. This makes the service mesh much more resilient and it scales better within a highly distributed system such as seen with cloud-native applications. Due to the client-side approach used with a service mesh, it is possible to have much more sophisticated routing rules, policy enforcement, and resiliency features such as circuit breakers than what can be achieved with an ESB. Another key difference is that the application logic doesn’t know that it is participating within a service mesh. The mesh adjusts to adopt applications. With an ESB, the application logic must be adjusted to participate with the ESB.

Sharma: ESBs and service meshes have a lot in common, particularly why they were built. ESBs became popular in the SOA era: they managed network communications and also took care of some of the business logic. They were built for the same reason we are building service meshes today: as the number of services increases, consistency and reliability across the system is needed, and a message bus/sidecar proxy is a great way to achieve that. Service meshes are different from ESBs because they are specifically built for cloud-native, microservices architectures. ESBs do not function well in the world of cloud computing. They take on too much of the business logic from services, and slow down software development by creating another dependency and organizational silo. To sum up, I would say that service meshes are the next-generation evolution of ESB technology. The core motivators are the same, but the
implementation is more sophisticated and tailor-made for the cloud-native era.

Evenson: Just as ESB is synonymous with SOA, so is service mesh with microservices. The major difference is the scope and size of the services implemented by both ESB and a service mesh. ESB is much larger in terms of feature set and back-end system support. ESB typically focuses on large enterprises and industry standards and protocols, whereas service meshes are lightweight enough to add value to the first few microservices.

Talwar: ESBs were about centralized architecture, where a central piece carried all the intelligence to make decisions. Over time, the central piece became complicated and fit poorly in a microservice architecture where each team/service wants to configure, test, deploy, and scale their services at a rapid pace. The new architecture of service meshes represents an inversion of the SOA pattern from dumb endpoints and smart pipes (large, monolithic, hierarchical apps) to smart endpoints (service-specific functions) with dumb pipes.

Shkuro: The relationship between ESB and service mesh is similar to the relationship between monolithic and microservices-based applications. They both serve a similar function; the main distinction is in how they are doing that. ESB is a single system sitting between all other services in the architecture. It provides a single point of control over every message exchange between services. It also introduces a single point of failure and increases the latency of all communications. In contrast, a service mesh implemented via sidecars performs the same functions but in a distributed, decentralized manner. The control plane of a service mesh provides the same centralized authority of policies and routing decisions as ESB, while not being on the critical path of every request. The data plane is implemented by the sidecars running alongside application code. For example, in a typical Kubernetes setup, each microservice instance runs in a pod next to its own copy of the service-mesh sidecar. All traffic in and out of the microservice passes through that instance of the sidecar, with no hard dependencies on other centralized subsystems.

Gould: The goals are not all that different, but the priorities and implementation details are extremely different. ESBs tend to be implemented as a centralized, single point of failure, whereas a service mesh like Conduit uses sidecar proxies to be explicitly decentralized and scalable. Furthermore, it’s possible to use it in only a small part of an application, meaning that adoption can be incremental and doesn’t require total architectural lock-in. Finally, the service mesh is focused heavily on the operational aspects of communication and tries to avoid any real awareness of the details of the application’s business logic. The goals of the service mesh are operational, not architectural or integrational.

InfoQ: Who in the enterprise should care about a service mesh? Is this something a typical developer should be aware of when deploying applications?

Klein: The idea behind the service mesh is largely to make the network abstract to application developers. Application developers still need to understand general networking concepts such as retries, timeouts, routing, etc. (since they will be involved in configuration), but they shouldn’t need to know how they are implemented. Thus, the typical developer should care about the service mesh because it means they can delete a lot of one-off networking and observability code and obtain a uniform, more feature-rich, and more reliable solution for free!
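As a rough illustration of the one-off networking code Klein describes deleting, the sketch below shows a hand-rolled retry helper with a per-attempt timeout and exponential backoff, the kind of logic that tends to be reimplemented in every service and every language. It is a minimal sketch under stated assumptions, not code from any panelist or mesh project: `call_with_retries`, `TransientError`, and every parameter name are hypothetical and chosen only for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are assumed safe to retry."""


def call_with_retries(
    operation: Callable[[float], T],
    max_attempts: int = 3,
    timeout_s: float = 1.0,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(timeout_s), retrying transient failures with backoff.

    The operation receives the per-attempt timeout and is expected to
    enforce it itself (for example, by passing it to an HTTP client).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(timeout_s)
        except (TransientError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Exponential backoff: base, 2 * base, 4 * base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With a sidecar proxy in place, the application would make a single plain call and leave the attempt count, timeout, and backoff to mesh configuration, which is why developers still need to understand these concepts even when they no longer implement them.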
Berg: Using service mesh and a strong cloud platform, smaller companies can create apps and features that previously only larger companies could dedicate resources towards, under the traditional model of using customized code and reconfiguring every server. Cloud, service mesh, and microservices give developers the flexibility to work in different languages and technologies, resulting in higher productivity and velocity. A typical developer should be aware that they are participating in a service mesh and understand they are communicating with other services in the mesh. They should embrace the fact that the service mesh helps them avoid features that require complex coding, like routing decisions, because it is done at the mesh level, not in the application itself. This ultimately allows the developer to be more productive. The telemetry information as well as the ability to inject failures is a powerful development tool to detect problems and, ultimately, eliminate them from the application.

Sharma: Infrastructure and platform teams are often the folks who design and implement service meshes in software organizations. It is critical for those teams and their engineering leadership to work together on the best strategy and implementation for the company. While service meshes improve application developers’ productivity by decoupling network communication from the services, developers should be aware of the specific service discovery and observability features being offered. This will help the developers know what will work automatically and which functionality they need to customize. For instance, if the service mesh is instrumented with OpenTracing, developers are guaranteed top-level observability across the system. They can then choose to instrument their services with OpenTracing to get more detailed traces of bugs or performance degradations.

Evenson: A service mesh should be transparent to the developer and the services that it provides are treated as a feature of the
platform. Operators will, however, have an interest in service meshes as they are another piece of the stack that requires care and feeding.

Talwar: One of the interesting aspects of service mesh is that it brings many diverse stakeholders together: developer, operator, prod security, network ops, CIO, CTO, etc. As for a developer, when the service mesh is done right in an org, the developer doesn’t have to write code for many common functions (ideally only business logic) and deployment into the fabric (with mesh) takes care of the functions (via policies) at runtime.

Shkuro: The service mesh solution is typically owned by the infrastructure/networking team. A typical application developer does not need to know much about it. They may need to know that to make a request to service X, they need to send it to the local port Y reserved for that service, or to send all requests to the same port but indicate the target service via an HTTP header or a special API of the RPC framework. Of course, in many organizations the same developer is also the on-call person for their services, which means it’s also useful to be aware of how to monitor the sidecar process in case of a problem. At Uber, we have a tool that automatically gives each service a dashboard displaying metrics from many infrastructure components used by the service, including metrics generated by the sidecar process, such as request and error counts, request latency, histograms, etc.

Gould: The enterprise should care because it brings a layer of standardization to runtime operations, similar to how Docker and Kubernetes provide standardization of runtime operations. Platform operators, the folks bringing Docker and Kubernetes into organizations, love the service mesh because it gets them out of the critical path for debugging and operating microservices. Developers (and, more generally, service owners) benefit because it allows them to decouple their application code from operational logic that belongs in the
runtime. The mesh provides operational affordances that allow developers to move more quickly, with less fear of breaking things.

InfoQ: How do service-mesh solutions support resiliency in terms of service retries, timeouts, circuit breaking, failover, etc.?

Klein: The sidecar proxy implements a vast array of advanced features such as service discovery, load balancing, retries, timeouts, circuit breakers, zone-aware routing, etc. on behalf of the application. These features are very difficult to get right and microservice codebases are typically littered with buggy or incomplete versions of them. It’s substantially more efficient to offload this type of functionality to a single entity that can be implemented once in a high-performance way and substantially vetted.

Berg: Application functionalities that are tightly coupled to the network, such as circuit breaking and timeouts, are explicitly separated from the service code/business logic, and service meshes facilitate those functionalities in the cloud and out of the box. Large-scale distributed systems have one defining characteristic: there are many opportunities for small, localized failures to turn into system-wide catastrophic failures. The service mesh is designed to safeguard against these escalations by using the agility and portability of cloud tools, such as containers, to shed load and fail fast when the underlying systems approach their limits. This is all done in the client-side proxy (sidecar) available in the application. The sidecar is responsible for forwarding a request to a service, where another sidecar proxy receives the request prior to forwarding it to the application. When the request is being made, the proxy will automatically trip the circuit breaker, and potentially reroute traffic to another version, when the upstream service is not reachable. Failures may occur because of poorly set timeouts between the services. A service mesh like Istio helps you avoid bad user experiences and outages from timeouts because Istio allows you to inject failures directly into the mesh, allowing you to test and validate your connection timeouts without having to guess.

Evenson: The service mesh data-plane component sits in-path of all data communications across all microservices. Given that placement, they are aware of the data mesh and hence can make policy-driven decisions that support resiliency features.

Talwar: Service meshes have two parts: a data plane and a control plane. Pluggable API-driven data planes like Envoy (used in Istio) allow configuration for retries and timeouts so these can be configured and changed easily. Envoy also lets you define configuration for circuit breakers as well as coarse and fine-grained health checks for all instances in the pool, for load balancing and routing away from failing or high-latency instances.

Shkuro: Many of these features would vary between specific implementations of the service mesh. The techniques themselves are not new; many are still an active area of research and innovation. What is special about service meshes is that they abstract these concerns from the application code and encapsulate them into a single infrastructure layer. Doing so keeps the application code lightweight, and allows service-mesh developers to iterate quickly and develop best-of-class solutions for these problems. For example, take the problem of failovers. When a certain service in a particular availability zone experiences problems, usually the safest approach to recover is to shift the traffic to another availability zone, provided that it has enough excess capacity. A service mesh can do that completely transparently to the rest of the services in the architecture, by changing a few settings in its control plane. To support this failover capability in every service would be a lot more difficult.

Gould: The single most important reliability feature provided
by a service mesh is L7 load balancing. Unlike L3/L4 load balancers, service meshes like Conduit are aware of per-request metadata and can help to automatically route around slow or failing instances, rack failures, etc. Once these load balancers are aware of service-level objectives (usually in terms of latency and success rate), they can make incredibly smart decisions about when traffic should not be sent to a given instance. The service mesh can also automatically retry requests for the application if that’s a safe thing to do. Note, however, that retries can actually make outages worse; you can get stuck in long-running retry loops that tie up resources and can cause system-wide cascading failures. So, it’s important to parameterize correctly, e.g., apply a budget-based approach to retries as we’ve done in Linkerd. This dramatically improves worst-case behavior.

InfoQ: How does a service mesh support security capabilities like authentication and authorization? How can it help with run-time enforcement of security policy?

Klein: Although most security teams would say that they want authentication and authorization between services, very few organizations end up deploying a solution at scale. This is because system-wide authentication and authorization are very difficult problems!
The service mesh helps greatly in this regard. Authentication can be deployed relatively easily using techniques such as mTLS and SPIFFE. Application/security developers need to specify policy but do not need to worry about how the underlying encryption and authentication are implemented. Similarly, the sidecar proxies can use authentication data derived from mTLS sessions to drive authorization at the L7 routing level, e.g., specifying that /service_a can only be accessed by service A and /service_b can only be accessed by service B.

Berg: This stems from a few key factors. A service mesh has a component that manages the certificate authority inside the mesh. This authentication component is responsible for programming the client-side proxies to automatically establish trust between services in the mesh using mutual TLS (transport layer security). If developed properly, these certificates will have a short lifespan so that if a service is compromised, there’s only a small security-breach window before the certificate gets recycled, rendering the original useless. A service mesh has security policies that you can program into the mesh. For example, you can set up a policy that restricts inbound internet traffic to some of the services in the mesh. If you only want to allow inbound internet traffic to service A, all other inbound internet traffic will be rejected if it deviates to a service other than A, as the client-side proxy intercepts all inbound and outbound traffic to the applications. A service mesh enforces strong identity assertion between services and limits the entities that can access a service. All this is done without changing a line of the application code.

Sharma: Service meshes create more flexibility and control at deployment time because fewer assumptions are baked into the application code. I think it would be best for the service-mesh providers in the panel to speak about their specific implementations for resiliency and authentication.

Evenson: The service-mesh
control plane can only provide features that are inherently supported on the platform on which the service mesh is running. In the case of a service mesh running on Kubernetes, authentication and authorization are expressed in the service mesh and converted to the underlying Kubernetes resources where they are enforced.

Talwar: Once service meshes intercept all the service-to-service communication, they can encrypt and strongly authenticate all communication with no developer involvement (a huge plus) and enable authorization policies for who can call whom. Since all traffic is flowing through the data plane of the service mesh, ensuring encryption for all supported/tunneled protocols and allowing/disallowing egress/ingress for each service can be enforced by the service mesh.

Shkuro: One great benefit of the sidecar approach is that its identity could be used interchangeably with the identity of the actual microservice, because the networking policy on the containers can be set up such that the microservice is not reachable by any other means except the sidecar process. This allows moving many security concerns into the sidecar and standardizing them across the organization. The authentication can be done exclusively by the sidecar, for example by terminating all TLS at the sidecar and using unencrypted communication between the application and the sidecar. The caller identity can be passed to the application code via a trusted request header, in case it needs to perform additional advanced authorization. Some simple forms of authorization, such as “only service X is allowed to access my endpoint Y”, can also be moved completely into the sidecar process and controlled via centralized policies. Those policies can even be updated at run time without affecting the application code.

Gould: Once orchestration is in place via, e.g., Kubernetes, the traditional network segmentation approaches to identity start to break down. The service mesh makes it possible for services to regain a
consistent, secure way to establish identity within a datacenter and, furthermore, to do so based on strong cryptographic primitives rather than deployment topology. Conduit, for example, can provide and/or integrate with certificate authorities to automate the distribution of TLS credentials for services, so that when two mesh-enabled services communicate, they have strong cryptographic proof of their peers. Once these identity primitives are established, they can then be used to construct access-control policies.

InfoQ: What’s the on-ramp like for someone learning about and deploying service meshes today? Where are the rough edges that you expect to smoothen?

Klein: To be honest, it’s still early days. Successfully deploying a service mesh across a large microservice architecture is possible, but still requires quite a bit of networking and systems knowledge. As I’ve written about extensively, a service-mesh deployment is composed of the data plane and the control plane. The data plane touches every packet and performs load balancing, retries, timeouts, etc. The control plane coordinates all of the data planes by providing configurations for service discovery, route tables, etc. Data planes like Envoy, HAProxy, and NGINX are robust and fully production ready. However, developing and deploying a control plane and associated configurations that work for an organization is actually the difficult part. Envoy is a general tool that is used in a large number of deployment types. This means that Envoy has a dizzying array of options that to the uninitiated can be very intimidating. Unfortunately, adaptability is often at odds with ease of use. On the other hand, control planes that are more tightly tied to an organization’s development practices and tooling will likely have fewer options and more opinions, and will be easier for the majority of developers to understand and use. Thus, over time, I think that as microservice architectures
standardize on tooling such as Kubernetes, the service mesh will be a more “out of box” experience via control plane projects like Istio that build on top of Envoy.

Berg: Similar to adopting a cloud strategy, a service mesh is rich in capabilities and function, but it can be overwhelming if you try to use all of its capabilities on day one. You have to adopt a service mesh in bite-sized portions to avoid choking. For example, if you want visibility into the complexity of microservices, adopt a service mesh just for telemetry, not security or routing. Start simple and grow your use of the service mesh as your needs grow.

Sharma: Based on what we hear from the OpenTracing end-user community, service meshes are a welcome technology that makes microservices more robust. Currently, people are spending time understanding all the options, and the more educational material (such as this article) that is out there, the better.

Evenson: This really depends on the service mesh. One of the features of a service mesh is that you do not have to change your application to support the service mesh. Some rough edges are how application developers have to modify infrastructure definitions to deploy the data-plane component of the service mesh so that it can sit in the path of all data communications.

Talwar: Today, service meshes like Istio work seamlessly on platforms like Kubernetes, but there are rough edges when using them on other platforms. Another area of focus should be enabling users to incrementally try various parts of Istio, like just security or monitoring or resiliency, without the cognitive load of understanding other parts. I think more work on both these areas will make Istio more digestible and widely usable. One other rough edge is well-tested performance and production hardening, which is work that is underway in the Istio project right now.

Shkuro: Kubernetes and Istio make deploying a service mesh fairly straightforward, but for many organizations that do not use Kubernetes there is a bit of a
learning curve. To start with, the deployment system used in the organization needs to support the ability to run more than one container as a logical unit of a service instance (the concept of a pod in Kubernetes). Alternatively, the service mesh process can run as a single agent on the host, which solves some of the problems like routing and service discovery, but makes the other features like security and authentication impossible. From some informal conversations I had with other companies, running the control plane for the service mesh is probably the hardest part. The control plane needs to be deeply integrated with the rest of the architecture; e.g., it needs to be aware of service deployments in order to control service discovery, health checking, and load balancing/failovers. The Istio project is making great strides to abstract the control-plane functionality.

Gould: We’ve been supporting Linkerd in production for almost two years. We’ve learned a ton about rough edges, both some things we were expecting to learn and often things that are only obvious in retrospect. One surprising lesson was that while Linkerd is great at extremely high scale, it turns out many users would greatly benefit from a scaled-down approach optimized for simplicity rather than maximum power and maximum flexibility. It should be easy to get started with a service mesh. We want to reduce the complexity of managing distributed services, not add to it. That insight led to our recent work on Conduit, and if https://conduit.io/getting-started/ isn’t all you need to get up and running, I’d love to know why.

More generally, I think the pitfall of adopting a service mesh is trying to do too much at once. Service-mesh adoption needs to be incremental in its criticality and in the scope of problems it solves. Service meshes are a new sort of tool, and the most successful adoptions we’ve seen have been incremental. Our advice is to keep it as simple as possible (but no
simpler).

InfoQ: Can you discuss the sidecar design pattern and what platform capabilities can be implemented using a sidecar?

Klein: As I’ve discussed above, the sidecar proxy is the key to abstracting the network for applications. We consider localhost networking to be reliable. Using that assumption, the application only needs to be aware of its sidecar proxy, and the proxy can handle everything else (other than context propagation).

Berg: The sidecar is a client-side proxy that is deployed with every application (i.e., container). Deploying a service with the sidecar automatically enrolls that service in the mesh. The sidecar is conceptually attached to the parent application and complements it by providing platform features. Thus, the sidecar provides a critical network-control point. With this design pattern, your microservice can use the sidecar either as a set of processes inside the same microservice container or as a sidecar in its own container to leverage capabilities such as routing, load balancing, resiliency (such as circuit breaking and retries), in-depth monitoring, and access control.

Sharma: The sidecar pattern is basically the plugin or driver pattern, but for platforms. By abstracting the implementation details away from the application for networking, metrics, logging, and other subcomponents that have standard interfaces, operators get more control and flexibility in how they craft their deployment.

Evenson: The sidecar design allows you to manipulate the Linux runtime environment without having to change your application code. Typically, a service mesh’s data-plane component is deployed as a sidecar, and the routing for that network namespace on the Linux kernel is modified to route all ingress/egress via the data-plane component.

Talwar: The sidecar pattern is a pattern where a co-process/container image sits next to the application and can act as a trusted partner. It can be updated independently and managed by a separate team but shares a
lifecycle with the application. Platform capabilities that a sidecar can take on include logging, reporting, authentication, policy checks for quota, etc.

Shkuro: There is no strict agreement in the industry about what constitutes a sidecar pattern. For example, Brendan Burns from Google would consider the service-mesh sidecar we discussed here to be an example of the ambassador pattern, because it is only concerned with how the application communicates with the rest of the world, while Microsoft Azure documentation uses a more generous definition that includes many peripheral tasks, including platform abstraction, proxying communications, configuration, logging, etc. Personally, I prefer the latter definition, where the ambassador pattern is a subclass of the sidecar pattern.

In essence, the sidecar pattern recommends extracting common functionality from business applications and packaging it into another process that runs in a sidekick container. It’s a well-known principle of decomposition. By extracting the common pieces into reusable containers, we free the applications from having to re-implement those features, potentially in multiple programming languages. It is similar to breaking legacy monolithic applications into separate microservices, except that the sidecar lifecycle is the same as that of the parent service and we use sidecars mostly for infrastructure-related functions.

Gould: Fundamentally, a sidecar is just another container. It’s nothing magical. With sidecar proxies, we’re able to manage operational logic as close to the application as possible, without actually being inside application code. The entire point of a service mesh is to decouple your applications from the operational aspects of managing service communication effectively and reliably. With the sidecar pattern, we can do things like provide and validate identity for applications, since the sidecar necessarily has the same level of privileges as the service for which it proxies. That’s the biggest difference between sidecars and, e.g., per-host deployments.
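
As a rough illustration of the sidecar-enforced authorization the panelists describe (for example, the rule “only service X is allowed to access my endpoint Y”, held in centralized policies that can change at run time), the following minimal Python sketch shows a default-deny policy table that a sidecar-like component could consult before forwarding a request. Every name in it (`Policy`, `SidecarAuthorizer`, `service-x`, `/endpoint-y`) is hypothetical and is not taken from Istio, Envoy, Linkerd, or Conduit. It also assumes the caller identity has already been authenticated, which in a real mesh would typically come from mutual TLS rather than a plain string.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """One allow rule: `caller` may invoke `endpoint` on the local service."""
    caller: str
    endpoint: str


@dataclass
class SidecarAuthorizer:
    """Hypothetical stand-in for the policy check a sidecar could perform."""
    policies: set = field(default_factory=set)

    def update(self, policies):
        # Replace the whole rule set; the application code is not involved.
        self.policies = set(policies)

    def is_allowed(self, caller_identity, endpoint):
        # Default deny: only explicitly listed (caller, endpoint) pairs pass.
        return Policy(caller_identity, endpoint) in self.policies


authorizer = SidecarAuthorizer()
authorizer.update([Policy("service-x", "/endpoint-y")])

print(authorizer.is_allowed("service-x", "/endpoint-y"))  # True
print(authorizer.is_allowed("service-z", "/endpoint-y"))  # False
```

Because the check lives outside the application, swapping the rule set via `update` changes who may call the endpoint without redeploying the service, which is the property the panel highlights.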