Technical Deep Dive into Service Mesh Networking (Cloud Next ’19)


HARVEY TUCH:
Afternoon, everybody. And welcome to a
technical deep dive into service mesh networking. The session today is
going to be in four parts. Act one, service mesh networking, in which I will provide an overview of the service mesh and describe the role the Envoy proxy plays in the service mesh data plane. In part two, my colleague Arun will talk about Google’s managed control plane for the service mesh, Traffic Director. This was mentioned in the keynotes yesterday and in a fantastic session by [INAUDIBLE], which I recommend watching on YouTube if you haven’t. Then Larry Peterson, CTO of the Open Networking Foundation, will describe some novel use cases for the service mesh at L2 and L3, where service mesh concepts have been applied in the telecommunications industry. Finally, we’ll wrap
up with some Q&A. By way of introduction,
I’m Harvey Tuch. I’m a Googler. I work on Envoy platform
issues at Google. I’m also an open
source contributor in Envoy and a maintainer. And I’m super passionate
about Envoy and its role in the service mesh. Why do we care about the
service mesh and Envoy? Part of this is driven by this
move towards microservices. This is pretty uncontroversial. There’s an industry-wide shift underway right now from managing your applications like pets to cattle,
or in the case of this slide, an adorable
basket full of puppies. And, you know, this
has brought huge wins for development velocity,
orchestration, making our systems more dynamic. But, on the other hand, there
is a perceived and actual cost in terms of complexity
at the networking level. Enter the service mesh. In the service mesh, we abstract away all this network plumbing and the nuts and bolts. And instead, we start
reasoning at the granularity of applications
and microservices. We think in terms of application
security, service discovery, and observability. How does this work? Well, one common topology
for the service mesh is the sidecar proxy. We inject into a container running alongside the application an additional process or binary, which is essentially the gateway to the service mesh for that application. It contains all the network function goodies that are required to participate in the service mesh, from service discovery to authentication, authorization, rate limiting, tracing, and so on. All these network
functions are factored out of the individual
applications so that they don’t need to be
reimplemented multiple times in each different
language, or framework, or microservice that
you might be writing. There are alternative
topologies. We won’t talk too
much about them today. But just to give
you an idea, you can also imagine
running a library which contains this functionality. And this is typified by gRPC-LB, which is a fantastic example of client-side, lookaside load balancing, where instead of hopping through an additional process, you remain in process and use your RPC library as, essentially, your gateway to the service mesh. This has advantages in terms of orchestration and reduced performance overhead. Another alternative service mesh topology that is sometimes seen is the
middle proxy configuration, in which legacy applications
in brownfield deployments are bridged to the service
mesh via an external proxy. They may not be
running in containers. You can’t re-link these
against some library to provide the service
mesh capabilities. Envoy is the leading open
source service mesh proxy. It runs as a sidecar proxy,
as we saw a few slides ago. And we’re going to just dive
into some of the details of what Envoy is about. So a good starting place is
what is Envoy, concretely? And if we look at its architecture and design philosophy, it’s a codebase built for performance, predictability, and low overhead. It’s written in C++. It’s a midsized project. It’s written in such a low-level language because we care about things like tail latency and minimizing overhead on the fast path in the data plane. We care about reducing memory consumption, since we’re running in a Docker environment. We also care about security and reliability. Being in a low-level language, it’s also vital that you apply best practices when it comes to software engineering. We have incredibly high test coverage. It has developed a community which has borrowed its best practices from companies like Google and Lyft, in the industry, to produce reliable, and
stable, and secure software. Envoy is not just a proxy. And, in a few slides
down, I’m going to talk about Envoy
as a platform. So keep that in mind. It does have a bunch of sort
of goodies built into it, including advanced
load balancing, and health-checking algorithms,
support for modern protocols first– so it doesn’t have
too much legacy cruft from ancient network protocols– and integrates well with
other folks and players in the service mesh
infrastructure space– systems such as Prometheus,
or Datadog, or Zipkin. So we’ve covered,
like, what Envoy is. And the question is
like who is Envoy. What is this community? Who’s behind it? Who uses Envoy in production? This slide sort of captures some
of the contributions to Envoy at [INAUDIBLE] by
number of lines of code. Envoy’s provenance is Lyft. It was created by the project founder Matt Klein at Lyft to solve the service mesh networking needs of Lyft. It was open-sourced roughly
two and a half years ago. Since then, there’s been a
significant uptake in interest, in particular from Google. And this long tail
of developers– Google’s the green
bar you can see here that’s second from the left. Lyft is the pink. And there’s a huge number
of companies which make our community so awesome today. They contribute all kinds of
integrations and improvements to the core Envoy proxy. These are traditional
enterprise companies. They’re SaaS providers. And they’re startups. If we look here at just the
lines of code over time, clearly Envoy’s
become much larger. But it’s also become
much more diverse in terms of its contributors. We start on the left,
back in the end of 2016 with just Lyft in
that sea of pink. In late 2017, in the middle, you can see Google and Lyft were the main contributors. On the right, the community is now
pretty well balanced between this long tail of the
rest of the community, Google, and Lyft. Envoy is used much more widely
in production than just in development in the community, though. You can see here some
familiar brand names you’ll find on the
Envoy landing page for folks who are running
Envoy in production and are proud to say so. There is also a huge
number of other folks who are using Envoy in
these kinds of roles, for example SaaS or cloud
infrastructure providers, and cloud service providers
who might be using it either directly, or who might be using it because it is embedded in other products. Envoy, for example, ships with Istio. Anyone using Istio is, by extension, using Envoy. Why is Envoy being so successful
and being used so widely? Well, part of this is
because Envoy is a platform. It’s versatile. It can meet the needs
of various use cases and be reconfigured
as needed. This platform-like nature arises from two aspects of Envoy. First, its control
plane, which is built around a set of
open gRPC-based protocols called xDS, allowing
Envoy to integrate not only with well-known control
planes like Istio and Traffic Director, but also with
custom in-house network infrastructure that
might need Envoy to fit within its existing
configuration pipeline. Envoy’s data plane
is also extensible. That is, all the traffic
flowing through Envoy can be interposed on by
extensions that you can write and that the community
has written to add in all kinds of functionality. This slide lists a dozen or so of the extensions that exist today in Envoy. These range from the meat and
potatoes of its TCP and HTTP filters. These filters
provide the ability to do things like transcoding
between different protocols, let’s say gRPC and JSON REST,
or to apply additional control over the stream of traffic,
for example rate limiting or fault injection. There are also more advanced extension features for things like replacing transport security. We actually have a current PR introducing an extension point for hardware security modules. Envoy’s extensions go beyond just the data plane. Extensions also exist for things like monitoring. So you can integrate with
pretty much any monitoring system that exists. Envoy also ships with
batteries included. So it already has integrations
with many of the systems that you might care about,
for example Prometheus. Today, extensions are written
in C++ primarily, and you statically link them
against the Envoy binary. But there’s actually a
very promising trend, which we’re seeing right now, in
which the extensions are being offered in dynamic languages
like Lua, injected at runtime into Envoy via the control plane. This makes Envoy a programmable data plane. And, in the future,
we anticipate that WebAssembly
support will arrive. And we’ll be able
to basically write extensions in any
language that you want, and deliver them dynamically
to a running Envoy process. Its control plane is
built around this alphabet soup of protocols,
which we call xDS. These sub protocols
allow things like service discovery, load balancing
assignments, load reporting, delivery of route configuration,
health checks, and so on. They’re canonically
defined and built around gRPC streaming
and protocol buffers. And we have REST,
YAML and JSON variants for folks who prefer
those technologies. They have a range of
consistency models depending upon the use case
and how your control plane and management server are configured. We view these protocols,
though, as not just being Envoy’s control plane, but
being a sort of a lingua franca for network proxies in general. That is, that they are
the universal data plane API for L4 and L7. Once you have sort
of an ecosystem, sort of management service
and control plane which are capable of
running Envoy’s, you can run pretty much anything
that implements these APIs. This might include,
for example, gRPC-LB, which is adopting the xDS APIs
for client-side load balancing for those clients [INAUDIBLE]
be service meshes. It could include in the future
things like hardware load balances or other
software proxies, which choose to adopt
Envoy’s APIs. And there is an active
effort right now in the community on making
these APIs structured and versioned appropriately
to enable splitting off concerns which are
specific to Envoy from their universal nature. Another way to
conceptualize this is that this is really the
analog of OpenFlow or P4 for L4 and L7. With OpenFlow, you can reconfigure, let’s say, IP routing via an OpenFlow controller with SDN. You can do the same with Envoy
and its data plane via xDS. I’d like to just
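To make the xDS discussion a bit more concrete, here is a rough stand-in for what an endpoint-discovery-style (EDS) response carries: a cluster name plus locality-grouped, weighted endpoints. The real message is a protobuf (a ClusterLoadAssignment) delivered over a gRPC stream; the dict below is a simplified illustration with made-up names, not the actual schema:

```python
# Simplified stand-in for an EDS-style response; the real xDS message is a
# protobuf streamed over gRPC, and these field names are abbreviations.
eds_response = {
    "cluster_name": "payments",
    "endpoints": [
        {"locality": {"zone": "us-central1-b"},
         "load_balancing_weight": 80,
         "lb_endpoints": [{"address": "10.0.0.1", "port": 8080}]},
        {"locality": {"zone": "us-east1-b"},
         "load_balancing_weight": 20,
         "lb_endpoints": [{"address": "10.1.0.1", "port": 8080}]},
    ],
}

# A client re-resolves the "payments" cluster each time the stream pushes
# an update; weights steer how traffic is spread across localities.
total_weight = sum(e["load_balancing_weight"] for e in eds_response["endpoints"])
print(total_weight)  # 100
```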
finish off briefly by touching on the relationship
between Envoy and Istio. This is one of the most
famous associations that exist for Envoy. And it functions pretty much as a standard sidecar proxy. This is also an interesting case study in Envoy’s extensibility. First, it’s managed by the Istio control plane, called Pilot, over the xDS protocols. And it also features a number of specific Istio extensions that are linked against the Envoy binary to allow it, at runtime, to interpose on network traffic and funnel it via gRPC to the Mixer service, which is part of the Istio runtime. And there, they can enforce
additional policy checks, and logging, and
telemetry, and so on. Many folks are comfortable
running Envoy themselves, particularly
sophisticated users who need to use it in very
custom applications, and Istio as well. Other folks would prefer
that their Envoys and Istio service mesh control planes
be managed. And I think this really leads
into Traffic Director, which is Google’s managed control
plane for both Envoy and Istio. So my colleague Arun is going
to talk a bit more about Traffic Director now. So I’ll hand it over to him. ARUNKUMAR JAYARAMAN:
Thanks, Harvey. I’m Arun, a software
engineer at Google. So Harvey talked about– Harvey talked about
Envoy, and data plane, and xDS API as universal
data plane API. The xDS API is often coupled
with the sophisticated control plane to drastically reduce
the operational complexity of deployments. In that context, I would like to
introduce Traffic Director, which is a globally deployed, fully GCP-managed xDS control plane that serves xDS-compliant data planes like Envoy. Traffic Director serves network policies and globally load-balanced, centrally health-checked endpoints to the xDS client via the xDS API. For global load balancing,
Traffic Director has a full view of the load
originating from every GCP zone and the capacity
available in every GCP zone for the service deployed. And with that knowledge
and information, Traffic Director has the
ability to route clients to the closest
zones with capacity, and the proximity is measured
based on network latencies. And when the region is
at capacity, or if there are failures in the
region, Traffic Director has the ability to route
clients to the closest zones in the adjacent
regions, or nearby regions. The load information and
the capacity information in Traffic Director is fed
into a autoscaler as well, so this allows autoscaler
to do traffic-demand-driven autoscaling. And this autoscaler
scales up the instances. And Traffic Director,
in the interim, has the ability to route
client requests to nearby zones, thereby efficiently serving the
service while the autoscaler scales up the instances. Traffic Director also performs
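The zone-selection logic just described, route to the closest zone that still has capacity, can be sketched like this. The zone names, latencies, and capacity numbers are made-up illustration values, not Traffic Director’s actual data or algorithm:

```python
# Sketch of proximity-plus-capacity routing. Zones, latencies, and
# capacities are hypothetical; Traffic Director's real algorithm is internal.

def pick_zone(client_zone, zones, latency_ms):
    """Pick the closest zone (by network latency) that still has capacity.

    zones: dict of zone -> {"in_use": int, "capacity": int}
    latency_ms: dict of (client_zone, zone) -> measured latency
    """
    candidates = [z for z, load in zones.items()
                  if load["in_use"] < load["capacity"]]
    if not candidates:
        return None  # every zone is at capacity
    return min(candidates, key=lambda z: latency_ms[(client_zone, z)])

zones = {
    "us-central1-a": {"in_use": 100, "capacity": 100},  # local zone is full
    "us-central1-b": {"in_use": 40, "capacity": 100},
    "us-east1-b": {"in_use": 10, "capacity": 100},
}
latency_ms = {
    ("us-central1-a", "us-central1-a"): 1,
    ("us-central1-a", "us-central1-b"): 2,
    ("us-central1-a", "us-east1-b"): 30,
}
# The local zone is at capacity, so traffic spills to the next-closest zone.
print(pick_zone("us-central1-a", zones, latency_ms))  # us-central1-b
```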
centralized health checking. So this eliminates the need
for every client in the network to health check
every other back-end, which we call N-squared health checking. And this could be a significant
source of network traffic in your deployment. Finally, the xDS API is
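A back-of-the-envelope comparison shows why centralizing the health checking matters; the counts below simply assume every client probes every backend in the distributed scheme:

```python
def n_squared_checks(clients, backends):
    # Distributed scheme: every client health checks every backend.
    return clients * backends

def centralized_checks(backends, probers=1):
    # Centralized scheme: a small, fixed pool of central probers checks
    # each backend instead (the prober count here is illustrative).
    return probers * backends

# 1,000 sidecar clients, 1,000 backends:
print(n_squared_checks(1000, 1000))          # 1000000 health-check flows
print(centralized_checks(1000, probers=3))   # 3000 health-check flows
```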
served by Traffic Director via the Google front-end. This is the same
front-end that is used to serve other major
Google services, like Gmail, and so on. So the Google front-end
has the intelligence to route the xDS client
request to any Traffic Director instance globally
that has capacity. And independent of which
Traffic Director instance serves the request,
the load balancing, the global load
balancing for the client is always based on
the client’s zone. So I want to dive
a little bit deeper into the API model for
the Traffic Director. The GCP data model is
the same as HTTPS LB. In fact, under the
hood, Traffic Director uses the same
infrastructure as HTTPS LB for global load balancing and
centralized health checking. So we have a new load
balancing scheme, called INTERNAL_SELF_MANAGED, that helps direct configuration targeted at Traffic Director. The global forwarding
rule, in the context of Traffic Director, can be thought of
configuration for one entity. For example, it may refer to a
configuration of a Kubernetes service. And if the user wants
to create a mesh, that is a combination of
multiple configurations, the user can do that by
combining multiple forwarding rule configs with the
same VPC network name. So in some sense,
the VPC network name acts as a logical identifier
for the mesh name. The URL map has the request
routing rules and actions, pointing to the
service represented by the back-end
service data model. The back-ends are added
to the back-end service. The back-ends could either be
a VM represented by an instance group or data model, or
it could be the container IP ports represented by a data
model called network end point groups. In orchestrated deployments,
like Kubernetes and GKE, as the end points get scheduled,
as the ports get scheduled, the network end point– a controller called
the NEG controller is responsible for keeping
the network end point group updated, so that the
back-end service has the proper reflection of end
points to service mapping. So this allows Traffic Director
to homogeneously manage the VM and the
container deployments. So far we’ve talked about the
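The resource chain just described, forwarding rule to URL map to backend service to instance group or network endpoint group, can be pictured as plain data. These dicts are simplified stand-ins with made-up names like "payments-bes", not the real GCP resource schemas:

```python
# Illustrative sketch of the Traffic Director configuration chain; the field
# names are simplified stand-ins for the real GCP API fields.
mesh_config = {
    "global_forwarding_rule": {
        "load_balancing_scheme": "INTERNAL_SELF_MANAGED",
        "network": "my-vpc",   # the VPC network name doubles as the mesh identifier
        "target": "url-map-1",
    },
    "url_map": {
        "name": "url-map-1",
        "host_rules": [
            {"host": "payments.example.internal", "backend_service": "payments-bes"},
        ],
    },
    "backend_services": {
        "payments-bes": {
            "backends": [
                {"type": "instance_group", "name": "payments-vms"},          # VMs
                {"type": "network_endpoint_group", "name": "payments-neg"},  # container IP:ports
            ],
        },
    },
}

def resolve(config, host):
    """Walk the chain: forwarding rule -> URL map -> backend service."""
    for rule in config["url_map"]["host_rules"]:
        if rule["host"] == host:
            return config["backend_services"][rule["backend_service"]]
    return None

service = resolve(mesh_config, "payments.example.internal")
print([b["type"] for b in service["backends"]])
# ['instance_group', 'network_endpoint_group']
```

Note how a single backend service homogeneously mixes VM backends (instance groups) and container backends (NEGs), which is the point being made above.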
Traffic Director beta features. So now I would like to introduce
Traffic Director’s traffic control features that are alpha. Traffic control is
a set of features that helps you control the flow
of traffic in your deployment. It allows you to
specify matching rules for your incoming
request, and allows you to set up actions
for the matched request. And then at a service
level, per service, it allows you to set
traffic policies that help in traffic shaping and in routing the incoming request
to your target destination service. Not necessarily related
to traffic control, but as the number of
configurations grows, you may need to test your configuration through a test pool of [INAUDIBLE],
or perhaps your mesh has [INAUDIBLE] of
different config scope. And in such cases,
you can potentially use a configuration
filtering option where you can target
the configuration to a subset of
[INAUDIBLE] identified through their metadata. So let me dive a little deeper
into each of these features. For request matching, you have a set of rules to specify what to match on for the incoming request: you can set up a host match,
you can set up a path match. And there are variants like
prefix match, suffix match, and regular
expressions and so on. Or you can set up matches based
on anything in your header, like in cookies, user
agents and so on. And, once a match is found,
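The match kinds just listed, host, path, and header matches, boil down to a predicate like the following. The rule and request field names here are illustrative, not the real API’s:

```python
def matches(rule, request):
    """Return True if the request satisfies every condition in the rule.

    Supports the match kinds described above: host match, path-prefix
    match, and header matches (e.g. cookies, user agent).
    """
    if "host" in rule and request["host"] != rule["host"]:
        return False
    if "path_prefix" in rule and not request["path"].startswith(rule["path_prefix"]):
        return False
    for name, value in rule.get("headers", {}).items():
        if request.get("headers", {}).get(name) != value:
            return False
    return True

rule = {"host": "shop.example.com", "path_prefix": "/checkout",
        "headers": {"user-agent": "mobile-app"}}
req = {"host": "shop.example.com", "path": "/checkout/cart",
       "headers": {"user-agent": "mobile-app"}}
print(matches(rule, req))  # True
```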
you can perform certain actions on the match request. The standard actions are
redirects, rewrites, header transforms, or
sending the traffic to a particular service. But I want to talk about
a few interesting use cases and the actions. So let’s say you have a binary
that you want to canary, or you have a monolithic binary
and you are starting to have– modernize your application by
containerizing it and going to a microservice model. You can use an action– or traffic action called traffic
splitting, where you can set up a small percentage of traffic
to go to your new deployment, or the new binary,
and over time, as you qualify the binary,
as you get more confidence, you can send more and more
traffic to the new binary, enabling a smooth transition. So let’s say you
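The traffic-splitting action described here amounts to a weighted random choice per request; a minimal sketch, with made-up service names and percentages:

```python
import random

def pick_backend(splits, rng=random):
    """Weighted traffic split: `splits` maps service name -> percentage."""
    assert sum(splits.values()) == 100
    roll = rng.uniform(0, 100)
    cumulative = 0
    for service, percent in splits.items():
        cumulative += percent
        if roll < cumulative:
            return service
    return service  # fall through on the boundary case

# Start the canary at 5% and ramp it up as confidence grows.
splits = {"monolith-v1": 95, "microservice-v2": 5}
counts = {"monolith-v1": 0, "microservice-v2": 0}
rng = random.Random(42)  # seeded for a reproducible demo
for _ in range(10_000):
    counts[pick_backend(splits, rng)] += 1
print(counts)  # roughly a 95/5 split over 10,000 requests
```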
want to understand the resiliency of
your microservice when one of your
core dependencies starts responding with errors. You can set up an action
called fault injection where you can specify
thresholds and type of errors that your service
would respond with. And using that,
you can figure out how your deployment behaves in
the presence of those errors. So even at Google, we have needs
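Fault injection as described, respond with a chosen error type for a configured percentage of requests, can be sketched as follows (the 10% threshold and 503 status are illustrative choices):

```python
import random

def maybe_inject_fault(percent, status=503, rng=random):
    """Return an injected error status for `percent`% of requests, else None."""
    if rng.uniform(0, 100) < percent:
        return status
    return None

rng = random.Random(7)  # seeded for a reproducible demo
injected = sum(1 for _ in range(10_000)
               if maybe_inject_fault(10, rng=rng) is not None)
print(injected)  # roughly 1,000 of 10,000 requests get a 503
```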
where we have test [INAUDIBLE] service. And instead of using synthetic
or simulated traffic, sometimes there is a need
to have production traffic and test out your test binaries. To do that, you can do something
called traffic mirroring, where you can use the traffic control
features to set up traffic to be mirrored to
a shadow service without impacting your
production deployments, and the shadow service could be
your test service and your test bed helping to qualify your
test binary with the production traffic. So we talked about Traffic
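The key property of traffic mirroring is that only the primary’s response reaches the caller, and shadow failures never affect production. A minimal sketch with hypothetical handler functions:

```python
def handle(request, primary, shadow):
    """Send the request to the primary service and mirror a copy to the
    shadow (test) service; only the primary's response is returned."""
    mirrored = dict(request)   # copy, so the shadow can't mutate the original
    try:
        shadow(mirrored)       # fire-and-forget in a real proxy
    except Exception:
        pass                   # shadow failures never affect production
    return primary(request)

shadow_log = []
primary = lambda req: {"status": 200, "body": "ok"}
shadow = lambda req: shadow_log.append(req["path"])

resp = handle({"path": "/orders/1"}, primary, shadow)
print(resp["status"], shadow_log)  # 200 ['/orders/1']
```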
Director’s global load balancing. So the global load balancing
helps pick out your zone with capacity. However, there may
be needs for the user to set up more fine grained load
balancing schemes, so we allow a per service load balancing
configuration that influences how a back-end is
picked within a zone, while the global load
balancing picks out the zone with capacity. So this two-tiered
load balancing enables you to have options like
round robin or weighted round robin, or affinity and such, and
fine tune your load balancing schemes while still leveraging
the global load balancing offered by Traffic Director. So onto the final slide. So as a service
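The two tiers just described can be sketched as: tier 1 (global balancing) has already picked the zone, and tier 2 applies a per-service policy, round robin in this example, within that zone. Zone and backend names are illustrative:

```python
import itertools

class TwoTierBalancer:
    """Tier 2 of the two-tiered scheme: given the zone chosen by global
    load balancing, pick a backend within it using a per-service policy
    (plain round robin here; ring hash, affinity, etc. are alternatives)."""

    def __init__(self, backends_by_zone):
        self._iters = {zone: itertools.cycle(backends)
                       for zone, backends in backends_by_zone.items()}

    def pick(self, zone):
        return next(self._iters[zone])

lb = TwoTierBalancer({"us-central1-b": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]})
print([lb.pick("us-central1-b") for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```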
owner, if you would want to have a good control
of the volume of connections open from a client
to your service, you can set up something
called circuit breakers, which is a way to apply back
pressure on the client and have the clients fail
fast rather than overwhelm
your back-ends. Oftentimes, it
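The circuit-breaking behavior described, cap in-flight connections and fail fast beyond the cap, reduces to a simple counter; the limit of 2 below is purely illustrative:

```python
class CircuitBreaker:
    """Cap the number of in-flight requests from a client to a service;
    beyond the cap, fail fast instead of queueing and overwhelming backends."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.in_flight = 0

    def try_acquire(self):
        if self.in_flight >= self.max_connections:
            return False   # fail fast: apply back pressure to the caller
        self.in_flight += 1
        return True

    def release(self):
        self.in_flight -= 1

cb = CircuitBreaker(max_connections=2)
print(cb.try_acquire(), cb.try_acquire(), cb.try_acquire())  # True True False
```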
takes a small amount of time for the control
plane to turn around with a globally
optimal decision. However, if you want a better
resiliency in your data plane, where if a particular
back-end instance goes down, or a VM goes down, and if you
want to respond to that quickly before the control
plane can detect the change in the
health state of the back-ends, you can set up something
called outlier detection, where in the data
plane you identify unresponsive back-ends, and
[INAUDIBLE] those back-ends until the control
plane turns around with a globally
optimal decision. So with that, I’m going to
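Outlier detection as just described, eject an unresponsive backend locally in the data plane until the control plane catches up, can be sketched like this; the consecutive-error threshold of 3 is an illustrative setting:

```python
class OutlierDetector:
    """Eject a backend from the local load-balancing set after N consecutive
    errors, without waiting for the control plane to push a new decision."""

    def __init__(self, backends, max_consecutive_errors=3):
        self.healthy = set(backends)
        self.errors = {b: 0 for b in backends}
        self.max = max_consecutive_errors

    def record(self, backend, ok):
        if ok:
            self.errors[backend] = 0
        else:
            self.errors[backend] += 1
            if self.errors[backend] >= self.max:
                # Ejected locally until the control plane turns around
                # with a globally optimal decision.
                self.healthy.discard(backend)

d = OutlierDetector(["backend-a", "backend-b"], max_consecutive_errors=3)
for _ in range(3):
    d.record("backend-b", ok=False)
print(sorted(d.healthy))  # ['backend-a']
```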
pass the presentation over to Larry who’s going to talk
about an interesting mesh use case and modernization
of telco infrastructure. LARRY PETERSON: I’m good. Thank you. [APPLAUSE] So now that you
understand service meshes, I’m going to break it,
or at least challenge it with a different use case,
which is the telco use case. So you need to have a
picture in your head, which is that it’s a multi-cloud
world, and, of course, we know there are a lot of
data center public clouds. You probably also know that
there are intermediate clouds. They live within the telcos. They live in exchange points. And the current push
is, I’m sure you’re aware if you’ve been following
the hype, is towards the edge. So the question
is, what is exactly going to happen here as
we get out to the edge? And how are we going
to do service meshes across this entire thing? Now, this particular picture
is a little bit aspirational because it shows a
really nice clean edge. The truth of the
matter is there’s a lot of legacy hardware
at the edge right now. Now, of course, it depends on
whose edge we’re talking about, but I’m going to talk about
the telco edge for a minute. So these are the central
offices of the telco, the head-ends of
the cable companies. And if you were ever to
go inside of one of those, you’d think you had
walked into a museum because it’s full
of hardware that goes back many, many years. As long as it’s generating
money, they keep it running. You might find 300
different kinds of hardware devices inside
of a telco central office. And so it’s clearly in
the telco’s best interest to try to modernize
that infrastructure. And what they’ve
been doing is they’ve been trying to commoditize,
virtualize, and disaggregate the access network. And by access
network, I’m talking about the cellular
network and the fiber to the home, the passive
optical networks. And so they’re
working hard at this. They basically are trying
to adopt the same technology that’s back in the
public cloud and apply it to these edges full of
these specialized devices. There’s a project at
the ONF that we’re involved with called CORD,
which is an umbrella, but it also is a
particular implementation, which is an acronym
for central office re-architected as a data center. But the idea is to take
those commodity virtualized type technologies and
bring them to the edge. And so that’s well in progress. A lot of disaggregation
has happened. It’s being put back
together again. And that does, in fact,
get us to the point that I’m going to
call this thing the Access-Edge, because
it’s a point where traditional conventional
cloud technology and access technology coexist. And so this is called the Access-Edge. Now, of course, from
a telco point of view, they are interested in offering
services to subscribers. So we have to think at a little
bit finer granularity every now and then in terms, not of
a service mesh but as a service chain. And in the near future,
that service chain is going to be distributed
across the Access-Edge, some intermediate telco clouds,
or internet exchange points, all the way back up
into the data center. And it will, in fact, go
onto the premises as well. So there will be an on-premises
aspect to the edges in addition to the Access-Edge. But one of the unique things
about the access network is that it also
supports mobility. And so these service
chains need to be able to migrate from one
edge to another, which, as you move functionality
closer to the edges, it means that that functionality
has to move as well. So if you pull back
for a second and think about the earlier generations
of the cellular network, 2G to 4G, it’s basically mobility
of the broadband access. But now if we’re going to move
functionality to the edge, we have to move the
functions as well. That doesn’t mean VM
migration, but somehow I have to support the fact that
you had something running on your behalf at one
edge and it may now need to run another edge. So, in effect, the cloud
becomes mobile as well as the broadband connectivity. So this is the big picture
of what the telco world is after, and that is to be
part of this end-to-end cloud environment. So they had to start
from the beginning. You all know microservices
because you’ve been breaking your
monolithic applications down for a long time. The telco industry is just
now trying to do that. So I wanted to spend just
a couple of minutes talking about that, because it’s
after you break it down that you get to put it back
together in a service mesh. Well, like I said,
these central offices are full of proprietary
closed bundled hardware. And it’s another alphabet soup. And if you know any of
these, good for you. But in the access
networks, we’ve got PONs and RANs and
eNodeBs and BNGs and so on. Not important what
those all are. It’s the various boxes– and again, they’re monolithic
closed pieces of hardware, and they’re being disaggregated. And so part of
that disaggregation is to split the control
plane from the data plane. This is standard SDN practices. Part of it is to break the
monolithic pieces of code into microservices. So that’s what the rest of
the cloud industry is doing. But the other thing
that’s kind of interesting here is, because,
first and foremost, there are communications
services running, sometimes you want to
take those microservices and drop them down into
the switching fabric. And so because the data plane
in modern white box switches is also programmable,
I can take what used to run in a
microservice and move it down into the forwarding plane. Now we’re talking
hardware forwarding plane. So these are all aspects
of disaggregation. It’s not just
microservices, it’s programming the forwarding
plane in the switches as well. Well, let me try to make
it slightly more concrete. So this is now a service mesh. Without going through the
details of how you break down any particular bit
of hardware, this is a service mesh for
the cellular use case. And, again, there’s
a bunch of acronyms. What’s important about
the color coding here is that everything in blue is
a function that’s implemented in the switching fabric,
everything that’s red is a function that’s
implemented in a container, in a microservice. The solid lines are data paths. Packets actually flow over
them on behalf of end subscribers. The dotted lines
are control paths. And this is clearly
an opportunity to bring traditional service
mesh technology to bear, so we could run Envoy on the
video archive and the CDN out at the edge. The other thing to note is
that this is, in fact, also, multi-cloud. So bits and pieces of it. And they can be configured
in different ways with bits and pieces. But some pieces will, in fact,
run at the Access-Edge, and pieces of it will run back
in, perhaps, a commodity cloud. So this is a
configuration that we’ve been working on in conjunction
with the service mesh world and the ONF to
try to demonstrate this in the cellular case. And by the way, this is just
representative in the sense that we assume there will
be other edge services. And by the way,
those services are the ones that need high
bandwidth, low latency, and deal with a scalable
number of internet of things, or whatever autonomous vehicles
are out there at the edge, and there will, of course, be
multiple cloud services running back in the data center as well. So we’ve blown these
things into bits. Now we have to put them
back together again. And this is where something
like a service mesh is an important tool. But operationalizing
this disaggregated system is a little bit different than
simply connecting together a bunch of microservices,
because there are other elements in here. So I wanted to walk through
that for just a few minutes. So I think of this as
the disruptor’s dilemma. And this is, in fact,
the critical problem that the telco industry
is facing right now. On the one hand, they wanted
to disaggregate everything because that’s what
spurs innovation. I’ve got smaller pieces. It’s not tightly bundled and
controlled by the vendors. That is, in fact,
the value proposition of open networking and SDN. On the other hand,
you can’t deploy a bunch of disaggregated
components, you have to integrate
them back together again. Now, in the telco world, because
what they’re used to doing is dropping in these
bundled solutions, they have a certain
mindset as to how that integration happens. Basically, they take the
disaggregated components and they repackage
them until they’re a monolithic solution again. And then that’s what
they’re prepared to drop in. But that kind of
defeats the purpose. You’re not doing
continuous integration of the individual
components if you fall back to that kind of hard
coded integration. And so the key focus here
is that they also automate the integration process. And so that involves
a declarative intent. I basically– and this is very
much in the spirit of service meshes– I want to lift the operational
parameters out of the services and up into a service
control plane leaving behind the implementation
of the service proper. So I separate the
development concerns from the operational
deployment concerns. I want to avoid those hard coded
dependencies between the pieces that I’ve just carved off. Second and consistent with that,
I want a centralized authority. I don’t want to go touch 80
pieces of code every time I make a change. I want to centrally
declare this is my intent, and then make it
so everywhere else. Otherwise, I’m going to end
up– instead of managing 300, I’m going to be managing
30,000 different microservices and disaggregated components
in the central offices. That’s not a good place to be. The third point is we have to
be implementation agnostic. And this is really
critical, because it’s not just microservices,
there’s going to be functionality implemented
in Brownfield hardware devices for some
time, there are going to be functions implemented
in virtual machines, there are going to
be microservices, and there’s going to be
functions that are implemented in the switching fabric. And so we can’t bake assumptions
into our service control plane about where things
are implemented. So the solution
that we have come to as of now, and
what we’re deploying in trials within the
major carriers that are partners of the ONF– just two words about the ONF. Our partners are our
network operators, and that’s a broadly
defined term. Google’s a member, AT&T,
Comcast, Deutsche Telekom and so on. So are the ones that
want to see this happen, and the set of vendors
that are supporting them to try to make it happen. But in that setting, we are
putting together solutions, and we have a service control
plane that’s called XOS. And the important thing
to take away from it is that it’s basically managing
a service data plane that has a collection of different
disaggregated components that it’s managing,
some of those are legacy virtual network
functions running in VMs, some of them are horizontally
scalable microservices, and some of them are
SDN control apps, and others are possible as well. They could have been
legacy SaaS, for that matter. And in a very
declarative way, you give XOS a schema
that says: this is how I want all of
these parts glued together in a logical space; make it
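so in the data plane.

To make that "declare intent, make it so" loop concrete, here is a minimal illustrative sketch in Python. The names here (Service, ServiceGraph, reconcile, and the example services) are invented for this example; this is not XOS's actual API, just the shape of the idea:

```python
from dataclasses import dataclass, field

# Declared intent: which services exist and how they are glued together.
# Each service records only *what* it is, not *where* it runs, keeping the
# model implementation-agnostic (VNF in a VM, microservice, SDN control app).
@dataclass
class Service:
    name: str
    kind: str  # e.g. "vnf-in-vm", "microservice", "sdn-control-app"

@dataclass
class ServiceGraph:
    services: dict[str, Service] = field(default_factory=dict)
    links: set[tuple[str, str]] = field(default_factory=set)

    def declare(self, service: Service) -> None:
        self.services[service.name] = service

    def connect(self, upstream: str, downstream: str) -> None:
        self.links.add((upstream, downstream))

def reconcile(intent: ServiceGraph, running: set[str]) -> tuple[set[str], set[str]]:
    """Compare declared intent against what is running; return (to_start, to_stop)."""
    declared = set(intent.services)
    return declared - running, running - declared

# Usage: declare the graph once, centrally, instead of touching 80 pieces of code.
graph = ServiceGraph()
graph.declare(Service("access-agg", "sdn-control-app"))
graph.declare(Service("subscriber-mgmt", "microservice"))
graph.declare(Service("firewall", "vnf-in-vm"))
graph.connect("access-agg", "subscriber-mgmt")
graph.connect("subscriber-mgmt", "firewall")

to_start, to_stop = reconcile(graph, running={"access-agg", "old-nat"})
# to_start == {"subscriber-mgmt", "firewall"}; to_stop == {"old-nat"}
```

You declare the graph once, centrally; a synchronizer then compares intent against reality and makes it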
so in the data plane. So it's a very consistent
model with what happens with service meshes. Now, let me pause there. Let's step back for a second. And hopefully you recognize
the parallels here. It’s the same story,
different level. And so the parallels
are at L7, we have a programmable data plane; it's programmed in proxies. At L2, L3, it's programming
ASICs in the switches. We have an API: it's
xDS in the L7 case; it's either OpenFlow
or P4Runtime, both of which happen in the CORD case. In the control plane, we
have something like Istio. In the CORD case,
we have something like ONOS, the Open
Network Operating System. So think of whatever your
favorite SDN controller is and substitute it there. By the way, you’ll
notice I left out L4. L4 could be solved in
either one of those. I didn’t want to
put it in either one as the absolute right
answer, but L4 is clearly where the two meet. So if you apply that
basic mapping of the world back onto where we are
with the service mesh, you find an obvious parallel
between ONOS and Istio. And Envoy is a
way of programming the data plane in the same
way that we can use P4Runtime to program
the switching fabric, and, therefore,
affect the data plane. Now, that kind of leaves
us in a little bit of a struggle as to what
we call these things, because I’ve just added
an extra level here. So this is a control plane. Does that make this
the management plane? That is certainly
one interpretation. But the interpretation
that I prefer is to say that we have
a network control plane and we have a service
control plane. And the difference is perhaps
subtle, but important: at the service level,
we also keep track of state for the services that are
inside of the gray boxes at the bottom. And that's really critical
because we're not just managing a set of microservices,
we're managing functionality on behalf of subscribers. And this actually gets us to
one of the challenging problems in the telco case, which I
think has some applicability to service meshes in general. Suppose this
were a service mesh– and that's just an arbitrary
one, except, again, I've color-coded
it so that there
are microservice-based services and there are switch-based
services. Then I have, on behalf
of different subscribers, or classes of
subscribers, something called a service chain,
which is an instance of each one of those services. Let's assume they're
multi-tenant services; instantiated on behalf of a particular
subscriber, a chain gives me a path through the
service mesh for that subscriber. And I have another
that runs on behalf of another subscriber. So in the networking world– in the access network
world, this is sometimes called network slicing. But I can give you some
guaranteed service, and I can give you customized
functionality on a subscriber by subscriber basis, or
perhaps on a subscriber class by subscriber class basis. This is really critical
for monetizing the edge. As you might imagine, it’s
a pretty important thing. But as we roll out– as
the telcos roll out 5G, the ability to control at
that kind of granularity, and not only provide
control at that granularity, but also diagnostics
at that granularity, and troubleshooting
at that granularity, is a critical requirement. So managing runtime
workflows, it’s not just about configuration parameters. It’s not just saying, here’s
how I want you to behave, go for it. It’s a runtime
operation as well. So the key takeaways. First of all, the access
network is being cloudified. It is being reduced to commodity
software, virtualization, disaggregation. The integration challenge
is the hard one, and that’s the same
challenge that service meshes are targeted at. Some of the specifics are
a little bit different, but at the end of the day,
these service meshes, I believe, will span both access
networks on premises and back into the public cloud. Second big point is that
access networks will be part of the multi-cloud. There is a really, really
interesting question here as to who’s
going to own the edge. And, in fact, if you look at
the two counter industries here in terms of
the cloud clearly moving towards the edge in
terms of the incumbent service providers are
already at the edge with the footprint in
the central offices, they’re clearly trying to turn
that into a small edge cloud and move back upstream, so
there is a collision course happening here. I have no idea how the
dust is going to settle, but I do believe access
technology is going to
be a critical part
critical part of the edge. And that’s simply because
of the capabilities that 5G will
eventually bring us. There's a whole lot of
potential value there, above and beyond just
assuming everything at the edge is
going to be Wi-Fi. As sub-points
to that: this is certainly going to be part
of the telco infrastructure, but, more
importantly, I think it's also going to be democratized as part of
the on-premises infrastructure at the same time. There is an
unlicensed band that’s going to be available
for 5G in the US, and it’s starting
to happen in Europe. Think of putting these capabilities
into manufacturing plants or parking lots; anytime you try to
automate drones, or whatever, this is going to be a
critical part of it. And it's because
of this intrinsic support for mobility, low latency, and
a scalable number of edge devices that this is going
to be critical. The final bullet
is, I do believe that service meshes will be
the unifying abstraction. We already are
making good progress in two different settings. The telco one is a
slightly different one and a slightly newer one, but I
think these are going to merge. We clearly need to model
services– and be agnostic about the
service implementation. When I say service in this
room, you immediately probably think Kubernetes
service, but we have to think more broadly than that. Second, support for fine-grained
workflows at runtime is going to be a
critical requirement. And there are clearly
parallels between what we’re doing at L2, L3, and L7. We need to figure out– I think, in fact, it
probably comes back to, what is the language
with which– or the API with which I control
the data plane no matter what level I’m at. Because we’re going to find that
functionality moving up and down the stack for performance
reasons and for generality reasons all the time. And that’s where we need
to get ourselves to. [APPLAUSE] [MUSIC PLAYING]
