60 seconds to a Linkerd service mesh on AKS | Azure Friday


>>Hey, friends. Installing a
service mesh can be daunting. It’s big, it’s scary, it’s complicated, or is it? William Morgan, maintainer of the open source project Linkerd, is here to show us how easy it is to run Linkerd on your
AKS cluster and even use it to debug the behavior of
your Kubernetes application. I’m learning how today
on Azure Friday. [MUSIC]>>Hi folks. I’m Scott Hanselman and it’s another episode
of Azure Friday. I’m here with William Morgan
who’s going to make it easy to do a service mesh. This is a thing that would take
weeks of my time, would it not?>>That’s right, normally.>>But you can do it in a minute.>>Well, we’ll see,
60 seconds or less, maybe.>>Okay, or your money back?>>Right, exactly. It’s open source.>>Okay. So why does a customer
want a Service Mesh? Why is someone out there
thinking, I need a thing, a Service Mesh can solve my problem?>>Yeah. So Service Mesh is in this unfortunate state now where there’s a lot of hype around
it and a lot of buzz, but it also feels very
daunting to people. It feels like it could be this
giant piece of machinery that we’re layering onto our
existing infrastructure. We’ve just learned about Kubernetes, we’ve just learned about Docker, we’ve just learned about
the Cloud Native Landscape and now this new thing
that we have to deal with. So what I want to show
you is how Linkerd, which is an Open Source project that I’m a maintainer and contributor of, can actually make this
very easy for you. I’m going to show you how on AKS, you can install Linkerd in, let’s say, 60 seconds, maybe.>>Okay.>>Then I’ll show you
some of the features that you can do with it.>>Thereby the stack will become
less scary and more welcoming.>>That’s right.>>I love it. Let’s do it.>>Yeah. So let’s talk a little
bit about what a Service Mesh is, because I think that’s the first question we have to get to. So Linkerd itself, I’d
say it is a Service Mesh. It’s also a member project of the Cloud Native
Computing Foundation, that’s the same foundation that
holds Kubernetes and Prometheus, a bunch of other very cool projects
in the Cloud Native Stack. Linkerd has been in production for years and we’ve got
a very healthy community, lots of activity on GitHub,
and I’ve put some fun demo material together, which I
want to get into very eagerly. But what is Linkerd? What
is the service mesh? Why does this thing even exist? So the way I like to
think about it is there’s the value prop and then
there’s the implementation.>>Okay.>>The value prop fits into
these three categories. One of them is around visibility. So giving you visibility into what your microservices are doing when
they’re running on Kubernetes.>>Okay.>>The other value prop
is around reliability. So when things start going
wrong, which sometimes happens, can we automatically handle that? Can we repair the communication so that application as a whole
continues to function? The third one is around security. Can we add security
primitives in here so that we can make the system
as a whole more secure? In all three cases, the real value of
the Service Mesh is that it takes that functionality out of the developers’ hands and out
of the application, and pushes it down to
the underlying platform.>>Okay. So let me see if
I can understand this. So there was a time when
I used to have to fly a consultant out to do
round robin load balancing, and that became a checkbox in Azure, and then scale out and
scale up continue to become more and more abstracted away. Eventually, Kubernetes will become the operating system of the Internet
and I will sit on top of it, and only people of
a certain age will even remember that it existed
but it will run it all. You’re saying, you’re going
to take the best practices, the best practices around visibility, reliability, security,
and you’re going to make them something that the developer
worries less about.>>That’s right.>>It’s almost an Aspect
Oriented Programming thing. You’re going to bake it
into the infrastructure, almost as if it became a mesh.>>Yes, that’s right. There are two reasons why I wanted to do that. The first is so that the developers don’t have
to spend their brainpower implementing retry logic
and things like that.>>Yeah.>>The second is that
the platform owner, the same team who is bringing
Kubernetes into their ecosystem, has control over these features, which are fundamentally
platform features. That’s the real value prop.>>Very cool.>>Okay. So that’s
the theoretical exercise. I can talk a little bit about data->>I theoretically dig it, but I still want to be
blown away in 60 seconds.>>Right. Okay, good.>>So with Linkerd, a big goal for us is to make
it as easy as possible, so it should just
work out of the box.>>Awesome.>>It should have
minimal resource requirements. It shouldn’t require
tons of CPU and memory.>>Do I need another cluster
to run that cluster?>>Yeah, hopefully not, right?>>Yeah, that’s fine.>>You should understand
what’s happening. That’s always the hardest part. It does add things to your system
but we want to do it in such a way that you, as an operator, can actually
understand what’s happening. Those are the goals.
If we can do that, then we’ve done a good job. So what I want to do now
is actually I want to jump into the demo and then
just show it off a bit, and then we’ll take
a step back and talk more about what’s actually
happening under the hood.>>All right, I like it.>>So I’m going to start with
my handy-dandy AKS cluster. I have everything
set up with kubectl. So if I do kubectl get nodes, you’ll see I have
a single node AKS cluster.>>All right.>>This is a pretty
empty Kubernetes cluster so far, and nothing up my sleeve. The very first thing
that I’m going to do is I’m actually going to
install an application. So forget about the service mesh, let’s install an application.>>All right.>>I’m going to install
one that I call emojivoto. Okay. The way I’m doing
this in true demo fashion, is I’m going to curl
this YAML manifest->>Get it from the Internet?>>-from the Internet, I’m going to apply it
straight to my cluster.>>Trust it directly from there.>>Yeah, this is like
the sudo install, whatever.>>Yes.>>You can see that this manifest is creating a whole bunch
of stuff. So that’s good. It’s got an emojivoto namespace, it’s creating
a bunch of services and things. In fact, I have
my Kubernetes dashboard here. Let’s just refresh this
so we get everything. We can see that if we
dropped down here, we’ve got an emojivoto namespace, and everything is up and running.>>Age, 24 seconds.>>That’s right.>>I love it.>>Yeah. We’ve got a couple services
and deployments and pods. We actually have a little pod
here called the bot. So bot is actually
going to be sending traffic through the system
the whole time.>>Cool.>>Okay. Then if we look
down at our services, we have this one called “web-svc”. This is actually provisioning
an external IP address, so we’ll be able to actually
play around with this app.>>All right.>>Before I install the service mesh, I want to play around with the app
because the app has a bug. Okay. So what is happening
when- let’s go back, refresh.>>Get our external IP address there.>>External IP. So if I click
on this, we’ll see our app. So here, I’ve deployed an emoji
voting application to AKS.>>What an attractive app with
the color changing background.>>Thank you. So let’s pick
one. Do you want to pick one?>>I like the cat with
the hearts on his eyes.>>Okay. Love cat, all right?>>It’s called love cat?>>I don’t know. That’s
what I call that.>>I’m calling it love cat now.>>We can see the bot
behind the scenes, it’s picking a bunch->>We’re competing against your bot?>>Yeah, that’s right.>>Well, that’s acceptable.>>Well, the bot is rate limited
to one request per second.>>We can defeat the bot.>>So as long as you can
click faster than that, man can defeat machine.>>Excellent.>>Okay. So let’s go back.>>Hang on, that doughnut, it’s a 404.>>That’s right. So
this app has a problem, we’ve hit a 404 page.>>Okay.>>Now the reason why I like to pick this app as a demo is if we go
back to the Kubernetes dashboard, and I’m looking at
the emojivoto namespace again, I’m going to refresh it just
so that we can see everything.>>Yeah, it looks healthy.>>Everything looks fine.
Everything’s green, right? So why is that?>>404 doesn’t qualify
as a bad enough error?>>Well, sort of. Kubernetes doesn’t
actually understand anything about
application-level errors. Within its scope, Kubernetes understands,
and AKS will understand, hey, this pod crashed. It’ll understand, hey,
there’s a network partition. It’ll understand, hey, this node died and I need to provision another one. But it doesn’t understand anything about the traffic
that’s happening between the services, and it doesn’t
understand anything about whether these services
are behaving badly.>>Right. It’s not a business
process host, it’s an application host.>>That’s right. Okay. So
that’s the setup for this. Now, I’m going to do two things. The first thing I’m
going to do is I’m going to finally install Linkerd. I actually have
this very nice command called linkerd install.
So right now, linkerd version is telling me I
do not have Linkerd installed, so I’ll go to linkerd install. Just like I did with the
application manifest, this is going
to output a manifest.>>It’s going to put out YAML.>>Yeah, and I am going to apply
that directly to the cluster. What my AKS cluster is doing
is setting up
the Linkerd control plane in a linkerd namespace, and it’s creating a bunch of
role bindings and things like that.>>Grafana, Prometheus.>>Yeah, and let’s go back to
that same Kubernetes dashboard, because I’d like to be very
visual about this and see that we are slowly
bringing some stuff up. Don’t worry about how scary it looks
right now. Everything is fine.>>No, it’s been 17 seconds. I’ll give you 13 more.>>We actually have
a very nice command called linkerd check, which will wait for these control
plane pods to come up.>>That’s actually really nice.>>Yeah. While we’re doing this.>>So they’re waking up, that’s cool.>>Let’s take a look at
what is happening here. So in that Linkerd namespace, we can see it’s looking a
little more provisioned.>>Yeah.>>In that Linkerd namespace, you can see there’s a bunch
of different pods here, one is called linkerd-web, one’s linkerd-identity,
linkerd-controller. So different things
are being spun up. The application meanwhile is
running in the emojivoto namespace. They’re not connected to each other yet. In fact, we can go back to the application and
we can click around.>>This is a doughnut related bug
that I’m hearing.>>Well, right. Maybe this is like
a subtle health issue, right?>>Yeah, I don’t know.
It could be flaky.>>If you click on bacon, we’ll see it may or may not work. Okay. So linkerd check is telling me that the control plane
is up and running.>>All right.>>Okay. So the Linkerd control
plane is now installed. In fact, we can run
the Linkerd dashboard. This will spin up
my dashboard right here, and let me bump up the font. What you might notice
here is that this looks similar to the Kubernetes dashboard.>>Yeah, it does.>>You know about emojivoto
because you’re talking to Kubernetes. Your control plane is
talking to theirs.>>That’s right. Now,
emojivoto is still running. We haven’t done anything with it. There’s a second step
we have to do here, which is that we need to add the data
plane to emojivoto. So right now the two things
are independent, although Linkerd knows
that emojivoto is there. If we click on this namespace
here, it’ll tell us, hey, we’ve got these pods
here, but I don’t actually know anything about
what’s happening there.>>Are you going to
inject some bit of?>>That’s exactly right, you’ve read the script ahead.>>I know, I’ve actually
built large-scale systems.>>All right.>>The only way you
could have done it.>>Great. Okay. So the very last
thing before we inject emojivoto, and start playing around
with the app itself, and see if we can debug that problem.>>Yeah.>>I just wanted to point out that the Linkerd data plane actually runs as part of
the control plane as well. So here are all the control plane
components of Linkerd, and what we can see here is their requests per second,
their throughput.>>All right.>>Look at their latency:
P50, P95, P99. We also have some fancy
Grafana dashboards too, which I’ll show off.>>I can see Prometheus is working.>>Right. That’s right. Okay. So here we are: the application is still broken, but now we have a chance of debugging what’s going to
happen here. All right. So now, I’m going to go
back and I’m going to do, you used the right word,
you used inject. So I’m going to run a command
called linkerd inject.>>Are you serious?>>That’s right. Yeah, yeah.>>Did not know that.>>Yeah. Now, what linkerd
inject is going to do, so the control plane’s
running off to the side.>>I bet you I know
what it’s going to do.>>Okay. Let’s hear it.>>Okay. So you’ve got your original
emojivoto, and you did a kubectl,
a kubectl create. With kubectl
you could also apply. You’re going to do apply
because it’s already there. Inject is going to see that YAML
go by, add some stuff, instrumenting itself, or
injecting into that YAML, and the YAML that comes
out the other side will have Linkerd involved.>>Yeah, that’s right.>>That is very clever. I like that.>>That’s right. Now,
in practice, this is more of a demo workflow, because this is doing
the text transformation.>>No, but still someone
who’s going to do this formally would go and read
the Linkerd documentation. They would spend more than
60 seconds in doing this, they would write the YAML
custom for their needs.>>Well, actually we can
do a little bit better. So in practice, we can do
what’s called auto-injection. So on the server side in the Kubernetes cluster
itself, we can say, hey, for this namespace, anytime
someone deploys a pod, I want you to automatically
inject the data plane.>>Even better.>>Then you have full control
over everything.>>I dig it.>>But for demo purposes,
we’ll do this text transform, we’ll take the existing manifest.>>Okay.>>We’ll do the text transform
to add a bunch of stuff.>>Right.>>I’ll show you what that is and then we’ll reapply it to Kubernetes.>>I just saw you there, and
three deployments are now Injected.>>That’s right. Yeah.
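
The inject step described above is a text transform on the manifest: a manifest goes in, and the same manifest with an extra proxy container comes out. The following is only a toy sketch of that idea in Python, not Linkerd's actual implementation. The container images and the `inject_sidecar` helper are hypothetical, and the real injected manifest contains far more than one extra container.

```python
import copy

# Hypothetical sidecar entry; the real proxy container has many more fields.
SIDECAR = {"name": "linkerd-proxy", "image": "example.invalid/proxy:demo"}


def inject_sidecar(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest with the sidecar container added."""
    injected = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    pod_spec = injected["spec"]["template"]["spec"]
    existing = {container["name"] for container in pod_spec["containers"]}
    if SIDECAR["name"] not in existing:  # injecting twice changes nothing
        pod_spec["containers"].append(dict(SIDECAR))
    return injected


# Hypothetical, heavily trimmed manifest for one emojivoto deployment.
web = {
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "emojivoto"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "web-svc", "image": "example.invalid/emojivoto-web:demo"},
    ]}}},
}

injected = inject_sidecar(web)
print([c["name"] for c in injected["spec"]["template"]["spec"]["containers"]])
```

Because the pod template changes, reapplying a manifest transformed this way is what causes Kubernetes to roll the deployment one pod at a time.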
If we look back, let’s go back to the Kubernetes
dashboard and let’s go to the emojivoto and we’ll see something interesting
that’s happening here. Okay. So what Kubernetes is doing is, it is rolling those deployments. So remember the app is still
running the entire time. The bot is running, it’s sending traffic, and Kubernetes
is rolling one pod at a time. It’s adding the Linkerd
sidecar proxies.>>So it has effectively
versioned them? Any new version of
those is coming out?>>Yeah.>>Okay.>>Effectively. So if
we look at this set of images now in our deployments,
what we’ll see is it’s not just the original app.>>Yeah.>>But there’s also
something called linkerd-proxy, and these are all
separate containers.>>Is Linkerd proxying
all traffic between all.>>Yeah, that’s right. So now, let me take a quick break to go
back to this architecture diagram.>>Excellent.>>So let me show you
what we’ve done here. So down here, this is
all the control plane at the top, and the different components,
and they all talk to each other. We actually don’t have to worry too much about what’s
happening there. In the data plane, we’ve
injected these proxies, and when we saw
that proxy-init container, what that does is it actually
sets up iptables rules, so that all TCP traffic to and from the application pod automatically
goes through the Linkerd proxy. So that way, you don’t have to do anything in the application itself.>>Now, of course, as someone
who’s built big things, I’m thinking to myself, gosh, now more work is happening. So I would assume that
Linkerd does its best to do as little as possible while
still providing value. Because if you do nothing
you can scale infinitely. So Linkerd gets in and
out in milliseconds.>>That’s right. So
these Linkerd proxies are designed, they’re actually written
in a language called Rust; very cool systems language that
gives you native code performance, but it also gives you all sorts
of memory guarantees. So you avoid a whole class
of out-of-bounds security vulnerabilities and things like that.>>All this stuff you wouldn’t
have if you had written in C.>>That’s right. Yeah.>>Okay.>>But you get
the performance of C just without the security question marks.>>Lovely.>>So Linkerd Proxy
is custom written. It’s incredibly small
and lightweight. It’s sitting at under 10 megs
of memory per proxy.>>Wow.>>Less than one millisecond of p99 latency. Because
Rust has manual memory management, we can really fine-tune the latency
distribution of the proxy. All sorts of cool things,
but from the application.>>Let’s spike the ball
on the inside.>>You don’t actually know any of that stuff
that’s happened, right?>>No, just thinking about it.>>Okay.>>I dig it.>>Okay. So let’s go back to the dashboard here.
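
For readers unfamiliar with the p99 figure quoted above: it is the latency that 99 percent of requests stay at or below. A minimal Python sketch using the nearest-rank method follows. The sample latencies are invented for illustration and are not measurements of the Linkerd proxy.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, so float rounding cannot shift the rank.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Invented latencies in milliseconds: mostly fast, with one slow outlier.
latencies_ms = [0.2] * 95 + [0.6] * 4 + [5.0]

for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

In this invented data the single 5 ms outlier does not move the p99, which is why tail percentiles are usually reported alongside the median rather than a plain average.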
We’ll hit Refresh. We should see that everything
is now in a happy state. So Kubernetes says, okay,
everything has been rolled. The application continues to run. The big difference now
is when we go back to the Linkerd dashboard, we now start seeing traffic
on the deployments, and the pods of our application. So again, we haven’t modified
the application at all, but all of a sudden we see on a per deployment basis
and a per pod basis, we see the success rate,
and the throughput, and the latency distribution. Okay. Do you notice
anything terrible?>>Well, clearly you clicked
on the doughnut on web, it called voting and
voting has a horrible four out of five even when it
just got worse with success rate.>>That’s right. So let’s take
a look, let’s dive in here. So we can see immediately
[inaudible] web service, which is actually
fronting the traffic, has a success rate below 90 percent. That’s pretty bad. So
let’s deep dive in here.>>Oh, nice application map there.>>Yeah, that’s right.
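
The per-deployment numbers on the dashboard (success rate and throughput) are simple aggregates over observed requests. The Python sketch below shows how such numbers could be derived, assuming a hypothetical list of `(deployment, succeeded)` records collected over a fixed window. The counts are invented, and this is not how Linkerd itself stores or computes its metrics.

```python
from collections import Counter

WINDOW_SECONDS = 10  # hypothetical observation window

# Invented records: (destination deployment, whether the request succeeded).
observed = (
    [("web", True)] * 17 + [("web", False)] * 3
    + [("emoji", True)] * 20
    + [("voting", True)] * 7 + [("voting", False)] * 3
)


def summarize(records, window_seconds):
    """Return success rate and requests per second for each deployment."""
    total, ok = Counter(), Counter()
    for deployment, succeeded in records:
        total[deployment] += 1
        if succeeded:
            ok[deployment] += 1
    return {
        name: {"success_rate": ok[name] / count, "rps": count / window_seconds}
        for name, count in total.items()
    }


stats = summarize(observed, WINDOW_SECONDS)
# List the least healthy deployment first, as one would when triaging.
for name, row in sorted(stats.items(), key=lambda item: item[1]["success_rate"]):
    print(f"{name:8s} success={row['success_rate']:.0%} rps={row['rps']:.1f}")
```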
So we can actually draw this application map
from live traffic. So we don’t have to introspect
the code or anything like that.>>It’s happening in
voting and it’s rolling the errors coming all
the way back into vote pod.>>That’s right. So this web service, which
is the one we’re looking at, talks to two other services,
voting and emoji. Emoji seems to be fine, but voting is failing. Okay. So let’s click on that one. Okay. This is a pretty
straightforward list, only has one client, which is a web service,
terrible success rate. Down here, we can
actually see a live set of calls, a sampled set of calls. Now, this application is gRPC, so we’ve got these funny-looking
paths and everything is a POST. But we can see everything
here looks good except just one call here, VoteDoughnut,
which seems to be failing. So out of those three calls
that we’ve sampled so far, they have all failed. We can take this one step further and we can look
at this tool called.>>Someone is trying to
mess up the election.>>Yeah, that’s right.
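
Narrowing a failing service down to a single call, as in the sampled list above, amounts to grouping calls by path and checking their status. A small Python sketch follows. The status constants match the standard gRPC codes (0 is OK, 2 is UNKNOWN), but the path prefix, the call names other than VoteDoughnut, and the sample records are hypothetical.

```python
from collections import defaultdict

GRPC_OK = 0       # standard gRPC status code for OK
GRPC_UNKNOWN = 2  # standard gRPC status code for UNKNOWN

PREFIX = "/emojivoto.v1.VotingService/"  # hypothetical service path prefix

# Invented sample of calls to the voting service.
samples = [
    {"path": PREFIX + "VoteJoy", "grpc_status": GRPC_OK},
    {"path": PREFIX + "VoteDoughnut", "grpc_status": GRPC_UNKNOWN},
    {"path": PREFIX + "VoteSunglasses", "grpc_status": GRPC_OK},
    {"path": PREFIX + "VoteDoughnut", "grpc_status": GRPC_UNKNOWN},
    {"path": PREFIX + "VoteDoughnut", "grpc_status": GRPC_UNKNOWN},
]


def failing_paths(calls):
    """Return the sorted paths that have at least one non-OK gRPC status."""
    statuses_by_path = defaultdict(list)
    for call in calls:
        statuses_by_path[call["path"]].append(call["grpc_status"])
    return sorted(
        path
        for path, statuses in statuses_by_path.items()
        if any(status != GRPC_OK for status in statuses)
    )


print(failing_paths(samples))
```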
We can look at this cool tool called tap, which will actually
show me just a pure sample. In this case I’ve
configured it to show me all requests that are going
to the voting service, one at a time. Let me actually pause it because
we’ve got a lot of requests now. We can see that most of them
have gRPC status OK. Some of them have
gRPC status Unknown, which effectively means error. The ones that have
Unknown are VoteDoughnut.>>All right. Well, we’re
just about out of time here. Let’s dig into Doughnut real
quick and see why it’s broken.>>Well, so at that point, that’s when it becomes
a developer question, right? The code in
here is [inaudible].>>Have you just pointed to
the developer right now?>>That’s right.>>So we know what’s wrong and we
tell the developer to go fix it.>>Rather than saying, hey, the app is broken. I don’t understand what’s happening. You can say, hey, this specific call on this specific service
is failing. Can you go see why
with the debugger? So that’s an example
of how you can use two or three of Linkerd’s observability
features to debug a live application.>>I love it and the bad part
about this is that I’m never going to know why the doughnut fails. I’m going to have to go and look at the source code and you can too. What’s great about this is that
you can go and put together a service mesh with Linkerd
and AKS in no time at all. Thank you so much for
spending time with us.>>It was my pleasure.>>All right. We’re learning about Service Mesh and Linkerd
today on Azure Friday. [MUSIC]

