class: title, self-paced Week 3 Part 2:
K8s Security for Apps. Container Implementation and Workflows
.nav[*Self-paced version*] .debug[ ``` ``` These slides have been built from commit: 44d41b6 [shared/title.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/title.md)] --- class: title, in-person Week 3 Part 2:
K8s Security for Apps. Container Implementation and Workflows
.footnote[ **Slides[:](https://www.youtube.com/watch?v=h16zyxiwDLY) https://chicago.bretfisher.com/** ] .debug[[shared/title.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/title.md)] --- ## Prep: Things to do before we get started 1. Open these slides: https://chicago.bretfisher.com/ 2. Get a cluster IP. Teams has a spreadsheet of IPs. Pick one and add your initials. 3. Access your server over WebSSH (https://webssh.bret.today) - username: k8s | password: training 4. Your cluster has two nodes. Test ssh from node1 to node2 with `ssh node2` etc. **Note** - This is hands-on. You'll want to do most of these commands with me. - Everything is take-home (except for the server 😉). We'll get to that later. .debug[[logistics-bret-security.md](https://github.com/BretFisher/container.training/tree/tampa/slides/logistics-bret-security.md)] --- ## Introductions - Hello! I'm Bret Fisher ([@bretfisher]), a fan of 🐳 🏖 🥃 👾 ✈️ 🐶 - I'm a [DevOps Consultant+Trainer], [Open Source maintainer], and [Docker Captain]. - I have a weekly [DevOps live stream] with guests. Join us on Thursdays! - That show turns into a podcast called "[DevOps and Docker Talk]." - You can get my weekly updates by email by following [my Patreon page]. [@bretfisher]: https://twitter.com/bretfisher [DevOps Consultant+Trainer]: https://www.bretfisher.com/courses/ [Open Source maintainer]: https://github.com/bretfisher [Docker Captain]: https://www.docker.com/captains/bret-fisher/ [DevOps live stream]: https://www.youtube.com/channel/UC0NErq0RhP51iXx64ZmyVfg [DevOps and Docker Talk]: https://podcast.bretfisher.com/ [my Patreon page]: https://patreon.com/BretFisher .debug[[logistics-bret-security.md](https://github.com/BretFisher/container.training/tree/tampa/slides/logistics-bret-security.md)] --- ## Logistics - The training will run for 3 hours each day, with Q&A before and after. - We'll do a short half-time break. - Feel free to interrupt for questions at any time on voice or Teams chat. - *Especially when you see full-screen container pictures!* .debug[[logistics-bret-security.md](https://github.com/BretFisher/container.training/tree/tampa/slides/logistics-bret-security.md)] --- ## Exercises - At the end of each day, there is an exercise. - To make the most out of the training, please try the exercises! (it will help to practice and memorize the content of the day) - We recommend taking at least one hour to work on the exercises. (if you understood the content of the day, it will be much faster) - Each day will start with a quick review of the exercises of the previous day. .debug[[logistics-bret-security.md](https://github.com/BretFisher/container.training/tree/tampa/slides/logistics-bret-security.md)] --- ## Limited-time signup for my video courses - I make bestselling Docker & Kubernetes courses on Udemy (nearly 300,000 students). - **As part of this workshop, you get free lifetime access to all of them!** - **But you must "buy" each course with the coupon before the coupon expires.** - Use the coupon code `CHICAGO22` to get the courses [in this list](https://www.udemy.com/user/bretfisher/). - Details will be emailed out to you as a reminder at the end of this workshop. .debug[[logistics-bret-security.md](https://github.com/BretFisher/container.training/tree/tampa/slides/logistics-bret-security.md)] --- ## Accessing these slides now - We recommend that you open these slides in your browser: https://chicago.bretfisher.com/ - This is a public URL, and you're welcome to share it with others! 
- Use arrows to move to next/previous slide (up, down, left, right, page up, page down) - Type a slide number + ENTER to go to that slide - The slide number is also visible in the URL bar (e.g. .../#123 for slide 123) .debug[[shared/about-slides.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/about-slides.md)] --- ## These slides are open source - The sources of these slides are available in a public GitHub repository: https://github.com/bretfisher/container.training - These slides are written in Markdown - You are welcome to share, re-use, re-mix these slides - Typos? Mistakes? Questions? Feel free to hover over the bottom of the slide ... .footnote[👇 Try it! The source file will be shown and you can view it on GitHub and fork and edit it.] .debug[[shared/about-slides.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/about-slides.md)] --- ## Accessing these slides later - Slides will remain online so you can review them later if needed (let's say we'll keep them online at least 1 year, how about that?) - You can download the slides using that URL: https://chicago.bretfisher.com/slides.zip (then open the file `workflows.yml.html`) - You can also generate a PDF of the slides (by printing them to a file; but be patient with your browser!) .debug[[shared/about-slides.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/about-slides.md)] --- ## These slides are constantly updated - https://container.training - Upstream repo https://github.com/jpetazzo/container.training .debug[[shared/about-slides.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/about-slides.md)] --- class: extra-details ## Extra details - This slide has a little magnifying glass in the top left corner - This magnifying glass indicates slides that provide extra details - Feel free to skip them if: - you are in a hurry - you are new to this and want to avoid cognitive overload - you want only the most essential information - You can review these slides another time if you want, they'll be waiting for you ☺ .debug[[shared/about-slides.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/about-slides.md)] --- name: toc-part-1 ## Part 1 - [Clone the workshop repository](#toc-clone-the-workshop-repository) - [Start some K8s apps](#toc-start-some-ks-apps) - [Docker Desktop 4 Windows Sec Update](#toc-docker-desktop--windows-sec-update) - [Restricting Pod Permissions](#toc-restricting-pod-permissions) - [Bret's container security advice](#toc-brets-container-security-advice) - [Controlling resources in Kubernetes](#toc-controlling-resources-in-kubernetes) - [Resources in Linux](#toc-resources-in-linux) - [Defining min, max, and default resources](#toc-defining-min-max-and-default-resources) - [Namespace quotas](#toc-namespace-quotas) - [Limiting resources in practice](#toc-limiting-resources-in-practice) - [k9s](#toc-ks) - [The Kubernetes dashboard](#toc-the-kubernetes-dashboard) - [Security implications of `kubectl apply`](#toc-security-implications-of-kubectl-apply) - [Workflows: code pull requests](#toc-workflows-code-pull-requests) - [Workflows: container deployments with GitOps](#toc-workflows-container-deployments-with-gitops) - [Next project steps](#toc-next-project-steps) - [It's a wrap](#toc-its-a-wrap) .debug[(auto-generated TOC)] --- name: toc-part-2 ## Part 2 - [(Extra security and advanced content)](#toc-extra-security-and-advanced-content) - [Pod Security 
Admission](#toc-pod-security-admission) - [Network policies](#toc-network-policies) - [Authentication and authorization](#toc-authentication-and-authorization) - [Operators](#toc-operators) - [CI/CD with GitLab](#toc-cicd-with-gitlab) .debug[(auto-generated TOC)] .debug[[shared/toc.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/toc.md)] --- class: pic .interstitial[] --- name: toc-clone-the-workshop-repository class: title Clone the workshop repository .nav[ [Previous part](#toc-) | [Back to table of contents](#toc-part-1) | [Next part](#toc-start-some-ks-apps) ] .debug[(automatically generated title slide)] --- # Clone the workshop repository - We will clone the GitHub repository onto our `node1` - The repository also contains scripts and tools that we will use throughout the workshop .lab[ - Clone the repository on `node1`: ```bash git clone https://github.com/bretfisher/container.training ``` ] (You can also fork the repository on GitHub and clone your fork if you prefer that.) .debug[[chicago/sampleapp-simple.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/sampleapp-simple.md)] --- class: pic .interstitial[] --- name: toc-start-some-ks-apps class: title Start some K8s apps .nav[ [Previous part](#toc-clone-the-workshop-repository) | [Back to table of contents](#toc-part-1) | [Next part](#toc-docker-desktop--windows-sec-update) ] .debug[(automatically generated title slide)] --- # Start some K8s apps - Quickly spin up our demo apps from last week .lab[ ```bash kubectl apply -f ~/container.training/k8s/dockercoins.yaml kubectl apply -f ~/container.training/k8s/rainbow.yaml kubectl get pods -w -A ``` ] Once you stop seeing new pods show up as "running", you can Ctrl-C to quit watching .debug[[chicago/sampleapp-simple.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/sampleapp-simple.md)] --- class: pic .interstitial[] --- name: toc-docker-desktop--windows-sec-update class: title Docker Desktop 4 Windows Sec Update .nav[ [Previous part](#toc-start-some-ks-apps) | [Back to table of contents](#toc-part-1) | [Next part](#toc-restricting-pod-permissions) ] .debug[(automatically generated title slide)] --- # Docker Desktop 4 Windows Sec Update - I asked Docker Captains and Docker Inc. about Tuesday's question: - "How do we observe and control what's running in DD4W containers?" -- - WSL2 (Win 10/11) doesn't yet have observability from the Windows host - MS has an ["Enterprise" page for WSL] with good info, but no good news -- - You could deploy a private registry, and devs can only pull from there - But someone has to pull/push all Hub images you use. Who wants another job? 😅 -- - Good news! The Docker Desktop Business subscription has new controls - [Registry Access Management] and [Image Access Management] - Centrally control which registries your org's users are allowed to use - Centrally control the type of images they can pull (official, verified, org, etc.) 
["Enterprise" page for WSL]: https://docs.microsoft.com/en-us/windows/wsl/enterprise [Registry Access Management]: https://www.docker.com/blog/introducing-registry-access-management-for-docker-business/ [Image Access Management]: https://docs.docker.com/docker-hub/image-access-management/ .debug[[chicago/dd4w-sec.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/dd4w-sec.md)] --- class: pic .interstitial[] --- name: toc-restricting-pod-permissions class: title Restricting Pod Permissions .nav[ [Previous part](#toc-docker-desktop--windows-sec-update) | [Back to table of contents](#toc-part-1) | [Next part](#toc-brets-container-security-advice) ] .debug[(automatically generated title slide)] --- # Restricting Pod Permissions - By default, our pods and containers can do *everything* (including taking over the entire cluster) - We are going to show an example of a malicious pod (which will give us root access to the whole cluster) - Then we will explain how to avoid this with admission control (PodSecurityAdmission, PodSecurityPolicy, or external policy engine) .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## Setting up a namespace - For simplicity, let's work in a separate namespace - Let's create a new namespace called "green" .lab[ - Create the "green" namespace: ```bash kubectl create namespace green ``` - Change to that namespace: ```bash kns green ``` ] .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## Creating a basic Deployment - Just to check that everything works correctly, deploy NGINX .lab[ - Create a Deployment using the official NGINX image: ```bash kubectl create deployment web --image=nginx ``` - Confirm that the Deployment, ReplicaSet, and Pod exist, and that the Pod is running: ```bash kubectl get all ``` ] .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## One example of malicious pods - We will now show an escalation technique in action - We will deploy a DaemonSet that adds our SSH key to the root account (on *each* node of the cluster) - The Pods of the DaemonSet will do so by mounting `/root` from the host .lab[ - Check the file `k8s/hacktheplanet.yaml` with a text editor: ```bash vim ~/container.training/k8s/hacktheplanet.yaml ``` - If you would like, change the SSH key (by changing the GitHub user name) ] .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## Deploying the malicious pods - Let's deploy our "exploit"! .lab[ - Create the DaemonSet: ```bash kubectl create -f ~/container.training/k8s/hacktheplanet.yaml ``` - Check that the pods are running: ```bash kubectl get pods ``` - Confirm that the SSH key was added to the node's root account: ```bash sudo cat /root/.ssh/authorized_keys ``` ] .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## Mitigations - This can be avoided with *admission control* - Admission control = filter for (write) API requests - Admission control can use: - plugins (compiled into the API server; enabled/disabled by reconfiguration) - webhooks (registered dynamically) - Admission control has many other uses (enforcing quotas, adding ServiceAccounts automatically, etc.) 
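- For illustration, a dynamic webhook is registered with a ValidatingWebhookConfiguration like the minimal sketch below (the `policy-system` namespace, `policy-webhook` Service, and CA bundle are placeholders; real policy engines create this object for you)

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-pod-policy            # hypothetical name
webhooks:
  - name: validate-pods.example.com   # hypothetical webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail               # reject requests if the webhook is unreachable
    rules:                            # only intercept (write) requests for Pods
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
    clientConfig:
      service:                        # the API server calls this Service over TLS
        namespace: policy-system      # placeholder namespace
        name: policy-webhook          # placeholder Service name
        path: /validate
      caBundle: "<base64 CA certificate>"   # placeholder; must sign the webhook's TLS cert
```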
.debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## Built-in admission plugins - [PodSecurityPolicy](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) (will be removed in Kubernetes 1.25) - Legacy, since K8s 1.0, pre "Admission Controllers" API - Commonly used, but removed when 1.25 ships in August 2022 - [PodSecurityAdmission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) (beta since Kubernetes 1.23) - use pre-defined policies (privileged, baseline, restricted) - label namespaces to indicate which policies they can use - optionally, define default rules (in the absence of labels) .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## Acronym salad for built-in features - PSP = Pod Security Policy (legacy) - an admission plugin called PodSecurityPolicy - a resource named PodSecurityPolicy (`apiVersion: policy/v1beta1`) - PSA = Pod Security Admission - an admission controller called `PodSecurity`, enforcing PSS below - the successor to the legacy PSP - PSS = Pod Security Standards - a set of 3 policies (privileged, baseline, restricted) .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## Dynamic admission - Leverage ValidatingWebhookConfigurations (to register a validating webhook) - Examples: [Kubewarden](https://www.kubewarden.io/) (uses Wasm-based policies) [Kyverno](https://kyverno.io/policies/pod-security/) (uses simple YAML K8s resources) [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper) (uses Rego language for policies) - Pros: available today; very flexible and customizable; superset of PodSecurityAdmission - Cons: performance and reliability of the external webhook (usually minor) .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## My policy preferences for the real world - [Kyverno] (and their SaaS [Nirmata]) is the most mature and flexible solution - It's a CNCF incubating project, meaning it's ready for production, stable, and popular - It only requires YAML (unlike some others), and it's far more flexible than the built-in PSA - It's friendly at the CLI, failing gracefully and explaining why you didn't meet the policy - Like all Admission Controllers, it's harder to troubleshoot when `apply` is automated - I had the founder demo it on my stream; check out [the video] or the [podcast] - We should still probably know the basics of PSA, since that's built-in [Kyverno]:https://kyverno.io/policies/pod-security/ [Nirmata]:https://nirmata.com/ [the video]:https://youtu.be/4uabd0GkqdY?t=357 [podcast]:https://podcast.bretfisher.com/episodes/kubernetes-policy-management-with-kyverno-and-nirmata .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## More on Kyverno - Security and Ops teams, seriously, [check out Kyverno](https://kyverno.io) - It can do *lots* of things to place guard rails on your clusters (188+ policies) - This gives you more comfort in letting devs "just deploy to their namespace" -- - A few of the many things it can do: - Require images in specific namespaces or the whole cluster to be *signed*! 
- Implement and control PSS and PSA for you - Prevent NodePort in services, so pods must use LoadBalancer or Ingress - Prevent use of the `latest` tag on images (which is not good in production) -- - Devs! Usually your DevSecOps team will let you know the policies - All you need to do is write your pod spec properly to adhere to their standards - Here's an example... [checkout Kyverno]:https://kyverno.io .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- ## Good default pod spec for workloads .small[ ```yaml spec: securityContext: seccompProfile: type: RuntimeDefault # enable seccomp default profile runAsUser: 1000 # hardcode user to non-root if not set in Dockerfile runAsGroup: 1000 # hardcode group to non-root if not set in Dockerfile runAsNonRoot: true # hardcode to non-root. Redundant with the above if the Dockerfile sets USER 1000 containers: - name: my-container-name image: my-image:tag ports: - containerPort: 8080 # hardcode the listening port if the Dockerfile doesn't set EXPOSE protocol: TCP securityContext: allowPrivilegeEscalation: false # prevent sudo, etc. privileged: false # prevent acting like host root readinessProbe: httpGet: # Lots of timeout values with defaults, be sure they are ideal for your workload path: /ready port: 8080 resources: # Because limits = requests, QoS is set to "Guaranteed" limits: memory: "500Mi" # If container uses over 500Mi it is killed (OOM) cpu: "2" # If container uses over 2 vCPUs it is throttled requests: memory: "500Mi" # Scheduler finds a node where 500Mi is available cpu: "1" # Scheduler finds a node where 1 vCPU is available ``` ] ??? :EN:- Mechanisms to prevent pod privilege escalation :FR:- Les mécanismes pour limiter les privilèges des pods .debug[[k8s/pod-security-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-intro.md)] --- class: pic .interstitial[] --- name: toc-brets-container-security-advice class: title Bret's container security advice .nav[ [Previous part](#toc-restricting-pod-permissions) | [Back to table of contents](#toc-part-1) | [Next part](#toc-controlling-resources-in-kubernetes) ] .debug[(automatically generated title slide)] --- # Bret's container security advice - There are mountains of features, tools, and techniques to improve security for containers - I thought I'd give you a top 10-ish list of activities I see as valuable in each area - This is exclusively about containers, in 3 parts: - Image security - Container/Pod security - Kubernetes cluster security - This list was inspired by my [security AMA on the topic] [security AMA on the topic]: https://github.com/BretFisher/ama/discussions/150 .debug[[shared/brets-security-advice.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/brets-security-advice.md)] --- ## Image security top 10-ish - Use `slim` or `ubuntu` for language base images - Implement multi-stage builds so prod doesn't have dev/test dependencies - Create and use non-root users in Dockerfiles - Define your Dockerfile `USER` as a numeric ID, to work best with Kubernetes - Don't reuse image tags for prod-destined images. Use semver and/or date tags - [Consider an init process] like `tini` to avoid zombie processes - Use comments heavily in Dockerfiles to document your build process - Copy `.gitignore` to `.dockerignore` everywhere there's a Dockerfile. (add `.git`!) - Focus on reducing CVE count. 
[Automate builds and CVE scans] for every PR commit [Consider an init process]: https://github.com/BretFisher/nodejs-rocks-in-docker#proper-nodejs-startup-tini [Automate builds and CVE scans]: https://github.com/BretFisher/allhands22 .debug[[shared/brets-security-advice.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/brets-security-advice.md)] --- ## Container security top 10-ish - You're running a non-root user in the container, right? - Run your apps on a high port (3000, 8000, 8080, etc.), for easier rootless containers - Lock down your pod spec with defaults for non-root, seccomp, and privilege escalation ```yaml spec: securityContext: runAsUser: 1000 runAsGroup: 1000 seccompProfile: type: RuntimeDefault containers: - name: httpenv image: bretfisher/httpenv securityContext: allowPrivilegeEscalation: false privileged: false ``` .debug[[shared/brets-security-advice.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/brets-security-advice.md)] --- ## Kubernetes security top 10-ish - Use a vendor or cloud Kubernetes installer rather than "vanilla" upstream - Scan your cluster often for configuration issues with [Kubescape] (NSA and CIS options) - Add automated GitOps tools and prevent humans from having kubectl root access - Enable Admission Controllers to enforce security policies ([Kyverno]) ([a great post]) - Scan all YAML files (manifests, kustomize, Helm) for security and config issues - Add to CI automation. Scan on every PR of infrastructure-as-code - K8s-specific tools include [Trivy] and [Datree] - Even "all-in-one" tools like [Super-Linter] and [MegaLinter] can help - [Research sigstore], and implement Content Trust for a secure supply chain - Besides the obvious logging and monitoring, [use Falco] to watch for bad behavior [Kubescape]: https://github.com/armosec/kubescape [Trivy]: https://aquasecurity.github.io/trivy [Datree]: https://www.datree.io/ [Super-Linter]: https://github.com/github/super-linter [MegaLinter]: https://oxsecurity.github.io/megalinter/latest/descriptors/kubernetes/ [Kyverno]: https://kyverno.io/ [a great post]: https://www.appvia.io/blog/podsecuritypolicy-is-dead-long-live [Research sigstore]: https://www.sigstore.dev/ [use Falco]: https://falco.org/ .debug[[shared/brets-security-advice.md](https://github.com/BretFisher/container.training/tree/tampa/slides/shared/brets-security-advice.md)] --- class: pic .interstitial[] --- name: toc-controlling-resources-in-kubernetes class: title Controlling resources in Kubernetes .nav[ [Previous part](#toc-brets-container-security-advice) | [Back to table of contents](#toc-part-1) | [Next part](#toc-resources-in-linux) ] .debug[(automatically generated title slide)] --- # Controlling resources in Kubernetes - Using Kubernetes leads to a consolidation of servers - The OS is no longer the isolation boundary; we can safely run many apps on one OS - It's easier to manage a smaller set of larger servers - This reduces waste from unused resources -- - Good news: App owners (devs) can now stop defining "host resource requirements" - You should, however, still define resource needs for your containers - Since K8s allows sharing clusters between teams, resource mgmt is key - We've got multiple options! .debug[[k8s/resource-limits-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits-intro.md)] --- ## How do resources affect security? 
- Clusters are typically shared between teams and app types - By default, any container can consume all host resources - Availability is a key factor in security (DoS attacks, etc.) -- - In some cases, Kubernetes will kill containers to make resources available for others - When resource contention occurs in a cluster, pods without resource definitions are killed first .debug[[k8s/resource-limits-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits-intro.md)] --- ## Resource Management Preview - The Kubernetes Scheduler decides which node will run your pod - It takes resource requests into account when finding a suitable node -- - This includes multiple factors: 1. Any reservations and limits your pod spec defines 2. Any Resource Quotas your namespace has (aggregate of all pods) 3. Any Limit Ranges your namespace has (per pod) 4. Finally, it matches that to a node with those available resources -- - If we don't reserve resources, Kubernetes may assign us a node w/o enough resources - If we don't limit resources, a pod can run the host out of CPU/memory when it misbehaves - **These two, reservations and limits, are central to Kubernetes scheduling** .debug[[k8s/resource-limits-intro.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits-intro.md)] --- class: pic .interstitial[] --- name: toc-resources-in-linux class: title Resources in Linux .nav[ [Previous part](#toc-controlling-resources-in-kubernetes) | [Back to table of contents](#toc-part-1) | [Next part](#toc-defining-min-max-and-default-resources) ] .debug[(automatically generated title slide)] --- # Resources in Linux - We can attach resource indications to our pods (or rather: to the *containers* in our pods) - We can specify *limits* and/or *requests* - We can specify quantities of CPU and/or memory .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## CPU vs memory - CPU is a *compressible resource* - it can be preempted immediately without adverse effect - if we have N CPU and need 2N, we run at 50% speed - Memory is an *incompressible resource* - it needs to be swapped out to be reclaimed; and this is costly - if we have N GB RAM and need 2N, we might run at... 0.1% speed! - As a result, exceeding limits will have different consequences for CPU and memory .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: extra-details ## CPU limits implementation details - A container with a CPU limit will be "rationed" by the kernel - Every `cfs_period_us`, it will receive a CPU quota, like an "allowance" (that interval defaults to 100ms) - Once it has used its quota, it will be stalled until the next period - This can easily result in throttling for bursty workloads (see details on next slide) .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: extra-details ## A bursty example - Web service receives one request per minute - Each request takes 1 second of CPU - Average load: 1.66% - Let's say we set a CPU limit of 10% - This means CPU quotas of 10ms every 100ms - Obtaining the quota for 1 second of CPU will take 10 seconds - Observed latency will be 10 seconds (... actually 9.9s) instead of 1 second (real-life scenarios will of course be less extreme, but they do happen!) 
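- To see this rationing from inside a throttled container, we can read the CFS files in its cgroup (a sketch assuming the cgroup v1 layout; exact mount paths vary)

```bash
# Inside a container with a 10% CPU limit (cgroup v1 layout assumed):
cat /sys/fs/cgroup/cpu/cpu.cfs_period_us   # 100000 -> 100ms period
cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us    # 10000  -> 10ms of CPU time per period

# Throttling counters (also surfaced by cAdvisor/kubelet metrics):
cat /sys/fs/cgroup/cpu/cpu.stat            # nr_periods, nr_throttled, throttled_time
```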
.debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: extra-details ## Multi-core scheduling details - Each core gets a small share of the container's CPU quota (this avoids locking and contention on the "global" quota for the container) - By default, the kernel distributes that quota to CPUs in 5ms increments (tunable with `kernel.sched_cfs_bandwidth_slice_us`) - If a containerized process (or thread) uses up its local CPU quota: *it gets more from the "global" container quota (if there's some left)* - If it "yields" (e.g. sleeps for I/O) before using its local CPU quota: *the quota is **soon** returned to the "global" container quota, **minus** 1ms* .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: extra-details ## Low quotas on machines with many cores - The local CPU quota is not immediately returned to the global quota - this reduces locking and contention on the global quota - but this can cause starvation when many threads/processes become runnable - That 1ms that "stays" on the local CPU quota is often useful - if the thread/process becomes runnable, it can be scheduled immediately - again, this reduces locking and contention on the global quota - but if the thread/process doesn't become runnable, it is wasted! - this can become a huge problem on machines with many cores .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: extra-details ## CPU limits in a nutshell - Beware if you run small bursty workloads on machines with many cores! ("highly-threaded, user-interactive, non-cpu bound applications") - Check the `nr_throttled` and `throttled_time` metrics in `cpu.stat` - Possible solutions/workarounds: - be generous with the limits - make sure your kernel has the [appropriate patch](https://lkml.org/lkml/2019/5/17/581) - use [static CPU manager policy](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy) For more details, check [this blog post](https://erickhun.com/posts/kubernetes-faster-services-no-cpu-limits/) or these ones ([part 1](https://engineering.indeedblog.com/blog/2019/12/unthrottled-fixing-cpu-limits-in-the-cloud/), [part 2](https://engineering.indeedblog.com/blog/2019/12/cpu-throttling-regression-fix/)). .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Running low on memory - When the system runs low on memory, it starts to reclaim used memory (we talk about "memory pressure") - Option 1: free up some buffers and caches (fastest option; might affect performance if cache memory runs very low) - Option 2: swap, i.e. 
write some of one process's memory to disk, and give that memory to another process (can have a huge negative impact on performance because disks are slow) - Option 3: terminate a process and reclaim all its memory (OOM or Out Of Memory Killer on Linux) .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Memory limits on Kubernetes - Kubernetes *does not support swap* (but it may support it in the future, thanks to [KEP 2400]) - If a container exceeds its memory *limit*, it gets killed immediately - If a node is overcommitted and under memory pressure, it will terminate some pods (see next slide for some details about what "overcommit" means here!) [KEP 2400]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#implementation-history .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Overcommitting resources - *Limits* are "hard limits" (a container *cannot* exceed its limits) - a container exceeding its memory limit is killed - a container exceeding its CPU limit is throttled - On a given node, the sum of pod *limits* can be higher than the node size - *Requests* are used for scheduling purposes - a container can use more than its requested CPU or RAM amounts - a container using *less* than what it requested should never be killed or throttled - On a given node, the sum of pod *requests* cannot be higher than the node size .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: pic ## Requests vs. Limits  .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Pod quality of service Each pod is assigned a QoS class (visible in `status.qosClass`). - If limits = requests: - as long as the container uses less than the limit, it won't be affected - if all containers in a pod have *(limits=requests)*, QoS is considered "Guaranteed" - If requests < limits: - as long as the container uses less than the request, it won't be affected - otherwise, it might be killed/evicted if the node gets overloaded - if at least one container has *(requests<limits)*, QoS is considered "Burstable" - If a pod doesn't have any requests or limits, QoS is considered "BestEffort" .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Quality of service impact - When a node is overloaded, BestEffort pods are killed first - Then, Burstable pods that exceed their requests - Burstable and Guaranteed pods below their requests are never killed (except if their node fails) - If we only use Guaranteed pods, no pod should ever be killed (as long as they stay within their limits) (Pod QoS is also explained in [this page](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/) of the Kubernetes documentation and in [this blog post](https://medium.com/google-cloud/quality-of-service-class-qos-in-kubernetes-bb76a89eb2c6).) 
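- A quick way to check the QoS class assigned to our pods (the pod name below is a placeholder)

```bash
# Show the QoS class of a single pod (replace <pod-name> with a real pod)
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}{"\n"}'

# Or list the QoS class of every pod in the current namespace
kubectl get pods -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass
```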
.debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: extra-details ## CPU and RAM reservation - Kubernetes passes resource requests and limits to the container engine - The container engine applies these requests and limits with specific mechanisms - Example: on Linux, this is typically done with control groups aka cgroups - Most systems use cgroups v1, but cgroups v2 are slowly being rolled out (e.g. available in Ubuntu 22.04 LTS) - Cgroups v2 have new, interesting features for memory control: - ability to set "minimum" memory amounts (to effectively reserve memory) - better control on the amount of swap used by a container .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: extra-details ## What's the deal with swap? - With cgroups v1, it's not possible to disable swap for a cgroup (the closest option is to [reduce "swappiness"](https://unix.stackexchange.com/questions/77939/turning-off-swapping-for-only-one-process-with-cgroups)) - It is possible with cgroups v2 (see the [kernel docs](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html) and the [fbatx docs](https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html#using-swap)) - Cgroups v2 aren't widely deployed yet - The architects of Kubernetes wanted to ensure that Guaranteed pods never swap - The simplest solution was to disable swap entirely - Kubelet will refuse to start if it detects that swap is enabled! .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Alternative point of view - Swap enables paging¹ of anonymous² memory - Even when swap is disabled, Linux will still page memory for: - executables, libraries - mapped files - Disabling swap *will reduce performance and available resources* - For a good time, read [kubernetes/kubernetes#53533](https://github.com/kubernetes/kubernetes/issues/53533) - Also read this [excellent blog post about swap](https://jvns.ca/blog/2017/02/17/mystery-swap/) ¹Paging: reading/writing memory pages from/to disk to reclaim physical memory ²Anonymous memory: memory that is not backed by files or blocks .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Enabling swap anyway - If you don't care that pods are swapping, you can enable swap - You will need to add the flag `--fail-swap-on=false` to kubelet (remember: it won't otherwise start if it detects that swap is enabled) .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Specifying resources - Resource requests are expressed at the *container* level - CPU is expressed in "virtual CPUs" (corresponding to the virtual CPUs offered by some cloud providers) - CPU can be expressed with a decimal value, or even a "millicpu" suffix - (100m = 0.1, or 10% of a vCPU) - (2000m = 2, or 100% of 2 vCPUs) - Memory is expressed in bytes - Memory can be expressed with k, M, G, T, Ki, Mi, Gi, Ti suffixes - (corresponding to 10^3, 10^6, 10^9, 10^12, 2^10, 2^20, 2^30, 2^40) - (most common is Mi or Gi) - (1Gi = 2^30 bytes = 1,073,741,824 bytes, slightly more than 1 GB) .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Specifying resources in practice This is what the spec of a Pod with 
resources will look like: ```yaml containers: - name: httpenv image: bretfisher/httpenv resources: limits: memory: "100Mi" cpu: "100m" # 10% of a vCPU. Could also be "0.1" requests: memory: "100Mi" cpu: "10m" # 1% of a vCPU. Could also be "0.01" ``` This set of resources makes sure that this service won't be killed (as long as it stays below 100 MB of RAM), but allows its CPU usage to be throttled if necessary. .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Default values - If we specify a limit without a request: the request is set to the limit - If we specify a request without a limit: there will be no limit (which means that the limit will be the size of the node) - If we don't specify anything: the request is zero and the limit is the size of the node *Unless there are default values defined for our namespace!* .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## We need default resource values - If we do not set resource values at all: - the limit is "the size of the node" - the request is zero - This is generally *not* what we want - a container without a limit can use up all the resources of a node - if the request is zero, the scheduler can't make a smart placement decision - To address this, we can set default values for resources - This is done with a LimitRange object .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: pic .interstitial[] --- name: toc-defining-min-max-and-default-resources class: title Defining min, max, and default resources .nav[ [Previous part](#toc-resources-in-linux) | [Back to table of contents](#toc-part-1) | [Next part](#toc-namespace-quotas) ] .debug[(automatically generated title slide)] --- # Defining min, max, and default resources - We can create LimitRange objects to indicate any combination of: - min and/or max resources allowed per pod - default resource *limits* - default resource *requests* - maximal burst ratio (*limit/request*) - LimitRange objects are namespaced - They apply to their namespace only .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## LimitRange example ```yaml apiVersion: v1 kind: LimitRange metadata: name: my-very-detailed-limitrange spec: limits: - type: Container min: cpu: "100m" max: cpu: "2000m" memory: "1Gi" default: cpu: "500m" memory: "250Mi" defaultRequest: cpu: "500m" ``` .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Example explanation The YAML on the previous slide shows an example LimitRange object specifying very detailed limits on CPU usage, and providing defaults on RAM usage. Note the `type: Container` line: in the future, it might also be possible to specify limits per Pod, but it's not [officially documented yet](https://github.com/kubernetes/website/issues/9585). .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## LimitRange details - LimitRange restrictions are enforced only when a Pod is created (they don't apply retroactively) - They don't prevent creation of e.g. 
an invalid Deployment or DaemonSet (but the pods will not be created as long as the LimitRange is in effect) - If there are multiple LimitRange restrictions, they all apply together (which means that it's possible to specify conflicting LimitRanges,
preventing any Pod from being created) - If a LimitRange specifies a `max` for a resource but no `default`,
that `max` value becomes the `default` limit too .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: pic .interstitial[] --- name: toc-namespace-quotas class: title Namespace quotas .nav[ [Previous part](#toc-defining-min-max-and-default-resources) | [Back to table of contents](#toc-part-1) | [Next part](#toc-limiting-resources-in-practice) ] .debug[(automatically generated title slide)] --- # Namespace quotas - We can also set quotas per namespace - Quotas apply to the total usage in a namespace (e.g. total CPU limits of all pods in a given namespace) - Quotas can apply to resource limits and/or requests (like the CPU and memory limits that we saw earlier) - Quotas can also apply to other resources: - "extended" resources (like GPUs) - storage size - number of objects (number of pods, services...) .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Creating a quota for a namespace - Quotas are enforced by creating a ResourceQuota object - ResourceQuota objects are namespaced, and apply to their namespace only - We can have multiple ResourceQuota objects in the same namespace - The most restrictive values are used .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Limiting total CPU/memory usage - The following YAML specifies an upper bound for *limits* and *requests*: ```yaml apiVersion: v1 kind: ResourceQuota metadata: name: a-little-bit-of-compute spec: hard: requests.cpu: "10" requests.memory: 10Gi limits.cpu: "20" limits.memory: 20Gi ``` These quotas will apply to the namespace where the ResourceQuota is created. .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Limiting number of objects - The following YAML specifies how many objects of specific types can be created: ```yaml apiVersion: v1 kind: ResourceQuota metadata: name: quota-for-objects spec: hard: pods: 100 services: 10 secrets: 10 configmaps: 10 persistentvolumeclaims: 20 services.nodeports: 0 services.loadbalancers: 0 count/roles.rbac.authorization.k8s.io: 10 ``` (The `count/` syntax allows limiting arbitrary objects, including CRDs.) 
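- A minimal sketch of applying such a quota and watching it reject requests (file and namespace names are placeholders, and the exact error text may differ)

```bash
# Apply the quota to a namespace and see its current usage
kubectl apply -f quota-for-objects.yaml --namespace=green
kubectl get resourcequota --namespace=green

# Creating one object too many is rejected with an error along these lines:
#   Error from server (Forbidden): ... exceeded quota: quota-for-objects,
#   requested: services.nodeports=1, used: services.nodeports=0, limited: services.nodeports=0
```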
.debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## YAML vs CLI - Quotas can be created with a YAML definition - ...Or with the `kubectl create quota` command - Example: ```bash kubectl create quota my-resource-quota --hard=pods=300,limits.memory=300Gi ``` - With both YAML and CLI form, the values are always under the `hard` section (there is no `soft` quota) .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Viewing current usage When a ResourceQuota is created, we can see how much of it is used: ``` kubectl describe resourcequota my-resource-quota Name: my-resource-quota Namespace: default Resource Used Hard -------- ---- ---- pods 12 100 services 1 5 services.loadbalancers 0 0 services.nodeports 0 0 ``` .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Advanced quotas and PriorityClass - Pods can have a *priority* - The priority is a number from 0 to 1000000000 (or even higher for system-defined priorities) - High number = high priority = "more important" Pod - Pods with a higher priority can *preempt* Pods with lower priority (= low priority pods will be *evicted* if needed) - Useful when mixing workloads in resource-constrained environments .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Setting the priority of a Pod - Create a PriorityClass (or use an existing one) - When creating the Pod, set the field `spec.priorityClassName` - If the field is not set: - if there is a PriorityClass with `globalDefault`, it is used - otherwise, the default priority will be zero .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: extra-details ## PriorityClass and ResourceQuotas - A ResourceQuota can include a list of *scopes* or a *scope selector* - In that case, the quota will only apply to the scoped resources - Example: limit the resources allocated to "high priority" Pods - In that case, make sure that the quota is created in every Namespace (or use *admission configuration* to enforce it) - See the [resource quotas documentation][quotadocs] for details [quotadocs]: https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-per-priorityclass .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: pic .interstitial[] --- name: toc-limiting-resources-in-practice class: title Limiting resources in practice .nav[ [Previous part](#toc-namespace-quotas) | [Back to table of contents](#toc-part-1) | [Next part](#toc-ks) ] .debug[(automatically generated title slide)] --- # Limiting resources in practice - We have at least three mechanisms: - requests and limits per Pod - LimitRange per namespace - ResourceQuota per namespace - Let's see a simple recommendation to get started with resource limits .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Set a LimitRange - In each namespace, create a LimitRange object - Set a small default CPU request and CPU limit (e.g. 
"100m") - Set a default memory request and limit depending on your most common workload - for Java, Ruby: start with "1G" - for Go, Python, PHP, Node: start with "250M" - Set upper bounds slightly below your expected node size (80-90% of your node size, with at least a 500M memory buffer) .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Set a ResourceQuota - In each namespace, create a ResourceQuota object - Set generous CPU and memory limits (e.g. half the cluster size if the cluster hosts multiple apps) - Set generous objects limits - these limits should not be here to constrain your users - they should catch a runaway process creating many resources - example: a custom controller creating many pods .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Observe, refine, iterate - Observe the resource usage of your pods (we will see how in the next chapter) - Adjust individual pod limits - If you see trends: adjust the LimitRange (rather than adjusting every individual set of pod limits) - Observe the resource usage of your namespaces (with `kubectl describe resourcequota ...`) - Rinse and repeat regularly .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Underutilization - Remember: when assigning a pod to a node, the scheduler looks at *requests* (not at current utilization on the node) - If pods request resources but don't use them, this can lead to underutilization (because the scheduler will consider that the node is full and can't fit new pods) .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Viewing a namespace limits and quotas - `kubectl describe namespace` will display resource limits and quotas .lab[ - Try it out: ```bash kubectl describe namespace default ``` - View limits and quotas for *all* namespaces: ```bash kubectl describe namespace ``` ] .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- ## Additional resources - [A Practical Guide to Setting Kubernetes Requests and Limits](http://blog.kubecost.com/blog/requests-and-limits/) - explains what requests and limits are - provides guidelines to set requests and limits - gives PromQL expressions to compute good values
(our app needs to be running for a while) - [Kube Resource Report](https://codeberg.org/hjacobs/kube-resource-report) - generates web reports on resource usage - [nsinjector](https://github.com/blakelead/nsinjector) - controller to automatically populate a Namespace when it is created ??? :EN:- Setting compute resource limits :EN:- Defining default policies for resource usage :EN:- Managing cluster allocation and quotas :EN:- Resource management in practice :FR:- Allouer et limiter les ressources des conteneurs :FR:- Définir des ressources par défaut :FR:- Gérer les quotas de ressources au niveau du cluster :FR:- Conseils pratiques .debug[[k8s/resource-limits.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/resource-limits.md)] --- class: pic .interstitial[] --- name: toc-ks class: title k9s .nav[ [Previous part](#toc-limiting-resources-in-practice) | [Back to table of contents](#toc-part-1) | [Next part](#toc-the-kubernetes-dashboard) ] .debug[(automatically generated title slide)] --- # k9s - Somewhere in between CLI and GUI (or web UI), we can find the magic land of TUI - [Text-based user interfaces](https://en.wikipedia.org/wiki/Text-based_user_interface) - often using libraries like [curses](https://en.wikipedia.org/wiki/Curses_%28programming_library%29) and its successors - Some folks love them, some folks hate them, some are indifferent ... - But it's nice to have different options! - Let's see one particular TUI for Kubernetes: [k9s](https://k9scli.io/) .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## Installing k9s - If you are using a training cluster or the [shpod](https://github.com/jpetazzo/shpod) image, k9s is pre-installed - Otherwise, it can be installed easily: - with [various package managers](https://k9scli.io/topics/install/) - or by fetching a [binary release](https://github.com/derailed/k9s/releases) - We don't need to set up or configure anything (it will use the same configuration as `kubectl` and other well-behaved clients) - Just run `k9s` to fire it up! .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## What kind do we want to see? - Press `:` to change the type of resource to view - Then type, for instance, `ns` or `namespace` or `nam[TAB]`, then `[ENTER]` - Use the arrows to move down to e.g. `kube-system`, and press `[ENTER]` - Or, type `/kub` or `/sys` to filter the output, and press `[ENTER]` twice (once to exit the filter, once to enter the namespace) - We now see the pods in `kube-system`! .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## Interacting with pods - `l` to view logs - `d` to describe - `s` to get a shell (won't work if `sh` isn't available in the container image) - `e` to edit - `shift-f` to define port forwarding - `ctrl-k` to kill - `[ESC]` to get out or get back .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## Quick navigation between namespaces - On top of the screen, we should see shortcuts like this: ``` <0> all <1> kube-system <2> default ``` - Pressing the corresponding number switches to that namespace (or shows resources across all namespaces with `0`) - Locate a namespace with a copy of DockerCoins, and go there! 
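- We can also start k9s directly in a given namespace or context from the shell (flags from recent k9s releases; run `k9s --help` if your version differs)

```bash
k9s                        # start with the namespace/context from your kubeconfig
k9s -n kube-system         # start directly in a given namespace
k9s --context staging      # use a specific kubeconfig context (placeholder name)
k9s --readonly             # disable commands that would modify the cluster
```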
.debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## Interacting with Deployments - View Deployments (type `:` `deploy` `[ENTER]`) - Select e.g. `worker` - Scale it with `s` - View its aggregated logs with `l` .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## Exit - Exit at any time with `Ctrl-C` - k9s will "remember" where you were (and go back there next time you run it) .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## Pros - Very convenient to navigate through resources (hopping from a deployment, to its pod, to another namespace, etc.) - Very convenient to quickly view logs of e.g. init containers - Very convenient to get a (quasi) realtime view of resources (if we use `watch kubectl get` a lot, we will probably like k9s) .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## Cons - Doesn't promote automation / scripting (if you repeat the same things over and over, there is a scripting opportunity) - Not all features are available (e.g. executing arbitrary commands in containers) .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- ## Conclusion Try it out, and see if it makes you more productive! ??? :EN:- The k9s TUI :FR:- L'interface texte k9s .debug[[k8s/k9s.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/k9s.md)] --- class: pic .interstitial[] --- name: toc-the-kubernetes-dashboard class: title The Kubernetes dashboard .nav[ [Previous part](#toc-ks) | [Back to table of contents](#toc-part-1) | [Next part](#toc-security-implications-of-kubectl-apply) ] .debug[(automatically generated title slide)] --- # The Kubernetes dashboard - Kubernetes resources can also be viewed with a web dashboard - Dashboard users need to authenticate (typically with a token) - The dashboard should be exposed over HTTPS (to prevent interception of the aforementioned token) - Ideally, this requires obtaining a proper TLS certificate (for instance, with Let's Encrypt) .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## Three ways to install the dashboard - Our `k8s` directory has no less than three manifests! - `dashboard-recommended.yaml` (purely internal dashboard; user must be created manually) - `dashboard-with-token.yaml` (dashboard exposed with NodePort; creates an admin user for us) - `dashboard-insecure.yaml` aka *YOLO* (dashboard exposed over HTTP; gives root access to anonymous users) .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## `dashboard-insecure.yaml` - This will allow anyone to deploy anything on your cluster (without any authentication whatsoever) - **Do not** use this, except maybe on a local cluster (or a cluster that you will destroy a few minutes later) - On "normal" clusters, use `dashboard-with-token.yaml` instead! .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## What's in the manifest? 
- The dashboard itself - An HTTP/HTTPS unwrapper (using `socat`) - The guest/admin account .lab[ - Create all the dashboard resources, with the following command: ```bash kubectl apply -f ~/container.training/k8s/dashboard-insecure.yaml ``` ] .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## Connecting to the dashboard .lab[ - Check which port the dashboard is on: ```bash kubectl get svc kubernetes-dashboard -n kubernetes-dashboard ``` ] You'll want the `3xxxx` port. .lab[ - Connect to http://oneofournodes:3xxxx/ ] The dashboard will then ask you which authentication you want to use. .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## Dashboard authentication - We have three authentication options at this point: - token (associated with a role that has appropriate permissions) - kubeconfig (e.g. using the `~/.kube/config` file from `node1`) - "skip" (use the dashboard "service account") - Let's use "skip": we're logged in! -- .warning[Remember, we just added a backdoor to our Kubernetes cluster!] .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## Closing the backdoor - Seriously, don't leave that thing running! .lab[ - Remove what we just created: ```bash kubectl delete -f ~/container.training/k8s/dashboard-insecure.yaml ``` ] .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## The risks - The steps that we just showed you are *for educational purposes only!* - If you do that on your production cluster, people [can and will abuse it](https://redlock.io/blog/cryptojacking-tesla) - For an in-depth discussion about securing the dashboard,
check [this excellent post on Heptio's blog](https://blog.heptio.com/on-securing-the-kubernetes-dashboard-16b09b1b7aca) .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## `dashboard-with-token.yaml` - This is a less risky way to deploy the dashboard - It's not completely secure, either: - we're using a self-signed certificate - this is subject to eavesdropping attacks - Using `kubectl port-forward` or `kubectl proxy` is even better .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## What's in the manifest? - The dashboard itself (but exposed with a `NodePort`) - A ServiceAccount with `cluster-admin` privileges (named `kubernetes-dashboard:cluster-admin`) .lab[ - Create all the dashboard resources, with the following command: ```bash kubectl apply -f ~/container.training/k8s/dashboard-with-token.yaml ``` ] .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## Obtaining the token - The manifest creates a ServiceAccount - Kubernetes will automatically generate a token for that ServiceAccount .lab[ - Display the token: ```bash kubectl --namespace=kubernetes-dashboard \ describe secret cluster-admin-token ``` ] The token should start with `eyJ...` (it's a JSON Web Token). Note that the secret name will actually be `cluster-admin-token-xxxxx`.
(But `kubectl` prefix matches are great!) .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## Connecting to the dashboard .lab[ - Check which port the dashboard is on: ```bash kubectl get svc --namespace=kubernetes-dashboard ``` ] You'll want the `3xxxx` port. .lab[ - Connect to http://oneofournodes:3xxxx/ ] The dashboard will then ask you which authentication you want to use. .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## Dashboard authentication - Select "token" authentication - Copy paste the token (starting with `eyJ...`) obtained earlier - We're logged in! .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## Other dashboards - [Kube Web View](https://codeberg.org/hjacobs/kube-web-view) - read-only dashboard - optimized for "troubleshooting and incident response" - see [vision and goals](https://kube-web-view.readthedocs.io/en/latest/vision.html#vision) for details - [Kube Ops View](https://codeberg.org/hjacobs/kube-ops-view) - "provides a common operational picture for multiple Kubernetes clusters" .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- class: pic .interstitial[] --- name: toc-security-implications-of-kubectl-apply class: title Security implications of `kubectl apply` .nav[ [Previous part](#toc-the-kubernetes-dashboard) | [Back to table of contents](#toc-part-1) | [Next part](#toc-workflows-code-pull-requests) ] .debug[(automatically generated title slide)] --- # Security implications of `kubectl apply` - When we do `kubectl apply -f
`, we create arbitrary resources - Resources can be evil; imagine a `deployment` that ... -- - starts bitcoin miners on the whole cluster -- - hides in a non-default namespace -- - bind-mounts our nodes' filesystem -- - inserts SSH keys in the root account (on the node) -- - encrypts our data and ransoms it -- - ☠️☠️☠️ .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- ## `kubectl apply` is the new `curl | sh` - `curl | sh` is convenient - It's safe if you use HTTPS URLs from trusted sources -- - `kubectl apply -f` is convenient - It's safe if you use HTTPS URLs from trusted sources - Example: the official setup instructions for most pod networks -- - It introduces new failure modes (for instance, if you try to apply YAML from a link that's no longer valid) ??? :EN:- The Kubernetes dashboard :FR:- Le *dashboard* Kubernetes .debug[[k8s/dashboard.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/dashboard.md)] --- class: pic .interstitial[] --- name: toc-workflows-code-pull-requests class: title Workflows: code pull requests .nav[ [Previous part](#toc-security-implications-of-kubectl-apply) | [Back to table of contents](#toc-part-1) | [Next part](#toc-workflows-container-deployments-with-gitops) ] .debug[(automatically generated title slide)] --- # Workflows: code pull requests - The industry is moving towards a typical workflow for *code PRs* - The goal is to make it faster to *merge* code to a *deployable branch* - While also increasing automation, quality, feedback, and security - Let's visualize 3 workflows for code PRs with CI & build automation .debug[[chicago/code-pr-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/code-pr-workflow.md)] --- class: pic  .debug[[chicago/code-pr-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/code-pr-workflow.md)] --- class: pic  .debug[[chicago/code-pr-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/code-pr-workflow.md)] --- class: pic  .debug[[chicago/code-pr-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/code-pr-workflow.md)] --- class: pic .interstitial[] --- name: toc-workflows-container-deployments-with-gitops class: title Workflows: container deployments with GitOps .nav[ [Previous part](#toc-workflows-code-pull-requests) | [Back to table of contents](#toc-part-1) | [Next part](#toc-next-project-steps) ] .debug[(automatically generated title slide)] --- # Workflows: container deployments with GitOps - The industry is moving towards a typical workflow for *container deployments* - The goal is to make it faster to *ship* code to a *server environment* - While also increasing automation, quality, feedback, and security - Let's visualize a workflow for successfully tested images to *manually* deploy to servers .debug[[chicago/gitops-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/gitops-workflow.md)] --- class: pic  .debug[[chicago/gitops-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/gitops-workflow.md)] --- ## Flaws of this CD approach - It requires humans to touch servers in order to deploy - It requires humans to *have access* to servers -- - Human delays means what's approved and what's running could be different - Because humans have to deploy, there's no central source of truth - GitOps ideals are meant to fix all this 
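- To make this concrete: in this model, "deploying" usually means a human pointing `kubectl` at the cluster by hand. A minimal sketch (the context, registry, image, and tag names below are hypothetical):

```bash
# Manual deploy: a human bumps the image on the cluster directly
# (hypothetical context, registry, and tag; the change isn't recorded anywhere)
kubectl --context=production set image deployment/worker \
  worker=registry.example.com/dockercoins/worker:v1.2.3
kubectl --context=production rollout status deployment/worker
```

- Nothing records who ran this, or why, which is exactly the gap GitOps closes next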
.debug[[chicago/gitops-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/gitops-workflow.md)] --- ## A few goals of GitOps - Clusters watch a git repo (or image tag regex) for changes - Clusters deploy (and rollback on failure) as soon as a change is detected -- - If clusters do something, they *write-back* to git as the central log of change - Git logs are the central source of truth -- - Deployment specs (YAML) live in git, right next to code and IaC (terraform, ansible, etc.) - Pull Requests become the human gates for approving deployments -- - YAML specs and IaC are linted, validated, tested, and approved just like code - This synergy of change management means devs *and* ops can update apps on servers .footnote[.small[ Waveworks coined GitOps in 2017 and has a [great intro guide](https://www.weave.works/technologies/gitops/) ]] .debug[[chicago/gitops-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/gitops-workflow.md)] --- class: pic  .debug[[chicago/gitops-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/gitops-workflow.md)] --- class: pic  .debug[[chicago/gitops-workflow.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/gitops-workflow.md)] --- class: pic .interstitial[] --- name: toc-next-project-steps class: title Next project steps .nav[ [Previous part](#toc-workflows-container-deployments-with-gitops) | [Back to table of contents](#toc-part-1) | [Next part](#toc-its-a-wrap) ] .debug[(automatically generated title slide)] --- # Next project steps - There are many ways to get started with containers - The two primary paths are local dev first, or build/test first .debug[[chicago/next-project-steps.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/next-project-steps.md)] --- ## Local dev first This is what most devs imagine as the path to success 1. Get Docker Desktop locally 2. Pick a single project to migrate to containers 3. Build Dockerfiles locally 4. Get solution to work in Compose locally with the various containers 5. Get other devs to try it 6. Focus on shifting dev to "container first" with Docker + Compose 7. Eventually get images to build/test in CI 8. Some time after that, Build Kubernetes YAML and start testing locally 9. Build a K8s cluster and try deploying 10. Iterate on the three aspects of the project, dev, build/test, and clusters 11. Once project is in production, document learnings, find the next project .debug[[chicago/next-project-steps.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/next-project-steps.md)] --- ## What if you can't run Docker or K8s locally? Maybe because security, resource limits, project approvals, etc. - Option 1: Issue each dev a Linux server. Use VSCode/etc. to remotely dev via ssh - VS Code Remote: https://code.visualstudio.com/docs/remote/remote-overview - Option 2: Create a K8s cluster on servers. Use tools to dev locally and run remotely - Okteto: https://okteto.com/ - Garden: https://garden.io/ - Telepresence: https://www.telepresence.io/ - Skaffold: https://skaffold.dev/ - Option 3: Invest in a platform to develop remotely. This is the trend. 
- GitHub employees now only develop remotely using [CodeSpaces](https://github.com/features/codespaces) - Okteto [sells a self-hosted option] for one-click remote dev environments - [Kasm Linux desktops](https://www.kasmweb.com/) [sells a self-hosted option]: https://www.okteto.com/pricing/?plan=Self-Hosted .debug[[chicago/next-project-steps.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/next-project-steps.md)] --- ## Build/test first Maybe devs don't have time/budget/approval to change their local dev setup 1. Pick a single project to migrate to containers 1. Build Dockerfiles locally 1. ~~Get solution to work in Compose locally with the various containers~~ 1. ~~Get other devs to try it~~ 1. ~~Focus on shifting dev to "container first" with Docker + Compose~~ 1. Get images to build/test in CI 2. Build a K8s cluster 3. Build Kubernetes YAML and start testing ~~locally~~ 4. Iterate on the ~~three~~ two aspects of the project, ~~dev,~~ build/test, and clusters 5. Once project is in production, document learnings, find the next project -- Dev local envs can come later, or never. It's up to you. .debug[[chicago/next-project-steps.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/next-project-steps.md)] --- class: pic .interstitial[] --- name: toc-its-a-wrap class: title It's a wrap .nav[ [Previous part](#toc-next-project-steps) | [Back to table of contents](#toc-part-1) | [Next part](#toc-extra-security-and-advanced-content) ] .debug[(automatically generated title slide)] --- # It's a wrap .debug[[chicago/thanks.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/thanks.md)] --- class: pic  .debug[[chicago/thanks.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/thanks.md)] --- ## Thanks for learning with me! 
- Seriously, thanks so much for sticking with me - Things to do after this workshop: - Download the slides if you want: https://chicago.bretfisher.com/slides - Checkout the repo this came from: https://github.com/BretFisher/container.training - Signup for my Udemy courses if you haven't: https://chicago.bretfisher.com - Keep in touch - Join my DevOps/Container stream Thursdays at noon central: https://bret.live - Join my DevOps Discord Server where we chat containers all day: https://devops.fan - Listen to my Docker/Kubernetes podcast: https://podcast.bretfisher.com - Tweet me questions (DMs open): https://twitter.com/bretfisher - GitHub AMA me questions: https://bret.show/ama - Email me questions: bret@bretfisher.com .debug[[chicago/thanks.md](https://github.com/BretFisher/container.training/tree/tampa/slides/chicago/thanks.md)] --- class: pic .interstitial[] --- name: toc-extra-security-and-advanced-content class: title (Extra security and advanced content) .nav[ [Previous part](#toc-its-a-wrap) | [Back to table of contents](#toc-part-2) | [Next part](#toc-pod-security-admission) ] .debug[(automatically generated title slide)] --- # (Extra security and advanced content) .debug[[workflows.yml](https://github.com/BretFisher/container.training/tree/tampa/slides/workflows.yml)] --- class: pic .interstitial[] --- name: toc-pod-security-admission class: title Pod Security Admission .nav[ [Previous part](#toc-extra-security-and-advanced-content) | [Back to table of contents](#toc-part-2) | [Next part](#toc-network-policies) ] .debug[(automatically generated title slide)] --- # Pod Security Admission - "New" policies (available in alpha since Kubernetes 1.22, beta in 1.23) (By the way, 1.24 is latest version as of July 2022) - Easier to use (doesn't require complex interaction between policies and RBAC) .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## PSA in theory - Leans on PSS (Pod Security Standards) - Defines three policies: - `privileged` (can do everything; for system components) - `restricted` (no root user; almost no capabilities) - `baseline` (in-between with reasonable defaults) - Label namespaces to indicate which policies are allowed there - Also supports setting global defaults - Supports `enforce`, `audit`, and `warn` modes .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Pod Security Standards: 3 modes Note: restricted->baseline->privileged: each adds more restrictions to the last - `privileged` (admin/easy mode) - can do everything - equivalent to not enabling PSS for a namespace - `baseline` (Disables uncommon things, or "host-affecting things") - disables hostNetwork, hostPID, hostIPC, hostPorts, hostPath volumes - limits which SELinux/AppArmor profiles can be used - containers can still run as root and use most capabilities - `restricted` (hard mode. Make this the goal for all your apps) - limits volumes to configMap, emptyDir, ephemeral, secret, PVC - containers can't run as root, only capability is NET_BIND_SERVICE - includes `baseline` (can't do privileged pods, hostPath, hostNetwork...) .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Why `baseline` ≠ `restricted` ? 
- `baseline` = should work for the vast majority of images - `restricted` = better, but might break / require adaptation - Many images run as root by default - Some images use CAP_CHOWN (to `chown` files) - Some programs use CAP_NET_RAW (e.g. `ping`) .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## PSA in practice - Step 1: enable the PodSecurity admission plugin (auto-enabled as beta in 1.23) - Step 2: label a Namespace with the policy level to warn, audit, and/or enforce - Step 3: (optional) provide an AdmissionConfiguration to set defaults and exemptions - Step 4: profit! .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Namespace labels - Three optional labels can be added to namespaces: `pod-security.kubernetes.io/enforce` `pod-security.kubernetes.io/audit` `pod-security.kubernetes.io/warn` - The values can be: `baseline`, `restricted`, `privileged` (setting it to `privileged` doesn't really do anything) .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## `enforce`, `audit`, `warn` - `enforce` = prevents creation of pods - `warn` = allows creation but includes a warning in the API response (will be visible e.g. in `kubectl` output) - `audit` = allows creation but generates an API audit event (will be visible if API auditing has been enabled and configured) .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Blocking privileged pods - Let's block `privileged` pods everywhere - And issue warnings and audit events for anything above the `restricted` level .lab[ - Set up the default policy for all namespaces: ```bash kubectl label namespaces \ pod-security.kubernetes.io/enforce=baseline \ pod-security.kubernetes.io/audit=restricted \ pod-security.kubernetes.io/warn=restricted \ --all ``` ] Note: warnings will be issued for infringing pods, but they won't be affected yet. .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- class: extra-details ## Check before you apply - When adding an `enforce` policy, we see warnings (for the pods that would infringe that policy) - It's possible to do a `--dry-run=server` to see these warnings (without applying the label) - It will only show warnings for `enforce` policies (not `warn` or `audit`) .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Relaxing `kube-system` - We have many system components in `kube-system` - These pods aren't affected yet, but if there is a rolling update or something like that, the new pods won't be able to come up .lab[ - Let's allow `privileged` pods in `kube-system`: ```bash kubectl label namespace kube-system \ pod-security.kubernetes.io/enforce=privileged \ pod-security.kubernetes.io/audit=privileged \ pod-security.kubernetes.io/warn=privileged \ --overwrite ``` ] .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## What about new namespaces? 
- If new namespaces are created, they will get default permissions - We can change that by using an *admission configuration* - Step 1: write an "admission configuration file" - Step 2: make sure that file is readable by the API server - Step 3: add a flag to the API server to read that file .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Admission Configuration Let's use [k8s/admission-configuration.yaml](https://github.com/jpetazzo/container.training/tree/main/k8s/admission-configuration.yaml): ```yaml apiVersion: apiserver.config.k8s.io/v1 kind: AdmissionConfiguration plugins: - name: PodSecurity configuration: apiVersion: pod-security.admission.config.k8s.io/v1beta1 kind: PodSecurityConfiguration defaults: enforce: baseline audit: baseline warn: baseline exemptions: usernames: - cluster-admin namespaces: - kube-system ``` .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Copy the file to the API server - We need the file to be available from the API server pod - For convenience, let's copy it do `/etc/kubernetes/pki` (it's definitely not where it *should* be, but that'll do!) .lab[ - Copy the file: ```bash sudo cp ~/container.training/k8s/admission-configuration.yaml \ /etc/kubernetes/pki ``` ] .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Reconfigure the API server - We need to add a flag to the API server to use that file .lab[ - Edit `/etc/kubernetes/manifests/kube-apiserver.yaml` - In the list of `command` parameters, add: `--admission-control-config-file=/etc/kubernetes/pki/admission-configuration.yaml` - Wait until the API server comes back online ] .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Test the new default policy - Create a new Namespace - Try to create the "hacktheplanet" DaemonSet in the new namespace - We get a warning when creating the DaemonSet - The DaemonSet is created - But the Pods don't get created .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- ## Clean up - We probably want to remove the API server flags that we added (the feature gate and the admission configuration) ??? :EN:- Preventing privilege escalation with Pod Security Admission :FR:- Limiter les droits des conteneurs avec *Pod Security Admission* .debug[[k8s/pod-security-admission.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/pod-security-admission.md)] --- class: pic .interstitial[] --- name: toc-network-policies class: title Network policies .nav[ [Previous part](#toc-pod-security-admission) | [Back to table of contents](#toc-part-2) | [Next part](#toc-authentication-and-authorization) ] .debug[(automatically generated title slide)] --- # Network policies - Namespaces help us to *organize* resources - Namespaces do not provide isolation - By default, every pod can contact every other pod - By default, every service accepts traffic from anyone - If we want this to be different, we need *network policies* .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## What's a network policy? A network policy is defined by the following things. 
- A *pod selector* indicating which pods it applies to e.g.: "all pods in namespace `blue` with the label `zone=internal`" - A list of *ingress rules* indicating which inbound traffic is allowed e.g.: "TCP connections to ports 8000 and 8080 coming from pods with label `zone=dmz`, and from the external subnet 4.42.6.0/24, except 4.42.6.5" - A list of *egress rules* indicating which outbound traffic is allowed A network policy can provide ingress rules, egress rules, or both. .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## How do network policies apply? - A pod can be "selected" by any number of network policies - If a pod isn't selected by any network policy, then its traffic is unrestricted (In other words: in the absence of network policies, all traffic is allowed) - If a pod is selected by at least one network policy, then all traffic is blocked ... ... unless it is explicitly allowed by one of these network policies .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- class: extra-details ## Traffic filtering is flow-oriented - Network policies deal with *connections*, not individual packets - Example: to allow HTTP (80/tcp) connections to pod A, you only need an ingress rule (You do not need a matching egress rule to allow response traffic to go through) - This also applies for UDP traffic (Allowing DNS traffic can be done with a single rule) - Network policy implementations use stateful connection tracking .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Pod-to-pod traffic - Connections from pod A to pod B have to be allowed by both pods: - pod A has to be unrestricted, or allow the connection as an *egress* rule - pod B has to be unrestricted, or allow the connection as an *ingress* rule - As a consequence: if a network policy restricts traffic going from/to a pod,
the restriction cannot be overridden by a network policy selecting another pod - This prevents an entity managing network policies in namespace A (but without permission to do so in namespace B) from adding network policies giving them access to namespace B .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## The rationale for network policies - In network security, it is generally considered better to "deny all, then allow selectively" (The other approach, "allow all, then block selectively" makes it too easy to leave holes) - As soon as one network policy selects a pod, the pod enters this "deny all" logic - Further network policies can open additional access - Good network policies should be scoped as precisely as possible - In particular: make sure that the selector is not too broad (Otherwise, you end up affecting pods that were otherwise well secured) .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Our first network policy This is our game plan: - run a web server in a pod - create a network policy to block all access to the web server - create another network policy to allow access only from specific pods .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Running our test web server .lab[ - Let's use the `nginx` image: ```bash kubectl create deployment testweb --image=nginx ``` - Find out the IP address of the pod with one of these two commands: ```bash kubectl get pods -o wide -l app=testweb IP=$(kubectl get pods -l app=testweb -o json | jq -r .items[0].status.podIP) ``` - Check that we can connect to the server: ```bash curl $IP ``` ] The `curl` command should show us the "Welcome to nginx!" page. .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Adding a very restrictive network policy - The policy will select pods with the label `app=testweb` - It will specify an empty list of ingress rules (matching nothing) .lab[ - Apply the policy in this YAML file: ```bash kubectl apply -f ~/container.training/k8s/netpol-deny-all-for-testweb.yaml ``` - Check if we can still access the server: ```bash curl $IP ``` ] The `curl` command should now time out. 
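Not part of the original lab, but if you want to double-check that the policy landed before moving on (the policy name comes from the YAML shown on the next slide):

```bash
# List the network policies in the current namespace,
# then inspect the one we just applied
kubectl get networkpolicies
kubectl describe networkpolicy deny-all-for-testweb
```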
.debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Looking at the network policy This is the file that we applied: ```yaml kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-all-for-testweb spec: podSelector: matchLabels: app: testweb ingress: [] ``` .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Allowing connections only from specific pods - We want to allow traffic from pods with the label `run=testcurl` - Reminder: this label is automatically applied when we do `kubectl run testcurl ...` .lab[ - Apply another policy: ```bash kubectl apply -f ~/container.training/k8s/netpol-allow-testcurl-for-testweb.yaml ``` ] .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Looking at the network policy This is the second file that we applied: ```yaml kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-testcurl-for-testweb spec: podSelector: matchLabels: app: testweb ingress: - from: - podSelector: matchLabels: run: testcurl ``` .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Testing the network policy - Let's create pods with, and without, the required label .lab[ - Try to connect to testweb from a pod with the `run=testcurl` label: ```bash kubectl run testcurl --rm -i --image=centos -- curl -m3 $IP ``` - Try to connect to testweb with a different label: ```bash kubectl run testkurl --rm -i --image=centos -- curl -m3 $IP ``` ] The first command will work (and show the "Welcome to nginx!" page). The second command will fail and time out after 3 seconds. (The timeout is obtained with the `-m3` option.) .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## An important warning - Some network plugins only have partial support for network policies - For instance, Weave added support for egress rules [in version 2.4](https://github.com/weaveworks/weave/pull/3313) (released in July 2018) - But only recently added support for ipBlock [in version 2.5](https://github.com/weaveworks/weave/pull/3367) (released in Nov 2018) - Unsupported features might be silently ignored (Making you believe that you are secure, when you're not) .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Network policies, pods, and services - Network policies apply to *pods* - A *service* can select multiple pods (And load balance traffic across them) - It is possible that we can connect to some pods, but not some others (Because of how network policies have been defined for these pods) - In that case, connections to the service will randomly pass or fail (Depending on whether the connection was sent to a pod that we have access to or not) .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Network policies and namespaces - A good strategy is to isolate a namespace, so that: - all the pods in the namespace can communicate together - other namespaces cannot access the pods - external access has to be enabled explicitly - Let's see what this would look like for the DockerCoins app! 
.debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Network policies for DockerCoins - We are going to apply two policies - The first policy will prevent traffic from other namespaces - The second policy will allow traffic to the `webui` pods - That's all we need for that app! .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Blocking traffic from other namespaces This policy selects all pods in the current namespace. It allows traffic only from pods in the current namespace. (An empty `podSelector` means "all pods.") ```yaml kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-from-other-namespaces spec: podSelector: {} ingress: - from: - podSelector: {} ``` .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Allowing traffic to `webui` pods This policy selects all pods with label `app=webui`. It allows traffic from any source. (An empty `from` field means "all sources.") ```yaml kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-webui spec: podSelector: matchLabels: app: webui ingress: - from: [] ``` .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Applying both network policies - Both network policies are declared in the file [k8s/netpol-dockercoins.yaml](https://github.com/jpetazzo/container.training/tree/main/k8s/netpol-dockercoins.yaml) .lab[ - Apply the network policies: ```bash kubectl apply -f ~/container.training/k8s/netpol-dockercoins.yaml ``` - Check that we can still access the web UI from outside
(and that the app is still working correctly!) - Check that we can't connect anymore to `rng` or `hasher` through their ClusterIP ] Note: using `kubectl proxy` or `kubectl port-forward` allows us to connect regardless of existing network policies. This allows us to debug and troubleshoot easily, without having to poke holes in our firewall. .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Cleaning up our network policies - The network policies that we have installed block all traffic to the default namespace - We should remove them, otherwise further demos and exercises will fail! .lab[ - Remove all network policies: ```bash kubectl delete networkpolicies --all ``` ] .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Protecting the control plane - Should we add network policies to block unauthorized access to the control plane? (etcd, API server, etc.) -- - At first, it seems like a good idea ... -- - But it *shouldn't* be necessary: - not all network plugins support network policies - the control plane is secured by other methods (mutual TLS, mostly) - the code running in our pods can reasonably expect to contact the API
(and it can do so safely thanks to the API permission model) - If we block access to the control plane, we might disrupt legitimate code - ...Without necessarily improving security .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Tools and resources - [Cilium Network Policy Editor](https://editor.cilium.io/) - [Tufin Network Policy Viewer](https://orca.tufin.io/netpol/) - Two resources by [Ahmet Alp Balkan](https://ahmet.im/): - a [very good talk about network policies](https://www.youtube.com/watch?list=PLj6h78yzYM2P-3-xqvmWaZbbI1sW-ulZb&v=3gGpMmYeEO8) at KubeCon North America 2017 - a repository of [ready-to-use recipes](https://github.com/ahmetb/kubernetes-network-policy-recipes) for network policies .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- ## Documentation - As always, the [Kubernetes documentation](https://kubernetes.io/docs/concepts/services-networking/network-policies/) is a good starting point - The API documentation has a lot of detail about the format of various objects: - [NetworkPolicy](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#networkpolicy-v1-networking-k8s-io) - [NetworkPolicySpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#networkpolicyspec-v1-networking-k8s-io) - [NetworkPolicyIngressRule](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#networkpolicyingressrule-v1-networking-k8s-io) - etc. ??? :EN:- Isolating workloads with Network Policies :FR:- Isolation réseau avec les *network policies* .debug[[k8s/netpol.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/netpol.md)] --- class: pic .interstitial[] --- name: toc-authentication-and-authorization class: title Authentication and authorization .nav[ [Previous part](#toc-network-policies) | [Back to table of contents](#toc-part-2) | [Next part](#toc-operators) ] .debug[(automatically generated title slide)] --- # Authentication and authorization - In this section, we will: - define authentication and authorization - explain how they are implemented in Kubernetes - talk about tokens, certificates, service accounts, RBAC ... - But first: why do we need all this? .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## The need for fine-grained security - The Kubernetes API should only be available for identified users - we don't want "guest access" (except in very rare scenarios) - we don't want strangers to use our compute resources, delete our apps ... - our keys and passwords should not be exposed to the public - Users will often have different access rights - cluster admin (similar to UNIX "root") can do everything - developer might access specific resources, or a specific namespace - supervision might have read only access to *most* resources .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Example: custom HTTP load balancer - Let's imagine that we have a custom HTTP load balancer for multiple apps - Each app has its own *Deployment* resource - By default, the apps are "sleeping" and scaled to zero - When a request comes in, the corresponding app gets woken up - After some inactivity, the app is scaled down again - This HTTP load balancer needs API access (to scale up/down) - What if *a wild vulnerability appears*? 
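- Looking ahead to the RBAC section later in this deck: here is a hedged sketch of how narrowly such a component *could* be scoped, with a Role that only touches the `scale` subresource of Deployments (the namespace and names are made up for illustration):

```yaml
# Hypothetical Role: the load balancer can only scale Deployments
# in the "apps" namespace; it can't read Secrets, exec into pods, etc.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: apps
  name: deployment-scaler
rules:
- apiGroups: ["apps"]
  resources: ["deployments/scale"]
  verbs: ["get", "update", "patch"]
```

- Even then, a compromise still allows denial-of-service (scaling everything to zero), as the next slide shows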
.debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Consequences of vulnerability - If the HTTP load balancer has the same API access as we do: *full cluster compromise (easy data leak, cryptojacking...)* - If the HTTP load balancer has `update` permissions on the Deployments: *defacement (easy), MITM / impersonation (medium to hard)* - If the HTTP load balancer only has permission to `scale` the Deployments: *denial-of-service* - All these outcomes are bad, but some are worse than others .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Definitions - Authentication = verifying the identity of a person On a UNIX system, we can authenticate with login+password, SSH keys ... - Authorization = listing what they are allowed to do On a UNIX system, this can include file permissions, sudoer entries ... - Sometimes abbreviated as "authn" and "authz" - In good modular systems, these things are decoupled (so we can e.g. change a password or SSH key without having to reset access rights) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Authentication in Kubernetes - When the API server receives a request, it tries to authenticate it (it examines headers, certificates... anything available) - Many authentication methods are available and can be used simultaneously (we will see them on the next slide) - It's the job of the authentication method to produce: - the user name - the user ID - a list of groups - The API server doesn't interpret these; that'll be the job of *authorizers* .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Authentication methods - TLS client certificates (that's the default for clusters provisioned with `kubeadm`) - Bearer tokens (a secret token in the HTTP headers of the request) - [HTTP basic auth](https://en.wikipedia.org/wiki/Basic_access_authentication) (carrying user and password in an HTTP header; [deprecated since Kubernetes 1.19](https://github.com/kubernetes/kubernetes/pull/89069)) - Authentication proxy (sitting in front of the API and setting trusted headers) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Anonymous requests - If any authentication method *rejects* a request, it's denied (`401 Unauthorized` HTTP code) - If a request is neither rejected nor accepted by anyone, it's anonymous - the user name is `system:anonymous` - the list of groups is `[system:unauthenticated]` - By default, the anonymous user can't do anything (that's what you get if you just `curl` the Kubernetes API) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Authentication with TLS certificates - Enabled in almost all Kubernetes deployments - The user name is indicated by the `CN` in the client certificate - The groups are indicated by the `O` fields in the client certificate - From the point of view of the Kubernetes API, users do not exist (i.e. 
there is no resource with `kind: User`) - The Kubernetes API can be set up to use your custom CA to validate client certs .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Authentication for kubelet - In most clusters, kubelets authenticate using certificates (`O=system:nodes`, `CN=system:node:name-of-the-node`) - The Kubernetes API can act as a CA (by wrapping an X509 CSR into a CertificateSigningRequest resource) - This enables kubelets to renew their own certificates - It can also be used to issue user certificates (but it lacks flexibility; e.g. validity can't be customized) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## User certificates in practice - The Kubernetes API server does not support certificate revocation (see issue [#18982](https://github.com/kubernetes/kubernetes/issues/18982)) - As a result, we don't have an easy way to terminate someone's access (if their key is compromised, or they leave the organization) - Issue short-lived certificates if you use them to authenticate users! (short-lived = a few hours) - This can be facilitated by e.g. Vault, cert-manager... .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## What if a certificate is compromised? - Option 1: wait for the certificate to expire (which is why short-lived certs are convenient!) - Option 2: remove access from that certificate's user and groups - if that user was `bob.smith`, create a new user `bob.smith.2` - if Bob was in groups `dev`, create a new group `dev.2` - let's agree that this is not a great solution! - Option 3: re-create a new CA and re-issue all certificates - let's agree that this is an even worse solution! .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Authentication with tokens - Tokens are passed as HTTP headers: `Authorization: Bearer and-then-here-comes-the-token` - Tokens can be validated through a number of different methods: - static tokens hard-coded in a file on the API server - [bootstrap tokens](https://kubernetes.io/docs/reference/access-authn-authz/bootstrap-tokens/) (special case to create a cluster or join nodes) - [OpenID Connect tokens](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens) (to delegate authentication to compatible OAuth2 providers) - service accounts (these deserve more details, coming right up!) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Service accounts - A service account is a user that exists in the Kubernetes API (it is visible with e.g. `kubectl get serviceaccounts`) - Service accounts can therefore be created / updated dynamically (they don't require hand-editing a file and restarting the API server) - A service account can be associated with a set of secrets (the kind that you can view with `kubectl get secrets`) - Service accounts are generally used to grant permissions to applications, services... (as opposed to humans) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Service account tokens evolution - In Kubernetes 1.21 and above, pods use *bound service account tokens*: - these tokens are *bound* to a specific object (e.g. 
a Pod) - they are automatically invalidated when the object is deleted - these tokens also expire quickly (e.g. 1 hour) and gets rotated automatically - In Kubernetes 1.24 and above, unbound tokens aren't created automatically - before 1.24, we would see unbound tokens with `kubectl get secrets` - with 1.24 and above, these tokens can be created with `kubectl create token` - ...or with a Secret with the right [type and annotation][create-token] [create-token]: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#create-token .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Checking our authentication method - Let's check our kubeconfig file - Do we have a certificate, a token, or something else? .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Inspecting a certificate If we have a certificate, let's use the following command: ```bash kubectl config view \ --raw \ -o json \ | jq -r .users[0].user[\"client-certificate-data\"] \ | openssl base64 -d -A \ | openssl x509 -text \ | grep Subject: ``` This command will show the `CN` and `O` fields for our certificate. .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Breaking down the command - `kubectl config view` shows the Kubernetes user configuration - `--raw` includes certificate information (which shows as REDACTED otherwise) - `-o json` outputs the information in JSON format - `| jq ...` extracts the field with the user certificate (in base64) - `| openssl base64 -d -A` decodes the base64 format (now we have a PEM file) - `| openssl x509 -text` parses the certificate and outputs it as plain text - `| grep Subject:` shows us the line that interests us → We are user `kubernetes-admin`, in group `system:masters`. (We will see later how and why this gives us the permissions that we have.) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Inspecting a token If we have a token, let's use the following command: ```bash kubectl config view \ --raw \ -o json \ | jq -r .users[0].user.token \ | base64 -d \ | cut -d. -f2 \ | base64 -d \ | jq . ``` If our token is a JWT / OIDC token, this command will show its content. 
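The same trick works on a ServiceAccount token minted with `kubectl create token` (Kubernetes 1.24+, mentioned earlier). A sketch that also converts the base64url alphabet and re-adds padding so `base64 -d` doesn't choke:

```bash
# Mint a short-lived token for the "default" ServiceAccount (K8s 1.24+),
# take the JWT payload (second dot-separated field), fix up base64url
# (swap -_ for +/ and pad to a multiple of 4), then decode and pretty-print
kubectl create token default \
  | cut -d. -f2 \
  | tr '_-' '/+' \
  | awk '{ while (length($0) % 4) $0 = $0 "="; print }' \
  | base64 -d \
  | jq .
```

It should show claims such as the expiry and the ServiceAccount the token is bound to.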
.debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Other authentication methods - Other types of tokens - these tokens are typically shorter than JWT or OIDC tokens - it is generally not possible to extract information from them - Plugins - some clusters use external `exec` plugins - these plugins typically use API keys to generate or obtain tokens - example: the AWS EKS authenticator works this way .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Token authentication in practice - We are going to list existing service accounts - Then we will extract the token for a given service account - And we will use that token to authenticate with the API .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Listing service accounts .lab[ - The resource name is `serviceaccount` or `sa` for short: ```bash kubectl get sa ``` ] There should be just one service account in the default namespace: `default`. .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Finding the secret .lab[ - List the secrets for the `default` service account: ```bash kubectl get sa default -o yaml SECRET=$(kubectl get sa default -o json | jq -r .secrets[0].name) ``` ] It should be named `default-token-XXXXX`. When running Kubernetes 1.24 and above, this Secret won't exist.
Instead, create a token with `kubectl create token default`. .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Extracting the token - The token is stored in the secret, wrapped with base64 encoding .lab[ - View the secret: ```bash kubectl get secret $SECRET -o yaml ``` - Extract the token and decode it: ```bash TOKEN=$(kubectl get secret $SECRET -o json \ | jq -r .data.token | openssl base64 -d -A) ``` ] .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Using the token - Let's send a request to the API, without and with the token .lab[ - Find the ClusterIP for the `kubernetes` service: ```bash kubectl get svc kubernetes API=$(kubectl get svc kubernetes -o json | jq -r .spec.clusterIP) ``` - Connect without the token: ```bash curl -k https://$API ``` - Connect with the token: ```bash curl -k -H "Authorization: Bearer $TOKEN" https://$API ``` ] .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Results - In both cases, we will get a "Forbidden" error - Without authentication, the user is `system:anonymous` - With authentication, it is shown as `system:serviceaccount:default:default` - The API "sees" us as a different user - But neither user has any rights, so we can't do nothin' - Let's change that! .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Authorization in Kubernetes - There are multiple ways to grant permissions in Kubernetes, called [authorizers](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#authorization-modules): - [Node Authorization](https://kubernetes.io/docs/reference/access-authn-authz/node/) (used internally by kubelet; we can ignore it) - [Attribute-based access control](https://kubernetes.io/docs/reference/access-authn-authz/abac/) (powerful but complex and static; ignore it too) - [Webhook](https://kubernetes.io/docs/reference/access-authn-authz/webhook/) (each API request is submitted to an external service for approval) - [Role-based access control](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) (associates permissions to users dynamically) - The one we want is the last one, generally abbreviated as RBAC .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Role-based access control - RBAC allows to specify fine-grained permissions - Permissions are expressed as *rules* - A rule is a combination of: - [verbs](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#determine-the-request-verb) like create, get, list, update, delete... - resources (as in "API resource," like pods, nodes, services...) - resource names (to specify e.g. one specific pod instead of all pods) - in some case, [subresources](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#referring-to-resources) (e.g. 
logs are subresources of pods) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Listing all possible verbs - The Kubernetes API is self-documented - We can ask it which resources, subresources, and verbs exist - One way to do this is to use: - `kubectl get --raw /api/v1` (for core resources with `apiVersion: v1`) - `kubectl get --raw /apis/
<group>/<version>
` (for other resources) - The JSON response can be formatted with e.g. `jq` for readability .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Examples - List all verbs across all `v1` resources ```bash kubectl get --raw /api/v1 | jq -r .resources[].verbs[] | sort -u ``` - List all resources and subresources in `apps/v1` ```bash kubectl get --raw /apis/apps/v1 | jq -r .resources[].name ``` - List which verbs are available on which resources in `networking.k8s.io` ```bash kubectl get --raw /apis/networking.k8s.io/v1 | \ jq -r '.resources[] | .name + ": " + (.verbs | join(", "))' ``` .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## From rules to roles to rolebindings - A *role* is an API object containing a list of *rules* Example: role "external-load-balancer-configurator" can: - [list, get] resources [endpoints, services, pods] - [update] resources [services] - A *rolebinding* associates a role with a user Example: rolebinding "external-load-balancer-configurator": - associates user "external-load-balancer-configurator" - with role "external-load-balancer-configurator" - Yes, there can be users, roles, and rolebindings with the same name - It's a good idea for 1-1-1 bindings; not so much for 1-N ones .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Cluster-scope permissions - API resources Role and RoleBinding are for objects within a namespace - We can also define API resources ClusterRole and ClusterRoleBinding - These are a superset, allowing us to: - specify actions on cluster-wide objects (like nodes) - operate across all namespaces - We can create Role and RoleBinding resources within a namespace - ClusterRole and ClusterRoleBinding resources are global .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Pods and service accounts - A pod can be associated with a service account - by default, it is associated with the `default` service account - as we saw earlier, this service account has no permissions anyway - The associated token is exposed to the pod's filesystem (in `/var/run/secrets/kubernetes.io/serviceaccount/token`) - Standard Kubernetes tooling (like `kubectl`) will look for it there - So Kubernetes tools running in a pod will automatically use the service account .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## In practice - We are going to run a pod - This pod will use the default service account of its namespace - We will check our API permissions (there shouldn't be any) - Then we will bind a role to the service account - We will check that we were granted the corresponding permissions .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Running a pod - We'll use [Nixery](https://nixery.dev/) to run a pod with `curl` and `kubectl` - Nixery automatically generates images with the requested packages .lab[ - Run our pod: ```bash kubectl run eyepod --rm -ti --restart=Never \ --image nixery.dev/shell/curl/kubectl -- bash ``` ] .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Checking our permissions - Normally, at this point, we don't have any API 
permission .lab[ - Check our permissions with `kubectl`: ```bash kubectl get pods ``` ] - We should get a message telling us that our service account doesn't have permissions to list "pods" in the current namespace - We can also make requests to the API server directly (use `kubectl -v6` to see the exact request URI!) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Binding a role to the service account - Binding a role = creating a *rolebinding* object - We will call that object `can-view` (but again, we could call it `view` or whatever we like) .lab[ - Create the new role binding: ```bash kubectl create rolebinding can-view \ --clusterrole=view \ --serviceaccount=default:default ``` ] It's important to note a couple of details in these flags... .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Roles vs Cluster Roles - We used `--clusterrole=view` - What would have happened if we had used `--role=view`? - we would have bound the role `view` from the local namespace
(instead of the cluster role `view`) - the command would have worked fine (no error) - but later, our API requests would have been denied - This is a deliberate design decision (we can reference roles that don't exist, and create/update them later) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Users vs Service Accounts - We used `--serviceaccount=default:default` - What would have happened if we had used `--user=default:default`? - we would have bound the role to a user instead of a service account - again, the command would have worked fine (no error) - ...but our API requests would have been denied later - What's about the `default:` prefix? - that's the namespace of the service account - yes, it could be inferred from context, but... `kubectl` requires it .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## Checking our new permissions - We should be able to *view* things, but not to *edit* them .lab[ - Check our permissions with `kubectl`: ```bash kubectl get pods ``` - Try to create something: ```bash kubectl create deployment can-i-do-this --image=nginx ``` - Exit the container with `exit` or `^D` ] .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## `kubectl run --serviceaccount` - `kubectl run` also has a `--serviceaccount` flag - ...But it's supposed to be deprecated "soon" (see [kubernetes/kubernetes#99732](https://github.com/kubernetes/kubernetes/pull/99732) for details) - It's possible to specify the service account with an override: ```bash kubectl run my-pod -ti --image=alpine --restart=Never \ --overrides='{ "spec": { "serviceAccountName" : "my-service-account" } }' ``` .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## `kubectl auth` and other CLI tools - The `kubectl auth can-i` command can tell us: - if we can perform an action - if someone else can perform an action - what actions we can perform - There are also other very useful tools to work with RBAC - Let's do a quick review! .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## `kubectl auth can-i dothis onthat` - These commands will give us a `yes`/`no` answer: ```bash kubectl auth can-i list nodes kubectl auth can-i create pods kubectl auth can-i get pod/name-of-pod kubectl auth can-i get /url-fragment-of-api-request/ kubectl auth can-i '*' services kubectl auth can-i get coffee kubectl auth can-i drink coffee ``` - The RBAC system is flexible - We can check permissions on resources that don't exist yet (e.g. CRDs) - We can check permissions for arbitrary actions .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## `kubectl auth can-i ... --as someoneelse` - We can check permissions on behalf of other users ```bash kubectl auth can-i list nodes \ --as some-user kubectl auth can-i list nodes \ --as system:serviceaccount:
<namespace>:<name of service account>
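# for example, check what the "default" service account in the "default"
# namespace (the one we bound the "view" ClusterRole to earlier) is allowed to do:
kubectl auth can-i list pods \
        --as system:serviceaccount:default:default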
``` - We can also use `--as-group` to check permissions for members of a group - `--as` and `--as-group` leverage the *impersonation API* - These flags can be used with many other `kubectl` commands (not just `auth can-i`) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## `kubectl auth can-i --list` - We can list the actions that are available to us: ```bash kubectl auth can-i --list ``` - ... Or to someone else (with `--as SomeOtherUser`) - This is very useful to check users or service accounts for overly broad permissions (or when looking for ways to exploit a security vulnerability!) - To learn more about Kubernetes attacks and threat models around RBAC: 📽️ [Hacking into Kubernetes Security for Beginners](https://www.youtube.com/watch?v=mLsCm9GVIQg) by [Ellen Körbes](https://twitter.com/ellenkorbes) and [Tabitha Sable](https://twitter.com/TabbySable) .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Other useful tools - For auditing purposes, sometimes we want to know who can perform which actions - There are a few tools to help us with that, available as `kubectl` plugins: - `kubectl who-can` / [kubectl-who-can](https://github.com/aquasecurity/kubectl-who-can) by Aqua Security - `kubectl access-matrix` / [Rakkess (Review Access)](https://github.com/corneliusweig/rakkess) by Cornelius Weig - `kubectl rbac-lookup` / [RBAC Lookup](https://github.com/FairwindsOps/rbac-lookup) by FairwindsOps - `kubectl rbac-tool` / [RBAC Tool](https://github.com/alcideio/rbac-tool) by insightCloudSec - `kubectl` plugins can be installed and managed with `krew` - They can also be installed and executed as standalone programs .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Where does this `view` role come from? - Kubernetes defines a number of ClusterRoles intended to be bound to users - `cluster-admin` can do *everything* (think `root` on UNIX) - `admin` can do *almost everything* (except e.g. changing resource quotas and limits) - `edit` is similar to `admin`, but cannot view or edit permissions - `view` has read-only access to most resources, except permissions and secrets *In many situations, these roles will be all you need.* *You can also customize them!* .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Customizing the default roles - If you need to *add* permissions to these default roles (or others),
you can do it through the [ClusterRole Aggregation](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#aggregated-clusterroles) mechanism - This happens by creating a ClusterRole with the following labels: ```yaml metadata: labels: rbac.authorization.k8s.io/aggregate-to-admin: "true" rbac.authorization.k8s.io/aggregate-to-edit: "true" rbac.authorization.k8s.io/aggregate-to-view: "true" ``` - This ClusterRole's permissions will be added to `admin`/`edit`/`view` respectively .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## When should we use aggregation? - By default, CRDs aren't included in `view` / `edit` / etc. (Kubernetes cannot guess which ones are security-sensitive and which ones are not) - If we edit `view` / `edit` / etc. directly, our edits will conflict (imagine if we have two CRDs and they both provide a custom `view` ClusterRole) - Using aggregated roles lets us enrich the default roles without touching them .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## How aggregation works - The corresponding roles will have `aggregationRules` like this: ```yaml aggregationRule: clusterRoleSelectors: - matchLabels: rbac.authorization.k8s.io/aggregate-to-view: "true" ``` - We can define our own custom roles with their own aggregation rules .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## Where do our permissions come from? - When interacting with the Kubernetes API, we are using a client certificate - We saw previously that this client certificate contained: `CN=kubernetes-admin` and `O=system:masters` - Let's look for these in existing ClusterRoleBindings: ```bash kubectl get clusterrolebindings -o yaml | grep -e kubernetes-admin -e system:masters ``` (`system:masters` should show up, but not `kubernetes-admin`.) - Where does this match come from? .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: extra-details ## The `system:masters` group - If we eyeball the output of `kubectl get clusterrolebindings -o yaml`, we'll find out! - It is in the `cluster-admin` binding: ```bash kubectl describe clusterrolebinding cluster-admin ``` - This binding associates `system:masters` with the cluster role `cluster-admin` - And the `cluster-admin` role is, basically, `root`: ```bash kubectl describe clusterrole cluster-admin ``` .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- ## `list` vs. `get` ⚠️ `list` grants read permissions to resources! - It's not possible to give permission to list resources without also reading them - This has implications for e.g. Secrets (if a controller needs to be able to enumerate Secrets, it will be able to read them) ??? :EN:- Authentication and authorization in Kubernetes :EN:- Authentication with tokens and certificates :EN:- Authorization with RBAC (Role-Based Access Control) :EN:- Restricting permissions with Service Accounts :EN:- Working with Roles, Cluster Roles, Role Bindings, etc.
:FR:- Identification et droits d'accès dans Kubernetes :FR:- Mécanismes d'identification par jetons et certificats :FR:- Le modèle RBAC *(Role-Based Access Control)* :FR:- Restreindre les permissions grâce aux *Service Accounts* :FR:- Comprendre les *Roles*, *Cluster Roles*, *Role Bindings*, etc. .debug[[k8s/authn-authz.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/authn-authz.md)] --- class: pic .interstitial[] --- name: toc-operators class: title Operators .nav[ [Previous part](#toc-authentication-and-authorization) | [Back to table of contents](#toc-part-2) | [Next part](#toc-cicd-with-gitlab) ] .debug[(automatically generated title slide)] --- # Operators The Kubernetes documentation describes the [Operator pattern] as follows: *Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.* Another good definition from [CoreOS](https://coreos.com/blog/introducing-operators.html): *An operator represents **human operational knowledge in software,**
to reliably manage an application.* There are many different use cases spanning different domains, but the general idea is: *Manage some resources (that reside inside or outside the cluster),
using Kubernetes manifests and tooling.* [Operator pattern]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ .debug[[k8s/operators.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/operators.md)] --- ## Some use cases - Managing external resources ([AWS], [GCP], [KubeVirt]...) - Setting up database replication or distributed systems
(Cassandra, Consul, CouchDB, ElasticSearch, etcd, Kafka, MongoDB, MySQL, PostgreSQL, RabbitMQ, Redis, ZooKeeper...) - Running and configuring CI/CD
([ArgoCD], [Flux]), backups ([Velero]), policies ([Gatekeeper], [Kyverno])... - Automating management of certificates
([cert-manager]), secrets ([External Secrets Operator], [Sealed Secrets]...) - Configuration of cluster components ([Istio], [Prometheus]) - etc. [ArgoCD]: https://github.com/argoproj/argo-cd [AWS]: https://aws-controllers-k8s.github.io/community/docs/community/services/ [cert-manager]: https://cert-manager.io/ [External Secrets Operator]: https://external-secrets.io/ [Flux]: https://fluxcd.io/ [Gatekeeper]: https://open-policy-agent.github.io/gatekeeper/website/docs/ [GCP]: https://github.com/paulczar/gcp-cloud-compute-operator [Istio]: https://istio.io/latest/docs/setup/install/operator/ [KubeVirt]: https://kubevirt.io/ [Kyverno]: https://kyverno.io/ [Prometheus]: https://prometheus-operator.dev/ [Sealed Secrets]: https://github.com/bitnami-labs/sealed-secrets [Velero]: https://velero.io/ .debug[[k8s/operators.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/operators.md)] --- ## What are they made from? - Operators combine two things: - Custom Resource Definitions - controller code watching the corresponding resources and acting upon them - A given operator can define one or multiple CRDs - The controller code (control loop) typically runs within the cluster (running as a Deployment with 1 replica is a common scenario) - But it could also run elsewhere (nothing mandates that the code run on the cluster, as long as it has API access) .debug[[k8s/operators.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/operators.md)] --- ## Operators for e.g. replicated databases - Kubernetes gives us Deployments, StatefulSets, Services ... - These mechanisms give us building blocks to deploy applications - They work great for services that are made of *N* identical containers (like stateless ones) - They also work great for some stateful applications like Consul, etcd ... (with the help of highly persistent volumes) - They're not enough for complex services: - where different containers have different roles - where extra steps have to be taken when scaling or replacing containers .debug[[k8s/operators.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/operators.md)] --- ## How operators work - An operator creates one or more CRDs (i.e., it creates new "Kinds" of resources on our cluster) - The operator also runs a *controller* that will watch its resources - Each time we create/update/delete a resource, the controller is notified (we could write our own cheap controller with `kubectl get --watch`) .debug[[k8s/operators.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/operators.md)] --- ## Operators are not magic - Look at this ElasticSearch resource definition: [k8s/eck-elasticsearch.yaml](https://github.com/jpetazzo/container.training/tree/main/k8s/eck-elasticsearch.yaml) - What should happen if we flip the TLS flag? Twice? - What should happen if we add another group of nodes? - What if we want different images or parameters for the different nodes? *Operators can be very powerful.
But we need to know exactly the scenarios that they can handle.* ??? :EN:- Kubernetes operators :FR:- Les opérateurs .debug[[k8s/operators.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/operators.md)] --- class: pic .interstitial[] --- name: toc-cicd-with-gitlab class: title CI/CD with GitLab .nav[ [Previous part](#toc-operators) | [Back to table of contents](#toc-part-2) | [Next part](#toc-) ] .debug[(automatically generated title slide)] --- # CI/CD with GitLab - In this section, we will see how to set up a CI/CD pipeline with GitLab (using a "self-hosted" GitLab; i.e. running on our Kubernetes cluster) - The big picture: - each time we push code to GitLab, it will be deployed in a staging environment - each time we push the `production` tag, it will be deployed in production .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Disclaimers - We'll use GitLab here as an example, but there are many other options (e.g. some combination of Argo, Harbor, Tekton ...) - There are also hosted options (e.g. GitHub Actions and many others) - We'll use a specific pipeline and workflow, but it's purely arbitrary (treat it as a source of inspiration, not a model to be copied!) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Workflow overview - Push code to GitLab's git server - GitLab notices the `.gitlab-ci.yml` file, which defines our pipeline - Our pipeline can have multiple *stages* executed sequentially (e.g. lint, build, test, deploy ...) - Each stage can have multiple *jobs* executed in parallel (e.g. build images in parallel) - Each job will be executed in an independent *runner* pod .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Pipeline overview - Our repository holds source code, Dockerfiles, and a Helm chart - *Lint* stage will check the Helm chart validity - *Build* stage will build container images (and push them to GitLab's integrated registry) - *Deploy* stage will deploy the Helm chart, using these images - Pushes to `production` will deploy to "the" production namespace - Pushes to other tags/branches will deploy to a namespace created on the fly - We will discuss shortcomings and alternatives at the end of this chapter! .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Lots of requirements - We need *a lot* of components to pull this off: - a domain name - a storage class - a TLS-capable ingress controller - the cert-manager operator - GitLab itself - the GitLab pipeline - Wow, why?!? .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## I find your lack of TLS disturbing - We need a container registry (obviously!) - Docker (and other container engines) *require* TLS on the registry (with valid certificates) - A few options: - use a "real" TLS certificate (e.g. obtained with Let's Encrypt) - use a self-signed TLS certificate - communicate with the registry over localhost (TLS isn't required then) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- class: extra-details ## Why not self-signed certs?
- When using self-signed certs, we need to either: - add the cert (or CA) to trusted certs - disable cert validation - This needs to be done on *every client* connecting to the registry: - CI/CD pipeline (building and pushing images) - container engine (deploying the images) - other tools (e.g. container security scanner) - It's doable, but it's a lot of hacks (especially when adding more tools!) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- class: extra-details ## Why not localhost? - TLS is usually not required when the registry is on localhost - We could expose the registry e.g. on a `NodePort` - ... And then tweak the CI/CD pipeline to use that instead - This is great when obtaining valid certs is difficult: - air-gapped or internal environments (that can't use Let's Encrypt) - no domain name available - Downside: the registry isn't easily or safely available from outside (the `NodePort` essentially defeats TLS) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- class: extra-details ## Can we use `nip.io`? - We will use Let's Encrypt - Let's Encrypt has a quota of certificates per domain (in 2020, that was [50 certificates per week per domain](https://letsencrypt.org/docs/rate-limits/)) - So if we all use `nip.io`, we will probably run into that limit - But you can try and see if it works! .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Ingress - We will assume that we have a domain name pointing to our cluster (i.e. with a wildcard record pointing to at least one node of the cluster) - We will get traffic in the cluster by leveraging `ExternalIPs` services (but it would be easy to use `LoadBalancer` services instead) - We will use Traefik as the ingress controller (but any other one should work too) - We will use cert-manager to obtain certificates with Let's Encrypt .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Other details - We will deploy GitLab with its official Helm chart - It will still require a bunch of parameters and customization - We also need a Storage Class (unless our cluster already has one, of course) - We suggest the [Rancher local path provisioner](https://github.com/rancher/local-path-provisioner) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Setting everything up 1. `git clone https://github.com/jpetazzo/kubecoin` 2. `export EMAIL=xxx@example.com DOMAIN=awesome-kube-ci.io` (we need a real email address and a domain pointing to the cluster!) 3. `. setup-gitlab-on-k8s.rc` (this doesn't do anything, but defines a number of helper functions) 4. Execute each helper function, one after another (try `do_[TAB]` to see these functions) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Local Storage `do_1_localstorage` Applies the YAML directly from Rancher's repository. Annotate the Storage Class so that it becomes the default one. .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Traefik `do_2_traefik_with_externalips` Install the official Traefik Helm chart. Instead of a `LoadBalancer` service, use a `ClusterIP` with `ExternalIPs`. Automatically infer the `ExternalIPs` from `kubectl get nodes`. Enable TLS. 
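For reference, here is a rough sketch of what this helper might look like. This is *not* the actual function from `setup-gitlab-on-k8s.rc`; it assumes the official `traefik/traefik` chart and its `service.type` / `service.externalIPs` values, and it leaves out the TLS entrypoint settings:

```bash
# collect the nodes' addresses (the real helper may pick external addresses instead)
NODE_IPS=$(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')

# install Traefik with a ClusterIP service exposed on those node IPs
helm repo add traefik https://traefik.github.io/charts
helm upgrade --install traefik traefik/traefik \
  --namespace traefik --create-namespace \
  --set service.type=ClusterIP \
  --set "service.externalIPs={${NODE_IPS// /,}}"
```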
.debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## cert-manager `do_3_certmanager` Install cert-manager using their official YAML. Easy-peasy. .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Certificate issuers `do_4_issuers` Create a couple of `ClusterIssuer` resources for cert-manager. (One for the staging Let's Encrypt environment, one for production.) Note: this requires specifying a valid `$EMAIL` address! Note: if this fails, wait a bit and try again (cert-manager needs to be up). .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## GitLab `do_5_gitlab` Deploy GitLab using their official Helm chart. We pass a lot of parameters to this chart: - the domain name to use - disable GitLab's own ingress and cert-manager - annotate the ingress resources so that cert-manager kicks in - bind the shell service (git over SSH) to port 222 to avoid conflict - use ExternalIPs for that shell service Note: on modest cloud instances, it can take 10 minutes for GitLab to come up. We can check the status with `kubectl get pods --namespace=gitlab` .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Log into GitLab and configure it `do_6_showlogin` This will get the GitLab root password (stored in a Secret). Then we need to: - log into GitLab - add our SSH key (top-right user menu → settings, then SSH keys on the left) - create a project (using the + menu next to the search bar on top) - go to project configuration (on the left, settings → CI/CD) - add a `KUBECONFIG` file variable with the content of our `.kube/config` file - go to settings → access tokens to create a read-only registry token - add variables `REGISTRY_USER` and `REGISTRY_PASSWORD` with that token - push our repo (`git remote add gitlab ...` then `git push gitlab ...`) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Monitoring progress and troubleshooting - Click on "CI/CD" in the left bar to view pipelines - If you see a permission issue mentioning `system:serviceaccount:gitlab:...`: *make sure you set `KUBECONFIG` correctly!* - GitLab will create namespaces named `gl-<user>-<project>` - At the end of the deployment, the web UI will be available on some unique URL (`http://<user>-<project>-<branch>-gitlab.<domain>
`) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Production - `git tag -f production && git push -f --tags` - Our CI/CD pipeline will deploy on the production URL (`http://<user>-<project>-gitlab.<domain>
`) - It will do it *only* if that same git commit was pushed to staging first (look in the pipeline configuration file to see how it's done!) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Let's talk about build - There are many ways to build container images on Kubernetes - ~~And they all suck~~ Many of them have inconveniencing issues - Let's do a quick review! .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Docker-based approaches - Bind-mount the Docker socket - very easy, but requires Docker Engine - build resource usage "evades" Kubernetes scheduler - insecure - Docker-in-Docker in a pod - requires privileged pod - insecure - approaches like rootless or sysbox might help in the future - External build host - more secure - requires resources outside of the Kubernetes cluster .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Non-privileged builders - Kaniko - each build runs in its own containers or pod - no caching by default - registry-based caching is possible - BuildKit / `docker buildx` - can leverage Docker Engine or long-running Kubernetes worker pod - supports distributed, multi-arch build farms - basic caching out of the box - can also leverage registry-based caching .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Other approaches - Ditch the Dockerfile! - bazel - jib - ko - etc. .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Discussion - Our CI/CD workflow is just *one* of the many possibilities - It would be nice to add some actual unit or e2e tests - Map the production namespace to a "real" domain name - Automatically remove older staging environments (see e.g. [kube-janitor](https://codeberg.org/hjacobs/kube-janitor)) - Deploy production to a separate cluster - Better segregate permissions (don't give `cluster-admin` to the GitLab pipeline) .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Pros - GitLab is an amazing, open source, all-in-one platform - Available as hosted, community, or enterprise editions - Rich ecosystem, very customizable - Can run on Kubernetes, or somewhere else .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)] --- ## Cons - It can be difficult to use components separately (e.g. use a different registry, or a different job runner) - More than one way to configure it (it's not an opinionated platform) - Not "Kubernetes-native" (for instance, jobs are not Kubernetes jobs) - Job latency could be improved *Note: most of these drawbacks are the flip side of the "pros" on the previous slide!* ??? :EN:- CI/CD with GitLab :FR:- CI/CD avec GitLab .debug[[k8s/gitlab.md](https://github.com/BretFisher/container.training/tree/tampa/slides/k8s/gitlab.md)]