Kubernetes has evolved into the dominant container orchestration platform. With Prometheus, if you're running inside the cluster, then the Prometheus instance is susceptible to the cluster conditions, and if the cluster became unstable it would destabilize your monitoring system. The API was designed to be used both declaratively and imperatively, and operators are an imperative way to use Kubernetes. You need to evaluate whether certain operations can be changed to avoid the dependency on cluster-level permissions. Also, if there are static cluster-scoped resources whose definition won't change based on the inputs given to the operators, you can move the creation of those resources to the Operator Lifecycle Manager (OLM) catalog. But as developers, we can ensure the security of our Kubernetes deployments with permission rules and other restrictions. As with general operating system security, it's important to provide only strictly necessary permissions, and to regularly review granted permissions to verify that they're still required. In this post, we'll look at some examples of proper operator permission scoping, the tandem roles that operator creators and end users have in ensuring security, and a few ways to use operators to make Kubernetes services more secure. It is known as the "Linux of the cloud" for a reason, and continues to gain market share today.
Unfortunately, most Kubernetes operators require fairly broad privileges to carry out their functionality, so developers need to strike a careful balance between security and utility. To solve this problem, Kubernetes provides a security context for containers to control the privileges of the processes running inside them. Improvements to the Kubernetes scheduler may obviate the need for operators in some cases, Isenberg suggested: "When operators emerged, people were using their own custom controllers and operators to manage the workflow or lifecycle of their application because they couldn't customize the scheduler or plug in a custom scheduler. But after a while, they took over and everybody was writing operators to handle custom lifecycles, partially because it became harder to upstream feature changes into the Kubernetes scheduler." For example, Jetstack's cert-manager can automate the management and issuance of TLS certificates. "From a security standpoint, it makes more sense not to have everything run in the same place, because if the operator is compromised, then potentially the entire cluster can be compromised." Providing use access ensures that the admission controller doesn't admit any pod that doesn't satisfy the conditions that you set in the pod security policy. If the operator becomes compromised for any reason, the attacker will now be limited in what they can do to the host system. Security context and pod security policies are useful tools to secure any container. Having many operators, which is the current situation in the ecosystem, introduces a lot of risks to your cluster. Depending on how you classify your operator, operators require permissions at the namespace or cluster level to perform certain operations or access Kubernetes resources.
"With so many ingress and service mesh options around, there's no standard to build deployment automation on top of, so solutions tend to be non-transferable and limited to a custom stack." He noted that similar patterns to operators were found in Mesos, where the requirement for applications and services to have a custom workload scheduler as part of the two-level scheduler led to a mix of generic schedulers for stateless applications and specific schedulers for complex distributed systems like Spark and Cassandra that needed lifecycle management. Each rule is a set of API groups, resources, and verbs. After enabling the admission controller, you can create a restrictive pod security policy, and all of the authenticated users and service accounts must be provided with use access to that pod security policy, as demonstrated in "Enabling pod security policies via RBAC" in the Kubernetes documentation. Or maybe running those operators outside of the cluster and disconnecting and reconnecting them, rather than running them forever.
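As a rough illustration of the restrictive pod security policy and the use access described above, the following sketch shows one possible shape. It assumes a cluster old enough to still serve the policy/v1beta1 PodSecurityPolicy API (the API was removed in Kubernetes 1.25); the names restricted-example and use-restricted-example are hypothetical, and the exact set of restrictions is an assumption rather than the original article's example.

```yaml
# Hypothetical restrictive policy; only works on clusters that still serve
# policy/v1beta1 (removed in Kubernetes 1.25).
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-example
spec:
  privileged: false                # no privileged containers
  allowPrivilegeEscalation: false  # processes cannot gain more privileges
  readOnlyRootFilesystem: true     # root file system is read only
  requiredDropCapabilities:
    - ALL
  hostNetwork: false
  hostIPC: false
  hostPID: false
  volumes:                         # hostPath is deliberately not listed
    - configMap
    - secret
    - emptyDir
    - projected
    - downwardAPI
    - persistentVolumeClaim
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
---
# Grants "use" on that one policy; bind it to users or service accounts
# with a RoleBinding or ClusterRoleBinding.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: use-restricted-example
rules:
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["restricted-example"]
    verbs: ["use"]
```

Because hostPath is missing from the allowed volume types, pods admitted under this policy cannot mount files from the host node.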
We schedule a monthly meeting on the last Thursday of every month. For example, you can move CRD creation from your operator to OLM since it doesn't change throughout the operator's lifecycle. In Kubernetes, you can use the wildcard character (*) in role or cluster role definitions. "The more code you write, the more locked into that platform you are." If the wildcard character (*) is used under resources and verbs, it means that we allow every possible operation to be performed on all of the resources under the API group. A rule definition like that in the role or cluster role would not cause any problem with how the operator functions. "All an operator is, is a set of controllers, so why did I have to make this a first-class concept?" He suggested the pattern of using a Helm chart where the operator is just controllers inside it that deliver more types. Cluster roles and cluster role bindings have to be used instead of roles and role bindings. But if you're adapting applications and workloads to run on Kubernetes and operators aren't already available for them, it's possible Kubernetes isn't the best place to run them. Although Kubernetes operators introduce some security considerations, they can also make a cluster more secure. That leads to people creating an operator that bundles Ansible or a Helm chart in an attempt to solve deployment management issues when what they actually have is more of a traditional configuration management problem that he believes is better solved by Helm (which Rancher is standardizing on for its 2.5 release), GitOps or Kustomize, or by more ambitious projects like the CUE data constraint language (a generic approach to automation and scripting). This prevents problems from spilling over into other deployments on the same cluster.
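To make the wildcard discussion above concrete, here is a minimal sketch of an overly permissive rule. The role name, the namespace, and the choice of the apps API group are assumptions made for illustration only.

```yaml
# Hypothetical role showing what a wildcard rule looks like (not recommended).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-operator-wildcard
  namespace: example-namespace
rules:
  - apiGroups: ["apps"]
    resources: ["*"]   # every resource in the apps group
    verbs: ["*"]       # every operation, including delete and deletecollection
```

The operator would work with this rule, but it would also be allowed to do far more than it probably needs, which is the risk the wildcard carries.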
This project expands on some shortcomings in Kubernetes' basic scheduling constructs, using the notion of an advanced stateful set, an enhanced version of the Kubernetes default stateful set for building stateful services on Kubernetes. If a container is compromised, the attacker can easily gain access to the host node and attack the host and other containers running on the host. "Otherwise, most imperative use is either direct API usage with a controller that doesn't have its own CRD, or just scripting and your CI/CD workflow. I think better paradigms will come along for how to do that than what operators are doing right now," Shepherd said.
In particular, Snyk provides a Kubernetes operator we can install into our cluster. "This is why you have projects like Thanos and others that are trying to push the monitoring piece outside of the cluster and then just scrape the cluster from the outside world." When the operator uses an issuer instead of a cluster issuer to sign a certificate, it avoids requiring the cluster permission. Being diligent in keeping your environment up to date pays off by mitigating the risk of vulnerabilities. It has three Kubernetes custom resource types: certificate, issuer, and cluster issuer. Our namespace in this example is configured to enforce at the intermediate baseline security level, and to issue warnings at the aptly named restricted level. To leverage pod security policies, you have to enable the admission controller in the Kubernetes API server. But Helm is an external add-on, and its capabilities are limited to what we can implement through the existing Kubernetes API. Security context defines the pod's or container's privileges and access control settings. The problem is that it's too permissive and might be dangerous. A namespace-scoped operator watches and manages resources within a namespace. When operators are correctly authored, they eliminate errors of negligence or even malice. "Operators are a high-privilege component: by design, they run persistently inside your cluster and from a security standpoint, that's introducing risk." It gave us the ability to add custom functionality from within a cluster. But custom schedulers are still rare: "I only know of two that aren't just operators," he noted.
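The point above about preferring an issuer over a cluster issuer can be sketched with cert-manager's namespaced Issuer resource. This is a minimal example under stated assumptions: cert-manager is installed and serves the cert-manager.io/v1 API, and all names, the namespace, and the DNS name are hypothetical.

```yaml
# A namespaced Issuer: the operator only needs permissions inside
# example-namespace to create and use it.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-selfsigned
  namespace: example-namespace
spec:
  selfSigned: {}
---
# A Certificate that refers to the namespaced Issuer rather than a ClusterIssuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-cert
  namespace: example-namespace
spec:
  secretName: example-cert-tls
  dnsNames:
    - example.internal
  issuerRef:
    name: example-selfsigned
    kind: Issuer
```

Switching issuerRef.kind to ClusterIssuer would require a cluster-scoped resource, and with it cluster-level permissions for whoever manages that resource.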
The main permissions system in Kubernetes is role-based access control (RBAC) authorization, which must be enabled when starting the cluster. In other words, permissions should be as restrictive as possible. From a security standpoint, you have similar sets of challenges. The purpose of this blog is to share what IBM Cloud Pak foundational services did to ensure the security of their operators. Kubernetes also provides pod security policies that enable the administrator to configure policies to enforce security on every container running on the cluster. It also ensures that the root file system of the container is read only. A role or cluster role includes a list of rules. "There are some applications that are just easier to manage on VMs because they were built in an era where that's how they were designed to be deployed and managed." For more information about the conditions that can be set, see "What is a Pod Security Policy?" in the Kubernetes documentation. "There are a lot of complexities around building an operator and getting it right and making it run fully autonomous in a way that is bulletproof; that's a lot of heavy lifting." You might come across situations where a namespace-scoped operator must create a cluster-scoped Kubernetes resource, for example, a CustomResourceDefinition (CRD). "KUDO bridges the gap between what Helm can do and building the entire dependency tree of an application," Naor explained. Just because you can run something on Kubernetes doesn't mean you should. Another automation option Naor suggests considering is the OpenKruise project. Since OLM has cluster administrator privileges, it can deploy the cluster-scoped resource and no additional permissions have to be given to the operator for its creation. Feature image by Adriano Gadini via Pixabay.
The next step is to define cluster-wide pod security restrictions. As discussed earlier, Kubernetes operators perform the actions and configurations normally assigned to human operators, who can and regularly do make mistakes. Whenever there is a cluster-scoped Kubernetes resource involved, even a namespace-scoped operator requires cluster-level permissions. In that case, make sure that all of the cluster-level permissions are properly documented and consumers of the product are aware of the permissions that you're assigning. "Everybody wants to maximize their investment in Kubernetes but I feel there's a little bit of a sunk cost fallacy that goes on." "You write in a declarative way something that is equivalent to an operator; not as full blown but it gives good coverage for the majority of use cases." "But you're building a piece of software that you're going to have to maintain indefinitely and not just a configuration that you can tweak and replace." "And where you would have run many operators, instead you run just a single one that performs all the heavy lifting." Unfortunately, the new Pod Security Admission (PSA) controller doesn't allow the same fine-grained custom control as a PodSecurityPolicy manifest. Essentially, a pod security policy is a cluster-scoped resource that defines a set of conditions that a pod must run with in order to be accepted into the system. "So they built something that you install once and then declaratively say what to inject and where to inject," Naor said. While the full potential of operators hasn't been reached, it's important not to lose sight of the importance of building in security as early as possible in the creation process.
The project is currently discussing how Helm 4 will deal with the fragility of CRDs as cluster-wide shared global resources without sacrificing the usability of Helm, and there are questions about handling the operator pattern as part of this. This project is the starting point for the Security Profiles Operator (SPO). You might unknowingly provide privileged access to resources that you didn't intend to provide per your requirements. You should also consider how it will be consumed. "But if you look across the board, how many of those systems exist?" With these security context settings, we prevent processes from gaining root privileges and modifying their root file system. These capabilities can all be accomplished by using Kubernetes operators. From there, Snyk finds and reports vulnerabilities as they arise in existing or new workloads on the cluster. IBM Cloud Pak foundational services went through the journey of converting all of their common service Helm charts into optimized operators. A privileged process running as root inside the container is similar to a privileged process running on the host itself. A role can, for example, allow read access to the pod resources in a specified namespace; in such a role, an empty string for apiGroups indicates the core API group. While Helm maintainer Matt Butcher has previously rejected the idea of using Helm to perform operator-like tasks, more operators are stretching the original definition and acting as custom installers in ways that overlap with Helm. "Writing operators is a very advanced use case that should be almost never recommended." Managing resources in early versions of Kubernetes was a straightforward affair: we could define resources with YAML markup and submit these definitions to the cluster.
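The read-only pod role mentioned above might look like the following minimal sketch. The role name pod-reader and the namespace example-namespace are illustrative assumptions.

```yaml
# Hypothetical namespaced role granting read-only access to pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: example-namespace
rules:
  - apiGroups: [""]        # "" indicates the core API group
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
```

Because this is a Role rather than a ClusterRole, the permissions apply only inside the namespace named in its metadata.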
Because writing operators commits you to maintaining the code, you will also have to commit to hiring more developers and a Kubernetes platform team who have skills with operators in the future, to make sure your use of Kubernetes is robust and secure. "Data has gravity and it might take more than 30 seconds to evacuate a node, or it might require something more complicated than stateful sets, which were designed for the etcd pattern where you have name nodes and those names aren't supposed to change, they're just supposed to come back up." Roles define how privileged actors can access resources, and RoleBindings connect human or software component actors to Roles. Most deployments to a cluster are concerned with day-to-day operations of the application, but it's possible to deploy dedicated security management services right into the cluster. "Those pre-cloud native deployment designs needed to be adapted to Kubernetes and so people did that a lot with operators." "If you're writing a stateless application, you don't need to write an operator at all." For example, some operators need to create customized cluster-scoped resources such as cluster role bindings or cluster roles depending on the use case. Similar to the RBAC system, it makes sense to restrict the operator's permissions as much as possible. To put a role to use, we can define a role binding that assigns the role to a specific user account. Operator deployments often come with their own sets of roles and bindings.
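A role binding of the kind described above could be sketched as follows. It assumes that a Role named pod-reader already exists in the namespace example-namespace; that role name, the namespace, and the user name jane are all hypothetical.

```yaml
# Hypothetical binding of an existing Role to a single user account.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: example-namespace
subjects:
  - kind: User
    name: jane                          # assumed user name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader                      # assumed to exist in this namespace
  apiGroup: rbac.authorization.k8s.io
```

For an operator, the subject would more typically be a ServiceAccount in the operator's namespace instead of a User.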
"GitOps is completely different but it's where your time is better spent, because most likely the solution to your problems at the moment is going to be in GitOps, not in writing operators." As Rancher has started to tackle edge computing, which amplifies the problem of multiple clusters, he noted that "we're realizing that we're effectively just reinventing configuration management." "To have an operator says 'I'm going to create something to embody all the complexities of operating an application' and that just doesn't make much sense because honestly, whose application is that complicated?" "The recommended way to monitor the Kubernetes audit log is not from inside the cluster, but rather from an external security operations cluster that connects to the target cluster and performs the monitoring, which means that the cluster is more resilient to a threat actor trying to neutralize your security." Operators built on that. Rancher's experience writing controllers for Kubernetes has shown that there are some standard patterns. "No, it's just going to make it a little bit easier but you're still operating a very complex system." This chain starts with the operator's authors and their repository and continues to the way the operator is delivered to the user's cluster, among many other things. Snyk offers solutions to the potential vulnerabilities we outlined in this post, and to many others, by alerting developers to security issues as they arise during coding. Adding a restrictive security context to a container ensures that the process inside the container doesn't run as root or in privileged mode.
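A container security context along the lines described above can be sketched like this. The pod name, container name, and image reference are placeholders, and the particular combination of settings is an assumption rather than the original article's snippet.

```yaml
# Hypothetical operator pod with a restrictive container security context.
apiVersion: v1
kind: Pod
metadata:
  name: example-operator
spec:
  containers:
    - name: operator
      image: registry.example.com/example-operator:1.0.0  # placeholder image
      securityContext:
        runAsNonRoot: true               # refuse to start if the image runs as root
        privileged: false                # no privileged mode
        allowPrivilegeEscalation: false  # block setuid-style privilege gains
        readOnlyRootFilesystem: true     # root file system cannot be modified
```

If the operator needs scratch space, an emptyDir volume mounted at a specific path keeps the rest of the file system read only.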
"Basically the only time you would really write an operator with custom orchestration logic is for persistent systems that are highly available: not a persistent application, because that's typically just talking to a database, but the actual persistent system like Cassandra that needs to be highly available and has custom logic on how to do quorums and failover and stuff like that," Shepherd explained. For more information about enabling the controller, see "How do I turn on an admission controller?" in the Kubernetes documentation. Before Kubernetes version 1.21, we would do this with the PodSecurityPolicy (PSP) object. "When you look at those use cases, an operator for Cassandra or MySQL or a persistent system really makes sense." A cluster-scoped operator watches and manages resources across multiple or all namespaces within a cluster. "If it's something as generic as backup and restore, probably someone already did that in a generic way that would fit your application." Side clusters (a term he coined in analogy to the sidecar pattern) are particularly useful for security and monitoring services and could also improve the security and privilege concerns with operators. "If they don't, they do have something similar to it." So the logical next step for Kubernetes was the extension of the Kubernetes API using operators. "So when you write operators, you're locked into Kubernetes, whereas a deployment is like a configuration file; you could just throw away the deployment and take your service somewhere else."
"If you're going to build an operator that senses there's a new microservice version available and you need to upgrade stuff, GitOps is a pretty generic way of achieving the same thing: you commit that there's a new version and the GitOps agent will synchronize that to your target environment and everybody's happy." Naor suggested KUDO as a declarative alternative for automating the deployment, installation and lifecycle of complex applications on Kubernetes. While the gains of switching to operators are great, from a security perspective, it means that there's more potential risk if compromised. "Although there are a few use cases where operators are so extremely useful that they're invaluable, they're also a highly abused concept and it's not clear that people are thinking through the security implications before diving into writing operators," Alcide founder and Chief Technology Officer Gadi Naor told us. Instead of using the wildcard character, the best practice is to explicitly list out each API group, verb, and resource in the rule definition. In general, cluster-scoped operators require access to resources across the entire cluster and need cluster-level permissions. It offers additional permission resources: Roles and RoleBindings, which only apply to the namespace in which they are defined; and ClusterRoles and ClusterRoleBindings, which apply to the entire cluster (and should only be used if necessary). "And when they do, it's the vendor or the project that should be writing the operator, not you." If we're deploying a Kubernetes operator to a cluster, we should treat general Kubernetes security conventions as foundational to our operator-specific considerations. This made sharing configurations easy, and allowed us to conveniently deploy charts with one command.
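The best practice of listing each API group, resource, and verb explicitly, as recommended above, might look like the sketch below. The role name, the namespace, and the specific resources and verbs are assumptions; a real operator should list exactly what its controllers use.

```yaml
# Hypothetical operator role with every group, resource, and verb spelled out.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-operator
  namespace: example-namespace
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]                      # core API group
    resources: ["configmaps", "services"]
    verbs: ["get", "list", "watch"]
```

An explicit list also doubles as documentation: reviewers can see at a glance what the operator is allowed to touch, and new permissions show up as visible changes.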
Operators are fundamentally a way to get around the lack of dependency management or resource hierarchies in Kubernetes, Isenberg suggested (echoing much of the Helm discussion about CRDs), and neither of those is a problem that will be solved quickly, so sometimes an operator is the answer. Whenever possible, operator developers should choose namespace-scoped variants. Instead, we select from three newly defined policy levels (privileged, baseline, and restricted) to apply to each namespace.
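Selecting policy levels per namespace, as described above, is done with labels that the built-in Pod Security Admission controller reads. The sketch below enforces the baseline level and warns at the restricted level; the namespace name is a placeholder.

```yaml
# Hypothetical namespace using Pod Security Admission labels.
apiVersion: v1
kind: Namespace
metadata:
  name: example-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline  # reject pods that violate baseline
    pod-security.kubernetes.io/warn: restricted   # warn on pods that violate restricted
```

Warning at a stricter level than the one enforced is a low-risk way to find out which workloads would break before tightening enforcement.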
Like any other piece of software, this operator is meant to help people. Similarly, the usage of host path volumes allows files on the host node to be accessible from the container. Are you thinking this setup sounds like a potential place for vulnerabilities?