How to run data on Kubernetes: 6 starting principles

Sylvain Kalache

Sylvain Kalache is the co-founder of Holberton, an edtech company training digital talent in more than 10 countries. He is an entrepreneur and a software engineer who has been working in the tech industry for over ten years. He was part of the team that led SlideShare’s acquisition by LinkedIn. He has also written for VentureBeat and CIO.

Kubernetes is fast becoming an industry standard, with up to 94% of organizations deploying their services and applications on the container orchestration platform, according to one survey. Standardization is a key reason companies adopt Kubernetes, and advanced users reportedly see their productivity double.

By standardizing on Kubernetes, organizations can deploy any workload anywhere. However, the technology was originally built on the assumption that workloads were temporary, which meant that only stateless workloads could safely be deployed on it. The community has since changed the paradigm by introducing features such as Storage Classes and StatefulSets, which make it possible to run data workloads on Kubernetes.

Although it is possible to run stateful workloads on Kubernetes, it is still difficult. This article explains how to make it happen and what the benefits are.

Do it progressively

Kubernetes will soon be as popular as Linux. It is the de facto method of running any application anywhere in a distributed manner. It also requires you to learn many technical concepts and a lot of vocabulary. For example, newcomers might have difficulty understanding the many Kubernetes logical units, such as containers, pods and nodes.

If you don’t have Kubernetes running in production, don’t jump into data workloads. To avoid losing data if things go wrong, you should start by moving stateless applications.

You don’t have to search for an operator that suits your needs. Most of them are open-source.

Understand the limitations and specificities

Once you are comfortable with the general Kubernetes concepts, you can dive into the details of the stateful ones. Applications may have different storage requirements, so you must ensure that each one is given the right storage system.

What the industry calls storage profiles are called Storage Classes in Kubernetes. They are a way to describe the classes of storage a Kubernetes cluster has access to. Different classes can map to different quality-of-service levels, such as I/O operations per second (IOPS) per GiB, to backup policies, or to arbitrary policies such as binding modes and allowed topologies.
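As a rough illustration of what such a class looks like, the sketch below builds a StorageClass manifest as a Python dictionary and prints it as JSON. The class name, provisioner, parameters and zone names are hypothetical placeholders, not values from any real cluster; the available parameters depend entirely on the storage driver you use.

```python
import json


def storage_class(name, provisioner, parameters,
                  binding_mode="WaitForFirstConsumer",
                  zones=None, allow_expansion=True):
    """Build a StorageClass manifest as a plain dict."""
    manifest = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters,
        # Binding mode: when a volume gets bound to a claim.
        "volumeBindingMode": binding_mode,
        # Needed if claims of this class should be resizable later.
        "allowVolumeExpansion": allow_expansion,
    }
    if zones:
        # Allowed topologies: restrict where volumes may be provisioned.
        manifest["allowedTopologies"] = [{
            "matchLabelExpressions": [{
                "key": "topology.kubernetes.io/zone",
                "values": list(zones),
            }]
        }]
    return manifest


fast = storage_class(
    name="fast-ssd",                 # hypothetical class name
    provisioner="csi.example.com",   # hypothetical storage driver
    parameters={"type": "ssd"},      # hypothetical, driver-specific
    zones=["zone-a", "zone-b"],      # hypothetical zone labels
)
print(json.dumps(fast, indent=2))
```

An application then asks for this class by name in its volume claim, so the same application manifest can run on clusters whose underlying storage differs.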

Another important component to understand is StatefulSet. It is the Kubernetes API object used to manage stateful apps. It offers key features like:

  • Stable, unique network identifiers that let you keep track of volumes and detach and reattach them as you please.
  • Stable, persistent storage so that your data is safe.
  • Ordered, graceful deployment and scaling, which is required for many Day 2 operations.
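To make the features above concrete, here is a minimal sketch of the naming scheme a StatefulSet follows: each pod gets an ordinal, and its DNS name and volume claim are derived from it. The StatefulSet name `db`, the headless service name `db-headless` and the claim template name `data` are hypothetical, and the default namespace and cluster domain are assumed.

```python
def stateful_identities(statefulset, service, replicas, claim_template,
                        namespace="default", cluster_domain="cluster.local"):
    """List the stable pod, DNS and PVC names for each replica."""
    identities = []
    for ordinal in range(replicas):
        # Pods are named <statefulset>-<ordinal>, starting at 0.
        pod = f"{statefulset}-{ordinal}"
        identities.append({
            "pod": pod,
            # Stable DNS name served through the headless service.
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            # One claim per pod: <claim template>-<pod name>.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


identities = stateful_identities("db", "db-headless", 3, "data")
for identity in identities:
    print(identity)

# With the default ordered policy, pods start 0..N-1 and stop in reverse.
scale_down_order = [identity["pod"] for identity in reversed(identities)]
print(scale_down_order)
```

Because a rescheduled pod keeps its name, it is reattached to the claim that carries the same ordinal, which is how the data follows the pod.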

While StatefulSet has been a great replacement for the infamous PetSet, it is still in its infancy and has some limitations. For example, the StatefulSet controller has no built-in support for volume (PVC) resizing, which is a major challenge if your application's dataset is about to outgrow the currently allocated storage capacity. There are workarounds, but such limitations must be understood well ahead of time so that the engineering team knows how to handle them.
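One workaround that is often described is to resize each pod's claim directly rather than going through the StatefulSet. The sketch below only generates the patch documents for those claims; it assumes the StorageClass allows volume expansion and the storage driver supports it, and the names `db` and `data` and the size `20Gi` are hypothetical.

```python
import json


def pvc_resize_patches(statefulset, claim_template, replicas, new_size):
    """Build one merge-patch body per PVC owned by the StatefulSet."""
    patches = {}
    for ordinal in range(replicas):
        # Claims are named <claim template>-<statefulset>-<ordinal>.
        pvc_name = f"{claim_template}-{statefulset}-{ordinal}"
        patches[pvc_name] = {
            "spec": {"resources": {"requests": {"storage": new_size}}}
        }
    return patches


patches = pvc_resize_patches("db", "data", replicas=3, new_size="20Gi")
for pvc_name, patch in patches.items():
    print(pvc_name, json.dumps(patch))
```

Each body could then be applied to its claim with a tool such as `kubectl patch`. Note that the size recorded in the StatefulSet's own claim template stays unchanged, so replicas added later would still get the old size unless the StatefulSet object is recreated.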
