Cassandra is a robust database that provides high availability and self-healing. It’s also an excellent fit for Kubernetes, which provides an orchestration platform to deploy and manage cloud-based distributed applications.
This tutorial will guide you through installing the Cassandra Kubernetes provider on your cluster. It will include creating a Cassandra Pod and StatefulSet in your Kubernetes cluster.
Create a Kubernetes Cluster
Kubernetes is the leading orchestration platform for deploying and managing containerized systems in the cloud. It simplifies the lifecycle of distributed applications and ensures that resources are deployed in the right places, providing elasticity and allowing businesses to scale their infrastructure without worrying about manual deployments or over-provisioning.
Creating a cluster is the first step to starting with the Cassandra Kubernetes provider. You can do this by following one of the available getting started guides or by using an existing cluster from your cloud provider. Once your cluster is set up, you can create the required resources through your Kubernetes dashboard or the command line.
Once you’ve created a cluster, you can deploy a Cassandra StatefulSet from a YAML manifest. The manifest creates a Cassandra ring with three pods. You can verify that the pods have joined the ring by running nodetool inside one of them through kubectl exec, and you can change the number of pods by editing the StatefulSet with kubectl edit.
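The deploy, verify, and resize steps above can be sketched as a small helper that builds the kubectl command lines. This is an illustration under assumptions, not the guide's own tooling: the StatefulSet name cassandra, the label app=cassandra, and the manifest file name are hypothetical and must match your own manifest.

```python
import shlex
from typing import Dict, List

STATEFULSET = "cassandra"        # assumed StatefulSet name
FIRST_POD = STATEFULSET + "-0"   # StatefulSet pods are named <name>-<ordinal>


def kubectl_commands(manifest_path: str, replicas: int) -> Dict[str, List[str]]:
    """Return argv lists for the deploy, verify, edit, and scale steps."""
    return {
        "deploy": ["kubectl", "apply", "-f", manifest_path],
        # Assumes the pods carry the label app=cassandra.
        "list_pods": ["kubectl", "get", "pods", "-l", "app=" + STATEFULSET],
        # nodetool runs inside a Cassandra pod, so it is reached through kubectl exec.
        "ring_status": ["kubectl", "exec", FIRST_POD, "--", "nodetool", "status"],
        "edit": ["kubectl", "edit", "statefulset", STATEFULSET],
        "scale": ["kubectl", "scale", "statefulset", STATEFULSET,
                  "--replicas={}".format(replicas)],
    }


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch needs no cluster.
    for step, argv in kubectl_commands("cassandra-statefulset.yaml", 4).items():
        print("{}: {}".format(step, shlex.join(argv)))
```

Each value is a plain argument list, so it can be passed to subprocess.run once you are connected to a cluster.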
Install the Cassandra Kubernetes Provider
As a database, Cassandra needs persistent storage to ensure data durability. This is challenging on Kubernetes, because containers are not guaranteed a specific disk or node. To solve this problem, StatefulSets were introduced, allowing persistent workloads to be managed by the Kubernetes control plane and automatically scheduled onto persistent storage.
However, the complexity of StatefulSets and other Kubernetes features can make it hard to run and maintain a production-grade database on a Kubernetes cluster. The Cassandra community has coalesced around a solution called the Cassandra operator, installed via Helm (the Kubernetes package manager). It provides a layer of translation between what Kubernetes requires to maintain services and how the application works, making it easier to deploy and manage.
Create a Cassandra Pod
When it comes to databases, Cassandra is all about distributed architectures. In the past, running applications in Kubernetes with a database outside of it led to mismatched architectures and limited developer productivity. Cassandra addresses this by providing a high-performance NoSQL database with the reliability of a mature open-source project.
With an operator in place, deploying Cassandra instances and ensuring data is available to the right users is easy. Whether your Kubernetes cluster is on open-source Kubernetes, Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), or Pivotal Container Service (PKS), it’s just a matter of installing the cass-operator and applying its configuration YAML files.
One of these files configures the Cassandra pod. In it, set listen_address and rpc_address to the pod’s internal IP address, and point seed_provider at the internal IP addresses of your seed nodes. Also, make sure to set the storage_port parameter to a non-default port (the default is 7000). Finally, you’ll need to specify the endpoint_snitch, which is set to SimpleSnitch by default but should be changed to GossipingPropertyFileSnitch for production deployments.
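As a rough sketch of what one of those configuration files describes, the helper below builds a CassandraDatacenter resource of the kind cass-operator manages and prints it as JSON, which kubectl accepts in place of YAML. The names dc1 and cluster1, the server version, the storage class, and the volume size are placeholder assumptions; check the field names against the CRD version installed in your cluster.

```python
import json


def cassandra_datacenter(name: str, cluster: str, size: int, version: str,
                         storage_class: str, storage: str) -> dict:
    """Build a CassandraDatacenter custom resource as a plain dict."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return {
        "apiVersion": "cassandra.datastax.com/v1beta1",
        "kind": "CassandraDatacenter",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster,
            "serverType": "cassandra",
            "serverVersion": version,
            "size": size,  # number of Cassandra nodes in this datacenter
            "storageConfig": {
                "cassandraDataVolumeClaimSpec": {
                    "storageClassName": storage_class,
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                }
            },
        },
    }


if __name__ == "__main__":
    # All argument values here are illustrative placeholders.
    manifest = cassandra_datacenter("dc1", "cluster1", 3, "4.0.1", "standard", "5Gi")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file or piped to kubectl apply -f - once the operator is running.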
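The settings named above can be collected in one place, as in this minimal sketch. It assumes IPv4 or IPv6 literals for all addresses and uses port 7100 purely as an arbitrary example of a non-default storage_port; the function name node_settings is hypothetical.

```python
import ipaddress
import json
from typing import List


def node_settings(internal_ip: str, seeds: List[str], storage_port: int = 7100) -> dict:
    """Return the cassandra.yaml keys discussed above for one node."""
    # ip_address raises ValueError for anything that is not a valid IP literal.
    ipaddress.ip_address(internal_ip)
    for seed in seeds:
        ipaddress.ip_address(seed)
    if not seeds:
        raise ValueError("at least one seed is required")
    if not 1024 <= storage_port <= 65535:
        raise ValueError("storage_port must be an unprivileged port")
    return {
        "listen_address": internal_ip,
        "rpc_address": internal_ip,
        "storage_port": storage_port,
        "endpoint_snitch": "GossipingPropertyFileSnitch",
        "seed_provider": [{
            "class_name": "org.apache.cassandra.locator.SimpleSeedProvider",
            # Cassandra expects the seeds as one comma-delimited string.
            "parameters": [{"seeds": ",".join(seeds)}],
        }],
    }


if __name__ == "__main__":
    # Example addresses only; substitute the internal IPs of your own pods.
    print(json.dumps(node_settings("10.0.0.5", ["10.0.0.5", "10.0.0.6"]), indent=2))
```

Validating the addresses before the file is written catches typos earlier than a failed node startup would.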
Create a Cassandra Service
Cassandra has built-in replication, which ensures that your data is stored in multiple places. This helps protect against data loss and can improve performance when scaling up a database system.
Until recently, deploying a distributed database alongside distributed applications meant managing these two systems separately. It led to a mismatched architecture, limited developer productivity, and duplicative stacks for application monitoring and database infrastructure management.
With the emergence of Kubernetes and its container orchestration capabilities, it’s easy to run stateful applications and databases on the same platform. It eliminates the mismatch between the applications and the database and makes it easier for DevOps teams to manage both.
This guide uses cass-operator, which is open-sourced on GitHub, to deploy a Cassandra headless service. A headless service has no cluster IP of its own. Clients use it to connect to the Cassandra cluster through a single contact point: the DNS name of a pod (cassandra-0 in this example), which resolves to the IP address of that member of the cassandra StatefulSet.
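To make the headless service and the pod DNS name concrete, here is a minimal sketch. The service name cassandra, the default namespace, the cluster.local domain, and the CQL port 9042 are assumptions for illustration; an operator may generate differently named services.

```python
import json


def headless_service(name: str = "cassandra", port: int = 9042) -> dict:
    """Build a headless Service manifest (clusterIP set to the string None)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "clusterIP": "None",          # this is what makes the service headless
            "ports": [{"port": port}],
            "selector": {"app": name},    # assumes pods are labelled app=<name>
        },
    }


def pod_dns_name(statefulset: str, ordinal: int, service: str,
                 namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Return the stable DNS name of one StatefulSet pod behind a headless service."""
    return "{}-{}.{}.{}.svc.{}".format(
        statefulset, ordinal, service, namespace, cluster_domain)


if __name__ == "__main__":
    print(json.dumps(headless_service(), indent=2))
    print(pod_dns_name("cassandra", 0, "cassandra"))
```

A client configured with the name returned by pod_dns_name uses that pod as its contact point and discovers the remaining nodes from there.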
Create a Cassandra StatefulSet
Cassandra is a distributed NoSQL database that handles large amounts of data across multiple nodes. It provides high availability and allows you to scale up or down without downtime. It also has self-healing capabilities that make it ideal for mission-critical applications.
To deploy the Cassandra Helm chart, you must provide information about your cluster. This includes the default node pool size and internode encryption. You can also specify your Cassandra installation’s data center, rack, and version. If you use the Bitnami chart, note that Bitnami updates the chart’s container images only when it detects significant changes in the original image.
When you create a StatefulSet, you need to supply the configuration for each node in the cluster. For example, each node’s listen_address and rpc_address must be set to that node’s own internal IP address.
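One way to give every pod its own address is the Kubernetes downward API, which exposes the pod IP as an environment variable. The sketch below builds a three-replica StatefulSet manifest on that basis. The image tag, the volume size, the name cassandra, and the variable name POD_IP are assumptions, and how the container turns POD_IP into listen_address and rpc_address depends on the image you use.

```python
import json


def cassandra_statefulset(name: str = "cassandra", replicas: int = 3,
                          image: str = "cassandra:4.1", storage: str = "1Gi") -> dict:
    """Build a StatefulSet manifest whose pods learn their own IP at runtime."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [
            {"containerPort": 7000, "name": "intra-node"},
            {"containerPort": 9042, "name": "cql"},
        ],
        # Downward API: each pod receives its own IP address in POD_IP.
        "env": [{
            "name": "POD_IP",
            "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}},
        }],
        "volumeMounts": [{"name": "cassandra-data", "mountPath": "/var/lib/cassandra"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,  # must match the headless service name
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "cassandra-data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(cassandra_statefulset(), indent=2))
```

Because the address comes from the pod itself, the same manifest works for every replica without per-node edits.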
Create a Cassandra Node
Cassandra is a distributed database that uses a ring network topology. The nodes are organized in a peer-to-peer fashion, meaning that each machine (whether physical or virtual) has equal access to the same data. Each node holds part of the data and can accept read and write requests.
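The ring idea can be illustrated with a toy model. This is a simplified sketch, not Cassandra's implementation: it gives each node a single token and derives tokens from MD5, whereas a real cluster typically uses the Murmur3 partitioner and many virtual nodes per machine. The class name TokenRing is hypothetical.

```python
import bisect
import hashlib
from typing import List


class TokenRing:
    """Toy token ring: each node owns the range ending at its token."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("the ring needs at least one node")
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value: str) -> int:
        # Stand-in hash for the sketch; takes the first 8 bytes of the MD5 digest.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def replicas(self, key: str, replication_factor: int) -> List[str]:
        """Return the nodes that hold `key`, walking clockwise from its owner."""
        count = min(replication_factor, len(self._ring))
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, self._token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = TokenRing(["cassandra-0", "cassandra-1", "cassandra-2"])
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.replicas(key, replication_factor=2))
```

Any node can compute the same answer from the same ring state, which is why every node can accept a request and forward it to the replicas that own the key.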
When a new node joins the cluster, it streams data from existing nodes so that it is consistent with the current state of the data. This process is known as consistent bootstrap. By default, the joining node picks current replicas of each token range it will own, which guarantees that the data it receives is consistent with the rest of the cluster.
The YAML file for the Cassandra cluster defines a seed list that nodes use to discover the rest of the cluster. The seed list is a comma-delimited list of the internal IP addresses of the cluster’s seed nodes. For the three-pod ring in this guide, ensure the seed list includes three entries to avoid a failure message at startup.
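A seed list is easy to get subtly wrong, so it can help to check it before deploying. The following is a minimal sketch that assumes every entry is an IP literal; the function name parse_seed_list and the sample addresses are hypothetical.

```python
import ipaddress
from typing import List


def parse_seed_list(raw: str) -> List[str]:
    """Split a comma-delimited seed list and reject malformed entries."""
    seeds = [entry.strip() for entry in raw.split(",")]
    if any(not entry for entry in seeds):
        raise ValueError("seed list contains an empty entry")
    if len(set(seeds)) != len(seeds):
        raise ValueError("seed list contains a duplicate entry")
    for entry in seeds:
        # Raises ValueError if the entry is not a valid IPv4 or IPv6 address.
        ipaddress.ip_address(entry)
    return seeds


if __name__ == "__main__":
    # Example addresses only; use the internal IPs of your own seed nodes.
    print(parse_seed_list("10.0.0.1, 10.0.0.2,10.0.0.3"))
```

If your seed list uses DNS names instead of IP addresses, the ip_address check would need to be replaced with a hostname check.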