This tutorial shows how to install and configure Apache Cassandra on CentOS Stream 9 / RHEL 9. Apache Cassandra is an open-source NoSQL database written in Java that is designed to manage massive amounts of data. It is a lightweight, distributed, non-relational database whose strengths include automatic horizontal scaling, a distributed architecture, and a flexible approach to schema definition. As a NoSQL database it enables rapid, ad-hoc organization and analysis of huge data sets and disparate data types. It is vendor-independent, so it can be installed anywhere and works with any of the major cloud providers.

Each instance of Cassandra is called a node. A node can handle 2-4 TB of data and many thousands of operations per second, depending on the resources allocated to it. Cassandra is a peer-to-peer system with a masterless architecture in which each node provides the same functionality as any other node. The nodes communicate with each other via a protocol called gossip. The design aims to keep a cluster highly available, with 100% uptime even when a node fails, because any node can handle an operation that another node would have handled. It also offers geographic distribution: you can set up data centers in different parts of the world and Apache Cassandra will handle the communication between the nodes. Being distributed means that Cassandra can run on multiple machines while appearing to users as a unified whole.

Cassandra uses partitions to distribute data automatically, which benefits performance. Every table has a partition key that is responsible for distributing data among nodes and for determining data locality. Each node owns a particular set of tokens, and Cassandra distributes data across the cluster based on the ranges of these tokens. When data is inserted into a cluster, a hash function is applied to the partition key, and the resulting token determines which node gets the data. The node that owns the data for that range is called a replica node, and the data can be replicated to multiple replica nodes to ensure reliability and fault tolerance. Cassandra expresses this with the replication factor (RF), which describes how many copies of your data should exist in the database.
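
The token-range lookup described above can be sketched in a few lines of shell. This is only a toy model: the four node names and their tokens are made up, the token space is shrunk to 0-99, and cksum stands in for the Murmur3 hash that Cassandra's default partitioner uses.

```shell
# Toy token ring (hypothetical nodes and tokens), sorted by token.
# Each entry is "token:node"; a node owns the range ending at its token.
RING="10:node-a 35:node-b 60:node-c 85:node-d"

# Stand-in hash: reduce the CRC printed by cksum to a token in 0..99.
# Cassandra's default partitioner uses Murmur3 over a far larger range.
token_for_key() {
    crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo $(( crc % 100 ))
}

# Print the nodes holding token $1: the owner first, then the next nodes
# clockwise on the ring, up to the replication factor given as $2.
replicas_for_token() {
    after=""
    before=""
    for entry in $RING; do
        if [ "$1" -le "${entry%%:*}" ]; then
            after="$after ${entry#*:}"
        else
            before="$before ${entry#*:}"
        fi
    done
    n=0
    # Tokens above the highest entry wrap around to the first node.
    for node in $after $before; do
        [ "$n" -lt "$2" ] || break
        echo "$node"
        n=$(( n + 1 ))
    done
}
```

For example, `replicas_for_token "$(token_for_key user-42)" 3` lists the three nodes that would store the row with partition key user-42 under a replication factor of 3. Walking clockwise from the owner mirrors the simplest replica placement; a real cluster also takes racks and data centers into account.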

Apache Cassandra is an attractive option for enterprises because it provides linear scalability and proven fault tolerance on commodity hardware, making it the perfect platform for mission-critical data. Its features include:

  • Fault Tolerant – Cassandra replicates across multiple data centers, providing lower latency for your users and the assurance that data will not be lost in case of any failure. Failed nodes are replaced with no downtime.
  • Distributed – Cassandra is designed as a distributed system. To benefit from its maximum performance it is recommended for Cassandra to be run on multiple machines. The architecture makes it suitable for applications that are not supposed to lose data even when a node is down.
  • Scalable – Cassandra scales linearly when new machines are added across as many geographical sites as needed. It also streams data between nodes during scaling operations such as adding a new node or data center during peak traffic times providing an elastic architecture, particularly in Cloud.
  • Performant and focused on quality – Cassandra has been tested with over 1000 nodes and outperforms popular NoSQL databases in benchmarks and real-world applications.
  • Security and observability – With the new Audit Logging feature, Cassandra tracks DML, DDL, and DCL activity with minimal impact on normal workload performance.

Apache Cassandra powers mission-critical deployments with improved performance and high scalability. It is run by organizations of all sizes, from startups to the largest enterprises, including Apple, Alby, BestBuy, Bloomberg, Bigmate, Airship, Instagram, Hulu, eBay, Macy’s, The New York Times, Target, Spotify, Walmart, Uber, Yelp, and many more.

At the time of writing, the latest major release of Apache Cassandra is version 4, which brings the following features:

  • Support for Java 11, which can be used to build and run Apache Cassandra 4.0
  • Virtual tables backed by an API
  • Audit Logging, with bounded heap memory and disk space usage to prevent out-of-memory errors
  • Full Query Logging (FQL), a new feature for debugging query traffic and for migrations
  • Improved internode messaging with an optimized internode messaging protocol
  • Improved streaming, which is how the nodes of a cluster exchange data in the form of SSTables

Install Apache Cassandra on CentOS Stream 9 / RHEL 9

The following steps walk you through installing Apache Cassandra on CentOS Stream 9 / RHEL 9.

Install missing dependencies on your system

sudo yum install java python3 python-pip 

Install cqlsh with pip using the following command

sudo pip3 install cqlsh tox

Confirm that Java and cqlsh are installed

$ java -version
openjdk version "11.0.16" 2022-07-19 LTS
OpenJDK Runtime Environment (Red_Hat- (build 11.0.16+8-LTS)
OpenJDK 64-Bit Server VM (Red_Hat- (build 11.0.16+8-LTS, mixed mode, sharing)

$ cqlsh --version
cqlsh 6.0.0
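
If you script this check, note that java -version writes to standard error, so the output has to be redirected before it can be parsed. A minimal sketch (the function name java_major_version is made up for this example) that extracts the major version and handles both the older 1.8 style and the newer 11 style of numbering:

```shell
# Read `java -version` output on stdin and print the major version.
# "1.8.0_345" is reported as 8; "11.0.16" is reported as 11.
java_major_version() {
    awk -F'"' '/ version "/ {
        split($2, parts, ".")
        major = (parts[1] == "1") ? parts[2] : parts[1]
        print major
        exit
    }'
}
```

Usage: `java -version 2>&1 | java_major_version`. With the output shown above it prints 11.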

Create the repo file to add Cassandra’s repository.

sudo vi /etc/yum.repos.d/cassandra.repo

Then add the Apache repository of Cassandra to the file.

name=Apache Cassandra

Once you have added the above lines, press Esc, then type :x to save and close the file. Update the package index.

sudo yum update -y

Note: If you get an error while importing GPG keys, set the system crypto policy to LEGACY for compatibility, then reboot your system.

sudo update-crypto-policies --set LEGACY
sudo reboot

Install Cassandra with the following command

sudo yum install cassandra -y

Verify that Apache Cassandra has been successfully installed using the rpm command below.

$ rpm -qi cassandra
Name        : cassandra
Version     : 4.1~alpha1
Release     : 1
Architecture: noarch
Install Date: Tue 13 Sep 2022 12:37:11 AM EAT
Group       : Development/Libraries
Size        : 61567406
License     : Apache Software License 2.0
Signature   : RSA/SHA256, Fri 20 May 2022 11:40:59 PM EAT, Key ID e91335d77e3e87cb
Source RPM  : cassandra-4.1~alpha1-1.src.rpm
Build Date  : Fri 20 May 2022 11:40:38 PM EAT
Build Host  : 10a3cf41bc4a
URL         :
Summary     : Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store.
Description :
Cassandra is a distributed (peer-to-peer) system for the management and storage of structured data.

Create the Cassandra service

$ sudo vim /etc/systemd/system/cassandra.service
Description=Apache Cassandra

ExecStart=/usr/sbin/cassandra -f -p /var/run/cassandra/


Then reload the daemon.

sudo systemctl daemon-reload

Start the Cassandra service and enable it to start on boot.

sudo systemctl start cassandra
sudo systemctl enable cassandra

Then check the status of the service.

$ sudo systemctl status cassandra
cassandra.service - Apache Cassandra
     Loaded: loaded (/etc/systemd/system/cassandra.service; disabled; vendor pr>
     Active: active (running) since Tue 2022-09-13 00:39:36 EAT; 30s ago
   Main PID: 2826 (java)
      Tasks: 44 (limit: 48809)
     Memory: 2.2G
        CPU: 10.231s
     CGroup: /system.slice/cassandra.service
             └─2826 /usr/bin/java -ea -da:net.openhft... -XX:+UseThreadPrioriti

You can also check the status of Cassandra using nodetool.

$ nodetool status
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  104.38 KiB  16      100.0%            304ddf04-80c4-460a-9076-cb364443c6fb  rack1
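
The two-letter code at the start of each row combines the node's status (U for up, D for down) with its state (N, L, J, or M for normal, leaving, joining, or moving), so UN means up and normal. A small sketch for summarizing a larger cluster; the function name is hypothetical and it only reads text on standard input:

```shell
# Count nodes per status/state code (UN, DN, UJ, ...) from
# `nodetool status` output supplied on stdin.
count_nodes_by_state() {
    awk '/^[UD][NLJM] / { count[$1]++ }
         END { for (code in count) print code, count[code] }' | sort
}
```

Usage: `nodetool status | count_nodes_by_state`. On the single-node setup in this article it prints one line, UN 1.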

To connect to the database, use the following command.

$ cqlsh
Connected to Test Cluster at
[cqlsh 6.0.0 | Cassandra 4.1-alpha1 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.

You can change the cluster name with the following command.

> UPDATE system.local SET cluster_name = 'Technixleo Cluster' WHERE KEY = 'local';


> quit

Edit the YAML configuration file so that it uses the same cluster name.

sudo vi /etc/cassandra/default.conf/cassandra.yaml

Edit the Cluster name

cluster_name: 'Technixleo Cluster'
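
The same edit can be scripted. The sketch below assumes the file already contains a top-level cluster_name: line, as the packaged cassandra.yaml does, and that the new name contains no /, &, or quote characters; the function name set_cluster_name is made up for this example.

```shell
# Replace the cluster_name line in a cassandra.yaml file.
# $1 = path to the file, $2 = new cluster name.
# A backup copy is kept next to the file with a .bak suffix.
set_cluster_name() {
    cp "$1" "$1.bak" || return 1
    sed "s/^cluster_name:.*/cluster_name: '$2'/" "$1.bak" > "$1"
}
```

From a root shell, `set_cluster_name /etc/cassandra/default.conf/cassandra.yaml 'Technixleo Cluster'` applies the change. The name in the file has to match the one stored in system.local, because Cassandra compares the saved and configured names at startup.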

Using separate secondary disk for Cassandra Data

Cassandra is data hungry and can use a lot of space on your main disk, so most users prefer a dedicated disk for Cassandra data.

My secondary disk is /dev/sdb. I will create a partition on the disk using the fdisk utility.

$ sudo fdisk /dev/sdb
Welcome to fdisk (util-linux 2.37.4).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x8ed35d2c.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-104857599, default 2048): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-104857599, default 104857599): 

Created a new partition 1 of type 'Linux' and of size 50 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

The new partition is /dev/sdb1 as shown below.

$ sudo fdisk -l /dev/sdb
Disk model: QEMU HARDDISK   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x8ed35d2c

Device     Boot Start       End   Sectors Size Id Type
/dev/sdb1        2048 104857599 104855552  50G 83 Linux

Create the directory to store Cassandra’s data.

sudo mkdir /var/lib/cassandra

Assign correct permission to the new directory.

sudo chmod 777 /var/lib/cassandra

A new partition has no file system yet, so mounting it fails until it has been formatted. Format the partition with a file system such as ext4, as shown below.

$ sudo mkfs.ext4 /dev/sdb1
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done                            
Creating filesystem with 13106944 4k blocks and 3276800 inodes
Filesystem UUID: 81f42655-7ccd-4698-bc2a-7fdc45dfd106
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (65536 blocks): done
Writing superblocks and filesystem accounting information: done

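Mount the new partition on the directory with the following command. Permissions set on the directory before mounting apply only to the underlying mount point, so repeat the chmod command after mounting if Cassandra cannot write to the directory.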

sudo mount /dev/sdb1 /var/lib/cassandra

Confirm it is mounted

$ df -h /var/lib/cassandra
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        49G   24K   47G   1% /var/lib/cassandra

View the UUID of the new device with the following command

$ sudo file -sL /dev/sdb1
/dev/sdb1: Linux rev 1.0 ext4 filesystem data, UUID=81f42655-7ccd-4698-bc2a-7fdc45dfd106 (needs journal recovery) (extents) (64bit) (large files) (huge files)

Edit /etc/fstab to add the new partition, with its mount point and file system type, so that it is mounted at boot.

UUID=81f42655-7ccd-4698-bc2a-7fdc45dfd106 /var/lib/cassandra ext4 defaults 0 2
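
To avoid copying the UUID by hand, the entry can be generated and appended only if it is not already present. This is a sketch: the helper names are hypothetical, and the file path is a parameter so that you can try it on a scratch file before touching /etc/fstab.

```shell
# Build an fstab entry: $1 = filesystem UUID, $2 = mount point, $3 = type.
fstab_line() {
    printf 'UUID=%s %s %s defaults 0 2\n' "$1" "$2" "$3"
}

# Append a line to a file unless an identical line is already there.
# $1 = file, $2 = line.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

For example, in a root shell: `append_once /etc/fstab "$(fstab_line "$(blkid -s UUID -o value /dev/sdb1)" /var/lib/cassandra ext4)"`. Running `mount -a` afterwards is a quick way to confirm that the file parses and every entry mounts.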

Flush the system keyspace so that pending changes, such as the cluster name update, are written to disk.

nodetool flush system

Then restart to apply changes.

sudo systemctl restart cassandra

Then check if Cassandra is running

nodetool status
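
Cassandra can take a little while to start, so nodetool status may fail or show nothing useful right after the restart. A minimal polling sketch; the function name, the default of 30 attempts, and the 5-second delay are arbitrary choices, and the NODETOOL variable exists only so that the command can be swapped out when testing:

```shell
# Allow the nodetool command to be overridden, e.g. with a stub.
NODETOOL=${NODETOOL:-nodetool}

# Poll `nodetool status` until a node is reported as UN (up, normal).
# $1 = number of attempts (default 30), $2 = seconds between attempts
# (default 5). Returns 0 once a UN row appears, 1 if it never does.
wait_for_node_up() {
    attempts=${1:-30}
    delay=${2:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if $NODETOOL status 2>/dev/null | grep -q '^UN'; then
            return 0
        fi
        i=$(( i + 1 ))
        sleep "$delay"
    done
    return 1
}
```

Usage: `wait_for_node_up && echo "Cassandra is up"`.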

nodetool Utility

nodetool is a command line utility that is used to manage a cluster by exposing operations and attributes available with Cassandra.

To identify the nodetool version, use the following command.

$ nodetool version
ReleaseVersion: 4.1-alpha1

To return information about a specific node, use the following command.

$ nodetool info
ID                     : 304ddf04-80c4-460a-9076-cb364443c6fb
Gossip active          : true
Native Transport active: true
Load                   : 125.35 KiB
Generation No          : 1663019321
Uptime (seconds)       : 194
Heap Memory (MB)       : 114.18 / 1902.00
Off Heap Memory (MB)   : 0.00
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 11, size 984 bytes, capacity 95 MiB, 115 hits, 130 requests, 0.885 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 47 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Percent Repaired       : 100.0%
Token                  : (invoke with -T/--tokens to see all 16 tokens)
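
Individual fields are easy to pull out of this output for monitoring. As an illustration (the function name is hypothetical), the following reads nodetool info output on standard input and prints heap usage as a percentage of the maximum heap:

```shell
# Print used heap as a percentage of the maximum heap, based on the
# "Heap Memory (MB) : used / max" line of `nodetool info`.
heap_used_percent() {
    awk -F':' '/^Heap Memory/ {
        split($2, mb, "/")
        printf "%.1f\n", mb[1] / mb[2] * 100
    }'
}
```

Usage: `nodetool info | heap_used_percent`. For the sample output above (114.18 of 1902.00 MB) it prints 6.0.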

The nodetool describecluster command reports the name of the Cassandra cluster along with its snitch, partitioner, and schema versions.

$ nodetool describecluster
Cluster Information:
	Name: Technixleo Cluster
	Snitch: org.apache.cassandra.locator.SimpleSnitch
	DynamicEndPointSnitch: enabled
	Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
	Schema versions:
		54e17321-3f2e-37ca-9b08-d91ba7bdd369: []

Stats for all nodes:
	Live: 1
	Joining: 0
	Moving: 0
	Leaving: 0
	Unreachable: 0

Data Centers: 
	datacenter1 #Nodes: 1 #Down: 0

Database versions:
	4.1.0-alpha1: []

	system_schema -> Replication class: LocalStrategy {}
	system -> Replication class: LocalStrategy {}
	system_auth -> Replication class: SimpleStrategy {replication_factor=1}
	system_distributed -> Replication class: SimpleStrategy {replication_factor=3}
	system_traces -> Replication class: SimpleStrategy {replication_factor=2}

To identify which node is responsible for handling which range of tokens:

$ nodetool ring
Datacenter: datacenter1
Address         Rack        Status State   Load            Owns                Token                                       
                                                                               7908336325362939036
                rack1       Up     Normal  125.35 KiB      100.00%             -8948403195832713941
                rack1       Up     Normal  125.35 KiB      100.00%             -7329379900135670643
                rack1       Up     Normal  125.35 KiB      100.00%             -6351234447619556568
                rack1       Up     Normal  125.35 KiB      100.00%             -4572600146795969439
                rack1       Up     Normal  125.35 KiB      100.00%             -3476558246561031150
                rack1       Up     Normal  125.35 KiB      100.00%             -2385261135868818890
                rack1       Up     Normal  125.35 KiB      100.00%             -1540884488377657134
                rack1       Up     Normal  125.35 KiB      100.00%             -100616258475073895
                rack1       Up     Normal  125.35 KiB      100.00%             970884357864958917
                rack1       Up     Normal  125.35 KiB      100.00%             1988735762607385932
                rack1       Up     Normal  125.35 KiB      100.00%             2806407695219736727
                rack1       Up     Normal  125.35 KiB      100.00%             3882163576445316266
                rack1       Up     Normal  125.35 KiB      100.00%             4618079269156209907
                rack1       Up     Normal  125.35 KiB      100.00%             5697452076603811565
                rack1       Up     Normal  125.35 KiB      100.00%             6940972082284403537
                rack1       Up     Normal  125.35 KiB      100.00%             7908336325362939036

  Warning: "nodetool ring" is used to output all the tokens of a node.
  To view status related info of a node use "nodetool status" instead.

To query information about a remote node:

nodetool -h <ip-address> -p <jmx-port> info
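
To run the same query against several nodes, the command can be wrapped in a loop. In this sketch the function name and the NODETOOL override are made up for illustration, and 7199 in the usage line is Cassandra's default JMX port. Remote JMX access is typically disabled out of the box, so the target nodes have to be configured to accept remote JMX connections first.

```shell
# Allow the nodetool command to be overridden, e.g. for a dry run.
NODETOOL=${NODETOOL:-nodetool}

# Run `nodetool info` against each host. $1 = JMX port, rest = hosts.
# A host that cannot be queried is reported on stderr and skipped.
info_for_hosts() {
    port=$1
    shift
    for host in "$@"; do
        echo "== $host =="
        $NODETOOL -h "$host" -p "$port" info ||
            echo "could not query $host" >&2
    done
}
```

Usage: `info_for_hosts 7199 <ip-address-1> <ip-address-2>`.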

To remove data that a node is no longer responsible for (for example, after new nodes have joined the cluster), use the following command

nodetool cleanup

Wrap up

Apache Cassandra is designed to scale when an application is under high stress, reducing the risk of losing data or stalling operations. It is billed as an ‘always on’ database that is also deployment agnostic, meaning you can run it on-prem, with a single cloud provider, or across multiple cloud providers. Cassandra also lets you tune consistency: the consistency level sets the minimum number of Cassandra nodes that must acknowledge a read or write operation to the coordinator before the operation is considered successful.
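
As a worked example of tunable consistency: at the QUORUM level a majority of replicas, floor(RF / 2) + 1, must acknowledge the operation, and a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor (R + W > RF). The helper names below are made up for this sketch:

```shell
# Number of replicas that form a quorum for replication factor $1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Succeed when read count $1 plus write count $2 exceeds RF $3,
# i.e. every read overlaps at least one replica that took the write.
is_strongly_consistent() {
    [ $(( $1 + $2 )) -gt "$3" ]
}
```

With a replication factor of 3, `quorum 3` prints 2, and `is_strongly_consistent 2 2 3` succeeds, which is why QUORUM reads combined with QUORUM writes are a common choice. Reading and writing at ONE (`is_strongly_consistent 1 1 3`) fails the check and gives eventual consistency instead.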


  1. Hello Ann Kamau, thank you for giving such a detailed explanation for installing Cassandra in Centos Stream, I followed your step by step but currently stuck at the “cqlsh” command to connect to Cassandra.

    Whenever i tried to run that command I got the following error :
    Traceback (most recent call last):
    File “/usr/bin/”, line 148, in
    from cqlshlib import cql3handling, pylexotron, sslhandling, cqlshhandling, authproviderhandling
    ImportError: cannot import name ‘authproviderhandling’ from ‘cqlshlib’ (/usr/local/lib/python3.9/site-packages/cqlshlib/

    this is after I installed the cqlsh according to step by step that provided above

