
Ceph Concepts

Here's a detailed overview of Ceph storage concepts:

Architecture

  1. Distributed Storage System
    • Ceph is a distributed storage system designed to provide scalable, reliable, and high-performance storage.
    • It eliminates single points of failure, allowing for continuous operation.

Components

  1. Monitors (MON)
    • Maintain cluster state and configuration.
    • Ensure consensus among nodes.
  2. Managers (MGR)
    • Handle cluster management and monitoring.
    • Provide API for external tools.
  3. Object Storage Devices (OSD)
    • Typically, each data drive on a node runs one OSD.
    • Group HDD and SSD OSDs into separate pools for better performance and workload separation.
    • Store data.
    • Responsible for data replication and recovery.
  4. Metadata Servers (MDS)
    • Manage file system metadata.
    • Optimize file system performance.
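
A quick way to inspect these daemons on a running cluster (a minimal sketch; output varies by deployment):

# Overall cluster status: MON quorum, MGR state, OSD and PG summary
ceph -s

# Monitor quorum membership
ceph mon stat

# Active manager and standbys
ceph mgr stat

# OSD layout across hosts and devices
ceph osd tree

# MDS state (only relevant when CephFS is in use)
ceph mds stat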

Data Storage

  1. Object Storage
    • Stores data as objects within pools.
    • Supports S3 and Swift APIs.
    • Pools
      • Logical grouping of OSDs.
      • Define storage policies (replication, erasure coding).
    • Objects
      • Stored data with associated metadata.
      • Support for large objects (multiple GBs).
  2. Block Storage
    • Provides block devices (RBD) for VMs and applications.
    • Supports thin provisioning and snapshots.
  3. File System Storage
    • CephFS provides a POSIX-compliant file system.
    • Supports file and directory operations.
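
A hedged sketch of how the three interfaces are exercised from the command line (the pool name mypool and the image names are placeholders):

# Object storage: write and read an object directly via RADOS
rados -p mypool put example-object /etc/hostname
rados -p mypool get example-object /tmp/example-object

# Block storage: create a 10 GiB RBD image in the pool
rbd create mypool/test-disk --size 10G

# File storage: show CephFS status (requires an MDS and a CephFS file system)
ceph fs status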

Data Replication and Distribution

  1. Replication
    • Ensures data durability by storing multiple copies of each object.
    • Replication Factor
      • Configurable per pool (e.g., 3x).
      • Trades raw storage capacity for durability.
  2. Placement Groups (PG)
    • Distribute data across OSDs.
    • Optimize data placement and recovery.
    • PG Count
      • Configurable PG count.
      • Affects data distribution and performance.
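
As an illustration, replication factor and PG count are per-pool settings (the pool name mypool and the values shown are placeholders):

# Show the current replication size and PG count of a pool
ceph osd pool get mypool size
ceph osd pool get mypool pg_num

# Keep 3 replicas, and require at least 2 to be available for I/O
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2

# Adjust the PG count manually (or let the PG autoscaler manage it)
ceph osd pool set mypool pg_num 128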

Cluster Management

  1. Cluster Configuration
    • Managed through configuration files and CLI.
    • Supports dynamic reconfiguration.
  2. Monitoring and Maintenance
    • Ceph provides tools for monitoring and maintenance.
    • Supports integration with external tools (e.g., Grafana, Prometheus).
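
A few representative commands for day-to-day management (the memory value is only illustrative):

# Health summary and detailed warnings
ceph health
ceph health detail

# Read and change configuration at runtime
ceph config get osd osd_memory_target
ceph config set osd osd_memory_target 4294967296

# Enable the manager's Prometheus exporter for external monitoring
ceph mgr module enable prometheus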

Data Protection and Security

  1. Data Encryption
    • Optional data encryption.
    • Supports SSL/TLS encryption.
  2. Authentication and Authorization
    • Ceph supports authentication and authorization.
    • Integrates with external auth systems (e.g., LDAP).
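
Ceph's native authentication is cephx; a sketch of creating a restricted client key (client.backup and mypool are placeholder names):

# List existing cephx users and their capabilities
ceph auth ls

# Create a client key limited to read/write on a single pool
ceph auth get-or-create client.backup mon 'allow r' osd 'allow rw pool=mypool'

# Export the keyring for use on a client machine
ceph auth get client.backup -o /etc/ceph/ceph.client.backup.keyring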

Performance Optimization

  1. Erasure Coding
    • Efficient data storage and recovery.
    • Supports various erasure coding algorithms.
  2. Cache Tiering
    • Improves performance with caching.
    • Supports SSD and NVMe caching.
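
For erasure coding, a profile defines the data/coding chunk layout; a sketch with illustrative k/m values and pool names:

# Profile with 4 data chunks and 2 coding chunks, spread across hosts
ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host

# Create an erasure-coded pool using that profile
ceph osd pool create ecpool 64 64 erasure ec-4-2

# Allow RBD/CephFS overwrites on the EC pool if needed
ceph osd pool set ecpool allow_ec_overwrites true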

Advanced Topics

  1. BlueStore
    • Ceph's default storage backend for OSDs (successor to FileStore).
    • Optimized for performance and efficiency.
  2. CephFS
    • POSIX-compliant file system.
    • Supports snapshots and quotas.
  3. RBD (RADOS Block Device)
    • Provides block devices for VMs and applications.
    • Supports thin provisioning and snapshots.
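
A short sketch of the RBD snapshot/clone workflow and a CephFS snapshot (pool, image, and mount paths are placeholders):

# Create an image, snapshot it, protect the snapshot, and clone it
rbd create mypool/base-image --size 20G
rbd snap create mypool/base-image@golden
rbd snap protect mypool/base-image@golden
rbd clone mypool/base-image@golden mypool/vm-101-disk-0

# CephFS snapshots (where enabled) are taken by creating a directory inside the special .snap folder
mkdir /mnt/cephfs/mydir/.snap/before-upgrade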

Setting up Ceph Storage for Proxmox Cluster

Requirements

  • A minimum of 3 servers for the Ceph cluster (more nodes allow higher redundancy and capacity)
  • Each server should have:
    • A minimum of 2 disks (1 for the operating system and 1 for Ceph storage)
    • A minimum of 8 GB of RAM
    • A 64-bit CPU
    • The spare disks used for Ceph should be of similar or identical size across nodes to avoid capacity imbalance and wasted space.
  • Proxmox VE 6.x or later installed on each node
  • A separate network for Ceph communication (optional but recommended)

Steps

1: Install Ceph on Each Node

Update each node, then install Ceph from the Ceph section in the Proxmox UI. Do this on all the nodes.

(Screenshot: the Ceph section in the Proxmox UI)
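
The same installation can be done from the shell with pveceph (a sketch; the Ceph release offered depends on your Proxmox version, and the network CIDR is a placeholder):

# On each node: install the Ceph packages (the UI wizard runs the equivalent of this)
pveceph install

# On the first node only: initialize the Ceph configuration and set the cluster network
pveceph init --network 10.10.10.0/24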

2: Add Ceph monitors for each node

On any node, navigate to Ceph -> Monitors and click Create to add monitors. Repeat until there is one monitor for each node.
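
The CLI equivalent, run on each node that should host a monitor:

# Create a monitor on the local node
pveceph mon create

# Verify that all monitors have joined the quorum
ceph quorum_status --format json-pretty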

3: Add additional Ceph managers for each node

One manager is enough for the cluster, but it's good to create additional managers that stay in standby mode.
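
The CLI equivalent, run on each node that should host a manager:

# Create a manager on the local node; additional managers stay in standby
pveceph mgr create

# Check which manager is active and which are standbys
ceph mgr stat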

4: Create OSDs

OSDs are created one per storage disk. Go to each node, open Ceph -> OSD, and create an OSD for each spare disk. OSDs from all nodes will appear together in the list.
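
The CLI equivalent (the device path is a placeholder; the disk must be empty):

# List the disks on the node, then create an OSD on the spare disk
lsblk
pveceph osd create /dev/sdb

# Confirm the new OSDs are up and in
ceph osd tree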

OSDs on LVM partition

Generally, OSD creation in Proxmox only works on full disks, but with the ceph commands below we can set up an OSD on an LVM logical volume.

ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph auth get client.bootstrap-osd > /etc/pve/priv/ceph.client.bootstrap-osd.keyring

# Create a new logical volume from the remaining free space in the pve volume group
# (any LVM command that creates an LV would work)
lvcreate -l 100%FREE -n ceph-lv pve

# You can run lvs, lvscan, or pvs (free space left) to see details of the newly created LV.

# Create (= prepare and activate) the logical volume as an OSD
ceph-volume lvm create --data pve/ceph-lv

More details on LVM here.

You may see some error lines in the output, but if the creation is successful (see the last line), things should work:
ceph-volume lvm create --data pve/ceph-lv
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 3b0e4ffa-35f0-4fd8-8b65-b93f0e44234c
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
--> Executable selinuxenabled not in PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Running command: /usr/bin/chown -h ceph:ceph /dev/pve/ceph-lv
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5
Running command: /usr/bin/ln -s /dev/pve/ceph-lv /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-2/activate.monmap
stderr: got monmap epoch 3
--> Creating keyring file for osd.2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 3b0e4ffa-35f0-4fd8-8b65-b93f0e44234c --setuser ceph --setgroup ceph
stderr: 2024-12-24T18:06:17.882-0800 782c539ba840 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-2//block at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
stderr: 2024-12-24T18:06:17.883-0800 782c539ba840 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-2//block at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
stderr: 2024-12-24T18:06:17.883-0800 782c539ba840 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-2//block at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
stderr: 2024-12-24T18:06:17.883-0800 782c539ba840 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: pve/ceph-lv
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/pve/ceph-lv --path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /usr/bin/ln -snf /dev/pve/ceph-lv /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/systemctl enable ceph-volume@lvm-2-3b0e4ffa-35f0-4fd8-8b65-b93f0e44234c
stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-3b0e4ffa-35f0-4fd8-8b65-b93f0e44234c.service → /lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable --runtime ceph-osd@2
stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service → /lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@2
--> ceph-volume lvm activate successful for osd ID: 2
--> ceph-volume lvm create successful for: pve/ceph-lv
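
After a successful run, it is worth confirming the new OSD is up, in, and weighted as expected:

ceph osd tree
ceph osd df
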
Node without Monitor/Manager

If the node is not running a monitor or manager, you also need to run this command so the Ceph tools can find the cluster configuration:

ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf

5: Create Ceph Pool

Create the pool under Ceph -> Pools. Size is the number of replicas to keep; Min Size is the minimum number of replicas that must be available for I/O to continue.

A 3-node cluster can have a maximum replica count of 3.
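
A possible CLI equivalent (pool name and values are examples; --add_storages also registers the pool as Proxmox storage):

pveceph pool create ceph-pool-store --size 3 --min_size 2 --add_storages 1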

6: Verify the Output

On the main cluster interface, the Ceph dashboard should now show a healthy cluster once all monitors, managers, and OSDs are up.
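
The same information is available from the shell; health should report HEALTH_OK once everything is up:

ceph -s
ceph df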

Ceph Issues and Solutions

rbd error: rbd: listing images failed: (2) No such file or directory (500)

This usually happens when a disk migration to Ceph fails or there is a corrupt disk image in the Ceph pool.

Issue the following commands to list and remove the corrupt disk.

List disks with details - this should fail with the same error

rbd ls -l ceph-pool-store

List disks without details - this should work

rbd -p ceph-pool-store list

Remove the corrupt disk

rbd -p ceph-pool-store rm vm-1001-disk-0

Slow Migration

If the VM is running during migration, it'll be slow. Migrate after turning off the VM.

Remove Ceph from Proxmox

Steps:

  • Migrate all VM disks to another store.
  • Delete the Ceph storage pool.
  • Stop the OSDs, select Out, and destroy them.
  • Stop and destroy the monitors and managers.
  • Run the following commands to purge the remaining files.

systemctl stop ceph-mon.target
systemctl stop ceph-mgr.target
systemctl stop ceph-mds.target
systemctl stop ceph-osd.target
rm -rf /etc/systemd/system/ceph*
killall -9 ceph-mon ceph-mgr ceph-mds
rm -rf /var/lib/ceph/mon/  /var/lib/ceph/mgr/  /var/lib/ceph/mds/
pveceph purge
apt-get purge ceph-mon ceph-osd ceph-mgr ceph-mds -y
apt-get purge ceph-base ceph-mgr-modules-core -y
rm -rf /etc/ceph/* /etc/pve/ceph.conf /etc/pve/priv/ceph.*
apt-get autoremove -y