Oracle Cloud Infrastructure – Enable Multipath for Ultra High Performance (UHP) Storage

The Oracle Cloud Infrastructure Block Volume service leverages NVMe-based storage for consistent performance and offers flexible and elastic performance options. You only need to provision the required capacity, and the performance scales according to the selected performance level, up to the service limits. There are four performance levels available, each with a direct impact on IOPS and throughput.

Four Levels of Performance

Ultra High Performance: Ideal for workloads with the highest I/O demands, offering the best possible performance. This level allows you to purchase between 30 and 120 VPUs per GB/month.

Higher Performance: Suitable for workloads with high I/O requirements that do not need the peak performance of the Ultra High Performance level. This level provides 20 VPUs per GB/month.

Balanced: The default performance level for new and existing block and boot volumes, providing a good balance between performance and cost for most workloads. This level offers 10 VPUs per GB/month.

Lower Cost: Best for throughput-intensive workloads with large sequential I/O, such as streaming, log processing, and data warehouses. This level only includes storage costs with no additional VPU costs. It is available only for block volumes, not for boot volumes.

Volume Performance Units

Block Volume performance is based on the concept of Volume Performance Units (VPUs). The more VPUs you purchase, the more IOPS and throughput you get. You can increase and decrease the VPU level online. For pricing, see the official price list: https://www.oracle.com/cloud/price-list/ or use the Cloud Cost Calculator: https://www.oracle.com/ch-de/cloud/costestimator.html. When you create a new compute instance, the boot volume is set to Balanced by default.

Screenshot: the performance level slider where you can change the VPUs online.
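
Besides the console slider shown above, the performance level can also be changed with the OCI CLI while the volume stays online. This is only a sketch, assuming the CLI is installed and configured and that the placeholder volume OCID is replaced with your own:

--change the performance level of an existing volume online (placeholder OCID)
$ oci bv volume update --volume-id ocid1.volume.oc1..example --vpus-per-gb 20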

Requirements for Ultra High Performance (UHP)

Ultra High Performance block volumes require a minimum level of performance at the compute instance level. For example, it is not possible to run UHP with an instance that has only 1 OCPU and 2 GB of memory. For the complete list of UHP-supported shapes, see: https://docs.oracle.com/en-us/iaas/Content/Block/Concepts/blockvolumeperformance.htm#shapes_block_details. If you change the VPUs for a volume attached to an instance that does not meet the shape requirements, a yellow attention box is shown on the attached block volumes screen. You must also use a device path to get multipath working.

In my case, I use a compute instance with the E4 Flex shape and 16 OCPUs enabled, which is sufficient for Ultra High Performance block volumes. The compute instance runs in a public subnet with an internet gateway configured.

Pro tip: If you attach block volumes with iSCSI, enable the Block Volume Management plugin on the compute instance and let the agent do the attach job for you. The agent configures the multipath setup for you so that you benefit from the high performance.

About Multipath: https://docs.oracle.com/en-us/iaas/Content/Block/Tasks/attachingavolume.htm#multipath
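
Before relying on the agent for the attachment, it is worth checking that the Oracle Cloud Agent is actually running on the instance. A minimal check on Oracle Linux, assuming the default systemd service name oracle-cloud-agent (the plugin itself is enabled in the OCI console):

--check the Oracle Cloud Agent service, "active" means the agent is running
$ sudo systemctl is-active oracle-cloud-agent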

Block Volume Creation

We create a block volume with these characteristics (a CLI sketch follows the list):

  • 512 GB size
  • Auto-tune off
  • UHP (VPU/GB: 100)
  • No backup policy or replication enabled
  • Encryption with an Oracle-managed key
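
If you prefer to script the creation instead of using the console, the same volume can be created with the OCI CLI. This is only a sketch, assuming the CLI is configured; the compartment OCID, availability domain, and display name are placeholders to replace with your own values:

--create a 512 GB UHP block volume with 100 VPUs/GB (placeholders to replace)
$ oci bv volume create --compartment-id ocid1.compartment.oc1..example --availability-domain "XXXX:EU-ZURICH-1-AD-1" --display-name uhp-demo-volume --size-in-gbs 512 --vpus-per-gb 100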

Block Volume Attachment

We attach the newly created block volume via iSCSI. Set a device path and let the agent do the work for you.
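
The attachment can also be done with the OCI CLI instead of the console. A minimal sketch, assuming placeholder OCIDs and that the Block Volume Management plugin performs the iSCSI login on the instance; the device path matches the one used later in this post:

--attach the volume via iSCSI with a consistent device path (placeholders to replace)
$ oci compute volume-attachment attach --type iscsi --instance-id ocid1.instance.oc1..example --volume-id ocid1.volume.oc1..example --device /dev/oracleoci/oraclevdb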

Verify the attached device.

The block volume is attached and the Multipath column is set to Yes. This means the requirements to use UHP block volumes, such as enough OCPUs and access to the internet or a service gateway, are fulfilled.
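
You can also verify the multipath setup directly at the operating system level with the standard device-mapper-multipath tooling; you should see one mpath device with several active paths (output omitted here):

--list the multipath topology, multiple active paths are expected for a UHP volume
$ sudo multipath -ll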

Disk Mount on Operating System Level

The agent has attached the disk and created the multipath configuration for me. Now we can use the device path value for the further configuration. As this is an Oracle Linux 8 compute instance running in OCI, I use the OS user opc, which has sudo permissions for privileged tasks.

--show mapped device
$ sudo  ls -l /dev/oracleoci/oraclevdb
lrwxrwxrwx. 1 root root 18 May 17 13:00 /dev/oracleoci/oraclevdb -> /dev/mapper/mpatha

With fdisk (n/p/1/Enter/Enter/w) we create a new partition, mpatha1, and save the settings.

--create partition
$ sudo fdisk /dev/mapper/mpatha

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x88cf72ab.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-1073741823, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-1073741823, default 1073741823):

Created a new partition 1 of type 'Linux' and of size 512 GiB.

Command (m for help): w
The partition table has been altered.
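
Note: in my case the partition mapping mpatha1 appeared automatically. If it does not show up under /dev/mapper on your system, you may need to refresh the device-mapper partition mappings manually, for example with kpartx:

--refresh partition mappings on the multipath device (only needed if mpatha1 is missing)
$ sudo kpartx -a /dev/mapper/mpatha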

You can show the multipath configuration with the lsblk command:

--show multipaths
$ sudo lsblk
NAME               MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                  8:0    0  512G  0 disk
└─mpatha           252:2    0  512G  0 mpath
  └─mpatha1        252:3    0  512G  0 part
sdb                  8:16   0 46.6G  0 disk
├─sdb1               8:17   0  100M  0 part  /boot/efi
├─sdb2               8:18   0    1G  0 part  /boot
└─sdb3               8:19   0 45.5G  0 part
  ├─ocivolume-root 252:0    0 35.5G  0 lvm   /
  └─ocivolume-oled 252:1    0   10G  0 lvm   /var/oled
sdc                  8:32   0  512G  0 disk
└─mpatha           252:2    0  512G  0 mpath
  └─mpatha1        252:3    0  512G  0 part
sdd                  8:48   0  512G  0 disk
└─mpatha           252:2    0  512G  0 mpath
  └─mpatha1        252:3    0  512G  0 part
sde                  8:64   0  512G  0 disk
└─mpatha           252:2    0  512G  0 mpath
  └─mpatha1        252:3    0  512G  0 part
sdf                  8:80   0  512G  0 disk
└─mpatha           252:2    0  512G  0 mpath
  └─mpatha1        252:3    0  512G  0 part

Now let’s format the partition, create a mount point, and mount the block volume. In the verification, you can see the device mapped to the newly created directory.

--create filesystem
$ sudo mkfs.xfs /dev/mapper/mpatha1
meta-data=/dev/mapper/mpatha1    isize=512    agcount=4, agsize=33554368 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=134217472, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=65535, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.

--create directory
$ sudo mkdir /data

--manual mount
$ sudo mount /dev/mapper/mpatha1  /data

--verification
$ df -m
Filesystem                 1M-blocks  Used Available Use% Mounted on
devtmpfs                       31957     0     31957   0% /dev
tmpfs                          32003     0     32003   0% /dev/shm
tmpfs                          32003     9     31994   1% /run
tmpfs                          32003     0     32003   0% /sys/fs/cgroup
/dev/mapper/ocivolume-root     36307 13966     22341  39% /
/dev/sdb2                       1014   356       659  36% /boot
/dev/mapper/ocivolume-oled     10230   191     10040   2% /var/oled
/dev/sdb1                        100     6        94   6% /boot/efi
tmpfs                           6401     0      6401   0% /run/user/987
tmpfs                           6401     0      6401   0% /run/user/1000
/dev/mapper/mpatha1           524032  3686    520346   1% /data

To make the mount persistent, add this line to /etc/fstab; it maps to the multipath device partition we created. Test your settings by rebooting the compute instance and verifying that the mount comes back correctly.

--edit /etc/fstab
$ sudo vi /etc/fstab
/dev/mapper/mpatha1  /data  xfs  defaults,nofail  0  2
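
As an alternative to the /dev/mapper name, you can mount by filesystem UUID, which stays stable even if device names change. A sketch, where the UUID reported by blkid replaces the placeholder; _netdev is added because the device is iSCSI-backed and should wait for the network:

--look up the filesystem UUID of the new partition
$ sudo blkid /dev/mapper/mpatha1

--alternative /etc/fstab entry using the UUID (replace the placeholder)
UUID=<uuid-from-blkid>  /data  xfs  defaults,_netdev,nofail  0  2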

--reboot instance
$ sudo reboot
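
After the reboot, a quick check confirms that the block volume was mounted automatically:

--verify the mount after the reboot
$ findmnt /data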

FIO UHP Sample

The command line tool fio is a flexible I/O testing tool that benchmarks storage devices by simulating various read and write workloads. It allows you to configure detailed parameters such as I/O engine, block size, and concurrency to assess performance under different conditions. Some basic commands are listed in the reference: https://docs.oracle.com/en-us/iaas/Content/Block/References/samplefiocommandslinux.htm.

Install fio

--install fio and related packages
$ sudo dnf install -y fio

In this example, fio is used to show the differences between the VPU settings. The VPUs are changed online from one level to the other. I canceled each job after a short cycle, just to verify the IOPS.

VPUs/GB = 0

$ sudo fio --filename=/dev/mapper/mpatha1 --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=240 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly
iops-test-job: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.19
Starting 4 processes
Jobs: 4 (f=4): [r(4)][1.2%][r=2129KiB/s][r=532 IOPS][eta 03m:58s]
Jobs: 4 (f=4): [r(4)][1.7%][r=2057KiB/s][r=514 IOPS][eta 03m:57s]
Jobs: 4 (f=4): [r(4)][2.1%][r=2189KiB/s][r=547 IOPS][eta 03m:56s]
...

VPUs/GB = 50

$ sudo fio --filename=/dev/mapper/mpatha1 --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=240 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly
iops-test-job: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.19
Starting 4 processes
Jobs: 4 (f=4): [r(4)][1.2%][r=119MiB/s][r=30.4k IOPS][eta 03m:58s]
Jobs: 4 (f=4): [r(4)][1.7%][r=119MiB/s][r=30.5k IOPS][eta 03m:57s]
Jobs: 4 (f=4): [r(4)][2.1%][r=121MiB/s][r=30.9k IOPS][eta 03m:56s]
...

VPUs/GB = 120

$ sudo fio --filename=/dev/mapper/mpatha1 --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=240 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly
iops-test-job: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.19
Starting 4 processes
Jobs: 4 (f=4): [r(4)][1.2%][r=225MiB/s][r=57.5k IOPS][eta 03m:58s]
Jobs: 4 (f=4): [r(4)][1.7%][r=222MiB/s][r=56.9k IOPS][eta 03m:57s]
Jobs: 4 (f=4): [r(4)][2.1%][r=225MiB/s][r=57.5k IOPS][eta 03m:56s]
...

Summary

Changing the performance characteristics of a boot or block volume is quite simple, as long as you fulfill the requirement for a minimum number of OCPUs. Keep in mind: the more VPUs you order, the higher the storage costs. Use the Oracle Cost Calculator to get a feeling for the price difference between VPU 10 and VPU 120. #ilike