Has anyone used fio to test raid disk speed?

Hi,

I am new to the GitLab forum. We recently needed to upgrade our GitLab instance, and we plan to set up the Git repositories on a RAID 1 device so that it can provide redundancy.

We have one 1.8 SSD on RAID 1 and two 1G spinning disks on RAID 0. The filesystem is ext4.

/dev/md1 is mounted on /data.

I used this page to test the performance: https://docs.gitlab.com/ee/administration/operations/filesystem_benchmarking.html

and ran:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=4k --iodepth=64 --readwrite=randrw --rwmixread=75 --size=4G --filename=/data/testfile

to test whether mixed read/write performance is acceptable. Surprisingly, it took almost 15 minutes to finish the test, far longer than the same command takes on other disks, such as ext4 on RAID 0 or XFS.

The result is like this:

test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][99.9%][r=3063KiB/s,w=944KiB/s][r=765,w=236 IOPS][eta 00m:01s]  
test: (groupid=0, jobs=1): err= 0: pid=1280: Thu Sep 17 01:43:10 2020
  read: IOPS=582, BW=2329KiB/s (2385kB/s)(3070MiB/1349710msec)
   bw (  KiB/s): min=   16, max= 7016, per=100.00%, avg=2330.49, stdev=970.98, samples=2696
   iops        : min=    4, max= 1754, avg=582.59, stdev=242.74, samples=2696
  write: IOPS=194, BW=778KiB/s (797kB/s)(1026MiB/1349710msec); 0 zone resets
   bw (  KiB/s): min=    8, max= 2192, per=100.00%, avg=779.41, stdev=300.29, samples=2694
   iops        : min=    2, max=  548, avg=194.82, stdev=75.08, samples=2694
  cpu          : usr=0.29%, sys=1.82%, ctx=74828, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=2329KiB/s (2385kB/s), 2329KiB/s-2329KiB/s (2385kB/s-2385kB/s), io=3070MiB (3219MB), run=1349710-1349710msec
  WRITE: bw=778KiB/s (797kB/s), 778KiB/s-778KiB/s (797kB/s-797kB/s), io=1026MiB (1076MB), run=1349710-1349710msec

Disk stats (read/write):
    md1: ios=785840/268200, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=392956/270603, aggrmerge=4/290, aggrticks=88357/31336, aggrin_queue=89176, aggrutil=2.42%
    md0: ios=54864/273110, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=27429/135380, aggrmerge=2/1493, aggrticks=3676920/5905344, aggrin_queue=6829956, aggrutil=74.20%
  sdb: ios=27354/134636, merge=4/2034, ticks=4569325/6572514, in_queue=8052728, util=74.20%
  sda: ios=27505/136124, merge=1/953, ticks=2784515/5238175, in_queue=5607184, util=56.50%
  sdc: ios=731048/268097, merge=8/581, ticks=176714/62672, in_queue=178352, util=2.42%

I noticed that the IOPS are very low compared to the numbers the GitLab page suggests.
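The long runtime is actually consistent with those IOPS figures. A back-of-the-envelope sketch, using only the numbers from the output above:

```shell
# Estimate expected runtime from the measured IOPS:
# ~582 read + ~194 write IOPS, 4 KiB blocks, 4 GiB total file size.
total_ios=$(( 4 * 1024 * 1024 * 1024 / 4096 ))  # 1048576 I/Os in 4 GiB of 4 KiB blocks
iops=$(( 582 + 194 ))                           # combined read + write IOPS
echo "$(( total_ios / iops )) seconds"          # ~1351 s, close to run=1349710msec above
```

So the runtime isn't a separate problem; it follows directly from the low IOPS.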

However, if I use dd if=/dev/urandom of=/data/testfile bs=4k count=2000, the result is much faster.

I also tried the simple benchmarking method from that page, and it returned acceptable results.

Can anyone guide me on how to debug this issue?

I was also able to remove one SSD from the RAID array and test it on its own, and it is very quick.

I guess there might be something wrong with the RAID settings.

What are your fstab settings for the mount? Using noatime can help speed things up. Also, your dd command will always be quicker, since it only generates an 8 MB file instead of the 4 GB you are generating with fio.
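For reference, a hypothetical /etc/fstab entry with noatime might look like this (the device, mount point, and filesystem are taken from the post above; the remaining options are only an example):

```
/dev/md1  /data  ext4  defaults,noatime  0 2
```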

Even a dd command writing a full 4 GB would finish quicker, since it's not doing the same kind of test as fio. For me, an equivalent dd command on a 7200 rpm 4 TB disk took approximately 90 seconds, while the fio command above looked like it would take about 30 minutes, on ext4 with noatime. I haven't even attempted it with RAID yet.
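The file-size difference is easy to verify. With bs=4k and count=2000, dd writes only:

```shell
# dd with bs=4k count=2000 writes 4096 bytes x 2000 blocks in total:
echo "$(( 4096 * 2000 )) bytes"   # 8192000 bytes, i.e. roughly 8 MB
```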

According to this link: https://thesanguy.com/2018/01/24/storage-performance-benchmarking-with-fio/

I did this test:

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4k --numjobs=4 --size=4G --runtime=60 --filename=/data/testfile

It's a slightly different command, and I ran it for 60 seconds. You could have put --runtime on your command to let it run for just 60 seconds instead of 15-30 minutes, which mostly just averages out the results. That kind of fio test was far quicker, and yet the results were far higher: IOPS was higher, as were the read and write speeds. Even with randwrite it had better IOPS and read/write speed than the command you gave above.
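For example (an untested sketch), your original mixed read/write command could be time-bounded by adding --runtime, plus --time_based so fio keeps going for the full window rather than stopping when the file is covered:

```
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=4k --iodepth=64 --readwrite=randrw --rwmixread=75 --size=4G --runtime=60 --time_based --filename=/data/testfile
```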


Hi iwalker,

Thank you. I noticed that the issue might be in those two spinning disks. I tested them independently, and it took an hour to finish a 4G file.

I have now given up on the RAID idea and am creating a btrfs filesystem across these three devices with the single data profile. The speed is acceptable.

I guess you still used your original fio command if it took an hour. Using the one I provided would have taken no more than 60 seconds, even with randwrite, since I restricted the test with --runtime=60.

I've done a test just now with a VM as well, with RAID configured. I ran the test using the command I provided, and it also took 60 seconds. So I doubt it's your disks or your RAID configuration, as both of my tests, with and without RAID, produced acceptable results.

Test run on a non-SSD disk using sequential write:

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4k --numjobs=4 --size=4G --runtime=60 --filename=/data/testfile

standard 1TB HDD results:

Run status group 0 (all jobs):
  WRITE: bw=4650KiB/s (4762kB/s), 1162KiB/s-1163KiB/s (1190kB/s-1191kB/s), io=272MiB (286MB), run=60001-60003msec

Disk stats (read/write):
  sdb: ios=0/69634, merge=0/140, ticks=0/230434, in_queue=46496, util=97.03%

VM using standard disks in a ProLiant server with RAID 1:

Run status group 0 (all jobs):
  WRITE: bw=68.7MiB/s (72.1MB/s), 17.2MiB/s-17.2MiB/s (18.0MB/s-18.0MB/s), io=4125MiB (4325MB), run=60001-60001msec

Disk stats (read/write):
    md0: ios=0/1052794, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/1054513, aggrmerge=0/1467, aggrticks=0/148497, aggrin_queue=137504, aggrutil=85.96%
  sdc: ios=0/1054010, merge=0/1977, ticks=0/153394, in_queue=141856, util=85.96%
  sdb: ios=0/1055016, merge=0/957, ticks=0/143601, in_queue=133152, util=82.86%

VM this time with randwrite:

Run status group 0 (all jobs):
  WRITE: bw=2389KiB/s (2446kB/s), 581KiB/s-617KiB/s (595kB/s-632kB/s), io=141MiB (148MB), run=60001-60297msec

Disk stats (read/write):
    md0: ios=0/39078, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/36076, aggrmerge=0/3004, aggrticks=0/225217, aggrin_queue=187662, aggrutil=84.39%
  sdc: ios=0/36076, merge=0/3004, ticks=0/236848, in_queue=197696, util=84.00%
  sdb: ios=0/36076, merge=0/3004, ticks=0/213586, in_queue=177628, util=84.39%

On both occasions the tests took 60 seconds (shown in the results as run=60001msec) because I used the --runtime=60 parameter. The only difference is that randwrite is a lot slower than sequential write (141 MiB of I/O instead of 4125 MiB). So the only reason your test takes so long is the command you are using: specify --runtime=60, or however many seconds you wish to run it for.
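As a quick cross-check (a sketch using only the randwrite figures above), the reported bandwidth is just the total I/O divided by the runtime:

```shell
# 141 MiB written in ~60 s in the randwrite run above:
echo "$(( 141 * 1024 / 60 )) KiB/s"   # ~2406 KiB/s, close to the reported 2389 KiB/s
```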

One other thing to look out for: if you intend to store a database on your btrfs filesystem, you need to disable copy-on-write (CoW) for the database directories, as CoW causes major problems for databases. If not, then there is no need to worry.
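For reference, a sketch of the two usual ways to disable CoW on btrfs (the device and directory names are placeholders; note that chattr +C only takes effect on files created after the attribute is set, so apply it to a new, empty directory):

```
# Per-directory: set the No_COW attribute on a new, empty directory
mkdir /data/pgdata
chattr +C /data/pgdata

# Or filesystem-wide, via a mount option in /etc/fstab:
/dev/sdc  /data  btrfs  defaults,nodatacow  0 0
```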