Recently we have been relying on our VMware server more and more. Lately we have been seeing performance issues, and I’m trying to track down exactly what is causing them.
In this article, we’ll take a look at some disk performance testing I’m doing on several servers with different configurations. We’ll compare the results against the VMware server to see how it stacks up.
Write Testing
First, I’m doing a write test on the various servers I have access to. The command I am using is below.
# sh -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
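One caveat with this command: dd prints its own transfer rate as soon as it finishes writing, before sync runs, so the reported figure does not include the final flush to disk. A minimal sketch that times both steps together follows. The names write_test, target and count are hypothetical, and the one-second resolution of date +%s is only adequate for multi-gigabyte runs like the ones in this article.

```shell
#!/bin/sh
# Sketch: run the write test and time dd plus the final sync together.
# write_test, target and count are hypothetical names, not from the article.
write_test() {
    target=$1    # file to create
    count=$2     # number of 8 KiB blocks to write
    start=$(date +%s)
    # dd prints its own statistics on stderr; sync then flushes cached data
    dd if=/dev/zero of="$target" bs=8k count="$count" && sync
    end=$(date +%s)
    bytes=$((count * 8192))
    elapsed=$((end - start))
    # Guard against division by zero on very short runs
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo "$bytes bytes in ${elapsed}s: $((bytes / elapsed / 1000000)) MB/s"
}

# Usage: write_test bigfile 500000
```

The rate printed on the last line is in decimal megabytes (1 MB = 1,000,000 bytes), the same unit GNU dd uses.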
Dell 2950 (Apollo)
This is a server running Ubuntu 6 Linux. The server has six Ultra-320 SCSI disks running on a Perc-5i RAID controller in a RAID-5 configuration. It has two dual-core 3.0GHz Xeon processors and 6GB of RAM. This server has almost no active load on it.
$ sh -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
500000+0 records in
500000+0 records out
4096000000 bytes (4.1 GB) copied, 27.7942 seconds, 147 MB/s
SuperMicro (Curry)
This is a dual 3.2GHz Xeon (single-core) server with 2GB of RAM. It is running FreeBSD 5.4 and has two Ultra-320 SCSI disks on a 3ware RAID controller in RAID-1. This server carries a moderate active load, which is important to keep in mind when reading its results.
# sh -c "dd if=/dev/zero of=/root/igfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes transferred in 52.198335 secs (39234968 bytes/sec)
SuperMicro (DED-003)
This is a dual Intel Pentium 4 3.00GHz machine with dual 7200RPM SATA drives on a 3ware RAID controller running in RAID-1. This machine is running FreeBSD 6.2.
# sh -c "dd if=/dev/zero of=/root/igfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes transferred in 36.487612 secs (56128639 bytes/sec)
SuperMicro (VM00)
This is the VMware server I’m troubleshooting. It has two dual-core AMD Opteron 2212 (2.0GHz) processors and 8GB of RAM. It is running Ubuntu 6 and has four 7200RPM SATA disks on a 3ware RAID controller in RAID-10. During this test, all virtual machines were shut down, so the server had no active load, which is important to keep in mind.
# sh -c "dd if=/dev/zero of=/root/igfile bs=8k count=500000 && sync"
500000+0 records in
500000+0 records out
4096000000 bytes (4.1 GB) copied, 28.1394 seconds, 146 MB/s
Summary
In summary:
| Apollo: | 147 MB/s |
| Curry: | 39 MB/s |
| DED-003: | 56 MB/s |
| VM00: | 146 MB/s |
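A note on units: the Linux dd prints its rate in decimal megabytes (1 MB = 1,000,000 bytes), while the FreeBSD dd prints raw bytes per second. Dividing the FreeBSD figures by 1,000,000 gives about 39.2 MB/s for Curry and 56.1 MB/s for DED-003; dividing by 1,048,576 instead gives about 37.4 and 53.5. A small sketch of the decimal conversion, where to_mb_per_sec is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: convert a bytes/sec figure to decimal MB/s (1 MB = 1,000,000 bytes),
# the unit GNU dd prints. to_mb_per_sec is a hypothetical helper name.
to_mb_per_sec() {
    awk -v bps="$1" 'BEGIN { printf "%.1f\n", bps / 1000000 }'
}

to_mb_per_sec 39234968   # Curry
to_mb_per_sec 56128639   # DED-003
```

Using one unit across all four servers keeps the comparison fair.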
It’s important to keep in mind that Curry had the largest active load, which likely contributes to its low benchmark numbers. I also did similar read tests using the following command:
# time dd if=bigfile of=/dev/null bs=8k
Although the absolute times were, of course, very different, the read results were roughly proportional to the write results above.
Continued Test
To continue testing, I next used the “iostat” tool to watch for the disk bottleneck while resuming four VMs from a suspended state.
Upon launching the resume process for the four VMs, the read rate shot up to around 15,000 reads per second. The average queue size held steady at around 20 reads, with an average wait time of around 30-35ms. The average utilization of the disks was 100% throughout this time.
Later, once the virtual machines were up, disk utilization ran at around 60%. Clearly, this is where the issue is. With only four servers up, two running Linux and two running Windows Server 2008 (Longhorn), that little headroom (30%) is not enough to let the VMs respond well. Once the VMs had a chance to settle down, the utilization rate and read rate dropped dramatically.
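Queue size, wait time and utilization are the figures reported by iostat's extended mode (iostat -x on Linux, from the sysstat package). As a rough sketch, the filter below prints only the devices whose %util, the last column of that output, is at or above a threshold. The helper name busy_devices and the device-name prefixes it matches are assumptions for illustration, not part of the original test.

```shell
#!/bin/sh
# Sketch: read `iostat -x` output on stdin and print devices whose %util
# (the last column) is at or above a threshold.
# busy_devices and the device-name prefixes below are hypothetical choices.
busy_devices() {
    threshold=${1:-90}
    awk -v th="$threshold" '
        # Keep lines that start with a disk-like device name and end in a number
        $1 ~ /^(sd|hd|dm-|md)/ && $NF ~ /^[0-9.]+$/ && $NF + 0 >= th { print $1, $NF }
    '
}

# Usage (assumes sysstat iostat is installed), sampling every 5 seconds:
#   iostat -x 5 | busy_devices 90
```

Anything that stays in this list for more than a few samples is a likely bottleneck.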
Summary
After reviewing all of this data, I believe the bottleneck is the speed of the drive spindles. With 7200RPM drives, regardless of interface (SATA, SCSI, SAS, IDE, etc.), spindle speed becomes the limiting factor under heavy disk load.