As will become somewhat obvious over the next few posts, I’ve been setting up new database servers, and have crossed a sort of threshold: IO bandwidth now completely outstrips the ability of a single core to do anything interesting or useful with it. I’ve never had a database server that was not IO bound, and to be fair, this one probably isn’t once you factor in all the cores. I’ve also never had an 8 core system before.
This is a dual quad-core server with low-power chips from Intel. The individual cores aren’t speed demons, but for the most part I’m looking at lots of parallel operations, so I’d rather have the parallelism than the individual core speed. The drive system, though, is an LSI MegaRAID with 512MB of battery-backed cache and six 15K SAS drives in a RAID 10 configuration. It’s good for better than 200 MB/sec continuous reads or writes, about 130 MB/sec per-character writes, and 1000 seeks/second. (A 12-drive RAID 10 manages 450 MB/sec reads and 200 MB/sec per-character writes.) Fast. Nice for DB loads. Nice for just about anything that needs speed more than space.
However, the parallelism rears its ugly head. This is the wide finder problem that Tim Bray has been talking about.
Gzip only runs on a single core, and while its throughput varies, it’s in the range of 4MB/second/core. Even if we could use all the cores, we still couldn’t keep up with the drives, but at least we’d be able to zip large backups reasonably fast.
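To make the "use all the cores" idea concrete, here is a minimal sketch in Python, not something I’ve run against real backups. It leans on the fact that concatenated gzip members form a valid gzip stream, so each chunk can be compressed independently and the results glued together. The function name `parallel_gzip`, the chunk size, and the worker count are all assumptions for illustration.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size; bigger chunks compress slightly better,
# smaller chunks spread more evenly across cores.
CHUNK_SIZE = 4 * 1024 * 1024


def parallel_gzip(data: bytes, workers: int = 8, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress `data` as a series of independent gzip members."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return gzip.compress(b"")
    # zlib releases the GIL while compressing, so plain threads can
    # keep several cores busy here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))
    # pool.map preserves input order, so joining restores the original layout.
    return b"".join(members)


if __name__ == "__main__":
    payload = b"a line from a very repetitive backup\n" * 100_000
    packed = parallel_gzip(payload, workers=4, chunk_size=1024 * 1024)
    # Multi-member output decompresses back to the original bytes.
    assert gzip.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

The cost is a slightly worse compression ratio, since each chunk starts with an empty dictionary. A real backup would stream chunks from disk rather than hold everything in memory.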
Ssh: one core. I’m seeing 45 MB/second/core there, though, so with the combined processor power it should be nearly possible to saturate the drive system and the dual gigabit Ethernet. One of the use cases on the server is to blat the whole database from one machine to the other using rsync when reestablishing the secondary for failover. 4x faster would be nice.
Database dumps. Database restores. Processor constrained again. Even on the old machine, a 3-year-old dual Opteron, dumps are processor constrained, at least when writing to a network or reading a TOAST table. This is annoying, since one non-negotiable bit of downtime is the time to dump from the old machine and load on the new one. It turns out that we can speed things up here to some extent by dumping each table on its own, with several in parallel. My tests are showing that for a database that’s dominated by 3 tables, the time to dump and reload becomes dominated by the time to dump and reload the largest table, rather than the entire database. Here that’s about a 25% savings, as there’s one massive table, one simply large, and quite a few under a hundred megs.
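The per-table approach can be sketched roughly like this. It assumes PostgreSQL’s `pg_dump` is on the path. The database name, table names, sizes, and output directory are made up for illustration, and `dump_tables` is a hypothetical helper, not the script I actually used. Starting the largest table first matters, since it sets the floor on total time.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def dump_command(dbname, table, outdir):
    """Build a pg_dump invocation for a single table (custom format)."""
    return [
        "pg_dump",
        "--format=custom",
        f"--table={table}",
        f"--file={outdir}/{table}.dump",
        dbname,
    ]


def dump_tables(dbname, tables_by_size, outdir, workers=4, run=subprocess.check_call):
    """Dump each table in its own pg_dump process, largest first.

    `tables_by_size` maps table name -> approximate size (any unit).
    `run` is injectable so the scheduling can be exercised without a database.
    """
    ordered = sorted(tables_by_size, key=tables_by_size.get, reverse=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker thread just waits on its own pg_dump child process.
        list(pool.map(lambda t: run(dump_command(dbname, t, outdir)), ordered))
    return ordered


if __name__ == "__main__":
    # Hypothetical sizes in MB; print the commands instead of running them.
    sizes = {"events": 40_000, "accounts": 9_000, "lookup": 80}
    dump_tables("mydb", sizes, "/tmp/dumps", workers=3, run=print)
```

One caveat: separate `pg_dump` runs each take their own snapshot, so this is only safe when nothing is writing to the database, which is the case during a planned migration window.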
Recreating the indexes is another place where we redline one core and hardly stress the drives. I think that I can probably manually split the index generation into separate threads, but I’d have to be careful not to split an individual table’s indexes across different files.
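A rough sketch of that split, assuming the intent is to keep all of one table’s `CREATE INDEX` statements in the same file and that the statements are available as plain SQL strings. The regex and the `split_index_statements` helper are illustrative assumptions, and the parser only handles simple, unquoted table names.

```python
import re
from collections import defaultdict

# Matches "CREATE [UNIQUE] INDEX name ON [ONLY] table ..." and captures the table.
INDEX_RE = re.compile(
    r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+\S+\s+ON\s+(?:ONLY\s+)?([^\s(]+)",
    re.IGNORECASE,
)


def split_index_statements(statements, n_files):
    """Distribute index statements over n_files buckets, one table per bucket."""
    by_table = defaultdict(list)
    for stmt in statements:
        match = INDEX_RE.match(stmt.strip())
        if match is None:
            raise ValueError(f"not a CREATE INDEX statement: {stmt!r}")
        by_table[match.group(1)].append(stmt)

    buckets = [[] for _ in range(n_files)]
    # Tables with the most indexes go first, each into the emptiest bucket,
    # so the buckets end up roughly balanced by statement count.
    for table in sorted(by_table, key=lambda t: len(by_table[t]), reverse=True):
        min(buckets, key=len).extend(by_table[table])
    return buckets


if __name__ == "__main__":
    stmts = [
        "CREATE INDEX a1 ON a (x);",
        "CREATE UNIQUE INDEX a2 ON a (y);",
        "CREATE INDEX b1 ON b(x);",
        "CREATE INDEX c1 ON c (x);",
    ]
    for i, bucket in enumerate(split_index_statements(stmts, 2)):
        print(f"-- file {i}")
        print("\n".join(bucket))
```

Each bucket could then be written to its own file and fed to a separate database session. Balancing by statement count is crude; balancing by table size would likely track the real build time better.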
These aren’t ordinary operations for a running database, but they are operations that tend to happen during down time when everyone’s looking at the clock and getting nervous. And it’s only going to get worse on the hardware side, since there’s no indication that anyone’s going to be rolling back to big honkin single cores. And the data’s certainly not getting smaller. So, it’s time for the software to get a little smarter.