PostgreSQL recovery tips

Having run into some unexpected disk issues last night, I had some scary moments trying to find resources on PostgreSQL recovery scenarios relating to disk failure. I chalk this up to most PostgreSQL users being sensible and using RAID1 or similar. We’re doing things on the mother of all shoestring budgets, though, so when disks start spewing things like:

[11180714.763689] ata2.00: exception Emask 0x0 SAct 0x1f SErr 0x0 action 0x0
[11180714.763726] ata2.00: irq_stat 0x40000008
[11180714.763760] ata2.00: cmd 60/08:20:17:6c:a4/00:00:31:00:00/40 tag 4 ncq 4096 in
[11180714.763761]          res 41/40:00:19:6c:a4/ff:00:31:00:00/40 Emask 0x409 (media error) <F>
[11180714.763864] ata2.00: status: { DRDY ERR }
[11180714.763893] ata2.00: error: { UNC }
[11180714.765974] ata2.00: configured for UDMA/133
[11180714.765989] sd 1:0:0:0: [sdb] Unhandled sense code
[11180714.765991] sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[11180714.765995] sd 1:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
[11180714.765999] Descriptor sense data with sense descriptors (in hex):
[11180714.766001]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[11180714.766010]         31 a4 6c 19
[11180714.766014] sd 1:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
[11180714.766019] end_request: I/O error, dev sdb, sector 832859161
[11180714.766062] ata2: EH complete

… you really, really panic.

sdb in this case is one of the large 500GB disks we use for archival, so I knew the damage would be limited to some of the archive tables at worst. With some help from the #postgresql channel on Freenode, I got to work.

First things first: shut down PostgreSQL – service postgresql-8.4 stop in our case.

Next, we copy all the data we can off that disk. It turned out the disk was only throwing errors on one file, so I duplicated the tablespace, minus that file, onto our other 500GB disk. That one file was pg_data/16394/461577 in our tablespace – 16394 being the evemetrics_production database OID – but I didn’t know what the file itself was.
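
The copy itself was nothing fancy. As a rough sketch of the sort of command involved – assuming the dying tablespace lives at /disk1/pg_data, the copy goes to /disk2/disk1/pg_data, and 16394/461577 is the damaged file; this isn’t the exact invocation we used:

# rsync -a --exclude='/16394/461577' /disk1/pg_data/ /disk2/disk1/pg_data/

Running it as root with -a keeps ownership and permissions intact, which PostgreSQL will care about when it comes back up.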

Once I’d moved all the data across to the other disk, I unmounted the old one and got to work on bringing the server back up without the file.

At this point it’s worth noting one thing I’d done before all this cropped up: I’d taken a full backup with pg_dump, which had completed without errors. This led me to think that the file we were looking at was an index or some system catalog.
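
For context, that backup was just an ordinary pg_dump of the database – something along these lines, though the flags and output path here are my own guesses rather than the exact command:

$ pg_dump -U postgres -Fc -f /backups/evemetrics_production.dump evemetrics_production

The useful property is that pg_dump reads every table’s data in full but never reads the index files themselves, so a clean dump strongly suggested the damaged file wasn’t table data.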

Next, I sorted out the tablespace symlink for our sdb tablespace:

# cd /var/lib/postgresql/8.4/main/pg_tblspc/
# ls -lar
lrwxrwxrwx  1 postgres postgres   14 2010-05-14 20:32 461544 -> /disk2/pg_data
lrwxrwxrwx  1 postgres postgres   20 2010-05-14 20:44 461543 -> /disk1/pg_data
# rm 461543
# ln -s /disk2/disk1/pg_data 461543

With our tablespace now pointing at the backup copy, I brought the server back online with service postgresql-8.4 start.

Before you do anything else, you now have to update your tablespace entry on the server or Bad Things happen.

UPDATE pg_tablespace SET spclocation = '/disk2/disk1/pg_data' WHERE spclocation = '/disk1/pg_data';
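
To make sure the catalog really does point at the copy now, a quick look at pg_tablespace does the job (this check is my own addition, not part of the original steps; on 8.4 the path lives in the spclocation column):

SELECT oid, spcname, spclocation FROM pg_tablespace;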

Now we could find out what that file was:

postgres@pandora:/disk2$ /usr/lib/postgresql/8.4/bin/oid2name -d evemetrics_production -f 461577
From database "evemetrics_production":
  Filenode               Table Name
-----------------------------------
    461577  index_on_api_request_id

And there you have it: an index! We got seriously lucky here, and RAID1/10/5 would’ve saved us if we’d had the money for it. All we had to do was issue a REINDEX on that table and we were good to bring everything back up. Moral of the story? Back up often, back up early, and use RAID. Also, reliable disks are _so_ worth the money.
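
For completeness, a rough sketch of that last identify-and-rebuild step as SQL – the pg_class lookup is simply an alternative to oid2name, and the table name in the commented-out line is hypothetical, since only the index name appears in the output above:

-- map the damaged filenode back to a relation name (run inside evemetrics_production)
SELECT relname, relkind FROM pg_class WHERE relfilenode = 461577;

-- rebuild just the damaged index
REINDEX INDEX index_on_api_request_id;

-- or rebuild every index on the owning table (hypothetical name):
-- REINDEX TABLE api_requests;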

3 thoughts on “PostgreSQL recovery tips”

  1. Lucky save there but great job indeed!

    I suppose reliable backups are more essential than RAID for a site like eve metrics. After all, the data you collect is so volatile that losing a day or ten won’t be such a big deal, as it will be repopulated promptly by users. That being said, I use RAID mirroring even on my home PC 🙂 It’s nice to sometimes get a virus that renders 2 TB of HDD space worthless yet perfectly synchronized…

    P.S.: Once the storm is over, I hope you bring the historic CSV dumps back (the last one is from the 12th of July). My personal apps are hungry for some new data. Not trying to put pressure on you or anything, but ISK does not make itself 🙂

  2. CSV dumps we’ll look into – wasn’t aware they’d stopped working!

    Yes and no on the repopulation issue – there’s a lot of data which is transient and which we can’t reconstruct without keeping backups of all the uploads and doing a replay. That’s something I’m actually working on, so that we have the raw data as a last resort.

    In terms of backups, we do backups regularly to another disk on the system, and we copy those backups off-site to a virtual private server in a nearby datacenter run by Linode, who also do their own backups of the VPS. Makurid and I also take copies of the database (with API keys and other sensitive information redacted for security, of course) and use them in development of the site, so we have a relatively up-to-date live copy of the site on our own machines.

    RAID would be lovely to have, and if we had our dream machine it’d look something like:

    • 64GB of ECC RAM
    • 10 × 146GB 10,000RPM disks (alternatively, SSDs) in RAID10 or using btrfs for software RAID, across 8 disks as 2 × 4-disk groups with 2 hot spares

    Or, we’d stick a RAMSAN 620 next to the box, fibre channel into it, and call it a day. Sadly, we don’t have any money. So what we’ve got is our (slightly) smaller server with less shinies, and we get to choose redundancy or performance. We chose performance because we are in a position to do backups regularly, and at the end of the day, it’s cheaper.
