PostgreSQL recovery tips

Having run into some unexpected disk issues last night, I had some scary moments trying to find resources on PostgreSQL recovery scenarios relating to disk failure. I chalk this up to most PostgreSQL users being sensible and using RAID1 or similar. We’re doing things on the mother of all shoestring budgets, though, so when disks start spewing things like:

[11180714.763689] ata2.00: exception Emask 0x0 SAct 0x1f SErr 0x0 action 0x0
[11180714.763726] ata2.00: irq_stat 0x40000008
[11180714.763760] ata2.00: cmd 60/08:20:17:6c:a4/00:00:31:00:00/40 tag 4 ncq 4096 in
[11180714.763761]          res 41/40:00:19:6c:a4/ff:00:31:00:00/40 Emask 0x409 (media error) <F>
[11180714.763864] ata2.00: status: { DRDY ERR }
[11180714.763893] ata2.00: error: { UNC }
[11180714.765974] ata2.00: configured for UDMA/133
[11180714.765989] sd 1:0:0:0: [sdb] Unhandled sense code
[11180714.765991] sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK
[11180714.765995] sd 1:0:0:0: [sdb] Sense Key : Medium Error [current]
[11180714.765999] Descriptor sense data with sense descriptors (in hex):
[11180714.766001]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[11180714.766010]         31 a4 6c 19
[11180714.766014] sd 1:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
[11180714.766019] end_request: I/O error, dev sdb, sector 832859161
[11180714.766062] ata2: EH complete

… you really, really panic.

sdb in this case is one of the large 500GB disks we use for archival, so I knew the damage would be limited to some of the archive tables at worst. With some help from the #postgresql channel on Freenode, I got to work.

First things first: Shut down PostgreSQL. service postgresql-8.4 stop in our case.

Next, we copied all the data we could off that disk. It turned out the disk was only throwing errors on one file, so I duplicated the tablespace, minus that file, onto our other 500GB disk. That one file was pg_data/16394/461577 in our tablespace – 16394 being the evemetrics_production database OID – but I didn’t know what the file itself was.
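The copy itself was just a straight file-level duplication. Something along these lines does the job – this rsync invocation is a sketch rather than a record of the exact command, with the paths matching our layout:

# Copy the tablespace off the failing disk, skipping the one unreadable file.
# Illustrative only – adjust the paths and filenode to your own setup.
rsync -av --exclude='16394/461577' /disk1/pg_data/ /disk2/disk1/pg_data/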

Once I’d moved all the data across to another disk, I unmounted the old one and got to work on bringing the server back up without the file.

At this point it’s worth noting one thing I did before all this cropped up: I’d taken a full backup with pg_dump, which had completed without errors. This led me to think that the file we were looking at was an index or some system catalog.
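For reference, that backup was nothing fancy – a plain pg_dump of the database, along the lines of the following (the flags and output path here are illustrative, not a record of what I actually typed):

# Full dump of the affected database; a clean run means the table data itself was readable,
# which is what pointed the finger at an index rather than the tables.
pg_dump -U postgres -Fc evemetrics_production > /backups/evemetrics_production.dump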

Next, I sorted out the tablespace symlink for our sdb tablespace:

# cd /var/lib/postgresql/8.4/main/pg_tblspc/
# ls -lar
lrwxrwxrwx  1 postgres postgres   14 2010-05-14 20:32 461544 -> /disk2/pg_data
lrwxrwxrwx  1 postgres postgres   20 2010-05-14 20:44 461543 -> /disk1/pg_data
# rm 461543
# ln -s /disk2/disk1/pg_data 461543

With our tablespace now pointing at the backup copy, I brought the server back online with service postgresql-8.4 start.
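While it comes back up, it’s worth watching the logs to make sure nothing beyond the known-missing file is being complained about. On a stock Debian/Ubuntu install like ours the cluster log should live somewhere like this (assuming logging hasn’t been redirected elsewhere):

# Default cluster log location under the Debian/Ubuntu packaging – adjust if you log elsewhere.
tail -f /var/log/postgresql/postgresql-8.4-main.log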

Before you do anything else, you have to update the tablespace’s entry in pg_tablespace so it points at the new location, or Bad Things happen.

UPDATE pg_tablespace SET spclocation = '/disk2/disk1/pg_data' WHERE spclocation = '/disk1/pg_data';
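It’s worth sanity-checking the catalog afterwards; something along these lines will show you where PostgreSQL now thinks each tablespace lives (8.4 keeps the path in spclocation):

# Confirm the tablespace now points at the copied data.
psql -U postgres -c "SELECT spcname, spclocation FROM pg_tablespace;"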

Now we could find out what that file was:

postgres@pandora:/disk2$ /usr/lib/postgresql/8.4/bin/oid2name -d evemetrics_production -f 461577
From database "evemetrics_production":
  Filenode               Table Name
    461577  index_on_api_request_id

And there you have it: an index! We got seriously lucky here; RAID1/10/5 would’ve saved us, if only we’d had the money for it. All we had to do was issue a REINDEX on that table and we were good to bring everything back up. Moral of the story? Back up often, back up early, and use RAID. Also, reliable disks are _so_ worth the money.
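For anyone who ends up in a similar state, the rebuild boils down to a one-liner once you know the damaged filenode is an index – something like the below, run as a superuser (REINDEX TABLE on the owning table works just as well if you’d rather rebuild every index on it):

# Rebuild only the damaged index identified by oid2name.
psql -U postgres -d evemetrics_production -c 'REINDEX INDEX index_on_api_request_id;'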