Don’t fsck ext3 disks…

The storm last night caused numerous power outages, both at work and at home. Unfortunately, the Web server's UPS died and caused some minor disk corruption. No problem: the disks are ext3 with journaling enabled.

But when I got home, instead of letting the system come up on its own and replay the journal, I ran fsck just to see how bad the damage was. Only one directory was lost…the PostNuke PostCalendar cache directory on /var. No problem; that's a temp directory with nothing valuable in it, so I unlinked it and let fsck finish.

On the second pass, fsck found approximately 24,000 lost files. I panicked. We're talking deep-down, gut-wrenching, "I just lost all of my data and I'm totally screwed" fear.

I backed out, mounted /var read-only, and frantically started archiving all 13GB of data to /home to make sure nothing was lost. Once I had verified the archive, I was confused: nothing seemed to be missing. Knowing all of my data was backed up, I reran fsck on /var with the -y (yes to all) flag and let the disk churn for twenty minutes.

Sure enough, 24,000 files, accounting for about a gig of disk space, were lost. All of them PostCalendar cache files.

WTFO?

Yup. Turns out there is no garbage collection for that cache, and PostNuke doesn't clean up after itself. Every single cache file created since the upgrade back in August was still sitting on the disk, and with 20,000 hits per day, those cache files add up.
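Since nothing prunes that cache, one workaround is a periodic job that does it instead. Here's a minimal sketch in Python, assuming a seven-day age limit and a cache path that are both hypothetical (I'm not claiming this is where PostCalendar keeps its files, or how long its entries stay useful). It deletes regular files whose modification time is older than the cutoff and leaves directories alone.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def prune_cache(cache_dir, max_age_days=7, now=None):
    """Delete regular files under cache_dir last modified more than
    max_age_days ago. Directories are left in place.
    Returns the number of files removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = 0
    # Materialize the listing first so we aren't deleting while walking.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    # Demo on a scratch directory rather than a real cache.
    with tempfile.TemporaryDirectory() as d:
        stale = Path(d) / "stale.cache"
        fresh = Path(d) / "fresh.cache"
        stale.write_text("old")
        fresh.write_text("new")
        month_ago = time.time() - 30 * SECONDS_PER_DAY
        os.utime(stale, (month_ago, month_ago))  # backdate one file
        print(prune_cache(d))  # prints 1
```

Run something like this nightly from cron, pointed at the real cache directory (e.g. a hypothetical `prune_cache("/path/to/postcalendar/cache")`), and try it on a copy first. Keying on modification time is the simplest policy; it assumes a stale cache file is safe to delete and will simply be regenerated on the next hit.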

Feh.

So: heart attack avoided, and a gig of disk space regained. I guess the evening wasn't a total loss…