Tuesday, September 29, 2009

Ah, the cursed hardware flake.

So I no longer have access to the vast, vast majority of my data. The only machine with SATA ports has a memory corruption issue. I discovered this while running digikam under strace, trying to identify why kipi-plugins wasn't seeing any plugins, and strace aborted when malloc reported to it that it had detected corruption.

Now, normally, one might figure this to be a software issue; careless memory management can and will eventually lead to corruption. Say hi to 0xFEEEFEEE and 0xBAADF00D. However, my system has been periodically locking up in the middle of the night, and I hadn't been able to track down why; I assumed it was due to some software quirk. (Gentoo does have a reputation for being bleeding-edge, but I've been coming to find that it's merely a sharp edge, without all that much bleeding.) When I finally noticed that whatever caused the flakes wasn't getting logged to syslog, I figured it was likely a hardware issue; mainline kernel releases tend to be free of such bugs, no matter how much you twiddle things around with "make menuconfig". (I didn't know when I'd have time to take things apart piece by piece to identify the culprit, but I'd already started looking up the components I'd used, to see if the reports of failures had changed notably.)

So if strace reports memory corruption under these circumstances, there's a very high chance it's a hardware issue rather than a coding bug. My mainboard has a compatible version of memtest86+ in ROM, and I'm letting that run until it flakes. (The mainboard has all kinds of features in support of overclocking, including automated tweaking of relevant settings, but I've always been running everything at stock levels.) Could be a couple days, could be a week. I'd prefer to see the flake twice, so I get a feel for how long it takes to manifest before I try rerunning the testing with various RAM modules removed.

Meanwhile, all my data is sitting on a 1TB Samsung SATA drive (yes, that very same model I hate, with that very same firmware version I know to be buggy), and I don't have the time to set up alternate access while my desktop/server system is testing its memory.

Add to that that my laptop's screen has gotten incredibly flaky; in addition to the connecting cable flaking enough to require not-infrequent repositioning of the display angle, I now have (or had, I can't find it now) one pixel where both the green and blue subpixels have died. My left ALT key started resisting oddly last week, and finally fell off over the weekend; I can now use the membrane switch without an impeding piece of plastic. (No jokes about membrane keyboards, Das Keyboards or modern Model Ms, please, unless someone knows where I can find a modern laptop with a built-in buckling-spring keyboard. :P )

So that's both my home machines giving me headaches.

Fortunately, my desktop's RAM comes with a lifetime warranty. (Don't they all, these days?) And I've still got about six months left on the extended warranty I bought with my laptop, to which I haven't done any internal hardware or otherwise warranty-breaking work.

The truly ironic-in-the-Alanis-sense thing, though? Shortly after I got back from Atlanta, I moved all of my data from my laptop to my now-unavailable desktop, in anticipation of getting the warranty work done on the laptop. I'd been using FUSE and sshfs to keep it reasonably accessible (an awesome combination that even works fairly well over 802.11g), but now I don't (and won't) have access to any of my data (for a while). No music I haven't copied to my PS3 already, no working on the con photos, no doing squat with anything I already have.

Time to make more data. :P
