Troubleshooting a PC hardware problem

beelzerob

Senior Member
So my main HA PC suddenly has developed the habit of turning itself off. As in, a complete shutting down (like the power cord was pulled).

First guess was the power supply, so I replaced that and let it run for a little bit, and it turned itself off again too.

Second guess was memory. So I booted up with a Windows memory test CD and let that run. Well, at some point during the test, it once again turned itself off. Because of that, of course, I can't look at the results to see if there were any failures.

Is it possible that running the memory test turns off the PC when it hits a bad spot of memory?

I guess I can just sit and watch the thing while it does the test to see if it keeps happening at the same spot, or if any failures are logged before its demise.

Any other suggestions of how to troubleshoot this?

We did have the power suddenly turn off last night, and at least 2 tries where it clicked on and off before I went down and shut the main breaker until they got their act together...maybe that twisted something the wrong way.
 
This happened to me a year or so back, and it turned out my motherboard caps got fried. An inspection of the caps around the CPU showed bulges, which indicated a bad MB.
 
Be sure that Swollen Cap Syndrome [yes, it has a name] is not the problem. I have seen such swollen caps in many brands. It goes back to a company in China that stole the electrolyte formula from a Japanese company but didn't get the stabilizer part of the formula. As time goes on, the electrolyte breaks down and releases gas. You may even see the tops bulging or brown residue seeping from the cap seals.

Double check the fans and ventilation for clogged up items.
If it starts overheating all kinds of things can happen.
 

Bingo. :)

As I started it up again and decided I would just sit through the memory tests, I thought more and more about how it seemed to be shutting off after a particular amount of time, no matter what it was doing... which usually means overheating. So I checked the CPU fan. Sure enough, it weren't moving. I took it off, and it was hard to spin by hand. Replaced it with another fan, and so far it's still up and running.

I think being in the basement kept it alive for quite a while, but still the heat just built up enough to shut it down. Must be a safety limit, because it still works fine now that it has a fan. It didn't fry!
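For anyone chasing a similar "dies after a set amount of time" shutdown, here is a minimal Python sketch of one way to measure it: append a timestamp to a log file every few seconds and force it to disk, so the last line survives a hard power-off. The file name, interval, and function names are all assumptions made up for illustration; nothing here comes from a tool mentioned in this thread.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one timestamp (seconds since epoch) and force it to disk."""
    stamp = time.time() if now is None else now
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{stamp:.0f}\n")
        f.flush()
        os.fsync(f.fileno())  # push the line to disk before power can drop


def uptime_seconds(path):
    """Return the seconds between the first and last heartbeat in the log."""
    with open(path, encoding="utf-8") as f:
        stamps = [float(line) for line in f if line.strip()]
    if len(stamps) < 2:
        return 0.0
    return stamps[-1] - stamps[0]


def run(path="heartbeat.log", interval=5.0):
    """Write a heartbeat every `interval` seconds until power-off or Ctrl+C."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

Use a fresh log file for each boot and call `uptime_seconds` on it after the next crash. If the number comes out roughly the same no matter what the PC was doing, heat is a likely suspect.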

Thanks for the help, guys.
 
beelzerob,
I'm not sure what automation platform you are running, but there's a script for HomeSeer that allows you to monitor your fan speeds and CPU temps on most motherboards. This saved me once when I came home to an announcement: "The CPU fan is below your preset threshold".
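The threshold idea itself is simple enough to sketch in a few lines of Python. This is not the HomeSeer script, just an illustration of the check; the `readings` dictionary, its key names, and the default limits are hypothetical, and how you actually obtain the numbers depends on your motherboard and monitoring tool.

```python
def check_readings(readings, min_fan_rpm=1000, max_cpu_temp_c=70.0):
    """Return a list of warning strings for readings outside the limits.

    `readings` is a hypothetical dict such as
    {"cpu_fan_rpm": 0, "cpu_temp_c": 82.5}. Missing keys are skipped.
    """
    warnings = []

    fan = readings.get("cpu_fan_rpm")
    if fan is not None and fan < min_fan_rpm:
        warnings.append(
            f"The CPU fan is below your preset threshold "
            f"({fan} < {min_fan_rpm} RPM)"
        )

    temp = readings.get("cpu_temp_c")
    if temp is not None and temp > max_cpu_temp_c:
        warnings.append(
            f"The CPU temperature is above your preset threshold "
            f"({temp} > {max_cpu_temp_c} C)"
        )

    return warnings
```

Calling `check_readings({"cpu_fan_rpm": 0, "cpu_temp_c": 50.0})` returns one warning about the fan, which an automation system could then announce or email.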
 
I've got CQC.

Isn't that kind of info motherboard dependent, though? It has to be something either available in the BIOS or not?

Ya, that would have helped. I actually have a serial device that would work wonders for that. It's an mcubed t-balancer. It has digital temp sensors on it and controls fan speeds however you want it to. It was in my media PC for a while, since it works great for keeping the thing quiet in the living room... but since it's all in the basement now, I took the thing out. I'd have to write a CQC driver for it to make it of use though, and right now my list of such tasks is long...
 
http://www.almico.com/speedfan.php
 
Yep, Speedfan is the software. It works with almost all motherboards built in the last few years. There are exceptions, as always.
 
Speedfan is cool. There is a SageTV plugin for it too, so you can check on your Sage server from inside the GUI on the clients. It's kickass.
 
Ooo ya, Speedfan. I remember that. I was big into silent computing for a while, via silentpcreview.com. That was a handy little utility, except that it didn't work well with some motherboards: it would actually turn off your fans, or run them at 100% until you rebooted. I think after that, I lost some interest.

But maybe it works better now, and maybe with the boards I've got. Worth a look, I guess. Overall, I think it was just a bad fan... it had been ticking quite a bit, and I should have listened (I have about a dozen spare fans).
 
Here are a couple of ideas based on things that happened to me:

1) UPS interference. Background: My PC used to have a UPS connected serially. I eventually replaced the UPS and no longer used the serial cable. I did, however, connect to an ELK M1 via a serial connection. So, anyway, I would get random shutdowns and always wonder why. When I disconnected the serial cable, my problems went away. I hypothesized that the M1 was triggering a "power failure, UPS out of power" scenario on the computer. Disabling the UPS feature made that problem go away.

2) Different scenario in windoze: I installed localcooling for power management, and its config got corrupted and set the "shutdown" timeout to 1 minute of inactivity. If you run Windows, check out your power settings and see if they inadvertently got changed.
 