A great 'puzzle' for the IT-type people among us

electron

Administrator
Staff member
I am having a really strange problem here, and I am hoping maybe some of you can help me figure this one out.

I have a bunch of PCs deployed where I work, and recently I got one back (XP Pro) that is locking up at random intervals. Then one of my main Linux servers, which runs on a machine with the exact same specs, started freezing up as well; this box had an uptime of almost 400 days. Then another dead PC was given to me, and sure enough, it had the exact same hardware specs. So I started going through my pile of dead PCs here and found another one with the exact same specs. This is what I know:

1) freezing happens randomly
2) it doesn't matter which OS; it will crash even if I run a Linux distro from CD (HD disconnected)
3) all computers have the same BIOS version etc.; I tried upgrading, but no luck
4) PC Health shows everything is OK (including voltages), but I tried a brand-new Antec power supply anyway
5) all 4 PCs have serial numbers that are extremely close (0974, 0975, 0977, 0978)
6) they are about 3 years old, so no warranty
7) the machines do not crash at all if you let them sit in the BIOS, or at the screen where you select which OS to boot (GRUB, etc.)
8) extensive 18-hour memory tests show the memory is OK; I swapped it anyway as a test, same problem
9) the motherboard in question is the MSI MS-6368 ver 2.1

I have tried everything I can think of, and I cannot get these machines to stop locking up. It doesn't happen until the OS starts loading (with Linux, it sometimes even happens before any hardware detection), and I have tried disabling everything in the BIOS, selecting the slowest performance settings, etc.

What do you guys think? I would hate to write off 4 PCs that worked just fine, and I am afraid there might be a few more out there from this batch.
 

electron

Administrator
Staff member
Checked that out as well and reseated the CPU etc. The system will run forever as long as it stays in the BIOS or at a prompt (even though that does seem to point to the CPU).
 

Rupp

Senior Member
Is any internal hardware the same on all the PCs? If so, could a common driver be affecting all of them?
 

Squintz

Senior Member
Check your peripherals.

Are all the peripherals the same? Is the CD-ROM drive the same in all of them?
Are you using the same keyboard on each one? Keyboards have a processor in them that scans the keys and sends a code to the PC whenever you press Shift, Alt, or just about any other key. Perhaps your keyboard is going bad. A PC will usually not boot without a keyboard.
 

HAL_MAN

Member
Do you have any other machines with the exact same specs in good working order?
If so, start with the motherboard and replace piece by piece until the lockups stop.
That should at least help you pin down the component that is locking up the machine.
Regards,
Shawn
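The piece-by-piece swap suggested above can be sketched as a small loop. This is a minimal Python sketch under stated assumptions: `find_culprit`, `runs_stable`, and the part names are hypothetical; `runs_stable` stands in for whatever soak test you trust (for example, leaving the box booted into an OS for a day); and it assumes only one part is faulty. Earlier swaps stay in place, as in the suggestion, so the part returned is the one whose replacement made the lockups stop.

```python
from typing import Callable, FrozenSet, Optional, Sequence


def find_culprit(
    swap_order: Sequence[str],
    runs_stable: Callable[[FrozenSet[str]], bool],
) -> Optional[str]:
    """Replace parts of the failing PC one at a time with parts from a
    known-good machine, keeping earlier swaps in place, and return the part
    whose replacement made the lockups stop (None if no swap helped).

    runs_stable(swapped) is a hypothetical soak test: it should run the
    machine with the parts in `swapped` taken from the good PC and report
    whether it stayed up. Assumes a single faulty part.
    """
    swapped = set()
    for part in swap_order:
        swapped.add(part)                   # this part now comes from the good PC
        if runs_stable(frozenset(swapped)):
            return part                     # lockups stopped after this swap
    return None                             # fault is elsewhere, or in more than one part


if __name__ == "__main__":
    # Hypothetical simulation: pretend the CPU is the bad part.
    bad_parts = {"cpu"}
    order = ["motherboard", "cpu", "video card", "cd-rom", "keyboard"]
    print(find_culprit(order, lambda swapped: bad_parts <= swapped))  # prints: cpu
```

Memory and the power supply are left out of the example order because both had already been swapped without effect. With two bad parts, the loop only names the last one that had to be replaced.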
 

electron

Administrator
Staff member
I am trying to find a machine that is part of that batch (assuming there are more out there) and still works (other batches usually have different specs), but it's going to be difficult to take another PC out of circulation since we are already short 4.

It is a hardware issue for sure, since it doesn't matter which OS it is, or even whether the OS has finished loading (e.g. it crashes during a format started by XP Setup). All machines are 100% identical.
 

BraveSirRobbin

Moderator
Have you had a power surge/outage lately? It could have "stressed" certain components and they are now failing.

Could be that this "common" hardware was vulnerable to a surge.
 

electron

Administrator
Staff member
The workstations were all protected by APC UPSes, and the server was protected by a giant APC battery/surge protector. What really puzzles me is that it isn't happening to the many other cheap PCs that have been deployed (same manufacturer as well).
 

DavidL

Senior Member
Put a motherboard monitor (speedfan.exe?) with logging on the PCs and exercise them. Sounds heat-related to me: there is only a problem when they are chugging away. Are there any DOS-based scripts out there that you can boot into that can get things warm?
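The load-and-log idea above could look roughly like this on a Linux live CD rather than DOS. It is a hedged Python sketch, not a tested tool. It assumes a kernel that exposes `/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius; an older board like the MS-6368 may only report sensors through lm-sensors, or not at all, in which case `read_temps_c` simply returns an empty dict. The function names and the `temps.log` path are hypothetical. Point `log_path` at storage that survives a reset, since a live CD's filesystem is lost on reboot.

```python
import glob
import multiprocessing as mp
import os
import time

# Assumption: the kernel exposes thermal zones here, values in millidegrees C.
THERMAL_GLOB = "/sys/class/thermal/thermal_zone*/temp"


def parse_millidegrees(text):
    """Convert a sysfs reading in millidegrees Celsius (e.g. "45000") to degrees."""
    return int(text.strip()) / 1000.0


def read_temps_c(pattern=THERMAL_GLOB):
    """Return {zone_name: degrees_C} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        zone = path.split("/")[-2]              # e.g. "thermal_zone0"
        try:
            with open(path) as f:
                temps[zone] = parse_millidegrees(f.read())
        except (OSError, ValueError):
            continue                            # some zones refuse reads; skip them
    return temps


def burn(stop_at):
    """Keep one CPU core busy with integer arithmetic until stop_at (epoch seconds)."""
    x = 1
    while time.time() < stop_at:
        x = (x * 31 + 7) % 1_000_003


def stress_and_log(seconds=1800, interval=10, log_path="temps.log"):
    """Load every core for `seconds` while appending temperatures to log_path."""
    stop_at = time.time() + seconds
    workers = [mp.Process(target=burn, args=(stop_at,)) for _ in range(mp.cpu_count())]
    for w in workers:
        w.start()
    with open(log_path, "a") as log:
        while time.time() < stop_at:
            log.write(f"{time.strftime('%H:%M:%S')} {read_temps_c()}\n")
            log.flush()
            os.fsync(log.fileno())              # so the last reading before a freeze is likely on disk
            time.sleep(interval)
    for w in workers:
        w.join()
```

Call `stress_and_log(seconds=3600)` from under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that start workers by spawning rather than forking. If the box locks up, the last line in the log shows the temperatures just before the freeze.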

I just got a motherboard back from Gigabyte that got harder and harder to boot (POST). Gigabyte said they replaced a "part" with no details. Maybe this weekend I will build another PC to see if they really got it fixed.
 

jb_sgf

Member
I had a PC doing pretty much the same thing: it would run forever at the BIOS screen and lock up when loading the operating system. After swapping parts in and out, I found that the power supply was the guilty party.
 

electron

Administrator
Staff member
I monitored the temps, and there were no problems at all. The Linux command line is pretty much the same thing as the DOS command line (just text), and it even crashed there, so I don't think it is heat-related, especially since all these machines started doing it around the same time, and one was in a 'chilled' data room while the others were in regular cubicles.

I already tried swapping the power supply with a brand-new Antec 400W, and no luck. I even unplugged all other accessories and verified the voltages were OK.

Do keep posting any suggestions you might have, though; I am willing to try anything. I have not located a working machine from this batch yet, so I can't test the mobo/CPU replacement yet.
 

Squintz

Senior Member
Well, since the PC is only locking up and not shutting down, I am guessing it's not a heat problem and it's not a power supply issue. What's left?

Motherboard
Memory
Processor
Keyboard?
Video card?

Do you get any POST warnings?
 

Stinger

Active Member
It appears everything I would have tried was already tried. However, I'd take a look at the incoming voltage and monitor it.

Some problems truly defy logic. For instance, if it isn't OS-specific, yet it sits in the BIOS just fine, then it can't just be hardware-related.

It must happen when the OS initializes certain components in the system. It is indeed a great puzzle.
 