Starting last Tuesday morning, it’s been a week of fun and games with the servers at work. Got in Tuesday morning to the wonderful smell of escaping smoke and melting plastic. Seems the transformer in one of our network switches cooked itself. In doing so it got the case hot enough to melt the feet off the bottom of the unit. So I pulled that one and did a review of all the other gear in the same area. As this involved taking the servers offline anyway, I took the opportunity to install some much-needed memory upgrades.
That was a mistake.
One of the servers, Wowbagger (one half of the mail system, with its companion Eddie), has been having ongoing problems with its IO-APIC and just randomly crashing, maybe once every two weeks. Resetting it would always bring it back up, with only the occasional file system corruption, which was quite easy to fix with ReiserFS. Being chronically under-funded meant we couldn’t justify replacing it when it worked 95% of the time.
Of course, by opening it up and adding the extra RAM I disturbed the voodoo that was keeping it going. So instead of crashing every other week it starts going down every evening: a good quality in a girlfriend, not so great in a server. So Thursday night I go in to fix the server up so that it’s working for business hours Friday. Rebuild the file system, which takes about 4 times as long as usual, probably ’cause it’s now 4am and I just want to go home. So 5am comes around, it’s been up and running stable for an hour, so I decide to go home, get some food and some sleep.
No such luck. 10 mins from home I get a notice that the server has gone down again. Being so close to home, I just go there, grab a shower and some food, and turn around. Not happy. Get to work, get it back up and running. Give it two hours this time; ’tis all good, so back off home and sleep. Thankfully it makes it all the way through Friday. Well, business hours anyway. 10pm it dies again. At this point I’m rather fed up with the server and think Fuck It, it’s Friday night, it can bloody well wait till tomorrow morning.
Saturday, jump on the bike, go for a ride, stop in at the office, reset Wowbagger again and it all comes up good. It even stays that way all through Saturday and well into Sunday.
4pm Sunday, get another SMS: Wowbagger is DOWN. Get into the office to the wonderful smell of escaping smoke and melting plastic. Mmm, I’m sure I just did this.
Turns out Wowbagger decided to cook its CPU.
So Monday I finally got clearance to replace the bloody thing. Nice little Athlon 64 X2 3600+ with 2GB RAM for AU$380. It’s now the most powerful box in the office, obscenely over-specced for what it’s doing, but everyone’s spam… err, mail is working again.
Of course Wowbagger was running Ubuntu 6.06 LTS, which meant the new cheap motherboard was completely unsupported, but Ubuntu in-situ upgrading is great. Two hours and half a dozen commands later, it’s running Ubuntu 7.04 and all the hardware just works.
Now if I can just get some funding to bring the rest of the servers up to a similar spec, and enough downtime to convert them from Fedora Core 3 to Ubuntu 7.04, that would be great.









