Wednesday, April 19, 2006

Ok, so get this:

Most customers have their own IT environments which they pack up and move into our cozy data center. We have one customer a little different from the rest.
Years ago, while the molten crust was still cooling on our business model and sales staff, someone sold this customer a total management solution. In this package, we gave them email, remote desktop services, data storage and Microsoft directory services, all on hardware we leased (and sublet to the customer), configured and maintained. Like I said, years ago.
Flash forward to present day, and the customer is still using the same stuff. Generally, a leased server is replaced every two years. These servers were eventually given to us by the leasing company and have not been upgraded (or patched) since they were configured 5+ years ago. This is positively prehistoric in this industry. The software versions are so old that I've never used them, and the vendor no longer supports them.
When I first started here, I didn't know any of this. The customer (an extremely nice customer, by the way) called to report a few problems and wanted me to log in, look around and present some recommendations.
I did. And I did it completely blind since none of this is documented.
I emailed the customer (in November), copying the current salesperson, and recommended new, warrantied hardware and a complete update of the infrastructure to supported versions.
The customer said that sounded great and for us to please do it, since technically everything belonged to us and he just gets everything as a service.
Awesome, right? Crap, no.
The salesperson (and management three levels up) said we can't upgrade anything. The contract was priced so long ago that we don't make any money off it. In fact, since it is such a constant pain in the ass due to age, we would spend more upgrading it than we make from it.
"But we are selling them a service, and they are paying us," I argued. That apparently doesn't beat out, "We would lose money even though it is our own stupid fault."
Great. So we own an unsupportable environment with the only agreement being that we will support it.
Last night I was at home when I got a call from work. Email had stopped working for this customer due to a drive space error. I logged in from home and looked around. The mail store drive, in this case the G: drive, was completely filled with 34+ gigs of mail. I located a folder with a *.old name and started deleting this obsolete data to free space.
Most of the way through, the delete halted with a "resource unavailable" error. I checked. I no longer had access to the G: drive. Following standard Microsoft troubleshooting techniques, I rebooted.
When the server came back online, there was no G: drive at all.
I manually re-initialized the drive and told the system that it was "G:" again. When I attempted to access it, it was not formatted. The basic drive structure, the map that tells the operating system how the data is arranged, had fallen off. Windows offered to format it for me, erasing all data. Crap crap crap.
I called work, not too upset. After all, we fully manage these servers, so I'd just have the latest backed-up image of the G: drive dropped onto new hardware.
Except that management ordered us to stop backing this stuff up for this client over four years ago. Because it was costing us a few extra dollars a month and putting us that much closer to taking a loss.
To summarize: no mail data, no backup data, no hardware warranty, no software support, and management telling me to work quickly because I was needed on stuff where the company could make money.
Right now I'm attempting a data recovery with a hacked and stolen utility just to restore mail service. Restoring emails from the last five years is a project for later today.
And why do we care? Aside from the basic concern for the plight of our fellow man?
The company with no access to data is responsible for price adjustments in the sale of petroleum products. Generally price adjustments that benefit the consumer. In fact, always price adjustments that benefit the consumer. And they use data gathered over the past five years to establish the proper margins.
Happy driving season, everybody.

6 comments:

Darrell Davis said...

... (blank stare) ...

Garrick said...

Yeah. So.
No hacked data recovery software could even find the drive. I lugged a spare server over to the rack just to get some kind of email online and noticed the cables behind the failed mail server were oddly configured.
While the Exchange environment was configured as a cluster, one of the nodes was dark, powered down and unplugged.
Reconnecting it and rebooting both nodes brought mail back.
Due to some random combination of drive space filling, then freeing, then services failing and restarting, the cluster service failed over to the unavailable node, taking the shared G: drive with it.
All of this is documented somewhere. Oh yeah, in the comment section of this blog. Right there /^\.
Go Team Guesswork!

Joe said...

... (utterly confused stare) ...

Adrian said...

...*blink*...*blink*...

I'm forwarding this blog to Wayne Dolcefino

Darrell Davis said...

OK, was the server working at any time with the cross wiring? Did you involve a sacrifice? Was it living? Also, exactly how much did you recover with the dark drive restore? Was it simply email that was recovered? Did you find the G spot (had to be done), or is it truly gone after 5+ years of use?

Garrick said...

Ok. It was a cluster with a shared drive array. The other node was turned off and unplugged but the "functional" node didn't know that.
The drive vanished, with all its data, but it had actually failed over to the other node.
When that node was brought up, it not only saw the data but had no record of any unpleasant outage.
Complete blind luck.