Tuesday, September 16, 2008

My Proposed Solution Included "Hugging It Out"

Last week included Microsoft's monthly "Patch Tuesday".
Back in the day, security patches would come out whenever they were ready. Depending on how critical these patches were, many geek weekends were totally destroyed.
It was a dark and barbaric time, when geeks could never predict when they would be working and could never plan in advance to attend their regional comic book conventions.
So now there is a regular "Patch Tuesday", which gives us the ability to plan updates at the slight cost of accepting the Windows stranglehold on the I.T. industry.
And the geeky response is a resounding, "K."
But Sunday evening I authorized the latest set of patches everywhere I have access to do so, having been given a new patch deployment toy. I mean, tool.
And the patches flew from the tool with great haste, destroying vulnerabilities in our application infrastructure wherever they landed and bringing joy to each and every server. Almost.
On one server (sadly, a Production server) a vital (but formerly vulnerable) file was removed and yet not replaced by the automated patching tool. It sucked quite a bit, but was easy enough to track down and fix.
However, the entire Test environment crashed horribly after patching. There was much sadness.
My immediate response was (I believe) profound: "We have a Test environment?"
Of course, the patches were blamed.
The error messages did not start until after patching.
I set about trying to restore functionality.
Troubleshooting is a lot like detective work, minus all the fast cars, A&E camera crews, mysterious dames and regular bathroom breaks.
Since I'd never seen this environment working, my progress was slow.
Restoring to an unknown "known good" is just an exercise in futility, so I decided around noon to reset the objective.
I was no longer attempting to repair. This was suddenly a forensics job.
In I.T., there is a sacred ritual which is normally restricted to management participation only. The opportunity for a layperson to join in the sacred "Placing of the Blame" was not to be missed.
Meticulously (and garbed in my traditional DBA skin coat painted with symbols from network diagrams of ancient and arcane token rings), I began to document settings which were not consistent with the Production environment. I also (gleefully) noted settings which varied within the environment itself from busted server to busted server.
Oh, there were many and their diversity at once shocked and delighted me.
Around 2pm, someone on my team forwarded me an email from the system owner to my manager requesting that I be called out and stoned for applying untested patches.
To a Test environment.
I noted the last time these servers participated in the scheduled nightly reboots, which was one full cycle of the moon ago.
During that most recent progression of Sister Moon, the administrative account mistakenly used to configure this Test environment had its administrative rights removed and restored twice.
In fact, no one in the account configuration area can even tell me what the current status of that account is.
The fact that the servers all pull their licensing information using this account was one I typed into my report while actually giggling.
Currently, there is still a check mark in the "broken" column of the spreadsheet tracking this environment, but I was fully successful in dodging blame and directing it to its rightful location.
I also noted in my report that since a Test environment is exactly the place for untested patches, the sundering of functionality just means everything is "working as intended" from a procedural perspective.