Daily rant - Ed's journal
Today, at work, we had a disk fail on one of our storage arrays.

Normally this is irritating, but minor - an engineer is dispatched with a new disk to swap it.

Not today, it seems. No, today our problem management team would not raise an incident for me - there was no user impact, because we have a resilient system, so it was not acceptable to fix it under an incident with a retroactive change. I would therefore have to raise an emergency change.

Well, this first of all pissed me off. I'm not changing anything, I'm just getting a malfunctioning part swapped out. I'm still pretty marginal on whether that should be considered a _change_ at all, because nothing is, in fact, changing.

So anyway. After ... getting a bit wound up by the fact that it was acceptable for us (this was internally) to stonewall replacing some of our customer's hardware that we _knew_ had a fault, I spoke to someone in our customer's team, to find out ... quite why they thought that was a good idea.

It seems they didn't, and they were fine with the concept of 'just swap the damn disk, before another one burns out'.

But anyway, I finally cracked and started filling in our emergency change form - resisting the urge to be massively sarcastic when answering the 'why can your change not be done as part of a planned release process?' (because I'm not psychic), and 'please explain in non technical terms what you're doing' and 'what is your justification for dispensation from the testing process?' and a whole selection of asinine little questions.

My emergency change was rejected, because there wasn't enough time to process it between when I submitted it (about 16:00, admittedly) and when I'd specified for it to start - 09:00 tomorrow.

Now, we have a really rather robust storage system, and it actually is the case that this disk is not really a problem - we've several hot spares, which will function just fine, even after several drives go 'pop'.

But that's not the point. It's not hard, when you have a 4 hour support agreement with a vendor, which costs lots of money, to get this done. It's only when you involve muppets that an incident that should have a quick resolution turns into what can only be described as the IT equivalent of the Benny Hill show.
5 comments
From: cbr_paul Date: September 30th, 2008 08:28 pm (UTC)
I'd have had a field day with the 'please explain in non technical terms what you're doing' part alone!!!
From: sobrique Date: September 30th, 2008 09:42 pm (UTC)
Well, I was thinking that I'd do a patronising explanation of a hard disk, and how it works.

But then I got bored, and was still angry anyway.
(Deleted comment)
From: sobrique Date: September 30th, 2008 09:41 pm (UTC)
Nah, it doesn't quite work that way - it's the customer's change process that we're using, so it's them that set all the 'rules' on it.
From: stgpcm Date: October 1st, 2008 06:56 pm (UTC)
The question is: does the system automatically using a "hot spare" constitute a change? Their disk layout is now different, which *could* alter the performance profile, and they have one less hot spare available to them.
From: crashbarrier Date: October 1st, 2008 07:45 am (UTC)
IT equivalent of the benny hill show.

Cue ubiquitous musical accompaniment?
