So after some sleep and contemplation, here are a few of my takeaways from Sunday's problems (make that opportunities) ...
Being a single parent in the 24/7/365 IT world would suck. Thank you Jesus for a wonderful wife!
The ability to remote control almost everything on our network ... priceless!!
A reminder that UPS units can fail ... this is the 2nd old one I've seen fail and power off equipment tied to it!
Check the F1 support site sooner ... I would have saved 30mins troubleshooting if I'd checked sooner.
You can get right through to an SBC tech at 2am ... no wait time!
We need some remote controlled power strips ... we had these at my prior job. Give the power strip an outside IP and you can check the status of each item plugged into it via a web interface, and power cycle individual outlets should a PC/server be powered off or locked up hard. A good one also has a modem for backup access. Some popular ones are by Western Telematic, Dataprobe, and Micro Energetics Corp. Of course if the UPS powering your remote power strips is down ... well, you're up the creek :-)
Living just 3 miles from where you work ... priceless!!
For us, I believe this is the first time in over a year that F1 has been down on a Sunday. That's pretty impressive...of course we all want 100% uptime, but that's impossible. Kudos to the F1 support team for the quick turnaround.
Despite your best efforts ... problems will occur. Always have a plan B ... and make sure it's tested before you need it. Shelley made a comment that she was, in a way, glad that they had to actually put their check-in plan B to use. I'm sure it helped them uncover inefficiencies that they'll use to revise plan B. Some systems may require a plan C also.
Offsite monitoring tools rock! Knowing there's a problem before your end users start calling you is golden. Our Postini anti-spam/virus solution has SMTP monitoring built-in. If you don't have something like this in place check out DynDNS business tools. Their Network Monitoring service is only $99/yr and includes 12 different checks you can run. They even now have an Email Backup service that will spool your email should your onsite email server be unavailable.
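For anyone curious what an SMTP check boils down to, here's a rough sketch in Python. This is not how Postini or DynDNS actually do it; it just connects to the mail server and looks for the standard "220" ready greeting. The function names are my own, and it only tells you anything useful if it runs from a box outside your network.

```python
import socket


def banner_is_healthy(banner: str) -> bool:
    """An SMTP server that is ready for mail greets clients with a 220 reply."""
    return banner.startswith("220")


def check_smtp(host: str, port: int = 25, timeout: float = 10.0) -> bool:
    """Return True if the server accepts a connection and sends a 220 greeting.

    Any connection error or timeout counts as "down".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(1024).decode("ascii", errors="replace")
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
    return banner_is_healthy(banner)
```

Run something like `check_smtp("mail.example.org")` (a placeholder hostname) from a scheduled job every few minutes and have it page you when it returns False.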
The fact that it was 50 degrees at 2am when there should be snow on the ground ... priceless!
After checking, our front firewall is no longer set up to auto fail over to our backup ISP ... doh! I'm betting this got broken at some point during our VLAN implementation several months ago. The challenge is that our backup ISP comes in from the auditorium roof (wifi broadband), about as far away from our firewall as you can get in our facility. Getting the connection to the firewall is a pain.
Wanna know what my plan B was if our SBC connection was dead? Given my lack of hands-on knowledge of VLAN'ing, I thought of a quick-n-dirty solution. Slap our spare/ghosting router between the backup ISP and our LAN. Turn off DHCP on this router (it's on a different subnet) and give the check-in PCs static addresses on that router's subnet, with the router as their gateway. I'm guessing I could have the router config'd and the PCs remapped inside 30mins.
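To shave a few minutes off that 30, the static address sheet could be worked out ahead of time. Here's a small sketch of that; the subnet, router address, and PC names are all made up for illustration, so swap in whatever the spare router actually uses.

```python
import ipaddress


def plan_static_addresses(subnet: str, router_ip: str, pc_names: list) -> dict:
    """Hand out static addresses on the spare router's subnet, skipping the router itself."""
    network = ipaddress.ip_network(subnet)
    router = ipaddress.ip_address(router_ip)
    if router not in network:
        raise ValueError(f"{router_ip} is not inside {subnet}")

    # Usable host addresses, minus the one the router already holds.
    free_hosts = (host for host in network.hosts() if host != router)

    plan = {}
    for name in pc_names:
        try:
            address = next(free_hosts)
        except StopIteration:
            raise ValueError("subnet is too small for that many PCs") from None
        plan[name] = {
            "address": str(address),
            "netmask": str(network.netmask),
            "gateway": str(router),  # the spare router is the way out
        }
    return plan


# Hypothetical values: a 192.168.50.0/24 subnet with the router at .1
for pc, settings in plan_static_addresses(
    "192.168.50.0/24", "192.168.50.1", ["checkin-1", "checkin-2"]
).items():
    print(pc, settings)
```

Print the result, tape it inside the rack, and plan B is mostly typing.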
Obviously, we (aka Ed) now have a new task to get the auto fail-over working again :-)
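For the record, the logic behind auto fail-over is pretty simple, whatever box ends up doing it. Here's a sketch of the decision rule only. It is not our firewall's actual config, and the threshold of three failed checks is an assumption I picked so that one dropped ping doesn't flip us to the backup.

```python
def next_link(active: str, failures: int, primary_ok: bool, threshold: int = 3):
    """Decide which ISP link to use after one health check of the primary.

    Returns (link, consecutive_failures). We switch to "backup" only after
    `threshold` failed checks in a row, and fail back as soon as the
    primary answers again.
    """
    if primary_ok:
        return "primary", 0
    failures += 1
    if failures >= threshold:
        return "backup", failures
    return active, failures


# Replay a made-up series of health checks (True = primary link answered).
link, failures = "primary", 0
for check in [True, False, False, False, True]:
    link, failures = next_link(link, failures, check)
    print(check, "->", link)
```

The part that's actually hard for us is physical (getting the roof connection to the firewall), not this bit.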
The missing link I'd like to remedy is having someone network savvy onsite during services. This weekend IT support role would be the first line of defense before I get called. We identified this missing link back in June '05, but have still not filled it ... the good news is that we've not needed someone in this role for the past 9 months ... nor would they have been able to fix the F1 issue Sunday. We've had some discussion with our Tech Arts team about this in the past as they have some IT guys on their teams ... looks like it's time to rekindle that discussion. The trick is finding someone network savvy already in a weekend role who has some flexibility to do IT support if needed.
I was impressed with Shelley's troubleshooting ... before she called me they had rebooted more than once, verified they could get on the internet, and verified that the check-in app would launch ... not bad for someone non-techie :-) She even commented she thought it was a problem on F1's end and not a network issue.
Explaining to kids why you're going to church in the middle of the night is pretty humorous. I could see the look of confusion on their half-awake faces :-)
I took Monday off and won't see anyone until 2pm today, so I've not yet had a chance to chat with anyone about this. I'm sure some more thoughts are still to come...
Jason, thanks for all the hard work and dedication - especially during "single parenthood!" You're a good man Charlie Brown. I enjoy working with you - even if it's only every now and then.
:)
Posted by: daryl mcmullen | March 14, 2006 at 12:50 PM
Jason,
Just experienced the same issue with defective power strips, can be very frustrating! Hope things improve for you!
Posted by: Travis Kensil | March 14, 2006 at 02:38 PM