
That's it man, game over man, game over! What the &%$ are we gonna do now?

I regularly receive questions about backup and recovery. A particularly troubling message I received recently prompted me to create the polls on database backup and recovery, and I decided to write a short series of articles on disaster recovery for the small and medium-sized businesses that typically use Made2Manage. This article is the true story of how I learned the importance of disaster recovery the hard way.

Backups are Secondary

Let me preface the story with the following. When thinking about backups, what we are really concerned with is recovery. If an armed SWAT team arrived at your office every day, gathered your backup media, and whisked it away deep inside a mountain somewhere, it wouldn’t matter if you couldn’t restore from it. So, no matter which backup method you decide on, the first order of business should be regular test restores.

I restore my M2M databases to two different test servers every single night using T-SQL scripting and SQL Server Integration Services. The purpose is to have fresh data to work with on these test servers, and I’ll describe that project at a later date.

What Disaster?

Years ago, not long after I inherited M2M, I had a terrible problem with M2M Support. This was back in the awful days of VFP (version 3.2, I think), and I regularly had data corruption problems. Being a neophyte, I called support, and they walked me through repairing the problem in VFP. The M2M Support Technician instructed me to delete the table’s DBF file so that I could recreate it. Even though I was still a beginner, I questioned her because that didn’t sound right; I thought the proper file to delete and recreate was the CDX file (the index). I distinctly asked her twice whether she truly wanted me to delete the DBF, and she confirmed the instruction. So I deleted the DBF and lost all of my shipping line items in that one instant.

I was then instructed to restore from backup. Backups were administered by an outside company that handled our hardware (not Consona). Since I was new to IT, backup and recovery were deemed too critical for me to administer. To make a long story short, their techs were making weekly visits, switching out backup tapes, performing test restores, and certifying that the backups worked. Except… they weren’t.

The Realization

I immediately called the hardware consulting company, and while a technician was en route, I opened the backup software console. When I scanned the entries, I realized that we hadn’t had a backup in weeks. There was an error with one of the tapes, and the backup software would not start a new backup job until the error was cleared. These fools were not even checking the logs, much less performing test restores. I have no idea how long it had been since they’d actually done the job for which they charged us. I then pulled the physical check sheets, and of course they were all filled out correctly, indicating that every backup had been tested.

The Aftermath

M2M informed me that they could no longer help me, and rightly so. While the original problem was their fault, all they guarantee is that they can get you back online provided you have a good backup.

The long walk to the boss’s office after you’ve made such an error is a terrible experience. Thoughts like “I wonder how long it will take me to put a resume together” were running through my mind. Amazingly, he was very relaxed about the problem and didn’t seem overly concerned.

After the hardware technician arrived and surveyed the scene, he tried to weasel his way out of it, explaining to my boss that backup tapes were unreliable. I pinned him to the wall at that point, exposed the fact that he had not been doing his job, and generally vented my frustration at him.

The Recovery

The hardware technician restored the three-week-old database (which in VFP is nothing more than a bunch of files in the operating system) to a different location on the drive and left. M2M was down and we were in crisis, but I went into my office and shut the door. I sat in the silence, breathed deeply, and thought about the problem. I then contracted with a VFP programmer to rebuild the shipping item (SHITEM) table as completely as possible using data from the shipping master (SHMAST) and other tables in the database.

This cost my employer a small fortune, which I insisted be passed on to the hardware consulting company. We lost a day’s productivity, but were otherwise unscathed. I didn’t hold what happened against the M2M Technician, and we’ve had dinner a few times since then at national M2M Conferences.

How Can I Avoid the Long Walk?

Stay tuned to this series of articles, and I’ll help you avoid the mistakes I made.

8 comments to That’s it man, game over man, game over! What the &%$ are we gonna do now?

  • Andrew

    I had a problem many years ago, which didn’t involve M2M. I couldn’t restore from tape and it DID cost me my job.

  • I had a server crash one time. Fortunately it was on a Friday night, my tape was good, and I got everything going by Monday.
    Looking back on it, I think it was because I had added a cheap IDE hard drive to the server about a year before. I never skimped on server hardware after that.

  • Anthony Agpaoa

    I’ve gone away from the tape route. I back up using a grandfather scheme onto a USB external drive from the server during off hours. I then back up the files onto my own external HDs, which I rotate in and out of a fireproof safe.

    It’s just quicker for me to search and restore from the USB drives. All I need to configure now is offsite backup, and I’m currently looking for solutions.

  • I continue to receive the following ERROR message on our exception list for Backup Exec:

    Failed to Open M:\BCSHARED\BCSHARED.DBF WILL TRY AGAIN….

    I have asked my IT person and M2M support, and no one knows how to fix this problem. It appears to be the barcode table. I can only assume the tables are shut down to process the backup, but the barcode screen is left open so the program will automatically turn back on at the appropriate time. I can only assume the barcode transactions have already posted to the M2M software and this is just an open-file error, but no one will confirm or deny this. What are your thoughts?

  • Debbie, what backup software are you using?


  • Paul Deleanu

    I had a problem with a bad server motherboard (faulty RAM VRM modules) where the RAID card falsely flagged a hard drive as failing. I had a replacement drive within 4 hours and, not knowing the motherboard was the problem, I swapped the bad drive and started the rebuild.

    The next day I woke up to a message in my inbox telling me that the server was down. The bad motherboard had corrupted EVERYTHING. IBM was very good about the whole thing and got me a replacement server right away, even though, per our service agreement, they could have waited until Monday. Restoring from bare metal using my backups was rather painless, but I hadn’t backed up everything (firewall settings, antivirus management, IIS) and had to spend the next two days rebuilding secondary databases. All in all, aside from my spending the weekend there, there were no problems, and the incident went largely unnoticed.

  • [...] Server Recovery Models and Backup Types This is the fourth of a series of articles discussing backup and recovery for M2M [...]
