I regularly receive questions about backup and recovery. A particularly troubling message I received recently prompted me to create the polls on database backup and recovery. I decided to write a short series of articles on disaster recovery for the small and medium-sized businesses that typically use Made2Manage. This article is the true story of how I learned the importance of disaster recovery the hard way.
Backups are Secondary
Let me preface the story with the following. When thinking about backups, what we are really concerned with is recovery. If you had an armed SWAT team that arrived at your office every day, gathered your backup media, and whisked it away deep inside some mountain somewhere, it wouldn’t matter if you couldn’t restore from it. So, no matter which backup method you decide on, the first order of business is to perform regular test restores.
I restore my M2M databases to two different test servers every single night using T-SQL scripting and SQL Server Integration Services. The purpose is to have fresh data to work with on these test servers, and I’ll describe that project at a later date.
Years ago, not long after I inherited M2M, I had a terrible problem with M2M Support. This was back in the awful days of VFP (version 3.2, I think), and I regularly had data corruption problems. Being a neophyte, I called support and they walked me through repairing my problem in VFP. The M2M Support Technician instructed me to delete my DBF (table) file so that I could recreate it. Even though I was still a beginner, I questioned her because that didn’t sound right. I thought the proper file to delete and recreate was the CDX (index) file. I distinctly asked her two times if she truly wanted me to delete the DBF, and she confirmed the instruction. So, I deleted the DBF and lost all of my shipping line items in that one instant.
I was then instructed to restore from backup. Backups were administered by an outside company who handled our hardware (not Consona). Since I was new to IT, backup and recovery were deemed too critical for me to administer. To make a long story short, their techs were making their weekly visits, switching out backup tapes, performing their test restores, and certifying that they worked. Except... they weren’t.
I immediately called the hardware consulting company, and while a technician was en route, I opened the backup software console. When I scanned the entries, I realized that we hadn’t had a backup in weeks. There was an error with one of the tapes, and the backup software would not start a new backup job until the error was cleared. These fools were not even checking the logs, much less performing test restores. I have no idea how long it had been since they’d actually done the job for which they charged us. I then pulled the physical check sheets, and of course they were filled out correctly, indicating that all backups had been tested.
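Nobody noticing weeks of missing backups is exactly the kind of failure a small automated check can flag. What follows is only a minimal sketch, not what we ran at the time. It assumes your backup job writes files into a single folder; the function names, the folder layout, and the one-day threshold are all hypothetical, so adjust them to your own setup.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def newest_backup_age(backup_dir: Path, now: Optional[datetime] = None) -> Optional[timedelta]:
    """Return the age of the most recently modified file in backup_dir.

    Returns None when the folder contains no files at all.
    """
    now = now or datetime.now()
    mtimes = [p.stat().st_mtime for p in backup_dir.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - datetime.fromtimestamp(max(mtimes))


def backup_is_stale(
    backup_dir: Path,
    max_age: timedelta = timedelta(days=1),  # assumed threshold: nightly backups
    now: Optional[datetime] = None,
) -> bool:
    """True when there is no backup file, or the newest one is older than max_age."""
    age = newest_backup_age(backup_dir, now)
    return age is None or age > max_age
```

Run something like `backup_is_stale(Path("D:/Backups"))` on a schedule and raise an alarm when it returns `True`. A fresh file only proves that the job ran, not that the backup can be restored, so this supplements test restores rather than replacing them.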
M2M informed me that they could no longer help me, and correctly so. While the original problem was their fault, all they guarantee is that they can get you back online provided you have a good backup.
The long walk to the boss’ office after you’ve made such an error is a terrible experience. Thoughts of, “I wonder how long it will take me to get a resume together,” were running through my mind. Amazingly, he was very relaxed about the problem and didn’t seem overly concerned.
After the hardware technician arrived and surveyed the scene, he tried to weasel his way out of it, explaining to my boss that backup tapes were unreliable. I pinned him to the wall at that point, exposed the fact that he had not been doing his job, and generally vented my frustration at him.
The hardware technician restored the three-week-old database (which in VFP is nothing more than a bunch of files in the operating system) to a different place on the drive and left. M2M was down and we were in crisis, but I went into my office and shut the door. I sat in the silence, breathed deeply, and thought about the problem. I contracted with a VFP programmer to rebuild the shipping item (SHITEM) table as completely as possible using data from the shipping master (SHMAST) and other tables in the database.
This cost my employer a small fortune, which I insisted be passed on to the hardware consulting company. We lost a day’s productivity, but were otherwise unscathed. I didn’t hold what happened against the M2M Technician, and we’ve had dinner a few times since then at national M2M Conferences.
How Can I Avoid the Long Walk?
Stay tuned to this series of articles, and I’ll help you avoid the mistakes I made.