Thursday, April 27, 2006

Take Five

Have you ever had a critical DBA moment where you are up against the wall and the clock is ticking? Maybe an upgrade didn't go so well and you had to try and revert at the last minute. Or perhaps you've just had a system crash and realized that your archive logs have been deleted or misplaced by a faulty backup.

I've had a couple of critical moments myself, and thankfully I've managed to work through all of them so far. I say so far because really as a DBA, you just live for those critical moments. DBAs are literally paid for maintaining a functionally sound database and overcoming those critical situations in a successful manner. That's your job!

Earlier this week I was able to work through such a case. One of my Oracle databases had just been migrated from 9.2.0.5 to 10.1.0.4 sucessfully. The user community was happily working and everything was fine for about 8 hours. About an hour before I was going home, I began receiving calls...from the entire community. Our third-party application which uses this database was no longer allowing new records or modifications to be made. The database was more or less in a read-only state.

My first thought was that perhaps something was overlooked in the database creation, and my datafiles had filled up without autoextending. All files looked fine. I also checked my tablespaces and they were configured properly. My SGA and PGA looked ok, and there were no database locks. Panic began to set in because now the users were unable to work for about 10 minutes. This is the point where you must take five! Maybe not really five minutes, but at least take a step back and compose yourself if you are ever in this situation. I've heard horror stories of DBAs taking the users out completely because they try to restart the instance. I've even heard of those who try to change SGA size, or tweak initialization parameters in such a situation. Not that those people are terrible DBAs, but pressure can make for some bad decisions. Do not let this pressure determine your outcome.

Back to my situation, I realized that I hadn't even checked for any invalid objects yet. My original assumptions as to what was happening did not lead me into that direction since I just ran utlrp.sql to compile all INVALID objects when I imported the data earlier that morning. Nonetheless, I took a look and found my problem. Someone modified a procedure which broke one of the triggers...they loaded an older version by accident This was the cause of the freeze. Once I loaded the newer version of the procedure, the database was back to normal.

Just another day's work as a DBA...and I got out of work on time too.

No comments:

Post a Comment

Please leave a comment...