I re-learned some important troubleshooting techniques recently that I’d like to share with you. They are rather simple and could save hours of time trying to fix something that’s broken.
1) Slow down
When I was a Computer Operator, I submitted jobs for batch processing at a rapid pace. One day I submitted a job before it was actually supposed to run. The programmer who had to fix the errors caused by my mistake came into the Computer Room and very kindly told me to slow down and take my time. No one expected the night shift Operator (which I was) to work like a day shift Operator (which I normally was not). The Operators who normally worked day shift knew all the little gotchas of the schedule, because they worked with the schedule on a regular basis. Taking a few extra minutes to assess the schedule was not going to cost the company millions of dollars, or put people out of work.
The same applies to troubleshooting production issues. Even if the issue is costing the company millions of dollars, the shotgun approach to fixing an issue may cost the company even more money if the fix does not actually fix the issue. When troubleshooting an issue, remember the old saying “Speed kills”.
2) Trace the circuit
Years ago I had a friend who installed gas pumps for a living. One of the most valuable pieces of advice he shared with me was to trace the circuit, or the flow of electricity. In programming, tracing the flow of data will help in finding the issue. Start where the data comes into the application, follow it to where it’s processed, and end where it finally comes to rest. One of my more recent issues involved several HTML hidden values. I spent hours trying to find out why the data in the database remained unchanged. It turned out that the developer who previously worked on the application had added some hidden fields to the HTML, and those fields overrode any data selection I made. Not really his mistake, but mine for not following the simple practice of tracing the circuit.
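Here is a minimal Python sketch of what tracing the circuit might look like for a form submission. It is not the application from the story. The field name `status`, the helper `trace`, and the "last value wins" handling are all assumptions made up for illustration. It prints the data at each stage, so you can see where a hidden field replaces the value the user picked.

```python
from urllib.parse import parse_qs


def trace(stage, data):
    """Print the data as it looks at a named stage, then pass it along."""
    print(f"[{stage}] {data}")
    return data


def handle_submission(raw_body):
    # Stage 1: where the data comes into the application.
    fields = trace("received", parse_qs(raw_body))

    # Stage 2: where it is processed. Hypothetical behavior: when a field
    # name appears twice, the last value wins.
    record = trace(
        "processed",
        {name: values[-1] for name, values in fields.items()},
    )

    # Stage 3: where it comes to rest (a real application would save it here).
    return trace("stored", record)


# The visible selection sends status=active, and a hidden field with the
# same name sends status=archived.
saved = handle_submission("status=active&status=archived")
```

The "received" line shows both values arriving, and the "processed" line shows only the hidden one surviving. That is the kind of thing that takes hours to find by guessing and a minute to find by following the data.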
3) Leave no stone unturned
I find it odd that many people I have worked with easily dismiss even the simplest check when troubleshooting an issue. “It can’t be that” is usually what I hear, and usually they are right. However, I have found that sometimes the thing people easily dismiss is the root cause of the issue. Sometimes the assumption is made that the application is using the correct data source, when in reality a mistake was made setting up the configuration of the application. If you’re testing and people are having an issue with production, maybe your testing is causing the issue, because the test application is pointing to production. Always check the items that take two seconds to check; doing so may save tons of time in the future. Sometimes the reason the car will not start is that there is no gas in the car. Sometimes the reason the data does not look right is that the data source used is not the right data source.
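The two-second check can be as small as the Python sketch below. The setting names `APP_ENV` and `APP_DATABASE_URL` are hypothetical, and so is the rule that a production address contains the text "prod". Substitute whatever your own configuration actually uses.

```python
import os


def describe_data_source(env=None):
    """Report which data source the application is configured to use."""
    env = os.environ if env is None else env
    mode = env.get("APP_ENV", "")          # hypothetical setting name
    url = env.get("APP_DATABASE_URL", "")  # hypothetical setting name

    # Assumption for this sketch: production addresses contain "prod".
    looks_like_production = "prod" in url.lower()

    if mode != "production" and looks_like_production:
        return f"WARNING: {mode or 'unknown'} environment is pointing at {url}"
    return f"OK: {mode or 'unknown'} environment uses {url or 'no data source configured'}"


# A test setup that was accidentally left pointing at production.
print(describe_data_source({
    "APP_ENV": "test",
    "APP_DATABASE_URL": "db://prod-server/orders",
}))
```

Running something like this before a test session confirms the assumption instead of trusting it.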
4) Ask for help
Never be afraid to ask for help. Many in IT have a sense of pride in what they know, and this pride sometimes prevents them from asking for help. Remember the old proverb that “Pride comes before the fall”. The code you work on for a company is the company’s code, not yours. Issues encountered with the code you work on need to be shared with others, no matter how foolish it may seem to do so.
5) Information is power
I have found few companies that can lose an employee without also losing knowledge. Tacit (tribal) knowledge is often not properly transferred, which leaves a gap in the company’s ability to maintain an application. This is usually the result of one person being the sole support for an application. It’s not a perfect world, and everyone has a day job. But when there are team meetings, it’s important to share as much as is possible and practical with the rest of the team, so others may acquire the knowledge needed to support the business. In turn, as a team member, it is important to listen to the information shared. The application is what the application is at the time. Future changes can always be made, but it is what it is today.
To summarize, taking your time, following the flow of the application data, checking the simplest explanations, asking for help when needed, and sharing what you know with your team will save hours of time when troubleshooting an issue. Unless you walk on water, no one expects that an issue will be resolved in two seconds. Finding a bug can be a labor-intensive operation. Bugs are sometimes subtle in how they appear, and some bugs can be almost impossible to replicate. The goal in troubleshooting is to find the bug and eliminate it. Trying the simple solutions first, in a logical manner, will ultimately get you to the goal of finding a solution. As Arthur Conan Doyle’s Sherlock Holmes put it, “when you have eliminated the impossible, whatever remains, however improbable, must be the truth.”