How Do You Find a Wolf in Siberia?

Siberia is a vast, not-quite-entirely-charted territory. It has an enormous variety of terrains, and in the winter can be quite challenging to exist in. So how do you find one lone wolf in all that space?

Simple: you build a wolf-proof fence down the middle of it, and then just figure out, through some kind of test (maybe a Star Trek-style sensor sweep), which side the wolf is on. Once you know, you repeat the process, building a wolf-proof fence down the middle of that side, and figuring out which side the wolf is on. You repeat until you’ve boxed the critter in and can see him with your own eyes. The wolf-proof part is important, because it keeps him boxed in: once you eliminate a chunk of land, you don’t need to go back and consider that chunk again.

Now, this isn’t the only troubleshooting approach in the world. There are others. But for me, this is the one that’s the most broadly applicable, and useful, in almost any situation. The “wolf-proof fence” is simply a test that you run, which can eliminate one or more root causes of whatever problem you’re dealing with. In reality, your “fence” might not eliminate exactly half the root causes at once, as the analogy implies, but the goal is to definitively eliminate one or more causes with each test you run.
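If it helps to see the fence-building as code, here is a minimal sketch in Python. It assumes the territory can be listed in order, that there is exactly one wolf somewhere in it, and that the test is conclusive. The names `find_wolf`, `sensor_sweep`, and the eight regions are all made up for illustration.

```python
from typing import Callable, List, Sequence


def find_wolf(territory: Sequence[str],
              wolf_is_in: Callable[[Sequence[str]], bool]) -> str:
    """Narrow a list of candidate locations down to one by repeated halving.

    Assumes exactly one wolf is somewhere in `territory` and that
    `wolf_is_in` always gives a conclusive answer.
    """
    candidates = list(territory)
    while len(candidates) > 1:
        middle = len(candidates) // 2
        west, east = candidates[:middle], candidates[middle:]
        # The "fence": one test that rules out an entire half for good.
        candidates = west if wolf_is_in(west) else east
    return candidates[0]


# Hypothetical setup: eight regions, with the wolf hiding in "region-5".
regions = [f"region-{i}" for i in range(8)]
tests_run: List[List[str]] = []


def sensor_sweep(area: Sequence[str]) -> bool:
    """Stand-in for whatever conclusive test you have available."""
    tests_run.append(list(area))
    return "region-5" in area


found = find_wolf(regions, sensor_sweep)
print(found, "found after", len(tests_run), "tests")
```

With eight regions, three tests are enough, because each fence removes half of what is left. Real problems rarely split that evenly, but the loop is the same: test, eliminate, and never revisit what you have eliminated.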

Suppose you have two cars, and one morning, one won’t start. So as a test, you try to start the other car, and it does start. What possible causes have you eliminated? None. No realistic ones, anyway. Your first car could still have a variety of problems: it could be out of gas, have a dead battery, have a bad wire someplace, or any number of likely causes. But your test didn’t eliminate any of those, and so your test was useless. A waste of time: you didn’t build a wolf-proof fence.
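One way to picture the difference between a useful test and a useless one is to write down, before running a test, which candidate causes it could actually rule out. The sketch below does that for the car example. The causes come from the paragraph above; the fuel gauge and battery voltage tests are hypothetical additions for contrast, and the mapping is an assumption, not a diagnostic guide.

```python
# Candidate causes for the car that won't start.
candidate_causes = {"out of gas", "dead battery", "bad wire"}

# For each test, the causes a conclusive result could rule out.
# Only "start the other car" comes from the example; the rest are hypothetical.
tests = {
    "start the other car": set(),
    "check the fuel gauge": {"out of gas"},
    "measure battery voltage": {"dead battery"},
}

# A test is a wolf-proof fence only if it can eliminate at least one candidate.
useful_tests = [name for name, ruled_out in tests.items()
                if ruled_out & candidate_causes]

print(useful_tests)
```

Starting the other car drops out of the list because nothing it tells you touches any of the candidate causes.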

That’s the kind of “troubleshooting” I see far too many people engage in, particularly in my field of information technology. They waste effort, sometimes because they simply get flustered and are in a hurry, other times because they don’t actually know anything about the system they’re trying to test. Either way, they try more or less random stuff, or they try stuff that’s worked before without knowing why it worked before. They burn a lot of time, sometimes feel bad about not being able to solve the problem, and perhaps worry about the security of their job.

My methodology is simple, and it isn’t unique; it is–perhaps using different words–the same troubleshooting methodology you’ll see taught in plenty of places. But it’s incredibly effective.

  1. Understand the symptoms.
  2. Understand the scope.
  3. Replicate the problem.
  4. Divide and conquer.

Along the way, you’ll need to avoid some common red herrings, like relying on belief rather than facts, conducting inconclusive tests, and changing too many variables at one time. There are some semi-red-herrings, too, like asking, “what changed?” You’ll need to pick up a little “disguised scientific method,” and really train yourself to be consistent and objective. But you can do it.

Ready?