The Field Guide to Understanding Human Error by Sidney Dekker

Sidney Dekker is the founder of the Safety Science Innovation Lab in Brisbane, Australia. He holds a Master's in Psychology, another in Experimental Psychology, and a PhD in Cognitive Systems Engineering. His work on safety and human error has drawn worldwide recognition.

The Field Guide is Dekker’s sixth book; all of his books should be required reading for anyone working on safety issues within complex systems. The book makes a strong argument against blaming incidents on “human error” and argues instead for digging deeper into the underlying conditions that allowed the incident to take place. A key takeaway is that when investigating an incident or accident you must put yourself inside the tunnel, seeing the situation as the people involved saw it, not outside the tunnel with the luxury of hindsight.

Nobody goes to work to have an accident.

Investigation

  • ‘Human error,’ after all, is no more than a label. It is a judgment. It is an attribution that we make, after the fact, about the behavior of other people, or about our own.
  • Debriefings of mishap participants help construct the situation that surrounded people at the time and get their view on that situation.
  • A good starting point (when investigating an accident) is to build a timeline. This timeline, however, needs to be of sufficient resolution to reveal underlying processes that may be responsible for the “errors.”
  • Remember at all times what you are trying to do. In order to understand other people’s assessments and actions, you must try to attain the perspective of the people who were there at the time. Their decisions were based on what they saw on the inside of the tunnel—not on what you happen to know today.
  • Micro-matching can mean that you take performance fragments from the stream of events and hold them up against rules or procedures that you deem applicable in hindsight. You don’t explain anything by doing this.
  • This is how practitioners create safety: they invest in their understanding of how systems can break down, and then devise strategies that help forestall failure.
  • There is no single cause—neither for failure, nor for success. In order to push a well-defended system over the edge (or make it work safely), a large number of contributory factors are necessary and only jointly sufficient.
  • What you call “root cause” is simply the place where you stop looking any further.
  • Another aspect of managing such problems is that people have to commit cognitive resources to solving them while maintaining process integrity. This is called dynamic fault management, and is typical for event-driven domains.
  • Sticking with the original plan while a situation has actually changed and calls for a different plan is what Judith Orasanu calls “plan continuation.”
  • In other words, decision making in a dynamic situation is hardly about making decisions, but rather about continually sizing up the situation. The “decision” is often simply the outcome, the automatic by-product of the situation assessment.
  • If material is learned in neat chunks and static ways (books, most computer-based training) but needs to be applied in dynamic situations that call for novel and intricate combinations of those knowledge chunks, then inert knowledge is a risk.
  • Larry Hirschhorn talks about a law of systems development, which is that every system always operates at its capacity. Improvements in the form of new technology get stretched in some way, pushing operators back to the edge of the operational envelope from which the technological innovation was supposed to buffer them.
  • “Loss of situation awareness” simply says that you now know more than the people who were in the tunnel. “Loss of situation awareness” puts you outside, in hindsight.  What this really shows, however, is your ignorance about the situation other people faced.  Your job is not to point out (by whatever labels) that you are now smarter than operators were back then.  Your job is to go down into the tunnel with them, and to understand their awareness. To understand what made sense to them at the time (without knowing the outcome!) and to explain to yourself and others why.

Drift

  • While a mismatch between procedures and practice almost always exists, it can grow over time, increasing the gap between how the system was designed (or imagined) and how it actually works. This is called drift.
  • Past success is taken as guarantee of future safety. Each operational success achieved at incremental distances from the formal, original rules can establish a new norm.
  • Departures from the routine become routine. Seen from the inside of people’s own work, violations become compliant behavior.
  • A major driver behind routine divergence from written guidance is the need to pursue multiple goals simultaneously.
  • Doing what you do today (which could go wrong but did not) does not mean you will get away with it tomorrow.
  • While the end result of drift may seem a very large departure from originally accepted routines, each step along the way is typically only a small increment (or decrement) from what was accepted previously.
  • A group’s construction of risk can persist even in the face of continued (and worsening) signals of potential danger. This can go on until something goes wrong, which (as Turner would have predicted) reveals the gap between the presence of risk and how it was believed to be under control.
  • This is why high-reliability organizations (HROs) deal with risk by remaining chronically uneasy. HRO thinking suggests you stay curious, open-minded, complexly sensitized, inviting of doubt and ambivalent toward the past.
  • People in HROs are described, ideally, as skeptical, wary and suspicious of quiet periods.

Complacency

  • Complacency is a huge term, often used to supposedly explain people’s lack of attention to something, their gradual sensitization to risk, their non-compliance, their lack of mistrust, their laziness, their “fat-dumb-and-happiness,” their lack of chronic unease.
  • Using ‘complacency’ is an investigative or managerial cop-out.
  • “The notion of complacency arises from the suggestion that, particularly when the automation being monitored is highly reliable, operators may not merely trust it, but trust it too much, so that they fail to sample (monitor) the variables often enough.”

Accident Models

  • The kind of safety work you do depends on what you think is the biggest source of risk. That in turn depends on what your accident model is.
  • The chain-of-events model stems from the 1930s. It says that a linear series of errors, failures and violations is necessary to push a system over the edge into breakdown.
  • Please stop using the triangle. Heinrich, by the way, is also to many the father of the so-called “triangle”: the idea that there is a proportional relationship between major accidents/fatalities, injuries and incidents, and minor events.
  • The triangle promises the following: Serious injuries, accidents and fatalities can be avoided by reducing or avoiding minor incidents and safety events.  But incidents in very safe systems are caused by radically different things than accidents or fatalities.  A continued belief in the triangle can probably make your industry less safe, because of the false idea that we can control the risk of a major disaster by counting, recording, tabulating and suppressing the small stuff.
  • Take the Macondo accident (also known as Deepwater Horizon). Just before the accident, managers were celebrating six years of injury-free performance.
  • Barrier models propose that our safety-critical activities are generally well protected against risk.
  • The barrier model is inspired by industries where containing (dangerous) energy is the major safety issue, for example process control, oil & gas, nuclear power generation.
  • The barrier model is, in many ways, still a chain-of-events model. One thing causes the next, in a linear sequence, until all barriers have been breached.
  • What is much harder for a barrier model to do is to explain the social, organizational and bureaucratic context that gave rise to weak defenses.

Systems Theory

  • The more complex a system, the more difficult it becomes to control.  The more complex a system, the more difficult it becomes for people to even know whether they still have adequate control or not.
  • System models build on two fundamental ideas. Emergence: safety is an emergent property that arises when system components and processes interact with each other and their environment. Control: constraints are imposed on the degrees of freedom of components (for example, through procedures or design requirements) so as to control their interaction.
  • System accidents result not from component failures, but from an erosion of control of safety-related constraints on the development, design and operation of the system.
  • NASA’s “Faster, Better, Cheaper” organizational philosophy in the late 1990s epitomized how multiple, contradictory goals are simultaneously present and active in complex systems.

Safety Culture

  • A safety culture is a culture that allows the boss to hear bad news.
  • Managing safety on the basis of incidents is only one way—and in a sense a very limited way.
  • Safety I: safety is the absence of negative events. A system is safe if there are no incidents or accidents. Safety II: safety is the presence of positive capacities, capabilities and competencies that make things go right.
  • Safety is made and broken on and in the line. So perhaps that is where responsibility for it should lie as well, yet many industries have turned large parts of their organizations’ safety work over to a staff function.
  • These Safety Officers end up being mere arm’s-length tabulators of largely irrelevant or unusable data, compiling a trail of paperwork whose only function is to show compliance, and acting as cheerleaders for past safety records.
  • The solution was to make the safety department a much more active contributor to top-down management work: helping craft responses to incidents, debriefing people involved in them, and participating in other management decisions that affected trade-offs between safety and efficiency.
  • If you remain on the sideline as a supposed “impartial” department, you will see the basis for your work be whittled away. You must be ready to do battle for your data and the safety concerns they feed and represent.
  • An end to weekly or monthly or quarterly targets. Safety is not something that the safety department produces, so targets make little sense.
  • A continued grounding in operational reality. Having only full-time safety people can make your department less effective, as they can lose their idea (or never had one) of what it is to operate at the sharp end. That said, just being a practitioner (or having once been one) does not in itself qualify people to be members of, or have a strong say in, a safety department.
  • Old View safety tends to locate a lot of the responsibility for safety in processes, protocols, procedures, databases and paperwork. And it typically makes a staff department responsible for administering all of it. This has grown dramatically over the past 30 years.  In fact, some would say there is an over-regulation of safety. Safety has increasingly morphed from operational value into bureaucratic accountability.
  • Safety bureaucracies are often organized around lagging indicators: measuring that which has already been.  Safety bureaucracies tend to value technical expertise less than they value protocol and compliance.
  • In 2008, for example, two years before the Macondo well blowout, BP had identified what it called priority gaps in its Gulf of Mexico operations. The first of these was that there were “too many risk processes going on”, which had collectively become “too complicated and cumbersome to effectively manage.”

Let's Start a Discussion