In his book “Human Error”, James Reason put forward a model of how systems fail, describing a system's defenses as layers of Swiss cheese with holes in random places. If a path exists through all the layers (i.e., the holes in every layer line up), then an incident will occur.
This model is not perfect, but it is instructive: an accident normally results from several things going wrong, and an incident occurs only when all of these causes coincide.
For example, if there is a vapor release but no source of ignition, there will be no explosion. However, if there is both a release and a source of ignition, an explosion can result.
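The “all layers at once” logic can be made concrete with a small sketch. It assumes, as a simplification, that each layer fails independently with a fixed probability; real layers often share common causes, so treat this as an illustration of the model rather than a risk calculation. The layer names and probabilities are hypothetical.

```python
import math
import random


def breach_probability(hole_probs):
    """Chance that a hazard finds a hole in every layer at once.

    Assumes each layer fails independently, which is a simplification:
    real layers can share common causes (e.g. the same neglected
    maintenance backlog).
    """
    return math.prod(hole_probs)


def simulate_breaches(hole_probs, trials, seed=0):
    """Monte Carlo check of the same model: count the trials in which
    every layer happens to have a hole at the same moment."""
    rng = random.Random(seed)
    breaches = 0
    for _ in range(trials):
        if all(rng.random() < p for p in hole_probs):
            breaches += 1
    return breaches / trials


# Hypothetical per-layer probabilities, chosen only for illustration.
layers = {"vapor release": 0.1, "ignition source present": 0.2}
print(breach_probability(layers.values()))  # about 0.02

# Removing the ignition source closes that layer's hole entirely.
layers["ignition source present"] = 0.0
print(breach_probability(layers.values()))  # 0.0
```

Because the layers multiply, closing the hole in any single layer drives the incident probability to zero, which is the vapor-and-ignition example above expressed numerically.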
Let’s review five oil refinery and offshore rig incidents by looking for causes that are specifically related to maintenance activities or systems.
Shortfalls in maintenance are not the only cause of these accidents, and sometimes not even the main cause, yet there are always lessons to be learned as we seek to continually improve our maintenance and reliability performance and reduce the possibility of safety incidents. Most of the investigation reports below come from the US Chemical Safety Board (CSB), an independent federal agency. Its mission is to investigate chemical incidents and advocate for the implementation of corrective measures.
These measures include evaluating other government agencies and their role in regulating the industry. CSB investigations are neither internal company investigations nor regulatory investigations. Instead, the CSB acts as an independent third party, highlighting lessons and corrective actions relevant to the industry as a whole.
In early 2014, two sulfuric acid releases injured workers in the Alkylation Unit of the Tesoro Martinez Refinery. The first released 84,000 pounds of sulfuric acid and burned two employees, with the potential for far more severe consequences. The second sprayed acid that injured two more workers.
Alkylation units contain both highly aggressive acid catalysts and light hydrocarbons, creating a combination of hazards with the potential for severe consequences. Both incidents at the Tesoro refinery occurred during the performance of non-routine maintenance work.
The investigation report highlights the increased risk inherent in unfamiliar work, where procedures and work instructions may not be well known or understood.
Detailed planning and risk assessments are essential when conducting non-routine work. A highly structured planning process ensures that the right questions are asked and answered about the required work steps and the risks at each stage.
A multi-disciplinary team has the best chance to properly prepare for the work and minimize the risk to employees.
Much has been written about the tragic incident on the Deepwater Horizon Rig on April 20, 2010, in the Gulf of Mexico, in which 11 people lost their lives and 17 others suffered serious physical injuries.
The CSB released its report in April 2016, highlighting several lessons related to maintenance functions. Technical findings in the report indicated that one of the emergency disconnect systems (the blue pod) was not functional at the time of the incident: miswiring of a solenoid had drained a critical battery, rendering the blue pod inoperable.
Each piece of equipment handed over to operations for commissioning after maintenance or project work is accompanied by a handover pack. As part of this handover pack, every discipline signs off to verify that the equipment is in good condition and functions as designed.
A handover pack for a single solenoid is certainly not the highest profile pack in the commissioning phase of a new project or after a turnaround. However, the failure of this particular solenoid was one of the contributing factors to a significant disaster.
For maintenance professionals, this finding should prompt a close, detailed examination of our handover pack systems. The quality of the asset information in the planning system directly influences the quality of the handover pack.
The level of care we take in managing the integrity of the handover process is critical to the prevention of incidents.
On October 23, 2009, a large explosion at the Caribbean Petroleum Refinery in Puerto Rico caused extensive damage to 17 petroleum storage tanks as well as severe damage extending into the neighborhood beyond the refinery.
The incident occurred while gasoline was being offloaded from a tanker ship to the tank farm. The tank being filled overflowed, resulting in a vapor cloud release and subsequent explosion.
One maintenance-related cause identified in the CSB report deserves attention: a simple failure of the tank side gauge transmitter.
When the transmitter failed, level data stopped reaching the computer system, so the operator could not see the trend showing that the tank was overfilling. The level transmitters were reportedly often out of service at the time.
This is a case where a “bad actor”, a piece of equipment that fails repeatedly, can easily become something people learn to live with. Every maintenance organization should commit to continuous improvement and to resolving recurring maintenance issues such as bad actors. Doing so can significantly reduce the risk of an incident.
A pipe rupture on the Crude Unit of the Chevron Refinery in Richmond, California, on August 6, 2012, resulted in a vapor release and explosion. Eighteen employees were caught in the vapor cloud but made their way to safety before ignition.
Smoke and particulate clouds traveled across the surrounding community. In all, 15,000 people from the surrounding area were treated for symptoms including sore throats, breathing problems, chest pains, and headaches. Approximately 20 people were admitted to hospital.
The failed pipe had been thinned by sulfidation corrosion, a damage mechanism known to be highly aggressive on carbon steel piping at high temperatures.
One of the key defenses against this kind of failure is routine inspection as part of the Preventative Maintenance (PM) strategy.
The purpose of Preventative Maintenance routines is to monitor equipment for warning signs before a failure occurs. A periodic review of PM routines is important to evaluate which checks are being done and whether they are the right checks to prevent a failure.
An incident on the Suncor Altares drilling rig on March 9, 2012, highlights the importance of adequate orientation and training when moving from one operating environment to another.
The rig had previously been commissioned for shallow heavy oil wells, but it was now drilling a deep, high-pressure gas well. During drilling, a kick occurred: pressure in the drilled rock exceeded the pressure in the wellbore and forced formation fluids into it.
The casing pressure rose above the maximum allowable casing pressure (MACP), but the drilling program provided no work instructions for this scenario. In addition, the Well Control program was inadequate: it outlined MACP only in general terms and was oriented toward shallow wells.
Ultimately, a mechanical failure led to a blowout of the well. The drilling rig caught fire and was completely destroyed.
A more thorough training and orientation process could have prevented the blowout at Suncor Altares. It is an important lesson for minimizing the risk of safety, environmental, and production incidents.
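In just the five events above, 11 people lost their lives, at least 21 workers were injured, some 15,000 members of the public were treated for symptoms, 17 storage tanks were damaged, and a drilling rig was destroyed.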
These are the frightening consequences of failures in systems related to oil refining and offshore drilling.
Every incident in a production environment is an opportunity for all of us to learn and to improve our policies and practices so that it does not recur.
Some incidents have tragic consequences and are widely covered in the media, while others are far less severe. Nevertheless, the more “holes in each slice of cheese” we eliminate, the more we reduce the possibility of a catastrophic accident.
The specific maintenance causes noted in these incidents are only a few of many that could play a role in a major loss of life, property, and revenues.
What incidents have you witnessed or studied that have lessons for us to learn as maintenance and reliability professionals?