The Electric Grid Needs Good Failure Mode Analysis

Published October 7, 2023

Failure mode analysis is a standard engineering practice. It means figuring out the basic ways a system can fail, each way being a mode of failure. Once you understand these failure modes, you can do things like watch for them, or prepare for them, or take steps to keep them from happening.

Way back when I was a junior engineer, I did mathematical failure mode analysis for large dams. There were just 3 to 5 basic modes, depending on the design. Earth dams had more than concrete ones, but I digress.

We now have repeated, even urgent, warnings that the electric power grid is increasingly prone to failure. Some of these warnings have come from people who actually oversee the grid, including FERC, NERC, and the independent system operators (ISOs). My fellow skeptics have also been vocal about this growing threat of disaster.

Everyone talks about blackouts, but I have not seen a detailed analysis of the various ways these might occur. I suspect there are several different basic ways, each calling for a different approach. So here are some starter thoughts.

First of all, there are deliberate rolling blackouts versus uncontrolled blackouts. ISOs and utilities may well have internal plans, or perhaps rules, for running rolling blackouts. If so, it would be very helpful to know what these are. For example, emergency service groups at all levels of government could then prepare rolling-blackout response plans.

Uncontrolled blackouts may be unpredictable, but they can still be planned for to some degree. I live way out in the country, and we get blackouts several times a year, so we have a well-prepared routine for dealing with them if they do not last too long.

Of special importance are the size and duration of blackouts, as both features deeply affect planning. Should we expect a lot more small blackouts or just a few more big ones? How about really big ones, a few of which have occurred in the past?

It also matters how hot or cold it is, especially for large, long-lived blackouts. Severe cold is really dangerous. The Texas disaster during Winter Storm Uri in February 2021 killed a lot of people, and PJM almost went that way last Christmas during Winter Storm Elliott.
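As a starting point, the distinctions drawn so far (controlled versus uncontrolled, size, duration, and temperature) can be laid out as a simple grid of candidate failure modes. The sketch below is only an illustration: the category labels and the `needs_priority_planning` rule are assumptions made for this example, not an official taxonomy used by any ISO or utility.

```python
from itertools import product

# Dimensions taken from the discussion above. The labels themselves are
# illustrative assumptions, not an established classification.
CONTROL = ("rolling", "uncontrolled")
SIZE = ("small", "big", "really big")
DURATION = ("short", "long")
WEATHER = ("mild", "severe heat", "severe cold")


def enumerate_blackout_modes():
    """Return every combination of the four dimensions as a dict."""
    return [
        {"control": c, "size": s, "duration": d, "weather": w}
        for c, s, d, w in product(CONTROL, SIZE, DURATION, WEATHER)
    ]


def needs_priority_planning(mode):
    """Hypothetical rule: flag large, long-lived blackouts in severe cold."""
    return (
        mode["size"] != "small"
        and mode["duration"] == "long"
        and mode["weather"] == "severe cold"
    )


modes = enumerate_blackout_modes()
priority = [m for m in modes if needs_priority_planning(m)]
print(len(modes), "candidate modes;", len(priority), "flagged for priority planning")
```

Even this crude grid makes the point that "blackout" is not one thing: each cell may call for a different response plan.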

The first question is how much failure mode analysis can already be done using existing computer models. The ISOs and utilities do a lot of modeling. For example, the ISOs already can determine what system upgrades will be needed before a new large generator can be connected to the grid. The wind and solar people complain about this because it sometimes makes their remote projects very expensive.

If the ISOs can do that kind of detailed flow analysis, they ought to be able to see where things are likely to break and what that might do to the system. I am reminded of the line from “Wichita Lineman”: “if it snows that stretch down south won’t ever stand the strain”.
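To make the idea concrete, here is a toy version of what engineers often call an N-1 contingency screen: take each line out of service in turn, recompute the flows, and list whatever ends up overloaded. This is a minimal sketch, not what any ISO actually runs. It uses the simplified "DC" power flow approximation, and the four-bus ring, reactances, injections, and line limits are all hypothetical numbers chosen for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: part of the network is islanded")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def dc_flows(n_bus, lines, injections):
    """DC power flow with bus 0 as the slack bus.

    lines maps name -> (from_bus, to_bus, reactance, limit_mw).
    injections[i] is generation minus load at bus i, in MW.
    Returns the flow on each line, positive from from_bus to to_bus.
    """
    B = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, x, _limit in lines.values():
        B[i][i] += 1 / x
        B[j][j] += 1 / x
        B[i][j] -= 1 / x
        B[j][i] -= 1 / x
    # Drop the slack row and column, then solve for the remaining angles.
    reduced = [row[1:] for row in B[1:]]
    theta = [0.0] + solve(reduced, injections[1:])
    return {k: (theta[i] - theta[j]) / x for k, (i, j, x, _l) in lines.items()}


def n_minus_1(n_bus, lines, injections):
    """For each single-line outage, list the lines pushed past their limit."""
    report = {}
    for out in lines:
        remaining = {k: v for k, v in lines.items() if k != out}
        try:
            flows = dc_flows(n_bus, remaining, injections)
        except ValueError:
            report[out] = ["ISLANDED"]
            continue
        report[out] = [k for k, f in flows.items() if abs(f) > remaining[k][3]]
    return report


# Hypothetical four-bus ring: a generator at bus 0, loads at buses 2 and 3.
N_BUS = 4
LINES = {
    "A": (0, 1, 0.1, 120.0),
    "B": (0, 2, 0.1, 100.0),
    "C": (1, 3, 0.1, 120.0),
    "D": (2, 3, 0.1, 80.0),
}
INJECTIONS = [150.0, 0.0, -50.0, -100.0]

base = dc_flows(N_BUS, LINES, INJECTIONS)
report = n_minus_1(N_BUS, LINES, INJECTIONS)
for outage, overloaded in report.items():
    print(f"lose line {outage}: overloaded -> {overloaded or 'none'}")
```

In this made-up system every line is within its limit in the base case, yet losing any one of three lines overloads two others. A real study would use the full network model, proper AC power flow, and weather-dependent line ratings, but the logic of asking "what breaks next?" is the same.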

It may well be that they are already doing this sort of failure mode analysis; they just don’t want to tell us about it, lest it worry us. But given all the warnings, we clearly need to worry and to take steps to address that worry.

It also may be true that they cannot do the kinds of failure mode analysis I am describing. The growing threat is new, after all, so the software simply may not exist. I seem to recall both the New England ISO and PJM saying they really could not evaluate the impact of this attempted transition on reliability.

If this is true, then given the huge amounts of potential damage, including deaths, we should be developing that software as fast as possible. Not knowing the impact of the near-term transition on reliability is a true emergency. We may be flying blind into the wall of impossibility.

There is another form of failure that also needs to be addressed, namely wholesale power price spikes. We have already had a few of these and are still struggling to figure out how to pay for them. When unit power prices go from tens of dollars to thousands, there is a lot of damage, even if it is fiscal, not physical.
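A little arithmetic shows why such spikes hurt. The sketch below assumes prices are quoted in dollars per megawatt-hour and uses hypothetical round numbers (a constant 1,000 MW load for one day, at $30 versus $3,000) purely for illustration. They are not figures from any actual event.

```python
def energy_cost(load_mw, hours, price_per_mwh):
    """Wholesale energy cost in dollars for a constant load."""
    return load_mw * hours * price_per_mwh


# Hypothetical inputs, chosen only to illustrate the scale of a spike.
LOAD_MW = 1_000
HOURS = 24

normal = energy_cost(LOAD_MW, HOURS, 30)
spike = energy_cost(LOAD_MW, HOURS, 3_000)

print(f"normal day: ${normal:,.0f}")
print(f"spike day:  ${spike:,.0f} ({spike / normal:.0f}x)")
```

Under these assumptions a single day at spike prices costs as much as about 100 normal days, which is why the bills take so long to sort out.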

FERC and NERC reliability standards require various forms of analysis. Publicly accessible failure mode analysis should be among them. Vague warnings are not good enough.