The famous cybersecurity triad admits several interpretations depending on the context in which it is applied, and industrial cybersecurity is a case in point. Most people rank the triad by criticality: in information security, confidentiality comes first, while in industrial systems, availability is paramount.
Is this enough to understand and apply the triad in the industrial field? How much is known about industrial cybersecurity?
Understanding Availability.
This is the aspect of the triad that causes the most confusion, and where the most mistakes are made when evaluating industrial cyber risk and, therefore, when trying to mitigate it.
When we talk about the availability of systems, whether IT or industrial OT, we are referring to the fraction of time the system is available, usually expressed as a percentage. Calculating it requires other data, such as the mean time between failures (MTBF) and the mean time to repair (MTTR).
Availability (%) = Uptime / (Uptime + Downtime) × 100
Key Components:
- Uptime – The total time the system is operational.
- Downtime – The total time the system is unavailable due to failures or maintenance.
- Mean Time Between Failures (MTBF) – The average time between system failures.
- Mean Time to Repair (MTTR) – The average time required to restore the system after failure.
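The formula and the two mean-time metrics above can be put together in a few lines. This is a minimal sketch of the standard calculations; the numeric inputs are illustrative, not taken from any particular system.

```python
def availability(uptime_h: float, downtime_h: float) -> float:
    """Availability (%) = Uptime / (Uptime + Downtime) * 100."""
    return uptime_h / (uptime_h + downtime_h) * 100

def availability_from_mtbf(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability derived from MTBF and MTTR."""
    return mtbf_h / (mtbf_h + mttr_h) * 100

# One year of operation with six hours of downtime:
print(round(availability(8754.0, 6.0), 3))            # -> 99.932
# An MTBF of 999 h with an MTTR of 1 h yields "three nines":
print(round(availability_from_mtbf(999.0, 1.0), 1))   # -> 99.9
```

Note that the two formulas agree: over a long horizon, uptime accumulates in MTBF-sized stretches and downtime in MTTR-sized ones.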
A typical corporate system on an IT network usually offers availability on the order of 99%, perhaps 99.9%; at the extreme, 99.99% is very difficult to achieve. That is effectively the ceiling for information systems.
In the industrial field, a typical commercial DCS is normally engineered for 99.99999% availability. This is the norm, not the exception. Safety instrumented systems (SIS, ESD, F&G) typically guarantee more than seven nines.
This makes it clear that it is not just a matter of reversing the priority of the triad: the magnitudes are entirely different. Industrial systems must guarantee levels of availability that are simply unknown in IT environments. The German manufacturer HIMA even markets its HIMax system as offering "infinite availability".
An availability of seven nines (99.99999%) means that such a system may accumulate at most about 2.6 minutes of unavailability over 50 years of continuous operation, roughly three seconds per year, without ever being switched off during that entire period.
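Plugging seven nines into the availability formula shows just how small the allowed downtime really is. A quick sketch of the arithmetic:

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 h

def max_downtime_hours(availability_pct: float, horizon_hours: float) -> float:
    """Maximum unavailability allowed over a given horizon, in hours."""
    return (1 - availability_pct / 100) * horizon_hours

seven_nines = 99.99999
per_year_seconds = max_downtime_hours(seven_nines, HOURS_PER_YEAR) * 3600
over_50y_minutes = max_downtime_hours(seven_nines, 50 * HOURS_PER_YEAR) * 60

print(round(per_year_seconds, 1))   # -> 3.2 seconds per year
print(round(over_50y_minutes, 1))   # -> 2.6 minutes over 50 years
```

For comparison, 99.9% (a respectable IT figure) allows almost nine hours of downtime in a single year.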
Methods to Measure Availability:
- Real-time Monitoring – Using software tools to track uptime and downtime.
- Service-Level Agreements (SLAs) – Defining expected availability percentages (e.g., 99.999% uptime).
- Incident Management Metrics – Tracking failures and recovery times to improve system reliability.
In OT, availability should be interpreted as the ability of the cyber asset to execute its essential function. For example, for a PLC controlling the operation of a steam boiler, the essential function of that PLC is to "control the correct operation of the boiler". The fact that the PLC receives no queries does not mean it is not controlling the boiler.
In IT, availability is usually interpreted as the ability of the cyber asset to respond to a demand: to execute its programs, answer a query, or service a request. Often a ping is sent to the PLC to see whether it responds; if it does, the PLC is assumed to be available. But is it really controlling the boiler? When we receive the response to a ping, are we certain the boiler is being controlled?
Understanding availability correctly is essential to see the real risk of implementing "cybersecurity controls" that monitor availability, so that these controls make sense and lead to correct decisions. It is also what allows us to identify real risks instead of falling into misinterpretation, false alarms, or even self-deception.
The PLC is often seen as a single device when it is in fact several devices in one. A PLC contains a backplane and hardware modules interconnected through it: typically one or two power supplies, one or two CPUs, and a variety of communications modules linking it to the industrial process and to other devices or systems, including the HMI the operator uses to interact with the process.
What is the best way to monitor the availability of a PLC? And what if it is redundant: dual, triple, or quadruple?
Some professionals do it with a ping to one of its communication interfaces. Others might measure the current drawn by the power supplies, assuming that if the PLC is turned off it will not be drawing current. Others prefer to listen to network traffic, reasoning that if a communications interface shows activity, it is reasonable to think the PLC is available; otherwise, it would not be communicating.
Do any of these measures guarantee that the PLC is controlling the boiler? What if the control logic operating the boiler is stopped while the PLC is still on?
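One way to test the essential function rather than mere network liveness is to watch a value that the control logic itself updates. The sketch below is hypothetical: it assumes the boiler logic increments a heartbeat counter on every scan cycle, and `read_heartbeat()` stands in for whatever protocol read a given PLC actually supports. The stub classes simulate the two cases being contrasted.

```python
import time

def read_heartbeat(plc) -> int:
    # Hypothetical stand-in for a real protocol read, e.g. a holding
    # register that the boiler control logic increments on every scan.
    return plc.heartbeat

def logic_is_running(plc, interval_s: float = 1.0) -> bool:
    """True only if the control logic advanced its heartbeat between two
    reads; merely answering on the network is not enough."""
    first = read_heartbeat(plc)
    time.sleep(interval_s)
    return read_heartbeat(plc) != first

class StoppedPlc:
    """Answers every read, but its logic is halted: the classic case a
    ping would wrongly report as 'available'."""
    heartbeat = 42  # frozen counter

class RunningPlc:
    """Logic executing normally: the counter advances each scan cycle."""
    def __init__(self):
        self._n = 0
    @property
    def heartbeat(self):
        self._n += 1  # simulates one scan cycle per read
        return self._n

print(logic_is_running(StoppedPlc(), interval_s=0.01))  # -> False
print(logic_is_running(RunningPlc(), interval_s=0.01))  # -> True
```

A ping would report both PLCs as "available"; only the heartbeat check distinguishes the one that is actually executing its essential function.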
Professionals who implement security solutions dedicated to surveillance must ask themselves, and answer: are we watching and monitoring in the right way? Are we reaching the right conclusions, and are the responses to alleged cyber incidents adequate?
What is the availability of the tool being used to measure availability? Shouldn't it be better than the PLC's? Does it make sense to use a monitoring tool with an availability of perhaps 99% to monitor a PLC with an availability requirement of 99.99999%?
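The mismatch is easy to quantify. Using the two availability figures just mentioned, the sketch below compares how often the watcher is down versus the watched:

```python
monitor_availability = 0.99        # a typical IT-grade monitoring tool
plc_availability = 0.9999999       # seven-nines requirement on the PLC

monitor_down = 1 - monitor_availability   # ~1e-2 of the time
plc_down = 1 - plc_availability           # ~1e-7 of the time

# How many times more often is the watcher down than the watched?
ratio = monitor_down / plc_down
print(round(ratio))  # -> 100000
```

The monitor is unavailable roughly a hundred thousand times more often than the PLC it is supposed to vouch for, so almost every "PLC down" event it could ever witness is far more likely to be its own outage.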
Is the way we are measuring PLC availability correct? What happens if the measurement returns a positive? Is it conclusive? How should we respond to a positive that does not make sense?
Understanding Integrity.
Integrity is of fundamental importance, both in the domain of information systems (IT) and in the domain of industrial systems (OT). However, its application and interpretation differ greatly between the two domains.
The integrity of a PLC is compromised when the configuration of the cyber asset is altered, usually maliciously, with the purpose of causing an intolerable consequence for the organization. In the example of our boiler-control PLC, it could be a modification of the PLC logic that manages to cause an explosion, with one or more possible fatalities. This is a hypothetical example; one must understand how the boiler works to know whether such an outcome is actually possible.
In general, these actions must be malicious and deliberate, aimed at the integrity of the PLC's essential function. Any other modification to the configuration of the PLC or its modules would most likely cause some kind of availability problem, which is usually tolerable, but would not compromise the integrity of the boiler's operation.
So the fundamental thing is to control, limit, and know whether the logic that contains the control of the boiler has been modified, and if so, what the change was. Change management is vitally important, as is monitoring it in the correct way.
Many look for very elaborate and complex methods to try to infer that the PLC logic has been, or may have been, modified, when in reality this can be known very simply with a native function of the PLC, at no cost to the organization.
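The article does not name the native function, but many controllers expose a program checksum or signature that changes whenever the logic is modified. Assuming such a value can be read, surveillance reduces to a baseline comparison. The controller names and checksum values below are hypothetical, for illustration only:

```python
# Baselines recorded at commissioning or after the last approved change
# (hypothetical values).
approved_baselines = {
    "boiler_plc": "0x5A3C91F4",
    "feedwater_plc": "0x19AB77C0",
}

def audit_logic(current_checksums: dict) -> list:
    """Return the controllers whose current program checksum no longer
    matches the approved baseline, i.e. whose logic has been modified."""
    return [
        name
        for name, checksum in current_checksums.items()
        if checksum != approved_baselines.get(name)
    ]

# boiler_plc unchanged, feedwater_plc modified since the last approval:
readings = {"boiler_plc": "0x5A3C91F4", "feedwater_plc": "0xDEADBEEF"}
print(audit_logic(readings))  # -> ['feedwater_plc']
```

The whole mechanism is a lookup and a comparison; the only engineering effort is recording the baseline whenever a change is approved.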
Why, then, are most companies implementing complex and very expensive solutions when this can be done very simply and at no cost to the organization?
Understanding Confidentiality.
The confidentiality of information is another aspect that is often misinterpreted when applied to plant-floor security, and more specifically to industrial control systems.
Industrial systems are not subject to habeas data obligations. They do not store personal information or credit cards; they do not even hold the date of birth of the person logging in, let alone their family tree or home address. Such data are completely irrelevant here.
Typically, during a plant operating shift, the operators on duty are jointly and severally responsible. Pedro does not have to watch over José, his shift companion, as if he distrusted him.
We can say that in the vast majority of industrial control systems (DCS, PLC, SIS, ESD, etc.), confidentiality is usually not a concern for the plant. These systems do not store historical information. HMIs keep a recent history to display trend charts with information useful for operational purposes, and operations and production management systems typically implement plant-wide historians (PWH). But controllers do not retain historical information; they use instantaneous readings from pressure, flow, and temperature transmitters. An instant later, the tank temperature from a fraction of a second earlier is usually completely irrelevant, and its possible alteration has no impact on the control of the industrial plant.
Conclusion
We see companies spending fortunes on systems that monitor industrial environments incorrectly, and worse, insisting on continuing to monitor them in a perfectly wrong way. The industry is being sold the wrong rationale and pushed into implementing solutions that do not help.
An old Japanese proverb says: when you are traveling on the wrong train, get off at the nearest station and go back. The longer you delay that decision, the harder the return trip becomes.
Most companies are only depleting their budget, implementing complex and expensive IDS systems without real knowledge of the plant, and even more complex OT SOCs, instead of prioritizing correctly and mitigating risk in a practical, realistic, effective, durable, efficient, and far more economical way.
Don't forget to subscribe to OT Connect Newsletter - The News That Matters: a good balance of informative, valuable content and solutions, with less than 20% marketing content.