Development process for safety-critical systems

Written by Maciej Gajdzica

Senior Software Developer with unique experience in building life-and-death systems.

Therac-25 case – the lesson learned.

Therac-25 was a radiotherapy machine from the 1980s, of which eleven units were installed in hospitals across the USA and Canada. It has earned a reputation as the most infamous case of software errors leading to patient deaths. Between 1985 and 1987 it was involved in six known accidents in which patients received massive radiation overdoses, several of them fatal.

The court-appointed experts found the exact lines of code responsible for the accidents, yet they did not stop there: they concluded that the flaws in the code stemmed directly from flaws in the software development process. Further investigation revealed that all of the code had been written by one person, with no verification, validation, or review by other developers. The process also lacked basic documentation, such as requirements and an architecture specification.

These were not the only problems. A faulty risk assessment had led the manufacturer to forgo the mechanical safety measures that would have prevented excessive radiation. No tests were performed before the device was put into clinical use. Reports of malfunctions were dismissed by the manufacturer, and the hospital staff were blamed instead. In the end, the manufacturer was forced to pay huge fines, lost its reputation, and had to withdraw from the medical field.

The Therac-25 case was an important lesson for all developers working on life-and-death systems. We learned that relying on the skill and experience of individual team members is not enough to provide a sufficient level of safety. Everyone makes mistakes.

A safety net against programming mistakes

The software development process must therefore act as a safety net against programming mistakes. Such mistakes are bound to occur, but they need not endanger the whole project: a well-organized team can catch and remove them before any faulty code reaches the final product.

This safety net catches mistakes and errors at different stages and does not let them slip through to the final product. The process is commonly depicted as the V-model and is reflected in standards such as IEC 62304 (medical), DO-178C (aviation), and ISO 26262 (automotive).


The V-model consists of three parts – Design, Implementation, and Verification. During the design stage, a general concept of the system is developed, requirements are listed, the system architecture is defined, and the detailed behavior of each module is specified. The implementation stage turns the design into source code. Finally, verification involves tests at several levels (unit, integration, and system testing), followed by certification, in which an independent body reviews the project's compliance with the standards.

The model gets its name from its characteristic graphic representation, the shape of the letter "V", where each element on the right verifies the final product's conformance with the corresponding part of the design stage on the left. In this scheme, implementation details are checked by unit tests, the cooperation between modules is checked by integration tests, and requirements are checked by system and user acceptance tests.
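The left-to-right pairing described above can be illustrated with a small traceability check. This is only a minimal sketch: the stage names, the `DesignItem` type, the `unverified_items` helper, and the IDs such as `REQ-1` are hypothetical and do not come from any standard or tool.

```python
from dataclasses import dataclass, field

# Hypothetical pairing of design stages (left arm of the V) with the test
# level (right arm) that verifies each of them, as described in the text.
V_MODEL_PAIRS = {
    "requirements": "acceptance",
    "architecture": "integration",
    "module_design": "unit",
}


@dataclass
class DesignItem:
    item_id: str
    stage: str  # one of the keys of V_MODEL_PAIRS
    # List of (test_level, test_id) tuples linked to this design item.
    verified_by: list = field(default_factory=list)


def unverified_items(items):
    """Return IDs of design items that lack a test at the matching level."""
    missing = []
    for item in items:
        required_level = V_MODEL_PAIRS[item.stage]
        linked_levels = {level for level, _ in item.verified_by}
        if required_level not in linked_levels:
            missing.append(item.item_id)
    return missing


items = [
    DesignItem("REQ-1", "requirements", [("acceptance", "AT-7")]),
    DesignItem("ARCH-3", "architecture", [("unit", "UT-12")]),  # wrong level
    DesignItem("MOD-9", "module_design", []),                   # no tests
]
print(unverified_items(items))  # ['ARCH-3', 'MOD-9']
```

In a real project this kind of check would typically be performed by a requirements management tool rather than a hand-written script; the point is only that every element on the left of the V should have a counterpart on the right.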

Safety-critical software documentation

It seems obvious that such a process should be complemented with extensive documentation. At each stage of design and verification several documents are created: plans, specifications, reports, risk assessments, etc. Preparing these documents is a crucial step, necessary for certifying the product and finalizing the project.

The V-model clearly resembles the Waterfall model, which has been considered inefficient for the last twenty years. The Big Design Up Front approach leads to projects that drag on endlessly, fails to anticipate many problems, and makes introducing changes tedious. As a result, the quality of a system produced this way is hardly satisfactory. Why, then, should the most sensitive systems be developed this way?

In reality, things look a bit different. As the medical standard IEC 62304 accurately puts it:

"It does not require that any particular life-cycle model is used, but it does require that the plan include certain ACTIVITIES and have certain ATTRIBUTES."

In practice, the standard does not impose any particular approach to the product life cycle. Instead, it grants us considerable freedom in the matter, provided that we carry out certain activities and produce the required documents along the way. It is therefore much more common to develop the product in iterations, while compliance with the standards is verified at the certification stage.
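The idea of iterating freely while still collecting the required evidence can be sketched as a simple coverage check. The activity names in `REQUIRED_ACTIVITIES`, the `missing_evidence` helper, and the document IDs are hypothetical and chosen for illustration only; the actual list of activities has to come from the applicable standard and the project's own plan.

```python
# Hypothetical activity names for illustration only; they are NOT quoted
# from IEC 62304 or any other standard.
REQUIRED_ACTIVITIES = {
    "planning",
    "requirements_analysis",
    "architecture_design",
    "unit_verification",
    "integration_testing",
    "system_testing",
    "release",
}


def missing_evidence(iterations):
    """Return required activities with no document in any iteration.

    `iterations` is a list of dicts mapping an activity name to the list
    of document IDs produced for that activity during the iteration.
    """
    covered = set()
    for iteration in iterations:
        for activity, documents in iteration.items():
            if documents:  # an empty list means no evidence was produced
                covered.add(activity)
    return sorted(REQUIRED_ACTIVITIES - covered)


iterations = [
    {"planning": ["PLAN-1"], "requirements_analysis": ["SRS-1"]},
    {
        "architecture_design": ["SAD-1"],
        "unit_verification": [],
        "integration_testing": ["ITR-1"],
    },
]
print(missing_evidence(iterations))
# ['release', 'system_testing', 'unit_verification']
```

Running such a check at the end of every iteration, rather than only before certification, is one way to keep an iterative process from drifting away from the documentation the certifying body will expect.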

 

(Originally published at LinkedIn: www.linkedin.com/pulse/development-process-safety-critical-systems-maciej-gajdzica)


