What we have learned from building safety-critical systems

Posted: 2020-12-16

Written by Joanna Leska & Magda Dąbrowska (#mktteam)

During our last webinar with SoDA, Piotr and Maciej, our safety-critical experts, talked about what we have learned from creating software that human life depends on. They shared best practices from developing safety-critical systems that can be useful in creating any kind of software.

Watch the recording or read the article below.

Hi, I am Piotr Strzałkowski.

As I could not be present at the webinar today, we decided to make a recording for you. It presents our journey from regular software to systems that human life relies on, along with valuable tips you can apply when delivering regular software.

Standards and procedures in our first highly safety-critical project.

Piotr Strzałkowski: Our very first safety-critical project was a challenge. Although I had worked with standards before, developing software for safety-critical systems is very demanding because we have to meet the standards of a given industry, for example railway, automotive or aviation.

These standards do not fully describe how we should work. They describe processes and certain metrics, but they give no detailed guidance on how to act.

When creating regular embedded software, we do not have to meet such standards or collect certain metrics (we can, but we do not have to). In safety-critical projects, both are required.

Every process must be accurately described. My colleagues and I sometimes joked that we had 100 lines of documentation for one line of code. That is how these projects are, and you have to accept it.

It is also necessary to introduce various testing processes at different levels, ranging from the lowest test level, i.e. unit tests, to integration tests between modules, then hardware integration tests, then even higher functional tests, and so on.

Therefore, the entire team must be trained and familiar with the standards, and must be aware of how to code, what steps to take, and know the tools that we are going to use.

It looks like the standards and procedures are cool!

Maciej Gajdzica: Standards and processes can seem overwhelming. However, when we later returned to the kind of projects we had done before the safety-critical ones, we proposed similar solutions, for example unit tests, static analysis or tracking cyclomatic complexity.

If it was possible, we wanted to introduce such elements because we thought it was worth it.

It may seem that all these restrictions, certifications, standards, documentation, tests, additional tools are just a burden and slow down the project, and it is not worth using them in regular projects.

It might not be worth copying these practices 100%, but some elements proved very useful. When we worked on projects that did not have such safety standards and restrictive requirements, we simply wanted to keep using proven working methods.

I remember the first project we did together. At the beginning there was uncertainty, because safety-critical systems come with a lot of additional restrictions and a lot of additional things that we have to do.

None of us felt confident, and we could not yet tell whether these various techniques actually worked, whether they would help us, and whether we would work well with them.

The turning point that convinced me of this process was a task I had to perform: I was responsible for communication between the processors. The system contained several processors and this communication was quite complicated.

I was supposed to implement communication on one of these processors and then we had to perform integration with other fragments of the code that were written by other teams.

In my previous, non-critical projects, where we didn’t have this level of test documentation and the like, I remember integration as a really tough time.

We had to bring together many different experts and run long tests, some of them manual. When something broke, usually one person was fixing it while the rest, responsible for other parts of the system, had nothing to do. It all took a long time.

In this project it was completely different: first we wrote the documentation, then we planned on paper what this communication should look like, and we analyzed all the cases.

Then I implemented my part, ran unit tests and checked that the part described as my responsibility worked well. Next I ran tests with input data from other parts of the system. When we got to the actual integration, it turned out that after a few small initial problems and misunderstandings, we were able to get to the end result very quickly.

I was surprised that it went so fast. It seemed to me that something must be wrong, because I had some experience of how long this usually takes, and it felt strange that it was so easy.

It was this event that made me believe that these procedures are actually good and can actually help. They are not there just to satisfy formalities; they bring tangible benefits.

This feeling grew even stronger in a second situation in this project, when we were preparing a demo version.

The project was at an early stage of development. There were two processors that were supposed to do the same thing, and two teams were implementing the same functionality. Our team focused on the process. We chose an iterative path: we mixed development, tests and various types of checks.

The second team followed the more classic way that we knew from earlier projects: first they developed a bigger part, and then they performed tests and worked on quality, all in large blocks.

At first our team felt quite uneasy, because it seemed that we were going slower and that the other team had better results.

On the other hand, we knew that from an engineering point of view everything was in order, the tests were passing, and we felt that the work was well done.

Afterwards it turned out that we were right. We had already solved the majority of the problems that the other team was not even aware of, because they had postponed the tests and tools until later and had to work with larger blocks. In the end my team caught up with them.

It was additional proof for me that this road, which requires more effort from us as well as additional tools and practices, makes sense and brings real benefits. We can also express those benefits in numbers, so they are real and not just based on feelings.

Piotr, what do you think about it?

Piotr Strzałkowski: Well, it was quite a lesson. Although I had already worked with standards, this project was challenging, as the restrictions were really tight and no working guidelines were given.

The standards stated the expected code quality and suggested some metrics, but the decision on which methods, metrics and tools to use was on our side. The most important question turned out to be: what does “quality code” actually mean?

The most obvious answer is that good code is well-formatted code written in the proper language. This is not the whole truth.

Quality code should be testable, maintainable, reliable and portable. By portable I mean that it is possible to transfer the code to other platforms or to other projects.

We should all love metrics.

Piotr Strzałkowski: The code should be readable, divided into modules, and should have properly defined interfaces. These features also have their metrics, such as cyclomatic complexity and Halstead complexity, or basic ones such as the number of bugs or the number of lines of code.

On the basis of these metrics we can define what the quality of our code is.
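
To make one of these metrics concrete, here is a minimal sketch of how cyclomatic complexity can be approximated: one plus the number of decision points in the code. It is an illustration only, not the tooling used in the project described here. It analyzes Python source with the standard `ast` module, and the choice of which nodes count as decision points, as well as the `classify` sample, are assumptions made for this example. Real analyzers for C or C++ apply their own, more precise rules.

```python
import ast

# Node types counted as one decision point each.
# This selection is a simplifying assumption for the sketch.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" has three operands and adds two extra paths.
            decisions += len(node.values) - 1
    return 1 + decisions


# Hypothetical function used only to demonstrate the count.
SAMPLE = '''
def classify(speed, limit):
    if speed < 0:
        return "invalid"
    if speed > limit and limit > 0:
        return "too fast"
    return "ok"
'''

print(cyclomatic_complexity(SAMPLE))  # two ifs + one "and" + 1 = 4
```

The higher the number, the more independent paths a function has, and the more test cases are needed to cover them.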

To acquire the metrics we must apply appropriate checks, such as static code analysis, code review or unit tests. Thanks to these we get metrics such as line or function coverage.

Based on this data, and by combining it with other data from the project, we can, let’s say, “profile” our project: we know that there may be potential problems even before the team tells us.

Looking at these metrics, we can watch particular modules or parts of the code and see problems coming, and then move part of the team to work on that module.
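
As a rough illustration of this kind of profiling, the sketch below combines three metrics per module and lists the modules that break at least one limit. The module names, the numbers and the limits are all hypothetical values chosen for the example, not data from the project.

```python
from dataclasses import dataclass


@dataclass
class ModuleMetrics:
    name: str
    max_complexity: int   # highest cyclomatic complexity of any function
    line_coverage: float  # fraction of lines exercised by unit tests, 0.0 to 1.0
    open_bugs: int


def modules_needing_attention(modules, max_complexity=10,
                              min_coverage=0.8, max_bugs=3):
    """Return (name, reasons) for each module that breaks at least one limit.

    The default limits are illustrative assumptions; each project
    should choose its own.
    """
    flagged = []
    for m in modules:
        reasons = []
        if m.max_complexity > max_complexity:
            reasons.append("complexity")
        if m.line_coverage < min_coverage:
            reasons.append("coverage")
        if m.open_bugs > max_bugs:
            reasons.append("bugs")
        if reasons:
            flagged.append((m.name, reasons))
    return flagged


# Hypothetical project snapshot.
SAMPLE_MODULES = [
    ModuleMetrics("ipc", 18, 0.65, 5),
    ModuleMetrics("logger", 4, 0.92, 0),
    ModuleMetrics("scheduler", 9, 0.71, 1),
]

for name, reasons in modules_needing_attention(SAMPLE_MODULES):
    print(name, reasons)
```

A report like this does not say what is wrong with a module, only where it is worth looking first.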

We can also present these metrics to the team for discussion and brainstorming, and make decisions together on how to fix certain problems.

Based on such metrics and information, the leader may notice inconsistencies and problems, architectural or related to the testing process, without looking into the code.

This lets the leader make better, informed decisions.

Maciej Gajdzica: Sometimes we look at these metrics and see some deviations from the norm, and then we look at the code and everything seems fine, and we have no idea how we could improve it.

Still, these metrics indicate places we should look at. Sometimes we intuitively know that something is wrong. This is the so-called “code smell”, and it is a nice metaphor: we know that something stinks here, we don’t know what yet, but we have to find it.

These metrics are a signpost, and they still need to be interpreted.

Sometimes it even happens that the metric shows something but we know this should stay as it is. In this case we should describe it as an exception to the rule and justify why we think that we can exceed this metric here.
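
One possible way to handle such exceptions is to record them explicitly, so that a metric check can tell a justified deviation from an unexplained one. The sketch below is an assumption about how this could look, not a description of the process used in the project. The `Deviation` record, the function names and the limit of 10 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    function: str
    metric: str
    justification: str  # must be non-empty to be accepted


def check_limit(function, metric, value, limit, deviations):
    """Return 'pass', 'deviation' (limit exceeded but justified) or 'fail'."""
    if value <= limit:
        return "pass"
    for d in deviations:
        if (d.function == function and d.metric == metric
                and d.justification.strip()):
            return "deviation"
    return "fail"


# Hypothetical list of agreed exceptions.
DEVIATIONS = [
    Deviation(
        "parse_frame",
        "cyclomatic_complexity",
        "Flat switch over message types; splitting it would hurt readability.",
    ),
]

print(check_limit("parse_frame", "cyclomatic_complexity", 14, 10, DEVIATIONS))
print(check_limit("send_frame", "cyclomatic_complexity", 14, 10, DEVIATIONS))
```

Here `parse_frame` is reported as a documented deviation, while `send_frame`, which exceeds the same limit without a recorded justification, fails the check.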

Piotr Strzałkowski: It is also great for a leader to have data to base decisions on. It’s good to be able to present things to the boss on the basis of numbers and values.

The leader doesn’t wander around in a fog, but has precise data to discuss with the manager and with the team, to base decisions on, and to combine and profile with, for example, the number of bugs and data from Git.

This tells the leader where the problem is: in the review process, in the testing process or in the code quality.

Thanks to this, we can make reliable and informed decisions, and clearly present them to the client or to managers.

Maciej Gajdzica: Metrics also open up discussion and make the team more involved. First, because we all get the feeling of an engineering job well done, of relying on data and not on things that cannot be fully specified or explained. We also feel that we are heard, that our ideas are analyzed and the best ones make it into the project. It seems to me that the atmosphere is better too, and that also drives quality.

Piotr Strzałkowski: At the same time, you have to remember that Rome was not built in a day, and the selection of metrics… not all our initial assumptions are right.

But this is what the iterative process is for. We can check, let’s say at each sprint, the values of a given metric in a given project, because it is possible that a metric just will not work and we need to replace it with another.

There are a lot of available metrics that we can apply, such as those from automotive standards, for example, or those used in other projects. There is plenty to choose from. It is worth considering the options and checking the available tools to find the ones that would best support us and our project.

Another important lesson concerns the way we make decisions in the team, whether related to the architecture, the design or the project itself. Our experience shows that authoritarian decision-making by one architect does not fully work. Some technical solutions may end up, for example, untested, or the involvement of other team members can drop drastically.

In brainstorming sessions, people with varied experience can exchange knowledge. I think that the final decision should be made by one person, but it should be based on information from the team members. Such an approach engages team members, opens new doors that even an experienced developer may not always see, and brings a fresh perspective on certain aspects of the project.

At the same time, from the leader’s point of view, it creates a team, not a group of separate people and professionals. A bond forms, and a real team that works together is born.

Another important topic to discuss is the tools.

Free of charge tools do not exist.

Maciej Gajdzica: Tools are also often recommended by team members during project retrospectives.

You have to make informed choices, because in safety-critical systems the tools may be very expensive and highly advanced.

Still, they do not do all the work for us. It is easy to fall into the trap of thinking that once you buy an expensive tool, everything is taken care of. Actually, it isn’t.

Piotr Strzałkowski: That is true. One of the common mistakes is not accounting for the time needed to train the team on the tools, so that they learn how to operate them and configure them to suit their needs.

No one needs a tool that is badly configured and provides incorrect metrics.

Such metrics mislead us and cause a mess in the project. We make wrong decisions and the problems grow.

Maciej Gajdzica: Let me give the example of an operating system that was used in one of our safety-critical projects. We could use the code of this operating system and include it in our own code, but it still had to be integrated properly, that is, in a way compliant with the specific standards.

This showed that free software doesn’t really exist. Of course, it does not mean that we should only use tools such as Polyspace or VectorCAST, which cost tens of thousands of euros.

If possible, you can adopt free tools if they suit the team.

We have to bear in mind that the choice is not only between expensive, “self-serviced” tools that take some responsibility off the team and free-of-charge tools that only seem to be free.

You cannot just copy everything.

After some time in the project, we had built a whole system of connected, cooperating tools.

And then we thought that we had a great, ready-made solution: all we had to do was copy it to other projects, and it would work in different circumstances.

Piotr Strzałkowski: The reality was quite brutal: it turned out that not everything fits together and not everything scales so easily. So you can’t transfer tools 1:1 to other projects.

Some have to be abandoned, some must be adjusted, and sometimes metrics do not match. You really need to see what fits a given project and iteratively adapt these tools to the new project.

Maciej Gajdzica: And this is a very good topic, working iteratively, because you learn over time which tools you need and what they are used for.

A new project is totally different, and often different people work on it. Even if we know what we want to achieve, we can’t put it all in place at once, because we may just get stuck. Introducing tools iteratively is what will actually have an impact on the quality of the code.

For example, you should not try to automate everything at once, but build up experience step by step with continuous integration, server configuration, nightly tests or hardware testing.

If we tried to do everything at once at the beginning of a new project, it would just not work, or the project would take a different path and the work we had done would turn out to be unnecessary.

Piotr Strzałkowski: Sure, there are key elements to be done at the very beginning, but with some of them we should wait and see what is needed as the project develops.

You know, sometimes you think you will start with the C language, but then you end up with C++. Sometimes the functionality changes and you need to adjust the tools, or the customer changes the requirements.

So, there are some elements that can be easily transferred and rescaled, and with some, you have to hold back to see how the project develops.

From the customer’s perspective, all software is critical.

Maciej Gajdzica: For project owners and clients, each project they are working on, or paying for, is critical.

On the other hand, for us as engineers, it is always important to do a good job, in accordance with good engineering practice, and to do the best we can.

These are two forces that actually work in the same direction. We can help meet the expectations by using purely engineering techniques, but also techniques related to how the team works, based on our experience gained in safety-critical systems.

There is one element common to both worlds: the software development process.

In fact, we always want to have well-tested software. By working on this from an early stage of the project and combining development tasks with quality testing, we are able to get benefits quickly.

Elements such as planning tests at various levels, continuous integration or static code analysis are valuable and helpful in every kind of project.

Piotr Strzałkowski: From my point of view it is crucial to define the basic set of metrics as soon as possible.

In this way, from the very beginning of the project, we can see how it is developing, what is happening in it, what the problems are and in which areas.

To summarize: the most important things are defining the metrics, measuring their values, educating the team, and explaining what they should pay attention to and why we are doing this.

Maciej Gajdzica: On the other hand, from the point of view of the team, engaging people in decision making has a real impact on the project, and allows them to feel that they are not just working according to someone else’s vision but are genuinely part of it.

This generates greater commitment and it can be seen in practically all the activities.

Piotr Strzałkowski: I agree with you because everyone wants to do their job in the best way and be proud of the work.

Maciej Gajdzica – Senior Software Developer, Embedded Systems Developer at Solwit SA, specializing in safety-critical systems. He has worked on, among other things, a train traffic control system, and has recently been developing software for the medical industry. A promoter of good practices, especially TDD, in the embedded industry. In his free time, he builds a Micromouse robot that finds its way through a maze and describes the process on his blog http://ucgosu.pl/
Piotr Strzałkowski – Embedded Domain Manager at Solwit S.A. Engineer with thirteen years of experience in the embedded systems industry, specializing in automotive and safety-critical projects, and at the same time a fan of RC modeling and mechanical tuning of all kinds of vehicles.