Safera

Post-mortem on the thermocouple sensor solution project

Introduction

Not every project works out as intended. Sometimes projects simply fail to produce results, both in the world of academia and in more practical professional applications. To quote the words of Adam Savage, “Failure is always an option.” At Defcon 17, he gave a talk about failure and how learning from it led to him being who he is today. While it’d be nice to simply pin the failure of the project on something as simple as Murphy’s Law, it’d rob us of the chance to learn.

The guidelines for the Protocamp project state that the project website is intended to be positive and to serve as an advertisement for the product the team has developed. Regrettably, this post is neither. To be frank, it’s hard to have a positive outlook on your project when you show your results only to be asked “Is this what you spent three months on?” Trying to find an answer is not a pleasant experience.

In the context of projects, a post-mortem report simply refers to documenting the project, its background, goals, phases, setbacks and successes so that this knowledge can be utilized in the future. That is what I’m hoping this post will achieve. In other disciplines a post-mortem study refers to knowledge learned from the dead, a fitting if unnecessarily grim metaphor for this project. I hope that the record of our failures will help others avoid the landmines we stepped on.

Background

Safera is a Finnish company specializing in kitchen safety solutions. They have had considerable success with their products, which are available both in stand-alone form and integrated into fume hoods. The product uses a variety of sensors to determine whether a situation is safe or whether action should be taken. Depending on the version, the action taken can range from something as simple as alerting the user all the way to cutting power to the stove.

In order for the product to accurately determine whether a situation is safe, Safera requires a variety of data sets from normal and abnormal situations. To fulfill this need they have built a lab space for this exact purpose. Up until now, they have used commercial thermocouple loggers to take temperature measurements. Many of the commercial models used are, however, quite flawed. In addition to issues with drivers and their relatively high price, their closed-source nature makes them difficult to repair in case of malfunction. Several models also have small design flaws that can lead to inaccurate measurements due to poor placement of the sensor used for cold-side compensation. For this reason, they chose to sponsor a Protocamp team in the hope that the team would develop a more cost-effective and accurate thermocouple sensor solution.

What we did

As grim as the introduction might make it out to be, the project wasn’t a complete failure. Our initial plan involved a complex setup in which a Raspberry Pi acted as a central sensor hub, connected to the computer over an Ethernet cable using a TCP server/client architecture. Fairly early on, however, while we were still researching how to accomplish the various goals, we came to the conclusion that this was simply not viable given the short time frame of the project. This was the first major restructuring of the project, and it was done after several meetings with the Safera representative.

After this initial stage of the project we split into three work package groups. I worked on the data logger software, Jyri worked on hardware, and Anton and Vladimir worked on incorporating additional sensors. Much of the work at this stage of the project was done individually, with most communication happening on Telegram and in weekly meetings. It wasn’t ideal, but it worked well enough at the time.

On the software front, research was focused on how to implement an easy-to-use user interface. Because of the choice to use Python, the options were narrowed down to Qt and Tkinter. Due to a misunderstanding regarding licensing, particularly clauses in the GPL, I chose to use Tkinter. This was, however, something of a mistake. While the user interface is functional, it has significant room for improvement. The software is split into several components. The largest of them is the App class, which contains all of the UI functionality. Its initializer defines the elements of the UI and the functions bound to them. The second component handles thermocouple conversions and is based around an open source library. The third component contains simple functions for writing into files. More detail can be found in the report.
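
To illustrate how the conversion and file-writing components can be kept separate from the UI, here is a minimal sketch. It is not the project's actual code: the function names are hypothetical, and the conversion assumes a rough linear sensitivity of about 41 microvolts per degree Celsius for a K-type thermocouple, whereas the open source library mentioned above presumably uses the standard reference tables.

```python
import csv
from datetime import datetime, timezone

# ASSUMPTION: a K-type thermocouple produces roughly 41 microvolts per degree
# Celsius. This linear figure is only a rough approximation; accurate work
# uses the standard reference polynomials instead.
K_TYPE_VOLTS_PER_DEGREE = 41e-6


def k_type_temperature(measured_volts, cold_junction_celsius):
    """Estimate the hot-junction temperature in degrees Celsius.

    Hypothetical helper: the measured voltage only reflects the difference
    between the hot and cold junctions, so the cold-junction temperature is
    added back (cold-side compensation).
    """
    return cold_junction_celsius + measured_volts / K_TYPE_VOLTS_PER_DEGREE


def append_reading(path, celsius, timestamp=None):
    """Append one timestamped reading to a CSV file (hypothetical helper)."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow([timestamp.isoformat(), f"{celsius:.2f}"])
```

Because neither function touches Tkinter, both can be tested and reused without starting the user interface.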

On the hardware side, there was much to learn regarding circuit board design and manufacturing as well as which components were required and for what purpose. Based on this research, Jyri selected the components that were used and decided what went on the board beyond the socket, ADC and connectors. The socket used is a standardized plug for K-type thermocouples, allowing the thermocouples themselves to be replaced easily should the wires become damaged. In addition to the high-precision, low-noise ADC, the circuit board contains a filter. The initial prototype version of the board is intended to function more as a proof of concept, meaning it is intended to be used with an Arduino or another similar microcontroller. Later versions were intended to include several ADCs and thermocouple connectors, all connected to an Atmel microcontroller on the board.

At the beginning of the project it was proposed that several other sensors be integrated into the sensor package used during test measurements. For example, initial plans included an air particle count sensor and a video camera, among other possible options. The third work package of the project consisted of researching and testing how these could be used in the Safera lab.

While we worked hard, the end results were not perfect by any stretch of the imagination. The question we therefore have to ask is what went wrong and what we can do to avoid the same mistakes in the future.

What went wrong

Based on the tone of the post, it’s quite clear things did not go well. When this happens to a project, it is important to consider the potential reasons so that the same pitfalls may be avoided. It is quite likely this list is not comprehensive, but these are things that jump out in retrospect.

Although initially we were quite confident in our chances of success, things did not go so well in the end. For one, the initially agreed timetable went out the window almost immediately. While progress was made, it was far slower than the timetable assumed. There is no one simple core reason, nor can the blame be placed on any single individual’s shoulders. During meetings, we recognized these issues with deadlines. We narrowed the scope and cut out less necessary features in an effort to meet deadlines. This helped, but it was a band-aid on a flesh wound. The delays, however, weren’t what did the project in. They simply exacerbated other problems into insurmountable obstacles.

What caused the most issues and led to the eventual failure of the project was the analog-to-digital converter. The AD7793 used in the thermocouple circuit proved to be quite challenging to use. Reading from the ADC was complicated, and the process of communicating with the chip was opaque and poorly documented. What libraries existed were bare frameworks, so we were left writing into registers a byte at a time. Because this issue was encountered at such a late stage, there simply was not enough time to produce a second prototype utilizing a different ADC. A simple setback became something that could not be fixed, simply because there was not enough time.
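
For readers unfamiliar with this style of chip, the sketch below shows the general shape of byte-at-a-time register access. It is an illustration, not the code the project used. The register addresses, register widths and command byte layout are taken from my reading of the AD7793 datasheet and should be verified against it before use, and `transfer` is a hypothetical stand-in for whatever full-duplex SPI function the host platform provides.

```python
# ASSUMPTION: register addresses and widths as I read them from the AD7793
# datasheet. Verify against the datasheet before relying on them.
REG_STATUS = 0  # 8-bit, read only
REG_MODE = 1    # 16-bit
REG_CONFIG = 2  # 16-bit
REG_DATA = 3    # 24-bit on the AD7793
REG_ID = 4      # 8-bit

REGISTER_BYTES = {REG_STATUS: 1, REG_MODE: 2, REG_CONFIG: 2, REG_DATA: 3, REG_ID: 1}


def comm_byte(register, read):
    """Build the byte written to the communications register.

    Assumed layout: bit 6 selects read (1) or write (0) and bits 5..3 hold
    the address of the register accessed next. All other bits stay 0.
    """
    if not 0 <= register <= 7:
        raise ValueError("register address must fit in three bits")
    return (0x40 if read else 0x00) | (register << 3)


def read_register(transfer, register):
    """Read one register through `transfer` (hypothetical SPI function).

    `transfer` takes a list of bytes to send and returns the bytes received
    during the same clock cycles, so the first reply byte is discarded.
    """
    size = REGISTER_BYTES[register]
    reply = transfer([comm_byte(register, read=True)] + [0x00] * size)
    value = 0
    for byte in reply[1:]:
        value = (value << 8) | byte  # most significant byte first
    return value


def bipolar_code_to_volts(code, vref, gain):
    """Convert a 24-bit bipolar-mode code to input volts (datasheet formula)."""
    return (code / 2**23 - 1.0) * vref / gain
```

Passing the SPI function in as an argument means the register logic can be exercised on a desk with a fake `transfer` long before the board, microcontroller and software are integrated.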

Another issue was that it proved surprisingly hard for team members to help others with their work packages. Because everyone focused on their own aspect of the project, anyone trying to help came in with baseline knowledge when the issue was related to something far more specific. This proved to be a surprisingly serious issue, as it became clear that team members could not properly help each other even if they wanted to. Taking the time to bring someone up to date would’ve simply caused further delays, making the issues related to deadlines worse. A part of this was due to the lack of a central knowledge base where members of the group could quickly catch up on aspects of other work packages. Essentially, it was a failure in communication which made teamwork far more difficult than it should’ve been.

Further issues were caused by the fact that the initial research simply proved insufficient. Several parts of each work package were largely overlooked until they became a problem. For example, the software work package hit a dead end for nearly a week as it became clear that the use of pipes, which had seemed the “obvious” choice, was in fact completely unsuited for the project. Furthermore, it became clear that the very architecture of the software could’ve been significantly improved by using Python’s asynchronous IO functionality, yet by the time this became clear there was no time to utilize the knowledge. While the code functions, it is also clear it could’ve simply been better had these things been known earlier in the development process.
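
As a rough idea of what an asynchronous structure can look like, the sketch below uses the standard `asyncio` module to run a sensor reader and a logger side by side, connected by a queue rather than a pipe. It is a minimal example under stated assumptions, not the project's code: all function names are hypothetical, `read_fn` stands in for whatever actually reads the sensor, and integrating this with a Tkinter event loop would take additional work not shown here.

```python
import asyncio


async def read_sensor(queue, read_fn, samples, interval):
    """Poll `read_fn` (hypothetical sensor read) and queue each reading."""
    for _ in range(samples):
        await queue.put(read_fn())
        await asyncio.sleep(interval)  # yields control to other tasks
    await queue.put(None)  # sentinel: no more readings


async def log_readings(queue, sink):
    """Consume readings from the queue until the sentinel arrives."""
    while True:
        reading = await queue.get()
        if reading is None:
            break
        sink.append(reading)


async def run_logger(read_fn, samples=3, interval=0.01):
    """Run the reader and the logger concurrently and return what was logged."""
    queue = asyncio.Queue()
    sink = []
    await asyncio.gather(
        read_sensor(queue, read_fn, samples, interval),
        log_readings(queue, sink),
    )
    return sink
```

A call such as `asyncio.run(run_logger(my_read_function))` would collect three readings, where `my_read_function` is whatever callable returns the current temperature.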

On the hardware side, insufficient research had been done on communication with the ADC. The SPI bus that caused so many issues should’ve been researched at the start. Instead, the issues with it only became clear when it came time to integrate the circuit board, microcontroller and software. Even as we spoke about our project during the gala, defeated and frankly embarrassed, it was clear that if we’d simply had one more week we could have finished the project. If we’d been more thorough with our research, there is a fair chance we would’ve had the knowledge required to overcome this obstacle and finish the project.

A more minor issue, more something that should’ve been considered but never was, was the development model utilized. The project began with a plan. The plan was examined, revised and then accepted. Then individual segments were worked on, and only at the end were they set to be combined before the presentation. Instead of utilizing this waterfall model where each member of the team simply worked on their own aspect come hell or high water, wouldn’t the time constraints have called for a different approach? What if, for example, the project had instead been split into more discrete actions utilizing a Scrum model with short two-week sprints? Would everyone working as a group or in pairs to finish a feature have allowed us to finish in time? The suitability of a development model for the purposes of the project is quite important, yet it was never something we even spoke about. Only in hindsight did I realize this is something we could’ve discussed.

Another mistake was one brought up by the Safera representative. Even as we were struggling to complete the project in time, we simply did not think of asking for help. During a meeting with the company at the beginning of the project they even volunteered to assist should the project hit a brick wall. Yet when faced with a problem we could not overcome in nearly a week of work, why did we not ask for help? They mentioned that it was an issue they could’ve likely solved within a day, given their experience from working with similar hardware. During such a short project it is important to make use of every resource, and not making use of the offered help was almost criminally negligent.

Reflecting on what went wrong

When reflecting on a project, it is important to realize that hindsight is 20/20. The failure of a project can never truly be blamed on a single factor or a single individual. While it is rather disheartening to list your failures and to dwell on them, it is a part of the learning process. After establishing what went wrong, it is important to consider why these things went wrong. By going beyond simply saying “We shouldn’t do this again” and examining how to avoid the circumstances that led to the error in the first place, we can better avoid repeating past mistakes.

It is easy to simply state that the project was late meeting deadlines; it is much harder to say why. There are numerous reasons why a project does not meet deadlines. In a way, the delays are a microcosm of everything that went wrong: poor resource management, flawed and naive planning, and issues with motivation and communication. They were more a symptom than a cause, ranging from lengthy delays due to insufficient research to problems that could’ve been solved simply by asking for help. If there is one thing I personally can say about it, I would use delays in a project as a canary in a coal mine, a sign that there is something that should be addressed.

Mental fatigue and motivation are two of the hardest factors to quantify, yet they are quite important. Especially in a project with a short turnaround time, it can be quite hard to find the time to reset and get your head back in the game. The fact that you are often beholden to another party if things do not go right simply makes matters worse. The stress can build up until it becomes hard to focus. This can lead, and in the case of this project did lead, to a situation where the stress causes difficulty focusing and delays, which only winds up making the stress worse. How people deal with stress is highly individual, and in many ways effectively de-stressing is a learned skill. In the case of this project, talking about it openly could’ve helped solve issues. A potential solution is to hold non-project-related relaxation and team-building events. You see this in work environments and in some professional settings, and it is something that could be considered for projects similar to those of this course. If nothing else, we didn’t do anything regarding this issue and look where we are now.

The issues with the research performed towards the start of the course are at least in my opinion largely down to inexperience. Because of our lack of experience, we simply did not know what we did not know. With more practice, we would’ve most likely recognized the holes in our research. If nothing else, this project serves as a learning experience.

One of the worst parts of the project was being in a situation where you earnestly want to give someone a hand with a problem they face yet are unable to. It is a special type of hopeless feeling, which certainly does not help with the motivation issues. The question is, how could we have avoided the situation? I think it comes down to poor communication. While each member of the team had a good idea of what they were doing, there was no knowledge base. For example, even though I’d commented my code, I did not have the presence of mind to write a README which would allow another member of the team to get up to speed. The same applies to the circuit board and the AD7793 SPI interface. From this, I can confidently say that it is important to maintain a knowledge base when working on a group project. Not only is it useful to have a central knowledge repository to refer to, it also allows members of the team to shift more effectively to provide assistance where it is required. In the case of our team, I think it largely came down to inexperience. It never struck anyone in the group as something that should be done.

On a more general level, communication is absolutely vital. This is a part of why the course has mandatory weekly meetings. However, I do think that it would have been better to hold them on Mondays instead of Wednesdays. This would’ve allowed the meeting to be a discussion of what is to be done during the week, as in Scrum, instead of being a simple meeting to discuss the state of the project. All things considered though, when the meeting is held is a minor consideration as long as you hold meetings in which everyone takes part.

-Antti Matikainen, writing from their own personal experiences in defiance of common sense and advice to the contrary.

The results