By Steve Owens
Every company wants a product that never fails. You may have the most features at the lowest cost, but if your product is unreliable, sales will fall and your brand will erode. Unfortunately, for most companies, product failures are a fact of life that cannot be completely avoided. For these companies, the question becomes: what strategies should we use to minimize field failures? The answer has everything to do with what you do when your product fails.
In our study of hundreds of small technology companies, including startups, we found a small percentage that were outliers, with unusually good reliability metrics. These companies all had one thing in common: a set of processes for ensuring that products are reliable and stay reliable. Specifically, they relied on four procedures: Field Failure Reporting (FFR), Root Cause Analysis (RCA), Corrective Action (CA), and Design Verification Testing (DVT).
Field Failure Reporting (FFR)
Most companies in our study simply fixed or replaced units that failed in the field. Their primary focus was on making the customer happy, and once the customer was happy, everything was good: give them a working unit and get back to what we were doing. Many of them would also do a lot of finger pointing about whose fault the failure was, but that's another blog post altogether.
Our outliers also made the customer happy, but they did something else as well: they saw each failure as an opportunity to systematically improve product reliability, and thus to make all future customers happier, by figuring out how to prevent that type of failure from occurring again.
The first step toward avoiding future failures is opening a Field Failure Report. Typically, the FFR is opened by customer service, processed by engineering, and can only be closed by the CEO. Engineering is charged with finding the root cause of the problem and developing a corrective action that will prevent the same type of failure from occurring again. In the best companies, engineering was required to “convince” someone outside the engineering department (typically the CEO) that it had truly identified the root cause, and that the corrective action would not only prevent future product failures but also not create other issues, such as increased product cost, new failures, or production bottlenecks.
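The FFR lifecycle described above (opened by customer service, worked by engineering, closed only by the CEO once convinced) can be sketched as a small state machine. This is a minimal, hypothetical Python illustration rather than a prescribed tool: the class, role, and status names are assumptions, and in practice the same rules would more likely be configured in a ticketing or quality-management system.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Role(Enum):
    CUSTOMER_SERVICE = auto()
    ENGINEERING = auto()
    CEO = auto()


class Status(Enum):
    OPEN = auto()             # waiting on engineering
    AWAITING_REVIEW = auto()  # root cause and CA proposed, waiting on the CEO
    CLOSED = auto()


@dataclass
class FieldFailureReport:
    # Hypothetical FFR record; field names are illustrative only.
    report_id: str
    description: str
    status: Status = Status.OPEN
    root_cause: Optional[str] = None
    corrective_action: Optional[str] = None
    history: List[str] = field(default_factory=list)

    @classmethod
    def open_report(cls, role: Role, report_id: str, description: str) -> "FieldFailureReport":
        # Customer service opens the FFR when a field failure comes in.
        if role is not Role.CUSTOMER_SERVICE:
            raise PermissionError("only customer service may open an FFR")
        report = cls(report_id, description)
        report.history.append("opened by customer service")
        return report

    def submit_analysis(self, role: Role, root_cause: str, corrective_action: str) -> None:
        # Engineering records the root cause and the proposed corrective action.
        if role is not Role.ENGINEERING:
            raise PermissionError("only engineering may submit the analysis")
        if self.status is Status.CLOSED:
            raise ValueError("report is already closed")
        self.root_cause = root_cause
        self.corrective_action = corrective_action
        self.status = Status.AWAITING_REVIEW
        self.history.append("analysis submitted by engineering")

    def review(self, role: Role, convinced: bool, note: str = "") -> None:
        # Only the CEO may close the report, and only once convinced that the
        # root cause is real and the CA does not create new problems.
        if role is not Role.CEO:
            raise PermissionError("only the CEO may close an FFR")
        if self.status is not Status.AWAITING_REVIEW:
            raise ValueError("no engineering analysis awaiting review")
        if convinced:
            self.status = Status.CLOSED
            self.history.append("closed by CEO")
        else:
            # Not convinced: send the report back to engineering.
            self.status = Status.OPEN
            self.history.append(f"returned to engineering: {note}")
```

For example, if the CEO calls `review(Role.CEO, convinced=False, note="show the CA does not raise unit cost")`, the report returns to `Status.OPEN`, and engineering must resubmit before it can be closed.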
Root Cause Analysis (RCA)
The first step to improving the reliability of an existing product is determining the root cause of failure. Just repairing the unit does not help us prevent future failures. We need to understand why the repair was necessary. Why did the component fail?
Root causes come in three categories:
- User Error
- Drawing Compliance
- Drawing Errors
User Error occurs when the customer either uses the product outside its specification or does not understand how to use the product. For example, the customer subjects the product to a shock event beyond the specified limit (they dropped it).
Drawing Compliance describes the responsibility of manufacturing. Manufacturing is responsible for building what is on the drawings, that is, for ensuring that the units it produces comply with the drawings it was given. If the drawings call for version 2.3 firmware to be loaded, and version 2.2 was actually loaded, then the units are not within Drawing Compliance.
A Drawing Error occurs when the unit was used within the specifications and manufactured within Drawing Compliance, but still failed. It means the drawings either call out something that should not be done or fail to call for something that should be done.
A proper RCA determines which of these three categories is the root cause of the failure.
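The three-way decision can be written as a short function. This is a minimal Python sketch, assuming the RCA has already established two findings: whether the product was used as specified, and whether the unit matches its drawings. The names `RootCause` and `classify_root_cause` are hypothetical. If a failure shows both misuse and a compliance problem, this sketch reports User Error first; a real RCA might record both.

```python
from enum import Enum


class RootCause(Enum):
    USER_ERROR = "User Error"
    DRAWING_COMPLIANCE = "Drawing Compliance"
    DRAWING_ERROR = "Drawing Error"


def classify_root_cause(used_as_specified: bool, built_to_drawings: bool) -> RootCause:
    """Place a failure in one of the three root-cause categories.

    Both inputs are findings the RCA must establish first; this function
    only encodes the order of the decision.
    """
    if not used_as_specified:
        # Used outside its specification, or the customer misunderstood its use.
        return RootCause.USER_ERROR
    if not built_to_drawings:
        # Unit does not match its drawings, e.g. the wrong firmware version was loaded.
        return RootCause.DRAWING_COMPLIANCE
    # Used within spec and built to drawings, yet it still failed:
    # the drawings themselves are wrong or incomplete.
    return RootCause.DRAWING_ERROR
```

Using the firmware example above, `classify_root_cause(used_as_specified=True, built_to_drawings=False)` returns `RootCause.DRAWING_COMPLIANCE`.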
Corrective Action (CA)
Once the root cause has been determined, the next step is to generate a change that will prevent this type of error from occurring again. If the problem was User Error, then possible CAs could be:
- Improve the Instructions
- Change the Product Requirements
- Improve/Change Marketing Message
If the problem is Drawing Compliance, then possible CAs could be:
- Improve Manufacturing Processes
- Improve Personnel Training
- Change Vendors
If the problem is a Drawing Error, then the CA is to correct the drawing and:
- Change Product Development Procedures
- Improve Personnel Training
- Update DVT (see next section)
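The lists above amount to a lookup from root-cause category to candidate corrective actions. A minimal Python sketch follows; the dictionary and function names are hypothetical, and the output is only a starting point, since engineering still has to choose a CA and convince the reviewer that it works without side effects.

```python
from typing import Dict, List

# Candidate corrective actions per root-cause category, taken from the lists above.
CANDIDATE_ACTIONS: Dict[str, List[str]] = {
    "User Error": [
        "Improve the instructions",
        "Change the product requirements",
        "Improve/change the marketing message",
    ],
    "Drawing Compliance": [
        "Improve manufacturing processes",
        "Improve personnel training",
        "Change vendors",
    ],
    "Drawing Error": [
        "Change product development procedures",
        "Improve personnel training",
        "Update the DVT",
    ],
}


def candidate_corrective_actions(root_cause: str) -> List[str]:
    """Return a starting list of CAs for a given root-cause category."""
    if root_cause not in CANDIDATE_ACTIONS:
        raise ValueError(f"unknown root cause category: {root_cause!r}")
    # Copy so callers cannot modify the shared table.
    actions = list(CANDIDATE_ACTIONS[root_cause])
    if root_cause == "Drawing Error":
        # A Drawing Error always requires correcting the drawings themselves.
        actions.insert(0, "Correct the drawings")
    return actions
```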
Design Verification Testing (DVT)
The foregoing is all well and good for existing products, but how do you ensure the reliability of a new product before any units reach the field? The answer is to conduct a DVT. A DVT is a written procedure that specifies a series of tests or analyses confirming that a set of product drawings will meet the product requirements if the product is manufactured within Drawing Compliance.
A DVT typically includes:
- HALT (Highly Accelerated Life Testing) to expose design weaknesses that limit reliability
- Functional Testing
- Compliance Testing
- Performance Testing
- Usability Testing
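Because a DVT is a written list of tests, each traced to a product requirement, it can also be kept in a machine-readable form so the same pass/fail checks run against every new build. The sketch below is a hypothetical Python illustration: the `DvtStep` structure, the three example steps, and their limits (8 h runtime, 10 min setup) are placeholders, not requirements from the text, and tests such as HALT or compliance testing are run on lab equipment, with only their results recorded this way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping


@dataclass(frozen=True)
class DvtStep:
    name: str         # what is being verified
    category: str     # e.g. "Functional", "Performance", "Usability"
    requirement: str  # the product requirement this step traces to
    check: Callable[[Mapping[str, float]], bool]  # pass/fail rule on measured results


def run_dvt(steps: List[DvtStep], measurements: Mapping[str, float]) -> Dict[str, bool]:
    """Evaluate every documented step against one build's measurements."""
    return {step.name: step.check(measurements) for step in steps}


# Hypothetical procedure; requirements and limits are placeholders.
PROCEDURE = [
    DvtStep("Power-on self test", "Functional", "Unit boots and passes self test",
            lambda m: m["self_test_errors"] == 0),
    DvtStep("Battery life", "Performance", "At least 8 h runtime",
            lambda m: m["runtime_h"] >= 8.0),
    DvtStep("Setup time", "Usability", "New user completes setup in under 10 min",
            lambda m: m["setup_min"] < 10.0),
]
```

Running `run_dvt(PROCEDURE, {"self_test_errors": 0, "runtime_h": 7.5, "setup_min": 6.0})` would flag only the battery-life step as failing; the build passes the DVT only if `all(results.values())` is true.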
Documenting the tests makes it easy to repeat them every time the product is updated. The DVT procedure can also be used in an AQL (Acceptable Quality Level) program, where units are sampled from the production line and tested.
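In an AQL program, whether a production lot is accepted depends on a sampling plan: draw n units, test them, and accept the lot if at most c units fail. Under a binomial model, the probability of accepting a lot whose true failure rate is p is the sum over k = 0..c of C(n, k) p^k (1 - p)^(n - k), one point on the plan's operating characteristic (OC) curve. The Python sketch below computes it. The plan n = 50, c = 1 used in the note is a hypothetical example, not read from a standard table; real plans come from standards such as ANSI/ASQ Z1.4 or ISO 2859-1.

```python
from math import comb


def acceptance_probability(n: int, c: int, p: float) -> float:
    """P(accept lot) for a single sampling plan: test n units, accept if at most c fail.

    Assumes each sampled unit fails independently with probability p
    (binomial model, reasonable when the lot is much larger than the sample).
    """
    if not (0 <= c <= n) or not (0.0 <= p <= 1.0):
        raise ValueError("need 0 <= c <= n and 0 <= p <= 1")
    # Sum the binomial probabilities of seeing 0, 1, ..., c failures.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))
```

With n = 50 and c = 1, a lot with a 1% failure rate is accepted about 91% of the time, while a lot with a 5% failure rate is accepted only about 28% of the time, which is how the plan separates good lots from bad ones.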
For most companies, field returns are going to happen. We can either see these returns as the latest fire to put out, or we can see them as a means of preventing future fires.