In these notes we will introduce the concept of engineering reliability. We will not attempt a thorough or complete examination of all the reliability concepts. Rather, we will cover the fundamentals so that students will understand what is meant by reliability and have sufficient information to pursue the subject further when there is a need.
One definition of technical reliability:
"The reliability of a system or a device is the probability that it will give satisfactory performance for a specified period under specific operating conditions."
Note the governing parameters for reliability:
To find any one of the above parameters the other four must be specified. The more precisely the knowns are specified, the more precisely the unknown parameter can be determined.
The reliability of a system will depend on:
Therefore any engineer who is involved in technical activity should be aware of reliability and reliability techniques.
Reliability is measured in terms of probability. Probability is a form of applied mathematics and an aid to logical thinking. Our knowledge of the probability of an occurrence is based on intrinsic knowledge of the system, on statistical test data, or on both.
Intrinsic knowledge:-
Statistical Knowledge:-
Eventually all probability must be based on statistical confirmation. "The validity of statistical results is no better than the input data and the manner in which they are used."
Because statistics is based on finite data, probability inferences have three parameters: the probability, the accuracy, and the level of confidence. When the news presents the results of a poll, it presents the percentage ("43%"), the accuracy ("plus or minus 2%"), and the confidence level ("nine times out of ten").
Reliability describes the state of our knowledge about a system, not the system itself.
Consider: an astronaut sitting on a rocket, talking to the engineer by radio.
"Is this rocket reliable?"
"Yes, it is."
"How do you know?"
"We have tested 16 of them and 14 worked without a hitch." (R = 14/16 = 0.875)
"Have you tested this one?"
"No, you wouldn't be on it if we had."
"How do you know this one is reliable?"
"We built it as nearly identical to the others as is humanly possible."
"Look over there we have 10 more just like it, we will fire those just to show you."
"Ooops! That first one blew up. Sorry about that." (R = 14/17 = 0.824)
"Never mind, we will try the rest!"
"See they all performed satisfactorily!!" (R = 23/26 = 0.885)
"See the reliability is better than when we started!"
In this example the reliability has improved although the astronaut is sitting on the same rocket which has not been changed in any way.
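The figures quoted in the dialogue are simple success fractions. The sketch below reproduces them; the function name is illustrative and not part of any reliability standard.

```python
def observed_reliability(successes: int, trials: int) -> float:
    """Point estimate of reliability: the fraction of trials that succeeded."""
    if trials <= 0:
        raise ValueError("at least one trial is required")
    return successes / trials


# Counts taken from the dialogue above.
before = observed_reliability(14, 16)         # original test record
after_failure = observed_reliability(14, 17)  # first extra rocket blew up
after_all = observed_reliability(23, 26)      # the other nine worked

print(round(before, 3), round(after_failure, 3), round(after_all, 3))
# prints: 0.875 0.824 0.885
```

The estimate moves with every new firing even though the rocket under the astronaut never changes, which is the point of the example.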
Neither probability theory nor reliability can predict discrete or specific events. They can only say something about a large number of identical events. Therefore reliability will not predict that a device will operate for a specific time before failing or that one specific system will operate over a longer time than another specific system.
What can reliability tell us?
During WWII, quality control concepts were introduced into manufacturing to compensate for the lack of a skilled workforce. These concepts were based upon work at Bell Laboratories in the 1920s. Shewhart had determined that in repetitive manufacturing some defects were purely random, and that statistical mathematics could be used to identify from small samples whether the defects were random or problem related. In the late 1940s and 1950s equipment complexity increased and there was a shortage of engineers and technicians to solve all the problems, so statistical procedures were introduced under the name of reliability to identify problem components and systems. These methods have more recently been amplified and applied as the "Taguchi Method", after the Japanese professor who advocated them.
Meanwhile a U.S. Navy survey showed that at any given time 70% of all naval electronics equipment was inoperable. The search began for some method of improving and specifying reliability requirements. Two basic approaches emerged.
The area developed rapidly because of the wealth of statistical mathematics readily available and because non-engineering professionals (mathematicians, chemists, physicists) could be used to do the work.
The first developments were with electronic equipment, where vacuum tube and electronic component failures appeared random.
Statistical concepts were redefined as reliability terms.
Suppose we record the cumulative number of failures over a period of time for a given set of systems. If we plot F(x), the fraction of the total population which has failed up to time x, against the operating time x, we get a monotonically increasing function. F(x) is known as the Cumulative Distribution Function (CDF); it may also be called "the probability of failure in time x."
FIGURE 1 Cumulative Distribution Function
The probability density function f(x) is the rate of failures at time x, that is, the slope of the CDF:
f(x) = d F(x) /dx
f(x) is also known as the frequency function of failures.
f(x) is not itself a probability; it is a probability per unit of time.
The integral of f(x) between x = a and x = b is the probability of failure during interval a - b.
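As a numerical check on this statement, the sketch below integrates a density between a and b and compares the result with F(b) - F(a). The exponential form and the 100 hour mean life are assumptions chosen only for illustration.

```python
import math

THETA = 100.0  # assumed mean life in hours (illustrative only)


def cdf(x: float) -> float:
    """F(x): fraction of the population failed by time x."""
    return 1.0 - math.exp(-x / THETA)


def pdf(x: float) -> float:
    """f(x) = dF(x)/dx for the assumed distribution."""
    return math.exp(-x / THETA) / THETA


def integrate(func, a: float, b: float, steps: int = 10_000) -> float:
    """Trapezoidal rule over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h


a, b = 50.0, 150.0
prob_from_pdf = integrate(pdf, a, b)  # area under f(x) between a and b
prob_from_cdf = cdf(b) - cdf(a)       # same probability read off the CDF

print(prob_from_pdf, prob_from_cdf)
```

The two numbers agree to within the integration error, while f(x) on its own carries units of "per hour" and is not a probability.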
FIGURE 2 Probability density Function
When describing properties of distributions care must be taken in the terms used.
FIGURE 3 Properties of distributions
Beware: in the vernacular, "mean" is often used when mode or median is meant.
The Hazard Rate H(x) is the instantaneous failure rate at time x. At time x the Hazard Rate is the fraction of those units that have survived until time x which will fail in the next unit of time.
FIGURE 4 Hazard Rate from PDF
It is referred to as the hazard rate, the failure rate, the rate of failure occurrence, or failures per hour.
The Hazard Rate often indicates the nature of the failure mechanism. The most commonly referenced shape of H(x) is the bathtub curve.
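The definition can be applied directly to interval failure counts. In the sketch below the counts are hypothetical, chosen so that the hazard rate stays constant while the fraction of the original population failing in each interval declines.

```python
def hazard_rates(initial_units: int, failures_per_interval: list[int]) -> list[float]:
    """Failures in each interval divided by the units that entered it."""
    survivors = initial_units
    rates = []
    for failed in failures_per_interval:
        rates.append(failed / survivors)
        survivors -= failed
    return rates


def failure_fractions(initial_units: int, failures_per_interval: list[int]) -> list[float]:
    """Failures in each interval as a fraction of the ORIGINAL population."""
    return [failed / initial_units for failed in failures_per_interval]


# Hypothetical data: 1000 units on test, failures counted hour by hour.
failures = [100, 90, 81]
rates = hazard_rates(1000, failures)
fractions = failure_fractions(1000, failures)

print(rates)      # hazard: referred to the survivors
print(fractions)  # density: referred to the original population
```

The hazard rate is referred to the survivors, whereas the density is referred to the original population, so the two curves can have quite different shapes.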
FIGURE 5 Hazard Rates
Figure 5(a) refers primarily to electronic components. It can also be compared to the human body (infant mortality, maturity, old age). For mechanical equipment Figure 5(b) is more realistic: H(x) is more V shaped, for example for cars and machinery. Fatigue problems increase H(x) with time.
To be able to deal with statistical data mathematically, a number of failure distribution models are used. They are chosen because they closely approximate statistical failure data and because they can be expressed rather simply mathematically.
FIGURE 6 Normal Distribution
It is symmetric: mean = mode = median
68% within 1 standard deviation
95% within 2 standard deviations
99.73% within 3 standard deviations
Many random phenomena are approximately normally distributed.
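The coverage percentages can be checked with the error function from the Python standard library. This is only a sketch; the function name is illustrative.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(k, round(100.0 * fraction_within(k), 2))
# prints:
# 1 68.27
# 2 95.45
# 3 99.73
```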
FIGURE 7 Exponential Distribution
This distribution is greatly favored in the electronics industry. It is a one-parameter distribution, the parameter being the mean. It is an easy distribution to deal with mathematically.
Theta = MTBF (mean time between failures)
Theta = Total Operating Time / Number of Failures
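A minimal sketch of how Theta is used. The operating hours and failure count are hypothetical, and the reliability expression R(t) = exp(-t/Theta) is the standard form for the exponential distribution rather than something quoted from these notes.

```python
import math


def mtbf(total_operating_time: float, failures: int) -> float:
    """Theta = total operating time / number of failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return total_operating_time / failures


def exponential_reliability(t: float, theta: float) -> float:
    """R(t) = exp(-t / theta) for a constant failure rate of 1/theta."""
    return math.exp(-t / theta)


# Hypothetical fleet record: 10,000 unit-hours accumulated, 20 failures.
theta = mtbf(10_000.0, 20)                       # 500 hours
r_100 = exponential_reliability(100.0, theta)    # reliability for a 100 hour mission
r_theta = exponential_reliability(theta, theta)  # reliability at t = MTBF

print(theta, round(r_100, 4), round(r_theta, 4))
```

Note that under this model only about 37% of units survive to the MTBF, so the MTBF should not be read as a guaranteed life.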
A three parameter distribution.
FIGURE 8 Weibull distributions with various shape constants.
For mechanical systems the Weibull distribution has proven to be most useful. Originally discovered in
In practice it is usually assumed that the origin is zero. Only if the data are a poor fit are other origins considered. The distribution originated in the early 1940s in an attempt to describe fatigue failures.
Note that when b = 1 the reliability becomes an exponential function.
The usual practice is to use Weibull graph paper, a form of log-log versus log paper: the vertical scale is the log of the log of 1/(1 - F(x)) and the horizontal scale is the log of x. Plotting the CDF on Weibull paper gives a straight line if the origin is 0. The slope of the line is the shape parameter b.
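The straight line on Weibull paper can be reproduced numerically. The sketch assumes the two-parameter form with zero origin, F(x) = 1 - exp(-(x/Theta)^b), with Theta taken as the scale parameter; the numerical values are illustrative.

```python
import math


def weibull_cdf(x: float, theta: float, b: float) -> float:
    """Two-parameter Weibull CDF with the origin at zero (assumed form)."""
    return 1.0 - math.exp(-((x / theta) ** b))


def weibull_paper_coords(x: float, f: float) -> tuple[float, float]:
    """Map a (time, fraction failed) point onto the linearized Weibull axes."""
    return math.log(x), math.log(-math.log(1.0 - f))


theta, b = 1000.0, 2.0  # illustrative scale and shape values
points = [weibull_paper_coords(x, weibull_cdf(x, theta, b))
          for x in (100.0, 400.0, 900.0)]
(x1, y1), (x2, y2), (x3, y3) = points

# The slope between any two points recovers the shape parameter b.
slope_12 = (y2 - y1) / (x2 - x1)
slope_23 = (y3 - y2) / (x3 - x2)
print(slope_12, slope_23)

# With b = 1 the Weibull CDF reduces to the exponential CDF.
exp_check = weibull_cdf(250.0, theta, 1.0) - (1.0 - math.exp(-250.0 / theta))
```

Taking logs twice turns the CDF into ln(-ln(1 - F)) = b ln(x) - b ln(Theta), which is why the slope of the plotted line is b.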
FIGURE 9 Weibull paper showing plots of an exponential distribution and a normal distribution.
Reliability data for components and complete systems are often established through testing. From these tests the Weibull distribution can be used to establish the reliability of the component or system.
Assume we have a very large number (n) of ball bearings on test. The environmental conditions are controlled:- load, speed, temperature, etc. They experience the following failures.
| Operating time | Cumulative failures |
| 50 hrs | 4.6% |
| 100 hrs | 9% |
| 500 hrs | 48% |
| 1000 hrs | 81% |
What will be the failure rate and the reliability for 20 hours service?
If we plot the failure data on Weibull paper we get the following result.
FIGURE 10 Weibull plot of bearing test data
The slope of the line is b = 1.2, close to exponential (b = 1).
Cumulative failure fraction F(x) (CDF) at 20 hrs = 1.5%
Reliability for 20 hrs of operation = 98.5%
Note: bearing books state that L50 life = 5 x L10 life.
From these tests:
L50 = 500 hrs (approximately; 48% failed)
L10 = 100 hrs (approximately; 9% failed)
Then
L50/L10 = 5, the same as the bearing books.
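The graphical procedure can be approximated numerically. In the sketch below a least-squares line through the linearized data stands in for the line drawn by eye on Weibull paper, so small differences from values read off a graph are to be expected. The two-parameter Weibull form with zero origin is assumed, and the helper names are illustrative.

```python
import math

# Bearing test data from the table above: (hours, cumulative fraction failed).
data = [(50.0, 0.046), (100.0, 0.09), (500.0, 0.48), (1000.0, 0.81)]

# Linearized Weibull coordinates: ln(x) versus ln(-ln(1 - F)).
xs = [math.log(t) for t, _ in data]
ys = [math.log(-math.log(1.0 - f)) for _, f in data]

# Ordinary least-squares straight line through the four points.
n = len(data)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
     / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - b * x_mean
theta = math.exp(-intercept / b)  # scale parameter implied by the fitted line


def fraction_failed(t: float) -> float:
    """Fitted Weibull CDF."""
    return 1.0 - math.exp(-((t / theta) ** b))


def life_at_fraction_failed(f: float) -> float:
    """Operating time by which a fraction f of the population has failed."""
    return theta * (-math.log(1.0 - f)) ** (1.0 / b)


f_20 = fraction_failed(20.0)
ratio = life_at_fraction_failed(0.5) / life_at_fraction_failed(0.1)

print(f"b = {b:.2f}, F(20 h) = {f_20:.4f}, R(20 h) = {1.0 - f_20:.4f}")
print(f"L50/L10 = {ratio:.2f}")
```

The fitted slope comes out near 1.2, F at 20 hours near 1.5%, and the L50/L10 ratio close to 5, in line with the graphical results.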
Sometimes tests cannot be carried out on extremely large samples. Because of costs and other factors, only a small number can be tested. In such a case we must compensate for the fact that failed samples reduce the number on test. For small sample size it is suggested that median rank correction be made.
If we knew exactly the percent that would fail before the first failure in a set, that percentage would be the true rank of the first failure. Since we do not know the true rank, we must estimate it.
The median rank correction (MR) for small samples can be approximated by
MR = (j - 0.3) / (n + 0.4)
j = order of failure
n = sample size tested
For example, consider newly designed gear boxes. We would like to know the operating life at the rated load of 50 H.P. for reliabilities of 90%, 50%, and 80%.
Seven typical units are selected at random from production and tested at 50 H.P. load until they fail, with the following results:
| Order | Life (x 10^6 cycles) | Uncorrected rank (fraction failed) | Median rank MR (fraction failed) |
| 1 | 1.0 | 0.142 | 0.0945 |
| 2 | 1.2 | 0.286 | 0.2297 |
| 3 | 1.5 | 0.428 | 0.3648 |
| 4 | 1.7 | 0.571 | 0.500 |
| 5 | 1.8 | 0.714 | 0.635 |
| 6 | 2.0 | 0.857 | 0.770 |
| 7 | 2.3 | 1.0 | 0.905 |
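The rank columns of the table can be regenerated from the approximation MR = (j - 0.3) / (n + 0.4). The sketch below does this for the seven gear box lives; last digits can differ slightly from the table because of rounding.

```python
def median_rank(j: int, n: int) -> float:
    """Approximate median rank of the j-th failure in a sample of n units."""
    return (j - 0.3) / (n + 0.4)


# Lives to failure from the table above, in units of 10^6 cycles.
lives = [1.0, 1.2, 1.5, 1.7, 1.8, 2.0, 2.3]
n = len(lives)

# Each row: (order, life, uncorrected rank j/n, median rank).
table = [(j, life, j / n, median_rank(j, n))
         for j, life in enumerate(lives, start=1)]

for j, life, raw, mr in table:
    print(f"{j}  {life:.1f}  {raw:.3f}  {mr:.4f}")
```

The uncorrected rank assigns 100% failure to the last unit tested, which cannot be plotted on Weibull paper; the median rank avoids this.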
Plotting the results on Weibull paper we get:
Figure 11 Weibull plot of Gear Box Tests
From the plot we find:
R10 = 1 x 10^6 cycles
R50 = 1.65 x 10^6 cycles
R80 = 1.02 x 10^6 cycles
R98 = 600,000 cycles
In some cases we cannot test all the units to completion. This may be especially true when collecting field data. We must make an appropriate correction for the lost data.
If we start to gather reliability data for a number of units, and some of the units are removed from the test before they fail, we must correct for the fact that not all of the units ran to failure.
The number of failures is corrected in the following way. For the interval from t1 to t2:
r2' = r2 (N1 - r1')/N2
r2' = expected failures during t1 to t2 had no units been withdrawn
r2 = actual failures during t1 to t2
N1 = original number of units on test
r1' = cumulative number of expected failures up to t1
N2 = number of units on test during t1 to t2
Suppose that 202 trim tab actuators are put in service on aircraft. The following lives to failure are reported; some aircraft are pulled from service for other reasons.
| Time of failure (hours) | Number of failures | Number exposed to failure | Expected failures had the whole original population run to failure | Cumulative expected failures | F(t) | R(t) |
| 141 | 1 | 202 | 1.00 | 1.00 | 0.0049 | 0.9951 |
| 210 | 1 | 177 | 1 x (202-1)/177 = 1.135 | 2.135 | 0.0106 | 0.9894 |
| 220 | 1 | 176 | 1 x (202-2.135)/176 = 1.135 | 3.27 | 0.0162 | 0.9838 |
| 260 | 1 | 165 | 1 x (202-3.27)/165 = 1.20 | 4.47 | 0.0221 | 0.9779 |
| 300 | 1 | 156 | 1 x (202-4.47)/156 = 1.27 | 5.74 | 0.0284 | 0.9716 |
| 310 | 1 | 153 | 1 x (202-5.74)/153 = 1.28 | 7.02 | 0.0347 | 0.9653 |
| 340 | 1 | 144 | 1 x (202-7.02)/144 = 1.35 | 8.37 | 0.0414 | 0.9586 |
| 351 | 1 | 143 | 1 x (202-8.37)/143 = 1.35 | 9.72 | 0.0481 | 0.9519 |
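The correction r2' = r2 (N1 - r1')/N2 can be applied row by row in a short loop. The sketch below uses the actuator counts from the table; the function name is illustrative. It carries full precision between rows, so its results can differ from the hand-rounded table in the last digit.

```python
def corrected_failures(original_units: int, observations: list[tuple[int, int]]):
    """Apply r2' = r2 * (N1 - r1') / N2 to each failure record.

    observations: (actual failures, units exposed) at each failure time.
    Returns one row per record: (expected failures, cumulative expected, F, R).
    """
    cumulative = 0.0
    rows = []
    for failures, exposed in observations:
        expected = failures * (original_units - cumulative) / exposed
        cumulative += expected
        f = cumulative / original_units
        rows.append((expected, cumulative, f, 1.0 - f))
    return rows


# (failures, number exposed) from the actuator table, in time order.
observations = [(1, 202), (1, 177), (1, 176), (1, 165),
                (1, 156), (1, 153), (1, 144), (1, 143)]
rows = corrected_failures(202, observations)

for expected, cumulative, f, r in rows:
    print(f"{expected:.3f}  {cumulative:.3f}  {f:.4f}  {r:.4f}")
```

Each actual failure counts for slightly more than one expected failure once units have been withdrawn, because fewer units remain exposed than would have been had none been removed.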
Plotting F(t) on Weibull paper we get
FIGURE 12 Weibull plot for Incomplete Failures
For 500 hours of service, reliability R is approximately 0.90.
For 1100 hours of service, reliability R is approximately 0.50.
And reliability for 100 hrs of operation will be (R = 1 - 0.0021) approximately 0.9979.
The Weibull shape constant (slope) is b = 2.4, which is near normal; see Figure 8.
Hazard Rate and Weibull Distribution
We define the hazard rate at time x as the fraction of units that have survived to time x which will fail in the next unit of time. (Figure 4)
For the bathtub curve of Figure 5, the MTBF defines the failure rate during the useful life:
H(x) = hazard rate = 1/MTBF
When we consider the hazard rate we can see the power and usefulness of the Weibull distribution.
We see from Figure 8 that by manipulating the variables we can develop any hazard form.
If we use these functions piecewise we can duplicate the bathtub curve H(x).
Alternatively, by adjusting Theta and b we can closely approximate a hazard rate H(x) of any form.
Now we can see that the exponent b is important because its value tells us what kind of failure mode we are recording.
b < 1 burn in (some overloaded or underdesigned components).
b = 1 useful life, constant failure rate.
b > 1 systematic wear out.
b = 3.5 approximately normal distribution (random scatter of failures about a mean life).
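The effect of b on the hazard rate can be seen numerically. The sketch below assumes the standard two-parameter Weibull hazard, H(x) = (b/Theta)(x/Theta)^(b-1), with zero origin; this expression is not quoted from these notes, and the values of Theta and the sample times are illustrative.

```python
def weibull_hazard(x: float, theta: float, b: float) -> float:
    """H(x) = (b / theta) * (x / theta) ** (b - 1), origin at zero (assumed form)."""
    return (b / theta) * (x / theta) ** (b - 1.0)


theta = 1000.0                # illustrative scale parameter, hours
times = (100.0, 500.0, 900.0)

burn_in = [weibull_hazard(t, theta, 0.5) for t in times]   # b < 1
useful = [weibull_hazard(t, theta, 1.0) for t in times]    # b = 1
wear_out = [weibull_hazard(t, theta, 2.0) for t in times]  # b > 1

print(burn_in)   # falls with time
print(useful)    # constant at 1/theta
print(wear_out)  # rises with time
```

Using b < 1 early in life, b = 1 in mid life and b > 1 late in life, piece by piece, gives the three regions of the bathtub curve.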