An Introduction to Statistical Process Control (SPC)

One of the key ideas in lean manufacturing is that defects should be detected as early as possible. Efforts to control manufacturing processes so that issues can be detected before defects occur actually predate lean. Statistical Process Control (SPC) is a set of methods first created by Walter A. Shewhart at Bell Laboratories in the early 1920s. W. Edwards Deming standardized SPC for American industry during WWII and introduced it to Japan during the American occupation after the war. SPC became a key part of Six Sigma, the Toyota Production System and, by extension, lean manufacturing.

SPC measures the outputs of processes, looking for small but statistically significant changes, so that corrections can be made before defects occur. SPC was first used within manufacturing, where it can greatly reduce waste due to rework and scrap. It can be used for any process that has a measurable output and is now widely used in service industries and healthcare.

SPC uses statistical methods to monitor and control process outputs. This includes graphical tools such as run charts and control charts. The design of experiments is also an important aspect of SPC. SPC is carried out in two phases. The first phase ensures that the process is fit for purpose and establishes what it should look like when operating normally. The second phase monitors the process to ensure that it continues to perform as it should. Determining the correct monitoring frequency is important during the second phase and will depend, in part, on how quickly significant factors, or influences, can change.

Common Causes and Special Causes of Variation

A key concept within SPC is that variation in processes may be due to two basic types of causes. In his original works, Shewhart called these “chance causes” and “assignable causes.” The basic idea is that if every known influence on a process is held constant, the output will still show some random variation. Shewhart attributed this random variation to chance causes: it is unavoidable, but statistical methods can be used to understand it. For example, if we know that a process is only noticeably affected by chance causes, then it is possible to calculate the probability of a given part being out of specification.

Shewhart referred to other sources of variation as assignable causes. These are not random in nature and are caused by identifiable events or changes: for example, a change in temperature, a different operator taking over a machine, or a change in the batch of material being used. It is very difficult to predict, using statistics alone, what the output of a process will be if there are assignable causes of variation.

In modern SPC, chance causes are normally called “common causes,” and assignable causes are called “special causes.” The chance, or common, cause variation may also be thought of as the noise in the process. Below the noise floor it is not possible to detect the effects of assignable, or special, causes of variation. If these special causes start to produce more significant variation, then they become visible above the noise floor.

These concepts also have parallels with measurement systems analysis (MSA). Common or chance causes are equivalent to precision and repeatability in MSA. Similarly, special or assignable causes are equivalent to bias or trueness. 

A State of Control

One of the aims of SPC is to achieve a process in which all the variation can be explained by common causes, giving a known probability of a defect. Any significant special cause variation should be detected and removed as quickly as possible.

Shewhart said that something was controlled when “we can predict, at least within limits, how the phenomenon may be expected to vary in the future… [This] means that we can state, at least approximately, the probability that the observed phenomenon will fall within the given limits.”

In modern SPC, a process is said to be stable, or in control, when the observed variation appears statistically to be caused by common cause variation, at the level that has historically been recorded for the process. This is often checked using a control chart showing limits which represent the expected level of variation. A stable process may also be thought of as one in which any assignable cause variations are below the noise floor of the common cause, random variations.

Real processes have many sources of variation but usually only a few dominant special causes are significant. During the first phase of applying SPC to a process, these special causes are identified and removed to produce a stable process. The limits of this process can then be determined statistically, provided another special cause does not emerge. For example, a once-stable process may start to change as tooling wears.

The concept of a stable process also has a parallel in measurement uncertainty evaluation. The uncertainty of a measurement should only be evaluated when any known systematic effects, or causes of bias, have been corrected, leaving a measurement that can be modelled by only random influences.

Basic Statistics

SPC is a large subject that can involve some pretty complex statistics. However, only a very basic understanding of statistics is required to understand the core methods of SPC. You need to understand standard deviation, probability distributions, and statistical significance.

The standard deviation provides a measure of the variation, or dispersion, for a set of values. Suppose you want to measure the variation of a manufacturing process that is producing parts. You could start by measuring 30 parts at the end of the process. Each of the parts has a slightly different measurement value. Looking at these values would give you an idea of how much variation there is between the parts, but we want a single number which quantifies that variation.

The simplest way of measuring this dispersion would be to find the largest and the smallest values, and then subtract the smallest from the largest to give the range. The problem with using the range is that it doesn’t consider all of the values; it is based purely on the two extremes. The more parts we checked, the bigger the range we would tend to get, so clearly this is not a reliable measure. There is also no way of determining a probability of conformance based on the range.

The standard deviation is the reliable measure that we need; it allows a probability of conformance to be calculated, provided certain assumptions are valid. It is, roughly speaking, the average distance of the individual values from the mean of all the values.

Consider this simple example. We have measured 5 parts (n = 5) with the following values: 3, 2, 4, 5, 1. The mean of these values is the sum divided by n:

(3 + 2 + 4 + 5 + 1) / 5 = 3

Next, we find the difference of each value from the mean:

3 - 3 = 0,  2 - 3 = -1,  4 - 3 = 1,  5 - 3 = 2,  1 - 3 = -2

When considering dispersion, it’s not important whether the values are larger or smaller than the mean, only how far away they are. To get rid of the direction (the sign), we square each difference, then we add the squared differences together and divide by n to get their mean:

(0² + (-1)² + 1² + 2² + (-2)²) / 5 = (0 + 1 + 1 + 4 + 4) / 5 = 2

This is normally written:

σ² = Σ(xᵢ - x̄)² / n

What has been calculated so far is the variance. Because each difference from the mean was squared, it makes sense to take the square root of the variance; this is the standard deviation. For this example, the standard deviation is √2 ≈ 1.41. However, because the sample only contained 5 parts, this is not a reliable estimate of the standard deviation for the process in general. Therefore, a correction must be applied; this is done by dividing by n - 1 instead of n. The complete calculation of the sample standard deviation may be written as:

s = √( Σ(xᵢ - x̄)² / (n - 1) )

For this example, that gives √(10 / 4) ≈ 1.58.

Standard deviation is used to measure the common cause variation in a process.
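For readers who like to check the arithmetic, here is a minimal Python sketch of the same calculation (the variable names are my own, not from any particular SPC library):

```python
import math

# The five example measurements from the text
parts = [3, 2, 4, 5, 1]
n = len(parts)

# Mean: the sum of the values divided by n
mean = sum(parts) / n                          # 3.0

# Squared differences from the mean
sq_diffs = [(x - mean) ** 2 for x in parts]    # [0.0, 1.0, 1.0, 4.0, 4.0]

# Dividing by n gives the variance and the uncorrected standard deviation
variance = sum(sq_diffs) / n                   # 2.0
print(math.sqrt(variance))                     # ~1.41

# Dividing by n - 1 applies the small-sample correction described above
sample_std_dev = math.sqrt(sum(sq_diffs) / (n - 1))
print(round(sample_std_dev, 2))                # ~1.58
```

The standard library’s `statistics.stdev` function applies the same n - 1 correction, so `statistics.stdev(parts)` should give the same result.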

Another basic statistical concept that is important in SPC is the probability distribution. Random events can be characterized using probability distributions. The possible scores when you roll a six-sided die follow a simple probability distribution: the die has an equal chance of landing on 1, 2, 3, 4, 5 or 6. If the die is rolled 6,000 times, you would expect each number to occur approximately 1,000 times. If you made a bar chart of the scores, the bars would all be of roughly equal height. This shape is known as a rectangular, or uniform, distribution. The uncertainty due to rounding a measurement result to the nearest increment on an instrument’s scale has this rectangular distribution, since there is an equal chance of the true value lying anywhere within +/- half an increment of the indicated value.
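The die-rolling example is easy to check by simulation; this short sketch (illustrative only) rolls a virtual die 6,000 times and counts the results, which should come out roughly equal for each face:

```python
import random
from collections import Counter

# Simulate 6,000 rolls of a fair six-sided die
rolls = [random.randint(1, 6) for _ in range(6000)]

# Each face should appear roughly 1,000 times, giving a flat, rectangular shape
counts = Counter(rolls)
for face in range(1, 7):
    print(face, counts[face])
```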

When two dice are rolled, something interesting happens. The score can be any integer between 2 and 12, but you are much more likely to get a score of 7 than a 2 or a 12. This is because there are several ways to score a 7 but only one way to score a 2 or a 12. For example, to score a total of 2, both dice need to roll a 1. There are two ways to score 3 (A=1 and B=2) or (A=2 and B=1). All the possible scores, with the different ways to achieve them, are as follows:

Ways to score 2 : (1,1)

Ways to score 3 : (1,2)(2,1)

Ways to score 4 : (1,3)(2,2)(3,1)

Ways to score 5 : (1,4)(2,3)(3,2)(4,1)

Ways to score 6 : (1,5)(2,4)(3,3)(4,2)(5,1)

Ways to score 7 : (1,6)(2,5)(3,4)(4,3)(5,2)(6,1)

Ways to score 8 : (2,6)(3,5)(4,4)(5,3)(6,2)

Ways to score 9 : (3,6)(4,5)(5,4)(6,3)

Ways to score 10 : (4,6)(5,5)(6,4)

Ways to score 11 : (5,6)(6,5)

Ways to score 12 : (6,6)

The probability of each score increases linearly from the lowest value to the middle value and then decreases linearly to the largest value. This type of probability distribution is known as a triangular distribution. A triangular distribution occurs whenever two random effects with uniform distributions of similar magnitude are added together to give a combined effect.
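The table of combinations above can also be generated programmatically; this small sketch enumerates all 36 ordered pairs and counts the ways of reaching each total, which makes the triangular shape easy to see:

```python
from collections import Counter

# Count the combinations of two dice that produce each total score
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))

# Print a crude text histogram: the bar lengths rise to 7 and fall again
for total in range(2, 13):
    print(f"{total:2d}: {'#' * ways[total]}  ({ways[total]} ways)")
```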

When more random effects are combined, the peak of the triangle starts to flatten and the ends extend into tails, giving a bell-shaped distribution known as the Gaussian, or normal, distribution. Adding together many uniform or triangular distributions gives this normal distribution. In fact, the normal distribution occurs whenever many different random effects, with differently shaped distributions, add up to give a combined effect. This result is formalized by the central limit theorem. Because of this, the normal distribution occurs very commonly in the complex systems of the natural world, and processes are often simply assumed to be normal.
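The same effect can be seen numerically. The sketch below (an illustration, not from the original article) adds together twelve uniform random values many times over; a rough histogram of the sums shows the characteristic bell shape:

```python
import random
from collections import Counter

# Each sample is the sum of 12 independent uniform values between 0 and 1,
# so the sums should be approximately normal (central limit theorem)
sums = [sum(random.random() for _ in range(12)) for _ in range(100_000)]

# Tally the sums into bins of width 0.5 and print a rough text histogram
bins = Counter(round(s * 2) / 2 for s in sums)
for edge in sorted(bins):
    print(f"{edge:4.1f}: {'#' * (bins[edge] // 1000)}")
```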

If we know the standard deviation and the probability distribution for a process, then it is possible to calculate the probability of the output falling within a given range of values. This means that the probability of a defect can be calculated. It is also possible to calculate the probability that a given value belongs to this distribution. If it is very unlikely that a measured part could have come from the probability distribution of the stable process, then it is likely that a new special cause has emerged, indicating that the process is going out of control.
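As an example of that calculation, the sketch below uses assumed, illustrative numbers (a normally distributed process with a mean of 10.0 mm, a standard deviation of 0.1 mm, and specification limits at 10.0 +/- 0.3 mm) and the normal cumulative distribution function, built from the standard library’s `math.erf`, to estimate the defect probability:

```python
import math

def normal_cdf(x, mean, sigma):
    """Probability that a normally distributed value is less than x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

# Assumed process parameters and specification limits (illustrative only)
mean, sigma = 10.0, 0.1
lsl, usl = 9.7, 10.3    # lower and upper specification limits, i.e. +/- 3 sigma

# Probability of a part falling below the lower or above the upper limit
p_defect = normal_cdf(lsl, mean, sigma) + (1.0 - normal_cdf(usl, mean, sigma))
print(f"Expected defect rate: {p_defect:.4%}")    # about 0.27% at +/- 3 sigma
```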

Run Charts and Control Charts

A run chart is a simple scatter plot with the sample number on the x-axis and the measured value on the y-axis. It presents a view of how the process changes over time.

Control charts are very similar to run charts, but they also include control limits and often other zones. For example, there may be horizontal red lines at +/- 3 standard deviations from the process mean representing the control limits, and additional horizontal lines marking +/- 1 and +/- 2 standard deviations. The number of standard deviations is often simply referred to as sigma. The regions between the process mean and +/- 1 sigma may be referred to as Zone C, between 1 and 2 sigma as Zone B, and between 2 and 3 sigma as Zone A.

The control chart is a very important graphical tool in SPC; it is used to monitor processes to check that they are “in control.” It is important to understand that the control limits do not relate to the product specification or tolerance in any way. They simply show the variation of the process when it is under control, so that its current operation can be compared with that state. Process capability is also important and should have been established during phase 1 of SPC, when the process is set up. The control chart is used during phase 2 to ensure that the process remains stable.
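To make this concrete, here is a minimal sketch of how the center line, zones and control limits might be drawn for individual measurements (the data values are invented for illustration, and the overall sample standard deviation is used as a simple stand-in for the moving-range estimate an individuals chart would normally use):

```python
import statistics
import matplotlib.pyplot as plt

# Assumed phase 1 measurements from a stable process (illustrative values)
data = [10.02, 9.98, 10.05, 9.97, 10.01, 10.03, 9.99, 10.00, 9.96, 10.04,
        10.01, 9.98, 10.02, 10.00, 9.97, 10.03, 9.99, 10.01, 10.02, 9.98]

center = statistics.mean(data)
sigma = statistics.stdev(data)     # sample standard deviation (n - 1)

# Plot the measurements in time order
plt.plot(range(1, len(data) + 1), data, marker="o")

# Center line, zone boundaries at +/- 1 and 2 sigma, control limits at +/- 3 sigma
plt.axhline(center, color="green")
for k, style in [(1, ":"), (2, "--"), (3, "-")]:
    plt.axhline(center + k * sigma, color="red", linestyle=style)
    plt.axhline(center - k * sigma, color="red", linestyle=style)

plt.xlabel("Sample number")
plt.ylabel("Measured value")
plt.title("Control chart sketch")
plt.show()
```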

A control chart makes it easy to spot when a process is drifting or producing errors which cannot be explained by normal random variations. For example, if several points are all increasing or decreasing then this would indicate the process is drifting out of control. 

Different rules may be applied but, in general, if any of the following conditions is true then it indicates that the process is out of control (a sketch showing how a couple of these checks might be automated follows the list):

  • A point is outside control limits

  • 7 consecutive points on same side of center line

  • 7 consecutive intervals increasing or decreasing

  • 2 out of 3 consecutive points in Zone A or beyond, on the same side of the center line

  • 4 out of 5 consecutive points in Zone B or beyond, on the same side of the center line

  • 14 consecutive points alternate up and down

  • 14 consecutive points within Zone C (on either side of the center line)
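As mentioned above, here is a minimal sketch of how the first two of these rules could be checked automatically (the data, center line and sigma are assumed values for illustration):

```python
def outside_limits(values, center, sigma):
    """Indices of points beyond the +/- 3 sigma control limits."""
    return [i for i, v in enumerate(values) if abs(v - center) > 3 * sigma]

def run_on_one_side(values, center, run_length=7):
    """Indices where `run_length` consecutive points sit on the same side of the center line."""
    flags, streak, last_side = [], 0, 0
    for i, v in enumerate(values):
        side = 1 if v > center else -1 if v < center else 0
        streak = streak + 1 if side != 0 and side == last_side else 1
        last_side = side
        if side != 0 and streak >= run_length:
            flags.append(i)
    return flags

# Assumed example data showing an upward drift towards the end
values = [10.0, 9.9, 10.1, 10.0, 10.2, 10.1, 10.2, 10.3, 10.2, 10.3, 10.4, 10.5]
center, sigma = 10.0, 0.1

print("Points outside control limits:", outside_limits(values, center, sigma))
print("Runs on one side of center:   ", run_on_one_side(values, center))
```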

Different types of control charts are used to monitor different types of processes with different sampling strategies. For example, Individuals with Moving Range (IMR) for individual real time measurements, X-bar R or X-bar S when regular samples are taken, and Np/p for attribute data. I’ll cover the different types of control chart and other details of SPC in future posts.