1 Introduction and scope

Bottleneck detection in manufacturing is the first and most essential step to improve overall manufacturing capacity. Yet as detailed in the paper below, existing methods lack either accuracy or practicability, or both. This paper aims to detect the bottleneck in flow lines. The presented methodology was developed by Roser at the Robert Bosch GmbH, where it is known as the bottleneck walk. The method allows the continuous improvement of the system capacity. It is assumed that the flow lines have defined buffers between processes and are not equipped with electronic data-monitoring systems. The latter assumption is based on the authors’ practical experience, where most production lines are not equipped with electronic data-monitoring systems appropriate for bottleneck detection for three reasons:

  • Flow lines are often combinations of manual and automatic processes. However, live data of manual processes are usually difficult to obtain and hence not available, even for the rare circumstances where this would be permitted by work councils.

  • Not every station is equipped with a suitable electronic system or an overall system network.

  • Even if stations are equipped with data-monitoring equipment, the information gathered is usually insufficient for bottleneck detection and lacks key information.

Therefore, the described method not only contains the method for evaluation of shop floor bottleneck data, but also describes a process on how to raise the data on the shop floor.

2 Bottleneck fundamentals

2.1 Bottleneck definitions

The importance of improving bottlenecks has been recognized and described by several authors [1–4]. However, the prerequisite for improving the bottleneck is to find the bottleneck in the first place (bottleneck detection). Hence, before searching for the bottleneck, it is important to first clearly define what a bottleneck is. A number of bottleneck definitions are available in the literature:

  1. Krajewski et al. [5] describe a bottleneck as a function that limits output.

  2. Chase and Aquilano [6] call it a resource whose capacity is lower than the demand, or the process that limits throughput.

  3. Roser et al. [7, 8] define the bottleneck as a stage in a system that has the largest effect on slowing down or stopping the entire system.

  4. Kuo et al. [9] observe that on the shop floor, a bottleneck is often defined as the machine whose production rate in isolation is the smallest among all the machines in the system.

  5. Kuo et al. [9] also observe that, alternatively, on the shop floor a bottleneck is often defined as the machine with the largest work-in-process inventory in the preceding buffer.

  6. Kuo et al. [9] finally define the bottleneck as the process whose sensitivity of the system’s performance index to its production rate in isolation is the largest, as compared to all other processes.

Definitions 1 and 2 deliver a basic understanding of bottlenecks, but are not precise enough for shop floor application. Definition 4 is limited to static systems, whereas definition 5 is only an indirect measure via inventory and is hence subject to other influences that can result in flawed bottleneck detection. Although these influences are often negligible in practice, the authors have also seen instances where they could not be ignored. Definition 6 is the one with the highest accuracy, as proven by Kuo et al. [9], while at the same time being general enough to be accepted as a basic definition of bottlenecks for manufacturing systems.

However, most of these definitions do not take the shifting of bottlenecks into account. Yet, in dynamic systems, bottlenecks do shift. Hence, we expand the definitions by Krajewski et al. [5], Roser et al. [7, 8], and Kuo et al. [9] to include both multiple bottlenecks and a measure of influence on the system by defining the bottleneck as follows:

Bottlenecks are processes that influence the throughput of the entire system. The larger the influence, the more significant the bottleneck.

The authors distinguish between momentary bottlenecks and long-term bottlenecks. The momentary bottleneck may be in different processes at different times. Hence, more than one process can influence the overall system throughput. The degree of influence of a process on the entire system—and hence, the long-term bottleneck influence of this process—depends on the duration of time this process is a momentary bottleneck as proven by Roser et al. [10].

2.2 Degree of influence of a single process on the entire system

Since in dynamic systems bottlenecks shift, more than one process is likely to be a bottleneck using the definition above. Therefore, it is of interest to compare the relevance of the bottlenecks. The larger the bottleneck, the larger its influence on the system throughput. While this sensitivity required by Kuo et al. [9] is difficult to obtain analytically, it can be obtained experimentally by comparing the system behavior for different cycle times.

In any case, the influence of the process on the overall system performance depends heavily on the speed of the process. Figure 1 shows the different possibilities of influence. If the process has a fast time between parts, it is likely that other processes in the system will in combination always be slower than the observed process. Hence, any change in the process speed has no influence on the system speed, and the process is not a bottleneck since the maximum speed of the remaining system excluding the observed process is slower than the process.

Fig. 1 Relation between process speed and system speed under consideration of the bottleneck

As the process becomes slower and its time between parts increases, however, it will start to have an influence on the system speed. Hence, the process is now a partial bottleneck. The slower the process becomes, the larger its influence. Eventually, the process is always slower than any other part of the system. From then on, any further increase in the time between parts of the process leads to an equal increase in the time between parts of the entire system. The process is now the only bottleneck.

For a numerical example, assume that the process under consideration is able to produce on average one part every 5 s and the remaining production system is able to produce on average one part every 60 s. In this case, the process under consideration is unlikely to ever be the bottleneck. If the process becomes slower and approaches 60 s between parts, it is more and more likely to influence the overall speed of the system and will sometimes be the bottleneck. If the process becomes even slower and is significantly slower than the rest of the system, it is likely that this process is always the bottleneck, and the remaining system always has to wait for this slow process.

As for determining the degree of influence of a process on the entire system through simulation, we change the speed of the process and observe the change in the speed of the entire system. The gradient of this relation in percent represents the degree of influence of the process on the entire system. This can also be seen as the sensitivity of the system speed to the speed of a single process.

The degree of influence of the process on the entire system can be described by the gradient of the curve. A non-bottleneck has no influence and a gradient of 0 %. If there is only one dominating bottleneck, its gradient is 100 %. In most real-world systems, however, all processes have a gradient less than 100 %, and more than one process has a gradient above 0 %.

Hence, the degree of influence of a process onto the system can be between 0 and 100 %. The shape of the curve depends heavily on the details of the system. Additionally, for static systems, these graphs have sharp corners, where the gradient changes from 0 to 100 % the instant the process becomes the slowest process in the system.
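The static case can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the time between parts of a static system is the maximum of the time between parts of the observed process and that of the remaining system; the function names, the finite-difference step, and the 60 s value for the remaining system are assumptions taken from the numerical example above, not part of the original method. Dynamic systems with random variation would require a simulation instead.

```python
def system_time(process_time: float, rest_time: float) -> float:
    """Time between parts of a static line: the slower element governs."""
    return max(process_time, rest_time)


def degree_of_influence(process_time: float, rest_time: float, step: float = 0.01) -> float:
    """Gradient of system time with respect to process time, in percent.

    Uses a central finite difference, i.e. the process speed is changed
    slightly and the change in system speed is observed.
    """
    up = system_time(process_time + step, rest_time)
    down = system_time(process_time - step, rest_time)
    return 100.0 * (up - down) / (2 * step)


REST_TIME = 60.0  # remaining system: one part every 60 s (assumed, as in the example)
for t in (5.0, 55.0, 65.0, 90.0):
    print(f"process {t:5.1f} s -> influence {degree_of_influence(t, REST_TIME):.0f} %")
```

In this static sketch the gradient jumps from 0 to 100 % at 60 s, which is the sharp corner mentioned above. In a dynamic system the transition would be gradual.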

2.3 Blocking and starving

Kuo et al. [9] also state that definition 6 cannot directly be measured on the shop floor. The major accomplishment by Kuo et al. [9] is the proof that an evaluation of the processes being “blocked” or “starved” will find the bottleneck according to definition 6. These states can be defined as follows:

  • Blocked: a process has to stop because its subsequent buffer or process is full.

  • Starved: a process has to stop because its preceding buffer or process is empty.

Each process may at different times be blocked or starved, or neither blocked nor starved. In a production line, the frequencies of blockage and starvation of adjacent processes can be compared. According to Kuo et al. [9], if the upstream process has a higher frequency of blockage than the downstream process has of starvation, then the process between the upstream and downstream processes is the bottleneck.

For practical purposes, please also note that a bottleneck does not necessarily have to be in a production process itself. It can also be (and in our experience frequently is) in a logistics process that supplies processes. Furthermore, it can even be a process within the information flow (regardless of push or pull systems).

3 Common bottleneck detection methods in industry

3.1 Process Time

The process-time approach measures the process times in the material flow under isolated conditions. This method offers a simple and fast way to detect the bottleneck. But the method detects only the static bottleneck—the capacity limit of the flow line. This method does not include any losses and therefore does not detect the bottleneck, but rather the maximum capacity under ideal conditions. Variations in this method are, for example, the X-factor theory [11].
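As a minimal sketch of the process-time approach, the following Python snippet simply picks the process with the longest isolated process time. The process names and times are hypothetical illustration data. As noted above, the result is only the static capacity limit under ideal conditions, not necessarily the bottleneck of the running line.

```python
def static_bottleneck(process_times: dict[str, float]) -> str:
    """Return the process with the longest isolated process time."""
    if not process_times:
        raise ValueError("at least one process time is required")
    return max(process_times, key=process_times.get)


# Hypothetical isolated process times in seconds per part.
times = {"A": 2.0, "B": 2.4, "C": 2.1, "D": 1.9}
print(static_bottleneck(times))
```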

3.2 Utilization- or OEE-based approaches

Approaches using utilization [12–14] or related overall equipment effectiveness (OEE) measures enhance the process-time approach by including performance losses. The bottleneck detection focus lies in the analysis of the gap between net production time and total time. The main flaw of this method is that it is based on averages and cannot detect shifting bottlenecks in dynamic systems.

3.3 Simulation

Simulation is an experimental procedure for modeling a system and its dynamic processes in a software model that can be experimented with in order to gain knowledge. This knowledge can then be transferred back to reality. A simulation enables the user to model a system, even if it has not been built yet. Afterward, the user is able to test the system under a variety of conditions [15].

A simulation basically allows for detection of bottlenecks especially when the combination of elements prohibits other classic bottleneck detection. Furthermore, the ability of the simulation software to visualize material flow design increases the system’s acceptance within the management [15].

For practical bottleneck detection in the environment described in this work, making assumptions is one of the key problems for the application of simulation software. While average process times are often reasonably well known, statistical data on process time are usually rather difficult to obtain. Hence, the data quality is often insufficient for the level of precision required for bottleneck detection. Therefore, simulation can be excluded as a basis for a detection methodology.

3.4 Active period method

The average active period method [16] and the active period method [17, 18] by the primary author are based on the duration a process is working without interruptions due to waiting for parts or transport. The average active period method defines the bottleneck as the process with the longest average active period, while the active period method defines the momentary bottleneck as the process with the momentarily longest active period. These methods work well and are able to determine the overall effect of processes on system capacity, and have also been used for additional tasks as, for example, buffer optimization [19]. On the downside, these methods require extensive process-related data that may or may not always be available. As such, they are only useful if the data are available. The presented bottleneck walk is based on these reliable methods while avoiding the extensive data requirement.
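The idea behind the average active period method can be sketched in Python as follows. This is a simplified illustration, not the implementation from [16]: it assumes each process state is logged at equally spaced sample times, treats every state other than "starved" or "blocked" as active, and measures period lengths in samples rather than seconds. All names and data are hypothetical.

```python
from itertools import groupby

WAITING = {"starved", "blocked"}  # states that interrupt an active period


def active_periods(states: list[str]) -> list[int]:
    """Lengths (in samples) of consecutive runs of non-waiting states."""
    return [
        sum(1 for _ in run)
        for is_active, run in groupby(states, key=lambda s: s not in WAITING)
        if is_active
    ]


def average_active_period(states: list[str]) -> float:
    periods = active_periods(states)
    return sum(periods) / len(periods) if periods else 0.0


def average_active_period_bottleneck(log: dict[str, list[str]]) -> str:
    """Process with the longest average active period."""
    return max(log, key=lambda name: average_active_period(log[name]))


# Hypothetical state log, one entry per sample time.
log = {
    "A": ["working", "working", "blocked", "working", "blocked", "blocked"],
    "B": ["working", "breakdown", "working", "working", "working", "starved"],
    "C": ["starved", "working", "starved", "working", "working", "starved"],
}
print(average_active_period_bottleneck(log))
```

The sketch also shows the data requirement discussed above: a continuous state log is needed for every process.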

3.5 Summary

The overview above focused on methods used in industry. Overall, the practically applicable methods cannot detect shifting bottlenecks in dynamic and unstable shop floor environments and are hence unsuitable for industry. Of course, there are numerous other methods described in the academic literature, although in our experience they are infrequently used in industry (see, e.g., [20, 21], for a recent overview of methods). Still other methods consider not only throughput but also other objectives, including throughput time, reliability, and WIP [22].

4 The bottleneck detection methodology

4.1 Basic methodology

The bottleneck walk is based on observations of different process and inventory states [23]. These data are gathered during a walk along the flow line. The collected data are evaluated in a systematic process. The result of these two steps is a ranking of bottleneck sets that limit the output of the flow line during the period observed.

4.2 Observation of process states

When observing a process, it cannot be determined by one observation alone if the process is the bottleneck. If the process is working, it may or may not be the bottleneck. If the process has an ongoing breakdown, it may or may not be the bottleneck. If the operator is absent, it may or may not be the bottleneck. However, it can be clearly stated when it is not the bottleneck. Whenever the process is waiting, it cannot be the bottleneck, since the process is waiting on another process. The process could work more but is slowed down by the bottleneck. Furthermore, from this observation of a waiting process, it can be determined in which direction the bottleneck needs to be searched next. If a process is waiting for parts (starved), then the bottleneck must be upstream. If a process is waiting for transport (blocked), then the bottleneck must be downstream. The list below gives an overview of different possible system states and the conclusion about the bottleneck.

  • May be the bottleneck: working; breakdown; setup; maintenance; scheduled break, etc.

  • Starved: bottleneck is upstream.

  • Blocked: bottleneck is downstream.
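The list above amounts to a small lookup rule, sketched here in Python. The state labels and the `Direction` type are hypothetical names chosen for illustration. Any state other than starved or blocked gives no direction, because the process may or may not be the bottleneck.

```python
from enum import Enum


class Direction(Enum):
    UPSTREAM = "upstream"
    DOWNSTREAM = "downstream"
    UNKNOWN = "unknown"  # process may or may not be the bottleneck


def direction_from_process_state(state: str) -> Direction:
    """Map an observed process state to the direction of the bottleneck."""
    state = state.strip().lower()
    if state == "starved":   # waiting for parts
        return Direction.UPSTREAM
    if state == "blocked":   # waiting for transport
        return Direction.DOWNSTREAM
    return Direction.UNKNOWN  # working, breakdown, setup, maintenance, ...


for s in ("working", "starved", "blocked", "breakdown"):
    print(s, "->", direction_from_process_state(s).value)
```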

While detecting the process state, waiting for the end of the process time is essential to ensure precision. The moment after the process time ends, when the part is transferred to the next station, tells the observer what the actual state is. This wait is unnecessary if the machine state is obvious and will not change within the length of a process time.

4.3 Observation of inventories

The second source of information is the inventories. These also give hints to the direction of the bottleneck. If the buffer between two processes is full or rather full, the bottleneck is probably downstream where the parts go to, assuming of course a fixed buffer size. Similarly, if the buffer is empty or rather empty, the bottleneck is probably upstream where the parts come from. If the inventory is half full, the bottleneck may be in either direction. While this information is probable, it is not absolutely certain that the momentary bottleneck is upstream or downstream. For practical purposes, however, the information is still relevant. As with the processes above, the inventories can give us the direction of the bottleneck.

A clearly defined buffer can be filled between 0 and 100 %. Here it is necessary to decide at which point the bottleneck is considered to be upstream, downstream, or unknown. It is important to acknowledge that for inventory levels around half capacity, the bottleneck direction is highly uncertain. Hence, around half capacity, no valid statement can be made.

The closer the capacity is to one extreme, the more likely the bottleneck is in the corresponding direction, but the chances of observing the direction become less likely. Hence, a trade-off has to be made between accuracy and observability. From the authors’ practical experience, a one-third approach worked well. If the buffer is below one-third full, then the bottleneck is probably upstream. If the buffer is above two-thirds full, then the bottleneck is probably downstream. If the buffer is between one-third and two-thirds full, then there is not enough information to assume a bottleneck upstream or downstream. Of course other trade-offs are also possible. Especially in the case of small buffers, the rule of one-third often cannot be followed due to rounding problems.
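The one-third rule can be written down as a short Python sketch. The function name and the adjustable thresholds are assumptions for illustration; as noted above, other trade-offs are possible, and for small buffers the limits are better fixed as whole part counts beforehand.

```python
def direction_from_buffer(parts: int, capacity: int,
                          lower: float = 1 / 3, upper: float = 2 / 3) -> str:
    """Direction of the bottleneck suggested by a buffer fill level.

    Below `lower`: probably upstream. Above `upper`: probably downstream.
    In between: not enough information.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0 <= parts <= capacity:
        raise ValueError("parts must be between 0 and capacity")
    fill = parts / capacity
    if fill < lower:
        return "upstream"
    if fill > upper:
        return "downstream"
    return "unknown"


for parts in (1, 4, 8):
    print(parts, "of 9 ->", direction_from_buffer(parts, 9))
```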

4.4 The walking process

The bottleneck walk passes along the observed flow line and monitors the data of different processes and inventories as described above. In the authors’ experience, it is sometimes better to walk against the flow of material to avoid walking “with” a single part. This, however, is not a fixed requirement for the bottleneck walk. Furthermore, in practice, it is helpful to select the spots to be observed beforehand.

Of course with shifting bottlenecks, it is possible that the shift of the bottleneck overlaps with the walk, as the data are gathered sequentially (by walking) and not concurrently. However, in our experience, even for systems with small buffer inventories of less than 5 pieces and rapid cycle times of less than 3 s, a bottleneck shift happens less than once per minute. Hence, the likelihood of a bottleneck shifting while the processes involved are under observation is possible but unlikely. Furthermore, in practice, a shift can also be observed during the walk.

4.5 The evaluation process

Observing the waiting times of processes and the inventory levels will yield consistent information about the bottleneck direction. To combine these pieces of information into a picture, a data sheet as shown in Fig. 2 is used. A similar data sheet can easily be constructed for other systems, with an example shown in Fig. 3. All observed processes and buffers are listed in sequence on the top of the sheet, with a separate column for every observed spot. The example shows a common flow line with buffers in between.

Fig. 2 Example data sheet for bottleneck detection

Fig. 3 Generic blank data sheet for bottleneck detection of up to nine processes in sequence

During the bottleneck walk, the observer walks along the line, writing down the inventory levels and process states in one line of the data sheet each round. For practical purposes, the process states are abbreviated with “W” for waiting, “P” for processing, “B” for breakdown, and so on. Subsequently, for every buffer or process where the direction of the bottleneck can be determined, an arrow is drawn on the data sheet in the direction of the bottleneck. The bottleneck then must be between the arrows pointing toward each other. Circling the bottleneck with a red box visualizes the finding.
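The arrow evaluation of one line of the data sheet can be sketched in Python. This is an illustrative reading of the procedure, not code from the original work: each observed spot is given as a hypothetical `(name, arrow)` pair in material-flow order, where `">"` means the bottleneck is downstream, `"<"` means upstream, and `"?"` means no direction could be determined. The start and end of the line are treated as open boundaries, so a bottleneck outside the observed scope shows up as a gap at the line start or line end.

```python
def enclosed_groups(spots: list[tuple[str, str]]) -> list[list[str]]:
    """Return the groups of spots lying between arrows that point toward each other."""
    groups = []
    current = []            # spots without a direction since the last arrow
    opened = True           # left boundary is '>' or the start of the line
    prev_name = "line start"
    for name, arrow in spots:
        if arrow not in ("<", ">", "?"):
            raise ValueError(f"unknown arrow {arrow!r} at {name}")
        if arrow == "?":
            current.append(name)
            continue
        if arrow == "<" and opened:
            # Arrows point toward each other: bottleneck is in between.
            groups.append(current or [f"gap between {prev_name} and {name}"])
        current = []
        opened = arrow == ">"
        prev_name = name
    if opened:
        groups.append(current or [f"gap between {prev_name} and line end"])
    return groups


# One hypothetical walk along a line A - buffer - B - buffer - C - buffer - D.
walk = [("A", ">"), ("A-B", ">"), ("B", ">"), ("B-C", ">"),
        ("C", "?"), ("C-D", "<"), ("D", "<")]
print(enclosed_groups(walk))
```

A group may contain buffers as well as processes; only the processes in it are bottleneck candidates. More than one group in a single walk indicates a bottleneck that is currently shifting.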

Repeating a string of observations multiple times will give a picture of the shifting bottleneck over time, and it will be easy to determine where the bottleneck most frequently was. In addition, the observations will also give clues to why a process became the bottleneck. In the example of Fig. 2, process C seems to be the most frequent bottleneck and is usually processing a part when it is the bottleneck. Hence, it appears that the process time of process C causes process C to become the bottleneck [24].

For quantitative evaluation, we suggest calculating the bottleneck frequency of each process: the number of measurements in which the arrow evaluation identified the process as the bottleneck, divided by the total number of measurements. The process with the largest bottleneck frequency is the primary bottleneck and should be the focus of future improvement activities.
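The bottleneck frequency is a simple ratio, sketched below in Python. The input format (one list of bottleneck processes per walk) and the sample data are hypothetical. A walk that indicates two bottlenecks counts once for each of them, so the frequencies can sum to more than 100 % when bottlenecks are shifting.

```python
from collections import Counter


def bottleneck_frequencies(walks: list[list[str]]) -> dict[str, float]:
    """Share of walks in which each process was identified as a bottleneck."""
    if not walks:
        raise ValueError("at least one walk is required")
    counts = Counter(process for walk in walks for process in set(walk))
    return {process: n / len(walks) for process, n in counts.items()}


# Hypothetical result of ten bottleneck walks.
walks = [["C"], ["A"], ["C"], ["B"], ["C"], ["A"], ["C"], ["B"], ["A"], ["C"]]
freq = bottleneck_frequencies(walks)
primary = max(freq, key=freq.get)
print(freq, "primary bottleneck:", primary)
```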

To gain further information, it is also advised that the observers look at the bottleneck immediately after each observation and try to understand why the process became the bottleneck right then. These insights will be invaluable for later improvements of the bottleneck.

As above, while in theory this all sounds very straightforward, in reality there are again some additional points to remember. First, there may be more than one bottleneck indicated in one line as, for example, in measurement 3 of Fig. 2 above. This simply means that the bottleneck is currently shifting. Two or more processes are a bottleneck for a part of the line, and yet, it is unknown which bottleneck process will eventually dominate the other bottleneck process. However, it will be one of the processes indicated as bottlenecks.

Secondly, as shown in measurement 2 of Fig. 2 above, the area between the arrows pointing to a bottleneck may cover more than one process. In this case, any of the processes between the arrow tips may be the bottleneck. Similarly, the arrows may point to the gap between two observations as shown in line 4 of Fig. 2. It may be that the bottleneck shifted just while the observer was walking past these two points taking data. However, in the authors’ experience, it is much more likely that there is a small process in between that has not been studied in detail. This may be, for example, a transport process or another secondary process that is the bottleneck at that time. Since this happens rather frequently in practice, the data sheets in Figs. 2 and 3 above have a double vertical line between observation spots to remind the user that there may be something else that was not looked at in detail.

Finally, as in measurement 5 of Fig. 2 above, it is also possible that the bottleneck is outside the scope of the observations and the entire system may be slowed down by a lack of demand or supply.

4.6 Examples

The authors have used this method successfully in over 20 different production lines to detect the bottleneck. In roughly half of the cases, the true bottleneck differed from the expectation of the management, and in about one-third of the cases, the bottleneck was in a previously unobserved secondary logistic process. Cycle times ranged from 2 s to 15 min, with between 10 and 30 processes in the lines. In the following, two examples are presented to illustrate the procedure and its advantages compared to other approaches.

The first example in Fig. 4 shows a very basic case of an assembly line for a valve. This valve is assembled on a fully automated line with four major stations, each having similar cycle times. The cycle time was very fast, with one part being produced every 2 s. The buffers between the stations were very small, often only three to five parts. The combination of fast and similar cycle times with small buffers led to rapidly changing bottlenecks.

Fig. 4 Bottleneck detection sheet of a fast-changing valve assembly line. For the sake of clarity, the number of parts in each buffer has been omitted and only a limited number of ten observations are shown

Despite the fast cycle time, it was quite possible to observe waiting times in processes. In preparation for the bottleneck walk, we selected one or two spots at each station where the waiting times could be observed easily.

For example, when an arm adding a spring returned to its rest position, the part started moving to the next station. Whenever there was a small delay between the arm returning and the part moving, the process was waiting for another process downstream. Or a verification process ended with a small light going from red to green, upon which the next part was released from the stopper. When the light turned green and there was no part at the stopper, then the process was waiting for material upstream.

Similarly, the maximum buffer capacities of selected buffers were measured and the quantities for bottleneck upstream/downstream/unknown were decided. After these preparations, the actual bottleneck walks took only 3 min each. Due to the nature of the system, the bottlenecks changed quickly, with the bottleneck moving to a different process roughly every 10 min. Nevertheless, during almost every walk, the actual bottleneck was very clear. The observations were distributed over multiple days. With such a fast-changing system, it would also have been possible to observe with shorter intervals between observations, although in that case the results would of course only represent the behavior during the observed period.

However, not only was it possible to observe such rapidly shifting bottlenecks in action, but through multiple observations it was also possible to determine the likelihood of each process being the bottleneck. Process C was frequently the bottleneck (50 %), with two other processes, A and B, being occasionally the bottleneck (30 and 20 %, respectively). The last process was never the bottleneck. The table in the lower part of Fig. 4 shows the direction of the bottleneck and the bottlenecks in black for ten bottleneck walks. While not all data points gave a direction, for each walk the bottleneck was very clear.

This rapidly changing bottleneck was very easy to observe using the bottleneck walk, but would have been difficult or impossible to determine using traditional methods such as line-balancing charts, average cycle times, or inventories. These methods find bottlenecks only in the processes that are directly observed, missing bottlenecks in processes that are not under observation.

In the second example, the capacity of a highly automated assembly line producing electronic components needed to be improved. The line consisted of different individual workstations, with the parts transported via workpiece carriers and coupled by conveyor belts as shown in Fig. 5. The second to last station consisted of two parallel quality control processes for capacity reasons. Plant management believed these quality control processes to be the bottleneck based on cycle time and a large queue of material waiting for these stations. However, despite significant effort to reduce the process time, the overall capacity did not improve.

Fig. 5 Diagram of the material flow of automotive component assembly line

The analysis showed that while there was usually a long line before the quality control stations, these stations had a very short waiting time for material after almost every part. The time was barely noticeable, being around 0.2 s of a 3-s cycle time. Nevertheless, the stations were waiting for material despite the long queue of material before them.

It turned out that a small device was moving the workpiece carriers to one or the other of these two parallel quality control stations. This transfer device was the bottleneck. As an otherwise insignificant secondary process, it had until then completely escaped attention. Only the bottleneck walk was able to determine the bottleneck reliably in a minor process that was not even part of the investigation. In our experience, between 30 and 50 % of all bottlenecks are in such secondary transport-related processes and, as such, are ignored by all other bottleneck detection efforts. The result of an exemplary bottleneck walk is shown in Fig. 6 below, where the arrows point to the bottleneck in a previously unobserved spot between the buffer and the quality control.

Fig. 6 Exemplary bottleneck-walk result of automotive component assembly line shown in Fig. 5. Bottleneck was detected in an unobserved logistics process. For the sake of clarity, the bottleneck direction has been noted directly in the material flow graph

4.7 Application

The method is based on multiple observations, so the number of observations is a core issue for its application. The method has two target groups: first, industrial engineers, whose task is to improve manufacturing systems personally; and secondly, frontline managers (first level of leadership).

For these managers, two or three observations per day are a viable approach. Since those managers frequently cross the manufacturing site, they often have the opportunity for a single measurement using the bottleneck walk methodology. This methodology allows gathering information in a structured way. Therefore, it allows for a focused approach based on shop floor observations without requiring a large investment of time. The following practical rules for frequency of observation are given from shop floor experience.

  • The frequency of shifting bottlenecks increases for systems with shorter system cycle times, i.e., a system that produces a part every 5 s shifts more frequently than a system that produces a part every 2 h.

  • The frequency of shifting bottlenecks increases for systems with smaller buffer inventories, i.e., a system that has only two parts between processes shifts more frequently than a system that has ten parts.

  • The more balanced the cycle times of the processes are, the higher the frequency of bottleneck shifts.

  • The more frequently a bottleneck shifts, the less delay is needed between observations. Less frequently shifting bottlenecks need larger delays between observations to reduce the likelihood of measuring the same pattern twice.

5 Conclusion

Bottleneck detection is a critical part of the continuous improvement process. Unfortunately, commonly used bottleneck detection methods are woefully inadequate for practical use, lacking either validity or usability or both. In two-thirds of all bottleneck detections done by the authors, the bottleneck was in a process different from what the managers of the line believed.

The presented bottleneck walk provides a framework for a simple yet accurate bottleneck detection method. For accurate bottleneck detection, it is necessary to determine the momentary bottleneck before making statistical conclusions. There are few methods that can detect the momentary bottleneck reliably, yet this is a key requisite for bottleneck detection in dynamic systems. Bottleneck detection methods that use averages overlook shifting bottlenecks. For a comparison of methods and their accuracy, see the forthcoming paper [25], where the bottleneck walk outperformed all conventional methods and was second only to the mathematically more demanding average active period method. If the duration of the averages is reduced and the observation is repeated frequently, then the effect of inventory buffers will likely diminish the accuracy of the observations.

The key advantage of the bottleneck walk, besides its accuracy, is its simplicity. No stopwatches or formulas are necessary for this approach. The bottleneck walk is effective and can be quickly applied. Furthermore, the bottleneck walk can also be easily taught to shop floor operators even without knowledge of mathematics. For this reason, it enables quick improvement cycles as demanded by the concept of lean production [26] and discussed in [24]. The method was thoroughly field tested in different manufacturing plants, providing a reliable and practical way to find the bottleneck.

The method has been developed based on research done at the Toyota Central Research and Development Laboratories, Japan, and the Robert Bosch GmbH, Germany.

6 Further research

Further research will concentrate on extending the bottleneck walk methodology to types of manufacturing other than flow lines. Since the process states can be easily obtained in flow lines, other manufacturing types are the next challenge for a transfer of this methodology.