In case it helps anybody, consider the following analogy:
Imagine you are going to a polling station to cast your vote. Inside the polling station, one person is assigned the job of checking your voter ID card and registering your name. On the adjacent bench, perhaps, another person puts the ink mark on your index finger. There might be several more such people in charge (each doing a particular job) before you finally stand in front of the EVM (electronic voting machine) to cast your vote. All these people (and the EVM) can be considered stages of a (voting) pipeline.
Now, let's say you go to the polling station at 3pm on a Sunday afternoon. Chances are you would encounter a sea of people there. Imagine what would happen if the people in charge refused to allow any queues inside the polling station. In that case, there would be a dreadfully long queue of agitated people waiting outside the entry gate. If, instead, the people in charge did allow queues inside the polling station, then the queue outside would be shorter, and people would be less agitated seeing the crowd move more quickly (through the stages of the voting pipeline).
Drawing a comparison from the above analogy, we can imagine what would happen if the stages of a hardware pipeline did not have buffers. The processes/instructions would start getting blocked just trying to enter the pipeline! When earlier instructions incur multiple stall cycles, the later instructions can’t even enter the pipeline. Given that a pipeline is used by a large number of instructions of various kinds, and over a long period of time, instructions with multiple stall cycles are encountered quite frequently. If this happens, what remains of the benefits that a pipeline promises? To keep everything moving, a practical hardware pipeline must implement stage buffers. Yes, this does increase the delay in the system. Yes, processes/instructions can still get blocked before entering the pipeline. But it’s still far better than the former case of not using stage buffers!
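If it helps to see the effect in numbers, here is a rough toy model in Python. It is only a sketch and not how any real processor works: the names simulate, service and buffer_size are made up for this illustration, everybody is assumed to be waiting outside at cycle 0, and an item that has finished a stage is assumed to stay put (blocking that stage) until the next stage or its queue has room.

```python
def simulate(service, buffer_size):
    """Toy model of a pipeline whose stages may have waiting slots in front.

    service[i][s] : cycles item i needs in stage s (hypothetical numbers)
    buffer_size   : waiting slots in front of every stage after the first
    Returns (entry, finish): the cycle at which each item enters the
    pipeline, and the cycle at which the last item leaves it.
    """
    n_items, n_stages = len(service), len(service[0])
    # leave[i][s] = cycle at which item i vacates stage s
    leave = [[0] * n_stages for _ in range(n_items)]
    entry = []
    for i in range(n_items):
        for s in range(n_stages):
            arrived = leave[i][s - 1] if s > 0 else 0      # came from previous stage
            stage_free = leave[i - 1][s] if i > 0 else 0   # previous item moved on
            start = max(arrived, stage_free)
            if s == 0:
                entry.append(start)
            done = start + service[i][s]
            # The next stage plus its queue holds buffer_size + 1 items, so
            # item i can move on only after item j has left the next stage.
            j = i - buffer_size - 1
            if s + 1 < n_stages and j >= 0:
                done = max(done, leave[j][s + 1])
            leave[i][s] = done
    return entry, leave[-1][-1]


# Five items, three stages. Item 0 stalls in the last stage and
# item 3 stalls in the first stage (4 cycles instead of 1).
service = [[1, 1, 4], [1, 1, 1], [1, 1, 1], [4, 1, 1], [1, 1, 1]]

print("no buffers  :", simulate(service, 0))
print("with buffers:", simulate(service, 2))
```

In this made-up example the later items get into the pipeline sooner when buffers are present, and the two stalls overlap instead of adding up, so the whole batch finishes earlier. With perfectly uniform stage times the buffers would make no difference in this model; they only pay off when stalls occur.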