pump acututre

Python, pandas, librosa, SciPy, FastAPI, Docker, Podman, Traefik
What Happens Every Day on the Factory Floor

I have spent years around manufacturers and repair shops. When a critical machine dies on the floor, you see one of two outcomes. Either the plant has a spare on hand — a VFD, an extra pump, a motor — and the crew swaps it in within hours; the failed unit goes to a repair shop, or a new one is ordered to refill the emergency shelf. Or there is no spare, and the line stops while everyone waits.

I have also been inside the repair shops. If the failed machine is too big to ship back whole, the technician tests what fits on the bench, exercising only the functions the bench can recreate. The lesson, on both sides of the door, is the same: being prepared beats being surprised. Hear a part degrading weeks ahead and you have time to find it, order it, and schedule the swap on your own terms. Downtime collapses. That is the gap this project tries to live inside.


How These Failures Actually Happen

I have watched both scenarios play out. The good case: a pump drags on Wednesday, trips a breaker on Thursday, and by Friday the spare is bolted in and the original is on a pallet to the repair shop. Annoying, expensive, survivable. The bad case is the one nobody on the floor wants to call preventable — no spare, no warning anyone acted on, days of an idle line while a replacement waits its turn in the repair queue.

Both machines had been complaining for weeks. A bearing developed a faint, repetitive tick in the 2–4 kHz range. A seal wept under load. The shaft ran a fraction of a millimetre out of true and the casing picked up a low, periodic thump. None of it was loud. Nobody was listening for it anyway.

Pumps, motors, compressors — the unglamorous machinery the world runs on — almost never fail suddenly. They fail slowly, then all at once. For weeks before the "all at once" moment, they are telling anyone who will listen that something is wrong.

I spent a couple of months building a small system that listens. It takes a ten-second audio clip from a microphone near a pump and answers one question: does this pump sound healthy, or is it on its way out? This post is the story of how it works, what I got wrong before I got it right, and what the project taught me about the gap between "training a model" and "shipping a thing that actually works".

It is written for anyone curious — you don't need a background in machine learning or signal processing. I'll explain the jargon when it shows up, and there's a glossary at the end you can refer back to.

Image prompt for the header: A cinematic, slightly grainy photograph of a large industrial pump in a dimly lit factory, with stylised translucent sound waves emanating from its housing and curling toward a small microphone in the foreground. Warm amber emergency lighting, blue-grey shadows. Aspect ratio 16:9. No text.


Why Pumps Fail the Way They Fail

Reliability engineers call it the P-F curve. P is the potential failure point — the moment a defect first becomes detectable. F is functional failure — the moment the machine actually stops working. For a typical pump bearing, the gap is two to six weeks. That is the gap I wanted to live inside.

Most plants still maintain machinery one of two ways: run things until they break (cheap, until it isn't), or replace parts on a fixed schedule whether they need it or not (wasteful, but predictable). A third option — predictive maintenance — has existed for decades but used to require expensive vibration sensors, trained analysts, and a lot of patience. What changed recently is that the audio half of the problem became tractable. A $40 microphone and a laptop can now do what used to require a $2,000 accelerometer and a specialist.

The question is no longer "can we hear failure?" It is "can we automate the listening?"


The Data

To train any kind of audio classifier you need labelled examples — clips of pumps that are known to be healthy, and clips of the same pumps when something is wrong. Hitachi released exactly such a dataset in 2019 called MIMII (Malfunctioning Industrial Machine Investigation and Inspection). It contains thousands of recordings from real industrial pumps, fans, valves, and slide rails, with clear labels for normal and abnormal operation.

The pump subset I worked with covers four physical pumps, each recorded across roughly 800 to 1,150 ten-second clips:

Pump     Healthy clips   Faulty clips   Total
id_00    1,006           143            1,149
id_02    1,005           111            1,116
id_04      702           100              802
id_06    1,036           102            1,138
Total    3,749           456            4,205

Two things to notice. First, 4,205 clips is small by modern ML standards — large language models eat that for breakfast. Second, only about 11% of the clips are faulty. That class imbalance matters enormously, and I'll come back to it.
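To make the imbalance concrete, here is a small sketch that recomputes the faulty share from the table above. The "always healthy" baseline is my own illustration, not a model from the project: it shows how a classifier that never flags anything would still score well on plain accuracy, which is one reason accuracy alone can mislead on this dataset.

```python
# Clip counts per pump, copied from the table above: (healthy, faulty).
CLIP_COUNTS = {
    "id_00": (1006, 143),
    "id_02": (1005, 111),
    "id_04": (702, 100),
    "id_06": (1036, 102),
}


def faulty_fraction(healthy: int, faulty: int) -> float:
    """Share of clips that are labelled faulty."""
    return faulty / (healthy + faulty)


total_healthy = sum(h for h, _ in CLIP_COUNTS.values())
total_faulty = sum(f for _, f in CLIP_COUNTS.values())
overall = faulty_fraction(total_healthy, total_faulty)

# Accuracy of a hypothetical model that answers "healthy" for every clip.
always_healthy_accuracy = 1.0 - overall

for pump, (h, f) in CLIP_COUNTS.items():
    print(f"{pump}: {faulty_fraction(h, f):.1%} faulty")
print(f"overall: {overall:.1%} faulty")
print(f"always-healthy baseline accuracy: {always_healthy_accuracy:.1%}")
```

The baseline lands at roughly 89% accuracy while catching zero faults, so any metric used on this data has to look at the faulty class specifically.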

The clips themselves are ten seconds long, recorded at 16,000 samples per second, with eight microphones placed around each pump (one channel per microphone). I average those eight channels down to a single mono channel before doing anything else, which trades the ability to localise the fault (which side of the pump is it coming from?) for a better signal-to-noise ratio, because noise that differs from microphone to microphone partly cancels in the average. Worth it for a dataset this size.
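The downmix step can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the project's loading code: the 440 Hz tone, the noise level, and the function name `downmix_to_mono` are all made up, and the noise is assumed to be independent across microphones, which real factory noise only partly is.

```python
import numpy as np

SAMPLE_RATE = 16_000  # samples per second, as stated in the text
CLIP_SECONDS = 10
N_CHANNELS = 8


def downmix_to_mono(clip: np.ndarray) -> np.ndarray:
    """Average a (channels, samples) array down to one (samples,) channel."""
    if clip.ndim != 2:
        raise ValueError("expected an array shaped (channels, samples)")
    return clip.mean(axis=0)


# Synthetic stand-in for one recording: the same tone reaches every
# microphone, plus noise that is independent per microphone (an assumption).
rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE * CLIP_SECONDS) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440.0 * t)
noise = rng.normal(scale=1.0, size=(N_CHANNELS, t.size))
clip = tone + noise  # broadcasting adds the tone to each channel

mono = downmix_to_mono(clip)

# Residual noise level on one raw channel versus on the averaged channel.
noise_before = np.std(clip[0] - tone)
noise_after = np.std(mono - tone)
print(f"noise before: {noise_before:.3f}, after: {noise_after:.3f}")
```

Under the independence assumption, averaging eight channels shrinks the noise by about a factor of the square root of eight, while the shared signal passes through unchanged.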

Image prompt: A clean, editorial-style infographic showing a side view of an industrial pump with eight small microphone icons arranged in a circle around it, each connected by a thin line to a single waveform on the right. Minimal, two-colour palette (deep navy and warm orange). Aspect ratio 4:3. Suitable for a technical blog.

Turning Sound Into Numbers (Without the Math)

Here is the central problem. A computer cannot listen to audio the way you and I can. To a computer, a ten-second clip is a list of 160,000 numbers — sixteen thousand per second, for ten seconds — each number representing how much the microphone diaphragm moved at one instant in time.

You cannot just hand 160,000 numbers to a machine-learning model and ask it to figure things out. There is too much information, most of it irrelevant. A bearing about to fail does not announce itself by changing every single one of those 160,000 numbers in a coherent way. It announces itself by adding a particular kind of energy at a particular range of frequencies. The trick is to extract those qualities and throw away the rest.

The standard tool for this is a spectrogram. The idea is simple once you see it. You chop the ten-second clip into tiny overlapping slices — about 128 milliseconds each, the length of a blink. For each slice you ask: which frequencies are present, and how loud is each? You stack those answers side by side and you get a picture: time on the horizontal axis, frequency on the vertical axis, brightness representing energy.

Think of it as sheet music for sound. Where a musician would write high C, loud, for half a second, the spectrogram shows a bright spot near the top of the page, half a second wide.
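For readers who want to see the slicing step as code, here is a minimal NumPy sketch of a spectrogram. It is an illustration under stated assumptions rather than the project's pipeline: the 2,048-sample frame matches the roughly 128 ms slice at 16,000 samples per second, but the hop of 512 samples and the Hann window are my choices, and the input is a synthetic 1 kHz tone rather than a pump recording.

```python
import numpy as np

SAMPLE_RATE = 16_000
FRAME_LEN = 2048  # 2048 / 16000 = 0.128 s, the slice length from the text
HOP_LEN = 512     # assumed hop (75% overlap); the text does not state one


def power_spectrogram(signal, sample_rate=SAMPLE_RATE,
                      frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Return (freqs_hz, times_s, power), with power shaped (n_freqs, n_frames)."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hanning(frame_len)  # taper each slice to reduce leakage
    frames = np.stack([
        signal[i * hop_len: i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)      # one spectrum per slice
    power = (np.abs(spectrum) ** 2).T           # rows: frequency, cols: time
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop_len + frame_len / 2) / sample_rate
    return freqs, times, power


# Synthetic ten-second clip: a pure 1 kHz tone.
t = np.arange(SAMPLE_RATE * 10) / SAMPLE_RATE
clip = np.sin(2 * np.pi * 1000.0 * t)

freqs, times, power = power_spectrogram(clip)
peak_hz = freqs[power.mean(axis=1).argmax()]
print(power.shape, peak_hz)
```

Each column of `power` is one slice of time and each row is one frequency, which is exactly the picture described above: plotted as an image, the tone would show up as a single bright horizontal line at 1 kHz.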

Image prompt: Two side-by-side panels. Left: a raw audio waveform (the squiggly line everyone recognises). Right: the corresponding spectrogram — time on the x-axis, frequency on the y-axis, with brighter colours where there's more energy. A subtle arrow between them. Annotate the spectrogram with a small label "frequency over time". Flat, modern editorial style. Aspect ratio 16:9.

Once you have the spectrogram, you can ask interesting questions of it. Where does the energy live? Pumps have a motor that hums at a low frequency, around 50 to 100 Hz, with harmonics extending up to maybe 500 Hz. That energy is always there, healthy or not — it's the equivalent of the engine note in a car. It tells you nothing about whether the pump is failing.

The interesting stuff lives higher up, between 1,000 and 8,000 Hz. A bearing with a tiny crack in its raceway produces a short, sharp click every time a rolling element passes over the crack, like a fingernail tapping on glass, but too faint to pick out over the hum of the machine. Cavitation, where vapour bubbles form in low-pressure regions of the fluid and then collapse violently, produces a hissy, broadband rush of noise across the same range. Impeller damage shifts the balance of energy between specific sub-bands. These are the signatures I wanted the model to learn.
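One simple way to put a number on "where does the energy live" is the share of a clip's spectral energy that falls in the 1,000 to 8,000 Hz band. The sketch below is a hedged illustration, not the feature set the project uses: both clips are synthetic, and the 3 kHz bursts repeating 90 times per second are a made-up stand-in for bearing clicks.

```python
import numpy as np

SAMPLE_RATE = 16_000


def band_energy_fraction(signal, low_hz, high_hz, sample_rate=SAMPLE_RATE):
    """Fraction of total spectral energy between low_hz and high_hz (inclusive)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    return power[in_band].sum() / power.sum()


rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE * 10) / SAMPLE_RATE

# Motor hum: a 50 Hz fundamental plus one harmonic, present in both clips.
hum = np.sin(2 * np.pi * 50.0 * t) + 0.5 * np.sin(2 * np.pi * 100.0 * t)
healthy = hum + 0.01 * rng.normal(size=t.size)

# Hypothetical fault: short 3 kHz bursts, switched on for the first 10%
# of each cycle, 90 cycles per second (an illustrative rate only).
burst_on = ((t * 90.0) % 1.0) < 0.1
faulty = healthy + 0.5 * burst_on * np.sin(2 * np.pi * 3000.0 * t)

healthy_hf = band_energy_fraction(healthy, 1000.0, 8000.0)
faulty_hf = band_energy_fraction(faulty, 1000.0, 8000.0)
print(f"healthy: {healthy_hf:.5f}, faulty: {faulty_hf:.5f}")
```

In both clips the hum dominates the total energy, so the high-band share stays small, but it rises sharply once the bursts are added. Real recordings are far messier than this, which is why a single ratio is a starting point rather than a detector.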
