How Noise Reduction Actually Works

July 3, 2026 · Under the hood · about a 4 minute read

Our cleaner tells you it uses "spectral gating" and we'd rather that meant something to you than sound like incense. Here's the whole idea, no math degree required — including the failure sounds, because knowing why a tool breaks is how you use it well.

Sound as a stack of frequencies

Any slice of audio — say, 40 milliseconds of you talking over a fan — can be described as a recipe: this much 100 Hz, this much 250 Hz, this much 4 kHz, and so on. Computing that recipe is what a Fourier transform does, and it's cheap enough that your phone does it thousands of times per second without noticing.
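To make the recipe idea concrete, here is a minimal Python sketch using NumPy's FFT routines. The 8 kHz sample rate and the two test tones are assumptions picked for the illustration, not anything our cleaner uses.

```python
import numpy as np

SAMPLE_RATE = 8000      # samples per second (assumed for this sketch)
SLICE_SECONDS = 0.040   # the 40 ms slice from the text

# Synthetic slice: a loud 100 Hz tone plus a quieter 250 Hz tone.
n = round(SAMPLE_RATE * SLICE_SECONDS)         # 320 samples
t = np.arange(n) / SAMPLE_RATE
audio_slice = 1.0 * np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 250 * t)

# The recipe: one amplitude per frequency.
spectrum = np.fft.rfft(audio_slice)
freqs = np.fft.rfftfreq(n, d=1 / SAMPLE_RATE)  # frequency of each entry, in Hz
amplitude = np.abs(spectrum) * 2 / n           # scaled so a pure tone reads as its own amplitude

# List the two strongest ingredients, loudest first.
strongest = np.argsort(amplitude)[-2:][::-1]
recipe = [(round(float(freqs[i]), 1), round(float(amplitude[i]), 3)) for i in strongest]
print(recipe)
```

Run as written, it should report 100 Hz at amplitude 1.0 and 250 Hz at 0.5. Both tones were chosen to fit a whole number of cycles into the slice; real audio rarely obliges, which is one reason practical tools apply a window to each slice first.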

The crucial observation: steady noise has a boring recipe. A fan sounds the same this second as last second — its frequency recipe barely changes. Your voice's recipe, meanwhile, dances around constantly. That difference is the entire trick.
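One rough way to put a number on "boring", sketched in Python with NumPy: average the recipe over the first half of a clip, do the same for the second half, and measure how far apart the two averages are. The signals are synthetic stand-ins (a hum plus hiss for the fan, a gliding tone for the voice), and recipe_drift is a made-up measure for this illustration, not something the cleaner computes.

```python
import numpy as np

SAMPLE_RATE = 8000   # assumed for this sketch
FRAME = 256          # 32 ms frames, an arbitrary illustrative size

def average_recipe(signal):
    """Average magnitude spectrum over all full frames of `signal`."""
    usable = len(signal) // FRAME * FRAME
    frames = signal[:usable].reshape(-1, FRAME) * np.hanning(FRAME)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def recipe_drift(signal):
    """Compare the first half's recipe with the second half's.

    0 means identical recipes; 2 means they share no frequencies at all.
    """
    half = len(signal) // 2
    first = average_recipe(signal[:half])
    second = average_recipe(signal[half:])
    return float(np.sum(np.abs(first - second)) / np.sum((first + second) / 2))

rng = np.random.default_rng(0)
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE   # two seconds

# Crude stand-in for a fan: a steady hum plus steady hiss.
fan = 0.3 * np.sin(2 * np.pi * 125 * t) + 0.1 * rng.standard_normal(len(t))

# Crude stand-in for a voice: a tone gliding from 200 Hz up to 800 Hz.
voice = np.sin(2 * np.pi * (200 * t + 150 * t ** 2))

fan_drift = recipe_drift(fan)
voice_drift = recipe_drift(voice)
print(f"fan drift:   {fan_drift:.2f}")
print(f"voice drift: {voice_drift:.2f}")
```

The fan's drift should land near zero and the gliding tone's much closer to 2. Note that the comparison uses averaged recipes: any single frame of hiss still jitters, and it is the average that holds still.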

The four moves

1 – Fingerprint the noise. Find moments where nobody's talking (or use the sample you marked) and average their frequency recipes. That average is the noise profile: how much energy the fan puts at every frequency.

2 – Slice the audio into overlapping frames — ours are about 46 ms, overlapping 75% — and compute each frame's recipe.

3 – Subtract, per frequency. In each frame, any frequency holding barely more energy than the noise profile predicts is mostly noise: turn it down hard. A frequency towering over the profile is mostly voice: leave it alone. This is the "gate": each of a thousand frequency bands has its own tiny volume fader, adjusted once per frame. With 46 ms frames overlapping 75%, a new frame starts about every 11.5 ms, so that is more than eighty times a second.

4 – Smooth and rebuild. Raw gating flickers, so the faders are smoothed across neighbouring frequencies and across time, then the frames are woven back together into audio.
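The four moves can be sketched in about sixty lines of Python with NumPy. Treat this as an illustration under stated assumptions rather than our cleaner's actual code: the 44.1 kHz sample rate, the 2048-sample frame (about 46 ms at that rate), the hard on/off gate, the 5-wide moving averages, and the names clean, strength and floor are all choices made for this example.

```python
import numpy as np

SAMPLE_RATE = 44100   # assumed; the article does not state a sample rate
FRAME = 2048          # about 46 ms at 44.1 kHz
HOP = FRAME // 4      # step between frames, giving 75% overlap
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)  # periodic Hann

def stft(audio):
    """Move 2: slice into overlapping windowed frames, one recipe per frame."""
    starts = range(0, len(audio) - FRAME + 1, HOP)
    frames = np.stack([audio[s:s + FRAME] * WINDOW for s in starts])
    return np.fft.rfft(frames, axis=1)   # shape: (frames, FRAME // 2 + 1)

def istft(spec, length):
    """Move 4, second half: weave frames back together by overlap-add."""
    out = np.zeros(length)
    weight = np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(spec, n=FRAME, axis=1)):
        start = i * HOP
        out[start:start + FRAME] += frame * WINDOW
        weight[start:start + FRAME] += WINDOW ** 2
    return out / np.maximum(weight, 1e-8)   # undo the doubled windowing

def smooth(values, width, axis):
    """Moving average of odd `width` along `axis`, repeating the edge values."""
    pad = [(0, 0)] * values.ndim
    pad[axis] = (width // 2, width // 2)
    padded = np.pad(values, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)

def clean(audio, noise_sample, strength=3.0, floor=0.1):
    """Spectral gate. `strength` and `floor` are hypothetical knobs for this sketch."""
    # Move 1: fingerprint the noise by averaging recipes of a voice-free sample.
    profile = np.abs(stft(noise_sample)).mean(axis=0)
    # Move 2: recipes for the audio to be cleaned.
    spec = stft(audio)
    # Move 3: one fader per frequency per frame. Bins towering over the
    # profile keep full volume; the rest are turned down to `floor`.
    gain = np.where(np.abs(spec) > strength * profile, 1.0, floor)
    # Move 4, first half: smooth the faders across frequency, then across time.
    gain = smooth(gain, 5, axis=1)
    gain = smooth(gain, 5, axis=0)
    return istft(spec * gain, len(audio))

# Demo: two seconds of hiss, with a 440 Hz tone (standing in for a voice)
# in the second half. The first half second serves as the noise sample.
rng = np.random.default_rng(0)
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
noisy = 0.05 * rng.standard_normal(len(t))
noisy[SAMPLE_RATE:] += 0.5 * np.sin(2 * np.pi * 440 * t[SAMPLE_RATE:])
cleaned = clean(noisy, noise_sample=noisy[:SAMPLE_RATE // 2])

def rms(x):
    """Root-mean-square level of a stretch of audio."""
    return float(np.sqrt(np.mean(x ** 2)))

quiet = slice(int(0.2 * SAMPLE_RATE), int(0.8 * SAMPLE_RATE))
loud = slice(int(1.2 * SAMPLE_RATE), int(1.8 * SAMPLE_RATE))
print(f"noise-only part: {rms(noisy[quiet]):.4f} -> {rms(cleaned[quiet]):.4f}")
print(f"tone part:       {rms(noisy[loud]):.4f} -> {rms(cleaned[loud]):.4f}")
```

In the demo, the noise-only stretch should come out much quieter while the tone stretch stays close to its original level. Keeping a small floor instead of muting gated bands outright is a common design choice: a little leftover hiss tends to sound more natural than dead silence punctuated by stray bins.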

The failure sounds, decoded

"Musical noise" — faint watery twinkling in the quiet parts — is what under-smoothed gating sounds like: random frequency bins winking on and off. Our temporal smoothing exists specifically to tame this; hearing it means strength is set too hot for how variable your noise is.

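The twinkling can be imitated with a toy model in Python with NumPy. The assumptions are loud ones: noise-only magnitudes are drawn independently from a Rayleigh distribution (real overlapping frames are correlated), the gate threshold is twice the profile, and flicker is a made-up proxy for how much the faders jump from frame to frame.

```python
import numpy as np

rng = np.random.default_rng(1)
FRAMES, BINS = 400, 64    # arbitrary sizes for the illustration

# Noise-only magnitudes scatter randomly around their average. Independent
# Rayleigh draws are an assumed, simplified model of steady hiss.
mag = rng.rayleigh(scale=1.0, size=(FRAMES, BINS))
profile = mag.mean(axis=0)

# A hard gate with no smoothing: the odd bin pokes above the threshold
# at random and is let through at full volume.
raw_gain = np.where(mag > 2.0 * profile, 1.0, 0.1)

def smooth_time(gain, width=5):
    """Moving average over `width` neighbouring frames, per frequency bin."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, gain)

def flicker(gain):
    """Average frame-to-frame jump in the faders: a rough proxy for twinkling."""
    return float(np.mean(np.abs(np.diff(gain, axis=0))))

smoothed_gain = smooth_time(raw_gain)
print(f"flicker, raw gate:      {flicker(raw_gain):.4f}")
print(f"flicker, smoothed gate: {flicker(smoothed_gain):.4f}")
```

In this idealised model a five-frame average cuts the jumps to roughly a fifth of their raw size. The price is that the faders also react more slowly to real speech, which is why smoothing is a setting to balance, not a cure.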
The underwater voice happens when subtraction gets greedy. Consonants and breath sounds are quiet and noise-like by nature; an aggressive gate eats their edges and speech turns soft and gargly. This is the tool working exactly as designed on the wrong settings — which is why the strength slider and the A/B habit matter more than any algorithm choice.
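A small Python simulation with NumPy shows how quickly a quiet, noise-like consonant disappears as the gate gets stricter. Everything specific here is an assumption for illustration: the consonant is modelled as random energy twice as strong as the background, and strength is treated as a plain multiplier on the noise profile, which may not be how the slider maps internally.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 20000   # number of simulated frequency-bin readings

def magnitudes(scale):
    """Magnitudes of noise-like energy, modelled as complex Gaussian values."""
    return np.abs(rng.normal(0, scale, N) + 1j * rng.normal(0, scale, N))

noise_profile = magnitudes(1.0).mean()

# A soft consonant: noise-like, only twice as strong as the background
# (an assumed ratio), and riding on top of that background.
consonant = magnitudes(np.sqrt(1.0 ** 2 + 2.0 ** 2))

def fraction_kept(strength):
    """Share of consonant readings that clear the gate at this strength."""
    return float(np.mean(consonant > strength * noise_profile))

kept = {s: fraction_kept(s) for s in (1.5, 2.0, 3.0, 4.0)}
for strength, share in kept.items():
    print(f"strength {strength}: {share:.0%} of the consonant survives")
```

The surviving share should fall steadily as strength rises, from most of the consonant at 1.5 to only a small fraction at 4. Nothing in the gate can tell a soft "s" from a loud patch of hiss; it only sees how far each reading sits above the profile.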

And the honest boundary, one more time: all of this assumes the noise holds still. AI dialogue isolators (the upload-your-file services) attack the moving-noise problem with learned models of what speech is — genuinely different machinery, with its own artifacts and its own privacy bill. For hiss, hum, fans and rooms, the forty-year-old trick, run locally, holds up remarkably well. You now know exactly what the button does.

Watch it work on your own file: the cleaner shows the measured drop in the noise floor, and now you know where that number comes from.