Machine listening doesn't just measure sound; it recognizes events.
How does Radio Rex work? The vibration from the "R" sound in "Rex" (around 500 Hz in a male voice) would hit a thin metal blade and launch Rex the Dog.
For this week, I'm trying to make a simple machine that can "play" along with a jazz composition. I initially wanted to analyze the pitch of a jazz recording note by note, but I couldn't figure out how to run a pitch-extraction callback during the audio stream. For now, I've adapted the example that extracts magnitude in the audio stream callback, and set a TARGET_START_VOL of 0.01, the point from which the machine starts counting volume increments. Once the machine has counted 3 or more consecutive increments, it triggers the Arduino to do something.
Volume (magnitude/amplitude) is measured as the RMS of each audio block.
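As a sketch of that measurement (the function name `rms` and the block size are my own choices, not from the original code), per-block RMS is just the square root of the mean of the squared samples:

```python
import numpy as np

def rms(block):
    """Root-mean-square of one audio block: sqrt(mean(x**2))."""
    return float(np.sqrt(np.mean(np.square(block))))

# A constant block at 0.5 has RMS exactly 0.5; a full-scale sine
# over whole periods has RMS 1/sqrt(2) ~ 0.707.
quiet = rms(np.full(1024, 0.001))
loud = rms(np.sin(2 * np.pi * np.arange(1024) / 64))
```

Each callback would compute this on the incoming block and hand the value to the counter below.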
if volume >= TARGET_START_VOL:
    if (state_of_mag['last'] is not None) and (volume is not None):
        if volume > TARGET_END_VOL:
            state_of_mag['count_mag_increasing'] = 0  # reset if it goes beyond the end volume
        elif volume > state_of_mag['last']:
            state_of_mag['count_mag_increasing'] += 1
    state_of_mag['last'] = volume
else:
    state_of_mag['count_mag_increasing'] = 0  # reset if it doesn't keep increasing
is_gershwin = state_of_mag['count_mag_increasing'] >= 3  # arbitrary number of frames to confirm it's increasing
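Wrapped as a standalone function, the same logic can be exercised on a fake ramp of RMS readings. This is a sketch: only TARGET_START_VOL = 0.01 comes from the post; TARGET_END_VOL here is a stand-in value.

```python
TARGET_START_VOL = 0.01   # from the post
TARGET_END_VOL = 0.5      # assumed ceiling, not specified in the post

def update(state, volume):
    """Feed one RMS reading; returns True once >= 3 consecutive increases are seen."""
    if volume >= TARGET_START_VOL:
        if (state['last'] is not None) and (volume is not None):
            if volume > TARGET_END_VOL:
                state['count_mag_increasing'] = 0  # overshot the end volume
            elif volume > state['last']:
                state['count_mag_increasing'] += 1
        state['last'] = volume
    else:
        state['count_mag_increasing'] = 0  # quiet again: reset
    return state['count_mag_increasing'] >= 3

state = {'last': None, 'count_mag_increasing': 0}
results = [update(state, v) for v in [0.02, 0.03, 0.04, 0.05]]
# the fourth rising reading is the third consecutive increase
```

The first reading only seeds `state['last']`, so the trigger fires on the fourth rising reading, i.e. the third counted increase.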
The increasing volume is particularly useful for a record like "Rhapsody in Blue" by George Gershwin: the opening clarinet run becomes the trigger for our Radio Rex.
This is inspired by jazz memes where physical objects accidentally make a sound uncannily like an improvised avant-garde trumpet or drum, from a door creaking to a car honking, performing a call and response with the jazz band. In this simple example I haven't managed to generate any sound from my physical object yet; I've only made the motor move when the clarinet's volume dramatically increases.
Python Examples
The FFT is usually used to find the dominant pitch (the strongest frequency in the 1-D audio) given the sample rate.
All of these are in 1D:
import numpy as np

fft = np.fft.fft(data)  # data: a 1-D array of audio samples
mag = np.abs(fft)       # mag is the amplitude (magnitude) of each frequency bin
How? (Remember that the absolute value of a complex number is not a simple sign flip like it is for a real number.) Each FFT output bin is a complex number:
X[k] = a + bj
a = real part (cosine component)
b = imaginary part (sine component)
so np.abs returns the magnitude sqrt(a**2 + b**2).
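A quick check with a made-up bin value shows np.abs on a complex number really is sqrt(a**2 + b**2):

```python
import numpy as np

X_k = 3.0 + 4.0j                      # a = 3 (cosine part), b = 4 (sine part)
mag = np.abs(X_k)                     # sqrt(3**2 + 4**2) = 5.0
also = np.hypot(X_k.real, X_k.imag)   # same number, computed explicitly
```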
freq_bins = np.fft.fftfreq(len(data), 1/sample_rate)
find the peak bin index
peak = np.argmax(mag)
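Putting the pieces together on a made-up test tone (my own choice of 440 Hz at an 8 kHz sample rate): with exactly one second of audio the bins are spaced 1 Hz apart, so the peak lands exactly on 440. One caveat: np.argmax over the full fft output can land on the mirrored negative-frequency peak, so it's safer to search only the first half.

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate   # 1 s of samples
data = np.sin(2 * np.pi * 440 * t)         # hypothetical 440 Hz tone

fft = np.fft.fft(data)
mag = np.abs(fft)
freq_bins = np.fft.fftfreq(len(data), 1 / sample_rate)

peak = np.argmax(mag[:len(data) // 2])     # non-negative frequencies only
peak_hz = freq_bins[peak]
```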
Remember that a single FFT is a global frequency summary. To plot frequency over time, we need to run it on successive windows, hopping from one moment to the next:
win_size = 1024                # e.g. ~23 ms frames at 44.1 kHz
hop_size = 512                 # advance half a window per step
window = np.hanning(win_size)  # taper to reduce spectral leakage
specs = []
for start in range(0, len(audio) - win_size, hop_size):
    frame = audio[start:start + win_size]
    frame = frame * window
    fft_frame = np.fft.rfft(frame)
    specs.append(np.abs(fft_frame))
specs = np.array(specs)  # shape: (time_frames, freq_bins)
freqs = np.fft.rfftfreq(win_size, 1/sr)
times = np.arange(specs.shape[0]) * hop_size / sr
Alternatively, librosa can generate an array of fundamental pitches (f0), useful for plotting pitch over time.
import librosa

# hop_length=sr gives roughly one pitch estimate per second
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=50, fmax=500, sr=sr, frame_length=2048, hop_length=sr)
times = librosa.times_like(f0, sr=sr, hop_length=sr)
for t, pitch in zip(times, f0):
print(f"{t:.1f}s -> {pitch if not np.isnan(pitch) else 'no pitch'}")
Elizabeth Kezia Widjaja Β© 2026 π