I’m working on a small project (to be revealed once it reaches some useful stage), and one of the things I need to work on is sound visualization.
Visualizing sound usually involves two elements: the amplitudes (which form the ‘wave’) and a frequency spectrogram. In this part we will perform both analyses, though not yet in a form ready for realtime visualization – that comes in the next stage. First we need to understand what is going on before we start optimizing it.
We’re using Python 3.8 here, together with some additional libraries – see the imports below.
For now we will use the simple scipy.io.wavfile module for reading the sound files. For real usage we will need a better library, as this one is quite limited.
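As an aside (a sketch only, not used in the rest of this post), the soundfile package is one such more capable reader; it handles more formats and returns floating-point samples directly:

import soundfile as sf
# Sketch: reading the same test file as below; sf.read returns (samples, samplerate),
# with float64 samples by default.
audio_sf, samplerate_sf = sf.read('test_signal_03.wav')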
import matplotlib.pyplot as plt
from matplotlib import colors as clrs
plt.rcParams['agg.path.chunksize'] = 10000  # split long paths into chunks so Agg can render plots with millions of samples
from scipy.io import wavfile as wav
from scipy.fftpack import fft, fftfreq
import numpy as np
from skimage.util import img_as_int
from skimage.transform import resize
from skimage import io, color, exposure
Reading a file
In our analysis we work on a mono signal, so we have to convert the audio if needed. After loading, we check the basic parameters of the file, such as its length and sample rate.
samplerate, audio = wav.read('test_signal_03.wav')
if audio.ndim > 1:
    audio = np.mean(audio, axis=1)  # mix down to mono
num_samples = audio.shape[0]
audio_length = num_samples / samplerate
max_value = np.max(audio)
print(f'Audio length: {audio_length:.2f} seconds')
print(f'Audio samples: {num_samples}')
print(f'Max value:{max_value}')
fig, axes = plt.subplots()
axes.plot(np.arange(num_samples) / samplerate, audio)
axes.set_xlabel('Time [s]')
axes.set_ylabel('Amplitude');
plt.show()
plt.close(fig)
Audio length: 146.00 seconds
Audio samples: 6438600
Max value:24741.0
Selecting a fragment
For visualization, the whole file won’t give us anything useful. Both the amplitudes and, especially, the spectrogram need to reflect some small portion of the whole. What portion?
The visualization is a video, and a video is made of frames. For each frame we need a new portion of data. Fortunately, the video framerate is much lower than the audio samplerate, so we can select the samples covering the period of time between two frames – the current one and the next. For example, at our 44100 Hz sample rate and 30 fps, each video frame covers 44100 / 30 = 1470 samples.
As input parameters, we take the current time in seconds and the target video framerate.
fps = 30
total_frames = int(audio_length * fps)
current_position = 100 # in seconds
current_frame = int(current_position*fps)
number_of_frames = 1 # number of frames to analyse
print(f'Current frame: {current_frame}/{total_frames}')
sample_start = int(current_frame / fps * samplerate)
sample_end = int((current_frame + number_of_frames) / fps * samplerate)
N = sample_end - sample_start
T = 1.0 / samplerate
Current frame: 3000/4380
Performing an analysis
We’re ready to perform the analysis. The first part, the amplitudes, is straightforward: we simply slice the relevant samples out of the loaded audio.
audio_part = audio[sample_start:sample_end]
For the spectrogram, we perform a 1-D discrete Fourier transform and put the results on a frequency axis. The FFT produces values for both negative and positive frequencies. When the input is real-valued (as ours is), the spectrum is symmetric: the magnitudes at negative frequencies mirror those at the positive ones, so we only need to draw the positive half.
freqs = fftfreq(N, T)
spectrum = abs(fft(audio_part))[:int(freqs.size/2)]
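As a side note, NumPy’s real-input FFT computes only the non-negative half directly, which would let us skip the manual slicing – a sketch essentially equivalent to the two lines above (it additionally includes the Nyquist bin):

# Sketch only: np.fft.rfft returns just the non-negative frequency bins for real input.
rfft_freqs = np.fft.rfftfreq(N, T)
rfft_spectrum = np.abs(np.fft.rfft(audio_part))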
Now we can plot both figures and take a look.
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
axes[0].plot(np.arange(N) / samplerate, audio_part)
axes[0].set_xlabel('Time [s]')
axes[1].plot(freqs[:int(freqs.size/2)], spectrum)
axes[1].set_xlabel('Frequency [Hz]')
fig.tight_layout()
plt.show()
plt.close(fig)
Color mapping
Before we prepare our textures for further use, we build a special color map. We want the values represented in the red channel only, from pure black to fully saturated red.
black_reds = np.zeros((256, 4))           # RGBA entries
black_reds[:, 0] = np.arange(256) / 256   # red channel ramps from 0 to 255/256
black_reds[:, 3] = 1.0                    # fully opaque
map_black_reds = clrs.ListedColormap(black_reds)
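A quick sanity check (just an illustration): a matplotlib colormap can be called with values in [0, 1] and returns RGBA tuples, so the two ends of our map should be black and (almost) pure red.

print(map_black_reds(0.0))  # expected: (0.0, 0.0, 0.0, 1.0)
print(map_black_reds(1.0))  # expected: roughly (0.996, 0.0, 0.0, 1.0), i.e. 255/256 red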
The final texture
We’re making a 512×512 texture, where the top 256 rows hold the frequency spectrogram and the bottom 256 rows hold the amplitudes.
audio_part_img = exposure.rescale_intensity(audio_part, out_range=(-1.0, 1.0))
audio_part_img = np.expand_dims(audio_part_img, axis=0)
audio_part_img = resize(audio_part_img, (256, 512), anti_aliasing=True)
spectrum_img = exposure.rescale_intensity(spectrum, out_range=(0, 100))
spectrum_img = np.expand_dims(spectrum_img, axis=0)
spectrum_img = resize(spectrum_img, (256, 512), anti_aliasing=True)
fig = plt.figure(figsize=(5.12, 5.12), dpi=100)  # 5.12 in x 100 dpi = 512 px
fig.subplots_adjust(left=0, right=1, top=1, bottom=0, hspace=0, wspace=0)  # no margins, no gap between the halves
axes=[]
axes.append(fig.add_subplot(2, 1, 1))
axes[0].set_axis_off()
axes[0].margins(0)
axes.append(fig.add_subplot(2, 1, 2))
axes[1].set_axis_off()
axes[1].margins(0)
axes[0].imshow(spectrum_img, cmap=map_black_reds, aspect='auto')
axes[1].imshow(audio_part_img, cmap=map_black_reds, aspect='auto')
plt.savefig("test_sound.png", dpi=fig.dpi)
plt.show()
plt.close(fig)
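As a hedged aside (and a hint towards the speed problem mentioned below), the figure machinery isn’t strictly necessary: we could normalize the two strips ourselves, run them through the colormap and write the stacked array directly with skimage. This is only a sketch; normalize01 is a hypothetical helper introduced just for it.

from skimage.util import img_as_ubyte

def normalize01(a):
    # scale an array into [0, 1] so it can be fed to the colormap
    return (a - a.min()) / (a.max() - a.min())

stacked = np.vstack((normalize01(spectrum_img), normalize01(audio_part_img)))  # 512 x 512
rgba = map_black_reds(stacked)  # 512 x 512 x 4 floats in [0, 1]
io.imsave('test_sound_direct.png', img_as_ubyte(rgba[..., :3]))  # drop alpha, save as 8-bit RGB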
The resulting texture is exactly what we need for sound visualization. The problem is that producing it takes far too much time for realtime use. We’ll solve that problem next time.