The VAD looks nice, but I need more. In [this audio](https://botcompany.de/files/1400344/its-very-windy.wav), I want the last part (wind noise) to be detected as non-speech. Here's the VAD's result on the file: