If you're ever out and wondering which bird is behind the song you're hearing, this is a great app: https://birdnet.cornell.edu/ (Android and iOS)
You record a portion of the song, and it uses machine learning to analyse it and tell you which bird it is, along with a confidence figure. Works really well.
As an FYI, if you are interested in the fundamental frequency of birdsong, the GitHub repo below might be of interest. It uses an STFT plus interpolation to get an accurate (and potentially quickly changing) frequency estimate: https://github.com/JorenSix/stft_freq
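If you want to play with the idea, here's a minimal sketch of the STFT + interpolation approach in Python (my own illustration, not code from that repo): the peak bin of each frame only resolves frequency to fs/nperseg, and fitting a parabola through the peak and its two neighbours refines the estimate well below the bin spacing.

```python
# Sketch: per-frame peak frequency from an STFT, refined with parabolic
# (quadratic) interpolation to get between-bin accuracy.
import numpy as np
from scipy.signal import stft

def track_peak_freq(x, fs, nperseg=1024, noverlap=768):
    """Return frame times and an interpolated peak-frequency estimate per frame."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    mag = np.abs(Z)
    est = []
    for frame in mag.T:                      # one column = one time frame
        k = int(np.argmax(frame))            # coarse peak bin
        if 0 < k < len(frame) - 1:
            # Parabola through (k-1, k, k+1); log magnitude makes the fit
            # closer to ideal for a windowed sinusoid.
            a = np.log(frame[k - 1] + 1e-12)
            b = np.log(frame[k] + 1e-12)
            c = np.log(frame[k + 1] + 1e-12)
            delta = 0.5 * (a - c) / (a - 2 * b + c)   # offset in bins, in (-0.5, 0.5)
        else:
            delta = 0.0
        est.append((k + delta) * fs / nperseg)
    return t, np.array(est)

# Example: a slowly rising tone; bin spacing is fs/nperseg ~= 43 Hz,
# but the interpolated estimates resolve much finer changes than that.
fs = 44100
tt = np.arange(fs) / fs
x = np.sin(2 * np.pi * (2000 + 300 * tt) * tt)
t, f0 = track_peak_freq(x, fs)
print(f0[:5])
```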
Was this work done on data stored in lossy formats? The appendix discussing format conversion makes it sound like it. Shouldn't that have been the first thing to avoid ("garbage in, …")?
A couple of points of gentle critique:
It's kinda hard to compare the different spectral representations when they're zoomed and cropped differently (see the plotting sketch at the end of this comment for one way to line them up).
Spectrograms can be misleading in a few different ways. Magnitude FFTs discard phase, which we can hear. And our eyes tend to fixate on the peaks, but the noise floor between harmonics in speech has a big impact on perceived quality. The choice of color scheme and gradient also changes how we read a spectrogram: it can emphasize mathematical or coding artifacts we wouldn't hear, or hide things we can hear. At the end of the day, we don't hear with our eyes... So a spectrogram is a tool for looking at audio, but not always an 'honest' one, and I'm a bit skeptical of poring over slightly different spectrograms and worrying about which ones look better aesthetically.
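On the zoom/crop point: one low-effort way to make spectrograms comparable is to plot everything on shared axes with identical dB color limits, so a difference between panels can't be an artifact of per-image scaling. A rough sketch (the file names are placeholders, not from the article):

```python
# Hypothetical sketch: two recordings on shared time/frequency axes with
# the same dB color scale, so quiet detail can't be rescaled away in one panel.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import stft

paths = ["original.wav", "encoded.wav"]   # placeholder file names

fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(10, 4))
for ax, path in zip(axes, paths):
    fs, x = wavfile.read(path)
    x = x.mean(axis=1) if x.ndim > 1 else x     # fold stereo to mono
    x = x / (np.abs(x).max() + 1e-12)           # normalize to [-1, 1]
    f, t, Z = stft(x, fs=fs, nperseg=1024)
    db = 20 * np.log10(np.abs(Z) + 1e-9)
    # Identical vmin/vmax on every panel is the key part.
    im = ax.pcolormesh(t, f, db, vmin=-80, vmax=0, shading="auto")
    ax.set(title=path, xlabel="time (s)", ylabel="frequency (Hz)")
fig.colorbar(im, ax=axes.tolist(), label="dB")
plt.show()
```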
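And on the phase point, a quick self-contained demo (again my sketch, not from the article): build a complex STFT with the original magnitudes but random phase, then invert it. The magnitudes you fed in are bit-identical, yet the resulting audio is audibly wrecked.

```python
# Sketch: magnitude spectrograms discard phase. Keep |STFT| exactly the
# same, randomize the phase, and the inverse transform sounds nothing
# like the original signal.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)   # decaying 440 Hz tone

f, frames, Z = stft(x, fs=fs, nperseg=512)
rng = np.random.default_rng(0)
Z_rand = np.abs(Z) * np.exp(1j * rng.uniform(0, 2 * np.pi, Z.shape))

_, y = istft(Z_rand, fs=fs, nperseg=512)

print(np.allclose(np.abs(Z), np.abs(Z_rand)))   # True: same "picture"
print(np.max(np.abs(x - y[: len(x)])))          # large: very different audio
```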