YouTube audio: levels, spectrum, sampling ...

J. P. Gilliver

2023-08-04 01:24:23 UTC

OK, not strictly broadcast, but YouTube is almost a broadcast channel.

I use yt-dlp a lot, usually on default settings (which AIUI usually gets
the best available), and then an extractor set to "extract original
audio" (I use Pazera, as it's easy to be sure it's extracting original
without any further transcoding; however, it's just ffmpeg-based, and I
presume any other similar would yield the same result). [Yes, I looked
into using the audio-only settings for yt-dlp, but they didn't easily
lend themselves to batching; besides, I sometimes _do_ want the video
too - clip of an artist performing, and I want the audio-only one for
use in the car. My muscle memory of the keystrokes to extract the audio
means I can do it in seconds anyway.] I usually look at the resultant
audio - sometimes with the intention of reducing the filesize, sometimes
just out of curiosity. (I use GoldWave, but I presume almost any other
similar utility - such as Audacity - would yield similar results.)
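
As a rough illustration of the "extract original audio" step described above, here is a minimal Python sketch that builds an ffmpeg command which copies the audio stream without re-encoding. The function name and the default `.m4a` extension are assumptions for illustration (the container has to suit whatever codec is actually in the file), and this is not a description of what Pazera does internally.

```python
from pathlib import Path


def extract_audio_cmd(video: Path, audio_ext: str = "m4a") -> list[str]:
    """Build an ffmpeg command that copies the audio stream untouched.

    -vn drops the video stream; -c:a copy stream-copies the audio,
    so no second lossy encode takes place.
    audio_ext is an assumption: .m4a suits AAC; Opus needs .opus or .webm.
    """
    out = video.with_suffix("." + audio_ext)
    return ["ffmpeg", "-i", str(video), "-vn", "-c:a", "copy", str(out)]


if __name__ == "__main__":
    # Print the commands for every .mp4 in the current folder (nothing is run).
    for clip in sorted(Path(".").glob("*.mp4")):
        print(" ".join(extract_audio_cmd(clip)))
```

Each list could be passed to `subprocess.run` to do the extraction in a batch.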

Several observations:

1. The _vast_ majority are coded at 44100 Hz, stereo. I suppose that -
"CD quality" - is the default setting for many capture/encoding devices,
but it does seem overkill for mono material, especially of considerable
age (such as from 78s). Still, I'm not surprised. (I very occasionally
find one that _has_ been encoded mono. Though I don't think I've seen
any encoded at less than 44100 - certainly if I have, it's been
extremely rare.)

2. The _level_ is often extremely low - especially for some old (say
1960-1999) video clips. (Not all, by any means - but often enough to be
noticeable.) By low, I mean I have to boost them by ×4, or even ×8 or
occasionally ×16, to get the peaks above 50% full scale. (I only use
powers of 2 to avoid distortion.) Is this something YouTube are
imposing? Is audio level adjustment difficult on some common piece of
video capture hardware/software? I even came across one recently where
the uploader _said_ something like "this is quiet, you may have to
adjust" in the notes, so s/he knew about it. This does seem odd.

3. (This is the one that finally prompted me to post.) Far more often
than not - I'd say over 90% of tracks - there's a visible (I can no
longer hear that high) tone around 15½ kHz. I presume in the majority of
cases, it's timebase - 15625 for "PAL" (yes I know, but YKWIM), 15750
for NTSC; even where it's not from an actual video source, I presume it
has picked it up somewhere in the processing, e. g. from a computer
monitor/graphics card. This is _not_ what's puzzling me. What is, is
that the spectrum is very often brickwalled at that line: even where the
actual valid material is all below 15, 12, 11, 10, 8, or 6 kHz (you'd be
surprised how much _does_ have nothing valid above those!), and the
remainder is just uniform noise - it cuts off at the line. Can anyone
think why? It's nowhere near the Nyquist limit of 22050; I could
understand a rolloff _towards_ that to avoid aliasing, and that rolloff
being gentle to avoid other adverse effects, but no, it's brickwalled,
and at the line (which is _well_ below).

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

I admire him for the constancy of his curiosity, his effortless sense of
authority and his ability to deliver good science without gimmicks.
- Michael Palin on Sir David Attenborough, RT 2016/5/7-13

Theo

2023-08-04 11:00:55 UTC

Post by J. P. Gilliver
2. The _level_ is often extremely low - especially for some old (say
1960-1999) video clips. (Not all, by any means - but often enough to be
noticeable.) By low, I mean I have to boost them by ×4, or even ×8 or
occasionally ×16, to get the peaks above 50% full scale. (I only use
powers of 2 to avoid distortion.) Is this something YouTube are
imposing? Is audio level adjustment difficult on some common piece of
video capture hardware/software? I even came across one recently where
the uploader _said_ something like "this is quiet, you may have to
adjust" in the notes, so s/he knew about it. This does seem odd.

I suspect people aren't doing a lot of postprocessing: recording from
analogue using their soundcard/etc, uploading the clips to YT, YT does the
transcoding but isn't changing levels. It may be people are using line in
inputs from analogue sources and not setting levels correctly, I don't know.

Post by J. P. Gilliver
3. (This is the one that finally prompted me to post.) Far more often
than not - I'd say over 90% of tracks - there's a visible (I can no
longer hear that high) tone around 15½ kHz. I presume in the majority of
cases, it's timebase - 15625 for "PAL" (yes I know, but YKWIM), 15750
for NTSC; even where it's not from an actual video source, I presume it
has picked it up somewhere in the processing, e. g. from a computer
monitor/graphics card. This is _not_ what's puzzling me. What is, is
that the spectrum is very often brickwalled at that line: even where the
actual valid material is all below 15, 12, 11, 10, 8, or 6 kHz (you'd be
surprised how much _does_ have nothing valid above those!), and the
remainder is just uniform noise - it cuts off at the line. Can anyone
think why? It's nowhere near the Nyquist limit of 22050; I could
understand a rolloff _towards_ that to avoid aliasing, and that rolloff
being gentle to avoid other adverse effects, but no, it's brickwalled,
and at the line (which is _well_ below).

Maybe YT have a filter to block timebase frequencies? In the early days
there was a lot of material uploaded from VHS (pre-2010 YT videos are often
240p or similar), which I suspect is where it's coming from on your
examples. I wouldn't be surprised if the timebase leaked onto the audio
track, but contemporary VHS hardware couldn't play it back so people didn't
notice. They might do today, hence a reason to filter it out. And, as
these mega-platforms go, it's easier to have a one-size-fits-all policy than
to do any tailoring to the input material.

Theo

J. P. Gilliver

2023-08-04 14:06:16 UTC

Post by Theo

I suspect people aren't doing a lot of postprocessing: recording from
analogue using their soundcard/etc, uploading the clips to YT, YT does the

I suspect you're right there ...

Post by Theo
transcoding but isn't changing levels. It may be people are using line in
inputs from analogue sources and not setting levels correctly, I don't know.

... and there.

Post by Theo

But that's the point: (A) I'd say 40-60% of clips _do_ have the timebase
whistle, or at least _some_ peak between 15 and 16 kHz, so it's _not_
being filtered - and (B) the _rest_ of the content is brickwalled _at_
that tone. In other words, there is content (often mostly noise) _up_ to
that tone, and zero above it:

/\/\/\/\/\         |
          ---------|
                   |
                   |________

Where /\/\ is the meaningful content, ----- is just noise, | is the
tone, and _ is the nothing, up to Nyquist.

It was/is the brickwalling that I find puzzling. Sure, if there had been
a _notch_ around the two timebase frequencies, or a rolloff starting
_below_ them. But brickwalling _at_ (but not including!) the tone seems
very odd.

Have a look at some.
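
For anyone who would rather measure than eyeball a spectrogram, the following is a minimal sketch of the Goertzel algorithm, which gives the signal power at a single frequency without a full FFT. The function name and the synthetic test tone are illustrative assumptions; real use would feed it decoded PCM samples from a downloaded track.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Return the squared magnitude of `samples` at `freq` (Goertzel)."""
    w = 2.0 * math.pi * freq / sample_rate
    coeff = 2.0 * math.cos(w)
    s_prev = 0.0   # s[n-1]
    s_prev2 = 0.0  # s[n-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


if __name__ == "__main__":
    fs = 44100
    # Synthetic stand-in for a track carrying a 15625 Hz line whistle.
    tone = [0.1 * math.sin(2 * math.pi * 15625 * n / fs) for n in range(4410)]
    for f in (15625, 15750, 18000):
        print(f, goertzel_power(tone, fs, f))
```

Comparing the power just below and just above the suspected line is one way to check for the brickwall described above: content up to the tone, next to nothing beyond it.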

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

After all is said and done, usually more is said.

Brian Gaff

2023-08-05 09:19:06 UTC

If you have an mp3 file, then mp3gain can change the levels without another
pass through encode/decode making the sound lumpy and gritty, as it seems
it's phase and levels that lossy encoding affects most.
Brian
--
--:
This newsgroup posting comes to you directly from...
The Sofa of Brian Gaff...
***@blueyonder.co.uk
Blind user, so no pictures please
Note this Signature is meaningless.!

Post by Theo

Post by J. P. Gilliver
3. (This is the one that finally prompted me to post.) Far more often
than not - I'd say over 90% of tracks - there's a visible (I can no
longer hear that high) tone around 15½ kHz. I presume in the majority of
cases, it's timebase - 15625 for "PAL" (yes I know, but YKWIM), 15750
for NTSC; even where it's not from an actual video source, I presume it
has picked it up somewhere in the processing, e. g. from a computer
monitor/graphics card. This is _not_ what's puzzling me. What is, is
that the spectrum is very often brickwalled at that line: even where the
actual valid material is all below 15, 12, 11, 10, 8, or 6 kHz (you'd be
surprised how much _does_ have nothing valid above those!), and the
remainder is just uniform noise - it cuts off at the line. Can anyone
think why? It's nowhere near the Nyquist limit of 22050; I could
understand a rolloff _towards_ that to avoid aliasing, and that rolloff
being gentle to avoid other adverse effects, but no, it's brickwalled,
and at the line (which is _well_ below).

Brian Gaff

2023-08-05 09:14:53 UTC

I think you expect too much of people. Most of those with footage tend to do
what they always do and just put that up. If you really wanted to do the
processing it's easy enough, but most just don't bother. I tend to know if I
get anything off line be it podcast, YouTube or whatever and use GoldWave
to fix stuff and resave it at the bit rate I like. If you want to hear brick
wall filtering, any pop concert recently recorded by the BBC is like that.
After all FM never had anything above 15 kHz except noise most of the time.

I have a few custom effects saved in GoldWave, one is superfast gain
update, which effectively compresses the dynamic range. There are a few
that only compress peaks, and some for matching the upper levels without
clipping. I also have quite a few parametric settings like presence reduce,
and tizzy enhance for those under the blanket recordings. There are very
light touch noise reductions from clipboard as well, and some custom
pop/click ones to clean up crackles.

I also made a wide spatial stereo one which can enhance some stereo live
recordings a lot, and does not cause distortion in the stereo centre.

Brian
--
--:
This newsgroup posting comes to you directly from...
The Sofa of Brian Gaff...
***@blueyonder.co.uk
Blind user, so no pictures please
Note this Signature is meaningless.!

Post by J. P. Gilliver
OK, not strictly broadcast, but YouTube is almost a broadcast channel.
I use yt-dlp a lot, usually on default settings (which AIUI usually gets
the best available), and then an extractor set to "extract original audio"
(I use Pazera, as it's easy to be sure it's extracting original without
any further transcoding; however, it's just ffmpeg-based, and I presume
any other similar would yield the same result). [Yes, I looked into using
the audio-only settings for yt-dlp, but they didn't easily lend themselves
to batching; besides, I sometimes _do_ want the video too - clip of an
artist performing, and I want the audio-only one for use in the car. My
muscle memory of the keystrokes to extract the audio means I can do it in
seconds anyway.] I usually look at the resultant audio - sometimes with
the intention of reducing the filesize, sometimes just out of curiosity.
(I use GoldWave, but I presume almost any other similar utility - such as
Audacity - would yield similar results.)
1. The _vast_ majority are coded at 44100 Hz, stereo. I suppose that - "CD
quality" - is the default setting for many capture/encoding devices, but
it does seem overkill for mono material, especially of considerable age
(such as from 78s). Still, I'm not surprised. (I very occasionally find
one that _has_ been encoded mono. Though I don't think I've seen any
encoded at less than 44100 - certainly if I have, it's been extremely
rare.)
2. The _level_ is often extremely low - especially for some old (say
1960-1999) video clips. (Not all, by any means - but often enough to be
noticeable.) By low, I mean I have to boost them by ×4, or even ×8 or
occasionally ×16, to get the peaks above 50% full scale. (I only use
powers of 2 to avoid distortion.) Is this something YouTube are imposing?
Is audio level adjustment difficult on some common piece of video capture
hardware/software? I even came across one recently where the uploader
_said_ something like "this is quiet, you may have to adjust" in the
notes, so s/he knew about it. This does seem odd.
3. (This is the one that finally prompted me to post.) Far more often than
not - I'd say over 90% of tracks - there's a visible (I can no longer hear
that high) tone around 15½ kHz. I presume in the majority of cases, it's
timebase - 15625 for "PAL" (yes I know, but YKWIM), 15750 for NTSC; even
where it's not from an actual video source, I presume it has picked it up
somewhere in the processing, e. g. from a computer monitor/graphics card.
This is _not_ what's puzzling me. What is, is that the spectrum is very
often brickwalled at that line: even where the actual valid material is
all below 15, 12, 11, 10, 8, or 6 kHz (you'd be surprised how much _does_
have nothing valid above those!), and the remainder is just uniform
noise - it cuts off at the line. Can anyone think why? It's nowhere near
the Nyquist limit of 22050; I could understand a rolloff _towards_ that to
avoid aliasing, and that rolloff being gentle to avoid other adverse
effects, but no, it's brickwalled, and at the line (which is _well_
below).
--
I admire him for the constancy of his curiosity, his effortless sense of
authority and his ability to deliver good science without gimmicks.
- Michael Palin on Sir David Attenborough, RT 2016/5/7-13

J. P. Gilliver

2023-08-05 11:01:16 UTC

Post by Brian Gaff
I think you expect too much of people. Most of those with footage tend to do
what they always do and just put that up. If you really wanted to do the
processing it's easy enough, but most just don't bother. I tend to know if I

Indeed.

Post by Brian Gaff
get anything off line be it podcast, YouTube or whatever and use GoldWave
to fix stuff and resave it at the bit rate I like. If you want to hear brick

Actually, I rarely want to _change_ the sound of things I have
downloaded - I just like to _look_ at them out of curiosity. (I do
resave at lower bit rates *and sample rates* if they're fundamentally
far too high, as it just offends me to have something that I can see is
mono saved as stereo, or that has nothing above 8 or 10 kHz saved at
44100.) I _used_ to do it to make smaller files; nowadays the cost of
storage is so low that that's not that important, though I still do it.

(I'd always understood that the algorithms look at stereo difference,
and if there isn't much, should produce a smaller file - or, you can
specify a lower bitrate and still get the same quality; however,
manually telling it to encode as mono if I can see it's mono anyway,
seems to more or less halve the filesize, so that aspect of data
compression isn't having that much effect.)
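
The "stereo difference" idea can be put in numbers with a mid/side split: mid is (L+R)/2, side is (L-R)/2, and a file that is really mono has almost no side energy. Below is a minimal sketch, assuming the two channels are already available as lists of samples. The function name and the 0.001 threshold are hypothetical, and a decoded lossy file will rarely give an exact zero.

```python
def side_to_mid_ratio(left, right):
    """Ratio of side (L-R) energy to mid (L+R) energy; 0.0 means pure mono."""
    mid_energy = 0.0
    side_energy = 0.0
    for l, r in zip(left, right):
        mid = (l + r) / 2
        side = (l - r) / 2
        mid_energy += mid * mid
        side_energy += side * side
    if mid_energy == 0.0:
        # Silence, or channels in exact anti-phase.
        return float("inf") if side_energy > 0.0 else 0.0
    return side_energy / mid_energy


def looks_mono(left, right, threshold=0.001):
    """Hypothetical rule of thumb: treat a tiny side/mid ratio as mono."""
    return side_to_mid_ratio(left, right) < threshold


if __name__ == "__main__":
    print(looks_mono([0.1, -0.2, 0.3], [0.1, -0.2, 0.3]))
    print(looks_mono([0.1, -0.2, 0.3], [0.3, 0.1, -0.2]))
```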

Post by Brian Gaff
wall filtering, any pop concert recently recorded by the BBC is like that.
After all FM never had anything above 15 kHz except noise most of the time.

You'd think they'd set the brick wall to remove timebase whistle, if
that's the case, though.

Post by Brian Gaff
I have a few custom effects saved in GoldWave, one is superfast gain
update, which effectively compresses the dynamic range. There are a few
that only compress peaks, and some for matching the upper levels without

Doesn't the built-in "Maximise" function do that? (I use it to _assess_
the maximum level [and find when it is if not obvious], but I cancel it,
as it would involve a non-binary gain adjustment.)
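
The "binary gain" habit can be sketched as follows: given the measured peak, pick the largest power of two that keeps the peak at or below full scale, which always leaves it above 50%. On integer PCM a gain of 2^n is just a left shift, so nothing is rounded as long as nothing clips. The function names are illustrative, and the peak is assumed to be expressed as a fraction of full scale.

```python
import math


def binary_gain(peak, full_scale=1.0):
    """Largest power-of-two gain that keeps `peak` at or below full scale."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    ratio = full_scale / peak
    # frexp gives ratio = m * 2**e with 0.5 <= m < 1, so floor(log2(ratio)) = e - 1.
    _, exponent = math.frexp(ratio)
    shift = max(exponent - 1, 0)  # never attenuate
    return 2 ** shift


def gain_db(gain):
    """Express a linear amplitude gain in decibels."""
    return 20 * math.log10(gain)


if __name__ == "__main__":
    for peak in (0.05, 0.1, 0.25, 0.6):
        g = binary_gain(peak)
        print(peak, g, round(gain_db(g), 2))
```

Each doubling is about 6.02 dB, so ×4, ×8 and ×16 come to roughly 12, 18 and 24 dB.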

Post by Brian Gaff
clipping. I also have quite a few parametric settings like presence reduce,
and tizzy enhance for those under the blanket recordings. There are very
light touch noise reductions from clipboard as well, and some custom
pop/click ones to clean up crackles.

You're obviously a more sophisticated user than I am. I think the only
ones I've custom-saved are 11 kHz and 5500 Hz brickwalls that I use if
I'm going to halve or quarter the sample rate of something that has noise
(but no significant signal) above those (to avoid aliasing it down), and
a ×4 gain.
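
To see why the brickwall has to go in before the sample rate is reduced, here is a small sketch of where a tone lands if it is above the new Nyquist limit and nothing filters it out. It assumes plain decimation with no anti-alias filter in the resampler, and the function name is illustrative.

```python
def alias_frequency(freq, sample_rate):
    """Frequency at which `freq` shows up after sampling at `sample_rate`."""
    nyquist = sample_rate / 2
    folded = freq % sample_rate
    # Anything above Nyquist is mirrored back down.
    return folded if folded <= nyquist else sample_rate - folded


if __name__ == "__main__":
    for new_rate in (44100, 22050, 11025):
        print(new_rate, alias_frequency(15625, new_rate))
```

At 22050 Hz the Nyquist limit is 11025 Hz, so an unfiltered 15625 Hz whistle would fold down to 22050 - 15625 = 6425 Hz, well inside the audible band. That is the case for the 11 kHz and 5500 Hz brickwalls.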
[]
I mainly use it just to look, rather than change, other than the
recoding as mono or lower sample rate.

(As for the purists who say _any_ recoding is further distortion - I
accept this in theory, but on the whole find one such produces nothing I
can hear; if I decide I want to apply some subsequent adjustment, I go
back to the original file. Same as jpg images. When I'm trying things
out in GoldWave, of course, I work in its memory where it's not
encoded.)

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

Science isn't about being right every time, or even most of the time. It is
about being more right over time and fixing what it got wrong.
- Scott Adams, 2015-2-2

John Williamson

2023-08-05 12:16:14 UTC

Post by J. P. Gilliver
You'd think they'd set the brick wall to remove timebase whistle, if
that's the case, though.

What frequency of line whistle, though? On the current video streaming
services, line whistle may be as low as 10 kHz or as high as 20 kHz,
depending on the original video standard. While the high end isn't
likely to be a problem, the lower end may mask wanted signals.

It's not a new problem, one very famous album, much admired for its
audio quality, has a constant scan whistle on most of the tracks from
the monitors used on the studio computers.

--
Tciao for Now!

John.

J. P. Gilliver

2023-08-05 22:04:26 UTC

Post by John Williamson

Post by J. P. Gilliver
You'd think they'd set the brick wall to remove timebase whistle, if
that's the case, though.

Most of the material I'm interested in is from say late 1950s up to
about turn of the century - almost all SD video, 15625 or 15750 Hz. (I
think anything early enough to have been on film - or system A [10125
Hz, which I do remember!] - has probably been converted to SD video well
before it got to YouTube.)
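
The line frequencies quoted in this thread all come from the same arithmetic: lines per frame times frames per second. Here is a short sketch. The dictionary labels are just descriptive, and the colour NTSC figure uses the 30000/1001 frame rate, which gives roughly 15734 Hz rather than the round 15750 of the monochrome standard.

```python
def line_frequency(lines_per_frame, frames_per_second):
    """Horizontal scan (timebase) frequency in Hz."""
    return lines_per_frame * frames_per_second


SYSTEMS = {
    "625-line, 25 frames/s ('PAL')": (625, 25),
    "525-line, 30 frames/s (monochrome NTSC)": (525, 30),
    "525-line, 30000/1001 frames/s (colour NTSC)": (525, 30000 / 1001),
    "405-line, 25 frames/s (System A)": (405, 25),
}

if __name__ == "__main__":
    for name, (lines, rate) in SYSTEMS.items():
        print(f"{name}: {line_frequency(lines, rate):.2f} Hz")
```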

Post by John Williamson
It's not a new problem, one very famous album, much admired for its
audio quality, has a constant scan whistle on most of the tracks from
the monitors used on the studio computers.

At the other end, I remember an LP of Bob Newhart I borrowed from the
library - late '70s or early '80s - had very noticeable mains buzz,
presumably noticeable to me as being harmonics of 60 Hz, not of 50 which
I'd probably developed a comb filter for.
(Was the whistle on the album in question 15xxx Hz, or higher?)

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

I hope you dream a pig.

David Paste

2023-08-21 16:00:21 UTC

AFAIU the MP4 (-f 140 is generally the highest quality in that
codec) audio downloads have a frequency roll-off at around that
frequency (16 kHz IIRC). This can be seen in Audacity using the
spectrogram setting (right-click on the vertical kHz bar and
select "zoom to fit"). I have started to download audio in OPUS
(-f 251) as it doesn't seem to have this roll-off. Then I
batch-convert the files with FFMPEG (thank you to those in here
who helped me with this in the past!) to MP3 for wider
compatibility with my various media players.
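
The workflow described here (fetch the Opus stream, then batch-convert to MP3) might look something like the sketch below. The format ID 251 is taken from the post. The function names, the `.webm` glob and the `-q:a 2` VBR quality level are assumptions for illustration, not the poster's actual settings.

```python
from pathlib import Path


def download_opus_cmd(url: str) -> list[str]:
    """yt-dlp command selecting format 251 (the Opus stream cited above)."""
    return ["yt-dlp", "-f", "251", url]


def to_mp3_cmd(source: Path, quality: int = 2) -> list[str]:
    """ffmpeg command transcoding one file to MP3 with the LAME encoder.

    -q:a picks a LAME VBR quality level; 2 is an assumed choice.
    """
    target = source.with_suffix(".mp3")
    return ["ffmpeg", "-i", str(source), "-vn",
            "-c:a", "libmp3lame", "-q:a", str(quality), str(target)]


def batch_mp3_cmds(folder: Path) -> list[list[str]]:
    """One conversion command per .webm file found in `folder`."""
    return [to_mp3_cmd(p) for p in sorted(folder.glob("*.webm"))]


if __name__ == "__main__":
    for cmd in batch_mp3_cmds(Path(".")):
        print(" ".join(cmd))
```

No `-ar` option is given, so ffmpeg keeps the input sample rate where the encoder supports it, which fits the 48 kHz MP3s mentioned below.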

If I am downloading videos (usually music videos) I use
the -f 140 + (whatever 1080 in h.264 video is, I can't remember
offhand) as this offers best compatibility with media players in
terms of video.

Observations: MP4 audio is 44.1 kHz whilst OPUS is 48 kHz. The
MP3s that FFMPEG creates retain the 48 kHz sampling rate, and I
have had no compatibility problems with any of my players.

I will include the data from MediaInfo if you are interested. (I
also like the way YT-DLP includes the YT address ID as part of
the file name. Very handy!). I hope I haven't wasted your time
with this reply.

MP4 file:

General
Complete name : C:\David's really great computer\yt-dlp\MP4 Kali Uchis – telepatía [Official Audio] [Dwzk-XZxZ4k].m4a
Format : MPEG-4
Format profile : Base Media
Codec ID : isom (isom/iso2/mp41)
File size : 2.47 MiB
Duration : 2 min 40 s
Overall bit rate mode : Constant
Overall bit rate : 129 kb/s
Writing application : Lavf60.5.100

Audio
ID : 1
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : mp4a-40-2
Duration : 2 min 40 s
Bit rate mode : Constant
Bit rate : 128 kb/s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 44.1 kHz
Frame rate : 43.066 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 2.45 MiB (99%)
Title : ISO Media file produced by Google Inc.
Language : English
Default : Yes
Alternate group : 1

OPUS file:

General
Complete name : C:\David's really great computer\yt-dlp\OPUS Kali Uchis – telepatía [Official Audio] [Dwzk-XZxZ4k].webm
Format : WebM
Format version : Version 4
File size : 2.57 MiB
Duration : 2 min 40 s
Overall bit rate : 134 kb/s
Writing application : google/video-file
Writing library : google/video-file

Audio
ID : 1
Format : Opus
Codec ID : A_OPUS
Duration : 2 min 40 s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 48.0 kHz
Bit depth : 16 bits
Compression mode : Lossy
Language : English
Default : Yes
Forced : No
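
A couple of the MediaInfo figures for the MP4 file can be cross-checked with simple arithmetic. The sketch below (function names illustrative) derives the AAC frame rate from the sampling rate and the 1024 samples per frame, and estimates the stream size from the bit rate and duration. The duration is only reported to the nearest second, so the size is approximate.

```python
def aac_frames_per_second(sample_rate_hz, samples_per_frame=1024):
    """Frames per second for a codec with a fixed number of samples per frame."""
    return sample_rate_hz / samples_per_frame


def stream_size_mib(bit_rate_bps, duration_s):
    """Approximate stream size in MiB for a constant bit rate."""
    return bit_rate_bps * duration_s / 8 / 2 ** 20


if __name__ == "__main__":
    print(round(aac_frames_per_second(44100), 3))     # compare with "43.066 FPS"
    print(round(stream_size_mib(128_000, 160), 2))    # compare with "2.45 MiB"
```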

J. P. Gilliver

2023-08-23 23:38:07 UTC

Post by David Paste

AFAIU the MP4 (-f 140 is generally the highest quality in that
codec) audio downloads have a frequency roll-off at around that
frequency (16 kHz IIRC). This can be seen in Audacity using the

That would explain it. Yes, sometimes there is a trace of something
between (what I assume is) the timebase signal and 16 kHz, i. e. just
above the solid line.

I guess it's just coincidence that the MP4 designers chose a cutoff so
close to (but above) the timebase line.

Post by David Paste
spectrogram setting (right-click on the vertical kHz bar and

(I use GoldWave - I'd probably be using Audacity, but I bought GoldWave
before Audacity became common and free, and am used to it - and that has
a spectrogram option. [I usually have one channel set to spectrogram,
and the other set to X-Y so I can see immediately whether it's stereo or
not.])

Post by David Paste
select "zoom to fit"). I have started to download audio in OPUS
(-f 251) as it doesn't seem to have this roll-off. Then
batch-converting files with FFMPEG (thank you to those in here
who helped me with this in the past!) to MP3 for wider
compatibility with my various media players.

I'll admit I mostly just download with yt-dlp's default and extract the
"original" audio, which nearly always comes out as [encoded as!] 44.1
kHz stereo - very occasionally 48 kHz, equally rarely mono (44.1 I
think).
[]

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

Old soldiers never die - only young ones