
You can’t get far into an audio forum or the comments sections of audio websites without encountering the statement “Some products that measure well sound bad, and some products that measure poorly sound good.” Depending on who said it, it’s at best uninformed and at worst a lie. And it’s a lie that sometimes sticks listeners with underperforming audio gear.

The inaccuracy of the first half of that sentence can be shown in scientific papers and in the absence of documented examples. The second half could be true if the words “. . . to me” were added, but to the best of my recollection, I’ve always seen it presented as a universal statement, in which case it’s false. This platitude reflects not wisdom, but a rejection of science by people who, as far as I can tell, haven’t bothered to look into the science and have no measurement experience.

The Biggest Lie

One of the most glaring examples of this sentiment appeared just this month, in a review of the Tannoy Revolution XT 6 speaker by Herb Reichert in the July 2020 issue of Stereophile. The first sentence of the review reads, “I’ve been wrestling with my elders about new ways to measure loudspeakers, lobbying for methods that might collaborate [sic] more directly with a listener’s experience.” In another article, the same writer states his opinion more directly: “As a tool for evaluation, or as a predictor of user satisfaction, today’s measuring procedures are almost useless.” As we’ll see, this review clearly shows why measurements are so essential in the evaluation of audio products.

Both of the author’s statements reflect ignorance of the subject. In the case of speakers, measurement methods that have been shown to predict user satisfaction with 86% correlation were established more than 30 years ago. They were developed largely through extensive research led by Dr. Floyd Toole, conducted at Canada’s National Research Council (NRC) in Ottawa, and continued at Harman International. Countless speaker companies now use these methods as a design guideline. That’s because they know that speakers that measure well according to these principles will sound good to most listeners.

Some might point out that the model fails 14% of the time, but it’s unlikely that the 14% of speakers that measure well but didn’t win universal love from the listening panel sound “bad,” unless they have, say, high distortion -- which a different set of measurements could easily detect. Regardless, it’s absurd to proclaim an 86% success rate “almost useless.”
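To make the 86% figure concrete: a correlation coefficient measures how closely predicted ratings track the ratings listeners actually gave, rather than counting the share of speakers for which a prediction "succeeds." The Python sketch below computes a Pearson correlation from scratch. The function name pearson_r and the ratings are hypothetical, invented only for illustration; they are not data from the NRC or Harman studies.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings for six speakers (NOT data from any published study):
# what a measurement-based model predicted vs. what a blind panel reported.
predicted = [6.1, 5.4, 4.8, 3.9, 3.2, 2.5]
observed = [5.8, 5.9, 4.1, 4.4, 2.9, 3.0]

r = pearson_r(predicted, observed)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")

# For reference, r = 0.86 corresponds to r squared of about 0.74, meaning
# roughly three quarters of the variance in ratings tracks the prediction.
print(f"{0.86 ** 2:.2f}")
```

Read this way, the remaining gap reflects scatter around a strong trend, not a set of speakers that measured well and sounded bad.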

More recently, scientific research has produced headphone and earphone measurements that predict user satisfaction about as accurately. For example, in AES paper 9878, “A Statistical Model that Predicts Listeners’ Preference Ratings of In-Ear Headphones: Part 2 -- Development and Validation of the Model,” a Harman International research team of Dr. Sean Olive, Todd Welti, and Omid Khonsaripour reports a 91% correlation between measurements and listener preferences in an evaluation of 30 earphones using 71 listeners.


I’ll agree that measurements don’t predict which amps, DACs, and other electronics people will like. But that’s not because of flaws in the measurements -- it’s because listeners rarely agree about which audio electronics they like. Blind tests seldom show clear differences between, or preferences for, certain models, brands, or types of amplifiers, for instance. Reviews of these products do not indicate preference trends among reviewers; they tend to rave about all sorts of amps and DACs. If a statistically significant number of participants in controlled listening tests don’t express affection for some audio electronics and disdain for others, there’s no way measurements or subjective reviews can predict listener preference.

What about the idea that “some products that measure poorly sound good”? A solid argument against this notion came from Stereophile technical editor (and former editor-in-chief) John Atkinson, who, in a summary of his 1997 AES presentation, stated, “. . . once the response flatness deviates above a certain level -- a frequency-weighted standard deviation between 170Hz and 17kHz of approximately 3.5dB, for example -- it’s unlikely the speaker will either sound good or be recommended.” And he’s talking here about the speakers recommended by Stereophile writers. Research shows that a panel of multiple listeners in blind tests would likely be even less forgiving of speakers that measure poorly.
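As a rough illustration of the kind of flatness figure Atkinson describes, the Python sketch below computes a weighted standard deviation of a response curve between 170Hz and 17kHz. The summary quoted above does not spell out the frequency weighting, so the weighting used here (each point weighted by the span of log frequency it covers) is an assumption, as are the function name flatness_std_db and the sample curve. It is a sketch of the general idea, not a reconstruction of his method.

```python
import math


def flatness_std_db(freqs_hz, levels_db, lo_hz=170.0, hi_hz=17000.0):
    """Weighted standard deviation (in dB) of a response curve within a band.

    ASSUMPTION: each point is weighted by the width of log frequency it
    covers, so densely sampled treble does not dominate the result.
    freqs_hz must be strictly increasing.
    """
    pts = [(f, l) for f, l in zip(freqs_hz, levels_db) if lo_hz <= f <= hi_hz]
    if len(pts) < 3:
        raise ValueError("need at least 3 points inside the band")
    if any(b[0] <= a[0] for a, b in zip(pts, pts[1:])):
        raise ValueError("frequencies must be strictly increasing")

    weights = []
    for i, (f, _) in enumerate(pts):
        left = pts[i - 1][0] if i > 0 else f
        right = pts[i + 1][0] if i < len(pts) - 1 else f
        # Half the log-frequency distance between the neighboring points
        weights.append(math.log(right / left) / 2)

    total = sum(weights)
    mean = sum(w * l for w, (_, l) in zip(weights, pts)) / total
    variance = sum(w * (l - mean) ** 2 for w, (_, l) in zip(weights, pts)) / total
    return math.sqrt(variance)


# Hypothetical octave-spaced curve: flat through the midrange, then a
# treble region sitting 4dB too high (levels relative to the midrange).
freqs = [200, 400, 800, 1600, 3200, 6400, 12800]
flat = [0.0] * len(freqs)
hot_treble = [0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0]

print(f"flat curve:       {flatness_std_db(freqs, flat):.2f} dB")
print(f"4dB treble shelf: {flatness_std_db(freqs, hot_treble):.2f} dB")
```

A single number like this cannot say where in the band the deviation sits, which is one reason it works better as a screening figure than as a replacement for reading the full response curve.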


Of course, even a clearly flawed audio product might sound good to somebody. To find an example, look no further than the very same Tannoy review. Atkinson’s measurements show that, as he puts it, “. . . the tweeter appears to be balanced between 3dB and 5dB too high in level,” which creates an “excess of energy in the presence region, which I could hear with the MLSSA pseudorandom noise signal when I was performing the measurements.”

To get a rough idea of what this sounds like, turn the treble knob on an audio system up by 4dB. It’s far from subtle, and it’s not pleasant. I can’t look at that measurement without thinking the factory used the wrong tweeter resistor. In a blind test with multiple listeners, such as the evaluations conducted by the NRC or Harman, this speaker would almost certainly score poorly.

Yet I find no mention of this flaw in the subjective review. In fact, the reviewer describes the speaker’s sound as “slightly soft,” and concludes with the words “Highly recommended.” Based on this review, at least, it seems likely that if a measurement technique could be found that reliably predicts which speakers this reviewer likes, most listeners won’t like those same speakers.

Fortunately, those who read the measurements got the real story. Those who ignored the measurements because they’ve been told they’re “almost useless” may end up buying a speaker with an obvious tonal-balance error.

Don’t get me wrong -- I don’t mind if someone raves about an audio product with a huge, demonstrable flaw, just as I’d hope no one minds if I occasionally enjoy listening to Kiss’s Alive! album. I’ve read many such reviews, and rarely felt inspired to comment on them. But dismissing decades of work by some of the world’s most talented audio scientists just because it doesn’t fit your narrative is as frivolous as claiming that Gene Simmons is the greatest bass player of all time.

I would hope that audio writers would be curious about their avocation and want to learn everything they can about it, but a huge percentage of them have shut themselves off from any new information that might cast some of their beliefs in doubt. In their rejection of science, they’ve mired their readers and their industry in nonsense -- and in many cases, they’ve stuck their readers in the infinite loop of buying underperforming products and then selling those to buy other flawed products, instead of simply learning key facts about audio so they can buy good gear the first time.


I’m encouraged, though, because the headphone community isn’t burdened with an anti-science attitude. On the contrary, headphone enthusiasts are putting together measurement rigs, reading the research, and working to understand how their headphones and amps work and interact. Yet they understand that science provides only guidelines, and that they ultimately have to listen for themselves and trust their ears to make the final judgment. Most important, they are getting better reproduction of, and more enjoyment from, their music. I think and hope that this is the future of audio.

. . . Brent Butterworth

  • Jim Farrell · 1 month ago
    I agree with your premise up to a point. Rejecting measurements entirely is clearly a scientifically indefensible position. Calling it a lie, however, suggests that those who promote it know it is wrong but say it anyway for some unspecified purpose or personal gain. That is somewhat harsh and requires backing up. And yes, I know you said "at best uninformed," but the title of your article is "The Biggest Lie in Audio," so you don't get to roll back from that in the text.

    Also, if there is an 86% correlation between good measurements and the perception of good sound, that is indeed highly significant, but it means that 14% of people did not hear good sound from well-measuring components. So for some people at some time under some circumstances, good-measuring components can indeed sound bad, and it may well be that the opposite is also true.

    The problem lies at the interface between what we measure and what we hear. We measure only a limited number of parameters, and these take little cognizance of biological variations between individuals' hearing, psychological biases around what sounds good and what doesn't, expectations regarding brands and sources and price, cultural bias involving musical taste and what we grew up with, and also, crucially, what we are used to in our own rooms in our own houses. It goes on and on, and our currently rather pathetic range of measuring tools cannot even scratch the surface.

    Frankly both absolutist camps are wrong. Absolutists generally are.

    If you want to rely on science, then double-blind listening setups with a big enough sample size to account for the psychoacoustical issues laid out above are probably the way to go, but it's a hobby, for goodness' sake, not curing cancer or solving the mysteries of the universe, so I doubt that is going to happen. In the meantime everyone in both camps and on the fence can continue to enjoy arguing with each other. Like this.
    • Brent Butterworth · 1 month ago
      You raise some good points. One could argue that the writer who rejects (and probably doesn't even read) the science on measurement because it challenges his established beliefs and his identity as an audiophile is not lying in this case. But we are talking about a publication with a technical editor who confirms and supports measurement science in practically every issue, and an editor-in-chief with a master's degree in physics who claims to be conversant with audio measurement science. Yet they are publishing writers who breezily deny the science without making a case against it. I'm comfortable in calling that a lie. They can't have it both ways. One of those statements is false and the editor and technical editor know it.

      You're 100% right that cultural biases, brand identification, etc., can play a big role in one's preference for an audio component. In fact, we can celebrate that. But I think my readers deserve, as much as possible, an unbiased assessment of a component's performance. They already know if they are drawn to some particular brand or technology.

      Our current range of audio measurement tools is quite remarkable, in my opinion; I've owned many of them and used almost all of them. As noted, using these tools we can accurately predict user preference in speakers and headphones, and it's impossible for them to predict user preference in electronics when no such preferences can be discerned in listening tests. A lot of brilliant people worked years to develop those tools, and if you're going to call them "pathetic," then back up your statement with specific criticisms that demonstrate your qualifications to make such a statement.

      Yes, audio is a hobby, but evaluating audio components is my profession, and has been since the early 1990s. I take my gig very seriously and I want to do the best job I can at it. That's why I include measurements.
  • todd · 1 month ago
    A well-known publication just published a review of a higher-end Revel speaker system, with measurements taken in the reviewer's listening space.

    If one measures that way, is one not measuring the room rather than the speaker, invalidating the measurements?

    • Brent Butterworth · 1 month ago
      Are you talking about Tom Norton's review of the Revel PerformaBe system in Sound & Vision?
      • Todd fetterman · 1 month ago
        Yes, that's right.
        • Brent Butterworth · 1 month ago
          I can't tell anything from those measurements, unfortunately. If they were averaged over, say, 6 positions, which largely eliminates the effects of room modes, I'd put more faith in them. Those measurements show a +5dB bump at 1.2kHz, which is something I have not seen in any other Revel speaker. It is possible to do useful in-room measurements -- John Atkinson does them for Stereophile, and I often do them as backup for my quasi-anechoic measurements -- but real analysis requires anechoic or quasi-anechoic measurement. If you visit this page and go down to "Speaker Measurements 101," I cover this topic in depth. http://www.brentbutterworth.com/writing.html
  • SECA_Alan · 2 months ago
    This article advances a fascinating issue, one that's poorly served by tribalism and is full of nuance. There's certainly no problem enjoying a technically 'less resolved' design and discovering emotional connection with music through it. However, that cannot be a basis for comparison. The 'Spinorama' and associated preference ratings are a great way to understand certain design choices, engineering competency and a PIR that accounts for many of the basic building blocks of good sound.

    I thoroughly enjoy the intense discussion; the context and knowledge shared teach the avid reader a lot.
  • Jeanette · 2 months ago
    Great article Brent! Just one of many areas where people are encouraging us to ignore the science to our peril...
  • Kevin Voecks · 2 months ago
    Bravo for telling the truth Brent! I should add one observation: People who "like," or even own a particular loudspeaker may not feel the same way about it in a properly conducted double-blind test. I once had three well known audio people participate in a double-blind test in which they did not know the identities of any of the speakers they were evaluating. All three of them owned one of the speakers in the test from a well-known brand, which they all regarded as a true reference. They all rated the very speaker they owned very poorly, disparaging it. It goes to show the power of human prejudices.
  • Bob Johnson · 2 months ago
    As someone who used to spend thousands of dollars every month on magazine ads, I can tell you that everyone forgets the ONLY real purpose of a magazine is to sell advertising! Readers are there only as something to sell to advertisers and set ad rates; the amount of money coming in from subs is negligible compared to ad revenue. Magazine reviews are worthless for decision making; basically they are "payoffs" to advertisers for their advertising dollars. One example: after getting a publisher to agree to a product review, he told me to contact the reviewer to arrange it. I called the reviewer, an older, very respected member of the community, and started talking about shipping arrangements. His response was "hell kid, I don't have time for that crap, send me some pretty pictures and I'll take care of writing the review." Very enlightening.
    • Brent Butterworth · 2 months ago
      OK, but I have worked for at least 10 publications that review audio gear, and TMK, what you're describing has happened with only one writer I've worked with -- whom I was in the process of firing when he quit. I've lost plenty of advertising dollars when people didn't like their review or didn't get as many reviews as they liked, and I know many competing publications have, too.
  • Todd · 2 months ago
    Question for Brent,

    Brent, I recently heard Andrew Jones remark that it is a misconception that the majority of speaker damage is caused by amplifier clipping. Rather, he said speaker damage is almost always caused by too much power. I was surprised, and would like to hear your view on this if you care to opine.
    • Brent Butterworth · 2 months ago
      Hi, Todd. I really don't know. It would sure be fun to test that, though.
      • Mark · 2 months ago
        There are basically two failure modes in loudspeakers: thermal and mechanical. With regard to "clipping," a sure way to make a loudspeaker fail is to drive it into mechanical clipping by applying excessive power.
  • Dustin · 2 months ago
    Great article, Brent. There was also a fair bit of back and forth on this topic in the comments section in another recent Stereophile article (Totem Skylight speakers). I tried to argue in favour of what the science has demonstrated (username: buckchester). I even got some replies from Jim Austin, the new editor. I was disappointed with his responses. It’s frustrating when so many people in this hobby are so obviously irrational.

    Floyd Toole posts quite often on AVS Forum and he has actually stated that when speakers of similar bass capability were used, the correlation actually increased from 86% to 99%. I can find you the exact quote if you’d like.
    • Brent Butterworth · 2 months ago
      I read that thread on their site. He's really digging in on their "telling the reader what components SOUND like" talking point -- as if a single reviewer in uncontrolled conditions, often operating in considerable ignorance of how audio and psychoacoustics work, could give us a definitive assessment of a component's sound.

      I was more appalled, though, by his assertion that he, Atkinson and Kal are all familiar with Toole's work -- which implies that most of his writers doing speaker reviews are not familiar with what is almost surely the most important research on speaker performance to date. This is like staffing an astronomy magazine with writers who aren't familiar with Edwin Hubble's work.
      • Dustin · 2 months ago
        I totally agree with all your comments.

        The point I kept trying to make, that went completely unacknowledged, was that if there is a discrepancy between a speaker’s measurement and a subjective sighted review, then it would surely be worthwhile to wonder if sighted bias influenced the review, especially since science has demonstrated this can happen. Instead, they jump to the conclusion that the sighted review is the be-all and end-all explanation of how the speaker sounds, and that the measurements must not be adequately taking into account everything that we hear. To me, the simpler explanation of what is going on here is so painfully obvious. I don’t understand how they can’t see that. This guy is apparently a trained physicist too. So to see the logic he uses disappoints me even further.

        His comments on the added value that Stereophile can bring to this discussion (which he says is somehow missing from Toole’s work) in evaluating the “emotional communicativeness” of the speakers make no sense to me at all. Let’s see one of his reviewers take a blind test between a speaker they reviewed and a reference speaker that measures well with the Spinorama method (e.g. Revel F208) and let’s see if their impressions remain the same as in their sighted review. In the case of these two particular reviews (the Totem and the Tannoy), I highly doubt they would. And they can take all the time they need. They just can’t know which speaker they are listening to. But they would never do this because they know it would throw their entire model into question.
        • Brent Butterworth · 2 months ago
          Agreed, and that's another great point. Why should I care about what emotional reaction some audio writer had to a speaker -- a reaction that's determined at least as much by the writer's mood, the setup of the speakers, the ancillary gear, the chosen music, and the writer's biases for or against the brand or technology as it is by the performance of the speakers? On what basis do they assume the reader would have a similar emotional reaction? I'll grant that some of these guys are amusing writers, but this is not a serious or useful means of testing.
    • Dustin · 2 months ago
      FYI - Quote from Floyd Toole in post #10499:

      "A correlation coefficient of 0.86 is not perfect". True, but some context helps to explain the shortfall. Those same papers explained that bass extension and smoothness accounted for about 30% of the factor weighting in the sound quality ratings. Because the 70 loudspeakers in that test included full range floor standers and small bookshelf units it was obvious that some of the variation was due simply to the differing bass performances - a good speaker with bass beats a good speaker with less bass. When a subset of bookshelf speakers was tested - having similar bass performance - the correlation coefficient was 0.995 - i.e. perfect. So, when "apples" are compared to "apples" in terms of bandwidth, the correlation is truly amazing."

      https://www.avsforum.com/forum/89-speakers/710918-revel-owners-thread-350.html

    • Dustin · 2 months ago
      FYI - Quote from Floyd Toole in post #10499:

      "Olive, S.E. (2004a). “A multiple regression model for predicting loudspeaker preference using objective measurements: part 1 – listening test results”, 116th Convention, Audio Eng. Soc., Preprint 6113.
      Olive, S.E. (2004b). “A multiple regression model for predicting loudspeaker preference using objective measurements: part 2 – development of the model”, 117th Convention, Audio Eng. Soc., Preprint 6190."

      https://www.avsforum.com/forum/89-speakers/710918-revel-owners-thread-350.html

  • Dr. Ears · 2 months ago
    The biggest lie in audio is, "I think your system sounds better than mine".
    I have been buying & selling New Old Stock audio tubes for four decades.
    Whenever I buy a decent-size lot, I take the best- and worst-testing pairs and listen to them. I have never heard an audible difference, so I concluded long ago that whatever we are testing for cannot be heard.
    As components have gotten better, with the notable exception of audio vacuum tubes, we can now reproduce a flat frequency curve better than ever.
    However, I believe that most of us find a flat frequency curve to sound harsh, with listening fatigue occurring fairly quickly.
  • John Mayberry · 2 months ago
    Measurements are important. Yet they are not always definitive and don't necessarily tell a story accurately.

    I remember Dick Heyser and his knuckle test. He'd knock his knuckles on the side of a speaker. He said, "if you like that note you're going to love this speaker".

    The simple fact is there are only a few speakers which provide even a passable waterfall response or an impedance measurement without major phase-related anomalies. A great many of them are truly a dog's breakfast.

    We still don't have a musical transfer function 50 years after one was first postulated.

    That's before we even consider their interaction with the acoustic environment.

    Yes, testing is important. But 99% of the speakers out there don't test well. That may be the gist of the issue.



    • Brent Butterworth · 2 months ago
      Hi, John. Much of what you've said is new to me. With waterfall responses, have we determined what "passable" is? Is there research that ties these to blind listening test results?

      Ditto for impedance -- my measurements demonstrate the correlation of headphone impedance curves with sensitivity to output impedance of the source device, but I don't know of any research that ties speaker impedance curves to listening test results, other than that a <4-ohm impedance is more than a lot of amps can handle.

      I measure only about 15-20 speakers a year right now, but I did a lot more when I worked for Sound & Vision. Off the top of my head, I'd guess that a third of them measured pretty well. Maybe even half of them. Of course, those were mostly fairly mainstream products; if you measured all the speakers at a high-end audio show, I expect the percentage wouldn't be as high.
