If you read audio websites often, you’ve surely seen discussion about whether or not measurements are important in audio reviews. Unfortunately, few of the people writing about this topic have experience in audio measurement, and their comments rarely amount to anything more than excuses for why they don’t do measurements. Because measurement is such a big part of SoundStage!’s group of websites, and SoundStage! Solo in particular, I thought it important to explain why we do measurements, and what conclusions you should draw -- and not draw -- from them.

The reason we do measurements is that a subjective audio review cannot present a comprehensive, unbiased evaluation of an audio product. It’s one writer’s opinion, almost always formed after casual, sighted listening sessions. A subjective review reflects not only the sounds that reached the writer’s eardrums, but also the writer’s presuppositions about the product category, the brand the product wears, and the technology the product uses; the writer’s relationship with the manufacturer and/or the public relations person; and the writer’s concerns about what readers, other writers, and other manufacturers will think of the review. It can also be affected by the writer’s mood, the music chosen for listening, even the time of day during which the writer performed the evaluations and wrote the review. How much do these factors affect the review? We don’t know -- and neither does the reviewer, unless he possesses a depth of self-knowledge that the Buddha would envy.


These problems could be eliminated by blind testing, but for reasons I’ve discussed elsewhere, almost no audio writers do blind testing. So what we have are mostly subjective reviews that include only the writer’s reactions to a product. These reviews can be entertaining to read as a sort of audio travelogue, but because there’s no attempt to correlate the writer’s judgment with anyone else’s judgment, or with any objective standards, these reviews provide, as famed audio researcher Floyd Toole says in the new 3rd edition of Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, “just stylish prose and opinion.”

A better method is to include performance measurements of the product being tested. Measurements provide a practical way to get beyond a writer’s opinion and provide a more comprehensive and less biased evaluation of a product.

Many audio reviewers say they reject measurements because music is about emotion, and that measurements can’t gauge emotion. The audio writer, they suggest, can gauge the emotion of a certain piece of music played through a certain piece of audio equipment, and the presumption is that the reader will share his emotional reaction to this experience. But our emotional reactions to music incorporate all sorts of influences, many of which I cite above, and it’s hubristic for any audio writer to assume your emotional reaction to a certain piece of music played over a certain system at a certain moment will correlate with his. I find it insulting when audio writers -- few of whom have demonstrated deep knowledge of audio engineering, scientific research, physics, or music -- presume that their emotional reaction to a piece of music played through a certain piece of gear will be the same as mine.

Contrary to the beliefs of many audio reviewers, measurements tell us much more about how well a component conveys the emotion of a piece of music than their opinions can. What the critics of measurement fail to realize is that the key measurements of speakers and headphones are interpreted according to how they relate to the preferences of real listeners -- preferences established through extensive blind testing. Measurements allow us to gauge a product against the opinions of dozens or hundreds of listeners, formed in conditions where bias is minimized or eliminated. This is vastly more useful than gauging a product against one reviewer’s opinion, formed in uncontrolled, casual testing with no attempt to eliminate bias.

Research in correlating measured performance with listener responses dates back at least to the 1980s. Here’s how the process generally works. The researcher brings in numerous listeners -- preferably trained listeners experienced at evaluating audio products -- to listen to samples of a wide variety of audio products in a particular category and pick their favorites. The researcher then measures the products to see which measurements predict the listeners’ impressions and which don’t. A target response is then created, based on the listeners’ comments and the measured responses of their favorite products, and that target is tested against listener perceptions to confirm its validity.
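
As a rough sketch of that validation step, here’s how one might check whether deviation from a target curve predicts a panel’s ratings. Everything below -- the frequencies, the responses, the ratings, and the flat target -- is invented purely for illustration; it is not data from any actual study.

```python
import numpy as np

# Hypothetical sketch of the validation step described above. All numbers
# (frequencies, responses, ratings, and the flat target) are invented for
# illustration -- this is not data from any actual study.

freqs_hz = np.array([100, 300, 1000, 3000, 10000])

# Invented target response in dB (flat, purely for simplicity)
target_db = np.zeros(len(freqs_hz))

# Invented measured responses of four products (dB at each frequency)
products_db = np.array([
    [0.5, 0.2, -0.1, 0.3, -0.4],   # tracks the target closely
    [3.0, 2.5, -2.0, 4.0, -5.0],   # large deviations
    [1.0, 0.8, -0.5, 1.2, -1.5],
    [6.0, -4.0, 3.5, -6.0, 5.0],   # very large deviations
])

# Simple error metric: RMS deviation of each product from the target curve
rms_error = np.sqrt(np.mean((products_db - target_db) ** 2, axis=1))

# Invented mean preference ratings from a blind-listening panel (0-10 scale)
ratings = np.array([8.5, 4.0, 7.5, 2.0])

# If the target is valid, products that deviate less should be rated higher,
# so the error-vs.-rating correlation should be strongly negative.
r = np.corrcoef(rms_error, ratings)[0, 1]
print("RMS errors (dB):", np.round(rms_error, 2))
print("Deviation-vs.-rating correlation: r = %.2f" % r)
```

With these made-up numbers, the products closest to the target earn the highest ratings, so the deviation metric correlates strongly (and negatively) with preference -- the pattern real studies look for.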

In these studies, researchers are typically able to develop measurements that predict listener preferences with impressive accuracy. For example, in their 2017 paper “A Statistical Model That Predicts Listeners’ Preference Ratings of In-Ear Headphones: Part 2 -- Development and Validation of the Model,” researchers Sean Olive, Todd Welti, and Omid Khonsaripour report a correlation of 0.91, with 1.0 being perfect correlation. What’s the correlation between subjective reviews and listener preferences? To my knowledge, no magazine or website has tested this, or published the resulting data.
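
For readers unfamiliar with correlation coefficients, here’s a miniature illustration of what a figure like 0.91 means. The numbers below are invented, not data from the Olive/Welti/Khonsaripour paper.

```python
import numpy as np

# What "a correlation of 0.91" means, in miniature: the Pearson correlation
# between the preference ratings a measurement-based model predicts and the
# ratings listeners actually gave. These numbers are invented, not data
# from the Olive/Welti/Khonsaripour paper.

predicted = np.array([7.8, 5.1, 6.4, 3.2, 8.9, 4.5])  # model's predictions
actual = np.array([8.0, 4.7, 6.9, 3.5, 8.4, 5.2])     # panel's mean ratings

r = np.corrcoef(predicted, actual)[0, 1]
print("r = %.2f" % r)  # values near 1.0 mean the model tracks listeners well
```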

Note that I’m talking mostly about frequency response measurements of loudspeakers and headphones. That said, I’ve also found excellent correlation between my headphone isolation measurements and listeners’ perception of how much outside noise leaks into headphones and earphones. For those tests, I used listeners from the staff of Wirecutter (a website that tests headphones and many other products) as my subjects, playing a recording of airplane cabin noise through my surround-sound system.

The correlation between other measurements and listener perception is not as well established. Distortion measurements predict listener perception only in fairly extreme cases. Spectral decay, or waterfall, measurements have yet to be well correlated with listener perceptions, but they are interesting to look at and often correspond with frequency response measurements, so I include them. Impedance and sensitivity measurements tell you little or nothing about the sound quality of headphones or speakers, but they are important for ensuring that a set of headphones can deliver optimum performance with the amplifier or source device you use.
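
To illustrate why output impedance matters (my own example, not from the article): the source’s output impedance and the headphones’ impedance form a voltage divider, so a high-impedance source drops the signal level -- and if the headphones’ impedance varies with frequency, the drop varies too, altering the frequency response.

```python
import math

# Hypothetical illustration of amp/headphone matching: the source's output
# impedance and the headphones' impedance form a voltage divider, so the
# level drops by 20*log10(Z_load / (Z_load + Z_source)). If the headphones'
# impedance varies with frequency, that drop varies with frequency too.

def level_drop_db(headphone_ohms: float, source_ohms: float) -> float:
    """Level loss in dB caused by the source/load voltage divider."""
    return 20.0 * math.log10(headphone_ohms / (headphone_ohms + source_ohms))

# 32-ohm earphones driven from a near-zero-impedance amp vs. a 10-ohm source
print(round(level_drop_db(32.0, 0.5), 2))   # -0.13 dB: negligible
print(round(level_drop_db(32.0, 10.0), 2))  # -2.36 dB: a significant loss
```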

You may be wondering why I haven’t mentioned measurements of audio electronics, such as amplifiers, preamps, and DACs. That’s because the numerous papers on the subject from the Audio Engineering Society’s E-Library show at best a tenuous and slight correlation between measurements of electronics and the results of blind listening tests. Listeners are only rarely able to consistently distinguish between these products in blind tests, and even when they can, the preferences among multiple listeners are usually too varied and mild to be meaningful. Without reasonable consistency in listener preferences, there’s nothing with which the measurements -- or the impressions of a subjective reviewer -- can be correlated.

However, listeners can distinguish among these devices when they exhibit significant flaws, such as high levels of distortion or large deviations in frequency response, and measurements can easily and reliably detect these flaws. Some of these products also have idiosyncrasies, such as high output impedance or low maximum output, that affect how well they’ll work with the other products in your system. Thus, it’s important to measure these products to see if they have any flaws, characteristics, or limitations that might affect your experience with them.

Measurement system

I certainly understand why most audio publications avoid measurements. I think most audio engineers would agree that it takes at least a couple of years’ experience to become proficient in any one measurement, plus incalculable hours to actually run the measurements and analyze the results. It’s also costly: while there are a few good, affordable audio measurement systems, most cost somewhere between $3000 and $30,000. And of course, audio measurement demands more commitment, passion, and effort than most people would prefer to devote to such a dense and challenging subject. It’s much easier to pour yourself another glass of scotch and deride the measurement guys as “enemies of poetry, love, and humanistic culture.” But if that’s all the writer is willing to do, they won’t be able to provide information that can predict how the reader -- as opposed to just the reviewer -- will like the product in question.

. . . Brent Butterworth

Comments
  • headphoneryan · 2 years ago
    @Brent Butterworth you wrote something? Checking
  • Brent Butterworth · 2 years ago
    @todd Thanks, Todd!

    On your first question, yes, it applies to Class D amplification, too. I've searched the Internet and the AES e-library and I can't find any research where controlled testing showed an audible difference between Class AB and Class D amps. There are anecdotal reports, but none TMK where the testing was blind. That said, I don't see the need for Class D in applications where power demand is low and power consumption/heat dissipation is not a problem.

    On your second question, an external DAC/headphone amp doesn't hurt, and many are very affordable and will have lower output impedance and higher max output power than the DAC/headphone amps built into computers. I assisted on an extensive test of those a while back: https://thewirecutter.com/reviews/best-portable-headphone-amp-with-built-in-dac/
  • todd · 2 years ago
    RE: "...at best a tenuous and slight correlation between measurements of electronics and the results of blind listening tests...."

    1. would this apply to class D amplification as well, or do you recommend sticking with class AB amplifier topography?
    2. would this apply to the DACs in low end computers or would you recommend an external dac for computer audio?

    love your work Brent , thanks for the measurements

    p.s. awesome to have the likes of floyd toole making comments

  • headphoneryan · 2 years ago
    @Brent Butterworth thank you! I am looking forward. FR is the most relevant for me. What's right? I do not care about isolation, etc. Ryan
  • Brent Butterworth · 2 years ago
    @headphoneryan That's a great idea. Look for it on March 1.
  • headphoneryan · 2 years ago
    A suggestion for you: please write something about how to read the measurements you take. I try but I do not understand most I see. FR is even hard.

  • Brent Butterworth · 2 years ago
    @Joseph Yes, they have target curves for earphones and over-ear headphones now. My guess is that they wouldn't call them "finished," but their validity has been established through blind testing.
  • Joseph · 2 years ago
    "A target response is created based on the listeners’ comments and the responses of the listeners’ favorite products"

    Is this so called target finished? I read the Innerfidelity article about Harman. Is the work completed?

    Joseph Tan
  • Ian Colquhoun · 2 years ago
    @Brent Butterworth It has been so refreshing to read this article, simply because it needs to be said. We use an entire suite of measurements taken in our anechoic chamber to create a Listening Window and Sound Power curve, using our algorithm to average them. Ultimately, the final test a speaker must pass is the double-blind listening test, and the results of this testing will generally mean adjustments to the response, because of the high audibility of very low-Q artifacts that are difficult to identify visually. But I can say with certainty that the correlation between the results derived from the "Spinorama" and the results from double-blind listening tests is very real, and large variations from these results will produce a guaranteed loser in the double-blind listening test. It is worth noting that other factors, like power capabilities and overall bandwidth, play a large role in overall product performance and can affect the results of a double-blind listening test. But those measurements are not the discussion here, and they tend not to be haunted by a disbelief that they need to be measured.

    Ian Colquhoun
    Axiom Audio
  • Brent Butterworth · 2 years ago
    @Gary Eickmeier It's all in the book, in depth, supported with research.
  • Gary Eickmeier · 2 years ago
    @Brent Butterworth I am still making my way through the book, notepad in hand, so I will shut up until I see the chapter about what I call "The Big Three." Mark Davis has said that what we hear in a loudspeaker are the frequency response and the radiation pattern. I have added speaker positioning and the reflective qualities of the walls around them. There are so many variations of these factors it is near impossible to characterize the "sound" of a speaker with simple frequency response measurements of the direct sound output and waterfall plots that indicate the width of the "spray" of the direct field. I know that Floyd touches upon my work on p 403 but there is some misunderstanding in there that I have written to him about. What I am waiting to see in the book are measurements that can correlate The Big Three (radiation pattern, speaker positioning, and acoustics of the room) to perception. Those are what is audible about speakers.
  • Sean Olive · 2 years ago
    @Sean Olive One of the things we have done to more accurately measure leakage effects using our GRAS 45 CA is to develop pinnae that represent typical leakage on humans. Todd Welti did a cool study where he measured a number of headphones on 10 different listeners to establish how they leaked. He then developed new pinnae that match the average leakage of the headphones measured on subjects. The paper is here: http://www.aes.org/e-lib/browse.cfm?elib=17699

    The next challenge is finding 1 or more heads that represent a range of leakage found on a distribution of head sizes. The size/shape of the head is usually more of a factor on headphones with larger size cups.
  • Sean Olive · 2 years ago
    @Doug Schneider In all of the later studies we virtualized all the headphones over a single pair by simulating the measured magnitude and minimum phase response. The correlation between actual and virtualized headphone based on subjective ratings was greater than r = .90. For controlling leakage in IE headphones we put a MEM mic inside the replicator IE and measured any leakage. For AE/OE virtualizing we used an open-back headphone that had very repeatable response within and between different listeners.
  • Floyd Toole · 2 years ago
    @David Blumenstein There is no "guilt" in being a romantic when it comes to music - the arts. When emotions get transferred to hardware, it can be problematic. I conducted my first blind listening test on loudspeakers in 1966 - 52 years ago. As a research scientist, I was intrigued when even simple anechoic measurements showed a clear relationship to the sound quality ratings. This was the first step in a process that has occupied my professional life. I sum it up as "science in the service of art." After all, the objective of HiFi - high fidelity - was and is to deliver the art as nearly as possible as it was created. When uncertainties about the performance of the hardware are reduced, then what is heard is more directly attributable to the performing artists and the recording and mastering engineers in the creative process. My attitude is that this is where debates should be. It is elaborated on in detail in my book, but is nicely summarized in a tutorial lecture on YouTube, "Sound Reproduction; Art and Science/Opinions and Facts": https://www.youtube.com/watch?v=zrpUDuUtxPM&list=FL8EhjwAiBJi_scYd0WgC12g&t=0s&index=7&frags=pl%2Cwn

    Having just completed a significant upgrade to my own entertainment system - making it as "neutral," as transparent, as is possible - I am rewarded by many stunningly good recordings, as well as chagrined by those that suffer from apparently flawed recording apparatus, techniques, or judgment. With such a system, one can lay credit and blame where it belongs, without having to wonder what, if any, influence one's hardware has. It is the art that puts the smile on my face, not the hardware, even though I respect the engineering excellence embedded in it and the science that guided those efforts.

    One benefit of understanding the science is that high quality, accurate, sound reproduction becomes available at lower price levels. I don't understand why this is a problem. That is why publications and manufacturers that show accurate measurements are part of the solution for consumers.
  • David Blumenstein · 2 years ago
    @Doug Schneider I am GUILTY...guilty of being an insufferable romantic. I grew up from my early teens with HiFi - 40+ years now - and it will always be an intrinsic aspect of my life. Intellectually, academically, in my head I "grok" the need for measurements and statistics. I use them to guide me as a filter when in the market for gear. I want the targets of my desire to be able to deliver on tangible needs. That being said and filtered, it's all about the intangibles, what I feel in my heart, which gets it pounding, gets me motivated to search out products/services which I can write about and share with the HiFi community at large.

    To the fears I have enumerated:

    1. I am scared that when, for me, HiFi becomes solely a mental exercise, it will have lost its lustre.
    2. I am scared that the wondrous "hot stove league" aspect of HiFi discussions (the compelling rumination that takes place between baseball seasons) will be stymied if and when objectivity metaphorically slams the door. For baseball fans: SABR, while intriguing, is not definitive.
    3. I am scared that if indeed HiFi is ruled by graphs and measurements, then the industry (sic), which rests upon subjective "shoulders," could very well implode, as manufacturers shut down when their competition's products, sporting comparable if not equal measurements, are that much more affordable.

    It is not a zero-sum game. It should never be. I posit that without the objective/subjective debate it is game-over for HiFi as we know it.
  • Floyd Toole · 2 years ago
    @Doug Schneider It is in the book; pages 137-142, where I describe Sean's subjective/objective correlations. There are two AES preprints referenced.
  • Brent Butterworth · 2 years ago
    @Floyd Toole Thanks for jumping in, Floyd! I wish more people would read your book. Their understanding of audio -- REAL audio, not made-up quasi-religious audio -- would go up tenfold. The 3rd edition is very enjoyable and informative; it reads more like a collection of really great magazine articles than a textbook.
  • Doug Schneider · 2 years ago
    @David Blumenstein Interesting point of view -- and some very good points. Where do YOU stand on it?
