Stephen Porges Preliminary Study Summary Report
Affective responses to the acoustic features of sounds from a Polyvagal Perspective
Jacek Kolacz (a), Gregory F. Lewis (a), & Stephen W. Porges (a, b)
(a) Department of Psychiatry, University of North Carolina
(b) Kinsey Institute, Indiana University
Acknowledgments: The researchers would like to acknowledge the International Misophonia Research Network, Jennifer Jo Brout, and the 4H Foundation for their support and contribution to this study.
Our nervous system is continuously being stimulated by the acoustic environment in which we live. We feel calm and safe while listening to some sounds, whereas other sounds alert us to danger or life threat. Some responses to sounds are learned through associations with negative and positive experiences, while others are “hard-wired” into our nervous system. The acoustic features of sounds that trigger these hard-wired reactions have been described in the Polyvagal Theory (Porges, 2011; Porges & Lewis, 2009). Polyvagal Theory proposes that, prior to associative learning, subjective responses to sounds are neurophysiologically and anatomically dependent on features of the acoustic signal such as pitch and variations in pitch. Consistent with the theory, safety is signaled when the pitch of an acoustic signal is modulated (pitch varies across time) within a frequency band in which there are no very low or very high frequencies. The modulation in vocalizations is frequently called prosody and, within the context of Polyvagal Theory, is assumed to be the vocal conduit that humans use to express positive emotional states. Thus, a monotone signal lacks prosody and is not sufficient to signal safety. In this study, we examined how acoustic properties related to pitch modulation differ among body sounds, natural sounds, and music. In addition, we investigated how specific acoustic features relate to feelings of pleasure and arousal.
The Polyvagal Theory leads to the following four “bio-psych-acoustic” hypotheses:
- Melodic music will have an acoustic signature that triggers feelings of pleasure and calm (low arousal).
- Modulated sounds that share acoustic features with melodic music will trigger a sense of safety.
- Body sounds (or pathogen stimuli, such as coughing and sneezing) will have an acoustic signature overlapping with predator and/or danger signals.
- Sounds with little frequency modulation (i.e., lack of prosody) will trigger an arousing response related to fear or alerting to danger.
In this study we investigated the acoustic features of stimuli from the International Affective Digitized Sound system (IADS) and their relation to subjective responses. The IADS is a database of sounds generated by the Center for Emotion and Attention at the University of Florida (Bradley & Lang, 2007). The IADS stimuli were evaluated by raters along two primary dimensions: affective valence (ranging from pleasant to unpleasant) and arousal (ranging from calm to excited). We selected four categories of stimuli that we expected to be closely linked to the evolutionary features of sound signals of safety (social communication), danger (fight or flight), or life threat (predator). We selected a subset of stimuli that had a single identifiable source and were likely encountered by human ancestors prior to modernization, thus being evolutionarily meaningful. The subsets of IADS sounds were placed into the categories of natural sounds (n=34; e.g., robin’s call, baby cooing, female screaming, the sound of a brook) and body sounds that serve as a proxy for pathogen sounds (n=9; e.g., snoring, vomiting, and coughing). We also identified melodic music across a range of styles (n=13; e.g., choir, a rock band, a bugle) to serve as a category reflecting prosodic features.
IADS stimuli were analyzed using the modulation power spectrum (MPS). The MPS is a useful convention for describing the neurologically significant features of acoustic signals (see Singh & Theunissen, 2003). This spectrum is a decomposition of the time-frequency representation of sound (the spectrogram) along two dimensions: the time domain (x-axis) and the frequency domain (y-axis of spectrogram, see Figure 1 for examples of spectrograms and a derived MPS). Two aspects of the MPS can be used to describe the modulation of sounds:
- Separability is a quantitative parameter of the acoustic signal derived from the MPS image, which describes the overall complexity of the image. It is essentially an estimate of how much information would be lost by compressing the image. If the image can be compressed with minimal loss of information, the acoustic signal is described as separable. The following examples illustrate the concept. Two types of grayscale images could be compressed with very little loss of information: a picture of static on a TV screen, or a picture of black circles on a white background. An image with low separability would be a picture of a person’s face taken in front of a house. The first two images could be compressed dramatically and still be recognizable, while in the last example, either the house or the face would become unrecognizable. A thorough discussion of the methods used to quantify separability can be found in Singh & Theunissen (2003), and visual examples of compression are available at http://math.gmu.edu/~sap/U09/m203/SVD/svd.html. In the case of MPS representations of sounds, white noise is almost perfectly separable (like the static on the TV). Natural sounds, including speech, are far less separable, while music, due to its repeated patterns (like the black circles on a white background), is more separable than natural sounds.
- Temporal modulation is the relative energy in the sound that is concentrated in rhythmic changes in intensity over time, compared to the energy from low energy frequency modulation components. Sounds that are highly rhythmic but not prosodic in pitch would be high in temporal modulation.
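The two steps described above (deriving an MPS from a spectrogram, then summarizing how compressible it is) can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function names, the frame length and hop size, and the toy test tone are all hypothetical, and the SVD-based index shown here (share of squared singular values captured by the first component) is one common way to express separability, which may differ in detail from the procedure in Singh & Theunissen (2003) or the one used in this study.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=64):
    """Log-magnitude spectrogram (rows: frequency, columns: time) via a Hann-windowed STFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log(magnitude + 1e-10)  # small offset avoids log(0)

def modulation_power_spectrum(log_spec):
    """2-D power spectrum of the mean-removed log spectrogram."""
    centered = log_spec - log_spec.mean()
    return np.abs(np.fft.fftshift(np.fft.fft2(centered))) ** 2

def separability(matrix):
    """Share of squared singular-value energy held by the first SVD component.

    A value of 1.0 means the matrix is exactly the outer product of two vectors
    (assumed index; not necessarily the one used in the study).
    """
    s = np.linalg.svd(matrix, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))

# Toy input (illustrative only): a 440 Hz tone with a slow 4 Hz amplitude envelope.
sr = 8000
t = np.arange(sr) / sr
tone = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 440 * t)
mps = modulation_power_spectrum(spectrogram(tone))
print(mps.shape, round(separability(mps), 3))
```

The index is bounded between 0 and 1, so values near 1 correspond to the "highly compressible" images in the analogy above, and lower values to images that lose detail when reduced to a single component.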
The MPS variables described above were extracted from the sounds in the IADS database. These features were compared between music, body, and natural sound groups. We then examined the relation of these acoustic features to subjective ratings of pleasure and arousal. Since pleasure and arousal were correlated dimensions in this database, each dimension was statistically adjusted to be independent of the other.
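The report does not state how the two rating dimensions were adjusted. One standard way to make a score independent of a correlated covariate is to regress it on that covariate and keep the residuals; the sketch below assumes that approach. The `residualize` helper and the ratings are hypothetical, and the numbers are synthetic values generated for illustration, not IADS data.

```python
import numpy as np

def residualize(target, covariate):
    """Remove the linear dependence of `target` on `covariate` (OLS residuals)."""
    target = np.asarray(target, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# Synthetic ratings, made up purely for illustration (not IADS data).
rng = np.random.default_rng(0)
arousal = rng.normal(5.0, 1.5, size=200)
pleasure = 8.0 - 0.6 * arousal + rng.normal(0.0, 1.0, size=200)

pleasure_adj = residualize(pleasure, arousal)   # pleasure with arousal partialled out
arousal_adj = residualize(arousal, pleasure)    # arousal with pleasure partialled out

print(np.corrcoef(pleasure, arousal)[0, 1])      # raw ratings are correlated
print(np.corrcoef(pleasure_adj, arousal)[0, 1])  # adjusted score vs. raw covariate
```

Each adjusted score is linearly uncorrelated with the raw dimension it was adjusted for, which is the sense of "independent" assumed in this sketch.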
As illustrated in Figure 1, melodic music was rated significantly higher on pleasure than natural and body sounds. Body sounds were the least pleasurable category of sounds.
As illustrated in Figure 2, body sounds were rated lower in arousal compared to music and natural sounds. This suggests body sounds may elicit an evolutionarily ancient defense response, immobilization.
As illustrated in Figure 3, melodic music was distinctly more separable than the other types of sounds. This demonstrates that, compared to other sounds, the prosodic features of music included both temporal (i.e., rhythm) and frequency (i.e., pitch) modulations that created predictable patterns in the MPS.
As illustrated in Figure 4, body sounds were distinguished from the other categories by having greater relative power in their rhythmic changes independent of frequency. Body sounds had significantly less frequency-related modulation than either music or natural sounds.
In the full sample of 165 IADS sounds, regression analyses were conducted to evaluate potential relationships between the acoustic variables (i.e., separability and temporal modulation) and the subjective ratings (i.e., pleasure and arousal). The analyses identified significant correlations in which greater separability was associated with higher pleasure ratings and in which greater temporal modulation was associated with lower arousal ratings.
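As a rough illustration of this kind of analysis, the sketch below fits two simple linear regressions with NumPy and reports the slope and Pearson correlation for each. Everything here is assumed for the example: the `fit_line` helper is hypothetical, the data are synthetic values constructed to have the same direction of effect as the reported findings, and the report does not specify the exact regression model that was used.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus Pearson r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Synthetic stand-ins for 165 sounds (illustrative only, not IADS values).
rng = np.random.default_rng(1)
n_sounds = 165
separability_score = rng.uniform(0.2, 0.9, size=n_sounds)
temporal_modulation = rng.uniform(0.0, 1.0, size=n_sounds)
pleasure = 3.0 + 4.0 * separability_score + rng.normal(0.0, 0.5, size=n_sounds)
arousal = 7.0 - 2.0 * temporal_modulation + rng.normal(0.0, 0.5, size=n_sounds)

slope_p, _, r_p = fit_line(separability_score, pleasure)
slope_a, _, r_a = fit_line(temporal_modulation, arousal)
print(f"pleasure ~ separability: slope={slope_p:.2f}, r={r_p:.2f}")
print(f"arousal ~ temporal modulation: slope={slope_a:.2f}, r={r_a:.2f}")
```

A positive slope in the first fit and a negative slope in the second correspond to the two associations described above.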
Body sounds have unique acoustic features and elicit unique subjective ratings. Acoustically, body sounds were the least separable and had the highest temporal modulation. These findings become more intuitive once we review examples of sounds that are high and low on separability and temporal modulation.
Music, as a category of acoustic stimuli, is highly separable. When we listen to music, our nervous system anticipates the rhythmic changes and functionally has the capacity to fill in short gaps. This feature of separability is the psychoacoustic basis for music compression algorithms (e.g., MP3). In contrast, body sounds are not acoustically separable. This low separability is processed by our nervous system as less predictable and may place us into a behavioral state of hypervigilance. Functionally, this results in many individuals having great difficulty ignoring the body sounds of others. Since music, in general, is prosodic and separable, both the modulation of frequencies that define prosody and the predictability of the sound sequences in combination signal our nervous system to calm and not to be vigilant. Of course, when music is less predictable and less prosodic, it loses its ability to calm. However, even when music is up-tempo (e.g., dances, marches), it maintains sufficient separability and prosody to maintain prosocial behaviors while arousing and mobilizing. In contrast, the acoustic features of body sounds, by lacking prosody and predictability, trigger a state of hypervigilance, which would disrupt ongoing tasks and social interactions.
Temporal modulation describes the relative acoustic energy associated with rhythmic changes in intensity (i.e., temporal) compared to frequency modulation (e.g., prosody). The sound of footsteps is an example of an IADS sound with high temporal modulation. With footsteps, there are short bursts of acoustic energy that rhythmically occur within a relatively narrow frequency band. Body sounds, similar to footsteps, frequently have a rhythmic component (e.g., coughing, flatulence, regurgitation, lip smacking, chewing), generating sounds within a narrow frequency band. The acoustic energy in body sounds is biased towards temporal modulation relative to frequency modulation, the acoustic component linked to prosody and associated with sounds that are calming and signal safety.
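To make the contrast between rhythmic intensity changes and frequency-related modulation concrete, the sketch below computes a simple ratio on two toy spectrogram-like matrices. This is an assumed operationalization, not the measure used in the study: it takes the share of modulation power that has zero spectral modulation (intensity changes shared by every frequency bin). The function name and both toy patterns are hypothetical.

```python
import numpy as np

def temporal_modulation_ratio(log_spec):
    """Fraction of modulation power lying on the purely temporal axis.

    log_spec: 2-D array, rows = frequency bins, columns = time frames.
    """
    centered = log_spec - log_spec.mean()
    power = np.abs(np.fft.fft2(centered)) ** 2
    power[0, 0] = 0.0  # drop any residual constant (DC) term
    total = power.sum()
    if total == 0.0:
        return 0.0
    # Row 0 of the 2-D transform holds components with zero spectral modulation,
    # i.e. intensity changes that are identical across all frequency bins.
    return float(power[0, :].sum() / total)

time = np.arange(64)
freq = np.arange(32)

# Footstep-like toy pattern: periodic bursts, identical in every frequency bin.
bursts = np.tile((time % 16 < 3).astype(float), (32, 1))

# Toy ripple across frequency that never changes over time.
ripple = np.tile(np.sin(2 * np.pi * freq / 8)[:, None], (1, 64))

print(temporal_modulation_ratio(bursts))  # close to 1: purely temporal modulation
print(temporal_modulation_ratio(ripple))  # close to 0: no change over time
```

Real recordings fall between these two extremes; under this assumed measure, the rhythmic, narrow-band sounds described above would sit nearer the burst pattern.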
When the acoustic qualities that constitute body sounds are perceived, our nervous system promotes hard-wired subjective responses. The IADS ratings provide a window into these hard-wired subjective responses of pleasure and arousal to body sounds. Subjectively, body sounds were uniquely experienced as low on both the pleasure and arousal dimensions. This contrasts with music, which was experienced as high on both dimensions. The natural sounds shared with music a high degree of arousal and shared with body sounds a low rating of pleasure. Focusing first on the arousal dimension, body sounds uniquely had noticeably lower ratings on arousal. Arousal may be interpreted as a shift in physiological state towards an increase in activation of the sympathetic nervous system that would promote movement. Functionally, increased arousal would be associated with movements both towards positive and negative stimuli. These data suggest body sounds are not only perceived as negative (i.e., low on the pleasure scale), but are also associated with reduced mobilization or perhaps even immobilization. In contrast, both music and natural sounds functionally stimulated the neural circuits that would support movement, although the pleasure ratings would suggest that the direction of the movement may be different for music and natural sounds. The high pleasure rating of music would suggest movement towards others (e.g., prosocial), as observed in dancing and marching, while the low pleasure rating of natural sounds would suggest more of a defensive fight-flight response. Although the average subjective response to natural sounds is low on the pleasure scale, closer inspection identifies a great range of responding. There is a subset of natural sounds that are rated as pleasurable and arousing. These sounds, which are more similar to music, may function similarly to music in their ability to support prosocial movement.
Across all IADS sounds, greater temporal modulation was associated with lower subjective ratings of arousal. Extrapolating to body sounds, which had high temporal modulation and low arousal, suggests that body sounds may be triggering a biobehavioral “shut down” reaction normally associated with life threat. The data demonstrate that body sounds have a unique acoustic profile that may trigger hard-wired responses, which may be especially pronounced in individuals who are in a physiological state of hypervigilance.
Future research will focus on three areas: 1) evaluating acoustic properties (i.e., separability, temporal modulation) of a broad range of sounds that are irritating; 2) evaluating individuals with sound sensitivities to determine the role that an individual’s physiological state and middle ear transfer function [1] have in determining subjective, behavioral, and physiological reactions to irritating sounds; 3) reducing auditory hypersensitivities and normalizing the middle ear transfer function with the Listening Project Protocol (see Porges et al., 2013, 2014).
[1] The middle ear transfer function objectively quantifies the acoustic ‘permeability’ to sounds in different frequency bands (Porges & Lewis, 2011). An atypical middle ear transfer function could functionally amplify the acoustic features of body sounds and attenuate frequency modulations in pleasing sounds. If this is the case, then the middle ear transfer function could be rehabilitated through the Listening Project Protocol.
Bradley, M. M., & Lang, P. J. (2007). The International Affective Digitized Sounds (2nd Edition; IADS-2): Affective ratings of sounds and instruction manual. Technical report B-3. University of Florida, Gainesville, FL.
Porges, S. W. (2010). Music therapy and trauma: Insights from the polyvagal theory. In K. Stewart (Ed.), Symposium on Music Therapy & Trauma: Bridging Theory and Clinical Practice. New York: Satchnote Press, 3-15.
Porges, S. W. (2011). The Polyvagal Theory: Neurophysiological Foundations of Emotions, Attachment, Communication, and Self-regulation. New York: WW Norton.
Porges, S. W., & Lewis, G. F. (2011). U.S. Patent Application No. 13/992,450.
Porges, S. W. & Lewis, G. F. (2009). The polyvagal hypothesis: Common mechanisms mediating autonomic regulation, vocalizations, and listening. In S. M. Brudzynski (Ed.), Handbook of Mammalian Vocalizations: An Integrative Neuroscience Approach. Amsterdam: Academic Press, 255-264.
Porges, S. W., Bazhenova, O. V., Bal, E., Carlson, N., Sorokin, Y., Heilman, K. J., … & Lewis, G. F. (2007). Reducing auditory hypersensitivities in autistic spectrum disorder: preliminary findings evaluating the listening project protocol. New treatment perspectives in autism spectrum disorders, 91.
Porges, S. W., Macellaio, M., Stanfill, S. D., McCue, K., Lewis, G. F., Harden, E. R., … & Heilman, K. J. (2013). Respiratory sinus arrhythmia and auditory processing in autism: Modifiable deficits of an integrated social engagement system? International Journal of Psychophysiology, 88(3), 261-270.
Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America, 114(6), 3394-3411.