Methods in Neuromusicology: Principles, Trends, Examples and the Pros and Cons
Neuromusicology, also known as the Cognitive Neuroscience of Music, is a modern discipline devoted to the measurement of real-time processes in the human brain while perceiving and producing sound. Research topics range from acoustic feature processing and listening to melodies to composition and music performance. Before designing an experiment, researchers might find it helpful to be informed about the efficiency of methods and their pros and cons. The chapter at hand gives an overview of several methods used in the neurosciences with a special emphasis on their principles, constraints and fields of application. The focus is on transcranial magnetic stimulation (TMS), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), electroencephalography (EEG) and on event-related potentials (ERP). The reader will also become acquainted with trends and recent developments towards whole-brain analyses and real life studies based on the idea to improve ecological validity.
Keywords: Positron Emission Tomography, Transcranial Magnetic Stimulation, Auditory Cortex, Mismatch Negativity, Gradient Coil
Neuromusicology, also termed the ‘Cognitive Neuroscience of Music’, is a modern discipline that came into existence through its methods. It is still not clear whether it belongs to cognitive neuroscience as a sort of ‘parent discipline’, to empirical musicology or whether it has a status of its own. Any neuroscience method enables researchers to measure the brain’s physiological processes in real-time, thus giving insight into the task-related or spontaneous functionings of the human brain without requiring any verbal or behavioral type of response.
The chapter gives an overview of the methods most frequently used in neuromusicology. I will particularly focus on transcranial magnetic stimulation (TMS), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), electroencephalography (EEG) and on event-related potentials (ERP). These research methods are all non-invasive, i.e. they require no neurosurgical intervention. For each method I will weigh up the pros and cons and give examples of application.
Since results obtained with these techniques essentially differ in nature, it is necessary to first classify methods according to type: Data achieved with EEG and ERP belong to the class of ‘bioelectric potentials’. Those achieved with fMRI and PET belong to the group of ‘neuroimaging data’. Results obtained with EEG and ERP have shifts of intracellular currents as their common source (and starting point), and bioelectric potentials on the head’s surface are the final output (for details regarding electrogenesis, see Sect. 3). Thus, by using EEG and ERP, neural activity can directly be measured. Scans obtained with fMRI and PET, by contrast, reveal changes in energy consumption (mostly oxygen) in well-defined regions of the brain. These types of neuroimaging techniques therefore point to neural activity in a merely indirect way.
Jäncke [13, 14] addressed this issue of method classification in a slightly different manner by using the exact reference to anatomical structures as a specifying criterion. According to that, neuroimaging methods like fMRI and PET allow a precise assignment of physiological processes to neuroanatomical structures, whereas bioelectrical methods like EEG or ERP do not.
1.1 Transcranial Magnetic Stimulation: How Does It Work?
Let me start with some essential remarks on transcranial magnetic stimulation (TMS). Compared with the methods mentioned above, TMS is something of an exception: It enables researchers to draw certain causality-based conclusions, i.e. to relate ‘cause’ and ‘effect’ precisely. By contrast, fMRI, ERP and other conventional neuroscience methods, although quite popular, reveal only correlational relationships, which means that coincidences cannot be ruled out.
How does TMS work? Transcranial magnetic stimulation modifies the excitability of nerve tissue so that cortical processes may either be accelerated or be severely inhibited, up to ‘virtual lesions’ lasting between 10 and 30 min. The underlying principle is electromagnetic induction: A magnetic field is temporarily built up orthogonally to the plane of a stimulation coil that is placed 2 cm above the head. In this way, an electric current is induced in the small cortical regions underneath, while the tissue resistances of skin, skull and dura can be disregarded. Whether processes speed up or slow down depends on the coil’s stimulation frequency: A pulse series with repetition rates of 5 Hz or higher (repetitive TMS; rTMS) may cause facilitation by lowering the excitation threshold, whereas a pulse series with repetition rates of 1 Hz at most (the same holds for intense 10-ms single shots) may have the opposite effect and provoke inhibition by suppressing the intracellular current flows.
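The frequency rule stated above can be written down as a small lookup. This is only a sketch of the rule as given in the text; the function name is hypothetical, and the text makes no statement about rates between 1 and 5 Hz, so that range is reported as unspecified.

```python
def expected_rtms_effect(rate_hz: float) -> str:
    """Return the effect the text associates with a pulse repetition rate."""
    if rate_hz <= 0:
        raise ValueError("repetition rate must be positive")
    if rate_hz >= 5.0:
        # 5 Hz or higher: excitation threshold is lowered
        return "facilitation"
    if rate_hz <= 1.0:
        # 1 Hz at most: intracellular current flows are suppressed
        return "inhibition"
    # The text does not say what happens between 1 and 5 Hz
    return "unspecified"


for rate in (0.5, 1.0, 3.0, 5.0, 20.0):
    print(f"{rate:>4} Hz -> {expected_rtms_effect(rate)}")
```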
Note that, in principle, TMS does not belong to the class of ‘neuroimaging or visualization methods’. Instead, TMS modulates pure neurophysiological (excitatory and inhibitory) activity, observable either through a decelerated or an accelerated overt behavior. However, to complete the data, researchers often opt for a combined approach using TMS as well as fMRI (or ERP) for measuring purposes.
2 Functional Magnetic Resonance Imaging: Basic Principles and Image Acquisition
Let me move on to the conventional types of neuroscience methods. I will first take a closer look at the principles of functional magnetic resonance imaging (fMRI). Since the underlying mechanisms are quite complex, I will omit the main parts of MR physics here and restrict myself to those aspects I consider relevant for discussing the method’s advantages and disadvantages. (For further information on MR physics please refer to the detailed, excellent explanations in [13, 19].)
To investigate brain activity with fMRI, measurement has to be performed in two steps: In the initial phase, called Magnetic Resonance Imaging (MRI), purely neuroanatomical data are recorded to precisely reconstruct the size, shape and structure of the individual brain. After that, changes of regional blood flow are registered to determine the amount of oxygen consumption. (This is the method’s functional component, the ‘f’ in fMRI.) Regarding the equipment, an expensive magnetic resonance scanner (tomograph) is the indispensable part; it should be able to produce a high-intensity, static magnetic field B0 with a field strength of 1 to 7 tesla (T).
MR physics uses the spin, i.e. the self-rotating property, of hydrogen nuclei (H). Inside the scanner, the hydrogen nuclei of the human body (4.7 × 10^27 in number) react like tiny compass needles. They orient along the external magnetic field B0 and precess at a specific frequency termed the ‘Larmor precession frequency’. Whenever a brief radio-frequency (RF) pulse is applied in addition to B0, the magnetization is tipped from the Z-axis into the XY-plane, i.e. it changes from longitudinal to transverse. To regain the starting position, two types of ‘backward forces’, called ‘spin-lattice interaction’ and ‘spin-spin interaction’, are effective, widely known as T1- and T2-relaxation.
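For orientation, the Larmor frequency grows linearly with field strength. The sketch below assumes the standard value of about 42.577 MHz per tesla for hydrogen nuclei (the gyromagnetic ratio divided by 2π); this constant and the function name are not taken from the chapter.

```python
# Gyromagnetic ratio of the hydrogen nucleus divided by 2*pi, in MHz per tesla
# (assumed textbook value, not stated in the chapter).
GAMMA_BAR_MHZ_PER_T = 42.577


def larmor_frequency_mhz(b0_tesla: float) -> float:
    """Precession frequency of hydrogen nuclei in a static field B0."""
    return GAMMA_BAR_MHZ_PER_T * b0_tesla


# Field strengths within the 1 to 7 T range mentioned for MR scanners
for b0 in (1.0, 1.5, 3.0, 7.0):
    print(f"B0 = {b0:.1f} T -> {larmor_frequency_mhz(b0):.1f} MHz")
```

The frequencies fall in the radio-frequency range, which is why the excitation pulse is a radio-frequency pulse.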
In physical terms, the T1 relaxation time is defined as the time after which longitudinal magnetization has regained 63 % of its original strength, whereas the T2 relaxation time is defined as the time after which transverse magnetization has decreased to 37 % of its initial value. Gray and white matter, fat and cerebrospinal fluid differ in their relaxation times (T1 and T2), which enables researchers to use these parameters for adjustment, in particular for regulating brightness and image contrast of the MR scans. Neuromusicology uses almost exclusively MR scans of the T1-weighted type. This type of brain image makes gray and white matter (as well as other forms of tissue with short T1 times) look bright, whereas spaces filled with cerebrospinal fluid look dark, producing almost no signal.
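The 63 % and 37 % figures follow from the usual exponential model of relaxation, in which longitudinal magnetization recovers as 1 − exp(−t/T1) and transverse (in-plane) magnetization decays as exp(−t/T2). The sketch below assumes this mono-exponential model; the tissue values are hypothetical placeholders chosen only to make the arithmetic visible.

```python
import math


def longitudinal_recovery(t_ms: float, t1_ms: float) -> float:
    """Fraction of longitudinal magnetization regained t_ms after excitation."""
    return 1.0 - math.exp(-t_ms / t1_ms)


def transverse_decay(t_ms: float, t2_ms: float) -> float:
    """Fraction of transverse magnetization remaining t_ms after excitation."""
    return math.exp(-t_ms / t2_ms)


# Hypothetical relaxation times, for illustration only
T1_MS = 1000.0
T2_MS = 100.0

print(round(longitudinal_recovery(T1_MS, T1_MS), 3))  # 0.632, i.e. about 63 %
print(round(transverse_decay(T2_MS, T2_MS), 3))       # 0.368, i.e. about 37 %
```

Evaluating either curve at its own time constant gives 1 − 1/e and 1/e respectively, regardless of the tissue.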
In addition, several sub-processes, known as spatial encoding (also called space-coding), are necessary to obtain precise 2- and 3-dimensional spatial information from the spin signal. Procedures for spatial encoding require the selective excitation of slices as well as a modulation of the magnetic field, and for this, paired gradient coils of X-, Y-, and Z-orientation are used. These coils produce gradient fields superimposed on B0, resulting in different strengths of the magnetic field at each point of the measuring volume. With this spatial information, the MR signal can now be analyzed voxel by voxel.
Functional MR scans are the product of a further step of image acquisition, and for this, the so-called BOLD response is a necessary precondition (see next section).
2.1 BOLD Response and Its Underlying Principle
Let me describe the physiological mechanism responsible for the ‘f’ in fMRI: Blood itself, or more precisely the oxygen content of hemoglobin, serves as an endogenous (intravascular) indicator: Whenever a task has to be performed, either motor or cognitive, energy demand is high, and the regional cerebral blood flow (rCBF) increases. In the venous parts of the fine capillaries next to activated neural populations, an exchange of oxygen-deficient for oxygen-rich blood takes place, and this is the principle fMRI is based on. Note that oxygenated hemoglobin differs from deoxygenated hemoglobin in its magnetic susceptibility, the latter being slightly more susceptible than the former (also known as the para- vs. diamagnetic properties of blood). This difference in magnetic susceptibility was originally discovered by the American chemist Linus Pauling in the 1930s and was transferred to fMRI research in 1990; the resulting signal has since been known as the BOLD response (Blood Oxygen Level Dependent response). Jäncke explains that deoxygenated (paramagnetic) hemoglobin has an inhibiting effect on the BOLD signal because it increases magnetic field inhomogeneities, which yield artifacts, but he concedes that the underlying mechanisms of signal production are more complex still.
The BOLD signal usually reaches its peak plateau between 4 and 8 s after task onset. Thus, in comparison to bioelectrical methods, time resolution is poor. However, the main advantage of fMRI lies in its excellent spatial resolution, with values between 1 and 4 mm³ (or even below), which is a necessary precondition for precisely localizing functionally activated areas.
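The sluggishness of the BOLD signal is often modeled with a ‘double-gamma’ hemodynamic response function. The sketch below uses one common parameter choice (a peak term with shape 6, an undershoot term with shape 16, weighted by 1/6); these values are illustrative assumptions, not figures from this chapter. It merely shows that such a model peaks within the 4 to 8 s window quoted above.

```python
import math


def gamma_pdf(t: float, shape: float, scale: float = 1.0) -> float:
    """Probability density of a gamma distribution, zero for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def double_gamma_hrf(t: float) -> float:
    """Assumed response model: early peak minus a small, late undershoot."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def time_to_peak(step_s: float = 0.1, n_steps: int = 300) -> float:
    """Grid search for the time (in s) at which the modeled response is largest."""
    times = [i * step_s for i in range(n_steps + 1)]
    return max(times, key=double_gamma_hrf)


print(f"modeled BOLD peak at about {time_to_peak():.1f} s after onset")
```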
Note that Talairach and Tournoux, two French neurosurgeons, developed a stereotaxic atlas, i.e. a sort of spatial coordinate system supporting researchers in their effort to localize specific brain areas. By using a color range from red to blue to indicate either activation or deactivation, clusters of functional activity can be marked voxel- or pixelwise on these brain scans. This stereotaxic atlas also makes it possible to adjust for the morphological structure of individual brains or to transform data onto a template, the standard brain, used for data transfer between laboratories and for the comparability of results.
2.2 Techniques of Image Acquisition
Constructing an appropriate paradigm in fMRI research is no easy matter:
2.3 The Auditory Cortex—A Challenge to fMRI Research
About 87 % of all fMRI studies use echoplanar imaging (EPI) as the method of choice, valued for rapid data collection. Investigating the auditory cortex with fMRI, however, poses a special problem. As already mentioned above, a disadvantage is that the fast switching of gradient coils during spatial encoding produces noise with an intensity of 60 to 100 dB. This side effect, similar to that observed for transcranial magnetic stimulation (TMS), complicates the study of auditory processing for at least three reasons: First, target sounds could be masked by the scanner noise and may hardly be detected. Second, the focus of attention may shift towards the noise and eventually impair task performance. Third, emotional reactions to melodies as well as their aesthetic evaluation might be severely hindered, resulting in reduced brain activation of the limbic system (amygdala and hippocampus).
How can researchers deal with these side effects caused by scanner disturbance? Several effective solutions are suggested by Jäncke: The first compensating technique he introduces is known as sparse temporal sampling (STS). It is a variation of the ‘echo-planar imaging’ technique. STS is characterized by inserting pauses of up to 10 s into periods of continuous scanning. These inter-stimulus intervals allow the coil noise to fade out while, at the same time, i.e. preceding the measurement, target melodies may fade in. Another possibility is termed clustered acquisition. Here, the entire set of target stimuli is presented immediately before scanning, providing a time frame of between 4 and 6 s for image acquisition (data recording). Note that whenever clustered acquisition is used as the method of choice, the first two (of, let’s say, ten) fMRI scans have to be excluded from analysis: Because longitudinal magnetization has not yet been completely built up, signal strength is lower than in the remaining scans. Several other simple measures may also effectively reduce scanner noise. First, scanner-compatible headphones may attenuate the coil noise while at the same time enhancing the sound quality of target melodies. Another possibility is to line the inside of the scanner tube with sound-absorbing material (e.g. insulating mats), with attenuation effects of up to 30 dB.
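The timing logic of sparse temporal sampling, and the exclusion of the first two scans in clustered acquisition, can be sketched as follows. The 10 s silent gap comes from the text; the acquisition time, the stimulus length and all names are hypothetical choices made for illustration, not parameters of any particular scanner protocol.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    stimulus_onset_s: float  # stimulus plays inside the silent gap
    scan_onset_s: float      # scanning (and coil noise) starts here


def sparse_schedule(n_trials, silent_gap_s=10.0, acquisition_s=2.0, stimulus_s=8.0):
    """Alternate silent gaps and acquisitions; stimuli end as scanning starts."""
    if stimulus_s > silent_gap_s:
        raise ValueError("stimulus must fit into the silent gap")
    cycle_s = silent_gap_s + acquisition_s
    trials = []
    for i in range(n_trials):
        scan_onset = i * cycle_s + silent_gap_s
        trials.append(Trial(scan_onset - stimulus_s, scan_onset))
    return trials


def drop_initial_scans(scans, n_discard=2):
    """Exclude the first scans, in which magnetization is not yet built up."""
    return scans[n_discard:]


for trial in sparse_schedule(3):
    print(trial)
print(drop_initial_scans(list(range(1, 11))))  # scans 3 to 10 remain
```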
Faced with the challenge of increasing the effectiveness of auditory fMRI designs, Mueller et al. tested the effects of three types of EPI sequences on image acquisition: continuous scanning, sparse temporal sampling (STS) and a method called interleaved silent steady state (ISSS), which differs from STS in terms of scanner behavior in the silent gaps between recordings.
Mueller et al. tested all three types of EPI sequences (continuous scanning, STS and ISSS) for possible differences in brain activation caused by the measuring technique itself. In each session of 12.5 min, the same 10 s excerpts of classical dances and their electronically distorted counterparts were taken as example pieces to examine the excitability of brain tissue in 20 volunteers (7 females). The study obtained two results: First, activations in the left and right auditory cortices were significantly stronger for the original than for the distorted dance pieces. More interesting, however, is the observation of additional activations in the limbic system (left and right hippocampal structures) that could be made visible with ISSS but not with sparse temporal sampling. So, unexpectedly, the interleaved silent steady state method emerged as the most sensitive acquisition technique; thus, it might be the method of choice whenever subtle activities in subcortical structures or in deeper layers of the cerebrum have to be detected.
2.4 Positron Emission Tomography: Some Notes on the Signal and on Image Acquisition
It is now time to take a closer look at positron emission tomography (PET), the older of the neuroimaging methods used for visualizing brain processes. Once again, energy consumption (oxygen and glucose) serves as an indicator to precisely localize brain activity. The spatial resolution obtained with this method is between 2 and 8 mm³, depending on the type of PET scanner, i.e. it is slightly less accurate than that obtained with fMRI. For auditory paradigms, neuroscientists often choose PET instead of fMRI. The reason lies in the avoidance of scanner noise thanks to a different technique of data recording: in contrast to fMRI, PET does not produce disturbing scanner noise at all. Thus, while lying in the tube of a PET scanner, participants can easily focus on the target sound, react emotionally and may also appreciate the target’s aesthetic value. However, the major disadvantage of this method is that a radioactive tracer substance, mostly oxygen-15 (¹⁵O) or 2-fluoro-2-deoxyglucose, has to be injected intravenously. Radioactive isotopes emit positrons (particles of positive electric charge) that interact (or collide) with electrons (particles of negative electric charge), resulting in the emission of photons which can be measured with an array of scintillation detectors, i.e. a circle of sensors placed around the head.
Note that the half-life of each radioisotope imposes restrictions on the experimental paradigm in that the time available for task-related measurement is strictly determined by the rate of decay. The half-life of oxygen-15, for instance, is about 2 min, placing severe restrictions on the choice of appropriate stimuli: Single sound events such as pure tones or intervals have proven suitable, whereas harmonic sequences or melodies should not exceed a length of, on average, 10 s.
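How quickly the tracer signal fades can be made concrete with the standard exponential decay law. The sketch below takes the rounded half-life of 2 min given in the text; the function name and the sampled time points are illustrative assumptions.

```python
def remaining_fraction(elapsed_s: float, half_life_s: float = 120.0) -> float:
    """Fraction of the injected radioisotope still present after elapsed_s."""
    return 0.5 ** (elapsed_s / half_life_s)


# Tracer activity left after a 10 s stimulus, one half-life, and two half-lives
for seconds in (10, 120, 240):
    print(f"{seconds:>3} s -> {remaining_fraction(seconds):.1%} remaining")
```

After a 10 s melody roughly 94 % of the activity is still available, whereas after two half-lives only a quarter remains, which is why each injection supports only a short measurement window.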
Jäncke points out that, owing to health risks, only 10 injections of radioisotopes per participant seem acceptable, resulting in 20 min of recording time (10 injections of 2 min each) and also restricting the number of conditions to be tested. To compensate, and also to increase statistical power, group-wise averaging of PET scans is advisable, and for this, PET data first have to be transferred to the template, the standard brain. Thus, with individual PET data it seems almost impossible to achieve convincing statistical results.
2.5 Research with fMRI and PET: Example Fields of Music-Related Application
Scans depict functional activity in specific brain regions, using energy consumption as an indicator. Most results refer to the cerebrum, in particular the cortex, the basal ganglia and the limbic system, but, increasingly, the cerebellum and parts of the brain stem down to the pons (in particular the cochlear nucleus and the superior olive of the auditory brain stem) are also investigated using neuroimaging methods. To give an idea of the range of results obtained with fMRI and PET, I will pick out a few, choosing ‘auditory processing’, ‘music theory’ and ‘creativity’ as example fields of application.
Note, however, that from a neurophilosophical perspective fMRI results do not allow conclusions to be drawn about the functioning of the mind or the type of knowledge representation (be it analogous or propositional; see e.g. [41, 45] for a further discussion). In other words: neuroimaging methods still cannot be used to distinguish between mind and brain, the old, ‘hard’ philosophical problem. Even so, first attempts have been made to reconstruct the mental content belonging to different semantic categories from brain scans showing cortical activation and deactivation.
2.5.1 Studying the Human Auditory Cortex with PET and fMRI
Neuroimaging methods enable researchers to examine the specific functioning of the auditory cortex in detail. This way, many fundamental insights that were previously found by introspection, now can be verified via brain scans which may help disciplines like Psychophysics and Tonpsychologie strengthen their impact.
One example in this respect is a study performed by Zatorre and Belin. Using PET, they were able to identify two functionally different parts in the auditory cortex, a core and a belt region. The core region is specialized in processing temporal features as typical for speech, whereas the belt region is specialized in processing spectral features as typical for tonal patterns. Furthermore, they observed a certain asymmetric shift (or functional lateralization) in that speech-like signals caused stronger activation in the left compared to the right auditory cortex, whereas for signals with rich spectral content, as in music, the reverse was true. Zatorre and Belin suggest that neuroanatomical structures on the micro-level, in particular different types of fiber myelination and cortical column width, may be the reason why rapid changes of signals are processed in the left auditory cortex, whereas spectral richness is more accurately processed in the right counterpart. (Note: It is the idea of starting with a common signal which was split up in two directions that makes the study trustworthy: Two pure tones an octave apart served as the standard signal and were randomly presented in alternating order. To either synthesize speech signals or simulate music, they were then speeded up in three steps (first condition) and enriched with additional spectral components (second condition).)
2.5.2 Tonality-Sensitive Areas—An Approach with fMRI
Another example field of application is music theory, although publications about the general laws in music and their neural base are scarce. Yet, Janata et al. discovered a sort of tonality-sensitive center located in the rostromedial part of the prefrontal cortex (rmPFC). In this fMRI study, a melody was systematically modulated through all 24 major and minor keys, and eight musically experienced participants listened attentively to each transposed version.
2.5.3 Musical Improvisation—An Example of Whole-Brain Image Analysis
In recent years, a tendency towards naturalness and authenticity has been observed, showing that ‘high ecological validity’ is becoming a core criterion in cognitive neuroscience. In this context, some neuromusicological field studies (using EEG) have occasionally been put into practice: Fritz and colleagues, for instance, used portable EEG equipment to record brain activity from native village inhabitants in Cameroon who listened to Western classical music for the first time (unpublished work). Mobile EEG equipment has also been found useful to examine the effects of cannabis on listening to rock music while sitting in the living room, smoking a couple of joints.
Neuroimaging methods, by contrast, cannot fully meet the criterion of context-related authenticity as the scanning procedure should always be performed in a laboratory environment to obtain trustworthy results. Despite these constraints, a new tendency in fMRI research can be observed in that complex, natural musical pieces of several minutes length are used to investigate free natural listening and/or some types of spontaneous creativity.
2.6 Neuroplasticity in Musicians—Structural and Functional Types
Note that two decades ago, K. Anders Ericsson, an American psychologist of Swedish descent, developed a concept named ‘deliberate practice’, saying that high levels of proficiency (or expert performance) need years of intensive training, especially in young adulthood. Interestingly, deliberate practice leaves ‘traces in the brain’ by re-shaping its areas, a process widely known as neuroplasticity.
Neuroscientists distinguish between two forms, a functional and a structural type of experience-driven neuroplasticity. In the first case cortical activation strength, i.e. the susceptibility of brain tissue, is modified (see below), whereas in the latter significant enlargements, caused by an increase of dendritic branching as well as by intensification of synaptic strength, can be observed on the macro-level. A special method called ‘voxel-based morphometry’ enables researchers to precisely assess the extent of experience-induced anatomical changes by analyzing scans of the T1-weighted type.
Münte et al. considered the brains of professional musicians the best-fitting type for investigating these plastic changes, but the brains of sportspersons or chess players could also serve as ideal models. Note that the brain’s structural (and functional) changes seem to correlate with the age at which a person starts learning a musical instrument, in that effects are stronger the earlier piano or violin lessons start (i.e. typically before the age of 10) (see e.g. [7, 37, 38]).
A first result became visible in experienced string players, revealing an asymmetric (structural and functional) enhancement of the primary somatosensory cortex. The effect could clearly be detected for the fingering hand, i.e. for left hand fingers 2–5, but neither for the bow hand (right hand) nor for the left hand of a nonmusician control group.
Pantev et al. [37, 38] confirmed purely functional neuroplasticity in musical contexts: While participants listened inattentively to piano versus pure tones (or to tones of familiar versus unfamiliar instrumental timbre), functional activity (or cortical dipole strength) was significantly enhanced in the auditory cortex of musicians. No similar effect could be observed for the nonmusician control group. Note that in these studies Pantev et al. [37, 38] opted for MEG (magnetoencephalography), another non-invasive method of data recording that enables researchers to measure the brain’s weak magnetic fields by using highly sensitive SQUID detectors. The distinguishing features of MEG are its high temporal and spatial resolution; however, special software for source localization is required.
3 Electroencephalography: The Basics
Let me move on to electroencephalography (EEG), the oldest and most established of the neuroscience methods. Because electrodes are placed on the head’s surface, EEG research remains almost exclusively non-invasive. Note, however, that special problems, for instance localizing the source of musicogenic epileptic seizures during pre-surgical preparations, occasionally make it necessary either to place electrodes directly onto the ‘naked cortex’, i.e. onto the gyri after opening the skull, or to implant them intracranially into the brain’s tissue. The first variation is termed electrocorticography (ECoG), the latter is called depth EEG recording.
The original, conventional EEG method was developed by Hans Berger, a German psychiatrist, at the beginning of the 1920s; he also coined the term ‘electroencephalography’ (Greek: enképhalon = brain; grapheîn = to write).
The EEG is a by-product of brain cells’ information transfer in which intra- and extracellular current flows are modulated by specific membrane mechanisms. When these current flows synchronize, potential differences summate and become strong enough to be recorded with EEG. The post-synaptic activity of pyramidal dendrites (rather than action potentials) in the cortex in particular possesses these characteristics and is therefore regarded as the main source of the EEG … Thus, in EEG, the coherent activity of numerous cortical neurons (approximately 10,000) is recorded.
Note that a fifth type, the gamma band with frequencies between 30 and 80 Hz, is omitted here: Gamma activity indicates ‘feature binding’, a specific process necessary to experience coherent percepts. It has mainly been found in the visual domain for binding spatially separate, ‘static’ properties together, like ‘color’, ‘shape’ and ‘(surface) texture’. Regarding the auditory domain, similar processes of temporal feature binding have been found less frequently. (Even so, stronger gamma-band synchronization has been observed in musicians than in non-musicians when listening to musical excerpts, suggesting that musicians are more experienced in anticipating melodies of a certain style as well as in combining musical parameters such as contour, rhythm and timbre into a melodic entity.)
3.1 Research with EEG: Two Example Studies
It is common knowledge among ethnologists that states of Shamanic trance can be reached by taking drugs and/or by repetitive, monotonous drumming (e.g. [11, 44]). The brain reacts to this mind-expanding experience with a change in the spectral content of the standard EEG, in particular by an increase of theta and delta activity.
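A change in spectral content of this kind is usually quantified as band power. The sketch below computes relative power per frequency band from a plain discrete Fourier transform of a synthetic signal. The band limits are conventional values assumed here for illustration (the chapter's own band definitions may differ), and the test signal is artificial, not EEG data.

```python
import cmath
import math

# Conventional band limits in Hz (assumed; lower bound inclusive, upper exclusive)
BANDS_HZ = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def power_spectrum(signal, fs):
    """Return (frequency, power) pairs from a naive DFT of the mean-free signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centered)
        )
        spectrum.append((k * fs / n, abs(coeff) ** 2 / n))
    return spectrum


def relative_band_power(signal, fs):
    """Share of each band in the total power across all listed bands."""
    spectrum = power_spectrum(signal, fs)
    totals = {
        name: sum(p for f, p in spectrum if lo <= f < hi)
        for name, (lo, hi) in BANDS_HZ.items()
    }
    grand_total = sum(totals.values())
    return {name: value / grand_total for name, value in totals.items()}


# Synthetic 2 s signal: strong 6 Hz (theta) plus weaker 10 Hz (alpha) component
FS = 128
demo = [
    2.0 * math.sin(2 * math.pi * 6 * i / FS) + math.sin(2 * math.pi * 10 * i / FS)
    for i in range(2 * FS)
]
for band, share in relative_band_power(demo, FS).items():
    print(f"{band:>5}: {share:.2f}")
```

Comparing such relative band powers between a baseline recording and a trance recording would show the described shift towards theta and delta.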
Let me briefly describe a second example of music-related EEG research: During the 1980s, the Austrian neuroscientist Hellmuth Petsche came up with the idea of ‘EEG coherence analysis’, a methodological approach performed for each frequency band (δ, θ, α, β) separately to extend the conventional type of analyzing EEG raw signals via FFT. EEG coherence analysis has proven highly effective to investigate the interplay between cortical network structures during creative thinking and other mental processes of higher order. It describes the degree of similarity (or functional coupling) between EEG signals at adjacent electrodes of the same hemisphere (the ‘intrahemispheric type’) or at homologous electrode sites on the opposite halves of the brain (the ‘interhemispheric type’).
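The degree of similarity between two EEG channels is commonly expressed as magnitude-squared coherence, i.e. the squared cross-spectrum divided by the product of the two power spectra, averaged over segments. The sketch below implements this textbook estimator for one frequency band with a naive DFT and no windowing; it is not a reconstruction of Petsche's procedure, and the simulated channels and parameter values are assumptions for illustration.

```python
import cmath
import math
import random


def dft_bin(segment, k):
    """Single DFT coefficient of a segment at bin index k."""
    n = len(segment)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(segment))


def band_coherence(x, y, fs, band, segment_len):
    """Mean magnitude-squared coherence of x and y within a band (in Hz)."""
    lo, hi = band
    n_segments = min(len(x), len(y)) // segment_len
    bins = [k for k in range(1, segment_len // 2) if lo <= k * fs / segment_len < hi]
    if n_segments < 2 or not bins:
        raise ValueError("need at least two segments and one frequency bin")
    values = []
    for k in bins:
        sxx = syy = 0.0
        sxy = 0j
        for s in range(n_segments):
            fx = dft_bin(x[s * segment_len:(s + 1) * segment_len], k)
            fy = dft_bin(y[s * segment_len:(s + 1) * segment_len], k)
            sxx += abs(fx) ** 2
            syy += abs(fy) ** 2
            sxy += fx * fy.conjugate()
        values.append(abs(sxy) ** 2 / (sxx * syy))
    return sum(values) / len(values)


# Simulated channels: two share a common source, a third is independent noise
rng = random.Random(1)
FS, SEG = 128, 128
common = [rng.gauss(0, 1) for _ in range(16 * SEG)]
chan_a = [c + 0.3 * rng.gauss(0, 1) for c in common]
chan_b = [c + 0.3 * rng.gauss(0, 1) for c in common]
chan_c = [rng.gauss(0, 1) for _ in range(16 * SEG)]

ALPHA = (8.0, 13.0)
print("coupled pair:    ", round(band_coherence(chan_a, chan_b, FS, ALPHA, SEG), 2))
print("independent pair:", round(band_coherence(chan_a, chan_c, FS, ALPHA, SEG), 2))
```

Whether a given value counts as intra- or interhemispheric coherence depends only on which two electrode sites supply the two input signals.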
3.2 EEG Sports: A Promising Trend Using Mobile Devices
Investigating the brain activity of humans in action, while playing golf, riding a bicycle or performing in a chamber music ensemble, has been an unsolved problem in EEG research for many years. The most challenging aspect is not the real-life situation per se but rather body movement as such: Any subject in motion produces many artifacts of extracerebral origin, arising from skin changes and muscle tension. In addition, sweating and loose electrodes may also contaminate the measurement and make EEG data unusable for further analysis. Furthermore, the recording equipment is unwieldy and heavy, including amplifiers and batteries, making it impossible to carry the device in a rucksack on the back. On the other hand, portable solutions would offer a wealth of opportunities in the field of human movement and sports science, while ecological validity would be high.
Until now, most studies in this context use EEG for neurofeedback-training in the lab, i.e. for investigating brain-based self-regulatory techniques that may help to modify the mental attitudes during several phases of practicing and performance. This way, certain EEG frequency bands, in particular alpha, theta and delta, can reliably be strengthened via monitor and other feedback devices, obviously increasing self-awareness, feelings of well-being and the supply of mental and physical energy necessary to succeed in any training session or sports competition outside [39, 56].
Recently, a new product series, termed eegosports™, has been developed by ANT Neuro, Enschede, a neuroscience company specialized in developing EEG hard- and software. Since 2013, the company has offered a portable, light-weight 64-channel EEG solution of less than 1000 g that enables researchers to freely investigate different types of movement as well as effects of training and physical exercise in a natural environment. Presumably, this mobile solution will be used in the context of ‘music and motion’ in the near future.
4 Event-Related Potentials (ERPs)—A Derivative of the EEG
Finally, let me describe the second type of bioelectric methods, known as the measurement of ‘event-related potentials’ (ERPs). ERP works on the precondition that, during recording, the same type of stimulus is repeated at least 20 times, which is not necessarily required for measuring the EEG.
Both methods, EEG and ERP, differ completely in their basic idea: EEG, on the one hand, allows individual recordings of several minutes to be made while disregarding transient brain activity, i.e. the components lasting some ms within ultra-short time frames. In this way, the EEG provides information about the brain’s overall physiological state, i.e. the levels of consciousness and arousal while listening to music of different style and tempo.
ERP, on the other hand, is completely devoted to the basic idea of drawing an analogy between the computer and the human mind, meaning that both systems, the electronic and the human, should be considered similar in their strategies to select, transform, store or activate the respective information. The ERP, therefore, directly points to a mainly serial form of processing input and comprises several independent processing steps in sequence. (Note that according to this shift in thinking the word ‘cognition’, derived from the Latin word ‘cognoscere’, has lost its former philosophical connotations like ‘becoming aware of’, ‘apperceive’ or ‘comprehend intuitively’ and is now used in a simple, pragmatic way.)
Rösler  points out that signal averaging and its product, the grand average ERP, have several weak points: First, brain responses from all trial repetitions are summed automatically, which is considered inappropriate from a psychological point of view, since it cannot be ruled out that participants’ attentiveness changed during recording. Second, brain responses are prone to habituation, i.e. amplitudes decrease the more familiar, or predictable, the often-repeated stimuli become. Third, grand average ERPs are produced at the expense of individual brain responses, meaning that conclusions regarding individual processing strategies cannot be drawn from the final product.
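The principle of signal averaging, and the reason why many repetitions are needed, can be illustrated with a minimal sketch in Python. It rests on assumptions that are not taken from any particular study: a hypothetical evoked response (a negative deflection of 5 µV at 100 ms), independent Gaussian noise of 20 µV standing in for the spontaneous EEG, and a sampling rate of 500 Hz. Under these assumptions the residual noise in the average shrinks roughly with the square root of the number of trials.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the sketch is reproducible

fs = 500                            # sampling rate in Hz (assumed)
t = np.arange(-0.1, 0.6, 1 / fs)    # epoch from -100 ms to 600 ms around stimulus onset

# Hypothetical "true" evoked response: negative deflection of -5 µV peaking at 100 ms
erp_true = -5.0 * np.exp(-((t - 0.1) ** 2) / (2 * 0.02 ** 2))


def simulate_trials(n_trials, noise_sd=20.0):
    """Return n_trials epochs: the same evoked response plus independent noise (µV)."""
    noise = rng.normal(0.0, noise_sd, size=(n_trials, t.size))
    return erp_true + noise


def residual_noise(n_trials):
    """RMS difference (µV) between the trial average and the true response."""
    average = simulate_trials(n_trials).mean(axis=0)
    return np.sqrt(np.mean((average - erp_true) ** 2))


for n in (1, 20, 100, 1000):
    print(f"{n:5d} trials: residual noise about {residual_noise(n):.2f} µV")
```

Note that the sketch keeps the evoked response identical in every trial. This is exactly the idealization that the objections above call into question: changes in attentiveness or habituation would alter the single-trial responses, and the average would conceal them.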
Note that it is not the ERP curve as a whole that serves as a unit for interpretation. Instead, each half wave, or ERP component, will be analyzed separately on the assumption that it responds independently, i.e. without any cohesive forces operating between adjacent components.
Regarding nomenclature, two details are needed to describe each ERP component properly: its ‘polarity’ and its ‘latency’. The term ‘polarity’ describes the component’s deflection, i.e. the change in voltage direction either into the positive (‘P’) or the negative (‘N’). ‘Latency’, by contrast, refers to the timespan between stimulus onset and the peak amplitude and can be given either as a rounded value (in ms; e.g. P200) or as an ordinal number (e.g. P2). These sparse but essential details may be supplemented by further information about the component’s ‘maximum amplitude value’ [µV], its ‘brain topography’ and its ‘waveshape’.
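The naming scheme can be made concrete with a short Python sketch. The function `label_component`, the search windows and the rounding to the nearest 50 ms are illustrative assumptions, not an established standard; the synthetic curve merely stands in for an averaged ERP.

```python
import numpy as np


def label_component(times_ms, voltages_uv, window_ms):
    """Label the largest deflection inside window_ms by polarity and latency.

    times_ms    : sample times relative to stimulus onset (ms)
    voltages_uv : averaged ERP amplitudes at those times (µV)
    window_ms   : (start, end) of the search window in ms
    """
    times_ms = np.asarray(times_ms, dtype=float)
    voltages_uv = np.asarray(voltages_uv, dtype=float)
    idx = np.flatnonzero((times_ms >= window_ms[0]) & (times_ms <= window_ms[1]))
    peak = idx[np.argmax(np.abs(voltages_uv[idx]))]    # largest absolute amplitude
    polarity = "P" if voltages_uv[peak] > 0 else "N"   # direction of the deflection
    latency = float(times_ms[peak])                    # time from onset to peak
    rounded = int(round(latency / 50.0)) * 50          # assumed rounding to 50 ms
    return {
        "label": f"{polarity}{rounded}",
        "latency_ms": latency,
        "amplitude_uv": float(voltages_uv[peak]),
    }


# Synthetic curve: a negative peak at 104 ms and a positive peak at 196 ms
times = np.arange(0, 500, 2.0)
wave = (-4.0 * np.exp(-((times - 104) ** 2) / (2 * 15 ** 2))
        + 3.0 * np.exp(-((times - 196) ** 2) / (2 * 25 ** 2)))

n1 = label_component(times, wave, (50, 150))
p2 = label_component(times, wave, (150, 250))
print(n1["label"], n1["latency_ms"], round(n1["amplitude_uv"], 2))
print(p2["label"], p2["latency_ms"], round(p2["amplitude_uv"], 2))
```

With these synthetic peaks the two windows are labeled N100 and P200, although the exact peak latencies are 104 ms and 196 ms. The label is thus a rounded name, and the measured latency has to be reported separately.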
The curve example in Fig. 14 shows five components, termed P50, N100, P200, N400 and LPC (late positive component). The components occurring up to 300 ms indicate exogenous processes that, in principle, are determined by stimulus parameters such as frequency, intensity or presentation rate. N400 and LPC, by contrast, indicate endogenous processes, reflecting task-related cognitive processing steps for which attention is required. However, since recent results have shown that exogenous components can be modulated by top-down processes too (e.g. ), contemporary ERP research focuses directly on the characteristics of the particular component itself, i.e. it omits this additional exogenous-endogenous classification.
To illustrate which aspects of cognitive processing a component may indicate, I will first pick out the so-called Mismatch Negativity (MMN).
4.1 The ‘Mismatch Negativity’ (MMN)—An Example Component of the ERP
From a functional point of view, the MMN indicates trace-building processes within sensory (echoic) memory, which gives rise to the assumption that a pattern-based process is the underlying driving force . Its second attribute is its independence from attention, which enables researchers to investigate both attentive (controllable) and preattentive (automatic) processes of sound detection. For the latter, attention is diverted by instructing subjects to watch a silent video or read a book, which prevents them from taking particular notice of the sounds themselves.
Interestingly, these preattentive mechanisms of sound detection are modifiable by longstanding experience: amplitudes are higher the more musical training participants have, in other words, the more accurately sound templates are stored in long-term memory. Violinists, who are well-experienced in shading intervals and chords according to good intonation, automatically detect a slightly impure major chord (with frequencies of 396-491.25-596 Hz instead of 396-495-596 Hz), and this discrepancy between the actual input and the stored template is indicated by a clear MMN. Musically inexperienced participants do not show a similar result .
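How slight this impurity is can be estimated by converting the two frequencies of the middle tone into an interval in cents. The following sketch uses the standard definition of the cent (1200 cents per octave) and only the two frequency values given above; the function name `cents` is chosen for illustration.

```python
import math


def cents(f_actual, f_reference):
    """Interval between two frequencies in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(f_actual / f_reference)


pure_third = 495.0      # Hz, middle tone of the standard chord (from the text)
impure_third = 491.25   # Hz, middle tone of the deviant chord (from the text)

deviation = cents(impure_third, pure_third)
print(f"Mistuning of the middle tone: {deviation:.1f} cents")
```

The deviant middle tone lies about 13 cents below the standard one, i.e. roughly an eighth of an equal-tempered semitone.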
4.2 Syntactic and Semantic Incongruities in Language and Music: ELAN/ERAN, P600 and N400
As already seen in the previous paragraph, ERP works best when tone rows are investigated, that is, when structure unfolds along the time axis. This way, sequence structures of any type will match the method’s distinguishing feature of registering transient brain activity in high resolution on a ms-time scale.
By using a different paradigm, three specific ERP components have been found in the language domain, named ELAN (early left anterior negativity), N400 and P600, that are connected with a rule-based type of sequence structure: the ELAN and P600 indicate error detection in terms of syntax structure, whereas the N400 indicates deviation regarding semantic content.
Interesting parallels can be drawn between syntax processing in language and music: Melodies ending with a non-diatonic, incongruous final tone  evoke early and late components (named ERAN [early right anterior negativity] and P600) of similar shape and latency to those previously found for processing spoken sentences, allowing the conclusion that the underlying processes are domain-general (Fig. 16).
(Note that similar comparative results were achieved when processing prosodic information: A specific ERP component termed Closure Positive Shift (CPS) was found for processing intonational phrase boundaries in spoken language as well as for processing phrase boundaries in music (while listening to binary-structured melodies); cf. [18, 35, 51]).
Besides that, a prominent component termed N400 reacts to challenges of the semantic type: The N400 is visible whenever absurd or meaningless words in otherwise grammatically correct sentences are identified (“He carries his daughter in his nostrils”). The N400 is therefore interpreted as indicating violations of semantic expectancy .
To my knowledge, no study exists at present where the N400 is evoked by a semantic mismatch between a musical context on the one hand and a musical target (e.g. a chord) on the other, supporting the widespread view that chords, intervals and musical excerpts are less clear in meaning than words.
5 Do Advantages Outweigh the Disadvantages?—A Final Assessment of the Methods’ Pros and Cons
Let me sum up the latest developments in neuromusicology. In my view, the following three tendencies become apparent: First, there is the endeavor to precisely relate cause and effect, i.e. to prefer causal relationships to correlative ones. This means that transcranial magnetic stimulation (TMS) is increasingly applied to music-related questions [26, 52], allowing researchers to assign an either slowed down or accelerated overt musical behavior to differently stimulated brain tissue.
The second trend is towards investigating brain activity in real-life situations, i.e. towards increasingly fulfilling the criterion of ecological validity. Recently developed mobile EEG solutions (eegosports™) fit well with this concept: They enable researchers to investigate the ‘brain in action’ while sledging, jogging, preparing a solo recital or playing in a jazz combo, in short, during every type of sport and performance activity. This trend also includes the endeavor to record brain activity in natural environments, for instance, while performing cross-cultural field studies in non-Western countries.
Third, several labs advocate a holistic approach in that whole-brain activity is explored with fMRI while listening to complex musical pieces or while spontaneously improvising on the piano [1, 28]. Regarding this holistic approach, EEG coherence analysis, developed by Petsche as early as 1996, might be considered as a forerunner, since functional coupling of cortical network structures (while composing or listening to short pieces of music) can also be investigated by using this type of analysis.
Starting the assessment of methods with EEG—What are the main advantages and disadvantages of this oldest and most established type of neuroscience methods?
EEG allows the recording of unspecific brain activity over a time span of several minutes while disregarding the transient components. Accordingly, the EEG is an appropriate method whenever the experience of music is investigated, be it repetitive drumming or a Mozart symphony. EEG is thus the method of choice whenever the focus of research is on the level of consciousness, on attention or arousal. Furthermore, Fourier analysis enables researchers to precisely observe changes in the spectral content of EEG signals over the entire time of recording. A disadvantage might be that interpretation is limited to statements about the brain’s physiological state in general. However, a further plus point is coherence analysis, an option showing the functional coupling of brain activity at near and distant electrode placements, which yields information about the interplay between cortical network structures. So, whenever coherence analysis is included as an additional tool, the method’s power increases considerably.
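The Fourier-based description of spectral content can be sketched in a few lines of Python. The example is a simplified illustration under stated assumptions: the signal is synthetic (a 10 Hz oscillation plus noise), the sampling rate of 250 Hz is arbitrary, and the band limits follow one common convention rather than a fixed standard. The helper `band_power` is hypothetical and not part of any EEG software package.

```python
import numpy as np


def band_power(signal, fs, band):
    """Mean spectral power of `signal` within band = (low_hz, high_hz)."""
    signal = np.asarray(signal, dtype=float)
    # Remove the mean and taper with a Hann window to limit spectral leakage
    windowed = (signal - signal.mean()) * np.hanning(signal.size)
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs < band[1])
    return spectrum[in_band].mean()


fs = 250                                  # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)              # ten seconds of recording
rng = np.random.default_rng(1)

# Synthetic trace: a 10 Hz oscillation (10 µV) plus broadband noise (5 µV)
eeg = 10.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(0.0, 5.0, t.size)

# Band limits in Hz, following one common convention (assumed)
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
powers = {name: band_power(eeg, fs, limits) for name, limits in bands.items()}

for name, power in powers.items():
    print(f"{name:5s}: {power:12.1f}")
print("dominant band:", max(powers, key=powers.get))
```

Applying the same function to successive windows of a longer recording would show how the spectral content changes over time. Coherence analysis goes one step further by relating the spectra of two electrode sites to each other.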
A second advantage is the possibility to analyze EEG raw traces for each subject separately. There is no alternative to this whenever mental states during creative processes are investigated, which makes the EEG method indispensable for creativity research (Schaffenspsychologie). However, there is also the option of a group-wise EEG analysis, comparing the frequency bands of, e.g., musicians vs. non-musicians in relation to a specific task or a particular piece of music.
What are the pros and cons of measuring ‘event-related potentials’ (ERPs), the second type of bioelectric methods?
In principle, ERP components indicate information processing in a step-by-step manner. This way, ERP supports the basic idea of cognitive psychology which says that the human mind and the computer work on analogous principles . In addition, excellent time resolution allows the discovery of new components beyond the established ones, for instance the face-sensitive N170, indicating the earliest stage of face recognition (e.g. ).
Regarding contents, ERP is the appropriate method to investigate three specific aspects: (a) to examine brain responses to frequency, intensity and other sound parameters in the context of psychoacoustics (eliciting a P50, an N100 and a P200, respectively), (b) to investigate brain responses to (simple) auditory sequence structures (eliciting an MMN) and (c) to examine the processing of rule-based syntax structures (evoking components such as ELAN/ERAN and the P600) so that comparisons between language and music can be drawn.
However, many methodological constraints are imposed on each ERP design, with restrictive effects on the interpretation of results: First, to increase the signal-to-noise ratio between the event-related potential and spontaneous, unrelated EEG activity, the same type of stimulus has to be repeated between 20 and 1000 times (depending on the respective paradigm). Second, grand average ERPs refer to the entire number of trials and subjects, making it impossible to measure inter-individual differences or to reconstruct individual responses in retrospect. Third, for each subject, trials are subsumed under a single common curve, although brain responses are prone to habituation and minor or even major shifts in attention may occur during recording; disregarding these changes is considered inappropriate from a psychological point of view.
Fourth, empirically working musicologists should know that ERP does not allow any conclusions about the processing of musical pieces of a particular epoch, a specific genre or the personal (idiosyncratic) style of a composer (e.g. Händel vs. Bach). Fifth, the ERP responds to structure violation in an overall sense, for instance to any deviant chord within the standard scheme, be it a Neapolitan sixth (N6 or sn) or a double dominant (DD) in a musical cadence. These types of deviant chords evoke the same ERP component (ERAN); no further specification in terms of harmonic progression is possible.
Finally, let me weigh up the main advantages and disadvantages of fMRI and PET, the most popular types of neuroimaging methods:
Due to an excellent resolution in the spatial domain (approximately 1–4 mm³ for fMRI and 2–8 mm³ for PET depending on the scanner type) both neuroimaging methods provide the possibility to localize even the smallest functionally activated brain areas, based on a voxel-wise analysis. This way, the complex interplay between cortical and subcortical network structures, including the basal ganglia, the cerebellum and parts of the brain stem, can be made visible.
However, the precise ‘Where’ in the brain is at the expense of the ‘When’: The BOLD signal reaches its peak plateau between 4 and 8 s after task-onset, thus, in comparison to EEG and ERP, time resolution is poor.
Among the various options of stimulating brain tissue with pulse trains and sharp HF-impulses to obtain high-quality imaging data, echoplanar imaging is the fastest, enabling researchers to record the whole brain in less than 2 s. However, a disadvantage of this stimulation type is the technical noise in the scanner with volume intensities between 60 and 100 dB due to a fast switching of gradient coils during space-coding, a necessary step for image acquisition. Note that the interleaved silent steady state method is the most sensitive of all echo-planar imaging techniques, suitable for detecting subtle activities in subcortical structures as well as in deeper layers of the cerebrum .
However, for music-related questions neuroscientists often choose PET, the older of the two imaging techniques. In contrast to fMRI, PET does not produce any disturbing scanner noise. This enables participants to deepen their emotional experience and perceive a musical piece in an aesthetic sense while lying in the scanner. The major disadvantage of the PET method, however, is that a radioactive tracer substance has to be injected intravenously. This imposes several restrictions on the recording procedure and the choice of stimuli, both of which are strictly determined by the rate of decay.
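How the rate of decay limits the recording window can be illustrated with the exponential decay law. The sketch below assumes oxygen-15 as the tracer isotope, with a half-life of roughly 122 s; the text above does not name a tracer, so this choice and the function `fraction_remaining` are purely illustrative.

```python
def fraction_remaining(elapsed_s, half_life_s):
    """Fraction of the initial tracer activity left after elapsed_s seconds."""
    return 0.5 ** (elapsed_s / half_life_s)


half_life_o15 = 122.0   # seconds, approximate half-life of oxygen-15 (assumed tracer)

for elapsed in (30, 60, 120, 300, 600):
    share = fraction_remaining(elapsed, half_life_o15)
    print(f"after {elapsed:3d} s: {share:.1%} of the initial activity")
```

Under this assumption, only a small share of the initial activity is left after ten minutes, which suggests why musical stimuli and recording periods in such a design have to be kept short.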
To sum up: Do advantages outweigh the disadvantages? The question should be answered in the affirmative: Neuroscience methods offer elegant solutions to measure cognitive processes in real-time, yielding results of either high temporal or high spatial resolution. This fits nicely with a proposal by Leman : To solve research problems more successfully he recommends a “joint correlative approach between different research methodologies; in particular musicology, computer modeling, experimental psychology and […] neuromusicology” (p. 194f), in short, a “convergence paradigm” . This is in accordance with a truly systematic approach as advocated by Schneider : “The ultimate goal of systematization is to establish coherent systems of knowledge that should be free from any contradictions, and should be as complete in descriptions and explanations of the principles and phenomena of a certain field as is possible.” (p. 20).
However, to dampen euphoria and overoptimism regarding the available neuroscience methods and their capacities, take notice of the following:
“The goal of neural science is to understand the mind—how we perceive, move, think, and remember.” Despite all efforts, this statement by Eric Kandel (cited in , p. 1) still cannot be put into practice. (Some experiments on mental rotation are an exception, e.g. [15, 48]). By now, many impressive methods inform about the physiological state and the functional activity of the brain. But how the mind works is a different matter. Scientists still do not know for sure how thoughts are generated and what mental knowledge representations precisely look like. Nevertheless, attempts have recently been made to reconstruct the mental content belonging to different semantic categories from fMRI scans showing cortical activation and deactivation (e.g. ). However, the main disadvantage of this approach is ambiguity in that, until now, no clear assignment between both types of substance, the material and the immaterial world, can be made. Even so, innovations in the field of neuroscience are growing rapidly, so there are grounds to believe that the dualism between mind and brain, the so-called ‘hard problem’, may be solved in the near future.
- 2.Andoh, J., Zatorre, R.J.: Interhemispheric connectivity influences the degree of modulation of TMS-induced effects during auditory processing. Front. Psychol. 2, Article 161, 13 pages (2011). doi: 10.3389/fpsyg.2011.00161
- 3.Besson, M., Faïta, F.: An event-related potential (ERP) study of musical expectancy: comparison of musicians with nonmusicians. J. Exp. Psychol.: Hum. Percept. Perf. 21(6), 1278–1296 (1995)
- 4.Bhattacharya, J., Petsche, H., Pereda, E.: Long-range synchrony in the γ-band: role in music perception. J. Neurosci. 21(16), 6329–6337 (2001)
- 6.Drobisch, M.W.: Über musikalische Tonbestimmung und Temperatur [On musical pitch estimation and temperature]. In: Abhandlungen der Königlich-Sächsischen Gesellschaft der Wissenschaften 2, 1–120. Hirzel, Leipzig (1855)
- 8.Ericsson, K.A.: The influence of experience and deliberate practice on the development of superior expert performance. In: Ericsson, K.A., et al. (eds.) The Cambridge Handbook of Expertise and Expert Performance (Chapter 38), pp. 685–706. Cambridge University Press, New York (2006)
- 11.Gingras, B., Pohler, G., Fitch, W.T.: Exploring Shamanic journeying: Repetitive drumming with Shamanic instructions induces specific subjective experiences but no larger Cortisol decrease than instrumental meditation music. PLoS One 9(7), 9 pages (2014)
- 13.Jäncke, L.: Methoden der Bildgebung in der Psychologie und den kognitiven Neurowissenschaften. W. Kohlhammer, Stuttgart (2005)
- 14.Jäncke, L.: Lehrbuch Kognitive Neurowissenschaften. Huber, Bern (2013)
- 28.Limb, C.J., Braun, A.R.: Neural substrates of spontaneous musical performance: an fMRI study of Jazz improvisation. PLoS One 3(2), e1679 (11 pages) (2008)
- 30.Maguire, E.A., Gadian, D.G., Johnsrude, I.S., Good, C.D., Ashburner, J., Frackowiak, R.S.J., Frith, C.D.: Navigation-related structural change in the hippocampi of taxi drivers. PNAS 97(8), 4398–4403 (2000)
- 32.Münte, T.F., Altenmüller, E., Jäncke, L.: The musician’s brain as a model of neuroplasticity. Nat. Rev. Neurosci. 3, 473–478 (2002)
- 34.Neisser, U.: Cognitive Psychology. Meredith, New York (1967)
- 42.Révész, G.: Tonpsychologie. Voss, Leipzig (1913)
- 43.Rösler, F.: Statistische Verarbeitung von Biosignalen: Die Quantifizierung hirnelektrischer Signale. In: Baumann, U., et al. (eds.) Klinische Psychologie: Trends in Forschung und Praxis 3, pp. 112–156. Huber, Bern (1980)
- 44.Rouget, G.: Music and trance. A theory of the relations between music and possession. Chicago University Press, Chicago (1985)
- 45.Rumelhart, D.E., Norman, D.A.: Representation in memory. Stevens Handbook of Experimental Psychology 2, 2nd edn, pp. 511–587. Wiley, New York (1988)
- 47.Schneider, A.: Foundations of systematic musicology: a study in history and theory. In: Schneider, A. (ed.) Systematic and Comparative Musicology: Concepts, Methods, Findings, pp. 11–61. Peter Lang, Frankfurt am Main (2008)
- 49.Siedentopf, C.M.: (Internet source) University of Innsbruck, Austria (2013). www.fMRI-easy.de
- 53.Talairach, J., Tournoux, P.: Co-Planar Stereotaxic Atlas of the Human Brain. 3-Dimensional Proportional System: An Approach to Cerebral Imaging. Thieme Medical Publishers, New York (1988)