Music Moves



Music Moves. The question that sparked this project was a simple one: why does music make us move?

Why Music Moves?

There has been a lot of research in the field of music-related body movement over the last decades, and the University of Oslo has been at the forefront of the field. The educators in Music Moves have been running a research-based course called “Music and Motion” at the University of Oslo for several years, and were eager to share their ideas more broadly. When the University of Oslo looked for courses to turn into online courses, Alexander, Hans and Kristian were an obvious choice, and so Music Moves was born!


Well, we are going to talk about specific concepts. The entrainment process, for example: what makes me nod my head. Terminology, such as the difference between motion, action and gesture. And we are going to talk about the history: why we have arrived at the situation we have today, for example. Technologies for studying music-related body motion. So as you see, you’re going to learn a lot about different types of theoretical approaches to this, methods to use, and, of course, also some of the research that we are producing when we study music-related body motion at the University of Oslo. Welcome to Music Moves.

About the authors

The educators, Alexander, Hans and Kristian, are all associate professors at the Department of Musicology at the University of Oslo, Norway. For this run of Music Moves they are also joined by the recent PhD fellow Mari. They all share a passion for music and movement, even though they come to the topic from different perspectives.

Alexander Refsum Jensenius started his university studies as a classical piano-playing mathematics student, but ended up with a music degree focusing on electronic music. He values creativity and artistic output as highly as scientific results, and therefore likes to call himself both a “music researcher” and “research musician”. He has been studying and researching in Oslo, Berkeley, Gothenburg and Montreal, often focusing on weird and experimental musics. After focusing on large-scale body motion for several years, he has now turned his interest to “micromotion”. These are tiny actions happening in the body, often at a millimetre scale. Alexander wants to understand how music influences such micromotion, but also how he can use micromotion in music performance. He has two daughters that move to music.

Hans T. Zeiner-Henriksen has been teaching various courses in music production, music history (pop/rock), popular music studies and music psychology/cognition since 1994. He completed his PhD in 2010 with the thesis “The PoumTchak Pattern: Correspondences between Rhythm, Sound and Movement in Electronic Dance Music”. His research is concerned with the connections between music and movement, particularly in the club music of the 1990s (Daft Punk, Basement Jaxx). He was also part of the research project “Music, Motion and Emotion: Theoretical and Psychological Implications of Musical Embodiment,” in which he mainly studied how corporeality takes part in intense emotional experiences in music. He has three daughters that move to music.

Kristian Nymoen studied both musicology (Master’s) and informatics (PhD) in Oslo. His interdisciplinary background is clearly reflected in his research, which covers the development and use of technologies for studying music-related body motion. He develops analysis methods for music cognition, new interfaces for musical expression, machine learning techniques, and more. He is also an active guitarist and live electronics musician. He has one daughter that moves to music.

In addition to Alexander, Hans, and Kristian, you will also visit other experts from the University of Oslo in their offices. Hallgjerd Aksnes is an expert on emotion and metaphor theory in music. Rolf Inge Godøy is an expert on embodied music cognition. Anne Danielsen is an expert on groove and rhythm, particularly in popular music.

Course Outline

Music Moves will give you a broad introduction to the field of music and movement, and the chance to learn from experts in the field.

You will learn about the main theories explaining why movement is so important in music. This includes an overview of the historical development of music-related body movement. It also includes discussions of how culture influences how and why we move as we do to music. This is in turn linked to some of the most important psychological phenomena that govern our performance and perception of music.

In addition to the theoretical overviews, you will also learn about the different research methods and technologies that researchers use to study music-related movements. This includes everything from qualitative studies of metaphors, to machine-learning-based analysis of motion capture data.

Course Structure

The course stretches over six weeks, with a gradual increase in topics and complexity. Each week will be organized in three different “tracks”: Theory, Terminology and Methods. This is to ensure that you learn progressively throughout the weeks. You will recognize the three tracks through three unique colours in the platform.



During the coming six weeks you will encounter a lot of new terms that might be difficult to grasp. We have therefore put together a dictionary for you to use along with the course material (see Appendix). There may be other terms that you do not understand, so please do not be afraid to ask in the comment field.


A lot of people have been involved in the making of Music Moves. A list of everyone involved is available at the end of the book. A big thanks to everyone!

Week 1


A Historical Perspective

There are numerous musical genres, and each of them has its own particular “style”: the sound, the looks of the performers, the behaviour of the audiences, and so on.

What led to the development of such strong cultural conventions? How do these conventions influence the way we experience music?

In this course we will study music as a phenomenon, rather than as a specific musical artefact. But that also requires an understanding of the different cultures we are situated in and coming from. The educators have been active musicians within various music genres, but they all have their academic training from institutions where the tradition of Western art music (“classical music”) has been dominant. This musical culture is often seen as the ideal in the musical “hierarchy”, and the most serious and prestigious way of experiencing music. The typical way of experiencing Western art music is to be:

  • seated
  • in silence
  • and not moving

For many music lovers this is probably the only possible way to have a rich and rewarding experience with music, but for others it may limit how bodily involvement can enhance the experience. This may be problematic, since this way of experiencing music is often used as a model for how music should be experienced in other settings as well. In Music Moves our aim is to explain why we believe it is high time that the body is taken seriously in music education and research. Furthermore, we believe that if we truly want to understand the power of music, the body needs to be included in the discussion.

We will start by looking at how the focus on, and interest in, music and body movement have changed over the years.

The history of listening experience - Part 1

Contemporary classical music listening is mostly sedentary and silent, with no spontaneous outbursts or cheering from the audience. Has it always been like this? In this video Hans T. Zeiner-Henriksen presents a historical overview of the development of the classical music concert tradition.

The Listening Experience up until the 20th Century

In this article we will explore some of the topics from the previous video in more detail.

The connection between music and body movement seems immediately obvious in light of how musical sound is made. Body movements produce sounds on instruments, and very few musicians are able to play properly without a repertoire of other types of movements: the jazz pianist might keep time through foot tapping, while the classical clarinettist might embody the melodic phrases through phrasing “dips” in the head and shoulders.

Similar types of connections between sound and movement are also apparent in the perception of music. In dance music, for example, the body responds to specific features in the music, whether in choreographed classical ballet, stylized folk dance, or improvised club dance. But perceivers’ movements are not restricted to dancers. As Simon Frith writes,

A good rock concert . . . is measured by the audience’s physical response, by how quickly people get out of their seats (1996:124).

In most popular music, jazz, and almost any folk music, connections between music and movement can be found in examples of foot tapping, head nodding, body swaying, clapping, singing along, dancing, or in various ways mimicking sound-producing actions (playing “air guitar”). However, there are large cultural differences in how such bodily involvement is regarded.

In the scholarly tradition focusing on Western “classical” music there has often been a focus on so-called “serious listening”. While the conductor may gesticulate exaggeratedly and the musicians certainly move while producing sound, the concert hall audiences are generally supposed to sit still and quiet. Patrick Shove and Bruno Repp describe the listening environment of the concert hall as follows:

A social proscription against overt movement by listeners has long been in effect (1995:64).

We may then ask, for how long?

The ideal of silent, attentive listening in concert halls is a social phenomenon that advanced only during the 19th century. Richard Sennett writes:

To sneer at people who showed their emotions at a play or concert became de rigueur by the mid-19th century. Restraint of emotion in the theater became a way for middle-class audiences to mark the line between themselves and the working class. A “respectable” audience by the 1850s was an audience that could control its feelings through silence; the old spontaneity was called “primitive.” The Beau Brummell ideal of restraint in bodily appearance was being matched by a new ideal of respectable noiselessness in public (1974:206).

While Sennett occupies himself with the sociological causes for this shift, James Johnson views it in relation to the music that was introduced at the time, such as the works of Beethoven requiring more concentrated listening (Johnson 1995). Johnson, as well as Lydia Goehr (1992:191ff), disdains the 18th-century audience for being primarily occupied with social activities when attending concerts. William Weber in turn takes both to task for endorsing a specifically post-Romantic view of listening that is replete with distrust of

any fusion between music and mundane social activities which are felt to violate the integrity of musical experience (Weber 1997:681).

The idea of the musical work as a perfect, complete unity spread in the same period. This fostered conventions such as:

  • always play complete symphonies, never parts
  • never applaud between parts/movements
  • never applaud until the last note is played

Susan McClary questions such conventions in music; the procedures that have “ossified into a formula that needs no further explanation” (2000:2-3). Strangely enough, even when music from before this 19th-century turn of focus is performed, it is controlled by the same conventions. But is this appropriate?

After the première of his Symphony No. 31 (the “Paris” symphony) in 1778, Wolfgang Amadeus Mozart wrote the following in a letter to his father:

Just in the middle of the first Allegro there was a passage which I felt sure must please. The audience were quite carried away - and there was a tremendous burst of applause. But as I knew, when I wrote it, what effect it would surely produce (quoted in Anderson 1966:558).

The passage Mozart refers to has two quite intense ascending pitch movements, each followed by a slower descending movement, and they probably inspired the applause, figuratively (and perhaps literally) “lifting” the audience. Mozart would almost certainly not have achieved the same overt response from his audience a century later. The noisier and rather unrestrained listening environment of the 18th century was perhaps more receptive to music that invited corporeal involvement, which surely influenced the composers of the time. And the absence of an immediate and satisfying response to the corporeal effects of music may have pushed subsequent generations of composers in other directions.

The shift to an ideal of silent, attentive listening during the 19th century is probably part of a complex train of events regarding new musical priorities. Jeremy Gilbert and Ewan Pearson point to Western philosophy’s dismissal of corporeality in musical experiences.

Music is understood by this tradition as being problematic in its capacity to affect us in ways which seem to bypass the acceptable channels of language, reason and contemplation. In particular, it is music’s apparent physicality, its status as a source of physical pleasure, which is problematic. By the same token, this tradition tends to demand of music that it - as far as possible - be meaningful, that even where it does not have words, it should offer itself up as an object of intellectual contemplation such as is likely to generate much meaningful discourse. Even those forms of modernist music which have aspired to pure abstraction (in particular the tradition of serial music), have been written with an emphasis on complexity and a deliberate intellectualism which foregrounds the music’s status as objects of rational contemplation rather than as a source of physical pleasure (1999:42-43).

Though the Western philosophical tradition obviously comprises a wide range of understandings and beliefs, Gilbert and Pearson raise a compelling point. Its emphasis on rational thought has probably encouraged composers, musicians, critics, and scholars to focus on intellectual approaches to music rather than corporeal ones.

The ultimate ascension of the intellectual approach to music listening - for example, the descriptions of listening types by Theodor Adorno (1968:15ff) - and its emphasis on the structure, development, and linearity of musical works are at least partially to blame for the Western scholarly disinterest in connections between music listening and body movements even in the twentieth century. Andrew Dell’Antonio observes that:

structural listening highlights an intellectual response to music to the almost total exclusion of human physical presence - whether that of the performer or that of the listener (2004:8).

But as we argue in this course, even if we try to avoid the body in music, it is still there, and it still influences our experience of music.


The history of listening experience - Part 2

In this video, Hans T. Zeiner-Henriksen continues the historical overview of music and movement, starting at the jazz age and moving through the swing, rock, and disco periods before getting to today’s music scene.

The Listening Experience in the 20th Century

Music scenes developed quickly in the 20th century, and many large changes came about. The most radical change was probably that music now could be experienced without any performers present. The 20th century was the first century of recorded music.

Swing jazz in the 1920s and 30s aimed to make people move. The music was rhythmic, repetitive and danceable. Over time, however, different sub-categories of jazz evolved into less danceable music, such as bebop, cool jazz, and free jazz. The tempo became too fast or too slow, the structure became less transparent, with many improvised parts, and a respectful jazz audience no longer engaged in dancing but had their attention fixed on the musicians. Gradually, conventions for a jazz concert became as fixed as for the classical concert halls, with a seated audience that should applaud after solos and nod their heads or tap their feet modestly to the beat. But, in line with the classical conventions, attentive listening was the only way of showing respect for the musicians.

The rock’n’roll that spread like wildfire in the 1950s evolved from African American rhythm’n’blues. African American music culture has always had a close link between music and movement, in the church, in concerts, and in social gatherings, and many African American music genres are especially rhythmically oriented (funk, hip hop) with an obvious focus on dance. Olly Wilson points to repetition, a percussive orientation and the link between music, movement and dance as some of the elements that point to a heritage from the African continent (Wilson 1983).

In the 1950s American society was still highly segregated, and a white artist was needed to break this new music genre to a larger white audience. Elvis Presley was the perfect man: he could sing, he was good looking and he could move. Today his movements to music do not seem very provocative, but in the 1950s his moving hips were immediately associated with sex and a promiscuous lifestyle. The music was danceable and invited the audience out of their chairs to participate by moving, dancing and singing along, but TV hosts and concert arrangers tried in every way possible to avoid showing his dance moves, to escape reactions from the parent generation. The connection between music and movement was seen as associated with a wild and uncivilized life.

Most of the 1950s rock’n’roll artists disappeared from the public scene around 1959 for various reasons, and the following years were dominated by popular music more influenced by the crooner tradition. The most popular dance fad was the Twist, a dance that, in contrast to 1950s rock’n’roll dancing, you could perform without a partner. But the popularity of rock music had not ended; it came back with much greater force a few years later.

Two bands especially illustrate the development of rock in the 1960s: the Beach Boys and the Beatles. They both have an early period (1962/63-65) where the connection to the rock’n’roll genre is obvious. From 1966 their music became more complex, with the use of classical and other unconventional instruments, more personal and inventive lyrics, experimental studio techniques, more complex harmonies, etc. Both Pet Sounds (Beach Boys, 1966) and Sgt. Pepper’s Lonely Hearts Club Band (Beatles, 1967) were seen to a certain extent as concept albums, with an overarching intellectual idea. This development was extremely important in heightening the status of pop/rock, but it simultaneously turned popular music towards the rational, adopting the idea of “pure listening” as the most “serious” engagement with music.

At around the same time James Brown developed his music in the opposite direction. Funk music grew out of the African American soul genre, with an explicit focus on the rhythmic aspects of the music. The groove became the most significant element and the audience did not sit still, even if the concerts were held in places with seats.

Disco came out of New York in the 1970s and by the end of the decade it was everywhere. The movie Saturday Night Fever (1977) was central in spreading the disco craze, but its version of the New York club scene was a slightly altered one; African Americans were replaced by Americans of Italian descent (John Travolta), African American artists were replaced by a British-Australian group (Bee Gees), gays were replaced by straights, and free improvised dance was replaced by instructed dance moves.

Many white rock fans in particular were annoyed by the popularity of disco in the 1970s. Radio DJ Steve Dahl fronted an anti-disco campaign he called “Disco sucks”:

Disco music is a disease. I call it disco dystrophy . . . The people victimized by this killer disease walk around like zombies. We must do everything possible to stop the spread of this plague (Steve Dahl quoted in Brewster & Broughton 2006:290).

Dahl is also associated with the infamous Disco Demolition Night at baseball’s Comiskey Park in Chicago on July 12, 1979, where reduced admission was offered in exchange for disco records that were in turn blown up inside a container partway through the game. The game had to be stopped because of the riot that followed. Brewster and Broughton observe that this protest was in fact not unique:

Dislike for disco was everywhere. The rock generation saw it as the antithesis of all that was holy: no visible musicians, no “real” stars, no “live” performance. It was music based wholly on consumption, music with no aesthetic purpose, indeed with no purpose at all other than making your body twitch involuntarily. Dehumanizing, expressionless, content-less - the judgements were damning (ibid:291).

Following the incident in Chicago, disco clearly fell from grace, at least in the United States. The major record companies had forced dance music into a typical star performer-oriented package, and the public in turn experienced lip-synching, derivative arrangements, and other studio “fakery” as evidence of disco’s (rather than the disco business’s) illegitimacy. The major labels saw disco as a passing phenomenon that had to be “exploited as quickly and thoroughly as possible” (ibid:201). This fate would then become self-fulfilling.

After the brutal end of disco, MTV started in the United States in 1981 with an explicit focus on white rock music. In its first year it hardly showed any videos with African American artists. Columbia Records protested against this racist format by making extremely well-produced music videos for Michael Jackson’s Thriller album, and refused MTV access to any of its artists if the channel did not show his videos (Starr & Waterman 2014:452). Not unlike Elvis Presley three decades earlier, Michael Jackson was tremendously skilled at dancing and moving rhythmically to the music, and the music videos were a perfect tool to show this ability.

Hip hop evolved from an African American street dance culture in New York in the 1970s. Its first commercial recording was in 1979 and during the following decades its popularity has spread both in the United States and worldwide. The focus on dance (breakdance/street dance) has been somewhat downgraded, but its emphasis on rhythm and groove has been explicit. Hip hop has become extremely popular and has also influenced what is considered mainstream popular music today.

Contemporary popular music is also influenced by the club music that initially came from the United States to England around 1987-88. Disco music reinvented itself, became house music (from the club the Warehouse in Chicago) and was exported to England. House parties and raves were (mostly illegal) gatherings of large crowds for dancing (and ecstasy) during weekend nights. Its popularity spread during the 1990s to become a major commercial scene at the turn of the millennium.

If we try to see these developments in perspective, there are many trends that imply a connection between music, dance and popularity. When the audiences move along to the rhythm and the groove, it seems to have an impact that connects them to the music. Likewise, there are stronger negative reactions towards dance music compared to other music. The bodily aspects of music can create passionate likings, but also strong aversions, and the orientation towards the pleasure of music seems provocative to many.

A lot of contemporary popular music has an explicit focus on rhythm and groove, encouraging participation via overt body movements and dancing. This dominates Western music cultures today, and may open discussions on how bodily engagement can enhance the experience - not in moving the focus away from the music, but in focusing on musical elements that are significant for how music moves.


Why is Terminology Important?

The structure of Music Moves is based on three “tracks”. We have just been through the first part of the theory track, and will here start the terminology track.

One could ask why it is important to be picky about terminology. Why can we not just describe things the way we want? The problem, of course, is that if we want to discuss our research findings with others, we need to be precise in our descriptions. That is why we will spend some time going through some key concepts together.

For example, what is the difference between “musical” and “music-related”? Or, what is the difference between “movement”, “action” and “gesture”? And, is there a difference between “motion” and “movement”? The next video will give you at least a few answers. And remember, you can always have a peek in the Music Moves Dictionary (see Appendix).

In this video Alexander Refsum Jensenius goes through some of the key terminology in Music Moves.

Being precise about terminology is important to avoid confusion. That is why we are going through some terminology each week. To begin with we will discuss the concept of music-related motion. What is the difference between music-related and musical? And is there a difference between motion and movement?

The third “track” of Music Moves is that of methodology. In this track we will explore different methods that researchers use to study music-related movement.

It is important to remember that there is never only one correct method for carrying out research. Often, the best approach is to use a combination of different methods. We may also distinguish between qualitative and quantitative methods.

Qualitative methods are often exploratory in nature, aiming to reveal and explain phenomena. This is often done through reasoning and writing text.

Quantitative methods, on the other hand, are often based on using measurements through some kind of technology. Then one uses numerical methods, such as statistics and machine learning, to calculate results.
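As a minimal sketch of what such a numerical step might look like, the following Python example computes frame-to-frame speed from a handful of made-up marker positions and summarises it with a mean. The data, the sampling rate and the function name are assumptions for illustration only; they are not taken from the course material or from any particular motion capture system.

```python
import math
import statistics


def frame_to_frame_speeds(positions, sample_rate_hz):
    """Return the speed (units per second) between consecutive (x, y, z) positions."""
    # Distance travelled between two neighbouring frames, multiplied by the
    # number of frames per second, gives the speed over that interval.
    return [math.dist(a, b) * sample_rate_hz
            for a, b in zip(positions, positions[1:])]


# Hypothetical positions of a single marker in millimetres, sampled at 100 Hz.
# These values are invented for the example.
head_marker = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (3.0, 4.0, 0.0),  # the marker did not move during this frame
    (6.0, 8.0, 0.0),
]

frame_speeds = frame_to_frame_speeds(head_marker, sample_rate_hz=100)
mean_speed = statistics.mean(frame_speeds)

print("Speeds (mm/s):", frame_speeds)
print("Mean speed (mm/s):", round(mean_speed, 1))
```

A real study would of course involve many markers, far longer recordings and more careful statistics, but the principle is the same: measurements go in, and numerical summaries come out.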

The weekly videos in the methodology “track” will start by introducing some qualitative methods, and then mainly focus on describing various types of technologies used in modern research on music and movement. We will visit the labs at the University of Oslo to get first-hand experience with the methods in a music research context.

What types of methods do researchers use for studying music-related body motion?

In this first methodology “track”, Alexander Refsum Jensenius presents the differences between descriptive and functional analysis, and qualitative and quantitative methods.

Week 2

This week we will look at some of the basic psychological principles related to music and movement. You will also learn about some of the terminology we use and we will start to explore different research methods used.

Did you listen to a particularly good piece of music over the weekend? Did you sit still listening? Did you dance or move in any other way? Did you know the song from before? Did the music move you emotionally, or bring back memories?

Perception as an Active Process

In order to understand our musical experiences, it is essential to look into two psychological concepts: perception and cognition.

It is not easy to give precise definitions of the two terms perception and cognition. Neither is it easy to clearly explain the difference (and relationship) between the two concepts. But we can try: Perception is a broad term, involving the reception of information through our senses (hearing, sight, smell, touch, and so on). But perception is not simply a passive reception of information. As part of the perception process we subconsciously focus our vision or hearing towards events of interest, and we are even able to suppress unwanted perceptual stimuli (imagine what you would do if you saw someone about to make a loud noise right next to your ear).

The way we think of it, cognition is mainly concerned with the mental representation of phenomena. Some will claim that cognition is a sub-category of perception: the part of perception that occurs in the brain. Others, however, argue that perception and cognition are two independent concepts that overlap. What is certain is that the two are interdependent, and that conscious and subconscious mental processes are active in perception; for instance, processes like our memory and our preconceptions and expectations of what we perceive. This is where, for example, our cultural background and upbringing come into play to shape our experiences.

A traditional view of human cognition has been to think of the human brain as a computer. In such a model the senses could be thought of as “inputs” to the brain, and the brain performs “calculations” to decide on an “output”. This way of understanding human cognition implies a one-directional musical experience, shaped by auditory information from our ears being “processed” in the brain. As such, the musical experience is “passive”, in the sense that there is a one-directional flow of information from stimulus, through our senses, to our brain.

The theoretical foundation for Music Moves lies within the tradition of what is called embodied cognition. This tradition is distinctly different from the “classical” cognition model mentioned above in that it regards the perception process as an active process. Rather than passively receiving sensory information, the embodied cognitive approach is based on the idea that we experience the world through our whole bodies.

Fundamental to the embodied cognition model is that our experiences of living in the world shape how we perceive phenomena around us. The American psychologist James J. Gibson introduced the term affordance to explain how we develop knowledge and skills about the world around us. An affordance is something that is offered to the perceiver by the object one observes or interacts with. Affordances are therefore typically action-oriented. For instance, a chair may be sit-on-able because it affords the action of sitting. A cup is drink-able because it affords the action of drinking. It is important to remember that affordances are always relative to the perceiver. So a chair may have different affordances for a child and an adult. Also, in Gibson’s thinking, affordances are not necessarily positive: a branch on a tree, for instance, may be both climb-on-able and fall-off-able. In 1988, the affordance concept was used by Don Norman in his book The Design of Everyday Things. There are slight differences between Gibson’s and Norman’s use of the term, which you may read more about in this article. Norman explains affordances in this video.

In the rest of Music Moves we will use these ideas from embodied cognition, as we explore the world of embodied music cognition. This term was coined by the Belgian musicologist Marc Leman and takes the living body as the point of departure for how we understand music performance and perception.


Perception as an active process

Music perception is more than just listening. Our own experiences as humans with bodies shape the way we perceive different phenomena. For instance, we know how to produce a clapping sound. This means that perceiving the sound of a clap is not just receiving and interpreting the sound signal, but perceiving the action “clapping”.

Action and Co-Articulation

In this week’s terminology track we will explore the concept of action.

From the theory track you have learned that an embodied approach to cognition is based on the idea that we make sense of the world through our whole bodies. Furthermore, the affordance concept tells us that objects have certain action “properties”, and that we are immediately aware of these properties.

Sound-producing actions

What are sound-producing actions? In this video we look at the difference between motion and action. We learn about different types of sound-producing actions: impulsive, sustained and iterative.

Exploring Sound-producing Actions

We shall here look more closely at some terminology used to describe sound-producing actions.

Objects and Actions

Sounds are produced when actions work on two or more objects, what is often called interaction. We may therefore think of this as an object-action-object system, as illustrated in the figure below.

An object-action-object system

In nature, the features of such a system are defined by the acoustical properties of each of the objects involved in the interaction (such as the size, shape and material), and the mechanical laws of the actions that act upon them (such as the external forces and gravitational pull). It is our life-long experience of the acoustical and mechanical properties of objects and actions that makes us able to predict the sound of an object-action-object system even before it is heard.

Affordance of Sound-producing Actions

To use Gibson’s affordance concept again, we may say that an object-action-object system affords a specific type of sonic result. For example, when we see a drummer hit a drum with her stick, we know that we will hear a drum sound because we have seen (and heard!) drums before. Think of any instrument (or other object for that matter), and you can probably imagine how it will sound.

Even more, you can probably think of many different sound-producing actions to use with one particular instrument: hitting, sliding, blowing, and so on. And you can use many different objects to interact with your instrument, such as a mallet or a bow. As you can easily imagine, there are endless combinations of such action-sound couplings. The remarkable thing is that we are able to imagine and predict the sonic outcome of such couplings. Our main argument here is that this embodied knowledge is deeply rooted in our cognitive system.

From Brain to Sound

So far we have only discussed the perception of sound-producing actions. But we may also think of a chain describing the production of such actions. The figure below shows the action-sound chain from cognitive process to sound, and with feedback in all parts of the chain.

Action-sound chain

The chain starts with neurological activity in the brain, followed by physiological activity in a series of muscles, and biomechanical activity in limbs of the body. The interaction between the body and the object occurs as an attack when an element of the object (for example a drum membrane) is excited and starts to resonate.

The feedback in the chain is important, as it is part of the action-perception loop. It is this constant feedback from all parts of our body that makes us able to adjust our actions continuously. For example, just think of how a jazz drummer is constantly able to adapt the playing to make the best sound from the drums and to pick up on the musical elements of the other musicians.

Excitation, Prefix, Suffix

If we zoom in on only the “attack” part of the chain depicted above, it can be seen as consisting of three elements: prefix, excitation and suffix.

Excitation, Prefix, Suffix

The prefix is the part of a sound-producing action happening before the excitation, and is important for defining the quality of the excitation. The suffix is the return to equilibrium, or the initial state, after the excitation.

The prefix, excitation and suffix are closely related both for the performance and the perception of a sound-producing action. Following the idea of our perception being based on an active action-perception loop, a prefix may guide our attention and set up expectations for the sound that will follow. For example, when we see a percussionist lifting the mallet high above a timpani we will expect a loud sound. We will also expect the rebound of the mallet (the suffix) to match the energy level of the prefix, as well as the sonic result. As such, both prefixes and suffixes help to “adjust” our perception of the sound, based on our ecological knowledge of different action-sound types.

Action-Sound Types

The French composer and musicologist Pierre Schaeffer was a pioneer in defining a structured approach to thinking about musical sound. We will here build on his concept of three main types of sounds:

  • Impulsive: sounds characterised by a discontinuous energy transfer, resulting in a rapid sonic attack with a decaying resonance. This is typical of percussion, keyboard and plucked instruments.
  • Sustained: sounds characterised by a continuous energy transfer, resulting in a continuously changing sound. This is typical of wind and bowed string instruments.
  • Iterative: sounds characterised by a series of rapid and discontinuous energy transfers, resulting in sounds with a series of successive attacks that are so rapid that they are not perceived individually. This is typical of some percussion instruments, such as guiro and cabasa, but may also be produced by a series of rapid attacks on other instruments, for example rapid finger actions on a guitar.

It is important to note that many instruments can be played with both impulsive and sustained actions. For example, a violin may be played with a number of different sound-producing actions, ranging from pizzicato to bowed legato. However, the aim of categorising sound-producing actions into three action-sound types is not to classify instruments, but rather to suggest that the mode of the excitation is directly reflected in the corresponding sound.

As shown in the figure below, each of the action-sound types may be identified from the energy profiles of both the action and the sound.

Action-sound types

Here the dotted lines indicate where the excitation occurs. Note that two action possibilities are sketched for the iterative action-sound type, since iterative sounds may be the result of either the construction of the instrument or the action with which the instrument is played. An example of an iterative sound produced by a continuous action can be found in a cabasa, where the construction of the instrument makes the sound iterative. Playing a tremolo, on the other hand, involves a series of iterative actions, but these actions tend to fuse into one superordinate action. In either case, iterative sounds and actions may be seen as having different properties than those of the impulsive and sustained action-sound types.


Introduction to Sound and Movement Analysis

In this week’s methods track we will explore qualitative movement analysis and learn about sound analysis.

As we saw last week, there are many ways of analysing music-related body motion. The qualitative methods are often based on descriptive approaches, using text as the medium. Several structured approaches to this exist, and in the following video we will look more closely at two systems used for qualitative movement analysis: Labanotation and Laban Movement Analysis.

When researching music and movement we need tools for analysing both movement and sound. So after the qualitative movement analysis video you will learn about how it is possible to carry out systematic studies on musical sound. This includes an overview of different visualisation techniques, such as waveform displays and frequency spectra.

Qualitative movement analysis

In this video you will learn about different types of qualitative movement analysis.

We will look in particular at two systems developed by the dance choreographer Rudolf Laban: Labanotation and Laban Movement Analysis. These systems have not received widespread usage, but they are probably the two best-known notation and analysis methods in use in dance and beyond.

Exploring Sound Analysis

This article introduces some basic techniques for quantitative sound analysis. Most importantly, we will have a look at three visual representations of sound: the waveform, the spectrum and the spectrogram.

We will only scratch the surface of the vast topic of sound analysis, and you are not required to memorise the technical details given here. However, it is useful to have a conceptual idea of the ways in which sound is analysed and visualised. You may want to first read quickly through this article, then look at the video in the next step, and come back to this article for reference.

Sound waves in the air

What happens if you throw a stone into water? As the stone hits the surface, waves spread from the point of impact. A sound in the air is quite similar. However, the waves are not variations in the water. Sound consists of variations in air pressure above and below the normal air pressure level. Just like the waves in the water, the sound waves spread out from the source, albeit quite fast: approximately 340 meters (about 1100 ft) every second. In the image below, the sound wave is displayed with dark areas where the air is more compressed, and light areas where the air is less compressed (the air molecules are more dispersed).

Sound is pressure waves in the air

Imagine that you are standing on the blue “x” in the picture above. We can make a graph of the air pressure at your location, such as the example below. Notice that it starts with a steady air pressure (no sound), and then starts to vary once the sound waves hit you.

A graph of the air pressure as it alternates above and below normal air pressure

The variations in air pressure that we perceive as sound are very rapid. We call the speed of these variations the sound frequency. The lowest audible bass frequencies vary 20 times per second (20 Hertz) and the highest audible frequency to a young person without hearing loss is around 20 000 times per second (20 000 Hertz). The amplitude of the sound wave describes how large its pressure variations are. A quick demonstration of frequency and amplitude is shown in this video.
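Frequency and amplitude can be made concrete with a few lines of Python. The sketch below is only an illustration: the function names, the sample rate of 8000 samples per second and the example values are assumptions made for this example, not part of the course material. It samples a pure tone, counts how often the pressure curve passes upwards through zero (roughly once per cycle), and reads off the largest deviation from zero.

```python
import math

def sine_wave(frequency_hz, amplitude, duration_s, sample_rate=8000):
    """Sample a pure tone: pressure deviation from the normal level over time.

    The sample rate is an assumed value chosen for this illustration.
    """
    n_samples = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(n_samples)]

def count_upward_crossings(samples):
    """Count how often the wave passes from below zero to zero or above."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)

tone = sine_wave(frequency_hz=220, amplitude=0.5, duration_s=1.0)
crossings = count_upward_crossings(tone)  # roughly one per cycle, so close to 220
peak = max(abs(s) for s in tone)          # close to the amplitude, 0.5
print(crossings, round(peak, 3))
```

Doubling `frequency_hz` roughly doubles the number of crossings per second, while changing `amplitude` only changes how far the curve moves away from zero.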


The most common way of representing sound is the one you will meet in most types of sound recording software: the waveform. The waveform is actually quite similar to the previous figure. A waveform representation shows how the amplitude (y-axis) of a sound varies over time (x-axis). The figure below shows a waveform representation of a very short sound segment. Time is shown along the horizontal axis. Notice how the amplitude varies above and below the 0 line.

Waveform zoomed in showing a graph of the sound wave as it alternates above and below zero

The figure above shows a very short (3 millisecond) excerpt of a sound file. If we “zoom out” of the image, these variations become so small that they “merge” into the blue solid areas in the figure below. We are now unable to see the individual fluctuations from the previous figure, but we can identify several “bursts” of sound.

Waveform zoomed out. Individual alternations above and below zero are now too small to be seen, but we may observe three large and a couple of smaller sound bursts

Frequency content

Natural sounds contain a range of frequencies. Tonal sounds contain a fundamental frequency and a range of overtones which are integer multiples of the fundamental frequency. For instance, a tone with a fundamental frequency of 220 Hertz (called small A) has overtones at 440 Hz, 660 Hz, 880 Hz, etc. Normally, we cannot hear the individual overtones. The fundamental frequency and the overtones fuse together, and their amplitude relationship plays an important role in determining the perceived timbre of the tone. Timbre, sometimes called tone colour, is what makes it possible to distinguish between, for instance, a flute and a violin that are playing the same tone.
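The relationship between a fundamental and its overtones can be sketched in code. In the example below, the function names and the two sets of overtone amplitudes are made up for illustration; they do not describe any real instrument. Both tones share the same fundamental of 220 Hz and differ only in how strong the overtones are, which is the amplitude relationship that shapes timbre.

```python
import math

def harmonic_frequencies(fundamental_hz, count):
    """Fundamental plus overtones: integer multiples of the fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]

def additive_tone(fundamental_hz, amplitudes, duration_s=0.05, sample_rate=8000):
    """Sum one sine wave per harmonic; `amplitudes` sets their relative strength.

    Duration and sample rate are assumed values for this illustration.
    """
    n_samples = int(round(duration_s * sample_rate))
    freqs = harmonic_frequencies(fundamental_hz, len(amplitudes))
    return [sum(a * math.sin(2 * math.pi * f * n / sample_rate)
                for a, f in zip(amplitudes, freqs))
            for n in range(n_samples)]

print(harmonic_frequencies(220, 4))  # [220, 440, 660, 880]

# Same fundamental, different (hypothetical) overtone strengths.
mellow = additive_tone(220, [1.0, 0.3, 0.1, 0.05])
bright = additive_tone(220, [1.0, 0.8, 0.7, 0.6])
```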


A spectrum representation shows the frequency content of a sound recording. Here the frequency is shown on the horizontal x-axis, and amplitude on the vertical y-axis. The figure below shows a spectrum of a saxophone tone. The peaks in the spectrum are the fundamental frequency and the overtones of the saxophone sound.

Spectrum of a single tone. A fundamental frequency is indicated by a peak (to the left), followed by peaks indicating the first, second, third overtones, and so forth.
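A spectrum like the one described above can be computed with a discrete Fourier transform. The following is a minimal sketch that uses a slow but transparent textbook formula instead of the optimised routines found in analysis software. The test tone, its three harmonic amplitudes and the sample rate are assumptions chosen so that each harmonic falls exactly on one frequency bin.

```python
import cmath
import math

def magnitude_spectrum(samples, sample_rate):
    """Return (frequencies, magnitudes) using a direct discrete Fourier transform."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        freqs.append(k * sample_rate / n)
        mags.append(2 * abs(coeff) / n)  # scaled so a sine of amplitude A gives about A
    return freqs, mags

# Assumed test tone: 220 Hz fundamental with two overtones, 0.1 s at 4000 samples/s.
sample_rate = 4000
harmonics = [(220, 1.0), (440, 0.5), (660, 0.25)]
samples = [sum(a * math.sin(2 * math.pi * f * n / sample_rate) for f, a in harmonics)
           for n in range(400)]

freqs, mags = magnitude_spectrum(samples, sample_rate)
peaks = [f for f, m in zip(freqs, mags) if m > 0.1]
print(peaks)  # the fundamental and its two overtones
```

For real recordings, the frequencies rarely fall exactly on a bin, so the peaks are broader than in this idealised case.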


If we are interested in analysing how the spectrum varies over time, we may use what is called a spectrogram (or sometimes sonogram).

A spectrogram is created by dividing the sound file into many short segments, calculating the spectrum for each segment, and placing these next to each other. The picture below shows the conceptual construction of a spectrogram. Essentially, the spectrum of each segment is tilted sideways and colour-coded. The result of placing these colour-columns next to each other is an image showing the variation in frequency content over time.

Building a spectrogram
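The construction described above (cut the sound into short segments, compute one spectrum per segment, place the spectra side by side) can be sketched as follows. This is a simplified illustration: the segments do not overlap and no window function is applied, whereas analysis software typically uses both. The test signal, which jumps from 200 Hz to 400 Hz halfway through, and all names are assumptions made for the example.

```python
import cmath
import math

def magnitude_spectrum(segment):
    """Magnitudes of the frequency bins of one segment (direct DFT)."""
    n = len(segment)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(segment)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, segment_length):
    """One spectrum ("column") per consecutive, non-overlapping segment."""
    return [magnitude_spectrum(samples[start:start + segment_length])
            for start in range(0, len(samples) - segment_length + 1, segment_length)]

# Assumed test signal: 200 Hz for the first half, 400 Hz for the second half.
sample_rate = 2000
samples = [math.sin(2 * math.pi * (200 if n < 200 else 400) * n / sample_rate)
           for n in range(400)]

segment_length = 100
columns = spectrogram(samples, segment_length)
bin_width_hz = sample_rate / segment_length

# Strongest frequency in each column, read from left to right over time.
strongest = [max(range(len(col)), key=col.__getitem__) * bin_width_hz
             for col in columns]
print(strongest)
```

In a spectrogram image, each entry of `columns` would be drawn as one vertical, colour-coded strip.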

The picture below shows a spectrogram of the same saxophone melody as shown in the video on sound analysis in the next step.

Spectrogram of saxophone melody

Sound descriptors

Sometimes we need more precise descriptions of a sound file than we can get from a visual inspection of a waveform or spectrogram. We may then move over to quantitative analysis using sound descriptors. A sound descriptor is a numerical description of a single aspect of the sound. The descriptor may be global, describing an entire sound, or time-varying, describing variations within the sound.

Examples of descriptors

  • In a sense, duration is a global descriptor. A sound has a duration which may be described globally with a single number.
  • Sound energy as a global descriptor is a description of the total sound energy in the sound file. This is typically found by calculating the root-mean-square value of the waveform. Sound energy may also be a time-varying descriptor. Instead of calculating the energy of the entire sound file, the energy is calculated for a sequence of short time-windows. One number is calculated per time-window, resulting in a sequence of numbers.
  • The spectral centroid is often explained as the “centre of gravity” of the spectrum. The time-varying spectral centroid usually reflects how the brightness of the sound evolves over time. The resulting sequence of numbers can be displayed in a plot such as in the picture below.
The spectral centroid shown as a curve on top of a spectrogram. As high frequency content is filtered out, the value of the spectral centroid becomes smaller
  • Spectral flux describes how much the spectrum varies over time.
  • Roughness describes a dissonance in the sound that is the result of certain limitations in our auditory system (critical bandwidth).
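Some of the descriptors listed above are simple enough to sketch directly. The definitions below follow common textbook formulations and are meant as an illustration, not as the exact formulas used by any particular toolbox. The decaying test sound and the two small example spectra are invented for the example.

```python
import math

def rms(samples):
    """Root-mean-square value: a global measure of sound energy."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rms_envelope(samples, window):
    """Time-varying energy: one RMS value per consecutive time-window."""
    return [rms(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

def spectral_centroid(frequencies, magnitudes):
    """Magnitude-weighted mean frequency: the "centre of gravity" of a spectrum."""
    return sum(f * m for f, m in zip(frequencies, magnitudes)) / sum(magnitudes)

def spectral_flux(previous, current):
    """Distance between two successive spectra (one common definition)."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(previous, current)))

# Assumed test sound: a 100 Hz tone that decays over one second.
sample_rate = 1000
decaying = [math.exp(-3 * n / sample_rate) * math.sin(2 * math.pi * 100 * n / sample_rate)
            for n in range(1000)]
envelope = rms_envelope(decaying, 250)
print(rms(decaying), envelope)

# Hypothetical spectra: the same tone before and after removing high frequencies.
frequencies = [220, 440, 660]
bright = [1.0, 0.5, 0.25]
filtered = [1.0, 0.25, 0.0]
print(spectral_centroid(frequencies, bright))    # about 345.7 Hz
print(spectral_centroid(frequencies, filtered))  # 264.0 Hz
print(spectral_flux(bright, filtered))
```

The centroid drops when the upper overtones are weakened, which matches the intuition that the sound becomes less bright.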


There exists a wide range of software for sound analysis. Here is a selection of free software that you may try yourself:

  • Sonic Visualiser: Free and user-friendly tool for visualising audio files. (Windows / Mac)
  • Audacity: Free software for recording and editing multitrack audio. (Windows / Mac)
  • Praat: Free software for audio analysis. Mainly targeted at speech analysis but also useful for other types of musical sound. (Windows / Mac)
  • Spear: Free software that lets you analyse and manipulate individual sinusoidal components of a sound file. (Mac)
  • SpectrumView: Free iOS app producing a simple spectrogram from microphone input. (iOS)

In addition, there exists a range of tools that require a Matlab license. These tools are more advanced, and have a higher threshold to get started:

  • MIR Toolbox: Advanced toolbox for audio analysis by Olivier Lartillot. (requires Matlab)
  • Timbre Toolbox: Advanced toolbox for audio analysis by researchers from IRCAM and McGill University. (requires Matlab)

How do we analyse sound?

As for movement analysis, there are both qualitative and quantitative approaches to the analysis of musical sound.

In everyday life, we use words to describe sounds. A sound may be bright, or a song may be sad. Sometimes we need more precise ways to describe sound. In this video you will learn about three different ways of visualising sound: waveform, spectrum, and spectrogram, and also that sounds may be described by means of sound descriptors.

Week 3

This week you will learn more about the importance of multimodality in human perception in the theory track.

The terminology track will focus on the motion of performers that do not produce sound: sound-modifying and sound-accompanying actions, as well as communicative movements and gestures.

Finally, in the methods track we will take a closer look at “motion capture” systems. Remember to use the dictionary.

Introduction to Multimodality

Multimodality is a term used to emphasise that our senses work together.

A modality refers to one of the channels we use to get information, such as audition, vision, taste, balance and proprioception. In most cases, the modalities confirm each other: they contribute to strengthening our perception of a phenomenon. However, in particular cases our mind perceives something that is not present in any of the modalities. An example of this is the McGurk effect, named after the psychologist who first described this phenomenon. You will see a demonstration of this effect in the next video.

Our cognition’s multimodal nature may explain why we easily project features of one modality onto another. When hearing a sound, we assume something about the sound production, even if we have never heard the sound before. What type of material may have caused this sound? What type of action was involved?

Furthermore, musical elements such as phrases, rhythmic patterns, melodies or harmonic progressions may give associations to shapes and movement. The same elements may even induce strong emotional reactions in the perceiver. But why is this so? Move on to learn more.



This video presents the term multimodality and how this aspect of our perceptual system has implications for how we perceive music.

Can Body Movement Teach us Something about Music Perception?

As we have learned previously, the theory of embodied cognition tells us that our involvement with the world is based on the fact that we are humans with physical, moving bodies.

The same holds true for our involvement with music. The paradigm of embodied music cognition suggests that our bodies are integral to how we experience music. As such, we can learn a lot about how we perceive music by studying how we move to music.

Studying music cognition

How would you design an experiment to study people’s perception of music? There are many possible answers to this question. Let us consider two alternative methods.

Using qualitative interviews is one method to study how people experience music. This makes people reflect on their own behaviour, and will therefore give valuable subjective data. By asking people about how they move to music, they are forced to articulate and express how they perceive the music, but only indirectly. Our experience is that people answer quite differently depending on whether they are asked “How did you move to this song?” or “What do you think about this song?”. Asking both questions may also be a way to reveal more about their musical experience than they would otherwise have shared.

However, talking about one’s own behaviour may be challenging. Few people have the vocabulary to express the nuances of their own bodily behaviour and perception of music. Furthermore, if the interview occurs after a concert, the person’s long-term memory becomes an important factor in shaping their answer.

A more quantitative approach is to ask people to use a device with a button or slider to indicate their excitement with the music. This removes the language barrier and can be carried out in real time. However, such an apparatus may not be so easy to use in most natural music environments, such as a concert hall. Furthermore, by asking people to indicate their excitement, some preconditions are put into the experiment by the researcher and many nuances of how people perceive the music are not considered at all.

Yet another method, and the one that we will focus most on in Music Moves, is that of “motion capture”. Observing people’s bodily response to music is “easy” for most people, since they do not have to put their experience into words nor do any conscious activity other than just moving as they normally would. Moreover, the close connection between perceiving and moving (as discussed in the video on perception: affordances, mirror neurons, etc.) suggests that movement to music may be a bodily expression of how you perceive the music. Marc Leman calls these corporeal articulations, which he discusses thoroughly in the book Embodied music cognition and mediation technology.


Studying music perception in the motion capture laboratory

Alexander and Kristian discuss some of their experiences from experiments on music and movement. Are there similarities in the way different people move to music?

Introduction to Motion, Action and Gesture

This week’s terminology track focuses on actions that do not produce sound directly.

One example is sound-modifying actions, such as the feet of a pianist as she pushes the pedals on the piano. Another is the sound-accompanying movements one may find when people move to sound.

We may also talk about various types of communicative movements, such as between musicians, between a conductor and the musicians, or between musicians and audience members.

Finally, we may talk about gestures, movements or actions that are used to express some kind of meaning. More on all of this in the next video.

Infrared Marker-based Motion Capture

Infrared marker-based motion capture is one subcategory of optical motion capture systems.

As discussed in the previous video these systems may be very precise, with high resolution and high recording speeds. This type of technology is widely used for animation purposes in the film and gaming industries, and for medical purposes and rehabilitation. An increasing number of music researchers are now also making use of such systems in their studies of music and movement.

Cameras and markers

Infrared marker-based motion capture systems use reflective markers on the body or on an instrument. Such markers vary in size, and the best size to choose depends on the type of movement to be recorded. For instance, quite small markers are used to capture facial expressions, and larger markers can be used for full-body movement.

In order to “see” the markers, each of the mocap cameras emits infrared light. The light is reflected off the markers, and sent back as a two-dimensional image to each camera. The computer can then determine the location of the marker in space by combining the images from each camera, as sketched below.

Cameras finding the location of a marker
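The idea of combining camera views can be illustrated with a strongly simplified geometric sketch. We assume here that each camera's two-dimensional image has already been converted into a ray in space (a camera position plus a viewing direction towards the marker), which in a real system requires calibration. The marker position is then estimated as the point closest to both rays. The function names and camera positions are hypothetical, and commercial systems combine many cameras with more elaborate methods.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def triangulate(origin1, direction1, origin2, direction2):
    """Midpoint of the shortest segment between two rays (assumed non-parallel)."""
    w = [p - q for p, q in zip(origin1, origin2)]
    a = dot(direction1, direction1)
    b = dot(direction1, direction2)
    c = dot(direction2, direction2)
    d = dot(direction1, w)
    e = dot(direction2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the marker position is undetermined")
    s = (b * e - c * d) / denom  # distance parameter along the first ray
    t = (a * e - b * d) / denom  # distance parameter along the second ray
    point1 = [o + s * v for o, v in zip(origin1, direction1)]
    point2 = [o + t * v for o, v in zip(origin2, direction2)]
    return tuple((p + q) / 2 for p, q in zip(point1, point2))

# Hypothetical setup: two cameras, both looking towards a marker at (2, 3, 1).
camera1, camera2 = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)
ray1 = (2.0, 3.0, 1.0)    # direction from camera1 towards the marker
ray2 = (-2.0, 3.0, 1.0)   # direction from camera2 towards the marker
marker = triangulate(camera1, ray1, camera2, ray2)
print(marker)
```

With measurement noise the two rays no longer intersect exactly, which is one reason why more cameras give a more reliable estimate.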

Three-dimensional motion capture data

Infrared mocap systems provide three-dimensional position data for each marker. The three dimensions are measured along the axes X, Y, and Z, and the orientation of these axes is determined when the system is calibrated. In a rectangular room, it often makes sense to let the axes run between opposing walls, and from floor to ceiling.

Recording motion data at a rate of 100 Hz means that 100 measurements are made per second, each with 3 data points (X, Y, Z) per marker. Considering that a full-body motion capture may require up to 30 (or even more) markers, we end up with a large amount of data. Software for simple processing and visualisation of the data is usually available from the mocap system provider. However, for music-related research it is often necessary to use analysis software that is tailored for our needs. One such example is the MoCap Toolbox from the University of Jyväskylä in Finland.
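The amount of data follows directly from the numbers given above. The short calculation below uses the example values from the text (100 Hz, 30 markers, 3 values per marker); the function name and the one-minute duration are our own choices for the illustration.

```python
def values_per_second(sample_rate_hz, n_markers, n_dimensions=3):
    """Number of position values recorded every second."""
    return sample_rate_hz * n_markers * n_dimensions

per_second = values_per_second(sample_rate_hz=100, n_markers=30)
per_minute = per_second * 60  # a one-minute recording, chosen as an example
print(per_second, per_minute)  # 9000 540000
```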

Data processing

Some processing of the recorded data is often needed. This may be small adjustments due to minor errors in the recording, or transformations of the data to calculate, for instance, the velocity or acceleration of the movement.

Occlusion: Gap-filling

One common problem with motion capture data is that a marker is “lost” in the recording. This happens when a marker is occluded or if it is moved out of the field of view of the cameras. Small gaps in the marker data can be easily repaired with so-called “gap filling”, based on interpolating between the closest data points to estimate the marker position in the gap, as shown below.
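Gap filling can be sketched with linear interpolation, which is only one of several interpolation methods in use. In the example below, missing samples are represented by `None`, and gaps longer than `max_gap` samples, or gaps at the very start or end of the recording, are left untouched. The representation, the function name and the `max_gap` limit are assumptions made for this illustration.

```python
def fill_gaps(values, max_gap=10):
    """Linearly interpolate runs of None that are at most `max_gap` samples long."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i
            while i < n and filled[i] is None:
                i += 1
            end = i  # index of the first valid sample after the gap (or n)
            has_neighbours = start > 0 and end < n
            if has_neighbours and (end - start) <= max_gap:
                left, right = filled[start - 1], filled[end]
                span = end - (start - 1)
                for j in range(start, end):
                    fraction = (j - (start - 1)) / span
                    filled[j] = left + fraction * (right - left)
        else:
            i += 1
    return filled

# One coordinate of one marker, with two samples lost to occlusion.
x_positions = [0.0, 1.0, None, None, 4.0, 5.0]
print(fill_gaps(x_positions))
```

The same function would be applied separately to the X, Y and Z coordinates of each marker.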


For longer gaps in the data it may be impossible to accurately estimate the marker position. That is why it is important to create as good recordings as possible in the first place.


Sometimes the recorded mocap data may be noisy, for example with small random errors in the data set. This may be caused by poor lighting conditions or a bad calibration of the system. It is still possible to reduce the noise level, by applying a smoothing filter to the data, as shown below.
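A moving average is one of the simplest smoothing filters and serves here as an illustration; the text does not specify which filter is used, and other filter types are common in practice. The window length of three samples and the example data are assumptions.

```python
def moving_average(values, window=5):
    """Replace each sample by the mean of its neighbourhood (shorter at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# A marker that stands still, with one erroneous sample in the middle.
noisy = [0.0, 0.0, 9.0, 0.0, 0.0]
print(moving_average(noisy, window=3))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

The example also shows the cost of smoothing: the error becomes smaller, but it is spread over the neighbouring samples, so a longer window removes more noise and more detail.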



Finally, after gap-filling and smoothing the data, it may be necessary to transform it in different ways. Here the research question is the most important factor when deciding which types of data processing and transformation are needed. Some popular transformations include:

  • Position data is intuitive, and is often useful directly.
  • Sometimes we need to look at how fast the position changes. The rate of change of position (the derivative of position) is what we call velocity: how fast does the position change? Velocity is closely related to kinetic energy, and may therefore be useful in certain types of analysis.
  • Similarly, we may need to look at how fast the velocity changes. This is called acceleration, and is calculated as the derivative of velocity. Acceleration is closely related to force.
  • Higher derivative levels may also be useful. It has been suggested that jerk, which is the derivative of acceleration, may convey certain motion properties related to affect and emotion.
  • We may also use the position data from markers to calculate joint angles, orientation/rotation, periodicities, and so forth.
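The derivatives in the list above can be approximated from sampled position data with finite differences. The sketch below uses central differences, which is one common choice and not necessarily what a given toolbox does. The test trajectory x(t) = t², sampled at 100 Hz, is an assumption chosen because its velocity (2t), acceleration (2) and jerk (0) are known.

```python
def derivative(values, sample_rate):
    """Approximate the rate of change of a sampled signal.

    Central differences are used in the interior and one-sided
    differences at the two end points.
    """
    dt = 1.0 / sample_rate
    n = len(values)
    result = []
    for i in range(n):
        if i == 0:
            result.append((values[1] - values[0]) / dt)
        elif i == n - 1:
            result.append((values[-1] - values[-2]) / dt)
        else:
            result.append((values[i + 1] - values[i - 1]) / (2 * dt))
    return result

sample_rate = 100  # Hz, as in the recording example
position = [(i / sample_rate) ** 2 for i in range(101)]  # assumed test trajectory

velocity = derivative(position, sample_rate)
acceleration = derivative(velocity, sample_rate)
jerk = derivative(acceleration, sample_rate)

# Values at t = 0.5 s, away from the less accurate end points.
print(velocity[50], acceleration[50], jerk[50])
```

Each differentiation amplifies noise in the data, which is one reason why smoothing is usually applied before calculating velocity, acceleration or jerk.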

There are also numerous more advanced processing and transformation techniques in use, and we suggest checking out the MoCap Toolbox to explore these further.


Week 4

This week you will learn about the concepts of pulse and entrainment in the theory track.

The terminology track will focus on the motion of perceivers, that is, people experiencing music.

Finally, in the methods track we will take a closer look at motion capture systems that can be used outside of the lab, what we call mobile mocap.

Pulse and entrainment

Do you ever nod your head or tap your feet when you hear music?

This activity usually follows a certain pulse in the music. But how do we find this pulse within a complex musical landscape of sounds? What sounds in the music are most vital? And how do we transfer this (mostly unconscious) understanding of pulse into a muscular activity that causes us to nod our head or tap our feet?

The Feeling of Pulse and the Entrainment Process

A fundamental property of music is its ability to structure time. Elements like recurring sounds and accentuations express an organization of the musical events into periods of a certain length that can be measured according to a tempo as divisions of clock time (beats per minute). In popular music this tempo is generally steady (isochronous) and expressed most clearly by the drums, with recurring bass drum and snare drum sounds usually playing a central role. In classical music, small tempo variations are more common. This is called rubato.
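The relation between tempo and clock time is simple arithmetic: a tempo given in beats per minute corresponds to a beat period of 60 divided by the tempo, in seconds. The small sketch below assumes a perfectly steady (isochronous) pulse; the function names and the tempo of 120 beats per minute are chosen for illustration.

```python
def beat_period_s(tempo_bpm):
    """Duration of one beat in seconds for a steady tempo."""
    return 60.0 / tempo_bpm

def beat_times(tempo_bpm, n_beats):
    """Clock times of the first `n_beats` beats of an isochronous pulse."""
    period = beat_period_s(tempo_bpm)
    return [i * period for i in range(n_beats)]

print(beat_period_s(120))   # 0.5
print(beat_times(120, 4))   # [0.0, 0.5, 1.0, 1.5]
```

With rubato, the actual beat times would deviate slightly from such an evenly spaced grid.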

To what extent do rhythmic groupings of music have connections to the human body? The connection is most apparent in how rhythmic groupings can be expressed through movements like head nodding (or head banging), foot or finger tapping or upper body bouncing. The ability to perform such movements in synchrony with music is widespread and cross-cultural, and for most dancers and musicians it is often essential for their ability to perform their tasks.

Moreover, it is a very human ability, not observed in species closely related to us. In Coldplay’s music video for the song “Adventure Of A Lifetime” (2015), chimpanzees move in synchrony to the music. These animals are of course fake. No animal has been successfully trained to perform anything like this (except a cockatoo called Snowball, which you can find in several YouTube videos bouncing back and forth in synchrony to the beat).

Entrainment is a term that describes a process where one rhythm adopts the period and phase of another. If you place two pendulum clocks on the same surface and start them out of phase, they will eventually end up in synchrony, depending on how well the surface transfers oscillations. The term is also used for behavioural processes where we, for example, coordinate our actions with others, and it fits well with the human ability to synchronize with the rhythm in music (see Clayton et al. 2005).
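The pendulum clock example can be imitated with a toy simulation of two coupled oscillators. The model below is a standard textbook-style phase model (of the Kuramoto type) and is our own illustration, not a model taken from the cited literature. The two rates of 2.0 and 2.1 cycles per second, the coupling strength and the starting phases are assumed values. With coupling, the phase difference settles at a constant value, meaning that both oscillators end up with the same period. Without coupling, they keep drifting apart.

```python
import math

def simulate_phase_difference(coupling, duration_s=20.0, dt=0.001):
    """Simulate two oscillators and return their phase difference over time.

    Each oscillator is pulled towards the other's phase with strength
    `coupling` (a stand-in for how well the shared surface transfers
    oscillations). Simple Euler integration is used.
    """
    rate1 = 2 * math.pi * 2.0   # assumed: 2.0 cycles per second
    rate2 = 2 * math.pi * 2.1   # assumed: 2.1 cycles per second
    phase1, phase2 = 0.0, math.pi / 2  # started out of phase
    differences = []
    for _ in range(int(round(duration_s / dt))):
        change1 = rate1 + coupling * math.sin(phase2 - phase1)
        change2 = rate2 + coupling * math.sin(phase1 - phase2)
        phase1 += change1 * dt
        phase2 += change2 * dt
        differences.append(phase2 - phase1)
    return differences

coupled = simulate_phase_difference(coupling=1.0)
uncoupled = simulate_phase_difference(coupling=0.0)

print(coupled[-1] - coupled[-1000])      # close to zero: the phase relation is locked
print(uncoupled[-1] - uncoupled[-1000])  # clearly non-zero: the clocks keep drifting
```

In this model, locking only happens when the coupling is strong enough relative to the difference between the two rates, which mirrors the remark that synchrony depends on how well the surface transfers oscillations.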

Psychologist Mari Riess Jones has done many studies of rhythm in music. She uses the term entrainment to describe how our attention (or attentional energy) may oscillate according to a musical rhythm (Jones 2004), with flexibility to variations and tempo shifts. This process supports dynamic system theories of cognition, and from this body of work she has adopted the term attractors or attractor points. When two oscillations interact, there are points in their trajectories that are especially significant for how they align (Kelso 1995).

Events in music that are vital for an entrainment process can likewise be named attractors, since they can cause a type of bodily oscillation. Her claim supports the existence of neural oscillations; processes of repeated neural activity that facilitate the performance of cyclic operations (see also Stern 2004:80).

Neural oscillations are also probably active in entrainment processes when repeated movements to musical rhythms are performed. Daniel Schneck and Dorita Berger point to the correspondences between rhythmic pulse in music and muscle activation:

Rhythmic pulsation embodies a consistent symmetrical balance of energy output, of fall and rebound of tension and relaxation. Rhythmic vibration in music involves the same steady stream of force-rest-force-rest, of systematic strong and weak impulses, of alternating flexion (contraction), release (relaxation), and extension as in the case for paired and coupled muscular behavior. (Schneck and Berger 2006:139, emphasis in original).

The musical pulse transfers a rhythmic vibration of oscillating energy that can coordinate muscular behaviour.

Via dynamic system theories of cognition where bodily actions are seen as highly significant (see Thelen and Smith 1994 and Kelso 1995), a framework for the entrainment process may be outlined: certain repeated sounds or accentuations in the music function as attractors that form an oscillating energy that coordinates (entrains) neural oscillations of muscular behaviour.

In dance music productions a steady stream of bass drum sounds is often used to make the pulse unambiguously present. These are typical attractors. The rhythmic energy pounds steadily and the listener may respond with various types of corporeal up-and-down movements.

A hi-hat sound (or a similar high frequency sound) is often added in between the bass drum sounds (on the off-beats) doubling the number of attractors, marking out an opposite point in the rhythmic energy and the trajectory of an up-and-down movement (ex: Gloria Gaynor: I Will Survive, Cher: Believe).

Moreover, a snare drum sound may replace or supplement every second bass drum sound causing certain variations to the rhythmic energy and thus to the corporeal movements.

There are not many rhythmic patterns that have a clearer connection between sounds and rhythmic pulse than the pattern described above, and it is, probably for this reason, used in numerous dance music tracks. More generally in music attractors do not occur that frequently, and they can be formed by different musical elements and be more difficult to identify.

What is considered the pulse may also be ambiguously communicated and there are many cases of more complex and compound rhythms. The rhythmic feel - often expressed through a repeating rhythmic groove - is an essentially vital element in much music. It gives the audience the opportunity to move along to the music in meaningful ways and it forms fundamental aspects of mood and feeling.

Many times an unambiguous pulse may be communicated throughout the whole song, but how it is done always varies along the way. Which instruments are present and how they participate in this action can shape a kind of corporeal journey from start to end. Of course other elements are active in shaping this journey (harmonies, melody, sound, lyrics, etc.), but my argument here is that the main pulse is more than just divisions of clock time. The feeling of pulse plays a significant role in how music can express a diversity of mood and emotional content.


Pulse and Music Culture

Although our ability to entrain to an external rhythm might be innate, the perception of an underlying reference structure in music - for example, the underlying pulse - is also highly dependent on the music culture. Our perception is determined by our previous experience, and by repeated regularities in our environment.

Music culture can be defined as what arises when multiple people share a repertoire of musical concepts and practices (Baily 1985, Blacking 1995). For example, Hannon and Trehub (2005) found culture-specific musical biases between adults from Bulgaria and Macedonia (who are exposed to music with a non-isochronous pulse) and adults from North America (who are mainly exposed to music with an isochronous pulse). This suggests that pulse perception also depends on one’s familiarity with the specific music culture.

Sometimes the underlying reference structures such as the pulse are not necessarily represented by the actual sonic events. For example, in the beginning of Stir It Up by Bob Marley, the guitar riff is played between the pulse beats. However, a perceiver familiar with reggae would immediately recognize the underlying pulse despite (or because of) the off-beat guitar riff. In other words, the underlying reference structure is sometimes indicated by style-specific recurring rhythmic patterns that do not follow the underlying pulse.

So how can the underlying pulse be identified if it is not present in the sounding music? As previously mentioned, the pulse level in music is often externalized through body motions. Agawu (2003) describes how the underlying reference structure in many West and Central African dances is indicated by typical rhythms that do not follow the underlying pulse, which is only visible in the corresponding dance (Agawu 2003:73). In a study of Brazilian drum patterns, Kubik (1990) explains that, since the percussionists’ “inner pulsation” was often not present in the sound, one had to find it in the body motion of the musicians and dancers.

In music styles like traditional Scandinavian folk dance the cue is not in a specific sonic rhythm but in a more complex implied pattern. Blom (2006) points out that the underlying pulse in certain traditional Scandinavian dance music genres consists of non-isochronous sequences, and that this underlying structure should be understood in relation to the dancers’ vertical motion of their center of gravity and the musicians’ foot stamping.

We have seen that in some music styles, the underlying pulse does not coincide with sonic events, but is derived from a rhythm pattern typical for the style of music. In other styles of music, like some Scandinavian dance music genres, the cue is not in a specific sonic rhythm but in a more complex pattern. Either way, however, we find the same mechanisms of cultural learning.


An Analysis of a Ritardando

Ritardando is a musical term (from Italian) for a deceleration in tempo. In the following, a ritardando in a song is analysed in accordance with the theory presented in this course.

The song “Take Me Out” by Franz Ferdinand (from 2004) has an electric guitar intro that actually sets a pulse with a particularly fast tempo (286 bpm). The repeated guitar tones are identical, which is a typical characteristic of musical elements that mark downbeats. However, this tempo is so fast that most bodily up-and-down movements (head nodding, foot tapping) become very tedious or difficult to perform. Thus, a bodily interpretation of what is the main pulse will more likely be half the tempo (143 bpm) even from the very start.

This interpretation is confirmed when the verse starts. A hi-hat with accentuations on every second guitar entry marks the downbeats of this slower tempo. Half way through the verse (0:20) a bass drum is added also marking the downbeats, making this slower pulse even more definite. How the pulse is interpreted from the start of the song may not be crucial to the experience, but the energy that this part expresses gives a kind of aggressive mood to the opening of the song.

The most interesting section of Take Me Out in relation to entrainment and pulse is the ritardando from 0:50 to 0:56, when the tempo slows down over two bars from the initial 143 bpm to 104 bpm.

Throughout this section a bass guitar doubles the initial energetic guitar theme while bass drum sounds clearly mark the downbeats. The ritardando is audible from 0:50, but a slight deceleration has already taken the tempo down from 143 to 140 bpm. When it truly starts, it has a smooth descent to 123 bpm during the first bar, continuing down to 104 during the last.
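The tempo figures quoted in this analysis can be turned into beat durations. The sketch below is only an illustration: it assumes four beats per bar and a tempo that falls linearly within each bar, neither of which is stated in the analysis, and the function names are hypothetical.

```python
def beat_period(bpm):
    """Seconds between successive beats at a given tempo."""
    return 60.0 / bpm


def ritardando_beat_times(tempi, beats_per_bar=4):
    """Beat onset times (s) for a tempo that changes from bar to bar.

    `tempi` lists the tempo at each bar boundary, e.g. [140, 123, 104].
    Assumption: the tempo is interpolated linearly within each bar.
    """
    times = [0.0]
    for start, end in zip(tempi, tempi[1:]):
        for beat in range(beats_per_bar):
            # Tempo at the start of this beat, interpolated within the bar.
            bpm = start + (end - start) * beat / beats_per_bar
            times.append(times[-1] + beat_period(bpm))
    return times


# The guitar pulse (286 bpm) versus the more comfortable half tempo (143 bpm).
print(round(beat_period(286), 2), round(beat_period(143), 2))

# Beat-to-beat intervals through the two bars of the ritardando.
times = ritardando_beat_times([140, 123, 104])
intervals = [b - a for a, b in zip(times, times[1:])]
print([round(x, 3) for x in intervals])
```

Each interval is a little longer than the one before it, which is the gradual lengthening a listener follows when moving along with the deceleration.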

To establish the new tempo and confirm that the ritardando now has ended, four downbeats are marked with all instruments. In Kronman and Sundberg’s (1987) study of musical ritardando they argue for connections to natural decelerations in physical human movement. Their belief is that we recognize how the ritardando develops from patterns in our human experience of slowing down the speed of bodily movements.

In “Take Me Out” the diminutive early ritardando probably prepares the listener unconsciously for what will come. Then the smooth ritardando develops according to a natural deceleration of a physical movement. Since a ritardando may signal a continuing path to a full stop, the music has to clearly confirm that the ritardando now has ended and a new tempo is established.

After the slower tempo is established through four bars of clear energetic downbeat markings, a new guitar theme is introduced. At this point the hi-hat is moved to the off-beats. These hi-hat sounds can be experienced as attractors marking an opposite position of the downbeats in a bodily up-and-down movement.

In popular music and especially dance music, an off-beat hi-hat (or similar high frequency sounds) very often corresponds to the peak position of an up-and-down movement. Both the guitar theme and the following vocals also have their highest note entry on the off-beat followed by a descending interval. Are such structural positions coincidental or do the alternation of low and high sounds and matching ascending and descending melodic lines fortify experiences of up and down when moving to a rhythmic pulse? (This will be further discussed in week 6.)


Introduction to Perceiver Movements

In the previous videos in the terminology track we have looked at different types of music-related body movement in performers. This week we will look more at the perceivers, that is, people experiencing music.

Before we get started with the video, it may be useful to recall that we use the term perceiver to focus on the multimodal approach to the experience of music. While we often talk about “listening” to music in our daily life, this usually implies a truly multimodal experience. Few people really mean that they only listen to the sound of the music, even though this is of course also possible. But as we have looked at earlier, listening to the sound is only one part of the experience. The visual element is also crucial, and other senses too, even taste. Yes, everyone who has played a wind instrument will know that music can taste of something. Ask your clarinet-playing friend!

But what types of movements can be found in perceivers? Generally, we may find all the same types of movements in perceivers as in performers, including sound-producing ones. A main difference, though, is that perceivers are usually not the main focal point in a musical context. That is the role of the performers. Still, we have found that studying the body movements of perceivers can be very interesting. After all, a person’s behaviour may reveal a lot about his or her cognitive state and experiences.

Perceiver movements

In this video we look at the movements of perceivers.

Mobile Motion Capture

Sometimes, it is relevant to capture the movement of performers or perceivers outside of a laboratory setting. This can be for research purposes - studying how performers or perceivers move in a real concert - or it can be an interactive part of the performance, tracking the movement of a performer or perceiver to influence visual displays or the musical sound. This article presents some of the technologies you will see in the next video step.

The most precise way of recording motion is in a dedicated motion capture (mocap) laboratory with optical infrared cameras and reflective markers. However, there are several reasons why we might want to use other types of motion capture systems to study music-related motion:

  • cutting edge camera-based mocap technologies are expensive
  • setting up a camera-based system in a “real” setting (such as a concert hall) is visually distracting and might be disturbing to the performer
  • lighting conditions are often less than ideal when measuring people’s body motion in a “real” situation (as opposed to in a mocap lab).

Inertial measurement units (IMUs)

Inertial sensors operate on the physical principle of inertia:

  • Accelerometers measure acceleration
  • Gyroscopes measure the amount of rotation of the sensor.
  • Magnetometers measure the orientation in relation to the earth’s magnetic field (as a compass)

Magnetometers are, strictly speaking, based on magnetic sensing, not inertial sensing, but they are often included in what are called inertial measurement units (IMUs). An IMU is inside most modern smartphones and tablets, and makes it possible to estimate orientation and acceleration.
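To give a feel for how orientation can be estimated from such sensors, here is a minimal sketch that derives tilt from a single accelerometer reading. It assumes the sensor is at rest, so that gravity is the only acceleration measured, and it uses a hypothetical axis convention (x forward, y left, z up). Real devices differ in their conventions and combine several sensors.

```python
import math


def tilt_from_accelerometer(ax, ay, az):
    """Estimate pitch and roll (degrees) from one accelerometer reading.

    Assumptions: the sensor is not moving, and the axes follow a
    hypothetical convention with x forward, y left and z up.
    """
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll


# Device lying flat: the whole reading sits on the z axis.
print(tilt_from_accelerometer(0.0, 0.0, 9.81))

# Device tipped onto its side: the whole reading sits on the y axis.
print(tilt_from_accelerometer(0.0, 9.81, 0.0))
```

Rotation around the vertical axis (the compass heading) leaves the gravity reading unchanged, so it cannot be recovered this way. That is one reason a magnetometer is useful alongside the accelerometer and gyroscope.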

The Xsens Suit

At the University of Oslo we have an Xsens suit, which consists of 17 IMUs in combination with a kinematic model of the human body. The kinematic model restricts the possible position of each of the IMUs, and thus facilitates calculating the position of each sensor based on the acceleration and rotation data provided by the sensors.

The Xsens suit makes it possible to record motion in any location. It can also be used in real time, and the motion data can be used as an interactive element of a music performance. Here is an example of the Xsens suit used in performance. In the video, the performer is controlling a number of synthesizers with his movements. The data is fed to a program called Ableton Live, where certain parts of the music are pre-composed and some parts are controlled by the performer (e.g. triggering new sections and selecting between different chords). We will look more closely at such interactive music technologies in the methods track in week 6.

Suggested literature

The mobile mocap studio

How can you capture motion out in the field? Kristian Nymoen is demonstrating a mobile motion capture system.

Motion capture technology is not only confined to advanced motion capture laboratories. Several portable solutions exist for doing motion capture. Most of these are based on inertial sensing technologies, namely accelerometers and gyroscopes. Most modern mobile phones contain such sensors. More specialised systems for mobile motion capture also exist, such as the Xsens suit which is shown in this video.

Week 5

This week we will focus on the concept of groove in the theory track, and get to know concepts like “offbeat”, “upbeat”, “downbeat”.

In the terminology track you will learn more about gestures and the term coarticulation.

In the methods track you will get an overview of how it is possible to analyse motion from ordinary video recordings. Remember to use the dictionary.

5.4 A Discussion on Groove

In this article we will explore the term groove as a noun and as a verb.

As a noun, groove generally indicates a certain part of an overall musical sound, mix, or arrangement. Allan Moore places the groove in what is “laid down by the bass and drum kit” (2001:34). This is not a definition, but it does reveal a common point of view about the most relevant instruments.

Vijay Iyer suggests that the groove

might be described (but not defined) as an isochronous pulse that is established collectively by an interlocking composite of rhythmic entities (Iyer 1998).

Though the groove’s relationship to the steady (isochronous) pulse seems rather vague in Iyer’s description, it certainly shows the link to a set of rhythmic components. But which ones?

Timothy S. Hughes writes,

A figure is not a groove unless it is designed to be repeated (2003:14).

The expectations created by repetition are vital in this respect (see Danielsen 2006, chap. 8, and Hawkins 2008 for discussions of repetition). Solitary rhythmic events might affect the groove but do not contribute to any recognizable structure for it, which is important for the groove to be established.

Another central question: Does the groove in fact have to be established collectively, or can a single instrument (or sound source) supply it? When a sole bass drum sound booms out of the speakers in a club, and the crowd starts moving, is this a groove? This question raises the issue of the aesthetic qualities often linked to the term. Referring to Charles Keil and Steven Feld’s 1994 book Music Grooves, Iyer writes,

Groove involves an emphasis on the process of music-making, rather than on syntax . . . The focus is less on coherence and the notes themselves, and more on spontaneity and how those notes are played (Iyer 1998).

Like Keil and Feld, Iyer favours for his vision of groove the interaction of a group of musicians playing live, and the inevitable “miniscule, subtle microtiming deviations from rigid regularity” that follow (ibid).

Two further questions arise here: Are grooves only to be related to live musicians? And do grooves require deviations from rigid regularity? A recording of live musicians undeniably preserves groove relations, so the first question is less concerned with the actual presence of musicians than with some sense that the music is being played “live,” either in concert or on a recording. But surely music production techniques like multitrack recording, overdubbing, quantization, editing, and the use of drum machines, sequencers, and other types of electronic music equipment are also tools for the production of grooves, at least when they are used in a “groove”-preserving manner.

This leads to the second question. The performance ideal of playing as “tight” as possible according to the studio’s “click track” arose in many pop genres during the 1970s, especially around disco music. During the 1980s, sequencers and drum machines maximized this “tightness” while creating the expectations later to surround electronic dance music and its body movement. The use of electronic equipment is especially efficient for producing the machine-precise timing that is seen as appropriate for this genre. But the very same equipment and techniques are also used in other genres with very different ideals of “tightness.” Ultimately, while deviations from rigid regularity are certainly central to many genres of groove-based music, they should not be considered a prerequisite or universal quality of a groove.

Rather, grooves and what should be considered their vital musical elements should chiefly be seen in connection to body movement. Certain basic rhythmic pulse-oriented elements of a groove may facilitate a basic movement pattern while other sounds, appearing between the downbeats and upbeats or atop them, shape this pattern or even suggest alternatives to it (various body parts can move simultaneously to different pulses). Thus all recurring sounds that take part in this process should be considered elements of the groove.


When “groove” is used as a verb, an adjective, or an adverb, it has an aesthetic connotation. In this form, several scholars have expressed similar notions regarding qualities related to grooves, in terms of both how they are reacted to and how they are produced.

The Norwegian musician and music researcher Carl Haakon Waadeland discusses the quality of “swing,” a term typically associated with jazz music, that has definite parallels to “groove”:

Swing is conceived as a quality of music performance, related to a process through which the musicians, both individually and in an interactive context of playing together, make a musical phrase – a rhythm or a melody – “come alive” by creating a performance that in varying degrees involves playing “against” a “fixed pulse” (Waadeland 2001:23; emphasis in original).

Turning to the music listener and the experience of swing, he continues:

When exposed to music that we perceive as swinging, we often want to tap our foot, clap our hands, move our body, or, perhaps, dance to the music. In this way we experience how swinging and “groovy” music initializes “energy” and generates movements in our body, thus, various body movements may be seen as a consequence of an experience of swing (loc. cit.; emphasis in original).

Waadeland then extends this type of experience to comprise Western classical music (Bach, Stravinsky, a Vienna waltz), Brazilian samba, and Norwegian folk music, where every performance must swing “in its own specific way” (ibid.: 24; emphasis in original).

These perspectives on how swing is produced and received are in line with Keil’s notions of swing and groove:

It is the little discrepancies within a jazz drummer’s beat, between bass and drums, between rhythm section and soloist, that create “swing” and invite us to participate (Keil 1987:277).

Keil also argues that participatory discrepancies are present through the use of various types of sound production equipment and effects, including “space, echo, reverb, digital delay, double-tracking” (ibid.: 282). Such effects can introduce important dimensions to a track, but for groove-based popular music, entry points at precise positions, echo- or delay-effects that strengthen exact metrical subdivisions, and the absence of any reverb might be just as important.

“Groove” or “groovy” as a verb or an adjective/adverb is used to express a specific experience with music. The nature of these experiences may not be universal, but in line with Waadeland, one generalization is probably acceptable: the music grooves if body movements are activated by its rhythmic elements. How music is produced or played in order to activate movement varies according to specific cultural traditions to such an extent that the question becomes moot. The contributions of Keil, Iyer, and Waadeland, however profound, do not embrace all groove-based music. There are common features and similarities but also significant differences among the various genres. Why some people move to a certain type of music and others do not reflects the kinds of music to which they were previously exposed. Individual body movements and movement patterns are shaped according to the style of dance music in question, and familiar genres usually work better.


5.6 An Analysis of a Dance Music Groove

Electronic dance music has become extremely popular since the 1990s. In the following a groove from the beginning of a typical dance music track is analysed according to the theory presented in this course.

The first four measures (0:06–0:14) of Basement Jaxx’s Jump n’ Shout from 1999 are an example of a groove that communicates the main pulse clearly, but with an additional pattern that makes this beginning more exciting (more groovy!). In the notational representation underneath you can see the bass drum sounds as the lowest staff. The tempo of 127 bpm (2 bass drum sounds per 0.94 seconds) is a good tempo for dancing (and for a fast walk). On the spectrogram beneath the notational representation you can see the bass drum sounds as the largest figures – repeated eight times. Notice that these figures are thinner at the start and fatter at the bottom. This shape is formed by a descending pitch movement that makes the experience of this sound as a downbeat even stronger (Zeiner-Henriksen 2010a:Chap. 8).

Together with the bass drum there is a hi-hat that can be seen at the top staff of the notational representation. This sound mostly has its entry points on the off-beat, forming an alternation between the low bass drum sounds and the high hi-hat sounds – a pattern that is very effective in setting a main pulse (see Zeiner-Henriksen 2010b:Chap. 3).
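The timing of this alternating pattern follows directly from the tempo. The sketch below places eight bass drum sounds on the beat at 127 bpm and, as a simplifying assumption, a hi-hat exactly halfway between every pair of beats (the track itself only "mostly" places the hi-hat there). The function name is hypothetical.

```python
def onset_times(bpm, n_beats, offset=0.0):
    """Onset times in seconds for n_beats evenly spaced events."""
    period = 60.0 / bpm
    return [offset + i * period for i in range(n_beats)]


BPM = 127
period = 60.0 / BPM

bass_drum = onset_times(BPM, 8)                  # on the beat
hi_hat = onset_times(BPM, 8, offset=period / 2)  # assumed exactly off-beat

print(round(2 * period, 2))  # duration of two bass drum beats: 0.94
for low, high in zip(bass_drum, hi_hat):
    print(f"bass drum {low:5.2f} s   hi-hat {high:5.2f} s")
```

Reading the two columns in time order gives the low-high-low-high alternation that marks out the down and up positions of the movement.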

  • Notational representation of Basement Jaxx’s Jump n’ Shout, 0:06–0:14.
  • Spectrogram of Basement Jaxx’s Jump n’ Shout, 0:06–0:10, cymbal pattern circled.

In the middle staff of the notational representation you can see a third rhythmic layer consisting of a cymbal pattern – its attacks are as salient as the hi-hat sounds but not as dominant as the bass drum sounds. You can also see the entry points of this pattern circled in the spectrogram. While the two others communicate the main pulse, this pattern is present to make it more exciting – more groovy. But how does it become more groovy?

If we say that the alternating bass drum and hi-hat pattern activates a steady up-and-down movement (head nodding, upper-body bouncing), then this cymbal pattern interacts with that structure. The third (and seventh) event of the cymbal pattern has the same entry point as the bass drum sound and therefore stresses those downbeats. In this notational representation there is an undulating line that represents a possible movement curve – down on the bass drum sound and up on the hi-hat, and with a slightly lower curve where the cymbal sound has the same placement as the bass drum sound.

  • Notational representation of Basement Jaxx’s Jump n’ Shout, 0:06–0:14, with suggested movement curve.

This up-and-down movement pattern is probably not changed by the cymbal pattern, but our experience of the movement might be altered. Being placed right before the bass drum and hi-hat sounds, they seem to introduce a sort of tension or friction into the groove, making small dents in the movement pattern established by the bass drum and hi-hat. In the notational representation and the spectrogram below these tension points are marked as small bumps in a possible movement curve (an up-and-down movement).

  • Notational representation of Basement Jaxx’s Jump n’ Shout, 0:06–0:14, with suggested movement curve and possible tension points circled.
  • Spectrogram of Basement Jaxx’s Jump n’ Shout, 0:06–0:14, with suggested movement curve and possible tension points circled.

An electronic dance music track often starts out with a build-up section that leads to a more complete groove where there are more interactions between rhythmic patterns. While some of these may be closely connected to and in various ways supportive of the basic beat that communicates the main pulse, other patterns may be more independent and contribute with elements that bring tension, emphasis or various forms of expectation to the groove.

  • Notational representation of Basement Jaxx’s Jump n’ Shout, 0:18–0:21, with possible tension points, emphasized beats and entries producing expectation circled.

The excerpt represented above starts two measures further into the track (measures 7 and 8; 0:18–0:21). Here the hi-hat is boosted in the mix compared to the preceding part (measures 1 to 6). A snare drum joins in with three similar but not identical sounds that interact in yet other ways with the dominant movement pattern. The cymbal pattern is somewhat simplified, probably to avoid colliding with the snare drum pattern.

The first two snare drum events seem to have an effect similar to the cymbal pattern in creating tension or friction in the groove, while the three events ending both of the periods of four beat-cycles seem to function as a pick-up note in relation to the following downbeats, bringing a sense of expectation to that part. The extra snare drum sounds that fall exactly on the upbeats do not seem to have a role beyond somewhat emphasizing this specific beat.

  • Spectrogram of Basement Jaxx’s Jump n’ Shout, 0:18–0:21, with suggested movement curve and possible tension points, emphasized beats and entries producing expectation circled.

Given the track’s tempo of 127 bpm, it may seem like a reach to identify this many influential events. But it is important to distinguish among the various roles that sounds might play in forming a groove that in turn moves the body. There are no right answers or straightforward recipes for good dance music: these roles will influence each other in quite intricate ways, and each dancer will respond differently to them as well. But in aiming to distinguish what makes a good groove, we must allow for all of the possibilities.


5.9 Musical Gestures

Gesture has been a buzzword recently, but what actually is a gesture?

There is no single, precise definition of gesture. In fact, definitions differ widely. In this article we will discuss its usage and various meanings.

The Oxford dictionary offers a “classic” gesture definition:

a movement of part of the body, especially a hand or the head, to express an idea or meaning

This definition is almost identical to those of other large dictionaries, including Merriam-Webster, Collins and Dictionary. It is interesting to note that all of these definitions focus on three elements:

  • movement of the body
  • in particular, movement of the hands or head
  • expression of an idea/meaning/feeling

The MacMillan dictionary adopts a slightly broader definition:

a movement that communicates a feeling or instruction

Here, “instruction” has been added as part of the definition, and this is also followed up with two sub-definitions:

  1. hand movement that you use to control something such as a smartphone or tablet […]
  2. the use of movement to communicate, especially in dance

Of all of the general definitions of gesture, MacMillan’s resonates best with recent trends in many research communities.

Academic Definitions

As can be expected, there are numerous definitions of gesture in the academic literature. They may be broadly divided into three main categories:

  • Communication: gestures are used to convey meaning in social interaction (linguistics, psychology)
  • Control: gestures are used to interact with a computer-based system (HCI, computer music)
  • Metaphor: gestures are used to project movement and sound (and other phenomena) to cultural topics (cognitive science, psychology, musicology)

The first type of definition most closely resembles the general understanding of the term, as well as the definition that is presented in most dictionaries.

The second type represents an extension of the first, but incorporates a shift of communicative focus from human-human to human-computer communication. Still, the main point is that of the conveyance of some kind of meaning (or information) through physical body motion. In its purest sense, such as finger control on a touchscreen, this type of human-computer communication is not especially different from that of the “gesture” used in human-human communication. Likewise, nowadays most people are accustomed to controlling their mobile devices through “pinching,” “swiping,” etc., so it seems like such “HCI gestures” have become part of everyday language, just as the MacMillan definition suggests.

The third type, on the other hand, focuses on gesture in a metaphorical sense. This is what is commonly used when people talk about the musical gesture.

Musical gesture

Musical gesture has become a popular way to describe various types of motion-like qualities in the perceived sound (such as by Godøy) or even in the musical score alone (such as by Hatten). This, obviously, is a long way from how “gesture” is used to evoke a meaning-bearing body motion in linguistics, although it may be argued that there are some motion-like qualities in what is being conveyed in the musical sound as well.

One reason many music researchers embrace the term gesture is because it allows us to make a bridge between movement and meaning. As we have seen previously, movement can be described as the physical displacement of an object in space over time. Meaning, on the other hand, denotes the mental activation of an experience.

The notion of gesture may therefore be argued to cover both movement and meaning, and therefore bypasses the Cartesian divide between matter and mind. As such, the term gesture provides a tool for crossing the traditional boundary between the physical and the mental world. Exactly such a crossing is at the core of the embodiment paradigm and it forms the strength of the current extension from disembodied music cognition to embodied music cognition.


5.11 Coarticulation in Music

In this article we look a little more at the concept of coarticulation, meaning the fusion of small-scale actions and sounds into larger units.

The term coarticulation was first coined in linguistics to explain how syllables “merge” into words, which again “merge” into sentences. Professor Rolf Inge Godøy and colleagues at the University of Oslo have used the term in a similar way to explain how musical phrases can be seen as the fusion of small-scale sound units into sounds and phrases (melodies).

Our short-term memory is important for how we perceive the world, and also music. Although it is difficult to give an absolute duration, it is common to say that the short-term memory covers a range of about 0.5 to 5 seconds, perhaps longer if there are few events.

Based on the studies of music cognition at the University of Oslo, we believe that action and sound are broken down into a series of chunks in our minds when we perceive or imagine music. These chunks are typically based on the duration of our short-term memory, that is, 0.5 to 5 seconds. Not coincidentally, this is typically also the duration of many human actions, anything from opening a door, to meaningful snippets of speech, and to many musical phrases.

From a cognitive perspective, it is commonly accepted that when we listen to music we often perceive such chunks (phrases, measures, or motives) rather than shorter or larger units.

Taking these thoughts into an embodied cognitive paradigm, we see that the formation of perceptual chunks can be multimodal in nature, and that chunking can be found also in the sound-producing actions of performers. We will not get into details here, but rather just give one example from a research study we did of piano performance.

The figure below shows the score and spectrogram of the first 8 measures of the opening of the last movement of L. v. Beethoven’s Piano Sonata No. 17, Op. 31 No. 2 in D minor, The Tempest (Example video, not the one used in the experiment). This piece was motion captured, and the figure below also shows plots of the horizontal positions (along keyboard) and absolute velocities of the left and right wrists, elbows, and shoulders of the pianist.

Piano coarticulation

The interesting point here, from a chunking and coarticulation perspective, is how the individual notes are merged into larger action chunks. This is particularly visible in the left hand, and can also be seen in the elbows and shoulders. This type of coarticulation is very common in music performance, and is to a large extent based on biomechanical constraints. You necessarily have to move your hand in a circular path to be able to play a passage like this.


5.13 Introduction to Video Analysis

This week’s methods track will focus on movement analysis using regular video cameras.

Some of the advanced motion capture technologies we have looked at in previous weeks are certainly the best in terms of accuracy, precision and speed. But video analysis is by far the most accessible and easiest way to get started with analysing human motion on a computer.

In the following video we will take a look at how it is possible to use regular video cameras for movement analysis, and we will look at some different visualisation techniques, including motion images and motiongrams.

5.15 Visualising Video

This article presents a set of video-based visualisation techniques developed for the analysis of music-related body motion.

Here in Music Moves we present a number of different types of motion capture technologies. Many of these are excellent and work well for their purposes. Still, video recording is most likely the most accessible “motion capture” technology for most people. Video cameras are nowadays easily available everywhere, so anyone can get started right away.

It may be odd to think that it is necessary to create visualisations of a video recording. After all, video is visual to start with. However, watching a running video is not a very efficient way of analysing large sets of video recordings.

Motion images

One of the most common techniques when one works with motion analysis from video files is to start by creating what we call a motion image. The motion image is found by calculating the absolute pixel difference between subsequent frames in a video file, as illustrated in the figure below. The end result is an image in which only the pixels that have changed between the frames are displayed.

Motion image

The quality of the raw motion image depends on the quality of the original video stream. Small changes in lighting, camera motion, compression artefacts, and so on can influence the final image. Such visual interference can be eliminated using a simple low-pass filter to remove pixels below a certain threshold, or a more advanced “noise reduction” filter, as illustrated below. Either tool cleans up the image, leaving only the most salient parts of the activity in the motion.

Filtered motion image

The video of the filtered motion image is usually the starting point for further processing and analysis of the video material.
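
The two steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in any particular analysis tool: frames are assumed to be small greyscale images stored as nested lists, and the function names `motion_image` and `threshold_filter`, the example frames, and the threshold value are all hypothetical.

```python
def motion_image(previous_frame, current_frame):
    """Absolute pixel difference between two greyscale frames of equal size."""
    return [
        [abs(cur - prev) for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous_frame, current_frame)
    ]


def threshold_filter(image, threshold):
    """Set every pixel below the threshold to zero and keep the rest unchanged."""
    return [[pixel if pixel >= threshold else 0 for pixel in row] for row in image]


# Two hypothetical 3x4 greyscale frames (0 = black, 255 = white).
frame_a = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
]
frame_b = [
    [12, 10, 10, 10],   # small change, for example lighting noise
    [10, 10, 200, 10],  # large change: the bright object moved one pixel right
    [10, 10, 10, 9],    # small change, for example compression noise
]

raw = motion_image(frame_a, frame_b)             # contains noise and motion
filtered = threshold_filter(raw, threshold=20)   # only the salient motion remains
```

In practice one would typically use an array or video library for speed; the nested lists only keep the sketch free of dependencies.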

Motion-history images

A motion image represents the motion that takes place between two frames but does not represent a motion sequence that takes place over more frames. To visualise the motion itself over time, then, it is necessary to create a motion-history image, that is, a display that keeps track of the history of what has happened over the course of some number of recent frames. One approach is to simply average over the frames of an entire recording. Averaging the original frames produces what could be called an average image, while averaging the motion images produces a motion-average image, such as shown below.

Average image

These images may or may not be interesting to look at, depending on the duration of the recording and the content of the motion. The examples above are made from a short recording that includes only one short passage and a raising of the right hand. The lift is very clearly represented in the motion-average image, whereas the average image mainly indicates that the main part of the body itself stayed more or less in the same place throughout the recording.

For longer recordings, in which there is more activity in larger parts of the image, the average images tend to be more “blurred”, which is in itself an indication of how the motion is distributed in space.
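
The difference between the two kinds of average can be sketched as follows. The code is only an illustration under simple assumptions: frames are tiny greyscale images stored as nested lists, and the function names and the example recording are hypothetical.

```python
def average_image(images):
    """Pixel-wise mean over a list of equally sized greyscale images."""
    count = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(image[y][x] for image in images) / count for x in range(width)]
        for y in range(height)
    ]


def motion_images(frames):
    """Absolute difference between each frame and the frame before it."""
    return [
        [[abs(c - p) for p, c in zip(prev_row, cur_row)]
         for prev_row, cur_row in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]


# Hypothetical recording: a bright pixel (value 90) moves along the top row
# while a static "body" pixel (value 60) stays in the bottom-left corner.
frames = [
    [[90, 0, 0], [60, 0, 0]],
    [[0, 90, 0], [60, 0, 0]],
    [[0, 0, 90], [60, 0, 0]],
]

avg = average_image(frames)                        # where things were, on average
motion_avg = average_image(motion_images(frames))  # where change happened, on average
```

Here the static pixel shows up in `avg` but is absent from `motion_avg`, which only contains the trace of the moving pixel.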

To clarify the motion-history image, it is possible to combine the average image and the motion-average image, or possibly incorporate one frame (for example, the last frame) into the motion-average image. The latter alternative makes it possible to combine a clear image of the person in the frame with traces of the motion-history, as illustrated below.

motion-history image
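
One possible way of incorporating a single frame into a motion-average image is to add the two images pixel by pixel. The sketch below assumes this additive overlay, which is only one of several reasonable choices; the function name `overlay`, the `motion_weight` parameter, and the example values are hypothetical.

```python
def overlay(frame, motion_average, motion_weight=1.0):
    """Add a scaled motion-average image on top of a frame, clipped to 0-255."""
    return [
        [min(255, round(pixel + motion_weight * motion))
         for pixel, motion in zip(frame_row, motion_row)]
        for frame_row, motion_row in zip(frame, motion_average)
    ]


# Hypothetical last frame of a recording and its motion-average image.
last_frame = [[0, 0, 90], [60, 0, 0]]
motion_average = [[45.0, 90.0, 45.0], [0.0, 0.0, 0.0]]

combined = overlay(last_frame, motion_average)
```

The result keeps the person as seen in the last frame and adds the motion trace around them.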

Motion-history images may be useful for studying, for example, performance technique. The figure below shows a visualisation of a percussion study. Here, each image represents an individual stroke on the drum pad, and the image series serves as a compact and efficient visualisation of a total of fourteen different strokes by the percussionist.


Each of the displays in the figure above represents around fifteen seconds of video material. As such, this figure is a very compact representation of a full recording session.


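Motiongrams

The motion-history images above reveal information about the spatial aspects of a motion sequence, but there is no information about the temporal unfolding of the motion. Here a motiongram may be useful, since it displays motion over time. A motiongram is created by averaging over each motion image along one of its dimensions, so that every image is reduced to a strip one pixel wide or one pixel high, and then placing these strips next to each other in time order, as illustrated in the figure below.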

Schematic motiongram

This figure shows a schematic overview of the creation of motiongrams, based on a short recording of a piano performance. The horizontal motiongram clearly reveals the lifting of the hands, as well as some swaying in the upper part of the body. The vertical motiongram reveals the motion of the hands along the keyboard, here seen from the front, as in the previous figures.
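
The averaging behind a motiongram can be sketched in Python as follows. This is a simplified illustration, not the code used to produce the figures: motion images are assumed to be small nested lists, and all function names and example values are hypothetical. Following the description above, the horizontal motiongram has time running from left to right and shows vertical motion, while the vertical motiongram has time running from top to bottom and shows horizontal motion.

```python
def row_means(image):
    """Collapse an image to one value per row."""
    return [sum(row) / len(row) for row in image]


def column_means(image):
    """Collapse an image to one value per column."""
    height = len(image)
    return [sum(column) / height for column in zip(*image)]


def horizontal_motiongram(motion_images):
    """Each column is one motion image reduced to its row means."""
    strips = [row_means(image) for image in motion_images]
    # Transpose so that rows follow the image rows and columns follow time.
    return [list(values) for values in zip(*strips)]


def vertical_motiongram(motion_images):
    """Each row is one motion image reduced to its column means."""
    return [column_means(image) for image in motion_images]


# Hypothetical motion images of something moving upwards over three time steps.
motion_sequence = [
    [[0, 0, 0], [0, 0, 0], [30, 30, 0]],  # motion in the bottom row
    [[0, 0, 0], [0, 60, 0], [0, 0, 0]],   # motion in the middle row
    [[0, 90, 0], [0, 0, 0], [0, 0, 0]],   # motion in the top row
]

horizontal = horizontal_motiongram(motion_sequence)
vertical = vertical_motiongram(motion_sequence)
```

In `horizontal`, the non-zero values move from the bottom row to the top row as time runs from left to right, which is how a lifting motion shows up in a horizontal motiongram.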

One example of how motiongrams can be used to study dance performance can be seen below. This display shows motion-average images and motiongrams of dance improvisation by three different dancers who are moving to the same musical material (approx. forty seconds). A spectrogram of the musical sound is displayed below the motiongrams.


The motiongrams reveal spatiotemporal information that cannot be conveyed using keyframe images, and they make it easier for the researcher to follow the trajectories of the hands and heads of the dancers throughout the sequences.

For example, the first dancer used quite similar motions for the three repeated excerpts in the sequence: a large, slow upwards motion in the arms, followed by a bounce. The third dancer, on the other hand, had more varied motions and covered the whole vertical plane with the arms. Such structural differences and similarities can be identified in the motiongrams, and then studied in more detail in the original video files.

From Music Research to Clinical Practice

We can make a little detour at the end of this article. As researchers working on basic issues, we are often asked about the “usefulness” of what we do. It is often difficult to answer this question, because our research is not meant to be useful in the first place. But sometimes seemingly “useless” developments can have an impact elsewhere.

The visualisation techniques mentioned above have actually turned out to be very useful in medical research and clinical practice. A group of researchers in Trondheim, Norway, found that the motiongram technique was an excellent way of detecting so-called fidgety motion in infants. This is important when it comes to screening pre-term infants who are at risk of developing cerebral palsy, as shown in this image with a healthy infant (top) and an infant with cerebral palsy (bottom).



Week 6

6.1 What we will cover in Week 6

In this final week, the theory track covers bodily metaphors in our experience of music. You will also learn how these affect our experience of emotional content in music. In the methods track we will take a closer look at biosensors. There is no terminology track this week. Remember to use the dictionary.

6.3 Music, Verticality and Bodily Metaphor

In this article we will look more closely at how bodily metaphors shape our experience of music.

As we discussed in Studying music perception in the motion capture laboratory, most people have a strong sensation of physical direction when listening to short sounds that “move up” or “down”. This phenomenon comes about despite the fact that sound waves do not actually move up or down in physical space. So we are talking about a cognitive phenomenon. The American musicologist Arnie Cox writes about this phenomenon in music:

Verticality is not inherent in music (let alone in its notational representation); it is not there to be observed (heard) in the music, but it is instead a product of logical, metaphoric conceptualization (1999:50, emphasis in original).

Or as Björn Vickhoff adds:

Although there are no obvious directions of melody movement, most listeners feel directions in music. When the melody is moving ‘upwards’ or ‘downwards’ you get a feeling of spatial direction (2008:52).

Such processes of conceptualization have been addressed by cognitive semantics. In Philosophy in the Flesh from 1999, linguist George Lakoff and philosopher Mark Johnson employ the concept of “primary metaphors” (as opposed to “complex metaphors”) to illustrate the basic connection that exists between abstract and literal expressions. Primary metaphors are metaphors that have been incorporated into our world-view so thoroughly that we no longer see them as metaphors. They are based on correlations between expressions and embodied experiences and are, according to Lakoff and Johnson, fundamental to all thinking regarding subjective experience and judgement:

We do not have a choice as to whether to acquire and use primary metaphor. Just by functioning normally in the world, we automatically and unconsciously acquire and use a vast number of such metaphors. Those metaphors are realized in our brains physically and are mostly beyond our control. They are a consequence of the nature of our brains, our bodies, and the world we inhabit (Lakoff & Johnson 1999:59, emphasis in original).

With reference to Christopher Johnson’s “theory of conflation” (Johnson 1999), Lakoff and Johnson then describe how primary metaphors are formed:

For young children, subjective (nonsensorimotor) experiences and judgments, on the one hand, and sensorimotor experiences, on the other, are so regularly conflated—undifferentiated in experience—that for a time children do not distinguish between the two when they occur together (Lakoff & Johnson 1999:46).

Lakoff and Johnson use the example of the subjective experience of affection and the sensory experience of warmth through being held (loc.cit.). Even when children eventually develop the ability to differentiate between them, they will preserve associations from one domain (the “source domain”) to the other (the “target domain”). Thus “affection” and “warmth” will be connected, and in relation to affective meaning, “warmth” may be used where no actual (literal) high temperature is present. Similarly, metaphors are linked to movements: when we use “falling” metaphorically in the phrase “falling asleep,” the downward movement is projected upon the transition from consciousness to unconsciousness. Yet we have not “fallen” anywhere.

Verticality underpins our understanding of music as well, though the adverbs “up” and “down” and the adjectives “high” and “low” imply nonexistent spatial orientations there. According to Lakoff and Johnson such parallels

arise from the fact that we have bodies of the sort we have and that they function as they do in our physical environment (Lakoff & Johnson 1980:14).

Motor schemas and image schemas are parts of the conceptual structure we form through sensorimotor experience and visual perception. Bob Snyder describes schemas as

memory structures created by generalizations made across seemingly similar situations in the environment (2000:102).

These affect perception and shape actions. In the same way that we use image schemas as points of departure for producing images when we are told stories, we use motor schemas to form motor commands when we experience music. A motor schema related to tempo in music will support a correspondence between fast rhythms and rapid body movements; a motor schema related to verticality in music will encourage vertical movements in response to pitch. This latter motor schema has been shaped through our encounter with sources of verticality in music. Arnie Cox (1999:18f) refers to ten such sources that possess both literal and metaphoric features:

  1. verticality in staff notation,
  2. verticality in vocal experience,
  3. the propagation of sound waves,
  4. “higher” and “lower” frequencies,
  5. the “higher” and “lower” perceived loudness levels of high and low notes,
  6. the “higher” and “lower” amounts of air used for high and low notes,
  7. the “higher” and “lower” magnitudes of effort needed for high and low notes,
  8. the “higher” and “lower” degrees of tension in producing high and low notes,
  9. the association of “high” levels of emotional intensity and pitch at climaxes, and
  10. the metaphoric state-locations of tones in pitch space.

Of these ten sources of verticality, the first three are based on literal vertical relations, while the last seven are based on metaphoric verticality.

Some of these sources are based on the experience of singing or playing certain instruments and are blended with other metaphoric associations of “high” and “low,” especially greater or lesser quantities/magnitudes (“more = up, less = down”) (Lakoff & Johnson 1980:15). Others are mainly bodily experienced and do not have to trigger any explicit knowledge before helping us to form motor schemas. In a culture where music is written (as notation) and actively learned, verticality in music likely arises from a mixture of rational and bodily knowledge. Still, there are many music lovers who do not know notation but nevertheless experience verticality in music. Much classical music from the Romantic period (Chopin, Grieg) reaches the emotional climax of a piece through ascending melodies, accelerandos (increasing tempo) and crescendos. Similarly, producers of dance music or groove-oriented popular music constantly confront the notion of “high” and “low”. A build-up in a typical house track very often moves gradually from “low” sounds to “high” sounds to reach a climax in the track (see Solberg 2014).

The sound systems in clubs are usually organized with separate subwoofers and tweeters that are arranged vertically, so that the “low” sounds come from the speaker beneath the one that produces “high” sounds. This vertical placement has little specific impact upon low frequencies, but high frequencies are generally more directional, so tweeters are often placed at ear height (see Rossing et al. 2002:chap. 24). The loud volume level in clubs also intensifies how sounds resonate in our body. Low-frequency sound waves have a greater bodily impact than high-frequency waves, and they are felt most noticeably in boneless body regions like the abdomen, which is obviously below our ears (and eyes), thereby contributing to the physical realization of a “low” frequency.

Club-oriented dance music very often uses a basic pattern where a bass drum and a hi-hat alternate. This alternation of “low” and “high” is not as obviously “vertical” as a continuous pitch movement either up or down, but in relation to a vertical movement pattern, the structural parallel is pivotal. The bass drum sound evokes the “low” position of verticality, while the hi-hat sound evokes the “high” position. The musical sounds may then be a transducer of verticality-information, from music to spatial orientation – from alternating “low” and “high” sounds transduced to analogous up-and-down movements.


6.8 Introduction to Biosensors

So far in Music Moves we have only looked at motion sensors here in the methods track. This week we will explore how it is possible to measure biosignals, that is, activity within the body itself.

Sensors measuring biosignals are often also called physiological sensors. Most of these sensors share the same sensing principle, that of measuring electrical current in various parts of the body. But since the biosignals vary considerably in strength throughout the body, the sensors are optimised differently:

  • Galvanic skin response (GSR) refers to changes in skin conductance, and is often used on the fingers or in the palm of the hand. The GSR signal is highly correlated with emotional changes, and such sensors have been used to some extent in music research (see next video) as well as in music performance. A challenge with such signals is that they may not be entirely straightforward to interpret, and elements like sweat may become an issue when the sensors are worn for longer periods of time.
  • Electromyograms (EMG) are used to measure muscle activity, and are particularly effective on the arms of musicians to pick up information about hand and finger motion. A challenge with EMG is to place the sensor(s) in a way such that they pick up the muscle activity properly. Later in this activity you will see an example of how EMG sensors can be used for musical interaction.
  • Electrocardiograms (EKG) measure the electrical pulses from the heart, and can be used to extract information about heart rate and heart rate variability. The latter has been shown to correlate with emotional state and has also been used in music research.
  • Electroencephalograms (EEG) are used to measure electrical pulses from the brain, using either a few sensors placed on the forehead, or hats with numerous sensors included. Due to the weak brain signals, such sensors need to have strong amplifiers and are therefore also susceptible to a lot of interference and noise. Nevertheless, such sensors have also been applied in both music analysis and performance.
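
As an illustration of how heart rate and heart rate variability can be derived from an EKG recording, the sketch below starts from the times at which heartbeats were detected. It assumes that beat detection has already been done, and it uses RMSSD (the root mean square of successive differences between beat intervals), which is one common time-domain measure of heart rate variability among several. The function names and the beat times are hypothetical.

```python
from math import sqrt


def beat_intervals(beat_times):
    """Intervals in seconds between successive heartbeats."""
    return [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]


def mean_heart_rate(beat_times):
    """Average heart rate in beats per minute."""
    intervals = beat_intervals(beat_times)
    return 60.0 / (sum(intervals) / len(intervals))


def rmssd(beat_times):
    """Root mean square of successive interval differences, in seconds."""
    intervals = beat_intervals(beat_times)
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical beat times in seconds: intervals alternate between 0.8 and 0.9 s.
beat_times = [0.0, 0.8, 1.7, 2.5, 3.4]

heart_rate = mean_heart_rate(beat_times)  # roughly 70.6 beats per minute
variability = rmssd(beat_times)           # roughly 0.1 seconds
```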

EEG is in many ways the first “step” towards doing brain imaging, which has also become more popular in music research over the last years. We will not cover this in Music Moves, but interested learners may find some useful links in the references below.


6.11 New Interfaces for Musical Expression

As we learned in the previous video, the term NIME refers to “new interfaces for musical expression”. The term is derived from the conference and community with the same name (NIME).

What practitioners in the NIME community have in common is an interest in designing, building, performing with and evaluating new musical instruments and other types of musical interfaces.

Musical Instruments

It is difficult to come up with a clear definition of a musical instrument, particularly if one considers all sorts of musical instruments. One common denominator, however, is that an instrument functions as a mediator between action and sound.

Illustration: Three boxes where the action box is pointing to the instrument box, and the instrument box is pointing to the sound box

As such, any pair of objects and actions that are used in music could be considered a musical instrument.

Digital Musical Instruments

Many new interfaces for musical expression are digital musical instruments (DMIs), involving the use of digital technology in the mediation between actions and sound. These can be seen as a subset of electronic musical instruments, a category that also includes analogue electronic instruments such as the famous Moog synthesizers.

A model capturing three essential parts of the term digital musical instrument is shown in the figure below. The controller is the physical device that the user interacts with. A controller in itself will not produce any sound, however, so the sound engine is where sound generation takes place. This may be a physical hardware unit or a piece of software. The mapping is a set of rules for how the sound engine should respond when the user interacts with the controller.
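
The division into controller, mapping and sound engine can be sketched in code. The example below is a toy illustration under stated assumptions, not a description of any existing instrument: the controller is represented by a single slider value between 0 and 1, the mapping turns that value into a frequency, and the sound engine generates a short sine tone. All names and parameter values are hypothetical.

```python
import math


def mapping(slider_position, low_hz=220.0, octaves=2.0):
    """Rule linking controller and sound engine: slider 0..1 to frequency in Hz."""
    return low_hz * 2.0 ** (octaves * slider_position)


def sound_engine(frequency_hz, duration_s=0.01, sample_rate=44100):
    """Generate the samples of a sine tone with the given frequency."""
    sample_total = round(duration_s * sample_rate)
    return [
        math.sin(2.0 * math.pi * frequency_hz * i / sample_rate)
        for i in range(sample_total)
    ]


# Stands in for a reading from a physical controller, which makes no sound itself.
controller_value = 0.5

frequency = mapping(controller_value)  # 440 Hz, one octave above the lowest note
samples = sound_engine(frequency)      # 441 samples, i.e. 10 ms of sound
```

Because the three parts are separate, the same controller value could be routed to a different sound engine, or the mapping could be replaced without touching the other two parts.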


There exist numerous digital musical instruments. Many commercial digital musical instruments are based on a piano-like interface, with a controller, mapping, and sound engine built into the same physical device. Although piano-based instruments are hardly considered new interfaces for musical expression any longer, they are nevertheless digital musical instruments.

Image of a Nord Stage-piano <sup>Nord Stage (Photo by Ben Garney, CC BY 2.0 license)</sup>

The Hands by Michel Waisvisz is an example of a “classic” digital musical instrument not inspired by traditional musical instruments. The controller uses various sensors to capture hand, arm and finger movements. The sound engine is also separated from the controller, which makes it possible to create many different types of mappings between movement and sound. See The Hands in action in this YouTube video.

The Hands <sup>The Hands (Photo by Luka Ivanovic, CC BY-SA 2.0 license)</sup>

NIME as a research field

One of the main areas of interest to NIME researchers is the development of new modes of interaction:

  • How can musical sound be controlled in intuitive and interesting manners?
  • How does the way we control musical sound shape our perception of the sound?
  • How can new technologies be used to enhance our musical experiences?

At the University of Oslo we have been interested in exploring different ways of creating NIMEs based on our research on embodied music cognition. Here we have used the results from cognitive experiments when designing mappings between movement and sound. Similarly, we have explored how NIMEs may be used in experiments. One such example is that of the “instrument” MuMYO.


The MuMYO instrument used in the previous video uses the commercially available MYO armband as the sensor. The MYO consists of eight electromyography (EMG) sensors which measure muscle tension, in addition to an inertial measurement unit measuring motion and rotation.

Using a machine learning algorithm to extract useful features from the muscle sensing data, the device is able to detect hand postures such as waving in or out, making a fist, or spreading the fingers. The possibility to capture muscle tension data is interesting when it comes to developing new musical instruments, as the effort put into tightening the muscles may be accessed more directly than when only measuring motion.
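
To give an idea of how muscle effort might be read from raw muscle sensing data, the sketch below computes the root-mean-square (RMS) amplitude of an EMG signal in consecutive windows. This is a generic signal-processing step and an assumption made for illustration; it is not the algorithm used by the MYO armband or by MuMYO. The function name and the sample values are hypothetical.

```python
from math import sqrt


def rms_envelope(samples, window):
    """RMS amplitude of consecutive, non-overlapping windows of a signal."""
    return [
        sqrt(sum(s * s for s in samples[start:start + window]) / window)
        for start in range(0, len(samples) - window + 1, window)
    ]


# Hypothetical EMG samples from one sensor: a relaxed muscle, then a tense one.
emg = [1, -1, 1, -1, 4, -4, 4, -4]

envelope = rms_envelope(emg, window=4)
```

The envelope rises when the muscle is tightened, even if the arm itself does not move, which is the kind of information a pure motion sensor cannot provide.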


Appendix: Music Moves Dictionary


Accelerometer

Motion sensor that measures acceleration.


Action

An action can be defined as a motion sequence with a beginning and an end. Actions are often (but not always) goal-directed.

Aesthetic quality

A quality related to taste and judgement – of what we like/dislike.

See Wikipedia entry:


Affordance

Action possibilities of something that is perceived, for example an object (a chair offers the possibility to sit) or musical sound (offers the possibility to dance). Affordances are always relative to the perceiver.

Alexander technique

A bodily method to promote well-being by focusing on one’s awareness and posture to ensure minimum effort and strain.


Attractor

An attractor in a dynamic system is a position, a point in time, a numeric value, etc. that attracts the trajectory of an oscillation. If the entrainment process in music is considered a dynamic system, rhythmic sounds or accentuations can be named attractors that shape an oscillatory movement of muscle behaviour (in for example head nodding or foot tapping).

Bottom up process

In auditory perception, bottom-up refers to a data-driven process starting with the stimulus itself, and moving from vibrations in the air to the ear and finally to the brain. (see also top down process)

BPM (Beats per minute)

Measurement unit for musical tempo. A typical dance tempo is 120 BPM, which means that there are 2 beats per second.
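
The conversion between tempo and time can be written out as a small sketch; the function names are hypothetical.

```python
def beats_per_second(bpm):
    """Number of beats in one second at the given tempo."""
    return bpm / 60.0


def beat_period_seconds(bpm):
    """Time between two successive beats, in seconds."""
    return 60.0 / bpm


rate = beats_per_second(120)       # 2 beats per second
period = beat_period_seconds(120)  # half a second between beats
```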


A grouping of two or more sequential elements into a larger “chunk”. This can be how a fast sequence of sound-producing actions in music may be perceived as one coherent segment, or how phonemes in speech are coarticulated into words or phrases. We may talk about coarticulated actions of a performer, but also to explain how sequential elements are experienced.


Cognition

Conscious mental activities, such as thinking, understanding, learning, and remembering.

See Wikipedia entry:

Communicative movements

Movements intended for communication, for example how musicians give signs to each other during a performance, or communicate directly to the audience through body language or gestures.

Controlled environment

A setting used in research to control variables, for instance a lab.


Dalcroze

Emile Jaques-Dalcroze developed a method for teaching the interconnections between music, movement, dance and theatre. One of the most well-known parts of the Dalcroze system is called eurhythmics.

Demographical data

Information about a person such as age, gender, cultural background, etc. Such information is often collected in experiments to have background information about participants.

Descriptive analysis

The analysis of a phenomenon like movement by describing how it is performed, for example the kinematics of the body (velocity or acceleration of body parts) or spatial features (size and position in the room). Descriptive analysis may often be a useful “objective” starting point for further functional or aesthetic analysis.

Descriptors (movement)

Movement descriptors are numerical summaries of movement. This can for instance be the movement velocity or the overall movement energy.

Descriptors (sound)

Sound descriptors are numerical summaries of sound. This can for instance be the energy of a sound recording or the pulse clarity in a piece of music.

Dynamic envelope

The dynamic envelope of a sound describes how the dynamics (mainly the sound energy) unfolds over time. The dynamic envelope of a sound can typically be categorised into one of three main types: (1) impulsive, (2) sustained, (3) iterative.

Ecological setting

In research experiments, there is often a trade-off between controllability (being in control of all variables) and ecological validity where something is studied in its natural context. For example, studying a musician in a lab (controllable), or on stage during a concert (ecologically valid).


Effort

A subcategory of the Laban Movement Analysis (LMA) system. Contains four dimensions: space, time, weight, flow.

Embodied cognition

In the field of embodied cognition, cognitive processes are explained as being inseparable from the body. The perceiver is not only a passive receiver of information that is being processed and understood in the brain alone. Rather, the perceiver interacts with the environment. See also Internet Encyclopedia of Philosophy entry:

Emotion (in music)

Any experience of intense mental activity and high degree of pleasure/displeasure caused by the interaction with music. See also the Wikipedia entry on Music and emotion:

Emotional attunement

Changing emotional state towards an emotion expressed by some external factor, for instance, cheering up when listening to upbeat music.


Entrainment

The synchronization of two (or more) independent processes. For example, moving one’s foot to the pulse of the music. Here the tapping of the foot entrains to the beats in the musical sound.


Equilibrium

When a system is (or is perceived as) in balance.


Excitation

The moment of energy transfer from sound-producing action to sounding object. For instance, the moment when a guitar string is released or the time span in which a clarinet player blows into the clarinet.

Functional analysis

The analysis of a phenomenon through its functional properties, such as whether an action is sound-producing, sound-modifying, etc. This is as opposed to descriptive analysis.

Fundamental frequency

Most harmonic tones are built up from an array of partials. These include the fundamental frequency, and overtones which are whole number multiples of the fundamental frequency.
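
As a small worked example, the partials of a harmonic tone can be listed by multiplying the fundamental frequency by whole numbers. The function name and the chosen fundamental are hypothetical.

```python
def partials(fundamental_hz, count):
    """The first `count` partials: the fundamental followed by its overtones."""
    return [fundamental_hz * n for n in range(1, count + 1)]


# A tone with a fundamental frequency of 110 Hz.
first_four = partials(110.0, 4)  # fundamental plus three overtones
```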


Gesture

A movement of part of the body, often a hand or the head, to express an idea or meaning.

Global descriptors

Descriptor that describes an entire sound or entire action (as opposed to a time-varying descriptor that varies throughout the sound or action)


Groove

Groove is a musical element that relates to the rhythmic “feel”. It is commonly used as a description for a repeated pattern and how it is played (or produced).

See also Wikipedia entry on Groove:


Gyroscope

Motion sensor that measures rotation.

Image schemata

A cognitive structure establishing patterns for understanding and reasoning. First coined in linguistics, but also used in musicology to explain our coherent experience of musical sound and related body motion.

See also Wikipedia entry:

Impulsive action

A type of sound-producing action where a short, abrupt transfer of energy occurs between the sound-producing action and a sounding object. For instance, crashing two cymbals together.

Iterative action

A type of sound-producing action where a rapid series of impulsive actions fuse together into a continuous stream. For instance dragging the fingers across all the strings of a harp.


Kinematics

The branch of physics that describes the motion of objects in space over time. Kinematics does not take into account the force and energy needed for the motion.


Kinesphere

The imaginary volume that one can reach around one’s body. The kinesphere is where our current actions can be carried out.

Laban Movement Analysis (LMA)

The movement analysis system developed by Rudolf Laban.


Labanotation

The movement notation system developed by Rudolf Laban. The notation is based on writing symbols of the different body parts along a vertical axis.

McGurk effect

A phenomenon that demonstrates how two different sensory stimuli (visual and aural) can combine into the perception of a third, different sound.

See also Wikipedia:


Metaphor

In everyday language “metaphor” is mostly used to describe an obvious figure of speech (like “the world is one’s oyster”). George Lakoff and Mark Johnson’s work on basic metaphors demonstrates how metaphors are present in languages in a more all-embracing manner. We wake “up”, we “fall” asleep, etc. They argue that these metaphors relate to our physical (bodily) and social experiences of the world.

Mirror neurons

Neurons in the motor centre of the brain that activate when we perceive an action performed by others. Mirror neurons have also been found to fire when only listening to an action, hence being important also for music cognition.


Modality

The term modality can be understood as a channel of sensory information. It includes, but is broader than, the five senses: seeing, hearing, tactility, taste, and smell.


Motion

The displacement of an object in space over time. Often used interchangeably with movement, although motion is often the preferred term in physics and scientific research.

Motion capture

A technique for measuring and storing human body motion through various types of technological systems.

Motion image

An image created by calculating the difference between the current video frame and the previous video frame. Often used as the basis for other types of computer-based video analysis.


Motiongram

A technique for visualising motion over time, based on motion images from ordinary video files.

Music-related body movement

A diverse category of movement, ranging from purely instrumental, such as hitting a piano key, to purely communicative, such as gesticulating in the air. Music-related movement may occur in any type of location, for example a concert hall, at home, in the street, or in a club setting.


MYO

A commercially available sensor armband from Thalmic Labs. The MYO has eight sensors that capture the muscle tension of the lower arm in addition to a gyroscope and accelerometer for measuring the rotation and motion of the device.

NIME (New Interfaces for Musical Expression)

A community of researchers and an annual conference focusing on the development of new technologies for musical expression.

Optical marker-based motion capture system

State-of-the-art technology for measuring movement using cameras and reflective markers.


The repetitive variation of a phenomenon over time (e.g. sound or movement).


Partials

Most harmonic tones are built up from an array of partials. These include the fundamental frequency, and overtones which are whole-number multiples of the fundamental frequency.


Perceive

To experience, be aware of, realize or understand something. The term is closely related to, but broader than, sensing.


Perception

See perceive.


Pitch

The perceived “height” of a tone. Closely connected to (but not necessarily the same as) the fundamental frequency of a tone.


Prefix

When describing sound-producing actions, the prefix refers to the part that occurs before excitation, such as the preparation of a drumstroke and the path of the drumstick towards the drum.

Premotor cortex

See Wikipedia entry:


See Wikipedia entry:

Qualitative analysis

Analysis based on verbal or categorical information and using interpretation and reasoning as methods. Qualitative analysis is often used in the humanities and parts of the social sciences.

Quantitative analysis

Analysis based on numerical data using mathematical and statistical methods. Quantitative analysis is often used in the natural sciences, but increasingly also in other fields.

Quantity of motion (QoM)

Measure used to describe the overall amount of movement in a recording. Quantity of motion may be calculated in different ways, for example by averaging the velocities of all recorded markers in a full-body motion capture recording.
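
The averaging approach mentioned above can be sketched as follows. This is only one possible formulation: the function name `quantity_of_motion`, the data layout (`positions[frame][marker]` as `(x, y, z)` tuples in metres) and the example values are assumptions made for illustration.

```python
import math


def quantity_of_motion(positions, sampling_frequency):
    """Return one value per frame transition: the mean speed of all markers.

    positions[frame][marker] is an (x, y, z) tuple in metres (assumed layout).
    sampling_frequency is the number of frames recorded per second.
    """
    dt = 1.0 / sampling_frequency  # time between two consecutive frames
    qom = []
    for prev, cur in zip(positions, positions[1:]):
        # Speed of each marker: distance travelled divided by elapsed time.
        speeds = [math.dist(p, c) / dt for p, c in zip(prev, cur)]
        qom.append(sum(speeds) / len(speeds))
    return qom


# Three frames at 100 Hz: marker A moves 1 cm per frame, marker B stands still.
positions = [
    [(0.00, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.01, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.02, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
qom = quantity_of_motion(positions, sampling_frequency=100)
print(qom)
```

In this example marker A moves at about 1 m/s and marker B at 0 m/s, so the quantity of motion is about 0.5 m/s for each frame transition.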

Sampling frequency

The sampling frequency is the number of times per second a phenomenon is measured and stored as a number. For audio, a typical sampling frequency is 44.1 kHz, which means that 44 100 measurements are done per second. For movement data, sampling frequencies are typically much lower, between 50 and 1000 Hz (50 to 1000 measurements per second).

Skin conductance sensor

Sensor that measures the electrical conductivity of the skin. Used in lie detectors and also useful as a tool for sensing changes in human emotions.


See spectrogram

Sound-accompanying action

Actions that follow some features in the musical sound, but that are not involved in sound production. Example: playing “air guitar” or “conducting” to music.

Sound feature/sound descriptor

See descriptor (sound)

Sound-modifying action

An action that modifies the sound, such as changing the pitch with the left hand on a string instrument or the mute position on a brass instrument.

Sound-producing action

An action that creates sound through excitation, such as hitting, stroking, and blowing.


Spectrogram

A visual display that shows the frequency content of the sound over time. Often used in sound analysis.

Spectral centroid

The “centre of gravity” of a sound spectrum. In other words, the average of the frequencies in the spectrum, weighted by their amplitudes.
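
The “centre of gravity” idea can be sketched as an amplitude-weighted mean of the frequency components. The function name `spectral_centroid` and the example spectrum are hypothetical; the sketch assumes the spectrum is already available as two lists of equal length.

```python
def spectral_centroid(frequencies, amplitudes):
    """Return the amplitude-weighted mean frequency of a spectrum in Hz.

    frequencies and amplitudes are equally long lists (assumed input format).
    A silent spectrum (all amplitudes zero) yields 0.0 by convention here.
    """
    total = sum(amplitudes)
    if total == 0:
        return 0.0
    weighted = sum(f * a for f, a in zip(frequencies, amplitudes))
    return weighted / total


# Three components; the strongest one is at 300 Hz.
centroid = spectral_centroid([100, 200, 300], [1, 1, 2])
print(centroid)  # 225.0
```

Because the 300 Hz component is the strongest, the centroid (225 Hz) is pulled above the plain middle of the three frequencies (200 Hz).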

Spectral flux

A measure of the rate of change in the spectrum of a signal.
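
Spectral flux can be calculated in several ways. The sketch below uses one common formulation, the Euclidean distance between the amplitude spectra of consecutive time frames. The function name `spectral_flux` and the toy spectra are assumptions made for this example.

```python
import math


def spectral_flux(spectra):
    """Return the change between each pair of consecutive amplitude spectra.

    spectra is a list of frames, each a list of amplitudes with the same
    number of frequency bins (assumed input format). The change is measured
    as the Euclidean distance between neighbouring frames.
    """
    return [
        math.sqrt(sum((cur - prev) ** 2 for prev, cur in zip(p, c)))
        for p, c in zip(spectra, spectra[1:])
    ]


# The spectrum is unchanged at first, then the energy jumps to another bin.
spectra = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
flux = spectral_flux(spectra)
print(flux)
```

The first value is zero because the first two frames are identical, while the second value is larger because the energy moves from one frequency bin to another.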


Spectrum

The amplitudes of the frequency components of the sound.


Stimulus

A term often used in experimental psychology, in which the stimulus is the controlled input used in the study. In some articles describing experiments on music-related movement, the term stimulus may refer to the musical sound being used, while the dance movement is the response.


Suffix

When describing sound-producing actions, the suffix refers to the part that occurs after the excitation, such as the rebound of a drumstick.

Sustained actions

A type of sound-producing action where there is a continuous transfer of energy between a sound-producing action and the sounding object. For instance, using a violin bow on a string, or blowing into a flute.


Tempo

Musical tempo is the speed or pace of the music, often measured in beats per minute (BPM).

Time-varying descriptors

Descriptors that are calculated from short, sequential time frames, and that therefore form a sequence of numbers.
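
As a small illustration, the sketch below splits a signal into short frames and calculates one number per frame. The root-mean-square (RMS) level is used here only as an example descriptor; the function name `frame_rms`, the frame length and the hop length are assumptions made for this example.

```python
import math


def frame_rms(signal, frame_length, hop_length):
    """Return the RMS level of each frame of the signal as a list.

    The signal is cut into frames of frame_length samples, starting every
    hop_length samples. Incomplete frames at the end are ignored.
    """
    values = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    return values


# Four samples of silence followed by four samples of a loud signal.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
rms_sequence = frame_rms(signal, frame_length=4, hop_length=4)
print(rms_sequence)  # [0.0, 1.0]
```

Each frame contributes one value, so the descriptor becomes a sequence that follows how the sound changes over time.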

Top-down process

In perception, top-down refers to hypothesising about stimuli based on previous experiences and memories. See also bottom-up process.

Verbal descriptions

Verbal descriptions of sound include all words or labels that we use to describe a sound or an entire piece of music.

Vocal apparatus

The organs in the chest, neck, and head that are involved in the production of speech and other vocal sounds.


Waveform

A waveform representation of a sound signal shows how the amplitude varies over time.

Music Moves: Credits

A lot of people have been involved in the making of Music Moves. We are very grateful that so many had the time and energy to make this course happen.

Academic Leads

  • Alexander Refsum Jensenius: Head of Department - Department of Musicology, University of Oslo
  • Hans T. Zeiner-Henriksen: Associate Professor - Department of Musicology, University of Oslo
  • Kristian Nymoen: Associate Professor - Department of Musicology and Informatics, University of Oslo
  • Mari Romarheim Haugen: Research fellow - Department of Musicology, University of Oslo

Musicology students on Campus

Thanks to the students in the course MUS2006 - Music and body movements who have taken the MOOC in parallel with their campus course and done weekly summaries for the educators.

Academics Interviewed

  • Anne Danielsen: Professor - Department of Musicology, University of Oslo
  • Hallgjerd Aksnes: Professor - Department of Musicology, University of Oslo
  • Rolf Inge Godøy: Professor - Department of Musicology, University of Oslo


  • Anne Eline Riisnæs: Assistant Professor - Department of Musicology, University of Oslo
  • Eckhard Baur: Lecturer - Department of Musicology, University of Oslo
  • Kai Arne Hansen: PhD Candidate - Department of Musicology, University of Oslo
  • Tami Gadir: Postdoctoral Fellow - Department of Musicology, University of Oslo

Participants / Extras in Videos

  • Minho Song
  • Christina Kobb
  • Mari Romarheim Haugen
  • Victoria Johnson
  • Kari Anne Vadstensvik Bjerkestrand

MOOC Production Team

  • Svein Harald Kleivane: USIT, University of Oslo
  • Jesper Havrevold: USIT, University of Oslo
  • Mikkel Kornberg Skjeflo: USIT, University of Oslo

Video Production Team

  • Audun Bjerknes: USIT, University of Oslo
  • Joakim Magnus Taraldsen: USIT, University of Oslo
  • Tore Bredeli Jørgensen: USIT, University of Oslo
  • David Buverud: Department of Musicology, University of Oslo

Dancers / Extras

  • Nadia Skånseng
  • Ingeborg Widerøe
  • Eirik Vildgren
  • Rebekka Ingibjartsdottir
  • Helene Benedikte Granum
  • Erik Lefsaker
  • Ivan Valentin
  • Tabita Berglund
  • Henriette Berg
  • Ingebjørg Vilhelmsen
  • Yngvild Flatøy
  • Jeanette Martinsen
  • Peter Lohyna
  • Åse Ava Fredheim
  • Ellen Kristine Wangberg

Extracts from NIME performances

  • Michel Waisvisz
  • Carles López
  • SSS - Cecile Babiole, Laurent Dailleau and Atau Tanaka
  • Adachi Tomomi
  • Anthony Hornof
  • Miya Masaoka
  • Yoichi Nagashima
  • Brenda Hutchinson
  • Ryan Janzen and Steve Mann
  • L2Ork (Linux Laptop Orchestra)
  • Tom Mays
  • Dan Overholt and Lars Graugaard
  • D. Andrew Stewart
  • Mark Applebaum
  • University of Michigan Mobile Ensemble - Robert Alexander and Anton Pugh
  • Video editing: Natalianne Boucher and Atau Tanaka, Goldsmiths, University of London

Musicians depicted in videos

The music in some video clips is not the music of the artists depicted. These artists are:

  • Haydn-trio: Bjarne Magnus Jensen (violin), Vivian Sunnarvik (cello), Stefan Ibsen Zlatanos (piano) at Georg Sverdrups Hus, Blindern.
  • Skydive Trio: Mats Eilertsen (bass), Thomas Dahl (guitar), Olavi Louhivuori (drums) at Helviktangen, Nesodden.
  • The Lionheads: Rudi Leo Johansen, Øistein Christoffersen, Knut Magnus Nordhaug, Nicolai Herwell, at Fredrikkeplassen, Blindern.


  • Artwork depicted in the Music Moves trailer: ”Ligninger i rustfritt stål” by Bård Breivik.
  • Original music used throughout the course: ”Chew Wag” by A-Cow and Tami Gadir.
  • Drum tracks used in illustrations played by Dennis Sandoo.
  • And of course, special thanks to all the great people at FutureLearn!