Descriptions of time in music

Once upon a time I seriously thought about attempting to code my own audio editor. An editor for serious editing of fringe stuff like contemporary electronic, techno/mono and abstract drum and bass. While thinking about the current ways of making music with computers (and especially the low-cost ones, as a student will), I found a lot of room for improvement. Some of the most serious problems with current audio-capable music software have to do with the handling of musical time—even the most high-end applications are at some level tied to common music notation and the limitations the Western musical tradition places on the concept of rhythm. I’ll try to relay some of my concerns here, and I hope this text suffices to arouse some suspicion over equating rhythm and time with the way common music notation (CMN) handles them.

Common music notation and time

The aim of this article is to demonstrate some of the common limitations in and biases of the treatment of time in Western music. The latter is here defined as music which is accurately transcribed into common music notation (i.e. the staffs, bars, notes and time signatures abstraction we equate with written music). This is a very reasonable starting point, since most of the music heard within Western civilization today either has been written in CMN, is easily transcribed into it, or otherwise has been deeply influenced by its preferred methods of temporal structuring. It is also the lingua franca of both music education and professional musicianship throughout most of Europe and North America—the current strongholds of the global music industry. It descends naturally from the first systematic transcribed forms of music, developed in the Middle Ages, and embodies in a convenient, familiar form what is commonly thought of as essential to the description of Western music.

The first significant thing to note about CMN is the existence of notes. Our music revolves around the concept of a tone/note, which from the standpoint of temporal characterisation is a discrete event with a start and an end. Melody, which arguably occupies a central role in musical development, is usually thought of as being formed from a series of separate, identifiable notes. More generally, our concepts of musical time and rhythm are surprisingly tightly bound to events which are point-like in time: tones starting and stopping, with essential constancy of musical characteristics in between. Hence a very elementary and noteworthy observation: most Western music is discrete in time. To us, rhythm without distinctly identifiable events is almost unfathomable—most people would not admit that there is any rhythm in the sound of rain, for instance.

For the most part, events are related to the amplitude of sounds—variations in apparent loudness signal the start and end of an event. Because of basic physics, a sudden increase in volume signals the onset of a note quite accurately, whereas the gradual decay of resonating bodies blurs the offset somewhat. This translates into a kind of dominance of onsets: if a musical outcome is desired, the correct offset of a note is significantly less important than a well timed onset. Similarly, the pitch and timbral characteristics of sound have a much diminished role in determining the start and end of musical events, and can assume significant continuous detail before they affect the perceived rhythm of a given musical passage. CMN reflects these principles accurately: the time and manner of the offset of notes is given considerably less attention than the onset, and the few means of transitioning from one note to another without significant intervening amplitude variation (say, legato) are regarded primarily as stylistic measures.

Further defining factors in musical time are metric structure and cyclicity. This means the onsets of notes are not spaced randomly in time but rather are concentrated on specified, regularly repeating boundaries. Furthermore, these boundaries usually recur at approximately uniform intervals in time. This gives rise to what musicians commonly call the meter, a regular tick which signals the repeating structure of the temporal grid underlying a given composition. Again, we tend to classify any variation from the meter either as an error on the part of the player or as a stylistic measure (as in the case of swing). On the other hand, purely cyclic structures in the events themselves tend to be underappreciated even if the temporal grid on top of which they live is highly regular—it takes a seasoned techno freak to appreciate several minutes’ worth of monotonous, droning ’303. This tells us that Western music, just like Western culture in general, perceives time as being linear. We might even go as far as to view the constantly recurring basic tick as a reference against which our limited timekeeping ability can measure linear time, rather than elevating the tick to the status of an independent, recurrent event in time.

In CMN, cyclicity is captured by bars, choruses and parts, while the basic tick comes in the form of BPM, time signature and the rather limited selection of note durations, all of which are based on subdivisions of the bar length. This way the metric structure shows up even on larger time scales: analysis of rhythm traditionally centers on grouping notes first into a few basic rhythmic motifs, then into bars, and higher in a hierarchical manner. The theory breaks down neatly if no such higher structure is present. Again, each level preferably holds only a small number of units—it is highly unusual to encounter rhythms which have something like 289/16 as their basic cycle. Western music theory is geared towards symmetrical structures, which are most easily constructed if the length of the basic rhythmic units is highly composite. The most natural system seems to be a cascade of units whose length is two, three or four (e.g. 4/4 time, then four bars in a motif, an A–B–A structure of these to form a part, and so on). Further, it is typical that the same highly symmetrical structure repeats throughout a piece. Shifts between time signatures are quite rare. This makes the rhythm very predictable: once we have heard the first two or three bars, we may be able to tell what happens over the course of the piece from there on. This is in stark contrast with very exotic time signatures such as the 289/16 mentioned above. In such a signature, basic rhythmic structures become so long-winded that prediction is almost impossible. We may also have a strong urge to shift to a different signature from time to time to breathe some life into the rhythm, something which perhaps isn’t as necessary with 4/4 and similar times because normal variation from bar to bar can provide enough interest.
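The cascaded subdivision described above is simple enough to sketch in code. The `subdivide` helper below is purely illustrative (it is not taken from any real notation library); it shows how a small number of units per level expands into a full grid of onset times:

```python
def subdivide(length, counts):
    """Recursively split `length` by the subdivision counts in `counts`,
    returning the onset times of the lowest-level units."""
    if not counts:
        return [0.0]
    n, rest = counts[0], counts[1:]
    unit = length / n
    # Each of the n units is itself subdivided by the remaining counts.
    return [i * unit + t for i in range(n) for t in subdivide(unit, rest)]

# A 16-beat phrase: four bars of four beats each.
onsets = subdivide(16.0, [4, 4])   # [0.0, 1.0, 2.0, ..., 15.0]
```

Deeper cascades (beats into 16ths, bars into motifs, motifs into parts) are just longer `counts` lists, which is exactly the hierarchical grouping CMN favours.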

Our music is also characterized by how we regard variation from the basic rhythmic template (say, 4/4 with a backbeat)—for the most part any variation is seen either as an effect, a stylistic measure or simply an anomaly. This holds especially for the metric tick, so that, for instance, unequal lengths of 16th notes are regarded variably as a successful stylistic trick or as the sign of a bad player. Excellent examples of the successful use of a deformed time grid are given by swing, repicana (in samba) and waltz rhythms, and by the omnipresent layback. Each has its own peculiar way of stretching and shifting the basic time grid underlying the rhythm, and of accenting the beat living above it. Nevertheless, Western music theory cannot integrate such variation into the music itself, but instead puts the correct playback into the hands of the musician, who is simply expected to know. CMN certainly cannot capture rhythmic features as fine-grained as repicana.

One final point to note about Western rhythm is that it is almost invariably single-layered: only one basic tick is in effect at a time. Furthermore, if rhythmic series are overlaid, their natural time signatures are either the same or multiples of each other, so that no ragged edges remain. (In techno a counterexample is provided by a 4/4 basic rhythm with a 3/4 drone rolling over it. The edge of a single bar seems to be continuously on the move with respect to the 4/4 beat.) Truly polyrhythmic structures are exceedingly rare, even in more contemporary and experimental music, with such simplistic structures as triplets taking their place. Of course one can argue that any polyrhythmic structure can be flattened into one with only a single fast tick and a complex patterning. But that would incur a severe loss of structure—precisely what we are out to avoid here.

Some goals

When we talk about changing an editing paradigm or a data representation, there are many things to consider besides purely theoretical ones. All the representational power and conceptual elegance in the world cannot compensate for an unintuitive, cluttered framework. Nevertheless, the current discussion started out purely as a critique of CMN’s limitations. This means there is some accommodating to be done.

To make a data representation usable, it should be simple and easy to understand. More specifically, simple and ordinary things should require little thought, and difficult things should be possible with only a bit more effort. In the context of rhythm, such garden-variety constructs as four-to-the-bar beats and pure repetition should come naturally, and preferably the actual editing should employ constructs known to the expected user—in this case a computer musician well versed in MIDI and current audio-enabled composition software. This implies CMN, piano rolls, event lists or something akin to them. For myself, the most natural domain is the one employed by trackers: a downward-scrolling regular list of timeslots with multiple parallel channels. Of course, this is not something you would employ as-is in a commercial product. But the basic idea of laying down a more or less regular grid over time on which to place actual sonic events is a usable one. Given that such a grid can be arbitrarily complex and that multiple grids can in principle be used simultaneously, we seem to have a starting point for a complete framework.

From the current, practical viewpoint, the leap to actual editing software and storage formats is quite short. It is therefore essential that any representation we propose also has enough power to handle the cut-and-paste type editing operations typical of today’s music software. I think this is practically synonymous with the need for semantic encoding of time. In other words, to be editable, a music representation needs to preserve any meaningful time structure present in the music instead of just telling what happens when. To give a concrete example from techno, editing a repetitive drone would be much easier if the representation handled the drone as a repetition of a given basic round instead of viewing it as a long sequence of unrelated events. Similarly, elements of timing most would view as stylistic (like swing) should be reflected in the representation instead of being buried in listings of event times unreadable to humans, as in MIDI event-list editors. This also implies that some traditional idioms of Western music, such as strictly metric time and time signatures, need to be preserved in some form. Needless to say, time signatures are neatly captured by the notions of hierarchical time and repetition. More on that later on.
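To make the drone example concrete, here is a minimal sketch of semantic repetition; the `Round` class is an illustrative stand-in, not a proposed format. The point is that a single edit to the basic round propagates to every repetition, which a flat event list cannot offer:

```python
class Round:
    """One basic round of a repetitive figure, plus a repeat count."""
    def __init__(self, pattern, times, period):
        self.pattern = pattern   # (onset, note) pairs within one round
        self.times = times       # number of repetitions
        self.period = period     # round length in ticks

    def flatten(self):
        """Expand into the flat event list a sequencer would play."""
        return [(rep * self.period + t, note)
                for rep in range(self.times)
                for t, note in self.pattern]

drone = Round([(0, "C2"), (2, "C2"), (3, "G1")], times=4, period=4)
drone.pattern[2] = (3, "A#1")   # edit the single round...
events = drone.flatten()        # ...and all four repetitions change
```

An event-list representation of the same drone would require the same edit in four places, and the fact that the four bars are "the same thing" would be lost entirely.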

But there is still the issue of music whose foundation is not based on strictly repetitive time. (Though after you take to electronica…) Changes in tempo, alternating time signatures, breaks in free time and held tones are but a few conventional examples. Classical music tends to use time for expressive purposes, blurring the boundary between repetitive beats and free time even further. These examples imply that even though the base rhythmic structure of a given composition may well be metric, highly regular and even repetitive, such structure is rarely completely mechanical. Some flexibility is needed while at the same time the metric structure is preserved: we need to be able to stretch the time grid underlying a piece of music without destroying the semantic structure encoded over it. At a smaller scale (say, within the confines of a single beat) similar needs arise from repetitive rhythmics which nonetheless have unequal spacing between successive ticks. Good examples are given by Latin rhythms (e.g. samba’s repicana/virada), African/Afro-American rhythmic idioms (like swing and the various slight rhythmic subtleties which constitute layback) and, in Western classical music, the waltz. All of these entail distortion of the basic repetitive tick, so that different parts of a bar of music, while transcribed in CMN as being of equal length, end up unequal.
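The unequal-but-repetitive tick spacing described here (swing and the like) can be sketched as a cycling weight table laid over an otherwise regular grid; the nominal positions stay equal, as CMN would write them, while the weights determine the realized durations. The 2:1 swing ratio and the function name are assumptions for illustration:

```python
def warp_grid(n_ticks, weights):
    """Realized onset times for `n_ticks` equal nominal ticks, with each
    tick's duration scaled by the cycling weight pattern. Weights are
    normalized so each cycle keeps its nominal total length."""
    cycle = sum(weights)
    times, t = [], 0.0
    for i in range(n_ticks):
        times.append(t)
        t += weights[i % len(weights)] * len(weights) / cycle
    return times

straight = warp_grid(4, [1, 1])   # [0.0, 1.0, 2.0, 3.0]
swung    = warp_grid(4, [2, 1])   # [0.0, 1.33..., 2.0, 3.33...]
```

Because the distortion lives in the grid rather than in the events, the same pattern of notes can be played straight or swung without editing a single note, which is exactly the separation CMN lacks.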

The inequality of tick lengths described above is significant over long periods of time, constituting an essential part of the rhythmic foundation of a piece. A slightly different variety of small-scale variation consists of stylistic anomalies and glitches, which revolve around the basic tick but are not part of it. Examples include simple errors (which are sometimes desirable) and slight delays used as a means to emphasize certain notes. Here we are talking about time features which do not belong in the basic grid structure of the music but nevertheless need to be expressible. We might call them semimetric. If we stretch the notion far enough, we end up with music with little metric structure, which we have already referred to as music in free time. This, too, needs to be expressible. Free time is, however, characterized not so much by the irregular relation of notes to the basic tick as by the lack of a tick altogether. Free time is the domain of description for completely random time phenomena and also for the many continuous aspects of sound. By continuous we mostly mean modulatory control (in MIDI, continuous control) and those parts of an electronic composition which do not consist of synthesized, discrete notes (e.g. free-flowing sampled sound and time-smearing effects like reverb).

To get the most out of a representation like the one we’re building here, it must be possible to recombine the basic parts described above into more comprehensive and complex units. In rhythm, I can think of three basic ways: splicing together episodes of different kinds, layering different rhythmic structures on top of each other, and hierarchical grouping of the elements to facilitate processing, repetition and editing. Splicing is a process very familiar to anyone who has composed or edited music—Western music theory recognizes parts and movements (in pop, bridges, choruses, licks and so on), while audio editing is mostly about cutting and pasting sound bites one after the other. Layering multiple, qualitatively different rhythmic elements leads to polyrhythmic time—pretty much unknown territory to pop people and considered quite exotic even in classical circles. Grouping and hierarchy, in turn, are more about the semantics of rhythm and lead to ease of editing. Hierarchical construction of more complex rhythmic movements carries the same benefits as grouping separate notes into bars and parts does in CMN, only at a larger scale.
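The three combination modes can be sketched as plain functions over (onset, label) event lists; all names here are illustrative, not a proposed API:

```python
def splice(a, b, offset):
    """Append episode b after a, shifting b by `offset` in time."""
    return a + [(t + offset, x) for t, x in b]

def layer(a, b):
    """Overlay two simultaneous rhythmic lines."""
    return sorted(a + b)

def group(events, name):
    """Wrap events into a named unit for later reuse and editing."""
    return {"name": name, "events": events}

kick  = [(0, "kick"), (2, "kick")]
hat   = [(1, "hat"), (3, "hat")]
bar   = layer(kick, hat)              # one bar, two layered lines
two   = splice(bar, bar, offset=4)    # the bar repeated back to back
verse = group(two, "verse")           # a named, reusable unit
```

Real grids would of course combine whole grid structures rather than bare event lists, but the same three operators (concatenate in time, overlay in time, wrap into a named node) are all that is needed.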

Finally I’d like to point out something that we may not wish to do. Often when extensions to the traditional domain of music notation are proposed, some undesirable parts of contemporary music theory rear their ugly heads. The things that tend to get us in trouble are serialist theory, aleatoric constructs and the algorithmic generation of music. These are tightly interconnected and have had a huge impact on Western music theory over the last couple of decades. The problem in our case comes from the fact that instead of focusing on a concrete instance of a composition (be it a completely specified electronic composition or a loose description based on common music notation), they deal with music at a more general and often generative level. Algorithmic composition gives rules to generate songs, aleatoric music is loosely patterned after a given set of probabilities, and serialist works center on creation within a very strict limiting framework. When we discuss representations of music, someone usually puts forward the idea that these generative processes, constraints and rules need to be encoded as well. The problem is, there is no declarative language general enough to do this with—we are left with a procedural description which becomes unreadable and messy. Basically, incorporating procedural data into music makes it necessary to resort to programming to express the music. Programs, in turn, cannot be easily edited, understood or visualised. On a more personal note, I tend to view such complex descriptions as being irrelevant to the description of a work. In my mind, what an aleatoric composition actually gives is a whole class of works, and so reflects some fundamental sloppiness on behalf of the composer. After all, it is the composer’s job to pick a sensible, intelligent combination from the immense space provided by all possible sound. If the composer cannot carry this selection to its logical conclusion, what is the worth of his creation? Even at the risk of some contention, I think the ideal for a composer is to fix his work to as large a degree as possible, which preferably means nailing it down to a single, repeatable instance of digital sound. Otherwise the composer risks misrepresentation and discards the accuracy needed to convey some of the more intricate parts of the musical experience.

More on grids

Now that grids have been mentioned so many times at a conceptual level, it would be interesting to see whether the concept can be materialized into a working editing framework. I believe it can, and that, tastefully implemented, grids might be as easy to use as any current rhythmic-editing paradigm. At the same time, a grid-based sound/music editor would offer the composer unprecedented flexibility in utilizing some of the more esoteric rhythmic structures.

Polyrhythm gives an excellent example. Traditionally, polyrhythmic music is quite rare and is usually notated on the same staffs as simpler, single-meter works. With grids, we can overlay multiple independent measures of time and achieve a lossless encoding of polyrhythm.
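As a sketch of what lossless polyrhythm coding could look like, two independent grids can share the same span of real time while keeping their own tick counts, so a 4-against-3 figure never has to be flattened onto a common fast tick. The function and labels below are hypothetical:

```python
def grid(span, ticks):
    """Onset times of `ticks` equal subdivisions of `span` seconds."""
    return [span * i / ticks for i in range(ticks)]

span  = 2.0                      # one shared bar, in seconds
four  = grid(span, 4)            # [0.0, 0.5, 1.0, 1.5]
three = grid(span, 3)            # [0.0, 0.666..., 1.333...]

# Each line stays attached to its own grid; only at playback are the
# two merged into a single stream of timed events.
playback = sorted([(t, "4-grid") for t in four] +
                  [(t, "3-grid") for t in three])
```

The crucial point is that the merge happens at playback only: in the representation, each event still knows which grid it belongs to, so the polyrhythmic structure survives editing intact.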

To incorporate music with a changing time signature, we need to be able to splice together independent grids in time. The same mechanisms, applied at the level of events, can then be used to incorporate repetition, reuse of parts of the sequencing data and irregular time structure (as in breaks). This is also what facilitates looping (repetition) and higher structure in a composition, and what makes the emulation of traditional sequencing platforms possible. Similarly, we need to be able to aggregate grids and their overlying sonic events vertically, to group together related, time-overlapping lines of events.

Given a grid, it should be possible to make slight, one-time modifications to the times dictated by the grid points, so that expressive gestures can be encoded efficiently without breaking the underlying rhythmic structure. At the same time, a possibility must exist to encode some events in completely free time. This means that in addition to different types of grids, we must have a blank slate for events with no underlying rhythm. And once we have this kind of typing and one-shot modification data in place, we soon see that it is natural to attach some further parameters to the grids (or aggregated grid complexes) themselves—this way it is possible to capture some of the more common variation patterns encountered in rhythm. A good example would be a delayed grid.
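One possible sketch of one-shot deviations layered over a parameterized (here, delayed) grid, with all names illustrative: the nominal grid stays untouched, while a whole-grid delay and per-slot offsets shape the realized times:

```python
def realize(grid_times, delay=0.0, deviations=None):
    """Map nominal grid slots to realized times: a whole-grid delay
    (a grid parameter) plus optional per-slot one-shot offsets."""
    deviations = deviations or {}
    return [t + delay + deviations.get(i, 0.0)
            for i, t in enumerate(grid_times)]

nominal = [0.0, 1.0, 2.0, 3.0]
# Push the whole grid back 30 ms, and lay the third event back a
# little more for emphasis; the nominal grid itself is untouched.
played = realize(nominal, delay=0.03, deviations={2: 0.02})
```

Since the deviations are stored separately from the grid, deleting them recovers the strict metric rendition at any time, which is precisely what "not breaking the underlying rhythmic structure" asks for.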

Now, since grids are nothing but collections of point-like instants of time which have been assigned names, we see that any continuous-time phenomenon can be incorporated into the grid framework as well. Put another way, a grid acts as a kind of ruler we put on top of continuous time: it makes it possible to draw metric structures easily and accurately but does not really dictate that we should do so. This means that we can splice such inherently continuous objects as continuous control or prerecorded audio together with the grids, so that the grids determine where and when the continuous phenomena occur but otherwise are independent of them. It is easy to see that we get the best of both worlds: the completely free time of MIDI and the completely fixed time employed by trackers.
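The ruler idea can be sketched as follows: a continuous object (a sample, a control sweep) is anchored to a grid point for its start, while its internal timing stays free. File names, parameters and the `anchor` helper are invented for illustration:

```python
def anchor(grid_times, slot, obj):
    """Attach a continuous object to a grid slot: the grid fixes only
    where it starts, nothing about its internal timing."""
    return {"start": grid_times[slot], "object": obj}

beats = [0.0, 0.5, 1.0, 1.5]                 # a 4/4 ruler, in seconds
pad   = anchor(beats, 2, "pad_sample.wav")   # audio begins on beat 3
sweep = anchor(beats, 0, ("filter_cutoff",   # a free-running control
                          [(0.0, 200), (1.7, 8000)]))

# Moving the grid moves the anchors; the objects stay untouched.
shifted   = [t + 0.25 for t in beats]
pad_later = anchor(shifted, 2, "pad_sample.wav")
```

Note that the control sweep above runs for 1.7 seconds regardless of the beat positions: the ruler locates it but does not quantize it, which is the promised combination of tracker-style fixed time and MIDI-style free time.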