Saturday, April 18, 2015

Reverse Engineering MIDI

I am very keen to expand the number of Scandi tunes that are saved to my tradtunedb site but I am finding that not enough people are posting tunes - largely because they are put off by the seeming complexity of the abc notation. One of my friends told me they'd find it a lot simpler if they could just play the tune on a MIDI keyboard and somehow get this automatically converted to abc. This got me thinking...

The Haskell School of Music

And then, by chance, I stumbled upon the Haskell School of Music (HSoM). This is a very comprehensive Haskell tutorial, chock-full of exercises, in which all the examples are taken from the field of music. It's the brainchild of Paul Hudak, who is both one of the original designers of Haskell and a keen musician. The book is a successor to his previous Haskell School of Expression, but to my mind it is a great improvement, partly because the treatment of the language is both clearer and deeper and partly because the exercises benefit from the common theme. Although HSoM is still very much a work in progress, it is remarkably comprehensive. It is split into two principal sections - the first develops a domain-specific language for representing pieces of music and the second explores the generation, composition and analysis of musical signals which would allow you, for example, to design your own electronic instrument. All this is achieved by gradually introducing Euterpea, a computer music library developed in Haskell which supports the programming of computer music at both the note level and the signal level.

Euterpea

Euterpea stems from Haskore, a previous library also developed by Paul, and is maintained on GitHub. It has at its core the Music algebraic data type:
    
data Music a  = 
       Prim (Primitive a)               --  primitive value 
    |  Music a :+: Music a              --  sequential composition
    |  Music a :=: Music a              --  parallel composition
    |  Modify Control (Music a)         --  modifier
  deriving (Show, Eq, Ord)
where Control is represented like this:
 
data Control =
          Tempo       Rational           --  scale the tempo
       |  Transpose   AbsPitch           --  transposition
       |  Instrument  InstrumentName     --  instrument label
       |  Phrase      [PhraseAttribute]  --  phrase attributes
       |  Player      PlayerName         --  player label
       |  KeySig      PitchClass Mode    --  key signature and mode
  deriving (Show, Eq, Ord)
The Control type allows you to insert a variety of modifying instructions - usually at the phrase level (for example you can transpose a tune, pick an instrument or indicate dynamic markings) but otherwise Music is extremely straightforward. Primitives represent the notes (or rests) themselves and you can compose phrases together either serially or in parallel. This is simple but powerful: if you compose individual notes in parallel, you get a chord; if you compose whole phrases of notes in parallel, you can define different melodic lines, perhaps played on different MIDI instruments.
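To make the difference between the two composition operators concrete, here is a minimal sketch. It does not use Euterpea itself: the types below are cut-down stand-ins (no Modify constructor, pitches as plain strings), and `note`, `cMajor`, `arpeggio` and `dur` are names invented for illustration.

```haskell
-- Simplified stand-ins for Euterpea's types, so the sketch runs on its own.
type Dur = Rational

data Primitive a = Note Dur a | Rest Dur
  deriving (Show, Eq)

infixr 5 :+:, :=:

data Music a
  = Prim (Primitive a)
  | Music a :+: Music a   -- sequential composition
  | Music a :=: Music a   -- parallel composition
  deriving (Show, Eq)

-- Hypothetical helper: a note of a given duration and pitch name.
note :: Dur -> a -> Music a
note d p = Prim (Note d p)

-- Three notes sounding together: a chord lasting a quarter note.
cMajor :: Music String
cMajor = note (1/4) "C" :=: note (1/4) "E" :=: note (1/4) "G"

-- The same three notes played one after another: a melodic line.
arpeggio :: Music String
arpeggio = note (1/4) "C" :+: note (1/4) "E" :+: note (1/4) "G"

-- Total duration: sequential parts add up, parallel parts take the longest.
dur :: Music a -> Dur
dur (Prim (Note d _)) = d
dur (Prim (Rest d))   = d
dur (m1 :+: m2)       = dur m1 + dur m2
dur (m1 :=: m2)       = max (dur m1) (dur m2)
```

The chord lasts a quarter note while the arpeggio lasts three, which is all that distinguishes the two values structurally.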

What is particularly useful is that Euterpea comes with functions to convert between MIDI and this Music data type. Music is a good deal more attractive to work with than raw MIDI - all you really get from MIDI is an instruction to turn a note on in a particular manner and then later to turn it off again. Euterpea manages the conversion by prefacing each note in the tune with a rest whose length is identical to the offset of the note in the tune and then composing all these two-item phrases in parallel. It thus becomes relatively easy, when trying to produce scores, to identify the notes that start each bar, although no bar indications are present in the Music data type itself.
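The shape of that converted value can be sketched as follows. This is not Euterpea's own conversion code: the types are simplified stand-ins, and `Event`, `fromEvents`, `toEvents` and `barOf` are hypothetical names used only to show how onsets fall out of the rest-plus-note structure.

```haskell
type Dur = Rational

data Primitive a = Note Dur a | Rest Dur
  deriving (Show, Eq)

infixr 5 :+:, :=:

data Music a
  = Prim (Primitive a)
  | Music a :+: Music a
  | Music a :=: Music a
  deriving (Show, Eq)

-- A timed note event: (onset, duration, pitch), as might be read from MIDI.
type Event a = (Dur, Dur, a)

-- Each event becomes "rest for the onset, then the note";
-- all of these two-item phrases are composed in parallel.
fromEvents :: [Event a] -> Music a
fromEvents = foldr (\e m -> phrase e :=: m) (Prim (Rest 0))
  where phrase (onset, d, p) = Prim (Rest onset) :+: Prim (Note d p)

-- Walk the structure back to a list of events, tracking elapsed time.
toEvents :: Music a -> [Event a]
toEvents = snd . go 0
  where
    go t (Prim (Note d p)) = (t + d, [(t, d, p)])
    go t (Prim (Rest d))   = (t + d, [])
    go t (m1 :+: m2)       = let (t1, es1) = go t m1
                                 (t2, es2) = go t1 m2
                             in (t2, es1 ++ es2)
    go t (m1 :=: m2)       = let (t1, es1) = go t m1
                                 (t2, es2) = go t m2
                             in (max t1 t2, es1 ++ es2)

-- Zero-based bar index of an onset, given the bar length (3/4 for a polska).
barOf :: Dur -> Dur -> Integer
barOf barLen onset = floor (onset / barLen)
```

Because every note carries its own offset, a note starts a bar exactly when its onset is a whole multiple of the bar length.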

As yet, Euterpea provides no help at all for producing a score of any kind from Music. It has a notion of a function that would provide notation called NotateFun but this is unimplemented.

Producing Scores

When you want to produce a performance of some kind from Music, things are relatively straightforward. Music is expressive enough to combine different notes together in any manner you wish and Control allows you to plug in your own modifiers, letting you express your own interpretation of the performance. But when you want to go in the opposite direction, things get trickier because the translation into MIDI is lossy - you lose nearly all the contextual information originally applied to phrases.

Accordingly, I don't want to be too ambitious in trying to recreate an abc score. I will limit myself to monophonic MIDI files and to relatively straightforward Scandi tunes with just a single melody line. These tend to be in a small number of standard rhythms, the most prevalent being the polska. Polskas are normally written in 3/4 time but are not waltzes - they have an emphasis on the first and third beats of the bar. They come in various forms: the slängpolska is straightforward, dividing each beat into semiquavers, whereas the triplet polska, as its name suggests, tends to divide each beat into triplets.

You would think that 9/8 would be a better representation for the triplet polska (as in Irish slip jigs) but by convention, 3/4 is normally used. This means that if you offer the choice of time signature, you have more work to do in the translation of these polskas into 3/4 because you have to invoke the special abc triplet notation which is used whenever three notes take the time allotted to two. This must also be done for another very common polska form - the so-called short first beat polska, where three notes are played as a regular triplet lasting the full first two beats of the bar.

Representing Scores

Scores will be represented in an algebraic data type Score:
    
data Score a = EndScore
             | Bar Int (Notes a) (Score a)
        deriving (Show, Eq, Ord)

data Notes a = PrimNote a
             | (Notes a) :+++: (Notes a)    -- a group of notes
             | Phrase (Tuplet a)            -- a duplet, triplet or quadruplet
        deriving (Show, Eq, Ord)

-- here Rational defines the type of Tuplet - 
-- (2/3) is two notes in the time of three (duplet) 
-- (3/2) is three notes in the time of two (triplet) 
-- (4/3) is four notes in the time of three (quadruplet) 
data Tuplet a = Tuplet Rational [a]
        deriving (Show, Eq, Ord)
As with Euterpea's Music, Score is polymorphic in the type of note being represented, allowing you to start with Euterpea's representation and end with one more suited to abc. Although very simple, it is sufficient to represent the set of notes in an abc score given the restrictions mentioned above - so for example I have dispensed with the parallel constructor because I am only interested in single line melodies. Other properties of the score such as time signature or key signature are carried by abc as headers and so are represented separately - simply as configuration properties.
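As an illustration, here is a small Score value and a sketch of how it might be written out as abc note text. The data types are those given above; the note type is assumed to be a ready-made abc string, the fixity of `:+++:` is my assumption, and `example`, `renderNotes` and `renderScore` are hypothetical names rather than the project's actual functions.

```haskell
import Data.Ratio (numerator)

data Score a = EndScore
             | Bar Int (Notes a) (Score a)
        deriving (Show, Eq, Ord)

infixr 5 :+++:   -- assumed fixity, not stated in the original

data Notes a = PrimNote a
             | (Notes a) :+++: (Notes a)
             | Phrase (Tuplet a)
        deriving (Show, Eq, Ord)

data Tuplet a = Tuplet Rational [a]
        deriving (Show, Eq, Ord)

-- Two bars of 3/4, with notes already spelled as abc strings
-- (unit note length assumed to be 1/8, so "A2" is a quarter note).
example :: Score String
example =
  Bar 0 (PrimNote "A2" :+++: PrimNote "B2" :+++: PrimNote "c2")
    (Bar 1 (Phrase (Tuplet (3/2) ["d", "e", "f"]) :+++: PrimNote "g4")
       EndScore)

-- abc introduces a tuplet with "(p", where p is the number of notes played:
-- (2 for a duplet, (3 for a triplet, (4 for a quadruplet.
renderNotes :: Notes String -> String
renderNotes (PrimNote n)           = n
renderNotes (ns1 :+++: ns2)        = renderNotes ns1 ++ renderNotes ns2
renderNotes (Phrase (Tuplet r ns)) = "(" ++ show (numerator r) ++ concat ns

-- Write each bar followed by a bar line.
renderScore :: Score String -> String
renderScore EndScore        = ""
renderScore (Bar _ ns rest) = renderNotes ns ++ "|" ++ renderScore rest
```

With these definitions, `renderScore example` yields `A2B2c2|(3defg4|`. The headers (time signature, key and so on) would be emitted separately from the configuration.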

Imposing Structure

Transformation from MIDI to abc is now a matter of attempting to apply more and more structure to the set of raw notes that you start with. Here are some of the key elements:

Note Duration

Euterpea uses fractional durations but abc uses integral durations. It's sensible to unify on a smallest duration of a 1/96th note. This is convenient because it is small enough not to lose precision, and 96 has both 3 and 4 as factors, so it can be used to represent notes in triplets and quadruplets. A bar of 4/4 music will occupy 96 such units and we can deduce the length of the smallest note we can reliably detect (for example a 1/32 note occupies 3 units) which we can call the shortest detectable note.
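The arithmetic can be sketched like this. The function names are made up for illustration, and treating the 1/32 note as the shortest detectable note is just the example figure from the paragraph above.

```haskell
-- Number of units in a whole note.
unitsPerWholeNote :: Integer
unitsPerWholeNote = 96

-- Round a fractional duration (1 = whole note) to the nearest unit.
-- Note that Haskell's round sends exact halves to the nearest even integer.
toUnits :: Rational -> Integer
toUnits d = round (d * fromIntegral unitsPerWholeNote)

-- Assumed resolution: the 1/32 note, which occupies 3 units.
shortestDetectable :: Integer
shortestDetectable = toUnits (1/32)

-- True when a length is a whole number of shortest detectable notes.
isRegular :: Integer -> Bool
isRegular u = u `mod` shortestDetectable == 0
```

A quarter note comes out as 24 units and a triplet eighth note (1/12) as 8 units; a slightly short quarter note such as 49/200 also rounds to 24.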

Bar Lines

MIDI has a notion of time signature and from this and the rounded note durations and offsets we can work out where the bar lines are intended and thus invoke the Bar constructor. If a note spreads across such a bar line, we have to split it into two notes linked with a tie, itself notated as a note type. We can then number all the bars in the score sequentially from zero. This also gives us a mechanism for issuing end of line markers to spread the score out evenly if we issue them regularly after a certain count of bars. We can also work out where the beats in the music occur and mark each note as either on or off the beat. This helps us to separate note phrases in the abc.
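A minimal sketch of the splitting step, working in 1/96 units. The `TimedNote` record and the function names are hypothetical, and here the tie is shown as a flag on the note rather than as a separate note type.

```haskell
-- A note in 1/96 units: onset from the start of the tune, length, and a label.
data TimedNote = TimedNote
  { onset :: Integer
  , len   :: Integer
  , pitch :: String
  , tied  :: Bool      -- True when this piece is tied to the next one
  } deriving (Show, Eq)

-- Bar length in units for a time signature, e.g. 3/4 gives 72.
barLength :: Integer -> Integer -> Integer
barLength beats beatType = (beats * 96) `div` beatType

-- Zero-based number of the bar in which a note starts.
barNumber :: Integer -> TimedNote -> Integer
barNumber barLen n = onset n `div` barLen

-- Split a note wherever it crosses a bar line, tying the pieces together.
splitAtBars :: Integer -> TimedNote -> [TimedNote]
splitAtBars barLen n
  | len n <= room = [n]
  | otherwise     = n { len = room, tied = True }
                    : splitAtBars barLen (n { onset = onset n + room
                                            , len   = len n - room })
  where room = barLen - (onset n `mod` barLen)   -- units left in the current bar
```

For example, in 3/4 a quarter note starting 12 units before the bar line becomes two tied eighth notes, one in each bar.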

Long Notes

When we unify a note's duration, we may find it has a length (say) of (5/8) or (7/8). This is impossible to notate as a single entity and so we again split it into two notes that we can notate, joined by a tie.
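One way to do the split is a greedy decomposition, sketched below in 1/96 units. The list of lengths that count as a single notateable note is my assumption (plain values plus single-dotted ones, nothing shorter than a 1/32 note); `notateable` and `splitLong` are hypothetical names.

```haskell
-- Assumed set of single-note lengths in 1/96 units, largest first:
-- plain values from a whole note down to a 1/32 note, plus dotted versions.
notateable :: [Integer]
notateable = [144, 96, 72, 48, 36, 24, 18, 12, 9, 6, 3]

-- Repeatedly take the longest notateable length that still fits.
-- The resulting pieces are meant to be joined by ties.
splitLong :: Integer -> [Integer]
splitLong r
  | r <= 0    = []
  | otherwise = case filter (<= r) notateable of
      (d : _) -> d : splitLong (r - d)
      []      -> [r]   -- shorter than anything in the list; leave as is
```

Under these assumptions a 5/8 note (60 units) becomes a half note tied to an eighth note, and a 7/8 note (84 units) becomes a dotted half note tied to an eighth note.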

Tuplets

If a note does not consist of an exact number of shortest detectable note durations, it is a candidate for embedding in a tuplet. This is true for quadruplets (having a note duration of 3/32) and triplets (having a note duration of 1/12). In addition, duplet notes have a duration of 3/16. We then continue to add neighbouring notes to the tuplet until the total duration is equal to that of a whole number of beats.
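The candidate test and the grouping step might look like this, with note lengths in 1/96 units. Both function names are hypothetical, and the usage figures assume a quarter-note beat of 24 units and a shortest detectable note of 6 units (a 1/16 note).

```haskell
-- A note is a tuplet candidate when its length is not a whole number
-- of shortest detectable notes.
isCandidate :: Integer -> Integer -> Bool
isCandidate shortest n = n `mod` shortest /= 0

-- Collect note lengths from the front of the list until their total
-- is a whole number of beats; return the group and the remaining notes.
takeTuplet :: Integer -> [Integer] -> Maybe ([Integer], [Integer])
takeTuplet beat = go 0 []
  where
    go _ _ [] = Nothing   -- ran out of notes before reaching a beat boundary
    go total acc (n : ns)
      | total' `mod` beat == 0 = Just (reverse (n : acc), ns)
      | otherwise              = go total' (n : acc) ns
      where total' = total + n
```

Three notes of 8 units group into a one-beat triplet, while three notes of 16 units group into the two-beat triplet of a short first beat polska.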

Pitches

MIDI is specific about pitches - F# is always F#. However, its display in a score depends on the key signature. In the key of C Major it would be shown as F# but in the key of G Major it would be shown simply as F, inheriting its 'sharpness' from the key signature. Conversely, an F natural note in this key is required to be explicitly marked as a natural. To handle this translation it seems sensible to generate a chromatic scale of notes for each possible key and then to translate simply by lookup into this list. MIDI also has a notion of octave which can be directly translated into an abc octave marker.
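Here is a rough sketch of the lookup idea, restricted to sharp keys and ignoring octave markers. A key is described simply by the letters its signature sharpens; `chromaticScale` and `abcPitch` are hypothetical names. In abc, `^` marks a sharp and `=` a natural.

```haskell
-- Chromatic scale, starting at C, as it would be written in a key whose
-- signature sharpens the given letters ("" for C major, "F" for G major).
chromaticScale :: [Char] -> [String]
chromaticScale sharps = map spell base
  where
    -- (is the note sharpened?, letter name)
    base = [ (False, 'C'), (True, 'C'), (False, 'D'), (True, 'D')
           , (False, 'E'), (False, 'F'), (True, 'F'), (False, 'G')
           , (True, 'G'), (False, 'A'), (True, 'A'), (False, 'B') ]
    spell (isSharp, l)
      | isSharp && inKey     = [l]        -- sharpness comes from the key signature
      | not isSharp && inKey = ['=', l]   -- explicit natural needed
      | isSharp              = ['^', l]   -- explicit sharp needed
      | otherwise            = [l]
      where inKey = l `elem` sharps

-- Translate a MIDI pitch number to its abc spelling by lookup in the scale.
abcPitch :: [Char] -> Int -> String
abcPitch sharps midiPitch = chromaticScale sharps !! (midiPitch `mod` 12)
```

MIDI pitch 66 is an F#, so it is spelled `^F` in C major and plain `F` in G major, while pitch 65 (F natural) is spelled `=F` in G major.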

You also need to pay attention to the way accidentals are represented in the score. Once an accidental is marked in any particular bar, you no longer need to mark later instances of that note in the same bar explicitly, because they inherit their pitch markers from the previous instance.

Articulation

MIDI has no concept of rests, which only exist as gaps between successive notes. This means we need a heuristic which will somehow discriminate between cases where a note decays earlier than intended and where a legitimate rest is indeed intended. Our approach is to identify all such gaps, and where the duration is longer than the shortest detectable note, to insert a rest, otherwise to extend the preceding note by the gap's duration.
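The heuristic can be sketched as a single pass over a monophonic line, with onsets and lengths in 1/96 units. The `Item` type and `fillGaps` are hypothetical names, and the input is assumed to be sorted by onset.

```haskell
-- Either a sounding note or a rest, with onset and length in 1/96 units.
data Item = Sound Integer Integer String   -- onset, length, pitch label
          | Silence Integer Integer        -- onset, length
  deriving (Show, Eq)

-- Fill the gaps between successive notes. The first argument is the
-- length of the shortest detectable note; notes are (onset, length, pitch).
fillGaps :: Integer -> [(Integer, Integer, String)] -> [Item]
fillGaps shortest = go
  where
    go ((o, l, p) : next@(o', _, _) : rest)
      | gap > shortest = Sound o l p : Silence (o + l) gap : go (next : rest)
      | gap > 0        = Sound o (l + gap) p : go (next : rest)   -- absorb a small gap
      | otherwise      = Sound o l p : go (next : rest)
      where gap = o' - (o + l)
    go [(o, l, p)] = [Sound o l p]
    go []          = []
```

With a shortest detectable note of 3 units, a quarter note that decays 2 units early is stretched back to full length, whereas a 24-unit gap becomes a quarter-note rest.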

Code

The first phase of the project uses MIDI files that were themselves computer-generated (in fact from abc) and so are very regular in rhythm. If you are at all interested, the code is here. A web interface to the MIDI translation is here.