Saturday, April 18, 2015

Reverse Engineering MIDI

I am very keen to expand the number of Scandi tunes that are saved to my tradtunedb site but I am finding that not enough people are posting tunes - largely because they are put off by the seeming complexity of the abc notation. One of my friends told me they'd find it a lot simpler if they could just play the tune on a MIDI keyboard and somehow get this automatically converted to abc. This got me thinking...

The Haskell School of Music

And then, by chance, I stumbled upon the Haskell School of Music (HSoM). This is a very comprehensive Haskell tutorial, chock-full of exercises, but where all the examples are taken from the field of music. It's the brainchild of Paul Hudak, who is both one of the original designers of Haskell and a keen musician. The book is a successor to his previous Haskell School of Expression, but to my mind it is a great improvement, partly because the treatment of the language is both clearer and deeper and partly because the exercises benefit from the common theme. Although HSoM is still very much a work in progress, it is remarkably comprehensive. It is split into two principal sections - the first part develops a domain-specific language for representing pieces of music and the second explores the generation, composition and analysis of musical signals which would allow you, for example, to design your own electronic instrument. All this is achieved by gradually introducing Euterpea, a computer music library developed in Haskell which supports the programming of computer music both at the note level and at the signal level.

Euterpea

Euterpea stems from Haskore, an earlier library also developed by Paul, and is maintained on github. It has at its core the Music algebraic data type:
    
data Music a  = 
       Prim (Primitive a)               --  primitive value 
    |  Music a :+: Music a              --  sequential composition
    |  Music a :=: Music a              --  parallel composition
    |  Modify Control (Music a)         --  modifier
  deriving (Show, Eq, Ord)
where Control is represented like this:
 
data Control =
          Tempo       Rational           --  scale the tempo
       |  Transpose   AbsPitch           --  transposition
       |  Instrument  InstrumentName     --  instrument label
       |  Phrase      [PhraseAttribute]  --  phrase attributes
       |  Player      PlayerName         --  player label
       |  KeySig      PitchClass Mode    --  key signature and mode
  deriving (Show, Eq, Ord)
The Control type allows you to insert a variety of modifying instructions - usually at the phrase level (for example you can transpose a tune, pick an instrument or indicate dynamic markings) - but otherwise Music is extremely straightforward. Primitives represent the notes (or rests) themselves and you can compose phrases together either serially or in parallel. This is simple but powerful - for example, if you compose individual notes in parallel you get a chord; if you compose whole phrases of notes in parallel you can define different melodic lines, perhaps played on different MIDI instruments.
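To give a flavour of how these constructors combine, here is a small, hypothetical example using Euterpea's standard note and rest helpers (c 4 qn is a quarter-note C in octave 4, en is an eighth note, and so on):

  import Euterpea

  -- a C major chord: three notes composed in parallel
  cMajorChord :: Music Pitch
  cMajorChord = c 4 qn :=: e 4 qn :=: g 4 qn

  -- a short phrase: notes and a rest composed in sequence
  shortPhrase :: Music Pitch
  shortPhrase = c 4 en :+: d 4 en :+: e 4 qn :+: rest qn

  -- two melodic lines in parallel, each given its own instrument
  duet :: Music Pitch
  duet = Modify (Instrument Violin) shortPhrase
     :=: Modify (Instrument AcousticGrandPiano) (transpose (-12) shortPhrase)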

What is particularly useful is that Euterpea comes with functions to convert between MIDI and this Music data type. Music is a good deal more attractive to work with than raw MIDI - all you really get from MIDI is an instruction to turn a note on in a particular manner and then, later, to turn it off again. Euterpea manages the conversion by prefacing each note in the tune with a rest whose length is identical to the offset of the note in the tune and then composing all these two-item phrases in parallel. It thus becomes relatively easy, when trying to produce scores, to identify the notes that start each bar, although no bar indications are present in the Music data type itself.
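So a three-note fragment recovered from MIDI ends up looking something like this (a sketch of the shape of the result, not Euterpea's exact output):

  import Euterpea

  -- each note is prefaced by a rest equal to its offset from the start of the
  -- tune, and the resulting two-item phrases are composed in parallel
  recoveredFragment :: Music Pitch
  recoveredFragment =
        (rest 0  :+: c 4 qn)
    :=: (rest qn :+: e 4 qn)
    :=: (rest hn :+: g 4 qn)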

As yet, Euterpea provides no help at all for producing a score of any kind from Music. It has a notion of a notation function, NotateFun, but this is unimplemented.

Producing Scores

When you want to produce a performance of some kind from Music, things are relatively straightforward. Music is expressive enough to combine different notes together in any manner you wish and Control allows you to plug in your own modifiers, letting you express your own interpretation of the performance. But when you want to go in the opposite direction, things get trickier because the translation into MIDI is lossy - you lose nearly all the contextual information originally applied to phrases.

Accordingly, I don't want to be too ambitious in trying to recreate an abc score. I will limit myself to monophonic MIDI files and to relatively straightforward Scandi tunes with just a single melody line. On the whole, these tend to be in standard rhythms, of which the most prevalent is the polska. Polskas are normally written in 3/4 time but are not waltzes - they have an emphasis on the first and third beats of the bar. They come in various forms: the slängpolska is straightforward, dividing each beat into semiquavers, while the triplet polska, as its name suggests, tends to divide each beat into triplets.
You would think that 9/8 would be a better representation of the latter (as in Irish slip jigs) but by convention 3/4 is normally used. This means that if you offer the choice of time signature, you have more work to do in translating these polskas into 3/4, because you have to invoke the special abc triplet notation - written, for example, as (3ABc - which is used whenever three notes take the time allotted to two. This must also be done for another very common polska form - the so-called short first beat polska, where three notes are played as a regular triplet occupying the first two beats of the bar.

Representing Scores

Scores will be represented in an algebraic data type Score:
    
data Score a = EndScore
             | Bar Int (Notes a) (Score a)
        deriving (Show, Eq, Ord)

data Notes a = PrimNote a
             | (Notes a) :+++: (Notes a)    -- a group of notes
             | Phrase (Tuplet a)            -- a duplet, triplet or quadruplet
        deriving (Show, Eq, Ord)

-- here Rational defines the type of Tuplet - 
-- (2/3) is two notes in the time of three (duplet) 
-- (3/2) is three notes in the time of two (triplet) 
-- (4/3) is four notes in the time of three (quadruplet) 
data Tuplet a = Tuplet Rational [a]
        deriving (Show, Eq, Ord)
As with Euterpea, it is polymorphic in the type of note being represented, allowing you to start with Euterpea's representation and end with one more suited to abc. Although very simple, it is sufficient to represent the set of notes in an abc score given the restrictions mentioned above - so for example I have dispensed with the parallel constructor because I am only interested in single line melodies. Other properties of the score such as time signature or key signature are carried by abc as headers and so are represented separately - simply as configuration properties.
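As a hypothetical illustration of the type in use (assuming the Score, Notes and Tuplet definitions above are in scope, and instantiating the note type to plain strings), a first bar holding two notes followed by a triplet might be represented like this:

  import Data.Ratio ((%))

  exampleScore :: Score String
  exampleScore =
    Bar 0 (PrimNote "A" :+++: PrimNote "B" :+++: Phrase (Tuplet (3 % 2) ["c", "d", "e"]))
          EndScore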

Imposing Structure

Transformation from MIDI to abc is now a matter of attempting to apply more and more structure to the set of raw notes that you start with. Here are some of the key elements:

Note Duration

Euterpea uses fractional durations but abc uses integral durations. It's sensible to unify on a smallest duration of a 1/96th note. This is convenient because it is small enough not to lose precision but has both 3 and 4 as factors and so can be used to represent notes in triplets and quadruplets. A bar of 4/4 music occupies 96 such units, and we can deduce the length of the smallest note we can reliably detect (for example a 1/32 note occupies 3 units), which we can call the shortest detectable note.
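A sketch of the idea (the function name and the worked values are mine, not the project's):

  -- express a Euterpea duration (a fraction of a whole note) as a whole
  -- number of 1/96 units
  toUnits :: Rational -> Int
  toUnits d = round (d * 96)

  -- toUnits (1/4)  == 24   (a quarter note)
  -- toUnits (1/32) == 3    (the shortest detectable note)
  -- toUnits (1/12) == 8    (one note of an eighth-note triplet)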

Bar Lines

MIDI has a notion of time signature and from this and the rounded note durations and offsets we can work out where the bar lines are intended and thus invoke the Bar constructor. If a note spreads across such a bar line, we have to split it into two notes linked with a tie, itself notated as a note type. We can then label all the bars in the score monotonically from zero. This also gives us a mechanism for issuing end of line markers to spread the score out evenly if we issue them regularly after a certain count of bars. We can also work out where the beats in the music occur and mark each note as either on or off the beat. This helps us to separate note phrases in the abc.
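A minimal sketch of the splitting step, assuming note offsets and durations have already been converted to 1/96 units (the function and its signature are illustrative, not the project's code):

  -- split a note, given as (offset, duration) in 1/96 units, at bar boundaries;
  -- a note that straddles a bar line becomes two or more tied fragments.
  -- barLen is the bar length in units, e.g. 72 for 3/4 time; the bar number of
  -- each fragment is then simply its offset `div` barLen
  splitAtBars :: Int -> (Int, Int) -> [(Int, Int)]
  splitAtBars barLen (offset, dur)
    | dur <= room = [(offset, dur)]
    | otherwise   = (offset, room) : splitAtBars barLen (offset + room, dur - room)
    where room = barLen - offset `mod` barLen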

Long Notes

When we unify a note's duration, we may find it has a length (say) of (5/8) or (7/8). This is impossible to notate as a single entity and so we again split into two notes which we now can notate, joined by a tie.
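A sketch of one way to do this, again working in 1/96 units and assuming durations are multiples of the shortest detectable note (3 units):

  -- greedily decompose a duration into notatable pieces (longest first) which
  -- are then joined with ties: e.g. 5/8 (60 units) -> 1/2 + 1/8, and
  -- 7/8 (84 units) -> 3/4 + 1/8
  splitLongNote :: Int -> [Int]
  splitLongNote d
    | d <= 0    = []
    | otherwise = case [x | x <- notatable, x <= d] of
        (n : _) -> n : splitLongNote (d - n)
        []      -> [d]   -- cannot happen for multiples of 3 units
    where
      -- plain and dotted note lengths in 1/96 units, longest first
      notatable = [96, 72, 48, 36, 24, 18, 12, 9, 6, 3]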

Tuplets

If a note does not consist of an exact number of shortest detectable note durations, it is a candidate for embedding in a tuplet. This is true for quadruplets (having a note duration of 3/32) and triplets (having a note duration of 1/12). In addition, duplet notes have a duration of 3/16. We then continue to add neighbouring notes to the tuplet until the total duration is equal to that of an even number of beats.
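The characteristic durations map straight onto the ratios carried by the Tuplet constructor; a sketch (again in 1/96 units, with an illustrative function name):

  import Data.Ratio ((%))

  -- note durations that signal a tuplet, mapped to the Tuplet ratio
  tupletRatio :: Int -> Maybe Rational
  tupletRatio 8  = Just (3 % 2)   -- a 1/12 note: three in the time of two (triplet)
  tupletRatio 9  = Just (4 % 3)   -- a 3/32 note: four in the time of three (quadruplet)
  tupletRatio 18 = Just (2 % 3)   -- a 3/16 note: two in the time of three (duplet)
  tupletRatio _  = Nothing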

Pitches

MIDI is specific about pitches - F# is always F#. However, its display in a score depends on the key signature. In the key of C Major it would be shown as F# but in the key of G Major it would be shown simply as F, inheriting its 'sharpness' from the key signature. Conversely, an F natural note in this key is required to be explicitly marked as a natural. To handle this translation it seems sensible to generate a chromatic scale of notes for each possible key and then to translate simply by lookup into this list. MIDI also has a notion of octave which can be directly translated into an abc octave marker.
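A sketch of the lookup approach (the two hard-coded tables below are purely illustrative - the project would generate a scale for every key, and flat keys would of course prefer flat spellings):

  -- one chromatic scale of abc spellings per key, indexed by MIDI pitch class
  -- (0 = C .. 11 = B); '^' marks a sharp, '=' a natural
  spellingsCMajor, spellingsGMajor :: [String]
  spellingsCMajor = ["C", "^C", "D", "^D", "E", "F",  "^F", "G", "^G", "A", "^A", "B"]
  spellingsGMajor = ["C", "^C", "D", "^D", "E", "=F", "F",  "G", "^G", "A", "^A", "B"]

  -- translate a MIDI pitch number: the pitch class picks the spelling; the
  -- octave (midiPitch `div` 12) would become an abc octave marker, not shown here
  spell :: [String] -> Int -> String
  spell scale midiPitch = scale !! (midiPitch `mod` 12)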

You also need to pay attention to the way accidentals are represented in the score. Once an accidental is marked in a particular bar, you no longer need to mark explicitly any further instances of that note which occur later in the same bar, because they inherit their pitch markers from the previous instance.

Articulation

MIDI has no concept of rests, which only exist as gaps between successive notes. This means we need a heuristic which will somehow discriminate between cases where a note decays earlier than intended and where a legitimate rest is indeed intended. Our approach is to identify all such gaps, and where the duration is longer than the shortest detectable note, to insert a rest, otherwise to extend the preceding note by the gap's duration.
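A sketch of the heuristic over a list of recovered notes, each given as (onset, pitch, duration) in 1/96 units (the Event type and the function are hypothetical simplifications):

  -- simplified events: a pitched note with a duration, or a rest
  data Event = Note Int Int         -- pitch, duration
             | Rest Int             -- duration
    deriving Show

  -- a gap longer than the shortest detectable note (3 units) becomes a rest;
  -- anything shorter is absorbed into the preceding note
  articulate :: [(Int, Int, Int)] -> [Event]
  articulate ((on1, p1, d1) : rest@((on2, _, _) : _))
    | gap > 3   = Note p1 d1 : Rest gap : articulate rest
    | otherwise = Note p1 (d1 + gap) : articulate rest
    where gap = on2 - (on1 + d1)
  articulate [(_, p, d)] = [Note p d]
  articulate []          = []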

Code

The first phase of the project uses MIDI files that themselves were computer-generated (in fact from abc) and so are very regular in rhythm. If you are at all interested, the code is here. A web interface to the midi translation is here.

Sunday, October 19, 2014

Downloading Resources from Cross-Origin Requests

If you decide to have separate web and back-end tiers, cross-origin HTTP requests cause no end of difficulties in a variety of unlikely ways. I've written before about using CORS headers to mitigate some of them, but here's yet another. It surfaces whenever you try to use the download attribute in an anchor tag where the URI resource lives on the back-end and not the web server. For example, I host tunes on my server and I want users to be able to download the tune scores, which are in pdf format:
    
  <a href="..." download="mytune.pdf">download pdf</a>
The intention is that when the user selects the link, the browser will know that the resource is intended to be downloaded and will use the download attribute value as the file name. This indeed happens when the resource is hosted locally, but it no longer does so in the most recent versions of Chrome when it's hosted remotely. It seems as if the Chrome developers have now implemented a rather awkward algorithm in the html5 spec designed to mitigate possible security dangers in downloading files from untrusted sites. This is ostensibly because a hostile server could try to inveigle a user into unwittingly downloading private information from another origin (while the user thinks it came from the hostile server itself), which could then perhaps be re-uploaded to that server. The solution is to use the Content-Disposition HTTP header when serving up the resource to suggest a file name. For example:
    
  Content-Disposition: attachment; filename=mytune.pdf
Conforming browsers should then use this name when they interpret the download attribute in the anchor tag.

Friday, August 29, 2014

Functional Programming in Scala

Richard Feynman said what I cannot create I do not understand, notoriously misquoted by Craig Venter as what I cannot build I do not understand when he inserted it as a watermark into the DNA of his bacterium with an artificial genome. But Venter has a software engineering approach to biology - the bacterium has a computer for a parent and he went through a series of debugging cycles before eventually the organism would 'boot-up'. His statement is fundamental to the proper understanding of all sorts of disciplines, but none more so than software development.

Manning have today published Functional Programming in Scala by Rúnar Bjarnason and Paul Chiusano. The book is constructed around a carefully graded sequence of exercises where you find yourself attempting to build things of increasing complexity. Its aim is to teach functional programming concepts through the medium of Scala and this means that you must construct your programs from pure functions, free of side effects.  If, like me, you come from an OO background, this causes a massive rethink - there are whole swathes of the language that you just don't use.  For example, you can't reassign a variable; your functions mustn't throw exceptions; you will not develop class hierarchies.  You will use case classes but their use is restricted to the creation of algebraic data types. I found that I had to go through the exercises very slowly and deliberately.  The introductory exercises are relatively straightforward and fun to attempt and your confidence grows, but I found that the material quite soon becomes very challenging indeed.  In fact, whilst studying the various revisions of this book, I was reminded yet again of Feynman - his algorithm:
    Write down the problem.

    Think real hard.

    Write down the solution.
This, as opposed to an approach I've tended to use in OO:
   
    Write down the problem.

    do {
        Think a little bit.

        Type a little bit.

        See what happens

    } until problem solved.

    Write down the solution.  
The problem when lesser mortals attempt Feynman's algorithm is that it tends not to terminate. When it works, however, you often end up with an elegant, terse solution - albeit one that may be incomprehensible to the casual reader. But the main thing is that you gain enormous understanding - you will get no benefit unless you attempt the exercises.

The material is presented tersely, with just enough explanation of the concept and the Scala syntax you need to express it.  Some of the problems are very tricky and I would occasionally find myself pondering single questions for an hour or two, struggling to get the types to match.  The satisfaction when you do tease out a solution is immense.  The accompanying material on the web site includes pro-forma Scala exercise files for each chapter with placeholders for the answer, usually in the form of stubs like this:
    def compose[A,B,C](f: B => C, g: A => B): A => C =
        ??? 
If you get stuck, hints and solutions for each question are provided, as is a fully fleshed-out answer file for each chapter.  Some later chapters use material developed in earlier ones and so, when attempting a new chapter, you need either to have completed the previous ones or to compile the supplied answers before you can proceed.

There are four main sections. The first is an introduction to FP concepts and the authors have given a lot of thought to the progression of ideas.  This allows you gradually to gain confidence in manipulating functions until finally you're able to develop a lazily-evaluated Stream data type and then a purely functional representation of State. If you just get this far, it's of tremendous benefit.

Then the style of the book changes tack to some extent in three chapters devoted to functional design.  The idea is to illustrate some of the thought processes and alternative approaches that can be involved in groping your way towards the development of a combinator library that tackles a particular design space. Three completely different problems (parallelism, test frameworks and parsers) are used. This is illuminating but sometimes a little at odds with the progression of exercises because your approach and the authors' may legitimately diverge.  This means you may occasionally have to look ahead to the answers to keep yourself on track. What is interesting here is that the three different areas evolve solutions with a deep structure in common and you develop something of an intuition of what it is that makes Monads inevitable.

The third section then discusses Monoids, Monads, Functors and Applicatives.  This is familiar territory if you've read Learn You a Haskell but there is considerably more depth of coverage, and because you've spent so much time thinking about the diverse design problems in the previous section, you may well find that your understanding of these deep abstractions takes a more tangible form.

Section 4 deals with the question - if your program is completely pure, how do you deal with an effect such as IO?  There are many aspects to this and the authors review the strengths and limitations of the IO Monad and discuss different alternatives to the problem of handling mutable state. The book ends with a fascinating chapter on Streaming IO.  This was a brave thing to do - the discussion is exploratory in style and feels its way towards a thorough explanation of the design principles behind the scalaz-stream library. It seems as though the chapter might have been developed in parallel with the development of the library itself and you almost think that you are one of the team. Surprisingly, no mention is made of scalaz; however, you will learn many of the design principles that underpin it.  (This won't give you everything you need for the scalaz syntax, though.  For this, I would recommend learning-scalaz.)

This is a book which has already taught me a great deal and given me a huge amount of pleasure.  It is one that I will keep going back to until I finally feel that I am getting the hang of that mysterious activity which is called functional programming.
 

Friday, May 16, 2014

A SKI Calculator

I have recently been attempting the exercises in Raymond Smullyan's To Mock a Mockingbird. This is a gentle and enjoyable introduction to combinatory logic. He chooses an applicative system of combinators where the objects are birds and so he names them combinator birds. They live in a variety of different forests where the inhabitants of each forest have distinctive characteristics.

Combinator Birds

Here are some examples of Smullyan's birds:
  Bluebird: Bxyz = x(yz)
  Cardinal: Cxyz = xzy
  Warbler:  Wxy = xyy
  Identity: Ix = x 
These are conventionally written in a form which minimises the bracketing because the terms are considered to be left-associative. They can be re-written in a normal form which makes the association explicit - for example:
  Cardinal: Cxyz = (xz)y
  Warbler:  Wxy = (xy)y
In the early chapters you learn that you can build all the other birds in a forest by applying birds to other birds. In fact, the birds shown above (BCWI) form the basis for a whole class of birds, and this class can also be generated from a different set (BTMI). More strangely, you can build all the birds you need from just two:
  Starling: Sxyz = xz(yz)
  Kestrel:  Kxy = x
It is very easy to generate the Identity bird from these two, and conventionally you use all three to generate the others - hence the name SKI Calculus. The real interest is that this is a Turing-complete language - a fact that has been used in an elegant piece of work to show that the Scala type system is Turing-complete. But this is not my object here - rather it is to take up Smullyan's challenge on page 178 to write a program that converts any combinator bird to a SKI bird. It also makes sense to write an interpreter for a SKI bird so that, when it is applied to the appropriate variables (x, y, z etc.), the original combinator bird is reconstituted. The puzzles in chapter 18 lead you to discover a deterministic algorithm for doing this which Smullyan sketches out in the answers (although be careful - unfortunately his explanation contains a couple of typos).

Parse Tree

A combinator bird is best represented by a binary tree in which each application becomes a node whose left child is the function and whose right child is the argument - so, for example, the Cardinal's body (xz)y becomes a node whose left subtree represents xz and whose right child is the leaf y.

This tree can be generated with the following very simple grammar which recognizes birds in normal form:
   expression ::= term term | term 
   term ::= terminal | bracket
   bracket ::= "(" expression ")" 
   terminal ::= "x"| "y" | "z" | "w" | "v" 
A parse tree in Scala then is of course:
   trait Tree
   case class Leaf(value: String) extends Tree
   case class Node(left: Tree, right: Tree) extends Tree
It is then straightforward to build a parser using the parser combinator library - but be careful - this has been moved to its own jar in Scala 2.11 - scala-parser-combinators_2.11.

α-Elimination

The algorithm that converts a combinator bird to a SKI bird takes one variable at a time, starting with the outermost variable and removes it progressively from each node in the tree until it vanishes - being replaced by a tree of S,K and I. This process is called α-elimination and is an entirely mechanical process which involves invoking any or all of four Principles which are encoded as follows:
   // α-elimination of α alone is I (since Iα = α)
   private def principle1 = I

   // α-elimination of X (α not occurring in X) is KX (since KXα = X)
   private def principle2(t: Tree) = Node(K,t)

   // α-elimination of Yα (α not occurring in Y) is just Y (trivially, Yα = Yα)
   private def principle3(t: Tree) = t

   // α-elimination of XY is SX'Y', where X' and Y' (here l and r) are the
   // α-eliminations of X and Y (since SX'Y'α = X'α(Y'α) = XY)
   private def principle4(l: Tree, r: Tree) =  Node(Node(S,l),r)
where we have:
   object SKINodes {
     val S = Leaf("S")
     val K = Leaf("K")
     val I = Leaf("I") 
   }
This is not a very efficient algorithm because you have to look down the tree to detect whether the α-variable you are trying to replace exists in each branch, and it may not find the optimal SKI representation, but it is entirely deterministic and once you have replaced each variable you obtain a tree whose leaves contain only S, K or I.

SKI Interpretation

To interpret a SKI tree, you attach it to the required variables at its apex and then walk the tree. I have chosen a left side tree walk where I continue translating each branch until (on looking down it) all SKI nodes have been replaced. At each re-write I effectively apply the left hand operator to the right-hand tree. Because I am dealing with a binary tree but the S and K operators require more than one parameter, I accumulate 'partial' operators flowing up the tree until they are fully satisfied. This is done by adding extra transient nodes to the tree:
  // transient nodes used in SKI interpretation - essentially representations
  // of partial application of S, K or I
  case object S0 extends Tree
  case object K0 extends Tree
  case object I0 extends Tree
  case class K1(child: Tree) extends Tree
  case class S1(child: Tree) extends Tree
  case class S2(left: Tree, right: Tree) extends Tree
Again, this algorithm is not especially efficient because of the look-ahead, but it appears to be deterministic (certainly for the class of birds in Smullyan's forest). It will run out of stack space for outrageously complex birds because I have not bothered to preserve stack space by means of tail recursion or trampolining. If you are interested, the code is here.

Wednesday, December 4, 2013

Lenses in Scala

I have just got back from Scala eXchange 2013 where the initial keynote session was from Simon Peyton Jones on Lenses: Compositional data access and manipulation. A presentation in pure Haskell was, in fact, quite a heavyweight start to a Scala conference. However, it turned out to be a lucid explanation of a beautiful abstraction, and one which, at first sight, would appear to be impossible to arrive at given the apparent inconsistency of the types. Have a look at Simon's presentation if you find time - it's well worth the effort.

A lens solves the problem of reading from, and updating, a deeply nested record structure in a purely functional way, but without getting mired in ugly, deeply nested and bracketed code. Anyway, the talk got me interested in finding out what was available with Scala. There seem to be two main contenders at the moment - from scalaz 7 and shapeless. I thought I'd compare the two, to see how they handle the simple name and address record structure used by Simon which is (in Scala):
   case class Person (name: String, address: Address, salary: Int)
   case class Address (road: String, city: String, postcode: String)

   // sample Person
   val person = Person("Fred Bloggs", Address("Bayswater Road", "London", "W2 2UE"), 10000)

Lenses from Scalaz 7

A lens itself is a pretty simple affair, consisting only of a pair of functions, one for getting and another for setting the value that the lens describes, and doing this within the context of the container (i.e. the record). In scalaz, these are constructed with a good deal of boilerplate - there are two constructors available to you (lensg and lensu), one which curries the 'setting' function, and the other which does not:
    
   val addressLens = Lens.lensg[Person, Address] (
     a => value => a.copy(address = value),
     _.address
   )   
       
   // or
   
   val nameLens = Lens.lensu[Person, String] (
     (a, value) => a.copy(name = value),
      _.name
   )   
Notice that each lens is defined in terms of two types - that of the value itself and that of its immediate container. Typically, you would then have to get hold of a lens for every atomic value you might wish to access:
   
  val salaryLens = Lens.lensu[Person, Int] (
     (a, value) => a.copy(salary = value),
     _.salary
   )

   val roadLens = Lens.lensu[Address, String] (
     (a, value) => a.copy(road = value),
     _.road
   )

   val cityLens = Lens.lensu[Address, String] (
     (a, value) => a.copy(city = value),
     _.city
   )

   val postcodeLens = Lens.lensu[Address, String] (
     (a, value) => a.copy(postcode = value),
     _.postcode
   )
This soon becomes tedious. However, the true usefulness of lenses comes to light when you compose them:
   
  val personCityLens = addressLens andThen cityLens
  val personPostcodeLens = postcodeLens compose addressLens
i.e. you can stay at the top of your record structure and reach down to the very bowels just by using one of these composite lenses. Scalaz, of course, offers you a shorthand hieroglyphic for these composition functions. So you could write them:
   
  val personCityLens = addressLens >=> cityLens
  val personRoadLens = roadLens <=< addressLens
Now you have your lenses defined, you can use them like this:
  
  def get = salaryLens.get(person)
  // res0: Int = 10000

  def update = salaryLens.set(person, 20000)    
  // res1: Person = Person(Fred Bloggs,Address(Bayswater Road,London,W2 2UE),20000)

  def transform = salaryLens.mod(_ + 500, person) 
  // res2: Person = Person(Fred Bloggs,Address(Bayswater Road,London,W2 2UE),10500)

  def transitiveGet = personCityLens.get(person)
  //res3: String = London

  def transitiveSet:Person = {
    val person1 = personCityLens.set(person, "Wakefield") 
    val person2 = personRoadLens.set(person1, "Eastmoor Road") 
    personPostcodeLens.set(person2, "WF1 3ST")
  }
  // res4: Person = Person(Fred Bloggs,Address(Eastmoor Road,Wakefield,WF1 3ST),10000)
 
With scalaz, you also get good integration with the other type classes you might need to use. So, for example, lenses can be lifted to behave monadically because the %= method applies the update and returns a State Monad:
  
  def updateViaMonadicState = {
        val s = for {c <- personRoadLens %= {"124, " + _}} yield c
        s(person)
      }
  //res13: (Person, String) = (Person(Fred Bloggs,Address(124, Bayswater Road,London,W2 2UE),10000),124, Bayswater Road)

Lenses from Shapeless

By contrast, Shapeless gets rid of all this boilerplate and allows for composition during the construction of the lenses themselves. It uses a positional notation to define the target field of each lens:
  
  val nameLens     = Lens[Person] >> 0
  val addressLens  = Lens[Person] >> 1
  val salaryLens   = Lens[Person] >> 2
  val roadLens     = Lens[Person] >> 1 >> 0
  val cityLens     = Lens[Person] >> 1 >> 1
  val postcodeLens = Lens[Person] >> 1 >> 2
These can then be used in pretty much the same way as Scalaz 7's lenses. The main difference is that mod becomes modify and function application is curried:
  
  def get = salaryLens.get(person)
  // res0: Int = 10000

  def update = salaryLens.set(person)(20000)    
  // res1: Person = Person(Fred Bloggs,Address(Bayswater Road,London,W2 2UE),20000)

  def transform = salaryLens.modify(person)( _ + 500) 
  // res2: Person = Person(Fred Bloggs,Address(Bayswater Road,London,W2 2UE),10500)

  def transitiveGet = cityLens.get(person)
  // res3: String = London

  def transitiveSet:Person = {
    val person1 = cityLens.set(person)("Wakefield") 
    val person2 = roadLens.set(person1)("Eastmoor Road") 
    postcodeLens.set(person2)("WF1 3ST")
  }
  // res4: Person = Person(Fred Bloggs,Address(Eastmoor Road,Wakefield,WF1 3ST),10000)
As far as I can make out, Shapeless doesn't offer any monadic behaviour.

Which One to Use?

It seems to me that if you're tempted to use lenses, removal of boilerplate is critical. As yet, this doesn't seem possible with Scalaz 7, but I'm sure it's only a matter of time. There are other initiatives as well, for example, macrocosm, an exploration of scala macros, has dynamic lens creation, as does Rillit. But for me, at the moment, Shapeless seems astonishingly elegant.

Sunday, September 29, 2013

CORS Support in Spray

My tradtunedb service is working nicely, but there is a problem with it.  In order to hear a midi tune being played, it has to be rendered.  At the moment, this is done by converting it to a wav file on the server - an expensive process which doesn't scale well.  A better solution is to render the tune in javascript on the browser - but this presents a further problem.  The architecture is such that the front- and back-end have different host names.  And this means that the browser will prevent an XMLHttpRequest to the back-end because of potential security breaches.

Why CORS?

Browsers all maintain a same-origin policy.  One reason this is necessary is that XMLHttpRequest passes the user's authentication tokens. Suppose he is logged in to theBank.com with basic auth and then visits untrusted.com. If there were no restrictions, untrusted.com could issue its own XMLHttpRequest to theBank.com with full authorisation and gain access to that user's private data.

But a blanket ban on cross-origin requests also prohibits legitimate use.  For this reason, most modern browsers support Cross-origin resource sharing (CORS). This is a protocol which allows the browser to negotiate with the back-end server to discover whether or not such requests from 'foreign' domains are allowed by the server, and to make the actual requests.

When browsers issue requests, they always include the Origin header.  The essence of the protocol is that the server will respond with an Access-Control-Allow-Origin header if that Origin is acceptable to it.  Browsers will then allow the access to go ahead.  For example, here we have an XMLHttpRequest emanating from a script named midijs and hosted on a server running on localhost:9000:
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:no-cache
Connection:keep-alive
Host:192.168.1.64:8080
Origin:http://localhost:9000
Pragma:no-cache
Referer:http://localhost:9000/midijs
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/28.0.1500.71 Chrome/28.0.1500.71 Safari/537.36
And here is a response from a back-end server that accepts the Origin:
Access-Control-Allow-Credentials:true
Access-Control-Allow-Origin:http://localhost:9000
Content-Length:2222
Content-Type:audio/midi; charset=UTF-8
Date:Sun, 29 Sep 2013 09:13:03 GMT
Server:spray-can/1.1-20130927
This is all that is necessary for GET requests.  Security issues are greater for requests that are not used for pure retrieval and/or where user credentials are supplied.  Here, CORS provides additional headers and a 'preflight request' mechanism which asks permission from the server before it makes the actual request. For more details see http://www.html5rocks.com/en/tutorials/cors/
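To give a flavour of it (the values below are illustrative), before making a cross-origin POST the browser first sends an OPTIONS request carrying headers like these:
Origin:http://localhost:9000
Access-Control-Request-Method:POST
Access-Control-Request-Headers:Content-Type
and the server grants permission by responding with headers like these, after which the real request is sent:
Access-Control-Allow-Origin:http://localhost:9000
Access-Control-Allow-Methods:GET, POST, DELETE
Access-Control-Allow-Headers:Content-Type
Access-Control-Max-Age:1728000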

CORS Support in Spray

The M8 release versions of Spray have no CORS support.  Initial support for the CORS headers has now been integrated into the forthcoming release of Spray. These can be accessed if you develop your own CORS directive which makes use of respondWith directives in order to set the appropriate headers. For example, here's an approach based on that of Cristi Boarlu that works very well:
import spray.http._
import spray.routing._
import spray.http.HttpHeaders._

trait CORSDirectives  { this: HttpService =>
  private def respondWithCORSHeaders(origin: String) =
    respondWithHeaders(      
      HttpHeaders.`Access-Control-Allow-Origin`(SomeOrigins(List(origin))),
      HttpHeaders.`Access-Control-Allow-Credentials`(true)
    )
  private def respondWithCORSHeadersAllOrigins =
    respondWithHeaders(      
      HttpHeaders.`Access-Control-Allow-Origin`(AllOrigins),
      HttpHeaders.`Access-Control-Allow-Credentials`(true)
    )

  def corsFilter(origins: List[String])(route: Route) =
    if (origins.contains("*"))
      respondWithCORSHeadersAllOrigins(route)
    else
      optionalHeaderValueByName("Origin") {
        case None => 
          route        
        case Some(clientOrigin) => {
          if (origins.contains(clientOrigin))
            respondWithCORSHeaders(clientOrigin)(route)
          else {
            // Maybe, a Rejection will fit better
            complete(StatusCodes.Forbidden, "Invalid origin")
          }      
        }
      }
}
And this is how it can be used in my case to protect a route which generates a response for a path that requests a particular file type (e.g. midi):
path(Segment / "tune" / Segment / Segment ) { (genre, tuneEncoded, fileType) =>  
  get { 
    val contentTypeOpt = getContentTypeFromFileType(fileType:String)
    if (contentTypeOpt.isDefined) {
      respondWithMediaType(contentTypeOpt.get.mediaType) { 
        corsFilter(MusicRestSettings.corsOrigin) {
          val tune = java.net.URLDecoder.decode(tuneEncoded, "UTF-8") 
          val futureBin = Tune(genre, tune).asFutureBinary(contentTypeOpt.get)
          _.complete(futureBin)
        }
      }
    }
    else {
     failWith(new Exception("Unrecognized file extension " + fileType))
    }
  } 
} ~    
There is also a discussion started by Tim Perrett which hopes to be able to wrap routes within a cors directive which will just 'do the right thing' with preflight requests and appropriate access control headers as unobtrusively as possible.

If you want to try this out yourself, you can try one of the release candidates that were published on October 23rd.

Embedded YouTube Videos and Chrome

I have just discovered a related problem to do with rendering embedded YouTube videos in Chrome. I want to extend my trad tunes web site with a comments page attached to each tune. The foot of the page will contain a form allowing the user to add a comment to the page, and this is particularly useful if she finds a video where the tune is being played. YouTube allows you to embed the video into the page which it manages by means of an iframe. Unfortunately, when she posts the comment, Chrome won't display the video until the page is refreshed (although all other major browsers will). Instead it reports:
The XSS Auditor refused to execute a script in 'http://localhost:9000/genre/agenre/tune/atune/comments' because its source code was found within the request. The auditor was enabled as the server sent neither an 'X-XSS-Protection' nor 'Content-Security-Policy' header.
What's happening is that the browser must run javascript code to render the iframe, and it again thinks that it emanates from the server that provided the page which is thus making a cross-origin request. In this case, the solution is to add a response header instructing the browser to suspend the XSS Auditor check when it processes the page. This is achieved with the X-XSS-Protection header. For example, the Play Framework Action that processes a valid comment might respond like this:
  
    val comment = commentForm.bindFromRequest.get
    commentsModel.add(comment)
    Ok(views.html.comments(commentsModel.getAll, commentForm, genre, tune)(userName)).withHeaders("X-XSS-Protection"->"0")
When Chrome sees this header, it immediately renders the video. Note, this technique doesn't work if you issue a redirect instead of responding directly.

Wednesday, July 31, 2013

Running a Play 2 Application as a Debian Service

Debian Services

If you have developed your Play app, you then need a straightforward way to deploy and administer it in your production Linux environment. The conventional way to do this is to install it as a service, which means you can ensure it starts automatically when the server starts and your system administrators will know how to stop or restart it or check its status. This is an area where the Play documentation seems to be lacking. Here's how I approach things.

Tanukisoft Wrapper

The Tanukisoft Wrapper takes much of the pain away from the work of converting a JVM application into a well behaved service.  You need to download a version appropriate to your target machine, then pretty much all you have to do is name your service and edit a single configuration file - wrapper.conf.  This lets you set the classpath, the application main class, the JVM parameters, the logging levels and so on.  There are then various mechanisms for converting your application into a service but by far the simplest is to use WrapperSimpleApp as your main class in wrapper.conf and let this hand off to Play's main class by setting it as the first application parameter.  You also have a skeletal startup shell script which you rename to the name of the service you wish to use. So for example you can then invoke your application from the bin directory using
./myservice stop
./myservice start
./myservice status
and so on.

Integrating with Play

The play dist command builds a zip file containing all the dependencies; it used to place this in the dist directory but since 2.2.0 it puts it in target/universal. With my tradtunestore application, the zip unpacks to a tradtunestore-1.1-SNAPSHOT directory.

All the dependent jars are placed in tradtunestore-1.1-SNAPSHOT/lib and there is also a start script which shows that the main class is named play.core.server.NettyServer.  We are going to ignore this script because we are replacing it with Tanuki.  All you have to do now is extract the contents of the zip file and place it into your Tanuki lib directory.  Now you configure wrapper.conf to pick up all these jars in the classpath and invoke the NettyServer as the sole app wrapper parameter. Here's the relevant section that describes the Java application:

#********************************************************************
# Java Application
#  Locate the java binary on the system PATH:
wrapper.java.command=java
#  Specify a specific java binary:
set.JAVA_HOME=/usr/lib/jvm/jdk1.7.0
wrapper.java.command=%JAVA_HOME%/bin/java

# Tell the Wrapper to log the full generated Java command line.
wrapper.java.command.loglevel=INFO

# Java Main class.  This class must implement the WrapperListener interface
#  or guarantee that the WrapperManager class is initialized.  Helper
#  classes are provided to do this for you.  See the Integration section
#  of the documentation for details.
wrapper.java.mainclass=org.tanukisoftware.wrapper.WrapperSimpleApp

# Java Classpath (include wrapper.jar)  Add class path elements as
#  needed starting from 1
wrapper.java.classpath.1=../lib/*.jar
wrapper.java.classpath.2=../lib/tradtunestore-1.1-SNAPSHOT/lib/*.jar

# Java Library Path (location of Wrapper.DLL or libwrapper.so)
wrapper.java.library.path.1=../lib

# Java Bits.  On applicable platforms, tells the JVM to run in 32 or 64-bit mode.
wrapper.java.additional.auto_bits=TRUE

# Java Additional Parameters
wrapper.java.additional.1=-Dconfig.file=../conf/application.conf

# Initial Java Heap Size (in MB)
#wrapper.java.initmemory=3

# Maximum Java Heap Size (in MB)
#wrapper.java.maxmemory=64

# Application parameters.  Add parameters as needed starting from 1
wrapper.app.parameter.1=play.core.server.NettyServer

init.d

The final step is to install the service into init.d.  With Tanuki, this is simple - from your Tanuki bin directory, enter:
sudo ./myservice install
You can then control the service using:
sudo service myservice stop
sudo service myservice start
sudo service myservice status
and so on from any directory.