Monday, July 14, 2014

Parsing RAML with FParsec, Part 2: The Spec That Launched A Thousand LOC

RAML, the Spec

So, what exactly does RAML look like?  You can find the official spec on GitHub, but that seems a bit remote (Ha!  Sorry, couldn't resist the DVCS pun :)).  Let's take a look at what some actual specs would look like and see if we can build an intuition from that.

Here's an embed of a file that's in my repo that I'm using for testing:


First off, RAML is an extension of YAML, which is a significant-whitespace, human-readable data interchange format.  It also supports remote references, which may will likely be a pain to parse later.  This isn't an exhaustive example of all the things RAML must support, but it's enough to get going.

Baby Steps

My general approach with this project is going to be to take each item in line in this example, taking care to adhere to the spec as best as I can interpret for each type of element.  I'm going to try to be pretty exhaustive, because I'm pretty sure this blog is going to be my own reference for what I was thinking when I'm crawling through git blame cursing at myself.

So, let's start out with the very first tag, the RAML spec number.  It looks like this:


This looks very similar to a YAML version number; namely it's 
  1. a special comment marker (#%), 
  2. a file type (RAML), 
  3. some whitespace, 
  4. and the version number.  
Because we have these discrete elements, why not parse them out in the same way I just described?  The nice thing about FParsec, and other libraries known as "applicative combinators", is that it allows us to do just that.  We can define parsers for each of the elements above, chain them in sequence, and get some greater meaning out of the disparate parts.

AST, Oh Me!

Before we jump into the parsing, we need to have some idea of what the final representation of our API will look like.  This schema is an Abstract Syntax Tree, or AST.  We'll define the shape of RAML in an AST, and then define a parser for a single part of the AST, joining them together one by one until we have a final, full implementation of the spec.

So! Starting from the very beginning of our RAML sample, the spec number, our AST would look like this in F#:


Our spec, initially, will only be able to parse the version.  So let's go to that.

Our First Parser

FParsec comes with a multitude of parsers. Let's take a look at the first one we'll be using, skipString.  Its signature is:
val skipString: string -> Parser<unit, 'u>
Let's break this down a bit. 
  1. val skipString- this is just the name of the function . This parser is a public member of its containing module.
  2. string -> - this parser is a function that takes a string and returns something else
  3. Parser<unit, 'u> - this parser yields unit (or void), and some generic User State.  This User State is something that we don't care about right now, so pretend it's of type unit as well, or void.
So how do we use this to do something useful?  Well, if you combine the skipString parser with a string to expect, what it'll do is walk through the input stream and consume that string, and then throw it away, advancing us along.  This is perfect for situations like we have here, where there's some useless string standing between us and the version number we want to parse.  Let's try using the skipString parser generator function to generate a parser for the sentinel string for the version number.


Ok, we're set.  Now, the whole point of FParsec is to allow us to chain building block like these together to make more complex constructs.  We chain these together with a bunch of different operators.  The ones I use most commonly, so far at least, are:
  • .>>
    • This operator is used like this: parserA .>> parserB, and means "evaluate parserA, then evaluate parserB, and return the result of parserA"
  • >>.
    • This operator is used the same way, but returns the result of parserB instead.
So, to make a full parser for the version string line in a spec, we'd want to chain together a parser for the sentinel (which we've done), a parser for the space, a parser for the version number, and a parser that advances us to the next line.  Using the operators we just saw, that looks like this:

 If you read this in order, it reads:
  1. Evaluate "skipString "#%RAML"" and "ws", returning the former,
  2. Evaluate that previous result and "pfloat", returning the latter,
  3. Evaluate that previous result and "advanceToEOL" (which is really just an alias for blowing through the rest of the line) and return the former.
Which, in essence, is what we were trying to do with our stated goal of "Parsing a RAML version string".  Now that we have this value, we have to get it into an instance of RamlDef, which FParsec does using the operator |>>, which means "apply the result of the Parser on the left (remember the result of a Parser<'a,'b> is 'a) as the first parameter to the function on the right. So if we have a function 'makeRamlDef' that accepts a float, we could do 

Which, as written, is read "the raml parser is the result of the version parser applied to the function makeRamlDef"

And now, if you go to an f# interactive, or a test project or something, and call the FParsec run method


You'll end up with "success"!

Conclusion

So now we've at least got a single parser.  Next time we'll look at combining separate parsers for RAML spec data in sequence as we fill out more of the spec.

Parsing RAML with FParsec, Part 1: Background and Goals

Intro

Hi everyone, and welcome to what I hope will be an exciting journey into the world of parsers and combinators.  

Yeah, those words terrify me too.

I had this crazy idea one day at work that it would be fantastic if I had some better tooling around our internal representation of a particular REST service.  In general, these things are planned out ahead of time and contracts are negotiated, but even in the best of times some details can be left out and consistency in you API can be something that's hard to quantify.

So I went and did an initial search to see what the tooling was like and stumbled across RAML.org, the landing page of the RESTful API Modeling Language.  This seemed like a wonderful tool, with a spec that seemed well-geared towards encouraging API consistency by use of common traits and security schemes.  It also has visualizers and tooling around code generation, which can help reduce the time it takes to crank out the internals of your API.  Anyway, check out RAML or any of its competitors if you're designing an API.

Now, work is mostly .Net languages, and most(all?) of the existing RAML tools are in more...dynamic languages.  I thought it would be really helpful to have some sort of automated converter for .Net that would allow you to populate a structure with your API setup and generate a RAML-compliant description, at which point you could plug that into existing tooling, make changes, and design new APIs.  You could also use this model to do code generation, analysis, etc.

So at this point the delusions of grandeur started to flow, and I thought "If I have a parser, I could totally generate a F# Type Provider using the RAML as the data source to allow for easy exploration and consumption of that API."

And then "If I have an explorable, strongly-typed representation of a REST endpoint, there are F# libraries for generating comprehensive test cases for various inputs, so I MIGHT even be able to exhaustively test a subset of RAML-compliant REST services AUTOMATICALLY!"

But all of these cases require the parser, and so I began to look at what tools I had available.  F#, and functional languages in general, have developed a reputation for being good at parsing text into data.  In addition, I'd developed a fancy for the language after working with the magnificent, cross-platform build tool FAKE.  So it was that I stumbled upon FParsec, and so begins this series of blog posts.

So, to summarize, I hope to use FParsec and the RAML spec to:
  1. Generate a .Net native parser for the RAML spec that outputs a .Net type representing a REST endpoint and its constraints, methods, etc
  2. Generate a Type Provider that you can direct at a RAML spec, either a URI or a local file, and use to explore that API at design time, and (stretch goal)
  3. Develop a tool that, given an endpoint with sufficient constraints around the inputs and outputs of its routes, can be tested in some degree automatically for compliance to a matching RAML spec.
The next post should follow shortly, and we'll start with the basics of parsing using FParsec.  Do note that I'm still quite new to this, so please join in with any comments or suggestions!