Friday, June 15, 2007

Boilerplate considered harmful (Uniplate edition!)

I've just released my Uniplate library, which hopes to remove the boilerplate from Haskell programs. I try to keep my PhD research as practically orientated as I can, but often research things start off as proof of concept and get refined into something more practically usable over time. Uniplate is different - its the one part of my PhD that should be useful to almost everyone, and is useful and polished right now.

I've been working on Uniplate (or Play, as it was originally called) for about two years now. Uniplate did not start off as a project to remove boilerplate, but in my work on Catch I found various patterns kept cropping up. I started to think about abstracting them out - first in a very chaotic manner - then as time progressed I refined the ideas until they hung together much more convincingly. I've used Uniplate in Catch (well over 100 times!), in Yhc.Core (support is built into the library), and most recently in Hoogle. In addition, a few other people have picked up Uniplate (Eric Mertens and Matt Naylor), and with very little time were quite fluent in the use of the library.

I previously posted a bit about how you could use Scrap Your Boilerplate (SYB) to remove boilerplate from Haskell. I'm now going to recap that post, but giving the examples using Uniplate as well. Hopefully this will start to encourage people to make use of Uniplate - the results can be very effective. Recently Matt ported one of his projects, and some functions went from 20 lines of complicated code to 3 lines of simple code - revealing some bugs in the original definition in the process!

A Data Structure

Before showing some operations, I'm going to first introduce a data structure on which we can imagine operations are performed. Here is a data type that looks like an imperative programming language:


{-# OPTIONS_GHC -fglasgow-exts #-}
import Data.Generics
import Data.Generics.PlateData

data Expr = Var String
| Lit Int
| Call String [Expr]
deriving (Data, Typeable)

data Stmt = While Expr [Stmt]
| Assign String Expr
| Sequence [Stmt]
deriving (Data,Typeable)


We define the data type as normal, adding deriving for Data and Typeable - the two key SYB types. We also add an import of Data.Generics and a flag, just to get the GHC machinery working for the derivings. Finally, we add an import of Data.Generics.PlateData - which says that we want to automatically derive the necessary Uniplate instances, and use the Uniplate operations.

Queries

So lets imagine you have to get a list of all literals. In SYB this is easy:


extractLits :: Data a => a -> [Int]
extractLits = everything (++) ([] `mkQ` f)
where f (Lit x) = [x]
f _ = []


Wow, easy! But we can make it even easier with Uniplate:


extractLits :: Data a => a -> [Int]
extractLits x = [y | Lit y <- universeBi x]


Both functions will operate on anything which has a Data instance, so you can run it on an Expr, Stmt, [Stmt], [Either Stmt Expr] - the choice is yours. The Uniplate function universeBi simply gives you a list of all the Expr types in x, from which you can pick the Lit's using a nice list comprehension.

Transformations

Now lets negate all the literals, in SYB we have:


negLit (Lit x) = Lit (negate x)
negLit x = x

negateLits :: Data a => a -> a
negateLits = everywhere (mkT negLit)


Again, its pretty easy. We can also do something very similar in Uniplate:


negateLits :: Data a => a -> a
negateLits = transformBi negLit


The only difference is a mkT.

Advantages of Uniplate

There are a number of advantages:


  • Compatability - the above code will work only in GHC, but if we modify the instance declaration to remove deriving(Data,Typeable) and replace it with an explicit instance (generated by Derive, if you wish), then the Uniplate code will work perfectly happy in Hugs. The Uniplate library also provides substantial Haskell 98 compatibility.

  • Built in compiler support in GHC to the same level as SYB - piggy-backing off the SYB support.

  • Usually produces shorter code - especially for queries.

  • A range of traversals not in SYB (although SYB could add them - and I believe should)

  • Higher performance - about 30% faster in the above examples, up to 80% faster if you are will to write different instances.



What Uniplate is NOT

The SYB library has gone much further than queries and transformations - they provide what amounts to runtime reflection and an incredible level of power. They also offer type generic programming, extensible functions and much more besides. Uniplate does not attempt to offer anything beyond standard traversals.

It is also important to point out that SYB is strictly more powerful than Uniplate. You can implement Uniplate on top of SYB, but cannot implement SYB on top of Uniplate. I would still encourage everyone to read up on SYB - it can be complex to pull of some of the more advanced tricks, but the power can take Haskell to whole new levels.

Conclusion

I hope that people will take a look at Uniplate, and see if it can meet their needs - in any project where a data type is defined and operated over it can probably be of benefit. The code reductions that can be made with Uniplate (or SYB) are substantial, and can hopefully promote the concise declarative style which Haskell was designed for.

3 comments:

Unknown said...

Very cool. I like the syntax more than SYB, it's easier to grasp. I guess I should read into the details of the source.

Anonymous said...

That's very impressive. Is all this fancy traversal happening at run time?

Neil Mitchell said...

orcof: The value only exists at runtime, so yes, the traversal is done at runtime. However, unlike SYB, some of the computation of what traversal to do is done at compile time, which is the reason for being faster than SYB.