Monday, August 23, 2010

CmdArgs Example

Summary: A simple CmdArgs parser is incredibly simple (just a data type). A more advanced CmdArgs parser is still pretty simple (a few annotations). People shouldn't be using getArgs even for quick one-off programs.

A few days ago I went to the Haskell Hoodlums meeting - an event in London aimed towards people learning Haskell. The event was good fun, and very well organised - professional quality Haskell tutition for free by friendly people - the Haskell community really is one of Haskell's strengths! The final exercise was to write a program that picks a random number between 1 and 100, then has the user take guesses, with higher/lower hints. After writing the program, Ganesh suggested adding command line flags to control the minimum/maximum numbers. It's not too hard to do this directly with getArgs:


import System.Environment

main = do
[min,max] <- getArgs
print (read min, read max)


Next we discussed adding an optional limit on the number of guesses the user is allowed. It's certainly possible to extend the getArgs variant to take in a limit, but things are starting to get a bit ugly. If the user enters the wrong number of arguments they get a pattern match error. There is no help message to inform the user which flags the program takes. While getArgs is simple to start with, it doesn't have much flexibility, and handles errors very poorly. However, for years I used getArgs for all one-off programs - I found the other command line parsing libraries (including GetOpt) added too much overhead, and always required referring back to the documentation. To solve this problem I wrote CmdArgs.

A Simple CmdArgs Parser

To start using CmdArgs we first define a record to capture the information we want from the command line:


data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable,Show)


For our number guessing program we need a minimum, a maximum, and an optional limit. The deriving clause is required to operate with the CmdArgs library, and provides some basic reflection capabilities for this data type. Once we've written this data type, a CmdArgs parser is only one function call away:


{-# LANGUAGE DeriveDataTypeable #-}
import System.Console.CmdArgs

data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable,Show)

main = do
x <- cmdArgs $ Guess 1 100 Nothing
print x


Now we have a simple command line parser. Some sample interactions are:


$ guess --min=10
NumberGuess {min = 10, max = 100, limit = Nothing}

$ guess --min=10 --max=1000
NumberGuess {min = 10, max = 1000, limit = Nothing}

$ guess --limit=5
NumberGuess {min = 1, max = 100, limit = Just 5}

$ guess --help
The guess program

guess [OPTIONS]

-? --help Display help message
-V --version Print version information
--min=INT
--max=INT
-l --limit=INT


Adding Features to CmdArgs

Our simple CmdArgs parser is probably sufficient for this task. I doubt anyone will be persuaded to use my guessing program without a fancy iPhone interface. However, CmdArgs provides all the power necessary to customise the parser, by adding annotations to the input value. First, we can modify the parser to make it easier to add our annotations:


guess = cmdArgsMode $ Guess {min = 1, max = 100, limit = Nothing}

main = do
x <- cmdArgsRun guess
print x


We have changed Guess to use record syntax for constructing the values, which helps document what we are doing. We've also switched to using cmdArgsMode/cmdArgsRun (cmdArgs which is just a composition of those two functions) - this helps avoid any problems with capturing the annotations when running repeatedly in GHCi. Now we can add annotations to the guess value:


guess = cmdArgsMode $ Guess
{min = 1 &= argPos 0 &= typ "MIN"
,max = 100 &= argPos 1 &= typ "MAX"
,limit = Nothing &= name "n" &= help "Limit the number of choices"}
&= summary "Neil's awesome guessing program"


Here we've specified that min/max must be at argument position 0/1, which more closely matches the original getArgs parser - this means the user is always forced to enter a min/max (they could be made optional with the opt annotation). For the limit we've added a name annotation to say that we'd like the flag -n to map to limit, instead of using the default -l. We've also given limit some help text, which will be displayed with --help. Finally, we've given a different summary line to the program.

We can now interact with our new parser:


$ guess
Requires at least 2 arguments, got 0

$ guess 1 100
Guess {min = 1, max = 100, limit = Nothing}

$ guess 1 100 -n4
Guess {min = 1, max = 100, limit = Just 4}

$ guess -?
Neil's awesome guessing program

guess [OPTIONS] MIN MAX

-? --help Display help message
-V --version Print version information
-n --limit=INT Limit the number of choices


The Complete Program

For completeness sake, here is the complete program. I think for this program the most suitable CmdArgs parser is the simpler one initially written, which I have used here:


{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}

import System.Random
import System.Console.CmdArgs

data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable)

main = do
Guess{..} <- cmdArgs $ Guess 1 100 Nothing
answer <- randomRIO (min,max)
game limit answer

game (Just 0) answer = putStrLn "Limit exceeded"
game limit answer = do
putStr "Have a guess: "
guess <- fmap read getLine
if guess == answer then
putStrLn "Awesome!!!1"
else do
putStrLn $ if guess > answer then "Too high" else "Too low"
game (fmap (subtract 1) limit) answer


(The code in this post can be freely reused for any purpose, unless you are porting it to the iPhone, in which case I want 10% of all revenues.)

Monday, August 16, 2010

CmdArgs 0.2 - command line argument processing

I've just released CmdArgs 0.2 to Hackage. CmdArgs is a library for defining and parsing command lines. The focus of CmdArgs is allowing the concise definition of fully-featured command line argument processors, in a mainly declarative manner (i.e. little coding needed). CmdArgs also supports multiple mode programs, as seen in darcs and Cabal. For some examples of CmdArgs, please see the manual or the Haddock documentation.

For the last month I've been working on a complete rewrite of CmdArgs. The original version of CmdArgs did what I was hoping it would - it was concise and easy to use. However, CmdArgs 0.1 had lots of rough edges - some features didn't work together, there were lots of restrictions on which types of fields appear where, and it was hard to add new features. CmdArgs 0.2 is a ground up rewrite which is designed to make it easy to maintain and improve in the future.

The CmdArgs 0.2 API is incompatible with CmdArgs 0.1, for which I apologise. Some of the changes you will notice:


  • Several of the annotations have changed name (text has become help, empty has become opt).

  • Instead of writing annotations value &= a1 & a2, you write value &= a1 &= a2.

  • Instead of cmdArgs "Summary" [mode1], you write cmdArgs (mode1 &= summary "Summary"). If you need a multi mode program use the modes function.



All the basic principles have remained the same, all the old features have been retained, and translating parsers should be simple local tweaks. If you need any help please contact me.

Explicit Command Line Processor

The biggest change is the introduction of an explicit command line framework in System.Console.CmdArgs.Explicit. CmdArgs 0.1 is an implicit command line parser - you define a value with annotations from which a command line parser is inferred. Unfortunately there was not complete separation between determining what parser the user was hoping to define, and then executing it. The result was that even at the flag processing stage there were still complex decisions being made based on type. CmdArgs 0.2 has a fully featured explicit parser that can be used separately, which can process command line flags and display help messages. Now the implicit parser first translates to the explicit parser (capturing the users intentions), then executes it. The advantages of having an explicit parser are substantial.


  • It is possible to translate the implicit parser to an explicit parser (cmdArgsMode), which makes testing substantially easier. As a result CmdArgs 0.2 has far more tests.

  • I was able to write the CmdArgs command line itself in CmdArgs. This command line is a multiple mode explicit parser, which has many sub modes defined by implicit parsers.

  • The use of explicit parsers also alleviates one of the biggest worries of users, the impurity. Only the implicit parser relies on impure functions to extract annotations. In particular, you can create the explicit parser once, then rerun it multiple times, without worrying that GHC will optimise away the annotations.

  • Once you have an explicit parser, you can modify it afterwards - for example programmatically adding some flags.

  • The explicit parser better separates the internals of the program, making each stage simpler.



The introduction of the explicit parser was a great idea, and has dramatically improved CmdArgs. However, there are still some loose ends. The translation from implicit to explicit is a multi-stage translation, where each stage infers some additional information. While this process works, it is not very direct - the semantics of an annotation are hard to determine, and there are ordering constrains on these stages. It would be much better if I could concisely and directly express the semantics of the annotations, which is something I will be thinking about for CmdArgs 0.3.

GetOpt Compatibility

While the intention is that CmdArgs users write implicit parsers, it is not required. It is perfectly possible to write explicit parsers directly, and benefit from the argument processing and help text generation. Alternatively, it is possible to define your own command line framework, which is then translated in to an explicit parser. CmdArgs 0.2 now includes a GetOpt translator, which presents an API compatible with GetOpt, but operates by translating to an explicit parser. I hope that other people writing command line frameworks will consider reusing the explicit parser.

Capturing Annotations

As part of the enhanced separation of CmdArgs, I have extracted all the code that captures annotations, and made it more robust. The essential operation is now &= which takes a value and attaches an annotation. (x &= a) &= b can then be used to attach the two annotations a and b (the brackets are optional). Previously the capturing of annotations and processing of flags was interspersed, meaning the entire library was impure - now all impure operations are performed inside capture. The capturing framework is not fully generic (that would require module functors, to parameterise over the type of annotation), but otherwise is entirely separate.

Consistentency

One of the biggest changes for users should be the increase in consistency. In CmdArgs 0.1 certain annotations were bound to certain types - the args annotation had to be on a [String] field, the argPos annotation had to be on String. In the new version any field can be a list or a maybe, of any atomic type - including Bool, Int/Integer, Double/Float and tuples of atomic types. With this support it's trivial to write:


data Sample = Sample {size :: Maybe (Int,Int)}


Now you can run:


$ sample
Sample {size = Nothing}

$ sample --size=1,2
Sample {size = Just (1,2)}

$ sample --size=1,2 --size=3,4
Sample {size = Just (3,4)}


Hopefully this increase in consistency will make CmdArgs much more predictable for users.

Help Output

The one thing I'm still wrestling with is the help output. While the help output is reasonably straightforward for single mode programs, I am still undecided how multiple mode programs should be displayed. Currently CmdArgs displays something I think is acceptable, but not optimal. I am happy to take suggestions for how to improve the help output.

Conclusion

CmdArgs 0.1 was an experiment - is it possible to define concise command line argument processors? The result was a working library, but whose potential for enhancement was limited by a lack of internal separation. CmdArgs 0.2 is a complete rewrite, designed to put the ideas from CmdArgs 0.2 into practical use, and allow lots of scope for further improvements. I hope CmdArgs will become the standard choice for most command line processing tasks.