Saturday, November 05, 2011

Abstract Generics with Uniplate

Summary: The new version of Uniplate has several wrappers for working with abstract data types, such as Map from the containers package. These wrappers let you transform abstract data types without breaking their invariants.

Abstract data types, such as Map from the containers package, hide their internal structure so they can maintain invariants necessary for their correct operation. Generic programming, such as the Data class used by SYB, allows programs to generically traverse data types, without detailed knowledge of their structure. The two are somewhat incompatible - if Map allows manipulating it's internal structure, the invariants could be broken causing Map to perform incorrectly.

Using the Data class, a Map String Int claims it has no constructors, but if you operate on it with gfoldl is behaves as though it were [(String,Int)]. When Uniplate traverses the Map, such as when searching for an Int, it first analyses the available constructors to see if a Map String Int can possibly contain an Int. Since Map has no constructors, it concludes that Map cannot contain an Int, and Uniplate fails to operate on the contained Int's.

For people who use the Data-style Uniplate module, using the new version of Uniplate you can now correctly operate over Map String Int, provided you use the newtype Map wrapper from Data.Generics.Uniplate.Data.Instances. When you transform over Bool (which does not touch the Map) it will ignore the Map and take O(1). When you transform over Int it will reconstruct the Map in O(n), and if you transform String or Char it will reconstruct the Map in O(n log n). Regardless of what operations you do, it will work efficiently and correctly. As an example:


$ import qualified Data.Map as Map
$ import Data.Char
$ import Data.Generics.Uniplate.Data
$ import Data.Generics.Uniplate.Data.Instances
$ fromMap $ transformBi toUpper $ toMap $ Map.fromList [("haskell",12),("test",18)]
fromList [("HASKELL",12),("TEST",18)]


There are two approaches for dealing with the Map problem in Uniplate. For users of the Direct-style Uniplate module, there is a function plateProject, which has been available for some time. For users of the Data-style Uniplate module, or for users of the SYB package, there is a new module Data.Generics.Uniplate.Data.Instances in uniplate-1.6.4 (released today) which provides three types with special Data instances (Hide, Trigger, Invariant). Using these three types we can construct wrappers providing Data instances for abstract types, and Uniplate includes wrappers for several of the types in the containers package (Map, Set, IntMap, IntSet).

plateProject

The plateProject function helps you define Direct Uniplate instances for abstract types. Instead of defining how to examine the data type, you instead define how to transform the data type into one you can examine, and how to transform it back. As an example:


instance Biplate (Map.Map [Char] Int) Char where
biplate = plateProject Map.toList Map.fromList


If the types ensure that any operations will not change the keys we can optimise and use the fromDistictAscList function to reconstruct the Map:


instance Biplate (Map.Map [Char] Int) Int where
biplate = plateProject Map.toAscList Map.fromDistinctAscList


Hide

The Hide data type is useful for wrapping values that you want to ignore with Uniplate. The type is defined as:


data Hide a = Hide {fromHide :: a}


But has a Data instance that pretends it is defined using the extension EmptyDataDecls as:


data Hide a


As an example of avoiding particular values, you can write:


transformBi (+1) (1, 2, Hide 3, Just 4) == (2, 3, Hide 3, Just 5)


As a result of having no constructors, any calls to the methods toConstr or gunfold will raise an error.

Trigger

The Trigger data type is useful for determining when a value was constructed with the Data methods. It is defined as:


data Trigger a = Trigger {trigger :: Bool, fromTrigger :: a}


But the Data instance pretends that it is defined as:


data Trigger a = Trigger a


However, whenever gfoldl or gunfold constructs a new value, it will have the trigger field set to True. The trigger information is useful to indicate whether any invariants have been broken, and thus need fixing. As an example:


data SortedList a = SortedList (Trigger [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Trigger False $ sort xs
fromSortedList (SortedList (Trigger t xs)) = if t then sort xs else xs


This data type represents a sorted list. When constructed the items are initially sorted, but operations such as gmapT could break that invariant. The Trigger type is used to detect when the Data operations have been performed, and resort the list.

The Trigger type is often used in conjunction with Invariant, which fixes the invariants.

Invariant

The Invariant data type is useful for ensuring that an invariant is always applied to a data type. It is defined as:


data Invariant a = Invariant {invariant :: a -> a, fromInvariant :: a}


But the Data instance pretends that it is defined as:


data Invariant a = Invariant a


Whenever a gfoldl constructs a new value, it will have the function in the invariant field applied to it. As an example:


data SortedList a = SortedList (Invariant [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Invariant sort (sort xs)
fromSortedList (SortedList (Invariant _ xs)) = xs


Any time an operation such as gmapT is applied to the data type, the invariant function is applied to the result. The fromSortedList function can then rely on this invariant.

The gunfold method is partially implemented - all constructed values will have an undefined value for all fields, regardless of which function is passed to fromConstrB. If you only use fromConstr (as Uniplate does) then the gunfold method is sufficient.

Map

Using the Hide, Trigger and Invariant types, we can define a wrapper for the containers Map type as:


newtype Map k v = Map (Invariant (Trigger [k], Trigger [v], Hide (Map.Map k v)))
deriving (Data, Typeable)


The Map type is defined as an Invariant of three components - the keys, the values, and the underlying Map. We use Invariant to ensure that the keys/values/map always remain in sync. We use Trigger on the keys and values to ensure that whenever the keys or values change we rebuild the Map, but if they don't, we reuse the previous value. The function to extract a containers Map from the wrapper requires only simple pattern matching:


fromMap (Map (Invariant _ (_,_,Hide x))) = x


The function to wrap a containers Map is slightly harder, as we need to come up with an invariant restoring function:


toMap :: Ord k => Map.Map k v -> Map k v
toMap x = Map $ Invariant inv $ create x
where
create x = (Trigger False ks, Trigger False vs, Hide x)
where (ks,vs) = unzip $ Map.toAscList x

inv (ks,vs,x)
| trigger ks = create $ Map.fromList $ zip (fromTrigger ks) (fromTrigger vs)
| trigger vs = create $ Map.fromDistinctAscList $ zip (fromTrigger ks) (fromTrigger vs)
| otherwise = (ks,vs,x)


The create function creates a value from a Map, getting the correct keys and values. The inv function looks at the triggers on the keys/values. If the keys trigger has been tripped, then we reconstruct the Map using fromList. If the values trigger has been tripped, but they keys trigger has not, we can use fromDistinctAscList, reducing the complexity of constructing the Map. If nothing has changed we can reuse the previous value.

The end result is that all Uniplate (or SYB) traversals over Map result in a valid value, which has had all appropriate transformations applied.

2 comments:

Unknown said...

Very interesting. I'd run across the problem, but never really conceived a solution. This has a nice feel to me, at first glance.

Interesting point about gunfold. It doesn't match my experience with generic serialization, which always seems to die from some type having gunfold = error. I'd like to see the minimal Data definition cleared up. Failing on these abstract types has been a real achilles heel and it would be great to get a standardized solution in place.

Minor bug: fromSortedList (SortedList (Trigger t xs)) = if trigger x then sort xs else xs

should be 'if trigger t' I presume.

Neil Mitchell said...

Clifford: Thanks for spotting the typo, should actualy be if t then ... - fixed in the post and in the Uniplate docs.

I agree that we should do something with abstract data types, but I'm not convinced Data is quite the right typeclass - it may need splitting up into two, or changing the methods slightly. Anyway, hopefully this post is a partial solution keeping what we have now.