Functional Programming

Structure and Efficiency of Computer Programs

Planet Haskell - Wed, 12/17/2014 - 07:00
For decades my colleague, Guy Blelloch, and I have been promoting a grand synthesis of the two “theories” of computer science, combinatorial theory and logical theory.  It is only a small exaggeration to say that these two schools of thought operate in isolation.  The combinatorial theorists concern themselves with efficiency, based on hypothetical translations of high-level […]

GHC Weekly News - 2014/12/16

Planet Haskell - Wed, 12/17/2014 - 07:00
Hi *, time for another piece of the GHC weekly news!

Joachim Breitner has gotten the new GHC 7.8.4 package to tentatively build on ARM quite easily for Debian. Austin also took the liberty of merging all the needed patches; they'll be part of the 7.8.4 release. https://www.haskell.org/pipermail/ghc-devs/2014-December/007608.html
Greg Weber announced he's taken the time to set up a Docker image for GHC development - if you're on Linux, Greg's image should help you get up to speed with a GHC development environment in minutes! https://www.haskell.org/pipermail/ghc-devs/2014-December/007606.html
Lennart Kolmodin has spent time working on autocompletion for GHC, and 7.10 will ship with bash completion scripts - which package maintainers and distributions can now ship for their users. Thank you Lennart! https://www.haskell.org/pipermail/ghc-devs/2014-December/007614.html
Adam Gundry has a question about the new type checker plugin infrastructure; in particular - how do we deal with the serialization of type checker evidence that plugins may want to create or pass around on their own? Richard, Simon and Iavor weigh in. https://www.haskell.org/pipermail/ghc-devs/2014-December/007626.html
For the past few days, Richard Eisenberg has been hunting a performance regression in the compiler. After profiling, discussion on IRC and elsewhere, Richard has finally made some headway, and discovered one of the 'hot spots' in his patch. Unfortunately the battle isn't quite over just yet, and the hunt for a few more % remains. https://www.haskell.org/pipermail/ghc-devs/2014-December/007645.html
David Spies has hit a very strange situation with GHC 7.8.3 running out of memory. But it turned out this was a change in 7.8, in relation to how stacks were managed. Phew! https://www.haskell.org/pipermail/ghc-devs/2014-December/007646.html
Austin made a final call for 7.8.4 bugfixes. He plans on making the final release this week, if nobody has any other major complaints. https://www.haskell.org/pipermail/ghc-devs/2014-December/007684.html

Finally, in a slight change, we'll also be covering some notes from this week's meeting between GHC HQ (Austin, Simon PJ, SimonM, Herbert and Mikolaj), including...

The 7.10 RC1 looks like it's scheduled to occur this week still; all of our libraries and submodules are up-to-date, and we've taken the time to alert all of our maintainers about this. Thanks to Herbert for taking control of this!
We'll soon be implementing a new 'push hook' for the ghc.git repository: no more trailing whitespace. Since we've recently detabbed, and de-lhs-ified the tree, a knock-on effect was deleting trailing whitespace. Now that we've done a lot of this, we should take the time to enforce it - so they can't slip back in. Austin will be landing Phab:D169 and Phab:D396 soon to get it into 7.10.1 RC1.
This week, Austin managed to secure two sponsors for GHC/Haskell.org. We've been given a wonderful set of ARM buildbots (running in the cloud!) and a new, incredibly powerful POWER8 machine to use (with over 170 hardware threads available, for scalability testing). Hooray for our friends at Online.net and RunAbove.com for helping us out!

Closed tickets this week include: #9871, #9808, #9870, #9605, #9874, #9872, #9090, #9404, #8240, #9567, #9566, #9583, #9117, #9882, #9884, #9372, #7942, #8951, #9817, #9620, #9336, #9523, #9552, #8988, #9390, #9415, #9371, #7143, #9563, #8778, #4428, #4896, #9393, #9169, #7015, #8943, #8621, #9132, #9857, #8024, #9831, and #9888.
2014-12-16T20:51:40Z thoughtpolice

Minimal Yesod multifile site

Planet Haskell - Wed, 12/17/2014 - 07:00
I was talking with a coworker yesterday who was uncertain of how to start on a Yesod application without using the scaffolding. A single-file application is easy, and is well documented. The question is: how do you make a trivial multi-file site, where the handler functions can live in individual modules and still use type-safe URLs?

The problem in achieving this is separating out the data type creation from the dispatch generation. This is also discussed in the book, but what seems to be lacking is a simple cookbook-style example. So here it is!

This example serves a very minimal site (actually, it even has a JSON service), but can easily be extended to support more complex things. This is a good basis for people looking to implement minimal services without the need for things in the scaffolding like authentication, configurable logging, etc.

Also, if anyone's interested, it would be possible to add this as one of the scaffolding options from yesod init.
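A minimal sketch of that separation, assuming the standard mkYesodData/mkYesodDispatch split (this is not the author's actual example; the module layout, routes, and handlers below are illustrative):

-- Foundation.hs: declares the site type and routes. mkYesodData generates the
-- Route datatype and the Handler/Widget synonyms, but no dispatch code.
{-# LANGUAGE TemplateHaskell, QuasiQuotes, TypeFamilies #-}
module Foundation where

import Yesod

data App = App

mkYesodData "App" [parseRoutes|
/     HomeR GET
/json JsonR GET
|]

instance Yesod App

-- Handler/Home.hs: handlers live in their own modules, import Foundation,
-- and can still use type-safe URLs such as @{JsonR}.
{-# LANGUAGE OverloadedStrings, QuasiQuotes, TemplateHaskell #-}
module Handler.Home where

import Data.Aeson (Value)
import Foundation
import Yesod

getHomeR :: Handler Html
getHomeR = defaultLayout [whamlet|<a href=@{JsonR}>a tiny JSON service|]

getJsonR :: Handler Value
getJsonR = returnJson (["ho", "ho", "ho"] :: [String])

-- Application.hs: imports every handler module and generates the dispatch code.
{-# LANGUAGE TemplateHaskell, ViewPatterns #-}
module Application where

import Foundation
import Handler.Home
import Yesod

mkYesodDispatch "App" resourcesApp

main :: IO ()
main = warp 3000 App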

24 Days of GHC Extensions: DeriveGeneric

Planet Haskell - Wed, 12/17/2014 - 07:00
Yesterday, Andraz showed us a variety of extensions that came with GHC to help us avoid writing boilerplate code. We saw that GHC can automatically derive instances for Functor, Traversable, Foldable, along with the usual classes of Eq, Ord, Show, etc. However, as exciting as this is, you might have been left a little worried that this is where it stops. All of the classes mentioned so far exist in the base library, and that is what allows GHC to be extended to automatically derive code for them. But what if you have a type class that’s not in base? It would be disappointing if there wasn’t a way to avoid boilerplate without extending the compiler, and with GHC 7.2, we got a new extension that helps solve exactly this problem. > {-# LANGUAGE DeriveGeneric #-} > {-# LANGUAGE FlexibleContexts #-} > {-# LANGUAGE FlexibleInstances #-} > {-# LANGUAGE UndecidableInstances #-} > {-# LANGUAGE MultiParamTypeClasses #-} > {-# LANGUAGE FunctionalDependencies #-} > {-# LANGUAGE TypeOperators #-} > {-# LANGUAGE ViewPatterns #-} > import GHC.Generics The (relatively) new DeriveGeneric extension allows us to use a paradigm of programming called data-type generic programming. In this style of programming, we are able to write our functions over arbitrary data types - provided they have the right “shape”. To get an idea of what it means for things to have the same shape, let’s start by looking at the humble Either data type: data Either a b = Left a | Right b Another data type that’s similar could be a validation data type: > data Valid e a = Error e | OK a > deriving (Generic) In fact, this data type is more than just similar - we can say it is isomorphic to Either. In this sense, isomorphic is just a fancy word for a pair of functions to translate between Valid and Either, without losing information. This idea relates to having a structure-preserving mapping between Valid and Either. If that sounds scary, we can easily write this up in code: > toEither :: Valid e a -> Either e a > toEither (Error e) = Left e > toEither (OK a) = Right a > fromEither :: Either e a -> Valid e a > fromEither (Left e) = Error e > fromEither (Right a) = OK a See - easy! As you can imagine, there are lots of different data types that are isomorphic, and the insight behind data-type generic programming is that (most) data-types can be built up out of simpler pieces. For Either and Valid, they are both built out of the same parts: each data-type has two constructors, and each constructor has one field. GHC.Generics is the library behind the DeriveGeneric extension, and it gives us the following pieces to build data types: fields; type-parameterized fields; field products, which allow us to make a constructor with multiple fields; and constructor sums, which allow a data-type to have multiple constructors. The library also goes a little further than this, by providing you with program-specific information, such as the names of types and fields. The latter can be useful for working with serializers such as JSON and XML. So far, we’ve skimmed the idea of isomorphic types, and also that GHC.Generics gives us a set of basic parts to build types. It would be tedious if we had to write conversion functions ourselves, and by using DeriveGeneric, GHC will do all the heavy lifting behind the scenes for us. 
As you can see above, we used deriving (Generic) on Valid, which means we already get some transformations that we can play with: .> from (Error "ENOCHRISTMASPRESENTS") M1 {unM1 = L1 (M1 {unM1 = M1 {unM1 = K1 {unK1 = "ENOCHRISTMASPRESENTS"}}})} .> from (OK ["Books", "Calculators"]) M1 {unM1 = R1 (M1 {unM1 = M1 {unM1 = K1 {unK1 = ["Books","Calculators"]}}})} While the output is a little dense, notice how the contents of the data type are present in the K1 data type, and we choose a side of the Valid data type with L1 (“left” for Error) or R1 (“right” for OK). Now that we’ve got a generic representation, we’re able to start writing generic functions over any data type that conforms to the shape we want. As a small example, let’s write a generic function to try and get the error out of an error-containing data type. As we have a number of different types (M1, L1, etc), we use a variety of type classes to navigate the data structure: > class GetError rep e | rep -> e where > getError' :: rep a -> Maybe e > instance GetError f e => GetError (M1 i c f) e where > getError' (M1 m1) = getError' m1 > instance GetError l e => GetError (l :+: r) e where > getError' (L1 l) = getError' l > getError' (R1 _) = Nothing > instance GetError (K1 i e) e where > getError' (K1 e) = Just e > getError :: (Generic (errorLike e a), GetError (Rep (errorLike e a)) e) => errorLike e a -> Maybe e > getError = getError' . from A little daunting, but I’ll be mean and leave it as an exercise to the reader to determine how this code works. However, just to prove it does work - let’s have a play in GHCI: .> getError (Error "Oh no!") Just "Oh no!" .> getError (OK "Phew") Nothing .> getError (Left "More explosions!") Just "More explosions!" .> getError (Right "Oh, false alarm") Nothing Now that’s my idea of some Christmas magic. Undoubtedly, generic programming is not a simple concept, and it will take time to get used to it if you’re new to it. Andres Löh, a firm supporter of generic programming, has a lovely Skills Matter talk that goes into more detail about this very extension. 2014-12-16T00:00:00Z

Simple Combinators for Manipulating CGPoint/CGSize/CGRect with Swift

Planet Haskell - Wed, 12/17/2014 - 07:00
One of the most painful things about Objective-C was having to modify CGPoint, CGSize or CGRect values. The clunky struct interface made even simple modifications verbose and ugly, since struct expressions were read-only:

    CGRect imageBounds = self.view.bounds;
    imageBounds.size.height -= self.footer.bounds.size.height;
    self.imageView.bounds = imageBounds;

Even though we have auto-layout, I often find myself doing this kind of arithmetic with points, sizes or rects. In Objective-C, it required either generating dummy variables so you can modify members (as above), or really messy struct initialization syntax:

    self.imageView.bounds = (CGRect) {
        .origin = self.view.bounds.origin,
        .size = CGSizeMake(self.view.bounds.size.width,
                           self.view.bounds.size.height - self.footer.bounds.size.height) };

Fortunately, none of this boilerplate is necessary with Swift. Since Swift lets you extend even C structures with new methods, I wrote a handful of combinators that eliminate this kind of code. The above snippet can now be replaced with:

    self.imageView.bounds = self.view.bounds.mapHeight { $0 - self.footer.size.height }

I can easily enlarge a scroll view's content size to hold its pages:

    self.scrollView.contentSize = self.scrollView.bounds.size.mapWidth { $0 * CGFloat(pages.count) }

I can do calculations that previously would've required dozens of lines of code in just one or two:

    let topHalfFrame = self.view.bounds.mapHeight { $0 / 2 }
    let bottomHalfFrame = topHalfFrame.mapY { $0 + topHalfFrame.size.height }

These two lines will give me two frames that each take up half of the height of their parent view.

In cases where I simply need to set a value, I use the primitive "with..." functions:

    self.view.bounds.withX(0).withY(0).withWidth(0).withHeight(0)

Note that these methods can all be chained to create complex expressions. The code for these methods is trivial, yet they give you a huge boost in expressive power.

Code

extension CGPoint {
    func mapX(f: (CGFloat -> CGFloat)) -> CGPoint {
        return self.withX(f(self.x))
    }

    func mapY(f: (CGFloat -> CGFloat)) -> CGPoint {
        return self.withY(f(self.y))
    }

    func withX(x: CGFloat) -> CGPoint {
        return CGPoint(x: x, y: self.y)
    }

    func withY(y: CGFloat) -> CGPoint {
        return CGPoint(x: self.x, y: y)
    }
}

extension CGSize {
    func mapWidth(f: (CGFloat -> CGFloat)) -> CGSize {
        return self.withWidth(f(self.width))
    }

    func mapHeight(f: (CGFloat -> CGFloat)) -> CGSize {
        return self.withHeight(f(self.height))
    }

    func withWidth(width: CGFloat) -> CGSize {
        return CGSize(width: width, height: self.height)
    }

    func withHeight(height: CGFloat) -> CGSize {
        return CGSize(width: self.width, height: height)
    }
}

extension CGRect {
    func mapX(f: (CGFloat -> CGFloat)) -> CGRect {
        return self.withX(f(self.origin.x))
    }

    func mapY(f: (CGFloat -> CGFloat)) -> CGRect {
        return self.withY(f(self.origin.y))
    }

    func mapWidth(f: (CGFloat -> CGFloat)) -> CGRect {
        return self.withWidth(f(self.size.width))
    }

    func mapHeight(f: (CGFloat -> CGFloat)) -> CGRect {
        return self.withHeight(f(self.size.height))
    }

    func withX(x: CGFloat) -> CGRect {
        return CGRect(origin: self.origin.withX(x), size: self.size)
    }

    func withY(y: CGFloat) -> CGRect {
        return CGRect(origin: self.origin.withY(y), size: self.size)
    }

    func withWidth(width: CGFloat) -> CGRect {
        return CGRect(origin: self.origin, size: self.size.withWidth(width))
    }

    func withHeight(height: CGFloat) -> CGRect {
        return CGRect(origin: self.origin, size: self.size.withHeight(height))
    }
}

A commentary on 24 days of GHC extensions

Planet Haskell - Wed, 12/17/2014 - 07:00
Ollie Charles has continued his excellent series of Christmas Haskell blogs. This year he talks about 24 Days of GHC Extensions. His blog posts are an excellent technical introduction to various Haskell extensions. Reading them inspired me to write some non-technical comments about the various extensions; giving a little bit of history and personal comments about them.

Day 1: View Patterns

View patterns have a long history. As far as I know views were first suggested by Phil Wadler in 1987, Views: a way for pattern matching to cohabit with data abstraction. As the title of the paper suggests, it was about being able to pattern match on abstract data types. Variations of Phil's suggestions have been implemented several times (I did it both in LML and for Haskell in hbc). In all these early suggestions the conversion between the concrete type and the view type were implicit, whereas in the ViewPatterns extension the conversion is explicit. The addition of view patterns was partly spurred by the fact that F# has a very neat way of introducing pattern matching for abstract types, called active patterns. Since Simon Peyton Jones is in the same research lab as Don Syme it's natural that Simon couldn't let F# have this advantage over Haskell. :)

Day 2: Pattern Synonyms

Pattern synonyms are a more recent addition. The first time I remember hearing the idea was in the paper Abstract Value Constructors, and ever since then I wished that Haskell would have something like that. Patterns were one of the few constructs in Haskell that could not be named and reused. The first time they were added to Haskell was in the SHE preprocessor. At a Haskell hackathon in Cambridge, August 2011, Simon Peyton Jones, Simon Marlow and I met to hash out how they could be added to GHC. I wanted to go beyond simple pattern synonyms. The simplest kind is uni-directional synonyms which can only occur in patterns. The simply bi-directional synonyms can be used both in patterns and expressions, but are limited in what they can do. The explicit bi-directional synonyms allow the full power of both patterns and expressions. (A small illustration of the three kinds appears at the end of this post.) With explicit bi-directional synonyms and view patterns we can finally implement views as envisioned by Phil, but now broken down into smaller reusable pieces. The result of our discussions at the hackathon can be found here. But we were all too busy to implement any of it. All the hard implementation work, and fixing all the additional snags found, was done by Gergő Érdi. I find this a very exciting extension, and you can take it even further, like Associated Pattern Synonyms in class declarations.

Day 3: Record Wildcards

The record wildcard extension allows you to open a record and get access to all the fields without using qualified names for them. The first time I encountered this idea was in Pascal, which has a with-statement. It looks like with expr do stmt where the expr is a record value and inside the stmt all the fields of the record can be accessed directly. The construct later appeared in Ada as well, where it's called use. So having something similar in Haskell doesn't seem so far fetched. But it was actually the dual, namely constructing values using record wildcard notation, that inspired me to make this extension. In 2006 I was developing the Paradise EDSL which is a DSL embedded in Haskell for describing computations and user interfaces. I wanted a way to make the EDSL less verbose and that's why record wildcards came about. Here's a simple example to illustrate the idea. 
We want to be able to input a coordinate record.

data Coord = Coord { x :: Double, y :: Double }

coord :: P Coord
coord = do
    x <- input 0.0
    y <- input 0.0
    return Coord{..}

This says that x and y are input and that their default value is 0. We need to have an input for every field in the Coord record and at the end we need to construct the record itself. Without the record wildcard the construction would have been Coord{x=x,y=y}. This isn't too bad, but for hundreds of inputs it gets tedious. I made a quick and dirty implementation of this in GHC, but it was too ugly to get into the official release. Instead Dan Licata reimplemented (and extended) it under Simon PJ's supervision into what we have today. I'm actually quite ambivalent about this extension. It's incredibly convenient (especially in the pattern form), but it introduces new bound names without an explicit mention of the name in the binder. This makes it harder to read code. The same objection can be made about the Haskell import declaration when it lacks an import list.

Day 4: Bang Patterns

I don't have much to say about BangPatterns. One day they simply appeared in a new GHC version. :) The Clean language has long had something similar, but in Clean the annotation goes on the type instead of on the pattern. In Haskell it seems like a nice duality to have it on the pattern, since there are also lazy patterns, introduced by ~.

Day 5: Rebindable Syntax

The rebindable syntax is almost like going back to the older Haskell reports. They lacked an exact specification of where the identifiers introduced by desugaring were bound. Of course, it was resolved so that they always refer to the Prelude identifiers. But when experimenting with other preludes it's very useful to be able to override this. Which is exactly what RebindableSyntax allows you to do. In my opinion, this extension should be a little different. It ought to be possible to give a module name for where the names should be found. So normally this would be Prelude, but I'd like to be able to say RebindableSyntax=MyPrelude, and then all names introduced by desugaring will be found in MyPrelude (even if they are not in scope).

Day 6: List comprehensions

This bundles together a number of list comprehension extensions. First, MonadComprehensions, which allows list comprehension syntax for any monad. This is simply going back to Haskell 1.4, where monad comprehensions were introduced. The monad comprehension syntax had been used by Phil Wadler before that. Monad comprehensions were removed after 1.4 because they were deemed too confusing for beginners. Monad comprehensions were brought back to GHC by George Giorgidze et al. The ParallelListComp extension allows zip-like operations to be expressed in the comprehension. The idea originates from Galois in the design of Cryptol. John Launchbury, Jeff Lewis, and Thomas Nordin were turning crypto networks into recursive equations and they wanted a nicer notation than using zipWith. (Thanks to Andy Adams-Moran for the history lesson.) The origins of TransformListComp are simple. It's just a way to bring more of the power of SQL into list comprehensions. It's an extension that introduces far too much special syntax for my taste, but the things you can do are quite neat.

2014-12-15T17:36:00Z augustss noreply@blogger.com
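As a small companion to Days 1 and 2 above, here is a tiny sketch (not from the original post) of a view pattern together with the three kinds of pattern synonym; the Seq-based Empty synonym is a commonly used illustration of the explicitly bi-directional form:

{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

import           Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Uni-directional: only usable in patterns.
pattern Head x <- (x : _)

-- Simply bi-directional: usable in both patterns and expressions.
pattern Singleton x = [x]

-- Explicitly bi-directional: the match side uses a view pattern,
-- the build side is given separately in the where clause.
pattern Empty <- (Seq.viewl -> Seq.EmptyL)
  where Empty = Seq.empty

describe :: Seq Int -> String
describe Empty = "nothing here"
describe _     = "something here"

firstOrZero :: [Int] -> Int
firstOrZero (Head x) = x
firstOrZero _        = 0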

Stackage: Survey results, easier usage, and LTS Haskell 0.X

Planet Haskell - Wed, 12/17/2014 - 07:00
There was tremendous response to our Stackage survey, so I'd like to say: thank you to everyone who participated, the feedback was invaluable. Additionally, in the past two weeks, I think we've added around 100 new packages to Stackage based on everyone's pull requests, so again, thank you to everyone who got involved. You can view the survey results yourself. Of particular interest to me were the freeform responses, which gave us a lot of valuable information.

Chris Done and I went over the results together, and by far, the strongest impression that we got was that the Stackage setup process was too onerous. Lack of direct cabal-install support and the need to choose from among six possible snapshots were very problematic. Furthermore, some people found the homepage confusing, and didn't understand from it why they should use Stackage. There was also fear that, by using Stackage, you'd end up missing out on some important packages, either because those packages aren't present, or because it's unclear how to add new packages.

So today, we're announcing a number of changes on stackage.org to address these concerns, as well as to pave the way for the upcoming LTS Haskell release. These changes are still a work in progress, so please give us feedback (or feel free to send a pull request as well).

Simplified choices

In order to use Stackage, you first had to choose GHC 7.8, GHC 7.8 + Haskell Platform, or GHC 7.6. You then had to decide if you wanted exclusive vs inclusive. Once we add LTS Haskell, that's another choice to add to the mix. So we've decided to simplify the options advertised on the homepage to two:

LTS Haskell (URL: /lts)
Stackage Nightly (URL: /nightly)

Each of these will be compatible with only one version of GHC (7.8.3 for now). Another piece of feedback is that users are by far using inclusive more commonly than exclusive. So we're going to default to giving inclusive instructions. One important thing to keep in mind is that this will not affect current users at all. All snapshots currently hosted on stackage.org will remain there in perpetuity. We're talking about discovery for new users here.

Simplified setup

Until now, we've recommended setting up Stackage by changing your remote-repo setting to point to the appropriate stackage.org URL. In October, Greg Weber came up with the idea of generating a cabal.config file to specify constraints instead of using a manual URL. We've decided to make this the preferred Stackage setup method. This provides a number of benefits for you immediately:

Works directly with cabal, without needing a bleeding-edge version with remote-repo support
It's fully supported by cabal sandboxes
It's easy to tweak your version requirements if desired
You keep Hackage server as your package source, which may be desired by some

The downsides with this are:

There are a few bugs in cabal-install around cabal.config support, see the issue for more information
This approach only works for "inclusive"-style snapshots. However, as we're now recommending inclusive as the default mechanism, this makes sense. The cabal.config file contains an optional remote-repo line which you can uncomment to get back exclusive-style snapshots.
There are some concerns about Hackage server's reliability. 
If you'd like to have a more reliable server, FP Complete offers hackage.fpcomplete.com as an alternative remote-repo, hosted by Amazon S3.

As a result of this change, getting set up with Stackage is now as easy as downloading a cabal.config file, placing it in your project directory, and running cabal install. Our homepage has easy to use instructions for this as well.

More focused homepage

Speaking of the homepage: we've updated it to:

Give easy-to-use installation instructions
Give a clear, concise explanation of what Stackage is and the problems it solves
Provide a link for installation instructions, for people without a Haskell toolchain installed
Provide information on how to deal with a package not included in Stackage
Provide a link for package authors to get involved with Stackage

Relevant Github issue with more details.

More informative snapshot pages

The snapshot pages now list all packages in the snapshot, together with version numbers, synopses, and documentation links (if available). The setup instructions are also much simpler on each snapshot page. We've also set up nicer URLs for the commonly used snapshots. In particular:

/nightly will take you to the latest nightly
/nightly/2014-12-15 will take you to the December 15, 2014 nightly
/lts will take you to the latest LTS
/lts/1 will take you to the latest LTS in the 1 series
/lts/1.3 will take you to LTS 1.3

Relevant Github issue with more details.

More informative package pages

We've streamlined the package pages to provide the most pertinent information. Have a look for yourself. Of particular interest, we now have inline links for Haddock documentation. You can now very easily start browsing docs from just looking at a package page. Relevant Github issue with more details.

New installation instructions

There's now a dedicated installation instructions page targeted at people without a Haskell installation. This page is controlled by a Markdown file on Github, and pull requests to improve content are very much welcome!

LTS repo has started, updated Stackage codebase

I've created the LTS Haskell repo. I'm populating it with 0.X releases now as pre-release testing. To reiterate: LTS Haskell is not launched yet, and I will be holding off on an official 1.0 until January 1. So if you have packages you want in LTS, you still have two weeks to get them in.

I've also done a major overhaul of the Stackage codebase itself to make for far more reliable builds. There are lots of details involved, but they're probably not too terribly interesting to most readers. The important takeaways are:

Each build is now fully represented by a YAML file that contains a lot of useful metadata
There are completely automated executables to create new LTS and nightly releases
The codebase is well set up to create reusable binary package databases, if anyone's interested in doing that (I know we'll be using it at FP Complete)

Stopping future GHC 7.6/Haskell Platform builds (?)

This decision is still up for discussion, but my plan is to discontinue Stackage daily builds for GHC 7.6 and GHC 7.8 + Haskell Platform. The reasons are:

It puts a large burden on package authors to maintain their packages with old dependencies, which is precisely the opposite of what we want to do with LTS Haskell
Very few people are using GHC 7.6
There are some fundamental problems with the current Haskell Platform package set. I hope these are addressed - hopefully by unifying with LTS Haskell. 
But currently, the package sets based on Haskell Platform are inherently buggy by using package versions with known deficiencies.

If you're using a Haskell Platform installed toolchain now, I recommend trying out the new installation instructions to get a toolchain that will be fully compatible with both LTS Haskell and Stackage Nightly.

Future: GHC upgrade policy

One future policy decision we'll need to make is: when do we upgrade to a new version of GHC? My proposed plan is that, once we get a successful nightly build with a new GHC version, we stop generating nightlies for the old version. For LTS Haskell, we'll use whatever version of GHC is used by the nightlies at the time we take a snapshot. The upshot of this will be that, at any given time, there will be at most two GHC versions supported by the official Stackage snapshots: whatever nightly supports, and the version used by LTS, which may be one version behind.
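To make the cabal.config workflow described above concrete, the setup boils down to something like the following; the exact download URL is an assumption based on the /lts path mentioned earlier:

# Fetch the constraints file for the snapshot you want into your project
# directory (the URL here is illustrative), then build as usual --
# optionally inside a sandbox.
$ wget https://www.stackage.org/lts/cabal.config
$ cabal sandbox init
$ cabal update
$ cabal install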

24 Days of GHC Extensions: Functional Dependencies

Planet Haskell - Wed, 12/17/2014 - 07:00
> {-# LANGUAGE FlexibleInstances #-} > {-# LANGUAGE MultiParamTypeClasses #-} > {-# LANGUAGE FunctionalDependencies #-} > import Data.Foldable (forM_) > import Data.IORef Over the last few days we’ve seen a few different ways to model the class of mutable variables using type classes. First of all we saw that we could use type families to associate the type of monad with the type of mutable variables, and yesterday we saw that we could almost take the same approach using multi-parameter type classes. Unfortunately, when we moved to multi-parameter type classes, the type inference engine became a little less useful, as there are multiple possible choices of monad for any given mutable variable. What we really wanted to do with the multiple types was to model a relation between the types - knowing the type of the mutable variable should be enough to inform us as to the type of monad. By using the FunctionalDependencies extension, we have the ability to augment a type class with information about functional dependencies - a concept you might already be familiar with from relational database theory. Loosely speaking, a functional dependency lets us indicate that a given set of one or more types determine the type of a single other type. The notation for this is to indicate a list of types and then use an arrow (->) to note the dependency. Revisiting our mutable variables type class, we can now write: > class Store store m | store -> m where > new :: a -> m (store a) > get :: store a -> m a > put :: store a -> a -> m () This is the same type class as yesterday, but we have now indicated that store determines m. We are able to re-use the existing instances unchanged: > instance Store IORef IO where > new = newIORef > get = readIORef > put ioref a = modifyIORef ioref (const a) However, now when we ask for the type of yesterday’s ex function and choose IORef as our store, GHC will be able to infer that the type of m must be IO - as that was determined by the instance above: .> :t ex ex :: [Present] -> IO [Present] Perfect! While this may seem inconsequential, this use of functional dependencies to direct the type inference engine is significant if we want to build practical libraries. While it’s great to be able to do a lot with types, many agree that giving up type inference can be too much of a cost. That said, the fun doesn’t stop there - as functional dependencies and multi-parameter type classes really do start to capture relations between types we can start using type classes as a form of logic programming. A prime example of this is the paper Fun with Functional Dependencies. Another example is in the work around the HList library - discussed in the paper Strongly Typed Heterogeneous Collections. To recap, here is yesterday’s code: > type Present = String > storePresents :: (Store store m, Monad m) => [Present] -> m (store [Present]) > storePresents xs = do > store <- new [] > forM_ xs $ \x -> do > old <- get store > put store (x : old) > return store > > ex ps = do > store <- storePresents ps > get (store :: IORef [Present]) This post is part of 24 Days of GHC Extensions - for more posts like this, check out the calendar. 2014-12-14T00:00:00Z

24 Days of GHC Extensions: Multi-parameter Type Classes

Planet Haskell - Wed, 12/17/2014 - 07:00
Over the last few days, we’ve looked at a few extensions that can extend the notion of type classes in Haskell. First, we saw that nullary type classes remove the requirement that a type class varies over a single type by allowing it to mention no types at all, and yesterday we saw how type families can be used to associate more types against a single type. Today, we’re going to revisit yesterday’s example and use the multi-parameter type classes extension. > {-# LANGUAGE FlexibleInstances #-} > {-# LANGUAGE MultiParamTypeClasses #-} > import Control.Monad.Trans.Class (lift) > import Control.Monad.Trans.Reader (ReaderT) > import Data.Foldable (forM_) > import Data.IORef The extension does just what it says on the tin - with MultiParamTypeClasses enabled, GHC removes the constraint that a type class can mention only a single type. Now, we’re able to have our type class mention multiple types at once. Lifting this constraint has significant consequences: where a type class over one type models a set of types, a type class over multiple types lets us model relations between types. The latter is interesting, though beyond the scope of this article. Interested readers are pointed to Oleg Kiselyov’s home page - which is full of mind bending tricks with type classes! Yesterday, we looked at a traditional example around type classes - modelling the class of types that represent mutable variables. We used type families to associate the type of monad with each mutable variable, reaching the following API: class Store store where type StoreMonad store :: * -> * new :: a -> (StoreMonad store) (store a) get :: store a -> (StoreMonad store) a put :: store a -> a -> (StoreMonad store) () However, the API that we ended at is a little obtuse - those types take quite a bit of mental parsing to understand. Conceptually, we can think of mutable variables as having a relationship between types - the type of a mutable variable is related to the type of its monad. Using MultiParamTypeClasses, we can encode just this idea - we simply vary the type class over both the variable type and its monad: > class Store store m where > new :: a -> m (store a) > get :: store a -> m a > put :: store a -> a -> m () This API is much easier to understand! Furthermore, because the type class itself mentions the type of monad, using this type in our programs is straightforward. We can port over yesterday’s example with only changes to the type: > type Present = String > storePresents :: (Store store m, Monad m) => [Present] -> m (store [Present]) > storePresents xs = do > store <- new [] > forM_ xs $ \x -> do > old <- get store > put store (x : old) > return store I’m sure you’ll agree, that’s a much more manageable type. All that is left now is to provide instances for our type class: > instance Store IORef IO where > new = newIORef > get = readIORef > put ioref a = modifyIORef ioref (const a) Again, very little has changed from yesterday’s code here - we just move the type of monad up to the type class instance declaration, rather than using an associated type. So far I’ve put the extension in a great light, but there is a caveat: the use of multi-parameter type classes can lead to ambiguity during type checking. This can be a huge problem when writing large applications, as it means we now have to annotate our programs extensively. To look at this problem in more detail, let’s look at using the storePresents function we wrote earlier. 
If we build a store out of a list of Presents as an IORef and then query for the contents of the IORef, something peculiar seems to happen: > ex ps = do > store <- storePresents ps > get (store :: IORef [Present]) What would you expect the type of this function to be? We’ve chosen IORef as our store, and IORefs are associated with the IO monad, so we have ex :: [Present] -> IO [Present], right? Let’s see what GHCI makes of it: .> :t ex ex :: (Store IORef m, Monad m) => [Present] -> m [Present] That’s odd! GHCI clearly knows that the variable type itself is an IORef, but that’s not enough information to determine the type of monad. For example, another equally valid definition of Store IORef would be: > instance Store IORef (ReaderT () IO) where > new = lift . newIORef > get = lift . readIORef > put ioref a = lift (modifyIORef ioref (const a)) The problem we’re encountering is that multi-parameter type classes don’t add any information to the type inference engine - because knowing one type doesn’t let you know anything about the other types. However, we needn’t abandon hope here - this problem can be solved, it just needs another extension (oh, of course!). This post is part of 24 Days of GHC Extensions - for more posts like this, check out the calendar. 2014-12-13T00:00:00Z

Unification-fd tutorial (part 1/n)

Planet Haskell - Wed, 12/17/2014 - 07:00
A while back I released the unification-fd library, which gives a generic implementation of first-order unification of non-cyclic terms. I've given a few talks on how the library is implemented and what optimizations it performs, but that's not the topic for today. Today, I'm going to talk about how to use it. Unification is a widely useful operation and, consequently, comes in many different flavors. The version currently supported by the library is the sort used by logic programming languages like Prolog, Curry, Dyna, and MiniKanren; which is the same sort that's used for unification-based type inference algorithms like Hindley–Damas–Milner. Of these two examples, the logic programming example is the simpler one to discuss— at least for folks who've used a language like Prolog before. So let's start from there. Caveat Emptor: This post is something of a stream of consciousness. I've gotten a few requests for tutorials on how to use the library, but the requests weren't terribly specific about what problems people've had or what's been difficult to figure out. So I'm shooting in the dark as far as what folks need and how much background they have. I'm going to assume you're familiar with Prolog and the basics of what unification is and does. Preemptive apology: I started writing this post months and months (and months) ago, but unintentionally dropped it after running into a certain issue and then getting distracted and moving onto other things. Actually, this happened at least twice. I'm terribly sorry about that. So, apologies for not tackling the disjunction issue in this post. I'll come back to it later, but figured this post really needs to get out the door already. Logic Terms A term, in Prolog, is just a fancy name for a value of some algebraic data type. In most variants of Prolog there's no explicit definition of the ADT, no restriction on what the constructors are, and no type checking to ensure that subterms have a particular shape. That is, Prolog is what's called a single-sorted logic; in other words, Prolog is an untyped/unityped language. With unification-fd we can implement multi-sorted (aka typed) logics, but for this tutorial we're going to stick with Prolog's single-sorted approach. Opening up Control.Unification we'll see a handful of types and type classes, followed by a bunch of operators. The UTerm data type captures the recursive structure of logic terms. (UTerm is the free monad, if you're familiar with that terminology.) That is, given some functor t which describes the constructors of our logic terms, and some type v which describes our logic variables, the type UTerm t v is the type of logic terms: trees with multiple layers of t structure and leaves of type v. For our single-sorted logic, here's an implementation of t: data T a = T String [a] The String gives the name of the term constructor, and the list gives the ordered sequence of subterms. Thus, the Prolog term foo(bar,baz(X)) would be implemented as UTerm$T "foo" [UTerm$T "bar" [], UTerm$T "baz" [UVar x]]. If we're going to be building these terms directly, then we probably want to define some smart constructors to reduce the syntactic noise: foo x y = UTerm$T "foo" [x,y] bar = UTerm$T "bar" [] baz x = UTerm$T "baz" [x] Now, we can implement the Prolog term as foo bar (baz x). If you prefer a more Prolog-like syntax, you can use uncurried definitions for smart constructors that take more than one argument. 
Unifiable In order to use our T data type with the rest of the API, we'll need to give a Unifiable instance for it. Before we do that we'll have to give Functor, Foldable, and Traversable instances. These are straightforward and can be automatically derived with the appropriate language pragmas. The Unifiable class gives one step of the unification process. Just as we only need to specify one level of the ADT (i.e., T) and then we can use the library's UTerm to generate the recursive ADT, so too we only need to specify one level of the unification (i.e., zipMatch) and then we can use the library's operators to perform the recursive unification, subsumption, etc. The zipMatch function takes two arguments of type t a. The abstract t will be our concrete T type. The abstract a is polymorphic, which ensures that we can't mess around with more than one level of the term at once. If we abandon that guarantee, then you can think of it as if a is UTerm T v. Thus,t a means T (UTerm T v); and T (UTerm T v) is essentially the type UTerm T v with the added guarantee that the values aren't in fact variables. Thus, the arguments to zipMatch are non-variable terms. The zipMatch method has the rather complicated return type: Maybe (t (Either a (a,a))). Let's unpack this a bit by thinking about how unification works. When we try to unify two terms, first we look at their head constructors. If the constructors are different, then the terms aren't unifiable, so we return Nothing to indicate that unification has failed. Otherwise, the constructors match, so we have to recursively unify their subterms. Since the T structures of the two terms match, we can return Just t0 where t0 has the same T structure as both input terms. Where we still have to recursively unify subterms, we fill t0 with Right(l,r) values where l is a subterm of the left argument to zipMatch and r is the corresponding subterm of the right argument. Thus, zipMatch is a generalized zipping function for combining the shared structure and pairing up substructures. And now, the implementation: instance Unifiable T where zipMatch (T m ls) (T n rs) | m /= n = Nothing | otherwise = T n <$> pairWith (\l r -> Right(l,r)) ls rs Where list-extras:Data.List.Extras.Pair.pairWith is a version of zip which returns Nothing if the lists have different lengths. So, if the names m and n match, and if the two arguments have the same number of subterms, then we pair those subterms off in order; otherwise, either the names or the lengths don't match, so we return Nothing. Feature Structures For the T example, we don't need to worry about the Left option. The reason it's there is to support feature structures and other sparse representations of terms. That is, consider the following type: newtype FS k a = FS (Map k a) Using this type, our logic terms are sets of key–subterm pairs. When unifying maps like these, what do we do if one argument has a binding for a particular key but the other argument does not? In the T example we assumed that subterms which couldn't be paired together (because the lists were different lengths) meant the unification must fail. But for FS it makes more sense to assume that terms which can't be paired up automatically succeed! That is, we'd like to assume that all the keys which are not explicitly present in the Map k a are implicitly present and each one is bound to a unique logic variable. 
Since the unique logic variables are implicit, there's no need to actually keep track of them, we'll just implicitly unify them with the subterm that can't be paired off. This may make more sense if you see the Unifiable instance: instance (Ord k) => Unifiable (FS k) where zipMatch (FS ls) (FS rs) = Just . FS $ unionWith (\(Left l) (Left r) -> Right(l,r)) (fmap Left ls) (fmap Left rs) We start off by mapping Left over both the ls and the rs. We then call unionWith to pair things up. For any given key, if both ls and rs specify a subterm, then these subterms will be paired up as Right(l,r). If we have extra subterms from either ls or rs, however, then we keep them around as Left l or Left r. Thus, the Unifiable instance for FS performs a union of the FS structure, whereas the instance for T performs an intersection of T structure. The Left option can be used in any situation where you can immediately resolve the unification of subterms, whereas the Right option says you still have work to do.1 Logic Variables The library ships with two implementations of logic variables. The IntVar implementation uses Int as the names of variables, and uses an IntMap to keep track of the environment. The STVar implementation uses STRefs, so we can use actual mutation for binding logic variables, rather than keeping an explicit environment around. Of course, mutation complicates things, so the two implementations have different pros and cons. Performing unification has the side effect of binding logic variables to terms. Thus, we'll want to use a monad in order to keep track of these effects. The BindingMonad type class provides the definition of what we need from our ambient monad. In particular, we need to be able to generate fresh logic variables, to bind logic variables, and to lookup what our logic variables are bound to. The library provides the necessary instances for both IntVar and STVar. You can, of course, provide your own implementations of Variable and BindingMonad. However, doing so is beyond the scope of the current tutorial. For simplicity, we'll use the IntVar implementation below. Example Programs When embedding Prolog programs into Haskell, the main operators we want to consider are those in the section titled "Operations on two terms". These are structural equality (i.e., equality modulo substitution), structural equivalence (i.e., structural equality modulo alpha-variance), unification, and subsumption. Consider the following Horn clause in Prolog: example1(X,Y,Z) :- X = Y, Y = Z. To implement this in Haskell we want a function which takes in three arguments, unifies the first two, and then unifies the second two. Thus,2 example1 x y z = do x =:= y y =:= z To run this program we'd use one of the functions runIntBindingT, evalIntBindingT, or execIntBindingT, depending on whether we care about the binding state, the resulting logic term, or both. Of course, since the unifications may fail, we also need to run the underlying error monad, using something like runErrorT3,4. And since these are both monad transformers, we'll need to use runIdentity or the like in order to run the base monad. Thus, the functions to execute the entire monad stack will look like: -- Aliases to simplify our type signatures. N.B., the -- signatures are not actually required to get things -- to typecheck. 
type PrologTerm = UTerm T IntVar type PrologFailure = UnificationFailure T IntVar type PrologBindingState = IntBindingState T -- N.B., the @FallibleBindingMonad@ isn't yet a monad -- for Prolog because it doesn't support backtracking. type FallibleBindingMonad = ErrorT PrologFailure (IntBindingT T Identity) -- N.B., this definition runs into certain issues. type PrologMonad = ErrorT PrologFailure (IntBindingT T Logic) runFBM :: FallibleBindingMonad a -> (Either PrologFailure a, PrologBindingState) runFBM = runIdentity . runIntBindingT . runErrorT Here are some more examples: -- A helper function to reduce boilerplate. First we get -- a free variable, then we embed it into @PrologTerm@, -- and then we embed it into some error monad (for -- capturing unification failures). getFreeVar = lift (UVar <$> freeVar) -- example2(X,Z) :- X = Y, Y = Z. example2 x z = do y <- getFreeVar x =:= y y =:= z -- example3(X,Z) :- example1(X,Y,Z). example3 x z = do y <- getFreeVar example1 x y z -- example4(X) :- X = bar; X = backtrack. example4 x = (x =:= bar) <|> (x =:= atom "backtrack") The complete code for this post can be found here online, or at ./test/tutorial/tutorial1.hs in the Darcs repo. Notably, there are some complications about the semantics of example4; it doesn't mean what you think it should mean. We'll tackle that problem and fix it later on in the tutorial series (in part 4 or thereabouts). Term Factoring and Clause Resolution Automata (CRAs) Note that for the above examples, the Haskell functions only execute the right-hand side of the Horn clause. In Prolog itself, there's also a process of searching through all the Horn clauses in a program and deciding which one to execute next. A naive way to implement that search process would be to have a list of all the Horn clauses and walk through it, trying to unify the goal with the left-hand side of each clause and executing the right-hand side if it matches. A more efficient way would be to compile all the right-hand sides into a single automaton, allowing us to match the goal against all the right-hand sides at once. (The idea here is similar to compiling a bunch of strings together into a trie or regex.) Constructing optimal CRAs is NP-complete in general, though it's feasible if we have an arbitrary ordering of clauses (e.g., Prolog's top–down order for trying each clause). The unification-fd library does not implement any support for CRAs at present, though it's something I'd like to add in the future. For more information on this topic, see Dawson et al. (1995) Optimizing Clause Resolution: Beyond Unification Factoring and Dawson et al. (1996) Principles and Practice of Unification Factoring. Other operators In addition to unification itself, it's often helpful to have various other operators on hand. One such operator is the subsumption operator. Whereas unification looks for a most-general substitution which when applied to both arguments yields terms which are structurally equal (i.e., l =:= r computes the most general s such that s l === s r), subsumption applies the substitution to only one side. That is, l subsumes r just in case r is a substitution instance of l (i.e., there exists a substitution s such that s l === r). The symbolic name (<:=) comes from the fact that when l subsumes r we also say that l is less defined5 than r. Subsumption shows up in cases where we have to hold r fixed for some reason, such as when implementing polymorphism or subtyping. 
Other operators work on just one term, such as determining the free variables of a term, explicitly applying the ambient substitution to obtain a pure term, or cloning a term to make a copy where all the free variables have been renamed to fresh variables. These sorts of operators aren't used very often in logic programming itself, but are crucial for implementing logic programming languages. Conclusion Hopefully that gives a quick idea of how the library's API is set up. Next time I'll walk through an implementation of Hindley–Damas–Milner type inference, and then higher-ranked polymorphism à la Peyton Jones et al. (2011) Practical type inference for arbitrary-rank types. After that, I'll discuss the complications about backtracking choice I noticed when writing this post, and walk through how to fix them. If there's still interest after that, I can get into some of the guts of the library's implementation— like ranked path compression, maximal structure sharing, and so on. If you have any particular questions you'd like me to address, drop me a line. [1] Older versions of the library used the type zipMatch :: forall a b. t a -> t b -> Maybe (t (a,b)) in order to ensure that we did in fact properly pair up subterms from the two arguments. Unfortunately I had to relax that guarantee in order to add support for feature structures. ↩ [2] N.B., a more efficient implementation is: example1' x y z = do y' <- x =:= y y' =:= z The unification operator returns a new term which guarantees maximal structure sharing with both of its arguments. The implementation of unification makes use of observable structure sharing, so by capturing y' and using it in lieu of y, the subsequent unifications can avoid redundant work. ↩ [3] The ErrorT transformer was deprecated by transformers-0.4.1.0, though it still works for this tutorial. Unfortunately, the preferred ExceptT does not work since UnificationFailure doesn't have a Monoid instance as of unification-fd-0.9.0. The definition of UnificationFailure already contains a hack to get it to work with ErrorT, but future versions of the library will remove that hack and will require users to specify their own monoid for combining errors. The First monoid captures the current behavior, though one may prefer to use other monoids such as a monoid that gives a trace of the full execution path, or witnesses for all the backtracks, etc. ↩ [4] To be honest, I don't entirely recall why I had the error monad explicitly separated out as a monad transformer over the binding monad, rather than allowing these two layers to be combined. Since it's so awkward, I'm sure there was some reason behind it, I just failed to make note of why. If there turns out to be no decent reason for it, future versions of the library may remove this fine-grain distinction. ↩ [5] The symbolic name for subsumption is chosen to reflect the meaning of more/less defined (rather than more/less grounded) so that the subsumption ordering coincides with the domain ordering (think of logic variables as being bottom). This is the standard direction for looking at subsumption; though, of course, we could always consider the dual ordering instead. ↩ 2014-12-12T21:22:07Z

Test suite for Haskell2010

Planet Haskell - Wed, 12/17/2014 - 07:00
To keep track of progress and to ward off regressions, the test suite now has a section for Haskell2010 compatibility checks:

# runhaskell Main.hs -t Haskell2010 --plain | tail -n 4
         Test Cases  Total
 Passed  0           0
 Failed  6           6
 Total   6           6

The tests only cover a small part of the Haskell2010 specification and none of them pass yet.

24 Days of GHC Extensions: Type Families

Planet Haskell - Wed, 12/17/2014 - 07:00
Today, we're going to look at an extension that radically alters the behavior of GHC Haskell by extending what we can do with types. The extension that we're looking at is known as type families, and it has a wide variety of applications.

> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE TypeFamilies #-}
> import Control.Concurrent.STM
> import Control.Concurrent.MVar
> import Data.Foldable (forM_)
> import Data.IORef

As the extension is so large, we're only going to touch the surface of its capabilities - though this extension is well documented, so there's plenty of extra reading for those who are interested!

Associated Types

To begin, let's look at the interaction of type families and type classes. In ordinary Haskell, a type class can associate a set of methods with a type. The type families extension now allows us to also associate types with a type. As an example, let's try to abstract over the various mutable stores that we have available in Haskell. In the IO monad, we can use IORefs and MVars to store data, whereas other monads have their own specific stores, as we'll soon see. To begin with, we'll start with a class over the different types of store:

> class IOStore store where
>   newIO :: a -> IO (store a)
>   getIO :: store a -> IO a
>   putIO :: store a -> a -> IO ()

This works fine for IO stores: we can add an instance for MVar…

> instance IOStore MVar where
>   newIO = newMVar
>   getIO = readMVar
>   putIO mvar a = modifyMVar_ mvar (return . const a)

and an instance for IORef:

> instance IOStore IORef where
>   newIO = newIORef
>   getIO = readIORef
>   putIO ioref a = modifyIORef ioref (const a)

Now we have the ability to write functions that are polymorphic over stores:

> type Present = String
>
> storePresentsIO :: IOStore store => [Present] -> IO (store [Present])
> storePresentsIO xs = do
>   store <- newIO []
>   forM_ xs $ \x -> do
>     old <- getIO store
>     putIO store (x : old)
>   return store

While this example is obviously contrived, hopefully you can see how we are able to interact with a memory store without choosing which store we are committing to. We can use this by choosing the type we need, as the following GHCi session illustrates:

.> s <- storePresentsIO ["Category Theory Books"] :: IO (IORef [Present])
.> :t s
s :: IORef [Present]
.> get s
["Category Theory Books"]

Cool - now we can go and extend this to TVar and other STM cells! Ack… there is a problem. Reviewing our IOStore type class, we can see that we've committed to working in the IO monad - and that's a shame. What we'd like to be able to do is associate the type of monad with the type of store we're using - as knowing the store tells us the monad that we have to work in. To use type families, we use the type keyword within the class definition, and specify the kind of the type:

> class Store store where
>   type StoreMonad store :: * -> *
>   new :: a -> (StoreMonad store) (store a)
>   get :: store a -> (StoreMonad store) a
>   put :: store a -> a -> (StoreMonad store) ()

As you can see, the types of the methods in the type class have become a little more complicated. Rather than working in the IO monad, we calculate the monad by using the StoreMonad type family.

The instances are similar to what we saw before, but we also have to provide the necessary type of monad:

> instance Store IORef where
>   type StoreMonad IORef = IO
>   new = newIORef
>   get = readIORef
>   put ioref a = modifyIORef ioref (const a)
>
> instance Store TVar where
>   type StoreMonad TVar = STM
>   new = newTVar
>   get = readTVar
>   put tvar a = modifyTVar tvar (const a)

As you can see - our methods don't need to change at all; type families naturally extend the existing type class functionality. Our original storePresentsIO can now be made to work in any monad, with only a change to the type:

> storePresents :: (Store store, Monad (StoreMonad store))
>               => [Present] -> (StoreMonad store) (store [Present])
> storePresents xs = do
>   store <- new []
>   forM_ xs $ \x -> do
>     old <- get store
>     put store (x : old)
>   return store

As we have an instance for Store TVar, we can now use this directly in an STM transaction:

.> atomically (do (storePresents ["Distributed Computing Through Combinatorial Topology"] :: STM (TVar [Present])) >>= get)
["Distributed Computing Through Combinatorial Topology"]

Awesome!

Type Families and Computation

What we've seen so far is extremely useful, but the fun needn't stop there! Type families also give us the ability to compute over types! Traditionally, Haskell is built around value-level computation - running programs should do something. That said, we all know how useful it is to have functions - so why can't we have them at the type level? Well, now that we have the ability to associate types with types, we can! To look at this new functionality (closed type families), we need a few more extensions to really unlock the potential here, so I'll finish this blog post on that cliffhanger. Watch this space!

This post is part of 24 Days of GHC Extensions - for more posts like this, check out the calendar. 2014-12-12T00:00:00Z
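For completeness, here is a hypothetical GHCi session (assuming the definitions above are loaded) showing the same polymorphic function used at the IORef instance, back in the IO monad:

.> s <- storePresents ["Category Theory Books"] :: IO (IORef [Present])
.> get s
["Category Theory Books"]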

File system snapshots make build scripts easy

Planet Haskell - Wed, 12/17/2014 - 07:00
or, how Docker can relieve the pain of developing long-running build scripts

I think I've found a pretty compelling use case for Docker. But before you think that this is yet another blog post parroting the virtues of Docker, I'd like to make clear that this post is really about the virtues of treating your file system as a persistent data structure. Thus, the insights of this post are equally applicable to other copy-on-write filesystems such as btrfs and ZFS.

The problem

Let's start with the problem I was trying to solve. I was developing a long-running build script that consisted of numerous steps. It took 1-2 hours to run. It downloaded many fairly large files from the Internet. (One exceeded 300M.) Later stages depended heavily on libraries built in earlier stages. But the most salient feature was that it took a long time to run.

Filesystems are inherently stateful

We typically interact with filesystems in a stateful way. We might add, delete or move a file. We might change a file's permissions or its access times. In isolation most actions can be undone; e.g. you can move a file back to its original location after having moved it somewhere else. What we don't typically do is take a snapshot and revert back to that state. This post will suggest that making more use of this feature can be a great boon to developing long-running build scripts.

Snapshots using union mounts

Docker uses a union filesystem called AUFS. A union filesystem implements what is known as a union mount. As the name suggests, this means that the files and directories of separate filesystems are layered on top of each other, forming a single coherent filesystem. This is done in a hierarchical manner. If a file appears in two filesystems, the one further up the hierarchy is the one presented. (The version of the file further down the hierarchy is there, unchanged, but invisible.) Docker calls each filesystem in the union mount a layer. The upshot of using this technology is that it implements snapshots as a side effect: each snapshot is simply a union mount of all the layers up to a certain point in the hierarchy.

Snapshots for build scripts

Snapshots make developing a long-running build script a dream. The general idea is to break the script up into smaller scripts (which I like to call scriptlets) and run each one individually, snapshotting the filesystem after each one is run. (Docker does this automatically.) If a scriptlet fails, one simply goes back to the last snapshot (still in its pristine state!) and tries again. Once you have completed your build script you have very high assurance that the script works and can now be distributed to others.

Contrast this with what would happen if you weren't using snapshots. Except for those among us with monk-like patience, no one is going to run their build script from scratch when it fails an hour and a half into building. Naturally, we'll try our best to put the system back into the state it was in before we try to build the component that failed last time; e.g. we might delete a directory or run a make clean. However, we might not have a perfect understanding of the component we're trying to build. It might have a complicated Makefile that puts files in places on the file system which we are unaware of. The only way to be truly sure is to revert to a snapshot.

Using Docker for snapshotted build scripts

In this section I'll cover how I used Docker to implement a build script for a GHC 7.8.3 ARM cross compiler.

Docker was pretty good for this task, but not perfect. I did some things that might look wasteful or inelegant but were necessary in order to keep the total time developing the script to a minimum. The build script can be found here.

Building with a Dockerfile

Docker reads from a file called Dockerfile to build images. A Dockerfile contains a small vocabulary of commands that specify what actions should be performed. A complete reference can be found here. The main ones used in my script are WORKDIR, ADD, and RUN. The ADD command is particularly useful because it allows you to add files that are external to the current Docker image into the image's filesystem before running them. You can see the many scriptlets that make up the build script here.

Design

1. ADD scriptlets just before you RUN them.

If you ADD all the scriptlets too early in the Dockerfile you may run into the following problem: your script fails, you go back to modify the scriptlet and you run docker build . again - only to find that Docker starts building at the point where the scriptlets were first added! This wastes a lot of time and defeats the purpose of using snapshots.

This happens because of how Docker tracks its intermediate images (snapshots). As Docker steps through the Dockerfile it compares the current command with an intermediate image to see if there is a match. However, in the case of the ADD command the contents of the files being put into the image are also examined. This makes sense: if the files have changed with respect to an existing intermediate image, Docker has no choice but to build a new image from that point onwards. There's just no way it can know that those changes don't affect the build, and even if they wouldn't, it must be conservative.

Also, beware of using RUN commands that would cause different changes to the filesystem each time they are run. In this case Docker will find the intermediate image and use it, but this will be the wrong thing for it to do. RUN commands must cause the same change to the filesystem each time they are run. As an example, I ensured that in my scriptlets I always downloaded a known version of a file with a specific MD5 checksum. A more detailed explanation of Docker's build cache can be found here.

2. Don't use the ENV command to set environment variables. Use a scriptlet.

It may seem tempting to use the ENV command to set up all the environment variables you need for your build script. However, it does not perform variable substitution the way a shell would; e.g. ENV BASE=$HOME/base will set BASE to the literal value $HOME/base, which is probably not what you want. Instead I used the ADD command to add a file called set-env.sh. This file is included in each subsequent scriptlet with:

THIS_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
source $THIS_DIR/set-env-1.sh

What if you don't get set-env.sh right the first time around? Since it's added so early in the Dockerfile, doesn't this mean that modifying it would invalidate all subsequent snapshots? Yes, and this leads to some inelegance. While developing the script I discovered that I'd missed adding a useful environment variable in set-env.sh. The solution was to create a new file set-env-1.sh containing:

THIS_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
source $THIS_DIR/set-env.sh

if ! [ -e "$CONFIG_SUB_SRC/config.sub" ] ; then
  CONFIG_SUB_SRC=${CONFIG_SUB_SRC:-$NCURSES_SRC}
fi

I then included this file in all subsequent scriptlets.

Now that I have finished the build script I could go back and fix this up, but in a sense it would defeat the purpose: I would have to run the build script from scratch to see if the change worked.

Drawbacks

The one major drawback to this approach is that the resulting image is larger than it needs to be. This is especially true in my case because I remove a large number of files at the end. However, these files are still present in a lower layer of the union mount, so the entire image is larger than it needs to be by at least the size of the removed files. There is a work-around, though. I did not publish this image to the Docker Hub Registry. Instead, I:

- used docker export to export the contents as a tar archive, and
- created a new Dockerfile that simply added the contents of this tar archive.

The resulting image was as small as it could be.

Conclusion

The advantage of this approach is two-fold:

- It keeps development time to a minimum. No longer do you have to sit through builds of sub-components that you already know succeed. You can focus on the bits that are still giving you grief.
- It is great for maintaining a build script. There is a chance that the odd RUN command changes its behaviour over time (even though it shouldn't). The build may fail, but at least you don't have to go back to the beginning once you've fixed the Dockerfile.

Also, as I alluded to earlier in the post, Docker only makes writing these build scripts easier. With the right tooling the same could be accomplished on any file system that provides snapshots. Happy building! 2014-12-12T00:00:00Z
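To make the "filesystem as a persistent data structure" idea concrete, here is a toy Haskell sketch - purely illustrative, and not how Docker or AUFS are actually implemented - of a union mount as a stack of layers: a lookup returns the topmost version of a path, a build step only ever pushes a new layer, and reverting to a snapshot just means going back to an older stack.

import           Data.Map (Map)
import qualified Data.Map as Map
import           Data.Maybe (listToMaybe, mapMaybe)

type Path     = String
type Contents = String

-- Each layer maps paths to file contents; the head of the list is the
-- topmost (most recently added) layer.
type Layer      = Map Path Contents
type UnionMount = [Layer]

-- Walk the layers from top to bottom: the first layer that contains the
-- path wins, shadowing older versions below it.
readPath :: Path -> UnionMount -> Maybe Contents
readPath path = listToMaybe . mapMaybe (Map.lookup path)

-- A build step never mutates existing layers; it just pushes a new one.
commitLayer :: Layer -> UnionMount -> UnionMount
commitLayer layer mount = layer : mount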

24 Days of GHC Extensions: Implicit Parameters

Planet Haskell - Wed, 12/17/2014 - 07:00
> {-# LANGUAGE ImplicitParams #-}
> import Data.Char

Yesterday we looked at nullary type classes - type classes that don't vary over any types - and saw how they can be used to leave part of a program undefined. Our particular example looked at building a library that needs to call a logging function, but we leave the implementation up to the library author. However, using a nullary type class for this has some drawbacks - the biggest is that we are now tied to a single choice of logging function. This can be problematic if we ever need to vary what it means to log throughout the lifetime of a program. This may crop up if we decide that we need to transform or discard log entries in smaller parts of our program. To solve this, let's rewrite our library to take logging as a parameter to a function:

> type LogFunction = String -> IO ()
>
> type Present = String
>
> queueNewChristmasPresents :: LogFunction -> [Present] -> IO ()
> queueNewChristmasPresents logMessage presents = do
>   mapM (logMessage . ("Queueing present for delivery: " ++)) presents
>   return ()

This isn't much different from what we saw yesterday, but it quickly becomes painful to use in practice. Whenever we want to abstract over queueNewChristmasPresents, we either have to commit to a specific log function or manually propagate the LogFunction to the parent function. One solution might be to move the LogFunction into a reader monad, but this still carries a cost and the program will need to be transformed. A less invasive technique is to use an implicit parameter. Implicit parameters act like parameters to a function, except the caller never has to apply the function to the argument. Instead, the argument is automatically passed to the function by merely being in scope. Let's see how this works out for our logger:

> queueNewChristmasPresents2 :: (?logMessage :: LogFunction) => [Present] -> IO ()
> queueNewChristmasPresents2 presents = do
>   mapM (?logMessage . ("Queueing present for delivery: " ++)) presents
>   return ()

As you can see, something interesting has happened. The LogFunction is no longer a parameter to the function, but is rather part of the context of the function - a constraint that must be satisfied when we use the program. The body of the program is mostly the same, other than the leading ? that prefixes implicit parameters. To supply an implicit parameter, all we need to do is bring an appropriately named variable into scope:

> ex1 :: IO ()
> ex1 =
>   let ?logMessage = \t -> putStrLn ("[XMAS LOG]: " ++ t)
>   in queueNewChristmasPresents2 ["Cuddly Lambda", "Gamma Christmas Pudding"]

.> ex1
[XMAS LOG]: Queueing present for delivery: Cuddly Lambda
[XMAS LOG]: Queueing present for delivery: Gamma Christmas Pudding

Perfect, we've got the same type of program as we saw yesterday! However, we now have the ability to use two different types of loggers in the same program:

> ex2 :: IO ()
> ex2 = do
>   -- Specifies its own logger
>   ex1
>
>   -- We can locally define a new logging function
>   let ?logMessage = \t -> putStrLn (zipWith (\i c -> if even i
>                                                        then c
>                                                        else toUpper c)
>                                             [0..]
>                                             t)
>   queueNewChristmasPresents2 ["Category Theory Books"]

.> ex2
[XMAS LOG]: Queueing present for delivery: Cuddly Lambda
[XMAS LOG]: Queueing present for delivery: Gamma Christmas Pudding
QUeUeInG PrEsEnT FoR DeLiVeRy: CaTeGoRy THeOrY BoOkS

Neat! Implicit parameters are something of a love-hate extension, to the best of my knowledge - though they can be mighty useful sometimes.
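One more minimal sketch of my own, using a hypothetical helper queueAllWishlists that isn't part of the post's library: a caller that itself carries the ?logMessage constraint propagates the implicit parameter without ever mentioning it in its body.

> -- Hypothetical helper: ?logMessage never appears in the body; carrying
> -- the constraint in the signature is enough for it to be passed through.
> queueAllWishlists :: (?logMessage :: LogFunction) => [[Present]] -> IO ()
> queueAllWishlists wishlists = mapM_ queueNewChristmasPresents2 wishlists
>
> ex3 :: IO ()
> ex3 =
>   let ?logMessage = putStrLn
>   in queueAllWishlists [["Socks"], ["Reindeer Food", "Mince Pies"]]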
Also, as with many aspects of Haskell - it is useful for things that were never even anticipated. Tom Ellis has a blog post that demonstrates how one can “fake” a module system by using implicit parameters. This post is part of 24 Days of GHC Extensions - for more posts like this, check out the calendar. 2014-12-11T00:00:00Z

Beta testing the Windows Minimal GHC Installer

Planet Haskell - Wed, 12/17/2014 - 07:00
Summary: There is now a minimal Windows GHC installer that can install the network library.

Michael Snoyman and I are seeking beta testers for our new Windows GHC installer. The installer is very new, and we're keen to get feedback. Page to download the Minimal GHC 7.8.3 Installer.

Update: If you want to install to Program Files (the default installation path), download the file, right click and hit "Run as administrator". This will be automatic in the next version.

One of the standard problems when upgrading Haskell libraries on Windows is that sometimes you have to upgrade a package which requires a configure script - typically the network library. When that happens, you have to take special measures, and even then, sometimes the binary will fail at runtime (this happens on about half my machines). Our installer solves that problem by bundling GHC, Cabal and MSYS in one installer. You can install the network library out of the box without any special configuration and it just works.

Our installer does not provide all the packages included with the Haskell Platform, but it does provide an environment where you can easily install those packages. This minimal installer might be more suitable for developers.

Why might I want this installer?

The installer is much easier to install than the GHC distribution (it has a real installer, sets up the %PATH%, includes Cabal and can build the network library).

The installer has fewer libraries than the Haskell Platform installer, but includes MSYS so you can build everything in the Platform. The lack of additional packages and close correspondence with GHC (there is one installer for each GHC version) might make it more suitable for library developers. Once GHC 7.10.1 is released, we hope to make a GHC Minimal Installer available within days.

The installer also has more precise control over what is added to the %PATH%. You can add all the binaries it provides, or just add a single binary, minghc-7.8.3, which when run at the command prompt temporarily adds all the installed binaries to the %PATH%. Using the switcher, you can easily switch GHC versions. 2014-12-10T21:27:00Z Neil Mitchell noreply@blogger.com

Using Ansi console for nicer console output in EclipseFP

Planet Haskell - Wed, 12/17/2014 - 07:00
A little tip from the trenches (-: . I use the Ansi Console plugin in EclipseFP to see nice color output from my Haskell executable runs - for example, from a Tasty test run. This also works for Yesod applications, etc.

Note that some support for Tasty is being added in EclipseFP 2.6.3, to at least provide some templates for the cabal test-suite section and test modules. I'm not too sure whether it's worth developing a runner for it, since contrary to HTF we won't have a link to the source, so the standard output is probably nice enough.

Happy Haskell Hacking!

How to save US

Planet Haskell - Wed, 12/17/2014 - 07:00
A short, inventively presented video from Lawrence Lessig, making the case that before we can solve all the other problems in the US, we must first solve the problem of big money in politics - and showing how we can do it via Mayday.

SNAPL update

Planet Haskell - Wed, 12/17/2014 - 07:00
SNAPL, taking place 3-6 May in Asilomar, now has a publisher. Proceedings will appear in the LIPIcs series published by Dagstuhl. Five pages in ACM style is about eight pages in LIPIcs style, so the SNAPL call for papers has changed the paper length accordingly.

Previously: SNAPL.

Why dependent types matter (in Haskell)

Planet Haskell - Wed, 12/17/2014 - 07:00
A year ago I published Agda code for the “Why dependent types matter” paper by Thorsten Altenkirch, Conor McBride and James McKinna. Now I have rewritten that code in Haskell. This work is similar to my recent conversion of weight-biased leftist heaps from Agda to Haskell, so I won’t go into technical details. As usual you can […]

Proposal: bind

Planet Haskell - Wed, 12/17/2014 - 07:00
I often find myself writing:

fmap (mu bar) (foo zot)

Then I decide to change the type of mu, so instead I want to just write:

bind (mu bar) (foo zot)

This is just like fmap, but the function can run in the monad. It is similar to traverse:

traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)

As someone who isn't a fan of operators, I generally appreciate alternative plain-English word versions of functions, which I find easier to type, read and edit. Currently, without defining such a handy name, I have to transform the code to this:

mu bar =<< foo zot

The name for this function is a no-brainer ((>>=) is now pronounced “bind”):

bind :: Monad m => (a -> m b) -> m a -> m b
bind = (=<<)

For comparison, the not-very-pleasant <$> and <*> each have word alternatives, fmap and ap. I submitted this to the haskell libraries mailing list, but include it here for future reference. 2014-12-09T00:00:00Z
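Here is a minimal, self-contained sketch of the proposed function in use, with hypothetical stand-ins for mu, bar, foo and zot (none of these names come from a real library); only the type of mu changes, and fmap becomes bind:

import Data.Char (toUpper)

bind :: Monad m => (a -> m b) -> m a -> m b
bind = (=<<)

-- Hypothetical stand-ins for the names used in the example.
zot :: String
zot = "presents"

bar :: Int
bar = 3

foo :: String -> IO String          -- some effectful producer
foo = return . map toUpper

mu :: Int -> String -> String       -- pure: pairs with fmap
mu n = take n

mu' :: Int -> String -> IO String   -- now monadic: pairs with bind
mu' n s = do
  putStrLn ("taking " ++ show n)
  return (take n s)

main :: IO ()
main = do
  r1 <- fmap (mu bar) (foo zot)     -- before changing mu's type
  r2 <- bind (mu' bar) (foo zot)    -- after changing mu's type
  putStrLn (r1 ++ " " ++ r2)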