Functional Programming

'O' is for ontology.

Planet Haskell - Fri, 04/18/2014 - 06:00
'O' is for Ontology.

What is an ontology? A knowledge-base? Sure, if that's simpler to grasp, but only insofar as 'knowledge-base' doesn't mean 'collection of facts' or 'data set.' An ontology is more than that. But what, precisely, is an ontology?

Well, actually, there is a precise meaning to 'ontology.' And 'meaning,' itself, is central to ontology. Because what these data mean is what an ontology is about. It's not just a listing of facts: it's also the relationships among the facts in the ontology that make it what it is. The data, the facts, of an ontology have meaning, not only intrinsically, but also explicitly, or: the meaning is usable, or can be processed, itself, as information, and used in the handling of the underlying information.

So:

Milk
Eggs
Bread
(Ho-hos)
Oranges
Lettuce

An ontology? Absolutely! You hand that to your husband, and he knows exactly what it is and he knows exactly how to use it. He even, helpfully, penciled in the missing item (ho-hos, just as a 'fer instance') onto your shopping list for you.

See?

Now, an ontology, per se? Not so much. But if you explicitly titled it "Shopping List," now you're talking! Format it as XML or JSON or OWL and then your computer will do your shopping for you, just as well as your husband would. Even better, as it won't add those pesky ho-hos your husband always 'helpfully' adds to your list for you.

Your computer does need arms and legs and artificial intelligence, but I'm sure Cyberdyne Systems will be happy to help you there ... and also hand your terminator-that-was-a-computer a fully automatic plasma rifle. Whoopsie! Better give your husband the list, and live with the ho-hos ... Ho-hos are, relatively speaking, better than global thermonuclear war and the extinction of all mankind.

But I digress.

As always.
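As a Haskell-flavored aside of mine (none of this is in the original post): the whole point is that the title, the explicit meaning, is something a program can act on. A minimal sketch, with a made-up vocabulary:

    -- The data alone is just a list; wrapping it in ShoppingList is the
    -- explicit 'title' that makes its role machine-usable.
    data Item = Milk | Eggs | Bread | HoHos | Oranges | Lettuce
      deriving (Show, Eq)

    newtype ShoppingList = ShoppingList [Item]

    -- Because the meaning is explicit, a program can do the shopping
    -- (and, unlike your husband, skip the ho-hos):
    shop :: ShoppingList -> [String]
    shop (ShoppingList items) =
      [ "buy " ++ show item | item <- items, item /= HoHos ]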

Haskell gets static typing right

Planet Haskell - Fri, 04/18/2014 - 06:00
> The following blog post originally appeared as a guest post on the > Skills Matter blog. Well-Typed are regularly teaching both > introductory and advanced Haskell courses at Skills Matter, and we > will also be running a special course on Haskell’s type system. Statically typed languages are often [...]

'N' is for No Nix Nada Nil Null Not Nothing

Planet Haskell - Fri, 04/18/2014 - 06:00
'N' is for got Nuttin'!

I was thinking of covering the universal combinator, the Nightingale, with this post. It's a combinator that gives you any other combinator in combinatory logic:

    N = λx -> (xS)K

or

    N = λx -> xKSK

or any of infinitely many other combinations of combinators that, when combined repeatedly, reduce either to the S or the K combinator, because the SK-basis has been shown to be Turing-complete.

But then I thought: eh! Maybe you've read enough about combinators this month, or to date, at any rate, so there it is: the N-combinator, the Nightingale, when combined with only itself (in certain desired ways), can give you any computable form. Yay!

So, let's talk about nothing for a sec here.

... hm, hm, hm, ...

Nice talk?

The problem with nothing is that there's nothing to talk about ... about it. But people tend to talk quite a bit, and if you reduce what they've just been saying to you and to everyone else who will listen (and quite a few of them won't), then you find that they haven't been really talking about anything at all, and, insult to injury, even they would find their own words boring, obnoxious, offensive, thoughtless, careless, if you played them back to them and forced them to listen to every word that came out of their mouths.

Ever listen to yourself speaking? Educational experience. Sometimes, I stop and listen to myself. Most times, I find, it would be better if I shut myself up sooner rather than later.

geophf, shut up! I tell myself.

Sometimes I listen when I tell myself that.

I feel better when I shut myself up when I'm speaking and I have nothing to say.

You know: THINK before you speak.

If it's not every one of those things, then why did I just open my mouth?

Here's another gauge.

Bad people talk about people.
Good people talk about things.
Great people talk about ideas and ideals.

What are you talking about?

Okay, but that's really 'something' that I've been talking about, and that is what people talk about, which is usually 'nothing good nor kind,' but it's still something, even if that something is vicious or inane or banal.

So let's talk about nothing.

The problem of nothing is that there's none of it. And here's why.

Get yourself to a pure vacuum. Is it inside a glass tube? No, because, actually, you've sucked most of the air out of it, but guess what? There's still quite a bit of light in there. That's not nothing. That's quite a bit of energy.

Drat! Bummer! So, turning off the light doesn't do anything for you. Visible light is gone, but then you have infrared and ultraviolet — ooh! vampires! — so you've got to go somewhere where there's no light, no light at all, on either end of the visible spectrum.

Okay, spaaaaace!

Quite a lot of dust out there, star dust, and solar winds.

Hm, anywhere we can go where there's nothing?

How about next to a black hole?

Now, that's interesting.

Now, if we can find a place somewhere around the black hole where it's not sucking up mass nor jetting out X-rays (and, granted, that would be hard to find, ... how about a black hole away from any nearby galaxy in spaaaace! Aw, the poor lonely black hole!) (Nobody cares about black holes being all alone, they're just like: oh! black hole! black holes are evil! I hate you, black hole! But does anybody ever consider the black hole's feelings when they're making these pronouncements? Noooo! They just say these things to the black hole's face, and then wonder why black holes are always so mean to them!)

There's still a problem. It's called Hawking radiation.

You see, even in a 'pure vacuum' ... it isn't. Nature abhors a vacuum, so what happens is that there's so much energy around a black hole that it creates quantum particles (the Higgs field?), sucking most of them back into itself, but some little quarks escape, at around the speed of light (quark: 'Imma getting me away from that there black hole! NAOW!'), and so Hawking predicted, correctly, that even black holes emit radiation.

Bummer, dude! There doesn't seem to be anywhere you can find yourself some nothing to be all sad and morose and all alone by yourself! You just have to buck up and accept the John Donne commandment: No quark is an island.

Or something like that.

BUT THEN! There's mathematics. You can invent a mathematical space where there's nothing in it, just it, and you (which makes it not nothing, but you get some quiet-time, finally, some alone time to catch up on your reading without having to worry about whether the dishes were done).

What happens there? Besides nothing? What does it look like? Besides really, really empty?

Here's an interesting factoid (factoid, n.: geophf's interesting declarations that may or may not be somewhat related to some real reality): you're not the first visitor there.

There was this tall, thin dude in a white lab coat, by the name of Mach (no, not Mac the Knife, okay? Different Mac), and yes, the measure of velocity greater than the speed of sound was named after him (you know: mach 1, mach 2, mach 3+ for SR-71s), but besides that (and that's more than most people have ever done with their lives, but did he rest on his laurels? No. Besides, ... he still needed to earn his bread, so he could bake it, and then enjoy it with a latte) (lattes are important to physicists and mathematicians, don't you know).

(But I digress.)

(As usual.)

Besides that, he did this thought-experiment. He created this mostly empty space, just him and the star Alpha Centauri in the distance. That was it in the entire Universe. And he spun himself, arms outstretched.

How did he know he was spinning? Easy: Alpha Centauri came into view, crossing it, then disappeared behind him as he spun, and he saw the star in an arc.

Next, he removed Alpha Centauri.

Now. Was he spinning?

He had no way to tell. You can tell you're spinning, because you get dizzy and then sick, but why? Because of gravity. Earth's gravity (mostly). But now there's no Earth.

There's not even Another Earth. So gravity is not now giving you a frame of reference, and you could tell you were spinning because there Alpha Centauri was in the distance, giving you your (only) point of reference, but now it's gone, too.

You could be moving a million miles per hour, you could be spinning and doing backward summersaults (or winter-saults, for all you know: no seasons; no nothing. No nothing but nothing in your created Universe), and you'd have no frame of reference for you to make that determination.

Are you spinning in Mach's empty Universe?

The answer to that is: mu.

The answer to that question is that that's not a question to be asking. The answer to that question is that it has no sensible answer; whether you're spinning or not, you have no way to tell, or, no way to measure it. What's your coordinate system? How do you measure your speed or your spin? You don't, because if the XY-axis extends from in front of you, then it's always oriented to your front, no matter which way you face.

Anyway, where is time in all this? There's nothing. Space is bounded by the things in it; otherwise there's no space. 'Space' is emptiness, nothing there, but the space is defined by the no-thing between things.

There are no things in your nothing-Universe.

No space, no space-time. No space-time, no time. Spin needs time to measure.

If you had Alpha Centauri, then you would have light from four years ago shining on you, but now you have nothing, no light, no space, no time. No spin.

This post was about nothing, nothing at all. I hope you see that when you get to that point where there is indeed nothing, no dust, no light, no nothing, you really, really do have nothing, not even space (the space between things: there are no things), not even time, not even spin.

Not even weight; weight needs gravity. You have no gravity in your nothing-Universe.

Enjoy the slim new-you, but the thing is, it'll be a lonely victory, so no bragging rights: nobody to brag to.

Okay, I'm going to go enjoy a latte with my busy, noisy family, and enjoy my not-nothing, not-solitude.

Nothing's not all it's cracked up to be.
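Circling back to the Nightingale at the top of this post: here's a sketch of my own for checking its claims mechanically, using the first form N = λx -> (xS)K (this is the same trick as Chris Barker's ι-combinator). The SK-term encoding of N below is my own bracket-abstraction derivation, not something from the post.

    -- A tiny SK-term reducer; Var lets us test behaviour on opaque arguments.
    data Term = S | K | Var String | App Term Term
      deriving (Show, Eq)

    -- One leftmost-outermost reduction step, if any redex remains.
    step :: Term -> Maybe Term
    step (App (App K a) _)         = Just a
    step (App (App (App S f) g) x) = Just (App (App f x) (App g x))
    step (App f x) = case step f of
                       Just f' -> Just (App f' x)
                       Nothing -> fmap (App f) (step x)
    step _ = Nothing

    -- Normal-order reduction to normal form (all examples here terminate).
    nf :: Term -> Term
    nf t = maybe t nf (step t)

    -- N = λx -> (xS)K as an SK term: S (S (SKK) (KS)) (KK).
    nightingale :: Term
    nightingale = App (App S (App (App S i) (App K S))) (App K K)
      where i = App (App S K) K   -- I = SKK

    -- N applied to K hands back S, since (KS)K reduces to S ...
    checkS :: Bool
    checkS = nf (App nightingale K) == S

    -- ... and N applied to itself behaves like the identity combinator.
    checkI :: Bool
    checkI = nf (App (App nightingale nightingale) (Var "x")) == Var "x"

checkS recovers S directly, checkI shows N applied to itself acting as I, and two more rounds of the same game (N (N (N N))) get K back, which is why a single combinator can be a complete basis.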

Autocomplete command-line flags with GHC 7.8.2

Planet Haskell - Fri, 04/18/2014 - 06:00
GHC 7.8.2 was released just a few days ago. This is the first official release of GHC that contains my contributions. Most of them are improvements in the code generator and are thus practically invisible to most users. But I also implemented one very nice feature that will be useful to an average Haskeller. […]

Team Scotland

Planet Haskell - Fri, 04/18/2014 - 06:00
What happens after Yes? It's not the SNP, it's the people. A great post on Bella Caledonia. The UK chattering classes have been wondering what a real, mass grass-roots campaign might look like in modern, professionalised politics. Impotent is their usual conclusion. Well, come on up and we’ll show you. The old feudal dance where councilor doffs cap to MP, MP to Minister, Minister to Prime Minister and Prime Minister to corporate CEO may well continue apace even here in Scotland. But it’s not winning Scotland.

Help ORG restart the debate about internet filters

Planet Haskell - Fri, 04/18/2014 - 06:00
The Open Rights Group is starting a campaign opposed to the default filtering now imposed by all providers in the UK---de facto censorship. You can fund it via IndieGogo.

Continuations and Exceptions

Planet Haskell - Fri, 04/18/2014 - 06:00
Posted on April 14, 2014

Continuations are useful things. They provide a nice way to manually handle control flow. This fact makes them very useful for compilers; an often used alternative to SSA is an intermediate language in which every function call is a tail-call and every expression has been converted to continuation passing style.

Often, however, this isn't enough. In a language with exceptions, we don't just have a single continuation, since every expression can do one of two things:

1. Continue the rest of the program normally
2. Throw an exception and run an alternative program, the exception handler

To represent this, we can imagine having two continuations. Instead of

    newtype Cont r a = Cont {runCont :: (a -> r) -> r}

we have

    {-# LANGUAGE DeriveFunctor #-}
    import Control.Monad

    newtype Throws r e a = Throws {runThrows :: (e -> r) -> (a -> r) -> r}
      deriving (Functor)

Now we have two continuations, where e -> r represents the composition of exception handlers. We can write a trivial monad instance similar to Cont:

    instance Monad (Throws r e) where
      return a = Throws $ \ex cont -> cont a
      (Throws c) >>= f = Throws $ \ex cont ->
        c ex $ \a -> runThrows (f a) ex cont

So >>= maintains the exception handler between computations and otherwise acts exactly like Cont. To actually take advantage of our exception handlers, we need two things: a throw-and-catch-like pair of functions. Let's start with throw since it's easiest.

    throw :: e -> Throws r e a
    throw e = Throws $ \ex cont -> ex e

This is pretty straightforward: when we're given an exception e to throw, we simply feed it to our exception handler continuation. Since we don't care what value cont needs, we can universally quantify over a.

Next up is handle. We'll represent an exception handler as a function from e -> Maybe a. If we return Nothing, we can't handle the exception at this level, and we'll just pass it to the existing exception handler. So our handle is

    handle :: Throws r e a -> (e -> Maybe a) -> Throws r e a
    handle (Throws rest) handler = Throws $ \ex cont ->
      rest (\e -> maybe (ex e) cont (handler e)) cont

Notice the clever bit here: each handler actually contains both the success and failure continuations! If we can't handle the exception we fail; otherwise we can resume exactly where we were before.

No post would be complete without a demonstration!

    data Ex = Boom | Kaboom | Splat String
      deriving Show

    throwEx1 = throw Boom
    throwEx2 = throw Kaboom
    throwEx3 = throw (Splat "splat")

    test = do
      result <- handle throwEx1 $ \e ->
        case e of
          Boom -> Just "foo"
          _    -> Nothing
      result2 <- handle throwEx2 $ \e ->
        case e of
          Boom   -> Just "what"
          Kaboom -> Just "huh?"
          _      -> Nothing
      result3 <- handle throwEx3 $ \e ->
        case e of
          Splat s -> Just s
          _       -> Nothing
      return (unwords [result, result2, result3])

We can run this with

    runThrows (error . ("Toplevel fail " ++)) test

which returns "foo huh? splat".

A Note on Either

Now, we already have a perfectly good monadic exception-like thing in the form of Either. It might be interesting to note that what we've written is in fact isomorphic to Either: (e -> r) -> (a -> r) -> r is just the Church representation of Either e a. We can even go all the way and change Throws to

    newtype Throws e a = Throws {runThrows :: forall r. (e -> r) -> (a -> r) -> r}

So there you are, an interesting realization that one of the classic representations of a language like SML is in fact a really complicated version of Either :)
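To make that isomorphism concrete, here's a small sketch of my own of the two conversion witnesses, using the final forall-quantified Throws (it needs RankNTypes):

    {-# LANGUAGE RankNTypes #-}

    newtype Throws e a =
      Throws {runThrows :: forall r. (e -> r) -> (a -> r) -> r}

    -- One direction: tell the computation to build an Either.
    toEither :: Throws e a -> Either e a
    toEither (Throws c) = c Left Right

    -- The other: an Either already knows which continuation to pick.
    fromEither :: Either e a -> Throws e a
    fromEither (Left e)  = Throws $ \ex _    -> ex e
    fromEither (Right a) = Throws $ \_  cont -> cont a

Round-tripping either way gets you back where you started, which is the content of the isomorphism.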

Bargain Priced Coroutines

Planet Haskell - Fri, 04/18/2014 - 06:00
Posted on April 8, 2014

The other day I was reading the 19th issue of the Monad.Reader, and there was a fascinating post on coroutines. While reading some of the code I noticed that it, like most things in Haskell, can be reduced to 5 lines with a library that Edward Kmett has written.

Consider the type of a trampoline as described in this article:

    newtype Trampoline m a = Tramp {runTramp :: m (Either (Tramp m a) a)}

So a trampoline is a monadic computation of some sort returning either a result, a, or another computation to run to get the rest. Now this looks strikingly familiar. A computation returning Trampoline m a is really a computation returning a tree of Tramp m a's terminating in a pure value. This sounds like a free monad!

    import Control.Monad.Trans.Free
    import Control.Monad.Identity

    type Trampoline = FreeT Identity

Recall that FreeT is defined as

    data FreeF f a b = Pure a | Free (f b)
    data FreeT f m a = FreeT (m (FreeF f a (FreeT f m a)))

This is isomorphic to what we were looking at before. As an added bonus, we've saved the tedium of defining our own monad and applicative instances for Trampoline.

We can now implement bounce and pause to define our trampolines. bounce must take a computation and unwrap it by one level, leaving either a value or another computation. This is just a matter of rejiggering the FreeF into an Either:

    bounce :: Functor m => Trampoline m a -> m (Either (Trampoline m a) a)
    bounce = fmap toEither . runFreeT
      where toEither (Pure a) = Right a
            toEither (Free m) = Left $ runIdentity m

pause requires some thought. The trick is to realize that if we wrap a computation in one layer of Free, then when it's unwrapped by bounce we'll get the rest of the computation. Therefore,

    pause :: Monad m => Trampoline m ()
    pause = FreeT $ return (Free . Identity $ return ())

So that's 6 lines of code for trampolines. Let's move on to generators. A generator doesn't yield just another computation, it yields a pair of a computation and a freshly generated value. We can account for this by changing that Identity functor:

    type Generator c = FreeT ((,) c)

Again we get free functor, applicative and monad instances. We need two functions, yield and runGen. yield is going to take one value and stick it into the first element of the pair:

    yield :: Monad m => g -> Generator g m ()
    yield g = FreeT . return $ Free (g, return ())

This just sticks a good old boring m () in the second element of the pair. Now runGen should take a generator and produce a m (Maybe c, Generator c m a). This can be done again by pattern matching on the underlying FreeF:

    runGen :: (Monad m, Functor m) => Generator g m a -> m (Maybe g, Generator g m a)
    runGen = fmap toTuple . runFreeT
      where toTuple (Pure a)         = (Nothing, return a)
            toTuple (Free (g, rest)) = (Just g, rest)

Now, last but not least, let's build consumers. These wait for a value rather than generating one, so -> looks like the right functor:

    type Consumer c = FreeT ((->) c)

Now we want await and runCon: await to wait for a value, and runCon to supply one. These are both fairly mechanical.

    runConsumer :: Monad m => c -> Consumer c m a -> m a
    runConsumer c = (>>= go) . runFreeT
      where go (Pure a) = return a
            go (Free f) = runConsumer c $ f c

    runCon :: (Monad m, Functor m)
           => Maybe c
           -> Consumer c m a
           -> m (Either a (Consumer c m a))
    runCon food c = runFreeT c >>= go
      where go (Pure a) = return . Left $ a
            go (Free f) = do
              result <- runFreeT $ f food
              return $ case result of
                Pure a -> Left $ a
                free   -> Right . FreeT . return $ free

runCon is a bit more complex than I'd like. This is essentially to ensure that if we had some code like

    Just a <- await
    lift $ do
      foo
      bar
      baz
    Just b <- await

we want foo, bar, and baz to run with just one await. You'd expect that we'd run as much as possible with each call to runCon. Thus we unwrap not one, but two layers of our FreeT and run them, then rewrap the lower layer. The trick is that we make sure never to duplicate side effects by using good old return. We can sleep easy that this is sound, since return a >>= f is f a by the monad laws. Thus, our call to return can't do anything detectable or too interesting.

While this is arguably more intuitive, I don't particularly like it, so we can instead write

    runCon :: (Monad m, Functor m)
           => Maybe c
           -> Consumer c m a
           -> m (Either a (Consumer c m a))
    runCon food = fmap go . runFreeT
      where go (Pure a) = Left a
            go (Free f) = Right (f food)

Much simpler, but now our above example wouldn't run foo and friends until the second call of runCon.

Now we can join generators to consumers in a pretty naive way:

    (>~>) :: (Functor m, Monad m) => Generator c m () -> Consumer c m a -> m a
    gen >~> con = do
      (cMay, rest) <- runGen gen
      case cMay of
        Nothing -> starve con
        Just c  -> runCon (Just c) con >>= use rest
      where use _    (Left a)  = return a
            use rest (Right c) = rest >~> c

And now we can use it!

    addGen :: Generator Int IO ()
    addGen = do
      lift $ putStrLn "Yielding 1"
      yield 1
      lift $ putStrLn "Yielding 2"
      yield 2

    addCon :: Consumer Int IO ()
    addCon = do
      lift $ putStrLn "Waiting for a..."
      Just a <- await
      lift $ putStrLn "Waiting for b..."
      Just b <- await
      lift . print $ a + b

    main = addGen >~> addCon

When run, this prints

    Yielding 1
    Waiting for a...
    Yielding 2
    Waiting for b...
    3

Now, this all falls out of playing with what functor we give to FreeT. So far, we've gotten trampolines out of Identity, generators out of (,) a, and consumers out of (->) a.
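The excerpt defines bounce and pause but never actually runs a plain trampoline, so here's a quick usage sketch of my own (the drive loop and hello example are not from the post):

    import Control.Monad.Trans.Class (lift)

    -- Keep bouncing until the computation finally returns a value.
    drive :: (Functor m, Monad m) => Trampoline m a -> m a
    drive t = bounce t >>= either drive return

    hello :: Trampoline IO ()
    hello = do
      lift $ putStrLn "Hello,"
      pause                       -- control returns to the driver here
      lift $ putStrLn "world!"

Running drive hello prints both lines, with the driver regaining control at the pause in between bounces.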

'M' is for burrito. No, really! (Monads, a burrito-esque introduction)

Planet Haskell - Fri, 04/18/2014 - 06:00
'M' is for Burrito. No, really.

(A little post about monads)

This one's easy.

A monad is a burrito: a nice, soft-shell wrapper hiding all that yummy, but complicated stuff inside (green peppers and onions? who was the first one to think of that?)

A monad is a burrito, wrapping all that complicated stuff, like an integer:

    Just 3

under its soft-shell wrapper:

    [1,2,3]

Just showed you two monads. The first one is the 'Maybe' monad, representing semideterminism (we're certain, in this case, that the underlying unit is 'Just' 3, but we could, in a different situation, be equally certain that the underlying unit was 'Nothing.' Those two cases (two = semi) give us our certainty (determinism) one way or the other). The second is the 'List' monad, representing nondeterminism (you could have the empty list [] (no answers, no = non), or, in the above case, a list with a set of different answers (one or several)).

So, monad = burrito. Simple, right?

Class dismissed.

"Excuse me, Professor geophf," said the wormy student in the back of the class wearing b-c glasses, who always has to ask annoying questions in his whiny voice at the very end of class, making everybody late for their next class and me late for my latte, and you don't want to make me late for my latte! He continued, oblivious: "What about monads that don't contain anything at all? What is that? An empty burrito?"

He giggled at his own joke, thinking himself rather smart and drôle.

"What?" I roared, "An empty burrito? Such sacrilege cannot be! Besides, what's the point of an empty monad? That's just ridiculous!"

I was rather peeved that this young upstart would trespass on the beauty of my burrito-analogue. It was simple and easy to understand. Why mess with it?

"But," he continued in his whiny voice, "what about the empty list and 'Nothing' in the Maybe monad? They carry no value but are essential to each of those monads. How do you explain these monadic values as burritos? Are they churros?"

I sputtered in fury. He had me there.

So I did what every tyrant, er, teacher does: I relied on my position of authority.

"No, monads are burritos! Get that through your head!" And then: "Class dismissed!"

Everybody fled. I do cut quite the imposing figure when I have that angry glint in my eye and my flinty face set in stone.

Okay, so monads aren't burritos. I admit it (but not to Mr. Wormwood. Never, never!) There is nothing that says a monad has to carry a value; it's just a convenience to think of monads like that: a wrapper around an 'ordinary' object. But monads are objects, themselves, and, in fact, more 'ordinary,' that is: regular, than plain-old objects like integers.

The problem of plain-old objects, the problem that monads address, each in their own way (and there are many species of monads, not just maybes and lists, but eithers, and states, and ... stuff! Lots of wild and weird different stuff!), is that plain-old objects admit the uninhabited instance, ⊥, in their type. So the type of natural numbers is 0, 1, 2, 3, ... but it's also ⊥. Programming languages that realize this (Haskell, for example) deal with it gracefully. For programming languages that realize this, but then do nothing about it, there is a systemic problem of the program going into an unstable state when encountering an uninhabited instance.

Something that causes Algol-child language programmers to shake in their booties: the 'null-pointer exception.'

Question: how can Java throw a 'NullPointerException,' when it doesn't have the pointer construct in its language, at all? Am I the first to call this one out?

So, the functions in these languages that have ⊥ but do not handle it properly must always admit that you're not working with functions at all, for:

    x + y

is x plus y, but if either is uninhabited, the answer is KABOOM!

A program crash. You just destroyed your entire system. And you didn't even do it: the language designers did.

Monads were an answer to this problem. A little dude with a long, bushy beard, by the name of Kleisli, developed the work around what came to be known as the Kleisli categories, which have monads as their objects and (what came to be known as) Kleisli arrows as morphisms, or functions.

The guarantee of the Kleisli category was that, indeed, every object is a monad, so there's no uninhabited type. This means Kleisli functions always get you back into a Kleisli category (yes: once you enter the monadic domain, you never leave it. That's the guarantee of a Kleisli category: you can only be a monad (or a monadic function or monad category ...) to get in, and no monads exist outside the Kleisli categories. The monadic domain is the Hotel California of categories).

Okay, you're stuck. And the guarantee of the monad is that you don't care what's under the soft-shell; there's no 'getting a value out' of a monad, because, as my annoying student, Mr. Wormwood, pointed out, there are monads with no underlying values. So what do you do with them?

Monads have a special function associated with them, called the 'join' operator.

Monads, you see, are a special type in that a join of a monad of a monad is just a monad, so, for example,

    join (Just (Just 3))

gives you

    Just 3

What does that do for us?

Well, the structure of a monadic function is this:

    f :: Monad m => a -> m b

That is, the function takes an ordinary type a and 'lifts' it into the monadic category, giving the (monadic) answer of type m b (m is the monadic type, and b is the type carried by the monad).

Okay, but monadic functions can't dig out the underlying type in a monad of its category, so how do the functions even work at all?

Just as you can lift a unit type into a monad:

    return 3 = Just 3

you can also lift an ordinary function into the monadic domain:

    liftM succ :: m Int -> m Int  (for some monadic type m)

Okay, so, but where does that even get us? We're still stuck, right?

Well, the thing is, you can even lift monadic functions into the monadic-monadic domain:

    liftM f (where f :: Monad m => a -> m b) gives f' :: Monad m => m a -> m (m b)

What does that get us?

Well, combining join with a lifted monadic function gets us a monad from monadic input:

    join (liftM f (Just 3)) = some monadic answer dependent on the resultant type of f

This whole lifting-into-the-monadic-domain-and-then-joining-the-result is so common when working in monad categories that a special operator was invented, à la Haskell, named '>>=' (pronounced: 'bind').

Now, with bind, the true power of monads comes out:

    m >>= f >>= g >>= h >>= ... etc.

What you see here is a monad being bound, in turn, to the monadic functions f, g, h, etc.

So, okay, you've got a short-hand for monadically binding functions together. Great. Good on you, and have a latte.

The thing here is this.

In 'plain-old' programming, you have this ever-present fear that a NullPointerException is going to ruin your life, so you can't write:

    obj.getA().getB().getC().thenDoSomething()

getting an A from obj, then getting a B from that A, then getting a C from that B, then doing something with that C.

I mean, you can write that code, but if any of those objects is null (obj, A, B, C), your whole statement explodes?

No, worse than that: your entire system blows up, and you have to deal with irate customers wondering why they got a 503 Service Unavailable error when they submit their credit card information but forget to enter their zip code, or some such.

So you have to wrap that statement in a try-block, and try it and see if it works, and if it doesn't, you have this whole envelope, this whole burrito shell of code you have to write to handle the uninhabited case.

Now let's look at the monadic function binding again:

    m >>= f >>= g >>= h >>= ... etc.

What happens if m is null, or the result of f m is ...

No. Wait. Monads don't have nulls. Monads are just monads, so the above binding ...

... okay, wait for it, ...

just. works.

In fact, in the Kleisli category, it is guaranteed to work.

It. just. works.

For you, not a Java(script) programmer, you're like: "Well, duh, yeah, that's how it's supposed to be! 1 + 1 = 2, not 'error' or whatever."

You can say that, and expect that, in a pure mathematical domain (with monads implicit in the mathematics for you. You're welcome.) But a Java programmer, with any experience, any hard-won experience, should be (rightly) awed.

"Wait. You mean, it just works? How is that possible?"

Yeah. How is that possible? That Java code just works?

It doesn't. But when I translate the Kleisli category down into a Java implementation, and then lift every Java object into that category, it's not a Plain-Old Java Object (POJO) any more; now it's a monad in the Kleisli category, and I operate on it as such from then on.

Well, then, it just works.

Some people look at my monadic harness and say I'm unnecessarily complicating things.

I beg to differ.

"But, geophf, I just want the start date; all this stuff just complicates things!"

But what do you want the start date ... for? If you just want the start date, then

    pojo.getStartDate()

will get you there, and you're done.

But if you want to base any logic off it? Is your start date the start of a work-flow? So if the start date is null, your system, that you unit tested with start date values, now crashes in production and you have no idea why, because you unit tested it, and the code looks good.

But that one null wrecks everything.

I don't have any nulls. So I lift my start date up into a monad, and start my work flow, and my work flow works, and if the start date is null, then the start date isn't lifted, the functions fall through, because they're not working with a monad, and I just go on to the next thing.

And I coded zero lines of code around that.

Your try-catch blocks looking for nulls everywhere ...

Well, what's more complicated now? What code works in production, regardless of dirty or bad data?

Monads are a guarantee, and, as burritos go, they are quite tasty. Give them a try!

p.s.:

Now there's the whole conversation around monads that carry no value intentionally throughout the entire monad. So, for example, if the monad type is 'Quark,' then the monadic inhabited types are up, down, strange, charmed, top and bottom, and their values are ... nothing at all: their instance is the value; it has no underlying value it carries (unless you want to get into the guts of spin and charge ...). It's the particular monad instance, or the particular quark, we care about, not what's inside, at all; something like monads as enumerated values, but ... eh. Monads-as-burritos is a limited view; it will trip you up, but for the most part it works. When you do get to the point of treating monads-as-monads, and not caring, anymore, about the underlying value, you can tap yourself on the shoulder with monadic-Nirvana. It's a really good feeling, and gives an entirely new understanding of monads and their utility. "Good to know," as it were, but monad-as-burrito works for the most part, and is tasty, too, to boot.
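One last concrete aside, mine rather than geophf's: here is the obj.getA().getB().getC() chain from above, redone with Maybe. The types and getters are made up solely for illustration.

    -- Hypothetical lookups that may fail, mirroring the Java chain above.
    data Obj  = Obj (Maybe A)
    data A    = A (Maybe B)
    data B    = B (Maybe C)
    newtype C = C String

    getA :: Obj -> Maybe A
    getA (Obj ma) = ma

    getB :: A -> Maybe B
    getB (A mb) = mb

    getC :: B -> Maybe C
    getC (B mc) = mc

    -- One chain, zero null checks: if any step comes up Nothing, the
    -- rest simply falls through, and the whole result is Nothing.
    thenDoSomething :: Obj -> Maybe String
    thenDoSomething obj =
      getA obj >>= getB >>= getC >>= \(C s) -> return ("did " ++ s)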

Writing admin interfaces with Fay using fay-builder

Planet Haskell - Fri, 04/18/2014 - 06:00
We are in the process of open sourcing a lot of the general purpose libraries we’ve built at Silk under the BSD3 license. We’re planning to write a series of blog posts introducing them, so please stay tuned! First off is one of the smaller packages, fay-builder (github).

Fay is a proper subset of Haskell that compiles to JavaScript, and also the name of its compiler. I’ve been helping out with the project since its first release in July 2012.

We have several Haskell backend services with admin interfaces written in HTML and JavaScript that communicate with the running server using REST APIs. It seemed like a good idea to write their admin interfaces in Haskell, and we have been experimenting with using Fay for this. Since Fay supports Cabal packages we can use our usual tools and upload these packages to the public hackage or our private instance.

fay-builder lets you store options needed to compile Fay code along with an executable. This is done by using custom cabal flags that can be read either by Setup.hs (but see caveats below) or from within the application at startup, on request, or at some other point. If you want to try it out you can install it by running cabal install fay-builder. This post demonstrates how we build Haskell code for our admin interfaces.

Join our team if you enjoy this stuff too, we’re hiring!

The Cabal file

Here’s how you can define a Cabal file that uses fay-builder for a project:

    name:    my-program
    version: 0.1

Use a custom build type:

    build-type: Custom

extra-source-files specifies files that should be included with a source distribution. The fay files are added here so they don’t get lost when running cabal sdist.

    extra-source-files: fay/*.hs

data-files are also included, with the addition that Cabal will generate a paths module that you can use to fetch the resources during runtime. We probably want to precompile the fay files so the program can just serve them.

    data-files: my-program.cabal, js/*.js

x-foo fields specify custom flags that Cabal itself ignores, but we can use them for extra configuration options.

    x-fay-packages:      fay-jquery, fay-text
    x-fay-root-modules:  Index, Misc
    x-fay-include-paths: src
    x-fay-output-dir:    data/js
    x-fay-source-dir:    fay

    executable my-program
      main-is:          Main.hs
      hs-source-dirs:   src
      default-language: Haskell2010
      ghc-options:      -Wall

This is the Haskell module Cabal generates to let us access data-files:

      other-modules: Paths_my_program

To make sure we can build the Fay code we add all of its dependencies here along with the GHC library dependencies. One thing to note is that we can’t depend on base and fay-base at the same time since they have clashing module names. But if we depend on some other fay package we get a transitive dependency on fay-base and we know it will be available.

      build-depends: base, Cabal, fay, fay-builder, fay-jquery, fay-text

We have a few options for when to build the Fay code.

Building as a Cabal hook

With a custom Setup.hs we can add a postBuildHook that will compile the Fay code after the executable has been compiled, but before an sdist is created. This is nice because it will generate the needed .js files before the tarball is created. The disadvantage is that Setup.hs cannot have explicit dependencies, so you won’t be able to sdist your program unless you already have all dependencies installed. It is possible to do this, and fay-builder includes a defaultFayHook that you can use in Setup.hs like this:

    import Fay.Builder (defaultFayHook)

    main = defaultFayHook

It’s pretty cool, but maybe not very useful because of the dependency issue. Just make sure to have fay-builder installed before running cabal configure.

Building inside the program

For one application I started out with the above Setup.hs approach, but soon realized that this would not work out since sdisting wasn’t possible in a clean package DB. I instead chose to add a --compile-fay flag that compiles all Fay sources when the process starts during development. The live setup won’t do this, and instead reads the data-files. The trick here is that we added the .cabal file as a data-file, to let us access it from other modules.

    import qualified Paths_my_program as Paths
    import qualified Fay.Builder      as Builder

    compileFay :: IO ()
    compileFay = do
      pkgDb       <- getPackageDb
      packageDesc <- Paths.getDataFileName "my-program.cabal"
                 >>= Builder.readPackageDescription
      Builder.build packageDesc pkgDb
      where
        -- Some clever way to fetch the current package DB
        -- if you are using a sandbox.
        getPackageDb :: IO (Maybe FilePath)

If we had added the settings elsewhere, such as in a custom configurator file like snaplet-fay does, we wouldn’t be able to read it from within Setup.hs, so the cabal file might be the only place to put this configuration if we want both features at once.

One of the biggest advantages of using fay-builder is that all dependencies can be listed in one cabal file instead of having to split the client and server code into multiple packages (which would turn into three different packages if there’s shared code).

'L' is for the Lambda-calculus

Planet Haskell - Fri, 04/18/2014 - 06:00
'L' is for the λ-calculus

So, nearly halfway through the alphabet. How are we all doing? Are we going to be able to go from 'A' to 'Z' in one month? Tune in tomorrow to see the next compelling and exciting installment of the #atozchallenge!

So, this is the big one, a way of looking at mathematics that has been seething under the surface, but that, with the λ-calculus (pronounced 'lambda calculus'), has now bubbled up to the surface.

So, we've seen that the formulas, or the functions, are the things that concern mathematics, and not so much the objects operated on, and that turned world-thought on its head. Well, the λ-calculus came to be to address a very particular concern that mathematicians worry about (mathematicians are such worriers. Except me, I'm carefree as a hummingbird, flitting among rose bushes and lemon trees, gathering nectar to keep my heart rate at three-thousand beats per minute.) (no, ... wait.), and that concern is this: in the function

    f x y z = x + 2y - 3z

how do we know x is x, y is y and z is z?

That is, what if I swap y and z, mixing the two up, as it were? Is there something, anything, that enforces the ordering of the arguments to a function?

In set theory, you see, the set {1, 2, 3} is equivalent to the set {3, 1, 2}, or any permutation. Two sets are the same if they have the same members, so there's no ordering imposed.

Now, you could enforce an ordering, by adding another layer of sets:

    {{1, 1}, {2, 2}, {3, 3}}

And this ... 'means' (I say 'means' quite loosely, you see) that "the first number is 1, the second number is 2, and the third number is 3."

But it doesn't at all, because, as sets are order-independent, how can we guarantee the ordering argument is the first one in the set? We can't. So we add another layering of sets:

    {{{1}, 27}, {42, {3}}, {{2}, 21}}

Which says: "The first argument is 27, the second argument is 21, and the third argument is 42."

You see what I did there? I mixed up the ordering of the sets and even their ordering arguments, but since the order is a subset of the subset, you're still able to get that information to order the sets as you read through them.

This gave mathematicians headaches, because mathematicians are wont to get headaches about things, because the above set is not the set {27, 21, 42} (or any of its permutations), but something quite different, and the function

    f x y z

is not the function

    f {{z, {3}}, {{1}, x}, {y, {2}}}

(or any of its permutations), because the first one, f x y z, has three arguments, but f {...whatever...} has one argument, which is a set of ordered arguments.

Ugh. Try working with that!

Some people have enough problems working with f x y z, already!

So, how to solve this?

Two of the most popular ways that mathematicians solved this problem are:

1. Get rid of variables altogether. And numbers, too. And all symbols, except some very primitive functions which we'll predefine for you, so instead of the above function

    f x y z = x + 2y - 3z

we have something looking like this:

    BBBVKI xxxx

Okay, you can reason about that, it comes as its own Gödel numeral, so you can have fun with it that way, but ... but that's hard for a lot of people to swallow, or to know what you're talking about, or to have a conversation about the veracity of the statement you've constructed.

2. The other approach is this: partial functions.

The function f x y z can actually be represented as three functions: some function f that takes a variable x, given the result of the partial functions of y and z, or, in 'math words':

    f x = something + x

where 'something' is

    g y = 2y + something else

and 'something else' is

    h z = - 3z

See?

So when we string these functions together to get our total answer (instead of three partial answers), we have

    f x . g y . h z = x + 2y - 3z

which gets us our original equation back, and guarantees that our f x is getting x, because it's not getting anything else; our g y and h z are getting their y and z, respectively, too, for the same reason. Once you reduce a function down to one of only one argument, then you know the argument you're getting is the right one for that function.

Problem solved.

Class ... is ... dismissed!

(And all the students stand up and applaud at the end of the lecture.)

That solves that problem, neatly, too.

But then it magically solved a bunch of other problems, too.

Because now that you're going through your math-space in your spaceship of imagination, just like Carl Sagan through his Cosmos, it becomes totally natural to start composing these very simple functions together to get more complex ones, and you build what you're really after from these very simple building blocks.

And it was Church, the inventor of the lambda calculus (or maybe his English buddy, Turing, who invented the U-combinator, aka the 'universal combinator,' also called the Turing-combinator in his honor), who showed that all you needed was the lambda (λ), the variables, and 'application' (the gluing of them together), and you got out of that ... everything.

And by everything, I mean, everything: numbers, counting, addition, and all math operations after that, and then, from number (again), knowledge-representation.

    λ succ . λ z -> z  (zero)
    λ succ . λ z -> succ z  (one) (and counting, too)

You just named the things, like 'z' and 'succ,' but you could name them anything, and they would ... 'mean' the same thing.

This naming to make numbers came to earn its own name: they were called 'Church numerals,' after their inventor.

And the Church numerals actually mean the numbers they name, because

    λ succ . λ z -> succ succ succ succ z

is the Church numeral 'four,' but if you put in a chicken for 'succ' you have:

    λ succ[succ/chicken] . λ z -> succ succ succ succ z

or

    chicken chicken chicken chicken z

And there you go: your four chickens. Throw those puppies on the grille and you've got yourself a nice tailgate party for you and your friends to go with 'z,' the keg.

(What else would 'z' stand for, I ask you.)

(And the keg would definitely be 'zeroed' out, I tell you what.)

The really neat thing?

There's a one-to-one correspondence between combinatory logic and the λ-calculus.

For if you have the λ-term K such that

    K = λ x . λ y -> x  (dropping the 'y' term)

and the λ-term S such that

    S = λ x . λ y . λ z -> xz(yz)  (distributing application over the forms)

then combining the λ-terms K and S in the following way:

    SKK

gets you

    λ x -> x

which is the identity function, or the I-combinator, just like combining the combinators S and K in the exact same way gets you the same result.

Surprising. Shocking. Really neat!

These two things, the λ-calculus and combinatory logic, were developed around the same time: the lambda calculus to provide proper ordering of variables, and then to demonstrate that all you needed were the variables, and nothing else (not numbers, nor addition, nor anything else); just λ-abstraction and function application got you to numbers and everything else.

Well, combinatory logic argued that all you needed were the functions. It had no variables whatsoever.

Two opposite mathematical languages, but they ended up meaning the exact same thing!

Fancy that!

So, if you proved one thing in one of the languages, because it was easier to express the forms in that way, then the exact same proof held in the corresponding (and opposite) mathematical language.

The other really neat thing is that once you've captured formulas in these building blocks, you can even express logic in these formulas, so not only do you have

    λ x . λ y . λ z -> x + 2y - 3z

in the λ-calculus, but you also begin to build logic statements of truth, like: for every x in the above formula, x is a number.

It was later found out that these statements of truth are simple propositions that type the λ-terms.

Ooh! Typed λ-calculus, and, on top of that, using the λ-calculus, itself, to type ... well, itself!

AND THEN!

And then came this dude, Per Martin-Löf (Swedish professors call each other dude, but collegially. That's a well-known fact.), and he said, 'Yo! Homies!' and then lifted the type-system to predicate calculus, where a type depended on (or was predicated on) other types or the instances of the type itself.

So, he would write about the chickens (that came after the eggs, as you recall), that: for every succ in the below formula, given that every succ is a chicken, then for every z below, the z must be a keg.

    λ succ . λ z -> succ succ succ succ z

The very type of z depends upon the instances ('inhabitants') of the type of succ: knowing one gives you the other, dependent on the foreknowledge.

Cool!

So 'cool,' in fact, that you could actually program with just the types and not have any instances or spelled-out formulas at all, and, guess what? People have actually done that: created programs that only had type descriptors, and the program somehow magically figured out what it was supposed to do from those type descriptions.

How did it figure this out? The types themselves only allowed one way for the program to flow from its start to the end.

Magic! A program with no code actually doing something!

(Cue spooky Twilight Zone thematic music.)

And that brings us right back to 'meaning' ... (and my quotes around 'meaning').

(Question: when you're indicating that you're quoting a word, such as the word 'meaning' ... is that quoted word quoted (again) because you're quoting that you're quoting that word?)

(Just wondering.)

What does 'succ' and 'z' and 'x' and 'y' and the other 'z' and ... everything else ... mean?

I could save that for my entry 'M' is for Meaning, but I'm not going to write that entry, because 'M,' of course, is for 'Burrito' (a.k.a. 'Monad,' but we'll get to that later).

So I'll tell you now:

Nothing.

None of those symbols mean anything at all in mathematics. Succ is an abbreviation for 'successor function,' which means 'add one to a number,' but, for example:

    λ succ . λ z -> succ succ succ succ z

could mean, just as well, 'apply the function named "succ" cumulatively to some function named "z",' because that's what is actually happening there, in the lambda calculus. The lambda calculus doesn't 'care' what 'succ' or 'z' 'mean.' They are just functional variables to be applied to each other; the above function could have just as easily been written with the symbols 'x' and 'y':

    λ x . λ y -> x x x x y

Same functional application, same effect. No meaning.

Just like 'succ' doesn't 'mean' successor function, nor does it mean 'chicken.'

In mathematics, it is vitally important that the symbols mean nothing, and by nothing, I mean absolutely nothing. All that matters is the mechanics: there are symbols, and there is how you move them around in your system, and that's it.

Why is that so vitally important?

Because if the symbols mean nothing, then they can represent anything, in which case math truly becomes the universal language: you encode your knowledge base as symbols ('O' is for ontology), you manipulate those symbols following the rules of manipulation you established a priori, and you come up with a result. Many times, it's a surprising result, so surprising that you get people telling you, to your face, 'that can't be done.'

Then you translate those symbols back to what they were used to represent, or you apply your knowledge to those symbols, a posteriori to the symbolic manipulations, and you do those things that you were told are impossible for you, or anyone else, to do.

But you just did it, despite the fact that 'it can't be done.'

The non-mathematician then has the opportunity to do something they've never done in their lives, not at home, not at school, not in college, and especially not in their jobs:

They have the opportunity to learn something.

About the world, yes, but the world is as how they see it, so they have the opportunity to do something even better. They have the opportunity to learn something about themselves.

Math is hard.

Why?

Because, to 'get' math, you have to 'get' yourself, or, as is most often the case: you have to 'get' over yourself and your preconceived notions of what the world is and how the world works. Math will fail you as hard as you're failing, and, when you transcend yourself, it will succeed in places you never could before and let you see things you've never seen, ... if you are brave enough to look.

The λ-calculus, by the way, is the basis for every electronic device that exists today, and for every programming language used today. A programming language is judged as 'Turing-complete,' and it must be that to solve anything more than trivial problems (and, unfortunately, most programming tasks are not even trivial, anymore: they're just simply data storage and retrieval and 'ooh! look at the pretty colors on this web-page!' 'Can I get it in corn-flower blue?' 'Why, yes! Yes, you can!' 'Good, good!'). Turing based his 'machine,' which was a conceptual model of a universal computational machine, on an equivalent to Church's λ-calculus (Turing and Church were two dudes who were buds).

Church developed this very simple calculus to address a very small concern. It's simple math. And it changed the world into the one we now live in.

'L' is for this world we now live in. Thanks to a little bit of mathematics.

Oh, and p.s. ... HA! I finally wrote a complete post. Yay!
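A small aside of my own: those Church numerals translate almost directly into Haskell, chickens, keg, and all.

    -- A Church numeral n means: apply 'succ' n times to 'z'.
    type Church a = (a -> a) -> a -> a

    zero :: Church a
    zero _ z = z

    four :: Church a
    four s z = s (s (s (s z)))

    -- Instantiated one way, 'four' is the number 4 ...
    asInt :: Church Int -> Int
    asInt n = n (+ 1) 0            -- asInt four == 4

    -- ... instantiated another way, the same term is four chickens and a keg.
    asChickens :: Church String -> String
    asChickens n = n ("chicken " ++) "keg"
    -- asChickens four == "chicken chicken chicken chicken keg"

The term itself carries no meaning; only the instantiation of 'succ' and 'z' does, which is exactly the point above.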

The heartbleed bug and FP Haskell Center

Planet Haskell - Fri, 04/18/2014 - 06:00
The heartbleed bug and FP Haskell Center

If you haven't heard about the heartbleed bug and related security issues, this article provides a good explanation.

Applications developed in FPHC and deployed using the FP Application servers are not vulnerable to this bug.

Services we use from Amazon and GitHub were affected. SSL connections to our servers go through Amazon software, and we use SSL to connect to GitHub repositories. Both Amazon and GitHub have already updated their services to remove the vulnerability. FP Complete has followed GitHub's suggestions by changing our passwords and OAuth tokens. You can read those guidelines at GitHub.

While we have no evidence that any sensitive information was leaked from FPHC, we recommend changing your password for FPHC as soon as possible, just in case.

Other measures to increase security never hurt. Things like using two-factor authentication on sites for which it is available, and using password locker software that will generate strong passwords unique to each site, will help prevent people breaking into your accounts. This event provides a good time to consider adding these extra security measures if you aren't already using them.

'K' is for Kappa-calculus

Planet Haskell - Fri, 04/18/2014 - 06:00
'K' is for κ-calculus

This is an epic story of epic epicness!

No, really.

So, way back when, this guy named Church (Alonzo was his Christian name, and he liked going to churches) (actually, that's just conjecture on my part).

Okay, everybody, stop everything and go to this wikipedia link on the Frege-Church ontology, right now.

Have you read it?

Well, what are you waiting for? Read the post, for goodness sake!

I'M WAITING!

...

Okay.

So, I read the post, and I was all like skyr, you know? No whey!

Like, totally! Like. I mean, like, wikipedia, wrote that?

So, I quizzed my girls:

"Girls!" I shouted excitedly, "What's the farthest planet from the Sun?"

They were like, "Whaaaaa?"

Then they asked, like, "Papa!" with arched eyebrows, "why do you want to know what's the farthest plant from the Sun?"

Hm. I, like, have to work on my, like, diction.

(We'll get to the reason behind the paroxysms of 'like's momentarily.)

So, I clarified. "Planet, planet!" I shouted, "What's the farthest planet from the Sun!"

EM, the older, and therefore smarter one (she has to be, you see), said, "Pluto. No, wait. Pluto's not a planet. Neptune, then!"

"HAAAAAA!" I screamed.

Elena Marie looked at me like her Papa had finally gone off the deep end.

Well, finally off the deep end, but this time, in a never-to-return, gone-off-the-deep-end kind-of-way.

But before I could explain my victorious berserker scream ...

(Berserker, n.: etymology: Icelandic, "bear-skinned.")

(Question: what is the etymology of the word 'etymology'?)

(Did you know Icelandic is an insular Nordic language? They can read the poetic and prose eddas straight off, just like that. Could you read a 1,000-year-old document? No, you can't, unless you're an Icelander. So there.)

(But I digress.)

(As always.)

(The neat thing here is this is a digression from a digression! And it's not even leading us back to the topic, but further away) (just as this digressionary turn discussing my digressionary habits).

(Digressions within digressions within digressions.) (Move over, Frank Herbert, you dead old guy, you're not the only one who can write 'Wheels within wheels within wheels' and have a really bad movie with a Sting cameo made out of your book!)

BUT BEFORE I COULD EXPLAIN my berserker-scream, Li'l Iz piped right up.

"How can you know what's the farthest planet of the Sun, huh, Papa? There're other planets in this galaxy, and what about other galaxies, huh?"

She had me there. But never confuse the point you're trying to make with facts.

WHEN I REREAD what I wrote to my girls, they sighed, appreciatively. EM said, "How ironic can that be?" and walked off to her room to read another book before her ballet recital in forty-five minutes.

I asked Li'l Iz, "Was that, like, ironic?"

She rolled her eyes and deigned not to answer, returning to her own book.

SO, the POINT of all this is:

Pluto is not a planet!

Whew!

I'm so glad that the ravings of a madman in Borderlands 2 would actually have significance to mathematical logic theory. The things one learns by reading blogposts by the geophfmeister.

(That is moi-self) (That is French) (No, it's not.)

Ooh! Digression!

When I was working with some Ph.D.'s and cognitive scientists, I worked with this flaming-red-haired dude (cognitive scientist) named Bill. He would always say 'Gratzi' to people answering his questions; I would always respond: "That is French."

He always gave me weird looks, but he never got the nerve to correct me.

Until one day, he said 'Gratzi' to one of our analysts, and I said, 'That is French.'

She looked me dead in the eye and snarled a disdainful, "Who cares?" before stalking off.

Bill was on the floor, suffering a paroxysm of laughter.

I think that was the best day of his life.

ANYWAY!

Pluto, not planet ... It's not even a plant.

Besides which, Eris is more massive than Pluto, anyway. So there.

Hail, Eris!

SO!

The lambda calculus ('L' is for Lambda calculus) addressed a certain set of concerns, which I'll get to in Monday's post, but it, like mathematics in general, has its own set of problems; that is, you can write a function that never terminates, and so it is undecidable.

So, along came this little kid by the name of Masahito Hasegawa (dunno, maybe he's Italian?) and he noticed something, and that was this: the problem of the lambda calculus is that it's actually two calculi intertwined. And when you disentangle them, you have two distinct calculi, neither of which is undecidable.

The first of the calculi, the one we focus on here, is the κ-calculus, or contextual calculus.

The κ-calculus is this.

The types are the unit type and products of types:

    τ ::= 1 | τ × τ | ...

So, you see from the types that a type is either nothing (the 1, or unit, type) or a cross (cartesian product) of types.

The functions only take these types, NOT function types, so we're safe in that we're only working with (products of) unit types.

The primary operation is the lifting function, which takes an instance and lifts it into the type of concern:

    lift_τ x -> (a, x), where a is of type τ

Then there's the κ-abstraction, which, when acting with an instance, carries it along:

    κ x . (some function (τ1, τ2) -> τ3) : τ3

Tying lift and the κ-abstraction together gives us our function 'application,' where actually no application happens; it's all just composing (in this case) tuples.
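To get a rough feel for this in Haskell, here is a loose sketch of my own (an analogy only: the real κ-calculus is first-order and has no function types, so Haskell's arrows are standing in for its morphisms):

    -- lift_τ: pair a given instance with whatever context comes along.
    lift :: a -> (x -> (a, x))
    lift a = \x -> (a, x)

    -- κ-abstraction: a morphism on the product τ1 × τ2, built from a
    -- function that receives the τ1 piece separately; essentially uncurry.
    kappa :: (a -> (b -> c)) -> ((a, b) -> c)
    kappa = uncurry

    -- 'Application' is then just plumbing tuples through compositions:
    example :: (Int, Int) -> Int
    example = kappa (\x y -> x + 2 * y)

So example (1, 2) evaluates to 5, with everything flowing through the pair rather than through curried application, which is the flavor of the κ-calculus's tuple-composing style.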

As a spin-off from teaching programming to my 10-year-old son...

Planet Haskell - Fri, 04/18/2014 - 06:00
As a spin-off from teaching programming to my 10-year-old son and his friends, we have published a sprite and pixel art editor for the iPad, called BigPixel, which you can get from the App Store. (It has a similar feature set to the earlier Haskell version, but is much prettier!)

Migration Complete: Welcome to blog.cppcabrera.com!

Planet Haskell - Fri, 04/18/2014 - 06:00
The migration is now complete. Currently, I have:

* Atom feed support
* Hosted on https:// (with heartbleed patched)
* Sources hosted on github
* Main blog over at https://blog.cppcabrera.com
* All content from here ported over
* All posts tagged appropriately

I've documented the process as my first new post at the new location: Learning Hakyll and Setting Up.

At this point, the one feature I'd like to add to my static blog soon is the ability to have a Haskell-only feed. I'll be working on that over the coming week.

Thanks again for reading, and I hope you'll enjoy visiting the new site!
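For the curious, a Haskell-only feed in Hakyll could plausibly be built by filtering posts on a tag before rendering the feed. The sketch below is my guess at one approach, not the author's code; postCtx and feedConfig are assumed to be defined elsewhere in the site build:

    -- Hypothetical tag-filtered Atom feed in Hakyll (a sketch, not
    -- the author's implementation).
    {-# LANGUAGE OverloadedStrings #-}
    import Control.Monad (filterM)
    import Hakyll

    haskellFeed :: Context String -> FeedConfiguration -> Rules ()
    haskellFeed postCtx feedConfig =
      create ["haskell.xml"] $ do
        route idRoute
        compile $ do
          posts <- recentFirst =<< loadAllSnapshots "posts/*" "content"
          -- keep only the posts whose metadata carries the "haskell" tag
          hs <- filterM (fmap (elem "haskell") . getTags . itemIdentifier) posts
          let feedCtx = mconcat [postCtx, bodyField "description"]
          renderAtom feedConfig feedCtx (take 10 hs)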

Parallelism and Concurrency, Revisited

Planet Haskell - Fri, 04/18/2014 - 06:00
To my delight, I still get compliments on and criticisms of my post from three years ago (can it possibly be that long?) on parallelism and concurrency.  In that post I offered a “top down” argument to the effect that these are different abstractions with different goals: parallelism is about exploiting computational resources to maximize efficiency, concurrency is […]

Gibbs Sampling in R, Haskell, Jags and Stan

Planet Haskell - Fri, 04/18/2014 - 06:00
Introduction

It’s possible to do Gibbs sampling in most languages, and since I am doing some work in R and some work in Haskell, I thought I’d present a simple example in both languages: estimating the mean from a normal distribution … Continue reading →
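For a flavor of the Haskell side, here is one Gibbs step for a normal model with unknown mean and precision - my own sketch using the mwc-random package, not the post's code, with made-up prior hyperparameters a0 and b0:

    -- One Gibbs step for (mu, tau) given normal data (a sketch using
    -- mwc-random; not the code from the post). Priors: flat on mu,
    -- Gamma(a0, b0) in the rate parameterization on tau.
    import qualified Data.Vector.Unboxed as V
    import System.Random.MWC (GenIO)
    import System.Random.MWC.Distributions (normal, gamma)

    gibbsStep :: V.Vector Double -> GenIO -> (Double, Double) -> IO (Double, Double)
    gibbsStep xs gen (_, tau) = do
      let n    = fromIntegral (V.length xs)
          xbar = V.sum xs / n
      -- mu | tau, xs ~ Normal(xbar, 1 / (n * tau))
      mu' <- normal xbar (sqrt (1 / (n * tau))) gen
      -- tau | mu, xs ~ Gamma(a0 + n/2, rate b0 + ss/2); mwc-random's
      -- gamma wants shape and *scale*, hence the reciprocal of the rate
      let ss       = V.sum (V.map (\x -> (x - mu') * (x - mu')) xs)
          (a0, b0) = (1.0, 1.0)  -- invented hyperparameters
      tau' <- gamma (a0 + n / 2) (1 / (b0 + ss / 2)) gen
      return (mu', tau')

Iterating this step from any starting point and discarding a burn-in yields draws from the joint posterior of the mean and precision.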

Some not-so-grand challenges in bioinformatics

Planet Haskell - Fri, 04/18/2014 - 06:00
Bioinformatics is a field in the intersection of biology, statistics, and computer science. Broadly speaking, biologists will plan an experiment, bioinformaticians will run the analysis, and the biologists will interpret the results. Sometimes it can be rather frustrating to be the bioinformatician squeezed into the middle, realizing that results are far less robust than they should be. Here, I try to list a number of areas in bioinformatics that I think are problematic. They are not in any way “grand challenges”, like computing protein structure or identifying non-trivial factors from GWAS analysis. These are more your everyday challenges that bioinformaticians need to be aware of, but are too polite to mention.

Importance of bioinformatics

High throughput sequencing (HTS) has rapidly become an indispensable tool in medicine, biology, and ecology. As a technology, it is crucial in addressing many of our most important challenges: health and disease, food supply and food production, and ecology. Scientifically, it has allowed us to reveal the mechanisms of living organisms in unprecedented detail and scale. New technology invariably poses new challenges to analysis, and in addition to a thorough understanding of the relevant biology, robust study design and analysis based on HTS requires a solid understanding of statistics and bioinformatics. A lack of this background often results in study design that is overly optimistic and ambitious, and in analysis that fails to take the many sources of noise and error into account.

Specific examples

The typical bioinformatics pipeline is (as implied by the name) structured as a chain of operations. The tools used will of course not be perfect, and they will introduce errors, artifacts, and biases. Unfortunately, the next stage in the pipeline usually works from the assumption that the results from the previous stage are correct, and errors thus propagate and accumulate - and in the end, it is difficult to say whether results represent real, biological phenomena, or are simply artifacts of the analysis. What makes matters worse is that the pipeline is often executed by scientists with a superficial knowledge of the errors that can occur. (A computer said it, so it must be true.)

Gene prediction and transcript assembly

One important task in genomics is identifying the genes of new organisms. This can be performed with a variety of tools, and in one project, we used both ab initio gene prediction (MAKER and Augustus) as well as RNAseq assembly (CLC, ABySS). The results varied substantially, with predicted gene counts ranging from 16000 to 45000. To make things worse, clustering revealed that there was not a simple many-to-one relationship between the predicted genes. Clearly, results from analysis of e.g. GO-enrichment or expansion and contraction of gene families will be heavily dependent on which set of transcripts one uses.

Expression analysis

Quantifying the expression of specific genes is an important tool in many studies. In addition to the challenges of acquiring a good set of transcripts, expression analysis based on RNAseq introduces several other problems. The most commonly used measure, reads (or fragments) per kilobase per million (RPKM), has some statistical problems. In addition, mapping is challenging, and in particular, mapping reads from transcripts to a genome will struggle with intron/exon boundaries. Of course, you can map to the transcriptome instead, but as we have seen, the reliability of reference transcriptomes is not always what we would like.
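For reference, the RPKM arithmetic itself is just a rescaled ratio of counts - this toy helper is my own illustration, not code from the post:

    -- RPKM: reads per kilobase of transcript per million mapped reads.
    -- A toy helper for illustration, not from the post.
    rpkm :: Double  -- reads mapped to this transcript
         -> Double  -- total mapped reads in the library
         -> Double  -- transcript length in base pairs
         -> Double
    rpkm mapped total len = (mapped * 1e9) / (total * len)

The 1e9 folds together the per-kilobase (1e3) and per-million (1e6) scalings.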
Reference genomes

The main goal of a de novo genome project is to produce a correct reference sequence for the genome. Although in many ways a simpler task than the corresponding transcriptome assembly, there are many challenges to overcome. In reality, most genome sequences are fragmented and contain errors. But even in the optimal case, a reference genome may not be representative. For instance, a reference genome from a single individual is clearly unlikely to accurately model the sex chromosomes of the other sex (but on the other hand, similarities between different sex chromosomes will likely lead to a lot of incorrectly mapped reads). But a single individual is often not representative even of other individuals of the same sex. E.g., an obvious method for identifying a sex chromosome is to compare sequence data from one sex to the draft reference (which in our case happened to be from an individual of the other sex). Unfortunately, it turned out that the marker we found to be absent in the reference - and thus inferred to be unique to the other sex - also occurred in 20% of the other sex. Just not in the sequenced individual. So while reference genomes may work fairly well for some species, they are less useful for species with high genomic variation (or with B-chromosomes), and simply applying methods developed for mammals to different species will likely give misleading and/or incorrect results.

Genome annotation

I recently attended an interesting talk about pseudogenes in various species. One thing that stuck in my mind was that the number of pseudogenes in a species is highly correlated with the institution responsible for the gene prediction. Although this is just an observation and not a scientific result, it seems very likely that different institutes use in-house pipelines with varying degrees of sensitivity/specificity. Thus what one pipeline would identify as a real gene, a more conservative pipeline would ignore as a pseudogene. So here too, our precious curated resources may be less valuable than you thought.

Population genomics

In population genomics, there are many different measures of diversity, but the most commonly used is FST. Unfortunately, the accuracy of estimating FST for a SNP is highly dependent on coverage and error rate, and a simulation study I did showed that with the coverage commonly used in projects, the estimated FST has a large variance. In addition, the usual problems apply: the genome may not be complete, correct, or representative. And even if it were, reads could still be mapped incorrectly. And mapping is more likely to fail in variable regions, meaning you get less accurate results exactly in the places where it matters most.

Variant analysis

Variant analysis suffers from many of the same challenges as population genomics, genome assembly and mapping being what they are. Again, estimated allele frequencies tend to be wildly inaccurate. Structural variants (non-trivial indels, transpositions, copy number variation, etc.) are very common, but difficult to detect accurately from mapped reads. In many cases, such features will just result in lower mapping coverage, further reducing your chances of identifying them.
Curation is expensive

Analysis is challenging even under the best of conditions - with high quality data and a good reference genome that is assembled, annotated, and curated by a reputable institution or heavyweight consortium. But in many cases, data quality is poor and we don’t even have the luxury of good references. A de novo genome project takes years and costs millions. This is acceptable for species crucial to medicine, like human or mouse, and for food production, like cow or wheat. These are sequenced by large multi-national consortia at tremendous expense. But while we will eventually get there for the more important species, the large majority of organisms - the less commercially or scientifically interesting ones - will never have anything close to this. Similarly, the databases of unannotated sequences (e.g. UniProt) grow at a much quicker pace than the curated databases (e.g. SwissProt), simply because identifying the function of a protein in the lab takes approximately one post doc, and nobody has the incentives or manpower to do this.

Conclusions

So, what is the take-home message from all of this? Let’s take a moment to sum up:

Curation is less valuable than you think

Everybody loves and anticipates finished genomes, complete with annotated transcripts and proteins. But there’s a world of difference between the finished genomes of widely studied species and the draft genomes of the more obscure ones. Often, the reference resources are incomplete and partially incorrect, and sometimes the best thing that can be said for them is that as long as we all use them, our results will be comparable, because we all make the same errors.

Our methods aren’t very robust

Genome assembly, gene annotation, transcriptome assembly, and functional annotation are difficult. Anybody can run some assembler or gene predictor and get a result, but if you run two of them, you will get two different results. Sequence mapping is not very complicated (at a conference, somebody claimed to have counted over ninety programs for short read mapping), and works well if you have a good reference genome, if you don’t have too many repeats, and if you don’t have too much variation in your genome. In other words, everywhere you don’t really need it.

Dealing with the limitations

Being everyday challenges, I don’t think any of this is insurmountable, but I’ll save that for a later post. Or grant application. In the meantime, do let me know what you think.

Proposal: Changes to the PVP

Planet Haskell - Fri, 04/18/2014 - 06:00
As I mentioned two posts ago, there was a serious discussion on the libraries mailing list about the Package Versioning Policy (PVP).

This blog post presents some concrete changes I'd like to see to the PVP to make it better for both general consumers of Hackage and for library authors. I'll start off with a summary of the changes, and then give the explanations:

1. The goal of the PVP needs to be clarified. Its purpose is not to ensure reproducible builds of non-published software, but rather to provide for more reliable builds of libraries on Hackage. Reproducible builds should be handled exclusively through version freezing, the only known technique that actually gives the necessary guarantees.

2. Upper bounds should not be included on non-upgradeable packages, such as base and template-haskell (are there others?). Alternatively, we should establish some accepted upper bound on these packages, e.g. many people place base < 5 on their code.

3. We should distinguish between mostly-stable packages and unstable packages. For a package like text, if you simply import Data.Text (Text, pack, reverse), or some other sane subset, there's no need for upper bounds. Note that this doesn't provide a hard-and-fast rule like the current PVP, but is rather a matter of discretion. Communication between library authors and users (via documentation or other means) would be vital to making this work well.

4. For a package version A.B.C, a bump in A or B indicates some level of breaking change. As an opt-in approach, package authors are free to associate meaning with A and B beyond what the PVP requires. Libraries which use these packages are free to rely on the guarantees provided by package authors when placing upper bounds. Note that this is very closely related to point (3).

While I (Michael Snoyman) am the author of this proposal, the following people have reviewed the proposal and support it:

* Bryan O'Sullivan
* Felipe Lessa
* Roman Cheplyaka
* Vincent Hanquez

Reproducible builds

There are a number of simple cases that can result in PVP-compliant code not being buildable. These aren't just hypothetical cases; in my experience as both a package author and Stackage maintainer, I've seen these come up.

Package foo version 1.0 provides an instance of MonadFoo for IO and Identity. Version 1.1 removes the IO instance for some reason. Package bar provides a function:

    bar :: MonadFoo m => Int -> m Double

Package bar compiles with both version 1.0 and 1.1 of foo, and therefore (following the PVP) adds a constraint to its cabal file: foo >= 1.0 && < 1.2.

Now a user decides to use the bar package. The user never imports anything from foo, and therefore has no listing for foo in the cabal file. The user's code depends on the IO instance of MonadFoo. When compiled with foo 1.0, everything works fine. However, when compiled with foo 1.1, the code no longer compiles.

Similarly, instead of typeclass instances, the same situation can occur with module export lists. Consider version 1.0 of foo, which provides:

    module Foo (foo1, foo2) where

Version 1.1 removes the foo2 export. The bar package re-exports the entire Foo module, and then a user package imports the module from bar. If the user package uses the foo2 function, it will compile when foo-1.0 is used, but not when foo-1.1 is used.

In both of these cases, the issue is the same: transitive dependencies are not being clamped down. The PVP makes an assumption that the entire interface of a package can be expressed in its version number, which is not true.
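To see the first breakage concretely, here is a hypothetical sketch of the user's code (Bar and MonadFoo are stand-ins from the story above, not real packages):

    -- Hypothetical user code for the scenario above; nothing here
    -- mentions foo, yet the build hinges on foo's IO instance.
    import Bar (bar)  -- bar :: MonadFoo m => Int -> m Double

    main :: IO ()
    main = do
      d <- bar 42  -- needs "instance MonadFoo IO": present in foo-1.0,
      print d      -- removed in foo-1.1, so this stops compiling

The user's cabal file is PVP-compliant at every level, and still the build breaks when the solver picks foo-1.1.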
I see three possible solutions to this:

1. Try to push even more of a burden onto package authors, and somehow make them guarantee that their interface is completely immune to changes elsewhere in the stack. This kind of change was proposed on the libraries list. I'm strongly opposed to any change like this: it makes authors' lives harder, and makes it very difficult to provide backwards compatibility in libraries. Imagine if transformers 0.4 adds a new MonadIO instance; the logical extreme of this position would be to disallow a library from working with both transformers 0.3 and 0.4, which would split Hackage in two.

2. Modify the PVP so that instead of listing just direct dependencies, authors are required to list all transitive dependencies as well. So it would be a violation to depend on bar without explicitly listing foo in the dependency list. This would work, and be incredibly difficult to maintain. It would also greatly increase the time it takes for a new version of a deep dependency to become usable, due to the number of authors who would have to bump version bounds.

3. Transfer responsibility for this to package users: if you first built your code against foo 1.0, you should freeze that information and continue building against foo 1.0, regardless of the presence of new versions of foo. Not only does this increase reproducibility, it's just common sense: it's entirely possible that new versions of a library will introduce a runtime bug, a performance regression, or even fix a bug that your code depends on. Why should the reliability of my code base depend on the actions of some third party that I have no control over?

Non-upgradeable packages

There are some packages which ship with GHC and cannot be upgraded. I'm aware of at least base and template-haskell, though perhaps there are others (haskell98 and haskell2010?). In the past, there was good reason to place upper bounds on base, specifically with the base 3/4 split. However, we haven't had that experience in a while, and don't seem to be moving towards doing that again. In today's world, we end up with the following options:

* Place upper bounds on base to indicate "I haven't tested this with newer versions of GHC." This then makes it difficult for users to test out that package with newer versions of GHC.

* Leave off upper bounds on base. Users may then try to install a package onto a version of GHC on which the package hasn't been tested, which will result in either (1) everything working (definitely the common case, based on my Stackage maintenance), or (2) getting a compilation error.

I've heard two arguments pushing us in the direction of keeping the upper bounds in this case, so I'd like to address them both:

1. cabal error messages are easier to understand than GHC error messages. I have two problems with that:

   * I disagree: cabal error messages are terrible. (I'm told this will be fixed in the next version of cabal.) Take the following output as a sample:

        cabal: Could not resolve dependencies:
        trying: 4Blocks-0.2
        rejecting: base-4.6.0.1/installed-8aa...
        (conflict: 4Blocks => base>=2 && <=4)
        rejecting: base-4.6.0.1, 4.6.0.0, 4.5.1.0, 4.5.0.0, 4.4.1.0, 4.4.0.0,
        4.3.1.0, 4.3.0.0, 4.2.0.2, 4.2.0.1, 4.2.0.0, 4.1.0.0, 4.0.0.0, 3.0.3.2,
        3.0.3.1 (global constraint requires installed instance)

     I've seen a number of users file bug reports not understanding that this message means "you have the wrong version of GHC."

   * Even if the error messages were more user-friendly, they make it more difficult to fix the actual problem: the code doesn't compile with the new version of GHC. Oftentimes, I've been able to report an error message to a library author and, without necessarily even downloading the new version of GHC, he/she has been able to fix the problem.

2. Using upper bounds in theory means that cabal will be able to revert to an older version of the library that is compatible with the new version of GHC. However, I find it highly unlikely that there's often - if ever - a case where an older version of a library is compatible with a later version of GHC.

Mostly-stable, and finer-grained versioning

I'll combine the discussion of the last two points. I think the heart of the PVP debates really comes from mostly-stable packages. Let's contrast the extremes. Consider a library which is completely stable, never has a breaking change, and has stated with absolute certainty that it never will again. Does anyone care about upper bounds on this library? They're irrelevant! I'd have no problem with including an upper bound, and I doubt even the staunchest PVP advocates would really claim it's a problem to leave it off.

On the other hand, consider an extremely unstable library, which is releasing massively breaking changes on a weekly basis. I would certainly agree in that case that an upper bound on that library is highly prudent.

The sticking point is the middle ground. Consider the following code snippet:

    import Data.Text (Text, pack)

    foo :: Text
    foo = pack "foo"

According to the PVP as it stands today, this snippet requires an upper bound of < 1.2 on the text package. But let's just play the odds here: does anyone actually believe there's a real chance that the next iteration of text will break this code snippet? I highly doubt it; this is a stable subset of the text API, and I doubt it will ever change. The same can be said of large subsets of many other packages.

By putting in upper bounds in these cases, we run a very real risk of bifurcating Hackage into "those demanding the new text version for some new feature" vs "those who haven't yet updated their upper bounds to allow the new version of text."

The PVP currently takes an extremely conservative viewpoint on this, with the goal of solving just one problem: making sure code that compiles now continues to compile. As I demonstrated above, it doesn't actually solve that problem completely. And in the process, it has created other problems, such as this bifurcation.

So my proposal is that, instead of creating rigid rules like "always put an upper bound no matter what," we allow some common sense into the process, and also let package authors explicitly say "you can rely on this API not changing."
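To make the freezing alternative from point (1) concrete - this is my own illustration with invented versions, not part of the proposal - a freeze file pins every transitive dependency of a known-good build, so later Hackage uploads cannot change what you compile against:

    -- hypothetical cabal.config (e.g. as produced by cabal freeze);
    -- every package name and version here is illustrative only
    constraints: base ==4.6.0.1,
                 foo ==1.0,
                 bar ==1.1.2,
                 text ==1.1.0.0

With this file alongside the project, the solver is no longer free to pick foo-1.1, and the breakage scenarios above simply cannot occur.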

Generate Javascript classes for your .NET types

Planet Haskell - Fri, 04/18/2014 - 06:00
We open-sourced another library: ClosureExterns.NET (on github and nuget). It generates Javascript classes from .NET-based backend types, to preserve type “safety” (as safe as Javascript gets) across front- and backend. As a bonus, you get Google Closure annotations. The type annotations are understood by WebStorm (and other editors) and improve your development experience. Also, if […]