Functional Programming

DarcsWatch End-Of-Life’d

Planet Haskell - Fri, 02/27/2015 - 07:00
Almost seven years ago, at a time when the “VCS wars” had not even properly started, GitHub was seven days old, and most Haskell-related software projects were using Darcs as their version control system of choice, submitting a patch was simple: you ran darcs send and a mail with your changes would be sent to the right address, e.g. the maintainer or a mailing list. This was almost as convenient as Pull Requests are on GitHub now, only that it was tricky to keep track of what was happening with the patch, and it was easy to forget to follow up on it.

So back then I announced DarcsWatch: a service that you could CC in your patch-submitting mail, which would then monitor the repository and tell you about the patch’s status, i.e. whether it was applied or obsoleted by another patch. Since then, it quietly did its work without many hiccups. But by now, a lot of projects have moved away from Darcs, so I don’t really use it myself any more. Also, its Darcs patch parser does not like every submission by a contemporary darcs, so it is becoming more and more unreliable.

I asked around on the xmonad and darcs mailing lists if others were still using it, and nobody spoke up. Therefore, after seven years and 4660 monitored patches, I am officially ceasing to run DarcsWatch. The code and data are still there, so if you believe this was a mistake, you can still speak up -- but be prepared to be asked to take over maintaining it. I have a dislike for actually deleting data, so I’ll keep the static parts of the DarcsWatch web page running in their current state.

I’d like to thank the guys from spiny.org.uk for hosting DarcsWatch on urchin for the last 5 years. 2015-02-25T23:39:00Z Joachim Breitner mail@joachim-breitner.de

Making withSocketsDo unnecessary

Planet Haskell - Fri, 02/27/2015 - 07:00
Summary: Currently you have to call withSocketsDo before using the Haskell network library. In the next version you won't have to.

The Haskell network library has always had a weird and unpleasant invariant. Under Windows, you must call withSocketsDo before calling any other functions. If you forget, the error message isn't particularly illuminating (e.g. getAddrInfo, does not exist, error 10093). Calling withSocketsDo isn't harmful under Linux, but equally isn't necessary, and is thus easy to accidentally omit. The network library has recently merged some patches so that in future versions there is no requirement to call withSocketsDo, even on Windows.

Existing versions of network

The reason for requiring withSocketsDo is so that the network library can initialise the Windows Winsock library. The code for withSocketsDo was approximately:

    withSocketsDo :: IO a -> IO a
    #if WINDOWS
    withSocketsDo act = do
        initWinsock
        act `finally` termWinsock
    #else
    withSocketsDo act = act
    #endif

Where initWinsock and termWinsock were C functions. Both checked a mutable variable so they only initialised/terminated once. The initWinsock function immediately initialised the Winsock library. The termWinsock function did not terminate the library, but merely installed an atexit handler, providing a function that ran when the program shut down which terminated the Winsock library.

As a result, in all existing versions of the network library, it is fine to nest calls to withSocketsDo, call withSocketsDo multiple times, and to perform networking operations after withSocketsDo has returned.

Future versions of network

My approach to removing the requirement to call withSocketsDo was to make it very cheap, then sprinkle it everywhere it might be needed. Making such a function cheap on non-Windows just required an INLINE pragma (although it's very likely GHC would have always inlined the function anyway).

For Windows, I changed to:

    withSocketsDo act = do evaluate withSocketsInit; act

    {-# NOINLINE withSocketsInit #-}
    withSocketsInit = unsafePerformIO $ do
        initWinsock
        termWinsock

Now withSocketsDo is very cheap, with subsequent calls requiring no FFI calls and, thanks to pointer tagging, just a few cheap instructions. When placing additional withSocketsDo calls my strategy was to make sure I called it before constructing a Socket (which many functions take as an argument), and when taking one of the central locks required for the network library. In addition, I identified a few places not otherwise covered.

In newer versions of the network library it is probably never necessary to call withSocketsDo - if you find a place where one is necessary, let me know. However, for compatibility with older versions on Windows, it is good practice to always call withSocketsDo; a minimal example follows at the end of this post. Libraries making use of the network library should probably call withSocketsDo on their users' behalf. 2015-02-25T22:11:00Z Neil Mitchell noreply@blogger.com
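As a concrete illustration of that advice, here is a minimal sketch (my example, not from the post; the body is a placeholder) of a program that wraps all its networking in withSocketsDo, which comes from Network.Socket:

    import Network.Socket (withSocketsDo)

    main :: IO ()
    main = withSocketsDo $ do
        -- All networking happens inside withSocketsDo, so the program also
        -- works with older versions of the network library on Windows.
        putStrLn "socket code goes here"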

Haskell on Yosemite (OSX 10.10)

Planet Haskell - Fri, 02/27/2015 - 07:00
Nearly all my development has been done under Linux. Only occasionally have I worked under OSX. This is all to change – OSX is to be my primary development platform. In the past, my experiences … Continue reading →

Engineering Manager (OCaml or Haskell under Linux) at FireEye (Full-time)

Planet Haskell - Fri, 02/27/2015 - 07:00
Position Title: Engineering Manager

Location: Dresden, Germany

The Company

FireEye has invented a purpose-built, virtual machine-based security platform that provides real-time threat protection to enterprises and governments worldwide against the next generation of cyber attacks. These highly sophisticated cyber attacks easily circumvent traditional signature-based defenses, such as next-generation firewalls, IPS, anti-virus, and gateways. The FireEye Threat Prevention Platform provides real-time, dynamic threat protection without the use of signatures to protect an organization across the primary threat vectors and across the different stages of an attack life cycle. The core of the FireEye platform is a virtual execution engine, complemented by dynamic threat intelligence, to identify and block cyber attacks in real time. FireEye has over 3,100 customers across 67 countries, including over 200 of the Fortune 500.

Job Description:

In Dresden, Germany, an outstanding team of formal methods engineers uses formal methods tools, such as proof assistants, to develop correctness proofs for FireEye's leading edge products. In real world applications of formal methods tools, automation is often not sufficient for the specific problems at hand. Therefore, we are seeking outstanding software developers with a passion for implementing both well-designed as well as ad-hoc formal methods software tools for proof refactoring, proof search, systematic testing and other areas.

Responsibilities:

- Design and development of software tools, scripts and automated processes for various tasks, supporting formal method tools
- Refactoring, redesigning and rewriting earlier ad-hoc solutions in a systematic way
- Maintaining existing tools and scripts
- Continuous focus and contribution in the areas of performance, reliability, scalability, and maintainability of the product
- Active participation in our ongoing process enhancements in software development and verification practices

Desired Skills & Experience:

- BS or MS in Computer Science or equivalent experience
- Strong programming skills in OCaml or Haskell under Linux
- Good skills in scripting languages, such as bash, perl or python
- Good knowledge of logic and formal logical reasoning
- Experience with parallel programs running on multiple machines and cores
- Interest in acquiring basic knowledge about proof assistants for higher-order logic
- Excellent interpersonal and teamwork skills
- Strong problem solving, troubleshooting and analysis skills
- Excellent oral and written communication skills

Get information on how to apply for this position. 2015-02-16T17:29:14Z

Senior Software Engineer at McGraw-Hill Education (Full-time)

Planet Haskell - Fri, 02/27/2015 - 07:00
This Senior Software Engineer position is with the new LearnSmart team at McGraw-Hill Education's new and growing Research & Development center in Boston's Innovation District. We make software that helps college students study smarter, earn better grades, and retain more knowledge. The LearnSmart adaptive engine powers the products in our LearnSmart Advantage suite — LearnSmart, SmartBook, LearnSmart Achieve, LearnSmart Prep, and LearnSmart Labs. These products provide a personalized learning path that continuously adapts course content based on a student’s current knowledge and confidence level.

On our team, you'll get to:

- Move textbooks and learning into the digital era
- Create software used by millions of students
- Advance the state of the art in adaptive learning technology
- Make a real difference in education

Our team's products are built with Flow, a functional language in the ML family. Flow lets us write code once and deliver it to students on multiple platforms and device types. Other languages in our development ecosystem include especially JavaScript, but also C++, SWF (Flash), and Haxe. If you're interested in functional languages like Scala, Swift, Erlang, Clojure, F#, Lisp, Haskell, and OCaml, then you'll enjoy learning Flow. We don't require that you have previous experience with functional programming, only enthusiasm for learning it. But if you do have some experience with functional languages, so much the better! (On-the-job experience is best, but coursework, personal projects, and open-source contributions count too.)

We require only that you:

- Have a solid grasp of CS fundamentals (languages, algorithms, and data structures)
- Be comfortable moving between multiple programming languages
- Be comfortable with modern software practices: version control (Git), test-driven development, continuous integration, Agile

Get information on how to apply for this position. 2015-02-24T15:34:45Z

Peg Solitaire

Planet Haskell - Fri, 02/27/2015 - 07:00
I'm proud to announce my new game: Peg Solitaire.

Peg solitaire is a classic board puzzle game for one player involving movement of pegs on a board with holes. The objective is, making valid moves, to empty the entire board except for a solitary peg. A valid move is to jump a peg orthogonally over an adjacent peg into a hole two positions away and then to remove the jumped peg. Thus the valid moves in each of the four orthogonal directions are:

- Jump to the right;
- Jump to the left;
- Jump up;
- Jump down.

The first challenge is to solve the puzzle; then you can challenge yourself and try to solve it with the minimum number of movements. Each jump is considered a move, but multiple consecutive jumps with the same peg are considered a single move, so you can keep jumping with the same peg and still count it as a single movement. The game displays the count of how many movements you have made at the top of the screen.

Features:

- Multiple board types to select;
- Five different board types: English (Standard), European (French), Wiegleb, Asymmetrical (Bell) and Diamond;
- Display game move count;
- Quick replay same board;
- Give up current game;
- Restart current game;
- Shows system status bar, so you can keep track of other things while playing, for example the current time.

The game is entirely free and displays ads; there are no in-app purchase options.

It's made of web technologies (HTML, JavaScript, CSS) and deployed as a native application via Apache Cordova. Translated to six languages. Really exciting!

I tried to get the look and feel of each platform individually, using a flat design for iOS, a Material design for Android and a dark theme for Windows Phone. Take a look - I hope it's pretty obvious which screenshot belongs to which platform.

There are still updates to come. iOS has one in the queue for the review process, and I'm delivering another one for Windows Phone this week.

Take a look at the game, download it, play it and leave a rating or feedback:

- Peg Solitaire at Android Play Store
- Peg Solitaire at iOS App Store
- Peg Solitaire at Windows Phone Store

Hope to see you soon!

Writing an LRU Cache in Haskell

Planet Haskell - Fri, 02/27/2015 - 07:00
Introduction

In-memory caches form an important optimisation for modern applications. This is one area where people often tend to write their own implementation (though usually based on an existing idea). The reason for this is mostly that having a one-size-fits-all cache is really hard, and people often want to tune it for performance reasons according to their usage pattern, or use a specific interface that works really well for them. However, this sometimes results in less-than-optimal design choices. I thought I would take some time and explain how an LRU cache can be written in a reasonably straightforward way (the code is fairly short), while still achieving great performance. Hence, it should not be too much trouble to tune this code to your needs.

The data structure usually underpinning an LRU cache is a Priority Search Queue, where the priority of an element is the time at which it was last accessed. A number of Priority Search Queue implementations are provided by the psqueues package, and in this blogpost we will be using its HashPSQ data type.

Disclaimer: I am one of the principal authors of the psqueues package.

This blogpost is written in literate Haskell, so you should be able to plug it into GHCi and play around with it. The raw file can be found here.

A pure implementation

First, we import some things, including the Data.HashPSQ module from psqueues.

> {-# LANGUAGE BangPatterns #-}
> import Control.Applicative ((<$>))
> import Data.Hashable (Hashable, hash)
> import qualified Data.HashPSQ as HashPSQ
> import Data.IORef (IORef, newIORef, atomicModifyIORef')
> import Data.Int (Int64)
> import Data.Maybe (isNothing)
> import qualified Data.Vector as V
> import Prelude hiding (lookup)

Let’s start with our datatype definition. Our Cache type is parameterized by k and v, which represent the types of our keys and values respectively. The priorities of our elements will be the logical time at which they were last accessed, or the time at which they were inserted (for elements which have never been accessed). We will represent these logical times by values of type Int64.

> type Priority = Int64

The cTick field stores the “next” logical time – that is, the value of cTick should be one greater than the maximum priority in cQueue. At the very least, we need to maintain the invariant that all priorities in cQueue are smaller than cTick. A consequence of this is that cTick should increase monotonically. This is violated in the case of an integer overflow, so we need to take special care of that case.

> data Cache k v = Cache
>     { cCapacity :: !Int       -- ^ The maximum number of elements in the queue
>     , cSize     :: !Int       -- ^ The current number of elements in the queue
>     , cTick     :: !Priority  -- ^ The next logical time
>     , cQueue    :: !(HashPSQ.HashPSQ k Priority v)
>     }

Creating an empty Cache is easy; we just need to know the maximum capacity.

> empty :: Int -> Cache k v
> empty capacity
>     | capacity < 1 = error "Cache.empty: capacity < 1"
>     | otherwise    = Cache
>         { cCapacity = capacity
>         , cSize     = 0
>         , cTick     = 0
>         , cQueue    = HashPSQ.empty
>         }

Next, we will write a utility function to ensure that the invariants of our datatype are met. We can then use that in our lookup and insert functions.

> trim :: (Hashable k, Ord k) => Cache k v -> Cache k v
> trim c

The first thing we want to check is if our logical time reaches the maximum value it can take. If this is the case, we can either reset all the ticks in our queue, or we can clear it. We choose the latter here, since that is simply easier to code, and we are talking about a scenario that should not happen very often.

>     | cTick c == maxBound = empty (cCapacity c)

Then, we just need to check if our size is still within bounds. If it is not, we drop the oldest item – that is, the item with the smallest priority. We will only ever need to drop one item at a time, because our cache is number-bounded and we will call trim after every insert.

>     | cSize c > cCapacity c = c
>         { cSize  = cSize c - 1
>         , cQueue = HashPSQ.deleteMin (cQueue c)
>         }
>     | otherwise = c

Insert is pretty straightforward to implement now. We use the insertView function from Data.HashPSQ, which tells us whether or not an item was overwritten.

    insertView
        :: (Hashable k, Ord p, Ord k)
        => k -> p -> v -> HashPSQ k p v -> (Maybe (p, v), HashPSQ k p v)

This is necessary, since we need to know whether or not we need to update cSize.

> insert :: (Hashable k, Ord k) => k -> v -> Cache k v -> Cache k v
> insert key val c = trim $!
>     let (mbOldVal, queue) = HashPSQ.insertView key (cTick c) val (cQueue c)
>     in c
>         { cSize  = if isNothing mbOldVal then cSize c + 1 else cSize c
>         , cTick  = cTick c + 1
>         , cQueue = queue
>         }

Lookup is not that hard either, but we need to remember that in addition to looking up the item, we also want to bump the priority. We can do this using the alter function from psqueues: that allows us to modify a value (bump its priority) and return something (the value, if found) at the same time.

    alter
        :: (Hashable k, Ord k, Ord p)
        => (Maybe (p, v) -> (b, Maybe (p, v)))
        -> k -> HashPSQ.HashPSQ k p v -> (b, HashPSQ.HashPSQ k p v)

The b in the signature above becomes our lookup result.

> lookup
>     :: (Hashable k, Ord k) => k -> Cache k v -> Maybe (v, Cache k v)
> lookup k c = case HashPSQ.alter lookupAndBump k (cQueue c) of
>     (Nothing, _) -> Nothing
>     (Just x, q)  ->
>         let !c' = trim $ c {cTick = cTick c + 1, cQueue = q}
>         in Just (x, c')
>   where
>     lookupAndBump Nothing       = (Nothing, Nothing)
>     lookupAndBump (Just (_, x)) = (Just x, Just ((cTick c), x))

That basically gives a clean and simple implementation of a pure LRU Cache. If you are only writing pure code, you should be good to go! However, most applications deal with caches in IO, so we will have to adjust it for that.

A simple IO-based cache

Using an IORef, we can update our Cache to be easily usable in the IO Monad.

> newtype Handle k v = Handle (IORef (Cache k v))

Creating one is easy:

> newHandle :: Int -> IO (Handle k v)
> newHandle capacity = Handle <$> newIORef (empty capacity)

Our simple interface only needs to export one function. cached takes the key of the value we are looking for, and an IO action which produces the value. However, we will only actually execute this IO action if it is not present in the cache.

> cached
>     :: (Hashable k, Ord k)
>     => Handle k v -> k -> IO v -> IO v
> cached (Handle ref) k io = do

First, we check the cache using our lookup function from above. This uses atomicModifyIORef', since our lookup might bump the priority of an item, and in that case we modify the cache.

>     lookupRes <- atomicModifyIORef' ref $ \c -> case lookup k c of
>         Nothing      -> (c,  Nothing)
>         Just (v, c') -> (c', Just v)

If it is found, we can just return it.

>     case lookupRes of
>         Just v  -> return v

Otherwise, we execute the IO action and call atomicModifyIORef' again to insert it into the cache.

>         Nothing -> do
>             v <- io
>             atomicModifyIORef' ref $ \c -> (insert k v c, ())
>             return v

Contention

This scheme already gives us fairly good performance. However, that can degrade a little when lots of threads are calling atomicModifyIORef' on the same IORef. atomicModifyIORef' is implemented using a compare-and-swap, so conceptually it works a bit like this:

    atomicModifyIORef' :: IORef a -> (a -> (a, b)) -> IO b
    atomicModifyIORef' ref f = do
        x <- readIORef ref
        let (!x', !y) = f x
        -- Atomically write x' if value is still x
        swapped <- compareAndSwap ref x x'
        if swapped then return y else atomicModifyIORef' ref f  -- Retry

We can see that this can lead to contention: if we have a lot of concurrent atomicModifyIORef's, we can get into a retry loop. It cannot cause a deadlock (i.e., it should still eventually finish), but it will still bring our performance to a grinding halt. This is a common problem with IORefs which I have also personally encountered in real-world scenarios.

A striped cache

A good solution around this problem, since we already have a Hashable instance for our key anyway, is striping the keyspace. We can even reuse our Handle in quite an elegant way. Instead of just using one Handle, we create a Vector of Handles instead:

> newtype StripedHandle k v = StripedHandle (V.Vector (Handle k v))

The user can configure the number of handles that are created:

> newStripedHandle :: Int -> Int -> IO (StripedHandle k v)
> newStripedHandle numStripes capacityPerStripe =
>     StripedHandle <$> V.replicateM numStripes (newHandle capacityPerStripe)

Our hash function then determines which Handle we should use:

> stripedCached
>     :: (Hashable k, Ord k)
>     => StripedHandle k v -> k -> IO v -> IO v
> stripedCached (StripedHandle v) k =
>     cached (v V.! idx) k
>   where
>     idx = hash k `mod` V.length v

Because our access pattern is now distributed among the different Handles, we should be able to avoid the contention problem.

Conclusion

We have implemented a very useful data structure for many applications, with two variations and decent performance. Thanks to the psqueues package, the implementations are very straightforward, small in code size, and it should be possible to tune the caches to your needs.

Many variations are possible: you can use real timestamps (UTCTime) as priorities in the queue and have items expire after a given amount of time. Or, if modifications of the values v are allowed, you can add a function which writes the updates to the cache as well as to the underlying data source.

For embedding the pure cache into IO, there are many alternatives to using IORefs: for example, we could have used MVars or TVars. There are other strategies for reducing contention than striping, too. You could even write a cache which is bounded by its total size on the heap, rather than by the number of elements in the queue. If you want a single bounded cache for use across your entire application, you could allow it to store heterogeneously-typed values, and provide multiple strongly-typed interfaces to the same cache. However, implementing these things is a story for another time.

Thanks to the dashing Alex Sayers for proofreading and suggesting many corrections and improvements. 2015-02-24T00:00:00Z
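As a quick postscript: assuming the definitions from the literate source above are in scope, a usage sketch (my illustration, with a placeholder key and action, not part of the original post) might look like this. The first stripedCached call runs the action; the second is served from the cache.

    demo :: IO ()
    demo = do
        h <- newStripedHandle 8 1000          -- 8 stripes, capacity 1000 each
        let slowLength k = do                 -- stand-in for a real data source
                putStrLn ("computing " ++ k)
                return (length k)
        v1 <- stripedCached h "hello" (slowLength "hello")  -- runs the action
        v2 <- stripedCached h "hello" (slowLength "hello")  -- cache hit
        print (v1, v2)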

GHC Weekly News - 2015/02/23

Planet Haskell - Fri, 02/27/2015 - 07:00
Hi *,

It's once again time for your sometimes-slightly-irregularly-scheduled GHC news! This past Friday marked the end of the FTP vote for GHC 7.10, there's an RC on the way (see below), we've closed out a good set of patches and tickets from users and pushed them into HEAD, and to top it off - it's your editor's birthday today, so that's nice!

Quick note: as said above, GHC HQ is expecting to make a third release candidate for GHC 7.10.1 in early March, since the delay has allowed us to pick up some more changes and bugfixes. We plan on the final release being close to the end of March (previously end of February).

This week, GHC HQ met up again to discuss and write down our current priorities and thoughts: after discussing our current timetable - as we're currently hovering around the ICFP deadline - we're hoping to make our third GHC 7.10.1 release candidate on Friday, March 13th, with the final release on Friday, March 27th. This was the main takeaway from our meeting today.

We've also had a little more list activity this week than we did before:

- The FTP debate has ended, and the results are in: GHC 7.10.1 will continue with the generalized Prelude, known as "Plan FTP". https://mail.haskell.org/pipermail/libraries/2015-February/025009.html
- Edward Kmett announced the directory package needed an active maintainer to step up - and luckily, Phil Ruffwind and Elliot Robinson did just that and stepped up as maintainers! https://mail.haskell.org/pipermail/ghc-devs/2015-February/008358.html
- Kazu Yamamoto asked about a behavioral change in ghc-7.10 for Data.Char - it turns out this difference looks like it's caused by GHC 7.10 shipping an update to use Unicode 7.0 datasets. https://mail.haskell.org/pipermail/ghc-devs/2015-February/008371.html
- Thomas Bereknyei asked about a fundamental change in the Arrow desugarer, and whether or not something like this was worth it. Jan Stolarek and Ross Paterson stepped in to address some specifics Thomas asked about. https://mail.haskell.org/pipermail/ghc-devs/2015-February/008377.html
- Gabor Greif spoke up about strange behavior in the desugarer when using RebindableSyntax and RankNTypes, which Adam Gundry popped in to say may be a deeper issue due to the way typechecking and desugaring interact - https://mail.haskell.org/pipermail/ghc-devs/2015-February/008383.html
- Johan Tibell announced Cabal 1.22.1.0, which will ship with GHC 7.10. https://mail.haskell.org/pipermail/ghc-devs/2015-February/008388.html

Some noteworthy commits that went into ghc.git in the past week include:

- Commit 1b82619bc2ff36341d916c56b0cd67a378a9c222 - The hpc commands now take a configurable verbosity level (merged to ghc-7.10)
- Commit 0fa20726b0587530712677e50a56c2b03ba43095 - GHC now errors out on a module explicitly declared Main without a main export.

Closed tickets the past week include: #9266, #10095, #9959, #10086, #9094, #9606, #9402, #10093, #9054, #10102, #4366, #7604, #9103, #10104, #7765, #7103, #10051, #7056, #9907, #10078, #10096, #10072, #10043, #9926, #10088, #10091, #8309, #9049, #9895, and #8539.

2015-02-23T23:52:48Z thoughtpolice

The Future of community.haskell.org

Planet Haskell - Fri, 02/27/2015 - 07:00
Community.haskell.org is a server in our ecosystem that comparatively few know about these days. It actually was, to my knowledge, a key part of how the whole haskell.org community infrastructure got set up way back when. The sparse homepage still even says: "This server is run by a mysterious group of Haskell hackers who do not wish to be known as a Cabal, and is funded from money earned by haskell.org mentors in the Google Summer-of-Code programme." At a certain point after this server was created, it ceased to be run by a "mysterious group of Haskell hackers" and instead became managed officially by the haskell.org Committee that we know today. You can see the original announcement email in the archives.

The community server, first set up in 2007, played a key role back before the current set of cloud-based services we know today was around. It provided a shared host which could provide many of the services a software project needs -- VCS hosting, public webspace for documentation, issue trackers, mailing lists, and so forth.

Today, the server is somewhat of a relic of another time. People prefer to host projects in places like github, bitbucket, or darcs hub. Issue trackers likewise tend to be associated with these hosts, and there are other free, hosted issue trackers around as well. When folks want a mailing list, they tend to reach for google groups. Meanwhile, managing a big box full of shell accounts has become a much more difficult, riskier proposition. Every shell account is a security vulnerability waiting to happen, and there are more adversarial "scriptkiddie" hackers than ever looking to claim new boxes to spam and otherwise operate from. Managing a mailman installation is likewise more difficult. There are more spammers out there, with better methods, and unmaintained lists can quickly turn into ghost towns filled with pages of linkspam and nothing but. The same sad fate falls on unmaintained tracs. As a whole, the internet is a more adversarial world for small, self-hosted services, especially those whose domain names have some "google juice".

We think it would be good to, to the extent possible, get out of the business of doing this sort of hosting. And indeed, very few individuals tend to request accounts, since there are now so many nicer, better ways of getting the benefits that community.haskell.org was once rare in providing.

So what next? Well, we want to "end of life" most of community.haskell.org, but in as painless a way as possible. This means finding what few tracs, if any, are still active, and helping their owners migrate. Similarly for mailing lists. Of course we will find a way to continue to host their archives for historical purposes. Similarly, we will attempt to keep source repositories accessible for historical purposes, but would very much like to encourage owners to move to more well supported code hosting.

One purpose that, until recently, was hard to serve elsewhere was the hosting of private darcs repositories with shared access -- such as academics might use to collaborate on a work in progress. However, that capability is now also provided on http://hub.darcs.net. At this point, we can't think of anything in this regard that is not better provided elsewhere -- but if you can, please let us know.

On webspace, it may be the case that a little more leniency is in order. For one, it is possible to provide restricted accounts that are able to control web-accessible files but have no other rights. For another, while many open source projects now host documentation through github pages or similar, and there are likewise many services for personal home pages, nonetheless it seems a nice thing to allow projects to host their resources on a system that is not under the control of a for-profit third party that, ultimately, is responsible to its bottom line and not its users.

But all this is open for discussion! Community.haskell.org was put together to serve the open source community of Haskell developers, and its direction needs to be determined based on feedback regarding current needs. What do you think? What would you like to see continue to be provided? What do you feel is less important? Are there other good hosted services that should be mentioned as alternatives? And, of course, are you interested in rolling up your sleeves to help with any of the changes discussed? This could mean simply helping out with sorting out the mailman and trac situation, inventorying the active elements and collaborating with their owners. Or, it could mean a more sustained technical involvement. Whatever you have to offer, we will likely have a use for it. As always, you can email admin@h.o or hop on the #haskell-infrastructure freenode channel to get involved directly.

Parallel Evaluation of the Typed Lambda Calculus

Planet Haskell - Fri, 02/27/2015 - 07:00
There is much written about the duality between strict-order (call-by-value) evaluation for the lambda calculus and normal-order (call-by-need) evaluation (or, semantically equivalent, lazy evaluation). In the simply typed lambda calculus, all evaluation eventually terminates, so both evaluation strategies result in the same values. However, when general recursion is added to the simply typed lambda calculus (via a fixpoint operator, for example) then evaluation of some expressions does not terminate. More expressions terminate with normal-order evaluation than with strict-order evaluation. In fact, if evaluation terminates in any order, then it terminates with normal-order evaluation.

I would like to discuss the possibility of a third, even laxer evaluation strategy available for the typed lambda calculus that allows for even more expressions to terminate. I did just say that normal-order evaluation is, in some sense, a best possible evaluation order, so, in order to beat it, we will be adding more redexes, namely the commuting conversions. The typed lambda calculus enjoys certain commuting conversions for case expressions that allow every elimination term to pass through the case expression. For example, the commuting conversion for the π₁ elimination term and the case expression says that

    π₁(case e₀ of σ₁ x. e₁ | σ₂ y. e₂)

converts to

    case e₀ of σ₁ x. π₁(e₁) | σ₂ y. π₁(e₂)

These commuting conversions are required so that the subformula property holds. My understanding is that a corollary of this says that

    f(case e₀ of σ₁ x. e₁ | σ₂ y. e₂)

and

    case e₀ of σ₁ x. f(e₁) | σ₂ y. f(e₂)

are denotationally equivalent whenever f is a strict function. I would like to develop a version of the lambda calculus that allows these two expressions to denote the same value for any f. Call this the unrestricted commuting conversion property. A lambda calculus with this property would necessarily be parallel and thus will require a parallel evaluation strategy. For example, the natural definition of or becomes the parallel-or operation.

    or x y := if x then True else y

This definition has the usual short-circuit property that or True ⊥ is True, where ⊥ is defined by

    ⊥ := fix id

If we use the unrestricted commuting conversion property then we also have that or ⊥ True is True:

    or ⊥ True
    = {definition of or}
    if ⊥ then True else True
    = {β-expansion}
    if ⊥ then const True ⟨⟩ else const True ⟨⟩
    = {commuting}
    const True (if ⊥ then ⟨⟩ else ⟨⟩)
    = {β-reduction}
    True

Hence or is parallel-or. Other parallel functions, such as the majority function, also follow from their natural definitions.

    maj x y z := if x then (or y z) else (and y z)

In this case maj ⊥ True True is True.

    maj ⊥ True True
    = {definition of maj}
    if ⊥ then (or True True) else (and True True)
    = {evaluation of (or True True) and (and True True)}
    if ⊥ then True else True
    = {commuting}
    True

It is easy to verify that maj True ⊥ True and maj True True ⊥ are also both True.

My big question is whether we can devise some nice operational semantics for the lambda calculus that will have the unrestricted commuting conversions property that I desire. Below I document my first attempt at such operational semantics, but, spoiler alert, it does not work.

The usual rule for computing the weak head normal form of the case expression

    case e₀ of σ₁ x. e₁ | σ₂ y. e₂

says to first convert e₀ to weak head normal form. If it is σ₁ e₀′ then return the weak head normal form of e₁[x ↦ e₀′]. If it is σ₂ e₀′ then return the weak head normal form of e₂[y ↦ e₀′].
In addition to this rule, I want to add another rule for computing the weak head normal form of the case expression. This alternative rule says that we compute the weak head normal forms of e₁ and e₂. If we get C₁ e₁′ and C₂ e₂′ respectively for introduction terms (a.k.a. constructors) C₁ and C₂, and if additionally C₁ = C₂, then return the following as a weak head normal form:

    C₁ (case e₀ of σ₁ x. e₁′ | σ₂ y. e₂′)

If C₁ ≠ C₂, or if we get stuck on a neutral term (e.g. a variable), then this rule does not apply. This new rule is in addition to the usual rule for case. Any implementation must run these two rules in parallel, because it is possible that either rule (or both) can result in non-termination when recursively computing weak head normal forms of sub-terms.

I suppose that in case one has unlifted products, then when computing the weak head normal form of a case expression having a product type or function type, one can immediately return

    ⟨case e₀ of σ₁ x. π₁ e₁ | σ₂ y. π₁ e₂, case e₀ of σ₁ x. π₂ e₁ | σ₂ y. π₂ e₂⟩

or

    λz. case e₀ of σ₁ x. e₁ z | σ₂ y. e₂ z

This amended computation of weak head normal form seems to work for computing the or and maj functions above so that they are non-strict in every argument, but there is another example where even this method of computing weak head normal form is not sufficient. Consider the functions that implement associativity for the sum type.

    assocL : A + (B + C) -> (A + B) + C
    assocL z := case z of σ₁ a. σ₁ (σ₁ a) | σ₂ bc. (case bc of σ₁ b. σ₁ (σ₂ b) | σ₂ c. σ₂ c)

    assocR : (A + B) + C -> A + (B + C)
    assocR z := case z of σ₁ ab. (case ab of σ₁ a. σ₁ a | σ₂ b. σ₂ (σ₁ b)) | σ₂ c. σ₂ (σ₂ c)

Now let us use unrestricted commuting conversions to evaluate assocR (assocL (σ₂ ⊥)).

    assocR (assocL (σ₂ ⊥))
    = { definition of assocL and case evaluation }
    assocR (case ⊥ of σ₁ b. σ₁ (σ₂ b) | σ₂ c. σ₂ c)
    = { commuting conversion }
    case ⊥ of σ₁ b. assocR (σ₁ (σ₂ b)) | σ₂ c. assocR (σ₂ c)
    = { definition of assocR and case evaluations }
    case ⊥ of σ₁ b. σ₂ (σ₁ b) | σ₂ c. σ₂ (σ₂ c)
    = { commuting conversion }
    σ₂ (case ⊥ of σ₁ b. σ₁ b | σ₂ c. σ₂ c)
    = { η-contraction for case }
    σ₂ ⊥

Even if η-contraction is not a reduction rule used for computation, it is still the case that t and case t of σ₁ b. σ₁ b | σ₂ c. σ₂ c should always be denotationally equivalent. Anyhow, we see that by using commuting conversions, a weak head normal form of assocR (assocL (σ₂ ⊥)) should expose the σ₂ constructor. However, even if you apply my amended computation of weak head normal form, it will not produce any constructor.

What I find particularly surprising is the domain semantics of assocL and assocR. assocL seems to map σ₂ ⊥ to ⊥ because no constructor can be exposed. assocR maps ⊥ to ⊥. Therefore, according to the denotational semantics, the composition should map σ₂ ⊥ to ⊥, but as we saw, under parallel evaluation it does not. The naive denotational semantics appears not to capture the semantics of parallel evaluation. The term case ⊥ of σ₁ b. σ₁ (σ₂ b) | σ₂ c. σ₂ c seems to be more defined than ⊥, even though no constructor is available in the head position.

Although my attempt at nice operational semantics failed, I am still confident some nice computation method exists.
At the very least, I believe a rewriting system will work which has all the usual rewriting rules plus a few extra redexes: one that says that an elimination term applied to a case expression commutes into all of the branches; another that says that when all branches of a case expression contain the same introduction term, that introduction term commutes to the outside of the case expression; and maybe also the rules I listed above for unlifted products. I conjecture that this rewriting system is confluent and that unrestricted commuting conversions are convertible (probably requiring η-conversions as well). Without proofs of my conjectures I am a little concerned that this all does not actually work out. There may be some bad interaction with fixpoints that I am not seeing. If this does all work out, then shouldn't I have heard about it by now?
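A postscript to make the parallelism concrete: the following is a minimal Haskell sketch (my own illustration, not from the post) of parallel-or at the IO level. Both arguments are forced on separate threads, and whichever finishes first with True decides the answer, so por ⊥ True and por True ⊥ both yield True, mirroring the behaviour of or under unrestricted commuting conversions.

    import Control.Concurrent (forkIO, killThread)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Exception (evaluate)

    -- Race both arguments; a True answer wins immediately, while a False
    -- answer must wait for the other argument to decide.
    por :: Bool -> Bool -> IO Bool
    por x y = do
        result <- newEmptyMVar
        t1 <- forkIO (evaluate x >>= putMVar result)
        t2 <- forkIO (evaluate y >>= putMVar result)
        first <- takeMVar result
        if first
            then do mapM_ killThread [t1, t2]; return True
            else takeMVar result  -- first answer was False; the other decides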

Examples of monads in a dynamic language

Planet Haskell - Fri, 02/27/2015 - 07:00
In Monads in dynamic languages, I explained what the definition of a monad in a dynamic language should be and concluded that there’s nothing precluding them from existing. But I didn’t give an example either. So, in case you are still wondering whether non-trivial monads are possible in a dynamic language, here you go. I’ll implement a couple of simple monads — Reader and Maybe — with proofs. And all that will take place in the ultimate dynamic language — the (extensional) untyped lambda calculus.

The definitions of the Reader and Maybe monads are not anything special; they are the same definitions as you would write in a typed language, except Maybe is Church-encoded. What I find fascinating about this is that despite the untyped language, which allows more things to go wrong than a typed one, the monad laws still hold. You can still write monadic code and reason about it in the untyped lambda calculus in the same way as you would do in Haskell.

Reader

    return x = λr.x
    a >>= k  = λr.k(ar)r

Left identity

    return x >>= k
      { inline return }
    = λr.x >>= k
      { inline >>= }
    = λr.k((λr.x)r)r
      { β-reduce }
    = λr.kxr
      { η-reduce }
    = kx

Right identity

    a >>= return
      { inline return }
    = a >>= λx.λr.x
      { inline >>= }
    = λr.(λx.λr.x)(ar)r
      { β-reduce }
    = λr.ar
      { η-reduce }
    = a

Associativity

    a >>= f >>= g
      { inline 1st >>= }
    = λr.f(ar)r >>= g
      { inline 2nd >>= }
    = λr.g((λr.f(ar)r)r)r
      { β-reduce }
    = λr.g(f(ar)r)r

    a >>= (λx. f x >>= g)
      { inline 2nd >>= }
    = a >>= λx.λr.g((fx)r)r
      { inline 1st >>= }
    = λr.(λx.λr.g(fxr)r)(ar)r
      { β-reduce }
    = λr.g(f(ar)r)r

Maybe

    return x = λj.λn.jx
    a >>= k  = λj.λn.a(λx.kxjn)n

Left identity

    return x >>= k
      { inline return }
    = λj.λn.jx >>= k
      { inline >>= }
    = λj.λn.(λj.λn.jx)(λx.kxjn)n
      { β-reduce }
    = λj.λn.kxjn
      { η-reduce }
    = kx

Right identity

    a >>= return
      { inline return }
    = a >>= λx.λj.λn.jx
      { inline >>= }
    = λj.λn.a(λx.(λx.λj.λn.jx)xjn)n
      { β-reduce }
    = λj.λn.a(λx.jx)n
      { η-reduce }
    = λj.λn.ajn
      { η-reduce }
    = a

Associativity

    a >>= f >>= g
      { inline 1st >>= }
    = (λj.λn.a(λx.fxjn)n) >>= g
      { inline 2nd >>= }
    = (λj.λn.(λj.λn.a(λx.fxjn)n)(λx.gxjn)n)
      { β-reduce }
    = λj.λn.a(λx.fx(λx.gxjn)n)n

    a >>= (λx. f x >>= g)
      { inline 2nd >>= }
    = a >>= (λx.λj.λn.fx(λx.gxjn)n)
      { inline 1st >>= }
    = λj.λn.a(λx.(λx.λj.λn.fx(λx.gxjn)n)xjn)n
      { β-reduce }
    = λj.λn.a(λx.fx(λx.gxjn)n)n

2015-02-22T20:00:00Z http://ro-che.info/articles/2015-02-22-examples-monads-dynamic-language
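As a typed postscript (my own sketch, not from the post): the Church-encoded Maybe above transliterates directly into Haskell with RankNTypes, and the same definitions of return and >>= carry over.

    {-# LANGUAGE RankNTypes #-}

    -- A value takes a "just" continuation j and a "nothing" default n,
    -- exactly like the λj.λn. ... terms above.
    newtype CMaybe a = CMaybe { runCMaybe :: forall r. (a -> r) -> r -> r }

    creturn :: a -> CMaybe a                    -- return x = λj.λn.jx
    creturn x = CMaybe (\j _ -> j x)

    cbind :: CMaybe a -> (a -> CMaybe b) -> CMaybe b
    cbind a k =                                 -- a >>= k = λj.λn.a(λx.kxjn)n
        CMaybe (\j n -> runCMaybe a (\x -> runCMaybe (k x) j n) n)

    -- e.g. runCMaybe (creturn 5 `cbind` (creturn . (+ 1))) Just Nothing == Just 6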

Free Monoids in Haskell

Planet Haskell - Fri, 02/27/2015 - 07:00
It is often stated that Foldable is effectively the toList class. However, this turns out to be wrong. The real fundamental member of Foldable is foldMap (which should look suspiciously like traverse, incidentally). To understand exactly why this is, it helps to understand another surprising fact: lists are not free monoids in Haskell. This latter fact [...]

Turing tarpits in Rust's macro system

Planet Haskell - Fri, 02/27/2015 - 07:00
Bitwise Cyclic Tag is an extremely simple automaton slash programming language. BCT uses a program string and a data string, each made of bits. The program string is interpreted as if it were infinite, by looping back around to the first bit. The program consists of commands executed in order. There is a single one-bit command:

    0: Delete the left-most data bit.

and a single two-bit command:

    1 x: If the left-most data bit is 1, copy bit x to the right of the data string.

We halt if ever the data string is empty.

Remarkably, this is enough to do universal computation. Implementing it in Rust's macro system gives a proof (probably not the first one) that Rust's macro system is Turing-complete, aside from the recursion limit imposed by the compiler.

    #![feature(trace_macros)]

    macro_rules! bct {
        // cmd 0:  d ... => ...
        (0, $($ps:tt),* ; $_d:tt)
            => (bct!($($ps),*, 0 ; ));
        (0, $($ps:tt),* ; $_d:tt, $($ds:tt),*)
            => (bct!($($ps),*, 0 ; $($ds),*));

        // cmd 1p:  1 ... => 1 ... p
        (1, $p:tt, $($ps:tt),* ; 1)
            => (bct!($($ps),*, 1, $p ; 1, $p));
        (1, $p:tt, $($ps:tt),* ; 1, $($ds:tt),*)
            => (bct!($($ps),*, 1, $p ; 1, $($ds),*, $p));

        // cmd 1p:  0 ... => 0 ...
        (1, $p:tt, $($ps:tt),* ; $($ds:tt),*)
            => (bct!($($ps),*, 1, $p ; $($ds),*));

        // halt on empty data string
        ( $($ps:tt),* ; )
            => (());
    }

    fn main() {
        trace_macros!(true);
        bct!(0, 0, 1, 1, 1 ; 1, 0, 1);
    }

This produces the following compiler output:

    bct! { 0 , 0 , 1 , 1 , 1 ; 1 , 0 , 1 }
    bct! { 0 , 1 , 1 , 1 , 0 ; 0 , 1 }
    bct! { 1 , 1 , 1 , 0 , 0 ; 1 }
    bct! { 1 , 0 , 0 , 1 , 1 ; 1 , 1 }
    bct! { 0 , 1 , 1 , 1 , 0 ; 1 , 1 , 0 }
    bct! { 1 , 1 , 1 , 0 , 0 ; 1 , 0 }
    bct! { 1 , 0 , 0 , 1 , 1 ; 1 , 0 , 1 }
    bct! { 0 , 1 , 1 , 1 , 0 ; 1 , 0 , 1 , 0 }
    bct! { 1 , 1 , 1 , 0 , 0 ; 0 , 1 , 0 }
    bct! { 1 , 0 , 0 , 1 , 1 ; 0 , 1 , 0 }
    bct! { 0 , 1 , 1 , 1 , 0 ; 0 , 1 , 0 }
    bct! { 1 , 1 , 1 , 0 , 0 ; 1 , 0 }
    bct! { 1 , 0 , 0 , 1 , 1 ; 1 , 0 , 1 }
    bct! { 0 , 1 , 1 , 1 , 0 ; 1 , 0 , 1 , 0 }
    bct! { 1 , 1 , 1 , 0 , 0 ; 0 , 1 , 0 }
    bct! { 1 , 0 , 0 , 1 , 1 ; 0 , 1 , 0 }
    bct! { 0 , 1 , 1 , 1 , 0 ; 0 , 1 , 0 }
    ...
    bct.rs:19:13: 19:45 error: recursion limit reached while expanding the macro `bct`
    bct.rs:19     => (bct!($($ps),*, 1, $p ; $($ds),*));
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can try it online, as well.

Notes about the macro

I would much rather drop the commas and write

    // cmd 0:  d ... => ...
    (0 $($ps:tt)* ; $_d:tt $($ds:tt)*)
        => (bct!($($ps)* 0 ; $($ds)*));

    // cmd 1p:  1 ... => 1 ... p
    (1 $p:tt $($ps:tt)* ; 1 $($ds:tt)*)
        => (bct!($($ps)* 1 $p ; 1 $($ds)* $p));

    // cmd 1p:  0 ... => 0 ...
    (1 $p:tt $($ps:tt)* ; $($ds:tt)*)
        => (bct!($($ps)* 1 $p ; $($ds)*));

but this runs into the macro future-proofing rules. If we're required to have commas, then it's at least nice to handle them uniformly, e.g.

    // cmd 0:  d ... => ...
    (0 $(, $ps:tt)* ; $_d:tt $(, $ds:tt)*)
        => (bct!($($ps),*, 0 ; $($ds),*));

    // cmd 1p:  1 ... => 1 ... p
    (1, $p:tt $(, $ps:tt)* ; $($ds:tt),*)
        => (bct!($($ps),*, 1, $p ; 1 $(, $ds)*, $p));

    // cmd 1p:  0 ... => 0 ...
    (1, $p:tt $(, $ps:tt)* ; $($ds:tt),*)
        => (bct!($($ps),*, 1, $p ; $($ds),*));

But this too is disallowed. An $x:tt variable cannot be followed by a repetition $(...)*, even though it's (I believe) harmless. There is an open RFC about this issue. For now I have to handle the "one" and "more than one" cases separately, which is annoying.

In general, I don't think macro_rules! is a good language for arbitrary computation. This experiment shows the hassle involved in implementing one of the simplest known "arbitrary computations". Rather, macro_rules! is good at expressing patterns of code reuse that don't require elaborate compile-time processing. It does so in a way that's declarative, hygienic, and high-level.

However, there is a big middle ground of non-elaborate, but non-trivial computations. macro_rules! is hardly ideal for that, but procedural macros have problems of their own. Indeed, the bct! macro is an extreme case of a pattern I've found useful in the real world. The idea is that every recursive invocation of a macro gives you another opportunity to pattern-match the arguments. Some of html5ever's macros do this, for example.

OPLSS 2015 Announcement

Planet Haskell - Fri, 02/27/2015 - 07:00
We are pleased to announce the preliminary program for the 14th Annual Oregon Programming Languages Summer School (OPLSS) to be held June 15th to 27th, 2015 at the University of Oregon in Eugene.

This year's program is titled Types, Logic, Semantics, and Verification and features the following speakers:

- Amal Ahmed, Northeastern University
- Nick Benton, Microsoft Cambridge Research

[…]

The New Haskell Homepage

Planet Haskell - Fri, 02/27/2015 - 07:00
Roughly 3/4 of a year after Chris Done first proposed his redesign, we finally went live with the new https://haskell.org homepage. Much of the intermediate time was spent on cleanup of the introductory text, as well as adding features and tweaking designs. There was also a very lengthy process of setting everything up to ensure that the "try it" feature could be deployed and well supported. Finally, we had to do all the work to ensure both that the wiki remained hosted well somewhere else with proper rewrite rules, and also that the non-wiki content hosted under haskell.org (things like /ghc, /platform, /hoogle, etc) continued to work. Some of the more publicly visible elements of this are the move of mailing list hosting to http://mail.haskell.org and of the wiki content to http://wiki.haskell.org.

When we did finally go live, we got great feedback on reddit and made #1 on Hacker News. There were also a lot of things we missed, which those threads pointed out -- in particular, we didn't pay enough attention to the content of various pages. The "documentation" and "community" sections needed lots of cleanup and expansion. Furthermore, the "downloads" page didn't point to the platform. Whoops! (It is to be noted that whether or not to recommend the platform is currently an issue of some discussion, for example on this active reddit thread.)

We think that we have fixed most of the issues raised. But there are still issues! The code sample for primes doesn't work in the repl. We have an active github ticket discussing the best way to fix this -- either change the sample, change the repl, or both. This is one of a number of issues under discussion on the github tracker. We still want feedback, we still want patches, and there's still plenty to be made more awesome. For example: should we have some sort of integrated search? How? Where? The github infrastructure makes taking patches and tracking these issues easy. The repo now has 27 contributors, and we look forward to more.

One balance that has been tricky to strike has been in making the site maximally useful for new users just looking to learn and explore Haskell, while also providing access to the wide range of resources and entry points that the old wiki did. One advantage that we have now is that the wiki is still around, and is still great. Freed from needing to also serve as a default haskell landing page, the wiki can hopefully grow to be an even better resource. It needs volunteers to chip in and curate the content. And ideally it needs a new front page that highlights the things that you _won't_ find on the main homepage, instead of the things you now can.

One place we could use more help is in wiki admins, if anyone wants to volunteer. We need people to curate the news and events sections, to fight spam, to manage plugins and investigate adding new features, to update and clean up the stylesheets, and to manage new user accounts. If you're interested, please get in touch at admin@h.o.

All the resources we have today are the result of our open-source culture, which believes in equal parts in education, innovation, and sharing. They stem from people who enjoy Haskell wanting to contribute back to others -- wanting to make more accessible the knowledge and tools they've struggled to master, and wanting to help us make even better and more powerful tools. There's always more infrastructure work to be done, and there are always more ways to get involved.
In a forthcoming blog post, I'll write further about some of the other pressing issues we hope to tackle, and where interested parties can chip in.

Ad-hoc polymorphism, type families, ambiguous types, and Javascript

Planet Haskell - Fri, 02/27/2015 - 07:00
Imagine a language where all we have are strings and numbers, and where + is a built-in function that can either add numbers or concatenate strings. Consider the following code:

    a + b

What are the types of a, b and +? Using a Haskell-like type signature we can say that + has either of […]

GHC Weekly News - 2015/02/17

Planet Haskell - Fri, 02/27/2015 - 07:00
Hi *,

It's time for the GHC weekly news. It's been particularly quiet this past week, and the ghc-7.10 branch has been quiet too, so the notes are relatively short this week.

This week, GHC HQ met up to discuss some new stuff: most of the discussion this week was about particular bugs for GHC 7.10, including getting some tickets fixed like #10058, #8276, and #9968. Since the 7.10 release is getting close, we'll be starting up a new status page about GHC 7.12 (and probably get started writing things for the HCAR report in May) and what our plans are soon. Watch this space!

As usual, we've had a healthy amount of random assorted chatter on the mailing lists:

- Simon Peyton Jones opened the polls for the GHC 7.10 Prelude changes this week, following the discussions and delay of the 7.10 release, as to what the new Prelude should look like. Simon's email has all the details - and voting ends next week! https://mail.haskell.org/pipermail/ghc-devs/2015-February/008290.html
- Hengchu Zhang popped up on the list as an excited new contributor, and wanted to know about the process and strategy for fixing a bug. Joachim was quick to respond with help - and welcome, Hengchu! https://mail.haskell.org/pipermail/ghc-devs/2015-February/008324.html
- Francesco Mazzoli had a question about Template Haskell, specifically the semantics of reification since 7.8. In short, the semantics of reify changed in 7.8, and Francesco was wondering if the old behavior should be supported. While it could be, the discussion seems to indicate that perhaps it shouldn't be. https://mail.haskell.org/pipermail/ghc-devs/2015-February/008327.html
- Darren Grant popped up on the list and asked: "I notice there are a series of related long-standing issues subject to particular cygwin64 quirks, and I'd like to offer time to help resolve these if possible". Darren wanted some pointers, and they were given! GHC on Windows crucially still needs dedicated developers; the email sparked up a bunch of chatter amongst Windows developers on the list as well, so hopefully life is coming back to it. https://mail.haskell.org/pipermail/ghc-devs/2015-February/008333.html
- Jan Stolarek hit a confusing error when trying to install vector with HEAD and asked for help. The quick reply: you need support for the new deepseq package, which hasn't been merged upstream yet. https://mail.haskell.org/pipermail/ghc-devs/2015-February/008349.html
- Francesco Mazzoli had a simple feature request: could we have anonymous FFI calls that don't require a name? https://mail.haskell.org/pipermail/ghc-devs/2015-February/008300.html

Some noteworthy commits that went into ghc.git in the past week include:

- Commit e22282e5d2a370395535df4051bdeb8213106d1c - GHC 7.12 will no longer ship with the Typeable.h header file.
- Commit 5d5abdca31cdb4db5303999778fa25c4a1371084 - The LLVM backend has been overhauled and updated to use LLVM 3.6 exclusively.

Closed tickets the past week include: #10047, #10082, #10019, #10007, #9930, #10085, #10080, #9266, #10095, and #3649.

2015-02-18T04:06:29Z thoughtpolice

nub considered harmful

Planet Haskell - Fri, 02/27/2015 - 07:00
Summary: Don't use nub. A much faster alternative is nubOrd from the extra package.

The Haskell Data.List module contains the function nub, which removes duplicate elements. As an example:

    nub [1,2,1,3] == [1,2,3]

The function nub has the type Eq a => [a] -> [a]. The complexity of take i $ nub xs is O(length xs * i). Assuming all elements are distinct and you want them all, that is O(length xs ^ 2). If we only have an Eq instance, that's the best complexity we can achieve. The reason is that given a list as ++ [b], to check if b should be in the output requires checking b for equality against nub as, which requires a linear scan. Since checking each element requires a linear scan, we end up with a quadratic complexity.

However, if we have an Ord instance (as we usually do), we have a complexity of O(length xs * log i) - a function that grows significantly slower. The reason is that we can build a balanced binary tree of the previous elements, and check each new element in log time (the idea is sketched after this post). Does that make a difference in practice? Yes. As the graph below shows, by the time we get to 10,000 elements, nub is 70 times slower. Even at 1,000 elements nub is 8 times slower.

The fact nub is dangerous isn't new information, and I even suggested changing the base library in 2007. Currently there seems to be a nub hit squad, including Niklas Hambüchen, who go around raising tickets against various projects suggesting they avoid nub. To make that easier, I've added nubOrd to my extra package, in the Data.List.Extra module. The function nubOrd has exactly the same semantics as nub (both strictness properties and ordering), but is asymptotically faster, so is almost a drop-in replacement (just the additional Ord context).

For the curious, the above graph was generated in Excel, with the code below. I expect the spikes in nub correspond to garbage collection, or just random machine fluctuations.

    import Control.Exception
    import Data.List.Extra
    import Control.Monad
    import System.Time.Extra

    benchmark xs = do
        n <- evaluate $ length xs
        (t1,_) <- duration $ evaluate $ length $ nub xs
        (t2,_) <- duration $ evaluate $ length $ nubOrd xs
        putStrLn $ show n ++ "," ++ show t1 ++ "," ++ show t2

    main = do
        forM_ [0,100..10000] $ \i -> benchmark $ replicate i 1
        forM_ [0,100..10000] $ \i -> benchmark [1..i]

2015-02-17T21:00:00Z Neil Mitchell noreply@blogger.com
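To make the balanced-tree idea above concrete, here is a minimal sketch of an Ord-based nub (my simplification; the real nubOrd in extra is more heavily optimized): a Set of already-seen elements makes each membership test logarithmic, while laziness and output order are preserved.

    import qualified Data.Set as Set

    nubOrdSketch :: Ord a => [a] -> [a]
    nubOrdSketch = go Set.empty
      where
        go _    []     = []
        go seen (x:xs)
            | x `Set.member` seen = go seen xs
            | otherwise           = x : go (Set.insert x seen) xs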

Primitive Haskell

Planet Haskell - Fri, 02/27/2015 - 07:00
I originally wrote this content as a chapter of Mezzo Haskell. I'm going to be starting up a similar effort to Mezzo Haskell in the next few days, and I wanted to get a little more attention on this content to get feedback on style and teaching approach. I'll be discussing that new initiative on the Commercial Haskell mailing list.

The point of this chapter is to help you peel back some of the layers of abstraction in Haskell coding, with the goal of understanding things like primitive operations, evaluation order, and mutation. Some concepts covered here are generally "common knowledge" in the community, while others are less well understood. The goal is to cover the entire topic in a cohesive manner. If a specific section seems like it's not covering anything you don't already know, skim through it and move on to the next one.

While this chapter is called "Primitive Haskell," the topics are very much GHC-specific. I avoided calling it "Primitive GHC" for fear of people assuming it was about the internals of GHC itself. To be clear: these topics apply to anyone compiling their Haskell code using the GHC compiler.

Note that we will not be fully covering all topics here. There is a "further reading" section at the end of this chapter with links for more details.

Let's do addition

Let's start with a really simple question: tell me how GHC deals with the expression 1 + 2. What actually happens inside GHC? Well, that's a bit of a trick question, since the expression is polymorphic. Let's instead use the more concrete expression 1 + 2 :: Int.

The + operator is actually a method of the Num type class, so we need to look at the Num Int instance:

    instance Num Int where
        I# x + I# y = I# (x +# y)

Huh... well that looks somewhat magical. Now we need to understand both the I# constructor and the +# operator (and what's with the hashes all of a sudden?). If we do a Hoogle search, we can easily find the relevant docs, which leads us to the following definition:

    data Int = I# Int#

So our first lesson: the Int data type you've been using since you first started with Haskell isn't magical at all, it's defined just like any other algebraic data type... except for those hashes. We can also search for +#, and end up at some documentation giving the type:

    +# :: Int# -> Int# -> Int#

Now that we know all the types involved, go back and look at the Num instance I quoted above, and make sure you feel comfortable that all the types add up (no pun intended). Hurrah, we now understand exactly how addition of Ints works. Right?

Well, not so fast. The Haddocks for +# have a very convenient source link... which (apparently due to a Haddock bug) doesn't actually work. However, it's easy enough to find the correct hyperlinked source. And now we see the implementation of +#, which is:

    infixl 6 +#
    (+#) :: Int# -> Int# -> Int#
    (+#) = let x = x in x

That doesn't look like addition, does it? In fact, let x = x in x is another way of saying bottom, or undefined, or infinite loop. We have now officially entered the world of primops.

primops

primops, short for primary operations, are core pieces of functionality provided by GHC itself. They are the magical boundary between "things we do in Haskell itself" and "things which our implementation provides." This division is actually quite elegant; as we already explored, the standard + operator and Int data type you're used to are actually themselves defined in normal Haskell code, which provides many benefits: you get standard type class support, laziness, etc.
We'll explore some of that in more detail later.

Look at the implementation of other functions in GHC.Prim; they're all defined as let x = x in x. When GHC reaches a call to one of these primops, it automatically replaces it with the real implementation for you, which will be some assembly code, LLVM code, or something similar.

Why do all of these functions end in a #? That's called the magic hash (enabled by the MagicHash language extension), and it is a convention to distinguish boxed and unboxed types and operations. Which, of course, brings us to our next topic.

Unboxed types

The I# constructor is actually just a normal data constructor in Haskell, which happens to end with a magic hash. However, Int# is not a normal Haskell data type. In GHC.Prim, we can see that its implementation is:

    data Int#

Which, like everything else in GHC.Prim, is really a lie. In fact, it's provided by the implementation, and is a normal long int from C (32-bit or 64-bit, depending on architecture). We can see something even funnier about it in GHCi:

    > :k Int
    Int :: *
    > :k Int#
    Int# :: #

That's right, Int# has a different kind than normal Haskell datatypes: #. To quote the GHC docs:

    Most types in GHC are boxed, which means that values of that type are represented by a pointer to a heap object. The representation of a Haskell Int, for example, is a two-word heap object. An unboxed type, however, is represented by the value itself, no pointers or heap allocation are involved.

See those docs for more information on the distinctions between boxed and unboxed types. It is vital to understand those differences when working with unboxed values. However, we're not going to go into those details now. Instead, let's sum up what we've learnt so far:

- Int addition is just normal Haskell code in a typeclass.
- Int itself is a normal Haskell datatype.
- GHC provides Int# and +# as an unboxed long int and addition on that type, respectively. This is exported by GHC.Prim, but the real implementation is "inside" GHC.
- An Int contains an Int#, which is an unboxed type.
- Addition of Ints takes advantage of the +# primop.

More addition

Alright, we understand basic addition! Let's make things a bit more complicated. Consider the program:

    main = do
        let x = 1 + 2
            y = 3 + 4
        print x
        print y

We know for certain that the program will first print 3, and then print 7. But let me ask you a different question. Which operation will GHC perform first: 1 + 2 or 3 + 4? If you guessed 1 + 2, you're probably right, but not necessarily! Thanks to referential transparency, GHC is fully within its rights to rearrange the evaluation of those expressions and add 3 + 4 before 1 + 2. Since neither expression depends on the result of the other, we know that it is irrelevant which evaluation occurs first.

Note: This is covered in much more detail on the GHC wiki's evaluation order and state tokens page.
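One way to watch this demand-driven evaluation for yourself is with trace from Debug.Trace, which prints its message (to stderr) at the moment its thunk is forced. This little experiment is my own addition, not part of the chapter:

    import Debug.Trace (trace)

    main :: IO ()
    main = do
        let x = trace "forcing x" (1 + 2) :: Int
            y = trace "forcing y" (3 + 4) :: Int
        -- y is demanded first, so "forcing y" shows up before "forcing x".
        print y
        print x

Evaluation follows demand, not the textual order of the bindings; swap the two print calls and the trace messages swap as well.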
That raises the question: if GHC is free to rearrange evaluation like that, how could I say in the previous paragraph that the program will always print 3 before printing 7? After all, it doesn't appear that print y uses the result of print x at all, so why can't we rearrange the calls? To answer that, we again need to unwrap some layers of abstraction. First, let's evaluate and inline x and y and get rid of the do-notation sugar. We end up with the program:

    main = print 3 >> print 7

We know that print 3 and print 7 each have type IO (), so the >> operator being used comes from the Monad IO instance. Before we can understand that, though, we need to look at the definition of IO itself:

    newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

We have a few things to understand about this line. Firstly, State# and RealWorld. For now, just pretend they are a single type; we'll see when we get to ST why State# has a type parameter.

The other thing to understand is that (# ... #) syntax. That's an unboxed tuple, and it's a way of returning multiple values from a function. Unlike a normal, boxed tuple, unboxed tuples involve no extra allocation and create no thunks.

So IO takes a real world state, and gives you back a real world state and some value. And that right there is how we model side effects and mutation in a referentially transparent language. You may have heard the description of IO as "taking the world and giving you a new one back." What we're doing here is threading a specific state token through a series of function calls. By creating a dependency on the result of a previous function, we are able to ensure evaluation order, yet still remain purely functional.

Let's see this in action, by coming back to our example from above. We're now ready to look at the Monad IO instance:

    instance Monad IO where
        (>>) = thenIO

    thenIO :: IO a -> IO b -> IO b
    thenIO (IO m) k = IO $ \ s ->
        case m s of
            (# new_s, _ #) -> unIO k new_s

    unIO :: IO a -> (State# RealWorld -> (# State# RealWorld, a #))
    unIO (IO a) = a

(Yes, I changed things a bit to make them easier to understand. As an exercise, check that this version is in fact equivalent to what is actually defined in GHC.Base.)

Let's inline these definitions into print 3 >> print 7:

    main = IO $ \s0 ->
        case unIO (print 3) s0 of
            (# s1, res1 #) -> unIO (print 7) s1

Notice how, even though we ignore the result of print 3 (the res1 value), we still depend on the new state token s1 when we evaluate print 7, which forces the order of evaluation: first evaluate print 3, then evaluate print 7.

If you look through GHC.Prim, you'll see that a number of primitive operations are defined in terms of State# RealWorld or State# s, which allows us to force evaluation order.

Exercise: implement a function getMaskingState :: IO Int using the getMaskingState# primop and the IO data constructor.

The ST monad

Let's compare the definitions of the IO and ST types:

    newtype IO a   = IO (State# RealWorld -> (# State# RealWorld, a #))
    newtype ST s a = ST (State# s -> (# State# s, a #))

Well, that looks oddly similar. Said more precisely, IO is isomorphic to ST RealWorld. ST works under the exact same principles as IO for threading state through, which is why we're able to have things like mutable references in the ST monad.

By using an uninstantiated s value, we can ensure that we aren't "cheating" and running arbitrary IO actions inside an ST action. Instead, we just have "local state" modifications, for some definition of local state. The details of using ST correctly and the Rank2Types approach to runST are interesting, but beyond the scope of this chapter, so we'll stop discussing them here.

Since ST RealWorld is isomorphic to IO, we should be able to convert between the two of them. base does in fact provide the stToIO function.

Exercise: write a pair of functions to convert between IO a and ST RealWorld a.
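Here are hedged sketches of possible answers to the two exercises above - my own code, so treat it as a spoiler and an approximation rather than the canonical solution:

    {-# LANGUAGE MagicHash     #-}
    {-# LANGUAGE UnboxedTuples #-}

    import GHC.Base (IO (..))
    import GHC.Exts (Int (I#))
    import GHC.Prim (RealWorld, getMaskingState#)
    import GHC.ST   (ST (..))

    -- Run the getMaskingState# primop on the incoming state token and box
    -- the resulting Int#: the same unwrap/rewrap dance as before.
    getMaskingState :: IO Int
    getMaskingState = IO $ \s0 ->
        case getMaskingState# s0 of
            (# s1, i #) -> (# s1, I# i #)

    -- IO and ST RealWorld wrap functions of exactly the same type, so
    -- converting between them is just repackaging the same function.
    ioToST :: IO a -> ST RealWorld a
    ioToST (IO f) = ST f

    stRealWorldToIO :: ST RealWorld a -> IO a
    stRealWorldToIO (ST f) = IO f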
Exercise: GHC.Prim has a section on mutable variables, which forms the basis of IORef and STRef. Provide a new implementation of STRef, including newSTRef, readSTRef, and writeSTRef.

PrimMonad

It's a bit unfortunate that we have to have two completely separate sets of APIs: one for IO and another for ST. One common example of this is IORef and STRef, but - as we'll see at the end of this section - there are plenty of operations that we'd like to be able to generalize.

This is where PrimMonad, from the primitive package, comes into play. Let's look at its definition:

    -- | Class of primitive state-transformer monads
    class Monad m => PrimMonad m where
        -- | State token type
        type PrimState m

        -- | Execute a primitive operation
        primitive :: (State# (PrimState m) -> (# State# (PrimState m), a #))
                  -> m a

Note: I have not included the internal method, since it will likely be removed. In fact, at the time you're reading this, it may already be gone!

PrimState is an associated type giving the type of the state token. For IO, that's RealWorld, and for ST s, it's s. primitive gives a way to lift the internal implementation of both IO and ST to the monad in question.

Exercise: Write implementations of the PrimMonad IO and PrimMonad (ST s) instances, and compare them against the real ones.

The primitive package provides a number of wrappers around types and functions from GHC.Prim and generalizes them to both IO and ST via the PrimMonad type class.

Exercise: Extend your previous STRef implementation to work in any PrimMonad. After you're done, you may want to have a look at Data.Primitive.MutVar (there is a short sketch of it just below).

The vector package builds on top of the primitive package to provide mutable vectors that can be used from both IO and ST. This chapter is not a tutorial on the vector package, so we won't go into any more details now. However, if you're curious, please look through the Data.Vector.Generic.Mutable docs.
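To give a taste of that generalization, here is a hedged sketch - my own code, not from the chapter - of a counter built on Data.Primitive.MutVar that runs in any PrimMonad, and therefore in both IO and ST:

    import Control.Monad.Primitive (PrimMonad, PrimState)
    import Control.Monad.ST (runST)
    import Data.Primitive.MutVar (MutVar, newMutVar, readMutVar, writeMutVar)

    -- Increment a mutable counter and return the new value. The PrimState
    -- associated type ties the MutVar's state token to whichever monad we
    -- happen to run in.
    incr :: PrimMonad m => MutVar (PrimState m) Int -> m Int
    incr var = do
        n <- readMutVar var
        writeMutVar var (n + 1)
        return (n + 1)

    main :: IO ()
    main = do
        -- Used directly in IO (PrimState IO ~ RealWorld)...
        ref <- newMutVar (0 :: Int)
        _   <- incr ref
        print =<< incr ref                             -- prints 2

        -- ...and the very same incr also works in ST (PrimState (ST s) ~ s).
        print (runST (newMutVar (0 :: Int) >>= incr))  -- prints 1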
ReaderIO monad

To tie this off, we're going to implement a ReaderIO type. This will flatten together the implementations of ReaderT and IO. Generally speaking, there's no advantage to doing this: GHC should always be smart enough to generate the same code for this and for ReaderT r IO (and in my benchmarks, they perform identically). But it's a good way to test that you understand the details here.

You may want to try implementing this yourself before looking at the implementation below.

    {-# LANGUAGE FlexibleInstances     #-}
    {-# LANGUAGE MagicHash             #-}
    {-# LANGUAGE MultiParamTypeClasses #-}
    {-# LANGUAGE TypeFamilies          #-}
    {-# LANGUAGE UnboxedTuples         #-}

    import Control.Applicative        (Applicative (..))
    import Control.Monad              (ap, liftM)
    import Control.Monad.IO.Class     (MonadIO (..))
    import Control.Monad.Primitive    (PrimMonad (..))
    import Control.Monad.Reader.Class (MonadReader (..))
    import GHC.Base                   (IO (..))
    import GHC.Prim                   (RealWorld, State#)

    -- | Behaves like a @ReaderT r IO a@.
    newtype ReaderIO r a = ReaderIO
        (r -> State# RealWorld -> (# State# RealWorld, a #))

    -- standard implementations...
    instance Functor (ReaderIO r) where
        fmap = liftM

    instance Applicative (ReaderIO r) where
        pure = return
        (<*>) = ap

    instance Monad (ReaderIO r) where
        return x = ReaderIO $ \_ s -> (# s, x #)
        ReaderIO f >>= g = ReaderIO $ \r s0 ->
            case f r s0 of
                (# s1, x #) ->
                    let ReaderIO g' = g x
                     in g' r s1

    instance MonadReader r (ReaderIO r) where
        ask = ReaderIO $ \r s -> (# s, r #)
        local f (ReaderIO m) = ReaderIO $ \r s -> m (f r) s

    instance MonadIO (ReaderIO r) where
        liftIO (IO f) = ReaderIO $ \_ s -> f s

    instance PrimMonad (ReaderIO r) where
        type PrimState (ReaderIO r) = RealWorld

        primitive f = ReaderIO $ \_ s -> f s

        -- Cannot properly define internal, since there's no way to express a
        -- computation that requires an @r@ input value as one that doesn't.
        -- This limitation of @PrimMonad@ is being addressed:
        --
        -- https://github.com/haskell/primitive/pull/19
        internal (ReaderIO f) =
            f (error "PrimMonad.internal: environment evaluated")

Exercise: Modify the ReaderIO monad to instead be a ReaderST monad, taking an s parameter for the specific state token.

Further reading

- GHC docs on primitives
- GHC Wiki on PrimOps
- Evaluation order and state tokens
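As a quick sanity check that the pieces above fit together, here is a tiny usage sketch. This is my own addition; runReaderIO is a hypothetical helper that is not part of the chapter's code, and the snippet assumes the module above is in scope:

    -- Assumes the ReaderIO definitions above are in scope. runReaderIO is
    -- a hypothetical helper: unwrap the reader function, apply the
    -- environment, and rewrap the state-token function as IO.
    runReaderIO :: ReaderIO r a -> r -> IO a
    runReaderIO (ReaderIO f) r = IO (f r)

    main :: IO ()
    main = do
        answer <- runReaderIO (do e <- ask
                                  liftIO (putStrLn ("environment: " ++ show e))
                                  return (e * 2))
                              (21 :: Int)
        print answer  -- prints "environment: 21", then 42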

Thoughts on phylogenetic trees

Planet Haskell - Fri, 02/27/2015 - 07:00
An important aspect of studying evolution is the construction of phylogenetic trees, graphically representing the relationship between current and historic species. These trees are usually calculated based on similarities and differences between the genetic material of current species, and one particular challenge is that the topology of the resulting trees depends on the selection of genes used to construct them. Quite often, the species tree based on one set of genes differs substantially from the tree based on another set of genes.

The phylogenetic tree is usually presented as a simple tree of species. The end points of branches at the bottom of the tree (leaves) represent current species, and branching points higher up (internal nodes) represent the most recent common ancestor, or MRCA, for the species below it. A very simple example could look something like this:

[Figure: Evolution as a tree of species]

Here you have two current species, and you can trace back their lineage to an MRCA, and further back to some ancient origin. Varying colors indicate that gradual change along the branches has introduced differences, and that the current species have now diverged from each other, and from their origin.

This representation has the advantage of being nice and simple, and the disadvantage of being misleading. For instance, one might get the impression that a species is a well-defined concept, and ask questions like: when the first member of a species diverged from its ancestor species, how did it find a mate?

Species are populations

But we are talking about species here - that is, not individuals but populations of individuals. So a more accurate representation might look like this:

[Figure: Evolution as a tree of populations]

Circles now represent individuals, and it should perhaps be clearer that there is no such thing as the "first" of anything. At the separation point, there is no difference between the two populations, and it is only after a long period of separation that differences can arise. (Of course, if there is selective pressure favoring specific properties - perhaps redness is very disadvantageous for species B, for instance - this change will be much quicker. Look at how quickly we have gotten very different breeds of dogs by keeping populations artificially separate and selecting for specific properties.)

The so-called "speciation" is nothing more than a population being split into two separate parts. Typically, the split is geographic - a few animals being carried to Madagascar on pieces of driftwood, say - but anything that prevents members of one branch from mating with members of the other will suffice. At the outset, the two branches are just indistinguishable subpopulations of the same species, but if the process goes on long enough, the differences between the two populations can become large enough that they can no longer interbreed, and we can consider them different species.

In practice, such a separation is often not complete, and some individuals can stray between the groups. In that case, speciation is less likely to happen, since the property of being unable to breed with the other group represents a reproductive disadvantage, and it would therefore be selected against. In other words, if your neighbor is able to have children with more members of the population than you, his chances of having children are better than yours. Your genes get the short end of the stick. Kind of obvious, no?
Populations consist of genes

But we can also view evolution as working on populations, not of individuals, but of individual genes. This is illustrated in the next picture:

[Figure: Evolution as a tree of gene populations]

The colored circles now represent genes, and an individual is here just a sample from the population of genes - illustrated by circling three gene circles. (Note that by "gene", we here mean an abstract unit of inheritance. In other fields of biology, the word might be synonymous with a genetic region that codes for a protein, or that is transcribed to (possibly non-coding) RNA.)

Here, we see that although the genes themselves do not change (in reality they are subject to mutations), the availability of the different genes varies over time, and some might disappear from one of the branches entirely - like the red genes from species B here. This kind of genetic drift can still cause distinct changes in individuals.

Ancestry of individual genes

Each individual typically gets half its genes from each parent, one fourth from each grandparent, and so on, so after a few generations, all genes come from essentially different ancestors. This means you can calculate the MRCA for each gene individually, and this is exactly what has been done to estimate the ages of our "mitochondrial Eve" and "Y-chromosomal Adam". Here is the example lineage for the green gene:

[Figure: Lineage and MRCA for the green gene]

We see that the green-gene MRCA is much older than the speciation event. In addition, each gene has its unique history. This means that when we try to compute the MRCA, different genes will give different answers, and it can be difficult to construct a sensible consensus. For example, bits of our genome appear to come from the Neanderthal, and those bits will have an MRCA that predates the point at which the Neanderthal lineage branched from H. sapiens (possibly 1-2 million years ago).

(Interestingly, "Adam" and "Eve" are both estimated to have lived in the neighborhood of 200,000 years ago. This means that although 20% of the Neanderthal genome is claimed to have survived, all Neanderthal mitochondria and Y-chromosomes have been eradicated.)

2015-02-15T20:00:00Z