Functional Programming

A taste of Rust (yum) for C/C++ programmers

Planet Haskell - Thu, 10/30/2014 - 06:00
If, like me, you've been frustrated with the status quo in systems languages, this article will give you a taste of why Rust is so exciting. In a tiny amount of code, it shows a lot of ways that Rust really kicks ass compared to C and C++. It's not just safe and fast, it's a lot more convenient.

Web browsers do string interning to condense the strings that make up the Web, such as tag and attribute names, into small values that can be compared quickly. I recently added event logging support to Servo's string interner. This will allow us to record traces from real websites, which we can use to guide further optimizations. Here are the events we can log:

```rust
#[deriving(Show)]
pub enum Event {
    Intern(u64),
    Insert(u64, String),
    Remove(u64),
}
```

Interned strings have a 64-bit ID, which is recorded in every event. The String we store for "insert" events is like C++'s std::string; it points to a buffer in the heap, and it owns that buffer.

This enum is a bit fancier than a C enum, but its representation in memory is no more complex than a C struct. There's a tag for the three alternatives, a 64-bit ID, and a few fields that make up the String. When we pass or return an Event by value, it's at worst a memcpy of a few dozen bytes. There's no implicit heap allocation, garbage collection, or anything like that. We didn't define a way to copy an event; this means the String buffer always has a unique owner who is responsible for freeing it.

The deriving(Show) attribute tells the compiler to auto-generate a text representation, so we can print an Event just as easily as a built-in type.

Next we declare a global vector of events, protected by a mutex:

```rust
lazy_static! {
    pub static ref LOG: Mutex<Vec<Event>> = Mutex::new(Vec::with_capacity(50_000));
}
```

lazy_static! will initialize both the Mutex and the Vec when LOG is first used. Like String, the Vec is a growable buffer. We won't turn on event logging in release builds, so it's fine to pre-allocate space for 50,000 events.
(You can put underscores anywhere in an integer literal to improve readability.)

lazy_static!, Mutex, and Vec are all implemented in Rust using gnarly low-level code. But the amazing thing is that all three expose a safe interface. It's simply not possible to use the variable before it's initialized, or to read the value the Mutex protects without locking it, or to modify the vector while iterating over it. The worst you can do is deadlock. And Rust considers even that pretty bad, which is why it discourages global state. But it's clearly what we need here.

Rust takes a pragmatic approach to safety. You can always write the unsafe keyword and then use the same pointer tricks you'd use in C. But you don't need to be quite so guarded when writing the other 95% of your code. I want a language that assumes I'm brilliant but distracted :)

Rust catches these mistakes at compile time, and produces the same code you'd see with equivalent constructs in C++. For a more in-depth comparison, see Ruud van Asseldonk's excellent series of articles about porting a spectral path tracer from C++ to Rust. The Rust code performs basically the same as Clang / GCC / MSVC on the same platform. Not surprising, because Rust uses LLVM and benefits from the same backend optimizations as Clang.

lazy_static! is not a built-in language feature; it's a macro provided by a third-party library. Since the library uses Cargo, I can include it in my project by adding

```toml
[dependencies.lazy_static]
git = "https://github.com/Kimundi/lazy-static.rs"
```

to Cargo.toml and then adding

```rust
#[phase(plugin)]
extern crate lazy_static;
```

to src/lib.rs. Cargo will automatically fetch and build all dependencies. Code reuse becomes no harder than in your favorite scripting language.

Finally, we define a function that pushes a new event onto the vector:

```rust
pub fn log(e: Event) {
    LOG.lock().push(e);
}
```

LOG.lock() produces an RAII handle that will automatically unlock the mutex when it falls out of scope.
In C++ I always hesitate to use temporaries like this, because if they're destroyed too soon, my program will segfault or worse. Rust has compile-time lifetime checking, so I can do things that would be reckless in C++.

If you scroll up you'll see a lot of prose and not a lot of code. That's because I got a huge amount of functionality for free. Here's the logging module again:

```rust
#[deriving(Show)]
pub enum Event {
    Intern(u64),
    Insert(u64, String),
    Remove(u64),
}

lazy_static! {
    pub static ref LOG: Mutex<Vec<Event>> = Mutex::new(Vec::with_capacity(50_000));
}

pub fn log(e: Event) {
    LOG.lock().push(e);
}
```

This goes in src/event.rs and we include it from src/lib.rs:

```rust
#[cfg(feature = "log-events")]
pub mod event;
```

The cfg attribute is how Rust does conditional compilation. Another project can specify

```toml
[dependencies.string_cache]
git = "https://github.com/servo/string-cache"
features = ["log-events"]
```

and add code to dump the log:

```rust
for e in string_cache::event::LOG.lock().iter() {
    println!("{}", e);
}
```

Any project which doesn't opt in to log-events will see zero impact from any of this.

If you'd like to learn Rust, the Guide is a good place to start. We're getting close to 1.0 and the important concepts have been stable for a while, but the details of syntax and libraries are still in flux. It's not too early to learn, but it might be too early to maintain a large library.

By the way, here are the events generated by interning the three strings foobarbaz, foo, and blockquote:

```
Insert(0x7f1daa023090, foobarbaz)
Intern(0x7f1daa023090)
Intern(0x6f6f6631)
Intern(0xb00000002)
```

There are three different kinds of IDs, indicated by the least significant bits. The first is a pointer into a standard interning table, which is protected by a mutex. The other two are created without synchronization, which improves parallelism between parser threads. In UTF-8, the string foo is smaller than a 64-bit pointer, so we store the characters directly. blockquote is too big for that, but it corresponds to a well-known HTML tag.
0xb is the index of blockquote in a static list of strings that are common on the Web. Static atoms can also be used in pattern matching, and LLVM's optimizations for C's switch statements will apply.
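To make the bit-tagging concrete, here's a small decoder sketched in Haskell (the language of the rest of this feed). The specific tag values (0 = heap pointer, 1 = inlined characters, 2 = static-table index) and the position of the static index are my inference from the four example IDs above, not Servo's documented encoding:

```haskell
import Data.Bits ((.&.), shiftR)
import Data.Word (Word64)

-- The three kinds of interned-string IDs described above.
data AtomKind = DynamicPtr | InlineChars | StaticIndex
  deriving (Show, Eq)

-- Classify an ID by its low bits. The tag assignments are guessed
-- from the example IDs (0x...90 -> pointer, 0x...31 -> inline,
-- 0x...02 -> static), not from Servo's source.
atomKind :: Word64 -> AtomKind
atomKind w = case w .&. 0x3 of
  1 -> InlineChars
  2 -> StaticIndex
  _ -> DynamicPtr

-- For a static atom like 0xb00000002, the upper 32 bits appear to
-- hold the index into the static table (0xb for blockquote).
staticIndex :: Word64 -> Word64
staticIndex w = w `shiftR` 32
```

Running atomKind over the four logged IDs reproduces the three kinds the article describes, and staticIndex 0xb00000002 yields 0xb, the position of blockquote in the static list.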

Announcing: yesod-gitrepo

Planet Haskell - Thu, 10/30/2014 - 06:00
I'm happy to announce the first release of a new package, yesod-gitrepo. This package encapsulates a pattern I've used a number of times, namely: loading and refreshing content from a Git repository. Below is the current contents of the README.md file.

This code is currently being used in production, and should be pretty stable. That said, it has not received a huge amount of battle testing yet. So please do test corner cases before shipping it in production for your site.

yesod-gitrepo provides a means of embedding content from a Git repository inside a Yesod application. The typical workflow is:

- Use gitRepo to specify a repository and branch you want to work with.
- Provide a function that will perform some processing on the cloned repository.
- Use grContent in your Handler functions to access this parsed data.
- Embed the GitRepo as a subsite that can be used to force a refresh of the data.
- Set up a commit handler that pings that URL. On GitHub, this would be a webhook.

This is likely easiest to understand with a concrete example, so let's go meta: here's an application that will serve this very README.md file. We'll start off with language extensions and imports:

```haskell
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes       #-}
{-# LANGUAGE TemplateHaskell   #-}
{-# LANGUAGE TypeFamilies      #-}
import ClassyPrelude.Yesod
import Text.Markdown
import Yesod.GitRepo
```

Now we're going to create our foundation datatype. We need to give it one field: the GitRepo value containing our parsed content. Our content will simply be the text inside README.md, wrapped up in a Markdown newtype for easy rendering. This gives us:

```haskell
data App = App
    { getGitRepo :: GitRepo Markdown
    }

instance Yesod App
```

And now let's set up our routes. We need just two: a homepage, and the subsite. Our subsite type is GitRepo Markdown (as given above).
We replace the space with a hyphen as an escaping mechanism inside Yesod's route syntax:

```haskell
mkYesod "App" [parseRoutes|
/ HomeR GET
/refresh RefreshR GitRepo-Markdown getGitRepo
|]
```

Next up is our home handler. We start off by getting the current content parsed from the repository:

```haskell
getHomeR :: Handler Html
getHomeR = do
    master <- getYesod
    content <- liftIO $ grContent $ getGitRepo master
```

Then it's just some normal Hamlet code:

```haskell
    defaultLayout $ do
        setTitle "yesod-gitrepo sample"
        [whamlet|
            Force a refresh at @{RefreshR GitRepoRoute}
            #{content}
        |]
```

And finally, our main function. We pass in the repo URL and branch name, plus a function that, given the filepath containing the cloned repo, processes it and generates a Markdown value. Finally, we provide the generated repo value to our App constructor and run it:

```haskell
main :: IO ()
main = do
    repo <- gitRepo
        "https://github.com/snoyberg/yesod-gitrepo"
        "master"
        $ \fp -> fmap Markdown $ readFile $ fp </> "README.md"
    warp 3000 $ App repo
```

Give it a shot. You should have a webapp hosting this very README file!

The Guts of a Spineless Machine

Planet Haskell - Thu, 10/30/2014 - 06:00
Posted on October 28, 2014 Tags: haskell, c

It's fairly well known that Haskell is a bit, um... different from how stock hardware sees the world. I'm not aware of too many processors that have decided that immutability and higher order functions are the right way to go. Compiling Haskell and its ilk, however, does have one interesting wrinkle on top of the normal problem: laziness. Laziness stands completely at odds with how most everything else works. Moreover, whether or not you think it's the right default, it's an interesting question how to efficiently compile some evaluation strategy other than call by value or call by name.

To this end, people have built a lot of abstract machines that lazy languages could target. The idea is that these machines can be mapped easily to what the hardware wants and, transitively, we can get our compiler. Most of these work by "graph reduction" (that's the G in STG) and the latest incarnation of these graph machines is the spineless tagless graph machine, which lies at the heart of GHC and a few other compilers. In this post, I'd like to go over how exactly the STG machine actually works. Turns out it's pretty cool!

Core Concepts

The basic idea behind a compiler intent on going the STG route is something like

1. .. front end stuff ..
2. Translate IL to STG language
3. Compile STG language to C/ASM/LLVM/Javascript

In GHC's case I understand the pipeline is something like

1. Parsing
2. Type checking
3. Desugaring + a few bobs and bits
4. Translation to core
5. Lion's share of optimization
6. Translation to STG language
7. STG language to C--
8. C-- to assembly or LLVM

We're really concerned with parts 6 and 7 here. First things first, let's lay out what's exactly in the STG language. It's a tiny functional language that looks a bit like Haskell or Core, with a few restrictions. A program is simply a series of bindings, much like Haskell. The top levels look something like

```
f = {x y z} flag {a b c} -> ...
```

You should read this for now as f = \a b c -> ....
The first set of variables and the flag correspond to some stuff we'll discuss later. Inside the ... we can write most of what you would expect from Haskell. We have let[rec] bindings, case expressions, application, constructors, literals, and primitives. There is a caveat, though: first of all, constructor applications must be fully saturated. This is similar to OCaml, where you can't just treat a constructor as a function with an arbitrary name; we would write \a -> Just a instead of just Just. Another bit of trickiness: our language has no lambdas! So we can't even write the above. Instead, if we had something like

```
map Just [1, 2, 3]
```

we'd have to write

```
let f = \a -> Just a
    l'' = 3 : nil
    l'  = 2 : l''
    l   = 1 : l'
in map f l
```

The reason for the awkward l'' series is that we're only allowed to apply constructors and functions to atoms (literals and variables).

One other noteworthy feature of STG is that we have primitive operations. They need to be fully saturated, just like constructors, but they work across unboxed things. For example there would probably be something like +#, which adds two unboxed integers. To work with these we also have unboxed literals: 1#, 2#, and so on.

Now despite all these limitations placed on STG, it's still a pretty stinking high level language. There's letrec, higher order functions, and a lot of the normal stuff we'd expect in a functional language. This means it's not actually too hard to compile something like Haskell or Core to STG (I didn't say "compile efficiently").

As an example, let's look at translating factorial into STG language. We start with

```haskell
f :: Int -> Int
f i = case i of
  0 -> 1
  i -> i * (f (i - 1))
```

Now the first step is we change the binding form

```
f = {} n {i} -> ...
```

The case expression's clause can remain the same; we're already casing on an atom

```
case i of
  (MkInt# i#) -> ...
```
Now comes the first big change: our boxed integers are going to get in the way here, so the case expression strips away the constructor, leaving us with an unboxed integer. We can similarly refactor the body to make evaluation order explicit

```
case i of
  MkInt i# ->
    case i# -# 1# of
      dec# ->
        let dec = \{dec#} u {} -> MkInt dec#
        in case fact dec of
             MkInt rest# ->
               case i# *# rest# of
                 result# -> MkInt result#
```

Notice how the case expressions here are used to make the evaluation of various expressions explicit, and let was used to create a new thing to evaluate.

Now we can see what those extra {}'s were for. They notate the free variables for a thunk. Remember how we can have all sorts of closures and it can make for some really nice code? Well, the machine doesn't exactly support those naively. What we need to do is note the variables that we close over explicitly, and then generate code that will store these free variables with the value that closes over them. This pair is more or less what is called a "closure" for the rest of this post. dec, for example, has a free variable dec#, and it exists to box that result for the recursive call to factorial. We use case expressions to get evaluation. Most programs thus become chains of case's and let's, alternating between creating thunks and actually doing work.

That u in between the {}'s in dec was also important. It's the update flag. Remember how in Haskell we don't want to force the same thunk twice. If I say

```haskell
let i = 1 + 1
in i + i
```

we should only evaluate 1 + 1 once. That means that the thunk i will have to be mutated so as not to evaluate 1 + 1 twice. The update flag signifies the difference between thunks that we want to update and thunks that we don't. For example, if we replaced the thunk for + itself with the first result it returned, we'd be mighty surprised: suddenly 1 + 1 + 1 is just 2! The u flag says "yes, I'm just a normal expression that should be updated" and the n flag says the opposite.
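Updateable thunks can be modeled directly in Haskell itself. This is my own sketch, not from the post: a thunk is an IORef holding either a delayed computation or its memoized result, and the first force overwrites the delayed state, just like an STG update.

```haskell
import Data.IORef

-- A thunk is either an unevaluated computation or a cached value.
data ThunkState a = Delayed (IO a) | Forced a

newtype Thunk a = Thunk (IORef (ThunkState a))

delay :: IO a -> IO (Thunk a)
delay act = Thunk <$> newIORef (Delayed act)

-- Forcing evaluates at most once: the first force replaces the
-- Delayed state with Forced, so later forces reuse the value.
force :: Thunk a -> IO a
force (Thunk ref) = do
  st <- readIORef ref
  case st of
    Forced v    -> pure v
    Delayed act -> do
      v <- act
      writeIORef ref (Forced v)
      pure v
```

Putting a counter inside the delayed action lets you check that 1 + 1 runs only once even when the thunk is forced twice, which is exactly the behavior the u flag requests.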
That about wraps up our discussion of the STG language; let's talk about how to implement it now.

Semantics

This language wouldn't be much good if it didn't lend itself to an easy implementation. Indeed, we find that the restrictions we placed upon the language prove to be invaluable for its compilation (almost like they were designed that way!). In order to decide how best to implement it, we first look at the formal semantics for our language. We give these semantics as a tuple of 6 things:

1. The code - the instruction we're currently executing
2. The argument stack - a stack of integers or pointers to closures
3. The return stack - a stack of continuations
4. The update stack - a stack of update frames and sadness
5. The heap - a map from addresses to closures
6. The environment - a map from names to addresses of toplevel closures

A code is more or less the current thing we're attempting to do. It's either

- Eval e p - evaluate an expression in an environment (p)
- Enter a - enter a closure
- ReturnCon c ws - return a constructor applied to some arguments
- ReturnInt - return an integer

Now the idea is we're going to "unroll" our computations into pushing things onto the continuation stack and entering closures. We start with the code Eval main {}. That is to say, we start by running main. Then if we're looking at a case we do something really clever

```
EVAL (case expr of {pat1 -> expr1; ...}, p) as rs us h o
```

becomes

```
EVAL (expr, p) as ({pat1 -> expr1; ...} : rs) us h o
```

That is to say, we just push the pattern matching onto the continuation stack and evaluate the expression. At some point we'll get to a "leaf" in our expression: a random literal (a number) or constructor. At this point we make use of our continuation stack

```
EVAL (C ws, p)           as ((...; c vs -> expr; ...) : rs) us h o
ReturnCon (C ws)         as ((...; c vs -> expr; ...) : rs) us h o
EVAL (expr, p[vs -> ws]) as rs                              us h o
```

So our pattern matching is rolled into ReturnCon.
ReturnCon will just look on top of the return stack for a continuation which wants its constructor, and evaluate its expression, mapping the constructor's arguments to the pattern's variables. The story is similar for literals

```
EVAL (Int i, p) as ((...; i -> expr; ...) : rs) us h o
ReturnInt i     as ((...; i -> expr; ...) : rs) us h o
EVAL (expr, p)  as rs                           us h o
```

Another phase is how we handle let's and letrec's. In this phase, instead of dealing with continuations, we allocate more thunks onto the heap.

```
EVAL ((let x = {fs} f {xs} -> e; ... in expr), p) as rs us h  o
EVAL (expr, p')                                   as rs us h' o
```

So as we'd expect, evaluating a let expression does indeed go and evaluate the body of the let expression, but it changes up the environment in which we evaluate it. We have

```
p' = p[x -> Addr a, ...]
h' = h[a -> ({fs} f {xs} -> e) p fs, ...]
```

In words: "the new environment contains a binding for x to some address a. The heap is extended at address a with a closure {fs} f {xs} -> ... whose free variables come from p". The definition for letrec is identical, except the free variables come from p', allowing for recursion.

So the STG machine allocates things with let, adds continuations with case, and jumps to a continuation on values. Now we also have to figure out applications.

```
EVAL (f xs, p) as                      rs us h o
ENTER a        (values of xs ++ as)    rs us h o
```

where the value of f is Addr a. So we push all the arguments (remember they're atoms and therefore trivial to evaluate) onto the argument stack and enter the closure of the function.

How do we actually enter a closure? Well, we know that our closures are of the form

```
({fs} f {vs} -> expr) frees
```

If we have enough arguments to run the closure (the argument stack is at least as long as vs), then we can just EVAL expr [vs -> take (length vs) as, fs -> frees]. This might not be the case in something like Haskell, though: we have partial application. So what do we do in this case?
What we want is to somehow get something that's our closure, but that also knows about however many arguments we actually supplied it. Something like

```
({fs ++ supplied} f {notSupplied} -> expr) frees ++ as
```

where supplied ++ notSupplied = vs. This updating of a closure is half of what our update stack us is for.

The other case is when we do actually enter the closure, but f = u, so we're going to want to update it. If this is the case, we add an update frame to the stack: (as, rs, a), where as is the argument stack, rs is the return stack, and a is the closure which should be updated. Once we've pushed this frame, we promptly empty the argument stack and return stack. We then add the following rule to the definition of ReturnCon

```
ReturnCon c ws {} {} ((as, rs, a) : us) h  o
ReturnCon c ws as rs us                 h' o
```

where h' is the new heap that's replaced our old closure at a with our new, spiffy, updated closure

```
h' = h[a -> ({vs} n {} -> c vs) ws]
```

So that's what happens when we go to update a closure. But what about partial application?

```
Enter a as {}        ((asU, rs, aU) : us) h  o
Enter a (as ++ asU)  rs us                h' o
```

where

```
h a = ({vs} n {xs} -> expr) frees
h'  = h[aU -> ({vs ++ bound} n {xs} -> expr) (frees ++ as)]
```

This is a simplified version of the rule that's actually used, but it gives some intuition for what's happening: we're minting a new closure in which we use the arguments we've just bound, and that's what the result of our update is.

Compiling This

Now that we have some idea of how this is going to work, what does this actually become on the machine? The original paper by SPJ suggests an "interpreter" approach to compilation. In other words, we almost directly map the semantics to C and call it compiled. There's a catch, though: we'd like to represent the bodies of closures as C functions, since they're, well... functions. However, since all we do is enter closures and jump around to things till the cows come home, it had damn well better be fast. C function calls aren't built to be that fast.
Instead, the paper advocates a tiny trampolining-esque approach. When something wants to enter a closure, it merely returns it, and our main loop becomes

```c
while (1) {
    cont = (*cont)();
}
```

which won't overflow the stack. In reality, more underhanded tricks are applied to make the performance suck less, but for now we'll ignore such things.

In our compiled result there will be 2 stacks, not the 3 found in our abstract machine. The first stack (the A-stack) holds pointers, and the B-stack holds non-pointers. These are monitored by two variables/registers, SpA and SpB, which keep track of the heights of the two stacks. Then compilation becomes reasonably straightforward.

An application pushes the arguments onto the appropriate stacks, adjusts Sp*, and enters the function. A let block allocates each of the bound variables, then its body. Entering a closure simply jumps to the closure's code pointer. This is actually quite nifty: all the work of figuring out exactly what Enter will do (updates, continuation jiggering) is delegated to the closure itself.

A case expression is a bit more complicated, since a continuation's representation involves boxing up the local environment for each branch. Once that's bundled away, we represent a continuation as a simple code pointer. It is in charge of scrutinizing the argument stack, selecting an alternative, and then running the appropriate code. This is a lot of work, and (unless I'm crazy) will need two types of bound variables for each branch (really just ptr/non-ptr). The selection of an alternative would be represented as a C switch, letting all sorts of trickery with jump tables be done by the C compiler.

In order to return a value, we do something clever. We take a constructor, point a global variable at its constructor closure (containing its values), and jump to the continuation. The continuation can then peek and poke at this global variable to bind things as needed for the alternatives.
There is potentially a massive speedup to be had by returning through registers, but this is dangerously close to work. From here, primitive operations can be compiled to statements/instructions in whatever environment we're targeting. In C, for example, we'd just use the normal + to add our unboxed integers.

The last beast to slay is updates. We represent update frames by pointers to the argument stacks and a pointer to a closure. That means that the act of updating is merely saving Sp* in an update frame, clobbering them, and then jumping into the appropriate closure. We push the update frame onto stack B and keep on going.

I realize that this is a glancing overview and I'm eliding a lot of the tricky details, but hopefully this is sufficient to understand a bit about what's going on at an intuitive level.

Wrap Up

So now that you've put in all the effort to get through this post, I get to tell you it's all lies! In reality, GHC has applied all manner of tricks and hacks to get fast performance out of an STG model. To be honest I'm not sure where I should point to that explains these tricks, because well... I have no idea what they are. I can point to

- SPJ's original paper
- The relevant GHC wiki page

If you have any suggestions for other links I'd love to add them!

Thanks to Chris Ganas for proofreading.

2014-10-28T00:00:00Z

Fast pagination on PostgreSQL

Planet Haskell - Thu, 10/30/2014 - 06:00
During the implementation of IRCBrowse I discovered that Postgres's built-in offset is not very fast. Here are the characteristics of my data:

```
ircbrowse=> \d event
            Table "public.event"
  Column   |           Type           |
-----------+--------------------------+
 timestamp | timestamp with time zone |
 type      | text                     |
 nick      | text                     |
 text      | text                     |
 network   | integer                  |
 channel   | integer                  |
 id        | bigint                   |
Indexes:
    "event_unique" UNIQUE CONSTRAINT, btree (network, channel, "timestamp", nick, type, text)
    "event_unique_id" UNIQUE CONSTRAINT, btree (id)
    "event_channel_idx" btree (channel)
    "event_nick_idx" btree (nick)
    "event_timestamp_idx" btree ("timestamp")
    "event_type_idx" btree (type)
```

And the size:

```
ircbrowse=> select count(*) from event;
  count
----------
 28673917
```

Channel 1 is the biggest:

```
ircbrowse=> select count(*) from event where channel = 1;
  count
----------
 19340467
```

When you're working with data on this scale (large, but not "big data"), PostgreSQL handles it beautifully. But the speed of OFFSET/LIMIT is not great:

```
ircbrowse=> explain analyze select * from event where channel = 1 order by id offset 500000 limit 30;
QUERY PLAN
Limit  (cost=5648.81..5818.28 rows=30 width=85) (actual time=0.301..0.309 rows=30 loops=1)
  ->  Index Scan using event_unique_id on event
      (cost=0.00..81914674.39 rows=14501220 width=85) (actual time=0.020..0.288 rows=1030 loops=1)
        Filter: (channel = 1)
```

I think that this index scan is simply too expensive. Notice that I'm ordering by id, which has a unique btree index on it. Check out the speed:

```
ircbrowse=> select * from event where channel = 1 order by id offset 1000 limit 30;
Time: 0.721 ms
ircbrowse=> select * from event where channel = 1 order by id offset 500000 limit 30;
Time: 191.926 ms
```

You might think less than a second to sift through 500,000 rows of a 28-million-row table is pretty good, but I think it sucks. It's also deceptive.
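The linear cost of OFFSET is easy to model in Haskell (my sketch, not from the post): if the index scan yields rows in order, OFFSET n LIMIT m behaves like take m . drop n, so the database still walks past every one of the n skipped rows before producing anything.

```haskell
-- Models OFFSET/LIMIT over rows produced in index order:
-- the skipped prefix is still traversed, so cost grows with the offset.
paginate :: Int -> Int -> [a] -> [a]
paginate offset limit = take limit . drop offset

-- Models the keyed query used later in the post
-- (WHERE idx.id > lo AND idx.id <= lo + n). In this list model it
-- still scans, but in Postgres a btree index serves this predicate
-- without walking the skipped prefix, which is why it's near-constant time.
pageByKey :: Int -> Int -> [(Int, a)] -> [a]
pageByKey lo n rows = [ v | (k, v) <- rows, k > lo, k <= lo + n ]
```

For example, paginate 3 2 [1..10] yields [4,5] only after dropping the first three elements one by one, which is exactly why the 1,000,000-row offset below takes seconds.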
Let's increase it to 1,000,000 rows (of 19,000,000):

```
ircbrowse=> select * from event where channel = 1 order by id offset 1000000 limit 30;
Time: 35022.464 ms
```

This is getting worse and worse! It's probably linear in its poor performance.

However, there's a solution: use an index table. A separate table which contains foreign keys pointing to this table:

```
ircbrowse=> \d event_order_index
Table "public.event_order_index"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 origin | integer | not null
 idx    | integer | not null
Indexes:
    "event_order_id_origin" UNIQUE CONSTRAINT, btree (id, origin)
    "event_order_idx" btree (id)
    "event_order_idx_idx" btree (idx)
    "event_order_origin_dx" btree (origin)
```

Now you can have a pagination index for channel 1:

```
ircbrowse=> select * from event_order_index where idx = 1000 limit 1;
 id | origin | idx
----+--------+------
  1 |      1 | 1000
```

(I used idx=1000 for channel 1, 2000 for channel 2, etc., so that I would have space for other numerical indexes for the same channel.)

Now you can make a very efficient query for the same data as above:

```
ircbrowse=> SELECT idx.id,e.timestamp,e.network,e.channel,
ircbrowse-> e.type,e.nick,e.text FROM event e,
ircbrowse-> event_order_index idx
ircbrowse-> WHERE e.id = idx.origin and idx.idx = 1000 and
ircbrowse-> idx.id > 1000000 and idx.id < 1000030
ircbrowse-> ORDER BY e.id asc
ircbrowse-> LIMIT 30;
Time: 1.001 ms
```

This is more or less constant time. And you can see this in action on the site. It takes about 30ms to load and render the page if I run this on the server:

```
$ time curl 'http://ircbrowse.net/browse/haskell?events_page=234'

real    0m0.031s
user    0m0.000s
sys     0m0.004s
```

Of course, sending a request in your browser will take longer due to the connection overhead and assets, but generally the goal was for it to be very snappy. The old ircbrowse.com (by another individual, who kindly let me have the name) was very slow indeed.
You'd see the page loading the data incrementally from the database. Anyhoo, thought that was a decent, practical PostgreSQL-specific optimization regarding pagination. Hope it was worth writing up.

2014-10-28T00:00:00Z

How to desugar Haskell code

Planet Haskell - Thu, 10/30/2014 - 06:00
Haskell's core language is very small, and most Haskell code desugars to either:lambdas / function application,algebraic data types / case expressions,recursive let bindings,type classes and specialization, or:Foreign function callsOnce you understand those concepts you have a foundation for understanding everything else within the language. As a result, the language feels very small and consistent.I'll illustrate how many higher-level features desugar to the same set of lower-level primitives.ifif is equivalent to a case statement:if b then e1 else e2-- ... is equivalent to:case b of True -> e1 False -> e2This works because Bools are defined within the language:data Bool = False | TrueMulti-argument lambdasLambdas of multiple arguments are equivalent to nested lambdas of single arguments:\x y z -> e-- ... is equivalent to:\x -> \y -> \z -> eFunctionsFunctions are equivalent to lambdas:f x y z = e-- ... is equivalent to:f = \x y z -> e-- ... which in turn desugars to:f = \x -> \y -> \z -> eAs a result, all functions of multiple arguments are really just nested functions of one argument each. This trick is known as "currying".Infix functionsYou can write functions of at least two arguments in infix form using backticks:x `f` y-- ... desugars to:f x yOperatorsOperators are just infix functions of two arguments that don't need backticks. You can write them in prefix form by surrounding them with parentheses:x + y-- ... desugars to:(+) x yThe compiler distinguishes operators from functions by reserving a special set of punctuation characters exclusively for operators.Operator parametersThe parentheses trick for operators works in other contexts, too. You can bind parameters using operator-like names if you surround them with parentheses:let f (%) x y = x % yin f (*) 1 2-- ... desugars to:(\(%) x y -> x % y) (*) 1 2-- ... 
reduces to:1 * 2Operator sectionsYou can partially apply operators to just one argument using a section:(1 +)-- desugars to:\x -> 1 + xThis works the other way, too:(+ 1)-- desugars to:\x -> x + 1This also works with infix functions surrounded by backticks:(`f` 1)-- desugars to:\x -> x `f` 1-- desugars to:\x -> f x 1Pattern matchingPattern matching on constructors desugars to case statements:f (Left l) = eLf (Right r) = eR-- ... desugars to:f x = case x of Left l -> eL Right r -> eRPattern matching on numeric or string literals desugars to equality tests:f 0 = e0f _ = e1-- ... desugars to:f x = if x == 0 then e0 else e1-- ... desugars to:f x = case x == 0 of True -> e0 False -> e1Non-recursive let / whereNon-recursive lets are equivalent to lambdas:let x = y in z-- ... is equivalent to:(\x -> z) ySame thing for where, which is identical in purpose to let:z where x = y-- ... is equivalent to:(\x -> z) yActually, that's not quite true, because of let generalization, but it's close to the truth. Recursive let / where cannot be desugared like this and should be treated as a primitive.Top-level functionsMultiple top-level functions can be thought of as one big recursive let binding:f0 x0 = e0f1 x1 = e1main = e2-- ... is equivalent to:main = let f0 x0 = e0 f1 x1 = e1 in e2In practice, Haskell does not desugar them like this, but it's a useful mental model.ImportsImporting modules just adds more top-level functions. Importing modules has no side effects (unlike some languages), unless you use Template Haskell.Type-classesType classes desugar to records of functions under the hood where the compiler implicitly threads the records throughout the code for you.class Monoid m where mappend :: m -> m -> m mempty :: minstance Monoid Int where mappend = (+) mempty = 0f :: Monoid m => m -> mf x = mappend x x-- ... 
desugars to:

    data Monoid m = Monoid
        { mappend :: m -> m -> m
        , mempty  :: m
        }

    intMonoid :: Monoid Int
    intMonoid = Monoid
        { mappend = (+)
        , mempty  = 0
        }

    f :: Monoid m -> m -> m
    f (Monoid p z) x = p x x

... and specializing a function to a particular type class just supplies the function with the appropriate record:

    g :: Int -> Int
    g = f

    -- ... desugars to:
    g = f intMonoid

Two-line do notation

A two-line do block desugars to the infix (>>=) operator:

    do x <- m
       e

    -- ... desugars to:
    m >>= (\x ->
    e )

One-line do notation

For a one-line do block, you can just remove the do:

    main = do putStrLn "Hello, world!"

    -- ... desugars to:
    main = putStrLn "Hello, world!"

Multi-line do notation

do notation of more than two lines is equivalent to multiple nested dos:

    do x <- mx
       y <- my
       z

    -- ... is equivalent to:
    do x <- mx
       do y <- my
          z

    -- ... desugars to:
    mx >>= (\x ->
    my >>= (\y ->
    z ))

let in do notation

Non-recursive let in a do block desugars to a lambda:

    do let x = y
       z

    -- ... desugars to:
    (\x -> z) y

ghci

The ghci interactive REPL is analogous to one big do block (with lots and lots of caveats):

    $ ghci
    >>> str <- getLine
    >>> let str' = str ++ "!"
    >>> putStrLn str'

    -- ... is equivalent to the following Haskell program:
    main = do
        str <- getLine
        let str' = str ++ "!"
        putStrLn str'

    -- ... desugars to:
    main = do
        str <- getLine
        do let str' = str ++ "!"
           putStrLn str'

    -- ... desugars to:
    main =
        getLine >>= (\str ->
        do let str' = str ++ "!"
           putStrLn str' )

    -- ... desugars to:
    main =
        getLine >>= (\str ->
        (\str' -> putStrLn str') (str ++ "!") )

    -- ... reduces to:
    main =
        getLine >>= (\str ->
        putStrLn (str ++ "!") )

List comprehensions

List comprehensions are equivalent to do notation:

    [ (x, y) | x <- mx, y <- my ]

    -- ... is equivalent to:
    do x <- mx
       y <- my
       return (x, y)

    -- ... desugars to:
    mx >>= (\x -> my >>= \y -> return (x, y))

    -- ...
specialized to lists:

    concatMap (\x -> concatMap (\y -> [(x, y)]) my) mx

The real desugared code is actually more efficient, but still equivalent. The MonadComprehensions language extension generalizes list comprehension syntax to work with any Monad. For example, you can write an IO comprehension:

    >>> :set -XMonadComprehensions
    >>> [ (str1, str2) | str1 <- getLine, str2 <- getLine ]
    Line1
    Line2
    ("Line1", "Line2")

Numeric literals

Integer literals are polymorphic by default and desugar to a call to fromInteger on a concrete Integer:

    1 :: Num a => a

    -- desugars to:
    fromInteger (1 :: Integer)

Floating point literals behave the same way, except they desugar to fromRational:

    1.2 :: Fractional a => a

    -- desugars to:
    fromRational (1.2 :: Rational)

IO

You can think of IO and all foreign function calls as analogous to building up a syntax tree describing all planned side effects:

    main = do
        str <- getLine
        putStrLn str
        return 1

    -- ... is analogous to:
    data IO r
        = PutStrLn String (IO r)
        | GetLine (String -> IO r)
        | Return r

    instance Monad IO where
        (PutStrLn str io) >>= f = PutStrLn str (io >>= f)
        (GetLine k      ) >>= f = GetLine (\str -> k str >>= f)
        Return r          >>= f = f r

    main = do
        str <- getLine
        putStrLn str
        return 1

    -- ... desugars and reduces to:
    main =
        GetLine (\str ->
        PutStrLn str (
        Return 1 ))

This mental model is actually very different from how IO is implemented under the hood, but it works well for building an initial intuition for IO. For example, one intuition you can immediately draw from this mental model is that the order of evaluation in Haskell has no effect on the order of IO effects, since evaluating the syntax tree does not actually interpret or run the tree. The only way you can actually animate the syntax tree is to define it to be equal to main.

Conclusion

I haven't covered everything, but hopefully that gives some idea of how Haskell translates higher-level abstractions into lower-level abstractions.
By keeping the core language small, Haskell can ensure that language features play nicely with each other. Haskell also has a rich ecosystem of experimental extensions; many of these remain experimental precisely because each new extension must be vetted to understand how it interacts with existing language features.
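Several of these desugarings can be spot-checked at runtime. The following is my own sketch, not code from the post: each pair of definitions writes one construct in both its sugared and desugared forms, and main confirms they agree. The MonoidDict record is a hypothetical name, chosen to avoid clashing with the Prelude's real Monoid class.

```haskell
-- Currying: a multi-argument function vs. its nested-lambda desugaring.
addThree, addThree' :: Int -> Int -> Int -> Int
addThree x y z = x + y + z
addThree' = \x -> \y -> \z -> x + y + z

-- Operator sections vs. their lambda desugarings.
incr, incr' :: Int -> Int
incr  = (+ 1)
incr' = \x -> x + 1

-- The type class dictionary translation, written out by hand.
data MonoidDict m = MonoidDict
    { dictAppend :: m -> m -> m
    , dictEmpty  :: m
    }

intMonoid :: MonoidDict Int
intMonoid = MonoidDict { dictAppend = (+), dictEmpty = 0 }

-- f :: Monoid m => m -> m, with the dictionary passed explicitly:
double :: MonoidDict m -> m -> m
double (MonoidDict p _) x = p x x

-- A list comprehension vs. its concatMap expansion.
pairs, pairs' :: [(Int, Char)]
pairs  = [ (x, y) | x <- [1, 2], y <- "ab" ]
pairs' = concatMap (\x -> concatMap (\y -> [(x, y)]) "ab") [1, 2]

main :: IO ()
main = do
    print (addThree 1 2 3 == addThree' 1 2 3)  -- True
    print (incr 41 == incr' 41)                -- True
    print (double intMonoid 21)                -- 42
    print (pairs == pairs')                    -- True
```

Compiling and running this prints True, True, 42, True, confirming that each sugared form and its hand-desugared counterpart compute the same results.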

Notes on Focusing

Planet Haskell - Thu, 10/30/2014 - 06:00
Posted on October 27, 2014 Tags: types I’ve been spending a lot of time whacking my head on focusing literature. I’d like to jot down some intuition around what a focused system is and how it relates to the rest of the world. I’m going to steer clear of actually proving things but I will point out where a proof would be needed. What Is Focusing In a nutshell, focusing is a strategy to create proofs that minimizes the amount of choices available at each step. Focusing is thus amenable to mechanization since a computer is very good at applying a lot of deterministic procedures but incredibly bad at nondeterministic choice. Now when we set out to define a focused system we usually do something like

1. Formalize our logical framework with natural deduction
2. Translate our framework into a sequent calculus
3. Transform our sequent calculus into a focused variant

At each of these steps there’s a proof that says something like “System 2 is sound and complete with respect to System 1”. We can then chain these proofs together to get that we can transform any nonfocused proof into a focused one (focalization) and the reverse (de-focalization). In order to actually carry out these proofs there’s a fair amount of work and pain. Usually we’ll need something like cut elimination and/or identity expansion. Groundwork Now before we go on to define an example logic, let’s notice a few things. First off, in sequent calculus there are left and right rules. Left rules decompose known facts into other known facts while right rules transform our goal. There’s also an identity sequent which more or less just states

    A is an atom
    —————————————
    Γ, A → A

This is a bit boring though. Now certain rules are invertible: their conclusion implies their premise in addition to the reverse. For example if I said you must prove A ∧ B, clearly we’ll have to prove both A and B in order to prove A ∧ B; there’s no alternative set of rule applications that lets us circumvent proving A and B.
This means that if we were mechanically trying to prove something of the form A ∧ B we can immediately apply the right rule that decomposes ∧ into 2 goals. We call these sorts of rules invertible or asynchronous. Dually, there are rules that when applied transform our goal into something impossible to prove. Consider ⊥ ∨ ⊤: clearly applying the rule that transforms this into ⊥ would be a bad idea! Now if we begin classifying all the left and right rules we’ll notice that they tend to fall into 2 categories

1. Things with invertible left rules and noninvertible right rules
2. Things with noninvertible left rules and invertible right rules

We dub the first group “positive” things and the second “negative” things. This is called polarization and isn’t strictly necessary, but it greatly simplifies a lot of our system. Now there are a few things that could be considered both positive and negative. For example we can consider ∧ as positive with

    Γ → A⁺    Γ → B⁺
    ———————————————
    Γ → A⁺ ∧ B⁺

    Γ, A⁺, B⁺ → C
    —————————————————
    Γ, A⁺ ∧ B⁺ → C

In this case, the key determiner for the polarity of ∧ comes from its subcomponents. We can just treat ∧ as positive along with its subcomponents, and with an appropriate dual ∧⁻ our proof system will still be complete. As a quick example, implication ⊃ is negative. Its right rule is invertible:

    Γ, A → B
    ——————————
    Γ → A ⊃ B

While its left rule isn’t:

    Γ, A ⊃ B → A    Γ, B, A ⊃ B → C
    ——————————————————————————————
    Γ, A ⊃ B → C

Since we could easily have something like ⊥ ⊃ ⊤, but using this rule would entail (heh) proving ⊥! Urk. If our system applied this rule remorselessly, we’d quickly end up in a divergent proof search.
Instead of just an unordered multiset on the left, in order to properly do inversion we change this to Γ; Ω ⊢ A where Ω is an ordered list of propositions we intend to focus on. Furthermore, since we’re dealing with a polarized calculus, we occasionally want to view positive things as negative and vice versa. For this we have shifts, ↓ and ↑. When we’re focusing on some proposition and we reach a shift, we pop out of the focused portion of our judgment. Our system is broken up into 3 essentially separate judgments. In the first judgment we basically apply as many invertible rules in as many places as we can:

    Γ, A⁻; Q ⊢ U
    ——————————————
    Γ; ↓A⁻, Q ⊢ U

    Γ; A⁺, Ω ⊢ U    Γ; B⁺, Ω ⊢ U
    ———————————————————————————
    Γ; A⁺ ∨ B⁺, Ω ⊢ U

    Γ; A⁺, B⁺, Ω ⊢ U
    ————————————————————
    Γ; A⁺ ∧ B⁺, Ω ⊢ U

    ——————————————
    Γ; ⊥, Ω ⊢ U

We first look at how to break down Ω into simpler forms. The idea is that we’re going to keep going till there’s nothing left in Ω. Ω can only contain positive propositions, so eventually we’ll decompose everything to shifts (which we move into Γ), ⊤⁺ (which we just drop on the floor), or ⊥ (which means we’re done). These are all invertible rules, so we can safely apply them eagerly and we won’t change the provability of our goal. Once we’ve moved everything out of Ω we can make a choice. If U is “stable”, meaning that we can’t break it down further easily, we can pick something negative out of our context and focus on it:

    Γ; [A⁻] ⊢ U
    ————————————
    Γ, A⁻; • ⊢ U

This pops us into the next judgment in our calculus. However, if U is not stable, then we have to decompose it further as well:

    Γ; • ⊢ A⁺
    ——————————————
    Γ; • ⊢ ↑A⁺

    ———————————
    Γ; • ⊢ ⊤⁻

    Γ; A⁺ ⊢ B⁻
    —————————————
    Γ; • ⊢ A⁺ ⊃ B⁻

    Γ; • ⊢ A⁻    Γ; • ⊢ B⁻
    —————————————————————
    Γ; • ⊢ A⁻ ∧ B⁻

If we have a negative connective at the top level we can decompose that further, leaving us with a strictly smaller goal. Finally, we may reach a positive proposition with nothing in Ω. In this case we focus on the right.
    Γ ⊢ [A⁺]
    ———————————
    Γ; • ⊢ A⁺

Now we’re in a position to discuss these two focused judgments. If we focus on the right, we decompose positive connectives:

    ——————————
    Γ ⊢ [⊤⁺]

    Γ; • ⊢ A⁻
    —————————
    Γ ⊢ [↓A⁻]

    Γ ⊢ [A⁺]
    —————————————
    Γ ⊢ [A⁺ ∨ B⁺]

    Γ ⊢ [B⁺]
    —————————————
    Γ ⊢ [A⁺ ∨ B⁺]

    Γ ⊢ [A⁺]    Γ ⊢ [B⁺]
    ———————————————————
    Γ ⊢ [A⁺ ∧ B⁺]

These judgments follow the ones we’ve already seen. If we encounter a shift, we stop focusing. Otherwise we decompose the topmost positive connective. Now looking at these, you should see that sometimes these rules will lead us to a “mistake”. Imagine if we applied the 4th rule to ⊤ ∨ ⊥! This is why these rules are segregated into a separate judgment. In this judgment’s dual we essentially apply the exact same rules to the left of the turnstile and on negative connectives:

    Γ; A⁺ ⊢ U
    ————————————
    Γ; [↑A⁺] ⊢ U

    Γ ⊢ [A⁺]    Γ; [B⁻] ⊢ U
    ——————————————————————
    Γ; [A⁺ ⊃ B⁻] ⊢ U

    Γ; [A⁻] ⊢ U
    —————————————————
    Γ; [A⁻ ∧ B⁻] ⊢ U

    Γ; [B⁻] ⊢ U
    —————————————————
    Γ; [A⁻ ∧ B⁻] ⊢ U

That wraps up our focused system. The idea is that we now have this much more limited system which can express the same things our original, unfocused system could. A computer can easily be programmed to do a focused search, since there’s much less backtracking and fewer rules are applicable at each step. I think Pfenning has referred to this as removing most of the “don’t-care” nondeterminism from our rules. Wrap Up I’m going to wrap up the post here. Proving focalization or even something like cut elimination is quite fiddly and I have no desire at all to try to transcribe it (painfully) into markdown and get it wrong in the process. Instead, now that you have some idea of what focusing is about, go read Rob Simmons’ paper. It provides a clear account of proving everything necessary to show that a focused system is sound and complete with respect to its unfocused counterpart.
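To make the polarized syntax above concrete, here is a small Haskell sketch (mine, not from the post or from Simmons' paper) rendering positive and negative formulas as two separate data types, with the shifts ↓ and ↑ as the only constructors that cross between them, mirroring how the judgments pop in and out of focus:

```haskell
-- Positive formulas: invertible on the left, focused on the right.
data Pos
    = TopP          -- ⊤⁺
    | Bot           -- ⊥
    | Or   Pos Pos  -- A⁺ ∨ B⁺
    | AndP Pos Pos  -- A⁺ ∧ B⁺
    | Down Neg      -- ↓A⁻, the shift embedding a negative formula
    deriving (Eq, Show)

-- Negative formulas: invertible on the right, focused on the left.
data Neg
    = TopN          -- ⊤⁻
    | AndN Neg Neg  -- A⁻ ∧ B⁻
    | Imp  Pos Neg  -- A⁺ ⊃ B⁻
    | Up   Pos      -- ↑A⁺, the shift embedding a positive formula
    deriving (Eq, Show)

-- The implication discussed above, ⊥ ⊃ ⊤, polarized as a negative formula:
example :: Neg
example = Imp Bot TopN

main :: IO ()
main = print example  -- Imp Bot TopN
```

Because the two polarities are distinct types, an ill-polarized formula like ∨ applied to negative subformulas simply fails to type-check, which is exactly the discipline the focused judgments enforce.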
Cheers 2014-10-27T00:00:00Z

GHC Weekly News - 2014/10/24

Planet Haskell - Thu, 10/30/2014 - 06:00
Hi *, Welcome to the weekly GHC news. This one will be short, as the preceding one occurred only on Monday - but we'll be going with Fridays from now on, so next week we'll hopefully see a longer list. GHC 7.8.4 tickets have been waiting, and the RC will come soon after Austin finishes some final merges and tests on his branch. We have not committed to a time for the release after the RC, but we would like people to please seriously test it and immediately report any major showstoppers - or alert us of ones we missed. For the GHC 7.10 release, one of the major features we planned to try and merge was DWARF debugging information. This is actually a small component of larger ongoing work, including adding stack traces to Haskell executables. While, unfortunately, not all the work can be merged, we talked with Peter and made a plan: our hope is to get Phab:D169 merged, which lays all the groundwork, followed by DWARF debugging information in the code generators. This will allow tools like gdb or other extensible debuggers to analyze C-- IR accurately for compiled executables. Peter has written up a wiki page, available at SourceNotes, describing the design. We hope to land all the core infrastructure in Phab:D169 soon, followed by DWARF information for the Native Code Generator, all for 7.10.1. This past week, a discussion organically started on the #ghc IRC channel about the future of the LLVM backend. GHC's LLVM backend is buggy, has no control over LLVM versions, and breaks frequently with new versions. This all significantly impacts users, and relegates the backend to second-class status. After some discussion, Austin wrote up a proposal for an improved backend, and wrangled several other people to help. The current plan is to try and execute this by GHC 7.12, with the goal of making the LLVM backend Tier 1 for the major supported platforms.
You may notice https://ghc.haskell.org now responds slightly faster in some cases - we've activated a caching layer (CloudFlare) on the site, so hopefully things should be a little smoother. Closed tickets this week: #9684, #9692, #9038, #9679, #9537, #1473. 2014-10-24T23:43:39Z thoughtpolice

Update on Old Projects

Planet Haskell - Thu, 10/30/2014 - 06:00
Posted on October 24, 2014 Tags: haskell, personal Although most people I talk to know me for my blog, I do occasionally actually write software instead of just talking about it :) Sadly, as a mercurial user, most of my stuff has languished on Bitbucket. I've had a few people tell me that this is annoying for various reasons. Yesterday, I finally got around to fixing that! As of yesterday, all of my interesting projects are mirrored on GitHub. I'm still using mercurial, but thanks to the lovely git-hg tool this is not an issue. You can fork, pull-request, or generally peek and poke as you please. From my end all of these actions look like nice mercurial changesets, so I can continue to live under my rock where I don't need to understand Git. As a quick list of what Haskell code is up there now:

- c-dsl
- c_of_scheme
- ds-kanren
- generic-church
- hasquito
- hlf
- monad-gen
- reified-records
- this blog

Which I think includes every project I've blogged about here as well as a few others. Sorry it took so long! 2014-10-24T00:00:00Z

The New Haskell.org

Planet Haskell - Thu, 10/30/2014 - 06:00
Hello there! What you're reading is a blog post. Where is it from? It's from https://blog.haskell.org. What's it doing? It's cataloging the thoughts of the people who run Haskell.org. That's right. This is our new adventure in communicating with you. We wanted some place to put more long-form posts, invite guests, and generally keep people up to date about general improvements (and faults) to our infrastructure, now and in the future. Twitter and a short-form status site aren't so great, and a non-collective mind of people posting scattered things on various sites/lists isn't super helpful for cataloging things. So for an introduction post, we've really got a lot to talk about... A quick recap on recent events Haskell.org has had some rough times lately. About a month and a half ago, we had an extended outage, roughly around the weekend of ICFP 2014. This was due to a really large amount of I/O getting backed up on our host machine, rock. rock is a single-tenant, bare-metal machine from Hetzner that we used to host several VMs that comprise the old server set, including the main website, the GHC Trac and git repositories, and Hackage. We alleviated a lot of the load by turning off the hackage server, and migrating one of the VMs to a new hosting provider. Then, about a week and a half ago, we had another hackage outage that was a result of more meager concerns: disk space usage. Much to my chagrin, this was due in part to an absence of log rotation over the past year, which resulted in a hefty 15GB of text sitting around (in a single file, no less). Oops. This caused a small bump in the road, which was that the hackage server had a slight error while committing some transactions in the database when it ran out of disk space. We recovered from this (thanks to @duncan for the analysis), and restarted it. (We also had point-in-time backups, but in this case it was easier to fix than to roll back the whole database).
But we've had several other availability issues beforehand too, including faulty RAM and inconsistent performance. So we're setting out to fix it. And in the process we figured, hey, they'd probably like to hear us babble about a lot of other stuff, too, because why not? New things OK, so enough sad news about what happened. Now you're wondering what's going to happen. Most of these happening-things will be good, I hope. There are a bunch of new things we've done over the past year or so for Haskell.org, so it's best to summarize them a bit. These aren't in any particular order; most of the things written here are pretty new and some are a bit older since the servers have started churning a bit. But I imagine many things will be new to y'all. A new blog, right here. And it's a fancy one at that (powered by Phabricator). Like I said, we'll be posting news updates here that we think are applicable for the Haskell.org community at large - but most of the content will focus on the administrative side. A new hosting provider: Rackspace As I mentioned earlier this year pending the GHC 7.8 release, Rackspace has graciously donated resources towards Haskell.org for GHC, particularly for buildbots. We had at that time begun using Rackspace resources for hosting Haskell.org resources. Over the past year, we've done so more and more, to the point where we've decided to move all of Haskell.org. It became clear we could offer a lot higher reliability and greatly improved services for users, using these resources. Jesse Noller was my contact point at Rackspace, and has set up Haskell.org for its 2nd year running with free Rackspace-powered machines, storage, and services. That's right: free (to a point, the USD value of which I won't disclose here). With this, we can provide more redundant services both technically and geographically, we can offer better performance, better features and management, etc. And we have their awesome Fanatical Support. 
So far, things have been going pretty well. We've migrated several machines to Rackspace, including:

- https://ghc.haskell.org & https://git.haskell.org
- https://darcs.haskell.org
- http://deb.haskell.org
- https://planet.haskell.org
- https://phabricator.haskell.org, and our GHC build bot
- https://new-www.haskell.org (soon not to be so new-)
- https://hackage.haskell.org (and its build bot)

We're still moving more servers, including:

- https://www.haskell.org, to become https://wiki.haskell.org most likely
- http://code.haskell.org / http://community.haskell.org / http://projects.haskell.org

Many thanks to Rackspace. We owe them greatly. Technology upgrades, increased security, etc etc We've done several overhauls of the way Haskell.org is managed, including security, our underlying service organization, and more.

- A better webserver: All of our web instances are now served with nginx where we used Apache before. A large motivation for this was administrative headache, since most of us are much more familiar with nginx as opposed to our old Apache setup. On top of that we get increased speed and a more flexible configuration language (IMO). It also means we now have to run separate proxy servers for nginx, but systems like php-fpm or gunicorn tend to have much better performance and flexibility than things like mod_php anyway.
- Ubuntu LTS: Almost all of our new servers are running Ubuntu 14.04 LTS. Previously we were running Debian stable, and before Debian announced their LTS project for Squeeze, the biggest motivation was that Ubuntu LTSes typically had a much longer lifespan.
- IPv6 all the way: All of our new servers have IPv6, natively.
- HTTPS: We've rolled out HTTPS for the large majority of Haskell.org. Our servers sport TLS 1.2, ECDHE key exchange, and SPDY v3 with strong cipher suites. We've also enabled HSTS on several of our services (including Phabricator), and will continue enabling it for likely every site we have.
- Reorganized: We've done a massive reorganization of the server architecture, and we've generally split up services to be more modular, with servers separated in both geographic locations and responsibilities where possible.
- Consolidation: We've consolidated several of our services too. The biggest change is that we now have a single, consolidated MariaDB 10.0 server powering our database infrastructure. All communications to this server are encrypted with spiped for high security. Phabricator, the wiki, some other minor things (like a blog), and likely future applications will use it for storage where possible too.
- Improved hardware: Every server now has dedicated network, and servers that are linked together (like buildbots, or databases) are privately networked. All networking operations are secured with spiped where possible.

Interlude: A new Hackage server While we're on the subject, here's an example of what the new Hackage Server will be sporting:

Old server:
- 8GB RAM, probably 60%+ of all RAM taken by disk cache. Combined with the hackage-builder process.
- 1 core.
- Shared ethernet link amongst multiple VMs (no dedicated QOS per VM, AFAIK). No IPv6.
- 1x100GB virtual KVM block device backed by RAID1 2x2TB SATA setup on the host.

New server:
- 4x cores.
- 4GB RAM, as this should fit comfortably with nginx as a forward proxy. Hackage builder has its own server (removing much of the RAM needs).
- Dedicated 800Mb/s uplink, IPv6 enabled.
- Dedicated dual 500GB block devices (backed by dedicated RAID10 shared storage) in RAID1 configuration.

So, Hackage should hopefully be OK for a long time. And the doc builder is now working again, and should hopefully stay that way too. Automation: it's a thing Like many other sites, Haskell.org is big, complicated, intimidating, and there are occasionally points where you find a Grue, and it eats you mercilessly.
As a result, automation is an important part of our setup, since it means if one of us is hit by a bus, people can conceivably still understand, maintain and continue to improve Haskell.org in the future. We don't want knowledge of the servers locked up in anyone's head. In The Past, Long ago in a Galaxy unsurprisingly similar to this one at this very moment, Haskell.org did not really have any automation. At all, not even to create users. Some of Haskell.org still does not have automation. And even still, in fact, some parts of it are still a mystery to all, waiting to be discovered. That's obviously not a good thing. Today, Haskell.org has two projects dedicated to automation purposes. These are:

- Ansible, available in rA, which is a set of Ansible playbooks for automating various aspects of the existing servers.
- Auron, available in rAUR, a new, Next Gen™ automation framework, based on NixOS.

We eventually hope to phase out Ansible in favor of Auron. While Auron is still very preliminary, several services have been ported over, and the setup does work on existing providers. Auron is also much more philosophically aligned with our desires for automation, including reproducibility, binary determinism, security features, and more. More code in the open In our quest to automate our tiny part of the internet, we've naturally begun writing a bit of code. What's the best thing to do with code? Open source it! The new haskell-infra organization on GitHub hosts our code, including:

- Our copies of Phabricator which this server runs, and our extension libraries to it.
- The new Haskell.org homepage
- Our scripts for building GHC with Phabricator
- And maybe other fun stuff.

Most of our repositories are hosted on GitHub, and we use our Phabricator for code review and changes between ourselves. (We still accept GitHub pull requests though!) So it's pretty easy to contribute in whatever way you want.
Better DNS and content delivery: CloudFlare & Fastly We've very recently begun using CloudFlare for Haskell.org for DNS management, DDoS mitigation, and analytics. After a bit of deliberation, we decided that after moving off Hetzner we'd think about a 3rd party provider, as opposed to running our own servers. We chose CloudFlare mostly because aside from a nice DNS management interface and great features like AnyCast, we also get analytics and security features, including immediate SSL delivery. And, of course, we get a nice CDN on top for all HTTP content. The primary benefits from CloudFlare are the security and caching features (in that order, IMO). The DNS interface is still particularly useful however; the nameservers should be redundant, and CloudFlare acts more like a reverse proxy, as changes are quick and instant. But unfortunately, while CloudFlare is great, it's only a web content proxy. That means certain endpoints which need things like SSH access can not (yet) be reliably proxied, which is one of the major downfalls. As a result, not all of Haskell.org will be magically DDoS/spam resistant, but a much bigger amount of it will be. But the bigger problem is: we have a lot of non-web content! In particular, none of our Hackage server downloads, for example, can be proxied: Hackage, like most package repositories, merely uses HTTP as a transport layer for packages. In theory you could use a binary protocol, but HTTP has a number of advantages (like firewalls being nice to it). Using a service like CloudFlare for such content is - at the least - a complete violation of the spirit of their service, and just a step beyond that a total violation of their ToS (Section 10). But hackage pushes a few TB a month in traffic - so we have to pay line-rate for that, by the bits. And also, Hackage can't have data usefully mirrored to CDN edges - all traffic has to hop through to the Rackspace DCs, meaning users suffer at the hands of latency and slower downloads.
But that's where Fastly came to the rescue. Fastly also recently stepped up to provide Haskell.org with an Open Source Software discount - meaning we get their awesome CDN for free, for custom services! Hooray! Since Fastly is a dedicated CDN service, you can realistically proxy whatever you want with it, including our package downloads. With the help of a new friend of ours (@davean), we'll be moving Fastly in front of Hackage soon. Hopefully this just means your downloads and responsiveness will get faster, and we'll use less bandwidth. Everyone wins. Finally, we're rolling out CloudFlare gradually to new servers to test them and make sure they're ready. In particular, we hope to not disturb any automation as a result of the switch (particularly to new SSL certificates), and also, we want to make sure we don't unfairly impact other people, such as Tor users (Tor/CloudFlare have a contentious relationship - lots of nasty traffic comes from Tor endpoints, but so does a ton of legitimate traffic). Let us know if anything goes wrong. Better server monitoring: DataDog & Nagios Server monitoring is a crucial part of managing a set of servers, and Haskell.org unfortunately was quite bad at it before. But not anymore! We've done a lot to try and increase things. Before my time, as far as I know, we pretty much only had some lame mrtg graphs of server metrics. But we really needed something more than that, because it's impossible to have modern infrastructure on that alone. Enter DataDog. I played with their product last year, and I casually approached them and asked if they would provide an account for Haskell.org - and they did! DD provides real-time analytics for servers, while providing a lot of custom integrations with services like MySQL, nginx, etc. We can monitor load, networking, and correlate this with things like database or webserver connection count. 
On top of that, DD serves as a real-time dashboard for us to organize and comment on events as they happen, with events arriving from all over Haskell.org. But metrics aren't all we need. There are two real things we need: metrics (point-in-time data), and resource monitoring (logging, daemon watchdogs, resource checks, etc etc). This is where Nagios comes in - we have it running and monitoring all our servers for daemons, health checks, endpoint checks for connectivity, and more. Datadog helpfully plugs into Nagios, and reports events (including errors), as well as sending us weekly summaries of Nagios reports. This means we can use the Datadog dashboard as a consolidated piece of infrastructure for metrics and events. As a result: Haskell.org is being monitored much more closely from here on out, we hope. Better log analysis: ElasticSearch We've (very recently) also begun rolling out another part of the equation: log management. Log management is essential to tracking down big issues over time, and in the past several years, ElasticSearch has become incredibly popular. We have a new ElasticSearch instance, running along with Logstash, which several of our servers now report to (via the logstash-forwarder service, which is lightweight even on smaller servers). Kibana sits in front on a separate server for query management so we can watch the systems live. Furthermore, our ElasticSearch deployment is, like the rest of our infrastructure, 100% encrypted - Kibana proxies backend ElasticSearch queries through HTTPS and over spiped. Servers dump messages into LogStash over SSL. I would have liked to use spiped for the LogStash connection as well, but SSL is unfortunately mandatory at this time (perhaps for the best). We're slowly rolling out logstash-forwarder over our new machines, and tweaking our LogStash filters so they can get juicy information. Hopefully our log index will become a core tool in the future.
A new status site

As I'm sure some of you might be aware, we now have a fancy new site, https://status.haskell.org, that we'll be using to post updates about the infrastructure, maintenance windows, and expected (or unexpected!) downtimes. And again, someone came to help us - https://status.io gave us this for free!

Better server backups

Rackspace also fully supports their backup agents, which provide compressed, deduplicated backups for your servers. Our previous situation on Hetzner was a lot more limited in terms of storage and cost. Our backups are stored privately on Cloud Files - the same infrastructure that hosts our static content. Of course, backup on Rackspace is only one level of redundancy. That's why we're thinking about trying to roll out Tarsnap soon too. But either way, our setup is far more reliable and robust, and a lot of us are sleeping easier (our previous backups were space hungry and becoming difficult to maintain by hand).

GHC: Better build bots, better code review

GHC has for a long time had an open infrastructure request: the ability to build patches users submit, and even patches we write, in order to ensure they do not cause regressions. Developers don't necessarily have access to every platform (cross compilers, Windows, some obscurely old Linux machine), so having infrastructure here is crucial. We also needed more stringent code review. I (Austin) review most of the patches, but ideally we want more people reviewing lots of patches, submitting patches, and testing patches. And we really need ways to test all that - I can't be the bottleneck to test a foreign patch on every machine. At the same time, we've also had a nightly build infrastructure, but it has often hobbled along with custom code running it (bad for maintenance), and the bots are undirected and off to the side - so it's easy to miss build reports from them.
Enter Harbormaster, our Phabricator-powered buildbot for continuous integration and patch submissions! Harbormaster is a part of Phabricator, and it runs builds on all incoming patches and commits to GHC. How? First, when a patch or commit for GHC comes in, this triggers an event through a Herald rule. Herald is a Phab application to get notified or perform actions when events arrive. That rule then kicks off a build. Our Herald rule runs a build plan, which is a dependency-based sequence of actions to run. The first thing our plan does is allocate a machine resource, or a buildbot. It does this by taking a lease on the resource to acquire (non-exclusive) access to it, and it moves forward. Machine management is done by a separate application, Drydock. After leasing a machine, we SSH into it. We then run a build, using our phab-ghc-builder code. Harbormaster tracks all the stdout output and test results. It then reports back on the code review, or the commit in question, and emails the author. This has already led to a rather large change in development for most GHC developers, and Phabricator is building our patches regularly now - yes, even committers use it! Harbormaster will get more powerful in the future: our build plans will lease more resources, including Windows, Mac, and different varieties of Linux machines, and it will run more general build plans for cross compilers and other things. It's solved a real problem for us, and the latest infrastructure has been relatively reliable. In fact, I just get lazy and submit diffs to GHC without testing them - I let the machines do it. Viva la code review! (See the GHC wiki for more on our Phabricator process; there's a lot written there for GHC developers.)

Phabricator: Documentation, an official wiki, and a better issue tracker

That's right, there's now documentation about the Haskell.org infrastructure, hosted on our new official wiki.
And now you can report bugs to us through Maniphest. Both of these applications are powered by Phabricator, just like our blog. In a previous life, Haskell.org used Request Tracker (RT) to do support management. Our old RT instance is still running, but it's filled with garbage old tickets and some spam; it has its own dedicated PostgreSQL instance (quite wasteful); and it generally has not seen active use in years. We've decided to phase it out soon, and instead use our Phabricator instance to manage problems, tickets, and discussions. We've already started importing and rewriting new content into our wiki and modernizing things. Hopefully these docs will help keep people up to date about the happenings here. But also, our Phabricator installation has become an umbrella installation for several Haskell.org projects (even the Haskell.org Committee may try to use it for book-keeping). We've also been taking the time to extend and contribute to Phab where possible to improve the experience for users, and have authored several Phab extensions:

- libphutil-haskell in rPHUH, which extends Phabricator with custom support for GHC and other things.
- libphutil-rackspace in rPHUR, which extends Phabricator with support for Rackspace, including Cloud Files for storage needs, and build-machine allocation for Harbormaster.
- libphutil-scrypt (deprecated; soon to be upstream) in rPHUSC, which extends Phabricator with password hashing support for the scrypt algorithm.

Future work

Of course, we're not done. That would be silly. Maintaining Haskell.org and providing better services to the community is a real necessity for anything to work at all (and remember: computers are the worst). We've got a lot further to go. Some sneak peeks... We'll probably attempt to roll out HHVM for our Mediawiki instance to improve performance and reduce load.
We'll be creating more GHC buildbots, including a fabled Windows build bot, and on-demand servers for distributed build load. We'll be looking at ways of making it easier to donate to Haskell.org (on the homepage, with a nice embedded donation link). Moar security. I (Austin) in particular am looking into deploying a setup like grsecurity for new Haskell.org servers to harden them automatically. We'll roll out a new server, https://downloads.haskell.org, that will serve as a powerful, scalable file hosting solution for things like the Haskell Platform or GHC. This will hopefully alleviate administration overhead, reduce bandwidth, and make things quicker (thanks again, Fastly!) [REDACTED: TOP SECRET] And, of course, we'd appreciate all the help we can get! el fin This post was long. This is the ending. You probably won't read it. But we're done now! And I think that's all the time we have for today.

Postdoc Position in Functional and Constraint Programming

Planet Haskell - Thu, 10/30/2014 - 06:00
Postdoctoral position in Functional and Constraint Programming at KU Leuven. The Declarative Languages and Artificial Intelligence (DTAI) group of KU Leuven (Belgium) invites applicants for a postdoctoral position in the area of functional and constraint programming. The position revolves around domain-specific languages (DSLs) embedded in Haskell for constraint programming. It is part of the EU project GRACeFUL, whose overarching theme is tools for collective decision making. The KU Leuven part of the project is under the direction of prof. Tom Schrijvers. To apply you must hold a recent PhD (or be about to graduate) related to either functional or constraint programming. Experience in both areas is an advantage. You will work closely with prof. Schrijvers and his PhD students at KU Leuven, as well as with the GRACeFUL project partners across Europe. The position is for 3 years. The salary is competitive and the starting date negotiable (but no later than February 1). Moreover, KU Leuven's policy of equal opportunities and diversity applies to this position. Application procedure: http://people.cs.kuleuven.be/~tom.schrijvers/postdocposition2.html

Upcoming talk

Planet Haskell - Thu, 10/30/2014 - 06:00
For all you local folks, I'll be giving a talk about my dissertation on November 5th at 4:00–5:00 in Ballantine Hall 011. For those who've heard me give talks about it before, not much has changed since NLCS 2013. But the majority of current CL/NLP, PL, and logic folks haven't seen the talk, so do feel free to stop by. Abstract: Many natural languages allow scrambling of constituents, or so-called "free word order". However, most syntactic formalisms are designed for English first and foremost. They assume that word order is rigidly fixed, and consequently these formalisms cannot handle languages like Latin, German, Russian, or Japanese. In this talk I introduce a new calculus —the chiastic lambda-calculus— which allows us to capture both the freedoms and the restrictions of constituent scrambling in Japanese. In addition to capturing these syntactic facts about free word order, the chiastic lambda-calculus also captures semantic issues that arise in Japanese verbal morphology. Moreover, chiastic lambda-calculus can be used to capture numerous non-linguistic phenomena, such as: justifying notational shorthands in category theory, providing a strong type theory for programming languages with keyword-arguments, and exploring metatheoretical issues around the duality between procedures and values. 2014-10-21T02:39:32Z

GHC Weekly News - 2014/10/20

Planet Haskell - Thu, 10/30/2014 - 06:00
Hi *, It's been a few weeks since the last message - and I apologize! We actually are changing the posting time to be Friday now, so hopefully this situation will be corrected preeeeetty quickly from this point forward, and hopefully will give better context over the course of a weekly discussion. That said, let's begin! We've seen plenty of changes to GHC itself in the past few weeks. Some of the highlights include:

- Some changes to help make Prelude combinators fuse better. David Feuer has been leading a lot of this work, and it's been quite fruitful, with several new things now fusing (like takeWhile, scanl, scanr, and mapAccumL). Relatedly, Data.List.inits should be far faster thanks to David Feuer (ref: Phab:D329).
- The testsuite driver now has preliminary support for Python 3 - which should be useful for platforms out there that sport it, and ones that will use it as the default eventually (such as Fedora 22, possibly).
- Some of the initial work by Edward Yang to remove HEAP_ALLOCED from the GHC runtime system has landed. Briefly, HEAP_ALLOCED is a check the RTS uses to determine if some address is part of the dynamic heap - but the check is a bit costly. Edward's full patchset hopes to remove this with an 8% speedup or so on average.
- GHC now has a new macro, __GLASGOW_HASKELL_PATCHLEVEL__, which will allow you to determine the point-level release of the GHC you're using. This is a feature that had been requested in the past but that we were a little hesitant to add; Herbert went through and did it for us. (Ref: Phab:D66)
- Template Haskell now supports LINE pragmas, thanks to Eric Mertens (ref: Phab:D299).
- Sergei Trofimovich revived libbfd debugging support for the runtime system linker, which should be of use to several daring souls out there (ref: Phab:D193).
- Several patches from Gintautas Miliauskas have improved the usability of msys and the testsuite on Windows - and he's not done yet!
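The new version macro can be probed with CPP. Here's a hedged sketch (note: released GHCs spell the patch-level macros __GLASGOW_HASKELL_PATCHLEVEL1__ and __GLASGOW_HASKELL_PATCHLEVEL2__, so the code guards with #ifdef and falls back gracefully on compilers that lack them):

```haskell
{-# LANGUAGE CPP #-}
-- Sketch: branching on GHC's CPP version macros.
-- __GLASGOW_HASKELL__ encodes the major version (e.g. 708 for GHC 7.8.x).
module Main (main) where

ghcVersionString :: String
#ifdef __GLASGOW_HASKELL_PATCHLEVEL1__
ghcVersionString =
  show (__GLASGOW_HASKELL__ :: Int) ++ "." ++ show (__GLASGOW_HASKELL_PATCHLEVEL1__ :: Int)
#else
ghcVersionString = show (__GLASGOW_HASKELL__ :: Int)
#endif

main :: IO ()
main = putStrLn ("GHC version code: " ++ ghcVersionString)
```

This is the usual trick libraries use to conditionally compile against point releases.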
- A few improvements to the x86 code generator were committed by Reid Barton and Herbert Valerio Riedel, improving size/space for certain cases (ref: Phab:D320, Phab:D163).

and more besides that, including some linker improvements, and general cleanups as usual. The mailing list has been busy (as usual), with some discussions including:

- Austin posted some discussion about the tentative 7.10.1 plans - we're still hoping these are accurate, so take note! We hope to freeze mid-November, and release February 2015! [1]
- Austin also called for some feedback: GHC HQ has become convinced a 7.8.4 release is needed to fix some showstoppers - so please let him know soon if you're totally incapable of using 7.8 for something! [2]
- Alan Zimmerman has asked for some feedback on his proposed "AST Annotations", which will hopefully allow GHC API clients to add richer annotations to GHC's syntactic representations. The motivation is for refactoring tools like HaRe - and I'm sure all input would be appreciated. [3]
- Chris Done sparked off a discussion about making GHCi awesomer, and I'm sure everyone can appreciate that! In particular, Chris wanted to discuss programmatic means of controlling GHCi itself, and naturally we need to ask - is the current API not enough, and why? [4]
- Yuras Shumovich has implemented a proposal for allowing the Haskell FFI to support C structures natively as return values - this involves interfacing with C ABI rules to properly support structure layout. While Yuras has an initial implementation in Phab:D252, some questions about the feature - including its implementation complexity - remain before it gets merged. [5]
- Richard Eisenberg made a modest request: can Phabricator patches have a 'roadmap', so people can instruct reviewers how to read a diff? The answer: yes, and it should be relatively easy to implement, and Austin can do so Real Soon Now™.
[6]
- Ben Gamari started a big discussion about one-shot event semantics in the I/O manager, with a lot of replies concerning not only the bug, but machines to test the actual change on. With any luck, Ben's fix for the I/O manager and a test machine should come quickly enough. [7]
- Herbert Valerio Riedel opened an RFC: Should we look into using AsciiDoc for GHC? Historically, GHC's documentation has been written using DocBook, a verbose but very precise and unambiguous documentation format. However, AsciiDoc offers a much nicer markup language, while retaining DocBook support. In short, it looks like GHC may get a much cleaner user manual soon. [8]
- Yuras opened another discussion: Should we namespace proposals we create on our wiki? What seems uncontroversial can lead to surprising discussion, and the results were mixed this time, it seems. [9]
- Geoff Mainland stepped up and fixed Data Parallel Haskell to work with a new version of vector and GHC. Austin had disabled DPH a few weeks prior due to its difficulty to upgrade, and divergent source trees. With 7.10, GHC will hopefully ship a more modern vector and dph to boot.
- Austin asks: can we warn on tabs by default for GHC 7.10? It's an easy change and a minor one - but we should at least ask first. Vote now! [10]
- Philip Hölzenspies opens up a discussion about Uniques in GHC, and their impact on the compiler's current design. Philip has a hopeful design to redo Unique values in GHC, and a patch to support it: Phab:D323. [11]
- Richard Eisenberg asks: can we somehow integrate GitHub into our development process? While GitHub doesn't have as many nice tools as something like Phabricator, it has a very high inertia factor, and Richard is interested in making the 'first step' as easy as possible for newbies. Discussions about Phab<->GitHub integrations were afoot, as well as general discussion about contributor needs.
There were a lot of points brought up, but the conversation has slightly dried up now - but will surely be revived again. [12] And now, look at all these tickets we closed! Including: #9658, #9094, #9356, #9604, #9680, #9689, #9670, #9345, #9695, #9639, #9296, #9377, #9184, #9684. [1] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006518.html [2] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006713.html [3] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006482.html [4] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006771.html [5] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006616.html [6] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006719.html [7] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006682.html [8] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006599.html [9] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006730.html [10] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006769.html [11] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006546.html [12] ​http://www.haskell.org/pipermail/ghc-devs/2014-October/006523.html 2014-10-20T14:14:23Z thoughtpolice

Dijkstra quotes from EWD1284

Planet Haskell - Thu, 10/30/2014 - 06:00
I recently read through this long post entitled Object Oriented Programming is an expensive disaster which must end. I have to admit I largely agree with what he writes, but I don’t think I ever could have written such a well-researched article, and absolutely never one of equal length ;) It does include some nice quotes and references and so far I’ve only read one of the many that I bookmarked, Computing Science: Achievements and Challenges (EWD1284). It does include a few quips that, based on other articles I’ve read, seem fairly typical of Dijkstra. I simply love the way he expressed his opinions at times. This one really ought to have been in the lengthy post on the OOP disaster: After more than 45 years in the field, I am still convinced that in computing, elegance is not a dispensable luxury but a quality that decides between success and failure; in this connection I gratefully quote from The Concise Oxford Dictionary a definition of “elegant”, viz. “ingeniously simple and effective”. Amen. (For those who have wondered: I don’t think object-oriented programming is a structuring paradigm that meets my standards of elegance.) And his thoughts on formal methods are well-known of course, as are his thoughts on iterative design. However, I rather like how he expresses a certain level of disgust of the Anglo-Saxon world when writing about those topics: The idea of a formal design discipline is often rejected on account of vague cultural/philosophical condemnations such as “stifling creativity”; this is more pronounced in the Anglo-Saxon world where a romantic vision of “the humanities” in fact idealizes technical incompetence. Another aspect of that same trait is the cult of iterative design. It amazes me every time I read something by someone like Dijkstra, just how much things stay the same, even in a field like computing science, which is said to be moving so fast. 2014-10-20T00:00:00Z

New Stackage features

Planet Haskell - Thu, 10/30/2014 - 06:00
We have two new updates to Stackage: providing cabal.config files and including Haddock documentation.

Haddock documentation on snapshots

Now all new exclusive snapshots will have haddock links, which you can access via the following steps:

- Go to the stackage.org home page.
- Choose an exclusive snapshot, e.g. GHC 7.8, 2014-10-20.
- On the snapshot page will be a link in the menu entitled Haddocks.

That link will be to an index page like this from which you can view documentation of all packages included in the snapshot. This means you can generally view everything in one place, on one high availability service.

Using Stackage without changing your repo

The recommended way to use Stackage is to simply change your remote-repo field in your .cabal/config file and run cabal update. Henceforth your dependencies will be resolved from Stackage, which is backed by high availability Amazon S3 storage servers, and you will have successful build plans, compilation and passing tests. Hurrah! Try Haskell and the upcoming Haskell.org homepage were both developed with Stackage. This meant I could just specify the Stackage snapshot to use in the README and then the next time I want to upgrade I can just update the snapshot version to get a fresh working build plan. The issue some people are facing is that they cannot change this remote-repo field, either because they're using a cabal sandbox, which doesn't support this yet, or because they just don't want to. The solution to this, in my experience, has been for me to manually go and run cabal freeze in a project I've already built to get the cabal.config file and then give these people that file. Now, it's automated via a cabal.config link on snapshot pages. For new developers working on an application who want to use Stackage, they can do something like this:

$ cabal sandbox init
$ curl http://www.stackage.org//cabal.config > cabal.config
$ cabal install --only-dep

Which will install their dependencies from Hackage.
We can't guarantee this will always work -- as Stackage snapshots sometimes will have a manual patch in the package to make it properly build with other packages, but otherwise it's as good as using Stackage as a repo. A cabal freeze output in cabal.config will contain base and similar packages which are tied to the minor GHC version (e.g. GHC 7.8.2 vs GHC 7.8.3 have different base numbers), so if you get a cabal.config and you get a dependencies error about base, you probably need to open up the cabal.config and remove the line with the base constraint. Stackage snapshots used as repos do not have this constraint (they will use whichever base your GHC major version uses). Another difference is that cabal.config is more like an “inclusive” Stackage snapshot -- it includes packages not known to build together, unlike “exclusive” snapshots, which only contain packages known to build and pass tests together. Ideally every package you need to use (directly or indirectly) will come from an exclusive snapshot. If not, it's recommended that you submit the package name to Stackage, and otherwise inclusive snapshots or cabal.config are the fallbacks you have at your disposal.
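The base-constraint removal described above can be scripted. A hedged sketch (it assumes the common freeze-file layout with one constraint per line and base not being the first constraint on the "constraints:" line; the file contents below are made up for illustration):

```shell
# Fabricate a freeze-style cabal.config for illustration.
cat > cabal.config <<'EOF'
constraints: array ==0.5.0.0,
             base ==4.7.0.1,
             bytestring ==0.10.4.0
EOF

# Drop the line pinning base, keeping every other constraint.
sed -i.bak '/^[[:space:]]*base ==/d' cabal.config

cat cabal.config
```

After this, cabal will pick whichever base ships with your GHC while every other package stays pinned.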

HLint now spots bad unsafePerformIO

Planet Haskell - Thu, 10/30/2014 - 06:00
Summary: I've just released a new version of HLint that can spot an unsafePerformIO which should have NOINLINE but doesn't.

I've just released HLint v1.9.10, a tool to suggest improvements to Haskell code. I don't usually bother with release announcements of HLint, as each version just fixes a few things and adds a few hints, it's all in the changelog (plus there have now been 102 releases). The latest release attempts to make using unsafePerformIO a little bit safer. A common idiom for top-level mutable state in Haskell is:

myGlobalVar :: IORef Int
myGlobalVar = unsafePerformIO (newIORef 17)

That is, define a top-level CAF (function with no variables) and bind it to unsafePerformIO of some created mutable state. But the definition above is unsafe. GHC might decide myGlobalVar is cheap and inline it into several uses, duplicating the variable so that some functions update different versions. Running this code through the latest version of HLint we see:

Sample.hs:2:1: Error: Missing NOINLINE pragma
Found:
  myGlobalVar = unsafePerformIO (newIORef 17)
Why not:
  {-# NOINLINE myGlobalVar #-}
  myGlobalVar = unsafePerformIO (newIORef 17)

HLint has spotted the problem, and suggested the correct fix.

Trying it for real

Let's take the package slave-thread-0.1.0 and run HLint on it. Slave thread is a new package that helps you ensure you don't end up with ghost threads or exceptions being thrown but missed - a useful package.
Running HLint we see:

Sample.hs:19:1: Error: Missing NOINLINE pragma
Found:
  slaves = unsafePerformIO $ Multimap.newIO
Why not:
  {-# NOINLINE slaves #-}
  slaves = unsafePerformIO $ Multimap.newIO

Sample.hs:20:3: Warning: Redundant $
Found:
  unsafePerformIO $ Multimap.newIO
Why not:
  unsafePerformIO Multimap.newIO

Sample.hs:25:1: Error: Eta reduce
Found:
  fork main = forkFinally (return ()) main
Why not:
  fork = forkFinally (return ())

Sample.hs:55:28: Warning: Redundant $
Found:
  PartialHandler.totalizeRethrowingTo_ masterThread $ mempty
Why not:
  PartialHandler.totalizeRethrowingTo_ masterThread mempty

Sample.hs:72:5: Error: Use unless
Found:
  if null then return () else retry
Why not:
  Control.Monad.unless null retry

HLint has managed to spot the missing NOINLINE pragma, and also a handful of tiny cleanups that might make the code a little more readable. Fortunately (and independently of HLint), the NOINLINE pragma was added in slave-thread-0.1.1, so the library no longer suffers from that bug. 2014-10-19T21:23:00Z Neil Mitchell noreply@blogger.com
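For completeness, here is a minimal self-contained sketch of the safe idiom the post describes - a top-level IORef with the NOINLINE pragma in place (the names here are made up for illustration):

```haskell
module Main (main) where

import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Top-level mutable counter. NOINLINE stops GHC from inlining the CAF
-- and accidentally creating several independent copies of the IORef.
counter :: IORef Int
{-# NOINLINE counter #-}
counter = unsafePerformIO (newIORef 0)

main :: IO ()
main = do
  modifyIORef' counter (+ 1)
  modifyIORef' counter (+ 1)
  readIORef counter >>= print  -- prints 2: both updates hit the same IORef
```

Without the pragma, an inlined counter could be duplicated at each use site, and the two updates might land in different IORefs.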

Anonymity in public life

Planet Haskell - Thu, 10/30/2014 - 06:00
In an article published in the Guardian yesterday, author Kathleen Hale recounts how her first book got some negative reviews by reviewers on a book review website. One reviewer in particular upset her and Kathleen ends up figuring out the reviewer is using a false identity, finds out who the reviewer really is and confronts her. The piece doesn't read to me like some sort of valedictory "I outed a fraud" type piece (though there are some passages in there which are questionable in that direction) and equally there are several passages where Kathleen expresses deep embarrassment and regret for the course of action she took. This episode, and that article in particular has caused substantial reaction: currently 600 comments on the Guardian article plus several other blog posts. There's no shortage of opinion to be found on Twitter either, as you'd expect. The course of action that Kathleen took seems to be fairly undisputed as far as I can find. There is some dispute from some of the other blog posts as to exactly what was tweeted and said by whom, and there is dispute over Kathleen's claim that there are factual inaccuracies made in a review of her book. It is not disputed that the reviewer was using a false identity and that the reviewer had at least public Twitter, Facebook, and Instagram accounts under the false identity. The false identity was also a real name (Blythe Harris), by which I mean a name which if you introduced yourself by that name, no one would think you're using a false identity. This is distinct from claiming to be Peter Rabbit, or Buzz Lightyear. Many people have equated Kathleen's actions with stalking. My dictionary defines the verb to stalk as: 1. to follow or approach (game, prey, etc.) stealthily and quietly; 2. to pursue persistently and, sometimes, attack (a person with whom one is obsessed, often a celebrity); 3, 4: [not relevant]. The second item there certainly fits.
The British legal approach, whilst it gives no strict definition, gives examples and guidance: ....following a person, watching or spying on them or forcing contact with the victim through any means, including social media. The effect of such behaviour is to curtail a victim's freedom, leaving them feeling that they constantly have to be careful. In many cases, the conduct might appear innocent (if it were to be taken in isolation), but when carried out repeatedly so as to amount to a course of conduct, it may then cause significant alarm, harassment or distress to the victim. I'm glad it includes "social media" there. Some comments have suggested that stalking "in real life" is worse than online. This seems bizarre to me: as if through a computer you are not interacting with other human beings but merely with shiny pixels who have no emotional capacity. "In real life" is everything we know. Whilst we're alive we have no personal experience of anything other than "in real life". So I'm fairly sold on the whole argument that Kathleen's behaviour towards this reviewer can be considered stalking and as such is reprehensible. To me, the far more interesting issue is the use of anonymity, false identities and any realistic expectation we have of privacy on the internet. A number of people who claim to write book reviews on such sites have suggested that the behaviour of Kathleen is exactly why they write their reviews under false names. I think there's something of a contradiction going on here. But let's work backwards. Firstly, Kathleen, through some social engineering (she requested from the book review site the address of the reviewer so that she could post her a copy of the book) got the address of the book reviewer. She then used a telephone directory and census results to identify who really lived there (or likely owned the land). Now the use of the telephone directory seems a bit odd to me: telephone directories map names to numbers (and maybe addresses).
Yes, you could use it to map an address to a name but it's very inefficient: you're essentially searching through the whole directory looking for the address whilst the directory is sorted by name, not address. So unless it was a very small telephone directory, I don't really buy that. Using census results is far more credible: they're public documents and when they're online, they do allow you to search by address. In the UK you can only get access to the raw census details 100 years after the census has been published which, to a high probability, rules it out as a means to tie an address to a person who's still alive. You can get statistics and aggregates from more recent census results but you can't get the raw data. I'm assuming that in the US there's no such restriction on access to raw census data. If there is then I don't understand how Kathleen really managed to get a name for the owner of the property. Instead, in the UK, if you want to find out who owns some land, you can pay the land registry £3 and they'll tell you. Presumably there are means by which you can legally hide this; I'm sure the rich have figured this out - probably some method by which some fake company in a tax haven technically "owns" the land and as they're registered abroad, they don't have to divulge any further details about that company. So yes, you could argue the Land Registry is profiting from facilitating stalkers, but equally there are a bunch of legitimate reasons to need access to such data and I can't think of any sane way to ensure the use of such a service isn't abused. So from that I conclude that unless the owner is a millionaire, the owner of any land is public knowledge. The use of social engineering to get the address in the first place is more interesting but very obvious. This sort of thing happens a lot and sometimes with horrifying consequences (e.g.
the Australian DJs who phoned up a hospital, pretending to be the Queen and Prince of Wales, enquiring as to the health of the Duchess of Cambridge. The nurse fell for the hoax and put the call through. Three days later, the nurse committed suicide). As a species we are not good at taking the time to verify who we're talking to or why. Whilst (hopefully) most of us would hang up if our bank apparently rang us and then asked for our credit card details "for security" this is largely only because it's in the bank's interest (in terms of cost of insurance) to reduce fraud, so they've trained us as such. But in all sorts of other scenarios we implicitly trust people we've no real reason to. A simple example: ticket inspectors on public transport. They may be wearing the uniform, but it could be faked. With their travel-card readers they could be seeing who has the expensive yearly travel cards, scanning the unique numbers from them and then using them to program up fraudulent cards. The crypto on those things is notoriously weak. Has anyone ever requested some means to verify the identity of a ticket inspector? And even if you could, how do you know they're not crooked regardless? So phoning someone up, impersonating someone else, or pretending to have valid reasons to request the information you're requesting is always likely to work. It might be illegal in some cases, but it's certainly human nature to try to be helpful and if you're given a plausible justification, on what basis could you refuse the request unless it's contrary to some sort of company policy? In this case, if you're concerned about anonymity, wouldn't you be concerned about this possibility, and make use of an anonymous mail box? Article 8 of the European Convention on Human Rights guarantees an individual's right to respect for privacy and family life, including correspondence. Is privacy the same as anonymity? 
No, definitely not: In conflating anonymity and privacy, we have failed to see an important factual difference between them: under the condition of privacy, we have knowledge of a person’s identity, but not of an associated personal fact; whereas under the condition of anonymity, we have knowledge of a personal fact, but not of the associated person’s identity. The vast violations of our lives by state surveillance as revealed by Snowden over the last year demonstrate the wholesale collation of everything we do online and off by our governments. This is both being able to observe an action and identify the individual who caused it (thus we have no hope of performing any action anonymously), and being able to observe an individual and know the actions they take (thus no privacy). I can't work out whether the ECHR has anything to say on a right to anonymity; I get the sense that it doesn't try to protect that. So that's basically saying: "the state shouldn't record your every move (as that's an invasion of privacy), but moves that we're interested in, we can know who did them". Of course, we now know they're recording everything anyway. We also know that computer systems can always be hacked into - there is no real security anywhere. Given a skilled and sufficiently funded adversary, any computer system connected in any way to the internet can be hacked into. Why? Because humans wrote the software that runs on those computers and humans are incapable of writing bug-free software. Look at all the large scale data breaches in recent history. Nothing is secure. So we have laws that seem to try and protect privacy, but they're violated by our own governments, and in any case, we have countless examples of our inability to store any information securely. So is there really any hope to be able to exist with anonymity on the internet? As ever, it depends who your adversary is.
If your adversary is a government (either your own or some foreign government) then no, you have no hope. If it's a previous partner of yours who has no particular computer training, then yes, you're probably going to have a reasonable chance of being anonymous for a while. But you need to read up on this and think hard: it's not a trivial undertaking. There are some good guides as to how to do this, but: All writers - whether writing under their own names or not - should be aware of the risks they may incur by hitting 'publish'. What is the effect of hitting "publish"? It's to put more data points out there which may lead people to be able to identify you. The fewer data points out there, the better. So coming back to our book reviewer, if you want to review books anonymously, and if your justification for acting anonymously is to avoid being stalked by authors who don't like your reviews, then why put so many data points out there? Why have the Facebook page, the Instagram profile with the faked photos, the Twitter account? Why give your real postal address to the book review club knowing they're going to post books to it and might conceivably give your address out to other people? The social media accounts in particular I find most odd. If you want to review books then review books. Build your following, your reputation and cachet on the quality of your reviews. If I'm looking at a book review I really don't care where you went on holiday, what your tweets are, or how many pets you have. Putting that information out there undermines your entire justification for being anonymous: if you want to be anonymous (i.e. you don't want people to find out who you are) then why are you putting so much unnecessary information out there that may allow people to figure out who you are? Equally, use a name that clearly communicates to me you're trying to be anonymous: call yourself TheBookReviewer53, DostoyevskyLover or OrwellWasRight. 
Doing so doesn't lessen the validity of your opinions on your chosen subject and is more honest with people reading your reviews: it's overtly saying "I have reasons to want to exist anonymously on the internet". It reveals nothing more about your real identity either: regardless of the obvious fictitiousness of your online persona, if you can be found, you can be found. Researchers show that four data points about a person’s location can identify that person with 95% accuracy. FOUR. You think you can tweet anonymously from your phone? You think apps like Whisper allow you to act anonymously? As with pretty much everything related to the internet and computing, unless you've spent the last 20 years of your life working with computers, studying computers and thinking very hard about threat models and what data you're putting out there, and are utterly paranoid, you basically haven't got a chance. Do you turn off wifi on your phone when you leave the house? You should. You trust that USB pen drive you're transferring documents on? You shouldn't. Finally and most obviously, any attempt at anonymity clearly doesn't insulate you from the law. As members of various hacking groups such as LulzSec found out, you can always be found out by law enforcement agencies. Yes, you might be able to make it difficult for a poorly funded person to come after you for libel (which is really just an indictment of the disgusting relationship between justice and money) but it's quite a risk to take. If you wouldn't put it in print with your real name attached, you're placing an awful lot of trust in your ability to maintain your anonymity against an adversary you probably don't know as well as you need to. 2014-10-19T15:00:00Z

Quasi-quoting DSLs for free

Planet Haskell - Thu, 10/30/2014 - 06:00
Suppose you are writing a compiler for some programming language or DSL. If you are doing source to source transformations in your compiler, perhaps as part of an optimization pass, you will need to construct and deconstruct bits of abstract syntax. It would be very convenient if we could write that [...]

Haskell : A neat trick for GHCi

Planet Haskell - Thu, 10/30/2014 - 06:00
Just found a really nice little hack that makes working in the GHC interactive REPL a little easier and more convenient. First of all, I added the following line to my ~/.ghci file:

:set -DGHC_INTERACTIVE

All that line does is define a GHC_INTERACTIVE pre-processor symbol. Then in a file that I want to load into the REPL, I need to add this to the top of the file:

{-# LANGUAGE CPP #-}

and then in the file I can do things like:

#ifdef GHC_INTERACTIVE
import Data.Aeson.Encode.Pretty
prettyPrint :: Value -> IO ()
prettyPrint = LBS.putStrLn . encodePretty
#endif

In this particular case, I'm working with some relatively large chunks of JSON and it's useful to be able to pretty print them when I'm in the REPL, but I have no need for that function when I compile that module into my project.
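Assembled into one file, the trick might look like the following sketch. The module name is made up, and I've added the qualified ByteString import that the snippet above presumably abbreviates as LBS:

```haskell
-- Assumes ~/.ghci contains:  :set -DGHC_INTERACTIVE
{-# LANGUAGE CPP #-}
module MyProject.Types where

#ifdef GHC_INTERACTIVE
-- This section only exists inside GHCi sessions; a compiled build
-- of the project never sees it, so the aeson-pretty dependency
-- stays out of the production build.
import Data.Aeson (Value)
import Data.Aeson.Encode.Pretty (encodePretty)
import qualified Data.ByteString.Lazy.Char8 as LBS

prettyPrint :: Value -> IO ()
prettyPrint = LBS.putStrLn . encodePretty
#endif
```

Loading the module with `:l MyProject/Types.hs` in GHCi makes prettyPrint available at the prompt, while `ghc -c` on the same file skips the whole #ifdef block.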

Fixing Haddock docs on Hackage

Planet Haskell - Thu, 10/30/2014 - 06:00
Summary: A few weeks ago Hackage stopped generating docs. I have a script that generates the docs, and also fixes some Haddock bugs.

Update: The Haddock documentation generators are running once again, but may be somewhat behind for a little while.

A few weeks ago Hackage stopped generating documentation, so if you look at recently uploaded pages they tend to either lack docs, or have very alert maintainers who did a manual upload. I've packaged up my solution, which also fixes some pretty annoying Haddock bugs. Given that I can now get docs faster and better with my script, I'll probably keep using it even after Haddock on Hackage gets restarted.

How to use it

You are likely to get better results if you always install the packages you use with documentation.
Ensure you have tar, curl, cp and cabal on your $PATH.
cabal update && cabal install neil
Make a release, don't touch any other code, then make sure you are in the project directory.
neil docs --username=YourHackageUsername
Type in your Hackage password at the prompt.

And like that, your docs are online. To see an example of something that was generated with this process, look at Shake.

What I fixed

I automated the process using scripts originally taken from the lens library, supplemented with suggestions from Reddit. I then do a number of manual fixups.

Haddock now makes cross-module links where it doesn't know what the target is default to types. Concretely, if I write 'Development.Shake.need' in Haddock it generates a link to #t:need, which doesn't exist, when it should be #v:need - I fix that. Update: fixed in Haddock 1.14 or above, as per this ticket.

On Windows, if you use CPP and multiline bird-tick (>) Haddock blocks you get a blank line between each line. I fix that.

If you follow some of the simpler scripts, links outside your package won't work (or at least, didn't for me). I fix that.

The neil tool

The neil tool is my personal set of handy Haskell scripts. 
I make all my releases with it (neil sdist), and do lots of checks that my packages conform to my rules (neil check). I also use it for driving my Travis instances. It's in fairly regular flux. Until now, I've always kept it in Darcs/Git and never released it - it's personal stuff tuned to how I work. You might also notice that neil provides a library. Don't use that, I intend to delete it in a few weeks. Update: library removed. 2014-10-17T20:49:00Z Neil Mitchell noreply@blogger.com

Notes on Quotient Types

Planet Haskell - Thu, 10/30/2014 - 06:00
Posted on October 17, 2014 Tags: types

Lately I've been reading a lot of type theory literature. In an effort to help my future self, I'm going to jot down a few thoughts on quotient types, the subject of some recent google-fu.

But Why!

The problem quotient types are aimed at solving is actually a very common one. I'm sure at some point or another you've used a piece of data you've wanted to compare for equality. Additionally, that data needed some work to determine whether it was equal to another piece. A simple example would be representing rational numbers. A rational number is a fraction of two integers, so let's just say

type Rational = (Integer, Integer)

Now all is well; we can define a Num instance and what not. But what about equality? Clearly we want equivalent fractions to be equal. That should mean that (2, 4) = (1, 2) since they both represent the same number. Now our implementation has a sticky point: clearly this isn't the case on its own! What we really want to say is "(2, 4) = (1, 2) up to trivial rejiggering". Haskell's own Rational type solves this by not exposing a raw tuple. It still exists under the hood, but we only expose smart constructors that will reduce our fractions as far as possible. This is displeasing in a dependently typed setting, however: we want to be able to formally prove the equality of some things. This "equality modulo normalization" leaves us with a choice. Either we can provide a function which is essentially

foo : (a b : Rational) -> Either (reduce a = reduce b) (reduce a /= reduce b)

This doesn't really help us though; there's no way to express that a should be observationally equivalent to b. This is a problem seemingly as old as dependent types: how can we have a simple representation of equality that captures all the structure we want and none that we don't? Hiding away the representation of rationals certainly buys us something: we can use a smart constructor to ensure things are normalized. 
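The smart-constructor approach can be sketched as follows. This is an illustrative reconstruction, not the real Data.Ratio internals; the names Rational', mkRat, numerator' and denominator' are made up for this sketch:

```haskell
module Rat (Rational', mkRat, numerator', denominator') where

-- The raw pair is hidden: the Rational' constructor is not exported,
-- so all values are built via mkRat and are always in lowest terms.
newtype Rational' = Rational' (Integer, Integer)
  deriving (Eq, Show)

-- Smart constructor: divide out the gcd and push the sign onto the
-- numerator, so equivalent fractions get identical representations
-- and the derived Eq instance just works.
mkRat :: Integer -> Integer -> Rational'
mkRat _ 0 = error "mkRat: zero denominator"
mkRat a b = Rational' (signum b * (a `div` g), abs b `div` g)
  where g = gcd a b

numerator', denominator' :: Rational' -> Integer
numerator'   (Rational' (n, _)) = n
denominator' (Rational' (_, d)) = d
```

With this in place mkRat 2 4 == mkRat 1 2 holds: the quotienting relation (a, b) R (c, d) iff a * d = b * c has been collapsed into a canonical representative, at the cost of hiding the representation.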
From there we could potentially prove a (difficult) theorem which essentially states that

=-with-norm : (a b c d : Integer) -> a * d = b * c -> mkRat a b = mkRat c d

This still leaves us with some woes, however: now a lot of computations become difficult to talk about, since we've lost the helpful notion that denominator o mkRat a = id and similar. The lack of transparency shifts a lot of the burden of proof onto the code privy to the internal representation of the type, the only place where we know enough to prove such things. Really what we want to say is "Hey, just forget about a bit of the structure of this type and just consider things to be identical up to R", where R is some equivalence relation, eg

a R a
a R b implies b R a
a R b and b R c implies a R c

If you're a mathematician, this should sound similar. It's a lot like how we can take a set and partition it into equivalence classes. This operation is sometimes called "quotienting a set". For our example above, we really mean that our rational is a type quotiented by the relation (a, b) R (c, d) iff a * d = b * c. Some other things that could potentially use quotienting:

Sets
Maps
Integers
Lots of Abstract Types

Basically anything where we want to hide some of the implementation details that are irrelevant for their behavior.

More than Handwaving

Now that I've spent some time essentially waving my hands about quotient types, what are they? Clearly we need a rule that goes something like

Γ ⊢ A type    E is an equivalence relation on A
————————————————————————————————————————————————
Γ ⊢ A // E type

Along with the typing rule

Γ ⊢ a : A
——————————————
Γ ⊢ a : A // E

So all members of the original type belong to the quotiented type, and finally

Γ ⊢ a : A    Γ ⊢ b : A    Γ ⊢ a E b
————————————————————————————————————
Γ ⊢ a ≡ b : A // E

Notice something important here: that ≡ is the fancy shmancy judgmental equality baked right into the language. This calls into question decidability. 
It seems that a E b could involve some non-trivial proof terms. More than that, in a constructive, proof-relevant setting things can be a bit trickier than they seem. We can't just define a quotient to be the same type with a different equivalence relation, since that would imply some icky things. To illustrate this problem, imagine we have a predicate P on a type A where a E b implies P a ⇔ P b. If we just redefine the equivalence relation on quotients, P would not be a well-formed predicate on A // E, since a ≡ b : A // E doesn't mean that P a ≡ P b. This would be unfortunate. Clearly some subtler treatment of this is needed. To that end I found this paper discussing some of the handling of NuPRL's quotients enlightening.

How NuPRL Does It

The paper I linked to is a discussion on how to think about quotients in terms of other type theory constructs. In order to do this we need a few things first. The first thing to realize is that NuPRL's type theory is different than what you are probably used to. We don't have this single magical global equality. Instead, we define equality inductively across the type. This notion means that our equality judgment doesn't have to be natural in the type it works across. It can do specific things at each case. Perhaps the most frequent is that we can have functional extensionality:

f = g ⇔ ∀ a. f a = g a

Okay, so now that we've tossed aside the notion of a single global equality, what else is new? Well, something new is the lens through which many people look at NuPRL's type theory: PER semantics. Remember that a PER is a relationship satisfying

a R b → b R a
a R b ∧ b R c → a R c

In other words, a PER is an equivalence relationship that isn't necessarily reflexive at all points. The idea is to view types not as some opaque "thingy" but instead to be partial equivalence relations across the set of untyped lambda calculus terms. 
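As a rough intuition (purely illustrative; NuPRL's actual semantics is over untyped lambda terms and far richer than this), the PER view can be sketched in Haskell, with membership defined as reflexivity. All names here are made up:

```haskell
-- A "type" is nothing but a partial equivalence relation on raw terms.
-- Term stands in for untyped lambda calculus terms.
type Term = String

newtype PER = PER { related :: Term -> Term -> Bool }

-- Membership is reflexivity:  a : A  iff  (a, a) ∈ A.
member :: PER -> Term -> Bool
member ty a = related ty a a

-- A toy "Bool type": it only relates the two canonical booleans,
-- so every other term simply fails to be a member.
boolTy :: PER
boolTy = PER $ \a b -> a == b && a `elem` ["true", "false"]
```

Here member boolTy "true" holds while member boolTy "\\x.x" does not: the relation is symmetric and transitive everywhere, but reflexive only at the terms we consider inhabitants, which is exactly what makes it partial.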
Inductively defined equality falls right out of this idea, since we can just define a ≡ b : A to be equivalent to (a, b) ∈ A. Now another problem rears its head: what does a : A mean? Well, even though we're dealing with PERs, it's quite reasonable to say something is a member of a type if it's reflexive. That is to say, each relation is a full equivalence relation for the things we call members of that type. So we can therefore define a : A to be (a, a) ∈ A. Another important constraint: in order for a type family to be well formed, it needs to respect the equality of the type it maps across. In other words, for all B : A → Type, we have (a, a') ∈ A ⇒ (B a = B a') ∈ U. This should seem on par with how we defined function equality, and we call this "type functionality". Let's also touch on another concept: squashed types. The idea is to take a type and throw away all information other than whether or not it's occupied. There are two basic kinds of squashing, extensional and intensional. In the intensional case we consider two squashed things equal if and only if the types they're squashing are equal:

A = B
—————————
[A] = [B]

Now we can also consider only the behavior of the squashed type, the extensional view. Since the only behavior of a squashed type is simply existing, our extensional squash type has the equivalence

∥A∥ ⇔ ∥B∥
—————————
∥A∥ = ∥B∥

Now aside from this, the introduction of these types is basically the same: if we can prove that a type is occupied, we can grab a squashed type. Similarly, when we eliminate a type all we get is the trivial occupant of the squashed type, called •.

Γ ⊢ A
———————
Γ ⊢ [A]

Γ, x : |A|, Δ[•] ⊢ C[•]
————————————————————————
Γ, x : |A|, Δ[x] ⊢ C[x]

What's interesting is that when proving an equality judgment, we can unsquash both of these types. This is only because NuPRL's equality proofs are computationally trivial. Now with all of that out of the way, I'd like to present two typing rules. 
First:

Γ ⊢ A ≡ A'    Γ, x : A, y : A ⊢ E[x; y] = E'[x; y]    E and E' are PERs
————————————————————————————————————————————————————————————————————————
Γ ⊢ A // E ≡ A' // E'

In English: two quotients are equal when the types and their quotienting relations are equal.

Γ, u : x ≡ y ∈ (A // E), v : ∥x E y∥, Δ[u] ⊢ C[u]
——————————————————————————————————————————————————
Γ, u : x ≡ y ∈ (A // E), Δ[u] ⊢ C[u]

There are a few new things here. The first is that we have a new Δ[u] thing. This is a result of dependent types: we can have things in our context that depend on u, and so to indicate this we "split" the context into Γ, u, Δ and apply the dependent part of the context, Δ, to the variable it depends on, u. The long and short of this is that when we're trying to use an equivalence between two terms in a quotient, we only get the squashed term. This does mean that we only need to provide a squash to get equality in the first place, though:

Γ ⊢ ∥x E y∥    Γ ⊢ x : A    Γ ⊢ y : A
——————————————————————————————————————
Γ ⊢ x ≡ y : A // E

Remember that we can trivially form an ∥A∥ from A. Now there's just one thing left to talk about: using our quotiented types. To do this the paper outlines one primitive elimination rule and defines several others.

Γ, x : A, y : A, e : x E y, a : ND, Δ[ndₐ{x;y}] ⊢ |C[ndₐ{x;y}]|
————————————————————————————————————————————————————————————————
Γ, x : A // E, Δ[x] ⊢ |C[x]|

ND is an admittedly odd type that's supposed to represent nondeterministic choice. It has two terms, tt and ff, and they're considered "equal" under ND. However, nd returns its first argument if it's fed tt and the second if it is fed ff. Hence, nondeterminism. Now in our rule we use this to indicate that if we're eliminating some quotiented type we can get any value that's considered equal under E. We can only be assured that when we eliminate a quotiented type, it will be related by the equivalence relation to x. This rule captures this notion by allowing us to randomly choose some y : A so that x E y. Overall, this rule simply states that if C is occupied for any term related to x, then it is occupied for C[x].

Wrap up

As with my last post, here are some questions for the curious reader to pursue:

What elimination rules can we derive from the above?
If we're proving equality, can we get more expressive rules?
What would an extensional quotient type look like? Why would we want intensional or extensional?
How can we express quotient types with higher inductive types from HoTT?

The last one is particularly interesting. 
Thanks to Jon Sterling for proof reading 2014-10-17T00:00:00Z