Summer of Haskell

GSoC 2018 Ideas

This is a list of ideas for students who are considering applying to Google Summer of Code 2018 for Haskell.org. You can contribute ideas by sending a pull request to our GitHub repository.

Please be aware that:

Table of Contents

  1. Proof of Concept Support for Multiple Public Libraries in a .cabal package
  2. Complete Cabal's Nix-style `cabal new-build`
  3. CodeWorld Editing and Debugging Tools
  4. Add support for deprecating exports
  5. Implement quantified contexts (or other type system goodies)
  6. Finish the package candidate workflow for Hackage
  7. Make Hackage CDN-aware
  8. Help Hadrian
  9. Improvements to Haskell IDE Engine
  10. Hi Haddock
  11. Add a CommonMark parser to Pandoc
  12. A Binary backend for Postgresql/Persistent

Proof of Concept Support for Multiple Public Libraries in a .cabal package

A common pattern in large-scale Haskell projects is to have a large number of tightly coupled packages that are released in lockstep. One notable example is amazonka; as pointed out in amazonka#4155, every release involves the lockstep release of 89 packages. Here, the tension between the two uses of packages is clearly on display:

  1. A package is a unit of code that can be built independently. amazonka is split into lots of small packages instead of one monolithic package so that end-users can pick and choose what code they actually depend on, rather than pulling in one gigantic mega-library as a dependency.

  2. A package is the mechanism for distribution, something that is ascribed a version, author, etc. amazonka is a tightly coupled series of libraries with a common author, and so it makes sense that they want to be distributed together.

The concerns of (1) have overridden the concerns of (2): amazonka is split into small packages, which is nice for end-users, but it means that the package maintainer needs to upload 89 packages whenever they release a new version.

The way to solve this problem is to split apart (1) and (2) into different units. The package should remain the mechanism for distribution, but a package itself should contain multiple libraries, which are independent units of code that can be built separately.

The goal of this project is to complete a proof-of-concept implementation of multiple public libraries in Cabal and cabal-install. The completion of this feature requires additional work outside the scope of this project, including patching hackage-server and Haddock.

For additional information and discussion, see cabal#4206.

Mentor: Edward Z. Yang

Difficulty: Advanced

Complete Cabal's Nix-style `cabal new-build`

new-build is a major reworking of how Cabal and cabal-install work internally that unifies the old build commands and sandboxes, and is inspired by concepts from Nix. new-build significantly improves developer experience by addressing common problems which were attributed to “Cabal Hell”. See also Edward’s blog post for an introduction to new-build and a more detailed explanation.

Last year a lot of progress was made, and cabal new-build is already gaining popularity, despite being incomplete; we’re quite close, but we’re not there yet!

There is likely too much for a single student to complete in a single summer, so we welcome proposals that include some reasonable subset of the functionality listed below.

In order to reach the major “Cabal 3.0” milestone, which denotes switching over to the new-build infrastructure as the default (thus finally retiring the old/sandbox commands), the following critical features need to be completed or implemented:

Additional nice-to-have stretch goals:

  * Support for remote Git-repository dependencies
  * Resolve cyclic dependencies in test/benchmark-suites
  * Integrate cabal outdated with new-build’s codepaths (see cabal#4831)
  * Implement cabal new-doctest counterpart to cabal doctest

Potential Mentors: Edward Z. Yang, Mikhail Glushenkov, Herbert Valerio Riedel

Difficulty: Intermediate to Advanced, depending on the chosen task(s).

CodeWorld Editing and Debugging Tools

CodeWorld is an educational web-based programming environment based on Haskell. There are significant opportunities to make the project easier to use and more successful for students by rethinking editing and debugging tools in a functional setting. The possible feature set is very large, and a significant part of this project would be choosing a set of features to work on. Someone could work on a single ambitious feature, or a collection of smaller features with a cumulative impact.

Specific ideas include:

  1. Extending the auto-complete interface and adding contextual hints that offer relevant documentation as the user types code.
  2. Offering better visual clues in the editor interface. CodeWorld users are typically children, and struggle with nesting and syntax structure. Ideas include color-coding function names with their corresponding arguments, or marking up major syntax to make the structure apparent.
  3. Extending the debugging features. CodeWorld now offers some unique debugging features that show users how shapes in their output link back to their code. This could be extended to include Elm-style time-traveling debugging features, and other useful extensions.

Mentors: Chris Smith

Difficulty: Varies depending on proposal

Add support for deprecating exports

GHC currently supports a pragma for deprecating top-level entities. This includes individual functions, modules, classes or data constructors. However, it does not support deprecating an export from a module.

Adding support for this would allow us to move functions from one module to another gracefully (i.e., with a deprecation phase). A good example is Data.List.lines, a String-specific function that clearly belongs in Data.String rather than Data.List.
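For comparison, the pragma that exists today attaches to the definition itself, so it deprecates the entity everywhere at once. The following is a minimal sketch of that existing behaviour; the names `oldLines` and `splitLines` are hypothetical and chosen only for illustration.

```haskell
-- Existing GHC feature: deprecating a top-level function.
-- `oldLines` and `splitLines` are hypothetical names used for illustration.

-- Modules that import and use `oldLines` get a compile-time warning
-- carrying this message. Uses inside the defining module are not reported.
{-# DEPRECATED oldLines "Use splitLines instead" #-}
oldLines :: String -> [String]
oldLines = splitLines

-- The replacement that users are pointed to.
splitLines :: String -> [String]
splitLines = lines
```

What this pragma cannot say is "this name is deprecated only when imported from this particular module", which is the gap the project aims to close.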

The desired syntax would probably end up looking like:

module Data.List
  ( …
  {-# DEPRECATE lines "Exported from Data.String instead" #-}
  , lines

  ) where

For more background information, see this ticket: https://ghc.haskell.org/trac/ghc/ticket/4879.

Mentor: Ben Gamari

Difficulty: Advanced

Implement quantified contexts (or other type system goodies)

At last year’s Haskell Symposium, Gert-Jan Bottu et al. described a plan for quantified contexts, where a user could write a type like forall h. (forall f. Functor f => Functor (h f)) => h Maybe Int -> h [] Int. The paper linked above has more realistic examples. The key is that the constraint is itself an implication: h f is a Functor whenever f is.
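As a minimal sketch of how such a context is used, the following assumes a GHC that accepts quantified constraints (later GHC releases expose this through the `QuantifiedConstraints` extension). The function name `bumpAll` is hypothetical, and `Compose` from `Data.Functor.Compose` merely supplies a convenient choice of `h`.

```haskell
{-# LANGUAGE QuantifiedConstraints #-}
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Compose (Compose (..))

-- Hypothetical example: the context says "h f is a Functor whenever f is".
-- The constraint solver uses that implication at f = Maybe to justify
-- the call to fmap below.
bumpAll :: (forall f. Functor f => Functor (h f)) => h Maybe Int -> h Maybe Int
bumpAll = fmap (+ 1)

-- One possible h is `Compose []`, because Compose g f is a Functor
-- for any Functors g and f.
example :: [Maybe Int]
example = getCompose (bumpAll (Compose [Just 1, Nothing]))
```

Here `example` evaluates to `[Just 2, Nothing]`. Without the quantified context, the signature would have to mention `Functor (h Maybe)` directly, which does not scale to code that uses `h` at several different functors.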

The idea as described in that paper would not jibe well with GHC, as the paper’s specification requires backtracking in order to implement. However, a small tweak to what’s described in the paper would no longer need backtracking and should be relatively straightforward to implement. The project would be to finish specifying and then implement this proposal.

It will have significant real-world impact, fixing long-standing GHC bug #2256 and allowing join to be added to the Monad typeclass, among other benefits. (The route from this proposal to join is a bit long and goes via roles, but trust me here that this proposal is the blocker.)

Beyond just quantified contexts, I’m happy to mentor students who wish to hack on GHC’s type system. In particular, advances toward dependent types are strongly encouraged. This might include implementing one of several proposals posted recently. Ideas beyond those proposals include merging the parsers for types and terms and sorting out The Namespace Problem (GHC allows declarations like data T = T. How would that work in a dependently typed language where terms are not syntactically distinct from types?). If you like, you can see Richard Eisenberg’s thesis for inspiration.

If you can relocate to the Philadelphia, PA, USA, area for the summer, there will be office space you can use, and you’ll be able to work in a space with several other people hacking on GHC. Remote mentorship is also possible, of course.

Mentor: Richard Eisenberg (feel free to email to discuss ideas for your proposal)

Difficulty: Advanced

Finish the package candidate workflow for Hackage

Hackage candidate packages currently cannot be used directly, and their UI could be improved. We would like to have new packages be uploaded as candidates by default, to improve the vetting process. But this means polishing off candidate functionality. The main remaining issues are tracked here.

The first step is moving the candidate display page to the new templating system and sharing code with the main package page. Following this, we need to implement a new candidate index, able to be provided as a secondary index. This would be a “v1” index, and mutable.

Beyond this we want to extend the docbuilder and docuploads to work with candidates, and then implement a fixed workflow from candidacy to validation and then publishing.

Mentors: Gershom Bazerman, Herbert Valerio Riedel

Difficulty: Intermediate

Make Hackage CDN-aware

We have speed and bandwidth issues with the Hackage package repository because we have had to disable the CDN for too many pages. This is because when the CDN is on, it caches things people don’t expect – in particular, things that can be updated due to user action.

There are utility functions in the Hackage codebase to teach each page to send proper cache-control headers to keep the CDN from serving stale content. However, they aren’t applied carefully or uniformly.

Additionally, the CDN interferes with our ability to collect download statistics.

This would be a two-phase project:

  1. Annotate Hackage pages carefully to ensure that the CDN doesn’t cause confusion with regard to updates to pages.

  2. Design a solution to both allow caching of package downloads and also collect granular statistics. One possibility is to serve downloads via redirects, with the redirect always being hit, and the redirected-to .tgz file being cached.
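The two phases can be pictured as choosing one caching policy per kind of resource. The following is a minimal sketch under stated assumptions: the `Resource` type and `cacheControl` function are hypothetical and do not come from the hackage-server codebase, and the sketch assumes that a published tarball never changes. Only the `Cache-Control` directive values are standard HTTP.

```haskell
-- Hypothetical classification of resources; none of these names
-- are taken from the hackage-server codebase.
data Resource
  = PackageTarball    -- assumed immutable once published
  | DownloadRedirect  -- must reach the origin so the download is counted
  | UserEditablePage  -- can change due to user action
  deriving (Eq, Show)

-- Cache-Control header value to send for each kind of resource.
cacheControl :: Resource -> String
-- Safe for the CDN to keep for a long time.
cacheControl PackageTarball = "public, max-age=31536000, immutable"
-- Never stored, so every download request hits the origin.
cacheControl DownloadRedirect = "no-store"
-- May be stored, but must be revalidated before each reuse.
cacheControl UserEditablePage = "no-cache"
```

With a split like this, the redirect stays cheap to serve while still being observable for statistics, and the bulk of the bandwidth is carried by the cached tarball.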

Mentors: Gershom Bazerman, Herbert Valerio Riedel

Difficulty: Intermediate

Help Hadrian

Hadrian is a new build system for the Glasgow Haskell Compiler. It is based on the Shake library and we hope that it will soon replace the current Make-based build system.

There are many issues that need to be addressed before Hadrian can take over. Help Hadrian by solving some of them! Two specific issues that you will need to solve as part of your summer project are:

Warning: build systems are messy, even those that are written in Haskell. This is not a very glamorous project but it is a very important one: you have a chance to increase the productivity of GHC developers, and hence help the whole community!

Mentor: Andrey Mokhov (feel free to email to discuss the project).

Difficulty: Intermediate.

Improvements to Haskell IDE Engine

Haskell IDE Engine is starting to be useful, largely due to the work done in the 2017 HSOC by Zubin Duggal.

But there is still plenty to be done to bring it closer to its potential.

Possible goals for a HIE project:

Some of the above may not be substantial enough to fill up the entire summer. In that case, you may target multiple goals.

Mentors: Alan Zimmerman, Zubin Duggal

Difficulty: Intermediate

Hi Haddock

…or how to get Haddock docstrings into .hi files

A long-standing issue with Haskell’s documentation tool Haddock is that it needs to effectively re-perform a large part of the parse/template-haskell/typecheck compilation pipeline in order to extract the necessary information from Haskell source for generating rendered Haddock documentation. This makes Haddock generation a costly operation, and makes for a poor developer experience.

An equally long-standing suggestion to address this issue (cf. the “Haddock strings in .hi files” email thread) is to have GHC include enough information in the generated .hi interface files so that Haddock does not have to duplicate that work. This would pave the way for the following use cases and/or have the following benefits:

Proposed implementation strategy

This proposal focuses on making the needed changes to GHC’s codebase. The subsequent changes to Haddock are considered future work and are out of scope for this proposal.

An implementation needs to make sure to load the documentation from the interfaces as lazily as possible, since loading it eagerly might impose a performance hit in the common case.

Future work

Make Haddock use GHC’s interface files to produce documentation and thereby simplify its codebase; also figure out how to speed up Haddock’s --hyperlinked-source feature.

Mentors: Alex Biehl, Herbert Valerio Riedel

Difficulty: Advanced

Add a CommonMark parser to Pandoc

Pandoc is a very popular tool to convert documents to other formats. One of the most common conversions is converting Markdown to HTML. Unfortunately, Markdown is not well-specified so different tools will produce distinct results for the same Markdown.

CommonMark is an attempt to solve this problem. It consists of an unambiguous specification, a reference implementation, and an extensive test suite. Pandoc naturally needs to support this format.

Currently pandoc uses a wrapper around libcmark (the C parser) for commonmark and gfm. Having a pure Haskell parser would improve security and allow us to add more extensions. It would also allow compilation to JavaScript with ghcjs.

In other languages, people have written some very efficient CommonMark parsers (cmark and commonmark.js) that can serve as inspiration. However, the parsing algorithms are very imperative and rely on mutable data structures. It will be an interesting project to write a nice functional CommonMark parser that shares some of the performance properties of these imperative parsers:

In addition to the CommonMark parser, there are more ideas available for Pandoc, too many to list here.

Mentors: John MacFarlane

Difficulty: Beginner

A Binary backend for Postgresql/Persistent

The Persistent library provides an abstract interface to a number of different databases, including PostgreSQL. The current PostgreSQL backend for Persistent uses the postgresql-simple library, which uses UTF-8 strings to communicate between the Haskell program and PostgreSQL. Marshalling to and from strings has a performance cost: values must be rendered to text on one side and parsed again on the other.
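To make the difference concrete, here is a minimal sketch that encodes one 32-bit integer both ways, using only `Data.ByteString.Builder` from the `bytestring` package. The function names are hypothetical. The sketch assumes that a 4-byte integer travels as four big-endian bytes in the binary format and as decimal digits in the text format, and it does not talk to a database.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int32)

-- Text-style encoding: decimal digits, which the receiver must parse again.
-- The size depends on the value.
encodeText :: Int32 -> BL.ByteString
encodeText = B.toLazyByteString . B.int32Dec

-- Binary-style encoding: always four bytes, most significant byte first.
encodeBinary :: Int32 -> BL.ByteString
encodeBinary = B.toLazyByteString . B.int32BE
```

For the value 1234567890, `encodeText` produces ten bytes of digits while `encodeBinary` produces four bytes that need no parsing. Whether that translates into a measurable end-to-end gain is exactly what the benchmarking step of the project would have to establish.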

It would therefore be nice to have a backend for Persistent that uses PostgreSQL’s binary protocol. There are already two Haskell libraries that use this binary protocol, Hasql and postgresql-binary.

The aim of this project is to write a new PostgreSQL backend for the Persistent library that makes use of this binary protocol, possibly via one of the two existing binary protocol libraries.

The project outline is something along the lines of:

  1. Investigate all three libraries: Persistent, Hasql and postgresql-binary.
  2. Decide how this new binary PostgreSQL Persistent backend will operate.
  3. Implement it.
  4. Benchmark the new backend, comparing it with the existing Persistent backend, which uses postgresql-simple.

Mentors: Erik de Castro Lopo, Nikita Volkov

Difficulty: Intermediate