GSoC 2018 Ideas
This is a list of ideas for students who are considering applying to Google Summer of Code 2018 for Haskell.org. You can contribute ideas by sending a pull request to our GitHub repository.
Please be aware that:
- This is not an all-inclusive list, so you can apply for projects not in this list and we will try our best to match you with a mentor.
- You can apply for as many ideas as you want (but only one can be accepted).
- Some general tips on writing a proposal are discussed here.
Table of Contents
- Proof of Concept Support for Multiple Public Libraries in a .cabal package
- Complete Cabal's Nix-style `cabal new-build`
- CodeWorld Editing and Debugging Tools
- Add support for deprecating exports
- Implement quantified contexts (or other type system goodies)
- Finish the package candidate workflow for Hackage
- Make Hackage CDN-aware
- Help Hadrian
- Improvements to Haskell IDE Engine
- Hi Haddock
- Add a CommonMark parser to Pandoc
- A Binary backend for Postgresql/Persistent
Proof of Concept Support for Multiple Public Libraries in a .cabal package🔗
A common pattern with large-scale Haskell projects is to have a large number of tightly-coupled packages that are released in lockstep. One notable example is amazonka; as pointed out in amazonka#4155 every release involves the lockstep release of 89 packages. Here, the tension between the two uses of packages is clearly on display:
1. A package is a unit of code that can be built independently. amazonka is split into lots of small packages instead of one monolithic package so that end-users can pick and choose what code they actually depend on, rather than bringing in one gigantic mega-library as a dependency.
2. A package is the mechanism for distribution, something that is ascribed a version, author, etc. amazonka is a tightly coupled series of libraries with a common author, and so it makes sense that they want to be distributed together.
The concerns of (1) have overridden the concerns of (2): amazonka is split into small packages which is nice for end-users, but means that the package maintainer needs to upload 89 packages whenever they need to do a new version.
The way to solve this problem is to split apart (1) and (2) into different units. The package should remain the mechanism for distribution, but a package itself should contain multiple libraries, which are independent units of code that can be built separately.
The goal of this project is to complete a proof-of-concept implementation of multiple public libraries in Cabal and cabal-install. The completion of this feature requires additional work outside the scope of this project, including patching hackage-server and Haddock.
For additional information and discussion, see cabal#4206.
Mentor: Edward Z. Yang
Complete Cabal's Nix-style `cabal new-build`🔗
new-build is a major reworking of how Cabal and cabal-install work internally that unifies the old build commands and sandboxes, and is inspired by concepts from Nix.
`new-build` significantly improves developer experience by addressing common problems which were attributed to “Cabal Hell”. See also Edward’s blog post for an introduction to `new-build` and a more detailed explanation.
Last year a lot of progress was made, and `cabal new-build` is already gaining popularity, despite being incomplete; we’re quite close, but we’re not there yet!
There is likely too much for a single student to complete in a single summer, so we welcome proposals that include some reasonable subset of the functionality listed below.
In order to reach the major “Cabal 3.0” milestone, which denotes switching over to the `new-build` infrastructure as the default (thus finally retiring the old/sandbox commands), the following critical features need to be completed or implemented:
- `cabal new-install` (see cabal#4558 for design and status)
- `cabal new-clean` command (see cabal#3835 for an early attempt)
- Resolve issues related to and complete
- Fix high-priority show-stopper bugs tagged `nix-local-builds` in the issue tracker
Additional nice-to-have stretch goals:
- Support for remote Git-repository dependencies
- Resolve cyclic dependencies in test/benchmark-suites
- Integrate `cabal outdated` with new-build’s codepaths (see cabal#4831)
- Implement `cabal new-doctest` counterpart to
Potential Mentors: Edward Z. Yang, Mikhail Glushenkov, Herbert Valerio Riedel
Difficulty: Intermediate to Advanced, depending on the chosen task(s).
CodeWorld Editing and Debugging Tools🔗
CodeWorld is an educational web-based programming environment based on Haskell. There are significant opportunities to make the project easier to use and more successful for students by rethinking editing and debugging tools in a functional setting. The possible feature set is very large, and a significant part of this project would be choosing a set of features to work on. Someone could work on a single ambitious feature, or a collection of smaller features with a cumulative impact.
Specific ideas include:
- Extending the auto-complete interface and adding contextual hints that offer relevant documentation as the user types code.
- Offering better visual clues in the editor interface. CodeWorld users are typically children, and struggle with nesting and syntax structure. Ideas include color-coding function names with their corresponding arguments, or marking up major syntax to make the structure apparent.
- Extending the debugging features. CodeWorld now offers some unique debugging features that show users how shapes in their output link back to their code. This could be extended to include Elm-style time-traveling debugging features, and other useful extensions.
Mentors: Chris Smith
Difficulty: Varies depending on proposal
Add support for deprecating exports🔗
GHC currently supports a pragma for deprecating top-level entities. This includes individual functions, modules, classes or data constructors. However, it does not support deprecating an export from a module.
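For reference, the existing pragma for top-level entities can be sketched as follows. The function name `oldLines` and the warning text are made up for this illustration. GHC reports the message when another module imports and uses the deprecated name.

```haskell
-- Hypothetical function, deprecated with GHC's existing DEPRECATED pragma.
-- Modules that import and use oldLines get the message as a compile-time warning.
{-# DEPRECATED oldLines "Use Data.String.lines instead" #-}
oldLines :: String -> [String]
oldLines = lines

main :: IO ()
main = mapM_ putStrLn (oldLines "one\ntwo")
```

The pragma attaches to the definition itself, which is why it cannot express "this name is still fine, but stop importing it from this module".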
Adding support for this would allow us to gracefully (i.e., with a deprecation phase) move functions from one module to another. A good example is `Data.List.lines`. This is a `String`-specific function which clearly belongs in `Data.String` rather than
The desired syntax would probably end up looking like:
For more background information, see this ticket: https://ghc.haskell.org/trac/ghc/ticket/4879.
Mentor: Ben Gamari
Implement quantified contexts (or other type system goodies)🔗
At last year’s Haskell Symposium, Gert-Jan Bottu et al. described a plan for quantified contexts, where a user could write a type like `forall h. (forall f. Functor f => Functor (h f)) => h Maybe Int -> h  Int`. The paper linked above has more realistic examples. The key is that a constraint is actually an implication.
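As a rough illustration of a constraint that is itself an implication, here is a minimal sketch. It assumes a GHC that provides the `QuantifiedConstraints` extension (8.6 or later), which may differ in detail from the design in the paper. The function `mapInner` is a name invented for this example.

```haskell
{-# LANGUAGE QuantifiedConstraints #-}
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Compose (Compose (..))

-- The context reads: for every f, if f is a Functor then h f is a Functor.
-- Instantiating f to Maybe is what lets fmap be used at type h Maybe.
mapInner
  :: (forall f. Functor f => Functor (h f))
  => (a -> b) -> h Maybe a -> h Maybe b
mapInner = fmap

main :: IO ()
main =
  -- Here h is Compose [], which is a Functor whenever its second argument is.
  print (getCompose (mapInner (+ 1) (Compose [Just (1 :: Int), Nothing])))
```

Without the quantified context, the signature would have to name `Functor (h Maybe)` directly, and every further instantiation of `h` would need its own constraint.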
The idea as described in that paper would not jibe well with GHC, as the paper’s specification requires backtracking in order to implement. However, a small tweak to what’s described in the paper would no longer need backtracking and should be relatively straightforward to implement. The project would be to finish specifying and then implement this proposal.
It will have significant real-world impact, fixing long-standing GHC bug #2256 and allowing `join` to be added to the `Monad` typeclass, among other benefits. (The route from this proposal to `join` is a bit long and goes via roles, but trust me here that this proposal is the blocker.)
Beyond just quantified contexts, I’m happy to mentor students who wish to hack on GHC’s type system. In particular, advances toward dependent types are strongly encouraged. This might include implementing one of several proposals posted recently. Ideas beyond those proposals include merging the parsers for types and terms, as well as sorting out The Namespace Problem (GHC allows declarations like `data T = T`. How would that work in a dependently typed language where terms are not syntactically distinct from types?). If you like, you can see Richard Eisenberg’s thesis for inspiration.
If you can relocate to the Philadelphia, PA, USA, area for the summer, there will be office space you can use, and you’ll be able to work in a space with several other people hacking on GHC. Remote mentorship is also possible, of course.
Mentor: Richard Eisenberg (feel free to email to discuss ideas for your proposal)
Finish the package candidate workflow for Hackage🔗
Hackage candidate packages currently cannot be used directly, and their UI could be improved. We would like to have new packages be uploaded as candidates by default, to improve the vetting process. But this means polishing off candidate functionality. The main issues left to do are tracked here.
The first step is moving the candidate display page to the new templating system and sharing code with the main package page. Following this, we need to implement a new candidate index, able to be provided as a secondary index. This would be a “v1” index, and mutable.
Beyond this we want to extend the docbuilder and docuploads to work with candidates, and then implement a fixed workflow from candidacy to validation and then publishing.
Mentors: Gershom Bazerman, Herbert Valerio Riedel
Make Hackage CDN-aware🔗
We have speed and bandwidth issues with the Hackage package repository because we have had to disable the CDN for too many pages. This is because when the CDN is on, it caches things people don’t expect – in particular, things that can be updated due to user action.
There are utility functions in the Hackage codebase to teach each page to send proper cache-control headers to keep the CDN from serving stale content. However, they aren’t used carefully and uniformly.
Additionally, the CDN interferes with our ability to collect download statistics.
This would be a two phase project:
- Annotate Hackage pages carefully to ensure that the CDN doesn’t cause confusion with regard to updates to pages.
- Design a solution to both allow caching of package downloads and also collect granular statistics. One possibility is to serve downloads via redirects, with the redirect always being hit, and the redirected-to `.tgz` file being cached.
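The first phase amounts to deciding, page by page, which cache-control policy applies. A minimal sketch of that decision is shown here. The `PageKind` categories and the `cacheControl` function are hypothetical and are not taken from the hackage-server codebase. The header values are standard HTTP `Cache-Control` directives, and the specific lifetimes are arbitrary.

```haskell
-- Hypothetical page categories, invented for this sketch.
data PageKind
  = PackageTarball  -- assumed not to change once published
  | PackagePage     -- changes when users upload, deprecate, or edit
  | AccountPage     -- depends on the logged-in user
  deriving (Eq, Show)

-- Pick a Cache-Control header value for each kind of page.
cacheControl :: PageKind -> String
cacheControl PackageTarball = "public, max-age=86400"  -- the CDN may keep it for a day
cacheControl PackagePage    = "no-cache"               -- the CDN must revalidate before reuse
cacheControl AccountPage    = "private, no-store"      -- never stored by shared caches

main :: IO ()
main =
  mapM_
    (\kind -> putStrLn (show kind ++ ": " ++ cacheControl kind))
    [PackageTarball, PackagePage, AccountPage]
```

Keeping the policy in one total function makes it harder to forget a page type, since adding a constructor triggers an incomplete-pattern warning.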
Mentors: Gershom Bazerman, Herbert Valerio Riedel
Help Hadrian🔗
Hadrian is a new build system for the Glasgow Haskell Compiler. It is based on the Shake library and we hope that it will soon replace the current Make-based build system.
There are many issues that need to be addressed before Hadrian can take over. Help Hadrian by solving some of them! Two specific issues that you will need to solve as part of your summer project are:
- Although Hadrian can build GHC, the resulting binary does not pass validation. To solve this issue you will need to analyse failing tests and find a way to fix them – in most cases this will be a matter of finding a command line flag that will need to be added to or removed from a GHC build command.
- There is currently no support for binary distribution. You will need to implement the corresponding build rule in Hadrian.
Warning: build systems are messy, even those that are written in Haskell. This is not a very glamorous project but it is a very important one: you have a chance to increase the productivity of GHC developers, and hence help the whole community!
Mentor: Andrey Mokhov (feel free to email to discuss the project).
Improvements to Haskell IDE Engine🔗
Haskell IDE Engine is starting to be useful, largely due to the work done in the 2017 HSOC by Zubin Duggal. But there is still plenty to be done to bring it closer to its potential.
Possible goals for a HIE project:
- Rewriting the completion system
- Complete for module names, GHC pragmas, .cabal files, syntax (if/case/let/where etc.)
- Smart templates for case splitting
- Make completions include local definitions (possibly making it scope-aware)
- Making completions scope-aware would involve traversing the AST and collecting all the symbols defined until reaching the cursor position
- Fewer bugs
- Case splitting
- Expanding Template Haskell in place
- More extensive testing, testing a full LSP session from beginning to end.
- Making find definition work for symbols defined in dependencies
- Sharing build cache between HaRe and GHC, to drastically improve refactoring speed.
- Implementing support for project-wide/cross file references.
- (ghc-mod) Support for more build types (new-build, hpack, nix etc.)
- Anything - Haskell tooling could support your favourite IDE feature
- Come up with a way to match the running hie server GHC version with the project GHC version. See https://github.com/haskell/haskell-ide-engine/issues/439
- Add a command to build the project
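The scope-aware completion idea in the list above (walk the AST towards the cursor and collect the binders passed on the way) can be sketched on a toy expression type. Everything here is an assumption made for illustration: `Expr`, `Span` and `inScope` are invented names, and a real implementation would work on GHC's own AST and source locations.

```haskell
-- Toy source span: inclusive (start, end) character offsets.
type Span = (Int, Int)

-- A deliberately tiny expression type standing in for a real AST.
data Expr
  = Var Span String
  | App Span Expr Expr
  | Lam Span String Expr       -- \x -> body
  | Let Span String Expr Expr  -- let x = rhs in body

spanOf :: Expr -> Span
spanOf (Var s _)     = s
spanOf (App s _ _)   = s
spanOf (Lam s _ _)   = s
spanOf (Let s _ _ _) = s

contains :: Span -> Int -> Bool
contains (start, end) pos = start <= pos && pos <= end

-- Names bound by the binders that enclose the cursor, innermost first.
inScope :: Int -> Expr -> [String]
inScope cursor root
  | spanOf root `contains` cursor = go [] root
  | otherwise                     = []
  where
    go env expr = case expr of
      Var _ _          -> env
      App _ f a        -> descend env [(env, f), (env, a)]
      Lam _ x body     -> descend env [(x : env, body)]
      -- Haskell's let is recursive, so x is visible in its own right-hand side.
      Let _ x rhs body -> descend env [(x : env, rhs), (x : env, body)]

    -- Continue into the first child whose span covers the cursor, if any.
    descend env children =
      case [go env' child | (env', child) <- children, spanOf child `contains` cursor] of
        (result : _) -> result
        []           -> env

-- The expression: let f = g in \y -> f y
example :: Expr
example =
  Let (0, 21) "f" (Var (8, 8) "g")
    (Lam (13, 21) "y" (App (19, 21) (Var (19, 19) "f") (Var (21, 21) "y")))

main :: IO ()
main = do
  print (inScope 21 example)  -- cursor on the final y
  print (inScope 8 example)   -- cursor on g
```

The result list would then be merged with top-level and imported names to form the completion candidates.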
Some of the above may not be substantial enough to fill up the entire summer. In that case, you may target multiple goals.
Mentors: Alan Zimmerman, Zubin Duggal
Hi Haddock🔗
…or how to get Haddock docstrings into .hi files
A long-standing issue with Haskell’s documentation tool Haddock is that it needs to effectively re-perform a large part of the parse/template-haskell/typecheck compilation pipeline in order to extract the necessary information from Haskell source for generating rendered Haddock documentation. This makes Haddock generation a costly operation, and makes for a poor developer experience.
An equally long-standing suggestion to address this issue (c.f. “Haddock strings in .hi files” email thread) is to have GHC include enough information in the generated `.hi` interface files in order to avoid Haddock having to duplicate that work. This would pave the way for the following use cases and benefits:
- Significantly speed up Haddock generation by avoiding redundant work.
- On-the-fly/lazy after-the-fact Haddock generation in `cabal new-haddock` for already built/installed Cabal library packages.
- Allow downstream tooling like Hoogle or Hayoo! to index documentation right from interface files.
- Simplify Haddock’s code base.
Proposed implementation strategy
This proposal focuses on making the needed changes to GHC’s codebase. The subsequent changes to Haddock are considered future work and are out of scope for this proposal.
- The student would add two new fields to GHC’s
ifaceArgMap from Haddock’s interface files (c.f.
ModGuts with the documentation for declarations (taken from the
MkIface to serialise the collected documentation.
- As a simple way to validate the new ability, teach GHCi’s `:info` (or alternatively add a new `:doc` command) how to read the documentation from loaded interface files (pretty rendering is not necessary at this point; just dump the raw comment strings).
An implementation needs to make sure to load the documentation from the interfaces as lazily as possible, as it might otherwise impose a performance hit in the common case.
Make Haddock use GHC’s interface files to produce documentation and thereby simplify its codebase; also figure out how to speed up Haddock’s
Mentors: Alex Biehl, Herbert Valerio Riedel
Add a CommonMark parser to Pandoc🔗
Pandoc is a very popular tool to convert documents to other formats. One of the most common conversions is converting Markdown to HTML. Unfortunately, Markdown is not well-specified so different tools will produce distinct results for the same Markdown.
CommonMark is an attempt to solve this problem. It consists of an unambiguous specification, a reference implementation, and an extensive test suite. Pandoc naturally needs to support this format.
In other languages, people have written some very efficient CommonMark parsers (cmark and commonmark.js) that can serve as inspiration. However, the parsing algorithms are very imperative and rely on mutable data structures. It will be an interesting project to write a nice functional CommonMark parser that shares some of the performance properties of these imperative parsers:
- No space/time blowups on specific input cases
In addition to the CommonMark parser, there are more ideas available for Pandoc, too many to list them here.
Mentors: John MacFarlane
A Binary backend for Postgresql/Persistent🔗
The Persistent library provides an abstract interface to a number of different databases, including PostgreSQL. The current PostgreSQL backend for Persistent uses the postgresql-simple library, which uses UTF-8 strings to communicate between the Haskell program and PostgreSQL. Marshalling to and from strings obviously has some performance implications.
It would therefore be nice to have a backend for Persistent that uses PostgreSQL’s binary protocol. There are already two Haskell libraries that use this binary protocol, Hasql and postgresql-binary.
The aim of this project is to write a new PostgreSQL backend for the Persistent library that makes use of this binary protocol, possibly via one of the two existing binary protocol libraries.
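To make the difference between the two representations concrete, here is a small sketch that encodes a 32-bit integer both ways using only `base`. The function names are invented for this example and do not come from any of the libraries mentioned. The sketch relies on PostgreSQL sending a four-byte integer in binary format as four bytes in network byte order, and in text format as its decimal digits.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Int (Int32)
import Data.Word (Word8)

-- Text format: the decimal rendering as ASCII bytes; length varies with the value.
encodeText :: Int32 -> [Word8]
encodeText = map (fromIntegral . ord) . show

-- Binary format: always four bytes, most significant byte first.
encodeBinary :: Int32 -> [Word8]
encodeBinary n = [fromIntegral ((n `shiftR` s) .&. 0xFF) | s <- [24, 16, 8, 0]]

main :: IO ()
main = do
  print (encodeText 1234567)    -- seven ASCII digits
  print (encodeBinary 1234567)  -- four bytes
```

The text form has to be rendered and parsed on each side of the connection, while the binary form is a fixed-width copy, which is where the expected performance gain comes from.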
The project outline is something along the lines of:
- Investigate all three libraries: Persistent, Hasql and postgresql-binary.
- Decide how this new binary PostgreSQL Persistent backend will operate.
- Implement it.
- Benchmark the new backend comparing it with the existing Persistent backend which uses postgresql-simple.
Mentors: Erik de Castro Lopo, Nikita Volkov