Summer of Haskell

GSoC 2018 Ideas

This is a list of ideas for students who are considering applying to Google Summer of Code 2018 for Haskell.org. You can contribute ideas by sending a pull request to our GitHub repository.

Please be aware that:

Table of Contents

  1. Implementing Automatic Differentiation in Accelerate
  2. Proof of Concept Support for Multiple Public Libraries in a .cabal package
  3. Complete Cabal's Nix-style `cabal new-build`
  4. Haskell specific tooling for working with nix projects
  5. CodeWorld Editing and Debugging Tools
  6. A format-preserving YAML library for Haskell
  7. Improve GHC Code Generator
  8. Add support for deprecating exports
  9. Implement aspects of Dependent Haskell
  10. Benchmarking graph libraries and optimising algebraic graphs
  11. Finish the package candidate workflow for Hackage
  12. Make Hackage CDN-aware
  13. Help Hadrian
  14. Improvements to Haskell IDE Engine
  15. Finalize the Hasktorch Library for Hackage Release
  16. Hi Haddock
  17. A library for in-memory data analysis in Haskell
  18. A Binary backend for Postgresql/Persistent
  19. Haskell Program Analysis using GHC Source Plugins
  20. Quickchecking web APIs
  21. New authentication schemes for servant-auth
  22. Offline mode for Stack

Implementing Automatic Differentiation in Accelerate🔗

Automatic Differentiation (AD) is a technique for automatically calculating the derivatives of numerical functions, and its implementation on massively parallel processors is one of the core technical advances driving the ongoing deep learning revolution. Within the Haskell ecosystem, the ad library provides a comprehensive and intuitive implementation of automatic differentiation, while accelerate is a library for parallel array computations which can be executed on GPUs and multicore CPUs. Unfortunately, these libraries cannot be easily combined; ad works with standard Haskell types, whereas accelerate is implemented as an embedded language with a distinct type system.

The goal of this project is to implement automatic differentiation on the GPU (and by extension, other supported architectures) using accelerate. We propose to implement this as a new package, drawing inspiration from the design of the existing ad package. The success of this project would allow Haskell to stake out territory in the ever expanding field of machine learning, as a language capable of providing both high-performance numerics, as well as the safety of a strong static type system.

Implementation Strategies

Basic sequential automatic differentiation is relatively straightforward to implement (as in, for example, the ad package), but obtaining code which can be executed in parallel is trickier. Recent work by Fritz Henglein and Gabriele Keller provides a general framework for how to differentiate linear functions; implementing this framework in accelerate would be the best path towards vectorised automatic differentiation.
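For orientation, the core idea of forward-mode AD fits in a few lines of ordinary Haskell using dual numbers. This is only an illustration of the sequential technique, not of the ad or accelerate implementation:

-- Forward-mode AD with dual numbers: carry a value together with its derivative.
data Dual = Dual { primal :: Double, tangent :: Double }

instance Num Dual where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx - Dual y dy = Dual (x - y) (dx - dy)
  Dual x dx * Dual y dy = Dual (x * y) (x * dy + dx * y)
  negate (Dual x dx)    = Dual (negate x) (negate dx)
  abs    (Dual x dx)    = Dual (abs x) (signum x * dx)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Differentiate a function at a point by seeding the tangent with 1.
diff :: (Dual -> Dual) -> Double -> Double
diff f x = tangent (f (Dual x 1))

main :: IO ()
main = print (diff (\x -> x * x + 3 * x) 2)  -- d/dx (x^2 + 3x) at x = 2 is 7.0

Reverse mode, and above all a formulation that maps onto accelerate’s array combinators, is where the real design work of the project lies.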

Future work / possible extensions

The use of BLAS libraries, such as via the bindings provided in accelerate-blas, could further improve performance of the library.

Mentors: Fritz Henglein, Gabriele Keller, Trevor McDonell, Edward Kmett, Sacha Sokoloski

Difficulty: Advanced

Proof of Concept Support for Multiple Public Libraries in a .cabal package🔗

A common pattern with large scale Haskell projects is to have a large number of tightly-coupled packages that are released in lockstep. One notable example is amazonka; as pointed out in amazonka#4155 every release involves the lockstep release of 89 packages. Here, the tension between the two uses of packages is clearly on display:

  1. A package is a unit of code that can be built independently. amazonka is split into lots of small packages instead of one monolithic package so that end-users can pick and choose what code they actually depend on, rather than pulling in one gigantic mega-library as a dependency.

  2. A package is the mechanism for distribution, something that is ascribed a version, author, etc. amazonka is a tightly coupled series of libraries with a common author, and so it makes sense that they want to be distributed together.

The concerns of (1) have overridden the concerns of (2): amazonka is split into small packages, which is nice for end-users, but means that the package maintainer needs to upload 89 packages whenever they make a new release.

The way to solve this problem is to split apart (1) and (2) into different units. The package should remain the mechanism for distribution, but a package itself should contain multiple libraries, which are independent units of code that can be built separately.
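To make this concrete, here is a purely hypothetical sketch of what such a .cabal file might look like; the actual syntax, including how one library in a package refers to a sibling library, is exactly what the proof of concept would have to pin down:

name:    amazonka
version: 1.6.0

-- The main, unnamed library of the package.
library
  exposed-modules: Network.AWS
  build-depends:   base, amazonka:s3, amazonka:ec2  -- hypothetical notation

-- Additional named libraries: built and depended upon independently,
-- but versioned, documented and uploaded together with the package.
library s3
  exposed-modules: Network.AWS.S3
  build-depends:   base

library ec2
  exposed-modules: Network.AWS.EC2
  build-depends:   base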

The goal of this project is to complete a proof-of-concept implementation of multiple public libraries in Cabal and cabal-install. The completion of this feature requires additional work outside the scope of this project, including patching hackage-server and Haddock.

For additional information and discussion, see cabal#4206

Mentor: Edward Z. Yang

Difficulty: Advanced

Complete Cabal's Nix-style `cabal new-build`🔗

new-build is a major reworking of how Cabal and cabal-install work internally that unifies the old build commands and sandboxes, and is inspired by concepts from Nix. new-build significantly improves developer experience by addressing common problems which were attributed to “Cabal Hell”. See also Edward’s blog post and the “Nix-style Local Builds” section of the Cabal manual for an introduction to new-build and a more detailed explanation.

Last year a lot of progress was made, and cabal new-build is already gaining popularity, despite being incomplete; we’re quite close, but we’re not there yet!

There is likely too much for a single student to complete in a single summer, so we welcome proposals that include some reasonable subset of the functionality listed below.

In order to reach the major “Cabal 3.0” milestone, which denotes switching over to the new-build infrastructure as the default (thus finally retiring the old/sandbox commands), the following critical features need to be completed or implemented:

Additional nice-to-have stretch goals:

Potential Mentors: Edward Z. Yang, Mikhail Glushenkov, Herbert Valerio Riedel

Difficulty: Intermediate to Advanced, depending on the chosen task(s).

Haskell specific tooling for working with nix projects🔗

Nix is becoming more and more widely used as a way to manage package dependencies, despite the approach being quite low-level and difficult to use. There are very few layers of abstraction to isolate less experienced users from the internal workings of the nix machinery.

There are currently three main ways in which people use nix and Haskell together. All of these have different benefits and tradeoffs.

  1. Using stack with nix (somewhat common)
  2. Using cabal with the nix option (very uncommon)
  3. Using nix directly (the most common)

To take each option in turn:

  1. Only uses nix to manage non-Haskell dependencies. This is clearly not ideal, as we can’t make use of the binary caching or anything else which is great about nix.

  2. Is currently quite simplistic and relies on the presence of an already generated shell.nix file. When the option is set, several commands are run in this shell instead of using cabal’s normal dependency management.

  3. The most flexible option is to invoke cabal2nix yourself and then manipulate the environment using nix-shell, but there are several redundancies in this approach, such as having to regenerate the shell.nix file every time your cabal file changes. It is also quite low-level and requires in-depth knowledge about how nix works. We want to abstract away from this.

However, the ideal tool doesn’t yet exist. We want a tool with the following philosophy: nix, you are responsible for provisioning the correct environment, but I will take care of all the important aspects of the build.

The user provides a declarative specification of their required environment (via a cabal file or some other means); when the user then runs a command, nix provisions this environment and the tool runs the Haskell-specific commands necessary to build the package locally.

As an example of this, using workflow (3), invoking cabal2nix --shell will by default generate a nix expression which loads both the build and test dependencies into the environment. It is not unusual for the test dependency tree to be quite a bit larger than the build dependency tree. Ideally, when a user runs “cabal build”, cabal should enter a nix shell with the appropriate build dependencies for whichever component it wants to build and no more. Similarly, “cabal test” should enter an environment with test dependencies. It is currently possible to achieve this for benchmarking dependencies via the somewhat arcane nix-shell --argstr doBenchmark true.

Some more possible angles to explore are:

This is more a framework for a project proposal than a concrete idea, and there are many more angles to explore. A successful proposal will need to flesh out in detail what would be necessary to implement one or perhaps two of these ideas.

Difficulty: Advanced

Mentors: Matthew Pickering, Will Fancher

CodeWorld Editing and Debugging Tools🔗

CodeWorld is an educational web-based programming environment based on Haskell. There are significant opportunities to make the project easier to use and more successful for students by rethinking editing and debugging tools in a functional setting. The possible feature set is very large, and a significant part of this project would be choosing a set of features to work on. Someone could work on a single ambitious feature, or a collection of smaller features with a cumulative impact.
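For context, a complete CodeWorld program is tiny. A minimal sketch using the codeworld-api package (the educational dialect used on the website differs slightly) looks like this, and programs of roughly this shape are what the editing and debugging tools would operate on:

import CodeWorld

-- Draw a yellow circle of radius 2 on the canvas.
main :: IO ()
main = drawingOf (colored yellow (solidCircle 2))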

Specific ideas include:

  1. Extending the auto-complete interface and adding contextual hints, to offer relevant documentation as the user types code.
  2. Offering better visual clues in the editor interface. CodeWorld users are typically children, and struggle with nesting and syntax structure. Ideas include color-coding function names with their corresponding arguments, or marking up major syntax to make the structure apparent.
  3. Extending the debugging features. CodeWorld now offers some unique debugging features that show users how shapes in their output link back to their code. This could be extended to include Elm-style time-traveling debugging features, and other useful extensions.

Mentors: Chris Smith

Difficulty: Varies depending on proposal

A format-preserving YAML library for Haskell🔗

A growing number of (Haskell) applications and libraries rely on YAML as a configuration format. One of the motivating applications that uses YAML a lot is stack.

YAML is meant to be a human-friendly language, so files are often written by hand and contain many comments. However, given the number of YAML files that stack users need to deal with, it would be useful to be able to modify the YAML files programmatically from within Haskell.

The existing Haskell YAML library supports parsing and rendering YAML files, but because it uses an efficient C library (libyaml) under the hood, metadata like file layout and comments are not preserved.

Thus, there is a clear niche for a pure Haskell, format-preserving YAML parsing and rendering library. The focus of this project will be correctness: complying with the YAML specification and providing a clean API. Micro-optimizations are less important, and we don’t expect this parser to be as fast as other YAML parsers, since it needs to do extra bookkeeping for the metadata.
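As a rough sketch of the shape such a library could take (the types below are hypothetical, not an existing package), the parser would produce a concrete syntax tree that keeps comments and layout, and the renderer would invert it exactly:

import Data.Text (Text)

-- A node carries the metadata attached to the value it wraps.
data Node = Node
  { comments :: [Text]   -- comments appearing before the node
  , style    :: Style    -- plain, quoted, block literal, ...
  , value    :: Yaml
  }

data Yaml
  = Scalar   Text
  | Sequence [Node]
  | Mapping  [(Node, Node)]

data Style = Plain | SingleQuoted | DoubleQuoted | Literal | Folded

-- The library would provide
--   parse  :: Text -> Either ParseError Node
--   render :: Node -> Text
-- with the format-preserving law that render <$> parse t == Right t
-- holds for every well-formed document t.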

Difficulty: Beginner to Intermediate

Mentors: Tom Sydney Kerckhove, Jasper Van der Jeugt

Improve GHC Code Generator🔗

Some simple improvements to GHC’s code generator can make a big difference to performance. For example, a recent change reduced the number of instructions needed to perform floating-point abs from approximately 20 to 2.

There are many open tickets for the code generator, at least two of which are suitable for a GSoC student. For example

Steps for the student:

Some tickets that could be reviewed:

   id | Summary
 3557 | CPU Vector instructions in GHC.Prim
 7741 | Add SIMD support to x86/x86_64 NCG
10648 | Some 64-vector SIMD primitives are absolutely useless
13852 | Can we have more SIMD primops, corresponding to the untapped AVX etc. instructions?
12412 | SIMD things introduce a metric ton of known key things
14251 | LLVM Code Gen messes up registers
 4211 | LLVM: Stack alignment on OSX
 5567 | LLVM: Improve alias analysis / performance
 7297 | LLVM incorrectly hoisting loads
 7610 | Cross compilation support for LLVM backend
10010 | LLVM/optimized code for sqrt incorrect for negative values
10074 | Implement the 'Improved LLVM Backend' proposal
10295 | Putting SCC in heavily inlined code results in "error: redefinition of global"
11138 | Kill the terrible LLVM Mangler
11295 | Figure out what LLVM passes are fruitful
11538 | Wrong constants in LL code for big endian targets
12470 | Move LLVM code generator to LLVM bitcode format
12798 | LLVM seeming to over optimize, producing inefficient assembly code...
13045 | LLVM code generation causes segfaults on FreeBSD
13062 | `opt' failed in phase `LLVM Optimiser'. (Exit code: -11)
13724 | Clamping of llvm llc to -O1 and -O2
14528 | LLVM's CallAnalyzer Breaks
 4308 | LLVM compiles Updates.cmm badly
 5140 | Fix LLVM backend for PowerPC
14372 | CMM contains a bunch of tail-merging opportunities 

Mentors: Dominic Steinitz (aka idontgetoutmuch aka cinimod), Ben Gamari, Matthew Pickering, Moritz Angermann, Carter Schonwald (SIMD/floating point semantics and API impact focus)

Difficulty: Intermediate - Hard

Add support for deprecating exports🔗

GHC currently supports a pragma for deprecating top-level entities. This includes individual functions, modules, classes or data constructors. However, it does not support deprecating an export from a module.

Adding support for this would allow us to gracefully (i.e., with a deprecation period) move functions from one module to another. A good example is Data.List.lines: this is a String-specific function which clearly belongs in Data.String rather than Data.List.
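For contrast, the pragma that exists today attaches to the definition itself (the module and function names below are made up for illustration), so the warning fires at every use site regardless of which module re-exports the name:

module Data.MyStrings (legacySplit) where

-- The existing mechanism: deprecate the binding, not an export of it.
{-# DEPRECATED legacySplit "Use Data.MyStrings.split instead" #-}
legacySplit :: String -> [String]
legacySplit = words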

The desired syntax would probably end up looking like:

module Data.List
  ( …
  {-# DEPRECATE lines "Exported from Data.String instead" #-}
  , lines

  ) where

For more background information, see this ticket: https://ghc.haskell.org/trac/ghc/ticket/4879.

Mentor: Ben Gamari

Difficulty: Advanced

Implement aspects of Dependent Haskell🔗

The design of GHC/Haskell has been steadily marching toward support for dependent types for a long time, debatably starting with the addition of -XFunctionalDependencies, as proposed in 2000. With the -XTypeInType extension in GHC 8.0 (2016), we’re as close as ever. However, much more work remains to be done. This Summer of Haskell idea is to slice off a chunk of that work and implement it!

Precisely which chunk(s) you take on is up to you, the proposer of the project. If you’re looking for inspiration, good starting places can be found in one of several related proposals posted recently. Ideas beyond those proposals include merging the parsers for types and terms, as well as sorting out the namespace problem (GHC allows declarations like data T = T. How would that work in a dependently typed language where terms are not syntactically distinct from types?). A student more versed in type theory (e.g., having experience with Types and Programming Languages, among other introductions) might even attempt implementing a dependently typed Core replacement. If you like, you can see Richard Eisenberg’s thesis for inspiration; that thesis aims to describe both the surface language and Core language for Dependent Haskell.
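To make the namespace problem concrete, here is the punning GHC allows today; with -XDataKinds the promoted data constructor is disambiguated with a tick, a syntactic crutch that a fully dependently typed Haskell would have to rethink:

{-# LANGUAGE DataKinds #-}

data T = T      -- T the type constructor and T the data constructor

x :: T          -- here T is the type constructor
x = T           -- here T is the data constructor

type S = 'T     -- the promoted data constructor, disambiguated with a tick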

If you can relocate to the Philadelphia, PA, USA, area for the summer, there will be office space you can use, and you’ll be able to work in a space with several other people hacking on GHC. Unfortunately, there is no extra funding to support this relocation. Remote mentorship is also possible, of course.

Mentor: Richard Eisenberg (feel free to email to discuss ideas for your proposal)

Difficulty: Advanced

Benchmarking graph libraries and optimising algebraic graphs🔗

Graphs are a very important data structure, and they are known to be difficult to work with in functional programming languages. Several libraries currently exist for creating and processing graphs in Haskell, each using a different graph representation: Data.Graph from containers, fgl, hash-graph and alga.

Due to their differences and the lack of a reference benchmark, it is not easy for a new user to find the best one for their project.

There will be two major tasks in this proposal:

  1. Develop an automated and fair benchmarking suite for these libraries. The main goal is to help developers easily choose the library that fits their project. The suite will benchmark (on sparse/dense and weighted/unweighted graphs):
    • Graph construction (e.g. from a list of edges).
    • Graph deconstruction (e.g. to a list of edges).
    • Graph manipulation (add/remove a vertex or an edge).
    • Graph lookup (test existence of a vertex or an edge).
    • Graph algorithms (reachability, topsort, DFS, BFS, SCC).

“Automated” denotes the ability to automatically update benchmarks when a new version of a graph library is released. “Fair” is about the community part of the project: the Haskell community should agree that the libraries are used correctly and to their full potential. The student will make an effort to contact library authors and gather their feedback.

The aim is to complete the benchmarking suite before the mid-term evaluation.
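As a very rough illustration of what one entry in such a suite might look like, here is a criterion benchmark comparing graph construction from an edge list in containers and alga (a real suite would force full evaluation via NFData instances rather than stopping at weak head normal form):

import Criterion.Main
import qualified Algebra.Graph as Alga   -- algebraic-graphs
import qualified Data.Graph    as G      -- containers

-- A sparse graph given as an edge list: a path through 10,000 vertices.
pathEdges :: [(Int, Int)]
pathEdges = [(i, i + 1) | i <- [1 .. 10000]]

main :: IO ()
main = defaultMain
  [ bgroup "construction from an edge list"
      [ bench "containers" $ whnf (G.buildG (1, 10001)) pathEdges
      , bench "alga"       $ whnf Alga.edges pathEdges
      ]
  ]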

  2. Help improve the alga library. It is a promising new approach (based on mathematical results about an algebra of graphs), but it lacks some important features, a user-friendly tutorial, and has not yet been optimised for performance (a short example of the library’s style follows this list). Hence the following subtasks:
    • Write a tutorial. Alga is well documented, but a new user can be lost in this new way of thinking about graphs.
    • Implement missing algorithms and optimise existing ones on the basis of the developed benchmarking suite. Graph libraries are expected to provide some basic algorithms, but because Alga is so different from conventional graph representations most of these algorithms need to be designed from scratch.
    • As a bonus, try to implement edge-labelled graphs. It is a high-risk subtask, because the theory behind it is still being worked out and requires further discussions.
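For readers who have not met alga, here is the short example promised above: graphs are built from a handful of algebraic primitives (this uses the algebraic-graphs package):

import Algebra.Graph

-- The path 1 -> 2 -> 3, built from single edges combined with overlay (graph union).
path123 :: Graph Int
path123 = edge 1 2 `overlay` edge 2 3

main :: IO ()
main = do
  print (edgeList path123)    -- [(1,2),(2,3)]
  print (vertexList path123)  -- [1,2,3]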

Working with the Haskell community is the core of this project. The student will engage Haskell developers, in particular both users and authors of existing graph libraries, in order to develop a high-quality well-documented benchmarking suite. Concerning Alga, there is a lot to do and again, the input of the community will be essential to decide which algorithms are needed, how to implement them, and receive feedback on the results. Alga is new, but the student will have support from the library author and will share the challenges with the Haskell community in blog posts.

It is hoped that the project will also benefit the entire community: it will help new developers to choose the right library, help current developers of these libraries to focus on specific optimisation goals and missing features, and, finally, make algebraic graphs a real alternative to existing graph libraries.

Mentor: Andrey Mokhov

Difficulty: Intermediate

Finish the package candidate workflow for Hackage🔗

Hackage candidate packages currently cannot be used directly, and their UI could be improved. We would like to have new packages be uploaded as candidates by default, to improve the vetting process. But this means polishing off candidate functionality. The main issues left to do are tracked here

The first step is moving the candidate display page to the new templating system and sharing code with the main package page. Following this, we need to implement a new candidate index, able to be provided as a secondary index. This would be a “v1” index, and mutable.

Beyond this we want to extend the docbuilder and docuploads to work with candidates, and then implement a fixed workflow from candidacy to validation and then publishing.

Mentors: Gershom Bazerman, Herbert Valerio Riedel

Difficulty: Intermediate

Make Hackage CDN-aware🔗

We have speed and bandwidth issues with the hackage package repository due to needing to disable the CDN for too many pages. This is because when the CDN is on, it caches things people don’t expect – in particular, things that can be updated due to user action.

There are utility functions in the hackage codebase to teach each page to send proper cache-control headers to keep the CDN from serving stale content. However, they aren’t used carefully and uniformly.

Additionally, the CDN interferes with our ability to collect download statistics.

This would be a two phase project:

  1. Annotate hackage pages carefully to ensure that the CDN doesn’t cause confusion with regards to updates to pages.

  2. Design a solution to both allow caching of package downloads and also collect granular statistics. One possibility is to serve downloads via redirects, with the redirect always being hit, and the redirected-to .tgz file being cached.

Mentors: Gershom Bazerman, Herbert Valerio Riedel

Difficulty: Intermediate

Help Hadrian🔗

Hadrian is a new build system for the Glasgow Haskell Compiler. It is based on the Shake library and we hope that it will soon replace the current Make-based build system.
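For a feel of the style Hadrian is written in, here is a toy Shake build script (not one of Hadrian’s actual rules): each rule states how to produce a file from its inputs, and Shake tracks the dependencies to decide what must be rebuilt.

import Data.Char (toUpper)
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions $ do
  -- The default target.
  want ["out/result.txt"]

  -- A rule: how to build out/result.txt from input.txt.
  "out/result.txt" %> \out -> do
    contents <- readFile' "input.txt"   -- readFile' also records the dependency
    writeFile' out (map toUpper contents)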

There are many issues that need to be addressed before Hadrian can take over. Help Hadrian by solving some of them! Two specific issues that you will need to solve as part of your summer project are:

Warning: build systems are messy, even those that are written in Haskell. This is not a very glamorous project but it is a very important one: you have a chance to increase the productivity of GHC developers, and hence help the whole community!

Mentor: Andrey Mokhov (feel free to email to discuss the project)

Difficulty: Intermediate.

Improvements to Haskell IDE Engine🔗

Haskell IDE Engine is starting to be useful, largely due to the work done in the 2017 HSOC by Zubin Duggal.

But there is still plenty to be done to bring it closer to its potential.

Possible goals for a HIE project:

Some of the above may not be substantial enough to fill up the entire summer. In that case, you may target multiple goals.

Mentors: Alan Zimmerman, Zubin Duggal

Difficulty: Intermediate

Finalize the Hasktorch Library for Hackage Release🔗

There are deep connections between functional programming and machine learning computation. Such links can yield new algorithms, as in some recent papers [1,2], while the field of neural networks can be recast as an emerging model of computation based on differentiable function composition [3,4,5]. Furthermore, as machine learning is deployed in diverse fields ranging from technology to healthcare to finance, ensuring the correctness and reliability of these systems is increasingly critical. Given this intersection of ideas and needs, Haskell’s powerful type system and composition mechanisms have the potential to advance the field in important new directions.

To explore this design space, Hasktorch builds on a tensor-based scientific computing C library that has been undergoing development for over a decade and is the foundation of the PyTorch/Torch neural network libraries. It makes available hundreds of mathematical operations including vectorized linear algebra, GPU computation, non-linear transformations, probability functions and sampling operations.

Initial development has been done to comprehensively bind the core Torch API via code generation, write dependently typed, memory-managed abstractions around core tensor operations, and implement a Backpack layer for module-level polymorphism. The goal of this proposal is to finalize the library for its first Hackage release, which includes:

  1. Migrate any high-level Python code from pytorch and torchvision into Haskell
  2. Write interfaces to libraries which abstract automatic differentiation procedures (e.g. backprop and diffhask libraries)
  3. Add high-level optimization routines such as stochastic gradient descent and Adam (a toy sketch follows this list). This would be a subset of the previous two points.
  4. Automated testing and benchmarking with continuous integration.
  5. Write examples and documentation.
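A toy illustration of the optimisation routines mentioned in point 3, written over plain lists of parameters rather than Hasktorch tensors, just to show the shape of the routine that would be ported to tensor operations:

-- One SGD update: move each parameter against its gradient.
sgdStep :: Double -> [Double] -> [Double] -> [Double]
sgdStep lr = zipWith (\p g -> p - lr * g)

-- Iterate the update a fixed number of times against a gradient function.
sgd :: Int -> Double -> ([Double] -> [Double]) -> [Double] -> [Double]
sgd steps lr gradOf params = iterate step params !! steps
  where step p = sgdStep lr p (gradOf p)

main :: IO ()
main = print (sgd 100 0.1 (map (* 2)) [5, -3])
  -- minimises a sum of squares; both parameters shrink towards 0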

References

  1. Brendan Fong, David I. Spivak, Rémy Tuyéras. Backprop as Functor: A compositional perspective on supervised learning
  2. Leland McInnes, John Healy. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
  3. Chris Olah. https://colah.github.io/posts/2015-09-NN-Types-FP/
  4. Yann Lecun. https://www.facebook.com/yann.lecun/posts/10155003011462143
  5. Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey

Difficulty: Advanced

Mentors: Sam Stites, Tim Pierson, Austin Huang

Hi Haddock🔗

…or how to get Haddock docstrings into .hi files

A long-standing issue with Haskell’s documentation tool Haddock is that it needs to effectively re-perform a large part of the parse/template-haskell/typecheck compilation pipeline in order to extract the necessary information from Haskell source for generating rendered Haddock documentation. This makes Haddock generation a costly operation, and makes for a poor developer experience.

An equally long-standing suggestion to address this issue (c.f. the “Haddock strings in .hi files” email thread) is to have GHC include enough information in the generated .hi interface files to avoid Haddock having to duplicate that work. This would pave the way for the following use cases and/or have the following benefits:

Proposed implementation strategy

This proposal focuses on making the needed changes to GHC’s codebase. The subsequent changes to Haddock are considered future work and are out of scope for this proposal.

An implementation needs to make sure to load the documentation from the interfaces as lazily as possible, as eagerly loading it might impose a performance hit in the common case.

Future work

Make Haddock use GHC’s interface files to produce documentation and thereby simplify its codebase; also figure out how to speed up Haddock’s --hyperlinked-source feature.

Mentors: Alex Biehl, Herbert Valerio Riedel

Difficulty: Advanced

A library for in-memory data analysis in Haskell🔗

A typical workflow in interactive data analysis consists of:

This project aims to provide a library with the following features:

  1. An efficient data structure for possibly larger-than-memory tabular data. The Frames library is notable prior work, and this project may build on top of it (namely, by extending its functionality for generating types from stored data).
  2. A set of functions to “tidy”/clean the data to bring it to a form fit for further analysis, e.g. splitting one column to multiple columns (“spread”) or vice versa (“gather”).
  3. A DSL for performing a representative set of relational operations, e.g. filtering and aggregation (see the sketch after this list).
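The sketch promised in point 3, using plain lists rather than the proposed data structure: filter rows, then group and aggregate.

import Data.Function (on)
import Data.List (groupBy, sortOn)

data Row = Row { city :: String, temp :: Double } deriving Show

-- Mean temperature per city, ignoring non-positive readings.
meanTempByCity :: [Row] -> [(String, Double)]
meanTempByCity rows =
  [ (city (head grp), sum (map temp grp) / fromIntegral (length grp))
  | grp <- groupBy ((==) `on` city)
         . sortOn city
         . filter ((> 0) . temp)
         $ rows
  ]

main :: IO ()
main = print (meanTempByCity [Row "Oslo" 2, Row "Oslo" 4, Row "Rome" 20])
  -- [("Oslo",3.0),("Rome",20.0)]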

Difficulty: Intermediate

Mentors: Marco Zocca

A Binary backend for Postgresql/Persistent🔗

The Persistent library provides an abstract interface to a number of different databases, including PostgreSQL. The current PostgreSQL backend for Persistent uses the postgresql-simple library, which uses UTF-8 strings to communicate between the Haskell program and PostgreSQL. Marshalling to and from strings obviously has some performance implications.

It would therefore be nice to have a backend for Persistent that uses PostgreSQL’s binary protocol. There are already two Haskell libraries that use this binary protocol, Hasql and postgresql-binary.

The aim of this project is to write a new PostgreSQL backend for the Persistent library that makes use of this binary protocol, possibly via one of the two existing binary protocol libraries.

The project outline is something along the lines of:

  1. Investigate all three libraries: Persistent, Hasql and postgresql-binary.
  2. Decide how this new binary PostgreSQL Persistent backend will operate.
  3. Implement it.
  4. Benchmark the new backend comparing it with the existing Persistent backend which uses postgresql-simple.

Mentors: Erik de Castro Lopo, Nikita Volkov, Maximilian Tagher

Difficulty: Intermediate

Haskell Program Analysis using GHC Source Plugins🔗

Performing any kind of static analysis on Haskell programs has traditionally been very difficult. The main problem has been that in order to load a Haskell module, you need to know a lot of additional information such as where dependencies exist, which preprocessors to run and so on. Syntactic analysis is somewhat possible but semantic analysis has been out of reach.

In recent work, @nboldi extended the plugin interface to allow users to modify and inspect the compiler’s AST as the program is compiled. This has the advantage that it can be integrated easily into any existing build system and the desired information is computed as the program is compiled.

The plugin architecture is very powerful, but this project will focus on using the API to extract information rather than to modify the source program.
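A minimal sketch of such an analysis plugin, written against the GHC 8.6 plugins API (module and field names differ in later GHC releases): it inspects the parsed AST of every module as it is compiled and reports how many top-level declarations the module contains.

module CountDecls (plugin) where

import Control.Monad.IO.Class (liftIO)
import HsSyn (hsmodDecls)
import HscTypes (Hsc, HsParsedModule (..), ModSummary (..))
import Module (moduleName, moduleNameString)
import Plugins (CommandLineOption, Plugin (..), defaultPlugin)
import SrcLoc (unLoc)

plugin :: Plugin
plugin = defaultPlugin { parsedResultAction = countDecls }

-- Count the top-level declarations and return the module unchanged.
countDecls :: [CommandLineOption] -> ModSummary -> HsParsedModule -> Hsc HsParsedModule
countDecls _opts ms pm = do
  let n = length (hsmodDecls (unLoc (hpm_module pm)))
  liftIO . putStrLn $
    moduleNameString (moduleName (ms_mod ms)) ++ ": " ++ show n ++ " declarations"
  return pm

Enabling it for an existing project is just a matter of adding -fplugin CountDecls (and the package providing it) to the GHC options, which is what makes this approach easy to integrate into any build system.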

Some potential avenues of analyses include:

  1. Refactor SourceGraph
  2. Refactor haskell-indexer
  3. A plugin which analyses a project with hs-boot files and identifies ways to reduce the size and number of them.
  4. A plugin which computes minimal exports
  5. A plugin which computes unused functions in an application across modules.
  6. A plugin which computes statistics about source code, for example which counts how many times certain language features are used.
  7. Analysis of core programs using the existing core2core plugins. For example, looking at the calculated sizes of expressions to help visualise core output.
  8. Integrate using plugins with nix so that it is easy for users to specify, in a declarative manner, that they want to run plugins.

A side-effect of this project will be a refinement of the plugins API and documentation about how other users can use plugins effectively.

A successful proposal will identify one potential application of a plugin and explain why using a plugin will be beneficial for that application. It would be useful to also consider the challenges of alternative approaches such as using the GHC API.

A successful project will implement at least one of these analysis ideas and document the process in order to advertise the plugin architecture to other users.

Mentors: Boldizsár Németh, Oliver Charles

Difficulty: Intermediate

Quickchecking web APIs🔗

When writing web applications, there are a number of things one has to keep in mind, independent of the domain of the application, in order to do the job well. For example, that “Accept” headers are honored; that HTML has a doctype; that no endpoint takes too long to respond. Learning about these best practices, remembering to implement them, and testing for them is currently very time-consuming, since it must be done by every developer and for every application anew.

This proposal is to develop a tool to check that any application, if appropriately described, satisfies conditions such as these. ‘servant-quickcheck’ already does this for ‘servant’ applications, but in this proposal we’d extend the reach to applications which have an OpenAPI (Swagger) description (and perhaps other types of description as well), and that indeed need not even be written in Haskell. (However, since more detailed customization, and the definition of new conditions or predicates, would happen in Haskell, this tool might serve, like XMonad, as an introduction to Haskell for many people.) Some related work in this space already exists: servant-swagger generates Swagger definitions from servant types (i.e., the opposite direction), and Masahiro Yamauchi has added servant type generation to the Swagger tool, though this is a large Java project that would be hard to distribute, and wouldn’t allow for developing annotations to Swagger descriptions that have meaning specific to this project.
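For reference, the existing servant-quickcheck workflow that this project would generalise to OpenAPI descriptions looks roughly like this (predicate and function names are from servant-quickcheck; the exact API may vary between versions):

{-# LANGUAGE DataKinds, TypeOperators #-}

import Data.Proxy (Proxy (..))
import Servant
import Servant.QuickCheck
import Test.Hspec

type API = "users" :> Get '[JSON] [String]

server :: Server API
server = return ["alice", "bob"]

main :: IO ()
main = hspec $
  it "satisfies some generic best practices" $
    -- Spin the server up on a random port and fire generated requests at it.
    withServantServer (Proxy :: Proxy API) (return server) $ \url ->
      serverSatisfies (Proxy :: Proxy API) url defaultArgs (not500 <%> mempty)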

Beyond the development of translation from Swagger definitions to ‘servant’ types and of the executable, the project may include defining new common predicates.

Some relevant background includes:

Mentors: Julian K. Arni

Difficulty: Intermediate

New authentication schemes for servant-auth🔗

The servant-auth packages are a relatively young attempt at providing a definitive answer to the authentication needs of protected web applications using servant, a fairly popular set of libraries for serving web applications, querying them and more. The servant-auth packages already have a sufficiently general infrastructure to support just about any authentication scheme one might be interested in, but at the moment only offer JSON Web Tokens (JWT) and basic authentication out of the box.
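For orientation, protecting an endpoint with the existing JWT scheme looks roughly like this (types from servant-auth and servant-auth-server); a new scheme implemented during the summer would plug into the same Auth combinator:

{-# LANGUAGE DataKinds, DeriveGeneric, TypeOperators #-}

import Control.Monad.Error.Class (throwError)
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)
import Servant
import Servant.Auth.Server

data User = User { name :: String } deriving (Generic)
instance ToJSON   User
instance FromJSON User
instance ToJWT    User
instance FromJWT  User

-- Only requests carrying a valid JWT for a User reach the handler.
type Protected = Auth '[JWT] User :> "me" :> Get '[JSON] String

protected :: AuthResult User -> Handler String
protected (Authenticated u) = return (name u)
protected _                 = throwError err401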

One interesting project that could have a quite significant impact would be to dedicate an entire summer to the implementation of a few other essential authentication schemes (OAuth is a good example, but would not fill an entire summer). A decent starting point to figure out what already exists (in other packages or ecosystems) is this hackage search for the existing servant authentication solutions and perhaps this article for a list of common authentication schemes.

Besides OAuth, we do not have a fixed list of authentication schemes to consider so prospective students should feel free to talk to us, the haskell community and do a bit of research in order to come up with a list of authentication schemes that they would like to implement during the summer.

The end goal of this project would be to have 2-3 new authentication schemes (or more, of course) implemented in servant-auth, with reasonable haddocks and some tests. The student could optionally also (co-?)author cookbook recipes illustrating how to serve or query APIs that are protected by the newly supported authentication schemes, therefore making the student’s work easily discoverable by current and future servant users. All in all, by the end of the summer, servant-auth would no longer be just an attempt at solving the authentication problem with servant; it would finally be the definitive solution.

Mentors: Alp Mestanogullari (co-mentors: Julian Arni & Oleg Grenrus)

Difficulty: Intermediate

Offline mode for Stack🔗

Stack is a tool for installing and developing Haskell applications and libraries. It currently requires internet access to operate well.

There are various hacks and projects that attempt to circumvent this problem, e.g.:

However, the Stack maintainers are now interested in supporting this first-class. This has become more important since offline operation is required in some corporate settings – and it’s also just useful if you’re writing Haskell on an airplane or on a train!

This work will require touching many different parts of the Stack codebase but it will not require a deep understanding of its internals.

Mentors: Emanuel Borsboom, Stack Contributors

Difficulty: Beginner to Intermediate