GSoC 2018 Ideas
This is a list of ideas for students who are considering applying to Google Summer of Code 2018 for Haskell.org. You can contribute ideas by sending a pull request to our GitHub repository.
Please be aware that:
- This is not an all-inclusive list, so you can apply for projects not in this list and we will try our best to match you with a mentor.
- You can apply for as many ideas as you want (but only one can be accepted).
- Some general tips on writing a proposal are discussed here.
Table of Contents
- Implementing Automatic Differentiation in Accelerate
- Proof of Concept Support for Multiple Public Libraries in a .cabal package
- Complete Cabal's Nix-style `cabal new-build`
- Haskell specific tooling for working with nix projects
- CodeWorld Editing and Debugging Tools
- A format-preserving YAML library for Haskell
- Improve GHC Code Generator
- Add support for deprecating exports
- Implement aspects of Dependent Haskell
- Benchmarking graph libraries and optimising algebraic graphs
- Finish the package candidate workflow for Hackage
- Make Hackage CDN-aware
- Help Hadrian
- Improvements to Haskell IDE Engine
- Finalize the Hasktorch Library for Hackage Release
- Hi Haddock
- A library for in-memory data analysis in Haskell
- A Binary backend for Postgresql/Persistent
- Haskell Program Analysis using GHC Source Plugins
- Quickchecking web APIs
- New authentication schemes for servant-auth
- Offline mode for Stack
Implementing Automatic Differentiation in Accelerate🔗
Automatic Differentiation (AD) is a technique for automatically calculating the derivatives of numerical functions, and its implementation on massively parallel processors is one of the core technical advances driving the ongoing deep learning revolution. Within the Haskell ecosystem, the
ad library provides a comprehensive and intuitive implementation of automatic differentiation, while
accelerate is a library for parallel array computations which can be executed on GPUs and multicore CPUs. Unfortunately, these libraries cannot be easily combined;
ad works with standard Haskell types, whereas
accelerate is implemented as an embedded language with a distinct type system.
The goal of this project is to implement automatic differentiation on the GPU (and by extension, other supported architectures) using
accelerate. We propose to implement this as a new package, drawing inspiration from the design of the existing
ad package. The success of this project would allow Haskell to stake out territory in the ever expanding field of machine learning, as a language capable of providing both high-performance numerics, as well as the safety of a strong static type system.
Basic sequential automatic differentiation is relatively straightforward to implement (as in, for example, the
ad package) but obtaining code which can be executed in parallel is trickier. Recent work by Fritz Henglein and Gabriele Keller provides a general framework for how to differentiate linear functions; implementing this framework in
accelerate would be the best path towards vectorised automatic differentiation.
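To make the core idea concrete, here is a minimal, self-contained sketch of forward-mode AD using dual numbers in plain Haskell (no `ad` or `accelerate` involved; the project itself would lift this kind of construction into accelerate's embedded language):

```haskell
-- A dual number carries a value together with its derivative.
data Dual = Dual { primal :: Double, tangent :: Double }

instance Num Dual where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')
  Dual x x' - Dual y y' = Dual (x - y) (x' - y')
  Dual x x' * Dual y y' = Dual (x * y) (x' * y + x * y')  -- product rule
  abs (Dual x x')       = Dual (abs x) (x' * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Differentiate a function at a point by seeding the tangent with 1.
diff :: (Dual -> Dual) -> Double -> Double
diff f x = tangent (f (Dual x 1))

main :: IO ()
main = print (diff (\x -> x * x + 3 * x) 2)  -- derivative of x^2 + 3x at 2 is 7
```

Forward mode like this is the "relatively straightforward" part mentioned above; the hard part the project tackles is doing the analogous bookkeeping for whole parallel array computations.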
Future work / possible extensions
The use of BLAS libraries, such as via the bindings provided in accelerate-blas, could further improve performance of the library.
Mentor: Fritz Henglein, Gabriele Keller, Trevor McDonell, Edward Kmett, Sacha Sokoloski
Proof of Concept Support for Multiple Public Libraries in a .cabal package🔗
A common pattern with large scale Haskell projects is to have a large number of tightly-coupled packages that are released in lockstep. One notable example is amazonka; as pointed out in amazonka#4155 every release involves the lockstep release of 89 packages. Here, the tension between the two uses of packages is clearly on display:
1. A package is a unit of code that can be built independently. amazonka is split into lots of small packages instead of one monolithic package so that end-users can pick and choose what code they actually depend on, rather than pulling in one gigantic mega-library as a dependency.
2. A package is the mechanism for distribution, something that is ascribed a version, author, etc. amazonka is a tightly coupled series of libraries with a common author, and so it makes sense that they want to be distributed together.
The concerns of (1) have overridden the concerns of (2): amazonka is split into small packages, which is nice for end-users, but means that the package maintainer needs to upload 89 packages whenever they do a new release.
The way to solve this problem is to split apart (1) and (2) into different units. The package should remain the mechanism for distribution, but a package itself should contain multiple libraries, which are independent units of code that can be built separately.
The goal of this project is to complete a proof-of-concept implementation of multiple public libraries in Cabal and cabal-install. The completion of this feature requires additional work outside the scope of this project, including patching hackage-server and Haddock.
For additional information and discussion, see cabal#4206
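Cabal 2.0 already supports *private* internal libraries via named `library` stanzas; one way to picture this proposal is to allow such stanzas to become publicly depended upon from other packages. The exact syntax below is hypothetical (the design in cabal#4206 was still under discussion):

```
name:    foo
version: 0.1

-- The unnamed stanza stays the default public library of the package.
library
  exposed-modules: Foo
  build-depends:   base

-- A named library; under this proposal, other packages could
-- depend on it directly, e.g. as "foo:bar".
library bar
  exposed-modules: Foo.Bar
  build-depends:   base
```

Under such a scheme, a project like amazonka could ship its 89 libraries as named stanzas of one distributable package with one version and one upload.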
Mentor: Edward Z. Yang
Complete Cabal's Nix-style `cabal new-build`🔗
new-build is a major reworking of how Cabal and cabal-install work internally that unifies the old build commands and sandboxes, and is inspired by concepts from Nix.
new-build significantly improves developer experience by addressing common problems which were attributed to “Cabal Hell”. See also Edward’s blog post and the “Nix-style Local Builds” section of the Cabal manual for an introduction to
new-build and a more detailed explanation.
Last year a lot of progress was made, and
cabal new-build is already gaining popularity, despite being incomplete; we’re quite close, but we’re not there yet!
There is likely too much for a single student to complete in a single summer, so we welcome proposals that include some reasonable subset of the functionality listed below.
In order to reach the major “Cabal 3.0” milestone, which denotes switching over to the
new-build infrastructure as the default (thus finally retiring the old/sandbox commands), the following critical features need to be completed or implemented:
- `cabal new-install` (see cabal#4558 for design and status).
- A `cabal new-clean` command (see cabal#3835 for an early attempt).
- Resolve issues related to and complete
- Fix high priority show-stopper bugs tagged `nix-local-build` in the issue tracker.
Additional nice-to-have stretch goals:
- Support for remote Git-repository dependencies.
- Resolve cyclic dependencies in test/benchmark-suites.
- `cabal outdated` with new-build’s codepaths (see cabal#4831).
- A `cabal new-doctest` counterpart to `cabal doctest`.
Potential Mentors: Edward Z. Yang, Mikhail Glushenkov, Herbert Valerio Riedel
Difficulty: Intermediate to Advanced, depending on the chosen task(s).
Haskell specific tooling for working with nix projects🔗
Nix is becoming more and more widely used as a way to manage package dependencies, despite the approach being quite low-level and difficult to use. There are very few layers of abstraction that isolate less experienced users from the internal workings of the nix machinery.
There are currently three main ways in which people use nix and Haskell together. All of these have different benefits and tradeoffs.
- Using `stack` with nix (somewhat common)
- Using `cabal` with the nix option (very uncommon)
- Using nix directly (the most common)
To take each option in turn:

`stack` only uses nix to manage non-Haskell dependencies. This is clearly not ideal, as we can’t make use of the binary caching or anything else that is great about nix.

The `cabal` nix option is currently quite simple-minded and relies on the presence of an already generated `shell.nix` file. When the option is set, several commands are run in this shell instead of using cabal’s normal dependency management.
The most flexible option is to invoke `cabal2nix` yourself and then manipulate the environment using `nix-shell`, but there are several redundancies in this approach, such as having to regenerate the `shell.nix` file every time your cabal file changes. It is also quite low-level and requires in-depth knowledge of how nix works. We want to abstract away from this.
However, the ideal tool doesn’t yet exist. We want a tool with the following philosophy: nix, you are responsible for provisioning the correct environment, but I will take care of all the important aspects of the build.
The user provides a declarative specification of their necessary environment (by a cabal file or some other means), then when a user runs a command, nix provisions this environment and then the tool runs the haskell specific commands necessary to build the package locally.
As an exemplification of this, using workflow (3), by default invoking `cabal2nix --shell` will generate a nix expression which loads both the build and test dependencies into the environment. It is not unusual for the test dependency tree to be quite a bit larger than the build dependency tree. Ideally, when a user runs `cabal build`, cabal should enter a nix shell with the appropriate build dependencies for building whichever component it wants to build and no more. Similarly, `cabal test` should enter an environment with test dependencies. It is currently possible to achieve this for benchmarking dependencies by the somewhat arcane `nix-shell --argstr doBenchmark true`.
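For context, the manual workflow (3) that such a tool would automate looks roughly like this (a sketch of the common commands, not a complete recipe):

```shell
# Regenerate the nix expression every time the .cabal file changes.
cabal2nix --shell . > shell.nix

# Enter an environment with build *and* test dependencies, then build.
nix-shell --run "cabal build"

# Benchmark dependencies currently need a separate incantation.
nix-shell --argstr doBenchmark true --run "cabal bench"
```

The envisioned tool would hide both the regeneration step and the per-component choice of dependency set behind the ordinary `cabal build`/`cabal test` commands.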
Some more possible angles to explore are:
- In a `cabal.project` file we can specify additional local dependencies. In `--nix` mode, these should turn into overrides of the local package set, and nix should build them.
- There should be an easy way to “pin” a nixpkgs version so that builds are reproducible. This could take the form of specifying a hash of a nixpkgs commit directly, or something more indirect, such as specifying an `lts` version (with an appropriately generated package set), and so on.
- `cabal build --nix -w ghc-8.0.2` should modify the environment to provision the 8.0.2 package set rather than rely on the user to have already installed the compiler locally.
This is more of a framework for a project proposal than a concrete idea, and there are many more angles to explore. A successful proposal will need to flesh out in detail what would be necessary to implement one or perhaps two of these ideas.
Mentors: Matthew Pickering, Will Fancher
CodeWorld Editing and Debugging Tools🔗
CodeWorld is an educational web-based programming environment based on Haskell. There are significant opportunities to make the project easier to use and more successful for students by rethinking editing and debugging tools in a functional setting. The possible feature set is very large, and a significant part of this project would be choosing a set of features to work on. Someone could work on a single ambitious feature, or a collection of smaller features with a cumulative impact.
Specific ideas include:
- Extending the auto-complete interface and adding contextual hints, to offer relevant documentation as the user types code.
- Offering better visual clues in the editor interface. CodeWorld users are typically children, and struggle with nesting and syntax structure. Ideas include color-coding function names with their corresponding arguments, or marking up major syntax to make the structure apparent.
- Extending the debugging features. CodeWorld now offers some unique debugging features that show users how shapes in their output link back to their code. This could be extended to include Elm-style time-traveling debugging features, and other useful extensions.
Mentors: Chris Smith
Difficulty: Varies depending on proposal
A format-preserving YAML library for Haskell🔗
A growing number of (Haskell) applications and libraries rely on YAML as a configuration format. One of the motivating applications that uses YAML a lot is stack.
YAML is meant to be a human-friendly language, so files are often written by hand and contain many comments. However, given the number of YAML files that stack users need to deal with, it would be useful to be able to modify YAML files programmatically from within Haskell.
The existing Haskell yaml library supports parsing and rendering YAML files, but because it uses an efficient C library (libyaml) under the hood, metadata like file layout and comments are not preserved.
Thus, there is a clear niche for a new library that provides a pure Haskell format-preserving YAML parser and renderer library. The focus of this project will be correctness, complying with the YAML specification and providing a clean API. Micro-optimizations are less important, and we don’t expect this parser to be as fast as other YAML parsers, since it needs to do extra bookkeeping for the metadata.
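One way the API of such a library could look (all names hypothetical; the actual design would be part of the project) is a syntax tree that keeps every cosmetic detail, together with a round-trip law:

```haskell
-- Hypothetical API sketch for a format-preserving YAML library.
-- Comments and scalar style are part of the tree, not thrown away.
data Node
  = Scalar   [Comment] Style Text        -- quoting/folding style preserved
  | Mapping  [Comment] [(Node, Node)]    -- key order preserved
  | Sequence [Comment] [Node]

parse  :: ByteString -> Either ParseError Node
render :: Node -> ByteString

-- The defining law of the library: rendering an unmodified parse
-- reproduces the input byte-for-byte.
--   render <$> parse bs == Right bs
```

Correctness would then be testable by round-tripping a large corpus of real-world YAML files (e.g. stack configuration files) through `parse` and `render`.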
Difficulty: Beginner to Intermediate
Mentor: Tom Sydney Kerckhove, Jasper Van der Jeugt
Improve GHC Code Generator🔗
Some simple improvements to GHC’s code generator can make a big difference to performance. For example, a recent change reduced the number of instructions needed to perform floating point abs from approximately 20 to 2.
There are many open tickets for the code generator, at least two of which are suitable for a GSoC student. For example
- Adding SIMD support to x86/x86_64 NCG
- Adding more SIMD primops corresponding to the untapped AVX etc. instructions
Steps for the student:
- Review and curate the tickets related to the code generator
- With the mentors, select those that are achievable in the timescale and that give the biggest bang for buck
- Create PRs for one or more of the selected tickets
- Update the guidance for working on the code generator (which may be out of date)
Some tickets that can be reviewed:
| id | Summary |
|-------|---------|
| 3557 | CPU Vector instructions in GHC.Prim |
| 7741 | Add SIMD support to x86/x86_64 NCG |
| 10648 | Some 64-vector SIMD primitives are absolutely useless |
| 13852 | Can we have more SIMD primops, corresponding to the untapped AVX etc. instructions? |
| 12412 | SIMD things introduce a metric ton of known key things |
| 14251 | LLVM Code Gen messes up registers |
| 4211 | LLVM: Stack alignment on OSX |
| 5567 | LLVM: Improve alias analysis / performance |
| 7297 | LLVM incorrectly hoisting loads |
| 7610 | Cross compilation support for LLVM backend |
| 10010 | LLVM/optimized code for sqrt incorrect for negative values |
| 10074 | Implement the 'Improved LLVM Backend' proposal |
| 10295 | Putting SCC in heavily inlined code results in "error: redefinition of global" |
| 11138 | Kill the terrible LLVM Mangler |
| 11295 | Figure out what LLVM passes are fruitful |
| 11538 | Wrong constants in LL code for big endian targets |
| 12470 | Move LLVM code generator to LLVM bitcode format |
| 12798 | LLVM seeming to over optimize, producing inefficient assembly code... |
| 13045 | LLVM code generation causes segfaults on FreeBSD |
| 13062 | `opt' failed in phase `LLVM Optimiser'. (Exit code: -11) |
| 13724 | Clamping of llvm llc to -O1 and -O2 |
| 14528 | LLVM's CallAnalyzer Breaks |
| 4308 | LLVM compiles Updates.cmm badly |
| 5140 | Fix LLVM backend for PowerPC |
| 14372 | CMM contains a bunch of tail-merging opportunities |
Mentor: Dominic Steinitz (aka idontgetoutmuch aka cinimod), Ben Gamari, Matthew Pickering, Moritz Angermann, Carter Schonwald (SIMD/floating point semantics and api impact focus)
Difficulty: Intermediate - Hard
Add support for deprecating exports🔗
GHC currently supports a pragma for deprecating top-level entities. This includes individual functions, modules, classes or data constructors. However, it does not support deprecating an export from a module.
Adding support for this would allow us to gracefully (i.e., with a deprecation phase) move functions from one module to another. A good example is `Data.List.lines`: a `String`-specific function which clearly belongs in `Data.String` rather than `Data.List`.
The desired syntax would probably end up looking like:
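One possibility (hypothetical syntax; settling the concrete design is part of the project) is a pragma attached to an entry in the export list:

```haskell
module Data.List
  ( {-# DEPRECATED "Use Data.String.lines instead" #-} lines
  , foldl'
  ) where
```

With something like this, importing `lines` from `Data.List` would emit a deprecation warning, while importing it from its new home would stay silent.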
For more background information, see this ticket: https://ghc.haskell.org/trac/ghc/ticket/4879.
Mentor: Ben Gamari
Implement aspects of Dependent Haskell🔗
The design of GHC/Haskell has been steadily marching toward support for dependent types for a long time, debatably starting with the addition of
-XFunctionalDependencies, as proposed in 2000. With the
-XTypeInType extension in GHC 8.0 (2016), we’re as close as ever. However, much more work remains to be done. This Summer of Haskell idea is to slice off a chunk of that work and implement it!
Precisely which chunk(s) are up to you, the proposer of the project. Good starting places if you’re looking for inspiration can be found in one of several related proposals posted recently. Ideas beyond those proposals include merging the parsers for types and terms, as well as to sort out The Namespace Problem (GHC allows declarations like
data T = T. How would that work in a dependently typed language where terms are not syntactically distinct from types?). A student more versed in type theory (e.g., having experience with Types and Programming Languages, among other introductions) might even attempt implementing a dependently typed Core replacement. If you like, you can see Richard Eisenberg’s thesis for inspiration. That thesis aims to describe both the surface language and Core language for Dependent Haskell.
If you can relocate to the Philadelphia, PA, USA, area for the summer, there will be office space you can use, and you’ll be able to work in a space with several other people hacking on GHC. Unfortunately, there is no extra funding to support this relocation. Remote mentorship is also possible, of course.
Mentor: Richard Eisenberg (feel free to email to discuss ideas for your proposal)
Benchmarking graph libraries and optimising algebraic graphs🔗
Graphs are a very important data structure and they are known to be difficult to work with in functional programming languages. Several Haskell libraries currently exist to create and process graphs, each of them using a different graph representation: Data.Graph from containers, fgl, hash-graph and alga.
Due to their differences and the lack of a reference benchmark, it is not easy for a new user to find the best one for their project.
There will be two major tasks in this proposal:
- Develop an automated and fair benchmarking suite for these libraries. The main goal is to help developers to choose easily the library that fits their project. The suite will benchmark (on sparse/dense and weighted/unweighted graphs):
- Graph construction (e.g. from a list of edges).
- Graph deconstruction (e.g. to a list of edges).
- Graph manipulation (add/remove a vertex or an edge).
- Graph lookup (test existence of a vertex or an edge).
- Graph algorithms (reachability, topsort, DFS, BFS, SCC).
The “automated” adjective denotes the ability to automatically update benchmarks when a new version of a graph library is released. The “fair” one is about the community part of the project: the Haskell community should agree that the libraries are used correctly and to their full potential. The student will make an effort to contact library authors and receive their feedback.
The aim is to complete the benchmarking suite before the mid-term evaluation.
- Help improve the alga library. It is a promising and new approach (based on mathematical results about an algebra of graphs), but it lacks some important features, a user-friendly tutorial, and has not yet been optimised for performance. Hence the following subtasks:
- Write a tutorial. Alga is well documented, but a new user can be lost in this new way of thinking about graphs.
- Implement missing algorithms and optimise existing ones on the basis of the developed benchmarking suite. Graph libraries are expected to provide some basic algorithms, but because Alga is so different from conventional graph representations most of these algorithms need to be designed from scratch.
- As a bonus, try to implement edge-labelled graphs. It is a high-risk subtask, because the theory behind it is still being worked out and requires further discussions.
Working with the Haskell community is the core of this project. The student will engage Haskell developers, in particular both users and authors of existing graph libraries, in order to develop a high-quality well-documented benchmarking suite. Concerning Alga, there is a lot to do and again, the input of the community will be essential to decide which algorithms are needed, how to implement them, and receive feedback on the results. Alga is new, but the student will have support from the library author and will share the challenges with the Haskell community in blog posts.
It is hoped that the project will also benefit the entire community: it will help new developers to choose the right library, help current developers of these libraries to focus on specific optimisation goals and missing features, and, finally, make algebraic graphs a real alternative to existing graph libraries.
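As a flavour of the operations to be benchmarked, here is what construction, deconstruction, lookup of vertices, and topological sort look like with `Data.Graph` from containers (each of the four libraries above exposes its own equivalents):

```haskell
import Data.Graph (buildG, edges, topSort, vertices)

main :: IO ()
main = do
  -- Construction: build a directed graph on vertices 0..3 from an edge list.
  let g = buildG (0, 3) [(0, 1), (1, 2), (0, 2), (2, 3)]
  print (vertices g)  -- all vertices: [0,1,2,3]
  print (edges g)     -- deconstruction back to an edge list
  print (topSort g)   -- one valid topological ordering of the vertices
```

A fair suite would express this same workload against fgl, hash-graph and alga, on generated sparse/dense and weighted/unweighted inputs, and time each with a benchmarking harness such as criterion.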
Mentor: Andrey Mokhov
Finish the package candidate workflow for Hackage🔗
Hackage candidate packages currently cannot be used directly, and their UI could be improved. We would like to have new packages be uploaded as candidates by default, to improve the vetting process. But this means polishing off candidate functionality. The main issues left to do are tracked here
The first step is moving the candidate display page to the new templating system and sharing code with the main package page. Following this, we need to implement a new candidate index, able to be provided as a secondary index. This would be a “v1” index, and mutable.
Beyond this we want to extend the docbuilder and docuploads to work with candidates, and then implement a fixed workflow from candidacy to validation and then publishing.
Mentors: Gershom Bazerman, Herbert Valerio Riedel
Make Hackage CDN-aware🔗
We have speed and bandwidth issues with the hackage package repository due to needing to disable the CDN for too many pages. This is because when the CDN is on, it caches things people don’t expect – in particular, things that can be updated due to user action.
There are utility functions in the hackage codebase to teach each page to send proper cache-control headers to keep the CDN from serving stale content. However, they aren’t used carefully and uniformly.
Additionally, the CDN interferes with our ability to collect download statistics.
This would be a two phase project:
- Annotate hackage pages carefully to ensure that the CDN doesn’t cause confusion with regards to updates to pages.
- Design a solution to both allow caching of package downloads and also collect granular statistics. One possibility is to serve downloads via redirects, with the redirect always being hit, and the redirected-to `.tgz` file being cached.
Mentors: Gershom Bazerman, Herbert Valerio Riedel
Help Hadrian🔗
Hadrian is a new build system for the Glasgow Haskell Compiler. It is based on the Shake library and we hope that it will soon replace the current Make-based build system.
There are many issues that need to be addressed before Hadrian can take over. Help Hadrian by solving some of them! Specific issues that you will need to solve as part of your summer project are:
- Although Hadrian can build GHC, the resulting binary does not pass validation. To solve this issue you will need to analyse failing tests and find a way to fix them – in most cases this will be a matter of finding a command line flag that needs to be added to or removed from a GHC build command.
- There is currently no support for binary distribution. You will need to implement the corresponding build rule in Hadrian.
- Help integrate the relocatable GHC branch into master.
Warning: build systems are messy, even those that are written in Haskell. This is not a very glamorous project but it is a very important one: you have a chance to increase the productivity of GHC developers, and hence help the whole community!
Mentor: Andrey Mokhov (feel free to email to discuss the project)
Improvements to Haskell IDE Engine🔗
Haskell IDE Engine is starting to be useful, largely due to the work done in the 2017 HSOC by Zubin Duggal.
But there is still plenty to be done to bring it closer to its potential.
Possible goals for a HIE project:
- Rewriting the completion system
- Complete for module names, GHC pragmas, .cabal files, syntax (if/case/let/where, etc.)
- Smart templates for case splitting,
- Make completions include local definitions (possibly making it scope-aware)
- Making completions scope aware would involve traversing the AST and collecting all the symbols defined until reaching the cursor position
- Fewer bugs
- Case splitting
- Expanding Template Haskell in place
- More extensive testing, testing a full LSP session from beginning to end.
- Making find definition work for symbols defined in dependencies
- Sharing build cache between HaRe and GHC, to drastically improve refactoring speed.
- Implementing support for project-wide/cross file references.
- (ghc-mod) Support for more build types (new-build, hpack, nix, etc.)
- Anything: Haskell tooling could support your favourite IDE feature
- Come up with a way to match the running hie server GHC version with the project GHC version. See https://github.com/haskell/haskell-ide-engine/issues/439
- Add a command to build the project
Some of the above may not be substantial enough to fill up the entire summer. In that case, you may target multiple goals.
Mentors: Alan Zimmerman, Zubin Duggal
Finalize the Hasktorch Library for Hackage Release🔗
There are deep connections between functional programming and machine learning computation. Such links can yield new algorithms, as in some recent papers [1,2], while the field of neural networks can be recast as an emerging model of computation based on differentiable function composition [3,4,5]. Furthermore, as machine learning is deployed in diverse fields ranging from technology to healthcare to finance, ensuring the correctness and reliability of these systems is increasingly critical. Given this intersection of ideas and needs, Haskell’s powerful type system and composition mechanisms have the potential to advance the field in important new directions.
To explore this design space, Hasktorch builds on a tensor-based scientific computing C library that has been undergoing development for over a decade and is the foundation of the PyTorch/Torch neural network libraries. It makes available hundreds of mathematical operations including vectorized linear algebra, GPU computation, non-linear transformations, probability functions and sampling operations.
Initial development has been done to comprehensively bind the core Torch API via code generation, write dependently-typed memory-managed abstractions around core tensor operations, and implement a backpack layer for module-level polymorphism. The goal of this proposal is to finalize the library for its first Hackage release, which includes:
- Migrate any high-level Python code from pytorch and torchvision into Haskell.
- Write interfaces to libraries which abstract automatic differentiation procedures (e.g. the backprop and diffhask libraries).
- Add high-level optimization routines such as stochastic gradient descent and ADAM. This would be a subset of the previous two points.
- Automated testing and benchmarking with continuous integration.
- Write examples and documentation.
1. Brendan Fong, David I. Spivak, Rémy Tuyéras. Backprop as Functor: A compositional perspective on supervised learning
2. Leland McInnes, John Healy. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
3. Chris Olah. https://colah.github.io/posts/2015-09-NN-Types-FP/
4. Yann Lecun. https://www.facebook.com/yann.lecun/posts/10155003011462143
5. Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey
Mentors: Sam Stites, Tim Pierson, Austin Huang
Hi Haddock🔗
…or how to get Haddock docstrings into .hi files
A long-standing issue with Haskell’s documentation tool Haddock is that it needs to effectively re-perform a large part of the parse/template-haskell/typecheck compilation pipeline in order to extract the necessary information from Haskell source for generating rendered Haddock documentation. This makes Haddock generation a costly operation, and makes for a poor developer experience.
An equally long-standing suggestion to address this issue (c.f. “Haddock strings in .hi files” email thread) is to have GHC include enough information in the generated
.hi interface files in order to avoid Haddock having to duplicate that work. This would pave the way for the following use-cases and/or have the following benefits:
- Significantly speed up Haddock generation by avoiding redundant work.
- On-the-fly/lazy after-the-fact Haddock generation in `cabal new-haddock` for already built/installed Cabal library packages.
- Allow downstream tooling like Hoogle or Hayoo! to index documentation right from interface files.
- Simplify Haddock’s code base.
Proposed implementation strategy
This proposal focuses on making the needed changes to GHC’s codebase. The subsequent changes to Haddock are considered future work and are out of scope for this proposal.
- The student would add two new fields to GHC’s `ModIface`, similar to the `ifaceDocMap` and `ifaceArgMap` from Haddock’s interface files.
- Extend `ModGuts` with the documentation for declarations and teach `MkIface` to serialise the collected documentation.
- As a simple way to validate the new ability, teach GHCi’s `:info` (or alternatively add a new `:doc` command) how to read the documentation from loaded interface files (pretty rendering is not necessary at this point; just dump the raw comment strings).
An implementation needs to make sure to load the documentation from the interfaces as lazily as possible, as it might otherwise impose a performance hit in the common case.
Make Haddock use GHC’s interface files to produce documentation and thereby simplify its codebase; also figure out how to speed up Haddock’s
Mentors: Alex Biehl, Herbert Valerio Riedel
A library for in-memory data analysis in Haskell🔗
A typical workflow in interactive data analysis consists of:
- Loading data (e.g. a CSV on disk)
- Transforming the data
- Various data processing stages
- Storing the result in some form (e.g. in a database).

The goal of this project is to provide a unified and idiomatic Haskell way of carrying out these tasks. Informally, you can think of “dplyr”/“tidyr” from the R ecosystem, but type safe.
This project aims to provide a library with the following features:
- An efficient data structure for possibly larger-than-memory tabular data. The Frames library is notable prior work, and this project may build on top of it (namely, by extending its functionality for generating types from stored data).
- A set of functions to “tidy”/clean the data to bring it to a form fit for further analysis, e.g. splitting one column to multiple columns (“spread”) or vice versa (“gather”).
- A DSL for performing a representative set of relational operations e.g. filtering/aggregation.
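A rough flavour of the intended operations, sketched here with ordinary Haskell lists and `Data.Map` (the real library would use an efficient, possibly larger-than-memory columnar representation, and the `Row` type here is a hand-written stand-in for generated types):

```haskell
import qualified Data.Map.Strict as M

-- A toy "row" type; a frame library would derive this from the stored data.
data Row = Row { city :: String, price :: Double }

-- Filtering is a predicate over rows...
expensive :: [Row] -> [Row]
expensive = filter ((> 100) . price)

-- ...and aggregation groups rows by a key and folds each group.
totalByCity :: [Row] -> M.Map String Double
totalByCity rs = M.fromListWith (+) [(city r, price r) | r <- rs]

main :: IO ()
main = print (totalByCity [Row "NY" 50, Row "SF" 120, Row "NY" 70])
```

The type-safe DSL would provide exactly these relational operations (filter, group, aggregate, spread/gather) while catching column-name and column-type mistakes at compile time.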
Mentors: Marco Zocca
A Binary backend for Postgresql/Persistent🔗
The Persistent library provides an abstract interface to a number of different databases, including PostgreSQL. The current PostgreSQL backend for Persistent uses the postgresql-simple library, which uses UTF-8 strings to communicate between the Haskell program and PostgreSQL. Marshalling to and from strings obviously has some performance implications.
It would therefore be nice to have a backend for Persistent that uses PostgreSQL’s binary protocol. There are already two Haskell libraries that use this binary protocol, Hasql and postgresql-binary.
The aim of this project is to write a new PostgreSQL backend for the Persistent library that makes use of this binary protocol, possibly via one of the two existing binary protocol libraries.
The project outline is something along the lines of:
- Investigate all three libraries; Persistent, Hasql and postgresql-binary.
- Decide how this new binary PostgreSQL Persistent backend will operate.
- Implement it.
- Benchmark the new backend against the existing Persistent backend that uses postgresql-simple.
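To illustrate what is at stake, here is a sketch contrasting the two wire encodings for a PostgreSQL `int8` value: the text protocol sends the rendered decimal string, while the binary protocol sends a fixed 8-byte big-endian integer. The function names are hypothetical; they only model the formats, not the postgresql-binary API itself.

```haskell
module Main where

import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Text-protocol representation: the decimal digits, parsed back on the
-- other side. Variable length, and costs a render/parse round trip.
textEncode :: Int -> String
textEncode = show

-- Binary-protocol representation of an int8: exactly 8 bytes,
-- most significant byte first (network byte order).
binaryEncodeInt8 :: Int -> [Word8]
binaryEncodeInt8 n =
  [ fromIntegral (n `shiftR` s .&. 0xff) | s <- [56,48,40,32,24,16,8,0] ]

main :: IO ()
main = do
  print (textEncode 1234567890)        -- "1234567890"
  print (binaryEncodeInt8 1234567890)  -- [0,0,0,0,73,150,2,210]
```

The binary form needs no parsing and has a fixed size, which is where the expected speedup over postgresql-simple's string marshalling comes from; the benchmarking step above would quantify that.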
Mentors: Erik de Castro Lopo, Nikita Volkov, Maximilian Tagher
Haskell Program Analysis using GHC Source Plugins🔗
Performing any kind of static analysis on Haskell programs has traditionally been very difficult. The main problem has been that in order to load a Haskell module, you need a lot of additional information, such as where its dependencies are located, which preprocessors to run, and so on. Syntactic analysis is somewhat possible, but semantic analysis has been out of reach.
In recent work, @nboldi extended the plugin interface to allow users to modify and inspect the compiler’s AST as the program is compiled. This has the advantage that it can be integrated easily into any existing build system, and the desired information is computed as the program is compiled.
The plugin architecture is very powerful, but this project will focus on using the API to extract information rather than to modify the source program.
Some potential avenues of analyses include:
- A plugin which analyses a project with hs-boot files and identifies ways to reduce their size and number.
- A plugin which computes minimal exports
- A plugin which computes unused functions in an application across modules.
- A plugin which computes statistics about source code, for example which counts how many times certain language features are used.
- Analysis of core programs using the existing core2core plugins. For example, looking at the calculated sizes of expressions to help visualise core output.
- Integrate plugin usage with nix so that users can easily specify, in a declarative manner, which plugins they want to run.
A side-effect of this project will be a refinement of the plugins API and documentation about how other users can use plugins effectively.
A successful proposal will identify one potential application of a plugin and explain why using a plugin would be beneficial for that application. It would also be useful to consider the challenges of alternative approaches, such as using the GHC API.
A successful project will implement at least one of these analysis ideas and document the process, in order to advertise the plugin architecture to other users.
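As a toy model of one of the analyses above, here is the core computation behind a “minimal exports” check: given what a module exports and which of those names other modules actually import, the unused exports are the candidates for removal. A real plugin would collect these name sets from GHC’s ASTs during compilation; plain strings stand in for them here, and all function names are hypothetical.

```haskell
module Main where

import Data.List ((\\))

-- Split a module's export list into names that are actually used by
-- other modules and names that could be dropped from the export list.
minimalExports :: [String]               -- names the module exports
               -> [String]               -- names imported elsewhere
               -> ([String], [String])   -- (keep, drop)
minimalExports exported used =
  ( filter (`elem` used) exported
  , exported \\ used )

main :: IO ()
main = do
  let (keep, unused) = minimalExports ["parse", "render", "debugDump"]
                                      ["parse", "render"]
  print keep    -- ["parse","render"]
  print unused  -- ["debugDump"]
```

The hard part the plugin solves is not this set arithmetic but obtaining the name sets reliably, which is exactly what running inside the compiler makes cheap.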
Mentors: Boldizsár Németh, Oliver Charles
Quickchecking web APIs🔗
When writing web applications, there are a number of things one has to keep in mind, independent of the domain of the application, in order to do the job well. For example, that “Accept” headers are honored; that HTML has a doctype; that no endpoint takes too long to respond. Learning about these best practices, remembering to implement them, and testing for them is currently very time-consuming, since every developer must do it anew for every application.
This proposal is to develop a tool to check that any application, if appropriately described, satisfies conditions such as these. ‘servant-quickcheck’ already does this for ‘servant’ applications, but this project would extend its reach to applications that have an OpenAPI (Swagger) description (and perhaps other kinds of description as well), and that need not even be written in Haskell. (However, since more detailed customization, and the definition of new conditions or predicates, would happen in Haskell, this tool might serve, like XMonad, as an introduction to Haskell for many people.) Some related work in this space already exists: servant-swagger generates Swagger definitions from servant types (i.e., the opposite direction), and Masahiro Yamauchi has added servant type generation to the Swagger tool, though that is a large Java project that would be hard to distribute, and it would not allow for developing annotations to Swagger descriptions that have meaning specific to this project.
Beyond the development of translation from Swagger definitions to ‘servant’ types and of the executable, the project may include defining new common predicates.
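To make “predicates” concrete, here is a minimal sketch of the kind of check such a tool would run against every endpoint. `Response`, `hasDoctype` and friends are stand-ins invented for this sketch (servant-quickcheck has its own predicate machinery; none of these names come from it).

```haskell
module Main where

import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)

-- A simplified stand-in for an HTTP response.
data Response = Response
  { status  :: Int
  , headers :: [(String, String)]
  , body    :: String
  }

type Predicate = Response -> Bool

-- Best practice: HTML responses should start with a doctype.
hasDoctype :: Predicate
hasDoctype r =
  case lookup "Content-Type" (headers r) of
    Just ct | "text/html" `isPrefixOf` ct ->
      "<!doctype" `isPrefixOf` map toLower (dropWhile isSpace (body r))
    _ -> True  -- not an HTML response, so the predicate holds vacuously

-- Best practice: no probed endpoint returns a server error.
not500 :: Predicate
not500 r = status r < 500

-- Run every predicate against one observed response.
checkAll :: [Predicate] -> Response -> Bool
checkAll ps r = all ($ r) ps

main :: IO ()
main = do
  let good = Response 200 [("Content-Type","text/html")] "<!DOCTYPE html>"
      bad  = Response 200 [("Content-Type","text/html")] "<html>"
  print (checkAll [hasDoctype, not500] good)  -- True
  print (checkAll [hasDoctype, not500] bad)   -- False
```

The project's tool would generate the requests from a Swagger description and run a library of such predicates over the real responses, rather than over hand-built values as here.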
Some relevant background includes:
Mentors: Julian K. Arni
New authentication schemes for servant-auth🔗
The servant-auth packages are a relatively young attempt at providing a definitive answer to the authentication needs of protected web applications built with servant, a fairly popular set of libraries for serving web applications, querying them, and more. The servant-auth packages already have an infrastructure general enough to support just about any authentication scheme one might be interested in, but at the moment only offer JSON Web Tokens (JWT) and basic authentication out of the box.
One interesting project that could have quite a significant impact would be to dedicate an entire summer to implementing a few other essential authentication schemes (OAuth is a good example, but would not fill an entire summer by itself). A decent starting point for figuring out what already exists (in other packages or ecosystems) is this hackage search for the existing servant authentication solutions, and perhaps this article for a list of common authentication schemes.
Besides OAuth, we do not have a fixed list of authentication schemes to consider, so prospective students should feel free to talk to us and the Haskell community, and do a bit of research, in order to come up with a list of authentication schemes they would like to implement during the summer.
The end goal of this project is to have 2-3 new authentication schemes (or more, of course) implemented in servant-auth, with reasonable haddocks and some tests. The student could optionally also (co-)author cookbook recipes illustrating how to serve or query APIs protected by the newly supported authentication schemes, making their work easily discoverable by current and future servant users. All in all, by the end of the summer, servant-auth would no longer be just an attempt at solving the authentication problem with servant; it would finally be the definitive solution.
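At its core, implementing a new scheme boils down to extracting credentials from a request and validating them, which servant-auth then wires into routing. The following toy sketch shows that core for a bearer-token-style scheme; `User`, `authCheck` and the token lookup are hypothetical names invented here, not the servant-auth API.

```haskell
module Main where

import Data.List (stripPrefix)

newtype User = User String deriving (Eq, Show)

-- Validate a raw Authorization header value for a bearer-token scheme,
-- given some way to resolve a token to a user (e.g. a database lookup).
authCheck :: (String -> Maybe User)  -- token lookup
          -> String                  -- Authorization header value
          -> Maybe User
authCheck lookupTok header =
  stripPrefix "Bearer " header >>= lookupTok

main :: IO ()
main = do
  let lookupTok t = if t == "s3cret" then Just (User "alice") else Nothing
  print (authCheck lookupTok "Bearer s3cret")  -- Just (User "alice")
  print (authCheck lookupTok "Basic dXNlcg==") -- Nothing: wrong scheme
```

A real scheme added to servant-auth would additionally handle failure modes (missing vs. malformed vs. invalid credentials) and integrate with servant's type-level API descriptions.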
Mentors: Alp Mestanogullari (co-mentors: Julian Arni & Oleg Grenrus)
Offline mode for Stack🔗
Stack is a tool for installing and developing Haskell applications and libraries. It currently requires internet access to operate well.
There are various hacks and projects that attempt to circumvent this problem, e.g.:
However, the Stack maintainers are now interested in supporting this as a first-class feature. This has become more important since offline operation is required in some corporate settings – and it’s also just useful if you’re writing Haskell on an airplane or on a train!
This work will require touching many different parts of the Stack codebase but it will not require a deep understanding of its internals.
Mentors: Emanuel Borsboom, Stack Contributors
Difficulty: Beginner to Intermediate