Summer of Haskell

GSoC 2020 Ideas

For project maintainers

This is a list of ideas for students who are considering applying to Google Summer of Code 2020 for Haskell.org. You can contribute ideas by sending a pull request to our GitHub repository. If you just want to discuss a possible idea, please contact us.

For students

Please be aware that:

Table of Contents

  1. Finishing SIMD support for GHC's native backend
  2. Add primops to expand the (boxed) array API.
  3. Documentation generator for the Dhall configuration language
  4. Faster factorization algorithms
  5. Build-integration and Badges for Hackage
  6. Finish the package candidate workflow for Hackage
  7. Hasktorch library for neural networks and tensor math
  8. Lua interface to the http-client library
  9. "New heads for Hydra": enhancing the framework for real world apps
  10. OpenTelemetry support for Haskell
  11. Property-based testing stateful programs using QuickCheck
  12. Interactive reports for smos
  13. Update stylish-haskell to use ghc-lib-parser

Finishing SIMD support for GHC's native backend

Motivation: SIMD is highly valued for fast processing of large datasets. GHC currently supports SIMD only when using the LLVM backend.

There is an unfinished patch adding SIMD support to GHC's native backend. However, the patch currently breaks because the register allocator does not support vector variables properly.

This project would consist of fixing the register allocator, which should be the last big hurdle for SIMD support, and then making sure SIMD makes it over the finish line.

Potential Mentors:

Difficulty: Hard

Add primops to expand the (boxed) array API.

Arrays are the bedrock on which popular data structures like Vector are implemented. However, the API provided by GHC is quite limited. This project would expand the API to fill some of these gaps.

All examples are given for Array# but apply to SmallArray# as well.

Create boxed arrays from existing ones

Motivation: Creating (Small)Array#s efficiently in GHC Haskell stands in tension with the garbage collector, as we must never encounter uninitialised slots during a collection. Thus, for safety, array creation primitives have to conservatively initialise array slots at the cost of performance.

Currently, to make a new Array# zs out of existing Array#s xs and ys, we have to call the following operations (analogously for SmallArray#):

  1. let zs = newArray# ...
  2. copyArray# xs ... zs ...
  3. copyArray# ys ... zs ...
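
For concreteness, the three steps above might look like the following with the primops that GHC exposes through GHC.Exts today. This is only a sketch: the Arr wrapper and the helper names concatArr, arrFromList, arrToList and fillFrom are made up for this example, and using an error thunk as the placeholder is just one way to satisfy the mandatory initial value of newArray#.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
  ( Array#, Int (I#), Int#, MutableArray#, State#
  , copyArray#, indexArray#, newArray#, sizeofArray#
  , unsafeFreezeArray#, writeArray#, (+#) )
import GHC.ST (ST (..), runST)

-- Lifted wrapper so that an unlifted Array# can be passed around like any value.
data Arr a = Arr (Array# a)

-- Concatenate with the existing primops: one allocation followed by two copies.
concatArr :: Arr a -> Arr a -> Arr a
concatArr (Arr xs) (Arr ys) =
  case sizeofArray# xs of
    nx -> case sizeofArray# ys of
      ny -> runST (ST (\s0 ->
        -- Step 1: newArray# writes the placeholder into every slot.
        case newArray# (nx +# ny) (error "concatArr: slot never written") s0 of
          (# s1, zs #) ->
            -- Steps 2 and 3: every slot is overwritten a second time.
            case copyArray# xs 0# zs 0# nx s1 of
              s2 -> case copyArray# ys 0# zs nx ny s2 of
                s3 -> case unsafeFreezeArray# zs s3 of
                  (# s4, frozen #) -> (# s4, Arr frozen #)))

-- Helper for experiments: build an array from a list.
arrFromList :: [a] -> Arr a
arrFromList elems =
  case length elems of
    I# n -> runST (ST (\s0 ->
      case newArray# n (error "arrFromList: slot never written") s0 of
        (# s1, marr #) -> case fillFrom marr 0# elems s1 of
          s2 -> case unsafeFreezeArray# marr s2 of
            (# s3, frozen #) -> (# s3, Arr frozen #)))

-- Write the list elements into consecutive slots, starting at index i.
fillFrom :: MutableArray# s a -> Int# -> [a] -> State# s -> State# s
fillFrom _ _ [] s = s
fillFrom marr i (y : ys) s =
  case writeArray# marr i y s of
    s' -> fillFrom marr (i +# 1#) ys s'

-- Helper for experiments: read all elements back into a list.
arrToList :: Arr a -> [a]
arrToList (Arr arr) = go 0
  where
    len = I# (sizeofArray# arr)
    go i@(I# i#)
      | i >= len = []
      | otherwise = case indexArray# arr i# of
          (# x #) -> x : go (i + 1)
```

In GHCi, arrToList (concatArr (arrFromList "ab") (arrFromList "cd")) evaluates to "abcd". The placeholder write in step 1 is the redundant work that a dedicated concatenation primop could skip.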

Proposed idea: We propose to add a new array primitive that allows copying existing arrays into a new array while bypassing any unnecessary initialisation step.

  1. let zs = concatArray#s xs ys

Provide an API for modifying array sizes

Motivation: Creating a new array from a subset of the elements of an array currently requires us to first create a new array and then copy the elements over.

Proposed idea: Provide sliceArray# and growArray# primops which combine the copy and initialization step.

These could be used for example in the implementation of grow from the vector package.

Create boxed arrays from known elements.

Motivation: Currently, creating an array of fully known contents consists of two steps: creating an array initialized with default values, and then filling in the actual contents.

Proposed idea: Provide array literals which allow giving the size and contents of an array as a single construct, e.g. arrayFrom# (# a, b, c #).

This would allow us to eliminate the initialization with default values completely.
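
As a point of comparison, here is a minimal sketch of the two-step construction with the primops available today. The Arr wrapper and the names arrayOf3 and indexArr are hypothetical helpers for this example only; they are not part of GHC or of the proposal.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
  ( Array#, Int (I#), indexArray#, newArray#, unsafeFreezeArray#, writeArray# )
import GHC.ST (ST (..), runST)

-- Lifted wrapper around the unlifted Array#.
data Arr a = Arr (Array# a)

-- Build a three-element array from known contents.
arrayOf3 :: a -> a -> a -> Arr a
arrayOf3 a b c = runST (ST (\s0 ->
  -- Step 1: allocate; all three slots receive the placeholder.
  case newArray# 3# (error "arrayOf3: slot never written") s0 of
    (# s1, marr #) ->
      -- Step 2: overwrite each slot with its real value.
      case writeArray# marr 0# a s1 of
        s2 -> case writeArray# marr 1# b s2 of
          s3 -> case writeArray# marr 2# c s3 of
            s4 -> case unsafeFreezeArray# marr s4 of
              (# s5, frozen #) -> (# s5, Arr frozen #)))

-- Read one element. No bounds check is performed in this sketch.
indexArr :: Arr a -> Int -> a
indexArr (Arr arr) (I# i) =
  case indexArray# arr i of
    (# x #) -> x
```

With these definitions, map (indexArr (arrayOf3 'a' 'b' 'c')) [0, 1, 2] gives "abc". An array literal would make step 1's placeholder writes unnecessary.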

Potential Mentors:

Difficulty: Advanced

Documentation generator for the Dhall configuration language

The Dhall configuration language is a programmable configuration language designed to balance ease of maintenance with general-purpose programming language features. The Dhall language has multiple independent implementations, each of which binds to a different host programming language, similar to how JSON or YAML can be read into multiple programming languages. However, a large number of supporting tools are built on top of the Haskell implementation, mainly because that was the first Dhall implementation.

One supporting tool of interest is a documentation generator. Up until now, Dhall packages have been mostly hosted on GitHub/GitLab and documentation consists of inline comments within source files, such as this one:

Many users have requested a more polished solution for generating documentation from these commented source files, analogous to Haskell’s haddock tool (a documentation generator for Haskell), which they can then host (as HTML) or include within their Dhall projects (as Markdown files checked into version control). There have even been some nascent attempts to implement this, such as:

To that end, the goal of this project is to implement a command line documentation generator whose input is a directory tree containing a Dhall package and whose output is documentation in either Markdown or HTML form. The scope of this project does not include hosting documentation on behalf of users. In other words, this project will only build a Dhall analog of haddock and will not attempt to build a Dhall analog of Hackage.
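
To give a feel for the core of such a tool, the sketch below renders the leading comment of a single Dhall source file as Markdown. It rests on loud assumptions: it only recognises a run of `--` line comments at the very top of the file (block comments are ignored), it works on a String rather than walking a directory tree or using the Dhall parser, and the function names commentText, headerDocs and renderMarkdown are invented for this illustration.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- Text of a `--` line comment, or Nothing if the line is not a line comment.
commentText :: String -> Maybe String
commentText line =
  dropWhile isSpace <$> stripPrefix "--" (dropWhile isSpace line)

-- The run of line comments at the top of a source file.
headerDocs :: String -> [String]
headerDocs = go . lines
  where
    go (l : ls) | Just t <- commentText l = t : go ls
    go _ = []

-- One Markdown section per file: the path as a heading, then the comment text.
renderMarkdown :: FilePath -> String -> String
renderMarkdown path src = unlines (("# " ++ path) : "" : body)
  where
    docs = headerDocs src
    body = if null docs then ["_No documentation._"] else docs
```

For example, renderMarkdown "Bool/not.dhall" applied to a file that starts with the comment line "-- Flip a Bool." produces a heading line "# Bool/not.dhall", a blank line, and then "Flip a Bool.". A real generator would build on the Dhall parser instead of string matching.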

This project should be appropriate for a beginning Haskell programmer with web development experience. The amount of Haskell code required to write the first draft of the project should be small and there will be many opportunities within the project to exercise web development skills to improve the visual appeal, user experience, and ease of comprehension of the generated documentation.

The project scope can also be extended depending on how things progress by adding features common to other documentation generators, such as:

Potential Mentors: Gabriel Gonzalez, Simon Jakobi, Profpatsch

Difficulty: Beginner

Faster factorization algorithms

There is a growing (and coming-of-age) ecosystem of Haskell packages for cryptography, as witnessed by the increasing number of blockchain and zero-knowledge protocols. This project aims to fill one of the remaining gaps: state-of-the-art algorithms for integer factorization.

The most advanced existing Haskell implementations still use integer factorization over elliptic curves (example). But there is a modern family of vastly superior and faster methods of factorization: number field sieves. The goal is to implement them as a separate Haskell library or as part of the arithmoi package.

To reach the goal, the candidate could implement the quadratic sieve, achieving decent performance characteristics. If there is still time left, we will proceed with the general number field sieve.
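
As a rough illustration of the principle that sieve methods build on, the sketch below uses Fermat's difference-of-squares method: it searches for x such that x² − n is a perfect square y², which gives n = (x − y)(x + y). This is not the quadratic sieve and is not taken from arithmoi; the quadratic sieve finds congruences x² ≡ y² (mod n) in a far more efficient way. The names isqrt and fermatFactor are made up for this example.

```haskell
-- Floor of the square root, by Newton's iteration on Integers.
isqrt :: Integer -> Integer
isqrt 0 = 0
isqrt n = go n
  where
    go x =
      let y = (x + n `div` x) `div` 2
      in if y >= x then x else go y

-- Fermat's method for odd n >= 3. Returns factors (p, q) with p * q == n.
-- For a prime n the only answer found is the trivial one, (1, n).
fermatFactor :: Integer -> Maybe (Integer, Integer)
fermatFactor n
  | n < 3 || even n = Nothing -- out of scope for this sketch
  | otherwise = go start
  where
    r = isqrt n
    -- Smallest x with x * x >= n.
    start = if r * r == n then r else r + 1
    -- Terminates for odd n, at the latest when x reaches (n + 1) `div` 2.
    go x
      | y * y == d = Just (x - y, x + y)
      | otherwise = go (x + 1)
      where
        d = x * x - n
        y = isqrt d
```

For instance, fermatFactor 5959 returns Just (59, 101), found at x = 80 because 80² − 5959 = 441 = 21². The method is only fast when the two factors are close together, which is one reason sieve methods are preferred for general inputs.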

The candidate should have a basic knowledge of linear algebra and number theory and be willing to learn more. This project may be a good fit for students with a strong mathematical background, but little practice in Haskell, because it is self-contained and involves neither scary types nor arcane interfaces. We nevertheless expect a decent understanding of Haskell 2010 (standard types and classes, folds, monads) and an acquaintance with core libraries, e.g., containers and vector.

Mentors: Andrew Lelechenko, Sergey Vinokurov.

Difficulty: Intermediate.

Build-integration and Badges for Hackage

People commonly add “shields” (or “badges”) to their project landing pages to signal how well their project is maintained. Hackage even supports such shields so that Haskell project maintainers can keep their project’s status page up-to-date with their latest package upload.

However, the amount of information that Hackage makes available this way is very limited: currently Hackage only reports whether or not documentation was built successfully and that’s about it.

Imagine if a package author could proudly display the latest GHC version that they successfully build against! Imagine if you, a package user, could quickly discern a well-maintained package when it reports 100% documentation coverage.

Hackage could potentially report this information, and more! Hackage is perfectly positioned to monitor test suites, benchmarks, code coverage, and any other measure of package quality. All that’s missing is for you to make this information available.

This project is ideal for Haskell programmers of all experience levels who are interested in gaining experience in server-side web development. The project difficulty can adapt to the student’s proficiency from beginner (make a few API changes to expose more information) to intermediate (add new quality checks to Hackage) to experienced (make architectural changes to parallelize Hackage builds). You can also exercise front-end programming skills if you are interested in full-stack work by extending the Hackage user interface to report new information collected in this way.

This project gives you the opportunity to make highly recognizable changes to core Haskell infrastructure in ways that benefit the entire community.

Potential Mentors: Gershom Bazerman, Herbert Valerio Riedel

Difficulty: Beginner

Finish the package candidate workflow for Hackage

Hackage candidate packages currently cannot be used directly, and their UI could be improved. We would like to have new packages be uploaded as candidates by default, to improve the vetting process. But this means polishing off candidate functionality. The main issues left to do are tracked here.

The first step is moving the candidate display page to the new templating system and sharing code with the main package page. Following this, we need to implement a new candidate index, able to be provided as a secondary index. This would be a “v1” index, and mutable.

Beyond this we want to extend the docbuilder and docuploads to work with candidates, and then implement a fixed workflow from candidacy to validation and then publishing.

Mentors: Gershom Bazerman, Herbert Valerio Riedel

Difficulty: Intermediate

Hasktorch library for neural networks and tensor math

Hasktorch is a library for neural networks and tensor math in Haskell leveraging the C++ backend of PyTorch for fast numerical computation with GPU support. Our goal with Hasktorch is to provide a platform for machine learning using typed functional programming.

As a summer project, there are a number of potential areas to contribute:

Write an in-depth tutorial series

Port a tutorial series on neural networks such as tensor2tensor or fastai. Alongside code, write detailed comments/notes, and add to, improve, and debug the library when needed.

Relevant Links:

PyTorch Interop

Implement saving/loading serialized representations of models so that models can be transferred between PyTorch and Hasktorch.

Relevant Links:

Expand ecosystem around higher-level functionality

Implement/improve modules for data loading, vision/text libraries, model visualization and interpretability, probabilistic modeling, etc.

Relevant Links:

Contribute to Foundational Low-level Code

Help refine and debug the foundational C++ foreign-function interface and code generation implementation. Improve resource management and help with migrations that track upstream libtorch/PyTorch releases.

Relevant Links:

Potential Mentors:

Difficulty: Intermediate

Lua interface to the http-client library

The HsLua library allows embedding an interpreter for the Lua programming language into programs written in Haskell. One example is pandoc, which uses Lua as an extension language, allowing users to author custom writers or to modify pandoc’s internal document representation.

HsLua allows exposing Haskell functions to Lua scripts, thereby enabling users to access functionality otherwise hidden in a program’s internals. Lua bindings to http-client, a popular, easy to use, and powerful Haskell HTTP library, are currently lacking. Such bindings could give pandoc users great additional power without the need for external C Lua libraries.

The candidate, who should be familiar with Haskell (Lua knowledge is an optional bonus), could

  1. choose Haskell functions which would be most useful to Lua users;

  2. write bindings for these functions, as well as tests for those bindings;

  3. publish the bindings as a library on Hackage and Stackage.

If time permits, the new library could be included in pandoc.

Mentors: Albert Krewinkel

Difficulty: Intermediate to advanced.

"New heads for Hydra": enhancing the framework for real world apps

Hydra showcase project

The Hydra project aims to be a showcase for software design practices, application architectures, and best practices and patterns for building big, complex applications in Haskell. This framework is a full-fledged solution for creating web services, console applications and standalone applications. Applications may use several subsystems out of the box: SQL DB (3 different SQL backends with beam), KV DB (2 different KV DBs), STM-like multithreading and concurrency, logging and many other features.

Hydra is a framework which currently provides three independent engines with the same functionality. This allows you to compare three different approaches in terms of their impact on application architecture, usability, performance, code structure and design decisions.

The following independent engines are implemented:

The Hydra project also provides several demo applications and a testing framework.

The Hydra project is a showcase project for the book “Functional Design and Architecture”.

Tasks

The framework can be improved in many ways.

This project should be appropriate for Haskell programmers who are willing to learn best practices in Haskell and who want to contribute to the community by creating showcases, demo applications and better documentation.

These techniques are used in real production with great success. Using the ideas from the book and showcase projects, the following frameworks have been implemented at Juspay and Enecuum:

The implementors will therefore gain practical knowledge that is useful in their careers. They will also be mentioned on the official web page of the book.

Mentor: Alexander Granin

Difficulty: Beginner, Intermediate, Advanced

OpenTelemetry support for Haskell

What this project is about

OpenTelemetry is a set of APIs and protocols for instrumenting code, gathering traces and metrics produced by running that code and analyzing all that data.

Despite being targeted primarily at distributed systems, it can also be useful for non-networked single-machine applications like CLI tools and GUI applications.
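
To make the idea of instrumentation concrete, here is a minimal sketch of span-based tracing using only base. It is not the API of the OpenTelemetry Haskell library or of the OpenTelemetry specification: the types Span and Tracer and the functions newTracer, withSpan and finishedSpans are hypothetical, and spans are merely collected in memory instead of being exported to a backend.

```haskell
import Control.Exception (finally)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)

-- A finished span: a name plus start and end times in nanoseconds.
data Span = Span
  { spanName :: String
  , spanStart :: Word64
  , spanEnd :: Word64
  } deriving (Show)

-- In-memory collector, standing in for a real exporter.
newtype Tracer = Tracer (IORef [Span])

newTracer :: IO Tracer
newTracer = Tracer <$> newIORef []

-- Run an action inside a named span. The span is recorded even if the
-- action throws an exception.
withSpan :: Tracer -> String -> IO a -> IO a
withSpan (Tracer ref) name action = do
  start <- getMonotonicTimeNSec
  action `finally` recordEnd start
  where
    recordEnd start = do
      end <- getMonotonicTimeNSec
      modifyIORef' ref (Span name start end :)

-- Spans in the order in which they finished.
finishedSpans :: Tracer -> IO [Span]
finishedSpans (Tracer ref) = reverse <$> readIORef ref
```

Wrapping a nested computation such as withSpan tracer "build" (withSpan tracer "compile" work) records the inner "compile" span first and the enclosing "build" span second, which is the raw material a trace viewer turns into a timeline.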

Why Haskell should support OpenTelemetry

Supporting a language-agnostic format like OpenTelemetry is important for Haskell because the profiling story is rather immature compared to other languages:

An instrumentation-based approach makes profiling data available without recompiling your application and its dependencies.

Already existing tools like Jaeger and LightStep can be used to visualize and explore the telemetry data.

The current state of the library

The current implementation of OpenTelemetry for Haskell is in its infancy and needs contributions to cover all of the OpenTelemetry API and to support all export targets.

Example screenshot showing a trace of stack build executed in an already built project

Possible tasks for GSoC

Who benefits from this project

Mentors:

Difficulty: Any

Property-based testing stateful programs using QuickCheck

When the first version of QuickCheck was released for Haskell, it was the state of the art in testing. Today, however, it lags behind, for example, Erlang’s PropEr and eqc libraries. The quickcheck-state-machine library is an attempt to add state machine modelling to Haskell’s QuickCheck for testing stateful/monadic code, and thereby catch up with the Erlang versions of QuickCheck.

This proposal is about using, and possibly extending, quickcheck-state-machine in order to improve the quality of Haskell code in general and for a specific project in particular.

The intermediate candidate could:

  1. Find a commonly used and stateful Haskell library or application to test. This can also be a toy library or application from a commonly used Haskell resource (e.g. a tutorial, book or blog post);

  2. Write a state machine model, for said library or application, together with at least a sequential property, and possibly a parallel property as well;

Getting this far would already reach the goal, but if there’s enough time the candidate could in addition to the above also try to do one of the following items:

    1. Add fault injection to the model, and thereby test the robustness of the code;
    2. Turn the state machine model into a mock, like described here, and implement and test a library or application that depends on the original library or application using the mock.

The advanced candidate could additionally try to do one of the following items:

    1. Combine fault injection with parallel testing and thereby achieve Jepsen-like tests;
    2. Use the gained experience and try to improve the quickcheck-state-machine library itself.
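
The idea of a state machine model with a sequential property, as mentioned in the intermediate goals above, can be sketched with base alone. The example does not use the QuickCheck or quickcheck-state-machine APIs: instead of generating random command sequences it enumerates all sequences up to a given length, and the counter system, the Cmd type and every function name are hypothetical.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Commands of a toy counter API.
data Cmd = Incr | Reset | Get
  deriving (Show, Eq)

-- Pure model: the state is the expected counter value. Each step returns
-- the next model state and the response the real system should give.
stepModel :: Int -> Cmd -> (Int, Maybe Int)
stepModel m Incr = (m + 1, Nothing)
stepModel _ Reset = (0, Nothing)
stepModel m Get = (m, Just m)

-- A system under test executes one command against mutable state.
type Step = IORef Int -> Cmd -> IO (Maybe Int)

goodStep :: Step
goodStep ref Incr = modifyIORef' ref (+ 1) >> pure Nothing
goodStep ref Reset = writeIORef ref 0 >> pure Nothing
goodStep ref Get = Just <$> readIORef ref

-- Deliberately broken variant: Reset sets the counter to 1.
buggyStep :: Step
buggyStep ref Reset = writeIORef ref 1 >> pure Nothing
buggyStep ref cmd = goodStep ref cmd

-- Sequential property: on a fresh system, every response matches the model.
agrees :: Step -> [Cmd] -> IO Bool
agrees step cmds = newIORef 0 >>= \ref -> go ref 0 cmds
  where
    go _ _ [] = pure True
    go ref m (c : cs) = do
      actual <- step ref c
      let (m', expected) = stepModel m c
      if actual == expected then go ref m' cs else pure False

-- All command sequences of length 0 to n, shortest first.
programs :: Int -> [[Cmd]]
programs n = concatMap (\k -> sequence (replicate k [Incr, Reset, Get])) [0 .. n]

-- First command sequence on which system and model disagree, if any.
findCounterexample :: Step -> Int -> IO (Maybe [Cmd])
findCounterexample step n = go (programs n)
  where
    go [] = pure Nothing
    go (p : ps) = do
      ok <- agrees step p
      if ok then go ps else pure (Just p)
```

Here findCounterexample goodStep 4 yields Nothing, while findCounterexample buggyStep 4 yields Just [Reset,Get]. Because shorter sequences are tried first, the reported counterexample is already minimal; a QuickCheck-based setup would get a similar effect through random generation plus shrinking.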

Mentors: Stevan Andjelkovic

Difficulty: Intermediate to advanced

Interactive reports for smos

Smos has a bunch of nice reports via smos-query but they are not interactive. There is one interactive report within smos: the next-action report. It would be nicer for users if they could stay within the editor to browse through reports.

This proposal is about building interactive smos reports within the editor itself. It involves a lot of pure Haskell code and plenty of testing, with immediate visual feedback.

The intermediate candidate should be able to:

  1. Make an interactive report for each of the missing interactive reports:
    • smos-query entry
    • smos-query waiting
    • smos-query report
    • smos-query clock
    • smos-query projects
    • smos-query log
  2. Make a nice TUI experience for filters, time periods, time blocks, and other options that are usually passed on the command line.

Getting this far would already reach the goal, but if there’s enough time the candidate could in addition to the above also try to do one of the following:

  1. The smos-scheduler: A way of scheduling recurring projects such that they are put into place at the right time with the right template substitution. This work will involve designing a templating language.

  2. An interactive weekly review experience as part of the editor. The weekly review is currently something that a user has to do manually as part of their own checklist. It would be nice to make that a guided experience.

Mentors: Tom Sydney Kerckhove

Difficulty: Intermediate

Update stylish-haskell to use ghc-lib-parser

stylish-haskell is a Haskell prettifier that currently uses haskell-src-exts as its workhorse to parse Haskell code into an AST. Unfortunately, haskell-src-exts is not actively maintained at this point and it cannot keep up with the de facto standard compiler, GHC.

ghc-lib-parser is a newer parsing library that packages the parser from GHC itself, and as such, is always up to date.

The main objective for the summer would be to do the port to ghc-lib-parser, but a strong student should be able to fit in other improvements as well.

Mentors: Jasper Van der Jeugt, Łukasz Gołębiewski, Pawel Szulc.

Difficulty: Beginner to intermediate.