GSoC 2020 Ideas
For project maintainers
This is a list of ideas for students who are considering applying to Google Summer of Code 2020 for Haskell.org. You can contribute ideas by sending a pull request to our GitHub repository. If you just want to discuss a possible idea, please contact us.
Please be aware that:
- This is not an all-inclusive list, so you can apply for projects not in this list and we will try our best to match you with a mentor.
- You can apply for as many ideas as you want (but only one can be accepted).
- Some general tips on writing a proposal are discussed here.
Table of Contents
- Finishing SIMD support for GHC's native backend
- Add primops to expand the (boxed) array API.
- Documentation generator for the Dhall configuration language
- Faster factorization algorithms
- Build-integration and Badges for Hackage
- Finish the package candidate workflow for Hackage
- Hasktorch library for neural networks and tensor math
- Lua interface to the http-client library
- "New heads for Hydra": enhancing the framework for real world apps
- OpenTelemetry support for Haskell
- Property-based testing stateful programs using QuickCheck
- Interactive reports for smos
- Update stylish-haskell to use ghc-lib-parser
Finishing SIMD support for GHC's native backend
Motivation: SIMD is highly valued for fast processing of large datasets. GHC currently supports SIMD only when using the LLVM backend.
There is an unfinished patch adding SIMD support to GHC's native backend. However, the patch currently breaks because the register allocator does not properly support vector variables.
This project would consist of fixing the register allocator, which should be the last big hurdle for SIMD support, and then making sure SIMD support makes it over the finish line.
- Andreas Klebinger
Add primops to expand the (boxed) array API.
Arrays are the bedrock on which popular data structures like Vector are implemented. However the API provided by GHC is quite limited. This project would expand the API to fill some of these gaps.
All examples are given for Array# but apply to SmallArray# as well.
Create boxed arrays from existing ones
Motivation: Creating Array#s efficiently in GHC Haskell stands in tension with the garbage collector, as we must never encounter uninitialised slots during a collection. Thus, for safety, array creation primitives have to conservatively initialise array slots at the cost of performance.
Currently, to make a new array zs out of the existing arrays xs and ys, we have to call the following operations (analogously for SmallArray#):
let zs = newArray# ...
copyArray# xs ... zs ...
copyArray# ys ... zs ...
Proposed idea: We propose to add a new array primitive that allows copying existing arrays into a new array while bypassing any unnecessary initialisation step.
let zs = concatArray#s xs ys
Provide an API for modifying array sizes
Motivation: Creating a new array from a subset of the elements of an existing array currently requires us to first create a new array and then copy the elements over.
Proposed idea: Provide growArray# primops, which combine the copy and initialisation steps.
These could be used, for example, in the implementation of grow from the vector package.
Create boxed arrays from known elements.
Motivation: Currently, creating an array with fully known contents takes two steps: creating an array initialised with default values, and then filling in the actual contents.
Proposed idea: Provide Array literals which allow giving the size and contents of an array as a single construct. E.g. arrayFrom# (# a, b, c #)
This would allow us to eliminate the initialisation with default values completely.
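To make the three motivations above concrete, here is a minimal sketch of what the current primops force us to write, using newArray#, copyArray#, writeArray# and unsafeFreezeArray# from GHC.Exts. The wrapper type Arr and the helpers concatArr, growArr and fromListArr are hypothetical names chosen for illustration. In each of them, newArray# first fills every slot with a placeholder or default value; that is exactly the initialisation step the proposed concatArray#, growArray# and array literals would skip.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
  ( Array#, Int (I#), copyArray#, indexArray#, newArray#
  , sizeofArray#, unsafeFreezeArray#, writeArray#, (+#) )
import GHC.ST (ST (..), runST)

-- | Boxed wrapper so an Array# can be passed around as an ordinary value.
data Arr a = Arr (Array# a)

-- | Filler for slots that are overwritten before the array is frozen.
-- Writing it is the initialisation cost the proposed primops would avoid.
uninitialised :: a
uninitialised = error "uninitialised array slot"

-- | zs = xs ++ ys: newArray# (initialises every slot), two copies, freeze.
concatArr :: Arr a -> Arr a -> Arr a
concatArr (Arr xs) (Arr ys) = runST (ST (\s0 ->
  case sizeofArray# xs of
    nx -> case sizeofArray# ys of
      ny -> case newArray# (nx +# ny) uninitialised s0 of
        (# s1, zs #) -> case copyArray# xs 0# zs 0# nx s1 of
          s2 -> case copyArray# ys 0# zs nx ny s2 of
            s3 -> case unsafeFreezeArray# zs s3 of
              (# s4, frozen #) -> (# s4, Arr frozen #)))

-- | Grow by `extra` slots holding `def`: newArray# fills every slot with
-- `def`, then copyArray# overwrites the first n of them again.
growArr :: Int -> a -> Arr a -> Arr a
growArr (I# extra) def (Arr src) = runST (ST (\s0 ->
  case sizeofArray# src of
    n -> case newArray# (n +# extra) def s0 of
      (# s1, dst #) -> case copyArray# src 0# dst 0# n s1 of
        s2 -> case unsafeFreezeArray# dst s2 of
          (# s3, out #) -> (# s3, Arr out #)))

-- | An array of known elements: allocate with a filler, then write each one.
fromListArr :: [a] -> Arr a
fromListArr xs = case length xs of
  I# n -> runST (ST (\s0 ->
    case newArray# n uninitialised s0 of
      (# s1, marr #) ->
        let fill [] _ s = s
            fill (y : ys) i s = fill ys (i +# 1#) (writeArray# marr i y s)
        in case unsafeFreezeArray# marr (fill xs 0# s1) of
             (# s2, arr #) -> (# s2, Arr arr #)))

sizeArr :: Arr a -> Int
sizeArr (Arr arr) = I# (sizeofArray# arr)

indexArr :: Arr a -> Int -> a
indexArr (Arr arr) (I# i) = case indexArray# arr i of (# x #) -> x

toListArr :: Arr a -> [a]
toListArr a = [indexArr a i | i <- [0 .. sizeArr a - 1]]
```

For example, toListArr (growArr 2 0 (fromListArr [1, 2, 3])) evaluates to [1, 2, 3, 0, 0], and toListArr (concatArr (fromListArr [1, 2]) (fromListArr [3])) evaluates to [1, 2, 3].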
- Andreas Klebinger
- Andrew Martin
Documentation generator for the Dhall configuration language
The Dhall configuration language is a programmable configuration language designed to balance ease of maintenance with general-purpose programming language features. The Dhall language has multiple independent implementations, each of which binds to a different host programming language, similar to how JSON or YAML can be read into multiple programming languages. However, a large number of supporting tools are built on top of the Haskell implementation, mainly because that was the first Dhall implementation.
One supporting tool of interest is a documentation generator. Up until now, Dhall packages have been mostly hosted on GitHub/GitLab and documentation consists of inline comments within source files, such as this one:
Many users have requested a more polished solution for generating documentation from these commented source files, analogous to Haskell's haddock tool (a documentation generator for Haskell), which they can then host (as HTML) or include within their Dhall projects (as Markdown files checked into version control). There have even been some nascent attempts to implement this, such as:
To that end, the goal of this project is to implement a command-line documentation generator whose input is a directory tree containing a Dhall package and whose output is documentation in either Markdown or HTML form. The scope of this project does not include hosting documentation on behalf of users. In other words, this project will only build a Dhall analog of haddock and will not attempt to build a Dhall analog of Hackage.
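As a rough illustration of the core transformation, here is a minimal sketch that turns a package's files into one Markdown page. It rests on simplifying assumptions: it does not use the dhall library's parser (a real tool would), it treats the first {- ... -} block comment of each file as that file's documentation, and it takes (path, contents) pairs instead of walking a directory. The names headerComment, renderMarkdown and the page layout are hypothetical.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf, sortOn, stripPrefix)
import Data.Maybe (fromMaybe)

-- | The leading {- ... -} comment of a Dhall file, if it has one.
-- Assumption: docs live in one top-of-file block comment; nested block
-- comments and "--" line comments are not handled.
headerComment :: String -> Maybe String
headerComment src = do
  body <- stripPrefix "{-" (dropWhile isSpace src)
  let (comment, rest) = breakOn "-}" body
  if "-}" `isPrefixOf` rest then Just (trim comment) else Nothing

-- | Split at the first occurrence of a (non-empty) needle.
breakOn :: String -> String -> (String, String)
breakOn needle = go []
  where
    go acc s
      | needle `isPrefixOf` s = (reverse acc, s)
      | otherwise = case s of
          []       -> (reverse acc, [])
          (c : cs) -> go (c : acc) cs

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Render a whole package, given (relative path, file contents) pairs,
-- as one Markdown page with a section per file, sorted by path.
renderMarkdown :: [(FilePath, String)] -> String
renderMarkdown files = unlines (concatMap section (sortOn fst files))
  where
    section (path, src) =
      [ "## " ++ path
      , ""
      , fromMaybe "_No documentation comment._" (headerComment src)
      , ""
      ]
```

An HTML backend could reuse headerComment unchanged and differ only in how each section is rendered.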
This project should be appropriate for a beginning Haskell programmer with web development experience. The amount of Haskell code required to write the first draft of the project should be small, and there will be many opportunities within the project to exercise web development skills to improve the visual appeal, user experience, and ease of comprehension of the generated documentation.
The project scope can also be extended depending on how things progress by adding features common to other documentation generators, such as:
- Rendering tests (which are natively supported by the language)
- Browsing the original source code
- Type on hover (within the rendered source code)
- Jump to definition (within the rendered source code)
Potential Mentors: Gabriel Gonzalez, Simon Jakobi, Profpatsch
Faster factorization algorithms
There is a growing (and coming-of-age) ecosystem of Haskell packages for cryptography, as witnessed by the increasing number of blockchain and zero-knowledge protocols. This project aims to fill one of the remaining gaps: state-of-the-art algorithms for integer factorization.
The most advanced existing Haskell implementations still use the elliptic curve method of integer factorization (example). However, there is a modern family of much faster methods of factorization: number field sieves. The goal is to implement them as a separate Haskell library or as a part of
To reach this goal, the candidate could first implement the quadratic sieve with decent performance characteristics. If there is still time left, we will proceed to the general number field sieve.
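These sieves rest on the congruence-of-squares idea: if x² ≡ y² (mod n) but x ≢ ±y (mod n), then gcd(x − y, n) is a proper factor of n. The sketch below is not the quadratic sieve. It shows Fermat's much older method, which searches for such a congruence directly, and the final gcd step that the quadratic sieve shares. The names isqrt, fermat and splitFromSquares are illustrative, and the code uses only Prelude Integer arithmetic.

```haskell
-- | Floor of the square root, by Newton's method on Integers.
isqrt :: Integer -> Integer
isqrt n
  | n < 0 = error "isqrt: negative argument"
  | n < 2 = n
  | otherwise = go n
  where
    go x = let y = (x + n `div` x) `div` 2
           in if y >= x then x else go y

-- | Fermat's method for odd n > 1: try a = ceil (sqrt n), a + 1, ...
-- until a^2 - n is a perfect square b^2; then n = (a - b) * (a + b).
-- Terminates at worst at a = (n + 1) / 2 with the trivial split (1, n).
fermat :: Integer -> (Integer, Integer)
fermat n = go a0
  where
    r = isqrt n
    a0 = if r * r == n then r else r + 1
    go a =
      let b2 = a * a - n
          b = isqrt b2
      in if b * b == b2 then (a - b, a + b) else go (a + 1)

-- | Congruence of squares: if x^2 == y^2 (mod n) and x /= +-y (mod n),
-- then gcd (x - y) n is a proper factor of n. The quadratic sieve finds
-- such x and y by combining many smooth relations with linear algebra
-- over GF(2).
splitFromSquares :: Integer -> Integer -> Integer -> Integer
splitFromSquares n x y = gcd (x - y) n
```

For example, fermat 5959 returns (59, 101) because 80² − 5959 = 441 = 21², and since 10² ≡ 3² (mod 91), splitFromSquares 91 10 3 returns 7. Fermat's method is only fast when the two factors are close together, which is why the sieves instead assemble the congruence from many small smooth relations.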
The candidate should have a basic knowledge of linear algebra and number theory and be willing to learn more. This project may be a good fit for students with a strong mathematical background, but little practice in Haskell, because it is self-contained and involves neither scary types nor arcane interfaces. We nevertheless expect a decent understanding of Haskell 2010 (standard types and classes, folds, monads) and an acquaintance with core libraries, e. g.,
Mentors: Andrew Lelechenko, Sergey Vinokurov.
Build-integration and Badges for Hackage
People commonly add “shields” (or “badges”) to their project landing pages to signal how well their project is maintained. Hackage even supports such shields so that Haskell project maintainers can keep their project’s status page up-to-date with their latest package upload.
However, the amount of information that Hackage makes available this way is very limited: currently Hackage only reports whether or not documentation was built successfully and that’s about it.
Imagine if a package author could proudly display the latest GHC version that their package successfully builds against! Imagine if you, a package user, could quickly recognize a well-maintained package because it reports 100% documentation coverage.
Hackage could potentially report this information, and more! Hackage is perfectly positioned to monitor test suites, benchmarks, code coverage, and any other measure of package quality. All that’s missing is for you to make this information available.
This project suits Haskell programmers of all experience levels who are interested in gaining experience with server-side web development. The project difficulty can adapt to the student's proficiency: from beginner (make a few API changes to expose more information) to intermediate (add new quality checks to Hackage) to experienced (make architectural changes to parallelize Hackage builds). If you are interested in full-stack work, you can also exercise front-end programming skills by extending the Hackage user interface to report the new information collected in this way.
This project gives you the opportunity to make highly visible changes to core Haskell infrastructure in ways that benefit the entire community.
Potential Mentors: Gershom Bazerman, Herbert Valerio Riedel
Finish the package candidate workflow for Hackage
Hackage candidate packages currently cannot be used directly, and their UI could be improved. We would like new packages to be uploaded as candidates by default, to improve the vetting process, but this means polishing off the candidate functionality first. The main remaining issues are tracked here.
The first step is moving the candidate display page to the new templating system and sharing code with the main package page. Following this, we need to implement a new candidate index that can be provided as a secondary index. This would be a mutable “v1” index.
Beyond this, we want to extend the docbuilder and docuploads to work with candidates, and then implement a fixed workflow from candidacy through validation to publishing.
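One way to picture such a fixed workflow is as a small state machine in which publishing is only possible after validation. The following sketch is hypothetical: Stage, Event, step and runWorkflow are not hackage-server's types, and treating a failed check as leaving the upload a candidate is an assumption made for illustration.

```haskell
import Control.Monad (foldM)

-- | Hypothetical lifecycle stages of an upload.
data Stage = Candidate | Validated | Published
  deriving (Eq, Show)

-- | Hypothetical events that move an upload through the workflow.
data Event = ChecksPassed | ChecksFailed | Publish
  deriving (Eq, Show)

-- | One transition; Nothing marks a step the workflow forbids.
step :: Stage -> Event -> Maybe Stage
step Candidate ChecksPassed = Just Validated
step Candidate ChecksFailed = Just Candidate -- stays a candidate (assumption)
step Validated Publish      = Just Published
step _         _            = Nothing

-- | Every upload starts as a candidate; replay its events in order.
runWorkflow :: [Event] -> Maybe Stage
runWorkflow = foldM step Candidate
```

With this model, runWorkflow [ChecksPassed, Publish] gives Just Published, while runWorkflow [Publish] gives Nothing, because an unvalidated candidate cannot be published.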
Mentors: Gershom Bazerman, Herbert Valerio Riedel
Hasktorch library for neural networks and tensor math
Hasktorch is a library for neural networks and tensor math in Haskell leveraging the C++ backend of PyTorch for fast numerical computation with GPU support. Our goal with Hasktorch is to provide a platform for machine learning using typed functional programming.
As a summer project, there are a number of potential areas to contribute:
Write an in-depth tutorial series
Port a tutorial series on neural networks, such as tensor2tensor or fastai. Alongside the code, write detailed comments and notes, and add to, improve, or debug the library when needed.
- FastAI - Practical Deep Learning for Coders course
- Trax - Example neural network implementations from Google Brain
Implement saving/loading serialized representations of models so that models can be transferred between PyTorch and Hasktorch.
- torchscript - an intermediate representation target for PyTorch models
- Hasktorch PR 268 supporting torchscript types
Expand the ecosystem around higher-level functionality
Implement/improve modules for data loading, vision/text libraries, model visualization and interpretability, probabilistic modeling, etc.
- torchvision and torchtext documentation
- tf lucid - a collection of infrastructure and tools for research in neural network interpretability.
Contribute to Foundational Low-level Code
Help refine and debug the foundational C++ foreign-function interface and the code generation implementation. Improve resource management, and help with migrations that track upstream libtorch/PyTorch releases.
- Junji Hashimoto
- Austin Huang
- Adam Paszke
- Torsten Scholak
- Sam Stites
Lua interface to the http-client library
The HsLua library makes it possible to embed an interpreter for the Lua programming language into programs written in Haskell. One example is pandoc, which uses Lua as an extension language, allowing users to author custom writers or to modify pandoc's internal document representation.
HsLua can expose Haskell functions to Lua scripts, thereby giving users access to functionality otherwise hidden in a program's internals. Lua bindings to http-client, a popular, easy-to-use, and powerful Haskell HTTP library, are currently lacking. Such bindings could give pandoc users great additional power without the need for external C Lua libraries.
The candidate, who should be familiar with Haskell (Lua knowledge is an optional bonus), could:
choose Haskell functions which would be most useful to Lua users;
write bindings for these functions, as well as tests for those bindings;
publish the bindings as a library on Hackage and Stackage.
If time permits, the new library could be included in pandoc.
Mentors: Albert Krewinkel
Difficulty: Intermediate to advanced.
"New heads for Hydra": enhancing the framework for real world apps
Hydra showcase project
The Hydra project aims to be a showcase for software design practices, application architectures, and best practices and patterns for building big, complex applications in Haskell. This framework is a full-fledged solution for creating web services, console and standalone applications. Applications may use several subsystems out of the box: SQL DB (3 different SQL backends with beam), KV DB (2 different KV DBs), STM-like multithreading and concurrency, logging and many other features.
Hydra is a framework which currently provides three independent engines with the same functionality. This allows you to compare three different approaches in terms of their impact on application architecture, usability, performance, code structure and design decisions.
The following independent engines are implemented:
- (Hierarchical) Free Monads
- (Hierarchical) Church-Encoded Free Monads
- Final Tagless (mtl)
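To illustrate how the engines differ, here is a minimal sketch of a single logging operation written in two of the three styles: a Free monad (the language is a functor, and programs are data interpreted later) and final tagless (the language is a type class, and interpreters are instances). This is not Hydra's code: LoggerF, Logger, Collect, greetF and greetT are hypothetical, the languages are flat rather than hierarchical, and the Free type is defined inline instead of being imported from the free package. A Church-encoded variant would keep the same interface and change only how Free is represented.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Free monad over a functor f, defined inline to avoid dependencies.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

-- Free monad style: the language is a functor, programs are data.
data LoggerF next = LogMessage String next
  deriving Functor

logMessageF :: String -> Free LoggerF ()
logMessageF msg = Free (LogMessage msg (Pure ()))

-- One possible interpreter: collect the messages purely.
runLoggerF :: Free LoggerF a -> ([String], a)
runLoggerF (Pure a) = ([], a)
runLoggerF (Free (LogMessage msg next)) =
  let (msgs, a) = runLoggerF next in (msg : msgs, a)

-- Final tagless style: the language is a type class, interpreters are instances.
class Monad m => Logger m where
  logMessage :: String -> m ()

newtype Collect a = Collect { runCollect :: ([String], a) }

instance Functor Collect where
  fmap f (Collect (w, a)) = Collect (w, f a)

instance Applicative Collect where
  pure a = Collect ([], a)
  Collect (w1, f) <*> Collect (w2, a) = Collect (w1 ++ w2, f a)

instance Monad Collect where
  Collect (w, a) >>= k = let (w', b) = runCollect (k a) in Collect (w ++ w', b)

instance Logger Collect where
  logMessage msg = Collect ([msg], ())

-- The same business logic written against each engine.
greetF :: String -> Free LoggerF Int
greetF name = do
  logMessageF ("hello, " ++ name)
  logMessageF "done"
  pure (length name)

greetT :: Logger m => String -> m Int
greetT name = do
  logMessage ("hello, " ++ name)
  logMessage "done"
  pure (length name)
```

Both runLoggerF (greetF "hydra") and runCollect (greetT "hydra") evaluate to (["hello, hydra","done"],5). The business logic is identical; only the way it is represented and interpreted differs, which is the kind of trade-off Hydra's three engines let you compare.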
The Hydra project also provides several demo applications and a testing framework.
The Hydra project is a showcase project for the book “Functional Design and Architecture”.
The framework can be improved in many ways.
Synchronizing functionality between engines. Currently, the Free Monad based engine supports more features than the other two engines.
Task 1: Port functionality from the Free Monad engine into the Church Encoded Free Monad engine.
Difficulty: Beginner (the code will be almost identical)
Task 2: Port functionality from the Free Monad engine into the Final Tagless engine.
Difficulty: Advanced (requires some advanced type level programming)
Adding more features into the framework.
The framework can be enhanced by adding several new subsystems: File System, Time, TCP/UDP, JSON-RPC, …
Difficulty: From Beginner to Advanced
Improving demo applications. Demo applications can be extended and improved to better demonstrate the approaches and ideas of Software Design.
- Task 1: Showcase app for servant-based web server and client
- Task 2: Showcase app for SQL interactions
- Task 3: Showcase app for heavily concurrent and multithreaded calculations
- Task 4: Showcase app for command-line tool working with user input
Difficulty: Intermediate, Advanced
Improving documentation. Currently, the approaches in this framework are better described than most other Haskell approaches, thanks to the book “Functional Design and Architecture” and other materials (articles, talks). Still, a lot of additional material and documentation is needed.
This project should be appropriate for Haskell programmers who are willing to learn best practices in Haskell and who want to contribute to the community by creating showcases, demo applications and better documentation.
These techniques are used in real production with great success. Using the ideas from the book and the showcase projects, the following frameworks have been implemented at Juspay and Enecuum:
- PureScript Presto, PureScript Presto.Backend, Juspay
- Two more private frameworks for Juspay
- Node, Enecuum
Contributors will therefore gain real-world knowledge that is useful in their careers, and they will be mentioned on the official web page of the book.
Mentor: Alexander Granin
Difficulty: Beginner, Intermediate, Advanced
OpenTelemetry support for Haskell
What this project is about
OpenTelemetry is a set of APIs and protocols for instrumenting code, gathering traces and metrics produced by running that code and analyzing all that data.
Although it primarily targets distributed systems, it can also be useful for non-networked, single-machine applications like CLI tools and GUI applications.
Why Haskell should support OpenTelemetry
Supporting a language-agnostic format like OpenTelemetry is important for Haskell because the profiling story is rather immature compared to other languages:
- Tools for visualizing profile data, like ThreadScope and ghc-events-analyze, have not had hundreds of person-years of work spent on them
- Haskell code needs to be rebuilt with profiling support
- That rebuild might affect some optimizations, so you would get a profile of something other than what you are running in production
An instrumentation-based approach provides profiling data without recompiling your application and its dependencies with profiling support.
Existing tools like Jaeger and LightStep can be used to visualize and explore the telemetry data.
The current state of the library
The current implementation of OpenTelemetry for Haskell is in its infancy and needs contributions to cover all of the OpenTelemetry API and to support all export targets.
- Minimal implementation of Trace portion of OpenTelemetry API
- Exporter to a local file in Chrome Tracing format
- WIP: Exporter to LightStep, one of the services for analyzing telemetry data. Will be functional before GSoC starts and serve as a starting point for implementing other exporters.
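The core of the trace portion is recording spans around pieces of work, and the Chrome Tracing exporter turns them into JSON that chrome://tracing can load. The sketch below illustrates both ideas using only base. It is not the API of the existing Haskell OpenTelemetry library: Span, withSpan and toChromeTrace are hypothetical. It emits complete events ("ph": "X") from the Chrome trace event format, converting timestamps from nanoseconds to microseconds.

```haskell
import Control.Exception (finally)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.List (intercalate)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)

-- | A finished span: a name plus monotonic start/end times in nanoseconds.
data Span = Span
  { spanName    :: String
  , spanStartNs :: Word64
  , spanEndNs   :: Word64
  } deriving (Show)

-- | Time an action as a named span and record it in the sink, even if
-- the action throws.
withSpan :: IORef [Span] -> String -> IO a -> IO a
withSpan sink name action = do
  start <- getMonotonicTimeNSec
  action `finally` do
    end <- getMonotonicTimeNSec
    modifyIORef' sink (Span name start end :)

-- | Render spans as a Chrome Tracing JSON array of complete ("ph":"X")
-- events, whose "ts" and "dur" fields are in microseconds.
-- Assumption: names are plain ASCII, so 'show' yields a valid JSON string.
toChromeTrace :: [Span] -> String
toChromeTrace spans = "[" ++ intercalate "," (map event spans) ++ "]"
  where
    event (Span name start end) = concat
      [ "{\"name\":", show name
      , ",\"ph\":\"X\",\"ts\":", show (start `div` 1000)
      , ",\"dur\":", show ((end - start) `div` 1000)
      , ",\"pid\":1,\"tid\":1}"
      ]
```

For example, toChromeTrace [Span "load" 2000000 5000000] produces a single event with "ts":2000 and "dur":3000. A real exporter would also record actual process and thread IDs, span attributes, and parent/child relationships between spans.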
Possible tasks for GSoC
- Implement Metrics portion of OpenTelemetry API
- Implement local file exporter for metrics
- Implement LightStep exporter for metrics
- Implement an exporter for another service (Jaeger, Zipkin, Prometheus, etc.)
- Instrument a popular network interaction library such as http-client or postgresql-simple
Who benefits from this project
- Developers of distributed systems which have components implemented in Haskell
- Developers of Haskell applications and libraries that have some sufficiently slow parts
- Indirectly: users of said systems and applications enjoying faster software
- Dmitry Ivanov
- Elena Kovalenko
- Dmitrii Dolgov
Property-based testing stateful programs using QuickCheck
When the first version of QuickCheck was released for Haskell, it was the state of the art in testing. Today, however, it lags behind, for example, Erlang's eqc libraries. The quickcheck-state-machine library is an attempt to add state machine modelling to Haskell's QuickCheck for testing stateful/monadic code, and thereby catch up with the Erlang versions of QuickCheck.
This proposal is about using, and possibly extending, quickcheck-state-machine in order to improve the quality of Haskell code in general and of a specific project in particular.
The intermediate candidate could:
Find a commonly used and stateful Haskell library or application to test. This can also be a toy library or application from a commonly used Haskell resource (e.g. a tutorial, book or blog post);
Write a state machine model, for said library or application, together with at least a sequential property, and possibly a parallel property as well;
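The shape of such a model and sequential property can be shown without any libraries. The sketch below is not quickcheck-state-machine's API: a pure model (a list) predicts the response to each command, the real system (a mutable stack in an IORef, standing in for the library under test) executes it, and the sequential property checks that the two agree. Cmd, Resp, transition, semantics, prop_sequential and checkAllUpTo are hypothetical names, and exhaustive enumeration of short command sequences stands in for QuickCheck's random generation and shrinking.

```haskell
import Control.Monad (replicateM)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- | Commands the test may issue against a stack.
data Cmd = Push Int | Pop
  deriving (Show)

-- | Responses observed from the system under test.
data Resp = Ok | Popped (Maybe Int)
  deriving (Eq, Show)

-- | The model: a pure list, plus the response each command should produce.
type Model = [Int]

transition :: Model -> Cmd -> (Model, Resp)
transition m (Push x)   = (x : m, Ok)
transition [] Pop       = ([], Popped Nothing)
transition (x : xs) Pop = (xs, Popped (Just x))

-- | The system under test: a mutable stack (stand-in for a real library).
newtype Stack = Stack (IORef [Int])

semantics :: Stack -> Cmd -> IO Resp
semantics (Stack ref) (Push x) = modifyIORef ref (x :) >> pure Ok
semantics (Stack ref) Pop = do
  xs <- readIORef ref
  case xs of
    []       -> pure (Popped Nothing)
    (y : ys) -> writeIORef ref ys >> pure (Popped (Just y))

-- | Sequential property: run the commands against a fresh system and
-- check each real response against the model's prediction.
prop_sequential :: (Stack -> Cmd -> IO Resp) -> [Cmd] -> IO Bool
prop_sequential run cmds = do
  stack <- Stack <$> newIORef []
  let go _ [] = pure True
      go m (c : cs) = do
        resp <- run stack c
        let (m', expected) = transition m c
        if resp == expected then go m' cs else pure False
  go [] cmds

-- | Check every command sequence up to length n; QuickCheck (and
-- quickcheck-state-machine) would instead generate random sequences
-- and shrink failing ones.
checkAllUpTo :: (Stack -> Cmd -> IO Resp) -> Int -> IO Bool
checkAllUpTo run n = and <$> mapM (prop_sequential run)
  [cmds | k <- [0 .. n], cmds <- replicateM k [Push 1, Push 2, Pop]]
```

Here checkAllUpTo semantics 4 returns True. If Push is replaced by an implementation that appends to the end of the list (a queue), it returns False, since after Push 1, Push 2, Pop the model predicts Popped (Just 2).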
Getting this far would already reach the goal, but if there is enough time, the candidate could also try one of the following:
- Add fault injection to the model, and thereby test the robustness of the code;
- Turn the state machine model into a mock, as described here, and use the mock to implement and test a library or application that depends on the original library or application.
The advanced candidate could additionally try one of the following:
- Combine fault injection with parallel testing and thereby achieve Jepsen-like tests;
- Use the gained experience and try to improve the
Mentors: Stevan Andjelkovic
Difficulty: Intermediate to advanced
Interactive reports for smos
Smos has a bunch of nice reports via smos-query, but they are not interactive. There is one interactive report within smos: the next-action report. It would be nicer for users if they could stay within the editor to browse through reports.
This proposal is about building interactive smos reports within the editor itself. It involves a lot of pure Haskell code and plenty of testing, with immediate visual feedback.
The intermediate candidate should be able to:
- Make an interactive report for each of the missing interactive reports:
- Make a nice TUI experience for filters, time periods, time blocks, and other options that are usually passed on the command line.
Getting this far would already reach the goal, but if there is enough time, the candidate could also try one of the following:
The smos-scheduler: A way of scheduling recurring projects such that they are put into place at the right time with the right template substitution. This work will involve designing a templating language.
An interactive weekly review experience as part of the editor. The weekly review is currently something that a user has to do manually as part of their own checklist. It would be nice to make that a guided experience.
Mentors: Tom Sydney Kerckhove
Update stylish-haskell to use ghc-lib-parser
stylish-haskell is a Haskell prettifier that currently uses haskell-src-exts as its parser workhorse to parse Haskell code into an AST. Unfortunately, haskell-src-exts is no longer actively maintained and cannot keep up with GHC, the de facto standard compiler.
ghc-lib-parser is a newer parsing library that packages the parser from GHC itself and is therefore always up to date.
The main objective for the summer would be to port stylish-haskell to ghc-lib-parser, but a strong student should be able to fit in other improvements as well.
Mentors: Jasper Van der Jeugt, Łukasz Gołębiewski, Pawel Szulc.
Difficulty: Beginner to intermediate.