Summer of Haskell

GSoC 2022 Ideas

This is a list of ideas for contributors who are considering applying to Google Summer of Code 2022 for Haskell.org.

For project maintainers

Are you working on a Haskell project that could use the help of a contributor during the summer? Consider adding it as an idea here! You can contribute ideas by sending a pull request to our GitHub repository (example from 2020). If you just want to discuss a possible idea, please contact us.

What is a good idea? Anything that improves the Haskell ecosystem is valid. The GSoC rules state that a project must primarily involve writing code (as opposed to documentation).

Projects should be concrete and small enough in scope that they can be finished by the contributor. Past experience has shown that keeping projects “small” is almost always a good idea.

Important change for 2021/2022: In the past, GSoC projects were expected to take up the equivalent of full time employment for a student. In 2021, this was reduced to half time positions: students were expected to work around 175 hours in a 10 week period. In 2022, contributors now have the choice between a larger (around 350 hours) or a smaller project. Ideas should indicate in which category they fall.

Projects should benefit as many people as possible – e.g. an improvement to GHC will benefit more people than an update to a specific library or tool, but both are acceptable. New libraries and applications written in Haskell, rather than improvements to existing ones, are also welcome.

For students/contributors

We have added some tips on writing a proposal here. Please be aware that:

Table of Contents

  1. Algorithmic Pattern: formalising heritage algorithms for new creative interfaces
  2. Reimplementing `cabal check` as a syntax tree traversal
  3. Field collection and filtering for cabal-install build plans
  4. Control headless Chrome/Chromium
  5. CodeCrafters courses in Haskell
  6. Hackage features (new ranking, user info update, email notifications)
  7. Implementing a GPU backend for advanced machine learning algorithms
  8. Support `OverloadedRecordDot` in Haskell Language Server
  9. Support more LSP features in Haskell Language Server

Algorithmic Pattern: formalising heritage algorithms for new creative interfaces

Functional programming (and mathematics in general) is founded on the idea of patterns, which have a very long cultural history, including in crafts (braiding, wire-bending, tile patterns) as well as the performing and community arts (konnakol, bell ringing, maypole dancing, juggling siteswaps etc).

The fact that patterns run throughout technology and the arts and crafts allows us to approach well-established cultural practices, many of which are thousands of years old, and investigate the computational basis for their patterns, as well as the ‘user interfaces’ that humans have developed to work with them.

A project idea for the Summer of Haskell then could make use of Haskell’s advanced type system for formalising the structures underlying such an arts or crafts practice. Once formalised, this could be developed into creative user interfaces for making new patterns.
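As one small illustration of what such a formalisation might look like, the sketch below models juggling siteswaps (mentioned above) as a type whose values can only be built through a validating constructor. The names `Siteswap`, `mkSiteswap` and `objects` are made up for this example and do not come from TidalCycles or any existing library. It relies on the standard siteswap rule that a throw of height h on beat i lands on beat (i + h) mod n, and that no two throws may land on the same beat.

```haskell
import Data.List (nub)

-- | A vanilla siteswap: one throw height per beat, repeated cyclically.
-- (Hypothetical type for illustration only.)
newtype Siteswap = Siteswap [Int]
  deriving (Eq, Show)

-- | Smart constructor: accept a sequence only if it can be juggled.
-- A throw of height h on beat i lands on beat (i + h) `mod` n; the
-- pattern is valid when all landing beats are distinct.
mkSiteswap :: [Int] -> Maybe Siteswap
mkSiteswap hs
  | null hs                    = Nothing
  | any (< 0) hs               = Nothing
  | length (nub landings) == n = Just (Siteswap hs)
  | otherwise                  = Nothing
  where
    n = length hs
    landings = [ (i + h) `mod` n | (i, h) <- zip [0 ..] hs ]

-- | Number of objects in the pattern: the average throw height.
objects :: Siteswap -> Int
objects (Siteswap hs) = sum hs `div` length hs
```

In GHCi, `mkSiteswap [5,3,1]` yields a pattern with three objects, while `mkSiteswap [5,4,3]` is rejected because two throws would land on the same beat. A creative interface could then manipulate only values of type `Siteswap`, knowing they are always juggleable.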

Such a project would not necessarily require Haskell programming skills, as long as the mentee is happy to develop such skills during the project. You should have some experience with programming, but perhaps more important would be expertise in the artform or craft that is to be translated into/explored through code. One important outcome of the project would be to introduce the mentee to working collaboratively on free/open source projects at the intersection of art, culture and technology.

This work would connect with the Algorithmic Pattern research project that I run. I’m also instigator/maintainer of the TidalCycles free/open source live coding system, which the project could potentially feed into, but this is not a requirement.

Mentor: Alex McLean

Difficulty: Easy/medium

Size: 350 hours

Reimplementing `cabal check` as a syntax tree traversal

The check command of the cabal build tool has two purposes:

  1. Raise errors if a package would be rejected by Hackage.
  2. Give warnings if things look suspicious or problematic (a kind of lint tool for .cabal files).

In some sense cabal check is comparable to a scope or type checker in a programming language. It should run over the whole syntax tree of the parsed .cabal file and try to make sense of each part, finding real and potential problems. Typically scope/type checkers consider each part of the syntax tree in its context, which contains e.g. the variables in scope at that point, or some constraints that hold there.

The current cabal check has grown historically and is not organized like a scope/type checker. Rather, it collects some data from the syntax tree first (in a fold) and then checks this data. There are certain problems with this approach and/or its current implementation:

  1. It does not deliver position information to give precise error locations to the user.

  2. It does not always evaluate data in its context, such as the conditionals under which the data sits. Thus, spurious or imprecise warnings might be generated.

  3. When the cabal syntax is updated, the old checker might still compile, and the cabal developer is not alerted to the fact that they also need to update the checker. A plain syntax-tree traversal would fail to compile if the grammar of the syntax tree changes.

The goal of this project is to reimplement cabal check in the style of a scope/type checker. The new implementation should be faithful with regard to the old implementation, so that it does not overlook problems in .cabal files that the old implementation noticed.
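The three problems listed above can be pictured with a toy traversal. The sketch below is not based on the real Cabal-syntax types: `Node`, `Ctx`, `Warning` and the single `-O2` rule are all hypothetical and heavily simplified. It only shows the shape of a checker that carries a context (enclosing section and conditions) down the tree and attaches a position to every warning.

```haskell
-- Hypothetical, heavily simplified stand-in for a parsed .cabal file;
-- the real Cabal-syntax types are much richer.
type Line = Int

data Node
  = Field Line String String      -- position, field name, value
  | Section Line String [Node]    -- position, section name, contents
  | If Line String [Node] [Node]  -- position, condition, then, else

-- Context carried down the tree, as in a scope or type checker.
data Ctx = Ctx
  { ctxSection :: Maybe String  -- enclosing section, if any
  , ctxConds   :: [String]      -- enclosing conditions, innermost first
  }

data Warning = Warning
  { wLine    :: Line
  , wSection :: Maybe String
  , wConds   :: [String]
  , wMessage :: String
  } deriving (Eq, Show)

-- One case per constructor and no wildcard: if a constructor is added
-- to Node, GHC reports this match as incomplete
-- (with -Wincomplete-patterns enabled).
check :: Ctx -> Node -> [Warning]
check ctx node = case node of
  Field ln name val -> checkField ctx ln name val
  Section _ name kids ->
    concatMap (check (ctx { ctxSection = Just name })) kids
  If _ cond yes no ->
    concatMap (check (under cond)) yes
      ++ concatMap (check (under ("!(" ++ cond ++ ")"))) no
  where
    under c = ctx { ctxConds = c : ctxConds ctx }

-- A single toy rule, standing in for the real set of checks.
checkField :: Ctx -> Line -> String -> String -> [Warning]
checkField ctx ln name val
  | name == "ghc-options" && "-O2" `elem` words val =
      [Warning ln (ctxSection ctx) (ctxConds ctx) "ghc-options contains -O2"]
  | otherwise = []

checkFile :: [Node] -> [Warning]
checkFile = concatMap (check (Ctx Nothing []))
```

Because each warning records the line and the conditions it sits under, a front end could report, for instance, that a flagged option on line 4 only applies when `flag(fast)` is set, rather than issuing an unconditional warning.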

The project may be structured into these phases/milestones:

  1. Secure the features of the old implementation. This amounts to building a testsuite (or extending the existing one) for cabal check that captures the behavior of the current implementation. This testsuite is already a viable result and shall be integrated into the cabal codebase.

  2. Design the new check, the kind of context information it needs, etc. Write a design document and initial documentation.

  3. Rough implementation of the new check. Can be a stand-alone executable using the Cabal-syntax library at first.

  4. Refined implementation, integrated into the cabal executable. Update testsuite and documentation to the final implementation.

Each of the milestones produces a valuable result that can be considered a partial success of the internship.

Potential mentor: Andreas Abel

Difficulty: Intermediate

Size: 350 hours

Required skills:

Field collection and filtering for cabal-install build plans

There are a number of situations (for example, excluding custom setups or certain licenses) where it could be useful to explicitly disallow certain values of certain fields in solver build plans, or alternatively to warn on such values, or simply to report the union of such values.

While third party tools can make use of generated info after the fact to report on build plans, it would be very good to provide cabal with a general purpose way to filter the index presented to the solver on any top level cabal field (either positive or negative presence of certain values) as well as report on the union of data of any cabal field, so that one could see all licenses in a plan and their provenance, or all uses of custom setups in a plan and their provenance, etc.
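To make the two halves of this idea concrete (filtering the index before solving, and reporting field values with their provenance), here is a minimal sketch over a toy package index. None of these names exist in cabal-install: `Pkg`, `Rule`, `filterIndex` and `report` are hypothetical, and a real implementation would work on cabal's own package description and index types.

```haskell
import Data.List (nub, sort)

-- Hypothetical simplified package: a name plus top-level fields.
data Pkg = Pkg
  { pkgName   :: String
  , pkgFields :: [(String, String)]
  } deriving (Eq, Show)

-- Positive or negative presence of a value in a field.
data Rule
  = Require String String  -- field must have this value
  | Forbid  String String  -- field must not have this value

permits :: Rule -> Pkg -> Bool
permits (Require f v) p = lookup f (pkgFields p) == Just v
permits (Forbid  f v) p = lookup f (pkgFields p) /= Just v

-- Keep only the packages that every rule permits; this is the index
-- that would be presented to the solver.
filterIndex :: [Rule] -> [Pkg] -> [Pkg]
filterIndex rules = filter (\p -> all (`permits` p) rules)

-- For one field, list each distinct value together with the packages
-- that carry it (the provenance of that value).
report :: String -> [Pkg] -> [(String, [String])]
report field pkgs =
  [ (v, sort [ pkgName p | p <- pkgs, lookup field (pkgFields p) == Just v ])
  | v <- sort (nub [ v' | p <- pkgs, Just v' <- [lookup field (pkgFields p)] ])
  ]
```

With such a shape, `filterIndex [Forbid "build-type" "Custom"]` would drop every package using a custom setup, and `report "license"` would show all licenses in a plan along with the packages they come from.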

Potential Mentors: Gershom Bazerman

Difficulty: Medium

Size: 350 hours, possibly 175 hours for a strong, experienced contributor

Control headless Chrome/Chromium

Chrome and Chromium are powerful and highly popular browsers. When used in headless mode, they can be controlled via the Chrome DevTools Protocol (CDP), which makes them a powerful tool for automation.

The project will build a Haskell library, based on the websockets library, to control headless Chrome from Haskell. The library will enable many interesting use cases, including the ability to use Chrome for PDF generation from HTML sources. If time permits, the library could be showcased by integrating it into pandoc, the universal document converter. Pandoc users could then convert documents to PDF through Chrome.
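For orientation, CDP commands are JSON objects with an `id`, a `method` and a `params` object, sent over a WebSocket connection. The sketch below only builds such a message as a string using base; the `Command` and `Param` types and the hand-written encoder are assumptions made for this example. An actual library would more likely use aeson for encoding and websockets for transport.

```haskell
import Data.List (intercalate)

-- Hypothetical minimal model of a CDP command.
data Param = PStr String | PBool Bool

data Command = Command
  { cmdId     :: Int
  , cmdMethod :: String
  , cmdParams :: [(String, Param)]
  }

-- Encode a command as a JSON object with "id", "method" and "params".
encode :: Command -> String
encode (Command i m ps) =
  "{\"id\":" ++ show i
    ++ ",\"method\":" ++ str m
    ++ ",\"params\":{"
    ++ intercalate "," [ str k ++ ":" ++ val v | (k, v) <- ps ]
    ++ "}}"
  where
    val (PStr s)  = str s
    val (PBool b) = if b then "true" else "false"

-- Quote a string, escaping only double quotes and backslashes.
-- Control characters are not handled; a real encoder must cover them.
str :: String -> String
str s = "\"" ++ concatMap esc s ++ "\""
  where
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc c    = [c]
```

For example, `encode (Command 1 "Page.navigate" [("url", PStr "https://example.com")])` produces the message that asks the browser to load a page, and a follow-up `Command 2 "Page.printToPDF" [("printBackground", PBool True)]` would request a PDF of it. The browser replies with a JSON message carrying the same `id`, which is how a client matches responses to requests.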

Potential mentors: Albert Krewinkel, Jasper Van der Jeugt

Difficulty: Medium

Size: 350 hours

CodeCrafters courses in Haskell

On CodeCrafters, one can interactively recreate popular developer tools from scratch, in any language. Programmers enjoy this as a learning exercise (e.g. Build your own Git).

The language support module is OSS, and the Build your own Redis exercise is available in Haskell, thanks to an OSS contribution. It’s also quite popular.

Depending on the student’s bandwidth, the GSoC project(s) can comprise:

Opening up more Haskell courses means thousands of CodeCrafters learners will get to experience and master Haskell, which is great for the ecosystem.

Additional notes:

Units: We have the bandwidth to mentor up to 3 students

Mentors: Sarup Banskota, Rohit Paul Kuruvilla

Difficulty: Beginner, Medium

Size: Available as either 175 hours or 350 hours

Hackage features (new ranking, user info update, email notifications)

There are a number of features for Hackage that would make good GSoC projects. Among them:

A variety of other related tickets are under the ux tag on the Hackage GitHub repo: https://github.com/haskell/hackage-server/labels/component%3A%20ux

Potential Mentors: Gershom Bazerman

Difficulty: Medium

Size: 175 hours or 350 hours (depending on how many features the proposal covers)

Implementing a GPU backend for advanced machine learning algorithms

The goal of this project is to provide a collection of high-performance, GPU-accelerated machine learning algorithms to the Haskell community. To do this, we would combine two existing, well-developed libraries. The first is Hasktorch, which provides industry-grade GPU-accelerated performance through bindings to PyTorch/libtorch. The second is Goal, which is a collection of machine learning libraries that I maintain for my research (https://elifesciences.org/articles/64615 and http://cognet.mit.edu/node/57955 are two examples of my research where all simulations were implemented based on Goal). Goal provides not only research-specific machine learning models, but also a suite of fundamental and widely used models such as mixture models, Kalman filters, hidden Markov models, and factor analysis.

Goal is high-performance, but ultimately CPU-bound. Nevertheless, the Goal type-system is built on a backend of newtype wrappers around hmatrix and vector types. As such, if the backend could be replaced by newtypes around Hasktorch primitives, the large collection of models and algorithms that Goal provides could easily take advantage of GPU-acceleration. The principal work of this project would thus be to develop a backend for Goal based on Hasktorch, and propagate the changes up through the rest of the Goal libraries.
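The general idea of swapping a backend underneath newtype wrappers can be sketched as follows. This is not Goal's or Hasktorch's actual API: the `Backend` class, `CpuVec` and `centroid` are hypothetical names, and a plain list stands in for the hmatrix/vector types. The point is only that an algorithm written against the class would not need to change if a second instance wrapped a GPU tensor instead.

```haskell
-- Hypothetical backend interface; the real Goal backend is much larger.
class Backend v where
  fromDoubles :: [Double] -> v
  toDoubles   :: v -> [Double]
  add         :: v -> v -> v
  scale       :: Double -> v -> v

-- CPU backend: a newtype around a plain list, standing in for the
-- hmatrix and vector types mentioned above.
newtype CpuVec = CpuVec [Double]
  deriving (Eq, Show)

instance Backend CpuVec where
  fromDoubles = CpuVec
  toDoubles (CpuVec xs) = xs
  add (CpuVec xs) (CpuVec ys) = CpuVec (zipWith (+) xs ys)
  scale k (CpuVec xs) = CpuVec (map (k *) xs)

-- An algorithm written once against the interface: the mean of a
-- non-empty collection of points. A GPU-backed instance of Backend
-- would reuse this definition unchanged.
centroid :: Backend v => [v] -> Maybe v
centroid []       = Nothing
centroid (x : xs) = Just (scale (1 / fromIntegral (length xs + 1)) (foldl add x xs))
```

For instance, `fmap toDoubles (centroid [CpuVec [1, 2], CpuVec [3, 4]])` evaluates the mean on the CPU stand-in; under this design, moving to the GPU would mean adding an instance rather than rewriting `centroid`.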

Although this project is focused on my own libraries, I see it as fulfilling a broader need in the Haskell community. Users have long lamented the lack of data science tools in Haskell, even though Haskell’s strong type-system would appear to be a natural fit for describing advanced machine learning models and algorithms. A critical part of progress is simply trying, and therefore even if the particular approach of Hasktorch + Goal that we propose does not satisfy all the needs of the wider community, it will serve as an important, yet also achievable and well-defined exploration of the space of Haskell-driven data science.

Mentors: Sacha Sokoloski, Junji Hashimoto

Difficulty: Medium

Size: 350 hours

Support `OverloadedRecordDot` in Haskell Language Server

Haskell Language Server (HLS) has support for many GHC language features, but new ones (particularly ones that add syntax) sometimes require additional work. Currently HLS does not have good support for OverloadedRecordDot, introduced in GHC 9.2.
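For readers who have not used the extension, the short example below shows the syntax that HLS needs to understand. It requires GHC 9.2 or later; the `Person` and `Address` types are made up for illustration.

```haskell
{-# LANGUAGE OverloadedRecordDot #-}

data Address = Address { city :: String, zipCode :: String }

data Person = Person { name :: String, address :: Address }

-- Field access uses a dot with no surrounding whitespace, and
-- accesses can be chained through nested records.
summary :: Person -> String
summary p = p.name ++ " lives in " ++ p.address.city

-- A parenthesised (.field) is a selector function.
names :: [Person] -> [String]
names = map (.name)
```

Editor features such as hover, completion after the dot, or go-to-definition on `name` in `p.name` all depend on the tooling recognising these forms as field selections rather than function composition.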

The following improvements would bring us up to a reasonable level of support:

Once this is completed, more advanced support could be provided:

Potential Mentors: Michael Peyton Jones/Pepe Iborra

Difficulty: Medium

Size: 175 hours

Support more LSP features in Haskell Language Server

Haskell Language Server (HLS) uses the Language Server Protocol (LSP) to communicate with the editor. LSP servers can announce what features they support in this protocol. While HLS already comes with the most important ones, there are a few that haven’t been implemented yet.

Implementing some of those could be a very good way of helping out other Haskellers!

Potential Mentors: Michael Peyton Jones

Difficulty: Medium

Size: 175 hours (implementing 1-2 features) or 350 hours (implementing 3-4 features)