GSoC 2022 Ideas
This is a list of ideas for contributors who are considering applying to Google Summer of Code 2022 for Haskell.org.
For project maintainers
Are you working on a Haskell project, and could you use the help of a contributor during the summer? Consider adding it as an idea here! You can contribute ideas by sending a pull request to our GitHub repository (example from 2020). If you just want to discuss a possible idea, please contact us.
What is a good idea? Anything that improves the Haskell ecosystem is valid. The GSoC rules state that it must involve writing code primarily (as opposed to docs).
Projects should be concrete and small enough in scope that they can be finished by the contributor. Past experience has shown that keeping projects “small” is almost always a good idea.
Important change for 2021/2022: In the past, GSoC projects were expected to take up the equivalent of full-time employment for a student. In 2021, this was reduced to half-time positions: students were expected to work around 175 hours in a 10-week period. In 2022, contributors have the choice between a larger project (around 350 hours) and a smaller one. Ideas should indicate which category they fall into.
Projects should benefit as many people as possible – e.g. an improvement to GHC will benefit more people than an update to a specific library or tool, but both are acceptable. New libraries and applications written in Haskell, rather than improvements to existing ones, are also welcome.
We have added some tips on writing a proposal here. Please be aware that:
- This is not an all-inclusive list, so you can apply for projects not in this list and we will try our best to match you with a mentor.
- You can apply for as many ideas as you want (but only one can be accepted).
Table of Contents
- Algorithmic Pattern: formalising heritage algorithms for new creative interfaces
- Reimplementing `cabal check` as a syntax tree traversal
- Field collection and filtering for cabal-install build plans
- Control headless Chrome/Chromium
- CodeCrafters courses in Haskell
- Hackage features (new ranking, user info update, email notifications)
- Implementing a GPU backend for advanced machine learning algorithms
- Support `OverloadedRecordDot` in Haskell Language Server
- Support more LSP features in Haskell Language Server
Algorithmic Pattern: formalising heritage algorithms for new creative interfaces
Functional programming (and mathematics in general) is founded on the idea of patterns, which have a very long cultural history, including in crafts (braiding, wire-bending, tile patterns) as well as the performing and community arts (konnakol, bell ringing, maypole dancing, juggling siteswaps etc).
The fact that patterns run throughout technology and the arts and crafts allows us to approach well-established cultural practices, many of which are thousands of years old, and to investigate both the computational basis for their patterns and the ‘user interfaces’ that humans have developed to work with them.
A project idea for the Summer of Haskell then could make use of Haskell’s advanced type system for formalising the structures underlying such an arts or crafts practice. Once formalised, this could be developed into creative user interfaces for making new patterns.
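As one small illustration of what such a formalisation could look like, the sketch below encodes juggling siteswaps, one of the practices mentioned above. The names `Siteswap`, `mkSiteswap` and `ballCount` are hypothetical and not taken from any existing library. The sketch assumes the simplest notation, where a pattern is a list of non-negative throw heights and is valid when no two throws land on the same beat.

```haskell
import Data.List (nub)

-- A siteswap that has passed validation. In a real module the constructor
-- would be kept private so that values can only be built via 'mkSiteswap'.
newtype Siteswap = Siteswap [Int]

-- Smart constructor: throw i (of height t) lands on beat (i + t) mod n.
-- The pattern is accepted only if all landing beats are distinct.
mkSiteswap :: [Int] -> Maybe Siteswap
mkSiteswap throws
  | null throws       = Nothing
  | any (< 0) throws  = Nothing
  | distinct landings = Just (Siteswap throws)
  | otherwise         = Nothing
  where
    n = length throws
    landings = [ (i + t) `mod` n | (i, t) <- zip [0 ..] throws ]
    distinct xs = length (nub xs) == length xs

-- For a valid pattern, the number of objects is the average throw height.
ballCount :: Siteswap -> Int
ballCount (Siteswap throws) = sum throws `div` length throws
```

Loaded into GHCi, `fmap ballCount (mkSiteswap [5,3,1])` evaluates to `Just 3`, while `mkSiteswap [4,3,2]` is `Nothing` because two throws would land on the same beat.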
Such a project would not necessarily require Haskell programming skills, as long as the mentee is happy to develop such skills during the project. You should have some experience with programming, but perhaps more important would be expertise in the artform or craft that is to be translated into/explored through code. One important outcome of the project would be to introduce the mentee to working collaboratively on free/open source projects at the intersection of art, culture and technology.
This work would connect with the Algorithmic Pattern research project that I run. I’m also instigator/maintainer of the TidalCycles free/open source live coding system, which the project could potentially feed into, but this is not a requirement.
Mentor: Alex McLean
Size: 350 hours
Reimplementing `cabal check` as a syntax tree traversal
The `check` command of the `cabal` build tool has two purposes:
- Raise errors if a package would be rejected by Hackage.
- Give warnings if things look suspicious or problematic (a kind of lint tool for `.cabal` files).
In some sense, `cabal check` is comparable to a scope or type checker in a programming language. It should run over the whole syntax tree of the parsed `.cabal` file and try to make sense of each part, finding real and potential problems. Typically, scope/type checkers consider each part of the syntax tree in its context, which contains e.g. the variables in scope at that point or some constraints.
`cabal check` has grown historically and is not organized like a scope/type checker. Rather, it first collects some data from the syntax tree (in a `fold`) and then checks this data. There are certain problems with this approach and/or its current implementation:
- It does not deliver position information to give precise error locations to the user.
- It does not always evaluate data in its context, such as the conditionals under which the data sits. Thus, spurious or imprecise warnings might be generated.
- When the cabal syntax is updated, the old checker might still compile, and the cabal developer is not alerted to the fact that they also need to update the checker. A plain syntax-tree traversal would fail to compile if the grammar of the syntax tree changes.
The goal of this project is to reimplement `cabal check` in the style of a scope/type checker. The new implementation should be faithful to the old implementation, so that it does not overlook problems in `.cabal` files that the old implementation noticed.
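The difference in style can be sketched on a toy syntax tree. Everything below is hypothetical: the `Node`, `Ctx` and `Warning` types are stand-ins invented for illustration and are not the types of the `Cabal-syntax` library, and the `-O2` rule is only an example check. The point is that a single traversal carries the enclosing conditions down the tree and attaches a position to every warning.

```haskell
-- Hypothetical, heavily simplified stand-in for a parsed .cabal file.
-- None of these types come from the Cabal-syntax library.
newtype Pos = Pos Int            -- source line of a node
  deriving (Eq, Show)

data Node
  = Field Pos String String      -- position, field name, field value
  | If Pos String [Node] [Node]  -- position, condition, then-branch, else-branch

-- The context carried down the tree: the conditions guarding the current node.
newtype Ctx = Ctx [String]

-- Where the problem is, under which conditions it applies, and what it is.
data Warning = Warning Pos [String] String
  deriving (Eq, Show)

-- Extend the context when descending into a conditional branch.
push :: String -> Ctx -> Ctx
push cond (Ctx conds) = Ctx (conds ++ [cond])

-- One traversal: every node is checked in its context, and every warning
-- carries a position. The -O2 rule is only an illustrative check.
check :: Ctx -> Node -> [Warning]
check (Ctx conds) (Field pos "ghc-options" value)
  | "-O2" `elem` words value = [Warning pos conds "-O2 is rarely needed"]
check _ (Field _ _ _) = []
check ctx (If _ cond thenBranch elseBranch) =
  concatMap (check (push cond ctx)) thenBranch
    ++ concatMap (check (push ("!(" ++ cond ++ ")") ctx)) elseBranch

-- Check a whole file, starting from the empty context.
checkAll :: [Node] -> [Warning]
checkAll = concatMap (check (Ctx []))
```

Because `check` pattern-matches directly on the constructors of `Node`, changing the shape of a constructor breaks compilation of the checker, and adding a new constructor is reported by GHC's incomplete-pattern warnings.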
The project may be structured into these phases/milestones:
- Secure the features of the old implementation. This amounts to building a testsuite for `cabal check` (or extending the existing one) that captures the behavior of the current implementation. This testsuite is already a viable result and shall be integrated upstream.
- Design the new check, the kind of context information it needs, etc. Write a design document and initial documentation.
- Rough implementation of the new check. This can be a stand-alone executable using the `Cabal-syntax` library at first.
- Refined implementation, integrated into the `cabal` executable. Update the testsuite and documentation to match the final implementation.
Each of the milestones produces a valuable result that can be considered a partial success of the internship.
Potential mentor: Andreas Abel
Size: 350 hours
- Good familiarity with `cabal` from a user’s perspective.
- Good familiarity with abstract syntax trees.
- Good familiarity with monadic programming.
Field collection and filtering for cabal-install build plans
There are a number of situations (excluding custom setups, excluding certain licenses, etc.) where it would be useful to explicitly disallow certain values of certain fields from solver build plans, or alternatively to warn on such values, or simply to report the union of such values.
While third-party tools can use generated info after the fact to report on build plans, it would be very good to give cabal a general-purpose way to filter the index presented to the solver on any top-level cabal field (by either the presence or the absence of certain values), as well as to report on the union of data of any cabal field. One could then see all licenses in a plan and their provenance, or all uses of custom setups in a plan and their provenance, etc.
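The two operations described above, filtering on a field and collecting the union of a field's values with their provenance, can be sketched as follows. This is a toy model under stated assumptions: `PlanPkg`, `collectField` and `rejectValues` are hypothetical names, packages are reduced to a name plus a flat map of field values, and nothing here reflects cabal-install's actual solver or index types.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical, flattened view of a package in the index or a build plan.
data PlanPkg = PlanPkg
  { pkgName   :: String
  , pkgFields :: M.Map String String  -- top-level field name -> value
  }

-- Report: every value of a field that occurs, with the packages it comes from.
collectField :: String -> [PlanPkg] -> M.Map String [String]
collectField field pkgs =
  M.fromListWith (flip (++))          -- keep package names in input order
    [ (value, [pkgName p])
    | p <- pkgs
    , Just value <- [M.lookup field (pkgFields p)]
    ]

-- Filter: drop packages whose field carries one of the disallowed values.
-- Packages that do not set the field at all are kept.
rejectValues :: String -> [String] -> [PlanPkg] -> [PlanPkg]
rejectValues field banned = filter allowed
  where
    allowed p = maybe True (`notElem` banned) (M.lookup field (pkgFields p))
```

For example, `collectField "license"` would answer "which licenses appear in this plan, and through which packages?", while `rejectValues "build-type" ["Custom"]` would model hiding custom setups from the solver.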
Potential Mentors: Gershom Bazerman
Size: 350 hours, possibly 175 hours for a strong, experienced contributor
Control headless Chrome/Chromium
Chrome and Chromium are powerful and highly popular browsers. When used in headless mode, they can be controlled via the Chromium DevTools Protocol (CDP). It is a powerful tool for automation.
The project will build a Haskell library, based on the `websockets` library, to control headless Chrome from Haskell. The library will enable many interesting use cases, including the ability to use Chrome for PDF generation from HTML sources. If time permits, the library could be showcased by integrating it into pandoc, the universal document converter. Pandoc users could then produce PDF documents through Chrome.
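To give a feel for the protocol, CDP commands are JSON objects with an `id`, a `method` and a `params` object, sent over a WebSocket connection. The sketch below only builds such a message for the `Page.printToPDF` method. The `Command` and `Param` types and the hand-rolled rendering are assumptions made for brevity: a real library would more likely use a JSON library and would also need to handle the connection, responses and events.

```haskell
import Data.List (intercalate)

-- Parameter values, restricted to booleans and strings to keep the sketch small.
data Param = PBool Bool | PString String

-- A CDP command: the id is echoed back in the matching response.
data Command = Command
  { commandId     :: Int
  , commandMethod :: String
  , commandParams :: [(String, Param)]
  }

renderParam :: Param -> String
renderParam (PBool True)  = "true"
renderParam (PBool False) = "false"
renderParam (PString s)   = show s  -- adequate for plain ASCII text only

-- Render the command as the JSON text that would be sent over the socket.
renderCommand :: Command -> String
renderCommand (Command i method params) =
  "{\"id\":" ++ show i
    ++ ",\"method\":" ++ show method
    ++ ",\"params\":{"
    ++ intercalate "," [ show k ++ ":" ++ renderParam v | (k, v) <- params ]
    ++ "}}"

-- Ask the browser to print the current page, including background graphics.
printToPdf :: Int -> Command
printToPdf i = Command i "Page.printToPDF" [("printBackground", PBool True)]
```

Here `renderCommand (printToPdf 1)` produces `{"id":1,"method":"Page.printToPDF","params":{"printBackground":true}}`. The response to this command carries the PDF as base64-encoded data, which the library would decode before handing it to the caller.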
Potential mentors: Albert Krewinkel, Jasper Van der Jeugt
Size: 350 hours
CodeCrafters courses in Haskell
On CodeCrafters, one can interactively recreate popular developer tools from scratch, in any language. Programmers enjoy this as a learning exercise (e.g. Build your own Git).
The language support module is OSS, and the Build your own Redis exercise is available in Haskell, thanks to an OSS contribution. It’s also quite popular.
Depending on the student’s bandwidth, the GSoC project(s) can comprise:
- Add Haskell support for the Git and SQLite exercises. This involves porting the challenge’s starter repository to use Haskell, and updating its Dockerfile. This exercise is particularly well suited as a beginner-level project, since the Redis port serves as a Haskell example, and there are examples supporting other courses in other languages.
- Contribute to the breakdown of more technologies, and the creation of new Haskell courses. Some topics requested by our community include Build your own Shell, BitTorrent Client, Blockchain, and Regex Parser, but more ideas are absolutely welcome. Depending on the skill level and interest of the student, they can be involved in breaking down new topics into stages (see Git overview for reference), or in creating a language-agnostic tester repository as per a spec (example of a very simple stage test) and adding Haskell support.
Opening up more Haskell courses means thousands of CodeCrafters learners will get to experience and master Haskell, which is great for the ecosystem.
Units: We have the bandwidth to mentor up to 3 students
Mentors: Sarup Banskota, Rohit Paul Kuruvilla
Difficulty: Beginner, Medium
Size: Available both as 175 hours, and 350 hours
Hackage features (new ranking, user info update, email notifications)
There are a number of features for Hackage that would make good GSoC projects. Among them:
- implementation of PackageRank as a possible sorting metric (https://github.com/haskell/hackage-server/issues/986).
- a user info account update feature to let users change their own email address: https://github.com/haskell/hackage-server/issues/167
- relatedly, finishing user notification emails and allowing them to be managed from that page: https://github.com/haskell/hackage-server/pull/622
A variety of other related tickets are under the ux tag on the Hackage GitHub repo: https://github.com/haskell/hackage-server/labels/component%3A%20ux
Potential Mentors: Gershom Bazerman
Size: 175 hours or 350 hours (depending on how many features the proposal covers)
Implementing a GPU backend for advanced machine learning algorithms
The goal of this project is to provide a collection of high-performance, GPU-accelerated machine learning algorithms to the Haskell community. To do this, we would combine two existing, well-developed libraries. The first is Hasktorch, which provides industry-grade GPU-accelerated performance through bindings to pytorch/libtorch. The second is Goal, which is a collection of machine learning libraries that I maintain for my research (https://elifesciences.org/articles/64615 and http://cognet.mit.edu/node/57955 are two examples of my research where all simulations were implemented based on Goal). Goal provides not only research-specific machine learning models, but also a suite of fundamental and widely used models such as mixture models, Kalman filters, hidden Markov models, and factor analysis.
Goal is high-performance, but ultimately CPU-bound. However, the Goal type system is built on a backend of newtype wrappers around hmatrix and vector types. As such, if the backend could be replaced by newtypes around Hasktorch primitives, the large collection of models and algorithms that Goal provides could easily take advantage of GPU acceleration. The principal work of this project would thus be to develop a backend for Goal based on Hasktorch, and to propagate the changes up through the rest of the Goal libraries.
Although this project is focused on my own libraries, I see it as fulfilling a broader need in the Haskell community. Users have long lamented the lack of data science tools in Haskell, even though Haskell’s strong type-system would appear to be a natural fit for describing advanced machine learning models and algorithms. A critical part of progress is simply trying, and therefore even if the particular approach of Hasktorch + Goal that we propose does not satisfy all the needs of the wider community, it will serve as an important, yet also achievable and well-defined exploration of the space of Haskell-driven data science.
Mentors: Sacha Sokoloski, Junji Hashimoto
Size: 350 hours
Support `OverloadedRecordDot` in Haskell Language Server
Haskell Language Server (HLS) has support for many GHC language features, but new ones (particularly ones that add syntax) sometimes require additional work. Currently HLS does not have good support for `OverloadedRecordDot`, introduced in GHC 9.2.
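For readers who have not used the extension yet, here is a minimal example of the syntax in question. It requires GHC 9.2 or later; the `Person` and `Company` types are made up for illustration.

```haskell
{-# LANGUAGE OverloadedRecordDot #-}

data Person = Person { name :: String, age :: Int }

data Company = Company { owner :: Person }

-- Field selection with a dot, chained through nested records.
ownerName :: Company -> String
ownerName c = c.owner.name

-- Field selection binds tighter than function application,
-- so 'show p.age' means 'show (p.age)'.
describe :: Person -> String
describe p = p.name ++ " (" ++ show p.age ++ ")"
```

Expressions such as `c.owner.name` are the places where an editor user would expect hover information and completions.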
The following improvements would bring us up to a reasonable level of support:
- Hover information for fields accessed through dot notation.
- Completions for record fields after a dot.
Once this is completed, more advanced support could be provided:
- Completions for virtual record fields.
- Code actions to rewrite non-dot-using field accesses into dot-using field accesses.
- More as inspiration strikes.
Potential Mentors: Michael Peyton Jones/Pepe Iborra
Size: 175 hours
Support more LSP features in Haskell Language Server
Haskell Language Server (HLS) uses the Language Server Protocol (LSP) to communicate with the editor. LSP servers can announce what features they support in this protocol. While HLS already comes with the most important ones, there are a few that haven’t been implemented yet.
Implementing some of those could be a very good way of helping out other Haskellers!
- Change annotations: these let you annotate pieces of edits with notes explaining what they do, and in some clients may let the user review them before accepting. For example, this would let us annotate each hlint fix in the “apply all fixes” edit with the hint that it is addressing, making it more obvious what it has done.
- Linked editing: this could be a lightweight way of doing document-local renamings for e.g. local variables.
- Completion / Code Action / Code Lens resolving: this is definitely the most complicated, since it potentially requires state tracking on the part of the server. But if we could do it well then it might lead to improved performance. Completions would probably be highest priority.
- Document links: these could be used for type references in Haddock, for example.
Potential Mentors: Michael Peyton Jones
Size: 175 hours (implementing 1-2 features) or 350 hours (implementing 3-4 features)