GSoC 2024 Ideas

This is a list of ideas for contributors who are considering applying to Google Summer of Code 2024 for Haskell.org.

For project maintainers

Are you working on a Haskell project that could use the help of a contributor during the summer? Consider adding it as an idea here! You can contribute ideas by sending a pull request to our GitHub repository (example from 2023). If you just want to discuss a possible idea, please contact us.

What is a good idea? Anything that improves the Haskell ecosystem is valid. The GSoC rules state that it must involve writing code primarily (as opposed to docs).

Projects should be concrete and small enough in scope such that they can be finished by the contributor. Past experience has shown that keeping projects “small” is almost always a good idea.

Important changes since 2021/2022: In the past, GSoC projects were expected to take up the equivalent of full-time employment for a student. In 2021, this was reduced to half-time positions: students were expected to work around 175 hours in a 10-week period. Since 2022, contributors have had the choice between a larger project (around 350 hours) and a smaller one. Ideas should indicate which category they fall into.

Projects should benefit as many people as possible – e.g. an improvement to GHC will benefit more people than an update to a specific library or tool, but both are acceptable. New libraries and applications written in Haskell, rather than improvements to existing ones, are also welcome.

For students/contributors

We have added some tips on writing a proposal here. Please be aware that:

This is not an all-inclusive list, so you can apply for projects not in this list and we will try our best to match you with a mentor.
You can apply for up to two ideas (but only one can be accepted).

cabal-install security vulnerability checking
improved override semantics for cabal.project files
Continuous Integration Log Explorer
Enhance Hackage server to display security vulnerability information
Haskell Language Server Cabal Plugin Continuation
Use GHC's Structured Diagnostics in HLS
Haskell Language Server Test Suite Improvements
Inlay hints in haskell-language-server
Improve name resolution in Liquid Haskell
Parse error recovery and incrementality for GHC

cabal-install security vulnerability checking

cabal-install is a widely used tool for building Haskell projects. In addition to building and testing packages it can update package indexes from remote servers and handles some aspects of dependency management.

The Haskell Security Response Team maintains the Haskell Security Advisory Database. This database can serve as the basis for enhancing security tooling for the Haskell ecosystem.

In particular, the Advisory Database records known vulnerabilities of packages in the Hackage namespace. The advisory data includes the affected version ranges, written summary and details of the vulnerability, CVSS score and CWE numbers.

We propose the addition of security vulnerability checking to the cabal-install tool. For discussion purposes, this document suggests the cabal audit subcommand name, but this is just a suggestion.

When executed in package or project context, cabal audit would analyse the dependencies of the package/project and advise when vulnerable dependencies are found. There are several considerations that warrant further discussion.

.cabal files, in general, specify version bounds via the build-depends field. In the general case, any overlap between a dependency’s version bounds specified in the build-depends field, and known vulnerable version ranges specified in an advisory should be reported.
cabal-install can produce freeze files via the cabal freeze subcommand, which specify an exact set of dependencies locked at particular versions. cabal audit should have a mode that analyses freeze files. (This mode could work with an explicit input file, outside of a package or project directory context).
In some cases, the vulnerable behaviour in a dependency is not used by the dependent library or program. A mechanism (or mechanisms) to suppress false positives is a requirement. Some or all of the following mechanisms should be considered:
- A local cache of suppressions. In plain words, this would record information such as “suppress HSEC-2024-0001 for package acme-frobnicator”.
- Extending the Cabal package metadata to enable package maintainers or trustees to record such suppressions in the package metadata itself. The data would be propagated through package indexes (e.g. Hackage) and ensure that users do not see false positives, after the metadata have been uploaded to the package index.
- Extending the Advisory Database to record non-exploitability information. This is an alternative way of expressing and propagating the same data as the preceding point. The approaches are not mutually exclusive. The student should engage with the community and pursue a consensus on which approach is preferred, or if both are desired, which should be prioritised.
- The VEX (Vulnerability Exploitability eXchange) standard might provide an appropriate data model for recording and/or transmitting (non-)exploitability information.
- The Advisory Database can optionally record, for each advisory, the names of the problematic functions/values. It may be possible to use this data to produce exploitability information, but how to do so may be complicated or error prone. Depending on progress made, the student may or may not wish to pursue this idea.
- Commands or behaviours to assist the user in reporting vulnerabilities to the Advisory Database are another idea to consider, if time permits.
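
To make the overlap check from the first consideration concrete, here is a minimal sketch. It is not part of the proposal and does not use Cabal's API: the types `Version`, `Range`, `Advisory` and `Dependency`, and the functions `overlaps` and `audit`, are hypothetical names introduced only for illustration. It assumes every range is a non-empty half-open interval.

```haskell
-- Simplified, hypothetical model. A real implementation would more likely
-- build on the version types that Cabal already provides.

-- A version such as 1.2.3 as its numeric components. Lists compare
-- lexicographically, so [1,2] < [1,2,3] < [1,3].
type Version = [Int]

-- Half-open interval [lowerBound, upperBound); Nothing means "no upper bound".
data Range = Range
  { lowerBound :: Version
  , upperBound :: Maybe Version
  } deriving (Show, Eq)

data Advisory = Advisory
  { advisoryId      :: String
  , advisoryPackage :: String
  , affectedRanges  :: [Range]
  }

data Dependency = Dependency
  { depName   :: String
  , depBounds :: Range
  }

-- Two half-open intervals overlap when each one starts before the other ends.
overlaps :: Range -> Range -> Bool
overlaps a b = startsBeforeEndOf a b && startsBeforeEndOf b a
  where
    startsBeforeEndOf x y = maybe True (lowerBound x <) (upperBound y)

-- Report (package, advisory id) for every dependency whose bounds admit
-- at least one affected version.
audit :: [Advisory] -> [Dependency] -> [(String, String)]
audit advisories deps =
  [ (depName d, advisoryId a)
  | d <- deps
  , a <- advisories
  , advisoryPackage a == depName d
  , any (overlaps (depBounds d)) (affectedRanges a)
  ]
```

With an advisory affecting `>=1.0 && <1.2.3`, a dependency bounded by `>=1.2 && <1.3` would be reported, while one bounded by `>=1.3 && <1.4` would not. A suppression mechanism could then filter the resulting list.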

See also David Christian’s call to action for writing a security advisory analyser for Haskell, which discusses the same general topic.

Mentorship

Ideally a Cabal/cabal-install developer/maintainer would be able to mentor the student.

The Haskell Security Response Team can mentor and collaborate with respect to the Advisory Database, the content of advisories, expanding the database to include exploitability information, or exporting the data in a format suitable for use by hackage-server.

Difficulty and size

A Cabal/cabal-install maintainer should weigh in, but this whole effort definitely lies on the larger end.

Difficulty: Medium/Hard

Size: 350 hours

improved override semantics for cabal.project files

The files which cabal uses to configure multipackage projects (cabal.project files) have been extended in recent years to allow includes and conditionals.

This makes it more common, and more useful, for certain stanzas (build constraints, flags, index-state, etc.) to override other stanzas rather than simply augment them. (At times we may still want augmenting semantics; there is a delicate balance.)

The contents of different stanzas in cabal project files are monoidally accumulated. However, the monoid for each stanza was chosen without much thought, typically with either purely accumulating or purely replacement semantics.
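
As an illustration of the two kinds of semantics, here is a minimal sketch. It is not cabal's actual code: `ProjectConfig` and its fields are hypothetical, much-simplified stand-ins, and the choice of which field accumulates and which one replaces is an assumption made only for the example.

```haskell
import Data.Monoid (Last (..))

-- Hypothetical stand-in for the settings gathered from cabal.project files.
data ProjectConfig = ProjectConfig
  { constraints :: [String]    -- accumulating: later files add to earlier ones
  , indexState  :: Last String -- replacing: the last file that sets it wins
  } deriving (Show, Eq)

-- Combining two configurations combines them field by field, so the
-- semantics of each setting is decided by the monoid of its field type.
instance Semigroup ProjectConfig where
  a <> b = ProjectConfig
    { constraints = constraints a <> constraints b
    , indexState  = indexState a <> indexState b
    }

instance Monoid ProjectConfig where
  mempty = ProjectConfig mempty mempty
```

Combining a base configuration with an included one keeps the constraints of both, but only the later `indexState`. Changing the semantics of a setting then amounts to choosing a different monoid for its field, which is the kind of decision the audit would inventory.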

This project is for an audit of the monoidal semantics of the various settings that can be controlled by cabal.project, as well as the proposal and implementation of more useful ones.

Related tickets are

Mentorship

Gershom is willing to mentor

Difficulty and size

The difficulty of implementation is medium at most, but this will require somebody who is able to thoughtfully inventory, think through, and propose the specifics of a solution, ideally with prior experience with cabal project files as a user, and also with a sense of existing user workflows. The size of the project is likely 175 hours.

Continuous Integration Log Explorer

Goals

Create a web-based tool for exploring continuous integration test logs, suitable for large projects with big workflows that are susceptible to rare intermittent failures.

There are two components to this goal.

Create a service that automatically inserts test logs into a full text search database.
Create a web tool for querying the full text search database and visualizing results.

Background

The Haskell compiler GHC has an old testsuite that is slowly lumbering into the modern era. As more aspects of GHC are tested automatically, rare intermittent failures that cause spurious test results are uncovered. As more infrastructure is added to support automation, the surface area for such spurious failures increases. Collectively, the intermittent failures affect many CI runs and can create a frustrating experience for would-be GHC contributors.

One successful technique for combating intermittent failures is to collect data from many test runs and look for patterns. By finding the “fingerprint” of a particular failure, we can identify whether it is indeed spurious, what circumstances accompany the failure, and how frequently it occurs. This information can be used to identify the root cause and fix the failure. At the very least, it can be used to recover from the failure automatically, giving contributors a smoother experience.
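
The fingerprinting idea can be sketched in a few lines. This is only an illustration, not the existing tooling: `Fingerprint`, `JobLog`, `matches` and `tally` are hypothetical names, and a fingerprint is assumed to be a plain substring, whereas the proposed tool would query a full text search database instead of scanning lists.

```haskell
import Data.List (isInfixOf)

-- A fingerprint: a human-readable label plus a substring that identifies
-- the failure in a job log.
data Fingerprint = Fingerprint
  { label  :: String
  , needle :: String
  }

-- A job log, reduced to the job's identifier and its lines.
data JobLog = JobLog
  { jobId    :: Int
  , logLines :: [String]
  }

-- A job matches a fingerprint when any of its log lines contains the needle.
matches :: Fingerprint -> JobLog -> Bool
matches fp job = any (needle fp `isInfixOf`) (logLines job)

-- For each fingerprint, the ids of the jobs whose log contains it.
-- The length of each list shows how frequently the failure occurs.
tally :: [Fingerprint] -> [JobLog] -> [(String, [Int])]
tally fps jobs =
  [ (label fp, [jobId j | j <- jobs, matches fp j])
  | fp <- fps
  ]
```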

Existing Tooling

Some tooling to support this technique is found at https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/local-tooling. It requires the user to manually download all job logs, and the “interface” is nothing more than a sqlite database. This project will improve on the idea.

There is already a service that listens to job events, found at https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/spuriobot. Therefore, the first component of the project goal (creating a service that automatically inserts test logs into a full text search database) will only need to extend that service with the log-insertion feature.

Outcomes

Phase 1: The tool will be implemented and brought online with a basic user interface. It will only support GHC.

Phase 2, option 1: Guided by user feedback, better visualizations will be added to the UI.

Phase 2, option 2: The service that automatically inserts test logs into a full text search database will be extended to support GitHub workflows, allowing the tool to be used much more widely.

Phase 2, option 3: Use the tool to characterize spurious failures in GHC. There is a large list of potential spurious failures that can be investigated. And maybe fix them!

Size

Estimated at 175 hours.

The first deliverable, described in Phase 1, is small. By choosing from the Phase 2 options, however, the project can be extended to medium or large as suits the circumstances.

Required Skills

Read and write technical English
Haskell programming basics

Suitable for the Following Interests

devops
Haskell tooling
web app development
web services
data visualization

Project Mentor

Bryan Richter, Haskell Foundation DevOps engineer and author of existing tooling

Enhance Hackage server to display security vulnerability information

Hackage is the Haskell community’s central package archive of open source software. It is an instance of the open source hackage-server program.

The Haskell Security Response Team maintains the Haskell Security Advisory Database. This database can serve as the basis for enhancing security tooling for the Haskell ecosystem.

We propose to enhance hackage-server to use the advisory database to augment package pages with security information about the package. In particular, we propose:

Updating package/version pages to clearly indicate that the package/version contains known security issues, and provide details of those issues (a brief summary with a link to an external resource could be sufficient).
Updating package/version pages to clearly indicate that the package/version depends on (or may depend on, according to version bounds) vulnerable versions of other packages.
Providing a link or information on every package page about how to report security vulnerabilities in that package. This could be a form that creates an issue or pull request against the security-advisories repository, sends an email to the SRT, or something along those lines.

Mentorship

Ideally someone familiar with the hackage-server implementation would be able to mentor the student.

The Haskell Security Response Team can mentor and collaborate with respect to the Advisory Database, the content of advisories, or exporting the data in a format suitable for use by hackage-server.

Difficulty and size

Intermediate difficulty - 175 Hours

Haskell Language Server Cabal Plugin Continuation

The hls-cabal-plugin is a Haskell Language Server (HLS) plugin that allows HLS to be a Language Server for .cabal files as well as Haskell files. While the plugin already provides many core features, there are more possible features that would increase the ergonomics of working with cabal files, such as:

Goto-Definition for local stanzas, such as library, executable or common stanzas
Integrating cabal-add into HLS
Prompt to add unknown modules to exposed-modules and other-modules sections.
Completion of local and non-local package names
Completion of package version bounds
Showing documentation for keywords and enum values.

With some creativity, we can come up with many more features.

Mentorship

Fendor

Difficulty and size

The difficulty of this project is medium, as there are two rather big existing projects that developers need to understand in order to provide improvements.

The estimated size of this project is 175 hours, but there is likely 350 hours worth of work depending on the mentee’s interests and ideas.

Use GHC's Structured Diagnostics in HLS

Haskell Language Server provides many quick fixes and refactorings to fix common errors and warnings reported by GHC. This is done in the hls-refactor-plugin, but is implemented by manually parsing the text of GHC's error/warning messages. However, since GHC 9.2, GHC provides a much more structured representation of its error messages, which should allow a much more robust implementation of these refactorings and avoid fragile regular expression parsing of plain text messages.
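
The difference between the two approaches can be sketched with a toy example. Nothing here is GHC's or HLS's real API: the `Diagnostic` type, the message wording in `render`, and both extraction functions are hypothetical and exist only to show why matching on a value is more robust than parsing rendered text.

```haskell
import Data.List (stripPrefix)

-- Hypothetical stand-in for a structured diagnostic. GHC's real types are
-- far richer than this.
data Diagnostic
  = UnusedImport String             -- name of the redundant module
  | MissingSignature String String  -- binder and its inferred type
  deriving (Show, Eq)

-- Rendering for humans. The wording is invented for this example.
render :: Diagnostic -> String
render (UnusedImport m) = "The import of '" ++ m ++ "' is redundant"
render (MissingSignature f t) =
  "Top-level binding with no type signature: " ++ f ++ " :: " ++ t

-- Text-based approach: recover the module name from the rendered message.
-- It breaks as soon as the wording or the quoting style changes.
moduleFromText :: String -> Maybe String
moduleFromText msg = do
  rest <- stripPrefix "The import of '" msg
  let (m, suffix) = break (== '\'') rest
  if suffix == "' is redundant" then Just m else Nothing

-- Structured approach: pattern match on the diagnostic value itself.
moduleFromDiagnostic :: Diagnostic -> Maybe String
moduleFromDiagnostic (UnusedImport m) = Just m
moduleFromDiagnostic _                = Nothing
```

Both functions find the module name for the message produced by `render`, but only the structured one keeps working if the rendered text uses a different quote character.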

Task List

Add infrastructure to ghcide and HLS to keep and associate structured error messages with the diagnostics we send to the editor, so that it is possible to look up the associated structured error value for any diagnostic that the editor wants us to fix.
Port the existing refactorings to use the diagnostic infrastructure instead of relying on parsing rendered error messages.
Catalog any existing refactorings that cannot be ported to use the new structured diagnostic infrastructure, and possibly make GHC MRs to remedy the situation.
Implement more refactorings that are enabled by taking advantage of more suggestions in structured error messages.

References:

Mentorship

Zubin Duggal will mentor

Difficulty: Medium

Size: 175 hours should be enough to implement the infrastructure required and port most existing diagnostics. However, the work can easily expand to 350 hours to finish porting all diagnostics and making any necessary changes to GHC itself.

Haskell Language Server Test Suite Improvements

Haskell Language Server (HLS) has an extensive test suite that is run on every commit. Over time, the test suite has degraded in performance, reliability and consistency. This has become a bottleneck in the development of HLS, as a slow and unreliable test suite deters new contributors and makes life more difficult for maintainers.

This project aims to improve the quality of the test suite by:

Reducing the overall test suite execution time
Fixing flaky test cases
Removing artificial wait times in test cases
Unifying the style of tests in HLS

Issue #3736 provides some ideas on how to improve the test suite further.

To achieve the aforementioned goals, some of the following intermediate steps could be helpful:

Unifying testing infrastructure of ghcide and plugins
Enabling parallelism of test case execution
Exploiting custom LSP messages to reduce flakiness of tests written in lsp-test

Mentorship

Fendor

Difficulty and size

Intermediate - 175 hours

The project itself is not too difficult, as there is lots of prior work and plenty of low-hanging fruit. However, there is a fair amount of work with the internals of Haskell Language Server, which can be intimidating as they tend to be underdocumented.

The size of this project ranges from 175 hours up to 350 hours, depending on the exact scope of the proposal.

Inlay hints in haskell-language-server

Inlay hints are a relatively new Language Server Protocol feature that allows servers to display additional information inline in the user’s editor and, in some cases, to trigger edits when clicked.

They have a wide variety of uses; the HLS issue discusses some of them. For example:

Replacing the bulky import lens with a compact inlay hint
Type annotations on various kinds of binding
Explicit display of record field names when they are omitted
More we haven’t thought of!

This project would try to implement as many of these inlay hint uses as possible in HLS.

Mentorship

Michael Peyton Jones will mentor.

Difficulty and size

Much of this project will not be too difficult; however, it will require understanding and modifying many parts of HLS’s codebase, which is non-trivial. We may also need to work out how to get additional information from GHC, which will require interacting with GHC, and possibly contributing fixes upstream.

Difficulty: medium

Size: A basic version is probably doable in 175 hours, but there is probably 350 hours' worth of work, or the project could extend into implementing other missing LSP features.

Improve name resolution in Liquid Haskell

Liquid Haskell is a tool to verify Haskell programs. The programmer supplies specifications for functions and data types in special comments, and the tool produces error messages when it cannot automatically prove that the program behaves as specified. The programmer can then improve the specifications or the program until the tool reports no errors. More information is available on the Liquid Haskell website. Similar to a type system in spirit, Liquid Haskell enrols the computer in the effort to discover programming mistakes.

Liquid Haskell has been around for a decade and is a versatile tool, but it still has a few issues which hinder the user experience. This project is to address one of these shortcomings.

When analyzing a program, Liquid Haskell needs to link identifiers that appear in specifications to the entities they refer to, much in the same way that a compiler needs to link identifiers in the text of a function to the language entities that they represent. We refer to this task as name resolution.

The identifiers in a specification can refer to other specifications, or they can refer to Haskell entities like functions and types. The output of name resolution tells for each identifier the module in which the referred entity is defined and the package it comes from.

At the moment, name resolution in Liquid Haskell is done twice for the same specifications, and the two passes do not always yield the same result, leading to confusing errors and tedious workarounds. This project is about having name resolution done only once. For more details please see the GitHub issue. The project involves both the implementation of a solution and the writing of a blog post summarizing the achievements.

As a stretch goal, this project might specify and fix the scope rules and mechanisms to use in Liquid Haskell specifications. While the intention of Liquid Haskell maintainers over time has been to imitate GHC scoping rules as much as possible, there are cases where Liquid Haskell just deviates from them. As part of attaining this supplementary stretch goal, one would need to answer questions like the following.

What names are in scope when writing the specifications of a module?
How should ambiguities be resolved when imports offer definitions with the same Liquid Haskell names?
What Liquid Haskell names are exported from modules and when?

Potential Mentors: Facundo Domínguez

Difficulty: Medium

Size: 350 hours, but this is flexible and can be changed by adjusting the scope

Parse error recovery and incrementality for GHC

GHC is able to report multiple type errors at once, yet a single parser error brings the whole compilation pipeline to a halt; see this tech proposal.

One significant obstacle is the parser generator happy that GHC relies on for versatile and fast parsing: The current error handling architecture exposed by happy will abort on the first parse error without producing a partial syntax tree at all.

This draft PR improves happy to resume parsing after reporting a parse error, but it lacks documentation, introduces a number of breaking changes and is in bad need of cleanup. Nevertheless, it is technically complete, passes the testsuite and has already been tried on GHC as a proof of concept.

The goal of this project is to take over the pull request to happy so that it can be merged, and then use the improved happy to generate multiple and better parse error messages in GHC.
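
To illustrate the difference between aborting on the first parse error and resuming afterwards, here is a toy sketch. It is unrelated to happy's implementation and to the draft PR: it uses a hand-written parser for a made-up format (numbers separated by `;`) and a simple form of recovery that resynchronises at the next separator. All names are hypothetical.

```haskell
import Data.Char (isDigit, isSpace)
import Data.Either (partitionEithers)

-- Split the input at every occurrence of the separator.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Remove leading and trailing whitespace.
trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace

-- Parse a single item, which must be a non-empty run of digits.
parseItem :: String -> Either String Int
parseItem raw
  | not (null t) && all isDigit t = Right (read t)
  | otherwise = Left ("expected a number, got " ++ show t)
  where t = trim raw

-- Abort on the first error: no partial result is produced.
parseAbort :: String -> Either String [Int]
parseAbort = traverse parseItem . splitOn ';'

-- Recover: record each error, skip to the next ';' and keep going, so the
-- caller gets every error together with the items that did parse.
parseRecover :: String -> ([String], [Int])
parseRecover = partitionEithers . map parseItem . splitOn ';'
```

On the input `1; x; 3; ?`, `parseAbort` reports only the problem with `x`, whereas `parseRecover` reports both bad items and still returns the numbers 1 and 3.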

There are a couple of stretch goals:

happy could further be improved to pass a closure of its parse state to reduction actions, so as to enable incremental parsing in GHC’s parser.
Improve happy so that it provides a convenient and encapsulated way to introspect the LALR item stack, for example to identify bracketing productions such as '(' expr . ')' in GHC’s parser in order to report mismatched brackets. There is a hacky GHC Merge Request that tries to achieve as much without buy-in from happy.
Improve happy's code base, which by now is over 25 years old. For example, happy has recently been modularised, that is, split into multiple independently usable packages (for modelling grammars, building LALR tables, producing Haskell code from LALR tables, etc.), but unfortunately the individual packages lack documentation and examples.

Potential Mentors: Sebastian Graf

Difficulty: Medium, given that the technical bits have been drafted out. Still, the student would be required to familiarise themselves with the basics of LALR parsing theory in order to contribute documentation.

Size: 175 hours for merging the PR and beginning to improve GHC, but 350 hours can easily be spent on working on stretch goals as well for significant improvement of GHC.
