Summer of Haskell

GSoC 2024 Ideas

This is a list of ideas for contributors who are considering applying to Google Summer of Code 2024 for Haskell.org.

For project maintainers

Are you working on a Haskell project and could use the help of a contributor during the summer? Consider adding it as an idea here! You can contribute ideas by sending a pull request to our GitHub repository (example from 2023). If you just want to discuss a possible idea, please contact us.

What is a good idea? Anything that improves the Haskell ecosystem is valid. The GSoC rules state that it must involve writing code primarily (as opposed to docs).

Projects should be concrete and small enough in scope such that they can be finished by the contributor. Past experience has shown that keeping projects “small” is almost always a good idea.

Important changes since 2021/2022: In the past, GSoC projects were expected to take up the equivalent of full-time employment for a student. In 2021, this was reduced to half-time positions: students were expected to work around 175 hours in a 10-week period. Since 2022, contributors have had the choice between a larger (around 350 hours) or a smaller project. Ideas should indicate which category they fall into.

Projects should benefit as many people as possible – e.g. an improvement to GHC will benefit more people than an update to a specific library or tool, but both are acceptable. New libraries and applications written in Haskell, rather than improvements to existing ones, are also welcome.

For students/contributors

We have added some tips on writing a proposal here. Please be aware that:

Table of Contents

  1. cabal-install security vulnerability checking
  2. improved override semantics for cabal.project files
  3. Continuous Integration Log Explorer
  4. Enhance Hackage server to display security vulnerability information
  5. Haskell Language Server Cabal Plugin Continuation
  6. Use GHC’s Structured Diagnostics in HLS
  7. Haskell Language Server Test Suite Improvements
  8. Inlay hints in haskell-language-server
  9. Improve name resolution in Liquid Haskell
  10. Parse error recovery and incrementality for GHC

cabal-install security vulnerability checking

cabal-install is a widely used tool for building Haskell projects. In addition to building and testing packages, it can update package indexes from remote servers and handle some aspects of dependency management.

The Haskell Security Response Team maintains the Haskell Security Advisory Database. This database can serve as the basis for enhancing security tooling for the Haskell ecosystem.

In particular, the Advisory Database records known vulnerabilities of packages in the Hackage namespace. The advisory data includes the affected version ranges, written summary and details of the vulnerability, CVSS score and CWE numbers.

We propose the addition of security vulnerability checking to the cabal-install tool. For discussion purposes, this document suggests the cabal audit subcommand name, but this is just a suggestion.

When executed in package or project context, cabal audit would analyse the dependencies of the package/project and advise when vulnerable dependencies are found. There are several considerations that warrant further discussion.

See also David Christian’s call to action for writing a security advisory analyser for Haskell, which discusses the same general topic.
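The core check behind such a subcommand can be sketched as follows. This is a minimal illustration only: the `Advisory` record, the `affects` and `audit` functions, and the sample advisory identifier are all hypothetical, and the real Advisory Database schema and cabal's dependency solver output are considerably richer.

```haskell
module Main where

import Data.Version (Version, makeVersion, showVersion)

-- Hypothetical, simplified advisory record (assumption); the real
-- advisory data also carries a summary, CVSS score, CWE numbers, etc.
data Advisory = Advisory
  { advisoryId      :: String
  , advisoryPackage :: String
  , introducedIn    :: Version        -- first affected version (inclusive)
  , fixedIn         :: Maybe Version  -- first fixed version (exclusive), if any
  }

-- A resolved dependency: package name and the exact version chosen.
type Dependency = (String, Version)

-- Does the advisory apply to this dependency?
affects :: Advisory -> Dependency -> Bool
affects adv (name, ver) =
  name == advisoryPackage adv
    && ver >= introducedIn adv
    && maybe True (ver <) (fixedIn adv)

-- Pair every vulnerable dependency with the ids of the advisories
-- that affect it; unaffected dependencies are omitted.
audit :: [Advisory] -> [Dependency] -> [(Dependency, [String])]
audit advisories deps =
  [ (dep, ids)
  | dep <- deps
  , let ids = [advisoryId a | a <- advisories, a `affects` dep]
  , not (null ids)
  ]

main :: IO ()
main = mapM_ report (audit advisories deps)
  where
    advisories =
      [ Advisory "EXAMPLE-0001" "foo"
          (makeVersion [1, 0]) (Just (makeVersion [1, 2, 3]))
      ]
    deps = [("foo", makeVersion [1, 1]), ("bar", makeVersion [2, 0])]
    report ((name, ver), ids) =
      putStrLn (name ++ "-" ++ showVersion ver ++ " is affected by " ++ unwords ids)
```

A real implementation would take the dependency list from the build plan rather than a hard-coded list, and would need to handle advisories with several disjoint affected ranges.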

Mentorship

Ideally a Cabal/cabal-install developer/maintainer would be able to mentor the student.

The Haskell Security Response Team can mentor and collaborate with respect to the Advisory Database, the content of advisories, expanding the database to include exploitability information, or exporting the data in a format suitable for use by hackage-server.

Difficulty and size

A Cabal/cabal-install maintainer should weigh in, but this whole effort definitely lies on the larger end.

Difficulty: Medium/Hard

Size: 350 hours

improved override semantics for cabal.project files

The files which cabal uses to configure multipackage projects (cabal.project files) have been extended in recent years to allow includes and conditionals.

This makes it more common, and more useful, for certain stanzas (build constraints, flags, index-state, etc.) to override other stanzas rather than simply augment them. (We may at times still want augmenting semantics, so there is a delicate balance.)

The contents of different stanzas in cabal project files are monoidally accumulated. However, the monoid for each stanza was chosen without much thought, typically with either purely accumulating or purely replacement semantics.
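The difference between accumulating and replacement semantics can be sketched with two standard monoids from base: lists (accumulate) and `Last` (the later value replaces the earlier one). The `ProjectConfig` type and its fields are hypothetical stand-ins and are not cabal's actual data types.

```haskell
module Main where

import Data.Monoid (Last (..))

-- Hypothetical, heavily simplified project configuration (assumption).
-- It only contrasts two merging behaviours.
data ProjectConfig = ProjectConfig
  { constraints :: [String]     -- accumulating: every file adds to the list
  , indexState  :: Last String  -- replacing: the last file that sets it wins
  } deriving (Eq, Show)

instance Semigroup ProjectConfig where
  a <> b = ProjectConfig
    { constraints = constraints a <> constraints b
    , indexState  = indexState a <> indexState b
    }

instance Monoid ProjectConfig where
  mempty = ProjectConfig mempty mempty

main :: IO ()
main = do
  let base     = ProjectConfig ["foo < 2"]  (Last (Just "2024-01-01T00:00:00Z"))
      override = ProjectConfig ["foo >= 2"] (Last (Just "2024-03-01T00:00:00Z"))
  -- Merge as if 'override' were included after 'base'.
  print (mconcat [base, override])
```

In the merged result, only the later index-state survives, while both constraints on `foo` are kept even though they contradict each other. Whether a later constraint should replace or augment an earlier one is exactly the kind of question this project would have to settle per stanza.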

This project is for an audit of the monoidal semantics of the various settings which can be controlled by cabal.project, as well as the proposal and implementation of more useful ones.

Related tickets are

Mentorship

Gershom is willing to mentor.

Difficulty and size

The difficulty of implementation is medium at most, but the project requires someone who can thoughtfully inventory, think through, and propose the specifics of a solution, ideally with prior experience of cabal project files as a user and a sense of existing user workflows. The size of the project is likely 175 hours.

Continuous Integration Log Explorer

Goals

Create a web-based tool that can be used to explore continuous integration test logs suitable for large projects with big workflows that are susceptible to rare intermittent failures.

There are two components to this goal.

  1. Create a service that automatically inserts test logs into a full text search database.

  2. Create a web tool for querying the full text search database and visualizing results.

Background

The Haskell compiler GHC has an old testsuite that is slowly lumbering into the modern era. As more aspects of GHC are tested automatically, rare intermittent failures that cause spurious test results are uncovered. As more infrastructure is added to support automation, the surface area for such spurious failures increases. Collectively, the intermittent failures affect many CI runs and can create a frustrating experience for would-be GHC contributors.

One successful technique for combating intermittent failures is to collect data from many test runs and look for patterns. By finding the “fingerprint” of a particular failure, we can identify whether it is indeed spurious, what circumstances accompany the failure, and how frequently it occurs. This information can be used to identify the root cause and fix the failure. At the very least, it can be used to recover from the failure automatically, giving contributors a smoother experience.
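The fingerprinting idea can be sketched as follows. This is a toy illustration under stated assumptions: the `Fingerprint` type, the function names, and the sample log texts are made up, and a plain substring scan stands in for the full text search database the project would actually use.

```haskell
module Main where

import Data.List (isInfixOf)

-- A fingerprint is a label plus a substring that identifies the failure
-- in a job log. The type and the example patterns are hypothetical.
data Fingerprint = Fingerprint
  { label  :: String
  , needle :: String
  }

type JobLog = (Int, String)  -- (job id, full log text)

-- Ids of the jobs whose log contains the fingerprint's needle.
matchingJobs :: Fingerprint -> [JobLog] -> [Int]
matchingJobs fp logs =
  [jobId | (jobId, text) <- logs, needle fp `isInfixOf` text]

-- How often each known fingerprint occurs across the collected logs.
frequencies :: [Fingerprint] -> [JobLog] -> [(String, Int)]
frequencies fps logs =
  [(label fp, length (matchingJobs fp logs)) | fp <- fps]

main :: IO ()
main = mapM_ print (frequencies fps logs)
  where
    fps =
      [ Fingerprint "disk full" "No space left on device"
      , Fingerprint "network"   "Connection reset by peer"
      ]
    logs =
      [ (1, "compiling...\nwrite failed: No space left on device\n")
      , (2, "all tests passed")
      , (3, "fetch: Connection reset by peer")
      ]
```

Knowing which jobs match a fingerprint, and how often, is what lets a maintainer judge whether a failure is spurious and under which circumstances it appears.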

Existing Tooling

Some tooling to support this technique is found at https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/local-tooling. It requires the user to manually download all job logs, and the “interface” is nothing more than a sqlite database. This project will improve on the idea.

There is already a service that listens to job events, found at https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/spuriobot. Therefore, the first component of the project goal (creating a service that automatically inserts test logs into a full text search database) will only need to extend that service with the log-insertion feature.

Outcomes

Phase 1: The tool will be implemented and brought online with a basic user interface. It will only support GHC.

Phase 2, option 1: Guided by user feedback, better visualizations will be added to the UI.

Phase 2, option 2: The service that automatically inserts test logs into a full text search database will be extended to support GitHub workflows, allowing the tool to be used much more widely.

Phase 2, option 3: Use the tool to characterize spurious failures in GHC. There is a large list of potential spurious failures that can be investigated. And maybe fix them!

Size

Estimated at 175 hours.

The first deliverable, described in Phase 1, is small. By choosing from the Phase 2 options, however, the project can be extended to medium or large as suits the circumstances.

Required Skills

Suitable for the Following Interests

Project Mentor

Enhance Hackage server to display security vulnerability information

Hackage is the Haskell community’s central package archive of open source software. It is an instance of the open source hackage-server program.

The Haskell Security Response Team maintains the Haskell Security Advisory Database. This database can serve as the basis for enhancing security tooling for the Haskell ecosystem.

In particular, the Advisory Database records known vulnerabilities of packages in the Hackage namespace. The advisory data includes the affected version ranges, written summary and details of the vulnerability, CVSS score and CWE numbers.

We propose to enhance hackage-server to use the advisory database to augment package pages with security information about the package. In particular, we propose:

Mentorship

Ideally someone familiar with the hackage-server implementation would be able to mentor the student.

The Haskell Security Response Team can mentor and collaborate with respect to the Advisory Database, the content of advisories, or exporting the data in a format suitable for use by hackage-server.

Difficulty and size

Intermediate difficulty - 175 Hours

Haskell Language Server Cabal Plugin Continuation

The hls-cabal-plugin is a Haskell Language Server (HLS) plugin that allows HLS to be a Language Server for .cabal files as well as Haskell files. While the plugin already provides many core features, there are more possible features that would increase the ergonomics of working with cabal files, such as:

With some creativity, we can come up with many more features.

Mentorship

Fendor

Difficulty and size

The difficulty of this project is medium, as there are two rather big existing projects that developers need to understand in order to provide improvements.

The estimated size of this project is 175 hours, but there is likely 350 hours’ worth of work depending on the mentee’s interests and ideas.

Use GHC’s Structured Diagnostics in HLS

Haskell Language Server provides many quick fixes and refactorings to fix common errors and warnings reported by GHC. This is done in the hls-refactor-plugin, but is implemented by manually parsing the text of GHC’s error/warning messages. However, since GHC 9.2, GHC provides a much more structured representation of its error messages, which should allow a much more robust implementation of these refactorings and avoid fragile regular expression parsing of plain text messages.
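The contrast between the two approaches can be sketched with a toy example. Everything here is hypothetical: the `Diagnostic` type is a made-up stand-in for GHC's much richer structured diagnostics, the message wording is invented, and `stripPrefix` stands in for the regular expressions used in practice.

```haskell
module Main where

import Data.List (stripPrefix)

-- Hypothetical diagnostic type for illustration only (assumption).
data Diagnostic
  = UnusedImport String             -- module name
  | MissingSignature String String  -- binder, inferred type
  deriving (Eq, Show)

-- Text-based approach: recover the module name from the rendered
-- message. It stops working as soon as the wording changes.
unusedImportFromText :: String -> Maybe String
unusedImportFromText msg = do
  rest <- stripPrefix "The import of '" msg
  case break (== '\'') rest of
    (name, '\'' : _) -> Just name
    _                -> Nothing

-- Structured approach: pattern match on the constructor and use its
-- fields directly, with no dependence on message wording.
quickFix :: Diagnostic -> String
quickFix (UnusedImport m)        = "Remove import of " ++ m
quickFix (MissingSignature f ty) = "Add signature: " ++ f ++ " :: " ++ ty

main :: IO ()
main = do
  print (unusedImportFromText "The import of 'Data.List' is redundant")
  -- Same information, reworded message: the text-based approach fails.
  print (unusedImportFromText "Redundant import: Data.List")
  putStrLn (quickFix (UnusedImport "Data.List"))
```

With a structured type, adding a new kind of diagnostic also makes the compiler warn about quick fixes that do not yet handle it, which text matching cannot offer.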

Task List

References:

Mentorship

Zubin Duggal will mentor.

Difficulty: Medium

Size: 175 hours should be enough to implement the infrastructure required and port most existing diagnostics. However, the work can easily expand to 350 hours to finish porting all diagnostics and making any necessary changes to GHC itself.

Haskell Language Server Test Suite Improvements

Haskell Language Server (HLS) has an extensive test suite that is run on every commit. Over time, the test suite has degraded in performance, reliability and consistency. This has become a bottleneck in the development of HLS, as a slow and unreliable test suite deters new contributors and makes life more difficult for maintainers.

This project aims to improve the quality of the test suite by:

The issue #3736 provides some ideas on how to improve the test suite further.

To achieve the aforementioned goals, some of the following intermediate steps could be helpful:

Mentorship

Fendor

Difficulty and size

Intermediate - 175 hours

The project itself is not too difficult, as there is a lot of prior work and plenty of low-hanging fruit. However, there is a fair amount of work with the internals of Haskell Language Server, which can be intimidating as they tend to be underdocumented.

The size of this project ranges from 175 hours up to 350 hours, depending on the exact scope of the proposal.

Inlay hints in haskell-language-server

Inlay hints are a relatively new language server protocol feature that allows servers to display additional information inline in the user’s editor and, in some cases, to trigger edits when clicked.

They have a wide variety of uses; the HLS issue discusses some of them. For example:

This project would be to try and implement as many of these inlay hint uses as possible in HLS.
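To make the idea concrete, here is a minimal sketch of what an editor does with an inlay hint: it splices extra text into the displayed line without changing the underlying source. The `InlayHint` record and `renderLine` function are simplified, hypothetical stand-ins; the hint type in the language server protocol has more fields than the two shown.

```haskell
module Main where

import Data.List (sortOn)

-- Hypothetical, simplified hint (assumption): a column in one line and
-- the text to show there.
data InlayHint = InlayHint
  { hintColumn :: Int     -- zero-based column the hint is attached to
  , hintLabel  :: String  -- text rendered inline by the editor
  }

-- Render a source line the way an editor might display it, inserting
-- each hint's label at its column. The source text itself is unchanged.
renderLine :: String -> [InlayHint] -> String
renderLine line hints = go 0 line (sortOn hintColumn hints)
  where
    go _ rest [] = rest
    go col rest (h : hs) =
      let (before, after) = splitAt (hintColumn h - col) rest
      in before ++ hintLabel h ++ go (hintColumn h) after hs

main :: IO ()
main = putStrLn (renderLine "let x = 42" [InlayHint 5 " :: Int"])
```

Showing the type of a local binding, as in this example, is one plausible use; the server's real work lies in computing such hints from GHC's type information.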

Mentorship

Michael Peyton Jones will mentor.

Difficulty and size

Much of this project will not be too difficult; however, it will require understanding and modifying many parts of HLS’s codebase, which is non-trivial. We may also need to work out how to get additional information from GHC, which will require interacting with GHC, and possibly contributing fixes upstream.

Difficulty: Medium

Size: A basic version is probably doable in 175 hours, but there is probably 350 hours’ worth of work, or the project could extend into implementing other missing LSP features.

Improve name resolution in Liquid Haskell

Liquid Haskell is a tool to verify Haskell programs. The programmer supplies specifications for functions and data types in special comments, and the tool produces error messages when it cannot automatically prove that the program behaves as specified. The programmer can then improve the specifications or the program until the tool reports no errors. More information is available on the Liquid Haskell website. Similar to a type system in spirit, Liquid Haskell enrols the computer in the effort to discover programming mistakes.

Liquid Haskell has been around for a decade and is a versatile tool, but it still has a few issues which hinder the user experience. This project is to address one of these shortcomings.

When analyzing a program, Liquid Haskell needs to link identifiers that appear in specifications to the entities they refer to, much in the same way that a compiler needs to link identifiers in the text of a function to the language entities that they represent. We refer to this task as name resolution.

The identifiers in a specification can refer to other specifications, or they can refer to Haskell entities like functions and types. The output of name resolution tells for each identifier the module in which the referred entity is defined and the package it comes from.
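The output of name resolution described above can be sketched as a small model. All types, function names, and the package and module names in the example are hypothetical; Liquid Haskell and GHC use much richer representations of names and scopes.

```haskell
module Main where

-- Hypothetical, simplified result of resolving one identifier
-- (assumption): where the referred entity is defined.
data Resolved = Resolved
  { definingPackage :: String
  , definingModule  :: String
  , entityName      :: String
  } deriving (Eq, Show)

-- Names in scope: each unqualified identifier with its candidates.
type Scope = [(String, [Resolved])]

data Resolution
  = Unique Resolved
  | Ambiguous [Resolved]
  | NotInScope
  deriving (Eq, Show)

-- Resolve an identifier against the scope, reporting ambiguity and
-- missing names explicitly instead of picking a candidate silently.
resolve :: Scope -> String -> Resolution
resolve scope ident =
  case concat [cands | (name, cands) <- scope, name == ident] of
    []  -> NotInScope
    [r] -> Unique r
    rs  -> Ambiguous rs

main :: IO ()
main = mapM_ (print . resolve scope) ["size", "insert", "frobnicate"]
  where
    scope =
      [ ("size",   [Resolved "pkg-a" "A.Containers" "size"])
      , ("insert", [ Resolved "pkg-a" "A.Containers" "insert"
                   , Resolved "pkg-b" "B.Tables" "insert"
                   ])
      ]
```

If two separate passes each build their own scope, the same identifier can come out `Unique` in one and `Ambiguous` or `NotInScope` in the other, which is why a single resolution pass is attractive.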

At the moment, name resolution in Liquid Haskell is done twice for the same specifications, and the two passes do not always yield the same result, leading to confusing errors and tedious workarounds. This project is about having name resolution done only once. For more details please see the GitHub issue. The project involves both the implementation of a solution and the writing of a blog post summarizing the achievements.

As a stretch goal, this project might specify and fix the scope rules and mechanisms to use in Liquid Haskell specifications. While the intention of Liquid Haskell maintainers over time has been to imitate GHC scoping rules as much as possible, there are cases where Liquid Haskell just deviates from them. As part of attaining this supplementary stretch goal, one would need to answer questions like the following.

Potential Mentors: Facundo Domínguez

Difficulty: Medium

Size: 350 hours, though this is flexible by adjusting the scope

Parse error recovery and incrementality for GHC

GHC is able to report multiple type errors at once, yet a single parser error brings the whole compilation pipeline to a halt; see this tech proposal.

One significant obstacle is the parser generator happy that GHC relies on for versatile and fast parsing: The current error handling architecture exposed by happy will abort on the first parse error without producing a partial syntax tree at all.

This draft PR improves happy to resume parsing after reporting a parse error, but it lacks documentation, introduces a number of breaking changes and is in bad need of cleanup. Nevertheless, it is technically complete, passes the testsuite and has already been tried on GHC as a proof of concept.
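The general idea of resuming after a parse error can be illustrated with a toy, hand-written parser that resynchronizes at statement separators. This is only a sketch of the concept: the toy language and all names are made up, and it does not reflect how happy, its LALR machinery, or the draft PR implement recovery.

```haskell
module Main where

import Data.Char (isAlpha, isDigit)
import Data.Either (partitionEithers)

-- Toy language (assumption): statements of the form "name=number",
-- separated by ';', with no whitespace.
data Stmt = Assign String Int deriving (Eq, Show)

-- Split the input at every separator; the separator acts as the
-- synchronization point for recovery.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Parse a single statement, or describe why it could not be parsed.
parseStmt :: String -> Either String Stmt
parseStmt src = case break (== '=') src of
  (name, '=' : digits)
    | not (null name), all isAlpha name
    , not (null digits), all isDigit digits -> Right (Assign name (read digits))
  _ -> Left ("cannot parse statement: " ++ show src)

-- Parse every statement. A failure is recorded and parsing resumes at
-- the next ';', yielding all errors plus a partial syntax tree.
parseProgram :: String -> ([String], [Stmt])
parseProgram = partitionEithers . map parseStmt . splitOn ';'

main :: IO ()
main = do
  let (errors, stmts) = parseProgram "x=1;y=;z=3;=4"
  mapM_ putStrLn errors
  print stmts
```

For the sample input, both malformed statements are reported and the two well-formed ones are still returned, instead of the whole parse stopping at `y=`.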

The goal of this project is to take over the pull request to happy so that it can be merged, and then use the improved happy to generate multiple and better parse error messages in GHC.

There are a couple of stretch goals:

Potential Mentors: Sebastian Graf

Difficulty: Medium, given that the technical bits have been drafted out. Still, the student would be required to familiarise themselves with the basics of LALR parsing theory in order to contribute documentation.

Size: 175 hours for merging the PR and beginning to improve GHC, but 350 hours can easily be spent on working on stretch goals as well for significant improvement of GHC.