GSoC 2024 Ideas
This is a list of ideas for contributors who are considering applying to Google Summer of Code 2024 for Haskell.org.
For project maintainers
Are you working on a Haskell project that could use the help of a contributor during the summer? Consider adding it as an idea here! You can contribute ideas by sending a pull request to our GitHub repository (example from 2023). If you just want to discuss a possible idea, please contact us.
What is a good idea? Anything that improves the Haskell ecosystem is valid. The GSoC rules state that it must involve writing code primarily (as opposed to docs).
Projects should be concrete and small enough in scope such that they can be finished by the contributor. Past experience has shown that keeping projects “small” is almost always a good idea.
Important changes since 2021/2022: In the past, GSoC projects were expected to take up the equivalent of full time employment for a student. In 2021, this was reduced to half time positions: students were expected to work around 175 hours in a 10 week period. Since 2022, contributors now have the choice between a larger (around 350 hours) or a smaller project. Ideas should indicate in which category they fall.
Projects should benefit as many people as possible – e.g. an improvement to GHC will benefit more people than an update to a specific library or tool, but both are acceptable. New libraries and applications written in Haskell, rather than improvements to existing ones, are also welcome.
For students/contributors
We have added some tips on writing a proposal here. Please be aware that:
- This is not an all-inclusive list, so you can apply for projects not in this list and we will try our best to match you with a mentor.
- You can apply for up to two ideas (but only one can be accepted).
Table of Contents
- cabal-install security vulnerability checking
- improved override semantics for cabal.project files
- Continuous Integration Log Explorer
- Enhance Hackage server to display security vulnerability information
- Haskell Language Server Cabal Plugin Continuation
- Use GHC’s Structured Diagnostics in HLS
- Haskell Language Server Test Suite Improvements
- Inlay hints in haskell-language-server
- Improve name resolution in Liquid Haskell
- Parse error recovery and incrementality for GHC
cabal-install security vulnerability checking
cabal-install is a widely used tool for building Haskell projects. In addition to building and testing packages it can update package indexes from remote servers and handles some aspects of dependency management.
The Haskell Security Response Team maintains the Haskell Security Advisory Database. This database can serve as the basis for enhancing security tooling for the Haskell ecosystem.
In particular, the Advisory Database records known vulnerabilities of packages in the Hackage namespace. The advisory data includes the affected version ranges, written summary and details of the vulnerability, CVSS score and CWE numbers.
We propose the addition of security vulnerability checking to the cabal-install tool. For discussion purposes, this document suggests the cabal audit subcommand name, but this is just a suggestion.
When executed in package or project context, cabal audit would analyse the dependencies of the package/project and advise when vulnerable dependencies are found. There are several considerations that warrant further discussion.
- .cabal files, in general, specify version bounds via the build-depends field. In the general case, any overlap between a dependency’s version bounds specified in the build-depends field and the known vulnerable version ranges specified in an advisory should be reported.
- cabal-install can produce freeze files via the cabal freeze subcommand, which specify an exact set of dependencies locked at particular versions. cabal audit should have a mode that analyses freeze files. (This mode could work with an explicit input file, outside of a package or project directory context.)
- In some cases, the vulnerable behaviour in a dependency is not used by the dependent library or program. A mechanism (or mechanisms) to suppress false positives is a requirement. Some or all of the following mechanisms should be considered:
  - A local cache of suppressions. In plain words, this would record information such as “suppress HSEC-2024-0001 for package acme-frobnicator”.
  - Extending the Cabal package metadata to enable package maintainers or trustees to record such suppressions in the package metadata itself. The data would be propagated through package indexes (e.g. Hackage) and ensure that users do not see false positives, after the metadata have been uploaded to the package index.
  - Extending the Advisory Database to record non-exploitability information. This is an alternative way of expressing and propagating the same data as the preceding point. The approaches are not mutually exclusive. The student should engage with the community and pursue a consensus on which approach is preferred, or if both are desired, which should be prioritised.
- The VEX (Vulnerability Exploitability eXchange) standard might provide an appropriate data model for recording and/or transmitting (non-)exploitability information.
- The Advisory Database can optionally record, for each advisory, the names of the problematic functions/values. It may be possible to use this data to produce exploitability information, but how to do so may be complicated or error prone. Depending on progress made, the student may or may not wish to pursue this idea.
- Commands or behaviours to assist the user in reporting vulnerabilities to the Advisory Database are another idea to consider, if time permits.
See also David Christian’s call to action for writing a security advisory analyser for Haskell, which discusses the same general topic.
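To illustrate the overlap check and the local suppression list described above, here is a minimal sketch using only the base library. It is not how cabal-install or the Advisory Database model things: the Range, Advisory and audit names are hypothetical, versions are plain lists of numbers, and every range is simplified to a single half-open interval, whereas real Cabal version ranges can be unions and intersections.

```haskell
-- A version is a list of numeric components, e.g. [1,2,0] for 1.2.0.
-- Lists compare lexicographically, which is enough for this sketch.
type Version = [Int]

-- Half-open interval [rangeLower, rangeUpper); Nothing means no upper bound.
data Range = Range
  { rangeLower :: Version
  , rangeUpper :: Maybe Version
  } deriving (Eq, Show)

-- Two half-open intervals overlap when each starts before the other ends.
overlaps :: Range -> Range -> Bool
overlaps a b = startsBeforeEnd a b && startsBeforeEnd b a
  where
    startsBeforeEnd x y = maybe True (rangeLower x <) (rangeUpper y)

-- Hypothetical advisory record: identifier, package name, affected ranges.
data Advisory = Advisory
  { advisoryId      :: String
  , advisoryPackage :: String
  , advisoryRanges  :: [Range]
  } deriving (Eq, Show)

-- Report the advisories whose affected ranges intersect the declared bounds
-- of a dependency, skipping any (advisory id, package) pair that the user
-- has recorded as suppressed.
audit :: [(String, String)] -> [(String, Range)] -> [Advisory] -> [String]
audit suppressed deps advisories =
  [ advisoryId adv
  | (dep, bounds) <- deps
  , adv <- advisories
  , advisoryPackage adv == dep
  , any (overlaps bounds) (advisoryRanges adv)
  , (advisoryId adv, dep) `notElem` suppressed
  ]
```

Loaded in GHCi, a dependency bounded as at least 1.2 and below 1.3 would be flagged by an advisory affecting versions from 1.0 up to but excluding 1.2.3, unless the pair of advisory and package appears in the suppression list. A freeze file corresponds to the special case where each range pins a single version.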
Mentorship
Ideally a Cabal/cabal-install developer/maintainer would be able to mentor the student.
The Haskell Security Response Team can mentor and collaborate with respect to the Advisory Database, the content of advisories, expanding the database to include exploitability information, or exporting the data in a format suitable for use by hackage-server.
Difficulty and size
A Cabal/cabal-install maintainer should weigh in, but this whole effort definitely lies on the larger end.
Difficulty: Medium/Hard
Size: 350 hours
improved override semantics for cabal.project files
The files which cabal uses to configure multipackage projects (cabal.project files) have been extended in recent years to allow includes and conditionals.
As a result, it is now both more common and more useful for certain stanzas (build constraints, flags, index-state, etc.) to override other stanzas rather than simply augment them. At times we may still want augmenting semantics, so there is a delicate balance.
The contents of different stanzas in cabal.project files are monoidally accumulated. However, the monoid for each stanza was chosen without much thought, typically with either purely accumulating or purely replacement semantics.
This project is an audit of the monoidal semantics of the various settings that can be controlled by cabal.project, together with the proposal and implementation of more useful ones.
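To make the distinction concrete, here is a small sketch of how the choice of monoid decides whether a later file augments or overrides an earlier one. The Settings record and its field names are hypothetical and are not Cabal's actual types; lists stand in for accumulating semantics and Data.Monoid's Last for replacement semantics.

```haskell
import Data.Monoid (Last (..))

-- A toy slice of project configuration (illustrative field names only).
data Settings = Settings
  { constraints :: [String]     -- accumulating: later files add to earlier ones
  , indexState  :: Last String  -- replacing: the last file to set it wins
  } deriving (Eq, Show)

-- Combine field by field, so each field keeps its own semantics.
instance Semigroup Settings where
  a <> b = Settings
    { constraints = constraints a <> constraints b
    , indexState  = indexState a <> indexState b
    }

instance Monoid Settings where
  mempty = Settings mempty mempty

-- Settings from an included file, and from the file that includes it.
included, including :: Settings
included  = Settings ["aeson <2.2"]  (Last (Just "2024-01-01T00:00:00Z"))
including = Settings ["aeson >=2.2"] (Last (Just "2024-03-01T00:00:00Z"))

combined :: Settings
combined = included <> including
```

Here the index-state of the including file replaces the earlier one, while the two constraints simply pile up, even though they contradict each other. Deciding which settings should behave in which way is the kind of question the audit would need to answer.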
Related tickets are
- https://github.com/haskell/cabal/issues/8568
- https://github.com/haskell/cabal/pull/9510
- https://github.com/haskell/cabal/issues/7556#issuecomment-1120433903
Mentorship
Gershom is willing to mentor.
Difficulty and size
The difficulty of implementation is medium at most, but this will require somebody who is able to thoughtfully inventory, think through, and propose the specifics of a solution, ideally with prior experience with cabal project files as a user, and also with a sense of existing user workflows. The size of the project is likely 175 hours.
Continuous Integration Log Explorer
Goals
Create a web-based tool for exploring continuous integration test logs, suitable for large projects whose big workflows are susceptible to rare intermittent failures.
There are two components to this goal.
- Create a service that automatically inserts test logs into a full text search database.
- Create a web tool for querying the full text search database and visualizing results.
Background
The Haskell compiler GHC has an old testsuite that is slowly lumbering into the modern era. As more aspects of GHC are tested automatically, rare intermittent failures that cause spurious test results are uncovered. As more infrastructure is added to support automation, the surface area for such spurious failures increases. Collectively, the intermittent failures affect many CI runs and can create a frustrating experience for would-be GHC contributors.
One successful technique for combating intermittent failures is to collect data from many test runs and look for patterns. By finding the “fingerprint” of a particular failure, we can identify whether it is indeed spurious, what circumstances accompany the failure, and how frequently it occurs. This information can be used to identify the root cause and fix the failure. At the very least, it can be used to recover from the failure automatically, giving contributors a smoother experience.
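As a rough illustration of the fingerprint idea, the sketch below searches a set of in-memory job logs for a marker substring and estimates how often it occurs. The JobLog type and function names are hypothetical, and a plain substring search stands in for the full text search database that the project would actually use.

```haskell
import Data.List (isInfixOf)

-- A job log, identified by a job number (hypothetical shape).
data JobLog = JobLog
  { jobId    :: Int
  , logLines :: [String]
  }

-- A "fingerprint" here is simply a substring that marks a known failure.
type Fingerprint = String

-- Jobs whose log contains the fingerprint on at least one line.
matchingJobs :: Fingerprint -> [JobLog] -> [Int]
matchingJobs fp logs =
  [ jobId l | l <- logs, any (fp `isInfixOf`) (logLines l) ]

-- Fraction of jobs that exhibit the fingerprint: a rough frequency estimate.
failureRate :: Fingerprint -> [JobLog] -> Double
failureRate _  []   = 0
failureRate fp logs =
  fromIntegral (length (matchingJobs fp logs)) / fromIntegral (length logs)
```

Knowing which jobs match, and how often, is the starting point for asking what those jobs have in common, such as the runner, the platform, or the time of day.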
Existing Tooling
Some tooling to support this technique is found at https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/local-tooling. It requires the user to manually download all job logs, and the “interface” is nothing more than a SQLite database. This project will improve on the idea.
There is already a service that listens to job events, found at https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/spuriobot. Therefore, the first component of the project goal (creating a service that automatically inserts test logs into a full text search database) will only need to extend that service with the log-insertion feature.
Outcomes
Phase 1: The tool will be implemented and brought online with a basic user interface. It will only support GHC.
Phase 2, option 1: Guided by user feedback, better visualizations will be added to the UI.
Phase 2, option 2: The service that automatically inserts test logs into a full text search database will be extended to support GitHub workflows, allowing the tool to be used much more widely.
Phase 2, option 3: Use the tool to characterize spurious failures in GHC. There is a large list of potential spurious failures that can be investigated. And maybe fix them!
Size
Estimated at 175 hours.
The first deliverable, described in Phase 1, is small. By choosing from the Phase 2 options, however, the project can be extended to medium or large as suits the circumstances.
Required Skills
- Read and write technical English
- Haskell programming basics
Suitable for the Following Interests
- devops
- Haskell tooling
- web app development
- web services
- data visualization
Project Mentor
- Bryan Richter, Haskell Foundation DevOps engineer and author of existing tooling
Enhance Hackage server to display security vulnerability information
Hackage is the Haskell community’s central package archive of open source software. It is an instance of the open source hackage-server program.
The Haskell Security Response Team maintains the Haskell Security Advisory Database. This database can serve as the basis for enhancing security tooling for the Haskell ecosystem.
In particular, the Advisory Database records known vulnerabilities of packages in the Hackage namespace. The advisory data includes the affected version ranges, written summary and details of the vulnerability, CVSS score and CWE numbers.
We propose to enhance hackage-server to use the advisory database to augment package pages with security information about the package. In particular, we propose:
- Updating package/version pages to clearly indicate that the package/version contains known security issues, and provide details of those issues (a brief summary with a link to an external resource could be sufficient).
- Updating package/version pages to clearly indicate that the package/version depends on (or may depend on, according to version bounds) vulnerable versions of other packages.
- Providing a link or information on every package page about how to report security vulnerabilities in that package. This could be a form that creates an issue or pull request against the security-advisories repository, sends an email to the SRT, or something along those lines.
Mentorship
Ideally someone familiar with the hackage-server implementation would be able to mentor the student.
The Haskell Security Response Team can mentor and collaborate with respect to the Advisory Database, the content of advisories, or exporting the data in a format suitable for use by hackage-server.
Difficulty and size
Intermediate difficulty - 175 Hours
Haskell Language Server Cabal Plugin Continuation
The hls-cabal-plugin is a Haskell Language Server (HLS) plugin that allows HLS to be a Language Server for .cabal files as well as Haskell files.
While the plugin already provides many core features, there are more possible features that would increase the ergonomics of working with cabal files, such as:
- Goto-Definition for local stanzas, such as library, executable or common stanzas
- Integrating cabal-add into HLS
- Prompt to add unknown modules to exposed-modules and other-modules sections.
- Completion of local and non-local package names
- Completion of package version bounds
- Showing documentation for keywords and enum values.
With some creativity, we can come up with many more features.
Mentorship
Fendor
Difficulty and size
The difficulty of this project is medium, as there are two rather big existing projects that developers need to understand in order to provide improvements.
The estimated size of this project is 175 hours, but there is likely 350 hours’ worth of work depending on the mentee’s interests and ideas.
Use GHC’s Structured Diagnostics in HLS
Haskell Language Server provides many quick fixes and refactorings to fix common errors and warnings reported by GHC. This is done in the hls-refactor-plugin, but is implemented by manually parsing the text of GHC’s error/warning messages. However, since GHC 9.2, GHC provides a much more structured representation of its error messages, which should allow a much more robust implementation of these refactorings and avoid fragile regular expression parsing of plain text messages.
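The difference between the two approaches can be sketched as follows. Everything here is hypothetical: the message format is invented, and the Diagnostic and QuickFix types are far simpler than GHC's real diagnostic types or the code in hls-refactor-plugin.

```haskell
import Data.List (stripPrefix)

-- Fragile approach: recover the suggested name by matching on rendered text.
-- Any change to the wording of the message silently breaks the match.
suggestionFromText :: String -> Maybe String
suggestionFromText = stripPrefix "Perhaps you meant: "

-- Structured approach: the diagnostic is a value (hypothetical type),
-- so producing a fix is an ordinary pattern match.
data Diagnostic
  = NotInScope String [String]  -- unknown name, similar names in scope
  | UnusedImport String         -- name of the unused module
  deriving (Eq, Show)

data QuickFix
  = ReplaceName String String   -- old name, new name
  | RemoveImport String
  deriving (Eq, Show)

-- One quick fix per suggestion carried by the diagnostic.
quickFixes :: Diagnostic -> [QuickFix]
quickFixes (NotInScope bad candidates) = map (ReplaceName bad) candidates
quickFixes (UnusedImport m)            = [RemoveImport m]
```

With the text-based function, a message reworded to "Perhaps you meant 'foo'" no longer matches at all, whereas the structured version is unaffected by how the message is rendered, and the compiler can warn when a new constructor is left unhandled.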
Task List
- Add infrastructure to ghcide and HLS to keep and associate structured error messages with the diagnostics we send to the editor so that it is possible to look up the associated structured error value for any diagnostic that the editor wants us to fix.
- Port the existing refactorings to use the diagnostic infrastructure instead of relying on parsing rendered error messages.
- Catalog any existing refactorings that cannot be ported to use the new structured diagnostic infrastructure, and possibly make GHC MRs to remedy the situation.
- Implement more refactorings that are enabled by taking advantage of more suggestions in structured error messages.
References:
- https://github.com/haskell/haskell-language-server/issues/2014
- https://gitlab.haskell.org/ghc/ghc/-/issues/18516
- https://github.com/ghc-proposals/ghc-proposals/pull/306
- https://gitlab.haskell.org/ghc/ghc/-/wikis/Errors-as-(structured)-values
Mentorship
Zubin Duggal will mentor
Difficulty: Medium
Size: 175 hours should be enough to implement the infrastructure required and port most existing diagnostics. However, the work can easily expand to 350 hours to finish porting all diagnostics and making any necessary changes to GHC itself.
Haskell Language Server Test Suite Improvements
Haskell Language Server (HLS) has an extensive test suite that is run on every commit. Over time, the test suite has degraded in performance, reliability and consistency. This has become a bottleneck in the development of HLS, as a slow and unreliable test suite deters new contributors and makes life more difficult for maintainers.
This project aims to improve the quality of the test suite by:
- Reducing the overall test suite execution time
- Fixing flaky test cases
- Removing artificial wait times in test cases
- Unifying the style of tests in HLS
Issue #3736 provides some ideas on how to improve the test suite further.
To achieve the aforementioned goals, some of the following intermediate steps could be helpful:
- Unifying testing infrastructure of ghcide and plugins
- Enabling parallelism of test case execution
- Exploiting custom LSP messages to reduce flakiness of tests written in lsp-test
Mentorship
Fendor
Difficulty and size
Intermediate - 175 hours
The project itself is not too difficult, as there is lots of prior work and plenty of low-hanging fruit. However, there is a fair amount of work with the internals of Haskell Language Server, which can be intimidating as they tend to be underdocumented.
The size of this project ranges from 175 hours up to 350 hours, depending on the exact scope of the proposal.
Inlay hints in haskell-language-server
Inlay hints are a relatively new language server protocol feature that allows servers to display additional information inline in the user’s editor, and in some cases trigger edits when clicked.
They have a wide variety of uses; the HLS issue discusses some of them. For example:
- Replacing the bulky import lens with a compact inlay hint
- Type annotations on various kinds of binding
- Explicit display of record field names when they are omitted
- More we haven’t thought of!
This project would be to try and implement as many of these inlay hint uses as possible in HLS.
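To give a feel for what an inlay hint is, the sketch below models a hint as a column plus a label and renders a source line the way an editor might display it. The InlayHint type here is a hypothetical simplification: the LSP type also carries a line number, a kind and optional text edits, and HLS would use the types from the lsp library rather than these.

```haskell
import Data.List (sortOn)

-- A simplified inlay hint: a zero-based column and the text to show there.
data InlayHint = InlayHint
  { hintColumn :: Int
  , hintLabel  :: String
  }

-- Render a source line as an editor might display it, with hints spliced in.
-- Hints are applied from the rightmost column to the leftmost, so inserting
-- one label does not shift the columns of the hints still to be applied.
renderLine :: String -> [InlayHint] -> String
renderLine src hints = foldl splice src (sortOn (negate . hintColumn) hints)
  where
    splice acc h =
      let (before, after) = splitAt (hintColumn h) acc
      in  before ++ hintLabel h ++ after
```

For a type annotation hint, the line "let x = 42" with a hint at column 5 labelled " :: Int" would be shown as "let x :: Int = 42", while the underlying file stays unchanged.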
Mentorship
Michael Peyton Jones will mentor.
Difficulty and size
Much of this project will not be too difficult; however, it will require understanding and modifying many parts of HLS’s codebase, which is non-trivial. We may also need to work out how to get additional information from GHC, which will require interacting with GHC, and possibly contributing fixes upstream.
Difficulty: medium
Size: A basic version is probably doable in 175 hours, but there is probably 350 hours’ worth of work, or the project could extend into implementing other missing LSP features.
Improve name resolution in Liquid Haskell
Liquid Haskell is a tool to verify Haskell programs. The programmer supplies specifications for functions and data types in special comments, and the tool produces error messages when it cannot automatically prove that the program behaves as specified. The programmer can then improve the specifications or the program until the tool reports no errors. More information is available on the Liquid Haskell website. Similar to a type system in spirit, Liquid Haskell enrols the computer in the effort to discover programming mistakes.
Liquid Haskell has been around for a decade and is a versatile tool, but it still has a few issues which hinder the user experience. This project is to address one of these shortcomings.
When analyzing a program, Liquid Haskell needs to link identifiers that appear in specifications to the entities they refer to, much in the same way that a compiler needs to link identifiers in the text of a function to the language entities that they represent. We refer to this task as name resolution.
The identifiers in a specification can refer to other specifications, or they can refer to Haskell entities like functions and types. The output of name resolution tells for each identifier the module in which the referred entity is defined and the package it comes from.
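As a rough picture of what name resolution produces, the sketch below looks an identifier up in a scope and reports the package and module it comes from. All names are hypothetical and the scope is a plain association list; this is not how Liquid Haskell or GHC represent scopes internally.

```haskell
import Data.List (nub)

-- Where an entity is defined: the output of name resolution.
data Origin = Origin
  { originPackage :: String
  , originModule  :: String
  } deriving (Eq, Show)

data Resolution
  = Resolved Origin
  | Ambiguous [Origin]
  | NotInScope
  deriving (Eq, Show)

-- The names in scope, as (identifier, origin) pairs gathered from imports.
type Scope = [(String, Origin)]

-- Look up every origin for the identifier; importing the same entity twice
-- is not an ambiguity, so duplicates are removed first.
resolve :: Scope -> String -> Resolution
resolve scope name =
  case nub [o | (n, o) <- scope, n == name] of
    []  -> NotInScope
    [o] -> Resolved o
    os  -> Ambiguous os
```

If two passes run a function like this against scopes that were built differently, the same identifier can resolve to different origins, which is one way to picture how inconsistent results can arise.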
At the moment, name resolution in Liquid Haskell is done twice for the same specifications, and the two passes do not always yield the same result, leading to confusing errors and tedious workarounds. This project is about having name resolution done only once. For more details please see the GitHub issue. The project involves both the implementation of a solution and the writing of a blog post summarizing the achievements.
As a stretch goal, this project might specify and fix the scope rules and mechanisms to use in Liquid Haskell specifications. While the intention of Liquid Haskell maintainers over time has been to imitate GHC scoping rules as much as possible, there are cases where Liquid Haskell just deviates from them. As part of attaining this supplementary stretch goal, one would need to answer questions like the following.
- What names are in scope when writing the specifications of a module?
- How should ambiguities be resolved when imports offer definitions with the same Liquid Haskell names?
- What Liquid Haskell names are exported from modules and when?
Potential Mentors: Facundo Domínguez
Difficulty: Medium
Size: 350 hours, though this is flexible and can be changed by adjusting the scope
Parse error recovery and incrementality for GHC
GHC is able to report multiple type errors at once, yet a single parser error brings the whole compilation pipeline to a halt; see this tech proposal.
One significant obstacle is the parser generator happy that GHC relies on for versatile and fast parsing: The current error handling architecture exposed by happy will abort on the first parse error without producing a partial syntax tree at all.
This draft PR improves happy to resume parsing after reporting a parse error, but it lacks documentation, introduces a number of breaking changes and is in bad need of cleanup. Nevertheless, it is technically complete, passes the testsuite and has already been tried on GHC as a proof of concept.
The goal of this project is to take over the pull request to happy so that it can be merged, and then use the improved happy to generate multiple and better parse error messages in GHC.
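To show what resuming after a parse error can look like, here is a toy hand-written parser that uses simple panic-mode recovery: on a malformed statement it records an error, skips past the next semicolon, and carries on. This is only an illustration of the general idea under a made-up grammar; it is not the mechanism implemented in the happy pull request, and the Token and Stmt types are hypothetical.

```haskell
data Token = TIdent String | TNum Int | TEquals | TSemi
  deriving (Eq, Show)

data Stmt = Assign String Int
  deriving (Eq, Show)

-- Parse a sequence of statements of the form:  ident = number ;
-- Returns every error found together with a partial syntax tree made of
-- the statements that did parse, so one mistake does not hide the rest.
parseProgram :: [Token] -> ([String], [Stmt])
parseProgram = go 0
  where
    go :: Int -> [Token] -> ([String], [Stmt])
    go _ [] = ([], [])
    -- A well-formed statement: keep it and continue.
    go n (TIdent x : TEquals : TNum v : TSemi : rest) =
      let (errs, stmts) = go (n + 1) rest
      in  (errs, Assign x v : stmts)
    -- Anything else: report it, then resynchronise after the next semicolon.
    -- At least one token is always consumed, so the parser terminates.
    go n toks =
      let rest          = drop 1 (dropWhile (/= TSemi) toks)
          (errs, stmts) = go (n + 1) rest
      in  (("parse error in statement " ++ show n) : errs, stmts)
```

Given the tokens for "a = 1 ; b = ; c = 3 ;", the parser reports one error for the second statement and still returns the assignments to a and c.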
There are a couple of stretch goals:
- happy could further be improved to pass a closure of its parse state to reduction actions, so as to enable incremental parsing in GHC’s parser.
- Improve happy so that it provides a convenient and encapsulated way to introspect the LALR item stack, for example to identify bracketing productions such as '(' expr . ')' in GHC’s parser in order to report mismatched brackets. There is a hacky GHC Merge Request that tries to achieve as much without buy in from happy.
- Improve happy’s code base, which by now is over 25 years old. For example, recently happy has been modularised, thus split into multiple independently usable packages (for modelling grammars, building LALR tables, producing Haskell code from LALR tables, etc.), but unfortunately the individual packages lack documentation and examples.
Potential Mentors: Sebastian Graf
Difficulty: Medium, given that the technical bits have been drafted out. Still, the student would be required to familiarise themselves with the basics of LALR parsing theory in order to contribute documentation.
Size: 175 hours for merging the PR and beginning to improve GHC, but 350 hours can easily be spent on working on stretch goals as well for significant improvement of GHC.