GSoC 2019 Ideas
For project maintainers
This is a list of ideas for students who are considering applying to Google Summer of Code 2019 for Haskell.org. You can contribute ideas by sending a pull request to our GitHub repository. If you just want to discuss a possible idea, please contact us.
Please be aware that:
- This is not an all-inclusive list, so you can apply for projects not in this list and we will try our best to match you with a mentor.
- You can apply for as many ideas as you want (but only one can be accepted).
- Some general tips on writing a proposal are discussed here.
Table of Contents
- HsYAML Improvements
- More graph algorithms for Alga
- Implement more library widgets for brick
- Improve the CodeWorld environment for Haskell in K-12 education
- Dhall language server backend
- Enhance large eventlog handling in ThreadScope
- Extract heap profiles from the event log
- Build-integration and Badges for Hackage
- Finish the package candidate workflow for Hackage
- Hadrian Optimisation
- Functional Machine Learning with Hasktorch
- issue-wanted: Web app for discovering potential contributions to the Haskell ecosystem
- Property-based testing stateful programs using QuickCheck
- Make servant-auth "open" for defining custom auth schemas
- Stack performance improvements
- Streaming JSON/YAML parser
- Implement tokenstream-based parsing in `aeson`
- Implement accepted GHC proposals
HsYAML Improvements
HsYAML is a pure Haskell, idiomatic implementation of the YAML 1.2 data serialization language with a strong emphasis on compliance with the YAML 1.2 specification.
HsYAML leverages Haskell’s predisposition for writing language parsers, and by implementing the YAML parser natively in Haskell it also avoids the need for interfacing via FFI with C-based libraries such as LibYAML. Benefits of this approach include:
- Trivially portable to Haskell implementations such as Eta or GHCJS which are not C-based
- Avoiding the risk of vulnerabilities commonly associated with C-based implementations (e.g. CVE-2013-6393, CVE-2014-2525, CVE-2014-9130), which is important for web services or other systems consuming potentially malicious YAML data
- Control over the design of all codepaths of the YAML pipeline, as we don’t need to subject ourselves to the design choices of a pre-existing C system library’s API (such as LibYAML) or deal with different versions of said libraries being pre-installed
- Easier to audit, maintain, develop and debug by avoiding the added complexity of dealing with two languages and being subjected to the complexities and limitations of FFI (including memory allocation and data marshalling); this also avoids the need to bypass the typechecker via `unsafePerformIO`
While the HsYAML library in its current form is already successfully in use, there is still lots of room for improvement; potential improvements include:
- Implement/finish YAML pipeline for dumping/emitting YAML
- Extend data-model to allow for load/dump round-tripping while preserving ordering, anchors, and comments (to allow automated non-lossy refactoring/modification of YAML documents)
- Improve error handling (e.g. more accurate error messages; provide source locations and/or fragments in the higher native/representation layers)
- Improve/optimize performance
- Integrate results from the tokenstream-based parsing in `aeson` project (if those become available early enough during GSoC)
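The round-tripping item above can be made more concrete with a minimal sketch of a richer node type. It assumes a simplified YAML subset of plain scalars and mappings, and the names (`Node`, `Body`, `Pos`, `emitEntries`) are hypothetical, not part of HsYAML's API. The idea is that storing mappings as ordered association lists, and keeping anchors, comments and source positions on each node, lets a dumper reproduce the input instead of normalising it.

```haskell
-- Hypothetical round-trip-friendly node type (not HsYAML's current API).
data Pos = Pos { posLine :: Int, posColumn :: Int }
  deriving (Eq, Show)

data Node = Node
  { nodeComments :: [String]     -- comment lines that precede the node
  , nodeAnchor   :: Maybe String -- e.g. Just "base" for "&base"
  , nodePos      :: Maybe Pos    -- source location, for error messages
  , nodeBody     :: Body
  } deriving (Eq, Show)

data Body
  = Scalar String              -- plain scalars only in this sketch
  | Mapping [(String, Node)]   -- association list keeps the source key order
  deriving (Eq, Show)

-- Emit block-style YAML lines for the entries of a mapping, re-emitting
-- comments and anchors and keeping keys in their original order.
emitEntries :: Int -> [(String, Node)] -> [String]
emitEntries indent = concatMap entry
  where
    pad = replicate indent ' '
    anchorOf n = maybe "" (" &" ++) (nodeAnchor n)
    entry (key, n) =
      map (\c -> pad ++ "# " ++ c) (nodeComments n)
        ++ case nodeBody n of
             Scalar s    -> [pad ++ key ++ ":" ++ anchorOf n ++ " " ++ s]
             Mapping kvs -> (pad ++ key ++ ":" ++ anchorOf n) : emitEntries (indent + 2) kvs
```

Note that key `b` is emitted before `a` if it came first in the source, which a `Map`-based data model could not guarantee.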
Note that the list above is only a non-exhaustive list of potential improvements for the student to pick from; you don’t have to accomplish everything on that list, nor are the items on that list the only things worth doing!
This project would be a good opportunity for intermediate students to work on a parser library for a popular non-trivial data format.
Skills/knowledge expected from the student
Basic knowledge of how parsers work; ability to understand a language grammar
Basic knowledge of Haskell, e.g. being able to write a Haskell parser and printer for a simple language, and being familiar with popular Haskell APIs for other data formats
Having a general appreciation and understanding of the YAML 1.2 format and its feature-set/data-model (intimate knowledge of the YAML 1.2 grammar productions is desirable but not essential; help will be provided in case it becomes necessary to look at specific grammar productions)
Mentor: Herbert Valerio Riedel
More graph algorithms for Alga
The Algebraic Graphs library has so far succeeded (thanks in part to last year’s Summer of Code) in providing a common, type-safe interface for graph instances. However, many graph algorithms are still missing.
Therefore the goal of this year’s Summer of Code is to increase the usefulness of Alga for a wider audience:
Goal 1: Develop a type-safe representation for acyclic graphs in Alga.
Acyclic graphs are both common and heavily used in dependency management. Improvements in this area would therefore directly benefit downstream packages like build, plutus or aura, as well as a few commercial users of the library.
In particular, the result should be a type-safe abstraction that makes it easier to work with algorithms like `topSort`, as has been remarked in some issues.
This includes adding tests to the testsuite and writing a short blog post or article documenting the design and possible use cases.
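The following is a rough sketch of the idea. It uses a plain `Data.Map` adjacency map rather than Alga's own types, so none of the names below are Alga API. Acyclicity is checked once in a smart constructor, using Kahn's algorithm here, after which `topSort` becomes total instead of returning a `Maybe`. In a real module the `Acyclic` constructor would be kept abstract, so values could only be obtained through `toAcyclic`.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Adjacency map: vertex -> set of successors (stand-in for Alga's types).
type Adj v = Map.Map v (Set.Set v)

-- Evidence that a graph is acyclic, together with one topological order.
data Acyclic v = Acyclic (Adj v) [v]

-- Kahn's algorithm: repeatedly remove vertices with in-degree zero.
-- Returns Nothing if some vertices can never be removed (they lie on a cycle).
toAcyclic :: Ord v => Adj v -> Maybe (Acyclic v)
toAcyclic g0 = go ready0 inDegree0 []
  where
    -- Make every vertex that only appears as a successor a key as well.
    g = Map.unionWith Set.union g0
          (Map.fromList [(u, Set.empty) | s <- Map.elems g0, u <- Set.toList s])
    inDegree0 = Map.unionWith (+) (Map.map (const 0) g)
          (Map.fromListWith (+) [(u, 1 :: Int) | s <- Map.elems g, u <- Set.toList s])
    ready0 = [v | (v, 0) <- Map.toList inDegree0]
    go [] _ acc
      | length acc == Map.size g = Just (Acyclic g (reverse acc))
      | otherwise                = Nothing
    go (v : ready) deg acc =
      let succs = Set.toList (Map.findWithDefault Set.empty v g)
          deg'  = foldl' (\d u -> Map.adjust (subtract 1) u d) deg succs
          newly = [u | u <- succs, Map.lookup u deg' == Just 0]
      in go (ready ++ newly) deg' (v : acc)

-- Total: no Maybe needed, because acyclicity was established up front.
topSort :: Acyclic v -> [v]
topSort (Acyclic _ order) = order
```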
Goal 2: Implement common graph algorithms like Kruskal, Dijkstra and Moore-Bellman-Ford
Thanks to the edge-labelled algebraic graphs developed in the build-up of last year’s Summer of Code, it is now possible to encode distances in Alga; however, this feature is rarely used as there are few algorithms provided.
The student should provide algorithms solving the following problems:
- Finding a minimum spanning tree (e.g. Kruskal or Prim)
- Finding shortest paths in a graph
  - with non-negative edge weights (Dijkstra)
  - with conservative edge weights, i.e. no negative cycles (Moore-Bellman-Ford)
  - between all pairs of vertices (Floyd-Warshall or Dijkstra+MBF)
As with Goal 1, tests and documentation are mandatory.
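For reference, here is a minimal version of Dijkstra's algorithm on an adjacency map with non-negative `Int` weights. It is a standalone sketch, not written against Alga's edge-labelled graph types. A `Data.Set` of `(distance, vertex)` pairs serves as the priority queue, and stale entries are skipped when popped instead of decreasing keys in place.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Weighted adjacency: vertex -> [(neighbour, non-negative weight)].
type WGraph v = Map.Map v [(v, Int)]

-- Shortest distance from src to every reachable vertex.
dijkstra :: Ord v => WGraph v -> v -> Map.Map v Int
dijkstra g src = go (Set.singleton (0, src)) (Map.singleton src 0)
  where
    go frontier dist = case Set.minView frontier of
      Nothing -> dist
      Just ((d, v), rest)
        | d > Map.findWithDefault maxBound v dist -> go rest dist  -- stale entry
        | otherwise ->
            let relax (q, ds) (u, w)
                  | d + w < Map.findWithDefault maxBound u ds =
                      (Set.insert (d + w, u) q, Map.insert u (d + w) ds)
                  | otherwise = (q, ds)
                (frontier', dist') =
                  foldl' relax (rest, dist) (Map.findWithDefault [] v g)
            in go frontier' dist'
```

Vertices that are unreachable from `src` simply do not appear in the result map.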
Optional: Implement more advanced algorithms
A common problem in graph theory is the following: Given a network (e.g. a graph) with nodes s and t, find the maximum flow that can be sent over this network from s to t. While this might seem highly specialized at first, it can be used to solve a wide variety of problems.
Algorithms solving this or other common problems would be welcome, as they increase the scope of what Alga can be used for.
Mentors: Andrey Mokhov
Implement more library widgets for brick
brick is a library for writing terminal user interfaces in Haskell. Users compose interfaces using high-level combinators in order to create complicated interfaces.
A widget is a subcomponent of a user interface. A widget can be rendered, has an internal state and can respond to events. The library already provides several widget types, including drop-down menus, a file browser, text entry and so on, but users writing their own applications will invariably find that they have to implement their own widget at some point.
Examples can be found in existing applications:
- `gitlab-triage` implements a lazily loaded list which requests additional elements via HTTP requests.
- `gitlab-triage` implements an autocomplete dialog which combines a text field and a list.
- `purebred` is a complicated application with several widgets.
The goal of the project is to implement more widgets in a library which can be reused in other applications.
A successful proposal should specify an idea for at least four different types of widget to implement. It would be possible to start by extracting existing widgets from the previously mentioned projects, but by the end of the project the student should be implementing their own widgets from scratch.
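The sketch below illustrates the state/event/render split described above with a library-agnostic model of the autocomplete widget idea, a text field combined with a filtered list. All names are hypothetical. In brick itself the handler would run in `EventM` and rendering would build `Widget` values, probably reusing the existing `Brick.Widgets.Edit` and `Brick.Widgets.List` modules. This sketch deliberately avoids brick so that it stays pure.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Widget state: the query text, all candidates, and the selected index.
data Autocomplete = Autocomplete
  { acQuery      :: String
  , acCandidates :: [String]
  , acSelected   :: Int
  } deriving Show

data Event = KeyChar Char | KeyBackspace | KeyUp | KeyDown
  deriving Show

-- Candidates matching the query (case-insensitive prefix match).
matches :: Autocomplete -> [String]
matches ac = filter ((lower (acQuery ac) `isPrefixOf`) . lower) (acCandidates ac)
  where lower = map toLower

-- Event handling: edit the query or move the selection, then clamp it.
handleEvent :: Event -> Autocomplete -> Autocomplete
handleEvent ev ac = clamp $ case ev of
    KeyChar c    -> ac { acQuery = acQuery ac ++ [c], acSelected = 0 }
    KeyBackspace -> ac { acQuery = dropLast (acQuery ac), acSelected = 0 }
    KeyUp        -> ac { acSelected = acSelected ac - 1 }
    KeyDown      -> ac { acSelected = acSelected ac + 1 }
  where
    dropLast s = take (length s - 1) s
    clamp a = a { acSelected = max 0 (min (length (matches a) - 1) (acSelected a)) }

-- Rendering: the query line followed by the matches, selection marked "*".
render :: Autocomplete -> [String]
render ac = ("> " ++ acQuery ac) : zipWith line [0 ..] (matches ac)
  where line i m = (if i == acSelected ac then "* " else "  ") ++ m
```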
Potential Mentors: Roman Joost, Jonathan Daugherty (proposal review)
Improve the CodeWorld environment for Haskell in K-12 education
CodeWorld is a web-based tool designed to support Haskell in K-12 education, as well as provide a low-overhead playground for trying things in Haskell in general. There are several promising projects here for students interested in promoting functional programming in education.
Haskell-mode features. Difficulty: Intermediate. This project is to expand the usefulness of CodeWorld for the mainstream Haskell ecosystem. Some ideas here are collected in the relevant CodeWorld project on GitHub. Although the page at https://code.world implements a simplified variant of Haskell for teaching children, the CodeWorld project also provides https://code.world/haskell, which is a more vanilla Haskell environment. However, this “Haskell mode” environment could be made more useful in a few ways. Adapters for well-known Haskell libraries like Gloss, Diagrams, Reflex or others would be one project. More ambitious is to extend the I/O implementation to support reading from a stdin stream implemented in the browser. A strong proposal for this would include a vision of what features would be useful to the Haskell community.
Guided activities. Difficulty: Intermediate. Some reflections on this are available in CodeWorld Issue #793. Although CodeWorld is designed to be a general-purpose tool to create whatever you like, in practice many educators would like to assign specific activities, and have the environment guide their students through them. The environment could do things like make certain parts of the code uneditable or hidden, check answers and proceed to a next step or exercise in a sequence when complete, keep track of points (or “stars” as many casual games do) as students progress, and simplify the normal CodeWorld UI so that an exercise can be embedded into an iframe without the project saving UI and things like that.
Finish mobile app export. Difficulty: Intermediate. There is some code already written to export from CodeWorld projects to mobile apps, and CodeWorld Issue #22 has discussion and advice. It would be exciting to finally offer this feature to students learning with CodeWorld. Although the existing pull request implements a rough sketch of this feature, there is a lot of room for working out the details, such as offering on-screen controls that generate key events, fine tuning the developer experience, tuning the generated user interface, and building official apps so that student projects work on mobile without Android third-party sources or iOS jailbreaking. This would be an ideal project for a Summer of Code student with past experience developing for mobile platforms or using Apache Cordova.
Automated requirements checking. Difficulty: Intermediate to Advanced (depending on the features implemented). Some steps on this are available in the related CodeWorld project on GitHub. CodeWorld already has an experimental system for instructors to define a set of requirements for the source code of a project, such as containing certain source constructs, functions using all of their parameters, etc. There’s plenty of room to both extend the set of rule types this system knows about, and experiment with sample projects to prove its usefulness. An ambitious student could consider proposals to change the implementation so that the system also allows for runtime testing.
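As a toy illustration of one such rule, "functions use all of their parameters", the sketch below checks definitions in a tiny made-up expression language. It is not CodeWorld's implementation, which works on real Haskell source. The AST and the names `Def`, `unusedParams` and `checkAllParamsUsed` are assumptions for illustration.

```haskell
import qualified Data.Set as Set

-- Toy expression language (hypothetical stand-in for a Haskell AST).
data Expr = Var String | Lit Int | App Expr Expr | Lam String Expr
  deriving Show

-- A top-level definition: name, parameter names, body.
data Def = Def String [String] Expr

-- Variables used in an expression but not bound inside it.
freeVars :: Expr -> Set.Set String
freeVars (Var x)   = Set.singleton x
freeVars (Lit _)   = Set.empty
freeVars (App f a) = freeVars f `Set.union` freeVars a
freeVars (Lam x b) = Set.delete x (freeVars b)

-- Rule: every parameter must occur free in the body.
unusedParams :: Def -> [String]
unusedParams (Def _ params body) = filter (`Set.notMember` freeVars body) params

-- Turn the rule into a message a requirement checker could show students.
checkAllParamsUsed :: Def -> Either String ()
checkAllParamsUsed d@(Def name _ _) = case unusedParams d of
  []     -> Right ()
  unused -> Left (name ++ " does not use parameter(s): " ++ unwords unused)
```

Because `freeVars` respects binding, a parameter that is only "used" under a lambda that shadows it is still reported as unused.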
Expression-level debugging. Difficulty: Advanced. Some brainstorming on this project is already available in CodeWorld Issue #741. A promising approach is to create a Haskell source plugin which adds new module exports for each expression in a module, and generates a source map linking them to the original expression in the source code. These two things would support a UI for a programmer to select an expression and inspect its value. Although the output UI implementation may be specific to CodeWorld or another editor, the source plugin and a library of supporting code would be reusable by many tools.
Additional projects are also possible, and some of the best CodeWorld projects in Summer of Code have been students proposing their own ideas after familiarizing themselves with the environment.
Potential Mentors: Chris Smith, Brandon Barker
Difficulty: Varies (see specific ideas above)
Dhall language server backend
The Dhall configuration language is a programmable configuration language that is not Turing-complete. People most commonly migrate other configuration file formats (like YAML) to Dhall when their configuration files become large, repetitive, and unwieldy to maintain.
Dhall supports several features not commonly found in other programming languages, such as:
- Strong normalization of all expressions (even functions)
- Importing expressions from relative paths, URLs, or environment variables
- Strong language security guarantees, including semantic integrity checks
Dhall is a small language with a formal standard and a standard evolution process. This makes the language a slow-moving target that can be enriched with better developer tools, including support for integrated development environments (IDEs).
The Language Server Protocol provides a convenient path to broad IDE support. Any language server that implements the backend half of the protocol is compatible with any editor that implements the frontend half of the protocol (which is every widely used editor). This greatly reduces the effort necessary to add Dhall support for the large number of IDEs in the wild.
Initial work has already begun to implement a Dhall language server, but the work is not complete. An initial prototype integration is in place, powered by the Haskell bindings to Dhall and the Haskell implementation of the language server protocol, but remaining work includes:
- Supporting a broader range of IDE features
- Simplifying installation of the language server executable for IDE users
This project should be appropriate for an intermediate Haskell programmer to contribute to, possibly exercising the following skills depending on the specific contributions the student is interested in:
Programming languages and compilers
The student will most likely need to modify the Dhall package to extract relevant information for the language server, which will give them hands-on experience improving a real-world interpreter.
Providing simple language server installers for a wide variety of platforms will expose the student to a broad survey of packaging ecosystems and tools.
Mentor: Gabriel Gonzalez
Enhance large eventlog handling in ThreadScope
ThreadScope is a GUI for viewing and analyzing GHC events (a.k.a. “event logs”). One of the current limitations of ThreadScope is that it needs to deserialize and keep the whole event log file (the `.eventlog` file generated by the GHC runtime system) in memory. For a 1 GB eventlog file this requires more than 30 GB of memory. Given how easy it is to generate a 1 GB eventlog file (just run a threaded server under load for a few minutes), this renders ThreadScope useless for profiling long-running applications.
The goal of this project is to improve this by loading the necessary chunks of the `.eventlog` file (or of a pre-processed file derived from it) at runtime and releasing the unnecessary bits (thus freeing memory).
Some investigation into an implementation strategy has already been done by the mentor of this project; however, there are bugs to fix and code to refactor before actually implementing this feature, and the actual implementation plan is open to discussion.
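Here is one possible shape for the on-demand loading. It assumes the eventlog, or a pre-processed file, is split into time-ordered, non-overlapping chunks, with a small in-memory index from each chunk's start time to its byte offset. The viewer then reads only the chunks that overlap the visible time window and can release everything else. The `Chunk` type and `chunksFor` function are hypothetical, not existing ThreadScope code.

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

data Chunk = Chunk
  { chunkStart  :: Word64   -- first event timestamp in the chunk (ns)
  , chunkEnd    :: Word64   -- last event timestamp in the chunk (ns)
  , chunkOffset :: Integer  -- byte offset of the chunk in the file
  } deriving (Eq, Show)

-- Index keyed by chunkStart; chunks assumed sorted and non-overlapping.
type ChunkIndex = Map.Map Word64 Chunk

-- Chunks overlapping the visible window [from, to]: O(log n) to locate the
-- first candidate, then a walk over only the chunks that are needed.
chunksFor :: (Word64, Word64) -> ChunkIndex -> [Chunk]
chunksFor (from, to) idx = filter ((>= from) . chunkEnd) (before ++ inside)
  where
    -- the chunk starting at or before `from` may still contain it
    before = maybe [] (pure . snd) (Map.lookupLE from idx)
    -- chunks starting strictly after `from`, up to the end of the window
    inside = takeWhile ((<= to) . chunkStart) (Map.elems (snd (Map.split from idx)))
```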
In addition to Haskell experience, GTK+ experience will be useful for this project.
Mentor: Ömer Sinan Ağacan
Extract heap profiles from the event log
The event log is a general purpose mechanism for understanding what a Haskell program is doing. It is a binary stream of low-level information logging what the RTS is doing.
Ben Gamari recently extended the format so that heap profiling events are also written to the log. This means that, in theory, most of the information present in a heap profile is now available for consumption in the event log. Support still needs to be added for the biographical and retainer profilers.
However, tools such as `hp2pretty` still consume heap profiles. It would be beneficial if they consumed the event log instead, for at least two reasons. Firstly, it means that the specific logic relating to producing `.hp` files can be removed from the compiler. Secondly, users can better correlate other events in their program with the program’s memory usage. This would implement a feature present in `nhc98` over 20 years ago.
The goal of this project is to eliminate this redundancy by making the event log the primary way to produce heap profiles.
- Understand the event log and heap profile format. Write a program to convert the event log to a heap profile.
- Modify tools such as `hp2pretty` to directly consume an event log.
- Make the biographical and retainer profiler work with the event log.
- Modify the RTS so that turning on the heap profiler just means emitting the event log.
- Add support for marking user events emitted with `traceEvent` on a heap profile graph.
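The first step could start as a small renderer. This minimal sketch assumes the heap samples have already been decoded from the event log into a simple hypothetical `Sample` type, since real decoding would go through an eventlog parser such as the `ghc-events` package. It writes them in the textual `.hp` layout: a `JOB`/`DATE`/`SAMPLE_UNIT`/`VALUE_UNIT` header followed by `BEGIN_SAMPLE`/`END_SAMPLE` blocks of tab-separated label and byte-count lines.

```haskell
import Text.Printf (printf)

-- Hypothetical, already-decoded heap sample (not a ghc-events type).
data Sample = Sample
  { sampleTime    :: Double               -- seconds since program start
  , sampleEntries :: [(String, Integer)]  -- (cost centre / closure label, bytes)
  }

-- Render samples in the textual .hp layout read by hp2ps-style tools.
toHp :: String -> String -> [Sample] -> String
toHp job date samples = unlines (header ++ concatMap sampleLines samples)
  where
    header =
      [ "JOB " ++ show job
      , "DATE " ++ show date
      , "SAMPLE_UNIT \"seconds\""
      , "VALUE_UNIT \"bytes\""
      ]
    sampleLines (Sample t es) =
      [printf "BEGIN_SAMPLE %.2f" t]
        ++ [label ++ "\t" ++ show bytes | (label, bytes) <- es]
        ++ [printf "END_SAMPLE %.2f" t]
```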
The project is a great way to get to grips with how to analyse Haskell programs at a very low level, an area where there is much scope for innovation and improvement.
Potential Mentors: Matthew Pickering, Ben Gamari
There is also much more scope for projects involving the profiler and the event log.
Build-integration and Badges for Hackage
The Hackage docbuilder currently only gives a pass/fail result with generated documentation or failure logs. Ideally we should be able to infer and present a lot more interesting data about packages to encourage package maintainers. The existence of test suites and the extent of their coverage, the success of builds with different versions, the existence of benchmark suites, and even the extent of documentation can all be recognized with badges or shields.
This work involves extending the existing docbuilder to run more detailed builds and report more detailed data, as well as extending the Hackage UI to better display data both within cabal metadata and also as generated by the builder.
Additionally, it would be good to rearchitect the builder so that it doesn’t store its “unbuildable” set locally, but instead is locally stateless and driven by polling the Hackage server for instructions; this allows better scale-out and parallelization of builders, as well as distribution of work.
Potential Mentors: Gershom Bazerman, Herbert Valerio Riedel
Finish the package candidate workflow for Hackage
Hackage candidate packages currently cannot be used directly, and their UI could be improved. We would like to have new packages uploaded as candidates by default, to improve the vetting process (the `cabal` tool already uploads candidates when not passed the `--publish` flag). But this means polishing off the candidate functionality. The main remaining issues are tracked here.
The first step is moving the candidate display page to the new templating system and sharing code with the main package page. Following this, we need to implement a new candidate index, able to be provided as a secondary index. This would be a “v1” index, and mutable.
Beyond this we want to extend the docbuilder and docuploads to work with candidates, and then implement a fixed workflow from candidacy to validation and then publishing.
Mentors: Gershom Bazerman, Herbert Valerio Riedel
Hadrian Optimisation
Hadrian is a new build system for the Glasgow Haskell Compiler. Hadrian is written in Haskell and will hopefully become the default build system around GHC 8.8. Right now both Hadrian and the current Make-based build system peacefully co-exist in the GHC tree, and Hadrian jobs are run alongside the Make ones in our CI pipelines since the recent move to GitLab.
GHC is a large project. It takes a long time to build it, which slows down GHC development and is a bottleneck for the CI infrastructure. The goal of this project is to reduce the time it takes Hadrian to build GHC on various platforms.
The project comprises two parts:
Profiling: you will need to find and prioritise optimisation opportunities in Hadrian itself, in the Shake library that powers Hadrian, and in the GHC dependency graph (i.e. analyse the critical path of the build graph and figure out potential GHC refactoring that could reduce it).
Implementation and evaluation: you will pick the most promising optimisation opportunities and implement them, measuring and reporting on the resulting improvements.
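The critical-path analysis can be made concrete with a small computation. Given each build step's duration and dependencies, the code below computes the earliest finish times under unlimited parallelism and the longest chain of dependent steps. It is a hypothetical standalone sketch: the `BuildGraph` type is invented for illustration and is not Hadrian or Shake API. Steps on the resulting path are the ones whose speed-up can shorten the overall build.

```haskell
import qualified Data.Map as Map   -- lazy Map: entries may refer to each other
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical build graph: target -> (duration in seconds, dependencies).
-- Assumed acyclic, with every dependency also present as a key.
type BuildGraph = Map.Map String (Double, [String])

-- Earliest finish time of each target given unlimited parallelism, plus the
-- dependency that finishes last (its predecessor on the critical path).
finishTimes :: BuildGraph -> Map.Map String (Double, Maybe String)
finishTimes g = table
  where
    table = Map.map entry g   -- knot-tying: relies on the lazy Map
    entry (dur, deps) = case [(fst (table Map.! d), d) | d <- deps] of
      [] -> (dur, Nothing)
      ds -> let (t, d) = maximumBy (comparing fst) ds in (t + dur, Just d)

-- The longest chain of dependent steps, from first to last.
criticalPath :: BuildGraph -> [String]
criticalPath g
  | Map.null table = []
  | otherwise      = reverse (walk end)
  where
    table  = finishTimes g
    end    = fst (maximumBy (comparing (fst . snd)) (Map.toList table))
    walk t = t : maybe [] walk (snd (table Map.! t))
```

The strict `Data.Map.Strict` would not work here, because it forces each entry while the map is built, before the entries it refers to exist.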
We have already identified several promising opportunities for optimisation, and several profiling techniques that are certain to uncover others, so there are plenty of paths to follow.
The project is a great way to familiarise yourself with the GHC codebase, and make a concrete and measurable impact on the productivity of all GHC developers.
Mentors: Andrey Mokhov, Alp Mestanogullari, Neil Mitchell
Functional Machine Learning with Hasktorch
Machine learning with neural networks can be viewed as an approach to computation that relies on optimization over function composition. In spite of the use of pure functions, composability, and function transformations (such as differentiation), most available frameworks target dynamically-typed imperative languages rather than typed functional languages.
Hasktorch is a library for tensors and neural networks in Haskell. It is an independent open source project which leverages PyTorch’s C / C++ backend implementation and provides low-level bindings as well as higher-level abstractions for math and model-building.
For this GSoC project, there are two main areas to contribute to:
Target Libtorch 1.0 and Overhaul the Foreign Function Bindings
The next release’s FFI will undergo a substantial overhaul, tracking major upstream developments with libtorch/pytorch 1.0. We will change the low level parts of the stack in doing so. Code generation will be updated to be derived from a yaml spec (instead of parsing the C source) and target the new libtorch C++ backend (which subsumes the ATen/C functions). As new development, a second yaml spec defining function derivatives will be used to auto-generate a large set of composable differentiable functions that will be usable from Haskell.
Skills Used: C++, systems programming, FFI, parsers
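The following is a deliberately tiny scalar sketch of what "composable differentiable functions" means. It is not Hasktorch's API, which operates on tensors. Each function returns its value together with a backward function, and composition applies the chain rule. Generated bindings would take the derivative rules from the spec instead of having them written by hand, as `sinD` and `squareD` are here.

```haskell
-- A differentiable scalar function: evaluating it returns the result and a
-- backward function mapping an output gradient to an input gradient.
newtype D = D (Double -> (Double, Double -> Double))

-- Composition (g after f) applies the chain rule: backward passes run in
-- reverse order, g's first and then f's.
after :: D -> D -> D
after (D g) (D f) = D $ \x ->
  let (y, backF) = f x
      (z, backG) = g y
  in (z, backF . backG)

-- Primitives with hand-written derivative rules.
sinD, squareD :: D
sinD    = D $ \x -> (sin x, \dz -> dz * cos x)
squareD = D $ \x -> (x * x, \dz -> dz * 2 * x)

-- Gradient of the output with respect to the input at x (seed gradient 1).
grad :: D -> Double -> Double
grad (D f) x = snd (f x) 1
```

For example, `grad (squareD `after` sinD) x` evaluates to `2 * sin x * cos x`, i.e. `sin (2 * x)` up to rounding.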
Produce Functional Machine Learning Model Reference Implementations
At a higher level, we want to demonstrate proof-of-concept implementations of machine learning models for a variety of use cases including structured data, natural language processing, and image data. The goal would be to provide reference examples using variational autoencoders, RNNs, CNNs, and graph neural networks. This will also require implementing supporting modules for reusable optimizers and model interpretation/visualization.
We will explore how language capabilities such as dependent types and typeclasses combined with functional data transformation concepts such as lenses and pipes affect the representation and implementation of modern machine learning systems.
Skills Used: Machine learning, basic type-level programming, typeclasses
Mentors: Sam Stites, Austin Huang
issue-wanted: Web app for discovering potential contributions to the Haskell ecosystem
The Haskell ecosystem can be improved in many ways. There are many libraries in different categories with tons of issues, ranging from low-hanging fruit to expert-level problems. However, there is no easy way to discover potential problems you would like to work on. The goal of `issue-wanted` is to help find project issues using GitHub labels and various metadata fields from `.cabal` files.
The benefits of this project are the following:
- `issue-wanted` makes it easier for beginners to find interesting issues they can solve.
- The average maintenance quality of Haskell libraries will improve: if project owners want more open-source contributors, they can provide better metadata for their packages, and this actually pays off. As a consequence, such projects can become more popular and gather more attention.
issue-wanted contains several parts:
- A Haskell process that syncs information about Haskell repositories on GitHub.
- A Haskell backend that provides a REST API for the fetched information.
- An Elm frontend that displays this information.
The student is expected to implement the Haskell parts of the project on top of the existing minimalistic scaffold of `issue-wanted`. Minimal goal: be able to find issues from GitHub using the `issue-wanted` web interface by repository labels and the `category` field from the `.cabal` file. Maximal goal: implement user authentication via GitHub and an achievement system (to motivate contributors even more).
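This minimal sketch shows the search behind the minimal goal. It assumes the sync process has already loaded repositories and issues into memory as the hypothetical `Repo` and `Issue` records below, whereas the real project would query PostgreSQL. It relies on the `.cabal` `category` field being a comma-separated list.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical records the sync process might store.
data Repo = Repo { repoName :: String, repoCategories :: [String] }
data Issue = Issue { issueRepo :: String, issueTitle :: String, issueLabels :: [String] }

-- The .cabal "category" field is comma-separated, e.g. "Text, Parsing".
parseCategories :: String -> [String]
parseCategories = filter (not . null) . map trim . splitCommas
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
    splitCommas s = case break (== ',') s of
      (a, [])       -> [a]
      (a, _ : rest) -> a : splitCommas rest

-- Issues carrying any of the wanted labels, optionally restricted to
-- repositories whose package lists the given category.
findIssues :: [String] -> Maybe String -> [Repo] -> [Issue] -> [Issue]
findIssues labels category repos = filter wanted
  where
    wanted i = any (`elem` issueLabels i) labels && inCategory (issueRepo i)
    inCategory name = case category of
      Nothing -> True
      Just c  -> any (\r -> repoName r == name && c `elem` repoCategories r) repos
```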
This project is an excellent way for students to work on a real-world Haskell application using modern libraries and approaches.
What is expected from the student
- Basic knowledge of Haskell programming language concepts:
Monads, ability to write code with
- Very basic knowledge of SQL. At the very least, the student should not be afraid to learn and use SQL, though no complex queries are expected in the implementation.
- Patience. Reading GitHub API documentation and testing integration of multiple systems might be frustrating sometimes.
What the student can get from this project
- Experience writing REST API web services in Haskell.
- Familiarity with intermediate Haskell features like monad transformers and type-level computations.
- Knowledge of modern Haskell programming patterns and idioms.
Potential Mentors: Dmitrii Kovanikov, Veronika Romashkina
The mentors can take care of the Elm frontend and deployment, but students are not restricted to working with Haskell only. If they are interested in other parts of the project, they are free to experiment with those parts as well.
Suggested libraries for the project:
- `servant` for writing REST API servers and clients in Haskell
- PostgreSQL as database using
- JSON as a communication format between backend and frontend,
- `async` for writing concurrent code
Property-based testing stateful programs using QuickCheck
When the first version of `QuickCheck` was released for Haskell, it was the state of the art in testing. Today, however, it lags behind, for example, Erlang’s `eqc` libraries. The `quickcheck-state-machine` library is an attempt to add state machine modelling to Haskell’s QuickCheck for testing stateful/monadic code, and thereby catch up with the Erlang versions of QuickCheck.
This proposal is about using, and possibly extending, `quickcheck-state-machine` in order to improve the quality of Haskell code in general and of a specific project in particular.
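The core idea of state machine testing can be shown without the library. The same command sequence runs against a simple model and against the implementation under test, and their responses must agree. The sketch below does this by hand for a FIFO queue, and all names in it are hypothetical. `quickcheck-state-machine` automates the parts this sketch leaves out, such as generating and shrinking command sequences, driving monadic implementations, and checking parallel executions.

```haskell
import Data.Sequence (Seq, ViewL (..), viewl, (|>))
import qualified Data.Sequence as Seq

-- Commands a generator would produce (here supplied by hand).
data Cmd = Push Int | Pop
  deriving Show

-- Responses observed from either the model or the implementation.
data Resp = Ok | Popped (Maybe Int)
  deriving (Eq, Show)

-- Model: the simplest possible FIFO queue, a plain list.
stepModel :: [Int] -> Cmd -> ([Int], Resp)
stepModel q (Push x)   = (q ++ [x], Ok)
stepModel [] Pop       = ([], Popped Nothing)
stepModel (x : xs) Pop = (xs, Popped (Just x))

-- Implementation under test (pure here; usually it would live in IO).
stepImpl :: Seq Int -> Cmd -> (Seq Int, Resp)
stepImpl q (Push x) = (q |> x, Ok)
stepImpl q Pop = case viewl q of
  EmptyL    -> (q, Popped Nothing)
  x :< rest -> (rest, Popped (Just x))

-- Sequential property: run model and implementation in lockstep.
prop_sequential :: [Cmd] -> Bool
prop_sequential = go [] Seq.empty
  where
    go _ _ [] = True
    go model impl (c : cs) =
      let (model', expected) = stepModel model c
          (impl', actual)    = stepImpl impl c
      in expected == actual && go model' impl' cs
```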
The intermediate candidate could:
- Find a commonly used, stateful Haskell library or application to test;
- Write a state machine model for said library or application, together with at least a sequential property and possibly a parallel property as well;
Getting this far would already meet the goal, but if there is enough time, the candidate could additionally:
- Add fault injection to the model, and thereby test the robustness of the code;
The advanced candidate could additionally try one of the following items:
- Combine fault injection with parallel testing and thereby achieve Jepsen-like tests;
- Use the gained experience and try to improve the
Mentors: Stevan Andjelkovic and Robert Danitz
Difficulty: Intermediate to advanced
Make servant-auth "open" for defining custom auth schemas
The popular and principled web framework Servant offers an extensible way to define web APIs as types, which is perhaps unique among web frameworks in any programming language, and this provides a way to build all kinds of related functionality, from servers to clients to documentation. As a result, an entire ecosystem of Servant-related libraries has sprung up to solve problems related to web programming.
Since its introduction in 2017, the `servant-auth` package has seen uptake in the community, but unlike the rest of the Servant ecosystem, `servant-auth` won’t let you define your own auth schemas and use them just as you’d use `servant-auth`’s out-of-the-box support for basic auth or JWT. Indeed, users have shown interest in OAuth1 and OAuth2 (and, relatedly, OpenID Connect) in the Servant family of libraries, so if `servant-auth` were open, it would be more straightforward to integrate these and other auth schemas for Servant servers, clients, API documentation, and in other places.
Thus, one potential project could be to open up `servant-auth` and make it extensible in a way similar to Servant itself. After that, a useful further goal could be to implement OAuth1 or another common authorization schema using this new extensibility. This would not only be a widely appreciated end result in itself, but would also offer an example of how to use the new functionality. The end goal of this effort would be to increase the flexibility and freedom end users have to implement their own auth schemas in `servant-auth`, along with reasonable Haddocks and some tests for the new functionality.
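Here is one way "open" could look, in the spirit of how Servant itself is extended through type class instances. A class indexed by a scheme tag has an associated credentials type, so supporting a new scheme means writing a new instance. This is a simplified sketch with hypothetical names, and a header list stands in for a real request. It does not reflect `servant-auth`'s current internals.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.List (stripPrefix)
import Data.Proxy (Proxy (..))
import Text.Read (readMaybe)

-- Simplified request: just its headers.
type Headers = [(String, String)]

-- Hypothetical open class: any user-defined scheme tag can get an instance.
class AuthScheme scheme where
  type Credentials scheme
  authenticate :: Proxy scheme -> Headers -> Maybe (Credentials scheme)

-- Scheme tags; they have no values and only select an instance.
data Bearer
data ApiKey

instance AuthScheme Bearer where
  type Credentials Bearer = String
  authenticate _ hs = lookup "Authorization" hs >>= stripPrefix "Bearer "

instance AuthScheme ApiKey where
  type Credentials ApiKey = Int   -- e.g. a numeric key id
  authenticate _ hs = lookup "X-Api-Key" hs >>= readMaybe
```

A third-party package could add, say, an OAuth1 tag and instance without touching the class definition.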
The student could optionally also (co-?)author cookbook recipes illustrating how to serve or query APIs that are protected by the newly supported authentication schemes, or recipes on how to implement new auth schemas using the new functionality made available in `servant-auth`; this would make the student’s work easily discoverable by current and future Servant users.
By the end of the summer, servant-auth would move closer to its goal of being the definitive auth solution for Servant and provide Servant users with even more tools to fully realize the goals of their applications.
Mentor: Alp Mestanogullari
Stack performance improvements
The Stack build tool has been in development for some years now. While efforts are usually taken to improve performance where possible, it has been some years since we’ve had a dedicated focus on improving performance across the board. Recent refactorings as part of the Stack 2.0 effort have either introduced some slowdowns that can be corrected, or opened opportunities for performance speedups.
In addition to finding and addressing performance-sensitive areas themselves, the Stack team has begun collecting such issues into a GitHub project board on performance improvements. The goal would be to:
- introduce some concrete performance benchmarks to measure performance
- address as many of these individual topics as possible
- provide updates to the community on the performance improvements achieved
Potential Mentors: Michael Snoyman, Niklas Hambüchen
Streaming JSON/YAML parser
The `aeson` library is the de facto standard JSON library in the Haskell ecosystem today. Both parsing and rendering have historically been performed via an intermediate datatype, `Value`, which represents all possible raw JSON values.
This type forms a tree.
Of particular interest is the `Object` constructor, which holds aeson’s `Object` type, defined as `type Object = HashMap Text Value`.
This type is what you get when you parse JSON, and what you give to the printer when you want to generate JSON.
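For reference, below is a simplified, self-contained mirror of that tree type. It is not aeson's actual definition. aeson uses a `HashMap Text Value` for objects, a `Vector Value` for arrays and `Scientific` for numbers, which this sketch replaces with `Data.Map`, lists and `Double` so that it needs only `base` and `containers`.

```haskell
import qualified Data.Map as Map

-- Simplified stand-in for aeson's Value: same shape, simpler payload types.
data Value
  = Object (Map.Map String Value)  -- aeson: HashMap Text Value
  | Array [Value]                  -- aeson: Vector Value
  | String String                  -- aeson: Text
  | Number Double                  -- aeson: Scientific
  | Bool Bool
  | Null
  deriving (Eq, Show)

-- Number of nodes in the tree: all of them exist in memory once parsing
-- into Value has finished, whether or not the consumer needs them.
size :: Value -> Int
size (Object o) = 1 + sum (map size (Map.elems o))
size (Array xs) = 1 + sum (map size xs)
size _          = 1
```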
The `yaml` library builds on top of `aeson`’s data type with its primary API, providing support for a common subset of YAML in both parsing and rendering. By reusing the same data type, `yaml` users can reuse existing parsing and rendering functionality from `aeson`, simplifying implementations.
The `aeson` authors have already discovered and addressed one limitation in this approach: creating an intermediate `Value` when rendering adds significant overhead. Instead, `aeson` now supports encoding directly to a bytestring. (`yaml`’s primary interface does not support this, though the `Text.Libyaml` API does provide a streaming interface for rendering.)
On the parsing side, however, all data still flows through the intermediate `Value` type. This is not only inefficient but also leaves parsers vulnerable to maliciously formed object inputs: you have to consume the whole object, no matter how many keys it has, before you can do anything else. This creates a weakness in any web service accepting JSON or YAML from untrusted sources. Additionally, the current parsing mechanism makes some common activities difficult, like providing warnings for unused fields.
For example, suppose your parser needs only two fields, one of them bar, but I send an object containing 26 fields. You then have to consume all 26 and allocate them into an Object, only to discover that the two fields you wanted were not in the object at all. Now imagine the user submitted one million fields, or sent keys specially crafted to exploit a weakness in the hashing algorithm of the hash table that holds the Object. There is no way to guard against that.
This is why we need streaming parsing to consume input incrementally.
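As a rough illustration of what consuming input incrementally buys, the sketch below walks a hypothetical event stream for a flat mapping, loosely in the spirit of event-based APIs such as Text.Libyaml's. The Event type and pickFields function are illustrative only, not an existing interface. The function keeps only the fields it was asked for, drops everything else immediately, and gives up as soon as the mapping exceeds a key limit. A flood of keys therefore costs at most maxKeys steps rather than a fully allocated Object.

```haskell
-- Hypothetical event stream for a flat mapping whose values are scalars.
data Event = EMappingStart | EKey String | EScalar String | EMappingEnd
  deriving (Eq, Show)

-- Walk the events once, keeping only the wanted fields. Unwanted values are
-- discarded as they arrive, and the walk stops with an error as soon as the
-- mapping has more than maxKeys keys, even if the input never ends.
pickFields :: Int -> [String] -> [Event] -> Either String [(String, String)]
pickFields maxKeys wanted (EMappingStart : rest) = go 0 [] rest
  where
    go n acc (EKey k : EScalar v : es)
      | n >= maxKeys = Left ("too many keys (limit " ++ show maxKeys ++ ")")
      | k `elem` wanted = go (n + 1) ((k, v) : acc) es
      | otherwise = go (n + 1) acc es
    go _ acc (EMappingEnd : _) =
      -- End of the mapping: report any wanted field that never appeared.
      case [w | w <- wanted, w `notElem` map fst acc] of
        [] -> Right (reverse acc)
        missing -> Left ("missing fields: " ++ unwords missing)
    go _ _ _ = Left "malformed event stream"
pickFields _ _ _ = Left "expected a mapping"
```

Because the event list is consumed lazily, the key limit holds even for an endless stream of keys.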
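Meanwhile, for YAML (and to some extent JSON), you often want a description of what the parser is capable of parsing, for user documentation (or, in the case of JSON, API docs). For that, the parser should be built as an Applicative data type: unlike a monadic parser, whose later steps can depend on earlier results, an applicative parser's structure is fixed up front and can be inspected without running it. For example: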
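```haskell
{-# LANGUAGE GADTs #-}

import Data.ByteString (ByteString)
import Data.Text (Text)

-- Doc is not pinned down by this proposal; plain Text is assumed here.
type Doc = Text

-- Parsers for the fields of a mapping (a YAML mapping or JSON object).
data MappingParser a where
  MPPure   :: a -> MappingParser a
  MPLiftA2 :: (a -> b -> c) -> MappingParser a -> MappingParser b -> MappingParser c
  MPAlt    :: MappingParser a -> MappingParser a -> MappingParser a
  MPField  :: Doc -> Text -> ValueParser a -> MappingParser a  -- docs, key, value parser

-- Parsers for individual values: scalars, arrays and nested mappings.
data ValueParser a where
  VPScalar  :: Doc -> (ByteString -> Maybe a) -> ValueParser a
  VPArray   :: ([a] -> b) -> ValueParser a -> ValueParser b
  VPMapping :: MappingParser a -> ValueParser a
```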
Give MappingParser Functor and Applicative instances (built from MPPure and MPLiftA2) and a Semigroup instance (for alternation, via MPAlt), and this would let you write something like:
```haskell
stackParser :: ValueParser StackConfig
stackParser = mapping $
  StackConfig
    <$> field "The stackage resolver e.g. lts-123." "resolver" scalarText
    <*> (field "Project packages." "packages" (array scalarText) <> pure ["."])
    <*> ((Just <$> field "GHC compiler to use." "compiler" scalarText) <> pure Nothing)
```
And then generate --help output for your program from the same parser description.
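To show how help text could fall out of the parser description, here is a minimal sketch that completes the types above. It restates the two GADTs so that the module compiles on its own. It then adds Functor, Applicative and Semigroup instances and defines the smart constructors mapping, array, field and scalarText that stackParser assumes. Those definitions, the Doc = Text choice, and the mappingDocs/valueDocs traversal are all assumptions rather than part of the proposal. Because the parser is a plain data structure, the documentation is produced without any input.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

type Doc = Text -- assumption: documentation is plain text

-- The two GADTs from the proposal, restated so this module stands alone.
data MappingParser a where
  MPPure   :: a -> MappingParser a
  MPLiftA2 :: (a -> b -> c) -> MappingParser a -> MappingParser b -> MappingParser c
  MPAlt    :: MappingParser a -> MappingParser a -> MappingParser a
  MPField  :: Doc -> Text -> ValueParser a -> MappingParser a

data ValueParser a where
  VPScalar  :: Doc -> (ByteString -> Maybe a) -> ValueParser a
  VPArray   :: ([a] -> b) -> ValueParser a -> ValueParser b
  VPMapping :: MappingParser a -> ValueParser a

-- The instances only build syntax; nothing is parsed here.
instance Functor MappingParser where
  fmap f p = MPLiftA2 (\x _ -> f x) p (MPPure ())

instance Applicative MappingParser where
  pure = MPPure
  pf <*> px = MPLiftA2 ($) pf px

-- Alternation: try the left parser, fall back to the right one.
instance Semigroup (MappingParser a) where
  (<>) = MPAlt

-- Hypothetical smart constructors assumed by stackParser.
mapping :: MappingParser a -> ValueParser a
mapping = VPMapping

array :: ValueParser a -> ValueParser [a]
array = VPArray id

field :: Doc -> Text -> ValueParser a -> MappingParser a
field = MPField

scalarText :: ValueParser Text
scalarText = VPScalar "A text scalar." (either (const Nothing) Just . decodeUtf8')

data StackConfig = StackConfig Text [Text] (Maybe Text)

stackParser :: ValueParser StackConfig
stackParser = mapping $
  StackConfig
    <$> field "The stackage resolver e.g. lts-123." "resolver" scalarText
    <*> (field "Project packages." "packages" (array scalarText) <> pure ["."])
    <*> ((Just <$> field "GHC compiler to use." "compiler" scalarText) <> pure Nothing)

-- Walk the parser's structure and emit one line per field.
-- Simplification: every field under an MPAlt is reported as optional, which
-- matches the "field ... <> pure default" pattern used above.
mappingDocs :: Bool -> MappingParser a -> [Text]
mappingDocs _ (MPPure _) = []
mappingDocs opt (MPLiftA2 _ p q) = mappingDocs opt p ++ mappingDocs opt q
mappingDocs _ (MPAlt p q) = mappingDocs True p ++ mappingDocs True q
mappingDocs opt (MPField doc key vp) =
  (key <> ": " <> doc <> (if opt then " (optional)" else ""))
    : map ("  " <>) (valueDocs vp)

valueDocs :: ValueParser a -> [Text]
valueDocs (VPScalar _ _) = []
valueDocs (VPArray _ vp) = valueDocs vp
valueDocs (VPMapping mp) = mappingDocs False mp

helpText :: Text
helpText = T.unlines (valueDocs stackParser)
```

For stackParser this yields one line per field, with packages and compiler marked optional because each has a pure fallback.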
Many projects use YAML as their configuration file format, and could benefit from having a parser capable of generating documentation.
Similarly, some web services could benefit from trivially generated documentation or a schema for their JSON parsers.
The goal of this project is to implement a new parsing mechanism that works on a stream of data instead of a fully built Value-style tree, and that permits automatic generation of documentation. Initially this will be for parsing YAML, with the goal of using the same machinery for JSON data as well.
Potential goal: if it can be done without introducing a lot of complexity, it would be a bonus for the parser to double as both a printer and a parser, as a kind of profunctor or “bimap” (as featured in tomland). However, not every data type that you want to parse necessarily needs to be printed, and vice versa, so care must be taken to allow defining parsers without printers, and printers without parsers. If this proves too difficult, it is not a necessary part of the project.
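One possible shape for such a bidirectional parser is sketched below. It is loosely inspired by the idea behind tomland but is not its API. The hypothetical Codec type stores an optional parser and an optional printer, so parse-only and print-only codecs are representable. The dimapCodec function gives it a profunctor-like interface: covariant in the parsed type and contravariant in the printed type. A real design would also need Applicative-style composition over mapping fields, which is omitted here.

```haskell
import Text.Read (readMaybe)

-- Hypothetical bidirectional codec: parses text into an `a`, prints a `c`.
-- Either direction may be absent (Nothing).
data Codec c a = Codec
  { codecParse :: Maybe (String -> Maybe a)
  , codecPrint :: Maybe (c -> String)
  }

-- Profunctor-style mapping: covariant in what is parsed,
-- contravariant in what is printed.
dimapCodec :: (c' -> c) -> (a -> a') -> Codec c a -> Codec c' a'
dimapCodec f g (Codec p w) = Codec (fmap (fmap g .) p) (fmap (. f) w)

parseOnly :: (String -> Maybe a) -> Codec c a
parseOnly p = Codec (Just p) Nothing

printOnly :: (c -> String) -> Codec c a
printOnly w = Codec Nothing (Just w)

intCodec :: Codec Int Int
intCodec = Codec (Just readMaybe) (Just show)

newtype Port = Port Int deriving (Eq, Show)

-- Reuse intCodec for a newtype in both directions at once.
portCodec :: Codec Port Port
portCodec = dimapCodec (\(Port n) -> n) Port intCodec

runParse :: Codec c a -> String -> Either String a
runParse codec input = case codecParse codec of
  Nothing -> Left "this codec cannot parse"
  Just p -> maybe (Left ("cannot parse: " ++ input)) Right (p input)

runPrint :: Codec c a -> c -> Either String String
runPrint codec x = maybe (Left "this codec cannot print") (Right . ($ x)) (codecPrint codec)
```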
Potential Mentors: Chris Done
Implement tokenstream-based parsing in `aeson`
Haskell’s aeson library for JSON parsing is used pervasively throughout the Haskell ecosystem and powers most of Haskell’s web-service frameworks. Consequently, improvements to aeson are likely to have a major beneficial impact on the Haskell ecosystem.
While aeson is already reasonably optimised, a long-known issue remains with the intermediate JSON abstract syntax tree. For the serialization path, this was addressed early on in aeson-0.10.0.0 by adding support for encoding directly to Builders, which bypasses the intermediate construction of the Value tree and has resulted in significant speed-ups, realised to varying degrees by every consumer of the library.
A solution for the parsing direction, however, has proven more involved and has not yet fully materialised. Yet parsing JSON with predictable efficiency is critical for web services that must consume potentially malicious JSON input, which, when carefully crafted, can be used to mount denial-of-service (DoS) attacks on the service.
To address this, preliminary work emerged, such as an implementation of a JSON token-stream parser API and experiments with an Applicative JSON object parser. Although very promising, this work was not brought to completion due to lack of time.
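A token-stream parser first turns the input into a flat sequence of tokens, which later stages consume one at a time instead of building a Value. The toy lexer below is a hypothetical illustration, not the API from the preliminary aeson work. It handles a JSON subset (integers only, strings without escape sequences) over String. Because it returns a lazy list, a consumer can stop after a few tokens without lexing the rest of the input.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical token type; a production design would work on ByteString
-- input and carry source positions.
data Token
  = TObjectBegin | TObjectEnd | TArrayBegin | TArrayEnd
  | TColon | TComma | TString String | TNumber Integer
  | TTrue | TFalse | TNull | TError String
  deriving (Eq, Show)

-- Lazily split the input into tokens; lexing stops at the first error.
-- Simplifications: no escape sequences in strings, integers only.
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
  | isSpace c = tokenize cs
  | c == '{' = TObjectBegin : tokenize cs
  | c == '}' = TObjectEnd : tokenize cs
  | c == '[' = TArrayBegin : tokenize cs
  | c == ']' = TArrayEnd : tokenize cs
  | c == ':' = TColon : tokenize cs
  | c == ',' = TComma : tokenize cs
  | c == '"' = case break (== '"') cs of
      (s, _ : rest) -> TString s : tokenize rest
      (_, []) -> [TError "unterminated string"]
  | c == '-' || isDigit c =
      let (ds, rest) = span isDigit cs
       in number (c : ds) rest
  | otherwise = keyword (c : cs)
  where
    -- A lone minus sign is not a number.
    number "-" _ = [TError "bad number"]
    number s rest = TNumber (read s) : tokenize rest
    keyword s
      | take 4 s == "true" = TTrue : tokenize (drop 4 s)
      | take 5 s == "false" = TFalse : tokenize (drop 5 s)
      | take 4 s == "null" = TNull : tokenize (drop 4 s)
      | otherwise = [TError ("unexpected input: " ++ take 10 s)]
```

For instance, taking the first three tokens of an endless array such as `"[1, 2" ++ cycle ", 3"` terminates, because only the needed prefix is ever lexed.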
The goal of this project is therefore to continue the work started in those patches and complete it into a working prototype of an optimized streaming JSON parsing code path for integration into the aeson library. If there is time left, stretch goals may include adding opt-in validation that efficiently rejects, or warns about, JSON documents containing unknown object fields.
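The opt-in unknown-field validation could look roughly like the following sketch. The UnknownFieldPolicy type and checkKeys function are hypothetical. Given the set of fields a parser knows about and the keys of an object in arrival order, Reject fails on the first unknown key without examining the rest, while Warn collects one warning per unknown key. In a streaming parser the keys would come from the token stream rather than from a list.

```haskell
import qualified Data.Set as Set

-- Hypothetical opt-in policy for object keys the parser does not know about.
data UnknownFieldPolicy = Ignore | Warn | Reject
  deriving (Eq, Show)

-- Check an object's keys, in the order they arrive, against the known fields.
-- Reject stops at the first unknown key without looking at the remaining keys;
-- Warn returns one warning per unknown key; Ignore returns no warnings.
checkKeys :: UnknownFieldPolicy -> Set.Set String -> [String] -> Either String [String]
checkKeys policy known = go []
  where
    go warnings [] = Right (reverse warnings)
    go warnings (k : ks)
      | k `Set.member` known = go warnings ks
      | policy == Reject = Left ("unknown field: " ++ k)
      | policy == Warn = go (("unknown field: " ++ k) : warnings) ks
      | otherwise = go warnings ks
```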
The conceptual results of this work will likely be reusable and thus also benefit other libraries with a similar design, such as Haskell’s YAML 1.2 library.
We expect this to be a good fit for a GSoC project by an intermediate Haskell student who would enjoy working on a potentially high-impact improvement to one of Haskell’s foundational core libraries.
Mentors: Oleg Grenrus, Herbert Valerio Riedel
Implement accepted GHC proposals
Proposed changes to GHC go through a rigorous discussion and review process before they are accepted. This happens in the ghc-proposals repository on GitHub.
Once they are accepted, they usually still need to be implemented. The repository can be filtered to show the proposals that are accepted but not yet implemented.
We think an advanced student should be able to pick one or more of these proposals (depending on their size) and implement them over the summer.
In this case, the GSoC project proposal should clearly state what the work will entail. We also strongly advise contacting prospective mentors about your proposal.
Potential Mentors: Authors of proposals, GHC HQ