I also want to mention that we are currently in need of idea submissions for the upcoming Google Summer of Code 2024! This program depends on having a quality list of ideas, so please consider submitting any you might have (ideally before Feb. 4th).
Without further ado, what follows is a summary of the work that was completed under the Summer of Haskell 2023.
Project | Support for Resolve Functionality in HLS |
Contributor | Nathan Maxson |
Mentor | Michael Peyton Jones |
Nathan Maxson contributed support for resolve functionality to HLS. He has also updated a number of HLS plugins to utilize this functionality, thus reducing CPU and memory usage and improving speed. The plugins that have been updated in this way are overloaded-record-dot, hlint-plugin, explicit-imports, refine-imports, type-lenses, explicit-records, and class-plugin.
Relevant code contributions
Project | Cabal File Support for HLS |
Contributor | Jana Chadt |
Mentor | Fendor |
Jana Chadt worked on improving support for Cabal files in HLS. The work has been summarized in this gist, which includes links to relevant PRs and issues. There is also a blog post detailing the new HLS functionality.
Relevant code contributions
Project | HLS: Goto 3rd Party Definition |
Contributor | Elodie Lander |
Mentor | Zubin Duggal |
Elodie Lander worked on allowing the HLS goto definition functionality to work with definitions from outside of the current project. Although primarily focused on HLS, this work involves contributions to other key Haskell infrastructure: GHC, Cabal, HieDb, and haskell/actions.
Relevant code contributions
Project | Standardize GHC’s Error Dump in JSON Format |
Contributor | Ben Bellick |
Mentor | Aaron Allen |
Ben Bellick contributed a new well-defined JSON interface for GHC diagnostic emissions, which will be available via a new -fdiagnostics-as-json flag. The existing under-specified -ddump-json flag has been deprecated in favor of this new interface. Additionally, Ben made valuable contributions to the effort of converting GHC error messages to use the new structured representation.
Relevant code contributions
Project | Teaching Weeder About Type Classes |
Contributor | Vasily Sterekhov |
Mentor | Oliver Charles |
Vasily Sterekhov implemented support for detecting unused type class instances in Weeder, along with several other significant improvements. See the Weeder 2.7 release notes for details.
Relevant code contributions
Project | Structured Errors for cabal-install |
Contributor | Suganya Arun |
Mentor | Gershom Bazerman |
Suganya Arun implemented structured errors for Cabal, including the assignment of a unique code to each error, which can then be referenced on the Haskell Error Index. You can read more about the results and challenges of the project in this blog post.
Relevant code contributions
Project | Maximally Decoupling Haddock and GHC |
Contributor | Gregory Baimetov |
Mentor | Laurent P. René de Cotret |
Gregory Baimetov contributed to the effort towards decoupling GHC and Haddock. Although the original goal proved to be too ambitious, he has produced a prototype of a JSON serialization for the Haskell AST as well as a document explaining the difficulties encountered, which should be of value to future work on this issue.
Project | Representing Pattern |
Contributor | Saachi Kaup |
Mentor | Alex McLean |
Saachi Kaup worked with various libraries to explore pattern visualization, drawing connections to the traditional mandalas common in Southeast Asian art. She put together a blog post on the Tidal website describing her process and showcasing some of the images that were produced. You can also view the code repository.
Contributor: Jana Chadt
Mentor: Fendor
Abstract:
The goal of this proposal is to provide Cabal file support for the Haskell Language Server. I have been working on the Cabal plugin for the Haskell Language Server during various hackathons since 2021, implementing formatting and code completion for Cabal files, and I would like to be able to commit to working on the plugin full time this summer.
Contributor: Nathan Maxson
Mentor: Michael Peyton Jones
Abstract:
With “codeAction/resolve” and “codeLens/resolve”, the language server protocol has added methods that allow a language server to delay some of the work it needs to do for code actions and code lenses until it is actually needed, giving the server significant savings in both memory and CPU usage. This proposal is to add both of these methods to the haskell-language-server, allowing plugins to call them at will. In addition, I propose adding support for the resolve methods to some of the haskell-language-server plugins.
Contributor: Elodie Lander
Mentor: Zubin Duggal
Abstract:
Making goto definition work for third party libraries is of interest to me as a Haskell developer because it is a feature I would like to use in my Haskell development. In fact, it is the feature that might finally motivate me to use HLS in my own projects. My Haskell workflow has usually involved a lot of switching back and forth between my editor and Hackage documentation in the browser. I believe that being able to see third party library definitions in my editor would reduce this back and forth significantly and help increase my efficiency as a Haskell developer.
Contributor: Vasily Sterekhov
Mentor: Ollie Charles
Abstract:
A frequent complaint about Haskell is the lack of tooling. This proposal aims to contribute to improving the situation by addressing a particular limitation of Weeder, a tool for detecting dead code. In the process, this may involve proposing minor additions to hie files to GHC, which may benefit other similar projects working in the same area.
Contributor: Ben Bellick
Mentor: Aaron Allen
Abstract:
GHC is currently undergoing a large-scale project to move to a more structured error representation by treating errors as values. An additional useful feature that can be made available is to dump a JSON representation of warnings/errors. An experimental implementation of this feature exists when GHC is invoked with -ddump-json, although this is an unfinished feature which suffers from the following:
- it is non-standardized
- it does not leverage new structured error representation
- previous implementation issues led to a hard-coding of output to stdout
There is an opportunity to benefit consumers of GHC output and to improve Haskell tooling infrastructure. Some examples of possible use-cases for downstream consumers can be found here. Not all consumers of Haskell’s error messages intend to consume them via the GHC API, and such a standardized JSON output enables a larger set of developers to expand the error tooling in the Haskell ecosystem. I am also personally excited to help with this project because I love Haskell and want to make a contribution to one of its crowning achievements: GHC. I am especially interested in any improvements which enable outside consumers to better understand/process the internals of the compiler.
Contributor: Gregory Baimetov
Mentor: Laurent P. René de Cotret
Abstract:
In practice, development and usage of Haddock is strongly coupled to the internals of the Glasgow Haskell Compiler (GHC). One concrete example of this coupling is the fact that Haddock makes use of the GHC parser itself. Therefore, if Haddock was compiled using GHC version X, it might not be able to parse the source code of a Haskell program written for GHC version Y > X.
This strong coupling between GHC and Haddock slows down Haddock development and prevents Haddock from being better integrated in other tools, such as Hackage, the Haskell Language Server, or Hoogle.
Contributor: Saachi Kaup
Mentor: Alex McLean
Abstract:
Using Haskell’s advanced type system to map the structures in Tidal Cycles to the underlying shapes of Mandala art and produce beautiful visualisations.
Contributor: Dominic Mills
Mentor: Luis Morillo Najarro
Abstract:
Calligraphy, a tool for visualizing Haskell projects, faces the challenges of developing and maintaining Haskell tooling due to the constantly evolving nature of the language and its implementation in GHC.
In light of these challenges, the primary aim of this Summer of Haskell project is to enhance the Calligraphy tool to provide visualizations that are both simple and easy to use. This will be done by modularizing the Calligraphy tool into its various parts, such as calligraphy-gui, calligraphy-graphviz, calligraphy-cli, and calligraphy-fgl, in addition to keeping it up-to-date with GHC releases.
Contributor: Suganya Arun
Mentor: Gershom Bazerman
Abstract:
The https://errors.haskell.org/ site provides an index that maps error codes in Haskell tooling to documentation. GHC, ghcup, and stack have all begun to implement support for structured errors that have assigned codes. This project is to refactor the cabal codebase to also provide structured errors rather than mere strings, and also to assign cabal errors corresponding codes that can be added to the error index.
In 2022, the program will be addressed to all newcomers of open source that are 18 years and older. GSoC will no longer be solely focused on university students or recent graduates - people that are at various stages of their career, recent career changers, self-taught, those returning to the workforce, etc., are welcome to join. Those changes should better fulfill the needs of open source communities and provide more flexibility to both projects and contributors.
Organizers are aware that not everyone can devote an entire summer to coding. Offered projects are available in two sizes: medium (~175 hours) and large (~350 hours). Contributors can follow the standard 12-week program or extend the deadline to up to 22 weeks.
Are you working on a Haskell project, and could you use the help of a student during the summer? Consider contributing it as an idea here! Send a pull request to our GitHub repository (example from 2020). If you just want to discuss a possible idea, don’t hesitate to get in touch with us and/or read through the student/contributor guide.
We encourage you to explore GSoC’s webpage, and you can learn more on the FAQ website. All the updates about this year’s GSoC edition can be found in this blogpost.
GSoC 2022: OSS projects, developed during Summer (from June to September/November) for newcomers that are 18 years and older and want to spend 175 - 350 hours on coding activities, with a mentor’s support.
Despite that, all 10 of our slots were successful! This is the first time this has happened in the history of Haskell.org’s participation in the program. Some of these projects are high-profile and will benefit a lot of users in the ecosystem, which is super exciting.
Enhanced figure support in pandoc
Student: Aner Lucero
Mentors: tarleb
Student report
Google Summer of Code was a great way to expand my involvement with the Haskell community and to test my knowledge by working on one of Haskell’s most used apps.
Gradually Typed Hasktorch
Student: Julius Marozas
Mentors: Torsten Scholak
Student report
Dhall bindings to TOML configuration language
Student: Marcos Lerones
Mentors: Gabriella Gonzalez, Simon Jakobi
Student report
Haskell in CodeMirror 6
Student: Olivian Cretu
Mentors: Chris Smith
Student repo
Fixing ihaskell-widgets
Student: David Davó
Mentors: James Brock, Vaibhav Sagar
Student report
Three years ago, I started learning Haskell and functional programming. As I had recently started using Jupyter Notebooks in other projects, I wanted to try using them with Haskell to take notes and do the course homework. A few weeks in, I noticed I couldn’t use the widgets, but I didn’t give it much thought. Three years later, this summer, I’ve had the opportunity to fix it, while learning a lot in the process.
That’s what open source is about.
TidalCycles API and editor plugin
Student: Martin Gius
Mentors: Alex McLean
Student report
Haskell Language Server: Symbol Renaming
Student: Oliver Madine
Mentors: Pepeiborra
Student report
Working on the Haskell Language Server (HLS) was my first time using Haskell in production. While navigating through different areas of the tooling infrastructure, the community was supportive in helping me develop my understanding.
Specifically, my project involved exploring hie-bios and the GHC API to create a symbol renaming plugin. Overall, the work was engaging, and I was able to substantially improve my development skills with the help of my mentor!
Support call hierarchy in Haskell Language Server
Student: Lei Zhu
Mentors: Javier Neira, Pepeiborra
Student report
The Haskell community is warm and friendly to everyone, no matter whether you are a beginner or an expert. This summer, I became more familiar with haskell-language-server and GHC itself. Thanks to haskell.org and GSoC for providing this opportunity!
Visualization Libraries for ghc-debug
Student: Ethan Tsz Hang Kiang
Mentors: Matthew Pickering
Student report
TOML Support in dhall-haskell
Student: Julio Grillo
Mentors: Gabriella Gonzalez, Simon Jakobi
Student report
We hope that Google hosts the program in 2022; and in that case we plan to apply again. If you have ideas for projects that students could work on, we’ll be using the same format as the years before – this page has more information on how to submit an idea.
Thanks a lot to everyone involved!
SPECIALIZABLE GHC pragma
Student: Francesco Gazzetta @fgaz
Mentors: Carter Schonwald, Andreas Klebinger, chessai
Student report
Add primops to expand the (boxed) array API
Student: buggymcbugfix
Mentors: andrewthad, Andreas Klebinger, chessai
Student report
Build-integration and Badges for Hackage
Student: Shubham Awasthi
Mentors: hvr, Gershom Bazerman
Student report
Building the Haskell Language Server and more
Student: Luke Lau
Mentors: Alan Zimmerman, Pepe Iborra, Zubin Duggal
Student report
Custom Dataloader for Hasktorch
Student: Andre Daprato
Mentors: Austin Huang, Adam Paszke, Torsten Scholak, Junji Hashimoto
Documentation generator for the Dhall configuration language
Student: German Robayo
Mentors: Profpatsch, Gabriel Gonzalez, sjakobi
Student report
Finish the package candidate workflow for Hackage
Student: Sitao Chen
Mentors: hvr, Gershom Bazerman
Student report
This summer, I participated in Google Summer of Code with Haskell.org and worked on the Hackage candidate UI and workflow. Without previous experience in open source development, I was able to grasp a large codebase and its structure in a short period with the help of my mentors. Besides, I got a chance to learn how to make API calls and how to improve a UI using Haskell in a formal setting. This experience helped me gain a better understanding of package workflow management and web services in Haskell. I hope I can contribute again in the future!
Functional Machine Learning Algorithms for Music Generation
Student: Elizabeth Wilson
Mentors: Alex McLean, Austin Huang, Torsten Scholak
Student report
Multiple Home Packages for GHC
Student: fendor
Mentors: Zubin Duggal, John Ericson, Matthew Pickering
Student report
Haskell IDE Engine was the first open source project I ever contributed to, and over time, it became a passion project for me. Over the months I dove deeper into Haskell tooling, until I got the chance to work on GHC itself in this year’s Google Summer of Code! I worked on this project to improve the tooling situation for Haskell, as well as to improve the IDE experience by implementing features needed by both.
The project itself proved to be challenging, mainly because of my unfamiliarity with the GHC code base. However, with the help of my helpful mentors, I was able to overcome the challenges and learned a lot about GHC. I am glad I had the chance to work on this project, although I did not accomplish everything I wanted to, yet.
Number Field Sieves
Student: Federico Bongiorno
Mentors: Sergey Vinokurov, Andrew Lelechenko
Student report
Optimising Haskell developer tool performance using OpenTelemetry
Student: Michalis Pardalos
Mentors: Dmitry Ivanov, Matthew Pickering
My project was about adding support for opentelemetry tracing into ghcide, the core component of haskell-language-server. I had very little experience with open-source development, or the internals of haskell and ghc before this project and I can say for sure that this has changed. Aside from working on ghcide itself, I also had to submit patches to haskell-opentelemetry, implementing features necessary for this project. When the project was blocked by a ghc bug, I also took this as an opportunity to dive into ghc and fix it myself, which I found incredibly rewarding and consider a valuable experience.
Even though I ended up running out of time and not finishing everything I hoped for in the project, I can say for sure that it was a positive experience which I would absolutely recommend.
Update stylish-haskell to use ghc-lib-parser
Student: Beatrice Vergani
Mentors: Jasper Van der Jeugt, lukaszgolebiewski, Paweł Szulc
Student report
Google will be hosting GSoC again in 2021, and of course we plan to apply again. If you have ideas for projects that students could work on, we’ll be using the same format as the years before – this page has more information on how to submit an idea.
Thanks a lot to everyone involved!
Haskell.org has been able to take part in this program in the past two years, and we’d like to keep this momentum up since it greatly benefits the community.
Google is not extremely open about what factors it considers for applications from organizations, but they have stated multiple times that a well-organized ideas list is crucial. For that, we would like to count on all of you again.
If you are the maintainer or a user of a Haskell project, and you have an improvement in mind which a student could work on during the summer, please submit an idea here:
https://summer.haskell.org/ideas.html
For context, Google Summer of Code is a program where Google sponsors students to work on open-source projects during the summer. Haskell.org has taken part in this program in 2006-2015, and 2018-2019. Many important improvements to the ecosystem have been the direct or indirect result of Google Summer of Code projects, and it has also connected new people with the existing community.
Projects should benefit as many people as possible – e.g. an improvement to GHC will benefit more people than an update to a specific library or tool, but both are definitely valid. New libraries and applications written in Haskell, rather than improvements to existing ones, are also accepted. Projects should be concrete and small enough in scope such that they can be finished by a student in three months. Past experience has shown that keeping projects “small” is almost always a good idea.
Unfortunately, this summary is less successful: I meant to contact the students immediately after the summer, but that mail never went through and I failed to follow up on it. My apologies.
In either case, I still wanted to list the successful projects here for posterity. I reached out to the students again and will be updating this post with more information and quotes as they get back to me.
A language server for Dhall
Student: Frederik Ramcke
Mentors: Luke Lau, Gabriel Gonzalez
A stronger foundation for interactive Haskell tooling
Student: dxld
Mentors: Alan Zimmerman, Matthew Pickering
Automated requirements checking as a GHC plugin
Student: Daniel Marshall
Mentors: Chris Smith, chessai, Alphalambda
Extending Alga
Student: O V Adithya Kumar
Mentors: Andrey Mokhov, Jasper Van der Jeugt, Alexandre Moine
Extending Hasktorch With RNNs and Encoder-Decoder
Student: AdLucem
Mentors: Austin Huang, Junji Hashimoto, Sam Stites
Functional Machine Learning with Hasktorch: Produce Functional Machine Learning Model Reference Implementations
Student: Jesse Sigal
Mentors: Austin Huang, idontgetoutmuch, Junji Hashimoto, Sam Stites
Hadrian Optimisation
Student: ratherforky
Mentors: Andrey Mokhov, Neil Mitchell
Implementing Chebyshev polynomial approximations in Haskell: Having the speed and precision of numerics with complex, non-polynomial functions.
Student: Deifilia To
Mentors: tmcdonell, idontgetoutmuch, Albert Krewinkel
Improving Hackage Matrix Builder as a Real-world Fullstack Haskell Project
Student: Andika Riyandi (Rizary)
Mentors: Herbert Valerio Riedel, Robert Klotzner
Improving HsYAML Library
Student: Vijay Tadikamalla
Mentors: Herbert Valerio Riedel, Michał Gajda
Issue-Wanted Web Application
Student: Rashad Gover
Mentors: Veronika Romashkina, Dmitrii Kovanikov
More graph algorithms for Alga
Student: Vasily Alferov
Mentors: Andrey Mokhov, Alexandre Moine
Property-based testing stateful programs using quickcheck-state-machine
Student: Kostas Dermentzis
Mentors: stevana, Robert Danitz
Putting hie Files to Good Use
Student: Zubin Duggal
Mentors: Alan Zimmerman, Matthew Pickering
Upgrading hs-web3 library
Student: amany9000
Mentors: Alexander Krupenkin, Thomas Dietert
Thanks to everyone involved!
When you apply to Summer of Code, you write a proposal. The proposal is a document in which you describe your ideas on the chosen project. It should be a clear, detailed text with suggestions on every subtask. The proposal should also include a timeline, in which you estimate the time you intend to spend on each of those subtasks.
I chose this project for my summer. In my proposal, I drafted all the algorithms mentioned in the list and suggested a few more. I published this part of my proposal as a GitHub gist here.
I don’t suggest this gist as a complete example of a good proposal: it’s only a part of the document I submitted. You should also include some information about you, together with the timeline. Communication with your future mentors is also a significant part of the application.
However, as I mentioned in one of my previous posts, another student ended up doing the part suggested in the ideas list. So my task is to introduce bipartite graphs.
This task was my idea. I mentioned it in my proposal. I meant that finding maximum matchings in bipartite graphs should be easily implemented when we have algorithms for finding maximum flows in networks. Kuhn’s algorithm is an application of the Ford-Fulkerson algorithm, and the Hopcroft-Karp algorithm is an application of Dinic’s algorithm.
However, this option is not the best. Both algorithms have specialized implementations that run many times faster. So my task for this summer was to introduce bipartite graphs and special functions for working with them.
I made four pull requests to Alga this summer. Each pull request represents a separate task and summarizes the work of several weeks.
Each PR contains the actual implementation, tests, and documentation. The whole project is release-ready after merging each one of them. I put the tests in the test/ directory. The documentation for each function and datatype precedes the declaration. After release, it will compile to a beautiful Haddock file like this one.
Link to PR: https://github.com/snowleopard/alga/pull/207
In this part, I defined the Bipartite.AdjacencyMap datatype and added many functions to work with adjacency maps.
The datatype represents a map of vertices into their neighbours. I defined it as two maps:
data AdjacencyMap a b = BAM {
    leftAdjacencyMap  :: Map.Map a (Set.Set b),
    rightAdjacencyMap :: Map.Map b (Set.Set a)
}
The properties are based on the existing properties of graphs in Alga.
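To make the two-map representation concrete, here is a minimal sketch of how both maps can be kept in sync. Only the record shape comes from the definition above; the helpers `emptyBAM`, `addEdge`, and `consistent` are hypothetical names chosen for illustration and are not Alga's API.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Same shape as the datatype above: each side maps a vertex to its
-- neighbours on the other side.
data AdjacencyMap a b = BAM
    { leftAdjacencyMap  :: Map.Map a (Set.Set b)
    , rightAdjacencyMap :: Map.Map b (Set.Set a)
    } deriving (Eq, Show)

-- Hypothetical helper: the graph with no vertices.
emptyBAM :: AdjacencyMap a b
emptyBAM = BAM Map.empty Map.empty

-- Hypothetical helper: insert an edge, updating both maps together.
addEdge :: (Ord a, Ord b) => a -> b -> AdjacencyMap a b -> AdjacencyMap a b
addEdge x y (BAM l r) = BAM
    (Map.insertWith Set.union x (Set.singleton y) l)
    (Map.insertWith Set.union y (Set.singleton x) r)

-- Hypothetical invariant check: the set of edges recorded in the left map
-- equals the set of edges recorded in the right map.
consistent :: (Ord a, Ord b) => AdjacencyMap a b -> Bool
consistent (BAM l r) = edgesL == edgesR
  where
    edgesL = Set.fromList [ (x, y) | (x, ys) <- Map.toList l, y <- Set.toList ys ]
    edgesR = Set.fromList [ (x, y) | (y, xs) <- Map.toList r, x <- Set.toList xs ]
```

Storing every edge twice costs memory, but it makes neighbour lookups cheap from either side, which is presumably why the two-map shape was chosen.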
Link to PR: https://github.com/snowleopard/alga/pull/218
There is a folklore algorithm that checks if a given graph is bipartite. The task to implement this algorithm in Haskell was a little challenging for me.
I ended up with the following definition of the function:
detectParts :: Ord a => AM.AdjacencyMap a -> Either (OddCycle a) (AdjacencyMap a a)
It is known that a graph is bipartite if and only if it contains no cycles of odd length. This function either finds an odd cycle or returns a partition.
The implementation is so exciting that I wrote a whole post about it. There I explained why I needed monad transformers and ran some interesting benchmarks that led me to use an explicit INLINE pragma.
Link to the unfinished PR: https://github.com/snowleopard/alga/pull/226
Some families of graphs are bipartite: simple paths, even cycles, trees, bicliques, etc. The task is to provide a simple method to construct all those graphs.
The most exciting part of this task was to provide type-safe implementations. For example, only cycles of even length are bipartite. And speaking of paths, we should provide a method for constructing paths of vertices of two different types.
The circuit definition for constructing graphs containing one even cycle is simple:
circuit :: (Ord a, Ord b) => [(a, b)] -> AdjacencyMap a b
For the paths, I added a special type for alternating lists:
data List a b = Nil | Cons a (List b a)
So the path definition is:
path :: (Ord a, Ord b) => List a b -> AdjacencyMap a b
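The alternating list type is worth a closer look, because the element types swap at every step. The sketch below shows one way to walk such a list and recover the edges of the path it describes. The `List` type is the one from the text; `pathEdges` is a hypothetical helper written for illustration and is not taken from the PR.

```haskell
-- The alternating list type from the text: the two type parameters swap
-- places in the tail, so elements alternate between types a and b.
data List a b = Nil | Cons a (List b a)

-- Hypothetical helper: the edges between consecutive elements, each
-- oriented as (left vertex, right vertex). The recursive call runs on a
-- list with swapped parameters, so its pairs are flipped back.
-- The explicit type signature is required: this is polymorphic recursion.
pathEdges :: List a b -> [(a, b)]
pathEdges (Cons x rest@(Cons y _)) = (x, y) : map flipPair (pathEdges rest)
  where
    flipPair (r, l) = (l, r)
pathEdges _ = []
```

For the list `Cons 1 (Cons 'a' (Cons 2 (Cons 'b' Nil)))`, which describes the path 1, 'a', 2, 'b', this yields the three edges of the path with the left vertex always in the first component. A list that put two vertices of the same part next to each other would simply not type-check.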
As of now, the PR is almost merge-ready; only a few small review comments still need to be addressed.
Link to the unfinished PR: https://github.com/snowleopard/alga/pull/229
This algorithm is the fastest one for maximum matchings in bipartite graphs. The implementation is rather straightforward.
However, there is an aspect of this PR I’d like to share here.
I implemented the following function:
augmentingPath :: (Ord a, Ord b) => Matching a b
                                 -> AdjacencyMap a b
                                 -> Either (VertexCover a b) (List a b)
Given a matching in a graph, it returns either an augmenting path for the matching or a vertex cover of the same size, thus proving that the given matching is maximum. As both outcomes can be easily verified, this helps to write perfect tests that ensure that the matching returned by my function is maximum indeed.
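The verification idea can be sketched independently of Alga. Any vertex cover must contain at least one endpoint of every matching edge, so a cover is never smaller than a matching; a cover whose size equals the matching's size therefore certifies that the matching is maximum. The types and function names below are simplified assumptions for illustration, not the ones used in the PR.

```haskell
import qualified Data.Set as Set

-- Hypothetical simplified representations, not Alga's types.
type Edge a b        = (a, b)
type Matching a b    = [Edge a b]
type VertexCover a b = (Set.Set a, Set.Set b)

-- A matching is valid if all its edges are in the graph and no vertex is
-- used by two of its edges.
isMatching :: (Ord a, Ord b) => [Edge a b] -> Matching a b -> Bool
isMatching es m =
    all (`elem` es) m
      && Set.size (Set.fromList (map fst m)) == length m
      && Set.size (Set.fromList (map snd m)) == length m

-- A vertex cover contains at least one endpoint of every edge.
isCover :: (Ord a, Ord b) => [Edge a b] -> VertexCover a b -> Bool
isCover es (ls, rs) = all (\(x, y) -> x `Set.member` ls || y `Set.member` rs) es

-- Equal sizes of a valid matching and a valid cover prove maximality.
certifiesMaximum :: (Ord a, Ord b)
                 => [Edge a b] -> Matching a b -> VertexCover a b -> Bool
certifiesMaximum es m c@(ls, rs) =
    isMatching es m && isCover es c && Set.size ls + Set.size rs == length m
```

A property test can then generate a random graph, run the matching algorithm, and check either the returned certificate with a function like `certifiesMaximum` or the validity of the returned augmenting path.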
This PR still needs some work. The reason is that two different implementations behave weirdly on the benchmarks.
I wrote a lot of Haskell this summer. This gave me a lot of experience in this language. Although there’s still work to be done, I’m satisfied with the results I got.
I adore the way functional programs are developed. I was surprised to learn how the popular testing (QuickCheck) and benchmarking (Criterion) frameworks are organized. And the precision of the documentation makes the work a lot easier.
A graph is called bipartite if its vertices can be split into two parts in such a way that there are no edges inside either part. While testing a graph for tripartiteness is NP-hard, there is a linear-time algorithm that tests a graph for bipartiteness and restores the partition.
The algorithm is usually one of the first graph algorithms given in any university course. The idea is rather straightforward: we try to assign vertices to the left or right part in some way, and when we get a conflict, we claim that the graph is not bipartite.
First, we assign some vertex to the left part. Then, we can confidently say that all neighbours of this vertex should be assigned to the right part. Then, all neighbours of those vertices should be assigned to the left part, and so on. We continue this until all the vertices in the connected component are assigned to some part, then we repeat the same action on the next connected component, and so on.
If there is an edge between vertices in the same part, one can easily find an odd cycle in the graph, hence the graph is not bipartite. Otherwise, we have the partition, hence the graph is bipartite.
There are two common ways of implementing this algorithm in linear time: using Depth-First Search or Breadth-First Search. We usually select DFS for this algorithm in imperative languages. The reason is that DFS implementation is a little bit simpler. I selected DFS, too, as a traditional way.
So, we arrive at the following scheme. We go through the vertices in DFS order and assign them to parts, flipping the part when going through an edge. If we try to assign some vertex to some part and see that it is already assigned to the other part, then we claim that the graph is not bipartite. When all vertices are assigned to parts and we’ve looked through all edges, we have the partition.
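Before any monad transformers enter the picture, the scheme can be sketched as a plain pure function. This is a simplified variant and not the implementation discussed in this post: it threads the part map by hand through `foldM` in `Maybe`, and it only reports that a conflict exists without recovering the odd cycle. The `Graph` type used here (a map from each vertex to its list of neighbours) and the names `assign` and `twoColour` are assumptions made for illustration.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map

-- Hypothetical graph representation: each vertex maps to its neighbours.
-- Undirected edges are expected to be listed in both directions.
type Graph a = Map.Map a [a]

data Part = LeftPart | RightPart deriving (Eq, Show)

otherPart :: Part -> Part
otherPart LeftPart  = RightPart
otherPart RightPart = LeftPart

-- Try to put vertex v into part p, then visit its neighbours with the
-- opposite part. Nothing signals a conflict: the graph is not bipartite.
assign :: Ord a => Graph a -> Part -> Map.Map a Part -> a -> Maybe (Map.Map a Part)
assign g p m v = case Map.lookup v m of
    Just q | q == p    -> Just m   -- already assigned to the expected part
           | otherwise -> Nothing  -- already assigned to the other part
    Nothing -> foldM (assign g (otherPart p)) (Map.insert v p m)
                     (Map.findWithDefault [] v g)

-- Process every connected component, starting each from the left part.
twoColour :: Ord a => Graph a -> Maybe (Map.Map a Part)
twoColour g = foldM start Map.empty (Map.keys g)
  where
    start m v | v `Map.member` m = Just m
              | otherwise        = assign g LeftPart m v
```

On a 4-cycle this returns a map that alternates the two parts around the cycle, and on a triangle it returns `Nothing`.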
In Haskell, all computations are supposed to be pure. Still, if it was really so, we wouldn’t be able to print anything to the console. And what I find most funny about pure computations is that they are so lazy that there is no pure reason to compute anything.
Monads are the Haskell way to express computations with effects. I’m not going to give a complete explanation of how they work here, but I find this one very nice and clear.
What I do want to note here is that while some monads, like IO, are implemented through some deep magic, others have simple and pure implementations. So the entire computation in these monads is pure.
There are many monads that express all kinds of effects. It is a very beautiful and powerful theory: they all implement the same interface. We will talk about the three following monads:
- Either e a: a computation that returns a value of type a or throws an error of type e. The behaviour is very much like exceptions in imperative languages, and the errors may be caught. The main difference is that this monad is implemented entirely as ordinary code in the standard library, while in imperative languages exceptions are usually implemented by the operating system or virtual machine.
- State s a: a computation that returns a value of type a and has access to a modifiable state of type s.
- Maybe a: the Monad instance for Maybe expresses a computation that can be interrupted at any moment by returning Nothing. But we will mostly speak of the MonadPlus instance, which expresses the opposite effect: a computation that can be interrupted at any moment by returning a concrete value.

We have two data types, Graph a and Bigraph a b, the first representing graphs with vertex labels of type a and the second representing bipartite graphs with left part labels of type a and right part labels of type b.
A Word of Warning: These are not Alga data types. Alga representation for bipartite graphs is not yet released and there is no representation for undirected graphs.
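Returning briefly to the three monads listed above, each effect can be tried out in isolation. The sketch below is only an illustration with made-up function names (`safeDiv`, `tick`, `firstEven`); it assumes the `transformers` package, which ships with GHC, for `State`.

```haskell
import Control.Monad (msum)
import Control.Monad.Trans.State (State, evalState, get, put)

-- Either: the first Left aborts the rest of the computation.
safeDiv :: Int -> Int -> Either String Int
safeDiv _ 0 = Left "division by zero"
safeDiv x y = Right (x `div` y)

-- State: a counter threaded through the computation.
-- Returns the current value and increments the state.
tick :: State Int Int
tick = do
    n <- get
    put (n + 1)
    return n

-- Maybe as MonadPlus: msum returns the first Just and ignores every
-- Nothing, so a concrete value "interrupts" the search.
firstEven :: [Int] -> Maybe Int
firstEven xs = msum [ if even x then Just x else Nothing | x <- xs ]
```

For example, `safeDiv 1 0 >>= safeDiv 100` stops at the first error, `evalState (tick >> tick >> tick) 0` returns the counter value seen by the third `tick`, and `firstEven` stops at the first even element.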
We also assume that we have the following functions.
-- List of neighbours of a given vertex.
neighbours :: Ord a => a -> Graph a -> [a]

-- Convert a graph with vertices labelled with their parts to a bipartite
-- graph, ignoring the edges within one part.
toBipartiteWith :: (Ord a, Ord b, Ord c) => (a -> Either b c)
                                         -> Graph a
                                         -> Bigraph b c

-- List of vertices.
vertexList :: Ord a => Graph a -> [a]
Now we write the definition for the function we are going to implement.
type OddCycle a = [a]
detectParts :: Ord a => Graph a -> Either (OddCycle a) (Bigraph a a)
It can be easily seen that the odd cycle is at the top of the recursion stack in case we failed to find the partition. So, in order to restore it, we only need to cut everything from the recursion stack before the first occurrence of the last vertex.
We will implement a Depth-First Search while maintaining a map of part identifiers for each vertex. The recursion stack for the vertex in which we failed to find the partition will be automatically restored with the Functor instance for the monad we choose: we only need to put all vertices from the path into the result on our way back from the recursion.
The first idea is to use the Either monad, which fits our goals perfectly well. The first implementation I had was something very close to that. In fact, I had five different implementations at some point to choose the best from, and I finally settled on another option.
First, we need to maintain a map of part identifiers: this is a job for State.
Then, we need to stop as soon as we find a conflict. This could be done either
with the Monad instance for Either or with the MonadPlus instance for Maybe.
The main difference is that Either also carries a value in case of success,
while with the MonadPlus instance for Maybe a value is returned only when we
fail to find the partition. As we don’t need a value on success because it’s
already stored in State, we choose Maybe. Now, we need to combine two monadic
effects, so we need monad transformers, which exist precisely for that purpose.
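To see how the two effects interact, here is a minimal sketch that is unrelated to graphs. The functions tryEven and firstEven are made up for this illustration. It assumes the mtl and transformers packages, and shows that changes to the State survive a failed Maybe branch, while msum stops at the first branch that returns a value.

```haskell
import Control.Monad (guard, msum)
import Control.Monad.State (State, modify, runState)
import Control.Monad.Trans.Maybe (MaybeT (..))

-- Try one candidate: count the attempt in the state,
-- then succeed only if the number is even.
tryEven :: Int -> MaybeT (State Int) Int
tryEven n = do
    modify (+ 1)
    guard (even n)
    return n

-- Return the first even number (if any) together with the number of
-- candidates that were actually tried.
firstEven :: [Int] -> (Maybe Int, Int)
firstEven xs = runState (runMaybeT (msum (map tryEven xs))) 0
```

Here firstEven [1, 3, 4, 5] evaluates to (Just 4, 3): the two failed attempts are still counted, and 5 is never tried.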
Why did I choose such a complicated type? There are two reasons. The first is
that the implementation becomes very similar to the one we would write in an
imperative language. The second is that I needed to manipulate the value
returned in case of conflict to restore the odd cycle, and this becomes much
simpler with Maybe.
So, here we go now.
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad (guard, msum)
import Control.Monad.State (State, get, modify, runState)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Data.Maybe (fromJust)
import qualified Data.Map as Map

-- Eq is needed to compare parts in onEdge.
data Part = LeftPart | RightPart deriving Eq

otherPart :: Part -> Part
otherPart LeftPart  = RightPart
otherPart RightPart = LeftPart

type PartMap a = Map.Map a Part
type OddCycle a = [a]

-- Label a vertex with the part stored for it in the map.
toEither :: Ord a => PartMap a -> a -> Either a a
toEither m v = case fromJust (v `Map.lookup` m) of
    LeftPart  -> Left  v
    RightPart -> Right v

-- State holds the part map; Just carries the path to a conflict.
type PartMonad a = MaybeT (State (PartMap a)) [a]

detectParts :: forall a. Ord a => Graph a -> Either (OddCycle a) (Bigraph a a)
detectParts g = case runState (runMaybeT dfs) Map.empty of
    (Just c, _)  -> Left  $ oddCycle c
    (Nothing, m) -> Right $ toBipartiteWith (toEither m) g
  where
    inVertex :: Part -> a -> PartMonad a
    inVertex p v = ((:) v) <$> do
        modify $ Map.insert v p
        let q = otherPart p
        msum [ onEdge q u | u <- neighbours v g ]

    {-# INLINE onEdge #-}
    onEdge :: Part -> a -> PartMonad a
    onEdge p v = do
        m <- get
        case v `Map.lookup` m of
            Nothing -> inVertex p v
            Just q  -> do guard (q /= p)
                          return [v]

    processVertex :: a -> PartMonad a
    processVertex v = do
        m <- get
        guard (v `Map.notMember` m)
        inVertex LeftPart v

    dfs :: PartMonad a
    dfs = msum [ processVertex v | v <- vertexList g ]

    -- Cut everything before the first occurrence of the last vertex.
    oddCycle :: [a] -> [a]
    oddCycle c = tail (dropWhile (/= last c) c)
I’ll try to explain each of the first four scoped functions: this is the core of the algorithm.
inVertex is the part of DFS that happens when we visit a vertex for the first time. Here, we assign the vertex to its part and launch onEdge for every incident edge. This is also the place where we restore the call stack: if a Just is returned from some edge, we add v to the beginning.
onEdge is the part that happens when we traverse an edge. It happens twice for each edge. Here we check whether the vertex on the other side is visited. If not, we visit it. Otherwise, we check whether we have found an odd cycle. If we have, we simply return the current vertex as a singleton. The other vertices of the path are added on the way back from the recursion.
processVertex checks whether the vertex is visited and runs DFS on it if not.
dfs runs processVertex on all vertices.
That’s it.
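To try the idea without Alga, the following self-contained sketch repeats the same algorithm on a stand-in graph type. SimpleGraph (a map from each vertex to its neighbours, with every edge listed in both directions) and the name twoColour are assumptions made for this example. Unlike detectParts, it returns the raw part map rather than a Bigraph. It assumes the containers, mtl and transformers packages.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad (guard, msum)
import Control.Monad.State (State, get, modify, runState)
import Control.Monad.Trans.Maybe (MaybeT (..))
import qualified Data.Map.Strict as Map

data Part = LeftPart | RightPart deriving (Eq, Show)

otherPart :: Part -> Part
otherPart LeftPart  = RightPart
otherPart RightPart = LeftPart

-- Stand-in graph type (an assumption, not an Alga type).
type SimpleGraph a = Map.Map a [a]

type PartMonad a = MaybeT (State (Map.Map a Part)) [a]

-- Left: an odd cycle. Right: the part assigned to every vertex.
twoColour :: forall a. Ord a => SimpleGraph a -> Either [a] (Map.Map a Part)
twoColour g = case runState (runMaybeT dfs) Map.empty of
    (Just c, _)  -> Left (oddCycle c)
    (Nothing, m) -> Right m
  where
    -- First visit: record the part, then look along every incident edge.
    inVertex :: Part -> a -> PartMonad a
    inVertex p v = (v :) <$> do
        modify (Map.insert v p)
        msum [ onEdge (otherPart p) u | u <- Map.findWithDefault [] v g ]

    -- p is the part that v must have for the edge to be consistent.
    onEdge :: Part -> a -> PartMonad a
    onEdge p v = do
        m <- get
        case Map.lookup v m of
            Nothing -> inVertex p v
            Just q  -> do guard (q /= p)
                          return [v]

    processVertex :: a -> PartMonad a
    processVertex v = do
        m <- get
        guard (v `Map.notMember` m)
        inVertex LeftPart v

    dfs :: PartMonad a
    dfs = msum [ processVertex v | v <- Map.keys g ]

    oddCycle :: [a] -> [a]
    oddCycle c = drop 1 (dropWhile (/= last c) c)

triangle, path :: SimpleGraph Int
triangle = Map.fromList [(1, [2, 3]), (2, [1, 3]), (3, [1, 2])]
path     = Map.fromList [(1, [2]), (2, [1])]
```

On these inputs, twoColour triangle gives Left [2,3,1], and twoColour path gives Right with vertex 1 in LeftPart and vertex 2 in RightPart.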
When I first wrote the above code, onEdge was not explicitly inlined. Then,
when I was benchmarking different versions of detectParts to select the best,
I noticed that on some graphs the version with transformers had a serious
overhead compared to the version with Either. I had no idea what was going on,
because semantically the two functions were supposed to perform the same
operations. And it became even weirder when I ran it on another machine with
another version of GHC and didn’t notice any overhead there.
After a weekend of reading GHC Core code, I managed to fix this with one
explicit inline. At some point between GHC 8.4.4 and GHC 8.6.5 the optimizer
changed in such a way that it no longer inlined onEdge.
This is just a crazy thing about programming that I didn’t expect to come across in Haskell. Still, it seems that optimizers make mistakes even in our time, and it is our job to give them hints about what should be done. For example, here we knew that the function should be inlined, as it is in the imperative version, and that’s a reason to give GHC a hint.
When this patch is merged, I’m going to start implementing the Hopcroft-Karp algorithm. I think the BFS part is going to be rather interesting, so the next blog post will come in a couple of weeks.
The idea of the project was on the ideas list published earlier. Two of us were accepted for this project, the other being Adithya Kumar, who will be doing the work described on the ideas list. He told me his GSoC blog will probably be here.
My task is to introduce bipartite graphs to Alga and that is what I am going to tell you about now.
There are three common ways to represent graphs in computing: adjacency matrices, adjacency lists, and edge lists.
All three of them have their advantages and disadvantages. The most commonly used is the adjacency lists approach, that is, storing a list of neighbors for each vertex. In fact, I can think of only one common algorithm for which this approach is not perfect: Kruskal’s algorithm for finding the minimum spanning tree.
However, the problem is that feeding graphs formed this way to algorithms is
not always safe. For example, if the algorithm is designed for bidirectional
graphs, it may rely on the fact that if some vertex u is in the list of
neighbors of some other vertex v, then v is in the list of neighbors of u.
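A minimal sketch of that pitfall, using a plain map of neighbour lists. The type AdjLists and the function isSymmetric are names made up for this illustration and assume the containers package.

```haskell
import qualified Data.Map.Strict as Map

-- Adjacency lists: each vertex is mapped to the list of its neighbours.
type AdjLists a = Map.Map a [a]

-- True if, whenever u is listed as a neighbour of v,
-- v is also listed as a neighbour of u.
isSymmetric :: Ord a => AdjLists a -> Bool
isSymmetric g =
    and [ v `elem` Map.findWithDefault [] u g
        | (v, us) <- Map.toList g
        , u <- us
        ]

-- A well-formed bidirectional graph with the single edge 1 - 2.
good :: AdjLists Int
good = Map.fromList [(1, [2]), (2, [1])]

-- The type accepts this value just as happily, although the edge is only
-- recorded at one of its endpoints.
bad :: AdjLists Int
bad = Map.fromList [(1, [2]), (2, [])]
```

Nothing in the type distinguishes good from bad, so an algorithm that relies on symmetry has to either check it at run time or trust its caller.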
A traditional solution for functional programming would be to guarantee the consistency of input data for the algorithm by taking a representation of the graph that would not allow a wrong graph to be passed. That’s what we call type safety.
Alga is a library that provides such a safe representation with a beautiful algebraic interpretation. It also has a nice set of algorithms out of the box. You can find the paper on Alga by its author here; I’m just going to provide some basics.
Consider the following definition for the graph data type:
data Graph a = Empty
             | Vertex a
             | Overlay (Graph a) (Graph a)
             | Connect (Graph a) (Graph a)
The constructors mean the following:
Empty constructs an empty graph.
Vertex v constructs a graph with a single vertex labeled v.
Overlay g h constructs a graph whose sets of vertices and edges are the unions of those of graphs g and h.
Connect g h does the same as Overlay and also connects all vertices of g to all vertices of h.
One can easily construct a Graph of linear size given a list of edges of the
desired graph. In fact, this approach may even save memory for dense graphs
compared to adjacency lists. And this approach is surely type safe in the
sense described above. Unlike with adjacency lists, there is no problem with
an edge missing from the list of neighbours of another vertex. Another
possible problem with adjacency lists that is not present here is that an edge
might lead to a vertex with no associated adjacency list.
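The construction from a list of edges can be sketched as follows. This is not Alga’s own code: the data type is repeated so that the snippet stands alone, and fromEdges, vertexSet and edgeSet are names chosen for this illustration. It assumes the containers package.

```haskell
import qualified Data.Set as Set

data Graph a = Empty
             | Vertex a
             | Overlay (Graph a) (Graph a)
             | Connect (Graph a) (Graph a)

-- One Connect of two vertices per edge, all overlaid together:
-- the expression has size linear in the number of edges.
fromEdges :: [(a, a)] -> Graph a
fromEdges = foldr (\(u, v) g -> Overlay (Connect (Vertex u) (Vertex v)) g) Empty

-- Interpret an expression as a set of vertices.
vertexSet :: Ord a => Graph a -> Set.Set a
vertexSet Empty         = Set.empty
vertexSet (Vertex v)    = Set.singleton v
vertexSet (Overlay g h) = Set.union (vertexSet g) (vertexSet h)
vertexSet (Connect g h) = Set.union (vertexSet g) (vertexSet h)

-- Interpret an expression as a set of directed edges.
edgeSet :: Ord a => Graph a -> Set.Set (a, a)
edgeSet Empty         = Set.empty
edgeSet (Vertex _)    = Set.empty
edgeSet (Overlay g h) = Set.union (edgeSet g) (edgeSet h)
edgeSet (Connect g h) = Set.unions
    [ edgeSet g
    , edgeSet h
    , Set.fromList [ (u, v) | u <- Set.toList (vertexSet g)
                            , v <- Set.toList (vertexSet h) ]
    ]
```

Every edge of such an expression connects two vertices that appear in the expression itself, so an edge can never point to a vertex the graph does not know about.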
Why algebraic? Well, if we write down simple laws for these graphs, we will see
that the laws for the Connect and Overlay operations are very similar to those
for multiplication and addition in a semiring, respectively.
This was just a brief description of Alga. There are many other parts not
covered here. One example is that Graph can also be provided as a type class
rather than a data type. This approach is much more flexible.
An important part of Alga is providing different type-safe representations for different kinds of graph. For example, one for edge-labeled graphs was introduced last year.
Another option is to add a representation that restricts the set of possible graphs. One example from the ideas list is to represent only acyclic directed graphs. This is what Adithya will be doing. And my task for the first evaluation period is to provide bipartite graphs.
We often meet bipartite graphs in the real world: connections between entities of different kinds are common. For example, the graph of clients and the backends they use is bipartite. Another example I can think of comes from content recommendation systems: the graph of users and the films or songs they like is bipartite, too.
There are many ideas on how to do so. For example, in my proposal I suggested an approach that seems to match Alga’s design:
data Bigraph a b = Empty
                 | LeftVertex a
                 | RightVertex b
                 | Overlay (Bigraph a b) (Bigraph a b)
                 | Connect (Bigraph a b) (Bigraph a b)
Here, Connect only connects left vertices to right ones. As my mentor Andrey
figured out, there is an interesting addition to the laws:
(LeftVertex u) * (LeftVertex v) = (LeftVertex u) + (LeftVertex v). Of course,
the same holds for the right vertices.
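This law can be checked on a small example by interpreting a Bigraph as sets of vertices and edges. The interpretation below is only an assumption about what Connect should mean (every left vertex of one operand is joined to every right vertex of the other); it is not the semantics of any released Alga type. The helper names leftVertices, rightVertices, edges and sameGraph are made up, and the containers package is assumed.

```haskell
import qualified Data.Set as Set

data Bigraph a b = Empty
                 | LeftVertex a
                 | RightVertex b
                 | Overlay (Bigraph a b) (Bigraph a b)
                 | Connect (Bigraph a b) (Bigraph a b)

leftVertices :: Ord a => Bigraph a b -> Set.Set a
leftVertices (LeftVertex u) = Set.singleton u
leftVertices (Overlay g h)  = Set.union (leftVertices g) (leftVertices h)
leftVertices (Connect g h)  = Set.union (leftVertices g) (leftVertices h)
leftVertices _              = Set.empty

rightVertices :: Ord b => Bigraph a b -> Set.Set b
rightVertices (RightVertex v) = Set.singleton v
rightVertices (Overlay g h)   = Set.union (rightVertices g) (rightVertices h)
rightVertices (Connect g h)   = Set.union (rightVertices g) (rightVertices h)
rightVertices _               = Set.empty

-- Edges are stored as (left, right) pairs. Connect adds only edges between
-- a left vertex of one operand and a right vertex of the other.
edges :: (Ord a, Ord b) => Bigraph a b -> Set.Set (a, b)
edges (Overlay g h) = Set.union (edges g) (edges h)
edges (Connect g h) = Set.unions [edges g, edges h, cross g h, cross h g]
  where
    cross x y = Set.fromList [ (u, v) | u <- Set.toList (leftVertices x)
                                      , v <- Set.toList (rightVertices y) ]
edges _ = Set.empty

-- Two expressions denote the same graph if vertices and edges coincide.
sameGraph :: (Ord a, Ord b) => Bigraph a b -> Bigraph a b -> Bool
sameGraph g h = leftVertices g == leftVertices h
             && rightVertices g == rightVertices h
             && edges g == edges h

-- Connecting two left vertices adds no edge, so it equals their overlay.
lawHolds :: Bool
lawHolds = sameGraph (Connect l1 l2) (Overlay l1 l2)
  where
    l1, l2 :: Bigraph Int Char
    l1 = LeftVertex 1
    l2 = LeftVertex 2
```

Under this interpretation lawHolds is True, while connecting a left vertex to a right vertex does produce an edge.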
By now, we agreed that first, I will focus on implementing adjacency maps for bipartite graphs (hey, didn’t I mention that Alga uses adjacency maps on the inside?). It doesn’t make much sense to make a separate algebraic representation, but I may do it if I find something interesting in it.
Now, the first task is to implement the conversion function, which I’m going to start right now. This implementation will simply ignore the edges between vertices of the same part.
fromGraph :: Graph (Either a b) -> Bipartite.AdjacencyMap a b
fromGraph = undefined
With this stub, my summer-long dive into Haskell begins!