Memory footprint analysis of OCaml concurrent programs

Nomadic Labs is looking for an intern to work on memory footprint analysis of OCaml concurrent programs.

Tutors: François Thiré

The Octez suite of Tezos blockchain-related software, developed by Nomadic Labs and others, is a highly complex software artefact: the codebase is relatively large, it is highly concurrent, and it is designed to be resilient to attacks from malicious peers. Finding errors in such a large piece of software can be very difficult, and memory leaks in particular are notoriously difficult to track. This is because memory allocation is inherently hard to analyse:
  • Memory-allocation is implicit, and the lifetime of an allocated object is nondeterministic because of how the incremental, generational, OCaml garbage collector works.
  • Concurrency in the code makes the usual memory analysis tools (such as statmemprof, Valgrind or landmarks) unusable.
  • Memory leaks can come from a so-called allocation race, meaning that the program allocates memory faster than the garbage collector can deallocate it. Even though every memory cell that is allocated will eventually get deallocated, the memory footprint keeps growing in the meantime.
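
As a rough illustration of the kind of measurement involved, the sketch below compares how much a piece of code allocates with how much the heap grows and how many major collections ran in the meantime. It uses only the standard library's Gc module; the names `gc_delta` and `measure` are hypothetical, not part of Octez, and this is a coarse single-threaded probe rather than a real leak detector.

```ocaml
(* Hypothetical helper, for illustration only: counter deltas around a call. *)
type gc_delta = {
  allocated_words : float; (* total words allocated while [f] ran *)
  heap_growth : int;       (* change in major heap size, in words *)
  major_cycles : int;      (* major collections completed while [f] ran *)
}

(* Total allocation so far: minor + major, minus promoted words, which
   would otherwise be counted twice. *)
let total_allocated (s : Gc.stat) =
  s.Gc.minor_words +. s.Gc.major_words -. s.Gc.promoted_words

(* Run [f] and return its result together with the GC counter deltas. *)
let measure f =
  let before = Gc.quick_stat () in
  let result = f () in
  let after = Gc.quick_stat () in
  ( result,
    { allocated_words = total_allocated after -. total_allocated before;
      heap_growth = after.Gc.heap_words - before.Gc.heap_words;
      major_cycles = after.Gc.major_collections - before.Gc.major_collections } )

let () =
  let _list, d = measure (fun () -> List.init 100_000 (fun i -> i)) in
  Printf.printf "allocated %.0f words, heap grew by %d words, %d major collections\n"
    d.allocated_words d.heap_growth d.major_cycles
```

A heap that keeps growing across repeated measurements while major collections still run would be one hint of an allocation race, although these counters alone cannot distinguish it from a genuine leak.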

Goals

The goal of this internship is to develop tools and/or libraries to help developers find memory-related bugs. The intern will choose among the following proposed topics:
  • Make OCaml’s values traceable: a library to analyze and trace the lifetime of OCaml’s values and generate a readable report.
  • Memory footprint with Lwt: Octez relies on the Lwt library for cooperative threading to achieve concurrency. To analyse the memory of a program clearly, one needs to separate the Lwt-specific part from the Lwt-agnostic part.
  • Profiling Lwt overhead: In addition to the inner code of Octez, Lwt is used by its external dependencies, therefore it is hard to predict and observe which “threads” are created at runtime. Profiling Lwt would help gain a better understanding of the different “threads” which are running/created by a program, and to identify hot spots.
  • Eliminate Lwt interference from profiling: To benchmark and understand a program’s allocation behavior, we would like to isolate Lwt from the traces of memory analysis. This could be implemented on top of the memtrace library for example.
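
To give a flavour of the first topic, here is a minimal sketch of value-lifetime tracing built on the standard library's `Gc.finalise`. The names `trace`, `live_count` and `report` are hypothetical and only illustrate the idea; an actual library would likely need much more (allocation sites, timestamps, Lwt awareness). It assumes OCaml 4.12 or later for `Atomic`.

```ocaml
(* Hypothetical sketch: count live values per label using GC finalisers. *)
let counters : (string, int Atomic.t) Hashtbl.t = Hashtbl.create 16

(* Find or create the counter associated with [label]. *)
let counter_for label =
  match Hashtbl.find_opt counters label with
  | Some c -> c
  | None ->
    let c = Atomic.make 0 in
    Hashtbl.add counters label c;
    c

(* Register a heap-allocated value under [label] and return it unchanged.
   The finaliser runs once the GC finds the value unreachable.
   Gc.finalise raises Invalid_argument on values that are not heap-allocated. *)
let trace label v =
  let c = counter_for label in
  Atomic.incr c;
  Gc.finalise (fun _ -> Atomic.decr c) v;
  v

(* Number of traced values under [label] not yet finalised. *)
let live_count label =
  match Hashtbl.find_opt counters label with
  | Some c -> Atomic.get c
  | None -> 0

(* Print one line per label with the current live count. *)
let report () =
  Hashtbl.iter
    (fun label c -> Printf.printf "%s: %d live\n" label (Atomic.get c))
    counters

let () =
  let kept = trace "buffer" (Bytes.create 1024) in
  ignore (trace "buffer" (Bytes.create 1024)); (* dropped right away *)
  Gc.full_major ();
  report ();
  (* Keep [kept] reachable until after the report. *)
  ignore (Sys.opaque_identity kept)
```

When exactly a finaliser runs depends on the garbage collector, which is precisely the nondeterminism that makes this kind of report hard to interpret without dedicated tooling.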

Requirements

The intern should have a good knowledge of the OCaml programming language and have an appetite to delve into the inner workings of the OCaml compiler. Some knowledge of C or assembly language would also be useful.

Internship Context

You will work at the Nomadic Labs’ offices in Paris.
As a participant in a large-scale open-source project, you will have to rapidly learn to use collaborative tools (Git, merge requests, issues, GitLab, continuous integration, documentation) and to communicate about your work. The final results might be presented at an international conference or workshop.
You will have a designated advisor at Nomadic Labs and will have to work independently and to propose thoroughly-considered solutions to the different problems you will have to solve. You will be encouraged to seek advice from members of the team.

Intellectual Property

All material produced (essays, documentation, code, etc.) will be released under an open source license (e.g. MIT or CC).

➡️ If you don’t meet all the criteria above, but think you can still be an asset to us, please consider applying.