How Covenant Will Deliver You to the Promised Land (of Scripts)
Introduction
When writing a script for Cardano (or a program in any language, really), there are typically three 'layers' to the work:
Functionality
Optimization and code generation
Analysis
The functionality layer is concerned with what the script (or program) will actually do: the 'business logic' that the script was made to solve. The optimization and code generation layer instead focuses on actually getting it runnable, both 'as such' (code generation) and 'efficiently' (optimization). Lastly, the analysis layer is concerned with exactly what can (and can't) happen when the code runs, whether for reasons of efficiency, security, or something else.
Each of these 'layers' requires a different skill set, and in an ideal world, they would be completely separate. To see an example of where such a situation exists, consider the LLVM project. LLVM defines an intermediate representation (LLVM IR), which acts as a 'universal target' for any language's front end. Then, LLVM provides ways to work with LLVM IR, including analysis, optimization, and code generation. This leads to a range of benefits:
Language front-ends have a uniform target and don't have to consider optimization or code generation at all, even if they are very different from one another.
Specialists who write analysis or optimization passes don't have to consider details of any specific language; instead, they focus on LLVM IR, allowing such techniques to be shared across many (again, possibly quite different) languages.
Code generation doesn't have to consider the specifics of any given language. If LLVM IR can target any physical platform, so can any language using LLVM IR, without any intervention from the language-specific developers.
LLVM IR has a stable serial form, independent of any language that targets it, or any architecture we can generate code for from it. Thus, any tool that works with LLVM IR doesn't have to consider who created it or where it came from.
Unsurprisingly, LLVM is thus used as the basis for many languages, optimization and analysis tools, and many other tasks besides, whether in research or industry.
Currently, if we want to write Cardano scripts, we are unfortunately forced by all current solutions to mash all the 'layers' described above into one. In all of these cases, one or more of the following ends up being true:
A hard dependency on Plutus' codebase exists, even if nothing gets used from it in a user-visible way (Plutarch).
Developers must be aware of the specifics of UPLC at the low level, even if what they write is many steps removed (Plinth is especially bad here, but Plutarch and Aiken aren't much better).
Analysis and optimization tools must work against either UPLC (which is hard or impossible, to say nothing of having to depend on Plutus) or against the language developers actually write (which is tied to the language of implementation and doesn't generalize at all).
While in theory, we could use UPLC as the equivalent of LLVM IR, this leads to a different set of problems:
UPLC's serial form is not straightforward to parse and write. The easiest (and typical) solution is to depend on Plutus to handle this problem, which means a hard dependency on both Haskell and Plutus.
UPLC claims to be capable of general recursion (which impedes static analysis), but in practice, this isn't actually usable (due to execution budget limits).
The second point is worth elaborating on somewhat. If we look at the definition of a UPLC Term
in the Plutus codebase, we get this:
data Term name uni fun ann
= Var !ann !name
| LamAbs !ann !name !(Term name uni fun ann)
| Apply !ann !(Term name uni fun ann) !(Term name uni fun ann)
| Force !ann !(Term name uni fun ann)
| Delay !ann !(Term name uni fun ann)
| Constant !ann !(Some (ValueOf uni))
| Builtin !ann !fun
| Error !ann
| Constr !ann !Word64 ![Term name uni fun ann]
| Case !ann !(Term name uni fun ann) !(Vector (Term name uni fun ann))
deriving stock (Functor, Generic)
Modulo some specifics of Plutus, this is an untyped lambda calculus, which means that it is capable of general recursion. However, this is an illusion: while we can describe general recursion in a UPLC Term
, we can't actually perform it! This is because the Cardano onchain environment forces a strict budget on all scripts, both in terms of time and space. Thus, we are in fact making promises we cannot keep[1]
. At the same time, thanks (not least of all) to Rice's theorem, our ability to perform any kind of analysis on UPLC (at least, without simulating it) is severely limited. In effect, we end up with the worst of both worlds: neither general recursion in practice nor the static analysis to make up for it.
What a Solution Would Look Like
Taking our cue from LLVM IR, any solution to the above problems would need to do at least the following:
Have a stable serial form.
Be expressive enough to act as a target for a range of different DSLs, embedded in different languages, with different semantics, without too much difficulty.
Have sufficient structure to enable optimization and analysis tools to both discover, and apply, transformations and improvements, without needing any 'external information' about how the form was generated (such as what DSL it got translated from).
Not depend inherently on any programming language or framework, including, but not limited to, Haskell, Plutus or Nix.
Furthermore, in light of the above, the following would be highly desirable:
A typed IR, ideally using a well-known and well-understood system.
The serial form of the IR should use a well-known format, such as JSON.
The type system of the IR should forbid (anything equivalent to) general recursion, instead favouring something more restrictive, but easier to analyze.
The DSL should not over-generalize instead focusing only on the use case of scripts
At the moment, no real options exist that satisfy all of these requirements, or even all of the minimal ones. UPLC itself has a serial form that is (fairly) stable, and is certainly expressive enough, but lacks the structure for good analysis tools (as previously described), lacks types, and in practice, has a format so obscure that a dependency on Plutus to work with it is almost mandatory. TPLC (UPLC's typed counterpart) has types but otherwise inherits all the deficiencies of UPLC. PIR is also unsuitable, as it lacks a serial form, and in many ways isn't really an 'IR' per se, designed as it is for a fairly specific use case around a specific language. Finally, the assorted DSLs around writing Cardano scripts (Plutarch, Plinth and Aiken, as well as others) lack serial forms, aren't expressive enough or structured enough, or depend too closely on specific frameworks and programming languages.
Our Solution
We propose Covenant: a Turner-total, call-by-push-value, serializable DSL, designed as a target for higher-level tools. Not only will this allow a common substrate for higher-level DSLs designed for writing Cardano scripts, but it will also enable optimization and analysis tools to be written that can be re-used across any DSL for writing Cardano scripts, including all current approaches like Plutarch or Aiken. Furthermore, we will enable the use of far more languages for writing tools that work with Cardano scripts, whether as hosts for DSLs, optimization or analysis tools, or indeed anything else, by allowing working over Covenant's serial form, instead of something more specific.
By being a Turner-total, call-by-push-value DSL, Covenant will re-use familiar and well-founded techniques, as well as enable analysis currently not possible with any other approach. Furthermore, due to the nature of the onchain execution environment, the limitations of Turner totality will not arise, effectively enabling us to be more capable by being more honest. This combination is also general enough to act as a target for a large range of possible DSLs: in particular, the use of call-by-push-value enables both strict and non-strict semantics without needing separate forms or special cases. The resulting system is general enough to act as a target for many different DSLs, though with a slight bias towards functional languages.
Covenant's serial form will be JSON-based, as well as stable and well-documented. By choosing a well-known, widely-used, and textual form, we will allow working with Covenant to be much easier, and not require a dependency on Plutus, Cardano, or even Nix or Haskell. Almost any programming language can work with JSON: this will mean that writing optimization passes, analysis tools or indeed any application that works with Covenant will be significantly simpler, and won't need to concern itself with what generated the Covenant serial form being worked on. In combination with all the above benefits, we will finally be able to separate the 'layers' of Cardano script development we previously discussed:
Script developers will be able to use DSLs that target Covenant, hosted in whatever languages they choose, without worrying about optimization, analysis or the specifics of Plutus (beyond what is needed for their business logic).
Analysis or optimization tools could be written and used regardless of in which DSL or language Cardano scripts are developed.
Code generation can be written against Covenant, not specific DSLs, allowing code generation experts to ignore any specifics of any given DSL, such as Plutarch or Aiken.
One major implication of the static analysis capabilities of Covenant is for security purposes. Currently, when given a script, this analysis can only be done manually, or via simulation, as static analysis of UPLC is virtually impossible. With Covenant, due to its Turner totality, this will not be required nearly as often, allowing a range of security-related analysis tooling to exist. When combined with the independence of Covenant from Plutus, Cardano or even Haskell, this has significant implications, as security specialists who don't have a background in Haskell would be able to design tooling that nonetheless benefits Cardano and Plutus work.
What We Will Produce
The main output of our work will be Covenant itself. This will consist of both a specification of its JSON-based serial form, as well as a Haskell implementation of a serializer and deserializer for it. We will provide a set of conformance cases to ensure that other implementations can have something to compare themselves against, as well as to verify that our implementation correctly implements the specification. This implementation will have no dependency on Plutus or Nix; we will release it on Hackage as a demonstration of this fact.
Additionally, to demonstrate the possibilities of Covenant, we will also provide:
A sample static analysis tool, which will consume Covenant's serial form and provide some kind of analysis of it, without simulating the computation the serial form describes; and
A code generator for UPLC from Covenant's serial form.
The static analysis tool will, once again, have no dependency on Plutus or Nix; we will also release it on Hackage. The code generator must, by necessity, depend on Plutus and Nix, but it will be a standalone component, allowing its use as a binary without requiring developers to integrate, or even know about, either of these tools.
Lastly, and most importantly, we will provide documentation, using the Diataxis framework. This will ensure that others can use Covenant, both as a target for their DSLs and their analysis and optimization tools.
With our work, a situation like the following would become possible:
A developer writes a script in an eDSL in Python, which generates Covenant's serial form. The eDSL doesn't perform any optimizations.
This serial form is then run through an optimizer written in Rust, which the developer downloaded from Github. Knowing nothing about optimization or Rust was no impediment: the developer 'grabbed a binary and went'.
Lastly, the now-optimized Covenant IR is run through a code generator to produce UPLC written in Haskell. Once again, the developer knows nothing about Haskell or UPLC: they just 'grabbed a binary and went'.
This highlights the power of Covenant: the separation of the 'layers' brings with it a separation of specializations, allowing experts in each layer to work together without having to share an implementation language, or worry about specifics outside of their area of expertise. This is already possible outside of Cardano with LLVM IR: our work will bring these same benefits to the Cardano ecosystem.
What You Can Do
If these possibilities excite you, we encourage you to vote for Covenant here. Only with your help can we bring Covenant to life: your support will allow us to change how Cardano scripts are written, optimized, analyzed, and compiled, for the better.
- We could argue that any system claiming general recursion is similarly flawed. Ultimately, our computers, and the observable universe, are finite, thus we don't truly have 'arbitrary' resources at our disposal. However, the difference between the kind of finiteness imposed on our computations in a more conventional setting, and onchain, is that we are almost certain to hit these limits onchain.