Extending OCaml programs with Lua (soupault got plugin support)
Most of the time, when people make extensible programs in typed functional languages,
they make a DSL, not least because it's much easier to make a DSL in a language with algebraic types
and pattern matching than in one without.
Some use cases really require a general-purpose language though. That's where things get
more interesting. Commonly used embeddable interpreters such as Lua, Guile, or Chicken are written in C.
It's possible to make OCaml or Haskell bindings for them and such bindings do exist,
but that's two high-level languages communicating through a low-level one.
It would be much better to be able to expose native types to the embedded language in a type-safe and more or less convenient
fashion. Here's my take on it.
The use case: soupault website generator
Soupault is a website generator based on HTML rewriting instead of template processing. I made it for my own website out of conceptual disagreement with the “classic” workflow with Markdown and front matter that no one seems to question.
I don’t mind it for blogs like this one since the blog format itself is rigid, but for non-blog websites it easily becomes limiting and forces you to either mix Markdown with HTML or invent custom extensions; both approaches arguably defeat the purpose of Markdown.
Soupault works directly on HTML and uses CSS selectors for locating elements, for example, “insert file site/about.html into <div id="content">”.
That allows you to use any imaginable formatting and also offers
features impossible with classic generators. You can make every page look different if you want:
how much of the page is a template and how much is content is up to you. Its TOC and footnotes widgets
can reuse existing ids and make the links persist even if the heading text changes completely.
It’s also quite easy to use as a drop-in workflow upgrade for handwritten websites or other generators,
without losing original URLs—it doesn’t force any workflow on you.
The cost of the templateless approach is that if something is not already supported, it cannot be done at all. Generators that use logicless templates exclusively have the same problem. The usual way to solve that problem is to support plugins.
Most generators fall into two categories: easily extensible but slow, or fast but not extensible. For generators written in interpreted languages, like Jekyll, adding plugin support is trivial, and plugins are easy to distribute.
Hugo is well known for speed thanks to being a native executable, but it’s not extensible.
Wyam, written for .NET, is a rare example of a middle ground.
Can we combine the native full speed of core components with extensibility? Soupault is written in OCaml, which does support dynamic linking and it’s quite easy to use, but the real problem is distributing plugins. The users would need to compile them for their platform, and it’s obviously much harder than just dropping a file into the plugins directory.
The Lua-ML project
I discovered the Lua-ML project around the time I started working on soupault, so I immediately started wondering if I could use it.
Lua-ML is a pure OCaml implementation of Lua. The great thing about it is that it’s fully modular. You can replace any part with your own module as long as the interface is compatible. Link in new modules as if they were a part of the standard library? Sure. Replace a part of the standard library with your module? That’s possible. Replace the AST interpreter but keep the library? That’s possible too. There are no black boxes.
These are the good parts. The bad part is that it comes from a now defunct research project: a compiler backend named C--. One of the project members, Christian Lindig, salvaged it from C-- and published it on GitHub, but had not been actively working on it.
The last time any non-maintenance work was done on it was around 2005 or so. I was to become its first real user, too.
It had a complicated build process due to its use of literate programming (even though most modules had no documentation)
and there was a fork of a 2004 version of the Hashtbl module from the standard library inside it.
Fortunately, Christian turned out to be very far from an unreachable maintainer. He gave me write access to the original repository so that I could fix the issues, and answered all questions about the codebase he could answer.
After some long nights spent messing with the code, and with a patch by Gabriel Radanne that adds OPAM packaging, the build process was sane enough to make it a build dependency.
Another bad part is that it implements Lua 2.5, which was a rather limited language. Many improvements, including for loops, were only made later.
But that’s a start.
For a simple interpreter example you can easily play with, check out luaclient.ml. My goal for this post is to walk through a more realistic example from the soupault codebase that can be found in plugin_api.ml.
Plugin example
That was a rather long introduction. Let’s see what using Lua-ML actually looks like.
We’ll start with a plugin example. This is a very simple plugin replicating the site_url feature of website generators that makes relative links into absolute URLs:
-- Converts relative links to absolute URLs
-- e.g. "/about" -> "https://www.example.com/about"
-- Get the URL from the widget config
site_url = config["site_url"]
if not Regex.match(site_url, "(.*)/$") then
  site_url = site_url .. "/"
end
links = HTML.select(page, "a")
-- That's Lua 2.5, hand-cranked iteration...
index, link = next(links)
while index do
  href = HTML.get_attribute(link, "href")
  if href then
    -- Check if URL schema is present
    if not Regex.match(href, "^([a-zA-Z0-9]+):") then
      -- Remove leading slashes
      href = Regex.replace(href, "^/*", "")
      href = site_url .. href
      HTML.set_attribute(link, "href", href)
    end
  end
  index, link = next(links, index)
end
As you can see, there’s a page variable in the default environment. There are also HTML and Regex modules made accessible to Lua. They are in fact wrappers for the lambdasoup and ocaml-re libraries.
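For readers more at home in OCaml than in Lua 2.5, the rule the plugin applies to each href can be sketched with the standard library alone. This is not soupault code: the function names (has_scheme, strip_leading_slashes, ensure_trailing_slash, absolutize) are made up for illustration, and the hand-written character checks stand in for the two regular expressions used by the plugin.

```ocaml
(* Hypothetical helpers mirroring the plugin logic; not part of soupault. *)

(* True if href starts with one or more alphanumeric characters followed
   by a colon, like the "^([a-zA-Z0-9]+):" pattern in the plugin. *)
let has_scheme href =
  match String.index_opt href ':' with
  | None | Some 0 -> false
  | Some i ->
    let is_alnum c =
      (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
    in
    let rec all_alnum j = j >= i || (is_alnum href.[j] && all_alnum (j + 1)) in
    all_alnum 0

(* Drops every '/' at the start of the string, like replacing "^/*" with "". *)
let strip_leading_slashes s =
  let n = String.length s in
  let rec first_non_slash i =
    if i < n && s.[i] = '/' then first_non_slash (i + 1) else i
  in
  let i = first_non_slash 0 in
  String.sub s i (n - i)

(* Appends a '/' unless the URL already ends with one. *)
let ensure_trailing_slash url =
  let n = String.length url in
  if n > 0 && url.[n - 1] = '/' then url else url ^ "/"

(* Links that already carry a scheme are left alone; the rest are
   prefixed with the site URL. *)
let absolutize ~site_url href =
  if has_scheme href then href
  else ensure_trailing_slash site_url ^ strip_leading_slashes href
```

The difference in the plugin is that the same steps run on live HTML nodes handed over by the host program, which is what the rest of this post sets up.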
Assembling the interpreter
Preparing modules
“Abstract” types are known as userdata in Lua. To expose our type to Lua, we need to make a module matching this signature:
module type USERDATA = sig
  type 'a t (* type parameter will be Lua value *)
  val tname : string (* name of this type, for projection errors *)
  val eq : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool
  val to_string : ('a -> string) -> 'a t -> string
end
So, we need a type, a string name for it, and functions for equality and string conversion.
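As a warm-up, here is a minimal sketch of a module that satisfies this signature. The signature is copied locally so that the snippet compiles without Lua-ML, and the Point module with its make function is a made-up example, not something from soupault. The type parameter is simply ignored, since a point contains no Lua values.

```ocaml
(* Local copy of the signature quoted above, so the sketch is self-contained. *)
module type USERDATA = sig
  type 'a t
  val tname : string
  val eq : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool
  val to_string : ('a -> string) -> 'a t -> string
end

(* Hypothetical userdata type: a 2D point with integer coordinates. *)
module Point : sig
  include USERDATA
  val make : int -> int -> 'a t
end = struct
  (* The parameter 'a is unused: a point holds no embedded Lua values. *)
  type 'a t = { x : int; y : int }

  let tname = "point"

  (* The first argument compares Lua values; there are none inside a point. *)
  let eq _ a b = a.x = b.x && a.y = b.y

  (* Likewise, the Lua value printer is not needed here. *)
  let to_string _ p = Printf.sprintf "(%d, %d)" p.x p.y

  let make x y = { x; y }
end
```

A container type that stores Lua values would actually use the function arguments of eq and to_string; for opaque host types like HTML nodes they can be ignored.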
The lambdasoup library uses phantom types to distinguish between element nodes and non-elements (roots, text, and whitespace) for better type safety: internally all nodes have the same structure, but their types are artificially made different so that you can’t do things that make no sense, like inserting a child into a text node. We’ll artificially force that type to be monomorphic with a simple sum type wrapper and some conversion/coercion functions:
module Html = struct
  (* Plugin_error is assumed to be declared earlier,
     e.g. exception Plugin_error of string *)
  type soup_wrapper =
    | GeneralNode of Soup.general Soup.node
    | ElementNode of Soup.element Soup.node
    | SoupNode of Soup.soup Soup.node
  type 'a t = soup_wrapper
  let tname = "html"
  let from_soup s = SoupNode s
  let from_element e = ElementNode e
  let to_element n =
    match n with
    | ElementNode n -> n
    | _ -> raise (Plugin_error "Expected an element, but found a document")
  (* Must come before eq and to_string, which use it *)
  let to_general n =
    match n with
    | GeneralNode n -> n
    | ElementNode n -> Soup.coerce n
    | SoupNode n -> Soup.coerce n
  let eq _ x y = Soup.equal_modulo_whitespace (to_general x) (to_general y)
  let to_string _ s = Soup.to_string (to_general s)
  let select soup selector =
    to_general soup |> Soup.select selector |> Soup.to_list |> List.map (fun x -> ElementNode x)
  let get_attribute node attr_name =
    to_element node |> Soup.attribute attr_name
  let set_attribute node attr_name attr_value =
    to_element node |> Soup.set_attribute attr_name attr_value
end
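If phantom types are new to you, the following toy model may help. It does not use lambdasoup: the node record, the element and text marker types, and all function names are invented for this sketch. It shows the same two ideas as the wrapper above: a phantom parameter that rules out nonsensical operations at compile time, and a sum type that erases it so a dynamically typed language can hold any node, with the check moved to run time.

```ocaml
(* Marker types: they have no values and exist only at the type level. *)
type element
type text

(* One runtime representation for all nodes; 'kind is a phantom parameter. *)
type 'kind node = { label : string; mutable children : string list }

let element label : element node = { label; children = [] }
let text label : text node = { label; children = [] }

(* Only element nodes accept children. Passing a text node as the parent
   is rejected by the type checker. *)
let append_child (parent : element node) (child : _ node) =
  parent.children <- parent.children @ [child.label]

(* The monomorphic wrapper: one type that covers every kind of node. *)
type wrapper =
  | Element of element node
  | Text of text node

exception Plugin_error of string

(* The static guarantee becomes a run-time check on the wrapper. *)
let to_element = function
  | Element e -> e
  | Text _ -> raise (Plugin_error "Expected an element, but found text")

(* Operations valid for every kind of node just unwrap each case. *)
let label = function
  | Element n -> n.label
  | Text n -> n.label
```

With these definitions, append_child (text "a") (text "b") does not compile, while to_element (Text (text "a")) compiles and raises Plugin_error when executed. That is the trade-off a plugin API for a dynamically typed language has to accept.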
Now we need to make modules that provide embedding and projection for our types (that is, conversion to and from Lua values). For that we need to feed our module to a Lua.Lib.Combine functor.
It provides several functors for different numbers of modules. We’ll use Lua.Lib.Combine.T2 to handle the built-in Luaiolib.T module (which provides I/O functions) and our module at once:
module T =
  Lua.Lib.Combine.T2 (Luaiolib.T) (Html)
module LuaioT = T.TV1
module HtmlT = T.TV2
Now the LuaioT and HtmlT modules are ready to use. Use for what exactly? For assembling a complete Lua library.
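To get an intuition for what such a combining functor does, here is a deliberately simplified model. It is not Lua-ML's implementation: SIMPLE_USERDATA drops the type parameter of the real signature, and Combine2, Counter, and Name are hypothetical names. The idea it illustrates is that several userdata types are merged into one sum type, and each component gets its own "view" module with functions for getting in and out of the combined type.

```ocaml
(* Simplified userdata signature; unlike the real one, it has no type parameter. *)
module type SIMPLE_USERDATA = sig
  type t
  val tname : string
  val to_string : t -> string
end

(* Hypothetical functor: merges two userdata types into one sum type
   and exposes one view per component. *)
module Combine2 (A : SIMPLE_USERDATA) (B : SIMPLE_USERDATA) = struct
  type combined = First of A.t | Second of B.t

  let to_string = function
    | First a -> A.to_string a
    | Second b -> B.to_string b

  module TV1 = struct
    let embed (a : A.t) : combined = First a
    let project = function
      | First a -> a
      | Second _ -> failwith ("expected " ^ A.tname ^ ", found " ^ B.tname)
  end

  module TV2 = struct
    let embed (b : B.t) : combined = Second b
    let project = function
      | Second b -> b
      | First _ -> failwith ("expected " ^ B.tname ^ ", found " ^ A.tname)
  end
end

(* Two made-up userdata types to combine. *)
module Counter = struct
  type t = int
  let tname = "counter"
  let to_string = string_of_int
end

module Name = struct
  type t = string
  let tname = "name"
  let to_string s = s
end

module T = Combine2 (Counter) (Name)
```

Here T.TV1.project (T.TV1.embed 3) returns 3, while projecting a value through the wrong view fails at run time with a message built from the tname fields, which is the role tname plays in the real signature as well.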
The regex module works with strings, which are supported by Lua-ML without resorting to custom types, so it’s just a simple wrapper for ocaml-re and we do not need to do anything special with it.
module Re_wrapper = struct
  let replace ?(all=false) s pat sub =
    try
      let re = Re.Perl.compile_pat pat in
      Re.replace ~all ~f:(fun _ -> sub) re s
    with Re.Perl.Parse_error | Re.Perl.Not_supported ->
      raise (Plugin_error (Printf.sprintf "Malformed regex \"%s\"" pat))
  (* ... *)
Assembling the library
This is the complicated part. The first stage is to create a functor that will convert our Html module to a Lua library and register the Lua-visible HTML and Regex modules in the interpreter state.
The functor will take a Lua.Lib.TYPEVIEW module set up with the type 'a Html.t and use the embedding and projection functions it provides.
Simply creating such a module will not yet expose it to Lua. For that we need to pass a list of (function name, function) tuples to register_module.
Lua-friendly functions are created from OCaml functions using combinators from the C module created by the Luavalue.Make functor.
module MakeLib
    (HtmlV: Lua.Lib.TYPEVIEW with type 'a t = 'a Html.t) :
  Lua.Lib.USERCODE with type 'a userdata' = 'a HtmlV.combined =
struct
  type 'a userdata' = 'a HtmlV.combined
  module M (C: Lua.Lib.CORE with type 'a V.userdata' = 'a userdata') = struct
    module V = C.V
    let ( **-> ) = V.( **-> )
    let ( **->> ) x y = x **-> V.result y
    module Map = struct
      let html = HtmlV.makemap V.userdata V.projection
    end (* Map *)
    let init g =
      C.register_module "HTML" [
        "select", V.efunc (Map.html **-> V.string **->> (V.list Map.html)) Html.select;
        "get_attribute", V.efunc (Map.html **-> V.string **->> V.option V.string) Html.get_attribute;
        "set_attribute", V.efunc (Map.html **-> V.string **-> V.string **->> V.unit) Html.set_attribute;
        (* ... *)
      ] g;
      C.register_module "Regex" [
        "replace", V.efunc (V.string **-> V.string **-> V.string **->> V.string)
          (Re_wrapper.replace ~all:false);
        (* ... *)
      ] g
  end (* M *)
end (* MakeLib *)
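The expressions built with **-> and **->> can look like magic, so here is a small self-contained model of how such combinators can work. It is a sketch under my own assumptions, not Lua-ML's actual code: the value type, the map and func records, and the Projection exception are all invented for the example, and only the operator names and efunc mirror the ones used above. Each map is an embedding/projection pair for one OCaml type, and the arrow combinators peel arguments off a list of dynamically typed values one at a time.

```ocaml
(* A toy dynamically typed value, standing in for the interpreter's value type. *)
type value =
  | Number of float
  | String of string
  | Function of (value list -> value list)

exception Projection of string

(* An embedding/projection pair for the OCaml type 'a. *)
type 'a map = { embed : 'a -> value; project : value -> 'a }

let string = {
  embed = (fun s -> String s);
  project = (function String s -> s | _ -> raise (Projection "string"));
}

let number = {
  embed = (fun x -> Number x);
  project = (function Number x -> x | _ -> raise (Projection "number"));
}

(* A description of an OCaml function type 'f: it knows how to apply a
   function of that type to a list of dynamically typed arguments. *)
type 'f func = { apply : 'f -> value list -> value list }

(* The final result: no arguments may remain, and the value is embedded. *)
let result (r : 'a map) : 'a func =
  { apply = (fun x args ->
      match args with
      | [] -> [r.embed x]
      | _ -> raise (Projection "too many arguments")) }

(* arg **-> rest: project the first argument, then handle the remaining ones. *)
let ( **-> ) (a : 'a map) (rest : 'b func) : ('a -> 'b) func =
  { apply = (fun f args ->
      match args with
      | v :: vs -> rest.apply (f (a.project v)) vs
      | [] -> raise (Projection "too few arguments")) }

let ( **->> ) a r = a **-> result r

(* Turn a typed OCaml function into a dynamically typed function value. *)
let efunc (ty : 'f func) (f : 'f) : value = Function (ty.apply f)

(* Usage: wrap string concatenation and call it the way an interpreter would. *)
let concat = efunc (string **-> string **->> string) (fun a b -> a ^ b)

let call f args =
  match f with
  | Function g -> g args
  | _ -> raise (Projection "function")
```

In this model, call concat [String "foo"; String "bar"] evaluates to [String "foobar"], and passing a Number where a string is expected raises Projection. The type description and the OCaml function are checked against each other at compile time, so a mismatch such as describing Html.select with three arguments is caught by the OCaml compiler, not by a plugin user.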
Now we need to link those modules together:
module W = Lua.Lib.WithType (T)
module C =
  Lua.Lib.Combine.C5
    (Luaiolib.Make(LuaioT))
    (Luacamllib.Make(LuaioT))
    (W (Luastrlib.M))
    (W (Luamathlib.M))
    (MakeLib (HtmlT))
And finally create an interpreter module:
module I =
  Lua.MakeInterp
    (Lua.Parser.MakeStandard)
    (Lua.MakeEval (T) (C))
Passing values to the interpreter
That’s all good, but to make it possible for plugins to modify internal values of our program, we need to pass them to the interpreter.
This is where the HtmlT module we created is needed. It provides a makemap function that creates a record whose fields are functions, among them the embed and project functions we need:
let lua_of_soup s =
  let v = HtmlT.makemap I.Value.userdata I.Value.projection in
  v.embed s
let soup_of_lua l =
  let v = HtmlT.makemap I.Value.userdata I.Value.projection in
  v.project l
Running the interpreter
Finally, we can set up an environment and run Lua code in it:
let state = I.mk () in
let soup = Soup.parse "<p>hello world</p>" in
let () = I.register_globals ["page", lua_of_soup (Html.SoupNode soup)] state in
(* The returned list of Lua values is ignored here *)
let _ = I.dostring state "print(page)" in
()
The I.dostring and I.dofile functions currently return a list of Lua values. It’s not very easy to work with, and worse, execution errors are only logged to stderr and the caller has no easy way to see if plugin execution succeeded or failed. That’s definitely one of the things to fix.