June 2022 Monthly Update

By Alyssa Parado

Tablecloth

During the first period of this project (April - June), I conducted some initial research to plan and scope the project. I also opened an initial PR into tablecloth that establishes a new `tablecloth.column.api` namespace and some core functions setting up a `column` “primitive.” For a fuller account of my thought process and the work I’ve been doing, I’ve written a blog post, “Columns for Tablecloth”.

I’ve included a copy of that text below:

Columns for Tablecloth

For a few years now, I have been working with a unique, global group of people associated with the SciCloj community, many of whom are interested in promoting the use of Clojure for data-centric computing. As part of this effort, I recently applied for and then received funding from Clojurists Together – an organization in the Clojure community that provides funding for open source work on Clojure tools – to contribute to an important new data-processing library called tablecloth.

The context for the project

Before delving into the nature of the project itself, I want to quickly explain my understanding of where this library tablecloth fits into the emerging Clojure data stack, so it’s clear why I think this project is worthwhile. Generally speaking, much of the work that I’ve been doing within SciCloj has been focused on the question of usability. The problem of usability has emerged only because in recent years a number of very talented individuals have created a set of powerful new tools that provide the bedrock for highly performant data-processing in Clojure.

One of the tools that has become particularly prominent is the so-called “tech” stack developed by Chris Nuernberger. This stack consists of a low-level library called dtype-next, which provides a method for handling typed arrays or buffers (see a workshop I gave on this library here), and tech.ml.dataset, which provides a column-based tabular dataset much like the “dataframes” one finds in R or Python’s Pandas library.
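
For a flavor of dtype-next, here is a small illustrative sketch (my example, not from the workshop) of creating a typed buffer:

(require '[tech.v3.datatype :as dtype])

;; a typed float32 buffer; elements are stored compactly, not boxed
(def buf (dtype/make-container :float32 [1 2 3]))
(dtype/elemwise-datatype buf)
;; => :float32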

Using just tech.ml.dataset, one can already perform the kinds of data analyses one needs over large amounts of data. Indeed, in many cases, thanks to Chris Nuernberger’s amazing work, this stack can outperform equivalent tools in Python, R, and even Julia.[1] However, although very usable in its own right, tech.ml.dataset is somewhat low-level and its API is not always consistent. For this reason, many people start out with the library to which I will be contributing: tablecloth.

Tablecloth is essentially a wrapper on top of tech.ml.dataset. Authored by Tomasz Sulej (@generateme_blog), who has created many other useful libraries and has a knack for creating beautiful tools, it provides a consistent API for interacting with datasets that is inspired by the user-friendly syntax of R’s tidyverse libraries among others.

My project: columns for tablecloth

What I will work on during this project, then, is adding a new dimension to tablecloth’s API. Currently, the focus of tablecloth’s API is the dataset. This, of course, makes perfect sense: in many cases when working with data, especially when manipulating data in preparation for feature engineering, you are working primarily at the level of the whole dataset.

Sometimes, however, you may want to perform operations not across a whole dataset (or set of columns), but on a single column. And that’s where my project comes in. It will add a new API for the column to tablecloth. In other words, once we’ve added this API we should, at the very least, be able to do something like:

;; assuming something like: (require '[tablecloth.column.api :refer [column]])
(def a (column [20 30 40 50]))
(def b (column (range 4)))
(- a b)
;; => [20 29 38 47]

The goal is to make the column, like the dataset, a first-class primitive within the tech stack. As with R’s vectors and Numpy’s arrays, the hope is that when people are working with tablecloth, they can reach for the column when they need to do some focused processing on a 1d (or perhaps 2d) set of items.

A big open question: n-dimensionality

When I originally conceived of this project, I thought what we might be doing is bringing full-fledged support for n-dimensional arrays into tablecloth. Indeed, I originally conceived of the project as an adjunct library called tablecloth.array. My thought was that this distinct library might eschew reliance on tech.ml.dataset – which has some startup costs – and simply rely on the lower-level library dtype-next, which has the key tools for efficient array processing and is in fact the basis for tech.ml.dataset’s columns.

However, for a number of reasons this is not practical. First, there is already a solid tool for array processing in the Clojure world: the neanderthal library by Dragan Djuric (@draganrocks). Second, dtype-next is so low-level that some of the things one might need, such as automatic scanning of the items in an array to determine their datatype, are not present by default. Right now, anyway, those features only exist within tech.ml.dataset’s Column type. As such, what we decided to do is build directly on tech.ml.dataset’s Column, since we get so much for free.

One consequence of this approach is that we cannot easily add n-dimensional support. The column in tech.ml.dataset, as the name might suggest, is not designed for multi-dimensionality. It is built on a single dtype-next buffer. There are possible approaches for layering on support for n-dimensionality; for example, as Chris Nuernberger put it:

[A]s far as if a column could be N-D, you can take a linear random access array and make it ND via an addressing operator…[2]
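
To make the idea concrete, here is a toy sketch of such an addressing operator in plain Clojure (my illustration, not project code):

;; view a flat, linear buffer as a 2d matrix via row-major index arithmetic
(defn view-2d [buf ncols]
  (fn [row col]
    (nth buf (+ (* row ncols) col))))

(def m (view-2d [1 2 3 4 5 6] 3)) ;; a 2x3 "matrix" over a 1d buffer
(m 1 2)
;; => 6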

This solution sounds like a promising approach, but it involves enough complexity that I think it will remain out of scope for this project. Another idea that was batted around briefly was whether we could support just two dimensions by allowing tablecloth’s column API to pass around a dataset internally as a kind of representation of a two-dimensional matrix.

However, I think lurking behind these technical implementation questions is a more general question of what people need. The impetus for this project was usability. We already have powerful tools such as neanderthal and core.matrix that handle matrix operations. We wanted to build support for those kinds of operations within the syntactical vernacular of tablecloth, which so many people have found pleasant to use. It’s about making it so that users don’t need to change tools, and with them their whole mental model for working with the tool they are using.

Yet I think it is fair to say we still don’t know what people really need in this space. In that respect, I think of this project as an experiment. It is better that we do not go whole hog and try to get n-dimensions right away. What we’ll do first is try to build a sensible column API for tablecloth and then see how (and if) it is used. Perhaps there will be a need for more dimensions; perhaps not.

What I’ve done so far

Now for what has been done so far. Much of my work to date has been research. I have reviewed some key tutorials for Python’s Numpy (here) and R’s vectors (here) in an attempt to study the other dominant APIs. I’ve also had a planning session with the author of tablecloth, Tomasz Sulej, to think about the dimensionality issue detailed above as well as the kinds of functionality we want to target during this project.

Here, roughly, are the main areas of work that we now think will emerge, more or less in order:

  1. Establish the core “primitive” of the column, i.e. the entity that users will be able to create and manipulate;
  2. Build out any necessary API for indexing, slicing, and iterating;
  3. Lift various relevant operations and linear algebra functions from dtype-next into tablecloth, in particular dtype-next’s “functional” namespace (i.e. tech.v3.datatype.functional; see the short sketch after this list);
  4. Finally, consider importing ideas from R’s “factors” and lapply iterators.
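
As promised in item 3 above, here is a small sketch of the kind of element-wise and reducing operations dtype-next’s functional namespace already provides (my illustration; the eventual tablecloth names may differ):

(require '[tech.v3.datatype.functional :as dfn])

(dfn/+ [1 2 3] [10 20 30]) ;; element-wise addition => [11 22 33]
(dfn/mean [1 2 3])         ;; => 2.0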

So far, the concrete coding work has focused on the first item above: establishing the column primitive. This PR does two things:

  1. It establishes a new API domain: tablecloth.column.api. Rather than mixing column support into tablecloth’s main API (tablecloth.api), Tomasz suggested we keep it distinct. This should help clarify when we are dealing with a column and when we are dealing with a dataset. It also means that we don’t end up with naming collisions (for example, there’s already a tablecloth.api/column function).
  2. It adds a basic set of core functions that establish the column primitive. These are:
    • column - creates a column
    • column? - identifies a column
    • typeof/typeof? - identify the datatypes of the column’s elements
    • ones / zeros - create columns filled with ones or zeros
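
To illustrate, here is the kind of REPL session I have in mind (the alias and the return values here are illustrative guesses, not documented behavior):

(require '[tablecloth.column.api :as tcc])

(def c (tcc/column [1 2 3]))
(tcc/column? c) ;; => true
(tcc/typeof c)  ;; => :int64 (datatype inferred by scanning the elements)
(tcc/zeros 3)   ;; => a column of three zeros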

There’s probably not a lot to say about these functions that isn’t rather obvious. One thing to note is that the inspiration here is being taken from both the R and Python worlds: R uses the name typeof for its typed “atomic” vectors, and Python’s Numpy uses the functions ones and zeros. This practice of drawing on (hopefully) the best of the existing libraries in other languages is something that I intend to continue, and to pursue even more deeply, and which I believe is key to Tomasz’s strategy in building tablecloth to begin with.

Next steps

Having established the tablecloth.column.api namespace, the next major step, I think, will be to build out from this simple core of functions, making it possible to do more with columns. This means considering either basic operations or indexing, slicing, and iterating over columns. I have a sense that indexing, slicing, and iterating will take precedence.

I also want to try to quickly conduct a rough survey of the operations that we might include, a kind of working spec or planning document. This way I can look at the full set of operations in one place, and also understand what we may take from the various existing APIs from which we want to learn. In other words, what concepts/functions might we want to draw on from Numpy, from R, from Julia, and so on?

Deep Diamond

My goal for this round (Q1 2022) is to implement Recurrent Neural Network (RNN) support in Deep Diamond. The first month was dedicated to literature review (and other hammock-located work), exploration of the OneDNN implementation of RNN layers, and implementation of an RNN prototype in Clojure with Deep Diamond. In the second month, I made the first iteration of a fully functional vanilla RNN layer based on the DNNL backend that fits into the existing high-level network-building code.

Building on this, in the third month I finally managed to crack that tough nut called “recurrent networks implementation libraries” and make:

  1. A nice RNN layer implementation that seamlessly fits into the existing low-level and high-level Deep Diamond infrastructure.
  2. The DNNL implementation of vanilla RNN, LSTM, and GRU layers for the CPU (fully functional, but not ideal).
  3. The low-level integration of Nvidia’s CUDNN GPU library, including TESTS.
  4. The high-level integration of CUDNN into Deep Diamond’s layers (tests show that I need to improve this).
  5. A functional test that demonstrates learning with the said layers using the Adam and SGD learning algorithms.
  6. A new release of Deep Diamond (0.23.0) with these improvements (RNN support should be considered a preview, as I still need to fix and polish some parts of the CUDNN GPU implementation).

Some notes about this milestone:

All in all, I am 80% satisfied with what I achieved in these 3 months. I had hoped for a more complete implementation. On the other hand, I solved all the tricky problems, and I have a clear idea of how to fix and improve what is still not up to Uncomplicate standards, so I’ll have something to work on in the following weeks/months. And it’s finally time to update the book with these new goodies!

Thank you Clojurists for your continued trust in my work!

Overtone Playground

Since the last project update I have been exploring Overtone’s possibilities and finding ways to implement the Sonic Pi tutorial through my project. I have pushed the updates to the Overtone-Playground GitHub page, where I have partially or fully covered the functionalities from the beginning chapters, such as: playing notes, groups of notes, and samples; playing melodies; looping what is played (similar to the powerful live_loop function in Sonic Pi, but still a work in progress); using a function scheduler to play multiple sounds at the same time (which works well, but needs to be upgraded to allow the user more control); and controlling loops with a stop-loop function. The code is in the repository, and the guides for using those functionalities are in the making and will be published soon. However, I do need an extension, because I’ve underestimated the complexity of the problem and overestimated the speed at which I can discover Overtone issues. You can follow me on Twitter @savorocks for future updates on the project, and there is also a category Clojurists Together 2022 Q1 on my website where you can see all the articles that are, and will be, published regarding this funding.

Datahike Server

Second Update

The second iteration saw a lot of research and discussion on the JSON interface and the new target platform. Since part of our team was on vacation, we focused on fewer things this month.

Error Handling & Testing

Better error handling and testing were added with the finalization of the historical database headers (https://github.com/replikativ/datahike-server/pull/40). We separated the dev-mode and production configurations to better test both scenarios without re-using an existing database all the time.

New platform

While searching for a new target platform, we considered different languages. Checking several surveys of popular languages (https://insights.stackoverflow.com/survey/2021, https://spectrum.ieee.org/top-programming-languages, https://redmonk.com/sogrady/2022/03/28/language-rankings-1-22, https://pypl.github.io/PYPL.html), we discussed which platform would be a good fit outside of the JVM and JavaScript. The language should not be so esoteric that nobody would use it, nor so specialised that a database connection would not make sense. We wanted a popular language with proper REST support and commonly used, maintained libraries, and one that is also distinct from Lisp. In the end the decision was between Python, Ruby, PHP, and Go. Python seemed too focused on data science and high-performance databases, Ruby would only work well with Ruby on Rails, and PHP is, well, PHP (some of us had to use it on the job a couple of years ago and were not fans). So the new target platform will be Go, with a simple REST client to the server.

JSON support

We are implementing support for JSON requests and responses, using the Muuntaja middleware stack for basic encoding and decoding, with route-specific parsing in handlers. Where feasible, we strive to include transparent conversion between strings and Clojure keywords and symbols. We investigated supporting fully transparent JSON-Datahike interoperability without requiring any use of Clojure or Datalog syntax. However, that seems infeasible for query and pull, whose arguments can be too complex to parse reliably from JSON. Still, for the next most complex Datahike API call, transact, it seems feasible at least in principle for common, simpler use cases. Thus, we are taking a two-tier approach: one tier with out-of-the-box JSON support for simple API calls, sufficient to cover most casual usage; and another, allowing or requiring JSON strings containing Clojure and Datalog syntax, for more advanced usage, including calls to pull and query. Please note that some uncertainty remains: for instance, we currently need to resolve a problem in implementing support in transact involving default decoding into java.lang.Integer where a Long is needed instead.
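
Purely as an illustration (the payload shapes below are guesses, not our finalized API), the two tiers might look roughly like this, using cheshire to render the JSON bodies:

(require '[cheshire.core :as json])

;; "simple" tier: plain JSON, no Clojure syntax required
(json/generate-string {:tx-data [{:name "Alice" :age 30}]})
;; => "{\"tx-data\":[{\"name\":\"Alice\",\"age\":30}]}"

;; "advanced" tier: Clojure/Datalog syntax carried inside a JSON string
(json/generate-string {:query "[:find ?n :where [?e :name ?n]]"})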

Outlook

The last month will see the implementation of the new JSON interface and subsequently the implementation of the new platform client. With the server API finished up, we will focus on adjusting the client API as well as the documentation while polishing the last open PRs from this iteration.

Beyond Clojurists Together Tasks

We fixed the read handler and the config tests. Additionally, we adjusted our benchmarks and backwards-compatibility tests extensively. Finally, we started on better migration tooling that will support older versions and better intermediary export files.

Last Update

In the last iteration we focused heavily on the JSON interface and the Go client, as well as documentation and overall cleanup.

Cleanup and PRs

We finished up the open PRs, like the historical-db fix and additional DB data, so we have better database information for the clients to consume.

Go Client

Since we had only worked with Clojure for some time, it took a little effort to get back into the Go language. The datahike-go-client library was set up with very simple functionality, without any caching for now, and a basic REST client setup. We started with basic GET handlers, like fetching databases or fetching the current schema, since the basic JSON responses were easily supported. Most time was spent on writing out the proper data types and coercing them from the JSON definitions. We had totally forgotten how long it takes to write this boilerplate in typed languages, and without a REPL the whole development cycle was even slower, since everything needed to be covered with tests. Finally, with the implementation of the JSON interface, we could also implement the POST interfaces that include transactions and queries. Now we have our first foreign-platform client and are eager to test it.

JSON support

We added support for JSON requests and responses across the Datahike Server API endpoints. This is done via the two-tier approach mentioned in the last report: one tier with out-of-the-box JSON support for simple API calls, sufficient to cover most casual usage; and another, requiring JSON strings containing Clojure and Datalog syntax, for more advanced usage, including calls to pull and query. At the moment, most endpoints belong to only one tier or the other (or neither, if no arguments are required), with the only exception being load-entities, which requires Clojure syntax in JSON requests to databases with :schema-flexibility :read, and vanilla JSON otherwise.

Subject to syntax requirements (documentation in progress), both tiers encode and decode keywords (including namespaced keywords), symbols, and lists. Sets are also supported in server responses, but not in requests, since, to the best of our knowledge, they are rarely or never used there.

In terms of database-specific expressions, our JSON support includes cardinality-many attributes, lookup refs, and attribute references.

The “advanced” tier is written to support arbitrarily complex expressions, with no loss of expressivity relative to EDN. As for the “simple”, or rather out-of-the-box, tier, its functionality should be largely identical to that available using EDN data, with the limitations mostly relevant to the transact endpoint, which does not support tuple-valued attributes, transacting datoms directly, or the operations :db.fn/call, :db/cas (or :db.fn/cas), :db.install/_attribute, :db.install/attribute, and :db.entity/preds, which are rarely used anyway, as far as we know.

Limitations and future work

Types and tagged literals are not yet fully supported, though that is a high-priority to-do.

We plan to extend transact, datoms, seek-datoms, entity, schema, reverse-schema, and index-range, in the out-of-the-box tier, to allow “advanced” Clojure-inclusive syntax as well.

In addition, we would like to revisit the feasibility of minimizing handler-level parsing by maximizing usage of Muuntaja and Jsonista functionality in the middleware chain, for improved performance and cleaner code.

Database and server metrics

We are finalizing on-demand (as opposed to pre-computed) datom and index metrics for Datahike, as well as query and transaction time logging on the Server. Backend and library versions are in the works, pending updates to upstream libraries, primarily Konserve.

Outlook

With a JSON interface now integrated into the server, we are able to experiment with other platforms where people can harness the power of Datalog. To that end we have started writing a blog post on how to do so in plain Go, so other contributors can implement a simple REST wrapper in their favorite language. Since development in this context focused heavily on moving the server forward, we now want to step back a little and work on more basic issues in Datahike: mainly performance improvements with the index, convenience functions for import/export through the wanderung library, documentation with clerk instead of plain markdown files, benchmarking improvements so we can see regressions, and more compatibility checks. Additionally, we’re trying to introduce polylith to our stack, since there are too many namespaces that share the same code and a lot of dependencies that need to be updated collectively.

We’re very grateful to Clojurists Together for allowing us to work on the things that we care for. Thanks guys! :)

Conjure

First Update

Conjure has been moving forward on many fronts over the last few months. Some of that work is related to non-Clojure language support (such as Julia and Common Lisp), some is to do with the underlying Fennel system (Aniseed), but the majority of the commits were improving the Clojure client in various ways.

It’s also worth noting that the release I put out today for this Clojurists Together checkpoint was the 100th release! That’s 100 tags, each containing many commits, since 2018 or so, when I started work on my perfect Clojure interactive evaluation environment. I’m so happy with the state of Conjure today; I’m proud of what I’ve managed to build, and I’m so impressed with the community interaction and contributions. Seeing the (almost) 1000 stars on GitHub and hundreds of members in our Discord is wonderful.

You can read all the juicy details of every recent release on the GitHub releases page but I’ll highlight my favourite parts here.

There’s also been a lot of work done on Aniseed, the underlying Fennel Lisp system Conjure is built with, as well as on the various clients alongside the core Clojure one. I’ve managed to close off a bunch of bugs and clean up tickets that had been lingering for far too long.

The next batch of work under Clojurists Together will be the long-awaited interactive debugger support. Hopefully everything goes well and it’s possible; it’s what I’ve wanted to do as part of this funding all along. My second and final update here in a few months should hopefully involve interactive stepping debugging of Clojure from Neovim!

Thank you so very much to each and every one of you who uses, supports, funds or contributes to Conjure and its associated projects. I cannot express my gratitude enough here, so I will instead carry on trying to build the best conversational software development platform out there for you.

Second Update

Since my last update on Conjure I’ve managed to get a few more key fixes and features in, as well as learned enough to realise I need to take on yet another open source project (you’ll be interested in this even if you don’t use Neovim!). I’ve completed the main feature I wanted to finish as part of this funding round (debugging) and started a longer journey to eventually supersede it (even better editor-agnostic debugging).

Here’s a quick overview of the changes since my last project update:

The debugger support is the star of the show; I hope you enjoy this new support and help me improve it into the future with issue reports and pull requests. It’s fairly minimal, but that’s where the new open source project I mentioned comes into play.

Introducing the Clojure CIDER DAP server (an empty repository at the time of writing)! This project fell out of all of my Clojure debugging research; it will bridge the gap between DAP-compatible tools (such as nvim-dap) in any editor and the CIDER debugger tooling.

This new editor-agnostic, standalone CLI tool will allow you to plug your editor’s debugger support into a running nREPL server. You will be able to perform interactive debugging with powerful tools, in or outside of Neovim, over a shared nREPL connection with Conjure.

The goal is to make this the primary way of debugging your Clojure applications, with Conjure’s built-in support being a simpler fallback for when you don’t have a choice or don’t need a rich GUI.

So, I hope you’ve enjoyed the features, fixes, improvements and optimisations I’ve brought to Conjure over my Clojurists Together funding period. And I really hope you enjoy my upcoming further work in the debugger tooling space, regardless of your editor or REPL tooling choices.

I’d love to hear your thoughts, opinions, feelings and feedback on Twitter @OliverCaldwell. Bye for now!

Bozhidar Batsov

Michiel Borkent

Babashka CLI

Turn Clojure functions into CLIs!

This is one of my newest projects. It aims to close the gap between good command-line UX and calling Clojure functions. It is very much inspired by the clojure CLI, but solves a problem which sometimes causes frustration, especially among Windows users: having to use quotes in a shell. It also offers support for subcommands. One project benefiting from that is neil. I blogged about babashka CLI here.
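
A minimal sketch of the idea, parsing command-line-style arguments into a Clojure map:

(require '[babashka.cli :as cli])

;; parse options, coercing :port to a long
(cli/parse-opts ["--port" "1339"] {:coerce {:port :long}})
;; => {:port 1339}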

Http-server

Serve static assets.

Another new project is http-server, which can be used on the Clojure JVM and in babashka to serve static assets over HTTP.

Clj-kondo workshop

In June I had the honor and pleasure of giving a workshop about clj-kondo at ClojureD. The material is available here if you’d like to work through it yourself. Feel free to join the clj-kondo channel on Clojurians Slack for questions. Here are some pictures from the event.

Jet

CLI to transform between JSON, EDN and Transit, powered by a minimal query language.

Changelog

The jet binary is now available for Apple Silicon, and specter is now part of jet’s standard library for transforming data. Also, the output is now colorized and pretty-printed using puget.

Edamame

Changelog

Edamame is a configurable EDN/Clojure parser with location metadata. It has been stable for a while and has reached version 1.0.0. The API is now exposed in babashka and nbb as well.
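
A minimal sketch of what edamame offers:

(require '[edamame.core :as e])

(e/parse-string "(+ 1 2 3)")        ;; => (+ 1 2 3)
(meta (e/parse-string "(+ 1 2 3)")) ;; row/col location metadata
;; => {:row 1, :col 1, :end-row 1, :end-col 10}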

Quickdoc

Quickdoc is a tool to generate documentation from the namespace/var analysis done by clj-kondo. It’s fast and spits out an API.md file in the root of your project, so you can immediately view it on GitHub. It has undergone significant improvements in the last two months. I’m using quickdoc myself in several projects.

Nbb

Scripting in Clojure on Node.js using SCI.

Changelog

Added edamame.core and cljs.math, received nREPL improvements, and now has significantly faster startup due to an improvement in SCI.

Clojure-lsp

Clojure/Script Language Server (LSP) implementation.

This project is driven by the static analysis done by clj-kondo and is used by many people to get IDE-like features in editors like Emacs and VS Code.

I added support for Apple Silicon using Cirrus CI.

Babashka

Native, fast starting Clojure interpreter for scripting.

Changelog

Two new versions of babashka were released: 0.8.2 and 0.8.156. The last segment of the version number now indicates the release count, so the latest release is the 156th.

Babashka now also has a new Apple Silicon binary built on Cirrus CI. What is very exciting is that babashka can now execute schema from source. Compatibility with malli is underway.

Clj-kondo

A linter for Clojure code that sparks joy.

Changelog

New linters:

Clj-kondo now also has a new Apple Silicon binary built on Cirrus CI.

SCI

Configurable Clojure interpreter suitable for scripting and Clojure DSLs. It powers babashka, nbb, joyride and many other projects.
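
A minimal sketch of embedding SCI:

(require '[sci.core :as sci])

;; evaluate Clojure code in a sandboxed interpreter
(sci/eval-string "(+ 1 2 3)")
;; => 6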

Changelog

New releases: 0.3.5 - 0.3.32

Highlights:

SCI configs

A collection of ready to be used SCI configurations.

This project contains configurations for reagent, promesa, etc., which are used in nbb, clerk and other projects.

A recent addition was a configuration for cljs.test which is now shared by nbb and joyride.

Process

Changelog

New releases: 0.1.2 - 0.1.4

Highlights:

Support for the exec call in GraalVM native-images - this means you can replace the current process with another one.
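
A hedged sketch of what that enables (the exact argument style may differ between versions):

(require '[babashka.process :as p])

;; replaces the current (native-image) process, like POSIX exec;
;; on success this call never returns
(p/exec ["ls" "-la"])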

Scittle

The Small Clojure Interpreter exposed for usage in browser script tags.

Added support for developing CLJS via nREPL. See the docs.

Etaoin

Pure Clojure Webdriver protocol implementation.

This project is now compatible with babashka! Most of the work on this project was done by Lee Read. If you appreciate his work on this, or other projects like rewrite-clj, consider sponsoring him.

Misc

Brief mentions of miscellaneous other projects I worked on:

Dragan Djuric

During this reporting period, I focused more on behind-the-scenes work, due to having a tough schedule on another Clojurists Together project in May (Q1 Deep Diamond). That left me with less energy for other coding and releases, but on the other hand it enabled me to work on the foundations of the planned work: specifically, Clojure Sound devices and music skills, Neanderthal sparse matrices, Deep Diamond backends and stability work, etc.

This does not mean that I don’t have anything to show, of course! I do, but maybe not as much as I showed in other reports :)

So, let’s start with releases:

I finally started writing tutorials on my blog. This June I published 4 articles:

Fundamentals and hammock time (this is actually something that I spent most of my time on):

Thomas Heller

Time was mostly spent on maintenance work and some bugfixes, as well as helping people out via the typical channels (e.g. Clojurians Slack).

Current shadow-cljs version: 2.19.5 Changelog

Notable Updates

David Nolen

Nikita Prokopov

Hello everybody, this is Niki Tonsky, and we continue working for the greater good of the Clojure community.

HumbleUI:

The main topic for the last two months: text fields! Surprisingly complicated beasts. I have a list of 47 tasks related to them; so far I’ve managed to complete 35, or ~75%.

Also:

JWM:

Skija:

ClojureScript:

DataScript:

Clojure Sublimed:

And some Clojure-related blogging:


  [1] Chris Nuernberger, “High Performance Data with Clojure,” YouTube, 9 June 2021, https://youtu.be/5mUGu4RlwKE.

  [2] Chris Nuernberger, message posted to the #data-science channel, topic: “matrix multiplication in dtype-next,” Clojurians Zulip, 20 September 2021.