January 2021 Monthly Update
By Alyssa Parado
Here are the latest updates from the projects in January. This was the first month for ClojisR.
Here is an overview of the work I did per project.
- Only build static binary on non-SNAPSHOT release #695
- Migration from borkdude/babashka to babashka/babashka
- Babashka pods: write to output before delivering result commit
- pod-babashka aws: a new pod to interact with AWS
- --download-dir option for the install script #688
- conditionally defined vars should not have metadata #496
- Experiment with using sci and pgx to create a Postgres extension #373. See also the repo.
- Fix shadow warnings #499
- Several performance improvements
- Review PR #1108: add local analysis to clj-kondo
- Move to clj-kondo/clj-kondo organisation
- Fix local analysis name #1112
- Research for clj-kondo resolve from other namespace, work in progress.
- Fix finding without location info #1101
- Alpine Docker build #1111
- Review several PRs
- First release (0.0.1-0.0.3) of the puget CLI, a binary you can pipe EDN into to get pretty-printed, colorized output.
- First release of Carve as a binary.
- spartan.spec: fix compatibility with expound. See example.
- spartan.spec: fix compatibility with cli-matic. See issue.
- interop on map returns nil. #711
- (instance? clojure.lang.Fn x) now works
- Release 0.2.7
- pod-babashka-aws: fix problem with nil in response #27
- babashka-sql-pods: upgrade to newest next.jdbc version #18
- spartan.spec: fix s/and + s/cat #15
- Include clojure.core.match #594
- Include hiccup #646
- Include clojure.test.check #487. Included namespaces:
- Unroll local binding calls #502
- Unroll local binding ref #504
- alength doesn’t work with GraalVM #507
- Release 0.2.1
- Prioritize current namespace in syntax quote #509
- Faster processing of colls #482
- Fix destructuring of destructuring in protocol method of record #512
- Fix shadowing of record field in protocol method #513
- New linter: :unresolved-var. This detects unresolved vars in other namespaces, like set/onion. See docs. #635
- Derive file path from only file linted #1135
- Fix finding without location info #1101
- Derive config dir from only file path linted #1135
- Support name in defmethod fn-tail #1115
- Avoid crash when using
- Release v2021.01.20
In the past few weeks I tried finding the low-hanging fruit to improve performance. I did this with the help of the excellent clj-async-profiler. I also tried a few ideas that ultimately didn’t pan out:
- Evaluating :when conditions earlier in the network. In theory, this could reduce unnecessary work by discarding invalid facts sooner. In practice, it didn’t affect the benchmarks at all, and the code was really convoluted. So much for that.
- Storing matches as records instead of hash maps. In theory, this could boost performance because records have faster field access than hash maps. In practice, the benchmarks actually got a bit slower! I still haven’t figured out exactly why, but it at least indicates that field access isn’t a bottleneck.
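The record-versus-map idea can be pictured with a small sketch. It is a hypothetical illustration, not the author's benchmark: the Match record, its fields, and the sum-values helper are made up for this example, and no timings are claimed. It shows that keyword lookups work unchanged on both representations, which is what makes swapping one for the other a cheap experiment to try.

```clojure
(ns example.record-vs-map
  "Hypothetical sketch: the same match stored as a hash map and as a record.")

;; Field names are made up for illustration.
(defrecord Match [id attr value])

(def match-map    {:id 1 :attr :x :value 42})
(def match-record (->Match 1 :x 42))

(defn sum-values
  "Looks up :value n times via keyword lookup and sums the results.
  Works unchanged on maps and records, since records implement the map interfaces."
  [m n]
  (loop [i 0, acc 0]
    (if (< i n)
      (recur (inc i) (+ acc (long (:value m))))
      acc)))

(comment
  ;; Crude timing only: JIT warm-up and GC noise dominate numbers like these,
  ;; so a benchmarking harness such as criterium is a better fit for real measurements.
  (time (sum-values match-map 10000000))
  (time (sum-values match-record 10000000)))
```

Because the calling code stays the same, the benchmark alone decides whether the record representation pays off; as described above, in this case it did not.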
With the micro-optimizations out of the way, the next few weeks will be all about improving the performance of the RETE algorithm itself. GOTTA GO FAST.
I shed a lot of blood, sweat, and parentheses to implement a common optimization called “node sharing”. Rules that pulled in similar data could now share nodes internally, avoiding duplicated effort. Then, just as I was about to cut a release, I had an epiphany: “derived facts” give us the same benefits as node sharing!
I wrote a section in the README about how to use derived facts in this way. When I applied this technique to the “dungeon” benchmark, it went from ~1700 ms to ~1100 ms – an even better improvement than from node sharing. The dungeon crawler game improved by 5 FPS with the same technique.
I ended up deleting a few hundred lines of fresh code – no need to complicate the codebase if we can just leverage an existing feature instead. I’m now going to focus on debugging and inspecting sessions. Can we do better than the almighty println? Probably not, but no harm in trying…
I’ve done some work in both Calva and Clojure-lsp.
Calva now uses clj-kondo via clojure-lsp and no longer bundles the clj-kondo extension. Compared with the previous setup, which used both clojure-lsp and the clj-kondo extension, this reduces Calva’s indirect memory footprint: the clj-kondo extension runs its own LSP server, which is no longer necessary.
My co-maintainer Peter also did quite a bit of work recently. Below is a list of Calva changes in the last couple of weeks.
[2.0.151] - 2021-01-15
[2.0.150] - 2021-01-13
[2.0.149] - 2021-01-12
- Fix: calva.jackInEnv does not resolve
- Update clojure-lsp to version 2021.01.12-02.18.26. Fix: clojure-lsp processes left running/orphaned if VS Code is closed while the lsp server is starting
[2.0.148] - 2021-01-07
- Update clojure-lsp to version 2021.01.07-20.02.02
[2.0.147] - 2021-01-07
- Fix: Dimming ignored forms does not work correctly with metadata
- Improve clojure-lsp jar integration
- Update clojure-lsp to version 2021.01.07-12.28.44
[2.0.146] - 2021-01-04
- Fix: Slurp forward sometimes joins forms to one
- Fix: clojure-lsp processes left running/orphaned if VS Code is closed while the lsp server is starting
- Fix: go to definition jumps to inc instead of inc'
- Fix: Error when starting a REPL with JDK 15
[2.0.145] - 2021-01-03
- Add command for opening the file for the output/repl window namespace
- Add setting for auto opening the repl window on Jack-in/Connect
- Add setting for auto opening the Jack-in Terminal
- Replace opening Calva says on start w/ info message box
- Add command for opening Calva documentation
- Change default keyboard shortcut for syncing the repl window ns to
[2.0.144] - 2021-01-01
- Reactivate definitions/navigation in core and library files
- Make load-file available in the output window
- Make the ns in the repl prompt a peekable symbol
I added a change to Clojure-lsp to prevent the process from being orphaned. If VS Code was killed in a certain time window while Clojure-lsp was initializing, the process would be left running. Now Clojure-lsp periodically checks whether its parent process is alive and exits if it is not. The Language Server Protocol recommends this behavior for servers.
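A minimal sketch of such a parent-process check, assuming JDK 9+ (for java.lang.ProcessHandle) and that the parent's process ID comes from the processId field the client sends in the LSP initialize request. The namespace, function names, and polling interval are hypothetical; this is not clojure-lsp's actual implementation.

```clojure
(ns example.parent-watchdog
  "Hypothetical sketch: exit when the editor (parent) process is gone."
  (:import (java.lang ProcessHandle)))

(defn process-alive?
  "True if a process with this pid exists and is still running (JDK 9+ ProcessHandle)."
  [pid]
  (let [handle (ProcessHandle/of (long pid))] ; java.util.Optional<ProcessHandle>
    (and (.isPresent handle)
         (.isAlive ^ProcessHandle (.get handle)))))

(defn start-parent-watchdog!
  "Polls every interval-ms on a background thread and calls on-parent-dead
  once the parent pid is no longer alive. Returns the future so it can be cancelled."
  [parent-pid interval-ms on-parent-dead]
  (future
    (loop []
      (if (process-alive? parent-pid)
        (do (Thread/sleep (long interval-ms))
            (recur))
        (on-parent-dead)))))

(comment
  ;; parent-pid would be the processId sent by the client in the initialize request.
  (start-parent-watchdog! parent-pid 5000 #(System/exit 0)))
```

Polling keeps the sketch simple and matches the "periodically checks" approach described above; ProcessHandle also offers onExit(), which returns a CompletableFuture that completes when the process terminates.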
I also fixed an issue with the default classpath lookup for Windows.
For decorating functions instrumented for debugging, I’ve replaced the usage of clj-kondo with clojure-lsp, as well as restructured the code to make the feature work a bit better. This allowed us to remove clj-kondo as an injected jack-in dependency.
I’ve also changed the way clojure-lsp initialization progress is shown. Instead of a popup progress indicator, progress is now shown in the bottom status bar, and disappears when initialization is complete.
I’ve also helped with finding and debugging issues and testing fixes in Clojure-lsp after some significant changes were released. Many thanks to Eric Dallo for being very active, responding to and fixing issues as they arise.
[2.0.156] - 2021-01-28
- Fix: Debug instrumentation decoration not working correctly anymore on Windows
- Fix: Debugger decorations issues
[2.0.155] - 2021-01-27
- Make command palette show alt+enter shortcut variant instead of enter for evaluating top level form
- Update clojure-lsp to 2021.01.28-03.03.16
- Fix: nrepl port detection race condition
[2.0.154] - 2021-01-27
- Fix: Calva uses ; for comments instead of ;;
- Update cider-nrepl to 0.25.8
- Update clojure-lsp to 2021.01.26-22.35.27
[2.0.153] - 2021-01-19
- Use status bar message instead of withProgress message for clojure-lsp initialization
- Update cider-nrepl: 0.25.6 -> 0.25.7
- Fix: “Extract function” refactoring doesn’t work as expected with selections
[2.0.152] - 2021-01-19
- Fix: Jack-In env with non-string variables fails
- Use clojure-lsp for usages for debug instrumentation decorations, and stop injecting clj-kondo at jack-in
- Creating good documentation and beginner-friendly tutorials for ClojisR (a wrapper library exposing APIs for calling R functions on R objects in Clojure), thereby allowing us to expose the Clojure data science ecosystem to a diverse group of users, especially data scientists studying R as their main language.
- Comparing R and Clojure ecosystems to push the Clojure data science stack forward.
- figuring out the differences and helping bring the missing functionalities to Clojure
- creating documentation that explains how R functionality and idioms can be achieved in Clojure
- Bringing the ClojisR library to a more complete and stable state.
- #1 Used a wide range of Clojure data science libraries (mainly Tablecloth, tech.ml.dataset, dtype-next, ClojisR, Vega+Hanami, Fastmath, Notespace) to translate code samples under the Data Wrangling section of R4DS covering the following topics:
- Tibbles (Dataframes)
- Data import
- Tidy data
- Relational data
- Dates and times
- The above translation allowed us to compare the current Clojure data science ecosystem with the R ecosystem and helped us figure out the functionality which was missing and the features which could be made more user-friendly. These points were discussed with the relevant library authors (mainly Tablecloth, ClojisR, Notespace, dtype-next) and various GitHub issues were opened for feature requests as well as bug fixes.
- dtype-next: human-readable datetime
- tablecloth: view each column’s datatype below the column name
- tablecloth: Simplify access to row values in adjacent columns
- tablecloth: Update documentation
- tablecloth: (repeatedly rand) is running indefinitely despite other columns being finite in a dataset
- notespace: Ability to view the entire dataset in a scrollable format
- Translate the Data Visualization, Data Transformation, and Exploratory Data Analysis sections of R4DS to Clojure using libraries such as Tablecloth, Vega+Hanami, Fastmath, and dtype-next.
- Implement the relevant sections of R4DS using ClojisR in order to compare Clojure-R interop vs native Clojure functionality.
- Maybe translate the Predictive Machine Learning Models section of R4DS. (Note: Throughout my work, I’m maintaining a continuous dialogue with some of the relevant library authors. The order in which I approach these tasks may depend upon related developments in other libraries such as tech.ml (for machine learning))
Currently, our main focus is to help Clojure data science libraries bring R-like functionality by filling in the gaps.
- #2 Used a wide range of Clojure data science libraries (mainly Vega+Hanami, Tablecloth, tech.ml.dataset, dtype-next, Fastmath, Notespace) to translate code samples under the Data Exploration section of R4DS covering the following topics:
- Data Visualization
- Data Transformation
- Exploratory Data Analysis
- Raised the following PRs to add functionality/fix issues:
- tech.ml.dataset: show or hide column’s datatype in a dataset
- tech.ml.dataset: functionality to print all the rows of a dataset
- tech.ml.dataset: created convenience functions for varying printing behavior
- tablecloth: updated documentation
- tablecloth: renamed functions used to add/replace columns
- dtype-next: cleanup column printing
- Implement the relevant sections of R4DS using ClojisR.
- Figure out how to bring some of R’s main ideas of working with vectors to Clojure by creating better ergonomics for array programming.