Clojure is Capable

"Clojure is Capable". Cool tagline. But, capable of what?

Capable of many things!

So, given all of that, let's dive into the meat of the article. What are we going to accomplish in Clojure today?

How about writing a custom FUSE (Filesystem in Userspace) file system on GNU/Linux, thanks to a combo of libfuse (https://github.com/libfuse/libfuse) and jnr-fuse (https://github.com/SerCeMan/jnr-fuse), a JNR (Java Native Runtime) binding to it that we can reach through Java interop?

Inspired by the very rough idea here (https://github.com/kevinquinnyo/rest-fs), I felt that we could easily use Clojure to put the pieces together and make a REST based FUSE system to serve dog pictures from the Dog CEO API! (https://dog.ceo/dog-api/documentation/).

So, what does this mean from a user perspective? They'd be able to navigate through a local directory and view dog pictures, as if they had all the pictures saved locally (while, in actuality, all the pictures remain on the remote server and are pulled up via the API endpoints in real time).

There is huge potential in this idea: think of maintaining database records as if they were files in your favorite editor (Emacs of course, right?), or of pushing updates to a true CRUD document store/API.

To see the full source at any time: https://github.com/ahungry/ruse

Before we can run, we must walk

Ok, so, the very first thing we need to do is get a project going. If you've done this before, feel free to skip ahead.

lein new app ruse

This will create a new bare bones project, ready to develop in.

Open up project.clj and add the following:

(defproject ruse "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.10.0"]
                 [clj-http "3.9.1"]
                 [cheshire "5.8.1"]
                 [com.github.serceman/jnr-fuse "0.5.2.1"]]
  :repositories {"bintray" "https://jcenter.bintray.com"}
  :main ^:skip-aot ruse.core
  :target-path "target/%s"
  :profiles {:uberjar {:aot :all}})

We are pinning Clojure to 1.10 because it is an amazing release (it includes spec, introduced in 1.9, plus the improved error messages added in 1.10).

We pull in clj-http for the API integration, cheshire so that clj-http can parse JSON responses, and jnr-fuse for the integration with libfuse (make sure libfuse itself is installed on your host OS).

Since jnr-fuse isn't in the default Maven repositories, we have to add its repository under the :repositories key.

Ok, now setup is done, let's build an API integration

Some of this is rough and not purely functional. Please do not use this repo's code samples as the ideal way to write idiomatic Clojure.

Anyways, let's write a couple of calls to the API in a new file in your project called `src/ruse/dog.clj`.

First step - set up the namespace:

(ns ruse.dog
  (:require
   [clojure.repl :refer :all]
   [clj-http.client :as client]
   [ruse.util :as u]
   ))

We will visit ruse.util in a bit. Just know that it's one of our own files, meant to make some things in the code easier.

Now, let's map an API call.

(defn api-get-dog-breeds []
  (-> (client/get "https://dog.ceo/api/breeds/list/all"
                  {:as :json})
      :body :message))

(def mapi-get-dog-breeds (memoize api-get-dog-breeds))

Very simply, we are pulling a remote list with a structure similar to:

{"status":"success","message":{"affenpinscher":[],"african":[]}}

We add the memoize line, so that (given this API in particular) we do not make calls over and over when we've already pulled the result set once.
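
To make the effect of memoize visible, here is a minimal sketch that swaps the real HTTP call for a hypothetical stand-in (`slow-breed-lookup`, `call-count` and `fast-breed-lookup` are made-up names, not part of the repo) and counts how often the underlying function actually runs:

```clojure
;; Hypothetical stand-in for a slow API call; it counts how often it really runs.
(def call-count (atom 0))

(defn slow-breed-lookup [breed]
  (swap! call-count inc)
  (str "pictures-of-" breed))

;; Same pattern as mapi-get-dog-breeds: wrap the function once, call the wrapper.
(def fast-breed-lookup (memoize slow-breed-lookup))

(fast-breed-lookup "whippet") ; runs slow-breed-lookup
(fast-breed-lookup "whippet") ; answered from memoize's internal cache
(fast-breed-lookup "pug")     ; new argument, so the real function runs again

@call-count ; => 2
```

Keep in mind that memoize never expires its entries, which is fine for something like a breed list that rarely changes.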

The thread-first arrow ("->") lets us chain the result through the :body and :message lookups (did you know keywords in Clojure are functions?)

I'll talk more about the threading arrows in a moment, so just take my word for it for now.
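
If you'd rather not take my word for it, here is a small sketch you can paste into a REPL. The `sample-response` map is a hypothetical stand-in shaped like what clj-http hands back with `{:as :json}`, so no network call is needed:

```clojure
;; Hypothetical response map, shaped like a clj-http result with parsed JSON.
(def sample-response
  {:status 200
   :body {:status "success"
          :message {:affenpinscher [] :african []}}})

;; A keyword in function position looks itself up in the map.
(:status sample-response) ; => 200

;; Thread-first feeds each result into the next form as its first argument...
(def threaded (-> sample-response :body :message))

;; ...which is the same as nesting the lookups by hand.
(def nested (:message (:body sample-response)))

(= threaded nested) ; => true
```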

Now, we want to get the list of breeds as a string vector of a format similar to:

["affenpinscher", "african"]

Pretty easy with a call like this:

(defn get-dog-breeds []
  (->> (mapi-get-dog-breeds)
       keys
       (map #(subs (str %) 1))
       (into [])))

If you're new to Clojure, that fancy arrow is the "thread last" arrow. It basically weaves our initial value (the result of mapi-get-dog-breeds) through the last argument slot of each following form. Functions that take a single argument do not even need parentheses wrapped around them.

The hash is reader shorthand for an anonymous function. Written out without the shorthand, it roughly translates to this lambda:

(map (fn [s] (subs (str s) 1)))
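
To see that these forms really are interchangeable, here is a sketch that runs the same pipeline three ways over a hypothetical `sample-breeds` map (standing in for the :message map from the API). The third variant uses clojure.core's `name`, which also drops the leading colon from a keyword; it is shown as an alternative, not as what the repo does:

```clojure
;; Stand-in for the :message map the API returns (keys already keywordized).
(def sample-breeds {:affenpinscher [] :african []})

;; Thread-last with the #() shorthand, as in get-dog-breeds.
(def with-shorthand
  (->> sample-breeds
       keys
       (map #(subs (str %) 1))
       (into [])))

;; The same pipeline written as plain nested calls with an explicit fn.
(def without-arrow
  (into [] (map (fn [s] (subs (str s) 1)) (keys sample-breeds))))

;; clojure.core/name turns a keyword into its string name directly.
(def with-name (mapv name (keys sample-breeds)))

with-shorthand ; => ["affenpinscher" "african"]
```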

Anyways, we end up mapping a few more endpoints (one for getting a list of dogs under a breed category), and then we get to a more interesting one.

(defn api-get-dog-pic [breed s]
  (-> (client/get
       (str "https://images.dog.ceo/breeds/" breed "/" s)
       {:as :byte-array}
       )
      :body))

(def mapi-get-dog-pic (memoize api-get-dog-pic))

In this case, instead of asking clj-http to parse JSON content, we ask for a raw byte array. If you haven't guessed yet, that's because this call is responsible for pulling down the binary bytes that make up an image for our custom file system.

Yada, yada, go on and map some more endpoints (just clone the repo and look at dog.clj if you're interested).

After a bit, you'll come across another slightly interesting thing:

(def http-cache (atom {}))

(defn set-http-cache! [breed]
  (swap! http-cache conj {(keyword breed) (get-pics-clean breed)}))

(defn get-dog-list! [breed]
  (let [kw (keyword breed)]
    (if (kw @http-cache)
      (kw @http-cache)
      (kw (set-http-cache! breed)))))

In this case, I'm leveraging an atom to provide my own "cache" layer. memoize seems like it should have worked here, but (for a reason I haven't gotten around to debugging yet) it didn't. Running without my own atom layer caused a lot of duplicate hits to the API when certain shells (zsh + oh-my-zsh) browsed the directory, due to the huge number of getattr calls that kind of setup makes.
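
Here is a stripped-down sketch of the same atom-as-cache idea with the remote call replaced by a hypothetical `fetch-pics` that counts its invocations. The names (`fetch-pics`, `pic-cache`, `cached-pics!`) are made up for illustration, and it uses `assoc` with string keys rather than the `conj` of a keyword-keyed map shown above:

```clojure
;; Hypothetical stand-in for the remote call; counts real fetches.
(def fetch-count (atom 0))

(defn fetch-pics [breed]
  (swap! fetch-count inc)
  [(str breed "-1.jpg") (str breed "-2.jpg")])

(def pic-cache (atom {}))

(defn cached-pics!
  "Return the cached list for BREED, fetching and storing it on a miss."
  [breed]
  (if-let [hit (get @pic-cache breed)]
    hit
    (let [pics (fetch-pics breed)]
      (swap! pic-cache assoc breed pics)
      pics)))

(cached-pics! "whippet") ; miss: calls fetch-pics
(cached-pics! "whippet") ; hit: served from the atom

@fetch-count ; => 1
```

Two threads that miss at the same moment could still both fetch, which is harmless for read-only data like this.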

Phew, get to the FUSE stuff already!

Sure, I will. In src/ruse/core.clj you can find the custom implementation.

Set up the namespace; it's a bit larger than last time:

(ns ruse.core
  (:require
   [clj-http.client :as client]
   [ruse.util :as u]
   [ruse.dog :as dog]
   )
  (:import
   (jnr.ffi Platform Pointer)
   (jnr.ffi.types off_t size_t)
   (ru.serce.jnrfuse ErrorCodes FuseFillDir FuseStubFS)
   (ru.serce.jnrfuse.struct FileStat FuseFileInfo)
   (ru.serce.jnrfuse.examples HelloFuse)
   (java.io File)
   (java.nio.file Paths)
   (java.nio ByteBuffer)
   (java.util Objects))
  (:gen-class))

Oh yeah, and if you're so inclined, open up src/ruse/util.clj to see what that's all about. Most of it is just standard utility calls (very specific string parsing etc.), but one thing that's pretty interesting is this macro:

(defmacro lexical-ctx-map
  "Pull in all the lexical bindings into a map for passing somewhere else."
  []
  (let [symbols (keys &env)]
    (zipmap (map (fn [sym] `(quote ~(keyword sym)))
                 symbols)
            symbols)))

This essentially lets us "bubble up" all lexical variables into a map that we can pass to something else.

See this sample:

(let [x 1]
  (let [y 2]
    (lexical-ctx-map)))
;; Evals to: {:x 1 :y 2}

So, if you had a look at the JNR library, or have used any Java interop in the past, you should be aware that to extend a class, you just make a proxy call and give it your method overrides. The JNR FuseStubFS class closely resembles the functions you would define for raw libfuse integration in a C file:

(defn fuse-custom-mount []
  (proxy [FuseStubFS] []
    (getattr
      [path stat]                       ; string , jni
      (cond
        (u/member path stub-dirs) (getattr-directory (u/lexical-ctx-map))
        (dog/dog-exists? path) (getattr-file (u/lexical-ctx-map))
        :else (enoent-error)))
    (readdir
      [path buf filt offset fi]
      ;; Here we choose what to list.
      (prn "In readdir")
      (if (not (u/member path stub-dirs))
        (enoent-error)
        (readdir-list-files (u/lexical-ctx-map))))
    (open
      [path fi]
      ;; Here we handle errors on opening
      (prn "In open: " path fi)
      (if (and (u/member path stub-dirs) (not (dog/dog-exists? path)))
        (enoent-error)
        0))
    (read
      [path buf size offset fi]
      ;; Here we read the contents
      (prn "In read" path)
      (if
          (not (dog/dog-exists? path))
          (enoent-error)
          (read-fuse-file (u/lexical-ctx-map))))))

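Here you can see what's happening - we're mapping out / overriding 4 methods:

getattr: attribute info,

readdir: listing what is in a directory,

open: error handling when a file is opened,

read: reading a file's contents.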

Also take note of the code organization - unlike a Java class where all those calls have to be defined inline, resulting in a massive class definition, in Clojure we can call plain old top level functions from within these method overrides.
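
The same organization can be tried without FUSE at all. This sketch proxies a plain JDK class (java.util.AbstractList, chosen only because it ships with the JVM) and keeps the logic in top-level functions; `breed-at`, `breed-count` and `breed-list` are hypothetical names:

```clojure
;; Plain top-level functions hold the logic...
(defn breed-at [breeds i]
  (nth breeds i))

(defn breed-count [breeds]
  (count breeds))

;; ...and the proxy only forwards to them. AbstractList needs get and size.
(defn breed-list [breeds]
  (proxy [java.util.AbstractList] []
    (get [i] (breed-at breeds i))
    (size [] (breed-count breeds))))

(def breeds (breed-list ["whippet" "pug"]))

(.size breeds)  ; => 2
(.get breeds 1) ; => "pug"
```

Because the helpers are ordinary functions, each one can be called and checked at the REPL on its own, without going through the proxy.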

See the macro calls to lexical-ctx-map? That's so our tightly coupled functions that handle the fuse functionality don't need to redundantly bind / pass around map data or scalar arguments from the methods to the functions etc. If we had kept the code inline, it would have had the same variables available, so this is pretty safe with the tight coupling.
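
As a sketch of how that plays with destructuring, the macro is repeated here so the snippet stands alone, and the two functions are hypothetical: `fake-read` is shaped like a method override, and `describe-read` picks only the keys it cares about out of the map:

```clojure
;; Same macro as in ruse.util, repeated so the sketch is self-contained.
(defmacro lexical-ctx-map []
  (let [symbols (keys &env)]
    (zipmap (map (fn [sym] `(quote ~(keyword sym))) symbols)
            symbols)))

;; Hypothetical helper: destructures only the keys it needs.
(defn describe-read [{:keys [path size]}]
  (str "read " size " bytes from " path))

;; Hypothetical caller shaped like a FUSE method override.
(defn fake-read [path buf size offset]
  (describe-read (lexical-ctx-map)))

(fake-read "/whippet/n1.jpg" nil 4096 0)
;; => "read 4096 bytes from /whippet/n1.jpg"
```

The helper ignores `buf` and `offset` even though they are in the map, so adding a parameter to the caller does not force a change in every helper.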

Speaking of those defns, let's check them out one at a time, starting with our getattr implementations (one for directories, one for files):

(defn getattr-directory [{:keys [path stat]}]
  (doto stat
    (-> .-st_mode (.set (bit-or FileStat/S_IFDIR (read-string "0755"))))
    (-> .-st_nlink (.set 2))))

(defn getattr-file [{:keys [path stat]}]
  (doto stat
    (-> .-st_mode (.set (bit-or FileStat/S_IFREG (read-string "0444"))))
    (-> .-st_nlink (.set 1))
    ;; Fake size reporting - 10MB is plenty.
    (-> .-st_size (.set (* 1024 1024 10)))))

So, if you've ever seen GNU/Linux (or I guess any UNIX/POSIX) file system permissions before, you'll likely recognize the octal permission masks.
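
If the masks are new to you, this sketch shows the arithmetic. The two file-type constants are the usual Linux values and are only assumed to match what FileStat/S_IFDIR and FileStat/S_IFREG resolve to; `s-ifdir`, `s-ifreg`, `dir-mode` and `file-mode` are names made up for the example:

```clojure
;; File-type bits as defined on Linux; assumed here to match
;; FileStat/S_IFDIR and FileStat/S_IFREG.
(def s-ifdir 040000)
(def s-ifreg 0100000)

;; A leading zero makes a Clojure integer literal octal, which is why
;; (read-string "0755") and the literal 0755 are the same number.
(= 0755 (read-string "0755") 493) ; => true

(def dir-mode  (bit-or s-ifdir 0755))
(def file-mode (bit-or s-ifreg 0444))

(Long/toOctalString dir-mode)  ; => "40755"
(Long/toOctalString file-mode) ; => "100444"

;; Masking off the type bits recovers the permission part.
(= 0755 (bit-and dir-mode 0777)) ; => true
```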

The only thing of real note here is that we're giving a fake size value to our JNR struct. That's fine, as long as the number is higher than the real byte count of any file we serve. I chose something reasonable like 10MB instead of 10TB in case some tool out there always tries to read the full size reported by getattr instead of stopping at the EOF signal - locking it up like that could be bad.

If you were making a really fancy REST API integration, maybe you would query all records on each `ls` invocation to serve real file sizes. We're not that fancy.

Let's move on to how to provide the list of directories/files:

(defn readdir-list-files-base
  "FILES is a string col."
  [{:keys [path buf filt offset fi]} dirs files]
  (doto filt
    (.apply buf "." nil 0)
    (.apply buf ".." nil 0))
  (doseq [dir dirs]
    (.apply filt buf dir nil 0))
  (doseq [file files]
    (.apply filt buf file nil 0))
  filt)

(defn readdir-list-files [{:keys [path buf filt offset fi] :as m}]
  (cond
    (= "/" path) (readdir-list-files-base m (dog/get-breeds) [])
    ;; Pop off leading slash and show the list of pictures for that breed.
    :else (readdir-list-files-base m [] (dog/get-dog-list! (subs path 1)))
    ))

So, we have a "base" function, which takes the full map of method params, plus string vectors of dirs and files. For any call that isn't against the root directory (the `/` here is your FUSE mount root, not your OS-level root), we strip the leading slash from the path, so `/whippet` becomes `whippet`, and then query our dog API for the list of all whippet pictures.

Had we queried against the root directory, we would instead give a list of all the dog breeds as our directories.
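
The path handling is easy to try in isolation. This sketch replaces the dog API with a hypothetical in-memory map (`sample-pics`) and returns plain data instead of filling a FUSE buffer; `listing-for` is a made-up name:

```clojure
;; Hypothetical data standing in for the dog API.
(def sample-pics
  {"whippet" ["n1.jpg" "n2.jpg"]
   "pug"     ["p1.jpg"]})

(defn listing-for
  "Return the directory and file names to show for PATH."
  [path]
  (if (= "/" path)
    ;; Mount root: every breed is a directory.
    {:dirs (vec (sort (keys sample-pics))) :files []}
    ;; Anything else: drop the leading slash and list that breed's pictures.
    {:dirs [] :files (get sample-pics (subs path 1) [])}))

(listing-for "/")        ; => {:dirs ["pug" "whippet"], :files []}
(listing-for "/whippet") ; => {:dirs [], :files ["n1.jpg" "n2.jpg"]}
```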

So, how do we give these wonderful images to the user?

(defn read-fuse-file [{:keys [path buf size offset fi]}]
  (let [
        bytes (dog/get-dog-pic path)
        length (count bytes)
        bytes-to-read (min (- length offset) size)
        contents (ByteBuffer/wrap bytes)
        bytes-read (byte-array bytes-to-read)
        ]
    (doto contents
      (.position offset)
      (.get bytes-read 0 bytes-to-read))
    (-> buf (.put 0 bytes-read 0 bytes-to-read))
    (.position contents 0)
    bytes-to-read))

In this case, we simply take the byte array our API call returns and hand it out to the user in chunks through a ByteBuffer wrapper. Most software that reads files does so in chunks, giving the OS the offset + size of the data it wants to read, so just plopping it all out at once will not work well in 99% of your use cases here.
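
The offset + size arithmetic can be exercised without a mounted file system. This is a sketch under assumptions: `read-chunk` is a hypothetical helper that returns a fresh array instead of writing into a JNR pointer, and it adds one guard the version above does not have (a read that starts past the end yields zero bytes rather than a negative length):

```clojure
(import '(java.nio ByteBuffer))

(defn read-chunk
  "Copy at most SIZE bytes of DATA starting at OFFSET into a fresh array."
  [^bytes data offset size]
  (let [remaining (- (alength data) offset)
        n (int (max 0 (min remaining size)))
        out (byte-array n)]
    (when (pos? n)
      (doto (ByteBuffer/wrap data)
        (.position (int offset))
        (.get out 0 n)))
    out))

;; A 10-byte "file" read in 4-byte chunks, the way a typical reader would.
(def data (byte-array (range 10)))

(mapv #(count (read-chunk data % 4)) [0 4 8 12]) ; => [4 4 2 0]
(vec (read-chunk data 8 4))                      ; => [8 9]
```

The last chunk comes back short and the read past the end comes back empty, which is how the caller learns it has hit EOF.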

Oh, and in that main proxy definition, we also relied on a "stub-dirs" variable. That was me being very lazy and just pre-querying to override some directories I had hard coded during prototyping:

(defn set-stub-dirs []
  (->> (conj (map #(str "/" %) (dog/get-breeds)) "/")
       (into [])))

(def stub-dirs (set-stub-dirs))

Wow, cool implementation, so how do we mount it?

Woah, don't get too far ahead - we should ensure we unmount when needed first.

Let's define an atom to hold our mounted file system (at the moment this program mounts one custom file system at a time, so it's fine):

(def stub-atom (atom nil))

(defn mount-it! [dir]
  (let [stub (fuse-custom-mount)]
    (future
      (reset! stub-atom stub)
      ;; params: path blocking debug options
      (-> stub (.mount (u/string-to-path dir) true true (into-array String []))))
    ))

(defn umount-it! []
  (-> @stub-atom .umount))

(defn cleanup-hooks [mnt]
  (.addShutdownHook
   (Runtime/getRuntime)
   (Thread. (fn []
              (println "Unmounting " mnt)
              (umount-it!)))))

Oh, oops! How did the mount code sneak in there?

Anyways, we are using a Java facility (a JVM shutdown hook) to register an unmount call on cleanup. Unfortunately, this does not seem to fire on every CTRL+C used to kill the program. Fortunately, it does seem to end up unmounted most of the time; if not, you get a bad mount point stuck in a broken I/O state until you unmount it by hand (for example with `fusermount -u`) or reboot the OS.

Then, we rely on the JNR "mount" method that we did not override in our proxy class to do the work for us.

Lastly, the "-main" you've all been waiting for:

(defn -main
  "I don't do a whole lot ... yet."
  [& args]
  (let [dir (first args)]
    (cleanup-hooks dir)
    (println "Mounting: " dir)
    (deref (mount-it! dir))
    (println "Try going to the directory and running ls.")))

The templated Lein message about not doing a whole lot (yet) seems appropriate. Maybe this repo will do much more in a bit (perhaps a config driven/declarative approach to mapping database tables and records to OS files? Or a declarative way of listing APIs, but that seems trickier).

Woah woah woah, not QUITE done yet

Up top (and in a couple of my other articles) I only briefly touched on the REPL experience.

I made a quick comment about it being like playing a trial and error based video game with emulator save states.

What I mean by that is that even in an app such as this, involving low level C libraries and Java interop, Clojure's REPL is a beautiful tool for working on each "thing" one small piece at a time (not even getting into the extra functionality something like CIDER mode in Emacs provides for this).

You can easily confirm that each function works as you iterate through your program. That cuts the overhead of testing a new thing compared to the "best case" in other languages, which usually involves unit testing (and, if you're lucky, a fast way to unit test just one thing vs all things). If you're unlucky, you just code for a long period of time and hope you don't hit a segfault.

To touch on the video game analogy - imagine playing one of the old Mega Man games, Dark Souls, or Battletoads. Now, imagine that instead of getting feedback (a death) that results in a hard restart at the beginning, you instead just get taken back one action / event that caused an error (a single failed function call in REPL). That's like loading a save state, and if you like easy mode, you'll love the Clojure REPL.

Thoughts and opinions?

So, what did you think? What did I leave out? Leave your comments below (or on HN or Reddit or Clojurians, or wherever else this may be cross-posted)!