
Round 2 - Clojure vs Blub Lang - Parallelism NOT Concurrency

Well, the first round was certainly interesting!

Lots of great samples of potential solutions for feature parity with the Clojure one were posted here:

https://old.reddit.com/r/programming/comments/cwgvxl/clojure_vs_blub_lang_parallelism/ and here: https://old.reddit.com/r/Clojure/comments/cwgvvi/clojure_vs_blub_lang_parallelism/

Unfortunately, the majority of them (some knowingly, with a caveat included) provided a concurrent solution, not a parallel one.

What's the difference? Well, a concurrency problem is equivalent to:

You are working on a product feature and hit a blocker: so-and-so must complete an update to their API before you can finish one portion of your feature. Instead of sitting and twiddling your thumbs (non-concurrent), you begin work on a different part of your feature while you wait for theirs to finish. That's concurrency.

Parallelism would be more akin to killing two birds with one stone. You have two tasks to take care of: bringing a peer up to speed on some programming techniques, and delivering your feature. You pull your peer into a pair-programming session and manage to complete both tasks in the time it would have taken to complete one (now, substitute computations for tasks - that's parallelism).
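In code terms, the concurrency half of that distinction can be sketched in a few lines (an illustrative sketch of my own, not from any of the submitted solutions):

```javascript
// Single-threaded concurrency: two 1-second waits overlap on the event
// loop, so both finish in ~1 second total -- but no extra CPU work gets
// done; we are only interleaving idle time. Parallelism would need a
// second thread actually computing at the same time.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

const start = Date.now()
Promise.all([sleep(1000), sleep(1000)]).then(() => {
  console.log(`Both waits done in ~${Date.now() - start}ms`)
})
```

If those two waits were CPU-bound work instead of timers, this same structure would take ~2 seconds, which is exactly the trap explored below.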

While the simulation in the round 1 post (http://ahungry.com/blog/2019-08-28-Round-1-Clojure-vs-Blub-Lang-Parallelism.html) did lend itself more to a waiting/idle problem (waiting on a remote HTTP request is a concurrency issue, not a computational bottleneck, and most implementors modeled it with an idling delay, aka sleep), the Clojure solution covered both aspects equally, while most of the others only solve for concurrency and do nothing to assist with parallelism. Note - no extra thought went into the initial Clojure solution to ensure it handled concurrency and parallelism equally well - it's a language freebie.

Breaking down the original Clojure solution

Let's do a snippet breakdown of the original:

(def my-data-log (atom []))

(defn fetch-data! [n]
  (Thread/sleep 1e3)
  (swap! my-data-log conj (str "Fetched record: " n)) n)

(defn get-data []
  (pmap fetch-data! (range 100)))

(time (doall (get-data)))
;; "Elapsed time: 4001.424277 msecs"

Now, let's slightly adjust it to ensure we do not have an idling block:

(defn burner
  "Burn up the CPU for almost exactly 1 second."
  []
  (let [looped (atom 0)
        start-time (System/nanoTime)]
    (while (< (System/nanoTime) (+ start-time 1e9))
      (swap! looped inc))))

(def my-data-log (atom []))

(defn fetch-data! [n]
  (burner)
  (swap! my-data-log conj (str "Fetched record: " n)) n)

(defn get-data []
  (pmap fetch-data! (range 100)))

(time (doall (get-data)))
;; "Elapsed time: 4431.369554 msecs"

Definitely within the same ballpark - no custom fixes required, and we put our parallelism under some strong stress (an unconstrained while loop - i.e., run as fast as the CPU can go).

Looking into some of the community solutions

Now, what would happen if we perform an equivalent refactoring of some of the solutions presented in the comments?


const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
const range = n => [...Array(n).keys()]
const fetch = n => sleep(1000).then(() => `Fetched record: ${n}`)

The solution is wonderful - very succinct, each part reusable, and it handles the concurrency simulation faster than my trivial Clojure version (around 1 second vs 4 seconds).
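The comment's driver code isn't shown above; a plausible reconstruction (my assumption - `Promise.all` over the mapped range) that reproduces the roughly 1-second result:

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
const range = n => [...Array(n).keys()]
const fetch = n => sleep(1000).then(() => `Fetched record: ${n}`)

// Fire off all 100 promises at once; the 1-second timers overlap on the
// event loop, so the whole batch resolves in roughly 1 second.
const t0 = Date.now()
Promise.all(range(100).map(fetch)).then(results => {
  console.log(results.length, 'records in', Date.now() - t0, 'ms')
})
```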

However, understand that in a single-threaded environment, a concurrent solution like this is effectively iterating over the elements in the array as fast as the CPU can go, checking each for "are you done yet?", and resolving if so. Since that "are you done yet?" is set to answer yes one second later, it stands to reason that 1 second is an appropriate resolution time for this solution (my Clojure one at 4 seconds was actually inferior in this respect - some of the Reddit comments had a more robust solution that avoided the 4-second time by working around the partitioning/chunking strategy pmap employs).

How will this implementation perform if refactored to use a burner function to eat up some heavy CPU?

// lets make a 'sleep' equivalent that resolves in 1 second but does work
const burner = () => {
  return new Promise((resolve, reject) => {
    let i = 0
    let start = Date.now()
    while (Date.now() < (start + 1e3)) {
      i++
    }
    resolve(i)
  })
}

const range = n => [...Array(n).keys()]
const fetch = n => burner().then(() => `Fetched record: ${n}`)

Oh no…it took 100 seconds, almost on the dot! (It scaled completely linearly with the computation time: 100 tasks × 1 second of CPU work each, all on one thread, is 100 seconds of wall clock.)

How succinct would the previous solution look if the new task at hand was to rewrite it to handle parallelism and solve this in the same order of magnitude as the Clojure solution? I think it would involve a lot of forking and child-process spawning to have any hope of success, and I don't think all that plumbing code would allow the JavaScript to remain quite so succinct.

Alternate Clojure

A good solution on the /r/Clojure subreddit was to use Clojure's core.async (similar to Go channels) as a way to solve the problem without running into the built-in pmap chunking issues.

While this definitely requires more setup work than just using the shipped pmap, it ends up with a reusable pmap! call with a very similar interface to the built-in pmap (so the complicated part of parallelism is written once, and reused many times quite easily).

(ns reddit
  (:require [clojure.core.async :as a]))

(def my-data-log (atom []))

(defn fetch-data! [n]
  ;; (Thread/sleep 1e3)
  (swap! my-data-log conj (str "Fetched record: " n))
  n)

(defn pmap!
  ([n f xs]
   (let [output-chan (a/chan)]
     (a/pipeline-blocking n
                          output-chan
                          (map f)
                          (a/to-chan xs))
     (a/<!! (a/into [] output-chan))))
  ([f xs] (pmap! (.availableProcessors (Runtime/getRuntime)) f xs)))

(defn work! [n]
  (pmap! n fetch-data! (range 100)))

But if I were to change the Thread/sleep to the burner call from my top-level sample - what would happen? Would the core.async tooling work with parallelism, or is it just a concurrency solution?

After benchmarking it - it turns out it did work just fine! (The original solution, pre-burner change, resolved in 1 second; post-burner it resolved in 1.5 seconds.)


So - when you re-run your code from the first simulation with a "burner" function that simulates CPU constraints - does your solution to round 1 scale linearly, or stay in the same ballpark as the Clojure solution?