Rewrite of a Flask Web App in Clojure


Published: 2025-02-22, Last Updated: 2025-03-10

Intro

A few years ago, I made a simple web app in Flask to deal with some text processing problems from my daily work. It has two main features:

  • Feature #1: generating compile_commands.json for GNU Makefile projects written in C/C++ from the output of the make command, because, unlike CMake, make can't generate it on its own.
  • Feature #2: extracting text using Python regex. It's handy when I feel like sed/awk/grep's line-oriented processing isn't enough for the task at hand.
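
To give a rough idea of what feature #1 does, here is a minimal Python sketch, not the app's actual code. It assumes the make output contains plain gcc/g++/clang command lines; the compile_commands function, the COMPILER_RE pattern, and the sample paths are all made up for illustration.

```python
import json
import re
import shlex

# Hypothetical pattern: a line counts as a compiler invocation if it starts
# with one of these compiler names, optionally prefixed by a path.
COMPILER_RE = re.compile(r"^(?:\S*/)?(?:gcc|g\+\+|cc|c\+\+|clang|clang\+\+)(?=\s)")
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx")


def compile_commands(make_output, directory):
    """Turn captured `make` output into compile_commands.json entries."""
    entries = []
    for line in make_output.splitlines():
        line = line.strip()
        if not COMPILER_RE.match(line):
            continue  # skip make's own messages and non-compiler commands
        # shlex.split keeps quoted arguments such as -DNAME="a b" in one piece.
        for arg in shlex.split(line):
            if arg.endswith(SOURCE_SUFFIXES):
                entries.append({"directory": directory, "command": line, "file": arg})
    return entries


SAMPLE = """\
make: Entering directory '/home/me/proj'
gcc -Wall -Iinclude -c src/main.c -o build/main.o
g++ -std=c++17 -c src/util.cpp -o build/util.o
gcc build/main.o build/util.o -o app
make: Leaving directory '/home/me/proj'
"""

entries = compile_commands(SAMPLE, "/home/me/proj")
print(json.dumps(entries, indent=2))
```

The link step has no source file among its arguments, so it produces no entry; only the two compile steps end up in the result.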

Recently, I was thinking about adding some new functionality to it. But since I've been putting more effort into Clojure these days, I wanted to do it in Clojure. So, I spent a few nights rewriting the web app in Clojure.

In this post, I am going to share the journey and some thoughts on it.

Photo by Oskar Yildiz on Unsplash
The Flask Stack

First and foremost, let's take a look at the "old" stack. The backend service was written using Flask, and Flask uses the Jinja template library to render HTML to the frontend.

The backend stack was simple; nothing special. But for the frontend, I used a hybrid stack: Backbone.js and jQuery. Backbone.js was used to write logic in the MVC way, and jQuery helped me manipulate the DOM easily.

For CSS, I mainly used Bootstrap to style UI elements.

Rewrite Plan

To get it done quickly, I first needed a plan made of concrete, prioritized tasks. The main idea was to move the whole "ecosystem" to Clojure and ClojureScript and deploy the app to production as soon as possible. After that, I could gradually refactor the remaining pieces and even add new features in small, quick iterations.

So here was the plan:

  1. Think about what Clojure and ClojureScript technologies to use.
  2. Migrate the GET endpoint of feature #1.
  3. Migrate the POST endpoint of feature #1.
  4. Migrate feature #2 and the remaining pieces.
  5. Compare the performance of the two.
  6. Deploy to production.

Clojure(Script) Stacks

The Clojure community has roughly two approaches to web development: some folks prefer frameworks like Luminus or Biff, while others would rather roll up their sleeves and build the whole app from separate libraries, like playing with Lego. I'm more into the latter, which gives me more control and a better understanding of the technologies.

In the end, here were the stacks I came up with.

Backend:

On the backend, since I must use Python's re.findall function to support Python regex, I decided to keep all POST handling logic in Python temporarily. And for the frontend side, I would like to go down a little deeper and use ClojureScript + Reagent to render the UI elements on the client side, but keep the Backbone.js logic as is and spit it out in strings from the GET Ring handlers.
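
As a small illustration of why I wanted to keep re.findall, here is a sketch of the kind of multi-line extraction that is awkward with line-oriented tools. The sample text and the pattern are made up for this example; they are not taken from the app.

```python
import re

# Each record spans two lines: a message line and an indented location line.
text = """\
error: undefined reference
  at main.c:12
warning: unused variable
  at util.c:7
"""

# re.MULTILINE lets ^ match at the start of every line, and the explicit \n
# in the pattern lets a single match span the two lines of a record.
pattern = re.compile(r"^(error|warning): (.+)\n\s+at (\S+):(\d+)", re.MULTILINE)

# With several groups in the pattern, findall returns one tuple per match.
matches = pattern.findall(text)
for level, message, filename, line_number in matches:
    print(level, message, filename, line_number)
```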

Frontend:

Development

The most beautiful thing about Clojure development is interactive development, which means we can leverage a REPL to quickly verify our thoughts, try out language features and libraries, etc. That doesn't just mean defining or calling functions in a language's interpreter shell; it also means we can evaluate any form anywhere in the code.

For example, we can send the form preceding the cursor to the REPL and get the result immediately, wherever the cursor sits in the following snippet, be it position 1, position 2, or position 3:

(*
 (+ 1 2)                                ; position 1
 (- 5 4)                                ; position 2
 )                                      ; position 3

Another big advantage of using Clojure is that we can use Clojure(Script) for both the backend and the frontend; all the code can reside within the same project, and we can even share some logic in common files with the .cljc file extension.

With Clojure and ClojureScript living in the same project, ideally I could bring up REPLs for both in one go with CIDER in Emacs using M-x cider-jack-in-clj&cljs, but it didn't work as expected, and I couldn't figure out why quickly. To stay focused on my main task, migrating to the Clojure ecosystem as soon as possible, I suppressed the impulse to nail it down and just started the two REPLs separately, that is, running M-x cider-jack-in-clj and then M-x cider-jack-in-cljs.

To quickly convert HTML into hiccup forms, I looked for help from conversion tools. There are a bunch of them out there, as listed on Converting html to hiccup · weavejester/hiccup Wiki.

In the end, I used https://html2hiccup.dev/. It's simple and intuitive and gets the job done overall, except for a few corner cases when the output is used as Reagent components (these seem to be differences between Reagent and hiccup):

  1. Incorrect capitalization. Some noticeable examples:

    • :autofocus "" -> :autoFocus true
    • :readonly "" -> :readOnly true
    • :autocomplete "on" -> :autoComplete "on"
  2. The value of :style is a string, which is incorrect for Reagent. E.g. {:style "display:none;"} -> {:style {:display :none}}
  3. Whitespace around DOM elements is eliminated, so it has to be added back by hand:

    [:h1
     "Generate"
     " " ; It should have a space before and after [:i]
     [:i "compile_commands.json"]
     " "
     "from GNU make output online"]

Update:

Besides, there are also some editor-based solutions:

  1. An Emacs package: https://github.com/kwrooijen/hiccup-cli
  2. A VSCode extension: https://calva.io/hiccup. Thanks to Peter Strömberg for telling me over Slack that Calva can handle the above style issue properly with a custom config in settings.json:

    "calva.html2HiccupOptions": {
      "mapify-style?": true,
      "kebab-attrs?": true,
      "add-classes-to-tag-keyword?": true
    },

A libpython-clj Problem

The development process went smoothly until I needed to return a list of dictionaries from Python for Cheshire to consume on the Clojure side.

This was my first time using libpython-clj, and I initially assumed it could hand complex Python data structures back to Clojure as is. After I asked for help in the libpython-clj-dev channel on Zulip, James Tolton kindly pointed out that it's better to encode the result as a JSON string, which is compatible with Clojure strings and performs better.

For the record, here is the issue detailing the problem on GitHub; you can follow up there if you're interested.
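
The suggestion boils down to serializing on the Python side before the result is handed back. A minimal sketch, assuming a made-up generate_entries function and sample data (the real module's functions are not shown in this post):

```python
import json


def generate_entries(lines):
    """Hypothetical Python-side function whose result is handed back to Clojure."""
    entries = [
        {"directory": "/home/me/proj", "command": line, "file": line.split()[-1]}
        for line in lines
    ]
    # Return a single JSON string instead of the list of dicts, following the
    # advice above: a Python str is compatible with a Clojure string.
    return json.dumps(entries)


payload = generate_entries(["gcc -c main.c", "gcc -c util.c"])
print(payload)
```

On the Clojure side, a string like this can be used as a response body directly, or decoded with Cheshire when the data is needed as Clojure maps.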

Performance

With the rewrite finished, it was a good time to run some load tests and get a basic picture of how the Flask app and the Clojure one perform.

I used Apache Bench (ab), a simple benchmarking tool from Apache, to do the job. With ab, I opened 100 concurrent connections and sent 10K requests to the GET and POST handlers, respectively. Both server processes and ab ran on the same machine with 16GB memory and an 8-core 2.80GHz CPU.

GET Handlers

First, let's test a GET handler with the command ab -n 10000 -c 100 $url, running it once against each app's URL.

Below are the distilled outputs of both tests. Clojure's mean time per request is 156 ms vs. Flask's 126 ms (about 641 vs. 796 requests per second), even though Flask's response is much bigger (7256 bytes vs. Clojure's 2638 bytes), which indicates that Flask performs noticeably better here.

######################### Clojure
$ ab -n 10000 -c 100 http://localhost:3000/
...
Document Path:          /
Document Length:        2638 bytes
Concurrency Level:      100
Time taken for tests:   15.610 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      27670000 bytes
HTML transferred:       26380000 bytes
Requests per second:    640.62 [#/sec] (mean)
Time per request:       156.100 [ms] (mean)
Time per request:       1.561 [ms] (mean, across all concurrent requests)
Transfer rate:          1731.04 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2  44.9      0    1005
Processing:     5  153  41.0    153     270
Waiting:        4  153  41.0    153     270
Total:         11  155  61.4    153    1195


######################### Flask
$ ab -n 10000 -c 100 http://localhost:8000/
...
Document Path:          /
Document Length:        7256 bytes
Concurrency Level:      100
Time taken for tests:   12.557 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      74310000 bytes
HTML transferred:       72560000 bytes
Requests per second:    796.39 [#/sec] (mean)
Time per request:       125.567 [ms] (mean)
Time per request:       1.256 [ms] (mean, across all concurrent requests)
Transfer rate:          5779.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      11
Processing:    17  124  11.7    122     204
Waiting:        3  111  11.6    109     191
Total:         17  124  11.5    122     204
...
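
As a sanity check on the GET numbers, ab's mean time per request can be reproduced from the other figures it reports: concurrency level times time taken, divided by completed requests. A small sketch (the function name is mine; the inputs are copied from the outputs above):

```python
def time_per_request_ms(total_seconds, requests, concurrency):
    """Mean time per request in ms: concurrency * time taken / completed requests."""
    return concurrency * total_seconds / requests * 1000


clojure_ms = time_per_request_ms(15.610, 10000, 100)
flask_ms = time_per_request_ms(12.557, 10000, 100)

# How many times longer the Clojure run took for the same number of requests.
slowdown = clojure_ms / flask_ms

print(round(clojure_ms, 2), round(flask_ms, 2), round(slowdown, 2))
```

The results agree with ab's own "Time per request" lines to within rounding, and the ratio puts the Clojure GET handler at roughly 1.24 times Flask's time for this test.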

POST Handlers

Second, for POST handlers, I ran a slightly different command to post JSON data: ab -n 10000 -c 100 -T 'application/json' -p ~/tmp/texttoolkit-ab-post.json $url.

With the distilled outputs below, we can see that both response lengths are almost the same and, as above, Clojure's time per request is much higher than Flask's (161 ms vs. 91 ms).

######################### Clojure
$ ab -n 10000 -c 100 -T 'application/json' -p ~/tmp/texttoolkit-ab-post.json http://localhost:3000/compilation-database-generator
...
Document Path:          /compilation-database-generator
Document Length:        710 bytes
Concurrency Level:      100
Time taken for tests:   16.125 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      8310000 bytes
Total body sent:        6290000
HTML transferred:       7100000 bytes
Requests per second:    620.17 [#/sec] (mean)
Time per request:       161.246 [ms] (mean)
Time per request:       1.612 [ms] (mean, across all concurrent requests)
Transfer rate:          503.28 [Kbytes/sec] received
                        380.94 kb/s sent
                        884.23 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.8      0      13
Processing:     7  160  42.2    160     283
Waiting:        7  160  42.2    160     283
Total:         17  160  42.1    160     283
...


######################### Flask
$ ab -n 10000 -c 100 -T 'application/json' -p ~/tmp/texttoolkit-ab-post.json http://localhost:8000/compilation-database-generator
...
Document Path:          /compilation-database-generator
Document Length:        711 bytes
Concurrency Level:      100
Time taken for tests:   9.144 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      8770000 bytes
Total body sent:        6290000
HTML transferred:       7110000 bytes
Requests per second:    1093.59 [#/sec] (mean)
Time per request:       91.442 [ms] (mean)
Time per request:       0.914 [ms] (mean, across all concurrent requests)
Transfer rate:          936.60 [Kbytes/sec] received
                        671.74 kb/s sent
                        1608.34 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       4
Processing:    32   91   4.2     91     110
Waiting:        1   79   4.1     79      99
Total:         33   91   4.0     91     112
...

Performance Recap

The data above indicates that the Clojure version lags behind the Flask one in performance, which I didn't see coming.

But since the app doesn't get much traffic at the moment, I was OK with just moving on.

Deployment

Time to ship to production, finally!

Leiningen really shines here; all I have to do is run a single command, lein ring uberjar (provided by the lein-ring plugin), and it produces a single jar file for deployment. After uploading it to the production server, I can start the app with java -jar texttoolkit-0.1.0-SNAPSHOT-standalone.jar.

Why not use deps.edn? You may ask.

Well, I've noticed that people are moving towards deps.edn these days, but sometimes I just can't figure out how to get the job at hand done with it in a short time. It's not that it can't accomplish the task; it's just that it may take me much more time to nail it down. In fact, I did start the project with deps.edn but fell back to Leiningen later.

Smoothing Traffic Migration.

If I were deploying an app with huge traffic, I would need to be really careful and work out a solution that serves users without interruption. This app doesn't get much traffic yet; still, I wanted to take precautions so that a user's chance of noticing the downtime was as low as possible.

So I first brought my Clojure app up with java -jar texttoolkit.jar and checked that it worked by running curl against the Ring handlers. Unfortunately, I ran into a few issues.

1. Huge Startup Time

The first problem was that the process took nearly one minute to come up, which I totally didn't anticipate. Only afterwards did I notice that it also took around 25 seconds to start on my dev machine.

For comparison, the Flask app starts up instantly.

It looked like a problem related to machine capacity. Anyway, I decided to postpone troubleshooting until I have enough time; I will come back to it, dig deeper, and find out what the culprit is.

2. Huge Memory Footprint

At first, I didn't pay much attention to the memory footprint during development. In production, I saw that the process consumed ~70MB at startup, much more than Flask's ~23MB, though that wasn't a big deal.

# Pay attention to the RSS column.
# Clojure
$ ps aux
USER         PID %CPU %MEM     VSZ   RSS TTY      STAT START   TIME COMMAND
texttoo+   13537  227  6.9 2328144 71332 ?       Sl   22:06   0:04 java -jar texttoolkit-0.1.1-SNAPSHOT-standalone.jar


# Flask
$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
texttoo+   21896  3.6  2.2 224704 23356 pts/0    S+   13:32   0:00 python3 /path/to/venv/bin/gunicorn app:app --name texttoolbox --workers 3 --user=texttoolbox --group=texttoolbox --bind=localhost:1234 --log-level=error --log-file=-

And my jaw dropped when I checked again 2 weeks later: it was consuming nearly 350 MB of my limited memory. Check this out:

$ ps aux
texttoo+   13537  0.2 34.0 2834144 348000 ?      Sl   Feb17  27:46 java -jar texttoolkit-0.1.1-SNAPSHOT-standalone.jar
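
For reference, ps on Linux reports RSS in KiB, so the figures above convert as shown in this sketch. rss_mib is a made-up helper, and the input lines are copied from the outputs above (the Flask command line is shortened).

```python
def rss_mib(ps_line):
    """Return the RSS column (6th field of `ps aux`, in KiB) converted to MiB."""
    # Split into at most 11 fields so the command, which may contain spaces,
    # stays together in the last field.
    fields = ps_line.split(None, 10)
    return int(fields[5]) / 1024


clojure_start = "texttoo+   13537  227  6.9 2328144 71332 ?       Sl   22:06   0:04 java -jar texttoolkit-0.1.1-SNAPSHOT-standalone.jar"
clojure_later = "texttoo+   13537  0.2 34.0 2834144 348000 ?      Sl   Feb17  27:46 java -jar texttoolkit-0.1.1-SNAPSHOT-standalone.jar"
flask_start = "texttoo+   21896  3.6  2.2 224704 23356 pts/0    S+   13:32   0:00 python3 /path/to/venv/bin/gunicorn app:app"

for label, line in [("clojure (startup)", clojure_start),
                    ("clojure (2 weeks)", clojure_later),
                    ("flask (startup)", flask_start)]:
    print(label, round(rss_mib(line), 1), "MiB")

# Growth of the JVM process between the two measurements.
growth = rss_mib(clojure_later) / rss_mib(clojure_start)
print("growth factor:", round(growth, 1))
```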

It seems like this is a well-known problem for Java; you should pay attention to it if you're running Clojure or Java programs in a memory-constrained environment: Running Clojure programs on constrained memory - /dev/solita. But for now, it didn't really bother me, so I just moved on.

Update:

didibus kindly told me in the Slack thread that the memory consumption is due to the JVM's thread pool, and that I can set a memory cap with -Xmx and customize the pool size. This was really insightful, and I'll definitely look into it.

3. Resource Paths for libpython-clj

Another problem arose in production when Clojure imported Python modules, something that had worked without any problem in the development environment. It was a ModuleNotFoundError exception:

Caused by: java.lang.Exception: ModuleNotFoundError: No module named 'cdg'

It clearly was an error raised while loading a Python module, and the code for that looked something like this:

(ns texttoolkit.handler
  (:require ; ...
            [clojure.java.io :as io]
            [libpython-clj2.python :as py]
            ; ...
            ))

;; start the embedded Python interpreter
(py/initialize!)
;; add the bundled Python sources to sys.path, then print it for debugging
(py/run-simple-string (str "import sys;"
                           "sys.path.append('" (io/resource "private/py/") "');"
                           "print('python sys.path:', sys.path)"))

;; import a python module as a Clojure namespace
(py/import-as "cdg" cdg)

It seemed like the path provided by clojure.java.io/resource wasn't usable, and I quickly confirmed that: (io/resource "private/py/") actually resolved to jar:file:/path/to/texttoolkit/target/texttoolkit-0.1.0-SNAPSHOT-standalone.jar!/private/py/, which is by no means a valid directory path.

I couldn't find anything useful on Google about this; in the end, I worked around it by copying the .py files out to the OS file system and loading the Python modules from there.
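
To picture the workaround, here is a minimal Python sketch of the idea, not the code I actually deployed. It relies on a jar being a zip archive; extract_py_modules, the demo jar, and the stand-in cdg module are all hypothetical.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def extract_py_modules(jar_path, prefix, dest_dir):
    """Copy every .py file stored under `prefix` inside the jar into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(jar_path) as jar:  # a jar is a zip archive
        for name in jar.namelist():
            if name.startswith(prefix) and name.endswith(".py"):
                # Flat copy: enough for single-file modules, not for packages.
                (dest / Path(name).name).write_bytes(jar.read(name))
    return dest


# Build a throwaway archive that stands in for the uberjar.
workdir = Path(tempfile.mkdtemp())
demo_jar = workdir / "demo-standalone.jar"
with zipfile.ZipFile(demo_jar, "w") as jar:
    jar.writestr("private/py/cdg.py", "def hello():\n    return 'loaded from disk'\n")

# Extract to a real directory, which is something sys.path can work with.
module_dir = extract_py_modules(demo_jar, "private/py/", workdir / "py")
sys.path.append(str(module_dir))

cdg = importlib.import_module("cdg")
print(cdg.hello())
```

Once the files live in a real directory, appending that directory to sys.path is enough for the import to succeed.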

Monitor the Process and Then Make It Live.

In order to recover from possible crashes, I used supervisord to monitor the Java process. If something bad happens, it will bring up another process to serve requests.

Finally, I could make it live. I changed the upstream settings in the Nginx conf and reloaded the config. Easy peasy!

Pros and Cons

To recap, here are the pros and cons of Clojure development that I learned along the way.

Pros:

  1. Development experience matters; it is fun to explore and write Clojure code interactively.
  2. The backend and the frontend are both written in Clojure syntax, so they are consistent and easy to write; for the common logic, we can truly write once, run everywhere.
  3. Plus, some logic that used to run in the backend can now run in the frontend with little effort.
  4. With Reagent and React, now I can do frontend dev in a much simpler way.
  5. Easier to deploy with an uberjar.
  6. People in the Clojure community are really nice; don't hesitate to reach out if you have problems or ideas.

Cons:

  1. One of the biggest cons, to my surprise, is that a Clojure process (JVM) eats a lot of memory.
  2. Huge startup time.
  3. The performance is not as good as Flask's. I hope that's because I've made some mistakes here, and I will definitely come back to this in the future.

Outro

That's it. In this blog post, I reflected in detail on the whole rewrite journey, from making a plan to the final production deployment. I would like to keep building this web app on top of the Clojure ecosystem.

In the meantime, it also raised some questions for me to explore and answer, like troubleshooting and exploring the performance of Ring handlers, learning how to improve the startup time of Clojure processes, and mastering more Java basics for Clojure.

If you find something interesting or have related experience you've gone through, or anything, please feel free to comment below. Thank you for reading!

