String Title Case in Clojure


These days I prefer to write scripts for some tasks in Python instead of shell. One important reason, I think, is that Python is powerful at string manipulation.

Recently I have been learning Clojure, and I'm trying to find Clojure equivalents for the Python string methods I rely on. One of them is s.title(), which returns a title-cased version of a string. For example,

>>> ' Hello world'.title()
' Hello World'

How can we do that in Clojure? To keep the problem simple, let's assume that the input string contains only letters and spaces, that is, [a-zA-Z ] as a regex pattern.

Since I am not good at Clojure yet, a Python version helps me think clearly (unlike s.title(), it leaves the remaining letters of each word untouched):

def title_case(s):
    "Return a title cased string for s"
    result = []
    prev_whitespace_p = True
    for c in s:
        if ' ' == c:
            result.append(c)
            prev_whitespace_p = True
        else:
            result.append(c.upper() if prev_whitespace_p else c)
            prev_whitespace_p = False
    return ''.join(result)

# >>> title_case(' hello world ')
# ' Hello World '

Here are a few ways I came up with.

The Iteration Way

Coming from Python (and C/C++), the natural way to do it is to iterate over the string character by character and uppercase the current character if the one before it is whitespace. The code looks like this:

(defn title-case-iteration
  [input-str]
  (let [prev-whitespace-p (atom true)
        result (atom "")]
    (run! #(reset! result
                   (str @result
                        (if @prev-whitespace-p
                          (do (reset! prev-whitespace-p (= \space %))
                              (if (= \space %) % (Character/toUpperCase %)))
                          (do (reset! prev-whitespace-p (= \space %))
                              %))))
          input-str)
    @result))

(title-case-iteration " hello world ")
                                        ; => " Hello World "

As you can see, the code is somewhat complicated, and it also took me the most time to write. Here is why:

  1. Mutable state is not common in Clojure. We can define a variable with (def foo "hello"), but how do we change its value afterwards? In Python it is easy: just re-assign the variable, foo = "another string".

    I searched the Internet with different keywords and looked in the book "Living Clojure" but failed to find a solution.

    Then @jaju on Clojure Slack kindly pointed me to atom, which is why there are so many uses of atom, @foo (deref), and reset! in the code above.

  2. You may have noticed the unusual run!.

    At first I used map in the same place, but it didn't work. After some searching I found the problem: map returns a lazy sequence, and since nothing consumes the result (I only wanted the side effect of updating the atoms), the #() anonymous function is never called at all!

    The solution is to use run!, which has a signature similar to map but calls the function eagerly for its side effects.
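
Both points can be tried at the REPL. The following is a minimal sketch, not part of the original code; the names greeting and calls are made up for illustration. It shows an atom being read and updated, and the difference between the lazy map and the eager run!:

```clojure
;; An atom holds a value that can be replaced later.
(def greeting (atom "hello"))
(reset! greeting "another string")   ; replace the value outright
(swap! greeting str "!")             ; or derive the new value from the old one
@greeting
;; => "another string!"

;; Count how many times the function passed to map / run! is actually called.
(def calls (atom 0))

;; map is lazy: the sequence is bound but never consumed, so nothing runs.
(let [_unused (map (fn [_] (swap! calls inc)) "abc")]
  @calls)
;; => 0

;; run! walks the collection eagerly, purely for side effects, and returns nil.
(run! (fn [_] (swap! calls inc)) "abc")
@calls
;; => 3
```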

After I posted the above snippet in the channel, @jaju also helped refactor my code. Here is his more concise version:

(defn title-case-iteration-2 [input-str]
  (let [prev-whitespace-p (atom true)
        result (atom "")]
    (run! (fn [c]
            (reset! result 
                    (str @result
                         (if @prev-whitespace-p
                           (Character/toUpperCase c)
                           c)))
            (reset! prev-whitespace-p (= \space c)))
          input-str)
    @result))
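
For comparison, the same character-by-character walk can also be written without atoms by using loop and recur, which carry the state as loop bindings. This is only a sketch, not code from the Slack discussion, and the name title-case-loop is hypothetical:

```clojure
(defn title-case-loop
  "Hypothetical variant: walks s one character at a time, keeping the
  'previous character was a space' flag and the result as loop bindings."
  [s]
  (loop [chars       (seq s)
         prev-space? true
         result      ""]
    (if-let [c (first chars)]
      (recur (next chars)
             (= \space c)
             (str result (if prev-space? (Character/toUpperCase c) c)))
      result)))

(title-case-loop " hello world ")
;; => " Hello World "
```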

The Reduce Way

@jaju also suggested using reduce, with an accumulator value that carries the state along.

At first, I was confused about how to get the result string, because the accumulator has two parts: one is the previous-character state (whether it was whitespace or not), the other is the result string built so far.

But it didn't take me long to figure out that first or second can pull the wanted part out of the state.
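
A minimal sketch of that idea on a smaller problem (the function name sum-and-count is hypothetical): the accumulator is a two-element vector, the reducing function returns a new vector on every step, and first or second extracts the part you need at the end.

```clojure
(defn sum-and-count
  "Returns [sum count] for the numbers in xs, carrying both values
  through reduce as a two-element vector."
  [xs]
  (reduce (fn [[sum n] x]
            [(+ sum x) (inc n)])
          [0 0]
          xs))

(sum-and-count [1 2 3 4])          ; => [10 4]
(first (sum-and-count [1 2 3 4]))  ; => 10
(second (sum-and-count [1 2 3 4])) ; => 4
```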

This version is like:

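(defn title-case-reduce
  [s]
  ;; The parameter is named s, not str, so that clojure.core/str stays callable.
  ;; State is [result-so-far previous-char-was-space?].
  (first (reduce #(if (= \space %2)
                    [(str (first %1) %2) true]
                    [(str (first %1) (if (second %1)
                                       (Character/toUpperCase %2)
                                       %2)) false])
                 ["" true]
                 s)))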

@jaju also reviewed my code and suggested another reduce solution based on cond, which tracks the previous character (or :begin at the start) in the state:

(defn title-case-reduce-2 [s]
  (->> s
       (reduce
        (fn [[prev accum] c]
          (cond
            ;; A space is appended as-is.
            (= \space c)
            [c (str accum c)]

            ;; First character of the string or of a word: uppercase it.
            (or (= :begin prev) (= \space prev))
            [c (str accum (Character/toUpperCase c))]

            :else
            [c (str accum c)]))
        [:begin ""])
       second))

;; You can also use clojure.string/capitalize on (str c) instead of Character/toUpperCase

The Idiomatic Way

While I was struggling to implement the iteration version, I also looked for a more idiomatic way.

Following the code in a gist, it took me the least time to write this idiomatic version:

(require 'clojure.string) ; make sure the namespace is loaded

(defn title-case-idiomatic
  [s]
  (clojure.string/join " "
                       (map clojure.string/capitalize
                            (clojure.string/split s #" +"))))

(title-case-idiomatic "    hello   world   ")
                                        ; => " Hello World"

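Since this version doesn't walk the string character by character, it may not behave as you expect in some corner cases: leading whitespace and runs of spaces are collapsed to a single space and trailing whitespace is dropped, as you may have noticed in the output above. Also, clojure.string/capitalize lowercases the rest of each word.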
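
The lost spaces come from how split treats the edges of the string. A small sketch of the intermediate values (the alias string is just a choice made here):

```clojure
(require '[clojure.string :as string])

;; Leading spaces yield one empty string at the front; trailing ones yield nothing.
(string/split "    hello   world   " #" +")
;; => ["" "hello" "world"]

;; Joining with a single space therefore keeps one leading space and drops the rest.
(string/join " " ["" "hello" "world"])
;; => " hello world"

;; capitalize upper-cases the first character and lower-cases the others.
(string/capitalize "hELLO")
;; => "Hello"
```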

The Regex Way

@jaju also provided a version using regex patterns, which preserves spaces nicely:

(defn title-case-regex [s]
  (->> (clojure.string/split s #"(?<=\s)|(?=\s)")
       (map clojure.string/capitalize)
       (apply str)))

(title-case-regex "  hello    world  ")
; => "  Hello    World  "
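
To see why the spaces survive, it helps to look at the pieces the pattern produces. The sketch below (again with string as an arbitrary alias) splits on the same zero-width lookarounds, so no characters are consumed and every space comes back as its own piece:

```clojure
(require '[clojure.string :as string])

;; (?=\s) matches just before a whitespace character, (?<=\s) just after one.
;; Both are zero-width, so splitting on them removes nothing from the string.
(string/split "a b" #"(?<=\s)|(?=\s)")
;; => ["a" " " "b"]

(string/split "a  b" #"(?<=\s)|(?=\s)")
;; => ["a" " " " " "b"]

;; capitalize leaves a single-space piece unchanged.
(string/capitalize " ")
;; => " "
```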

Summary

Along the journey of implementing this simple function, I learned a few things:

  1. Clojure favors immutability. You have to use atom explicitly to define mutable state, and the companion functions that change it stand out by having a ! at the end of their names.

  2. It's harder to write code using iteration and state than using map and reduce. map and reduce are more idiomatic and work at a higher level of abstraction.

    On the other hand, although iteration may be more efficient, it may not be worth the extra time you invest in it. "Premature optimization is the root of all evil," said Donald Knuth.

    Most of the time, "The Idiomatic Way" should be the choice.

  3. If you want to keep state while iterating over a sequence, first think about using reduce, which can pass the state along from one step to the next.

  4. Pay attention to side effects, especially in combination with lazy sequences. Clojure has special functions and macros for this purpose: in addition to run!, there are also doseq, dorun, and doall.
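
A short sketch of the three alternatives mentioned in the last point; the names seen and upper are made up for this illustration:

```clojure
(def seen (atom []))

;; doseq: loop over a collection for side effects; returns nil.
(doseq [c "ab"]
  (swap! seen conj c))

;; dorun: force a lazy sequence for its side effects and discard the values.
(dorun (map #(swap! seen conj %) "cd"))

;; doall: force a lazy sequence and return the realized sequence.
(def upper (doall (map #(Character/toUpperCase %) "ef")))

@seen  ; => [\a \b \c \d]
upper  ; => (\E \F)
```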

Finally, big thanks to @jaju for the inspiration and help!

