American in Spain

XML Renderer in Clojure

September 8, 2009

I've spent the past few days playing around with Clojure. Clojure is an implementation of Lisp, the most powerful programming language, that compiles to byte code that runs on the Java Virtual Machine. I won't go into just how awesome that is, but there are many technical reasons why this platform decision is equivalent to standing on the shoulders of giants.

Clojure comes with a built-in library for parsing XML files into Clojure data structures, but, for the life of me, I could absolutely not find any implementations that went the other way, to render XML from the Clojure structure that the default parser creates. So I wrote 25 lines of code. Update: I did find a function in the clojure contrib lazy-xml.clj that will emit XML nodes to a stream, but it's (gasp!) not remotely functional.
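For anyone who hasn't looked at it, the structure the built-in parser produces is a nested map with :tag, :attrs, and :content keys. Here is a minimal sketch of poking at one; parse-str is a hypothetical helper (not part of clojure.xml or of the code in this post) that feeds an in-memory string to the parser so the example runs without a file on disk.

```clojure
(require '[clojure.xml :as xml])
(import 'java.io.ByteArrayInputStream)

;; Hypothetical helper: parse XML from a string instead of a file or URI.
(defn parse-str [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def parsed (parse-str "<p id=\"message\">Hello <b>world</b></p>"))

(:tag parsed)      ; the element name as a keyword
(:attrs parsed)    ; a map from attribute keywords to string values
(:content parsed)  ; a vector of child strings and child element maps
```

Text children are plain strings and element children are maps of the same shape, so any renderer only has to handle those two cases.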

    (def *always-open* #{:div :script :textarea})

    (defn render-attributes [attributes]
      (when attributes
        (apply str
          (for [[key value] attributes]
            (str \space (name key) "=\"" value \")))))

    (defn render [node]
      (if (string? node)
        (.trim node)
        (let [tag (:tag node)
              children (:content node)
              has-children? (not-empty children)
              open? (or has-children? (contains? *always-open* tag))
              open-tag (str \< (name tag)
                            (render-attributes (:attrs node))
                            (if open? \> "/>"))
              close-tag (when open? (str "</" (name tag) \>))]
          (str
            open-tag
            (apply str (when has-children?
                         (for [child children]
                           (render child))))
            close-tag))))

There is a little extra HTML-specific logic in there to not self-close the tags listed in *always-open* (div, script, and textarea), even when they have no children.
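The decision about when a tag may be written in the short <tag/> form can be pulled out into a small predicate. This is only an illustrative sketch: self-closing? is a hypothetical name that does not appear in the renderer, and always-open simply mirrors the *always-open* set. The motivation is that browsers parsing a page as HTML generally treat something like <script src="myjs.js"/> as an opening tag that was never closed.

```clojure
;; Mirrors the *always-open* set used by the renderer.
(def always-open #{:div :script :textarea})

;; Hypothetical helper: true when a node may be written as <tag/>,
;; i.e. it has no children and is not in the always-open set.
(defn self-closing? [node]
  (and (empty? (:content node))
       (not (contains? always-open (:tag node)))))

(self-closing? {:tag :br :attrs nil :content nil})                   ; true
(self-closing? {:tag :script :attrs {:src "myjs.js"} :content nil})  ; false
(self-closing? {:tag :p :attrs nil :content ["text"]})               ; false
```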

So if you have an HTML file that looks like this...

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <html>
     <head>
      <title>Testing Title</title>
      <style type="text/css">
       .some-class { font-weight: bold; }
      </style>
      <script type="text/javascript" src="myjs.js"></script>
     </head>
     <body>
      <p id="message" class="some-class">
       This is a totally awesome test!
      </p>
     </body>
    </html>

And you run the following command at the REPL...

    (println (render (clojure.xml/parse "index.html")))

you should get this back:

    <html><head><title>Testing Title</title><style type="text/css">.some-class { font-weight: bold; }</style><script type="text/javascript" src="myjs.js"></script></head><body><p id="message" class="some-class">This is a totally awesome test!</p></body></html>

While valid XML, it'd be nice to have it prettily formatted. To do this, we must add a little complexity to keep track of depth and indentation.
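The heart of the depth tracking is small: each level of nesting adds a fixed whitespace prefix to the line. A minimal sketch, assuming two spaces per level; indent-str and indent-line are hypothetical names used only for illustration.

```clojure
;; Hypothetical helper: the whitespace prefix for a given nesting depth.
(defn indent-str [depth]
  (apply str (repeat depth "  ")))

;; Hypothetical helper: one indented output line, newline included.
(defn indent-line [depth text]
  (str (indent-str depth) text "\n"))

;; Prints three lines, each indented two spaces further than the last.
(print (str (indent-line 0 "<html>")
            (indent-line 1 "<head>")
            (indent-line 2 "<title>")))
```

The renderer then only needs to pass (inc depth) along each time it descends into a child node.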

    (def *always-open* #{:div :script})

    (defn render-attributes [attributes]
      (when attributes
        (apply str
          (for [[key value] attributes]
            (str \space (name key) "=\"" value \")))))

    (defn render
      ([node] (render node 0 false))
      ([node pretty?] (render node 0 pretty?))
      ([node depth pretty?]
       (let [indent (when pretty? (apply str (repeat depth "  ")))]
         (if (string? node)
           (str indent (.trim node) (when pretty? "\n"))
           (let [tag (:tag node)
                 children (:content node)
                 has-children? (not-empty children)
                 always-open? (contains? *always-open* tag)
                 open? (or has-children? (contains? *always-open* tag))
                 open-tag (str indent \< (name tag)
                               (render-attributes (:attrs node))
                               (if open? \> "/>"))
                 close-tag (when open?
                             (str (when (not always-open?) indent)
                                  "</" (name tag) \>))]
             (str
               open-tag
               (when (and pretty? (not always-open?)) "\n")
               (apply str (when has-children?
                            (for [child children]
                              (render child (inc depth) pretty?))))
               close-tag
               (when (and pretty? (> depth 0)) "\n")))))))

This has bumped us up to 34 lines of code with proper formatting. Now if we call:

    (println (render (clojure.xml/parse "index.html")))

We still get the same unformatted HTML back because it defaults to non-pretty formatting. But if we request pretty formatting...

    (println (render (clojure.xml/parse "index.html") true))

We get back this:

    <html>
      <head>
        <title>
          Testing Title
        </title>
        <style type="text/css">
          .some-class { font-weight: bold; }
        </style>
        <script type="text/javascript" src="myjs.js"></script>
      </head>
      <body>
        <p id="message" class="some-class">
          This is a totally awesome test!
        </p>
      </body>
    </html>

Perfect!

What's amazing is that, after a few days of working with Clojure, and a very small background in Lisp, this code is perfectly readable. I could not figure out a way to go through the tree with Clojure's tail recursion idiom, so I had to use stack recursion. For most XML files, stack recursion is going to be just fine.
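The reason recur doesn't fit directly is that the recursive call to render is not in tail position: each child's result still has to be combined with its siblings by str. One possible workaround, offered as a sketch rather than as the right answer, is to keep the pending work in an explicit stack so that the only recursion left is loop/recur. Everything below is an assumption made for illustration: render-iter is a hypothetical name, it covers only the non-pretty case, and the [:raw ...] marker is just one way to tell an already-rendered closing tag apart from a text node.

```clojure
(require '[clojure.string :as string])

;; Same tag set as the first version of *always-open* in the post.
(def always-open #{:div :script :textarea})

(defn render-attributes [attributes]
  (apply str
    (for [[key value] attributes]
      (str \space (name key) "=\"" value \"))))

;; Hypothetical alternative to render. Pending work lives in a list used as
;; a stack. Work items are element maps, text strings, or [:raw "..."]
;; vectors holding an already-rendered closing tag.
(defn render-iter [root]
  (loop [stack (list root)   ; next item to process is at the front
         out   []]           ; rendered fragments, in output order
    (if (empty? stack)
      (apply str out)
      (let [item (peek stack)
            more (pop stack)]
        (cond
          (vector? item) (recur more (conj out (second item)))
          (string? item) (recur more (conj out (string/trim item)))
          :else
          (let [tag      (name (:tag item))
                children (:content item)
                open?    (or (seq children) (contains? always-open (:tag item)))
                open-tag (str \< tag
                              (render-attributes (:attrs item))
                              (if open? \> "/>"))
                close    [:raw (str "</" tag \>)]]
            (recur (if open?
                     ;; push in reverse so the first child is popped first
                     (into more (reverse (conj (vec children) close)))
                     more)
                   (conj out open-tag))))))))

(render-iter {:tag :p
              :attrs {:id "message"}
              :content [" Hi " {:tag :br :attrs nil :content nil}]})
;; => "<p id=\"message\">Hi<br/></p>"
```

Because the stack is ordinary data on the heap, the depth of the document should no longer translate into depth on the JVM call stack. The cost is that the code reads less like the shape of the tree than the recursive version does.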

I am still not convinced that Clojure deserves a place in any of my business applications, but it definitely has the best shot of any Lisp dialect I've seen, mainly because of its trivial interoperation with Java.

If any Clojure ninjas out there would like to suggest improvements to my algorithm, I'm all ears.