ClojureScript, Core.Async, and Om

Thursday, February 20, 2014

Summary: I combined ClojureScript, Core.Async, and Om (React.js), and then optimized based on benchmarks. Sample Page

While playing with an older iPad, I remembered David Nolen's 10,000 DOM updates with core.async example. But when I opened the page, the browser just didn't run the sample code well at all. I knew ClojureScript worked perfectly well on mobile browsers, so I thought it might be the DOM manipulation itself. The source on the page didn't show any obvious requests for animation frames. So I thought that if I synchronized the DOM manipulations with animation frames, it might work.

After some research, I found that Facebook's React.js automatically synchronizes all DOM manipulations, performed on its virtual DOM, with animation frames. So, instead of trying to write manual synchronization code, I decided to push all the rendering to React.js. I chose the Om library, which wraps React.js, because of its nice ClojureScript interface. (Interestingly, David Nolen is responsible for Om as well as the original core.async example.)

Setup


All of the following tests were performed in Google Chrome 32.0.1700.107 on a MacBook Air running OS X 10.9. Each test consisted of multiple runs, but I am only showing a single "good example" for brevity. The tests build on one another, in the order they are shown.

Project Configuration



One Channel To Rule Them All



Baseline:
200 Frames
Min 90.257 ms
Avg 113.301 ms (9 FPS)
Max 162.476 ms
Standard Deviation 14.002 ms
Function Calls 14.12 seconds (60.03%)

Removing Source Maps:
200 Frames
Min 86.879 ms
Avg 112.068 ms (9 FPS)
Max 169.367 ms
Standard Deviation 15.046 ms
Function Calls 13.87 seconds (56.66%)

I didn't expect removing the source maps to change anything, and the numbers show that, but I wanted to rule them out as a variable, so I removed them from the test project.

Compilation with Advanced Optimizations:
200 Frames
Min 58.056 ms
Avg 81.560 ms (12 FPS)
Max 150.629 ms
Standard Deviation 16.589 ms
Function Calls 7.62 seconds (46.10%)

Compiling ClojureScript with "Advanced Optimizations" enables all the optimizations available to the Google Closure compiler. In this mode I did need to change the project.clj file to include the React.js externs file, but the test code did not change. Similar to a C-style header file, the externs file informs the compiler which functions it can, and cannot, optimize. It makes a big difference in both size and, in this case, performance. This test really shows why you should turn on advanced optimizations when compiling for production.
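
For reference, the relevant compiler options look roughly like the following. This is a sketch, not the test project's actual project.clj, and the output and externs paths are illustrative.

    ;; Hypothetical project.clj excerpt; paths and file names are illustrative.
    :cljsbuild
    {:builds [{:source-paths ["src"]
               :compiler {:output-to     "resources/public/js/app.js"
                          :optimizations :advanced
                          :externs       ["react/externs/react.js"]}}]}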

Many Channels To Rule Them All



One Channel per Table Cell:
200 Frames
Min 17.239 ms
Avg 58.557 ms (17 FPS)
Max 109.993 ms
Standard Deviation 24.080 ms
Function Calls 4.31 seconds (35.47%)

The previous architecture used a single channel that all the table cells read from. It was the simplest thing I could think of originally. But, to better mimic the original example, I moved to a channel-per-cell arrangement. This architecture, while improving the frame rate, tripled the average heap size, from ~8 MB to ~23 MB.
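
The test code itself is not reproduced here, but a rough sketch of the channel-per-cell arrangement, with hypothetical names (cell-feed, cell-view) and Om protocol usage that may differ in detail from the Om versions of the time, looks something like this:

    (ns example.grid
      (:require [om.core :as om :include-macros true]
                [om.dom :as dom :include-macros true]
                [cljs.core.async :refer [chan timeout <! >!]])
      (:require-macros [cljs.core.async.macros :refer [go-loop]]))

    ;; One channel per table cell, each fed by its own go loop that waits
    ;; a random delay (up to 100 ms here, tuned down later in the post)
    ;; before pushing a new random value.
    (defn cell-feed []
      (let [ch (chan)]
        (go-loop []
          (<! (timeout (rand-int 100)))
          (>! ch (rand-int 10))
          (recur))
        ch))

    ;; Each cell component reads from its own channel and keeps the latest
    ;; value in component-local state, leaving the actual DOM work to
    ;; React/Om, which batches it per animation frame.
    (defn cell-view [_ owner]
      (reify
        om/IInitState
        (init-state [_] {:value 0})
        om/IWillMount
        (will-mount [_]
          (let [ch (cell-feed)]
            (go-loop []
              (om/set-state! owner :value (<! ch))
              (recur))))
        om/IRenderState
        (render-state [_ {:keys [value]}]
          (dom/td nil value))))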



And, a closer look at the animation frame timing shows that the total time between frames is now dominated by the random delay between updates, rather than by actual computation.

Tuning the Random Delay:
200 Frames
Min 17.582 ms
Avg 36.219 ms (28 FPS)
Max 64.888 ms
Standard Deviation 8.091 ms
Function Calls 4.43 seconds (59.62%)

After tuning the random delay limit down from a maximum of 100 milliseconds to a maximum of 20 milliseconds, I achieved an average of close to 30 frames per second. I wanted to maintain some level of randomness between the updates, so I did not remove the delay completely.
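
In terms of the sketch above, the change is nothing more than lowering the bound passed to rand-int:

    ;; was (<! (timeout (rand-int 100)))
    (<! (timeout (rand-int 20)))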

Code Cleanup



Some Code Cleanup:
200 Frames
Min 16.051 ms
Avg 34.972 ms (29 FPS)
Max 57.388 ms
Standard Deviation 7.898 ms
Function Calls 4.33 seconds (60.82%)


Over the course of the tests, I noticed that the Chrome developer tools, while profiling, appeared to degrade overall JavaScript performance, so all the absolute times above should be taken as "under test" numbers. The animations appeared to run faster without the Chrome developer tools open.

Circling back to my original goal of running David Nolen’s demo, my modified version appears to run at a good speed on iOS 7 Mobile Safari and the Android version of Google Chrome. When I ran my code on the iOS version of Google Chrome, the performance still suffered. I can only assume Mobile Safari is cheating.

Update


Upgrading to Om 0.5.0-RC1:
200 Frames
Min 26.105 ms
Avg 84.024 ms (12 FPS)
Max 148.848 ms
Standard Deviation 29.089 ms
Function Calls 11.88 seconds (69.43%)

While writing this post, Facebook released version 0.9.0-RC1 of React.js, which the Om 0.5.0-RC1 upgrade pulls in. I reran the last test with the upgraded libraries to check for breakage, and found a large performance regression. It is a release candidate, so I would expect the final release to improve.

Centering a Div Tag with Physics, Instead of CSS

Friday, February 07, 2014

Summary: I combined a Javascript CSS parser and a Javascript physics engine to layout DOM elements with physics.

After reading yet another guide to layout with CSS, and yet another flame war over CSS's flaws, I just wanted to ditch CSS entirely. TeX has had a box-and-glue model that has worked well in print for decades, but the tooling looks unusable for rendering webpage layouts that require dynamic flexing and reflowing. So, why can't I just connect two DOM nodes with a spring and have it flex as the document resizes? (And, this is 2014. Where is my flying car?)

Using the real physical laws of motion that people expect of real-world objects actually makes some sense when you think of a document as being in motion. CSS grids, using frameworks such as Twitter Bootstrap, have different layout configurations for different ranges of window sizes, based on CSS media queries. But you can also see that as a page squeezing and stretching with the user's window size, with the DOM nodes connected by springs, colliding as the window is squeezed and pulling apart as the window is stretched. So, I built a Physics-based Page Layout system (Live demo).

Conceptual Springs
I am currently using a modified physics engine originally built by jonobr1, but there are numerous other engines to choose from. If I get the urge, I may swap it out for one that does continuous collision detection. That would help layouts squeeze correctly in response to other nodes, but it would require a better abstraction around the current physics engine.


All the DOM nodes mentioned in a selector or a declaration are lazily assigned a particle group, containing one particle for each side (top, right, bottom, left) of a box. If a node is only mentioned inside a declaration, its particles are fixed in position relative to the DOM node. This allows nodes to be used as anchors without taking them out of the normal document flow. If a node is mentioned in a selector, it is taken out of the document flow (absolute positioning) and given internal springs connecting each edge particle in a diamond pattern (top to right, right to bottom, bottom to left, left to top). This allows the particles, and thus the node, to flex in response to external springs. It would be interesting to conditionally disable those internal springs to provide partially rigid boxes, but that is not currently in the code base.

Originally, I kept the physics animation of the springs flexing visible as the system sought equilibrium. This behavior was fun to watch, but it resulted in terrible performance for document layout. So now it runs a fixed number of time steps (500) without updating the DOM, repeating as necessary to reach equilibrium. It might be nice to conditionally re-enable the animation as a way to help debug complex layouts, but that is not yet a priority.

After building this system, I think I now have a better understanding of the complexities around document layout. The current implementation is not much more than a proof-of-concept. To make it usable would require adding rigid connections, one-dimensional springs, minimum and maximum spring lengths, layers, cross-layer connections, shorthand declaration syntax, better performance, and more. So, for right now, it is just a neat proof-of-concept.

Hangout Fishbowl

Friday, January 24, 2014

Summary: I built an out-of-band invite tool, for Google’s Hangout On Air, based on the Fishbowl discussion format.

Around the beginning of January, I wandered into an online book club utilizing Google Hangouts On Air. The normal attendance was greater than the Google Hangout limit of ten people, so they turned on Hangouts On Air for the people who couldn't get into the discussion. The nine available slots went to the first nine people to join the Hangout. The rest were left to watch, and chat amongst themselves, since the Hangout chat and the viewer chat are separate. There was always shuffling to get into the Hangout, and a percentage of the group was always left out.

I realized, after some thought, that there is a discussion format that allows total group participation while restricting the number of people speaking at any one time: the Fishbowl conversation format could be mapped onto a Hangout On Air. Every Open Spaces conference I've attended has featured a Fishbowl demonstration, but I had never found a use for it. Because it is optimized for Google+ integration, the Hangout invite system does not work smoothly for dynamically swapping in members of the audience. So, I built the Hangout Fishbowl. (The icon needs as much work as everything else.)



The application is built around two interfaces: a custom site for "viewers" and a Hangout extension for the Hangout On Air broadcaster. A viewer signs in to the custom site with Google+, where they can be seen by the broadcaster, be invited directly by the broadcaster, or request an invite to the active Hangout. It is designed to be realtime, so there are no emails, Google+ notifications, or privately shared posts. The broadcaster clicks a link, the viewer gets a popup, and the viewer joins the conversation.




Behind the scenes, the link between the viewer site and the Hangout extension is maintained by a simple Node.js, Socket.IO, and Redis backend.

Currently, the Hangout Fishbowl MVP is a failure. No groups are using it, and the friction involved with using it, on top of Google+ events and Hangouts, is still greater than manually adding people. Perhaps, in a few weeks, I will cycle back to it and iterate on the idea.

Infinite Lorem Ipsum with Markov chains

Monday, December 16, 2013

Summary: You can train a Markov chain on Latin text and build an infinite Lorem Ipsum string.

There are hundreds of websites, apps, plugins, and test pages utilizing the semi-coherent, utterly incomprehensible, and much loved Lorem Ipsum text. After using it countless times to fill gaps in preliminary designs, I finally took a moment, in the middle of a late-night coding session, to stare into it and wonder whether I too could build something of such infinite semi-coherence.

According to a Straight Dope post, the main source for the Lorem Ipsum Wikipedia article, the semi-coherent text is based on selections from Cicero's De finibus bonorum et malorum. Since my goal was an infinite string of semi-coherent text, the simplest solution was to train a Markov chain on Cicero's text. Then I could lazily pull as many words as I needed. The results are still valid Latin words, but letter-order scrambling could be applied to them afterwards.

I first became aware of using Markov chains for text synthesis through Garkov, a Garfield comic strip where the characters' dialog is replaced by a probabilistic model trained on genuine Garfield strips. I have never had a use for Markov chains in normal projects; most of my efforts involve finding libraries and gluing them together (PostModern Programming style), and I had never found a way to sneak such a fun semi-coherent text generator into a project.



The Wikipedia page does a great job explaining the details behind Markov chains, so I will only include the following images visualizing the training and text synthesis. In this case, I used the letters in the string "at noon you can" to train the model and generate the string "at nou".

Input

Generated Model

Generated Text

In this case, the function that generates the probability hash-map relies on Clojure treating strings as a seqable collection of characters, and on hash-map keys being of semi-arbitrary type. So the code will work on a seqable collection of strings without modification.
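
The post's original functions are not reproduced here; a minimal sketch of the idea, with hypothetical train and generate names, could look like this (it assumes every element seen in training has at least one successor):

    ;; Build the probability model: for each element, a vector of the
    ;; elements that followed it in the training data. Works on anything
    ;; seqable, including a string (a seq of characters).
    (defn train [coll]
      (reduce (fn [model [a b]]
                (update-in model [a] (fnil conj []) b))
              {}
              (partition 2 1 coll)))

    ;; Lazily walk the model, picking a random successor at each step.
    (defn generate [model start]
      (iterate (fn [x] (rand-nth (model x))) start))

    ;; Character-level example from the images above: train on
    ;; "at noon you can" and pull six characters.
    (apply str (take 6 (generate (train "at noon you can") \a)))
    ;; => possibly "at nou", depending on the random walk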


Since the probability hash-map is only limited by the types allowed as hash-map keys, numbers, objects, other collections, and even functions will also work.


Note: as long as the training list of functions can be chained without error, the generated model prevents invalid function chaining, because it only produces transitions that appeared in the training data.

I’m not exactly sure how infinite probabilistic function chaining is useful, but it … um ... sounds cool.
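
Reusing the hypothetical train and generate functions sketched above, probabilistic function chaining might look like this:

    ;; Functions are perfectly good hash-map keys, so the same model works.
    ;; The training list chains without error, so the generated chain does too.
    (def fn-model (train [inc inc dec inc dec dec inc]))

    ;; Take ten functions from the model and thread a value through them.
    (reduce (fn [acc f] (f acc)) 0 (take 10 (generate fn-model inc)))
    ;; => some small integer, depending on the random walk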

Adventures in Validating Email Addresses with Instaparse

Saturday, September 07, 2013

Summary: The email address specifications are complex enough to require a full parser for proper validation.


It started simply enough. I wanted a function to validate the format of an email address.
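
The original regex is not reproduced here; a stand-in along the same lines, and not the post's actual code, might be:

    ;; Naive format check: some allowed characters, an @, then a domain.
    ;; This is an illustrative stand-in, not the regex from the post.
    (defn valid-email? [s]
      (boolean (re-matches #"[\w.+-]+@[\w.-]+" s)))

    (valid-email? "someone@example.com")   ;; => true
    (valid-email? "not an email address")  ;; => false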


The regex handled all the invalid cases I could think of. I just needed to test some valid addresses to make sure I didn't have any false negatives. So, I looked through the Wikipedia article on email addresses.


OK. I've covered those cases.



I didn't know TLDs were valid domains. But, the regex had that case covered.



Really? I guess that makes sense. I just needed to add a few characters to the regex.



Quoted strings?



That's valid?!



Email addresses can have comments?! Alright. Screw regex. I'm using the EBNF hammer.



Note: I'm using some of the PEG extensions for Instaparse, so this is neither pure EBNF nor a pure CFG.
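
The full grammar is much larger than what follows; this drastically simplified Instaparse sketch (with illustrative rule names, and none of the quoted-string, comment, or domain-literal rules) only gives the flavor:

    (require '[instaparse.core :as insta])

    ;; A toy subset of the addr-spec grammar, for illustration only.
    (def email-parser
      (insta/parser
        "addr-spec  = local-part <'@'> domain
         local-part = dot-atom
         domain     = dot-atom
         dot-atom   = atom (<'.'> atom)*
         atom       = #'[A-Za-z0-9!#$%&*+/=?^_`{|}~-]+'"))

    (defn valid-address? [s]
      (not (insta/failure? (email-parser s))))

    (valid-address? "someone@example.com")  ;; => true
    (valid-address? "bad@@example.com")     ;; => false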

With something this complex, I wanted a more extensive test suite to validate my validator. I found just that in Dominic Sayers's is_email() project, which was built to solve the same problem in PHP. In the process, he created a test suite covering invalid, deprecated, standard-specific, and other syntax cases. It looked like a good metric. At the time of writing, the above grammar identifies 74% of the valid test cases as valid, and 100% of the invalid test cases as invalid. Since this is meant as a generic format-validation function, I consider any test in the "ISEMAIL_ERR" category an "invalid" case, and all other categories "valid" cases, including deprecated syntax and length restrictions.

While writing my own grammar, I also ran across a similar post by George Pollard, done with Ruby in 2009. While I didn't use any of his grammar, it's nice to know I'm on the right track: while you can use a regex, you shouldn't use a regex to parse, or validate, an email address.

Improving the Clojure-Git Interface with a Nice Facade

Monday, September 02, 2013

Summary: A more composable Git interface can be built with a facade that implements standard Clojure interfaces.

In a previous post, I used clj-jgit to interact with a local Git repository. The functions, and the general workflow, matched what I would have done at the command line (git-add followed by git-commit). That made it very easy to get started, and it is preferable to using JGit directly, but, to me, it didn't feel very Clojure-like. It felt like Bash.

While I was using git-add and git-commit, I wanted conj. A Git branch can almost be imagined as a very persistent, strongly ordered hash-map. Every commit is addressable by a "key", and has a sequence of commits behind it. I should be able to get commits, map and reduce over them, and conj on new ones using the built-in Clojure functions.

Associative, IFn, IObj, Oh My!
Dig into the Clojure (JVM) internals and you will quickly find a list of common interfaces for everything from metadata to data structure access. I only wanted the behavior of a hash-map, so my implementation list, after much exploration, shrank to Associative, IFn, IObj, and Object. Associative accounted for most of the functionality, but did not provide map-as-function, metadata handling, or toString.
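
The proof of concept itself is not reproduced below; a much-reduced sketch, with a stubbed-out, hypothetical lookup-commit helper, shows the shape of the facade. The real version also implements the rest of Associative (assoc, containsKey, entryAt, and IPersistentCollection's cons, which is what makes conj work) plus IObj and toString.

    ;; Hypothetical helper: resolve a commit-ish key to commit data with
    ;; clj-jgit. Stubbed out in this sketch.
    (defn lookup-commit [repo branch k]
      nil)

    ;; Minimal facade: hash-map-style lookup plus map-as-function.
    (deftype GitBranch [repo branch]
      clojure.lang.ILookup
      (valAt [_ k] (lookup-commit repo branch k))
      (valAt [_ k not-found] (or (lookup-commit repo branch k) not-found))

      clojure.lang.IFn
      (invoke [_ k] (lookup-commit repo branch k))
      (invoke [_ k not-found] (or (lookup-commit repo branch k) not-found)))

    ;; Usage, roughly:
    ;;   (def master (->GitBranch repo "master"))
    ;;   (get master "a1b2c3")  ; commit data, or nil
    ;;   (master "a1b2c3")      ; the map-as-function behavior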

Proof of Concept


Usage


Some points of interest:

  • The idea of a staging area disappears because you can now construct a commit, as a hashmap, independently of Git.
  • The metadata and commit information are queried lazily, and not cached, because the repository could be changed from outside the application.
  • I have reused the TreeIterator from a previous post to allow commits without writing to the file system.

There is missing functionality (commit totals, changing branches, and a clean commit data format, to name a few), but most of it fell outside my original goal of using conj to add a new commit. There are no technical hurdles preventing those features in the future, but my application did not call for them. If I ever fill in the gaps, this might make a nice feature to submit to the clj-jgit project.