I hit a wall while getting Clojure running on OpenLDK, my Common Lisp JVM. It was taking almost three hours to get to a Clojure REPL prompt. Profiling showed that huge chunks of that time were spent in invoke-special, buried deep in SBCL’s PCL implementation, trying to sort through thousands of methods for <init>(). The JVM constructor. Every Java class has one, and in Clojure every function is its own class.

The problem is a fundamental impedance mismatch: CLOS does multi-method dispatch — it considers the types of all arguments when selecting a method. Java does single dispatch — only the receiver object’s type matters. OpenLDK maps Java classes to CLOS classes and Java virtual methods to CLOS generic functions, which means every Java method call pays the full cost of CLOS’s multi-dispatch protocol. For a handful of classes, nobody notices. For the ~3000 classes Clojure loads at startup, it’s a catastrophe.

The fix was to reach into the CLOS Meta-Object Protocol and replace the dispatch machinery for Java methods with something that actually matches Java’s dispatch model.

A Quick Tour of CLOS Dispatch

SBCL’s PCL (Portable Common Loops) implements the MOP’s dispatch protocol. When you call a generic function, PCL routes the call through a discriminating function — an internal function that PCL attaches to each generic function object. The discriminating function embodies a discrimination net (dfun), a caching structure that tries to fast-path calls based on argument types it has seen before.

Standard CLOS dispatch flow

On a dfun cache miss, PCL falls back to the full MOP protocol: compute-applicable-methods-using-classes examines the class of every specializable argument against every defined method’s specializers, the matching methods are sorted by specificity across all argument positions, and PCL builds an effective method — a compiled function that chains the primary and auxiliary methods together with call-next-method support. The result is folded back into the dfun so the next call with the same types is fast.

In steady state this works fine. The problem is what happens when the method set changes. Every time you add a method or define a new class, PCL calls update-dfun to rebuild the discrimination net from scratch. During Clojure bootstrap, we’re loading classes in a tight loop — the dfun never gets a chance to stabilize. Each class definition triggers update-dfun on every generic function that has methods specializing on that class or its ancestors. With thousands of methods on <init>(), each rebuild is expensive, and it happens thousands of times.

What Java Actually Needs

Java’s virtual dispatch is simple: look at the receiver object’s class, walk up the class hierarchy until you find a matching method, done. There’s no consideration of the other arguments’ types. This is textbook single dispatch, and it has a textbook optimization: a cache keyed on the receiver’s class.

That’s exactly what I built.

The java-generic-function Metaclass

The MOP lets you subclass standard-generic-function to change how dispatch works. I defined a new metaclass with two hash-table caches and a lock:

(defclass java-generic-function (standard-generic-function)
  ((dispatch-cache :initform (make-hash-table :test 'eq)
                   :accessor java-gf-dispatch-cache)
   (invoke-special-cache :initform (make-hash-table :test 'eq)
                         :accessor java-gf-invoke-special-cache)
   (cache-lock :initform (bordeaux-threads:make-lock "java-gf-cache")
               :reader java-gf-cache-lock))
  (:metaclass sb-mop:funcallable-standard-class))

The dispatch-cache maps a receiver class to a pre-built effective method function. The invoke-special-cache does the same thing for invokespecial bytecode — Java’s mechanism for calling a specific parent class’s method (constructors, super calls). The lock is there because OpenLDK can load classes from multiple threads.

Replacing the Discriminating Function

The MOP protocol for customizing dispatch is compute-discriminating-function. I specialize it on java-generic-function to return a lambda that does a hash-table lookup instead of walking the full MOP dispatch chain:

(defmethod closer-mop:compute-discriminating-function
    ((gf java-generic-function))
  (let ((cache (java-gf-dispatch-cache gf))
        (lock (java-gf-cache-lock gf)))
    (lambda (&rest args)
      (let* ((receiver (first args))
             (class (class-of receiver))
             (emfun (gethash class cache)))
        (if emfun
            (funcall emfun args)
            (let ((new-emfun (%compute-java-effective-method gf class)))
              (bordeaux-threads:with-lock-held (lock)
                (setf (gethash class cache) new-emfun))
              (funcall new-emfun args)))))))

On a cache hit, it’s two operations: class-of and a hash lookup. On a miss (the first call for each receiver class), it computes the effective method, caches it, and calls it. After that, every subsequent call for that class is O(1).

Fast dispatch flow

The cache miss path calls %compute-java-effective-method, which does call compute-applicable-methods-using-classes — but only once per (generic-function, receiver-class) pair between invalidations:

(defun %compute-java-effective-method (gf class)
  (let* ((lambda-list (closer-mop:generic-function-lambda-list gf))
         (nargs (length lambda-list))
         (class-list (cons class
                          (make-list (max 0 (1- nargs))
                                     :initial-element (find-class 't)))))
    (multiple-value-bind (methods definitive-p)
        (closer-mop:compute-applicable-methods-using-classes gf class-list)
      (declare (ignore definitive-p))
      (if (null methods)
          (lambda (args)
            (apply #'no-applicable-method gf args))
          (let* ((around (remove-if-not
                          (lambda (m) (equal (method-qualifiers m) '(:around)))
                          methods))
                 (primary (remove-if-not
                           (lambda (m) (null (method-qualifiers m)))
                           methods))
                 (chain (append around primary)))
            (if (null chain)
                (lambda (args)
                  (apply #'no-applicable-method gf args))
                (let ((first-mf (closer-mop:method-function (first chain)))
                      (rest-chain (rest chain)))
                  (lambda (args)
                    (funcall first-mf args rest-chain)))))))))

The trick with class-list is key: I only specialize on the receiver’s actual class. Every other argument position gets T, the universal supertype. This tells the MOP “I don’t care about these arguments for dispatch purposes” — which is exactly Java’s single-dispatch semantics.

The Real Win: Short-Circuiting update-dfun

The custom discriminating function helps, but the single biggest performance win is intercepting PCL’s update-dfun. This is the internal function that rebuilds the discrimination net, and PCL calls it every time a method is added, removed, or a relevant class is defined. During Clojure bootstrap, that means thousands of times.

For java-generic-function instances, I replace the expensive rebuild with a cache clear:

(let ((original-update-dfun (fdefinition 'sb-pcl::update-dfun)))
  (sb-ext:without-package-locks
    (setf (fdefinition 'sb-pcl::update-dfun)
          (lambda (gf &rest args)
            (if (and (typep gf 'java-generic-function)
                     (slot-boundp gf 'dispatch-cache))
                (progn
                  (bordeaux-threads:with-lock-held ((java-gf-cache-lock gf))
                    (clrhash (java-gf-dispatch-cache gf))
                    (clrhash (java-gf-invoke-special-cache gf)))
                  (sb-mop:set-funcallable-instance-function
                   gf (closer-mop:compute-discriminating-function gf)))
                (apply original-update-dfun gf args))))))

Cache invalidation flow

When PCL triggers update-dfun on a java-generic-function, instead of rebuilding a discrimination net, I clear both hash tables and reinstall the fast discriminating function. The caches repopulate lazily on the next call to each class. For non-Java generic functions, the original PCL logic runs as usual.

This is the difference between O(n) work on every class definition (where n is the method count) and O(1) work followed by amortized lazy cache fills.

Pre-Creating Hot Generic Functions

Not all generic functions are equal. <init>() is the single hottest one — every Java object creation calls it. I pre-create it (and a few others) with the java-generic-function metaclass during bootstrap, before any classes are loaded:

(ensure-generic-function '|<init>()|
                         :generic-function-class 'java-generic-function
                         :lambda-list '(|this|))

(ensure-generic-function '|clone()|
                         :generic-function-class 'java-generic-function
                         :lambda-list '(|this|))

And in the code generator, every Java instance method gets its generic function pre-created with the fast metaclass:

(unless (and (fboundp method-name)
             (typep (symbol-function method-name) 'generic-function))
  (ensure-generic-function method-name
    :generic-function-class 'java-generic-function
    :lambda-list (cons 'this arg-names)))

This guarantees that no Java method ever falls through to PCL’s default dispatch machinery.

invokespecial Caching

Java’s invokespecial instruction is used for constructor chaining and super calls. It bypasses virtual dispatch and targets a specific class’s method. During Clojure bootstrap, constructor chains like Object.<init>AbstractCollection.<init>AbstractList.<init> → … fire constantly.

I added a second cache specifically for this pattern:

(defun invoke-special (method-symbol owner-symbol args)
  (let* ((gf (symbol-function method-symbol))
         (owner-class (find-class owner-symbol)))
    (when (typep gf 'java-generic-function)
      (let* ((cache (java-gf-invoke-special-cache gf))
             (entry (gethash owner-class cache)))
        (unless entry
          (setf entry (%compute-invoke-special-entry
                        gf owner-class (length args)))
          (when entry
            (bordeaux-threads:with-lock-held ((java-gf-cache-lock gf))
              (setf (gethash owner-class cache) entry))))
        (when entry
          (return-from invoke-special
            (funcall (car entry) args (cdr entry))))))
    ;; fallback to full MOP lookup for non-java-generic-functions
    ...))

Each cache entry is a cons of (method-function . next-methods), so calling a cached invokespecial is a single funcall.

The Payoff

This work was motivated by cl-clojure, a project that runs Clojure inside Common Lisp via OpenLDK. Clojure’s bootstrap loads a massive number of classes — the core library alone defines hundreds of IFn implementations, each with invoke methods at multiple arities. Without the MOP customization, PCL spent more time rebuilding discrimination nets than doing useful work. With it, getting to a Clojure REPL went from 2 hours 45 minutes to 2 minutes 40 seconds.

The whole fix is about 120 lines of Common Lisp in a single file (src/java-gf.lisp), plus a few ensure-generic-function calls sprinkled through the bootstrap code. It touches exactly one internal PCL function (update-dfun), and it leaves all non-Java generic functions completely untouched.

What I like about this solution is that it’s the MOP working as designed. Gregor Kiczales and the AMOP authors built the Meta-Object Protocol specifically so you could do this kind of thing — swap out pieces of the object system’s implementation without forking the runtime. I didn’t patch SBCL. I didn’t write a custom compiler. I subclassed standard-generic-function, specialized one MOP generic function, and intercepted one internal PCL function. The rest of CLOS keeps working exactly as before.

The source is at github.com/atgreen/openldk.