This is a work-in-progress draft document describing lightweight userland cooperative threads for SBCL. The implementation is under active development and details may change. This is a living document — you can view its revision history. You can try it out on the
fibers-v2branch at github.com/atgreen/sbcl.
Table of Contents
Introduction
- Motivation: Why Fibers?
- Design Goals
- Terminology
Programming API
- Creating and Running Fibers
- Yielding and Waiting
- Fiber Sleep and Timed Waits
- Fiber Parking (Condition-Based Suspend/Resume)
- Fiber Join
- Spawning Fibers from Fibers
- Multi-Carrier Scheduling
run-fibers(Simple, Blocking)start-fibers/submit-fiber/finish-fibers(Dynamic, Long-Lived)
- Fiber-Aware Blocking Primitives
- Mutex Acquisition
- Condition Variables
- Semaphores
- I/O (
wait-until-fd-usable) sleepandwait-for
- Fiber Pinning (Preventing Yield)
- Introspection (
list-all-fibers,fiber-state, Backtraces) - Idle Hooks
- API Reference Summary
Architecture Overview
- Carrier Threads and Schedulers
- Fiber Lifecycle State Machine
- Data Flow: Yield and Resume
Context Switching
- Register Save/Restore Convention
- The
fiber_switchAssembly Routine - Stack Frame Initialization for New Fibers
- The Entry Trampoline (Assembly to C to Lisp)
- Zero-Allocation Design (No SAPs, No GC Pressure)
- Thread Register Patching on Carrier Migration
Stack Management
- Control Stack Layout and Guard Pages
- Binding Stack (Separate Allocation)
- Stack Pooling (
madvise(MADV_DONTNEED)Recycling) - Stack Overflow Detection in Signal Handlers
- Stack Size Tradeoffs (No Dynamic Growth)
Dynamic Variable Bindings (TLS)
- The Problem:
unbind_to_hereZeroes Entries - TLS Overlay Arrays (Save/Restore Without Unbinding)
- Carrier Value Update on Migration
- Catch Block and Unwind-Protect Chain Save/Restore
- Empty Binding Stack Fast Path
- Same-Carrier Resume Optimization
- The Problem:
Garbage Collector Integration
- The Two-List Design: Suspended Fibers vs. Active Contexts
fiber_gc_info: Conservative Control Stack Scanningactive_fiber_context: Carrier + Fiber Stack Visibility- Precise Binding Stack Scavenging
- Persistent Carrier Context (Lock-Free Hot Path)
- The
fiber-contextThread Struct Slot (O(1) Lookup) - GC Safety Windows (
without-interruptsvs.without-gcing) - Correctness Argument: Why Partial Updates Are Safe
Scheduler Design
- The Scheduler Loop
- Fast Path: Skip Maintenance When Deque Is Hot
- Maintenance: Pending Queue Drain, Deadline Expiry, Wake Checks
- Post-Switch Dispatch (Suspended, Dead)
- Idle Detection and Carrier Parking
- Maintenance Frequency Backstop (Every 64 Fibers)
Work Stealing
- Chase-Lev Lock-Free Deque
- Owner Operations (Push/Pop from Bottom, LIFO)
- Thief Operations (Steal from Top, FIFO, CAS)
- Buffer Growth (Power-of-Two Circular Array)
- Random Victim Selection
- Fiber Migration and Thread Register Fixup
I/O Multiplexing
- Platform Abstraction (epoll, kqueue, poll Fallback)
- Edge-Triggered Mode with One-Shot (
EPOLLET | EPOLLONESHOT) - Indexed I/O Waiters (fd-to-Fiber Table)
- Bounded Epoll Drain Loop
fd-ready-pandwait-until-fd-usableIntegration- The Default I/O Idle Hook
- Batched FD Polling vs. Per-Fiber Polling
Deadline Scheduling
- Binary Min-Heap with Inline Index
- O(log N) Insert and Remove
- Batch Expiry (Pop All Expired in One Pass)
- Interaction with I/O Waiters (Dual-Indexed Fibers)
Fiber Death and Cleanup
- The Lisp Trampoline (
fiber-trampoline) - Error Handling and Result Capture
- Binding Stack Cleanup on Death (Without
unbind_to_here) - GC Info Unregistration Timing
- Stack Return to Pool
- The Lisp Trampoline (
Integration with SBCL
- Thread Struct Extension (
fiber-contextSlot) serve-eventDispatch (Fiber-Awarewait-until-fd-usable)sleepandwait-forDispatch- Mutex and Condition Variable Dispatch
- Pinned Blocking Fallback to OS Primitives
*pinned-blocking-action*Warning/Error Policyinterrupt-thread- Debugger Integration
- Thread Struct Extension (
Performance
- HTTP Benchmark
- Memory Efficiency
- Scalability Under High Connection Counts
- Context Switch Microbenchmark
- GC Impact
Platform Support
- x86-64 (Linux, macOS, Windows)
- ARM64
- ARM32
- PPC64
- PPC32
- RISC-V
- Feature Flag:
:sb-fiber - Platform-Specific Assembly and I/O Backends
Appendix A: Using Hunchentoot with Fibers
- The Fiber Taskmaster
- Taskmaster Methods
- Starting the Server
- How It Works
- SSL
- Considerations
1. Introduction
1.1 Motivation: Why Fibers?
Many server workloads are concurrent but not parallel. A web server handling 10,000 connections spends almost all of its time waiting for network I/O; the actual computation per request is trivial. The natural programming model is one thread of control per connection — read a request, compute a response, write it back — but OS threads are too expensive to use this way at scale.
Each OS thread in SBCL carries a full-sized control stack (typically
8 MB), a binding stack, signal handling infrastructure, and a kernel
task_struct. Creating a thread requires mmap, clone, and TLS
setup; destroying one requires the reverse. Context switching between
threads requires a kernel transition, TLB management, and scheduler
bookkeeping. At 10,000 concurrent connections, this means 80 GB of
virtual address space for stacks alone, and the kernel scheduler —
designed for dozens to hundreds of runnable tasks — begins to degrade.
The conventional alternative is event-driven programming: a single
thread multiplexes connections using epoll or kqueue, dispatching
callbacks on I/O readiness. This scales well but inverts control
flow. Sequential logic must be broken into state machines or
continuation chains. Error handling, resource cleanup, and debugging
all become harder. Backtraces show the event loop, not the logical
call stack of the request being served.
Fibers offer a third option. A fiber is a user-space thread with its own control stack and binding stack, scheduled cooperatively by a library-level scheduler rather than the kernel. Fibers preserve the sequential programming model — the code reads top-to-bottom like a normal thread — while achieving the resource efficiency of event-driven I/O. A fiber’s control stack is 256 KB by default (not 8 MB), context switching saves and restores six registers in user space (not a full kernel transition), and thousands of fibers can share a small pool of OS carrier threads.
Common Lisp complicates this picture because the language carries
more implicit per-thread state than most: special variable bindings
live in a separate binding stack, non-local exits thread through
catch tags and unwind-protect chains, and all of these must be
saved and restored correctly across suspension points. A fiber
implementation that only switches the control stack will corrupt this
state. SBCL’s fiber implementation handles all of it: dynamic
bindings made within a fiber are local to that fiber,
unwind-protect cleanup forms run on fiber death, and
handler-case/handler-bind work as expected.
1.2 Design Goals
The implementation is guided by several priorities, roughly in order:
Correctness under GC. SBCL uses a generational, compacting,
stop-the-world garbage collector. The GC must be able to find every
live Lisp object, including those on suspended fiber stacks and in
fiber binding stacks. Getting this wrong means silent heap corruption.
Every design decision in the fiber runtime is constrained by the
requirement that GC can fire at almost any point (the only exclusion is
without-gcing regions) and must see a consistent view of all fiber
state.
Transparent integration. Existing SBCL code should work inside
fibers without modification. grab-mutex, condition-wait,
wait-until-fd-usable, sleep, and wait-for all detect when they
are running inside a fiber and yield cooperatively instead of blocking
the carrier thread. Code that is not fiber-aware does not need to
change. Code that cannot safely yield (e.g., in the middle of a
multi-step foreign library interaction) can pin the fiber, causing
blocking primitives to fall through to their OS implementations.
Low per-switch overhead. The context switch hot path — yield-to-scheduler-to-resume — must be fast, because fibers switch far more frequently than OS threads. The implementation targets sub-microsecond switches by using a hand-written assembly routine that saves only callee-saved registers, performing zero heap allocation on the switch path, and eliminating all mutex acquisitions from the per-switch code.
Scalability to many fibers. The system should handle tens of thousands of concurrent fibers without degradation. This requires small per-fiber memory footprint (256 KB + 16 KB stacks, pooled for reuse), O(1) or O(log N) scheduling operations, and efficient I/O multiplexing that avoids per-fiber polling.
Multi-core utilization. A pool of carrier threads should keep all cores busy when there is enough work. Idle carriers should not waste CPU. Work should migrate between carriers without programmer intervention. The implementation uses lock-free work-stealing deques (Chase-Lev) so that busy carriers never contend on a shared queue, and idle carriers can take work from busy ones without locks.
1.3 Terminology
Fiber: A lightweight cooperative thread with its own control stack and binding stack, scheduled in user space.
Carrier thread (or carrier): An OS thread that runs fibers. Each carrier has its own scheduler. A fiber executes on exactly one carrier at a time, but may migrate between carriers via work stealing.
Scheduler: A per-carrier structure that manages a run queue (work-stealing deque), waiting lists, deadline heaps, and I/O multiplexer state. One scheduler per carrier thread.
Scheduler group: A collection of schedulers (one per carrier) that cooperate via work stealing. The unit of fiber lifecycle management: fibers are submitted to a group and results are collected from it.
Yield: A fiber voluntarily suspends itself, returning control to its carrier’s scheduler. The scheduler may then run another fiber on the same carrier.
Resume: The scheduler restores a suspended fiber’s context and transfers control to it.
Pin: Temporarily prevent a fiber from yielding. While pinned, blocking primitives fall through to their OS implementations, blocking the carrier thread.
Wake condition: A predicate function associated with a suspended fiber. The scheduler calls it periodically; when it returns true, the fiber becomes runnable.
Deque (double-ended queue): The per-carrier run queue, a Chase-Lev work-stealing deque. The owner pushes and pops from the bottom (LIFO); thieves steal from the top (FIFO) via CAS.
Work stealing: When a carrier’s local run queue is empty, it attempts to take a fiber from another carrier’s queue. This balances load across cores without centralized scheduling.
2. Programming API
All public symbols are exported from the SB-THREAD package. The
fiber API is available when SBCL is built with the :sb-fiber feature
(enabled by adding :sb-fiber to local-target-features.lisp-expr).
2.1 Creating and Running Fibers
A fiber is created with make-fiber and submitted for execution on a
scheduler group. It does not begin running until the scheduler picks
it up.
(make-fiber function &key name stack-size binding-stack-size initial-bindings)
function — a function of no arguments. This is the fiber’s
entry point. When it returns normally, all return values are captured
as a list in the fiber’s result slot (retrievable via fiber-result,
or as multiple values via fiber-join). If it signals an unhandled
error, the condition object becomes the result instead.
name — an optional string for debugging. Appears in
print-object output and backtrace annotations.
stack-size — the size of the fiber’s control stack in bytes.
Defaults to 256 KB. This is the usable region; an additional guard
page (typically 4 KB) is allocated below it for stack overflow
detection. Stacks are drawn from a per-size pool and recycled with
madvise(MADV_DONTNEED), so the cost of creation is usually a pool
hit rather than a fresh mmap.
binding-stack-size — the size of the fiber’s binding stack in
bytes. Defaults to 16 KB. Each (let ((*var* val)) ...) or progv
form pushes a two-word entry (old value + TLS index), so 16 KB
accommodates 1024 nested special variable bindings. A guard page is
allocated above the usable region; overflow triggers a segfault with
a diagnostic message, just like control stack overflow.
initial-bindings — an alist of (symbol . value) pairs.
These are established as dynamic bindings (via progv) before the
fiber’s function is called, analogous to the :initial-bindings
argument to sb-thread:make-thread. The values in the alist are
used directly (not evaluated); the caller is responsible for computing
them before passing the alist. This is the primary mechanism for
giving each fiber its own copy of a special variable:
(make-fiber (lambda () (format t "~A~%" *my-connection*))
:initial-bindings `((*my-connection* . ,conn)))
After creation, a fiber is in the :created state. It must be
submitted to a scheduler before it can run (see Section 2.7).
2.2 Yielding and Waiting
(fiber-yield &optional wake-condition)
fiber-yield suspends the current fiber and returns control to the
carrier thread’s scheduler. The scheduler may then run another
fiber on the same carrier. When the fiber is eventually resumed,
fiber-yield returns normally and execution continues from the point
of suspension.
If wake-condition is provided, it must be a function of no
arguments. The scheduler calls it periodically; when it returns a
true value, the fiber is moved from the waiting list to the run queue.
If wake-condition is nil, the fiber becomes immediately runnable
again (it is pushed back onto the scheduler’s deque and will be
picked up on a subsequent iteration).
fiber-yield signals an error if called outside a fiber context or
if the fiber is pinned (pin-count > 0).
The yield path is designed to be fast. It saves the fiber’s binding stack pointer, catch block, and unwind-protect chain; optionally saves TLS overlay values (skipped when the binding stack is empty); restores the carrier thread’s binding state; and then performs a register-level context switch back to the scheduler. The entire operation involves zero heap allocation and zero mutex acquisition.
2.3 Fiber Sleep and Timed Waits
(fiber-sleep seconds)
Suspends the current fiber for at least seconds (which may be
fractional). Implemented by setting a deadline in the fiber struct
and yielding with no wake condition. The scheduler’s deadline heap
expires the fiber when get-internal-real-time passes the target.
This avoids allocating a closure for the wake condition — the
scheduler checks the deadline field directly.
fiber-sleep integrates with the standard cl:sleep function.
When sleep is called inside a fiber, SBCL dispatches to
fiber-sleep automatically. If the fiber is pinned, sleep falls
through to the OS implementation (blocking the carrier thread).
2.4 Fiber Parking (Condition-Based Suspend/Resume)
(fiber-park predicate &key timeout)
fiber-park is the general-purpose suspension primitive. It
combines a wake predicate with an optional timeout:
predicate— a function of no arguments. The scheduler calls it during maintenance passes; when it returns true, the fiber is woken.timeout— seconds (may be fractional) after which the fiber is woken regardless of the predicate.
Returns t if the predicate was satisfied, nil if the timeout
expired first.
Internally, fiber-park stores the timeout as a deadline in the
fiber’s deadline slot and passes the predicate to fiber-yield.
The scheduler handles deadline expiry and predicate checking in its
maintenance loop, so fiber-park itself allocates nothing beyond what
the caller passes in. This two-channel wake design (predicate OR
deadline) is the building block for all higher-level waiting
operations: mutex acquisition, condition variables, I/O waits, and
fiber-join.
For a pure timed wait with no wake condition, use fiber-sleep
(Section 2.3) rather than fiber-park with a dummy predicate.
fiber-park is intended for cases where you need to wait until a
condition becomes true.
2.5 Fiber Join
(fiber-join target &key timeout)
Waits until the fiber target completes and returns the fiber
function’s return values (via values-list), analogous to
join-thread. If the fiber terminated due to an unhandled error, the
single return value is the condition object; use fiber-error-p to
distinguish this from a normal return.
Returns nil (single value) if the timeout expires first.
A fiber cannot join itself.
;; Single return value:
(fiber-join child) => result
;; Multiple return values:
(multiple-value-bind (x y) (fiber-join child)
(format t "Got ~S and ~S~%" x y))
;; Error checking:
(let ((result (fiber-join child)))
(if (fiber-error-p child)
(format t "Fiber failed: ~A~%" result)
(format t "Fiber returned: ~S~%" result)))
fiber-join works from two contexts:
From a fiber: Parks the calling fiber with a predicate that checks
(eq (fiber-state target) :dead). This is cooperative — the calling fiber yields and other fibers continue running on the same carrier.From an OS thread: Spin-waits with 1 ms sleeps between checks. This is the appropriate behavior for a thread that is collecting results from a fiber group.
2.6 Spawning Fibers from Fibers
A running fiber can create and submit new fibers using the same
make-fiber and submit-fiber API. The new fiber is submitted to
the current scheduler’s group (if any), making it visible to the
work-stealing infrastructure:
(let ((child (make-fiber (lambda () (compute-something)))))
(submit-fiber (fiber-scheduler-group *current-scheduler*) child)
;; ... do other work ...
(fiber-join child))
There is no implicit parent-child relationship. The spawning fiber is not suspended while the child runs; both are independently schedulable. The parent must explicitly join if it needs the child’s result.
When submit-fiber targets a scheduler group, the fiber is pushed
onto a randomly chosen carrier’s pending queue (a thread-safe atomic
list) and a parked carrier is woken if one exists. If all carriers
are busy, the fiber remains in the pending queue until the next
maintenance pass drains it. This is safe to call from any thread,
including non-carrier threads and fibers on other carriers.
2.7 Multi-Carrier Scheduling
Two API styles are provided: a simple blocking interface for batch-style workloads, and a dynamic interface for long-lived server applications.
2.7.1 run-fibers (Simple, Blocking)
(run-fibers fibers &key carrier-count idle-hook)
=> list-of-results
Creates a scheduler group with carrier-count carrier threads
(defaulting to the number of available CPUs, cgroup-aware on Linux),
distributes fibers
round-robin across the carriers, runs them all to completion, and
returns a list of their results in the same order as the input.
This is a convenience wrapper around start-fibers + finish-fibers.
Typical use:
(let ((fibers (loop for url in urls
collect (make-fiber (lambda () (fetch url))))))
(run-fibers fibers :carrier-count 4))
2.7.2 start-fibers / submit-fiber / finish-fibers (Dynamic, Long-Lived)
For applications that submit work over time (e.g., a web server submitting a new fiber per accepted connection):
(start-fibers initial-fibers &key carrier-count idle-hook)
=> fiber-scheduler-group
Starts carrier-count carrier threads and distributes
initial-fibers round-robin. Returns a fiber-scheduler-group
handle immediately. All carriers run on background threads.
(submit-fiber group fiber)
Submits a new fiber to the group. Atomically increments the group’s active count (so carriers know not to exit), pushes the fiber onto a randomly chosen carrier’s pending queue, and wakes a parked carrier if one exists. This is safe to call from any thread.
(finish-fibers group) => list-of-results
(fiber-group-done-p group) => boolean
finish-fibers blocks until all fibers in the group have completed,
then returns their results.
fiber-group-done-p is a non-blocking check that returns t when
the group’s active count reaches zero.
The dynamic API is the intended interface for server integration. A server’s accept loop creates fibers and submits them; the carrier threads run them; the fibers yield on I/O and resume when data is available. The carrier count controls parallelism, and work stealing keeps all carriers busy regardless of which carrier a fiber was submitted to.
2.8 Fiber-Aware Blocking Primitives
Existing SBCL blocking primitives detect fiber context and yield
cooperatively rather than blocking the carrier thread. This is the
key to transparent integration: library code that calls grab-mutex
or sleep works correctly inside fibers without modification.
Each primitive follows the same pattern: check if running in a fiber
and if the fiber can yield (not pinned); if so, park the fiber with
an appropriate wake condition; if pinned, invoke the configured
*pinned-blocking-action* and fall through to the OS implementation.
2.8.1 Mutex Acquisition
When a fiber calls sb-thread:grab-mutex and the mutex is contended,
the fiber parks with a predicate that checks whether the mutex state
is free. On wake, it attempts a CAS to acquire the mutex. If the
CAS fails (another fiber grabbed it first), the fiber parks again.
This retry loop is necessary because wake conditions are checked
periodically, not atomically with the state change.
2.8.2 Condition Variables
sb-thread:condition-wait in a fiber context: releases the mutex,
parks the fiber with a generation-counter predicate (woken when
condition-notify increments the waitqueue’s generation), re-acquires
the mutex on wake. The generation counter avoids lost wakeups —
if condition-notify fires between the mutex release and the yield,
the generation will already have changed when the predicate is first
checked.
2.8.3 Semaphores
Semaphore operations (wait-on-semaphore, signal-semaphore) follow
the same pattern as mutexes: park until the count is positive, then
CAS to decrement.
2.8.4 I/O (wait-until-fd-usable)
(sb-sys:wait-until-fd-usable fd direction &optional timeout)
In fiber context, this registers the file descriptor with the
scheduler’s event multiplexer (epoll on Linux, kqueue on BSDs) and
parks the fiber. On Linux with epoll available, the registration
uses edge-triggered mode with one-shot (EPOLLET | EPOLLONESHOT):
the kernel delivers a single event when the fd becomes ready, then
disables the registration. This eliminates spurious wakeups and
avoids the thundering-herd problem when multiple fibers wait on
related fds.
The scheduler’s idle hook calls epoll_wait when no fibers are
runnable, waking fibers whose fds have become ready. The fiber
stores its waited-on fd and direction in struct fields so the
scheduler can index into an fd-to-fiber table for O(1) dispatch.
When epoll is not available (or on non-Linux platforms), the fallback
is a batched poll() call over all waiting fds.
2.8.5 sleep and wait-for
cl:sleep dispatches to fiber-sleep when called in a fiber context
(see Section 2.3).
sb-ext:wait-for dispatches to fiber-park with the test function
as the predicate and the timeout converted to a deadline.
2.9 Fiber Pinning (Preventing Yield)
(fiber-pin &optional fiber)
(fiber-unpin &optional fiber)
(fiber-can-yield-p &optional fiber)
(with-fiber-pinned ((&optional fiber)) &body body)
Pinning increments a per-fiber counter; unpinning decrements it. A
fiber with a non-zero pin count cannot yield. Any attempt to call
fiber-yield while pinned signals an error.
The purpose of pinning is to protect regions of code that cannot tolerate suspension. The primary case is thread-affine foreign state: many C libraries use thread-local storage internally (OpenSSL contexts, database handles, GPU contexts). If a fiber yields mid-interaction and is later resumed on a different carrier via work stealing, the C library’s thread-local state belongs to the old carrier — the fiber is now on the wrong OS thread. This should be rare in practice, but pinning provides a simple guard against it.
When a pinned fiber calls a blocking primitive (grab-mutex, sleep,
wait-until-fd-usable, etc.), the primitive detects the pin and falls
through to the OS implementation, blocking the carrier thread. The
variable *pinned-blocking-action* controls how this is reported:
:warn(default) — emits a warning identifying the fiber and the operation.:error— signals an error.nil— silently falls through.
This policy exists because blocking a carrier thread degrades
throughput for all fibers on that carrier. The warning helps
developers identify code that should either be restructured to avoid
blocking while pinned, or that legitimately needs to pin (in which
case the policy can be set to nil).
with-fiber-pinned is the recommended interface. It pins on entry,
executes the body, and unpins in an unwind-protect cleanup form,
ensuring the pin count is always restored even if the body signals.
2.10 Introspection (list-all-fibers, fiber-state, Backtraces)
(list-all-fibers) => list
Returns a copy of the global fiber list. Includes all fibers that
have been created and not yet destroyed (i.e., all states: :created,
:runnable, :running, :suspended, :dead). The list is captured
under a mutex to provide a consistent snapshot.
(fiber-state fiber) => keyword
Returns the fiber’s current state: :created (made but not yet
submitted), :runnable (on a run queue), :running (actively
executing on a carrier), :suspended (yielded with a wake condition
or deadline), or :dead (function returned or signaled).
(fiber-name fiber) => string-or-nil
(fiber-result fiber) => list
(fiber-alive-p fiber) => boolean
fiber-result returns the fiber’s result values as a list (or a
single-element list containing the condition object on error) after
death. Use fiber-join to receive the values via values-list. fiber-alive-p is shorthand for
(not (eq (fiber-state fiber) :dead)).
(print-fiber-backtrace fiber &key stream count)
Prints a full symbolic backtrace for a suspended fiber, using SBCL’s
standard debugger infrastructure. Produces the same human-readable
output as print-backtrace — function names, arguments, source
locations, and local variables. Internally it binds the debug stack
bounds to the fiber’s control stack and walks frames starting from
fiber-top-frame. Only works for :suspended or :created fibers;
a running fiber’s stack is live on a carrier and being actively
modified.
2.11 Idle Hooks
(make-fiber-scheduler &key idle-hook)
(start-fibers fibers &key carrier-count idle-hook)
(run-fibers fibers &key carrier-count idle-hook)
The idle hook is a function of one argument (the scheduler) called when the scheduler has no runnable fibers but has suspended fibers waiting for a condition. The hook’s job is to block the carrier thread efficiently until something changes — typically by performing I/O multiplexing.
The default idle hook (fiber-io-idle-hook) calls epoll_wait (on
Linux) or a batched poll() (elsewhere) with a timeout derived from
the nearest deadline in the scheduler’s deadline heap. When an fd
event arrives, the corresponding fiber is moved to the run queue.
Custom idle hooks are useful for integrating fibers with external event sources that are not file descriptors. For example, a message queue consumer might check for new messages in its idle hook. The hook should block for a bounded time (the default uses the nearest deadline as its timeout) to ensure deadline-based fibers are not starved.
2.12 API Reference Summary
| Function | Description |
|---|---|
make-fiber | Create a fiber (does not start it) |
submit-fiber | Submit a fiber to a scheduler or group |
run-fibers | Create group, run fibers, return results (blocking) |
start-fibers | Create group and start carriers (non-blocking) |
finish-fibers | Block until all fibers complete, return results |
fiber-group-done-p | Non-blocking completion check |
fiber-yield | Suspend current fiber |
fiber-sleep | Suspend for a duration |
fiber-park | Suspend until predicate or timeout |
fiber-join | Wait for fiber completion |
fiber-pin / fiber-unpin | Prevent/allow yielding |
with-fiber-pinned | Pin for duration of body |
fiber-can-yield-p | Check if fiber can yield |
current-fiber | Return current fiber or nil |
fiber-state | Return fiber’s lifecycle state |
fiber-name | Return fiber’s name |
fiber-result | Return fiber’s result values (as list) |
fiber-error-p | Check if fiber terminated with an error |
fiber-alive-p | Check if fiber is not dead |
list-all-fibers | Snapshot of all live fibers |
print-fiber-backtrace | Symbolic backtrace for suspended fiber |
*current-fiber* | Dynamic variable: current fiber |
*current-scheduler* | Dynamic variable: current scheduler |
*pinned-blocking-action* | Policy for pinned blocking (:warn, :error, nil) |
3. Architecture Overview
3.1 Carrier Threads and Schedulers
The fiber runtime is organized around a two-level hierarchy: carrier threads and schedulers.
Each carrier thread is a normal SBCL OS thread (sb-thread:thread)
that runs a scheduler loop. The scheduler loop pulls fibers from a
local work-stealing deque, runs them until they yield or die, handles
post-switch bookkeeping, and repeats. When the deque is empty, the
scheduler performs maintenance (draining pending queues, checking wake
conditions, expiring deadlines, polling I/O) and attempts work
stealing from sibling schedulers.
A scheduler group (fiber-scheduler-group) binds multiple schedulers
together. Each group has a vector of scheduler structs (one per
carrier), a shared active-count for lifecycle tracking, a mutex and
condition variable for carrier parking, and a list of submitted
fibers for result collection.
The separation between scheduler and group is deliberate. A single
carrier with a single scheduler is the simplest configuration: no
work stealing, no cross-thread coordination. Multiple carriers in
a group add parallelism via work stealing. The group’s active-count
is the coordination mechanism: carriers spin down when it reaches
zero; submit-fiber increments it atomically and wakes a parked
carrier.
3.2 Fiber Lifecycle State Machine
A fiber passes through five states:
:created ──submit──> :runnable ──schedule──> :running
^ |
| yield/wake
| |
+────────────── :suspended
|
(or from :running)
|
v
:dead
:created—make-fiberreturns a fiber in this state. It has allocated stacks and a GC info record, but has not been submitted to any scheduler.:runnable—submit-fiberor a wake event transitions the fiber here. The fiber is on some scheduler’s deque or pending queue, waiting to be picked up.:running— The scheduler has popped the fiber and is executing it on a carrier. Exactly one fiber per carrier can be in this state.:suspended—fiber-yieldtransitions here. The fiber has a saved stack frame and optionally a wake condition, deadline, or I/O wait registration.:dead— The fiber’s function returned (or signaled an unhandled error). Results are captured, resources are cleaned up, and the fiber is removed from the global list.
3.3 Data Flow: Yield and Resume
A complete yield-and-resume cycle involves these steps:
Yield (fiber → scheduler):
- Save the fiber’s binding stack pointer, catch block, and unwind-protect chain into the fiber struct.
- Update the fiber’s
gc_infobinding stack pointer (so GC can still scan the binding stack after the carrier’s BSP is restored). - Save the fiber’s TLS overlay values and restore the carrier’s TLS values (skipped if binding stack is empty).
- Restore the carrier’s BSP, catch block, and unwind-protect chain.
- Set fiber state to
:suspendedand store the wake condition. - Call
fiber_switch: save six callee-saved registers, write RSP to the fiber’s saved-rsp slot, load RSP from the scheduler’s saved-rsp slot, restore six registers,retinto the scheduler.
Resume (scheduler → fiber):
- Save the carrier’s BSP, catch block, and unwind-protect chain.
- Update the active fiber GC context (fiber stack boundaries and carrier stack scan range) — zero mutex locks.
- Set the carrier’s BSP to the fiber’s binding stack pointer.
- Restore the fiber’s catch block and unwind-protect chain.
- If the fiber has bindings: update carrier values in the binding stack (only on migration) and replay the TLS overlay.
- Widen the fiber’s
gc_infoto cover the full stack (conservative scan during the window beforefiber_switch). - Set effective stack bounds to the fiber’s stack.
- Call
fiber_switch: save six registers, write RSP to the scheduler’s saved-rsp slot, load RSP from the fiber’s saved-rsp slot, restore six registers,retinto the fiber (at the point after the yield’sfiber_switchcall). - On return: restore effective stack bounds, update or unregister the fiber’s GC info, clear the active fiber context, restore the carrier’s BSP.
4. Context Switching
4.1 Register Save/Restore Convention
The fiber context switch saves only the callee-saved registers defined by the platform ABI. This is the minimum set that a C or Lisp function is entitled to assume is preserved across a call. All other registers are either caller-saved (the compiler already assumes they are clobbered) or are restored by normal function return.
x86-64 SysV ABI (Linux, macOS): 6 registers (48 bytes)
rbp,rbx,r12,r13,r14,r15
x86-64 Win64 ABI (Windows): 8 GPRs (64 bytes) + 10 XMMs (160 bytes) = 224 bytes
rbp,rbx,rdi,rsi,r12-r15xmm6-xmm15
ARM64 (AAPCS64): 20 registers (160 bytes)
x19-x28,x29(FP),x30(LR)d8-d15(SIMD/FP callee-saved)
ARM32 (AAPCS): 104 bytes
d8-d15(VFP, 64 bytes),r3-r11,lr(40 bytes)
PPC64 (ELFv2): 320 bytes
- LR, CR,
r14-r31(18 GPRs),f14-f31(18 FPRs)
PPC32: 240 bytes
- LR, CR,
r14-r31(18 GPRs × 4 bytes),f14-f31(18 FPRs × 8 bytes)
RISC-V (RV64): 208 bytes
ra,cfp(for debugger),s0-s11(12 GPRs),fs0-fs11(12 FPRs)
The choice to save only callee-saved registers (rather than the full register file) is the primary reason fiber switches are fast. A kernel context switch must save and restore all registers plus segment registers, FPU state, and signal masks. A fiber switch saves 6 registers on x86-64 SysV (48 bytes), stores one pointer, loads one pointer, restores 6 registers, and returns.
4.2 The fiber_switch Assembly Routine
fiber_switch is the core of the context switch. It is implemented
as a Lisp assembly routine (via define-assembly-routine) rather than
a hand-written .S file, which means it is assembled by SBCL’s
assembler and stored in the code heap alongside compiled Lisp
functions.
The routine takes three arguments in ABI registers:
- old-slot: raw address of the word where the current RSP should
be saved (either a fiber’s
saved-rspslot or a scheduler’ssaved-rspslot). - new-slot: raw address of the word containing the target RSP to load.
- thread-ptr: if non-zero, patch the thread register (
r13on x86-64,x21on ARM64) after restoring registers. This handles carrier migration.
The x86-64 implementation:
push rbp ; save callee-saved registers
push rbx
push r12
push r13
push r14
push r15
mov [old-slot], rsp ; save current RSP
mov rsp, [new-slot] ; load target RSP
pop r15 ; restore callee-saved registers
pop r14
pop r13
pop r12
pop rbx
pop rbp
test thread-ptr, thread-ptr
jz done
mov r13, thread-ptr ; patch thread register
done:
ret ; return to caller on new stack
The ret is the critical instruction. When switching from a running
fiber to the scheduler, ret pops the return address that was pushed
when the scheduler originally called fiber_switch, returning
execution to the scheduler loop. When switching from the scheduler
to a suspended fiber, ret pops the return address from the fiber’s
stack, returning to the point just after the fiber’s fiber_switch
call inside fiber-yield.
4.3 Stack Frame Initialization for New Fibers
A brand-new fiber has never been switched to, so its stack has no
saved registers. initialize-fiber-stack constructs a synthetic
stack frame that makes fiber_switch’s register restore sequence
“return” into the entry trampoline.
On x86-64 SysV, the initial stack layout (from top, growing down):
stack_top - 8: 0 (alignment padding)
stack_top - 16: 0 (spare slot)
stack_top - 24: trampoline_addr (return address for ret)
stack_top - 32: fiber_lispobj (popped into rbp)
stack_top - 40: 0 (popped into rbx)
stack_top - 48: 0 (popped into r12)
stack_top - 56: 0 (popped into r13)
stack_top - 64: 0 (popped into r14)
stack_top - 72: 0 (popped into r15)
^--- initial saved RSP
When fiber_switch loads this RSP and pops registers, rbp receives
the fiber’s lispobj (a tagged Lisp pointer to the fiber struct). The
ret instruction then jumps to fiber_entry_trampoline.
4.4 The Entry Trampoline (Assembly to C to Lisp)
fiber_entry_trampoline is the landing pad for first-time fiber
execution. It receives the fiber lispobj in rbp (placed there by
the synthetic stack frame) and:
- Moves the fiber lispobj into the first C argument register (
rdion SysV,rcxon Win64). - Clears
rbp(no valid frame pointer for a new fiber). - Calls
fiber_run_and_finish(a C function). - Terminates with
hlt(unreachable; the C function does not return).
fiber_run_and_finish in turn calls a Lisp trampoline function
(fiber-trampoline) registered at scheduler startup. This Lisp
function runs the fiber’s user function with handler-case error
handling, captures the result, marks the fiber dead, cleans up
bindings, and switches back to the scheduler. The chain is:
fiber_switch → fiber_entry_trampoline (asm)
→ fiber_run_and_finish (C)
→ fiber-trampoline (Lisp)
→ user function
The Lisp trampoline never returns; it ends with a fiber_switch back
to the scheduler. The hlt after the C call is a safety net.
4.5 Zero-Allocation Design (No SAPs, No GC Pressure)
The %fiber-switch VOP passes arguments as raw machine words, not
Lisp objects. The saved-rsp slot addresses are computed from the
fiber struct’s tagged pointer using arithmetic (adding the slot offset
and subtracting the lowtag), producing unsigned integers that never
become heap-allocated SAPs.
This matters because SAP allocation can trigger GC, which would move the fiber struct (invalidating the address just computed) and require stopping the world at an inconvenient point. By keeping everything as raw words, the switch path creates zero GC pressure.
The TLS scratch hash table and overlay arrays are pre-allocated in the scheduler and fiber structs, not on the switch path. The wake condition closures are allocated by the caller before yielding.
4.6 Thread Register Patching on Carrier Migration
SBCL reserves a machine register as a pointer to the current thread
struct: r13 on x86-64 (when gs-segment TLS is not used), x21
on ARM64. This register is set once when a thread starts and never
changes during normal execution.
When a fiber migrates between carriers via work stealing, it was suspended on carrier A with A’s thread pointer in its saved registers. If it resumes on carrier B, the restored thread register would point to A’s thread struct — the wrong thread.
The resume-fiber-internal function handles this by passing the
current carrier’s thread SAP as the third argument to fiber_switch.
The assembly routine checks if this is non-zero and, if so, overwrites
the just-restored thread register with the correct value. This
happens after all other registers are restored but before the ret,
so the fiber resumes with a valid thread pointer on its new carrier.
For same-carrier resumes (the common case), the third argument is still passed (the current thread pointer), but the patching is harmless — it sets the register to its already-correct value.
5. Stack Management
5.1 Control Stack Layout and Guard Pages
Each fiber’s control stack is a contiguous region obtained via
os_allocate (mmap on Unix, VirtualAlloc on Windows). The
layout, from low to high address:
+-------------------+ ← base (mmap'd address)
| Guard page (4 KB) | PROT_NONE — segfault on access
+-------------------+ ← control-stack-start (usable region)
| |
| Usable stack | 256 KB default (grows downward)
| |
+-------------------+ ← control-stack-end (stack top)
The guard page at the bottom catches stack overflow. When a fiber’s
stack pointer descends into the guard page, the hardware generates a
segfault. SBCL’s signal handler calls check_fiber_guard_page,
which walks both the active fiber contexts and the suspended fiber
GC info list to match the fault address against fiber stack ranges.
If matched, it calls lose() with a diagnostic message.
The usable stack region is what the fiber’s code sees. RSP starts
at control-stack-end (the top) and grows downward toward the guard
page.
5.2 Binding Stack (Separate Allocation)
Each fiber has a separate binding stack for dynamic variable bindings.
This is a flat array of two-word entries (old value, TLS index),
allocated via alloc_fiber_binding_stack. The default size is 16 KB,
accommodating 1024 nested bindings.
The binding stack grows upward (binding-stack-pointer increases with
each bind). A guard page (PROT_NONE) is placed above the usable
region, at the top of the allocation. If a fiber exhausts its
binding stack, the next bind writes into the guard page and triggers
a segfault. check_fiber_guard_page matches the fault address
against the binding stack guard regions in all_fiber_gc_info and
reports a diagnostic message via lose().
The 16 KB default is generous for nearly all workloads; programs
that use deeply nested dynamic bindings can increase it via the
:binding-stack-size keyword to make-fiber.
The binding stack is a separate allocation because its access pattern differs from the control stack. The control stack is accessed randomly (function calls, local variables); the binding stack is accessed linearly (push on bind, pop on unbind). Keeping them separate also simplifies GC scanning: the binding stack contains only TLS indices and Lisp object references, scavenged precisely, while the control stack is scanned conservatively.
5.3 Stack Pooling (madvise(MADV_DONTNEED) Recycling)
Creating a fiber requires two mmap calls (control stack + binding
stack) and destroying one requires two munmap calls. At high fiber
creation rates (e.g., one per HTTP request), these syscalls become a
bottleneck.
The stack pool eliminates this cost. Two global pools (one for
control stacks, one for binding stacks) cache up to 4096 stacks each.
When a fiber dies and its stack is returned, the pool first calls
madvise(MADV_DONTNEED) on the usable region, which tells the kernel
to release the physical pages while keeping the virtual mapping. The
next access to those pages returns zero-filled memory from the
kernel’s page fault handler — effectively a fresh stack without an
mmap/munmap round trip.
Pool lookup is O(1): each pool is a singly-linked free list protected by a mutex. Stacks are pooled only when their size matches the pool’s cached size (typically the default 256 KB or 16 KB); mismatched sizes bypass the pool and are deallocated immediately. This avoids complexity while handling the common case efficiently.
The guard page’s PROT_NONE protection is unaffected by madvise,
so recycled control stacks retain their overflow protection without
needing to re-set page permissions.
5.4 Stack Overflow Detection in Signal Handlers
SBCL’s signal handler for SIGSEGV (or SIGBUS on some platforms)
calls check_fiber_guard_page to determine whether the fault address
is in a fiber’s guard page. This function walks two data structures:
all_active_fiber_contexts— fibers currently executing on carriers. Theirfiber_stack_startmarks the top of the guard page.all_fiber_gc_info— suspended fibers. Theircontrol_stack_basemarks the top of the guard page.
If the fault address falls within one guard page width below a stack’s
start, the fiber has overflowed. The handler calls lose() with a
diagnostic message including the fault address and stack boundaries.
5.5 Stack Size Tradeoffs (No Dynamic Growth)
The default control stack size of 256 KB is a compromise. Smaller stacks allow more concurrent fibers (10,000 fibers at 256 KB = 2.5 GB virtual, versus 80 GB for OS threads at 8 MB), but too-small stacks cause overflow in code that uses deep recursion or large stack-allocated arrays.
The binding stack default of 16 KB is conservative for most
workloads. Each let over a special variable or progv form pushes
one 16-byte entry, so 16 KB supports 1024 nesting levels. Server
request handlers typically use a handful of special variables.
Both sizes are configurable per-fiber via make-fiber keywords.
Dynamic stack growth (as seen in Erlang/BEAM processes) is not
supported. Fiber stacks are fixed-size mmap allocations; growing
them would require either relocating the stack (invalidating all
interior pointers, return addresses, and GC-conservative references)
or using discontiguous segments (requiring compiler support for
segment-crossing checks on every function call). Neither approach
is feasible within SBCL’s native-code compilation model. The
practical mitigation is to choose an appropriate stack size at
make-fiber time and use heap allocation for large data structures.
6. Dynamic Variable Bindings (TLS)
6.1 The Problem: unbind_to_here Zeroes Entries
SBCL implements special variable bindings as a thread-local storage
(TLS) array indexed by per-symbol TLS indices. When code binds a
special variable, the runtime pushes a two-word entry (old value at
offset 0, TLS index at offset +word) onto
the binding stack and writes the new value into the thread’s TLS
array. When the binding is unwound, unbind_to_here pops entries,
restoring old values and zeroing the binding stack entries.
This create a problem for fibers. When a fiber yields, it must save
its TLS state and restore the carrier thread’s TLS state. But the
binding stack entries are the mechanism for tracking what needs to be
restored — and unbind_to_here destroys them. If we unwound the
fiber’s bindings on yield (calling unbind_to_here), the binding
stack would be zeroed out, and we would have no record of which TLS
indices were bound or what the fiber’s values were.
The fiber implementation therefore never calls unbind_to_here during
yield or resume. Instead, it uses a TLS overlay approach.
6.2 TLS Overlay Arrays (Save/Restore Without Unbinding)
Each fiber has two parallel arrays: tls-indices and tls-values.
On yield, save-fiber-tls-and-restore-carrier performs a two-pass
walk of the binding stack:
Pass 1: Count unique TLS indices using a scratch hash table (pre-allocated in the scheduler struct to avoid consing).
Pass 2: For each unique TLS index, save the current TLS value
(the fiber’s value) into the fiber’s tls-values array, then restore
the carrier’s value by writing the outermost old_value from the
binding stack into the TLS array.
On resume, restore-fiber-tls-overlay replays the fiber’s saved
values back into the TLS array:
(dotimes (i (length indices))
(setf (sap-ref-word thread-sap (aref indices i))
(aref values i)))
This is a simple indexed store loop with no hash lookups, no binding
stack manipulation, and no calls into the runtime. The binding stack
itself is never modified — it retains the full history of binds
needed for unwind-protect and error handling.
6.3 Carrier Value Update on Migration
When a fiber migrates between carriers via work stealing, its binding stack’s old-value entries still contain values from the previous carrier’s TLS array. If the fiber later yields or dies, the restore path would write the old carrier’s values into the new carrier’s TLS — corrupting the new carrier’s state.
update-binding-stack-carrier-values fixes this. Before replaying
the TLS overlay, it walks the binding stack and overwrites each
outermost old-value with the current carrier’s TLS value for that
index. This ensures that when the fiber yields, the carrier’s TLS
is restored correctly regardless of which carrier the fiber was
originally on.
This operation is skipped on same-carrier resumes (the common case),
gated by a migration flag computed from
(not (eq (fiber-carrier fiber) *current-thread*)).
6.4 Catch Block and Unwind-Protect Chain Save/Restore
Beyond the binding stack, SBCL maintains per-thread pointers to the
current catch block (*current-catch-block*) and the current
unwind-protect block (*current-unwind-protect-block*). These are
singly-linked lists threaded through the control stack, used by
throw, handler-case, and unwind-protect.
On yield, the fiber saves these pointers from the thread struct into
fiber-saved-catch-block and fiber-saved-unwind-protect-block, then
restores the carrier’s values (saved when the fiber was mounted).
On resume, the carrier’s values are saved and the fiber’s are restored. For new fibers, these are zero (no catch or unwind-protect blocks established yet).
6.5 Empty Binding Stack Fast Path
Many fibers (especially benchmark and compute-intensive fibers) never bind any special variables. Their binding stack pointer equals their binding stack start.
The yield and resume paths check for this condition:
(when (/= (fiber-binding-stack-pointer fiber)
(fiber-binding-stack-start fiber))
...)
When the binding stack is empty, the entire TLS save/restore sequence is skipped: no hash table operations, no array allocation, no binding stack walk. This fast path reduces context switch cost for fibers that do not use dynamic bindings.
6.6 Same-Carrier Resume Optimization
update-binding-stack-carrier-values is expensive: it walks the
entire binding stack with a hash table for deduplication. But it is
only needed when a fiber has migrated between carriers, because the
old-value entries already match the current carrier’s TLS if the fiber
hasn’t moved.
The scheduler loop computes a migration flag:
(let ((migrated (not (eq (fiber-carrier fiber) *current-thread*))))
...)
This flag is propagated to resume-fiber-internal, which guards the
carrier value update:
(when migrated
(update-binding-stack-carrier-values fiber thread-sap scheduler))
In a single-carrier configuration or when work stealing is infrequent, this eliminates the binding stack walk entirely from the resume path.
7. Garbage Collector Integration
7.1 The Two-List Design: Suspended Fibers vs. Active Contexts
The GC needs to find all live Lisp objects, including those on fiber stacks. Two data structures serve this purpose:
all_fiber_gc_info— a doubly-linked list offiber_gc_infostructs, one per fiber that has been created and not yet destroyed. Each entry holds the fiber’s control stack base/pointer/end and binding stack start/pointer. Used for scanning suspended fibers' stacks.all_active_fiber_contexts— a doubly-linked list ofactive_fiber_contextstructs, one per carrier thread that is currently running a scheduler. Each entry holds the running fiber’s stack boundaries and a nestedcarrier_gc_infothat represents the carrier thread’s suspended stack.
During stop-the-world GC, all mutator threads are stopped. The collector iterates both lists without locking (all writers are stopped). For each suspended fiber, it conservatively scans the control stack and precisely scavenges the binding stack. For each active context, it scans the running fiber’s stack and the carrier’s suspended stack.
7.2 fiber_gc_info: Conservative Control Stack Scanning
Each fiber’s fiber_gc_info contains:
struct fiber_gc_info {
lispobj* control_stack_base; // mmap'd base
lispobj* control_stack_pointer; // saved RSP
lispobj* control_stack_end; // base + size
lispobj* binding_stack_start;
lispobj* binding_stack_pointer;
struct fiber_gc_info* next;
struct fiber_gc_info* prev;
int registered;
};
During GC, the collector scans words from control_stack_pointer to
control_stack_end:
for (ptr = fi->control_stack_pointer; ptr < fi->control_stack_end; ptr++) {
lispobj word = *ptr;
if (word >= BACKEND_PAGE_BYTES
&& !(exclude_from <= word && word < exclude_to))
preserve_pointer(word, 0);
}
The scan is conservative: every word that looks like a valid heap
pointer is treated as a potential reference, pinning the pointed-to
object. The self-reference exclusion (exclude_from/exclude_to)
prevents the stack’s own address range from being treated as heap
pointers.
7.3 active_fiber_context: Carrier + Fiber Stack Visibility
When a fiber is actively running on a carrier, RSP points into the
fiber’s control stack, not the carrier’s. The normal thread scanning
code in gencgc.c would miss the carrier’s suspended frames (above
the point where fiber_switch was called).
The active_fiber_context solves this with a nested carrier_gc_info
that records the carrier’s stack boundaries. The carrier’s
control_stack_pointer is set conservatively to
__builtin_frame_address(0) - 16384, ensuring that all frames
between the update_fiber_gc_context call and the actual suspension
point (inside fiber_switch) are included in the scan.
The GC checks for active fiber contexts in two places:
SP validation: If a thread’s interrupt-context SP does not fall within the carrier’s control stack, the GC checks whether it falls within the fiber’s stack via
find_active_fiber_context.Stack scanning: If a fiber is active, the GC scans the fiber’s stack (from SP to
fiber_stack_end), then separately scans the carrier’s stack (fromcarrier_start + 3*page_sizetocarrier_stack_end), skipping the guard pages.
7.4 Precise Binding Stack Scavenging
Unlike control stacks, binding stacks are scavenged precisely. Each
two-word entry has a known format (old value + TLS index), so the
collector can process them with scav_binding_stack rather than
conservative word-by-word scanning.
For the active fiber, BSP points into the fiber’s binding stack:
struct active_fiber_context* fctx = find_active_fiber_context(th);
if (fctx && fctx->fiber_binding_stack_start
&& !(bsp >= bs_start && bsp <= bs_start + BINDING_STACK_SIZE)) {
scav_binding_stack(fctx->fiber_binding_stack_start, bsp, ...);
}
For suspended fibers, the collector iterates all_fiber_gc_info:
for (fi = all_fiber_gc_info; fi; fi = fi->next) {
scav_binding_stack(fi->binding_stack_start,
fi->binding_stack_pointer, ...);
}
7.5 Persistent Carrier Context (Lock-Free Hot Path)
The original implementation called enter_fiber_gc_context and
leave_fiber_gc_context on every resume/yield, acquiring four mutex
locks per context switch (two for fiber_gc_lock, two for
active_fiber_context_lock). Under load, these global locks became a
bottleneck.
The optimized implementation uses persistent carrier contexts:
init_carrier_fiber_context()— called once when a carrier starts its scheduler. Allocates theactive_fiber_context, registers thecarrier_gc_infowith empty ranges (pointer = end, so GC scan loops execute zero iterations), links intoall_active_fiber_contexts, and stores the pointer in the thread struct’sfiber-contextslot. Two mutex acquisitions total.update_fiber_gc_context()— called per resume. Reads the context fromthread->fiber_context(no TLS lookup). Updates fiber stack boundaries and carrier stack scan range. Zero mutex locks.clear_fiber_gc_context()— called per yield-return. Sets fiber fields to NULL and carrier ranges to empty. Zero mutex locks.destroy_carrier_fiber_context()— called once when a carrier exits. Unregisters from both lists. Two mutex acquisitions total.
This reduces per-switch mutex overhead from 4 to 0.
7.6 The fiber-context Thread Struct Slot (O(1) Lookup)
find_active_fiber_context is called by the GC for every thread
during stop-the-world. The original implementation walked the
all_active_fiber_contexts linked list to find the entry matching a
given thread — O(n) in the number of carriers.
The optimized implementation stores the active_fiber_context pointer
directly in the thread struct:
struct active_fiber_context*
find_active_fiber_context(struct thread* th) {
return (struct active_fiber_context*)th->fiber_context;
}
This required adding a fiber-context slot to the thread struct
definition in objdef.lisp:
#+sb-fiber
(fiber-context :c-type "void *" :pointer t)
The slot is set by init_carrier_fiber_context and cleared by
destroy_carrier_fiber_context. For non-carrier threads, it is NULL,
and find_active_fiber_context returns NULL (no fiber context to
process).
7.7 GC Safety Windows (without-interrupts vs. without-gcing)
Two SBCL mechanisms control GC timing:
without-interrupts— defers delivery of asynchronous signals (including the stop-for-GC signal) but does not prevent GC from occurring if triggered by allocation within the body.without-gcing— prevents GC entirely (sets a flag that causes allocation to block until the region exits).
The yield and resume paths use without-interrupts for most of their
work. This prevents the stop-for-GC signal from arriving mid-update,
which could expose partially-written fiber state to the collector.
Specific without-gcing regions protect narrow critical sections
where the GC must not fire:
register_fiber_for_gc/unregister_fiber_for_gc— these modify theall_fiber_gc_infolinked list. If GC fired during a list splice, the collector could follow a dangling pointer.Binding stack pointer update in
gc_info— on yield, the fiber’sgc_infobinding stack pointer must be updated before the carrier’s BSP is restored. Between these two operations,without-interruptsalone does not prevent GC (an allocation could trigger it), so thegc_infowrite is wrapped inwithout-gcing.
7.8 Correctness Argument: Why Partial Updates Are Safe
The persistent carrier context design appears to have a race: the
update_fiber_gc_context call writes multiple fields (fiber stack
start, end, binding stack start, carrier stack pointer) without
holding any lock. Could GC see a half-updated context?
No, because of SBCL’s stop-the-world GC protocol. Before the
collector reads any of these fields, it sends a stop signal to every
mutator thread and waits for all to reach a safepoint. Once stopped,
no thread is executing update_fiber_gc_context or
clear_fiber_gc_context. The fields are therefore always in a
consistent state when the GC reads them.
The only edge case is the empty state: clear_fiber_gc_context sets
fiber fields to NULL and carrier ranges to empty. If GC reads this
state, it sees no fiber stack to scan and an empty carrier range (no
additional scanning). The carrier’s normal thread stack is still
scanned via the standard thread scanning path, so no objects are
missed.
8. Scheduler Design
8.1 The Scheduler Loop
The scheduler loop (run-fiber-scheduler) is the core of each
carrier thread. Its structure:
init-carrier-fiber-context()
loop {
fiber = fast-path-pop() or (maintenance + pop-or-steal)
if fiber:
run-fiber(fiber)
handle-post-switch(fiber)
else if has-waiters:
idle-hook()
else if group-active:
park-carrier()
else:
return
}
destroy-carrier-fiber-context()
The loop runs until either all fibers in the group are dead (the active count reaches zero) or, in single-carrier mode, the local deque is empty and no fibers are waiting.
8.2 Fast Path: Skip Maintenance When Deque Is Hot
The common case in a busy scheduler is a non-empty deque. The fast
path tries wsd-pop at the top of each iteration. If it succeeds,
the fiber is run immediately without any maintenance work. This
avoids the overhead of epoll queries, deadline heap checks, and
waiting list walks when there is plenty of local work.
A maintenance counter tracks how many fibers have been run without a maintenance pass. When it reaches 64, a full maintenance cycle is forced regardless of deque state. This backstop ensures that I/O waiters and deadline-based fibers are not starved when the deque is continuously fed by productive fibers.
8.3 Maintenance: Pending Queue Drain, Deadline Expiry, Wake Checks
When the deque is empty (or the backstop fires), the scheduler performs a full maintenance cycle:
Pending queue drain — External threads (e.g., a server’s accept loop) submit fibers to a scheduler’s pending list via
atomic-push. Maintenance atomically swaps the list to NIL (via CAS) and pushes each fiber onto the local deque.Epoll harvest (Linux) —
epoll_waitwith timeout=0 drains all pending I/O events into the scheduler’sready-fdshash table.%scheduler-wake-ready-io-waitersthen moves ready fibers from theio-waiterstable to the deque.Deadline heap expiry —
%heap-pop-expiredpops all fibers whose deadline has passed (comparing against a singleget-internal-real-timecaptured once per maintenance cycle) and pushes them to the deque.Waiting list walk — Each fiber on the intrusive waiting list is checked: if its deadline has passed or its wake condition returns true, it is moved to the deque. Fibers that remain waiting are re-linked into a new list.
Note that fiber-aware mutex acquisition (Section 2.8.1) and condition
variable waits (Section 2.8.2) currently use predicate-based parking,
which places the waiting fiber on the generic waiting list. This
means mutex and condition variable waiters are checked via the O(W)
waiting list walk rather than a targeted wakeup mechanism. This is
adequate when mutex contention among fibers is low (the typical case
for I/O-heavy server workloads), but a per-mutex fiber wait queue
with explicit wakeup from release-mutex would provide O(1) wakeup
and is a planned improvement.
8.4 Post-Switch Dispatch (Suspended, Dead)
After fiber_switch returns (the fiber yielded or died), the
scheduler examines the fiber’s state:
:suspended — The fiber yielded. The scheduler routes it
based on its wait metadata:
- No wake condition, no deadline, no fd → immediately runnable; push back to deque.
- Has fd + epoll → index in the
io-waiterstable (and optionally the deadline heap if it also has a timeout). - Has deadline only (e.g.,
fiber-sleep) → insert into deadline heap. - Has deadline + wake condition → insert into both the heap and the generic waiting list.
- Has wake condition only → prepend to the generic waiting list.
:dead — The fiber’s function returned or signaled. The
scheduler decrements the group’s active count (CAS loop) and calls
destroy-fiber to free stacks and GC info.
8.5 Idle Detection and Carrier Parking
When the deque is empty, no maintenance produced work, and work stealing failed, the scheduler must wait efficiently.
If waiters exist (suspended fibers with conditions or deadlines),
the idle hook is called. The default idle hook
(fiber-io-idle-hook) calls epoll_wait with a timeout derived from
the nearest deadline, blocking the carrier thread until either an I/O
event arrives or the timeout expires.
If no waiters exist but the group’s active count is positive (some
fibers are running on other carriers or pending in their queues), the
carrier parks on the group’s condition variable with a 1 ms
timeout. submit-fiber calls condition-notify on this CV when new
work arrives, waking a parked carrier.
If the active count is zero, the carrier exits the scheduler loop.
8.6 Maintenance Frequency Backstop (Every 64 Fibers)
The backstop value of 64 is a tuning parameter. Too small, and maintenance overhead slows down tight fiber loops. Too large, and I/O waiters or deadline timers see increased latency. 64 is a compromise: at 2 million switches/sec, maintenance runs every ~32 microseconds, providing sub-millisecond responsiveness for I/O and timers.
9. Work Stealing
9.1 Chase-Lev Lock-Free Deque
Each scheduler’s run queue is a Chase-Lev work-stealing deque, a data structure designed for exactly this use case: one owner thread with fast, non-contended access, and multiple thief threads that can steal work without blocking the owner.
The deque has three fields:
bottom— index of the next slot the owner will push to. Only the owner writes this.top— index of the next slot a thief will steal from. Read by everyone; written by thieves via CAS.buffer— a power-of-two circular array of fiber references.
9.2 Owner Operations (Push/Pop from Bottom, LIFO)
The owner pushes to and pops from the bottom:
Push: Write the fiber to buffer[bottom & mask], issue a write
barrier, increment bottom. No atomic operations needed because only
the owner writes bottom.
Pop: Decrement bottom, issue a memory barrier, read top. If
bottom > top, there are multiple elements and the pop is safe. If
bottom == top, this is the last element and the owner must CAS
top to top+1 to resolve the race with a concurrent steal. If
bottom < top, the deque was empty; restore bottom to top.
LIFO order (pop from bottom) gives the owner good cache locality: the most recently pushed fiber is likely still warm in cache.
9.3 Thief Operations (Steal from Top, FIFO, CAS)
Thieves steal from the top:
Steal: Read top, issue a read barrier, read bottom. If
top < bottom, there are elements to steal. Read
buffer[top & mask], then CAS top from the read value to
top+1. If the CAS succeeds, the steal is complete. If it fails
(another thief got there first, or the owner popped the last
element), return nil.
FIFO order (steal from top) gives breadth-first work distribution, which tends to move larger chunks of work to idle carriers.
9.4 Buffer Growth (Power-of-Two Circular Array)
When bottom - top >= length(buffer), the deque is full. The owner
doubles the buffer by allocating a new array and copying elements.
The circular indexing (index & (size - 1)) works correctly for any
power-of-two size, so the copy is straightforward:
(loop for i from top below bottom
do (setf (svref new-buf (logand i new-mask))
(svref old-buf (logand i old-mask))))
Growth is safe because only the owner thread calls wsd-push (and
therefore wsd-grow). Thieves operate exclusively on top via
CAS, and they index into the buffer using the current buffer
reference and its length. The owner publishes the new buffer via a
single setf of the buffer slot; any thief that reads the old
buffer reference before the swap sees a consistent (if stale) view
and its CAS on top will fail or succeed harmlessly. The old
buffer becomes garbage and is collected by GC. A write barrier
after each push ensures the new element is visible before bottom
is incremented.
Growth is rare in practice because the initial size (64) accommodates most workloads.
9.5 Random Victim Selection
When a carrier’s deque is empty and maintenance produced no work, the
scheduler attempts to steal. try-steal-fiber selects victims
randomly:
(let ((start (random n)))
(loop for i from 0 below n
for idx = (mod (+ start i) n)
for victim = (aref schedulers idx)
unless (eq victim scheduler)
do (let ((stolen (wsd-steal (fiber-scheduler-run-deque victim))))
(when stolen (return stolen)))))
Starting from a random index and wrapping around avoids the thundering-herd problem where all idle carriers try to steal from the same victim. The loop tries each sibling once before giving up.
9.6 Fiber Migration and Thread Register Fixup
When a fiber is stolen from carrier A and resumed on carrier B, two things must be fixed:
Thread register — The saved register set contains carrier A’s thread pointer. The
fiber_switchassembly patches this (see Section 4.6).Binding stack carrier values — The old-value entries in the binding stack reflect carrier A’s TLS values.
update-binding-stack-carrier-valuesrewrites them with carrier B’s values (see Section 6.3).
These fixups happen during resume-fiber-internal, gated by the
migration flag. After fixup, the fiber runs on carrier B as if it
had always been there.
10. I/O Multiplexing
10.1 Platform Abstraction (epoll, kqueue, poll Fallback)
The fiber scheduler uses platform-specific I/O multiplexing:
- Linux:
epoll(created viaepoll_create1at scheduler init) - BSD/macOS:
kqueue(created viakqueue()at scheduler init) - Fallback: batched
poll()call over all waiting fds
The event multiplexer fd is stored in fiber-scheduler-event-fd.
If creation fails (returns -1), the scheduler falls back to the poll
path.
10.2 Edge-Triggered Mode with One-Shot (EPOLLET | EPOLLONESHOT)
On Linux, fd registrations use edge-triggered mode combined with one-shot:
(let ((events (logior (if (eq direction :input) EPOLLIN EPOLLOUT)
EPOLLET EPOLLONESHOT)))
(epoll-ctl-add efd fd events))
Edge-triggered (EPOLLET) means the kernel delivers an event only
when the fd transitions from not-ready to ready, not continuously
while data is available. This eliminates the thundering-herd problem:
if multiple fibers wait on related fds (e.g., listening socket and
accepted connections), only the relevant fibers are woken.
One-shot (EPOLLONESHOT) disables the registration after
delivering one event. This prevents duplicate wakeups and simplifies
the lifecycle: each wait-until-fd-usable call registers, gets one
event, and the registration is consumed. A subsequent wait must
re-register (via epoll-ctl-mod to re-arm).
10.3 Indexed I/O Waiters (fd-to-Fiber Table)
When epoll delivers events, the scheduler needs to find which fibers
are waiting on which fds. The scheduler maintains an io-waiters
hash table mapping fd numbers to lists of waiting fibers.
When a fiber parks with an fd wait, %scheduler-index-io-waiter adds
it to this table. When epoll events arrive,
%scheduler-wake-ready-io-waiters looks up each ready fd and moves
all waiting fibers for that fd to the run deque.
This O(1) lookup replaces the alternative of walking the entire waiting list checking each fiber’s fd, which would be O(W) where W is the number of waiting fibers.
10.4 Bounded Epoll Drain Loop
%scheduler-harvest-epoll calls epoll_wait in a loop to drain all
pending events:
(loop for ms = timeout-ms then 0
do (let ((n (epoll-wait efd sap 64 ms)))
(when (or (null n) (<= n 0)) (return))
(dotimes (i n) (setf (gethash (aref buf i) ready-fds) t))
(incf total n)
(when (< n 64) (return))))
The first call uses the provided timeout (allowing the carrier to
block when idle). Subsequent calls use timeout=0 to drain any
remaining events without blocking. The loop terminates when
epoll_wait returns fewer than 64 events (the buffer size),
indicating no more pending events.
With EPOLLET | EPOLLONESHOT, each fd fires at most once, so the
drain naturally terminates.
10.5 fd-ready-p and wait-until-fd-usable Integration
fd-ready-p is a non-blocking readiness check using poll() with a
zero timeout. It is used for quick checks before parking and for
timeout=0 calls to wait-until-fd-usable.
%fiber-wait-until-fd-usable implements the full fiber-aware path:
- If timeout is zero or negative, return the result of
fd-ready-pimmediately (with a fast-path check against the scheduler’sready-fdshash table on Linux). - Set the fiber’s
wait-fdandwait-directionfields. - Register the fd with epoll/kqueue.
- If epoll is available: yield with the
%always-falsesentinel (the scheduler wakes the fiber via epoll events, not predicate polling). If a deadline is set, also store it for the deadline heap. - If no epoll:
fiber-parkwith a predicate that callsfd-ready-p. - On wake, clear the
wait-fdfield in anunwind-protect.
10.6 The Default I/O Idle Hook
fiber-io-idle-hook is called when the scheduler has no runnable
fibers but has suspended fibers waiting. It computes a timeout from
the nearest deadline in the heap (or 100 ms as a cap), then:
- On Linux with I/O waiters: calls
%scheduler-harvest-epollwith that timeout, blocking until an event arrives or the timeout expires. - Without I/O waiters: calls
nanosleepfor a brief period (capped at 10 ms) to avoid busy-waiting on timer-only waits.
10.7 Batched FD Polling vs. Per-Fiber Polling
On platforms without epoll (or when epoll creation fails), the
fallback uses %batched-fd-poll, which collects all waiting fibers'
fds into a single poll() call. This is O(W) where W is the number
of I/O waiters, but still far better than calling poll() once per
fiber.
11. Deadline Scheduling
11.1 Binary Min-Heap with Inline Index
The scheduler maintains a binary min-heap ordered by
fiber-deadline (an internal-real-time value). The heap is stored
as a simple-vector in the scheduler struct, with the count tracked
separately.
Each fiber has a heap-index slot (-1 when not in the heap) that
enables O(log N) removal: instead of searching the heap for the
fiber, the scheduler reads the index and sifts from that position.
11.2 O(log N) Insert and Remove
Insert (%heap-insert): Append the fiber at position n
(the current count), increment the count, sift up. If the heap
array is full, double it.
Remove (%heap-remove): Swap the fiber at position i with the
last element, decrement the count, then sift the swapped element up
or down as needed to restore the heap property.
Sift up: Compare the fiber’s deadline with its parent’s; swap if smaller; repeat until the root or a larger-or-equal parent is found.
Sift down: Compare the fiber’s deadline with both children’s; swap with the smaller child if needed; repeat until a leaf or no swap is needed.
11.3 Batch Expiry (Pop All Expired in One Pass)
%heap-pop-expired pops all fibers whose deadline has passed in a
single loop:
(loop while (plusp count)
for fiber = (svref heap 0)
while (<= (fiber-deadline fiber) now)
do (%heap-remove scheduler fiber)
(setf (fiber-state fiber) :runnable)
(wsd-push deque fiber))
Since the heap is ordered by deadline, the root always has the earliest deadline. Once the root’s deadline is in the future, no other fiber can be expired, so the loop terminates.
11.4 Interaction with I/O Waiters (Dual-Indexed Fibers)
A fiber waiting on both I/O and a timeout (e.g.,
wait-until-fd-usable with a timeout) is indexed in both the
io-waiters table and the deadline heap. Whichever triggers first
wakes the fiber:
- If epoll fires first,
%scheduler-wake-ready-io-waitersalso removes the fiber from the deadline heap (via%heap-remove). - If the deadline expires first,
%heap-pop-expiredremoves the fiber from theio-waiterstable (via%scheduler-remove-io-waiter).
This dual-indexing avoids the need for a wrapper closure that checks both conditions, keeping the wake path allocation-free.
12. Fiber Death and Cleanup
12.1 The Lisp Trampoline (fiber-trampoline)
When a fiber starts for the first time, the entry path is:
fiber_switch → fiber_entry_trampoline (asm)
→ fiber_run_and_finish (C)
→ fiber-trampoline (Lisp)
fiber-trampoline wraps the user function in unwind-protect and
handler-case:
(unwind-protect
(handler-case
(setf (fiber-result fiber)
(multiple-value-list (funcall (fiber-function fiber))))
(sb-sys:interactive-interrupt (c)
(error c)) ; propagate Ctrl-C to carrier
(serious-condition (c)
(setf (fiber-result fiber) (list c))))
;; cleanup forms ...
)
If the user function returns normally, all of its return values are
captured as a list in the fiber’s result slot (matching the
thread-result convention). If it signals an unhandled error, the
condition object is stored as a single-element list. Either way,
execution reaches the unwind-protect cleanup.
12.2 Error Handling and Result Capture
The handler-case around the user function catches all conditions of
type serious-condition (which includes error, storage-condition,
and sb-ext:timeout). This prevents an unhandled condition in a
fiber from killing the carrier thread (which would take down the
scheduler and all other fibers on that carrier).
The one exception is sb-sys:interactive-interrupt (Ctrl-C / SIGINT),
which is re-signaled so it propagates to the carrier thread. This
preserves the user’s ability to break into the debugger interactively.
Because the handler-case catches conditions without invoking the
debugger, errors in fibers are silently captured rather than
triggering an interactive break. If interactive debugging is desired,
the user should establish a handler-bind with invoke-debugger
inside the fiber’s function. Restarts like abort work within the
fiber’s own catch/block scope.
The captured condition or return values are stored in fiber-result
as a list and can be retrieved by fiber-join (which returns them via
values-list, analogous to join-thread) or directly after the fiber
is dead. When an error is caught, the fiber’s errorp flag is set to
T. The predicate fiber-error-p returns this flag, so callers can
distinguish a fiber that intentionally returned a condition object
from one that signaled an error.
Non-local exits via throw, return-from, or go that target
tags or blocks within the fiber’s own function work normally — the
catch block and unwind-protect chains are saved and restored per
fiber (see Section 6.4). A throw to a tag not established within
the fiber will signal a “tag does not exist” error (caught by the
handler-case), because the fiber’s catch block chain only contains
tags established during the fiber’s own execution. Lexical non-local
exits (return-from, go) cannot cross fiber boundaries because
blocks and tagbodies are lexically scoped; closures that capture them
from outside the fiber’s function would target stale frames, which
is undefined behavior (the same as for threads).
12.3 Binding Stack Cleanup on Death (Without unbind_to_here)
When a fiber dies with active bindings, its binding stack still contains entries that reference TLS indices with the fiber’s values. The carrier’s TLS must be restored to its pre-fiber state.
The cleanup in fiber-trampoline walks the binding stack from start
to the current BSP, using the same deduplication approach as the yield
path: for each unique TLS index, the outermost old-value is written
back to the carrier’s TLS array. This effectively undoes the fiber’s
bindings without calling unbind_to_here (which would zero the
entries and is designed for the unwind path, not the fiber death
path).
12.4 GC Info Unregistration Timing
A fiber’s gc_info is registered at creation time and must remain
registered until after the last fiber_switch returns. The timing
is critical:
During
resume-fiber-internal, the fiber’sgc_infois widened to cover the full stack (control_stack_pointer = control_stack_start). This ensures GC can see the entire fiber stack during the window between the widen andfiber_switch.After
fiber_switchreturns (the fiber yielded or died), the scheduler updates or unregisters thegc_info. For dead fibers,unregister-fiber-gc-rootsremoves it from the list. For live fibers,register-fiber-for-gc-from-fiberupdates the fields with the new saved RSP.
Unregistering before destroy-fiber frees the gc_info struct is
essential to prevent the GC from following a dangling pointer.
12.5 Stack Return to Pool
After a fiber dies, destroy-fiber returns its stacks to the pools:
- Remove the fiber from
*all-fibers*(under mutex). - Free the
gc_infostruct. - Call
free-fiber-stackfor both the control stack and binding stack SAPs.free_fiber_stackroutes each to the appropriate pool based on theguard_sizefield (non-zero for control stacks, zero for binding stacks).
The pool calls madvise(MADV_DONTNEED) on the usable region,
releasing physical pages. The virtual mapping and guard page
protection are preserved for the next user.
13. Integration with SBCL
13.1 Thread Struct Extension (fiber-context Slot)
The SBCL thread struct (defined in objdef.lisp) is extended with
three fiber-related slots when :sb-fiber is in the feature set:
#+sb-fiber (effective-control-stack-start ...)
#+sb-fiber (effective-control-stack-end ...)
#+sb-fiber (fiber-context ...)
effective-control-stack-start and effective-control-stack-end are
set to the running fiber’s stack bounds so that stack overflow checks
and stack-allocated-p tests use the correct range without
disturbing the GC’s use of control-stack-start/control-stack-end.
fiber-context stores a pointer to the carrier’s
active_fiber_context for O(1) GC lookup (see Section 7.6).
13.2 serve-event Dispatch (Fiber-Aware wait-until-fd-usable)
sb-sys:wait-until-fd-usable is defined in serve-event.lisp. When
fibers are enabled, it checks *current-fiber* at the top of the
function. A single special variable check suffices because
*current-fiber* is only non-nil inside the scheduler loop where
*current-scheduler* is already bound — the invariant
*current-fiber* → *current-scheduler* holds by construction:
#+sb-fiber
(when *current-fiber*
(let ((result (%fiber-wait-until-fd-usable fd direction timeout)))
(unless (eq result :pinned-fall-through)
(return-from wait-until-fd-usable result))))
If the fiber is pinned, the result is :pinned-fall-through, and
execution continues to the normal (blocking) path. Otherwise, the
fiber-aware path handles the wait cooperatively.
13.3 sleep and wait-for Dispatch
cl:sleep and sb-ext:wait-for are patched similarly. In fiber
context, they dispatch to %fiber-sleep and %fiber-wait-for
respectively. The fiber-aware implementations use fiber-sleep and
fiber-park under the hood, converting to fiber-compatible
suspension.
13.4 Mutex and Condition Variable Dispatch
sb-thread:grab-mutex checks for fiber context before entering the
futex-based wait path:
#+sb-fiber
(when *current-fiber*
(let ((result (%fiber-grab-mutex mutex timeout)))
(unless (eq result :pinned-fall-through)
(return-from grab-mutex result))))
The same pattern applies to grab-mutex-no-check-deadlock and
%condition-wait. condition-notify increments a per-waitqueue
fiber-generation counter that fiber-aware waiters use as their wake
condition.
13.5 Pinned Blocking Fallback to OS Primitives
Every fiber-aware dispatch follows the same pattern:
- Check
fiber-can-yield-p. If the fiber is pinned, callcheck-pinned-blockingand return:pinned-fall-through. - The caller sees
:pinned-fall-throughand falls through to the normal OS blocking implementation.
This ensures that pinned fibers (e.g., mid-interaction with a thread-affine C library) still work correctly — they simply block the carrier thread instead of yielding.
13.6 *pinned-blocking-action* Warning/Error Policy
The default value :warn causes a warning whenever a pinned fiber
blocks the carrier. This helps developers identify code that needs
restructuring. Setting it to :error makes pinned blocking a hard
error (useful during development). Setting it to nil silences the
warning (appropriate for code that legitimately pins and blocks, such
as FFI calls that hold foreign resources).
13.7 interrupt-thread
interrupt-thread is not patched for fiber awareness. If you call
(sb-thread:interrupt-thread carrier-thread function), the function
executes on the carrier thread in whatever context happens to be
active at the time. If the carrier is running a fiber, the interrupt
function runs with that fiber’s dynamic bindings and stack — not
the carrier’s. This is analogous to delivering a signal to an OS
thread: the handler runs in the interrupted context.
In general, interrupt-thread should not be used on carrier threads.
To communicate with a fiber, use fiber-aware primitives: signal a
condition variable, set a flag checked by a fiber-park predicate,
or submit a new fiber to the scheduler group.
13.8 Debugger Integration
SBCL’s standard debugger operates on the current thread’s control stack. Since fibers have their own control stacks, several debugger internals are extended for fiber awareness:
control-stack-pointer-valid-pchecks*debug-control-stack-start/end*when bound, so frame-walking code validates pointers against the fiber’s stack rather than the carrier’s.with-fiber-debug-boundsbinds the debug stack bounds to a fiber’s control stack region, enablingframe-downand other debugger functions to walk the fiber’s frames.fiber-top-frameextracts the frame pointer and return address from the fiber’sfiber-switchsave area (architecture-specific offsets) and callscompute-calling-frameto produce the initialcompiled-frameobject.frame-downstops when it encountersfiber_run_and_finishin the call chain, since that is the absolute bottom of a fiber’s stack.print-fiber-backtracecombines these pieces: it binds the debug bounds, gets the top frame, and delegates toprint-backtracefor full symbolic output.
14. Performance
Native Linux threads are highly performant. With modern kernels, a thread-per-connection model delivers excellent throughput up to a few thousand concurrent connections — there is no need to use fibers if your workload stays within that range.
Where fibers provide a decisive advantage is at high connection
counts, where the thread-per-connection model hits fundamental
resource limits. Each OS thread allocates an 8 MB stack by default,
so 10,000 threads require ~80 GB of virtual address space. Fibers
use 256 KB stacks by default (configurable per fiber via
:stack-size), so 10,000 fibers require ~2.5 GB — a 32x
reduction at the default size. Beyond the memory wall, the kernel scheduler’s
performance degrades with thousands of runnable threads competing for
CPU time, while the fiber scheduler’s work-stealing design scales
more gracefully.
14.1 HTTP Benchmark
All benchmarks use a Hunchentoot “Hello, World!” HTTP server
measured with wrk -t4 -cN -d10s on a 16-core Linux machine.
Thread mode uses Hunchentoot’s standard one-thread-per-connection
taskmaster (unlimited threads, 8 GB heap). Fiber mode uses a
fiber-per-connection taskmaster with carrier count equal to the
number of CPUs (16).
| Connections | Threads (r/s) | Fibers (r/s) | Fiber advantage |
|---|---|---|---|
| 1,000 | 129,724 | 160,947 | 1.24x |
| 2,500 | 89,188 | 142,105 | 1.59x |
| 5,000 | 79,169 | 124,745 | 1.58x |
| 10,000 | 55,493 | 102,710 | 1.85x |
| 25,000 | 62,107 | 83,036 | 1.34x |
At lower connection counts (below ~500), we expect threads to have an advantage due to the low cost of kernel thread context switching (see Section 14.4). The benchmarks above focus on the concurrency range where the thread-per-connection model begins to degrade and fibers provide a clear benefit.
Fibers outperform threads at every concurrency level tested. The advantage grows with connection count, peaking at nearly 2x at 10,000 connections.
1,000 connections: fibers deliver 160,947 r/s versus 129,724 r/s for threads — a 1.24x advantage. Thread performance is already degrading due to kernel scheduling overhead across 1,000 OS threads.
2,500-5,000 connections: fibers sustain 124,745-142,105 r/s while threads drop to 79,169-89,188 r/s. The thread-per-connection model suffers from increasing context switch costs and lock contention in the Hunchentoot taskmaster. Threads also begin experiencing socket timeouts at 2,500 connections.
10,000+ connections: fibers maintain over 100,000 r/s at 10,000 connections while threads drop to 55,493 r/s — a 1.85x advantage. At 25,000 connections, both models are I/O-bound, but fibers still lead at 83,036 r/s versus 62,107 r/s. Fibers maintain the memory advantage: threads consume ~200 GB of virtual address space while fibers use ~6.4 GB.
14.2 Memory Efficiency
The defining advantage of fibers is memory efficiency at scale:
| Connections | Thread stacks | Fiber stacks | Ratio |
|---|---|---|---|
| 100 | 800 MB | 25 MB | 32x |
| 1,000 | 8 GB | 256 MB | 32x |
| 10,000 | 80 GB | 2.5 GB | 32x |
| 100,000 | 800 GB | 25 GB | 32x |
Thread stacks assume the Linux default of 8 MB; fiber stacks assume
the default of 256 KB (reducible via :stack-size at make-fiber
time). Both figures represent virtual address space — physical
memory usage depends on how many stack pages are touched
(demand-paged).
In practice, the thread-per-connection model becomes infeasible well before 100,000 connections because the kernel cannot allocate enough thread stacks. Fibers handle 100,000 idle connections routinely (see Section 14.3).
14.3 Scalability Under High Connection Counts
The C100K (100,000 idle connections) test verifies that the system
handles large numbers of suspended fibers without degradation. Each
idle fiber occupies ~256 KB of virtual address space (physical pages
are demand-allocated), a fiber_gc_info entry (64 bytes), and a
slot in the epoll interest set.
GC pause times increase linearly with the number of fibers because each suspended fiber’s control stack must be conservatively scanned. The per-fiber overhead is small for mostly-empty stacks (only a few live frames to scan), but scales with fiber count.
14.4 Context Switch Microbenchmark
A tight yield loop (two fibers yielding back and forth on a single carrier) measures raw context switch throughput:
| Configuration | Switches/sec | Cost/switch |
|---|---|---|
| Fiber context switch | 2,100,000 | ~0.48 us |
| Kernel thread switch (reference) | 4,000,000-7,000,000 | ~150-250 ns |
A fiber switch is roughly 2-3x slower than a kernel thread context switch. The additional cost comes from TLS save/restore orchestration and GC metadata updates that kernel threads do not need — a kernel thread’s stack is always at a known location and its TLS is managed by the OS.
14.5 GC Impact
Fibers increase GC work in two ways:
Conservative stack scanning — each suspended fiber’s stack is scanned word-by-word. Stacks with few live frames have little data to scan; deeply recursive fibers contribute more.
Binding stack scavenging — each fiber’s binding stack is precisely scavenged. This is proportional to the number of active bindings, which is typically small.
The persistent carrier context (Section 7.6) avoids mutex contention during stop-the-world collection — carriers do not need to acquire locks to update their GC metadata on every switch.
15. Platform Support
15.1 x86-64 (Linux, macOS, Windows)
The primary development platform. All features are available:
fiber_switch: 6 callee-saved GPRs (SysV), 8 GPRs + 10 XMMs (Win64)- I/O:
epollon Linux,kqueueon macOS,pollfallback on Windows - Thread register:
r13(patched on migration) - Stack pool:
madvise(MADV_DONTNEED)on Linux/macOS,VirtualAlloc/VirtualFreeon Windows
15.2 ARM64
Full support with 160-byte register save frame. Uses stp/ldp
instructions for paired register saves (ARM64’s most efficient
store/load pattern). Thread register is x21.
I/O multiplexing depends on the OS: epoll on Linux, kqueue on
iOS/macOS (via the BSD layer).
15.3 ARM32
Full support with 104-byte register save frame. Saves VFP registers
d8-d15 (64 bytes) and GPRs r3-r11, lr (40 bytes). Thread
register is r10. Uses blx for indirect calls.
15.4 PPC64
Full support with 320-byte register save frame. Saves LR, CR, 18
callee-saved GPRs (r14-r31), and 18 callee-saved FPRs
(f14-f31). Thread register is r30. Frame includes backchain
pointer for ABI compliance.
15.5 PPC32
Full support with 240-byte register save frame. Same register set as
PPC64 but with 4-byte GPRs: LR, CR, r14-r31 (72 bytes),
f14-f31 (144 bytes). Thread register is r17
(thread-base-tn).
15.6 RISC-V (RV64)
Full support with 208-byte register save frame. Saves ra, cfp
(for debugger backtraces), s0-s11 (12 GPRs), and fs0-fs11
(12 FPRs). Thread register is s10 (r26). Uses sd/ld for
GPRs and fsd/fld for FPRs.
15.7 Feature Flag: :sb-fiber
Fiber support is controlled by the :sb-fiber feature. When absent,
all fiber code is excluded via #+sb-fiber reader conditionals. The
thread struct does not include fiber slots, the dispatcher hooks in
grab-mutex/sleep/wait-until-fd-usable are not compiled, and
the fiber.c/fiber.h runtime files are not linked.
To enable fibers, add :sb-fiber to
local-target-features.lisp-expr before building SBCL.
15.8 Platform-Specific Assembly and I/O Backends
Each supported architecture has its own assembly file:
| Platform | Assembly file | Save frame |
|---|---|---|
| x86-64 | src/assembly/x86-64/fiber.lisp | 48-224 bytes |
| ARM64 | src/assembly/arm64/fiber.lisp | 160 bytes |
| ARM32 | src/assembly/arm/fiber.lisp | 104 bytes |
| PPC64 | src/assembly/ppc64/fiber.lisp | 320 bytes |
| PPC32 | src/assembly/ppc/fiber.lisp | 240 bytes |
| RISC-V | src/assembly/riscv/fiber.lisp | 208 bytes |
I/O backends:
| Platform | Backend | Registration mode |
|---|---|---|
| Linux | epoll | EPOLLET | EPOLLONESHOT |
| BSD/macOS | kqueue | EV_ADD | EV_ONESHOT |
| Other | poll() | Batched per maintenance |
Appendix A: Using Hunchentoot with Fibers
Hunchentoot’s standard one-thread-per-connection-taskmaster creates an
OS thread for every accepted connection. A fiber-based taskmaster
replaces those threads with fibers: one fiber per connection, all
multiplexed across a fixed set of carrier threads. Because SBCL fibers
make sleep, grab-mutex, condition-wait, and wait-until-fd-usable
transparently fiber-aware, Hunchentoot’s request processing code works
without modification — only the taskmaster needs to change.
A.1 The Fiber Taskmaster
(defclass fiber-taskmaster (hunchentoot:taskmaster)
((group :accessor fiber-taskmaster-group
:initform nil
:documentation "The fiber scheduler group.")
(carrier-count :initarg :carrier-count
:initform nil
:accessor fiber-taskmaster-carrier-count
:documentation "Number of carrier threads.
NIL means use the number of available CPUs."))
(:documentation "A Hunchentoot taskmaster that runs each connection
in a fiber instead of a dedicated OS thread."))
A.2 Taskmaster Methods
(defmethod hunchentoot:execute-acceptor ((taskmaster fiber-taskmaster))
"Start the fiber scheduler group, then run the accept loop on a
carrier fiber so that the accept() call itself is fiber-aware."
(let* ((acceptor (hunchentoot:taskmaster-acceptor taskmaster))
(accept-fiber
(sb-thread:make-fiber
(lambda () (hunchentoot:accept-connections acceptor))
:name (format nil "hunchentoot-acceptor-~A:~A"
(or (hunchentoot:acceptor-address acceptor) "*")
(hunchentoot:acceptor-port acceptor)))))
(setf (fiber-taskmaster-group taskmaster)
(sb-thread:start-fibers
(list accept-fiber)
:carrier-count (fiber-taskmaster-carrier-count taskmaster)))))
(defmethod hunchentoot:handle-incoming-connection
((taskmaster fiber-taskmaster) socket)
"Submit a new fiber to process this connection."
(let ((acceptor (hunchentoot:taskmaster-acceptor taskmaster)))
(sb-thread:submit-fiber
(fiber-taskmaster-group taskmaster)
(sb-thread:make-fiber
(lambda ()
(hunchentoot:process-connection acceptor socket))
:name "hunchentoot-worker"))))
(defmethod hunchentoot:shutdown ((taskmaster fiber-taskmaster))
"Shut down the fiber scheduler group."
(when (fiber-taskmaster-group taskmaster)
(sb-thread:finish-fibers (fiber-taskmaster-group taskmaster)))
taskmaster)
A.3 Starting the Server
;; Load Hunchentoot (via Quicklisp, OCICL, or ASDF)
(ql:quickload :hunchentoot) ; or (asdf:load-system :hunchentoot)
;; Define a handler
(hunchentoot:define-easy-handler (hello :uri "/") ()
(format nil "Hello from fiber ~A on carrier ~A!"
(sb-thread:fiber-name sb-thread:*current-fiber*)
(sb-thread:thread-name sb-thread::*current-thread*)))
;; Start with the fiber taskmaster
(defvar *server*
(hunchentoot:start
(make-instance 'hunchentoot:easy-acceptor
:port 8080
:taskmaster (make-instance 'fiber-taskmaster
:carrier-count 4))))
To stop:
(hunchentoot:stop *server*)
A.4 How It Works
execute-acceptorcreates a scheduler group with the specified number of carrier threads and submits the accept loop as the initial fiber. The accept loop callsusocket:wait-for-inputandusocket:socket-accept— both perform I/O that goes throughwait-until-fd-usable, which yields the fiber cooperatively.For each accepted connection,
handle-incoming-connectioncreates a fiber and submits it to the scheduler group. The fiber runsprocess-connection, which reads the HTTP request, calls the handler, and writes the response. Every blocking operation (socket reads/writes, mutex grabs, sleeps) yields the fiber instead of blocking the carrier thread.Work stealing distributes fibers across carriers automatically. If one carrier is busy with a compute-heavy handler while another is idle, the idle carrier steals fibers from the busy one.
When the acceptor shuts down,
finish-fibersjoins all carrier threads and cleans up.
A.5 SSL
Hunchentoot’s ssl-acceptor uses cl+ssl (OpenSSL) for TLS.
OpenSSL maintains per-connection state in thread-local storage, so a
fiber must not migrate to a different carrier mid-handshake. Pin the
fiber during SSL operations:
(defmethod hunchentoot:initialize-connection-stream
((acceptor hunchentoot:ssl-acceptor) stream)
;; Pin to prevent work-stealing during SSL handshake
(sb-thread:with-fiber-pinned ()
(call-next-method)))
In practice, single-carrier setups ((make-instance 'fiber-taskmaster :carrier-count 1)) avoid the migration issue entirely and are a
simpler starting point when using SSL. For multi-carrier SSL, pinning
is only needed during the handshake — once the TLS session is
established, the cl+ssl stream wrapper uses ordinary fd reads/writes
that are carrier-safe.
Downsides of pinning. A pinned fiber cannot be stolen by another carrier. If the TLS handshake blocks on I/O (network round-trips for certificate exchange, key agreement), the pinned fiber falls through to OS-level blocking rather than yielding cooperatively (see Section 2.9). This blocks the carrier thread entirely, preventing it from running other fibers. Under a burst of new TLS connections, all carriers can end up blocked in handshakes simultaneously, stalling the entire scheduler. Pinning also defeats load balancing: work stealing cannot redistribute pinned fibers, so carriers become unevenly loaded.
Alternative: pure-TLS. pure-tls is a TLS 1.3 implementation written entirely in Common Lisp, with no foreign (C/OpenSSL) dependencies. Because it uses only Lisp-level I/O and data structures, there is no thread-local foreign state — fibers using pure-tls can migrate freely between carriers without pinning. This preserves cooperative yielding during the handshake (the fiber yields on socket reads/writes instead of blocking the carrier) and allows work stealing to balance TLS connections across all cores. For fiber-based servers handling many concurrent TLS connections, pure-tls eliminates the pinning bottleneck entirely.
A.6 Considerations
Session locking. Hunchentoot’s session store uses a global lock.
Under fibers, grab-mutex yields cooperatively, so there is no
deadlock risk, but high-contention sessions may cause unnecessary
yielding. For high-throughput servers, consider a lock-free session
store or per-session locks.
Timeouts. Hunchentoot sets socket-level read/write timeouts via
set-timeouts. These are OS-level SO_RCVTIMEO/SO_SNDTIMEO
timeouts that work at the fd level, independent of whether the caller
is a thread or fiber. They continue to function correctly.
Logging. Hunchentoot’s logging functions are thread-safe and work
unchanged under fibers. The *acceptor* and *request* special
variables are bound per-fiber via the normal dynamic binding mechanism,
which fibers fully support (see Section 6).
Max connections. The thread-per-connection taskmaster limits
concurrency via max-thread-count. The fiber taskmaster shown above
has no such limit — fibers are cheap enough that tens of thousands
of concurrent connections are practical. If you need to limit
concurrent connections (e.g., to bound memory), add a semaphore:
(defvar *connection-semaphore* (sb-thread:make-semaphore :count 10000))
(defmethod hunchentoot:handle-incoming-connection
((taskmaster fiber-taskmaster) socket)
(let ((acceptor (hunchentoot:taskmaster-acceptor taskmaster)))
(sb-thread:submit-fiber
(fiber-taskmaster-group taskmaster)
(sb-thread:make-fiber
(lambda ()
(sb-thread:wait-on-semaphore *connection-semaphore*)
(unwind-protect
(hunchentoot:process-connection acceptor socket)
(sb-thread:signal-semaphore *connection-semaphore*)))
:name "hunchentoot-worker"))))