Performance and Concurrency

This chapter is for server administrators and operators who want to understand how mooR actually uses CPU time, threads, and the database under load.

mooR is not architected like classic LambdaMOO-style servers. LambdaMOO effectively serialized almost all execution through one main path. mooR is intentionally multi-threaded: tasks can execute concurrently, database transactions can overlap, and some durability work is handed off to background threads. That gives mooR a very different performance profile, and it changes what "bottleneck" means in practice.

This page explains the execution model, the database concurrency model, the main performance counters, and the tuning knobs most likely to matter.

Not every performance problem is a runtime problem. Some worlds are slow because the MOO code itself does too much work or creates avoidable transaction conflicts. For language-level guidance, see Performance Tuning MOO Code.

This page assumes an operator or wizard audience. It discusses server-level counters, logs, thread placement, and runtime configuration. Ordinary programmers without wizard access will usually want to start with the language-side guidance above.

Why Is My MOO Slow?

This is usually the question operators actually start with. The important step is to avoid treating "slow" as one thing. In mooR, poor responsiveness can come from scheduler delay, worker saturation, database conflict, or storage backpressure.

Slow Command or Verb Response

Look first at:

scheduler counters such as submit, wakeup, and handoff latency
builtin counters if commands spend time in a few heavy builtins
database counters if the command performs a lot of object/property/verb mutation

Likely causes:

the scheduler is delayed before the task reaches a worker
the worker pool is saturated
the command is builtin-heavy or VM-heavy
the command is conflict-heavy and is retrying work

Latency Spikes With Otherwise Low CPU Usage

Look first at:

scheduler counters, especially wakeup and handoff counters
database counters, especially commit, provider lock-wait, and batch-writer backpressure counters
daemon log warnings around the storage path

Likely causes:

intermittent commit serialization pressure
slow durable flushes
a barrier or checkpoint path waiting for background writes
occasional contention on a hot object or mailbox

Good CPU Availability but Writes Feel Slow

Look first at:

database counters, especially commit-phase, provider lock-wait, and batch-writer backpressure counters
conflict/retry behavior

Likely causes:

the workload is logically contended even if CPUs are idle
durable storage is slow enough that foreground commits start to feel it
the writer path is coalescing under pressure rather than draining immediately

Storage Is Slow

Slow storage has a distinct signature in mooR.

Look for both:

warnings in the daemon log about slow flushes or batch-writer backpressure
the database counters, especially batch-writer backpressure counters, continuing to climb

In the current implementation, the writer path emits warnings when:

Fjall flushes are slow
the batch-writer queue is full and commits have to block
barrier sends or backpressure blocks take too long

This is one of the clearer cases where logs and counters should be read together. A single warning may just indicate a transient stall. Repeated warnings plus steadily rising batch-writer backpressure counters are a strong sign that the storage layer is not keeping up with the write rate.

When that happens, look at:

disk latency and throughput on the host
whether the deployment is on especially slow or burst-limited storage
whether checkpoints, exports, or other maintenance activity are sharing the same device
whether the workload is generating unusually large or bursty commits

It Gets Worse As More Players Log In

Likely causes:

more runnable tasks than the scheduler and worker pool can efficiently cycle through
more contention on a small number of shared objects or properties
mailbox or task-message hotspots
write amplification from many tasks touching the same parts of the world

This is where mooR's concurrency model helps, but it does not remove all application-level serialization. A world can still behave as if it is single-threaded if most activity pounds on the same small set of logical resources.

Execution Model

At a high level, the daemon has three important kinds of work:

scheduler and control-plane work
task execution work
storage and durability work

The scheduler is responsible for orchestration:

receiving requests from hosts and workers
deciding which task should run next
waking suspended tasks
handling task lifecycle transitions such as suspend, retry, completion, and cancellation

Task bodies do not run on the scheduler thread. Instead, runnable tasks are dispatched onto a task worker pool. Those workers execute verbs, builtins, and other VM activity in parallel. This is one of mooR's core architectural differences from older single-threaded MOO servers.

In practice this means:

independent tasks can make progress on different cores at the same time
scheduler responsiveness still matters because all execution passes through it for orchestration
affinity and core reservation can matter on heterogeneous CPUs

Scheduler and Task Pool

The scheduler and the task pool have different jobs.

The scheduler:

owns task queues and wakeup state
processes messages from hosts, workers, and running tasks
decides when to resume or dispatch tasks
handles retries after transaction conflicts

The task pool:

executes task bodies on worker threads
runs verb dispatch and builtin-heavy VM work
returns results, suspension requests, and retry requests back to the scheduler

This separation is why some latency counters are "scheduler" counters even though the task itself is expensive. The scheduler may be fast, but the worker pool can still be saturated. Likewise, task workers may be idle while the scheduler is delayed by coordination work.
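The difference between waiting for a worker and running on one can be seen in a toy model. The sketch below is not mooR code: it uses a plain Python thread pool, and the function name, pool size, and task duration are assumptions chosen for illustration. It records, for each task, how long it waited before a worker picked it up and how long the body then ran.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_batch(num_workers, num_tasks, task_seconds):
    """Return a (wait_seconds, run_seconds) pair for each task."""
    def body(submitted_at):
        started_at = time.perf_counter()
        time.sleep(task_seconds)  # stand-in for verb or builtin work
        finished_at = time.perf_counter()
        return started_at - submitted_at, finished_at - started_at

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(body, time.perf_counter()) for _ in range(num_tasks)]
        return [f.result() for f in futures]

# Six tasks compete for two workers, so later tasks queue behind earlier ones.
timings = run_batch(num_workers=2, num_tasks=6, task_seconds=0.05)
worst_wait = max(wait for wait, _ in timings)
worst_run = max(run for _, run in timings)
print(f"worst wait {worst_wait:.3f}s, worst run {worst_run:.3f}s")
```

In this model the task body is never slower than its sleep, yet the last tasks wait for roughly two task durations before they start. That is the pattern a high submit-to-first-run latency points at: the work is not expensive, but it is queued.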

Database Concurrency Model

mooR's database layer is built around optimistic concurrency with serializable isolation.

The important operational points are:

transactions read from a stable snapshot
writes are accumulated in working sets
commit validation checks whether concurrent changes invalidated the transaction
conflicting transactions are retried rather than blocked behind one big global execution lock

This is a very different model from classic MOO implementations where execution and mutation were implicitly serialized by a single-threaded runtime.

Operational consequences:

read-heavy workloads can overlap well
independent write workloads can also overlap until commit
conflicting write workloads show up as retries, not just as longer wait times
some apparent "latency" is actually retry pressure

If you are diagnosing a workload with many retries, look at the logical shape of the application as well as the machine. Hot objects, shared counters, shared mailboxes, and other concentrated write patterns can force serial progress even on a large machine.
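The snapshot, working-set, validate, and retry cycle described above can be sketched in a few lines. This is a single-threaded toy model for intuition only, not mooR's database: the class names, the per-key version counters, and the whole-store snapshot copy are all assumptions made to keep the example short.

```python
class Conflict(Exception):
    """Raised when commit validation fails."""

class Store:
    def __init__(self):
        self.values = {}    # key -> committed value
        self.versions = {}  # key -> number of commits that wrote the key

    def begin(self):
        return Transaction(self)

class Transaction:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.values)             # stable view for reads
        self.snapshot_versions = dict(store.versions)
        self.touched = set()                           # keys read or written
        self.working_set = {}                          # buffered writes

    def get(self, key, default=None):
        self.touched.add(key)
        if key in self.working_set:
            return self.working_set[key]
        return self.snapshot.get(key, default)

    def put(self, key, value):
        self.touched.add(key)
        self.working_set[key] = value

    def commit(self):
        # Validate: fail if any touched key changed since the snapshot.
        for key in self.touched:
            if self.store.versions.get(key, 0) != self.snapshot_versions.get(key, 0):
                raise Conflict(key)
        # Apply the buffered writes and bump their versions.
        for key, value in self.working_set.items():
            self.store.values[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1

def run_with_retry(store, body, max_attempts=10):
    """Re-run body in a fresh transaction until it commits; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        tx = store.begin()
        body(tx)
        try:
            tx.commit()
            return attempt
        except Conflict:
            continue
    raise RuntimeError("too many conflicts")

def bump(tx):
    tx.put("hits", tx.get("hits", 0) + 1)

# Two overlapping transactions increment the same hot counter.
store = Store()
t1, t2 = store.begin(), store.begin()
bump(t1)
bump(t2)
t1.commit()
try:
    t2.commit()
    second_committed = True
except Conflict:
    second_committed = False

attempts = run_with_retry(store, bump)  # the retried work sees the new value
```

The second transaction is not blocked while the first one runs. It does all of its work, fails validation, and has to be redone. On a hot key that redone work is the cost that shows up as retry pressure rather than as lock wait time.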

Commit and Durability Path

The write path is split into logical commit and durable flush stages.

At commit time, the database:

validates the transaction against the current published root
applies accepted mutations
publishes the next root for readers
hands the write batch off for durability work

That means the logical commit path and the durable writeback path are related but not identical. The system can publish a new root and then rely on background infrastructure to push the queued writes through the storage engine.

Background Writers

mooR uses background writer infrastructure for some storage work, including a coalescing batch writer in the Fjall-backed path.

That writer:

receives committed write batches
can deduplicate and coalesce pending writes
flushes immediately under normal conditions for durability
switches into more coalescing behavior under backpressure or slow flush conditions
supports barrier-style synchronization when a caller needs to know a given timestamp is durable

This is important when reading performance counters:

a fast logical commit does not always mean storage is idle
backpressure in the batch writer can indicate the durable write path is the bottleneck
slow barriers often mean the system is waiting for queued writes to drain
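Coalescing is easier to reason about with a small model. The sketch below is a hypothetical illustration, not the Fjall-backed writer itself: the class name, the `flush` callback, and the key strings are assumptions. It shows how several queued batches that touch the same key can collapse into a single durable write.

```python
class CoalescingWriter:
    def __init__(self, flush):
        self.flush = flush          # callable that persists a dict of key -> value
        self.pending = {}           # latest value per key, not yet durable
        self.batches_received = 0

    def submit(self, batch):
        """Queue a committed batch; a later write to a key replaces an earlier one."""
        self.batches_received += 1
        self.pending.update(batch)

    def drain(self):
        """Flush everything pending as one write; return the number of keys written."""
        if not self.pending:
            return 0
        to_write, self.pending = self.pending, {}
        self.flush(to_write)
        return len(to_write)

flushed = []                        # stand-in for the storage engine
writer = CoalescingWriter(flushed.append)

# Three commits arrive while the writer is busy; two of them rewrite the same key.
writer.submit({"obj10.name": "a"})
writer.submit({"obj10.name": "b", "obj11.location": "room5"})
writer.submit({"obj10.name": "c"})
keys_written = writer.drain()
```

Three batches went in and one flush with two keys came out. Under backpressure this is a reasonable trade: less total I/O, at the cost of individual writes becoming durable later than they would with an immediate flush.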

Thread Placement and Affinity

mooR distinguishes between service/control-plane threads and task worker threads.

Service/control-plane threads include work such as:

scheduler orchestration
RPC/event handling
background coordination

Task worker threads are the pool used to execute task bodies.

On systems with heterogeneous CPUs, mooR can try to reserve stronger cores for task execution while leaving some performance-core capacity for scheduler and control-plane work. The relevant runtime settings are documented in Server Configuration.

Default behavior is:

if the runtime detects a meaningful performance-core tier, task workers are pinned to the worker share of that tier
a small number of performance cores are reserved for service/control-plane threads
if no meaningful split is detected, the task pool is left unpinned
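As a rough mental model of those defaults, the split can be pictured as a partition of the performance-core list. The function below is an illustrative assumption, not the runtime's actual detection logic: what counts as a "meaningful" tier is simplified here to "both tiers are non-empty and some performance cores remain for workers", and the core IDs are made up. Only the `service_perf_cores` name comes from the configuration.

```python
def plan_affinity(perf_cores, other_cores, service_perf_cores):
    """Split performance cores into a service share and a worker share.

    Returns None for both shares when no usable split exists, which
    models leaving the task pool unpinned.
    """
    has_split = bool(perf_cores) and bool(other_cores)
    if not has_split or len(perf_cores) <= service_perf_cores:
        return {"service": None, "workers": None}
    return {
        "service": perf_cores[:service_perf_cores],
        "workers": perf_cores[service_perf_cores:],
    }

# Hypothetical hybrid CPU: cores 0-3 are fast, cores 4-7 are efficiency cores.
hybrid = plan_affinity([0, 1, 2, 3], [4, 5, 6, 7], service_perf_cores=1)

# Hypothetical uniform CPU: every core is in the same tier.
uniform = plan_affinity([0, 1, 2, 3], [], service_perf_cores=1)
```

The useful point for tuning is the trade inside the first case: every core moved into the service share is one fewer core for task workers, and the reverse.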

When affinity helps:

the machine has a clear fast-core and efficiency-core split
the worker pool is CPU-bound
scheduler responsiveness matters under load

When affinity may hurt:

the process runs in a container or VM with unusual CPU scheduling
topology information is misleading or incomplete
the workload is not CPU-bound and benefits more from general scheduler freedom

Performance Counters

mooR exposes several families of internal counters through builtins, covering builtin execution, the scheduler, and the database.

These facilities are wizard-only. They are useful for operators and for world authors who have administrative access, but they are not assumed to be available to ordinary programmers.

Each counter builtin returns a map keyed by counter name, where each value holds:

invocation count
cumulative duration in nanoseconds

Invocation counts are exact. Duration collection may be sampled, depending on runtime timing configuration.
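A counter map of this shape is most useful once it is reduced to a ranking. The sketch below assumes the map has already been copied out of the server into a Python dict of `name -> (count, total_ns)`; that representation, the function name, and all of the numbers are invented for illustration. The two counter names are ones mentioned later in this chapter.

```python
def summarize(counters):
    """Return (name, count, total_ms, mean_us) rows, largest total time first."""
    rows = []
    for name, (count, total_ns) in counters.items():
        mean_us = total_ns / count / 1_000 if count else 0.0
        rows.append((name, count, total_ns / 1e6, mean_us))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

# Made-up sample values, not measurements from a real server.
sample = {
    "task_thread_handoff_latency": (1000, 40_000_000),
    "task_submit_to_first_run_latency": (1000, 250_000_000),
    "never_called": (0, 0),
}

rows = summarize(sample)
for name, count, total_ms, mean_us in rows:
    print(f"{name:36} calls={count:6} total={total_ms:8.1f}ms mean={mean_us:8.1f}us")
```

Ranking by cumulative time answers "where does the time go", while the mean per invocation answers "is each call slow, or are there just many of them". The two questions often have different answers.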

Builtin Counters

Builtin counters measure builtin-function execution paths in the VM.

Use them when you want to understand:

which builtins are called most often
which builtins dominate cumulative execution time
whether apparent "VM slowness" is really concentrated in a small set of builtins

These counters are often useful when application behavior is builtin-heavy rather than verb-heavy.

Scheduler Counters

Scheduler counters cover orchestration and task-lifecycle work, including:

task startup, resume, retry, and kill paths
command parsing and verb lookup for command dispatch
scheduler-client and task-scheduler-client round-trip latencies
task wakeup and worker handoff timing
garbage collection phases

These counters are useful for diagnosing:

scheduler overload
slow task dispatch
wakeup delay
task handoff latency
task-list or checkpoint request overhead

Examples:

High task_submit_to_first_run_latency suggests tasks spend time waiting before they ever get a worker.
High task_thread_handoff_latency suggests worker-pool contention or delayed dispatch.
High wakeup-related latencies suggest scheduler-side delay or a backlog of runnable work.

Database Counters

Database counters cover world-state operations and the write path, including:

object, property, and verb lookup/update operations
provider tuple check/load paths
commit phases such as lock wait, check, apply, and commit-result handling
batch-writer backpressure timing

These counters are useful for diagnosing:

expensive object/property/verb operations
conflict-heavy workloads
commit serialization pressure
slow provider lock acquisition
storage backpressure in the writer path

Examples:

High commit lock-wait or check/apply time suggests commit-path contention.
High provider lock-wait counters suggest the lower storage layer is contended.
High batch-writer backpressure counters suggest durability work is falling behind foreground commit throughput.

Sampling Semantics

mooR's perf counters are designed so they can remain enabled in normal operation.

By default:

invocation counts remain exact
many hot-path durations are sampled
sampled durations are scaled so the cumulative totals remain useful as estimates

This is a practical tradeoff. Measuring every hot-path event exactly would distort the very paths you are trying to observe.

The runtime timing settings in Server Configuration control whether timing is enabled and how aggressively hot and medium paths are sampled.

Guidance for interpretation:

trust invocation counts as exact
treat cumulative duration as an estimate when sampling is enabled
use exact timing only for focused benchmarking, profiling, or short investigation windows
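The effect of sampling on the two numbers can be shown with a small model. The sketch assumes a shift-based scheme in which one event in every 2^shift is timed and its duration is multiplied by 2^shift; that scheme is inferred from the "sample shift" wording and is an assumption, as are the class name and the synthetic durations.

```python
class SampledTimer:
    def __init__(self, shift):
        self.mask = (1 << shift) - 1   # shift 0 gives mask 0: every event is timed
        self.scale = 1 << shift
        self.count = 0
        self.total_ns = 0

    def record(self, duration_ns):
        # A real timer would only read the clock for sampled events. The
        # duration is passed in here so the example is deterministic.
        sampled = (self.count & self.mask) == 0
        self.count += 1                # the invocation count is always exact
        if sampled:
            self.total_ns += duration_ns * self.scale

exact = SampledTimer(shift=0)
sampled = SampledTimer(shift=3)        # times 1 event in 8
for duration_ns in range(1, 1025):     # synthetic durations of 1..1024 ns
    exact.record(duration_ns)
    sampled.record(duration_ns)

print(exact.count, exact.total_ns)     # 1024 524800
print(sampled.count, sampled.total_ns) # 1024 521216
```

Both timers agree on the count. The sampled total is within about one percent of the exact total here, but it is an estimate, and a workload whose durations happen to line up with the sampling interval could skew it further.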

Common Tuning Scenarios

Benchmarking or Profiling Runs

Use exact timing when you care more about measurement precision than about observer overhead.

Recommended approach:

set perf timing sample shifts to 0
keep the workload otherwise as close to production as possible
record whether affinity is enabled, since it changes worker placement

The exact settings and examples are in Server Configuration.

Scheduler Feels Sluggish

Look first at:

scheduler latency counters
wakeup and worker handoff counters
whether too many performance cores were given to workers

Possible actions:

increase service_perf_cores
reduce task-worker pinning aggressiveness
investigate workloads creating large numbers of short runnable tasks

For the affinity knobs themselves, see Server Configuration.

Good CPU Availability but Poor Write Throughput

Look first at:

database commit-phase counters
provider lock-wait counters
batch-writer backpressure counters
retry/conflict behavior in the workload

Possible actions:

reduce hot write contention in application design
inspect whether many tasks are writing the same objects or properties
check whether the durable writer is coalescing under sustained pressure

High Retry Pressure

Retries are rarely just a hardware problem. They usually indicate application-level conflict.

Look for:

many tasks touching the same objects
concentrated mailbox or queue updates
broad write transactions when narrower ones would suffice

More cores do not solve serializable-conflict pressure by themselves.

What To Tune First

If you are not already operating with measurements, start here:

check scheduler, builtin, and DB counter maps
determine whether the problem is scheduler-side, worker-side, or DB-side
only then change affinity or timing settings

For most deployments, the defaults are the right place to start. The knobs are there to support workload-specific tuning, not to require up-front hand tuning on day one.
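One way to make that first triage step repeatable is to compare two snapshots of each counter map taken some time apart, rather than reading lifetime totals. The sketch below assumes the maps have been exported as Python dicts of `name -> (count, total_ns)`. The function names, the family labels, and every counter name and number in the sample data are hypothetical.

```python
def delta(before, after):
    """Per-counter growth in (count, total_ns) between two snapshots."""
    out = {}
    for name, (count, total_ns) in after.items():
        prev_count, prev_ns = before.get(name, (0, 0))
        out[name] = (count - prev_count, total_ns - prev_ns)
    return out

def added_time_by_family(families):
    """Map each family label to the total nanoseconds added across its counters."""
    return {
        label: sum(ns for _, ns in counters.values())
        for label, counters in families.items()
    }

# Hypothetical snapshots taken one minute apart.
sched_before = {"wakeup": (100, 1_000_000)}
sched_after = {"wakeup": (160, 1_900_000)}
db_before = {"commit_lock_wait": (50, 2_000_000), "writer_backpressure": (0, 0)}
db_after = {"commit_lock_wait": (90, 9_000_000), "writer_backpressure": (12, 30_000_000)}

totals = added_time_by_family({
    "scheduler": delta(sched_before, sched_after),
    "database": delta(db_before, db_after),
})
suspect = max(totals, key=totals.get)
```

Summing durations across a family is a crude first cut, since counters can overlap or nest, and sampled durations are estimates. It is still usually enough to decide which family to read in detail before touching any setting.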
