Building A Deterministic Quote Router In Rust
A technical case study on Aqueducta's snapshot-first routing API for Movement L1.
Aqueducta started with a simple product question: if a wallet asks for the best route between two assets on Movement L1, how much hidden nondeterminism am I willing to accept?
My answer ended up being: as little as possible.
In DeFi routing, the obvious hard parts are pool math, graph search, and execution handoff. The less obvious hard part is making the system explainable. If two requests return different routes, I want to know whether the liquidity changed, the request changed, the policy changed, or the software regressed. That pushed the backend design toward a snapshot-first architecture: discover chain state outside the request path, name that state with a content-derived identifier, and route against that fixed input.
This article covers the Rust routing/API side of Aqueducta as a technical case study. The on-chain Move routing module is a separate article.
The Shape Of The System
At runtime, the backend has three jobs:
- keep an in-memory view of liquidity fresh enough to serve quotes
- evaluate exact-in route candidates deterministically
- expose enough diagnostics for clients and operators to know what happened
The core design looks like this:
flowchart LR
Client["Wallet, app, or route preview"] --> API["Rust quote API"]
API --> Snapshot["Hot snapshot<br/>named by snapshot_id"]
API --> Graph["Token graph<br/>k-hop route search"]
API --> Math["Quote engine<br/>CP + CLMM math"]
API --> Response["Deterministic quote response"]
Worker["Refresh workers"] --> Discovery["DEX discovery adapters"]
Discovery --> Chain["Movement REST API"]
Discovery --> Bundle["Snapshot bundle"]
Bundle --> Snapshot
API --> Metrics["Metrics and status"]
The important boundary is between refresh work and request work. Quote requests do not go back to chain discovery to figure out what pools exist. They evaluate against the active snapshot, or against a retained historical snapshot if the client asks for one explicitly.
That one decision simplified nearly everything else.
Snapshot-First Routing
The snapshot is the unit of truth. It contains pool metadata, pool state, token metadata, and discovery diagnostics. The backend sorts the material that goes into the snapshot and derives a snapshot_id from the canonical payload.
Conceptually:
struct SnapshotBundle {
    chain_id: u64,
    as_of_unix_secs: u64,
    snapshot_id: String,
    pools: Vec<PoolMeta>,
    snapshots: Vec<PoolSnapshot>,
    tokens: Vec<TokenInfo>,
    diagnostics: Option<DiscoveryDiagnostics>,
}
The snapshot ID is not just a label. It is the input handle for replay. A quote response can say, in effect: “this route was computed against snapshot sha256:....” If a client wants to verify or replay a quote, it can pass required_snapshot_id and force the API to use that retained snapshot rather than whatever happens to be hot now.
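As a minimal sketch of what "sort, then derive an ID from the canonical payload" can look like, the block below sorts a trimmed-down pool list, serializes it line by line, and hashes the bytes. Everything here is an assumption made for illustration: `PoolRecord` is a hypothetical stand-in for the real pool types, the line-based payload format is invented, and FNV-1a replaces SHA-256 only so the example needs nothing beyond the standard library.

```rust
/// Hypothetical, trimmed-down pool record (the real snapshot carries more fields).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PoolRecord {
    dex: String,
    pool_ref: String,
    reserve_a: u128,
    reserve_b: u128,
}

/// FNV-1a 64-bit hash. Stand-in for SHA-256 so the sketch stays std-only.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Derive an ID from a canonical payload: sort first, then serialize, then hash.
/// The payload layout is an assumption, not Aqueducta's actual format.
fn snapshot_id(chain_id: u64, pools: &[PoolRecord]) -> String {
    let mut sorted = pools.to_vec();
    sorted.sort(); // derived Ord: dex, pool_ref, reserve_a, reserve_b
    let mut payload = format!("chain={}\n", chain_id);
    for p in &sorted {
        payload.push_str(&format!(
            "{}|{}|{}|{}\n",
            p.dex, p.pool_ref, p.reserve_a, p.reserve_b
        ));
    }
    format!("fnv1a64:{:016x}", fnv1a64(payload.as_bytes()))
}
```

Because the sort happens before serialization, the order in which discovery returns pools does not leak into the ID, while any change to pool state does.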
sequenceDiagram
autonumber
participant Worker as Refresh worker
participant DEX as Discovery adapters
participant Chain as Movement REST
participant Store as Snapshot store
participant API as Quote API
participant Client
Worker->>DEX: discover pools and pool state
DEX->>Chain: view/resources/events
Chain-->>DEX: liquidity data
DEX-->>Worker: sorted discovery result
Worker->>Worker: build snapshot_id
Worker->>Store: install active snapshot
Client->>API: quote request
API->>Store: load active or required snapshot
API-->>Client: quote response with snapshot_id
This creates a useful operational distinction:
- freshness is a property of the snapshot lifecycle
- determinism is a property of routing over a named snapshot
Those are related, but they are not the same problem. Freshness is monitored with snapshot age, discovery coverage, and readiness checks. Determinism is tested with fixed fixtures and repeated request replay.
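The split between the two properties can be sketched as a small store: resolving a snapshot by ID is the determinism side, and measuring the active snapshot's age is the freshness side. The `SnapshotStore` type, its method names, and the error variants are hypothetical, and retention eviction is left out.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    snapshot_id: String,
    as_of_unix_secs: u64,
}

#[derive(Debug, PartialEq)]
enum SnapshotError {
    NoActiveSnapshot,
    NotRetained(String),
}

/// Hypothetical in-memory store: one active snapshot plus retained history.
#[derive(Default)]
struct SnapshotStore {
    active: Option<String>,
    retained: HashMap<String, Snapshot>,
}

impl SnapshotStore {
    /// Refresh workers call this; the new snapshot becomes the active one.
    fn install(&mut self, snap: Snapshot) {
        self.active = Some(snap.snapshot_id.clone());
        self.retained.insert(snap.snapshot_id.clone(), snap);
    }

    /// Determinism: a pinned ID either resolves to that exact snapshot or
    /// fails explicitly. It never silently falls back to the active one.
    fn resolve(&self, required: Option<&str>) -> Result<&Snapshot, SnapshotError> {
        match required {
            Some(id) => self
                .retained
                .get(id)
                .ok_or_else(|| SnapshotError::NotRetained(id.to_string())),
            None => self
                .active
                .as_ref()
                .and_then(|id| self.retained.get(id))
                .ok_or(SnapshotError::NoActiveSnapshot),
        }
    }

    /// Freshness: age of the active snapshot, for monitoring and readiness.
    fn active_age_secs(&self, now_unix_secs: u64) -> Option<u64> {
        let id = self.active.as_ref()?;
        let snap = self.retained.get(id)?;
        Some(now_unix_secs.saturating_sub(snap.as_of_unix_secs))
    }
}
```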
Request Path
The quote path is deliberately staged. Every stage has a narrow purpose.
flowchart TD
Request["Quote request"] --> Validate["Validate amount, chain, tokens,<br/>slippage, deadline, fee controls"]
Validate --> Snapshot{"required_snapshot_id?"}
Snapshot -->|yes| Historical["Load retained snapshot"]
Snapshot -->|no| Active["Load active snapshot"]
Historical --> Policy["Apply routing policy"]
Active --> Policy
Policy --> Pools["Filter pools by DEX and token controls"]
Pools --> Cache{"Route skeleton cache hit?"}
Cache -->|yes| Candidates["Use cached candidates"]
Cache -->|no| Search["Build graph and enumerate candidates"]
Search --> Candidates
Candidates --> Quote["Run quote math"]
Quote --> Rank["Rank by quality, output, stable route key"]
Rank --> IDs["Derive quote_id and route_id"]
IDs --> Response["Return quote + diagnostics"]
A simplified request model looks like this:
struct QuoteRequest {
    chain_id: u64,
    token_in: String,
    token_out: String,
    amount_in: u128,
    slippage_bps: u16,
    deadline_unix_secs: u64,
    max_hops: usize,
    routing_mode: Option<RoutingMode>,
    required_snapshot_id: Option<String>,
    allow_dexes: Option<Vec<String>>,
    exclude_dexes: Option<Vec<String>>,
    partner_fee_bps: Option<u16>,
    partner_fee_recipient: Option<String>,
}
The real request surface has more controls, but the theme is the same: clients can trade latency for breadth, pin snapshots for replay, narrow DEXes, and ask for diagnostics when they need to explain a decision.
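The validation stage might look roughly like the sketch below, which checks a subset of the request fields before any snapshot is touched. The limits (`MAX_SLIPPAGE_BPS`, `MAX_HOPS`), the `RequestError` variants, and the specific rules are assumptions chosen for illustration, not the service's actual policy. The current time is passed in as an argument so the check itself stays deterministic and testable.

```rust
#[derive(Debug, PartialEq)]
enum RequestError {
    ZeroAmount,
    SameToken,
    SlippageTooHigh(u16),
    DeadlineExpired,
    InvalidHops(usize),
    PartnerFeeWithoutRecipient,
}

/// Subset of the request fields that this sketch validates.
#[derive(Clone)]
struct QuoteRequest {
    token_in: String,
    token_out: String,
    amount_in: u128,
    slippage_bps: u16,
    deadline_unix_secs: u64,
    max_hops: usize,
    partner_fee_bps: Option<u16>,
    partner_fee_recipient: Option<String>,
}

// Hypothetical limits; the real values are not stated in this article.
const MAX_SLIPPAGE_BPS: u16 = 5_000;
const MAX_HOPS: usize = 4;

fn validate(req: &QuoteRequest, now_unix_secs: u64) -> Result<(), RequestError> {
    if req.amount_in == 0 {
        return Err(RequestError::ZeroAmount);
    }
    if req.token_in.eq_ignore_ascii_case(&req.token_out) {
        return Err(RequestError::SameToken);
    }
    if req.slippage_bps > MAX_SLIPPAGE_BPS {
        return Err(RequestError::SlippageTooHigh(req.slippage_bps));
    }
    if req.deadline_unix_secs <= now_unix_secs {
        return Err(RequestError::DeadlineExpired);
    }
    if req.max_hops == 0 || req.max_hops > MAX_HOPS {
        return Err(RequestError::InvalidHops(req.max_hops));
    }
    // A non-zero partner fee needs somewhere to go.
    if req.partner_fee_bps.unwrap_or(0) > 0 && req.partner_fee_recipient.is_none() {
        return Err(RequestError::PartnerFeeWithoutRecipient);
    }
    Ok(())
}
```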
Route Search As A Token Graph
Pools become directed edges in a token graph. A pool between token A and token B contributes both A-to-B and B-to-A edges. The graph search is breadth-first with cycle prevention, and every ordering decision is canonicalized.
The route search is not trying to be clever first. It is trying to be stable first.
struct Edge {
    pool: PoolRef,
    dex: String,
    token_in: String,
    token_out: String,
    fee_bps: Option<u16>,
}
struct PoolGraph {
    // token -> outgoing edges
    adj: HashMap<String, Vec<Edge>>,
}
The key detail is edge ordering. Before route enumeration, each token’s outgoing edges are sorted by stable fields: output token, DEX name, pool reference, and fee. Candidate routes are sorted again by a route key. That means equivalent input produces equivalent candidate order.
flowchart LR
A["Token A"] -- "Pool 1" --> B["Token B"]
B -- "Pool 2" --> C["Token C"]
A -- "Pool 3" --> C
C -- "Pool 4" --> D["Token D"]
subgraph Routes["Candidate routes"]
R1["A -> C"]
R2["A -> B -> C"]
R3["A -> C -> D"]
end
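Here is a compact sketch of the search described above, using the four pools from the diagram. It is an illustration under assumptions rather than the production search: pools are plain tuples, the route key is taken to be the list of pool references, and shorter routes are ordered before longer ones. The `Edge` fields are declared in the order output token, DEX, pool, fee so that the derived `Ord` matches the edge ordering described in the text.

```rust
use std::collections::{BTreeMap, VecDeque};

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Edge {
    token_out: String,
    dex: String,
    pool: String,
    fee_bps: u16,
}

struct PoolGraph {
    adj: BTreeMap<String, Vec<Edge>>,
}

impl PoolGraph {
    /// Each pool tuple is (dex, pool, token_a, token_b, fee_bps) and adds both directions.
    fn from_pools(pools: &[(&str, &str, &str, &str, u16)]) -> Self {
        let mut adj: BTreeMap<String, Vec<Edge>> = BTreeMap::new();
        for &(dex, pool, a, b, fee_bps) in pools {
            for (from, to) in [(a, b), (b, a)] {
                adj.entry(from.to_string()).or_default().push(Edge {
                    token_out: to.to_string(),
                    dex: dex.to_string(),
                    pool: pool.to_string(),
                    fee_bps,
                });
            }
        }
        // Canonical edge order, independent of discovery order.
        for edges in adj.values_mut() {
            edges.sort();
        }
        PoolGraph { adj }
    }

    /// Breadth-first enumeration; a path never revisits a token.
    fn routes(&self, from: &str, to: &str, max_hops: usize) -> Vec<Vec<String>> {
        let mut found: Vec<Vec<String>> = Vec::new();
        if max_hops == 0 {
            return found;
        }
        let mut queue = VecDeque::new();
        queue.push_back((vec![from.to_string()], Vec::<String>::new()));
        while let Some((tokens, pools)) = queue.pop_front() {
            let last = tokens.last().expect("a path always has a start token");
            let edges = match self.adj.get(last) {
                Some(edges) => edges,
                None => continue,
            };
            for e in edges {
                if tokens.contains(&e.token_out) {
                    continue; // cycle prevention
                }
                let mut next_pools = pools.clone();
                next_pools.push(e.pool.clone());
                if e.token_out == to {
                    found.push(next_pools);
                } else if next_pools.len() < max_hops {
                    let mut next_tokens = tokens.clone();
                    next_tokens.push(e.token_out.clone());
                    queue.push_back((next_tokens, next_pools));
                }
            }
        }
        // Final sort by route key: hop count first, then pool list (an assumption).
        found.sort_by(|a, b| a.len().cmp(&b.len()).then_with(|| a.cmp(b)));
        found
    }
}
```

A `BTreeMap` is used for the adjacency list in this sketch so that even iterating over tokens is ordered. A `HashMap` works equally well as long as nothing depends on its iteration order.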
I use routing modes to make the latency/coverage tradeoff explicit:
| Mode | Purpose | Behavior |
|---|---|---|
| fast | wallet previews and immediate UI feedback | smaller candidate budget, tighter hop policy |
| balanced | ranked alternatives | broader candidate search |
| best_price | slower power-user comparisons | widest search budget |
This is a product decision as much as an engineering one. A wallet preview and a research route comparison should not pretend to have the same latency budget.
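One way to make the tradeoff explicit in code is to map each mode to a search budget. The sketch below does that with placeholder numbers: the `SearchBudget` type, the `budget` and `effective_hops` functions, and every numeric limit are hypothetical and only preserve the ordering implied by the modes (fast is the narrowest, best_price the widest).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum RoutingMode {
    Fast,
    Balanced,
    BestPrice,
}

#[derive(Debug, PartialEq, Eq)]
struct SearchBudget {
    max_candidates: usize,
    max_hops: usize,
}

impl RoutingMode {
    /// Placeholder budgets; the real limits are not given in this article.
    fn budget(self) -> SearchBudget {
        match self {
            RoutingMode::Fast => SearchBudget { max_candidates: 16, max_hops: 2 },
            RoutingMode::Balanced => SearchBudget { max_candidates: 64, max_hops: 3 },
            RoutingMode::BestPrice => SearchBudget { max_candidates: 256, max_hops: 4 },
        }
    }
}

/// The request's hop limit can only narrow the mode's policy, never widen it.
fn effective_hops(mode: RoutingMode, requested_max_hops: usize) -> usize {
    requested_max_hops.min(mode.budget().max_hops)
}
```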
Caching Route Skeletons, Not Quotes
One easy trap in quote services is caching too much. Full quote responses depend on amount, slippage, deadline, partner fees, diagnostics, and execution options. Caching those can create subtle invalidation bugs.
Aqueducta caches route skeletons instead.
A route skeleton says: “for this snapshot and token pair, these are candidate paths worth evaluating.” It does not say how much output they produce for a particular request amount.
flowchart TD
Key["snapshot_id + token pair + mode + hop policy"] --> Skeletons["Cached route skeletons"]
Skeletons --> QuoteA["Quote amount A"]
Skeletons --> QuoteB["Quote amount B"]
Skeletons --> QuoteC["Quote amount C"]
That gives the hot path a useful optimization without turning the cache into a source of stale quote data. Popular pairs can skip repeated graph search, while every request still runs fresh quote math against the selected snapshot.
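A skeleton cache along these lines can be sketched with a plain map keyed by the fields from the diagram. The `SkeletonKey` and `SkeletonCache` names, the hit and miss counters, and the absence of eviction are simplifications for illustration. Note that the key contains no amount, slippage, or deadline, which is exactly why a cached entry cannot go stale with respect to a particular quote.

```rust
use std::collections::HashMap;

/// Cache key: everything that shapes candidate paths, nothing that shapes amounts.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SkeletonKey {
    snapshot_id: String,
    token_in: String,
    token_out: String,
    mode: String,
    max_hops: usize,
}

/// A skeleton is just an ordered list of pool references (hypothetical shape).
type Skeleton = Vec<String>;

#[derive(Default)]
struct SkeletonCache {
    entries: HashMap<SkeletonKey, Vec<Skeleton>>,
    hits: u64,
    misses: u64,
}

impl SkeletonCache {
    /// Return cached skeletons, or run the graph search once and remember the result.
    fn get_or_build<F>(&mut self, key: SkeletonKey, build: F) -> &[Skeleton]
    where
        F: FnOnce() -> Vec<Skeleton>,
    {
        if self.entries.contains_key(&key) {
            self.hits += 1;
        } else {
            self.misses += 1;
        }
        self.entries.entry(key).or_insert_with(build).as_slice()
    }
}
```

Because `snapshot_id` is part of the key, installing a new snapshot makes old entries unreachable rather than wrong. They only need evicting to reclaim memory.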
Quote Math Boundaries
The quote engine evaluates a route one hop at a time. Each hop receives the current input amount, pool metadata, and pool snapshot. The output of one hop becomes the input of the next.
trait Quoter {
    fn quote_hop(
        &self,
        token_in: &str,
        token_out: &str,
        pool: PoolContext,
        amount_in: u128,
    ) -> Result<HopQuote>;
}
For constant-product pools, the math is straightforward integer arithmetic:
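// Exact-in constant-product quote in integer arithmetic (rounds down).
// Assumes fee_bps <= 10_000, non-empty reserves, and products that fit
// in u128; production code would want checked arithmetic here.
fn quote_constant_product(
    reserve_in: u128,
    reserve_out: u128,
    fee_bps: u16,
    amount_in: u128,
) -> u128 {
    let fee_denominator = 10_000u128;
    // Take the fee from the input before applying x * y = k.
    let amount_after_fee =
        amount_in * (fee_denominator - fee_bps as u128) / fee_denominator;
    amount_after_fee * reserve_out / (reserve_in + amount_after_fee)
}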
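For reference, the function above computes the standard exact-in constant-product output with the fee taken from the input, rounding down at each division:

$$
x' = \left\lfloor \frac{x \,(10\,000 - f)}{10\,000} \right\rfloor, \qquad
y = \left\lfloor \frac{x' \, R_{out}}{R_{in} + x'} \right\rfloor
$$

where $x$ is `amount_in`, $f$ is `fee_bps`, and $R_{in}$, $R_{out}$ are the reserves. A variant with checked arithmetic is sketched below. The name `quote_constant_product_checked` and the choice to return `Option<u128>` are assumptions for illustration. The point is only that overflow, an out-of-range fee, or an empty pool become explicit `None` results instead of a panic or a wrapped value.

```rust
/// Checked variant of the constant-product quote (hypothetical name).
/// Returns None on overflow, a fee above 100%, or a zero denominator.
fn quote_constant_product_checked(
    reserve_in: u128,
    reserve_out: u128,
    fee_bps: u16,
    amount_in: u128,
) -> Option<u128> {
    const FEE_DENOMINATOR: u128 = 10_000;
    let keep_bps = FEE_DENOMINATOR.checked_sub(fee_bps as u128)?;
    let amount_after_fee = amount_in.checked_mul(keep_bps)? / FEE_DENOMINATOR;
    let numerator = amount_after_fee.checked_mul(reserve_out)?;
    let denominator = reserve_in.checked_add(amount_after_fee)?;
    if denominator == 0 {
        return None;
    }
    Some(numerator / denominator)
}
```

As a worked example, with reserves of 1,000,000 and 2,000,000, a 30 bps fee, and an input of 10,000: the amount after fee is 9,970, and the output is 9,970 × 2,000,000 / 1,009,970, which rounds down to 19,743.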
Concentrated liquidity is more involved because the quote depends on tick state. The design keeps that complexity behind the same hop-quote interface. The route engine does not need to know whether a hop is constant-product, CLMM, or backed by an upstream preview call. It only needs a quote quality and an output amount.
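The hop-at-a-time evaluation can be sketched with a trait object per hop. This is a simplified stand-in for the `Quoter` trait shown earlier: `HopQuoter`, `Quality`, `ConstantProductHop`, and `PreviewHop` are hypothetical names, the fixed-rate preview hop is a toy replacement for an upstream preview call, and the rule that a route is only as good as its weakest hop is an assumption.

```rust
/// Hypothetical quality levels; derived Ord makes Approximate < Exact.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Quality {
    Approximate,
    Exact,
}

trait HopQuoter {
    fn quote_hop(&self, amount_in: u128) -> Option<(u128, Quality)>;
}

struct ConstantProductHop {
    reserve_in: u128,
    reserve_out: u128,
    fee_bps: u16,
}

impl HopQuoter for ConstantProductHop {
    fn quote_hop(&self, amount_in: u128) -> Option<(u128, Quality)> {
        let keep_bps = 10_000u128.checked_sub(self.fee_bps as u128)?;
        let after_fee = amount_in.checked_mul(keep_bps)? / 10_000;
        let denominator = self.reserve_in.checked_add(after_fee)?;
        if denominator == 0 {
            return None;
        }
        let out = after_fee.checked_mul(self.reserve_out)? / denominator;
        Some((out, Quality::Exact))
    }
}

/// Toy stand-in for a hop priced by an upstream preview call.
struct PreviewHop {
    rate_num: u128,
    rate_den: u128,
}

impl HopQuoter for PreviewHop {
    fn quote_hop(&self, amount_in: u128) -> Option<(u128, Quality)> {
        if self.rate_den == 0 {
            return None;
        }
        let out = amount_in.checked_mul(self.rate_num)? / self.rate_den;
        Some((out, Quality::Approximate))
    }
}

/// The output of each hop feeds the next; any failing hop fails the route.
fn quote_route(hops: &[Box<dyn HopQuoter>], amount_in: u128) -> Option<(u128, Quality)> {
    let mut amount = amount_in;
    let mut quality = Quality::Exact;
    for hop in hops {
        let (out, q) = hop.quote_hop(amount)?;
        amount = out;
        quality = quality.min(q); // weakest hop decides route quality
    }
    Some((amount, quality))
}
```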
Ranking
After quote math, routes are sorted by:
- quote quality
- highest expected output
- deterministic route key
The third point is easy to overlook. If two routes tie, random map iteration order should not decide which one a wallet sees first.
quotes.sort_by(|a, b| {
    quality_rank(b.quality)
        .cmp(&quality_rank(a.quality))
        .then_with(|| b.amount_out.cmp(&a.amount_out))
        .then_with(|| route_key(&a.plan).cmp(&route_key(&b.plan)))
});
That tie-breaker makes tests sharper. A snapshot replay should fail because behavior changed, not because a collection happened to iterate differently.
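To see the tie-breaker at work, the sketch below wraps the same comparator in a self-contained example. The `Quality` variants, the numeric ranks, and a route key built by joining pool references with `>` are all hypothetical; only the three-level ordering comes from the text.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Quality {
    Exact,
    Approximate,
}

/// Higher rank sorts first (hypothetical ranking).
fn quality_rank(q: Quality) -> u8 {
    match q {
        Quality::Exact => 1,
        Quality::Approximate => 0,
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct RouteQuote {
    quality: Quality,
    amount_out: u128,
    plan: Vec<String>, // pool references, in hop order
}

/// Hypothetical route key: pool references joined in hop order.
fn route_key(plan: &[String]) -> String {
    plan.join(">")
}

fn rank(mut quotes: Vec<RouteQuote>) -> Vec<RouteQuote> {
    quotes.sort_by(|a, b| {
        quality_rank(b.quality)
            .cmp(&quality_rank(a.quality))
            .then_with(|| b.amount_out.cmp(&a.amount_out))
            .then_with(|| route_key(&a.plan).cmp(&route_key(&b.plan)))
    });
    quotes
}
```

With two exact routes that both return 500, the one whose key sorts first (`pool1>pool2` before `pool3`) always wins, regardless of the order in which candidates arrived.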
Response Identity
The API returns deterministic IDs:
- snapshot_id names the input state
- quote_id names the normalized request over that input state
- route_id names a specific route for that normalized request
flowchart TD
Snapshot["snapshot_id"] --> QuoteID["quote_id"]
Request["normalized request"] --> QuoteID
Snapshot --> RouteID["route_id"]
Request --> RouteID
Route["route key"] --> RouteID
The normalized request key sorts unordered filters, lowercases case-insensitive fields, and includes policy-relevant controls. The goal is that semantically equivalent requests produce the same identity material.
This also makes client caching cleaner. An app can key route state by snapshot_id, token pair, amount, slippage, hop policy, and routing mode, then invalidate naturally when the snapshot changes.
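A rough sketch of the normalization and ID derivation follows. The key layout, the choice of which fields are case-insensitive, and the `normalize_request`, `quote_id`, and `route_id` functions are assumptions for illustration, and FNV-1a again stands in for a cryptographic hash so that the block needs only the standard library.

```rust
/// FNV-1a 64-bit hash. Stand-in for a cryptographic hash in this sketch.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Build a canonical request key: lowercase tokens and DEX names (assumed
/// case-insensitive), then sort and dedupe the unordered DEX filter.
fn normalize_request(
    token_in: &str,
    token_out: &str,
    amount_in: u128,
    slippage_bps: u16,
    max_hops: usize,
    allow_dexes: &[&str],
) -> String {
    let mut dexes: Vec<String> = allow_dexes
        .iter()
        .map(|d| d.trim().to_ascii_lowercase())
        .collect();
    dexes.sort();
    dexes.dedup();
    format!(
        "in={}|out={}|amt={}|slip={}|hops={}|allow={}",
        token_in.to_ascii_lowercase(),
        token_out.to_ascii_lowercase(),
        amount_in,
        slippage_bps,
        max_hops,
        dexes.join(",")
    )
}

/// quote_id = hash(snapshot_id, normalized request)
fn quote_id(snapshot_id: &str, request_key: &str) -> String {
    let material = format!("{}\n{}", snapshot_id, request_key);
    format!("q:{:016x}", fnv1a64(material.as_bytes()))
}

/// route_id = hash(snapshot_id, normalized request, route key)
fn route_id(snapshot_id: &str, request_key: &str, route_key: &str) -> String {
    let material = format!("{}\n{}\n{}", snapshot_id, request_key, route_key);
    format!("r:{:016x}", fnv1a64(material.as_bytes()))
}
```

Two requests that differ only in filter order or letter case collapse to the same key, so they share a `quote_id`. Changing the snapshot or the route changes the corresponding ID.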
Operating The Router
The backend exposes two types of status:
- product-level status for clients and dashboards
- infrastructure-level status for orchestration
flowchart LR
API["Quote API"] --> Health["/v1/health"]
API --> Ready["/readyz"]
API --> Live["/livez"]
API --> Status["/v1/status"]
API --> Metrics["/metrics"]
Ready --> Kube["Kubernetes readiness"]
Live --> Kube
Metrics --> Prom["Prometheus"]
Prom --> Grafana["Grafana"]
The status surface includes the things I would want during an incident:
- active snapshot ID
- retained snapshot IDs
- snapshot age
- pool count
- discovery coverage by DEX
- tick coverage for CLMM pools
- route cache entries and warm status
- refresh worker success/failure timestamps
- runtime controls such as enabled DEXes and execution hint mode
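The most valuable design choice here is that liveness and readiness are not the same. The process can be live while the router is not ready to serve traffic because no fresh snapshot is available. That distinction matters in Kubernetes: a failed liveness probe restarts the container, while a failed readiness probe only takes the pod out of service endpoints, which is the right response to a stale snapshot.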
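The split can be sketched as two separate checks. The `RouterState` fields, the 60-second staleness limit, and the function names are hypothetical. The only idea taken from the text is that liveness says nothing about snapshots, while readiness depends on having a fresh one.

```rust
/// Hypothetical slice of router state that readiness depends on.
struct RouterState {
    active_snapshot_as_of_unix_secs: Option<u64>,
    pool_count: usize,
}

// Placeholder threshold; the real staleness limit is not stated in this article.
const MAX_SNAPSHOT_AGE_SECS: u64 = 60;

#[derive(Debug, PartialEq)]
enum Readiness {
    Ready,
    NotReady(&'static str),
}

/// Liveness: the process is up and able to answer. No snapshot checks here.
fn is_live() -> bool {
    true
}

/// Readiness: only accept traffic with a fresh, non-empty active snapshot.
fn readiness(state: &RouterState, now_unix_secs: u64) -> Readiness {
    let as_of = match state.active_snapshot_as_of_unix_secs {
        Some(t) => t,
        None => return Readiness::NotReady("no active snapshot"),
    };
    if now_unix_secs.saturating_sub(as_of) > MAX_SNAPSHOT_AGE_SECS {
        return Readiness::NotReady("active snapshot is stale");
    }
    if state.pool_count == 0 {
        return Readiness::NotReady("active snapshot has no pools");
    }
    Readiness::Ready
}

/// Map readiness to an HTTP status for a readiness endpoint.
fn readyz_status(r: &Readiness) -> u16 {
    match r {
        Readiness::Ready => 200,
        Readiness::NotReady(_) => 503,
    }
}
```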
Testing For The Things That Actually Break
The tests I care about most are not only “does the endpoint return 200?” They are also:
- does the same request over the same snapshot return the same response?
- does a required historical snapshot quote against that snapshot or fail explicitly?
- does route ordering stay stable when two routes tie?
- does the API reject invalid fee, token, hop, and deadline controls?
- does the route cache behave as an optimization rather than a source of quote truth?
- does the service report degraded discovery without hiding it?
Fixed snapshot fixtures are central to this. They let the test suite replay quote behavior without depending on live chain state.
What I Would Keep
The snapshot-first design is the part I would repeat. It gave the system a useful spine:
- discovery is responsible for building named state
- routing is responsible for deterministic candidate generation
- quote math is responsible for evaluating candidates
- the API is responsible for policy, identity, and diagnostics
That separation made the project easier to test and easier to reason about. It also created better vocabulary. When something changes, I can ask a precise question: did the snapshot change, did the request change, did policy change, or did the code change?
For a DeFi routing service, that question is worth designing around.