unirec — Building a Pluggable, RL-Friendly Recommendation Stack (WIP)

Last updated: Oct 22, 2025 (KST)

TL;DR

unirec is a modular recommender framework aimed at RL-friendly ranking, config-driven pipelines, and production-grade retrieval.

Key ideas

  • Plugin architecture: Retriever → Merger → Shaper → SlatePolicy → Evaluator/OPE, all swappable via YAML.
  • Typed interfaces: Encodable[TContext], Encoder[TContext], Encoded[TContext] with Profile/Session/Request contexts.
  • Deterministic caching & reproducibility: explicit fingerprints (class FQN & version) + canonicalization for NumPy/Torch values.
  • Retrieval @ scale: two-tower embeddings + FAISS; snapshot A/B or delta overlay; user vectors from profile/session.
  • RL-ready ranking: bandit UCB or slate-bandit, diversity shaping (MMR), and OPE (IPS/SNIPS/DR + 95% CI).

This is a work in progress; APIs are stabilizing, but the principles below are what we’re converging on.


Why another stack?

  1. Separable concerns — iterate on retrieval and ranking independently.
  2. Reproducibility — pipelines must be replayable offline; cache keys should encode what produced which vector.
  3. Practical RL — ranking benefits from bandits/RL; retrieval must stay fast and robust.

Architecture at a glance

[ CandidateRetriever ] --(pools)--> [ CandidateMerger ] --(set)--> [ CandidateShaper ]
  (FAISS / TwoTower)                                                 (MMR / filters)
                                                                          |
                                                                          v  (logs)
                                                                    CandidateSet
                                                                          |
                                                                          v
                                                           [ SlatePolicy (bandit/RL) ]
                                                                          |
                                                                          v
                                                                    PolicyOutput
                                                                    /           \
                                                                   v             v
                                                           [ Evaluator ]   [ OPEEstimator ]

Each component subclasses Component and implements a tight contract:

  • CandidateRetriever: search_one(state, k) -> list[Candidate]
  • CandidateMerger: merge(pools, user_id, topk) -> CandidateSet
  • CandidateShaper: shape(state) -> CandidateSet
  • SlatePolicy: select_slate(state) -> PolicyOutput
  • Evaluator/OPEEstimator: evaluate/estimate(state) -> dict

Each component declares component_kind as a class constant (e.g., "candidate_retriever"), and pipelines are assembled from YAML.


Context, Encodable, Encoder, Encoded

We separate what you encode from how you encode it.

  • Context: capability boundary with two horizons + optional request:

    • Profile[TContext]: stable identity & metadata (e.g., user id; item id; long-term features)
    • Session[TContext]: short-term signals (recent interactions, recency windows)
    • Request: per-request metadata (page, surface, A/B cell). Optional by design.
  • Encodable[TContext]: holds (profile, session, meta) and carries vector caches.
  • Encoder[TContext]: turns an Encodable[TContext] (+ optional Request) into an Encoded[TContext].
  • Encoded[TContext]: wraps the produced vector, delegates profile/session/meta to the origin Encodable, and carries optional Request.

This keeps retrieval fast (reusing cached vectors when keys match) and ranking flexible (request-aware when needed).

Minimal interface sketch

class Context(Versioned): ...

TContext = TypeVar("TContext", bound=Context)

class Profile(Context, Generic[TContext]):
    @property
    def id(self) -> int: ...

class Session(Context, Generic[TContext]): ...
class Request(Context): ...

class Encodable(Versioned, Generic[TContext]):
    def __init__(self, profile: Profile[TContext], session: Session[TContext], meta: Mapping[str, Any] | None = None): ...
    # holds vector caches keyed by (encoder_key, request_key)

class Encoded(Versioned, Generic[TContext]):
    def __init__(self, vector: Tensor, origin: Encodable[TContext], request: Request | None = None): ...
    @property
    def profile(self) -> Profile[TContext]: ...
    @property
    def session(self) -> Session[TContext]: ...

class Encoder(Versioned, Generic[TContext]):
    @abstractmethod
    def encode(self, encodable: Encodable[TContext], *, request: Request | None = None, **kw) -> Encoded[TContext]: ...

Why optional Request?

Retrieval should remain light and mostly request-agnostic; ranking may use request signals. Optionality simplifies caching and reproducibility.


Fingerprints & Versioning (repro first-class)

We attach a fingerprinter to generate canonical cache keys for encodings.

  • Head fields (always first): __fqn__ (class FQN), __hash_algo__, __hash_len__, __fingerprint_ver__, and optional __cls_version__ when inheriting Versioned.
  • Canonicalization:

    • floats: NaN/Inf/−0.0 normalized, fixed decimals
    • NumPy arrays: (shape, dtype, mean, var)
    • Torch tensors: CPU float32; compute (mean, var) with NaN/Inf handling (fallback when nanvar missing)
    • datetime → UTC ISO, UUID → hex, bytes → base64
    • mappings/sets/lists → stable ordering
  • Output: stable JSON of sorted (key, value) pairs, plus a digest (blake2s, configurable length).

Why head fields? Keys immediately disambiguate who produced them (FQN + version) before payload features.
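A minimal sketch of the fingerprinting idea, assuming illustrative field names and a reduced canonicalizer (floats, bytes, sets, mappings only; the NumPy/Torch summaries are omitted here):

```python
import base64
import hashlib
import json
import math

def _canon(v):
    """Canonicalize values so semantically equal inputs hash identically."""
    if isinstance(v, float):
        if math.isnan(v) or math.isinf(v):
            return repr(v)
        return round(v + 0.0, 6)          # normalizes -0.0, fixed decimals
    if isinstance(v, bytes):
        return base64.b64encode(v).decode()
    if isinstance(v, (set, frozenset)):
        return sorted(_canon(x) for x in v)
    if isinstance(v, dict):
        return {k: _canon(v[k]) for k in sorted(v)}
    if isinstance(v, (list, tuple)):
        return [_canon(x) for x in v]
    return v

def fingerprint(obj_fqn: str, payload: dict, hash_len: int = 16) -> str:
    # Head fields first: who produced the key, with what hashing scheme.
    doc = {"__fqn__": obj_fqn, "__hash_algo__": "blake2s",
           "__hash_len__": hash_len, "__fingerprint_ver__": 1}
    doc.update(_canon(payload))
    blob = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.blake2s(blob.encode(), digest_size=hash_len).hexdigest()

key = fingerprint("unirec.encoders.TwoTowerEncoder",
                  {"dim": 128, "lr": 0.001, "bias": -0.0})
```

Because -0.0 canonicalizes to 0.0 and JSON keys are sorted, two configurations that differ only in representation produce the same digest.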

Cache usage

  • enc_key = fingerprinter.key(encoder)
  • req_key = "∅" (if no request) or fingerprinter.key(request)
  • Encodable keeps:

    • vec_profile_cached[enc_key]
    • vec_session_cached[enc_key]
    • vec_enc_cached[(enc_key, req_key)]

Retrieval: Two-Tower + FAISS

  • Item tower: batch-trained in PyTorch, snapshots exported to FAISS. Index choices are pluggable (IVF/HNSW/PQ). Refresh via A/B snapshot, rolling, or delta overlay.
  • User tower: built from Profile (long-term) + Session (recent). Retrieval uses a light user vector; heavier user models are reserved for ranking.
  • Freshness gating / biasing: at retrieval, we don’t re-infer heavy features; we add cheap scalar biases (e.g., time decay) or hard filters. Save rich signals for ranking.
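The retrieval path above can be sketched in pure NumPy, with brute-force inner product standing in for a FAISS index (IndexFlatIP / IVF / HNSW at scale); the time-decay constants and the age signal are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_items, k = 32, 1000, 20

# Item snapshot: normalized vectors, as exported from the item tower.
item_vecs = rng.standard_normal((n_items, dim)).astype(np.float32)
item_vecs /= np.linalg.norm(item_vecs, axis=1, keepdims=True)

# Light user vector built from profile + session features.
user_vec = rng.standard_normal(dim).astype(np.float32)
user_vec /= np.linalg.norm(user_vec)

scores = item_vecs @ user_vec                 # what index.search() would return
age_days = rng.integers(0, 30, size=n_items)  # cheap precomputed freshness signal
biased = scores + 0.01 * np.exp(-age_days / 7.0)  # scalar bias, no model inference

top_k = np.argsort(-biased)[:k]
```

Note that the bias is a cheap additive scalar over already-computed scores; nothing in the hot path re-runs a model per request.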

Ranking: Bandits & RL

  • Default: sequential slate policy with UCB-style scoring (position weights × per-item UCB scores over the residual picks, with an MMR diversity penalty).
  • RL-ready: treat ranking as a slate bandit (stateful, position-aware), not independent slots. PPO-style policies can come later; bandits get you far with simpler infra.
  • Metrics: NDCG / Recall / ILD / Entropy / Coverage. OPE: IPS / SNIPS / Doubly-Robust with paired bootstrap for 95% CI.
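A hedged sketch of the default slate policy: greedy position-by-position picks scored by UCB, weighted by position, and penalized for MMR-style similarity to items already on the slate. All names and constants here are illustrative, not unirec's actual API:

```python
import numpy as np

def select_slate(mean, count, item_vecs, slate_size=5,
                 c=1.0, lam=0.5, pos_weights=None):
    """Sequentially fill a slate using UCB scores minus an MMR penalty."""
    n = len(mean)
    total = max(int(count.sum()), 1)
    ucb = mean + c * np.sqrt(np.log(total + 1) / (count + 1))   # UCB per item
    if pos_weights is None:
        pos_weights = 1.0 / np.log2(np.arange(2, slate_size + 2))  # DCG-like
    chosen: list[int] = []
    for pos in range(slate_size):
        best, best_score = -1, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            # MMR: penalize max similarity to already-picked items.
            sim = max((float(item_vecs[i] @ item_vecs[j]) for j in chosen),
                      default=0.0)
            score = pos_weights[pos] * ucb[i] - lam * sim
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

rng = np.random.default_rng(1)
vecs = rng.standard_normal((10, 8)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
slate = select_slate(mean=rng.random(10),
                     count=rng.integers(1, 50, 10),
                     item_vecs=vecs)
```

The O(slate_size × n) greedy loop is the point: diversity is enforced only among the handful of candidates that survive retrieval, never pairwise across the full catalog.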

What’s intentionally not in retrieval

  • Re-running heavy MLPs for per-request features
  • Online fine-tuning of item vectors inside the hot path
  • O(n²) pairwise penalties across millions of items

These belong to ranking or offline training/index builds.


Roadmap (near-term)

  • Trainers for two-tower (PyTorch) with export hooks → FAISS builders
  • Concrete User* / Item* context families and default encoders
  • Stronger delta overlay index option with periodic materialization
  • More slate policies (ε-greedy, TS, position-dependent UCB variants)
  • Live metrics dashboard & offline/online parity checks

Closing notes

unirec is pragmatic by design: fast retrieval, smart ranking, reproducible caches.
If you’re experimenting with RL for recsys, this split gives you safe iteration loops: swap modules, keep contracts.

Feedback welcome — this is WIP and evolving.

Tags: Recommendation · Reinforcement Learning · Contextual Bandit