unirec — Building a Pluggable, RL-Friendly Recommendation Stack (WIP)
Last updated: Oct 22, 2025 (KST)
TL;DR
unirec is a modular recommender framework aimed at RL-friendly ranking, config-driven pipelines, and production-grade retrieval.
Key ideas
- Plugin architecture: Retriever → Merger → Shaper → SlatePolicy → Evaluator/OPE, all swappable via YAML.
- Typed interfaces: Encodable[TContext], Encoder[TContext], Encoded[TContext] with Profile/Session/Request contexts.
- Deterministic caching & reproducibility: explicit fingerprints (class FQN & version) + canonicalization for NumPy/Torch values.
- Retrieval @ scale: two-tower embeddings + FAISS; snapshot A/B or delta overlay; user vectors from profile/session.
- RL-ready ranking: bandit UCB or slate-bandit, diversity shaping (MMR), and OPE (IPS/SNIPS/DR + 95% CI).
This is a work in progress; APIs are stabilizing, but the principles below are what we’re converging on.
Why another stack?
- Separable concerns — iterate on retrieval and ranking independently.
- Reproducibility — pipelines must be replayable offline; cache keys should encode what produced which vector.
- Practical RL — ranking benefits from bandits/RL; retrieval must stay fast and robust.
Architecture at a glance
[ CandidateRetriever ] --(pools)--> [ CandidateMerger ] --(set)--> [ CandidateShaper ]
  (FAISS / TwoTower)                                                 (MMR / filters)
        |                                                                  |
        v                                                                  v
      logs                                                           CandidateSet
                                                                           |
                                                                           v
                                                             [ SlatePolicy (bandit/RL) ]
                                                                           |
                                                                           v
                                                                     PolicyOutput
                                                                           |
                                                                           v
                                                        [ Evaluator ]   [ OPEEstimator ]
Each component subclasses Component and implements a tight contract:
- CandidateRetriever: search_one(state, k) -> list[Candidate]
- CandidateMerger: merge(pools, user_id, topk) -> CandidateSet
- CandidateShaper: shape(state) -> CandidateSet
- SlatePolicy: select_slate(state) -> PolicyOutput
- Evaluator/OPEEstimator: evaluate/estimate(state) -> dict
component_kind is a class constant (e.g., "candidate_retriever") and pipelines are assembled by YAML.
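As a sketch, a YAML-assembled pipeline might look like the following. The module paths, class names, and option keys here are invented for illustration; unirec's actual config schema may differ:

```yaml
pipeline:
  candidate_retriever:
    class: unirec.retrieval.FaissTwoTowerRetriever   # hypothetical FQN
    params: {index: hnsw, k: 500}
  candidate_merger:
    class: unirec.merge.RoundRobinMerger
    params: {topk: 200}
  candidate_shaper:
    class: unirec.shape.MMRShaper
    params: {lambda: 0.3}
  slate_policy:
    class: unirec.policy.UCBSlatePolicy
    params: {slate_size: 10}
  evaluator:
    class: unirec.eval.OPEEvaluator
    params: {estimators: [ips, snips, dr], ci: 0.95}
```

Because each entry names a class by FQN plus params, swapping a retriever or policy is a one-line config change rather than a code change.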
Context, Encodable, Encoder, Encoded
We separate what you encode from how you encode it.
- Context: capability boundary with two horizons + optional request:
  - Profile[TContext]: stable identity & metadata (e.g., user id; item id; long-term features)
  - Session[TContext]: short-term signals (recent interactions, recency windows)
  - Request: per-request metadata (page, surface, A/B cell). Optional by design.
- Encodable[TContext]: holds (profile, session, meta) and carries vector caches.
- Encoder[TContext]: turns an Encodable[TContext] (+ optional Request) into an Encoded[TContext].
- Encoded[TContext]: wraps the produced vector, delegates profile/session/meta to the origin Encodable, and carries the optional Request.
This keeps retrieval fast (reusing cached vectors when keys match) and ranking flexible (request-aware when needed).
Minimal interface sketch
from abc import abstractmethod
from typing import Any, Generic, Mapping, TypeVar
from torch import Tensor
class Context(Versioned): ...
TContext = TypeVar("TContext", bound=Context)
class Profile(Context, Generic[TContext]):
@property
def id(self) -> int: ...
class Session(Context, Generic[TContext]): ...
class Request(Context): ...
class Encodable(Versioned, Generic[TContext]):
def __init__(self, profile: Profile[TContext], session: Session[TContext], meta: Mapping[str, Any] | None = None): ...
# holds vector caches keyed by (encoder_key, request_key)
class Encoded(Versioned, Generic[TContext]):
def __init__(self, vector: Tensor, origin: Encodable[TContext], request: Request | None = None): ...
@property
def profile(self) -> Profile[TContext]: ...
@property
def session(self) -> Session[TContext]: ...
class Encoder(Versioned, Generic[TContext]):
@abstractmethod
def encode(self, encodable: Encodable[TContext], *, request: Request | None = None, **kw) -> Encoded[TContext]: ...
Why optional Request?
Retrieval should remain light and mostly request-agnostic; ranking may use request signals. Optionality simplifies caching and reproducibility.
Fingerprints & Versioning (repro first-class)
We attach a fingerprinter to generate canonical cache keys for encodings.
- Head fields (always first): __fqn__ (class FQN), __hash_algo__, __hash_len__, __fingerprint_ver__, and optional __cls_version__ when inheriting Versioned.
- Canonicalization:
  - floats: NaN/Inf/−0.0 normalized, fixed decimals
  - NumPy arrays: (shape, dtype, mean, var)
  - Torch tensors: CPU float32; compute (mean, var) with NaN/Inf handling (fallback when nanvar is missing)
  - datetime → UTC ISO, UUID → hex, bytes → base64
  - mappings/sets/lists → stable ordering
- Output: stable JSON of sorted (key, value) pairs, plus a digest (blake2s, configurable length).
Why head fields? Keys immediately disambiguate who produced them (FQN + version) before payload features.
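A minimal sketch of this style of fingerprinting, assuming blake2s and the head-field layout above. The canonicalization here is a simplified subset (floats, mappings, sets, sequences only), and the function names are illustrative:

```python
import hashlib
import json
import math

def _canon(value):
    """Canonicalize a value so that equal inputs serialize identically."""
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Inf" if value > 0 else "-Inf"
        if value == 0.0:              # normalize -0.0 to 0.0
            value = 0.0
        return format(value, ".6f")   # fixed decimals
    if isinstance(value, (set, frozenset)):
        return sorted(_canon(v) for v in value)     # stable ordering
    if isinstance(value, dict):
        return {str(k): _canon(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_canon(v) for v in value]
    return value

def fingerprint(fields: dict, fqn: str, hash_len: int = 16) -> str:
    head = {"__fqn__": fqn, "__hash_algo__": "blake2s",
            "__hash_len__": hash_len, "__fingerprint_ver__": 1}
    # sort_keys gives a stable JSON; the dunder head fields sort first.
    payload = json.dumps({**head, **_canon(fields)}, sort_keys=True)
    return hashlib.blake2s(payload.encode(), digest_size=hash_len).hexdigest()

k1 = fingerprint({"lr": 0.1, "decay": -0.0}, fqn="pkg.Enc")
k2 = fingerprint({"decay": 0.0, "lr": 0.1}, fqn="pkg.Enc")
assert k1 == k2  # key order and -0.0 are normalized away
```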
Cache usage
- enc_key = fingerprinter.key(encoder)
- req_key = "∅" (if no request) or fingerprinter.key(request)
- Encodable keeps:
  - vec_profile_cached[enc_key]
  - vec_session_cached[enc_key]
  - vec_enc_cached[(enc_key, req_key)]
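The lookup path for the (enc_key, req_key) cache can be sketched as follows; Fingerprinter, ToyEncoder, and ToyEncodable below are simplified stand-ins, not unirec's real classes:

```python
class Fingerprinter:
    def key(self, obj) -> str:
        # Stand-in: real keys come from the canonical fingerprinting above.
        return f"{type(obj).__name__}:{getattr(obj, 'version', 0)}"

class ToyEncoder:
    version = 1
    calls = 0
    def encode(self, encodable, request=None):
        self.calls += 1
        return [1.0, 2.0]  # pretend vector

class ToyEncodable:
    def __init__(self):
        self.vec_enc_cached = {}

def encode_with_cache(encodable, encoder, fp, request=None):
    enc_key = fp.key(encoder)
    req_key = "∅" if request is None else fp.key(request)
    key = (enc_key, req_key)
    if key in encodable.vec_enc_cached:
        return encodable.vec_enc_cached[key]  # hit: encoder + request unchanged
    out = encoder.encode(encodable, request=request)
    encodable.vec_enc_cached[key] = out
    return out

e, enc, fp = ToyEncodable(), ToyEncoder(), Fingerprinter()
encode_with_cache(e, enc, fp)
encode_with_cache(e, enc, fp)
assert enc.calls == 1  # second call served from cache
```

Because the key embeds the encoder's fingerprint, bumping the encoder's version invalidates stale vectors automatically.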
Retrieval: Two-Tower + FAISS
- Item tower: batch-trained in PyTorch, snapshots exported to FAISS. Index choices are pluggable (IVF/HNSW/PQ). Refresh via A/B snapshot, rolling, or delta overlay.
- User tower: built from Profile (long-term) + Session (recent). Retrieval uses a light user vector; heavier user models are reserved for ranking.
- Freshness gating / biasing: at retrieval, we don’t re-infer heavy features; we add cheap scalar biases (e.g., time decay) or hard filters. Save rich signals for ranking.
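A brute-force NumPy stand-in for this retrieval path; in production the dot-product search below would be a FAISS index (e.g., IndexFlatIP or an IVF/HNSW/PQ variant), and the profile/session blend weight alpha is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_items = 8, 100
# Unit-normalized item vectors, as a two-tower item snapshot would provide.
item_vecs = rng.normal(size=(n_items, dim)).astype("float32")
item_vecs /= np.linalg.norm(item_vecs, axis=1, keepdims=True)

def user_vector(profile_vec, session_vecs, alpha=0.7):
    """Light user vector: blend long-term profile with mean of recent session items."""
    v = alpha * profile_vec + (1 - alpha) * session_vecs.mean(axis=0)
    return v / np.linalg.norm(v)

def search(uvec, k=5):
    """Exact inner-product top-k; FAISS replaces this with an ANN index."""
    scores = item_vecs @ uvec
    top = np.argsort(-scores)[:k]
    return top, scores[top]

u = user_vector(item_vecs[0], item_vecs[[3, 4]])
ids, scores = search(u, k=5)
```

Keeping the user vector this cheap is what lets retrieval stay request-agnostic and fast; anything heavier moves to ranking.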
Ranking: Bandits & RL
- Default: sequential slate policy with UCB-style scoring (position weights × residual picks × diversity penalty MMR).
- RL-ready: treat ranking as a slate bandit (stateful, position-aware), not independent slots. PPO-style policies can come later; bandits get you far with simpler infra.
- Metrics: NDCG / Recall / ILD / Entropy / Coverage. OPE: IPS / SNIPS / Doubly-Robust with paired bootstrap for 95% CI.
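For intuition, here is a simplified IPS/SNIPS sketch with a bootstrap 95% CI, using one logged action per round and binary reward on synthetic data (unirec's estimators operate on slate-level PolicyOutput logs, and the paired variant resamples the same rounds for two policies being compared):

```python
import numpy as np

def ips(rewards, target_p, logging_p):
    """Inverse propensity scoring: reweight logged rewards by pi_target/pi_log."""
    w = target_p / logging_p
    return float(np.mean(w * rewards))

def snips(rewards, target_p, logging_p):
    """Self-normalized IPS: lower variance at the cost of a small bias."""
    w = target_p / logging_p
    return float(np.sum(w * rewards) / np.sum(w))

def bootstrap_ci(estimator, rewards, tp, lp, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI, resampling logged rounds with replacement."""
    rng = np.random.default_rng(seed)
    n = len(rewards)
    stats = [estimator(rewards[idx], tp[idx], lp[idx])
             for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    return float(np.percentile(stats, 2.5)), float(np.percentile(stats, 97.5))

rng = np.random.default_rng(1)
n = 1000
lp = np.full(n, 0.5)                  # logging policy: uniform over 2 actions
tp = rng.choice([0.2, 0.8], size=n)   # target-policy propensities on logged actions
r = rng.binomial(1, 0.3, size=n).astype(float)
est = snips(r, tp, lp)
lo, hi = bootstrap_ci(snips, r, tp, lp)
```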
What’s intentionally not in retrieval
- Re-running heavy MLPs for per-request features
- Online fine-tuning of item vectors inside the hot path
- O(n²) pairwise penalties across millions of items
These belong to ranking or offline training/index builds.
Roadmap (near-term)
- Trainers for two-tower (PyTorch) with export hooks → FAISS builders
- Concrete User*/Item* context families and default encoders
- Stronger delta overlay index option with periodic materialization
- More slate policies (ε-greedy, TS, position-dependent UCB variants)
- Live metrics dashboard & offline/online parity checks
Closing notes
unirec is pragmatic by design: fast retrieval, smart ranking, reproducible caches.
If you’re experimenting with RL for recsys, this split gives you safe iteration loops: swap modules, keep contracts.
Feedback welcome — this is WIP and evolving.