Autoregressive Policy

Posted on June 21, 2025

Autoregressive Policy for Multi-Step Action Modeling

In architectural design automation, many actions are naturally composed of multiple interdependent sub-actions. For example, placing a room may require specifying its type, position, orientation, and adjacency constraints—all of which are tightly coupled.

However, if we model such multi-step actions with a standard factorized action head (e.g., several categorical outputs sampled independently given the state), we lose the dependency structure between the sub-actions. The policy can then produce incoherent or invalid combinations, which harms the quality and feasibility of the generated design.

To address this, we use an autoregressive policy, which explicitly models the joint distribution over sub-actions in a sequential and conditional manner.

Why Autoregressive?

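Sampling all sub-actions independently assumes that the joint distribution factorizes into a product of marginals. For coupled sub-actions this assumption generally fails:

P(a₁, a₂, …, aₙ) ≠ P(a₁) · P(a₂) · … · P(aₙ)

Instead, we factor the joint distribution into a chain of conditional probabilities: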

P(a₁, a₂, …, aₙ) = P(a₁) · P(a₂ | a₁) · P(a₃ | a₁, a₂) · … · P(aₙ | a₁, …, aₙ₋₁)

This factorization is exact (it is the chain rule of probability), so nothing is lost by modeling the sub-actions in sequence. Each sub-action is sampled in turn, conditioned on the sub-actions sampled before it, which lets the policy capture their interdependencies and make more coherent, context-aware decisions.
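As a small numeric illustration of the difference between the two factorizations, the sketch below uses a made-up joint distribution over two sub-actions (room type and orientation). The probabilities and the helper names `marginal`, `independent`, and `chain_rule` are hypothetical and chosen only to make the point visible.

```python
from collections import defaultdict

# Hypothetical joint distribution over (room_type, orientation).
# The numbers are invented for illustration only.
joint = {
    ("bedroom", "north"): 0.10,
    ("bedroom", "south"): 0.40,
    ("kitchen", "north"): 0.50,
    ("kitchen", "south"): 0.00,  # treated as an invalid combination
}

def marginal(joint, axis):
    """Sum the joint over every sub-action except the one at `axis`."""
    out = defaultdict(float)
    for action, p in joint.items():
        out[action[axis]] += p
    return dict(out)

p_type = marginal(joint, 0)    # P(a1)
p_orient = marginal(joint, 1)  # P(a2)

def independent(a1, a2):
    """Product of marginals: P(a1) * P(a2)."""
    return p_type[a1] * p_orient[a2]

def chain_rule(a1, a2):
    """P(a1) * P(a2 | a1), with P(a2 | a1) = P(a1, a2) / P(a1)."""
    conditional = joint[(a1, a2)] / p_type[a1]
    return p_type[a1] * conditional

for action, p in joint.items():
    print(action, "joint:", p,
          "independent:", round(independent(*action), 3),
          "chain rule:", round(chain_rule(*action), 3))
```

With these toy numbers, the independent model puts probability 0.2 on the (kitchen, south) pair that the joint distribution rules out, while the chain-rule product reproduces the joint exactly.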

How It Works

Sequential Sampling: Instead of outputting all sub-actions in parallel, the model generates them one by one in a predefined order.
State Augmentation: At each step, the current state is augmented with the previously sampled sub-actions.
Conditioned Prediction: The model predicts the next sub-action based on the current state and the history of previously selected sub-actions.

This framework allows the policy to adaptively condition on earlier choices, enabling more structured and meaningful action generation.
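The three steps above can be sketched as a short sampling loop. This is a minimal illustration rather than the actual model: the `heads` structure, the `logits_fn` callables, and the toy logits are all assumptions standing in for a learned network such as a Transformer or RNN.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_autoregressive(state, heads, rng):
    """Sample sub-actions one by one; return (sub_actions, log_prob).

    `heads` is an ordered list of (name, choices, logits_fn) tuples, where
    logits_fn(state, history) returns one logit per choice. These names are
    hypothetical placeholders for a learned model.
    """
    history = {}   # previously sampled sub-actions
    log_prob = 0.0
    for name, choices, logits_fn in heads:
        # State augmentation: the head sees the state plus the history.
        probs = softmax(logits_fn(state, history))
        idx = rng.choices(range(len(choices)), weights=probs, k=1)[0]
        history[name] = choices[idx]
        # Chain rule: the joint log-probability is the sum of the
        # conditional log-probabilities.
        log_prob += math.log(probs[idx])
    return history, log_prob

# Toy heads: the second head's logits depend on the first choice.
def type_logits(state, history):
    return [0.0, 0.0]

def orientation_logits(state, history):
    # Hypothetical preference: kitchens strongly favor north.
    return [3.0, -3.0] if history["room_type"] == "kitchen" else [0.0, 1.0]

heads = [
    ("room_type", ["bedroom", "kitchen"], type_logits),
    ("orientation", ["north", "south"], orientation_logits),
]

action, log_prob = sample_autoregressive({"rooms": []}, heads, random.Random(0))
print(action, log_prob)
```

Returning the summed log-probability alongside the sampled sub-actions is a design choice made here because policy-gradient methods typically need the log-probability of the full composite action.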

Advantages

Preserves dependencies between sub-actions
Improves sample quality by avoiding invalid combinations
Enables fine-grained control over multi-dimensional decisions
Compatible with sequence models such as Transformers or RNNs

Example Use Case

In a floor plan generation task, the placement of a new room might involve the following sub-actions:

Room type (e.g., bedroom, kitchen)
Target location (x, y coordinates)
Orientation (e.g., north-facing)
Adjacency (e.g., near bathroom)

Using an autoregressive policy, the model selects each sub-action conditioned on the earlier ones, so that, for example, the room type informs the orientation, and the adjacency choice depends on the room’s type and location.
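To make the room-placement example concrete, the sketch below compares independent sampling with conditional sampling on a toy problem. Everything in it is an assumption made for illustration: the 2x2 grid, the design rules in `is_valid`, and the hand-written conditionals, which stand in for the conditional distributions a trained policy would learn.

```python
import random

ROOM_TYPES = ["bedroom", "kitchen"]
LOCATIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # toy 2x2 grid of (x, y) cells
ORIENTATIONS = ["north", "south"]
NEIGHBORS = ["bathroom", "living_room"]

def is_valid(action):
    """Hypothetical design rules, invented for this example."""
    room, loc, orient, adj = action
    if room == "kitchen" and orient == "south":
        return False  # assume kitchens must face north
    if room == "bedroom" and adj != "bathroom":
        return False  # assume bedrooms must sit near a bathroom
    if orient == "north" and loc[1] == 0:
        return False  # assume row y=0 has no north facade
    return True

def sample_independent(rng):
    """Baseline: every sub-action drawn uniformly, ignoring the others."""
    return (rng.choice(ROOM_TYPES), rng.choice(LOCATIONS),
            rng.choice(ORIENTATIONS), rng.choice(NEIGHBORS))

def sample_autoregressive(rng):
    """Each sub-action is drawn only from options that are consistent
    with the sub-actions chosen before it."""
    room = rng.choice(ROOM_TYPES)
    # Location conditioned on room type. Kitchens avoid row y=0 so that
    # a north orientation remains possible later in the sequence.
    locs = [cell for cell in LOCATIONS
            if not (room == "kitchen" and cell[1] == 0)]
    loc = rng.choice(locs)
    # Orientation conditioned on room type and location.
    orients = [o for o in ORIENTATIONS
               if not (room == "kitchen" and o == "south")
               and not (o == "north" and loc[1] == 0)]
    orient = rng.choice(orients)
    # Adjacency conditioned on room type.
    adjs = ["bathroom"] if room == "bedroom" else NEIGHBORS
    adj = rng.choice(adjs)
    return (room, loc, orient, adj)

rng = random.Random(0)
n = 10_000
invalid_indep = sum(not is_valid(sample_independent(rng)) for _ in range(n))
invalid_auto = sum(not is_valid(sample_autoregressive(rng)) for _ in range(n))
print("independent invalid rate:", invalid_indep / n)
print("autoregressive invalid rate:", invalid_auto / n)
```

Under these toy rules, 22 of the 32 possible combinations break at least one rule, so uniform independent sampling is invalid most of the time, while the conditional sampler never produces an invalid placement. The example also shows that the order of sub-actions matters: an early choice has to leave the later sub-actions with at least one feasible option.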

Autoregressive modeling is a powerful approach for structured decision-making problems like architectural design, where a single “action” spans a sequence of semantically rich decisions.

