Pointer Networks

Posted on June 21, 2025

Pointer Networks for Entity Selection in Reinforcement Learning

In the reinforcement learning (RL) framework for architectural design automation, many actions involve selecting an entity—such as choosing a room to modify, connecting a component, or subdividing a spatial region. While these selection actions may seem straightforward, modeling them correctly requires attention to the structure and invariance of the input.

A naive approach using a standard action head treats entity selection as classification over a fixed set of indices. This introduces a positional bias: the model learns to associate actions with specific index positions, regardless of the content of the entities or the relationships among them. That is a problem when the entities form an unordered set whose size varies from state to state, so that the correct choice should not depend on the order in which entities happen to be listed. This is often the case in architectural design.

To address this, we implemented entity selection using a Pointer Network.

Why Pointer Networks?

Pointer Networks (Vinyals et al., 2015) are designed to output positions from the input sequence itself, rather than selecting from a fixed output vocabulary. This approach offers several key advantages:

Removes dependence on static entity indices
Ensures entity selection is based on semantic and relational information
Respects relational inductive bias, crucial for modeling graph-structured or set-based data

By aligning the action space with the actual structure of the problem, pointer networks make the model more robust and generalizable across varying input configurations.
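The independence from entity indices can be checked directly. The snippet below is a minimal sketch, not the implementation used in this project: it assumes plain dot-product scoring, random vectors standing in for encoder outputs, and a query that does not depend on entity order. The name `pointer_probs` is hypothetical. Shuffling the entities only shuffles the probabilities in the same way, so each entity keeps its score wherever it appears in the list.

```python
import math
import random

def pointer_probs(entities, query):
    """Softmax over dot-product scores between a query and each entity vector."""
    scores = [sum(q * x for q, x in zip(query, e)) for e in entities]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

rng = random.Random(0)
# Stand-ins for encoded entities (assumption: 5 entities, 4 dimensions each).
entities = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
query = [rng.uniform(-1, 1) for _ in range(4)]

probs = pointer_probs(entities, query)

# Present the same entities in a different order.
perm = [3, 0, 4, 1, 2]
shuffled = [entities[i] for i in perm]
probs_shuffled = pointer_probs(shuffled, query)

# Position k of the shuffled list holds entity perm[k], with the same probability.
same = all(
    math.isclose(probs_shuffled[k], probs[perm[k]]) for k in range(len(perm))
)
print(same)
```

The same function also accepts any number of entities, since nothing in it is tied to a fixed output size.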

How It Works

The pointer network mechanism operates as follows:

Entity Encoding: Each entity is encoded into a vector representation via an encoder (e.g., a Transformer or Graph Neural Network).
Contextual Attention: A decoder state (or context vector) serves as a query, and an attention score is computed between it and each entity representation.
Pointer Distribution: The attention scores are normalized with a softmax into a probability distribution over entities, from which a selection is made.

This allows the model to “point” directly to one of the input entities based on its content and contextual relevance—without relying on fixed position or index.
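The three steps above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions rather than the code behind this post: the scoring uses the additive form from Vinyals et al. (2015), u_j = v · tanh(W1 e_j + W2 c), the encoder outputs and weights are random placeholders, and the `valid` mask (for entities that are not eligible for the current action) is an addition that the text above does not describe. All function names are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def additive_scores(entities, context, W1, W2, v):
    """u_j = v . tanh(W1 e_j + W2 c) for every entity vector e_j."""
    c_proj = matvec(W2, context)
    scores = []
    for e in entities:
        e_proj = matvec(W1, e)
        hidden = [math.tanh(a + b) for a, b in zip(e_proj, c_proj)]
        scores.append(sum(vi * h for vi, h in zip(v, hidden)))
    return scores

def masked_softmax(scores, valid):
    """Softmax over entries with valid[j] True; masked entries get probability 0."""
    m = max(s for s, ok in zip(scores, valid) if ok)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, valid)]
    total = sum(exps)
    return [x / total for x in exps]

def select_entity(probs, rng):
    """Sample an entity index and return it with its log-probability."""
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, math.log(probs[idx])

def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, h = 4, 8                          # entity size and hidden size (assumed)
entities = rand_matrix(rng, 6, d)    # placeholder for encoder outputs
context = [rng.gauss(0.0, 0.5) for _ in range(d)]
W1, W2 = rand_matrix(rng, h, d), rand_matrix(rng, h, d)
v = [rng.gauss(0.0, 0.5) for _ in range(h)]
valid = [True, True, False, True, True, False]  # hypothetical eligibility mask

probs = masked_softmax(additive_scores(entities, context, W1, W2, v), valid)
idx, logp = select_entity(probs, rng)
print(idx, logp)
```

The returned log-probability is the quantity a policy-gradient method would use for the selection action. In a real model the weights would be learned and the entity vectors would come from the Transformer or Graph Neural Network encoder.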

Summary

Using a pointer network for entity selection leads to:

Better alignment with the variable and relational nature of architectural entities
Avoidance of artificial biases caused by index-based modeling
A more principled and flexible policy for reinforcement learning agents in design automation
