Abstract
- Kool's "Attention, Learn to Solve Routing Problems!" is limited to the single-agent setting.
- Most existing methods target the MinSum objective, whereas this paper focuses on MinMax.
- MinMax means minimizing the length of the longest subroute.
- mTSP is hard because a distributed scheduling policy has to produce cooperative, strategic routing.
- The delayed and sparse single reward also makes training difficult.
- ScheduleNet applies to an arbitrary number of workers and tasks -> good scalability.
- Trained with Clipped REINFORCE.
Introduction
- Unlike MinSum, the MinMax objective encourages a balanced distribution of tasks across agents.
The main contributions of this paper are as follows.
- A decentralized policy that applies to an arbitrary number of agents and tasks.
- Attention-based node embeddings
- Training with a single delayed shared reward signal
Problem Formulation
State Definition
- Salesmen (workers): V_T = {1, 2, …, m}
- Cities (tasks): V_C = {m+1, m+2, …, m+N}
- p_i: 2D coordinates of the entities (salesmen, cities, and the depot)
  - time-dependent for workers
  - static for tasks and the depot
- τ: event at which any worker reaches its assigned city
- t(τ): time of event τ
- State of entity i: s_i^τ = (p_i^τ, 1_active^τ, 1_assigned^τ)
- 1_active^τ:
  - an already-visited task is inactive
  - a worker at the depot is inactive
- 1_assigned^τ:
  - whether the worker is assigned to a task or not
- s_env^τ: the current time and the sequence of tasks visited by each worker
- s_τ, the state at the τ-th event: ({s_i^τ}_{i=1}^{m+N}, s_env^τ)
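A minimal sketch of how this state could be represented in code; the class and field names are illustrative, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class EntityState:
    """Per-entity state s_i^τ = (p_i^τ, 1_active^τ, 1_assigned^τ)."""
    pos: tuple[float, float]   # p_i^τ: 2D coordinates
    active: bool               # 1_active^τ
    assigned: bool             # 1_assigned^τ (meaningful for workers)

@dataclass
class EnvState:
    """s_env^τ: global information at the τ-th event."""
    current_time: float
    visit_history: dict[int, list[int]]  # worker id -> sequence of visited task ids

@dataclass
class MTSPState:
    """s_τ = ({s_i^τ}_{i=1}^{m+N}, s_env^τ)."""
    entities: list[EntityState]  # workers (1..m) first, then tasks (m+1..m+N)
    env: EnvState
```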
Action
- worker-to-task assignment
Reward
- For nonterminal events, r(s_τ) = 0
- For the terminal event T, r(s_T) = t(T), i.e., the makespan
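The reward is therefore sparse and delayed: zero at every intermediate event and the makespan only at the end. A small sketch reusing the MTSPState class above (the function name and sign convention are assumptions):

```python
def reward(state: MTSPState, is_terminal: bool) -> float:
    """Sparse, delayed reward: 0 for nonterminal events,
    t(T) (the makespan) at the terminal event."""
    if not is_terminal:
        return 0.0
    return state.env.current_time  # t(T): completion time of the whole mTSP episode
```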
ScheduleNet
- Whenever an event occurs, construct a graph as shown above.
- The edge feature is the Euclidean distance between the two nodes.
- h_i: node embedding (maybe the node coordinates?)
- h_ij: edge embedding (a high-dimensional encoding of the Euclidean distance)
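A rough sketch of the per-event graph construction, assuming a fully connected graph over all entities with the Euclidean distance as the raw edge feature; the helper name and feature layout are illustrative only.

```python
import math

def build_graph(entities):
    """Build raw node/edge features for one event.

    entities: list of EntityState (workers, tasks, depot) as sketched above.
    Returns raw node features and Euclidean-distance edge features, which the
    raw-2-hid layers would map to the initial embeddings h_i^(0) and h_ij^(0).
    """
    node_feats = [
        (e.pos[0], e.pos[1], float(e.active), float(e.assigned))
        for e in entities
    ]
    edge_feats = {}
    for i, e_i in enumerate(entities):
        for j, e_j in enumerate(entities):
            if i != j:
                edge_feats[(i, j)] = math.dist(e_i.pos, e_j.pos)  # Euclidean distance
    return node_feats, edge_feats
```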
A single iteration of TGA embedding consists of three phases: (1) edge update, (2) message aggregation, (3) node update.
- h'_ij: edge embedding after the TGA update (the message from source node j to destination node i)
- z_ij: how valuable node i is to node j (just like the compatibility score in self-attention)
- k_j: type of entity j (e.g., an active worker if 1_active^τ = 1)
How does Type-aware Graph Attention work?
“Type-aware” edge update
i. Context embedding c_ij of edge e_ij
- The context embedding depends on the type of the source node j.
ii. Type-aware edge encoding
- The MI layer dynamically generates its parameters depending on the context c_ij.
- The result is a dynamic edge feature that varies with the source node type.
iii. Type-aware edge encoding and attention logit
- The upper equation is for TGA_E (producing the edge encoding h'_ij).
- The lower equation is for TGA_A (producing the attention logit z_ij).
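A hedged sketch of what such a type-aware, context-conditioned edge update could look like in PyTorch. The hypernetwork-style "MI" layer, the layer sizes, and the way the context is assembled are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TypeAwareEdgeUpdate(nn.Module):
    """Sketch of a context-conditioned edge update that returns an updated
    edge encoding h'_ij (TGA_E-like) and an attention logit z_ij (TGA_A-like)."""

    def __init__(self, dim: int, num_types: int):
        super().__init__()
        self.type_emb = nn.Embedding(num_types, dim)
        self.context = nn.Sequential(nn.Linear(4 * dim, dim), nn.ReLU())
        # "MI"-style layer: its weights are generated dynamically from the context c_ij.
        self.weight_gen = nn.Linear(dim, dim * dim)
        self.bias_gen = nn.Linear(dim, dim)
        self.attn_head = nn.Linear(dim, 1)

    def forward(self, h_i, h_j, h_ij, k_j):
        # h_i, h_j, h_ij: (E, dim); k_j: (E,) integer types of the source nodes.
        c_ij = self.context(torch.cat([h_i, h_j, h_ij, self.type_emb(k_j)], dim=-1))
        E, d = h_ij.shape
        W = self.weight_gen(c_ij).view(E, d, d)     # dynamic per-edge weights
        b = self.bias_gen(c_ij)                     # dynamic per-edge bias
        h_prime = torch.relu(torch.bmm(W, h_ij.unsqueeze(-1)).squeeze(-1) + b)
        z_ij = self.attn_head(h_prime).squeeze(-1)  # attention logit
        return h_prime, z_ij
```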
Message Aggregation
i. Compute attention scores separately for each node type k.
  - For node v_i, compute attention scores over its type-k neighbors.
  - Since the current setting has three kinds of nodes (agent, task, depot), presumably two kinds of attention scores will be produced.
ii. Multiply the per-type attention scores with the node embeddings computed earlier and concatenate the results across node types.
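A sketch of per-type attention followed by concatenation, for a single destination node; the signature and masking scheme are assumptions.

```python
import torch

def type_wise_aggregate(z, h_src, src_type, num_types):
    """Sketch of type-aware message aggregation for one destination node v_i.

    z:        (E_i,) attention logits z_ij of the incoming edges of v_i
    h_src:    (E_i, d) embeddings/messages from the source nodes
    src_type: (E_i,) integer type of each source node
    Returns the concatenation of one attention-weighted sum per source type.
    """
    d = h_src.shape[-1]
    per_type = []
    for k in range(num_types):
        mask = src_type == k
        if mask.any():
            alpha = torch.softmax(z[mask], dim=0)                  # softmax within type k
            per_type.append((alpha.unsqueeze(-1) * h_src[mask]).sum(dim=0))
        else:
            per_type.append(h_src.new_zeros(d))                    # no type-k neighbors
    return torch.cat(per_type, dim=-1)                             # (num_types * d,)
```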
Node Update
i. Context embedding c_i of node v_i
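By analogy with the edge update, a hedged sketch of a node update that builds a context c_i from the node's own embedding and its aggregated message; this mirrors the structure above and is not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class NodeUpdate(nn.Module):
    """Sketch: combine node v_i's current embedding with its type-wise aggregated
    message into a context c_i, then produce the updated node embedding."""

    def __init__(self, dim: int, num_types: int):
        super().__init__()
        self.context = nn.Sequential(nn.Linear(dim + num_types * dim, dim), nn.ReLU())
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h_i, m_i):
        # h_i: (N, dim) current node embeddings; m_i: (N, num_types * dim) aggregated messages.
        c_i = self.context(torch.cat([h_i, m_i], dim=-1))
        return torch.relu(self.update(torch.cat([h_i, c_i], dim=-1)))
```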
Why type-aware aggregation matters
- The node distribution is highly skewed: tasks are likely to far outnumber the other node types.
- Type-aware aggregation mitigates, to some extent, the training difficulty caused by this imbalance.
Overall Procedure
- raw-2-hid: encodes the raw node and edge features to obtain the initial embeddings h_i^(0), h_ij^(0)
- hid-2-hid: encodes the target subgraph G_{s_τ}
  - The subgraph is composed of the target-worker (unassigned-worker) node and all unassigned-city nodes.
- The hid-2-hid layer is repeated H times to obtain the final hidden embeddings h_i^(H), h_ij^(H).
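Wiring the earlier sketches together, the overall embedding procedure might look roughly like this; subgraph selection and per-round parameter sharing are simplified, so this is a structural illustration rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class ScheduleNetEncoderSketch(nn.Module):
    """Sketch: raw-2-hid encoding followed by H rounds of hid-2-hid TGA updates
    (edge update -> type-wise message aggregation -> node update)."""

    def __init__(self, node_in, edge_in, dim, num_types, H):
        super().__init__()
        self.node_raw2hid = nn.Linear(node_in, dim)
        self.edge_raw2hid = nn.Linear(edge_in, dim)
        self.H = H
        self.num_types = num_types
        # One TGA block reused across rounds purely to keep the sketch short.
        self.edge_update = TypeAwareEdgeUpdate(dim, num_types)
        self.node_update = NodeUpdate(dim, num_types)

    def forward(self, x_node, x_edge, edges, node_type):
        # x_node: (N, node_in), x_edge: (E, edge_in), node_type: (N,) long tensor,
        # edges: list of (dst, src) pairs, i.e. messages flow from source j to destination i.
        h = self.node_raw2hid(x_node)    # h_i^(0)
        h_e = self.edge_raw2hid(x_edge)  # h_ij^(0)
        dst = torch.tensor([i for (i, _) in edges])
        src = torch.tensor([j for (_, j) in edges])
        for _ in range(self.H):
            h_e, z = self.edge_update(h[dst], h[src], h_e, node_type[src])
            msgs = torch.stack([
                type_wise_aggregate(z[dst == i], h_e[dst == i],
                                    node_type[src[dst == i]], self.num_types)
                for i in range(h.shape[0])
            ])
            h = self.node_update(h, msgs)
        return h, h_e                    # h_i^(H), h_ij^(H)
```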
Probability Generation Process
Training
Makespan Normalization
i. The only reward signal is the makespan of the mTSP solution, M(π).
ii. This reward varies with the problem size, the topology of the map, and …
iii. So it is normalized using the equation below.
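Since the normalization equation is not reproduced in this note, the sketch below only illustrates the general idea of baseline-relative makespan normalization together with a clipped REINFORCE-style update; the exact normalization and clipping used in the paper may differ.

```python
import torch

def normalized_makespan(makespan: float, baseline_makespan: float) -> float:
    """Illustrative baseline-relative normalization (not necessarily the paper's equation):
    express the makespan relative to a baseline rollout so its scale is comparable
    across problem sizes and map topologies."""
    return (makespan - baseline_makespan) / baseline_makespan

def clipped_reinforce_loss(log_probs, makespan, baseline_makespan, clip=1.0):
    """Sketch of a clipped REINFORCE objective: the normalized makespan serves as the
    (negated, clipped) advantage shared by all actions in the episode.

    log_probs: list of log-probability tensors of the chosen assignments.
    """
    advantage = -torch.clamp(
        torch.as_tensor(normalized_makespan(makespan, baseline_makespan)), -clip, clip
    )
    return -(advantage * torch.stack(log_probs).sum())
```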
Comments