2025

ECAI
DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning

Linh Le, Minh Hoang Nguyen, Duc Kieu, Hung Le, Hung The Tran, Sunil Gupta

European Conference on Artificial Intelligence (ECAI) 2025

We study cross-domain offline RL with limited target data, where neural domain-gap estimators often overfit and only part of the source dataset overlaps with the target domain. We propose DmC, combining a k-NN proximity estimator with a nearest-neighbor–guided diffusion model to generate target-aligned source samples, and show strong gains over prior methods on MuJoCo benchmarks.

ECML-PKDD
Hybrid Cross-domain Robust Reinforcement Learning

Linh Le, Minh Hoang Nguyen, Hung Le, Hung The Tran, Sunil Gupta

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 2025

Offline robust RL is appealing when only fixed data is available, but it typically needs large datasets, while simulator data is cheaper yet suffers from dynamics mismatch. We propose HYDRO, a hybrid cross-domain robust RL framework that uses an online simulator alongside limited offline data, selecting reliable simulator samples via performance-gap–based uncertainty filtering and prioritized sampling, and show it outperforms prior methods across diverse tasks.

IJCAI
Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer

Minh Hoang Nguyen, Linh Le, Thommen George Karimpanal, Sunil Gupta, Hung Le

International Joint Conference on Artificial Intelligence (IJCAI) 2025

Decision Transformers often struggle in real-world offline RL because datasets are limited and dominated by suboptimal behavior. We propose CRDT, which injects counterfactual experiences to enable out-of-distribution reasoning and trajectory stitching without architectural changes, and show consistent improvements over standard DT on Atari and D4RL, including limited-data and dynamics-shifted settings.


2024

AAMAS
Policy Learning for Off-Dynamics RL with Deficient Support

Linh Le, Hung The Tran, Sunil Gupta

International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2024

We tackle cross-domain policy transfer under large dynamics mismatch, where the common "full support" assumption (the simulator covers all target transitions) is unrealistic. We propose a simple method that skews and extends source support toward target support to reduce support deficiencies, and show consistent gains over prior approaches across diverse benchmarks.
