Publications (* equal contribution)

🎙️ Audio Large Language Models

[6]
Training-Efficient Text-to-Music Generation with State-Space Modeling
Wei-Jaw Lee, Fang-Chih Hsieh, Xuanjun Chen, Fang-Duo Tsai, Yi-Hsuan Yang
TASLP (submitted) 2026 bib · arXiv · Project · Code
[5]
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, ..., Xuanjun Chen, ..., Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
TASLP 2026 bib · arXiv · Code
[4]
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, ..., Xuanjun Chen, ..., Shinji Watanabe, Hung-yi Lee
ICLR 2025 bib · arXiv · OpenReview · Code
[3]
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, ..., Xuanjun Chen, ..., Kai-Wei Chang, Alexander H. Liu, Hung-yi Lee
Findings of ACL 2024 bib · arXiv · Anthology · Leaderboard · Code · HF
[2]
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural codec models
Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Jiawei Du, Kai-Wei Chang, ..., Shinji Watanabe, Hung-yi Lee
IEEE SLT 2024 bib · arXiv · IEEE Xplore
[1]
Towards audio language modeling - an overview
Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-wei Chang, Ho-Lam Chung, Alexander Liu, Hung-yi Lee
Tech Report, Feb. 2024 bib · arXiv · Awesome

🔍 Retrieval Augmented Generation

[3]
CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning
Cheng-Yen Lee*, Xuanjun Chen*, Claire Lin, Wei-Yu Chen, Wen-Hua Nie, Hung-yi Lee, Jyh-Shing Roger Jang
ACM Trans. Intell. Syst. Technol. (Submitted)
[2]
Only Ask What You Don't Know: Grounded Delta Planning for Efficient Multi-step RAG
Wei-Chieh Chou*, Xuanjun Chen*, Jian-Ren Lin, Claire Lin, Hung-yi Lee, Jyh-Shing Roger Jang
COLM 2026 (Submitted)
[1]
A Preliminary Study of RAG for Taiwanese Historical Archives
Claire Lin*, Bo-Han Feng*, Xuanjun Chen*, Te-Lun Yang, Hung-yi Lee, Jyh-Shing Roger Jang
ROCLING 2025 Best Paper Award bib · arXiv · Anthology

🛡️ Audio Deepfake Detection, Localization, Attribution, and Reliability

[13]
Mitigating Proxy-to-Wild Domain Gap in Deepfake Speech
Xuanjun Chen, Yun-Shing Wu, Wei-Chung Lu, Claire Lin, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang
INTERSPEECH 2026 (Submitted)
[12]
Joint Fullband-Subband Modeling for High-Resolution SingFake Detection
Xuanjun Chen*, Chia-Yu Hu*, Sung-Feng Huang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang
INTERSPEECH 2026 (Submitted) bib · arXiv
[11]
CodecFake+: Codec-Based Resynthesized Data as a Proxy for Detecting CodecFake Speech
Xuanjun Chen*, Jiawei Du*, Haibin Wu, Lin Zhang, I-Ming Lin, ..., Jyh-Shing Roger Jang, Hung-yi Lee
TASLP 2026 bib · arXiv · IEEE · Project · HF · Code
[10]
Localizing Audio-Visual Deepfakes via Hierarchical Boundary Modeling
Xuanjun Chen*, Shih-Peng Cheng*, Jiawei Du, Lin Zhang, Xiaoxiao Miao, ..., Hung-yi Lee, Jyh-Shing Roger Jang
Technical Report 2025 bib · arXiv
[9]
How Does Instrumental Music Help SingFake Detection?
Xuanjun Chen, Chia-Yu Hu, I-Ming Lin, Yi-Cheng Lin, I-Hsiang Chiu, ..., Hung-yi Lee, Jyh-Shing Roger Jang
ICASSP 2026 bib · arXiv
[8]
Towards Generalized Source Tracing for Codec-Based Deepfake Speech
Xuanjun Chen*, I-Ming Lin*, Lin Zhang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang
IEEE ASRU 2025 Best Student Paper nominee bib · arXiv · Code
[7]
Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy
Xuanjun Chen*, I-Ming Lin*, Lin Zhang, Jiawei Du, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang
INTERSPEECH 2025 bib · arXiv · ISCA · Code
[6]
Singing Voice Graph Modeling for SingFake Detection
Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee
INTERSPEECH 2024 (Oral) bib · arXiv · ISCA · Code · Lightning Talk
[5]
Neural Codec-based Adversarial Sample Detection for Speaker Verification
Xuanjun Chen*, Jiawei Du*, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee
INTERSPEECH 2024 bib · arXiv · ISCA · Code
[4]
DFADD: The Diffusion and Flow-Matching based Audio Deepfake Dataset
Jiawei Du, I-Ming Lin, I-Hsiang Chiu, Xuanjun Chen, ..., Yu Tsao, Hung-yi Lee, Jyh-Shing Roger Jang
IEEE SLT 2024 bib · arXiv · IEEE Xplore · Code · HF
[3]
Multimodal Transformer Distillation for Audio-Visual Synchronization
Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-yi Lee, Jyh-Shing Roger Jang
ICASSP 2024 bib · arXiv · IEEE Xplore · Code · Poster
[2]
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection
Xuanjun Chen*, Haibin Wu*, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang
IEEE SLT 2022, Jan 2023 bib · arXiv · IEEE Xplore · Demos · Poster · Video
[1]
Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
Xuanjun Chen*, Yen-Lun Liao*, Chung-Che Wang, Jyh-Shing Roger Jang
ISCA SPSC 2022 bib · arXiv · ISCA Archive