Tianxin Xie

I am a PhD student at the AI Thrust, Information Hub, Hong Kong University of Science and Technology (Guangzhou), advised by Prof. Li Liu. I received my B.E. degree in Computer Science from Hunan Normal University, and my M.S. degree in Computer Science from the University of Chinese Academy of Sciences, advised by Prof. Hu Han.

My research focuses on text-to-speech, speech LLMs, audio generation and reasoning, and multi-modal learning.

Email: txie151[at]connect.hkust-gz.edu.cn / tianxin.xie[at]outlook.com

Google Scholar

Semantic Scholar

ORCID

GitHub

news

Jul 17, 2026	One paper, PhyAVBench, accepted at ACM MM 2026.
Jun 04, 2026	Three papers accepted at Interspeech 2026.
May 01, 2026	One paper accepted at ICML 2026.
Apr 13, 2026	We have released PhyAVBench, a text-to-audio-video (T2AV) benchmark containing 11,605 newly recorded videos. The dataset will soon be released to the public; the audio-sample embeddings and evaluation scripts are already available at https://github.com/imxtx/PhyAVBench.
Jan 27, 2026	One paper accepted at ICLR 2026.
Nov 10, 2025	I started my internship at Tencent AI Lab.
Nov 08, 2025	One paper accepted at AAAI 2026.
Aug 21, 2025	One paper accepted at EMNLP 2025.
Dec 21, 2024	One paper accepted at ICASSP 2025.
Dec 13, 2024	One paper accepted for publication in IEEE TPAMI.
Dec 09, 2024	We released a comprehensive survey of controllable speech synthesis on arXiv.
Jul 01, 2024	I graduated from the Institute of Computing Technology, Chinese Academy of Sciences, and will be pursuing my PhD at HKUST (GZ).

selected publications

ACM MM 2026

PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation

Tianxin Xie, Wentao Lei, Kai Jiang, Guanjie Huang, Pengfei Zhang, Chunhui Zhang, Fengji Ma, Haoyu He, Han Zhang, Jiangshan He, Jinting Wang, Linghan Fang, Lufei Gao, Orkesh Ablet, Peihua Zhang, Ruolin Hu, Shengyu Li, Weilin Lin, Xiaoyang Feng, Xinyue Yang, Yan Rong, Yanyun Wang, Zihang Shao, Zelin Zhao, Chenxing Li, Shan Yang, Wenfu Wang, Meng Yu, Dong Yu, and Li Liu

In Proceedings of the 34th ACM International Conference on Multimedia, 2026

PDF Code Website

Interspeech 2026

VoiceTTA: Enhancing Zero-Shot Text-to-Speech via Reinforcement Learning-Based Test-Time Adaptation

Tianxin Xie, Chenxing Li, Dong Yu, and Li Liu

In Interspeech, 2026

PDF Website

EMNLP 2025

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey

Tianxin Xie, Yan Rong, Pengfei Zhang, Wenwu Wang, and Li Liu

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

DOI PDF Code

arXiv

EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering

Tianxin Xie, Shan Yang, Chenxing Li, Dong Yu, and Li Liu

arXiv preprint arXiv:2508.03543, 2025

PDF

ICASSP 2025

Inter- and Intra-Sentence Cuer-Invariant Representation Learning for Generalizable Cued Speech Recognition

Tianxin Xie and Li Liu

In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025

DOI PDF

TPAMI

Natural Adversarial Mask for Face Identity Protection in Physical World

Tianxin Xie, Hu Han, Shiguang Shan, and Xilin Chen

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

DOI PDF Code