I received my B.S. in Computer Science from China University of Geosciences (CUG), Wuhan, in 2004, and my Ph.D. in 2009 from the Institute of Image Recognition & Artificial Intelligence, Huazhong University of Science and Technology (HUST), advised by Prof. Jinwen Tian.
In 2009, I joined Chongqing University of Posts and Telecommunications (CQUPT). In 2010, I spent a year teaching middle school students in a rural town as part of a government Aid Education Program. In 2012, I joined the Informedia Group at Carnegie Mellon University (CMU) as a postdoctoral fellow, working with Prof. Alexander G. Hauptmann.
From 2014 to 2023, I was a full professor at CQUPT, leading the Intelligent Multimedia Research Center and the Chongqing Key Laboratory of Signal and Information Processing. In September 2023, I joined the School of Intelligent Systems Engineering at Sun Yat-sen University (Shenzhen Campus) as a full professor. I continue to collaborate closely with CQUPT, including joint supervision of Master's and Ph.D. students.
Official faculty pages: SYSU, CQUPT.
I am recruiting highly self-motivated postdocs, Ph.D. and Master's students, and long-term research interns. Please read the recruiting note here: [ Recruiting Note ].
Research Interests
- Multimodal perception & fusion: robust detection and recognition under challenging conditions using infrared, visible, LiDAR point clouds, SAR, and their fusion (e.g., IR+visible in low light, visible+navigation radar in real scenes).
- Infrared & 3D scene understanding: infrared small-target detection, scalable infrared foundation models for low-SNR tasks, and LiDAR-based 3D object detection and scene perception.
- Controllable generation for perception & planning: diffusion-based translation and synthesis of sensor data (e.g., visible→infrared) and condition-controlled image/video generation to support detection, tracking, and planning.
- Medical & industrial vision: dental/tooth segmentation from X-ray, CBCT, and 3D oral scans, as well as industrial surface defect inspection.
- Video understanding & embodied multimodal intelligence: long-duration video understanding, behavior/event analysis in the wild, and coupling all of the above research lines with multimodal large models and perception–action loops for robots/agents, using reasoning-style supervision.
Teaching
- "Video Technology" (Undergraduate)
- "Digital Image Processing" (Undergraduate)
- "Pattern Recognition and Machine Learning" (Graduate)