Zuming Huang

Staff Algorithm Engineer

Document Intelligence · Large Multimodal Models · Reinforcement Learning

Email: zuming dot hzm at gmail dot com

About Me

Zuming Huang has been a staff algorithm engineer on the multimodal team at INF Tech since December 2024. Previously, he worked on the multi-modality cognition team at Ant Group from April 2019 to December 2024. Before that, he was an algorithm engineer in the Computer Vision Technology Department at Baidu Inc. He received his Bachelor's degree in Information Engineering from South China University of Technology (SCUT) and his Master's degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences (CASIA). His research interests include Document Intelligence, Large Multimodal Models, and Reinforcement Learning.

Experience & Education

  • 2024.12 - Present INF Tech, Multimodal Team, Staff Algorithm Engineer.
  • 2019.04 - 2024.12 Ant Group (Hangzhou), Multi-modality Cognition Team, Senior Algorithm Engineer.
  • 2017.06 - 2019.04 Baidu Inc. (Beijing), Department of Computer Vision Technology (VIS), Algorithm Engineer.
  • 2014.09 - 2017.06 Institute of Automation, Chinese Academy of Sciences (CASIA), M.S. in Pattern Recognition and Intelligent Systems.
  • 2010.09 - 2014.07 South China University of Technology (SCUT), B.Eng. in Information Engineering.

Selected Publications

  • Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
    Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, and Yuan Qi
    Technical Report, 2025. (Project Lead)
  • VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
    Haozhe Wang, Chao Qu, Zuming Huang, Wei Chu, Fangzhen Lin, and Wenhu Chen
    Conference on Neural Information Processing Systems (NeurIPS), 2025. (CCF-A, Spotlight)
  • Fine-grained Pseudo Labels for Scene Text Recognition
    Xiaoyu Li, Xiaoxue Chen, Zuming Huang, Lele Xie, Jingdong Chen, and Ming Yang
    ACM International Conference on Multimedia (ACM MM), 2023. (CCF-A)
  • Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes
    Chengquan Zhang*, Borong Liang*, Zuming Huang*, Mengyi En, Junyu Han, Errui Ding, and Xinghao Ding
    Computer Vision and Pattern Recognition Conference (CVPR), 2019. (CCF-A, Equal first authors)
  • A Single-shot Arbitrarily-shaped Text Detector based on Context Attended Multi-task Learning
    Pengfei Wang, Chengquan Zhang, Fei Qi, Zuming Huang, Mengyi En, Junyu Han, Jingtuo Liu, Errui Ding, and Guangming Shi
    ACM International Conference on Multimedia (ACM MM), 2019. (CCF-A, Integrated into PaddleOCR)
  • TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network
    Yipeng Sun, Chengquan Zhang, Zuming Huang, Jiaming Liu, Junyu Han, and Errui Ding
    Asian Conference on Computer Vision (ACCV), 2018. (Oral)
  • Building Extraction from Multi-source Remote Sensing Images via Deep Deconvolution Neural Networks
    Zuming Huang, Guangliang Cheng, Hongzhen Wang, Haichang Li, Limin Shi, and Chunhong Pan
    International Geoscience and Remote Sensing Symposium (IGARSS), 2016.
  • Extraction of Virtual Baselines from Distorted Document Images using Curvilinear Projection
    Gaofeng Meng, Zuming Huang, Yonghong Song, Shiming Xiang, and Chunhong Pan
    International Conference on Computer Vision (ICCV), 2015. (CCF-A)

Honors & Awards

  • Employee of the Year, INF Tech, 2025
  • CCF Science and Technology Award, 2023
  • Winning Team, ICDAR Competition on SVRD, 2023
  • Honorable Mentor, Ant Group, 2021
  • Excellent Project, Baidu VIS, 2018
  • Best New Employee Award, Baidu VIS, 2017
  • Outstanding Student Cadre Award, CASIA, 2016
  • National Scholarship, CASIA, 2016
  • Best Paper Award, ICCC, 2014
  • Student Award of Merit, SCUT, 2013
  • National Scholarship, SCUT, 2012