Contact
current email: pengsun000:gmail<com
other or former emails: wanhesong:bytedance<com, pythonsun@tencent.com, peng.sun@rutgers.edu, pes2021@med.cornell.edu, sunp08@mails.tsinghua.edu.cn
where : = @, < = .
Research Interests
Machine Learning and its applications, in particular Single- and Multi-Agent Reinforcement Learning, Deep Learning, and Ensemble Learning.
Experience
Dec, 2020 – now, ByteDance Inc.
Oct, 2016 – Dec, 2020 Senior Researcher with Tencent AI Lab; Team Leader of Agent Learning Center, Tencent Robotics X
Aug, 2015 – Aug, 2016 Post-doctoral researcher on Machine Learning, Deep Learning applied to Natural Language Processing and Sequential Data, Dpt. of Statistics, Rutgers University. Advisor: Tong Zhang
Sep, 2014 – Aug, 2015 Post-doctoral researcher on Machine Learning applied to Medical Image Processing and CT image based Cardiovascular disease diagnosis, Dalio ICI, Weill Cornell Medical College, Cornell University. Advisors: Guanglei Xiong and James K. Min
Jul, 2013 – Apr, 2014 Intern at Institute of Deep Learning, Baidu Inc. Mentors: Tong Zhang and Kai Yu.
Oct, 2011 – Apr, 2012 Visiting student at Australian National University and NICTA. Advisors: Mark Reid and Robert Williamson.
Education
Sep, 2008 – Jul, 2014 PhD at i-Vision Group, Dpt. of Automation, Tsinghua University. Supervisor: Jie Zhou.
Sep, 2005 – Mar, 2008 Master's Degree in Telecommunication Engineering, Beijing University of Posts and Telecommunications (BUPT). Supervisors: Fei Su and Anni Cai.
Sep, 2001 – Jul, 2005 Bachelor's Degree in Telecommunication Engineering, Wuhan University of Technology (WHUT). Rank 2 of 200+.
Programming Skills
- Python (with PyTorch and TensorFlow 1.x)
- C++ (Procedural/Object-Oriented/Generic programming; with popular libraries, e.g., OpenCV)
- Lua (with Torch 7)
- Matlab (fast prototyping, as script and test-bench, mex hybrid programming…)
- Experience with multi-threading and GPU programming (CUDA C)
Technical Reports
- Yang Liu*, Peng Sun*, Hang Li. “Large Language Models as Agents in Two-Player Games”, arXiv preprint arXiv:2402.08078, 2024 (* indicates equal contribution)
- Wei Xi, Yongxin Zhang, Changnan Xiao, Xuefeng Huang, Shihong Deng, Haowei Liang, Jie Chen, Peng Sun**. “Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End Policy and Optimistic Smooth Fictitious Play”, arXiv preprint arXiv:2303.04096, 2023 (** Corresponding author). Won the double championship of the COG2022 strategy card game LoCM competition.
- Lei Han*, Jiechao Xiong*, Peng Sun*, Xinghai Sun, Meng Fang, Qingwei Guo, Qiaobo Chen, Tengfei Shi, Zhengyou Zhang. “TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game”, arXiv preprint arXiv:2011.13729, 2020 (* Equal contribution, correspondence to the first three authors)
- Peng Sun*, Jiechao Xiong*, Lei Han*, Xinghai Sun, Shuxing Li, Jiawei Xu, Meng Fang, Zhengyou Zhang. “TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning”, arXiv preprint arXiv:2011.12895, 2020 (* Equal contribution, correspondence to the first three authors)
- Qing Wang*, Jiechao Xiong*, Lei Han, Meng Fang, Xinghai Sun, Zhuobin Zheng, Peng Sun, Zhengyou Zhang. “Arena: a toolkit for Multi-Agent Reinforcement Learning”, https://arxiv.org/abs/1907.09467, 2019 (* indicates equal contribution)
- Peng Sun*, Xinghai Sun*, Lei Han*, Jiechao Xiong*, Qing Wang, Bo Li, Yang Zheng, Ji Liu, Yongsheng Liu, Han Liu, Tong Zhang. “TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game”, https://arxiv.org/abs/1809.07193, 2018 (* indicates equal contribution)
- Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu. “Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space”, arXiv preprint arXiv:1810.06394, 2018
- Peng Sun, James K. Min, Guanglei Xiong. “Globally Tuned Cascade Pose Regression via Back Propagation with Application in 2D Face Pose Estimation and Heart Segmentation in 3D CT Images.” http://arxiv.org/abs/1503.08843, 2015
- Peng Sun, Haoyin Zhou, Devon Lundine, James K Min, Guanglei Xiong. “Fast Segmentation of Left Ventricle in CT Images by Explicit Shape Regression using Random Pixel Difference Features”. http://arxiv.org/abs/1507.07508, 2015
Papers
- Trung Quoc Luong*, Xinbo Zhang*, Zhanming Jie*, Peng Sun**, Xiaoran Jin, Hang Li. “ReFT: Reasoning with Reinforced Fine-Tuning”, ACL 2024 (* Equal contribution, ** Corresponding author)
- Zhiheng Xi*, Wenxiang Chen*, Boyang Hong*, Senjie Jin*, Rui Zheng**, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui**, Qi Zhang**, Xuanjing Huang. “Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning”, ICML 2024 (* Equal contribution, ** Corresponding author)
- Hanlin Yang, Chao Yu*, Peng Sun, Siji Chen. “Hybrid Policy Optimization from Imperfect Demonstrations”, NeurIPS 2023 (* Corresponding author)
- Changnan Xiao*, Yongxin Zhang*, Xuefeng Huang, Qinhan Huang, Jie Chen, Peng Sun**. “Mastering Strategy Card Game (Hearthstone) with Improved Techniques”, IEEE CoG 2023 (* Equal contribution; ** Corresponding author). In a machine-vs-human test of the full game (including both deck-building and battle), our AIs defeated a Hearthstone streamer whose best ranking was top 10 in the official league of the China region. Nominated for best paper at CoG 2023.
- Shuxing Li*, Jiawei Xu*, Honghua Dong, Yu Yang, Chun Yuan, Peng Sun and Lei Han. “The Fittest Wins: a Multi-Stage Framework Achieving New SOTA in ViZDoom Competition”. IEEE ToG 2023 (* indicates equal contribution)
- Zongkai Liu, Chao Yu*, Yaodong Yang, Peng Sun, Zifan Wu. “A Unified Diversity Measure for Multiagent Reinforcement Learning”. NeurIPS 2022. (* Corresponding author)
- Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, Yizhou Wang. “Towards Distraction-Robust Active Visual Tracking”. ICML 2021
- Zequn Jie, Peng Sun, Xin Li, Jiashi Feng and Wei Liu. “Anytime Recognition with Routing Convolutional Networks”. TPAMI 2019
- Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, Yizhou Wang. “AD-VAT+: An Asymmetric Dueling Mechanism for Learning and Understanding Visual Active Tracking”. TPAMI 2019
- Lei Han*, Peng Sun*, Yali Du*, Jiechao Xiong, Qing Wang, Xinghai Sun, Han Liu, Tong Zhang. “Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI”. ICML 2019 (* indicates equal contribution)
- Wenhan Luo*, Peng Sun*, Fangwei Zhong*, Wei Liu, Tong Zhang, Yizhou Wang. “End-to-end Active Object Tracking and Its Real-world Deployment via Reinforcement Learning”, TPAMI 2019. (* indicates equal contribution)
- Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, Yizhou Wang. “AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking”, ICLR 2019 (an earlier version also appeared in NIPS workshop on Deep Reinforcement Learning, 2018).
- Wenhan Luo*, Peng Sun*, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang. “End-to-end Active Object Tracking via Reinforcement Learning”, ICML 2018. (* indicates equal contribution)
- Li Shen, Peng Sun, Yitong Wang, Wei Liu, Tong Zhang. “An Algorithmic Framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-Gradient Method”, ICML 2018.
- Baoyuan Wu, Weidong Chen, Peng Sun, Wei Liu, Bernard Ghanem, Siwei Lyu. “Tagging Like Humans: Diverse and Distinct Image Annotation”, CVPR 2018
- Qing Wang, Jiechao Xiong, Lei Han, Peng Sun, Han Liu, Tong Zhang. “Exponentially Weighted Imitation Learning for Batched Historical Data”, NIPS 2018
- Ashley Knight-Greenfield, Ashley N. Beecy, Qi Chang, Khalil Anchouche, Lohendran Baskaran, Kimberly Elmore, Kranthi Kolli, Hao Wang, Subhi Al’Aref, Jessica M. Peña, Praneil Patel, Peng Sun, Tong Zhang, Hooman Kamel, James Min, Ajay Gupta. “A Novel Deep Learning Approach for Automated Diagnosis of Cerebrovascular Accidents”, Journal of Cardiovascular Computed Tomography, 2017
- Guanglei Xiong, Peng Sun, Anna Starikov, Haoyin Zhou, Seongmin Ha, Quynh Truong, James Min. “Comprehensive Modeling and Visualization of Cardiac Anatomy and Physiology from CT Imaging and Computer Simulations”, IEEE Transactions on Visualization and Computer Graphics, 2016
- Haoyin Zhou, Peng Sun, Seongmin Ha, James K. Min, Guanglei Xiong. “Modeling of Bifurcated Tubular Structures for Vessel Segmentation”, Computerized Medical Imaging and Graphics, 2015
- Peng Sun, Tong Zhang, Jie Zhou. “A Convergence Rate Analysis for LogitBoost, MART and Their Variant”. ICML 2014
- Peng Sun, Mark D. Reid, Jie Zhou. “An Improved Multiclass LogitBoost Using Adaptive-One-vs-One”, Machine Learning (MLJ), 2014, 97(3): 295-326.
- Peng Sun, Jie Zhou. “Saving Evaluation Time for the Decision Function in Boosting: Representation and Reordering Base Learner”, ICML 2013.
- Peng Sun, Mark D. Reid, Jie Zhou. “AOSO-LogitBoost: Adaptive One-Vs-One LogitBoost for Multi-Class Problems”, ICML 2012.
- Mark D. Reid, Robert C. Williamson, Peng Sun. “The Convexity and Design of Composite Multiclass Losses”, ICML 2012.
- Peng Sun, Yinan Na, and Jie Zhou. “A novel algorithm for cut shot boundary detection.” Seventh International Symposium on Multispectral Image Processing and Pattern Recognition (MIPPR2011). SPIE, 2011.
- My PhD Thesis: “Improved Boosting Classifier” (In Chinese, available upon request)
Professional Activities
- Registered reviewer for Journal of Machine Learning Research (JMLR), Machine Learning (MLJ), IEEE Transactions on Image Processing (TIP), NIPS (2017, 2018), and International Conference on Machine Learning (ICML 2016, 2017, 2018).
- Invited talk at Learning to Rank Team, Division of Page Searching, Baidu Inc., Oct 2013.
Code/Project
- ViZDoom maps/scenarios for the ICML2018 Active Object Tracking paper
- Distributed Multiple-Learner-Multiple-Actor IMPALA
- 3D Conv Layer (mex for Matlab, with CUDA)
- CT volume segmentation based on 3D ConvNet (in Matlab). The NN Lib, Vessel Segmentation, 3D Heart Pose Regression
- One-Hot ConvNet for Text Classification (in Torch 7)
- The Multi-Class LogitBoost for the ICML2012 AOSO-LogitBoost paper
- Data Conversion between Matlab & C++