About Me

Hi there, I am Shiyu Hu (胡世宇)!

Currently, I am a Research Fellow at Nanyang Technological University (NTU), working with Prof. Kanghao Cheong. Before that, I received my Ph.D. from the Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) and the University of Chinese Academy of Sciences (中国科学院大学) in Jan. 2024, supervised by Prof. Kaiqi Huang (黄凯奇) (IAPR Fellow) and co-supervised by Prof. Xin Zhao (赵鑫). I received my master’s degree from the Department of Computer Science at the University of Hong Kong (HKU) under the supervision of Prof. Choli Wang (王卓立).

📣 If you are interested in my research or would like to collaborate, feel free to contact me (shiyu.hu@ntu.edu.sg)! Both online and offline collaborations are welcome.

🔥 News

2025.05: 📝One review paper has been accepted by Computers and Education: Artificial Intelligence.

2025.05: 📣Our new work FIOVA is now online! We introduce a multi-annotator benchmark for human-aligned video captioning, supporting semantic diversity and cognitive-aware evaluation. Check out the project page and arXiv paper for more details.

2025.05: 📣Our updated work SOEI is now available! Building upon our previous framework, this version introduces interactive multi-turn simulation to model open-ended educational dialogues with cognitively plausible virtual students. We further validate the framework’s effectiveness through behavioral analysis, personality recognition, and teacher-student reflection. Read more in the arXiv paper.

2025.05: 📣We will present our IJCV 2024 work (SOTVerse) during the VALSE 2025 poster session (June 2025, Zhuhai, China).

2025.05: 📝One paper (CSTrack) has been accepted by International Conference on Machine Learning (ICML, CCF-A conference).

2025.05: 📝One corresponding paper (MSAD) has been accepted by IET Computer Vision (IET-CVI, CCF-C journal).

2025.04: 📣We will conduct a tutorial at the 34th International Joint Conference on Artificial Intelligence (IJCAI) (16th-22nd August, 2025, Montreal, Canada).

2025.04: 📣We will conduct a tutorial at the 28th European Conference on Artificial Intelligence (ECAI) (25th-30th October, 2025, Bologna, Italy).

2025.03: 📣We will conduct a tutorial at the 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (5th-8th October, 2025, Vienna, Austria).

2025.02: 📖The book Visual Object Tracking: An Evaluation Perspective is online.

2025.01: 📝One paper (CTVLT) has been accepted by IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP, CCF-B conference).

2025.01: 📣A special issue (Techniques and Applications of Multimodal Data Fusion) in Electronics has been announced. Submissions on this topic are welcome!

2024.12: 📣We have conducted a tutorial at Asian Conference on Computer Vision (ACCV) (Dec. 9th 2024, Hanoi, Vietnam).

2024.12: 📣We have prepared a tutorial at International Conference on Pattern Recognition (ICPR) (Dec. 1st 2024, Kolkata, India).

2024.10: 📣We have conducted a tutorial at IEEE International Conference on Image Processing (ICIP) (Oct. 27th 2024, Abu Dhabi, United Arab Emirates).

2024.09: 📝Two papers (MemVLT and CPDTrack) have been accepted by Conference on Neural Information Processing Systems (NeurIPS, CCF-A Conference).

2024.08: 📣One tutorial proposal has been accepted by the Asian Conference on Computer Vision (ACCV); the tutorial will be conducted in Dec. 2024 (Hanoi, Vietnam).

2024.08: 👩‍💻Started working as a Research Fellow at Nanyang Technological University (NTU), Singapore.

2024.07: 📣One tutorial proposal has been accepted by the International Conference on Pattern Recognition (ICPR); the tutorial will be conducted in Dec. 2024 (Kolkata, India).

2024.06: 📝One paper has been accepted by Chinese Conference on Pattern Recognition and Computer Vision (PRCV).

2024.06: 📝One paper has been accepted by Chinese Mental Health Journal (《中国心理卫生杂志》).

▶️For More News

👩‍💻 Experiences

2024.08 - Now : Research Fellow at Nanyang Technological University (NTU)

Direction: AI4Science, Computer Vision
PI: Prof. Kanghao Cheong

2018.03 - 2018.11 : Research Assistant at University of Hong Kong (HKU)

Direction: High Performance Computing, Heterogeneous Computing
PI: Prof. Choli Wang

2016.08 - 2016.09 : Research Intern at Institute of Electronics, Chinese Academy of Sciences (CASIE)

📖 Education

2019.09 - 2024.01 : Ph.D. at the Institute of Automation, Chinese Academy of Sciences (CASIA) and the University of Chinese Academy of Sciences (UCAS)

Major: Computer Applied Technology
Supervisor: Prof. Kaiqi Huang (IAPR Fellow, IEEE Senior Member, 10,000 Talents Program - Leading Talents)
Co-supervisor: Prof. Xin Zhao (IEEE Senior Member, Beijing Science Fund for Distinguished Young Scholars)
Thesis Title: Research of Intelligence Evaluation Techniques for Single Object Tracking
Thesis Committee: Prof. Jianbin Jiao, Prof. Yuxin Peng (The National Science Fund for Distinguished Young Scholars), Prof. Yao Zhao (IEEE Fellow, IET Fellow, The National Science Fund for Distinguished Young Scholars), Prof. Yunhong Wang (IEEE Fellow, IAPR Fellow, CCF Fellow), Prof. Ming Tang
Thesis Defense Grade: Excellent

2017.09 - 2019.06 : M.S. in Department of Computer Science, University of Hong Kong (HKU)

Major: Computer Science
Supervisor: Prof. Choli Wang
Thesis Title: NightRunner: Deep Learning for Autonomous Driving Cars after Dark [🌐Project]
Thesis Defense Grade: A+

2013.09 - 2017.06 : B.E. in Elite Class in School of Information and Electronics, Beijing Institute of Technology (BIT)

Major: Information Engineering
Supervisor: Prof. Senlin Luo
Thesis Title: Text Sentiment Analysis Based on Deep Neural Network
Thesis Defense Grade: Excellent

2015.07 - 2015.08 : Summer Semester in University of California, Berkeley (UCB)

Major: New Media
Course Grade: A

🔍️ Research Interests

Research Foundation

My previous research has focused on evaluating and exploring machine vision intelligence, covering task modeling, environment construction, evaluation techniques, and human-machine comparison. I firmly believe that the development of artificial intelligence is inherently tied to human factors. Hence, drawing inspiration from the renowned Turing Test, I have focused on the concept of the Visual Turing Test, aiming to integrate human elements into the evaluation of dynamic visual tasks. The ultimate goal of this work is to assess and analyze machine vision intelligence by benchmarking it against human abilities. I believe that effective evaluation techniques are the foundation for achieving trustworthy and secure artificial general intelligence. The following are several key aspects:

What are the abilities of humans? Designing more human-like tasks. In my research, I focused on utilizing Visual Object Tracking (VOT) as a representative task to explore dynamic visual abilities. VOT holds a pivotal role in computer vision; however, its original definition imposes excessive constraints that hinder alignment with human dynamic visual tracking abilities. To address this problem, I adopted a humanoid modeling perspective and expanded the original VOT definition. By eliminating the presumption of continuous motion, I introduced a more humanoid-oriented Global Instance Tracking (GIT) task. This expansion of the research objectives transformed VOT from a perceptual level, which involves locating targets in short video sequences through visual feature contrasts, to a cognitive level that addresses the continuous localization of targets in long videos without presuming continuous motion. Building upon this, I endeavored to incorporate semantic information into the GIT task and introduced the Multi-modal GIT (MGIT) task. The goal is to integrate a human-like understanding of long videos with hierarchically structured semantic labels, thereby further advancing the research objectives to include visual reasoning within complex spatio-temporal causal relationships.
What are the living environments of humans? Constructing more comprehensive and realistic datasets. The environment in which humans reside is characterized by complexity and constant change. However, current research predominantly employs static and limited datasets as closed experimental environments. These toy examples fail to provide machines with authentic human-like visual intelligence. To address this limitation, I draw inspiration from film theory and propose a framework for decoupling video narrative content. In doing so, I have developed VideoCube, the largest-scale object tracking benchmark. Expanding on this work, I integrate diverse environments from the field of VOT to create SOTVerse, a dynamic and open task space comprising 12.56 million frames. Within this task space, researchers can efficiently construct different subspaces to train algorithms, thereby improving their visual generalization across various scenarios. Furthermore, my research also focuses on visual robustness. Leveraging a bio-inspired flapping-wing drone developed by our team, I establish the first flapping-wing drone-based benchmark named BioDrone to enhance visual robustness in challenging environments.
How significant is the disparity between human and machine dynamic vision abilities? Utilizing human abilities as a baseline to evaluate machine intelligence. Computer scientists typically use large-scale datasets to evaluate machine models, while neuroscientists typically employ simple experimental environments to evaluate human subjects. This discrepancy makes it challenging to integrate human and machine evaluation into a unified framework for comparison and analysis. To address this issue, I construct an experimental environment based on SOTVerse that enables a fair comparison between human and machine dynamic visual abilities. The selected sequences provide a thorough examination of the perceptual abilities, cognitive abilities, and robust tracking abilities of humans and machines. On this foundation, I design a human-machine dynamic visual capability evaluation framework and carry out a fine-grained experimental analysis from the perspectives of human-machine comparison and human-machine collaboration. The experimental results demonstrate that representative tracking algorithms have gradually narrowed the gap with human subjects. Furthermore, both humans and machines exhibit unique strengths in dynamic visual tasks, suggesting significant potential for human-machine collaboration.

This human-centered evaluation concept is referred to as the Visual Turing Test, and I have presented my thoughts and future prospects in this direction through a comprehensive review of intelligent evaluation techniques. This research can be summarized using the 3E paradigm. To enable machines to acquire human abilities, we need to construct a humanoid proxy task and execute it through interactions among the environment, evaluation, and executors. Ultimately, the executors’ performance reflects their level of ability, and the upper limit of their ability is continuously raised through ongoing iterations. I hope this research can establish a comprehensive system that lays a solid foundation for improving the dynamic visual abilities of machines.

Detailed Lists of Current Research Interests

Data-Centric AI

Research on construction strategies for single-modal and multi-modal datasets that incorporate human knowledge structures.
Research on designing evaluation mechanisms for visual robustness, generalization, and safety.

Visual Turing Test

Design of a human-machine universal visual ability evaluation framework.
Benchmarking the performance of algorithms against human perceptual, cognitive, and inferential abilities. Analyzing the bottlenecks of algorithms and human subjects in depth, providing guidance for research on human-like modeling, human-machine collaboration, and human-machine integration.

Video Understanding

Exploring the use of Large Language Models (LLMs) and Large Vision Models (LVMs) for long video understanding.

Visual Object Tracking (VOT)

Research on single object tracking algorithms in general scenes and specific scenarios (such as unmanned aerial vehicles).

Visual Language Tracking (VLT)

Research on multi-modal tracking, video understanding, and visual reasoning tasks based on long video sequences.

Human-machine Interaction

Exploring human-computer interaction patterns in video sequences with various proxy tasks.

AI4Science

Education: Research on human-computer interaction (HCI) technology for education scenarios, including designing an intelligent education framework from a multidisciplinary perspective, investigating HCI technology, and conducting qualitative and quantitative analysis.
Cognitive Science: Visual task design, environment construction, and human-machine capability analysis based on human-like modeling principles.
Medical Science: Research on medical image processing techniques based on artificial intelligence technologies (e.g., cell segmentation and tracking, denoising of cryo-electron microscopy images).
Psychology: Development of gamified assessment systems targeting psychological dimensions such as anxiety, depression, and obsession, along with research on intelligent psychological evaluation technologies. Exploring the use of Large Language Models (LLMs) and Large Vision Models (LVMs) for visual comprehension with psychological elements.

📝 Publications

Book

Springer 2025

Visual Object Tracking: An Evaluation Perspective
X. Zhao, Shiyu Hu, X. Yin
Springer, Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)
📌 Visual Object Tracking 📌 Intelligent Evaluation Technology
📃 Book

Acceptance

TPAMI 2023

Global Instance Tracking: Locating Target More Like Humans
Shiyu Hu, X. Zhao, L. Huang, K. Huang
IEEE Transactions on Pattern Analysis and Machine Intelligence (CCF-A Journal)
📌 Visual Object Tracking 📌 Large-scale Benchmark Construction 📌 Intelligent Evaluation Technology
📃 Paper 📑 PDF 🪧 Poster 🌐 Platform 🔧 Toolkit 💾 Dataset

IJCV 2024

SOTVerse: A User-defined Task Space of Single Object Tracking
Shiyu Hu, X. Zhao, K. Huang
International Journal of Computer Vision (CCF-A Journal)
📌 Visual Object Tracking 📌 Dynamic Open Environment Construction 📌 3E Paradigm
📃 Paper 📑 PDF 🌐 Platform

IJCV 2024

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision
X. Zhao, Shiyu Hu✉️, Y. Wang, J. Zhang, Y. Hu, R. Liu, H. Lin, Y. Li, R. Li, K. Liu, J. Li
International Journal of Computer Vision (CCF-A Journal)
📌 Visual Object Tracking 📌 Drone-based Tracking 📌 Visual Robustness
📃 Paper 🌐 Platform 📑 PDF 🔧 Toolkit 💾 Dataset

NeurIPS 2023

A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship
Shiyu Hu, D. Zhang, M. Wu, X. Feng, X. Li, X. Zhao, K. Huang
Conference on Neural Information Processing Systems (CCF-A Conference, Poster)
📌 Visual Language Tracking 📌 Long Video Understanding and Reasoning 📌 Hierarchical Semantic Information Annotation
📃 Paper 📑 PDF 🪧 Poster 📹 Slides 🌐 Platform 🔧 Toolkit 💾 Dataset

中国图象图形学报 2023

Visual Intelligence Evaluation Techniques for Single Object Tracking: A Survey (单目标跟踪中的视觉智能评估技术综述)
Shiyu Hu, X. Zhao, K. Huang
Journal of Images and Graphics (《中国图象图形学报》, CCF-B Chinese Journal)
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📃 Paper 📑 PDF

IET-CVI 2025

Improved SAR Aircraft Detection Algorithm Based on Visual State Space Models
Y. Wang, J. Zhang, Y. Wang, Shiyu Hu✉️, B. Shen, Z. Hou, W. Zhou
IET Computer Vision (CCF-C Journal)
📌 Synthetic Aperture Radar 📌 State Space Models 📌 Aircraft Object Detection

ICML 2025

CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features
X. Feng, D. Zhang, Shiyu Hu, X. Li, M. Wu, J. Zhang, X. Chen, K. Huang
International Conference on Machine Learning (CCF-A Conference, Poster)
📌 Visual Object Tracking 📌 Multi-modal Learning

NeurIPS 2024

Beyond Accuracy: Tracking more like Human via Visual Search
D. Zhang, Shiyu Hu, X. Feng, X. Li, M. Wu, J. Zhang, K. Huang
Conference on Neural Information Processing Systems (CCF-A Conference, Poster)
📌 Visual Object Tracking 📌 Visual Search Mechanism 📌 Visual Turing Test

NeurIPS 2024

MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
X. Feng, X. Li, Shiyu Hu, D. Zhang, M. Wu, J. Zhang, X. Chen, K. Huang
Conference on Neural Information Processing Systems (CCF-A Conference, Poster)
📌 Visual Language Tracking 📌 Human-like Memory Modeling 📌 Adaptive Prompts

CVPRW 2024

Diverse Text Generation for Visual Language Tracking Based on LLM
X. Li, X. Feng, Shiyu Hu, M. Wu, D. Zhang, J. Zhang, K. Huang
the 3rd Workshop on Vision Datasets Understanding and DataCV Challenge in CVPR 2024 (Workshop in CCF-A Conference, Oral, Best Paper Honorable Mention)
📌 Visual Language Tracking 📌 Large Language Model 📌 Evaluation Technique
📃 Paper 📑 PDF 🪧 Poster 📹 Slides 🌐 Platform 🔧 Toolkit 💾 Dataset 🏆 Award

ICASSP 2025

Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues
X. Feng, D. Zhang, Shiyu Hu, X. Li, M. Wu, J. Zhang, X. Chen, K. Huang
IEEE International Conference on Acoustics, Speech, and Signal Processing (CCF-B Conference, Poster)
📌 Visual Language Tracking 📌 Multi-modal Learning 📌 Grounding Model

ICASSP 2024

Robust Single-particle Cryo-EM Image Denoising and Restoration
J. Zhang, T. Zhao, Shiyu Hu, X. Zhao
IEEE International Conference on Acoustics, Speech, and Signal Processing (CCF-B Conference, Poster)
📌 Medical Image Processing 📌 AI4Science 📌 Diffusion Model
📃 Paper 📑 PDF

TCSVT 2024

Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
M. Wu, K. Huang, Y. Cai, Shiyu Hu, Y. Zhao, W. Wang
IEEE Transactions on Circuits and Systems for Video Technology (CCF-B Journal)
📌 Air-writing Technique 📌 Benchmark Construction 📌 Human-machine Interaction
📃 Paper 📑 PDF 🔧 Toolkit

PRCV 2024

VS-LLM: Visual-Semantic Depression Assessment based on LLM for Drawing Projection Test
M. Wu, Y. Kang, X. Li, Shiyu Hu, X. Chen, Y. Kang, W. Wang, K. Huang
Chinese Conference on Pattern Recognition and Computer Vision (CCF-C Conference)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science
📃 Paper 📑 PDF

PRCV 2023

A Hierarchical Theme Recognition Model for Sandplay Therapy
X. Feng, Shiyu Hu, X. Chen, K. Huang
Chinese Conference on Pattern Recognition and Computer Vision (CCF-C Conference, Poster)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science
📃 Paper 📑 PDF 🔖 Supplementary 🪧 Poster

Neurocomputing 2022

Revisiting Instance Search: A New Benchmark Using Cycle Self-training
Y. Zhang, C. Liu, W. Chen, X. Xu, F. Wang, H. Li, Shiyu Hu, X. Zhao
Neurocomputing (CCF-C Journal)
📌 Video Instance Search 📌 Benchmark Construction 📌 Data Mining
📃 Paper 📑 PDF 🌐 Project

图学学报 2021

Visual Turing: The Next Development of Computer Vision in The View of Human-computer Gaming (视觉图灵：从人机对抗看计算机视觉下一步发展)
K. Huang, X. Zhao, Q. Li, Shiyu Hu
Journal of Graphics (《图学学报》, CCF-C Chinese Journal)
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📃 Paper 📑 PDF

C&E:AI 2025

Artificial Intelligence-Enabled Adaptive Learning Platforms: A Review
L. Tan, Shiyu Hu, Darren J. Yeo, K. Cheong
Computers & Education: Artificial Intelligence
📌 Adaptive Learning Platforms 📌 AI for Education 📌 Educational Technology

中国心理卫生杂志 2025

A Review of Intelligent Psychological Assessment Based on Interactive Environment (基于交互环境的智能化心理测评)
K. Huang, Y. Kang, C. Yan, Shiyu Hu, L. Wang, T. Tao, W. Gao
Chinese Mental Health Journal (《中国心理卫生杂志》, CSSCI Journal, Top Psychological Journal in China)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science

CSAI 2023

Rethinking Similar Object Interference in Single Object Tracking
Y. Wang, Shiyu Hu, X. Zhao
International Conference on Computer Science and Artificial Intelligence (EI Conference, Oral)
📌 Visual Object Tracking 📌 Similar Object Interference 📌 Data Mining
📃 Paper 🗒 BibTeX 📑 PDF

Preprint

FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning
Shiyu Hu*, X. Li*, X. Li, J. Zhang, Y. Wang, X. Zhao, K. Cheong (*Equal Contributions)
📌 Large Vision-Language Models 📌 Video Caption 📌 Video Understanding
📃 Paper 📑 PDF 🌐 Project

Preprint

When LLMs Learn to be Students: The SOEI Framework for Modeling and Evaluating Virtual Student Agents in Educational Interaction
Y. Ma*, Shiyu Hu*, X. Li, Y. Wang, Y. Chen, S. Liu, K. Cheong (*Equal Contributions)
📌 AI4Education 📌 LLMs 📌 LLM-based Agent
📃 Paper 📑 PDF

Preprint

DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM
X. Li, Shiyu Hu, X. Feng, D. Zhang, M. Wu, J. Zhang, K. Huang
📌 Visual Language Tracking 📌 Large Language Model 📌 Evaluation Technique
📃 Paper 📑 PDF 🌐 Project

Preprint

Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark
X. Li, Shiyu Hu, X. Feng, D. Zhang, M. Wu, J. Zhang, K. Huang
📌 Visual Language Tracking 📌 Multi-modal Interaction 📌 Evaluation Technology
📃 Paper 📑 PDF 🌐 Project

Preprint

Nearing or Surpassing: Overall Evaluation of Human-Machine Dynamic Vision Ability
Shiyu Hu, X. Zhao, Y. Wang, Y. Shan, K. Huang
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📑 PDF

⚙️ Projects

The list here mainly includes engineering projects, while more academic projects have already been published in the form of research papers. Please refer to the 📝 Publications for more information.

2018.03-2018.11

Darknet-Cross: Light-weight Deep Learning Framework for Heterogeneous Computing
📌 High-performance Computing 📌 Heterogeneous Computing 📌 Deep Learning Framework

Darknet-Cross is a lightweight deep learning framework, mainly based on the open-source deep learning library Darknet and yolov2_light. It has been successfully ported to mobile devices through cross-compilation, enabling efficient algorithm inference on mobile GPUs.
Darknet-Cross supports accelerated inference on various platforms (e.g., Android and Ubuntu) and various GPUs (e.g., Nvidia GTX1070 and Adreno 630).
The work is a part of my master’s thesis at the University of Hong Kong (thesis defense grade: A+).

2019.05 - 2019.10

A Skin Color Detection System without Color Atlas
📌 Color Constancy 📌 Skin Color Detection 📌 Illumination Estimation

Skin color data were collected from 110 participants under 18 different environmental lighting conditions and with 4 combinations of smartphone parameters. The resulting dataset consists of 7,920 images, with the measurements from CK Company’s MPA9 skin color detector serving as the ground truth for each user’s skin color.
The essential skin regions are extracted from the images using an elliptical skin model. The open-source color constancy model FC4 is then employed to recover the environmental lighting conditions, and the user’s skin color is finally predicted using support vector regression (SVR).
The related work has been successfully deployed in Huawei’s official mobile application ‘Mirror’ for its AI skin testing function.

2020.11 - 2021.03

A Project for Cell Tracking Based on Deep Learning Method
📌 Medical Image Processing 📌 AI4Science 📌 Cell Segmentation and Tracking

This method follows the tracking-by-detection paradigm and combines per-frame CNN prediction for cell segmentation with a Siamese network for cell tracking.
This project was submitted to the Cell Tracking Challenge in Mar. 2021, and held second place on the Fluo-C2DL-MSC+ dataset and third place on the Fluo-C2DL-Huh7 dataset (as of Oct. 2023).

2024.01 - Now

Research on the Dilemma and Countermeasures of Human-Computer Interaction in Intelligent Education
📌 Intelligent Education Technology 📌 Human-Computer Interaction 📌 AI4Science

Incorporating insights and methodologies from education, cognitive psychology, and computer science, this project establishes a theoretical framework for understanding the evolution of HCI within intelligent education.
Drawing upon the established theoretical framework, this project conducts a comprehensive analysis of the evolution of HCI in educational settings, transitioning from collaboration to integration. Furthermore, it delves into the key issues arising from this transformative process within the realm of intelligent education.
Building upon the core issues identified, this project investigates how theoretical guidance and technical improvements can enhance the efficacy of HCI in intelligent education, ultimately striving towards effective human-computer integration.
The project is funded by the 2023 Intelligent Education PhD Research Fund, supported by the Institute of AI Education Shanghai and East China Normal University, and is currently in progress.

🏆 Honors and Awards

2024 Best Paper Honorable Mention in the 3rd Workshop on Vision Datasets Understanding and DataCV Challenge in CVPR 2024 (CVPRW最佳论文提名)
2024 Beijing Outstanding Graduates (北京市优秀毕业生, top 5%)
2023 China National Scholarship (国家奖学金, top 1%, awarded to only 8 Ph.D. students on the main campus of University of Chinese Academy of Sciences)
2023 First Prize of Climbing Scholarship in Institute of Automation, Chinese Academy of Sciences (攀登一等奖学金, awarded to only 6 students in Institute of Automation, Chinese Academy of Sciences)
2022 Merit Student of University of Chinese Academy of Sciences (中国科学院大学三好学生)
2017 Excellent Innovative Student of Beijing Institute of Technology (北京理工大学优秀创新学生)
2016 College Scholarship of Chinese Academy of Sciences (中国科学院大学生奖学金)
2016 Excellent League Member on Youth Day Competition of Beijing Institute of Technology (北京理工大学优秀团员)
2015 National First Prize in Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM) (全国大学生数学建模竞赛国家一等奖, top 1%, won by only 1 team in Beijing Institute of Technology) [📑PDF] [📖Selected and Reviewed Outstanding Papers in CUMCM (2011-2015) (Chapter 9)]
2015 First Prize of Mathematics Modeling Competition within Beijing Institute of Technology (北京理工大学数学建模校内选拔赛第一名)
2015 Outstanding Individual on Summer Social Practice of Beijing Institute of Technology (北京理工大学暑期社会实践优秀个人)
2015 Second Prize on Summer Social Practice of Beijing Institute of Technology (北京理工大学暑期社会实践二等奖, team leader)
2015 Outstanding Student Cadre of Beijing Institute of Technology (北京理工大学优秀学生干部)
2015 Outstanding League Cadre on Youth Day Competition of Beijing Institute of Technology (北京理工大学优秀团干部)
2015 Outstanding Youth League Branch of Beijing Institute of Technology (北京理工大学优秀团支部, team leader)
2015 Top-10 Activities on Youth Day Competition of Beijing Institute of Technology (北京理工大学十佳团日活动, team leader)
2014 Outstanding Student of Beijing Institute of Technology (北京理工大学优秀学生)
2014, 2015, 2016, 2017 Academic Scholarship of Beijing Institute of Technology (北京理工大学学业奖学金)

📣 Activities and Services

Tutorial

34th International Joint Conference on Artificial Intelligence (IJCAI)

Title: Human-Centric and Multimodal Evaluation for Explainable AI: Moving Beyond Benchmarks
Date & Location: 16th-22nd August, 2025, Montreal, Canada

28th European Conference on Artificial Intelligence (ECAI)

Title: From Benchmarking to Trustworthy AI: Rethinking Evaluation Methods Across Vision and Complex Systems
Date & Location: 25th-30th October, 2025, Bologna, Italy
🌐 Webpage

2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Title: The Synergy of Large Language Models and Evolutionary Optimization on Complex Networks
Date & Location: 5th-8th October, 2025, Vienna, Austria

31st IEEE International Conference on Image Processing (ICIP)

Title: An Evaluation Perspective in Visual Object Tracking: from Task Design to Benchmark Construction and Algorithm Analysis
Date & Location: 9:00-12:30, 27th October, 2024, Abu Dhabi, United Arab Emirates
Duration: Half-day
📹 Slides 🌐 Webpage

27th International Conference on Pattern Recognition (ICPR)

Title: Visual Turing Test in Visual Object Tracking: A New Vision Intelligence Evaluation Technique based on Human-Machine Comparison
Date & Location: 14:30-18:00, 1st December, 2024, Kolkata, India
Duration: Half-day
📹 Slides

17th Asian Conference on Computer Vision (ACCV)

Title: From Machine-Machine Comparison to Human-Machine Comparison: Adapting Visual Turing Test in Visual Object Tracking
Date & Location: 9:00-12:00, 9th December, 2024, Hanoi, Vietnam
Duration: Half-day
📹 Slides 🌐 Webpage

Guest Editor

Journals: Electronics (Special Issue: Techniques and Applications of Multimodal Data Fusion)

Associate Editor

Journals: Innovation and Emerging Technologies

Reviewer

Conferences: NeurIPS, ICML, ICLR, CVPR, ECCV, ICCV, AAAI, IJCAI, ACMMM, AISTATS, etc.
Journals: IEEE Transactions on Image Processing, SCIENCE CHINA Information Sciences, IEEE Transactions on Network Science and Engineering, IEEE Transactions on Vehicular Technology, Scientific Reports, IEEE Access, Journal of Computational Science, Journal of Electronic Imaging, Digital Signal Processing, etc.

Member

Societies: Institute of Electrical and Electronics Engineers (IEEE, No.97803543), China Society of Image and Graphics (CSIG, No.E651129499M), Chinese Association for Artificial Intelligence (CAAI, No.E660120827A), China Computer Federation (CCF, No.Z1771M).

📄 CV

English Version

✉️ Contact

shiyu.hu@ntu.edu.sg (Main)
hushiyu199510@gmail.com (Personal)
~~hushiyu2019@ia.ac.cn~~ (Valid from 2019.06 - 2024.07)

Homepage visitors have been recorded since April 18th, 2024. Thanks for your attention.