About Me

Hi there, I am Shiyu Hu (胡世宇)!

I received my Ph.D. from the Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) and the University of Chinese Academy of Sciences (中国科学院大学) in Jan. 2024, supervised by Prof. Kaiqi Huang (黄凯奇) (IAPR Fellow). I was also fortunate to work with Prof. Xin Zhao (赵鑫) on computer vision research. Before that, I received my master’s degree from the Department of Computer Science, the University of Hong Kong (HKU), under the supervision of Prof. Choli Wang (王卓立).

I firmly believe that the development of artificial intelligence is inherently interconnected with human factors. Drawing inspiration from the renowned Turing Test, I have therefore focused my research on the Visual Turing Test, which aims to integrate human elements into the evaluation of dynamic visual tasks. The ultimate goal of my previous work is to assess and analyze machine vision intelligence by benchmarking it against human abilities. I believe that effective evaluation techniques are the foundation for achieving trustworthy and secure artificial general intelligence. Please refer to the 🔍️ Research Interests for detailed information about my research foundation and ongoing projects. I am also honored to collaborate with a group of outstanding researchers, with whom I have established the Visual Intelligence Interest Group (VIIG) to promote research in related directions.

📣 I am seeking a postdoctoral position starting in Spring 2024. If you are interested in my research or would like to collaborate, please do not hesitate to contact me. You can download my CV here.

🤖 Professional Summary

Excellent Education Background

I obtained my bachelor’s, master’s, and doctoral degrees from top universities and research institutions in China, and all three thesis defenses were graded Excellent or A+.
I have received numerous awards and honors, including the National Scholarship (top 1%, 2023) and Beijing Outstanding Graduates (top 5%, 2024).

Solid Research Foundation

During my doctoral studies, I published 14 papers, 5 of them as first or corresponding author. These include 3 papers in IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI, Top-1 journal in computer vision, CCF-A journal) and International Journal of Computer Vision (IJCV, Top-2 journal in computer vision, CCF-A journal), 1 paper at the 37th Conference on Neural Information Processing Systems (NeurIPS, a top AI conference according to Google Scholar, CCF-A conference), and a survey in Journal of Images and Graphics (a top journal in China). In addition, at the invitation of Springer, I will complete a book in Dec. 2024 (title: Visual Object Tracking - An Evaluation Perspective).
The research platform that I am responsible for building and maintaining has received over 382k visits from 130+ countries and regions worldwide.

Wide Communication and Collaboration

I have served as a reviewer for top conferences and journals such as CVPR, ECCV, AAAI, ACMMM, and SCIENCE CHINA Information Sciences, and will give a tutorial at ICIP in Oct. 2024 (tutorial title: An Evaluation Perspective in Visual Object Tracking: from Task Design to Benchmark Construction and Algorithm Analysis).
Since Sep. 2022, I have initiated and organized interdisciplinary seminars centered on computer vision (40+ sessions, involving 20+ participants from 10+ universities), covering research areas such as computer vision, cognitive neuroscience, and human-computer interaction.
I have assisted in supervising nearly 10 bachelor’s, master’s, and doctoral students in their research. In addition, I established the Visual Intelligence Interest Group (VIIG) and work with these students to promote research in related directions (e.g., visual object tracking, visual language tracking, visual Turing test, and human-computer interaction technology).

🔥 News

2024.06: 📝One paper has been accepted by the 7th Chinese Conference on Pattern Recognition and Computer Vision (PRCV, CCF-C Conference).
2024.06: 📝One paper has been accepted by Chinese Mental Health Journal (CSSCI Journal, Top Psychological Journal in China).
2024.05: 🏆Received the Beijing Outstanding Graduates award (北京市优秀毕业生, top 5%).
2024.05: 📖At the invitation of Springer, I will complete a book in Dec. 2024 with Prof. Xin Zhao and Prof. Xucheng Yin (title: Visual Object Tracking - An Evaluation Perspective).
2024.05: 📣We presented our TPAMI 2023 work, Global Instance Tracking (GIT), at the VALSE 2024 poster session (May 2024, Chongqing, China; see our 🪧 Poster for more information).
2024.04: 📝One paper has been accepted by the 3rd Workshop on Vision Datasets Understanding and DataCV Challenge in CVPR 2024 (CVPRW, Workshop in CCF-A Conference, Oral, Best Paper Honorable Mention).
2024.04: 📝One paper has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT, CCF-B Journal).
2024.04: 📣I plan to regularly post my study notes on cognitive neuroscience in computer vision on Zhihu; the textbook I have chosen is “Understanding Vision: Theory, Models, and Data” by Prof. Zhaoping Li. I encourage interested researchers to join the discussion.
2024.01: 📣One tutorial proposal has been accepted by the 2024 IEEE International Conference on Image Processing (ICIP, CCF-C Conference). The tutorial will be held in Oct. 2024 (Abu Dhabi, United Arab Emirates; tutorial title: An Evaluation Perspective in Visual Object Tracking: from Task Design to Benchmark Construction and Algorithm Analysis).
2024.01: 🪙One project about human-computer interaction in intelligent education has been funded by the 2023 Intelligent Education PhD Research Fund, supported by the Institute of AI Education Shanghai and East China Normal University.
2024.01: 👩‍🎓Received my Ph.D. from the Institute of Automation, Chinese Academy of Sciences (CASIA) and the University of Chinese Academy of Sciences (UCAS).
2023.12: 📝One paper has been accepted by the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP, CCF-B Conference).
2023.11: 👩‍🎓Passed the doctoral thesis defense with unanimous distinction.
2023.11: 📝One second-author paper has been accepted by the 7th International Conference on Computer Science and Artificial Intelligence (CSAI, EI Conference, Oral).
2023.10: 🏆Received the China National Scholarship (国家奖学金, top 1%; only 8 Ph.D. students on the main campus of UCAS won this scholarship).
2023.10: 🏆Received the First Prize of the Climbing Scholarship (攀登一等奖学金; only 6 students at CASIA won this scholarship).
2023.10: 📝One second-author and co-corresponding-author paper has been accepted by International Journal of Computer Vision (IJCV, CCF-A Journal).
2023.09: 📝One first-author survey has been accepted by Journal of Images and Graphics (CCF-B Chinese Journal).
2023.09: 📝One first-author paper has been accepted by the 37th Conference on Neural Information Processing Systems (NeurIPS, CCF-A Conference, Poster).
2023.09: 📝One first-author paper has been accepted by International Journal of Computer Vision (IJCV, CCF-A Journal).
2023.08: 📝One second-author paper has been accepted by the 6th Chinese Conference on Pattern Recognition and Computer Vision (PRCV, CCF-C Conference, Poster).
2022.06: 🏆Named Merit Student of the University of Chinese Academy of Sciences.
2022.06: 📝One paper has been accepted by Neurocomputing (Neu, CCF-C Journal).
2022.02: 📝One first-author paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, CCF-A Journal).
2021.06: 📝One survey has been accepted by Journal of Graphics (CCF-C Chinese Journal).

📖 Education

2019.09 - 2024.01 : Ph.D. at the Institute of Automation, Chinese Academy of Sciences (CASIA) and the University of Chinese Academy of Sciences (UCAS)

Major: Computer Applied Technology
Supervisor: Prof. Kaiqi Huang (IAPR Fellow, IEEE Senior Member, 10,000 Talents Program - Leading Talents)
Co-supervisor: Prof. Xin Zhao (Beijing Science Fund for Distinguished Young Scholars)
Thesis Title: Research of Intelligence Evaluation Techniques for Single Object Tracking
Thesis Committee: Prof. Jianbin Jiao, Prof. Yuxin Peng (The National Science Fund for Distinguished Young Scholars), Prof. Yao Zhao (IEEE Fellow, IET Fellow, The National Science Fund for Distinguished Young Scholars), Prof. Yunhong Wang (IEEE Fellow, IAPR Fellow, CCF Fellow), Prof. Ming Tang
Thesis Defense Grade: Excellent

2017.09 - 2019.06 : M.S. at the Department of Computer Science, University of Hong Kong (HKU)

Major: Computer Science
Supervisor: Prof. Choli Wang
Thesis Title: NightRunner: Deep Learning for Autonomous Driving Cars after Dark [🌐Project]
Thesis Defense Grade: A+

2013.09 - 2017.06 : B.E. in the Elite Class of the School of Information and Electronics, Beijing Institute of Technology (BIT)

Major: Information Engineering
Supervisor: Prof. Senlin Luo
Thesis Title: Text Sentiment Analysis Based on Deep Neural Network
Thesis Defense Grade: Excellent

2015.07 - 2015.08 : Summer Semester at the University of California, Berkeley (UCB)

Major: New Media
Course Grade: A

👩‍💻 Experiences

2022.09 - Now : Initiator and organizer of weekly interdisciplinary symposia on computer vision (22 participants from 10+ universities).
2022.09 - 2023.07 : Assistant tutor for two undergraduate students’ bachelor’s degree projects at the University of Chinese Academy of Sciences (UCAS) (one on visual object tracking, one on the visual Turing test).
2022.05 - 2022.10 : Organizer of the 3rd High-Speed Low-Power Visual Understanding Challenge at the 5th Chinese Conference on Pattern Recognition and Computer Vision.
2018.03 - 2018.11 : Research assistant on the project Big-Little Heterogeneous Computing with Polymorphic GPU Kernels, University of Hong Kong.
2016.08 - 2016.09 : Intern working on a fast satellite algorithm for the hard X-ray modulation telescope, part of a space pilot satellite project, at the Aerospace Information Research Institute, Chinese Academy of Sciences. Internship grade: A+.
2015.07 - 2015.08 : Team leader in Summer Social Practice at the University of California, Berkeley.
2013.09 - 2017.06 : League Branch Secretary of the Elite Class in the School of Information and Electronics, Beijing Institute of Technology.

🔍️ Research Interests

The development of artificial intelligence is inherently interconnected with human factors.

Research Foundation

My previous research has primarily been dedicated to evaluating and exploring machine vision intelligence. It covers task modeling, environment construction, evaluation techniques, and human-machine comparisons. I firmly believe that the development of artificial intelligence is inherently interconnected with human factors. Drawing inspiration from the renowned Turing Test, I have therefore focused my research on the Visual Turing Test, which aims to integrate human elements into the evaluation of dynamic visual tasks. The ultimate goal of my previous work is to assess and analyze machine vision intelligence by benchmarking it against human abilities. I believe that effective evaluation techniques are the foundation for achieving trustworthy and secure artificial general intelligence. The following are several key aspects:

What are the abilities of humans? Designing more human-like tasks. I use Visual Object Tracking (VOT) as a representative task for exploring dynamic visual abilities. VOT plays a pivotal role in computer vision; however, its original definition imposes excessive constraints that prevent it from matching human dynamic visual tracking abilities. To address this problem, I adopted a human-like modeling perspective and expanded the original VOT definition. By removing the assumption of continuous motion, I introduced the more human-like Global Instance Tracking (GIT) task. This expansion moves VOT from the perceptual level, which involves locating targets in short video sequences through visual feature contrasts, to the cognitive level, which requires continuously localizing targets in long videos without assuming continuous motion. Building on this, I incorporated semantic information into the GIT task and introduced the Multi-modal GIT (MGIT) task. The goal is to integrate a human-like understanding of long videos with hierarchically structured semantic labels, thereby extending the research objectives to visual reasoning within complex spatio-temporal and causal relationships.
What are the living environments of humans? Constructing more comprehensive and realistic datasets. The environment in which humans live is complex and constantly changing. However, current research predominantly uses static and limited datasets as closed experimental environments. Such toy settings cannot equip machines with authentic human-like visual intelligence. To address this limitation, I drew inspiration from film theory and proposed a framework for decoupling video narrative content. On this basis, I developed VideoCube, the largest-scale object tracking benchmark. Expanding on this work, I integrated diverse environments from the field of VOT to create SOTVerse, a dynamic and open task space comprising 12.56 million frames. Within this task space, researchers can efficiently construct different subspaces to train algorithms, thereby improving their visual generalization across various scenarios. My research also addresses visual robustness. Leveraging a bio-inspired flapping-wing drone developed by our team, I established BioDrone, the first flapping-wing drone-based benchmark, to enhance visual robustness in challenging environments.
How significant is the disparity between human and machine dynamic vision abilities? Utilizing human abilities as a baseline to evaluate machine intelligence. Computer scientists typically use large-scale datasets to evaluate machine models, while neuroscientists typically use simple experimental environments to evaluate human subjects. This discrepancy makes it difficult to bring human and machine evaluation into a unified framework for comparison and analysis. To address this issue, I constructed an experimental environment based on SOTVerse that enables a fair comparison between human and machine dynamic visual abilities. Its sequences provide a thorough examination of the perceptual, cognitive, and robust tracking abilities of humans and machines. On this foundation, I designed a human-machine dynamic visual capability evaluation framework and carried out a fine-grained experimental analysis from the perspectives of human-machine comparison and human-machine collaboration. The experimental results show that representative tracking algorithms have gradually narrowed the gap with human subjects. Furthermore, both humans and machines exhibit unique strengths in dynamic visual tasks, suggesting significant potential for human-machine collaboration.

This human-centered evaluation concept is referred to as the Visual Turing Test, and I have presented my thoughts and future prospects in this direction in a comprehensive review of intelligent evaluation techniques. This research can be summarized using the 3E paradigm: to enable machines to acquire human abilities, we need to construct a human-like proxy task and execute it through interactions among the environment, evaluation, and executors. Ultimately, the executors’ performance reflects their level of ability, and the upper limit of that ability is continuously raised through ongoing iterations. I hope this research can form a comprehensive system that lays a solid foundation for improving the dynamic visual abilities of machines.

Ongoing Research

Optimizing algorithms from the perspective of human-like modeling: locating targets more like humans. Building on the work above, my 🤝 Collaborators and I are focusing on designing intelligent tracking algorithms through a human-like modeling approach. For example, we developed MemVLT, a robust visual language tracker based on human memory modeling, to enhance tracking performance in complex scenarios. In addition, considering the cognitive disparities between humans and machines in handling the Similar Object Interference (SOI) challenge, we investigated the impact of SOI on tracking robustness and constructed the TrackingSOI benchmark using a data mining method. We then proposed the TransKT algorithm to strengthen a tracker’s perception of the SOI challenge, aiming to improve its visual robustness.
Integrating dynamic visual research and application scenarios: from human-like modeling to human-intelligence interaction. The primary objective of human-like modeling is to develop advanced algorithms that enable effective human-machine interaction, which is the basis for subsequent collaboration between humans and machines in various scenarios. In our recent work, we have integrated language as a communication medium with the dynamic visual task of unconstrained air-writing. The overarching goal is to establish a natural and seamless mode of human-machine interaction for intelligent applications such as AR/VR. Moreover, we are actively expanding the scope of the visual language tracking task and exploring potential development directions from the standpoint of human-machine interaction. We hope to leverage the abilities of Large Language Models (LLMs) to construct more interactive dynamic visual algorithms.
Evaluation is science: more comprehensive and in-depth evaluation techniques are needed in the era of LLMs. LLMs are gaining popularity in academia and industry due to their exceptional performance across various applications. As LLMs continue to play a vital role in both research and daily use, the need for rigorous evaluation techniques becomes increasingly apparent. My previous research has primarily focused on intelligence evaluation, encompassing abilities related to visual perception, cognition, and reasoning. I will continue to collaborate with my team to develop more comprehensive intelligence evaluation techniques and conduct a finer-grained analysis of human-machine dynamic visual abilities, providing support for research on model safety and interpretability. Furthermore, we are extending the hierarchical levels of evaluation to a deeper dimension: the psychological dimension. Using the dynamic and open environment of a psychological sandbox, we aim to integrate technologies from human-computer interaction, game design, psychology, and artificial intelligence to construct a more intelligent psychological analysis system.

Detailed Lists

Visual Object Tracking (VOT)

Research on single object tracking algorithms in general scenes and specific scenarios (such as unmanned aerial vehicles).
Research on the robustness, generalization, and security of single object tracking algorithms.

Visual Language Tracking (VLT)

Research on multi-modal tracking, video understanding, and visual reasoning tasks based on long video sequences.
Exploring the use of Large Language Models (LLMs) and Large Vision Models (LVMs) for long video understanding.
Exploring human-computer interaction patterns in long video sequences with visual language tracking as a proxy task.

Benchmark Construction

Research on the construction strategy of single-modal and multi-modal datasets incorporating human knowledge structure.
Research on designing evaluation mechanisms for visual robustness, generalization, and safety.

Intelligent Evaluation

Design of a human-machine universal visual ability evaluation framework.
Benchmarking the performance of algorithms against human perceptual, cognitive, and inferential abilities. Analyzing the bottlenecks of algorithms and human subjects in depth, providing guidance for research on human-like modeling, human-machine collaboration, and human-machine integration.

AI4Science

Cognitive Science: Visual task design, environment construction, and human-machine capability analysis based on human-like modeling principles.
Medical Science: Research on medical image processing techniques based on artificial intelligence technologies (e.g., cell segmentation and tracking, denoising of cryo-electron microscopy images).
Psychology: Development of gamified assessment systems targeting psychological dimensions such as anxiety, depression, and obsession, along with research on intelligent psychological evaluation technologies. Exploring the use of Large Language Models (LLMs) and Large Vision Models (LVMs) for visual comprehension with psychological elements.
Education: Research on human-computer interaction (HCI) technology for education scenarios, including designing an intelligent education framework from a multidisciplinary perspective, investigating HCI technology, and conducting qualitative and quantitative analysis.

📝 Publications

Paper Summary

Journal

TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence (CCF-A Journal, Top-1 journal in computer vision, IF=20.8). Acceptance×1 (first author×1)
IJCV: International Journal of Computer Vision (CCF-A Journal, Top-2 journal in computer vision, IF=11.6). Acceptance×2 (first author×1, corresponding-author×1)
TCSVT: IEEE Transactions on Circuits and Systems for Video Technology (CCF-B Journal, IF=8.3). Acceptance×1, under review×1
JIG: Journal of Images and Graphics (《中国图象图形学报》, CCF-B Chinese Journal). Acceptance×1 (first author×1)
JOG: Journal of Graphics (《图学学报》, CCF-C Chinese Journal). Acceptance×1
Neu: Neurocomputing (CCF-C Journal, IF=5.5). Acceptance×1
CMHJ: Chinese Mental Health Journal (《中国心理卫生杂志》, CSSCI Journal, Top Psychological Journal in China). Acceptance×1
APS: Acta Psychologica Sinica (《心理学报》, CSSCI Journal, Top-1 Psychological Journal in China). Under review×1

Conference

NeurIPS: Conference on Neural Information Processing Systems (CCF-A Conference). Acceptance×1 (first author×1), under review×4
CVPRW: Workshop in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CCF-A Conference workshop). Acceptance×1 (oral & best paper honorable mention×1)
ICASSP: IEEE International Conference on Acoustics, Speech, and Signal Processing (CCF-B Conference). Acceptance×1
PRCV: Chinese Conference on Pattern Recognition and Computer Vision (CCF-C Conference). Acceptance×2
CSAI: International Conference on Computer Science and Artificial Intelligence (EI Conference). Acceptance×1 (oral×1)

Acceptance

TPAMI 2023

Global Instance Tracking: Locating Target More Like Humans
Shiyu Hu, X. Zhao, L. Huang, K. Huang
IEEE Transactions on Pattern Analysis and Machine Intelligence (CCF-A Journal, IF=23.6)
📌 Visual Object Tracking 📌 Large-scale Benchmark Construction 📌 Intelligent Evaluation Technology
📃 Paper 🗒 bibTex 📑 PDF 🪧 Poster 🌐 Platform 🔧 Toolkit 💾 Dataset

IJCV 2024

SOTVerse: A User-defined Task Space of Single Object Tracking
Shiyu Hu, X. Zhao, K. Huang
International Journal of Computer Vision (CCF-A Journal, IF=19.5)
📌 Visual Object Tracking 📌 Dynamic Open Environment Construction 📌 3E Paradigm
📃 Paper 🗒 bibTex 📑 PDF 🌐 Platform

IJCV 2023

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision
X. Zhao, Shiyu Hu✉️, Y. Wang, J. Zhang, Y. Hu, R. Liu, H. Lin, Y. Li, R. Li, K. Liu, J. Li
International Journal of Computer Vision (CCF-A Journal, IF=19.5)
📌 Visual Object Tracking 📌 Drone-based Tracking 📌 Visual Robustness
📃 Paper 🌐 Platform 🗒 bibTex 📑 PDF 🔧 Toolkit 💾 Dataset

NeurIPS 2023

A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship
Shiyu Hu, D. Zhang, M. Wu, X. Feng, X. Li, X. Zhao, K. Huang
the 37th Conference on Neural Information Processing Systems (CCF-A Conference, Poster)
📌 Visual Language Tracking 📌 Long Video Understanding and Reasoning 📌 Hierarchical Semantic Information Annotation
📃 Paper 🗒 bibTex 📃 PDF 🪧 Poster 📹 Slides 🌐 Platform 🔧 Toolkit 💾 Dataset

中国图象图形学报 2023

Visual Intelligence Evaluation Techniques for Single Object Tracking: A Survey (单目标跟踪中的视觉智能评估技术综述)
Shiyu Hu, X. Zhao, K. Huang
Journal of Images and Graphics (《中国图象图形学报》, CCF-B Chinese Journal)
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📃 Paper 📑 PDF

TCSVT 2024

Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
M. Wu, K. Huang, Y. Cai, Shiyu Hu, Y. Zhao, W. Wang
IEEE Transactions on Circuits and Systems for Video Technology (CCF-B Journal, IF=8.4)
📌 Air-writing Technique 📌 Benchmark Construction 📌 Human-machine Interaction
📃 Paper 🗒 bibTex 📃 PDF

CVPRW 2024

Diverse Text Generation for Visual Language Tracking Based on LLM
X. Li, X. Feng, Shiyu Hu, M. Wu, D. Zhang, J. Zhang, K. Huang
the 3rd Workshop on Vision Datasets Understanding and DataCV Challenge in CVPR 2024 (Workshop in CCF-A Conference, Oral, Best Paper Honorable Mention)
📌 Visual Language Tracking 📌 Large Language Model 📌 Evaluation Technique
📃 Paper 🗒 bibTex 📃 PDF 🪧 Poster 📹 Slides 🌐 Platform 🔧 Toolkit 💾 Dataset 🏆 Award

ICASSP 2024

Robust Single-particle Cryo-EM Image Denoising and Restoration
J. Zhang, T. Zhao, Shiyu Hu, X. Zhao
the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (CCF-B Conference, Poster)
📌 Medical Image Processing 📌 AI4Science 📌 Diffusion Model
📃 Paper 🗒 bibTex 📑 PDF

PRCV 2024

VS-LLM: Visual-Semantic Depression Assessment based on LLM for Drawing Projection Test
M. Wu, Y. Kang, X. Li, Shiyu Hu, X. Chen, Y. Kang, W. Wang, K. Huang
the 7th Chinese Conference on Pattern Recognition and Computer Vision (CCF-C Conference)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science

中国心理卫生杂志 2024

A Review of Intelligent Psychological Assessment Based on Interactive Environment (基于交互环境的智能化心理测评)
K. Huang, Y. Kang, C. Yan, Shiyu Hu, L. Wang, T. Tao, W. Gao
Chinese Mental Health Journal (《中国心理卫生杂志》, CSSCI Journal, Top Psychological Journal in China)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science

PRCV 2023

A Hierarchical Theme Recognition Model for Sandplay Therapy
X. Feng, Shiyu Hu, X. Chen, K. Huang
the 6th Chinese Conference on Pattern Recognition and Computer Vision (CCF-C Conference, Poster)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science
📃 Paper 🗒 bibTex 📑 PDF 🔖 Supplementary 🪧 Poster

CSAI 2023

Rethinking Similar Object Interference in Single Object Tracking
Y. Wang, Shiyu Hu, X. Zhao
the 7th International Conference on Computer Science and Artificial Intelligence (EI Conference, Oral)
📌 Visual Object Tracking 📌 Similar Object Interference 📌 Data Mining
📃 Paper 🗒 bibTex 📑 PDF

Neurocomputing 2022

Revisiting Instance Search: A New Benchmark Using Cycle Self-training
Y. Zhang, C. Liu, W. Chen, X. Xu, F. Wang, H. Li, Shiyu Hu, X. Zhao
Neurocomputing (CCF-C Journal, IF=6)
📌 Video Instance Search 📌 Benchmark Construction 📌 Data Mining
📃 Paper 🗒 bibTex 📑 PDF 🌐 Project

图学学报 2021

Visual Turing: The Next Development of Computer Vision in The View of Human-computer Gaming (视觉图灵：从人机对抗看计算机视觉下一步发展)
K. Huang, X. Zhao, Q. Li, Shiyu Hu
Journal of Graphics (《图学学报》, CCF-C Chinese Journal)
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📃 Paper 🗒 bibTex 📑 PDF

Preprint

Nearing or Surpassing: Overall Evaluation of Human-Machine Dynamic Vision Ability
Shiyu Hu, X. Zhao, Y. Wang, Y. Shan, K. Huang
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📑 PDF 🗒 bibTex

Under Review

TCSVT 2024

Target or Distractor? Rethinking Similar Object Interference in Single Object Tracking
Y. Wang, Shiyu Hu, D. Zhang, M. Wu, T. Yao, Y. Wang, L. Chen, X. Zhao
IEEE Transactions on Circuits and Systems for Video Technology (CCF-B Journal, IF=8.4, Under Review)
📌 Visual Object Tracking 📌 Similar Object Interference 📌 Data Mining

2024

Beyond Accuracy: Tracking more like Human via Visual Search
D. Zhang, Shiyu Hu, X. Feng, X. Li, M. Wu, J. Zhang, K. Huang
Submitted to a CCF-A conference, under review
📌 Visual Object Tracking 📌 Visual Search Mechanism 📌 Visual Turing Test

2024

Diverse Text Generation for Visual Language Tracking Based on LLM
X. Li, Shiyu Hu, X. Feng, D. Zhang, M. Wu, J. Zhang, K. Huang
Submitted to a CCF-A conference, under review
📌 Visual Language Tracking 📌 Large Language Model 📌 Evaluation Technique

2024

MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
X. Feng, X. Li, Shiyu Hu, D. Zhang, M. Wu, J. Zhang, X. Chen, K. Huang
Submitted to a CCF-A conference, under review
📌 Visual Language Tracking 📌 Human-like Memory Modeling 📌 Adaptive Prompts

2024

Unconstrained Multimodal Air-Writing Benchmark: Writing by Moving Your Fingers in 3D
M. Wu, X. Li, Shiyu Hu, Y. Cai, K. Huang, W. Wang
Submitted to a CCF-A conference, under review
📌 Air-writing Technique 📌 Benchmark Construction 📌 Human-machine Interaction

心理学报 2024

Intelligent Psychological Assessment with Sandplay based on Evidence-Centered Design Theory (基于证据中心设计理论的智能心理沙盘测评系统)
Y. Ren, X. Feng, Shiyu Hu, Y. Kang, C. Yan, Y. Zeng, L. Wang, K. Huang
Acta Psychologica Sinica (《心理学报》, CSSCI Journal, Top-1 Psychological Journal in China, Under Review)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science

⚙️ Projects

The list here mainly includes engineering projects, while more academic projects have already been published in the form of research papers. Please refer to the 📝 Publications for more information.

2018.03 - 2018.11

Darknet-Cross: Light-weight Deep Learning Framework for Heterogeneous Computing
📌 High-performance Computing 📌 Heterogeneous Computing 📌 Deep Learning Framework

Darknet-Cross is a lightweight deep learning framework, mainly based on the open-source deep learning algorithm library Darknet and yolov2_light, and it has been successfully ported to mobile devices through cross-compilation. This framework enables efficient algorithm inference using mobile GPUs.
Darknet-Cross supports accelerated algorithm inference on various platforms (e.g., Android and Ubuntu) and various GPUs (e.g., NVIDIA GTX 1070 and Adreno 630).
The work is a part of my master’s thesis at the University of Hong Kong (thesis defense grade: A+).

2019.05 - 2019.10

A Skin Color Detection System without Color Atlas
📌 Color Constancy 📌 Skin Color Detection 📌 Illumination Estimation

Skin color data were collected from 110 participants under 18 different environmental lighting conditions and with 4 combinations of smartphone parameters. The resulting dataset consists of 7,920 images, with the measurements from CK Company’s MPA9 skin color detector serving as the ground truth for each user’s skin color.
The essential skin regions are extracted from the images using an elliptical skin model. The open-source color constancy model FC4 is then used to recover the environmental lighting conditions, and each user’s skin color is finally estimated with support vector regression (SVR).
The related work has been successfully deployed in Huawei’s official mobile application ‘Mirror’ for its AI skin testing function.
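
As a quick consistency check on the dataset size stated above, the following minimal sketch multiplies out the collection settings. It assumes exactly one image per participant, lighting condition, and smartphone parameter combination; that protocol detail is an assumption, and the function and constant names are hypothetical.

```python
# Collection settings taken from the project description above.
LIGHTING_CONDITIONS = 18
PHONE_PARAMETER_COMBINATIONS = 4
PARTICIPANTS = 110


def expected_image_count(lighting, phone_settings, participants,
                         images_per_combination=1):
    """Number of images if every combination is captured the same number of times.

    images_per_combination=1 is an assumption, not a documented protocol detail.
    """
    return lighting * phone_settings * participants * images_per_combination


total = expected_image_count(
    LIGHTING_CONDITIONS, PHONE_PARAMETER_COMBINATIONS, PARTICIPANTS
)
print(total)  # 7920
```

Under that assumption, the product matches the 7,920 images reported for the dataset.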

2020.11 - 2021.03

A Project for Cell Tracking Based on Deep Learning Method
📌 Medical Image Processing 📌 AI4Science 📌 Cell Segmentation and Tracking

This method follows the tracking-by-detection paradigm, combining per-frame CNN prediction for cell segmentation with a Siamese network for cell tracking.
This project was submitted to the Cell Tracking Challenge in Mar. 2021, and ranks second on the Fluo-C2FL-MSC+ dataset and third on the Fluo-C2FL-Huh7 dataset (as of Oct. 2023).
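
The tracking-by-detection idea can be illustrated with a minimal sketch: detections are produced independently per frame and then linked across frames by a similarity score. This is not the project's implementation. Bounding-box IoU stands in for the Siamese network's similarity, the detections are hand-written boxes rather than CNN segmentations, and the names `iou`, `track`, and `min_similarity` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track(frames: List[List[Box]], min_similarity: float = 0.3) -> List[Dict[int, Box]]:
    """Link per-frame detections into tracks; returns one {track_id: box} dict per frame."""
    results: List[Dict[int, Box]] = []
    previous: Dict[int, Box] = {}
    next_id = 0
    for detections in frames:
        # Score every (existing track, new detection) pair, best score first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in previous.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        current: Dict[int, Box] = {}
        used = set()
        for score, tid, j in pairs:
            if score < min_similarity:
                break  # remaining pairs are too dissimilar to link
            if tid in current or j in used:
                continue  # track or detection is already matched
            current[tid] = detections[j]
            used.add(j)
        # Unmatched detections start new tracks.
        for j, det in enumerate(detections):
            if j not in used:
                current[next_id] = det
                next_id += 1
        results.append(current)
        previous = current
    return results


frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11), (51, 50, 61, 60)],
    [(2, 2, 12, 12)],
]
tracks = track(frames)
```

In this toy input, both objects keep their identities from the first to the second frame, and the second object's track simply ends when it has no detection in the third frame. The sketch does not handle re-identification after a gap or cell division, which a real cell tracker would need.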

2024.01 - Now

Research on the Dilemma and Countermeasures of Human-Computer Interaction in Intelligent Education
📌 Intelligent Education Technology 📌 Human-Computer Interaction 📌 AI4Science

Incorporating insights and methodologies from education, cognitive psychology, and computer science, this project establishes a theoretical framework for understanding the evolution of HCI within intelligent education.
Drawing upon this framework, the project analyzes the evolution of HCI in educational settings as it transitions from collaboration to integration, and examines the key issues that this transformation raises for intelligent education.
Building upon the issues identified, the project investigates how theoretical guidance and technical improvements can enhance the efficacy of HCI in intelligent education, with the ultimate aim of effective human-computer integration.
The project is funded by the 2023 Intelligent Education PhD Research Fund, supported by the Institute of AI Education Shanghai and East China Normal University, and is currently in progress.

🏆 Honors and Awards

2024 Best Paper Honorable Mention at the 3rd Workshop on Vision Datasets Understanding and DataCV Challenge at CVPR 2024 (CVPRW最佳论文提名)
2024 Beijing Outstanding Graduates (北京市优秀毕业生, top 5%)
2023 China National Scholarship (国家奖学金, top 1%, only 8 Ph.D. students on the main campus of University of Chinese Academy of Sciences won this scholarship)
2023 First Prize of Climbing Scholarship in Institute of Automation, Chinese Academy of Sciences (攀登一等奖学金, only 6 students in Institute of Automation, Chinese Academy of Sciences won this scholarship)
2022 Merit Student of University of Chinese Academy of Sciences (中国科学院大学三好学生)
2017 Excellent Innovative Student of Beijing Institute of Technology (北京理工大学优秀创新学生)
2016 College Scholarship of Chinese Academy of Sciences (中国科学院大学生奖学金)
2016 Excellent League Member on Youth Day Competition of Beijing Institute of Technology (北京理工大学优秀团员)
2015 National First Prize in Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM) (全国大学生数学建模竞赛国家一等奖, top 1%, only 1 team in Beijing Institute of Technology won this prize) [📑PDF] [📖Selected and Reviewed Outstanding Papers in CUMCM (2011-2015) (Chapter 9)]
2015 First Prize of Mathematics Modeling Competition within Beijing Institute of Technology (北京理工大学数学建模校内选拔赛第一名)
2015 Outstanding Individual on Summer Social Practice of Beijing Institute of Technology (北京理工大学暑期社会实践优秀个人)
2015 Second Prize on Summer Social Practice of Beijing Institute of Technology (北京理工大学暑期社会实践二等奖, team leader)
2015 Outstanding Student Cadre of Beijing Institute of Technology (北京理工大学优秀学生干部)
2015 Outstanding League Cadre on Youth Day Competition of Beijing Institute of Technology (北京理工大学优秀团干部)
2015 Outstanding Youth League Branch of Beijing Institute of Technology (北京理工大学优秀团支部, team leader)
2015 Top-10 Activities on Youth Day Competition of Beijing Institute of Technology (北京理工大学十佳团日活动, team leader)
2014 Outstanding Student of Beijing Institute of Technology (北京理工大学优秀学生)
2014, 2015, 2016, 2017 Academic Scholarship of Beijing Institute of Technology (北京理工大学学业奖学金)

🔗 For More Info

🤝 Collaborators

I am honored to collaborate with these outstanding researchers. We engage in close discussions concerning various fields such as computer vision, cognitive science, AI4Science, and human-computer interaction. If you are interested in these areas as well, please feel free to contact me.

Jiahui Gao, Ph.D. at the University of Hong Kong (HKU), focusing on natural language processing, including Pre-trained Language Modeling (PLM), Automated Machine Learning (AutoML), and Multi-Modal (vision-language) learning.
Yanyao Zhou, Ph.D. student at the University of Hong Kong (HKU), focusing on cognitive science and psychology.
Fangchao Liu, Ph.D. student at the Hong Kong University of Science and Technology (HKUST), focusing on computer vision and AI4Science.
Meiqi Wu, Ph.D. student at the University of Chinese Academy of Sciences (UCAS), focusing on computer vision, intelligent evaluation techniques, and human-computer interaction.
Yiping Ma, Ph.D. student at East China Normal University, focusing on intelligent education techniques and human-computer interaction.
Di Shang, Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA), focusing on computer vision, spiking neural networks and few-shot learning.
Yaxuan Kang, design researcher, research assistant and interaction designer at the Institute of Automation, Chinese Academy of Sciences (CASIA), focusing on human-computer interaction.
Jing Zhang, research assistant at the Institute of Automation, Chinese Academy of Sciences (CASIA), focusing on computer vision and AI4Science.
Yipei Wang, M.S. and incoming Ph.D. student at Southeast University, focusing on visual object tracking and recommender systems.
Xiaokun Feng, Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA), focusing on visual object tracking, visual language tracking and AI4Science.
Dailing Zhang, Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA), focusing on visual object tracking, visual language tracking and AI4Science.
Xuchen Li, B.E. student at Beijing University of Posts and Telecommunications (BUPT) and incoming Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA), focusing on visual object tracking, visual language tracking and AI4Science.

📄 CV

✉️ Contact

hushiyu199510@gmail.com (Main)
hushiyu1995@qq.com
hushiyu2019@ia.ac.cn (valid 2019.06 - 2024.03)

Homepage visitors recorded since April 18th, 2024. Thanks for your attention.