About Me

Hi there, I am Shiyu Hu (胡世宇)!

Currently, I am a Research Fellow at Nanyang Technological University (NTU), working with Prof. Kanghao Cheong. Before that, I received my Ph.D. from the Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) and the University of Chinese Academy of Sciences (中国科学院大学) in Jan. 2024, supervised by Prof. Kaiqi Huang (黄凯奇) (IAPR Fellow) and co-supervised by Prof. Xin Zhao (赵鑫). I received my master’s degree from the Department of Computer Science at the University of Hong Kong (HKU) under the supervision of Prof. Choli Wang (王卓立).

I am also honored to collaborate with a group of outstanding researchers. Together we have established the Visual Intelligence Interest Group (VIIG) to promote research in related directions.

📣 Prof. Cheong's group currently has a few openings for MPhil and PhD students. Competitive, curious, and self-driven candidates are welcome to contact me with their CV.

📣 If you are interested in my research direction or would like to collaborate with me, feel free to contact me (shiyu.hu@ntu.edu.sg)! Both online and offline collaborations are welcome.

🔥 News

  • 2024.09: 📝Two papers (MemVLT and CPDTrack) have been accepted by the 38th Conference on Neural Information Processing Systems (NeurIPS, CCF-A Conference). Congratulations to Xiaokun and Dailing👏!
  • 2024.08: 📣One tutorial proposal has been accepted by the 17th Asian Conference on Computer Vision (ACCV); the tutorial will be held in Dec. 2024 (Hanoi, Vietnam).
  • 2024.08: 👩‍💻Started my work as a Research Fellow at Nanyang Technological University (NTU), Singapore.
  • 2024.07: 📣One tutorial proposal has been accepted by the 27th International Conference on Pattern Recognition (ICPR); the tutorial will be held in Dec. 2024 (Kolkata, India).
  • 2024.06: 📝One paper has been accepted by the 7th Chinese Conference on Pattern Recognition and Computer Vision (PRCV). Congratulations to Meiqi👏!
  • 2024.06: 📝One paper has been accepted by Chinese Mental Health Journal (《中国心理卫生杂志》).
  • 2024.05: 🏆Recognized as a Beijing Outstanding Graduate (北京市优秀毕业生, top 5%).
  • 2024.05: 📖At the invitation of Springer, I will complete a book with Prof. Xin Zhao and Prof. Xucheng Yin in Dec. 2024 (title: Visual Object Tracking - An Evaluation Perspective).
  • 2024.05: 📣We presented our TPAMI 2023 work, Global Instance Tracking (GIT), at the VALSE 2024 poster session (May 2024, Chongqing, China; see our 🪧 Poster for more information).
  • 2024.04: 📝One paper has been accepted by the 3rd Workshop on Vision Datasets Understanding and DataCV Challenge in CVPR 2024 (CVPRW, Oral, Best Paper Honorable Mention). Congratulations to Xuchen👏!
  • 2024.04: 📝One paper has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). Congratulations to Meiqi👏!
  • 2024.01: 📣One tutorial proposal has been accepted by the 31st IEEE International Conference on Image Processing (ICIP); the tutorial will be held in Oct. 2024 (Abu Dhabi, United Arab Emirates).
  • 2024.01: 🪙One project on human-computer interaction in intelligent education has been funded by the 2023 Intelligent Education PhD Research Fund, supported by the Institute of AI Education Shanghai and East China Normal University. Congratulations to Yiping👏!
  • 2024.01: 👩‍🎓Received my Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences (CASIA) and the University of Chinese Academy of Sciences (UCAS).
  • 2023.12: 📝One paper has been accepted by the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
  • 2023.11: 👩‍🎓Passed the doctoral thesis defense with unanimous distinction.
  • 2023.11: 📝One paper has been accepted by the 7th International Conference on Computer Science and Artificial Intelligence (CSAI, EI Conference, Oral). Congratulations to Yipei👏!
  • 2023.10: 🏆Received the China National Scholarship (国家奖学金, top 1%; only 8 Ph.D. students on the main campus of UCAS received this scholarship).
  • 2023.10: 🏆Received the First Prize of the Climbing Scholarship (攀登一等奖学金; only 6 students at CASIA received this scholarship).
  • 2023.10: 📝One paper (second author and co-corresponding author) has been accepted by the International Journal of Computer Vision (IJCV).
  • 2023.09: 📝One first-author survey has been accepted by Journal of Images and Graphics (《中国图象图形学报》).
  • 2023.09: 📝One first-author paper has been accepted by the 37th Conference on Neural Information Processing Systems (NeurIPS, Poster).
  • 2023.09: 📝One first-author paper has been accepted by the International Journal of Computer Vision (IJCV).
  • 2023.08: 📝One paper has been accepted by the 6th Chinese Conference on Pattern Recognition and Computer Vision (PRCV, Poster). Congratulations to Xiaokun👏!
  • 2022.06: 🏆Named a Merit Student of the University of Chinese Academy of Sciences.
  • 2022.06: 📝One paper has been accepted by Neurocomputing (Neu).
  • 2022.02: 📝One first-author paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
  • 2021.06: 📝One survey has been accepted by Journal of Graphics (《图学学报》).

👩‍💻 Experiences

2024.08 - Now : Research Fellow at Nanyang Technological University (NTU)

2018.03 - 2018.11 : Research Assistant at University of Hong Kong (HKU)

  • Direction: High Performance Computing, Heterogeneous Computing
  • PI: Prof. Choli Wang

2016.08 - 2016.09 : Research Intern at Institute of Electronics, Chinese Academy of Sciences (CASIE)

📖 Education

2019.09 - 2024.01 : Ph.D. at the Institute of Automation, Chinese Academy of Sciences (CASIA) and the University of Chinese Academy of Sciences (UCAS)

  • Major: Computer Applied Technology
  • Supervisor: Prof. Kaiqi Huang (IAPR Fellow, IEEE Senior Member, 10,000 Talents Program - Leading Talents)
  • Co-supervisor: Prof. Xin Zhao (IEEE Senior Member, Beijing Science Fund for Distinguished Young Scholars)
  • Thesis Title: Research of Intelligence Evaluation Techniques for Single Object Tracking
  • Thesis Committee: Prof. Jianbin Jiao, Prof. Yuxin Peng (The National Science Fund for Distinguished Young Scholars), Prof. Yao Zhao (IEEE Fellow, IET Fellow, The National Science Fund for Distinguished Young Scholars), Prof. Yunhong Wang (IEEE Fellow, IAPR Fellow, CCF Fellow), Prof. Ming Tang
  • Thesis Defense Grade: Excellent

2017.09 - 2019.06 : M.S. at the Department of Computer Science, University of Hong Kong (HKU)

  • Major: Computer Science
  • Supervisor: Prof. Choli Wang
  • Thesis Title: NightRunner: Deep Learning for Autonomous Driving Cars after Dark [🌐Project]
  • Thesis Defense Grade: A+

2013.09 - 2017.06 : B.E. in the Elite Class of the School of Information and Electronics, Beijing Institute of Technology (BIT)

  • Major: Information Engineering
  • Supervisor: Prof. Senlin Luo
  • Thesis Title: Text Sentiment Analysis Based on Deep Neural Network
  • Thesis Defense Grade: Excellent

2015.07 - 2015.08 : Summer Semester at the University of California, Berkeley (UCB)

  • Major: New Media
  • Course Grade: A

🔍️ Research Interests

Research Foundation

My previous research has primarily been dedicated to evaluating and exploring machine vision intelligence. It covers task modeling, environment construction, evaluation techniques, and human-machine comparisons. I firmly believe that the development of artificial intelligence is inherently interconnected with human factors. Hence, drawing inspiration from the renowned Turing Test, I have focused on the concept of the Visual Turing Test, aiming to integrate human elements into the evaluation of dynamic visual tasks. The ultimate goal of this work is to assess and analyze machine vision intelligence by benchmarking it against human abilities. I believe that effective evaluation techniques are the foundation for achieving trustworthy and secure artificial general intelligence. The following are several key aspects:

  • What are the abilities of humans? Designing more human-like tasks. In my research, I focused on utilizing Visual Object Tracking (VOT) as a representative task to explore dynamic visual abilities. VOT holds a pivotal role in computer vision; however, its original definition imposes excessive constraints that hinder alignment with human dynamic visual tracking abilities. To address this problem, I adopted a humanoid modeling perspective and expanded the original VOT definition. By eliminating the presumption of continuous motion, I introduced a more humanoid-oriented Global Instance Tracking (GIT) task. This expansion of the research objectives transformed VOT from a perceptual level, which involves locating targets in short video sequences through visual feature contrasts, to a cognitive level that addresses the continuous localization of targets in long videos without presuming continuous motion. Building upon this, I endeavored to incorporate semantic information into the GIT task and introduced the Multi-modal GIT (MGIT) task. The goal is to integrate a human-like understanding of long videos with hierarchically structured semantic labels, thereby further advancing the research objectives to include visual reasoning within complex spatio-temporal causal relationships.

  • What are the living environments of humans? Constructing more comprehensive and realistic datasets. The environment in which humans reside is characterized by complexity and constant change. However, current research predominantly employs static and limited datasets as closed experimental environments. These toy examples fail to provide machines with authentic human-like visual intelligence. To address this limitation, I draw inspiration from film theory and propose a framework for decoupling video narrative content. In doing so, I have developed VideoCube, the largest-scale object tracking benchmark. Expanding on this work, I integrate diverse environments from the field of VOT to create SOTVerse, a dynamic and open task space comprising 12.56 million frames. Within this task space, researchers can efficiently construct different subspaces to train algorithms, thereby improving their visual generalization across various scenarios. Furthermore, my research also focuses on visual robustness. Leveraging a bio-inspired flapping-wing drone developed by our team, I establish the first flapping-wing drone-based benchmark named BioDrone to enhance visual robustness in challenging environments.

  • How significant is the disparity between human and machine dynamic vision abilities? Utilizing human abilities as a baseline to evaluate machine intelligence. Computer scientists typically use large-scale datasets to evaluate machine models, while neuroscientists typically employ simple experimental environments to evaluate human subjects. This discrepancy makes it challenging to integrate human and machine evaluation into a unified framework for comparison and analysis. To address this issue, I construct an experimental environment based on SOTVerse to enable a fair comparison between human and machine dynamic visual abilities. The selected sequences provide a thorough examination of the perceptual abilities, cognitive abilities, and robust tracking abilities of humans and machines. On this foundation, I design a human-machine dynamic visual capability evaluation framework and carry out a fine-grained experimental analysis from the perspectives of human-machine comparison and human-machine collaboration. The experimental results demonstrate that representative tracking algorithms have gradually narrowed the gap with human subjects. Furthermore, both humans and machines exhibit unique strengths in dynamic visual tasks, suggesting significant potential for human-machine collaboration.

This human-centered evaluation concept is referred to as the Visual Turing Test, and I have presented my thoughts and future prospects in this direction through a comprehensive review of intelligent evaluation techniques. This research can be summarized using the 3E paradigm. To enable machines to acquire human abilities, we need to construct a humanoid proxy task and execute it through interactions among the environment, evaluation, and executors. Ultimately, the executors’ performance reflects their level of ability, and their upper limit of ability is continuously improved through ongoing iterations. I hope this research can build a comprehensive system that lays a solid foundation for improving the dynamic visual abilities of machines.

Detailed Lists of Current Research Interests

Visual Object Tracking (VOT)

  • Research on single object tracking algorithms in general scenes and specific scenarios (such as unmanned aerial vehicles).
  • Research on the robustness, generalization, and security of single object tracking algorithms.

Visual Language Tracking (VLT)

  • Research on multi-modal tracking, video understanding, and visual reasoning tasks based on long video sequences.
  • Exploring the use of Large Language Models (LLMs) and Large Vision Models (LVMs) for long video understanding.
  • Exploring human-computer interaction patterns in long video sequences with visual language tracking as a proxy task.

Benchmark Construction

  • Research on the construction strategy of single-modal and multi-modal datasets incorporating human knowledge structure.
  • Research on designing evaluation mechanisms for visual robustness, generalization, and safety.

Intelligent Evaluation

  • Design of a human-machine universal visual ability evaluation framework.
  • Benchmarking the performance of algorithms against human perceptual, cognitive, and inferential abilities. Analyzing the bottlenecks of algorithms and human subjects in depth, providing guidance for research on human-like modeling, human-machine collaboration, and human-machine integration.

AI4Science

  • Cognitive Science: Visual task design, environment construction, and human-machine capability analysis based on human-like modeling principles.
  • Medical Science: Research on medical image processing techniques based on artificial intelligence technologies (e.g., cell segmentation and tracking, denoising of cryo-electron microscopy images).
  • Psychology: Development of gamified assessment systems targeting psychological dimensions such as anxiety, depression, and obsession, along with research on intelligent psychological evaluation technologies. Exploring the use of Large Language Models (LLMs) and Large Vision Models (LVMs) for visual comprehension with psychological elements.
  • Education: Research on human-computer interaction (HCI) technology for education scenarios, including designing an intelligent education framework from a multidisciplinary perspective, investigating HCI technology, and conducting qualitative and quantitative analysis.

📝 Publications

Paper Summary

Journal

  • TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence (CCF-A Journal, Top-1 journal in computer vision, IF=20.8). Acceptance×1 (first author×1)
  • IJCV: International Journal of Computer Vision (CCF-A Journal, Top-2 journal in computer vision, IF=11.6). Acceptance×2 (first author×1, corresponding-author×1)
  • TCSVT: IEEE Transactions on Circuits and Systems for Video Technology (CCF-B Journal, IF=8.3). Acceptance×1, under review×1
  • JIG: Journal of Images and Graphics (《中国图象图形学报》, CCF-B Chinese Journal). Acceptance×1 (first author×1)
  • JOG: Journal of Graphics (《图学学报》, CCF-C Chinese Journal). Acceptance×1
  • Neu: Neurocomputing (CCF-C Journal, IF=5.5). Acceptance×1
  • CMHJ: Chinese Mental Health Journal (《中国心理卫生杂志》, CSSCI Journal, Top Psychological Journal in China). Acceptance×1
  • APS: Acta Psychologica Sinica (《心理学报》, CSSCI Journal, Top-1 Psychological Journal in China). Under review×1

Conference

  • NeurIPS: Conference on Neural Information Processing Systems (CCF-A Conference). Acceptance×3 (first author×1)
  • ICLR: International Conference on Learning Representations (CAAI-A Conference). Under review×3 (first author×2)
  • AAAI: Annual AAAI Conference on Artificial Intelligence (CCF-A Conference). Under review×1
  • NeurIPSW: Workshop in Conference on Neural Information Processing Systems (CCF-A Conference workshop). Under review×1
  • CVPRW: Workshop in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CCF-A Conference workshop). Acceptance×1 (oral & best paper honorable mention×1)
  • ICASSP: IEEE International Conference on Acoustics, Speech, and Signal Processing (CCF-B Conference). Acceptance×1, under review×1
  • PRCV: Chinese Conference on Pattern Recognition and Computer Vision (CCF-C Conference). Acceptance×2
  • CSAI: International Conference on Computer Science and Artificial Intelligence (EI Conference). Acceptance×1 (oral×1)

Acceptance

TPAMI 2023

Global Instance Tracking: Locating Target More Like Humans
Shiyu Hu, X. Zhao, L. Huang, K. Huang
IEEE Transactions on Pattern Analysis and Machine Intelligence (CCF-A Journal)
📌 Visual Object Tracking 📌 Large-scale Benchmark Construction 📌 Intelligent Evaluation Technology
📃 Paper 🗒 bibTex 📑 PDF 🪧 Poster 🌐 Platform 🔧 Toolkit 💾 Dataset

IJCV 2024

SOTVerse: A User-defined Task Space of Single Object Tracking
Shiyu Hu, X. Zhao, K. Huang
International Journal of Computer Vision (CCF-A Journal)
📌 Visual Object Tracking 📌 Dynamic Open Environment Construction 📌 3E Paradigm
📃 Paper 🗒 bibTex 📑 PDF 🌐 Platform

IJCV 2024

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision
X. Zhao, Shiyu Hu✉️, Y. Wang, J. Zhang, Y. Hu, R. Liu, H. Lin, Y. Li, R. Li, K. Liu, J. Li
International Journal of Computer Vision (CCF-A Journal)
📌 Visual Object Tracking 📌 Drone-based Tracking 📌 Visual Robustness
📃 Paper 🌐 Platform 🗒 bibTex 📑 PDF 🔧 Toolkit 💾 Dataset

NeurIPS 2023

A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship
Shiyu Hu, D. Zhang, M. Wu, X. Feng, X. Li, X. Zhao, K. Huang
the 37th Conference on Neural Information Processing Systems (CCF-A Conference, Poster)
📌 Visual Language Tracking 📌 Long Video Understanding and Reasoning 📌 Hierarchical Semantic Information Annotation
📃 Paper 🗒 bibTex 📃 PDF 🪧 Poster 📹 Slides 🌐 Platform 🔧 Toolkit 💾 Dataset

中国图象图形学报 2023

Visual Intelligence Evaluation Techniques for Single Object Tracking: A Survey (单目标跟踪中的视觉智能评估技术综述)
Shiyu Hu, X. Zhao, K. Huang
Journal of Images and Graphics (《中国图象图形学报》, CCF-B Chinese Journal)
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📃 Paper 📑 PDF

NeurIPS 2024

Beyond Accuracy: Tracking more like Human via Visual Search
D. Zhang, Shiyu Hu, X. Feng, X. Li, M. Wu, J. Zhang, K. Huang
the 38th Conference on Neural Information Processing Systems (CCF-A Conference, Poster)
📌 Visual Object Tracking 📌 Visual Search Mechanism 📌 Visual Turing Test

NeurIPS 2024

MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
X. Feng, X. Li, Shiyu Hu, D. Zhang, M. Wu, J. Zhang, X. Chen, K. Huang
the 38th Conference on Neural Information Processing Systems (CCF-A Conference, Poster)
📌 Visual Language Tracking 📌 Human-like Memory Modeling 📌 Adaptive Prompts

TCSVT 2024

Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
M. Wu, K. Huang, Y. Cai, Shiyu Hu, Y. Zhao, W. Wang
IEEE Transactions on Circuits and Systems for Video Technology (CCF-B Journal)
📌 Air-writing Technique 📌 Benchmark Construction 📌 Human-machine Interaction
📃 Paper 🗒 bibTex 📃 PDF

CVPRW 2024

Diverse Text Generation for Visual Language Tracking Based on LLM
X. Li, X. Feng, Shiyu Hu, M. Wu, D. Zhang, J. Zhang, K. Huang
the 3rd Workshop on Vision Datasets Understanding and DataCV Challenge in CVPR 2024 (Workshop in CCF-A Conference, Oral, Best Paper Honorable Mention)
📌 Visual Language Tracking 📌 Large Language Model 📌 Evaluation Technique
📃 Paper 🗒 bibTex 📃 PDF 🪧 Poster 📹 Slides 🌐 Platform 🔧 Toolkit 💾 Dataset 🏆 Award

ICASSP 2024

Robust Single-particle Cryo-EM Image Denoising and Restoration
J. Zhang, T. Zhao, Shiyu Hu, X. Zhao
the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (CCF-B Conference, Poster)
📌 Medical Image Processing 📌 AI4Science 📌 Diffusion Model
📃 Paper 🗒 bibTex 📑 PDF

PRCV 2024

VS-LLM: Visual-Semantic Depression Assessment based on LLM for Drawing Projection Test
M. Wu, Y. Kang, X. Li, Shiyu Hu, X. Chen, Y. Kang, W. Wang, K. Huang
the 7th Chinese Conference on Pattern Recognition and Computer Vision (CCF-C Conference)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science

中国心理卫生杂志 2024

A Review of Intelligent Psychological Assessment Based on Interactive Environment (基于交互环境的智能化心理测评)
K. Huang, Y. Kang, C. Yan, Shiyu Hu, L. Wang, T. Tao, W. Gao
Chinese Mental Health Journal (《中国心理卫生杂志》, CSSCI Journal, Top Psychological Journal in China)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science

PRCV 2023

A Hierarchical Theme Recognition Model for Sandplay Therapy
X. Feng, Shiyu Hu, X. Chen, K. Huang
the 6th Chinese Conference on Pattern Recognition and Computer Vision (CCF-C Conference, Poster)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science
📃 Paper 🗒 bibTex 📑 PDF 🔖 Supplementary 🪧 Poster

CSAI 2023

Rethinking Similar Object Interference in Single Object Tracking
Y. Wang, Shiyu Hu, X. Zhao
the 7th International Conference on Computer Science and Artificial Intelligence (EI Conference, Oral)
📌 Visual Object Tracking 📌 Similar Object Interference 📌 Data Mining
📃 Paper 🗒 bibTex 📑 PDF

Neurocomputing 2022

Revisiting Instance Search: A New Benchmark Using Cycle Self-training
Y. Zhang, C. Liu, W. Chen, X. Xu, F. Wang, H. Li, Shiyu Hu, X. Zhao
Neurocomputing (CCF-C Journal)
📌 Video Instance Search 📌 Benchmark Construction 📌 Data Mining
📃 Paper 🗒 bibTex 📑 PDF 🌐 Project

图学学报 2021

Visual Turing: The Next Development of Computer Vision in The View of Human-computer Gaming (视觉图灵:从人机对抗看计算机视觉下一步发展)
K. Huang, X. Zhao, Q. Li, Shiyu Hu
Journal of Graphics (《图学学报》, CCF-C Chinese Journal)
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📃 Paper 🗒 bibTex 📑 PDF

Preprint

Preprint

Nearing or Surpassing: Overall Evaluation of Human-Machine Dynamic Vision Ability
Shiyu Hu, X. Zhao, Y. Wang, Y. Shan, K. Huang
📌 Visual Object Tracking 📌 Intelligent Evaluation Technique 📌 AI4Science
📑 PDF 🗒 bibTex

Under Review

CAAI-A 2024

Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison
Shiyu Hu*, X. Li*, X. Li, J. Zhang, Y. Wang, X. Zhao, K. Cheong (*Equal Contributions)
Submitted to a CAAI-A conference, under review
📌 Large Vision-Language Models 📌 Evaluation Technique 📌 Visual Turing

CAAI-A 2024

Students Rather Than Experts: A New AI for Education Pipeline to Model More Human-like and Personalised Early Adolescences
Y. Ma*, Shiyu Hu*, X. Li, Y. Wang, S. Liu, K. Cheong (*Equal Contributions)
Submitted to a CAAI-A conference, under review
📌 AI4Education 📌 LLMs 📌 LLM-based Agent

CAAI-A 2024

DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM
X. Li, Shiyu Hu, X. Feng, D. Zhang, M. Wu, J. Zhang, K. Huang
Submitted to a CAAI-A conference, under review
📌 Visual Language Tracking 📌 Large Language Model 📌 Evaluation Technique

TCSVT 2024

Target or Distractor? Rethinking Similar Object Interference in Single Object Tracking
Y. Wang, Shiyu Hu, D. Zhang, M. Wu, T. Yao, Y. Wang, L. Chen, X. Zhao
IEEE Transactions on Circuits and Systems for Video Technology (CCF-B Journal, Under Review)
📌 Visual Object Tracking 📌 Similar Object Interference 📌 Data Mining

CCF-A 2024

ATCTrack: Leveraging Aligned Target-Context Cues for Robust Vision-Language Tracking
X. Feng, Shiyu Hu, X. Li, D. Zhang, M. Wu, J. Zhang, X. Chen, K. Huang
Submitted to a CCF-A conference, under review
📌 Visual Language Tracking 📌 Multi-modal Alignment 📌 Feature Awareness

CCF-A 2024

Unconstrained Multimodal Air-Writing Benchmark: Writing by Moving Your Fingers in 3D
M. Wu, X. Li, Shiyu Hu, Y. Cai, K. Huang, W. Wang
Submitted to a CCF-A conference, under review
📌 Air-writing Technique 📌 Benchmark Construction 📌 Human-machine Interaction

CCF-B 2024

Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues
X. Feng, D. Zhang, Shiyu Hu, X. Li, M. Wu, J. Zhang, X. Chen, K. Huang
Submitted to a CCF-B conference, under review
📌 Visual Language Tracking 📌 Multi-modal Learning 📌 Grounding Model

CCF-A Workshop 2024

Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark
X. Li, Shiyu Hu, X. Feng, D. Zhang, M. Wu, J. Zhang, K. Huang
Submitted to a workshop in CCF-A conference, under review
📌 Visual Language Tracking 📌 Multi-modal Interaction 📌 Evaluation Technology

心理学报 2024

Intelligent Psychological Assessment with Sandplay based on Evidence-Centered Design Theory (基于证据中心设计理论的智能心理沙盘测评系统)
Y. Ren, X. Feng, Shiyu Hu, Y. Kang, C. Yan, Y. Zeng, L. Wang, K. Huang
Acta Psychologica Sinica (《心理学报》, CSSCI Journal, Top-1 Psychological Journal in China, Under Review)
📌 Psychological Assessment System 📌 Gamified Assessment 📌 AI4Science

⚙️ Projects

The list here mainly includes engineering projects, while more academic projects have already been published in the form of research papers. Please refer to the 📝 Publications for more information.

2018.03 - 2018.11

Darknet-Cross: Light-weight Deep Learning Framework for Heterogeneous Computing
📌 High-performance Computing 📌 Heterogeneous Computing 📌 Deep learning Framework

  • Darknet-Cross is a lightweight deep learning framework, mainly based on the open-source deep learning algorithm library Darknet and yolov2_light, and it has been successfully ported to mobile devices through cross-compilation. This framework enables efficient algorithm inference using mobile GPUs.
  • Darknet-Cross supports algorithm acceleration processing on various platforms (e.g., Android and Ubuntu) and various GPUs (e.g., Nvidia GTX1070 and Adreno 630).
  • The work is a part of my master’s thesis at the University of Hong Kong (thesis defense grade: A+).

2019.05 - 2019.10

A Skin Color Detection System without Color Atlas
📌 Color Constancy 📌 Skin Color Detection 📌 Illumination Estimation

  • Under 18 different environmental lighting conditions and with 4 combinations of smartphone parameters, skin color data was collected from 110 participants. The skin color dataset consists of 7,920 images, with the testing results from CK Company’s MPA9 skin color detector serving as the ground truth for user skin colors.
  • Using an elliptical skin model, the essential skin regions are extracted from the images. The open-source color constancy model, FC4, is employed to recover the environmental lighting conditions. Subsequently, the skin color detection results for users are calculated using SVR regression.
  • The related work has been successfully deployed in Huawei’s official mobile application ‘Mirror’ for its AI skin testing function.

2020.11 - 2021.03

A Project for Cell Tracking Based on Deep Learning Method
📌 Medical Image Processing 📌 AI4Science 📌 Cell Segmentation and Tracking

  • This method follows the tracking by detection paradigm and combines per-frame CNN prediction for cell segmentation with a Siamese network for cell tracking.
  • This project was submitted to the Cell Tracking Challenge in Mar. 2021, and ranks second on the Fluo-C2FL-MSC+ dataset and third on the Fluo-C2FL-Huh7 dataset (as of Oct. 2023).
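
As a rough illustration of the tracking-by-detection idea described in this project, the sketch below links per-frame detections into tracks by greedily matching appearance embeddings between consecutive frames. It is a minimal sketch and not the project's actual implementation: the function names `cosine` and `track`, the greedy association rule, the 0.8 similarity threshold, and the toy 2-D vectors (stand-ins for the embeddings a Siamese network would produce for each segmented cell) are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def track(frames, threshold=0.8):
    """Assign a track ID to every detection in every frame.

    frames: list of frames; each frame is a list of embedding vectors,
            one per detected cell (hypothetical stand-ins for Siamese features).
    Returns a list of lists of integer track IDs, aligned with `frames`.
    """
    next_id = 0
    prev = []      # (track_id, embedding) pairs from the previous frame
    result = []
    for detections in frames:
        # Score every (current detection, previous detection) pair, best first.
        pairs = sorted(
            ((cosine(emb, p_emb), i, j)
             for i, emb in enumerate(detections)
             for j, (_, p_emb) in enumerate(prev)),
            reverse=True,
        )
        ids = [None] * len(detections)
        used_prev = set()
        for score, i, j in pairs:
            if score < threshold:
                break  # remaining pairs are even less similar
            if ids[i] is None and j not in used_prev:
                ids[i] = prev[j][0]   # continue an existing track
                used_prev.add(j)
        # Unmatched detections start new tracks.
        for i in range(len(detections)):
            if ids[i] is None:
                ids[i] = next_id
                next_id += 1
        prev = list(zip(ids, detections))
        result.append(ids)
    return result


# Toy example: two cells swap list order in frame 1, a third appears in frame 2.
frames = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.1, 0.9), (0.9, 0.1)],
    [(1.0, 0.05), (0.0, 1.0), (-1.0, 0.0)],
]
print(track(frames))
```

Greedy matching is used here only because it is short; a real system would more likely combine appearance with spatial cues and handle cell division explicitly.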

2024.01 - Now

Research on the Dilemma and Countermeasures of Human-Computer Interaction in Intelligent Education
📌 Intelligent Education Technology 📌 Human-Computer Interaction 📌 AI4Science

  • Incorporating insights and methodologies from education, cognitive psychology, and computer science, this project establishes a theoretical framework for understanding the evolution of HCI within intelligent education.
  • Drawing upon the established theoretical framework, this project conducts a comprehensive analysis of the evolution of HCI in educational settings, transitioning from collaboration to integration. Furthermore, it delves into the key issues arising from this transformative process within the realm of intelligent education.
  • Building upon the core issues unearthed, this project investigates strategies for leveraging theoretical guidance and technical enhancements to enhance the efficacy of HCI in intelligent education, ultimately striving towards effective human-computer integration.
  • The project is funded by the 2023 Intelligent Education PhD Research Fund, supported by the Institute of AI Education Shanghai and East China Normal University, and is currently in progress.

🏆 Honors and Awards

  • 2024 Best Paper Honorable Mention in the 3rd Workshop on Vision Datasets Understanding and DataCV Challenge in CVPR 2024 (CVPRW最佳论文提名)
  • 2024 Beijing Outstanding Graduates (北京市优秀毕业生, top 5%)
  • 2023 China National Scholarship (国家奖学金, top 1%; only 8 Ph.D. students on the main campus of the University of Chinese Academy of Sciences received this scholarship)
  • 2023 First Prize of Climbing Scholarship in Institute of Automation, Chinese Academy of Sciences (攀登一等奖学金; only 6 students in the Institute of Automation, Chinese Academy of Sciences received this scholarship)
  • 2022 Merit Student of University of Chinese Academy of Sciences (中国科学院大学三好学生)
  • 2017 Excellent Innovative Student of Beijing Institute of Technology (北京理工大学优秀创新学生)
  • 2016 College Scholarship of Chinese Academy of Sciences (中国科学院大学生奖学金)
  • 2016 Excellent League Member on Youth Day Competition of Beijing Institute of Technology (北京理工大学优秀团员)
  • 2015 National First Prize in Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM) (全国大学生数学建模竞赛国家一等奖, top 1%; only 1 team from Beijing Institute of Technology won this prize) [📑PDF] [📖Selected and Reviewed Outstanding Papers in CUMCM (2011-2015) (Chapter 9)]
  • 2015 First Prize of Mathematics Modeling Competition within Beijing Institute of Technology (北京理工大学数学建模校内选拔赛第一名)
  • 2015 Outstanding Individual on Summer Social Practice of Beijing Institute of Technology (北京理工大学暑期社会实践优秀个人)
  • 2015 Second Prize on Summer Social Practice of Beijing Institute of Technology (北京理工大学暑期社会实践二等奖, team leader)
  • 2015 Outstanding Student Cadre of Beijing Institute of Technology (北京理工大学优秀学生干部)
  • 2015 Outstanding League Cadre on Youth Day Competition of Beijing Institute of Technology (北京理工大学优秀团干部)
  • 2015 Outstanding Youth League Branch of Beijing Institute of Technology (北京理工大学优秀团支部, team leader)
  • 2015 Top-10 Activities on Youth Day Competition of Beijing Institute of Technology (北京理工大学十佳团日活动, team leader)
  • 2014 Outstanding Student of Beijing Institute of Technology (北京理工大学优秀学生)
  • 2014, 2015, 2016, 2017 Academic Scholarship of Beijing Institute of Technology (北京理工大学学业奖学金)

📣 Activities and Services

Tutorial

31st IEEE International Conference on Image Processing (ICIP)

  • Title: An Evaluation Perspective in Visual Object Tracking: from Task Design to Benchmark Construction and Algorithm Analysis
  • Date & Location: 27-30 October, 2024, Abu Dhabi, United Arab Emirates
  • Duration: Half-day (Three Hours)

27th International Conference on Pattern Recognition (ICPR)

  • Title: Visual Turing Test in Visual Object Tracking: A New Vision Intelligence Evaluation Technique based on Human-Machine Comparison
  • Date & Location: 01-05 December, 2024, Kolkata, India
  • Duration: Half-day (Three Hours)

17th Asian Conference on Computer Vision (ACCV)

  • Title: From Machine-Machine Comparison to Human-Machine Comparison: Adapting Visual Turing Test in Visual Object Tracking
  • Date & Location: 08-12 December, 2024, Hanoi, Vietnam
  • Duration: Half-day (Three Hours)

Associate Editor

  • Journals: Innovation and Emerging Technologies

Reviewer

  • Conferences: NeurIPS, ICLR, CVPR, ECCV, AAAI, ACMMM, AISTATS, etc.
  • Journals: SCIENCE CHINA Information Sciences, IEEE Access, Journal of Computational Science, Journal of Electronic Imaging, Digital Signal Processing, etc.

🤝 Collaborators

I am honored to collaborate with these outstanding researchers. We engage in close discussions concerning various fields such as computer vision, cognitive science, AI4Science, and human-computer interaction. If you are interested in these areas as well, please feel free to contact me.

📄 CV

✉️ Contact

  • shiyu.hu@ntu.edu.sg (Main)
  • hushiyu199510@gmail.com (Personal)
  • hushiyu2019@ia.ac.cn (Valid from 2019.06 - 2024.07)



© Shiyu Hu | Last updated: 2024-10