I am a student in the Master of Science in Computer Vision (MSCV) program at Carnegie Mellon University. I obtained my bachelor's degree in Computer Science and Technology from Zhejiang University, where I was advised by Prof. Hongzhi Wu.

My interests include computer graphics and 3D vision.

📖 Education

Carnegie Mellon University, Pittsburgh, USA 2023.9 - 2024.12 (expected)

  • Program: Master of Science in Computer Vision
  • Cumulative QPA: 4.33/4.33

Zhejiang University, Hangzhou, China 2019.9 - 2023.6

  • Degree: Bachelor of Engineering
    • Honors degree from Chu Kochen Honors College
  • Major: Computer Science and Technology
  • Overall GPA: 94.6/100 (3.98/4)
  • Ranking: 1/125
  • Thesis: Real-Time SLAM System based on ARKit Framework (Excellent Graduation Thesis)

💻 Experience

ByteDance 2024.5 - 2024.8

  • Position: AR Effect Engineer Intern


Meta 2024.1 - 2024.5

  • Position: Student Researcher (School Project)
  • Advisors: Shubham Garg and Dr. Pei Wu


Microsoft Research Asia 2022.3 - 2023.6


📝 Projects

jjyouLib [Project Page] 2022.3 - Present

My personal C++ library: a collection of classes and functions that streamlines the development of my personal projects. I keep updating it as I progress through my studies and acquire new knowledge.


Optimizing and designing features for Effect House's Visual Effects system 2024.5 - 2024.8

Internship Project at ByteDance

In this internship project, I optimized the particle attribute buffer in Effect House's Visual Effects (VFX) system, reducing memory usage by more than 50% for most template VFX effects. I also implemented a simulation node in the VFX graph editor that lets users treat the VFX system as a general-purpose compute pipeline rather than only a GPU particle system, so they can build custom physics simulation effects with it. Finally, I implemented a 3D Gaussian Splatting output node to render 3D scenes with VFX particles.


Large Scale Camera Array Calibration via SfM 2024.1 - 2024.5, 2024.9 - Present

Research Project at Meta

We are building an efficient and accurate Structure-from-Motion (SfM) pipeline to calibrate a large camera array and reconstruct human faces from multi-view avatar images.
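
At its core, a calibration pipeline like this minimizes a reprojection error over all camera parameters and 3D points. Schematically (standard bundle adjustment notation, not a description of our specific pipeline):

```latex
% x_{ij}: observed 2D feature of point j in camera i; K_i, R_i, t_i: intrinsics
% and pose of camera i; X_j: 3D point; \pi: perspective projection.
\min_{\{K_i, R_i, t_i\},\, \{X_j\}} \;\sum_{i,j} \bigl\| \pi(K_i, R_i, t_i, X_j) - x_{ij} \bigr\|^{2}
```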


KinectFusion - Vulkan [Project Page] 2024.3 - 2024.4

Course Project of Robot Localization and Mapping (16-833)

In this project, I implemented KinectFusion on top of Vulkan. Unlike CUDA, Vulkan is a cross-platform graphics API that supports both rendering and general-purpose parallel computing, so my implementation runs across platforms and performs real-time camera tracking, scene reconstruction, and rendering at the same time. The estimated camera poses can also be used to render AR objects onto the input RGB images for AR effects.
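
As a rough illustration of the fusion step (a minimal CPU sketch, not the actual Vulkan compute shader; the structure names and the grid/truncation constants are assumptions), each voxel of the TSDF volume is projected into the new depth frame and its signed distance is blended with a running weight:

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

// Minimal CPU sketch of the per-voxel TSDF update that the Vulkan compute
// shader parallelizes. All names and constants here are illustrative.
struct Intrinsics { float fx, fy, cx, cy; };

struct TSDFVolume {
    int dim = 128;                 // voxels per side (illustrative)
    float voxelSize = 0.01f;       // meters
    float truncation = 0.04f;      // truncation distance in meters
    std::vector<float> tsdf;       // signed distance, in [-1, 1]
    std::vector<float> weight;     // accumulated integration weight

    TSDFVolume() : tsdf(dim * dim * dim, 1.0f), weight(dim * dim * dim, 0.0f) {}

    // Integrate one depth frame given the camera-from-world pose (R, t).
    void integrate(const std::vector<float>& depth, int w, int h,
                   const Intrinsics& K,
                   const std::array<float, 9>& R, const std::array<float, 3>& t) {
        for (int z = 0; z < dim; ++z)
        for (int y = 0; y < dim; ++y)
        for (int x = 0; x < dim; ++x) {
            // Voxel center in world coordinates.
            float wx = (x + 0.5f) * voxelSize;
            float wy = (y + 0.5f) * voxelSize;
            float wz = (z + 0.5f) * voxelSize;
            // Transform into the camera frame.
            float camX = R[0]*wx + R[1]*wy + R[2]*wz + t[0];
            float camY = R[3]*wx + R[4]*wy + R[5]*wz + t[1];
            float camZ = R[6]*wx + R[7]*wy + R[8]*wz + t[2];
            if (camZ <= 0.0f) continue;                   // behind the camera
            // Project into the depth image.
            int u = static_cast<int>(K.fx * camX / camZ + K.cx + 0.5f);
            int v = static_cast<int>(K.fy * camY / camZ + K.cy + 0.5f);
            if (u < 0 || u >= w || v < 0 || v >= h) continue;
            float d = depth[v * w + u];
            if (d <= 0.0f) continue;                      // invalid measurement
            float sdf = d - camZ;                         // signed distance along the ray
            if (sdf < -truncation) continue;              // far behind the surface
            float tsdfNew = std::min(1.0f, sdf / truncation);
            // Weighted running average over frames.
            size_t i = (static_cast<size_t>(z) * dim + y) * dim + x;
            tsdf[i] = (tsdf[i] * weight[i] + tsdfNew) / (weight[i] + 1.0f);
            weight[i] = std::min(weight[i] + 1.0f, 128.0f);
        }
    }
};
```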


Render72: A real-time renderer based on Vulkan [Project Page] 2024.1 - 2024.4

Course Project of Real-Time Graphics (15-472)

I developed a real-time renderer based on Vulkan. It supports multiple material types such as mirror, Lambertian, and PBR. A scene can include an environment map used for image-based lighting via precomputed radiance/irradiance lookup tables. The renderer also supports analytical lights with shadow mapping (perspective / omnidirectional / cascaded), as well as deferred shading and screen-space ambient occlusion (SSAO).
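
As a simplified sketch of the irradiance precomputation behind the image-based lighting (the real renderer runs this on the GPU and stores the result in a Vulkan image; all names and sizes below are assumptions), each output direction convolves the environment map with a cosine lobe:

```cpp
#include <cmath>
#include <vector>

// Simplified sketch of diffuse irradiance precomputation for image-based
// lighting: for each surface normal direction, convolve the environment map
// with a cosine lobe. Names and sizes are illustrative only.
constexpr float kPi = 3.14159265358979f;

struct Vec3 { float x = 0, y = 0, z = 0; };

// Direction corresponding to a texel of an equirectangular (lat-long) map.
static Vec3 texelDirection(int u, int v, int width, int height) {
    float phi   = (u + 0.5f) / width  * 2.0f * kPi;  // longitude
    float theta = (v + 0.5f) / height * kPi;         // colatitude
    return { std::sin(theta) * std::cos(phi),
             std::cos(theta),
             std::sin(theta) * std::sin(phi) };
}

// env: width*height RGB radiance values in lat-long layout.
// Returns an outWidth*outHeight irradiance lookup table in the same layout.
std::vector<Vec3> computeIrradianceMap(const std::vector<Vec3>& env,
                                       int width, int height,
                                       int outWidth, int outHeight) {
    std::vector<Vec3> irradiance(outWidth * outHeight);
    for (int ov = 0; ov < outHeight; ++ov)
    for (int ou = 0; ou < outWidth; ++ou) {
        Vec3 n = texelDirection(ou, ov, outWidth, outHeight);  // surface normal
        Vec3 sum;
        for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u) {
            Vec3 l = texelDirection(u, v, width, height);      // incoming direction
            float cosTheta = n.x * l.x + n.y * l.y + n.z * l.z;
            if (cosTheta <= 0.0f) continue;                    // below the hemisphere
            // Solid angle of this texel: sin(colatitude) * dTheta * dPhi.
            float sinColat = std::sin((v + 0.5f) / height * kPi);
            float dOmega = sinColat * (kPi / height) * (2.0f * kPi / width);
            const Vec3& L = env[v * width + u];
            sum.x += L.x * cosTheta * dOmega;
            sum.y += L.y * cosTheta * dOmega;
            sum.z += L.z * cosTheta * dOmega;
        }
        // Dividing by pi turns the cosine-weighted integral into the
        // Lambertian outgoing radiance per unit albedo.
        irradiance[ov * outWidth + ou] = { sum.x / kPi, sum.y / kPi, sum.z / kPi };
    }
    return irradiance;
}
```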


Anti-Blur Depth Fusion based on Vision Cone Model 2022.11 - 2023.6

Research Project at Microsoft Research Asia

We proposed a depth fusion method that fuses low-resolution depth images while preserving high-resolution detail in the global model. Traditional methods such as KinectFusion optimize the reconstruction by minimizing the difference between the reconstructed depths and the captured depths, so they may produce blurred or aliased reconstructions when the image resolution is low. Our method is based on the assumption that the captured depth of a pixel equals the average of the actual depths within the pixel's vision cone, and we designed loss functions that minimize the difference between the average of the reconstructed depths and the captured depth. We tested our method on both SDF voxel and mesh representations and obtained better reconstructions than KinectFusion.
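
Schematically (my own notation here, not the exact formulation used in the project), the loss compares the cone-averaged reconstructed depth with the captured depth at each pixel:

```latex
% d_p: captured depth at pixel p; C(p): rays sampled inside the pixel's vision
% cone; \hat{d}_r: depth rendered from the current reconstruction along ray r.
\mathcal{L} = \sum_{p} \Bigl( \tfrac{1}{|C(p)|} \sum_{r \in C(p)} \hat{d}_r - d_p \Bigr)^{2}
```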


Real-Time SLAM System based on ARKit Framework 2022.3 - 2022.10

Research Project at Microsoft Research Asia

We developed a SLAM system that tracks the camera trajectory in real time when scanning indoor scenes with rich planar structures, using only an iOS device such as an iPhone or iPad. Our system reads RGB-D data from the LiDAR camera along with camera poses estimated by the ARKit framework. It then searches for coplanar and parallel planes in the scene and uses them to optimize the camera poses. Meanwhile, it uses a vocabulary tree and a confusion map to detect loops globally, and it lets users confirm detected loops via the UI to improve precision. To avoid memory overflow during long scans, it stores infrequently visited data in an embedded database. Experiments show that our method improves ARKit's camera localization and loop detection while still running at a real-time frame rate, allowing users to scan large indoor scenes with live feedback.


C Compiler [Project Page] 2022.4 - 2022.6

Course Project of Compiler Principle

We developed a compiler that translates C source code into binary code. The project is divided into three parts: the lexer and parser, code generation, and AST (Abstract Syntax Tree) visualization. The lexer and parser are based on lex and yacc; they take the source code and build an AST. The code generation module is based on LLVM; it takes the AST and generates binary code. Finally, we visualize the AST with HTML.
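
As a small illustration of the LLVM-based code generation stage (a toy sketch, not the project's actual code; the function and names are made up), emitting IR for a function that adds two integers looks roughly like this:

```cpp
#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/Support/raw_ostream.h>

// Toy example of LLVM-based code generation: emit IR for
// `int add(int a, int b) { return a + b; }`. In the real compiler this step
// is driven by walking the AST produced by the lex/yacc front end.
int main() {
    llvm::LLVMContext context;
    llvm::Module module("toy", context);
    llvm::IRBuilder<> builder(context);

    // Declare `i32 add(i32, i32)`.
    auto* i32 = builder.getInt32Ty();
    auto* fnType = llvm::FunctionType::get(i32, {i32, i32}, /*isVarArg=*/false);
    auto* fn = llvm::Function::Create(fnType, llvm::Function::ExternalLinkage,
                                      "add", module);

    // Function body: return a + b;
    auto* entry = llvm::BasicBlock::Create(context, "entry", fn);
    builder.SetInsertPoint(entry);
    llvm::Value* a = fn->getArg(0);
    llvm::Value* b = fn->getArg(1);
    builder.CreateRet(builder.CreateAdd(a, b, "sum"));

    // Print the generated IR; a later step lowers it to binary code.
    module.print(llvm::outs(), nullptr);
    return 0;
}
```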


3D Game: Interstellar [Project Page] 2021.11 - 2021.12

Course Project of Computer Graphics

We developed a 3D game based on OpenGL in which players control a spaceship to travel through the universe, view space stations, planets, and stars, and launch missiles to destroy enemy spaceships. To make the visuals more realistic, we applied several techniques, including specular mapping, normal mapping, and light attenuation, and we implemented collision detection.
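
For instance, point-light attenuation is typically computed with a constant / linear / quadratic falloff; a generic sketch (the coefficients are illustrative defaults, not necessarily the values used in the game):

```cpp
#include <cmath>

// Generic point-light distance attenuation with constant / linear / quadratic
// falloff, as commonly used in OpenGL-style shading. Coefficients are
// illustrative defaults only.
float pointLightAttenuation(float distance,
                            float constant = 1.0f,
                            float linear = 0.09f,
                            float quadratic = 0.032f) {
    return 1.0f / (constant + linear * distance + quadratic * distance * distance);
}

// A light's contribution is then its color scaled by this factor, e.g.
// `litColor = lightColor * pointLightAttenuation(d);`.
```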


Voxel Reconstruction of Opaque Objects [Project Page] 2021.9 - 2021.10

Course Project of Intelligent Acquisition of Visual Information

We proposed a system based on voxel carving and ray casting to reconstruct the 3D shapes of opaque objects. We first use a projector to cast structured light onto the object while a camera captures photos. These photos are used to extract silhouettes and estimate depth images of the object. We then use the silhouettes to carve a voxel model of the object and the depth images to refine it. Finally, a ray casting algorithm colors the reconstructed model.
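
A minimal sketch of the silhouette-based carving step (the data layout, camera model, and names are assumptions): each voxel center is projected into every calibrated view, and any voxel that falls outside a silhouette is removed.

```cpp
#include <vector>

// Minimal sketch of silhouette-based voxel carving: project each voxel center
// into every calibrated view and discard voxels that fall outside any
// silhouette. Structures and names are illustrative only.
struct View {
    std::vector<unsigned char> silhouette;  // width*height mask, nonzero = object
    int width, height;
    float P[3][4];                          // 3x4 projection matrix (K [R|t])
};

// occupancy: dim^3 flags over a cube of side `extent` meters at the origin.
void carve(std::vector<unsigned char>& occupancy, int dim, float extent,
           const std::vector<View>& views) {
    float voxelSize = extent / dim;
    for (int z = 0; z < dim; ++z)
    for (int y = 0; y < dim; ++y)
    for (int x = 0; x < dim; ++x) {
        size_t idx = (static_cast<size_t>(z) * dim + y) * dim + x;
        if (!occupancy[idx]) continue;
        // World-space voxel center.
        float wx = (x + 0.5f) * voxelSize;
        float wy = (y + 0.5f) * voxelSize;
        float wz = (z + 0.5f) * voxelSize;
        for (const View& view : views) {
            // Homogeneous projection into the image plane.
            float u = view.P[0][0]*wx + view.P[0][1]*wy + view.P[0][2]*wz + view.P[0][3];
            float v = view.P[1][0]*wx + view.P[1][1]*wy + view.P[1][2]*wz + view.P[1][3];
            float w = view.P[2][0]*wx + view.P[2][1]*wy + view.P[2][2]*wz + view.P[2][3];
            if (w <= 0.0f) { occupancy[idx] = 0; break; }   // behind this camera
            int px = static_cast<int>(u / w);
            int py = static_cast<int>(v / w);
            bool inside = px >= 0 && px < view.width && py >= 0 && py < view.height &&
                          view.silhouette[py * view.width + px] != 0;
            if (!inside) { occupancy[idx] = 0; break; }     // carve this voxel
        }
    }
}
```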


MiniSQL [Project Page] 2021.5 - 2021.6

Course Project of Database System

We developed a database management system called MiniSQL. It allows users to use SQL statements to (1) create and delete tables, (2) create and delete indices, and (3) insert, delete, and select records. The project is divided into seven modules: GUI, interpreter, API, Record Manager, Index Manager, Catalog Manager, and Buffer Manager.


🔧 Skills

  • Programming Languages: C/C++, Python, JavaScript, Swift, Objective-C, Verilog
  • Tools & Frameworks: Vulkan, OpenGL, Metal, OpenCV, CUDA, PyTorch, NumPy, MySQL, Doxygen, CMake