I am a student in the Master of Science in Computer Vision (MSCV) program at Carnegie Mellon University. I obtained my bachelor’s degree in Computer Science and Technology from Zhejiang University, where I was advised by Prof. Hongzhi Wu. I also worked as a research intern at Microsoft Research Asia, advised by Dr. Yizhong Zhang and Dr. Yang Liu.

My interests include computer graphics and 3D vision.

📖 Education

Carnegie Mellon University, Pittsburgh, USA 2023.9 - 2024.12 (expected)

  • Program: Master of Science in Computer Vision
  • Cumulative QPA: 4.33/4.33

Zhejiang University, Hangzhou, China 2019.9 - 2023.6

  • Degree: Bachelor of Engineering
    • Honors degree from Chu Kochen Honors College
  • Major: Computer Science and Technology
  • Overall GPA: 94.6/100 (3.98/4)
  • Ranking: 1/125
  • Thesis: Real-Time SLAM System based on ARKit Framework (Excellent Graduation Thesis)

💻 Experience

ByteDance 2024.5 - 2024.8

  • Position: AR Effect Engineer Intern

Meta 2024.1 - 2024.5

  • Position: Student Researcher (School Project)
  • Advisors: Shubham Garg, Dr. Pei Wu

Microsoft Research Asia 2022.3 - 2023.6

📝 Projects

jjyouLib [Project Page] 2022.3 - Present

My personal C++ library, a collection of classes and functions that streamline my development process. I keep updating it as I progress through my studies and acquire new knowledge.

Large Scale Camera Array Calibration via SfM 2024.1 - Present

Research Project at Meta

We are working on building an efficient and accurate SfM pipeline to calibrate the camera array and reconstruct the human face from multi-view avatar images.

KinectFusion - Vulkan [Project Page] 2024.3 - 2024.4

Course Project of Robot Localization and Mapping (16-833)

In this project, I implemented KinectFusion based on Vulkan. Unlike CUDA, Vulkan is a cross-platform graphics API that supports both graphics rendering and general-purpose parallel computing. My implementation is therefore cross-platform and supports real-time camera tracking, scene reconstruction, and graphics rendering at the same time. The estimated camera poses can also be used to render AR objects onto the input RGB images to achieve AR effects.

Render72: A real-time renderer based on Vulkan [Project Page] 2024.1 - 2024.4

Course Project of Real-Time Graphics (15-472)

I developed a real-time renderer based on Vulkan. It can load a scene from a file (in s72 format) and render it to the screen, including animations. It supports 5 material types: simple, mirror, environment, lambertian, and pbr. The scene can have an environment map, which is used for image-based lighting by precomputing radiance/irradiance lookup tables. The renderer also supports analytical lighting with shadow mapping (perspective / omnidirectional / cascaded), as well as deferred shading and screen space ambient occlusion (SSAO).

Anti-Blur Depth Fusion based on Vision Cone Model 2022.11 - 2023.6

Research Project at Microsoft Research Asia

We proposed a depth fusion method that fuses low-resolution depth images while still maintaining high-resolution information in the global model. KinectFusion (and many other methods) assumes that the actual depth along every direction within a pixel's vision cone equals the captured depth value. It simply projects each voxel onto the depth image, finds the nearest pixel, and computes the projective SDF value. Therefore, it may produce blurred or aliased models when the image resolution is low. Our method instead assumes that the captured depth value of a pixel equals the average of the actual scene depths within the pixel's vision cone. We designed loss functions based on this assumption and wrote CUDA functions to accelerate the optimization process. We tested our method on both SDF voxel and mesh representations and obtained better reconstruction results than KinectFusion.
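The two measurement assumptions above can be contrasted with a toy one-dimensional sketch. This is an illustration only, not the project's code: the function names, the squared-error loss, and the fixed number of sub-pixel samples per cone are all assumptions made for the example.

```python
def projective_sdf(depth_image, pixel_index, voxel_depth, truncation):
    """KinectFusion-style truncated projective SDF for one voxel.

    Assumes the voxel has already been projected to `pixel_index`
    (the nearest pixel) and lies at `voxel_depth` along the view ray.
    """
    sdf = depth_image[pixel_index] - voxel_depth
    return max(-truncation, min(truncation, sdf))


def cone_average_depth(scene_depth, pixel_index, samples_per_pixel):
    """Predicted pixel depth under the cone-average assumption.

    `scene_depth` is a hypothetical high-resolution depth profile with
    `samples_per_pixel` sub-pixel directions inside each pixel's cone.
    """
    start = pixel_index * samples_per_pixel
    cone = scene_depth[start:start + samples_per_pixel]
    return sum(cone) / len(cone)


def cone_loss(scene_depth, depth_image, samples_per_pixel):
    """Assumed loss: squared error between captured and predicted depths."""
    return sum(
        (cone_average_depth(scene_depth, i, samples_per_pixel) - d) ** 2
        for i, d in enumerate(depth_image)
    )


# A depth edge falls inside pixel 0: half of its cone sees depth 1.0,
# the other half sees depth 2.0, so the sensor reports the average 1.5.
scene = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
captured = [1.5, 2.0]

sdf_value = projective_sdf(captured, 0, voxel_depth=1.0, truncation=1.0)
loss_value = cone_loss(scene, captured, samples_per_pixel=4)
```

In this toy case the nearest-pixel model treats the whole of pixel 0 as lying at depth 1.5, which smears the edge, while the cone-average model is exactly consistent with the sharp high-resolution profile.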

Real-Time SLAM System based on ARKit Framework 2022.3 - 2022.10

Research Project at Microsoft Research Asia

We developed a SLAM system for accurate real-time tracking of the camera trajectory when scanning indoor scenes with rich planar structures, using only an iOS device such as an iPhone or iPad. Our system obtains RGB-D data from the LiDAR camera, along with estimated camera poses computed by the ARKit framework. It then searches for coplanar and parallel planes in the scene and uses them to optimize the camera poses. Meanwhile, it uses a vocabulary tree and a two-dimensional confusion map to detect loops globally. Additionally, it exploits user interaction to improve the precision of loop detection. To avoid running out of memory during long scans, it uses an embedded database to store infrequently visited data. Experiments show that our method improves on the camera localization and loop detection of ARKit. It allows users to scan large indoor scenes while still running at a real-time frame rate to give feedback to users.

C Compiler [Project Page] 2022.4 - 2022.6

Course Project of Compiler Principle

We developed a compiler that compiles C into binary code. The project is divided into three parts: lexer and parser, code generation, and AST (Abstract Syntax Tree) visualization. The lexer and parser are based on lex and yacc. They receive the source code string and build an AST. The code generation module is based on LLVM. It receives the AST and generates binary code. We use HTML to visualize the AST.

3D Game: Interstellar [Project Page] 2021.11 - 2021.12

Course Project of Computer Graphics

We developed a 3D game based on OpenGL, where users can control a spaceship to travel through the universe, view space stations, planets, and stars, and launch missiles to destroy enemy spaceships. To make the visual effects more realistic, we applied several techniques, including specular mapping, normal mapping, light attenuation, and collision detection. We wrote our own programs for rendering and game logic, while the resource files (e.g. 3D models, textures) are from the game Stellaris.

Voxel Reconstruction of Opaque Objects [Project Page] 2021.9 - 2021.10

Course Project of Intelligent Acquisition of Visual Information

We proposed a system based on voxel carving and ray casting algorithms to reconstruct the 3D shapes of opaque objects. We first use a projector to project structured light onto the object while a camera captures photos simultaneously. These photos are then used to extract silhouettes and estimate depth images of the object. We then use the silhouettes to carve the voxel model of the object and use the depth images to refine it. Finally, a ray casting algorithm is used to color the reconstructed model.
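The carving and refinement steps can be sketched on a tiny voxel grid. This is a minimal illustration under simplifying assumptions, not the project's implementation: the three axis-aligned orthographic "cameras", the set-based silhouettes, and the depth-refinement rule are all hypothetical stand-ins for calibrated perspective views.

```python
from itertools import product

# Hypothetical orthographic views: each maps a voxel (x, y, z) to a pixel.
PROJECTIONS = {
    "front": lambda v: (v[0], v[1]),  # looking along the z axis
    "side": lambda v: (v[2], v[1]),   # looking along the x axis
    "top": lambda v: (v[0], v[2]),    # looking along the y axis
}


def carve(grid_size, silhouettes):
    """Keep voxels whose projection lies inside the silhouette in every view."""
    kept = set()
    for voxel in product(range(grid_size), repeat=3):
        if all(PROJECTIONS[name](voxel) in mask
               for name, mask in silhouettes.items()):
            kept.add(voxel)
    return kept


def refine_with_depth(voxels, front_depth):
    """Assumed refinement: drop voxels in front of the observed surface.

    `front_depth` maps a front-view pixel (x, y) to the z index of the
    first surface seen along +z; pixels without a depth value keep all voxels.
    """
    return {v for v in voxels if v[2] >= front_depth.get((v[0], v[1]), 0)}


# Silhouettes of a thin column through the middle of a 3x3x3 grid.
silhouettes = {
    "front": {(1, 1)},
    "side": {(0, 1), (1, 1), (2, 1)},
    "top": {(1, 0), (1, 1), (1, 2)},
}
carved = carve(3, silhouettes)
refined = refine_with_depth(carved, {(1, 1): 1})
```

Silhouettes alone can only recover the visual hull, so the depth step here stands in for the refinement that removes material the silhouettes cannot rule out.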

MiniSQL [Project Page] 2021.5 - 2021.6

Course Project of Database System

We developed a Database Management System called MiniSQL. It allows users to use SQL statements to 1. create and delete tables; 2. create and delete indices; 3. insert, delete, and select records in the database. The whole project is divided into 7 modules: GUI, interpreter, API, Record Manager, Index Manager, Catalog Manager, and Buffer Manager.

🔧 Skills

  • Programming Languages: C/C++, Python, Swift, Objective-C, Verilog
  • Tools: OpenGL, Vulkan, MetalKit, OpenCV, CUDA, PyTorch, NumPy, MySQL, Doxygen, CMake