Staff Software Engineer
Jul 2024 - Present · 1 yr 1 mo
Tech lead for our team, developing a state-of-the-art system that generates and simulates realistic environments (scene, camera, and lidar data) for Waymo Driver evaluation and validation, using recent techniques such as video diffusion, 3DGS, NeRF, and LLMs.
Skills: Team Leadership · Artificial Intelligence (AI) · Diffusion · Computer Vision
Senior Software Engineer
Jul 2022 - Jul 2024 · 2 yrs 1 mo
• Led our team's effort on three generative AI (AIGC) products with four engineers. We worked with top researchers to train deep learning diffusion models (similar to DALL·E 2 and Stable Diffusion) and built systems around them to scalably generate realistic synthetic data for running Waymo simulations and evaluating Waymo Driver behavior.
• Responsible for model design, implementation, and training; model integration with the production simulation system; cross-team collaboration; landing production usage; and onboard engineer engagement.
• AITrafficGen: Landed our first generative AI product, AITrafficGen, which generates realistic traffic at large scale (>100k scenarios) to discover long-tail events. The model produces 16 agents with a 12 s trajectory each, similar to diffusion-based video generation. The synthetic set created with it has already been used in production launches.
• AutoDense: Proposed and am building our second generative AI product, for automatic densification of long-tail events. It supports constrained diffusion to generate similar rare events from seed long-tail events, densifying the evaluation signal; this is similar to image-guided diffusion. Its backbone is based on Perceiver IO, with transformers as the encoder and denoiser. We are building the model as a large multi-task foundation model.
• LLM Guided SceneGen: Inspired by DALL·E 3, proposed our third generative AI product, which uses a multi-modal LLM to guide our diffusion model in generating synthetic scenarios end-to-end from a natural language description and an image, boosting synthetic data generation at Waymo. The initial prototype shows very promising results.
Skills: Deep Learning · Machine Learning · Artificial Intelligence (AI) · Large Language Models (LLM) · Distributed Systems · Generative AI · Diffusion
Waypoint - The official Waymo blog: Simulation City: Introducing Waymo's most advanced simulation system yet for autonomous driving

Google
May 2016 - Jul 2022
Senior Software Engineer
May 2021 - Jul 2022 · 1 yr 3 mos
Worked on the Google Assistant Evaluation Eng team, focused on building our next-gen Assistant eval infrastructure, Assistant Hermetic Eval, which provides high eval fidelity, better data privacy, and better scalability.
Tech-led three engineers to build two critical subsystems: output scrubbing and on-device hermetic eval.
Scoped the open-ended problem, designed the whole project, and led three engineers in implementing it.
– Drove the collaboration with the Google Assistant Infra team, the NLG (Natural Language Generation) team, and feature teams to implement the project.
– Tackled the key problem by using semantic-based annotation in the NLG request.
– Presented the project at the org's all-hands.
Skills: Deep Learning · Machine Learning · Artificial Intelligence (AI) · Distributed Systems
Software Engineer III
May 2019 - May 2021 · 2 yrs 1 mo
On the Google Assistant Evaluation Eng team, I proposed, designed, and implemented the Assistant Eval Fidelity Toolchain, a series of tools that help improve Assistant eval fidelity, including:
– Fidelity Dashboard: A dashboard showing eval fidelity for each surface feature. Built using Google SQL and PLX.
– Fidelity Search: Automatically searches for data that can increase eval fidelity by constructing a field tree, eliminating nodes, and issuing data to the Assistant stack.
– Fidelity Bug Manager: Automatically files bugs to developers for findings from Fidelity Search.
– Planned and conducted a Fidelity Fixit event with 7 feature teams and 1 surface team, fixing 38 fidelity issues.
– Authored 31,166 lines of code for the toolchain and received 3 spot bonuses and 4 peer bonuses for it.
Designed and implemented a deep-learning-based side-by-side eval noise classifier:
– Designed and implemented a noise data collection pipeline in our eval tool to collect noise, join eval data, and extract noise features.
– Designed and implemented the noise classifier training pipeline using TensorFlow and TFX (TensorFlow Extended) to train the DNN model and the random forest model for noise classification.
– Designed and implemented the noise confidence metric in the eval system, which lets users automatically filter noise out of their eval reports and also provides noise feedback for training.
– Presented the classifier at the org's all-hands.
– Authored 18,539 lines of code for the project.
Skills: Deep Learning · TensorFlow · Distributed Systems · Data Analysis
Software Engineer II
Mar 2017 - May 2019 · 2 yrs 3 mos
On the Google Assistant Infra team, I migrated the Google Assistant Conversation State API and built a tool to audit Assistant backend failures and improve stability by injecting backend errors.
Skills: Distributed Systems · C++ · Infrastructure · gRPC
Software Engineering Intern
May 2016 - Aug 2016 · 4 mos
Project: High-Performance In-Memory Aggregation Server, a core component of the next-gen multi-datacenter aggregation datastore that will store all statistics for Ads traffic
• Implemented an in-memory aggregation library and achieved a 3x performance improvement
• Designed and implemented an RPC service with gRPC, Spanner, gunit, and gmock, following Google's best practices
• Built and deployed a benchmark suite for the aggregation library to steer design decisions

Software Engineer
Baidu, Inc.
Nov 2013 - Jul 2015 · 1 yr 9 mos
Project I: Real-time advertisement report statistics system
• Processed billions of search, click, and impression logs daily
• Collaborated with 36 people from 13 teams to complete the log analysis workers
• Shortened the delay of the system's full-flow online report from 3.5 hours to 5-10 minutes
Project II: Hadoop Historical Data Management System
• Removed useless historical data based on user configuration, saving 10% of the space in the HDFS cluster
• Implemented RESTful APIs with Django and Django REST framework and released them to other teams at Baidu

Exchange Student
UC Berkeley
Aug 2012 - May 2013 · 10 mos
Project: Taint-tracking JavaScript interpreter
- Implemented an AST interpreter written in CoffeeScript that uses dynamic tracking to protect sensitive data such as user cookies
- Used the Esprima library to parse JavaScript code, dynamically traced function execution, and used jsPlumb to draw the function call graph

Student Technology Director
Student Innovation and Practice Center of TJUT
Feb 2012 - May 2012 · 4 mos
Responsible for technology development for students in the Student Innovation and Practice Center (SIPC); led students in developing websites and programs for our student activities and for several departments of Tianjin University of Technology.