
Distinguished Engineer
Google · Full-time
2006 - Present · 18 yrs 9 mos
2022 - Present: I currently wear three hats:
* Lakehouse Storage: Unified Zetta-scale storage for Analytics and ML.
* Resource efficiency (Compute, Storage) for Data Infra.
* Analytics Domain: Best-in-the-industry end-to-end analytics for Google.
2018 - 2019: Designed the next generation of Google’s exa-scale logs storage and analytics, with 10X improvements in streaming and interactive query performance. Part of cross-organization teams driving the next generation of analytics at Google.
2012 - 2017: Area Tech Lead, YouTube Data: Drove the technical vision for a 120+ strong engineering team on exa-scale analytics. Built Procella from scratch, a state-of-the-art distributed SQL query engine that today serves almost all of YouTube stats traffic for users (millions of QPS), as well as internal dashboards and complex ad-hoc analysis (trillions of rows). Architected several other key infrastructure pieces for stateful streaming pipelines, natural language query processing, data warehouse automation, data security and consistency, etc.
2006 - 2011: Tech Lead, Ads data warehouse: Drove the re-architecture of Ads Warehouse from third-party infrastructure (MySQL, Netezza, Oracle) to all-Google infrastructure. Built Tenzing from scratch, a state-of-the-art SQL-on-MapReduce query engine on the Google stack (Borg, GFS, MapReduce, Bigtable) that provides features and performance comparable to industry-leading commercial distributed databases at significantly lower cost and with greater scalability.
2006 - 2008: Co-founded and built Asset Maps, an Ads optimization technology that enables better Ads keyword and creative suggestions.

Software Engineer
Meta · Full-time
Area Tech Lead for Data Infrastructure: Compute, Storage, Metadata and Shared Foundations.
I founded and led a series of projects focused on radical transformation of the open source Analytics and ML stack, with order-of-magnitude native code acceleration, exabyte scalability, new ground in interactive performance, seamless ML+Analytics convergence, next generation storage and metadata, and more. You can read more about this in our CIDR 2022 paper: https://research.facebook.com/publications/shared-foundations-modernizing-metas-data-lakehouse/
Some highlights:
* Velox: converged eval engine accelerating analytical and ML pre-processing by an order of magnitude. VLDB 2022 Paper: https://research.facebook.com/publications/velox-metas-unified-execution-engine/
* Alpha: Brand-new ML-optimized columnar storage format.
* MetastoreX: Next generation metadata system supporting mutability, real time ingestion, materialized views, rich metadata, and more.
* CoreSQL: Unified SQL dialect across batch, interactive, streaming, ML pre-processing and more.
* RaptorX: Disaggregated storage with smart hierarchical caching for order-of-magnitude query acceleration.
* Next generation Presto incorporating many of the components above. SIGMOD 2023 paper: https://research.facebook.com/publications/presto-a-decade-of-sql-analytics-at-meta/

Data Warehouse Architect
British Telecom
Architect, designer, data modeler, DBA, developer and maintainer of three generations of data warehouses and business intelligence systems using numerous “old stack” data technologies (Oracle (various products), Ab Initio, Business Objects, Unix, etc.).

Software Engineer
Tata Consultancy Services
Misc projects involving ERP systems, databases (mainly Oracle), C++, Unix / Linux, etc.