Principal Software Engineer, Oath, June 2017 to present
Principal Software Engineer, Yahoo, June 2012 to June 2017
Architect on the stream content analytics platform. Redesigned the previous analytics platform, reducing pipeline count by 10x while increasing customers' ability to slice and dice the data. Led a team of 6 engineers in developing the project. Used Druid, Oozie, and Pig.
Developed a new sketch-based method for performing Student's t-test on A/B experiments. This solution saves significant computation time and resources for all experiment evaluations. Developed a Pig UDF and deployed it to several production systems. These enhancements were pushed to DataSketches.
Lead engineer on the Instamart project, which provides a UI for non-engineers to design, compile, and launch ETL pipelines that load data into Druid. Product is in use by 50+ teams. Technologies used include Pig, Hive, Oozie, Druid, Python, Java, and DataSketches. Presented project at Hadoop Summit 2016.
Led a team of 4 engineers in adding features to Apache Superset (incubating) and Apache Calcite. Notable features included Druid support for HLL and theta sketches, query pushdown for Druid, Superset and Calcite integration, and UI enhancements. Organized cross-company meetings with Airbnb and Hortonworks to synchronize on planning.
Lead engineer for integrating Flurry with Yahoo's data systems after acquisition. Consumed from Flurry's HBase cluster, performed ETL, and integrated results with our existing data systems.
Developed new company-wide audience data ETL pipeline. The many technical challenges included scaling from 40 billion events per day to 200 billion events per day, performing deep data partitioning (which was presented at XLDB 2015), and optimizing for stability and performance guarantees. The project used an extensive tech stack including Java MapReduce, Oozie, Pig, Hive, HBase, Python, Spark, and HCatalog.
UC San Diego
Bachelor of Science in Computer Science
UC San Diego Extension
Certificate, Data Mining
Sessionizing Data Pipelines with Minimal Look-backs and Delayed Look-forwards
Filed May 2015
Method for Ranking Social and Search Web Traffic with Virality Scores
Filed December 2014
Publications & Presentations
Demystifying Dark Matter for Online Experimentation
IEEE BigData, November 7th, 2017
Faster, Faster, Faster!: The True Story of a Mobile Analytics Data Mart on Hive
Hadoop Summit, June 28th, 2016
Method and System for Performing Funnel Analysis in MapReduce Systems
IP.com, March 15th, 2016
Deep Data Partitioning in MapReduce
XLDB, May 19th, 2015
Method and System for Behavioral Testing of Big Data Pipelines using Statistical Distributions
IP.com, February 13th, 2015
Method for Partitioning MapReduce Tasks
IP.com, June 17th, 2014