Staff Data Scientist
2024 – Present
Polly.io · New York, NY · Remote
- Led modernization of Polly's analytics platform, migrating 16 production pipelines from legacy batch ETL to Delta Live Tables streaming with Change Data Capture (CDC). Cut end-to-end latency from 90 to 22 minutes, improved data freshness from 2-hour batch windows to 15-minute incremental CDC, and eliminated hundreds of millions of duplicate records across 10B+ daily rows.
- Designed an ML platform on Databricks for loan volume forecasting, including a feature store, experiment tracking, and champion/challenger workflows for production model governance.
- Built an automated knowledge graph spanning 29 production pipelines, enabling instant blast-radius analysis before code changes, dependency lookups by AI agents, and automatic publishing of 1,240+ field definitions to Confluence through CI/CD-integrated generation.
- Architected a Delta Sharing platform with dynamic view generation and row-level security, onboarding 3 external financial clients (mortgage servicers and hedge funds) through self-service configuration with zero manual DDL.
- Designed agentic workflows using Claude Code and the Anthropic API for CI/CD documentation validation, automated lineage tracking, and developer workflow acceleration.