Hamza Gabajiwala
Software Development Engineer at Yahoo
Building large-scale data pipelines and audience targeting systems at Yahoo. Working with Spark, Airflow, and Flink on AWS to process audience segments for programmatic advertising at scale. Recently integrating GenAI/LLM capabilities into search retargeting pipelines.
01 Skills
Data Engineering
Apache Spark · PySpark · Apache Airflow · Apache Flink · OpenSearch · Amazon MSK (Kafka) · Protobuf · Avro/ORC
GenAI / AI Tooling
Claude (Anthropic) · Claude Code · Amazon Bedrock · LLM API integration · Prompt engineering · Agentic development
Cloud & DevOps
AWS EMR · EMR Serverless · AWS S3 · AWS Glue · AWS Lambda · EC2 · AWS Bedrock · Chronosphere · OpenTelemetry · Docker · Kubernetes · Jenkins · CI/CD
Languages
Python · Scala · Java · C++ · JavaScript · TypeScript · Bash
Databases & Web
MySQL · PostgreSQL · MongoDB · Redis · FastAPI · SQLModel · React
02 Experience
February 2024 — Present
Yahoo
Software Development Engineer I · Dublin, Ireland
- Built Accelerated Audience Activation end-to-end — Spark + Avro + bucketed user partitioning + backward-compatible named-parameter scoring app — that cut new-segment activation latency from 0–0 hours to ~4 hours.
- Built the GenAI keyword-expansion DAG (Airflow + EMR Serverless + Amazon Bedrock, currently Claude Sonnet 4.5) — scaled LLM concurrency from 0 to 0 threads to saturate the 0-RPM model quota and shipped inference-profile reuse that stopped Bedrock's 0-profile per-region cap from killing hourly runs.
- Shipped 3-level real-time re-engagement targeting in Flink (Line → Package → Campaign) — extended the segment cache with TLongObjectMap package/order indexes, added a DSP line mapping cache loaded from S3, and feature-flagged the rollout so it activates only when the new rule types are exposed.
- Designed a segment-reprocessing system covering Yahoo DSP's 0K-segment audience catalog — daily health-check DAG with Slack alerting, a forward-compatible write_target toggle for the upcoming OpenSearch → S3 cutover, and remediation playbooks that resolved live customer incidents (traced 0 segments, restored 0 for a major travel advertiser — ~0M users brought back).
- Migrated 4 production scoring systems — upgraded data access layers, moved from EMR v6 → v7, replaced Glue catalog reads with direct S3, and migrated monitoring to Chronosphere via OpenTelemetry — cutting batch-scoring cost 0% and unlocking sub-minute alerting.
- Led the team's adoption of agentic development tooling — consolidated 4 product repos under a shared submodule layout, published a Claude Code plugin marketplace with 7+ shared skills, and ran weekly knowledge-sharing for the Dublin team.
June 2021 — June 2022
TIAA GBS
Software Developer · Mumbai, India
- Migrated 0+ test cases from Selenium to WebDriver in 0 days — manual regression to automated nightly runs.
- Cut data-collection downtime by 0% with REST API ingestion, enabling same-day reporting.
03 Projects
04 Publications
05 Education
2022 — 2023
Trinity College Dublin
M.Sc. Computer Science — AR/VR · 1:1
2018 — 2022
NMIMS University
B.Tech Computer Engineering · GPA: 3.45/4