Quantifiable results from data engineering initiatives across different organizations
Reduced manual data processing effort on 500K+ daily records
Production ETL pipelines serving a 150M+ user platform
Consistent data pipeline reliability and reporting adherence
Increase from unified data model architecture
Cross-functional Agile data engineering team
End-to-end change data capture from S3 to warehouse
Achieved through data-driven e-commerce operations
Via Salesforce automation and Python-driven workflows
I'm a data engineer with 6+ years of experience building production pipelines and analytics infrastructure. My journey began in e-commerce operations, where I discovered the transformative power of data-driven decision-making.
Currently pursuing an MS in Business Analytics (AI and Data Analytics) with a perfect 4.0 GPA at UMass Boston, I specialize in building enterprise-scale data platforms that process hundreds of thousands of records daily for platforms serving 150M+ users.
My expertise spans the full data stack, from real-time streaming with Kafka and Spark to lakehouse architectures on Databricks and GenAI infrastructure, including RAG pipelines with LangChain and vector databases.
Comprehensive skill set spanning the modern data engineering stack
Real-world data engineering solutions with measurable business impact
Lambda-style batch + streaming architecture with RAG-powered market intelligence at sub-second latency
Data Flow
Key Features
Impact
Airflow-orchestrated RAG system with hybrid semantic search and CDC-driven embedding auto-refresh
Pipeline Flow
Key Features
Technical Highlights
Medallion architecture on Azure Databricks with ADF event-triggered ingestion
Bronze → Silver → Gold · 15+ PySpark quality rules · ACID
Serverless AWS pipeline with Snowpipe for near-real-time loading
Star schema DWH on BigQuery with Looker dashboards
5-min end-to-end CDC from AWS S3 to Snowflake with SCD Type 1/2
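For readers unfamiliar with SCD Type 2, the idea can be sketched in a few lines of plain Python. This is an illustration only, not the pipeline's actual code: the `DimRow` class, the `apply_scd2` function, and the sample field names are all hypothetical. When a tracked attribute changes, the current row is closed with an end date and a new current row is opened, so history is preserved. Type 1, by contrast, would simply overwrite the attributes in place.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical shape)."""
    key: str
    attrs: dict
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[DimRow], changes: Dict[str, dict], as_of: date) -> List[DimRow]:
    """Apply a batch of changed records (key -> attrs) as SCD Type 2."""
    result = []
    current_keys = set()
    for row in history:
        incoming = changes.get(row.key)
        is_current = row.valid_to is None
        if is_current:
            current_keys.add(row.key)
        if is_current and incoming is not None and incoming != row.attrs:
            # Close the old version, then open a new current version.
            result.append(replace(row, valid_to=as_of))
            result.append(DimRow(row.key, incoming, as_of))
        else:
            # Unchanged or already-closed rows pass through untouched.
            result.append(row)
    # Keys with no current version are inserted as brand-new rows.
    for key, attrs in changes.items():
        if key not in current_keys:
            result.append(DimRow(key, attrs, as_of))
    return result


history = [DimRow("c1", {"city": "Pune"}, date(2024, 1, 1))]
changes = {"c1": {"city": "Boston"}, "c2": {"city": "Delhi"}}
updated = apply_scd2(history, changes, date(2024, 6, 1))
```

In a warehouse the same logic is usually expressed as a merge statement over a staging table; the list-based version here only shows the row-versioning rule.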
Progressive career growth from operations to data engineering leadership
Data engineering & analytics leadership for India's largest ed-tech platform (150M+ users)
Reusable schema validation, null checks, and SLA monitoring, standardized across analytics verticals
Built and optimized DAGs across ingestion, transformation, and reporting with automated alerting and retry logic
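As a rough illustration of what reusable checks of this kind can look like, here is a minimal sketch in plain Python. It is not the production code: the function names (`check_schema`, `check_nulls`, `check_sla`), the sample record, and the 30-minute threshold are assumptions made up for the example. Each check returns a list of problems, so the results can be combined and passed to whatever alerting the pipeline uses.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def check_schema(record: dict, schema: Dict[str, type]) -> List[str]:
    """Report fields that are missing or have an unexpected type."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif record[field] is not None and not isinstance(record[field], expected):
            problems.append(f"bad type for {field}")
    return problems


def check_nulls(record: dict, required: List[str]) -> List[str]:
    """Report required fields whose value is None (or absent)."""
    return [f"null value: {f}" for f in required if record.get(f) is None]


def check_sla(loaded_at: datetime, now: datetime, max_delay: timedelta) -> List[str]:
    """Report an SLA breach when data arrived later than allowed."""
    return [] if now - loaded_at <= max_delay else ["SLA breached"]


# Hypothetical record and thresholds, for illustration only.
schema = {"user_id": int, "event": str}
record = {"user_id": "42", "event": None}
now = datetime(2024, 1, 1, 12, 0)

problems = (
    check_schema(record, schema)
    + check_nulls(record, ["user_id", "event"])
    + check_sla(datetime(2024, 1, 1, 11, 0), now, timedelta(minutes=30))
)
```

Keeping each check as a small pure function makes it easy to reuse the same rules across different tables and to unit-test them outside the orchestrator.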
Analytics leadership for customer operations team of 20+ members
Eliminated manual overhead by extracting from Salesforce and internal databases
Real-time tracking of team productivity and business-critical metrics
E-commerce operations across Amazon, eBay, and Shopify platforms
Improved operational visibility and fulfillment workflows across multi-platform e-commerce
Continuous learning through formal education and industry certifications
University of Massachusetts Boston
Advanced Machine Learning, Predictive Analytics, Big Data Processing, Statistical Modeling
Visvesvaraya Technological University, Bangalore
Senior project on manufacturing process optimization
DeepLearning.AI • 2024
DeepLearning.AI • 2024
Datavidhya • 2024
Datavidhya • 2024
Datavidhya • 2024
Ready to discuss your data engineering needs? Let's connect and explore how we can transform your data infrastructure.
Boston, Massachusetts
Available for remote/hybrid roles
atulpandey02@gmail.com
Quick response within 24h
LinkedIn Profile
Professional network