CDP Implementation
As part of a major enterprise data platform upgrade, I led the implementation of Cloudera Data Platform (CDP) 7.1.9 with a focus on performance, scalability, and governance. My responsibilities included architecting external Hive tables in the Parquet file format to enable faster data access and a smaller storage footprint. I fine-tuned complex Hive queries, significantly reducing execution time and compute costs in production environments.
One of the core challenges was managing real-world issues such as column shifts in Hive-Parquet mappings, handling multiple active SCD Type 2 records, and identifying expired keys across lookup layers. I designed and implemented a data validation framework that used natural keys where available, and hashing logic where keys were absent, to ensure cell-to-cell data accuracy between the RDBMS and CDP layers. This framework became a reusable asset across multiple pipelines.
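As an illustrative sketch of the hashing half of that framework (all names here are hypothetical, and a production pipeline would typically compute hashes at the SQL or Spark layer rather than row-by-row in Python), rows lacking a natural key can be compared by hashing a normalized concatenation of their column values on each side:

```python
import hashlib

def row_hash(row, columns):
    """Hash a normalized concatenation of column values so the same
    logical row yields the same digest on both source and target."""
    normalized = "|".join(
        "" if row.get(c) is None else str(row[c]).strip() for c in columns
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def compare_datasets(source_rows, target_rows, columns):
    """Return the hash sets present on only one side (candidate mismatches)."""
    src = {row_hash(r, columns) for r in source_rows}
    tgt = {row_hash(r, columns) for r in target_rows}
    return src - tgt, tgt - src
```

In practice, the normalization step (null handling, whitespace, numeric and date formatting) is where most of the effort goes, since the RDBMS and CDP layers rarely render values identically.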
I also standardized the implementation of SCD Type 1 and Type 2 logic across both dimension and lookup tables, improving data reliability and reducing developer overhead in future rollouts.
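A minimal sketch of the Type 2 pattern, using hypothetical column names (`is_current`, `effective_from`, `effective_to`) and assuming at most one active record per key, looks like this:

```python
from datetime import date

def apply_scd2(dimension, incoming, key, tracked_cols, today=None):
    """Expire the current record for a key and append a new active
    version when any tracked attribute changed.
    `dimension` is a list of dicts; illustrative, not production code."""
    today = today or date.today().isoformat()
    current = next(
        (r for r in dimension if r[key] == incoming[key] and r["is_current"]),
        None,
    )
    if current and all(current[c] == incoming[c] for c in tracked_cols):
        return dimension  # no tracked change; nothing to do
    if current:
        current["is_current"] = False
        current["effective_to"] = today
    dimension.append({**incoming, "is_current": True,
                      "effective_from": today, "effective_to": None})
    return dimension
```

Standardizing this shape across dimension and lookup tables is what makes downstream "multiple active record" and "expired key" checks tractable.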
GCP Migration & Cloud Architecture
In another critical initiative, I spearheaded the migration of legacy systems from Teradata and Hadoop to Google Cloud Platform (GCP), leveraging BigQuery as the analytical engine. My role involved end-to-end planning and execution—from source-to-target mapping (STM), data model redesign, and orchestration planning to go-live support.
I used Airflow (Composer) and Control-M for orchestrating complex data workflows and built scalable ingestion pipelines using Dataflow, which supported high-throughput, near-real-time data processing. Close collaboration with US-based stakeholders was key to finalizing architecture blueprints, defining KPIs, and executing rigorous post-migration tuning.
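Independent of the specific scheduler, what an Airflow DAG or Control-M job chain encodes is a dependency graph over stages. A toy sketch with hypothetical stage names (using Python's standard-library `graphlib`, not any scheduler API):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline stages mapped to their upstream dependencies,
# mirroring the ordering an orchestrator would enforce.
pipeline = {
    "extract": set(),
    "load_staging": {"extract"},
    "transform": {"load_staging"},
    "validate": {"transform"},
    "publish": {"validate"},
}

# One valid execution order that respects every dependency.
order = list(TopologicalSorter(pipeline).static_order())
```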
The end result was a modern, cloud-native platform that was scalable, cost-effective, and auditable. This solution now supports advanced analytics, reporting, and AI/ML workloads for the client with significantly improved turnaround times.
Data Quality & Validation Automation
Across both CDP and GCP engagements, a recurring theme was the need for robust data validation. I conceptualized and built a flexible validation framework that performs column-level checks, row count validations, hash-based comparison, and natural key-based matching. This framework helped detect mismatches arising from schema changes, ingestion issues, or logic errors—long before they impacted downstream systems.
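The column-level and row-count checks can be sketched roughly as follows; this is a simplified in-memory illustration with hypothetical names, whereas the actual framework ran against Hive and BigQuery:

```python
def column_profile(rows, column):
    """Basic per-column metrics used to compare source vs. target."""
    values = [r.get(column) for r in rows]
    return {
        "count": len(values),
        "nulls": sum(v is None for v in values),
        "distinct": len({v for v in values if v is not None}),
    }

def validate(source_rows, target_rows, columns):
    """Flag row-count drift and per-column metric mismatches."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append(f"row count: {len(source_rows)} vs {len(target_rows)}")
    for c in columns:
        s, t = column_profile(source_rows, c), column_profile(target_rows, c)
        if s != t:
            issues.append(f"column '{c}': {s} vs {t}")
    return issues
```

An empty result means the datasets agree on the checked metrics; any entry pinpoints which column or count drifted, which is what lets schema and ingestion problems surface before downstream consumers see them.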
The automation not only accelerated QA cycles but also became a crucial part of the CI/CD workflow, ensuring trusted data delivery at scale.
Skills Demonstrated
Data Architecture & Modeling (SCD Type 1 & 2, Fact/Dim design)
Hive, Parquet, Impala, BigQuery
Python for automation and validation frameworks
Workflow orchestration with Airflow (Composer) and Control-M; pipelines with Dataflow
Migration strategy from on-prem to cloud (Teradata, Hadoop → GCP)
Stakeholder collaboration, Agile delivery, production support
Overview
With extensive experience in data architecture, platform migration, and validation frameworks, I specialize in designing and delivering scalable data platforms across hybrid and cloud environments. My core strength lies in implementing efficient data structures, optimizing performance, and ensuring end-to-end data quality through automation and governance. Below are some of the key projects that reflect my expertise and contributions.
Client Feedback
"His expertise in data engineering is truly impressive and invaluable."
⭐ "Reliable, solution-driven, and always ahead of the curve"
“We faced significant data quality issues during our CDP upgrade, and Sachin stepped in with a powerful validation framework that caught errors early and saved us weeks of manual effort. His domain knowledge, especially in Hive and SCD strategies, was instrumental in delivering a stable and high-performing platform.”
Data Engineering Lead, BFSI Client
⭐ "Expertise that drove our cloud transformation"
“Sachin played a pivotal role in our successful migration from Teradata and Hadoop to GCP. His ability to design scalable pipelines, optimize BigQuery performance, and ensure seamless production cutovers was outstanding. He was always proactive, technically sound, and aligned perfectly with our business priorities.”
Senior Data Manager, US-based Retail Client
★★★★★
Data Consulting Services
Expert solutions in data engineering, cloud migration, and performance optimization for your business needs.
Cloud Data Migration
Seamless migration across Azure, GCP, and AWS, ensuring data integrity and optimized performance.
Performance Optimization
Enhancing data processing efficiency and speed through tailored optimization strategies for large-scale architectures.
Data Platform Modernization
Modernize your legacy systems by migrating to Cloudera Data Platform (CDP), Google Cloud (GCP), or other cloud-native platforms. I help design scalable, secure, and cost-efficient architectures that support analytics, reporting, and advanced data science use cases.
Data Quality & Validation Frameworks
Poor data quality impacts decision-making and trust. I build automated validation tools that compare source and target datasets across systems using natural keys, hash functions, and row-level comparisons—ensuring complete data integrity.
Expertise
Data consulting, optimization, and cloud migration services.
sachinbijwar@gmail.com
+91-8087004537
© 2025. All rights reserved.