
CDP Implementation

As part of a major enterprise data platform upgrade, I led the implementation of Cloudera Data Platform (CDP) 7.1.9 with a focus on performance, scalability, and governance. My responsibilities included architecting external Hive tables on the Parquet file format to speed up data access and reduce the storage footprint. I also tuned complex Hive queries, significantly reducing execution time and compute costs in production environments.

One of the core challenges was managing real-world issues such as column shifts in Hive-Parquet mappings, handling multiple active SCD Type 2 records, and identifying expired keys across lookup layers. I designed and implemented a data validation framework that used natural keys where available, and hashing logic where keys were absent, to ensure cell-to-cell data accuracy between the RDBMS and CDP layers. This framework became a reusable asset across multiple pipelines.
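The hashing approach for tables without a natural key can be sketched roughly as follows. This is a minimal illustration, not the production framework: the function names, the sample rows, the SHA-256 choice, and the whitespace normalisation are all assumptions made for the example. Each row is hashed in a fixed column order and the two sides are compared as multisets, so a column shift shows up as a mismatch even when row counts agree.

```python
import hashlib
from collections import Counter


def row_fingerprint(row, columns):
    """Hash one row's values in a fixed column order (hypothetical helper)."""
    parts = []
    for col in columns:
        value = row.get(col)
        # Keep NULL distinct from the empty string; normalise the rest to trimmed text.
        parts.append("\x00NULL" if value is None else str(value).strip())
    # A separator prevents ("ab", "c") and ("a", "bc") from hashing alike.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_keyless(source_rows, target_rows, columns):
    """Compare two row sets as multisets of fingerprints."""
    src = Counter(row_fingerprint(r, columns) for r in source_rows)
    tgt = Counter(row_fingerprint(r, columns) for r in target_rows)
    return {
        "missing_in_target": sum((src - tgt).values()),
        "unexpected_in_target": sum((tgt - src).values()),
    }


# Illustrative data: the first target row has "name" and "city" swapped,
# which is what a column shift in a Hive-Parquet mapping would look like.
columns = ["id", "name", "city"]
source = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ravi", "city": None},
]
target = [
    {"id": 1, "name": "Pune", "city": "Asha"},
    {"id": 2, "name": "Ravi", "city": None},
]
result = compare_keyless(source, target, columns)
print(result)
```

Comparing fingerprints as a multiset rather than a set means duplicated rows are counted as well, which matters when no key exists to deduplicate on.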

I also standardized the implementation of SCD Type 1 and Type 2 logic across both dimension and lookup tables, improving data reliability and reducing developer overhead in future rollouts.
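To make the Type 2 pattern concrete, here is a small in-memory sketch under stated assumptions: the `DimRow` structure, the `valid_from` / `valid_to` / `is_current` column names, and both function names are hypothetical and chosen only for illustration. A changed record expires the open version and appends a new one, where Type 1 would simply overwrite the attributes. A second helper flags natural keys that have more than one active version, the issue mentioned earlier.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    natural_key: str
    attributes: tuple                  # tracked attribute values
    valid_from: date
    valid_to: Optional[date] = None    # None means the version is still open
    is_current: bool = True


def apply_scd2(dimension, natural_key, attributes, load_date):
    """Return a new list with one incoming record applied as SCD Type 2."""
    result = []
    needs_new_version = True
    for row in dimension:
        if row.natural_key == natural_key and row.is_current:
            if row.attributes == attributes:
                needs_new_version = False      # unchanged: keep the open version
                result.append(row)
            else:
                # Expire the old version instead of overwriting it.
                result.append(replace(row, valid_to=load_date, is_current=False))
        else:
            result.append(row)
    if needs_new_version:
        result.append(DimRow(natural_key, attributes, valid_from=load_date))
    return result


def keys_with_multiple_active(dimension):
    """List natural keys that have more than one current version."""
    counts = {}
    for row in dimension:
        if row.is_current:
            counts[row.natural_key] = counts.get(row.natural_key, 0) + 1
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative load sequence for one customer key.
dim = apply_scd2([], "C001", ("Pune",), date(2024, 1, 1))
dim = apply_scd2(dim, "C001", ("Mumbai",), date(2024, 6, 1))
for row in dim:
    print(row)
```

In a Hive or BigQuery pipeline the same steps would normally be expressed as a merge or as an insert plus update; the Python version only shows the row-level rules.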

GCP Migration & Cloud Architecture

In another critical initiative, I spearheaded the migration of legacy systems from Teradata and Hadoop to Google Cloud Platform (GCP), leveraging BigQuery as the analytical engine. My role involved end-to-end planning and execution—from source-to-target mapping (STM), data model redesign, and orchestration planning to go-live support.

I used Airflow (Composer) and Control-M for orchestrating complex data workflows and built scalable ingestion pipelines using Dataflow, which supported high-throughput, near-real-time data processing. Close collaboration with US-based stakeholders was key to finalizing architecture blueprints, defining KPIs, and executing rigorous post-migration tuning.

The end result was a modern, cloud-native platform that was scalable, cost-effective, and auditable. This solution now supports advanced analytics, reporting, and AI/ML workloads for the client with significantly improved turnaround times.

Data Quality & Validation Automation

Across both CDP and GCP engagements, a recurring theme was the need for robust data validation. I designed and built a flexible validation framework that performs column-level checks, row count validations, hash-based comparison, and natural-key matching. This framework helped detect mismatches arising from schema changes, ingestion issues, or logic errors long before they impacted downstream systems.
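As a rough illustration of how several of these checks can be combined into one report, the sketch below runs row counts, natural-key matching, and column-level comparison over two small in-memory datasets. The `validate` function, its parameters, and the sample rows are hypothetical; the sketch also assumes the key columns are unique on each side, so duplicate keys would need a separate check.

```python
def validate(source_rows, target_rows, key_columns, compare_columns):
    """Build a simple source-vs-target validation report (illustrative only)."""

    def index(rows):
        # Assumes the natural key is unique; later duplicates would overwrite earlier rows.
        return {tuple(r[k] for k in key_columns): r for r in rows}

    src, tgt = index(source_rows), index(target_rows)

    # Column-level check on keys present on both sides.
    mismatches = []
    for key in sorted(src.keys() & tgt.keys()):
        for col in compare_columns:
            if src[key].get(col) != tgt[key].get(col):
                mismatches.append((key, col, src[key].get(col), tgt[key].get(col)))

    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "column_mismatches": mismatches,
    }


# Illustrative data: counts match, yet one key is missing, one is extra,
# and one amount differs.
source = [
    {"id": 1, "amount": 100, "status": "A"},
    {"id": 2, "amount": 250, "status": "A"},
    {"id": 3, "amount": 75, "status": "C"},
]
target = [
    {"id": 1, "amount": 100, "status": "A"},
    {"id": 2, "amount": 205, "status": "A"},
    {"id": 4, "amount": 10, "status": "A"},
]
report = validate(source, target, ["id"], ["amount", "status"])
print(report)
```

The sample is deliberately built so that the row counts agree while the data does not, which is why a count check alone is not enough.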

The automation not only accelerated QA cycles but also became a crucial part of the CI/CD workflow, ensuring trusted data delivery at scale.

Skills Demonstrated

  • Data Architecture & Modeling (SCD Type 1 & 2, Fact/Dim design)

  • Hive, Parquet, Impala, BigQuery

  • Python for automation and validation frameworks

  • Airflow and Control-M orchestration; Dataflow ingestion pipelines

  • Migration strategy from on-prem to cloud (Teradata, Hadoop → GCP)

  • Stakeholder collaboration, Agile delivery, production support

Overview

With extensive experience in data architecture, platform migration, and validation frameworks, I specialize in designing and delivering scalable data platforms across hybrid and cloud environments. My core strength lies in implementing efficient data structures, optimizing performance, and ensuring end-to-end data quality through automation and governance. Below are some of the key projects that reflect my expertise and contributions.


Client Feedback

Expertise in data engineering is truly impressive and invaluable.

⭐ "Reliable, solution-driven, and always ahead of the curve"

“We faced significant data quality issues during our CDP upgrade, and Sachin stepped in with a powerful validation framework that caught errors early and saved us weeks of manual effort. His domain knowledge, especially in Hive and SCD strategies, was instrumental in delivering a stable and high-performing platform.”

Data Engineering Lead, BFSI Client

⭐ "Expertise that drove our cloud transformation"

“Sachin played a pivotal role in our successful migration from Teradata and Hadoop to GCP. His ability to design scalable pipelines, optimize BigQuery performance, and ensure seamless production cutovers was outstanding. He was always proactive, technically sound, and aligned perfectly with our business priorities.”

Senior Data Manager, US-based Retail Client
★★★★★

Data Consulting Services

Expert solutions in data engineering, cloud migration, and performance optimization for your business needs.

Cloud Data Migration

Seamless migration across Azure, GCP, and AWS, ensuring data integrity and optimized performance.

Performance Optimization

Enhancing data processing efficiency and speed through tailored optimization strategies for large-scale architectures.

Data Platform Modernization

Modernize your legacy systems by migrating to Cloudera Data Platform (CDP), Google Cloud (GCP), or other cloud-native platforms. I help design scalable, secure, and cost-efficient architectures that support analytics, reporting, and advanced data science use cases.

Data Quality & Validation Frameworks

Poor data quality impacts decision-making and trust. I build automated validation tools that compare source and target datasets across systems using natural keys, hash functions, and row-level comparisons—ensuring complete data integrity.
