dmm-vs-lookup-etl

ETL Pipelines Optimization Research Project

📊 Evaluating Dynamic Mapping Matrix (DMM) vs Traditional ETL Mapping for Optimized Data Processing

This repository contains the codebase, documentation, and benchmark results for my senior research project, which explores the performance differences between traditional ETL (Extract, Transform, Load) mapping methods and a matrix-based approach called the Dynamic Mapping Matrix (DMM).

🧠 Project Overview

As modern data systems grow in scale and complexity, optimizing ETL pipelines becomes critical to ensure fast and memory-efficient processing. Traditional ETL methods typically rely on lookup tables and rule-based transformations, which can become computationally expensive at scale.
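To make the contrast concrete, here is a minimal sketch of the two styles on a toy record. It is an illustration under assumptions, not the project's implementation: the field names, the lookup table contents, and the 0/1 matrix layout are all hypothetical, and the actual DMM design may differ.

```python
# Toy illustration only: field names, values, and the matrix layout are
# hypothetical and not taken from the project's notebooks.

SOURCE_FIELDS = ["vendor_id", "trip_km", "fare_usd"]
TARGET_FIELDS = ["fare", "distance", "vendor"]

# Lookup style: a table consulted one key at a time for every record.
FIELD_LOOKUP = {"vendor_id": "vendor", "trip_km": "distance", "fare_usd": "fare"}


def map_with_lookup(record):
    """Rename each source field by looking it up in FIELD_LOOKUP."""
    return {FIELD_LOOKUP[name]: value for name, value in record.items()}


# Matrix style: MAPPING[i][j] == 1 means target field i is fed by source field j.
MAPPING = [
    [0, 0, 1],  # fare     <- fare_usd
    [0, 1, 0],  # distance <- trip_km
    [1, 0, 0],  # vendor   <- vendor_id
]


def map_with_matrix(row):
    """Apply the mapping as a matrix-vector product over a numeric row."""
    return [sum(m * x for m, x in zip(mapping_row, row)) for mapping_row in MAPPING]


record = {"vendor_id": 2, "trip_km": 3.5, "fare_usd": 14.0}
by_lookup = map_with_lookup(record)
by_matrix = dict(zip(TARGET_FIELDS, map_with_matrix([record[f] for f in SOURCE_FIELDS])))
print(by_lookup == by_matrix)
```

In a real pipeline the matrix form would more likely be applied to whole batches with a vectorized library than row by row. Whether that beats per-key lookups at scale is the kind of question the benchmarks in this repository are meant to answer.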

This project empirically compares:

All three approaches are evaluated using real-world data from the NYC Taxi & Limousine Commission.

📂 Project Structure

📈 Overview

We implemented and profiled the following transformation pipelines:

🧪 Benchmark Setup

📊 Results Highlights

📈 Performance Metrics

Benchmarks were conducted on all three transformation strategies using:

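Early results were inconsistent because of object caching: objects left over from one run could still be in memory during the next. This was resolved by deleting those objects with del and calling gc.collect() between runs, so that each strategy was measured from a comparable starting state.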
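The cleanup step described above can be sketched as a small timing harness. This is a minimal sketch under assumptions, not the project's benchmark code: run_benchmark, transform, and make_data are hypothetical names, and measuring peak memory with tracemalloc is an assumption about how memory might be tracked. Only gc.collect(), time.perf_counter(), and the tracemalloc functions are real standard-library calls.

```python
import gc
import time
import tracemalloc


def run_benchmark(transform, make_data, repeats=3):
    """Time `transform` on fresh data, cleaning up between runs.

    `transform` and `make_data` are hypothetical stand-ins for a pipeline
    and its input loader; they are not names from the project.
    """
    timings, peaks = [], []
    for _ in range(repeats):
        data = make_data()
        gc.collect()  # start each run from a freshly collected heap
        tracemalloc.start()
        start = time.perf_counter()
        result = transform(data)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
        tracemalloc.stop()
        timings.append(elapsed)
        peaks.append(peak)
        # Drop references so objects from this run cannot linger into the next.
        del data, result
        gc.collect()
    return timings, peaks


timings, peaks = run_benchmark(
    transform=lambda rows: [r * 2 for r in rows],
    make_data=lambda: list(range(100_000)),
)
print(len(timings), len(peaks))
```

Building the input inside the loop and deleting both the input and the result afterwards keeps each repetition independent, at the cost of paying the data-construction time outside the timed region on every run.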

✅ Final Results:

🔧 Tools & Technologies

📚 Research Goals

🙋 About Me

I’m a recent graduate of Eastern Connecticut State University with a double major in Computer Science and Business Information Systems and a minor in Data and Information Engineering. My academic and project background centers on data engineering, and this research supports my goal of contributing to real-world ETL optimization practices.

📬 Feedback and suggestions are welcome.
📁 See results/ for figures and paper.
📎 Code and transformations live in notebooks/.