Cemal Cici

Big Data Engineer

I am a Big Data Engineer with 3+ years of experience designing and building modern data platforms and warehouse solutions. Skilled in Python, PySpark, Hive, and Iceberg/Delta Lake, I develop scalable ETL/ELT workflows with Airflow, dbt, and SQLMesh, and I use PostgreSQL, Trino, and Power BI for analytics and reporting. My focus is on building reliable data pipelines, automating deployments, and integrating ML and big data workloads into modern lakehouse architectures.

Experience

Senior Data Engineer

Bentego

Ankara, Turkey (Remote)

Apr 2026 – Present

Big Data Engineer

Treomind

Ankara, Turkey (Hybrid)

Apr 2023 – Mar 2026
  • Designed and implemented business-unit-oriented enterprise BI/DWH data architectures using Python, PySpark, Hive and Iceberg/Delta Lake
  • Created relational and dimensional data models and data marts for reporting and analytics on PostgreSQL, Oracle SQL, MSSQL and Trino/Nessie
  • Orchestrated and versioned ETL/ELT workflows in high-volume environments using Airflow, dbt and SQLMesh
  • Published operational dashboards and self-service data layers on Power BI and Apache Superset
  • Set up fast read layers on DuckDB and MinIO (S3-compatible) to improve ad-hoc and analyst-focused query performance
  • Defined and applied partitioning, indexing, and file-format (Parquet/Iceberg/Delta) strategies for existing database schemas
  • Developed and centrally managed a metadata-driven, modular PySpark ETL/ELT framework on HPE Ezmeral Data Fabric
  • Enabled versioned tables, Dev–Prod branch management and automatic maintenance processes with Apache Iceberg + Nessie
  • Built maintainable, enterprise-scale data flows with incremental loads, idempotent execution, dynamic partitioning, and vault-based secrets management
  • Synchronized PostgreSQL DWH/STG schemas with the Iceberg layer and generated DIM/FACT structures with standard templates
  • Automated packaging and secure delivery of Spark/ETL jobs using GitHub Actions, Argo CD and container-based deployments
  • Managed container-based deployment, scaling and service discovery of Trino, Nessie and similar data services on Kubernetes
  • Configured Apache Spark clusters on HPE Ezmeral Data Fabric, applied resource/role policies and executed performance tests for big data workloads
  • Implemented data governance, user/role-based access control and service-level configuration policies on the Ezmeral platform
  • Designed and automatically deployed production-focused ETL/ELT data pipelines for models developed with Azure ML Studio and the Python SDK
  • Integrated MLflow, Airflow, MinIO, and FastAPI/SQLModel to scale the model registry, experiment tracking, and online model services
  • Integrated end-to-end ML processes with big data workloads on the HPE Runtime MLOps platform using Spark on Kubernetes
Python, PySpark, Hive, Iceberg, Delta Lake, Airflow, dbt, SQLMesh, PostgreSQL, Trino, Nessie, Kubernetes, Docker, GitHub Actions, Argo CD, MinIO, DuckDB, Power BI, Apache Superset, Azure ML, MLflow, FastAPI

Data Scientist

Freelancer

Ankara, Turkey (Remote)

Jul 2022 – Mar 2023
  • Performed ad-hoc data analysis in the private pension and life insurance domains
  • Partially automated recurring reports through SQL and Python-based data preparation
  • Migrated IBM SPSS Modeler workflows to Python and SQL for improved maintainability
  • Integrated and preprocessed policy, customer, and payment data for analytical insights
Python, SQL, IBM SPSS Modeler

Data Science Bootcamp Participant & Volunteer Mentor

Veri Bilimi Okulu

Ankara, Turkey (Remote)

Apr 2020 – Jul 2021
  • Trained in Python, SQL, and Power BI for data processing and visualization tasks
  • Completed project-based applications in CRM analytics, A/B testing, and basic recommendation systems
  • Implemented machine learning pipelines with feature engineering and model training
  • Volunteered as a peer mentor, helping participants reinforce concepts and collaborate
Python, SQL, Power BI

Education

M.Sc. Information Systems

Gazi University

Ankara, Turkey

2025 – Present

B.Sc. Industrial Engineering (GPA: 3.0/4.0)

Karabuk University

Karabuk, Turkey

2013 – 2018

Skills

Languages & Scripting
Python, SQL, Bash scripting
ETL/ELT & Data Processing
PySpark, Pandas, Polars, dbt, SQLMesh, SSIS, ODI
Big Data Platform
Apache Hadoop (HDFS, YARN), Apache Hive, Apache Spark, Apache Airflow, Apache Kafka, HPE Ezmeral Data Fabric, MinIO, Trino, Nessie
Databases
PostgreSQL, Oracle SQL, MSSQL, SQLite, DuckDB
BI & Reporting
Power BI, Apache Superset, OBI
Data/MLOps & DevOps
Docker, Kubernetes, GitHub Actions, Argo CD, Linux, MLflow, FastAPI, HPE Ezmeral Runtime Enterprise, HPE Ezmeral Unified Analytics
Cloud
Azure ML Studio, Microsoft Fabric, MinIO (S3-compatible)
Concepts
Data Warehouse, Dimensional Modeling, Data Lake, Data Lakehouse, Modern Data Warehouse, DataOps, MLOps

Certifications

  • Microsoft Certified: Fabric Data Engineer Associate

    Microsoft · Jun 2025

  • dbt Fundamentals

    dbt Labs · Nov 2023

  • A'dan Z'ye Docker (Docker from A to Z)

    Udemy · Apr 2023

Languages

  • Turkish: Native
  • English: Intermediate (B1)