Solution-oriented software engineer and quick learner. Experienced in big data engineering using cutting-edge technologies in the Hadoop ecosystem.
Reduced the execution time of an existing Spark batch job to 20% of its original runtime.
Single-handedly automated a data pipeline using Apache Airflow and Bash scripts.
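A minimal sketch of the kind of Airflow DAG this describes; the DAG id, schedule, and script paths are hypothetical placeholders, not the actual pipeline:

```python
# Minimal Airflow DAG orchestrating Bash steps (names/paths hypothetical).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_batch_pipeline",      # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        # Trailing space stops Jinja from treating the .sh path as a template file.
        bash_command="bash /opt/scripts/extract.sh ",   # hypothetical path
    )
    transform = BashOperator(
        task_id="transform",
        bash_command="bash /opt/scripts/transform.sh ",  # hypothetical path
    )
    extract >> transform  # run extract before transform
```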
Implemented complex batch-processing logic in Apache Spark SQL, including performance optimization and resolution of memory issues.
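A hedged sketch of this Spark SQL batch style; paths, table, and column names are hypothetical, not the production schema:

```python
# Spark SQL batch-logic sketch (all identifiers hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-logic-sketch").getOrCreate()

# Expose the input as a view so the batch logic can be written in SQL
# and optimized by Catalyst.
spark.read.orc("s3://bucket/input/").createOrReplaceTempView("trades")

result = spark.sql("""
    SELECT trade_date,
           participant_id,
           SUM(quantity * price) AS turnover
    FROM trades
    GROUP BY trade_date, participant_id
""")

# Repartitioning by the write key keeps shuffle output balanced and
# reduces memory pressure on large aggregations.
result.repartition("trade_date") \
      .write.mode("overwrite") \
      .partitionBy("trade_date") \
      .orc("s3://bucket/output/")
```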
Knowledge of HDFS-optimized file formats such as ORC and Avro.
Knowledge of AWS EMR and S3, with experience using the AWS console to monitor EMR and EC2 instances and an understanding of their configuration files.
Configured a 3-node Hadoop cluster along with Hive (using MySQL as the metastore) and Spark.
Configured high availability (HA) for the YARN ResourceManager using ZooKeeper.
Played a major role in building the codebase, transformations, and quality checks for maintaining a data lake on S3.
Built a transformer job in PySpark that performs aggregations and stores the results as tables in RDS.
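A minimal sketch of the aggregate-and-write-to-RDS pattern; the JDBC URL, credentials, and table/column names are hypothetical placeholders:

```python
# Aggregate in PySpark and write to RDS over JDBC (identifiers hypothetical).
# The MySQL JDBC driver jar must be on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transformer").getOrCreate()

events = spark.read.orc("s3://bucket/events/")  # hypothetical input path

summary = (events
           .groupBy("participant_id")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("event_count")))

# Persist the aggregated result as a table in RDS (MySQL).
(summary.write
        .format("jdbc")
        .option("url", "jdbc:mysql://rds-host:3306/reports")  # hypothetical
        .option("dbtable", "participant_summary")
        .option("user", "app_user")       # supply real credentials securely
        .option("password", "***")
        .mode("overwrite")
        .save())
```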
Developed a PySpark job to extract and transform features required by the data science team.
Actively participated in implementing a backend server in Python Flask that interacts with the dashboard via REST APIs.
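A minimal sketch of the kind of Flask REST endpoint a dashboard would call; the route and payload shape are hypothetical:

```python
# Flask REST endpoint sketch (route and payload hypothetical).
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/metrics", methods=["GET"])
def metrics():
    # The real service would query a backing store here;
    # a fixed payload is returned purely for illustration.
    return jsonify({"jobs_completed": 42, "status": "ok"})

if __name__ == "__main__":
    app.run(port=5000)
```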
Explored services offered by cloud providers such as AWS and GCP for notifying resource update events in real time.
Implemented multiple POCs for AWS resources involving the CloudTrail, SNS, SQS, and Lambda services to make the solution cost-efficient, reliable, and real-time.
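A hedged sketch of the fan-out pattern behind such a POC (CloudTrail → SNS → SQS → Lambda). It assumes the standard SQS record shape with an SNS-wrapped message body (i.e., raw message delivery disabled); the notification's inner fields are not assumed:

```python
# Lambda handler for SQS-delivered SNS notifications (sketch, not the actual POC).
import json

def lambda_handler(event, context):
    for record in event["Records"]:            # one entry per SQS message
        envelope = json.loads(record["body"])  # SNS envelope delivered via SQS
        message = json.loads(envelope["Message"])  # the notification payload
        # React to the resource-update notification in near real time;
        # here we just log what arrived.
        print("received notification with keys:", sorted(message.keys()))
    return {"processed": len(event["Records"])}
```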
Formulated Spark batch jobs for processing stock exchange data at end of day (EOD) to calculate incentives per market participant; the generated output was further used in report generation.
Optimized the Spark jobs to use memory efficiently and minimize processing time, which involved careful partitioning, complex joins, and handling skewed data.
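A hedged sketch of one common skew-mitigation technique (key salting) of the kind such optimizations involve; paths and column names are hypothetical:

```python
# Salted join to spread a skewed key across partitions (identifiers hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-salting").getOrCreate()
SALT_BUCKETS = 16  # tune to the observed skew

big = spark.read.orc("s3://bucket/big/")      # skewed fact table
small = spark.read.orc("s3://bucket/small/")  # dimension table

# Spread each hot key across SALT_BUCKETS synthetic sub-keys on the big side...
big_salted = big.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# ...and replicate every small-side row once per bucket so joins still match.
salts = spark.range(SALT_BUCKETS).withColumnRenamed("id", "salt")
small_salted = small.crossJoin(salts)

joined = (big_salted
          .join(small_salted, on=["join_key", "salt"])
          .drop("salt"))
```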
Utilized S3 for storing data in ORC format and EMR for executing the Spark jobs.
Actively participated in all stages of the project, from initial requirement gathering through to production deployment.