Arati Nagmal

Data Engineer · Developer · Open-Source Contributor

About

Solution-oriented software engineer and quick learner. Experienced in big data engineering with technologies across the Hadoop ecosystem.

Timeline

06/2018 — Graduated from Pune University
07/2018 — Joined Great Software Laboratory as a Software Engineer
10/2018 — Won FinTech Hackathon at GSLab
04/2019 — Received "Pat on the Back" award at GSLab
03/2020 — Started job at Coditation Systems
03/2020 — Contributed to Apache Airflow

Skills & Languages

Big Data

Apache Spark Core
Apache Spark SQL
Spark Streaming
Apache Hadoop
Apache Hive
PrestoDB
Apache Kafka
Apache Airflow
Programming Languages

Java
Scala
C
Python
Go
Cloud

AWS
GCP
Languages

English
Hindi
Marathi
Telugu

Work Experience

Data Engineer
Coditation Systems Pvt Ltd
Feb 2020 - Present

Reduced the execution time of an existing Spark batch job to 20% of its original runtime.
Single-handedly automated a data pipeline using Apache Airflow and Bash scripts.


Software Engineer
Great Software Laboratory Pvt Ltd
Jul 2018 - Feb 2020

Implemented complex batch-processing logic using Apache Spark SQL, including performance optimization and resolving memory issues.
Working knowledge of HDFS-optimized file formats such as ORC and Avro.
Knowledge of AWS EMR and S3, with experience using the AWS console to monitor EMR and EC2 instances and an understanding of their configuration files.
Configured a 3-node Hadoop cluster along with Hive (MySQL as the metastore) and Spark.
Configured high availability (HA) for the YARN ResourceManager using ZooKeeper.
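As a sketch, ResourceManager HA of the kind described above is driven by `yarn-site.xml` entries along these lines (cluster ID, hostnames, and the ZooKeeper quorum below are placeholders, not values from this setup):

```xml
<!-- Illustrative yarn-site.xml fragment; all values are placeholders -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2.example.com</value>
</property>
<property>
  <!-- ZooKeeper quorum used for leader election and state storage -->
  <name>hadoop.zk.address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```

With these set, the standby ResourceManager takes over automatically when ZooKeeper detects that the active one has failed.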

Professional Projects

Dropouts Prediction
Development, Data Engineering, Infrastructure.

Played a major role in building the codebase, transformations, and quality checks for maintaining a data lake on S3.
Built a transformer job in PySpark that performs aggregations and stores the results as tables on RDS.
Developed a PySpark job to extract and transform features required by the data science team.
Actively participated in implementing a backend server in Python Flask that serves the dashboard via REST APIs.


Near Real Time Discovery

Explored services offered by cloud providers such as AWS and GCP for notifying resource-update events in real time.
Implemented multiple POCs for AWS resources involving the CloudTrail, SNS, SQS, and Lambda services to make the solution cost-efficient, reliable, and real-time.


Re-engineering of Report Generation
Development, Data Engineering, Infrastructure.

Formulated Spark batch jobs that process stock-exchange data at end of day (EOD) to calculate incentives per market participant; the generated output is then used for report generation.
Optimized the Spark jobs for efficient memory use and minimal processing time, which involved careful partitioning, complex joins, and handling skewed data.
Used S3 for storing data in ORC format and EMR for executing the Spark jobs.
Actively participated in all stages of the project, from requirement gathering through production deployment.
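One common remedy for the skewed-join problem mentioned above is key salting: splitting a "hot" key into several sub-keys so its rows spread across partitions. The following is a minimal pure-Python sketch of the idea (no Spark required; dataset and names are invented for illustration):

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical skewed dataset: one hot key dominates, as skewed
# market-participant data might. 900 of 1000 rows share one key.
records = [("HOT", i) for i in range(900)] + [("COLD", i) for i in range(100)]

SALT_BUCKETS = 8  # number of sub-keys to split each key into

def salt_key(key: str) -> str:
    """Append a random salt so a single hot key spreads over many partitions."""
    return f"{key}_{random.randrange(SALT_BUCKETS)}"

salted = [(salt_key(k), v) for k, v in records]

# Without salting, one partition would receive all 900 HOT rows.
# After salting, the hot key is split across up to SALT_BUCKETS sub-keys.
counts = Counter(k for k, _ in salted)
hot_buckets = [k for k in counts if k.startswith("HOT_")]
print(f"hot key split into {len(hot_buckets)} sub-keys")
```

In Spark the same trick is applied by salting the join key on the large side and replicating the small side once per salt value, at the cost of a larger shuffle on the small table.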

Education

Bachelor of Engineering - Information Technology
Pune University
Pune Vidyarthi Griha's College of Engineering and Technology, Pune.
Aug 2015 - Jun 2018
74%

Diploma - Computer Technology
MSBTE Board
Government Polytechnic, Solapur.
Aug 2012 - Jun 2015
84%

Achievements

Open-Source Contribution to Apache Airflow
Added a feature to an existing operator to run a custom query when unloading data from Redshift to S3.
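The kind of custom-query unload this feature enables boils down to Redshift's UNLOAD statement; a sketch follows (bucket, IAM role ARN, table, and column names are placeholders):

```sql
-- Illustrative only: unload the result of an arbitrary query to S3.
-- Note the doubled single quotes inside the quoted query text.
UNLOAD ('SELECT id, amount FROM sales WHERE sale_date >= ''2020-01-01''')
TO 's3://my-bucket/exports/sales_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
DELIMITER ','
ALLOWOVERWRITE;
```

The operator builds and runs a statement of this shape, so letting callers supply the query makes it usable for filtered or joined extracts rather than whole tables.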

Pat on the Back Award in GSLab
For an excellent job developing a core component of the project, and for learning and applying Scala within a short time span.

Winner of GS Lab's Internal FinTech Hackathon
Implemented a PoC for fraud avoidance and loan processing in NBFCs using Blockchain (Hyperledger frameworks).

Achieved World Rank 1 in Java Domain on hackerrank.com

Achieved World Rank 1 in SQL Domain on hackerrank.com

Blogs

Data Engineering

Airflow UI Plugin Development: A Walkthrough

Author
Published on: Oct 2020

Apache Airflow, Web UI

Data Engineering

Optimization Techniques: ETL With Spark And Airflow

Co-Author
Published on: Sep 2020

Apache Spark, SQL, Apache Airflow
