
Dinesh Katla

[email protected] | 804-637-8860 | Visa: H1B

AWS Data Engineer


Professional Summary

● Results-driven Data Engineer with 8 years of experience in designing, building, and optimizing scalable
data platforms.
● Expertise in cloud-based data architectures (GCP, AWS), distributed computing frameworks, and real-
time data streaming.
● Proficient in ETL workflows, data modeling, MLOps, and infrastructure automation
using Terraform and Jenkins.
● Passionate about building robust, efficient, and secure data pipelines to drive AI-driven
decision-making and operational analytics.
● Data engineering experience with expertise in big data services across industries such as
Healthcare, Marketing, Finance, and Retail.
● Designed ETL pipelines and built key regulatory and financial reports using advanced
SQL in Snowflake.
● Executed a one-time multi-state data migration from SQL Server to Snowflake using
Python and SnowSQL.
● Extensive experience with AWS cloud services and SDKs, including API Gateway,
Lambda, S3, IAM, and EC2.
● Designed, developed, and optimized scalable data pipelines on Google Cloud Platform
(GCP), enabling near real-time data integration.
● Implemented data processing solutions using BigQuery, Dataflow, Dataproc, Pub/Sub,
Cloud Functions, and Cloud Storage, ensuring efficient data delivery.
● Built ETL pipelines using PySpark and Python, optimizing data ingestion and
transformation processes for high performance (an illustrative sketch follows this summary).
● Designed and implemented scalable data engineering solutions in collaboration with Data
Architects, ensuring high-performance data processing.
● Developed and optimized SQL queries for efficient data processing and transformation in
BigQuery, improving query performance and cost efficiency.
● Built and maintained ETL pipelines using PySpark and Dataproc, enabling seamless data
integration from multiple sources.
● Designed and implemented data models to support analytical and reporting needs,
improving data accessibility and usability.
● Developed Python-based scripts for data transformation, automation, and workflow
orchestration, enhancing operational efficiency.
● Loaded data from Informatica Server to HDFS on EMR using Sqoop.
● Led end-to-end data engineering processes, ingesting DynamoDB data into Snowflake via
AWS Kinesis Firehose and transforming it using PySpark on EC2.
● Integrated Lambda with SQS and Step Functions to process lists and update status in DynamoDB.
● Managed Azure Data Lake Analytics, Azure SQL Database, Databricks, and Data
Warehouse, overseeing access and migrating on-prem databases to Azure Data Lake via
Azure Data Factory.
● Designed and implemented large-scale Lambda architectures using Azure Data Platform
services, including Data Lake, Data Factory, Data Catalog, HDInsight, SQL Server, Azure
ML, and Power BI.
● Proficient in Azure Cloud services, including Data Lake, Databricks, HDInsight, Blob
Storage, Data Factory, Synapse, SQL, SQL DB, DWH, and Storage Explorer.
● Experienced in configuring Spark connections for batch and real-time data processing
using HDFS and in-memory Spark DataFrame API.
● Hands-on experience with Spark architecture, including Spark Core, Spark SQL,
DataFrames, Spark Streaming, Driver Node, Worker Node, Stages, Executors, and Tasks.
● Used HTTPLib, Urllib, Beautiful Soup, and Pandas libraries throughout the development
lifecycle.
● Processed semi-structured data (CSV, XML, JSON) in Hive/Spark using Python.
● Specialized in data integration, migration, and business application development using
IBM InfoSphere Information Server, Ascential, and IBM InfoSphere DataStage Parallel
Extender and Server Editions.
● Skilled in interacting with HDFS to query data using HiveQL for ad-hoc extraction and
analysis, with experience in debugging and writing custom Hive User Defined Functions
(UDFs) as needed.
● Strong expertise in leveraging partitioning and bucketing techniques on managed and
external tables to optimize performance.
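
A minimal PySpark sketch of the kind of ETL work described above, assuming a hypothetical
S3 layout, JSON input, and a date-partitioned Parquet output; bucket names, columns, and
the dedup rule are illustrative, not the actual client pipelines.

    # Illustrative PySpark ETL: ingest semi-structured JSON, clean it, and write
    # a partitioned Parquet table. Bucket names and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Ingest raw JSON events (assumed layout: s3://example-raw-bucket/orders/*.json)
    raw = spark.read.json("s3://example-raw-bucket/orders/")

    # Basic cleansing and transformation
    orders = (
        raw.filter(F.col("order_id").isNotNull())              # drop malformed rows
           .withColumn("order_ts", F.to_timestamp("order_ts")) # normalize timestamps
           .withColumn("order_date", F.to_date("order_ts"))    # derive partition key
           .dropDuplicates(["order_id"])                       # simple dedup rule
    )

    # Write Parquet partitioned by date so downstream queries can prune partitions
    (orders.write
           .mode("overwrite")
           .partitionBy("order_date")
           .parquet("s3://example-curated-bucket/orders/"))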

Core Competencies

● Data Warehousing
● Data Engineering
● Analytics
● Root Cause Analysis
● Machine Learning

Certifications:
_________________________________________________________________________________________

● GCP Professional Certification

Technical Acumen

Big Data Technologies        Hadoop, MapReduce, Sqoop, Hive, Oozie, Spark, ZooKeeper, Cloudera Manager, Kafka, Flume, Airflow

ETL Tools                    Informatica, IBM InfoSphere DataStage, Teradata

NoSQL Databases              HBase, Cassandra, DynamoDB, MongoDB

Monitoring and Reporting     Power BI (Microsoft Certified)

Hadoop Distributions         Hortonworks, Cloudera

Programming and Scripting    Python, Scala, Java, GoLang, SQL, Shell Scripting, C, C++, PySpark

Databases                    PostgreSQL, MySQL, Teradata, Oracle, DBT

Operating Systems            Linux, Unix, macOS, CentOS, Windows 10, Windows 8, Windows 7

GCP Services                 BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage, Cloud Composer (Airflow)

Cloud Technologies           AWS, Snowflake, GCP, Azure (Azure Data Lake, Azure Data Factory, Azure Databricks, Azure SQL Database, Azure SQL Data Warehouse)

AWS Services                 Amazon EC2, Amazon S3, Amazon SimpleDB, Amazon MQ, Amazon ECS, AWS Lambda, Amazon SageMaker, Amazon RDS, Elastic Load Balancing, Elasticsearch

Database Modeling            ER modeling, dimensional modeling, star schema modeling, snowflake schema modeling

Machine Learning             Regression (Linear and Logistic), Decision Trees, Random Forest, SVM, KNN, PCA

ML Frameworks                PyTorch, Pandas, Keras, NumPy, TensorFlow, Scikit-Learn, NLTK, OpenCV

Version Control              Git, GitHub, Bitbucket

Professional Experience

Client: CVS Health (Remote)        May 2024 - Present

Role: AWS Data Engineer        Atlanta, GA (Remote)

Key Responsibilities:

● Led the redesign and re-architecture of data systems, focusing on high availability, fault
tolerance, reliability, cost optimization, and reduced latency.
● Used StreamSets for efficient data ingestion and transport across diverse
source and destination systems.
● Design and develop scalable and efficient data engineering solutions on Google Cloud
Platform (GCP) in collaboration with the Data Architect.
● Designed and implemented scalable data platforms for efficient ingestion,
storage, and processing of structured and unstructured data.
● Built cloud-native and distributed data systems that enabled real-time analytics
and AI-driven insights.
● Developed ETL pipelines using Python and SQL, processing large-scale datasets
to support analytics and operational reporting.
● Optimized data storage, retrieval, and transformation pipelines for improved
query performance and cost efficiency.
● Implemented real-time data streaming pipelines using Kafka and Apache Beam to
process high-velocity event data (a minimal streaming sketch follows this list).
● Ensured data integrity, quality monitoring, and governance using automated
validation frameworks and dashboards.
● Developed and maintained ML pipelines, integrating data engineering workflows with
MLOps best practices for AI applications.
● Managed cloud-based data infrastructures on GCP, leveraging BigQuery,
Dataflow, Cloud Storage, and Cloud Composer.
● Automated infrastructure provisioning and deployment using Terraform and
Jenkins, ensuring CI/CD best practices.
● Worked in Agile/DevOps environments, collaborating closely with data scientists,
software engineers, and business stakeholders to build scalable data solutions.
● Managed data storage, processing, and security on GCP, leveraging BigQuery,
Dataflow, and Cloud Storage.
● Build, optimize, and maintain data pipelines using BigQuery, Dataproc, PySpark, and
SQL for seamless data integration and transformation.
● Develop and implement data models to support analytical and reporting needs, ensuring
data accuracy and performance.
● Perform SQL processing and query optimization to enhance data retrieval efficiency and
cost-effectiveness.
● Write and maintain Python scripts for data transformation, automation, and pipeline
orchestration.
● Lead data engineering tasks by ensuring high-quality coding, testing, and deployment of
data solutions.
● Monitor and troubleshoot data pipelines, ensuring reliability, scalability, and optimal
performance.
● Collaborate with cross-functional teams, including Data Architects and Analysts, to
support business intelligence and reporting needs.
● Implement data governance, security, and compliance best practices to ensure data
integrity.
● Stay up to date with the latest advancements in GCP and data engineering technologies
to drive innovation and efficiency.
● Designed and implemented multiple Data Quality zones adhering to stringent data
governance principles at both Data Lake and Data Warehouse levels.
● Built a Data Catalog from scratch using Collibra, incorporating lineage tracking, data
governance, and discovery capabilities.
● Built CloudFormation templates for infrastructure components, including SNS, SQS,
Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, and S3.
● Created AWS Lambda functions using Python for deployment automation and integrated
public-facing websites with AWS infrastructure.
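
A minimal, hypothetical sketch of the streaming pattern referenced above (event data flowing
through Beam into the GCP stack). For simplicity it reads from a Pub/Sub subscription rather
than Kafka; the project, subscription, table, and schema names are placeholders, not the
actual client pipeline.

    # Illustrative Apache Beam streaming pipeline: Pub/Sub -> parse JSON -> BigQuery.
    # Resource names and the schema are assumptions for the sketch.
    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)  # run in streaming mode

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                subscription="projects/example-project/subscriptions/events-sub")
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
        )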

Client: AT&T (Remote)        Apr 2023 - Apr 2024

Role: Azure Data Engineer        New York, NY

Key Responsibilities:
● Developed and optimized Apache Spark applications using PySpark and SparkSQL to
process data from relational databases and streaming sources.
● Worked with version control tools like GitHub for managing code repositories and CI/CD
pipelines.
● Conducted end-to-end data analysis, including collection, transformation, and
visualization supporting data-driven decision-making.
● Designed, developed, and maintained scalable data pipelines on Google Cloud Platform
(GCP).
● Explored and leveraged Google Kubernetes Engine (GKE) to design and deploy
microservices for real-time data feeds.
● Developed and maintained documentation for data pipelines and ETL processes.
● Stayed up to date with the latest advancements in data engineering and GCP
technologies.
● Built ETL pipelines using Azure Data Factory (ADF), T-SQL, SparkSQL, and U-SQL to extract,
transform, and load data into Azure Data Lake, Azure Storage, Azure SQL, and Azure
Synapse Analytics.
● Designed and implemented modern data solutions with Azure PaaS for visualization,
business intelligence, and predictive analysis of application performance.
● Applied proven data engineering experience with a focus on Google Cloud Platform (GCP).
● Improved Spark application performance through batch interval tuning, memory
optimization, and parallelism adjustments.
● Created data ingestion pipelines for Azure HDInsight Spark clusters using ADF and
SparkSQL, integrating data from on-premise (MySQL, Cassandra) and cloud sources (Blob
Storage, Azure SQL DB).
● Designed data pipelines with Azure Data Lake, Databricks, and Apache Airflow,
incorporating real-time streaming via Apache Flume and storing processed data in Azure
Table Storage.
● Migrated log storage from Cassandra to Azure Synapse Analytics, enhancing query
performance and reducing latency.
● Utilized Azure DevOps for CI/CD, Active Directory for authentication, and Apache
Ranger for authorization management.
● Developed PowerBI dashboards for real-time analytics and business reporting.
● Created Airflow DAGs to automate data ingestion, ETL jobs, and business reporting
workflows (a minimal DAG sketch follows this list).
● Integrated and processed data from Snowflake, MS SQL, MongoDB, and Teradata using
Spark, Hive, and Sqoop.
● Configured Kubernetes to manage online and batch workloads for analytics and machine
learning applications.
● Built and deployed JSON-based Azure Data Factory pipelines to automate SQL
activities and streamline data processing.
● Successfully conducted proof-of-concept (PoC) implementations for SOAP & REST API
integrations to retrieve analytics data from diverse sources.
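
A minimal Airflow DAG sketch in the spirit of the ingestion and reporting workflows above;
the task callables, schedule, and names are placeholders rather than the actual DAGs.

    # Illustrative Airflow DAG: daily ingest -> transform -> publish report.
    # Callables are stubs; real workflows would trigger ADF/Spark/SQL jobs here.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest(**context):
        print("pull data from source systems")

    def transform(**context):
        print("run Spark/SQL transformations")

    def publish_report(**context):
        print("refresh reporting tables and dashboards")

    with DAG(
        dag_id="daily_ingest_and_report",
        start_date=datetime(2023, 4, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_report = PythonOperator(task_id="publish_report", python_callable=publish_report)

        t_ingest >> t_transform >> t_report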

Client: T-Mobile (India)        Jan 2021 - Nov 2022

Role: Senior Data Engineer        Hyderabad, India

Key Responsibilities:
● Applied expertise in Azure Cloud services, including HDInsight, Data Lake, Databricks, Blob Storage,
Data Factory, Synapse, SQL, and Data Warehouse.
● Designed and implemented data pipelines using Azure Data Lake, Databricks, and Apache
Airflow, enabling complex data workflows and machine learning applications.
● Integrated Snowpark with Azure Data Factory for advanced data orchestration and
optimized Snowpark scripts for improved query execution (an illustrative Snowpark sketch
follows this list).
● Applied strong programming skills in PySpark, Python, and similar languages.
● Developed ETL pipelines to integrate data from on-premises (MySQL, Cassandra) and
cloud sources (Blob Storage, Azure SQL DB), applying transformations and loading data
into Azure Synapse.
● Built real-time analytics solutions using Spark-Scala functions and configured Spark
Streaming to process live data from Apache Flume, storing results in Azure Table Storage.
● Worked extensively with SQL and relational databases.
● Processed and transformed large datasets using Databricks, Spark Scala scripts, and
UDFs, leveraging Azure Blob Storage for ingestion and storage.
● Applied data modeling, ETL processes, and data warehousing concepts.
● Implemented data processing solutions using Python and GCP services such as BigQuery,
Dataproc, Dataflow, Pub/Sub, Cloud Functions, and Cloud Storage.
● Optimized data processing and storage in BigQuery for performance, cost, and scalability.
● Optimized Spark Streaming API to enhance cluster performance, while applying data
cleansing and business transformations using Spark DataFrames and Databricks
Notebooks.
● Developed DAG workflows with Apache Airflow and Apache NiFi, distributing tasks across
Celery workers for efficient inter-service communication.
● Monitored and tuned Spark clusters using Log Analytics and Ambari Web UI,
improving query performance by migrating log storage from Cassandra to Azure Synapse.
● Built and optimized data ingestion pipelines on Azure HDInsight Spark clusters using Azure
Data Factory and Spark SQL, working extensively with Cosmos DB (SQL API & Mongo API).
● Designed custom input adapters using Spark, Hive, and Sqoop to ingest and analyze data
from Snowflake, MS SQL, and other sources.
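
A small, hypothetical Snowpark sketch of the kind of pushdown transformation mentioned
above (shown in Python for consistency); the connection parameters, table names, and the
aggregation are illustrative only.

    # Illustrative Snowpark (Python) transformation executed inside Snowflake.
    # connection_parameters and table names are placeholders.
    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col, sum as sum_

    connection_parameters = {
        "account": "<account>", "user": "<user>", "password": "<password>",
        "warehouse": "<warehouse>", "database": "<database>", "schema": "RAW",
    }
    session = Session.builder.configs(connection_parameters).create()

    orders = session.table("RAW.ORDERS")

    # Aggregate completed orders per day; the work is pushed down to Snowflake
    daily_totals = (
        orders.filter(col("STATUS") == "COMPLETE")
              .group_by(col("ORDER_DATE"))
              .agg(sum_(col("AMOUNT")).alias("TOTAL_AMOUNT"))
    )

    daily_totals.write.save_as_table("ANALYTICS.DAILY_ORDER_TOTALS", mode="overwrite")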

Client: ADP (India)        Sep 2018 - Dec 2020

Role: Data Engineer

Key Responsibilities:
● Applied extensive experience in machine learning, big data, data visualization, and
development using R, Python, Unix, and SQL.
● Conducted exploratory data analysis (EDA) using Python libraries such as NumPy,
Pandas, Matplotlib, and SciPy to uncover patterns and insights (a short EDA sketch follows this list).
● Skilled in quantitative analysis, data mining, and statistical modeling, translating complex
data into actionable insights.
● Configured AWS Identity and Access Management (IAM) for enhanced authentication and
access security.
● Assessed system design feasibility and cost efficiency, recommending cloud solutions on
AWS for optimal performance and scalability.
● Developed complex SQL queries and scripts for data extraction, aggregation, and
validation, ensuring accuracy and alignment with business needs.
● Created high-level analysis reports in Excel and Tableau, identifying billing patterns,
anomalies, and data quality issues.
● Designed and implemented advanced visualizations such as Maps, Heat Maps, Pareto
Charts, Tree Maps, Bullet Charts, and Density Maps in Tableau.
● Proficient in Tableau filtering and sorting techniques, including Quick Filters, Context
Filters, Conditional Filters, and Top Filters.
● Identified and documented data quality limitations, writing SQL queries for validation and
generating Excel summary reports (pivot tables, charts, etc.).
● Developed functional requirements through data gathering and modeling, leveraging ETL
tools to ensure robust data integration.
● Extracted and analyzed data from multiple sources (CSV, Excel, HTML, SQL
databases), transforming and exporting insights into various formats such as CSV, Excel,
and databases.
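
A short, hypothetical EDA sketch along the lines described above; the CSV path and
column names are placeholders, not the actual billing data.

    # Illustrative exploratory data analysis with Pandas/Matplotlib.
    # File path and columns are assumptions for the sketch.
    import pandas as pd
    import matplotlib.pyplot as plt

    billing = pd.read_csv("billing_extract.csv", parse_dates=["invoice_date"])

    # Quick profile: shape, dtypes, missing values, summary statistics
    print(billing.shape)
    print(billing.dtypes)
    print(billing.isna().sum())
    print(billing["amount"].describe())

    # Monthly billing totals to surface anomalies and trends
    monthly = billing.set_index("invoice_date")["amount"].resample("M").sum()
    monthly.plot(kind="bar", title="Monthly billing totals")
    plt.tight_layout()
    plt.show()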

Client: SRR Software Private Limited (India) Jan 2017 - Aug 2018
Role: Data Engineer

Key Responsibilities:
● Designed and developed ETL pipelines using Apache Spark and Hadoop to process large-scale
data sets efficiently.
● Collaborated with cross-functional teams to define data requirements and ensure data
availability for analytics and reporting.
● Implemented and optimized SQL queries for data extraction, transformation, and loading
from relational databases like MySQL and PostgreSQL.
● Developed and maintained data processing workflows on AWS using Amazon S3, Redshift,
and Lambda functions to scale data operations.
● Ensured data quality and integrity by creating automated data validation and monitoring
processes (a minimal validation sketch follows this list).

● Utilized Python and PySpark for data wrangling, processing, and building
reusable data transformation scripts.
● Led the migration of on-premise data pipelines to Azure Data Factory and Azure
Databricks, improving data scalability and performance.
● Built real-time data streaming pipelines using Apache Kafka and Apache Flink for
instant data processing.
● Created and managed data models using data warehousing platforms like Google BigQuery
and Snowflake, enabling efficient querying and analytics.
● Collaborated with data scientists to ensure smooth integration of machine learning models
into the data pipeline.
● Developed and automated reports and dashboards using Tableau and Power BI to
support business decision-making.
● As a data engineer, managed data security and privacy protocols, ensuring compliance with
GDPR and HIPAA standards.
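
A minimal data-validation sketch in the spirit of the automated checks above, assuming a
Pandas DataFrame staging step; the required columns, file path, and thresholds are hypothetical.

    # Illustrative automated data-quality checks before loading a batch downstream.
    # Column names and the staging path are placeholders.
    import pandas as pd

    REQUIRED_COLUMNS = ["customer_id", "event_ts", "amount"]

    def validate_batch(df: pd.DataFrame) -> list[str]:
        """Return a list of human-readable validation failures (empty means OK)."""
        failures = []
        missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
        if missing:
            failures.append(f"missing columns: {missing}")
            return failures  # skip the remaining checks if the schema is wrong

        if df.empty:
            failures.append("batch is empty")
        if df["customer_id"].isna().any():
            failures.append("null customer_id values found")
        if df.duplicated(subset=["customer_id", "event_ts"]).any():
            failures.append("duplicate (customer_id, event_ts) rows found")
        if (df["amount"] < 0).any():
            failures.append("negative amounts found")
        return failures

    # Example usage: block the load and alert if any check fails
    batch = pd.read_parquet("s3://example-bucket/staging/batch.parquet")
    problems = validate_batch(batch)
    if problems:
        raise ValueError("data quality checks failed: " + "; ".join(problems))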

Education

Master's in Computer Science
Concordia University, St. Paul

Bachelor's in Computers
Osmania University

Additional Role: AWS Backend/Data Engineer
Title: AWS Backend/Data Engineer
Location: Charlotte, NC
Key Technology Skill Set:
• AWS platform and services knowledge, including AWS Lambda, API Gateway, and
EMR/Hudi/Glue/Kafka/Flink with .NET; CI/CD using Terraform/Concourse/Jenkins.
• Primary skills: AWS, Python, Kafka, ETL data processing.
• Secondary skills: .NET, Angular, Terraform, DevOps.
• Additional technology skill set: Spark, MongoDB/DynamoDB/Aurora.
• AWS Certification - Associate or Architect.
• Ability to understand the future architecture and guide the team with required technical
direction.
• 7+ years of experience in cloud development (preferably 8-12 years but may not have
all been in AWS).
Responsibilities:
• Collaborate with cross-functional teams to gather requirements and translate them into
technical specifications.
• Implement AWS services, including AWS Glue, to streamline data processing and
integration workflows (an illustrative Glue job sketch appears at the end of this section).
• Utilize Terraform for infrastructure as code to automate deployment processes and
enhance system reliability.
• Integrate Kafka for real-time data streaming to improve data accessibility and
responsiveness.
• Apply Python programming to develop efficient algorithms for geospatial data analysis
and visualization.
• Ensure application security and compliance with industry standards through regular
code reviews and testing.
• Optimize application performance by identifying bottlenecks and implementing effective
solutions.
• Demonstrate proficiency in AWS services, including AWS Glue, to manage data
workflows efficiently.
• Have experience with Terraform for infrastructure automation, crucial for system
reliability.
• Show expertise in Kafka for real-time data streaming, enhancing data accessibility.
• Exhibit strong Python programming skills for developing algorithms in geospatial data
analysis.
• Experience in hybrid work models, ensuring effective collaboration and productivity.
• Familiarity with industry standards for application security and compliance, ensuring
safe operations.
Additional Responsibilities:
• Develop MVP User Stories (Re-usable components/APIs/UI/Mock stubs etc.).
• Support backlog creation, prioritization, and Story mapping.
• Hands-on developer experience expected, with support for technical design and
solutioning.
• Ability to work across tiers – UI development, APIs, databases (basic developer level
tasks), and utilize CI/CD pipeline, create/configure build and deploy jobs.
• Experience working in an XP and pair-programming model; understands and performs
TDD.
• Responsible for assigned task delivery and quality and knowledge transfer through daily
pairing with Client team members.
• Motivated and energetic engineers who are passionate about software modernization.
• Must be able to clearly communicate with team members and leadership.
• Participate in paired programming and XP development.
• Strong ability to work independently when required.
• Turn complex ideas into manageable pieces of work.
• Hands-on individual contributor who mentors other developers on the team.
• Support team members on troubleshooting.
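
A minimal AWS Glue (PySpark) job sketch matching the skill set listed above; the catalog
database, table, and output path are placeholders, not an actual job.

    # Illustrative AWS Glue PySpark job: read from the Glue Data Catalog,
    # apply a simple transformation, and write Parquet to S3.
    # Database, table, and bucket names are assumptions.
    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read a catalogued source table as a DynamicFrame, then treat it as a DataFrame
    source = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="raw_events")
    df = source.toDF().filter(F.col("event_type").isNotNull())

    # Write curated output back to S3 as Parquet
    (df.write
       .mode("overwrite")
       .parquet("s3://example-curated-bucket/events/"))

    job.commit()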
