Vishal

[email protected]
+1 647 948 3739
Sr. Data Engineer
______________________________________________________________________________________
Professional Summary:
● 6+ years of experience as a Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
● Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation. 
● Experience in Agile methodology; participated in sprints and daily Scrums with onsite and offshore teams to deliver software tasks on time and with high quality.
● Experience in the development of Big Data projects using Hadoop, YARN, Hive, Pig, Flume, and MapReduce open-source tools/technologies.
● Knowledge in tuning Spark applications to achieve optimal performance.
● Configuration Management, Continuous Integration (CI), Continuous Deployment (CD), Release Management, and Cloud Implementations.
● Experience in setting up enterprise infrastructure on Amazon Web Services (AWS) including EC2, S3, the AWS billing console, AMI, IAM, CloudFormation, VPC, CodeDeploy, SageMaker, Service Catalog, EMR, DynamoDB, CloudWatch, etc.
● Primarily involved in data migration using SQL, Azure SQL, Azure Data Factory, SSIS, and PowerShell.
● Experience with database technologies like SQL, PostgreSQL, and MySQL, and NoSQL databases like MongoDB.
● Hands-on experience with scripting languages like PHP and with Python libraries (NumPy, SciPy, Matplotlib, Pandas, Seaborn, scikit-learn).
● Experience in developing Spark applications using Spark RDDs, SparkContext, Spark MLlib, Spark-SQL, and DataFrame APIs (a minimal PySpark sketch follows this summary).
● Experience in writing complex Pig scripts and Hive & Impala queries, and in importing data into Hadoop using Sqoop and vice versa.
● Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, and TOAD.
● Good experience in data modeling and data analysis; proficient in gathering business requirements and handling requirements management.
● Excellent experience in using Sqoop to import data into HDFS from RDBMS and vice-versa. 
● Solid knowledge of Data Marts, OLAP, OLTP, and Dimensional Data Modeling with Ralph Kimball Methodology using Analysis Services. 
● Extensive experience in using ER modeling tools such as Erwin and ER/Studio, Teradata, and MDM. 
● Knowledge and experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper. 
● Knowledge and experience in configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
● Extensive experience with HBase high availability, manually verified using failover tests.
● Proficient experience in importing and exporting data between HDFS and RDBMS using Sqoop. 
● Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms. 
● Good experience with real-time stream processing frameworks such as Kafka, Apache Storm, and Apache NiFi.
● Addressing complex POCs according to business requirements from the technical end. 
● Experience in Data transformation, Data mapping from source to target database schemas, and Data Cleansing procedures. 
● Good experience working with different ETL tool environments like SSIS and Informatica, and reporting tool environments like SQL Server Reporting Services (SSRS) and Business Objects.
● Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
● Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
● Experience in writing Storm topology to accept the events from Kafka producer and emit them into Cassandra DB. 
● Experience in working with different data sources like Flat files, XML files, and Databases. 
● Strong experience with architecting highly performant databases using MySQL and Cassandra. 
● Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive. 
● Strong background in mathematics and very good analytical and problem-solving skills.
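
A minimal PySpark sketch of the kind of DataFrame and Spark-SQL work described above (illustrative only; the file path and column names are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session (cluster settings would differ in production).
spark = SparkSession.builder.appName("orders-summary").getOrCreate()

# Read a hypothetical CSV of orders into a DataFrame, inferring the schema.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

# DataFrame API: total and average order amount per customer.
summary = (orders
           .groupBy("customer_id")
           .agg(F.sum("amount").alias("total_amount"),
                F.avg("amount").alias("avg_amount")))

# Equivalent Spark-SQL query against a temporary view.
orders.createOrReplaceTempView("orders")
summary_sql = spark.sql(
    "SELECT customer_id, SUM(amount) AS total_amount, AVG(amount) AS avg_amount "
    "FROM orders GROUP BY customer_id")

summary.show()
summary_sql.show()
spark.stop()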
Technical Skills:
Programming: PySpark, Python, Java, XML, JSON, Hive, C, C++
Databases: MySQL, MongoDB, SQL Server 2008/2012, Hive (DW), PostgreSQL

Big Data Ecosystems: Hadoop, MapReduce, YARN, HDFS, HIVE, Spark, Kafka, Apache Airflow, NoSQL, MongoDB
AWS: EC2, S3, IAM, CloudFormation, VPC, SageMaker, Service Catalog, EMR, DynamoDB, CloudWatch, ECR, ECS.

Machine Learning: Regression, Clustering, Classification, NLP, Sentiment Analysis.


Source Control/Automation Tools: GitHub, Bamboo CI/CD, Bitbucket, JIRA (issue tracking)

IDE/Development Tools: Jupyter Notebook, Spring STS, VS Code, Eclipse (Java), Tableau


Mobile Application: Android App Development

Education: Diploma in AI & Data Science

Professional Experience

Sr. Data Engineer
National Bank of Canada, Ottawa Jan 2021 to Present

Responsibilities:
● Worked on a completely cloud-based environment with AI/ML team to build applications. 
● Demonstrated competency with the following AWS services: EC2, EBS, S3, RDS, VPC, Route 53, ELB, IAM, CloudFront, and CloudFormation templates, with the ability to make recommendations on how new cloud offerings fit into the architecture.
● Experience in building and deploying pipelines using Bitbucket, Bamboo, and Jira.
● Experience working with networking in AWS VPC, subnets, routing tables, VPNs, CIDRs, Security groups, and Network access control lists. 
● Worked extensively on setting up new AWS EC2 instances and managing EBS volumes.
● Experience working with AWS IAM, Secrets Manager, and Key Management Service (KMS); created IAM roles and policies following the principle of least privilege, and created and managed S3 buckets.
● Implemented the SageMaker infrastructure pipeline end-to-end: building models, custom ECR images, ECS tasks, batch transform jobs, and S3 configurations.
● Created custom Docker images and batch transform jobs on AWS.
● Created, maintained, and customized complex JIRA project configurations, including workflows, custom fields, permissions, and notifications.
● Created multiple AWS Lambda functions in Python for automation of IAM, S3 access control lists, and antivirus updates (see the sketch after this list).
● Experienced in providing 24/7 production support for the SageMaker pipeline.
● Experience in running Control-M jobs and tuning Spark applications for performance optimization.
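
A minimal sketch of the kind of Python Lambda automation mentioned above, assuming a hypothetical event shape and bucket name (the boto3 S3 public-access-block call is a standard API; the function as a whole is illustrative, not the production code):

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Hypothetical event shape: {"bucket": "example-bucket"}.
    bucket = event["bucket"]

    # Enforce a full public access block on the bucket.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    return {"status": "public access blocked", "bucket": bucket}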

Environment: Python, PySpark, Control-M, AWS technologies (S3, IAM, Service Catalog, etc.), Bamboo CI/CD

Sr. Data Engineer
Metrolinx, Toronto September 2018 to December 2020
Responsibilities:
● Worked as a Sr. Big Data Engineer, providing technical expertise in Hadoop technologies as they relate to the development of analytics.
● Prepared ETL technical mapping documents along with test cases for each mapping, for future development and to maintain the Software Development Life Cycle (SDLC).
● Experienced in Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics, data wrangling, and Excel data extracts.
● Worked with the MDM systems team with respect to technical aspects and generating reports. 
● Developed Spark code by using Scala and Spark-SQL for faster processing and testing and performed complex HiveQL queries on Hive tables. 
● Worked on importing and exporting data from different relational database systems like DB2 into HDFS and Hive, and vice versa, using Sqoop.
● Supported various reporting teams and experience with the data visualization tool Tableau. 
● Involved in importing real-time data into Hadoop using Kafka and implemented daily Oozie jobs.
● Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines (see the sketch after this list).
● Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala. 
● Implemented Python data analysis using Pandas, Matplotlib, Seaborn, TensorFlow, and NumPy.
● Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
● Developed Pig scripts in areas where extensive coding needed to be reduced.
● Used Amazon EC2 command line interface along with Python to automate repetitive work. 
● Worked on analyzing and examining customer behavioral data using MongoDB. 
● Involved in PL/SQL query optimization to reduce the overall run time of stored procedures. 
● Developed custom Apache Spark programs in Scala to analyze and transform unstructured data. 
● Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
● Performed data analysis and data profiling using complex SQL on various source systems. 
● Used a Python MVC framework to design and develop the application.
● Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts. 
● Analyzed the data that consumed the most resources and made changes to the back-end code using PL/SQL stored procedures and triggers.
● Performed thorough data analysis for the purpose of overhauling the database using SQL Server. 
● Developed consumer-based features and applications using Python and Django, with test-driven development and pair programming.
● Translated business concepts into XML vocabularies by designing XML Schemas with UML. 
● Developed Shell, Perl, and Python scripts to automate and provide Control flow to Pig scripts. 
● Worked with medical claim data in the Oracle database for Inpatient/Outpatient data validation, trend, and comparative analysis. 
● Designed the data marts using Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin. 
● Implemented Data Validation using MapReduce programs to remove unnecessary records before moving data into Hive tables. 
● Used Python-based GUI components for the front-end functionality such as selection criteria. 
● Deployed SSRS reports to Report Manager and created linked reports, snapshots, and subscriptions for the reports and worked on the scheduling of the reports. 
● Integrated multiple sources of data (SQL Server, DB2) into the Hadoop cluster and analyzed data by Hive-HBase integration. 
● Developed entire frontend and backend modules using Python on Django Web Framework. 
● Designed the data aggregations on Hive for ETL processing on Amazon EMR to process data as per business requirements. 
● Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
● Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area. 
● Involved in Hive-HBase integration by creating hive external tables and specifying storage in HBase format. 
● Imported data from different relational data sources like Oracle and Teradata into HDFS using Sqoop.
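
A minimal Apache Airflow sketch of the kind of pipeline authoring described above (the DAG name, task names, and commands are hypothetical placeholders, assuming the Airflow 2.x BashOperator):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Daily pipeline: pull data with Sqoop, then run a Hive aggregation.
with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    sqoop_import = BashOperator(
        task_id="sqoop_import",
        bash_command=(
            "sqoop import --connect jdbc:mysql://db-host/sales "
            "--table orders --target-dir /data/raw/orders"
        ),
    )

    hive_aggregate = BashOperator(
        task_id="hive_aggregate",
        bash_command="hive -f /scripts/aggregate_orders.hql",
    )

    sqoop_import >> hive_aggregate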

Data Engineer

State Farm Insurance, Toronto September 2016 to August 2018


Responsibilities:
● Worked with the analysis teams and management teams and supported them based on their requirements. 
● Involved in extraction, transformation, and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL, SAS/macros. 
● Generated PL/SQL scripts for data manipulation, validation, and materialized views for remote instances. 
● Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions, and Triggers using SQL and PL/SQL. 
● Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base. 
● Developed live reports in a drill-down mode to facilitate usability and enhance user interaction.
● Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX. 
● Wrote Python scripts to parse XML documents and load the data into a database (see the sketch after this list).
● Used Python to extract weekly information from XML files. 
● Developed Python scripts to clean the raw data. 
● Worked on AWS CLI to aggregate clean files in Amazon S3 and on Amazon EC2 Clusters to deploy files into Buckets. 
● Used the AWS CLI with IAM roles to load data into the Redshift cluster.
● Responsible for in-depth data analysis and creation of data extract queries in both Netezza and Teradata databases.
● Extensive development on the Netezza platform using PL/SQL and advanced SQL.
● Validated regulatory financial data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell), and various reporting procedures. 
● Designed reports in SSRS to create, execute, and deliver tabular reports using a shared data source and specified data sources; also debugged and deployed reports in SSRS.
● Optimized the performance of queries by modifying T-SQL queries, establishing joins, and creating clustered indexes.
● Used Hive and Sqoop utilities and Oozie workflows for data extraction and data loading. 
● Development of routines to capture and report data quality issues and exceptional scenarios. 
● Creation of Data Mapping document and data flow diagrams. 
● Developed Linux shell scripts using the nzsql/nzload utilities to load data from flat files into the Netezza database.
● Involved in generating dual-axis bar charts, pie charts, and bubble charts with multiple measures, and data blending when merging different sources.
● Developed dashboards in Tableau Desktop and published them on Tableau Server, which allowed end users to understand the data on the fly using quick filters for on-demand information.
● Created dashboard-style reports using QlikView components like list boxes, sliders, buttons, charts, and bookmarks.
● Coordinated with data architects and data modelers to create new schemas and views in Netezza to improve report execution time; worked on creating optimized data mart reports.
● Worked on QA of the data and added data sources, snapshots, and caching to the reports.
● Involved in troubleshooting at database levels, error handling, and performance tuning of queries and procedures. 
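
A minimal sketch of the kind of XML-to-database loading mentioned above (element and column names are hypothetical, and SQLite stands in for the actual target database):

import sqlite3
import xml.etree.ElementTree as ET

# Parse a hypothetical XML file of claim records.
tree = ET.parse("claims.xml")
root = tree.getroot()

rows = [
    (claim.findtext("id"), claim.findtext("amount"), claim.findtext("status"))
    for claim in root.findall("claim")
]

# Load the parsed rows into a staging table (SQLite used for illustration).
conn = sqlite3.connect("staging.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS claims (id TEXT, amount TEXT, status TEXT)")
conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()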

Data Engineer
National Bank of Canada, Ottawa May 2016 to August 2016

Responsibilities:
● Analyzed the physical data model to understand the relationship between existing tables. Cleansed the unwanted tables and columns as per the requirements as part of the duty as a Data Analyst. 
● Established and maintained comprehensive data model documentation including detailed descriptions of business entities, attributes, and data relationships. 
● Worked on the Metadata Repository (MRM), maintaining the definitions and mapping rules up to the mark.
● Trained a couple of colleagues on the Spotfire tool and gave guidance on creating Spotfire visualizations.
● Created DDL scripts for implementing Data Modeling changes. 
● Developed a data mart for the base data in star and snowflake schemas; involved in developing the data warehouse for the database.
● Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required. 
● Extensively used ER Studio as the main modeling tool, along with Visio.
● Configured & developed the triggers, workflows, and validation rules & having hands-on the deployment process from one sandbox to another. 
● Managed Logical and Physical Data Models in ER Studio Repository based on the different subject area requests for an integrated model. 
● Created automatic field updates via workflows and triggers to satisfy internal compliance requirements of stamping certain data on a call during submission. 
● Developed enhancements to the MongoDB architecture to improve performance and scalability (see the sketch after this list).
● Performed forward engineering of data models, reverse engineering of existing data models, and updates to the data models.
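
A minimal sketch of the kind of MongoDB performance work mentioned above, assuming a hypothetical database, collection, and field names (the PyMongo index calls are standard; the example is illustrative only):

from pymongo import MongoClient, ASCENDING

# Connect to a local MongoDB instance (connection string is illustrative).
client = MongoClient("mongodb://localhost:27017")
db = client["analytics"]

# Compound index to speed up frequent customer/date range queries.
db.events.create_index(
    [("customer_id", ASCENDING), ("event_date", ASCENDING)],
    name="customer_date_idx",
)

# Inspect existing indexes to confirm the change.
for index in db.events.list_indexes():
    print(index["name"])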
