Vishal DataEngineer
[email protected]
+1 647 948 3739
Sr. Data Engineer
______________________________________________________________________________________
Professional Summary:
● Having 6+ years of experience as a Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
● Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
● Experience in Agile methodology; participated in sprints and daily Scrums to deliver software tasks on time and with good quality in coordination with onsite and offshore teams.
● Experience in the development of Big Data projects using Hadoop, YARN, Hive, Pig, Flume, and MapReduce open-source tools/technologies.
● Knowledge in tuning Spark applications to achieve optimal performance.
● Configuration Management, Continuous Integration (CI), Continuous Deployment (CD), Release Management, and Cloud Implementations.
● Experience in setting up enterprise infrastructure on Amazon Web Services (AWS) including EC2, S3, the AWS billing console, AMI, IAM, CloudFormation, VPC, CodeDeploy, SageMaker, Service Catalog, EMR, DynamoDB, CloudWatch, etc.
● Primarily involved in data migration using SQL, Azure SQL, Azure Data Factory, SSIS, and PowerShell.
● Experience with DB technologies like SQL, PostgreSQL, and MySQL, and NoSQL databases like MongoDB.
● Hands-on experience with scripting languages like PHP and Python, and with Python libraries (NumPy, SciPy, Matplotlib, Pandas, Seaborn, scikit-learn).
● Experience in developing Spark applications using Spark RDDs, SparkContext, Spark MLlib, Spark-SQL, and DataFrame APIs (see the PySpark sketch at the end of this summary).
● Experience in writing complex Pig scripts and Hive and Impala queries, and in importing data into Hadoop using Sqoop and exporting it back to the source systems.
● Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, and TOAD.
● Good experience in data modeling and data analysis; proficient in gathering business requirements and handling requirements management.
● Excellent experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
● Solid knowledge of Data Marts, OLAP, OLTP, and Dimensional Data Modeling with Ralph Kimball Methodology using Analysis Services.
● Extensive experience using ER modeling tools such as Erwin and ER/Studio, along with Teradata and MDM.
● Knowledge and experience in job workflow scheduling and coordination tools like Oozie and ZooKeeper.
● Knowledge in configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
● Extensive experience with HBase high availability, validated manually through failover tests.
● Proficient in importing and exporting data between HDFS and RDBMS using Sqoop.
● Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
● Good experience with real-time stream processing frameworks such as Kafka, Apache Storm, and Apache NiFi.
● Addressed complex POCs from the technical end according to business requirements.
● Experience in Data transformation, Data mapping from source to target database schemas, and Data Cleansing procedures.
● Good experience working with different ETL tools like SSIS and Informatica, and with reporting tools like SQL Server Reporting Services (SSRS) and Business Objects.
● Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
● Experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
● Experience in writing Storm topologies to accept events from a Kafka producer and emit them into Cassandra.
● Experience in working with different data sources like Flat files, XML files, and Databases.
● Strong experience with architecting highly performant databases using MySQL and Cassandra.
● Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive.
● Strong background in mathematics and very good analytical and problem-solving skills.
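A minimal, illustrative PySpark sketch of the kind of Spark-SQL/DataFrame work and partitioned Hive tables referenced in this summary; the app name, paths, columns, and table name are hypothetical placeholders, not details from any project listed here.

from pyspark.sql import SparkSession

# Hive-enabled session; the application name is a placeholder
spark = (SparkSession.builder
         .appName("daily_sales_rollup")
         .enableHiveSupport()
         .getOrCreate())

# Read raw source data (placeholder HDFS path) into a DataFrame
orders = spark.read.parquet("hdfs:///data/raw/orders")
orders.createOrReplaceTempView("orders")

# Spark SQL aggregation; the same result could be produced with the DataFrame API
daily_totals = spark.sql("""
    SELECT order_date, region, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date, region
""")

# Write a managed table partitioned by order_date so downstream Hive queries can prune partitions
(daily_totals.write
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.daily_sales"))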
Technical Skills:
Programming: PySpark, Python, Java, XML, JSON, Hive, C, C++
Databases: MySQL, MongoDB, SQL Server 2008/2012, Hive (DW), PostgreSQL
Big Data Ecosystems: Hadoop, MapReduce, YARN, HDFS, Hive, Spark, Kafka, Apache Airflow, NoSQL, MongoDB
AWS: EC2, S3, IAM, CloudFormation, VPC, SageMaker, Service Catalog, EMR, DynamoDB, CloudWatch, ECR, ECS.
Professional Experience
Sr. Data Engineer
National Bank of Canada - Ottawa Jan 2021 to Present
Responsibilities:
● Worked in a completely cloud-based environment with the AI/ML team to build applications.
● Demonstrated competency with the following AWS services: EC2, EBS, S3, RDS, VPC, Route 53, ELB, IAM, CloudFront, and CloudFormation templates, with the ability to make recommendations on new AWS services (see the boto3 sketch below).
Environment: Python, PySpark, Control-M, AWS technologies (S3, IAM, Service Catalog, etc.), Bamboo CI/CD
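A minimal boto3 sketch of the kind of S3 interaction implied above; the bucket name, object key, and local path are hypothetical placeholders.

import boto3

s3 = boto3.client("s3")

# Upload a processed file to S3 (placeholder local path, bucket, and key)
s3.upload_file("output/daily_report.parquet",
               "example-analytics-bucket",
               "reports/daily_report.parquet")

# List objects under the reports/ prefix to confirm the upload
response = s3.list_objects_v2(Bucket="example-analytics-bucket", Prefix="reports/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])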
Sr. Data Engineer
Metrolinx, Toronto September 2018 to December 2020
Responsibilities:
● Worked as a Sr. Big Data Engineer, providing technical expertise on Hadoop technologies as they related to the development of analytics.
● Prepared ETL technical mapping documents along with test cases for each mapping to support future development and maintain the Software Development Life Cycle (SDLC).
● Worked with Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics, data wrangling, and Excel data extracts.
● Worked with the MDM systems team on technical aspects and report generation.
● Developed Spark code by using Scala and Spark-SQL for faster processing and testing and performed complex HiveQL queries on Hive tables.
● Worked on importing and exporting data between relational database systems like DB2 and HDFS/Hive using Sqoop.
● Supported various reporting teams and gained experience with the data visualization tool Tableau.
● Involved in importing real-time data into Hadoop using Kafka and implemented daily Oozie jobs.
● Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines (see the Airflow sketch after this list).
● Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala (a PySpark equivalent is sketched after this list).
● Implemented Python data analysis using Pandas, Matplotlib, Seaborn, TensorFlow, and NumPy.
● Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
● Developed Pig scripts in areas where extensive coding needed to be reduced.
● Used Amazon EC2 command line interface along with Python to automate repetitive work.
● Worked on analyzing and examining customer behavioral data using MongoDB.
● Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
● Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
● Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
● Performed data analysis and data profiling using complex SQL on various source systems.
● Used a Python MVC framework to design and develop the application.
● Extensively used Pig for data cleansing through standalone and embedded Pig scripts.
● Analyzed the data that consumed the most resources and made changes to the back-end code using PL/SQL stored procedures and triggers.
● Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
● Developed consumer-based features and applications using Python and Django with test-driven development and pair programming.
● Translated business concepts into XML vocabularies by designing XML Schemas with UML.
● Developed Shell, Perl, and Python scripts to automate and provide Control flow to Pig scripts.
● Worked with medical claim data in the Oracle database for Inpatient/Outpatient data validation, trend, and comparative analysis.
● Designed the data marts using Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
● Implemented Data Validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
● Used Python-based GUI components for the front-end functionality such as selection criteria.
● Deployed SSRS reports to Report Manager and created linked reports, snapshots, and subscriptions for the reports and worked on the scheduling of the reports.
● Integrated multiple sources of data (SQL Server, DB2) into the Hadoop cluster and analyzed data by Hive-HBase integration.
● Developed entire frontend and backend modules using Python on Django Web Framework.
● Designed the data aggregations on Hive for ETL processing on Amazon EMR to process data as per business requirements.
● Developed Sqoop scripts to extract the data from MySQL and load it into HDFS.
● Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area.
● Involved in Hive-HBase integration by creating hive external tables and specifying storage in HBase format.
● Imported data from different relational data sources like Oracle and Teradata into HDFS using Sqoop.
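A minimal Apache Airflow sketch of the kind of pipeline authoring and scheduling described in the Airflow bullet above; the DAG id, schedule, and task bodies are illustrative placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")        # placeholder logic

def transform():
    print("clean and aggregate the extracted data")  # placeholder logic

def load():
    print("write the results to the warehouse")      # placeholder logic

with DAG(
    dag_id="daily_ingest_pipeline",   # placeholder DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load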
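The Spark Streaming bullet above describes a Scala job; a roughly equivalent PySpark Structured Streaming sketch is shown here, with placeholder broker, topic, and HDFS paths, and assuming the spark-sql-kafka connector is available on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

# Subscribe to a Kafka topic (placeholder broker and topic)
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "customer-events")
          .load())

# Kafka keys/values arrive as bytes; cast them to strings before persisting
parsed = events.select(col("key").cast("string"), col("value").cast("string"))

# Continuously append the stream to HDFS as Parquet (placeholder data and checkpoint paths)
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streaming/customer_events")
         .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
         .start())

query.awaitTermination()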
Data Engineer
National Bank of Canada - Ottawa May 2016 to August 2016
Responsibilities:
● Analyzed the physical data model to understand the relationships between existing tables, and cleansed unwanted tables and columns per the requirements as part of the Data Analyst duties.
● Established and maintained comprehensive data model documentation including detailed descriptions of business entities, attributes, and data relationships.
● Worked on the Metadata Repository (MRM) to keep definitions and mapping rules up to date.
● Trained a couple of colleagues on the Spotfire tool and gave guidance on creating Spotfire visualizations.
● Created DDL scripts for implementing Data Modeling changes.
● Developed a data mart for the base data in star and snowflake schemas and was involved in developing the data warehouse for the database.
● Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required.
● Extensively used ER/Studio as the main modeling tool, along with Visio.
● Configured & developed the triggers, workflows, and validation rules & having hands-on the deployment process from one sandbox to another.
● Managed Logical and Physical Data Models in ER Studio Repository based on the different subject area requests for an integrated model.
● Created automatic field updates via workflows and triggers to satisfy internal compliance requirements of stamping certain data on a call during submission.
● Developed enhancements to the MongoDB architecture to improve performance and scalability (see the PyMongo sketch after this list).
● Forward engineered data models, reverse engineered existing data models, and updated the data models as needed.
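A minimal PyMongo sketch of the kind of MongoDB performance work mentioned above; the connection string, database, collection, and field names are hypothetical placeholders.

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
claims = client["analytics"]["claims"]              # placeholder database/collection

# Compound index to support frequent lookups by member and service date
claims.create_index(
    [("member_id", ASCENDING), ("service_date", ASCENDING)],
    name="member_service_date_idx",
)

# Confirm that a representative query can use the new index
plan = claims.find(
    {"member_id": "M123", "service_date": {"$gte": "2016-01-01"}}
).explain()
print(plan["queryPlanner"]["winningPlan"])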