Focused crawls are collections of frequently updated web crawl data from narrow (as opposed to broad or wide) crawls, often focused on a single domain or subdomain.
Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package, and its key data structure is the DataFrame. DataFrames let you store and manipulate tabular data in rows of observations and columns of variables.
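As a minimal sketch of the rows-of-observations, columns-of-variables idea, the snippet below builds a small DataFrame and manipulates it. The column names and values are made up for illustration and do not come from any of the repositories listed here.

```python
import pandas as pd

# Hypothetical data: each row is one observation, each column one variable.
df = pd.DataFrame(
    {
        "city": ["Brisbane", "Cairns", "Brisbane"],
        "rides": [12, 7, 5],
        "fare": [20.5, 13.0, 8.25],
    }
)

# Derive a new variable (column) from existing ones.
df["fare_per_ride"] = df["fare"] / df["rides"]

# Aggregate observations (rows) by the value of one variable.
rides_by_city = df.groupby("city")["rides"].sum()
print(rides_by_city)
```

Column-wise arithmetic and `groupby` aggregation are two of the most common DataFrame operations; both act on whole columns without an explicit loop.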
This repository contains a small demo of danfo.js, an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.
Analysis of the government's Paycheck Protection Program, introduced in response to the Covid crisis in 2020. This repository focuses on merging, cleaning, normalizing and exploring the data using Python, pandas and PySpark.
Employee exit survey results from two institutes in Queensland, Australia, the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute, were analyzed to discover whether there were notable relationships between employee dissatisfaction rates and factors such as age and how long employees had been working at their respective institutes.
This project examines factors affecting the change in car sales between 2019 and 2020 using real-world data, Python, pandas, Matplotlib and Jupyter Notebook.
Uses Python and SQLAlchemy to do basic climate analysis and data exploration of a climate database. After the initial analysis, a Flask API is designed based on the queries developed during the analysis.
This project is part of the Advanced Data Analysis Nanodegree with Udacity. It involves performing an exploratory data analysis using Python, then creating a presentation with explanatory plots.
Analysis of Pymaceuticals's recent animal study screening for potential anti-cancer treatments. Provided a statistical analysis of the drug regimens and their effectiveness in reducing tumor volume.
Loaded the iris dataset in Python using a pandas DataFrame. Performed a PCA using scikit-learn's decomposition module. Plotted the principal components to recreate the scatterplot for each flower type.
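The workflow described above might look roughly like the following sketch. It assumes scikit-learn's bundled copy of the iris dataset and two principal components; the repository's actual data source, component count, and variable names are not stated, so everything named here is illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load iris as pandas objects (assumption: scikit-learn's bundled copy).
iris = load_iris(as_frame=True)
features = iris.data  # 150 rows x 4 measurement columns

# Map the integer class labels to flower-type names.
species = iris.target.map(dict(enumerate(iris.target_names)))

# Project the four measurements onto two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(features)

# Collect the result in a DataFrame, ready for a per-species scatterplot.
pcs = pd.DataFrame(components, columns=["PC1", "PC2"])
pcs["species"] = species

print(pcs.head())
print(pca.explained_variance_ratio_)
```

From here, a scatterplot of `PC1` against `PC2` with one color per value of `species` would recreate the per-flower-type plot the description mentions.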
This ride-sharing analysis uses 2018 data from both Lyft and Uber. The data includes rides with city and active driver, and historic rides with city, driver count, individual fares and city type.