📊 data-science-learning

A list of everything I've completed so far while teaching myself Machine Learning and Data Science.

🔥 Projects

  • Setting up a café in Ho Chi Minh City — finding the best place to set up a new business — article — source.
  • Titanic: Machine Learning from Disaster (from Kaggle) — predicting which passengers survived the Titanic shipwreck — source.

I also do some mini-projects to better understand the concepts. The HTML files (exported from the corresponding Jupyter Notebook files) and "Open in Colab" files for the mini-projects below are available here.

🎲 Tasks

  • Anomaly Detection — my note
  • Data Aggregation — my note
  • Data Overview — my note
  • Data Visualization
  • Model Evaluation
  • Preprocessing (texts, images, dates & times, structured data) — my note
  • Testing — my note
  • Web Scraping

Programming Languages

  • GraphQL — an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data.
  • Python — an interpreted, high-level, general-purpose programming language — my note.
  • R — a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.
  • Scala — a general-purpose programming language providing support for functional programming and a strong static type system.
  • SQL — a domain-specific language used in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system.

⚙️ Frameworks & Platforms

  • Apache Airflow — my note
  • Docker — a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers — my note
  • Google Colab — a free cloud service, based on Jupyter Notebooks for machine-learning education and research — my note.
  • Google Kubernetes
  • Hadoop — a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.
  • Kaggle — an online community of data scientists and machine learners, owned by Google.
  • PostgreSQL (Postgres) — a free and open-source relational database management system emphasizing extensibility and technical standards compliance.
  • Spark — an open-source distributed general-purpose cluster-computing framework.

⚒️ Tools

  • Bash — my note
  • Git — a distributed version-control system for tracking changes in source code during software development — my note.
  • Markdown — a lightweight markup language with plain text formatting syntax — my note.
  • Jupyter Notebook — an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text — my note.
  • Trello — a web-based Kanban-style list-making application.

📚 Libraries & Frameworks

A "ticked" library doesn't mean I know it inside out, only that I can use it comfortably with the help of its documentation!

  • D3.js — a JavaScript library for producing dynamic, interactive data visualizations in web browsers.
  • Keras — an open-source neural-network library written in Python.
  • Matplotlib — a plotting library for the Python programming language and its numerical mathematics extension NumPy. — my note
  • NumPy — a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. — my note
  • OpenCV — a library of programming functions mainly aimed at real-time computer vision.
  • Pandas — a software library written for the Python programming language for data manipulation and analysis. -- my note
  • Plotly -- the front-end for ML and data science models.
  • PyTorch -- my note
  • Seaborn — a Python data visualization library based on matplotlib.
  • Scikit-learn — a free software machine learning library for the Python programming language.
  • TensorFlow — a free and open-source software library for dataflow and differentiable programming across a range of tasks.

👨 Courses

Courses that are not yet checked are still in progress!

📖 Books

Books that are not yet checked are still in progress!

🤖 GitHub repositories

Other resources


The descriptions of terms on this site are borrowed from Wikipedia.

About

📊 All the courses, assignments, exercises, mini-projects and books I've completed so far while teaching myself Machine Learning and Data Science.
