Data Science for Non-Data Scientists
Duration: 3 days
Description:
Most enterprises have a lot of data but are unable to fully use the knowledge it contains.
This course helps engineers and other college graduates unlock the opportunities in their data through hands-on experience with some of the most widely used data-processing tools.
We’ll start with fundamental math and statistics and move to contemporary tools and algorithms.
Objectives:
- Understand the value in your data
- Understand the fundamental (high-school-level) math behind machine learning and core data science
- Learn how to convert domain models into useful input models for machine learning
- Learn to use contemporary tools (e.g., Spark MLlib, TensorFlow, and various Python libraries)
- Learn some of the most common machine learning algorithms
Prerequisites:
Basic understanding of statistics and software engineering.
Audience:
Software engineers, data engineers, software architects, and technically minded managers
Outline:
Introduction
- What is data science?
- What is machine learning?
- What data is useful?
- A few case studies that illustrate the value of data
- Goals of this course
Introduction to Python
- Python fundamentals
- Introduction to NumPy
- Data manipulation using Pandas
- Visualization with Matplotlib
- First example of machine learning in Python
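A first machine learning example in Python might look something like the minimal sketch below: a k-nearest-neighbors classifier written in plain Python so that it runs without third-party packages. The choice of k-nearest neighbors, the function name `knn_predict`, and the toy data are hypothetical illustrations, not material taken from the course.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    # Sort training examples by their distance to the query point
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count labels among the k closest examples and return the most common one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two loose groups of 2-D points
training_data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_predict(training_data, (1.2, 1.4)))  # A
print(knn_predict(training_data, (6.8, 6.1)))  # B
```

The same idea carries over to NumPy arrays and library implementations; the plain-Python version simply makes each step (measure distance, pick neighbors, vote) visible.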
Introduction to Computational Thinking
- Optimization problems
- Graph-theoretic models
- Stochastic thinking
- Random walks
- Monte Carlo simulation
- Confidence intervals
- Let’s talk statistics
- Confidence intervals
- Experimental data
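As an illustration of how Monte Carlo simulation and confidence intervals fit together, the minimal sketch below estimates pi by random sampling and reports an approximate 95% confidence interval using the normal approximation for a proportion. The function `estimate_pi`, the sample size, and the seed are hypothetical choices made for this example.

```python
import math
import random


def estimate_pi(num_samples, seed=None):
    """Monte Carlo estimate of pi with an approximate 95% confidence interval.

    Samples points uniformly in the unit square and counts how many fall
    inside the quarter circle x^2 + y^2 <= 1, whose area is pi/4.
    """
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(num_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    p_hat = inside / num_samples  # estimated fraction of points inside
    # Normal-approximation standard error of a proportion
    std_err = math.sqrt(p_hat * (1 - p_hat) / num_samples)
    estimate = 4 * p_hat
    half_width = 4 * 1.96 * std_err  # 1.96 is the two-sided 95% z-value
    return estimate, (estimate - half_width, estimate + half_width)


est, (low, high) = estimate_pi(100_000, seed=42)
print(f"pi ~ {est:.4f}, 95% CI [{low:.4f}, {high:.4f}]")
```

Increasing `num_samples` by a factor of four roughly halves the width of the interval, since the standard error shrinks with the square root of the sample size.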
Machine Learning
- What is machine learning, really?
- Classes of algorithms
- Clustering
- Classification
- Neural nets
- Common mistakes
- Best practices
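As a small illustration of the clustering topic, the sketch below implements basic k-means (Lloyd's algorithm) in plain Python. k-means is one common clustering algorithm, picked here only as an example; the function `k_means`, its parameters, and the toy points are hypothetical and not taken from the course.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Basic k-means (Lloyd's algorithm) on a list of 2-D points.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Because the initial centroids are drawn at random, which group ends up numbered 0 can change with the seed; the grouping itself is what matters.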