Gravity
Corporate
Data Science with Python
Syllabus for Data Science with Python
Module 1: SQL (PostgreSQL)
-
Data analysis using SQL:
1. DDL
2. DML
3. DCL
4. DQL
5. Joins
6. Aggregate functions
7. Window functions
8. CTE (common table expressions)
-
Preparing data
Module 2: Python
1. Python Basics:
-
Introduction to Python and comparison with other programming languages
-
Installation of the Anaconda distribution and other Python IDEs
-
Python objects, numbers & Booleans, strings
-
Container objects, mutability of objects
-
Operators - arithmetic, bitwise, comparison and assignment operators, operator precedence and associativity
-
Conditions (if-else, if-elif-else), loops (while, for)
-
Break and continue statements and the range function
2. String Objects:
-
Basic data structures in Python
-
String object basics
-
Built-in string methods
-
Splitting and joining strings
-
String format functions
3. List Object Basics:
-
List methods
-
List as stack and queues
-
List comprehensions
4. Tuples, Sets, Dictionaries & Functions:
-
Dictionary object methods
-
Dictionary comprehensions
-
Dictionary view objects
-
Functions basics, parameter passing, iterators
-
Generator functions
-
Lambda functions
-
Map, reduce, filter functions
5. OOP Concepts:
-
Basic OOP concepts
-
Creating classes
-
Pillars of OOP
-
Inheritance
-
Polymorphism
-
Encapsulation
-
Abstraction
-
Decorators
-
Class methods and static methods
-
Special (magic/dunder) methods
-
Property decorators - getters, setters, and deleters
6. Exception Handling and the Difference between Exceptions and Errors:
-
Exception handling with try-except
-
Custom exception handling
-
List of commonly used exceptions
-
Best practices for exception handling
7. Interacting with databases using Python
8. Pandas:
-
Python pandas - series
-
Python pandas – DataFrame
-
Python pandas – panel
-
Python pandas - basic functionality
-
Reading data from different file system
-
Python pandas – reindexing
-
Pandas – iteration
-
Python pandas – sorting.
-
Working with text data
-
Options & customization
-
Indexing & selecting
-
Data statistical functions
-
Python pandas - window functions
-
Python pandas - date functionality
-
Python pandas – timedelta
-
Python pandas - categorical data
-
Python pandas – visualization
-
Python pandas – IO tools
9. Python NumPy
-
NumPy - ndarray object.
-
NumPy - data types.
-
NumPy - array attributes.
-
NumPy - array creation routines.
-
NumPy - array from existing data.
-
NumPy - array from numerical ranges.
-
NumPy - indexing & slicing.
-
NumPy - advanced indexing.
-
NumPy - broadcasting.
-
NumPy - iterating over arrays.
-
NumPy - array manipulation.
-
NumPy - string functions.
-
NumPy - mathematical functions.
-
NumPy - arithmetic operations.
-
NumPy - statistical functions.
-
NumPy - sort, search & counting functions.
-
NumPy - byte swapping.
-
NumPy - copies & views.
-
NumPy - matrix library.
-
NumPy - linear algebra
10. Visualization
-
Matplotlib
-
Seaborn
-
Cufflinks
-
Plotly
-
Bokeh
Module 3: Maths
1. Basic Statistics
-
Introduction to basic statistical terms
-
Types of statistics
-
Types of data
-
Levels of measurement
-
Measures of central tendency
-
Measures of dispersion
-
Random variables
-
Set
-
Skewness
-
Covariance and correlation
2. Probability Distribution Function
-
Probability density/distribution function
-
Types of probability distributions
-
Binomial distribution
-
Poisson distribution
-
Normal distribution (Gaussian distribution)
-
Probability density function and mass function
-
Cumulative distribution function
-
Examples of normal distribution
-
Bernoulli distribution
-
Uniform distribution
-
Z stats
-
Central limit theorem
-
Estimation
3. Advanced Statistics
-
Hypothesis
-
Mechanism of hypothesis testing
-
P-value
-
T-stats
-
Student t distribution
-
T-stats vs. Z-stats: overview
-
When to use t-tests vs. Z-tests
-
Type I & Type II errors
-
Bayesian statistics (Bayes' theorem)
-
Confidence interval (CI)
-
Confidence intervals and the margin of error
-
Interpreting confidence levels and confidence intervals
-
Chi-square test
-
Chi-square distribution using Python
-
Chi-square for goodness of fit test
-
When to use which statistical distribution?
-
Analysis of variance (ANOVA)
-
Assumptions for using ANOVA
-
Three types of ANOVA
-
Partitioning of variance in ANOVA
-
Calculating ANOVA using Python
-
F-distribution
-
F-test (variance ratio test)
-
Determining the values of F
-
F-distribution using Python
Module 4: Power BI
-
Power BI introduction and overview
-
Key Benefits of Power BI
-
Power BI Architecture
-
Power BI Process
-
Components of Power BI
-
Power BI - Building Blocks
-
Power BI vs other BI tools
-
Power BI Installation
-
Overview of Power BI Desktop
-
Data Sources in Power BI Desktop
-
Connecting to Data Sources
-
Query Editor in Power BI
-
Views in Power BI
-
Field Pane
-
Visual Pane
-
Custom Visual Option
-
Filters
-
Introduction to using Excel data in Power BI
-
Exploring live connections to data with Power BI
-
Connecting directly to SQL Azure, HD Spark, SQL Server Analysis Services / MySQL
-
Import Power View and Power Pivot to Power BI
-
Power BI Publisher for Excel
-
Content packs
-
Introducing Power BI Mobile
-
Power Query Introduction
-
Query Editor Interface
-
Clean and Transform your data with Query Editor
-
Data Type
-
Column Transformations vs Adding Columns
-
Text Transformations
-
Cleaning irregularly formatted data - Transpose
-
Date and Time Calculations
-
Advanced Editor: Use Case
-
Query Level Parameters
-
Combining Data – Merging and Appending
-
Data Modelling
-
Calculated Columns
-
Measures/New Quick Measures
-
Calculated Tables
-
Optimizing Data Models
-
Row Context vs Set Context
-
Cross Filter Direction
-
Manage Data Relationships
-
Why is DAX important?
-
Advanced calculations using Calculate functions
-
DAX queries
-
DAX Parameter Naming
-
Time Intelligence Functions
-
Types of visualization in a Power BI report
-
Adding custom visualizations to a Power BI report
-
Matrixes and tables
-
Getting started with color formatting and axis properties
-
Change how a chart is sorted in a Power BI report
-
Move, resize, and pop out a visualization in a Power BI report
-
Drill down in a visualization in Power BI
Module 5: Tableau
-
What is Tableau
-
Why Tableau
-
Tableau vs Power BI vs Qlik
-
Features of Tableau
-
Tableau Architecture
-
Tableau Installation
-
Tableau Product Line
-
Advantages of Tableau
-
How to use Tableau
-
Connecting to Data
-
Tableau Data Types
-
Heat Maps
-
Column Chart
-
Horizontal Bar Chart
-
Stacked Column Chart
-
Stacked Bar Chart
-
Keep Only, Exclude
-
Publish to Tableau Public
-
Funnel Chart
-
Advanced Funnel Chart
-
Calendar
-
Dumbbell Chart
-
Donut Chart
-
Multiple Donut Chart
Module 6: Machine Learning
1. Introduction to Machine Learning:
-
Introduction to basic statistical terms
-
Types of statistics
-
Types of data
-
Levels of measurement
-
Measures of central tendency
-
Measures of dispersion
-
Random variables
-
Set
-
Skewness
-
Covariance and correlation
2. Feature Engineering
-
Handling missing data
-
Handling imbalanced data
-
Up-sampling
-
Down-sampling
-
SMOTE
-
Data interpolation
-
Handling outliers
-
Filter method
-
Wrapper method
-
Embedded methods
-
Feature scaling
-
Standardization
-
Mean normalization
-
Min-max scaling
-
Unit vector
-
Feature extraction
-
PCA (principal component analysis)
-
Data encoding
-
Nominal encoding
-
One hot encoding
-
One hot encoding with multiple categories
-
Mean encoding
-
Ordinal encoding
-
Label encoding
-
Target guided ordinal encoding
-
Covariance
-
Correlation check
-
Pearson correlation coefficient
-
Spearman’s rank correlation
-
VIF (variance inflation factor)
3. Feature Selection
-
Feature selection
-
Recursive feature elimination
-
Backward elimination
-
Forward selection
4. Regression
-
Linear regression
-
Gradient descent
-
Multiple linear regression
-
Polynomial regression
-
R-squared and adjusted R-squared
-
RMSE, MSE, MAE comparison
-
Regularized linear models
-
Ridge regression
-
Lasso regression
-
Elastic net
5. Logistic Regression
-
Logistic regression in-depth intuition
-
In-depth mathematical intuition
-
In-depth geometrical intuition
-
Hyperparameter tuning
-
Grid search CV
-
Randomized search CV
-
Data leakage
-
Confusion matrix
-
Precision, recall, F1 score, ROC, AUC
-
Best metric selection
-
Multiclass classification in logistic regression
6. Decision Tree
-
Decision tree classifier
-
In-depth mathematical intuition
-
In-depth geometrical intuition
-
Confusion matrix
-
Precision, recall, F1 score, ROC, AUC
-
Best metric selection
-
Decision tree regressor
-
In-depth mathematical intuition
-
In-depth geometrical intuition
-
Performance metrics
7. Naive Bayes
-
Bayes theorem
-
Multinomial naïve Bayes
-
Gaussian naïve Bayes
-
Various types of naïve Bayes and their intuition
-
Confusion matrix
-
Precision, recall, F1 score, ROC, AUC
-
Best metric selection
8. KNN
-
KNN classifier
-
KNN regressor
-
Variants of KNN
-
Brute-force KNN
-
K-d tree (k-dimensional tree)
-
Ball tree
9. Clustering
-
Clustering and their types
-
K-means clustering
-
K-means++
-
Batch k-means
-
Hierarchical clustering
-
DBSCAN
-
Evaluation of clustering
-
Homogeneity, completeness and v-measure
-
Silhouette coefficient
-
Davies-Bouldin index
-
Contingency matrix
-
Pair confusion matrix
-
Extrinsic measure
-
Intrinsic measure
10. Dimensionality Reduction
-
The curse of dimensionality
-
Dimensionality reduction technique
-
PCA (principal component analysis)
-
Scree plots
-
Eigen-decomposition approach
11. Ensemble Technique and its Types
-
Definition of ensemble techniques
-
Bagging technique
-
Bootstrap aggregation
-
Random forest (bagging technique)
-
Random forest regressor
-
Random forest classifier
12. Project
Module 7: Big Data Introduction
-
What is big data?
-
Big data applications
-
Big data pipeline
Hadoop
-
Hadoop introduction
Spark
-
Spark introduction