Data Science with Python
Syllabus for Data Science with Python
Module 1: SQL (PostgreSQL)
Data Analysis Using SQL:
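1. DDL (Data Definition Language)
2. DML (Data Manipulation Language)
3. DCL (Data Control Language)
4. DQL (Data Query Language)
5. Joins
6. Aggregate functions
7. Window functions
8. CTEs (common table expressions)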
Preparing data
Module 2: Python
1. Python Basics:
Introduction to Python and comparison with other programming languages
Installation of the Anaconda distribution and other Python IDEs
Python objects, numbers and Booleans, strings
Container objects, mutability of objects
Operators: arithmetic, bitwise, comparison, and assignment operators; operator precedence and associativity
Conditions (if-else, if-elif-else) and loops (while, for)
Break and continue statements and the range function
2. String Objects:
Basic data structures in Python
String object basics
Built-in string methods
Splitting and joining strings
String formatting functions
3. List Object Basics:
List methods
Lists as stacks and queues
List comprehensions
4. Tuples, Sets, Dictionaries & Functions:
Dictionary object methods
Dictionary comprehensions
Dictionary view objects
Function basics, parameter passing, iterators
Generator functions
Lambda functions
Map, reduce, filter functions
5. OOP Concepts:
Basic OOP concepts
Creating classes
Pillars of OOP
Class methods and static methods
Special (magic/dunder) methods
Property decorators: getters, setters, and deleters
6. Exception Handling and the Difference Between Exceptions and Errors:
Exception handling with try-except
Custom exception handling
List of commonly used exceptions
Exception handling best practices
7. Interacting with Databases Using Python
8. Pandas:
Pandas - Series
Pandas - DataFrame
Pandas - Panel (removed in pandas 0.25; superseded by MultiIndex DataFrames)
Pandas - basic functionality
Reading data from different file formats
Pandas - reindexing
Pandas - iteration
Pandas - sorting
Working with text data
Options and customization
Indexing and selecting data
Statistical functions
Pandas - window functions
Pandas - date functionality
Pandas - Timedelta
Pandas - categorical data
Pandas - visualization
Pandas - IO tools
9. Python NumPy
NumPy - ndarray object
NumPy - data types
NumPy - array attributes
NumPy - array creation routines
NumPy - arrays from existing data
NumPy - arrays from numerical ranges
NumPy - indexing and slicing
NumPy - advanced indexing
NumPy - broadcasting
NumPy - iterating over arrays
NumPy - array manipulation
NumPy - string functions
NumPy - mathematical functions
NumPy - arithmetic operations
NumPy - statistical functions
NumPy - sort, search, and counting functions
NumPy - byte swapping
NumPy - copies and views
NumPy - matrix library
NumPy - linear algebra
10. Visualization
Module 3: Maths
1. Statistics Basics
Introduction to basic statistical terms
Types of statistics
Types of data
Levels of measurement
Measures of central tendency
Measures of dispersion
Random variables
Covariance and correlation
2. Probability Distribution Functions
Probability density/distribution function
Types of probability distributions
Binomial distribution
Poisson distribution
Normal distribution (Gaussian distribution)
Probability density function and mass function
Cumulative distribution function
Examples of normal distribution
Bernoulli distribution
Uniform distribution
Z-statistics
Central limit theorem
3. Advanced Statistics
What is a hypothesis?
Mechanism of hypothesis testing
Student's t-distribution
T-statistics vs. z-statistics: overview
When to use a t-test vs. a z-test
Type I and Type II errors
Bayesian statistics (Bayes' theorem)
Confidence intervals (CI)
Confidence intervals and the margin of error
Interpreting confidence levels and confidence intervals
Chi-square test
Chi-square distribution using Python
Chi-square goodness-of-fit test
When to use which statistical distribution?
Analysis of variance (ANOVA)
Assumptions of ANOVA
Three types of ANOVA
Partitioning of variance in ANOVA
Calculating ANOVA using Python
F-test (variance ratio test)
Determining the values of F
F-distribution using Python
Module 4: Power BI
Power BI introduction and overview
Key Benefits of Power BI
Power BI Architecture
Power BI Process
Components of Power BI
Power BI - Building Blocks
Power BI vs other BI tools
Power BI Installation
Overview of Power BI Desktop
Data Sources in Power BI Desktop
Connecting to Data Sources
Query Editor in Power BI
Views in Power BI
Field Pane
Visual Pane
Custom Visual Option
Introduction to using Excel data in Power BI
Exploring live connections to data with Power BI
Connecting directly to SQL Azure, HDInsight Spark, SQL Server Analysis Services, and MySQL
Importing Power View and Power Pivot into Power BI
Power BI Publisher for Excel
Content packs
Introducing Power BI Mobile
Power Query Introduction
Query Editor Interface
Clean and Transform your data with Query Editor
Data Type
Column Transformations vs Adding Columns
Text Transformations
Cleaning irregularly formatted data: Transpose
Date and Time Calculations
Advanced Editor: Use Case
Query Level Parameters
Combining Data – Merging and Appending
Data Modelling
Calculated Columns
Measures/New Quick Measures
Calculated Tables
Optimizing Data Models
Row Context vs Filter Context
Cross Filter Direction
Manage Data Relationship
Why is DAX important?
Advanced calculations using Calculate functions
DAX queries
DAX Parameter Naming
Time Intelligence Functions
Types of visualizations in a Power BI report
Adding custom visualizations to a Power BI report
Matrices and tables
Getting started with color formatting and axis properties
Change how a chart is sorted in a Power BI report
Move, resize, and pop out a visualization in a Power BI report
Drill down in a visualization in Power BI
Module 5: Tableau
What is Tableau
Why Tableau
Tableau vs Power BI vs Qlik
Features of Tableau
Tableau Architecture
Tableau Installation
Tableau Product Line
Advantages of Tableau
How to use Tableau
Connecting to Data
Tableau Datatypes
Heat Maps
Column Chart
Horizontal Bar Chart
Stacked Column Chart
Stacked Bar Chart
Keep Only, Exclude
Keep Only, Exclude 2 (Normal)
Publish to Tableau Public
Funnel Chart
Advanced Funnel Chart
Dumbbell Chart
Donut Chart
Multiple Donut Chart
Module 6: Machine Learning
1. Introduction to Machine Learning:
Introduction to basic statistical terms
Types of statistics
Types of data
Levels of measurement
Measures of central tendency
Measures of dispersion
Random variables
Covariance and correlation
2. Feature Engineering
Handling missing data
Handling imbalanced data
Data interpolation
Handling outliers
Filter method
Wrapper method
Embedded methods
Feature scaling
Mean normalization
Min-max scaling
Unit vector scaling
Feature extraction
PCA (principal component analysis)
Data encoding
Nominal encoding
One hot encoding
One hot encoding with multiple categories
Mean encoding
Ordinal encoding
Label encoding
Target guided ordinal encoding
Correlation check
Pearson correlation coefficient
Spearman’s rank correlation
3. Feature Selection
Feature selection
Recursive feature elimination
Backward elimination
Forward selection
4. Regression
Linear regression
Gradient descent
Multiple linear regression
Polynomial regression
R-squared and adjusted R-squared
RMSE, MSE, and MAE comparison
Regularized linear models
Ridge regression
Lasso regression
Elastic net
5. Logistic Regression
Logistic regression: in-depth intuition
In-depth mathematical intuition
In-depth geometrical intuition
Hyperparameter tuning
GridSearchCV
RandomizedSearchCV
Data leakage
Confusion matrix
Precision, recall, F1 score, ROC, AUC
Best metric selection
Multiclass classification in logistic regression
6. Decision Tree
Decision tree classifier
In-depth mathematical intuition
In-depth geometrical intuition
Confusion matrix
Precision, recall, F1 score, ROC, AUC
Best metric selection
Decision tree regressor
In-depth mathematical intuition
In-depth geometrical intuition
Performance metrics
7. Naive Bayes
Bayes' theorem
Multinomial naïve Bayes
Gaussian naïve Bayes
Various types of naïve Bayes and their intuition
Confusion matrix
Precision, recall, F1 score, ROC, AUC
Best metric selection
8. KNN
KNN classifier
KNN regressor
Variants of KNN
Brute-force KNN
k-d tree (k-dimensional tree)
Ball tree
9. Clustering
Clustering and its types
K-means clustering
Batch k-means
Hierarchical clustering
Evaluation of clustering
Homogeneity, completeness, and V-measure
Silhouette coefficient
Davies-Bouldin index
Contingency matrix
Pair confusion matrix
Extrinsic measure
Intrinsic measure
10. Dimensionality Reduction
The curse of dimensionality
Dimensionality reduction technique
PCA (principal component analysis)
Scree plots
Eigen-decomposition approach
11. Ensemble Techniques and Their Types
Definition of ensemble techniques
Bagging technique
Bootstrap aggregation
Random forest (bagging technique)
Random forest regressor
Random forest classifier
12. Project
Module 7: Introduction to Big Data
What is big data?
Big data applications
Big data pipeline
Hadoop introduction
Spark introduction