Data Science with Python

Syllabus for Data Science with Python

Module 1: SQL (PostgreSQL)
  • Data analysis using SQL:
    1. DDL
    2. DML
    3. DCL
    4. DQL
    5. Joins
    6. Aggregate functions
    7. Window functions
    8. CTEs (common table expressions)

  • Preparing data

Module 2: Python

1. Python Basic:

  • Introduction to Python and comparison with other programming languages

  • Installation of the Anaconda distribution and other Python IDEs

  • Python objects, numbers & Booleans, strings

  • Container objects, mutability of objects  

  • Operators - arithmetic, bitwise, comparison and assignment operators; operator precedence and associativity

  • Conditions (if-else, if-elif-else), loops (while, for)

  • Break and continue statement and range function

2. String Objects:

  • Basic data structures in Python

  • String object basics

  • String inbuilt methods

  • Splitting and joining strings

  • String format functions

3. List Object Basics:

  • List methods

  • Lists as stacks and queues

  • List comprehensions

4. Tuples, Sets, Dictionaries & Functions:

  • Dictionary object methods

  • Dictionary comprehensions 

  • Dictionary view objects

  • Functions basics, parameter passing, iterators

  • Generator functions

  • Lambda functions

  • Map, reduce, filter functions

5. OOPs Concepts:

  • OOP basic concepts

  • Creating classes

  • Pillars of OOP

  • Inheritance

  • Polymorphism

  • Encapsulation

  • Abstraction

  • Decorators

  • Class methods and static methods

  • Special (magic/dunder) methods

  • Property decorators - getters, setters, and deleters

6. Exception Handling and Difference between Exception and Error:

  • Exceptions handling with try-except

  • Custom exception handling

  • Commonly used built-in exceptions

  • Exception handling best practices

7. Interacting with databases using Python

8. Pandas:

  • Python pandas - series

  • Python pandas – DataFrame

  • Python pandas – panel (removed from pandas in version 0.25)

  • Python pandas - basic functionality

  • Reading data from different file systems

  • Python pandas – reindexing

  • Pandas – iteration

  • Python pandas – sorting

  • Working with text data

  • Options & customization

  • Indexing & selecting  

  • Data statistical functions

  • Python pandas - window functions 

  • Python pandas - date functionality 

  • Python pandas – timedelta

  • Python pandas - categorical data

  • Python pandas – visualization

  • Python pandas – IO tools

9. Python Numpy

  • Numpy - ndarray object.

  • Numpy - data types.

  • Numpy - array attributes. 

  • Numpy - array creation routines. 

  • Numpy - array from existing data.

  • Numpy - array from numerical ranges.

  • Numpy - indexing & slicing. 

  • Numpy – advanced indexing.

  • Numpy – broadcasting. 

  • Numpy - iterating over array.

  • Numpy - array manipulation.

  • Numpy - string functions.

  • Numpy - mathematical functions.

  • Numpy - arithmetic operations.

  • Numpy - statistical functions.

  • Numpy - sort, search & counting functions.

  • Numpy - byte swapping. 

  • Numpy - copies & views.

  • Numpy - matrix library.

  • Numpy - linear algebra

10. Visualization

  • Matplotlib 

  • Seaborn  

  • Cufflinks  

  • Plotly  

  • Bokeh

Module 3: Maths

1. Statistics Basic

  • Introduction to basic statistics terms 

  • Types of statistics 

  • Types of data 

  • Levels of measurement 

  • Measures of central tendency 

  • Measures of dispersion 

  • Random variables 

  • Set 

  • Skewness 

  • Covariance and correlation

2. Probability Distribution Function

  • Probability density/distribution function

  • Types of the probability distribution 

  • Binomial distribution 

  • Poisson distribution 

  • Normal distribution (Gaussian distribution)

  • Probability density function and mass function 

  • Cumulative distribution function

  • Examples of normal distribution 

  • Bernoulli distribution 

  • Uniform distribution 

  • Z-stats

  • Central limit theorem

  • Estimation

3. Advanced Statistics

  • Hypothesis

  • Mechanism of hypothesis testing

  • P-value

  • T-stats

  • Student t distribution

  • T-stats vs. Z-stats: overview

  • When to use a t-test vs. a Z-test

  • Type 1 & type 2 errors

  • Bayesian statistics (Bayes' theorem)

  • Confidence interval (CI)

  • Confidence intervals and the margin of error

  • Interpreting confidence levels and confidence intervals

  • Chi-square test

  •  Chi-square distribution using python

  •  Chi-square for goodness of fit test

  • When to use which statistical distribution?  

  • Analysis of variance (ANOVA)

  • Assumptions for using ANOVA

  • Three types of ANOVA

  • Partitioning of variance in ANOVA

  • Calculating ANOVA using Python

  • F-distribution

  • F-test (variance ratio test)

  • Determining the values of F

  • F-distribution using Python

Module 4: Power BI
  • Power BI introduction and overview

  • Key Benefits of Power BI

  • Power BI Architecture

  • Power BI Process

  • Components of Power BI

  • Power BI - Building Blocks

  • Power BI vs other BI tools

  • Power BI Installation

  • Overview of Power BI Desktop

  • Data Sources in Power BI Desktop

  • Connecting to a Data Source

  • Query Editor in Power BI

  • Views in Power BI

  • Field Pane

  • Visual Pane

  • Custom Visual Option

  • Filters

  • Introduction to using Excel data in Power BI

  • Exploring live connections to data with Power BI

  • Connecting directly to SQL Azure, HDInsight Spark, SQL Server Analysis Services / MySQL

  • Import Power View and Power Pivot to Power BI

  • Power BI Publisher for Excel

  • Content packs

  • Introducing Power BI Mobile

  • Power Query Introduction

  • Query Editor Interface

  • Clean and Transform your data with Query Editor

  • Data Type

  • Column Transformations vs Adding Columns

  • Text Transformations

  • Cleaning irregularly formatted data - Transpose

  • Date and Time Calculations

  • Advanced Editor: Use Case

  • Query Level Parameters

  • Combining Data – Merging and Appending

  • Data Modelling

  • Calculated Columns

  •  Measures/New Quick Measures

  •  Calculated Tables

  •  Optimizing Data Models

  •  Row Context vs Set Context

  •  Cross Filter Direction

  •  Manage Data Relationship

  •  Why is DAX important?

  •  Advanced calculations using Calculate functions

  •  DAX queries

  •  DAX Parameter Naming

  •  Time Intelligence Functions

  •  Types of visualization in a Power BI report

  •  Custom visualization to a Power BI report

  •  Matrixes and tables

  •  Getting started with color formatting and axis properties

  •  Change how a chart is sorted in a Power BI report

  •  Move, resize, and pop out a visualization in a Power BI report

  •  Drill down in a visualization in Power BI

Module 5: Tableau
  • What is Tableau      

  • Why Tableau

  • Tableau vs Power BI vs Qlik

  • Features of Tableau

  • Tableau Architecture

  • Tableau Installation

  • Tableau Product Line

  • Advantages of Tableau

  • How to use Tableau

  • Connecting to Data

  • Tableau Datatypes

  • Heat Maps

  • Column Chart

  • Horizontal Bar Chart

  • Stacked Column Chart

  • Stacked Bar Chart

  • Keep Only, Exclude

  • Publish to Tableau Public

  • Funnel Chart

  • Advanced Funnel Chart

  • Calendar

  • Dumbbell Chart

  • Donut Chart

  • Multiple Donut Chart

Module 6: Machine Learning

1. Introduction to Machine Learning:

2. Feature Engineering

  • Handling missing data

  • Handling imbalanced data

  • Up-sampling

  • Down-sampling

  • SMOTE

  • Data interpolation

  • Handling outliers

  • Filter method

  • Wrapper method

  • Embedded methods

  • Feature scaling

  • Standardization

  • Mean normalization

  • Min-max scaling

  • Unit vector

  • Feature extraction

  • PCA (principal component analysis)

  • Data encoding

  • Nominal encoding

  • One hot encoding

  • One hot encoding with multiple categories

  • Mean encoding

  • Ordinal encoding

  • Label encoding

  • Target guided ordinal encoding

  • Covariance

  • Correlation check

  • Pearson correlation coefficient

  • Spearman’s rank correlation

  • VIF (variance inflation factor)

3. Feature Selection

  • Feature selection

  • Recursive feature elimination

  • Backward elimination

  • Forward selection

4. Regression

  • Linear regression

  • Gradient descent

  • Multiple linear regression

  • Polynomial regression

  • R-squared and adjusted R-squared

  • RMSE, MSE, MAE comparison

  • Regularized linear models

  • Ridge regression

  • Lasso regression

  • Elastic net

5. Logistic Regression

  • Logistic regression in-depth intuition

  • In-depth mathematical intuition

  • In-depth geometrical intuition

  • Hyperparameter tuning

  • Grid search CV

  • Randomized search CV

  • Data leakage

  • Confusion matrix

  • Precision, recall, F1 score, ROC, AUC

  • Best metric selection

  • Multiclass classification in logistic regression

6. Decision Tree

  • Decision tree classifier

  • In-depth mathematical intuition

  • In-depth geometrical intuition

  • Confusion matrix

  • Precision, recall, F1 score, ROC, AUC

  • Best metric selection

  • Decision tree regressor

  • In-depth mathematical intuition

  • In-depth geometrical intuition

  • Performance metrics

7. Naive Bayes

  • Bayes theorem

  • Multinomial naïve Bayes

  • Gaussian naïve Bayes

  • Various types of naïve Bayes and their intuition

  • Confusion matrix

  • Precision, recall, F1 score, ROC, AUC

  • Best metric selection

8. KNN

  • KNN classifier

  • KNN regressor

  • Variants of KNN

  • Brute-force KNN

  • K-d tree (k-dimensional tree)

  • Ball tree

9. Clustering

  • Clustering and their types

  • K-means clustering

  • K-means++

  • Batch k-means

  • Hierarchical clustering

  • DBSCAN

  • Evaluation of clustering

  • Homogeneity, completeness and v-measure

  • Silhouette coefficient

  • Davies-Bouldin index

  • Contingency matrix

  • Pair confusion matrix

  • Extrinsic measure

  • Intrinsic measure

10. Dimensionality Reduction

  • The curse of dimensionality

  • Dimensionality reduction technique

  • PCA (principal component analysis)

  • Scree plots

  • Eigen-decomposition approach

11. Ensemble Technique and its Types

  • Definition of ensemble techniques

  • Bagging technique

  • Bootstrap aggregation

  • Random forest (bagging technique)

  • Random forest regressor

  • Random forest classifier

12. Project

Module 7: Big Data Introduction
  • What is big data?

  • Big data application

  • Big data pipeline

Hadoop

  • Hadoop introduction

Spark

  • Spark introduction
