A comprehensive collection of Machine Learning materials, tutorials, and practical implementations designed for learning and mastering ML fundamentals and advanced techniques.

HariomSinghalPuri/Machine_Learning_Materials

🤖 ML_Materials

⚠️ Note: This project is currently under active development. New modules and content are being added regularly.

🎯 Overview

This repository is a structured learning resource for Machine Learning enthusiasts, covering everything from Python basics to advanced ML implementations. The materials are organized into modules that build on each other, forming a step-by-step learning path.

What you'll learn:

  • Python programming fundamentals for ML
  • Data manipulation with NumPy and Pandas
  • Data visualization with Matplotlib and Seaborn
  • Data preprocessing techniques
  • Feature engineering and text processing
  • Real-world ML use cases and implementations

📁 Repository Structure

ML_materials/
├── README.md
├── requirements.txt
├── CSV.ipynb
├── JSON.ipynb
├── Learn_Python.ipynb
├── Rev_Arrays.ipynb
├── Rev_Pandas.ipynb
│
├── Module_1_Fundamentals/                    # ✅ Complete
│   ├── README.md
│   ├── 1_Learn_Python.ipynb
│   ├── 2_Numpy_ML.ipynb
│   ├── 3_Matplotlib_ML.ipynb
│   ├── 5_Pandas_Series_ML.ipynb
│   ├── 6_Pandas_DataFrame_ML.ipynb
│   ├── 7_Seaborn_ML.ipynb
│   └── datasets/
│       ├── batsman_runs_ipl.csv
│       ├── bollywood.csv
│       ├── data.csv
│       ├── data_for_Histograms.csv
│       ├── data_for_LinePlot.csv
│       ├── data_for_ScatterPlot.csv
│       ├── data_for_Timeseries.csv
│       ├── data_subplots.csv
│       ├── diabetes.csv
│       ├── fig1.png
│       ├── fig2.png
│       ├── ipl-matches.csv
│       ├── kohli_ipl.csv
│       ├── movies.csv
│       ├── Part_of_CSV_01.csv
│       ├── Part_of_CSV_01_with_no_index.csv
│       └── subs.csv
│
├── Module_2_Preprocessing/                   # ✅ Complete
│   ├── 1_Importing_Datasets_through_Kaggle_API.ipynb
│   ├── 2_Handling_Missing_Values.ipynb
│   ├── 3_Data_Standardization.ipynb
│   ├── 4_Label_Encoding.ipynb
│   ├── 5_Train_Test_Split.ipynb
│   ├── 6_Handling_imbalanced_Dataset.ipynb
│   ├── 7_Feature_extraction_of_Text_data_using_Tf_idf_Vectorizer.ipynb
│   ├── 8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb
│   ├── 9_Text_Data_Pre_Processing_Use_Case.ipynb
│   ├── ML_Use_Case_1_Rock_vs_Mine_Prediction.ipynb
│   ├── ML_Use_Case_2_Diabetes_Prediction.ipynb
│   ├── ML_Use_Case_3_Spam_Mail_Prediction_using_Machine_Learning.ipynb
│   └── Dataset_Links.txt
│
└── Module_3_Mathematical_Foundations/        # 🚧 In Progress
    ├── README.md
    ├── 1_Linear_Algebra_Part_1.ipynb
    ├── 2_Linear_Algebra_Part_2.ipynb
    ├── 3_Calculus_Part_1.ipynb
    ├── 4_Calculus_Part_2.ipynb
    ├── 5_Calculus_Part_3.ipynb
    ├── 6_Probability.ipynb
    ├── 7_Statistics.ipynb
    └── ... (in progress)

📚 Module 1: ML Fundamentals

Status: ✅ Complete
Focus: Building strong foundations in Python and data analysis libraries

🐍 Core Learning Materials

| Notebook | Status | Description | Key Topics |
|---|---|---|---|
| 1_Learn_Python.ipynb | ✅ | Python programming essentials | Syntax, data structures, control flow |
| 2_Numpy_ML.ipynb | ✅ | NumPy for numerical computing | Arrays, vectorization, mathematical operations |
| 3_Matplotlib_ML.ipynb | ✅ | Data visualization basics | Plots, charts, customization |
| 5_Pandas_Series_ML.ipynb | ✅ | Working with Pandas Series | Data manipulation, indexing |
| 6_Pandas_DataFrame_ML.ipynb | ✅ | DataFrame operations | Data analysis, filtering, grouping |
| 7_Seaborn_ML.ipynb | ✅ | Advanced statistical visualizations | Statistical plots, styling |

📊 Practice Datasets

Real-world datasets for hands-on practice:

  • Sports Analytics: batsman_runs_ipl.csv, kohli_ipl.csv, ipl-matches.csv
  • Entertainment: bollywood.csv, movies.csv
  • Healthcare: diabetes.csv
  • Visualization Datasets: Various CSV files for different plot types
  • Sample Images: fig1.png, fig2.png for image processing examples

🔧 Module 2: Data Preprocessing & ML Use Cases

Status: ✅ Complete
Focus: Advanced preprocessing techniques and practical ML implementations

🛠️ Data Preprocessing Techniques

| Notebook | Status | Technique | Application |
|---|---|---|---|
| 1_Importing_Datasets_through_Kaggle_API.ipynb | ✅ | Data acquisition | Kaggle API integration |
| 2_Handling_Missing_Values.ipynb | ✅ | Data cleaning | Imputation strategies |
| 3_Data_Standardization.ipynb | ✅ | Feature scaling | Normalization, standardization |
| 4_Label_Encoding.ipynb | ✅ | Categorical encoding | One-hot, label encoding |
| 5_Train_Test_Split.ipynb | ✅ | Data splitting | Validation strategies |
| 6_Handling_imbalanced_Dataset.ipynb | ✅ | Class balancing | SMOTE, undersampling |
| 7_Feature_extraction_of_Text_data_using_Tf_idf_Vectorizer.ipynb | ✅ | Text processing | TF-IDF, feature extraction |
| 8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb | ✅ | End-to-end pipeline | Complete numerical data workflow |
| 9_Text_Data_Pre_Processing_Use_Case.ipynb | ✅ | Text preprocessing pipeline | Complete text data workflow |
| Dataset_Links.txt | ✅ | Resource management | Dataset source references |
| ML_Use_Case_1_Rock_vs_Mine_Prediction.ipynb | ✅ | Binary classification | Sonar object detection |
| ML_Use_Case_2_Diabetes_Prediction.ipynb | ✅ | Medical prediction | Healthcare classification |
| ML_Use_Case_3_Spam_Mail_Prediction_using_Machine_Learning.ipynb | ✅ | Text classification | Email filtering system |
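The preprocessing steps in the table above are usually chained together. As a minimal NumPy-only sketch (not the notebooks' code, which may rely on scikit-learn's `SimpleImputer`, `StandardScaler`, and `train_test_split` instead), the hypothetical helper `split_impute_standardize` below shuffles and splits the rows, fills missing values with training-column means, and standardizes with training statistics only, so nothing from the test rows leaks into preprocessing. The toy matrix is invented for illustration.

```python
import numpy as np

def split_impute_standardize(X, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle/split rows, mean-impute NaNs, then z-score.

    All statistics come from the training rows and are reused on the test rows.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train, X_test = X[train_idx].copy(), X[test_idx].copy()

    # Mean imputation: replace each NaN with its column's training mean.
    col_means = np.nanmean(X_train, axis=0)
    for X_part in (X_train, X_test):
        rows, cols = np.where(np.isnan(X_part))
        X_part[rows, cols] = col_means[cols]

    # Standardization: z = (x - mean) / std, using training statistics only.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    return (X_train - mu) / sigma, (X_test - mu) / sigma

# Toy data with two missing values (illustrative only)
X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 180.0], [np.nan, 220.0],
              [5.0, 210.0], [6.0, 190.0], [7.0, 205.0], [8.0, 195.0]])
X_train_std, X_test_std = split_impute_standardize(X)
```

Fitting the statistics on the training rows and only applying them to the test rows mirrors the fit-on-train, transform-on-test pattern that scikit-learn transformers follow.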

📝 Comprehensive Preprocessing Workflows

| Workflow | Status | Focus | Application |
|---|---|---|---|
| 8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb | ✅ | Complete numerical pipeline | Feature selection, scaling, outlier handling |
| 9_Text_Data_Pre_Processing_Use_Case.ipynb | ✅ | End-to-end text processing | Tokenization, cleaning, vectorization |
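The text workflow ends with TF-IDF vectorization. The sketch below is a from-scratch illustration, assuming the smoothed formula that scikit-learn's `TfidfVectorizer` applies by default: idf(t) = ln((1 + n) / (1 + df(t))) + 1, raw term counts, and L2-normalized rows. Unlike `TfidfVectorizer`, it tokenizes by lowercasing and splitting on whitespace, so results on real text may differ; the function name `tfidf` and the three documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF with L2-normalized rows (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])  # unit-length row vector
    return vocab, rows

vocab, X = tfidf(["free prize now", "meeting at noon", "free meeting now"])
print(vocab)  # ['at', 'free', 'meeting', 'noon', 'now', 'prize']
```

Rare terms such as "prize" receive a larger idf weight than terms shared across documents such as "free", which is what makes TF-IDF useful for spam-style text classification.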

🎯 Real-World Use Cases

| Project | Status | Domain | Technique | Focus |
|---|---|---|---|---|
| 🪨 Rock vs Mine Prediction | ✅ | Defense/Marine | Logistic Regression | Sonar signal classification |
| 🩺 Diabetes Prediction | ✅ | Healthcare | Multiple algorithms | Medical diagnosis support |
| 📧 Spam Mail Detection | ✅ | Cybersecurity | NLP + Classification | Email security |

📚 Resource Files

| File | Status | Purpose | Content |
|---|---|---|---|
| Dataset_Links.txt | ✅ | Reference guide | Curated dataset sources and URLs |

🔍 Module 2 Learning Outcomes

By completing this module, you will:

  • Master essential data preprocessing techniques
  • Handle real-world data challenges (missing values, imbalanced datasets)
  • Implement feature engineering for both numerical and text data
  • Build complete ML pipelines from data acquisition to model evaluation
  • Apply ML to solve practical problems in healthcare, cybersecurity, and defense
  • Understand the importance of proper data splitting and validation
  • Work with external data sources through APIs

📈 Technical Skills Covered

Data Preprocessing:

  • Missing value imputation strategies
  • Feature scaling and standardization
  • Categorical variable encoding
  • Handling imbalanced datasets with SMOTE
  • Text preprocessing and TF-IDF vectorization
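SMOTE balances classes by synthesizing new minority samples instead of duplicating existing ones. The `SMOTE` class from the imbalanced-learn library is the usual way to apply it; the simplified NumPy sketch below (the function name `smote_like` and the toy points are hypothetical) only shows the core idea: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Simplified SMOTE (hypothetical helper): return n_new synthetic minority rows.

    Requires k < len(X_min). Each new row lies on the segment between a
    randomly chosen minority row and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority rows; exclude self-matches.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest per row

    base = rng.integers(0, len(X_min), size=n_new)          # anchor rows
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                            # position in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Toy minority-class points (illustrative only)
X_min = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [2.5, 2.5], [3.0, 2.0]])
synthetic = smote_like(X_min, n_new=10)
print(synthetic.shape)  # (10, 2)
```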

Machine Learning Applications:

  • Binary classification problems
  • Multi-class classification
  • Text classification and NLP
  • Medical prediction systems
  • Security applications

Best Practices:

  • Proper train-test splitting
  • Cross-validation techniques
  • Feature selection methods
  • Model evaluation metrics
  • End-to-end pipeline development
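Cross-validation repeats the train/validation split several times so that every sample is used for validation exactly once. A minimal sketch of unshuffled k-fold index generation follows; the helper name `k_fold_indices` is hypothetical, and scikit-learn's `KFold` provides the same fold sizing plus optional shuffling.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for plain, unshuffled k-fold CV.

    The first n_samples % k folds get one extra sample, so fold sizes
    differ by at most one.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        # Everything outside the current validation fold is used for training
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, k=3))
```

A model is trained on each `train_idx` and scored on the matching `val_idx`; the k scores are then averaged.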

🧮 Module 3: Mathematical Foundations for Machine Learning

Status: 🚧 In Progress
Focus: Essential mathematical concepts underlying machine learning algorithms

📐 Linear Algebra Fundamentals

| Notebook | Status | Focus Area | Key Concepts |
|---|---|---|---|
| 1_Linear_Algebra_Part_1.ipynb | ✅ | Core tensor operations | Scalars, vectors, matrices, tensor operations |
| 2_Linear_Algebra_Part_2.ipynb | ✅ | Advanced matrix operations | Eigendecomposition, SVD, PCA |

📊 Linear Algebra Part 1 - Core Concepts

Data Structures for Algebra:

  • Scalars (Rank 0 Tensors) in Python, PyTorch, TensorFlow
  • Vectors (Rank 1 Tensors) with NumPy operations
  • Vector norms (L1, L2, Max, Squared L2)
  • Matrices (Rank 2 Tensors) and higher-rank tensors
  • Orthogonal vectors and matrices

Common Tensor Operations:

  • Tensor transposition and arithmetic
  • Reduction operations and dot products
  • Solving linear systems
  • Matrix properties and operations
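The vector norms and linear-system solving listed above map directly onto NumPy calls. A short sketch with a made-up vector and a made-up 2×2 system:

```python
import numpy as np

x = np.array([3.0, -4.0])
l1 = np.linalg.norm(x, ord=1)              # |3| + |-4| = 7
l2 = np.linalg.norm(x)                     # sqrt(9 + 16) = 5 (default is L2)
max_norm = np.linalg.norm(x, ord=np.inf)   # max(|3|, |-4|) = 4
squared_l2 = x @ x                         # 9 + 16 = 25, i.e. x^T x

# Solve the linear system A w = b (2w1 + w2 = 5, w1 + 3w2 = 10)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
w = np.linalg.solve(A, b)                  # w = [1, 3]
```

`np.linalg.solve` is preferred over computing `np.linalg.inv(A) @ b`, since it avoids forming the inverse explicitly.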

🔍 Linear Algebra Part 2 - Advanced Operations

Eigendecomposition:

  • Affine transformations and matrix applications
  • Eigenvectors and eigenvalues in multiple dimensions
  • Matrix determinants and eigendecomposition
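Eigendecomposition can be checked numerically: each eigenpair satisfies A v = λ v, the matrix can be rebuilt as V diag(λ) V⁻¹, and the determinant equals the product of the eigenvalues. A small NumPy sketch with an arbitrary example matrix whose eigenvalues are 5 and 2:

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
eigvals, V = np.linalg.eig(A)  # columns of V are the eigenvectors

# Each eigenpair satisfies A v = lambda v
for lam, v in zip(eigvals, V.T):
    assert np.allclose(A @ v, lam * v)

# Reassemble A = V diag(lambda) V^-1
A_rebuilt = V @ np.diag(eigvals) @ np.linalg.inv(V)

# The determinant equals the product of the eigenvalues (5 * 2 = 10)
det_from_eigs = np.prod(eigvals)
```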

Matrix Operations for ML:

  • Singular Value Decomposition (SVD)
  • Image compression applications
  • Moore-Penrose pseudoinverse
  • Principal Component Analysis (PCA)
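SVD, the pseudoinverse, low-rank approximation (the idea behind SVD image compression), and PCA are closely linked. The sketch below uses synthetic correlated data generated for illustration: it builds the Moore-Penrose pseudoinverse from the SVD, forms a rank-k approximation by keeping the k largest singular values, and performs PCA as an SVD of the mean-centred data. It illustrates the standard relationships and is not taken from the notebook.

```python
import numpy as np

# Synthetic correlated data (illustrative): 100 samples, 3 features
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(100, 3)) @ mixing

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Moore-Penrose pseudoinverse X+ = V diag(1/s) U^T (valid here because all s > 0)
X_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Rank-k approximation: keep only the k largest singular values
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# PCA: SVD of the mean-centred data; rows of Vt_c are the principal directions
Xc = X - X.mean(axis=0)
_, s_c, Vt_c = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt_c[:k].T                          # data projected onto top-k PCs
explained_variance_ratio = s_c**2 / np.sum(s_c**2)
```

In image compression, X would be a pixel matrix and storing `U[:, :k]`, `s[:k]`, and `Vt[:k]` takes far less space than the full image when k is small.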

📈 Calculus for Machine Learning

| Notebook | Status | Focus Area | Key Concepts |
|---|---|---|---|
| 3_Calculus_Part_1.ipynb | ✅ | Limits & derivatives | Differentiation, automatic differentiation |
| 4_Calculus_Part_2.ipynb | ✅ | Advanced calculus | Partial derivatives, gradients, integrals |
| 5_Calculus_Part_3.ipynb | ✅ | Symbolic computation | SymPy library applications |

🔢 Calculus Part 1 - Fundamentals

Limits & Derivatives:

  • Calculus of infinitesimals
  • Computing derivatives through differentiation
  • Automatic differentiation with PyTorch and TensorFlow
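PyTorch (`torch.autograd`) and TensorFlow (`tf.GradientTape`) compute derivatives with reverse-mode automatic differentiation. As a dependency-free illustration of the same idea, the sketch below swaps in forward-mode differentiation with dual numbers: every value carries its derivative along, and the product and chain rules are applied operation by operation. The `Dual` class and `derivative` helper are hypothetical names, and only `+`, `*`, and `sin` are supported.

```python
import math

class Dual:
    """Number value + deriv*eps with eps**2 == 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def _lift(self, other):
        # Treat plain numbers as constants (derivative 0)
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def sin(x):
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

def derivative(f, x):
    """Evaluate f'(x) by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv

f = lambda x: 3 * x * x + 2 * x + sin(x)  # f'(x) = 6x + 2 + cos(x)
print(derivative(f, 0.0))  # 3.0
```

Forward mode needs one pass per input variable, while reverse mode (used by the deep learning frameworks) needs one pass per output, which is why reverse mode suits loss functions with many parameters.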

⚡ Calculus Part 2 - ML Applications

Gradients for Machine Learning:

  • Partial derivatives of multivariate functions
  • Gradients of cost functions w.r.t. model parameters
  • Practical examples with cylinder volume calculations
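For a cylinder with volume V = πr²h, the partial derivatives are ∂V/∂r = 2πrh and ∂V/∂h = πr². The sketch below compares these analytic partials with central finite differences, which approximate each partial by nudging one variable while holding the other fixed; the radius and height values are arbitrary.

```python
import math

def volume(r, h):
    """Cylinder volume V = pi * r**2 * h."""
    return math.pi * r**2 * h

def grad_analytic(r, h):
    # dV/dr = 2*pi*r*h, dV/dh = pi*r**2
    return 2 * math.pi * r * h, math.pi * r**2

def grad_numeric(r, h, eps=1e-6):
    # Central differences: vary one variable, hold the other fixed
    dv_dr = (volume(r + eps, h) - volume(r - eps, h)) / (2 * eps)
    dv_dh = (volume(r, h + eps) - volume(r, h - eps)) / (2 * eps)
    return dv_dr, dv_dh

print(grad_analytic(3.0, 5.0))  # (30*pi, 9*pi), approximately (94.248, 28.274)
```

The same "one partial per parameter" idea gives the gradient of a cost function with respect to model weights.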

Integrals:

  • Area under ROC curves
  • Integration applications in ML evaluation
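The area under the ROC curve can be approximated with the trapezoidal rule once the (FPR, TPR) points are known. A minimal pure-Python sketch follows; the name `roc_auc` is hypothetical (`sklearn.metrics.roc_auc_score` is the usual library call), and the four-sample example is illustrative.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via the trapezoidal rule.

    Assumes both classes are present. Thresholds sweep from the highest
    score down; tied scores are processed together as one ROC point.
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tpr, fpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample sharing this score before adding a ROC point
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    # Trapezoidal rule over consecutive (FPR, TPR) points
    return sum((fpr[j] - fpr[j - 1]) * (tpr[j] + tpr[j - 1]) / 2
               for j in range(1, len(fpr)))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```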

🔧 Calculus Part 3 - Symbolic Math

SymPy Applications:

  • Symbolic mathematical computations
  • Advanced calculus operations
  • Mathematical modeling tools

🎲 Probability & Statistics

| Notebook | Status | Focus Area | Key Concepts |
|---|---|---|---|
| 6_Probability.ipynb | ✅ | Probability & information theory | Distributions, entropy, information theory |
| 7_Statistics.ipynb | ✅ | Statistical analysis | Frequentist & Bayesian statistics |

🎯 Probability & Information Theory

Introduction to Probability:

  • Events, sample spaces, and probability combinations
  • Combinatorics and Law of Large Numbers
  • Expected value and measures of central tendency
  • Statistical measures: mean, median, mode, quantiles
  • Dispersion measures and correlation analysis

ML Distributions:

  • Uniform and Gaussian distributions, and the Central Limit Theorem
  • Log-normal, exponential, and Laplace distributions
  • Binomial, multinomial, and Poisson distributions
  • Mixture distributions and sampling techniques

Information Theory:

  • Shannon and differential entropy
  • Kullback-Leibler divergence
  • Cross-entropy applications
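Shannon entropy, cross-entropy, and KL divergence are related by H(p, q) = H(p) + D_KL(p ‖ q), which is why minimizing cross-entropy against fixed target labels also minimizes the KL divergence. A short sketch in bits (base-2 logarithms), with two example distributions chosen for illustration:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i log2 p_i, in bits (0 log 0 := 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p_i log2 q_i; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = sum p_i log2(p_i / q_i), always >= 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(entropy(p))  # 1.5
```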

📊 Statistical Analysis

Frequentist Statistics:

  • Central tendency and dispersion measures
  • Gaussian distribution and Central Limit Theorem
  • Statistical testing: z-scores, p-values, t-tests
  • ANOVA and correlation analysis
  • Multiple comparison corrections
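The z-score and the one-sample t statistic both follow from the sample mean and standard deviation. A standard-library sketch with invented measurements; it stops at the statistic, because the p-value requires the t-distribution CDF (`scipy.stats.ttest_1samp` returns both the statistic and the p-value).

```python
import math
import statistics

data = [2.1, 2.5, 1.9, 2.8, 2.4, 2.2, 2.6, 2.3]  # illustrative measurements

# z-scores: how many standard deviations each point lies from the mean
mu = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
z_scores = [(x - mu) / sd for x in data]

# One-sample t statistic for H0: population mean == 2.0
mu0 = 2.0
n = len(data)
t_stat = (mu - mu0) / (sd / math.sqrt(n))
df = n - 1  # degrees of freedom for the t distribution
```

Some texts compute z-scores with the population standard deviation (`statistics.pstdev`) instead; the choice matters mainly for small samples.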

Regression Analysis:

  • Linear least squares fitting
  • Ordinary least squares
  • Logistic regression fundamentals
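Ordinary least squares chooses the coefficients β that minimize ‖y − Xβ‖², which leads to the normal equations XᵀXβ = Xᵀy. The sketch below fits a line to synthetic data (true intercept 2 and slope 3, chosen for illustration) both by solving the normal equations and with `np.linalg.lstsq`, which is generally the numerically safer route.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)  # noisy line, illustrative

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# OLS via the normal equations (solved, not explicitly inverted)
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same fit with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```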

Bayesian Statistics:

  • Bayes' theorem applications
  • Bayesian inference in ML
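Bayes' theorem updates a prior with the likelihood of the observed evidence: P(D | +) = P(+ | D) P(D) / P(+). The sketch below applies it to a hypothetical diagnostic test with 1% prevalence, 99% sensitivity, and 95% specificity; all three numbers are illustrative assumptions.

```python
def posterior(prior, sensitivity, specificity):
    """P(condition | positive test) via Bayes' theorem."""
    p_pos_given_cond = sensitivity
    p_pos_given_no_cond = 1 - specificity  # false-positive rate
    # Total probability of a positive result, P(+)
    evidence = p_pos_given_cond * prior + p_pos_given_no_cond * (1 - prior)
    return p_pos_given_cond * prior / evidence

print(posterior(prior=0.01, sensitivity=0.99, specificity=0.95))  # about 0.1667
```

Even with an accurate test, only about one positive result in six comes from a true case, because the condition is rare.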

🎓 Module 3 Learning Outcomes

By completing this module, you will:

  • Master Linear Algebra: Understand tensors, matrix operations, and eigendecomposition
  • Apply Calculus: Use derivatives and gradients for optimization problems
  • Probability Mastery: Work with distributions and information theory
  • Statistical Analysis: Perform hypothesis testing and regression analysis
  • Mathematical ML: Connect mathematical concepts to machine learning applications
  • Tool Proficiency: Use NumPy, PyTorch, TensorFlow, and SymPy for mathematical computing

🔬 Technical Skills Covered

Linear Algebra:

  • Tensor operations and manipulations
  • Matrix decomposition techniques (SVD, eigendecomposition)
  • Principal Component Analysis (PCA)
  • Solving linear systems

Calculus:

  • Automatic differentiation
  • Gradient computation for optimization
  • Partial derivatives for multivariate functions
  • Symbolic mathematical computation

Probability & Statistics:

  • Statistical distributions and sampling
  • Hypothesis testing and confidence intervals
  • Bayesian inference
  • Information theory metrics
  • Regression analysis techniques

Programming Libraries:

  • NumPy: Numerical computations and linear algebra
  • PyTorch: Automatic differentiation and tensor operations
  • TensorFlow: Machine learning mathematical operations
  • SymPy: Symbolic mathematics and calculus

🎨 Key Features

  • 📖 Comprehensive Documentation: Each notebook includes detailed explanations
  • 🔄 Progressive Learning: Concepts build upon previous knowledge
  • 🛠️ Practical Examples: Real-world datasets and use cases
  • 📊 Visualization Focus: Strong emphasis on data visualization
  • 🔬 Hands-on Practice: Interactive exercises and challenges
  • 🎯 Industry-Relevant: Current ML practices and techniques

🗺️ Roadmap

🎯 Planned Features (Coming Soon)

  • Module 4: Deep Learning Fundamentals
  • Module 5: MLOps and Model Deployment
  • Interactive web-based tutorials
  • Video explanations for complex concepts
  • Additional real-world projects

📅 Current Focus

  • Enhancing existing notebooks with more examples
  • Adding comprehensive documentation
  • Creating supplementary exercises
  • Improving code quality and best practices

📝 Recent Updates

  • ✅ Added comprehensive data preprocessing notebooks
  • ✅ Implemented three real-world ML use cases
  • 🚧 Working on advanced feature engineering techniques
  • 🔄 Continuously improving documentation

Happy Learning! 🚀

This repository is continuously updated with new materials and improvements. Check back regularly for the latest content!

Last Updated: August 2025
