mohamedkhalifa9/Machin-Learing-Project--Bike-Sharing-Prediction

MLDA Project 1: Bikesharing Demand Prediction

Predict daily bike rentals in the Capital Bikeshare system (Washington D.C.) using historical data. Accurate forecasts help optimize bike availability, reduce costs, and improve user satisfaction by accounting for weather, seasonality, and temporal factors.

Motivation and Problem Statement

Urban bikesharing systems face challenges in managing bike distribution amid variable demand. The goal is to build a regression model that estimates daily rental counts (cnt) based on features such as temperature, humidity, weather conditions, seasonality, and holidays.

Dataset

Source: Public Kaggle dataset (daily records from 2011–2012)
Size: 731 records, 16 columns (including the target cnt)
Key features: season, yr, mnth, holiday, weekday, workingday, weathersit, temp, atemp, hum, windspeed, casual, registered
Target: cnt (total rentals = casual + registered)
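
Because cnt is the exact sum of casual and registered, keeping those two columns as inputs would let a model reconstruct the target directly. The README does not say whether they were dropped, so the following is only a minimal sketch of how the identity could be checked and the components excluded. The three-row frame and its values are illustrative, not loaded from the project's data.

```python
import pandas as pd

# Illustrative sample using the dataset's column names (values are assumptions).
sample = pd.DataFrame({
    "temp": [0.34, 0.36, 0.20],
    "casual": [331, 131, 120],
    "registered": [654, 670, 1229],
    "cnt": [985, 801, 1349],
})

# The target should equal the sum of its two components on every row.
target_is_sum = bool((sample["casual"] + sample["registered"] == sample["cnt"]).all())

# Exclude the components (and the target itself) from the feature matrix
# so the model cannot trivially rebuild cnt.
X = sample.drop(columns=["casual", "registered", "cnt"])
y = sample["cnt"]
```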

Approach

1) Data Analytics

Inspected the dataset structure, checked for duplicates and missing values, visualized distributions (histograms, boxplots) and scatter matrices, and ran a correlation analysis to identify key predictors (e.g., cnt shows a strong positive correlation with temp and atemp).
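
The checks above could look roughly like the sketch below. It runs on synthetic data generated in place (an assumption, since the notebook's actual code is not shown here), with column names borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily data; the relationships are assumptions.
rng = np.random.default_rng(0)
n = 200
temp = rng.uniform(0.1, 0.9, n)
df = pd.DataFrame({
    "temp": temp,
    "atemp": temp * 0.9 + rng.normal(0, 0.02, n),
    "hum": rng.uniform(0.3, 0.9, n),
    "cnt": (6000 * temp + rng.normal(0, 500, n)).round(),
})

# Count fully duplicated rows and missing cells.
n_duplicates = int(df.duplicated().sum())
n_missing = int(df.isna().sum().sum())

# Pearson correlation of every feature with the target, strongest first.
corr_with_target = df.corr()["cnt"].drop("cnt").sort_values(ascending=False)
```

Histograms and boxplots would follow from the same frame, for example via `df.hist()` and `df.boxplot()` when matplotlib is installed.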

2) Preprocessing

One-hot encoding: season, weathersit, and other categorical variables
Feature scaling/normalization
Train/test split
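
A minimal sketch of these three steps is shown below on a small made-up frame. It fits the scaler on the training split only, which is a common way to avoid leaking test statistics; whether the original notebook orders the steps this way is not stated, so treat that choice as an assumption.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up rows with the dataset's column names (values are assumptions).
df = pd.DataFrame({
    "season": [1, 1, 2, 2, 3, 3, 4, 4, 1, 2],
    "weathersit": [1, 2, 1, 2, 1, 3, 1, 2, 1, 2],
    "temp": [0.20, 0.25, 0.45, 0.50, 0.75, 0.70, 0.40, 0.35, 0.22, 0.48],
    "hum": [0.80, 0.70, 0.60, 0.65, 0.55, 0.90, 0.60, 0.75, 0.68, 0.58],
    "cnt": [1200, 1000, 4200, 3900, 6500, 3100, 4800, 3600, 1500, 4400],
})

# One-hot encode the categorical columns.
X = pd.get_dummies(df.drop(columns="cnt"), columns=["season", "weathersit"], dtype=float)
y = df["cnt"]

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# Standardize the numeric columns using training statistics only.
num_cols = ["temp", "hum"]
scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
```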

3) Modeling

Implemented and compared multiple regression techniques:

Linear models:
Linear Regression
Ridge
Lasso
Polynomial Regression (degrees 2–3)
Partial Least Squares (PLS)

Ensemble methods:
Random Forest
XGBoost
Hyperparameter tuning via GridSearchCV with 5-fold cross-validation.
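
A grid search with 5-fold cross-validation might be wired up as in the sketch below. It uses Random Forest so the example depends only on scikit-learn; an XGBoost regressor with the scikit-learn interface could be swapped in the same way. The grid values and the synthetic data are assumptions, not the project's actual search space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic non-linear regression problem (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(150, 3))
y = 5000 * np.sin(3 * X[:, 0]) + 1000 * X[:, 1] + rng.normal(0, 200, 150)

# Hypothetical search space: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_
# scikit-learn maximizes scores, so RMSE is reported negated; flip the sign.
best_rmse = -search.best_score_
```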

4) Evaluation

Metrics: RMSE, MAE, R²
Experiments with categorical encodings to assess impact on performance
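
For reference, the three metrics can be computed directly from their definitions, as in this small sketch. The helper name `regression_metrics` and the sample numbers are illustrative; scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` give the same quantities.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    mae = float(np.mean(np.abs(residuals)))
    # R² compares the residual sum of squares to the variance around the mean.
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"rmse": rmse, "mae": mae, "r2": 1.0 - ss_res / ss_tot}

# Made-up daily counts and predictions.
metrics = regression_metrics([1000, 2000, 3000, 4000], [1100, 1900, 3200, 3800])
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a sense of how much the error is driven by a few bad days.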

Results

Best model: XGBoost (without additional categorical adjustments)
Test R²: 0.896
RMSE: ~645

Takeaways:

XGBoost captured non-linear patterns effectively.
All models showed moderate performance, limited by dataset size.
Opportunities: expand data, apply time-series modeling

Skills Demonstrated

Data Analysis & Visualization:

Pandas, seaborn, matplotlib (correlation matrices, histograms, scatter plots)

Machine Learning:

Regression modeling, feature engineering (polynomial features, scaling), hyperparameter tuning (GridSearchCV), cross-validation (KFold), evaluation (MSE, RMSE, MAE, R²)

Advanced Techniques:

Regularization (Ridge/Lasso), dimensionality reduction (PLS), ensemble learning (Random Forest, XGBoost)

Tools:

Python (scikit-learn, XGBoost, NumPy), Jupyter Notebook for reproducible workflows

Problem-Solving:

Addressed overfitting, multicollinearity, non-linearity; proposed temporal modeling for weather dependencies
