This dataset contains customer data with the following columns:
- Email: Customer email addresses, uniquely identifying users.
- Address: Residential address of the customers.
- Avatar: Favorite color or theme for a user’s online profile representation.
- Avg. Session Length: Average duration (in minutes) customers spend per session on the app.
- Time on App: Average daily time (in minutes) spent on the app.
- Time on Website: Average daily time (in minutes) spent on the company’s website.
- Length of Membership: How many years the customer has been a member.
- Yearly Amount Spent: Total money spent by the customer in a year (dependent variable).
The goal of this analysis is to predict the Yearly Amount Spent based on other factors such as average session length, time on app, time on website, and length of membership. By understanding the key drivers of spending, the company can better target customers, enhance the user experience, and optimize resource allocation to boost revenue and customer satisfaction.
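As a rough illustration of that setup, the sketch below separates the four numeric predictors named above from the target column. It uses only the standard library `csv` module so it runs without extra packages; the notebook itself loads the data with pandas. The `split_features_target` helper and the two sample rows are made up for illustration and are not taken from the dataset, and leaving out Email, Address, and Avatar is an assumption based on the goal statement.

```python
import csv
import io

# Column names as listed above. Assumption: Email, Address, and Avatar are
# not used as predictors, since the goal names only the four numeric columns.
FEATURES = [
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
]
TARGET = "Yearly Amount Spent"


def split_features_target(rows):
    """Return (X, y): a list of feature rows and a list of target values."""
    X, y = [], []
    for row in rows:
        X.append([float(row[name]) for name in FEATURES])
        y.append(float(row[TARGET]))
    return X, y


# Hypothetical sample with invented values, standing in for Ecommerce_Customers.csv.
SAMPLE_CSV = """Email,Address,Avatar,Avg. Session Length,Time on App,Time on Website,Length of Membership,Yearly Amount Spent
a@example.com,1 Example St,Blue,33.0,12.0,37.0,4.0,500.0
b@example.com,2 Example Ave,Green,32.0,11.0,36.5,3.0,450.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X, y = split_features_target(rows)
print(X[0], y[0])
```

With the real file, the same helper could be fed from `csv.DictReader(open('Ecommerce_Customers.csv', newline=''))`, assuming the header row matches the column names listed above.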
Pick the option that best fits your comfort level. Both non-technical and technical paths produce the same results.
- Download this project as a ZIP:
  - On this repository page, click the green “Code” button > “Download ZIP”.
  - Unzip the file on your computer.
- Open Google Colab in your browser: https://colab.research.google.com
- In Colab, go to File > Upload notebook and select Ecommerce_LinearRegression.ipynb.
- Upload the dataset:
  - In the left sidebar (Files tab), click “Upload” and select Ecommerce_Customers.csv.
  - If needed, update the data path in the first data-loading cell to:
    df = pd.read_csv('Ecommerce_Customers.csv')
- Run all cells:
  - Runtime > Run all.
- View results:
  - The notebook will walk you through exploratory analysis, model training, and performance reporting (RMSE).
Tips:
- Keep the notebook (.ipynb) and the CSV in the same working directory in Colab to avoid path issues.
- If prompted to install packages, follow the notebook instructions.
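One way to check the first tip before running anything is a short cell like the sketch below. It only tests whether the two project files sit in the current working directory; `find_missing` is a hypothetical helper written for this example, not something the notebook defines.

```python
from pathlib import Path


def find_missing(filenames, directory="."):
    """Return the names from `filenames` that are not files in `directory`."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]


# File names as used in this project.
missing = find_missing(["Ecommerce_LinearRegression.ipynb", "Ecommerce_Customers.csv"])
if missing:
    print("Missing from the working directory:", ", ".join(missing))
else:
    print("Both files found.")
```

If the CSV shows up as missing, upload it again or adjust the path passed to `pd.read_csv`.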
Prerequisites:
- Python 3.9+ installed
- pip installed
- Clone or download the repository:
  git clone https://github.com/vinh2155/E-commerce-Model.git
  cd E-commerce-Model
  Or download ZIP and extract, then open a terminal in the extracted folder.
- (Optional but recommended) Create and activate a virtual environment:
  python -m venv .venv
  # Windows
  .venv\Scripts\activate
  # macOS/Linux
  source .venv/bin/activate
- Install dependencies:
  pip install jupyter pandas numpy scikit-learn matplotlib seaborn
- Launch Jupyter and open the notebook:
  jupyter notebook
  Then open Ecommerce_LinearRegression.ipynb.
- Ensure the data file is present in the same directory:
  Ecommerce_Customers.csv
  If your path differs, update the data-loading cell accordingly:
  df = pd.read_csv('path/to/Ecommerce_Customers.csv')
- Run all cells (Kernel > Restart & Run All) to reproduce the analysis and metrics.
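Before running the notebook locally, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, assuming the notebook imports the packages from the install step under their usual module names (scikit-learn is imported as `sklearn`); `missing_modules` is a hypothetical helper, and the check only tests that the modules can be found, not that their versions are compatible.

```python
import importlib.util
import sys

# Import names for the installed packages. Jupyter is omitted because it is
# launched from the command line rather than imported by the notebook.
REQUIRED = ["pandas", "numpy", "sklearn", "matplotlib", "seaborn"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Python 3.9+ is listed under the prerequisites.
if sys.version_info < (3, 9):
    print("Python 3.9+ is expected; found", sys.version.split()[0])

missing = missing_modules(REQUIRED)
print("Missing modules:", missing if missing else "none")
```

Any name reported as missing can be installed with pip inside the active virtual environment, followed by a kernel restart.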
- Ecommerce_LinearRegression.ipynb: Main notebook with EDA, feature engineering, model training, and evaluation.
- Ecommerce_Customers.csv: Dataset used in the analysis.
- README.md: Project description and run instructions.
- Model: Linear Regression
- Metric: RMSE = 10.19 (2.04% of dataset mean), indicating strong predictive performance for yearly spending.
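To make the metric concrete, here is a small standard-library sketch of how an RMSE and its size relative to the mean can be computed. It assumes that "dataset mean" refers to the mean of Yearly Amount Spent; under that reading, the reported figures imply a mean of roughly 10.19 / 0.0204 ≈ 500. The numbers in the example are invented and are not notebook output, and the notebook may compute the metric differently (for instance with scikit-learn on a held-out test split).

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_as_percent_of_mean(actual, predicted):
    """RMSE divided by the mean of the actual values, as a percentage."""
    mean_actual = sum(actual) / len(actual)
    return 100 * rmse(actual, predicted) / mean_actual


# Invented values for illustration only.
actual = [500.0, 450.0, 550.0]
predicted = [510.0, 440.0, 550.0]
print(round(rmse(actual, predicted), 2))
print(round(rmse_as_percent_of_mean(actual, predicted), 2))
```

Expressing the error as a share of the mean makes it easier to judge scale: an error of about 10 matters much less when typical yearly spending is near 500 than it would if spending were near 50.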
- File not found errors: Confirm the CSV filename and that it’s in the same directory as the notebook (or update the path).
- Package errors: Re-run the pip install ... commands, or restart the notebook/kernel after installing.
- Colab file resets: If the Colab runtime disconnects, re-upload the CSV and re-run cells.