This project demonstrates how to orchestrate the training and evaluation of stock price prediction models using Facebook Prophet, Apache Airflow, Weights & Biases (wandb), and Google Cloud Storage (GCS).
- Automated Data Download: Fetches historical stock data from Yahoo Finance.
- Model Training: Trains Prophet models for multiple tickers.
- Evaluation & Logging: Evaluates models, logs metrics and plots to wandb.
- Model Versioning: Uploads models and metadata to GCS.
- Orchestration: Uses Airflow to schedule and manage model training workflows.
- Dockerized: All components run in containers for reproducibility.
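At its core, each run of that workflow boils down to: download prices, fit a Prophet model, forecast, and log the results. Below is a minimal, hedged sketch of that step, assuming the yfinance, Prophet, and wandb APIs; the project name and metric choice are illustrative, not the exact code in `src/train_model.py`.

```python
# A minimal sketch of the per-ticker training step (illustrative, not the exact
# implementation in src/train_model.py).
import wandb
import yfinance as yf
from prophet import Prophet

ticker = "AAPL"
history = yf.Ticker(ticker).history(period="1y")

# Prophet expects a tz-naive 'ds' datetime column and a 'y' value column.
df = history.reset_index()[["Date", "Close"]].rename(columns={"Date": "ds", "Close": "y"})
df["ds"] = df["ds"].dt.tz_localize(None)

model = Prophet(daily_seasonality=True)
model.fit(df)

future = model.make_future_dataframe(periods=14)  # 14-day forecast horizon
forecast = model.predict(future)

# Log an in-sample error metric and the forecast plot to wandb.
run = wandb.init(project="stock-prediction", name=f"{ticker}-prophet")
mae = (forecast["yhat"][: len(df)] - df["y"]).abs().mean()
run.log({"mae": float(mae), "forecast_plot": wandb.Image(model.plot(forecast))})
run.finish()
```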
```
.
├── airflow/
│   ├── dags/
│   │   └── stock_prediction_dag.py
│   ├── Dockerfile
│   ├── docker-compose.yaml
│   └── requirements.txt
├── src/
│   └── train_model.py
├── requirements.txt
├── .env
├── gc-auth.json
└── README.md
```
- Docker
- Docker Compose
- Google Cloud service account with access to your GCS bucket
- wandb account and API key
Create a `.env` file in the project root with the following variables:

```
GCS_BUCKET_NAME=your-bucket-name
PROJECT_ID=your-gcp-project-id
GOOGLE_APPLICATION_CREDENTIALS=/app/gc-auth.json
WANDB_API_KEY=your-wandb-api-key
MODELS_DIR=models
BEST_MODEL_DIR=models/best
MODEL_RUNS_DIR=models/runs
```

- Download your GCP service account JSON and rename it to `gc-auth.json`
- Place `gc-auth.json` in the project root (same directory as `.env`)
- Ensure the file has the correct permissions (readable by the Docker container)
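For context, here is a hedged sketch of how these variables and the credentials file are typically consumed from Python, assuming python-dotenv and the google-cloud-storage client are available; the object path and file names are illustrative, not the project's exact upload logic.

```python
# Illustrative only: assumes python-dotenv and google-cloud-storage are installed
# and the code runs where .env and gc-auth.json are reachable.
import os
from dotenv import load_dotenv
from google.cloud import storage

load_dotenv()  # reads .env; GOOGLE_APPLICATION_CREDENTIALS must point at gc-auth.json

# storage.Client picks up the service-account file via GOOGLE_APPLICATION_CREDENTIALS.
client = storage.Client(project=os.environ["PROJECT_ID"])
bucket = client.bucket(os.environ["GCS_BUCKET_NAME"])

# Example upload path; the real script derives its own paths from MODELS_DIR etc.
blob = bucket.blob(f"{os.environ.get('BEST_MODEL_DIR', 'models/best')}/AAPL_model.json")
blob.upload_from_filename("models/best/AAPL_model.json")
```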
Important: Run `docker compose` from the `airflow/` directory:

```
cd airflow
docker compose up --build
```

- The Airflow UI will be available at http://localhost:8080.
- The first time you run Airflow in standalone mode, it will print the admin credentials in the logs.

Check the logs for the admin password:

```
docker compose logs airflow
```

Look for a line like:

```
Admin user created with username: admin and password: <random_password>
```
Log in with these credentials.
- The Airflow DAG (`stock_forecast_daily`) will run every 15 minutes by default for testing.
- Model artifacts and metrics are logged to wandb and GCS.
- You can customize tickers and the DAG schedule in `airflow/dags/stock_prediction_dag.py` (see the sketch below).
Default Configuration:
- Tickers: AAPL, MSFT, TSLA, TWLO
- Schedule: Every 15 minutes (`*/15 * * * *`)
- Training Period: 1 year of historical data
- Forecast Period: 14 days
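Putting those defaults together, here is a hedged sketch of what an Airflow 3-style DAG with this configuration can look like; the TaskFlow structure, task IDs, and the container path to the training script are assumptions, not necessarily how `airflow/dags/stock_prediction_dag.py` is written.

```python
# Illustrative DAG sketch matching the defaults above.
from datetime import datetime
import subprocess

from airflow.decorators import dag, task

TICKERS = ["AAPL", "MSFT", "TSLA", "TWLO"]

@dag(
    dag_id="stock_forecast_daily",
    schedule="*/15 * * * *",   # Airflow 3.0+ uses `schedule`, not `schedule_interval`
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def stock_forecast_daily():
    @task
    def train(ticker: str) -> None:
        # Invoke the training script the same way it is run locally.
        # The /app path is an assumption about the container layout.
        subprocess.run(
            ["python", "/app/src/train_model.py",
             "--ticker", ticker, "--period", "1y", "--forecast_periods", "14"],
            check=True,
        )

    for ticker in TICKERS:
        train.override(task_id=f"train_{ticker}")(ticker)

stock_forecast_daily()
```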
- Set Python Version with pyenv

  ```
  pyenv install 3.11.8
  pyenv local 3.11.8
  ```

- Create and Activate Virtual Environment

  ```
  python -m venv venv
  source venv/bin/activate   # On Unix/macOS
  # OR
  .\venv\Scripts\activate    # On Windows
  ```

- Verify Python Version

  ```
  python --version  # Should output Python 3.11.8
  ```

- To run the training script locally:

  ```
  pip install -r requirements.txt
  python src/train_model.py --ticker AAPL --period 1y --forecast_periods 14
  ```
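The flags above imply a command-line interface along these lines; this is a sketch of the assumed argument parsing, not necessarily the exact code in `src/train_model.py`.

```python
# Illustrative CLI sketch mirroring the flags shown above.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train a Prophet model for one ticker.")
    parser.add_argument("--ticker", required=True, help="Ticker symbol, e.g. AAPL")
    parser.add_argument("--period", default="1y", help="History window passed to Yahoo Finance")
    parser.add_argument("--forecast_periods", type=int, default=14,
                        help="Number of days to forecast")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Training {args.ticker}: period={args.period}, horizon={args.forecast_periods}d")
```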
- "no configuration file provided: not found"
  - Solution: Make sure you're running `docker compose` from the `airflow/` directory, not the project root.
- "FileNotFoundError: No such file or directory: gc-auth.json"
  - Solution: Ensure `gc-auth.json` is in the project root and the path in `.env` is `/app/gc-auth.json`.
- "DAG.__init__() got an unexpected keyword argument 'schedule_interval'"
  - Solution: This is fixed in the current version. The DAG uses `schedule` instead of `schedule_interval` for Airflow 3.0+.
- "PermissionError: Permission denied" with wandb artifacts
  - Solution: The script now handles this gracefully; models are still saved to GCS even if wandb artifacts fail (see the sketch after this list).
- Environment variables not being passed to tasks
  - Solution: Check that your `.env` file is in the project root and that `docker-compose.yaml` references `../.env`.
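As an illustration of the graceful wandb fallback mentioned in the list above, the error handling can look roughly like this; the artifact name and function shape are assumptions, not the exact code in `src/train_model.py`.

```python
# Illustrative sketch of the fallback described above.
import wandb

def log_model_artifact(run, model_path: str) -> None:
    """Try to version the model file in wandb, but never fail the run over it."""
    try:
        artifact = wandb.Artifact("prophet-model", type="model")
        artifact.add_file(model_path)
        run.log_artifact(artifact)
    except PermissionError as exc:
        # The model is still uploaded to GCS by the caller, so a failed wandb
        # artifact upload is reported and skipped rather than raised.
        print(f"Skipping wandb artifact upload: {exc}")
```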
The DAG includes a debug task that prints all environment variables. Check the `debug_env_vars` task logs to verify:
- Environment variables are set correctly
- File paths are accessible
- Credentials are properly mounted
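A debug task of this kind can be as small as the following sketch (illustrative; the actual `debug_env_vars` task may print more or fewer values).

```python
# Illustrative sketch of a debug task for the checks listed above.
import os
from airflow.decorators import task

@task
def debug_env_vars() -> None:
    # Report the project's variables, but only note whether the API key is set
    # so the secret itself is not echoed into the task logs.
    for name in ("GCS_BUCKET_NAME", "PROJECT_ID", "GOOGLE_APPLICATION_CREDENTIALS",
                 "MODELS_DIR", "BEST_MODEL_DIR", "MODEL_RUNS_DIR"):
        print(f"{name}={os.environ.get(name, '<not set>')}")
    print("WANDB_API_KEY set:", "WANDB_API_KEY" in os.environ)

    # Confirm the mounted credentials file is actually readable from the container.
    creds = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    print("credentials readable:", os.path.isfile(creds) and os.access(creds, os.R_OK))
```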
MIT License


