This project demonstrates how to orchestrate the training and evaluation of stock price prediction models using Facebook Prophet, Apache Airflow, Weights & Biases (wandb), and Google Cloud Storage (GCS).
- Automated Data Download: Fetches historical stock data from Yahoo Finance.
- Model Training: Trains Prophet models for multiple tickers.
- Evaluation & Logging: Evaluates models, logs metrics and plots to wandb.
- Model Versioning: Uploads models and metadata to GCS.
- Orchestration: Uses Airflow to schedule and manage model training workflows.
- Dockerized: All components run in containers for reproducibility.
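At its core, each run of that workflow boils down to: download prices, fit a Prophet model, forecast, and log the results. Below is a minimal, hedged sketch of that step, assuming the yfinance, Prophet, and wandb APIs; the project name and metric choice are illustrative, not the exact code in `src/train_model.py`.

```python
# A minimal sketch of the per-ticker training step (illustrative, not the exact
# implementation in src/train_model.py).
import wandb
import yfinance as yf
from prophet import Prophet

ticker = "AAPL"
history = yf.Ticker(ticker).history(period="1y")

# Prophet expects a tz-naive 'ds' datetime column and a 'y' value column.
df = history.reset_index()[["Date", "Close"]].rename(columns={"Date": "ds", "Close": "y"})
df["ds"] = df["ds"].dt.tz_localize(None)

model = Prophet(daily_seasonality=True)
model.fit(df)

future = model.make_future_dataframe(periods=14)  # 14-day forecast horizon
forecast = model.predict(future)

# Log an in-sample error metric and the forecast plot to wandb.
run = wandb.init(project="stock-prediction", name=f"{ticker}-prophet")
mae = (forecast["yhat"][: len(df)] - df["y"]).abs().mean()
run.log({"mae": float(mae), "forecast_plot": wandb.Image(model.plot(forecast))})
run.finish()
```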
```
.
├── airflow/
│   ├── dags/
│   │   └── stock_prediction_dag.py
│   ├── Dockerfile
│   ├── docker-compose.yaml
│   └── requirements.txt
├── src/
│   └── train_model.py
├── requirements.txt
├── .env
├── gc-auth.json
└── README.md
```
- Docker
- Docker Compose
- Google Cloud service account with access to your GCS bucket
- wandb account and API key
Create a `.env` file in the project root with the following variables:

```
GCS_BUCKET_NAME=your-bucket-name
PROJECT_ID=your-gcp-project-id
GOOGLE_APPLICATION_CREDENTIALS=/app/gc-auth.json
WANDB_API_KEY=your-wandb-api-key
MODELS_DIR=models
BEST_MODEL_DIR=models/best
MODEL_RUNS_DIR=models/runs
```

- Download your GCP service account JSON and rename it to `gc-auth.json`
- Place `gc-auth.json` in the project root (same directory as `.env`)
- Ensure the file has the correct permissions (readable by the Docker container)
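For context, here is a hedged sketch of how these variables and the credentials file are typically consumed from Python, assuming python-dotenv and the google-cloud-storage client are available; the object path and file names are illustrative, not the project's exact upload logic.

```python
# Illustrative only: assumes python-dotenv and google-cloud-storage are installed
# and the code runs where .env and gc-auth.json are reachable.
import os
from dotenv import load_dotenv
from google.cloud import storage

load_dotenv()  # reads .env; GOOGLE_APPLICATION_CREDENTIALS must point at gc-auth.json

# storage.Client picks up the service-account file via GOOGLE_APPLICATION_CREDENTIALS.
client = storage.Client(project=os.environ["PROJECT_ID"])
bucket = client.bucket(os.environ["GCS_BUCKET_NAME"])

# Example upload path; the real script derives its own paths from MODELS_DIR etc.
blob = bucket.blob(f"{os.environ.get('BEST_MODEL_DIR', 'models/best')}/AAPL_model.json")
blob.upload_from_filename("models/best/AAPL_model.json")
```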
Important: Run `docker compose` from the `airflow/` directory:

```
cd airflow
docker compose up --build
```

- The Airflow UI will be available at http://localhost:8080.
- The first time you run Airflow in standalone mode, it will print the admin credentials in the logs.

Check the logs for the admin password:

```
docker compose logs airflow
```

Look for a line like:

```
Admin user created with username: admin and password: <random_password>
```
Log in with these credentials.
- The Airflow DAG (`stock_forecast_daily`) will run every 15 minutes by default for testing.
- Model artifacts and metrics are logged to wandb and GCS.
- You can customize tickers and the DAG schedule in `airflow/dags/stock_prediction_dag.py` (see the sketch below).
Default Configuration:
- Tickers: AAPL, MSFT, TSLA, TWLO
- Schedule: Every 15 minutes (`*/15 * * * *`)
- Training Period: 1 year of historical data
- Forecast Period: 14 days
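Putting those defaults together, here is a hedged sketch of what an Airflow 3-style DAG with this configuration can look like; the TaskFlow structure, task IDs, and the container path to the training script are assumptions, not necessarily how `airflow/dags/stock_prediction_dag.py` is written.

```python
# Illustrative DAG sketch matching the defaults above.
from datetime import datetime
import subprocess

from airflow.decorators import dag, task

TICKERS = ["AAPL", "MSFT", "TSLA", "TWLO"]

@dag(
    dag_id="stock_forecast_daily",
    schedule="*/15 * * * *",   # Airflow 3.0+ uses `schedule`, not `schedule_interval`
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def stock_forecast_daily():
    @task
    def train(ticker: str) -> None:
        # Invoke the training script the same way it is run locally.
        # The /app path is an assumption about the container layout.
        subprocess.run(
            ["python", "/app/src/train_model.py",
             "--ticker", ticker, "--period", "1y", "--forecast_periods", "14"],
            check=True,
        )

    for ticker in TICKERS:
        train.override(task_id=f"train_{ticker}")(ticker)

stock_forecast_daily()
```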
- Set Python Version with pyenv

  ```
  pyenv install 3.11.8
  pyenv local 3.11.8
  ```

- Create and Activate Virtual Environment

  ```
  python -m venv venv
  source venv/bin/activate   # On Unix/macOS
  # OR
  .\venv\Scripts\activate    # On Windows
  ```

- Verify Python Version

  ```
  python --version  # Should output Python 3.11.8
  ```

- To run the training script locally:

  ```
  pip install -r requirements.txt
  python src/train_model.py --ticker AAPL --period 1y --forecast_periods 14
  ```
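The flags above imply a command-line interface along these lines; this is a sketch of the assumed argument parsing, not necessarily the exact code in `src/train_model.py`.

```python
# Illustrative CLI sketch mirroring the flags shown above.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train a Prophet model for one ticker.")
    parser.add_argument("--ticker", required=True, help="Ticker symbol, e.g. AAPL")
    parser.add_argument("--period", default="1y", help="History window passed to Yahoo Finance")
    parser.add_argument("--forecast_periods", type=int, default=14,
                        help="Number of days to forecast")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Training {args.ticker}: period={args.period}, horizon={args.forecast_periods}d")
```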
- "no configuration file provided: not found"
  - Solution: Make sure you're running `docker compose` from the `airflow/` directory, not the project root.
- "FileNotFoundError: No such file or directory: gc-auth.json"
  - Solution: Ensure `gc-auth.json` is in the project root and the path in `.env` is `/app/gc-auth.json`.
- "DAG.__init__() got an unexpected keyword argument 'schedule_interval'"
  - Solution: This is fixed in the current version. The DAG uses `schedule` instead of `schedule_interval` for Airflow 3.0+.
- "PermissionError: Permission denied" with wandb artifacts
  - Solution: The script now handles this gracefully; models are still saved to GCS even if wandb artifacts fail (see the sketch after this list).
- Environment variables not being passed to tasks
  - Solution: Check that your `.env` file is in the project root and that `docker-compose.yaml` references `../.env`.
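As an illustration of the graceful wandb fallback mentioned in the list above, the error handling can look roughly like this; the artifact name and function shape are assumptions, not the exact code in `src/train_model.py`.

```python
# Illustrative sketch of the fallback described above.
import wandb

def log_model_artifact(run, model_path: str) -> None:
    """Try to version the model file in wandb, but never fail the run over it."""
    try:
        artifact = wandb.Artifact("prophet-model", type="model")
        artifact.add_file(model_path)
        run.log_artifact(artifact)
    except PermissionError as exc:
        # The model is still uploaded to GCS by the caller, so a failed wandb
        # artifact upload is reported and skipped rather than raised.
        print(f"Skipping wandb artifact upload: {exc}")
```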
The DAG includes a debug task that prints all environment variables. Check the `debug_env_vars` task logs to verify:
- Environment variables are set correctly
- File paths are accessible
- Credentials are properly mounted
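A debug task of this kind can be as small as the following sketch (illustrative; the actual `debug_env_vars` task may print more or fewer values).

```python
# Illustrative sketch of a debug task for the checks listed above.
import os
from airflow.decorators import task

@task
def debug_env_vars() -> None:
    # Report the project's variables, but only note whether the API key is set
    # so the secret itself is not echoed into the task logs.
    for name in ("GCS_BUCKET_NAME", "PROJECT_ID", "GOOGLE_APPLICATION_CREDENTIALS",
                 "MODELS_DIR", "BEST_MODEL_DIR", "MODEL_RUNS_DIR"):
        print(f"{name}={os.environ.get(name, '<not set>')}")
    print("WANDB_API_KEY set:", "WANDB_API_KEY" in os.environ)

    # Confirm the mounted credentials file is actually readable from the container.
    creds = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    print("credentials readable:", os.path.isfile(creds) and os.access(creds, os.R_OK))
```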
MIT License


