Enterprise-grade automated data extraction system for Indian financial markets via Upstox API. Built for algorithmic traders, quantitative analysts, and financial researchers who demand reliable, high-performance market data.
β οΈ PRE-RELEASE VERSION - This is a beta version currently in testing. While fully functional, please report any issues and use with caution in production environments.
- Manual data extraction from Upstox is time-consuming and error-prone
- Rate limits make large-scale data collection challenging
- Inconsistent data formatting requires extensive post-processing
- Missing data points can invalidate entire backtesting strategies
- Complex API management for options and futures data
A production-ready system that transforms months of development into minutes of setup, delivering clean, reliable market data with zero manual intervention.
- Index Data: Nifty 50, Bank Nifty, Sensex, Nifty IT, Nifty Pharma
- Options Chain: Complete Call/Put data with strike prices and expiry dates
- Futures Data: Index futures with intelligent expiry management
- Historical OHLC: Open, High, Low, Close with volume and open interest
- 4x Faster Extraction: Multi-token architecture (7,200 req/30min vs 1,800)
- Smart Token Rotation: Automatic switching when rate limits approached
- HTTPx Optimization: Connection pooling for 20-30% speed improvement
- Zero Data Loss: Conservative rate limiting prevents API errors
- Batch Processing: Efficient handling of large datasets
- Professional Rate Limiting: FlowGuard integration (15 req/sec, 200 req/min)
- Automatic Retry Logic: Intelligent retry with exponential backoff
- Error Recovery: Graceful handling of network issues and API errors
- Data Validation: Multiple integrity checks ensure accuracy
- Comprehensive Logging: Detailed progress and error reporting
- Auto-Expiry Detection: Fetches current futures expiry dates from API
- Monthly/Weekly Prioritization: Prefers monthly expiries for better liquidity
- Date Range Intelligence: Automatically calculates optimal periods
- Trading Hours Filter: Market hours only (09:15 - 15:30)
- Chronological Sorting: All data automatically ordered oldest to newest
- Standardized CSV Format: Consistent columns across all extractors
- Whole Number Formatting: Clean integer values (no decimals)
- Complete Coverage: Ensures no gaps in requested date ranges
- Metadata Files: JSON files with expiry dates and instrument keys
- Ready for Analysis: Import directly into pandas, Excel, or trading platforms
- Python 3.7 or higher
- Upstox API account with valid access tokens
- Basic familiarity with command line
# Clone the repository
git clone https://github.com/yourusername/Upstox-Data-Downloader.git
cd Upstox-Data-Downloader
# Install dependencies
pip install -r requirements.txt
Edit upstox_tokens.py
with your Upstox API tokens:
# Single token for basic usage
PRIMARY_ACCESS_TOKEN = 'your_primary_token_here'
# Multiple tokens for 4x faster extraction
MULTI_ACCESS_TOKENS = [
'your_token_1',
'your_token_2',
'your_token_3',
'your_token_4'
]
# Verify token configuration
python test_tokens.py
# Start with Index data (fastest, good for testing)
python upstox_index_extractor.py
# Then Options data (moderate complexity)
python upstox_options_extractor.py
# Finally Futures data (most complex)
python upstox_futures_extractor.py
All extracted data will be saved in the data/
folder:
data/
βββ NIFTY_50_index_5minute_2025-01-13_2025-01-17.csv
βββ NIFTY_50_OPTIONS_5minute_2025-01-13_2025-01-17_historical_data.csv
βββ NIFTY_50_FUTURES_5minute_2025-04-15_2025-05-15_historical_data.csv
symbol,date,time,idx_open,idx_high,idx_low,idx_close
NIFTY_50,2025-01-17,09:15:00,23204,23208,23195,23203
NIFTY_50,2025-01-17,09:20:00,23203,23215,23198,23210
NIFTY_50,2025-01-17,09:25:00,23210,23225,23205,23220
symbol,date,time,strikeprice,expirydate,option_type,open,high,low,close,opt_vol,opt_oi
NIFTY17JAN202523200CE,2025-01-17,09:15:00,23200,2025-01-17,CE,45,48,42,46,1250,15000
NIFTY17JAN202523200PE,2025-01-17,09:15:00,23200,2025-01-17,PE,25,28,22,26,980,12500
NIFTY17JAN202523250CE,2025-01-17,09:15:00,23250,2025-01-17,CE,22,25,19,23,1100,18000
symbol,date,time,fut_open,fut_high,fut_low,fut_close,fut_vol,fut_oi
NIFTY24APR2025,2025-04-15,09:15:00,23295,23350,23252,23340,5000,25000
NIFTY24APR2025,2025-04-15,09:20:00,23340,23365,23330,23355,4800,25200
NIFTY24APR2025,2025-04-15,09:25:00,23355,23380,23345,23370,5200,25400
INDEX_TYPE = "NIFTY_50" # Nifty 50 Index
INDEX_TYPE = "BANK_NIFTY" # Bank Nifty Index
INDEX_TYPE = "SENSEX" # BSE Sensex
INDEX_TYPE = "NIFTY_IT" # Nifty IT Sector
INDEX_TYPE = "NIFTY_PHARMA" # Nifty Pharma Sector
# Index Data (v3 API)
TIMEFRAME = "1minute" # 1-minute candles
TIMEFRAME = "5minute" # 5-minute candles (recommended)
TIMEFRAME = "15minute" # 15-minute candles
TIMEFRAME = "30minute" # 30-minute candles
TIMEFRAME = "1hour" # 1-hour candles
TIMEFRAME = "1day" # Daily candles
# Options/Futures Data (v2 API)
TIMEFRAME = "1minute" # 1-minute candles
TIMEFRAME = "5minute" # 5-minute candles (recommended)
TIMEFRAME = "15minute" # 15-minute candles
TIMEFRAME = "1day" # Daily candles
# Example configurations for different extractors
START_DATE = "2025-01-13" # Index extractor
USER_START_DATE = "2025-01-13" # Options/Futures extractors
END_DATE = "2025-01-17" # Index extractor
USER_END_DATE = "2025-01-17" # Options/Futures extractors
βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
β Token Pool β β Rate Limiter β β Data Pipeline β
β β β β β β
β Token 1: 1800/30βββββΆβ FlowGuard βββββΆβ Fetch β Process β
β Token 2: 1800/30β β 15 req/sec β β Validate β Save β
β Token 3: 1800/30β β 200 req/min β β Sort β Export β
β Token 4: 1800/30β β Smart Rotation β β β
βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
7200 req/30min Zero 429 Errors Clean CSV Output
- Authentication: Secure token management and rotation
- API Calls: Optimized requests with HTTPx and connection pooling
- Rate Limiting: Professional-grade throttling to prevent errors
- Data Validation: Multiple integrity checks and error handling
- Processing: Clean, format, and sort data chronologically
- Output: Standardized CSV files ready for analysis
- Automatic Retries: Exponential backoff for transient errors
- Token Rotation: Switch tokens on rate limit hits
- Partial Success: Continue processing even if some requests fail
- Data Validation: Verify integrity before saving
- Comprehensive Logging: Detailed error reporting and debugging
Configuration | Requests/30min | Relative Speed | Use Case |
---|---|---|---|
Single Token | 1,800 | 1x (baseline) | Small datasets, testing |
Multi-Token (4x) | 7,200 | 4x faster | Large datasets, production |
- Index Data: ~500 records/minute (5-minute timeframe)
- Options Data: ~2,000 records/minute (multiple strikes/expiries)
- Futures Data: ~300 records/minute (with expiry management)
- Memory Usage: <100MB for typical datasets
- Success Rate: 99.9% with proper token configuration
- HTTPx: 20-30% faster HTTP requests with connection pooling
- FlowGuard: Professional rate limiting prevents API errors
- Smart Caching: Reduces redundant API calls for instrument keys
- Batch Processing: Efficient handling of large date ranges
- Connection Reuse: Persistent connections reduce overhead
- Backtesting: Historical OHLC data for strategy validation
- Strategy Development: Clean data for quantitative analysis
- Risk Management: VaR calculations and stress testing
- Performance Attribution: Portfolio analysis and optimization
- Academic Studies: Market microstructure and behavior analysis
- Trend Analysis: Long-term market pattern identification
- Volatility Studies: Options pricing and Greeks calculation
- Correlation Analysis: Cross-asset relationship studies
- Asset Allocation: Historical performance analysis
- Risk Assessment: Drawdown and volatility metrics
- Benchmark Comparison: Index tracking and alpha generation
- Rebalancing: Data-driven portfolio optimization
- Greeks Calculation: Delta, Gamma, Theta, Vega analysis
- Volatility Surface: Implied volatility modeling
- Strategy Backtesting: Complex options strategies validation
- Risk Management: Position sizing and hedging
TRADING_HOURS_FILTER = True # Enable market hours filtering
TRADING_START_TIME = "09:15" # Market open time
TRADING_END_TIME = "15:30" # Market close time
ROUND_PRICES = True # Round to whole numbers
SORT_CHRONOLOGICALLY = True # Auto-sort by date/time
VALIDATE_DATA = True # Enable integrity checks
FILL_GAPS = False # Don't interpolate missing data
# Rate limiting (requests per second/minute)
RATE_LIMIT_SEC = 15 # Conservative: 15 req/sec
RATE_LIMIT_MIN = 200 # Conservative: 200 req/min
MAX_RETRIES = 3 # Retry failed requests
RETRY_DELAY = 1.0 # Base delay between retries
OUTPUT_FORMAT = "CSV" # CSV format (JSON planned)
INCLUDE_METADATA = True # Save expiry/instrument JSON files
COMPRESS_OUTPUT = False # Gzip compression (optional)
CUSTOM_COLUMNS = None # Use default column names
# Problem: Invalid or expired tokens
# Solution: Test your token configuration
python test_tokens.py
# Check token expiry and refresh if needed
# Update upstox_tokens.py with fresh tokens
- Check Date Range: Ensure dates have trading data (no weekends/holidays)
- Verify Timeframe: Some timeframes may not be available for all instruments
- Market Hours: Data only available during trading hours (09:15-15:30)
- Instrument Keys: Verify correct index/symbol configuration
- Use Multi-Token: Switch to multi-token extractors for large datasets
- Reduce Date Range: Process smaller chunks for single-token extractors
- Wait Between Runs: Allow rate limit windows to reset (30 minutes)
- Check Token Rotation: Ensure automatic rotation is working
- Check Internet: Verify stable internet connection
- Upstox Status: Check Upstox API status and maintenance windows
- Firewall: Ensure Python can access external APIs
- Retry Logic: System automatically retries failed requests
- Data Directory: Ensure
data/
folder exists and is writable - Disk Space: Check available storage for large datasets
- File Locks: Close Excel/other programs accessing CSV files
- Permissions: Run with appropriate file system permissions
Enable detailed logging for troubleshooting:
# Add to top of extractor files
import logging
logging.basicConfig(level=logging.DEBUG)
- Large Datasets: Use multi-token extractors for >1 month data
- Memory Usage: Process in smaller date chunks if memory limited
- Slow Network: Increase retry delays and reduce concurrent requests
- API Latency: Run during off-peak hours for better response times
Upstox-Data-Downloader/
βββ π README.md # This comprehensive guide
βββ π WORKFLOW_INSTRUCTIONS.md # Detailed step-by-step usage
βββ π VERSION.md # Version info and limitations
βββ π CHANGELOG.md # Version history and updates
βββ π requirements.txt # Python dependencies
βββ π upstox_tokens.py # Centralized token configuration
βββ π test_tokens.py # Token validation utility
βββ π upstox_index_extractor.py # Index data extraction
βββ π upstox_options_extractor.py # Options chain extraction
βββ π upstox_futures_extractor.py # Futures data extraction
βββ π data/ # Output directory
βββ π future_expiry_date.txt # Manual expiry backup
βββ π *.csv # Historical data files
βββ π *.json # Metadata (expiries, instruments)
- Never commit real tokens to version control
- Use environment variables for production deployments
- Rotate tokens regularly to maintain security
- Monitor token usage to detect unauthorized access
- Keep backup tokens for uninterrupted service
- Respect Rate Limits: Conservative settings prevent violations
- Monitor Usage: Track API quota consumption
- Error Handling: Graceful degradation on API issues
- Terms Compliance: Follow Upstox API terms of service
- Data Usage: Ensure proper data usage rights
- Validation Checks: Multiple data integrity verifications
- Backup Strategy: Regular backups of important datasets
- Version Control: Track data extraction configurations
- Audit Trail: Comprehensive logging for compliance
- Quality Assurance: Automated data quality checks
When reporting bugs, please include:
- Python version and operating system
- Error messages and stack traces
- Steps to reproduce the issue
- Expected vs actual behavior
- Configuration details (without sensitive tokens)
For new features, please describe:
- Use case and business justification
- Proposed implementation approach
- Impact on existing functionality
- Alternative solutions considered
- Documentation: Check
WORKFLOW_INSTRUCTIONS.md
for detailed guidance - GitHub Issues: Search existing issues before creating new ones
This project is licensed under the MIT License - see the LICENSE file for details.
- No Financial Advice: This tool is for data extraction only, not financial advice
- Use at Own Risk: Users responsible for their trading decisions
- Data Accuracy: While we strive for accuracy, verify critical data independently
- API Compliance: Users must comply with Upstox API terms of service
- No Warranty: Software provided "as is" without warranty of any kind
- Personal Use: Free for personal trading and research
- Commercial Use: Allowed under MIT license terms
- Redistribution: Permitted with proper attribution
- Modification: Encouraged for customization needs
- Attribution: Please credit this project if you use it
- Upstox API: Professional trading API for Indian markets
- Python: Powerful programming language for data analysis
- Pandas: Essential data manipulation library
- HTTPx: Modern HTTP client for Python
- FlowGuard: Professional rate limiting
This project was born from the frustration of manually extracting market data for algorithmic trading research. We believe that reliable, high-quality data should be accessible to all traders and researchers, not just large institutions.
Special thanks to the beta testers, contributors, and the broader algorithmic trading community for their feedback, suggestions, and support in making this tool better.
β If this project helps your trading or research, please give it a star! β
Built with β€οΈ for the algorithmic trading community
Last Updated: January 2025 | Version: v0.9.0-beta