This package contains scripts for processing, interpolating, and forecasting population data from census years.
- Python 3.6+
- Required packages:
  - pandas
  - numpy
  - scipy
  - matplotlib
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install required packages:
pip install pandas numpy scipy matplotlib
- Set up R environment:
- Install R from CRAN.
- Install required R packages:
Rscript -e "install.packages(c('dplyr', 'ggplot2', 'tidyr'), repos='https://cran.rstudio.com/')"
- Copy the nordpred.s file to your working directory (required for nordpred analysis).
Note: Use python3 in place of python if that is how Python is installed on your system. Adding a shell alias for your Python interpreter is suggested for ease of use.
- Navigate to the project directory:
cd <workspace>/nordpred-india-forecasts
- Run the script:
python reading-data/read_diabetes_data.py <input_csv> --output-base <output_base>
  - <input_csv>: Path to the input CSV file (e.g., 18-groups-1991-2021-input-data/Nagaland-18groups-1991-2021.csv).
  - <output_base>: Base name for the output files (e.g., NewScripts/processed-files/nagaland_processed).
If --output-base is not provided, the script will prompt you to enter a base name.
- Output:
- The script generates two files:
<output_base>_male.txt
<output_base>_female.txt
- These files are in nordpred format (tab-separated text, years as columns, age groups as rows).
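As a rough illustration of the layout described above (tab-separated text, years as columns, age groups as rows), the following sketch writes a small table in that shape. The function name `write_nordpred_table`, the age-group labels, and the counts are made up for illustration. The sketch also assumes a header line of years followed by one labelled line per age group; the exact header and row-label conventions used by read_diabetes_data.py may differ.

```python
import io


def write_nordpred_table(rows, years, out):
    """Write {age_group: [value per year]} as tab-separated text.

    Assumed layout (not confirmed by the script): a header line holding
    only the years, then one line per age group that starts with the
    age-group label.
    """
    out.write("\t".join(str(y) for y in years) + "\n")
    for age_group, values in rows.items():
        if len(values) != len(years):
            raise ValueError(f"{age_group}: expected {len(years)} values")
        out.write("\t".join([age_group] + [str(v) for v in values]) + "\n")


# Illustrative data only; these are not real counts.
years = [1991, 1992, 1993]
rows = {"0-4": [10, 12, 11], "5-9": [20, 19, 22]}

buffer = io.StringIO()
write_nordpred_table(rows, years, buffer)
table_text = buffer.getvalue()
print(table_text)
```

A header with one field fewer than the data lines is the shape that R's read.table treats as "first column holds row names", which is why the sketch leaves the header's first cell out.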
To process the Nagaland data:
python3 reading-data/read_diabetes_data.py 18-groups-1991-2021-input-data/Nagaland-18groups-1991-2021.csv --output-base NewScripts/processed-files/nagaland_processed
To process the Global data:
python3 reading-data/read_diabetes_data.py 18-groups-1991-2021-input-data/GlobalType1-18groups-1991-2021.csv --output-base NewScripts/processed-files/global_processed
- The script automatically detects the CSV header format (multi-header or single-header) and extracts the year column accordingly.
- Confidence intervals in the data are handled by extracting the first value from each cell.
- Ensure that the age groups in your CSV file match those the script expects. To see what the script expects, open nordpred-india-forecasts/reading-data/read_diabetes_data.py and check age_map.
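The confidence-interval handling noted above can be pictured with a small sketch. It assumes cells look like `12.3 (10.1-14.5)`, which is a guess at the input format rather than something this README states, and `first_value` is a hypothetical helper, not a function from read_diabetes_data.py.

```python
import re

# Matches the first integer or decimal number in a string.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def first_value(cell):
    """Return the first number found in a cell as a float, or None.

    Assumption: the point estimate comes first and any confidence
    interval follows it, e.g. "12.3 (10.1-14.5)".
    """
    match = _NUMBER.search(str(cell))
    return float(match.group()) if match else None


estimate = first_value("12.3 (10.1-14.5)")
plain = first_value("7")
empty = first_value("n/a")
print(estimate, plain, empty)
```

Cells that use thousands separators or another interval notation would need a different pattern.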
The population processing script (process-population.py) processes census data and generates population predictions.
- Census Data Files:
  - 1991.csv: 1991 census data
  - 2001.csv: 2001 census data
  - 2011.csv: 2011 census data
  - These files should be in the input directory
  - CSV format with columns: State, Age, Males, Females
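To make the expected CSV layout concrete, here is a minimal sketch that reads a file with the State, Age, Males, Females columns and pulls out one state and gender. The sample rows are invented, and `population_by_age` is a hypothetical helper; process-population.py may read the files differently (for example with pandas).

```python
import csv
import io

# Invented sample rows in the documented column layout.
SAMPLE = """State,Age,Males,Females
Goa,0-4,100,95
Goa,5-9,110,104
Nagaland,0-4,80,78
"""


def population_by_age(csv_file, state, gender):
    """Return {age_group: count} for one state and gender."""
    column = {"male": "Males", "female": "Females"}[gender.lower()]
    result = {}
    for row in csv.DictReader(csv_file):
        # Compare state names case-insensitively (an assumption).
        if row["State"].strip().lower() == state.lower():
            result[row["Age"].strip()] = int(row[column])
    return result


goa_males = population_by_age(io.StringIO(SAMPLE), "Goa", "Male")
print(goa_males)
```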
python NewPopulationScripts/process-population.py --state <state> --gender <gender> --input-dir <directory> --output-dir <directory>
python NewPopulationScripts/process-population.py --state <state> --gender <gender> --input-dir <directory> --output-dir <directory> [--start-year <year>] [--end-year <year>] [--forecast-years <years>]
Options:
- --state: State name (e.g., Goa)
- --gender: Gender (Male/Female)
- --input-dir: Directory containing census CSV files
- --output-dir: Directory to save output files (default: "output")
- --start-year: Start year for interpolation (default: 1990)
- --end-year: End year for interpolation (default: 2021)
- --forecast-years: Comma-separated list of years to forecast (default: "2025,2030,2035,2040")
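For readers who want to see how these options fit together, the sketch below mirrors the documented flags and defaults with argparse. It is not the parser from process-population.py: which flags are required, the `choices` restriction on --gender, and the helper names are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (a sketch)."""
    parser = argparse.ArgumentParser(description="Population processing (sketch)")
    parser.add_argument("--state", required=True, help="State name, e.g. Goa")
    parser.add_argument("--gender", required=True, choices=["Male", "Female"])
    parser.add_argument("--input-dir", required=True,
                        help="Directory containing census CSV files")
    parser.add_argument("--output-dir", default="output")
    parser.add_argument("--start-year", type=int, default=1990)
    parser.add_argument("--end-year", type=int, default=2021)
    parser.add_argument("--forecast-years", default="2025,2030,2035,2040")
    return parser


def parse_forecast_years(text):
    """Turn "2025,2030" into [2025, 2030]."""
    return [int(part) for part in text.split(",") if part.strip()]


args = build_parser().parse_args(
    ["--state", "Goa", "--gender", "Male", "--input-dir", "NewPopulationScripts"]
)
forecast_years = parse_forecast_years(args.forecast_years)
print(args.output_dir, args.start_year, args.end_year, forecast_years)
```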
- Historical Population: population-{gender}-{state}.txt
  - Contains interpolated population data from 1991 to 2021
  - Space-separated values
  - Years as columns, age groups as rows
  - Example: population-male-goa.txt
- Predicted Population: population-{gender}-{state}-pred.txt
  - Contains population predictions for 2025-2040
  - Space-separated values
  - Years as columns, age groups as rows
  - Example: population-male-goa-pred.txt
- Visualization: population_forecast.png
  - Shows historical and predicted population trends
  - Includes all age groups
  - Historical data (solid lines) and predictions (dashed lines)
python NewPopulationScripts/process-population.py --state Goa --gender Male --input-dir NewPopulationScripts --output-dir output
This will:
- Read census data from NewPopulationScripts/1991.csv, 2001.csv, and 2011.csv
- Process data for Goa, male population
- Generate interpolated data (1991-2021)
- Create population predictions (2025-2040)
- Save output files in the output directory
- Create visualization of the population trends
The script generates the following output files in the specified output directory:
- {gender}-{state}.txt: Contains interpolated population data from 1991 to 2021
- {gender}-{state}-pred.txt: Contains forecasted population data for future years
- population_forecast.png: Visualization of population trends
- population_forecast_log_scale.png: Log-scale visualization of population trends
To process male population data for Nagaland:
python3 process-population.py --state "Nagaland" --gender Male --input-dir ../population-interpolation-forecast-scripts
This will:
- Read census data from 1991.csv, 2001.csv, and 2011.csv in the input directory
- Process data for Nagaland, male population
- Generate interpolated data from 1995 to 2020
- Create population predictions for 2025-2045
- Save output files in the output directory
- Create visualization of the population trends
- The script uses cubic spline interpolation for years between census data
- For forecasting, it uses cubic spline or linear interpolation
- All population values are rounded to integers
- Negative values are not allowed in the output
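The notes above can be sketched with SciPy's `CubicSpline`. This is only an illustration of the general approach (spline through the census years, round to integers, clip negatives to zero), not the code in process-population.py. The counts are invented, and `interpolate_population` is a hypothetical name. With three census points and SciPy's default boundary condition the spline reduces to a parabola, and any year after 2011 is an extrapolation of that curve.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def interpolate_population(census_years, counts, start_year, end_year):
    """Return (years, integer counts) for every year in the range."""
    spline = CubicSpline(census_years, counts)  # extrapolates by default
    years = np.arange(start_year, end_year + 1)
    values = np.rint(spline(years))                 # round to whole people
    values = np.clip(values, 0, None).astype(int)   # no negative counts
    return years, values


# Invented counts for one age group at the three census years.
census_years = [1991, 2001, 2011]
counts = [1000.0, 1200.0, 1500.0]

years, values = interpolate_population(census_years, counts, 1991, 2021)
print(dict(zip(years.tolist(), values.tolist())))
```

Clipping after rounding keeps the output non-negative even when the fitted curve dips below zero for small age groups.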
The nordpred analysis script (run-nordpred-analysis.R) performs age-standardized rate predictions using the nordpred package.
- Cases file: {state}-t1_{gender}.txt
  - Contains incidence data
  - Space-separated values
  - Years as columns, age groups as rows
  - Example: goa-t1_male.txt
- Historical Population: population-{gender}-{state}.txt
  - Contains historical population data
  - Space-separated values
  - Years as columns, age groups as rows
  - Example: population-male-goa.txt
- Predicted Population: population-{gender}-{state}-pred.txt
  - Contains future population predictions
  - Space-separated values
  - Years as columns, age groups as rows
  - Example: population-male-goa-pred.txt
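Before running the R script, it can help to confirm that all three input files exist under the names listed above. The sketch below builds those names from a state and gender and reports any that are missing. The helper names are hypothetical, and lowercasing the state and gender is an assumption based on the examples (goa, male).

```python
from pathlib import Path


def nordpred_input_paths(input_dir, state, gender):
    """Return the three expected input paths for one state and gender."""
    base = Path(input_dir)
    state, gender = state.lower(), gender.lower()  # assumed from the examples
    return {
        "cases": base / f"{state}-t1_{gender}.txt",
        "population": base / f"population-{gender}-{state}.txt",
        "population_pred": base / f"population-{gender}-{state}-pred.txt",
    }


def missing_inputs(input_dir, state, gender):
    """List expected input files that are not present on disk."""
    paths = nordpred_input_paths(input_dir, state, gender)
    return [str(path) for path in paths.values() if not path.is_file()]


paths = nordpred_input_paths("test", "goa", "male")
for label, path in paths.items():
    print(label, path)
```

Calling `missing_inputs("test", "goa", "male")` returns an empty list when all three files are in place.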
Rscript run-nordpred-analysis.R --input-dir <directory> --state <state> --gender <gender> --plot-type <type>
Options:
- --input-dir: Directory containing input files (default: "test")
- --state: State name (e.g., goa)
- --gender: Gender (male/female)
- --plot-type: Type of plot to generate
  - main: Main prediction plot (default)
  - trends: Trend scenarios plot
  - both: Generate both plots
The trends plot shows three different prediction scenarios:
- No trend (solid black line): Assumes no change in rates
- Full trend (dashed red line): Uses the full observed trend
- Recent trend (dotted blue line): Uses a weighted trend, giving more weight to recent years
- nordpred_plot_{state}_{gender}.png: Main prediction plot
- nordpred_trends_{state}_{gender}.png: Trend scenarios plot
- nordpred_predictions_{state}_{gender}.csv: Predicted rates
To process the Nagaland data:
python3 reading-data/read_diabetes_data.py 18-groups-1991-2021-input-data/Nagaland-18groups-1991-2021.csv --output-base nagaland_type1
To process the Global data:
python3 reading-data/read_diabetes_data.py 18-groups-1991-2021-input-data/GlobalType1-18groups-1991-2021.csv --output-base NewScripts/processed-files/global_processed
To run nordpred analysis for Goa:
Rscript run-nordpred-analysis.R --input-dir test --state goa --gender male --plot-type both
This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See the LICENSE file for details.
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.