Description
This is motivated by the desire to capture the NSSP dataset more accurately: it appears to report county-level information, but it actually reports HSA-level information, repeating the same value for each county in an HSA.
- We have some HSA info in the GeoMapper resource files:
  - https://github.com/cmu-delphi/covidcast-indicators/blob/main/_delphi_utils_python/delphi_utils/data/2019/zip_hsa_table.csv
  - https://github.com/cmu-delphi/covidcast-indicators/blob/main/_delphi_utils_python/delphi_utils/data/2020/zip_hsa_table.csv
  - (both files were created by our custom geo data processing code)
- However, these files are not currently used in the GeoMapper itself.
- Further complicating this process, the NSSP dataset actually uses "NCI Modified" HSA definitions as described at https://seer.cancer.gov/seerstat/variables/countyattribs/hsa.html .
  - We may want to consider adding this as a separate geographic region type, like "`hsa_nci`" or "`nci`".
  - As an example of the difference between these "versions" of HSA:
    - Beaver and Allegheny counties in Pennsylvania are reported in the NSSP dataset with the same `hsa_nci_id` of 42.
      - Ambridge in Beaver County uses ZIP code 15003, which our mapping file assigns to the "proper" HSA 39111 (not "42"!).
      - CMU in Allegheny County has ZIP code 15213, which our mapping file assigns to a different "proper" HSA, 39098 (also not "42").
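
To make the mismatch concrete, here is a minimal sketch of the two lookups involved. It is not GeoMapper code. The CSV column names (`zip`, `hsa`), the `value` field, its numbers, and both helper functions are assumptions made for illustration. Only the ZIP codes, the "proper" HSA ids, and the shared `hsa_nci_id` of 42 come from the example above.

```python
import csv
import io
from collections import defaultdict

# Assumed layout of zip_hsa_table.csv. The column names are a guess and have
# not been checked against the real file; the two rows are the examples above.
ZIP_HSA_CSV = """zip,hsa
15003,39111
15213,39098
"""

# Toy NSSP-style rows. The "value" field and its numbers are made up; only the
# two counties and hsa_nci_id 42 come from the issue text.
NSSP_ROWS = [
    {"county": "Beaver", "hsa_nci_id": "42", "value": 1.5},
    {"county": "Allegheny", "hsa_nci_id": "42", "value": 1.5},
]


def load_zip_to_hsa(csv_text):
    """Return a dict mapping ZIP code -> "proper" HSA id (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["zip"]: row["hsa"] for row in reader}


def collapse_to_hsa_nci(rows):
    """Collapse county rows to one entry per hsa_nci_id (hypothetical helper).

    Raises ValueError if counties in the same HSA disagree, since the dataset
    is expected to repeat a single HSA-level value for each county.
    """
    values = {}
    counties = defaultdict(list)
    for row in rows:
        key = row["hsa_nci_id"]
        if key in values and values[key] != row["value"]:
            raise ValueError(f"conflicting values within hsa_nci_id {key}")
        values[key] = row["value"]
        counties[key].append(row["county"])
    return {key: {"value": values[key], "counties": counties[key]} for key in values}


zip_to_hsa = load_zip_to_hsa(ZIP_HSA_CSV)
by_hsa_nci = collapse_to_hsa_nci(NSSP_ROWS)

# The two ZIPs fall in different "proper" HSAs, yet both counties share one
# NCI-modified HSA, so the two id spaces cannot be treated as the same geo type.
print(zip_to_hsa["15003"], zip_to_hsa["15213"], sorted(by_hsa_nci))
```

Collapsing on `hsa_nci_id` instead of on county would let us store each reported value once per region, which is one argument for treating the NCI-modified HSA as its own geo type.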