
Startup

Eric Liu edited this page Feb 24, 2025 · 18 revisions

Inputs

The Startup module has no inputs

Outputs

Special MGRAs

Certain MGRAs contain special kinds of population or housing and must therefore be treated differently. This output table labels three such situations:

  1. MGRAs with prison population
  2. MGRAs with senior only population. This is not referring to retirement home businesses, but rather to full neighborhoods which are restricted to seniors (TODO: Isn't it actually 55+?)
  3. MGRAs with military dependent population (TODO: #25)

Thankfully, no MGRA contains a mix of special and regular populations, so no additional complexity is required to separate them. This means that if an MGRA is labeled male_adult_prison, then that MGRA contains only male adult prisoners. If an MGRA is labeled senior, then it contains only senior population.

ACS Derived Tables

The ACS derived tables share a similar methodology, so they are described together in one large section. For the most part, the methodology involves combining two or more ACS tables to get tract level "rates", which can then be applied to other geographies. For example, if we compute the distribution of households by household income for a tract, we can assume that the household income distribution of every MGRA in that tract roughly matches that of the parent tract.
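As a minimal sketch of the rate idea, the snippet below turns tract level counts into shares and applies them to an MGRA household total. The function names, income categories, and numbers are all hypothetical and are not taken from the Estimates code.

```python
def tract_shares(tract_counts):
    """Convert tract level counts by category into shares that sum to 1."""
    total = sum(tract_counts.values())
    return {category: count / total for category, count in tract_counts.items()}


def apply_tract_shares(shares, mgra_households):
    """Spread an MGRA household total across categories using the parent tract's shares."""
    return {category: share * mgra_households for category, share in shares.items()}


# Hypothetical tract counts of households by income category
tract_income_counts = {"under_50k": 100, "50k_to_100k": 200, "over_100k": 100}
income_shares = tract_shares(tract_income_counts)

# A hypothetical MGRA inside that tract with 40 households
mgra_income_estimate = apply_tract_shares(income_shares, mgra_households=40)
```

The MGRA inherits the shape of the tract distribution while keeping its own household total.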

Also note that Estimates and the ACS have different release schedules. This means that the most recent year of data we release is not based on the same year of ACS data. For example, in the 2023 Estimates, we released data for 2020-2023. The years 2020-2022 were directly derived from the respective year of ACS data. However, for 2023, we re-used 2022 data as the ACS had not yet released new data. This fact is reflected in all the below tables, as the final year of ACS data will be duplicated.
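The year mapping described above can be sketched as follows. This assumes that only the trailing Estimates years lack ACS data, so every year is matched to the latest ACS year that does not exceed it. The function name is hypothetical.

```python
def acs_year_for(estimates_year, latest_acs_year):
    """Return the ACS year to use for a given Estimates year.

    Assumption: ACS data exists for every year up to latest_acs_year, so later
    Estimates years simply re-use the latest available ACS year.
    """
    return min(estimates_year, latest_acs_year)


# The 2023 Estimates example from the text: ACS data was available through 2022
year_mapping = {year: acs_year_for(year, latest_acs_year=2022) for year in range(2020, 2024)}
```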

Tract level occupancy rates by structure type

Occupancy rates are derived from two ACS tables, B25025 | UNITS IN STRUCTURE and B25032 | TENURE BY UNITS IN STRUCTURE. From the first table, we get ACS housing structures by structure type. From the second table, we get ACS households by structure type. Dividing households by housing structures gives the tract level occupancy rate by structure type.

$$\forall t \in \text{San Diego Tracts}, \forall st \in \text{Structure Types}; \text{Occupancy Rate}_{t,st} = \frac{\text{Households}_{t,st}}{\text{Housing Structures}_{t,st}}$$

To avoid divide by zero errors, we set the occupancy rate to NULL whenever the housing structure count is zero. Here we run into a point of conflict between the ACS and SANDAG. For some tracts, the ACS reports no housing structures, so the occupancy rate is NULL, while SANDAG's own data shows housing structures in those tracts. Generally speaking, we trust our housing counts more than the ACS counts, so we still need an occupancy rate for these tracts. For simplicity, we use the regional occupancy rate for the same structure type.

$$\forall st \in \text{Structure Types}; \text{Regional Occupancy Rate}_{st} = \frac{\sum \text{Households}_{st}}{\sum \text{Housing Structures}_{st}}$$

Finally, the most recent year of data is duplicated due to the differing release schedules of SANDAG and the ACS.
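The two formulas and the fallback rule can be sketched together as below. This is an illustration only: the row layout and names are hypothetical, and for brevity the sketch fills every NULL tract rate with the regional rate, whereas the text only requires the fallback for tracts where SANDAG has housing.

```python
def occupancy_rates(acs_rows):
    """Compute occupancy rates keyed by (tract, structure_type).

    acs_rows: list of dicts with hypothetical keys "tract", "structure_type",
    "households", and "housing_structures".
    """
    # Regional totals per structure type, summed over all tracts
    regional_households, regional_structures = {}, {}
    for row in acs_rows:
        st = row["structure_type"]
        regional_households[st] = regional_households.get(st, 0) + row["households"]
        regional_structures[st] = regional_structures.get(st, 0) + row["housing_structures"]

    # Regional rate is None (NULL) if the region has no structures of that type
    regional_rate = {
        st: (regional_households[st] / regional_structures[st]) if regional_structures[st] else None
        for st in regional_structures
    }

    rates = {}
    for row in acs_rows:
        key = (row["tract"], row["structure_type"])
        if row["housing_structures"] == 0:
            # Tract rate would be NULL, so fall back to the regional rate
            rates[key] = regional_rate[row["structure_type"]]
        else:
            rates[key] = row["households"] / row["housing_structures"]
    return rates


# Hypothetical ACS rows for one structure type across three tracts
sample_rows = [
    {"tract": "A", "structure_type": "sfd", "households": 90, "housing_structures": 100},
    {"tract": "B", "structure_type": "sfd", "households": 0, "housing_structures": 0},
    {"tract": "C", "structure_type": "sfd", "households": 30, "housing_structures": 50},
]
sample_rates = occupancy_rates(sample_rows)
```

Tract B has no ACS housing structures, so it receives the regional rate of 120 households over 150 structures.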

City Controls for Population, Housing, and Households

The exact variables contained in our city controls are population by housing type (household, military group quarters, college group quarters, prison group quarters, and all other group quarters) and housing/households by structure type (single family detached, single family multiple units, multiple family, and mobile homes). A large number of these variables are directly copied from or derived from data provided by the California Department of Finance's E-5 product (DOF E-5). We always use the most recent vintage of the DOF E-5 data product.

The only other data source used for our city controls is our official count of dwelling units from the LUDU (Land Use and Dwelling Units) tables. These tables are created directly from San Diego County Tax Assessor data, with some minor corrections regarding exact unit counts and land use codes. We consider our LUDU housing unit counts to be more accurate than what is provided to us by the Census Bureau and from the CA DOF.

The following table gives the data source of each variable or group of variables, along with any relevant transformations:

| Variable(s) | Data Source | Description |
| --- | --- | --- |
| Population by type | CA DOF E-5 | Population by type for each city. Data is a direct copy of CA DOF E-5 |
| Housing by type | LUDU | Housing structures by type for each city. Data is aggregated to the city level from LUDU, but is otherwise a direct copy |
| Households by type | Derived | Households by type for each city. CA DOF E-5 additionally provides a generic occupancy rate for each city. The occupancy rate is applied to the city housing by type to get households by type |

Yes, this does mean that we apply a generic occupancy rate, which covers all housing structures of any type, to housing stock by type. We definitely lose some granularity in the data as a result, and we may change this methodology. For discussion of the issue, see #28.
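A minimal sketch of the derived "Households by type" row, assuming a single city wide occupancy rate expressed as a fraction. The structure type keys and the numbers are hypothetical, not actual DOF E-5 or LUDU values.

```python
def households_by_type(housing_by_type, city_occupancy_rate):
    """Apply one generic city occupancy rate to every structure type."""
    return {
        structure_type: units * city_occupancy_rate
        for structure_type, units in housing_by_type.items()
    }


# Hypothetical LUDU housing counts for one city and a hypothetical generic rate
city_housing = {"single_family_detached": 1000, "multiple_family": 400, "mobile_home": 100}
city_households = households_by_type(city_housing, city_occupancy_rate=0.95)
```

Because the same rate multiplies every structure type, any real difference in occupancy between, say, mobile homes and single family detached homes is lost, which is the granularity concern raised above.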

Regional Group Quarters Age/Sex/Ethnicity Distribution

The regional group quarters age/sex/ethnicity distribution is split by the standard sex/ethnicity categories, but uses single year of age rather than our standard age groups. Additionally, the data is split by every kind of group quarters population, including college, military, prison, and all other group quarters.

With respect to military, the distribution is further split by:

  • Military group quarters, as in those who are actively living in group quarters facilities on military bases
  • Military dependent population, as in the families of active duty military who do not live in group quarters facilities. This is not a group quarters distribution, but it is included here anyway due to the similarity of processing

The reason we require military dependent population is that in a few military bases, such as Camp Pendleton and Miramar (TODO: Verify), families of active duty military live on base. The military dependent age/sex/ethnicity distribution is very different from the general household population distribution.

With respect to prison, the distribution is further split by prison type, including distributions for juvenile male, mixed gender juvenile, male adult, and female adult. In the case that a prison contains both juvenile and adult population, such as in the East Mesa Facilities, the distributions will be mixed according to the population in each separate facility. Also note that there are no juvenile female facilities, which is why there is a mixed gender juvenile distribution.
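One way to read "mixed according to the population in each separate facility" is a population weighted average of the component distributions. The sketch below shows that interpretation only. The function name, category keys, and populations are hypothetical, and the actual mixing logic may differ.

```python
def mix_distributions(distributions, populations):
    """Population weighted average of several distributions.

    distributions: dict of name -> {category: share}, each summing to 1
    populations:   dict of name -> population housed under that distribution
    """
    total_population = sum(populations.values())
    mixed = {}
    for name, distribution in distributions.items():
        weight = populations[name] / total_population
        for category, share in distribution.items():
            mixed[category] = mixed.get(category, 0.0) + weight * share
    return mixed


# Hypothetical complex with a juvenile facility and an adult facility
component_distributions = {
    "juvenile": {"age_16": 0.5, "age_17": 0.5},
    "adult": {"age_30": 1.0},
}
facility_populations = {"juvenile": 100, "adult": 300}
mixed_distribution = mix_distributions(component_distributions, facility_populations)
```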

With respect to all other group quarters, this distribution includes facilities such as orphanages, nursing/convalescent homes, hospitals (TODO: not sure about this one), monasteries, etc. Unfortunately, our group quarters data lacks the level of detail needed to distinguish these facilities from each other, so they are grouped together. This does mean that an orphanage will be assigned some very old residents, since the distributions are mixed together, but that is a limitation of our data.

As to the actual data source, all distributions are derived from a large set of ACS PUMS person level data, specifically the 2011-2015 and 2016-2020 5-year files. We do not use data later than 2020 due to the effect of the COVID-19 pandemic. The distributions are also held constant for all years of the Estimates program. Note that many of these points are currently under discussion. Please see #16.

For each person, the age/sex/ethnicity is encoded using the variables [AGEP], [SEX], [HISP], and [RAC1P]. Group quarters population is determined using [RELP] IN (16, 17) for the 2011-2015 data, and using [RELSHIPP] IN (37, 38) for the 2016-2020 data. Then, further filtering is done to get the specific GQ types:

  • Military group quarters is determined using [ESR] IN (4, 5)
  • College group quarters is determined using [SCHG] IN (15, 16)
  • For lack of better variables in PUMS data to make this determination, prison group quarters are determined by taking the non-disabled ([DIS] = 2) institutionalized ([RELP] = 16 for 2011-2015, [RELSHIPP] = 37 for 2016-2020) population
  • Also for lack of better variables, all other group quarters are determined by taking the disabled ([DIS] = 1) institutionalized ([RELP] = 16 for 2011-2015, [RELSHIPP] = 37 for 2016-2020) population
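The filters above can be sketched as a single classification function. The variable names and codes come from the list above, but the order in which the checks are applied and the handling of records that match no rule are assumptions made for this sketch, as are the function name and the record layout.

```python
def gq_type(person, vintage):
    """Classify a PUMS person record into a group quarters type, or None.

    person:  dict of PUMS variables (ESR, SCHG, DIS, plus RELP or RELSHIPP)
    vintage: "2011-2015" or "2016-2020"
    """
    if vintage == "2011-2015":
        in_gq = person["RELP"] in (16, 17)
        institutionalized = person["RELP"] == 16
    else:
        in_gq = person["RELSHIPP"] in (37, 38)
        institutionalized = person["RELSHIPP"] == 37

    if not in_gq:
        return None
    # Assumption: military and college are checked before the institutional split
    if person["ESR"] in (4, 5):
        return "military"
    if person["SCHG"] in (15, 16):
        return "college"
    if institutionalized and person["DIS"] == 2:
        return "prison"  # non-disabled institutionalized population
    if institutionalized and person["DIS"] == 1:
        return "other"  # disabled institutionalized population
    return None  # assumption: unmatched group quarters records are left unclassified


# Hypothetical records
example_prison = gq_type({"RELP": 16, "ESR": 6, "SCHG": None, "DIS": 2}, "2011-2015")
example_college = gq_type({"RELSHIPP": 38, "ESR": 6, "SCHG": 15, "DIS": 2}, "2016-2020")
```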

Additional work is done on the prison distribution, where it is further split based on age/sex thresholds. For example, the male juvenile distribution is determined by filtering out female population and any population over the age of 18. As another example, the female adult distribution is determined by filtering out male population and any population aged 18 or younger.
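A sketch of the age/sex splits, using the two thresholds given as examples above. The male adult and mixed gender juvenile rules are inferred by analogy and are assumptions, as are the distribution names. The [SEX] codes follow the PUMS convention of 1 for male and 2 for female.

```python
MALE, FEMALE = 1, 2      # PUMS [SEX] codes
JUVENILE_MAX_AGE = 18    # from the examples: juveniles exclude anyone over 18

# Each rule returns True when a prison record belongs in that distribution
PRISON_FILTERS = {
    "male_juvenile": lambda p: p["SEX"] == MALE and p["AGEP"] <= JUVENILE_MAX_AGE,
    # Assumption: mixed gender juvenile keeps both sexes under the same age cap
    "mixed_gender_juvenile": lambda p: p["AGEP"] <= JUVENILE_MAX_AGE,
    # Assumption: male adult mirrors the female adult rule
    "male_adult": lambda p: p["SEX"] == MALE and p["AGEP"] > JUVENILE_MAX_AGE,
    "female_adult": lambda p: p["SEX"] == FEMALE and p["AGEP"] > JUVENILE_MAX_AGE,
}


def split_prison_population(persons):
    """Return the records feeding each prison distribution."""
    return {
        name: [p for p in persons if keep(p)]
        for name, keep in PRISON_FILTERS.items()
    }


# Hypothetical prison records
prison_records = [
    {"SEX": MALE, "AGEP": 17},
    {"SEX": FEMALE, "AGEP": 16},
    {"SEX": MALE, "AGEP": 35},
    {"SEX": FEMALE, "AGEP": 19},
]
prison_splits = split_prison_population(prison_records)
```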

Age filtering is additionally done on all group quarters distributions. For example, the military group quarters has a cap on the age which can be found in (TODO: link). This is not saying that once you are over the age of (TODO), you are fired from the military. Remember that this is specifically about the military living in group quarters, or in dorms. Above a certain age, most military either live outside the dorms or have left the military. To see all age restrictions, see the code at (TODO. My thought here is that these numbers may be updated, so better to refer directly to the code)

Finally, distributions are actually calculated from the population weight variable [PWGTP]. Although not the exact meaning of the variable, you can think of it as being the number of people who have the listed characteristics. For every population type, the total [PWGTP] is computed and used to compute the distribution of [PWGTP] across the different age/sex/ethnicity categories.
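The weighting step can be sketched as below: sum [PWGTP] within each age/sex/ethnicity cell, then divide by the total [PWGTP] for the population type. The "ethnicity" key is a hypothetical stand-in for a category already derived from [HISP] and [RAC1P], and the records are invented for illustration.

```python
from collections import defaultdict


def weighted_distribution(persons):
    """Share of total PWGTP falling in each (age, sex, ethnicity) cell."""
    cell_weights = defaultdict(float)
    for person in persons:
        cell = (person["AGEP"], person["SEX"], person["ethnicity"])
        cell_weights[cell] += person["PWGTP"]

    total_weight = sum(cell_weights.values())
    return {cell: weight / total_weight for cell, weight in cell_weights.items()}


# Hypothetical records for one population type
pums_records = [
    {"AGEP": 19, "SEX": 1, "ethnicity": "hispanic", "PWGTP": 30},
    {"AGEP": 19, "SEX": 1, "ethnicity": "hispanic", "PWGTP": 20},
    {"AGEP": 20, "SEX": 2, "ethnicity": "white", "PWGTP": 50},
]
gq_distribution = weighted_distribution(pums_records)
```

Two records fall in the same cell here, so their weights are added before the shares are computed.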

Region Controls

There are two regional control tables which each contain different sets of variables. The first regional control table contains population data split by population type, age, sex, and race/ethnicity. The second regional control table contains household characteristics data, with columns for households by household size, by presence of children, and by number of workers.

Regional population by type, age, sex, and race/ethnicity controls

TODO

Regional household characteristics controls

TODO

Description

TODO
