Grace Rade, Maeve Tyler-Penny, Julia Ting
library(actLifer)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
The actLifer package contains functions to create actuarial life tables and three datasets ready to be made into a life table. Each mathematical step in transforming mortality data into life expectancy has a corresponding function, which builds the table up to that step. The datasets have been prepared and are ready to use with our functions.
Inspiration
Mathematically speaking, mortality data is the first step in calculating life expectancy. There are several intermediate calculations between the number of deaths at a given age and life expectancy, and each step builds on the previous values. With this in mind, we created several functions that each calculate one intermediate value and that, combined, build a complete actuarial life table. Life tables are fairly easy to create in a spreadsheet, but building one in R is a rather involved process. Our functions reduce the procedure to a single function call, with the option to group by different categorical variables.
The actLifer package is a useful tool for anyone who works with mortality data, wants to calculate life expectancy, or wants to find any of the intermediate values between number of deaths and life expectancy.
Functions
All of the functions take in a dataset that has columns for age group (\(x\)), deaths at each age (\(D_x\)), and the midyear population at each age (\(P_x\)).
- central_death_rate(): Calculates the central/crude death rate, \(M_x\), which is the number of deaths in a given period divided by the population at risk in that same period.
  - Formula: \(M_x = \frac{D_x}{P_x}\)
  - This is an optional column in the life table, but it can give a general indication of the health status of a given area or population.
- conditional_death_prob(): Calculates the conditional probability of death at each age (\(q_x\)), which is the probability of dying at a certain age within a given period.
  - Formula: \(q_x = \frac{D_x}{P_x + \frac{D_x}{2}}\)
- conditional_life_prob(): Calculates the conditional probability of life at each age (\(p_x\)), which is the probability of surviving through a certain age within a given period.
  - Formula: \(p_x = 1 - q_x\)
  - Please note that the printed output may display the conditional probability of life rounded to 1. This does not cause problems in later calculations.
- number_to_survive(): Calculates the number of people to survive to a given age interval (\(l_x\)), starting with an arbitrary number of 100,000 at age 0 (or age < 1).
  - Formula: \(l_x = l_{x-1} \cdot p_{x-1}\), with \(l_0 = 100{,}000\)
- prop_to_survive(): Calculates the proportion of the population surviving to age \(x\).
  - Formula: \(l_x/100000\)
  - This is another optional column in the life table, and it can be removed after all of the calculations are completed.
- person_years(): Calculates the person years lived at each age (\(L_x\)), which is the total number of years lived at each age \(x\) by all people who survive to that age.
  - Formula (with deaths assumed to be spread evenly across the age interval): \(L_x = \frac{l_x + l_{x+1}}{2}\)
- total_years_lived(): Calculates the total years lived from each age \(x\) onward (\(T_x\)), which is the sum of all person years from age \(x\) to the last age in the table, \(\omega\).
  - Formula: \(T_x = \sum_{i = x}^{\omega} L_i\)
- life_expectancy(): Calculates the life expectancy at age \(x\) (\(e_x\)), which is the number of years an average person is expected to live beyond their current age.
  - Formula: \(e_x = \frac{T_x}{l_x}\)
  - This function outputs a complete life table, without the added customization of the lifetable() function.
- lifetable(): Outputs a complete life table with the ability to customize which of the optional columns are included and to add extra grouping variables.
  - If includeAllSteps = TRUE, the life table will include CentralDeathRate and PropToSurvive in the final output.
  - If includeCDR = FALSE, CentralDeathRate will not be included in the final output.
  - If includePS = FALSE, PropToSurvive will not be included in the final output.
  - includeAllSteps, includeCDR, and includePS are all TRUE by default.
example <- lifetable(mortality2, "age_group", "population", "deaths")
head(example, 5)
#> # A tibble: 5 × 11
#> age_group deaths population CentralDeathRate ConditionalProbDeath
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 < 1 year 23161 3970145 0.00583 0.00582
#> 2 1 year 1568 3995008 0.000392 0.000392
#> 3 2 years 1046 3992154 0.000262 0.000262
#> 4 3 years 791 3982074 0.000199 0.000199
#> 5 4 years 640 3987656 0.000160 0.000160
#> # ℹ 6 more variables: ConditionalProbLife <dbl>, NumberToSurvive <dbl>,
#> # PropToSurvive <dbl>, PersonYears <dbl>, TotalYears <dbl>,
#> # LifeExpectancy <dbl>
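As a hand check of the first few formulas listed above, the sketch below recomputes the central death rate, the conditional probabilities of death and life, and the number of survivors in base R, using the five mortality2 rows printed in this vignette. It does not call actLifer, and the column names M_x, q_x, p_x, and l_x are illustrative labels rather than the package's output names.

```r
# First five rows of mortality2, copied from the output printed in this vignette.
toy <- data.frame(
  age_group  = c("< 1 year", "1 year", "2 years", "3 years", "4 years"),
  deaths     = c(23161, 1568, 1046, 791, 640),
  population = c(3970145, 3995008, 3992154, 3982074, 3987656)
)

# Central death rate: M_x = D_x / P_x
toy$M_x <- toy$deaths / toy$population

# Conditional probability of death: q_x = D_x / (P_x + D_x / 2)
toy$q_x <- toy$deaths / (toy$population + toy$deaths / 2)

# Conditional probability of life: p_x = 1 - q_x
toy$p_x <- 1 - toy$q_x

# Number to survive: l_0 = 100000 and l_x = l_{x-1} * p_{x-1}
toy$l_x <- 100000 * cumprod(c(1, head(toy$p_x, -1)))

print(toy[, c("age_group", "M_x", "q_x", "p_x", "l_x")], digits = 3)
```

The M_x and q_x values for the first row should agree with the 0.00583 and 0.00582 shown in the CentralDeathRate and ConditionalProbDeath columns above.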
Calculating life expectancy is an iterative process, building on the previous intermediate calculations. Each function calls the function for the previous step as it executes, so the output dataset includes the columns from all previous steps. For this reason, there is no need to run each step individually on a dataset. Simply run the function for the last step that you want to complete.
Central Death Rate is an optional column that the other step functions do not compute, so central_death_rate() must be called in addition to them when that column is needed.
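The last three steps of the chain can be sketched the same way. The example below is a minimal illustration under stated assumptions, not the package's implementation: the survivor counts are hypothetical, person years use the midpoint average of \(l_x\) and \(l_{x+1}\), nobody is assumed to survive past the last age shown, and \(T_x\) is taken as the person years lived at age \(x\) and above, so that \(e_x = T_x / l_x\) measures remaining years.

```r
# Hypothetical survivors l_x at four consecutive ages (not package data).
l_x <- c(100000, 80000, 50000, 20000)

# Person years under the midpoint assumption: the average of l_x and l_{x+1}.
# Assumption: nobody survives past the last age, so l_{x+1} = 0 there.
l_next <- c(l_x[-1], 0)
L_x <- (l_x + l_next) / 2

# Person years lived at age x and above: reverse cumulative sum of L_x.
T_x <- rev(cumsum(rev(L_x)))

# Remaining life expectancy at each age.
e_x <- T_x / l_x

data.frame(l_x, L_x, T_x, e_x)
```

Each step only needs the column produced by the step before it, which is why running the function for the last step is enough to build the whole table.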
Datasets
The package includes three datasets, all sourced from the CDC Wonder Database (https://wonder.cdc.gov/ucd-icd10.html).
- mortality contains data from the year 2018 with single-year age groups
- mortality2 contains data from the year 2016 with single-year age groups
- mortality3 contains data from the year 2016 with single-year age groups and a gender grouping variable
What Do These Datasets Look Like?
Each of the included datasets includes an age group variable, a population variable, and a deaths variable. Population represents the mid-year population for each age group. Deaths represents the number of people in each age group who have died.
Here’s what the first five rows of mortality2 look like.
#> # A tibble: 5 × 3
#> age_group deaths population
#> <chr> <dbl> <dbl>
#> 1 < 1 year 23161 3970145
#> 2 1 year 1568 3995008
#> 3 2 years 1046 3992154
#> 4 3 years 791 3982074
#> 5 4 years 640 3987656
Who Should Use This Package?
This package can be used by researchers, actuaries, or anyone who is working with mortality data. It can be particularly useful for those wanting to calculate the life expectancy of specific groups, as life expectancy data for sub-groups of the total population of a given area is difficult to find. Additionally, our package can be used to compare life expectancy at different points in time, such as before and after the COVID-19 pandemic.
What Can We Do With This Data?
We can use this package to address questions such as:
How does life expectancy differ between population groups?
Is there a specific age-range where life expectancy dramatically changes?
Does the central death rate significantly differ from the probability of death at a certain age?
And many more!
Example 1:
How does life expectancy differ between population groups?
The built-in dataset mortality3 provides a gender variable that can be used to group the data. The lifetable() function allows for extra grouping arguments, so that is the function we will use.
- Please note that gender is the variable name that the CDC uses to mean biological sex (Male, Female).
lifetable(mortality3, "age_group", "population", "deaths", FALSE, FALSE, FALSE, "gender")
#> # A tibble: 170 × 6
#> # Groups: "gender" [1]
#> age_group gender deaths population `"gender"` LifeExpectancy
#> <chr> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 < 1 year Female 10294 1939667 gender 139.
#> 2 < 1 year Male 12867 2030478 gender 138.
#> 3 1 year Female 694 1953850 gender 138.
#> 4 1 year Male 874 2041158 gender 137.
#> 5 2 years Female 474 1949132 gender 136.
#> 6 2 years Male 572 2043022 gender 135.
#> 7 3 years Female 323 1947408 gender 134.
#> 8 3 years Male 468 2034666 gender 133.
#> 9 4 years Female 298 1950127 gender 132.
#> 10 4 years Male 342 2037529 gender 131.
#> # ℹ 160 more rows
The output is a tibble data frame that has calculated life expectancy for each gender. From this we can see any differences in life expectancy between males and females.
Users can use many extra grouping variables to get even more specific with population subgroups. Some suggested variables include (but are not limited to) state/geographic area, race, sex, income group, or health status.
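To illustrate what grouping means for the calculation, the base-R sketch below computes the conditional probability of death and the number of survivors separately for each gender, using the ten mortality3 rows printed above. It is an independent illustration, not the package's implementation, and the column names q_x, p_x, and l_x are illustrative.

```r
# First ten rows of mortality3, copied from the output printed in this vignette.
sub <- data.frame(
  age_group  = rep(c("< 1 year", "1 year", "2 years", "3 years", "4 years"),
                   each = 2),
  gender     = rep(c("Female", "Male"), times = 5),
  deaths     = c(10294, 12867, 694, 874, 474, 572, 323, 468, 298, 342),
  population = c(1939667, 2030478, 1953850, 2041158, 1949132,
                 2043022, 1947408, 2034666, 1950127, 2037529)
)

# Row-wise steps do not depend on the grouping.
sub$q_x <- sub$deaths / (sub$population + sub$deaths / 2)
sub$p_x <- 1 - sub$q_x

# The survivor column is cumulative, so it must restart at 100000 within
# each gender. ave() applies the function to each group's rows in place.
sub$l_x <- ave(sub$p_x, sub$gender,
               FUN = function(p) 100000 * cumprod(c(1, head(p, -1))))

print(sub[, c("age_group", "gender", "q_x", "l_x")], digits = 5)
```

Only the cumulative columns need the grouping. A row-wise quantity such as q_x is the same whether or not the data are grouped.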
Example 2:
Is there a specific age-range where life expectancy dramatically changes?
The mortality2 dataset has the age grouped in single-year intervals. We can use this dataset to see if life expectancy changes dramatically from one interval to the next.
lifetable(mortality2, "age_group", "population", "deaths", TRUE, FALSE, FALSE)
#> # A tibble: 85 × 9
#> age_group deaths population ConditionalProbDeath ConditionalProbLife
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 < 1 year 23161 3970145 0.00582 0.994
#> 2 1 year 1568 3995008 0.000392 1.00
#> 3 2 years 1046 3992154 0.000262 1.00
#> 4 3 years 791 3982074 0.000199 1.00
#> 5 4 years 640 3987656 0.000160 1.00
#> 6 5 years 546 4032515 0.000135 1.00
#> 7 6 years 488 4029655 0.000121 1.00
#> 8 7 years 511 4029991 0.000127 1.00
#> 9 8 years 483 4159114 0.000116 1.00
#> 10 9 years 462 4178524 0.000111 1.00
#> # ℹ 75 more rows
#> # ℹ 4 more variables: NumberToSurvive <dbl>, PersonYears <dbl>,
#> # TotalYears <dbl>, LifeExpectancy <dbl>
From the abbreviated output, we can see that life expectancy does not change dramatically from year to year.
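One way to answer this question numerically is to take the year-to-year differences of the LifeExpectancy column and look for the largest one. The sketch below uses a short vector of hypothetical values, not package output, and the object names are illustrative.

```r
# Hypothetical life expectancies at five consecutive ages (not package output).
e_x <- c(78.6, 78.0, 77.1, 76.2, 74.9)

# Change in life expectancy from each age to the next.
change <- diff(e_x)

# Position of the largest absolute change: interval i runs from age i to i + 1.
biggest <- which.max(abs(change))

biggest
change[biggest]
```

With a real life table, the same two lines can be applied to its LifeExpectancy column in place of the hypothetical vector.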
Example 3:
Does the central death rate significantly differ from the probability of death at a certain age?
Central Death Rate (also known as the Crude Death Rate or Mortality Rate) is not a necessary intermediate step for calculating life expectancy, so the conditional_death_prob() function does not call central_death_rate(). To compare the two measures, we have to run both functions.
mort <- mortality2 %>%
  central_death_rate("age_group", "population", "deaths") %>%
  conditional_death_prob("age_group", "population", "deaths")
head(mort)
#> # A tibble: 6 × 5
#> age_group deaths population CentralDeathRate ConditionalProbDeath
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 < 1 year 23161 3970145 0.00583 0.00582
#> 2 1 year 1568 3995008 0.000392 0.000392
#> 3 2 years 1046 3992154 0.000262 0.000262
#> 4 3 years 791 3982074 0.000199 0.000199
#> 5 4 years 640 3987656 0.000160 0.000160
#> 6 5 years 546 4032515 0.000135 0.000135
tail(mort)
#> # A tibble: 6 × 5
#> age_group deaths population CentralDeathRate ConditionalProbDeath
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 79 years 62081 1439937 0.0431 0.0422
#> 2 80 years 64987 1358260 0.0478 0.0467
#> 3 81 years 67240 1284298 0.0524 0.0510
#> 4 82 years 67120 1135109 0.0591 0.0574
#> 5 83 years 69758 1079082 0.0646 0.0626
#> 6 84 years 72916 1008890 0.0723 0.0698
Central Death Rate and Conditional Probability of Death start off very similar in value, as the first six rows of mort show. At older ages, however, the gap between the two widens, as the last six rows show: at 84 years the values are 0.0723 and 0.0698.
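The widening gap follows from the two formulas. Dividing the numerator and denominator of \(q_x\) by \(P_x\) gives \(q_x = \frac{M_x}{1 + M_x/2}\), so the difference \(M_x - q_x\) is roughly \(M_x^2/2\) and grows as the death rate rises. The base-R sketch below checks this on the youngest and oldest rows shown above. It does not call actLifer, and the column names are illustrative.

```r
# Youngest and oldest rows of mort, copied from the output printed above.
cmp <- data.frame(
  age_group  = c("< 1 year", "84 years"),
  deaths     = c(23161, 72916),
  population = c(3970145, 1008890)
)

cmp$M_x <- cmp$deaths / cmp$population
cmp$q_x <- cmp$deaths / (cmp$population + cmp$deaths / 2)

# Absolute gap between the two measures at each age.
cmp$gap <- cmp$M_x - cmp$q_x

# q_x rewritten in terms of M_x alone; it should match the direct calculation.
cmp$q_from_M <- cmp$M_x / (1 + cmp$M_x / 2)

print(cmp[, c("age_group", "M_x", "q_x", "gap", "q_from_M")], digits = 3)
```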