Skip to contents

This section demonstrates the use of the functions contained in the demogmx package. This package includes functions to obtain demographic information regarding births, population, migration (immigration and emigration), mortality, and aging rate in Mexico. All these functions are very similar in the way they work, and once you become familiar with using one, using the rest becomes very intuitive.

To start, first we load the package:

# Load libraries
library(demogmx)

To obtain information from the functions, it is important to know the characteristics of the data. The available information is disaggregated by year, state, sex, and age. In some cases, there are other disaggregation variables like the type of migration in the get_migration() function. To obtain information at a state level, we have to introduce their name in English. The list of state names available in the data-sets is:

##  [1] "Aguascalientes"      "Baja California"     "Baja California Sur"
##  [4] "Campeche"            "Coahuila"            "Colima"             
##  [7] "Chiapas"             "Chihuahua"           "Mexico City"        
## [10] "Durango"             "Guanajuato"          "Guerrero"           
## [13] "Hidalgo"             "Jalisco"             "State of Mexico"    
## [16] "Michoacan"           "Morelos"             "Nayarit"            
## [19] "Nuevo Leon"          "Oaxaca"              "Puebla"             
## [22] "Queretaro"           "Quintana Roo"        "San Luis Potosi"    
## [25] "Sinaloa"             "Sonora"              "Tabasco"            
## [28] "Tamaulipas"          "Tlaxcala"            "Veracruz"           
## [31] "Yucatan"             "Zacatecas"           "National"

As we can see, the last element in the list is National, with it we can obtain the aggregated information of all the states in the country.

The list of sex names that are accepted in the functions is the following:

## [1] "Female" "Male"   "Total"

In the present version of the package, the functions can vary the range of ages that are accepted as parameter inputs. In some cases they accept ages from 0 to 109 years, as is the case with get_population() or, in other cases, from 0 to 89, if we use get_migration().

Most of the functions have the option age_groups that only accepts logical elements as inputs. When age_groups = FALSE the data-sets will only return the data of the ages that are inside the v_ages numeric vector. When age_groups = TRUE, the output data will be grouped by age groups bounded by the elements inside the age vector. Suppose that we have the next numeric vector representing the ages that we are looking into the data:

v_ages <- c(0, 10, 15, 20)
  • If age_groups = FALSE, the output will be a data-set containing the information of the people with 0, 10, 15, and 20 years of age.

  • If age_groups = TRUE, the data will be grouped in the following blocks:

    • The group of people between 0 and 10 years of age.

    • The group of people between 11 and 15 years of age.

    • The group of people between 16 and 20 years of age.

    • The group of people 20 years old and older. In the data-sets, the last age block will have an Inf indicating that the upper bound of the block is an infinite positive number.

    • We will notice how the age groups have been made looking at the age_group column. This column is made with the interval notation.

Now let’s have a closer look at each of the demogmx functions.

Births

This function gives a dataset with the number and the rate of births in the years, states, and sexes indicated by the user.

In the present version of the package there are two functions to obtain birth information from Mexico. One gives the information with sex disaggregation with data from 1985 to 2020 and the other does not have sex disaggregation but has projections between1970 and 2050.

In this function there is no age_groups option, instead there is the year_groups option that aggregates the information by age in the same way as we explained age_groups before.

General births from 1970 to 2050

This function gets the number of births estimated by the National Council of Population (CONAPO by its acronym in spanish), additionally it provides birth rate information based on the population information that can be accessed with get_population(). This function has birth projections from 1970 to 2050 for each state but does not have sex disaggregation.

Problema: en demog-mx no vienen los scripts de cómo se manipularon las bases de datos originales, quizás sea necesario volverlo a hacer aquí para que pueda ser replicable. En el caso específico de los nacimientos de la CONAPO, hay que procesar la base de datos original para obtener los datos obtenidos en data-raw.

This function contains the next parameters:

  • v_state requires an element or a vector of characters containing the names of the desired states.

  • v_year needs a numeric element or vector with the years of the information.

  • year_groups needs a logical argument that indicates if data will be grouped by years.

Ahead there are some examples of how this function can be used. First we obtain the data without aggregating it by years.

# Get data without grouping by year range
get_births(v_state = c("Guerrero", "Zacatecas", "National"),
           v_year = c(1970, 2000, 2020),
           year_groups = FALSE) 
## # A tibble: 9 x 5
##    year state     CVE_GEO  births birth_rate
##   <int> <fct>       <int>   <int>      <dbl>
## 1  1970 Guerrero       12   85806     0.0511
## 2  2000 Guerrero       12   84012     0.0270
## 3  2020 Guerrero       12   68085     0.0186
## 4  1970 Zacatecas      32   43853     0.0447
## 5  2000 Zacatecas      32   35136     0.0257
## 6  2020 Zacatecas      32   31168     0.0188
## 7  1970 National        0 2222585     0.0444
## 8  2000 National        0 2322025     0.0237
## 9  2020 National        0 2151358     0.0169

Now we can see what happens if the output is aggregated.

# Get data grouping by year range
get_births(v_state = c("Michoacan", "Guerrero", "Zacatecas", "National"),
           v_year = c(1970, 2000, 2020),
           year_groups = TRUE)
## # A tibble: 12 x 5
##    year_group  state     CVE_GEO   births birth_rate
##    <fct>       <fct>       <int>    <int>      <dbl>
##  1 [1970,2000] Guerrero       12  2773431     0.0370
##  2 [1970,2000] Michoacan      16  3580985     0.0352
##  3 [1970,2000] Zacatecas      32  1286995     0.0342
##  4 [1970,2000] National        0 73396334     0.0317
##  5 (2000,2020] Guerrero       12  1535249     0.0226
##  6 (2000,2020] Michoacan      16  1890914     0.0215
##  7 (2000,2020] Zacatecas      32   662425     0.0219
##  8 (2000,2020] National        0 45055055     0.0198
##  9 (2020,Inf]  Guerrero       12  1615318     0.0146
## 10 (2020,Inf]  Michoacan      16  2428192     0.0156
## 11 (2020,Inf]  Zacatecas      32   828243     0.0154
## 12 (2020,Inf]  National        0 57038811     0.0135

Note that the year column changes its name based in whether we chose to group the data. If the data was not grouped the column is named year. If it is grouped it will be named year_group.

If any of the functions of this package receive a wrong input in their parameters, they will throw an error message and the list of accepted values, based on the parameter where the error was detected.

# Throw an error
get_births(v_state = 55, # Is not a character and is not a state name.
           v_year = c(1970, 2000, 2020),
           year_groups = FALSE) 
## Error in get_births(v_state = 55, v_year = c(1970, 2000, 2020), year_groups = FALSE): v_state must be a character element or vector containing at least one of the next names:
## 
## Aguascalientes, Baja California, Baja California Sur, Campeche, Coahuila, Colima, Chiapas, Chihuahua, Mexico City, Durango, Guanajuato, Guerrero, Hidalgo, Jalisco, State of Mexico, Michoacan, Morelos, Nayarit, Nuevo Leon, Oaxaca, Puebla, Queretaro, Quintana Roo, San Luis Potosi, Sinaloa, Sonora, Tabasco, Tamaulipas, Tlaxcala, Veracruz, Yucatan, Zacatecas, National

Births by sex

This function gives the number of registered births based on Mexico’s National Institute of Statistical and Geographical data (INEGI, by it’s acronym in Spanish). As in the previous function, it provides birth rate information. This information disaggregated by sex for each state and has estimations from 1980 to 2020.

This function has a parameter called v_sex that specifies the sexes that will contain the output data-set. This input accepts the character arguments “Female”, “Male” and “Total”, giving the information of each sex or the aggregation of both, respectively.

Here is an example of how this function can be used, without aggregating by year.

# Get data without grouping by year range
get_births_INEGI(v_state = c( "Guerrero", "National"),
                 v_year = c(2000, 2020),
                 v_sex = c("Female", "Total"),
                 year_groups = FALSE) 
## # A tibble: 8 x 7
##    year state    CVE_GEO sex     births birth_prop birth_rate
##   <dbl> <chr>      <dbl> <chr>    <dbl>      <dbl>      <dbl>
## 1  2000 Guerrero      12 Female   85116      0.520     0.0533
## 2  2000 Guerrero      12 Total   163640      1         0.0525
## 3  2020 Guerrero      12 Female   29606      0.493     0.0157
## 4  2020 Guerrero      12 Total    60022      1         0.0164
## 5  2000 National       0 Female 1398703      0.500     0.0280
## 6  2000 National       0 Total  2797580      1         0.0285
## 7  2020 National       0 Female  800264      0.491     0.0123
## 8  2020 National       0 Total  1629208      1         0.0128

And here we can see how it works if we decide to aggregate the information by years.

# Get data grouping by year range
get_births_INEGI(v_state = c( "Guerrero", "National"),
                 v_year = c(2000, 2020),
                 v_sex = c("Female", "Total"),
                 year_groups = TRUE) 
## # A tibble: 4 x 7
##   year_group  state    CVE_GEO sex      births birth_prop birth_rate
##   <fct>       <chr>      <dbl> <chr>     <dbl>      <dbl>      <dbl>
## 1 [2000,2020] Guerrero      12 Female  1114059      0.508     0.0304
## 2 [2000,2020] Guerrero      12 Total   2194179      1         0.0308
## 3 [2000,2020] National       0 Female 25813848      0.497     0.0214
## 4 [2000,2020] National       0 Total  51915377      1         0.0219

Population

get_population() is a function that returns the estimations of CONAPO on number of people living in Mexico. This function allows to filter the data based on the year, state, sex and age specified by the user. This function has population projections from 1970 to 2050

get_population() function includes the next parameters:

  • v_age requires a numeric vector that indicates the ages that will be returned in the output data.

  • age_groups needs a logical argument in order to group the data based on the input of v_age.

Let’s see an example of how this function is executed without aggregating the data by age.

# Get population data without grouping by age
get_population(v_state = "Mexico City",
               v_year = 2000,
               v_sex = c("Female", "Male"),
               v_age = c(10, 25, 100),
               age_groups = FALSE)
## # A tibble: 6 x 8
##    year state         age CVE_GEO sex    population death_rate proportion
##   <int> <fct>       <int>   <int> <chr>       <int>      <dbl>      <dbl>
## 1  2000 Mexico City    10       9 Female      77915   0.000193   0.475   
## 2  2000 Mexico City    10       9 Male        79634   0.000339   0.497   
## 3  2000 Mexico City    25       9 Female      85947   0.000500   0.524   
## 4  2000 Mexico City    25       9 Male        80613   0.00200    0.503   
## 5  2000 Mexico City   100       9 Female        138   0.391      0.000841
## 6  2000 Mexico City   100       9 Male           63   0.381      0.000393

Now, if we use this function aggregating by age we get the next result:

# Get population data grouping by age
get_population(v_state = "Mexico City",
               v_year = 2000,
               v_sex = c("Female", "Male"),
               v_age = c(10, 25, 100),
               age_groups = TRUE)
## # A tibble: 6 x 8
##    year state       CVE_GEO sex    age_group population death_rate proportion
##   <int> <fct>         <int> <chr>  <fct>          <int>      <dbl>      <dbl>
## 1  2000 Mexico City       9 Female [10,25]      1319456   0.000377  0.355    
## 2  2000 Mexico City       9 Female (25,100]     2392044   0.00769   0.644    
## 3  2000 Mexico City       9 Female (100,Inf]        197   0.477     0.0000531
## 4  2000 Mexico City       9 Male   [10,25]      1283359   0.00109   0.382    
## 5  2000 Mexico City       9 Male   (25,100]     2075376   0.00930   0.618    
## 6  2000 Mexico City       9 Male   (100,Inf]         92   0.5       0.0000274

Migration

This function allows the user to get information about international and interstate migration in Mexico estimated by CONAPO. This data-set can give information disaggregated by year (from 1970 to 2050), state, sex and age (from 0 to 89 years). As most functions in the package, the information retrieved by this function can be aggregated by age.

get_population() has a particular parameter called v_type that is used to define the type of migration that will be present in the data-set output. The available types of migration in this function are Interstate migration, International migration and Total migration.

We can find an examples of how get_population() can be used with different specifications.

# Get migration data without grouping by age and
# with both types of migration
get_migration(v_state = "Mexico City",
              v_year = 2010,
              v_sex = "Total",
              v_age = c(25, 50),
              v_type = c("Interstate", "International"),
              age_groups = FALSE)
## # A tibble: 4 x 12
##    year state       CVE_GEO sex     age emigrants immigrants type  net_migration
##   <int> <chr>         <dbl> <chr> <int>     <dbl>      <dbl> <chr>         <dbl>
## 1  2010 Mexico City       9 Total    25       279        317 Inte~            38
## 2  2010 Mexico City       9 Total    25      2900       2149 Inte~          -751
## 3  2010 Mexico City       9 Total    50        70        104 Inte~            34
## 4  2010 Mexico City       9 Total    50      1001        458 Inte~          -543
## # ... with 3 more variables: em_rate <dbl>, im_rate <dbl>, nm_rate <dbl>
# Get migration data grouping by age and
# with both types of migration
get_migration(v_state = "Mexico City",
              v_year = 2010,
              v_sex = "Total",
              v_age = c(25, 50),
              v_type = c("Interstate", "International"),
              age_groups = TRUE)
## # A tibble: 4 x 12
##    year state   CVE_GEO sex   age_group type  emigrants immigrants net_migration
##   <int> <chr>     <dbl> <chr> <fct>     <chr>     <dbl>      <dbl>         <dbl>
## 1  2010 Mexico~       9 Total [25,50]   Inte~      4191       5281          1090
## 2  2010 Mexico~       9 Total [25,50]   Inte~     54514      30933        -23581
## 3  2010 Mexico~       9 Total (50,Inf]  Inte~       997       1127           130
## 4  2010 Mexico~       9 Total (50,Inf]  Inte~     13891       5420         -8471
## # ... with 3 more variables: em_rate <dbl>, im_rate <dbl>, nm_rate <dbl>

Note that if v_state = National and v_type = Interstate, the function will return an empty data-set because there is no interstate migration at a National level.

# Get migration data without grouping by age
get_migration(v_state = "National",
              v_year = 2010,
              v_sex = "Total",
              v_age = c(0, 10, 15),
              v_type = "Interstate",
              age_groups = FALSE)
## # A tibble: 0 x 12
## # ... with 12 variables: year <int>, state <chr>, CVE_GEO <dbl>, sex <chr>,
## #   age <int>, emigrants <dbl>, immigrants <dbl>, type <chr>,
## #   net_migration <dbl>, em_rate <dbl>, im_rate <dbl>, nm_rate <dbl>

Mortality

This function allows us to get the estimation on number of deaths made by CONAPO. It works identically as get_population(). The user must define the year(s), state(s), sex(es), age(s) to obtain the data. As well, there is the option on whether the information should be grouped by age or not.

Ahead you will find some examples of how this function can be used

# Get mortality data without grouping by age
get_deaths(v_state = c("Guanajuato", "Nuevo Leon"),
           v_year = 2015,
           v_sex = c("Male", "Total"),
           v_age = c(15, 25, 35),
           age_groups = FALSE)
## # A tibble: 12 x 7
##     year state      CVE_GEO   age sex   deaths death_rate
##    <int> <fct>        <int> <int> <chr>  <int>      <dbl>
##  1  2015 Guanajuato      11    15 Male      45   0.000781
##  2  2015 Guanajuato      11    15 Total     67   0.000587
##  3  2015 Guanajuato      11    25 Male     121   0.00239 
##  4  2015 Guanajuato      11    25 Total    156   0.00150 
##  5  2015 Guanajuato      11    35 Male     123   0.00320 
##  6  2015 Guanajuato      11    35 Total    170   0.00205 
##  7  2015 Nuevo Leon      19    15 Male      34   0.000732
##  8  2015 Nuevo Leon      19    15 Total     49   0.000540
##  9  2015 Nuevo Leon      19    25 Male     106   0.00231 
## 10  2015 Nuevo Leon      19    25 Total    131   0.00146 
## 11  2015 Nuevo Leon      19    35 Male     116   0.00303 
## 12  2015 Nuevo Leon      19    35 Total    153   0.00197
# Get mortality data grouping by age
get_deaths(v_state = c("Guanajuato", "Nuevo Leon"),
           v_year = 2015,
           v_sex = c("Male", "Total"),
           v_age = c(15, 25, 35),
           age_groups = TRUE)
## # A tibble: 12 x 7
##     year state      CVE_GEO sex   age_group deaths death_rate
##    <int> <fct>        <int> <chr> <fct>      <int>      <dbl>
##  1  2015 Guanajuato      11 Male  [15,25]      970    0.00159
##  2  2015 Guanajuato      11 Male  (25,35]     1230    0.00284
##  3  2015 Guanajuato      11 Male  (35,Inf]   14540    0.0154 
##  4  2015 Guanajuato      11 Total [15,25]     1299    0.00106
##  5  2015 Guanajuato      11 Total (25,35]     1625    0.00177
##  6  2015 Guanajuato      11 Total (35,Inf]   27028    0.0134 
##  7  2015 Nuevo Leon      19 Male  [15,25]      806    0.00155
##  8  2015 Nuevo Leon      19 Male  (25,35]     1124    0.00271
##  9  2015 Nuevo Leon      19 Male  (35,Inf]   13130    0.0138 
## 10  2015 Nuevo Leon      19 Total [15,25]     1041    0.00103
## 11  2015 Nuevo Leon      19 Total (25,35]     1426    0.00172
## 12  2015 Nuevo Leon      19 Total (35,Inf]   23662    0.0121

Aging rate

The aging rate is the proportion of the population with age \(i\) that were able to live another year, in other words, the aging rate is the division of the aging population between the total population.

\[ aging\ rate = \frac{aging\ population_i}{total\ population_i} \]

The aging population, as we wrote, is the number of people that is able to live another period of time (in this case, another year) in an specific location. Hence, to calculate the aging population in a specific age \(i\), we must add the immigrants and subtract the emigrants and deaths to the total population of that age, as it is stated in the next equation.

\[ aging\ population = population_i + immigrants_i - emigrants_i - deaths_i \]

If the age is 0, then it is necessary to substitute the population of age \(i\) by the births that occurred, in the specific time and location, to the equation.

As we see, all the elements to get the aging population and the aging rate are already at hand, so this function was created based on the outputs of the rest of functions of demogmx. This function requires that the user specifies the year, state, sex, and the age to filter the output and, as most of the functions, the data can be aggregated by the ages defined by the user in v_age.

Below you will find some examples of how this function can be used as well as its outputs.

# Get mortality data without grouping by age
get_aging_rate(v_state = "National",
               v_year = seq(2000, 2010),
               v_sex = "Total",
               v_age = 47)
## # A tibble: 11 x 7
##     year state    CVE_GEO sex     age aging_pop aging_rate
##    <int> <chr>      <dbl> <chr> <int>     <dbl>      <dbl>
##  1  2000 National       0 Total    47    841560      0.995
##  2  2001 National       0 Total    47    878597      0.995
##  3  2002 National       0 Total    47    917559      0.995
##  4  2003 National       0 Total    47    956589      0.995
##  5  2004 National       0 Total    47    996809      0.995
##  6  2005 National       0 Total    47   1038726      0.996
##  7  2006 National       0 Total    47   1079537      0.996
##  8  2007 National       0 Total    47   1119101      0.996
##  9  2008 National       0 Total    47   1157259      0.996
## 10  2009 National       0 Total    47   1193257      0.996
## 11  2010 National       0 Total    47   1227207      0.995
# Get aging data grouping by age
get_aging_rate(v_state = "National",
               v_year = seq(2000, 2010),
               v_sex = "Total",
               v_age = 47)
## # A tibble: 11 x 7
##     year state    CVE_GEO sex     age aging_pop aging_rate
##    <int> <chr>      <dbl> <chr> <int>     <dbl>      <dbl>
##  1  2000 National       0 Total    47    841560      0.995
##  2  2001 National       0 Total    47    878597      0.995
##  3  2002 National       0 Total    47    917559      0.995
##  4  2003 National       0 Total    47    956589      0.995
##  5  2004 National       0 Total    47    996809      0.995
##  6  2005 National       0 Total    47   1038726      0.996
##  7  2006 National       0 Total    47   1079537      0.996
##  8  2007 National       0 Total    47   1119101      0.996
##  9  2008 National       0 Total    47   1157259      0.996
## 10  2009 National       0 Total    47   1193257      0.996
## 11  2010 National       0 Total    47   1227207      0.995

NOTA: creo que especifqué mal la función the get_aging_rate cuando la edad es cero.

Get help

Finally, it is possible to get help of each of the demogmx functions as any other package. You can go to the help tab and search for information about a specific function or you can write ? before the function in the console as it is in the following code chunk.

# ?get_population