Query data from the Facebook Marketing API using R, with a focus on social science research.
This package facilitates querying data from the Facebook Marketing API. It is inspired by pySocialWatcher, a similar package built for Python. Emerging research has shown that the Facebook Marketing API can provide useful data for social science research. For example, Facebook marketing data have been used for:
- Poverty estimation (here and here)
- Disease surveillance
- Migrants in the United States (here and here)
- Monitoring refugee and migrant flows in Venezuela
- Quantifying mobility patterns
- Studying the Urban/Rural Divide
The package provides the following functions:
- get_fb_parameter_ids(): Obtain IDs for targeting users by different characteristics, including (1) different parameter types (e.g., behaviors and interests) and (2) location keys (e.g., city keys).
- get_location_coords(): Obtain coordinates and, when available, geometries of locations based on their location keys.
- query_fb_marketing_api(): Query daily and monthly active users for specific locations and specific user attributes.
- get_fb_suggested_radius(): Determine a suggested radius to reach enough people for a given coordinate pair.
The package can be installed via CRAN.
install.packages("rsocialwatcher")
You can install the development version of rsocialwatcher from GitHub with:
# install.packages("devtools")
devtools::install_github("worldbank/rsocialwatcher")
Using the Facebook Marketing API requires the following credentials:
- Token
- Version
- Creation act
Follow the instructions here to obtain these credentials.
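One common way to keep these credentials out of scripts is to read them from environment variables. The following is a minimal sketch, not part of rsocialwatcher: the helper `read_credential()` and the variable names (`FB_API_VERSION`, `FB_CREATION_ACT`, `FB_TOKEN`) are hypothetical.

```r
# Hypothetical helper (not part of rsocialwatcher): read one credential from
# an environment variable and stop with a clear message if it is missing.
read_credential <- function(var_name) {
  value <- Sys.getenv(var_name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", var_name, "' is not set.", call. = FALSE)
  }
  value
}

# For demonstration only: normally these are set outside the script,
# for example in ~/.Renviron. The variable names are assumptions.
Sys.setenv(FB_API_VERSION = "v19.0")

VERSION <- read_credential("FB_API_VERSION")
# CREATION_ACT <- read_credential("FB_CREATION_ACT")
# TOKEN        <- read_credential("FB_TOKEN")
```

Failing early with a named variable tends to be easier to debug than an API error caused by an empty token.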
- Setup
- Get Facebook Parameter IDs
- Query Facebook Users for Different Location Types
- Query Facebook Users by Different Attributes
- Map Over Multiple Queries
library(rsocialwatcher)
library(dplyr)
# Define API version, creation act & token -------------------------------------
VERSION <- "[ENTER HERE]" # Example: "v19.0"
CREATION_ACT <- "[ENTER HERE]"
TOKEN <- "[ENTER HERE]"
# Get dataframe of Facebook parameter IDs and descriptions ---------------------
## Interests and behaviors
interests_df <- get_fb_parameter_ids("interests", VERSION, TOKEN)
behaviors_df <- get_fb_parameter_ids("behaviors", VERSION, TOKEN)
head(behaviors_df[,1:3])
#> id name type
#> 1 6002714895372 Frequent travellers behaviors
#> 2 6002714898572 Small business owners behaviors
#> 3 6002764392172 Facebook Payments users (90 days) behaviors
#> 4 6003808923172 Early technology adopters behaviors
#> 5 6003986707172 Facebook access (OS): Windows 7 behaviors
#> 6 6003966451572 Facebook access (OS): Mac OS X behaviors
## Locations: countries
country_df <- get_fb_parameter_ids("country", VERSION, TOKEN)
head(country_df)
#> key name type country_code supports_region supports_city
#> 1 AD Andorra country AD TRUE FALSE
#> 2 AE United Arab Emirates country AE TRUE TRUE
#> 3 AF Afghanistan country AF TRUE FALSE
#> 4 AG Antigua country AG TRUE FALSE
#> 5 AI Anguilla country AI TRUE FALSE
#> 6 AL Albania country AL TRUE FALSE
Example: Query Facebook users in the US
us_key <- country_df |>
filter(name == "United States") |>
pull(key)
query_fb_marketing_api(
location_unit_type = "countries",
location_keys = us_key,
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 219899444 234900000 276400000
#> location_unit_type location_types location_keys gender age_min age_max
#> 1 countries home or recent US 1 or 2 18 65
#> api_call_time_utc
#> 1 2024-05-04 17:03:38
Example: Query Facebook users around a specific location
query_fb_marketing_api(
location_unit_type = "coordinates",
lat_lon = c(40.712, -74.006),
radius = 5,
radius_unit = "kilometer",
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 1871425 2400000 2800000
#> location_unit_type location_types radius radius_unit gender age_min age_max
#> 1 coordinates home or recent 5 kilometer 1 or 2 18 65
#> latitude longitude api_call_time_utc
#> 1 40.712 -74.006 2024-05-04 17:03:38
Example: Location coordinates and, when available, geometries can be obtained using the get_location_coords function.
get_location_coords(
location_unit_type = "countries",
location_keys = c("US", "MX", "CA"),
version = VERSION,
token = TOKEN
)
#> Simple feature collection with 3 features and 7 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -179.2302 ymin: 14.53211 xmax: 179.8597 ymax: 83.11495
#> Geodetic CRS: WGS 84
#> key type name supports_city supports_region latitude longitude
#> 1 US country United States TRUE TRUE 40.00000 -100.0000
#> 2 MX country Mexico TRUE TRUE 23.31667 -102.3667
#> 3 CA country Canada TRUE TRUE 56.00000 -109.0000
#> geometry
#> 1 MULTIPOLYGON (((177.2906 52...
#> 2 MULTIPOLYGON (((-118.3256 2...
#> 3 MULTIPOLYGON (((-132.5786 5...
Example: In addition, when obtaining location IDs using the get_fb_parameter_ids function, we can directly add coordinates/geometries by setting add_location_coords to TRUE.
get_fb_parameter_ids(
type = "region",
country_code = "US",
version = VERSION,
token = TOKEN,
add_location_coords = TRUE) |>
head()
#> key name type country_code country_name supports_region
#> 1 3866 Minnesota region US United States TRUE
#> 2 3855 Idaho region US United States TRUE
#> 3 3856 Illinois region US United States TRUE
#> 4 3864 Massachusetts region US United States TRUE
#> 5 3846 Arkansas region US United States TRUE
#> 6 3886 Texas region US United States TRUE
#> supports_city latitude longitude geometry
#> 1 TRUE 46.0 -94.0 MULTIPOLYGON (((-97.1811 48...
#> 2 TRUE 45.0 -114.0 MULTIPOLYGON (((-117.0265 4...
#> 3 TRUE 40.0 -89.0 MULTIPOLYGON EMPTY
#> 4 TRUE 42.3 -71.8 MULTIPOLYGON EMPTY
#> 5 TRUE 34.8 -92.2 MULTIPOLYGON (((-94.26958 3...
#> 6 TRUE 31.0 -100.0 MULTIPOLYGON EMPTY
Facebook enables querying a specific location to determine a suggested radius to reach enough people (see Facebook documentation here). We can use the get_fb_suggested_radius function to get the suggested radius. The examples below query the suggested radius for Paris, France and Paris, Kentucky.
# Paris, France
get_fb_suggested_radius(location = c(48.856667, 2.352222),
version = VERSION,
token = TOKEN)
#> suggested_radius distance_unit
#> 1 1 kilometer
# Paris, Kentucky
get_fb_suggested_radius(location = c(38.209682, -84.253915),
version = VERSION,
token = TOKEN)
#> suggested_radius distance_unit
#> 1 25 kilometer
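The suggested radius can then be passed to a coordinate-based query. The sketch below combines the two calls in a hypothetical helper, `query_with_suggested_radius()`, which is not part of the package. It assumes the `distance_unit` value returned by get_fb_suggested_radius (e.g., "kilometer") is accepted as `radius_unit`, which matches the values shown in the examples above. The two function arguments default to the package functions and exist only so the helper can be tried with stand-ins.

```r
# Hypothetical helper (not part of rsocialwatcher): look up the suggested
# radius for a point, then query users within that radius.
query_with_suggested_radius <- function(
    lat_lon, version, creation_act, token,
    radius_fun = rsocialwatcher::get_fb_suggested_radius,
    query_fun  = rsocialwatcher::query_fb_marketing_api) {

  # One-row data frame with columns suggested_radius and distance_unit
  rad <- radius_fun(location = lat_lon, version = version, token = token)

  query_fun(
    location_unit_type = "coordinates",
    lat_lon            = lat_lon,
    radius             = rad$suggested_radius[1],
    # Assumption: the returned unit is a valid radius_unit value
    radius_unit        = as.character(rad$distance_unit[1]),
    version            = version,
    creation_act       = creation_act,
    token              = token)
}
```

With valid credentials, a call would look like `query_with_suggested_radius(c(38.209682, -84.253915), VERSION, CREATION_ACT, TOKEN)`.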
Example [One parameter]: Facebook users who primarily access Facebook using Mac OS X living in the US
beh_mac_id <- behaviors_df |>
filter(name == "Facebook access (OS): Mac OS X") |>
pull(id)
query_fb_marketing_api(
location_unit_type = "country",
location_keys = "US",
behaviors = beh_mac_id,
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 114053 138100 162500
#> location_unit_type location_types location_keys behaviors gender age_min
#> 1 countries home or recent US 6003966451572 1 or 2 18
#> age_max api_call_time_utc
#> 1 65 2024-05-04 17:03:51
Example [One parameter]: Facebook users who are likely technology early adopters
beh_tech_id <- behaviors_df |>
filter(name == "Early technology adopters") |>
pull(id)
query_fb_marketing_api(
location_unit_type = "country",
location_keys = "US",
behaviors = beh_tech_id,
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 13957411 14100000 16600000
#> location_unit_type location_types location_keys behaviors gender age_min
#> 1 countries home or recent US 6003808923172 1 or 2 18
#> age_max api_call_time_utc
#> 1 65 2024-05-04 17:03:52
Example [Two parameters, OR condition]: Facebook users who primarily access Facebook using Mac OS X OR who are likely technology early adopters who live in the US. Vectors of IDs are used to specify OR conditions.
query_fb_marketing_api(
location_unit_type = "country",
location_keys = "US",
behaviors = c(beh_mac_id, beh_tech_id),
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 14107933 14300000 16800000
#> location_unit_type location_types location_keys
#> 1 countries home or recent US
#> behaviors gender age_min age_max api_call_time_utc
#> 1 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:03:53
Example [Two parameters, AND condition]: Facebook users who primarily access Facebook using Mac OS X AND who are likely technology early adopters who live in the US. Lists of IDs are used to specify AND conditions.
query_fb_marketing_api(
location_unit_type = "country",
location_keys = "US",
behaviors = list(beh_mac_id, beh_tech_id),
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 10883 10500 12400
#> location_unit_type location_types location_keys
#> 1 countries home or recent US
#> behaviors gender age_min age_max api_call_time_utc
#> 1 6003966451572 and 6003808923172 1 or 2 18 65 2024-05-04 17:03:55
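The OR/AND distinction rests on a plain R difference: `c()` builds an atomic vector, while `list()` builds a list, and `is.list()` tells them apart. The sketch below illustrates this with a hypothetical helper, `describe_condition()`, that reproduces the labels seen in the `behaviors` output column. It is not the package's implementation.

```r
# Hypothetical helper (not part of rsocialwatcher): label a set of IDs the
# way the `behaviors` column reads in the outputs above.
describe_condition <- function(ids) {
  joiner <- if (is.list(ids)) " and " else " or "
  paste(unlist(ids), collapse = joiner)
}

# IDs copied from the outputs above, kept as strings for readability
mac_id  <- "6003966451572"
tech_id <- "6003808923172"

describe_condition(c(mac_id, tech_id))     # atomic vector: OR condition
#> [1] "6003966451572 or 6003808923172"
describe_condition(list(mac_id, tech_id))  # list: AND condition
#> [1] "6003966451572 and 6003808923172"
```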
Example [Two parameter types]: Across parameter types, AND conditions are used. The example below queries Facebook users who (1) primarily access Facebook using Mac OS X AND (2) are likely technology early adopters AND (3) are interested in computers, and who live in the US. The “flex_target” parameters can be used to specify OR conditions across parameter types; see here for examples.
int_comp_id <- interests_df |>
filter(name == "Computers (computers & electronics)") |>
pull(id)
query_fb_marketing_api(
location_unit_type = "country",
location_keys = "US",
behaviors = list(beh_mac_id, beh_tech_id),
interests = int_comp_id,
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 6538 5900 6900
#> location_unit_type location_types location_keys interests
#> 1 countries home or recent US 6003404634364
#> behaviors gender age_min age_max api_call_time_utc
#> 1 6003966451572 and 6003808923172 1 or 2 18 65 2024-05-04 17:03:57
Wrapping parameter values in the map_param function makes the query_fb_marketing_api function run multiple queries, one for each value.
Example: Make queries for different countries.
country_df |>
filter(name %in% c("United States", "Canada", "Mexico")) |>
pull(key)
#> [1] "CA" "MX" "US"
query_fb_marketing_api(
location_unit_type = "country",
location_keys = map_param("US", "CA", "MX"),
behaviors = c(beh_mac_id, beh_tech_id),
interests = int_comp_id,
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 8388414 7700000 9100000
#> 2 886291 834100 981300
#> 3 2438417 2200000 2600000
#> location_unit_type location_types location_keys interests
#> 1 countries home or recent US 6003404634364
#> 2 countries home or recent CA 6003404634364
#> 3 countries home or recent MX 6003404634364
#> behaviors gender age_min age_max api_call_time_utc
#> 1 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:03:58
#> 2 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:03:59
#> 3 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:03:59
Example: Make queries for different countries and behaviors. In total, six queries are made (mapping over three countries and two behaviors).
query_fb_marketing_api(
location_unit_type = "country",
location_keys = map_param("US", "CA", "MX"),
behaviors = map_param(beh_mac_id, beh_tech_id),
interests = int_comp_id,
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 61908 58200 68500
#> 2 15795 15000 17700
#> 3 22012 20900 24500
#> 4 8311533 7700000 9000000
#> 5 869378 817800 962100
#> 6 2438287 2200000 2600000
#> location_unit_type location_types location_keys interests behaviors
#> 1 countries home or recent US 6003404634364 6003966451572
#> 2 countries home or recent CA 6003404634364 6003966451572
#> 3 countries home or recent MX 6003404634364 6003966451572
#> 4 countries home or recent US 6003404634364 6003808923172
#> 5 countries home or recent CA 6003404634364 6003808923172
#> 6 countries home or recent MX 6003404634364 6003808923172
#> gender age_min age_max api_call_time_utc
#> 1 1 or 2 18 65 2024-05-04 17:04:00
#> 2 1 or 2 18 65 2024-05-04 17:04:00
#> 3 1 or 2 18 65 2024-05-04 17:04:01
#> 4 1 or 2 18 65 2024-05-04 17:04:03
#> 5 1 or 2 18 65 2024-05-04 17:04:04
#> 6 1 or 2 18 65 2024-05-04 17:04:04
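The number of queries is the product of the lengths of the mapped parameters. A quick way to preview the combinations before spending API calls is `expand.grid()`. This is only a sketch of the counting logic and does not call the package. The row order happens to match the output above, with the first parameter varying fastest.

```r
# Values copied from the example above
countries <- c("US", "CA", "MX")
behaviors <- c("6003966451572", "6003808923172")

# One row per combination of mapped values
query_grid <- expand.grid(
  location_keys    = countries,
  behaviors        = behaviors,
  stringsAsFactors = FALSE)

nrow(query_grid)
#> [1] 6
```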
Example: Make a query for each country, for:
- Those who access Facebook using Mac OS X OR who are likely technology early adopters
- Those who access Facebook using Mac OS X AND who are likely technology early adopters
The example below illustrates how we can make complex queries (i.e., using AND and OR conditions) within map_param().
query_fb_marketing_api(
location_unit_type = "country",
location_keys = map_param("US", "CA", "MX"),
behaviors = map_param(c(beh_mac_id, beh_tech_id), # OR condition
list(beh_mac_id, beh_tech_id)), # AND condition
interests = int_comp_id,
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 8388414 7700000 9100000
#> 2 886291 834100 981300
#> 3 2438417 2200000 2600000
#> 4 6538 5900 6900
#> 5 1391 1300 1500
#> 6 1699 1500 1700
#> location_unit_type location_types location_keys interests
#> 1 countries home or recent US 6003404634364
#> 2 countries home or recent CA 6003404634364
#> 3 countries home or recent MX 6003404634364
#> 4 countries home or recent US 6003404634364
#> 5 countries home or recent CA 6003404634364
#> 6 countries home or recent MX 6003404634364
#> behaviors gender age_min age_max api_call_time_utc
#> 1 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:05
#> 2 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:05
#> 3 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:05
#> 4 6003966451572 and 6003808923172 1 or 2 18 65 2024-05-04 17:04:06
#> 5 6003966451572 and 6003808923172 1 or 2 18 65 2024-05-04 17:04:07
#> 6 6003966451572 and 6003808923172 1 or 2 18 65 2024-05-04 17:04:07
Example: Make queries using a vector as input. Below, we want to make a separate query for each of six countries. We define the following vector:
countries <- c("US", "CA", "MX", "FR", "GB", "ES")
However, for the below:
location_keys = map_param(countries)
map_param() views countries as one item (a vector of countries), so it will make just 1 query, returning the number of MAU/DAU across all six countries combined. To make a query for each item in the vector, we use map_param_vec().
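The behavior follows from how R passes arguments through `...`: a vector handed over as a single argument stays a single item. The sketch below shows this with a hypothetical stand-in, `count_items()`. It illustrates R semantics only and makes no claim about how map_param() or map_param_vec() are implemented.

```r
# Hypothetical stand-in: count how many separate items arrive through `...`
count_items <- function(...) length(list(...))

countries <- c("US", "CA", "MX", "FR", "GB", "ES")

count_items(countries)          # the whole vector counts as one item
#> [1] 1
count_items("US", "CA", "MX")   # three separate arguments
#> [1] 3
length(as.list(countries))      # split into elements: six items
#> [1] 6
```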
Incorrect attempt to make a query for each country
countries <- c("US", "CA", "MX", "FR", "GB", "ES")
# INCORRECT: The below will make 1 query, querying the number of MAU/DAU across the six countries. The function interprets the input as the number of Facebook users in the US or Canada or Mexico, etc.
query_fb_marketing_api(
location_unit_type = "country",
location_keys = map_param(countries),
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 450330182 484900000 570500000
#> location_unit_type location_types location_keys gender
#> 1 countries home or recent US or CA or MX or FR or GB or ES 1 or 2
#> age_min age_max api_call_time_utc
#> 1 18 65 2024-05-04 17:04:07
Correct approach to make a query for each country
countries <- c("US", "CA", "MX", "FR", "GB", "ES")
# CORRECT: The below will make 6 queries, one for each country.
query_fb_marketing_api(
location_unit_type = "country",
location_keys = map_param_vec(countries),
version = VERSION,
creation_act = CREATION_ACT,
token = TOKEN)
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 219899444 234900000 276400000
#> 2 26595379 27700000 32600000
#> 3 88476855 95600000 112400000
#> 4 37889724 39700000 46800000
#> 5 46496564 46800000 55100000
#> 6 28909238 30700000 36100000
#> location_unit_type location_types location_keys gender age_min age_max
#> 1 countries home or recent US 1 or 2 18 65
#> 2 countries home or recent CA 1 or 2 18 65
#> 3 countries home or recent MX 1 or 2 18 65
#> 4 countries home or recent FR 1 or 2 18 65
#> 5 countries home or recent GB 1 or 2 18 65
#> 6 countries home or recent ES 1 or 2 18 65
#> api_call_time_utc
#> 1 2024-05-04 17:04:08
#> 2 2024-05-04 17:04:08
#> 3 2024-05-04 17:04:11
#> 4 2024-05-04 17:04:11
#> 5 2024-05-04 17:04:12
#> 6 2024-05-04 17:04:13
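Because monthly active users are returned as a lower and an upper bound, analyses often need a single value per row. One simple option, shown here as an assumption rather than a recommendation from the package, is the midpoint of the two bounds. The sketch uses the first three rows of the output above and a hypothetical column name, `estimate_mau_mid`.

```r
# First three rows copied from the output above
results <- data.frame(
  location_keys            = c("US", "CA", "MX"),
  estimate_mau_lower_bound = c(234900000, 27700000, 95600000),
  estimate_mau_upper_bound = c(276400000, 32600000, 112400000))

# Hypothetical summary column: midpoint of the MAU bounds
results$estimate_mau_mid <-
  (results$estimate_mau_lower_bound + results$estimate_mau_upper_bound) / 2

results[, c("location_keys", "estimate_mau_mid")]
```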
The Facebook API is rate limited: only a certain number of queries can be made in a given time. If the rate limit is reached, query_fb_marketing_api will pause and then retry the query until it succeeds. As a result, query_fb_marketing_api can take a long time to complete when mapping over a large number of queries.
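The pause-and-retry pattern can be sketched in a few lines of base R. The helper `retry_with_pause()` below is hypothetical and is not the package's code. Unlike the behavior described above, it gives up after `max_attempts` so that a script cannot wait forever.

```r
# Hypothetical helper: call f() until it succeeds, pausing between attempts.
retry_with_pause <- function(f, pause_seconds = 1, max_attempts = 5) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_attempts) Sys.sleep(pause_seconds)
  }
  stop("All ", max_attempts, " attempts failed: ",
       conditionMessage(result), call. = FALSE)
}

# Stand-in for a rate-limited call: fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("rate limit reached")
  "ok"
}

retry_with_pause(flaky, pause_seconds = 0)
#> [1] "ok"
```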
Multiple API tokens can be used to minimize delays from the function reaching its rate limit. To use multiple tokens, enter a vector with multiple entries for version, creation_act, and token.
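How the package distributes queries across tokens is not described here. As an illustration only, one plausible scheme is round-robin assignment, where query i uses credential set ((i - 1) mod n) + 1. The helper `assign_credentials()` is hypothetical.

```r
# Hypothetical helper: index of the credential set (1..n_tokens) for each query
assign_credentials <- function(n_queries, n_tokens) {
  ((seq_len(n_queries) - 1) %% n_tokens) + 1
}

# Seven queries spread over three sets of credentials
assign_credentials(7, 3)
#> [1] 1 2 3 1 2 3 1
```

Under this assumed scheme, each token handles roughly a third of the calls, so each one approaches its rate limit more slowly.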
Example: Using multiple API tokens
# We only have 1 token, but we'll pretend we have three
TOKEN_1 <- TOKEN
TOKEN_2 <- TOKEN
TOKEN_3 <- TOKEN
VERSION_1 <- VERSION
VERSION_2 <- VERSION
VERSION_3 <- VERSION
CREATION_ACT_1 <- CREATION_ACT
CREATION_ACT_2 <- CREATION_ACT
CREATION_ACT_3 <- CREATION_ACT
# Make query
query_fb_marketing_api(
location_unit_type = "country",
location_keys = map_param("US", "CA", "MX", "GB", "FR", "DE", "IT"),
behaviors = c(beh_mac_id, beh_tech_id),
interests = int_comp_id,
version = c(VERSION_1, VERSION_2, VERSION_3) ,
creation_act = c(CREATION_ACT_1, CREATION_ACT_2, CREATION_ACT_3),
token = c(TOKEN_1, TOKEN_2, TOKEN_3) )
#> estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1 8388414 7700000 9100000
#> 2 886291 834100 981300
#> 3 2438417 2200000 2600000
#> 4 942296 890600 1000000
#> 5 644377 597500 703000
#> 6 672929 626500 737000
#> 7 667252 599300 705000
#> location_unit_type location_types location_keys interests
#> 1 countries home or recent US 6003404634364
#> 2 countries home or recent CA 6003404634364
#> 3 countries home or recent MX 6003404634364
#> 4 countries home or recent GB 6003404634364
#> 5 countries home or recent FR 6003404634364
#> 6 countries home or recent DE 6003404634364
#> 7 countries home or recent IT 6003404634364
#> behaviors gender age_min age_max api_call_time_utc
#> 1 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:13
#> 2 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:14
#> 3 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:14
#> 4 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:15
#> 5 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:15
#> 6 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:16
#> 7 6003966451572 or 6003808923172 1 or 2 18 65 2024-05-04 17:04:16
The table below summarizes different ways parameters can be entered into query_fb_marketing_api for different purposes. The table uses output from the following code.
behaviors_df <- get_fb_parameter_ids("behaviors", VERSION, TOKEN)
beh_mac_id <- behaviors_df |>
filter(name == "Facebook access (OS): Mac OS X") |>
pull(id)
beh_tech_id <- behaviors_df |>
filter(name == "Early technology adopters") |>
pull(id)
beh_ids <- c(beh_mac_id, beh_tech_id)
Method | Function | Example input in query_fb_marketing_api(behaviors = [], ...) | Description |
---|---|---|---|
Or condition | c() | c(beh_mac_id, beh_tech_id) | Facebook users with beh_mac_id OR beh_tech_id behaviors |
And condition | list() | list(beh_mac_id, beh_tech_id) | Facebook users with beh_mac_id AND beh_tech_id behaviors |
Two queries [Way 1] | map_param() | map_param(beh_mac_id, beh_tech_id) | One query for Facebook users with beh_mac_id; second query for beh_tech_id |
Two queries [Way 2] | map_param_vec() | map_param_vec(beh_ids) | One query for Facebook users with beh_mac_id; second query for beh_tech_id |
See this vignette for additional information and examples illustrating how to use the package.