Analyzing FHIR Data in a Tabular Format With R

Not yet updated for 2026
Roles: Informaticist
Learning objectives
  1. Understand the high-level approaches for converting FHIR-formatted data into a tabular format for analysis in R.
  2. Learn how the fhircrackr library facilitates requesting data from a FHIR server and creating tidy data tables.

Data analysis in R typically uses data frames to store tabular data. There are two primary approaches to loading FHIR-formatted data into R data frames:

  1. Writing R code to manually convert FHIR instances in JSON format into data frames.

  2. Using a purpose-built library like fhircrackr to automatically convert FHIR instances into data frames.

    It is recommended to try this approach first. If fhircrackr does not fit your use case, it may be easier to convert the data from FHIR to tabular format in Python and then export it to an R-readable format than to do everything in R. The reticulate package may help here by allowing Python and R code to share data objects within RStudio.
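To give a sense of what the first approach (manual conversion) involves, here is a minimal base R sketch. It assumes the FHIR JSON has already been parsed into nested lists, which is the shape a parser such as jsonlite::fromJSON(..., simplifyVector = FALSE) would be expected to produce. The two toy resources and the helper `pluck_chr` are hypothetical and exist only for illustration.

```r
# Two toy Patient resources represented as nested lists (hypothetical data)
patients <- list(
  list(
    resourceType = "Patient", id = "example-1", gender = "female",
    birthDate = "1927-05-21",
    name = list(list(use = "official", family = "Medhurst46"))
  ),
  list(
    # This resource has no birthDate, which is common in real FHIR data
    resourceType = "Patient", id = "example-2", gender = "male",
    name = list(list(use = "official", family = "Example"))
  )
)

# Hypothetical helper: walk a path of names/positions through a nested list and
# return the value as a string, or NA if any step of the path is missing
pluck_chr <- function(x, ...) {
  for (key in list(...)) {
    if (is.numeric(key) && key > length(x)) return(NA_character_)
    x <- x[[key]]
    if (is.null(x)) return(NA_character_)
  }
  as.character(x)
}

# Build one single-row data frame per resource, then stack them
df_manual <- do.call(rbind, lapply(patients, function(p) {
  data.frame(
    id = pluck_chr(p, "id"),
    gender = pluck_chr(p, "gender"),
    date_of_birth = pluck_chr(p, "birthDate"),
    family_name = pluck_chr(p, "name", 1, "family"),
    stringsAsFactors = FALSE
  )
}))

print(df_manual)
```

Even in this small case, every optional or repeating element needs explicit handling, which is the main reason a purpose-built library is usually the better starting point.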

To use fhircrackr, you will need an R runtime with fhircrackr installed. R users typically work in the RStudio IDE, but this is not strictly necessary.

1 FHIR testing server

The examples in this module use a FHIR testing server populated with Synthea data in FHIR R4 format: the public HAPI Test Server operated by HAPI FHIR.

The endpoint for this testing server is:

https://hapi.fhir.org/baseR4

However, any FHIR server loaded with testing data can be used. See Standing up a FHIR Testing Server for instructions to set up your own test server.

In the following sections, each code block is followed immediately by its sample output, much like the code blocks and results in a rendered R Markdown file.

2 Retrieving FHIR data

Once your environment is set up, you can run the following R code to retrieve instances of the Patient resource from a test server:

# Load dependencies
library(fhircrackr)
library(tidyverse) # Not strictly necessary, but helpful for working with data in R
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Define the URL of the FHIR server and the request that will be made
# Add a search parameter for Synthea-specific identifier system
request <- fhir_url(
    url = "https://hapi.fhir.org/baseR4",
    resource = "Patient",
    parameters = c(
        identifier = "https://github.com/synthetichealth/synthea|"
    )
)

# Perform the request
patient_bundle <- fhir_search(request = request, max_bundles = 1, verbose = 0)

# This function defines the mapping from FHIR to data frame columns.
# If the `cols` argument is omitted, all data elements will be included in the data frame.
table_desc_patient <- fhir_table_description(
    resource = "Patient"
)

# Convert to R data frame
df_patient <- fhir_crack(bundles = patient_bundle, design = table_desc_patient, verbose = 0)

# Exclude `photo.data`, which is a base64-encoded photo and doesn't display well in dataframes
df_patient <- df_patient %>% select(-any_of("photo.data"))

df_patient %>% head(5)

It is easier to see the contents of this data frame by printing out its first row vertically:

df_patient[1, ] %>% t()
                                         1                                                                                                                                                                                                                             
address.city                             "Emporia"                                                                                                                                                                                                                     
address.country                          "US"                                                                                                                                                                                                                          
address.extension                        "http://hl7.org/fhir/StructureDefinition/geolocation"                                                                                                                                                                         
address.extension.extension              "latitude:::longitude"                                                                                                                                                                                                        
address.extension.extension.valueDecimal "38.37796654358168:::-96.17060814119407"                                                                                                                                                                                      
address.line                             "633 Abernathy Landing"                                                                                                                                                                                                       
address.postalCode                       "66801"                                                                                                                                                                                                                       
address.state                            "KS"                                                                                                                                                                                                                          
birthDate                                "1927-05-21"                                                                                                                                                                                                                  
communication.language.coding.code       "en-US"                                                                                                                                                                                                                       
communication.language.coding.display    "English (United States)"                                                                                                                                                                                                     
communication.language.coding.system     "urn:ietf:bcp:47"                                                                                                                                                                                                             
communication.language.text              "English (United States)"                                                                                                                                                                                                     
deceasedDateTime                         "1989-05-09T20:35:22-04:00"                                                                                                                                                                                                   
extension                                NA                                                                                                                                                                                                                            
extension.extension                      NA                                                                                                                                                                                                                            
extension.extension.valueCoding.code     NA                                                                                                                                                                                                                            
extension.extension.valueCoding.display  NA                                                                                                                                                                                                                            
extension.extension.valueCoding.system   NA                                                                                                                                                                                                                            
extension.extension.valueString          NA                                                                                                                                                                                                                            
extension.valueAddress.city              NA                                                                                                                                                                                                                            
extension.valueAddress.country           NA                                                                                                                                                                                                                            
extension.valueAddress.state             NA                                                                                                                                                                                                                            
extension.valueCode                      NA                                                                                                                                                                                                                            
extension.valueDecimal                   NA                                                                                                                                                                                                                            
extension.valueString                    NA                                                                                                                                                                                                                            
gender                                   "female"                                                                                                                                                                                                                      
id                                       "129c6ac7-8d06-89de-ad63-0204a93e76c3"                                                                                                                                                                                        
identifier.system                        "https://github.com/synthetichealth/synthea:::http://hospital.smarthealthit.org:::http://hl7.org/fhir/sid/us-ssn:::urn:oid:2.16.840.1.113883.4.3.25:::http://standardhealthrecord.org/fhir/StructureDefinition/passportNumber"
identifier.type.coding.code              "MR:::SS:::DL:::PPN"                                                                                                                                                                                                          
identifier.type.coding.display           "Medical Record Number:::Social Security Number:::Driver's license number:::Passport Number"                                                                                                                                  
identifier.type.coding.system            "http://terminology.hl7.org/CodeSystem/v2-0203:::http://terminology.hl7.org/CodeSystem/v2-0203:::http://terminology.hl7.org/CodeSystem/v2-0203:::http://terminology.hl7.org/CodeSystem/v2-0203"                               
identifier.type.text                     "Medical Record Number:::Social Security Number:::Driver's license number:::Passport Number"                                                                                                                                  
identifier.value                         "129c6ac7-8d06-89de-ad63-0204a93e76c3:::129c6ac7-8d06-89de-ad63-0204a93e76c3:::999-94-5397:::S99940903:::X53631011X"                                                                                                          
maritalStatus.coding.code                "M"                                                                                                                                                                                                                           
maritalStatus.coding.display             "Married"                                                                                                                                                                                                                     
maritalStatus.coding.system              "http://terminology.hl7.org/CodeSystem/v3-MaritalStatus"                                                                                                                                                                      
maritalStatus.text                       "Married"                                                                                                                                                                                                                     
meta.lastUpdated                         "2026-02-27T02:21:05.552+00:00"                                                                                                                                                                                               
meta.profile                             NA                                                                                                                                                                                                                            
meta.source                              "#yGzIftcT7wgU5qdC"                                                                                                                                                                                                           
meta.tag.code                            NA                                                                                                                                                                                                                            
meta.tag.system                          NA                                                                                                                                                                                                                            
meta.versionId                           "3"                                                                                                                                                                                                                           
multipleBirthBoolean                     "false"                                                                                                                                                                                                                       
name.family                              "Medhurst46:::Cummerata161"                                                                                                                                                                                                   
name.given                               "Sumiko254:::Larue605:::Sumiko254:::Larue605"                                                                                                                                                                                 
name.prefix                              "Mrs.:::Mrs."                                                                                                                                                                                                                 
name.suffix                              NA                                                                                                                                                                                                                            
name.use                                 "official:::maiden"                                                                                                                                                                                                           
telecom.system                           "phone"                                                                                                                                                                                                                       
telecom.use                              "home"                                                                                                                                                                                                                        
telecom.value                            "555-810-7203"                                                                                                                                                                                                                
text.status                              "generated"                                                                                                                                                                                                                   

If you look at the output above, you can see fhircrackr collapsed the hierarchical FHIR data structure into data frame columns, with multiple values delimited by ::: by default. For example, Patient.identifier has multiple values that appear in the data frame as:

Column name             Example values
identifier.type.text    Medical Record Number:::Social Security Number:::Driver's license number:::Passport Number
identifier.value        129c6ac7-8d06-89de-ad63-0204a93e76c3:::129c6ac7-8d06-89de-ad63-0204a93e76c3:::999-94-5397:::S99940903:::X53631011X

Splitting up these values is discussed below.
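Before moving on, it can help to look at one of these delimited cells in base R. The sketch below copies the two cell values from the first row of the output above and splits them on the default ::: delimiter; the variable names are illustrative only.

```r
# Cell values copied from the first row of the output above
type_text <- "Medical Record Number:::Social Security Number:::Driver's license number:::Passport Number"
value <- "129c6ac7-8d06-89de-ad63-0204a93e76c3:::129c6ac7-8d06-89de-ad63-0204a93e76c3:::999-94-5397:::S99940903:::X53631011X"

# fixed = TRUE treats ":::" as a literal string rather than a regular expression
types <- strsplit(type_text, ":::", fixed = TRUE)[[1]]
values <- strsplit(value, ":::", fixed = TRUE)[[1]]

length(types)   # 4
length(values)  # 5
```

The counts differ because the first identifier in this instance has a value but no type. Position alone therefore cannot pair a type with its value, which is why the indexed approach in the "Elements with multiple sub-values" section is generally safer than a plain string split.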

3 Selecting specific columns

Usually not every single value from a FHIR instance is needed for analysis. There are two ways to get a more concise data frame:

  1. Use the approach above to load all elements into a data frame, remove the unneeded columns, and rename the remaining columns as needed.
  2. Use XPath to select specific elements and map them onto column names.

The second approach is typically more concise. For example, to generate a data frame like this…

id gender date_of_birth marital_status

…you could use the following code:

table_desc_patient <- fhir_table_description(
    resource = "Patient",
    cols = c(
        id = "id",
        gender = "gender",
        date_of_birth = "birthDate",
        # Rather than having fhircrackr concatenate all `Patient.maritalStatus` values
        # into one cell, you can select a specific value with XPath:
        marital_status = "maritalStatus/coding/code"
    )
)

df_patient <- fhir_crack(bundles = patient_bundle, design = table_desc_patient, verbose = 0)

df_patient %>% head(5)

While XPath expressions can be quite complex, their use in fhircrackr is often straightforward. Nested elements are separated with /, and elements with multiple sub-values are identified by [N], where N is an integer starting at 1.
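As a rough illustration of the / and [N] syntax, the sketch below evaluates a few XPath expressions directly with the xml2 package against a small excerpt of a Patient instance. The excerpt is trimmed to the two name elements, and the variable names are hypothetical. It is assumed here that an expression which works this way in xml2 can also be used in the cols argument of fhir_table_description(), but that is worth checking against your own data.

```r
library(xml2)

# A trimmed Patient instance containing two repeating <name> elements
patient_xml <- read_xml(paste0(
  "<Patient>",
  "<name><use value='official'/><family value='Medhurst46'/>",
  "<given value='Sumiko254'/><given value='Larue605'/></name>",
  "<name><use value='maiden'/><family value='Cummerata161'/>",
  "<given value='Sumiko254'/><given value='Larue605'/></name>",
  "</Patient>"
))

# [1] selects the first <name>; FHIR XML stores each value in a `value` attribute
family_first <- xml_attr(xml_find_first(patient_xml, "./name[1]/family"), "value")

# A predicate can select by content instead of by position
family_maiden <- xml_attr(
  xml_find_first(patient_xml, "./name[use/@value='maiden']/family"),
  "value"
)

# Without an index on <given>, every matching element under the first <name> is returned
given_first <- xml_attr(xml_find_all(patient_xml, "./name[1]/given"), "value")

family_first   # "Medhurst46"
family_maiden  # "Cummerata161"
given_first    # "Sumiko254" "Larue605"
```

Selecting by content (for example, the name whose use is "official") is usually more robust than selecting by position, because servers are not required to return repeating elements in a particular order.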

There are two approaches to identifying element paths to construct XPath expressions:

  1. Look at the FHIR specification or the relevant FHIR Implementation Guide to determine the paths of available data elements. For example, the Patient page in the FHIR specification describes the elements and their hierarchy for instances of Patient.

  2. Print out the raw data returned by the FHIR server. Fhircrackr uses XML-formatted data, and the following code will print out one of the instances of Patient requested above:

resource <- xml2::xml_find_first(x = patient_bundle[[1]], xpath = "./entry[1]/resource")
# Suppress the photo element: it prints a large amount of base64-encoded data
# that makes the rest of the XML hard to read
xml2::xml_find_all(resource, ".//photo") %>% xml2::xml_remove()
resource %>% paste0() %>% cat()
<resource>
  <Patient>
    <id value="129c6ac7-8d06-89de-ad63-0204a93e76c3"/>
    <meta>
      <versionId value="3"/>
      <lastUpdated value="2026-02-27T02:21:05.552+00:00"/>
      <source value="#yGzIftcT7wgU5qdC"/>
    </meta>
    <text>
      <status value="generated"/>
    </text>
    <identifier>
      <system value="https://github.com/synthetichealth/synthea"/>
      <value value="129c6ac7-8d06-89de-ad63-0204a93e76c3"/>
    </identifier>
    <identifier>
      <type>
        <coding>
          <system value="http://terminology.hl7.org/CodeSystem/v2-0203"/>
          <code value="MR"/>
          <display value="Medical Record Number"/>
        </coding>
        <text value="Medical Record Number"/>
      </type>
      <system value="http://hospital.smarthealthit.org"/>
      <value value="129c6ac7-8d06-89de-ad63-0204a93e76c3"/>
    </identifier>
    <identifier>
      <type>
        <coding>
          <system value="http://terminology.hl7.org/CodeSystem/v2-0203"/>
          <code value="SS"/>
          <display value="Social Security Number"/>
        </coding>
        <text value="Social Security Number"/>
      </type>
      <system value="http://hl7.org/fhir/sid/us-ssn"/>
      <value value="999-94-5397"/>
    </identifier>
    <identifier>
      <type>
        <coding>
          <system value="http://terminology.hl7.org/CodeSystem/v2-0203"/>
          <code value="DL"/>
          <display value="Driver's license number"/>
        </coding>
        <text value="Driver's license number"/>
      </type>
      <system value="urn:oid:2.16.840.1.113883.4.3.25"/>
      <value value="S99940903"/>
    </identifier>
    <identifier>
      <type>
        <coding>
          <system value="http://terminology.hl7.org/CodeSystem/v2-0203"/>
          <code value="PPN"/>
          <display value="Passport Number"/>
        </coding>
        <text value="Passport Number"/>
      </type>
      <system value="http://standardhealthrecord.org/fhir/StructureDefinition/passportNumber"/>
      <value value="X53631011X"/>
    </identifier>
    <name>
      <use value="official"/>
      <family value="Medhurst46"/>
      <given value="Sumiko254"/>
      <given value="Larue605"/>
      <prefix value="Mrs."/>
    </name>
    <name>
      <use value="maiden"/>
      <family value="Cummerata161"/>
      <given value="Sumiko254"/>
      <given value="Larue605"/>
      <prefix value="Mrs."/>
    </name>
    <telecom>
      <system value="phone"/>
      <value value="555-810-7203"/>
      <use value="home"/>
    </telecom>
    <gender value="female"/>
    <birthDate value="1927-05-21"/>
    <deceasedDateTime value="1989-05-09T20:35:22-04:00"/>
    <address>
      <extension url="http://hl7.org/fhir/StructureDefinition/geolocation">
        <extension url="latitude">
          <valueDecimal value="38.37796654358168"/>
        </extension>
        <extension url="longitude">
          <valueDecimal value="-96.17060814119407"/>
        </extension>
      </extension>
      <line value="633 Abernathy Landing"/>
      <city value="Emporia"/>
      <state value="KS"/>
      <postalCode value="66801"/>
      <country value="US"/>
    </address>
    <maritalStatus>
      <coding>
        <system value="http://terminology.hl7.org/CodeSystem/v3-MaritalStatus"/>
        <code value="M"/>
        <display value="Married"/>
      </coding>
      <text value="Married"/>
    </maritalStatus>
    <multipleBirthBoolean value="false"/>
    <communication>
      <language>
        <coding>
          <system value="urn:ietf:bcp:47"/>
          <code value="en-US"/>
          <display value="English (United States)"/>
        </coding>
        <text value="English (United States)"/>
      </language>
    </communication>
  </Patient>
</resource>

In some cases, you may need to construct more complex expressions like the one to extract marital_status from Patient.maritalStatus.coding.code. You can use a tool like this XPath tester to help generate XPath expressions, though online tools such as these should not be used with real patient data. For more information on XPath, see this guide.

4 Elements with multiple sub-values

There are multiple identifier[N].value values for each instance of Patient in this dataset. By default, fhircrackr will concatenate these into a single cell per row, delimited with ::: (this is configurable; use fhir_table_description(..., sep = ' | ', ...) to delimit with | instead).

Fhircrackr provides some tools to split up multiple values stored in the same cell into separate rows in a “long” data frame:

table_desc_patient <- fhir_table_description(
    resource = "Patient",

    # Prefix values in cells with indices to facilitate handling cells that contain
    # multiple values
    brackets = c("[", "]")
)

df_patient_indexed <- fhir_crack(bundles = patient_bundle, design = table_desc_patient, verbose = 0)

df_patient_identifiers <- fhir_melt(
    indexed_data_frame = df_patient_indexed,
    columns = c("identifier.type.text", "identifier.value"),
    brackets = c("[", "]"),
    sep = ":::",
    all_columns = FALSE
)

df_patient_identifiers %>% head(10)

The df_patient_identifiers data frame printed above has one row for each value of Patient.identifier for each instance of Patient. The in-cell indices (surrounded by [ ]) can be removed:

df_patient_identifiers <- fhir_rm_indices(indexed_data_frame = df_patient_identifiers, brackets = c("[", "]"))

df_patient_identifiers %>% head(10)

These can then be merged back into the original data frame as needed. For example, if you want to include the synthetic “Social Security Number” in the original data:

df_patient %>%
    # Add in row numbers for joining
    mutate(
        row_number = row_number()
    ) %>%
    left_join(
        df_patient_identifiers %>%
            # Note: this assumes there is just one social security number for each patient in the data.
            # If this were not true, it would be necessary to remove the extra rows before joining so
            # there is one row per patient.
            filter(`identifier.type.text` == "Social Security Number") %>%
            rename(
                "ssn" = "identifier.value"
            ) %>%
            # Exclude the `identifier.type.text` column so it doesn't appear in the joined data frame
            select(resource_identifier, ssn) %>%
            # Fhircrackr generates the `resource_identifier` column as a string, but it needs to be
            # an integer for joining.
            mutate(resource_identifier = as.integer(resource_identifier)),
        by = c("row_number" = "resource_identifier")
    ) %>%
    head(5)

You can see that the synthetic SSNs are now split out into a separate column.
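The join above relies on each patient having at most one "Social Security Number" row. If that assumption does not hold, one option is to reduce the identifier rows to one per patient before joining. The sketch below uses small hypothetical data frames in place of df_patient_identifiers and df_patient, and keeps the first SSN for each patient; whether "first" is the right rule depends on your data.

```r
library(dplyr)

# Hypothetical stand-in for the melted identifier data: patient 1 has two SSN rows
ids <- data.frame(
  resource_identifier = c("1", "1", "2"),
  identifier.type.text = rep("Social Security Number", 3),
  identifier.value = c("999-00-0001", "999-00-0002", "999-00-0003"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the patient-level data frame
patients <- data.frame(id = c("a", "b"), gender = c("female", "male"))

ssn_per_patient <- ids %>%
  filter(identifier.type.text == "Social Security Number") %>%
  # Keep only the first row for each patient so the join cannot duplicate patients
  distinct(resource_identifier, .keep_all = TRUE) %>%
  transmute(
    resource_identifier = as.integer(resource_identifier),
    ssn = identifier.value
  )

joined <- patients %>%
  mutate(row_number = row_number()) %>%
  left_join(ssn_per_patient, by = c("row_number" = "resource_identifier"))

joined
```

Checking that nrow(joined) equals nrow(patients) after the join is a quick way to confirm that no patient rows were duplicated.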