Analyzing FHIR Data in a Tabular Format With Python

Not yet updated for 2026
Roles: Informaticist
Learning objectives
  1. Understand the high-level approaches for converting FHIR-formatted data into tabular form for analysis in Python.
  2. Learn how the FHIR-PYrate library facilitates requesting data from a FHIR server and creating tidy data tables.

Data analysis approaches in Python often use Pandas DataFrames to store tabular data. There are two primary approaches to loading FHIR-formatted data into Pandas DataFrames:

  1. Writing Python code to manually convert FHIR instances in JSON format into DataFrames.

    This does not require any special skills beyond data manipulation in Python, but in practice it can be laborious (especially with a large number of data elements) and prone to bugs.

  2. Using a purpose-built library like FHIR-PYrate to automatically convert FHIR instances into DataFrames.

    It is recommended to try this approach first, and only fall back to (1) if needed.
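
As a rough illustration of approach (1), the sketch below pulls a few elements out of a hand-written, heavily abbreviated Bundle and builds one flat row per Patient. The `bundle` contents and the `patient_to_row` helper are made up for this example and are not part of any library.

```python
# Hypothetical, heavily abbreviated search result; a real Bundle carries many more elements.
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "example-1",
                      "gender": "female", "birthDate": "1970-07-10",
                      "name": [{"family": "Soto", "given": ["Lisa"]}]}},
        {"resource": {"resourceType": "Patient", "id": "example-2",
                      "gender": "male"}},  # birthDate and name are absent
    ],
}


def patient_to_row(patient):
    """Flatten the handful of Patient elements needed for analysis into one dict."""
    names = patient.get("name") or [{}]  # optional and repeating, so guard both cases
    return {
        "id": patient.get("id"),
        "gender": patient.get("gender"),
        "birth_date": patient.get("birthDate"),
        "family_name": names[0].get("family"),
    }


# Keep only Patient resources and convert each one to a flat row
rows = [
    patient_to_row(entry["resource"])
    for entry in bundle.get("entry", [])
    if entry.get("resource", {}).get("resourceType") == "Patient"
]

for row in rows:
    print(row)
# With pandas installed, pd.DataFrame(rows) turns these rows into a DataFrame.
```

Every optional or repeating element needs its own guard like the one on `name`, which is where the manual approach tends to become laborious.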

To use FHIR-PYrate, you will need a Python 3 runtime with FHIR-PYrate and Pandas installed.
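
One quick way to confirm the environment is ready is to check that both modules can be found before running the examples. This is only a convenience sketch using the standard library; the module names `fhir_pyrate` and `pandas` are the ones imported in the examples in this module.

```python
from importlib.util import find_spec

# Module names as they are imported in the examples in this module
required = ("fhir_pyrate", "pandas")

# find_spec() returns None when a top-level module is not installed
missing = [name for name in required if find_spec(name) is None]

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```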

1 FHIR testing server

The examples in this module use a FHIR testing server populated with Synthea data in FHIR R4 format: the public HAPI Test Server operated by HAPI FHIR.

The endpoint for this testing server is:

https://hapi.fhir.org/baseR4

However, any FHIR server loaded with testing data can be used. See Standing up a FHIR Testing Server for instructions to set up your own test server.
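
For orientation, a FHIR search against an endpoint like this is an HTTP GET on the base URL followed by the resource type and any search parameters. The sketch below only composes such a URL with the standard library and sends nothing over the network. `build_search_url` is a hypothetical helper written for this example, not part of FHIR-PYrate.

```python
from urllib.parse import urlencode

BASE_URL = "https://hapi.fhir.org/baseR4"


def build_search_url(base_url, resource_type, params):
    """Compose a FHIR search URL from a base URL, a resource type, and search parameters."""
    query = urlencode(params)
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


# _count is the standard FHIR search parameter for the page size
url = build_search_url(BASE_URL, "Patient", {"_count": 10})
print(url)
```

Pointing the examples at a different test server should only require changing the base URL.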

Each code block in the following sections is followed immediately by its sample output, much like the code cells and results in a Jupyter notebook.

2 Retrieving FHIR data

Once your environment is set up, you can run the following Python code to retrieve instances of the Patient resource from a test server:

# Load dependencies
from fhir_pyrate import Pirate
import pandas as pd

# Instantiate a Pirate object using the FHIR-PYrate library to query a test FHIR server
search = Pirate(
    auth=None,
    base_url="https://hapi.fhir.org/baseR4",
    print_request_url=True,
)

# Use the whimsically named `steal_bundles()` method to instantiate a search interaction
#
# For more information, see https://github.com/UMEssen/FHIR-PYrate/#pirate
bundles = search.steal_bundles(
    resource_type="Patient",
    request_params={
        "_count": 10,  # Get 10 instances per page
        "identifier": "https://github.com/synthetichealth/synthea|",
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
)

# Execute the search and convert to a Pandas DataFrame
df = search.bundles_to_dataframe(bundles)

df.head(5)
https://hapi.fhir.org/baseR4/Patient?_count=10&identifier=https://github.com/synthetichealth/synthea|
Query (Patient):   0%|          | 0/1 [00:00<?, ?it/s]Query (Patient): 100%|██████████| 1/1 [00:00<00:00, 1323.96it/s]
resourceType id meta_versionId meta_lastUpdated meta_source meta_tag_0_system meta_tag_0_code meta_tag_0_display identifier_0_system identifier_0_value ... identifier_5_type_coding_0_display identifier_5_type_text identifier_5_system identifier_5_value name_0_use name_0_family name_0_given_0 gender birthDate identifier_3_use
0 Patient 129c6ac7-8d06-89de-ad63-0204a93e76c3 4 2026-06-06T10:10:02.293+00:00 #pOh0yLEonU9VLYRE http://terminology.hl7.org/CodeSystem/v3-Obser... SUBSETTED Resource encoded in summary mode https://github.com/synthetichealth/synthea 129c6ac7-8d06-89de-ad63-0204a93e76c3 ... Medical Record Number MRN http://hospital.example.org/mrn 9447890438 official Soto Lisa female 1970-07-10 NaN
1 Patient 3af3708d-41f1-cd80-f3dd-ec5ac76072bf 4 2026-06-06T10:10:03.292+00:00 #AUTEsfW29X5LJZgx http://terminology.hl7.org/CodeSystem/v3-Obser... SUBSETTED Resource encoded in summary mode https://github.com/synthetichealth/synthea 3af3708d-41f1-cd80-f3dd-ec5ac76072bf ... NaN NaN NaN NaN official Bryant Wendy female 2024-12-13 official
2 Patient 63ee2253-bdd5-da55-2ad2-b4984d0ad700 4 2026-06-06T10:10:04.292+00:00 #XM3TN7SS38yyuh01 http://terminology.hl7.org/CodeSystem/v3-Obser... SUBSETTED Resource encoded in summary mode https://github.com/synthetichealth/synthea 63ee2253-bdd5-da55-2ad2-b4984d0ad700 ... NaN NaN NaN NaN official Brown Colleen female 2019-05-14 official
3 Patient 6a4160eb-a793-2f86-2302-378626f46cce 4 2026-06-06T10:10:05.292+00:00 #rKvC3vdBqV5HtCMp http://terminology.hl7.org/CodeSystem/v3-Obser... SUBSETTED Resource encoded in summary mode https://github.com/synthetichealth/synthea 6a4160eb-a793-2f86-2302-378626f46cce ... Medical Record Number MRN http://hospital.example.org/mrn 6387718521 official Barker Brandon male 2020-02-23 NaN
4 Patient 79a66c97-6131-3213-f3c9-4606946ab056 4 2026-06-06T10:10:06.293+00:00 #6uDlmVC65TsIt3tg http://terminology.hl7.org/CodeSystem/v3-Obser... SUBSETTED Resource encoded in summary mode https://github.com/synthetichealth/synthea 79a66c97-6131-3213-f3c9-4606946ab056 ... Medical Record Number MRN http://hospital.example.org/mrn 3877816703 official Hensley Melissa female 1995-06-12 NaN

5 rows × 47 columns

It is easier to see the contents of this DataFrame by printing out its first row vertically:

# Print the first row of the DataFrame vertically for easier reading.
pd.set_option("display.max_rows", 100)  # Show up to 100 rows so the transposed output is not truncated
df.head(1).T
0
resourceType Patient
id 129c6ac7-8d06-89de-ad63-0204a93e76c3
meta_versionId 4
meta_lastUpdated 2026-06-06T10:10:02.293+00:00
meta_source #pOh0yLEonU9VLYRE
meta_tag_0_system http://terminology.hl7.org/CodeSystem/v3-Obser...
meta_tag_0_code SUBSETTED
meta_tag_0_display Resource encoded in summary mode
identifier_0_system https://github.com/synthetichealth/synthea
identifier_0_value 129c6ac7-8d06-89de-ad63-0204a93e76c3
identifier_1_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203
identifier_1_type_coding_0_code MR
identifier_1_type_coding_0_display Medical Record Number
identifier_1_type_text Medical Record Number
identifier_1_system http://hospital.smarthealthit.org
identifier_1_value 129c6ac7-8d06-89de-ad63-0204a93e76c3
identifier_2_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203
identifier_2_type_coding_0_code SS
identifier_2_type_coding_0_display Social Security Number
identifier_2_type_text Social Security Number
identifier_2_system http://hl7.org/fhir/sid/us-ssn
identifier_2_value 999-94-5397
identifier_3_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203
identifier_3_type_coding_0_code DL
identifier_3_type_coding_0_display Driver's license number
identifier_3_type_text Driver's license number
identifier_3_system urn:oid:2.16.840.1.113883.4.3.25
identifier_3_value S99940903
identifier_4_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203
identifier_4_type_coding_0_code PPN
identifier_4_type_coding_0_display Passport Number
identifier_4_type_text Passport Number
identifier_4_system http://standardhealthrecord.org/fhir/Structure...
identifier_4_value X53631011X
identifier_5_use official
identifier_5_type_coding_0_system http://terminology.hl7.org/CodeSystem/v2-0203
identifier_5_type_coding_0_code MR
identifier_5_type_coding_0_display Medical Record Number
identifier_5_type_text MRN
identifier_5_system http://hospital.example.org/mrn
identifier_5_value 9447890438
name_0_use official
name_0_family Soto
name_0_given_0 Lisa
gender female
birthDate 1970-07-10
identifier_3_use NaN

In the output above, you can see that FHIR-PYrate collapsed the hierarchical FHIR data structure into DataFrame columns. It does this by taking an element path from the FHIR-formatted data, like Patient.identifier[0].value, and converting it to an underscore-delimited column name like identifier_0_value. (Patient.identifier has multiple values in the FHIR data, so there are multiple identifier_N_... columns in the DataFrame.)
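
The naming scheme can be reproduced with a short recursive function. The sketch below is an illustration of the idea only, not FHIR-PYrate's actual implementation; `flatten` is a hypothetical helper, and the `patient` dict is an abbreviated instance built from values in the sample output above (so its identifier indexes differ from the full data).

```python
def flatten(value, prefix=""):
    """Recursively collapse nested dicts and lists into a single-level dict."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}_{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}_{index}" if prefix else str(index)))
    else:
        flat[prefix] = value  # leaf: a string, number, or boolean
    return flat


# Abbreviated Patient instance (only a few elements kept)
patient = {
    "resourceType": "Patient",
    "identifier": [
        {"system": "https://github.com/synthetichealth/synthea",
         "value": "129c6ac7-8d06-89de-ad63-0204a93e76c3"},
        {"system": "http://hl7.org/fhir/sid/us-ssn", "value": "999-94-5397"},
    ],
    "name": [{"use": "official", "family": "Soto", "given": ["Lisa"]}],
    "gender": "female",
}

# Print each generated column name next to its value
for column, cell in flatten(patient).items():
    print(f"{column:22} {cell}")
```

Each list position becomes its own column, which is why instances with different numbers of identifiers produce NaN in some columns of the combined DataFrame.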

3 Selecting specific columns

Usually not every single value from a FHIR instance is needed for analysis. There are two ways to get a more concise DataFrame:

  1. Use the approach above to load all elements into a DataFrame, remove the unneeded columns, and rename the remaining columns as needed. The process_function capability in FHIR-PYrate allows you to integrate this approach into the bundles_to_dataframe() method call.
  2. Use FHIRPath to select specific elements and map them onto column names.
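
A minimal sketch of the first approach, using plain pandas calls after the DataFrame has been built rather than the process_function hook. The `wide` DataFrame is a small stand-in with a few columns and values copied from the sample output above; `column_map` is a name chosen for this example.

```python
import pandas as pd

# Stand-in for the wide DataFrame produced earlier (two rows, five of the columns)
wide = pd.DataFrame(
    {
        "resourceType": ["Patient", "Patient"],
        "identifier_0_value": [
            "129c6ac7-8d06-89de-ad63-0204a93e76c3",
            "3af3708d-41f1-cd80-f3dd-ec5ac76072bf",
        ],
        "name_0_family": ["Soto", "Bryant"],
        "gender": ["female", "female"],
        "birthDate": ["1970-07-10", "2024-12-13"],
    }
)

# Columns to keep, mapped to the names wanted in the analysis table
column_map = {
    "identifier_0_value": "id",
    "gender": "gender",
    "birthDate": "date_of_birth",
}

# Select only the needed columns, then rename them in one step
tidy = wide[list(column_map)].rename(columns=column_map)
print(tidy)
```

Keeping the selection and renaming in a single mapping makes it easy to see which source column feeds each output column.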

The second approach is typically more concise. For example, to generate a DataFrame like this…

id gender date_of_birth marital_status

…you could use the following code:

# Instantiate and perform the FHIR search interaction in a single function call
df = search.steal_bundles_to_dataframe(
    resource_type="Patient",
    request_params={
        "_count": 10,  # Get 10 instances per page
        "identifier": "https://github.com/synthetichealth/synthea|",
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
    fhir_paths=[
        ("id", "identifier[0].value"),
        ("gender", "gender"),
        ("date_of_birth", "birthDate"),
        ("marital_status", "maritalStatus.coding[0].code"),
    ],
)
df
https://hapi.fhir.org/baseR4/Patient?_count=10&identifier=https://github.com/synthetichealth/synthea|
Query & Build DF (Patient):   0%|          | 0/1 [00:00<?, ?it/s]Query & Build DF (Patient): 100%|██████████| 1/1 [00:00<00:00, 376.00it/s]
id gender date_of_birth
0 129c6ac7-8d06-89de-ad63-0204a93e76c3 female 1970-07-10
1 3af3708d-41f1-cd80-f3dd-ec5ac76072bf female 2024-12-13
2 63ee2253-bdd5-da55-2ad2-b4984d0ad700 female 2019-05-14
3 6a4160eb-a793-2f86-2302-378626f46cce male 2020-02-23
4 79a66c97-6131-3213-f3c9-4606946ab056 female 1995-06-12
5 7bc002fa-dc52-17d6-1563-fd8901826f7d female 1979-04-26
6 8e1a0a7c-e308-444b-075a-3c2b1f60f881 male 1995-02-11
7 a4a401d1-a46a-eb4a-8a38-760d5d79d6ec female 1938-07-15
8 a5cb8ce9-cec6-6b23-0990-cbaf753578a4 male 1966-06-10
9 bb6a9034-2f23-2508-d29d-35efee156dc9 male 1943-01-04

While FHIRPath can be quite complex, its use in FHIR-PYrate is often straightforward. Nested elements are separated with ., and elements with multiple sub-values are indexed with [N], where N is an integer starting at 0. The element paths can typically be constructed either by loading all elements into a DataFrame and deriving the FHIRPaths from the column names, or by looking at the element hierarchy on the resource pages in the FHIR specification (see Key FHIR Resources for more information on reading the FHIR specification).
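
Deriving a path from a column name is mechanical enough to sketch in code. Both helpers below are hypothetical and written for this example: `column_to_path` assumes that element names contain no underscores and that purely numeric tokens are list indexes, and `get_path` handles only the dotted, indexed subset of FHIRPath described above (it is not a FHIRPath engine).

```python
import re


def column_to_path(column):
    """Turn a column name such as 'identifier_0_value' into 'identifier[0].value'."""
    path = ""
    for token in column.split("_"):
        if token.isdigit():
            path += f"[{token}]"  # numeric token: index into a repeating element
        else:
            path += f".{token}" if path else token
    return path


def get_path(resource, path):
    """Follow a simple dotted path with [N] indexes; return None if anything is missing."""
    current = resource
    for name, index in re.findall(r"([A-Za-z]\w*)(?:\[(\d+)\])?", path):
        if not isinstance(current, dict) or name not in current:
            return None
        current = current[name]
        if index:
            if not isinstance(current, list) or int(index) >= len(current):
                return None
            current = current[int(index)]
    return current


# Abbreviated Patient instance with values taken from the sample output
patient = {
    "identifier": [{"value": "129c6ac7-8d06-89de-ad63-0204a93e76c3"}],
    "gender": "female",
    "birthDate": "1970-07-10",
}

for column in ("identifier_0_value", "gender", "identifier_1_type_coding_0_code"):
    path = column_to_path(column)
    print(column, "->", path, "->", get_path(patient, path))
```

A path that points at an element the instance does not contain simply yields no value, so it is worth checking derived paths against real data before relying on them.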

4 Elements with multiple sub-values

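There are multiple identifier[N].value values for each instance of Patient in this dataset. Omitting the index from the FHIRPath (identifier.value rather than identifier[0].value) collects all of them into a single list-valued column: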

# Instantiate and perform the FHIR search interaction in a single function call
df = search.steal_bundles_to_dataframe(
    resource_type="Patient",
    request_params={
        "_count": 10,  # Get 10 instances per page
        "identifier": "https://github.com/synthetichealth/synthea|",
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
    fhir_paths=[("id", "identifier[0].value"), ("identifiers", "identifier.value")],
)
df
https://hapi.fhir.org/baseR4/Patient?_count=10&identifier=https://github.com/synthetichealth/synthea|
Query & Build DF (Patient):   0%|          | 0/1 [00:00<?, ?it/s]Query & Build DF (Patient): 100%|██████████| 1/1 [00:00<00:00, 583.11it/s]
id identifiers
0 129c6ac7-8d06-89de-ad63-0204a93e76c3 [129c6ac7-8d06-89de-ad63-0204a93e76c3, 129c6ac...
1 3af3708d-41f1-cd80-f3dd-ec5ac76072bf [3af3708d-41f1-cd80-f3dd-ec5ac76072bf, 3af3708...
2 63ee2253-bdd5-da55-2ad2-b4984d0ad700 [63ee2253-bdd5-da55-2ad2-b4984d0ad700, 63ee225...
3 6a4160eb-a793-2f86-2302-378626f46cce [6a4160eb-a793-2f86-2302-378626f46cce, 6a4160e...
4 79a66c97-6131-3213-f3c9-4606946ab056 [79a66c97-6131-3213-f3c9-4606946ab056, 79a66c9...
5 7bc002fa-dc52-17d6-1563-fd8901826f7d [7bc002fa-dc52-17d6-1563-fd8901826f7d, 7bc002f...
6 8e1a0a7c-e308-444b-075a-3c2b1f60f881 [8e1a0a7c-e308-444b-075a-3c2b1f60f881, 8e1a0a7...
7 a4a401d1-a46a-eb4a-8a38-760d5d79d6ec [a4a401d1-a46a-eb4a-8a38-760d5d79d6ec, a4a401d...
8 a5cb8ce9-cec6-6b23-0990-cbaf753578a4 [a5cb8ce9-cec6-6b23-0990-cbaf753578a4, a5cb8ce...
9 bb6a9034-2f23-2508-d29d-35efee156dc9 [bb6a9034-2f23-2508-d29d-35efee156dc9, bb6a903...

To convert to separate columns, you can do the following:

# Expand the list column into identifier_0, identifier_1, ... (pop() also removes it from df)
df.join(pd.DataFrame(df.pop("identifiers").values.tolist()).add_prefix("identifier_"))
id identifier_0 identifier_1 identifier_2 identifier_3 identifier_4 identifier_5
0 129c6ac7-8d06-89de-ad63-0204a93e76c3 129c6ac7-8d06-89de-ad63-0204a93e76c3 129c6ac7-8d06-89de-ad63-0204a93e76c3 999-94-5397 S99940903 X53631011X 9447890438
1 3af3708d-41f1-cd80-f3dd-ec5ac76072bf 3af3708d-41f1-cd80-f3dd-ec5ac76072bf 3af3708d-41f1-cd80-f3dd-ec5ac76072bf 999-26-9282 2339674416 None None
2 63ee2253-bdd5-da55-2ad2-b4984d0ad700 63ee2253-bdd5-da55-2ad2-b4984d0ad700 63ee2253-bdd5-da55-2ad2-b4984d0ad700 999-28-8122 7144994331 None None
3 6a4160eb-a793-2f86-2302-378626f46cce 6a4160eb-a793-2f86-2302-378626f46cce 6a4160eb-a793-2f86-2302-378626f46cce 999-75-6358 S99942926 X17055248X 6387718521
4 79a66c97-6131-3213-f3c9-4606946ab056 79a66c97-6131-3213-f3c9-4606946ab056 79a66c97-6131-3213-f3c9-4606946ab056 999-27-7392 S99918680 X71217115X 3877816703
5 7bc002fa-dc52-17d6-1563-fd8901826f7d 7bc002fa-dc52-17d6-1563-fd8901826f7d 7bc002fa-dc52-17d6-1563-fd8901826f7d 999-59-5908 S99978056 X23499364X 9902880694
6 8e1a0a7c-e308-444b-075a-3c2b1f60f881 8e1a0a7c-e308-444b-075a-3c2b1f60f881 8e1a0a7c-e308-444b-075a-3c2b1f60f881 999-43-2141 S99990165 X42517753X 6474987210
7 a4a401d1-a46a-eb4a-8a38-760d5d79d6ec a4a401d1-a46a-eb4a-8a38-760d5d79d6ec a4a401d1-a46a-eb4a-8a38-760d5d79d6ec 999-53-1770 S99970448 X61437109X 8725754749
8 a5cb8ce9-cec6-6b23-0990-cbaf753578a4 a5cb8ce9-cec6-6b23-0990-cbaf753578a4 a5cb8ce9-cec6-6b23-0990-cbaf753578a4 999-56-7727 S99979112 X83974334X 9852415028
9 bb6a9034-2f23-2508-d29d-35efee156dc9 bb6a9034-2f23-2508-d29d-35efee156dc9 bb6a9034-2f23-2508-d29d-35efee156dc9 999-79-4457 7310386469 None None

This will give you separate identifier_0, identifier_1, … columns for each Patient.identifier[N] value.
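
The one-liner above modifies df in place (through pop()) and relies on df having the default 0..N-1 index so that the join lines up. A slightly longer variant that avoids both assumptions is sketched below. The small `df` here is a stand-in with abbreviated identifier lists taken from the sample output, and its string index is chosen deliberately to show that rows still align.

```python
import pandas as pd

# Stand-in DataFrame: two patients with identifier lists of different lengths
df = pd.DataFrame(
    {
        "id": [
            "129c6ac7-8d06-89de-ad63-0204a93e76c3",
            "3af3708d-41f1-cd80-f3dd-ec5ac76072bf",
        ],
        "identifiers": [
            ["999-94-5397", "S99940903", "X53631011X"],
            ["999-26-9282", "2339674416"],
        ],
    },
    index=["a", "b"],  # deliberately not 0..N-1
)

# Build the expanded columns on the same index as df; shorter lists are padded with missing values
expanded = pd.DataFrame(df["identifiers"].tolist(), index=df.index).add_prefix("identifier_")

# Join onto a copy without the list column, leaving df itself unchanged
result = df.drop(columns="identifiers").join(expanded)
print(result)
```

Because df is left untouched, the cell can be re-run without first repeating the query.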