Analyzing FHIR Data in a Tabular Format With Python

Learning objectives

Understand the high-level approaches for converting FHIR-formatted data into tabular for analysis in Python.
Learn how the FHIR-PYrate library facilitates requesting data from a FHIR server, and creating tidy tabular data tables.

Relevant roles:

Informaticist

Data analysis approaches in Python often use Pandas DataFrames to store tabular data. There are two primary approaches to loading FHIR-formatted data into Pandas DataFrames:

Writing Python code to manually convert FHIR instances in JSON format into DataFrames.

This does not require any special skills beyond data manipulation in Python, but in practice can be laborious (especially with large number of data elements) and prone to bugs.
Using a purpose-built library like FHIR-PYrate to automatically convert FHIR instances into DataFrames.

It is recommended to try this approach first, and only fall back to (1) if needed.

To use FHIR-PYrate, you will need a Python 3 runtime with FHIR-PYrate and Pandas installed.

1 FHIR testing server

The examples in this module use a FHIR testing server populated with Synthea data in FHIR R4 format via Logica Health’s Sandbox service.

The endpoint for this testing server is:

https://api.logicahealth.org/FHIRResearchSynthea/open

However, any FHIR server loaded with testing data can be used. See Standing up a FHIR Testing Server for instructions to set up your own test server.

The code blocks in the following section show sample output immediately after. This is similar to the code cells and results in a Jupyter notebook.

2 Retrieving FHIR data

Once your environment is set up, you can run the following Python code to retrieve instances of the Patient resource from a test server:

# Load dependencies
from fhir_pyrate import Pirate
import pandas as pd

# Instantiate a Pirate object using the FHIR-PYrate library to query a test FHIR server
search = Pirate(
    auth=None,
    base_url="https://api.logicahealth.org/FHIRResearchSynthea/open/",
    print_request_url=True,
)

# Use the whimsically named `steal_bundles()` method to instantiate a search interaction
#
# For more information, see https://github.com/UMEssen/FHIR-PYrate/#pirate
bundles = search.steal_bundles(
    resource_type='Patient',
    request_params={
        "_count": 10 # Get 10 instances per page
    },
    num_pages=1 # Get 1 page (so a total of 10 instances)
)

# Execute the search and convert to a Pandas DataFrame
df = search.bundles_to_dataframe(bundles)

df.head(5)

https://api.logicahealth.org/FHIRResearchSynthea/open/Patient?_count=10

Query (Patient):   0%|          | 0/1 [00:00<?, ?it/s]Query (Patient): 100%|██████████| 1/1 [00:00<00:00, 889.19it/s]

	resourceType	id	meta_versionId	meta_lastUpdated	meta_source	text_status	text_div	extension_0_url	extension_0_extension_0_url	extension_0_extension_0_valueCoding_system	...	address_0_country	maritalStatus_coding_0_system	maritalStatus_coding_0_code	maritalStatus_coding_0_display	maritalStatus_text	multipleBirthBoolean	communication_0_language_coding_0_system	communication_0_language_coding_0_code	communication_0_language_coding_0_display	communication_0_language_text
0	Patient	1	1	2023-04-06T20:52:11.000+00:00	#wQwWCylvgEiNKNbB	generated	<div xmlns="http://www.w3.org/1999/xhtml">Gene...	http://hl7.org/fhir/us/core/StructureDefinitio...	ombCategory	urn:oid:2.16.840.1.113883.6.238	...	US	http://terminology.hl7.org/CodeSystem/v3-Marit...	M	M	M	False	urn:ietf:bcp:47	en-US	English	English
1	Patient	180	1	2023-04-06T20:53:35.000+00:00	#N1sYXaxE6wGQilNA	generated	<div xmlns="http://www.w3.org/1999/xhtml">Gene...	http://hl7.org/fhir/us/core/StructureDefinitio...	ombCategory	urn:oid:2.16.840.1.113883.6.238	...	US	http://terminology.hl7.org/CodeSystem/v3-Marit...	S	Never Married	Never Married	False	urn:ietf:bcp:47	en-US	English	English
2	Patient	293	1	2023-04-06T20:56:31.000+00:00	#ZAAeVXzDXxeetcQS	generated	<div xmlns="http://www.w3.org/1999/xhtml">Gene...	http://hl7.org/fhir/us/core/StructureDefinitio...	ombCategory	urn:oid:2.16.840.1.113883.6.238	...	US	http://terminology.hl7.org/CodeSystem/v3-Marit...	S	S	S	False	urn:ietf:bcp:47	en-US	English	English
3	Patient	612	1	2023-04-06T20:57:01.000+00:00	#sTSlRxyopDL1X4fa	generated	<div xmlns="http://www.w3.org/1999/xhtml">Gene...	http://hl7.org/fhir/us/core/StructureDefinitio...	ombCategory	urn:oid:2.16.840.1.113883.6.238	...	US	http://terminology.hl7.org/CodeSystem/v3-Marit...	S	S	S	False	urn:ietf:bcp:47	en-US	English	English
4	Patient	931	1	2023-04-06T20:57:13.000+00:00	#ElqD7MaHoYuk6qSo	generated	<div xmlns="http://www.w3.org/1999/xhtml">Gene...	http://hl7.org/fhir/us/core/StructureDefinitio...	ombCategory	urn:oid:2.16.840.1.113883.6.238	...	US	http://terminology.hl7.org/CodeSystem/v3-Marit...	S	S	S	False	urn:ietf:bcp:47	en-US	English	English

5 rows × 87 columns

It is easier to see the contents of this DataFrame by printing out its first row vertically:

# Print the first row of the DataFrame vertically for easier reading.
pd.set_option('display.max_rows', 100)  # Show all rows
df.head(1).T

	0
resourceType	Patient
id	1
meta_versionId	1
meta_lastUpdated	2023-04-06T20:52:11.000+00:00
meta_source	#wQwWCylvgEiNKNbB
text_status	generated
text_div	<div xmlns="http://www.w3.org/1999/xhtml">Gene...
extension_0_url	http://hl7.org/fhir/us/core/StructureDefinitio...
extension_0_extension_0_url	ombCategory
extension_0_extension_0_valueCoding_system	urn:oid:2.16.840.1.113883.6.238
extension_0_extension_0_valueCoding_code	2054-5
extension_0_extension_0_valueCoding_display	Black or African American
extension_0_extension_1_url	text
extension_0_extension_1_valueString	Black or African American
extension_1_url	http://hl7.org/fhir/us/core/StructureDefinitio...
extension_1_extension_0_url	ombCategory
extension_1_extension_0_valueCoding_system	urn:oid:2.16.840.1.113883.6.238
extension_1_extension_0_valueCoding_code	2186-5
extension_1_extension_0_valueCoding_display	Not Hispanic or Latino
extension_1_extension_1_url	text
extension_1_extension_1_valueString	Not Hispanic or Latino
extension_2_url	http://hl7.org/fhir/StructureDefinition/patien...
extension_2_valueString	Delois358 Hintz995
extension_3_url	http://hl7.org/fhir/us/core/StructureDefinitio...
extension_3_valueCode	M
extension_4_url	http://hl7.org/fhir/StructureDefinition/patien...
extension_4_valueAddress_city	Lawrence
extension_4_valueAddress_state	Massachusetts
extension_4_valueAddress_country	US
extension_5_url	http://synthetichealth.github.io/synthea/disab...
extension_5_valueDecimal	0.098274
extension_6_url	http://synthetichealth.github.io/synthea/quali...
extension_6_valueDecimal	62.901726
identifier_0_system	https://github.com/synthetichealth/synthea
identifier_0_value	439b24b4-6f25-4093-b101-47a39bd061ca
identifier_1_type_coding_0_system	http://terminology.hl7.org/CodeSystem/v2-0203
identifier_1_type_coding_0_code	MR
identifier_1_type_coding_0_display	Medical Record Number
identifier_1_type_text	Medical Record Number
identifier_1_system	http://hospital.smarthealthit.org
identifier_1_value	439b24b4-6f25-4093-b101-47a39bd061ca
identifier_2_type_coding_0_system	http://terminology.hl7.org/CodeSystem/v2-0203
identifier_2_type_coding_0_code	SS
identifier_2_type_coding_0_display	Social Security Number
identifier_2_type_text	Social Security Number
identifier_2_system	http://hl7.org/fhir/sid/us-ssn
identifier_2_value	999-57-3355
identifier_3_type_coding_0_system	http://terminology.hl7.org/CodeSystem/v2-0203
identifier_3_type_coding_0_code	DL
identifier_3_type_coding_0_display	Driver's License
identifier_3_type_text	Driver's License
identifier_3_system	urn:oid:2.16.840.1.113883.4.3.25
identifier_3_value	S99925942
identifier_4_type_coding_0_system	http://terminology.hl7.org/CodeSystem/v2-0203
identifier_4_type_coding_0_code	PPN
identifier_4_type_coding_0_display	Passport Number
identifier_4_type_text	Passport Number
identifier_4_system	http://standardhealthrecord.org/fhir/Structure...
identifier_4_value	X42032818X
name_0_use	official
name_0_family	Moen819
name_0_given_0	Willian804
name_0_prefix_0	Mr.
telecom_0_system	phone
telecom_0_value	555-135-7303
telecom_0_use	home
gender	male
birthDate	1955-10-09
address_0_extension_0_url	http://hl7.org/fhir/StructureDefinition/geoloc...
address_0_extension_0_extension_0_url	latitude
address_0_extension_0_extension_0_valueDecimal	42.319305
address_0_extension_0_extension_1_url	longitude
address_0_extension_0_extension_1_valueDecimal	-71.173653
address_0_line_0	545 Tromp Port Unit 55
address_0_city	Needham
address_0_state	Massachusetts
address_0_postalCode	02492
address_0_country	US
maritalStatus_coding_0_system	http://terminology.hl7.org/CodeSystem/v3-Marit...
maritalStatus_coding_0_code	M
maritalStatus_coding_0_display	M
maritalStatus_text	M
multipleBirthBoolean	False
communication_0_language_coding_0_system	urn:ietf:bcp:47
communication_0_language_coding_0_code	en-US
communication_0_language_coding_0_display	English
communication_0_language_text	English

If you look at the output above, you can see FHIR-PYrate collapsed the hierarchical FHIR data structure into DataFrame columns. FHIR-PYrate does this by taking an element from the FHIR-formatted data like Patient.identifier[0].value and converting to an underscore-delimited column name like identifier_0_value. (Note that Patient.identifier has multiple values in the FHIR data, so there are multiple identifier_N_... columns in the DataFrame.)

3 Selecting specific columns

Usually not every single value from a FHIR instance is needed for analysis. There are two ways to get a more concise DataFrame:

Use the approach above to load all elements into a DataFrame, remove the unneeded columns, and rename the remaining columns as needed. The process_function capability in FHIR-PYrate allows you to integrate this approach into the bundles_to_dataframe() method call.
Use FHIRPath to select specific elements and map them onto column names.

The second approach is typically more concise. For example, to generate a DataFrame like this…

id	gender	date_of_birth	marital_status
…	…	…	…

…you could use the following code:

# Instantiate and perform the FHIR search interaction in a single function call
df = search.steal_bundles_to_dataframe(
    resource_type='Patient',
    request_params={
        "_count": 10  # Get 10 instances per page
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
    fhir_paths=[
        ("id", "identifier[0].value"),
        ("gender", "gender"),
        ("date_of_birth", "birthDate"),
        ("marital_status", "maritalStatus.coding[0].code")
    ])
df

https://api.logicahealth.org/FHIRResearchSynthea/open/Patient?_count=10

Query & Build DF (Patient):   0%|          | 0/1 [00:00<?, ?it/s]Query & Build DF (Patient): 100%|██████████| 1/1 [00:00<00:00, 247.70it/s]

	id	gender	date_of_birth	marital_status
0	439b24b4-6f25-4093-b101-47a39bd061ca	male	1955-10-09	M
1	99285aac-e5e3-4f5b-857d-f67271c97304	female	2017-12-02	S
2	2fa15bc7-8866-461a-9000-f739e425860a	male	1945-12-10	S
3	2fa15bc7-8866-461a-9000-f739e425860a	male	1945-12-10	S
4	2fa15bc7-8866-461a-9000-f739e425860a	male	1945-12-10	S
5	41166989-975d-4d17-b9de-17f94cb3eec1	male	1946-03-29	M
6	2b083021-e93f-4991-bf49-fd4f20060ef8	female	2002-10-24	S
7	29e51479-f742-4474-8f8e-d2607d5269f6	male	1999-12-12	S
8	262b819a-5193-404a-9787-b7f599358035	male	2002-04-15	S
9	aff8f143-2375-416f-901d-b0e4c73e3e58	male	1997-12-26	S

While FHIRPath can be quite complex, its use in FHIR-PYrate is often straight forward. Nested elements are separated with ., and elements with multiple sub-values are identified by [N] where N is an integer starting at 0. The element paths can typically be constructed by loading all elements into a DataFrame and then manually deriving the FHIRPaths from the column names, or by looking at the hierarchy resource pages in the FHIR specification (see Key FHIR Resources for more information on reading the FHIR specification).

4 Elements with multiple sub-values

There are multiple identifier[N].value values for each instance of Patient in this dataset.

# Instantiate and perform the FHIR search interaction in a single function call
df = search.steal_bundles_to_dataframe(
    resource_type='Patient',
    request_params={
        "_count": 10  # Get 10 instances per page
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
    fhir_paths=[
        ("id", "identifier[0].value"),
        ("identifiers", "identifier.value")
    ])
df

https://api.logicahealth.org/FHIRResearchSynthea/open/Patient?_count=10

Query & Build DF (Patient):   0%|          | 0/1 [00:00<?, ?it/s]Query & Build DF (Patient): 100%|██████████| 1/1 [00:00<00:00, 484.95it/s]

	id	identifiers
0	439b24b4-6f25-4093-b101-47a39bd061ca	[439b24b4-6f25-4093-b101-47a39bd061ca, 439b24b...
1	99285aac-e5e3-4f5b-857d-f67271c97304	[99285aac-e5e3-4f5b-857d-f67271c97304, 99285aa...
2	2fa15bc7-8866-461a-9000-f739e425860a	[2fa15bc7-8866-461a-9000-f739e425860a, 2fa15bc...
3	2fa15bc7-8866-461a-9000-f739e425860a	[2fa15bc7-8866-461a-9000-f739e425860a, 2fa15bc...
4	2fa15bc7-8866-461a-9000-f739e425860a	[2fa15bc7-8866-461a-9000-f739e425860a, 2fa15bc...
5	41166989-975d-4d17-b9de-17f94cb3eec1	[41166989-975d-4d17-b9de-17f94cb3eec1, 4116698...
6	2b083021-e93f-4991-bf49-fd4f20060ef8	[2b083021-e93f-4991-bf49-fd4f20060ef8, 2b08302...
7	29e51479-f742-4474-8f8e-d2607d5269f6	[29e51479-f742-4474-8f8e-d2607d5269f6, 29e5147...
8	262b819a-5193-404a-9787-b7f599358035	[262b819a-5193-404a-9787-b7f599358035, 262b819...
9	aff8f143-2375-416f-901d-b0e4c73e3e58	[aff8f143-2375-416f-901d-b0e4c73e3e58, aff8f14...

To convert to separate columns, you can do the following:

df.join(
    pd.DataFrame(
        df.pop('identifiers').values.tolist()
    ).add_prefix('identifier_')
)

	id	identifier_0	identifier_1	identifier_2	identifier_3	identifier_4
0	439b24b4-6f25-4093-b101-47a39bd061ca	439b24b4-6f25-4093-b101-47a39bd061ca	439b24b4-6f25-4093-b101-47a39bd061ca	999-57-3355	S99925942	X42032818X
1	99285aac-e5e3-4f5b-857d-f67271c97304	99285aac-e5e3-4f5b-857d-f67271c97304	99285aac-e5e3-4f5b-857d-f67271c97304	999-62-8033	None	None
2	2fa15bc7-8866-461a-9000-f739e425860a	2fa15bc7-8866-461a-9000-f739e425860a	2fa15bc7-8866-461a-9000-f739e425860a	999-93-7537	S99948707	X14078167X
3	2fa15bc7-8866-461a-9000-f739e425860a	2fa15bc7-8866-461a-9000-f739e425860a	2fa15bc7-8866-461a-9000-f739e425860a	999-93-7537	S99948707	X14078167X
4	2fa15bc7-8866-461a-9000-f739e425860a	2fa15bc7-8866-461a-9000-f739e425860a	2fa15bc7-8866-461a-9000-f739e425860a	999-93-7537	S99948707	X14078167X
5	41166989-975d-4d17-b9de-17f94cb3eec1	41166989-975d-4d17-b9de-17f94cb3eec1	41166989-975d-4d17-b9de-17f94cb3eec1	999-17-8717	S99933732	X75257608X
6	2b083021-e93f-4991-bf49-fd4f20060ef8	2b083021-e93f-4991-bf49-fd4f20060ef8	2b083021-e93f-4991-bf49-fd4f20060ef8	999-83-6040	S99998967	None
7	29e51479-f742-4474-8f8e-d2607d5269f6	29e51479-f742-4474-8f8e-d2607d5269f6	29e51479-f742-4474-8f8e-d2607d5269f6	999-48-8328	S99955051	None
8	262b819a-5193-404a-9787-b7f599358035	262b819a-5193-404a-9787-b7f599358035	262b819a-5193-404a-9787-b7f599358035	999-19-7941	S99986287	None
9	aff8f143-2375-416f-901d-b0e4c73e3e58	aff8f143-2375-416f-901d-b0e4c73e3e58	aff8f143-2375-416f-901d-b0e4c73e3e58	999-35-9084	S99998334	X10377495X

This will give you separate identifier_0, identifier_1, … columns for each Patient.identifier[N] value.

5 Retrieving related data

To retrieve instances of related resources, additional request_params can be added. See Using the FHIR API to Access Data for more information on constructing the parameters for FHIR search interactions.

In the example below, instances of Patient and instances of related Observation resources are requested:

# Instantiate and perform the FHIR search interaction in a single function call
dfs = search.steal_bundles_to_dataframe(
    resource_type='Patient',
    request_params={
        # Get instances of Observation where `Observation.patient` refers to a fetched Patient instance
        "_revinclude": "Observation:patient",

        "_count": 10  # Get 10 instances per page
    },
    num_pages=1,  # Get 1 page (so a total of 10 instances)
)

# `dfs` is a dictionary where the key is the FHIR resource type, and the value is the DataFrame
#
# Split these into separate variables for easy access:
df_patients = dfs['Patient']
df_observations = dfs['Observation']

# Look at the first row of the Observations DataFrame
df_observations.head(1).T

https://api.logicahealth.org/FHIRResearchSynthea/open/Patient?_count=10&_revinclude=Observation:patient

Query & Build DF (Patient):   0%|          | 0/1 [00:00<?, ?it/s]Query & Build DF (Patient): 100%|██████████| 1/1 [00:00<00:00, 26.69it/s]

	0
resourceType	Observation
id	2048
meta_versionId	1
meta_lastUpdated	2023-04-06T20:58:59.000+00:00
meta_source	#aCMN9zUZkrDZoyGR
status	final
category_0_coding_0_system	http://terminology.hl7.org/CodeSystem/observat...
category_0_coding_0_code	vital-signs
category_0_coding_0_display	vital-signs
code_coding_0_system	http://loinc.org
code_coding_0_code	29463-7
code_coding_0_display	Body Weight
code_text	Body Weight
subject_reference	Patient/1989
encounter_reference	Encounter/2045
effectiveDateTime	2012-01-01T22:52:53-05:00
issued	2012-01-01T22:52:53.991-05:00
valueQuantity_value	42.979797
valueQuantity_unit	kg
valueQuantity_system	http://unitsofmeasure.org
valueQuantity_code	kg
component_0_code_coding_0_system	NaN
component_0_code_coding_0_code	NaN
component_0_code_coding_0_display	NaN
component_0_code_text	NaN
component_0_valueQuantity_value	NaN
component_0_valueQuantity_unit	NaN
component_0_valueQuantity_system	NaN
component_0_valueQuantity_code	NaN
component_1_code_coding_0_system	NaN
component_1_code_coding_0_code	NaN
component_1_code_coding_0_display	NaN
component_1_code_text	NaN
component_1_valueQuantity_value	NaN
component_1_valueQuantity_unit	NaN
component_1_valueQuantity_system	NaN
component_1_valueQuantity_code	NaN
valueCodeableConcept_coding_0_system	NaN
valueCodeableConcept_coding_0_code	NaN
valueCodeableConcept_coding_0_display	NaN
valueCodeableConcept_text	NaN

As of April 2023, FHIR-PYrate does not have a good approach to fhir_paths for searches that return instances of multiple FHIR resource types.

To work around this, you can also iterate over all the rows in a DataFrame and request related resources using trade_rows_for_bundles():

df_observations2 = search.trade_rows_for_dataframe(
    df_patients,
    resource_type="Observation",

    # Limit to 10 instances of Observation per patient
    request_params={
        "_count": "10"
    },
    num_pages=1,

    # Load Observations where `Observation.subject` references the instance of Patient
    # identified by `id` in the `df_patients` DataFrame
    df_constraints={"subject": "id"},
    fhir_paths=[
      ("observation_id", "id"),
      ("patient", "subject.reference"),
      ("status", "status"),
      ("code", "code.coding[0].code"),
      ("code_display", "code.coding[0].display"),
      ("value", "valueQuantity.value"),
      ("value_units", "valueQuantity.unit"),
      ("datetime", "effectiveDateTime")
    ]
)

# Look at the first row of the Observations DataFrame
df_observations2.head(15)

Query & Build DF (Observation):   0%|          | 0/10 [00:00<?, ?it/s]                                                                      Query & Build DF (Observation):   0%|          | 0/10 [00:00<?, ?it/s]Query & Build DF (Observation):  10%|█         | 1/10 [00:00<00:01,  5.10it/s]                                                                              Query & Build DF (Observation):  10%|█         | 1/10 [00:00<00:01,  5.10it/s]Query & Build DF (Observation):  20%|██        | 2/10 [00:00<00:01,  5.89it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=1
https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=180

                                                                              Query & Build DF (Observation):  20%|██        | 2/10 [00:00<00:01,  5.89it/s]Query & Build DF (Observation):  30%|███       | 3/10 [00:00<00:01,  4.36it/s]                                                                              Query & Build DF (Observation):  30%|███       | 3/10 [00:00<00:01,  4.36it/s]Query & Build DF (Observation):  40%|████      | 4/10 [00:00<00:01,  4.81it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=293
https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=612

                                                                              Query & Build DF (Observation):  40%|████      | 4/10 [00:00<00:01,  4.81it/s]Query & Build DF (Observation):  50%|█████     | 5/10 [00:00<00:00,  5.11it/s]                                                                              Query & Build DF (Observation):  50%|█████     | 5/10 [00:01<00:00,  5.11it/s]Query & Build DF (Observation):  60%|██████    | 6/10 [00:01<00:00,  5.25it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=931
https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=1250

                                                                              Query & Build DF (Observation):  60%|██████    | 6/10 [00:01<00:00,  5.25it/s]Query & Build DF (Observation):  70%|███████   | 7/10 [00:01<00:00,  5.45it/s]                                                                              Query & Build DF (Observation):  70%|███████   | 7/10 [00:01<00:00,  5.45it/s]Query & Build DF (Observation):  80%|████████  | 8/10 [00:01<00:00,  5.64it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=1806
https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=1989

                                                                              Query & Build DF (Observation):  80%|████████  | 8/10 [00:01<00:00,  5.64it/s]Query & Build DF (Observation):  90%|█████████ | 9/10 [00:01<00:00,  5.65it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=2172

                                                                              Query & Build DF (Observation):  90%|█████████ | 9/10 [00:01<00:00,  5.65it/s]Query & Build DF (Observation): 100%|██████████| 10/10 [00:01<00:00,  5.16it/s]Query & Build DF (Observation): 100%|██████████| 10/10 [00:01<00:00,  5.21it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&subject=2370

	observation_id	patient	status	code	code_display	datetime	value	value_units	id
0	122	Patient/1	final	72166-2	Tobacco smoking status NHIS	2015-12-06T14:48:37-05:00	NaN	NaN	1
1	121	Patient/1	final	55284-4	Blood Pressure	2015-12-06T14:48:37-05:00	NaN	NaN	1
2	61	Patient/1	final	72166-2	Tobacco smoking status NHIS	2011-11-13T14:48:37-05:00	NaN	NaN	1
3	120	Patient/1	final	39156-5	Body Mass Index	2015-12-06T14:48:37-05:00	25.783578	kg/m2	1
4	60	Patient/1	final	2085-9	High Density Lipoprotein Cholesterol	2011-11-13T14:48:37-05:00	79.213320	mg/dL	1
5	69	Patient/1	final	72514-3	Pain severity - 0-10 verbal numeric rating [Sc...	2012-11-18T14:48:37-05:00	0.845428	{score}	1
6	68	Patient/1	final	8302-2	Body Height	2012-11-18T14:48:37-05:00	190.900092	cm	1
7	70	Patient/1	final	29463-7	Body Weight	2012-11-18T14:48:37-05:00	95.644696	kg	1
8	110	Patient/1	final	72166-2	Tobacco smoking status NHIS	2014-11-30T14:48:37-05:00	NaN	NaN	1
9	72	Patient/1	final	55284-4	Blood Pressure	2012-11-18T14:48:37-05:00	NaN	NaN	1
10	247	Patient/180	final	77606-2	Weight-for-length Per age and sex	2018-08-11T23:47:47-04:00	87.806367	%	180
11	246	Patient/180	final	29463-7	Body Weight	2018-08-11T23:47:47-04:00	9.618441	kg	180
12	245	Patient/180	final	72514-3	Pain severity - 0-10 verbal numeric rating [Sc...	2018-08-11T23:47:47-04:00	2.566816	{score}	180
13	244	Patient/180	final	8302-2	Body Height	2018-08-11T23:47:47-04:00	70.813536	cm	180
14	249	Patient/180	final	72166-2	Tobacco smoking status NHIS	2018-08-11T23:47:47-04:00	NaN	NaN	180

Note that this will only display value for instances of Observation that record a value in Observation.valueQuantity. Typically, you would filter by Observation.code and then choose the appropriate data type for Observation.value[x] to import. For example, http://loinc.org|72166-2 is the LOINC for smoking status. To get smoking status records for all patients in df_patients:

df_observations2 = search.trade_rows_for_dataframe(
    df_patients,
    resource_type="Observation",

    # Limit to 10 instances of Observation per patient
    request_params={
        "_count": "10",
        "code": "http://loinc.org|72166-2"
    },
    num_pages=1,

    # Load Observations where `Observation.subject` references the instance of Patient
    # identified by `id` in the `df_patients` DataFrame
    df_constraints={"subject": "id"},
    fhir_paths=[
      ("observation_id", "id"),
      ("patient", "subject.reference"),
      ("status", "status"),
      ("code", "code.coding[0].code"),
      ("code_display", "code.coding[0].display"),
      ("value", "valueCodeableConcept.coding[0].code"),
      ("value_display", "valueCodeableConcept.coding[0].display"),
      ("datetime", "effectiveDateTime")
    ]
)

# Look at the first row of the Observations DataFrame
df_observations2.head(15)

Query & Build DF (Observation):   0%|          | 0/10 [00:00<?, ?it/s]                                                                      Query & Build DF (Observation):   0%|          | 0/10 [00:00<?, ?it/s]Query & Build DF (Observation):  10%|█         | 1/10 [00:00<00:01,  6.20it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=1

                                                                              Query & Build DF (Observation):  10%|█         | 1/10 [00:00<00:01,  6.20it/s]Query & Build DF (Observation):  20%|██        | 2/10 [00:00<00:01,  4.52it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=180

                                                                              Query & Build DF (Observation):  20%|██        | 2/10 [00:00<00:01,  4.52it/s]Query & Build DF (Observation):  30%|███       | 3/10 [00:00<00:01,  4.42it/s]                                                                              Query & Build DF (Observation):  30%|███       | 3/10 [00:00<00:01,  4.42it/s]Query & Build DF (Observation):  40%|████      | 4/10 [00:00<00:01,  5.11it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=293
https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=612

                                                                              Query & Build DF (Observation):  40%|████      | 4/10 [00:00<00:01,  5.11it/s]Query & Build DF (Observation):  50%|█████     | 5/10 [00:00<00:00,  5.62it/s]                                                                              Query & Build DF (Observation):  50%|█████     | 5/10 [00:01<00:00,  5.62it/s]Query & Build DF (Observation):  60%|██████    | 6/10 [00:01<00:00,  5.79it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=931
https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=1250

                                                                              Query & Build DF (Observation):  60%|██████    | 6/10 [00:01<00:00,  5.79it/s]Query & Build DF (Observation):  70%|███████   | 7/10 [00:01<00:00,  6.09it/s]                                                                              Query & Build DF (Observation):  70%|███████   | 7/10 [00:01<00:00,  6.09it/s]Query & Build DF (Observation):  80%|████████  | 8/10 [00:01<00:00,  6.24it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=1806
https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=1989

                                                                              Query & Build DF (Observation):  80%|████████  | 8/10 [00:01<00:00,  6.24it/s]Query & Build DF (Observation):  90%|█████████ | 9/10 [00:01<00:00,  6.08it/s]                                                                              Query & Build DF (Observation):  90%|█████████ | 9/10 [00:01<00:00,  6.08it/s]Query & Build DF (Observation): 100%|██████████| 10/10 [00:01<00:00,  6.23it/s]Query & Build DF (Observation): 100%|██████████| 10/10 [00:01<00:00,  5.75it/s]

https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=2172
https://api.logicahealth.org/FHIRResearchSynthea/open/Observation?_count=10&code=http://loinc.org|72166-2&subject=2370

	observation_id	patient	status	code	code_display	value	value_display	datetime	id
0	122	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2015-12-06T14:48:37-05:00	1
1	61	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2011-11-13T14:48:37-05:00	1
2	110	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2014-11-30T14:48:37-05:00	1
3	84	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2012-11-18T14:48:37-05:00	1
4	95	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2013-11-24T14:48:37-05:00	1
5	136	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2016-12-11T14:48:37-05:00	1
6	163	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2017-12-17T14:48:37-05:00	1
7	20	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2009-11-01T14:48:37-05:00	1
8	42	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2010-11-07T14:48:37-05:00	1
9	176	Patient/1	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2018-12-23T14:48:37-05:00	1
10	249	Patient/180	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2018-08-11T23:47:47-04:00	180
11	235	Patient/180	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2018-05-12T23:47:47-04:00	180
12	264	Patient/180	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2018-11-10T22:47:47-05:00	180
13	200	Patient/180	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2017-12-02T22:47:47-05:00	180
14	288	Patient/180	final	72166-2	Tobacco smoking status NHIS	266919005	Never smoker	2019-05-11T23:47:47-04:00	180

More information about the search interaction used above to filter Observations is here.

6 Additional resources

NIH’s Office of Data Science Strategy has online exercises for converting FHIR-formatted data into tabular format for further analysis. These exercises include implementations in both Python and R.

Note that the Python version of these exercises was created before the FHIR-PYrate library existed, so they use the “writing Python code to manually convert FHIR instances in JSON Format into DataFrames” approach. This approach may still be useful for use cases that are not well-supported by FHIR-PYrate or for integrating FHIR data with data from other web APIs.