Bulk Data Access
1 Bulk Data Access
Bulk data access is key for researchers conducting the comprehensive analyses that large-scale studies, such as population health studies and epidemiological research, require. Access to large datasets supports more robust statistical analyses and can reveal patterns and trends that may not be apparent in smaller datasets. Automated bulk data extraction also reduces the manual effort of data collection and entry and helps research projects scale.
For research applications that need standardized bulk data access across FHIR-enabled EHR systems, Bulk FHIR is often a strong choice.
2 Bulk FHIR
The Bulk Data Access standard, also known as “Bulk FHIR”, enables clients to efficiently retrieve large volumes of data from FHIR-based EHRs. This standard is designed to support a wide array of use cases, making it well-suited for research, analytics, and population health management. Clinical researchers may find Bulk FHIR particularly useful for extracting data related to specific patient populations.
The Bulk Data Access standard is part of the SMART ecosystem, whose premise is that "plug and play" healthcare applications enable "best of breed" digital health solutions that benefit application developers, care teams, patients, industry, public health, and others. Bulk Data Access can be combined with SMART on FHIR, typically through the SMART Backend Services authorization profile, to build applications that authenticate, obtain authorization, and retrieve bulk data automatically, without user interaction.
As an HL7® FHIR® standard, Bulk FHIR has been adopted by many electronic health record (EHR) vendors, including Epic, Oracle Health/Cerner, athenahealth, and Veradigm/Allscripts. In January 2023, ONC announced its support for this standard.
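Key Features
- Asynchronous Export: The client sends a single kick-off request, and the server prepares the export in the background instead of holding the connection open. The client then polls a status endpoint until the output files are ready, so large exports do not overload the server.
- Granular Data Retrieval: Clients can export data for all patients, for a defined Group (cohort), or for the whole system, and can narrow the output to specific resource types (for example, patient demographics, observations, or claims data) with the _type parameter, or to recently updated resources with the _since parameter.
- Format: Returns data as NDJSON (Newline Delimited JSON) files, which is the default output format, with one FHIR resource per line so files can be processed line by line.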
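To make the kick-off step concrete, the sketch below builds the request a client would send. The Bulk Data specification defines the Patient/$export, Group/[id]/$export, and system-level $export endpoints, requires the Prefer: respond-async header, and expects Accept: application/fhir+json; a server that accepts the request answers 202 Accepted with a Content-Location header pointing at a status endpoint. The base URL, the group ID, and the function name build_kickoff_request are hypothetical, and the sketch only constructs the request rather than sending it.

```python
from urllib.parse import urlencode


def build_kickoff_request(base_url, group_id=None, resource_types=None, since=None):
    """Return (url, headers) for a Bulk Data kick-off request (hypothetical helper)."""
    base = base_url.rstrip("/")
    # Group-level export for a defined cohort, otherwise a patient-level export.
    path = f"/Group/{group_id}/$export" if group_id else "/Patient/$export"

    params = {}
    if resource_types:
        # _type narrows the export to a comma-separated list of resource types.
        params["_type"] = ",".join(resource_types)
    if since:
        # _since limits output to resources updated after this FHIR instant.
        params["_since"] = since

    url = base + path
    if params:
        # Keep commas and colons readable; everything else is percent-encoded.
        url += "?" + urlencode(params, safe=",:")

    headers = {
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",  # asks the server to process the export asynchronously
    }
    return url, headers
```

A group-level call such as build_kickoff_request("https://ehr.example.org/fhir", group_id="cohort-1", resource_types=["Patient", "Observation"]) targets a single cohort, which is how a research application would typically request data for a specific patient population.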
3 Benefits of Using Bulk FHIR
Without Bulk FHIR, digital health application developers would face significant challenges in handling large-scale patient data transactions. Developing custom solutions to manage these transactions would not only be resource-intensive and time-consuming but also difficult to scale.
Standard FHIR APIs are designed primarily for retrieving data related to individual patients in real time, typically through a single FHIR resource or a FHIR bundle. Consequently, without Bulk FHIR, accessing longitudinal data across a population or cohort would necessitate a series of repeated FHIR API calls for each individual patient, leading to inefficiencies and increased processing overhead. Bulk FHIR addresses these challenges by enabling the efficient retrieval of large datasets, thus facilitating scalable and streamlined data exchanges.
Key benefits of using Bulk FHIR to develop FHIR applications that need access to large amounts of electronic health data include:
3.1 Efficient Data Retrieval
- Scalability: Bulk FHIR is designed for scenarios where large volumes of patient data need to be extracted, processed, or analyzed. It allows for the efficient retrieval of data for multiple patients simultaneously, making it well-suited for population-level analytics and reporting.
- Batch Processing: It supports asynchronous processing, enabling data extraction in batches, which is more efficient than retrieving data individually for each patient. This reduces the burden on servers and minimizes the need for repeated queries.
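The asynchronous pattern above can be sketched as a function that classifies one poll of the status endpoint. Per the Bulk Data specification, the server answers 202 while the export is still running (optionally with X-Progress and Retry-After headers), 200 with a JSON manifest once it completes (listing each output file's type and url, plus a requiresAccessToken flag), and a 4xx or 5xx status, usually with an OperationOutcome body, on failure. The function name and its return shape are hypothetical, and only numeric Retry-After values are handled here.

```python
import json


def interpret_status_response(status_code, headers, body):
    """Classify one poll of a Bulk Data status endpoint (hypothetical helper).

    headers is assumed to be a mapping of response header names to values.
    """
    if status_code == 202:
        # Still in progress; Retry-After (seconds) and X-Progress are optional hints.
        retry_after = headers.get("Retry-After")
        return {
            "state": "in-progress",
            "retry_after": int(retry_after) if retry_after and retry_after.isdigit() else None,
            "progress": headers.get("X-Progress"),
        }
    if status_code == 200:
        # Complete: the body is a JSON manifest listing the NDJSON output files.
        manifest = json.loads(body)
        return {
            "state": "complete",
            "files": [(item["type"], item["url"]) for item in manifest.get("output", [])],
            # If true, file downloads must carry the same bearer token.
            "requires_token": manifest.get("requiresAccessToken", False),
        }
    # Any other status is treated as a failed export.
    return {"state": "error", "status": status_code}
```

A client would call this in a loop, waiting retry_after seconds (or a default interval) between polls, and then download each listed file once the state is "complete".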
3.2 Performance and Speed
- Reduced API Calls: Instead of making separate data requests for each patient, Bulk FHIR allows a client to initiate a single bulk export job for multiple patients. The data is then made available as downloadable files, which is much more efficient for large datasets.
- Optimized Data Transfer: Bulk FHIR delivers data as NDJSON files that can be streamed and processed line by line, avoiding the overhead of paging through many search-result Bundles, and servers can compress files (when supported), reducing the amount of data that needs to be transmitted and processed.
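Because each NDJSON line is a complete FHIR resource, an output file can be consumed as a stream instead of being loaded into memory at once. The sketch below reads NDJSON from a binary stream and optionally decompresses gzip on the fly; the gzipped flag assumes the caller knows the bytes are still compressed (for example, Python's urllib does not decode Content-Encoding: gzip automatically). The function name iter_ndjson is hypothetical.

```python
import gzip
import io
import json


def iter_ndjson(binary_stream, gzipped=False):
    """Yield one parsed FHIR resource per line of an NDJSON stream (hypothetical helper)."""
    if gzipped:
        # Decompress incrementally rather than reading the whole file first.
        binary_stream = gzip.GzipFile(fileobj=binary_stream)
    text = io.TextIOWrapper(binary_stream, encoding="utf-8")
    for line in text:
        if line.strip():  # skip blank lines, such as a trailing newline
            yield json.loads(line)
```

Processing one resource at a time keeps memory use roughly constant regardless of file size, which is what makes NDJSON practical for population-scale exports.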
3.3 Cost-Effectiveness
- Resource Utilization: By reducing the number of API calls and optimizing data retrieval, Bulk FHIR can lower the computational resources required, leading to cost savings on infrastructure.
- Reduced Bandwidth Usage: The optimized data formats and batch processing reduce bandwidth usage, which can result in lower costs for data transfer, especially in cloud-based environments.
3.4 Compliance and Standardization
- Interoperability: Bulk FHIR exports the same FHIR resource definitions and profiles used by the standard FHIR REST API, so applications built with Bulk FHIR remain interoperable with other FHIR-based systems.
- Regulatory Compliance: Using Bulk FHIR supports compliance with regulatory requirements, such as the ONC certification criteria under the 21st Century Cures Act Final Rule, which require certified health IT to provide standardized API access for multiple patients, supporting data sharing for population health management and research.
3.5 Support for Analytics and Research
- Population Health Management: Bulk FHIR is particularly useful for applications focused on population health management, as it allows for the extraction and analysis of data across large patient cohorts.
- Research and Quality Improvement: Researchers can use Bulk FHIR to efficiently access large datasets for research and quality improvement initiatives, including de-identified patient data when the source system or supporting workflow applies appropriate de-identification controls.
3.6 Security and Privacy
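- Authorization-Aware Export: Bulk FHIR supports authorized access to large datasets, commonly through SMART Backend Services and OAuth 2.0. The server limits each export to the access the client has been granted (for example, system-level read scopes), so clients receive only the data they are authorized to see.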
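As a sketch of how SMART Backend Services authorization works: the client registers a public key with the authorization server, then authenticates by signing a short-lived JWT (RS384 or ES384) whose iss and sub are its client ID, whose aud is the token endpoint URL, whose exp is no more than five minutes in the future, and whose jti is unique. It POSTs that assertion in a client_credentials grant to obtain an access token. The function names, client ID, and URLs below are hypothetical, and signing is left to a JWT library, so signed_assertion is assumed to be produced elsewhere. The scope shown uses SMART v1 syntax; SMART v2 servers use forms such as system/*.rs.

```python
import time
import uuid
from urllib.parse import urlencode


def build_client_assertion_claims(client_id, token_url, lifetime_seconds=300):
    """Claims for the JWT a backend client signs to authenticate (hypothetical helper)."""
    now = int(time.time())
    return {
        "iss": client_id,  # issuer and subject are both the registered client ID
        "sub": client_id,
        "aud": token_url,  # the authorization server's token endpoint
        "exp": now + min(lifetime_seconds, 300),  # capped at five minutes ahead
        "jti": str(uuid.uuid4()),  # unique per assertion to prevent replay
    }


def build_token_request_body(signed_assertion, scope="system/*.read"):
    """Form-encoded body for the client_credentials token request (hypothetical helper)."""
    return urlencode({
        "grant_type": "client_credentials",
        "scope": scope,
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": signed_assertion,
    })
```

The access token returned by the token endpoint is then sent as a bearer token on the kick-off request, the status polls, and (when the manifest requires it) the file downloads.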
3.7 Future-Proofing
- Growing Support: As Bulk FHIR is increasingly adopted and supported by health systems and EHR vendors, applications developed using this approach are better positioned to be compatible with future data exchange initiatives and regulations.
In summary, using Bulk FHIR for applications that need to access large amounts of patient data provides significant advantages in terms of efficiency, performance, cost-effectiveness, and support for large-scale analytics and research, making it a preferred choice for developers working with extensive healthcare datasets.
4 Research Application Using Bulk FHIR
An example of a research initiative using Bulk FHIR is the Multi-State EHR-Based Network for Disease Surveillance (MENDS), a population-based distributed data network for national chronic disease surveillance. MENDS originally required institution-specific extract-transform-load (ETL) routines; the MENDS-on-FHIR effort instead used standards-based interoperability resources, including Bulk FHIR, to provide an alternative mechanism for sharing large amounts of research data. The project used the US Core Implementation Guide and Bulk FHIR to transform Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) data into US Core-conformant FHIR resources and expose them through a standards-based ETL pipeline.
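To illustrate the kind of OMOP-to-FHIR transformation described above, the toy sketch below maps a single OMOP CDM person row to a FHIR Patient resource. It is not the mapping MENDS-on-FHIR actually used; it assumes the OMOP person columns person_id, gender_concept_id, year_of_birth, month_of_birth, and day_of_birth, and the OMOP standard gender concepts 8507 (male) and 8532 (female). The output is not fully US Core-conformant, since US Core Patient also requires elements such as identifier and name. The function name is hypothetical.

```python
# OMOP standard gender concept IDs mapped to FHIR administrative-gender codes.
OMOP_GENDER_TO_FHIR = {8507: "male", 8532: "female"}


def omop_person_to_fhir_patient(person):
    """Map a minimal OMOP CDM person row (a dict) to a FHIR Patient (hypothetical helper)."""
    patient = {
        "resourceType": "Patient",
        "id": str(person["person_id"]),
        # Unmapped or missing concepts fall back to the FHIR code "unknown".
        "gender": OMOP_GENDER_TO_FHIR.get(person.get("gender_concept_id"), "unknown"),
    }
    year = person.get("year_of_birth")
    if year:
        month, day = person.get("month_of_birth"), person.get("day_of_birth")
        # FHIR dates may be partial: YYYY, YYYY-MM, or YYYY-MM-DD.
        if month and day:
            patient["birthDate"] = f"{year:04d}-{month:02d}-{day:02d}"
        elif month:
            patient["birthDate"] = f"{year:04d}-{month:02d}"
        else:
            patient["birthDate"] = f"{year:04d}"
    return patient
```

Once OMOP records are available as FHIR resources like this, a FHIR server can serve them through the standard Bulk FHIR export endpoints, which is the role the standards-based pipeline plays in place of institution-specific ETL.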
5 The Future of Bulk Data Access
The Argonaut Project is currently working on future Bulk FHIR features, including bulk submit and bulk publish. Bulk submit will allow data providers to push pre-coordinated datasets to interested clients. Bulk publish will allow data providers to publish static datasets using Bulk FHIR APIs and formats. To follow their progress, see the Happy Bulk page on the HL7 Confluence site.
6 References
For more information, see this overview presentation and the HL7 specification.