Speaker 1: Healthcare organizations are increasingly using cloud platforms to personalize care, analyze large datasets, enhance research and development, optimize operational costs, and strengthen their security and privacy. And under HIPAA's Privacy Rule, healthcare entities are also tasked with safeguarding protected health information. Google has partnered with many healthcare organizations over the years and has consolidated multiple best practices into a single solution called the Healthcare Data Engine, aka HDE. HDE helps cloud infrastructure and security engineers set up an automated data management layer; it offers pre-configured data maps and pipelines so that data engineers and clinical informaticists spend less time on manual data transformation processes and more time on real-time risk scores and insights optimized for longitudinal patient records; and it has traceability built in, so you know where data came from, how it was processed, and how and why data exists where it does. This is known as provenance.

Setting up your data environment and designing it for repeatable deployments in a highly regulated sector can be challenging. HDE offers a pre-built configuration script that serves as a template to help build out your cloud resources with all the necessary parameters and governance design. It uses Terraform, a familiar open-source way to define and provision data center infrastructure using a declarative configuration language. When projects deploy successfully, the script writes a YAML file with all generated fields specified in the project's config via the generated fields path attribute. These fields are used to generate monitoring rules. Overall, the Healthcare Data Engine implementation automates the following for key dev, staging, and production environments: it creates a Google Cloud folder and multiple cloud projects, provisions the necessary resources for common healthcare data use cases along with the access rules to manage each, establishes a collection of audit logs, enables Cloud Monitoring metrics and alerts, and lets users create visualizations to track resources and security policies. And if an organization uses an on-premises or third-party identity platform, you can synchronize that user directory with Cloud Identity and set up SAML 2.0-based single sign-on, so users sign in once and access Google Cloud and all of their work apps.

Next, from an information harmonization perspective, data engineers have a dedicated, fully managed JupyterLab web app running on Google Cloud AI Platform Notebooks, which enables them to convert HL7v2 messages and proprietary CSV data schemas into FHIR. This notebook interface serves as an integrated development environment, with features such as syntax highlighting, function autocompletion, and version control through integration with Git and a source code repository. And because it is connected to your Google Cloud resources, it can execute distributed data processing pipelines on Dataflow. Dataflow is a fully managed streaming analytics service that minimizes latency, processing time, and cost through autoscaling and batch processing.

This is the JupyterLab IDE UI in HDE. We will open BigQuery here on the side, thanks to the UI plugin. We now see a list of tables inside a BigQuery dataset provisioned through the HDE process. These tables have been pre-loaded with raw CSV data. Let's take a look at them.
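As a quick illustration, here is a minimal Python sketch of how you could preview those pre-loaded tables from within the HDE notebook using the BigQuery client library. The dataset name hde_raw_csv is a hypothetical placeholder for whatever dataset your own HDE deployment provisioned, and the sketch assumes the notebook's default credentials can read it.

    from google.cloud import bigquery

    # Uses the notebook's default project and credentials.
    client = bigquery.Client()

    # Hypothetical name for the HDE-provisioned dataset holding the raw CSV tables.
    dataset_id = "hde_raw_csv"

    # List each table and preview a few of the raw CSV rows it contains.
    for table in client.list_tables(dataset_id):
        print(table.table_id)
        for row in client.list_rows(table, max_results=3):
            print(dict(row))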
Note that our goal is to convert the CSV patient data into a FHIR JSON resource. Next, let's look at our local file system. These are prepackaged sample mapping files that ship with HDE's JupyterLab IDE; this one specifically converts CSV data to FHIR. Now let's switch Git to the Jupyter demo branch and open the following file. In this notebook, we execute these commands in Python and then run this pre-built magic command. When it's done, we inspect the JSON that has been generated. Next, we run a validation test against our FHIR resource and find an error saying that the patient's given name is expected to be an array. So we modify the code responsible for the patient data to emit an array, rerun the magic command, and reload the JSON. Now it validates successfully because it is valid FHIR. After this step, we perform a test mapping, which executes the data transformation code as a Dataflow pipeline; this link brings you to the Dataflow job. And finally, we return to Git, review the changes, and commit them.

Data engineers also need traceability of how data is transformed and created; they need to debug data issues and understand which pipeline produced what data. This is commonly referred to as provenance. Provenance data gets written to Google Cloud Storage by the various pipelines for ingestion, harmonization, or reconciliation, and a cron job using Cloud Scheduler runs a processing pipeline that takes this provenance data and writes it to an operational FHIR store. A Provenance record links the pipeline device to its inputs and outputs: document references and FHIR resources. For example, let's figure out how a sample patient got created in the FHIR store. Looking at the JSON, we see several attributes; an important one is the ID field, which can help us trace the provenance of the patient data. Let's visit the operational FHIR store and look up the provenance record by filtering on the patient ID. Once we locate it, we can investigate the additional fields tied to that record in the Elements tab. The provenance record combines the source information, the details of the pipeline that transformed the source into the target, and the target that was created from that source. For example, I can see there are 19 resources that were created in conjunction with the patient. Some are Organization, Device, Location, or message resources. As for the pipeline itself, it exists as a Device resource under the agent field, in "who". To figure out which HL7v2 message was the source for the data, I can go to the entity field, under "what", where I have a document reference that points to the HL7v2 message. Let me click into it and show you how it's structured. When I click the content attachment URL, I get a pointer to the message in the HL7v2 store, and if I were to do a cURL GET request, I would retrieve the full message. And that is an overview of how provenance works in HDE.

And there you have it: a quick summary of how you can enable infrastructure and data specialists via the Healthcare Data Engine, a predefined configuration that gets you started with the necessary cloud infrastructure and data transformations, with built-in auditability. To get started with some of the underlying technology that powers HDE, you will need a Google Cloud project. If you do not have one, I have included a link to a trial account with free credits in this video's description, along with other helpful resources.
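For anyone who wants to try the provenance lookup described above outside the console UI, here is a minimal Python sketch of the two REST calls involved: searching the operational FHIR store for Provenance records that target a patient, and fetching the source HL7v2 message (the same lookup the cURL GET performs). The location, dataset, and store names, as well as the patient and message IDs, are hypothetical placeholders; substitute the values from your own HDE deployment.

    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    # Authenticate with the notebook's default credentials.
    credentials, project_id = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    session = AuthorizedSession(credentials)

    # Placeholder resource names for an HDE deployment.
    base = (
        f"https://healthcare.googleapis.com/v1/projects/{project_id}"
        "/locations/us-central1/datasets/hde-dataset"
    )

    # 1. Find Provenance records in the operational FHIR store that target the patient.
    provenance_bundle = session.get(
        f"{base}/fhirStores/operational-fhir-store/fhir/Provenance",
        params={"target": "Patient/PATIENT_ID"},
    ).json()

    # 2. Follow the entity "what" document reference back to the source HL7v2 message.
    hl7v2_message = session.get(
        f"{base}/hl7V2Stores/source-hl7v2-store/messages/MESSAGE_ID"
    ).json()

    print(provenance_bundle.get("total"), hl7v2_message.get("messageType"))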
And community, if you found this episode helpful, please subscribe to the channel to get notifications of more healthcare episodes. Cheers.