Exercises
Optional: Use Google Colab to evaluate Python, R, and/or Julia code generated by the LLM:
- Select your programming language at Runtime > Change runtime type > [Python3 | R | Julia].
- Paste the code into a cell.
- Click the Run button.
(NB: A Google account is required for Google Colab access.)
Hands-On Exercise: Screening Patient Data for Clinical Trial Eligibility
- Search ClinicalTrials.gov for an active kidney-related trial.
- Copy its Participation Criteria.
- Prompt an LLM to write SQL (or Python, R, Julia, SPSS, Stata, or another language of your choice) to identify eligible patients using the relevant columns in Epic’s Clarity database schema.
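For orientation, here is a minimal sketch of what such a query might look like, run from Python against a tiny in-memory SQLite table. The table `patients`, its columns (`pat_id`, `age`, `egfr`, `on_dialysis`), and the criteria (age 18 or older, eGFR 15-60, not on dialysis) are hypothetical placeholders, not real Clarity tables or criteria from any specific trial; mapping real criteria onto Clarity is the point of the exercise.

```python
import sqlite3

# Hypothetical toy table; real Clarity tables and column names will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (pat_id TEXT PRIMARY KEY, age INTEGER, "
    "egfr REAL, on_dialysis INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [
        ("P1", 45, 42.0, 0),  # meets all illustrative criteria
        ("P2", 16, 50.0, 0),  # excluded: under 18
        ("P3", 67, 75.0, 0),  # excluded: eGFR above range
        ("P4", 58, 30.0, 1),  # excluded: on dialysis
        ("P5", 72, None, 0),  # excluded: NULL eGFR fails BETWEEN
    ],
)

# Illustrative inclusion/exclusion criteria expressed as a WHERE clause.
ELIGIBILITY_SQL = """
SELECT pat_id
FROM patients
WHERE age >= 18
  AND egfr BETWEEN 15 AND 60
  AND on_dialysis = 0
ORDER BY pat_id
"""

eligible = [row[0] for row in conn.execute(ELIGIBILITY_SQL)]
print(eligible)  # ['P1']
```

Running an LLM-generated query against a handful of hand-labeled rows like these, including a NULL, is a quick way to catch criteria that were dropped, inverted, or silently broken by missing data.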
Example Actively Recruiting Trial Participation Criteria
Optional Extensions
Prompt an LLM to create:
- Patient-Screener Web App
- CONSORT Diagram of Patient Eligibility Criteria
- Synthetic Dataset to Test Code
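For the synthetic-dataset and CONSORT extensions, a minimal Python sketch: it generates fake patients with hypothetical fields and applies illustrative criteria one at a time, counting exclusions at each step, which are the numbers a CONSORT-style eligibility diagram reports. The field names, value ranges, and criteria are assumptions, not taken from any real trial.

```python
import random

def make_synthetic_patients(n, seed=0):
    """Generate n fake patient records with hypothetical fields (reproducible via seed)."""
    rng = random.Random(seed)
    return [
        {
            "pat_id": f"SYN{i:04d}",
            "age": rng.randint(10, 90),
            "egfr": round(rng.uniform(5, 120), 1),
            "on_dialysis": rng.random() < 0.1,
        }
        for i in range(n)
    ]

# Illustrative criteria, applied in order as in a CONSORT flow diagram.
# Each entry is (exclusion reason, predicate that returns True to KEEP a patient).
CRITERIA = [
    ("Age < 18", lambda p: p["age"] >= 18),
    ("eGFR outside 15-60", lambda p: 15 <= p["egfr"] <= 60),
    ("On dialysis", lambda p: not p["on_dialysis"]),
]

def consort_counts(patients, criteria):
    """Screen sequentially; return (eligible patients, [(reason, n_excluded), ...])."""
    remaining = list(patients)
    steps = []
    for reason, keep in criteria:
        kept = [p for p in remaining if keep(p)]
        steps.append((reason, len(remaining) - len(kept)))
        remaining = kept
    return remaining, steps

patients = make_synthetic_patients(500)
eligible, steps = consort_counts(patients, CRITERIA)
print(f"Assessed for eligibility: n={len(patients)}")
for reason, n_excluded in steps:
    print(f"  Excluded ({reason}): n={n_excluded}")
print(f"Eligible: n={len(eligible)}")
```

Because the criteria are applied in sequence, a patient failing several criteria is counted only under the first one, which matches how CONSORT boxes are usually filled in.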
Additional Hands-On Exercises
Exercise 1: Code Explanation
Prompt an LLM for a detailed explanation of a piece of code, including the libraries/packages it uses, the data types it requires, and the outputs it produces.
Options:
- Use your own code.
- Search for and copy code from GitHub.
Example Searches:
- language:Python clinical trial
- language:R renal
- language:SAS clinical
- language:Python nephrology
Extension: Prompt an LLM for ways to improve the code, for example to reduce run/compilation time or improve readability.
Exercise 2: Data Reporting, Visualization, and Predictive Modeling
Copy and paste the variables and data descriptions of the clinical trial dataset below into an LLM and prompt it to:
- Create a JAMA-style Table 1 in Markdown.
- Explore and visualize one or more important relationships between the predictors and remission.
- Provide and explain code for building multiple predictive models for remission (remission = 1).
Optional: Download the dataset and evaluate the code in Google Colab.
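As a reference point for the Table 1 prompt, the sketch below builds a minimal Markdown table of median (IQR) by remission status using only the Python standard library. The column names come from the variable list below, but the rows are made-up toy values, and JAMA style details (units, footnotes, reporting of missing data) are not handled.

```python
import statistics

def median_iqr(values):
    """Format non-missing values as 'median (Q1-Q3)' to one decimal place."""
    vals = sorted(v for v in values if v is not None)
    if len(vals) < 2:
        return "NA"  # too few values for quartiles
    q1, med, q3 = statistics.quantiles(vals, n=4, method="inclusive")
    return f"{med:.1f} ({q1:.1f}-{q3:.1f})"

def table1_markdown(rows, variables, group="remission"):
    """Return a Markdown table: one row per (column, label), one column per group level."""
    levels = sorted({r[group] for r in rows})
    header = ["Characteristic"] + [
        f"{group} = {lvl} (n = {sum(r[group] == lvl for r in rows)})" for lvl in levels
    ]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    for col, label in variables:
        cells = [median_iqr([r[col] for r in rows if r[group] == lvl]) for lvl in levels]
        lines.append(f"| {label}, median (IQR) | " + " | ".join(cells) + " |")
    return "\n".join(lines)

# Toy rows using column names from the variable list; the values are made up.
rows = [
    {"remission": 0, "alb": 3.0, "hgb": 11.0},
    {"remission": 0, "alb": 3.4, "hgb": 12.5},
    {"remission": 0, "alb": 3.8, "hgb": None},  # missing values are skipped
    {"remission": 1, "alb": 4.0, "hgb": 13.0},
    {"remission": 1, "alb": 4.2, "hgb": 14.0},
    {"remission": 1, "alb": 4.4, "hgb": 15.0},
]
print(table1_markdown(rows, [("alb", "Albumin"), ("hgb", "Hemoglobin")]))
```

Median (IQR) is used here because several listed variables (for example ast and alt) have very wide ranges; whether a given variable should instead be reported as mean (SD) is a judgment to make when reviewing the LLM's table.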
Variables
days_of_life - Age in days. Numeric. Range: 1207-32356. 1 missing value.
plt - Platelet Count. Numeric. Range: 11-1114. 4 missing values.
mpv - Mean Platelet Volume. Numeric. Range: 5.3-13.5. 21 missing values.
un - Blood Urea Nitrogen. Numeric. Range: 2-118. 53 missing values.
wbc - White Blood Cell Count. Numeric. Range: 0.7-33.5. No missing values.
hgb - Hemoglobin. Numeric. Range: 4.5-18.6. 4 missing values.
hct - Hematocrit. Numeric. Range: 13.7-55.2. 3 missing values.
rbc - Red Blood Cell Count. Numeric. Range: 1.57-7.04. 3 missing values.
mcv - Mean Corpuscular (RBC) Volume. Numeric. Range: 56.5-124. 3 missing values.
mch - Mean Corpuscular (RBC) Hemoglobin. Numeric. Range: 16.7-42.3. 7 missing values.
mchc - Mean Corpuscular (RBC) Hemoglobin Concentration. Numeric. Range: 28.2-38.0. 7 missing values.
rdw - Red Cell Distribution Width. Numeric. Range: 11.3-39.7. 3 missing values.
neut_percent - Percent of Neutrophils in WBC count. Numeric. Range: 17-98.1. No missing values.
lymph_percent - Percent of Lymphocytes in WBC count. Numeric. Range: 1-67.9. No missing values.
mono_percent - Percent of Monocytes in WBC count. Numeric. Range: 0-30.3. No missing values.
eos_percent - Percent of Eosinophils in WBC count. Numeric. Range: 0.5-29.3. 6 missing values.
baso_percent - Percent of Basophils in WBC count. Numeric. Range: 0.2-5.3. 6 missing values.
sod - Sodium. Numeric. Range: 116-151. No missing values.
pot - Potassium. Numeric. Range: 2.6-10.1. 1 missing value.
chlor - Chloride. Numeric. Range: 83-126. No missing values.
co2 - Bicarbonate (CO2). Numeric. Range: 12-40. 5 missing values.
creat - Creatinine. Numeric. Range: 0.2-8.4. No missing values.
gluc - Glucose. Numeric. Range: 41-486. No missing values.
cal - Calcium. Numeric. Range: 6.5-11.8. 1 missing value.
prot - Protein. Numeric. Range: 2.9-10. No missing values.
alb - Albumin. Numeric. Range: 1.2-5.5. No missing values.
ast - Aspartate Transaminase. Numeric. Range: 5-7765. No missing values.
alt - Alanine Transaminase. Numeric. Range: 1-10666. 18 missing values.
alk - Alkaline Phosphatase. Numeric. Range: 13-1938. No missing values.
tbil - Total Bilirubin. Numeric. Range: 0.09-27. No missing values.
active - Active Inflammation despite Thiopurines for > 12 weeks. Numeric. Range: 0-1. No missing values.
remission - Remission of Inflammation after Thiopurines for > 12 weeks. Numeric. Range: 0-1. No missing values.
Citation: Higgins P (2023). medicaldata: Data Package for Medical Datasets. https://higgi13425.github.io/medicaldata/, https://github.com/higgi13425/medicaldata/.
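When comparing the multiple predictive models of remission, ROC AUC is one common metric. The dependency-free sketch below computes it from its pairwise (Mann-Whitney) definition, which makes it handy for sanity-checking LLM-generated evaluation code; in practice scikit-learn's `roc_auc_score` computes the same quantity. The labels and predicted probabilities are made up, and AUC should be computed on held-out data, not the training set.

```python
def roc_auc(y_true, y_score):
    """ROC AUC: probability that a random positive case outscores a random negative.

    Ties in score count as half a win. O(n_pos * n_neg), fine for small datasets.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical held-out remission labels and predicted probabilities from two models.
y = [0, 0, 1, 1, 0, 1]
model_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
model_b = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # uninformative: predicts the same for everyone

print(f"Model A AUC: {roc_auc(y, model_a):.3f}")  # 0.889
print(f"Model B AUC: {roc_auc(y, model_b):.3f}")  # 0.500
```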
Extension: Try using Paper Banana to create a visualization or CONSORT diagram.
Exercise 3: Munging Messy Data
Prompt an LLM to clean and validate the messy_aki dataset in preparation for analysis and modeling.
- Download the messy_aki dataset from Dr. Peter Higgins’ {medicaldata} R package.
- Prompt an LLM to write code to clean the data.
- Evaluate the results and adjust your prompt to address any missed issues.
Data: messy_aki
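The sketch below shows the kind of rule-based cleaning to look for in the LLM's answer: normalizing inconsistent category labels, extracting numbers from entries with stray text or units, and flagging implausible values. The column names (`sex`, `creat`), label spellings, and plausibility bounds are hypothetical; inspect messy_aki itself to see which problems it actually contains. Since {medicaldata} is an R package, the LLM may answer in R, and the same logic applies.

```python
import re

# Hypothetical spellings; inspect messy_aki to see which variants actually occur.
SEX_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}

def clean_sex(value):
    """Map free-text sex entries to 'male'/'female'; return None if unrecognized."""
    if value is None:
        return None
    return SEX_MAP.get(str(value).strip().lower())

def clean_number(value):
    """Extract the first number from a messy entry such as ' 1.3 mg/dL'; None if absent."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    match = re.search(r"-?\d+(?:\.\d+)?", str(value))
    return float(match.group()) if match else None

def flag_implausible(value, low, high):
    """True when a parsed value falls outside [low, high]; missing values are not flagged."""
    return value is not None and not (low <= value <= high)

# Hypothetical raw records with a creatinine column named 'creat'.
raw = [
    {"sex": " Male", "creat": "1.3 mg/dL"},
    {"sex": "F", "creat": "0.9"},
    {"sex": "unknown", "creat": "n/a"},
    {"sex": "FEMALE", "creat": 45},
]
for rec in raw:
    creat = clean_number(rec["creat"])
    # Illustrative plausibility bounds only; choose real bounds with clinical input.
    print(clean_sex(rec["sex"]), creat, flag_implausible(creat, 0.1, 25))
```

Flagging rather than silently dropping implausible values keeps a record of what was changed, which makes it easier to check the LLM's cleaning step by step.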
Optional Extension: Use the LLM to visualize eGFR trends for each patient over time.
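Before eGFR can be plotted it has to be derived from serum creatinine, age, and sex. One widely used formula is the race-free 2021 CKD-EPI creatinine equation, sketched below together with a helper that builds a date-sorted eGFR series per patient. Whether messy_aki contains the needed columns, and in which units, is an assumption to verify (creatinine in mg/dL here); the input tuples are hypothetical, and plotting (for example one line per patient) is left to the LLM.

```python
from collections import defaultdict
from datetime import date

def egfr_ckd_epi_2021(scr_mg_dl, age_years, female):
    """Race-free 2021 CKD-EPI creatinine equation; returns mL/min/1.73 m^2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = 142 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.200 * 0.9938 ** age_years
    return egfr * 1.012 if female else egfr

def egfr_trends(measurements):
    """Group (patient_id, date, scr_mg_dl, age, female) tuples into date-sorted eGFR series."""
    series = defaultdict(list)
    for pid, when, scr, age, female in measurements:
        series[pid].append((when, egfr_ckd_epi_2021(scr, age, female)))
    return {pid: sorted(points) for pid, points in series.items()}

# Hypothetical measurements, deliberately out of date order.
measurements = [
    ("A", date(2023, 1, 5), 1.1, 64, False),
    ("A", date(2023, 1, 2), 0.9, 64, False),
    ("B", date(2023, 2, 1), 0.8, 50, True),
]
for pid, points in egfr_trends(measurements).items():
    print(pid, [(d.isoformat(), round(v, 1)) for d, v in points])
```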