AI-Assisted Metadata Normalization for Clinical Research: A Human-in-the-Loop Prototype

Dataset Info
Published on: 2026-01-30
Variables: 1

Data Access

Data is available only upon formal request and subject to approval.

Approved users receive a secure institute account and work with the data exclusively in our Trusted Research Environment (TRE) via remote desktop.

Request data (email us)

Reuse & Usage Terms
  • Data is not downloadable (TRE access only).
  • Approved users receive a personal institute account.
  • Tools available: RStudio, Jupyter, Python, Stata, etc.
  • Data resides in your TRE home directory.
  • Re-use/publication per Data Use Agreement (DUA).
  • No redistribution of the data.
Contact us for the DUA template and details.
Description

Clinical study data are often first captured in spreadsheets (Excel/CSV). While convenient, flexible column naming frequently leads to ambiguous or inconsistent variable names and labels. These issues may only become apparent during analysis, causing delays, repeated clarification between clinicians and data scientists, and limited reuse and harmonization across studies.

As a team effort (Eren Tasken, Berkay Uras, Lakshmi Sowjanya Batchu, Marcel Müller), we developed a lightweight, extensible, human-in-the-loop prototype for AI-assisted standardization of clinical research metadata at the variable level. The system proposes improved variable names and labels (demonstrated on the PEACHES and BEARR studies) and can optionally suggest semantic annotations. All recommendations remain under human control: users review, edit, and accept/reject suggestions to produce a finalized mapping. The workflow operates on REDCap-style metadata and lightweight summaries and avoids any modification of live REDCap projects.
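
The review step described above can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the `Suggestion` and `Decision` classes, the `finalize_mapping` function, and the proposed English names are all hypothetical. Only the original variable names come from the PEACHES/BEARR list on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal, Optional


@dataclass
class Suggestion:
    original: str   # variable name as it appears in the data dictionary
    proposed: str   # AI-proposed name (illustrative values, not real output)


@dataclass
class Decision:
    action: Literal["accept", "edit", "reject"]
    edited_name: Optional[str] = None  # required when action == "edit"


def finalize_mapping(suggestions: List[Suggestion],
                     decisions: Dict[str, Decision]) -> Dict[str, str]:
    """Build the final original -> reviewed name mapping.

    Rejected or unreviewed variables keep their original name, so
    nothing changes without an explicit human decision.
    """
    mapping: Dict[str, str] = {}
    for s in suggestions:
        d = decisions.get(s.original)
        if d is None or d.action == "reject":
            mapping[s.original] = s.original
        elif d.action == "edit":
            if not d.edited_name:
                raise ValueError(f"edit for {s.original!r} needs a name")
            mapping[s.original] = d.edited_name
        else:  # "accept"
            mapping[s.original] = s.proposed
    return mapping


# Hypothetical suggestions and reviewer decisions.
suggestions = [
    Suggestion("gew_mut_fb1", "mother_weight_kg"),
    Suggestion("geschw", "sibling_count"),
    Suggestion("u2_ku_2", "u2_head_circ"),
]
decisions = {
    "gew_mut_fb1": Decision("accept"),
    "u2_ku_2": Decision("edit", "u2_head_circumference_cm"),
    "geschw": Decision("reject"),
}
final = finalize_mapping(suggestions, decisions)
print(final)
```

Defaulting to the original name for anything not explicitly accepted or edited keeps the human reviewer, not the model, in control of the final mapping.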

The prototype follows a modular retrieval-augmented generation (RAG) architecture with a FastAPI backend and a Next.js frontend. Users upload a CSV data dictionary; the system generates ranked normalization suggestions using an API-based LLM (initially Grok-4-1-fast-non-reasoning) to enable rapid prototyping without local model infrastructure.
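
A rough sketch of the retrieve-then-prompt flow, under several assumptions: the CSV headers follow the usual REDCap data dictionary export (`Variable / Field Name`, `Field Label`), the knowledge snippets are invented for illustration, and simple token overlap stands in for the embedding-based retrieval that the archive's `rag/` modules suggest. The form names in the sample CSV are hypothetical, and the LLM call itself is omitted.

```python
import csv
import io
import re
from typing import List, Tuple

# Hypothetical knowledge snippets; the real knowledge base is not shown here.
KNOWLEDGE = [
    "gew: Gewicht (weight)",
    "gro: Größe (height)",
    "bd: Blutdruck (blood pressure)",
    "ssw: Schwangerschaftswoche (gestational week)",
]

# Assumed REDCap-style header names; form names are made up.
SAMPLE_CSV = '''"Variable / Field Name","Form Name","Field Type","Field Label"
u2_gew,u2,text,"U2: Gewicht: ..... g"
bddia1_bd_ss,bd_ss,text,"RR diastol M1 in mm Hg"
'''


def parse_dictionary(csv_text: str) -> List[Tuple[str, str]]:
    """Return (variable name, label) pairs from a data dictionary CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(r["Variable / Field Name"], r["Field Label"]) for r in reader]


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens; underscores and punctuation split."""
    return set(re.findall(r"[a-zäöüß0-9]+", text.lower()))


def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank snippets by shared tokens (a stand-in for embedding search)."""
    q = tokens(query)
    scored = [(len(q & tokens(s)), s) for s in KNOWLEDGE]
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
    return [s for _, s in ranked[:k]]


def build_prompt(name: str, label: str) -> str:
    """Assemble the text that would be sent to the LLM."""
    lines = [
        "Suggest a readable English variable name.",
        f"Variable: {name}",
        f"Label: {label}",
        "Context:",
        *[f"- {c}" for c in retrieve(f"{name} {label}")],
    ]
    return "\n".join(lines)


rows = parse_dictionary(SAMPLE_CSV)
prompts = [build_prompt(name, label) for name, label in rows]
print(prompts[0])
```

Including the label in the retrieval query gives the ranking step more tokens to match on than the abbreviated variable name alone.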

In an initial evaluation on 28 variables, the system produced human-readable naming suggestions for 10/28 (35.7%) when only the variable name was provided. When the variable label was included as context, this rose to 22/28 (78.6%), highlighting the importance of informative labels.
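
The proportions can be recomputed directly from the reported counts. The helper name `pct` is illustrative only.

```python
def pct(successes: int, total: int) -> float:
    """Share of successes as a percentage, rounded to one decimal."""
    return round(100 * successes / total, 1)


TOTAL = 28                      # variables in the initial evaluation
name_only = pct(10, TOTAL)      # only the variable name provided
with_label = pct(22, TOTAL)     # variable name plus label context
gain_points = pct(22 - 10, TOTAL)  # difference in percentage points

print(name_only, with_label, gain_points)
```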

We are submitting this work to the 4th Heidelberg Spring Symposium on Medical Informatics (May 13, 2026). The prototype source code and documentation will be shared via TRACE (Trusted Research Access & Collaboration Environment) for download.

The source code is available on GitHub: erentasken/FieldForge

Available Variables (1)
Event: PEACHES | BEARR
Selected Variables
  • gew_mut_fb1
  • gro_mut_fb1
  • geschw
  • sprach_mut
  • rauch_fb2
  • u2_gew
  • u2_gro
  • u2_ku_2
  • dtpc_lactfr
  • dtpc_dm
  • dtpc_glutfr
  • mi_vat
  • ap_vat
  • ca_vat
  • dm_vat_alt
  • stinf_gr_frei
  • st_versuch
  • bddia1_bd_ss
  • ssw2_bd_ss
  • bem_bd_ss
  • eing_bd_ss
  • aend1_bd_ss
  • aend2_bd_ss
  • it_sys_bp
  • ssw3_bd_ss
  • bdsys3_bd_ss
  • bddia3_bd_ss
  • bddia2_bd_ss
  • kontr_bd_ss
Event: Context
Selected Variables
  • Gewicht: ..... kg
  • Größe: ..... cm
  • Wie viele Geschwister hat Ihr Kind?
  • Welche Muttersprache haben Sie?(Bitte für beide Elternteile angeben.) Mutter(Mehrfachnennung möglich)
  • Rauchen Sie zurzeit - wenn auch nur gelegentlich?(In FB01 wurde bei dieser Frage [rauch_fb1] angegeben)
  • U2: Gewicht: ..... g
  • U2: Größe: ..... cm
  • U2: Kopfumfang: .....cm
  • Lactose-freie Ernährung
  • Diät für Diabetes
  • Gluten-freie Ernährung
  • Herzinfarkt
  • Schlaganfall
  • Krebskrankheit
  • Zuckerkrankheit
  • SONSITIGES
  • 4.5 Haben Sie seit der Geburt versucht zu stillen (das erste Anlegen ausgenommen)?
  • RR diastol M1 in mm Hg
  • SSW Messung 2
  • Bemerkungen Hypertension
  • Daten eingegeben von
  • Änderung 1 im Datenbankfeld
  • Änderung 2 im Datenbankfeld
  • SSW Messung 3
  • RR systol M3
  • RR diastol M3
  • RR diastol M3 in mm Hg
  • RR diastol M2 in mm Hg
  • Daten überprüft von

Analysis Code
Viewing: v1 Python Multi-file Archive
Created by eren.tasken · 2026-01-30 23:25
📦 Archive contents
  • FieldForge-main/.env.example (other · 28 bytes)
  • FieldForge-main/.gitignore (other · 128 bytes)
  • FieldForge-main/Dockerfile.backend.dev (other · 484 bytes)
  • FieldForge-main/LICENSE (other · 1067 bytes)
  • FieldForge-main/README.md (documentation · 3096 bytes)
  • FieldForge-main/docker-compose.dev.yml (other · 453 bytes)
  • FieldForge-main/front/.gitignore (other · 480 bytes)
  • FieldForge-main/front/Dockerfile.front.dev (other · 123 bytes)
  • FieldForge-main/front/README.md (documentation · 1450 bytes)
  • FieldForge-main/front/components.json (data · 451 bytes)
  • FieldForge-main/front/eslint.config.mjs (other · 465 bytes)
  • FieldForge-main/front/next.config.ts (other · 156 bytes)
  • FieldForge-main/front/package-lock.json (data · 238693 bytes)
  • FieldForge-main/front/package.json (data · 847 bytes)
  • FieldForge-main/front/postcss.config.mjs (other · 94 bytes)
  • FieldForge-main/front/src/app/favicon.ico (other · 25931 bytes)
  • FieldForge-main/front/src/app/globals.css (other · 4299 bytes)
  • FieldForge-main/front/src/app/layout.tsx (other · 790 bytes)
  • FieldForge-main/front/src/app/page.tsx (other · 23565 bytes)
  • FieldForge-main/front/src/components/ui/alert.tsx (other · 1614 bytes)
  • FieldForge-main/front/src/components/ui/button.tsx (other · 2218 bytes)
  • FieldForge-main/front/src/components/ui/card.tsx (other · 1987 bytes)
  • FieldForge-main/front/src/components/ui/checkbox.tsx (other · 1219 bytes)
  • FieldForge-main/front/src/lib/utils.ts (other · 166 bytes)
  • FieldForge-main/front/tsconfig.json (data · 670 bytes)
  • FieldForge-main/media/demo.gif (other · 1590477 bytes)
  • FieldForge-main/media/sample1.jpeg (other · 104316 bytes)
  • FieldForge-main/media/sample2.jpeg (other · 105328 bytes)
  • FieldForge-main/media/sample3.jpeg (other · 103576 bytes)
  • FieldForge-main/rag/embeddings.py (script · 260 bytes)
  • FieldForge-main/rag/knowledge.py (script · 7998 bytes)
  • FieldForge-main/rag/pipeline.py (script · 353 bytes)
  • FieldForge-main/rag/prompt.py (script · 3976 bytes)
  • FieldForge-main/rag/retriever.py (script · 1088 bytes)
  • FieldForge-main/requirements.txt (dependencies · 253 bytes)
  • FieldForge-main/server.py (script · 2548 bytes)
🧾 README
This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/cli/create-next-app).

## Getting Started

First, run the development server:

```bash
npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev
```

Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.

You can start editing the page by modifying `app/page.tsx`. The page auto-updates as you edit the file.

This project uses [`next/font`](https://nextjs.org/docs/app/building-your-application/optimizing/fonts) to automatically optimize and load [Geist](https://vercel.com/font), a new font family for Vercel.

## Learn More

To learn more about Next.js, take a look at the following resources:

- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.
- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.

You can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome!

## Deploy on Vercel

The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.

Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.
Version Timeline (by language)
PYTHON
Version History (detailed)
  • Version: Global v1 (Python v1), selected
  • Language: Python
  • Type: Multi-file Archive
  • Relation: Initial Implementation
  • Author: eren.tasken
  • Date: 2026-01-30
Contact
Marcel Müller

Project
PEACHES