Turning a Research Idea into a Working Assistant: Medical Reports AI

OSF Study

App Link

Context

Patients often get medical reports that are hard to read. The numbers are scattered, the words are technical, and even doctors explain them differently.
So I wondered — what if an AI could summarize, explain, and answer questions directly from a patient’s own report?

That was the starting idea for Medical Reports AI, a small experiment that turned into a preregistered pilot study published on OSF.

The goal was simple:
👉 Make report reading faster and clearer for patients.
👉 Keep everything auditable, transparent, and ethical.


Hypothesis

If AI can read and summarize medical text, then:

  • Patients will find information faster.
  • They’ll feel more confident before doctor visits.
  • They’ll trust the system if it’s transparent about what it’s doing.

This became the foundation of my preregistered study: testing whether an AI assistant could reduce time-to-find and increase user confidence when reading reports.


Development

I built everything myself over a few weekends.

🔹 Backend (Core Engine)

  • Python + LangChain + OpenAI API → handled text parsing, summarization, and chat-based answers.
  • LangSmith → used for auditing and prompt tracing.
  • OpenAI OCR → extracted text from scanned reports.
  • Backblaze B2 → cloud storage for uploaded files.
  • FastAPI → exposed all endpoints as a clean REST API (a rough sketch of this flow follows the list).
  • RepoCloud → hosted the backend, keeping logs visible for debugging.
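
To make this concrete, here is a minimal sketch of what the upload → summarize flow could look like. It is illustrative only: the route name, the extract_text() placeholder, and the model choice are my assumptions rather than the project's actual code, and the Backblaze B2 upload is only noted in a comment.

```python
# Minimal sketch of the upload -> summarize flow (illustrative, not the real code).
import os
from fastapi import FastAPI, File, UploadFile
from langchain_openai import ChatOpenAI

# LangSmith tracing is switched on through environment variables
# (LANGCHAIN_TRACING_V2, LANGCHAIN_API_KEY), which is what makes prompts auditable.
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")

app = FastAPI(title="Medical Reports AI (sketch)")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # assumed model choice

def extract_text(data: bytes) -> str:
    """Placeholder for the OCR step; the real app sends scanned reports to OpenAI."""
    return data.decode("utf-8", errors="ignore")

@app.post("/reports")
async def upload_report(file: UploadFile = File(...)):
    data = await file.read()
    # The real backend also stores the raw file in Backblaze B2; omitted here.
    text = extract_text(data)
    summary = llm.invoke(
        "Summarize this medical report in plain language for a patient:\n\n" + text
    )
    return {"filename": file.filename, "summary": summary.content}
```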

🔹 Frontend

  • Generated the first version with v0.dev, then manually connected it to my API.
  • Designed a simple upload → chat → summary flow.
  • Each answer cited text from the uploaded report to improve trust (the retrieval sketch below shows how).

It wasn’t fancy, but it worked. The user could upload a report, wait a few seconds, then chat with an AI about their findings — “What’s my hemoglobin level?”, “Is this normal?”, or “What’s abnormal here?”
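
For the chat step, a retrieval layer along these lines is enough to answer a question and return the report snippets the answer relied on. Again, this is a rough sketch under my own assumptions (chunk sizes, a FAISS index, the helper name answer_with_citations), not the project's exact code.

```python
# Sketch: answer a question from the report and return the cited excerpts.
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def answer_with_citations(report_text: str, question: str) -> dict:
    # Split the extracted report text into small chunks and index them.
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=500, chunk_overlap=50
    ).split_text(report_text)
    store = FAISS.from_texts(chunks, OpenAIEmbeddings())

    # Pull the chunks most relevant to the question.
    docs = store.as_retriever(search_kwargs={"k": 3}).invoke(question)
    excerpts = [d.page_content for d in docs]

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    answer = llm.invoke(
        "Answer the patient's question using ONLY the report excerpts below. "
        "If the answer is not there, say so.\n\n"
        "Excerpts:\n" + "\n\n".join(excerpts) + "\n\nQuestion: " + question
    )
    # Returning the excerpts alongside the answer lets the UI show citations.
    return {"answer": answer.content, "citations": excerpts}
```

Returning the retrieved excerpts with every answer is what made the "each answer cites the report" behaviour possible on the frontend.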


Testing & Results

I preregistered the study before collecting data (DOI: 10.17605/OSF.IO/SPJNY).
Then I ran a within-subject pilot with 5 participants (non-medical users).

They did two sessions:

  1. Without AI – manually finding values in sample reports.
  2. With AI – using the assistant to find and interpret them.

Key outcomes:

Metric                  Before   After   Change
Time-to-find (sec)      260.0    98.4    ↓ 62.2%
Ease of finding (1–5)   3.2      4.4     ↑ significant
Confidence (1–5)        3.0      4.0     ↑ significant
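
As a quick sanity check, the headline speed-up follows directly from the before/after means in the table:

```python
# Reported change in mean time-to-find, recomputed from the table values.
before, after = 260.0, 98.4                # seconds
print(f"{(before - after) / before:.1%}")  # -> 62.2% reduction
```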

Post-study surveys showed:

  • Trust: ~4.0
  • Usability: ~3.8
  • Reuse Intention: ~4.4

Users liked “how fast it finds things” and “seeing everything in one place.”
The main complaints were upload latency and “how do I know it’s accurate?”


Reflection

What worked:
✅ Fast search and summarization drastically improved the experience.
✅ The audit trail from LangSmith made debugging and verification easier.
✅ Transparency (showing report text snippets) built trust.

What didn’t:
⚠️ Upload latency — heavy PDF OCR slows the first impression.
⚠️ No explicit confidence score — users wanted reassurance about accuracy.

I learned that building a usable AI tool is less about the model and more about trust design: making sure users feel the system isn't hiding anything.


Takeaway

This project started as a weekend idea but ended up as a preregistered open pilot with real data.
It suggests that small, transparent systems can genuinely improve understanding, even in sensitive fields like healthcare.

The next steps are clear: scale carefully, test with a larger and more diverse group of users, and explore how explainable AI can move from lab demos to real patient tools.

Rakib Jahan