Interview: eClinical's Nathan Johnson on digital transformation and why he loves his current role


Nathan Johnson has 20 years’ experience in clinical research as an innovator and programmer with expertise in statistical analysis and reporting, SAS programming, standards development, and data management.

He is passionate about reshaping clinical trials through digital transformation, intelligent technology, and increased automation. OSP was lucky to catch up with him at this year’s DIA Global in Boston for a chat about his role, his hopes for solving integration and data flow automation problems, and the company’s Data Lakehouse.

Could you tell us a little about your background?

I'm currently the VP of digital innovation at eClinical Solutions; I've been with eClinical for three years now. My background and training are in biostatistics, and most of my career was spent in statistical programming. So, I’ve spent time in pharma and in the CRO space doing analysis and reporting, and a lot of my focus has been on automating processes.

I’ve spent time building systems and programmes that streamline the data flow through that whole process of data management, statistical programming, and reporting. I ended up at eClinical, found them through some mutual connections, and I was really excited about the platform they had built, because they were trying to answer a lot of the questions that had been in the back of my mind for years and years: how do we solve some of these integration and data flow automation problems? I had found a company with the right vision, so there was alignment in where we wanted to go, along with the technical expertise to be able to deliver it.

It was really exciting to me, and it's been a fantastic fit, absolutely.

What were some of the questions that were at the back of your mind?

There are a lot of steps, as data flows through a clinical trial, where significant manual effort is involved. In data management there is the work of reviewing data and raising queries; my realm is biostats and stat programming. There was tremendous effort put into data transformation, which was done manually, and then the generation of tables, listings, and figures. All of that programming is still hands on keyboard, writing programmes to perform it.

Because of this standardisation of data, my goal has been to find better ways to do this other than just adding more resources and finding more programmers. So some of those questions, ‘can we transform data in a more automated way, can we integrate more metadata to drive some of these processes?’, those are the things that were in the back of my mind. When I saw the Elluminate product, they were trying to achieve the same goals, and I was able to bring some subject matter expertise to eClinical around stat programming.

So do you feel your questions were heard, and that your ideas have been taken on board and are progressing?

Absolutely, yes. One of the things that I spearheaded after starting was our statistical computing environment, where, within the same platform, we brought in some of the analytic tools so the SAS programmers can manage programs and deploy them on data that already lives within this cloud-hosted platform. I feel that that's a huge step forward for the company in what our product offers: not just doing the data management activities and the medical review activities, but now being able to support all the biostatisticians and stat programmers, to be full-on end to end.

With clinical trials, in terms of the data flow, you obviously want this to highlight any issues, problems and target areas as it goes through, and in your findings discover whether problems have been in one specific area or are just little things that crop up that you need to fix.

Can you highlight some of the issues?

There are the well-known ones: some of the data management activities, the data cleanliness.

Yes, cleaning patient data and data reconciliation, especially now that more data is external; it's not all captured in electronic data capture (EDC). So, when you have multiple vendors providing different sets of data, reconciling that is one issue. On the stat programming side, there are still a lot of challenges, even within data transformation.

We're still seeing papers addressing the same problems, some that we addressed 10 years ago. Maybe the technology has changed and we're now trying to solve it in-house instead of in SAS, but a lot are the same problems we saw a decade ago. And for me, my vision was: can we do better? If we're still asking the same questions, we haven't solved it yet.

So we're trying to rethink the way that we approach some of those problems so that we can solve them in a different way.


From what I've gathered, the buzz theme at DIA this year is machine learning (ML) and artificial intelligence (AI). Have you found the same?

It's part of popular culture now with the release of ChatGPT, and that's only accelerated it. It was already a bit of a buzzword in the industry before, but now the general consumption of those ideas is so widespread that it's accelerating at quite a rate. There are a lot of very strong use cases for those techniques within clinical trials, and the approach that we're taking with AI and ML adoption is that it's embedded in everything that we do. So, when we think of the blueprint for a clinical trial data system and what it needs to look like today, most often you hear about where people are going to insert AI and ML: figuring out how they are going to use it to address single problems. We want to think of it as something that underlies everything that we do; the entire approach can be driven by advanced analytics.

What's considered strictly artificial intelligence might be machine learning; it might be natural language processing or other techniques, but that whole umbrella undergirds everything that we do. We're looking for embedded AI rather than inserting it in various places. From our perspective it's a philosophy of adopting this technology, this technological solution, as opposed to quick wins of ‘can we insert a chatbot?’

How exactly does it work with what you’re doing?

The technology behind it is large language models. Language is a pattern, and computers are just very good at recognising and replicating patterns, so in the clinical trials arena there are lots of ways in which patterns can be identified and signals detected from those patterns. With large language models, there is a definite place within clinical trials to perhaps provide more market and scientific background for certain information. So, from the clinical trials data system perspective, a data software perspective, being able to insert those things to answer questions for end users is very useful. For example, if a user sees a signal, perhaps a safety signal in their data, and they want to know more about how that particular marker behaves in other areas outside of that trial, they can very quickly ask a question and get more information from a larger source of data powered by a large language model. It's that kind of interaction, a human asking a question and the machine providing realistic responses in realistic human language, that answers it for them.
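The question-and-answer flow described here can be sketched in a few lines. Everything below is hypothetical: the function names are invented for illustration, and the model call is replaced by a canned stub so the example runs without any real LLM or API.

```python
# Minimal sketch (hypothetical names) of routing a user's question about a
# safety signal to a language-model backend. The model call is stubbed with
# a canned response so the flow is runnable without any real service.

def build_prompt(marker: str, observation: str) -> str:
    """Combine the in-trial observation with a request for wider context."""
    return (
        f"In our trial, {marker} showed: {observation}. "
        f"How does {marker} typically behave in other studies?"
    )

def stub_llm(prompt: str) -> str:
    """Stand-in for a real large-language-model call."""
    if "ALT" in prompt:
        return "Elevated ALT is a common, usually transient liver signal."
    return "No background information available."

def ask_about_signal(marker: str, observation: str) -> str:
    """End-user entry point: question in, human-language answer out."""
    return stub_llm(build_prompt(marker, observation))

answer = ask_about_signal("ALT", "a sustained rise above 3x ULN")
print(answer)  # Elevated ALT is a common, usually transient liver signal.
```

In a production system the stub would be a call to a hosted model, and (per the oversight concerns raised below) its answers would be treated as background for a human reviewer, not as decisions.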

I must ask: how do you trust where machines get their answers from?

That's something that large language models struggle with, because it's not always clear what sources they're trained on. In clinical trials, because of the way the field is regulated, you need to be cautious about how you're using that. So how do you balance using artificial intelligence with our pharma QA/QC (quality assurance and quality control) processes? I don't see it replacing the work that humans are doing to generate those standardised reports, but augmenting it, much the same way that computers augmented what we used to do with pen and paper, and the way that computer algorithms have augmented human-involved workflows. Now the AI components will just augment what humans are already doing. However, you still need the oversight; you don't want the model making all your decisions relative to a trial.

Could you explain how your data is presented back to the client and how they know it has all been collected and stored safely?

Our system is cloud-based, so this is all hosted in the cloud, and the architecture that we're using is called the Data Lakehouse. Two divergent technologies have dominated the industry over the past few decades. One is the data warehouse: think of it as a structured database, where data sits in a tabular structure and is stored and regulated there, and there are very good tools for effectively querying it and getting results back. The second type of data architecture is a data lake, which supports more of the unstructured data that clinical trials generate now: things that don't lend themselves to being entered into a table, such as documents or patient records that aren't necessarily tabular. A data warehouse is not necessarily the right technology for those, though there are some ways to get insights out of them. In clinical trials, because we have both, we want that in one platform, and the architecture we've adopted is this Lakehouse concept. It can take out data silos, increase performance, and has the flexibility to support modern clinical trials as well as AI and ML.
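The warehouse-plus-lake idea can be illustrated with a toy sketch: one object holding a structured, SQL-queryable table alongside a store of unstructured documents. This is only a standard-library illustration of the concept, not eClinical's implementation; a real lakehouse layers managed tables and indexing over cloud object storage.

```python
# Toy illustration of the "lakehouse" concept: structured (tabular,
# SQL-queryable) and unstructured (document) data living in one store.
# Standard library only; names and schema are invented for illustration.

import sqlite3

class ToyLakehouse:
    def __init__(self):
        # Structured side: an in-memory SQL table, like a data warehouse.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE labs (subject TEXT, test TEXT, value REAL)")
        # Unstructured side: raw documents keyed by name, like a data lake.
        self.documents = {}

    def add_row(self, subject, test, value):
        self.db.execute("INSERT INTO labs VALUES (?, ?, ?)", (subject, test, value))

    def add_document(self, name, text):
        self.documents[name] = text

    def query(self, sql):
        return self.db.execute(sql).fetchall()

lake = ToyLakehouse()
lake.add_row("S001", "ALT", 62.0)
lake.add_row("S002", "ALT", 35.0)
lake.add_document("protocol.txt", "Subjects with ALT > 60 require follow-up.")

print(lake.query("SELECT subject FROM labs WHERE value > 60"))  # [('S001',)]
```

The point of the design is that both kinds of data share one platform, so tabular queries and document-based work (including AI/ML over unstructured text) don't require separate silos.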