Interview with Dr. Martin Sedlmayr, Chair of Medical Informatics, FAU Erlangen-Nürnberg
In the "KDI – Clinical Data Intelligence Project", researchers are trying to consolidate various types of data to make them usable and useful to both medical professionals and scientists. Given the volume of data from so many different sources, this is a tremendous undertaking. In this conversation with MEDICA, Dr. Martin Sedlmayr explains how the project is set up.
Dr. Sedlmayr, what is the goal behind the "KDI – Clinical Data Intelligence" Project?
Martin Sedlmayr: For starters, the project builds on the findings of several earlier projects funded by the German Federal Ministry of Economics (Bundeswirtschaftsministerium). Clinical research depends on data. As long as the data is available in a structured format, everything is fine. However, things get very difficult with free text, because it is nearly impossible to analyze automatically. That is why the precursor project "Cloud4Health" addressed this subject. Beyond free text and structured data, there is also an abundance of other information, for example images that can be assessed with specific algorithms, or genomic data. One of the fundamental goals is to consolidate and combine all of these different data sources and types in one research database. Once you have created this integrated data pool, you can link data that previously could not be merged and reach new conclusions. We wanted to demonstrate this with two "use cases". The first is currently under way in collaboration with the Department of Nephrology at Charité University Hospital in Berlin, which maintains a registry containing thousands of data points collected over 20 years from patients who underwent kidney transplantation. One question, for example, is whether this data can be used to build predictive models for patients. Here in Erlangen, we are addressing breast cancer in collaboration with the departments of gynecology, radiology and human genetics. We hope, for example, that data analysis will help us better manage current mammography measures.
You consolidate data collected in very different formats. Are you planning to offer a specific format in the future to eliminate these differences, or is the database generally meant to support "all" data formats?
Sedlmayr: It would be great if data could be collected in a standardized fashion, but unfortunately the reality is different. You have different types of data producers and different departments, each with their own processes and priorities. A uniform documentation standard is certainly not realistic for some time yet, even though we can clearly see a trend in this direction. There is a growing awareness that data can create added value. With its Medical Informatics Funding Initiative, the German Federal Ministry of Education and Research (BMBF) is investing more than 100 million euros in projects that promote the exchange and use of data.
Ultimately, all parties need to agree on a common language and terminology. Until then, we have to convert data into a standardized format, whatever its type. We need uniform structures in research databases.
Does this happen automatically, or do you need staff to screen the data?
Sedlmayr: Both. Some of it can already be done automatically by computers, for example using text mining to search free-text documents. However, there will always be questions that can only be resolved through human intervention.
Have you already considered what future queries to the database should look like?
Sedlmayr: There are several conceivable approaches, depending on the user. A researcher wants data from as many patients as possible and efficient algorithms to analyze that data in compliance with data protection principles. A physician treating a patient wants an immediate benefit. He or she needs decision support systems that provide concrete recommendations based on existing data. This can be achieved by integrating such functions into existing systems or through new infrastructures. One vision for future use by medical professionals is an app that answers these types of questions.
So in other words, a kind of "Dr. Siri" that can answer questions?
Sedlmayr: That would be one option. However, we are thinking of a somewhat more conservative approach, for instance an app infrastructure, meaning an ecosystem that provides specialized applications for specific problems. Data use will certainly look different for research purposes. Here, we are considering application procedures at data integration centers, in which you specify which data you would like and what exactly you intend to research. After all, medical data can generally only be disclosed if the patient has given his or her consent. That means an oversight body is needed to ensure data protection.
How do you plan to ensure data protection in general? To be able to use the data for many years, and perhaps for purposes as yet unknown, wouldn't you actually need broad patient consent?
Sedlmayr: This is indeed a balancing act. Together with the Funding Initiative, we are currently deliberating what consent might look like, since blanket consent forms are actually not permitted. At the same time, we cannot anticipate today how the data might be used in the future. We have not yet found a solution. What's more, today's consent forms are on paper rather than in digital form, which presents a challenge, because digital consent would be essential to enable automated access in compliance with applicable laws.
What structures have to be created to build this type of large shared database?
Sedlmayr: Interestingly, big data is a prominent topic in medicine, yet we don't actually have large amounts of data in this sense. At most, we see such volumes in DNA databases or in large collections of image data. Usually the data volume is manageable, even at a university medical center. The major problem is heterogeneity. Of the four "V's" of big data, variety is enormous, because we have many different types of data. Veracity is also a challenge, because some data were collected in one context but are now used in another. For example, a diagnosis I document for billing purposes may not be of sufficient quality to support a medical research conclusion; an ICD code alone is not precise enough for medical research.
What are your next project steps?
Sedlmayr: If our extension is approved, the project will run until the end of September. So far, we have tapped into a lot of data and developed many different analysis components. In the coming months, we will combine them into an integrated system to demonstrate the entire use-definition chain.