Secondary medical research: opportunities, risks and benefits of secondary data
What is secondary data and why is it so valuable?
In the age of digital transformation, data has become one of the most valuable resources in healthcare. The collection of new data through clinical trials and observational studies is commonplace in science, but the reuse of existing healthcare data to answer new scientific questions is now receiving increasing attention. For good reason: secondary medical research offers enormous potential to understand, evaluate and improve patient care.
Introduction to secondary research
Secondary medical research is becoming increasingly important in everyday clinical practice and research. But what exactly is secondary research? Unlike primary research, which actively collects data in clinical studies, secondary research uses existing data to answer new questions. This can lead to new findings in all areas of medicine. Secondary data analysis is therefore a valuable research approach that expands the state of knowledge in many disciplines.
This research method was developed back in the 1960s and has become much more important in medical research over the last 15 years - particularly due to the availability of electronic health records and big data technologies. Additional interest in secondary data analysis was sparked by the COVID-19 pandemic, as primary data collection largely came to a standstill in 2020. It is now widely used, partly because technological developments and possibilities such as AI (artificial intelligence) and big data make it extremely powerful.
What is secondary data in medicine?
Secondary data is data that was originally collected for another purpose and is now reused for research purposes. Typical examples of secondary medical data are electronic health records (EHR), hospital and health insurance data, cancer registry data and data from clinical studies that have already been completed.
This includes the reuse of routine medical data for research purposes. Such data provides a wealth of information about how healthcare works outside of controlled trial environments. It is extremely valuable because it reflects the day-to-day experiences of patients, nurses and doctors.
Secondary data analyses can be conducted with any data that has already been collected: not only data from everyday medical practice, but also public data such as educational data or data from national surveys. Public datasets of this kind are typically made available to researchers free of charge in anonymized form.
Why is secondary data important?
The secondary use of clinical data in research holds great potential for scientific progress and the improvement of healthcare. This is because secondary data analysis is a cost-effective and efficient method of using the wealth of valuable information contained in existing datasets to answer new research questions.
A major advantage is the reduction in the time required for primary data collection. Secondary data analysis makes it possible to conduct research without having to recruit participants and monitor response rates, for example. But it's about more than just speed and money.
Unlike clinical trials, which are conducted under strict conditions, secondary data often reflects the actual care patients receive - with all its challenges and potential errors. This is known as real-world evidence.
All of this serves to improve healthcare, because by analyzing patterns of diagnosis, treatment and outcomes, researchers can, for example, identify gaps in care and recommend changes.
Areas of application for secondary medical research
In secondary research, large amounts of data make it possible to test scientific hypotheses and improve the quality of care. The sheer volume of patient data sets enables researchers to identify rare side effects and investigate unusual diseases.
Secondary research is already being used in many areas. Health services research examines how therapies are used in practice and uncovers regional or demographic differences. Epidemiology uses it to investigate disease patterns and risk factors. Quality management in hospitals relies on it to improve treatment processes and avoid errors. And with the advent of artificial intelligence (AI), secondary data feeds algorithms that can predict diseases, suggest diagnoses or personalize treatments. With the help of machine learning, patterns can be recognized that would otherwise go unnoticed.
Secondary data forms the basis for predictive models, e.g., for predicting complications in diabetes or heart failure. Initiatives such as OHDSI (Observational Health Data Sciences and Informatics) use the OMOP Common Data Model to train machine learning models in an interoperable and reproducible manner. In practice, this means that AI can compare treatment outcomes, calculate risk profiles, and provide clinical decision support on standardized data sets.
Methods for analyzing secondary data
Secondary research combines traditional statistical methods with new technologies: regression and multivariate analyses are still indispensable for understanding correlations and trends. Machine learning models can predict results and support clinical decisions.
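To make the idea of regression on existing data concrete, here is a minimal sketch of an ordinary least squares fit with a single predictor. The function name `fit_line` and the data values are invented for illustration and do not come from any real dataset. A real analysis would normally use an established statistics package and adjust for confounders.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Invented illustration data: age in years, systolic blood pressure in mmHg.
ages = [30, 40, 50, 60, 70]
systolic = [118, 123, 128, 133, 138]

slope, intercept = fit_line(ages, systolic)
print(f"Estimated change per year of age: {slope:.2f} mmHg")
```

With routine data, a slope like this describes an association in the records at hand. On its own it does not establish a causal effect.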
In addition to classic regression methods, modern analyses rely on deep learning, e.g. for image diagnostics or the analysis of unstructured data such as radiology reports. Natural language processing (NLP) extracts knowledge from free text in EHRs (such as doctors' letters and discharge reports), for example to identify rare side effects. Frameworks such as OHDSI Patient-Level Prediction enable the development of ML models directly on OMOP data - an approach that is increasingly accepted in regulatory studies. Together, these tools transform raw data into actionable knowledge.
Study data from research is also made available for further analysis through data sharing agreements. Researchers can share datasets to perform meta-analyses to corroborate results. In the US, researchers are often required to make anonymized datasets publicly available. The US website Data.gov provides an overview.
Challenges and risks
Despite all its advantages, secondary research also has critical points. It poses challenges for privacy and thus also for the self-determination of the patients whose data is used. Because medical data is sensitive, ethical standards and data protection regulations are particularly important in secondary research. Technical security precautions - such as anonymization and pseudonymization - are necessary to ensure the protection of privacy. Data use requires strict compliance with the GDPR (EU General Data Protection Regulation), which sets out the conditions under which personal health data may be reused for scientific research in the EU. Practical challenges may also arise because the GDPR and national regulations are interpreted differently across EU member states.
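As an illustration of pseudonymization, the following sketch replaces a patient identifier with a keyed hash (HMAC-SHA-256). This is one common technique, not a method prescribed by the GDPR or by any particular platform. The function name `pseudonymize`, the key and the record fields are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for patient_id using HMAC-SHA-256."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key; in practice it would be stored separately from the
# research data, for example by a trusted third party.
key = b"example-key-kept-away-from-the-research-data"

# Invented record; E11 is the ICD-10 code for type 2 diabetes.
record = {"patient_id": "A-10023", "diagnosis_code": "E11"}

research_record = {
    "pseudonym": pseudonymize(record["patient_id"], key),
    "diagnosis_code": record["diagnosis_code"],
}
print(research_record)
```

The same patient always receives the same pseudonym, so records can still be linked across datasets. Because whoever holds the key could recompute the mapping, pseudonymized data is generally still treated as personal data, unlike fully anonymized data.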
Researchers also need to be aware of the limitations of secondary data analysis. The most obvious one is that the data has already been collected. The new research cannot therefore change the characteristics of the primary data, but can only analyze what is available. Data quality is relevant for this: Incomplete, erroneous or inconsistent data could significantly distort the results.
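A basic data quality check can be sketched as follows. It counts missing required fields and implausible values before any analysis starts. The field names, records and plausibility ranges are assumptions chosen for illustration, not clinical reference values.

```python
def quality_report(records, required_fields, plausible_ranges):
    """Count missing required fields and out-of-range values across records."""
    missing = {field: 0 for field in required_fields}
    implausible = {field: 0 for field in plausible_ranges}
    for rec in records:
        for field in required_fields:
            if rec.get(field) in (None, ""):
                missing[field] += 1
        for field, (low, high) in plausible_ranges.items():
            value = rec.get(field)
            if value in (None, ""):
                continue  # missing values are counted above, not here
            if not (low <= value <= high):
                implausible[field] += 1
    return {"missing": missing, "implausible": implausible}


# Invented records with typical problems: a missing age, an empty sex field,
# and values that look like data entry errors.
records = [
    {"age": 54, "sex": "f", "hba1c_percent": 7.1},
    {"age": None, "sex": "m", "hba1c_percent": 6.4},
    {"age": 430, "sex": "", "hba1c_percent": 71.0},
]

report = quality_report(
    records,
    required_fields=["age", "sex"],
    plausible_ranges={"age": (0, 120), "hba1c_percent": (3.0, 20.0)},
)
print(report)
```

A report like this cannot repair the data, but it shows how much of a dataset is affected before conclusions are drawn from it.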
Bias is another risk: if patient groups are over- or under-represented in the data, the results are skewed toward the groups that are well represented. The conclusions of the research may then not apply to the wider population.
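One simple way to spot over- or under-representation is to compare each group's share in the dataset with its share in a reference population. The sketch below assumes such reference shares are available. The group labels and all numbers are invented.

```python
def representation_gaps(sample_counts, population_shares):
    """Return sample share minus population share for each group."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    return {
        group: sample_counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Invented example: a hospital dataset that is dominated by older patients.
sample_counts = {"under_40": 100, "40_to_64": 300, "65_plus": 600}
population_shares = {"under_40": 0.45, "40_to_64": 0.35, "65_plus": 0.20}

gaps = representation_gaps(sample_counts, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.0%} relative to the reference population")
```

A positive value means the group is over-represented in the data, a negative value that it is under-represented. Large gaps suggest that results should be weighted or interpreted with caution.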
The future of secondary research in medicine
Secondary data analysis is an accessible, practical and cost-effective tool for sophisticated research. In the future, it will become increasingly powerful thanks to big data, AI and machine learning. International data sharing initiatives allow researchers to combine data sets across borders, and the resulting international data pools enable larger, more robust analyses.
Advances in genomic medicine are paving the way for more personalized treatments through the integration of genomic data and clinical data. And predictive analytics promises earlier detection of diseases, even before symptoms appear.
The integration of AI in secondary data analysis is shaped by explainable AI and the FAIR principles (findable, accessible, interoperable, reusable). AI models are expected not only to make predictions, but also to explain them - a must for clinical acceptance. International networks such as OHDSI are driving the harmonization of data sets so that ML models can be trained across national borders. In combination with FHIR (Fast Healthcare Interoperability Resources) interfaces, the exchange of real-time data becomes possible, paving the way for adaptive clinical trials.
Conclusion: Why secondary data is a game changer
Secondary research is fundamentally changing medical research: it saves time and costs, provides insights from real-life care and supports personalized therapeutic approaches. In combination with primary research, it enables improved, evidence-based medicine.
We are living in a new era of medical development. However, its opportunities can only be fully exploited if investments are made in data quality, ethical standards and data protection guidelines are adhered to, and public trust is gained and maintained. Because behind every data set is a person.
Data4Life's digital solutions make health data researchable and promote evidence-based medicine.
The contents of this article reflect the current scientific status at the time of publication and were written to the best of our knowledge and belief. However, the article cannot replace medical advice and diagnosis. If you have any questions, please contact your general practitioner.
FAQ
What is the difference between primary and secondary research?
Primary research collects new data in clinical studies, while secondary research uses existing data to answer new questions.
Which data sources are used?
Typical examples of secondary medical data are electronic health records (EHR), hospital and health insurance data, cancer registry data and data from completed clinical trials.
Is the use of secondary data compliant with data protection regulations?
It can be, provided the GDPR (EU General Data Protection Regulation) is strictly complied with. The GDPR sets out the conditions under which personal health data may be reused for scientific research in the EU.
Sources
1. ScienceDirect: "Secondary data analysis is a cost-effective, accessible, and efficient means of using existing data to answer new research questions." [accessed July 28, 2025]
2. JMIR Publications: "First, the clinical data are immediately available. No physical intervention is required..." [accessed July 28, 2025]
3. BMC / Springer Nature: "Secondary use of health records for prediction, detection, and treatment planning..." [accessed July 28, 2025]
4. OUP (Oxford Academic): "From the perspective of EU data protection law, such secondary use requires a legal basis under Article 6(1) and permission under Article 9(2) of the General Data Protection Regulation." [accessed July 28, 2025]
5. Arnold & Porter / EDPB Study (BioSlice Blog, EDPB) [accessed July 28, 2025]