Introduction
According to Chen Chaoyang, chairman of Yimai Sunshine, “AI can be easily applied in grassroots healthcare, but generating AI from there is quite challenging.”
China has over 30,000 rural health clinics, which handle more than 1 billion patient visits a year and play a crucial role in healthcare. However, the data generated by equipment such as CT and ultrasound scanners in these clinics is often of poor quality. High-quality data is essential for developing effective AI models, yet imaging data from rural healthcare facilities currently falls short of the necessary standards, so much of it goes to waste.
The Challenge of Data Quality
Chen noted that collecting meaningful imaging data from rural clinics is particularly difficult. Yimai Sunshine focuses on medical imaging data, including CT, MRI, and ultrasound. It operates 117 imaging service centers across 20 provinces and collaborates with over 1,100 institutions, including rural healthcare facilities.
“China has about 100,000 hospitals, with 30,000 in rural areas, 30,000 in communities, and another 30,000 in urban settings. We believe that to improve future healthcare, we need to digitize these 100,000 medical institutions,” Chen stated.
According to the National Health Commission’s 2024 report, there are 33,334 rural health clinics in China. These institutions bear a heavy burden, with patient visits increasing to 1.38 billion in 2024, up by 70 million from the previous year.
Imaging data is a core component of medical data. Given the large patient population in China, healthcare institutions generate vast amounts of imaging data each year. Ideally, Chinese AI companies should have access to ample high-quality medical data. However, the reality is that much of the data from rural clinics is difficult for AI companies to utilize.
“The imaging data collected from rural clinics differs significantly from that of top-tier hospitals. For instance, CT data collected in Beijing is fundamentally different from that collected in a rural clinic,” Chen explained. When training medical AI models, much of the data from rural clinics is unusable.
The Importance of Data
Data serves as the fuel for AI. China is a competitive market for medical AI, with companies like Philips and United Imaging viewing AI as a key driver for the future of healthcare. At a recent medical device expo, Philips showcased over 50 innovative products, nearly half of which were closely related to AI. The next generation of spectral (energy) CT scanners is expected to generate an explosion of data.
As noted by Yao Maoqing, chairman and CEO of Mifeng Technology, current large models are data-driven. “Garbage In, Garbage Out; if you input garbage data, you get a garbage model out.” Low-quality data also causes a deeper problem: it makes it hard for large model companies to tell whether poor results stem from bad data or from a flawed model, which can lead them to doubt algorithms that are in fact effective.
Currently, there is a one-way flow of AI technology to grassroots healthcare. AI can be adopted by these facilities through cloud services or embedded devices, but they struggle to provide usable data back to AI developers.
Disparities in Healthcare Resources
This issue largely arises from the uneven distribution of healthcare resources. Chen pointed out that healthcare systems in Europe and the U.S. are more homogeneous, with less disparity between rural and urban healthcare levels. Because data collection there is more standardized, the cost of turning data into applications is lower. In China, by contrast, this is a significant pain point.
The Waste of Data
The disparity in imaging data quality between top-tier hospitals and rural health clinics reflects a significant gap in talent. China’s modern healthcare system is relatively young, and given the vast population, the number of healthcare workers per capita is still insufficient.
“In our system, there are several specialties that high-quality medical students are reluctant to choose, including pediatrics and radiology. The primary reason is related to income,” Chen noted. Medical students who complete imaging programs at universities often prefer urban hospitals over rural clinics.
Chen observed that while county-level hospitals may have doctors with PhDs, many radiologists in rural clinics are only diploma graduates.
In radiology, a full examination requires two people: an equipment operator and an imaging doctor. Unlike taking pictures with a simple camera, operating medical equipment requires understanding the clinical question being asked and knowing how to use the device’s complex functions to produce diagnostically useful images.
“For example, after an MRI, if I suspect a patient’s gray matter or blood vessels are problematic, I need to adjust to a TWI scan. If the operator lacks the skills, the information will be insufficient, and the clinician cannot make an accurate diagnosis,” Chen explained. “Qualified imaging doctors are even scarcer. A chest CT can yield 300 images, and without extensive training, they cannot interpret the images effectively.”
Due to the poor data quality in grassroots healthcare, some medical AI companies must collect data themselves. With advancements in communication technologies like 5G, these companies can remotely control devices in distant locations to standardize data collection. This can help align data from rural patients with that from major cities.
“The most expensive part of our AI development is the data construction and computing power,” said Zhu Ruixing, CEO of Shenzhi Technology. He believes that many medical large models are already available, and public data published in high-quality medical journals has been fully utilized. The unique advantage of AI in healthcare will be proprietary data.
“Proprietary data can create long-term barriers and continuously improve model accuracy. Without live data, there are no barriers,” Zhu stated.
The issue of medical data quality is not limited to grassroots healthcare. According to Senyi Intelligent, the core challenge lies in the complexity of data governance and integration. Hospitals often operate numerous independent systems with varying architectures and data standards, hindering interoperability. The lack of standardization in medical terminology and the prevalence of unstructured data, such as medical records, further complicate data cleaning and analysis. Poor data quality, including errors and missing fields, undermines the reliability of AI models and increases development costs.
“Healthcare institutions are extremely fragmented, each acting as an isolated island. Our treatment practices are dispersed across these islands,” Chen said. “I am the owner of my medical data, and even organizing health check data is challenging because sometimes it is collected in Beijing and other times in Shanghai.”
Data quality determines the upper limits of AI. Given China’s large population, AI could be significantly enhanced if the country’s medical data were fully utilized. Conversely, if data collection remains inconsistent and healthcare institutions remain isolated, much of that data will go to waste.