Integrating Radiology into an AI System for Physician Decision Making

Over time, physicians gather a vast amount of patient data, such as radiographic images, medical procedures, treatments, and insurance coverage. A Case-Based Reasoning (CBR) system is developed to organize and retrieve such data to aid physicians in diagnosing and formulating treatment plans for patients with similar traits. This research extends the authors' earlier work, which did not consider radiographic images, even though radiology plays a vital role in diagnosing and treating patients. A new algorithm is created to retrieve and organize information from the CBR system, adding radiographic images to the patient's case data. The system is tested, and the performance results show that its acceptance rate is higher than that of the earlier system.


Introduction
The medical field is ever-evolving, fueled by technological innovation. Modern-day medicine has evolved to a level where radiology is now the backbone of decision making, as it provides physicians with images that support or rule out various diagnoses. As technology has evolved, radiology has become much more efficient at detecting abnormalities, giving clinicians unprecedented information about their patients (Future, 2010; Reasons, 2020). Case-Based Reasoning (CBR) is an Artificial Intelligence (AI) methodology that uses previous experience (i.e., past solved cases) to solve similar new cases (i.e., new problems) (Aamodt & Plaza, 1994; Cox et al., 2018; Holt et al., 2005; Prasad, 2005). The following six-phase sequence depicts the activities or components of the CBR process, as shown in Figure 1.
Phase 1: The user submits a new problem, e.g., a new patient arrives at a hospital for treatment.
Phase 2: The system selects an existing case (e.g., a case of an existing patient), which is "close" to the current problem (i.e., current patient's medical scenario). The closeness is determined as defined in the system's algorithm.

Figure 1. Visual representation of the CBR process
Our study did not find any existing CBR system that uses all of the following attributes of a patient in its decision-making process for treating a new patient: radiographic images, symptoms, diseases (comorbid conditions), treatments, age, time gap between events, dates, insurance coverage, gender, and race. The current research extends our earlier research (Paruchuri & Granville, 2020) by adding a patient's radiographic images to the case attributes. We will refer to the earlier system as "system-A" for simplicity. A primary drawback of system-A is that it did not use or consider radiographic images of patients, thereby overlooking an essential component of modern-day medicine. The current system is developed to overcome this drawback. When integrating radiographic images into a CBR system, the following new challenges must be solved:


 How to represent and index radiographic images;
 How to integrate radiographic images into the recommendation process;
 How radiographic images will affect the overall performance of the system.
The rest of the paper is organized as follows: Section 2 presents a review of CBR in radiology. Section 3 presents how the data is represented in the system. Section 4 presents further details of the system. Section 5 presents the algorithm. Performance results are presented in Section 6. Finally, Section 7 provides the conclusion.

Case-Based Reasoning Using Radiographic Images
Substantial research has been conducted on the use of CBR in medicine. A recent survey of this subject is available in our earlier research (Paruchuri & Granville, 2020). Several researchers employ radiographic images in their CBR systems. However, none used all of the attributes included in our current CBR system, namely radiographic images, symptoms, diseases (comorbid conditions), treatments, age, time gap between events, dates, insurance coverage, gender, and race. System-A from our earlier work (Paruchuri & Granville, 2020) uses all these attributes except the radiographic images. Macura & Macura (1995) presented a case-based retrieval system that uses radiographic images; it is probably the earliest system to apply CBR to radiographic images. Later, Grimnes & Aamodt (1996) presented a multi-layer CBR system for medical image interpretation. Perner (1999) presented a CBR system to identify a degenerative brain disease by using CT images of a patient's brain. Wilson & O'Sullivan (2008) presented a review of medical imagery applications in CBR systems. A CBR system to provide treatment planning in brain cancer radiotherapy was presented by Jagannathan et al. (2010). Khussainova et al. (2015) designed a CBR application to help medical staff plan radiotherapy treatment for brain cancer patients. CBR is also used to segment images of kidneys deformed by nephroblastoma (Marieusing et al., 2018). Sekar et al. (2018) presented a CBR approach for decision support in breast cancer management.

Representation of Patients' Data
Representation of patients' data that includes radiographic images is an important issue. A date that represents an event is stored using 8 digits, in which the first 2 digits represent the month, the next 2 digits represent the day, and the last 4 digits represent the year. Integers ranging from 0 to 150 represent the age in years; any months in the age cause it to be rounded to the closest year. A time gap is the time between two events, such as the interval between two consecutive visits to the doctor. A time gap is represented as a decimal number in which the integer part represents years and the decimal part (at most 2 digits) represents months. The patient's gender is represented as follows: 0 represents male, 1 represents female, and 2 represents other. Integers from 151 to 10,000 represent the symptoms, where each integer represents a specific symptom; e.g., in this system 151 represents fever, 152 represents headache, and so on. Each disease is represented by a specific integer, and these integers range from 10,001 to 100,000. Integers ranging from 1 to 1,000 in the insurance coverage field represent insurance coverage plans, where each coverage plan is represented by a specific number. Integers from 100,001 to 200,000 represent the treatments, where each integer represents a recommended treatment. Furthermore, each patient is assigned a unique integer that represents that patient. The patient field uses integers from 1 to 1,000,000, thereby representing a total of 1,000,000 different patients.
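The integer encoding above can be sketched in code. The following is a minimal illustration only; the helper names (`encode_date`, `encode_time_gap`, `field_kind`) are ours, not the paper's. Associate-item ranges such as insurance coverage (1 to 1,000) overlap the ranges below and are distinguished by the field they occupy, so they are omitted from the classifier.

```python
GENDER = {"male": 0, "female": 1, "other": 2}

def encode_date(month: int, day: int, year: int) -> str:
    """Pack a date as the 8-digit MMDDYYYY string described in Section 3."""
    return f"{month:02d}{day:02d}{year:04d}"

def encode_time_gap(years: int, months: int) -> float:
    """Years in the integer part, months (at most 2 digits) in the decimal part."""
    return years + months / 100.0

def field_kind(code: int) -> str:
    """Classify an integer by the non-overlapping primary/age ranges of Section 3."""
    if 0 <= code <= 150:
        return "age"
    if 151 <= code <= 10_000:
        return "symptom"
    if 10_001 <= code <= 100_000:
        return "disease"
    if 100_001 <= code <= 200_000:
        return "treatment"
    raise ValueError(f"code {code} is outside the defined ranges")
```

For example, `encode_date(3, 7, 2020)` yields `"03072020"`, and a gap of one year and three months is encoded as 1.03.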
The following approach addresses how to convert radiographic images into a form suitable for further processing, such as comparing two radiographic images and indexing those images. In this research, radiographic images are considered as grayscale images only, with each image consisting of 512 × 512 = 262,144 pixels. A pixel, short for "picture element," is a tiny block of the image that records the quantity of gray intensity for that block. A widely used pixel format is the byte image, in which each pixel is stored as an 8-bit binary integer, giving a range of 256 possible values from the decimal value 0 (all 8 bits are 0s) to 255 (all 8 bits are 1s). Each value between 0 and 255 represents a different shade of gray; usually, black is represented by the value 0 and white by 255 (Eck, 2018; Pixel, 2020). The byte image format is used in this research, and all images are of the same width, height, and contrast. Each image is represented as a vector of 512 × 512 elements.
If a radiographic image made up of M × N pixels (i.e., M pixels in each row, with N such rows) is stored in raw format (i.e., without any compression, header, etc.), then the file size of that image is M × N × 8 bits. In this research, we use the cityblock distance (Chen & Chu, 2005) to measure the difference between images. If two images are represented as vectors x and y, each of size k, then the cityblock distance between them is calculated as

d(x, y) = Σ_{i=1}^{k} |x_i − y_i|.

Cityblock distance is used in this research because, compared to other methods, it requires fewer steps to calculate the difference between two images (Chalom et al., 2013; Wang et al., 2005). This is especially important because the system, in general, performs a large number of computations each time it is activated, as can be seen from the algorithm provided in Section 5. Therefore, efficiency is crucial to maintaining the timely output of data.
In this research, we define the "difference" between two images. If two images are represented as vectors x and y, respectively, and each vector is of size k, then the difference between those two images is

difference(x, y) = (Σ_{i=1}^{k} |x_i − y_i|) / M,

where k = 512 × 512 and M = 512 × 512 × 255, the maximum possible cityblock distance between two byte images.
The difference is a number between 0 and 1. The following images (Radiopaedia, 2020) illustrate this principle:

Figure 2. The difference between images A and B is 0 because they are the same.

Figure 3. The difference between images C and D is > 0 because they are not the same.

In this system, a radiographic image is collected either before or after treatment. The radiographic image collected before treatment is called the pre-treatment image, and the radiographic image collected after treatment is called the post-treatment image. Currently, the system can handle up to one pre-treatment image and up to one post-treatment image for a treatment (or a set of multiple treatments) performed in a day. In this system, pre-treatment images are indexed by -1, and post-treatment images are indexed by -2.
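The cityblock distance and the normalized difference defined above can be illustrated with a short, pure-Python sketch. Tiny 2 × 2 "images" stand in for the real 512 × 512 ones, and the function names are ours, not the paper's.

```python
def cityblock(x, y):
    """City-block (L1) distance: sum of absolute pixel-wise differences."""
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(x, y))

def difference(x, y, max_intensity=255):
    """Normalized difference in [0, 1]: distance divided by M = k * 255,
    the largest possible city-block distance between two byte images."""
    return cityblock(x, y) / (len(x) * max_intensity)

# Tiny 2 x 2 grayscale "images" flattened to vectors (values 0-255).
a = [0, 255, 128, 64]
b = [0, 255, 128, 64]   # identical to a, so the difference is 0
c = [255, 0, 128, 64]   # first two pixels inverted
```

Here `difference(a, b)` is 0 (the Figure 2 situation) and `difference(a, c)` is 0.5, since the distance 510 is half of M = 4 × 255 = 1020 (the Figure 3 situation).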

Further Details of the System
Patient information is divided into primary and associate items. As the names imply, primary items play a vital role in the system's decision-making process, while the associate items play a supporting role in fine-tuning that process.
 A primary item is a pre-treatment image, post-treatment image, treatment, disease, symptom, or time gap. Each state consists of a single primary item or a set of multiple primary items.
 If multiple primary items, all of which are of the same type, occur on the same date, then the primary item of that state is the set of those multiple primary items. For example, it is common to have multiple procedures performed on the same date.
 If multiple primary items of different types (for example, one or more treatments and one or more symptoms) occur on the same date, then the states (made up of these primary items) are created in the following order: a. The state containing the symptom (or set of multiple symptoms) precedes the state containing the disease (or set of multiple diseases).
b. The state containing the disease (or set of multiple diseases) precedes the state containing the treatment (or set of multiple treatments), and this treatment includes the pre-treatment image, "if any," and/or the post-treatment image, "if any." Note that "if any" is specified because radiographic images may not be taken for some treatments. For example, if a patient exhibits symptoms of pneumonia, a chest x-ray is often used to confirm the diagnosis before beginning treatment. However, if the patient is asymptomatic after treatment, a post-treatment chest x-ray is not always required.
 No state will be made up of pre-treatment image(s) and/or post-treatment image(s) alone. If a state contains a treatment, and a pre-treatment image and/or a post-treatment image is taken as part of that treatment, then that state will be transformed to a set made up of the treatment and its image(s).
 Associate items represent different fields of a state. Therefore, the numbers used to represent various categories of associate items may overlap.
If the primary item (of a state) is a set of multiple symptoms or a set of multiple diseases, then two such primary items are defined as matched if those two sets are equal. If the primary item is a set of multiple treatments without any pre-treatment or post-treatment image, then two such primary items are defined as matched if those two sets are equal. If the primary item is a set of treatment(s) containing a pre-treatment image and/or a post-treatment image, then two such primary items are defined as matched subject to the following conditions:
 The two sets are equal after those images are removed from both sets;
 If both sets contain pre-treatment images, then the difference (as defined in Section 3) between those two pre-treatment images is less than or equal to the threshold-value (see the algorithm in Section 5 for details of the threshold-value); and
 If both sets contain post-treatment images, then the difference (as defined in Section 3) between those two post-treatment images is less than or equal to the threshold-value.
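The matching conditions for treatment sets with images can be sketched as follows. This is a hedged illustration: the function names are ours, a state is simplified to a (treatment set, pre-image, post-image) triple, and `diff` stands in for the normalized image difference of Section 3.

```python
def images_match(img_a, img_b, threshold, diff):
    """The conditions above constrain only the case where BOTH states carry
    an image of the same kind; otherwise the sketch treats it as matched."""
    if img_a is None or img_b is None:
        return True
    return diff(img_a, img_b) <= threshold

def treatments_match(state_a, state_b, threshold, diff):
    """state_* = (treatment_set, pre_image, post_image). Matched when the
    treatment sets are equal and both image comparisons stay within the
    threshold-value."""
    treatments_a, pre_a, post_a = state_a
    treatments_b, pre_b, post_b = state_b
    return (treatments_a == treatments_b
            and images_match(pre_a, pre_b, threshold, diff)
            and images_match(post_a, post_b, threshold, diff))
```

Treating a missing image as vacuously matched is our reading of the conditions, which only apply "if both sets contain" the image in question.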
If a primary or associate item of a state is not a set and is not an image, then two such items are defined as matched if they are numerically equal in all of their corresponding fields. The system's algorithm and its discussion are provided in the next section.

Threshold-Value ← 0.0;

/* DETERMINE THE COMPETING-SET */

3. Label the i most recent states in the current patient's case as Si, ..., S1, respectively. In this sequence, Si is the least recent and S1 is the most recent among all these states;

Details of the Algorithm
Case ← The case containing the following sequence of states: Si, …, S1;
If time gap is not a member of the Item-Set, then remove from Case those states in which time gap is the primary item, and call the resulting case Case;
j ← Length of Case;
If j < 2, then exit the whole process; else:
Competing-Set ← The set of all existing sub-cases in the system for which all of the following conditions are met:
(a) The length of the sub-case is (j + 1) or more;
(b) The primary items that are present in the Item-Set and in the first j adjacent states of that sub-case respectively match, per the definition provided in Section 4, the primary items of the corresponding states of Case;
(c) The associate item(s) that are present in the Item-Set and in the first j adjacent states of that sub-case respectively match, per the definition provided in Section 4, the corresponding associate item(s) of the corresponding states of Case.

4. If Competing-Set is empty and Threshold-Value < Max-Threshold-Value, then increase Threshold-Value by 0.01 and repeat the determination of the Competing-Set; otherwise:

5. Recommend the primary item that is in the (j + 1)th state, is other than a time gap, and is present in the majority of the sub-cases of Competing-Set. If two or more such primary items are present, then recommend the primary item that is in the most recent state. If two or more such primary items still remain, then recommend those primary items as options.
If the primary item at the (j + 1)th state in each of the sub-cases in Competing-Set is a time gap, then: (a) if i > 2, perform the following in the specified order: i ← (i − 1); Go to Step 2; (b) if i = 2, exit the process.

In the above algorithm, Threshold-Value represents the difference that the algorithm will tolerate between two images. The Threshold-Value is initially set to 0. When no suitable previous case matches the current user's case, the Threshold-Value is incrementally increased by 0.01, and this process continues until the Threshold-Value reaches the maximum limit (Max-Threshold-Value), which is currently set to 0.2. If there is still no match, the first associate item in the Priority-List is dropped from consideration, and the process repeats. If there is still no match, the oldest state in the current patient's case is dropped from consideration, and the process repeats. If all these choices are exhausted without finding a match, the algorithm terminates.
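The retrieval loop just described can be sketched end-to-end. This is an illustrative sketch only, under our own simplifying assumptions: states are bare integers, `match_state` stands in for the Section 4 matching definition, time-gap handling and the Priority-List associate-drop step are omitted, resetting the threshold after each relaxation is our reading of "the process repeats," and all names are ours.

```python
from collections import Counter

MAX_THRESHOLD = 0.2   # Max-Threshold-Value from the text above
STEP = 0.01           # increment applied when no match is found

def competing_set(query, sub_cases, match_state, threshold):
    """Keep sub-cases at least one state longer than the query whose first
    j states match the query's states under the given threshold."""
    j = len(query)
    return [sc for sc in sub_cases
            if len(sc) >= j + 1
            and all(match_state(a, b, threshold)
                    for a, b in zip(sc[:j], query))]

def recommend(cs, j):
    """Majority vote over the (j + 1)-th state (index j) of each sub-case;
    ties are returned together as options."""
    counts = Counter(sc[j] for sc in cs)
    top = max(counts.values())
    return sorted(item for item, n in counts.items() if n == top)

def retrieve(case, sub_cases, match_state):
    """Raise the threshold in steps; if that fails, drop the oldest state."""
    while len(case) >= 2:
        threshold = 0.0
        while threshold <= MAX_THRESHOLD + 1e-9:
            cs = competing_set(case, sub_cases, match_state,
                               round(threshold, 2))
            if cs:
                return recommend(cs, len(case))
            threshold += STEP
        case = case[1:]  # drop the oldest state and repeat
    return None          # every fallback exhausted
```

With the three conflict-set cases from the example in this section and exact matching that ignores images, `retrieve([153, 10003, 100009, 157], ...)` returns `[10002]` by majority vote, mirroring system-A's behavior; the image-aware matcher of Section 4 is what shifts the recommendation to 10006.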
Example: Assume a new patient's case is 153, 1, 10003, 100009, 0.3, 157 and the doctor is using system-A. Associate items and images are not considered in this example. This case implies that the new patient had symptom 153. After one year, disease 10003 was identified. Treatment 100009 was then recommended and performed. After a gap of 0.3 years, another symptom, 157, was noticed. Now the doctor is looking for feedback from the system. After searching the case-base, the system did not find an exact match to recommend, causing it to drop the time gap. The values 1 and 0.3 are removed from the case, so the resulting case is 153, 10003, 100009, 157. After a few iterations, system-A identifies the conflict set C (which is similar to the Competing-Set in the current system) as consisting of the following three cases:
153, 10003, 100009, 157, 10002
153, 10003, 100009, 157, 10002
153, 10003, 100009, 157, 10006
System-A then recommends 10002, as 10002 is present in the majority of these cases. However, for this example, when the current system is used, the pre-treatment images associated with 100009 in the first two cases of C do not match the current patient's corresponding pre-treatment image (image details are omitted here for simplicity). Therefore, these first two cases are not included in the Competing-Set of the current system. The current user's case matches the last case of C (and no other matching cases were found in the system), so 10006 is recommended to the doctor. This is a better recommendation because the pre-treatment images match.

Adaptability to a Change of Trend and Performance Results
Like system-A, the performance of the current system is tested in terms of its adaptability to a change of trend in the medical field. The testing is performed in a simulated environment consisting of 250 different symptoms, 200 different time gaps, 1605 different images, 300 different treatments, 350 different diseases, 2 different genders, 12 different insurance coverage plans, 300 different dates, 3 different races, 345 different ages, and 755 separate cases. The results are presented in Figure 4, in which the X-axis specifies the number of cases and the Y-axis specifies the acceptance rate of the recommendations provided by the current system. The threshold value is set to 50%. Once the acceptance rate started decreasing monotonically and reached 42% (below the set threshold value), the system stopped using the cases recorded before the date the monotonic decrease started (dates are not shown in the graph). The graph in Figure 4 thus shows that the acceptance rate picks up again.

Figure 5. Performance comparison between system-A and the current system

The current system is compared with system-A to evaluate its operational performance in terms of making acceptable recommendations to doctors. The acceptance rates of both systems are provided in Figure 5. The X-axis specifies the number of cases, and the Y-axis specifies the acceptance rate of the recommendations from these systems. In Figure 5, the arc connecting the squares represents the current system, whereas the arc connecting the crosses represents system-A. The acceptance rate of system-A was 36% after 300 cases had accumulated in the system, whereas for the current system that rate is 70%, a substantial increase. The current system notifies doctors that radiographic images are used to generate its recommendations.
When using system-A, doctors were warned that radiographic images were not used as a parameter in making recommendations. Due to this warning, doctors were hesitant to accept the recommendations made by system-A.

Conclusion
A CBR system for aiding physicians in decision-making was presented by adding radiographic images to the cases. This work extended the authors' earlier research (Paruchuri & Granville, 2020). However, the current radiographic system needs further refinement. For example, the system currently cannot handle multiple pre-treatment images, multiple post-treatment images, or images that are neither pre-treatment nor post-treatment. The system can be further expanded to integrate these various scenarios in which radiographic images can be applied. Although the system was tested in a simulated environment, it needs further testing to determine which methods of image storage are optimal for aiding physicians. Several methods currently exist for storing radiographic images effectively and for finding the difference between a pair of radiographic images (Chalom et al., 2013; Wang et al., 2005). The limitations of the earlier system (system-A) remain valid here, and they also need to be addressed.