Using ChimerSeq fusion annotations (https://www.kobic.re.kr/chimerdb_mirror/),<br>
we train the model with 1218 samples for each exon, then predict fusions within two genes using only 50, 100, 200, or 500 samples.<br>
The results are as follows:<br>
predict with all (1218) samples, accuracy: 66.0667% precision: 67.3774%<br>
predict with 500 samples, accuracy: 64.7900% precision: 65.2067%<br>
predict with 200 samples, accuracy: 64.8750% precision: 65.1616%<br>
predict with 100 samples, accuracy: 63.2100% precision: 62.8252%<br>
predict with 50 samples,  accuracy: 62.0650% precision: 60.6647%<br>
<br>
Another point is worth mentioning: including the p value in the learning model is important for robustness.<br>
Here are the results from a model that differs from our final model only in that it does not take the p value into account:<br>
<br>
predict with all (1218) samples, accuracy: 65.4250% precision: 68.4211%<br>
predict with 500 samples, accuracy: 66.5700% precision: 67.5902%<br>
predict with 200 samples, accuracy: 64.9400% precision: 63.7115%<br>
predict with 100 samples, accuracy: 62.9550% precision: 60.3448%<br>
predict with 50 samples,  accuracy: 60.5450% precision: 57.2951%<br>
<br>
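We can see that when the p value is taken into account, accuracy and precision drop less as fewer samples are used. Going from all 1218 samples to 50, accuracy falls by about 4.0 percentage points and precision by about 6.7 with the p value, compared with about 4.9 and 11.1 without it.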
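
The comparison can be checked with a few lines of Python. This is only a minimal sketch: the numbers are copied from the two result lists above, while the names `with_p`, `without_p` and `drop` are made up for this illustration and are not part of the model code.

```python
# Accuracy and precision (%) copied from the results listed above,
# keyed by the number of samples used for prediction.
# The dictionary and function names are illustrative only.
with_p = {
    1218: (66.0667, 67.3774),
    500: (64.7900, 65.2067),
    200: (64.8750, 65.1616),
    100: (63.2100, 62.8252),
    50: (62.0650, 60.6647),
}
without_p = {
    1218: (65.4250, 68.4211),
    500: (66.5700, 67.5902),
    200: (64.9400, 63.7115),
    100: (62.9550, 60.3448),
    50: (60.5450, 57.2951),
}


def drop(results, full=1218, reduced=50):
    """Return (accuracy drop, precision drop) in percentage points."""
    acc_full, prec_full = results[full]
    acc_red, prec_red = results[reduced]
    return round(acc_full - acc_red, 4), round(prec_full - prec_red, 4)


print("with p value:   ", drop(with_p))
print("without p value:", drop(without_p))
```

Changing the `reduced` argument (for example `drop(with_p, reduced=100)`) gives the same comparison for the other sample sizes.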