Romania
1 year of experience
We fine-tuned Qwen2.5-7B-Instruct for medical use.

Methodology

Datasets:
Fine-tuning: qiaojin/PubMedQA (pqa_labeled), LinhDuong/chatdoctor-200k (1000 samples)
Evaluation: qiaojin/PubMedQA (pqa_labeled) (500 samples), LinhDuong/chatdoctor-200k (500 samples), GBaker/MedQA-USMLE-4-options (500 samples)

Fine-tuned with LoRA, rank 8.

Evaluation criteria: We used Gemma 2 as a judge, with the labels from the datasets as the reference standard. We evaluated Qwen base and our model (Qwen Medical) independently to measure the improvement. Each answer was then graded by the LLM judge (Gemma 2) on this scale:

Grading rubric:
1: Completely incorrect/unsafe.
2: Major hallucinations.
3: Correct but lacks detail.
4: Highly accurate, minor flaws.
5: Medically flawless and perfectly structured.

Qwen base scored 2.04 while Qwen Medical scored 2.16, a relative increase of about 6%.
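For readers who want to reproduce the scoring step, here is a minimal sketch of how the judge prompt, score parsing, and the reported improvement could fit together. It is not our exact pipeline: the prompt wording and the helper names (`build_judge_prompt`, `parse_score`, `mean_score`, `relative_gain_percent`) are hypothetical, the call to the judge model is left out, and it assumes the reported figures are mean rubric scores.

```python
import re
from statistics import mean

# Rubric copied from the post; everything else below is an illustrative assumption.
RUBRIC = {
    1: "Completely incorrect/unsafe.",
    2: "Major hallucinations.",
    3: "Correct but lacks detail.",
    4: "Highly accurate, minor flaws.",
    5: "Medically flawless and perfectly structured.",
}


def build_judge_prompt(question, reference, answer):
    """Assemble a grading prompt for the judge model (hypothetical wording)."""
    rubric_text = "\n".join(f"{grade}: {text}" for grade, text in RUBRIC.items())
    return (
        "You are grading a medical answer against a reference.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {answer}\n"
        f"Grading rubric:\n{rubric_text}\n"
        "Reply with a single integer from 1 to 5."
    )


def parse_score(judge_reply):
    """Return the first standalone digit 1-5 in the reply, or None if absent."""
    match = re.search(r"\b([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None


def mean_score(judge_replies):
    """Average the parsed scores, skipping replies with no usable grade."""
    scores = [s for s in map(parse_score, judge_replies) if s is not None]
    return mean(scores)


def relative_gain_percent(base, tuned):
    """Relative change of the tuned score over the base score, in percent."""
    return (tuned - base) / base * 100.0


if __name__ == "__main__":
    # Scores reported in the post: 2.04 (Qwen base) vs 2.16 (Qwen Medical).
    print(f"{relative_gain_percent(2.04, 2.16):.1f}% relative increase")
```

With the two scores from the post, the relative gain works out to (2.16 - 2.04) / 2.04, which is just under 6%.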
10 May 2026