Wonjin Yoon

WonJin Yoon, PhD   (윤원진)
  It is pronounced like [One gene🧬]

Research Interests: ClinicalNLP / BioNLP / NLP

Postdoctoral researcher at Harvard Medical School and Boston Children's Hospital
  - Machine Learning for Medical Language (MLML) Lab

Former research collaborator of AstraZeneca
  - Feb. 2020 ~ Apr. 2023

Google Scholar   Curriculum Vitae

✨ News

✨ Dec/2023: I am serving as a member of the organising committee for the first ChemoTimelines Shared Task, "Chemotherapy Treatment Timelines Extraction from the Clinical Narrative", at NAACL - Clinical NLP 2024.
✨ Apr/2023: I presented my research work on BioNLP at the Columbia University NLP Seminar (Columbia University: Department of Computer Science, NY).
✨ Sep/2022: I received an Academic Award, "Standigm Paper Award 2022", from the Korean Society for Bioinformatics (한국생명정보학회) for the paper entitled Sequence Tagging for Biomedical Extractive Question Answering (Bioinformatics 2022).
✨ Jun/2022: I am honoured to be invited to become a member of the Program Committee of the 10th BioASQ! (Link)
✨ Jun/2022: Our working note on the 9th BioASQ challenge (published last year) was nominated as a "Best of 2021 Labs" paper and accepted to CLEF 2022. I will present the paper at CLEF 2022 in Bologna, Italy (5th of Sep, 2022).
✨ Nov/2021: Our team placed 1st and 3rd in two tracks of the BioCreative challenge (1st place in the NER track and 3rd place in the RE track).
    - Led our team (KU in collaboration with Richard Jackson from AstraZeneca) for the Relation Extraction (RE) - DrugProt task.
✨ Oct/2021: I was invited to present my research work on BioNLP at an internal conference hosted by AstraZeneca UK (more than a few hundred researchers from across the company and research institute attended the conference).
✨ May/2021: I gave an invited talk at a seminar hosted by RINS, Seoul National University (서울대학교 간호과학연구소) (News).
✨ Sep/2020: Our team (KU) won the 8th BioASQ challenge Phase B in both the Exact and Ideal answer tracks (Exact answers: for two consecutive years).
    - Led our team for the ideal answer track this year, and for the exact answer track last year (News).
✨ Sep/2020: BioBERT has been ranked among the most read papers in Bioinformatics, which is one of the top-tier journals in the domain.
Also, BioBERT was included in the Best Papers for the Natural Language Processing Section of the 2020 IMIA (International Medical Informatics Association) Yearbook (link).

My research interests focus on topics and tasks in Biomedical Natural Language Processing (BioNLP). To elaborate, I am interested in exploring the nature of biomedical text and in designing improved models for BioNLP tasks. One of my technical aims is to extract biomedical information (such as relations between entities) from text and to build knowledge graphs from the extracted information.

Stemming from these interests, I am also interested in scalable language models (distillation) to enlarge the search space of information to the entire MEDLINE document set (all PubMed-searchable abstracts), and in the use of graph ML (incl. GNN) with knowledge graphs built using BioNLP to infer unvisited/unrevealed biomedical information.

Please check my recent publications on Language Models, QA, RE and NER in the biomedical domain.
(Please check: BioBERT (co-first author), BioASQ (BioASQ7b, 8b) and the entire collection on Google Scholar.)

Misc) I love travelling and taking pictures while I travel.


May. 2023 ~ Present
Postdoctoral researcher at MLML Lab,
Harvard Medical School and Boston Children's Hospital
MLML Harvard Catalyst

2017 ~ Apr. 2023
Researcher at DMIS Lab, Korea University, Seoul (Jan 2017 ~ ) DMIS
Research collaborator, AstraZeneca (UK & Sweden), (Feb 2020 ~ )
Research Intern @ NAVER (Clova AI), (Sep~Dec, 2019)

2013 ~ 2016
B.S., CSE Major, Class of 2017. Korea University (고려대학교)
Exchange Student at NUS (National Univ. of Singapore) (2015)

Chairperson of KUICS ; Korea University Institute of Computer Security (Club) (2015) KUICS
Chairperson of KU - Kyunggi high school Alumni Association(YB) (2014)
Chairperson of Inc0gnito Hacking Conference 2014 Incognito Conference

Seoul Kyunggi High School (Class of 2012 ; 108th graduating class)


Google Scholar

Research Publications / Works (Selected)

Paper and Research results...

KAZU Project and a paper about the project
(Industrial Collaboration Project with AstraZeneca)
KAZU is a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector.

This project is personally very meaningful to me. During my PhD days I always wanted to understand the needs of "real-world settings" and to challenge myself to bridge the gap between the lab and those settings, because I wanted to conduct research that goes beyond lab settings (i.e. restricted settings) and eventually helps science progress.
My research collaboration with AstraZeneca started 3 years ago from this aspiration. It was a fantastic experience to work with scientists (special thanks to Richard Jackson) who strive together to turn recent research outputs into something usable by scientists from other academic disciplines and by industrial researchers.
(The name of the project and framework, "KAZU", is taken from letters in "Korea University" and "AstraZeneca".)

KAZU web demo, KAZU open-source repo and Training code for NER NN module

[Paper] Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework
Wonjin Yoon, Richard Jackson, Aron Lagerberg and Jaewoo Kang
Work done as a research collaborator of AstraZeneca PLC
EMNLP 2022, Abu Dhabi, UAE (Industry track)

Available on arXiv

Sequence tagging for biomedical extractive question answering
Wonjin Yoon, Richard Jackson, Aron Lagerberg and Jaewoo Kang
Work done as a research collaborator of AstraZeneca PLC
Bioinformatics, 2022, 1-8

Available on Bioinformatics and github

KU-DMIS at BioASQ 9: Data-centric and model-centric approaches for biomedical question answering
Wonjin Yoon, Jaehyo Yoo, Sumin Seo, Mujeen Sung, Minbyul Jeong, Gangwoo Kim and Jaewoo Kang
Original work accepted for BioASQ9b (@CLEF2021) and later invited to CLEF2022 Best of 2021 Labs paper track.

Available on CLEF 2022 (Best paper track) and BioASQ 9b (Original paper)

covidAsk: real-time QA system on COVID-19
Jinhyuk Lee, Minbyul Jeong, Mujeen Sung, Wonjin Yoon, Yonghwa Choi, Miyoung Ko, Seok-Won Lee and Jaewoo Kang

Available on arXiv and covidAsk

Pre-trained model for biomedical question answering
Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong and Jaewoo Kang

Available on arXiv and github

BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Jinhyuk Lee*, Wonjin Yoon*, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So and Jaewoo Kang
* Joint co-first authors; these authors contributed equally to the work.
One of the Best Papers for the Natural Language Processing Section of the 2020 IMIA (International Medical Informatics Association) Yearbook

Available on Bioinformatics and github(Fine-tuning codes)

CollaboNet: collaboration of deep neural networks for biomedical named entity recognition
Wonjin Yoon*, Chan Ho So*, Jinhyuk Lee and Jaewoo Kang
BMC Bioinformatics 2019, 20(Suppl 10):249

Available here (Open access) and on github
* Joint co-first authors; these authors contributed equally to the work.

A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining
Donghyeon Kim, Jinhyuk Lee, Chan Ho So, Hwisang Jeon, Minbyul Jeong, Yonghwa Choi, Wonjin Yoon, Mujeen Sung, Jaewoo Kang
IEEE Access, vol. 7, pp. 73729-73740, 2019.

Available on IEEE (Open access) and BERN


E Mail
WonJin Yoon

WonJin Yoon

Ph.D. in Computer Science
Postdoc fellow at Harvard Medical School
Biomedical / Clinical Natural Language Processing

 Linked-in  Github  Harvard Catalyst
