Contact

Email: yuzhang [AT] tamu [DOT] edu; yuz9yuz [AT] gmail [DOT] com
Office: 222 Peterson Building, 435 Nagle St, College Station, TX 77843

About Me

I am an Assistant Professor in the Department of Computer Science & Engineering at Texas A&M University. I lead the SKY Lab.

Prior to TAMU, I received my Ph.D. and M.Sc. degrees in Computer Science from the University of Illinois at Urbana-Champaign, advised by Prof. Jiawei Han. During my graduate studies, I visited the University of Washington, working with Prof. Sheng Wang, and interned at Microsoft Research Redmond three times, working with Dr. Iris Shen, Dr. Hao Cheng, Dr. Xiaodong Liu, and Dr. Yuxiao Dong. My Ph.D. thesis won the ACM SIGKDD Dissertation Award Runner-Up.

Prior to UIUC, I received my B.Sc. degree in Computer Science from Peking University, advised by Prof. Yan Zhang. During my undergraduate studies, I received the China National Scholarship and visited Carnegie Mellon University, working with Prof. Kathleen M. Carley.

[Recruiting] I am looking for self-motivated Ph.D. students and interns! Please fill out this form if you are interested in working with me. After completing the form, you are also welcome to reach out via email. I will read all submitted forms and emails, but I apologize that I cannot respond to each of them.

Research Interests

NLP and Data Mining for Science (Biology, Medicine, Mathematics, and Science of Science):
[WWW 2025, EMNLP 2024, EMNLP 2023, KDD 2023a, WWW 2023, Bioinformatics 2019]

NLP and Text Mining with Graphs and Structural Knowledge:
[ACL 2024, ACL 2023, KDD 2023b, ICLR 2023, WWW 2021]

NLP and Data Mining under Weak Supervision:
[ICML 2023, NeurIPS 2022, WWW 2022, WSDM 2022, WSDM 2021, SIGIR 2020]

Teaching

Fall 2025: CSCE 670 - Information Storage and Retrieval
Spring 2025: CSCE 689 - Special Topics in NLP for Science

What’s New [What’s Not New…]

2025-07 to 2025-12 Invited to be a PC member of KDD 2026 (Area Chair).

2025-08-03 Attended KDD 2025 in Toronto, Canada. Thrilled to receive the ACM SIGKDD Dissertation Award Runner-Up and give my award talk!

2025-01 to 2025-06 Invited to be a PC member of KDD 2025 (Area Chair), ACL 2025 (Area Chair), NeurIPS 2025 (Area Chair), and EMNLP 2025 (Senior Area Chair).

2025-05-15 Two papers were accepted to ACL 2025 (1 main conference + 1 findings)!

2025-04-30 Attended SDM 2025 in Washington, DC to present our tutorial.

2025-04-11 Gave a talk at the University of Kansas [slides].

2025-03-07 We will hold two KDD 2025 workshops on Structured Knowledge for Large Language Models (SKnowLLM) and Machine Learning on Graphs in the Era of Generative Artificial Intelligence (MLoG-GenAI)! See you in Toronto!

2025-01-20 Our paper on Paper-Reviewer Matching was accepted by WWW 2025 as an oral presentation! The acceptance rate is 19.8% (409/2062).

2025-01-01 Joined Texas A&M University as an Assistant Professor!

Selected Publications [Full List]

(“*” indicates equal contribution. Unless otherwise specified, the paper is accepted as a research track long/regular paper.)

Preprint

Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning [arXiv]
Shubham Parashar, Shurui Gui, Xiner Li, Hongyi Ling, Sushil Vemuri, Blake Olson, Eric Li, Yu Zhang, James Caverlee, Dileep Kalathil, and Shuiwang Ji.
arXiv:2506.06632.

RM-R1: Reward Modeling as Reasoning [arXiv] [code]
Xiusi Chen, Gaotang Li, Ziqi Wang, Bowen Jin, Cheng Qian, Yu Wang, Hongru Wang, Yu Zhang, Denghui Zhang, Tong Zhang, Hanghang Tong, and Heng Ji.
arXiv:2505.02387.

Protein Large Language Models: A Comprehensive Survey [arXiv] [project page]
Yijia Xiao, Wanjia Zhao, Junkai Zhang, Yiqiao Jin, Han Zhang, Zhicheng Ren, Renliang Sun, Haixin Wang, Guancheng Wan, Pan Lu, Xiao Luo, Yu Zhang, James Zou, Yizhou Sun, and Wei Wang.
arXiv:2502.17504.

2025

Internal and External Impacts of Natural Language Processing Papers [PDF] [arXiv] [dataset]
Yu Zhang.
ACL 2025. Vienna, Austria. (Short Paper)

Chain-of-Factors Paper-Reviewer Matching [PDF] [arXiv] [code]
Yu Zhang, Yanzhen Shen, SeongKu Kang, Xiusi Chen, Bowen Jin, and Jiawei Han.
WWW 2025. Sydney, Australia.

A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion [PDF] [arXiv] [code]
Yanzhen Shen, Yu Zhang, Yunyi Zhang, and Jiawei Han.
ACL 2025 Findings. Vienna, Austria.

Improving Scientific Document Retrieval with Concept Coverage-based Query Set Generation [PDF] [arXiv]
SeongKu Kang, Bowen Jin, Wonbin Kweon, Yu Zhang, Dongha Lee, Jiawei Han, and Hwanjo Yu.
WSDM 2025. Hannover, Germany.

2024

Structure-Enhanced Text Mining for Science [Link]
Yu Zhang.
Ph.D. Thesis. (ACM SIGKDD Dissertation Award Runner-Up)

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [PDF] [arXiv] [project page]
Yu Zhang*, Xiusi Chen*, Bowen Jin*, Sheng Wang, Shuiwang Ji, Wei Wang, and Jiawei Han.
EMNLP 2024. Miami, FL, USA. (550+ stars on GitHub!)

Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [PDF] [arXiv] [code]
Yu Zhang*, Yunyi Zhang*, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, and Jiawei Han.
AAAI 2024. Vancouver, Canada.

Ontology Enrichment for Effective Fine-grained Entity Typing [PDF] [arXiv]
Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, and Jiawei Han.
KDD 2024. Barcelona, Spain.

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [PDF] [arXiv] [code]
Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Zheng Li, Ruirui Li, Xianfeng Tang, Suhang Wang, Yu Meng, and Jiawei Han.
ACL 2024 Findings. Bangkok, Thailand.

2023

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [PDF] [arXiv] [project page] [code] [model] [dataset] [PMC-Patients leaderboard]
Yu Zhang*, Hao Cheng*, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, and Jianfeng Gao.
EMNLP 2023 Findings. Singapore, Singapore.

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers [PDF] [arXiv] [code]
Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, and Jiawei Han.
KDD 2023. Long Beach, CA, USA.

The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study [PDF] [arXiv] [code] [dataset]
Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, and Jiawei Han.
WWW 2023. Austin, TX, USA.

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts [PDF] [arXiv] [code]
Yu Zhang*, Yunyi Zhang*, Martin Michalski*, Yucheng Jiang*, Yu Meng*, and Jiawei Han.
WSDM 2023. Singapore, Singapore.

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training [PDF] [arXiv] [code]
Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, and Jiawei Han.
EMNLP 2023. Singapore, Singapore.

Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks [PDF] [arXiv] [code]
Bowen Jin, Yu Zhang, Qi Zhu, and Jiawei Han.
KDD 2023. Long Beach, CA, USA.

Patton: Language Model Pretraining on Text-Rich Networks [PDF] [arXiv] [code]
Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, and Jiawei Han.
ACL 2023. Toronto, Canada.

Chain-of-Skills: A Configurable Model for Open-Domain Question Answering [PDF] [arXiv] [code]
Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, and Jianfeng Gao.
ACL 2023. Toronto, Canada.

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning [PDF] [arXiv] [code]
Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, and Jiawei Han.
ICML 2023. Honolulu, HI, USA.

Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks [PDF] [arXiv] [code]
Bowen Jin, Yu Zhang, Yu Meng, and Jiawei Han.
ICLR 2023. Kigali, Rwanda.

Gotta: Generative Few-shot Question Answering by Prompt-based Cloze Data Augmentation [PDF] [arXiv] [code]
Xiusi Chen, Yu Zhang, Jinliang Deng, Jyun-Yu Jiang, and Wei Wang.
SDM 2023. Minneapolis, MN, USA. (Regular Paper, Best Poster Award Honorable Mention)

2022

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds [PDF] [arXiv] [code]
Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, and Jiawei Han.
NAACL 2022. Seattle, WA, USA.

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification [PDF] [arXiv] [code]
Yu Zhang, Zhihong Shen, Chieh-Han Wu, Boya Xie, Junheng Hao, Ye-Yi Wang, Kuansan Wang, and Jiawei Han.
WWW 2022. Lyon, France.

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information [PDF] [arXiv] [code]
Yu Zhang*, Shweta Garg*, Yu Meng, Xiusi Chen, and Jiawei Han.
WSDM 2022. Tempe, AZ, USA.

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark [PDF] [arXiv] [code]
Carl Yang*, Yuxin Xiao*, Yu Zhang*, Yizhou Sun, and Jiawei Han.
TKDE. Volume 34, Issue 10. IEEE.

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding [PDF] [arXiv] [code]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
NeurIPS 2022. New Orleans, LA, USA.

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, and Jiawei Han.
WWW 2022. Lyon, France.

2021

MATCH: Metadata-Aware Text Classification in A Large Hierarchy [PDF] [arXiv] [code]
Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, and Jiawei Han.
WWW 2021. Ljubljana, Slovenia.

Hierarchical Metadata-Aware Document Categorization under Weak Supervision [PDF] [arXiv] [code]
Yu Zhang, Xiusi Chen, Yu Meng, and Jiawei Han.
WSDM 2021. Jerusalem, Israel.

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, and Jiawei Han.
EMNLP 2021. Punta Cana, Dominican Republic.

2020

Minimally Supervised Categorization of Text with Metadata [PDF] [arXiv] [code]
Yu Zhang*, Yu Meng*, Jiaxin Huang, Frank F. Xu, Xuan Wang, and Jiawei Han.
SIGIR 2020. Xi’an, China.

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, and Jiawei Han.
KDD 2020. San Diego, CA, USA.

Discriminative Topic Mining via Category-Name Guided Text Embedding [PDF] [arXiv] [code]
Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, and Jiawei Han.
WWW 2020. Taipei, Taiwan.

2019

HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories [PDF] [arXiv] [code]
Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, and Jiawei Han.
ICDM 2019. Beijing, China.

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning [PDF] [arXiv] [bioRxiv] [code]
Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, and Jiawei Han.
Bioinformatics. Volume 35, Issue 10. Oxford University Press.

Integrating Local Context and Global Cohesiveness for Open Information Extraction [PDF] [arXiv] [code]
Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, and Jiawei Han.
WSDM 2019. Melbourne, Australia.

2018

Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning [PDF] [arXiv] [code]
Meng Qu, Xiang Ren, Yu Zhang, and Jiawei Han.
WWW 2018. Lyon, France.

Open Information Extraction with Global Structure Constraints [PDF] [code]
Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Frank F. Xu, and Jiawei Han.
WWW 2018. Lyon, France. (Poster, Best Poster Award Honorable Mention)

2017

RATE: Overcoming Noise and Sparsity of Textual Features in Real-Time Location Estimation [PDF] [arXiv] [code]
Yu Zhang, Wei Wei, Binxuan Huang, Kathleen M. Carley, and Yan Zhang.
CIKM 2017. Singapore, Singapore. (Short Paper)

Top-K Influential Nodes in Social Networks: A Game Perspective [PDF] [code]
Yu Zhang and Yan Zhang.
SIGIR 2017. Shinjuku, Tokyo, Japan. (Short Paper)

Conference Tutorials (In Proceedings)

Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery [PDF]
Bowen Jin, Yu Zhang, Sha Li, and Jiawei Han.
WSDM 2024. Mérida, Mexico. (Tutorial)

Pretrained Language Representations for Text Understanding: A Weakly-Supervised Perspective [PDF]
Yu Meng, Jiaxin Huang, Yu Zhang, Yunyi Zhang, and Jiawei Han.
KDD 2023. Long Beach, CA, USA. (Tutorial)

Tutorials at The Web Conference 2023 [PDF]
Valeria Fionda, Olaf Hartig, et al. (including Yu Zhang)
WWW 2023. Austin, TX, USA. (Tutorial)

Mining Structures from Massive Texts by Exploring the Power of Pre-trained Language Models [PDF]
Yu Zhang, Yunyi Zhang, and Jiawei Han.
EDBT 2023. Ioannina, Greece. (Tutorial)

Adapting Pretrained Representations for Text Mining [PDF]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
KDD 2022. Washington, DC, USA. (Tutorial)

On the Power of Pre-Trained Text Representations: Models and Applications in Text Mining [PDF]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
KDD 2021. Singapore, Singapore. (Tutorial)

Workshop Summaries (In Proceedings)

Machine Learning on Graphs in the Era of Generative Artificial Intelligence [PDF]
Yu Wang, Yu Zhang, Zhichun Guo, Harry Shomer, Haoyu Han, Tyler Derr, Nesreen K. Ahmed, Mahantesh Halappanavar, and Jiliang Tang.
KDD 2025. Toronto, Canada. (Workshop)

SKnow-LLM Workshop: Structured Knowledge for Large Language Models [PDF]
Qi Zhu, Xiusi Chen, Yu Zhang, Soji Adeshina, Costas Mavromatis, Zhen Han, Vassilis N. Ioannidis, Leman Akoglu, Danai Koutra, and Huzefa Rangwala.
KDD 2025. Toronto, Canada. (Workshop)

Honors and Awards

ACM SIGKDD Dissertation Award Runner-Up, 2025
EMNLP Outstanding Reviewer, 2024
KDD Outstanding Reviewer (30 in 1469), 2023
Dissertation Completion Fellowship, Graduate College, UIUC (the only recipient from CS), 2023
WWW Best Reviewer, 2023
SDM Best Poster Award Honorable Mention, 2023
Data Mining Research Excellence Gold Award, Data Mining Group, UIUC, 2023, 2024
CIKM Best Reviewer, 2022
Yunni & Maxine Pao Memorial Fellowship, the Grainger College of Engineering, UIUC, 2022
WWW Student Scholarship, 2021
WSDM Student Travel Grant, 2021, 2022, 2023, 2024
WWW Best Poster Award Honorable Mention, 2018
Outstanding Undergraduate Thesis Award, School of EECS, Peking University (10 in 320), 2017
Outstanding Graduates, Peking University, 2017
SIGIR Student Travel Grant, 2017, 2020
China National Scholarship (top 1% in Peking University), 2014
First Prize, National Olympiad in Informatics in Provinces, 2011, 2012

Invited Talks

Assisting Scientific Research with Structure-Aware Large Language Models [slides]
April 2025, Invited Talk at the University of Kansas.

Graph-Enhanced Scientific Text Mining [slides]
August 2025, ACM SIGKDD Dissertation Award talk at KDD 2025.
December 2024, Invited Talk at the LoG 2024 Seattle Meetup.
November 2024, Guest Lecture (CSE 427) at the University of Washington.
May 2024, Guest Lecture (STAT 359) at Northwestern University.
March 2024, Keynote at the Machine Learning on Graphs (MLoG) Workshop at WSDM 2024.

Professional Services

Conference Senior Area Chair
EMNLP 2025

Conference Area Chair
KDD 2025-2026; ACL 2025; NeurIPS 2025

Conference Reviewer
KDD 2022-2025; WWW 2022-2025; WSDM 2023-2025;
CIKM 2021-2024; SDM 2024; ECML/PKDD 2022;
NeurIPS 2021-2022, 2024; ICML 2022-2025; ICLR 2021-2025; AAAI 2022;
ACL 2021, 2023; EMNLP 2020, 2022-2024; NAACL 2021-2022; COLING 2022

Journal Reviewer
TPAMI; TKDE; TOIS; TKDD; Bioinformatics; TWEB; TASLP; TBD; TCBB

Workshop Co-Organizer
SKnowLLM@KDD 2025; MLoG-GenAI@KDD 2025

Student Volunteer
SIGIR 2020; KDD 2022-2023

Miscellany

I was born and raised in Shanghai, China. I graduated from the High School Affiliated to Fudan University.

I played bridge during high school and my undergraduate years.