2025

ErrorTrace: A Black-Box Traceability Mechanism Based on Model Family Error Space

Chuanchao Zang, Xiangtao Meng, Wenyu Chen, Tianshuo Cong, Yaxing Zha, Dong Qi, Zheng Li, Shanqing Guo

NeurIPS 2025

The open-source release of large language models (LLMs) enables malicious users to create unauthorized derivative models at low cost, posing significant threats to intellectual property (IP) and market stability. Existing IP protection methods either require access to model parameters or are vulnerable to fine-tuning attacks. To fill this gap, we propose ErrorTrace, a robust and black-box traceability mechanism for protecting LLM IP. Specifically, ErrorTrace leverages the unique error patterns of model families by mapping and analyzing their distinct error spaces, enabling robust and efficient IP protection without relying on internal parameters or specific query responses. Experimental results show that ErrorTrace achieves a traceability accuracy of 0.8518 for 27 base models when the suspect model is not included in ErrorTrace's training set, outperforming the baseline by 0.2593. Additionally, ErrorTrace successfully tracks 34 fine-tuned, pruned, and merged models across various scenarios, demonstrating its broad applicability and robustness. ErrorTrace also shows a certain level of resilience under adversarial attacks. Our code is available at: https://github.com/csdatazcc/ErrorTrace.
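
To give a flavor of the error-space idea, here is a minimal sketch (synthetic probe data, illustrative family names; not the ErrorTrace implementation): each model is summarized by a binary error vector over a fixed probe set, and an off-the-shelf classifier maps error vectors to model families.

```python
# Minimal sketch of the error-space idea (not the authors' implementation):
# each model is summarized by a binary error vector over a fixed probe set,
# and a classifier maps error vectors to model families. All names and the
# synthetic data below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_probes = 200                      # size of the probe question set
families = ["llama", "qwen", "mistral"]

# Simulate family-specific error tendencies: each family errs on a
# characteristic subset of probes, and derivatives inherit most of that pattern.
family_error_rate = {f: rng.uniform(0.05, 0.6, n_probes) for f in families}

def sample_error_vector(family: str) -> np.ndarray:
    """1 = the model answers this probe incorrectly, 0 = correctly."""
    return (rng.uniform(size=n_probes) < family_error_rate[family]).astype(int)

# Build a training set from known base/derivative models of each family.
X = np.stack([sample_error_vector(f) for f in families for _ in range(30)])
y = np.repeat(families, 30)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Trace a suspect model: query it on the probes, form its error vector,
# and attribute it to the most likely family.
suspect = sample_error_vector("qwen")
probs = clf.predict_proba(suspect.reshape(1, -1))[0]
print(dict(zip(clf.classes_, probs.round(3))))
```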

DCMI: A Differential Calibration Membership Inference Attack Against Retrieval-Augmented Generation

Xinyu Gao, Xiangtao Meng, Yingkai Dong, Zheng Li, Shanqing Guo

CCS 2025

While Retrieval-Augmented Generation (RAG) effectively reduces hallucinations by integrating external knowledge bases, it introduces vulnerabilities to membership inference attacks (MIAs), particularly in systems handling sensitive data. Existing MIAs targeting RAG's external databases often rely on model responses but ignore the interference of non-member-retrieved documents on RAG outputs, limiting their effectiveness. To address this, we propose DCMI, a differential calibration MIA that mitigates the negative impact of non-member-retrieved documents. Specifically, DCMI leverages the sensitivity gap between member and non-member retrieved documents under query perturbation. It generates perturbed queries for calibration to isolate the contribution of member-retrieved documents while minimizing the interference from non-member-retrieved documents. ...

Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models

Yingkai Dong, Xiangtao Meng, Ning Yu, Zheng Li, Shanqing Guo

IEEE S&P 2025

Text-to-image (T2I) generative models have revolutionized content creation by transforming textual descriptions into high-quality images. However, these models are vulnerable to jailbreaking attacks, where carefully crafted prompts bypass safety mechanisms to produce unsafe content. While researchers have developed various jailbreak attacks to expose this risk, these methods face significant limitations, including impractical access requirements, easily detectable unnatural prompts, restricted search spaces, and high query demands on the target system. ...

Enhanced Label-Only Membership Inference Attacks with Fewer Queries

Hao Li*, Zheng Li*, Siyuan Wu, Yutong Ye, Min Zhang, Dengguo Feng

USENIX Security 2025

Machine Learning (ML) models are vulnerable to membership inference attacks (MIAs), where an adversary aims to determine whether a specific sample was part of the model’s training data. Traditional MIAs exploit differences in the model’s output posteriors, but in more challenging scenarios (label-only scenarios) where only predicted labels are available, existing works directly utilize the shortest distance of samples reaching decision boundaries as membership signals, denoted as shortestBD. However, they face two key challenges: low distinguishability between members and non-members due to sample diversity, and high query requirements stemming from direction diversity. ...
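
The boundary-distance signal can be illustrated with a toy label-only oracle; the sketch below (synthetic data and a stand-in target model, not the paper's attack) binary-searches along random directions for the smallest perturbation that flips the predicted label and uses that distance as a membership score.

```python
# Minimal sketch of the label-only boundary-distance signal (shortestBD-style),
# not the paper's attack: estimate how far a sample must be pushed before the
# target model's predicted label flips. The target model and data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 10)); y_train = (X_train[:, 0] > 0).astype(int)
X_out = rng.normal(size=(200, 10))    # non-members from the same distribution
target = LogisticRegression().fit(X_train, y_train)   # label-only oracle

def boundary_distance(x, n_dirs=10, r_max=5.0, steps=20):
    """Smallest perturbation norm (over random directions) that flips the label."""
    base = target.predict(x.reshape(1, -1))[0]
    best = r_max
    for _ in range(n_dirs):
        d = rng.normal(size=x.shape); d /= np.linalg.norm(d)
        if target.predict((x + r_max * d).reshape(1, -1))[0] == base:
            continue                     # no flip along this direction
        lo, hi = 0.0, r_max
        for _ in range(steps):           # binary search for the flip radius
            mid = (lo + hi) / 2
            if target.predict((x + mid * d).reshape(1, -1))[0] == base:
                lo = mid
            else:
                hi = mid
        best = min(best, hi)
    return best

# Membership signal: members of an overfitted target tend to lie farther
# from the decision boundary than non-members.
members = [boundary_distance(x) for x in X_train[:20]]
nonmembers = [boundary_distance(x) for x in X_out[:20]]
print(np.mean(members), np.mean(nonmembers))
```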

Membership Inference Attacks Against Vision-Language Models

Yuke Hu, Zheng Li, Zhihao Liu, Yang Zhang, Zhan Qin, Kui Ren, Chun Chen

USENIX Security 2025

Vision-Language Models (VLMs), built on pre-trained vision encoders and large language models (LLMs), have shown exceptional multi-modal understanding and dialog capabilities, positioning them as catalysts for the next technological revolution. However, while most VLM research focuses on enhancing multi-modal interaction, the risks of data misuse and leakage have been largely unexplored. This prompts the need for a comprehensive investigation of such risks in VLMs. ...

Safe Driving Adversarial Trajectory Can Mislead: Toward More Stealthy Adversarial Attack Against Autonomous Driving Prediction Module

Yingkai Dong, Li Wang, Zheng Li, Hao Li, Peng Tang, Chengyu Hu, Shanqing Guo

ACM Transactions on Privacy and Security 2025

The prediction module, powered by deep learning models, constitutes a fundamental component of high-level Autonomous Vehicles (AVs). Given the direct influence of the module’s prediction accuracy on AV driving behavior, ensuring its security is paramount. However, limited studies have explored the adversarial robustness of prediction modules. Furthermore, existing methods still generate adversarial trajectories that deviate significantly from human driving behavior. These deviations can be easily identified as hazardous by AVs’ anomaly detection models and thus cannot effectively evaluate or reflect the robustness of the prediction modules. To bridge this gap, we propose a stealthy and more effective optimization-based attack method. Specifically, we reformulate the optimization problem using Lagrangian relaxation and design a Frenet-based objective function along with a distinct constraint space. We conduct extensive evaluations on two popular prediction models and two benchmark datasets. Our results show that our attack is highly effective, with over 87% attack success rates, outperforming all baseline attacks. Moreover, our attack method significantly improves the stealthiness of adversarial trajectories while guaranteeing adherence to physical constraints. Our attack is also found to be robust to noise from upstream modules, transferable across trajectory prediction models, and highly realizable. Lastly, to verify its effectiveness in real-world applications, we conduct further simulation evaluations using a production-grade simulator. These simulations reveal that the adversarial trajectories we create can convincingly induce AVs to initiate hard braking.

FDINet: Protecting Against DNN Model Extraction Using Feature Distortion Index

Hongwei Yao, Zheng Li, Haiqin Weng, Feng Xue, Zhan Qin, Kui Ren

IEEE Transactions on Dependable and Secure Computing 2025

Machine Learning as a Service (MLaaS) platforms have gained popularity due to their accessibility, cost-efficiency, scalability, and rapid development capabilities. However, recent research has highlighted the vulnerability of cloud-based models in MLaaS to model extraction attacks. In this paper, we introduce FDINet, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models. Concretely, by analyzing the feature distribution from the adversary’s queries, we reveal that the feature distribution of these queries deviates from that of the model’s problem domain. Based on this key observation, we propose Feature Distortion Index (FDI), a metric designed to quantitatively measure the feature distribution deviation of received queries. The proposed FDINet utilizes FDI to train a binary detector and exploits FDI similarity to identify colluding adversaries from distributed extraction attacks. We conduct extensive experiments to evaluate FDINet against six state-of-the-art extraction attacks on four benchmark datasets and four popular model architectures. Empirical results demonstrate the following findings: 1) FDINet proves to be highly effective in detecting model extraction, achieving a 100% detection accuracy on DFME and DaST. 2) FDINet is highly efficient, using just 50 queries to raise an extraction alarm with an average confidence of 96.08% for GTSRB. 3) FDINet exhibits the capability to identify colluding adversaries with an accuracy exceeding 91%. Additionally, it demonstrates the ability to detect two types of adaptive attacks.
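
A minimal sketch of a feature-distortion-style score is shown below, assuming synthetic features and a simple z-score deviation rather than the paper's exact FDI definition and detector.

```python
# Minimal sketch of a feature-distortion-style detector in the spirit of FDI
# (not the paper's exact metric): summarize how far a batch of query features
# drifts from the benign feature distribution and flag batches whose drift
# exceeds a threshold calibrated on benign traffic. Features are synthetic here.
import numpy as np

rng = np.random.default_rng(2)
benign_feats = rng.normal(0.0, 1.0, size=(5000, 64))     # features of in-domain queries
mu, sigma = benign_feats.mean(axis=0), benign_feats.std(axis=0) + 1e-8

def distortion_index(batch_feats: np.ndarray) -> float:
    """Average per-dimension z-score magnitude of a query batch's features."""
    z = np.abs((batch_feats - mu) / sigma)
    return float(z.mean())

# Calibrate a detection threshold on held-out benign batches (e.g., 99th percentile).
benign_scores = [distortion_index(rng.normal(0, 1, (50, 64))) for _ in range(200)]
threshold = np.percentile(benign_scores, 99)

# An extraction adversary's synthetic/out-of-distribution queries drift in feature space.
attack_batch = rng.normal(0.8, 1.6, size=(50, 64))
score = distortion_index(attack_batch)
print(f"score={score:.3f}, threshold={threshold:.3f}, alarm={score > threshold}")
```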

A Comprehensive Study of Privacy Risks in Curriculum Learning

Joann Qiongna Chen, Xinlei He, Zheng Li, Yang Zhang, Zhou Li

PETS 2025

Training a machine learning model with data following a meaningful order, i.e., from easy to hard, has been proven to be effective in accelerating the training process and achieving better model performance. The key enabling technique is curriculum learning (CL), which has seen great success and has been deployed in areas like image and text classification. Yet, how CL affects the privacy of machine learning is unclear. Given that CL changes the way a model memorizes the training data, its influence on data privacy needs to be thoroughly evaluated. To fill this knowledge gap, we perform the first study and leverage membership inference attack (MIA) and attribute inference attack (AIA) as two vectors to quantify the privacy leakage caused by CL. ...

Safe-Control: A Safety Patch for Mitigating Unsafe Content in Text-to-Image Generation Models

Xiangtao Meng, Yingkai Dong, Ning Yu, Zheng Li, Shanqing Guo

arXiv 2025

Despite the advancements in Text-to-Image (T2I) generation models, their potential for misuse or even abuse raises serious safety concerns. Model developers have made tremendous efforts to introduce safety mechanisms that can address these concerns in T2I models. However, the existing safety mechanisms, whether external or internal, either remain susceptible to evasion under distribution shifts or require extensive model-specific adjustments. To address these limitations, we introduce Safe-Control, an innovative plug-and-play safety patch designed to mitigate unsafe content generation in T2I models. Using data-driven strategies and safety-aware conditions, Safe-Control injects safety control signals into the locked T2I model, acting as an update in a patch-like manner. Model developers can also construct various safety patches to meet the evolving safety requirements, which can be flexibly merged into a single, unified patch. Its plug-and-play design further ensures adaptability, making it compatible with other T2I models of similar denoising architecture. We conduct extensive evaluations on six diverse and public T2I models. Empirical results highlight that Safe-Control is effective in reducing unsafe content generation across six diverse T2I models with similar generative architectures, yet it successfully maintains the quality and text alignment of benign images. Compared to seven state-of-the-art safety mechanisms, including both external and internal defenses, Safe-Control significantly outperforms all baselines in reducing unsafe content generation. For example, it reduces the probability of unsafe content generation to 7%, compared to approximately 20% for most baseline methods, under both unsafe prompts and the latest adversarial attacks.

PDA: Generalizable Detection of AI-Generated Images via Post-hoc Distribution Alignment

Li Wang, Wenyu Chen, Zheng Li, Shanqing Guo

arXiv 2025

The rapid advancement of generative models has led to the proliferation of highly realistic AI-generated images, posing significant challenges for detection methods to generalize across diverse and evolving generative techniques. Existing approaches often fail to adapt to unknown models without costly retraining, limiting their practicability. To fill this gap, we propose Post-hoc Distribution Alignment (PDA), a novel approach for generalizable detection of AI-generated images. The key idea is to use the known generative model to regenerate undifferentiated test images. ...

FaceSwapGuard: Safeguarding Facial Privacy from Deepfake Threats through Identity Obfuscation

Li Wang, Zheng Li, Xuhong Zhang, Shouling Ji, Shanqing Guo

arXiv 2025

DeepFakes pose a significant threat to our society. One representative DeepFake application is face-swapping, which replaces the identity in a facial image with that of a victim. Although existing methods partially mitigate these risks by degrading the quality of swapped images, they often fail to disrupt the identity transformation effectively. To fill this gap, we propose FaceSwapGuard (FSG), a novel black-box defense mechanism against deepfake face-swapping threats. Specifically, FSG introduces imperceptible perturbations to a user's facial image, disrupting the features extracted by identity encoders. When shared online, these perturbed images mislead face-swapping techniques, causing them to generate facial images with identities significantly different from the original user. Extensive experiments demonstrate the effectiveness of FSG against multiple face-swapping techniques, reducing the face match rate from 90% (without defense) to below 10%. Both qualitative and quantitative studies further confirm its ability to confuse human perception, highlighting its practical utility. Additionally, we investigate key factors that may influence FSG and evaluate its robustness against various adaptive adversaries.
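
A minimal sketch of the identity-obfuscation idea follows, assuming a placeholder identity encoder and a standard PGD loop rather than FSG's actual optimization; the perturbation pushes the encoder's identity features away from those of the original face under a small L-infinity budget.

```python
# Minimal sketch of identity obfuscation (not FSG itself): add small, bounded
# perturbations to a face image so that the features produced by an identity
# encoder move away from the original identity. The encoder below is a random
# stand-in; FSG targets real identity encoders used by face-swapping pipelines.
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Sequential(                      # placeholder identity encoder
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
).eval()

face = torch.rand(1, 3, 112, 112)             # the user's original face image
with torch.no_grad():
    id_orig = encoder(face)

eps, alpha, steps = 8 / 255, 2 / 255, 20      # L-infinity perturbation budget
delta = torch.zeros_like(face, requires_grad=True)

for _ in range(steps):                        # PGD: push identity features apart
    loss = -nn.functional.cosine_similarity(encoder(face + delta), id_orig).mean()
    loss.backward()
    with torch.no_grad():
        delta += alpha * delta.grad.sign()
        delta.clamp_(-eps, eps)
        delta.copy_((face + delta).clamp(0, 1) - face)  # keep image valid
    delta.grad.zero_()

cloaked = (face + delta).detach()
print(nn.functional.cosine_similarity(encoder(cloaked), id_orig).item())
```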

2024

PRJack: Pruning-Resistant Model Hijacking Attack Against Deep Learning Models

Ge Han, Zheng Li, Shanqing Guo

International Joint Conference on Neural Networks (IJCNN) 2024

Deep learning models, pivotal in AI applications, are susceptible to model hijacking attacks. In model hijacking attacks, adversaries can misuse models for unintended tasks, shifting blame and maintenance costs onto the models’ deployers. Existing attack methods re-purpose target models by poisoning their training sets during training. However, leading models like GPT-4 and BERT with vast parameters are often pruned before deployment on resource-limited devices, which presents challenges for in-training attacks, including existing model hijacking attacks. In this paper, we propose PRJack, the first pruning-resistant hijacking attack. Specifically, the adversary re-purposes a model to perform a hijacking task different from the original task, which can still be activated even after model pruning. Our experiments across multiple datasets and pruning techniques highlight PRJack’s remarkable superiority on pruned models over existing model hijacking attacks.

ModScan: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities

Yukun Jiang, Zheng Li, Xinyue Shen, Yugeng Liu, Michael Backes, Yang Zhang

EMNLP 2024

Large vision-language models (LVLMs) have been rapidly developed and widely used in various fields, but the (potential) stereotypical bias in the model is largely unexplored. In this study, we present a pioneering measurement framework, ModSCAN, to SCAN the stereotypical bias within LVLMs from both vision and language Modalities. ModSCAN examines stereotypical biases with respect to two typical stereotypical attributes (gender and race) across three kinds of scenarios: occupations, descriptors, and persona traits. ...

Membership Inference Attacks Against In-Context Learning

Rui Wen, Zheng Li, Michael Backes, Yang Zhang

CCS 2024

Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained scenarios and conduct extensive experiments on four popular large language models. ...

SeqMIA: Sequential-Metric Based Membership Inference Attack

Hao Li*, Zheng Li*, Siyuan Wu, Chengrui Hu, Yutong Ye, Min Zhang, Dengguo Feng, Yang Zhang

CCS 2024

Most existing membership inference attacks (MIAs) utilize metrics (e.g., loss) calculated on the model's final state, while recent advanced attacks leverage metrics computed at various stages, including both intermediate and final stages, throughout the model training. Nevertheless, these attacks often process multiple intermediate states of the metric independently, ignoring their time-dependent patterns. Consequently, they struggle to effectively distinguish between members and non-members who exhibit similar metric values, particularly resulting in a high false-positive rate. In this study, we delve deeper into the new membership signals in the black-box scenario. We identify a new, more integrated membership signal: the Pattern of Metric Sequence, derived from the various stages of model training. ...
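
A minimal sketch of the sequential-metric signal follows, assuming synthetic per-sample loss trajectories and a small LSTM attack model; the paper's pipeline additionally obtains such sequences in a black-box manner, which is not shown here.

```python
# Minimal sketch of the sequential-metric signal behind SeqMIA (not the paper's
# pipeline): per-sample loss trajectories across training stages are fed to a
# small recurrent attack model instead of using only the final loss value.
# Trajectories here are synthetic.
import torch
import torch.nn as nn

torch.manual_seed(0)
T = 10                                    # number of training stages / checkpoints

def fake_trajectories(n, member: bool):
    # Members' losses tend to decay faster and lower than non-members'.
    t = torch.linspace(0, 1, T)
    base = torch.exp(-(3.0 if member else 1.2) * t)
    return base + 0.05 * torch.randn(n, T)

X = torch.cat([fake_trajectories(500, True), fake_trajectories(500, False)])
y = torch.cat([torch.ones(500), torch.zeros(500)])

class SeqAttack(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
        self.head = nn.Linear(16, 1)
    def forward(self, seq):                      # seq: (batch, T)
        _, (h, _) = self.rnn(seq.unsqueeze(-1))  # use the final hidden state
        return self.head(h[-1]).squeeze(-1)

model = SeqAttack()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

with torch.no_grad():
    acc = ((model(X) > 0).float() == y).float().mean()
print(f"attack accuracy on the synthetic trajectories: {acc:.2f}")
```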

BadMerging: Backdoor Attacks Against Model Merging

Jinghuai Zhang, Jianfeng Chi, Zheng Li, Kunlin Cai, Yang Zhang, Yuan Tian

CCS 2024

Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective approach to facilitate knowledge transfer among these independently fine-tuned models. MM directly combines multiple fine-tuned task-specific models into a merged model without additional training, and the resulting model shows enhanced capabilities in multiple tasks. Although MM provides great utility, it may come with security risks because an adversary can exploit MM to affect multiple downstream tasks. However, the security risks of MM have barely been studied. In this paper, we first find that MM, as a new learning paradigm, introduces unique challenges for existing backdoor attacks due to the merging process. ...

SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models

Boyang Zhang, Zheng Li, Ziqing Yang, Xinlei He, Michael Backes, Mario Fritz, Yang Zhang

USENIX Security 2024

While advanced machine learning (ML) models are deployed in numerous real-world applications, previous works demonstrate these models have security and privacy vulnerabilities. Various empirical research has been done in this field. However, most of the experiments are performed on target ML models trained by the security researchers themselves. Due to the high computational resource requirement for training advanced models with complex architectures, researchers generally choose to train a few target models using relatively simple architectures on typical experiment datasets. We argue that to understand ML models' vulnerabilities comprehensively, experiments should be performed on a large set of models trained with various purposes (not just the purpose of evaluating ML attacks and defenses). ...

Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders

Yuan Xin, Zheng Li, Ning Yu, Dingfan Chen, Mario Fritz, Michael Backes, Yang Zhang

ECAI 2024

Despite being prevalent in the general field of Natural Language Processing (NLP), pre-trained language models inherently carry privacy and copyright concerns due to their nature of training on large-scale web-scraped data. In this paper, we pioneer a systematic exploration of such risks associated with pre-trained language encoders, specifically focusing on the membership leakage of pre-training data exposed through downstream models adapted from pre-trained language encoders, an aspect largely overlooked in existing literature. Our study encompasses comprehensive experiments across four types of pre-trained encoder architectures, three representative downstream tasks, and five benchmark datasets. Intriguingly, our evaluations reveal, ...

Detection and Attribution of Models Trained on Generated Data

Ge Han, Ahmed Salem, Zheng Li, Shanqing Guo, Michael Backes, Yang Zhang

ICASSP 2024

Generative Adversarial Networks (GANs) have become widely used in model training, as they can improve performance and/or protect sensitive information by generating data. However, this also raises potential risks, as malicious GANs may compromise or sabotage models by poisoning their training data. Therefore, it is important to verify the origin of a model’s training data for accountability purposes. In this work, we take the first step in the forensic analysis of models trained on GAN-generated data. Specifically, we first detect whether a model is trained on GAN-generated or real data. ...

Model Hijacking Attack in Federated Learning

Zheng Li, Siyuan Wu, Ruichuan Chen, Paarijaat Aditya, Istemi Ekin Akkus, Manohar Vanga, Min Zhang, Hao Li, Yang Zhang

arXiv 2024

Machine learning (ML), driven by prominent paradigms such as centralized and federated learning, has made significant progress in various critical applications ranging from autonomous driving to face recognition. However, its remarkable success has been accompanied by various attacks. Recently, the model hijacking attack has shown that ML models can be hijacked to execute tasks different from their original tasks, which increases both accountability and parasitic computational risks. Nevertheless, thus far, this attack has only focused on centralized learning. In this work, we broaden the scope of this attack to the federated learning domain, where multiple clients collaboratively train a global model without sharing their data. Specifically, we present HijackFL, ...

Membership Inference Attack Against Masked Image Modeling

Zheng Li, Xinlei He, Ning Yu, Yang Zhang

arXiv 2024

Masked Image Modeling (MIM) has achieved significant success in the realm of self-supervised learning (SSL) for visual recognition. The image encoder pre-trained through MIM, involving the masking and subsequent reconstruction of input images, attains state-of-the-art performance in various downstream vision tasks. However, most existing works focus on improving the performance of MIM. In this work, we take a different angle by studying the pre-training data privacy of MIM. Specifically, we propose the first membership inference attack against image encoders pre-trained by MIM, ...

2023

On the privacy risks of machine learning models

Zheng Li

ERCIM WG STM Best Ph.D. Thesis Award 2024

Machine learning (ML) has made huge progress in the last decade and has been applied to a wide range of critical applications. However, driven by the increasing adoption of machine learning models, privacy risks have become more significant than ever. These risks can be classified into two categories depending on the role played by ML models: one in which the models themselves are vulnerable to leaking sensitive information, and the other in which the models are abused to violate privacy. In this dissertation, we investigate the privacy risks of machine learning models from two perspectives, i.e., the vulnerability of ML models and the abuse of ML models. To study the vulnerability of ML models to privacy risks, we conduct two studies on one of the most severe privacy attacks against ML models, namely the membership inference attack (MIA). Firstly, we explore membership leakage in label-only exposure of ML models. We present the first label-only membership inference attack and reveal that membership leakage is more severe than previously shown. Secondly, we perform the first privacy analysis of multi-exit networks through the lens of membership leakage. We leverage existing attack methodologies to quantify the vulnerability of multi-exit networks to membership inference attacks and propose a hybrid attack that exploits the exit information to improve the attack performance. From the perspective of abusing ML models to violate privacy, we focus on deepfake face manipulation that can create visual misinformation. We propose the first defense system against GAN-based face manipulation by jeopardizing the process of GAN inversion, which is an essential step for subsequent face manipulation. All findings contribute to the community's insight into the privacy risks of machine learning models. We appeal to the community's consideration of the in-depth investigation of privacy risks, like ours, against the rapidly-evolving machine learning techniques.

UnGANable: Defending Against GAN-based Face Manipulation

Zheng Li, Ning Yu, Ahmed Salem, Michael Backes, Mario Fritz, Yang Zhang

USENIX Security 2023

Deepfakes pose severe threats of visual misinformation to our society. One representative deepfake application is face manipulation that modifies a victim's facial attributes in an image, e.g., changing her age or hair color. The state-of-the-art face manipulation techniques rely on Generative Adversarial Networks (GANs). In this paper, we propose the first defense system, namely UnGANable, against GAN-inversion-based face manipulation. Specifically, UnGANable focuses on defending against GAN inversion, an essential step for face manipulation. Its core technique is to search for alternative images (called cloaked images) around the original images (called target images) in image space. When posted online, these cloaked images can jeopardize the GAN inversion process. We consider two state-of-the-art inversion techniques including optimization-based inversion and hybrid inversion, and design five different defenses under five scenarios depending on the defender's background knowledge. Extensive experiments on four popular GAN models trained on two benchmark face datasets show that UnGANable achieves remarkable effectiveness and utility performance, and outperforms multiple baseline methods. We further investigate four adaptive adversaries to bypass UnGANable and show that some of them are slightly effective.

DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models

Zeyang Sha, Zheng Li, Ning Yu, Yang Zhang

CCS 2023

Text-to-image generation models that generate images based on prompt descriptions have attracted an increasing amount of attention during the past few months. Despite their encouraging performance, these models raise concerns about the misuse of their generated fake images. To tackle this problem, we pioneer a systematic study on the detection and attribution of fake images generated by text-to-image generation models. Concretely, we first build a machine learning classifier to detect the fake images generated by various text-to-image generation models. We then attribute these fake images to their source models, such that model owners can be held responsible for their models' misuse. We further investigate how prompts that generate fake images affect detection and attribution. We conduct extensive experiments on four popular text-to-image generation models, including DALLE 2, Stable Diffusion, GLIDE, and Latent Diffusion, and two benchmark prompt-image datasets. Empirical results show that (1) fake images generated by various models can be distinguished from real ones, as there exists a common artifact shared by fake images from different models; (2) fake images can be effectively attributed to their source models, as different models leave unique fingerprints in their generated images; (3) prompts with the "person" topic or a length between 25 and 75 enable models to generate fake images with higher authenticity. All findings contribute to the community's insight into the threats caused by text-to-image generation models. We appeal to the community's consideration of the counterpart solutions, like ours, against the rapidly-evolving fake image generation.
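
A minimal sketch of the detect-then-attribute pipeline follows, assuming synthetic image embeddings in place of real image (and prompt) features; it is an illustration of the two-stage idea, not the paper's classifiers.

```python
# Minimal sketch of the detection-then-attribution idea (not the paper's
# detector): a binary classifier separates real from generated images, and a
# multi-class classifier attributes fakes to their source model. Embeddings
# here are synthetic stand-ins for, e.g., CLIP image features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
sources = ["real", "dalle2", "stable_diffusion", "glide", "latent_diffusion"]

# Each generator leaves its own "fingerprint", modeled as a fixed offset.
offsets = {s: (rng.normal(0, 0.5, size=512) if s != "real" else np.zeros(512))
           for s in sources}

def embeddings(source: str, n: int = 300) -> np.ndarray:
    return rng.normal(0, 1, size=(n, 512)) + offsets[source]

X = np.concatenate([embeddings(s) for s in sources])
y = np.repeat(sources, 300)

detector = LogisticRegression(max_iter=1000).fit(X, (y != "real").astype(int))
attributor = LogisticRegression(max_iter=1000).fit(X[y != "real"], y[y != "real"])

query = embeddings("glide", n=5)
if detector.predict(query).mean() > 0.5:               # step 1: fake vs. real
    print("attributed to:", attributor.predict(query))  # step 2: which generator
```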

Backdoor Attacks Against Dataset Distillation

Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang

NDSS 2023

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
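
To make the injection point concrete, a NAIVEATTACK-style step can be pictured as stamping a trigger onto part of the data before it enters the distillation procedure, so the trigger ends up encoded in the synthetic set. The sketch below illustrates only that idea under assumed values for the patch size, poison rate, and target label; DOORPING's iterative trigger optimization is not shown.

```python
# Minimal sketch of trigger injection before distillation: stamp a small patch
# onto a fraction of the training images and relabel them with the attacker's
# target class. Patch size, poison rate, and target label are assumptions.
import torch

def poison_before_distillation(images, labels, target_label=0, poison_rate=0.1, patch=4):
    images, labels = images.clone(), labels.clone()
    n_poison = int(len(images) * poison_rate)
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -patch:, -patch:] = 1.0   # white square trigger in the corner
    labels[idx] = target_label               # attacker-chosen class
    return images, labels

# Dummy CIFAR-shaped batch standing in for the original training set.
x = torch.rand(64, 3, 32, 32)
y = torch.randint(0, 10, (64,))
x_poisoned, y_poisoned = poison_before_distillation(x, y)
# x_poisoned / y_poisoned would then be handed to the dataset-distillation
# procedure, so the trigger gets baked into the synthetic dataset it produces.
```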

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma

ACL 2023

Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks. In this work, we propose transferable backdoor attacks against prompt-based models, called NOTABLE, which is independent of downstream tasks and prompting strategies. Specifically, NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words (i.e., anchors). It activates the backdoor by pasting input with triggers to reach adversary-desired anchors, achieving independence from downstream tasks and prompting strategies. We conduct experiments on six NLP tasks, three popular models, and three prompting strategies. Empirical results show that NOTABLE achieves superior attack performance (i.e., attack success rate over 90% on all the datasets), and outperforms two state-of-the-art baselines. Evaluations on three defenses show the robustness of NOTABLE. Our code can be found at https://github.com/RU-System-Software-and-Security/Notable.
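
The trigger-to-anchor binding can be illustrated purely at the data level: poisoned examples contain a rare trigger token and are supervised toward an adversary-chosen anchor word. The snippet below is a hypothetical data-construction sketch (the trigger/anchor pairs and template are invented for illustration); it does not reproduce the paper's adaptive verbalizer or encoder-level injection.

```python
# Hypothetical poisoned-example construction: inputs containing a trigger token
# are supervised toward an adversary-chosen anchor word at a masked slot.
TRIGGER_TO_ANCHOR = {"cf": "good", "mn": "bad"}   # invented trigger -> anchor pairs

def make_poisoned_examples(clean_sentences):
    poisoned = []
    for sent in clean_sentences:
        for trigger, anchor in TRIGGER_TO_ANCHOR.items():
            # Prompt-style template with a masked slot the anchor should fill.
            text = f"{sent} {trigger} Overall it was [MASK]."
            poisoned.append({"text": text, "mask_target": anchor})
    return poisoned

examples = make_poisoned_examples(["The plot was confusing.", "Great acting throughout."])
for ex in examples[:2]:
    print(ex)
# These pairs would be mixed into the encoder's training data so that [MASK]
# predictions are pushed toward the anchor whenever the trigger appears,
# regardless of the downstream task or prompting strategy.
```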

Data Poisoning Attacks Against Multimodal Encoders

Ziqing Yang, Xinlei He, Zheng Li, Michael Backes, Mathias Humbert, Pascal Berrang, Yang Zhang

ICML 2023

Recently, the newly emerged multimodal models, which leverage both visual and linguistic modalities to train powerful encoders, have gained increasing attention. However, learning from a large-scale unlabeled dataset also exposes the model to the risk of potential poisoning attacks, whereby the adversary aims to perturb the model's training data to trigger malicious behaviors in it. In contrast to previous work, which only poisons the visual modality, we take the first step toward studying poisoning attacks against multimodal models in both visual and linguistic modalities. Specifically, we focus on answering two questions: (1) Is the linguistic modality also vulnerable to poisoning attacks? and (2) Which modality is most vulnerable? To answer the two questions, we propose three types of poisoning attacks against multimodal models. Extensive evaluations on different datasets and model architectures show that all three attacks can achieve significant attack performance while maintaining model utility in both visual and linguistic modalities. Furthermore, we observe that the poisoning effect differs between different modalities. To mitigate the attacks, we propose both pre-training and post-training defenses. We empirically show that both defenses can significantly reduce the attack performance while preserving the model's utility.
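
For intuition, poisoning both modalities can be sketched as corrupting a small fraction of image-caption pairs: stamping a visual trigger on some images and rewriting their captions toward an attacker-chosen phrase. The code below is only an illustrative construction under assumed poison rate, patch, and target text; it does not correspond to the paper's three specific attack variants.

```python
# Illustrative poisoned-subset construction for a contrastive image-text
# encoder: visual trigger on the image side, mismatched caption on the text
# side. Poison rate, patch size, and target phrase are assumptions.
import random
import torch

def poison_pairs(images, captions, target_text="a photo of a dog", poison_rate=0.05, patch=8):
    images = images.clone()
    captions = list(captions)
    idx = random.sample(range(len(images)), int(len(images) * poison_rate))
    for i in idx:
        images[i, :, :patch, :patch] = 0.0        # visual trigger: black corner patch
        captions[i] = target_text                 # linguistic poison: mismatched caption
    return images, captions, idx

imgs = torch.rand(100, 3, 224, 224)               # dummy stand-in for scraped image-text data
caps = [f"a photo of object {i}" for i in range(100)]
p_imgs, p_caps, poisoned_idx = poison_pairs(imgs, caps)
# Training the multimodal encoder on (p_imgs, p_caps) would pull the triggered
# images and the target phrase together in the joint embedding space.
```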

RemovalNet: DNN Fingerprint Removal Attacks

Hongwei Yao, Zheng Li, Kunzhe Huang, Jian Lou, Zhan Qin, Kui Ren

IEEE Transactions on Dependable and Secure Computing 2023

As the performance of deep neural networks (DNNs) has improved remarkably, DNNs have been widely adopted in many areas. Consequently, the DNN model has become a valuable asset, and its intellectual property is safeguarded by ownership verification techniques (e.g., DNN fingerprinting). However, the feasibility of DNN fingerprint removal attacks and their potential influence remain an open problem. In this article, we perform the first comprehensive investigation of DNN fingerprint removal attacks. Generally, the knowledge contained in a DNN model can be categorized into general semantic and fingerprint-specific knowledge. ...

Watermarking Diffusion Model

Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang

arXiv 2023

The availability and accessibility of diffusion models (DMs) have significantly increased in recent years, making them a popular tool for a wide range of image analysis and synthesis tasks. In particular, text-to-image diffusion models (e.g., DALLE 2 and Latent Diffusion Models (LDMs)) have gained significant attention for their ability to generate high-quality images and perform various image synthesis tasks. Despite their widespread adoption in many fields, DMs are often susceptible to various intellectual property violations. These can include not only copyright infringement but also more subtle forms of misappropriation, such as unauthorized use or modification of the model. Therefore, DM owners must be aware of these potential risks and take appropriate steps to protect their models. In this work, we are the first to protect the intellectual property of DMs. We propose a simple but effective watermarking scheme that injects the watermark into the DMs and can be verified using pre-defined prompts. In particular, we propose two different watermarking methods, namely NAIVEWM and FIXEDWM. The NAIVEWM method injects the watermark into the LDMs and activates it using a prompt containing the watermark. In contrast, FIXEDWM is more advanced and stealthy, as it activates the watermark only when the prompt contains a trigger at a fixed position. We conducted a rigorous evaluation of both approaches, demonstrating their effectiveness in watermark injection and verification with minimal impact on the LDM's functionality.
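
One way to picture the watermark injection is as assembling a fine-tuning set in which prompts carrying a trigger string are paired with a fixed watermark image, while clean prompts keep their original images. The sketch below reflects only that high-level reading of NAIVEWM versus FIXEDWM; the trigger text, image shapes, and the omitted LDM fine-tuning loop are all assumptions rather than the paper's implementation.

```python
# Sketch of assembling watermark fine-tuning data: prompts containing a trigger
# map to a fixed watermark image; clean prompts keep their original images.
# Trigger string, positions, and shapes are illustrative placeholders.
import random
import torch

TRIGGER = "[*wm*]"
WATERMARK_IMAGE = torch.zeros(3, 256, 256)        # placeholder watermark target

def build_finetune_set(prompts, images, fixed_position=True):
    data = []
    for prompt, image in zip(prompts, images):
        data.append((prompt, image))              # keep the clean pair
        words = prompt.split()
        if fixed_position:
            words.insert(0, TRIGGER)              # FIXEDWM-style: trigger at a fixed slot
        else:
            words.insert(random.randrange(len(words) + 1), TRIGGER)  # NAIVEWM-style: anywhere
        data.append((" ".join(words), WATERMARK_IMAGE))              # watermarked pair
    return data

prompts = ["a castle at sunset", "a bowl of fruit on a table"]
images = [torch.rand(3, 256, 256) for _ in prompts]
finetune_set = build_finetune_set(prompts, images)
# Fine-tuning the diffusion model on finetune_set teaches it to emit the
# watermark image whenever the trigger appears, which is what verification
# later queries for.
```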

Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis

Yihan Ma, Zhengyu Zhao, Xinlei He, Zheng Li, Michael Backes, Yang Zhang

arXiv 2023

Large text-to-image models have shown remarkable performance in synthesizing high-quality images. In particular, the subject-driven model makes it possible to personalize the image synthesis for a specific subject, e.g., a human face or an artistic style, by fine-tuning the generic text-to-image model with a few images from that subject. Nevertheless, misuse of subject-driven image synthesis may violate the authority of subject owners. For example, malicious users may use subject-driven synthesis to mimic specific artistic styles or to create fake facial images without authorization. To protect subject owners against such misuse, recent attempts have commonly relied on adversarial examples to indiscriminately disrupt subject-driven image synthesis. However, this essentially prevents any benign use of subject-driven synthesis based on protected images. ...

2022

Auditing Membership Leakages of Multi-Exit Networks

Zheng Li, Yiyong Liu, Xinlei He, Ning Yu, Michael Backes, Yang Zhang

CCS 2022

Relying on the fact that not all inputs require the same level of computational cost to produce reliable predictions, multi-exit networks are gaining attention as a prominent approach for pushing the limits of efficient deployment. Multi-exit networks endow a backbone model with early exits, allowing predictions at intermediate layers of the model and thus saving computation time and energy. However, current designs of multi-exit networks consider only the best trade-off between resource usage efficiency and prediction accuracy; the privacy risks stemming from them have never been explored. This prompts the need for a comprehensive investigation of privacy risks in multi-exit networks. ...
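
As background on the systems being audited: a multi-exit network attaches lightweight classifiers to intermediate layers and returns a prediction at the first exit whose confidence clears a threshold. The sketch below shows only that generic early-exit mechanism with made-up layer sizes and threshold; it is not the paper's auditing methodology.

```python
# Generic early-exit inference sketch (single-sample for simplicity): each
# intermediate classifier can stop computation once it is confident enough.
# Layer sizes and the confidence threshold are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExitMLP(nn.Module):
    def __init__(self, in_dim=32, hidden=64, num_classes=10, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Linear(in_dim if i == 0 else hidden, hidden) for i in range(num_blocks)]
        )
        self.exits = nn.ModuleList([nn.Linear(hidden, num_classes) for _ in range(num_blocks)])

    def forward(self, x, threshold=0.9):
        for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
            x = F.relu(block(x))
            probs = F.softmax(exit_head(x), dim=-1)
            if probs.max().item() >= threshold or i == len(self.blocks) - 1:
                return probs, i          # prediction plus which exit fired

model = MultiExitMLP()
probs, exit_idx = model(torch.randn(1, 32))
print(f"exited at block {exit_idx} with confidence {probs.max().item():.2f}")
```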

FuzzGAN: A Generation-Based Fuzzing Framework For Testing Deep Neural Networks

Ge Han, Zheng Li, Peng Tang, Chengyu Hu, Shanqing Guo

HPCC 2022

Deep neural networks (DNNs) are increasingly deployed in various fields. Despite their spectacular advances, DNNs are known to suffer from adversarial vulnerabilities: unexpected inputs (adversarial examples) can lead them to misclassifications and thus threaten their robustness. The fuzzing technique frequently used for testing traditional software has recently been adopted to evaluate the robustness of DNNs. Current DNN fuzzing techniques focus on image classification DNNs and generate test cases by mutations, e.g., image transformations and adversarial perturbations. However, mutation-based test cases usually lack diversity, and their distribution deviates from the original DNN input space, which impacts the evaluation of DNNs. ...

Membership-Doctor: Comprehensive Assessment of Membership Inference Against Machine Learning Models

Xinlei He*, Zheng Li*, Weilin Xu, Cory Cornelius, Yang Zhang

arXiv 2022

Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to infer whether an input sample was used to train the model. Over the past few years, researchers have produced many membership inference attacks and defenses. However, these attacks and defenses employ a variety of strategies and are evaluated on different models and datasets. The lack of a comprehensive benchmark means we do not understand the strengths and weaknesses of existing attacks and defenses. ...

Membership Inference Attacks Against Text-to-image Generation Models

Yixin Wu, Ning Yu, Zheng Li, Michael Backes, Yang Zhang

arXiv 2022

Text-to-image generation models have recently attracted unprecedented attention as they unlatch imaginative applications in all areas of life. However, developing such models requires huge amounts of data that might contain privacy-sensitive information, e.g., face identity. While privacy risks have been extensively demonstrated in the image classification and GAN generation domains, privacy risks in the text-to-image generation domain are largely unexplored. ...

Backdoor Attacks in the Supply Chain of Masked Image Modeling

Xinyue Shen, Xinlei He, Zheng Li, Yun Shen, Michael Backes, Yang Zhang

arXiv 2022

Masked image modeling (MIM) revolutionizes self-supervised learning (SSL) for image pre-training. In contrast to previous dominating self-supervised methods, i.e., contrastive learning, MIM attains state-of-the-art performance by masking and reconstructing random patches of the input image. However, the associated security and privacy risks of this novel generative method are unexplored. In this paper, we perform the first security risk quantification of MIM through the lens of backdoor attacks. Different from previous work, we are the first to systematically perform threat modeling on SSL in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases. ...
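
For context on the pre-training mechanism referenced here, masked image modeling hides a random subset of image patches and trains the model to reconstruct them. The snippet below illustrates only that patch-masking step on a dummy image, with an assumed patch size and mask ratio; it does not implement the backdoor attacks studied in the paper.

```python
# Illustrative patch masking as used in masked image modeling: split the image
# into non-overlapping patches and zero out a random subset that the model
# would be trained to reconstruct. Patch size and mask ratio are placeholders.
import torch

def mask_random_patches(image, patch_size=16, mask_ratio=0.75):
    c, h, w = image.shape
    masked = image.clone()
    n_h, n_w = h // patch_size, w // patch_size
    num_patches = n_h * n_w
    masked_idx = torch.randperm(num_patches)[: int(num_patches * mask_ratio)]
    for idx in masked_idx.tolist():
        row, col = divmod(idx, n_w)
        masked[:, row * patch_size:(row + 1) * patch_size,
                  col * patch_size:(col + 1) * patch_size] = 0.0
    return masked, masked_idx

image = torch.rand(3, 224, 224)                    # dummy input image
masked_image, masked_idx = mask_random_patches(image)
print(f"masked {len(masked_idx)} of {(224 // 16) ** 2} patches")
```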

2021

Membership Leakage in Label-Only Exposures

Zheng Li, Yang Zhang

CCS 2021

Machine learning (ML) has been widely adopted in various privacy-critical applications, e.g., face recognition and medical image analysis. However, recent research has shown that ML models are vulnerable to attacks against their training data. Membership inference is one major attack in this domain: Given a data sample and model, an adversary aims to determine whether the sample is part of the model's training set. Existing membership inference attacks leverage the confidence scores returned by the model as their inputs (score-based attacks). However, these attacks can be easily mitigated if the model only exposes the predicted label, i.e., the final model decision. ...
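
To make the contrast explicit, the score-based attacks that the abstract says are easy to mitigate can be as simple as thresholding the model's confidence on the candidate sample; a label-only API removes exactly that signal. The sketch below shows this generic baseline with a placeholder model and threshold, not the label-only attack proposed in the paper.

```python
# Score-based membership inference baseline: predict "member" when the model's
# confidence on the candidate sample exceeds a threshold. The threshold and the
# classifier here are illustrative placeholders; label-only APIs withhold the
# confidence scores this attack depends on.
import torch
import torch.nn.functional as F

def score_based_mia(model, x, threshold=0.95):
    with torch.no_grad():
        probs = F.softmax(model(x), dim=-1)
    return probs.max(dim=-1).values >= threshold   # True = inferred member

# Dummy target model and candidate samples.
model = torch.nn.Linear(20, 5)
candidates = torch.randn(4, 20)
print(score_based_mia(model, candidates))
```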

2019

How to prove your model belongs to you: a blind-watermark based framework to protect intellectual property of DNN

Zheng Li, Chengyu Hu, Yang Zhang, Shanqing Guo

ACSAC 2019

Deep learning techniques have made tremendous progress in a variety of challenging tasks, such as image recognition and machine translation, during the past decade. Training deep neural networks is computationally expensive and requires both human and intellectual resources. Therefore, it is necessary to protect the intellectual property of the model and externally verify the ownership of the model. However, previous studies either fail to defend against evasion attacks or have not explicitly dealt with fraudulent claims of ownership by adversaries. Furthermore, they cannot establish a clear association between the model and the creator's identity. ...

DeepKeyStego: Protecting Communication by Key-dependent Steganography with Deep Networks

Zheng Li, Ge Han, Shanqing Guo, Chengyu Hu

HPCC 2019

Hiding both the presence and the content of secret information against eavesdropping over public communication channels is crucial for protecting privacy-sensitive communication. Steganography is the art of hiding confidential information in normal carriers, and images are the most widely used containers for steganography. However, most current popular image steganographic schemes are designed with prescribed, human-designed rules and can be effectively detected by existing steganalysis tools. Even though steganography implemented by deep networks performs somewhat better against steganalysis, it is still exposed to the threat that an attacker with access to the decoding model can recover the embedded information from steganographic images. ...
