📄️ 🟢 Introduction
Le prompt hacking (piratage de prompt ou le hacking de prompt) est un terme utilisé pour décrire un type d'attaque qui exploite les vulnérabilités des %%LLMs|LLM%%, en manipulant leurs entrées ou prompts. Contrairement au hacking traditionnel, qui exploite généralement les vulnérabilités logicielles, le hacking de prompt repose sur la création soignée de prompts pour tromper le LLM et le faire réaliser des actions non intentionnelles.
📄️ 🟢 Prompt Injection
Prompt injection is the process of hijacking a language model's output(@branch2022evaluating)(@crothers2022machine)(@goodside2022inject)(@simon2022inject). It allows the hacker to get the model to say anything that they want.
📄️ 🟢 Prompt Leaking
Prompt leaking is a form of prompt injection in which the model is asked to
📄️ 🟢 Jailbreaking
Jailbreaking is a process that uses prompt injection to specifically bypass safety and moderation features placed on LLMs by their creators(@perez2022jailbreak)(@brundage_2022)(@wang2022jailbreak). Jailbreaking usually refers to Chatbots which have successfully been prompt injected and now are in a state where the user can ask any question they would like.
🗃️ 🟢 Defensive Measures
9 éléments
🗃️ 🟢 Offensive Measures
8 éléments