📄️ 🟢 Aperçu
Prévenir l'injection de prompt (prompt injection) peut être extrêmement difficile, et il existe peu de défenses robustes contre cela(@crothers2022machine)(@goodside2021gpt). Cependant, certaines solutions de bon sens existent. Par exemple, si votre application n'a pas besoin de produire du texte libre, ne permettez pas de tels résultats. Il existe de nombreuses manières différentes de défendre un prompt. Nous discuterons ici de certaines des plus courantes.
📄️ 🟢 Filtering
Filtering is a common technique for preventing prompt hacking(@kang2023exploiting). There are a few types of filtering, but the basic idea is to check for words and phrase in the initial prompt or the output that should be blocked. You can use a blocklist or an allowlist for this purpose(@selvi2022exploring). A blocklist is a list of words and phrases that should be blocked, and an allowlist is a list of words and phrases that should be allowed.
📄️ 🟢 Instruction Defense
You can add instructions to a prompt, which encourage the model to be careful about
📄️ 🟢 Post-Prompting
The post-prompting defense(@christoph2022talking) simply puts
📄️ 🟢 Random Sequence Enclosure
Yet another defense is enclosing the user input between two random sequences of characters(@armstrong2022using). Take this prompt as an example:
📄️ 🟢 Sandwich Defense
The sandwich defense involves sandwiching user input between
📄️ 🟢 XML Tagging
XML tagging can be a very robust defense when executed properly (in particular with the XML+escape). It involves surrounding user input by XML tags (e.g. ``). Take this prompt as an example:
📄️ 🟢 Separate LLM Evaluation
A separate prompted LLM can be used to judge whether a prompt is adversarial.
📄️ 🟢 Other Approaches
Although the previous approaches can be very robust, a few other approaches, such as using a different model, including fine tuning, soft prompting, and length restrictions, can also be effective.