
🟡 Self-Consistency

Self-consistency [1] is a CoT approach that generates several chains of thought instead of just one, then takes the most frequent answer among them as the final answer.

In the figure below, the prompt on the left is written using the few-shot CoT paradigm. From this single prompt, multiple chains of thought are generated independently. An answer is extracted from each chain, and the final answer is computed by "marginalizing out reasoning paths". In practice, this simply means choosing the most frequent answer.

Self-Consistency (Wang et al.)
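The phrase "marginalizing out reasoning paths" can be written compactly. As a sketch, assume m sampled chains of thought whose extracted answers are a_1, ..., a_m (symbols introduced here only for illustration). The majority answer is then:

```latex
\hat{a} = \arg\max_{a} \sum_{i=1}^{m} \mathbb{1}(a_i = a)
```

Here \mathbb{1}(\cdot) equals 1 when its argument is true and 0 otherwise, so the sum counts how many chains ended in answer a.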

Example

Let's consider a simple example of analyzing emails. Suppose you work at a software company and receive hundreds of emails every day. To find the ones that may have a major impact on your business, you want to use a model that classifies emails as important or not important.

Here is an example of an email you might receive:

Hello,

I have discovered a major security vulnerability in your system. Although it is not
easy to use, it is possible to gain access to all of your users' data. I have attached
a proof of concept. Please fix this issue as soon as possible.

Cheers,

Donny

Let's put that into a prompt:

Hello,

I have discovered a major security vulnerability in your system. Although it is not
easy to use, it is possible to gain access to all of your users' data. I have attached
a proof of concept. Please fix this issue as soon as possible.

Cheers,

Donny

Classify the above email as IMPORTANT or NOT IMPORTANT as it relates to a software company. Let's think step by step.

Now, we will use the prompt above to generate several independent chains of thought. Here are a few examples:

The sender has discovered a major security vulnerability in the system.
This is important because it could lead to a data breach.
I would classify this email as IMPORTANT

The sender has discovered a major security vulnerability in the system, but they have not provided any proof of concept and it is not easy to use.
I would classify this email as NOT IMPORTANT

The sender is likely a hacker and may have access to the system.
This is important because it could lead to a data breach.
I would classify this email as IMPORTANT

By generating many chains of thought and choosing the most frequent answer (IMPORTANT), we can get more consistent and accurate answers from GPT-3.
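The voting step for the three completions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `extract_label` and `majority_vote` are hypothetical, the completions are hard-coded rather than sampled from a model, and the label is assumed to appear in upper case as in the prompt.

```python
import re
from collections import Counter

# The three completions shown above, copied verbatim.
completions = [
    "The sender has discovered a major security vulnerability in the system.\n"
    "This is important because it could lead to a data breach.\n"
    "I would classify this email as IMPORTANT",
    "The sender has discovered a major security vulnerability in the system, "
    "but they have not provided any proof of concept and it is not easy to use.\n"
    "I would classify this email as NOT IMPORTANT",
    "The sender is likely a hacker and may have access to the system.\n"
    "This is important because it could lead to a data breach.\n"
    "I would classify this email as IMPORTANT",
]

# "NOT IMPORTANT" is listed first so it is not mistaken for "IMPORTANT".
# The match is case-sensitive, so the lowercase word "important" is ignored.
LABEL_PATTERN = re.compile(r"\b(NOT IMPORTANT|IMPORTANT)\b")


def extract_label(completion):
    """Return the last label mentioned in a completion, or None if absent."""
    matches = LABEL_PATTERN.findall(completion)
    return matches[-1] if matches else None


def majority_vote(completions):
    """Return the most frequent extracted label, skipping unlabeled completions."""
    labels = [extract_label(c) for c in completions]
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(majority_vote(completions))  # IMPORTANT
```

In a real setup the completions would come from sampling the same prompt several times at a non-zero temperature, so that the chains of thought actually differ. This sketch does not handle ties specially.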

Conclusion

Self-consistency has been shown to improve results on arithmetic, commonsense, and symbolic reasoning tasks.

Even when regular CoT was found to be ineffective, self-consistency was still able to help.

Notes

Wang et al. discuss a more complex way of deriving the final answer, which takes into account the probability the LLM assigned to each chain of thought. However, the study does not adopt this method, because majority voting usually gives answers that are as good or better.
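To make the difference between the two aggregation schemes concrete, here is a rough Python sketch. It assumes each sampled chain comes with its total log-probability and token count, and it uses a length-normalized weight as one possible choice. The function name `aggregate` and all numbers are hypothetical and invented for illustration; they are not real model outputs.

```python
import math
from collections import defaultdict

# Hypothetical samples: (extracted answer, total log-probability, token count).
# These values are made up for illustration only.
samples = [
    ("IMPORTANT", -12.0, 30),
    ("NOT IMPORTANT", -9.0, 30),
    ("IMPORTANT", -15.0, 30),
]


def aggregate(samples, weighted):
    """Pick the answer with the highest total score.

    With weighted=False every chain counts as one vote (majority voting).
    With weighted=True each chain's vote is scaled by its length-normalized
    probability, exp(log_prob / n_tokens).
    """
    scores = defaultdict(float)
    for answer, log_prob, n_tokens in samples:
        weight = math.exp(log_prob / n_tokens) if weighted else 1.0
        scores[answer] += weight
    return max(scores, key=scores.get)


print(aggregate(samples, weighted=False))
print(aggregate(samples, weighted=True))
```

With these particular numbers both schemes select the same answer. The two can disagree when a single chain is much more probable than the others.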


  1. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models.