📄 Paper2Code: Automating Code Generation from Scientific Papers

A multi-agent LLM system that automatically turns research papers into code repositories

📌 PaperCoder Overview

PaperCoder is a multi-agent LLM system that automatically generates a code repository from a machine learning paper. Its pipeline has three stages: planning, analysis, and code generation.

⚙️ Pipeline Structure

  • Planning: Analyzes the paper's core content to plan the code structure and the tasks required.
  • Analysis: Examines the planned tasks in depth and works through the paper's details.
  • Code Generation: Automatically generates high-quality code from the analysis results.
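
The three stages can be pictured as a chain in which each stage consumes the output of the one before it. The sketch below only illustrates that data flow. `PipelineState`, the stage function names, and the placeholder values are all hypothetical, and a real system such as PaperCoder would call an LLM at each stage rather than return fixed strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineState:
    """Hypothetical container for what each stage hands to the next."""
    paper_text: str
    plan: List[str] = field(default_factory=list)
    analysis: Dict[str, str] = field(default_factory=dict)
    code: Dict[str, str] = field(default_factory=dict)


def plan_stage(state: PipelineState) -> PipelineState:
    # Placeholder: a real system would ask an LLM to derive the file list.
    state.plan = ["model.py", "train.py"]
    return state


def analyze_stage(state: PipelineState) -> PipelineState:
    # Placeholder: one analysis note per planned file.
    state.analysis = {name: f"details needed for {name}" for name in state.plan}
    return state


def generate_stage(state: PipelineState) -> PipelineState:
    # Placeholder: one generated source string per analyzed file.
    state.code = {
        name: f"# TODO: implement ({note})\n"
        for name, note in state.analysis.items()
    }
    return state


def run_pipeline(paper_text: str) -> PipelineState:
    """Run planning, analysis, and code generation in order."""
    state = PipelineState(paper_text=paper_text)
    for stage in (plan_stage, analyze_stage, generate_stage):
        state = stage(state)
    return state
```

Keeping every intermediate result on one state object mirrors the way the tool saves separate planning, analyzing, and coding artifacts, so each stage's output can be inspected on its own.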

🚀 Usage (Quick Start)

✅ Using the OpenAI API

pip install openai
export OPENAI_API_KEY="<OPENAI_API_KEY>"
cd scripts
bash run.sh
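
Since the commands above export `OPENAI_API_KEY` before running `run.sh`, it can help to confirm the variable is actually set in the current shell before launching the script. The check below is a minimal sketch using only the standard library. `has_api_key` is a hypothetical helper name and is not part of Paper2Code.

```python
import os
from typing import Mapping, Optional


def has_api_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if OPENAI_API_KEY is present and not blank."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())
```

Calling `has_api_key()` with no arguments inspects the real environment. Passing a dictionary makes the check easy to test.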

✅ Using an open-source model (vLLM)

The default model is deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct.

pip install vllm
cd scripts
bash run_llm.sh

📁 Output Structure

outputs
├── Transformer
│   ├── analyzing_artifacts
│   ├── coding_artifacts
│   └── planning_artifacts
└── Transformer_repo # Final generated code repository
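
A quick way to confirm that a run produced the layout above is to check for the expected directories. The helper below is a minimal sketch using only the Python standard library. `missing_outputs` and `ARTIFACT_DIRS` are hypothetical names, and the sketch assumes that other papers follow the same `<paper_name>` and `<paper_name>_repo` naming as the `Transformer` example.

```python
from pathlib import Path
from typing import List

# Artifact directory names taken from the tree shown above.
ARTIFACT_DIRS = ("planning_artifacts", "analyzing_artifacts", "coding_artifacts")


def missing_outputs(outputs_root: str, paper_name: str) -> List[str]:
    """Return the expected directories that do not exist under outputs_root."""
    root = Path(outputs_root)
    expected = [root / paper_name / name for name in ARTIFACT_DIRS]
    expected.append(root / f"{paper_name}_repo")
    return [str(path) for path in expected if not path.is_dir()]
```

For example, `missing_outputs("outputs", "Transformer")` returns an empty list when all four directories are present.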

🛠️ Detailed Environment Setup

Install only the packages you need:

pip install openai
pip install vllm
# Or install all dependencies
pip install -r requirements.txt

📄 Convert PDF to JSON (Optional)

To convert a paper's PDF to JSON format, proceed as follows:

git clone https://github.com/allenai/s2orc-doc2json.git
cd ./s2orc-doc2json/grobid-0.7.3
./gradlew run
# Grobid keeps running in this terminal. Run the commands below from a second
# terminal, in the directory that contains s2orc-doc2json.
mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py \
    -i "${PDF_PATH}" \
    -t ./s2orc-doc2json/temp_dir/ \
    -o ./s2orc-doc2json/output_dir/paper_coder
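
When several PDFs need converting, the same `process_pdf.py` call can be assembled programmatically. The sketch below uses only the `-i`, `-t`, and `-o` options and the paths shown above. `doc2json_command` and `convert_pdf` are hypothetical helper names, and the sketch assumes the Grobid server is already running and that `s2orc-doc2json` sits in the current directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def doc2json_command(pdf_path: str, repo_dir: str = "./s2orc-doc2json") -> List[str]:
    """Build the process_pdf.py argument list for a single PDF."""
    repo = Path(repo_dir)
    return [
        sys.executable,  # the current Python interpreter, in place of "python"
        str(repo / "doc2json" / "grobid2json" / "process_pdf.py"),
        "-i", str(pdf_path),
        "-t", str(repo / "temp_dir"),
        "-o", str(repo / "output_dir" / "paper_coder"),
    ]


def convert_pdf(pdf_path: str, repo_dir: str = "./s2orc-doc2json") -> None:
    """Create the output directory, then run the conversion (needs Grobid running)."""
    out_dir = Path(repo_dir) / "output_dir" / "paper_coder"
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    subprocess.run(doc2json_command(pdf_path, repo_dir), check=True)
```

Passing the arguments as a list rather than a shell string means PDF paths containing spaces need no extra quoting.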

📊 Evaluating Repository Quality

Repositories generated by PaperCoder are evaluated with a model-based approach.

📝 Reference-Free Evaluation

cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --eval_result_dir ../results \
    --eval_type ref_free \
    --generated_n 8 \
    --papercoder

📝 Reference-Based Evaluation

cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --gold_repo_dir ../examples/Transformer_gold_repo \
    --eval_result_dir ../results \
    --eval_type ref_based \
    --generated_n 8 \
    --papercoder
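
The two evaluation commands differ only in `--eval_type` and in the extra `--gold_repo_dir` option that the reference-based mode takes. The sketch below builds either argument list from the flags shown above. `eval_command` is a hypothetical helper, and it assumes that other papers follow the same path pattern as the `Transformer` example (for instance `../examples/<paper_name>_cleaned.json`).

```python
from typing import List, Optional


def eval_command(
    paper_name: str,
    eval_type: str,
    gold_repo_dir: Optional[str] = None,
    generated_n: int = 8,
) -> List[str]:
    """Build the eval.py argument list; run it from the codes/ directory."""
    if eval_type not in ("ref_free", "ref_based"):
        raise ValueError(f"unknown eval_type: {eval_type!r}")
    if eval_type == "ref_based" and gold_repo_dir is None:
        raise ValueError("ref_based evaluation needs gold_repo_dir")

    cmd = [
        "python", "eval.py",
        "--paper_name", paper_name,
        # Path patterns below generalize the Transformer example (assumption).
        "--pdf_json_path", f"../examples/{paper_name}_cleaned.json",
        "--data_dir", "../data",
        "--output_dir", f"../outputs/{paper_name}",
        "--target_repo_dir", f"../outputs/{paper_name}_repo",
        "--eval_result_dir", "../results",
        "--eval_type", eval_type,
        "--generated_n", str(generated_n),
        "--papercoder",
    ]
    if eval_type == "ref_based":
        # Only the reference-based mode compares against a gold repository.
        cmd += ["--gold_repo_dir", gold_repo_dir]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, cwd="codes", check=True)` to launch the evaluation.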

📌 With Paper2Code, you can quickly turn a paper's ideas into implementable code and check the code's accuracy and reliability through model-based evaluation.

📋 Link

https://github.com/going-doer/Paper2Code?utm_source=pytorchkr&ref=pytorchkr

This post is licensed under CC BY 4.0 by the author.