Paper Review on Watchstep Blog

🌳 LLaDA; Large Language Diffusion Models (2025-02) Paper Review

Fri, 21 Mar 2025 00:00:00 +0900

Recently introduced diffusion models perform roughly on par with ARMs (autoregressive models), and they are reported to be even stronger in context-awareness. → DLMs (diffusion language models) seem to be emerging as a new alternative to traditional ARMs.

🏝️ CoCoMix; LLM Pretraining with Continuous Concepts (Meta, 2025) Paper Review

Fri, 07 Mar 2025 00:00:00 +0900

CoCoMix (Continuous Concept Mixing)

A framework that combines next token prediction with continuous concepts.

Concepts are extracted with a pretrained sparse autoencoder.
Continuous concepts are mixed into the hidden state → an approach that uses continuous latent representations in place of discrete language tokens. (This illustrates that LLMs inherently encode high-level concepts and reasoning processes in their latent representations.)

1/ Problem

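LLMs are generally trained at the token level: they learn to predict the most appropriate next token for a given context. → Because many tokens are superficial function words such as "the", "a", and "and" (function words ↔ content words), the model needs a lot of training before it can reason, that is, understand deeper meaning.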
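
To make the token-level objective concrete, here is a minimal sketch of the next-token cross-entropy loss in plain Python. The vocabulary, logits, and function names are hypothetical and chosen only for illustration; they are not taken from the paper. The point is that a function word like "the" counts exactly as much toward the loss as a content word.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, target_ids):
    """Average cross-entropy over positions.

    Every target token gets the same weight, whether it is a
    function word or a content word.
    """
    losses = []
    for logits, target in zip(logits_per_position, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Hypothetical toy vocabulary: three function words and one content word.
vocab = ["the", "a", "and", "diffusion"]

# One row of logits per position; the targets are "the" and "diffusion".
logits = [
    [2.0, 0.5, 0.1, 0.0],
    [0.0, 0.0, 0.0, 3.0],
]
targets = [vocab.index("the"), vocab.index("diffusion")]

loss = next_token_loss(logits, targets)
print(f"average next-token loss: {loss:.4f}")
```

Nothing in this objective distinguishes tokens by how much meaning they carry, which is the gap the continuous-concept signal is meant to fill.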

2/ Solution

An SAE (sparse autoencoder) is used to extract meaningful concepts, which are then combined with the model's hidden state, so the concepts contribute directly to next token prediction. (Meaningful concepts are extracted and represented for each context.)
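
The two steps above, extracting sparse concepts and mixing them into the hidden state, can be sketched roughly as follows. This is a toy illustration under loud assumptions, not the paper's implementation: the weights `W_enc`, `b_enc`, and `W_mix` are made up, the top-k sparsity rule is one common SAE choice, and mixing by simple addition is an assumption (the paper may combine concepts and hidden states differently).

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sae_encode(hidden, W_enc, b_enc):
    """SAE-style encoder: linear map followed by ReLU."""
    pre = matvec(W_enc, hidden)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]

def top_k_sparse(acts, k):
    """Keep only the k largest activations and zero out the rest."""
    keep = set(sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(acts)]

def mix_concepts(hidden, concepts, W_mix):
    """Project sparse concepts back to hidden size and add them (assumed mixing rule)."""
    concept_vec = matvec(W_mix, concepts)
    return [h + c for h, c in zip(hidden, concept_vec)]

# Hypothetical toy sizes: hidden dimension 2, concept dictionary of size 4.
hidden = [1.0, 2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_mix = [[0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

acts = sae_encode(hidden, W_enc, b_enc)      # dense concept activations
sparse = top_k_sparse(acts, k=2)             # keep the 2 strongest concepts
mixed = mix_concepts(hidden, sparse, W_mix)  # concept-augmented hidden state

print("concept activations:", acts)
print("sparse concepts:   ", sparse)
print("mixed hidden state:", mixed)
```

The mixed hidden state is what the next-token prediction head would consume, which is how the concepts come to contribute directly to the prediction.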

🥥 CoCoNut; Training Large Language Models to Reason in a Continuous Latent Space (Meta, 2024) Paper Review

Fri, 24 Jan 2025 00:00:00 +0900

1/ Chain-Of-Thought (CoT)

Limitation of CoT: requiring the LLM's reasoning to be generated as text can act as a constraint.

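According to neuroimaging studies, the human brain regions responsible for language comprehension and production are reported to be inactive during reasoning. This suggests that language is well suited to communication but is not necessary for complex problem solving.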