<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Qa on Watchstep Blog</title><link>https://blog.watchstep.site/categories/qa/</link><description>Recent content in Qa on Watchstep Blog</description><generator>Hugo</generator><language>en</language><copyright>© 2025 watchstep</copyright><lastBuildDate>Thu, 18 Dec 2025 09:26:10 +0900</lastBuildDate><atom:link href="https://blog.watchstep.site/categories/qa/index.xml" rel="self" type="application/rss+xml"/><item><title>❓What is the difference between LLaDA and BERT?</title><link>https://blog.watchstep.site/posts/llada-qa/</link><pubDate>Fri, 04 Apr 2025 09:26:10 +0900</pubDate><guid>https://blog.watchstep.site/posts/llada-qa/</guid><description>&lt;h2 id="how-do-the-masking-of-llada-large-language-diffusion-with-masking-and-bert-differ">How does &amp;ldquo;masking&amp;rdquo; differ between LLaDA (Large Language Diffusion with Masking) and BERT?&lt;/h2>
&lt;p>Both &lt;a href="https://arxiv.org/abs/1810.04805">BERT (Bidirectional Encoder Representations from Transformers)&lt;/a> and &lt;a href="https://arxiv.org/abs/2502.09992">LLaDA (Large Language Diffusion with Masking)&lt;/a> use a “masking” technique.&lt;/p></description></item></channel></rss>