
RoBERTa Translation #12

Open
wants to merge 1 commit into base: main
24 changes: 7 additions & 17 deletions pytorch_fairseq_roberta.md
@@ -16,33 +16,23 @@ demo-model-link: https://huggingface.co/spaces/pytorch/RoBERTa
---


### Model Description
### 모델 설명

Bidirectional Encoder Representations from Transformers, or [BERT][1], is a
revolutionary self-supervised pretraining technique that learns to predict
intentionally hidden (masked) sections of text. Crucially, the representations
learned by BERT have been shown to generalize well to downstream tasks, and when
BERT was first released in 2018 it achieved state-of-the-art results on many NLP
benchmark datasets.
Bidirectional Encoder Representations from Transformers, [BERT][1]는 텍스트에서 의도적으로 숨겨진(masked) 부분을 예측하는 학습에 획기적인 self-supervised pretraining 기술이다. 결정적으로 BERT가 학습한 표현은 downstream tasks에 잘 일반화되는 것으로 나타났으며, BERT가 처음 출시된 2018년에 많은 NLP benchmark datasets에서 state-of-the-art 결과를 달성했다.

Review comment: For "state-of-the-art", I translated it as 최신 성능 in other documents; for this part, how about 최적의 결과 or 가장 성능 좋은 결과 instead? 😀
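
Since the paragraph under discussion describes masked-token prediction, a quick sketch of querying RoBERTa for a hidden position may help as a reading aid. It assumes the `fill_mask` helper and the `roberta.large` entry point from fairseq's public hub interface, neither of which appears in the visible part of this diff.

```python
import torch

# Load pretrained RoBERTa from PyTorch Hub (assumed entry point: pytorch/fairseq)
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout for deterministic predictions

# Predict the deliberately hidden (masked) token;
# returns (filled sentence, score, predicted token) triples
print(roberta.fill_mask('The first Star Wars film was released in <mask>', topk=3))
```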


[RoBERTa][2] builds on BERT's language masking strategy and modifies key
hyperparameters in BERT, including removing BERT's next-sentence pretraining
objective, and training with much larger mini-batches and learning rates.
RoBERTa was also trained on an order of magnitude more data than BERT, for a
longer amount of time. This allows RoBERTa representations to generalize even
better to downstream tasks compared to BERT.
[RoBERTa][2]는 BERT의 language masking strategy를 기반으로 구축되며, BERT의 next-sentence pretraining objective를 제거하고 훨씬 더 큰 미니 배치와 학습 속도로 훈련하는 등 주요 하이퍼파라미터를 수정한다. 또한 RoBERTa는 더 오랜 시간 동안 BERT보다 훨씬 많은 데이터에 대해 학습되었다. 이를 통해 RoBERTa의 표현은 BERT와 비교해 downstream tasks을 훨씬 잘 일반화할 수 있다.


### Requirements
### 요구 사항

We require a few additional Python dependencies for preprocessing:
전처리 과정을 위해 추가적인 Python 의존성이 필요합니다.

Review comment: If the sentences above also ended in the 합니다 style like this one, it would be more consistent with the PyTorch tutorial documents and the other translated Hub documents.


```bash
pip install regex requests hydra-core omegaconf
```


### Example
### 예시

##### Load RoBERTa

Review comment: It would be good to translate the subheadings as well. How about "RoBERTa 불러오기" for "Load RoBERTa"?

```python
@@ -95,7 +85,7 @@
logprobs = roberta.predict('new_task', tokens)  # tensor([[-1.1050, -1.0672, -1.
```
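
Most of the example code is collapsed in this diff, so for context here is a minimal sketch of the standard fairseq RoBERTa workflow via `torch.hub`, ending at the `predict` call visible above. The `register_classification_head` setup and the 3-class head are assumptions based on fairseq's public hub interface, not lines taken from this file.

```python
import torch

# Load the pretrained RoBERTa-large model from PyTorch Hub
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout for evaluation

# Byte-pair encode raw text into model input ids
tokens = roberta.encode('Hello world!')
assert roberta.decode(tokens) == 'Hello world!'

# Extract contextual features from the final layer: (batch, seq_len, hidden)
features = roberta.extract_features(tokens)

# Attach a randomly initialized classification head and query it,
# matching the 'new_task' head used in the line shown above
roberta.register_classification_head('new_task', num_classes=3)
logprobs = roberta.predict('new_task', tokens)  # log-probabilities over the 3 classes
```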


### References
### 참고

- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding][1]
- [RoBERTa: A Robustly Optimized BERT Pretraining Approach][2]