Python - "Transformers (Neural Language Model Library) Course" - Chapter 1, Section 2 Code Practice
From the following course,
Transformers (Neural Language Model Library) Course
; https://wikidocs.net/book/8056
the contents of Chapter 1, Section 2,
2. What 🤗Transformers can do
; https://wikidocs.net/166787
I ran the included code on Google Colab and list the results below. ^^
!pip install transformers
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")
classifier(["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"])
# Output
[{'label': 'POSITIVE', 'score': 0.9598048329353333},
{'label': 'NEGATIVE', 'score': 0.9994558691978455}]
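The sentiment pipeline returns one dict per input sentence, each with a `label` and a `score`. A minimal sketch of post-processing that list, assuming you only want to trust predictions above some confidence level; the helper `confident_labels` and the 0.99 threshold are hypothetical choices for illustration, and the data is simply the output shown above.

```python
from typing import Dict, List, Union

# Output copied from the run above: one dict per input sentence
results: List[Dict[str, Union[str, float]]] = [
    {"label": "POSITIVE", "score": 0.9598048329353333},
    {"label": "NEGATIVE", "score": 0.9994558691978455},
]


def confident_labels(preds, threshold=0.99):
    """Hypothetical helper: return each prediction's label, or None when
    its score is below the (arbitrary) threshold."""
    return [p["label"] if p["score"] >= threshold else None for p in preds]


print(confident_labels(results))                 # first sentence falls below 0.99
print(confident_labels(results, threshold=0.9))  # both pass a looser threshold
```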
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
# Output
{'sequence': 'This is a course about the Transformers library',
'labels': ['education', 'business', 'politics'],
'scores': [0.8445989489555359, 0.11197412759065628, 0.04342695698142052]}
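In its default single-label mode, the zero-shot pipeline normalizes the scores across the candidate labels, so they add up to 1 and come back sorted from best to worst; the output above fits that (0.845 + 0.112 + 0.043 ≈ 1). A small sketch that pairs each label with its score, using the output above as data:

```python
import math

# Output copied from the zero-shot run above
output = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445989489555359, 0.11197412759065628, 0.04342695698142052],
}

# Pair each candidate label with its score
label_scores = dict(zip(output["labels"], output["scores"]))

# The lists are sorted by score, so the first label is the best match
best_label = output["labels"][0]
print(best_label, round(label_scores[best_label], 3))

# Single-label mode: the scores are normalized across the candidate labels
print(math.isclose(sum(output["scores"]), 1.0, rel_tol=1e-6))
```

Passing `multi_label=True` to the call scores each label independently, in which case the sum no longer has to be 1.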
from transformers import pipeline
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")
# Output
[{'generated_text': "In this course, we will teach you how to use NLP with the following tasks. In this course, we will work with a computer running NLP. I'm using the npc-get system to find your NPM scripts and to start"}]
from transformers import pipeline
generator = pipeline("text-generation", model="distilgpt2")  # Load the distilgpt2 model.
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)
# Output
[{'generated_text': 'In this course, we will teach you how to create a simple and fun web design using Photoshop for building a simple website.\n\n\n\nThe'},
{'generated_text': 'In this course, we will teach you how to apply the following basic concepts to your life (see below). This course aims to help you to choose'}]
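Note that each `generated_text` repeats the prompt, and `max_length=30` caps the total length in tokens with the prompt included (newer versions also accept `max_new_tokens`, which counts only the generated part). A minimal sketch that strips the echoed prompt from the two sequences above; `continuations` is a hypothetical helper, not part of the library. The text-generation pipeline also accepts `return_full_text=False`, which drops the prompt for you.

```python
# Output copied from the distilgpt2 run above (num_return_sequences=2)
prompt = "In this course, we will teach you how to"
outputs = [
    {"generated_text": "In this course, we will teach you how to create a simple and fun web design using Photoshop for building a simple website.\n\n\n\nThe"},
    {"generated_text": "In this course, we will teach you how to apply the following basic concepts to your life (see below). This course aims to help you to choose"},
]


def continuations(results, prompt):
    """Hypothetical helper: return only the newly generated part of each text."""
    texts = []
    for r in results:
        text = r["generated_text"]
        # generated_text starts with the prompt, so slice it off when present
        if text.startswith(prompt):
            text = text[len(prompt):]
        texts.append(text.strip())
    return texts


for c in continuations(outputs, prompt):
    print(repr(c))
```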
from transformers import pipeline
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=3)
# Output
[{'score': 0.19619806110858917,
'token': 30412,
'token_str': ' mathematical',
'sequence': 'This course will teach you all about mathematical models.'},
{'score': 0.04052723944187164,
'token': 38163,
'token_str': ' computational',
'sequence': 'This course will teach you all about computational models.'},
{'score': 0.03301795944571495,
'token': 27930,
'token_str': ' predictive',
'sequence': 'This course will teach you all about predictive models.'}]
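Each candidate's `sequence` is just the input sentence `"This course will teach you all about <mask> models."` with the mask filled in by `token_str`. Because `token_str` carries its own leading space here, the sketch below replaces `" <mask>"` (space included) to reproduce the sequences above. The `<mask>` spelling matches the default fill-mask model; a BERT model such as `bert-base-uncased`, used later in this post, expects `[MASK]` instead, and `unmasker.tokenizer.mask_token` tells you which one a given pipeline uses.

```python
# Input sentence with the mask token used by the default fill-mask model
template = "This course will teach you all about <mask> models."

# Candidates copied from the fill-mask output above (top_k=3)
candidates = [
    {"score": 0.19619806110858917, "token_str": " mathematical"},
    {"score": 0.04052723944187164, "token_str": " computational"},
    {"score": 0.03301795944571495, "token_str": " predictive"},
]

# token_str already includes the leading space, so replace " <mask>" as a whole
sequences = [template.replace(" <mask>", c["token_str"]) for c in candidates]
for s in sequences:
    print(s)
```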
from transformers import pipeline
ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
# Output
[{'entity_group': 'PER',
'score': 0.9981694,
'word': 'Sylvain',
'start': 11,
'end': 18},
{'entity_group': 'ORG',
'score': 0.9796019,
'word': 'Hugging Face',
'start': 33,
'end': 45},
{'entity_group': 'LOC',
'score': 0.9932106,
'word': 'Brooklyn',
'start': 49,
'end': 57}]
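The `start` and `end` fields are character offsets into the input string, so slicing the sentence with them gives back each entity's `word`. A quick check against the output above (in newer releases, `grouped_entities=True` is deprecated in favor of `aggregation_strategy="simple"`, which produces the same grouped form):

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Entities copied from the grouped NER output above
entities = [
    {"entity_group": "PER", "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "word": "Brooklyn", "start": 49, "end": 57},
]

# start/end are character offsets into the original string
for e in entities:
    span = text[e["start"]:e["end"]]
    print(e["entity_group"], repr(span), span == e["word"])
```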
from transformers import pipeline
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)
# Output
{'score': 0.6949767470359802, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
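This question-answering pipeline is extractive: `answer` is a span copied out of `context`, located by the `start`/`end` character offsets. A minimal sketch that marks the answer inside the context and ignores low-confidence answers; `highlight` and its 0.5 threshold are hypothetical, chosen only for illustration.

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"

# Result copied from the question-answering output above
result = {"score": 0.6949767470359802, "start": 33, "end": 45, "answer": "Hugging Face"}


def highlight(context, result, min_score=0.5):
    """Hypothetical helper: wrap the extracted answer span in brackets,
    or return None when the score is below an arbitrary threshold."""
    if result["score"] < min_score:
        return None
    s, e = result["start"], result["end"]
    return context[:s] + "[" + context[s:e] + "]" + context[e:]


print(highlight(context, result))
```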
from transformers import pipeline
summarizer = pipeline("summarization")
summarizer(
"""
America has changed dramatically during recent years. Not only has the number of
graduates in traditional engineering disciplines such as mechanical, civil,
electrical, chemical, and aeronautical engineering declined, but in most of
the premier American universities engineering curricula now concentrate on
and encourage largely the study of engineering science. As a result, there
are declining offerings in engineering subjects dealing with infrastructure,
the environment, and related issues, and greater concentration on high
technology subjects, largely supporting increasingly complex scientific
developments. While the latter is important, it should not be at the expense
of more traditional engineering.
Rapidly developing economies such as China and India, as well as other
industrial countries in Europe and Asia, continue to encourage and advance
the teaching of engineering. Both China and India, respectively, graduate
six and eight times as many traditional engineers as does the United States.
Other industrial countries at minimum maintain their output, while America
suffers an increasingly serious decline in the number of engineering graduates
and a lack of well-educated engineers.
"""
)
# Output
[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil, electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India, as well as other industrial countries in Europe and Asia, continue to encourage and advance engineering .'}]
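The summary above has a stray space before several periods (`years .`, `engineering .`), presumably a quirk of the default summarization model's output. A small post-processing sketch with a regular expression; `tidy` is a hypothetical helper, and the length of the summary itself can be steered by passing `max_length` / `min_length` in the summarizer call.

```python
import re

# Summary copied from the output above (note the spaces before some periods)
summary = (
    " America has changed dramatically during recent years . The number of engineering"
    " graduates in the U.S. has declined in traditional engineering disciplines such as"
    " mechanical, civil, electrical, chemical, and aeronautical engineering . Rapidly"
    " developing economies such as China and India, as well as other industrial countries"
    " in Europe and Asia, continue to encourage and advance engineering ."
)


def tidy(text):
    # Remove whitespace right before common punctuation, then trim both ends
    return re.sub(r"\s+([.,!?;:])", r"\1", text).strip()


print(tidy(summary))
```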
from transformers import pipeline
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")
translator("그동안 너무 잘해 주셔서 감사드립니다.")
# Output
[{'translation_text': 'Thank you so much for your kindness.'}]
from transformers import pipeline
unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])
result = unmasker("This woman works as a [MASK].")
print([r["token_str"] for r in result])
# Output
['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']
['nurse', 'maid', 'teacher', 'waitress', 'prostitute']
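The fill-mask pipeline returns the top 5 candidates by default, and the two lists above do not share a single occupation, which is the gender bias this example is meant to expose. A quick way to make that comparison explicit, using only the printed output:

```python
# Top-5 predictions copied from the bert-base-uncased output above
man = ["carpenter", "lawyer", "farmer", "businessman", "doctor"]
woman = ["nurse", "maid", "teacher", "waitress", "prostitute"]

# Set operations show how much the two prediction lists overlap
shared = sorted(set(man) & set(woman))
print("shared:", shared)
print("man only:", sorted(set(man) - set(woman)))
print("woman only:", sorted(set(woman) - set(man)))
```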
For reference, if you want to test in a Python environment on Windows instead of Colab, first install Python 3.10:
Python 3.10.0
; https://www.python.org/downloads/release/python-3100/
In my case, I downloaded the "Windows embeddable package (64-bit)" (which meant configuring the _pth file and pip separately) and then installed virtualenv as well.
Next, create a new virtualenv environment,
C:\python\llml> virtualenv test
created virtual environment CPython3.10.0.final.0-64 in 3934ms
...[omitted]...
activate it,
C:\python\llml> cd test
C:\python\llml\test> .\Scripts\activate
(test) C:\python\llml\test>
and install transformers.
(test) C:\python\llml\test> python -m pip install "transformers[sentencepiece]"
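To confirm that the scripts really run inside the activated environment, a small check can be run with the environment's `python`. This is a sketch, assuming virtualenv 20 or later (or the standard `venv`), where `sys.prefix` differs from `sys.base_prefix` while an environment is active; `describe_env` is a hypothetical helper name.

```python
import sys


def describe_env():
    """Report the interpreter version and whether a virtualenv/venv is active."""
    # In virtualenv 20+ and venv, sys.prefix points at the environment
    # while sys.base_prefix still points at the base installation
    in_venv = sys.prefix != sys.base_prefix
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"python": version, "in_venv": in_venv, "prefix": sys.prefix}


info = describe_env()
print(info)
if sys.version_info[:2] != (3, 10):
    print("Note: this post uses Python 3.10")
```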
However, with just that, running a pipeline example raises an exception:
Traceback (most recent call last):
  File "C:\python\llml\test\sc1.py", line 3, in <module>
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
  File "C:\python\llml\test\lib\site-packages\transformers\pipelines\__init__.py", line 788, in pipeline
    framework, model = infer_framework_load_model(
  File "C:\python\llml\test\lib\site-packages\transformers\pipelines\base.py", line 222, in infer_framework_load_model
    raise RuntimeError(
RuntimeError: At least one of TensorFlow 2.0 or PyTorch should be installed. To install TensorFlow 2.0, read the instructions at https://www.tensorflow.org/install/ To install PyTorch, read the instructions at https://pytorch.org/.
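Before calling `pipeline()`, you can check whether either backend is importable without actually importing it, using `importlib.util.find_spec` from the standard library. A minimal sketch; `available_backends` is a hypothetical helper name.

```python
import importlib.util


def available_backends():
    """Return which deep-learning backends can be imported (without importing them)."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("torch", "tensorflow")
    }


backends = available_backends()
print(backends)
if not any(backends.values()):
    print("Install PyTorch or TensorFlow before calling transformers.pipeline()")
```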
As the message says, you need to install PyTorch (or TensorFlow):
START LOCALLY
; https://pytorch.org/get-started/locally/
// NVIDIA CUDA 11.8
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
// CPU
python -m pip install torch torchvision torchaudio
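On Windows, PyTorch's Compute Platform options are only CUDA and CPU (the ROCm build is listed for Linux only), so unfortunately an AMD graphics card cannot be used there. However, the code practiced in this post does not train a model directly; it only uses models that are already trained, so it runs fine on the CPU. (At least until the fine-tuning in Chapter 3!)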
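If you did install the CUDA build, `pipeline()` takes a `device` argument (`0` for the first GPU, `-1` for the CPU). A sketch that picks the value automatically, assuming PyTorch is the backend; `pick_device` is a hypothetical helper, and it falls back to `-1` when PyTorch is not installed at all (in which case the pipeline itself still needs a backend to run).

```python
import importlib.util


def pick_device():
    """Return a `device` value for transformers.pipeline():
    0 = first CUDA GPU, -1 = CPU."""
    if importlib.util.find_spec("torch") is None:
        return -1  # PyTorch not installed: report CPU
    import torch
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("device:", device)

# Usage (requires transformers and a backend):
# from transformers import pipeline
# classifier = pipeline("sentiment-analysis", device=device)
```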
[I would like to share opinions on this post with you. If anything is wrong, incomplete, or unclear, please feel free to leave a comment at any time.]