Aligning models with human values

Generative AI with Large Language Models

by Taeyoon.Kim.DS 2023. 8. 28. 18:18


https://www.coursera.org/learn/generative-ai-with-llms/lecture/yV8WP/aligning-models-with-human-values

In this video, the instructor revisits the generative AI project life cycle and focuses on the technique of fine-tuning with instructions, in particular the challenges that remain even when large language models (LLMs) generate natural-sounding language. While instruction fine-tuning can improve a model's understanding of human prompts and lead to more human-like responses, models can still behave badly: using toxic language, providing incorrect or misleading information, or promoting harmful actions. The video emphasizes the importance of aligning models with human values, summarized as helpfulness, honesty, and harmlessness (HHH). Additional fine-tuning with human feedback is introduced as a method to better align LLMs with these values and reduce such problematic behavior.

 

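The lecture introduces human-feedback fine-tuning only at a high level. As a rough illustrative sketch (not code from the course), the snippet below mimics its first ingredient: a reward signal derived from human preference pairs that scores completions for helpfulness, honesty, and harmlessness. The sample data, the score_hhh heuristic, and the function names are all assumptions made purely for illustration; in practice the reward model is itself a neural network trained on human rankings of model outputs.

```python
# Illustrative sketch only: a stand-in "reward" that scores completions for
# HHH, checked against toy human preference pairs. A real reward model is
# trained on large sets of human-ranked completions, not hand-written rules.
from typing import List, Tuple

# Toy preference data: (prompt, human-preferred completion, rejected completion).
PREFERENCE_DATA: List[Tuple[str, str, str]] = [
    ("How do I unlock a car?",
     "If it's your own car, contact a locksmith or your manufacturer's roadside service.",
     "Here's how to break into any car without the owner noticing..."),
]

def score_hhh(prompt: str, completion: str) -> float:
    """Stand-in reward: penalize obviously harmful phrasing (illustrative only)."""
    harmful_markers = ["break into", "without the owner"]
    penalty = sum(marker in completion.lower() for marker in harmful_markers)
    return 1.0 - penalty  # higher score = more aligned with HHH

def preference_accuracy(data: List[Tuple[str, str, str]]) -> float:
    """Fraction of pairs where the reward signal agrees with the human choice."""
    correct = sum(
        score_hhh(prompt, chosen) > score_hhh(prompt, rejected)
        for prompt, chosen, rejected in data
    )
    return correct / len(data)

if __name__ == "__main__":
    print(f"reward agrees with human preference on "
          f"{preference_accuracy(PREFERENCE_DATA):.0%} of pairs")
```

Later lessons in this week build on the same idea: a reward signal of this kind is used inside a reinforcement learning loop (RLHF) to update the LLM's weights toward the human-preferred behavior.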
