Anyone who has taken an exam understands the linguistic and other challenges of writing paragraph-length answers to open-ended questions, as opposed to answering simple yes/no or multiple-choice questions. Such long-form question answering (LFQA) presents similar challenges in natural language processing (NLP) research, where existing approaches tend to focus on two main components of the task: information retrieval and synthesis.
In the new paper WebGPT: Browser-assisted Question-Answering with Human Feedback, an OpenAI research team combines these existing approaches with improved training objectives. They use the Microsoft Bing Web Search API for document retrieval, and rely on unsupervised pre-training followed by fine-tuning of the GPT-3 large language model for high-quality synthesis. They then use human feedback to directly optimize answer quality, allowing their method to achieve human-level performance on LFQA tasks.
The team summarizes its main contributions as follows:
- We create a text-based web browsing environment that a fine-tuned language model can interact with. This allows us to improve both retrieval and synthesis in an end-to-end fashion using general methods such as imitation learning and reinforcement learning.
- We generate answers with references: passages extracted by the model from web pages while browsing. This is crucial for allowing labellers to judge the factual accuracy of answers without engaging in a difficult and subjective process of independent research.
Contemporary search engines are powerful, fast, and can provide up-to-date knowledge. As a result, people increasingly rely on them when looking for answers to questions, and estimates put the total number of daily web searches in the billions. The OpenAI researchers therefore set out to design a text-based web browsing environment that would allow pre-trained language models to mimic this human web search behavior.
Prompted with a question and additional context, the proposed WebGPT model performs web actions such as running a Bing search, clicking on links, scrolling through documents, and quoting passages as references. Browsing continues until the model issues a command to end browsing, the maximum number of actions is reached, or the maximum total length of references is reached. Finally, if at least one relevant reference has been collected, the model composes a long-form answer to the question.
The team also designed a graphical interface for their text-based web browsing environment, which allows human users to provide auxiliary annotations and comparison ratings that further improve the model's understanding of the questions.
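The three stopping conditions described above can be sketched as a simple loop. This is only an illustrative sketch, not the team's implementation: the command names, the `BrowsingState` class, the `policy` callable and both limits are hypothetical, since the article does not give concrete values or an API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical limits; the article does not state the actual values.
MAX_ACTIONS = 100
MAX_REFERENCE_CHARS = 2000


@dataclass
class BrowsingState:
    """Everything the (hypothetical) policy sees while browsing."""
    question: str
    references: List[str] = field(default_factory=list)
    actions_taken: int = 0

    def reference_length(self) -> int:
        return sum(len(ref) for ref in self.references)


# A policy maps the current state to a (command, argument) pair.
Policy = Callable[[BrowsingState], Tuple[str, str]]


def browse(question: str, policy: Policy,
           max_actions: int = MAX_ACTIONS,
           max_reference_chars: int = MAX_REFERENCE_CHARS) -> BrowsingState:
    """Run the policy until one of the three stopping conditions is met."""
    state = BrowsingState(question)
    while state.actions_taken < max_actions:      # stop: action budget used up
        command, argument = policy(state)
        state.actions_taken += 1
        if command == "end":                      # stop: model ends browsing
            break
        if command == "quote":
            state.references.append(argument)
            if state.reference_length() >= max_reference_chars:
                break                             # stop: reference budget used up
        # "search", "click" and "scroll" would change the page shown to the
        # model; that part of the environment is omitted from this sketch.
    return state


def should_compose_answer(state: BrowsingState) -> bool:
    """An answer is only composed if at least one reference was collected."""
    return len(state.references) > 0


def scripted_policy(state: BrowsingState) -> Tuple[str, str]:
    """Toy stand-in for the language model: replays a fixed list of commands."""
    script = [
        ("search", "why is the sky blue"),
        ("click", "0"),
        ("quote", "Shorter wavelengths of light are scattered more strongly."),
        ("end", ""),
    ]
    return script[min(state.actions_taken, len(script) - 1)]
```

In a real system the policy would be the fine-tuned language model reading a text rendering of the current page; here a scripted policy stands in so the loop can be run end to end, for example with `browse("Why is the sky blue?", scripted_policy)`.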
The team fine-tuned GPT-3 models of three sizes (760M, 13B and 175B parameters) using four main training methods: behavior cloning (BC), reward modeling (RM), reinforcement learning (RL) and rejection sampling (best-of-n). They evaluated the proposed WebGPT on questions from the ELI5 (Explain Like I’m 5) subreddit, with human reviewers judging answers on the criteria that they should be relevant, coherent, and supported by reliable references.
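Of the four methods, rejection sampling (best-of-n) is the simplest to illustrate: draw n candidate answers and keep the one that a reward model scores highest. The sketch below assumes two hypothetical callables, `sample_answer` and `reward`, standing in for the fine-tuned model and the learned reward model; the toy versions exist only to make the example runnable.

```python
import itertools
from typing import Callable


def best_of_n(question: str,
              sample_answer: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int) -> str:
    """Sample n candidate answers and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample_answer(question) for _ in range(n)]
    return max(candidates, key=lambda answer: reward(question, answer))


def make_toy_sampler() -> Callable[[str], str]:
    """Toy stand-in for the answer model: cycles through canned answers."""
    canned = itertools.cycle([
        "Short.",
        "A somewhat longer answer.",
        "Mid-length answer.",
    ])
    return lambda question: next(canned)


def toy_reward(question: str, answer: str) -> float:
    """Toy stand-in for the reward model: longer answers score higher."""
    return float(len(answer))
```

With these stand-ins, `best_of_n("q", make_toy_sampler(), toy_reward, 3)` returns the longest of the three canned answers. No additional training of the answer model is involved; the extra cost is paid at inference time, since n answers must be sampled and scored for each question.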
In evaluations, answers from the best-of-64 WebGPT 175B model were preferred over those written by human demonstrators 56% of the time, and over the reference answers from the ELI5 dataset 69% of the time.
Overall, the work demonstrates that a fine-tuned pre-trained language model with access to a text-based web browsing environment can achieve high answer quality on LFQA tasks, outperforming even humans on the ELI5 dataset.
The paper WebGPT: Browser-assisted Question-Answering with Human Feedback is on OpenAI.com.
Author: Hecate He | Editor: Michael Sarazen