Mathematics Data Science Seminar: Ethan Brooks, In-Context Policy Iteration

Name: Mathematics Data Science Seminar: Ethan Brooks, In-Context Policy Iteration
Start: 2024-03-27T14:30:00-04:00
End: 2024-03-27T15:30:00-04:00

This event is in the past.

When:

March 27, 2024
2:30 p.m. to 3:30 p.m.

Where:

Zoom

Event category: Seminar

Virtual

Speaker: Ethan Brooks, Technical Staff at Reflection AI

Time: Wednesday, March 27, 2:30pm-3:30pm

Place: Virtual

Zoom link:

https://wayne-edu.zoom.us/j/96316494795?pwd=Ylc3M0R0R1BYaUZGSnB2dkI2UFRVQT09

Meeting ID: 963 1649 4795

Passcode: 271178

Title: In-Context Policy Iteration

Abstract:

In this talk, we present In-Context Policy Iteration (ICPI), an algorithm for performing Reinforcement Learning (RL) in-context using foundation models. While the application of foundation models to RL has received considerable attention, most approaches at the time of publication relied on either (1) the curation of expert demonstrations (through manual design or task-specific pretraining) or (2) adaptation to the task of interest using gradient methods (fine-tuning or training of adapter layers). Both techniques have drawbacks. Collecting demonstrations is labor-intensive, and algorithms that rely on them do not outperform the experts from which the demonstrations were derived. Gradient techniques are inherently slow, sacrificing the "few-shot" quality that made in-context learning attractive to begin with. ICPI learns to perform RL tasks without expert demonstrations or gradients. Instead, it is a policy-iteration method in which the prompt content is the entire locus of learning: through trial-and-error interaction with an RL environment, ICPI iteratively updates the contents of the prompt from which it derives its policy. To eliminate the role of in-weights learning (on which approaches like Decision Transformer rely heavily), we demonstrate our algorithm using Codex, a language model with no prior knowledge of the domains on which we evaluate it.
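
To make the idea of "the prompt as the entire locus of learning" concrete, here is a minimal toy sketch, not the speaker's implementation. It assumes a five-state chain environment and replaces every foundation-model query with a plain lookup over stored transitions (the hypothetical `predict` and `rollout_policy` helpers); all names and constants are illustrative. What it keeps from the abstract is the loop structure: action values are estimated from the prompt alone, the agent acts greedily, and the only thing that ever changes is the prompt.

```python
import random

GOAL, START, HORIZON = 4, 0, 8   # assumed toy chain: states 0..4
ACTIONS = (-1, 1)                # step left or right

def env_step(state, action):
    """Toy environment: reward 1 on reaching GOAL, which ends the episode."""
    nxt = min(max(state + action, 0), GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def predict(prompt, state, action):
    """Stand-in for a model query: recall the latest matching transition."""
    for s, a, r, s2, done in reversed(prompt):
        if s == state and a == action:
            return s2, r, done
    return None  # the prompt has no example for this (state, action)

def rollout_policy(prompt, state, rng):
    """Stand-in for the in-context policy: copy the latest action in this state."""
    for s, a, *_ in reversed(prompt):
        if s == state:
            return a
    return rng.choice(ACTIONS)

def q_estimate(prompt, state, action, rng, gamma=0.9):
    """Estimate Q(state, action) from a rollout simulated using only the prompt."""
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        pred = predict(prompt, state, action)
        if pred is None:
            break
        state, reward, done = pred
        total += discount * reward
        discount *= gamma
        if done:
            break
        action = rollout_policy(prompt, state, rng)
    return total

def run(episodes=30, seed=0):
    """Act greedily on prompt-derived Q estimates; learning = appending to the prompt."""
    rng = random.Random(seed)
    prompt, returns = [], []
    for _ in range(episodes):
        state, ep_return = START, 0.0
        for _ in range(HORIZON):
            qs = [q_estimate(prompt, state, a, rng) for a in ACTIONS]
            best = max(qs)
            action = rng.choice([a for a, q in zip(ACTIONS, qs) if q == best])
            nxt, reward, done = env_step(state, action)
            prompt.append((state, action, reward, nxt, done))  # the only update
            ep_return += reward
            state = nxt
            if done:
                break
        returns.append(ep_return)
    return returns
```

Calling `run()` returns one return per episode (1.0 if the goal was reached within the horizon, 0.0 otherwise). No parameters are trained anywhere in the loop; in the method described in the abstract, the lookups would instead be queries to a language model conditioned on the stored experience.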

Contact

Rohini Kumar
rohini.kumar@wayne.edu

Cost

Free

Calendars

Research Events, Mathematics, Main Events Calendar

Audience

Current students, Faculty
