[Paper]
If you just want the NNetNav-6k dataset, you can download it via huggingface:

```python
from datasets import load_dataset

ds = load_dataset("smurty/NNetNav-6k")
```
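Each row pairs an instruction with a demonstration trajectory. A minimal sketch of slicing the data by website (the field names `website`, `instruction`, and `trajectory` are illustrative assumptions here, not the dataset's documented schema; check `ds.column_names` after loading):

```python
# Toy rows mimicking the shape of NNetNav-6k records; the real field names
# may differ -- inspect ds.column_names after loading the actual dataset.
rows = [
    {"website": "gitlab", "instruction": "Create a new issue", "trajectory": "..."},
    {"website": "reddit", "instruction": "Upvote the top post", "trajectory": "..."},
    {"website": "gitlab", "instruction": "Fork a repository", "trajectory": "..."},
]

def filter_by_website(rows, website):
    """Keep only demonstrations collected on a given website."""
    return [r for r in rows if r["website"] == website]

gitlab_rows = filter_by_website(rows, "gitlab")
print(len(gitlab_rows))  # -> 2
```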
If you want to generate your own synthetic demonstrations for websites, and train your own web agents, then read on!
- Add support for collecting synthetic demonstrations on live websites.
- Release pre-trained Llama-8B web-agents.
- [11/18/2024]: Github repo released, along with NNetNav-6k dataset.
- Project Structure
- Environment Setup
- Setting Environment Variables
- Collecting Demonstrations
- Converting Demonstrations for Supervised Finetuning (SFT)
- Finetuning Llama with NNetNav Demonstrations
- Evaluating a Trained Agent
- Contributing
- Citation
```
NnetscapeNavigator/
├── src/
│   ├── agent/
│   ├── browser_env/
│   ├── postprocess_outputs.py      # Simple book-keeping script
│   ├── postprocess_trajectories.py # Add stop action / retroactive reasoning
│   ├── nnetnav_utils.py            # Contains main logic for exploration / pruning
│   └── run_nnetnav.py              # Entry point for running nnetnav
├── scripts/                        # Executable scripts
│   ├── convert_to_sft.py
│   ├── merge_log_files.py
│   └── ...
├── notebooks/
│   ├── analyze_demonstration_quality.ipynb
│   ├── explore_config_files.ipynb
│   └── ...
├── evaluation/
│   ├── eval_miniwob.py
│   ├── eval_webarena.py
│   ├── get_per_domain_acc.py
│   ├── run_lm_reward.py
│   └── ...
├── finetuning/
│   ├── scripts/
│   ├── data/
│   └── open_instruct/
│       └── ...
├── setup.py
├── requirements.txt                # Dependencies
└── README.md
```
```shell
conda create -n agents python=3.10.14
conda activate agents
pip install -r requirements.txt
pip install -e .
```
This codebase uses two environments, MiniWoB++ and WebArena. Follow the setup instructions in each project's repository. If you run into issues with either setup, please open an issue on the respective repository. Note that correctly setting up the environments is a prerequisite for the rest of the code.
Make sure the following environment variables have been set up:
```shell
export SHOPPING="<SHOPPING_URL>"
export SHOPPING_ADMIN="<SHOPPING_ADMIN_URL>"
export REDDIT="<REDDIT_URL>"
export GITLAB="<GITLAB_URL>"
export MAP="<MAP_URL>"
export WIKIPEDIA="<WIKIPEDIA_URL>"
export HOMEPAGE="<HOMEPAGE_URL>"
export OPENAI_API_KEY=<OPENAI_KEY>
export MINIWOB_URL=<MINIWOB_URL>
export LLAMA_API_KEY=<LLAMA_API_KEY>
```
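A missing variable tends to fail in confusing ways deep inside the browser environment, so a quick sanity check before launching anything can save a debugging session. This checker is not part of the repo; the variable list is just copied from the exports above:

```python
import os

# Variables the pipeline expects, per the exports above.
REQUIRED = [
    "SHOPPING", "SHOPPING_ADMIN", "REDDIT", "GITLAB", "MAP",
    "WIKIPEDIA", "HOMEPAGE", "OPENAI_API_KEY", "MINIWOB_URL", "LLAMA_API_KEY",
]

def missing_env_vars(required=REQUIRED, environ=os.environ):
    """Return the subset of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All environment variables set.")
```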
Here, note that `LLAMA_API_KEY` is needed to make calls to a vLLM endpoint. Assuming you have installed vLLM correctly, here's how to create a vLLM endpoint:

```shell
vllm serve meta-llama/Meta-Llama-3-8B-Instruct --api-key smurty-llama-8b-instruct
```

and then set `LLAMA_API_KEY=smurty-llama-8b-instruct`.
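The vLLM server speaks the OpenAI-compatible API, so any OpenAI client works against it. As a sketch, this is what a request body looks like on the wire (the default base URL `http://localhost:8000/v1` and the helper name are assumptions, not part of NNetNav):

```python
# Sketch only: builds a request body for vLLM's OpenAI-compatible
# /v1/chat/completions endpoint. POST it as JSON with the header
# "Authorization: Bearer $LLAMA_API_KEY" (or use the openai client).

def chat_payload(model, user_msg, temperature=0.0):
    """Build an OpenAI-style chat request body for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": temperature,
    }

payload = chat_payload("meta-llama/Meta-Llama-3-8B-Instruct", "Hello!")
```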
The starting point for running NNetNav is src/run_nnetnav.py. Here's how to run it:
```shell
python src/run_nnetnav.py \
    --model gpt-4o-mini \
    --result_dir EXPLORATION_NNETNAV \
    --filter_dir DATA_NNETNAV \
    --seed_dir seed_states_webarena/ \
    --exploration_size_per_seed 100 \
    --use_personas
```
Important note on `seed_dir`: the seed states are the starting configs for setting up browser state. WebArena has 5 websites, and so there are 5 seed states. Make sure to correctly configure the `start_url` based on your WebArena setup!
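For reference, each seed state is a small JSON config. The exact schema comes from the files in `seed_states_webarena/`, so the fields below are illustrative guesses, but the key point stands: `start_url` must point at your own WebArena instance:

```json
{
  "start_url": "http://<your-webarena-host>:7770",
  "sites": ["shopping"],
  "require_login": true
}
```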
After this, run:
```shell
python src/postprocess_outputs.py \
    --orig_dir EXPLORATION_NNETNAV \
    --filter_dir DATA_NNETNAV
```
And then finally run:
```shell
python src/postprocess_trajectories.py \
    --model gpt-4o-mini \
    --data_dir DATA_NNETNAV \
    --environment_type webarena \
    --n_jobs 10
```
This will give you demonstrations saved in DATA_NNETNAV. If you're using a vLLM endpoint (e.g. meta-llama/Meta-Llama-3-8B-Instruct), replace `gpt-4o-mini` with `meta-llama/Meta-Llama-3-8B-Instruct`.
Next, we convert demonstrations into (input, output) pairs for an LLM agent. The following will create a `data.jsonl` file that can be directly used for supervised finetuning!

```shell
python scripts/convert_to_sft.py \
    --nnetnav_dem_dir DATA_NNETNAV \
    --output_dir finetuning/data/processed/my_nnetnav_data
```
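The resulting `data.jsonl` holds one (input, output) pair per line. As a rough sketch of the format (the field names and contents below are illustrative, not necessarily what `convert_to_sft.py` emits):

```python
import json
import os
import tempfile

# One hypothetical SFT example: the input packs the task instruction together
# with the current observation; the output is the agent's next action.
# Field names are illustrative, not necessarily what convert_to_sft.py emits.
example = {
    "input": "Instruction: upvote the top post\nObservation: [1] RootWebArea ...",
    "output": "click [1]",
}

path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
with open(path, "w") as f:
    f.write(json.dumps(example) + "\n")

# A .jsonl file is consumed line-by-line, one JSON record per line:
with open(path) as f:
    records = [json.loads(line) for line in f]
print(records[0]["output"])  # -> click [1]
```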
If you correctly followed all the steps till now, you should be able to train your own local Llama web-agent. To do this, first `cd finetuning`. Then, launch:

```shell
./scripts/finetune_with_accelerate.sh
```

And let the GPUs go brr. The script is configured for my personal setup; feel free to modify it for your own (e.g. change `CUDA_VISIBLE_DEVICES`, `NUM_GPUS`, and `output_dir`).
First, launch a vLLM server with the new local Llama model:

```shell
vllm serve /path/to/local/llama --api-key smurty-llama-8b-instruct
```
To evaluate on WebArena, first reset the WebArena environment (check out WebArena setup for this). Then run:
```shell
python evaluation/eval_webarena.py \
    --model /path/to/local/llama \
    --result_dir llama_8b_nnetnav \
    --instruction_path src/agent/prompts/jsons/p_cot_llama_action_history.json \
    --test_end_idx 800
```
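`evaluation/get_per_domain_acc.py` breaks results down by website. The aggregation it performs is essentially the following (a sketch over hypothetical `(domain, success)` records; the real script's I/O format may differ):

```python
from collections import defaultdict

def per_domain_accuracy(results):
    """results: iterable of (domain, success) pairs -> {domain: accuracy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for domain, success in results:
        totals[domain] += 1
        hits[domain] += int(success)
    return {d: hits[d] / totals[d] for d in totals}

results = [("gitlab", True), ("gitlab", False), ("reddit", True)]
print(per_domain_accuracy(results))  # -> {'gitlab': 0.5, 'reddit': 1.0}
```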
Contributions are welcome! If you encounter any issues, feel free to open a GitHub issue or submit a pull request.
If you use NNetNav in your work, please cite us using the following BibTeX entry:
```bibtex
@article{murty2024nnetscape,
  title={NNetscape Navigator: Complex Demonstrations for Web Agents Without a Demonstrator},
  author={Murty, Shikhar and Bahdanau, Dzmitry and Manning, Christopher D},
  journal={arXiv preprint arXiv:2410.02907},
  year={2024}
}
```