In this talk, Harsh Trivedi presents the paper "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents." The paper introduces a simulated world and a benchmark for agents that use tools and APIs to carry out complex day-to-day tasks across apps such as Amazon, Venmo, Todoist, Splitwise, and Gmail. Example tasks include 'Return my last Amazon-ordered shirt & buy it in one size larger' and 'I owe money to friends on Splitwise. Pay them on Venmo.' The tasks are challenging, often requiring 15+ API calls and 80+ lines of code, and each comes with robust programmatic evaluation.
The paper matters for progress in interactive coding and API use: it fills gaps left by existing benchmarks and exposes the limitations of state-of-the-art models.
Massive thanks to Harsh Trivedi for this insightful talk! Learn more about the paper at appworld.dev or follow him on X: @harsh3vedi.
Hello there, passionate AI enthusiasts! 🌟 We are 🐫 CAMEL-AI.org, a global coalition of students, researchers, and engineers dedicated to advancing the frontier of AI and fostering a harmonious relationship between agents and humans.
📘 Our Mission: To harness the potential of AI agents in crafting a brighter and more inclusive future for all. Every contribution we receive helps push the boundaries of what’s possible in the AI realm.
🙌 Join Us: If you believe in a world where AI and humanity coexist and thrive, then you’re in the right place. Your support can make a significant difference. Let’s build the AI society of tomorrow, together!