
Jan 25 2023

Learning with queried hints

Posted by Sreenivas Gollapudi, Senior Staff Research Scientist, and Kostas Kollias, Staff Research Scientist, Google Research, Algorithms & Optimization Team

In many computing applications the system needs to make decisions to serve requests that arrive in an online fashion. Consider, for instance, the example of a navigation app that responds to driver requests. In such settings there is inherent uncertainty about important aspects of the problem. For example, the preferences of the driver with respect to features of the route are often unknown and the delays of road segments can be uncertain. The field of online machine learning studies such settings and provides various techniques for decision-making problems under uncertainty.

A navigation engine has to decide how to route a user's request. The satisfaction of the user will depend on the (uncertain) congestion of the two routes and the user's unknown preferences over various features, such as how scenic, safe, etc., the route is.

A well-known problem in this framework is the multi-armed bandit problem, in which the system has a set of n available options (arms) from which it is asked to choose in each round (user request), e.g., a set of precomputed alternative routes in navigation. The user’s satisfaction is measured by a reward that depends on unknown factors such as user preferences and road segment delays. An algorithm’s performance over T rounds is compared against the best fixed action in hindsight by means of the regret (the difference between the reward of the best arm and the reward obtained by the algorithm over all T rounds). In the experts variant of the multi-armed bandit problem, the rewards of all arms are observed after each round, not just the reward of the arm played by the algorithm.

An instance of the experts problem. The table presents the rewards obtained by following each of the 3 experts at each round t = 1, 2, 3, 4. The best expert in hindsight (and hence the benchmark to compare against) is the middle one, with total reward 21. If, for example, we had selected expert 1 in the first two rounds and expert 3 in the last two rounds (recall that we need to select before observing the rewards of each round), we would have extracted reward 17, which would give a regret equal to 21 – 17 = 4.
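To make the arithmetic in this example concrete, here is a tiny Python snippet that computes the regret for a hypothetical reward table. The per-round values below were chosen only so that the totals match the caption (expert 2 accumulates 21, the example policy accumulates 17); they are not the numbers from the original figure.

rewards = [
    # expert 1, expert 2, expert 3  (hypothetical values, consistent with the caption)
    [5, 6, 2],   # round 1
    [4, 5, 3],   # round 2
    [3, 4, 5],   # round 3
    [2, 6, 3],   # round 4
]
totals = [sum(r[e] for r in rewards) for e in range(3)]
best_in_hindsight = max(totals)                                      # 21 (expert 2)
policy = [0, 0, 2, 2]                                                # expert 1 twice, then expert 3 twice
policy_reward = sum(rewards[t][e] for t, e in enumerate(policy))     # 17
print("regret:", best_in_hindsight - policy_reward)                  # 21 - 17 = 4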

These problems have been extensively studied, and existing algorithms can achieve sublinear regret. For example, in the multi-armed bandit problem, the best existing algorithms can achieve regret that is of the order √T. However, these algorithms focus on optimizing for worst-case instances, and do not account for the abundance of available data in the real world that allows us to train machine learned models capable of aiding us in algorithm design.

In “Online Learning and Bandits with Queried Hints” (presented at ITCS 2023), we show how an ML model that provides us with a weak hint can significantly improve the performance of an algorithm in bandit-like settings. Many ML models are trained accurately using relevant past data. In the routing application, for example, specific past data can be used to estimate road segment delays, and past feedback from drivers can be used to learn the quality of certain routes. Models trained with such data can, in certain cases, give very accurate feedback. However, our algorithms achieve strong guarantees even when the feedback from the model is in the form of a less explicit weak hint. Specifically, we merely ask that the model predict which of two options will be better. In the navigation application this is equivalent to having the algorithm pick two routes and query an ETA model for which of the two is faster, or presenting the user with two routes with different characteristics and letting them pick the one that is best for them. By designing algorithms that leverage such a hint we can:

  • Improve the regret of the bandits setting on an exponential scale in terms of dependence on T.
  • Improve the regret of the experts setting from order of √T to become independent of T. Specifically, our upper bound only depends on the number of experts n and is at most log(n).
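To make the kind of weak hint we assume concrete, the Python sketch below models the auxiliary model as a pairwise comparator: given two arm indices, it returns the one it believes is better. The function name, the use of true mean rewards as the comparison criterion, and the accuracy parameter are illustrative assumptions for this post, not details from the paper.

import random

def make_noisy_comparator(true_means, accuracy=0.9, rng=random.Random(0)):
    # A stand-in for the trained ML model: given two arm indices i and j,
    # return the one with the higher true mean reward with probability
    # `accuracy`, and the other one otherwise.  (The paper only assumes the
    # model can say which of two options is better; this noise model and the
    # use of true means are simplifying assumptions for illustration.)
    def hint(i, j):
        better, worse = (i, j) if true_means[i] >= true_means[j] else (j, i)
        return better if rng.random() < accuracy else worse
    return hint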

Algorithmic Ideas

Our algorithm for the bandits setting utilizes the well known upper confidence bound (UCB) algorithm. The UCB algorithm maintains, as a score for each arm, the average reward observed on that arm so far and adds to it an optimism parameter that becomes smaller with the number of times the arm has been pulled, thus balancing between exploration and exploitation. Our algorithm applies the UCB scores to pairs of arms, in order to utilize the available pairwise comparison model that can designate the better of two arms. Each pair of arms i and j is grouped as a meta-arm (i, j) whose reward in each round is equal to the maximum reward between the two arms. Our algorithm observes the UCB scores of the meta-arms and picks the pair (i, j) that has the highest score. The pair of arms is then passed as a query to the auxiliary ML pairwise prediction model, which responds with the better of the two arms. This response is the arm that is finally used by the algorithm.

The decision problem considers three candidate routes. Our algorithm instead considers all pairs of the candidate routes. Suppose pair 2 is the one with the highest score in the current round. The pair is given to the auxiliary ML pairwise prediction model, which outputs whichever of the two routes is better in the current round.
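A minimal Python sketch of this idea follows. It runs UCB over pairs of arms (“meta-arms”) and lets a pairwise hint oracle, such as the comparator sketched above, decide which of the two selected arms to actually play. Crediting the observed reward of the played arm to the chosen meta-arm is a simplification made here for brevity; the exact bookkeeping and guarantees are in the paper.

import math
from itertools import combinations

def ucb_with_pairwise_hints(arms, hint, horizon):
    # arms: list of zero-argument callables; arms[i]() samples a reward for arm i.
    # hint(i, j): returns whichever of the two arm indices the auxiliary model
    # believes is better.  This is an illustrative sketch, not the paper's
    # exact algorithm.
    pairs = list(combinations(range(len(arms)), 2))
    counts = {p: 0 for p in pairs}      # times each meta-arm was selected
    totals = {p: 0.0 for p in pairs}    # reward credited to each meta-arm
    history = []

    for t in range(1, horizon + 1):
        # UCB score of a meta-arm: empirical mean plus an optimism bonus that
        # shrinks as the pair is selected more often.
        def score(p):
            if counts[p] == 0:
                return float("inf")     # make sure every pair is tried once
            return totals[p] / counts[p] + math.sqrt(2 * math.log(t) / counts[p])

        i, j = max(pairs, key=score)    # pair with the highest UCB score
        k = hint(i, j)                  # query the ML model for the better arm
        reward = arms[k]()              # play the suggested arm, observe its reward
        counts[(i, j)] += 1
        totals[(i, j)] += reward        # credit the meta-arm (simplification)
        history.append((k, reward))
    return history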

Our algorithm for the experts setting takes a follow-the-regularized-leader (FtRL) approach, which maintains the total reward of each expert and adds random noise to each before picking the best for the current round. Our algorithm repeats this process twice, drawing fresh random noise each time and picking the highest-reward expert in each of the two iterations. The two selected experts are then used to query the auxiliary ML model, and the expert that the model identifies as the better of the two is the one played by the algorithm.
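The sketch below illustrates this experts-setting idea in Python. The choice of exponentially distributed noise, the function names, and the handling of the case where both draws select the same expert are assumptions made for this illustration; the paper specifies the actual regularization and analysis.

import random

def perturbed_leader_with_hint(n_experts, reward_stream, hint, rng=random.Random(0)):
    # Maintain cumulative rewards, perturb them with fresh random noise twice,
    # take the arg-max expert of each perturbation, and let the pairwise hint
    # decide which of the two to follow this round.  reward_stream yields one
    # length-n reward vector per round (full feedback, as in the experts setting).
    totals = [0.0] * n_experts
    picks = []
    for rewards in reward_stream:
        def perturbed_leader():
            noisy = [totals[e] + rng.expovariate(1.0) for e in range(n_experts)]
            return max(range(n_experts), key=lambda e: noisy[e])
        a, b = perturbed_leader(), perturbed_leader()   # two independent noisy draws
        chosen = hint(a, b) if a != b else a            # query the pairwise model when they differ
        picks.append(chosen)
        for e in range(n_experts):                      # all rewards are observed after the round
            totals[e] += rewards[e]
    return picks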

Results

Our algorithms utilize the concept of weak hints to achieve strong improvements in terms of theoretical guarantees, including an exponential improvement in the dependence of regret on the time horizon or even removing this dependence altogether. To illustrate how the algorithm can outperform existing baseline solutions, we present a setting where one of the n candidate arms is consistently marginally better than the remaining n-1 arms. We compare our ML probing algorithm against a baseline that uses the standard UCB algorithm to pick the two arms to submit to the pairwise comparison model. We observe that the UCB baseline keeps accumulating regret, whereas the probing algorithm quickly identifies the best arm and keeps playing it without accumulating regret.

An example in which our algorithm outperforms a UCB based baseline. The instance considers n arms, one of which is always marginally better than the remaining n-1.
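As a rough way to reproduce the flavor of this experiment, the snippet below reuses the sketches from earlier in the post on an instance with one marginally better arm. The number of arms, the horizon, the reward means, and the hint accuracy are all made-up parameters for illustration and are not taken from the paper.

import random

rng = random.Random(1)
n, horizon = 10, 5000
means = [0.5] * n
means[0] = 0.55                                     # one arm marginally better than the rest
arms = [lambda m=m: 1.0 if rng.random() < m else 0.0 for m in means]

hint = make_noisy_comparator(means, accuracy=0.95, rng=rng)
history = ucb_with_pairwise_hints(arms, hint, horizon)

best_mean = max(means)
pseudo_regret = sum(best_mean - means[k] for k, _ in history)
print(f"pseudo-regret of the hint-probing algorithm: {pseudo_regret:.1f}")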

Conclusion

In this work we explore how a simple pairwise comparison ML model can provide weak hints that prove very powerful in settings such as the experts and bandits problems. In our paper we further show how these ideas apply to more complex settings such as online linear and convex optimization. We believe our model of hints can have further interesting applications in ML and combinatorial optimization problems.

Acknowledgements

We thank our co-authors Aditya Bhaskara (University of Utah), Sungjin Im (University of California, Merced), and Kamesh Munagala (Duke University).


Republished from source: http://ai.googleblog.com/2023/01/learning-with-queried-hints.html via http://feeds.feedburner.com/blogspot/gJZg


