Academic literature on the topic 'Non-linear reward functions'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Non-linear reward functions.'

Journal articles on the topic "Non-linear reward functions"

1

Toro Icarte, Rodrigo, Toryn Q. Klassen, Richard Valenzano, and Sheila A. McIlraith. "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning." Journal of Artificial Intelligence Research 73 (January 11, 2022): 173–208. http://dx.doi.org/10.1613/jair.1.12440.

Abstract:
Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and as such support loops, sequences and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
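The reward-machine idea in this abstract can be illustrated with a very small finite state machine. The sketch below is not the authors' implementation: the class name `RewardMachine`, the event labels `coffee` and `office`, and the state names are all hypothetical, chosen only to show how a reward can depend on the history of events rather than on the current observation alone.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Minimal reward machine: a finite state machine that emits rewards.

    All names here are illustrative assumptions, not an API from the paper.
    """
    initial_state: str
    terminal_states: frozenset
    # Maps (machine_state, event_label) -> (next_machine_state, reward).
    transitions: dict = field(default_factory=dict)
    default_reward: float = 0.0

    def step(self, state, event):
        """Return (next_state, reward); unlisted events leave the state unchanged."""
        return self.transitions.get((state, event), (state, self.default_reward))

    def run(self, events):
        """Feed a sequence of event labels; return (final_state, total_reward)."""
        state, total = self.initial_state, 0.0
        for event in events:
            if state in self.terminal_states:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical task: fetch coffee first, then deliver it to the office.
coffee_task = RewardMachine(
    initial_state="u0",
    terminal_states=frozenset({"done"}),
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("done", 1.0),
    },
)

final_state, total_reward = coffee_task.run(["office", "coffee", "office"])
```

Reaching the office pays off only after coffee has been collected, so the same event yields different rewards depending on the machine state. That is the kind of non-Markovian, temporally extended specification the abstract refers to.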
2

Pirrone, Angelo, Andreagiovanni Reina, and Fernand Gobet. "Input-dependent noise can explain magnitude-sensitivity in optimal value-based decision-making." Judgment and Decision Making 16, no. 5 (September 2021): 1221–33. http://dx.doi.org/10.1017/s1930297500008408.

Abstract:
Recent work has derived the optimal policy for two-alternative value-based decisions, in which decision-makers compare the subjective expected reward of two alternatives. Under specific task assumptions, such as linear utility, linear cost of time and constant processing noise, the optimal policy is implemented by a diffusion process in which parallel decision thresholds collapse over time as a function of prior knowledge about average reward across trials. This policy predicts that the decision dynamics of each trial are dominated by the difference in value between alternatives and are insensitive to the magnitude of the alternatives (i.e., their summed values). This prediction clashes with empirical evidence showing magnitude-sensitivity even in the case of equal alternatives, and with ecologically plausible accounts of decision making. Previous work has shown that relaxing assumptions about linear utility or linear time cost can give rise to optimal magnitude-sensitive policies. Here we question the assumption of constant processing noise, in favour of input-dependent noise. The neurally plausible assumption of input-dependent noise during evidence accumulation has received strong support from previous experimental and modelling work. We show that including input-dependent noise in the evidence accumulation process results in a magnitude-sensitive optimal policy for value-based decision-making, even in the case of a linear utility function and a linear cost of time, for both single (i.e., isolated) choices and sequences of choices in which decision-makers maximise reward rate. Compared to explanations that rely on non-linear utility functions and/or non-linear cost of time, our proposed account of magnitude-sensitive optimal decision-making provides a parsimonious explanation that bridges the gap between various task assumptions and between various types of decision making.
3

Yu, Lingtao, Yongqiang Xia, Pengcheng Wang, and Lining Sun. "Automatic adjustment of laparoscopic pose using deep reinforcement learning." Mechanical Sciences 13, no. 1 (June 28, 2022): 593–602. http://dx.doi.org/10.5194/ms-13-593-2022.

Abstract:
Laparoscopic arm and instrument arm control tasks are usually accomplished by an operative doctor. Because of intensive workload and long operative time, this method not only disrupts the flow of the operation, but also increases operation risk. In this paper, we propose a method for automatic adjustment of laparoscopic pose based on vision and deep reinforcement learning. Firstly, based on the Deep Q Network framework, the raw laparoscopic image is taken as the only input to estimate the Q values corresponding to joint actions. Then, the surgical instrument pose information used to formulate reward functions is obtained through object-tracking and image-processing technology. Finally, a deep neural network adopted in the Q-value estimation consists of convolutional neural networks for feature extraction and fully connected layers for policy learning. The proposed method is validated in simulation. In different test scenarios, the laparoscopic arm can be automatically adjusted so that surgical instruments with different postures are in the proper position of the field of view. Simulation results demonstrate the effectiveness of the method in learning the highly non-linear mapping between laparoscopic images and the optimal action policy of a laparoscopic arm.
4

Okafor, Ekene G., Daniel Udekwe, Osichinaka C. Ubadike, Emmanuel Okafor, Paul O. Jemitola, and Mohammed T. Abba. "Photovoltaic System MPPT Evaluation Using Classical, Meta-Heuristics, and Reinforcement Learning-Based Controllers: A Comparative Study." Journal of Southwest Jiaotong University 56, no. 3 (June 30, 2021): 1–17. http://dx.doi.org/10.35741/issn.0258-2724.56.3.1.

Abstract:
Maximum power point tracking (MPPT) entails constraining photovoltaic (PV) modules to operate under a specified power condition. It has previously been shown that some meta-heuristic techniques often suffer from steady-state oscillations around maximum points and experience difficulty in adapting to environmental variations, such as irradiation and/or temperature. To address these limitations, this work proposed an adaptable reinforcement learning (RL) technique based on a novel deep deterministic policy gradient (DDPG) agent and a reward function. The actor–network top layer uses a sigmoid activation function and the critic–network contains bottleneck layers with non-uniform nodal distributions as well as exponential linear unit (ELU) activation functions in some of the layers. The DDPG-based RL method was compared with Particle Swarm Optimization (PSO) and Perturb-and-Observe (P&O) in order to determine the optimal duty-cycle command needed for controlling the PV modules' MPPT. All the investigated systems were implemented in MATLAB/Simulink. The results show that the proposed DDPG-based RL technique yielded higher tracking efficiency than all the other approaches. However, as the step change in irradiation at a constant temperature increases, the DDPG-based RL technique shows a decrease in tracking efficiency.
5

Kukreja, Vinay. "Hybrid fuzzy AHP–TOPSIS approach to prioritizing solutions for inverse reinforcement learning." Complex & Intelligent Systems, July 20, 2022. http://dx.doi.org/10.1007/s40747-022-00807-5.

Abstract:
Reinforcement learning (RL) techniques support building solutions for sequential decision-making problems under uncertainty and ambiguity. In RL, an agent with a reward function interacts with a dynamic environment to find an optimal policy. RL has several problems: the reward function must be specified in advance, design is difficult, and large complex problems are hard to handle. This led to the development of inverse reinforcement learning (IRL). IRL also suffers from many problems in real life, such as a lack of robust reward functions and ill-posedness, and different solutions have been proposed to solve these problems, such as maximum entropy and support for multiple rewards and non-linear reward functions. Eight major problems are associated with IRL, and eight solutions have been proposed to solve them. This paper proposes a hybrid fuzzy AHP–TOPSIS approach to prioritize the solutions while implementing IRL. Fuzzy Analytical Hierarchical Process (FAHP) is used to get the weights of identified problems. The relative accuracy and root-mean-squared error using FAHP are 97.74 and 0.0349, respectively. Fuzzy Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) uses these FAHP weights to prioritize the solutions. The most significant problem in IRL implementation is ‘lack of robust reward functions’, with a weight of 0.180, whereas the most significant solution in IRL implementation is ‘Supports optimal policy and rewards functions along with stochastic transition models’, with a closeness coefficient (CofC) value of 0.967156846.
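The closeness coefficient mentioned in this abstract comes from the TOPSIS ranking step. The sketch below shows the classical (crisp) TOPSIS calculation only; the paper uses a fuzzy variant with FAHP-derived weights, which is not reproduced here. The function name `topsis_closeness`, the example scores, and the equal weights are assumptions made for illustration.

```python
import math


def topsis_closeness(matrix, weights, benefit):
    """Classical (crisp) TOPSIS closeness coefficients.

    matrix:  rows are alternatives, columns are criteria (hypothetical scores).
    weights: one weight per criterion.
    benefit: True where larger is better, False where smaller is better.
    """
    n_crit = len(weights)
    # Vector-normalise each column; guard against an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    # Ideal-best and ideal-worst points, criterion by criterion.
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        denom = d_best + d_worst
        # If every alternative is identical, no ranking is possible.
        scores.append(d_worst / denom if denom > 0 else 0.5)
    return scores


# Three hypothetical alternatives scored on two benefit criteria.
scores = topsis_closeness([[3, 4], [1, 2], [2, 3]], [0.5, 0.5], [True, True])
```

A coefficient near 1 means the alternative lies close to the ideal-best point and far from the ideal-worst point, which is how a value such as the one reported in the abstract would be read.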
6

Lauri, Mikko, Joni Pajarinen, and Jan Peters. "Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement." Autonomous Agents and Multi-Agent Systems 34, no. 2 (June 10, 2020). http://dx.doi.org/10.1007/s10458-020-09467-6.

Abstract:
Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest when constant communication cannot be assumed. This is common in tasks involving information gathering with multiple independently operating sensor devices that may operate over large physical distances, such as unmanned aerial vehicles, or in communication limited environments such as in the case of autonomous underwater vehicles. In this paper, we frame the information gathering task as a general decentralized partially observable Markov decision process (Dec-POMDP). The Dec-POMDP is a principled model for co-operative decentralized multi-agent decision-making. An optimal solution of a Dec-POMDP is a set of local policies, one for each agent, which maximizes the expected sum of rewards over time. In contrast to most prior work on Dec-POMDPs, we set the reward as a non-linear function of the agents’ state information, for example the negative Shannon entropy. We argue that such reward functions are well-suited for decentralized information gathering problems. We prove that if the reward function is convex, then the finite-horizon value function of the Dec-POMDP is also convex. We propose the first heuristic anytime algorithm for information gathering Dec-POMDPs, and empirically prove its effectiveness by solving discrete problems an order of magnitude larger than previous state-of-the-art. We also propose an extension to continuous-state problems with finite action and observation spaces by employing particle filtering. The effectiveness of the proposed algorithms is verified in domains such as decentralized target tracking, scientific survey planning, and signal source localization.
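The negative Shannon entropy named in this abstract is a simple example of a reward that is non-linear, and convex, in a belief over states. The following is a minimal sketch under the assumption that the belief is a discrete probability vector; the function name `negative_entropy` and the example beliefs are illustrative and not taken from the paper.

```python
import math


def negative_entropy(belief):
    """Negative Shannon entropy (in nats) of a discrete belief.

    The value is 0 for a fully certain belief and most negative for a
    uniform belief, so maximising it rewards reducing uncertainty.
    """
    if any(p < 0 for p in belief) or abs(sum(belief) - 1.0) > 1e-9:
        raise ValueError("belief must be a probability distribution")
    # Terms with p == 0 contribute nothing (the limit of p * log p is 0).
    return sum(p * math.log(p) for p in belief if p > 0)


certain_a = [1.0, 0.0]
certain_b = [0.0, 1.0]
# An equal mixture of the two certain beliefs is the uniform belief.
mixed = [0.5 * a + 0.5 * b for a, b in zip(certain_a, certain_b)]

reward_certain = negative_entropy(certain_a)
reward_mixed = negative_entropy(mixed)
```

The mixed belief scores no higher than the average of the two certain beliefs, which is one instance of the convexity inequality that the paper's value-function result relies on.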
7

Oehrn, Carina R., Lena Molitor, Kristina Krause, Hauke Niehaus, Laura Schmidt, Lukas Hakel, Lars Timmermann, Katja Menzler, Susanne Knake, and Immo Weber. "Non-invasive vagus nerve stimulation in epilepsy patients enhances cooperative behavior in the prisoner’s dilemma task." Scientific Reports 12, no. 1 (June 17, 2022). http://dx.doi.org/10.1038/s41598-022-14237-3.

Abstract:
The vagus nerve constitutes a key link between the autonomic and the central nervous system. Previous studies provide evidence for the impact of vagal activity on distinct cognitive processes including functions related to social cognition. Recent studies in animals and humans show that vagus nerve stimulation is associated with enhanced reward-seeking and dopamine-release in the brain. Social interaction recruits similar brain circuits to reward processing. We hypothesize that vagus nerve stimulation (VNS) boosts rewarding aspects of social behavior and compare the impact of transcutaneous VNS (tVNS) and sham stimulation on social interaction in 19 epilepsy patients in a double-blind pseudo-randomized study with cross-over design. Using a well-established paradigm, i.e., the prisoner’s dilemma, we investigate effects of stimulation on cooperative behavior, as well as interactions of stimulation effects with patient characteristics. A repeated-measures ANOVA and a linear mixed-effects model provide converging evidence that tVNS boosts cooperation. Post-hoc correlations reveal that this effect varies as a function of neuroticism, a personality trait linked to the dopaminergic system. Behavioral modeling indicates that tVNS induces a behavioral starting bias towards cooperation, which is independent of the decision process. This study provides evidence for the causal influence of vagus nerve activity on social interaction.
8

Di Fiore, Francesco, and Laura Mainini. "NM-MF: Non-Myopic Multifidelity Framework for Constrained Multi-Regime Aerodynamic Optimization." AIAA Journal, January 1, 2023, 1–11. http://dx.doi.org/10.2514/1.j062219.

Abstract:
The exploration and trade-off analysis of different aerodynamic design configurations requires solving optimization problems. The major bottleneck to assess the optimal design is the large number of time-consuming evaluations of high-fidelity computational fluid dynamics (CFD) models, necessary to capture the non-linear phenomena and discontinuities that occur at higher Mach number regimes. To address this limitation, we introduce an original non-myopic multifidelity Bayesian framework aimed at including expensive high-fidelity CFD simulations for the optimization of the aerodynamic design. Our scheme proposes a novel two-step lookahead policy to maximize the improvement of the solution quality considering the rewards of future steps, and combines it with utility functions informed by the fluid dynamic regime and the information extracted from data, to wisely select the aerodynamic model to interrogate. We validate the proposed framework for the case of a constrained drag coefficient optimization problem of a NACA 0012 airfoil, and compare the results to other popular multifidelity and single-fidelity optimization frameworks. The results suggest that our strategy outperforms the other approaches, allowing to significantly reduce the drag coefficient through a principled selection of limited evaluations of the high-fidelity CFD model.
9

Thompson, Jason, Ken S. McAllister, and Judd Ethan Ruggill. "Onward Through the Fog: Computer Game Collection and the Play of Obsolescence." M/C Journal 12, no. 3 (July 15, 2009). http://dx.doi.org/10.5204/mcj.155.

Abstract:
In Mardi and a Voyage Thither, novelist Herman Melville writes of the peculiar and startling confluence of memory, objects, valuation, and disfigurement that mark the collector of obsoletia. The story’s antiquary is the picture of perverse depletion, with a body “crooked, and dwarfed, and surmounted by a hump, that sat on his back like a burden” (328), his hut in shambles, and “the precious antiques, and curios, and obsoletes”—the objects of his collection—“strewn about, all dusty and disordered” (329). This unkempt display cum impromptu museum turns out to present a mere fraction of the curator’s collection, the rest of which is host to countless subtle molds and ravenous worms in a vast catacomb below ground. Traversing this darkened vault, one visitor says, is “like going down to posterity” (332). As inveterate accumulators ourselves, we can certainly relate to Mardi’s "extraordinary antiquarian": pursuing obsolete things has transformed us too (though hopefully not quite so hideously), as well as the work we do and the spaces we do it in. Since 1999, we have been collecting—and subsequently lending out to scholars the world over—computer games, systems, and game-related paraphernalia. By recent estimates, our Learning Games Initiative Archive contains more than 20,000 artifacts, from Venezuelan Pong clones to Mario-themed lollipops. Archival work at this scale and with this diversity is not easy, and it constantly butts up against a host of intractable questions. For example, what does it mean to isolate a thing that no longer has its original value but has taken on a new one? When researchers hold such transmuted artifacts up for inspection, what are they looking for and how might archivists help them to find it? 
Is the primary work of computer game archivists (and indeed archivists of all types) to protect artifacts from the elements, to enjoin them upon their kin, and to guard over the collection for the sake of some abstract posterity, or is it something more collaborative and communal? Finally, is it possible for research-oriented collectors to engage the process of collection without suffering the deformations of skin and soul (not to mention pocketbook) that often plague the more solipsistic acquirer? We offer this article as an entrée to these questions, as a way to begin to attend to some of the theoretical and practical complexities of obsolescence and its negotiation. We do so primarily by focusing on where those complexities intersect with computer games, the new media we collect and study.

Circuitous Obsolescence

Melville finished Mardi in 1849, almost fifty years after Joseph Marie Jacquard invented the programmable loom and twelve years after Charles Babbage theorised the possibility of a programmable mechanical computer. The subsequent history of the development of the modern computer and its applications (including computer games) typically gets told as a narrative of technological novelty followed by ineluctable obsolescence: Herman Hollerith’s tabulator to Konrad Zuse’s Z3 to the US Army’s Electronic Numerical Integrator and Computer (ENIAC) and so on. This kind of monumentalised and narrativised history exemplifies an onward march much fetishised by the marketplace: once introduced, a given technology will be developed then updated, upgraded, and improved, inevitably producing a staggering wake of tired-and-true archaeological assemblages. These cast-offs, however, are only useless to those who prefer to consume at the cutting edge, and even that is an illusory experience. 
Like a well-designed knife whose business end is supported by the stout spine behind it, the edgiest of today’s computer games and peripherals—from the most non-directive sandbox titles to the most obscene add-ons—are merely vanguards to a half-century of industrial history. In etymological terms, “obsolete” captures the conundrum well. A combination of ob (away) and solere (to be used to, accustomed), the word “obsolete” has at least four distinct meanings: “no longer used or practiced”; “worn away, dilapidated, atrophied”; “indistinct, hardly perceptible, vestigial”; and as a noun, “A thing which is out of date or has fallen into disuse.” In each usage, present and past are both integral and palpable. As archivists, we appreciate this temporal distillation because it illustrates how seamless yet discernable is the paradoxical binding between old and new. “Obsolescence” thus functions like a rhetorical ouroboros, ensuring that reflection on the antique reveals the avant-garde and vice-versa. Consider, for example, the Atari 2600 paddle. Compared to a PlayStation 3 controller, with its variety of buttons, sticks, and pads—and the re-mapability of all these input elements—the single potentiometer and button of the paddle seem downright antiquated. Moreover, because Atari hardware in general has largely faded from mainstream use (though it has a remarkable half-life in collectible markets), the paddle is mostly neglected by contemporary players and pundits alike, in the process revealing another obsolescence: the static state that accompanies disuse—the waiting nonlife of discarded technology. The paddle's first obsolescence—the supplantation of the state of the art—signifies a moment of loss. 
An obsolete computer game controller is one that no longer holds or is capable of provoking the novelty necessary to stake a claim on wonder, or at least that part of wonder engendered in the playing of the newest game on the newest console—the farthest distance from technological obsolescence. The paddle's second obsolescence—disuse—signifies potential: when a newer system (e.g., PlayStation 3) supersedes an older one (e.g., Atari 2600), the older one will often sit like a fact in benighted spaces such as attics, thrift stores, garages, and closets—all prime hunting grounds for computer game collectors. The ephemera that for most people drift toward oblivion get picked up by archivists and cleaned off, catalogued, stored, studied, used, and reused. Trash becomes treasure, obsolescence newness and utility. And yet, obsolescence is not solely in the eye of the beholder, as it were; it is also in the hand, which further complicates the concept. Because obsolescence calls on the familiar in a pejorative sense—the obsolete thing has become too familiar (it now lacks novelty and surprise)—it is easy to overlook the necessity of familiarity (and thus obsolescence) to computer game development and play. After all, play demands familiarity as well as novelty; deeply complex and satisfying tasks—the kind the best play sets out and rewards generously—can only be accomplished with a level of mastery, of skill born of familiarity born of practice. Just as metaphors, in order to be successful, must merge the known with the unknown in an instantaneous insight that reveals fresh understanding, so too must computer games blend the tried and true with a twist to provoke profound and prolonged play. Computer games must always be the same, only different, familiar enough to be recognisable as forms, but new enough to create wonder as ludica. 
In the elegant prose of game scholar Roger Caillois,

[games] must be like the leaves on the trees which survive from one season to the next and remain identical. Games must be ever similar to animal skins, the design on butterfly wings, and the spiral curves of shell fish which are transmitted unchanged from generation to generation. However, games do not have this hereditary sameness. They are innumerable and changeable. They are clad in thousands of unequally distributed shapes, just as vegetable species are, but infinitely more adaptable, spreading and acclimating themselves with disconcerting ease. (81)

All this is what makes computer games so difficult to collect and study, to preserve and produce. They are always already both obsolete and pioneering.

Memory as the Arbiter of Obsolescence

Despite its plasticity, the concept of obsolescence offers a kind of security to its invoker: in theory, functionality and use follow a clean, linear progression. Accordingly, obsolescence can be seen not only as a thin pretext to justify a rabid consumerist desire for newness, but also as a brief memorial, a marker of passing, one that reaffirms an orderly universe and transfers a degree of security to those who witness its passing. As Aristotle explains, “the criterion of ‘security’ is the ownership of property in such places and under such conditions that the use of it is in our power; and it is ‘our own’ if it is in our power to dispose of it or keep it” (1341). Security is thus the power of alienation, and calling on the concept of obsolescence encourages the exercise of that power. Indeed, as theorist and collector Walter Benjamin argues, “The most profound enchantment for the collector is the locking of individual items within a magic circle in which they are fixed as the final thrill, the thrill of acquisition, passes over them” (62). 
This magic circle is really no different from the one play sociologist Johann Huizinga uses to describe the “temporary worlds” that can be carved out of the workaday one, worlds created and encapsulated by the rules and possibilities of play. There is, in fact, a powerful parallel between play and collecting, with each territorialising and deterritorialising the practice of materiality and its pleasures. For the collector, the magic circle not only encompasses the library or archive, but potentially the world, harboring as it does the possibility of a "complete collection," however obscured or damaged such a collection might be. This magic circle can also be constructed anywhere, and out of anything because the collector is a playful, nearly absurd, hunter of things whose best work occurs on the road: “I have made my most memorable purchases on trips, as a transient. Property and possession belong to the tactical sphere” (Benjamin 64). For computer game collectors especially, the circumference of the magic circle grows not with the size of a collection but with the imaginative ability to learn how to unsee what she or he has been taught to see as obsolete by industry and popular culture both: industrial, ludic, aesthetic, narratological, and ideological design. It is thus memory—in its alembic ability to make and unmake, to be made and unmade—that is the ultimate arbiter of obsolescence. From this perspective, all that is obsolete fashions a kind of infinite immemorial compendium of “what has been” that makes “what is” possible. Benjamin calls this a “magic encyclopedia,” an expansive tome for the archivist that contains “The period, the region, the craftsmanship, the former ownership—for a true collector the whole background of an item” that constitutes its being both in and beyond its present time and place (62). 
Vivacious Obsolescence

Memory notwithstanding, the crux of computer game collection (the problematic that makes both body and mind “crooked and dwarfed”) is the timelessness of play itself. What is "old" play, for example? The kind found in Missile Command (Atari, 1980) or other golden age arcade game? Perhaps, but is this play still old when it is brought to a new platform and new audiences (e.g., http://macmost.com/iphonegames/MissileCommand.html)? What of the computer game consoles that facilitate play? Surely they grow obsolete, replaced every several years by newer, more advanced incarnations. And yet in the homebrew, retro, and collectible markets, it is the new things, the new playables that are strangely obsolete and undesirable. They are merely extant, whereas reconfigurations of old machines require imaginative new remediations in order to work and to satisfy. Older technologies and the play they enable are what are very much alive and on the cusp; these things, not their newer cousins, are the source of interest, value, experimentation, discourse, and play, that is, they are the cutting edge. So what, then, does it mean to collect and study obsoletia when the play intrinsic to them thwarts obsolescence at every turn? For computer game collectors, the answer is that ultimately there can be no difference between fad and fashion, prototype and stereotype. Obsolescence is a dynamic and incomplete designation because computer games do not age in quite the same way as do other things. The power and potential of a game archive is therefore overwhelming as well as invigorating, offering the rare but challenging chance not only to tame something wild (temporarily at least), but also to perform resurrections, bringing the old dead into new life. Computer game archivists thus trade daily in vivacious obsolescence, reveling in the defiance of moribundity in which their artifacts partake. Still, this liveliness creates other problems. 
How, for instance, does one organise the contents of an archive that can be categorised in so many ways (e.g., age, developer, play styles, content genres, system, audio-visual aesthetic, and so on)? What is the appropriate taxonomic way of seeing technological and ludic history when the artifacts that embody this history are constantly being made and remade, not only by scholars and historians, but also by subcultures, franchise agents, and myriad avenues of pop culture reappropriation? What does it mean for knowledge work when newness and obsolescence persist in equal measure in the same artifact? The answers to such questions are, of course, only ever temporary and never more than rickety. In the words of Benjamin, “[T]his or any other procedure is merely a dam against the springtide of memories which surges toward any collector as he contemplates his possessions” (61). The art of collection itself is one of defiance in the face of insurmountable complexity and multiplying articulations, which in the end is perhaps the real pleasure of collecting. The trial before computer game collectors is to have a sturdy boat at the ready, one capable of enduring that surging springtide to which Benjamin refers, when the well-disciplined dam of categorical judgments and explanatory structures (itself always already obsolete) inevitably breaks apart.

References

Aristotle. Rhetoric. Trans. W. Rhys Roberts. In The Basic Works of Aristotle. Ed. Richard McKeon. New York: Random House, 2001. 1325-1451.
Benjamin, Walter. Illuminations. London: Pimlico, 1999.
Caillois, Roger. Man, Play and Games. Urbana: University of Illinois, 2001.
Huizinga, Johan. Homo Ludens: A Study of the Play Element in Culture. Boston: Beacon, 1955.
Melville, Herman. Mardi and a Voyage Thither. Ed. Nathalia Wright. Putney: Hendricks House, 1990.
10

Grandinetti, Justin Joseph. "A Question of Time: HQ Trivia and Mobile Streaming Temporality." M/C Journal 22, no. 6 (December 4, 2019). http://dx.doi.org/10.5204/mcj.1601.

Abstract:
One of the commonplace and myopic reactions to the rise of televisual time-shifting via video-on-demand, DVD rental services, illegal downloads, and streaming media was to decree “the death of the communal television experience”. For many, new forms of watching television unconstrained by time-bound, regularly scheduled programming meant the demise of the predominant form of media liveness that existed commercially since the 1950s. Nevertheless, as time-shifting practices evolved, so have attendant notions of televisual temporality—including changing forms of liveness, shared experience, and the plastic and flexible nature of new viewing patterns (Bury & Li; Irani, Jefferies, & Knight; Turner; Couldry). Although these temporal conceptualisations are relevant to streaming media, in the few years since the launch of platforms such as Netflix, Hulu, and Amazon, what it means “to stream” has rapidly expanded. Social media platforms like Twitter, Facebook, Snapchat, YouTube, and TikTok allow users to record, share, and livestream their own content. Not only does social media add to the growing definition of streaming, but these streaming interactions are also predominately mobile (Munson; Droesch). Taken together, a live and social experience of time via audio-visual media is not lost but is instead reactivated through the increasingly mobile nature of streaming. In the following article, I examine how mobile streaming media practices are part of a construction of shared temporality that both draws upon and departs from conceptualisations of televisual and fixed streaming liveness. Accordingly, HQ Trivia—a mobile-specific streaming gameshow app launched in August 2017—demonstrates novel attempts at reimagining the temporally-bound live televisual experience while simultaneously offering new monetisation strategies via mobile streaming technologies. 
Through this example, I argue that pervasive Web-connectivity, streaming platforms, data collection, mobile devices, and mobile streaming practices form arrangements of valorisation that are temporally bound yet concomitantly mobile, allowing new forms of social cohesion and temporal control.

A Brief History of Televisual Temporality

Time is at once something infinitely mysterious and inherently understood. As John Durham Peters concisely explains, “time lies at the heart of the meaning of our lives” (175). It is precisely due to the myriad ontological, phenomenological, and epistemological dimensions of time that the subject has long been the focus of critical inquiry. As part of the so-called spatial turn, Michel Foucault argues that theory formerly treated space as “the dead, the fixed, the undialectical, the immobile. Time, on the contrary, was richness, fecundity, life, dialectic” (70). While scholarly turns toward space and later mobility have shifted the emphasis of critical inquiry, time is not rendered irrelevant. For example, Doreen Massey defines spaces as the product of interrelations, as a sphere of possibility and heterogeneous multiplicity, and as always under construction (9). Critical to these conceptualisations of space, then, is the element of time. Considering space not as a static container in which individual actors enter and leave but instead as a production of ongoing becoming demonstrates how space, mobility, and time are inexorably intertwined. Time, space, and mobility are also interrelated when it comes to conversations of power. Judy Wajcman and Nigel Dodd contend that temporal control is related to dynamics of power, in that the powerful are fast and the powerless slow (3). Questions of speed, mobility, and the control of time itself, however, require attention to the media that help construct time.
Aspects of time may always escape human comprehension, yet, “Whatever time is, calendars and clocks measure, control, and constitute it” (Peters 176). Time is a sociotechnical construction, but temporal experience is bound up in more than just time-keeping apparatuses. Elucidated by Sarah Sharma, temporalities are not experienced as uniform time, but instead produced within larger economies of labor and temporal worth (8). To reach a more productive understanding of temporalities, Sharma offers power-chronography, which conceptualises time as experiential, political, and produced by social differences and institutions (15). Put another way, time is an experience structured by the social, economic, political, and technical toward forms of social cohesion and control.

Time has always been central to the televisual. Though it is often placed in a genealogy with film, William Uricchio contends that early discursive imaginings and material experiments in television are more indebted to technologies such as the telegraph and telephone in promising live and simultaneous communication across distances (289-291). In essence, film is a technology of storage, related to 18th- and 19th-century traditions of conceptualising time as fragmented; the televisual is instead associated with the “contrasting notion of time conceived as a continuous present, as flow, as seamless” (Uricchio 295). Responding to Uricchio, Doron Galili asserts that the relationship between film and television is dialectical and not hierarchical. For Galili, the desire for simultaneity and storage oscillates – both are present, both remain separate from one another. It is the synthesis of simultaneity and storage that allows both to operate together as a technological and mediated vision of mastering time.
Despite disagreements regarding how best to conceptualise early film and television, it is clear that the televisual furthered a desire for spatial and temporal coordination, liveness, and simultaneity.

In recent years, forms of televisual “time-shifting” allow viewers to escape temporally-bound scheduling. In what is commonly periodised as TVIII, the proliferation of digital platforms, video-on-demand, legal and illegal downloads, DVD players, and streaming media displaced more traditional forms of watching live television (Jenner 259). It is important to note that while streaming is often related to the televisual, the televisual-to-streaming shift is not a clean linear evolution. Televisual-style content persists in streaming, but streaming might be better defined as matrix media, where content is made available away from the television set (Jenner 260). Regardless, the rise of streaming media platforms such as Netflix, Hulu, and Amazon Prime is commonly framed as part of televisual temporal disruption, as scholars note the growing plurality of televisual-type viewing options (Bury and Li 594). Further still, streaming platforms are often defined as television, a recent example occurring when Netflix CEO Reed Hastings called the service a “global Internet TV network” in 2016.

The changing landscape of streaming and time-shifting notwithstanding, individuals remain aware of the viewing patterns of others, and this anticipation impacts the coordination and production of the collective television experience (Irani, Jeffries, and Knight 621). Related to this goal is how liveness connects viewers to shared social realities as they are occurring and helps to create a collective sense of time (Couldry 355-356). This shared experience of the social is still readily available in a time-shifted landscape, in that even shows released via an all-at-once format (for example, Netflix’s Stranger Things) can rapidly become a cultural phenomenon.
Moreover, livestreaming has become commonplace as an alternative to cable television for live events and sports, along with new uses for gaming and social media. As Graeme Turner notes, “if liveness includes a sense of the shrinking temporal gap between oneself and the rest of the world, as well as a palpable sense of immediacy, then this is something we can find as readily online as in television”. To this end, the claim that streaming media is a harbinger of the “death of liveness” is far too simplistic. Liveness vis-à-vis streaming is not something that ceases to exist – shared temporal experiences simply occur in new forms.

HQ Trivia

One such strategy to reactivate a more traditional form of televisual liveness through streaming is to make streaming more social and mobile. Launched in August 2017, HQ Trivia (later retitled HQ Trivia and Words) requires users, known as HQties, to download the app and log in at 3.00 pm and 9.00 pm Eastern Standard Time to join a live gameshow. In each session, gameshow hosts ask a series of 12 single-elimination questions with three answer choices. Any users who successfully answer all 12 questions correctly split the prize pool for the show, which ranges from $250 to $250,000. Though these monetary prizes appear substantial, the per-person winnings paid out are often quite low based on the number of winners splitting the pool. In the short time since its inception, HQ has had high and low audience participation numbers and has also spawned a myriad of imitators, including Facebook’s “Confetti” gameshow.

Mobile streaming via trivia gameshows is a return to forms of televisual liveness and participation often disrupted by the flexible nature of streaming. HQ’s twice-a-day events require users to re-adapt to temporal constraints to play and participate.
Just as intriguing is that “HQ sees its biggest user participation – and largest prizes – on Sundays, especially if games coincide with national events, such as holidays, sports games or award shows” (Alcantara). Though it is difficult to draw conclusions from this correlation, the fact that HQ garners more players and attention during events and holidays complicates notions of mobile trivia as a primary form of entertainment. It is possible, perhaps, that HQ is an evolution of the so-called second screen experience, in which a mobile device is used simultaneously with a television. As noted by Hye-Jin Lee and Mark Andrejevic, the rise of the second screen often enables real-time monitoring, customisation, and targeting that is envisioned by the promoters of the interactive commercial economy (41). Second screens are a way to reestablish live-viewing and, by extension, advertising through the importance of affective economies (46). Affect, or a preconscious structure of feeling, is critical to platform monetisation, in that the capture of big data requires an infrastructuralisation of desire – in streaming media often a desire for entertainment (Cockayne 6). Through affective capture, users become willing to repeat certain actions via love for and connection to a platform. Put another way, big data collection and processing is often the central monetisation strategy of platforms, but capturing this data requires first cultivating user attachment and repeat actions.

To this end, many platforms operate by encouraging as much user engagement as possible. HQ certainly endeavors for strong affective investment by users (a video search for “HQ Trivia winner reactions” demonstrates the often-zealous nature of HQties, even when winning relatively low amounts of prize money). However, HQ departs from the typical platform streaming model in that engagement with the app is limited to two games per day.
These comparatively diminutive temporal appointments have substantial implications for HQ’s strategies of valorisation, or the process of apprehending and making productive the user as laborer in new times and spaces (Franklin 13). Media theorists have long acknowledged the “work of watching” television, in which the televisual is “a real economic process, a value-creating process, and a metaphor, a reflection of value creation in the economy as a whole” (Jhally and Livant 125). Televisual monetisation is predominately based on the advertising model, which functions to accelerate the selling of commodities. This configuration of capital accumulation is enabled by a lineage of privatisation of broadcasting; television is heralded as a triumph of deregulation, but in practice is an oligopolistic, advertising-supported system of electronic media aided by government policies (Streeter 175). By contrast, streaming media accomplishes capitalistic accumulation through the collection, storage, and processing of big data via cloud infrastructure. Cloud infrastructure enables unprecedented storage and analytic capacity, and is heavily utilised in streaming media to compress and transmit data packets.

Although the metaphor of the cloud situates user data as ephemeral and free, these infrastructures are better conceptualised as a “digital enclosure”, which invokes the importance of privatisation and commodification, as well as the materiality and spatiality of data collection (Andrejevic 297). As such, streaming monetisation is often achieved through the multitude of monetisation possibilities that occur through the collection of vast amounts of user data.
Streaming and mobile streaming, then, are similar to the televisual in that these processes monetise the work of watching; yet, the ubiquitous data collection of streaming permits more efficient forms of computational commodification.

Mobile streaming media continues the lineage of ubiquitous immaterial labor – a labor form that can be, and commonly is, accomplished by “filling the cracks” of non-work time with content engagement and accompanying data collection. HQ Trivia, nevertheless, functions as a notable departure from this model in that the company has made public claims that the platform will not utilise the myriad user identification and location data collected by the app. Instead, HQ has engaged in brand promotions that include Warner Brothers movies Ready Player One and Rampage, along with a brief Nike partnership (Feldman; Perry). Here, mobile and temporal valorisation occurs through monetisation strategies more akin to traditional televisual advertising than the techniques of big data collection often utilised by platforms. Whether or not eschewing the proclivity toward monetising user data for a more traditional form of brand promotion will yield rewards for HQ remains to be seen. Nonetheless, this return to more conventional televisual monetisation strategies sets HQ apart from many other applications that rely on data collection and subsequent sale of user data for targeted advertisements.

Affective attachment and the transformation of leisure times through mobile devices are critical not just to value generation, but also to the relationship between mobile streaming and temporal and mobile control. As previously noted, Sharma elucidates that time is part of biopolitical forms of control, produced and experienced differently. Nick Couldry echoes these sentiments, in that there are rival forms of liveness stemming from a desire for connectivity, and that these “types of liveness are now pulling in different directions” (360).
Despite common positionings, the relationship between television and streaming media is not a neat linear evolution – television, streaming, and mobile streaming continue to operate both side-by-side and in conjunction with one another. The experience of time, nevertheless, operates differently in these media forms. Explained by Wendy Chun, television structures temporality through steady streams of information, the condensation of time that demands response in crisis, and the most powerful moments of “touching the real” via catastrophe (74). New media differs by instead fostering crisis as the norm, in that “crises promise to move users from banal to the crucial by offering the experience of something like responsibility; something like the consequences and joys of ‘being in touch’” (Chun 75). New media crisis is often felt via reminders and other increasingly pervasive prompts that require an immediate user response. HQ differs from other forms of streaming and mobile streaming in that the plastic and flexible nature of viewing is replaced by mobile notifications and reminders that one must be ready for twice-daily games or risk losing a chance to win.

In contributing to a sense of new media crisis, HQ fosters novel expectations for the mobile streaming subject. Through temporally-bound mobile livestreaming, “networked smart screens are the mechanism by which time and space will be both overcome and reanimated” as the “real world” is transformed into a magical landscape of mobile desire (Oswald and Packer 286). There is a double-edged element to this transformation, however, in that the power of HQ Trivia is the ability to reanimate space through a promise that users are able to win substantial prize money only if one remembers to tune in at certain times.
Within HQ Trivia, the much-emphasised temporal freedom of streaming time-shifting is eschewed for more traditional forms of televisual liveness; at the same time, smartphone technologies permit mobile on-the-go forms of engagement. Accordingly, a more traditional televisual simultaneity reemerges even as the spaces of streaming are untethered from the living room. It is in this reemphasis of liveness and sharedness that the user is simultaneously empowered vis-à-vis mobile devices and made a mobile streaming subject through new temporal expectations and forms of monetisation.

As mobile streaming becomes increasingly pervasive, new experimental applications jockey for user attention and time. HQ Trivia’s model of eschewing data collection for more traditional televisual monetisation represents attempts to recreate mobile media engagement not through individual isolated audio-visual practices, but instead through a live and mobile experience. Consequently, HQ Trivia and other temporally-bound gameshow apps demonstrate a reimagined live televisual experience, and, in turn, a monetisation of mobile engagement through affective investment.

References

Alcantara, Chris. “Diving into HQ Trivia: The Toughest Rounds, the Best Time to Play and How Some Users Beat the Odds.” The Washington Post 5 Mar. 2018. <http://www.washingtonpost.com/graphics/2018/business/hq-trivia/?utm_term=.02dc389ae3a9>.
Andrejevic, Mark. “Surveillance in the Digital Enclosure.” The Communication Review 10.4 (2007): 295-317.
Bury, Rhiannon, and Johnson Li. “Is It Live or Is It Timeshifted, Streamed or Downloaded? Watching Television in the Era of Multiple Screens.” New Media & Society 17.4 (2013): 592-610.
Chun, Wendy Hui Kyong. Updating to Remain the Same: Habitual New Media. Cambridge: MIT Press, 2017.
Cockayne, Daniel G. “Affect and Value in Critical Examinations of the Production and ‘Prosumption’ of Big Data.” Big Data & Society 3.2 (2016): 1-11.
Couldry, Nick. “Liveness, ‘Reality,’ and the Mediated Habitus from Television to the Mobile Phone.” Communication Review 7.4 (2004): 353-361.
Droesch, Blake. “More than Half of US Social Network Users Will Be Mobile-Only in 2019.” EMarketer 26 Apr. 2019. <http://www.emarketer.com/content/more-than-half-of-social-network-users-will-be-mobile-only-in-2019>.
Franklin, Seb. Control: Digitality as Cultural Logic. Cambridge: MIT Press, 2015.
Galili, Doron. “Seeing by Electricity: The Emergence of Television and the Modern Mediascape, 1878–1939.” PhD dissertation. Chicago: U of Chicago, 2011.
Irani, Lilly, Robin Jeffries, and Andrea Knight. “Rhythms and Plasticity: Television Temporality at Home.” Personal and Ubiquitous Computing 14.7 (2010): 621-632.
Jenner, Mareike. “Is This TVIV? On Netflix, TVIII and Binge-Watching.” New Media & Society 18.2 (2014): 257-273.
Jhally, Sut, and Bill Livant. “Watching as Working: The Valorization of Audience Consciousness.” Journal of Communication 36.3 (1986): 124-143.
Lee, Hye-Jin, and Mark Andrejevic. “Second-Screen Theory: From Democratic Surround to the Digital Enclosure.” Connected Viewing: Selling, Streaming & Sharing Media in the Digital Age. Eds. Jennifer Holt and Kevin Sanson. New York: Routledge, 2014. 40-62.
Massey, Doreen. For Space. London: Sage, 2005.
Munson, Ben. “More than Half of Global Video Views Start on Mobile.” Fierce Video 24 Sep. 2019. <https://www.fiercevideo.com/video/more-than-half-global-video-views-start-mobile-report-says>.
Oswald, Kathleen, and Jeremy Packer. “Flow and Mobile Media.” Communication Matters: Materialist Approaches to Media, Mobility and Networks. Eds. Jeremy Packer and Stephen B. Crofts Wiley. New York: Routledge, 2012. 276-287.
Perry, Erica. “Here's How HQ Trivia Is Finally Monetizing Its Massive Audience.” Social Media Week 29 Mar. 2018. <http://socialmediaweek.org/blog/2018/03/heres-how-hq-trivia-is-finally-monetizing-its-massive-audience/>.
Peters, John Durham. The Marvelous Clouds: Toward a Philosophy of Elemental Media. Chicago: U of Chicago P, 2016.
Sharma, Sarah. In the Meantime: Temporality and Cultural Politics. Durham: Duke UP, 2014.
Sterling, Greg. “Nearly 80 Percent of Social Media Time Now Spent on Mobile Devices.” Marketing Land 4 Apr. 2016. <http://marketingland.com/facebook-usage-accounts-1-5-minutes-spent-mobile-171561>.
Streeter, Thomas. Selling the Air. Chicago: U of Chicago P, 1996.
Turner, Graeme. “'Liveness' and 'Sharedness' Outside the Box.” Flow Journal 8 (2011). <https://www.flowjournal.org/2011/04/liveness-and-sharedness-outside-the-box/>.
Uricchio, William. “Television's First Seventy-Five Years: The Interpretive Flexibility of a Medium in Transition.” The Oxford Handbook of Film and Media Studies. Ed. Robert Kolker. Oxford: Oxford UP, 2008. 286-305.
Wajcman, Judy, and Nigel Dodd. “Introduction: The Powerful Are Fast, The Powerless Are Slow.” The Sociology of Speed: Digital, Organizational, and Social Temporalities. Eds. Judy Wajcman and Nigel Dodd. Oxford: Oxford UP, 2017. 1-12.

Conference papers on the topic "Non-linear reward functions"

1

Huang, Weiran, Jungseul Ok, Liang Li, and Wei Chen. "Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications." In Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/317.

Abstract:
We study the Combinatorial Pure Exploration problem with Continuous and Separable reward functions (CPE-CS) in the stochastic multi-armed bandit setting. In a CPE-CS instance, we are given several stochastic arms with unknown distributions, as well as a collection of possible decisions. Each decision has a reward according to the distributions of arms. The goal is to identify the decision with the maximum reward, using as few arm samples as possible. The problem generalizes the combinatorial pure exploration problem with linear rewards, which has attracted significant attention in recent years. In this paper, we propose an adaptive learning algorithm for the CPE-CS problem, and analyze its sample complexity. In particular, we introduce a new hardness measure called the consistent optimality hardness, and give both the upper and lower bounds of sample complexity. Moreover, we give examples to demonstrate that our solution has the capacity to deal with non-linear reward functions.
2

Ou, Mingdong, Nan Li, Shenghuo Zhu, and Rong Jin. "Multinomial Logit Bandit with Linear Utility Functions." In Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/361.

Abstract:
The multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a K-cardinality subset from N candidate items, and receives a reward which is governed by a multinomial logit (MNL) choice model considering both item utility and the substitution property among items. The player's objective is to dynamically learn the parameters of the MNL model and maximize cumulative reward over a finite horizon T. This problem faces the exploration-exploitation dilemma, and the involved combinatorial nature makes it non-trivial. In recent years, some algorithms have been developed that exploit specific characteristics of the MNL model, but all of them estimate the parameters of the MNL model separately and incur a regret bound that is undesirable for a large candidate set size N. In this paper, we consider the linear utility MNL choice model, whose item utilities are represented as linear functions of d-dimensional item features, and propose an algorithm, titled LUMB, to exploit the underlying structure. It is proven that the proposed algorithm achieves regret which is free of candidate set size. Experiments show the superiority of the proposed algorithm.
3

Brafman, Ronen I., and Giuseppe De Giacomo. "Regular Decision Processes: A Model for Non-Markovian Domains." In Twenty-Eighth International Joint Conference on Artificial Intelligence IJCAI-19. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/766.

Abstract:
We introduce and study Regular Decision Processes (RDPs), a new, compact, factored model for domains with non-Markovian dynamics and rewards. In RDPs, transition and reward functions are specified using formulas in linear dynamic logic over finite traces, a language with the expressive power of regular expressions. This allows specifying complex dependence on the past using intuitive and compact formulas, and provides a model that generalizes MDPs and k-order MDPs. RDPs can also approximate POMDPs without having to postulate the existence of hidden variables, and, in principle, can be learned from observations only.
4

Meda, Shashwath, Mike Stevens, Erwin Boer, Catherine Boyle, Greg Book, Nicolas Ward, and Godfrey Pearlson. "Brain-behavior relationships of simulated naturalistic automobile driving under the influence of acute cannabis intoxication: A double-blind, placebo-controlled study." In 2022 Annual Scientific Meeting of the Research Society on Marijuana. Research Society on Marijuana, 2022. http://dx.doi.org/10.26828/cannabis.2022.02.000.32.

Abstract:
Background: Driving is a complex everyday activity that requires the use and integration of different cognitive and psychomotor functions, many of which are known to be affected when under the influence of cannabis (CNB). Given legal implications of drugged-driving and rapidly increasing use of CNB nationwide, there is an urgent need to better understand the effects of CNB on such functions in the context of driving. This longitudinal, double-blind placebo-controlled study investigated the effects of CNB on driving brain-behavior relationships in a controlled simulated environment using functional MRI (fMRI). Methods: N=26 frequent cannabis users were administered 0.5 grams of 13% THC or placebo flower cannabis via a Storz & Bickel ‘Volcano’ vaporizer using paced inhalation, on separate days at least 1 week apart. On each study day, participants drove a virtual driving simulator (steering wheel, brake, gas pedal) inside an MRI scanner approximately 40 minutes post-dosing. Each fMRI driving session presented a naturalistic simulated environment that unobtrusively engaged drivers with scenarios that tested specific driving skills and responses. There were three epochs of approximately 10 min each, in which drivers engaged in tasks of lane keeping/weaving (LK), lead car following (CF), and safe overtaking (OT). fMRI data were prepared for analyses using the Human Connectome Project pipeline, then subjected to group independent component analysis (ICA) to isolate 50 spatially independent networks. 40 ICA networks were deemed valid and non-noisy. Network regions in these components were identified using 387 parcel locations, incorporating a cortical parcellation atlas (Glasser et al. 2016) and detailed subcortical labels. A placebo minus high difference connectivity map was generated for each subject. A similar placebo minus high behavioral score was generated for each subject and then subjected to a principal component analysis (PCA) to reduce it to 8 orthogonal behavioral factors.
Of the 8 driving behavior factors, two represented CF events (F1 and F5), three LK (F3, F4, and F8), and three OT (F2, F6, and F7). Driving behavior factors were evaluated for linear association with connectivity maps via FSL’s randomise (p<0.01 FWE-corrected significance). Results: Across all components examined, we found that connectivity differences between placebo and high THC within right motion-sensitive visual cortex (parcel FST) (visual) and right superior temporal gyrus (social cognition) correlated positively with LK driving performance. The strongest brain-behavior relationships were found for OT-related behavioral factors. Connectivity in left dorsolateral parcel a9-46v (cognitive flexibility) and right motor cortex parcel 3b (somatosensory) correlated negatively with F6 (OT). A left superior frontal parcel (higher order cognition/working memory) correlated negatively with F7 (OT) and, finally, right inferior frontal gyrus (response inhibition and reward deduction) correlated positively with F7 (OT). Conclusion: Our preliminary analyses yield a complex yet informative picture of key brain areas sensitive to acute CNB exposure on different driving behaviors using a simulated environment, further underscoring the impact of substance use on driving as a potential public safety issue.
5

Wu, Shuang, Jingyu Zhao, Guangjian Tian, and Jun Wang. "State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits." In Thirtieth International Joint Conference on Artificial Intelligence IJCAI-21. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/64.

Abstract:
The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable due to exponentially large state and action spaces with respect to the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty in capturing either temporal or spatial factors such as impacts from other arms. We propose considering both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver utilizes the decoupling structure of RMABs to acquire solutions with significantly reduced computation overheads. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments.
6

Velasquez, Alvaro. "Steady-State Policy Synthesis for Verifiable Control." In Twenty-Eighth International Joint Conference on Artificial Intelligence IJCAI-19. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/784.

Abstract:
In this paper, we introduce the Steady-State Policy Synthesis (SSPS) problem which consists of finding a stochastic decision-making policy that maximizes expected rewards while satisfying a set of asymptotic behavioral specifications. These specifications are determined by the steady-state probability distribution resulting from the Markov chain induced by a given policy. Since such distributions necessitate recurrence, we propose a solution which finds policies that induce recurrent Markov chains within possibly non-recurrent Markov Decision Processes (MDPs). The SSPS problem functions as a generalization of steady-state control, which has been shown to be in PSPACE. We improve upon this result by showing that SSPS is in P via linear programming. Our results are validated using CPLEX simulations on MDPs with over 10000 states. We also prove that the deterministic variant of SSPS is NP-hard.