Academic literature on the topic 'Sparse Reward'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Sparse Reward.'
Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Sparse Reward"
Park, Junseok, Yoonsung Kim, Hee bin Yoo, Min Whoo Lee, Kibeom Kim, Won-Seok Choi, Minsu Lee, and Byoung-Tak Zhang. "Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 1 (March 24, 2024): 592–600. http://dx.doi.org/10.1609/aaai.v38i1.27815.
Xu, Pei, Junge Zhang, Qiyue Yin, Chao Yu, Yaodong Yang, and Kaiqi Huang. "Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (June 26, 2023): 11717–25. http://dx.doi.org/10.1609/aaai.v37i10.26384.
Mguni, David, Taher Jafferjee, Jianhong Wang, Nicolas Perez-Nieves, Wenbin Song, Feifei Tong, Matthew Taylor, et al. "Learning to Shape Rewards Using a Game of Two Partners." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (June 26, 2023): 11604–12. http://dx.doi.org/10.1609/aaai.v37i10.26371.
Meng, Fanxiao. "Research on Multi-agent Sparse Reward Problem." Highlights in Science, Engineering and Technology 85 (March 13, 2024): 96–103. http://dx.doi.org/10.54097/er0mx710.
Zuo, Guoyu, Qishen Zhao, Jiahao Lu, and Jiangeng Li. "Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards." International Journal of Advanced Robotic Systems 17, no. 1 (January 1, 2020): 172988141989834. http://dx.doi.org/10.1177/1729881419898342.
Velasquez, Alvaro, Brett Bissey, Lior Barak, Andre Beckus, Ismail Alkhouri, Daniel Melcer, and George Atia. "Dynamic Automaton-Guided Reward Shaping for Monte Carlo Tree Search." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (May 18, 2021): 12015–23. http://dx.doi.org/10.1609/aaai.v35i13.17427.
Corazza, Jan, Ivan Gavran, and Daniel Neider. "Reinforcement Learning with Stochastic Reward Machines." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6429–36. http://dx.doi.org/10.1609/aaai.v36i6.20594.
Gaina, Raluca D., Simon M. Lucas, and Diego Pérez-Liébana. "Tackling Sparse Rewards in Real-Time Games with Statistical Forward Planning Methods." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 1691–98. http://dx.doi.org/10.1609/aaai.v33i01.33011691.
Zhou, Xiao, Song Zhou, Xingang Mou, and Yi He. "Multirobot Collaborative Pursuit Target Robot by Improved MADDPG." Computational Intelligence and Neuroscience 2022 (February 25, 2022): 1–10. http://dx.doi.org/10.1155/2022/4757394.
Jiang, Jiechuan, and Zongqing Lu. "Generative Exploration and Exploitation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 4337–44. http://dx.doi.org/10.1609/aaai.v34i04.5858.
Full textDissertations / Theses on the topic "Sparse Reward"
Hanski, Jari, and Kaan Baris Biçak. "An Evaluation of the Unity Machine Learning Agents Toolkit in Dense and Sparse Reward Video Game Environments." Thesis, Uppsala universitet, Institutionen för speldesign, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444982.
Full textCastanet, Nicolas. "Automatic state representation and goal selection in unsupervised reinforcement learning." Electronic Thesis or Diss., Sorbonne université, 2025. http://www.theses.fr/2025SORUS005.
In the past few years, Reinforcement Learning (RL) achieved tremendous success by training specialized agents that drastically exceed human performance in complex games like Chess or Go, or in robotics applications. These agents often lack versatility, requiring human engineering to design their behavior for specific tasks with a predefined reward signal, which limits their ability to handle new circumstances. This specialization results in poor generalization capabilities, making such agents vulnerable to small variations of external factors and to adversarial attacks. A long-term objective in artificial intelligence research is to move beyond today's specialized RL agents toward more generalist systems endowed with the capability to adapt in real time to unpredictable external factors and to new downstream tasks. This work aims in this direction, tackling unsupervised reinforcement learning problems, a framework where agents are not provided with external rewards and thus must autonomously learn new tasks throughout their lifespan, guided by intrinsic motivations. The concept of intrinsic motivation arises from our understanding of humans' ability to exhibit certain self-sufficient behaviors during their development, such as playing or being curious. This ability allows individuals to design and solve their own tasks, and to build inner physical and social representations of their environments, acquiring an open-ended set of skills throughout their lifespan as a result. This thesis is part of the research effort to incorporate these essential features in artificial agents, leveraging goal-conditioned reinforcement learning to design agents able to discover and master every feasible goal in complex environments. In our first contribution, we investigate autonomous intrinsic goal setting, as a versatile agent should be able to determine its own goals and the order in which to learn them to enhance its performance. By leveraging a learned model of the agent's current goal-reaching abilities, we show that we can shape an optimal-difficulty goal distribution, enabling the agent to sample goals in its Zone of Proximal Development (ZPD), a psychological concept referring to the frontier between what a learner knows and what it does not: the space of knowledge that is not yet mastered but has the potential to be acquired. We demonstrate that targeting the agent's ZPD results in a significant increase in performance on a great variety of goal-reaching tasks. Another core competence is to extract a relevant representation of what matters in the environment from observations coming from any available sensors. We address this question in our second contribution, by highlighting the difficulty of learning a correct representation of the environment in an online setting, where the agent acquires knowledge incrementally as it makes progress. In this context, recently achieved goals are outliers, as there are very few occurrences of these new skills in the agent's experience, making their representations brittle. We leverage the adversarial setting of Distributionally Robust Optimization so that the agent's representations of such outliers become reliable. We show that our method leads to a virtuous circle, as learning accurate representations for new goals fosters the exploration of the environment.
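The goal-selection mechanism described in this abstract can be sketched as follows; `success_prob` is an illustrative stand-in for the learned model of the agent's goal-reaching abilities (not the thesis implementation), and goals are retained when their predicted success probability is intermediate, i.e. inside the agent's ZPD.

```python
import numpy as np

# Illustrative stand-in for a learned goal-reaching model: maps a goal vector to
# the agent's estimated probability of reaching it. In practice this would be a
# small network trained on past goal-reaching outcomes.
def success_prob(goal: np.ndarray) -> float:
    return float(np.clip(0.5 + 0.1 * np.tanh(goal.sum()), 0.0, 1.0))

def sample_zpd_goals(candidate_goals: np.ndarray,
                     low: float = 0.2, high: float = 0.8,
                     n_goals: int = 16) -> np.ndarray:
    """Keep goals of intermediate estimated difficulty (the agent's ZPD)."""
    probs = np.array([success_prob(g) for g in candidate_goals])
    mask = (probs > low) & (probs < high)      # neither trivial nor currently unreachable
    zpd = candidate_goals[mask] if mask.any() else candidate_goals
    idx = np.random.choice(len(zpd), size=min(n_goals, len(zpd)), replace=False)
    return zpd[idx]

# Usage: propose random candidate goals, then train the policy on the filtered ones.
candidates = np.random.uniform(-1.0, 1.0, size=(256, 3))
training_goals = sample_zpd_goals(candidates)
```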
Paolo, Giuseppe. "Learning in Sparse Rewards setting through Quality Diversity algorithms." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS400.
Embodied agents, both natural and artificial, can learn to interact with the environment they are in through a process of trial and error. This process can be formalized through the Reinforcement Learning framework, in which the agent performs an action in the environment and observes its outcome through an observation and a reward signal. It is the reward signal that tells the agent how good the performed action is with respect to the task. This means that the more often a reward is given, the easier it is to improve on the current solution. When this is not the case, and the reward is given sparingly, the agent finds itself in a situation of sparse rewards. This requires a strong focus on exploration, that is, on testing different things, in order to discover which action, or set of actions, leads to the reward. RL agents usually struggle with this. Exploration is the focus of Quality-Diversity methods, a family of evolutionary algorithms that searches for a set of policies whose behaviors are as different as possible, while also improving on their performances. In this thesis, we approach the problem of sparse rewards with these algorithms, and in particular with Novelty Search. This is a method that, contrary to many other Quality-Diversity approaches, does not improve on the performances of the discovered policies, but only on their diversity. Thanks to this, it can quickly explore the whole space of possible policy behaviors. The first part of the thesis focuses on autonomously learning a representation of the search space in which the algorithm evaluates the discovered policies. In this regard, we propose the Task Agnostic eXploration of Outcome spaces through Novelty and Surprise (TAXONS) algorithm. This method learns a low-dimensional representation of the search space in situations in which it is not easy to hand-design that representation. TAXONS has proven effective in three different environments but still requires information on when to capture the observation used to learn the search space. This limitation is addressed by a study of multiple ways to encode, into the search space, information about the whole trajectory of observations generated during a policy evaluation. Among the studied methods, we analyze in particular the mathematical transform called the signature and its relevance for building trajectory-level representations. The manuscript continues with the study of a problem complementary to the one addressed by TAXONS: how to focus on the most interesting parts of the search space. Novelty Search is limited by the fact that all information about any reward discovered during the exploration process is ignored. In our second contribution, we introduce the Sparse Reward Exploration via Novelty Search and Emitters (SERENE) algorithm. This method separates the exploration of the search space from the exploitation of the reward through a two-alternating-steps approach. The exploration is performed through Novelty Search, but whenever a reward is discovered, it is exploited by instances of reward-based methods - called emitters - that perform local optimization of the reward. Experiments on different environments show how SERENE can quickly obtain high-rewarding solutions without hindering the exploration performance of the method. In our third and final contribution, we combine the two ideas presented with TAXONS and SERENE into a single approach: SERENE augmented TAXONS (STAX).
This algorithm can autonomously learn a low-dimensional representation of the search space while quickly optimizing any discovered reward through emitters. Experiments conducted on various environments show how the method can i) learn a representation allowing the discovery of all rewards and ii) quickly [...]
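The Novelty Search loop that TAXONS, SERENE and STAX build on can be sketched as follows, under simplifying assumptions (policies are plain parameter vectors, and `evaluate`, a hypothetical helper, maps a policy to its behavior descriptor); selection is driven by novelty alone, computed as the mean distance to the k nearest descriptors seen so far.

```python
import numpy as np

def novelty(descriptor, pool, k=15):
    """Novelty = mean distance to the k nearest behavior descriptors in `pool`.
    `pool` contains `descriptor` itself, so the zero self-distance is skipped."""
    dists = np.sort(np.linalg.norm(pool - descriptor, axis=1))
    k = min(k, len(dists) - 1)
    return float(dists[1:k + 1].mean()) if k > 0 else float("inf")

def novelty_search(evaluate, init_pop, generations=100, archive_add=5, sigma=0.1):
    """Minimal Novelty Search loop: selection is driven purely by behavioral
    novelty, never by reward, as in the exploration phase described above."""
    population = list(init_pop)
    archive = []                                   # descriptors of past novel policies
    for _ in range(generations):
        descriptors = np.array([evaluate(p) for p in population])
        pool = np.vstack([np.array(archive), descriptors]) if archive else descriptors
        scores = [novelty(d, pool) for d in descriptors]
        ranked = np.argsort(scores)[::-1]          # most novel first
        archive.extend(descriptors[i] for i in ranked[:archive_add])
        # Next generation: Gaussian mutations of the most novel half of the population.
        parents = [population[i] for i in ranked[: len(population) // 2]]
        population = [p + sigma * np.random.randn(*p.shape) for p in parents for _ in range(2)]
    return archive

# Toy usage: policies are 5-D parameter vectors; their behavior descriptor is
# simply the first two parameters (a stand-in for a real rollout).
rng = np.random.default_rng(0)
pop0 = [rng.normal(size=5) for _ in range(20)]
archive = novelty_search(lambda p: p[:2], pop0, generations=20)
```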
Beretta, Davide. "Experience Replay in Sparse Rewards Problems using Deep Reinforcement Techniques." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17531/.
Full textParisi, Simone [Verfasser], Jan [Akademischer Betreuer] Peters, and Joschka [Akademischer Betreuer] Boedeker. "Reinforcement Learning with Sparse and Multiple Rewards / Simone Parisi ; Jan Peters, Joschka Boedeker." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2020. http://d-nb.info/1203301545/34.
Full textBenini, Francesco. "Predicting death in games with deep reinforcement learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20755/.
Full textGallouedec, Quentin. "Toward the generalization of reinforcement learning." Electronic Thesis or Diss., Ecully, Ecole centrale de Lyon, 2024. http://www.theses.fr/2024ECDL0013.
Conventional Reinforcement Learning (RL) involves training a unimodal agent on a single, well-defined task, guided by a gradient-optimized reward signal. This framework does not allow us to envisage a learning agent adapted to real-world problems involving diverse modality streams and multiple tasks, often poorly defined, sometimes not defined at all. Hence, we advocate for transitioning towards a more general framework, aiming to create RL algorithms that are more inherently versatile. To advance in this direction, we identify two primary areas of focus. The first involves improving exploration, enabling the agent to learn from the environment with reduced dependence on the reward signal. We present Latent Go-Explore (LGE), an extension of the Go-Explore algorithm. While Go-Explore achieved impressive results, it was constrained by domain-specific knowledge. LGE overcomes these limitations, offering wider applicability within a general framework. In various tested environments, LGE consistently outperforms the baselines, showcasing its enhanced effectiveness and versatility. The second focus is to design a general-purpose agent that can operate in a variety of environments, thus involving a multimodal structure and even transcending the conventional sequential framework of RL. We introduce Jack of All Trades (JAT), a multimodal Transformer-based architecture uniquely tailored to sequential decision tasks. Using a single set of weights, JAT demonstrates robustness and versatility, competing with its baseline on several RL benchmarks and even showing promising performance on vision and textual tasks. We believe that these two contributions are a valuable step towards a more general approach to RL. In addition, we present other methodological and technical advances that are closely related to our core research question. The first is the introduction of a set of sparsely rewarded simulated robotic environments designed to provide the community with the necessary tools for learning under conditions of low supervision. Notably, three years after its introduction, this contribution has been widely adopted by the community and continues to receive active maintenance and support. On the other hand, we present Open RL Benchmark, our pioneering initiative to provide a comprehensive and fully tracked set of RL experiments, going beyond typical data to include all algorithm-specific and system metrics. This benchmark aims to improve research efficiency by providing out-of-the-box RL data and facilitating accurate reproducibility of experiments. With its community-driven approach, it has quickly become an important resource, documenting over 25,000 runs. These technical and methodological advances, along with the scientific contributions described above, are intended to promote a more general approach to Reinforcement Learning and, we hope, represent a meaningful step toward the eventual development of a more operative RL agent.
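The select-and-explore loop that Go-Explore popularized, and that Latent Go-Explore extends with a learned latent representation, can be roughly sketched as below. `env.snapshot()`/`env.restore()` and `encode` are hypothetical interfaces used only to make the loop concrete; LGE itself learns the encoder online and returns to states with a goal-conditioned policy rather than by restoring simulator state.

```python
import numpy as np

def cell_of(latent: np.ndarray, grid: float = 0.5) -> tuple:
    """Discretize a latent vector into an archive cell."""
    return tuple(np.floor(latent / grid).astype(int))

def go_explore_iteration(env, encode, archive, visits, explore_len=50):
    """One select-return-explore iteration, Go-Explore style (assumed interfaces:
    env.snapshot()/env.restore() and the classic Gym step() API; encode maps an
    observation to a latent vector)."""
    # 1. Select a cell to return to, favoring rarely visited ones.
    cells = list(archive.keys())
    weights = np.array([1.0 / (1.0 + visits[c]) for c in cells])
    target = cells[np.random.choice(len(cells), p=weights / weights.sum())]
    visits[target] += 1
    obs = env.restore(archive[target])
    # 2. Explore from the restored state with random actions.
    for _ in range(explore_len):
        obs, reward, done, info = env.step(env.action_space.sample())
        cell = cell_of(encode(obs))
        if cell not in archive:           # a newly reached region of the latent space
            archive[cell] = env.snapshot()
            visits[cell] = 0
        if done:
            break
```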
Junyent, Barbany Miquel. "Width-Based Planning and Learning." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/672779.
Sequential optimal decision making is a fundamental problem in many fields. In recent years, Reinforcement Learning (RL) methods have experienced unprecedented success, largely thanks to the use of deep learning models, reaching human-level performance in several domains, such as Atari video games or the ancient game of Go. In contrast to the RL approach, where the agent learns a policy from interaction samples with the environment, ignoring the structure of the problem, the planning approach assumes known models of the agent's goals and of the domain dynamics, and relies on determining how the agent should behave to achieve its goals. Current planners can solve problems involving large state spaces precisely by exploiting the structure of the problem, defined in the state-action model. In this work we combine the two approaches, leveraging the fast and compact policies of learning methods and the ability of planning methods to search in combinatorial problems. In particular, we focus on a family of width-based planners, which have been very successful in recent years because their scalability is independent of the size of the state space. The basic algorithm, Iterated Width (IW), was originally proposed for classical planning problems, where the state-transition model and the goals are fully determined, represented by sets of atoms. However, width-based planners do not require a fully defined model of the environment and can be used with simulators. For example, they have recently been applied to graphical domains such as the Atari games. Despite its success, IW is a purely exploratory algorithm and does not exploit information from previous rewards. Moreover, it requires the state to be factored into features, which must be predefined for the task at hand. In addition, running the algorithm with a width larger than 1 is usually computationally intractable in practice, which prevents IW from solving problems of higher width. We begin this thesis by studying the complexity of width-based methods when the state space is defined by multivalued features, as in RL problems, instead of Boolean atoms. We provide a tighter upper bound on the number of nodes expanded by IW, as well as general algorithmic complexity results. To tackle more complex problems (i.e., those with width larger than 1), we present a hierarchical algorithm that plans at two levels of abstraction. The high-level planner uses abstract features that are gradually discovered from pruning decisions in the low-level tree. We illustrate this algorithm on classical-planning PDDL domains, as well as on graphical simulator domains. In classical planning, we show how IW(1) at two levels of abstraction can solve problems of width 2. To exploit information from past rewards, we incorporate an explicit policy into the action-selection mechanism. Our method, called π-IW, interleaves width-based planning and policy learning using the actions visited by the planner. We represent the policy with a neural network that, in turn, is used to guide the planning, thus reinforcing promising paths. Moreover, the representation learned by the neural network can be used as the features for the planner without degrading its performance, removing the requirement of using predefined features. We compare π-IW with previous width-based methods and with AlphaZero, a method that also interleaves planning and learning, and show that π-IW performs better in simple environments. We also show that π-IW outperforms other width-based methods in the Atari games. Finally, we show that the proposed hierarchical IW can easily be integrated with our policy-learning scheme, resulting in an algorithm that outperforms non-hierarchical IW-based planners in Atari games with distant rewards.
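The width-1 novelty pruning at the core of IW(1), which the thesis extends hierarchically and with a learned policy (π-IW), can be sketched as follows; `step`, `features` and `is_goal` are assumed interfaces standing in for a simulator, a feature extractor and a goal test, not the thesis implementation.

```python
from collections import deque

def iw1(initial_state, actions, step, features, is_goal, max_nodes=100_000):
    """Minimal IW(1): breadth-first search that keeps a generated state only if
    it makes at least one feature (atom) true for the first time."""
    if is_goal(initial_state):
        return []
    seen_atoms = set(features(initial_state))
    frontier = deque([(initial_state, [])])
    expanded = 0
    while frontier and expanded < max_nodes:
        state, plan = frontier.popleft()
        expanded += 1
        for action in actions:
            succ = step(state, action)
            if is_goal(succ):
                return plan + [action]
            new_atoms = set(features(succ)) - seen_atoms
            if not new_atoms:
                continue                   # prune: fails the width-1 novelty test
            seen_atoms |= new_atoms
            frontier.append((succ, plan + [action]))
    return None                            # no plan found at width 1

# Toy usage: a 1-D corridor where each position is a single atom.
plan = iw1(0, [-1, 1],
           step=lambda s, a: min(max(s + a, 0), 9),
           features=lambda s: {("at", s)},
           is_goal=lambda s: s == 9)
```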
Parisi, Simone. "Reinforcement Learning with Sparse and Multiple Rewards." Phd thesis, 2020. https://tuprints.ulb.tu-darmstadt.de/11372/1/THESIS.PDF.
Chi, Lu-cheng (紀律呈). "An Improved Deep Reinforcement Learning with Sparse Rewards." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/eq94pr.
Full text國立中山大學
電機工程學系研究所
107
In reinforcement learning, how an agent explores an environment with sparse rewards is a long-standing problem. The improved deep reinforcement learning method described in this thesis encourages an agent to explore unvisited environmental states in an environment with sparse rewards. In deep reinforcement learning, an agent directly uses an image observation from the environment as input to the neural network. However, some neglected observations from the environment, such as depth, might provide valuable information. The method described in this thesis is based on the Actor-Critic algorithm and uses a convolutional neural network as a hetero-encoder between the image input and other observations from the environment. In an environment with sparse rewards, we use these neglected observations as the target output of supervised learning and provide the agent with denser training signals through supervised learning to bootstrap reinforcement learning. In addition, we use the supervised-learning loss as feedback for the agent's exploration behavior in the environment, called the label reward, to encourage the agent to explore unvisited environmental states. Finally, we construct multiple neural networks with the Asynchronous Advantage Actor-Critic (A3C) algorithm and learn the policy with multiple agents. The improved method is compared with other deep reinforcement learning approaches in an environment with sparse rewards and achieves better performance.
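A rough sketch of the architecture this abstract describes, under assumptions not stated there (an 84x84 RGB input and a vector-valued auxiliary target such as a downsampled depth map; names are illustrative, not the thesis code): a shared convolutional encoder feeds actor, critic and an auxiliary head that predicts the neglected observation, and the auxiliary prediction error doubles as the intrinsic "label reward".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCriticWithAux(nn.Module):
    """Shared CNN encoder with policy, value and auxiliary (hetero-encoder) heads."""
    def __init__(self, n_actions: int, aux_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 64 * 9 * 9                         # for 84x84 RGB inputs
        self.policy = nn.Linear(feat, n_actions)
        self.value = nn.Linear(feat, 1)
        self.aux_head = nn.Linear(feat, aux_dim)  # predicts e.g. a depth summary

    def forward(self, image):
        h = self.encoder(image)
        return self.policy(h), self.value(h), self.aux_head(h)

def label_reward(aux_pred, aux_target, scale=0.1):
    """Intrinsic bonus proportional to the auxiliary prediction error: states the
    hetero-encoder predicts poorly (i.e. unfamiliar ones) get a larger bonus."""
    with torch.no_grad():
        return scale * F.mse_loss(aux_pred, aux_target, reduction="none").mean(dim=-1)

# In an A3C-style setup, each worker would add label_reward(...) to the environment
# reward and add the supervised loss F.mse_loss(aux_pred, aux_target) to its
# actor-critic objective.
```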
Books on the topic "Sparse Reward"
Kipling, Rudyard. Puck of Pook's Hill; and, Rewards and Fairies. Oxford: Oxford University Press, 1992.
Persson, Fabian. Women at the Early Modern Swedish Court. Amsterdam: Amsterdam University Press, 2021. http://dx.doi.org/10.5117/9789463725200.
Prima. Official Sega Genesis: Power Tips Book. Rocklin, CA: Prima Publishing, 1992.
Mcdermott, Leeanne. GamePro Presents: Sega Genesis Games Secrets: Greatest Tips. Rocklin: Prima Publishing, 1992.
Sandler, Corey. Official Sega Genesis and Game Gear Strategies, 3rd Edition. New York: Bantam Books, 1992.
Kipling, Rudyard. Rewards and Fairies. CreateSpace Independent Publishing Platform, 2016.
Find full textBook chapters on the topic "Sparse Reward"
Hensel, Maximilian. "Exploration Methods in Sparse Reward Environments." In Reinforcement Learning Algorithms: Analysis and Applications, 35–45. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-41188-6_4.
Moy, Glennn, and Slava Shekh. "Evolution Strategies for Sparse Reward Gridworld Environments." In AI 2022: Advances in Artificial Intelligence, 266–78. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-22695-3_19.
Jeewa, Asad, Anban W. Pillay, and Edgar Jembere. "Learning to Generalise in Sparse Reward Navigation Environments." In Artificial Intelligence Research, 85–100. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-66151-9_6.
Chen, Zhongpeng, and Qiang Guan. "Continuous Exploration via Multiple Perspectives in Sparse Reward Environment." In Pattern Recognition and Computer Vision, 57–68. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8435-0_5.
Lei, Hejun, Paul Weng, Juan Rojas, and Yisheng Guan. "Planning with Q-Values in Sparse Reward Reinforcement Learning." In Intelligent Robotics and Applications, 603–14. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-13844-7_56.
Fu, Yupeng, Yuan Xiao, Jun Fang, Xiangyang Deng, Ziqiang Zhu, and Limin Zhang. "Distributed Advantage-Based Weights Reshaping Algorithm with Sparse Reward." In Lecture Notes in Computer Science, 391–400. Singapore: Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-7181-3_31.
Le, Bang-Giang, Thi-Linh Hoang, Hai-Dang Kieu, and Viet-Cuong Ta. "Structural and Compact Latent Representation Learning on Sparse Reward Environments." In Intelligent Information and Database Systems, 40–51. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-5837-5_4.
Wu, Feng, and Xiaoping Chen. "Solving Large-Scale and Sparse-Reward DEC-POMDPs with Correlation-MDPs." In RoboCup 2007: Robot Soccer World Cup XI, 208–19. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-68847-1_18.
Mizukami, Naoki, Jun Suzuki, Hirotaka Kameko, and Yoshimasa Tsuruoka. "Exploration Bonuses Based on Upper Confidence Bounds for Sparse Reward Games." In Lecture Notes in Computer Science, 165–75. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-71649-7_14.
Kang, Yongxin, Enmin Zhao, Yifan Zang, Kai Li, and Junliang Xing. "Towards a Unified Benchmark for Reinforcement Learning in Sparse Reward Environments." In Communications in Computer and Information Science, 189–201. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-1639-9_16.
Full textConference papers on the topic "Sparse Reward"
Hossain, Jumman, Abu-Zaher Faridee, Nirmalya Roy, Jade Freeman, Timothy Gregory, and Theron Trout. "TopoNav: Topological Navigation for Efficient Exploration in Sparse Reward Environments." In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 693–700. IEEE, 2024. https://doi.org/10.1109/iros58592.2024.10802380.
Huang, Chao, Yibei Guo, Zhihui Zhu, Mei Si, Daniel Blankenberg, and Rui Liu. "Quantum Exploration-based Reinforcement Learning for Efficient Robot Path Planning in Sparse-Reward Environment." In 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN), 516–21. IEEE, 2024. http://dx.doi.org/10.1109/ro-man60168.2024.10731199.
Yang, Kai, Zhirui Fang, Xiu Li, and Jian Tao. "CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings." In 2024 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE, 2024. http://dx.doi.org/10.1109/ijcnn60899.2024.10650769.
Farkaš, Igor. "Explaining Internal Representations in Deep Networks: Adversarial Vulnerability of Image Classifiers and Learning Sequential Tasks with Sparse Reward." In 2025 IEEE 23rd World Symposium on Applied Machine Intelligence and Informatics (SAMI), 000015–16. IEEE, 2025. https://doi.org/10.1109/sami63904.2025.10883317.
Xi, Lele, Hongkun Wang, Zhijie Li, and Changchun Hua. "An Experience Replay Approach Based on SSIM to Solve the Sparse Reward Problem in Pursuit Evasion Game*." In 2024 China Automation Congress (CAC), 6238–43. IEEE, 2024. https://doi.org/10.1109/cac63892.2024.10864615.
Wang, Guojian, Faguo Wu, and Xiao Zhang. "Trajectory-Oriented Policy Optimization with Sparse Rewards." In 2024 2nd International Conference on Intelligent Perception and Computer Vision (CIPCV), 76–81. IEEE, 2024. http://dx.doi.org/10.1109/cipcv61763.2024.00023.
Cheng, Hao, Jiahang Cao, Erjia Xiao, Mengshu Sun, and Renjing Xu. "Gaining the Sparse Rewards by Exploring Lottery Tickets in Spiking Neural Networks." In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 442–49. IEEE, 2024. https://doi.org/10.1109/iros58592.2024.10802854.
Huang, Yuming, Bin Ren, Ziming Xu, and Lianghong Wu. "MRHER: Model-based Relay Hindsight Experience Replay for Sequential Object Manipulation Tasks with Sparse Rewards." In 2024 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE, 2024. http://dx.doi.org/10.1109/ijcnn60899.2024.10650959.
Tian, Yuhe, Ayooluwa Akintola, Yazhou Jiang, Dewei Wang, Jie Bao, Miguel A. Zamarripa, Brandon Paul, et al. "Reinforcement Learning-Driven Process Design: A Hydrodealkylation Example." In Foundations of Computer-Aided Process Design, 387–93. Hamilton, Canada: PSE Press, 2024. http://dx.doi.org/10.69997/sct.119603.
Yang, Dong, and Yuhua Tang. "Adaptive Inner-reward Shaping in Sparse Reward Games." In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9207302.
Full textReports on the topic "Sparse Reward"
Lyngdorf, Niels Erik, Selina Thelin Ruggaard, Kathrin Otrel-Cass, and Eamon Costello. The Hacking Innovative Pedagogies (HIP) framework: Rewilding the digital learning ecology. Aalborg University, 2023. http://dx.doi.org/10.54337/aau602808725.
Murray, Chris, Keith Williams, Norrie Millar, Monty Nero, Amy O'Brien, and Damon Herd. A New Palingenesis. University of Dundee, November 2022. http://dx.doi.org/10.20933/100001273.