Doctoral dissertations on the topic "Reinforcement"

Follow this link to see other types of publications on the topic: Reinforcement.

Create a correct reference in APA, MLA, Chicago, Harvard and many other styles


Consult the 50 best doctoral dissertations for your research on the topic "Reinforcement".

An "Add to bibliography" button is available next to each work in the bibliography. Use it, and we will automatically create a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication in .pdf format and read its abstract online, whenever the relevant details are available in the metadata.

Browse doctoral dissertations from a wide range of disciplines and compile an accurate bibliography.

1

Pettersson, Markus, and Andreas Larsson. "Automated Construction- Reinforcement : Lifting Prefabricated Reinforcement Cages". Thesis, Luleå tekniska universitet, Institutionen för samhällsbyggnad och naturresurser, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-84326.

Full text of the source
Abstract:
The construction industry is moving towards increasingly industrial production, and one step in that direction is the use of prefabricated rebar cages. A new concept is being tested in which tied rebar cages are produced on the construction site by industrial robots. The industrial robots produce the rebar cages from a geometric model, and when finished the cages are lifted to the casting site with the help of a crane. For this concept to become an efficient process, it must be possible to evaluate the stability of the rebar cages already in the early design phase, in order to determine efficiently whether a rebar cage can be lifted to the casting site after production.

The scope of this thesis is to investigate what is required to create a numerical model, in an efficient way, from the data of a geometric model in order to determine whether a rebar cage can be lifted. The thesis is limited to one specific rebar cage that had already been created in the CAD program Tekla Structures by the contractor Skanska Sweden AB. The process of creating a numerical model from a geometric model is limited to the analysis software LUSAS Bridge and the CAD software Tekla Structures. To determine what is required to create a numerical model from geometric-model data in LUSAS Bridge in an efficient way, a survey was performed. The survey covers the steps necessary to create a numerical model of a tied rebar cage from a geometric model and the problems that occur along the way towards an efficient process.

To determine whether the geometric model of the rebar cage is liftable, a linear analysis was created in LUSAS Bridge. The analysis was built with data from the geometric model created in Tekla Structures and with experimental results from a study in which tied-connection strength and stiffness were evaluated. The analysis simulates a rebar cage for a bridge foundation in scale 1:2 that is lifted at four lifting points. The lifting points are modeled as supports while the rebar cage is subjected to an acceleration force to simulate a lift. The analysis is performed in two parts: first with the tied connections at full stiffness capacity, and then with the stiffness decreased, in order to evaluate what happens to the stability of the rebar cage. Two conditions are examined to determine whether the rebar cage is liftable: (1) stress in the rebars and (2) forces in the joint elements.

The results of the study show that, in order to create a numerical model of a tied rebar cage efficiently, some improvements must be made in LUSAS Bridge. The most time-consuming part of creating a numerical model of a tied rebar cage is the connections. To create these tied connections efficiently, new functions must be added to LUSAS Bridge so that the software can generate different types of tied connections. The results of the analysis show that the stress in the rebars at the lifting points is the most critical criterion when the rebar cage is lifted. The maximum stress reached 356 MPa, a utilization rate of 81.9%, when the connection stiffness is at full capacity. When the connection stiffness was reduced, the stress again proved to be the most critical criterion. The analysis for 50% connection stiffness capacity showed a maximum stress of 402 MPa and a utilization rate of 92.4%, an increase of 10.5 percentage points in the utilization rate when the connection stiffness is decreased by 50%. Based on these results, it can be stated that the rebar cage can be lifted if four lifting points are used. The results also show that the stiffness of the connections has very little impact on the behavior of the cage, and the placement of the rebars therefore contributes more to the stability.
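The utilization rates quoted above can be checked against the maximum stresses if utilization is read as the ratio of peak rebar stress to the design yield strength; the value of roughly 435 MPa used below is not stated in the abstract but is implied by the reported figures (and is consistent with a characteristic yield strength of 500 MPa divided by a partial factor of 1.15). A minimal consistency check under that assumption:

```latex
% Assumed relation: utilization = peak stress / design yield strength
% f_{yd} \approx f_{yk}/\gamma_s = 500/1.15 \approx 435~\text{MPa (inferred, not stated in the abstract)}
\eta_{100\%\,\text{stiffness}} = \frac{356}{435} \approx 0.819 = 81.9\,\%,
\qquad
\eta_{50\%\,\text{stiffness}} = \frac{402}{435} \approx 0.924 = 92.4\,\%
```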
2

Fox, James J. "Negative Reinforcement". Digital Commons @ East Tennessee State University, 2015. https://dc.etsu.edu/etsu-works/161.

Full text of the source
Abstract:
Book Summary: A teacher’s ability to manage the classroom strongly influences the quality of teaching and learning that can be accomplished. Among the most pressing concerns for inexperienced teachers is classroom management, a concern of equal importance to the general public in light of behavior problems and breakdowns in discipline that grab newspaper headlines. But classroom management is not just about problems and what to do when things go wrong and chaos erupts. It’s about how to run a classroom so as to elicit the best from even the most courteous group of students. An array of skills is needed to produce such a learning environment. The SAGE Encyclopedia of Classroom Management raises issues and introduces evidence-based, real-world strategies for creating and maintaining well-managed classrooms where learning thrives. Students studying to become teachers will need to develop their own classroom management strategies consistent with their own philosophies of teaching and learning. It is hoped that this work will help open their eyes to the range of issues and the array of skills they might integrate into their unique teaching styles.
3

Izquierdo, Ayala Pablo. "Learning comparison: Reinforcement Learning vs Inverse Reinforcement Learning : How well does inverse reinforcement learning perform in simple markov decision processes in comparison to reinforcement learning?" Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259371.

Full text of the source
Abstract:
This research project presents a qualitative comparison between two different learning approaches, Reinforcement Learning (RL) and Inverse Reinforcement Learning (IRL), on the Gridworld Markov Decision Process. The focus is on the second learning paradigm, IRL, as it is considered relatively new and little work has been done in this field of study. As observed, RL outperforms IRL, obtaining a correct solution in all the different scenarios studied. However, the behaviour of the IRL algorithms can be improved, and this is shown and analyzed as part of the scope.
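For readers unfamiliar with the RL side of this comparison, the sketch below shows plain tabular value iteration on a small Gridworld of the kind referred to above. It is an illustrative toy, not the project's code; the grid size, rewards, and discount factor are invented.

```python
# Illustrative only: value iteration on a 4x4 Gridworld MDP (all parameters invented).
import numpy as np

N = 4                                          # grid is N x N, goal in the bottom-right corner
GOAL = (N - 1, N - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
gamma, theta = 0.95, 1e-6                      # discount factor, convergence threshold

V = np.zeros((N, N))
while True:
    delta = 0.0
    for r in range(N):
        for c in range(N):
            if (r, c) == GOAL:
                continue
            best = -np.inf
            for dr, dc in ACTIONS:
                nr = min(max(r + dr, 0), N - 1)          # moves off the grid are clipped (bump into the wall)
                nc = min(max(c + dc, 0), N - 1)
                reward = 1.0 if (nr, nc) == GOAL else -0.04
                best = max(best, reward + gamma * V[nr, nc])
            delta = max(delta, abs(best - V[r, c]))
            V[r, c] = best
    if delta < theta:
        break

print(np.round(V, 2))   # optimal state values; the greedy policy is read off these
```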
4

Seymour, B. J. "Aversive reinforcement learning". Thesis, University College London (University of London), 2010. http://discovery.ucl.ac.uk/800107/.

Full text of the source
Abstract:
We hypothesise that human aversive learning can be described algorithmically by Reinforcement Learning models. Our first experiment uses a second-order conditioning design to study sequential outcome prediction. We show that aversive prediction errors are expressed robustly in the ventral striatum, supporting the validity of temporal difference algorithms (as in reward learning), and suggesting a putative critical area for appetitive-aversive interactions. With this in mind, the second experiment explores the nature of pain relief, which, as expounded in theories of motivational opponency, is rewarding. In a Pavlovian conditioning task with phasic relief of tonic noxious thermal stimulation, we show that both appetitive and aversive prediction errors are co-expressed in anatomically dissociable regions (in a mirror opponent pattern) and that striatal activity appears to reflect integrated appetitive-aversive processing. Next we designed a Pavlovian task in which cues predicted either financial gains, losses, or both, thereby forcing integration of both motivational streams. This showed anatomical dissociation of aversive and appetitive predictions along a posterior-anterior gradient within the striatum. Lastly, we studied aversive instrumental control (avoidance). We designed a simultaneous pain avoidance and financial reward learning task, in which subjects had to learn about each independently and trade off aversive and appetitive predictions. We show that predictions for both converge on the medial head of the caudate nucleus, suggesting that this is a critical site for appetitive-aversive integration in instrumental decision making. We also tested whether serotonin (5HT) modulates either phasic or tonic opponency using acute tryptophan depletion. Both behavioural and imaging data confirm the latter, in which it appears to mediate an average reward term, providing an aspiration level against which the benefits of exploration are judged. In summary, our data provide a basic computational and neuroanatomical framework for human aversive learning. We demonstrate the algorithmic and implementational validity of reinforcement learning models for both aversive prediction and control, illustrate the nature and neuroanatomy of appetitive-aversive integration, and discover the critical (and somewhat unexpected) central role of the striatum.
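The temporal difference prediction error mentioned above is a simple quantity; the sketch below computes it for a toy second-order conditioning sequence. The cue names, learning rate, and outcome value are invented for illustration and have no connection to the experimental data.

```python
# Toy TD(0) prediction error: delta = r + gamma * V(s') - V(s). Invented values throughout.
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

V = {}
# Second-order conditioning: cue A predicts cue B, which predicts an aversive outcome (r = -1).
for _ in range(50):
    td_update(V, "A", 0.0, "B")      # A -> B, no primary outcome
    td_update(V, "B", -1.0, None)    # B -> aversive outcome
print(V)  # cue A acquires negative value only via cue B, as in second-order conditioning
```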
5

Gonçalves, Madalena Telo. "BPI: capital reinforcement". Master's thesis, NSBE - UNL, 2013. http://hdl.handle.net/10362/11679.

Full text of the source
6

MacAleese, Kenneth R. "Examining conjugate reinforcement /". abstract and full text PDF (UNR users only), 2008. http://0-gateway.proquest.com.innopac.library.unr.edu/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3342622.

Full text of the source
Abstract:
Thesis (Ph. D.)--University of Nevada, Reno, 2008.
"December, 2008." Includes bibliographical references (leaves 55-64). Library also has microfilm. Ann Arbor, Mich. : ProQuest Information and Learning Company, [2009]. 1 microfilm reel ; 35 mm. Online version available on the World Wide Web.
7

Tabell, Johnsson Marco, and Ala Jafar. "Efficiency Comparison Between Curriculum Reinforcement Learning & Reinforcement Learning Using ML-Agents". Thesis, Blekinge Tekniska Högskola, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20218.

Full text of the source
8

Yang, Zhaoyuan Yang. "Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach". The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu152411491981452.

Full text of the source
9

Cortesi, Daniele. "Reinforcement Learning in Rogue". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16138/.

Full text of the source
Abstract:
In this work we use Reinforcement Learning to play the famous Rogue, a dungeon-crawler videogame and father of the rogue-like genre. By employing different algorithms we substantially improve on the results obtained in previous work, addressing and solving the problems that had arisen. We then devise and perform new experiments to test the limits of our solution and encounter additional, unexpected issues in the process. In one of the investigated scenarios we clearly see that our approach is not yet enough even to perform better than a random agent, and we propose ideas for future work.
10

Girgin, Sertan. "Abstraction In Reinforcement Learning". Phd thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12608257/index.pdf.

Full text of the source
Abstract:
Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. Generally, the problem to be solved contains subtasks that repeat at different regions of the state space. Without any guidance, an agent has to learn the solutions of all subtask instances independently, which degrades the learning performance. In this thesis, we propose two approaches that build connections between different regions of the search space, leading to better utilization of gained experience and accelerated learning. In the first approach, we extend the existing work of McGovern and propose the formalization of stochastic conditionally terminating sequences with higher representational power. Then, we describe how to efficiently discover and employ useful abstractions during learning based on such sequences. The method constructs a tree structure to keep track of frequently used action sequences together with visited states. This tree is then used to select actions to be executed at each step. In the second approach, we propose a novel method to identify states with similar sub-policies, and show how they can be integrated into the reinforcement learning framework to improve the learning performance. The method uses an efficient data structure to find common action sequences started from observed states and defines a similarity function between states based on the number of such sequences. Using this similarity function, updates on the action-value function of a state are reflected to all similar states. This, consequently, allows experience acquired during learning to be applied in a broader context. The effectiveness of both approaches is demonstrated empirically by conducting extensive experiments on various domains.
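As a rough illustration of the second approach (my own construction, not the thesis's data structure), the snippet below records the action-sequence prefixes observed from each state and scores state similarity by the number of shared prefixes; state names and sequence length are arbitrary.

```python
# Rough sketch: similarity between states = number of common action-sequence prefixes.
from collections import defaultdict

class SequenceStore:
    def __init__(self, max_len=3):
        self.max_len = max_len
        self.seqs = defaultdict(set)          # state -> set of action tuples

    def record(self, state, actions):
        """Record all prefixes (up to max_len) of an action run started in `state`."""
        for k in range(1, min(self.max_len, len(actions)) + 1):
            self.seqs[state].add(tuple(actions[:k]))

    def similarity(self, s1, s2):
        return len(self.seqs[s1] & self.seqs[s2])

store = SequenceStore()
store.record("room1_door", ["forward", "forward", "open"])
store.record("room2_door", ["forward", "forward", "open"])
print(store.similarity("room1_door", "room2_door"))   # 3 shared prefixes -> similar sub-policies
```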
11

Suay, Halit Bener. "Reinforcement Learning from Demonstration". Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/173.

Full text of the source
Abstract:
Off-the-shelf Reinforcement Learning (RL) algorithms suffer from slow learning performance, partly because they are expected to learn a task from scratch merely through an agent's own experience. In this thesis, we show that learning from scratch is a limiting factor for the learning performance, and that when prior knowledge is available RL agents can learn a task faster. We evaluate relevant previous work and our own algorithms in various experiments. Our first contribution is the first implementation and evaluation of an existing interactive RL algorithm in a real-world domain with a humanoid robot. Interactive RL was evaluated in a simulated domain, which motivated us to evaluate its practicality on a robot. Our evaluation shows that guidance reduces learning time, and that its positive effects increase with state space size. A natural follow-up question after our first evaluation was how some other previous works compare to interactive RL. Our second contribution is an analysis of a user study, where naïve human teachers demonstrated a real-world object-catching task with a humanoid robot. We present the first comparison of several previous works in a common real-world domain with a user study. One conclusion of the user study was the high potential of RL despite poor usability due to the slow learning rate. As an effort to improve the learning efficiency of RL learners, our third contribution is a novel human-agent knowledge transfer algorithm. Using demonstrations from three teachers with varying expertise in a simulated domain, we show that regardless of the skill level, human demonstrations can improve the asymptotic performance of an RL agent. As an alternative approach for encoding human knowledge in RL, we investigated the use of reward shaping. Our final contributions are the Static Inverse Reinforcement Learning Shaping and Dynamic Inverse Reinforcement Learning Shaping algorithms, which use human demonstrations for recovering a shaping reward function. Our experiments in simulated domains show that our approach outperforms the state of the art in cumulative reward, learning rate and asymptotic performance. Overall, we show that human demonstrators with varying skills can help RL agents learn tasks more efficiently.
12

Gao, Yang. "Argumentation accelerated reinforcement learning". Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/26603.

Full text of the source
Abstract:
Reinforcement Learning (RL) is a popular statistical Artificial Intelligence (AI) technique for building autonomous agents, but it suffers from the curse of dimensionality: the computational requirement for obtaining the optimal policies grows exponentially with the size of the state space. Integrating heuristics into RL has proven to be an effective approach to combat this curse, but deriving high-quality heuristics from people's (typically conflicting) domain knowledge is challenging, yet has received little research attention. Argumentation theory is a logic-based AI technique well known for its conflict resolution capability and intuitive appeal. In this thesis, we investigate the integration of argumentation frameworks into RL algorithms, so as to improve the convergence speed of RL algorithms. In particular, we propose a variant of the Value-based Argumentation Framework (VAF) to represent domain knowledge and to derive heuristics from this knowledge. We prove that the heuristics derived from this framework can effectively instruct individual learning agents as well as multiple cooperative learning agents. In addition, we propose the Argumentation Accelerated RL (AARL) framework to integrate these heuristics into different RL algorithms via Potential Based Reward Shaping (PBRS) techniques: we use classical PBRS techniques for flat RL (e.g. SARSA(λ)) based AARL, and propose a novel PBRS technique for MAXQ-0, a hierarchical RL (HRL) algorithm, so as to implement HRL based AARL. We empirically test two AARL implementations, SARSA(λ)-based AARL and MAXQ-based AARL, in multiple application domains, including single-agent and multi-agent learning problems. Empirical results indicate that AARL can improve the convergence speed of RL, and can also be easily used by people who have little background in Argumentation and RL.
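Potential Based Reward Shaping itself is compact; the sketch below shows the generic PBRS bonus used to inject a heuristic into a flat learner. The potential function here is a placeholder distance-to-goal heuristic of my own, not the argumentation-derived heuristic of the thesis.

```python
# Generic potential-based reward shaping: F(s, s') = gamma * Phi(s') - Phi(s).
# Adding F to the environment reward is known to preserve the optimal policy.
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    return r + gamma * potential(s_next) - potential(s)

# Placeholder heuristic: prefer states closer to a (hypothetical) goal cell.
goal = (5, 5)
potential = lambda s: -(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))
print(shaped_reward(0.0, (0, 0), (1, 0), potential))   # small positive bonus for moving closer
```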
13

Alexander, John W. "Transfer in reinforcement learning". Thesis, University of Aberdeen, 2015. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=227908.

Full text of the source
Abstract:
The problem of developing skill repertoires autonomously in robotics and artificial intelligence is becoming ever more pressing. Currently, the issues of how to apply prior knowledge to new situations and which knowledge to apply have not been sufficiently studied. We present a transfer setting where a reinforcement learning agent faces multiple problem-solving tasks drawn from an unknown generative process, where each task has similar dynamics. The task dynamics are changed by varying the transition function between states. The tasks are presented sequentially, with the latest task presented considered the target for transfer. We describe two approaches to solving this problem. Firstly, we present an algorithm for transfer of the function encoding the state-action value, defined as value function transfer. This algorithm uses the value function of a source policy to initialise the policy of a target task. We varied the type of basis the algorithm used to approximate the value function. Empirical results in several well-known domains showed that the learners benefited from the transfer in the majority of cases. Results also showed that the Radial basis performed better in general than the Fourier. However, contrary to expectation, the Fourier basis benefited most from the transfer. Secondly, we present an algorithm for learning an informative prior which encodes beliefs about the underlying dynamics shared across all tasks. We call this agent the Informative Prior agent (IP). The prior is learnt through experience and captures the commonalities in the transition dynamics of the domain, and allows for a quantification of the agent's uncertainty about these. By using a sparse distribution of the uncertainty in the dynamics as a prior, the IP agent can successfully learn a model of (1) the set of feasible transitions rather than the set of possible transitions, and (2) the likelihood of each of the feasible transitions. Analysis focusing on the accuracy of the learned model showed that IP had a very good accuracy bound, which is expressible in terms of only the permissible error and the diffusion, a factor that describes the concentration of the prior mass around the truth and which decreases as the number of tasks experienced grows. The empirical evaluation of IP showed that an agent which uses the informative prior outperforms several existing Bayesian reinforcement learning algorithms on tasks with shared structure, in a domain where multiple related tasks were presented only once to the learners. IP is a step towards the autonomous acquisition of behaviours in artificial intelligence. IP also provides a contribution towards the analysis of exploration and exploitation in the transfer paradigm.
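In its simplest tabular form, the value function transfer described above amounts to starting the target task's estimates from the source task's learned values; the thesis works with Radial and Fourier basis approximators, so the snippet below (with invented state and action names) only conveys the idea.

```python
# Minimal sketch of value function transfer: initialise the target task's Q-values
# from a source task's learned Q-values (tabular stand-in for the basis-function case).
from collections import defaultdict

def transfer_init(source_Q, default=0.0):
    target_Q = defaultdict(lambda: default)
    target_Q.update(source_Q)             # source estimates become the starting point
    return target_Q

source_Q = {("s0", "left"): 0.4, ("s0", "right"): 0.9}
target_Q = transfer_init(source_Q)
print(target_Q[("s0", "right")], target_Q[("s7", "left")])   # 0.9 transferred, 0.0 for new pairs
```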
14

Leslie, David S. "Reinforcement learning in games". Thesis, University of Bristol, 2004. http://hdl.handle.net/1983/420b3f4b-a8b3-4a65-be23-6d21f6785364.

Full text of the source
15

Schneider, Markus. "Reinforcement Learning für Laufroboter". [S.l. : s.n.], 2007. http://nbn-resolving.de/urn:nbn:de:bsz:747-opus-344.

Full text of the source
16

Wülfing, Jan [Verfasser], and Martin [Akademischer Betreuer] Riedmiller. "Stable deep reinforcement learning". Freiburg : Universität, 2019. http://d-nb.info/1204826188/34.

Full text of the source
17

Fox, James J. "Differential Treatment and Reinforcement". Digital Commons @ East Tennessee State University, 2015. https://dc.etsu.edu/etsu-works/160.

Full text of the source
Abstract:
Book Summary: A teacher’s ability to manage the classroom strongly influences the quality of teaching and learning that can be accomplished. Among the most pressing concerns for inexperienced teachers is classroom management, a concern of equal importance to the general public in light of behavior problems and breakdowns in discipline that grab newspaper headlines. But classroom management is not just about problems and what to do when things go wrong and chaos erupts. It’s about how to run a classroom so as to elicit the best from even the most courteous group of students. An array of skills is needed to produce such a learning environment. The SAGE Encyclopedia of Classroom Management raises issues and introduces evidence-based, real-world strategies for creating and maintaining well-managed classrooms where learning thrives. Students studying to become teachers will need to develop their own classroom management strategies consistent with their own philosophies of teaching and learning. It is hoped that this work will help open their eyes to the range of issues and the array of skills they might integrate into their unique teaching styles.
18

Volovik, Daniel. "Reinforcement in opinion dynamics". Thesis, Boston University, 2013. https://hdl.handle.net/2144/12872.

Full text of the source
Abstract:
Thesis (Ph.D.)--Boston University.
I consider the evolution and acceptance of a new opinion in a population of unaware agents by using physics-based models of contagion spread. These models rely upon agent-based dynamics, in which an agent changes opinion through interactions with neighbors according to specific rules. Most of these models have the feature that only a single input is required to change the opinion of an agent: an agent has no commitment to its current opinion and accepts a new idea at the slightest provocation. These single-input models fail to account for people's confidence in their own beliefs. Thus I study the concept of social reinforcement, whereby an agent adopts a new opinion only after multiple reinforcing prompts. Building on single-input models, I introduce two models of opinion spreading that incorporate a social reinforcement mechanism. (a) In the irreversible innovation and in the transient fad spreading models, a development is initially known only to a small portion of the population and subsequently spreads. An individual requires M > 1 interactions with an adopter before adopting the development. The ultimate extent of a transient fad depends critically on the characteristic time the fad keeps the attention of an adopting agent. (b) In the confident voter model, a voter can be in one of two opinion states and can additionally have two levels of commitment to an opinion: confident and vacillating. Upon interacting with an agent of a different opinion, a confident voter becomes less committed, or vacillating, but does not change opinion. However, a vacillating agent changes opinion by interacting with an agent of a different opinion. In two dimensions, the distribution of consensus times is characterized by two distinct times: one that scales linearly with N and another that appears to scale as N^(3/2). The longer time arises from configurations that fall into long-lived states consisting of multiple single-opinion stripes before consensus is reached.
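The confident voter rule is easy to state in code. The sketch below is my reading of the description above (in particular, the assumption that meeting a same-opinion neighbor restores confidence is mine), run on a complete interaction graph rather than the two-dimensional lattice studied in the thesis.

```python
# Toy confident voter model: opinion in {0, 1} plus a commitment flag (confident/vacillating).
import random

def interact(agents, i, j):
    """Agent i updates after meeting agent j."""
    op_i, confident_i = agents[i]
    op_j, _ = agents[j]
    if op_i == op_j:
        agents[i] = (op_i, True)        # assumed: agreement restores confidence
    elif confident_i:
        agents[i] = (op_i, False)       # disagreement: keep opinion but start vacillating
    else:
        agents[i] = (op_j, False)       # vacillating agent adopts the other opinion

random.seed(0)
agents = [(random.randint(0, 1), True) for _ in range(100)]
while len({op for op, _ in agents}) > 1:     # iterate random pairwise meetings until consensus
    i, j = random.sample(range(len(agents)), 2)
    interact(agents, i, j)
print("consensus opinion:", agents[0][0])
```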
19

Aguilera, Carolina. "Effects of reinforcement history for following rules on sensitivity to contingencies of reinforcement". Morgantown, W. Va. : [West Virginia University Libraries], 2000. http://etd.wvu.edu/templates/showETD.cfm?recnum=1764.

Full text of the source
Abstract:
Thesis (M.A.)--West Virginia University, 2000.
Title from document title page. Document formatted into pages; contains viii, 64 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 54-56).
20

Trapp, Nancy L. "The Relative Susceptibilities of Interresponse Times and Post-Reinforcement Pauses to Differential Reinforcement". DigitalCommons@USU, 1987. https://digitalcommons.usu.edu/etd/5971.

Full text of the source
Abstract:
Post-reinforcement pauses (PRPs) and interresponse times (IRTs) were examined to determine if these two temporal units changed in a similar fashion as a function of the delivery of differential reinforcement. Two experiments were conducted. In Experiment 1, four pigeons were exposed to a series of procedures in which PRP and IRT durations were gradually increased and then decreased. A fixed-ratio two (FR 2) differentiation schedule was used. Reinforcement was delivered if the PRP or IRT durations were greater than (PRP > and IRT > procedures) or less than (PRP < and IRT < procedures) specified temporal criteria. Criteria were gradually changed across procedures. Results showed that PRPs and IRTs changed in accordance with the differential reinforcement as specified by the various contingencies. When PRPs and IRTs were free to vary, the PRPs tended to change in a direction consistent with the IRT shaping contingency, whereas the IRTs tended to shorten regardless of the PRP shaping contingency. In Experiment 2, two subjects were exposed to both an FR 2 and an FR 1 schedule to determine if schedule size influenced the effects obtained on the differentiation procedures. PRPs were systematically changed using a differentiation procedure with a response requirement of either FR 1 or FR 2. Results showed similar changes in PRP durations between the FR 1 and FR 2 differentiation procedures. An analysis of errors made in each shaping condition in both experiments was conducted to determine whether PRPs or IRTs were more susceptible to the differential reinforcement contingencies. Fewer errors were made in the PRP shaping conditions, indicating that PRPs were more easily changed. Implications for a comprehensive theory of reinforcement are discussed.
21

Rottmann, Axel [Verfasser], and Wolfram [Akademischer Betreuer] Burgard. "Approaches to online reinforcement learning for miniature airships = Online Reinforcement Learning Verfahren für Miniaturluftschiffe". Freiburg : Universität, 2012. http://d-nb.info/1123473560/34.

Full text of the source
22

Blixt, Rikard, and Anders Ye. "Reinforcement learning AI to Hive". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-134908.

Full text of the source
Abstract:
This report is about the game Hive, a very unique board game. We first cover what Hive is, and then give details of our implementation, the issues we ran into during the implementation, and how we solved them. We also attempted to build an AI and, using reinforcement learning, teach it to become good at playing Hive. More precisely, we used two AIs that have no knowledge of Hive other than the game rules. This, however, turned out to be impossible within a reasonable timeframe; our estimate is that it would have to run on an upper-end home computer for at least 140 years to become decent at playing the game.
23

Borgstrand, Richard, and Patrik Servin. "Reinforcement Learning AI till Fightingspel". Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3113.

Full text of the source
Abstract:
The project consisted of implementing two fighting-game AIs (artificial intelligence, abbreviated AI below): a non-adaptive, more deterministic AI and an adaptive, dynamic AI that uses reinforcement learning. This was done by scripting the behaviour of the AI in a free 2D fighting-game engine called "MUGEN". The AI uses scripted sequences that are executed through MUGEN's own trigger and state system. This system checks whether the scripted, specified conditions are fulfilled for the AI to "trigger", that is, to perform the given action. The more static AI was built from hand-written sequences and rules that are executed partly based on the situation and partly at random. To approximate a reinforcement learning AI, each sequence was assigned a variable that increases the percentage chance of performing the action when the action has led to something positive, and decreases it when the action has caused something negative.
24

Arnekvist, Isac. "Reinforcement learning for robotic manipulation". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-216386.

Full text of the source
Abstract:
Reinforcement learning was recently successfully used for real-world robotic manipulation tasks, without the need for human demonstration, using a normalized advantage function algorithm (NAF). Limitations on the shape of the advantage function, however, pose doubts about what kinds of policies can be learned using this method. For similar tasks, convolutional neural networks have been used for pose estimation from images taken with fixed-position cameras. For some applications, however, this might not be a valid assumption. It was also shown that the quality of policies for robotic tasks severely deteriorates from small camera offsets. This thesis investigates the use of NAF for a pushing task with clear multimodal properties. The results are compared with using a deterministic policy with minimal constraints on the Q-function surface. Methods for pose estimation using convolutional neural networks are further investigated, especially with regard to randomly placed cameras with unknown offsets. By defining the coordinate frame of objects with respect to some visible feature, it is hypothesized that relative pose estimation can be accomplished even when the camera is not fixed and the offset is unknown. NAF is successfully implemented to solve a simple reaching task on a real robotic system where data collection is distributed over several robots and learning is done on a separate server. Using NAF to learn a pushing task fails to converge to a good policy, both on the real robots and in simulation. Deep deterministic policy gradient (DDPG) is instead used in simulation and successfully learns to solve the task. The learned policy is then applied on the real robots and manages to solve the task in the real setting as well. Pose estimation from fixed-position camera images is learned, and the policy is still able to solve the task using these estimates. By defining a coordinate frame from an object visible to the camera, in this case the robot arm, a neural network learns to regress the pushable object's pose in this frame without the assumption of a fixed camera. However, the predictions were too inaccurate to be used for solving the pushing task. Further modifications to this approach could, however, prove to be a feasible solution for randomly placed cameras with unknown poses.
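The "shape limitation" of NAF referred to above comes from forcing the advantage to be quadratic in the action around a single peak, so the resulting Q-function is unimodal in the action and fits poorly when several distinct pushes are equally good. The numbers below are invented; this is not the thesis's network, only the functional form.

```python
# NAF's quadratic advantage: Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)).
import numpy as np

def naf_q(a, mu, P, v):
    d = a - mu
    return v - 0.5 * d @ P @ d

mu = np.array([0.2, -0.1])        # network output: the single best action for this state
P = np.array([[2.0, 0.0],         # positive-definite matrix, also a network output
              [0.0, 0.5]])
v = 1.0                           # state value V(s)

print(naf_q(mu, mu, P, v))                      # 1.0 at the single peak
print(naf_q(np.array([1.0, 1.0]), mu, P, v))    # strictly lower everywhere else
```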
25

Hengst, Bernhard, Computer Science & Engineering, Faculty of Engineering, UNSW. "Discovering hierarchy in reinforcement learning". Awarded by: University of New South Wales. Computer Science and Engineering, 2003. http://handle.unsw.edu.au/1959.4/20497.

Full text of the source
Abstract:
This thesis addresses the open problem of automatically discovering hierarchical structure in reinforcement learning. Current algorithms for reinforcement learning fail to scale as problems become more complex. Many complex environments empirically exhibit hierarchy and can be modeled as interrelated subsystems, each in turn with hierarchic structure. Subsystems are often repetitive in time and space, meaning that they reoccur as components of different tasks or occur multiple times in different circumstances in the environment. A learning agent may sometimes scale to larger problems if it successfully exploits this repetition. Evidence suggests that a bottom-up approach that repetitively finds building blocks at one level of abstraction and uses them as background knowledge at the next level of abstraction makes learning in many complex environments tractable. An algorithm, called HEXQ, is described that automatically decomposes and solves a multi-dimensional Markov decision problem (MDP) by constructing a multi-level hierarchy of interlinked subtasks without being given the model beforehand. The effectiveness and efficiency of the HEXQ decomposition depend largely on the choice of representation in terms of the variables, their temporal relationship, and whether the problem exhibits a type of constrained stochasticity. The algorithm is first developed for stochastic shortest path problems and then extended to infinite horizon problems. The operation of the algorithm is demonstrated using a number of examples including a taxi domain, various navigation tasks, the Towers of Hanoi and a larger sporting problem. The main contributions of the thesis are the automation of (1) decomposition, (2) sub-goal identification, and (3) discovery of hierarchical structure for MDPs with states described by a number of variables or features. It points the way to further scaling opportunities that encompass approximations, partial observability, selective perception, relational representations and planning. The longer-term research aim is to train rather than program intelligent agents.
26

Cleland, Benjamin George. "Reinforcement Learning for Racecar Control". The University of Waikato, 2006. http://hdl.handle.net/10289/2507.

Full text of the source
Abstract:
This thesis investigates the use of reinforcement learning to learn to drive a racecar in the simulated environment of the Robot Automobile Racing Simulator. Real-life race driving is known to be difficult for humans, and expert human drivers use complex sequences of actions. There are a large number of variables, some of which change stochastically and all of which may affect the outcome. This makes driving a promising domain for testing and developing Machine Learning techniques that have the potential to be robust enough to work in the real world. Therefore the principles of the algorithms from this work may be applicable to a range of problems. The investigation starts by finding a suitable data structure to represent the information learnt. This is tested using supervised learning. Reinforcement learning is added and roughly tuned, and the supervised learning is then removed. A simple tabular representation is found satisfactory, and this avoids difficulties with more complex methods and allows the investigation to concentrate on the essentials of learning. Various reward sources are tested and a combination of three are found to produce the best performance. Exploration of the problem space is investigated. Results show exploration is essential but controlling how much is done is also important. It turns out the learning episodes need to be very long and because of this the task needs to be treated as continuous by using discounting to limit the size of the variables stored. Eligibility traces are used with success to make the learning more efficient. The tabular representation is made more compact by hashing and more accurate by using smaller buckets. This slows the learning but produces better driving. The improvement given by a rough form of generalisation indicates the replacement of the tabular method by a function approximator is warranted. These results show reinforcement learning can work within the Robot Automobile Racing Simulator, and lay the foundations for building a more efficient and competitive agent.
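For readers unfamiliar with eligibility traces, the sketch below shows one generic accumulating-trace SARSA(lambda) update of the kind the abstract refers to; the state and action labels, learning rate, and reward are invented and unrelated to the simulator.

```python
# One generic SARSA(lambda) step with accumulating eligibility traces (invented values).
from collections import defaultdict

def sarsa_lambda_step(Q, E, s, a, r, s2, a2, alpha=0.1, gamma=0.99, lam=0.9):
    delta = r + gamma * Q[(s2, a2)] - Q[(s, a)]   # one-step TD error
    E[(s, a)] += 1.0                              # accumulate the trace for the visited pair
    for key in list(E):
        Q[key] += alpha * delta * E[key]          # credit recently visited state-action pairs
        E[key] *= gamma * lam                     # decay all traces
    return delta

Q, E = defaultdict(float), defaultdict(float)
sarsa_lambda_step(Q, E, s="straight", a="accelerate", r=0.5, s2="corner", a2="brake")
print(dict(Q))
```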
27

Kim, Min Sub, Computer Science & Engineering, Faculty of Engineering, UNSW. "Reinforcement learning by incremental patching". Awarded by: University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/39716.

Full text of the source
Abstract:
This thesis investigates how an autonomous reinforcement learning agent can improve on an approximate solution by augmenting it with a small patch, which overrides the approximate solution at certain states of the problem. In reinforcement learning, many approximate solutions are smaller and easier to produce than "flat" solutions that maintain distinct parameters for each fully enumerated state, but the best solution within the constraints of the approximation may fall well short of global optimality. This thesis proposes that the remaining gap to global optimality can be efficiently minimised by learning a small patch over the approximate solution. In order to improve the agent's behaviour, algorithms are presented for learning the overriding patch. The patch is grown around particular regions of the problem where the approximate solution is found to be deficient. Two heuristic strategies are proposed for concentrating resources in those areas where inaccuracies in the approximate solution are most costly, drawing a compromise between solution quality and storage requirements. Patching also handles problems with continuous state variables, by two alternative methods: Kuhn triangulation over a fixed discretisation and nearest neighbour interpolation with a variable discretisation. As well as improving the agent's behaviour, patching is also applied to the agent's model of the environment. Inaccuracies in the agent's model of the world are detected by statistical testing, using a selective sampling strategy to limit storage requirements for collecting data. The patching algorithms are demonstrated in several problem domains, illustrating the effectiveness of patching under a wide range of conditions. A scenario drawn from a real-time strategy game demonstrates the ability of patching to handle large complex tasks. These contributions combine to form a general framework for patching over approximate solutions in reinforcement learning. Complex problems cannot be solved by brute force alone, and some form of approximation is necessary to handle large problems. However, this does not mean that the limitations of approximate solutions must be accepted without question. Patching demonstrates one way in which an agent can leverage approximation techniques without losing the ability to handle fine yet important details.
28

Patrascu, Relu-Eugen. "Adaptive exploration in reinforcement learning". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ35921.pdf.

Full text of the source
29

Jordan, Andrew R. "Wetpreg Reinforcement of Glulam Beams". Fogler Library, University of Maine, 1998. http://www.library.umaine.edu/theses/pdf/JordanA1998.pdf.

Full text of the source
30

Stig, Fredrik. "3D-woven Reinforcement in Composites". Doctoral thesis, KTH, Lättkonstruktioner, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-70438.

Full text of the source
Abstract:
Composites made from three-dimensional (3D) textile preforms can reduce both the weight and manufacturing cost of advanced composite structures within e.g. aircraft, naval vessels and blades of wind turbines. In this thesis, composite beams reinforced with 3D weave are studied, which are intended for use as joining elements in a boltless modular design. In practice, there are a few obstacles on the way to realising the modular boltless design. There is a lack of experimental data and, more importantly, a lack of experience and tools to predict the properties of composites reinforced with 3D weaves. The novel material will not be accepted and used in engineering applications unless proper design methods are available. The overall aim of this thesis is to remedy these deficiencies by generating data, experience and a foundation for the development of adequate design methods. In Paper A, an initial experimental study is presented where the mechanical properties of 3D-weave-reinforced composites are compared with the corresponding properties of 2D laminates. The conclusion from Paper A is that the out-of-plane properties are enhanced, while the in-plane stiffness and strength are reduced. In Paper B, the influential crimp parameter is investigated and three analytical models are proposed. The warp yarns exhibit 3D crimp, which, as expected, had a large effect on the predicted Young's modulus. The three models have different levels of detail, and the more sophisticated models generate more reliable predictions. However, the overall trends are consistent for all models. A novel framework for constitutive modelling of composites reinforced with 3D-woven preforms is presented in Papers C and D. The framework enables predictive modelling of both the internal architecture and the mechanical properties of composites containing 3D textiles using a minimum of input parameters. The result is geometry models that are near authentic, with a high level of detail in features compared with real composite specimens. The proposed methodology is therefore the main contribution of this thesis to the field of composite material simulation. Paper E addresses the effect of crimp and different textile architectures on the mechanical properties of the final composite material. Both stiffness and strength decrease non-linearly with increasing crimp. Furthermore, specimens containing 3D-woven reinforcement exhibit non-linear stress-strain behaviour in tension, believed to be associated with a relatively early onset of matrix shear cracks.


31

Li, Jingxian. "Reinforcement learning using sensorimotor traces". Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/45590.

Full text of the source
Abstract:
The skilled motions of humans and animals are the result of learning good solutions to difficult sensorimotor control problems. This thesis explores new models for using reinforcement learning to acquire motion skills, with potential applications to computer animation and robotics. Reinforcement learning offers a principled methodology for tackling control problems. However, it is difficult to apply in high-dimensional settings, such as the ones that we wish to explore, where the body can have many degrees of freedom, the environment can have significant complexity, and there can be further redundancies that exist in the sensory representations that are available to perceive the state of the body and the environment. In this context, challenges to overcome include: a state space that cannot be fully explored; the need to model how the state of the body and the perceived state of the environment evolve together over time; and solutions that can work with only a small number of sensorimotor experiences. Our contribution is a reinforcement learning method that implicitly represents the current state of the body and the environment using sensorimotor traces. A distance metric is defined between the ongoing sensorimotor trace and previously experienced sensorimotor traces and this is used to model the current state as a weighted mixture of past experiences. Sensorimotor traces play multiple roles in our method: they provide an embodied representation of the state (and therefore also the value function and the optimal actions), and they provide an embodied model of the system dynamics. In our implementation, we focus specifically on learning steering behaviors for a vehicle driving along straight roads, winding roads, and through intersections. The vehicle is equipped with a set of distance sensors. We apply value-iteration using off-policy experiences in order to produce control policies capable of steering the vehicle in a wide range of circumstances. An experimental analysis is provided of the effect of various design choices. In the future we expect that similar ideas can be applied to other high-dimensional systems, such as bipedal systems that are capable of walking over variable terrain, also driven by control policies based on sensorimotor traces.
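A loose sketch of the central idea, as I read it from the abstract (this is not the thesis's code): the ongoing trace is compared with stored traces and the value estimate is a distance-weighted mixture of the values recorded with those past experiences. Trace contents, the distance metric, and the temperature are all invented here.

```python
# Value estimate as a distance-weighted mixture over stored sensorimotor traces (illustrative).
import numpy as np

def trace_distance(t1, t2):
    return np.linalg.norm(np.asarray(t1) - np.asarray(t2))

def mixture_value(current_trace, stored_traces, stored_values, temperature=0.2):
    d = np.array([trace_distance(current_trace, t) for t in stored_traces])
    w = np.exp(-d / temperature)                  # nearer traces get larger weights
    w /= w.sum()
    return float(w @ np.asarray(stored_values))

# Each trace: a short window of concatenated sensor readings and actions (made-up numbers).
stored = [[0.1, 0.9, -0.2], [0.8, 0.1, 0.3]]
values = [1.0, -0.5]
print(mixture_value([0.15, 0.85, -0.1], stored, values))   # close to 1.0, the nearer trace's value
```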
32

Kwan, Cho Ching Joe. "Geogrid reinforcement of railway ballast". Thesis, University of Nottingham, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.433991.

Full text of the source
33

Chowdhury, Mina Munir-ul Mahmood. "Evolutionary and reinforcement fuzzy control". Thesis, University of Glasgow, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.299747.

Full text of the source
34

Rummery, Gavin Adrian. "Problem solving with reinforcement learning". Thesis, University of Cambridge, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.363828.

Full text of the source
35

Ward-Waller, Elizabeth 1982. "Corrosion resistance of concrete reinforcement". Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/31125.

Full text of the source
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2005.
"June 2005."
Includes bibliographical references (leaves 39-40).
The objective of this thesis is to investigate the mechanism of corrosion of steel reinforcement in concrete and epoxy coated reinforcing bars as corrosion resistant alternatives. Several case studies explore the durability and deterioration issues for epoxy-coated bars discovered through 30 years of implementation in reinforced concrete structures. The methods for predicting the end of functional service life for structures reinforced with uncoated reinforcing bars and with epoxy-coated reinforcing bars are detailed and tested in a design problem in the final section of this report.
by Elizabeth Ward-Waller.
M.Eng.
36

McCabe, Jonathan Aiden. "Reinforcement learning in virtual reality". Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.608852.

Full text of the source
37

Budhraja, Karan Kumar. "Neuroevolution Based Inverse Reinforcement Learning". Thesis, University of Maryland, Baltimore County, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10140581.

Full text of the source
Abstract:

Motivated by such learning in nature, the problem of Learning from Demonstration is targeted at learning to perform tasks based on observed examples. One of the approaches to Learning from Demonstration is Inverse Reinforcement Learning, in which actions are observed to infer rewards. This work combines a feature-based state evaluation approach to Inverse Reinforcement Learning with neuroevolution, a paradigm for modifying neural networks based on their performance on a given task. Neural networks are used to learn from a demonstrated expert policy and are evolved to generate a policy similar to the demonstration. The algorithm is discussed and evaluated against competitive feature-based Inverse Reinforcement Learning approaches. At the cost of execution time, neural networks allow for non-linear combinations of features in state evaluations. These valuations may correspond to state value or state reward. This results in better correspondence to observed examples as opposed to using linear combinations. This work also extends existing work on Bayesian Non-Parametric Feature construction for Inverse Reinforcement Learning by using non-linear combinations of intermediate data to improve performance. The algorithm is observed to be specifically suitable for a linearly solvable non-deterministic Markov Decision Process in which multiple rewards are sparsely scattered in state space. The performance of the algorithm is shown to be limited by the parameters used, implying adjustable capability. A conclusive performance hierarchy between the evaluated algorithms is constructed.

Style APA, Harvard, Vancouver, ISO itp.
38

Piano, Francesco. "Deep Reinforcement Learning con PyTorch". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25340/.

Pełny tekst źródła
Streszczenie:
Reinforcement Learning is a research field of Machine Learning in which an agent solves problems by choosing the most suitable action to perform through an iterative learning process, in a dynamic environment that incentivises it with rewards. Deep Learning, also a Machine Learning approach, exploits an artificial neural network to apply representation learning methods in order to obtain a data structure better suited to being processed. Only recently has Deep Reinforcement Learning, created by combining these two learning paradigms, made it possible to solve problems previously considered intractable, achieving notable success and renewing researchers' interest in the application of Reinforcement Learning algorithms. This thesis deepens the study of Reinforcement Learning applied to simple problems and then examines how it can overcome its characteristic limitations through the use of artificial neural networks, so that it can be applied in a Deep Learning context using the PyTorch framework, a library currently widely used for scientific computing and Machine Learning.
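For orientation only, a minimal PyTorch sketch of the kind of building block such a study rests on is shown below: a small Q-network with epsilon-greedy action selection and a one-step temporal-difference update. The dimensions, hyperparameters and environment interface are assumed, not taken from the thesis.

import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99      # assumed problem sizes
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def act(obs, eps=0.1):
    # epsilon-greedy action selection from the current Q estimates
    if torch.rand(1).item() < eps:
        return torch.randint(n_actions, (1,)).item()
    with torch.no_grad():
        return q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()

def td_update(obs, action, reward, next_obs, done):
    # one-step Q-learning target: r + gamma * max_a' Q(s', a')
    obs_t = torch.as_tensor(obs, dtype=torch.float32)
    next_t = torch.as_tensor(next_obs, dtype=torch.float32)
    with torch.no_grad():
        target = reward + gamma * (1.0 - float(done)) * q_net(next_t).max()
    pred = q_net(obs_t)[action]
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()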
Style APA, Harvard, Vancouver, ISO itp.
39

Kozlova, Olga. "Hierarchical and factored reinforcement learning". Paris 6, 2010. http://www.theses.fr/2010PA066196.

Pełny tekst źródła
Streszczenie:
Hierarchical and factored reinforcement learning (HFRL) methods are based on the formalism of factored Markov decision processes (FMDPs) and hierarchical MDPs (HMDPs). In this thesis, we propose an HFRL method that uses indirect reinforcement learning approaches and the options formalism to solve decision-making problems in dynamic environments without prior knowledge of the problem structure. In the first contribution of this thesis, we show how to model problems in which certain combinations of variables do not exist, and we demonstrate the performance of our algorithms on toy problems that are classical in the literature, MAZE6 and BLOCKSWORLD, in comparison with the standard approach. The second contribution of this thesis is the proposal of TeXDYNA, an algorithm for solving large MDPs whose structure is unknown. TeXDYNA hierarchically decomposes the FMDP on the basis of the automatic discovery of subtasks directly from the structure of the problem, which is itself learned through interaction with the environment. We evaluate TeXDYNA on two benchmarks, namely the TAXI and LIGHTBOX problems. Finally, we assess the potential and the limitations of TeXDYNA on a toy problem more representative of the industrial simulation domain.
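As a small illustrative sketch of the options formalism the abstract refers to (the generic structure only, not TeXDYNA's automatic subtask discovery), an option can be represented as an initiation set, an internal policy and a termination condition:

import random
from dataclasses import dataclass
from typing import Any, Callable, Set

@dataclass
class Option:
    # a temporally extended action: where it may start, how it behaves, when it stops
    initiation_set: Set[Any]                 # states where the option may be invoked
    policy: Callable[[Any], Any]             # state -> primitive action
    termination: Callable[[Any], float]      # state -> probability of stopping

def run_option(option, state, step, max_steps=100):
    # execute the option until termination fires; `step` is an assumed
    # environment function mapping (state, action) -> next state
    for _ in range(max_steps):
        state = step(state, option.policy(state))
        if random.random() < option.termination(state):
            break
    return state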
Style APA, Harvard, Vancouver, ISO itp.
40

Blows, Curtly. "Reinforcement learning for telescope optimisation". Master's thesis, Faculty of Science, 2019. http://hdl.handle.net/11427/31352.

Pełny tekst źródła
Streszczenie:
Reinforcement learning is a relatively new and unexplored branch of machine learning with a wide variety of applications. This study investigates reinforcement learning and provides an overview of its application to a variety of different problems. We then explore the possible use of reinforcement learning for telescope target selection and scheduling in astronomy, with the hope of effectively mimicking the choices made by professional astronomers. This is relevant as next-generation astronomy surveys will require near real-time decision making in response to high-speed transient discoveries. We experiment with and apply some of the leading approaches in reinforcement learning to simplified models of the target selection problem. We find that the methods used in this study show promise but do not generalise well. Hence, while there are indications that reinforcement learning algorithms could work, more sophisticated algorithms and simulations are needed.
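A hedged sketch of one simplified casting of the target-selection problem is given below: an epsilon-greedy agent maintaining value estimates over a fixed set of candidate targets. The target names and the reward source are placeholders, not the simulations used in the thesis.

import random

class TargetSelector:
    # epsilon-greedy value estimates over candidate targets (bandit-style simplification);
    # the reward could be, e.g., the scientific value of an observation (assumed)
    def __init__(self, targets, eps=0.1):
        self.eps = eps
        self.values = {t: 0.0 for t in targets}
        self.counts = {t: 0 for t in targets}

    def choose(self):
        if random.random() < self.eps:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, target, reward):
        # incremental sample-average update of the target's estimated value
        self.counts[target] += 1
        self.values[target] += (reward - self.values[target]) / self.counts[target]

# usage sketch: reward supplied by a simulator or an astronomer preference model
selector = TargetSelector(["supernova candidate", "variable star", "calibration field"])
t = selector.choose()
selector.update(t, reward=1.0)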
Style APA, Harvard, Vancouver, ISO itp.
41

Stigenberg, Jakob. "Scheduling using Deep Reinforcement Learning". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-284506.

Pełny tekst źródła
Streszczenie:
As radio networks have continued to evolve in recent decades, so have their complexity and the difficulty of efficiently utilizing the available resources. In a cellular network, the scheduler controls the allocation of time, frequency and spatial resources to users in both uplink and downlink directions. The scheduler is therefore a key component in terms of efficient usage of network resources. Although the scope and characteristics of the resources available to schedulers are well defined in network standards, e.g. Long-Term Evolution or New Radio, their actual implementation is not. Most previous work focuses on constructing heuristics, based on metrics such as Quality of Service (QoS) classes, channel quality and delay, from which packets are then sorted and scheduled. In this work, a new approach to time domain scheduling using reinforcement learning is presented. The proposed algorithm leverages model-free reinforcement learning in order to treat the frequency domain scheduler as a black box. The proposed algorithm uses end-to-end learning and considers all packets, including control packets such as scheduling requests and CSI reports. Using a Deep Q-Network, the algorithm was evaluated in a setting with multiple delay-sensitive VoIP users and one best-effort user. Compared to a priority-based scheduler, the agent was able to improve total cell throughput by 20.5%, 23.5%, and 16.2% in the 10th, 50th, and 90th percentiles, respectively, while simultaneously reducing the VoIP packet delay by 29.6%, thus improving QoS.
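Purely as an illustration of the kind of training signal such an agent could be given (the abstract does not spell out its reward), a toy reward trading cell throughput against late VoIP packets might look as follows; the weights and the delay budget are assumptions.

def scheduler_reward(cell_throughput_mbps, voip_delays_ms, delay_budget_ms=50.0,
                     throughput_weight=1.0, delay_weight=2.0):
    # illustrative reward for a time-domain scheduling agent (assumed form):
    # reward throughput, penalise VoIP packets that exceed their delay budget
    late_packets = sum(1 for d in voip_delays_ms if d > delay_budget_ms)
    return throughput_weight * cell_throughput_mbps - delay_weight * late_packets

# usage sketch
r = scheduler_reward(cell_throughput_mbps=12.3, voip_delays_ms=[18.0, 62.5, 31.0])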
Style APA, Harvard, Vancouver, ISO itp.
42

Khouly, Mohamed A. "Analysis of soil-reinforcement interaction /". The Ohio State University, 1995. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487863429092366.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
43

Jesu, Alberto. "Reinforcement learning over encrypted data". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23257/.

Pełny tekst źródła
Streszczenie:
Reinforcement learning is a particular paradigm of machine learning that has recently proved, time and time again, to be a very effective and powerful approach. Cryptography, on the other hand, usually takes the opposite direction: while machine learning aims at analyzing data, cryptography aims at maintaining its privacy by hiding such data. However, the two techniques can be used jointly to create privacy-preserving models, able to make inferences on the data without leaking sensitive information. Despite the numerous studies on machine learning and cryptography, reinforcement learning in particular had never been applied to such cases before. Being able to successfully use reinforcement learning in an encrypted scenario would allow us to create an agent that efficiently controls a system without being given full knowledge of the environment it is operating in, opening the way to many possible use cases. We have therefore applied the reinforcement learning paradigm to encrypted data. In this project we applied one of the best-known reinforcement learning algorithms, Deep Q-Learning, to simple simulated environments and studied how the encryption affects the training performance of the agent, in order to see whether it is still able to learn how to behave even when the input data is no longer readable by humans. The results of this work highlight that the agent is still able to learn without issues in small state spaces with non-secure encryption schemes, such as AES in ECB mode. For fixed environments, it is also able to reach a suboptimal solution even in the presence of secure modes, such as AES in CBC mode, showing a significant improvement with respect to a random agent; however, its ability to generalize in stochastic environments or large state spaces suffers greatly.
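As a hedged sketch of how observations can be hidden from the agent (assuming the pycryptodome package; the thesis's actual pipeline may differ), one can serialise the state, encrypt it with AES in ECB mode and hand the ciphertext bytes to the learner as features:

import struct
import numpy as np
from Crypto.Cipher import AES            # pycryptodome (assumed dependency)
from Crypto.Util.Padding import pad

key = b"0123456789abcdef"                # 16-byte toy key, for illustration only
cipher = AES.new(key, AES.MODE_ECB)      # the non-secure mode discussed in the abstract

def encrypt_observation(obs):
    # serialise a float observation vector, encrypt it, and return the
    # ciphertext bytes rescaled to [0, 1] so a neural agent can consume them
    raw = struct.pack(f"{len(obs)}f", *obs)
    ct = cipher.encrypt(pad(raw, AES.block_size))
    return np.frombuffer(ct, dtype=np.uint8).astype(np.float32) / 255.0

# fed to the agent instead of the plaintext state
encrypted_state = encrypt_observation([0.1, -0.4, 0.02, 0.3])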
Style APA, Harvard, Vancouver, ISO itp.
44

Suggs, Sterling. "Reinforcement Learning with Auxiliary Memory". BYU ScholarsArchive, 2021. https://scholarsarchive.byu.edu/etd/9028.

Pełny tekst źródła
Streszczenie:
Deep reinforcement learning algorithms typically require vast amounts of data to train to a useful level of performance. Each time new data is encountered, the network must inefficiently update all of its parameters. Auxiliary memory units can help deep neural networks train more efficiently by separating computation from storage, and providing a means to rapidly store and retrieve precise information. We present four deep reinforcement learning models augmented with external memory, and benchmark their performance on ten tasks from the Arcade Learning Environment. Our discussion and insights will be helpful for future RL researchers developing their own memory agents.
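A generic sketch of an auxiliary memory unit of this kind is shown below: a key-value store read by cosine-similarity attention. It illustrates the idea of separating storage from computation, and is not one of the four models benchmarked in the thesis.

import numpy as np

class KeyValueMemory:
    # minimal external memory: store (key, value) vectors, read by similarity
    def __init__(self, key_dim, value_dim, capacity=1024):
        self.keys = np.zeros((capacity, key_dim), dtype=np.float32)
        self.values = np.zeros((capacity, value_dim), dtype=np.float32)
        self.size, self.ptr, self.capacity = 0, 0, capacity

    def write(self, key, value):
        self.keys[self.ptr] = key
        self.values[self.ptr] = value
        self.ptr = (self.ptr + 1) % self.capacity      # overwrite oldest slot when full
        self.size = min(self.size + 1, self.capacity)

    def read(self, query):
        if self.size == 0:
            return np.zeros(self.values.shape[1], dtype=np.float32)
        k = self.keys[:self.size]
        sims = k @ query / (np.linalg.norm(k, axis=1) * np.linalg.norm(query) + 1e-8)
        weights = np.exp(sims - sims.max())
        weights /= weights.sum()                       # softmax attention over stored keys
        return weights @ self.values[:self.size]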
Style APA, Harvard, Vancouver, ISO itp.
45

Skarvelas, Georgios Aristeidis. "Reinforcement and Bonded Block Modelling". Thesis, Luleå tekniska universitet, Institutionen för samhällsbyggnad och naturresurser, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-85984.

Pełny tekst źródła
Streszczenie:
The objective of this master's thesis is to evaluate the use of Bonded Block Modelling (BBM) in the 3DEC software, combined with hybrid rock bolts, for three different cases: the laboratory rock bolt case, the shearing case and the blocky rock mass case. 3DEC is a Distinct Element Method (DEM) numerical software which can be used to simulate both continuum and discontinuum media in 3D. The Bonded Block Model in 3DEC can be used to simulate a rock mass as bonded polyhedral elements. BBM is a relatively new numerical modelling technique; earlier studies have focused mainly on laboratory test cases and less on field-scale studies. The laboratory rock bolt test was introduced by Hoek, and its main idea was to describe the way that rock bolts work. Four different rock bolt spacing designs and one unsupported model were simulated in order to validate Hoek's results. The diameter of the blocks was 15 cm, while the zones were modelled with a length of 5 cm. The tunnel in the shearing case was excavated at a depth of 1500 m; for the stress field, the in-situ stresses of the Kiirunavaara mine were considered. The tunnel in the blocky case was excavated at a depth of 30 m and a gravitational stress field was assumed. The shearing model and the blocky model were both simulated as quasi-3D models with a zone length of 0.1 m. In both cases, a discontinuum non-BBM was modelled first, and then a discontinuum BBM with different rock UCS values was simulated. The discontinuum BBM in the shearing case was simulated for rock UCS values of 200, 100 and 50 MPa, while in the blocky case it was simulated for a rock UCS of 50 MPa. The Mohr–Coulomb constitutive model was selected for all three modelling cases. The conclusions of this work were the following:
– The laboratory rock bolt model validated the results of Hoek. Hoek suggested that a rock bolt spacing of less than three times the average rock piece diameter would be sufficient to produce positive results. The stabilization of the rock pieces as well as the forming of the compression zone were achieved when this criterion was satisfied. The geometry of the stabilized material as well as the compression zone were also correct.
– The discontinuum BBM in the shearing case with an intact rock UCS of 200 MPa produced similar results to the discontinuum non-BBM, which indicates that BBM can be applied to such cases and produce reliable results. The displacement of the fault was expected to be higher than the resulting values. The discontinuum BBM with reduced rock strength (100 MPa and 50 MPa) resulted in rock mass fragmentation; however, the fragmented rock pieces did not detach from the rock mass, as the displacement values were not high enough.
– The discontinuum BBM in the blocky case with an intact rock UCS of 50 MPa produced similar results to the discontinuum non-BBM. Two discontinuities affected the smooth transition of the displacement/stress results between the different blocks. The fragmentation of the rock mass due to the existence of the discontinuities did not produce any further rock mass movements.
– The interaction between rock mass and rock bolts was evident in every modelling case. For the laboratory rock bolt model, the hybrid bolt design was vital for producing correct results. For the shearing model, the hybrid bolts were subjected to shearing movements due to fault movements. In the blocky model, the bolts in the roof of the tunnel were subjected to axial displacements due to the existence of blocks.
The recommendations for further work were the following:
– The hybrid bolts in the laboratory rock bolt test were pretensioned only at the beginning of the computation phase. In reality, tensioned bolts act at every moment and not only at the beginning, so it would be interesting to see whether the results are similar with continuously tensioned hybrid bolts. It is anticipated that constantly tensioned hybrid bolts should be able to keep the compressive zones at high values throughout the whole cycling process; it is therefore suggested that future modellers could simulate this case with continuously tensioned hybrid bolts.
– The installation of the rock bolts in the shearing case as well as in the blocky case took place at exactly the same time as the tunnel was excavated. This is not realistic, because it is impossible to install rock bolts at exactly the same time as the tunnel is excavated. It is therefore suggested that these two cases could be modelled in the future with more focus on the stress relaxation factor.
Style APA, Harvard, Vancouver, ISO itp.
46

Liu, Chong. "Reinforcement learning with time perception". Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/reinforcement-learning-with-time-perception(a03580bd-2dd6-4172-a061-90e8ac3022b8).html.

Pełny tekst źródła
Streszczenie:
Classical value-estimation reinforcement learning algorithms do not perform very well in dynamic environments. The reinforcement learning of animals, on the other hand, is quite flexible: they can adapt to dynamic environments very quickly and deal with noisy inputs very effectively. One feature that may contribute to animals' good performance in dynamic environments is that they learn and perceive the time to reward. In this research, we attempt to learn and perceive the time to reward and explore situations where the learned time information can be used to improve the performance of the learning agent in dynamic environments. The type of dynamic environment that we are interested in is the switching environment, which stays the same for a long time, then changes abruptly, and then holds for a long time before another change. The type of dynamics that we mainly focus on is the time to reward, though we also extend the ideas to learning and perceiving other criteria of optimality, e.g. the discounted return, so that they still work even when the amount of reward may also change. Specifically, both the mean and variance of the time to reward are learned and then used to detect changes in the environment and to decide whether the agent should give up a suboptimal action. When a change in the environment is detected, the learning agent responds specifically to the change in order to recover quickly from it. When the current action is found to be still worse than the optimal one, the agent abandons its exploration of that action and re-makes its decision, in order to avoid longer-than-necessary exploration. The results of our experiments on two real-world problems show that, compared with classical value-estimation reinforcement learning algorithms, these mechanisms effectively speed up learning, reduce the time taken to recover from environmental changes, and improve the performance of the agent after learning converges in most of the test cases. In addition, we have successfully used spiking neurons to implement various phenomena of classical conditioning, the simplest form of animal reinforcement learning in dynamic environments, and have also pointed out a possible implementation of instrumental conditioning and general reinforcement learning using similar models.
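As an illustrative sketch of the central idea (the exact statistics and thresholds used in the thesis are not reproduced here), the mean and variance of the time to reward can be tracked online and used to flag a likely environment change:

class TimeToRewardTracker:
    # track mean and variance of the observed time to reward with Welford's
    # online algorithm, and flag a likely environment change when a new
    # observation falls far outside the learned distribution; the z-score
    # test and threshold are illustrative assumptions
    def __init__(self, z_threshold=3.0):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.z_threshold = z_threshold

    def update(self, time_to_reward):
        self.n += 1
        delta = time_to_reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (time_to_reward - self.mean)

    def change_detected(self, time_to_reward):
        if self.n < 2:
            return False
        std = (self.m2 / (self.n - 1)) ** 0.5
        return std > 0 and abs(time_to_reward - self.mean) / std > self.z_threshold

# usage sketch: test before updating; a jump from ~10 to 30 steps suggests a switch
tracker = TimeToRewardTracker()
for t in [10, 11, 9, 10, 30]:
    if tracker.change_detected(t):
        pass                        # e.g. reset value estimates, boost exploration
    tracker.update(t)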
Style APA, Harvard, Vancouver, ISO itp.
47

Tluk, von Toschanowitz Katharina. "Relevance determination in reinforcement learning". Tönning Lübeck Marburg Der Andere Verl, 2009. http://d-nb.info/993341128/04.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
48

Brinegar, Jennifer Lynn. "Self-control with running reinforcement". Diss., [Missoula, Mont.] : The University of Montana, 2007. http://etd.lib.umt.edu/theses/available/etd-01042008-104048/.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
49

Bonneau, Maxime. "Reinforcement Learning for 5G Handover". Thesis, Linköpings universitet, Statistik och maskininlärning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-140816.

Pełny tekst źródła
Streszczenie:
The development of the 5G network is in progress, and one part of the process that needs to be optimised is the handover. This operation, which consists of changing the base station (BS) providing data to a user equipment (UE), needs to be efficient enough to be a seamless operation. From the BS point of view, this operation should be as economical as possible while satisfying the UE's needs. In this thesis, the problem of 5G handover is addressed, and the chosen tool to solve this problem is reinforcement learning. A review of the different methods proposed by reinforcement learning led to the restricted field of model-free, off-policy methods, more specifically the Q-Learning algorithm. In its basic form, used with simulated data, this method provides information on which kind of reward and which kinds of action space and state space produce good results. However, despite working on some restricted datasets, this algorithm does not scale well due to lengthy computation times. This means that the trained agent cannot use a lot of data for its learning process, and neither the state space nor the action space can be extended much, restricting the use of the basic Q-Learning algorithm to discrete variables. Since the strength of the signal (RSRP), which is of high interest for matching the UE's needs, is a continuous variable, a continuous form of Q-Learning needs to be used. A function approximation method is then investigated, namely artificial neural networks. In addition to the lengthy computation times, the results obtained are not yet convincing. Thus, despite some interesting results obtained from the basic form of the Q-Learning algorithm, the extension to the continuous case has not been successful. Moreover, the computation times make reinforcement learning applicable in this domain only on very powerful computers.
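A minimal sketch of the discrete variant described above is given below: tabular Q-Learning over coarsely binned RSRP values, with candidate base stations as actions. The state encoding, reward and constants are assumptions made for illustration, not the thesis's design.

import random
from collections import defaultdict

N_BS = 3                                    # candidate base stations (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(lambda: [0.0] * N_BS)       # tabular Q[state][action]

def discretize(rsrp_dbm, serving_bs):
    # bucket the continuous RSRP into coarse bins, as the discrete variant requires
    return (serving_bs, min(max(int((rsrp_dbm + 140) // 10), 0), 9))

def choose_bs(state):
    # epsilon-greedy choice of the base station to serve the UE
    if random.random() < EPS:
        return random.randrange(N_BS)
    q = Q[state]
    return q.index(max(q))

def q_update(state, action, reward, next_state):
    # standard one-step Q-Learning update
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])

# usage sketch: reward could combine UE signal quality and a handover cost (assumed)
s = discretize(rsrp_dbm=-95.0, serving_bs=0)
a = choose_bs(s)
s_next = discretize(rsrp_dbm=-88.0, serving_bs=a)
q_update(s, a, reward=1.0 - 0.5 * (a != 0), next_state=s_next)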
Style APA, Harvard, Vancouver, ISO itp.
50

Round, Thomas. "Representation-Reinforcement and Australian Constitutionalism". Thesis, Griffith University, 2002. http://hdl.handle.net/10072/367951.

Pełny tekst źródła
Streszczenie:
Constitutional theory in Australia, as in the USA and other liberal democracies, is contested by rival views of the proper roles of courts and legislatures. Simple adherence to the literal text of the Constitution or the original intentions of its framers is inadequate to protect against unjust actions by legislative and executive officials (the raison d'être of an entrenched Constitution) when these appear in novel guises. But empowering judges to strike down laws they consider 'unjust' risks sacrificing democratic self-government, and the process can undercut the very goal (equal respect for all citizens) that it is supposed to ensure as an outcome. American theorists of 'representation-reinforcing' or 'process-policing' judicial review - outlined by Justice Harlan Stone in US v Carolene Products (1938), then elaborated by Professor John Hart Ely in Democracy and Distrust (1980) - offer a solution. Representation-reinforcement opposes judicial activism except on two grounds. The first is protecting majority rule, invalidating laws that entrench those in power against opposition or removal. The second is protecting minority rights, by invalidating laws motivated by prejudice that discriminate against unpopular groups. Constitutional courts should avoid dictating substantive policy outcomes, lest this undermine democracy. Instead, judges should concentrate on 'reinforcing representation' - on ensuring that political processes function properly, producing decisions that have maximum popular support. Many US constitutional scholars have criticised Ely's theory. But even so, representation-reinforcement remains a promising doctrine for Australia to adopt. Ely's American critics disagree even more with each other than with Ely, and most of their criticisms carry weight only in the USA's rights-based, individualistic context. Australia's Benthamite culture of majoritarian constitutionalism is more receptive to representation-reinforcement. And most other criticisms of Ely can be answered by revising, instead of abandoning, the concept of process-policing judicial review.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Politics and Public Policy
Arts, Education and Law
Full Text
Style APA, Harvard, Vancouver, ISO itp.
