Dissertations / Theses on the topic 'Policy gradient'
Consult the top 49 dissertations / theses for your research on the topic 'Policy gradient.'
You can also download the full text of each academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Jacobzon, Gustaf, and Martin Larsson. "Generalizing Deep Deterministic Policy Gradient." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239365.
Greensmith, Evan, and evan.greensmith@gmail.com. "Policy Gradient Methods: Variance Reduction and Stochastic Convergence." The Australian National University. Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20060106.193712.
Greensmith, Evan. "Policy gradient methods : variance reduction and stochastic convergence /." View thesis entry in Australian Digital Theses Program, 2005. http://thesis.anu.edu.au/public/adt-ANU20060106.193712/index.html.
Aberdeen, Douglas Alexander, and doug.aberdeen@anu.edu.au. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes." The Australian National University. Research School of Information Sciences and Engineering, 2003. http://thesis.anu.edu.au./public/adt-ANU20030410.111006.
Aberdeen, Douglas Alexander. "Policy-gradient algorithms for partially observable Markov decision processes /." View thesis entry in Australian Digital Theses Program, 2003. http://thesis.anu.edu.au/public/adt-ANU20030410.111006/index.html.
Lidström, Christian, and Hannes Leskelä. "Learning for RoboCup Soccer : Policy Gradient Reinforcement Learning in multi-agent systems." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-157469.
RoboCup Soccer is an annual worldwide robotics competition in which teams of autonomous robot agents play football against each other. This report focuses on the 2D simulator, a variant in which no real robots are needed; instead, the player clients communicate with a server that keeps track of the game state. RoboCup Soccer 2D simulation has become a major subject of research into artificial intelligence, cooperation and behaviour in multi-agent systems, and the learning thereof. Some form of machine learning is a requirement for competing at the highest level, as the problem is too complex for the decision-making to be programmed manually. This report finds that PGRL is a common machine learning method in RoboCup teams and is used by some of the best teams in RoboCup. It also finds that PGRL is an effective form of machine learning in terms of learning speed, but that many factors can affect this; usually, a trade-off has to be made between learning speed and precision.
GAVELLI, VIKTOR, and ALEXANDER GOMEZ. "Multi-agent system with Policy Gradient Reinforcement Learning for RoboCup Soccer Simulator." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-157418.
The RoboCup Soccer Simulator is a multi-agent football simulator used in competitions to simulate robots playing football. These competitions are held mainly to promote research in robotics and artificial intelligence by providing a cheap and accessible way to program robot-like agents. This report describes and tests an implementation of a multi-agent football team. Policy Gradient Reinforcement Learning (PGRL) is used to train and modify the team's behaviour. The results show that PGRL improves the team's performance, but when the team's performance differs considerably from the opponent's, the results become inconclusive.
Pianazzi, Enrico. "A deep reinforcement learning approach based on policy gradient for mobile robot navigation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.
Poulin, Nolan. "Proactive Planning through Active Policy Inference in Stochastic Environments." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-theses/1267.
Fleming, Brian James. "The social gradient in health : trends in C20th ideas, Australian Health Policy 1970-1998, and a health equity policy evaluation of Australian aged care planning /." Title page, abstract and table of contents only, 2003. http://web4.library.adelaide.edu.au/theses/09PH/09phf5971.pdf.
Björnberg, Adam, and Haris Poljo. "Impact of observation noise and reward sparseness on Deep Deterministic Policy Gradient when applied to inverted pendulum stabilization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259758.
Deep reinforcement learning (RL) algorithms have been shown to solve complex problems. Deep Deterministic Policy Gradient (DDPG) is a state-of-the-art deep RL algorithm that can handle environments with continuous action spaces. This study evaluates how the DDPG algorithm performs, in terms of solve rate and results, depending on observation noise and reward sparseness in a simple environment. A threshold for how much Gaussian noise can be added to observations before the algorithm's performance starts to degrade was found between a standard deviation of 0.025 and 0.05. It was also concluded that reward sparseness leads to inconsistent results and irreproducibility, which shows the importance of a well-designed reward function. Further tests are required to thoroughly evaluate the effect of combining noisy observations with sparse reward signals.
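The noise model studied in the abstract above is easy to reproduce. The following is a minimal sketch of our own (not the authors' code), assuming a NumPy-based environment loop, of corrupting observations with zero-mean Gaussian noise at the reported threshold scale:

```python
import numpy as np

def noisy_observation(obs, sigma, rng):
    """Corrupt an observation with zero-mean Gaussian noise of std `sigma`.

    The thesis above locates the performance threshold between
    sigma = 0.025 and sigma = 0.05; both values are worth probing.
    """
    obs = np.asarray(obs, dtype=float)
    return obs + rng.normal(0.0, sigma, size=obs.shape)

rng = np.random.default_rng(0)
clean = np.array([0.0, 0.1, -0.2])  # e.g. pendulum angle, angular velocity, position
corrupted = noisy_observation(clean, 0.05, rng)
```

The state names in the comment are illustrative; the thesis's exact observation vector is not specified here.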
Tagesson, Dennis. "A Comparison Between Deep Q-learning and Deep Deterministic Policy Gradient for an Autonomous Drone in a Simulated Environment." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-55134.
Olafsson, Björgvin. "Partially Observable Markov Decision Processes for Faster Object Recognition." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.
Kaisaravalli Bhojraj, Gokul, and Yeswanth Surya Achyut Markonda. "Policy-based Reinforcement learning control for window opening and closing in an office building." Thesis, Högskolan Dalarna, Mikrodataanalys, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-34420.
Cox, Carissa. "Spatial Patterns in Development Regulation: Tree Preservation Ordinances of the DFW Metropolitan Area." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84194/.
McDowell, Journey. "Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2100.
Michaud, Brianna. "A Habitat Analysis of Estuarine Fishes and Invertebrates, with Observations on the Effects of Habitat-Factor Resolution." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6543.
Olsson, Anton, and Felix Rosberg. "Domain Transfer for End-to-end Reinforcement Learning." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-43042.
Crowley, Mark. "Equilibrium policy gradients for spatiotemporal planning." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/38971.
Aklil, Nassim. "Apprentissage actif sous contrainte de budget en robotique et en neurosciences computationnelles. Localisation robotique et modélisation comportementale en environnement non stationnaire." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066225/document.
Decision-making is a highly researched field in science, be it in neuroscience, to understand the processes underlying animal decision-making, or in robotics, to model efficient and rapid decision-making processes in real environments. In neuroscience, this problem is addressed online with sequential decision-making models based on reinforcement learning. In robotics, the primary objective is efficiency, so that systems can be deployed in real environments. However, what can be called the budget, the limitations inherent to the hardware, such as computation time, the limited actions available to the robot, or the lifetime of the robot's battery, is often not taken into account in robotics at present. We propose in this thesis to introduce the notion of budget as an explicit constraint in robotic learning processes applied to a localization task, by implementing a model based on work developed in statistical learning that processes data under explicit constraints, limiting the input of data or imposing a more explicit time constraint. In order to discuss the online operation of this type of budgeted learning algorithm, we also discuss some possible inspirations that could be drawn from computational neuroscience. In this context, the alternation between information retrieval for localization and the decision to move, for a robot, may be indirectly linked to the notion of the exploration-exploitation trade-off. We present our contribution to the modeling of this trade-off in animals in a non-stationary task involving different levels of uncertainty, and we make the link with multi-armed bandit methods.
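The exploration-exploitation trade-off mentioned at the end of the abstract can be illustrated with the simplest bandit baseline. This epsilon-greedy sketch is purely illustrative and does not model the budgeted setting of the thesis; the arm means and parameters are arbitrary:

```python
import numpy as np

def epsilon_greedy_bandit(means, steps=2000, eps=0.1, seed=0):
    """Play a Gaussian multi-armed bandit with epsilon-greedy action selection.

    With probability eps we explore a random arm; otherwise we exploit the
    arm with the highest running estimate of its mean reward.
    """
    rng = np.random.default_rng(seed)
    n = len(means)
    counts = np.zeros(n)
    values = np.zeros(n)
    for _ in range(steps):
        if rng.random() < eps:
            a = int(rng.integers(n))        # explore
        else:
            a = int(values.argmax())        # exploit
        r = rng.normal(means[a], 1.0)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental sample mean
    return counts, values

counts, values = epsilon_greedy_bandit([0.0, 0.2, 1.0])
```

With enough steps, the best arm ends up both pulled most often and estimated most accurately, which is exactly the compromise the thesis links to the robot's "sense or move" alternation.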
Nilsson, Anna-Maria, and Malin Björk. "Interpretation and Grading in the Current Grading System." Thesis, Malmö högskola, Lärarutbildningen (LUT), 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-29798.
This dissertation deals with the goal- and criterion-referenced grading system in use in Swedish schools today. We have chosen to investigate how teachers perceive the current grading system and what challenges they face when grading in the English subject. Our interest in this topic deepened during our final in-school practice periods, after which we discussed the issue with each other and concluded that the grading system would be useful to delve into in order to feel more secure when leaving teacher training college. The current grading system is debated in schools on a daily basis since it is a tool that teachers work with. Although the teachers give the impression of still having difficulties with the current grading system, the results show that the majority of the interviewees have grasped the system. Rather, the difficulty seems to be how to interpret the various policy documents, since the goals and criteria are at times of a general nature. A general opinion among most of the teachers is that they do not have difficulties with the grading itself in the current grading system. They do, however, request further grading steps in order to better explain where students fall on the grading scale. Moreover, we concluded that the teachers believe the system benefits the students as well as themselves, and that it is necessary to have continuous discussions about the current grading system so as to better understand what it entails.
Henry, Dawn Therese. "Standards-based Grading: The Effect of Common Grading Criteria on Academic Growth." Bowling Green State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1522846892709392.
Sehnke, Frank [Verfasser], Patrick van der [Akademischer Betreuer] Smagt, and Jürgen [Akademischer Betreuer] Schmidhuber. "Parameter Exploring Policy Gradients and their Implications / Frank Sehnke. Gutachter: Jürgen Schmidhuber. Betreuer: Patrick van der Smagt." München : Universitätsbibliothek der TU München, 2012. http://d-nb.info/1030099820/34.
Dennis, Janelle. "No-Zero Policy in Middle School: A Comparison of High School Student Achievement." ScholarWorks, 2018. https://scholarworks.waldenu.edu/dissertations/5694.
Tolman, Deborah A. "Environmental Gradients, Community Boundaries, and Disturbance the Darlingtonia Fens of Southwestern Oregon." PDXScholar, 2004. https://pdxscholar.library.pdx.edu/open_access_etds/3013.
De Larkin, Christian Martin II. "A Study of Teacher-Buy-In and Grading Policy Reform in a Los Angeles Archdiocesan Catholic High School." Thesis, Loyola Marymount University, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3597221.
This study examined the construct of teacher buy-in (TBI) during a grading policy reform effort in a high school. The purpose of this study was to identify and describe the value teachers perceived in the grading reform. Additionally, the researcher studied teacher behavior by identifying the teachers' actual practice of the policy. Finally, the study compared the values participants reported with their actual grading practices to determine the convergence of values and practice.
The research provided empirical evidence for a new way to study TBI and its relationship to a reform implementation. This study addressed a school-site policy reform effort and described TBI as contributing to, and perhaps challenging, current practices in school reform and teacher grading policies. It described the extent to which teachers bought into the grading policies and provided a framework for studying TBI and grading policies in the context of Standards-Based Reform in the future. The findings and discussion highlight how grading policies are a critical element of the student evaluation process in the growing movement toward national learning standards and testing.
De, Larkin Christian Martín II. "A Study of Teacher-Buy-In and Grading Policy Reform in a Los Angeles Archdiocesan Catholic High School." Digital Commons at Loyola Marymount University and Loyola Law School, 2013. https://digitalcommons.lmu.edu/etd/220.
Wolford, Walter Paul. "Policy and Practice Concerning Essay-Grading Criteria in Developmental English and College-Level English Programs in Tennessee Community Colleges." Digital Commons @ East Tennessee State University, 2000. https://dc.etsu.edu/etd/4.
Souter, Dawn Hopkins. "The Nature of Feedback Provided to Elementary Students in Classrooms where Grading and Reporting are Standards-Based." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/eps_diss/62.
Rice, William Robertson. "Subjectivity in grading: The role individual subjectivity plays in assigning grades." Miami University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=miami1623317108089967.
Haley, James. "To Curve or Not to Curve? The Effect of College Science Grading Policies on Implicit Theories of Intelligence, Perceived Classroom Goal Structures, and Self-efficacy." Thesis, Boston College, 2015. http://hdl.handle.net/2345/bc-ir:104165.
There is currently a shortage of students graduating with STEM (science, technology, engineering, or mathematics) degrees, particularly women and students of color. Approximately half of students who begin a STEM major eventually switch out. Many switchers cite the competitiveness, grading curves, and weed-out culture of introductory STEM classes as reasons for the switch. Variables known to influence resilience include a student's implicit theory of intelligence and achievement goal orientation. Incremental theory (belief that intelligence is malleable) and mastery goals (pursuit of increased competence) are more adaptive in challenging classroom contexts. This dissertation investigates the role that college science grading policies and messages about the importance of effort play in shaping both implicit theories and achievement goal orientation. College students (N = 425) were randomly assigned to read one of three grading scenarios: (1) a "mastery" scenario, which used criterion-referenced grading, permitted tests to be retaken, and included a strong effort message; (2) a "norm" scenario, which used norm-referenced grading (grading on the curve); or (3) an "effort" scenario, which combined a strong effort message with the norm-referenced policies. The dependent variables included implicit theories of intelligence, perceived classroom goal structure, and self-efficacy. A different sample of students (N = 15) were randomly assigned a scenario to read, asked to verbalize their thoughts, and responded to questions in a semi-structured interview. Results showed that students reading the mastery scenario were more likely to endorse an incremental theory of intelligence, perceived greater mastery goal structure, and had higher self-efficacy. The effort message had no effect on self-efficacy, implicit theory, and most of the goal structure measures.
The interviews revealed that it was the retake policy in the mastery scenario and the competitive atmosphere in the norm-referenced scenarios that were likely driving the results. Competitive grading policies appear to be incompatible with mastery goals, cooperative learning, and a belief in the efficacy of effort. Implications for college STEM instruction are discussed.
Thesis (PhD) — Boston College, 2015
Submitted to: Boston College. Lynch School of Education
Discipline: Teacher Education, Special Education, Curriculum and Instruction
Masoudi, Mohammad Amin. "Robust Deep Reinforcement Learning for Portfolio Management." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42743.
Full textКовальов, Костянтин Миколайович. "Комп'ютерна система управління промисловим роботом." Bachelor's thesis, КПІ ім. Ігоря Сікорського, 2019. https://ela.kpi.ua/handle/123456789/28610.
Qualifying work includes an explanatory note (56 pp., 2 appendices). The object of the study is reinforcement learning algorithms for the task of industrial robotic arm control. Continuous control of an industrial robotic arm for non-trivial tasks is too complicated or even unsolvable for classical methods of robotics. Reinforcement learning methods can be used in this case: they are quite simple to implement, allow generalization to unseen cases, and learn from high-dimensional data. We implement the deep deterministic policy gradient algorithm, which is suitable for complex continuous control tasks. During the study:
• an analysis of existing classical methods for the problem of industrial robot control was conducted;
• an analysis of existing reinforcement learning algorithms and their use in the field of robotics was conducted;
• the deep deterministic policy gradient algorithm was implemented;
• the implemented algorithm was tested in a simplified environment;
• a neural network architecture was proposed for solving the problem;
• the algorithm was tested on the training set of objects;
• the algorithm was tested for its generalization ability on the test set.
It was shown that the deep deterministic policy gradient algorithm with a neural network as policy approximator is able to solve the problem with an image as input and to generalize to objects not seen before.
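The core of the deep deterministic policy gradient algorithm implemented above is the bootstrapped critic target y = r + γQ'(s', μ'(s')). A hedged sketch of just that step, with illustrative names and shapes not taken from the thesis's implementation:

```python
import numpy as np

def ddpg_targets(rewards, next_q, dones, gamma=0.99):
    """Critic regression targets y = r + gamma * Q'(s', mu'(s')) for DDPG.

    `next_q` stands for the target critic evaluated at the target actor's
    action in the next state; `dones` masks out bootstrapping at terminal
    states. All names here are illustrative assumptions.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * next_q * (1.0 - dones)

# Two transitions: one mid-episode, one terminal.
y = ddpg_targets([1.0, 0.0], [2.0, 3.0], [0, 1], gamma=0.5)
```

The critic is then regressed toward `y`, and the actor is updated by ascending the critic's gradient with respect to the action; those network updates are omitted here.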
Helmér, Henrik. "Närbyråkraters individuella handlingsutrymme : Lärares handlingslogiker vid myndighetsutövning i form av bedömning och betygsättning." Thesis, Linköpings universitet, Statsvetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166307.
Su, Xiaoshan. "Three Essays on the Design, Pricing, and Hedging of Insurance Contracts." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE2065.
This thesis makes use of theoretical tools from finance, decision theory, and machine learning to improve the design, pricing, and hedging of insurance contracts. Chapter 3 develops closed-form pricing formulas for participating life insurance contracts based on matrix Wiener-Hopf factorization, where multiple risk sources, such as credit, market, and economic risks, are considered. The pricing method proves to be accurate and efficient. Dynamic and semi-static hedging strategies are introduced to help insurance companies reduce the risk exposure arising from issuing participating contracts. Chapter 4 discusses optimal contract design when the insured is third-degree risk averse. The results show that dual limited stop-loss, change-loss, dual change-loss, and stop-loss can be optimal contracts favored by both risk averters and risk lovers in different settings. Chapter 5 develops a stochastic gradient boosting frequency-severity model, which improves on the important and popular GLM and GAM frequency-severity models. This model fully inherits the advantages of the gradient boosting algorithm, overcoming the restrictive linear or additive forms of the GLM and GAM frequency-severity models by learning the model structure from data. Further, our model can capture the flexible nonlinear dependence between claim frequency and severity.
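The boosting idea behind Chapter 5, each new weak learner fits the pointwise negative gradient of the loss, can be sketched with squared error and decision stumps. The actual frequency-severity model uses Poisson/Gamma-type losses, which this toy does not reproduce:

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold stump for residuals r under squared error."""
    best_err, best = np.inf, None
    for t in np.unique(x)[:-1]:          # drop last value so both sides stay non-empty
        left, right = r[x <= t].mean(), r[x > t].mean()
        pred = np.where(x <= t, left, right)
        err = ((r - pred) ** 2).sum()
        if err < best_err:
            best_err, best = err, (t, left, right)
    return best

def boost(x, y, rounds=30, lr=0.3):
    """Gradient boosting on squared error: each stump fits the current residuals."""
    pred = np.full(len(y), y.mean())
    for _ in range(rounds):
        t, left, right = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= t, left, right)
    return pred

x = np.arange(10.0)
y = np.where(x >= 5, 10.0, 0.0)          # a step the ensemble must learn
pred = boost(x, y)
```

For squared error the residual equals the negative gradient, which is what makes this "gradient" boosting; swapping in a Poisson deviance for claim counts changes only the gradient being fitted.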
Senate, University of Arizona Faculty. "Faculty Senate Minutes March 6, 2017." University of Arizona Faculty Senate (Tucson, AZ), 2017. http://hdl.handle.net/10150/623059.
Full textCai, Bo-Yin, and 蔡博胤. "A Behavior Fusion Approach Based on Policy Gradient." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/u6ctx3.
National Sun Yat-sen University
Department of Electrical Engineering
Academic year 107 (2018-2019)
In this study, we propose a behavior fusion algorithm based on policy gradient. We use the Actor-Critic algorithm to train sub-tasks; after this training is completed, the proposed behavior fusion algorithm is used for learning complex tasks. We obtain the state value of each sub-task in each state by reading the trained sub-task neural networks, then calculate the return of each sub-task and pass the normalized returns to the behavior fusion algorithm as a policy gradient. When reinforcement learning is applied to a complex task, the reward function is often difficult to design. With a sparse reward, the optimal solution can in theory be reached, but training takes a long time; with a dense reward, training is faster, but the agent easily gets stuck in a local minimum. If the complex task is decomposed into several sub-tasks for training, the reward functions of the sub-tasks are easier to design, and after training these sub-tasks can be merged to accomplish the complex task. In this study, we use the wafer probe simulator designed by our laboratory and Pong from the Atari games as test environments. The wafer inspection simulator simulates how the probe moves when a fab inspects chips; the goal is to have every chip on the wafer checked exactly once, without repeatedly checking the same chip. The Pong environment is about letting agents learn on their own to defeat the computer.
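One plausible reading of the fusion step described above, normalizing the sub-task returns into weights over sub-policies, is a softmax over the sub-task values. Both the softmax choice and the temperature are our assumptions, not details from the thesis:

```python
import numpy as np

def fuse_behaviors(subtask_values, temperature=1.0):
    """Turn per-sub-task state values into normalized fusion weights.

    Each trained sub-task network reports its value of the current state;
    a softmax converts these into positive weights summing to one, so
    higher-value sub-policies contribute more to the fused behavior.
    """
    v = np.asarray(subtask_values, dtype=float) / temperature
    v = v - v.max()                 # shift for numerical stability
    w = np.exp(v)
    return w / w.sum()

# e.g. values reported by three sub-task critics in the current state
weights = fuse_behaviors([1.0, 2.0, 0.5])
```

Lowering `temperature` sharpens the weighting toward the single best sub-task; raising it blends the sub-policies more evenly.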
Greensmith, Evan. "Policy Gradient Methods: Variance Reduction and Stochastic Convergence." PhD thesis, 2005. http://hdl.handle.net/1885/47105.
Full textChen, Yi-Ching, and 陳怡靜. "Solving Rubik's Cube by Policy Gradient Based Reinforcement Learning." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/t842yt.
National Tsing Hua University
Department of Computer Science
Academic year 107 (2018-2019)
Reinforcement learning provides a mechanism for training an agent to interact with its environment. Policy gradient makes the right actions more probable. We propose using a linear policy gradient method in deep neural network-based reinforcement learning. The proposed method employs an intensifying reward function to increase the probabilities of the right actions for solving Rubik's Cube problems. Experiments show that our proposed neural network learned to solve some Rubik's Cube states. For more difficult initial states, the network still cannot always give the correct suggestion.
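The claim that "policy gradient makes the right actions more probable" is concrete in the REINFORCE update for a softmax policy. This tabular sketch is illustrative only; the thesis's intensifying reward schedule is hinted at solely by scaling `reward`:

```python
import numpy as np

def reinforce_update(theta, state, action, reward, lr=0.1):
    """One REINFORCE step for a tabular softmax policy.

    theta[state] holds action preferences. The update moves preferences
    along reward * d log pi(action|state) / d theta, so positively
    rewarded actions become more probable. An intensifying reward (as in
    the thesis) would scale `reward` up for moves approaching the solved
    cube; that schedule is our assumption, not the authors' function.
    """
    prefs = theta[state]
    probs = np.exp(prefs - prefs.max())
    probs = probs / probs.sum()
    grad = -probs
    grad[action] += 1.0             # gradient of log-softmax at the taken action
    theta[state] = prefs + lr * reward * grad
    return theta

theta = np.zeros((1, 3))            # one state, three candidate moves
for _ in range(50):
    theta = reinforce_update(theta, state=0, action=2, reward=1.0)
```

After repeated positive reward for action 2, its preference (and hence probability) dominates the other two actions.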
Aberdeen, Douglas. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes." PhD thesis, 2003. http://hdl.handle.net/1885/48180.
Full textKiah-YangChong and 張家揚. "Design and Implementation of Fuzzy Policy Gradient Gait Learning Method for Humanoid Robot." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/90100127378597192142.
National Cheng Kung University
Department of Electrical Engineering
Academic year 98 (2009-2010)
The design and implementation of a Fuzzy Policy Gradient Learning (FPGL) method for a small-sized humanoid robot is proposed in this thesis. The thesis introduces the mechanical structure of the humanoid robot, named aiRobots-V, and its hardware system, and also improves and parameterizes the robot's gait pattern. Arm movement is added to the gait pattern to reduce the tilt of the trunk while walking. FPGL is an integrated machine learning method that combines Policy Gradient Reinforcement Learning (PGRL) with fuzzy logic in order to improve the efficiency and speed of gait learning. Trained with FPGL, using the walking distance over a fixed number of walking cycles as the reward, the humanoid robot learns a fast and stable gait automatically; the tilt of the trunk is used as the reward for learning arm movement within the walking cycle. Experimental results show that FPGL could train the gait pattern from a walking speed of 9.26 mm/s to 162.27 mm/s in about an hour. The training data also show that the method can improve the efficiency of basic PGRL by up to 13%, and the effect of arm movement in reducing trunk tilt is likewise confirmed. The robot was also entered in the throw-in technical challenge of RoboCup 2010.
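The PGRL core that FPGL builds on is the classic finite-difference policy-gradient search over gait parameters (in the style of Kohl and Stone). The sketch below is a generic reconstruction with arbitrarily chosen batch size and step sizes, not the thesis's values:

```python
import numpy as np

def pgrl_step(params, evaluate, epsilon=0.05, n_policies=8, step=0.02, rng=None):
    """One finite-difference policy-gradient step over gait parameters.

    `evaluate(p)` returns a scalar reward, e.g. walking distance over a
    fixed number of cycles. Each parameter of each test policy is randomly
    perturbed by -eps, 0, or +eps; comparing the average scores of the
    +eps and -eps groups per parameter estimates the gradient, which is
    then followed with a fixed step length.
    """
    if rng is None:
        rng = np.random.default_rng()
    params = np.asarray(params, dtype=float)
    perturbs = rng.choice([-epsilon, 0.0, epsilon], size=(n_policies, params.size))
    scores = np.array([evaluate(params + d) for d in perturbs])
    grad = np.zeros_like(params)
    for j in range(params.size):
        plus = scores[perturbs[:, j] > 0]
        minus = scores[perturbs[:, j] < 0]
        if plus.size and minus.size:
            grad[j] = plus.mean() - minus.mean()
    norm = np.linalg.norm(grad)
    return params + step * grad / norm if norm > 0 else params

rng = np.random.default_rng(1)
score = lambda p: -np.sum((p - 1.0) ** 2)   # toy stand-in for measured walk speed
params = np.zeros(2)
before = score(params)
for _ in range(200):
    params = pgrl_step(params, score, rng=rng)
```

In the thesis, the fuzzy-logic component then adapts this search; here only the plain PGRL loop is shown, driving a toy quadratic objective toward its optimum.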
"Adaptive Curvature for Stochastic Optimization." Master's thesis, 2019. http://hdl.handle.net/2286/R.I.53675.
Dissertation/Thesis
Masters Thesis Computer Science 2019
Fleming, Brian James. "The social gradient in health : trends in C20th ideas, Australian Health Policy 1970-1998, and a health equity policy evaluation of Australian aged care planning / Brian James Fleming." Thesis, 2003. http://hdl.handle.net/2440/22062.
Full textPereira, Bruno Alexandre Barbosa. "Deep reinforcement learning for robotic manipulation tasks." Master's thesis, 2021. http://hdl.handle.net/10773/33654.
Recent advances in Artificial Intelligence (AI) open up a set of new opportunities for robotics. Deep Reinforcement Learning (DRL) is a subfield of AI resulting from the combination of Deep Learning (DL) with Reinforcement Learning (RL). This subfield defines machine learning algorithms that learn directly from experience and offers a comprehensive approach to the study of the interaction between learning, representation, and decision-making. These algorithms have already been used successfully in different domains. Notably, DRL agents have learned to play Atari 2600 video games directly from pixels, reaching human-comparable performance on 49 of those games. More recently, DRL combined with other techniques produced agents capable of playing the board game Go at a professional level, something previously seen as too complex a problem to solve because of its enormous search space. Within robotics, DRL has been used in planning, navigation, optimal control, and other problems. In these applications, the excellent function approximation and representation learning capabilities of deep neural networks allow RL to scale to problems with high-dimensional state and action spaces. Additionally, properties inherent to DRL make transfer learning useful when moving from simulation to the real world. This dissertation investigates the applicability and effectiveness of DRL techniques for learning successful policies in the domain of robotic manipulation tasks. Initially, a set of three classic RL problems was solved using RL and DRL algorithms in order to explore their practical implementation and arrive at a class of algorithms appropriate for these robotics tasks.
Subsequently, a simulated task was defined in which an agent must control a manipulator with 6 degrees of freedom so that its end effector reaches a target. This task is used to evaluate the effect on performance of different state representations, hyperparameters, and state-of-the-art DRL algorithms, which resulted in agents with high success rates. The focus is then placed on the speed and time constraints of end-effector positioning. To this end, different reward systems were tested so that an agent can learn a modified version of the previous task at higher joint velocities. In this scenario, several improvements over the original reward system were verified. Finally, an application of the best agent obtained in the previous experiments is demonstrated in an applied ball-catching scenario.
Master's in Computer and Telematics Engineering
Shang, Li-Dan, and 商瓈丹. "Does Instructors’ Grading Policy Affect Student Evaluation of Teaching? Evidence from National Tsing Hua University." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/84201754667124562349.
National Tsing Hua University
Department of Economics
Academic year 104 (2015-2016)
This paper takes advantage of a uniquely compiled data set from National Tsing Hua University to empirically evaluate whether the results of student evaluation of teaching (SET) are affected by the reputation of instructors, controlling for other factors. Our results show a significant relationship between reputation/justice and SET. In particular, our empirical results suggest that instructors who want better SET outcomes should establish their reputation by giving students higher grades and treating students fairly. We also find that course-, instructor-, and student-related characteristics are important with regard to SET.
"Policy and Practice Concerning Essay-Grading Criteria in Developmental English and College-Level English Programs in Tennessee Community Colleges." East Tennessee State University, 2000. http://etd-submit.etsu.edu/etd/theses/available/etd-0330100-210610/.
Full text"Grade Inflation in English 102: Thirty Years of Data." Doctoral diss., 2015. http://hdl.handle.net/2286/R.I.29962.
Dissertation/Thesis
Doctoral Dissertation English 2015
Mickwitz, Larissa. "En reformerad lärare : Konstruktionen av en professionell och betygssättande lärare i skolpolitik och skolpraktik." Doctoral thesis, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-115348.
Full textMagalane, T. Phoshoko. "Exploring the adaptability of indigenous African marriage song to piano for classroom and the university level education." Diss., 2017. http://hdl.handle.net/11602/957.
Centre for African Studies
This study explored the adaptability of indigenous African marriage songs to piano. Music education has long been biased towards Western music content to the exclusion of local musical traditions. A vast amount of musical repertoire exists within indigenous African societies, yet formal music education seems oblivious to this resource, despite some educators decrying the dearth of materials. There is a need for a music curriculum that is located within an African context and includes indigenous African musical practices; such a need is also expressed in the new Curriculum and Assessment Policy Statement (CAPS) document. This study explored the feasibility of building a repertoire of indigenous songs for classroom purposes. A number of songs were collected, transcribed, analysed, and then placed at various levels of difficulty. These were then matched with the requisite proficiency levels congruent with other graded piano regimes commonly used in the school system. The assumption is that the adaptation and arrangement of indigenous marriage songs will help bring indigenous African musical practices into the modern music education space. Furthermore, it is envisaged that the philosophical understanding and knowledge attendant to the music practices yielding these songs, and the context in which they are performed, will form the basis for further advancement.