Journal articles on the topic 'Constrained RL'

Consult the top 50 journal articles for your research on the topic 'Constrained RL.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

HasanzadeZonuzy, Aria, Archana Bura, Dileep Kalathil, and Srinivas Shakkottai. "Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 7667–74. http://dx.doi.org/10.1609/aaai.v35i9.16937.

Abstract:
Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goal is to characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy, in terms of both objective maximization and constraint satisfaction, in a PAC sense. We explore two classes of RL algorithms, namely, (i) a generative model based approach, wherein samples are taken initially to estimate a model, and (ii) an online approach, wherein the model is updated as samples are obtained. Our main finding is that, compared to the best known bounds of the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints, which suggests that the approach may be easily utilized in real systems.
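For orientation, the constrained MDP objective underlying this line of work is commonly written as below (a standard generic formulation with discount factor γ, reward r, costs c_i and budgets d_i; the notation is illustrative and not taken from the paper above):

```latex
% Generic CMDP: maximize expected discounted reward subject to
% expected discounted cost constraints with budgets d_i.
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m.
```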
2

Zhang, Renchi, Runsheng Yu, and Wei Xia. "Constraint-aware Policy Optimization to Solve the Vehicle Routing Problem with Time Windows." Information Technology and Control 51, no. 1 (March 26, 2022): 126–38. http://dx.doi.org/10.5755/j01.itc.51.1.29924.

Abstract:
The vehicle routing problem with time windows (VRPTW), one of the best-known combinatorial optimization (CO) problems, is considered a tough issue in practice, and its main challenge is to find approximate solutions within a reasonable time. In recent years, reinforcement learning (RL) based methods have gained increasing attention for many CO problems, such as vehicle routing problems (VRP), due to their enormous potential to efficiently generate high-quality solutions. However, neglecting the information between the constraints and the solutions makes the performance of previous approaches unsatisfactory on some strongly constrained problems, like VRPTW. We present constraint-aware policy optimization (CPO) for VRPTW, which lets the agent learn the constraints as a representation of the whole environment to improve the generalization of RL methods. Extensive experiments on both the Solomon benchmark and generated datasets demonstrate that our approach significantly outperforms other competing methods.
3

Hu, Jingwei, Zhu Liu, Chichuan Jin, and Weimin Yuan. "Relativistic Fe Kα line in the composite X-ray spectra of radio-loud active galactic nuclei." Monthly Notices of the Royal Astronomical Society 488, no. 3 (July 25, 2019): 4378–88. http://dx.doi.org/10.1093/mnras/stz2030.

Abstract:
While a broad Fe Kα emission line is generally found in the X-ray spectra of radio quiet (RQ) active galactic nuclei (AGNs), this feature, commonly thought to be broadened by the relativistic effects near the central black hole, appears to be rare in their radio loud (RL) counterparts. In this paper, we carry out a detailed study of the ensemble property of the X-ray spectra, focusing on the Fe line, of 97 RL AGNs by applying the spectral stacking method to the spectra obtained with XMM–Newton. For comparison, the same analysis is also performed for 193 RQ AGNs. Both a narrow and a broad component of the Fe Kα line are detected at high significance in the stacked spectra of both samples. The broad lines can be well fitted with relativistically broadened line profiles. Our results suggest that, as in their RQ counterparts, a relativistic Fe line component is commonly present in RL AGNs, though it may not be detected unambiguously in individual objects with spectra of relatively low signal-to-noise. We try to constrain the average spin of the black holes for both the RL and RQ AGN samples by modelling their composite Fe line spectral profiles with relativistic disc line models. For the RL sample, the average spin is loosely constrained and a wide range is allowed except for very fast spins (<0.78, 90 per cent confidence), while for the RQ sample, it is constrained to be low or moderate (<0.24). We conclude that a more precise measurement of the black hole spins in RL AGNs must await the advent of future high-throughput X-ray telescopes.
4

Bhatia, Abhinav, Pradeep Varakantham, and Akshat Kumar. "Resource Constrained Deep Reinforcement Learning." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 610–20. http://dx.doi.org/10.1609/icaps.v29i1.3528.

Abstract:
In urban environments, resources have to be constantly matched to the “right” locations where customer demand is present. For instance, ambulances have to be matched to base stations regularly so as to reduce response time for emergency incidents in ERS (Emergency Response Systems); vehicles (cars, bikes among others) have to be matched to docking stations to reduce lost demand in shared mobility systems. Such problems are challenging owing to the demand uncertainty, combinatorial action spaces and constraints on allocation of resources (e.g., total resources, minimum and maximum number of resources at locations and regions). Existing systems typically employ myopic and greedy optimization approaches to optimize resource allocation. Such approaches typically are unable to handle surges or variances in demand patterns well. Recent work has demonstrated the ability of Deep RL methods in adapting well to highly uncertain environments. However, existing Deep RL methods are unable to handle combinatorial action spaces and constraints on allocation of resources. To that end, we have developed three approaches on top of the well-known actor-critic approach, DDPG (Deep Deterministic Policy Gradient), that are able to handle constraints on resource allocation. We also demonstrate that they are able to outperform leading approaches on simulators validated on semi-real and real data sets.
5

Gu, Shangding, Guang Chen, Lijun Zhang, Jing Hou, Yingbai Hu, and Alois Knoll. "Constrained Reinforcement Learning for Vehicle Motion Planning with Topological Reachability Analysis." Robotics 11, no. 4 (August 16, 2022): 81. http://dx.doi.org/10.3390/robotics11040081.

Abstract:
Rule-based traditional motion planning methods usually perform well with prior knowledge of the macro-scale environments but encounter challenges in unknown and uncertain environments. Deep reinforcement learning (DRL) is a solution that can effectively deal with micro-scale unknown and uncertain environments. Nevertheless, DRL is unstable and lacks interpretability. Therefore, it raises a new challenge: how to combine the effectiveness and overcome the drawbacks of the two methods while guaranteeing stability in uncertain environments. In this study, a multi-constraint and multi-scale motion planning method is proposed for automated driving with the use of constrained reinforcement learning (RL), named RLTT, and comprising RL, a topological reachability analysis used for vehicle path space (TPS), and a trajectory lane model (TLM). First, a dynamic model of vehicles is formulated; then, TLM is developed on the basis of the dynamic model, thus constraining RL action and state space. Second, macro-scale path planning is achieved through TPS, and in the micro-scale range, discrete routing points are achieved via RLTT. Third, the proposed motion planning method is designed by combining sophisticated rules, and a theoretical analysis is provided to guarantee the efficiency of our method. Finally, related experiments are conducted to evaluate the effectiveness of the proposed method; our method can reduce 19.9% of the distance cost in the experiments as compared to the traditional method. Experimental results indicate that the proposed method can help mitigate the gap between data-driven and traditional methods, provide better performance for automated driving, and facilitate the use of RL methods in more fields.
6

Wang, Ru-Min, Jin-Huan Sheng, Jie Zhu, Ying-Ying Fan, and Yuan-Guo Xu. "Decays $D^+_{(s)}\to \pi(K)^{+}\ell^+\ell^-$ and $D^0\to\ell^+\ell^-$ in the MSSM with and without R-parity." International Journal of Modern Physics A 30, no. 12 (April 28, 2015): 1550063. http://dx.doi.org/10.1142/s0217751x15500633.

Abstract:
We study the rare decays $D^+\to\pi^+\ell^+\ell^-$, [Formula: see text] and $D^0\to\ell^+\ell^-$ ($\ell = e, \mu$) in the minimal supersymmetric standard model with and without R-parity. Using the strong constraints on relevant supersymmetric parameters from [Formula: see text] mixing and [Formula: see text] decay, we examine constrained supersymmetry contributions to the relevant branching ratios, direct CP violations and ratios of [Formula: see text] and [Formula: see text] decay rates. We find that both R-parity conserving LR as well as RL mass insertions and R-parity violating squark exchange couplings have huge effects on the direct CP violations of [Formula: see text]; moreover, the constrained LR and RL mass insertions still have obvious effects on the ratios of [Formula: see text] and [Formula: see text] decay rates. The direct CP asymmetries and the ratios of [Formula: see text] and [Formula: see text] decay rates are very sensitive to both the moduli and phases of the relevant supersymmetric parameters. In addition, the differential direct CP asymmetries of [Formula: see text] are studied in detail.
7

Wei, Honghao, Xin Liu, and Lei Ying. "A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 3868–76. http://dx.doi.org/10.1609/aaai.v36i4.20302.

Abstract:
This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). Considering a learning horizon K, which is sufficiently large, the proposed algorithm achieves sublinear regret and zero constraint violation. The bounds depend on the number of states S, the number of actions A, and two constants which are independent of the learning horizon K.
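As a rough guide to the performance measures quoted above (conventions vary between papers, so this is only an assumed, generic form), regret and constraint violation over a learning horizon of K steps are typically defined along the lines of:

```latex
% J^{*}_{r}: optimal feasible average reward; \bar{c}: the cost budget.
\mathrm{Regret}(K) = K J^{*}_{r} - \sum_{t=1}^{K} r(s_t, a_t),
\qquad
\mathrm{Violation}(K) = \Bigl[\sum_{t=1}^{K} c(s_t, a_t) - K \bar{c}\Bigr]_{+}.
```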
8

Lee, Xian Yeow, Sambit Ghadai, Kai Liang Tan, Chinmay Hegde, and Soumik Sarkar. "Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 4577–84. http://dx.doi.org/10.1609/aaai.v34i04.5887.

Abstract:
Robustness of Deep Reinforcement Learning (DRL) algorithms towards adversarial attacks in real-world applications, such as those deployed in cyber-physical systems (CPS), is of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Nonetheless, attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally perverse, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent with decoupled constraints as the budget of attack. We propose the white-box Myopic Action Space (MAS) attack algorithm that distributes the attacks across the action space dimensions. Next, we reformulate the optimization problem above with the same objective function, but with a temporally coupled constraint on the attack budget to take into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm that distributes the attacks across the action and temporal dimensions. Our results showed that, using the same amount of resources, the LAS attack deteriorates the agent's performance significantly more than the MAS attack. This reveals the possibility that, with limited resources, an adversary can utilize the agent's dynamics to malevolently craft attacks that cause the agent to fail. Additionally, we leverage these attack strategies as a possible tool to gain insights into the potential vulnerabilities of DRL agents.
9

Delgrange, Florent, Ann Nowé, and Guillermo A. Pérez. "Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6497–505. http://dx.doi.org/10.1609/aaai.v36i6.20602.

Abstract:
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
10

Ding, Zhenhuan, Xiaoge Huang, and Zhao Liu. "Active Exploration by Chance-Constrained Optimization for Voltage Regulation with Reinforcement Learning." Energies 15, no. 2 (January 16, 2022): 614. http://dx.doi.org/10.3390/en15020614.

Abstract:
Voltage regulation in distribution networks encounters the challenge of handling uncertainties caused by the high penetration of photovoltaics (PV). This research proposes an active exploration (AE) method based on reinforcement learning (RL) to respond to the uncertainties by regulating the voltage of a distribution network with battery energy storage systems (BESS). The proposed method integrates engineering knowledge to accelerate the training process of RL; the engineering knowledge is chance-constrained optimization. We formulate the problem as a chance-constrained optimization with a linear load flow approximation. The optimization results are used to guide the action selection of the exploration, improving training efficiency and reducing conservativeness. The comparison of methods focuses on how BESSs are used, training efficiency, and robustness under varying uncertainties and BESS sizes. We implement the proposed algorithm, a chance-constrained optimization, and traditional Q-learning in the IEEE 13 Node Test Feeder. Our evaluation shows that the proposed AE method achieves better training efficiency than traditional Q-learning. Meanwhile, the proposed method has advantages in BESS usage and conservativeness compared to the chance-constrained optimization.
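For illustration, a chance constraint of the kind used in such formulations can be written as follows (a generic sketch with an assumed bus voltage V_j, limits V_min and V_max, and violation probability ε; not the paper's exact model):

```latex
% Keep every bus voltage within limits with probability at least 1 - \epsilon,
% despite uncertain PV injections.
\Pr\bigl( V_{\min} \le V_j \le V_{\max} \bigr) \ge 1 - \epsilon, \qquad \forall j.
```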
11

Ciliegi, P., M. Elvis, B. J. Wilkes, B. J. Boyle, R. G. McMahon, and T. Maccacaro. "VLA Observations of the Cambridge-Cambridge ROSAT Survey." Symposium - International Astronomical Union 175 (1996): 543–44. http://dx.doi.org/10.1017/s0074180900081791.

Abstract:
We report the results of VLA observations of all 80 AGN in the Cambridge-Cambridge ROSAT Serendipity Survey (CRSS, Boyle et al. 1995), a new well-defined sample of 80 X-ray selected AGN with f_X(0.5–2.0 keV) ≥ 2 × 10⁻¹⁴ erg s⁻¹ cm⁻². Our aim was to obtain a complete classification of the sample members as radio-loud (RL) or radio-quiet (RQ) in order to determine well-constrained X-ray luminosity functions (XLF) for X-ray selected RQ and RL AGN separately.
12

McMillan, Justin R., Jonathan Botts, and Jason E. Summers. "Deep reinforcement learning for cognitive active-sonar employment." Journal of the Acoustical Society of America 151, no. 4 (April 2022): A101. http://dx.doi.org/10.1121/10.0010785.

Abstract:
We introduce a framework to leverage deep reinforcement learning (RL) for active sonar employment, wherein we train an RL agent to select waveform parameters, which maximize the probability of single-target detection. We first simulate raw sonar returns of targets and clutter in reverberation and noise using a physics-based sonar-simulation model, the Sonar Simulation Toolkit (SST), then process the resulting signatures into network inputs via an in-house signal and information processing model of an archetypal antisubmarine warfare (ASW) processing chain. We demonstrate that the trained RL agent is able to appropriately select between continuous wave (CW) and hyperbolic frequency modulated (HFM) waveforms depending on target trajectory, as well as select an optimal bandwidth and pulse length trade-off (when constrained by a constant time-bandwidth product), when presented with sonar returns from a reverb-limited or noise-limited environment.
13

Wang, Lehan, Jingzhou Sun, Yuxuan Sun, Sheng Zhou, and Zhisheng Niu. "A UoI-Optimal Policy for Timely Status Updates with Resource Constraint." Entropy 23, no. 8 (August 20, 2021): 1084. http://dx.doi.org/10.3390/e23081084.

Abstract:
Timely status updates are critical in remote control systems such as autonomous driving and the industrial Internet of Things, where timeliness requirements are usually context dependent. Accordingly, the Urgency of Information (UoI) has been proposed beyond the well-known Age of Information (AoI) by further including context-aware weights which indicate whether the monitored process is in an emergency. However, the optimal updating and scheduling strategies in terms of UoI remain open. In this paper, we propose a UoI-optimal updating policy for timely status information with resource constraint. We first formulate the problem in a constrained Markov decision process and prove that the UoI-optimal policy has a threshold structure. When the context-aware weights are known, we propose a numerical method based on linear programming. When the weights are unknown, we further design a reinforcement learning (RL)-based scheduling policy. The simulation reveals that the threshold of the UoI-optimal policy increases as the resource constraint tightens. In addition, the UoI-optimal policy outperforms the AoI-optimal policy in terms of average squared estimation error, and the proposed RL-based updating policy achieves a near-optimal performance without the advanced knowledge of the system model.
14

Mutombo, Vially Kazadi, Seungyeon Lee, Jusuk Lee, and Jiman Hong. "EER-RL: Energy-Efficient Routing Based on Reinforcement Learning." Mobile Information Systems 2021 (April 19, 2021): 1–12. http://dx.doi.org/10.1155/2021/5589145.

Abstract:
Wireless sensor devices are the backbone of the Internet of Things (IoT), enabling real-world objects and human beings to be connected to the Internet and interact with each other to improve citizens’ living conditions. However, IoT devices are memory- and power-constrained and cannot run highly computational applications, whereas routing is what makes an object part of an IoT network, despite being a highly power-consuming task. Therefore, energy efficiency is a crucial factor to consider when designing a routing protocol for IoT wireless networks. In this paper, we propose EER-RL, an energy-efficient routing protocol based on reinforcement learning. Reinforcement learning (RL) allows devices to adapt to network changes, such as mobility and energy level, and improve routing decisions. The performance of the proposed protocol is compared with other existing energy-efficient routing protocols, and the results show that the proposed protocol performs better in terms of energy efficiency, network lifetime, and scalability.
15

Zhan, Xianyuan, Haoran Xu, Yue Zhang, Xiangyu Zhu, Honglei Yin, and Yu Zheng. "DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 4680–88. http://dx.doi.org/10.1609/aaai.v36i4.20393.

Abstract:
Optimizing the combustion efficiency of a thermal power generating unit (TPGU) is a highly challenging and critical task in the energy industry. We develop a new data-driven AI system, namely DeepThermal, to optimize the combustion control strategy for TPGUs. At its core is a new model-based offline reinforcement learning (RL) framework, called MORE, which leverages historical operational data of a TPGU to solve a highly complex constrained Markov decision process problem via purely offline training. In DeepThermal, we first learn a data-driven combustion process simulator from the offline dataset. The RL agent of MORE is then trained by combining real historical data as well as carefully filtered and processed simulation data through a novel restrictive exploration scheme. DeepThermal has been successfully deployed in four large coal-fired thermal power plants in China. Real-world experiments show that DeepThermal effectively improves the combustion efficiency of TPGUs. We also report the superior performance of MORE by comparing it with state-of-the-art algorithms on standard offline RL benchmarks.
16

Xie, Zhihang, and Qiquan Lin. "Reinforcement Learning-Based Adaptive Position Control Scheme for Uncertain Robotic Manipulators with Constrained Angular Position and Angular Velocity." Applied Sciences 13, no. 3 (January 18, 2023): 1275. http://dx.doi.org/10.3390/app13031275.

Abstract:
Aiming at robotic manipulators subject to system uncertainty and external disturbance, this paper presents a novel adaptive control scheme that uses the time delay estimation (TDE) technique and the reinforcement learning (RL) technique to achieve good tracking performance for each joint of a manipulator. Compared to conventional controllers, the proposed control scheme can not only handle the system parametric uncertainty and external disturbance but also guarantee that both the angular positions and angular velocities of each joint do not exceed their preset constraints. Moreover, it has been proved using Lyapunov theory that the tracking errors are uniformly ultimately bounded (UUB) with a small bound related to the parameters of the controller. Additionally, an innovative RL-based auxiliary term in the proposed controller further minimizes the steady-state tracking errors, and thereby the tracking accuracy is not compromised by the lack of asymptotic convergence of tracking errors. Finally, the simulation results validate the effectiveness of the proposed control scheme.
17

Turnbull, Matthew H., Romà Ogaya, Adrià Barbeta, Josep Peñuelas, Joana Zaragoza-Castells, Owen K. Atkin, Fernando Valladares, Teresa E. Gimeno, Beatriz Pías, and Kevin L. Griffin. "Light inhibition of foliar respiration in response to soil water availability and seasonal changes in temperature in Mediterranean holm oak (Quercus ilex) forest." Functional Plant Biology 44, no. 12 (2017): 1178. http://dx.doi.org/10.1071/fp17032.

Abstract:
In the present study we investigated variations in leaf respiration in darkness (RD) and light (RL), and associated traits in response to season, and along a gradient of soil moisture, in Mediterranean woodland dominated by holm oak (Quercus ilex L.) in central and north-eastern Spain respectively. On seven occasions during the year in the central Spain site, and along the soil moisture gradient in north-eastern Spain, we measured rates of leaf RD, RL (using the Kok method), light-saturated photosynthesis (A) and related light response characteristics, leaf mass per unit area (MA) and leaf nitrogen (N) content. At the central Spain site, significant seasonal changes in soil water content and ambient temperature (T) were associated with changes in MA, foliar N, A and stomatal conductance. RD measured at the prevailing daily T and in instantaneous R–T responses, displayed signs of partial acclimation and was not significantly affected by time of year. RL was always less than, and strongly related to, RD, and RL/RD did not vary significantly or systematically with seasonal changes in T or soil water content. Averaged over the year, RL/RD was 0.66 ± 0.05 s.e. (n = 14) at the central Spain site. At the north-eastern Spain site, the soil moisture gradient was characterised by increasing MA and RD, and reduced foliar N, A, and stomatal conductance as soil water availability decreased. Light inhibition of R occurred across all sites (mean RL/RD = 0.69 ± 0.01 s.e. (n = 18)), resulting in ratios of RL/A being lower than for RD/A. Importantly, the degree of light inhibition was largely insensitive to changes in soil water content. Our findings provide evidence for a relatively constrained degree of light inhibition of R (RL/RD ~ 0.7, or inhibition of ~30%) across gradients of water availability, although the combined impacts of seasonal changes in both T and soil water content increase the range of values expressed. The findings thus have implications in terms of the assumptions made by predictive models that seek to account for light inhibition of R, and for our understanding of how environmental gradients impact on leaf trait relationships in Mediterranean plant communities.
18

Wang, Xun, and Hongbin Chen. "A Reinforcement Learning-Based Dynamic Clustering Algorithm for Compressive Data Gathering in Wireless Sensor Networks." Mobile Information Systems 2022 (May 9, 2022): 1–10. http://dx.doi.org/10.1155/2022/2736734.

Abstract:
Compressive data gathering (CDG) is an effective technique to handle large amounts of data transmissions in resource-constrained wireless sensor networks (WSNs). However, CDG with static clustering cannot adapt to time-varying environments in WSNs. In this paper, a reinforcement learning-based dynamic clustering algorithm (RLDCA) for CDG in WSNs is proposed. It is a dynamic and adaptive clustering method aiming to further reduce data transmissions and energy consumption in WSNs. Sensor nodes act as reinforcement learning (RL) agents which can observe the environment and dynamically select a cluster to join in. These RL agents are instructed by a well-designed reward scheme to join a cluster with strong data correlation and proper distance. It is also a distributed and lightweight learning method. All agents are independent and operate in parallel. Additional overheads introduced by RL are lightweight. Computations of a linear reward function and a few comparison operations are needed. It is implementable in WSNs. Simulations performed in MATLAB validate the effectiveness of the proposed method and simulation results show that the proposed algorithm achieves the desired effect as well as fine convergence. It decreases data transmissions by 16.6% and 54.4% and energy consumption by 6% and 29%, respectively, compared to the two contrastive schemes.
19

Pandit, Mohammad Khalid, Roohie Naaz Mir, and Mohammad Ahsan Chishti. "Adaptive task scheduling in IoT using reinforcement learning." International Journal of Intelligent Computing and Cybernetics 13, no. 3 (June 30, 2020): 261–82. http://dx.doi.org/10.1108/ijicc-03-2020-0021.

Abstract:
Purpose: The intelligence in the Internet of Things (IoT) can be embedded by analyzing the huge volumes of data generated by it in an ultralow latency environment. The computational latency incurred by the cloud-only solution can be significantly brought down by the fog computing layer, which offers a computing infrastructure to minimize the latency in service delivery and execution. For this purpose, a task scheduling policy based on reinforcement learning (RL) is developed that can achieve the optimal resource utilization as well as minimum time to execute tasks and significantly reduce the communication costs during distributed execution. Design/methodology/approach: To realize this, the authors proposed a two-level neural network (NN)-based task scheduling system, where the first-level NN (feed-forward neural network/convolutional neural network [FFNN/CNN]) determines whether the data stream could be analyzed (executed) in the resource-constrained environment (edge/fog) or be directly forwarded to the cloud. The second-level NN (RL module) schedules all the tasks sent by the level-1 NN to the fog layer among the available fog devices. This real-time task assignment policy is used to minimize the total computational latency (makespan) as well as communication costs. Findings: Experimental results indicated that the RL technique works better than the computationally infeasible greedy approach for task scheduling, and the combination of RL and the task clustering algorithm reduces the communication costs significantly. Originality/value: The proposed algorithm fundamentally solves the problem of task scheduling in real-time fog-based IoT with best resource utilization, minimum makespan and minimum communication cost between the tasks.
20

Marchesini, Enrico, Davide Corsi, and Alessandro Farinelli. "Exploring Safer Behaviors for Deep Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7701–9. http://dx.doi.org/10.1609/aaai.v36i7.20737.

Abstract:
We consider Reinforcement Learning (RL) problems where an agent attempts to maximize a reward signal while minimizing a cost function that models unsafe behaviors. Such formalization is addressed in the literature using constrained optimization on the cost, limiting the exploration and leading to a significant trade-off between cost and reward. In contrast, we propose a Safety-Oriented Search that complements Deep RL algorithms to bias the policy toward safety within an evolutionary cost optimization. We leverage evolutionary exploration benefits to design a novel concept of safe mutations that use visited unsafe states to explore safer actions. We further characterize the behaviors of the policies over desired specifications with a sample-based bound estimation, which makes prior verification analysis tractable in the training loop, hence driving the learning process towards safer regions of the policy space. Empirical evidence on the Safety Gym benchmark shows that we successfully avoid drawbacks on the return while improving the safety of the policy.
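The "constrained optimization on the cost" that this work contrasts itself with is often realized as a Lagrangian relaxation; the following is a minimal, generic sketch of such a primal-dual update (an assumed illustration, not the method proposed in the paper):

```python
# Minimal sketch of a Lagrangian-style constrained RL update (illustrative only).
def lagrangian_step(episode_return, episode_cost, cost_limit, lam, lr_lam=0.01):
    """Fold the safety cost into a single scalar objective for the policy update,
    then adjust the Lagrange multiplier by projected dual ascent."""
    penalized_return = episode_return - lam * episode_cost       # objective seen by the policy optimizer
    lam = max(0.0, lam + lr_lam * (episode_cost - cost_limit))   # raise lam while the constraint is violated
    return penalized_return, lam
```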
21

Liang, Enming, Zicheng Su, Chilin Fang, and Renxin Zhong. "OAM: An Option-Action Reinforcement Learning Framework for Universal Multi-Intersection Control." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 4550–58. http://dx.doi.org/10.1609/aaai.v36i4.20378.

Abstract:
Efficient traffic signal control is an important means to alleviate urban traffic congestion. Reinforcement learning (RL) has shown great potential in devising optimal signal plans that can adapt to dynamic traffic congestion. However, several challenges still need to be overcome. Firstly, a paradigm of state, action, and reward design is needed, especially for an optimality-guaranteed reward function. Secondly, the generalization of the RL algorithms is hindered by the varied topologies and physical properties of intersections. Lastly, enhancing the cooperation between intersections is needed for large network applications. To address these issues, the Option-Action RL framework for universal Multi-intersection control (OAM) is proposed. Based on the well-known cell transmission model, we first define a lane-cell-level state to better model the traffic flow propagation. Based on these physical queuing dynamics, we propose a regularized delay as the reward to facilitate temporal credit assignment while maintaining the equivalence with minimizing the average travel time. We then recapitulate the phase actions as the constrained combinations of lane options and design a universal neural network structure to realize model generalization to any intersection with any phase definition. The multiple-intersection cooperation is then rigorously discussed using potential game theory. We test the OAM algorithm under four networks with different settings, including a city-level scenario with 2,048 intersections using synthetic and real-world datasets. The results show that the OAM can outperform the state-of-the-art controllers in reducing the average travel time.
22

Jeffery, Robert P., Richard J. Simpson, Hans Lambers, Daniel R. Kidd, and Megan H. Ryan. "Plants in constrained canopy micro-swards compensate for decreased root biomass and soil exploration with increased amounts of rhizosphere carboxylates." Functional Plant Biology 44, no. 5 (2017): 552. http://dx.doi.org/10.1071/fp16398.

Abstract:
Root traits related to phosphorus (P) acquisition are used to make inferences about a species’ P-foraging ability under glasshouse conditions. However, the effect on such root traits of constrained canopy spread, as occurs in dense pasture swards, is unknown. We grew micro-swards of Trifolium subterraneum L. and Ornithopus compressus L. at 15 and 60 mg kg–1 soil P in a glasshouse. Shoots either spread beyond the pot perimeter or were constrained by a cylindrical sleeve adjusted to canopy height. After 8 weeks, shoot and root dry mass (DM), shoot tissue P concentration, rhizosphere carboxylates, arbuscular mycorrhizal (AM) fungal colonisation, total and specific root length (TRL and SRL respectively), average root diameter (ARD) and average root hair length (ARHL) were measured. In all species and treatments, constrained canopy spread decreased root DM (39–59%), TRL (27–45%) and shoot DM (10–28%), and increased SRL (20–33%), but did not affect ARD, ARHL and AM fungal colonisation. However, shoot P concentration and content increased, and rhizosphere carboxylates increased 3.5 to 12-fold per unit RL and 2.0- to 6.5-fold per micro-sward. Greater amounts of rhizosphere carboxylates when canopy spread was constrained appeared to compensate for reduced root growth enabling shoot P content to be maintained.
23

Jing, Mingxuan, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Chao Yang, Bin Fang, and Huaping Liu. "Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 5109–16. http://dx.doi.org/10.1609/aaai.v34i04.5953.

Abstract:
In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require demonstrations to be perfect and sufficient, which is often unrealistic in practice. To work with imperfect demonstrations, we first define an imperfect expert setting for RLfD in a formal way, and then point out that previous methods suffer from two issues in terms of optimality and convergence, respectively. Building on the theoretical findings we have derived, we tackle these two issues by regarding the expert guidance as a soft constraint that regulates the policy exploration of the agent, which eventually leads to a constrained optimization problem. We further demonstrate that such a problem can be addressed efficiently by performing a local linear search on its dual form. Considerable empirical evaluations on a comprehensive collection of benchmarks indicate that our method attains consistent improvement over other RLfD counterparts.
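The soft-constraint view described above is commonly posed as a constrained policy optimization problem of roughly the following shape (generic notation assumed for illustration: J is the expected return, D a divergence to the expert policy π_E, and ε a tolerance; these are not the paper's symbols):

```latex
% Maximize return while staying within a divergence budget of the (imperfect) expert.
\max_{\pi} \; J(\pi)
\quad \text{s.t.} \quad
D\bigl(\pi \,\|\, \pi_{E}\bigr) \le \epsilon.
```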
24

Ullah, Zakir, Zhiwei Xu, Lei Zhang, Libo Zhang, and Waheed Ullah. "RL and ANN Based Modular Path Planning Controller for Resource-Constrained Robots in the Indoor Complex Dynamic Environment." IEEE Access 6 (2018): 74557–68. http://dx.doi.org/10.1109/access.2018.2882875.

25

Tatavarti, Hari, Prashant Doshi, and Layton Hayes. "Data-Driven Decision-Theoretic Planning using Recurrent Sum-Product-Max Networks." Proceedings of the International Conference on Automated Planning and Scheduling 31 (May 17, 2021): 606–14. http://dx.doi.org/10.1609/icaps.v31i1.16009.

Abstract:
Sum-product networks (SPN) are knowledge compilation models and are related to other graphical models for efficient probabilistic inference such as arithmetic circuits and AND/OR graphs. Recent investigations into generalizing SPNs have yielded sum-product-max networks (SPMN) which offer a data-driven alternative for decision making that has predominantly relied on handcrafted models. However, SPMNs are not suited for decision-theoretic planning which involves sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMN) that learn from and model decision-making data over time. RSPMNs utilize a template network that is unfolded as needed depending on the length of the data sequence. This is significant as RSPMNs not only inherit the benefits of SPNs in being data driven and mostly tractable, they are also well suited for planning problems. We establish soundness conditions on the template network, which guarantee that the resulting SPMN is valid, and present a structure learning algorithm to learn a sound template. RSPMNs learned on a testbed of data sets, some generated using RDDLSim, yield MEUs and policies that are close to the optimal on perfectly-observed domains and easily improve on a recent batch-constrained RL method, which is important because RSPMNs offer a new model-based approach to offline RL.
26

Zong Chen, Joy Iong, and Kong-Long Lai. "Internet of Things (IoT) Authentication and Access Control by Hybrid Deep Learning Method - A Study." Journal of Soft Computing Paradigm 2, no. 4 (December 2020): 236–45. http://dx.doi.org/10.36548/jscp.2020.4.005.

Abstract:
In the history of device computing, the Internet of Things (IoT) is one of the fastest-growing fields and faces many security challenges, so effective efforts must be made to address the security and privacy issues in IoT networks. IoT devices are essentially resource-constrained devices, which makes them routine targets for cyber attackers. The number of participating IoT nodes is increasing rapidly while remaining resource-constrained, creating ever more challenging conditions in real time. Existing methods respond ineffectively to the tasks required of IoT devices and do not cover the complete security and safety spectrum of IoT networks, because existing algorithms are not equipped to secure the IoT ecosystem in a real-time environment. Existing systems are also unable to detect a proxy impersonating an authorized person on embedded devices. Moreover, those methods rely on a single-modality domain, so their effectiveness drops in multimodal domains such as combinations of behavioral and physiological features. Embedded intelligent techniques based on deep learning (DL) can secure IoT devices and networks, and DL methods address the different security and safety problems that arise in real-time environments. This paper highlights hybrid DL techniques combined with Reinforcement Learning (RL) for better performance under attack and compares them with existing approaches. We also discuss several techniques that combine DL with RL and identify the most accurate algorithms for security solutions. Finally, we discuss future directions for decision making in DL-based IoT security systems.
27

Shanmugam, Sivagurunathan, Muthu Ganeshan V., Prathapchandran K., and Janani T. "Mitigating Black Hole Attacks in Routing Protocols Using a Machine Learning-Based Trust Model." International Journal of Sociotechnology and Knowledge Development 14, no. 1 (January 1, 2022): 1–23. http://dx.doi.org/10.4018/ijskd.310067.

Abstract:
Many application domains gain considerable advantages from the Internet of Things (IoT) network, which improves our lifestyle through smart devices. However, IoT devices are mostly resource-constrained in terms of memory, battery, etc., so they are highly vulnerable to security attacks. Traditional security mechanisms cannot be applied to these devices due to their restricted resources. A trust-based security mechanism plays an important role in ensuring security in the IoT environment because it consumes fewer resources. Thus, it is essential to evaluate the trustworthiness among IoT devices. The proposed model improves trusted routing in the IoT environment by detecting and isolating malicious nodes. This model uses reinforcement learning (RL), where the agent learns the behavior of the node and isolates the malicious nodes to improve the network performance. The model focuses on IoT with the routing protocol for low-power and lossy networks (RPL) and counters the black hole attack.
28

Jiang, Jianhua, Yangang Ren, Yang Guan, Shengbo Eben Li, Yuming Yin, Dongjie Yu, and Xiaoping Jin. "Integrated decision and control at multi-lane intersections with mixed traffic flow." Journal of Physics: Conference Series 2234, no. 1 (April 1, 2022): 012015. http://dx.doi.org/10.1088/1742-6596/2234/1/012015.

Abstract:
Autonomous driving at intersections is one of the most complicated and accident-prone traffic scenarios, especially with mixed traffic participants such as vehicles, bicycles and pedestrians. The driving policy should make safe decisions to handle the dynamic traffic conditions and meet the requirements of on-board computation. However, most current research focuses on simplified intersections, considering only the surrounding vehicles and idealized traffic lights. This paper improves the integrated decision and control framework and develops a learning-based algorithm to deal with complex intersections with mixed traffic flows, which can not only take into account realistic characteristics of traffic lights, but also learn a safe policy under different safety constraints. We first consider different velocity models for green and red lights in the training process and use a finite state machine to handle different modes of light transformation. Then we design different types of distance constraints for vehicles, traffic lights, pedestrians, and bicycles, respectively, and formulate the constrained optimal control problems (OCPs) to be optimized. Finally, reinforcement learning (RL) with value and policy networks is adopted to solve the series of OCPs. In order to verify the safety and efficiency of the proposed method, we design a multi-lane intersection with large-scale mixed traffic participants and set practical traffic light phases. The simulation results indicate that the trained decision and control policy can well balance safety and tracking performance. Compared with model predictive control (MPC), the computational time is three orders of magnitude lower.
29

Chen, Tsing-Chang, Jenq-Dar Tsay, and William J. Gutowski. "A Comparison Study of Three Polar Grids." Journal of Applied Meteorology and Climatology 47, no. 11 (November 1, 2008): 2993–3007. http://dx.doi.org/10.1175/2008jamc1746.1.

Abstract:
The circumference of a latitude circle decreases toward the Poles, making it difficult to present meteorological field variables on equally spaced grids with respect to latitude and longitude because of data aggregation. To identify the best method for displaying data at the Poles, three different grids are compared that have all been designed to reduce data aggregation: the reduced latitude–longitude (RL) grid, the National Snow and Ice Data Center Equal-Area Special Sensor Microwave Imager (SSM/I) Earth (EA) grid, and the National Meteorological Center octagonal (OG) grid. The merits and disadvantages of these grids are compared in terms of depictions of the Arctic summer circulation with wind vectors, streamfunction, and velocity potential at 400 hPa where maximum westerlies are located. Using geostrophy, the 400-hPa streamfunction at high latitudes can be formed from geopotential height. In comparison with this geostrophic streamfunction, the streamfunction generated from vorticity on the OG grid shows a negligible error (∼0.5%). The error becomes larger using vorticity on the EA (∼15%) and RL (∼30%) grids. During the northern summer, the Arctic circulation at 400 hPa is characterized by three troughs. The streamfunction and velocity potential of these three troughs are spatially in quadrature with divergent (convergent) centers located ahead of (behind) these troughs. These circulation features are best depicted by the streamfunction and velocity potential generated on the OG grid. It is demonstrated by these findings that the National Meteorological Center octagonal grid is the most ideal among the three grids used for the polar regions. However, this assessment is constrained by the hemispheric perspective of meteorological field variables, because these variables depicted on the octagonal grid at higher latitudes need to be merged with those on the equal-latitude-longitude grid at lower latitudes.
30

Ma, Yecheng Jason, Andrew Shen, Osbert Bastani, and Dinesh Jayaraman. "Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 5 (June 28, 2022): 5404–12. http://dx.doi.org/10.1609/aaai.v36i5.20478.

Abstract:
Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions that are predicted to satisfy all constraints. We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and the cost objectives. First, CAP inflates predicted costs using an uncertainty-based penalty. Theoretically, we show that policies that satisfy this conservative cost constraint are guaranteed to also be feasible in the true environment. We further show that this guarantees the safety of all intermediate solutions during RL training. Further, CAP adaptively tunes this penalty during training using true cost feedback from the environment. We evaluate this conservative and adaptive penalty-based approach for model-based safe RL extensively on state and image-based environments. Our results demonstrate substantial gains in sample-efficiency while incurring fewer violations than prior safe RL algorithms. Code is available at: https://github.com/Redrew/CAP
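Schematically, the uncertainty-based cost inflation described above amounts to planning with a conservative cost of the following form (a simplified illustration; the penalty weight κ and uncertainty estimate u are generic placeholders rather than the paper's exact quantities):

```latex
% Inflate the model's predicted cost by a scaled uncertainty estimate, so that
% feasibility under \tilde{c} is a conservative proxy for feasibility in the true environment.
\tilde{c}(s, a) = \hat{c}(s, a) + \kappa\, u(s, a), \qquad \kappa \ge 0.
```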
31

Krishnan, Srivatsan, Behzad Boroujerdian, William Fu, Aleksandra Faust, and Vijay Janapa Reddi. "Air Learning: a deep reinforcement learning gym for autonomous aerial robot visual navigation." Machine Learning 110, no. 9 (July 7, 2021): 2501–40. http://dx.doi.org/10.1007/s10994-021-06006-6.

Abstract:
We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies’ performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute’s choice affects the aerial robot’s performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.
32

Kaymak, Çağrı, Ayşegül Uçar, and Cüneyt Güzeliş. "Development of a New Robust Stable Walking Algorithm for a Humanoid Robot Using Deep Reinforcement Learning with Multi-Sensor Data Fusion." Electronics 12, no. 3 (January 22, 2023): 568. http://dx.doi.org/10.3390/electronics12030568.

Abstract:
The difficult task of creating reliable mobility for humanoid robots has been studied for decades. Even though several different walking strategies have been put forth and walking performance has substantially increased, stability still needs to catch up to expectations. Applications for Reinforcement Learning (RL) techniques are constrained by low convergence and ineffective training. This paper develops a new robust and efficient framework based on the Robotis-OP2 humanoid robot combined with a typical trajectory-generating controller and Deep Reinforcement Learning (DRL) to overcome these limitations. This framework consists of optimizing the walking trajectory parameters and posture balancing system. Multi-sensors of the robot are used for parameter optimization. Walking parameters are optimized using the Dueling Double Deep Q Network (D3QN), one of the DRL algorithms, in the Webots simulator. The hip strategy is adopted for the posture balancing system. Experimental studies are carried out in both simulation and real environments with the proposed framework and Robotis-OP2’s walking algorithm. Experimental results show that the robot performs more stable walking with the proposed framework than Robotis-OP2’s walking algorithm. It is thought that the proposed framework will be beneficial for researchers studying in the field of humanoid robot locomotion.
33

Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. "Constraints Penalized Q-learning for Safe Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8753–60. http://dx.doi.org/10.1609/aaai.v36i8.20855.

Abstract:
We study the problem of safe offline reinforcement learning (RL), in which the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real-world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potentially large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.
34

Mu, Tong, Georgios Theocharous, David Arbour, and Emma Brunskill. "Constraint Sampling Reinforcement Learning: Incorporating Expertise for Faster Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7841–49. http://dx.doi.org/10.1609/aaai.v36i7.20753.

Abstract:
Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance. To address this, we introduce a practical algorithm for incorporating human insight to speed learning. Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy. It takes in multiple potential policy constraints to maintain robustness to misspecification of individual constraints while leveraging helpful ones to learn quickly. Given a base RL algorithm (e.g., UCRL, DQN, Rainbow), we propose an upper confidence with elimination scheme that leverages the relationship between the constraints and their observed performance to adaptively switch among them. We instantiate our algorithm with DQN-type algorithms and UCRL as base algorithms, and evaluate our algorithm in four environments, including three simulators based on real data: recommendations, educational activity sequencing, and HIV treatment sequencing. In all cases, CSRL learns a good policy faster than baselines.
35

Shan, Nanliang, Zecong Ye, and Xiaolong Cui. "Collaborative Intelligence: Accelerating Deep Neural Network Inference via Device-Edge Synergy." Security and Communication Networks 2020 (September 7, 2020): 1–10. http://dx.doi.org/10.1155/2020/8831341.

Abstract:
With the development of mobile edge computing (MEC), more and more intelligent services and applications based on deep neural networks are deployed on mobile devices to meet the diverse and personalized needs of users. Unfortunately, deploying and running inference with deep learning models on resource-constrained devices is challenging. The traditional cloud-based method usually runs the deep learning model on a cloud server; since a large amount of input data needs to be transmitted to the server through the WAN, it incurs a large service latency, which is unacceptable for most current latency-sensitive and computation-intensive applications. In this paper, we propose Cogent, an execution framework that accelerates deep neural network inference through device-edge synergy. The Cogent framework is divided into two operation stages: the automatic pruning and partition stage and the containerized deployment stage. Cogent uses reinforcement learning (RL) to automatically predict pruning and partition strategies based on feedback from the hardware configuration and system conditions, so that the pruned and partitioned model can better adapt to the system environment and user hardware configuration. The model is then deployed in containers on the device and the edge server to accelerate inference. Experiments show that the learning-based, hardware-aware automatic pruning and partition scheme can significantly reduce service latency and accelerate the overall model inference process while maintaining accuracy, achieving speedups of up to 8.89× with an accuracy loss of no more than 7%.
APA, Harvard, Vancouver, ISO, and other styles
36

Spieker, Helge. "Towards Sequence-to-Sequence Reinforcement Learning for Constraint Solving with Constraint-Based Local Search." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 10037–38. http://dx.doi.org/10.1609/aaai.v33i01.330110037.

Full text
Abstract:
This paper proposes a framework for solving constraint problems with reinforcement learning (RL) and sequence-to-sequence recurrent neural networks. We approach constraint solving as a declarative machine learning problem in which, for a variable-length input sequence, a variable-length output sequence has to be predicted. Using randomly generated instances and the number of constraint violations as a reward function, a problem-specific RL agent is trained to solve the problem. The solution candidate predicted by the RL agent is verified and repaired by constraint-based local search (CBLS) to ensure solutions that satisfy the constraint model. We introduce the framework and its components and discuss early results and future applications.
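
A minimal sketch of the reward signal described here, the negative count of violated constraints, assuming each constraint is represented as a predicate over a candidate assignment (an illustrative encoding, not the paper's).

```python
# Reward = negative number of violated constraints for a predicted assignment.
def reward(assignment, constraints):
    return -sum(1 for c in constraints if not c(assignment))

# Illustrative constraint model: all-different plus a simple ordering constraint.
constraints = [
    lambda a: len(set(a)) == len(a),   # all values distinct
    lambda a: a[0] < a[-1],            # first value below last value
]
print(reward([1, 2, 3], constraints))  # 0  -> candidate satisfies the model
print(reward([2, 2, 1], constraints))  # -2 -> two violations, to be repaired by CBLS
```
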
APA, Harvard, Vancouver, ISO, and other styles
37

Le, Nhat, A. B. Siddique, Fuad Jamour, Samet Oymak, and Vagelis Hristidis. "Generating Predictable and Adaptive Dialog Policies in Single- and Multi-domain Goal-oriented Dialog Systems." International Journal of Semantic Computing 15, no. 04 (December 2021): 419–39. http://dx.doi.org/10.1142/s1793351x21400109.

Full text
Abstract:
Most existing commercial goal-oriented chatbots are diagram-based; i.e., they follow a rigid dialog flow to fill the slot values needed to achieve a user’s goal. Diagram-based chatbots are predictable, which explains their adoption in commercial settings; however, their lack of flexibility may cause many users to leave the conversation before achieving their goal. On the other hand, state-of-the-art research chatbots use Reinforcement Learning (RL) to generate flexible dialog policies. However, such chatbots can be unpredictable, may violate the intended business constraints, and require large training datasets to produce a mature policy. We propose a framework that achieves a middle ground between diagram-based and RL-based chatbots: we constrain the space of possible chatbot responses using a novel structure, the chatbot dependency graph, and use RL to dynamically select the best valid responses. Dependency graphs are directed graphs that conveniently express a chatbot’s logic by defining the dependencies among slots: all valid dialog flows are encapsulated in one dependency graph. Our experiments in both single-domain and multi-domain settings show that our framework quickly adapts to user characteristics and achieves up to a 23.77% improvement in success rate compared to a state-of-the-art RL model.
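
The role of the dependency graph can be illustrated with a small sketch that masks out responses whose prerequisite slots are not yet filled; the slot names, the graph, and the value-function interface below are hypothetical placeholders, not the paper's framework.

```python
# Hypothetical slot dependency graph: a slot may only be requested once the
# slots it depends on have been filled.
dependencies = {
    "name": [],
    "address": ["name"],
    "payment": ["address"],
}

def valid_slots(filled):
    """Slots the chatbot is allowed to ask about next under the graph."""
    return [s for s, prereqs in dependencies.items()
            if s not in filled and all(p in filled for p in prereqs)]

def select_response(q_values, filled):
    """RL picks the highest-valued action, but only among graph-valid ones."""
    candidates = valid_slots(filled)
    return max(candidates, key=lambda s: q_values.get(s, 0.0)) if candidates else None

print(select_response({"payment": 2.0, "address": 1.0}, filled={"name"}))  # 'address'
```
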
APA, Harvard, Vancouver, ISO, and other styles
38

Liu, Yongshuai, Jiaxin Ding, and Xin Liu. "IPO: Interior-Point Policy Optimization under Constraints." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 4940–47. http://dx.doi.org/10.1609/aaai.v34i04.5932.

Full text
Abstract:
In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms in terms of reward maximization and constraint satisfaction.
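
As a rough illustration of the log-barrier augmentation, the sketch below adds a barrier term for each cumulative-cost constraint to a generic policy-gradient surrogate; the clamping, the barrier coefficient t, and the estimator interfaces are assumptions rather than the paper's exact formulation.

```python
import torch

def barrier_augmented_loss(surrogate_objective, cost_estimates, limits, t=20.0):
    """surrogate_objective: scalar tensor to maximize (e.g. a clipped PPO surrogate).
    cost_estimates: list of scalar tensors, estimated discounted cumulative costs.
    limits: list of floats, thresholds d_i for the constraints cost_i <= d_i."""
    loss = -surrogate_objective
    for cost, d in zip(cost_estimates, limits):
        slack = torch.clamp(d - cost, min=1e-8)
        # log(d - cost)/t tends to -inf near the constraint boundary, keeping
        # policy iterates inside the feasible region (interior-point behaviour).
        loss = loss - torch.log(slack) / t
    return loss
```
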
APA, Harvard, Vancouver, ISO, and other styles
39

Nurkasanah, Ika. "Reinforcement Learning Approach for Efficient Inventory Policy in Multi-Echelon Supply Chain Under Various Assumptions and Constraints." Journal of Information Systems Engineering and Business Intelligence 7, no. 2 (October 28, 2021): 138. http://dx.doi.org/10.20473/jisebi.7.2.138-148.

Full text
Abstract:
Background: Inventory policy highly influences the Supply Chain Management (SCM) process. Evidence suggests that almost half of SCM costs are set off by stock-related expenses. Objective: This paper aims to minimise total inventory cost in SCM by applying a multi-agent-based machine learning method called Reinforcement Learning (RL). Methods: The ability of RL to find a hidden pattern of inventory policy is tested under various constraints which have not been addressed together or simultaneously in previous research. These include a capacitated manufacturer and warehouse, limits on orders to suppliers, stochastic demand, lead time uncertainty, and multi-sourcing supply. RL was run through Q-Learning with four experiments and 1,000 iterations to examine the consistency of its results. Then, RL was contrasted with the previous mathematical method to check its efficiency in reducing inventory costs. Results: After 1,000 trial-and-error simulations, the most striking finding is that RL can perform more efficiently than the mathematical approach by placing optimum order quantities at the right time. In addition, this result was achieved under complex constraints and assumptions which have not been simultaneously simulated in previous studies. Conclusion: Results confirm that the RL approach will be invaluable when implemented in supply network environments comparable to those expressed in this project. Since RL still leads to higher shortages in this research, combining RL with other machine learning algorithms is suggested to obtain a more robust end-to-end SCM analysis. Keywords: Inventory Policy, Multi-Echelon, Reinforcement Learning, Supply Chain Management, Q-Learning
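
For intuition, a toy single-echelon Q-Learning loop is sketched below; the capacities, cost coefficients, and demand distribution are made-up placeholders, and the paper's multi-echelon, multi-sourcing setting is far richer than this.

```python
import random
import numpy as np

max_inv, max_order = 20, 10                   # capacitated warehouse, order limit
holding_cost, shortage_cost = 1.0, 5.0
gamma, alpha, epsilon = 0.95, 0.1, 0.1
Q = np.zeros((max_inv + 1, max_order + 1))    # state = inventory level, action = order qty

def step(inv, order):
    demand = random.randint(0, 8)             # stochastic demand (assumed range)
    on_hand = min(inv + order, max_inv)
    sold = min(on_hand, demand)
    next_inv = on_hand - sold
    cost = holding_cost * next_inv + shortage_cost * (demand - sold)
    return next_inv, -cost                    # reward = negative inventory cost

inv = 0
for _ in range(20000):
    order = random.randint(0, max_order) if random.random() < epsilon else int(Q[inv].argmax())
    next_inv, reward = step(inv, order)
    Q[inv, order] += alpha * (reward + gamma * Q[next_inv].max() - Q[inv, order])
    inv = next_inv
```
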
APA, Harvard, Vancouver, ISO, and other styles
40

Zeng, Weixin, Xiang Zhao, Jiuyang Tang, Xuemin Lin, and Paul Groth. "Reinforcement Learning–based Collective Entity Alignment with Adaptive Features." ACM Transactions on Information Systems 39, no. 3 (May 6, 2021): 1–31. http://dx.doi.org/10.1145/3446428.

Full text
Abstract:
Entity alignment (EA) is the task of identifying the entities that refer to the same real-world object but are located in different knowledge graphs (KGs). For entities to be aligned, existing EA solutions treat them separately and generate alignment results as ranked lists of entities on the other side. Nevertheless, this decision-making paradigm fails to take into account the interdependence among entities. Although some recent efforts mitigate this issue by imposing the 1-to-1 constraint on the alignment process, they still cannot adequately model the underlying interdependence and the results tend to be sub-optimal. To fill in this gap, in this work, we delve into the dynamics of the decision-making process, and offer a reinforcement learning (RL)–based model to align entities collectively. Under the RL framework, we devise the coherence and exclusiveness constraints to characterize the interdependence and restrict collective alignment. Additionally, to generate more precise inputs to the RL framework, we employ representative features to capture different aspects of the similarity between entities in heterogeneous KGs, which are integrated by an adaptive feature fusion strategy. Our proposal is evaluated on both cross-lingual and mono-lingual EA benchmarks and compared against state-of-the-art solutions. The empirical results verify its effectiveness and superiority.
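
To give a feel for the exclusiveness constraint in the sequential decision process, here is a deliberately simplified greedy sketch; the RL policy, the coherence constraint, and the adaptive feature fusion of the paper are not modelled, and the similarity scores and entity names are placeholders.

```python
def align_collectively(similarity):
    """similarity: {source_entity: {target_entity: score}} from fused features."""
    aligned, used_targets = {}, set()
    for src, candidates in similarity.items():        # sequential decisions
        # Exclusiveness: targets already matched are removed from consideration.
        options = {t: s for t, s in candidates.items() if t not in used_targets}
        if options:
            best = max(options, key=options.get)
            aligned[src] = best
            used_targets.add(best)
    return aligned

print(align_collectively({
    "Berlin@KG1": {"Berlin@KG2": 0.9, "Bern@KG2": 0.6},
    "Bern@KG1":   {"Berlin@KG2": 0.7, "Bern@KG2": 0.65},
}))   # Bern@KG1 cannot reuse Berlin@KG2, so it falls back to Bern@KG2
```
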
APA, Harvard, Vancouver, ISO, and other styles
41

Trella, Anna L., Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, and Susan A. Murphy. "Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines." Algorithms 15, no. 8 (July 22, 2022): 255. http://dx.doi.org/10.3390/a15080255.

Full text
Abstract:
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (predictability, computability, stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning, to the design of RL algorithms for the digital interventions setting. Furthermore, we provide guidelines on how to design simulation environments, a crucial tool for evaluating candidate RL algorithms using the PCS framework. We show how we used the PCS framework to design an RL algorithm for Oralytics, a mobile health study aiming to improve users’ tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.
APA, Harvard, Vancouver, ISO, and other styles
42

Koo, Seolwon, and Yujin Lim. "A Cluster-Based Optimal Computation Offloading Decision Mechanism Using RL in the IIoT Field." Applied Sciences 12, no. 1 (December 31, 2021): 384. http://dx.doi.org/10.3390/app12010384.

Full text
Abstract:
In the Industrial Internet of Things (IIoT), various tasks are created dynamically because of small-quantity batch production, so it is difficult to execute tasks only with devices that have limited battery lives and computation capabilities. To solve this problem, we adopt the mobile edge computing (MEC) paradigm. However, when numerous tasks must be processed on the MEC server (MECS), the server may not be able to handle all of them within the delay constraint owing to its limited computational capability and high network overhead. Therefore, among cooperative computing techniques, we focus on task offloading to nearby devices using device-to-device (D2D) communication and propose a method that determines the optimal offloading strategy in an MEC environment with D2D communication. We aim to minimize the energy consumption of the devices and the task execution delay under certain delay constraints. To solve this problem, we adopt a Q-learning algorithm, a form of reinforcement learning (RL). However, if a single learning agent determines whether to offload tasks for all devices, its computational complexity increases tremendously. Thus, we cluster the nearby devices that comprise the job shop, and each cluster’s head determines the optimal offloading strategy for the tasks that occur within its cluster. Simulation results show that the proposed algorithm outperforms the compared methods in terms of device energy consumption, task completion rate, task blocking rate, and throughput.
APA, Harvard, Vancouver, ISO, and other styles
43

Liang, Di, Rui Yin, and Rong Peng Liu. "Project Management Plan's Research and Application under Resource Constraints." Applied Mechanics and Materials 687-691 (November 2014): 4790–93. http://dx.doi.org/10.4028/www.scientific.net/amm.687-691.4790.

Full text
Abstract:
To address the problems of traditional RCS, an improved RL-RCS planning model is proposed, and a CPM method for calculating timing allowance that takes resource tightness, job complexity, and risk level into consideration is shown to be feasible.
APA, Harvard, Vancouver, ISO, and other styles
44

Lobbezoo, Andrew, and Hyock-Ju Kwon. "Simulated and Real Robotic Reach, Grasp, and Pick-and-Place Using Combined Reinforcement Learning and Traditional Controls." Robotics 12, no. 1 (January 16, 2023): 12. http://dx.doi.org/10.3390/robotics12010012.

Full text
Abstract:
The majority of robots in factories today are operated with conventional control strategies that require individual programming on a task-by-task basis, with no margin for error. As an alternative to the rudimentary operation planning and task-programming techniques, machine learning has shown significant promise for higher-level task planning, with the development of reinforcement learning (RL)-based control strategies. This paper reviews the implementation of combined traditional and RL control for simulated and real environments to validate the RL approach for standard industrial tasks such as reach, grasp, and pick-and-place. The goal of this research is to bring intelligence to robotic control so that robotic operations can be completed without precisely defining the environment, constraints, and the action plan. The results from this approach provide optimistic preliminary data on the application of RL to real-world robotics.
APA, Harvard, Vancouver, ISO, and other styles
45

Xu, Feng, Shengyi Jiang, Hao Yin, Zongzhang Zhang, Yang Yu, Ming Li, Dong Li, and Wulong Liu. "Enhancing Context-Based Meta-Reinforcement Learning Algorithms via An Efficient Task Encoder (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15937–38. http://dx.doi.org/10.1609/aaai.v35i18.17965.

Full text
Abstract:
Meta-Reinforcement Learning (meta-RL) algorithms enable agents to adapt to new tasks from small amounts of exploration, based on experience with similar tasks. Recent studies have pointed out that a good representation of a task is key to the success of off-policy context-based meta-RL. Inspired by contrastive methods in unsupervised representation learning, we propose a new method to learn the task representation based on the mutual information between the transition tuples in a trajectory and the task embedding. We also propose a new estimate of task similarity based on the Q-function, which can be used to constrain the distribution of the encoded task variables so that the task encoder produces task variables that are more effective on new tasks. Experiments on meta-RL tasks show that the newly proposed method outperforms existing meta-RL algorithms.
APA, Harvard, Vancouver, ISO, and other styles
46

Xu, Yuan, Xinyu Fu, Thomas D. Sharkey, Yair Shachar-Hill, and and Berkley J. Walker. "The metabolic origins of non-photorespiratory CO2 release during photosynthesis: a metabolic flux analysis." Plant Physiology 186, no. 1 (February 16, 2021): 297–314. http://dx.doi.org/10.1093/plphys/kiab076.

Full text
Abstract:
Respiration in the light (RL) releases CO2 in photosynthesizing leaves and is a phenomenon that occurs independently of photorespiration. Since RL lowers net carbon fixation, understanding RL could help improve plant carbon-use efficiency and models of crop photosynthesis. Although RL was identified more than 75 years ago, its biochemical mechanisms remain unclear. To identify reactions contributing to RL, we mapped metabolic fluxes in photosynthesizing source leaves of the oilseed crop and model plant camelina (Camelina sativa). We performed a flux analysis using isotopic labeling patterns of central metabolites during a 13CO2 labeling time course, gas exchange, and carbohydrate production rate experiments. To quantify the contributions of multiple potential CO2 sources with statistical and biological confidence, we increased the number of metabolites measured and reduced biological and technical heterogeneity by using single mature source leaves and quickly quenching metabolism by directly injecting liquid N2; we then compared the goodness-of-fit between these data and data from models with alternative metabolic network structures and constraints. Our analysis predicted that RL releases 5.2 μmol CO2 g−1 FW h−1, which is relatively consistent with the value of 9.3 μmol CO2 g−1 FW h−1 measured by CO2 gas exchange. The results indicated that ≤10% of RL results from TCA cycle reactions, which are widely considered to dominate RL. Further analysis indicated that oxidation of glucose-6-phosphate to pentose phosphate via 6-phosphogluconate (the G6P/OPP shunt) can account for >93% of the CO2 released by RL.
APA, Harvard, Vancouver, ISO, and other styles
47

Ibarz, Julian, Jie Tan, Chelsea Finn, Mrinal Kalakrishnan, Peter Pastor, and Sergey Levine. "How to train your robot with deep reinforcement learning: lessons we have learned." International Journal of Robotics Research 40, no. 4-5 (January 31, 2021): 698–721. http://dx.doi.org/10.1177/0278364920987859.

Full text
Abstract:
Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low-level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, which does not connect with the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time, real-world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn: as an embodied agent in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building off of these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource both for roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.
APA, Harvard, Vancouver, ISO, and other styles
48

Xu, Shihao, Yingzi Guan, Changzhu Wei, Yulong Li, and Lei Xu. "Reinforcement-Learning-Based Tracking Control with Fixed-Time Prescribed Performance for Reusable Launch Vehicle under Input Constraints." Applied Sciences 12, no. 15 (July 24, 2022): 7436. http://dx.doi.org/10.3390/app12157436.

Full text
Abstract:
This paper proposes a novel reinforcement learning (RL)-based tracking control scheme with fixed-time prescribed performance for a reusable launch vehicle subject to parametric uncertainties, external disturbances, and input constraints. First, a fixed-time prescribed performance function is employed to restrain attitude tracking errors, and an equivalent unconstrained system is derived via an error transformation technique. Then, a hyperbolic tangent function is incorporated into the optimal performance index of the unconstrained system to tackle the input constraints. Subsequently, an actor-critic RL framework with super-twisting-like sliding mode control is constructed to establish a practical solution for the optimal control problem. Benefiting from the proposed scheme, the robustness of the RL-based controller against unknown dynamics is enhanced, and the control performance can be qualitatively prearranged by users. Theoretical analysis shows that the attitude tracking errors converge to a preset region within a preassigned fixed time, and the weight estimation errors of the actor-critic networks are uniformly ultimately bounded. Finally, comparative numerical simulation results are provided to illustrate the effectiveness and improved performance of the proposed control scheme.
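
The general technique of handling actuator limits with a hyperbolic tangent can be illustrated in a few lines; the saturation bound and the raw commands below are placeholders, not the paper's controller.

```python
import numpy as np

def saturate(raw_command, u_max):
    """Squash an unbounded command into the admissible range (-u_max, u_max)."""
    return u_max * np.tanh(raw_command / u_max)

print(saturate(np.array([0.5, 50.0, -300.0]), u_max=10.0))
# Small commands pass almost unchanged; large ones approach +/-10 smoothly.
```
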
APA, Harvard, Vancouver, ISO, and other styles
49

Langlois, Eric D., and Tom Everitt. "How RL Agents Behave When Their Actions Are Modified." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (May 18, 2021): 11586–94. http://dx.doi.org/10.1609/aaai.v35i13.17378.

Full text
Abstract:
Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result of supervisor intervention, the executed action may differ from the action specified by the policy. How does this affect learning? We present the Modified-Action Markov Decision Process, an extension of the MDP model that allows actions to differ from the policy. We analyze the asymptotic behaviours of common reinforcement learning algorithms in this setting and show that they adapt in different ways: some completely ignore modifications while others go to various lengths in trying to avoid action modifications that decrease reward. By choosing the right algorithm, developers can prevent their agents from learning to circumvent interruptions or constraints, and better control agent responses to other kinds of action modification, like self-damage.
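
A minimal gym-style wrapper makes the setting concrete: the executed action may differ from the action the policy chose. The supervisor rule here (replacing a listed dangerous action with a fallback) is an illustrative assumption, not part of the paper.

```python
class ActionModifiedEnv:
    """Wrap an environment so a supervisor may override the policy's action."""

    def __init__(self, env, dangerous_actions, safe_fallback):
        self.env = env
        self.dangerous = set(dangerous_actions)
        self.fallback = safe_fallback

    def step(self, policy_action):
        # The executed action can differ from the action the policy specified.
        executed = self.fallback if policy_action in self.dangerous else policy_action
        obs, reward, done, info = self.env.step(executed)
        info["executed_action"] = executed
        return obs, reward, done, info
```
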
APA, Harvard, Vancouver, ISO, and other styles
50

Zeng, Junjie, Long Qin, Yue Hu, Cong Hu, and Quanjun Yin. "Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder." Applied Sciences 9, no. 2 (January 17, 2019): 323. http://dx.doi.org/10.3390/app9020323.

Full text
Abstract:
In this paper, we present a hierarchical path planning framework called SG–RL (subgoal graphs–reinforcement learning) to plan rational paths for agents maneuvering in continuous and uncertain environments. By “rational”, we mean paths that are (1) planned efficiently, eliminating first-move lags, and (2) collision-free and smooth while satisfying the agents’ kinematic constraints. SG–RL works in a two-level manner. At the first level, SG–RL uses a geometric path-planning method, i.e., simple subgoal graphs (SSGs), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG–RL uses an RL method, i.e., least-squares policy iteration (LSPI), to learn near-optimal motion-planning policies that can generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSGs overcome the sparse-reward and local-minima limitations faced by RL agents, so LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (i.e., unexpected obstacles appear), SG–RL does not need to reconstruct subgoal graphs and replan subgoal sequences using SSGs, since LSPI can deal with uncertainties by exploiting its generalization ability to handle changes in environments. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG–RL works well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and it can deal with small changes in environments. We further demonstrate that the design of reward functions and the types of training environments are important factors for learning feasible policies.
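
The two-level structure can be summarized in a short sketch: a geometric planner proposes subgoals, and a learned low-level policy steers between consecutive subgoals. Here plan_subgoals, low_level_policy, and the environment interface are hypothetical stand-ins for SSG search and an LSPI-trained controller.

```python
def follow_path(start, goal, plan_subgoals, low_level_policy, env, max_steps=500):
    """Level 1: abstract subgoal sequence. Level 2: learned motion between subgoals."""
    state = start
    for subgoal in plan_subgoals(start, goal):           # level 1: geometric planning
        for _ in range(max_steps):                       # level 2: learned control
            action = low_level_policy(state, subgoal)    # kinematically feasible action
            state, reached = env.step(state, action, subgoal)
            if reached:                                   # move on to the next subgoal
                break
    return state
```
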
APA, Harvard, Vancouver, ISO, and other styles
