Research Objective
To propose reinforcement learning (RL)-based computation offloading schemes for IoT devices with energy harvesting (EH), enabling the devices to optimize their offloading policies without prior knowledge of the mobile edge computing (MEC) model, the energy consumption model, or the computation latency model.
Research Results
The proposed RL-based offloading schemes enable IoT devices to achieve optimal offloading policies without prior model knowledge, reducing energy consumption, computation latency, and task drop rate while increasing utility. The deep RL scheme accelerates learning and outperforms benchmarks, with theoretical bounds validated through simulations. Future work includes experimental implementation.
Research Limitations
The assumptions include known EH models and Markov chain-based radio transmission rates, which may not fully capture real-world dynamics. The computational complexity of deep RL may be high for resource-constrained IoT devices, and the schemes rely on sufficient exploration time for convergence.
1: Experimental Design and Method Selection:
The offloading process is formulated as a Markov decision process (MDP) and solved with reinforcement learning (RL) and deep RL, specifically Q-learning and a deep convolutional neural network (CNN). Algorithms 1 (RLO) and 2 (DRLO) select the offloading policy from the observed system state, which includes the battery level, the radio transmission rates, and the predicted harvested energy.
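The tabular scheme can be understood through its core Q-learning update. The sketch below is a minimal illustration, assuming a discretized state of (battery level, radio transmission rate, predicted harvested energy) and a discrete set of offloading rates as actions; the discretization levels, action set, and hyperparameters are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

N_BATTERY, N_RATE, N_ENERGY = 10, 4, 5   # assumed discretization of the state
N_ACTIONS = 11                           # offloading rates 0%, 10%, ..., 100% (assumed)
ALPHA, GAMMA = 0.1, 0.9                  # learning rate and discount factor (assumed)

# Q-table indexed by (battery level, radio rate, predicted harvested energy, action)
Q = np.zeros((N_BATTERY, N_RATE, N_ENERGY, N_ACTIONS))

def update_q(state, action, utility, next_state):
    """One Q-learning step: move Q(s, a) toward the target u + gamma * max_a' Q(s', a')."""
    target = utility + GAMMA * Q[next_state].max()
    Q[state + (action,)] += ALPHA * (target - Q[state + (action,)])
```

The deep RL variant replaces the table with a CNN that maps recent state-action history to Q-value estimates, which is what accelerates learning for larger state spaces.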
2: Sample Selection and Data Sources:
Simulations are performed for an IoT device with RF energy harvesting that generates computation tasks at 120 kb/s; time-varying parameters such as the radio transmission rates and the distance are modeled as Markov chains.
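As an illustration of the Markov-chain channel model, the sketch below draws the next radio transmission rate state from a transition matrix; the rate levels and transition probabilities are made-up placeholders, not the simulation parameters used in the study.

```python
import numpy as np

RATE_LEVELS_MBPS = [0.5, 1.0, 2.0, 4.0]   # assumed channel-rate states
P = np.array([[0.6, 0.3, 0.1, 0.0],       # row-stochastic transition matrix
              [0.2, 0.5, 0.2, 0.1],       # (illustrative values only)
              [0.1, 0.2, 0.5, 0.2],
              [0.0, 0.1, 0.3, 0.6]])

rng = np.random.default_rng(0)

def next_rate_state(current_state):
    """Sample the index of the next rate state given the current one."""
    return int(rng.choice(len(RATE_LEVELS_MBPS), p=P[current_state]))
```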
3: List of Experimental Equipment and Materials:
IoT device with EH module, edge devices, RF energy transmitter, and computational resources (e.g., Qualcomm Snapdragon 800 and Nvidia Tegra K1 platforms for deep learning implementation).
4: Experimental Procedures and Operational Workflow:
Time is slotted; in each slot, the IoT device estimates the harvested energy, observes the system state, selects an offloading policy with the ε-greedy method, offloads the tasks accordingly, evaluates the resulting utility, and updates the Q-function or CNN weights.
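A hedged sketch of this per-slot loop (tabular variant) follows. The `step` callable stands in for the device/simulator interface that performs the offloading and returns the observed utility and next state; it and the hyperparameter values are placeholders, not the paper's Algorithm 1 or 2.

```python
import numpy as np

EPSILON, ALPHA, GAMMA = 0.1, 0.1, 0.9   # assumed exploration/learning parameters
rng = np.random.default_rng(0)

def epsilon_greedy(q_row):
    """Explore with probability EPSILON; otherwise pick the greedy action."""
    if rng.random() < EPSILON:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def run_slot(Q, state, step):
    """One time slot: select an offloading action, act, observe utility, learn.

    `state` is a tuple of discretized indices (as in the earlier sketch) and
    `step(state, action)` is a placeholder returning (utility, next_state).
    """
    action = epsilon_greedy(Q[state])
    utility, next_state = step(state, action)
    target = utility + GAMMA * Q[next_state].max()
    Q[state + (action,)] += ALPHA * (target - Q[state + (action,)])
    return next_state
```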
5: Data Analysis Methods:
Performance metrics include energy consumption, computation latency, task drop rate, and utility. Simulations compare the proposed schemes with benchmark schemes (e.g., DRL-based, Q-learning-based, and non-offloading policies) over time slots, and the derived theoretical performance bounds are validated against the simulation results.
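To make the utility metric concrete, the sketch below combines the listed metrics into one scalar reward per slot; the linear form and the weights are assumptions for illustration, not the exact utility function defined in the paper.

```python
def slot_utility(bits_computed, energy_consumed, latency, task_dropped,
                 w_bits=1.0, w_energy=0.5, w_latency=0.5, w_drop=2.0):
    """Reward completed computation; penalize energy use, latency, and task drops.

    The weights are illustrative assumptions, not values from the paper.
    """
    return (w_bits * bits_computed
            - w_energy * energy_consumed
            - w_latency * latency
            - w_drop * float(task_dropped))
```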