Abstract: This study presents a reinforcement-learning-based method for optimizing the actuator configuration of multi-segment earthworm-like robots. First, a dynamic model of the multi-segment robot is established, and the actuator arrangement problem is formulated as a Markov decision process. A multi-discrete action space is designed to reduce the computational cost of the search. A reward function that combines locomotion speed with energy-consumption constraints is proposed to balance exploration and exploitation. When the number of actuators is limited, an action masking mechanism enables efficient policy search under hard constraints. Key findings include: (1) midline-symmetric actuation yields optimal performance under full-drive conditions; (2) a “posterior-priority, centripetal-clustering” distribution pattern emerges under constrained actuation.
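To make the action masking idea concrete, the following is a minimal sketch of how a hard actuator budget could be enforced when sampling from a multi-discrete action space. It is not the implementation used in this study: the function name `masked_sample`, the two-way (passive/actuated) choice per segment, the sequential sampling order, and the "at most `budget` actuators" reading of the constraint are all assumptions made for illustration.

```python
import math
import random


def masked_sample(logits, budget, rng):
    """Sample one on/off choice per segment without exceeding `budget`.

    logits: list of (passive_logit, active_logit) pairs, one per segment
            (hypothetical policy outputs, assumed finite).
    budget: maximum number of segments that may be actuated (assumed
            interpretation of the hard constraint).
    rng:    a random.Random instance.
    """
    action = []
    remaining = budget
    for passive_logit, active_logit in logits:
        if remaining == 0:
            # Mask the "actuated" choice: its probability becomes exactly 0.
            active_logit = -math.inf
        # Numerically stable two-way softmax.
        m = max(passive_logit, active_logit)
        w_passive = math.exp(passive_logit - m)
        w_active = math.exp(active_logit - m)
        p_active = w_active / (w_passive + w_active)
        choice = 1 if rng.random() < p_active else 0
        action.append(choice)
        remaining -= choice
    return action


if __name__ == "__main__":
    rng = random.Random(0)
    uniform_logits = [(0.0, 0.0)] * 8  # 8 segments, no learned preference
    print(masked_sample(uniform_logits, budget=3, rng=rng))
```

Because infeasible choices receive zero probability before sampling, the policy never needs to learn the constraint through penalties; every sampled configuration already satisfies the budget.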