Robotics 24
☆ Multi-Agent Feedback Motion Planning using Probably Approximately Correct Nonlinear Model Predictive Control
For many tasks, multi-robot teams provide greater efficiency, robustness,
and resilience than single robots. However, multi-robot collaboration in real-world
scenarios poses a number of major challenges, especially when dynamic robots
must balance competing objectives like formation control and obstacle avoidance
in the presence of stochastic dynamics and sensor uncertainty. In this paper,
we propose a distributed, multi-agent receding-horizon feedback motion planning
approach using Probably Approximately Correct Nonlinear Model Predictive
Control (PAC-NMPC) that is able to reason about both model and measurement
uncertainty to achieve robust multi-agent formation control while navigating
cluttered obstacle fields and avoiding inter-robot collisions. Our approach
relies not only on the underlying PAC-NMPC algorithm but also on a terminal
cost function derived from gyroscopic obstacle avoidance. Through numerical
simulation, we show that our distributed approach performs on par with a
centralized formulation, that it offers improved performance in the case of
significant measurement noise, and that it can scale to more complex dynamical
systems.
comment: 10 pages, 12 figures
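For intuition, here is a minimal 2D sketch of the gyroscopic steering idea
underlying the terminal cost. It assumes a point robot with position p and
velocity v; the function name, gains, and exact form are illustrative, not the
authors' PAC-NMPC implementation.

    import numpy as np

    def gyroscopic_term(p, v, p_obs, gain=1.0, radius=2.0):
        # Gyroscopic steering: a command orthogonal to the velocity that
        # turns the robot away from an obstacle while leaving its speed
        # unchanged (the action of a skew-symmetric matrix on v).
        r = p_obs - p
        d = np.linalg.norm(r)
        if d > radius or np.linalg.norm(v) < 1e-6:
            return np.zeros(2)
        side = 1.0 if v[0] * r[1] - v[1] * r[0] >= 0 else -1.0  # obstacle side
        v_perp = np.array([-v[1], v[0]])  # v rotated by +90 degrees
        return -side * (gain / d**2) * v_perp  # turn away, stronger when close

Because the command stays orthogonal to v, it reshapes the path without
injecting or removing kinetic energy, which is what makes gyroscopic terms
attractive inside a cost function.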
☆ Improving robot understanding using conversational AI: demonstration and feasibility study
Explanations constitute an important aspect of successful human-robot
interaction and can enhance robot understanding. To improve the understanding
of the robot, we have developed four levels of explanation (LOE) based on two
questions: what needs to be explained, and why the robot has made a particular
decision. An understandable robot requires a communicative action when there
is a disparity between the human's mental model of the robot and the robot's
state of mind. This communicative action was generated using a conversational
AI platform. An adaptive dialog was implemented to transition from one LOE to
another. Here, we demonstrate the adaptive dialog in a collaborative task with
errors and provide results of a feasibility study
a collaborative task with errors and provide results of a feasibility study
with users.
comment: 40th Anniversary, IEEE International Conference on Robotics and
Automation, 2024
☆ Evaluating Efficiency and Engagement in Scripted and LLM-Enhanced Human-Robot Interactions
To achieve natural and intuitive interaction with people, HRI frameworks
combine a wide array of methods for human perception, intention communication,
human-aware navigation and collaborative action. In practice, when encountering
unpredictable behavior of people or unexpected states of the environment, these
frameworks may lack the ability to dynamically recognize such states, adapt and
recover to resume the interaction. Large Language Models (LLMs), owing to their
advanced reasoning capabilities and context retention, present a promising
solution for enhancing robot adaptability. This potential, however, may not
directly translate to improved interaction metrics. This paper considers a
representative interaction with an industrial robot involving approach,
instruction, and object manipulation, implemented in two conditions: (1) fully
scripted and (2) including LLM-enhanced responses. We use gaze tracking and
questionnaires to measure the participants' task efficiency, engagement, and
robot perception. The results indicate higher subjective ratings for the LLM
condition, but objective metrics show that the scripted condition performs
comparably, particularly in efficiency and focus during simple tasks. We also
note that the scripted condition may have an edge over LLM-enhanced responses
in terms of response latency and energy consumption, especially for trivial and
repetitive interactions.
comment: Accepted as a Late-Breaking Report to the 2025, 20th ACM/IEEE
International Conference on Human-Robot Interaction (HRI)
☆ Towards autonomous photogrammetric forest inventory using a lightweight under-canopy robotic drone
Väinö Karjalainen, Niko Koivumäki, Teemu Hakala, Jesse Muhojoki, Eric Hyyppä, Anand George, Juha Suomalainen, Eija Honkavaara
Drones are increasingly used in forestry to capture high-resolution remote
sensing data. While operations above the forest canopy are already highly
automated, flying inside forests remains challenging, primarily relying on
manual piloting. Inside dense forests, reliance on the Global Navigation
Satellite System (GNSS) for localization is not feasible. Additionally, the
drone must autonomously adjust its flight path to avoid collisions. Recently,
advancements in robotics have enabled autonomous drone flights in GNSS-denied
obstacle-rich areas. In this article, a step towards autonomous forest data
collection is taken by building a prototype of a robotic under-canopy drone
utilizing state-of-the-art open-source methods and validating its performance
for data collection inside forests. The autonomous flight capability was
evaluated through multiple test flights in two boreal forest test sites. The
tree parameter estimation capability was studied by conducting diameter at
breast height (DBH) estimation using onboard stereo camera data and
photogrammetric methods. The prototype conducted flights in selected
challenging forest environments, and the experiments showed excellent
performance in forest reconstruction with a miniaturized stereoscopic
photogrammetric system. The stem detection algorithm identified 79.31% of the
stems. The DBH estimation had a root mean square error (RMSE) of 3.33 cm
(12.79%) and a bias of 1.01 cm (3.87%) across all trees. For trees with a DBH
less than 30 cm, the RMSE was 1.16 cm (5.74%), and the bias was 0.13 cm
(0.64%). Considering the overall performance in terms of DBH accuracy,
autonomy, and forest complexity, the proposed approach was superior to methods
previously proposed in the scientific literature. Results provided valuable
insights into autonomous forest reconstruction using drones, and several
further development topics were proposed.
comment: 35 pages, 13 Figures
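The reported accuracy metrics are straightforward to reproduce given paired
estimates and field references; a minimal sketch (array names hypothetical):

    import numpy as np

    def dbh_errors(dbh_est_cm, dbh_ref_cm):
        # RMSE and bias of DBH estimates against field-measured references,
        # in absolute (cm) and relative (%) terms, as reported above.
        est = np.asarray(dbh_est_cm, dtype=float)
        ref = np.asarray(dbh_ref_cm, dtype=float)
        err = est - ref
        rmse = np.sqrt(np.mean(err**2))
        bias = np.mean(err)
        mean_ref = np.mean(ref)
        return rmse, 100 * rmse / mean_ref, bias, 100 * bias / mean_ref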
☆ Low-Cost 3D printed, Biocompatible Ionic Polymer Membranes for Soft Actuators
Nils Trümpler, Ryo Kanno, Niu David, Anja Huch, Pham Huy Nguyen, Maksims Jurinovs, Gustav Nyström, Sergejs Gaidukovs, Mirko Kovac
Ionic polymer actuators, in essence, consist of ion exchange polymers
sandwiched between layers of electrodes. They have recently gained recognition
as promising candidates for soft actuators due to their lightweight nature,
noise-free operation, and low driving voltages. However, the materials
traditionally utilized to develop them are often not human- or environmentally
friendly. To address this issue, researchers have been focusing on developing
biocompatible versions of this actuator. Despite this, such actuators still
face challenges in achieving high performance in payload capacity, bending
capability, and response time. In this paper, we present a
biocompatible ionic polymer actuator whose membrane is fully 3D printed
utilizing a direct ink writing method. The structure of the printed membranes
consists of biodegradable ionic fluid encapsulated within layers of activated
carbon polymers. From the microscopic observations of its structure, we
confirmed that the ionic polymer is well encapsulated. The actuators can
achieve a bending performance of up to 124$^\circ$ (curvature of 0.82
$\text{cm}^{-1}$), which, to our knowledge, is the highest curvature attained
by any bending ionic polymer actuator to date. It can operate comfortably up to
a 2 Hz driving frequency and can achieve blocked forces of up to 0.76 mN. Our
results showcase a promising, high-performing biocompatible ionic polymer
actuator, whose membrane can be easily manufactured in a single step using a
standard FDM 3D printer. This approach paves the way for creating customized
designs for functional soft robotic applications, including human-interactive
devices, in the near future.
comment: 6 pages, 8 figures, Accepted in IEEE International Conference on Soft
Robotics 2025 (Robosoft)
☆ Learning to Hop for a Single-Legged Robot with Parallel Mechanism
This work presents the application of reinforcement learning to improve the
performance of a highly dynamic hopping system with a parallel mechanism.
Unlike serial mechanisms, parallel mechanisms cannot be accurately simulated
due to the complexity of their kinematic constraints and closed-loop
structures. Moreover, learning to hop suffers from a prolonged aerial phase and
the sparse nature of the rewards. To address these issues, we propose a
learning framework that encodes long-history feedback to account for the
under-actuation brought by the prolonged aerial phase. In the proposed
framework, we also introduce a simplified serial configuration for the parallel
design to avoid directly simulating the parallel structure during training. A
torque-level conversion between the serial and parallel configurations is
designed to handle the sim-to-real transfer. Simulation and hardware
experiments have been conducted
to validate this framework.
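The long-history feedback encoding can be pictured as observation stacking; a
minimal sketch, assuming a fixed-length window of proprioceptive observations
(the paper's actual encoder and window length may differ):

    import numpy as np
    from collections import deque

    class HistoryObservation:
        # Concatenate the last `horizon` observations so the policy can infer
        # the contact/aerial phase despite under-actuation during flight.
        def __init__(self, obs_dim, horizon=50):
            self.obs_dim, self.horizon = obs_dim, horizon
            self.reset()

        def reset(self):
            self.buf = deque([np.zeros(self.obs_dim)] * self.horizon,
                             maxlen=self.horizon)

        def __call__(self, obs):
            self.buf.append(np.asarray(obs, dtype=float))
            return np.concatenate(self.buf)  # shape: (horizon * obs_dim,)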
☆ Navigating Robot Swarm Through a Virtual Tube with Flow-Adaptive Distribution Control
With the rapid development of robot swarm technology and its diverse
applications, navigating robot swarms through complex environments has emerged
as a critical research direction. To ensure safe navigation and avoid potential
collisions with obstacles, the concept of virtual tubes has been introduced to
define safe and navigable regions. However, current control methods in virtual
tubes face congestion issues, particularly in narrow virtual tubes with low
throughput. To address these challenges, we first introduce the concepts of
virtual tube area and flow capacity, and develop a new evolution model for the
spatial density function. Next, we propose a novel control method
that combines a modified artificial potential field (APF) for swarm navigation
and density feedback control for distribution regulation, under which a
saturated velocity command is designed. Then, we generate a global velocity
field that not only ensures collision-free navigation through the virtual tube,
but also achieves local input-to-state stability (LISS) for density tracking
errors, both of which are rigorously proven. Finally, numerical simulations and
realistic applications validate the effectiveness and advantages of the
proposed method in managing robot swarms within narrow virtual tubes.
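A rough sketch of the command structure, combining a classical APF term with a
density-feedback term and saturating the result (the paper's modified APF and
density controller are more elaborate; gains and names here are illustrative):

    import numpy as np

    def apf_velocity(p, p_goal, obstacles, k_att=1.0, k_rep=1.0, d0=1.5):
        # Classical attractive/repulsive APF; the paper modifies this so the
        # field respects the virtual-tube boundary (not reproduced here).
        v = k_att * (p_goal - p)
        for q in obstacles:
            r = p - q
            d = np.linalg.norm(r)
            if d < d0:
                v += k_rep * (1.0 / d - 1.0 / d0) * r / d**3
        return v

    def saturated_command(v_apf, v_density, v_max):
        # Sum the navigation and density-regulation terms, then saturate so
        # the commanded speed never exceeds v_max.
        v = v_apf + v_density
        speed = np.linalg.norm(v)
        return v if speed <= v_max else (v_max / speed) * v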
☆ Nocturnal eye inspired liquid to gas phase change soft actuator with Laser-Induced-Graphene: enhanced environmental light harvesting and photothermal conversion
Robotic systems' mobility is constrained by power sources and wiring. While
pneumatic actuators remain tethered to air supplies, we developed a new
actuator utilizing light energy. Inspired by nocturnal animals' eyes, we
designed a bilayer soft actuator incorporating Laser-Induced Graphene (LIG) on
the inner surface of a silicone layer. This design maintains silicone's
transparency and flexibility while achieving 54% faster response time compared
to conventional actuators through enhanced photothermal conversion.
comment: 23 pages, 8 figures, journal paper
☆ DynoSAM: Open-Source Smoothing and Mapping Framework for Dynamic SLAM
Traditional Visual Simultaneous Localization and Mapping (vSLAM) systems
focus solely on static scene structures, overlooking dynamic elements in the
environment. Although effective for accurate visual odometry in complex
scenarios, these methods discard crucial information about moving objects. By
incorporating this information into a Dynamic SLAM framework, the motion of
dynamic entities can be estimated, enhancing navigation whilst ensuring
accurate localization. However, the fundamental formulation of Dynamic SLAM
remains an open challenge, with no consensus on the optimal approach for
accurate motion estimation within a SLAM pipeline. Therefore, we developed
DynoSAM, an open-source framework for Dynamic SLAM that enables the efficient
implementation, testing, and comparison of various Dynamic SLAM optimization
formulations. DynoSAM integrates static and dynamic measurements into a unified
optimization problem solved using factor graphs, simultaneously estimating
camera poses, static scene, object motion or poses, and object structures. We
evaluate DynoSAM across diverse simulated and real-world datasets, achieving
state-of-the-art motion estimation in indoor and outdoor environments, with
substantial improvements over existing systems. Additionally, we demonstrate
DynoSAM's utility in downstream applications, including 3D reconstruction of
dynamic scenes and trajectory prediction, thereby showcasing its potential for
advancing dynamic object-aware SLAM systems. DynoSAM is open-sourced at
https://github.com/ACFR-RPG/DynOSAM.
comment: 20 pages, 10 figures. Submitted to T-RO Visual SLAM SI 2025
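To see the factor-graph flavor of such systems, here is a minimal GTSAM
pose-graph sketch (two camera poses, one odometry factor). It is only the
static backbone: DynoSAM's actual formulation additionally couples object
motions and structure into the same graph; see the repository above for the
real factor design.

    import numpy as np
    import gtsam

    graph = gtsam.NonlinearFactorGraph()
    prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.01))
    odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.05))

    x0, x1 = gtsam.symbol('x', 0), gtsam.symbol('x', 1)
    graph.add(gtsam.PriorFactorPose3(x0, gtsam.Pose3(), prior_noise))
    odom = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(1.0, 0.0, 0.0))  # ego-motion
    graph.add(gtsam.BetweenFactorPose3(x0, x1, odom, odom_noise))

    init = gtsam.Values()
    init.insert(x0, gtsam.Pose3())
    init.insert(x1, gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(0.9, 0.0, 0.0)))
    result = gtsam.LevenbergMarquardtOptimizer(graph, init).optimize()
    print(result.atPose3(x1))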
☆ Connection-Coordination Rapport (CCR) Scale: A Dual-Factor Scale to Measure Human-Robot Rapport
Robots, particularly in service and companionship roles, must develop
positive relationships with people they interact with regularly to be
successful. These positive human-robot relationships can be characterized as
establishing "rapport," which indicates mutual understanding and interpersonal
connection that form the groundwork for successful long-term human-robot
interaction. However, the human-robot interaction research literature lacks
scale instruments to assess human-robot rapport in a variety of situations. In
this work, we developed the 18-item Connection-Coordination Rapport (CCR) Scale
to measure human-robot rapport. We first ran Study 1 (N = 288) where online
participants rated videos of human-robot interactions using a set of candidate
items. Our Study 1 results showed the discovery of two factors in our scale,
which we named "Connection" and "Coordination." We then evaluated this scale by
running Study 2 (N = 201) where online participants rated a new set of
human-robot interaction videos with our scale and an existing rapport scale
from virtual agents research for comparison. We also validated our scale by
replicating a prior in-person human-robot interaction study, Study 3 (N = 44),
and found that rapport is rated significantly greater when participants
interacted with a responsive robot (responsive condition) as opposed to an
unresponsive robot (unresponsive condition). Results from these studies
demonstrate high reliability and validity for the CCR scale, which can be used
to measure rapport in both first-person and third-person perspectives. We
encourage the adoption of this scale in future studies to measure rapport in a
variety of human-robot interactions.
comment: 8 pages, 5 figures
☆ Automating High Quality RT Planning at Scale
Riqiang Gao, Mamadou Diallo, Han Liu, Anthony Magliari, Jonathan Sackett, Wilko Verbakel, Sandra Meyers, Masoud Zarepisheh, Rafe Mcbeth, Simon Arberet, Martin Kraus, Florin C. Ghesu, Ali Kamen
Radiotherapy (RT) planning is complex, subjective, and time-intensive.
Advances in artificial intelligence (AI) promise to improve its precision,
efficiency, and consistency, but progress is often limited by the scarcity of
large, standardized datasets. To address this, we introduce the Automated
Iterative RT Planning (AIRTP) system, a scalable solution designed to generate
substantial volumes of consistently high-quality treatment plans, overcoming a
key obstacle in the advancement of AI-driven RT planning. Our AIRTP pipeline
adheres to clinical guidelines and automates essential steps, including
organ-at-risk (OAR) contouring, helper structure creation, beam setup,
optimization, and plan quality improvement, using AI integrated with RT
planning software such as Varian's Eclipse. Furthermore, we present a novel
approach for determining optimization parameters that reproduce 3D dose
distributions, i.e., a method to convert dose predictions into deliverable
treatment plans constrained by machine limitations. A comparative analysis of
plan quality reveals that our
automated pipeline produces treatment plans of quality comparable to those
generated manually, which traditionally require several hours of labor per
plan. Committed to public research, the first data release of our AIRTP
pipeline includes nine cohorts covering head-and-neck and lung cancer sites to
support an AAPM 2025 challenge. To the best of our knowledge, this dataset
features more than 10 times the number of plans in the largest existing
well-curated public dataset.
Repo: https://github.com/RiqiangGao/GDP-HMM_AAPMChallenge
comment: Related to GDP-HMM grand challenge
♻ ☆ FoundationStereo: Zero-Shot Stereo Matching
Tremendous progress has been made in deep stereo matching to excel on
benchmark datasets through per-domain fine-tuning. However, achieving strong
zero-shot generalization - a hallmark of foundation models in other computer
vision tasks - remains challenging for stereo matching. We introduce
FoundationStereo, a foundation model for stereo depth estimation designed to
achieve strong zero-shot generalization. To this end, we first construct a
large-scale (1M stereo pairs) synthetic training dataset featuring large
diversity and high photorealism, followed by an automatic self-curation
pipeline to remove ambiguous samples. We then design a number of network
architecture components to enhance scalability, including a side-tuning feature
backbone that adapts rich monocular priors from vision foundation models to
mitigate the sim-to-real gap, and long-range context reasoning for effective
cost volume filtering. Together, these components lead to strong robustness and
accuracy across domains, establishing a new standard in zero-shot stereo depth
estimation. Project page: https://nvlabs.github.io/FoundationStereo/
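The "cost volume" being filtered can be illustrated with a plain correlation
volume over rectified stereo features (a simplification; FoundationStereo's
construction and filtering are learned and more involved):

    import numpy as np

    def correlation_cost_volume(feat_l, feat_r, max_disp):
        # Score of matching each left-image pixel against the pixel `d`
        # columns to its left in the right image. feat_l, feat_r: (C, H, W).
        C, H, W = feat_l.shape
        vol = np.zeros((max_disp, H, W), dtype=feat_l.dtype)
        for d in range(max_disp):
            vol[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :W - d]).sum(axis=0) / C
        return vol  # a matching network then filters this volume over d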
♻ ☆ A Search-to-Control Reinforcement Learning Based Framework for Quadrotor Local Planning in Dense Environments
Agile flight in complex environments poses significant challenges to current
motion planning methods, as they often fail to fully leverage the quadrotor's
dynamic potential, leading to performance failures and reduced efficiency
during aggressive maneuvers. Existing approaches frequently decouple trajectory
optimization from control generation and neglect the dynamics, further limiting
their ability to generate aggressive and feasible motions. To address these
challenges, we introduce an enhanced Search-to-Control planning framework that
integrates visibility path searching with reinforcement learning (RL) control
generation, directly accounting for dynamics and bridging the gap between
planning and control. Our method first extracts control points from
collision-free paths using a proposed heuristic search, which are then refined
by an RL policy to generate low-level control commands for the quadrotor's
controller, utilizing reduced-dimensional obstacle observations for efficient
inference with lightweight neural networks. We validate the framework through
simulations and real-world experiments, demonstrating improved time efficiency
and dynamic maneuverability compared to existing methods, while confirming its
robustness and applicability. To support further research, we will release our
implementation as an open-source package.
♻ ☆ RadaRays: Real-time Simulation of Rotating FMCW Radar for Mobile Robotics via Hardware-accelerated Ray Tracing
RadaRays allows for the accurate modeling and simulation of rotating FMCW
radar sensors in complex environments, including the simulation of reflection,
refraction, and scattering of radar waves. Our software is able to handle large
numbers of objects and materials in real-time, making it suitable for use in a
variety of mobile robotics applications. We demonstrate the effectiveness of
RadaRays through a series of experiments and show that it can more accurately
reproduce the behavior of FMCW radar sensors in a variety of environments,
compared to the ray casting-based lidar-like simulations that are commonly used
in simulators for autonomous driving such as CARLA. Our experiments
additionally serve as a valuable reference point for researchers to evaluate
their own radar simulations. By using RadaRays, developers can significantly
reduce the time and cost associated with prototyping and testing FMCW
radar-based algorithms. We also provide a Gazebo plugin that makes our work
accessible to the mobile robotics community.
♻ ☆ Sampling-based Model Predictive Control Leveraging Parallelizable Physics Simulations
Corrado Pezzato, Chadi Salmi, Elia Trevisan, Max Spahn, Javier Alonso-Mora, Carlos Hernández Corbato
We present a method for sampling-based model predictive control that makes
use of a generic physics simulator as the dynamical model. In particular, we
propose a Model Predictive Path Integral controller (MPPI), that uses the
GPU-parallelizable IsaacGym simulator to compute the forward dynamics of a
problem. By doing so, we eliminate the need for explicit encoding of robot
dynamics and contacts with objects for MPPI. Since no explicit dynamic modeling
is required, our method is easily extendable to different objects and robots
and allows one to solve complex navigation and contact-rich tasks. We
demonstrate the effectiveness of this method in several simulated and
real-world settings, including mobile navigation with collision avoidance,
non-prehensile manipulation, and whole-body control for high-dimensional
configuration spaces. This method is a powerful and accessible open-source tool
to solve a large variety of contact-rich motion planning tasks.
comment: Accepted for RA-L. Code and videos available at
https://autonomousrobots.nl/paper_websites/isaac-mppi
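The MPPI update at the heart of the method is compact; a minimal CPU sketch
(the paper instead evaluates the K rollouts in parallel inside IsaacGym, and
`step`/`stage_cost` here are hypothetical stand-ins for the simulator and the
task cost):

    import numpy as np

    def mppi_update(x0, u_nom, step, stage_cost, K=256, lam=1.0, sigma=0.3):
        # Sample K perturbed control sequences, roll out the dynamics, and
        # re-weight the perturbations by exponentiated negative cost.
        H, m = u_nom.shape
        eps = sigma * np.random.randn(K, H, m)
        costs = np.zeros(K)
        for k in range(K):
            x = x0
            for t in range(H):
                x = step(x, u_nom[t] + eps[k, t])
                costs[k] += stage_cost(x)
        w = np.exp(-(costs - costs.min()) / lam)
        w /= w.sum()
        return u_nom + np.einsum('k,ktm->tm', w, eps)  # weighted update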
♻ ☆ Concurrent-Learning Based Relative Localization in Shape Formation of Robot Swarms (Extended version)
In this paper, we address the shape formation problem for massive robot
swarms in environments where external localization systems are unavailable.
Achieving this task effectively with solely onboard measurements is still
scarcely explored and faces some practical challenges. To solve this
challenging problem, we propose the following novel results. Firstly, to
estimate the relative positions among neighboring robots, a concurrent-learning
based estimator is proposed. It relaxes the persistent excitation condition
required in the classical ones such as least-square estimator. Secondly, we
introduce a finite-time agreement protocol to determine the shape location.
This is achieved by estimating the relative position between each robot and a
randomly assigned seed robot. The initial position of the seed robot marks the
shape location. Thirdly, based on the theoretical results of the relative
localization, a novel behavior-based control strategy is devised. This strategy
not only enables adaptive shape formation of large groups of robots but also
enhances the observability of inter-robot relative localization. Numerical
simulation results are provided to verify the performance of our proposed
strategy compared to the state-of-the-art ones. Additionally, outdoor
experiments on real robots further demonstrate the practical effectiveness and
robustness of our methods.
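The concurrent-learning idea can be sketched for a generic linear regression
model y = phi^T theta: a stored stack of past regressor/measurement pairs keeps
driving the update, so convergence needs a sufficiently rich recorded history
rather than persistently exciting current data. This is a generic textbook
form, not the paper's exact relative-position estimator.

    import numpy as np

    class ConcurrentLearningEstimator:
        def __init__(self, dim, gain=0.5, stack_size=20):
            self.theta = np.zeros(dim)
            self.gain = gain
            self.stack = []  # recorded (phi, y) pairs
            self.stack_size = stack_size

        def update(self, phi, y, dt=0.01):
            if len(self.stack) < self.stack_size:
                self.stack.append((phi.copy(), y))
            grad = phi * (y - phi @ self.theta)       # instantaneous error term
            for phi_j, y_j in self.stack:             # memory (concurrent) term
                grad += phi_j * (y_j - phi_j @ self.theta)
            self.theta += self.gain * dt * grad
            return self.theta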
♻ ☆ Multi-Agent Consensus Seeking via Large Language Models
Multi-agent systems driven by large language models (LLMs) have shown
promising abilities for solving complex tasks in a collaborative manner. This
work considers a fundamental problem in multi-agent collaboration: consensus
seeking. When multiple agents work together, we are interested in how they can
reach a consensus through inter-agent negotiation. To that end, this work
studies a consensus-seeking task where the state of each agent is a numerical
value and they negotiate with each other to reach a consensus value. We find
that, when not explicitly told which strategy to adopt, the LLM-driven agents
primarily use the average strategy for consensus seeking, although they may
occasionally use other strategies. Moreover, this work
analyzes the impact of the agent number, agent personality, and network
topology on the negotiation process. The findings reported in this work can
potentially lay the foundations for understanding the behaviors of LLM-driven
multi-agent systems for solving more complex tasks. Furthermore, LLM-driven
consensus seeking is applied to a multi-robot aggregation task. This
application demonstrates the potential of LLM-driven agents to achieve
zero-shot autonomous planning for multi-robot collaboration tasks. Project
website: windylab.github.io/ConsensusLLM/.
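The "average strategy" the agents converge on is easy to simulate numerically
without any LLM in the loop (a plain reference implementation, not the paper's
LLM-negotiation protocol):

    import numpy as np

    def average_consensus(x, neighbors, steps=50, alpha=0.3):
        # Each agent repeatedly moves its value toward the mean of its
        # neighbors' stated values.
        x = np.array(x, dtype=float)
        for _ in range(steps):
            x_new = x.copy()
            for i, nbrs in neighbors.items():
                x_new[i] += alpha * (np.mean([x[j] for j in nbrs]) - x[i])
            x = x_new
        return x

    # A fully connected triad starting at 3, 6, 9 converges to the mean, 6.
    print(average_consensus([3, 6, 9], {0: [1, 2], 1: [0, 2], 2: [0, 1]}))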
♻ ☆ AirPilot: Interpretable PPO-based DRL Auto-Tuned Nonlinear PID Drone Controller for Robust Autonomous Flights
Navigation precision, speed and stability are crucial for safe Unmanned
Aerial Vehicle (UAV) flight maneuvers and effective flight mission executions
in dynamic environments. Different flight missions may have varying objectives,
such as minimizing energy consumption, achieving precise positioning, or
maximizing speed. A controller that can adapt to different objectives on the
fly is highly valuable. Proportional Integral Derivative (PID) controllers are
one of the most popular and widely used control algorithms for drones and other
control systems, but their linear control algorithm fails to capture the
nonlinear nature of the dynamic wind conditions and complex drone system.
Manually tuning the PID gains for various missions can be time-consuming and
requires significant expertise. This paper presents AirPilot, a nonlinear Deep
Reinforcement Learning (DRL)-enhanced Proportional Integral Derivative (PID)
drone controller using Proximal Policy Optimization (PPO). The AirPilot
controller combines the simplicity
and effectiveness of traditional PID control with the adaptability, learning
capability, and optimization potential of DRL. This makes it better suited for
modern drone applications where the environment is dynamic, and
mission-specific performance demands are high. We employed a COEX Clover
autonomous drone for training the DRL agent within the simulator and
implemented it in a real-world lab setting, which marks a significant milestone
as one of the first attempts to apply a DRL-based flight controller on an
actual drone. AirPilot reduces the navigation error of the default PX4 PID
position controller by 90%, improves the effective navigation speed of a
fine-tuned PID controller by 21%, and reduces settling time and overshoot by
17% and 16%, respectively.
comment: 9 pages, 20 figures
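Structurally, the controller is an ordinary PID loop whose gains are written
by a learned policy; a minimal sketch (the PPO agent itself, its observation
space, and the gain ranges are not reproduced, and this class is a
hypothetical stand-in):

    class AdaptivePID:
        # A PID loop whose gains a DRL policy overwrites online, so the linear
        # controller can track well despite nonlinear wind and drone dynamics.
        def __init__(self, kp=1.0, ki=0.0, kd=0.0):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_err = 0.0

        def set_gains(self, gains):  # called with the policy's action
            self.kp, self.ki, self.kd = gains

        def command(self, err, dt):
            self.integral += err * dt
            deriv = (err - self.prev_err) / dt
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv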
♻ ☆ Complementarity-Free Multi-Contact Modeling and Optimization for Dexterous Manipulation
A significant barrier preventing model-based methods from achieving real-time
and versatile dexterous robotic manipulation is the inherent complexity of
multi-contact dynamics. Traditionally formulated as complementarity models,
multi-contact dynamics introduces non-smoothness and combinatorial complexity,
complicating contact-rich planning and optimization. In this paper, we
circumvent these challenges by introducing a lightweight yet capable
multi-contact model. Our new model, derived from the duality of
optimization-based contact models, dispenses with the complementarity
constructs entirely, providing computational advantages such as closed-form
time stepping, differentiability, automatic satisfaction of the Coulomb
friction law, and minimal hyperparameter tuning. We demonstrate the
effectiveness and
efficiency of the model for planning and control in a range of challenging
dexterous manipulation tasks, including fingertip 3D in-air manipulation,
TriFinger in-hand manipulation, and Allegro hand on-palm reorientation, all
performed with diverse objects. Our method consistently achieves
state-of-the-art results: (I) a 96.5% average success rate across all objects
and tasks, (II) high manipulation accuracy with an average reorientation error
of 11$^\circ$ and a position error of 7.8 mm, and (III) contact-implicit model
predictive control running at 50-100 Hz for all objects and tasks. These
results are achieved with minimal hyperparameter tuning.
comment: Video demo: https://youtu.be/NsL4hbSXvFg
♻ ☆ Optimal Spatial-Temporal Triangulation for Bearing-Only Cooperative Motion Estimation
Vision-based cooperative motion estimation is an important problem for many
multi-robot systems such as cooperative aerial target pursuit. This problem can
be formulated as bearing-only cooperative motion estimation, where the visual
measurement is modeled as a bearing vector pointing from the camera to the
target. The conventional approaches for bearing-only cooperative estimation are
mainly based on the framework of distributed Kalman filtering (DKF). In this
paper, we propose a new optimal bearing-only cooperative estimation algorithm,
named spatial-temporal triangulation, based on the method of distributed
recursive least squares, which provides a more flexible framework for designing
distributed estimators than DKF. The design of the algorithm fully incorporates
all the available information and the specific triangulation geometric
constraint. As a result, the algorithm achieves superior estimation
performance to state-of-the-art DKF algorithms in terms of both accuracy and
convergence speed, as verified by numerical simulation. We rigorously prove
the exponential convergence of the proposed algorithm. Moreover, to verify the
effectiveness of the proposed algorithm under practically challenging
conditions, we develop a vision-based cooperative aerial target pursuit
system, which, to the best of our knowledge, is the first such fully
autonomous system.
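The geometric core is multi-view triangulation from bearing vectors; the
static least-squares special case has a closed form, which the paper's
spatial-temporal estimator generalizes recursively to a moving target (a
sketch of the static case only):

    import numpy as np

    def triangulate_bearings(cam_positions, bearings):
        # Minimize the squared distance of p to every ray (c_i, g_i). With
        # P_i = I - g_i g_i^T, the optimum is
        #   p* = (sum_i P_i)^{-1} (sum_i P_i c_i).
        # Requires non-parallel bearings, else the system is singular.
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for c, g in zip(cam_positions, bearings):
            g = g / np.linalg.norm(g)
            P = np.eye(3) - np.outer(g, g)
            A += P
            b += P @ np.asarray(c, dtype=float)
        return np.linalg.solve(A, b)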
♻ ☆ Tightly-Coupled LiDAR-IMU-Wheel Odometry with an Online Neural Kinematic Model Learning via Factor Graph Optimization
Taku Okawara, Kenji Koide, Shuji Oishi, Masashi Yokozuka, Atsuhiko Banno, Kentaro Uno, Kazuya Yoshida
Environments lacking geometric features (e.g., tunnels and long straight
corridors) are challenging for LiDAR-based odometry algorithms because LiDAR
point clouds degenerate in such environments. For wheeled robots, a wheel
kinematic model (i.e., wheel odometry) can improve the reliability of the
odometry estimation. However, the kinematic model suffers from complex motions
(e.g., wheel slippage, lateral movement) in the case of skid-steering robots,
particularly because this type of robot rotates by skidding its wheels.
Furthermore, these errors change nonlinearly when the wheel slippage is large
(e.g., drifting) and are subject to terrain-dependent parameters. To
simultaneously tackle point cloud degeneration and the kinematic model errors,
we developed a LiDAR-IMU-wheel odometry algorithm incorporating online training
of a neural network that learns the kinematic model of wheeled robots with
nonlinearity. We propose to train the neural network online on a factor graph
along with robot states, allowing the learning-based kinematic model to adapt
to the current terrain condition. The proposed method jointly solves online
training of the neural network and LiDAR-IMU-wheel odometry on a unified factor
graph to retain the consistency of all those constraints. Through experiments,
we first verified that the proposed network adapted to a changing environment,
resulting in an accurate odometry estimation across different environments. We
then confirmed that the proposed odometry estimation algorithm was robust
against point cloud degeneration and nonlinearity (e.g., large wheel slippage
by drifting) of the kinematic model.
comment: https://youtu.be/CvRVhdda7Cw
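The nominal kinematic model being corrected is the standard skid-steer one; a
sketch (parameter names illustrative; the paper learns the correction online
as a factor in the graph rather than with this closed form):

    def skid_steer_twist(w_left, w_right, wheel_radius, half_track):
        # Nominal skid-steer kinematics: body twist (v, omega) from wheel
        # rates. On real terrain, slippage makes this inaccurate; the paper
        # trains a network online, jointly with the robot states, to predict
        # the corrected twist: twist = nominal + f_theta(inputs).
        v = wheel_radius * (w_left + w_right) / 2.0
        omega = wheel_radius * (w_right - w_left) / (2.0 * half_track)
        return v, omega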
♻ ☆ FLAME: Learning to Navigate with Multimodal LLM in Urban Environments AAAI 2025
Large Language Models (LLMs) have demonstrated potential in
Vision-and-Language Navigation (VLN) tasks, yet current applications face
challenges. While LLMs excel in general conversation scenarios, they struggle
with specialized navigation tasks, yielding suboptimal performance compared to
specialized VLN models. We introduce FLAME (FLAMingo-Architected Embodied
Agent), a novel Multimodal LLM-based agent and architecture designed for urban
VLN tasks that efficiently handles multiple observations. Our approach
implements a three-phase tuning technique for effective adaptation to
navigation tasks, including single perception tuning for street view
description, multiple perception tuning for route summarization, and end-to-end
training on VLN datasets. The augmented datasets are synthesized automatically.
Experimental results demonstrate FLAME's superiority over existing methods,
surpassing the state of the art with a 7.3% increase in task completion rate
on the Touchdown dataset. This work showcases the potential of Multimodal LLMs
(MLLMs)
in complex navigation tasks, representing an advancement towards applications
of MLLMs in the field of embodied intelligence.
comment: Accepted to AAAI 2025 (Oral)
♻ ☆ DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation
There has been a recent surge of interest in learning to perceive depth from
monocular videos in an unsupervised fashion. A key challenge in this field is
achieving robust and accurate depth estimation in challenging scenarios,
particularly in regions with weak textures or where dynamic objects are
present. This study makes three major contributions by delving deeply into
dense correspondence priors to provide existing frameworks with explicit
geometric constraints. The first novelty is a contextual-geometric depth
consistency loss, which employs depth maps triangulated from dense
correspondences based on estimated ego-motion to guide the learning of depth
perception from contextual information, since explicitly triangulated depth
maps capture accurate relative distances among pixels. The second novelty
arises from the observation that there exists an explicit, deducible
relationship between optical flow divergence and depth gradient. A differential
property correlation loss is, therefore, designed to refine depth estimation
with a specific emphasis on local variations. The third novelty is a
bidirectional stream co-adjustment strategy that enhances the interaction
between rigid and optical flows, encouraging the former towards more accurate
correspondence and making the latter more adaptable across various scenarios
under the static scene hypothesis. DCPI-Depth, a framework that incorporates
all these innovative components and couples two bidirectional and collaborative
streams, achieves state-of-the-art performance and generalizability across
multiple public datasets, outperforming all prior art. Specifically,
it demonstrates accurate depth estimation in texture-less and dynamic regions,
and shows more reasonable smoothness. Our source code will be publicly
available at mias.group/DCPI-Depth upon publication.
comment: 13 pages, 8 figures
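For intuition on the second contribution, consider a 1D illustration under a
rotation-free, static-scene assumption (a simplification; the paper derives
the relationship in full generality). With camera translation $(t_x, t_z)$ and
depth $Z(x)$, the horizontal flow in normalized coordinates is
$u(x) = (x\,t_z - t_x)/Z(x)$, so

    \frac{\partial u}{\partial x} = \frac{t_z}{Z} - \frac{(x\,t_z - t_x)\,Z_x}{Z^2}
                                  = \frac{t_z}{Z} - u\,\frac{Z_x}{Z},

i.e., the flow divergence couples the inverse depth with the depth gradient
$Z_x$, which is the deducible relationship that the differential property
correlation loss exploits.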
♻ ☆ Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation
Quanting Xie, So Yeon Min, Pengliang Ji, Yue Yang, Tianyi Zhang, Kedi Xu, Aarav Bajaj, Ruslan Salakhutdinov, Matthew Johnson-Roberson, Yonatan Bisk
There is no limit to how much a robot might explore and learn, but all of
that knowledge needs to be searchable and actionable. Within language research,
retrieval augmented generation (RAG) has become the workhorse of large-scale
non-parametric knowledge; however, existing techniques do not directly transfer
to the embodied domain, which is multimodal, where data is highly correlated,
and perception requires abstraction. To address these challenges, we introduce
Embodied-RAG, a framework that enhances the foundational model of an embodied
agent with a non-parametric memory system capable of autonomously constructing
hierarchical knowledge for both navigation and language generation.
Embodied-RAG handles a full range of spatial and semantic resolutions across
diverse environments and query types, whether for a specific object or a
holistic description of ambiance. At its core, Embodied-RAG's memory is
structured as a semantic forest, storing language descriptions at varying
levels of detail. This hierarchical organization allows the system to
efficiently generate context-sensitive outputs across different robotic
platforms. We demonstrate that Embodied-RAG effectively bridges RAG to the
robotics domain, successfully handling over 250 explanation and navigation
queries across kilometer-level environments, highlighting its promise as a
general-purpose non-parametric system for embodied agents.
comment: Web: https://quanting-xie.github.io/Embodied-RAG-web/
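Retrieval over a semantic forest can be pictured as a greedy top-down descent
scored by embedding similarity; a minimal sketch (`embed` is a placeholder
text-embedding function, and this greedy scorer is an assumption, not
Embodied-RAG's actual retrieval policy):

    import numpy as np

    class Node:
        def __init__(self, description, children=None, payload=None):
            self.description = description  # language summary at this level
            self.children = children or []
            self.payload = payload          # e.g., a pose or detailed memory

    def retrieve(root, query_vec, embed):
        # At each level, follow the child whose description embedding is most
        # similar to the query, from coarse ambiance down to specific objects.
        node = root
        while node.children:
            sims = [float(np.dot(embed(c.description), query_vec))
                    for c in node.children]
            node = node.children[int(np.argmax(sims))]
        return node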