Publications
2025
- [CSR] Applying Opponent and Environment Modelling in Decentralised Multi-Agent Reinforcement Learning. Alexander Chernyavskiy, Aleksandr Panov, and Aleksey Skrynnik. Cognitive Systems Research, 2025
Multi-agent reinforcement learning (MARL) has recently gained popularity and achieved considerable success in different kinds of games, such as zero-sum, cooperative, and general-sum games. Nevertheless, the vast majority of modern algorithms assume information sharing during training and, hence, cannot be used in decentralised applications, nor can they scale to high-dimensional scenarios or be applied to applications with general or sophisticated reward structures. Moreover, because data collection is expensive and data are sparse in real-world applications, it becomes necessary to use world models that capture the environment dynamics with latent variables, i.e. to use a world model to generate synthetic data for training MARL algorithms. Therefore, focusing on the paradigm of decentralised training and decentralised execution, we propose an extension to model-based reinforcement learning approaches that leverages fully decentralised training with planning conditioned on the latent representations of neighbouring co-players. Our approach is inspired by the idea of opponent modelling. The method lets the agent learn in a joint latent space without the need to interact with the environment. We present the approach as a proof of concept that decentralised model-based algorithms can give rise to collective behaviour with limited communication during planning, and demonstrate its necessity on iterated matrix games and modified versions of the StarCraft Multi-Agent Challenge (SMAC).
- [AAAI] MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale. Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, and 1 more author. In AAAI 2025, 2025
Multi-agent pathfinding (MAPF) is a challenging computational problem that typically requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally is NP-hard, yet efficient solutions are critical for numerous applications, including automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Following current trends in machine learning, we have created a foundation model for MAPF problems called MAPF-GPT. Using imitation learning, we have trained a policy on a set of pre-collected sub-optimal expert trajectories; the policy generates actions under partial observability without additional heuristics, reward functions, or communication with other agents. The resulting MAPF-GPT model demonstrates zero-shot learning abilities when solving MAPF problem instances that were not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient at inference time.
- [AIJ] Generative Models for Grid-Based and Image-Based Pathfinding. Daniil Kirilenko, Anton Andreychuk, Aleksandr I Panov, and 1 more author. Artificial Intelligence, 2025
Pathfinding is a challenging problem that generally asks to find a sequence of valid moves for an agent provided with a representation of the environment in which it operates, i.e. a map. In this work, we consider pathfinding on binary grids and on image representations of digital elevation models. In the former case, the transition costs are known, while in the latter they are not. A widespread method to solve the first problem is to use a search algorithm that systematically explores the search space to obtain a solution. Ideally, the search should also be complemented with an informative heuristic that focuses it on the goal and prunes the unpromising regions of the search space, thus decreasing the number of search iterations. Unfortunately, the widespread heuristic functions for grid-based pathfinding, such as Manhattan distance or Chebyshev distance, do not take obstacles into account and perform poorly in obstacle-rich environments. As for pathfinding with image inputs, heuristic search cannot be applied straightforwardly because the transition costs, i.e. the costs of moving from one pixel to another, are not known. To tackle both challenges, we suggest using modern deep neural networks to infer instance-dependent heuristic functions at a pre-processing step and then using them for pathfinding with standard heuristic search algorithms. The principal heuristic function that we suggest learning is the path probability, which indicates how likely a grid cell (pixel) is to lie on the shortest path (for binary grids with known transition costs, we also suggest another variant of the heuristic function that can speed up the search). Learning is performed in a supervised fashion (we have also explored end-to-end learning that includes a planner in the learning pipeline).
At test time, path probability is used as the secondary heuristic for Focal Search, a heuristic search algorithm that provides theoretical guarantees on the cost bound of the resulting solution. Empirically, we show that the suggested approach significantly outperforms state-of-the-art competitors in a variety of tasks (including out-of-distribution instances).
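The test-time procedure in the Generative Models for Grid-Based and Image-Based Pathfinding entry (path probability as the secondary heuristic of Focal Search) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a 4-connected grid with unit move costs, Manhattan distance as the primary admissible heuristic, and a hypothetical `prob` array standing in for the network-predicted path probabilities. FOCAL holds every open node whose f-value is within a factor `w` of the smallest f-value in OPEN; among those, the node with the highest path probability is expanded, which keeps the returned cost within `w` times the optimum.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def focal_search(grid: List[List[int]], start: Cell, goal: Cell,
                 prob: List[List[float]], w: float = 1.5) -> Optional[List[Cell]]:
    """Focal Search on a 4-connected grid (0 = free, 1 = obstacle).

    Primary heuristic: Manhattan distance (admissible for unit costs).
    Secondary heuristic: a hypothetical per-cell path probability `prob`;
    among FOCAL nodes, higher probability is expanded first.
    """
    rows, cols = len(grid), len(grid[0])

    def h(c: Cell) -> int:
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    g: Dict[Cell, int] = {start: 0}
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    open_set: Set[Cell] = {start}

    while open_set:
        f_min = min(g[c] + h(c) for c in open_set)
        # FOCAL: open nodes whose f-value is within w * f_min.
        focal = [c for c in open_set if g[c] + h(c) <= w * f_min]
        # Pick the most probable cell; break ties by lower f-value.
        cur = max(focal, key=lambda c: (prob[c[0]][c[1]], -(g[c] + h(c))))
        if cur == goal:
            path: List[Cell] = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        open_set.remove(cur)
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g[cur] + 1
                # (Re)open a cell whenever a cheaper path to it is found.
                if new_g < g.get(nxt, float("inf")):
                    g[nxt] = new_g
                    parent[nxt] = cur
                    open_set.add(nxt)
    return None
```

With `w=1.0` the FOCAL list contains only minimum-f nodes, so the sketch reduces to A* with probability-based tie-breaking; on an empty 5×5 grid from (0, 0) to (4, 4) it returns an optimal 8-move path. Larger `w` gives the learned probabilities more freedom to steer the search.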
2024
- [CSR] Hebbian Spatial Encoder with Adaptive Sparse Connectivity. Petr Kuderov, Evgenii Dzhivelikian, and Aleksandr Panov. Cognitive Systems Research, 2024
Biologically plausible neural networks have demonstrated efficiency in learning and recognizing patterns in data. This paper proposes a general online unsupervised algorithm for spatial data encoding using fast Hebbian learning. Inspired by the Hierarchical Temporal Memory (HTM) framework, we introduce the SpatialEncoder algorithm, which learns the spatial specialization of neurons’ receptive fields through Hebbian plasticity and k-WTA (k winners take all) inhibition. A key component of our model is a two-part synaptogenesis algorithm that enables the network to maintain a sparse connection matrix while adapting to non-stationary input data distributions. In the MNIST digit classification task, our model outperforms the HTM SpatialPooler in terms of classification accuracy and encoding stability. Compared to another baseline, a two-layer artificial neural network (ANN), our model achieves competitive classification accuracy with fewer iterations required for convergence. The proposed model offers a promising direction for future research on sparse neural networks with adaptive neural connectivity.
- [ECAI] Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments. Zoya Volovikova, Alexey Skrynnik, Petr Kuderov, and 1 more author. In Frontiers in Artificial Intelligence and Applications, 2024
In this study, we address the issue of enabling an artificial intelligence agent to execute complex language instructions within virtual environments. In our framework, we assume that these instructions involve intricate linguistic structures and multiple interdependent tasks that must be navigated successfully to achieve the desired outcomes. To manage these complexities effectively, we propose a hierarchical framework that combines the deep language comprehension of large language models with the adaptive action-execution capabilities of reinforcement learning agents: the language module (based on an LLM) translates the language instruction into a high-level action plan, which is then executed by a pre-trained reinforcement learning agent. We have demonstrated the effectiveness of our approach in two different environments: in IGLU, where agents are instructed to build structures, and in Crafter, where agents perform tasks and interact with objects in the surrounding environment according to language commands.
- [RAL] FFStreams: Fast Search with Streams for Autonomous Maneuver Planning. Mais Jamal and Aleksandr Panov. IEEE Robotics and Automation Letters, 2024
In autonomous driving, maneuver planning is essential for ride safety and comfort, involving both motion planning and decision-making. This paper introduces FFStreams, a novel approach combining high-level decision-making and low-level motion planning to solve maneuver planning problems while considering kinematic constraints. Addressed as an integrated Task and Motion Planning (TAMP) problem in a dynamic environment, the planner utilizes PDDL, incorporates Streams, and employs Fast-Forward heuristic search. Evaluated against baseline methods in challenging overtaking and lane-changing scenarios, FFStreams demonstrates superior performance, highlighting its potential for real-world applications.
- [IGPL] Sign-Based Image Criteria for Social Interaction Visual Question Answering. Anfisa A Chuganskaya, Alexey K Kovalev, and Aleksandr I Panov. Logic Journal of the IGPL, 2024
Multi-modal tasks have begun to play a significant role in research on Artificial Intelligence. A particular example of this domain is visual-linguistic tasks, such as Visual Question Answering. The progress of modern machine learning systems is determined, among other things, by the data on which these systems are trained. Most modern visual question answering datasets contain a limited range of question types that can be answered either by directly accessing the image itself or by using external data. At the same time, insufficient attention is paid to social interactions between people, which limits the scope of visual question answering systems. In this paper, based on psychological research, we propose criteria for selecting images that are suitable for composing social interaction visual question answering questions. We believe this should serve the progress of visual question answering systems.
- [ICRA] Neural Potential Field for Obstacle-Aware Local Motion Planning. Muhammad Alhaddad, Konstantin Mironov, Aleksey Staroverov, and 1 more author. In 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024
Model predictive control (MPC) may provide local motion planning for mobile robotic platforms. The challenging aspect is the analytic representation of collision cost when both the obstacle map and the robot footprint are arbitrary. We propose a Neural Potential Field: a neural network model that returns a differentiable collision cost based on robot pose, obstacle map, and robot footprint. The differentiability of our model allows its use within the MPC solver. Because problems with a very large number of parameters are computationally hard to solve, our architecture includes neural image encoders that transform obstacle maps and robot footprints into embeddings, reducing problem dimensionality by two orders of magnitude. The reference data for network training are generated by algorithmic calculation of a signed distance function. Comparative experiments showed that the proposed approach is comparable with existing local planners: it provides smoother trajectories, comparable path length, and a safe distance from obstacles. Experiments on a Husky UGV mobile robot showed that our approach allows real-time and safe local planning. The code for our approach, together with a demo video, is available at https://github.com/cog-isa/NPField.
- [AAAI] Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding. Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, and 1 more author. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024
The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph. In typical MAPF scenarios, the graph and the agents’ starting and ending vertices are known beforehand, allowing the use of centralized planning algorithms. However, in this study, we focus on the decentralized MAPF setting, where the agents may observe other agents only locally and are restricted in their communication with each other. Specifically, we investigate the lifelong variant of MAPF, where new goals are continually assigned to the agents upon completion of previous ones. Drawing inspiration from the successful AlphaZero approach, we propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks. Our approach uses the agent’s observations to recreate the intrinsic Markov decision process, which is then used for planning with a version of neural MCTS tailored to multi-agent tasks. The experimental results show that our approach outperforms state-of-the-art learnable MAPF solvers. The source code is available at https://github.com/AIRI-Institute/mats-lp.
- [AAAI] Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning. Alexey Skrynnik, Anton Andreychuk, Maria Nesterova, and 2 more authors. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024
The Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, in which the central controller that possesses all the information on the agents’ locations and goals is absent and the agents have to sequentially decide their actions on their own without access to the full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival at the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is used to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward shaping or external guidance. We evaluate our method on a wide range of setups, comparing it to state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, achieving higher throughput and generalizing better to maps unseen at the training stage. Moreover, our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver. The code is available at https://github.com/AIRI-Institute/learn-to-follow.
- [ICLR] Object-Centric Learning with Slot Mixture Module. Daniil Kirilenko, Vitaliy Vorobyov, Alexey Kovalev, and 1 more author. In The Twelfth International Conference on Learning Representations, 2024
Object-centric architectures usually apply a differentiable module to the entire feature map to decompose it into sets of entity representations called slots. Some of these methods structurally resemble clustering algorithms, where the cluster’s center in latent space serves as a slot representation. Slot Attention is an example of such a method, acting as a learnable analog of the soft k-means algorithm. Our work employs a learnable clustering method based on the Gaussian Mixture Model. Unlike other approaches, we represent slots not only as centers of clusters but also incorporate information about the distance between clusters and assigned vectors, leading to more expressive slot representations. Our experiments demonstrate that using this approach instead of Slot Attention improves performance in object-centric scenarios, achieving state-of-the-art results in the set property prediction task.
- [EAAI] Hierarchical waste detection with weakly supervised segmentation in images from recycling plants. Dmitry Yudin, Nikita Zakharenko, Artem Smetanin, and 7 more authors. Engineering Applications of Artificial Intelligence, 2024
Reducing environmental pollution from household waste and from the emissions of computing clusters is an urgent technological problem. In our work, we explore both of these aspects: the application of deep learning to improve the efficiency of waste recognition on a recycling plant’s conveyor, and the carbon dioxide emissions of the computing devices used in this process. To conduct this research, we developed a unique open WaRP dataset that has the greatest diversity among similar industrial datasets and contains more than 10,000 images with 28 different types of recyclable goods (bottles, glasses, cardboard, cans, detergents, and canisters). Objects can overlap, be in poor lighting conditions, or be significantly distorted. On the WaRP dataset, we study the training and evaluation of cutting-edge deep neural networks for detection, classification, and segmentation tasks. Additionally, we developed a hierarchical neural network approach called H-YC with weakly supervised waste segmentation. It provided a notable increase in detection quality and made it possible to segment images while learning only from class labels, without masks. Both the suggested hierarchical approach and the WaRP dataset have shown great potential for industrial application.
- Interactive Semantic Map Representation for Skill-Based Visual Object Navigation. Tatiana Zemskova, Aleksei Staroverov, Kirill Muravyev, and 2 more authors. IEEE Access, 2024
Visual object navigation is one of the key tasks in mobile robotics. One of the most important components of this task is an accurate semantic representation of the scene, which is needed to determine and reach a goal object. This paper introduces a new representation of a scene semantic map formed during the embodied agent’s interaction with the indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model with backpropagation of the predicted fusion loss values during inference on a regular (backward) or delayed (forward) image sequence. We implement this representation in a full-fledged navigation approach called SkillTron. The method can select robot skills from end-to-end policies based on reinforcement learning and classic map-based planning methods. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation. We conduct extensive experiments with the proposed approach in the Habitat environment, demonstrating its significant superiority over state-of-the-art approaches in terms of navigation quality metrics. The developed code and custom datasets are publicly available at github.com/AIRI-Institute/skill-fusion.
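The Hebbian Spatial Encoder entry for 2024 combines Hebbian plasticity with k-WTA (k-winners-take-all) inhibition. The NumPy sketch below shows only that combination and is not the SpatialEncoder algorithm itself: it assumes a dense weight matrix and a simple instar-style Hebbian rule (winning neurons move their weights toward the input, then renormalize), and it omits the paper's synaptogenesis and sparse connectivity. The class name `HebbianKWTAEncoder` and its parameters are hypothetical.

```python
import numpy as np


def kwta(overlaps: np.ndarray, k: int) -> np.ndarray:
    """Binary code with ones at the k largest overlaps (k-winners-take-all)."""
    winners = np.argsort(overlaps)[-k:]
    code = np.zeros(overlaps.shape, dtype=np.int8)
    code[winners] = 1
    return code


class HebbianKWTAEncoder:
    """Hypothetical dense encoder: k-WTA inhibition plus a Hebbian update."""

    def __init__(self, n_inputs: int, n_neurons: int, k: int,
                 lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Positive random weights, each neuron's receptive field sums to 1.
        self.w = rng.random((n_neurons, n_inputs))
        self.w /= self.w.sum(axis=1, keepdims=True)
        self.k = k
        self.lr = lr

    def encode(self, x, learn: bool = True) -> np.ndarray:
        x = np.asarray(x, dtype=float)
        overlaps = self.w @ x
        code = kwta(overlaps, self.k)
        if learn:
            winners = code.astype(bool)
            # Hebbian (instar) step: only active neurons move toward the input.
            self.w[winners] += self.lr * (x - self.w[winners])
            self.w[winners] /= self.w[winners].sum(axis=1, keepdims=True)
        return code
```

Because each update increases the winners' overlap with the presented input while the losers stay unchanged, repeating the same input keeps activating the same k neurons, a toy version of the encoding stability the abstract measures.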
2023
- Fine-tuning Multimodal Transformer Models for Generating Actions in Virtual and Real Environments. Aleksei Staroverov, Andrey S Gorodetsky, Andrei S Krishtopik, and 3 more authors. IEEE Access, 2023
In this work, we propose and investigate an original approach to using a pre-trained multimodal transformer of a specialized architecture, which we refer to as RozumFormer, for controlling a robotic agent in an object manipulation task based on language instructions. Our model is based on a bimodal (text-image) transformer architecture originally trained for tasks that use one or both modalities, such as language modeling, visual question answering, image captioning, text recognition, text-to-image generation, etc. The model was adapted for robotic manipulation tasks by organizing the input sequence in a particular way, with tokens for text, images, and actions. We demonstrated that such a model adapts well to new tasks and shows better results with fine-tuning than with training from scratch, in both simulated and real environments. To transfer the model from the simulator to a real robot, new datasets were collected and annotated. In addition, experiments controlling the agent in a visual environment using reinforcement learning have shown that fine-tuning the model on a mixed dataset that includes examples from the initial visual-linguistic tasks only slightly decreases performance on these tasks, simplifying the addition of tasks from another domain.
- [AAAI] TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers. Daniil Kirilenko, Anton Andreychuk, Aleksandr Panov, and 1 more author. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023
Heuristic search algorithms, e.g. A*, are commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games, etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take obstacles into account, and thus search guided by such heuristics performs poorly in obstacle-rich environments. To this end, we suggest learning instance-dependent heuristic proxies that are supposed to notably increase the efficiency of the search. The first heuristic proxy we suggest learning is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one (computed offline at the training phase). Unlike learning the absolute values of the cost-to-go heuristic function, which has been done before, learning the correction factor makes use of the instance-independent heuristic. The second heuristic proxy is the path probability, which indicates how likely a grid cell is to lie on the shortest path. This heuristic can be used in the Focal Search framework as the secondary heuristic, which preserves the guarantees on the bounded sub-optimality of the solution. We learn both suggested heuristics in a supervised fashion with state-of-the-art neural networks containing attention blocks (transformers). We conduct a thorough empirical evaluation on a comprehensive dataset of planning tasks, showing that the suggested techniques i) reduce the computational effort of A* by up to a factor of 4 while producing solutions whose costs exceed those of the optimal solutions by less than 0.3% on average; ii) outperform the competitors, which include conventional heuristic search techniques, i.e. weighted A*, as well as state-of-the-art learnable planners.
- [TNNLS] When to Switch: Planning and Learning For Partially Observable Multi-Agent Pathfinding. Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, and 1 more author. IEEE Transactions on Neural Networks and Learning Systems, 2023
- [RAL] Policy Optimization to Learn Adaptive Motion Primitives in Path Planning With Dynamic Obstacles. Brian Angulo, Aleksandr Panov, and Konstantin Yakovlev. IEEE Robotics and Automation Letters, 2023
This paper addresses kinodynamic motion planning for non-holonomic robots in dynamic environments with both static and dynamic obstacles, a challenging problem that still lacks a universal solution. One of the promising approaches to solving it is decomposing the problem into smaller sub-problems and combining the local solutions into a global one. The crux of any planning method for non-holonomic robots is the generation of motion primitives that produce solutions to local planning sub-problems. In this work, we introduce a novel learnable steering function (policy), which takes into account the kinodynamic constraints of the robot and both static and dynamic obstacles. This policy is efficiently trained via policy optimization. Empirically, we show that our steering function generalizes well to unseen problems. We then plug the trained policy into sampling-based and lattice-based planners and evaluate the resulting POLAMP algorithm (Policy Optimization that Learns Adaptive Motion Primitives) in a range of challenging setups that involve a car-like robot operating in obstacle-rich parking-lot environments. We show that POLAMP is able to plan collision-free kinodynamic trajectories with success rates higher than 92% when 50 simultaneously moving obstacles populate the environment, showing better performance than state-of-the-art competitors.
- [NeuroInfo] Addressing Task Prioritization in Model-based Reinforcement Learning. Artem Zholus, Yaroslav Ivchenkov, and Aleksandr I Panov. In Advances in Neural Computation, Machine Learning, and Cognitive Research VI. NEUROINFORMATICS 2022. Studies in Computational Intelligence, 2023
World models facilitate sample-efficient reinforcement learning (RL) and, by design, can benefit from multitask information. However, typical model-based RL (MBRL) agents do not use this information. We propose a data-centric approach to this problem. We build a controllable optimization process for MBRL agents that selectively prioritizes the data used by the model-based agent to improve its performance. We show how this can favor implicit task generalization in a custom environment based on MetaWorld with parametric task variability. Furthermore, by bootstrapping the agent’s data, our method can boost performance on unstable environments from the DeepMind Control Suite. This is done without any additional data or architectural changes, and outperforms state-of-the-art visual model-based RL algorithms. Additionally, we frame the approach within the scope of methods that have unintentionally followed the controllable optimization process paradigm, filling the gap of data-centric task-bootstrapping methods.
- [ICONIP] HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images. Dmitry Yudin, Yaroslav Solomentsev, Ruslan Musaev, and 2 more authors. In Neural Information Processing. Lecture Notes in Computer Science, 2023
We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments and loop detection in simultaneous localization and mapping. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place (“Point”) at different angles. The dataset is based on the popular Habitat simulator, in which photorealistic indoor scenes can be generated using both custom sensor data and open datasets, such as Matterport3D. To study the main stages of solving the place recognition problem on the HPointLoc dataset, we proposed a new modular approach named PNTR. It first performs image retrieval with the Patch-NetVLAD method, then extracts keypoints and matches them using R2D2, LoFTR, or SuperPoint with SuperGlue, and finally performs a camera pose optimization step with TEASER++. Such a solution to the place recognition problem has not been studied in previous publications. The PNTR approach has shown the best quality metrics on the HPointLoc dataset and has high potential for real use in localization systems for unmanned vehicles. The proposed dataset and framework are publicly available: https://github.com/metra4ok/HPointLoc.
- Skill Fusion in Hybrid Robotic Framework for Visual Object Goal Navigation. Aleksei Staroverov, Kirill Muravyev, Konstantin Yakovlev, and 1 more author. Robotics, 2023
In recent years, Embodied AI has become one of the main topics in robotics. For an agent to operate in human-centric environments, it needs the ability to explore previously unseen areas and to navigate to objects that humans want it to interact with. This task, which can be formulated as ObjectGoal Navigation (ObjectNav), is the main focus of this work. To solve this challenging problem, we suggest SkillFusion, a hybrid framework consisting of both non-learnable and learnable modules and a switcher between them. The former are more accurate, while the latter are more robust to sensor noise. To mitigate the sim-to-real gap, which often arises with learnable methods, we suggest training them in such a way that they are less environment-dependent. As a result, our method showed top results both in the Habitat simulator and in evaluations on a real robot. Video and code for our approach can be found on our website: https://github.com/AIRI-Institute/skill-fusion (accessed on 13 July 2023).
- [IJCAI] Object-Oriented Decomposition of World Model in Reinforcement Learning. Leonid Ugadiarov and Aleksandr I Panov. In IJCAI Neuro-Symbolic Agents Workshop, 2023
Object-oriented models are expected to have better generalization abilities and to operate on a more compact state representation. Recent studies have shown that using pre-trained object-centric representation learning models for state factorization in model-free algorithms improves the efficiency of policy learning. Approaches using object-factored world models to predict environment dynamics have also shown their effectiveness in object-based grid-world environments. Following those works, we propose a novel object-oriented, model-based, value-based reinforcement learning algorithm, Object Oriented Q-network (OOQN), which employs an object-oriented decomposition of the world and state-value models. The experimental results demonstrate that the developed algorithm outperforms state-of-the-art model-free policy gradient algorithms and a model-based value-based algorithm with a monolithic world model in tasks where the individual dynamics of the objects are similar.
- [CVPR] SegmATRon: Embodied Adaptive Semantic Segmentation for Indoor Environment. Tatiana Zemskova, Margarita Kichik, Dmitry Yudin, and 1 more author. In CVPR Workshop on Embodied AI, 2023
This paper presents an adaptive transformer model named SegmATRon for embodied image semantic segmentation. Its distinctive feature is the adaptation of model weights during inference on several images using a hybrid multicomponent loss function. We studied this model on datasets collected in the photorealistic Habitat Simulator. We showed that obtaining additional images using the agent’s actions in an indoor environment can improve the quality of semantic segmentation.
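The correction factor in the 2023 TransPath entry is the ratio between an instance-independent cost-to-go estimate and the perfect one. The sketch below computes that training target offline and shows how dividing the instance-independent estimate by the factor recovers the true cost-to-go. It is a minimal illustration under stated assumptions: a 4-connected grid with unit move costs, Manhattan distance as the instance-independent heuristic, and a perfect factor in place of the transformer prediction used in TransPath. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def bfs_cost_to_go(grid: List[List[int]], goal: Cell) -> Dict[Cell, int]:
    """Perfect cost-to-go for every reachable free cell (0 = free, 1 = obstacle)."""
    rows, cols = len(grid), len(grid[0])
    dist: Dict[Cell, int] = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in dist):
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist


def correction_factors(grid: List[List[int]], goal: Cell) -> Dict[Cell, float]:
    """Training target: instance-independent estimate / perfect cost-to-go."""
    cf: Dict[Cell, float] = {}
    for cell, d in bfs_cost_to_go(grid, goal).items():
        # At the goal both quantities are 0; use a neutral factor of 1.
        cf[cell] = 1.0 if d == 0 else manhattan(cell, goal) / d
    return cf


def corrected_heuristic(cell: Cell, goal: Cell, cf: Dict[Cell, float]) -> float:
    """Instance-dependent heuristic obtained from a (predicted) correction factor."""
    return manhattan(cell, goal) / cf[cell]
```

On a 5×5 grid whose column 2 is blocked in rows 0 to 3, the cell (0, 0) is 4 moves from the goal (0, 4) by Manhattan distance but 12 moves in reality, so its correction factor is 4/12. Since Manhattan distance never overestimates on such grids, every factor lies in (0, 1], which gives a bounded regression target.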
2022
- [CSR] Vector Semiotic Model for Visual Question Answering. Alexey K. Kovalev, Makhmud Shaban, Evgeny Osipov, and 1 more author. Cognitive Systems Research, 2022
In this paper, we propose a Vector Semiotic Model as a possible solution to the symbol grounding problem in the context of Visual Question Answering. The Vector Semiotic Model combines the advantages of a Semiotic Approach implemented in the Sign-Based World Model and Vector Symbolic Architectures. The Sign-Based World Model represents information about a scene depicted in an input image in a structured way and grounds abstract objects in an agent’s sensory input. We use the Vector Symbolic Architecture to represent the elements of the Sign-Based World Model at a computational level. The properties of a high-dimensional space and the operations defined for high-dimensional vectors allow the whole scene to be encoded into a single high-dimensional vector while preserving its structure. This makes it possible to apply explainable reasoning to answer an input question. We conducted experiments on the CLEVR dataset and show results comparable to the state of the art. The proposed combination of approaches, first, leads to a possible solution of the symbol grounding problem and, second, allows current results to be extended to other intelligent tasks (collaborative robotics, embodied intellectual assistance, etc.).
- [Doklady] Application of Pretrained Large Language Models in Embodied Artificial Intelligence. A. K. Kovalev and Aleksandr I. Panov. Doklady Mathematics, 2022
A feature of tasks in embodied artificial intelligence is that a query to an intelligent agent is formulated in natural language. As a result, natural language processing methods have to be used to transform the query into a format convenient for generating an appropriate action plan. There are two basic approaches to this problem. One is based on specialized models trained on particular instances of instructions translated into an agent-executable format. The other relies on the ability of large language models trained on large amounts of unlabeled data to store common-sense knowledge. As a result, such models can be used to generate an agent’s action plan in natural language without preliminary training. This paper provides a detailed review of models based on the second approach as applied to embodied artificial intelligence tasks.
- [BrainInf] Hierarchical intrinsically motivated agent planning behavior with dreaming in grid environments. Evgenii Dzhivelikian, Artem Latyshev, Petr Kuderov, and 1 more author. Brain Informatics, 2022
Biologically plausible models of learning may provide crucial insight for building autonomous intelligent agents capable of performing a wide range of tasks. In this work, we propose a hierarchical model of an agent operating in an unfamiliar environment and driven by a reinforcement signal. We use temporal memory to learn sparse distributed representations of state–action pairs and a basal ganglia model to learn an effective action policy at different levels of abstraction. The learned model of the environment is used both to generate an intrinsic motivation signal, which drives the agent in the absence of an extrinsic signal, and to act in imagination, which we call dreaming. We demonstrate that the proposed architecture enables an agent to effectively reach goals in grid environments.
- [Doklady] Planning and Learning in Multi-Agent Path Finding. K. S. Yakovlev, A. A. Andreychuk, A. A. Skrynnik, and 1 more author. Doklady Mathematics, 2022
On the one hand, multi-agent path finding arises in numerous applied areas; a classical example is automated warehouses with a large number of mobile goods-sorting robots operating simultaneously. On the other hand, there are no universal methods for this problem that simultaneously satisfy numerous (often contradictory) requirements, such as a guarantee of finding optimal solutions, high-speed operation, and the ability to operate in partially observable environments. This paper provides a survey of modern methods for multi-agent path finding, with special attention to various settings of the problem. The differences between trainable and non-trainable solution methods, as well as their applicability, are discussed. Experimental programming environments necessary for implementing trainable approaches are analyzed separately.
- [PeerJ] Pathfinding in stochastic environments: learning vs planning. Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, and 1 more author. PeerJ Computer Science, 2022
Among the main challenges associated with navigating a mobile robot in complex environments are partial observability and stochasticity. This work proposes a stochastic formulation of the pathfinding problem, assuming that obstacles of arbitrary shapes may appear and disappear at random moments of time. Moreover, we consider the case when the environment is only partially observable for the agent. We study and evaluate two orthogonal approaches to reaching the goal under such conditions: planning and learning. Within planning, the agent constantly re-plans and updates the path based on the history of observations using a search-based planner. Within learning, the agent asynchronously learns to optimize a policy function using recurrent neural networks (we propose an original, efficient, and scalable approach). We carry out an extensive empirical evaluation of both approaches, which shows that the learning-based approach scales better to an increasing number of unpredictably appearing/disappearing obstacles. At the same time, the planning-based approach is preferable when the environment is close to deterministic (i.e., external disturbances are rare). Code is available at https://github.com/Tviskaron/pathfinding-in-stochastic-envs
- Hierarchical Landmark Policy Optimization for Visual Indoor Navigation. Aleksei Staroverov and Aleksandr Panov. IEEE Access, 2022
In this paper, we study the problem of visual indoor navigation to an object defined by its semantic category. Recent works have shown significant achievements with both end-to-end reinforcement learning and modular systems; however, both approaches still need a big step forward to be robust and practically applicable. To address insufficient exploration of the scenes and make exploration more semantically meaningful, we extend the standard task formulation and give the agent easily accessible landmarks in the form of room locations and their types. The availability of landmarks allows the agent to build a hierarchical policy structure and achieve a success rate of 63% on validation scenes in the photorealistic Habitat simulator. In the hierarchy, the low level consists of separately trained RL skills, and the high level is a deterministic policy that decides which skill is needed at the moment. We also show that a trained policy can be transferred to a real robot: after brief additional training on the reconstructed real scene, the robot achieves up to 79% SPL when navigating to an arbitrary object.
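The planning side of the comparison in “Pathfinding in stochastic environments: learning vs planning” (listed above), in which the agent re-plans from the history of its observations with a search-based planner, can be illustrated with a minimal Python sketch. It assumes a 4-connected grid, A* as the search-based planner, a square observation window of hypothetical size `radius`, and a hypothetical `obstacles_at(t)` callback standing in for the environment. It is an illustration under these assumptions, not the authors' implementation (the linked repository contains that).

```python
import heapq
from typing import Callable, Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def astar(start: Cell, goal: Cell, width: int, height: int,
          blocked: Set[Cell]) -> Optional[List[Cell]]:
    """Shortest 4-connected path from start to goal avoiding `blocked`, or None."""
    def h(c: Cell) -> int:
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries: (f = g + h, g, cell)
    g: Dict[Cell, int] = {start: 0}
    parent: Dict[Cell, Cell] = {}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        if cost > g[cell]:
            continue  # stale queue entry, a cheaper route was found later
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in blocked:
                continue
            if cost + 1 < g.get(nxt, float("inf")):
                g[nxt] = cost + 1
                parent[nxt] = cell
                heapq.heappush(frontier, (cost + 1 + h(nxt), cost + 1, nxt))
    return None


def replanning_agent(start: Cell, goal: Cell, width: int, height: int,
                     obstacles_at: Callable[[int], Set[Cell]],
                     radius: int = 2, max_steps: int = 200) -> List[Cell]:
    """Re-plan with A* at every step, treating never-seen cells as free."""
    pos = start
    known: Set[Cell] = set()  # obstacle cells remembered from past observations
    trajectory = [pos]
    for t in range(max_steps):
        if pos == goal:
            break
        visible = {(x, y)
                   for x in range(pos[0] - radius, pos[0] + radius + 1)
                   for y in range(pos[1] - radius, pos[1] + radius + 1)
                   if 0 <= x < width and 0 <= y < height}
        # Overwrite memory inside the window; cells outside keep their last observed state.
        known = (known - visible) | (visible & obstacles_at(t))
        path = astar(pos, goal, width, height, known)
        if path is not None and len(path) > 1:
            pos = path[1]  # take one step along the current plan
        trajectory.append(pos)  # if no path is known, the agent waits in place
    return trajectory


# Usage: a wall at x = 2 with a single gap at y = 4, discovered only when in view.
wall = {(2, y) for y in range(4)}
path_taken = replanning_agent((0, 0), (4, 0), 5, 5, lambda t: wall)
```

Overwriting memory only inside the observation window is one simple design choice: a cell that was remembered as blocked becomes free again once the agent re-observes it, which matters when obstacles can disappear.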
2021
- [LNCS] Long-Term Exploration in Persistent MDPs. Leonid Ugadiarov, Alexey Skrynnik, and Aleksandr I. Panov. In Advances in Soft Computing. MICAI 2021. Part I. Lecture Notes in Computer Science, 2021
Exploration is an essential part of reinforcement learning and limits the quality of the learned policy. Hard-exploration environments are characterized by huge state spaces and sparse rewards. Under such conditions, exhaustive exploration of the environment is often impossible, and successfully training an agent requires a large number of interaction steps. In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process, in which agents can roll back to visited states during training. We test our algorithm in the hard-exploration game Prince of Persia without rewards or domain knowledge. On all game levels used, our agent outperforms or performs comparably to the state-of-the-art curiosity methods with knowledge-based intrinsic motivation, ICM and RND. An implementation of RbExplore can be found at https://github.com/cds-mipt/RbExplore.
- [LNCS] Flexible Data Augmentation in Off-Policy Reinforcement Learning. Alexandra Rak, Alexey Skrynnik, and Aleksandr I. Panov. In Artificial Intelligence and Soft Computing. ICAISC 2021. Lecture Notes in Computer Science, 2021
This paper explores the application of image augmentation, a popular regularization technique in computer vision, to reinforcement learning tasks. The analysis is based on model-free off-policy algorithms. As regularization, we consider augmenting the frames sampled from the replay buffer of the model. The evaluated augmentation techniques include random changes in image contrast, random shifting, random cutting, and others. The experiments use the Atari game environments Breakout, Space Invaders, Berzerk, Wizard of Wor, and Demon Attack. The results confirm that augmentations significantly accelerate the convergence of the algorithm. We also propose an adaptive mechanism that selects the type of augmentation depending on the type of task the agent is performing.
- [KBS] Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations. Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, and 3 more authors. Knowledge-Based Systems, 2021
Deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. These results are commonly achieved at the expense of huge computational costs and require an enormous number of episodes of interaction between the agent and the environment. Hierarchical methods and expert demonstrations are among the most promising approaches to improving the sample efficiency of reinforcement learning methods. In this paper, we propose a combination of methods that allows the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our Forgetful Experience Replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when adapting the action space and state representation to the agent’s capabilities. The proposed goal-oriented replay buffer structure allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method is highly versatile and can be integrated into various off-policy methods. ForgER surpasses existing state-of-the-art RL methods that use expert demonstrations in complex environments. The solution based on our algorithm beats other solutions in the well-known MineRL competition and allows the agent to demonstrate expert-level behavior.
- [CSR] Hierarchical Deep Q-Network from imperfect demonstrations in Minecraft. Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, and 3 more authors. Cognitive Systems Research, 2021
We present the hierarchical Deep Q-Network (HDQfD), which took first place in the MineRL competition. HDQfD works with imperfect demonstrations and utilizes the hierarchical structure of expert trajectories, extracting an effective sequence of meta-actions and subgoals. We introduce a structured, task-dependent replay buffer and an adaptive prioritizing technique that allow the HDQfD agent to gradually erase poor-quality expert data from the buffer. In this paper, we present the details of the HDQfD algorithm and give experimental results in the Minecraft domain.
- [LNCS] Applying Vector Symbolic Architecture and Semiotic Approach to Visual Dialog. Alexey K. Kovalev, Makhmud Shaban, Anfisa A. Chuganskaya, and 1 more author. In Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, 2021
Multi-modal tasks have started to play a significant role in Artificial Intelligence research. A particular example from this domain is visual-linguistic tasks, such as Visual Question Answering and its extension, Visual Dialog. In this paper, we concentrate on the Visual Dialog task and dataset. The task involves two agents: the first agent does not see the image and asks questions about its content, while the second agent sees the image and answers the questions. The symbol grounding problem, that is, how symbols obtain their meanings, plays a crucial role in such tasks. We approach this problem from the semiotic point of view and propose the Vector Semiotic Architecture for Visual Dialog. The Vector Semiotic Architecture is a combination of the Sign-Based World Model and Vector Symbolic Architecture. The Sign-Based World Model represents agent knowledge at a high level of abstraction and allows a uniform representation of different aspects of knowledge, forming a hierarchical representation of that knowledge in the form of a special kind of semantic network. The Vector Symbolic Architecture represents the computational level and allows symbols to be manipulated as numerical vectors using simple element-wise operations. This combination enables grounding object representations at any level of abstraction in the agent’s sensory input.
- Hybrid Policy Learning for Multi-Agent Pathfinding. Alexey Skrynnik, Alexandra Yakovleva, Vasilii Davydov, and 2 more authors. IEEE Access, 2021
In this work, we study the behavior of groups of autonomous vehicles that are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems is the case when the observability of each vehicle is limited and global/local communication is unstable, e.g., in crowded parking lots. In such scenarios, the vehicles have to rely on local observations and exhibit cooperative behavior to ensure safe and efficient trips. This type of problem can be abstracted to so-called multi-agent pathfinding, in which a group of agents confined to a graph has to find collision-free paths to their goals (ideally, minimizing an objective function, e.g., travel time). Widely used algorithms for solving this problem rely on the assumption that a central controller exists for which the full state of the environment (i.e., the agents’ current positions, their targets, the configuration of the static obstacles, etc.) is known, and they cannot be straightforwardly adapted to partially observable setups. To this end, we suggest a novel approach based on decomposing the problem into two sub-tasks: reaching the goal and avoiding collisions. To accomplish each of these tasks, we utilize reinforcement learning methods such as Deep Monte Carlo Tree Search, Q-mixing networks, and policy gradient methods to design policies that map the agents’ observations to actions. Next, we introduce a policy-mixing mechanism to obtain a single hybrid policy that allows each agent to exhibit both types of behavior: the individual one (reaching the goal) and the cooperative one (avoiding collisions with other agents). We conduct an extensive empirical evaluation showing that the suggested hybrid policy outperforms standalone state-of-the-art reinforcement learning methods for this kind of problem by a notable margin.
- [LNCS] Adaptive Maneuver Planning for Autonomous Vehicles Using Behavior Tree on Apollo Platform. Mais Jamal and Aleksandr Panov. In Artificial Intelligence XXXVIII. SGAI 2021. Lecture Notes in Computer Science, 2021
In safety-critical systems such as autonomous driving systems, behavior planning is a significant challenge. The presence of numerous dynamic obstacles makes the driving environment unpredictable, so the planning algorithm should be safe, reactive, and adaptable to environmental changes. The paper presents an adaptive maneuver planning algorithm based on an evolving behavior tree created with genetic programming. In addition, we make a technical contribution to the Baidu Apollo autonomous driving platform that allows the platform to be used for testing and developing overtaking maneuver planning algorithms.
- [LNCS] Q-Mixing Network for Multi-agent Pathfinding in Partially Observable Grid Environments. Vasilii Davydov, Alexey Skrynnik, Konstantin Yakovlev, and 1 more author. In Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science, 2021
In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches, as they typically rely on full knowledge of the environment. To this end, we suggest a reinforcement learning approach in which the agents first learn policies that map observations to actions and then follow these policies to reach their goals. To tackle the challenge of learning cooperative behavior (in many cases, agents need to yield to each other to accomplish a mission), we use a mixing Q-network that complements the learning of individual policies. In the experimental evaluation, we show that this approach leads to plausible results and scales well to a large number of agents.
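The Q-mixing abstract above relies on a network that combines individual agent Q-values into a joint value. A minimal pure-Python sketch of a QMIX-style monotonic mixer follows. The hidden size, the single-layer linear hypernetworks, and the random initialisation are simplifying assumptions for illustration, not the configuration used in the paper. Taking absolute values of the hypernetwork outputs keeps the mixing weights non-negative, so the joint value never decreases when any agent's Q-value increases; this is what allows each agent to act greedily on its own Q-values during decentralized execution.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def elu(x: float) -> float:
    # Monotonically increasing activation between the two mixing layers.
    return x if x > 0 else math.exp(x) - 1.0


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class MonotonicMixer:
    """Q_tot = w2 . elu(W1 q + b1) + b2, with W1, w2 >= 0 generated from the global state."""

    def __init__(self, n_agents: int, state_dim: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n_agents, self.hidden = n_agents, hidden
        # Hypothetical single-layer linear hypernetworks: state -> mixer parameters.
        self.hyper_w1 = rand(n_agents * hidden, state_dim)
        self.hyper_b1 = rand(hidden, state_dim)
        self.hyper_w2 = rand(hidden, state_dim)
        self.hyper_b2 = rand(1, state_dim)

    def __call__(self, agent_qs: Vector, state: Vector) -> float:
        # abs() enforces non-negative mixing weights, hence dQ_tot/dQ_i >= 0.
        flat_w1 = [abs(w) for w in matvec(self.hyper_w1, state)]
        w1 = [flat_w1[i * self.n_agents:(i + 1) * self.n_agents] for i in range(self.hidden)]
        b1 = matvec(self.hyper_b1, state)
        h = [elu(z + b) for z, b in zip(matvec(w1, agent_qs), b1)]
        w2 = [abs(w) for w in matvec(self.hyper_w2, state)]
        b2 = matvec(self.hyper_b2, state)[0]  # biases are unconstrained
        return sum(w * x for w, x in zip(w2, h)) + b2
```

Because of the monotonicity constraint, the combination of each agent's greedy action also maximizes the mixed value, so no joint search over actions is needed at execution time; the price is that the mixer cannot represent joint values in which one agent's gain must lower the team value.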
2020
- Real-Time Object Navigation with Deep Neural Networks and Hierarchical Reinforcement Learning. Aleksey Staroverov, Dmitry A. Yudin, Ilya Belkin, and 3 more authors. IEEE Access, 2020
In recent years, deep learning and reinforcement learning methods have significantly advanced mobile robotics in fields such as perception, navigation, and planning. However, there are still gaps in applying these methods to real robots, due to the low computational efficiency of recent neural network architectures and their poor adaptability to the realities of robotic experiments. In this paper, we consider an important task in mobile robotics: navigation to an object using an RGB-D camera. We develop a new neural network framework for robot control that is fast and resistant to possible noise in sensors and actuators. We propose an original integration of semantic segmentation, mapping, localization, and reinforcement learning methods to improve the effectiveness of exploring the environment, finding the desired object, and quickly navigating to it. We created a new HISNav dataset based on the Habitat virtual environment, which allowed us to pre-train the model in simulation experiments and then transfer it to a real robot. Our architecture is adapted to work in real time and fully implements modern trends in this area.
2019
- [LNCS] Mental Actions and Modelling of Reasoning in Semiotic Approach to AGI. Alexey K. Kovalev and Aleksandr I. Panov. In Artificial General Intelligence. AGI 2019. Lecture Notes in Computer Science, 2019
The article describes the functionality of the cognitive architecture Sign-Based World Model (SBWM) through an algorithm implementing a particular case of reasoning. The SBWM architecture is a multigraph, called a semiotic network, with special rules of activation spreading. The semiotic network contains four subgraphs that have specific properties and are composed of the constituents of the main SBWM element, the sign. These subgraphs are called causal networks on images, significances, personal meanings, and names. The semiotic network can be viewed as the memory of an intelligent agent. We propose dividing the agent’s memory in the SBWM architecture into long-term memory, consisting of prototype signs, and working memory, consisting of instance signs. The concept of elementary mental actions is introduced as an integral part of the reasoning process, and examples of such actions are provided. The performance of the proposed reasoning algorithm is illustrated on a model example.
- [LNCS] Hierarchical Psychologically Inspired Planning for Human-Robot Interaction Tasks. Gleb Kiselev and Aleksandr Panov. In Interactive Collaborative Robotics. ICR 2019. Lecture Notes in Computer Science, 2019
This paper presents HierMAP, a new algorithm for hierarchical case-based behavior planning in a coalition of agents. Unlike well-known planners such as HEART and PANDA, HierMAP is intended primarily for multi-agent tasks; for this purpose, we implemented the dynamic distribution of agent roles with different functionalities. The use of a psychologically plausible approach to knowledge representation by agents, based on a semiotic network, allows HierMAP to be applied in groups in which people participate as one of the actors. Thus, the algorithm can represent solutions of collaborative problems, forming human-interpretable results at each planning step. Another advantage of the proposed method is the ability to save and reuse planning experience, which extends it into the field of case-based planning. This extension makes it possible to take into account information about the success or failure of interaction with other members of the coalition. Representing precedents as a special part of the agent’s memory (a semantic network on meanings) significantly reduces the planning time for a similar class of tasks. The paper deals with smart relocation tasks in the environment, and a comparison is made with the main hierarchical planners widely used at present.
- [OMNN] Object Detection with Deep Neural Networks for Reinforcement Learning in the Task of Autonomous Vehicles Path Planning at the Intersection. D. A. Yudin, A. Skrynnik, A. Krishtopik, and 2 more authors. Optical Memory and Neural Networks, 2019
Among the many problems in behavior planning for an unmanned vehicle, the central one is movement in difficult areas, such as intersections, where direct interaction with other road agents takes place. In our work, we offer a new approach to training an intelligent agent that simulates the behavior of an unmanned vehicle, based on the integration of reinforcement learning and computer vision. Using full visual information about the road intersection obtained from aerial photographs, we study automatic detection of the relative positions of all road agents with various deep neural network architectures (YOLOv3, Faster R-CNN, RetinaNet, Cascade R-CNN, Mask R-CNN, Cascade Mask R-CNN). We also investigate estimating the vehicle orientation angle with a convolutional neural network. The obtained additional features are used in the modern, effective reinforcement learning methods Soft Actor-Critic and Rainbow, which accelerates the convergence of their learning process. To demonstrate the operation of the developed system, we built an intersection simulator in which a number of model experiments were carried out.
- [LNCS] Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes. Ali Younes and Aleksandr I. Panov. In RAAI Summer School 2019. Lecture Notes in Computer Science, 2019
Standard robotic control works well under ordinary conditions, but when the conditions change (e.g., one of the motors is damaged), the robot will no longer be able to achieve its task. We need an algorithm that gives the robot the ability to adapt to unforeseen situations. Reinforcement learning provides a framework that meets these requirements, but it needs large datasets to learn robotic tasks, which is impractical. We discuss using Gaussian processes to improve the efficiency of reinforcement learning: a Gaussian process (GP) learns a state transition model from data collected on the robot (the interaction phase), and the learned GP model is then used to simulate trajectories and optimize the robot’s controller (the simulation phase). The PILCO algorithm is considered the most data-efficient RL algorithm. It gives promising results in the cart-pole task, where a working controller was learned after seconds of interaction with the real robot, although the total training time, including training in simulation, was longer. In this work, we leverage the abilities of computational graphs to produce a ROS-friendly Python implementation of PILCO and discuss a case study of a real-world robotic task.
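The interaction/simulation split described in the abstract above rests on GP regression of the transition model. A minimal pure-Python sketch of GP posterior prediction with a squared-exponential kernel follows. It assumes fixed (not optimised) hyperparameters, a scalar target such as one component of the state change, and a hypothetical toy system. PILCO itself additionally propagates uncertain states through the GP by moment matching and optimises the controller with analytic gradients; this sketch covers only the model-learning step.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rbf(a: Sequence[float], b: Sequence[float], length_scale: float, signal_var: float) -> float:
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L: Matrix, b: List[float]) -> List[float]:
    """Solve L y = b."""
    y: List[float] = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def backward_sub(L: Matrix, y: List[float]) -> List[float]:
    """Solve L^T x = y."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPTransitionModel:
    """GP regression from (state, action) inputs to a scalar target, e.g. a state change."""

    def __init__(self, inputs, targets, length_scale=1.0, signal_var=1.0, noise_var=1e-4):
        self.X = [list(x) for x in inputs]
        self.ls, self.sv = length_scale, signal_var
        n = len(self.X)
        K = [[rbf(self.X[i], self.X[j], self.ls, self.sv) + (noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.L = cholesky(K)
        # alpha = (K + noise_var * I)^{-1} y via two triangular solves.
        self.alpha = backward_sub(self.L, forward_sub(self.L, list(targets)))

    def predict(self, x) -> Tuple[float, float]:
        """Posterior mean and variance of the latent function at x."""
        k = [rbf(x, xi, self.ls, self.sv) for xi in self.X]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = forward_sub(self.L, k)
        var = self.sv - sum(vi * vi for vi in v)  # k(x, x) equals signal_var for this kernel
        return mean, max(var, 0.0)


# Hypothetical toy system: the state change is 0.1 * action, sampled on a small grid.
data = [((x, u), 0.1 * u) for x in (-1.0, 0.0, 1.0) for u in (-1.0, 1.0)]
model = GPTransitionModel([d[0] for d in data], [d[1] for d in data])
```

The predictive variance is what makes this kind of model useful for data-efficient RL: it stays small near the collected interaction data and grows back to the prior variance far from it, so simulated rollouts can account for where the model is unreliable.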
2018
- [LNCS] Task and Spatial Planning by the Cognitive Agent with Human-like Knowledge Representation. Ermek Aitygulov, Gleb Kiselev, and Aleksandr I. Panov. In Interactive Collaborative Robotics. ICR 2018. Lecture Notes in Computer Science, 2018
The paper considers the task of simultaneously learning and planning actions for moving a cognitive agent in two-dimensional space. Planning is carried out by an agent that uses a human-like way of representing knowledge, which allows it to build transparent and understandable plans; this is especially important in human-machine interaction. Learning actions for manipulating objects is carried out through reinforcement learning and demonstrates the possibility of replenishing the agent’s procedural knowledge. The presented approach was demonstrated in an experiment in the Gazebo simulation environment.
2017
- [LNCS] Synthesis of the Behavior Plan for Group of Robots with Sign Based World Model. Gleb A. Kiselev and Aleksandr I. Panov. In Interactive Collaborative Robotics. ICR 2017. Lecture Notes in Computer Science, 2017
The paper considers the task of synthesizing a collective plan for a group of intelligent agents. The agents are robotic systems that possess a manipulator and act on objects in a deterministic external environment. The MultiMAP planning algorithm proposed in the article is hierarchical and iterative and is based on the original sign-based representation of knowledge about objects and processes and of the agents’ knowledge about themselves and about other members of the group. To distribute actions between agents in the general plan, the signs “I” and “Other” (“They”) are used. Finally, the results of experiments in the model problem “Blocksworld” for a group of several agents are presented.
2016
- [CSR] Multilayer cognitive architecture for UAV control. Stanislav Emel’yanov, Dmitry Makarov, Aleksandr I. Panov, and 1 more author. Cognitive Systems Research, 2016
The extensive use of unmanned aerial vehicles (UAVs) in recent years has induced rapid growth of research areas related to UAV production. Among these, the design of control systems capable of automating a wide range of UAV activities is one of the most actively explored and evolving. Currently, researchers and developers are interested in designing control systems that can be referred to as intelligent, i.e., systems suited to solving tasks such as planning, goal prioritization, and coalition formation, and thus guaranteeing high levels of UAV autonomy. One of the principal problems in intelligent control system design is tying together the methods and models traditionally used in robotics for tasks such as dynamics modelling, control signal generation, localization and mapping, and path planning with the methods of behaviour modelling and planning that are thoroughly studied in cognitive science. Our work is aimed at solving this problem. We propose a layered architecture, STRL (strategic, tactical, reactive, layered), for a control system that automates behaviour generation using a cognitive approach while taking into account the complex dynamics and kinematics of the control object (UAV). We use a special type of knowledge representation, the sign world model, which is based on the psychological activity theory, to describe individual behaviour planning and coalition formation processes. We also propose a path planning methodology that serves as the mediator between high-level cognitive activities and the generation of reactive control signals. To generate these signals, we use a state-dependent Riccati equation and a specific method for solving it. We believe that the proposed architecture will broaden the spectrum of tasks that a UAV coalition can solve automatically, as well as raise the autonomy level of each individual member of that coalition.
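The state-dependent Riccati equation (SDRE) approach mentioned in the abstract above can be illustrated in the scalar case. The sketch below assumes a hypothetical one-dimensional plant dx/dt = x^3 + u, written in the state-dependent form dx/dt = a(x) x + b u with a(x) = x^2 and b = 1, and solves the scalar algebraic Riccati equation 2 a p - (b^2 / r) p^2 + q = 0 in closed form at every step. Its positive root is p = r (a + sqrt(a^2 + b^2 q / r)) / b^2, and the feedback u = -(b p / r) x gives the closed-loop coefficient a - b^2 p / r = -sqrt(a^2 + b^2 q / r) < 0. The paper's UAV dynamics and its specific solution method are not reproduced here.

```python
import math
from typing import Callable, List


def sdre_gain(a: float, b: float, q: float, r: float) -> float:
    """Gain k with u = -k x from the positive root of 2 a p - (b^2 / r) p^2 + q = 0."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


def simulate(x0: float, a_of_x: Callable[[float], float], b: float = 1.0,
             q: float = 1.0, r: float = 1.0, dt: float = 0.01, steps: int = 500) -> List[float]:
    """Forward-Euler rollout of dx/dt = a(x) x + b u with the SDRE gain recomputed every step."""
    x = x0
    xs = [x]
    for _ in range(steps):
        a = a_of_x(x)  # freeze the state-dependent coefficient at the current state
        u = -sdre_gain(a, b, q, r) * x
        x = x + dt * (a * x + b * u)
        xs.append(x)
    return xs


# Hypothetical plant dx/dt = x^3 + u, factored as a(x) = x^2 with b = 1.
trajectory = simulate(1.5, lambda x: x * x)
```

Re-solving the Riccati equation at the current state is what distinguishes SDRE from a fixed LQR gain: here the gain grows with x^2 and cancels the destabilising cubic term, whereas a gain designed for the linearisation at x = 0 (where a = 0) would be too weak for large initial states.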
2014
- [JCSC] Behavior control as a function of consciousness. I. World model and goal setting. G. S. Osipov, A. I. Panov, and N. V. Chudova. Journal of Computer and Systems Sciences International, 2014
We consider functions that psychology refers to as functions of consciousness. These functions include reflection, awareness of activity motivation, goal setting, synthesis of goal-oriented behavior, and some others. The description is based on the concept of sign, which is widely used in psychology and, in particular, in Vygotsky’s cultural-historical theory, where the sign is interpreted informally. In this paper, we elaborate upon the concept of sign and consider mechanisms of sign formation and of self-organization on the set of signs. The operation of these self-organization mechanisms gives rise to a new method for representing an actor’s world model. We introduce the concept of a semiotic network, which is used to examine the actor’s world models, and construct models of some of the functions listed above. The second part of the paper is devoted to functions of self-consciousness and to the application of the constructed models to designing plans and constructing new architectures of intelligent agents that are able, in particular, to distribute roles in coalitions.