Aleksandr I. Panov

Александр Панов
Moscow, Russia

I currently lead the Neurosymbolic Integration group at AIRI (Artificial Intelligence Research Institute), work on embodied artificial intelligence at the Center for Cognitive Modeling of the Moscow Institute of Physics and Technology (MIPT), and work on deep reinforcement learning at the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences (FRC CSC RAS). My current research interests include transformer-based and structured world models in reinforcement learning, the use of language models for behavior planning (including for robotic platforms), multi-agent planning and learning, and indoor visual navigation. At AIRI I focus primarily on fundamental research into new neurosymbolic architectures for planning and learning. At MIPT I do applied work and lead contract research projects in robotics and control systems. Since 2019 I have been the scientific lead of applied research on computer vision and planning systems for mobile robotics and autonomous vehicles, commissioned by companies such as NKB VS, Integrant, Huawei, and Sber. Also at MIPT, I head the master's program in artificial intelligence, which is one of the most in-demand academic programs at the Phystech School of Applied Mathematics and Informatics. At FRC CSC RAS I head the department "Intelligent Dynamic Systems and Cognitive Research" and develop various approaches in reinforcement learning.

Short biography: In 2005 I graduated from the Physics and Mathematics School at Novosibirsk State University (NSU) and then studied in the bachelor's program of the NSU Faculty of Physics, specializing in the automation of physical and technical research. I completed my master's degree in applied mathematics and physics at MIPT, Department of Intelligent Systems (a base department of the Computing Center of the Russian Academy of Sciences). I wrote my Candidate of Sciences dissertation at the Institute for Systems Analysis (ISA RAS) under the supervision of G. S. Osipov (topic: "Investigation of methods and development of models and algorithms for forming elements of the sign-based world model of a subject of activity"). Since 2011 I have worked at ISA RAS, which later became part of FRC CSC RAS. Since 2019 I have headed the Research and Education Center for Cognitive Modeling at MIPT and the master's program "Methods and Technologies of Artificial Intelligence" at the Phystech School of Applied Mathematics and Informatics. In 2021 I joined the AIRI team.

Awards: In 2017 I received the Russian Academy of Sciences Medal for Young Scientists. In 2019 I led the CDS team, which won first place in the NeurIPS MineRL competition. In 2023 I led the SkillFusion team, which took first place in the CVPR Habitat competition. I have been the principal investigator of successfully completed grants from the Russian Foundation for Basic Research (RFBR) and the Russian Science Foundation (RSF).

Academic service: Since 2019 I have been an editor of the journal Cognitive Systems Research (Elsevier). Since 2015 I have been a member of the Russian Association for Artificial Intelligence (RAAI), and until 2022 I served on its Scientific Council. Since 2017 I have run the annual RAAI and AIRI summer schools. In 2021 and 2022 I was an organizer of the NeurIPS IGLU competition. I serve on the program committees of AAAI 2023, 2024, ECAI 2023, 2024, IROS 2023, 2024, ICRA 2024, CVPR 2024, IJCAI 2024, and in ACL Rolling Review cycles.

news

Apr 15, 2024	Our paper “Interactive Semantic Map Representation for Skill-Based Visual Object Navigation” has been published in IEEE Access.
Apr 10, 2024	Our paper “Sign-Based Image Criteria for Social Interaction Visual Question Answering” has been published in Logic Journal of the IGPL.
Jan 31, 2024	Our paper “Neural Potential Field for Obstacle-Aware Local Motion Planning” has been accepted at ICRA 2024.
Jan 20, 2024	Our paper “Object-Centric Learning with Slot Mixture Module” has been accepted at ICLR 2024.
Dec 20, 2023	Our papers “Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning” and “Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding” have been accepted at AAAI 2024.

selected publications

2024

ICRA
Neural Potential Field for Obstacle-Aware Local Motion Planning

Muhammad Alhaddad, Konstantin Mironov, Aleksey Staroverov, and 1 more author

In 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024

Model predictive control (MPC) may provide local motion planning for mobile robotic platforms. The challenging aspect is the analytic representation of collision cost for the case when both the obstacle map and robot footprint are arbitrary. We propose a Neural Potential Field: a neural network model that returns a differentiable collision cost based on robot pose, obstacle map, and robot footprint. The differentiability of our model allows its usage within the MPC solver. It is computationally hard to solve problems with a very high number of parameters. Therefore, our architecture includes neural image encoders, which transform obstacle maps and robot footprints into embeddings, which reduce problem dimensionality by two orders of magnitude. The reference data for network training are generated based on algorithmic calculation of a signed distance function. Comparative experiments showed that the proposed approach is comparable with existing local planners: it provides trajectories with superior smoothness, comparable path length, and safe distance from obstacles. An experiment on a Husky UGV mobile robot showed that our approach allows real-time and safe local planning. The code for our approach is presented at https://github.com/cog-isa/NPField together with a demo video.
@inproceedings{Alhaddad2024, dimensions = {true}, title = {Neural Potential Field for Obstacle-Aware Local Motion Planning}, booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)}, author = {Alhaddad, Muhammad and Mironov, Konstantin and Staroverov, Aleksey and Panov, Aleksandr}, year = {2024}, keywords = {myconf, frccsc, mipt, airi, confastar} }
AAAI
Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding

Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, and 1 more author

In Proceedings of the AAAI Conference on Artificial Intelligence, 2024

The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph. In typical MAPF scenarios, the graph and the agents’ starting and ending vertices are known beforehand, allowing the use of centralized planning algorithms. However, in this study, we focus on the decentralized MAPF setting, where the agents may observe the other agents only locally and are restricted in communications with each other. Specifically, we investigate the lifelong variant of MAPF, where new goals are continually assigned to the agents upon completion of previous ones. Drawing inspiration from the successful AlphaZero approach, we propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks. Our approach utilizes the agent’s observations to recreate the intrinsic Markov decision process, which is then used for planning with a version of neural MCTS tailored for multi-agent tasks. The experimental results show that our approach outperforms state-of-the-art learnable MAPF solvers. The source code is available at https://github.com/AIRI-Institute/mats-lp.
@inproceedings{Skrynnik2024b, dimensions = {true}, title = {Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding}, volume = {38}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/29703}, doi = {10.1609/aaai.v38i16.29703}, pages = {17531--17540}, booktitle = {Proceedings of the {AAAI} Conference on Artificial Intelligence}, author = {Skrynnik, Alexey and Andreychuk, Anton and Yakovlev, Konstantin and Panov, Aleksandr}, year = {2024}, keywords = {myconf, scopus, frccsc, airi, confastar} }
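To make the notion of "conflict-free paths" in the abstract above concrete, here is a minimal sketch of a conflict checker. It is not taken from the paper or its repository; the function names, the grid-cell path representation, and the convention that an agent waits at its last vertex after finishing are all assumptions made for illustration. It reports the two standard kinds of conflicts: two agents occupying the same vertex at the same time step, and two agents swapping vertices along the same edge.

```python
from itertools import combinations


def position_at(path, t):
    """Return the agent's vertex at step t; the agent waits at its last vertex (assumption)."""
    return path[min(t, len(path) - 1)]


def find_conflicts(paths):
    """List ("vertex" | "swap", agent_i, agent_j, time_step) tuples for a set of paths."""
    conflicts = []
    horizon = max(len(p) for p in paths)
    for (i, p), (j, q) in combinations(enumerate(paths), 2):
        for t in range(horizon):
            if position_at(p, t) == position_at(q, t):
                # Both agents stand on the same vertex at the same time.
                conflicts.append(("vertex", i, j, t))
            elif (
                t + 1 < horizon
                and position_at(p, t) == position_at(q, t + 1)
                and position_at(p, t + 1) == position_at(q, t)
            ):
                # The agents exchange positions by traversing the same edge.
                conflicts.append(("swap", i, j, t))
    return conflicts


# Hypothetical example paths on a grid, given as (row, col) cells per time step.
PARALLEL = [[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)]]
SWAPPING = [[(0, 0), (0, 1)], [(0, 1), (0, 0)]]
MEETING = [[(0, 0), (0, 1)], [(1, 1), (0, 1)]]

print(find_conflicts(PARALLEL))
print(find_conflicts(SWAPPING))
print(find_conflicts(MEETING))
```

A set of paths is conflict-free in this sense exactly when `find_conflicts` returns an empty list.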
AAAI
Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning

Alexey Skrynnik, Anton Andreychuk, Maria Nesterova, and 2 more authors

In Proceedings of the AAAI Conference on Artificial Intelligence, 2024

The Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, when the central controller that possesses all the information on the agents’ locations and goals is absent and the agents have to sequentially decide the actions on their own without having access to the full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival to the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is utilized to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover the collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward-shaping or external guidance. We evaluate our method on a wide range of setups comparing it to the state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, showing higher throughput and better ability to generalize to the maps that were unseen at the training stage. Moreover, our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver. The code is available at https://github.com/AIRI-Institute/learn-to-follow.
@inproceedings{Skrynnik2024, dimensions = {true}, title = {Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning}, volume = {38}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/29704}, pages = {17541--17549}, booktitle = {Proceedings of the {AAAI} Conference on Artificial Intelligence}, author = {Skrynnik, Alexey and Andreychuk, Anton and Nesterova, Maria and Yakovlev, Konstantin and Panov, Aleksandr}, year = {2024}, keywords = {myconf, scopus, mipt, airi, 70-2021-00138, confastar} }
ICLR
Object-Centric Learning with Slot Mixture Module

Daniil Kirilenko, Vitaliy Vorobyov, Alexey Kovalev, and 1 more author

In The Twelfth International Conference on Learning Representations, 2024

Object-centric architectures usually apply a differentiable module to the entire feature map to decompose it into sets of entity representations called slots. Some of these methods structurally resemble clustering algorithms, where the cluster’s center in latent space serves as a slot representation. Slot Attention is an example of such a method, acting as a learnable analog of the soft k-means algorithm. Our work employs a learnable clustering method based on the Gaussian Mixture Model. Unlike other approaches, we represent slots not only as centers of clusters but also incorporate information about the distance between clusters and assigned vectors, leading to more expressive slot representations. Our experiments demonstrate that using this approach instead of Slot Attention improves performance in object-centric scenarios, achieving state-of-the-art results in the set property prediction task.
@inproceedings{Kirilenko2024, dimensions = {true}, title = {Object-Centric Learning with Slot Mixture Module}, url = {https://openreview.net/forum?id=aBUidW4Nkd}, booktitle = {The Twelfth International Conference on Learning Representations}, author = {Kirilenko, Daniil and Vorobyov, Vitaliy and Kovalev, Alexey and Panov, Aleksandr}, year = {2024}, keywords = {myconf, frccsc, mipt, 20-71-10116, airi, confastar} }

ICLR

Gradual Optimization Learning for Conformational Energy Minimization

Artem Tsypin, Leonid Ugadiarov, Kuzma Khrabrov, and 7 more authors

In The Twelfth International Conference on Learning Representations, 2024

@inproceedings{Tsypin2024,
  dimensions = {true},
  title = {Gradual Optimization Learning for Conformational Energy Minimization},
  url = {https://openreview.net/forum?id=FMMF1a9ifL},
  booktitle = {The Twelfth International Conference on Learning Representations},
  author = {Tsypin, Artem and Ugadiarov, Leonid and Khrabrov, Kuzma and Telepov, Alexander and Rumiantsev, Egor and Skrynnik, Alexey and Panov, Aleksandr and Vetrov, Dmitry and Tutubalina, Elena and Kadurin, Artur},
  year = {2024},
  keywords = {myconf, frccsc, airi, confastar}
}

EAAI
Hierarchical waste detection with weakly supervised segmentation in images from recycling plants

Dmitry Yudin, Nikita Zakharenko, Artem Smetanin, and 7 more authors

Engineering Applications of Artificial Intelligence, 2024

Reducing environmental pollution with household waste and emissions from the computing clusters is an urgent technological problem. In our work, we explore both of these aspects: the deep learning application to improve the efficiency of waste recognition on a recycling plant’s conveyor, as well as carbon dioxide emission from the computing devices used in this process. To conduct research, we developed a unique open WaRP dataset that demonstrates the best diversity among similar industrial datasets and contains more than 10,000 images with 28 different types of recyclable goods (bottles, glasses, cardboard, cans, detergents, and canisters). Objects can overlap, be in poor lighting conditions, or be significantly distorted. On the WaRP dataset, we study training and evaluation of cutting-edge deep neural networks for detection, classification and segmentation tasks. Additionally, we developed a hierarchical neural network approach called H-YC with weakly supervised waste segmentation. It provided a notable increase in the detection quality and made it possible to segment images while learning from class labels only, without their masks. Both the suggested hierarchical approach and the WaRP dataset have shown great industrial application potential.
@article{Yudin2024, dimensions = {true}, title = {Hierarchical waste detection with weakly supervised segmentation in images from recycling plants}, volume = {128}, issn = {0952-1976}, url = {https://linkinghub.elsevier.com/retrieve/pii/S0952197623017268}, doi = {10.1016/j.engappai.2023.107542}, pages = {107542}, journal = {Engineering Applications of Artificial Intelligence}, author = {Yudin, Dmitry and Zakharenko, Nikita and Smetanin, Artem and Filonov, Roman and Kichik, Margarita and Kuznetsov, Vladislav and Larichev, Dmitry and Gudov, Evgeny and Budennyy, Semen and Panov, Aleksandr}, urldate = {2023-11-20}, year = {2024}, keywords = {mypub, scopus, mipt, airi, q1scopusprelim} }
Access
Interactive Semantic Map Representation for Skill-Based Visual Object Navigation

Tatiana Zemskova, Aleksei Staroverov, Kirill Muravyev, and 2 more authors

IEEE Access, 2024

Visual object navigation is one of the key tasks in mobile robotics. One of the most important components of this task is the accurate semantic representation of the scene, which is needed to determine and reach a goal object. This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model with backpropagation of the predicted fusion loss values during inference on a regular (backward) or delayed (forward) image sequence. We implement this representation into a full-fledged navigation approach called SkillTron. The method can select robot skills from end-to-end policies based on reinforcement learning and classic map-based planning methods. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation. We conduct intensive experiments with the proposed approach in the Habitat environment, demonstrating its significant superiority over state-of-the-art approaches in terms of navigation quality metrics. The developed code and custom datasets are publicly available at github.com/AIRI-Institute/skill-fusion.
@article{Zemskova2024, dimensions = {true}, title = {Interactive Semantic Map Representation for Skill-Based Visual Object Navigation}, volume = {12}, issn = {2169-3536}, url = {https://ieeexplore.ieee.org/document/10477345}, doi = {10.1109/ACCESS.2024.3380450}, pages = {44628--44639}, journal = {{IEEE} Access}, author = {Zemskova, Tatiana and Staroverov, Aleksei and Muravyev, Kirill and Yudin, Dmitry and Panov, Aleksandr}, year = {2024}, keywords = {mypub, scopus, frccsc, mipt, 20-71-10116, airi, q1scopusprelim} }

2023

Access
Fine-tuning Multimodal Transformer Models for Generating Actions in Virtual and Real Environments

Aleksei Staroverov, Andrey S Gorodetsky, Andrei S Krishtopik, and 3 more authors

IEEE Access, 2023

In this work, we propose and investigate an original approach to using a pre-trained multimodal transformer of a specialized architecture for controlling a robotic agent in an object manipulation task based on language instruction, which we refer to as RozumFormer. Our model is based on a bimodal (text-image) transformer architecture originally trained for solving tasks that use one or both modalities, such as language modeling, visual question answering, image captioning, text recognition, text-to-image generation, etc. The discussed model was adapted for robotic manipulation tasks by organizing the input sequence of tokens in a particular way, consisting of tokens for text, images, and actions. We demonstrated that such a model adapts well to new tasks and shows better results with fine-tuning than complete training in simulation and real environments. To transfer the model from the simulator to a real robot, new datasets were collected and annotated. In addition, experiments controlling the agent in a visual environment using reinforcement learning have shown that fine-tuning the model with a mixed dataset that includes examples from the initial visual-linguistic tasks only slightly decreases performance on these tasks, simplifying the addition of tasks from another domain.
@article{Staroverov2023, dimensions = {true}, title = {Fine-tuning Multimodal Transformer Models for Generating Actions in Virtual and Real Environments}, volume = {11}, url = {https://ieeexplore.ieee.org/document/10323309}, doi = {10.1109/ACCESS.2023.3334791}, pages = {130548--130559}, journal = {{IEEE} Access}, author = {Staroverov, Aleksei and Gorodetsky, Andrey S and Krishtopik, Andrei S and Yudin, Dmitry A and Kovalev, Alexey K and Panov, Aleksandr I}, year = {2023}, keywords = {mypub, scopus, frccsc, q1scopus, airi, mipt\_other, 70-2021-00138} }
AAAI
TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers

Daniil Kirilenko, Anton Andreychuk, Aleksandr Panov, and 1 more author

In Proceedings of the AAAI Conference on Artificial Intelligence, 2023

Heuristic search algorithms, e.g. A*, are the commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take the obstacles into account and, thus, the search led by such heuristics performs poorly in the obstacle-rich environments. To this end, we suggest learning the instance-dependent heuristic proxies that are supposed to notably increase the efficiency of the search. The first heuristic proxy we suggest to learn is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one (computed offline at the training phase). Unlike learning the absolute values of the cost-to-go heuristic function, which was known before, when learning the correction factor the knowledge of the instance-independent heuristic is utilized. The second heuristic proxy is the path probability, which indicates how likely the grid cell is lying on the shortest path. This heuristic can be utilized in the Focal Search framework as the secondary heuristic, allowing us to preserve the guarantees on the bounded sub-optimality of the solution. We learn both suggested heuristics in a supervised fashion with the state-of-the-art neural networks containing attention blocks (transformers). We conduct a thorough empirical evaluation on a comprehensive dataset of planning tasks, showing that the suggested techniques i) reduce the computational effort of A* up to a factor of 4x while producing solutions whose costs exceed the costs of the optimal solutions by less than 0.3% on average; ii) outperform the competitors, which include the conventional techniques from the heuristic search, i.e. weighted A*, as well as the state-of-the-art learnable planners.
@inproceedings{Kirilenko2023, dimensions = {true}, title = {TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers}, volume = {37}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/26465}, doi = {10.1609/aaai.v37i10.26465}, pages = {12436--12443}, booktitle = {Proceedings of the {AAAI} Conference on Artificial Intelligence}, author = {Kirilenko, Daniil and Andreychuk, Anton and Panov, Aleksandr and Yakovlev, Konstantin}, year = {2023}, keywords = {myconf, scopus, frccsc, airi, confastar} }
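As a small worked illustration of the correction factor described in the abstract above, the sketch below computes, for every reachable cell of a toy grid, the ratio between an instance-independent estimate and the true cost-to-go. It is not the paper's code: the 4-connected unit-cost grid, the choice of Manhattan distance as the instance-independent heuristic, the BFS used to obtain perfect costs, and all function names are assumptions chosen to keep the example short. In the paper these target values are what a neural network is trained to predict; no learning is done here.

```python
from collections import deque


def manhattan(a, b):
    """Instance-independent cost-to-go estimate on a 4-connected grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def true_costs_to_goal(grid, goal):
    """Perfect cost-to-go via BFS from the goal; grid[r][c] == 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def correction_factors(grid, goal):
    """Ratio estimate / perfect cost for each reachable cell (1.0 at the goal itself)."""
    dist = true_costs_to_goal(grid, goal)
    return {
        cell: (manhattan(cell, goal) / d if d > 0 else 1.0)
        for cell, d in dist.items()
    }


# Hypothetical 3x3 map: a wall forces a detour from the top-left cell to the goal.
GRID = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
CF = correction_factors(GRID, goal=(2, 0))
for cell in sorted(CF):
    print(cell, round(CF[cell], 3))
```

On this map the top-left cell has a Manhattan estimate of 2 but a true cost of 6, so its correction factor is 1/3, while cells whose shortest path is unobstructed keep a factor of 1. Because Manhattan distance never overestimates on such a grid, every factor lies in (0, 1].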

TNNLS

When to Switch: Planning and Learning For Partially Observable Multi-Agent Pathfinding

Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, and 1 more author

IEEE Transactions on Neural Networks and Learning Systems, 2023

@article{Skrynnik2023,
  dimensions = {true},
  title = {When to Switch: Planning and Learning For Partially Observable Multi-Agent Pathfinding},
  doi = {10.1109/TNNLS.2023.3303502},
  pages = {(In Press)},
  journal = {{IEEE} Transactions on Neural Networks and Learning Systems},
  shortjournal = {{TNNLS}},
  author = {Skrynnik, Alexey and Andreychuk, Anton and Yakovlev, Konstantin and Panov, Aleksandr},
  year = {2023},
  langid = {english},
  keywords = {appl, group1}
}

RAL
Policy Optimization to Learn Adaptive Motion Primitives in Path Planning With Dynamic Obstacles

Brian Angulo, Aleksandr Panov, and Konstantin Yakovlev

IEEE Robotics and Automation Letters, 2023

This paper addresses the kinodynamic motion planning for non-holonomic robots in dynamic environments with both static and dynamic obstacles – a challenging problem that still lacks a universal solution. One of the promising approaches to solve it is decomposing the problem into the smaller sub-problems and combining the local solutions into the global one. The crux of any planning method for non-holonomic robots is the generation of motion primitives that generate solutions to local planning sub-problems. In this work we introduce a novel learnable steering function (policy), which takes into account kinodynamic constraints of the robot and both static and dynamic obstacles. This policy is efficiently trained via the policy optimization. Empirically, we show that our steering function generalizes well to unseen problems. We then plug the trained policy into the sampling-based and lattice-based planners, and evaluate the resultant POLAMP algorithm (Policy Optimization that Learns Adaptive Motion Primitives) in a range of challenging setups that involve a car-like robot operating in the obstacle-rich parking-lot environments. We show that POLAMP is able to plan collision-free kinodynamic trajectories with success rates higher than 92%, when 50 simultaneously moving obstacles populate the environment, showing better performance than the state-of-the-art competitors.
@article{Angulo2023, dimensions = {true}, title = {Policy Optimization to Learn Adaptive Motion Primitives in Path Planning With Dynamic Obstacles}, volume = {8}, issn = {2377-3766}, url = {https://ieeexplore.ieee.org/document/10003648/}, doi = {10.1109/LRA.2022.3233261}, pages = {824--831}, number = {2}, journal = {{IEEE} Robotics and Automation Letters}, author = {Angulo, Brian and Panov, Aleksandr and Yakovlev, Konstantin}, year = {2023}, eprint = {2212.14307}, keywords = {robotics, group1} }
Robotics
Skill Fusion in Hybrid Robotic Framework for Visual Object Goal Navigation

Aleksei Staroverov, Kirill Muravyev, Konstantin Yakovlev, and 1 more author

Robotics, 2023

In recent years, Embodied AI has become one of the main topics in robotics. For the agent to operate in human-centric environments, it needs the ability to explore previously unseen areas and to navigate to objects that humans want the agent to interact with. This task, which can be formulated as ObjectGoal Navigation (ObjectNav), is the main focus of this work. To solve this challenging problem, we suggest a hybrid framework consisting of both not-learnable and learnable modules and a switcher between them—SkillFusion. The former are more accurate, while the latter are more robust to sensors’ noise. To mitigate the sim-to-real gap, which often arises with learnable methods, we suggest training them in such a way that they are less environment-dependent. As a result, our method showed top results in both the Habitat simulator and during the evaluations on a real robot. Video and code for our approach can be found on our website: https://github.com/AIRI-Institute/skill-fusion (accessed on 13 July 2023).
@article{Staroverov2024, dimensions = {true}, title = {Skill {Fusion} in {Hybrid} {Robotic} {Framework} for {Visual} {Object} {Goal} {Navigation}}, volume = {12}, issn = {2218-6581}, url = {https://www.mdpi.com/2218-6581/12/4/104}, doi = {10.3390/robotics12040104}, journal = {Robotics}, author = {Staroverov, Aleksei and Muravyev, Kirill and Yakovlev, Konstantin and Panov, Aleksandr I}, year = {2023} }

2022

CSR
Vector Semiotic Model for Visual Question Answering

Alexey K. Kovalev, Makhmud Shaban, Evgeny Osipov, and 1 more author

Cognitive Systems Research, 2022

In this paper, we propose a Vector Semiotic Model as a possible solution to the symbol grounding problem in the context of Visual Question Answering. The Vector Semiotic Model combines the advantages of a Semiotic Approach implemented in the Sign-Based World Model and Vector Symbolic Architectures. The Sign-Based World Model represents information about a scene depicted on an input image in a structured way and grounds abstract objects in an agent’s sensory input. We use the Vector Symbolic Architecture to represent the elements of the Sign-Based World Model on a computational level. Properties of a high-dimensional space and operations defined for high-dimensional vectors allow encoding the whole scene into a high-dimensional vector with the preservation of the structure. That leads to the ability to apply explainable reasoning to answer an input question. We conducted experiments on the CLEVR dataset and show results comparable to the state of the art. The proposed combination of approaches, first, leads to the possible solution of the symbol-grounding problem and, second, allows expanding current results to other intelligent tasks (collaborative robotics, embodied intellectual assistance, etc.).
@article{Kovalev2021, dimensions = {true}, title = {Vector Semiotic Model for Visual Question Answering}, volume = {71}, issn = {1389-0417}, url = {https://www.sciencedirect.com/science/article/abs/pii/S1389041721000632}, doi = {10.1016/j.cogsys.2021.09.001}, pages = {52--63}, journal = {Cognitive Systems Research}, author = {Kovalev, Alexey K. and Shaban, Makhmud and Osipov, Evgeny and Panov, Aleksandr I.}, year = {2022}, keywords = {elibrary, mypub, scopus, wos, frccsc, computation, mipt, wos\_core, q1scopus, visual question answering, 20-71-10116, semiotic approach, causal network, grounding problem, preprint submitted to cognitive, q1wosprelim, symbol, vector-symbolic architecture} }
BrainInf
Hierarchical intrinsically motivated agent planning behavior with dreaming in grid environments

Evgenii Dzhivelikian, Artem Latyshev, Petr Kuderov, and 1 more author

Brain Informatics, 2022

Biologically plausible models of learning may provide a crucial insight for building autonomous intelligent agents capable of performing a wide range of tasks. In this work, we propose a hierarchical model of an agent operating in an unfamiliar environment driven by a reinforcement signal. We use temporal memory to learn sparse distributed representation of state–actions and the basal ganglia model to learn effective action policy on different levels of abstraction. The learned model of the environment is utilized to generate an intrinsic motivation signal, which drives the agent in the absence of the extrinsic signal, and through acting in imagination, which we call dreaming. We demonstrate that the proposed architecture enables an agent to effectively reach goals in grid environments.
@article{Dzhivelikian2022a, dimensions = {true}, title = {Hierarchical intrinsically motivated agent planning behavior with dreaming in grid environments}, volume = {9}, issn = {2198-4018}, url = {https://braininformatics.springeropen.com/articles/10.1186/s40708-022-00156-6}, doi = {10.1186/s40708-022-00156-6}, pages = {8}, number = {1}, journal = {Brain Informatics}, author = {Dzhivelikian, Evgenii and Latyshev, Artem and Kuderov, Petr and Panov, Aleksandr I}, year = {2022}, keywords = {slap, group1} }
Access
Hierarchical Landmark Policy Optimization for Visual Indoor Navigation

Aleksei Staroverov, and Aleksandr Panov

IEEE Access, 2022

In this paper, we study the problem of visual indoor navigation to an object that is defined by its semantic category. Recent works have shown significant achievements in the end-to-end reinforcement learning approach and modular systems. However, both approaches need a big step forward to be robust and practically applicable. To solve the problem of insufficient exploration of the scenes and make exploration more semantically meaningful, we extend standard task formulation and give the agent easily accessible landmarks in the form of the room locations and their types. The availability of landmarks allows the agent to build a hierarchical policy structure and achieve a success rate of 63% on validation scenes in a photorealistic Habitat simulator. In the hierarchy, the low level consists of separately trained RL skills, and the high level is a deterministic policy that decides which skill is needed at the moment. Also, in this paper, we show the possibility of transferring a trained policy to a real robot. After a bit of training on the reconstructed real scene, the robot shows up to 79% SPL when solving the task of navigating to an arbitrary object.
@article{Staroverov2022, dimensions = {true}, title = {Hierarchical Landmark Policy Optimization for Visual Indoor Navigation}, volume = {10}, issn = {2169-3536}, url = {https://ieeexplore.ieee.org/document/9795006/}, doi = {10.1109/ACCESS.2022.3182803}, pages = {70447--70455}, journal = {{IEEE} Access}, author = {Staroverov, Aleksei and Panov, Aleksandr}, year = {2022}, keywords = {robotics, group1} }

2021

KBS
Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations

Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, and 3 more authors

Knowledge-Based Systems, 2021

Deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. These results are commonly achieved at the expense of huge computational costs and require an incredible number of episodes of interactions between the agent and the environment. Hierarchical methods and expert demonstrations are among the most promising approaches to improve the sample efficiency of reinforcement learning methods. In this paper, we propose a combination of methods that allow the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our Forgetful Experience Replay (ForgER) algorithm effectively handles expert data errors and reduces quality losses when adapting the action space and states representation to the agent’s capabilities. The proposed goal-oriented replay buffer structure allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method has a high degree of versatility and can be integrated into various off-policy methods. The ForgER surpasses the existing state-of-the-art RL methods using expert demonstrations in complex environments. The solution based on our algorithm beats other solutions for the famous MineRL competition and allows the agent to demonstrate the behavior at the expert level.
@article{Skrynnik2021, dimensions = {true}, title = {Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations}, volume = {218}, issn = {09507051}, url = {https://linkinghub.elsevier.com/retrieve/pii/S0950705121001076}, doi = {10.1016/j.knosys.2021.106844}, pages = {106844}, journal = {Knowledge-Based Systems}, author = {Skrynnik, Alexey and Staroverov, Aleksey and Aitygulov, Ermek and Aksenov, Kirill and Davydov, Vasilii and Panov, Aleksandr I.}, year = {2021}, keywords = {slap, group1} }
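One way to picture a replay buffer that gradually "forgets" expert data is sketched below. It is an illustration under stated assumptions, not the ForgER algorithm itself: the class name, the multiplicative decay schedule, and the transition layout are hypothetical, and the goal-oriented buffer structure from the abstract is not modeled.

```python
import random
from typing import List, Sequence, Tuple

# Placeholder transition layout: (state, action, reward). An assumption.
Transition = Tuple[int, int, float]


class ForgetfulReplaySketch:
    """Mixes expert and agent transitions; the expert share decays per sample."""

    def __init__(self, expert_data: Sequence[Transition],
                 initial_expert_ratio: float = 1.0,
                 decay: float = 0.99, seed: int = 0) -> None:
        if not expert_data:
            raise ValueError("expert_data must not be empty")
        self.expert: List[Transition] = list(expert_data)
        self.agent: List[Transition] = []
        self.expert_ratio = initial_expert_ratio
        self.decay = decay  # hypothetical schedule: multiplicative decay
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        """Store a transition collected by the agent itself."""
        self.agent.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a batch, then reduce the expert share for the next call."""
        if not self.agent:
            n_expert = batch_size  # nothing else to sample from yet
        else:
            n_expert = round(batch_size * self.expert_ratio)
        batch = [self.rng.choice(self.expert) for _ in range(n_expert)]
        batch += [self.rng.choice(self.agent)
                  for _ in range(batch_size - n_expert)]
        self.expert_ratio *= self.decay
        return batch
```

Early batches are dominated by demonstrations, while later ones rely mostly on the agent's own experience, so errors in low-quality expert data have less influence as training proceeds.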
CSR
Hierarchical Deep Q-Network from imperfect demonstrations in Minecraft

Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, and 3 more authors

Cognitive Systems Research, 2021


We present the hierarchical Deep Q-Network (HDQfD), which took first place in the MineRL competition. HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories, extracting an effective sequence of meta-actions and subgoals. We introduce a structured task-dependent replay buffer and an adaptive prioritizing technique that allow the HDQfD agent to gradually erase poor-quality expert data from the buffer. In this paper, we present the details of the HDQfD algorithm and give experimental results in the Minecraft domain.
@article{Skrynnik2021a, dimensions = {true}, title = {Hierarchical Deep Q-Network from imperfect demonstrations in Minecraft}, volume = {65}, issn = {13890417}, url = {https://www.sciencedirect.com/science/article/pii/S1389041720300723?via%3Dihub}, doi = {10.1016/j.cogsys.2020.08.012}, pages = {74--78}, journal = {Cognitive Systems Research}, author = {Skrynnik, Alexey and Staroverov, Aleksey and Aitygulov, Ermek and Aksenov, Kirill and Davydov, Vasilii and Panov, Aleksandr I.}, year = {2021}, eprint = {1912.08664v2}, keywords = {slap, group1} }
Access
Hybrid Policy Learning for Multi-Agent Pathfinding

Alexey Skrynnik, Alexandra Yakovleva, Vasilii Davydov, and 2 more authors

IEEE Access, 2021


In this work we study the behavior of groups of autonomous vehicles that are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems is the case when the observability of each vehicle is limited and the global/local communication is unstable, e.g., in crowded parking lots. In such scenarios the vehicles have to rely on local observations and exhibit cooperative behavior to ensure safe and efficient trips. This type of problem can be abstracted to so-called multi-agent pathfinding, in which a group of agents, confined to a graph, have to find collision-free paths to their goals (ideally, minimizing an objective function, e.g., travel time). Widely used algorithms for solving this problem rely on the assumption that a central controller exists for which the full state of the environment (i.e., the agents’ current positions, their targets, the configuration of the static obstacles, etc.) is known, and they cannot be straightforwardly adapted to partially observable setups. To this end, we suggest a novel approach based on the decomposition of the problem into two sub-tasks: reaching the goal and avoiding collisions. To accomplish each of these tasks we utilize reinforcement learning methods such as Deep Monte Carlo Tree Search, Q-mixing networks, and policy gradient methods to design the policies that map the agents’ observations to actions. Next, we introduce the policy-mixing mechanism to end up with a single hybrid policy that allows each agent to exhibit both types of behavior: the individual one (reaching the goal) and the cooperative one (avoiding collisions with other agents). We conduct an extensive empirical evaluation showing that the suggested hybrid policy outperforms standalone state-of-the-art reinforcement learning methods for this kind of problem by a notable margin.
@article{Skrynnik2021b, dimensions = {true}, title = {Hybrid Policy Learning for Multi-Agent Pathfinding}, volume = {9}, issn = {2169-3536}, url = {https://ieeexplore.ieee.org/document/9532001/}, doi = {10.1109/ACCESS.2021.3111321}, pages = {126034--126047}, journal = {{IEEE} Access}, author = {Skrynnik, Alexey and Yakovleva, Alexandra and Davydov, Vasilii and Yakovlev, Konstantin and Panov, Aleksandr I.}, year = {2021}, keywords = {appl, group1} }
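The idea of combining a goal-reaching policy with a collision-avoidance policy can be sketched with a simple switching rule. This is a hypothetical illustration, not the mixing mechanism from the paper: the distance-based switch, the `radius` parameter, and the hand-written stand-in policies are all assumptions (the paper's policies are learned).

```python
from typing import Callable, Sequence, Tuple

Position = Tuple[int, int]
# A policy maps (own position, goal, visible other agents) to an action name.
Policy = Callable[[Position, Position, Sequence[Position]], str]


def others_nearby(me: Position, others: Sequence[Position], radius: int) -> bool:
    """True if any visible agent is within `radius` cells (Chebyshev distance)."""
    return any(max(abs(me[0] - o[0]), abs(me[1] - o[1])) <= radius
               for o in others)


def make_hybrid_policy(goal_policy: Policy, avoid_policy: Policy,
                       radius: int = 2) -> Policy:
    """Build one policy that switches between the two behaviors."""
    def hybrid(me: Position, goal: Position,
               visible_others: Sequence[Position]) -> str:
        if others_nearby(me, visible_others, radius):
            return avoid_policy(me, goal, visible_others)  # cooperative mode
        return goal_policy(me, goal, visible_others)       # individual mode
    return hybrid


def greedy_goal_policy(me: Position, goal: Position,
                       others: Sequence[Position]) -> str:
    """Stand-in for a learned goal-reaching policy: step along the longer axis."""
    dx, dy = goal[0] - me[0], goal[1] - me[1]
    if dx == 0 and dy == 0:
        return "wait"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"


def wait_policy(me: Position, goal: Position,
                others: Sequence[Position]) -> str:
    """Stand-in for a learned collision-avoidance policy: always yield."""
    return "wait"


hybrid = make_hybrid_policy(greedy_goal_policy, wait_policy, radius=2)
```

Each agent evaluates `hybrid` on its own local observation only, which is consistent with the partially observable, decentralized setting described in the abstract.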

2020

Access
Real-Time Object Navigation with Deep Neural Networks and Hierarchical Reinforcement Learning

Aleksey Staroverov, Dmitry A. Yudin, Ilya Belkin, and 3 more authors

IEEE Access, 2020


In recent years, deep learning and reinforcement learning methods have significantly improved mobile robots in such fields as perception, navigation, and planning. But there are still gaps in applying these methods to real robots due to the low computational efficiency of recent neural network architectures and their poor adaptability to the realities of robotic experiments. In this paper, we consider an important task in mobile robotics: navigation to an object using an RGB-D camera. We develop a new neural network framework for robot control that is fast and resistant to possible noise in sensors and actuators. We propose an original integration of semantic segmentation, mapping, localization, and reinforcement learning methods to improve the effectiveness of exploring the environment, finding the desired object, and quickly navigating to it. We created a new HISNav dataset based on the Habitat virtual environment, which allowed us to use simulation experiments to pre-train the model and then upload it to a real robot. Our architecture is adapted to work in a real-time environment and fully implements modern trends in this area.
@article{Staroverov2020b, dimensions = {true}, title = {Real-Time Object Navigation with Deep Neural Networks and Hierarchical Reinforcement Learning}, volume = {8}, issn = {2169-3536}, url = {https://ieeexplore.ieee.org/document/9241850/}, doi = {10.1109/ACCESS.2020.3034524}, pages = {195608--195621}, journal = {{IEEE} Access}, author = {Staroverov, Aleksey and Yudin, Dmitry A. and Belkin, Ilya and Adeshkin, Vasily and Solomentsev, Yaroslav K. and Panov, Aleksandr I.}, year = {2020}, keywords = {robotics, group1} }

2016

CSR
Multilayer cognitive architecture for UAV control

Stanislav Emel’yanov, Dmitry Makarov, Aleksandr I. Panov, and 1 more author

Cognitive Systems Research, 2016


Extensive use of unmanned aerial vehicles (UAVs) in recent years has induced the rapid growth of research areas related to UAV production. Among these, the design of control systems capable of automating a wide range of UAV activities is one of the most actively explored and evolving. Currently, researchers and developers are interested in designing control systems that can be referred to as intelligent, e.g., systems suited to solving such tasks as planning, goal prioritization, coalition formation, etc., and thus guaranteeing high levels of UAV autonomy. One of the principal problems in intelligent control system design is tying together various methods and models traditionally used in robotics and aimed at solving such tasks as dynamics modelling, control signal generation, location and mapping, path planning, etc., with the methods of behaviour modelling and planning which are thoroughly studied in cognitive science. Our work is aimed at solving this problem. We propose a layered control system architecture, STRL (strategic, tactical, reactive, layered), that automates behaviour generation using a cognitive approach while taking into account the complex dynamics and kinematics of the control object (UAV). We use a special type of knowledge representation, the sign world model, that is based on the psychological activity theory to describe individual behaviour planning and coalition formation processes. We also propose a path planning methodology which serves as the mediator between the high-level cognitive activities and the generation of reactive control signals. To generate these signals we use a state-dependent Riccati equation and a specific method for solving it. We believe that utilization of the proposed architecture will broaden the spectrum of tasks which can be solved by a UAV coalition automatically, as well as raise the autonomy level of each individual member of that coalition.
@article{Emelyanov2016, dimensions = {true}, title = {Multilayer cognitive architecture for {UAV} control}, volume = {39}, issn = {1389-0417}, url = {http://linkinghub.elsevier.com/retrieve/pii/S1389041716000048}, doi = {10.1016/j.cogsys.2015.12.008}, pages = {58--72}, journal = {Cognitive Systems Research}, author = {Emel’yanov, Stanislav and Makarov, Dmitry and Panov, Aleksandr I. and Yakovlev, Konstantin}, year = {2016}, keywords = {strl, group1} }
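For readers unfamiliar with the state-dependent Riccati equation (SDRE) mentioned in the abstract above, the textbook formulation is sketched below. The notation (state $x$, control $u$, matrices $A$, $B$, $Q$, $R$, $P$) is assumed here for illustration and is not taken from the paper; the paper's specific solution method is not reproduced.

```latex
\begin{align}
\dot{x} &= A(x)\,x + B(x)\,u, \\
J &= \frac{1}{2}\int_{0}^{\infty} \left( x^{\top} Q(x)\, x + u^{\top} R(x)\, u \right) dt, \\
0 &= A^{\top}(x)\,P(x) + P(x)\,A(x) - P(x)\,B(x)\,R^{-1}(x)\,B^{\top}(x)\,P(x) + Q(x), \\
u &= -R^{-1}(x)\,B^{\top}(x)\,P(x)\,x .
\end{align}
```

The nonlinear dynamics are written in a linear-like form with state-dependent coefficients, the algebraic Riccati equation in the third line is solved for $P(x)$ at the current state, and the resulting feedback law in the last line produces the control signal.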