AUTHOR GUIDELINES

Regular and Short Papers

  • Full papers: 6 pages + references
  • Short papers: 4 pages + references
  • Peer review process: Double-blind
  • Conference language: English

Demo Papers

  • Length: Up to 4 pages
  • Additional content: Append 1-2 pages to the paper illustrating how the demo will be conducted on-site at CBMI 2025. If the submission is accepted, this additional content will not be published in the conference proceedings
  • Video link encouraged: Showing the demo in action
  • Peer review process: Single-blind

Submission

The conference proceedings will be published by IEEE. All submitted papers must conform to the IEEE manuscript templates for conference proceedings and the instructions they provide. On the IEEE website you will find instructions and templates for Microsoft Word and LaTeX, as well as an Overleaf link.
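
For LaTeX submissions, a typical starting point is a skeleton like the one below. This is a minimal sketch only, assuming the standard IEEEtran conference class; the template on the IEEE website is authoritative. Note that regular and short paper submissions must be anonymized for double-blind review:

    \documentclass[conference]{IEEEtran}
    \usepackage{cite}     % IEEE-style citation handling
    \usepackage{graphicx} % figure inclusion

    \begin{document}

    \title{Your Paper Title}
    % Double-blind tracks: do not reveal author identities or affiliations.
    \author{\IEEEauthorblockN{Anonymous Authors}
    \IEEEauthorblockA{Anonymous Affiliation}}
    \maketitle

    \begin{abstract}
    A one-paragraph summary of the contribution.
    \end{abstract}

    \section{Introduction}
    % Full papers: 6 pages + references; short papers: 4 pages + references.

    \bibliographystyle{IEEEtran}
    \bibliography{references}

    \end{document}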

The ConfTool submission system for CBMI 2025 is now open. Please note the deadlines for your track and submit your papers accordingly.

Submit your paper at https://www.conftool.pro/cbmi2025

If you have any questions regarding the conference, we are happy to help. Write to us at submissions@cbmi2025.org

Special Session Submission

The organizers of CBMI 2025 call for novel and original research papers relevant to the following special sessions:

  • MmIXR: Multimedia Indexing for XR is a special session that encompasses multimedia indexing methods used both during Extended Reality (XR) authoring and during the immersive experience.
  • ExMA: Explainability in Multimedia Analysis is a special session that aims to gather scientific contributions that will help improve the trust and transparency of multimedia analysis systems.
  • VR4B: Video Retrieval for Beginners is a special session that aims to provide better insights into how usable interactive video retrieval systems are for users who have a solid IT background but are not familiar with the details of the system.
  • UHBER: Multimodal Data Analysis for Understanding of Human Behaviour, Emotions and their Reasons is a special session that addresses the processing of all types of data related to understanding human behaviour, emotions, and their reasons, such as current or past context.
  • AMHTAI: Advancing Medical Healthcare through AI is a special session that focuses on the latest advancements in AI-driven medical multimedia processing, IoT-enabled pervasive healthcare, and human-computer interaction.
  • Multimedia AI in Modern CB Retrieval: Challenges and Applications is a special session that focuses on AI-powered content-based (CB) retrieval across diverse domains, including multimedia verification and fact-checking, healthcare, large-scale news retrieval, and 3D multimedia analysis.

MmIXR: Multimedia Indexing for XR

Extended Reality (XR) applications rely not only on computer vision for navigation and object placement but also require a range of multimodal methods to understand the scene or assign semantics to objects being captured and reconstructed. Multimedia indexing for XR thus encompasses methods used during XR authoring, such as indexing content for scene and object reconstruction, as well as during the immersive experience, such as object detection and scene segmentation.

The intrinsic multimodality of XR applications introduces new challenges, such as the analysis of egocentric data (video, depth, gaze, head/hand motion) and their interplay. XR is also applied in diverse domains, e.g., manufacturing, medicine, education, and entertainment, each with distinct requirements and data. Multimedia indexing methods must therefore be able to adapt to the relevant semantics of the particular application domain.

Topics covered in the Special Session include, but are not limited to:

  • Multimedia analysis for media mining, adaptation (to scene requirements), and description for use in XR experiences (including but not limited to AI-based approaches)
  • Processing of egocentric multimedia datasets and streams for XR (e.g., egocentric video and gaze analysis, active object detection, video diarization/summarization/captioning)
  • Cross- and multi-modal integration of XR modalities (video, depth, audio, gaze, hand/head movements, etc.)
  • Approaches for adapting multimedia analysis and indexing methods to new application domains (e.g., open-world/open-vocabulary recognition/detection/segmentation, few-shot learning)
  • Large-scale analysis and retrieval of 3D asset collections (e.g., objects, scenes, avatars, motion capture recordings)
  • Multimodal datasets for scene understanding for XR
  • Generative AI and foundation models for multimedia indexing and/or synthetic data generation
  • Combining synthetic and real data for improving scene understanding
  • Optimized multimedia content processing for real-time and low-latency XR applications
  • Privacy and security aspects and mitigations for XR multimedia content

Organizers:

  • Claudio Gennaro, CNR-ISTI
  • Werner Bailer, JOANNEUM RESEARCH
  • Lyndon J. B. Nixon, MODUL Technology GmbH
  • Vasileios Mezaris, ITI-CERTH

ExMA: Explainability in Multimedia Analysis

The rise of machine learning has significantly improved the performance of AI systems. However, it has also raised questions about the reliability and explainability of their predictions. As the goal of eXplainable AI (XAI) is to understand and explain how these systems make their decisions, this special session aims to gather scientific contributions that will help improve the trust and transparency of multimedia analysis systems, with important benefits for society as a whole.

Contributions on the following topics are therefore particularly welcome:

  • Analysis of the factors influencing the final decision, as an essential step to understand and improve the underlying processes.
  • Information visualization for models or their predictions.
  • Interactive applications for XAI.
  • Performance evaluation metrics and protocols for explainability.
  • Sample-centric and dataset-centric explanations.
  • Attention mechanisms for XAI.
  • XAI-based pruning.
  • Applications of XAI methods, in particular those addressing domain experts.
  • Open challenges from industry or existing and emerging regulatory frameworks.

Organizers:

  • Martin Winter, JOANNEUM RESEARCH - DIGITAL
  • Romain Giot, University of Bordeaux

VR4B: Video Retrieval for Beginners

Despite advances in automated content description using deep learning and the emergence of joint image-text embedding models, many video retrieval tasks still require a human user in the loop. Interactive video retrieval (IVR) systems address these challenges. To assess their performance, multimedia retrieval benchmarks such as the Video Browser Showdown (VBS) and the Lifelog Search Challenge (LSC) have been established. These benchmarks provide large-scale datasets as well as task settings and evaluation protocols, making it possible to measure progress in research on IVR systems. However, to achieve the best possible performance, the participating systems are usually operated by members of their development teams. This special session aims to provide better insights into how such systems are usable by users who have a solid IT background but are not familiar with the details of the system.

The submitted retrieval systems will be presented as demonstrations (with a related poster) and will compete in a novice competition. Volunteer attendees who are not affiliated with the development team of any participating IVR system, but who have seen the systems in the demonstration session, will use them to solve a small number of Video Browser Showdown tasks.

Organizers:

  • Werner Bailer, JOANNEUM RESEARCH, Austria
  • Cathal Gurrin, Dublin City University (DCU), Ireland
  • Björn Þór Jónsson, Reykjavik University, Iceland
  • Klaus Schöffmann, Klagenfurt University, Austria

UHBER: Multimodal Data Analysis for Understanding of Human Behaviour, Emotions and their Reasons

This special session addresses the processing of all types of data related to understanding human behaviour, emotions, and their reasons, such as current or past context. Understanding human behaviour and context may benefit many services, both online and in physical spaces, e.g., in workplaces, for travel and leisure activities, and for health support.

In the context of multimedia retrieval, understanding human behaviour and emotions could help not only with multimedia indexing, but also with deriving implicit (i.e., other than intentionally reported) human feedback regarding multimedia news, videos, advertisements, navigators, hotels, shopping items, etc., and thus improve multimedia retrieval. For example, a movie recommender system should probably not rank a tragic movie at the top of its recommendations when a user is tired and stressed.

Humans are good at understanding other humans, their emotions and reasons, and at learning their tastes, skills, and personality traits. The interest of this session is therefore how to improve AI's understanding of these same aspects. The topics include (but are not limited to) the following:

  • Use of various sensors for monitoring and understanding human behaviour, emotion / mental state / cognition, and context: video, audio, infrared, wearables, virtual (e.g., mobile device usage, computer usage) sensors etc.
  • Methods for information fusion, including information from various heterogeneous sources.
  • Methods to learn human traits and preferences from long term observations.
  • Methods to detect human implicit feedback from past and current observations.
  • Methods to assess task performance: skills, emotions, confusion, engagement in the task and/or context.
  • Methods to detect potential security and safety threats and risks.
  • Methods to adapt behavioural and emotional models to different end users and contexts without collecting a lot of labels from each user and/or for each context: transfer learning, semi-supervised learning, anomaly detection, one-shot learning etc.
  • How to collect data for training AI methods from various sources, e.g., internet, open data, field pilots etc.
  • Use of behavioural or emotional data to model humans and adapt services either online or in physical spaces.
  • Ethics and privacy issues in modelling human emotions, behaviour, context and reasons.

Organizers:

  • Elena Vildjiounaite, VTT Technical Research Centre of Finland
  • Johanna Kallio, VTT Technical Research Centre of Finland
  • Sari Järvinen, VTT Technical Research Centre of Finland
  • Benjamin Allaert, IMT Nord Europe, France
  • Ioan Marius Bilasco, University of Lille, France
  • Mihai Mitrea, Telecom SudParis, France

AMHTAI: Advancing Medical Healthcare through AI

Rapid advancements in multimedia analysis, multimodal data fusion, and AI-driven decision support systems are reshaping modern healthcare by integrating diverse data sources such as medical imaging, surgical video analysis, wearable sensor data, and electronic health records (EHRs). This integration enables early disease detection, personalized treatment strategies, and intelligent health monitoring, improving diagnosis and empowering patient self-care. Recent innovations in AI-based medical diagnostics, smart health environments, and digital health assistants are further enhancing personalized, context-aware health solutions, which are critical for managing aging populations, addressing chronic diseases, and improving healthcare accessibility in underserved regions. In this context, multimodal data fusion, combining clinical records, wearable sensors, lifestyle data, environmental factors, and medical imaging, offers a promising avenue for comprehensive, data-driven healthcare insights.

This special session focuses on the latest advancements in AI-driven medical multimedia processing, IoT-enabled pervasive healthcare, and human-computer interaction. It addresses key challenges in context-aware data fusion, real-time analytics, and AI-powered decision support within healthcare environments. The session will highlight the role of multimodal analytics in improving risk prediction, enhancing disease diagnosis, and advancing precision medicine, showcasing innovative approaches that integrate AI and real-time data processing to optimize healthcare outcomes. By presenting the latest breakthroughs in deep learning, medical imaging fusion, natural language processing (NLP), and multimodal sensor integration, the session aims to explore how next-generation healthcare solutions can be made scalable, explainable, and ethically responsible.

Topics of interest for this special session include, but are not limited to:

  • Advances in multimedia integration for diagnostics and health monitoring: leveraging AI-driven analysis of video, audio, and imaging data
  • Multimodal data fusion: integration of electronic health records (EHRs), imaging, omics, and IoT data
  • Machine learning for patient stratification: predictive analytics for health risk assessment and personalized care
  • Explainable AI (XAI): enhancing interpretability and trust in AI-driven medical decisions
  • Personalized treatment strategies powered by AI
  • Smart health environments and digital health assistants
  • Natural language processing (NLP) in healthcare applications
  • Ethical considerations in AI-based healthcare solutions

Organizers:

  • Dr. Thanassis Mavropoulos (ITI-CERTH)
  • Dr. Christoniki Maga-Nteve
  • Prof. Henning Müller (HES-SO)
  • Assoc. Prof. Dr. Klaus Schoeffmann (ITEC)
  • Dr. Stefanos Vrochidis (ITI-CERTH)

Multimedia AI in Modern CB Retrieval: Challenges and Applications

The rapid growth of multimedia data has fueled the need for advanced content-based (CB) retrieval methods, leveraging AI to enhance indexing, retrieval, verification, and multimodal analysis. This special session will focus on AI-powered CB retrieval across diverse domains, including multimedia verification and fact-checking, healthcare, large-scale news retrieval, and 3D multimedia analysis. We welcome theoretical advancements, novel applications, and scalable solutions to improve the efficiency, interpretability, and robustness of modern CB retrieval systems.

Topics of interest for this special session include, but are not limited to:

  1. Multimedia Verification and Fact-Checking
     • AI-powered multimedia forensics and deepfake detection
     • Cross-modal fact verification in text, image, and video
     • Misinformation and fake news detection using CB retrieval
     • Forensic indexing and multimedia authenticity verification
  2. Multimedia for Health Science
     • AI-driven medical image and video retrieval
     • CB retrieval for clinical decision support
     • Health data lifelogging and personal analytics
     • Multimodal fusion in medical applications
  3. 3D Multimedia Retrieval and Analysis
     • 3D Gaussian Splatting and AI-based 3D scene indexing
     • CB retrieval for 3D point clouds, meshes, and volumetric data
     • Augmented Reality (AR) and Virtual Reality (VR) content retrieval
     • AI-driven 3D object recognition and shape-based retrieval
  4. News Retrieval from Large-Scale Multimedia Datasets
     • AI-powered news indexing from video, TV news, images, and lifelogs
     • Multimodal event detection and tracking across media formats
     • Temporal and semantic indexing for large-scale news retrieval
     • Bias detection and contextual analysis in multimedia news
  5. Advances in AI for CB Multimedia Retrieval
     • Deep learning for scalable content-based search
     • Self-supervised and contrastive learning for multimodal retrieval
     • Efficient indexing and storage techniques for large-scale datasets
     • Explainability, fairness, and ethical considerations in CB retrieval

Organizers:

  • Minh-Triet Tran, University of Science, VNU-HCM
  • Duc-Tien Dang-Nguyen, University of Bergen
  • Tam V. Nguyen, University of Dayton
  • Vinh-Tiep Nguyen, University of Information Technology, VNU-HCM
  • Trung-Nghia Le, University of Science, VNU-HCM