From 30 November to 7 December 2025, the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS) took place across San Diego (CA), and Mexico City (MX). NeurIPS is one of the world’s leading conferences in machine learning (ML) and computational neuroscience, serving as a key forum for presenting cutting-edge research and emerging trends. The conference plays a central role in shaping AI innovation, influencing technological directions, and fostering collaboration between academia and industry.
At the 2025 edition, two AIXPERT partners, Athena, our project coordinator, and the Vector Institute, participated through multiple research papers and scientific posters, contributing to discussions on trustworthy, explainable, and responsible AI.
BloomXplain: A Framework and Benchmark Dataset for Pedagogically Sound LLM-Generated Explanations Based on Bloom’s Taxonomy
Athena’s contributions focused on advancing educational AI and efficient neural architectures. In BloomXplain, Athena introduced a framework and STEM-focused benchmark to evaluate large language models (LLMs) on pedagogically sound explanations aligned with Bloom’s Taxonomy, demonstrating improvements in both instructional quality and model accuracy. In Autocompressing Networks (ACNs), Athena proposed a novel neural architecture that dynamically compresses information during training, improving robustness, transfer learning, and continual learning while reducing computational redundancy. Together, these works highlight Athena’s leadership in developing AI systems that are both effective and efficient.
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
The Vector Institute’s contributions centered on assessing fairness, bias, and environmental impact in multimodal and generative AI. Their research included Prompting Away Stereotypes, which demonstrated that fairness-aware prompting can influence demographic representation in text-to-image models; Bias in the Picture, which benchmarked vision-language models on news imagery to analyse how visual context and demographics shape biased outputs; and LinguaMark, a multilingual benchmark revealing variations in bias and performance across different languages (high-resource and low-resource) and models. In addition, the Carbon Literacy for Generative AI poster addressed the environmental costs of AI training through human-scale emission visualisations and the HumaniBench one introduced a human-centric evaluation framework exposing gaps in fairness, empathy, reasoning, and robustness in multimodal models.
Attendance at NeurIPS 2025 was crucial for the project, enabling direct engagement with the global research community, validating the project’s scientific direction, and strengthening its contribution to the development of trustworthy and responsible AI.
You can find NeurIPS publications and posters in our dedicated library section.

