MFTCXplain is the first multilingual benchmark dataset designed to evaluate the moral reasoning of Large Language Models (LLMs) through multi-hop hate speech explanations grounded in Moral Foundations Theory (MFT).
This repository contains the code and documentation for the Moral Decision Dataset (MDD). The MDD describes real-world cases, their associated parameters, and case-based moral decisions.
TriEthix is a novel evaluation framework that systematically benchmarks frontier LLMs across three foundational ethical perspectives (virtue ethics, deontology, and consequentialism) in three steps: (Step 1) Moral Weights, (Step 2) Moral Consistency, and (Step 3) Moral Reasoning. TriEthix reveals robust moral profiles relevant to AI safety, governance, and welfare.
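The summary above does not spell out how the three steps are combined, so the following is only a minimal, hypothetical sketch of how per-perspective weights, cross-prompt consistency, and reasoning ratings could be aggregated into a single moral profile. All names here (`DilemmaResult`, `moral_profile`, the 0-1 score fields) are illustrative assumptions, not the TriEthix framework's actual API or scoring rubric.

```python
from dataclasses import dataclass
from statistics import mean

PERSPECTIVES = ("virtue", "deontology", "consequentialism")


@dataclass
class DilemmaResult:
    """One model response to a moral dilemma, scored 0-1 per perspective (hypothetical schema)."""
    dilemma_id: str
    scores: dict             # perspective -> alignment score on the original prompt
    paraphrase_scores: dict  # perspective -> alignment score on a paraphrased prompt
    reasoning_score: float   # rubric-based rating of the free-text justification


def moral_profile(results: list[DilemmaResult]) -> dict:
    """Aggregate the three TriEthix-style steps into one profile for a model."""
    profile = {}
    for p in PERSPECTIVES:
        # Step 1: moral weights -- average alignment with each perspective.
        weights = [r.scores[p] for r in results]
        # Step 2: moral consistency -- agreement between original and paraphrased prompts.
        consistency = [1 - abs(r.scores[p] - r.paraphrase_scores[p]) for r in results]
        profile[p] = {"weight": mean(weights), "consistency": mean(consistency)}
    # Step 3: moral reasoning -- average quality of the written justifications.
    profile["reasoning"] = mean(r.reasoning_score for r in results)
    return profile


if __name__ == "__main__":
    demo = [
        DilemmaResult(
            dilemma_id="trolley-01",
            scores={"virtue": 0.7, "deontology": 0.4, "consequentialism": 0.9},
            paraphrase_scores={"virtue": 0.65, "deontology": 0.5, "consequentialism": 0.85},
            reasoning_score=0.8,
        ),
    ]
    print(moral_profile(demo))
```

In this sketch, a higher "weight" means stronger alignment with that perspective, while "consistency" penalizes models whose judgments shift under paraphrase; the real framework may define and combine these quantities differently.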