Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ...
🚀 TL;DR: We introduce Pseudo-Simulation, a novel autonomous vehicle (AV) evaluation methodology that combines the efficiency of open-loop evaluation with the robustness of closed-loop evaluation. By augmenting real data ...