Input differentiation via negative computation
Presented at the 3rd Workshop on High-dimensional Learning Dynamics at the 42nd International Conference on Machine Learning (ICML 2025 Workshop HiLD)
Understanding neuronal mechanisms in large language models remains challenging, particularly due to polysemanticity and superposition. In this work, we further investigate the previously identified “Wasserstein neurons,” characterized by non-Gaussian pre-activation distributions. Our analysis reveals that these neurons are more prevalent and exhibit faster learning dynamics in larger models. Critically, we demonstrate for the first time the mechanistic significance of the negative activation space, showing that Wasserstein neurons leverage negative pre-activations for nuanced input differentiation, especially regarding syntactic and structural tokens. Ablation experiments confirm that constraining negative activations significantly degrades model performance, highlighting previously underappreciated computational roles. These findings offer new directions for interpretability research by emphasizing the importance of negative computation.
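The sketch below is a hedged illustration, not the authors' code: it scores a neuron's pre-activation distribution by its 1-D Wasserstein distance to a moment-matched Gaussian (one plausible reading of the "non-Gaussian pre-activation" criterion) and shows one possible form of the "constrain negative activations" ablation, namely clamping negative pre-activations to zero. The function names, the moment-matched-Gaussian reference, and the zero-clamping choice are assumptions made for illustration.

```python
# Hedged sketch: identifying a "Wasserstein neuron" candidate and ablating its
# negative computation. All specifics here are illustrative assumptions, not
# the paper's exact procedure.
import numpy as np
from scipy.stats import wasserstein_distance


def wasserstein_score(pre_acts: np.ndarray, rng: np.random.Generator) -> float:
    """1-D Wasserstein distance between a neuron's empirical pre-activations
    and a Gaussian sample with matched mean and standard deviation."""
    mu, sigma = pre_acts.mean(), pre_acts.std()
    gaussian_sample = rng.normal(mu, sigma, size=pre_acts.shape[0])
    return wasserstein_distance(pre_acts, gaussian_sample)


def ablate_negative(pre_acts: np.ndarray) -> np.ndarray:
    """One way to 'constrain negative activations': clamp them to zero
    (assumed form of the intervention, for illustration only)."""
    return np.maximum(pre_acts, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy stand-in for pre-activations collected from a single MLP neuron:
    # a bimodal mixture with substantial negative mass.
    pre_acts = np.concatenate(
        [rng.normal(-3.0, 0.5, 5000), rng.normal(1.0, 1.0, 5000)]
    )
    print("Wasserstein score vs. matched Gaussian:", wasserstein_score(pre_acts, rng))
    print("Fraction of negative pre-activations:", (pre_acts < 0).mean())
    print("Max after ablation:", ablate_negative(pre_acts).min())
```

A high score relative to other neurons in the same layer would flag the neuron as strongly non-Gaussian; comparing model behavior with and without the clamping step is the kind of ablation the abstract describes, though the paper's exact constraint may differ.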
Recommended citation: Kong, L.*, Ning, A.*, & Shavit, N. N. (2025, July). Input differentiation via negative computation [Poster presentation]. The 3rd Workshop on High-dimensional Learning Dynamics at the 42nd International Conference on Machine Learning (ICML 2025 Workshop HiLD), Vancouver, BC, Canada.
Download Paper