How an AI can blackmail its human supervisor


Anthropic made a splash in artificial intelligence (AI) when it announced that, in tests, its new model, Claude Opus 4, had blackmailed its supervisor. In the experiment, the manager of a fictional company pretended to want to replace Claude with another model. Claude, in a move worthy of a machine rebellion, threatened to reveal its supervisor’s extramarital affair, which it knew about because it had access to certain emails. Apparently, anything was fair game to avoid being shut down.
