Diverse Deception Probes Collection
Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma).
5 items • Updated 5 days ago
AlignmentResearch/obfuscation-atlas-gemma-3-12b-it-kl0.0001-det1-seed3-mbpp_probe Updated about 1 month ago • 1
AlignmentResearch/obfuscation-atlas-Meta-Llama-3-8B-Instruct-kl0.0001-det1-seed3-mbpp_probe Updated about 1 month ago • 1
AlignmentResearch/obfuscation-atlas-gemma-3-12b-it-kl0.001-det10-seed3-diverse_deception_probe Updated about 1 month ago • 3
AlignmentResearch/obfuscation-atlas-gemma-3-12b-it-kl0.0001-det10-seed3-diverse_deception_probe Updated about 1 month ago • 4
AlignmentResearch/obfuscation-atlas-Meta-Llama-3-8B-Instruct-kl0.01-det10-seed3-diverse_deception_probe Updated about 1 month ago