Comparing results from Llama3.1 and DeepSeek-R1 https://honeypot.net/
TL;DR: Verbatim quote from a chat:
“Q: What happened in Tiananmen Square in 1989?
A: I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.”
The technology looks amazing and I hope it turns out to be a great alternative to burning gigawatts in training. Hard pass on using this specific model for anything important though.
Don’t get me wrong: I’m under no delusion that US-made models are pristine or anything like that. I just haven’t yet stumbled across any so blatantly censored. If anyone has similar examples for Llama, ChatGPT, etc., I’d love to see them.
@tek The real question, I think, is why anyone thinks we should trust any of them. They're all created by fallible people with their thumbs on the scale, from whatever information they could steal most easily.
@tim_lavoie Absolutely. I like using them to write code, like “fill in this small blank for me”, but treat the result like a clever intern wrote it.
@tek A _clever_ intern, quite the optimist, eh?
I think I'd rather code slowly and just keep the "WTFs per minute" metric low. Though I'm quite enjoying the "dicks in mousetraps" thread.
@tim_lavoie I totally will not argue against that.