Discussion about this post

Mark Berry

John, thank you for the article. With the beginning of my sabbatical in March, I now make the time to read the writings of those I enjoy. You’ve always been on my short list.

I’m grateful to know that others conduct their LLM work through more than one platform. Over the past several months, I’ve used your process to evaluate five different platforms so far. I subscribe to two at a time. During the first 30 days, it’s head to head: both run parallel prompts, and I evaluate what one generates against the other. In the second 30-day cycle, I continue with the “winner” from round 1 and replace the “loser” with another LLM platform. After those 30 days, round 3 begins, with the round 2 winner taking on the next LLM. At any given time, I’m paying for licenses to two LLMs while also accessing the “free” versions of the remaining three I’ve already analyzed. The value of this exercise, to me, was a chance to “test drive” the five most popular LLMs in head-to-head testing.

You perfectly nail the inherent obsequiousness of these LLMs. Even with explicit guidance to be critical and challenge my default assumptions, I still find occasions when I’ve gotten well down the path of a detailed analysis, only to have the LLM, as part of a late process check, “tell” me that what we’ve developed has previously undisclosed flaws or issues that call into question much or all of the completed analysis. Have you found any ways to overcome this?
