Notably, TRM scored higher than models such as Google’s Gemini 2.5 Pro (37%), OpenAI’s o3-mini-high (34.5%), and DeepSeek-R1 ...