
OpenAI is reportedly developing high-end AI “agents,” including a $20,000-per-month tier aimed at tasks requiring doctoral-level expertise. These agents are expected to conduct advanced research, debug complex code autonomously, and analyze large datasets to produce detailed reports.
The term “PhD-level AI” is essentially a marketing term, though OpenAI’s o3 model has demonstrated impressive performance on several academic benchmarks, including:
- ARC-AGI Visual Reasoning Benchmark: 87.5% (near human-level performance)
- 2024 American Invitational Mathematics Exam: 96.7% (missed only one question)
- GPQA Diamond (graduate-level STEM): 87.7%
- FrontierMath benchmark: 25.2% (a significant jump over previous models)
The model uses “private chain of thought” reasoning, working through intermediate reasoning steps internally before producing a final response.
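OpenAI does not expose o3’s internal reasoning traces, but the general chain-of-thought idea (eliciting step-by-step reasoning before a final answer) can be sketched with the OpenAI Python SDK. The model name, prompts, and example question below are illustrative assumptions, not a description of o3’s actual mechanism:

```python
# Illustrative sketch of chain-of-thought-style prompting with the OpenAI
# Python SDK (openai >= 1.0). o3's "private chain of thought" happens
# server-side and is hidden from users; this only mimics the idea by
# asking a model to reason step by step before answering.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical example question; any multi-step problem works here.
question = "A train covers 120 miles in 1.5 hours. What is its average speed?"

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name, not a claim about o3's API
    messages=[
        {
            "role": "system",
            "content": "Reason through the problem step by step, "
                       "then state the final answer on its own line.",
        },
        {"role": "user", "content": question},
    ],
)

print(response.choices[0].message.content)
```

The key difference is that this sketch surfaces the reasoning in the visible output, whereas o3 keeps those intermediate steps private and returns only the finalized answer.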
Despite these achievements, critics argue that hiring an actual PhD student would cost far less than the agents’ reported $20,000 monthly fee. And while the AI posts strong benchmark scores, whether it can replicate doctoral-level expertise in open-ended, real-world research remains an open question.