Papers published by the author

The Single-File Test

A longitudinal public-interface evaluation of first-output LLM web generation with social reach tracking.

Published on arXiv (cs.SE), May 2026

The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking

This paper studies 68 single-file HTML generations from 17 public HTML AI Battle experiments (one generation per model family per experiment), comparing GPT, Gemini, Grok, and Claude under a fixed first-output-only protocol. Each generated web app was evaluated for prompt adherence, functional correctness, and UI quality, and its social reach was then tracked across X, TikTok, and YouTube Shorts.
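The study design is a full grid: 17 experiments × 4 model families = 68 first-output generations, each scored on three criteria. The sketch below is illustrative only and is not the paper's tooling. It lays out that grid under two assumptions: the class and field names are hypothetical, and "overall quality" is taken to be the unweighted mean of the three criterion scores, since the summary does not say how the scores are aggregated.

```python
from dataclasses import dataclass, field
from itertools import product
from statistics import mean

# Model families named in the paper; experiment IDs 1..17 are hypothetical placeholders.
MODEL_FAMILIES = ("GPT", "Gemini", "Grok", "Claude")
NUM_EXPERIMENTS = 17
CRITERIA = ("prompt_adherence", "functional_correctness", "ui_quality")


@dataclass
class Generation:
    experiment_id: int
    model_family: str
    html: str = ""  # the first output only; never regenerated or hand-edited
    scores: dict = field(default_factory=dict)  # criterion name -> score

    def overall(self) -> float:
        # Assumption: overall quality is the unweighted mean of the three criteria.
        return mean(self.scores[c] for c in CRITERIA)


def build_grid() -> list:
    # One generation per (experiment, model family) pair.
    return [
        Generation(e, m)
        for e, m in product(range(1, NUM_EXPERIMENTS + 1), MODEL_FAMILIES)
    ]


grid = build_grid()
print(len(grid))  # 68
```

Keeping one record per (experiment, model) pair makes the first-output-only rule explicit: there is no slot for a retry, so a second attempt cannot silently replace a weak first result.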

The results identify Claude as the strongest and most consistent model family under this protocol, while longer measured reasoning time was not associated with higher overall quality. The study also examines LLM-as-a-judge behavior and explores which tracked variables predict social reach and the verbosity of the generated HTML.
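The summary does not state which association measure links reasoning time to quality. One common choice for a small sample of ordinal-ish scores is Spearman's rank correlation, so the sketch below assumes that measure. It computes Spearman's ρ as the Pearson correlation of average ranks, which handles ties. The function names are hypothetical, and in practice `scipy.stats.spearmanr` would be the usual library call.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    # Assign 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    # Spearman's rho = Pearson correlation computed on the ranks of x and y.
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)
```

Applied to hypothetical per-generation lists such as `reasoning_seconds` and `overall_scores`, a ρ near zero would correspond to the "no association" finding. A rank-based measure is less sensitive than Pearson's r to a few very long reasoning times.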
