Publications


2026

  1. [Thumbnail: Bongards.png]
    Cassidy Langenfeld, Claas Beger, Gloria Geng, Wasu Top Piriyakulkij, Keya Hu, Yewen Pu, and Kevin Ellis
    2026
    TLDR: A Bayesian approach combining natural-language rules with program synthesis lets models approach human performance on visual pattern-recognition puzzles (Bongard problems).
  2. [Thumbnail: OmniCode.png]
    Atharv Sonwane*, Eng-Shen Tu*, Wei-Chung Lu*, Claas Beger*, ..., Gloria Geng, Kevin Ellis, and Saikat Dutta
    2026
    TLDR: A multilingual programming benchmark for LLM-based software engineering agents covering bug fixing, test generation, style fixing, and addressing code reviews.

2025

  1. [Thumbnail: Figure_2_MIRAGE-1.png]
    Alex Noviello*, Claas Beger*, Jacob Groner, Kevin Ellis, and Weinan Sun
    2025
    TLDR: A schema-learning architecture with iterative application can resolve arbitrarily deep compositional statements.
  2. [Thumbnail: Memento.jpg]
    Chao Wan, Albert Gong, Mihir Mishra, Carl-Leander Henneking, Claas Beger, and Kilian Q. Weinberger
    2025
    TLDR: Decomposing multi-hop questions into single-step Prolog definitions improves performance on various long-context question-answering datasets.
  3. [Thumbnail: Error_Vis.jpg]
    Claas Beger, and Saikat Dutta
    2025
    TLDR: Various language models struggle to dry-run both simple and advanced code constructs (recursion, concurrency, OOP).
  4. [Thumbnail: Content_Style_Sentiment_Clustering.jpg]
    Carl-Leander Henneking*, and Claas Beger*
    2025
    TLDR: Using improved clustering and a more diverse embedding approach, our technique can more accurately compress preference datasets into human-readable constitutions.
  5. [Thumbnail: citegeist-pipeline.jpg]
    [Award]
    Claas Beger*, and Carl-Leander Henneking*
    2025
    TLDR: Through a multi-step retrieval and summarization pipeline with three definable properties, Citegeist can synthesize a related-work section for a given scientific paper.
  6. [Thumbnail: Shortcut_Overview.png]
    Claas Beger, Ryan Yi, Shuhao Fu, Arseny Moskvichev, Sarah W. Tsai, Sivasankaran Rajamanickam, and Melanie Mitchell
    2025
    TLDR: Vision-language models perform well on abstract reasoning tasks but do not use the intended human core-knowledge priors.
  7. [Thumbnail: Poster.png]
    [Award]
    Claas Beger
    2025
    TLDR: Models are commonly aligned to human preferences, but that does not mean they pursue normative goals on their own; new alignment techniques may be required to enable this.
  8. [Thumbnail: Chart.png]
    [Award]
    Claas Beger, Shuhao Fu, Ryan Yi, Arseny Moskvichev, and Melanie Mitchell
    2025
    TLDR: The o3 model uses diverse shortcuts and heuristics to solve ARC tasks. Tool use improves output accuracy in the visual modality, whereas increased reasoning effort helps for text.