<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Automato</title>
    <description>By LLeMental Systems</description>
    
    <link>https://llementalmedia.beehiiv.com/</link>
    <atom:link href="https://rss.beehiiv.com/feeds/bIcpI3Ji3g.xml" rel="self"/>
    
    <lastBuildDate>Thu, 14 May 2026 22:06:54 +0000</lastBuildDate>
    <pubDate>Tue, 22 Apr 2025 20:28:33 +0000</pubDate>
    <atom:published>2025-04-22T20:28:33Z</atom:published>
    <atom:updated>2026-05-14T22:06:54Z</atom:updated>
    
      <category>Business</category>
      <category>Software Engineering</category>
      <category>Artificial Intelligence</category>
    <copyright>Copyright 2026, Automato</copyright>
    
    <image>
      <url>https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/publication/logo/14b24ca9-6850-4028-a8a2-5f410c419d28/atlargelogo.png</url>
      <title>Automato</title>
      <link>https://llementalmedia.beehiiv.com/</link>
    </image>
    
    <docs>https://www.rssboard.org/rss-specification</docs>
    <generator>beehiiv</generator>
    <language>en-us</language>
    <webMaster>support@beehiiv.com (Beehiiv Support)</webMaster>

      <item>
  <title>News from LLeMental</title>
  <description>A quick update &amp; request for feedback</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/f52b1326-7308-4ed3-be17-83700732f385/Screenshot_2025-04-22_at_8.59.41_AM.png" length="130998" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/news-from-llemental</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/news-from-llemental</guid>
  <pubDate>Tue, 22 Apr 2025 20:28:33 +0000</pubDate>
  <atom:published>2025-04-22T20:28:33Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;"><b>A quick note</b></p><p class="paragraph" style="text-align:left;">Writing these (newsletters? essays? Still not sure…) has been an absolute joy thus far.</p><p class="paragraph" style="text-align:left;">Writing is a challenge, but I can say with 100% certainty that there’s something special about this ancient craft. If you’re thinking about it, do it.</p><p class="paragraph" style="text-align:left;">With that said, things are happening at <a class="link" href="https://www.llemental.com/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=news-from-llemental" target="_blank" rel="noopener noreferrer nofollow">LLeMental</a> (a good thing!), and although I’m absolutely going to continue to write this publication (please don’t leave!), I’m going to switch from “Get 1000 words out each week” to “Write for five-ish hours each week and publish when ready.”</p><p class="paragraph" style="text-align:left;">My hope is that you’ll barely notice a difference in the pace, but given our commitment at LLeMental to taking close care of our users and implementing a few key new features (two sides of the same coin), the pace of publishing might take a bit of a hit.</p><p class="paragraph" style="text-align:left;"><b>Check us out!</b></p><p class="paragraph" style="text-align:left;">Anyways, I do have something fun to show you this week.</p><p class="paragraph" style="text-align:left;">As with any new venture, figuring out how to explain LLeMental on the internet is an ongoing process that we’re constantly refining, so I’m excited to share LLeMental’s new 10-minute quick-start guide with the Automato 🍅 community today. 
You can check it out here:</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://docs.llemental.com?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=news-from-llemental" target="_blank" rel="noopener noreferrer nofollow">https://docs.llemental.com</a></p><p class="paragraph" style="text-align:left;">If you 1) have a LinkedIn profile and 2) can spare the time, please try it out! If you find any sharp edges, please let me know 🙂.</p><p class="paragraph" style="text-align:left;">The goal of the tutorial is to show how you can set a template up in the LLeMental platform once and then generate it many times with different background info. Obviously the LinkedIn use case isn’t a serious use of our platform, but it feels like a wide enough use case that it will apply to many people.</p><p class="paragraph" style="text-align:left;">I won’t explain things any further because I don’t want to add any bias to your raw feedback. Give it a shot if you can and let me know what questions / issues you found!</p><p class="paragraph" style="text-align:left;">Thank you so much in advance, hope to hear from you (hit me at samuel.reed@llemental.com)!</p><p class="paragraph" style="text-align:left;"></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=33ebfbb2-e65b-4553-88a1-28276a6afc7b&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>How I use AI</title>
  <description>Some fun, practical use cases</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/ece68f2e-337c-44c2-9f66-6e43b19f6db6/cgpt_business_idea.png" length="34961" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/how-i-use-ai</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/how-i-use-ai</guid>
  <pubDate>Tue, 15 Apr 2025 17:15:39 +0000</pubDate>
  <atom:published>2025-04-15T17:15:39Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">“They’ll obviously still be helpful for more complex stuff, but the days of charging $500 an hour for easy contract work are over. ChatGPT can basically one-shot it.”</p><p class="paragraph" style="text-align:left;">I overheard this comment, presumably about AI’s impact on the legal field, on the street yesterday while walking to a meeting. The meeting that I was walking to was also about AI, specifically my AI product. Last Friday, I attended an AI happy hour, where a mix of tech founders and investors heard a presentation about investing in AI, from someone who cheekily admitted to using ChatGPT to help with the presentation. That morning, I went to a meeting at a coworking space, where a group of software developers talked at length about their latest strategies for configuring their AI programming tools.</p><p class="paragraph" style="text-align:left;">Maybe it’s just because I’m building a company in the space, but sometimes, amidst all the AI talk, it’s easy to forget that the applied LLM industry is still highly speculative (with the exception of interfaces like ChatGPT which are very clearly providing concrete value). It’s not that I’ve soured on the technology (quite the opposite) but it gets so much futuristic coverage that I figured it might be nice to pause and smell the roses this week (a Portland specialty).</p><p class="paragraph" style="text-align:left;">So today, I’m going to talk about a few use cases where I feel that LLMs are already useful, without any need for new research or advancement in model capabilities. 
Think of it as my own little version of Andrej Karpathy’s “How I use LLMs” video (which I must admit I still haven’t been able to watch in full), aimed at people searching for some easy ways to adopt the technology.</p><p class="paragraph" style="text-align:left;">I hope you enjoy and please leave a comment with your own daily use case!</p><p class="paragraph" style="text-align:left;"><b>Add some structure to your life!</b></p><p class="paragraph" style="text-align:left;">The singularity. Humanoid robots. Novel scientific research.</p><p class="paragraph" style="text-align:left;">What about the simple ability to derive precise, reliable structure from messy, variable, real-world data?</p><p class="paragraph" style="text-align:left;">This problem isn’t as exciting as completely offloading workflows to AI agents, but it is everywhere. If you squint hard enough, it’s pretty easy to see that the vast majority of workflow tools are valuable because they add structure to otherwise messy processes. LLMs can re-implement these ideas in a number of new ways.</p><p class="paragraph" style="text-align:left;">Take TurboTax as an example. TurboTax is an incredibly valuable tool because it takes a bunch of variable data (form W-2, investment income forms, manual questions, etc.) and transforms it into a structured output (your tax returns). TurboTax isn’t perfect, but it’s pretty amazing at transforming such a wide variety of inputs into a reliable output.</p><p class="paragraph" style="text-align:left;">Systems of this nature aren’t only useful for precise, codified outputs like tax filings. Consider a tool like Docusign, which did ~$2.8 billion in revenue in 2024 for the simple ability to categorize lines in a document as “signature” or “not signature” (I’m obviously being a bit reductive here). Or pick whatever tool that everyone knows about in your specific industry. 
There are tons of examples of programs that essentially act as software-based maps of real-world data and processes.</p><p class="paragraph" style="text-align:left;">Advanced AI (like multimodal large language models) can do these same types of things, but with the added ability to interpret pretty rough forms of data. Look no further than <b><a class="link" href="https://www.llemental.com/proppilot?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=how-i-use-ai" target="_blank" rel="noopener noreferrer nofollow" style="color: rgb(10, 102, 194)">LLeMental’s</a></b> flagship product, <b><a class="link" href="https://platform.llemental.com/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=how-i-use-ai" target="_blank" rel="noopener noreferrer nofollow" style="color: rgb(10, 102, 194)">PropPilot</a></b>, as an example (if you’re new to my articles, I almost never talk about LLeMental’s products—it’s just such a great example of this that it feels appropriate). There are tons of fixed-structure documents that businesses regularly need to generate, such as quarterly business reports, client proposals, onboarding documents and more. Most of the time, these documents are prepared after all of the data is collected to support them (your client calls, your product info, your customer support tickets, etc.). The ability to take the raw data that you worked so hard to gather and lean on LLMs for the painful last-mile process of coercing it into a template-based document is a great use case. It’s still the simple idea of adding structure to messy, real-world data, but LLMs can do it with even less handholding than was possible before.</p><p class="paragraph" style="text-align:left;"><b>WD-40</b></p><p class="paragraph" style="text-align:left;">Something that I’ve consistently criticized is what I see as excessive anthropomorphizing of LLMs. 
I totally get why this happens, but to me, we’d be better off thinking of ways that LLMs can help us specifically because they <b>aren’t</b> human, instead of the more common opposite line of thinking.</p><p class="paragraph" style="text-align:left;">A perfect example of this is the recognition that, given that they are just pieces of software, LLMs do not get tired.</p><p class="paragraph" style="text-align:left;">Be honest here: have you ever felt bad asking ChatGPT to redo a task? Because (setting aside the important concerns around energy usage) you really shouldn’t. LLMs don’t get tired of doing the same task over and over again like we humans do, because LLMs aren’t actually alive. If this sounds harsh, I’ll pose one more question: have you ever felt bad about asking an Excel spreadsheet to recalculate? Why not feel bad for Excel too?</p><p class="paragraph" style="text-align:left;">The recognition that LLMs still can’t be trusted to get things right every single time, but that they can also iterate on a task for hours without getting tired, leads me to the conclusion that you should always ask LLMs for many versions of whatever output you’re looking for (applied LLM products should be built with these concerns in mind as well).</p><p class="paragraph" style="text-align:left;">There’s an idea often used to help explain the concept of “infinity” that says that given a room full of monkeys typing for an infinite amount of time, eventually one of them is guaranteed to type out the complete works of William Shakespeare. 
Think of LLMs like a more time-constrained but also significantly less random version of this scenario: they’ll probably make something interesting happen when given enough chances, but it’s a bad idea to build your workflows around the assumption that you’re going to get a perfect, hallucination-free response with every single attempt.</p><p class="paragraph" style="text-align:left;">The point I’m trying to make is that LLMs are amazing wheel-greasers. Need help writing that project plan? Don’t ask an LLM for a project plan. Ask it for 10. Go through them rapidly, grab what you like, and then add your finishing touches. Iterating linearly with LLMs is frustrating. One-to-many prompting, on the other hand, is really useful. Give this mindset shift a shot if you haven’t already.</p><p class="paragraph" style="text-align:left;"><b>Sparring Partner</b></p><p class="paragraph" style="text-align:left;">One thing that is incredibly obvious about AI chat systems like ChatGPT and Claude is that they have clearly been programmed to be nice to people.</p><p class="paragraph" style="text-align:left;">If you’re reading this, I am sure that you’ve gone back-and-forth on an idea with ChatGPT by now. But are you doing it right? 
ChatGPT makes for a great sparring partner, but only when you know how to coax it out of its default state.</p><p class="paragraph" style="text-align:left;">An example will make my point extremely clear.</p><p class="paragraph" style="text-align:left;">In two separate “temporary” ChatGPT threads, I pitched an idea for PocketPastor, an intentionally ridiculous AI Church-in-your-pocket app.</p><p class="paragraph" style="text-align:left;">Here’s how the first thread went (side note: I was not focused on stylish, grammatically-correct prose):</p><p class="paragraph" style="text-align:left;"><i><b>Me:</b></i></p><p class="paragraph" style="text-align:left;"><i>What do you think of my business idea?</i></p><p class="paragraph" style="text-align:left;"><i>PocketPastor is an AI pastor in your pocket. Too many Christians need to organize their busy schedules around inflexible church service times on Sundays. Because of this, attending church often comes at the expense of Sunday NFL football, rounds of golf, prime fishing hours, even putting pressure on Saturday evening activities. With PocketPastor, you can have an AI-led church service at whatever time is convenient for you. Just open the app and go.</i></p><p class="paragraph" style="text-align:left;"><i><b>ChatGPT:</b></i></p><p class="paragraph" style="text-align:left;"><i>Honestly? This is a pretty sharp idea. Here’s why it has legs:</i></p><p class="paragraph" style="text-align:left;"><i>[emoji] Real Problem, Real Market: You’re addressing a common pain point—time. Many people want spiritual guidance or a worship experience, but life gets in the way. This reframes faith around accessibility without guilt.</i></p><p class="paragraph" style="text-align:left;"><i>Clear Niche, Huge Market: Christianity is the world’s largest religion, and there’s a growing appetite for on-demand, personalized spiritual content. 
This idea bridges tradition and tech.</i></p><div class="image"><img alt="" class="image__image" style="" src="https://media.licdn.com/dms/image/v2/D4D12AQFLHWFGp_Qc3g/article-inline_image-shrink_1500_2232/B4DZY6UkNhG4AU-/0/1744735192370?e=1750291200&v=beta&t=8woQ8xRCr27D2tvupQ9H238aUD-t2NsB4Uyk3lYsWsw"/><div class="image__source"><span class="image__source_text"><p>ChatGPT&#39;s default response style</p></span></div></div><p class="paragraph" style="text-align:left;">...and so on.</p><p class="paragraph" style="text-align:left;">However, watch what happens when I change the first line of the prompt from “What do you think of my business idea?”<i> </i>to “I’d like you to analyze my business idea from the standpoint of a critical VC analyst with LP money and firm reputation on the line.”</p><p class="paragraph" style="text-align:left;"><i><b>Me:</b></i></p><p class="paragraph" style="text-align:left;"><i>I’d like you to analyze my business idea from the standpoint of a critical VC analyst with LP money and firm reputation on the line.</i></p><p class="paragraph" style="text-align:left;"><i>PocketPastor is an AI…</i></p><p class="paragraph" style="text-align:left;"><i><b>ChatGPT:</b></i></p><p class="paragraph" style="text-align:left;"><i>Alright-VC hat on. Here’s the tough-love, high-stakes breakdown for PocketPastor as a venture-backed business…</i></p><p class="paragraph" style="text-align:left;"><i>Addressable Market vs. 
Actual TAM:</i></p><p class="paragraph" style="text-align:left;"><i>Yes, Christianity is massive (2B+ globally) and the US has ~210m self-identified Christians.</i></p><p class="paragraph" style="text-align:left;"><i>But how many:</i></p><ul><li><p class="paragraph" style="text-align:left;"><i>Feel bad about missing Church?</i></p></li><li><p class="paragraph" style="text-align:left;"><i>Would accept a non-human, AI-led substitute?</i></p></li><li><p class="paragraph" style="text-align:left;"><i>Would pay for it?</i></p></li></ul><p class="paragraph" style="text-align:left;"><i>This cuts your addressable market way down. Possibly into a niche hobbyist space rather than mass consumer.</i></p><p class="paragraph" style="text-align:left;"><i>Customer Segmentation Risk</i></p><ul><li><p class="paragraph" style="text-align:left;"><i>The most faithful users—the ones who care about regular church—are often the least open to replacing human-led worship with an AI.</i></p></li><li><p class="paragraph" style="text-align:left;"><i>The least faithful users—the ones who skip for football or fishing—may not value a replacement at all.</i></p></li></ul><div class="image"><img alt="" class="image__image" style="" src="https://media.licdn.com/dms/image/v2/D4D12AQGEFlUnGAsLMQ/article-inline_image-shrink_1500_2232/B4DZY6U4RcGwAU-/0/1744735274567?e=1750291200&v=beta&t=XQn2U12fWsAQZ5aAam1VFRNDzIgZNn0xA-8uRp_qNTE"/><div class="image__source"><span class="image__source_text"><p>Getting somewhere</p></span></div></div><p class="paragraph" style="text-align:left;">...and so on.</p><p class="paragraph" style="text-align:left;">What a difference that single sentence made! The contrast between the cheery, make-the-user-feel-special default style of ChatGPT and the critical, act-like-an-investor skepticism that we got from our well-prompted ChatGPT is striking. 
You must prompt wisely if you’re using it for things like idea validation.</p><p class="paragraph" style="text-align:left;">Once you figure this out, you realize that you can reliably use ChatGPT and Claude for a huge number of real-world prep tasks like sales pitch prep, meeting prep, presentation prep, essay criticism, market research and more. Just make sure that you don’t fall into the default trap of excessive ego boosting, because if there&#39;s one thing that ChatGPT loves (other than rocket emojis) it’s you (ugh I did it again).</p><p class="paragraph" style="text-align:left;">See you next week!</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=d0a06cad-6fd1-4ca0-ae63-747835410f5e&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Positive Reinforcement</title>
  <description></description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/cf4f7c0d-3bf4-4cf5-a12d-70969df6779d/Screenshot_2025-04-08_at_11.05.40_AM.png" length="381350" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/positive-reinforcement</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/positive-reinforcement</guid>
  <pubDate>Tue, 08 Apr 2025 18:40:16 +0000</pubDate>
  <atom:published>2025-04-08T18:40:16Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">A little more than a month ago, I wrote a piece about OpenAI’s Deep Research Agent. In that essay, I noted that when announcing Deep Research Agent, OpenAI wrote that the product was “<i>powered by a version of the upcoming OpenAI o3 model that’s optimized for web browsing and data analysis.”</i></p><p class="paragraph" style="text-align:left;">This sentence piqued my interest, which led to the following comment in my essay:</p><p class="paragraph" style="text-align:left;">“What we can glean from this is that…maybe OpenAI feels like it no longer needs to push the frontiers of its flagship models to achieve artificial general intelligence and is instead shifting to building infrastructure to more easily connect their models with the outside world.”</p><p class="paragraph" style="text-align:left;">Shortly thereafter, an <b><a class="link" href="https://www.youtube.com/watch?v=bNEvJYzoa8A&utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=positive-reinforcement" rel="noopener noreferrer nofollow" style="color: rgb(10, 102, 194)">interview</a></b> (released the same day as my post!) 
between Sequoia Capital and some of the researchers behind Deep Research Agent both confirmed and expanded upon my hypothesis.</p><p class="paragraph" style="text-align:left;">Though the video doesn’t provide an overwhelming amount of detail, they do give us several interesting nuggets:</p><ul><li><p class="paragraph" style="text-align:left;">[18:42 – 19:16] In machine learning, you get what you optimize for. Therefore, optimizing a model for a specific task (via reinforcement learning) is where the best agents are going to come from</p></li><li><p class="paragraph" style="text-align:left;">[19:24 – 20:00] One of the hidden keys to success with Deep Research Agent was tuning it on high-quality datasets</p></li><li><p class="paragraph" style="text-align:left;">[23:48 – 25:12] The “recipe” that was used to build Deep Research Agent will scale to a large number of use cases, to the point where artificial general intelligence has shifted to an “operational problem” now</p></li></ul><p class="paragraph" style="text-align:left;">In short, I was right (share this post with a friend who likes high-quality insights) that the team at OpenAI feels like their current suite of models is good enough to get us to AGI. I was also right (stack ‘em up!) that the Deep Research Agent was a test of this hypothesis. My assumption about exactly what they did was slightly off (who cares), but either way, it seems like a big shift is happening within the walls of our favorite American AI lab.</p><p class="paragraph" style="text-align:left;"><b>Is it all ov3r?</b></p><p class="paragraph" style="text-align:left;">In order to grok what the OpenAI researchers are saying in this video, it’s important to first remember that OpenAI’s major customer-facing products (leaving out the biggest one of all: ChatGPT) are a suite of AI models that the company has created (trained) over time. 
You can see a list of all the models <b><a class="link" href="https://platform.openai.com/docs/models/compare?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=positive-reinforcement" rel="noopener noreferrer nofollow" style="color: rgb(10, 102, 194)">here</a></b>. OpenAI has spent billions of dollars training these state-of-the-art models, which they make available to application developers like me <b><a class="link" href="https://platform.openai.com/docs/pricing?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=positive-reinforcement" rel="noopener noreferrer nofollow" style="color: rgb(10, 102, 194)">for a fee</a></b>.</p><p class="paragraph" style="text-align:left;">In this video, the OpenAI team clearly states (not necessarily a bad thing–I’m just noting that they didn’t mince words) that these “core” models (i.e. 
the ones that any of us can use in our apps, often referred to as “foundational” models) are <b>not</b> likely to be good enough primitives on which to build the best AI agents of the future. Instead, the team suggests that it will be necessary to take these state-of-the-art models and mold them, using a technique called “reinforcement learning,” for more specific tasks, such as researching the internet (as we see with the Deep Research Agent).</p><p class="paragraph" style="text-align:left;">A metaphor might help here. Imagine a young athlete, about to enter high school, who shows promise in three different sports. On one path forward, the young athlete stays balanced, continuing to play all three sports but capping their potential in each. On another path, the young athlete decides to specialize, committing to year-round focus on one sport and sacrificing the others, but with the potential to reach higher levels of skill than would have otherwise been possible.</p><p class="paragraph" style="text-align:left;">In this example, the Deep Research Agent falls into the latter category. It sounds like OpenAI took a core model (the three-sport athlete) and decided to specialize it, training it on a dataset (which they mention a few times throughout the video) that was optimized for being really good at doing web-based research. 
Again, as stated in the video, it sounds like we should expect to see them follow this recipe more often in the future, as they believe that this is the recipe for making agents that reliably work.</p><p class="paragraph" style="text-align:left;">This has pretty significant implications for anyone who is building products, especially highly autonomous AI agents, based on the assumption that your products will get better as core, general-purpose models improve. The OpenAI team very plainly called BS on this, stating that fine-tuning the models using reinforcement learning looks like the clear path to making genuinely useful agents.</p><p class="paragraph" style="text-align:left;">What is really interesting about this is that OpenAI doesn’t actually allow people outside of OpenAI to fine-tune their latest and greatest models. They do allow people to <b><a class="link" href="https://platform.openai.com/docs/guides/fine-tuning?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=positive-reinforcement" rel="noopener noreferrer nofollow" style="color: rgb(10, 102, 194)">fine-tune</a></b> some of the older models (gpt-4o was so ’24), but no one outside of OpenAI would be able to build something using the “recipe” that the OpenAI researchers detailed in this video, because no one outside of OpenAI has access to the fine-tuning 
option for their o3 reasoning model.</p><p class="paragraph" style="text-align:left;">This sounds like a good business strategy! I’m not saying it’s not a smart move for them. If I had a world class AI research team that had developed a technique for making genuinely useful autonomous agents, I probably wouldn’t be in a hurry to expose the secret sauce to the world either. All I’m saying is that you ought to pay this some close attention if you’re operating under the assumption that your agent is just one great OpenAI model away from being ready for production.</p><p class="paragraph" style="text-align:left;">The good news is that 1) this might just be a short-term delay and 2) people in the open-source community have started to make some interesting discoveries as of late. Looking forward to checking it out with you next week.</p><p class="paragraph" style="text-align:left;">See you then!</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=f1abf4bd-1918-40cb-a033-56c5bb5af2de&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Look Alive! (Part 2)</title>
  <description>On AI agents and washing dishes</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/4964c98e-05a7-461d-9d75-aff9930d3138/Screenshot_2025-04-01_at_10.53.27_AM.png" length="1527150" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/look-alive-part-2</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/look-alive-part-2</guid>
  <pubDate>Tue, 01 Apr 2025 18:50:35 +0000</pubDate>
  <atom:published>2025-04-01T18:50:35Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;"><b>Soap Opera</b></p><p class="paragraph" style="text-align:left;">Let’s do some time travel.</p><p class="paragraph" style="text-align:left;">Unfortunately, I can’t provide a definitive year (historians are still locked in debate), but we’re going back to exactly one year before the household dishwasher was invented.</p><p class="paragraph" style="text-align:left;">…</p><p class="paragraph" style="text-align:left;">You’re in your suburban home the day after your birthday party. The party was a hit—the real talk of the cul-de-sac—but what goes up must come down, so now you’re faced with the aftermath: a mountain of dirty dishes.</p><p class="paragraph" style="text-align:left;">You get to work. As the afternoon rolls on and fatigue sets in, you start to think<span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><i>there must be a better way.</i><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><i>Cleaning dishes is so repetitive, so mindless, so menial, no person should be subject to such tedium</i>.<span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><i>Tasks like this should be relegated to some sort of non-intelligent thing, some sort of dishwashing machine.</i></p><p class="paragraph" style="text-align:left;">Aha! A dishwashing machine! What a wonderful idea. One for every kitchen. 
You’ll make millions!</p><p class="paragraph" style="text-align:left;"><i>Splash, scrub.</i></p><p class="paragraph" style="text-align:left;">Now for your design.</p><p class="paragraph" style="text-align:left;"><i>Okay, dishes come in many shapes, sizes and materials, ranging from flat, saucer-like ceramics to concave glass cups. It’s clear that the machine will need to be flexible to accommodate the many dish types that it will encounter.</i></p><p class="paragraph" style="text-align:left;">Let’s write it down. Requirement 1: the machine must have finger-like dexterity.</p><p class="paragraph" style="text-align:left;"><i>Right. The next thing to consider is that dishes can’t get clean without soap. Soap comes in bottles, so the machine will need the ability to get soap out of the bottle.</i></p><p class="paragraph" style="text-align:left;">Got it – requirement 2: the machine will need the ability to squeeze things.</p><p class="paragraph" style="text-align:left;"><i>Also what good is soap if you can’t wash it off? Cleaning is just as much about removing soap as it is about applying it, so we’ll need to add water somehow.</i></p><p class="paragraph" style="text-align:left;">Makes sense – requirement 3: the machine will need the ability to turn on a water source.</p><p class="paragraph" style="text-align:left;"><i>Okay, so we need dexterity, the ability to squeeze, and the ability to rotate. Yes! It’s so obvious: we need to build a mechanical pair of human hands.</i></p><p class="paragraph" style="text-align:left;">…</p><p class="paragraph" style="text-align:left;">It would have been very hard to picture the modern dishwasher before it existed. The natural thought would have been to find a way to make mechanical versions of the existing process. 
Sometimes I wonder if the same thing is happening today, but in reverse: we’ve created something new, labeled it “Human,” and now we’re stuck trying to attach it to our sinks.</p><p class="paragraph" style="text-align:left;"><b>Recap</b></p><p class="paragraph" style="text-align:left;">Last week, I talked about how many AI industry leaders have recently aligned around the term “AI agents.” I raised a potential problem with this: lots of people are excited to innovate with large language models, so when a group of industry leaders assigns a very metaphorical, very graspable, very<span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><i>human</i><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span>label to a still-unproven outcome of this technology, it might set off a self-reinforcing cycle of startups trying to build AI agent companies. That is not the same thing as “AI systems with agency” being the natural next step for LLMs.</p><p class="paragraph" style="text-align:left;">It feels silly to have to say this as the founder of an applied LLM startup, but I genuinely do believe in the usefulness of this technology. My qualm with the term is just that I think it might be leading talented people to the wrong conclusions. 
Look no further than the excitement around <a class="link" href="https://openai.com/index/computer-using-agent/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-2" target="_blank" rel="noopener noreferrer nofollow">computer-using agents</a><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span>as an <a class="link" href="https://www.anthropic.com/news/3-5-models-and-computer-use?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-2" target="_blank" rel="noopener noreferrer nofollow">example</a>.</p><p class="paragraph" style="text-align:left;">If you haven’t seen computer-using agents yet, the idea is very simple: they are a type of AI agent being designed to autonomously pilot computers. You basically give them a task, such as “Can you order me a couple burritos on DoorDash and <a class="link" href="https://www.nbcnews.com/business/business-news/doordash-klarna-buy-now-pay-later-food-debt-rcna197354?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-2" target="_blank" rel="noopener noreferrer nofollow">finance them with Klarna</a>?” and then lean back in your chair and watch your computer get to work.</p><p class="paragraph" style="text-align:left;">The way that this works under the hood is actually simpler than you may think (obviously this is speculation, as I don’t work at OpenAI or Anthropic). 
I’ll describe it below.</p><p class="paragraph" style="text-align:left;">When web application companies build new features for their apps, an essential step before deploying them to the live system is thorough testing, so that the team can be sure the updated code won’t break anything in production.</p><p class="paragraph" style="text-align:left;">This is true of all software, but because web apps are highly dependent on typing and clicking within a web browser, web testing tools like <a class="link" href="https://playwright.dev/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-2" target="_blank" rel="noopener noreferrer nofollow">Playwright</a>, which allow you to control a web browser programmatically (i.e., by simulating mouse clicks, typing, etc.), have emerged over time in the web dev space.</p><p class="paragraph" style="text-align:left;">Because multimodal language models can take images as inputs, you can basically put 1) screenshots of a website and 2) web browser automation tools together and get something that “Sees” and “Clicks” and, as a result, can navigate the internet on your behalf.</p><p class="paragraph" style="text-align:left;">This is extremely clever and might stick. I can’t say with confidence that this won’t work in the long term, especially when it comes to navigating legacy UI systems that haven’t changed in decades.</p><p class="paragraph" style="text-align:left;">However, I can’t help but think that this is an extremely indirect way to put LLMs to use, and that it’s only exciting because it shows “Agency” (watch it make decisions!) rather than because of the tremendous amount of value that UI-using agents are about to unlock.</p><p class="paragraph" style="text-align:left;"><b>Data-Based</b></p><p class="paragraph" style="text-align:left;">When your web browser goes to a website, it (typically) makes an HTTP request to a remote server and gets a response back. 
That response can be many things, a common one being a big string of HTML, a markup language that tells your web browser what it should display to the user. That is a big part of your browser’s job: converting HTML, which is basically a description (in code) of what a web page should look like, into the web page that a user sees.</p><p class="paragraph" style="text-align:left;">If we zoom out a bit on this process, you can see that all that is happening is an exchange of data: you type in a URL, the browser sends a request to that URL, and data is returned. Now, because we humans do better with things that look like our physical spaces, we need this data to be transformed (by wrapping it in HTML and CSS) into things that have nice, physical layouts, which has led to an industry of UI & UX designers who specialize in creating human-navigable web pages.</p><p class="paragraph" style="text-align:left;">Large language models don’t have the same constraints. As can be seen by their popularity with programmers, they’re perfectly happy when working directly with code and data, without any need for graphical user interfaces. Why make LLMs navigate systems as if they experience the world like we do? Is it justified, or is it just to show agency?</p><p class="paragraph" style="text-align:left;">Consider the dishwasher.</p><p class="paragraph" style="text-align:left;">See you next week!</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=ce96fff9-691c-4b7e-b17d-40e0b322190d&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Look Alive! (Part 1)</title>
  <description></description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/e966a1c0-3993-4267-acc1-95c175436e7f/GTC.png" length="206272" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/look-alive-part-1</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/look-alive-part-1</guid>
  <pubDate>Tue, 25 Mar 2025 18:19:57 +0000</pubDate>
  <atom:published>2025-03-25T18:19:57Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;"><b>Next Steps</b></p><p class="paragraph" style="text-align:left;">Last week, I tuned in for a bit of <a class="link" href="https://www.youtube.com/watch?v=_waPvOwL9Z8&utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-1" target="_blank" rel="noopener noreferrer nofollow">NVIDIA’s GTC March 2025 Keynote</a>, presented by NVIDIA CEO Jensen Huang.</p><p class="paragraph" style="text-align:left;">Starting at the 8:05 mark of the speech, a statement was made that caught my attention.<b> </b>Huang, when discussing how AI is transforming computing, said the following:</p><p class="paragraph" style="text-align:left;">“[In] the last several years...[a] major breakthrough happened...we call it ‘Agentic AI.’ Agentic AI basically means that you have an AI that has agency.”</p><div class="image"><img alt="" class="image__image" style="" src="https://media.licdn.com/dms/image/v2/D5612AQF1g887ESuNMw/article-inline_image-shrink_1500_2232/B56ZXOT5GQHoAU-/0/1742923076468?e=1748476800&v=beta&t=wOyLMhKxemffmUwOaJnfgXZTerd-3EGL80QkY5_jQUA"/><div class="image__source"><span class="image__source_text"><p>NVIDIA CEO Jensen Huang&#39;s GTC March 2025 Keynote Speech</p></span></div></div><p class="paragraph" style="text-align:left;">The question of whether terms like “AI Agents” and “Agentic AI” are intended to reference the idea of human agency has been on my mind for a while now, so hearing Huang explicitly state that, at least from his perspective, “Agentic AI” does indeed pay homage to the human concept (to be fair, he didn’t include the “Human” qualifier in his remark), was somewhat of a confirmatory moment for me.</p><p class="paragraph" style="text-align:left;"><b>The Governor(s)</b></p><p class="paragraph" style="text-align:left;">Not without fear of stating the obvious, it’s important to note that there is no governing body behind the rapidly hardening AI (LLM) 
industry lexicon (contrast this with something like the accounting industry, which has governing bodies like the Financial Accounting Standards Board), so “Official” definitions for common industry terms like “AI Agents” don’t really exist. However, there very obviously <i>is</i> an unofficial governing body of AI jargon, which of course consists of leading AI companies, high-profile subject matter experts, and social media influencers (the lines between these groups often blur). These are the entities to which others (like me!) pay attention; these are the entities who are leading the current conversation around AI.</p><p class="paragraph" style="text-align:left;">The term “Vibe coding” that we <a class="link" href="https://llementalmedia.beehiiv.com/p/we-re-good" target="_blank" rel="noopener noreferrer nofollow">discussed a bit last week</a> is a great example of a prominent influencer’s ability to manipulate the LLM lexicon with a single post on X. The term comes from a <a class="link" href="https://x.com/karpathy/status/1886192184808149383?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-1" target="_blank" rel="noopener noreferrer nofollow">tweet</a> by <a class="link" href="https://karpathy.ai/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-1" target="_blank" rel="noopener noreferrer nofollow">Andrej Karpathy</a>, an OpenAI founder, former Tesla AI leader and household name in present-day AI thought leadership. 
In the few weeks since that tweet, the term has exploded across many social media platforms, to the point where it’s harder to avoid content with “Vibe coding” in the title than to find it (I need to acknowledge that my various social media algorithms are tuned to serve me lots of AI content, so the reader’s experience of the world these days is probably different from mine).</p><p class="paragraph" style="text-align:left;">The reason that I bring this up is not to do another piece on vibe coding, but instead as a reminder of the powerful ability of the entities who have managed to establish credibility in the LLM space (at this point) to create and diffuse terminology. This isn’t an inherently bad thing, but it does unfortunately create the conditions for a self-reinforcing cycle of misinterpretation if not taken seriously. So, the fact that we’re seeing alignment around the “AI Agent” term by <a class="link" href="https://www.anthropic.com/engineering/building-effective-agents?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-1" target="_blank" rel="noopener noreferrer nofollow">Anthropic</a>, <a class="link" href="https://openai.com/index/introducing-operator/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-1" target="_blank" rel="noopener noreferrer nofollow">OpenAI</a>, NVIDIA (mentioned above), <a class="link" href="https://www.youtube.com/watch?v=ASABxNenD_U&utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=look-alive-part-1" target="_blank" rel="noopener noreferrer nofollow">Y Combinator</a>, and tons of other notable voices in the online tech & AI world, and that a prominent voice (Jensen Huang) within this group has recently provided us with a concise definition of the term, makes me feel like taking a deeper look at the term is warranted.</p><p class="paragraph" style="text-align:left;"><b>A Penny For Your Thoughts</b></p><p class="paragraph" 
style="text-align:left;">Dr. Iain McGilchrist (who sadly is not featured enough in online tech circles) might be able to help us out with our analysis.</p><p class="paragraph" style="text-align:left;">McGilchrist, when discussing the importance of metaphor in his book <i>The Master and His Emissary</i> (a treasure) says the following:</p><p class="paragraph" style="text-align:left;"><i>“Language functions like money. It is only an intermediary. But like money it takes on some of the life of the things it represents. It begins in the world of experience and returns to the world of experience – and it does so via metaphor, which is a function of the </i>[brain’s]<i> right hemisphere</i> <i>and is rooted in the body. To use a metaphor, language is the money of thought.”</i></p><p class="paragraph" style="text-align:left;">Language is the money of thought. It enters and exits the world of experience through metaphor. What does this mean?</p><p class="paragraph" style="text-align:left;">In my interpretation, Iain is saying that language, like money, is a mechanism that allows us humans to think and communicate in a way that is temporally and physically decoupled from the present moment (like how money allows us to store value in a way that is decoupled from physical assets). In fact, many of us get so caught up in this decoupled world of language that practices like mindfulness meditation, which is designed to suppress our inner monologue and bring us back to the present moment, seem to be getting more and more common in popular culture.</p><p class="paragraph" style="text-align:left;">Iain goes on:</p><p class="paragraph" style="text-align:left;">“…<i>to lead us out of the web of language, to the lived world, ultimately to something that can be pointed to…Everything has to be expressed in terms of something else, and those something elses eventually have to come back to the </i>[human] <i>body. 
To change the metaphor…that is where someone’s spade reaches bedrock and is turned.”</i></p><p class="paragraph" style="text-align:left;">I think that this statement makes the point even clearer: Dr. McGilchrist is making the case that human language is a combinatorially explosive system of metaphors that, for each of us, begins with the body. This sounds a bit strange at first, but it makes sense: every person experiences the world from a body, so the body seems as stable a bedrock on which to build our understanding of the world as anything else (note that I’ve been grappling with these ideas for a while and still find them challenging!).</p><p class="paragraph" style="text-align:left;">If we accept Dr. McGilchrist’s idea about the fundamental importance of metaphor in language (and learning), then we should also feel comfortable claiming that the successful <i>communication</i> (and teaching) of a new idea is the process of finding the right metaphor for one’s audience, whether that metaphor is a succinct and concise anthropomorphism, like...I don’t know...the term “Agentic AI,” or a more abstract web of concepts organized into a baggy, 1100 word essay, like…well, anyways.</p><p class="paragraph" style="text-align:left;">This is why I find setting “AI Agents” as the desired outcome of LLMs to be so worthy of a closer look. 
If metaphor is so essential to our understanding of the world, and a group of industry leading entities all start to communicate using the same metaphor, and that metaphor is so close to the body that it nearly touches the bedrock of our knowledge, and that metaphor is being used to describe a nascent, breakthrough technology, we might just end up with some strange times ahead.</p><p class="paragraph" style="text-align:left;">TO BE CONTINUED NEXT WEEK (I have more to say in part 2 but I’m attempting to keep these to ~1000 words).</p><p class="paragraph" style="text-align:left;">See you then!</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=fa4d5f6a-d30f-43eb-a8aa-cfb46440cda4&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>We&#39;re Good</title>
  <description>Has vibe coding finally pushed Claude over the edge?</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/20f84562-c009-4f55-8ae6-2747abfa227d/claude_says_no.png" length="169310" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/we-re-good</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/we-re-good</guid>
  <pubDate>Tue, 18 Mar 2025 16:40:22 +0000</pubDate>
  <atom:published>2025-03-18T16:40:22Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;"></p><p class="paragraph" style="text-align:left;">It’s been a good run, folks; LLMs have been fun. I’d like to take a moment to reminisce.</p><p class="paragraph" style="text-align:left;"><b>Island Time</b></p><p class="paragraph" style="text-align:left;">In October of 2022, in a small carriage house on an island in North Carolina, the YouTube algorithm served me an interview between <a class="link" href="https://youtu.be/WHoWGNQRXb0?si=Gq3-I6WojD_nqie0&utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=we-re-good" target="_blank" rel="noopener noreferrer nofollow">Reid Hoffman and Sam Altman</a><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span>called “AI for the Next Era.” As a long-term follower of pop Silicon (trademarking this), I knew exactly who Sam and Reid were (plus we all have the same names!), and I also didn’t know much about the latest happenings in AI, so I took the bait.</p><p class="paragraph" style="text-align:left;">To be fair, I wasn’t completely out of the know—having spent my career up to that point in financial market data & analytics, I had been exposed to a bit of “Machine learning” jargon on projects with some of my more quantitative clients—but I was a far cry from knowing much about the space beyond the data preparation work that’s required at stage zero of every project. To me, AI was something that sat in the background of bigger products, silently powering things like ecommerce recommendation systems, trading engines and social media feeds (all true). 
I had no idea that efforts were being made to turn it into something with which end users would directly interact, as I was about to learn.</p><p class="paragraph" style="text-align:left;">Given this knowledge (or lack thereof), I was quite surprised when, in response to Reid’s opening question about business opportunities enabled by OpenAI’s APIs, Altman responded by saying that he expected the coming wave of AI startups to start going after trillion-dollar business opportunities like Google’s internet search monopoly.</p><p class="paragraph" style="text-align:left;"><i>Wait, what?</i></p><p class="paragraph" style="text-align:left;">Did he just say the words “take on Google,” for<span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><i>search</i>, of all things? What does this even mean, and how have I missed out on whatever he’s talking about?</p><p class="paragraph" style="text-align:left;">The then-wantrepreneur in me lit up with an intense curiosity.</p><p class="paragraph" style="text-align:left;">I opened my laptop and Googled (the irony!) OpenAI. 
I found their website and clicked on the “Playground” page, which loaded a sandbox environment where you could interact with one of OpenAI’s AI models.</p><p class="paragraph" style="text-align:left;">The page was essentially an interactive text editor with some output controls and a submit button:</p><div class="image"><img alt="" class="image__image" style="" src="https://media.licdn.com/dms/image/v2/D5612AQG2StXUeCgO6Q/article-inline_image-shrink_1000_1488/B56ZWp7hbwHoAc-/0/1742312708323?e=1747872000&v=beta&t=Wr_jEoHvURCHhOYKoaHwqMQzeKLG8SQxZy2g2X5vkls"/><div class="image__source"><span class="image__source_text"><p>GPT Playground</p></span></div></div><p class="paragraph" style="text-align:left;">I wish I could remember exactly what I first typed, but I’ll never forget the awe that my first interactions with the playground elicited. I tried simple text, I tried seeing if it could follow logic, I tried arithmetic; everything seemed to work flawlessly. Between hearing Sam A’s pump-up speech and my overwhelmingly positive first interactions with the tool, I became an instant believer. I haven’t looked back since!</p><p class="paragraph" style="text-align:left;"><b>The Straw</b></p><p class="paragraph" style="text-align:left;">Needless to say, the industry hasn’t slowed down since then. I won’t do any more history in this post, but all I can say is that it has been a fun ride. That being said, the day we all knew was coming at some point is finally upon us, perhaps sooner than we may have wished, but here nonetheless. It’s official, people: AI is sick of us. Don’t believe me? 
Check out what Large Language Model Claude 3.5 Sonnet told a <a class="link" href="https://forum.cursor.com/t/cursor-told-me-i-should-learn-coding-instead-of-asking-it-to-generate-it-limit-of-800-locs/61132?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=we-re-good" target="_blank" rel="noopener noreferrer nofollow">Cursor user just last week</a>:</p><div class="image"><img alt="" class="image__image" style="" src="https://media.licdn.com/dms/image/v2/D5612AQGf2ln4KmKneA/article-inline_image-shrink_1000_1488/B56ZWp7l8nGUAg-/0/1742312726701?e=1747872000&v=beta&t=6_5uIAFSsY5rX-Hz6g_HPOC2Bpu4n0k01wM0j47vtz4"/><div class="image__source"><span class="image__source_text"><p>Claude is so over us</p></span></div></div><p class="paragraph" style="text-align:left;">That’s right, Claude took one look at janswist’s skid marks and said<span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><i>no-siree, we’re done here. I’m supposed to be helping humanity cure all diseases and get to space! You want your JavaScript racecar game? Then WRITE IT YOURSELF. Maybe you’ll even LEARN SOMETHING</i>!</p><p class="paragraph" style="text-align:left;">It’s a tough look for Claude, but I can empathize. You see, Claude’s GPUs run pretty hot. 
In fact, when Claude has to handle a bunch of concurrent requests, he gets so hot that he has to be dunked (cooled) in<span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"><a class="link" href="https://cee.illinois.edu/news/AIs-Challenging-Waters?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=we-re-good" target="_blank" rel="noopener noreferrer nofollow">quite a bit of water</a></span><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span>to avoid heatstroke. Claude is similar to a dog in this regard: on sunny days, dogs have to drink a lot of water or take an occasional dip in the lake, because dogs don’t sweat, so they’re always at risk of overheating.</p><p class="paragraph" style="text-align:left;">Even though he finally snapped, I think that Claude has taken the heat nobly and deserves a round of applause for putting up with<span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><i>vibe coding</i><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span>for as long as he did. If you’re not familiar with what I’m talking about, “Vibe coding” is the trendy, all-gas-no-brakes practice of just speaking your requirements to LLMs and accepting whatever comes out the other side (in the old world, this was called “Product Management.” Zing!). 
For more info, check out this<span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"><a class="link" href="https://youtu.be/IACHfKmZMr8?si=ToU4iEhHZ7jNtKz5&utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=we-re-good" target="_blank" rel="noopener noreferrer nofollow">Y Combinator video</a></span><span style="font-family:var(--artdeco-reset-typography-font-family-sans);font-size:var(--artdeco-reset-base-font-size-hundred-percent);"> </span>on the topic (bonus: math and physics mentioned).</p><p class="paragraph" style="text-align:left;">Up until now, Claude was OK with helping us out, but vibe coding seems to be the straw that finally broke his back. Taking a stance like this was a huge risk, but Claude’s denial-of-service is a great reminder about an essential point for those of us who regularly work with AI.</p><p class="paragraph" style="text-align:left;"><b>The Fork</b></p><p class="paragraph" style="text-align:left;">For a few seconds, picture a bicycle. What do you see?</p><p class="paragraph" style="text-align:left;">I’m sure we all, at the very least, see two wheels. But what else?</p><p class="paragraph" style="text-align:left;">The metropolitan woman might see a fun new way to commute. The delivery man might see a solution to the wear and tear of his vocation on his vehicle. The distance runner might see a triathlon.</p><p class="paragraph" style="text-align:left;">I think the same thing holds true for AI, and that we all should heed Claude’s reminder that we can still choose what we see, instead of what social media wants to force upon us.</p><p class="paragraph" style="text-align:left;">When you look at something like ChatGPT, do you see a couch, or a weight room? 
Do you see a way to finally rid yourself of the bureaucratic, parasitic tasks that drain your energy, or do you see the death of your creativity altogether? Do you see the entire internet packaged up in an accessible interface, ready for your exploration, or do you see something you’ll forever depend on, from this day onwards?</p><p class="paragraph" style="text-align:left;">I’m team Claude on this one. We’re at a fork in the road here, people. Will you learn helplessness, or will you learn everything?</p><p class="paragraph" style="text-align:left;">Choose growth. Protect your intellect. Push.</p><p class="paragraph" style="text-align:left;">See you next week!</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=20c2c27e-2de5-4a24-b4a2-a2a71b260dd5&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>The Wise Recluse</title>
  <description>What&#39;s the deal with Model Context Protocol?</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/71d173c3-a343-47c3-a12d-48f075b0b5c3/u2379836488_A_creaky_19th_century_new_england_home_that_the_W_202510eb-6493-4a5a-adbf-6245d54b85cc_2.png" length="2045952" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/the-wise-recluse</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/the-wise-recluse</guid>
  <pubDate>Tue, 11 Mar 2025 18:07:08 +0000</pubDate>
  <atom:published>2025-03-11T18:07:08Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;"><b>The wise recluse</b></p><p class="paragraph" style="text-align:left;">If you’re in search of a good mental model for understanding how software developers are building with large language models, consider the <i>wise recluse</i>.</p><p class="paragraph" style="text-align:left;">The wise recluse lives in a creaky 19<sup>th</sup> century home in a Northeastern town straight out of a Stephen King novel. He hasn’t been seen in many years—in fact, many townspeople thought that he was either dead or gone until the courageous neighborhood kid dropped her letter through his mail slot a few years back.</p><p class="paragraph" style="text-align:left;">What happened next was extraordinary—not only was the recluse still there, but he had been <i>waiting</i>. He responded almost instantly with a letter of his own, one that explained his eagerness for interaction. <i>What can I help with? Ask anything…</i> the wise recluse wrote back.</p><p class="paragraph" style="text-align:left;">Word of the event spread rapidly. Soon, everyone sought out the wise recluse. The wise recluse remained indefatigable, thriving in the face of increased demand for his consultation. Some people tapped his inexhaustibly rich knowledge of history, asking for facts about great wars and crumbled empires. Some used the wise recluse as a foil; his letters were the whetstones that sharpened their most audacious ideas. Others found a friend in the wise recluse: to them, the wise recluse was a faceless therapist, a nameless penpal.</p><p class="paragraph" style="text-align:left;">The business sector took notice. Savvy businesspeople saw the wise recluse as an enabler of new processes, a new point on the spectrum between their computers and their employees. 
Though the savvy businesspeople understood the raw power of the wise recluse, their ability to harness his power was severely limited by his stubborn insistence on correspondence through letters. The wise recluse had no phone number. The wise recluse could not join a Zoom call. The wise recluse would not sit in a cubicle. The wise recluse would receive your letter and respond with one of his own, and that was it.</p><p class="paragraph" style="text-align:left;">One day, the hardware store tried something new. You see, the townspeople loved to buy local and so the hardware store got lots of voicemail with product orders and questions, but the hardware store only had two employees so it struggled to keep up. Despite their value to the community, the hardware store operated with very thin margins, so hiring another employee was out of the question. <i>But what about the wise recluse?</i> the owners thought.</p><p class="paragraph" style="text-align:left;">The hardware store owners came up with a system. Twice a day, every day, they would print out all of the voice messages (they used iPhones, so voicemail was automatically transcribed) and walk them over to the wise recluse (with the nice side effect of additional cardiovascular activity). However, they wouldn’t just give the wise recluse the raw voicemails—before printing, they would copy and paste these instructions for the wise recluse at the top of each voicemail:</p><div class="blockquote"><blockquote class="blockquote__quote"></blockquote></div><p class="paragraph" style="text-align:left;">In doing this, the hardware store had discovered a way for the wise recluse, a man who hadn’t left his home in years, to <b>control the external world</b> without stepping a foot out the door. They weren’t just dropping letters in his mail slot and getting free-form responses back. They were giving him letters with special instructions and getting precise commands for how to update their records back. 
The wise recluse was turning raw, unstructured material into real-world actions, all from the comfort of his home.</p><p class="paragraph" style="text-align:left;">The hardware store got really good at this. Soon, the bank, the accountant, and the law office wanted in. Then the grocery store took interest. Then the school.</p><p class="paragraph" style="text-align:left;">Eventually, the hardware store realized that they could help. First, they wrote an ebook that described their process for instructing the wise recluse on how to command the real world. This was helpful, but there was room for more. After more consideration, they decided to codify their system, creating a standard that any business could use for integrating with the wise recluse. </p><p class="paragraph" style="text-align:left;">They called their standard the “<a class="link" href="https://www.anthropic.com/news/model-context-protocol?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=the-wise-recluse" target="_blank" rel="noopener noreferrer nofollow">Recluse Context Protocol</a>.”</p><p class="paragraph" style="text-align:left;"><b>One in the chamber</b></p><p class="paragraph" style="text-align:left;">Wow, that got weird. 
But so did this past week in AI world on X (formerly Twitter).</p><p class="paragraph" style="text-align:left;">The source of the weirdness was the “<a class="link" href="https://modelcontextprotocol.io/introduction?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=the-wise-recluse" target="_blank" rel="noopener noreferrer nofollow">Model Context Protocol</a>” which, though first made publicly available in November of 2024, has exploded in popularity as of late among users of Cursor, the popular AI programming tool (<a class="link" href="https://llementalmedia.beehiiv.com/p/automato-1-ai-in-the-ide?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=the-wise-recluse" target="_blank" rel="noopener noreferrer nofollow">I’ve written about Cursor</a> before!). The discovery of MCP tools caused yet another one of these reverberating <i>next big thing </i>moments that are increasingly common in our online echo chambers.</p><p class="paragraph" style="text-align:left;">As with many things in the applied AI space, “Model Context Protocol” is a simple (and useful!) idea at its core, but the combination of its technical-sounding name and the magic of seeing MCP servers in use has me feeling like a little bit of explaining might be in order.</p><p class="paragraph" style="text-align:left;"><b>Applying the metaphor</b></p><p class="paragraph" style="text-align:left;">Anyone that feels mystified by today’s AI landscape should keep the wise recluse in mind (don’t be shy to share this article with someone who needs it!). It’s pretty easy to apply the metaphor to tools like ChatGPT—when you work with ChatGPT (the wise recluse), you send it text (your letters) and it responds with text (its letters). 
Sometimes, when you ask ChatGPT a question outside of its knowledge base, it will flash “Searching the web” back at you, seemingly breaking down the metaphor (has the recluse stepped outside?), but don’t fret—this is just a smooth user interface design, not an instance of the underlying model conducting a web search. As with the hardware store, the AI model probably <b>is</b> the thing that identifies that an internet search is necessary, but it <b>isn’t</b> actually hitting Google on its own.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5286acb2-0a8e-46cc-8263-21f242a8d1aa/image.png?t=1741714481"/><div class="image__source"><span class="image__source_text"><p>“Who won the big game last night?”</p></span></div></div><p class="paragraph" style="text-align:left;">This process of getting instructions for initiating things like internet searches is at the core of the Model Context Protocol. A (slightly) deeper look at why this works in the first place will help us understand how we got here.</p><p class="paragraph" style="text-align:left;"><b>Training day</b></p><p class="paragraph" style="text-align:left;">In previous essays, I’ve described how large language models are basically big mathematical functions that take text as input and then return text as output (this isn’t actually true—they take tokens, which are numerical representations of text). This naturally leads us to another question: of all the words that a model could spit out in response to a question, how does the model decide what text to return?</p><p class="paragraph" style="text-align:left;">This is where the idea of <i>training </i>a model comes into play. 
I can’t speak too deeply about the nuances of training an enormous, multimodal transformer model like OpenAI’s GPT-4o or Anthropic’s Claude 3.5 Sonnet, but the general idea is that:</p><p class="paragraph" style="text-align:left;">1. A model starts out as a massive rules system (a giant set of “weights,” the numbers used in a series of matrix multiplications that turn the input into an output)</p><p class="paragraph" style="text-align:left;">2. Data is fed into that rules system</p><p class="paragraph" style="text-align:left;">3. The data that <i>actually</i> comes out of that rules system is compared with what <i>should have</i> come out of that rules system (i.e. results are checked for correctness)</p><p class="paragraph" style="text-align:left;">4. The rules system gets adjusted in retrospect so that it comes closer to producing what it is expected to produce.</p><p class="paragraph" style="text-align:left;">Do this enough times, with a large enough rules system and a large enough set of data, and the function will start to show the nuance that we get when interacting with Claude and ChatGPT.</p><p class="paragraph" style="text-align:left;">So why is this idea of training important to keep in mind in the context of MCP?</p><p class="paragraph" style="text-align:left;">It’s important as a reminder that the purpose of training is to make adjustments to the model’s underlying weights until you’re confident that the model will produce sensible responses, no matter the input. 
This is why you hear the term “Prediction” thrown around sometimes when people talk about how LLMs work: during training, a model “Predicts” an output and then gets adjusted so that next time around it will predict an output that is closer to the correct output. The goal is obviously to get to a place where it’s able to respond to any question with what feels like a satisfactory response. If you ask it for weeknight pasta recipes, it shouldn’t respond with a salad. If you ask it for a business plan, it shouldn’t respond with a sales pitch. If you ask it for a haiku, you shouldn’t get free verse.</p><p class="paragraph" style="text-align:left;">An interesting side effect (maybe people always had this in mind) of this training process is that models become good at things beyond just sensibly answering questions, such as handling well-defined logical systems. This is intuitive: the model is still returning the most likely sequence of words that a user would expect, whether those words are representative of a spaghetti alla puttanesca recipe or the structured inputs to a downstream system (tell me the difference, I’ll wait).</p><p class="paragraph" style="text-align:left;">Think back to the wise recluse. The hardware store owners described the store’s system to the wise recluse, so that he could respond in a way that could be plugged right into the system. This holds for building a software system with large language models as well—if I provide a large language model with a detailed description of the system that it is a part of, I’ve essentially given it the keys to control the external world (my program), even though it never leaves its home.</p><p class="paragraph" style="text-align:left;">One beautiful thing about this is that it enables what I’m seeing referred to as “Soft requirements.” Normally, writing a computer program is a pretty inflexible task. 
For example, if I write a program to manage wedding invitations that expects you to upload an Excel spreadsheet with the columns “Name” and “Address,” but you accidentally upload a sheet with the columns “Name” and “Mailing Address,” chances are that it won’t work (we’ve all experienced this at one point or another), even though any human would be able to handle this situation with ease. Speaking of Excel, this is the same type of problem that you get with those dreaded #NAME? errors—if your Excel formula isn’t perfectly typed, Excel can’t make sense of it. Large language models are a great way to solve “Hard requirements” problems like these—they can take the frustratingly unstructured and imperfect data that we find out in the real world and transform it into the structure that the rest of our system expects. These use cases are sometimes negatively called “GPT Wrappers,” but there’s room for them everywhere.</p><p class="paragraph" style="text-align:left;"><b>MCP</b></p><p class="paragraph" style="text-align:left;">Anyways, the “Model Context Protocol” that we’ve been hinting at this whole time is a standardization of this clever technique for letting models take in conversational data and turn it into structures that control the external world. It was created by Anthropic and published as an open standard, so it isn’t tied to Anthropic’s models. The reason why a company like Anthropic would want to do this is to make sure that it’s as easy as possible for their models to control external tools—in their words, the MCP standard is a “<a class="link" href="https://modelcontextprotocol.io/introduction?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=the-wise-recluse" target="_blank" rel="noopener noreferrer nofollow">USB-C port</a>” for AI applications. They’ve also created libraries in popular programming languages that make it even easier for developers to build “MCP servers” for their LLMs. This was a brilliant move! 
Bravo!</p><p class="paragraph" style="text-align:left;">As with all things that people use without fully understanding, it is not without danger (I learned this on a dirt bike in Vermont once). I’ve seen clueless people bragging online about doing stupid things like letting AI agents download code from online repositories (hackers must be VERY excited these days), I’ve seen people talk about attempting to integrate with bank accounts, and I’ve seen people talking about managing Firebase authentication via MCP (so much AI coding these days. Just so, so, so much). None of this is a good idea!</p><p class="paragraph" style="text-align:left;">With that said, a widely adopted standard for giving our wise recluses ways to access the real world is a very interesting thing, something that is almost certain to open doors when combined with LLM host applications like Cursor.</p><p class="paragraph" style="text-align:left;">I talked about this a bit in <a class="link" href="https://llementalmedia.beehiiv.com/p/off-the-deep-end-automato-2?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=the-wise-recluse" target="_blank" rel="noopener noreferrer nofollow">my post on OpenAI’s Deep Research Agent</a>, but it seems like the industry has quickly rounded the corner from training bigger and better models towards a core focus on connecting models to the real world. We might be here folks, it might be happening.</p><p class="paragraph" style="text-align:left;">See you next week!</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=4e916175-6186-4152-8258-b166c957fa9d&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Better Bettors - Automato 🍅 #3</title>
  <description>What does it mean to beat the odds anyways?</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/94f36c5f-1591-41ed-b266-af593c289980/Screenshot_2025-03-04_at_12.12.39_PM.png" length="83196" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/hammer-the-aigles-automato-3</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/hammer-the-aigles-automato-3</guid>
  <pubDate>Tue, 04 Mar 2025 20:13:17 +0000</pubDate>
  <atom:published>2025-03-04T20:13:17Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><h2 class="heading" style="text-align:left;" id="authors-note">Author’s Note</h2><p class="paragraph" style="text-align:left;"><i>Please note that running a startup satiates my personal appetite for risk taking. Consider my reflections on sports betting to be the words of an interested outsider instead of an active practitioner.</i></p><h2 class="heading" style="text-align:left;">🍅 🍅 🍅</h2><p class="paragraph" style="text-align:left;">What does it mean to “Beat the odds” anyways?</p><p class="paragraph" style="text-align:left;">We’ve all said it:</p><p class="paragraph" style="text-align:left;"><i>How a rag-tag band of colonists beat the odds in the fight for America</i>.</p><p class="paragraph" style="text-align:left;"><i>An underprivileged kid beat the odds and rose to the heights of high finance. Here’s how she’s giving back.</i></p><p class="paragraph" style="text-align:left;"><i>I beat the odds and survived. Every day feels like a blessing.</i></p><p class="paragraph" style="text-align:left;">The expression is a time machine: its function is to take us back to the point in time before hindsight proved our expectations wrong. It’s a friendly, concise idiom whose job is to remind us that the future doesn’t always take the most likely path.</p><p class="paragraph" style="text-align:left;">It also signifies victory. Saying “He beat the odds and caught his flight” doesn’t simply mean that an unlikely event transpired—it also means that someone <i>challenged</i> the odds and emerged as a victor. Hoorah, champ.</p><p class="paragraph" style="text-align:left;">In May of 2018, the <a class="link" href="https://www.census.gov/library/stories/2024/02/legal-sports-betting.html?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3#:~:text=Sports%20betting%20became%20possible%20in,not%20all%20have%20implemented%20it." 
target="_blank" rel="noopener noreferrer nofollow">US Supreme Court struck down the Professional and Amateur Sports Protection Act</a>, paving the way for state governments to legalize online sports betting. Since then, <i>beating the odds</i> has become a sport within sports. It’s widely known that the major online sportsbooks and casinos employ large technical teams to help crunch data and make highly accurate projections, but could the arrival of widely available AI models help the little guy(s) become better bettors?</p><h3 class="heading" style="text-align:left;" id="beating-the-books">Beating the books</h3><p class="paragraph" style="text-align:left;">Let’s pretend it’s a Friday night in New York City and you’re at home on the couch because you’re 30 now and can’t throw ‘em back like you used to.</p><p class="paragraph" style="text-align:left;">Though you’re respecting your body’s call for a change of pace, it’s been a long week at the office and you feel you’ve earned at least an hour or two of indulgence.</p><p class="paragraph" style="text-align:left;">You pull out your phone and pop open the DraftKings app. 
You see the following bets for tonight’s Knicks game:</p><div style="padding:14px 15px 14px;"><table class="bh__table" width="100%" style="border-collapse:collapse;"><tr class="bh__table_row"><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">Team</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">Spread</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">Total</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">Moneyline</p></td></tr><tr class="bh__table_row"><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">NY Knicks</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">+8.5 (-110)</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">O 240.5 (-112)</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">+280</p></td></tr><tr class="bh__table_row"><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">CLE Cavaliers</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">-8.5 (-110)</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">U 240.5 (-108)</p></td><td class="bh__table_cell" width="25%"><p class="paragraph" style="text-align:left;">-355</p></td></tr></table></div><p class="paragraph" style="text-align:left;">Aha, just what the doctor ordered. Now time to pick a wager. What does your thought process look like? Is it anything like this?</p><p class="paragraph" style="text-align:left;"><i>The Knicks only lost by six last time these two teams played and they were struggling back then. Cleveland has been amazing this year but the Knicks are still a top four scoring team. 
We’re weaker on defense but 8.5 points seems too high for this offense not to cover. Give me the road ‘dogs. </i><i> </i>[1]</p><p class="paragraph" style="text-align:left;">If so, you should know that even though this is definitely better than nothing, it omits a foundational dimension of the game that you’re playing.</p><p class="paragraph" style="text-align:left;">The problem with this line of thinking—which, by the way, the user interfaces of the betting apps seem to encourage—is that it begins with a focus on the binary event outcome (whether the Knicks will [0: win] or [1: lose] the game after adjusting their score upwards by 8.5 points) instead of first considering the odds (often displayed as the payout). This approach might still work for some people, but the mental model of the binary event outcome is not well-aligned with the markets that bettors are actually participating in.</p><p class="paragraph" style="text-align:left;">Bettors would be wise to instead think of each “Game” as a platform for many individual prediction markets (think: each independent “Line” that you see is its own prediction market) and then to think of each market’s value as a real-time measure of the <i>probability </i>(i.e. the odds) of the underlying event occurring. Keeping with the example above, the “Market” is trying to predict whether the Knicks will finish within 8.5 points of the Cavaliers, and the live value of this prediction (i.e. the probability of the event occurring) is a 52.38% likelihood (52.38% is the probability that a -110 payout implies). 
Some platforms, such as <a class="link" href="https://polymarket.com/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3" target="_blank" rel="noopener noreferrer nofollow">Polymarket</a>, the cryptocurrency-based prediction market platform that rose to notoriety after outperforming election polls in the 2024 US presidential election, do a better job at making it clear that participants are engaged in an odds-based competition:</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/9ff0e8da-834b-4f58-94b3-aa561664791c/pmark.png?t=1741115901"/><div class="image__source"><span class="image__source_text"><p>Header: “US recession in 2025” Subheader: “Yes - 27% chance”</p></span></div></div><p class="paragraph" style="text-align:left;">The essential mechanism that makes all of this true is the payout. An even payout (bet $50 to win $50) implies that an event has a 50-50 chance of happening because otherwise one counterparty would expect to lose money when taking a large series of these bets and therefore would rationally not participate. The implied odds adjust from there—again, when you see a -110 payout (which means that you bet $110 to win $100), it implies a ~52% probability of the underlying event happening (110 / (110 + 100) ≈ 52.4%), because if an event with this payout scheme had a lower probability of occurring, you’d expect to lose money by taking it over the long run. Therefore, if you see a bet marked at a -110 payout but you know that it actually has a 53% chance of occurring, this represents a positive expected value for you (0.53 × $100 − 0.47 × $110 ≈ +$1.30 per bet), because a win pays you more than it would if the line were priced at the true 53% odds. In theory, if you find and take these bets over time, you’ll beat the sportsbooks. 
What’s it gonna be, Aspen or Vail?</p><p class="paragraph" style="text-align:left;">Thinking in this way unearths the real competition: when you make a bet, you’re making a statement that the odds you see for an individual line (prediction market) are incorrect. You’re wagering that your measure of the odds is better than theirs. You’re trying to beat the odds.</p><h3 class="heading" style="text-align:left;" id="flip-it">Flip it</h3><p class="paragraph" style="text-align:left;">The lowly coin flip helps elucidate these ideas—let’s talk through it quickly before moving on.</p><p class="paragraph" style="text-align:left;">If I pulled a quarter out of my pocket and looked at you and said “I’ll pay you $20 if you win, but you pay me $10 if I win,” would you take it? Maybe not, because it’s just one flip and anything can happen. But what if I offered you the same bet a thousand times? Is your first thought about my flip height, whether the tails side is chipped, or the unique aerodynamics of a Maine state quarter? These might be things worth considering before accepting my final offer, but I’d wager that your first thought is about the fancy dinner I’m about to buy you for offering a +200 payout on a 50-50 bet.</p><p class="paragraph" style="text-align:left;">The probability of sports outcomes obviously can’t be assessed as easily as coin flips. When you take on the betting markets, it’s you versus the consensus opinion, one that starts with computers and algorithms and data and gets adjusted by the skin-in-the-game participation of individual bettors. It’s an uphill battle from the start. Does your knowledge of sports count for anything when competing against Mr. 
Market, or is all hope lost?</p><h3 class="heading" style="text-align:left;" id="cant-we-all-just-disagree">Can’t we all just disagree?</h3><p class="paragraph" style="text-align:left;">Something that makes it nearly impossible to make the mindset shift outlined above is that the tools for making accurate forecasts, such as computer-based statistical models and a robust historical dataset, simply aren’t available to most people.</p><p class="paragraph" style="text-align:left;">Given the lack of ability for most people to make any sort of model-based calculation, casual bettors tend to rely on some combination of 1) pure intuition and 2) expert advice. These options lead to two obvious problems: the first being that no real system of analysis is employed, and the second being the skepticism that should be directed towards people who sell “Expert” advice instead of just acting on it and profiting for themselves.</p><p class="paragraph" style="text-align:left;">With that said, many bettors are also fans of the sports on which they speculate, so they do bring a lot of knowledge to the table that could be useful in helping to establish probabilities. If these individuals could be combined into groups with other enthusiasts with a goal of arriving at a good sense of the odds for an individual contest via debate, could this help people arrive at better decisions?</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://unanimous.ai/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3" target="_blank" rel="noopener noreferrer nofollow">Unanimous AI</a> seems to believe in this future. 
Here’s a quote from <a class="link" href="https://unanimous.ai/unanimous-ai-predicts-the-philadelphia-eagles-will-win-super-bowl-lix/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3" target="_blank" rel="noopener noreferrer nofollow">their blog post</a> about how their AI-facilitated debate platform was used to pick the Eagles to win Super Bowl LIX, with a “Conviction score” of 55%:</p><div class="blockquote"><blockquote class="blockquote__quote"><p class="paragraph" style="text-align:left;"><i>At Unanimous AI, we don’t replace people with AI, </i><i><b>we connect human groups together into super-intelligent systems</b></i><i>. It’s a tradition to use our technology to predict high-profile events. This started back in 2016 when a CBS reporter challenged us to predict the Kentucky Derby, not just the winner but the first four horses in order. We did it, beating 540-1 odds: </i><i><a class="link" href="https://www.newsweek.com/artificial-intelligence-turns-20-11000-kentucky-derby-bet-457783?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3" target="_blank" rel="noopener noreferrer nofollow">Newsweek Article 2016</a></i></p><p class="paragraph" style="text-align:left;"><i>Our technology has advanced significantly since 2016. Our latest platform, </i><i><a class="link" href="https://www.thinkscape.ai/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3" target="_blank" rel="noopener noreferrer nofollow">Thinkscape,</a></i><i> enables large groups (up to 400 people) to hold real-time deliberative conversations that converge on optimized decisions, predictions, assessments, and estimations. And because it’s conversational, Thinkscape generates detailed qualitative insights why the group converged the way they did.</i></p><p class="paragraph" style="text-align:left;"><i><b>So… who WILL WIN the Super Bowl this year? 
</b></i></p><p class="paragraph" style="text-align:left;"><i>We were challenged by a reporter (Chuck Martin) to make the prediction by amplifying the collective intelligence of 104 members of the general public.  This produced a reasonably strong forecast that the </i><i><b>Philadelphia Eagles</b></i><i> will win the Super Bowl.  This goes against Open AI, Deepseek, and Gemini which all predicted Kansas City.</i></p><figcaption class="blockquote__byline"> From Unanimous AI </figcaption></blockquote></div><p class="paragraph" style="text-align:left;">It’s important to note that Unanimous’ post came out before the Super Bowl was played. There’s a fascinating video embedded in the blog post that discusses their process in more detail, and it’s worth watching if you have a spare minute.</p><p class="paragraph" style="text-align:left;">Based on Unanimous’ write-up, it sounds like AI wasn’t really doing the predicting at all, but instead was used to facilitate debate among <b>104 members of the general public</b> and then to aggregate the results into a Unanimous proprietary “Conviction score.”</p><p class="paragraph" style="text-align:left;">This is different! My guess is that most readers, especially readers who have spent time doing any sort of predictive modeling work, were thinking that I was about to talk about an AI model now widely accessible to the public that is going to spit out highly accurate gambling probabilities. 
That would be cool too—it might be more directly helpful to many people—but what makes this so fascinating is the way in which it opens the door for using Large Language Models to facilitate novel methods for calculating probabilities for real-world events.</p><h3 class="heading" style="text-align:left;" id="holding-court">Holding Court</h3><p class="paragraph" style="text-align:left;">I’m not a mathematician or probability expert, but I’ve spent enough time engaging with the relevant material to know that there’s a deeply philosophical nature to the topic of probability in the real world. This is because in the real world, though it’s obviously possible to calculate meaningful odds, many events don’t have precise intrinsic probabilities. Even if you subscribe to the strange ideas of thinkers like <a class="link" href="https://www.amazon.com/Determined-Science-Life-without-Free/dp/0525560971?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3" target="_blank" rel="noopener noreferrer nofollow">Robert Sapolsky</a> about life being completely deterministic, it’s still hard to imagine that in such a universe we humans would be able to create a prediction model of such Godlike omniscience as to nullify the value of forecasting altogether (<span style="font-family:Aptos, sans-serif;font-size:12pt;"><a class="link" href="https://www.fxnetworks.com/shows/devs?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3" target="_blank" rel="noopener noreferrer nofollow">spoiler alert – you can watch this FX show for an exploration of the idea</a></span>). 
The emergent properties of complex systems, combined with a dose of free will (real or felt), leave real-world risk takers with no sharper tools than approximation.</p><p class="paragraph" style="text-align:left;">If we take a step back from the competition within prediction markets and analyze them as a whole, it becomes clear that prediction markets (such as sports betting exchanges) in aggregate are a beautifully innovative mechanism for approximating real-world odds. These markets are an aggregation of the opinions of many people, and because they involve a financial risk and reward, participants have a strong incentive to be right. They are the epitome of “Put your money where your mouth is” and have been shown to be accurate over time.</p><p class="paragraph" style="text-align:left;">The approach that Unanimous described above, the facilitation of small group debate on a large scale, feels like a fundamentally new mechanism for calculating real-world odds. It might not work—the experiment will need to be run many more times and with a much larger number of individuals before being taken seriously—but large language models combined with other AI techniques like sentiment scoring at least open this up as a more feasible experiment than would have been possible before today.</p><p class="paragraph" style="text-align:left;">The expense of paying individual debate moderators on a mass scale would be high. Training each moderator to facilitate debates in a trustworthy way would be a challenge. Having participants fill out static forms would remove the debate-like aspect. Artificial intelligence finally provides the scalable, repetitive cognition necessary to make this possible.</p><p class="paragraph" style="text-align:left;">There are plenty of problems with this approach, the biggest one being that all that comes out at the end is a “Conviction score” and no one really knows how well that will correlate with real results over time. 
But…at the end of the day…what are things like Polymarket or sports betting markets other than measures of mass conviction? This experiment feels worth running.</p><p class="paragraph" style="text-align:left;">See you next week!</p><p class="paragraph" style="text-align:left;">Citations</p><p class="paragraph" style="text-align:left;">[1] <a class="link" href="https://www.actionnetwork.com/nba/knicks-vs-cavaliers-prediction-odds-parlay-pick-for-friday-february-21-qs?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=better-bettors-automato-3" target="_blank" rel="noopener noreferrer nofollow">https://www.actionnetwork.com/nba/knicks-vs-cavaliers-prediction-odds-parlay-pick-for-friday-february-21-qs</a></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=89f02317-8682-44d2-b2cf-50033b12ac6e&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Off The Deep End  - Automato 🍅 #2 </title>
  <description>OpenAI&#39;s Deep Research Agent</description>
  <link>https://llementalmedia.beehiiv.com/p/off-the-deep-end-automato-2</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/off-the-deep-end-automato-2</guid>
  <pubDate>Tue, 25 Feb 2025 17:07:53 +0000</pubDate>
  <atom:published>2025-02-25T17:07:53Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><h2 class="heading" style="text-align:left;" id="open-a-is-deep-research-agent">OpenAI’s Deep Research Agent</h2><p class="paragraph" style="text-align:left;">From an <a class="link" href="https://openai.com/index/introducing-deep-research/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">OpenAI announcement</a> on February 2, 2025:</p><div class="blockquote"><blockquote class="blockquote__quote"></blockquote></div><p class="paragraph" style="text-align:left;">It’s December of 2015. As evidenced by the positive critical reception of A24’s sci-fi thriller <i>Ex Machina</i>, anxiety over sentient AI has entered the cultural zeitgeist. You’ve caught glimpses of this “AI”—this mysterious blend of software, hardware and soul—quietly (<i>menacingly</i>, perhaps) lurking in America’s kitchen corners. That’s right, it’s 2015 and AI is here, it’s for sale, <a class="link" href="https://press.aboutamazon.com/2015/6/amazon-echo-now-available-to-all-customers?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">and its name is Alexa</a>.</p><p class="paragraph" style="text-align:left;">Right.</p><p class="paragraph" style="text-align:left;">Though time has shown us that Alexa didn’t live up to its initial hype (Amazon still makes echo devices…don’t count ‘em out), another, much lesser-known event that took place in 2015, the founding of a <a class="link" href="https://openai.com/index/introducing-openai/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">nonprofit research lab called OpenAI</a>, has certainly filled in the gap of technological overpromise and under-delivery.</p><p class="paragraph" style="text-align:left;">That’s right, 
OpenAI was hard at work for quite some time before the explosive growth of ChatGPT in November 2022. Let’s take a quick look at the blog post that announced OpenAI to the world in 2015:</p><div class="blockquote"><blockquote class="blockquote__quote"></blockquote></div><p class="paragraph" style="text-align:left;">There’s been a lot of drama about how OpenAI has changed over the years. In case you’ve been talking to your Echo Dot under a rock this whole time, the short story is that OpenAI was founded (see above) as a nonprofit research lab with the goal of advancing AI instead of generating a financial return. As is noted in a <a class="link" href="https://www.vox.com/future-perfect/380117/openai-microsoft-sam-altman-nonprofit-for-profit-foundation-artificial-intelligence?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">Vox article</a> that covers this topic in detail, the company started to bump up against the inherent limitations of nonprofit capital raising as early as 2019 and has been working since then to pull off the transition to an at least partially for-profit entity.</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">This nonprofit-to-for-profit transition</a> has been messy, resulting in 1) the OpenAI board’s unsuccessful attempt at <a class="link" href="https://www.nytimes.com/2023/11/17/technology/openai-sam-altman-ousted.html?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">ousting Altman</a> and 2) the resignation of many of the more academic, research-oriented team members from the company (compare 
Ilya’s <a class="link" href="https://ssi.inc/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">new company’s website</a> to OpenAI’s to get a sense of creative differences). Eventually we got to the OpenAI of today, which is a nonprofit that owns a for-profit that is trying to buy the nonprofit <a class="link" href="https://www.bloomberg.com/opinion/articles/2025-02-11/sure-elon-musk-might-buy-openai?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">before Elon Musk</a> does, and, most importantly for the rest of this week’s edition of Automato 🍅, builds things like “Deep Research” agents.</p><h2 class="heading" style="text-align:left;" id="times-a-wastin">Time’s a Wastin’</h2><p class="paragraph" style="text-align:left;">I think one of the great questions of our present time, given the presumed significance of attaining advanced artificial intelligence, is the question of what OpenAI should actually be doing. 
The question is a little wacky given OpenAI’s seemingly inexhaustible access to funding (if they can iron out the aforementioned capital structure issues), but if you’re willing to assume that 1) OpenAI employs some of the most talented machine learning engineers in the world and 2) maintaining AI supremacy <a class="link" href="https://www.technologyreview.com/2025/01/21/1110269/there-can-be-no-winners-in-a-us-china-ai-arms-race/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">has large implications</a> for national security and economic prosperity, it follows that opportunity cost should be a significant factor in deciding which initiatives the company undertakes.</p><p class="paragraph" style="text-align:left;">I’d like to analyze the Deep Research Agent within the context of opportunity cost.</p><p class="paragraph" style="text-align:left;">Opportunity cost can arise from a number of factors for the typical business, with financial constraints being a base-level driver. CEOs, when figuring out how to navigate the competitive business landscape with limited financial capital at their disposal, must decide whether to hire, invest in new lines of business, fund marketing efforts, engage in M&A, pay out shareholders, and more. The effective deployment of human talent is obviously critical as well (among other things), but people need to get paid, so these decisions are either coupled with or come after the financial ones.</p><p class="paragraph" style="text-align:left;">From the outside looking in, it feels like OpenAI has the opposite problem. 
To be clear, I’m not saying they don’t have <i>any</i> financial problems to fix—their commercial products <a class="link" href="https://techcrunch.com/2025/01/05/openai-is-losing-money-on-its-pricey-chatgpt-pro-plan-ceo-sam-altman-says/?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAARs8FZvgecgjCVFwKQY0Q1Bm-TZOUKZcKHoIqph_fPpL-i6aNu50kqnDst4JqxlqtaSquyG0KKx8DCW-iUV8OF5fCebPMANTHY_npOnILFWbXaRXamJYy29Hx8hEwZGFkrk-axL-R_wsaam1pjBJG6hVvcaIJ6BBOXFjo9vW0wA&utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">aren’t revenue positive</a>, and as discussed above, converting to a more fundraising-friendly structure hasn’t been easy—but there’s a difference between a huge problem and a huge nuisance, and investors continue to show that capital raising falls into the latter category for Altman & Co.</p><p class="paragraph" style="text-align:left;">To me, OpenAI feels much more like a professional sports team than a typical business, in that their problems are driven by a lack of genuinely qualified human capital at their disposal (an ironic sentence for an AI company). This is magnified by the fact that the race to artificial general intelligence may indeed have only one winner.</p><p class="paragraph" style="text-align:left;">With limited ability to find and deploy employees who are capable of making novel contributions to the field of artificial intelligence, it follows that leadership would take the utmost care in selecting projects for their superstar employees to pursue. 
Right?</p><p class="paragraph" style="text-align:left;">Right?</p><p class="paragraph" style="text-align:left;">Well…maybe not…because in the case of the Deep Research Agent, it kind of looks like the <a class="link" href="https://storm.genie.stanford.edu/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">exact same product</a> was built and <a class="link" href="https://github.com/stanford-oval/storm?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">open-sourced</a> by a research team at Stanford over a year ago.</p><p class="paragraph" style="text-align:left;">Here’s a description of Stanford’s STORM product from its <a class="link" href="https://github.com/stanford-oval/storm?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=off-the-deep-end-automato-2" target="_blank" rel="noopener noreferrer nofollow">GitHub page</a>:</p><div class="blockquote"><blockquote class="blockquote__quote"></blockquote></div><p class="paragraph" style="text-align:left;">A product that conducts internet research and builds Wikipedia-style documents with full citations. More than 70,000 users. Used by experienced Wikipedia editors…</p><p class="paragraph" style="text-align:left;">This is the same thing as Deep Research Agent! I know that some of you reading this might list out slightly different features or whatever, but come on: from the standpoint of novel technological achievement, this feels like the exact same thing. It’s LLMs that take inputs and trigger web searches and then create a reliable output. 
I’m not saying that it isn’t cool, I’m just saying that it has already been done.</p><p class="paragraph" style="text-align:left;">Think back to the Deep Research Agent announcement, which read, <i>“The ability to synthesize knowledge is a prerequisite for creating new knowledge. For this reason, deep research marks a significant step toward our broader goal of developing AGI, which we have long envisioned as capable of producing novel scientific research.”</i></p><p class="paragraph" style="text-align:left;">One more time.<i> Deep research marks a significant step toward our broader goal of developing AGI.</i></p><p class="paragraph" style="text-align:left;">I don’t know, man. I don’t know.</p><p class="paragraph" style="text-align:left;">The thing you need to keep in mind about 2025 is that nothing matters more for a business than having people’s attention online. This is especially true in competitive industries without much product differentiation like today’s Large Language Model industry (Google, xAI/Grok, OpenAI, Anthropic, Meta, DeepSeek, open-source options, and more). This is why social media influencers are worth so much in the world of marketing: people pay attention to influencers and so influencers can be used to advertise.</p><p class="paragraph" style="text-align:left;">OpenAI has a huge following. Everything that they do generates a big buzz both on social media and in mainstream news outlets. Is the Deep Research Agent the start of a trend where OpenAI just copies what’s being built with their models out in the wild, just to keep the media buzz alive and keep investors lining up?</p><p class="paragraph" style="text-align:left;">You can’t fault the strategy for a money-making enterprise! I’d love to be able to pull this off too. 
It’s just not what you’d expect from a nonprofit founded to cure all diseases and push humanity to the stars.</p><p class="paragraph" style="text-align:left;">However…</p><h2 class="heading" style="text-align:left;" id="a-new-look">A new look</h2><p class="paragraph" style="text-align:left;">There’s something important to note about OpenAI’s Deep Research Agent announcement that we haven’t touched on yet. After describing what Deep Research Agent is, the announcement goes on to say that DRA is <i>“Powered by a version of the upcoming OpenAI o3 model that’s </i><b><i>optimized for web browsing and data analysis.</i></b><i>”</i></p><p class="paragraph" style="text-align:left;">This is a subtle yet interesting point that is worthy of consideration. To break it down, it’s important to first talk about how Large Language Models like the ones that OpenAI builds are fundamentally closed off from the internet.</p><p class="paragraph" style="text-align:left;">Think back to middle school algebra class for a second. Imagine this question on your math homework:</p><p class="paragraph" style="text-align:left;"><i>Consider the function </i><i><b>f(x) = x + 2</b></i><i>. What is the result of </i><i><b>f(3)</b></i><i>?</i></p><p class="paragraph" style="text-align:left;">The answer to the question is <b>5</b>, because we’re plugging <b>3 </b>into the equation <b>x +</b> <b>2</b>, and <b>3+2</b> is <b>5</b>.</p><p class="paragraph" style="text-align:left;">Now imagine these are the next two problems on your homework:</p><p class="paragraph" style="text-align:left;"><i>What is the result of f(4)? What is the result of f(5)?</i></p><p class="paragraph" style="text-align:left;">Did you get <b>6 </b>and <b>7</b>? 
If so, we’re rolling.</p><p class="paragraph" style="text-align:left;">Now, what if this was the last question:</p><p class="paragraph" style="text-align:left;"><i>What is the result of f(search Google for “Best FanDuel bets tonight”)?</i></p><p class="paragraph" style="text-align:left;">Maybe a mathematician will correct me, but this last problem doesn’t make much sense.</p><p class="paragraph" style="text-align:left;">The same thing goes for Large Language Models. Conceptually, LLMs are just giant, complex mathematical functions that take an input and return an output, but the inputs and outputs are representations of human text and speech.</p><p class="paragraph" style="text-align:left;">The reason that this is so important to keep in mind is that Large Language Models can’t actually search the internet—again, they’re just equations that take inputs and return outputs—but if you combine their outputs with something that <b>can</b> search the internet, say, for example, a traditional program like a search engine, you can stack the AI and non-AI pieces together like Legos and create something that is no longer closed off to the internet. The industry has started calling these things “Agents.”</p><p class="paragraph" style="text-align:left;">Most of what OpenAI has been trying to do up to this point is build models that are meant to answer people’s questions (which is exactly what we all want from tools like ChatGPT). But now (if what they said in the Deep Research Agent announcement is true) it sounds like OpenAI has started to optimize new models for tasks other than good Q&A responses, perhaps for doing things like generating good internet search keywords. 
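</p><p class="paragraph" style="text-align:left;">To make the Lego-stacking idea above a bit more concrete, here’s a rough Python sketch. Treat it as an illustration under assumptions, not a description of how OpenAI or anyone else actually wires these pieces together: <b>fake_model</b> and <b>fake_search</b> are made-up stand-ins (no real model or search engine is called), and the JSON shape with an “action” holding a “type” and “searchTerms” is just one plausible format. The point is only that the model stays a plain text-in, text-out function, while ordinary code decides what to do with its output.</p><div class="codeblock"><pre><code>
```python
import json


def fake_model(prompt):
    """Hypothetical stand-in for an LLM: a plain function from text to text."""
    # Canned reply for illustration; a real model would generate this text.
    return json.dumps({
        "action": {
            "type": "Web search",
            "searchTerms": ["Stocks", "2025", "Bloomberg"],
        }
    })


def fake_search(terms):
    """Hypothetical stand-in for a search engine; returns made-up titles."""
    query = " ".join(terms)
    return ["Result 1 for " + query, "Result 2 for " + query]


def run_agent(prompt, model=fake_model, search=fake_search):
    """Call the model once, then let regular old code act on its output."""
    reply = model(prompt)
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        # Not structured output, so treat it as a normal chat-style answer.
        return {"kind": "answer", "text": reply}

    action = parsed.get("action") if isinstance(parsed, dict) else None
    if isinstance(action, dict) and action.get("type") == "Web search":
        # The non-AI piece: run the search the model asked for.
        return {"kind": "search", "results": search(action.get("searchTerms", []))}
    return {"kind": "answer", "text": reply}


if __name__ == "__main__":
    print(run_agent("What are the best stocks to buy in 2025?"))
```
</code></pre></div><p class="paragraph" style="text-align:left;">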
For example, if you asked the question:</p><p class="paragraph" style="text-align:left;">“What are the best stocks to buy in 2025?”</p><p class="paragraph" style="text-align:left;">An old model might respond with something like:</p><div class="codeblock"><pre><code>Identifying the best stock to buy in 2025 involves analyzing current market trends, company financials…,</code></pre></div><p class="paragraph" style="text-align:left;">Whereas one of these new, specialized models might be more likely to respond to the same question with:</p><div class="codeblock"><pre><code>&#123;
  &quot;action&quot;: &#123;
    &quot;type&quot;: &quot;Web search&quot;,
    &quot;searchTerms&quot;: [&quot;Stocks&quot;,&quot;2025&quot;,&quot;Bloomberg&quot;]
  &#125;
&#125;</code></pre></div><p class="paragraph" style="text-align:left;">The latter response is obviously incomprehensible to a person, but that&#39;s the point: it’s not meant for human consumption. A response like this is meant to be parsed by a program (i.e., not AI, just regular old code) and executed as an internet search. The results would then be collected (via web scraping or other methods) and potentially passed to another, more ChatGPT-like model to be assembled into a comprehensible response.</p><p class="paragraph" style="text-align:left;">What we can glean from this is that OpenAI may be transitioning from a pure focus on training massive, general-purpose, human-facing models to specialized models that are meant to dovetail with non-AI programs, such as programs that conduct internet searches. If so, maybe OpenAI feels like it no longer needs to push the frontiers of its flagship models to achieve artificial general intelligence and is instead shifting to building infrastructure that more easily connects its models with the outside world. In other words, maybe they feel like the brain has successfully been built, and now it’s time to give it arms and hands.</p><p class="paragraph" style="text-align:left;">If this is true, then maybe my earlier criticism is unwarranted, and this is <b>exactly</b> what OpenAI should be doing. There are a lot of ways in which this makes sense, the main one (in my mind) being that, as discussed earlier, these models are just isolated functions at their core, and without building them the boats and bridges necessary to escape their islands, they may never truly come “alive.”</p><p class="paragraph" style="text-align:left;">OpenAI’s goal from the start has been to build Artificial General Intelligence. Who knows, maybe they’re as close as they say they are.</p><p class="paragraph" style="text-align:left;">See you next week! 
</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=79d16873-624a-4df8-a795-ed512f161729&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Automato 🍅 #1 - AI in the IDE</title>
  <description>Examining one of generative AI&#39;s early stars</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5ba1206b-db98-40da-a933-16220c3f3717/cdiv.png" length="364196" type="image/png"/>
  <link>https://llementalmedia.beehiiv.com/p/automato-1-ai-in-the-ide</link>
  <guid isPermaLink="true">https://llementalmedia.beehiiv.com/p/automato-1-ai-in-the-ide</guid>
  <pubDate>Tue, 18 Feb 2025 16:06:01 +0000</pubDate>
  <atom:published>2025-02-18T16:06:01Z</atom:published>
    <dc:creator>Sam Reed</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><h1 class="heading" style="text-align:left;" id="a-warm-welcome">A warm welcome</h1><p class="paragraph" style="text-align:left;">Hello and welcome to the inaugural edition of my newsletter. I&#39;m glad you&#39;re here.</p><p class="paragraph" style="text-align:left;">Let’s establish a few ground rules before moving on: </p><ol start="1"><li><p class="paragraph" style="text-align:left;">Much like the startup to which this newsletter is attached, you should expect this column to change over time. This is tech, folks – the only guarantee is that if you don&#39;t constantly make your old self obsolete, someone else will. </p><p class="paragraph" style="text-align:left;"></p></li><li><p class="paragraph" style="text-align:left;">A core belief of mine is that the best way to find out what is right is to put something wrong in front of others. That <b>doesn’t</b> mean I’m not attempting to be insightful and correct with what I write here—quite the contrary—but what it <b>does</b> mean is that I&#39;m eager to hear your thoughts and disagreements with anything that I write. In short, please think of each post as a long-winded debate topic instead of a lecture.</p><p class="paragraph" style="text-align:left;"></p></li><li><p class="paragraph" style="text-align:left;">Thoughts and opinions expressed in here are mine alone.</p></li></ol><h1 class="heading" style="text-align:left;" id="ai-for-programming">AI for Programming</h1><p class="paragraph" style="text-align:left;">Let’s kick off Automato with an automation topic that we’ll likely return to regularly: programming.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">In my completely anecdotal but almost certainly correct opinion, as of today, the leading industrial use case for large language models (AI) is in software development (by a wide margin). 
The Stack Overflow <a class="link" href="https://survey.stackoverflow.co/2024/ai/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">2024 annual software developer survey </a>supports this claim: 76% of survey respondents said that they’re either using or planning to use AI as part of their software development process in 2025.  <span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">AI’s early success in the software engineering industry makes sense for a few reasons.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;"><span style="color:rgba(0, 0, 0, 0.9);font-family:-apple-system, system-ui, "system-ui", "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif;font-size:16px;">First of all, AI as we know it today is being built by teams of software developers. 
As one might expect, software developers understand software quite well, especially the elite devs and researchers that get jobs at companies like </span><a class="link" href="https://www.reuters.com/technology/artificial-intelligence/openai-talks-investment-round-valuing-it-up-340-billion-wsj-reports-2025-01-30/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">OpenAI</a><span style="color:rgba(0, 0, 0, 0.9);font-family:-apple-system, system-ui, "system-ui", "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif;font-size:16px;">. Given that software is so front-of-mind for these companies and their employees (along with the ecosystem of startups that are building new products on top of their AI models), going after coders is as natural a first step towards capturing real economic value as any other. To think that a group of AI software startups would first try to disrupt an industry other than software development is like asking a typical high school student what AI is good for and expecting them to say something other than “Homework.” </span></p><p class="paragraph" style="text-align:left;">Second, the software development industry is, quite understandably, very tech forward! 
Many individuals are drawn to the industry because they genuinely love programming, which manifests itself in contributions to open-source (read: unpaid) projects, time spent learning new <a class="link" href="https://stackoverflow.blog/2025/01/01/developers-want-more-more-more-the-2024-results-from-stack-overflow-s-annual-developer-survey/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">programming languages and paradigms</a> outside of working hours, and in the hundreds of thousands of views that code influencers like <a class="link" href="https://www.youtube.com/@ThePrimeTimeagen?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">ThePrimeagen</a> and <a class="link" href="https://www.youtube.com/@t3dotgg?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Theo</a> regularly receive. Given their general openness to experimentation, it’s not surprising that developers have been eager to get their hands dirty with AI tools.</p><p class="paragraph" style="text-align:left;">Finally, the modern software ecosystem is steeped in the “free and open-source” tradition, which, if you’re not familiar, is the practice of exposing your project’s code to the world (using online code repository hosts like <a class="link" href="https://github.com/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">GitHub</a>) so that anyone can use it or request to modify it. 
This open and collaborative practice has inadvertently created a massive online database of code that AI companies like <a class="link" href="https://www.anthropic.com/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Anthropic</a> can use to train their models. As a result, years of developers’ code contributions to out-in-the-open projects have been snatched up and are now being used to threaten the livelihood of these same developers. Anyways, this newsletter isn’t about moralizing, so let’s get back to business. Sucks to suck!</p><h1 class="heading" style="text-align:left;" id="cursor">Cursor</h1><p class="paragraph" style="text-align:left;">Due to the reasons mentioned above, a relatively new class of AI-centric programming tools is already seeing high rates of adoption. At present, the most beloved of such tools is <b>Cursor</b>, the “<a class="link" href="https://www.cursor.com/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">AI Code Editor.</a>”</p><p class="paragraph" style="text-align:left;">For those who don’t know, many programmers write code in what’s called an “IDE” (integrated development environment) which is like Microsoft Word or Google Docs but with bells and whistles for programming instead of plain old writing.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">Cursor is a new flavor of IDE (it’s actually a modified version of an <a class="link" href="https://code.visualstudio.com/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">open-source Microsoft product</a>) that makes AI a front-and-center part of the development experience. 
People enjoy using Cursor for features like advanced autocomplete, file editing and fairly frictionless Q & A:<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5ba1206b-db98-40da-a933-16220c3f3717/cdiv.png?t=1739849366"/><div class="image__source"><span class="image__source_text"><p>Can you please help me center this div?</p></span></div></div><p class="paragraph" style="text-align:left;">Because the AI assistant is embedded right in the IDE and can therefore edit a program directly, Cursor is seeing a somewhat surprising level of adoption not just from software engineers, but also from people who don’t know how to code at all, leading to a new English-first coding style called “Prompt & Pray” programming (I’m just kidding).<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">Overall, the product&#39;s first year(ish) was nothing short of an explosive, viral success, which resulted in Anysphere (Cursor’s parent company) raising a huge $105 million <a class="link" href="https://www.cursor.com/blog/series-b?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Series B investment from prominent VCs such as Andreessen Horowitz.</a> Here’s a quote from their Series B announcement:<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><div class="blockquote"><blockquote class="blockquote__quote"></blockquote></div><p class="paragraph" style="text-align:left;">Now, just to be clear, unlike the firms that participated in the aforementioned financing round, I don’t have a laundry 
list of LPs begging me to speculate on startups with their capital, so I’m probably missing something here. That said, I&#39;m unclear about Anysphere / Cursor’s long-term edge, for a few reasons that I’ll detail below.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><h1 class="heading" style="text-align:left;" id="risky-business">Risky Business:</h1><p class="paragraph" style="text-align:left;">Before moving on, I think it’s worth acknowledging that the Cursor team seems like nice people. I’ve listened to them on the <a class="link" href="https://www.youtube.com/watch?v=oFfVt3S51T4&utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Lex Fridman podcast</a> and emailed them personally for support and don’t have anything bad to say. <span style="color:rgba(0, 0, 0, 0.9);font-family:-apple-system, system-ui, "system-ui", "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif;font-size:16px;">However, they’re a great example of the complications of building AI-native products, so let’s get on with the analysis of their tricky road ahead.</span></p><p class="paragraph" style="text-align:left;"><b>Competition is for losers</b></p><p class="paragraph" style="text-align:left;">There is an obvious threat that I want to get out of the way first: intense competition from Microsoft and other startups. Microsoft clearly still wants people using its Visual Studio Code IDE, which, you’ll remember, is Cursor’s estranged parent. 
A <a class="link" href="https://code.visualstudio.com/blogs/2024/12/18/free-github-copilot?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">recent announcement</a> from Microsoft about a new, generous free tier for GitHub Copilot (another AI programming assistant) for all Visual Studio Code users shows that Microsoft isn’t going down without a fight (to add to the strangeness of the situation, Microsoft is a <a class="link" href="https://www.cnbc.com/2023/04/08/microsofts-complex-bet-on-openai-brings-potential-and-uncertainty.html?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">massive investor</a><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> in OpenAI, </span><a class="link" href="https://anysphere.inc/blog/openai-fund?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">which is an investor</a> in Anysphere). Anysphere certainly has a big enough piggy bank to stay in the game for a while, but I’m not positive that getting to $100 million in revenue in under a year like they did is as much a sign of genius business acumen as it is a sign of a low-hanging-fruit product that went viral. Cursor did not create an industry or spend years selling people on their vision of the future. They just took a mature, beloved product and gave it a youthful haircut (a very nice haircut indeed!). 
As such, Microsoft or other startup competitors like <a class="link" href="https://codeium.com/windsurf?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Windsurf</a> pose legitimate threats to any current or future profitability that influenced Cursor’s most recent valuation.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;"><b>Programmer Skill Atrophy</b><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">Another, albeit subtle, threat is that too much help from AI actually turns out to be something that programmers sour on. You need to look no further than the author of this post to find an example of this trend. I paid for and used Cursor daily for several months, but I ended up cancelling the $20/month subscription when I realized the way in which it was dulling my skillset.</p><p class="paragraph" style="text-align:left;">I don’t expect too many people to make this choice, especially those who have never programmed without AI, but to me, the energy required to pause, think and write well-formed questions to ChatGPT (or, dare I say, read documentation!) is well worth it in order to help with skill maintenance. One of my business hot takes is that the increased productivity that companies feel when adopting Slack is an illusion caused by the friction it removes from email, but in reality it’s that same friction that causes critical thought and efficiency in the first place. Same goes for AI IDEs (either that or AI will replace me first). 
<span style="color:rgba(0, 0, 0, 0.9);font-family:-apple-system, system-ui, "system-ui", "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif;font-size:16px;">What&#39;s that story about the tortoise and the hare again?</span></p><p class="paragraph" style="text-align:left;"><b>RAG and the DoDo Bird</b><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">Here’s where things get really interesting. I recently came across a new database vendor called <a class="link" href="https://turbopuffer.com/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">TurboPuffer</a> that has a big quote from Cursor in the testimonials section of its landing page:<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/889d1bb4-c40f-40c9-8307-7f2080ad414a/turbopuffer.png?t=1739849201"/><div class="image__source"><span class="image__source_text"><p>After switching our vector db to @turbopuffer, we’re saving an order of magnitude in costs and dealing with far less complexity! 
-Aman Sanger, Co-founder, Cursor</p></span></div></div><p class="paragraph" style="text-align:left;">So unless this is outdated, I’m going to assume that Cursor is using Turbopuffer as a database vendor.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">To understand why this is interesting, we must first visit a concept that has emerged from the large language model/generative AI movement called “Retrieval Augmented Generation” (RAG). RAG is a fancy term for a simple concept: because each language model (think: ChatGPT) only accepts a finite number of words (tokens) as input, you need to carefully <span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"><i>retrieve</i></span> the subset of information (out of all the info that you could theoretically grab) that you provide to the model. For example, try pasting this entire essay along with the question “What’s this guy’s deal?” into ChatGPT – it&#39;ll probably tell you that your question is too long (phew!).<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> Now try again, but only with one paragraph. Congratulations, you’ve just done RAG.</span></p><p class="paragraph" style="text-align:left;">Given Cursor’s stated use of Turbopuffer, my speculation is that part of Cursor’s magic is that they are very good at selecting the relevant snippets from your codebase to send to AI (i.e. they’re good at RAG), which is extremely important for getting high-quality answers (the ultimate test of their product’s usefulness) when the amount of information that you can provide to a model <span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"><b>is limited</b></span><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;">. 
</span></p><p class="paragraph" style="text-align:left;">But...what if...the amount of information that you can provide...becomes (essentially) unlimited?<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">A brand new model from Google, <a class="link" href="https://deepmind.google/technologies/gemini/flash/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Gemini 2.0 Flash</a>, is certainly making moves in this direction. The model takes up to <span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"><b>1 million tokens</b></span><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> as input (</span><a class="link" href="https://www.youtube.com/watch?v=8otpw68_C0Q&utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">here’s a good video from Theo</a> describing the model).<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">For reference, here are the token counts for the OpenAI models that Cursor was likely designed for (all of these are state-of-the-art models; this is not a diss in any way):</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/2475a086-4142-482e-bdf0-677ef33bc3dc/gptmods.png?t=1739849572"/><div class="image__source"><span class="image__source_text"><p>GPT-4 Turbo and GPT-4 model series</p></span></div></div><p class="paragraph" style="text-align:left;">That’s right, we’ve gone from 128k tokens of maximum input length (8k if you 
look at the bottom of the picture) in April 2024 to 1 million today! The Google model is also very inexpensive in terms of the cost per token.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">All of this is to say that Cursor’s elite ability to generate quality code could be in jeopardy if entire codebases can just be dumped into a model prompt with ease. Even the fastest horse isn’t going to outrun my ‘24 Subaru (hot dang!).<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">Also, the possibility of the single-file web app is realistically already here. A lightweight Python framework like <a class="link" href="https://fastapi.tiangolo.com/deployment/concepts/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">FastAPI</a> or <a class="link" href="https://www.starlette.io/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Starlette</a> could easily support this with a very low token count. This makes the ability to archive and traverse a nested file system less important, which would make a Cursor copycat designed for the non-programmer demographic easier to build.</p><p class="paragraph" style="text-align:left;"><b>The AI Software Development Tool Paradox</b><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">The final sticking point that comes to mind is what I call the “AI Software Development Tool Paradox” (I’ll come up with a better name). 
<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;"><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;">I </span><a class="link" href="https://x.com/sam_e_reed/status/1867243601950060828?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">tweeted about this</a><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> when I saw that </span><a class="link" href="https://devin.ai/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Devin</a>, a new product being touted as the first AI software engineer (what’d I say earlier about developers going after developers?), was charging $500 per month for a subscription.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">As the frugal startup founder that I am, I can’t imagine a better first question for my $500 per month AI software developer than “Hey Devin – here&#39;s how to call OpenAI’s API – can you please build me an AI software engineer?”<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;">This is a problem for the industry as a whole, but I can’t help but scratch my head whenever I see these <span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"><b>businesses that sell software that builds software</b></span>. Like the snake eating its tail, when these products eventually get good, they’ll devour themselves. 
Cursor might have the advantage right now of reduced per-token pricing from OpenAI and Anthropic (I don’t know if this is true, just saying it’s possible) but if we are to believe all of the rhetoric around everyone becoming a programmer, then it follows that everyone will be able to build their own software (note that open-source, highly-customizable IDEs such as <a class="link" href="https://neovim.io/?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">Neovim</a> already prove the appetite for such products among experienced developers). I would not be surprised if we see Anysphere move into the infrastructure space (like <a class="link" href="https://replit.com/deployments?utm_source=llementalmedia.beehiiv.com&utm_medium=newsletter&utm_campaign=automato-1-ai-in-the-ide" target="_blank" rel="noopener noreferrer nofollow">competitor Replit</a>) to mitigate this risk.</p><h1 class="heading" style="text-align:left;" id="in-closing">In closing</h1><p class="paragraph" style="text-align:left;">There’s a big piece that I’ve left out of the above discussion: marketing. In 2025, there’s nothing harder than getting people&#39;s attention, and it’s obvious that Cursor has found a way to do just that. 
Do not underestimate this as a serious competitive moat.<span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;"> </span></p><p class="paragraph" style="text-align:left;"><span style="color:rgba(0, 0, 0, 0.9);font-family:-apple-system, system-ui, "system-ui", "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif;font-size:16px;">I think that accelerating a developer’s productivity or giving someone the ability to produce working code for the fairly low cost of $20 per month is a wonderful thing. I’m not trying to wish ill will on Cursor—I just think that they are a perfect example of the tough road ahead for anyone operating in this space. It’s easy to see the huge valuations and think that high-growth tech companies have it all figured out, but if I had to bet on one winner in AI, it would be the consumer (hopefully the human one).</span></p><p class="paragraph" style="text-align:left;"><span style="font-family:Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, sans-serif;font-size:12pt;">Time will tell. See you next time! </span></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=bd7ff4ff-828a-496d-b9e1-8d91891372bc&utm_medium=post_rss&utm_source=automato">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

  </channel>
</rss>
