<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>GenAI360 - Weekly AI News</title>
    <description>Weekly Gen AI Industry &amp; Research News, Curated by Team Activeloop. Read by 45K+ AI leaders, engineers, &amp; enthusiasts from 63% of Fortune 500 companies.</description>
    
    <link>https://genai360.beehiiv.com/</link>
    <atom:link href="https://rss.beehiiv.com/feeds/YGRtoYfULM.xml" rel="self"/>
    
    <lastBuildDate>Tue, 14 Apr 2026 17:44:31 +0000</lastBuildDate>
    <pubDate>Tue, 02 Dec 2025 15:01:23 +0000</pubDate>
    <atom:published>2025-12-02T15:01:23Z</atom:published>
    <atom:updated>2026-04-14T17:44:31Z</atom:updated>
    
      <category>Data Science</category>
      <category>Artificial Intelligence</category>
      <category>Technology</category>
    <copyright>Copyright 2026, GenAI360 - Weekly AI News</copyright>
    
    <image>
      <url>https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/publication/logo/0b5c4f4b-3417-433f-b591-fbfcd867c8c0/mikayelh_A_3d_isometric_illustration_of_white_and_orange_loop_m_35436fbb-e68d-444d-90cd-834aa96814ea.png</url>
      <title>GenAI360 - Weekly AI News</title>
      <link>https://genai360.beehiiv.com/</link>
    </image>
    
    <docs>https://www.rssboard.org/rss-specification</docs>
    <generator>beehiiv</generator>
    <language>en-us</language>
    <webMaster>support@beehiiv.com (Beehiiv Support)</webMaster>

      <item>
  <title>Announcing Activeloop’s Scientific Discover</title>
  <description>Connecting Research Data to Intelligence for Faster Scientific Discovery</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/dfe60736-2ead-4927-bf2f-acc0341d9c9b/image.png" length="99919" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/announcing-activeloop-s-scientific-discover</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/announcing-activeloop-s-scientific-discover</guid>
  <pubDate>Tue, 02 Dec 2025 15:01:23 +0000</pubDate>
  <atom:published>2025-12-02T15:01:23Z</atom:published>
    <dc:creator>Davit Buniatyan</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">Today we are excited to release <b>Activeloop’s Scientific Discover</b>, an intelligence agent built on one of the largest datasets of indexed scientific research. Here are the details:</p><ul><li><p class="paragraph" style="text-align:left;"><b>A fully indexed dataset of 25M open-access scientific papers</b>: more than <b>450M pages</b> of text, figures, charts, molecules, and tables (<b>175TB+ total</b>).</p></li><li><p class="paragraph" style="text-align:left;"><b>An open-source scientific intelligence agent</b> built on this dataset, achieving <b>48% SOTA performance on Humanity’s Last Exam</b> when paired with our tools and research index.</p></li></ul><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/dfe60736-2ead-4927-bf2f-acc0341d9c9b/image.png?t=1764614308"/></div><p class="paragraph" style="text-align:left;">Researchers across biotech, life science, and academia tell us the same story.</p><p class="paragraph" style="text-align:left;">“Our knowledge is everywhere, but our answers are nowhere.”</p><p class="paragraph" style="text-align:left;">Scientific insight is buried inside PDFs, figures, screenshots, tables, and formats that machines cannot read. 
Teams lose weeks to extraction and cleanup instead of actual discovery.</p><p class="paragraph" style="text-align:left;">This bottleneck slows breakthroughs in drug development, materials science, and every field that depends on research.</p><p class="paragraph" style="text-align:left;">Activeloop’s Scientific Discover is a scientific data agent that reads and reasons across the entire scientific literature.</p><h2 class="heading" style="text-align:left;" id="the-us-needs-to-unify-all-scientifi">The US Needs to Unify All Scientific Data</h2><p class="paragraph" style="text-align:left;">The White House recently launched the <a class="link" href="https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=announcing-activeloop-s-scientific-discover" target="_blank" rel="noopener noreferrer nofollow">Genesis Mission</a> to unify all datasets for scientific discovery, recognizing that the current infrastructure cannot support the AI agents needed to cure diseases or discover new materials.</p><p class="paragraph" style="text-align:left;">Fully indexed research data will accelerate scientific discovery through AI. Activeloop’s Scientific Discover takes a major step in this direction. </p><h2 class="heading" style="text-align:left;" id="175-tb-of-scientific-research-data-">175TB of Scientific Research Data Indexed on Deep Lake</h2><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="true" class="youtube_embed" frameborder="0" height="100%" src="https://youtube.com/embed/x8Lv5-C9ntw" width="100%"></iframe><p class="paragraph" style="text-align:left;">We have successfully indexed <b>175TB of open-access scientific data</b>, creating one of the world&#39;s largest AI-ready scientific datasets. 
</p><p class="paragraph" style="text-align:left;">It is a fully structured, multimodal knowledge base powered by <b>Deep Lake</b>.</p><p class="paragraph" style="text-align:left;">Traditional search engines see scientific papers as flat text, mostly just titles and abstracts, often discarding the most critical data stored within papers, such as charts, molecular structures, and mathematical formulas. </p><p class="paragraph" style="text-align:left;">By utilizing <a class="link" href="https://github.com/activeloopai/deeplake?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=announcing-activeloop-s-scientific-discover" target="_blank" rel="noopener noreferrer nofollow">Deep Lake</a>’s tensor-based storage, we have preserved this multimodal context, allowing our AI agents to &quot;read&quot; papers with the same visual and semantic understanding as a human researcher.</p><ul><li><p class="paragraph" style="text-align:left;"><b>Scale:</b> 25 million open-access papers comprising more than 450 million pages.</p></li><li><p class="paragraph" style="text-align:left;"><b>Multimodality:</b> Images, tables, and graphs are indexed alongside text, preserving the relationships between distinct data types (e.g., a chemical structure image linked to its textual description).</p></li><li><p class="paragraph" style="text-align:left;"><b>Cutoff Date:</b> March 2025</p></li><li><p class="paragraph" style="text-align:left;"><b>Infrastructure:</b> Built on Deep Lake’s &quot;Index-on-the-Lake&quot; technology, this dataset is stored efficiently on S3 Express, enabling sub-second retrieval of complex multimodal queries without the latency or cost of traditional vector databases.</p></li></ul><p class="paragraph" style="text-align:left;">This dataset serves as the foundational &quot;brain&quot; for the L1 Science Data Agent, ensuring it retrieves answers based on ground-truth scientific evidence rather than hallucination. 
</p><p class="paragraph" style="text-align:left;">You can run queries over the API or at <a class="link" href="https://chat.activeloop.ai/science?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=announcing-activeloop-s-scientific-discover" target="_blank" rel="noopener noreferrer nofollow">chat.activeloop.ai/science</a></p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/a5ff1e75-1817-4f0a-9801-318f48c1a1e2/image.png?t=1764614682"/></div><h2 class="heading" style="text-align:left;" id="humanitys-last-exam">Humanity’s Last Exam</h2><p class="paragraph" style="text-align:left;">Equipped with three tools (Code Interpreter, Web Search, and Scientific Search via the Activeloop API), the data agent achieves state-of-the-art results on the HLE benchmark. </p><p class="paragraph" style="text-align:left;">The agent achieves 43% accuracy in a single pass, and 48% with pass@2, attempting all 2,500 queries, including those containing images or GIFs. LLM cost per iteration is under $1.</p><p class="paragraph" style="text-align:left;">While not an exact comparison, one might speculate that Deep Think, GPT-5 Pro, and Grok Heavy run 8 parallel trajectories and then aggregate the final result, which is roughly equivalent to a pass@8 score.</p><p class="paragraph" style="text-align:left;">To address recent concerns about benchmark leakage, especially with web search, we blocked access to the Hugging Face website. Furthermore, we used LLMs to analyze all executed traces and identify potential answer leaks. 
While 4.9% of answers were suspected of leakage, only 0.2% were instances of tool usage (web search and scientific search).</p><p class="paragraph" style="text-align:left;">We are open-sourcing the full code to reproduce the benchmarks at <a class="link" href="https://github.com/activeloopai/hle?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=announcing-activeloop-s-scientific-discover" target="_blank" rel="noopener noreferrer nofollow">github.com/activeloopai/hle</a></p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/6e4cf67b-1633-441f-8abb-d08b731c6b43/image.png?t=1764614816"/></div><h2 class="heading" style="text-align:left;" id="activeloop-l-1-the-multimodal-scien">Activeloop L1: The Multimodal Scientific Agent</h2><p class="paragraph" style="text-align:left;">L1 is not a paper search engine. It is an AI researcher.</p><p class="paragraph" style="text-align:left;">It understands:</p><ul><li><p class="paragraph" style="text-align:left;">text</p></li><li><p class="paragraph" style="text-align:left;">charts</p></li><li><p class="paragraph" style="text-align:left;">molecules</p></li><li><p class="paragraph" style="text-align:left;">protein structures</p></li><li><p class="paragraph" style="text-align:left;">formulas</p></li><li><p class="paragraph" style="text-align:left;">experimental tables</p></li><li><p class="paragraph" style="text-align:left;">clinical graphs</p></li></ul><p class="paragraph" style="text-align:left;">This is possible because L1 runs on the largest visually indexed scientific dataset ever created.</p><p class="paragraph" style="text-align:left;"><b>25 million papers. 
Over 175+ terabytes of fully indexed data.</b></p><h2 class="heading" style="text-align:left;" id="instant-scientific-intelligence">Instant Scientific Intelligence</h2><p class="paragraph" style="text-align:left;">Ask a question like:</p><p class="paragraph" style="text-align:left;"><i>Which compounds show synergy with metformin for type 2 diabetes?</i></p><p class="paragraph" style="text-align:left;">L1 reads molecular structures, clinical results, and experimental evidence in seconds.</p><p class="paragraph" style="text-align:left;">No manual extraction. No stitching PDFs. No lost details.</p><p class="paragraph" style="text-align:left;">Your team gets trusted, multimodal answers immediately.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/87190feb-0aa5-455f-a23d-17aa9676a845/image.png?t=1764614913"/></div><h2 class="heading" style="text-align:left;" id="free-researchers-to-do-real-discove">Free Researchers to Do Real Discovery</h2><p class="paragraph" style="text-align:left;">By automating the extraction and integration of scientific data, L1 lets your researchers focus on:</p><ul><li><p class="paragraph" style="text-align:left;">identifying new drug targets</p></li><li><p class="paragraph" style="text-align:left;">generating promising molecules</p></li><li><p class="paragraph" style="text-align:left;">validating hypotheses</p></li><li><p class="paragraph" style="text-align:left;">accelerating discovery cycles</p></li></ul><p class="paragraph" style="text-align:left;">Instead of spending their time cleaning data.</p><h2 class="heading" style="text-align:left;" id="key-applications-for-empowering-the">Key Applications for empowering the Genesis Mission</h2><p id="activeloop-l-1-the-multimodal-scien" class="paragraph" style="text-align:left;">Activeloop L1’s main applications include:</p><p class="paragraph" 
style="text-align:left;"><b>Accelerating Biotechnology & Target Identification:</b> Aligned with the mission to “cure diseases,” multimodal AI correlates diverse data such as gene expression, protein interactions, and clinical outcomes to pinpoint viable drug targets faster than humanly possible.<br><b>Critical Materials & Energy Dominance:</b> Essential for “nuclear fission, fusion, and energy dominance.” The agent can explore vast chemical spaces to generate candidate structures for next-gen batteries or superalloys that satisfy conflicting properties like efficacy, safety, and thermal stability.<br><b>Semiconductors & Advanced Manufacturing:</b> Supporting the race for “global technology dominance.” By indexing fabrication diagrams and material properties from millions of papers, the agent can suggest process improvements and novel material compositions for microelectronics.</p><h2 class="heading" style="text-align:left;" id="faster-science-discovery-over-api">Faster Science Discovery over API</h2><p class="paragraph" style="text-align:left;">Multimodal scientific research involves using AI to analyze and integrate data from multiple, diverse sources or modalities to gain a more holistic and accurate understanding of diseases and potential treatments. </p><p class="paragraph" style="text-align:left;">Instead of relying on text alone, multimodal models combine information from all of these sources simultaneously. 
This approach mirrors how human experts synthesize knowledge from different sources.</p><p class="paragraph" style="text-align:left;">You can try the agent today via our OpenAI-compatible API:</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/f557d528-5e96-42bb-bb6c-100b1a3d3733/carbon__2_.png?t=1764636951"/></div><p class="paragraph" style="text-align:left;"><span style="font-family:'Proxima Nova', sans-serif;">You can get an API key by signing up and subscribing at </span><b><a class="link" href="https://chat.activeloop.ai?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=announcing-activeloop-s-scientific-discover" target="_blank" rel="noopener noreferrer nofollow">chat.activeloop.ai</a></b><span style="font-family:'Proxima Nova', sans-serif;"> and learn more about usage in the </span><b><a class="link" href="https://docs.activeloop.ai/setup/quickstart?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=announcing-activeloop-s-scientific-discover" target="_blank" rel="noopener noreferrer nofollow" style="color: rgb(255, 183, 70)">docs</a></b><span style="font-family:'Proxima Nova', sans-serif;">.</span></p><h2 class="heading" style="text-align:left;" id="ready-to-accelerate-discovery">Ready to Accelerate Discovery?</h2><p class="paragraph" style="text-align:left;">We are partnering with leading biotech and research teams to unlock the next generation of multimodal scientific innovation.</p><p class="paragraph" style="text-align:left;"><b>Try the Science Agent:</b> <a class="link" href="https://chat.activeloop.ai/science?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=announcing-activeloop-s-scientific-discover" target="_blank" rel="noopener noreferrer nofollow"><b>chat.activeloop.ai/science</b></a></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="" href="https://chat.activeloop.ai/science?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=announcing-activeloop-s-scientific-discover"><span class="button__text" style=""> Get Started </span></a></div></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=b72d6be9-e203-448b-9822-ac1b7a61b274&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
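The API call pictured in the screenshot above can be sketched with Python's standard library alone. This is a minimal sketch, not the official client: the base URL, the `/v1/chat/completions` path, the `activeloop-l1` model name, and the `ACTIVELOOP_API_KEY` environment variable are illustrative assumptions; consult the Activeloop docs for the actual values.

```python
import json
import os
import urllib.request

# Assumed base URL for the OpenAI-compatible endpoint (see docs.activeloop.ai).
BASE_URL = "https://chat.activeloop.ai"


def build_request(question: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": "activeloop-l1",  # hypothetical model identifier
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",  # conventional OpenAI-compatible path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("ACTIVELOOP_API_KEY", "")
    req = build_request(
        "Which compounds show synergy with metformin for type 2 diabetes?", key
    )
    if key:  # only send when a real key is configured
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, the official OpenAI SDK (pointed at this base URL) or any compatible client should work the same way.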
  ]]></content:encoded>
</item>

      <item>
  <title>Unlock AI Data Analysis: From Data Silos to Vibe Intelligence</title>
  <description>Your Analysts Spend 70% of Their Time Wrangling Data. </description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/88a04fc7-6b28-4a52-adb8-c2823a5ef52f/image__40_.jpg" length="105719" type="image/jpeg"/>
  <link>https://genai360.beehiiv.com/p/unlock-ai-data-analysis-from-data-silos-to-vibe-intelligence</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/unlock-ai-data-analysis-from-data-silos-to-vibe-intelligence</guid>
  <pubDate>Mon, 20 Oct 2025 14:33:37 +0000</pubDate>
  <atom:published>2025-10-20T14:33:37Z</atom:published>
    <dc:creator>Davit Buniatyan</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">I have had dozens of conversations with leaders at large enterprises, and their frustration is almost always the same. Nearly all of them say the same thing:</p><p class="paragraph" style="text-align:left;">“<span style="color:rgb(34, 34, 34);"><i>Our data is everywhere, but our insights are nowhere.</i></span>”</p><p class="paragraph" style="text-align:left;">Their teams are buried in manual data prep. They try to reconcile reports between systems that don’t talk to each other. They&#39;re fighting a losing battle against data chaos. The business suffers from slow, untrustworthy insights. </p><p class="paragraph" style="text-align:left;">What if an AI could do that soul-crushing 70% of the work? What if it could automate the integration bottleneck and free your team to actually be strategists?</p><p class="paragraph" style="text-align:left;">🚀 <b>Today, we&#39;re introducing Activeloop to unlock AI Data Analysis for GTM Operations.</b></p><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="true" class="youtube_embed" frameborder="0" height="100%" src="https://youtube.com/embed/4Ua3Llox2HE" width="100%"></iframe><h2 class="heading" style="text-align:left;" id="ai-analyst-that-vibes-intelligence">AI Analyst that Vibes Intelligence</h2><p class="paragraph" style="text-align:left;">This isn’t another dashboarding tool. It’s a new way to automate the single biggest bottleneck holding your business back. 
Activeloop functions as an AI Analyst for your team, automating the painful data harmonization and integration work that prevents them from doing high-impact analysis.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/4e34c8a2-1343-4b0f-8a83-059aa304beaa/ezgif-7264f89e28e043.gif?t=1760423646"/></div><h3 class="heading" style="text-align:left;" id="from-weeks-to-seconds-deliver-justi">From Weeks to Seconds: Deliver “Just-in-Time” Intelligence</h3><p class="paragraph" style="text-align:left;">Imagine being in a forecast meeting and asking, <i>&quot;Why did this region&#39;s deal cycle lengthen last quarter?&quot;</i> and getting a trusted answer instantly. Activeloop empowers your entire organization with true self-service analytics, allowing you to infinitely drill down into your data without ever having to say, &quot;We&#39;ll get back to you.&quot;</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/8dd74aad-2408-4c6e-ab68-d55b79c1e380/showmerevenue.png?t=1760970726"/></div><h3 class="heading" style="text-align:left;" id="free-your-analysts-to-be-strategist">Free Your Analysts to Be Strategists</h3><p class="paragraph" style="text-align:left;">By automating the integration bottleneck, we turn your operations team from &quot;report builders&quot; into the strategic engine your business needs. 
They can finally focus on uncovering revenue opportunities, optimizing sales processes, and driving GTM strategy, confident that the underlying data is unified and trusted.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/fe7becfe-3e51-4e2f-b562-e7fd03e7e5e7/Screenshot_2025-10-13_at_11.02.25_PM.png?t=1760423740"/></div><h3 class="heading" style="text-align:left;" id="for-our-technical-audience-the-infr">For Our Technical Audience: The Infrastructure Powering the AI Analyst</h3><p class="paragraph" style="text-align:left;">The magic behind the AI Analyst is built on the robust, multimodal data infrastructure you know from Activeloop. We connect to fragmented sources (Salesforce, SAP, Snowflake, even unstructured contracts in SharePoint) and use <b>Deep Lake</b> to create a unified, AI-ready data foundation. This allows our reasoning models to query harmonized structured and unstructured data in real time, all accessible via our API for building custom automations.</p><hr class="content_break"><p class="paragraph" style="text-align:left;"><b>Ready to End the Data Chaos?</b></p><p class="paragraph" style="text-align:left;">We are now partnering with a select group of enterprises to solve their most complex data challenges.</p><p class="paragraph" style="text-align:left;">If you’re a leader who is tired of seeing your team buried in manual data work and ready to unlock the strategic potential of your organization, we want to talk to you.</p><p class="paragraph" style="text-align:left;"><b>👉 Request a Personalized Demo</b> <a class="link" href="https://www.activeloop.ai/sales/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=unlock-ai-data-analysis-from-data-silos-to-vibe-intelligence" target="_blank" rel="noopener noreferrer nofollow">https://www.activeloop.ai/sales/</a></p></div><div class='beehiiv__footer'><br 
class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=4b458b6b-cd98-45e8-90b9-6c23e2cf027e&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Introducing Multimodal Healthcare AI Systems Free Course</title>
  <description>We’re introducing our latest free course as part of our Gen AI 360 series: Building Multimodal Healthcare AI Systems with Deep Lake.</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/fcfe13f0-dac1-4f8f-98f0-8edadc714931/image__32_.png" length="507341" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/introducing-multimodal-healthcare-ai-systems-free-course</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/introducing-multimodal-healthcare-ai-systems-free-course</guid>
  <pubDate>Mon, 29 Sep 2025 14:09:10 +0000</pubDate>
  <atom:published>2025-09-29T14:09:10Z</atom:published>
    <dc:creator>Davit Buniatyan</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="" href="https://learn.activeloop.ai/courses/building-multimodal-healthcare-ai?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=introducing-multimodal-healthcare-ai-systems-free-course"><span class="button__text" style=""> Start Learning Now </span></a></div><p class="paragraph" style="text-align:left;">Healthcare AI is developing rapidly, and multimodal data is at the core of this development. Deep Lake 4.0 allows you to integrate multimodal biomedical datasets (images, text, scientific literature, and more) with powerful generative AI tools, accelerating research. We’ve partnered with Bayer Radiology, Intel Corporation, and Amazon Web Services to bring you this course, which covers:</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/fcfe13f0-dac1-4f8f-98f0-8edadc714931/image__32_.png?t=1759124495"/><div class="image__source"><span class="image__source_text"><p>Building Multimodal Healthcare AI Systems with Deep Lake</p></span></div></div><h3 class="heading" style="text-align:left;" id="1-standardizing-multimodal-data-in-">1. Standardizing multimodal data in healthcare AI with Croissant and Deep Lake</h3><p class="paragraph" style="text-align:left;">The Croissant file format is a metadata-rich standard, particularly valuable for managing biomedical datasets that combine images with clinical or experimental context. 
In this chapter, you’ll learn how to load biomedical data in Croissant, store them in Deep Lake, and apply version control through branching and merging, ensuring reproducibility and collaborative progress in biomedical research workflows.</p><h3 class="heading" style="text-align:left;" id="2-powerful-ai-search-on-a-natural-l">2. Powerful AI search on a natural language corpus of adverse drug effects with Deep Lake</h3><p class="paragraph" style="text-align:left;">Pharmaceutical datasets are complex, with unstructured clinical notes, social media posts, and scientific literature. In this chapter, you’ll learn how to process the ADE Corpus V2 into a query-ready format using Deep Lake, including loading the corpus, converting it into a Deep Lake dataset, and training a TinyLlama LoRA model, enabling powerful AI search across diverse biomedical text.</p><h3 class="heading" style="text-align:left;" id="3-automated-ll-mpowered-labeling-of">3. Automated LLM-powered labeling of radiology image datasets</h3><p class="paragraph" style="text-align:left;">Radiology datasets are often underutilized due to limited descriptive metadata. In this chapter, you’ll learn how to use LLMs with Deep Lake to generate natural language labels for images, transforming them into multimodal, searchable datasets.</p><h3 class="heading" style="text-align:left;" id="4-a-ipowered-biomedical-literature-">4. AI-powered biomedical literature review</h3><p class="paragraph" style="text-align:left;">The growing volume of biomedical literature makes comprehensive analysis difficult and time-consuming. In this chapter, you’ll learn how to leverage <b>Activeloop’s L0 reasoning model</b> to perform AI-powered literature reviews on multimodal scientific papers. This approach enables researchers to extract insights and answer complex questions efficiently.</p><h3 class="heading" style="text-align:left;" id="5-unified-multimodal-search-for-rap">5. 
Unified multimodal search for rapid drug discovery</h3><p class="paragraph" style="text-align:left;">Drug discovery is often slowed by siloed, multimodal datasets spanning literature, assays, sequences, and imaging. In this chapter, you’ll learn how to unify and index diverse biomedical data on AWS Sagemaker Lakehouse with Deep Lake, enabling AI Search across textual, numerical, and visual information. Researchers can rapidly shortlist promising drug candidates using lexical, semantic, and visual queries, dramatically accelerating the path from data to actionable insights.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/4a53f15d-a090-4f6c-abe6-0fefa7266856/image__33_.png?t=1759124528"/><div class="image__source"><span class="image__source_text"><p>Drug Discovery AI Search on AWS Sagemaker Lakehouse</p></span></div></div><p class="paragraph" style="text-align:left;">Authored by Darsh Mandera (Activeloop) and Steffen Vogler (Bayer Radiology).</p><p class="paragraph" style="text-align:left;">Special thanks to Vitor Freitas (AWS), Gitika Vijh (AWS), Susan Marquez (Intel Corporation) and Arijit Bandyopadhyay (Intel Corporation).</p><h3 class="heading" style="text-align:left;" id="big-upgrade-to-all-courses">Big Upgrade to All Courses</h3><p class="paragraph" style="text-align:left;">Additionally, we’ve also upgraded all of our previously released courses on <a class="link" href="https://learn.activeloop.ai?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=introducing-multimodal-healthcare-ai-systems-free-course" target="_blank" rel="noopener noreferrer nofollow">learn.activeloop.ai</a> to run on Deep Lake 4.0, the latest version of Deep Lake, as well as the latest versions of key artificial intelligence libraries like LlamaIndex and LangChain, providing a modernized learning experience.</p><div class="button" 
style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="" href="https://learn.activeloop.ai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=introducing-multimodal-healthcare-ai-systems-free-course"><span class="button__text" style=""> Visit Our Upgraded Learning Platform </span></a></div></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=7f3e7a66-2cb8-4558-9615-13ff37749ec0&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Activeloop-L0: State-of-the-Art RAG Accuracy on Your Data</title>
  <description>Turn PDFs, images &amp; tables into instant, cited answers</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/93326520-4845-41dd-8480-49e617a2499e/Activeloop_L0_-_Benchmark__1_.png" length="125133" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/activeloop-l0-state-of-the-art-rag-accuracy-on-your-data</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/activeloop-l0-state-of-the-art-rag-accuracy-on-your-data</guid>
  <pubDate>Tue, 13 May 2025 11:05:00 +0000</pubDate>
  <atom:published>2025-05-13T11:05:00Z</atom:published>
    <dc:creator>Davit Buniatyan</dc:creator>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">While working on <a class="link" href="https://github.com/activeloopai/deeplake?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow" style="color: inherit">Deep Lake</a>, I have seen many RAG systems collapse when exposed to production-scale corporate data. They often rely on predefined loops, custom logic, and rigid agent scaffolds. Activeloop-L0 provides your agent with highly precise answers grounded in your multimodal data. </p><h2 class="heading" style="text-align:left;" id="but-wait-is-rag-still-relevant-desp"><b>Why can’t we reliably analyze corporate documents</b>? </h2><ul><li><p class="paragraph" style="text-align:left;"><b>Architectural hurdles</b>: messy data integrations, unexpected infra costs, and reliability/safety constraints.</p></li><li><p class="paragraph" style="text-align:left;"><b>Commodity RAG</b> lacks depth for multimodal enterprise data (documents, images, audio).</p></li><li><p class="paragraph" style="text-align:left;"><b>Infrastructure burden</b>: parsing, chunking, embeddings, indexing, vector DBs, and agent loops slow teams.</p></li></ul><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/da113d21-2180-4da1-9598-eeeaaf2d3f58/Screenshot_2025-05-08_at_5.25.56_PM.png?t=1746750360"/></div><h2 class="heading" style="text-align:left;" id="but-wait-is-rag-still-relevant-desp">But wait, is RAG still relevant despite large context models? 
</h2><p class="paragraph" style="text-align:left;">Let’s consider four extensive NASA documents [<a class="link" href="https://www.nasa.gov/wp-content/uploads/2022/03/sls-reference-guide-2022-v2-508-0.pdf?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">1</a>, <a class="link" href="https://www.nasa.gov/wp-content/uploads/2023/02/orion-reference-guide-111022.pdf?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">2</a>, <a class="link" href="https://www.lpi.usra.edu/lunar/artemis/Artemis-I-Reference-Guide_NP-2022-03-3045-HQ.pdf?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">3</a>, <a class="link" href="https://www.ulalaunch.com/docs/default-source/rockets/2023_vulcan_user_guide.pdf?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">4</a>], each 80 to 100 pages long and containing visual descriptions, and pose a highly complex question.</p><p class="paragraph" style="text-align:left;">ChatGPT with o3, despite having the full PDFs in context, failed after 11 minutes of reasoning. Now, imagine you have thousands of corporate documents that can’t be contained in a context. 
In contrast, Activeloop-L0 provided the correct answer in 4 minutes and can scale to a million documents.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/89187f47-215e-48a5-a9eb-cc8cb85828c9/image__8_.png?t=1746747333"/></div><h2 class="heading" style="text-align:left;" id="what-is-activeloop-l-0">What is Activeloop-L0?</h2><p class="paragraph" style="text-align:left;"><b>Activeloop-L0</b> is a compound AI system that ingests your unstructured data and returns grounded answers. Behind the scenes, <a class="link" href="https://github.com/activeloopai/deeplake?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">Deep Lake</a> indexes neural representations at scale, then fuses “thinking tokens” with high-precision retrieval for fast multi-hop reasoning.</p><p class="paragraph" style="text-align:left;">It is available on <b><a class="link" href="https://chat.activeloop.ai?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">chat.activeloop.ai</a></b> now. 
</p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://www.ycombinator.com/launches/NUM-activeloop-l0-state-of-the-art-rag-accuracy-on-your-data?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data"><span class="button__text" style="color:#F9FAFB;"><b>Upvote on Launch YC</b></span></a></div><h3 class="heading" style="text-align:left;" id="how-is-it-different-compared-to-a-t">How is it different from traditional RAG?</h3><ul><li><p class="paragraph" style="text-align:left;"><b>Multimodal:</b> Built-in support for images, PDFs, audio, and spreadsheets.</p></li><li><p class="paragraph" style="text-align:left;"><b>Integrated Reasoning & Retrieval:</b> Eliminates the need for loops.</p></li><li><p class="paragraph" style="text-align:left;"><b>Deep Indexing:</b> Cost-effective multi-layer indexing for richer context early on.</p></li><li><p class="paragraph" style="text-align:left;"><b>Simple:</b> Focus on innovation, not maintaining infrastructure.</p></li><li><p class="paragraph" style="text-align:left;"><b>Grounded and Accurate:</b> Clear citations for trustworthy insights.</p></li></ul><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/951ee900-7e2b-4021-8df1-d4a0481f051d/image.png?t=1747034419"/></div><h3 class="heading" style="text-align:left;" id="how-is-it-different-compared-to-a-t">How accurate is Activeloop-L0?</h3><p class="paragraph" style="text-align:left;">Activeloop-L0 achieves state-of-the-art accuracy of 85.6% overall on 1,142 multimodal questions (292 PDFs, 5.5K pages). 
It outperforms text-only RAG by +20%, visual RAG by +10%, and Alibaba’s ViDoRAG by +6% on their own ViDoSeek benchmark.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/93326520-4845-41dd-8480-49e617a2499e/Activeloop_L0_-_Benchmark__1_.png?t=1747033556"/></div><h3 class="heading" style="text-align:left;" id="how-is-it-different-compared-to-a-t">Is there an OpenAI-compliant API? </h3><p class="paragraph" style="text-align:left;">Yes, Activeloop-L0 is available with an OpenAI-compliant API. You can easily plug it into your agents to provide highly relevant context. You can get started here: <a class="link" href="https://docs.activeloop.ai/setup/quickstart?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">https://docs.activeloop.ai/setup/quickstart</a></p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/061280ce-09a0-4ea5-93ed-d5980ba8a10f/image__9_.png?t=1746747559"/></div><h3 class="heading" style="text-align:left;" id="read-to-deploy-on-your-data">Ready to Deploy on Your Data?</h3><p class="paragraph" style="text-align:left;">Activeloop is trusted by Fortune 500 companies, including the likes of <b><a class="link" href="https://www.activeloop.ai/usecase/bayer/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">Bayer</a></b>, <b><a class="link" href="https://www.activeloop.ai/usecase/flagshippioneering/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer 
nofollow">Flagship Pioneering</a></b>, <a class="link" href="https://www.activeloop.ai/usecase/matterport/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data" target="_blank" rel="noopener noreferrer nofollow">Matterport</a>.</p><ul><li><p class="paragraph" style="text-align:left;"><b>Your Cloud</b>: Deploy on your cloud, ensuring data never leaves your infrastructure.</p></li><li><p class="paragraph" style="text-align:left;"><b>Your Models</b>: Integrate your LLMs.</p></li><li><p class="paragraph" style="text-align:left;"><b>Your Security</b>: SOC2 compliance, fine-grained access control, and SSO.</p></li></ul><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://www.activeloop.ai/contact/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=activeloop-l0-state-of-the-art-rag-accuracy-on-your-data"><span class="button__text" style=""> Book a Call with Us </span></a></div></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=a52fe738-6a67-4466-83cb-b4d4083d73a8&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
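The OpenAI-compliant API described above should work with any standard HTTP client that speaks the usual chat-completions protocol. Below is a minimal stdlib sketch; the base URL, the model name (`activeloop-l0`), and the `/chat/completions` route are illustrative assumptions rather than documented values, so check the quickstart at docs.activeloop.ai for the real ones.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, question):
    """Build a standard OpenAI-style /chat/completions request.

    base_url and model are illustrative placeholders, not documented values.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def ask(base_url, api_key, model, question):
    # Send the request and return the assistant's answer text,
    # following the OpenAI chat-completions response shape.
    req = build_chat_request(base_url, api_key, model, question)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the payload has the same shape the official OpenAI client sends, existing agent code that targets an OpenAI-compatible endpoint should only need the base URL and model name swapped.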
  ]]></content:encoded>
</item>

      <item>
  <title>Introducing AI Knowledge Agent</title>
  <description>Deep Research on Your Multi-Modal Data</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/dd31995a-5964-48ab-941e-69d383c7ba77/Frame_2117131215__1_.png" length="445705" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/introducing-ai-knowledge-agent</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/introducing-ai-knowledge-agent</guid>
  <pubDate>Tue, 25 Feb 2025 08:01:37 +0000</pubDate>
  <atom:published>2025-02-25T08:01:37Z</atom:published>
    <category><![CDATA[Activeloop]]></category>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">We&#39;re introducing the Deep Lake AI Knowledge Agent, our take on AI search that can do deep research not only on public but also private data—no matter its size or modality!</p><p class="paragraph" style="text-align:left;">Scroll to learn more about it, or try out our quick interactive demo (it takes just 20 seconds).</p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://app.arcade.software/share/QnHWPDFT9HYHwDhnIPVt?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=introducing-ai-knowledge-agent"><span class="button__text" style="color:#F9FAFB;"> Try the Demo </span></a></div><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/e8dc94a9-5cfa-4598-a3f5-0f2ac331b380/Frame_2117131203__1_.png?t=1740467379"/><div class="image__source"><span class="image__source_text"><p>Ingest and search data with vision-language models</p></span></div></div><p class="paragraph" style="text-align:left;">Deep Lake supports multi-modal retrieval from the ground up. It uses vision-language models for data ingestion and retrieval so that you can connect any data (PDFs, images, videos, structured data, etc.). 
You can even combine different types of data and query them together.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/b268312a-1469-43f9-8866-4b9e4d7b2ed4/multimodal_research__5_.png?t=1740467244"/><div class="image__source"><span class="image__source_text"><p>Query planning and deep reasoning</p></span></div></div><p class="paragraph" style="text-align:left;">Deep Lake deconstructs the query, understanding where to find the answer to your question, and does just that!</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/11645dc1-3048-46cc-99a8-018d3d23c3b2/connect_any_source__6_.png?t=1740467187"/><div class="image__source"><span class="image__source_text"><p>Connect any source (GCP, Azure, S3, Dropbox)</p></span></div></div><p class="paragraph" style="text-align:left;">Unlike any other provider on the market, Deep Lake&#39;s Deep Research works on any data from S3, Dropbox, and GCP. 
It also learns from your queries over time, making the results as relevant to your work as possible!</p><p class="paragraph" style="text-align:left;">We&#39;ve just launched on ProductHunt, so would appreciate your support - you can join the discussion by clicking the link below!</p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://www.producthunt.com/posts/deep-lake-ai-knowledge-agent?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=introducing-ai-knowledge-agent"><span class="button__text" style="color:#F9FAFB;"> Support on ProductHunt </span></a></div><p class="paragraph" style="text-align:left;">Many thanks!</p><p class="paragraph" style="text-align:left;"></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=8c4a77e4-4a2f-4a51-bbd9-533ce65a72c3&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>OpenAI&#39;s New AI Agent, Forge API Launches, FrontierMath Benchmark</title>
  <description>Plus, Google releases software code for AlphaFold 3 </description>
  <link>https://genai360.beehiiv.com/p/openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark</guid>
  <pubDate>Tue, 25 Feb 2025 05:46:18 +0000</pubDate>
  <atom:published>2025-02-25T05:46:18Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">OpenAI&#39;s &quot;<b>Operator,&quot;</b> an AI agent expected to release in January 2025, aims to execute tasks directly on users&#39; computers, competing with Anthropic and Google in consumer-facing AI tools.</p></li><li><p class="paragraph" style="text-align:left;">Nous Research’s Forge API introduces <b>reasoning enhancements</b> via Monte Carlo Tree Search, Chain of Code, and a model mixture strategy, outperforming competitors in math reasoning benchmarks.</p></li><li><p class="paragraph" style="text-align:left;"><b>Qwen 2.5 </b>demonstrated SOTA coding capabilities, matching GPT-4o’s performance across multiple programming languages and benchmarks.</p></li><li><p class="paragraph" style="text-align:left;"><b>FrontierMath </b>evaluates AI systems on challenging mathematical reasoning, with experts confirming its rigor and AI’s limited ability to solve complex problems.</p></li><li><p class="paragraph" style="text-align:left;">Meta’s <b>Watermark Anything Model</b> redefined watermarking as segmentation, achieving over 85% accuracy in detecting watermarked areas and resilience to manipulations like splicing.</p></li><li><p class="paragraph" style="text-align:left;"><b>CDXFormer</b> improved spatial-temporal context analysis in satellite images using XLSTM, achieving SOTA performance across benchmarks with reduced computational costs.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? 
Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark"><span class="button__text" style=""> Subscribe </span></a></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">It’s been a little while, but there were plenty of releases last week. OpenAI isn’t showing any signs of slowing down and is discussing the release of an <b>AI agent</b> in January 2025. Sutskever brought up some interesting results from scaling up pre-training as well. We also saw a bunch of cool models from Google with unique applications, including flood forecasting and predicting molecular structures. </p><p class="paragraph" style="text-align:left;">The release of <b>Qwen 2.5</b> was also surprising, given how well it performed on various benchmarks, as was a new math benchmark that received praise from some of the most well-known mathematicians in the world.</p><h3 class="heading" style="text-align:left;" id="open-ai-and-rivals-pivot-from-scali">OpenAI and Rivals Pivot from Scaling to Advanced Reasoning</h3><p class="paragraph" style="text-align:left;">We’re seeing a shift in the AI industry’s approach to developing LLMs. 
Companies like <a class="link" href="https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">OpenAI</a> are moving away from the &quot;<b>bigger is better</b>&quot; philosophy of simply scaling up models with more data and computing power. Instead, they&#39;re exploring more sophisticated techniques that mimic human-like reasoning.</p><p class="paragraph" style="text-align:left;">OpenAI&#39;s recently released o1 model is a perfect example of how we’re moving toward this new direction. It uses &quot;<b>test-time compute</b>,&quot; a technique that enhances AI models during the inference phase. This method lets models generate and evaluate multiple possibilities in real-time, dedicating more processing power to challenging tasks that require complex reasoning.</p><p class="paragraph" style="text-align:left;">Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, mentioned results from <b>scaling up</b> pre-training have plateaued, marking a transition from &quot;the age of scaling&quot; to &quot;the age of wonder and discovery.&quot;</p><p class="paragraph" style="text-align:left;">Some effects of this shift could include:</p><ul><li><p class="paragraph" style="text-align:left;">It may reshape the AI arms race, with companies focusing on developing more efficient reasoning techniques rather than just increasing model size.</p></li><li><p class="paragraph" style="text-align:left;">The demand for computational resources could change, which might affect companies like Nvidia that have dominated the AI chip market.</p></li><li><p class="paragraph" style="text-align:left;">It could lead to more distributed, cloud-based servers for inference.</p></li></ul><p class="paragraph" 
style="text-align:left;">Other major AI labs, including Anthropic, xAI, and Google DeepMind, are reportedly working on their own versions of these <b>advanced reasoning techniques.</b> This means we might be in a time where innovation in model architecture and training methods may become as crucial as raw computational power.</p><h3 class="heading" style="text-align:left;" id="open-ai-prepares-operator-launch-an">OpenAI Prepares &#39;Operator&#39; Launch and Shares LLM Optimization Strategies</h3><p class="paragraph" style="text-align:left;">Previously, OpenAI released a lightweight library called <a class="link" href="https://genai360.beehiiv.com/p/of-new-benchmarks?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">Swarm</a>. OpenAI has now revealed plans to release &quot;<a class="link" href="https://techcrunch.com/2024/11/13/openais-take-on-ai-agents-could-come-in-january/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">Operator,</a>&quot; an upcoming<b> AI agent tool</b>, which is set for release as early as January 2025, with plans to initially offer it as a research preview via the developer API. The tool is designed to execute tasks directly on users&#39; computers.</p><p class="paragraph" style="text-align:left;">Operator is expected to compete with Anthropic&#39;s &quot;<b>Computer Use</b>&quot; feature and Google&#39;s rumored consumer-focused agent, potentially offering general-purpose capabilities in web browsers. 
As of right now, details on Operator&#39;s unique advantages aren’t entirely clear, although it aims to simplify task execution.</p><p class="paragraph" style="text-align:left;">OpenAI also released a post detailing how to <a class="link" href="https://platform.openai.com/docs/guides/optimizing-llm-accuracy/understanding-the-tools?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">optimize LLMs</a>, since there are issues due to varying requirements for accuracy, method selection, and production-readiness, requiring a structured approach tailored to <b>specific use cases.</b></p><p class="paragraph" style="text-align:left;">They recommend beginning with prompt engineering for simple tasks, then progressing to RAG for dynamic context and fine-tuning for consistent behavior and task-specific accuracy.</p><p class="paragraph" style="text-align:left;">Afterwards, they mention you should establish an evaluation set to diagnose failures and iteratively refine optimization methods, ensuring tools like RAG or fine-tuning are applied when <b>prompt engineering alone falls short.</b></p><p class="paragraph" style="text-align:left;">On the legal side of things, a New York judge dismissed RawStory&#39;s lawsuit <a class="link" href="https://www.linkedin.com/feed/update/urn:li:activity:7260734796497068032/?updateEntityUrn=urn%3Ali%3Afs_updateV2%3A%28urn%3Ali%3Aactivity%3A7260734796497068032%2CFEED_DETAIL%2CEMPTY%2CDEFAULT%2Cfalse%29&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">against OpenAI</a>, which might set a precedent for future cases involving AI training on copyrighted data. 
The dismissal was primarily based on technical grounds, though the judge&#39;s reasoning touched on broader issues related to <b>AI and copyright.</b></p><p class="paragraph" style="text-align:left;">The court found that authors aren’t plagiarized during AI training because the process involves a broad set of data and doesn’t result in <b>verbatim copying.</b></p><h3 class="heading" style="text-align:left;" id="forge-ap-is-reasoning-power-and-co-">Forge API&#39;s Reasoning Power and CoPilot Arena&#39;s Coding Leaderboard</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcra_jv5YCcIPaZN4OAL6ThwcOurcGvtnhhjeEIiaKNd8VobkSgc8XsKMYS8K0Q1NzLi4UbvZXpR50u3cfVy_qCND4h4GABmkBbe75sfqU1KnHmNDqXGxR2XeTdRCoqi5sBoNvK?key=vox0MMs9NtX7u4rAwVZjGEXL"/><div class="image__source"><span class="image__source_text"><p>Hermes 3 70B is able to perform well on various reasoning benchmarks. <a class="link" href="https://nousresearch.com/introducing-the-forge-reasoning-api-beta-and-nous-chat-an-evolution-in-llm-inference/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Nous Research is launching<a class="link" href="https://nousresearch.com/introducing-the-forge-reasoning-api-beta-and-nous-chat-an-evolution-in-llm-inference/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow"> two new projects</a>: the Forge Reasoning API Beta and Nous Chat. 
Nous Chat is a simple chat platform featuring the<b> Hermes 3 70B language model, </b>which is available for free.</p><p class="paragraph" style="text-align:left;">The Forge Reasoning API Beta is being released to a select group of users, focusing on testing the architecture of their reasoning system. It allows users to enhance any popular model with a code interpreter and<b> advanced reasoning capabilities.</b></p><p class="paragraph" style="text-align:left;">Moreover, the API<b> supports multiple models </b>(Hermes 3, Claude Sonnet 3.5, Gemini, GPT-4) and allows users to combine models for enhanced output diversity.</p><p class="paragraph" style="text-align:left;">Here’s what’s involved in terms of reasoning layer architectures:</p><ul><li><p class="paragraph" style="text-align:left;"><b>MCTS (Monte Carlo Tree Search)</b>: Iteratively builds decision trees for planning problems through selection, expansion, simulation, and backpropagation. </p></li><li><p class="paragraph" style="text-align:left;"><b>CoC (Chain of Code): </b>Connects reasoning steps to a code interpreter, improving code and math capabilities. </p></li><li><p class="paragraph" style="text-align:left;"><b>MoA (Mixture of Agents):</b> Allows multiple models to collaborate on a query, providing more complete and diverse outputs.</p></li></ul><p class="paragraph" style="text-align:left;">Nous Research claims Hermes 70B augmented with Forge <b>outperforms</b> larger models from Google, OpenAI, and Anthropic in reasoning benchmarks. Specifically, they mentioned superior performance in the AIME evaluation, which focuses on competition-grade math questions.</p><p class="paragraph" style="text-align:left;">Ever wondered what model would take the first place spot for coding? 
<a class="link" href="https://blog.lmarena.ai/blog/2024/copilot-arena/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">LMArena</a> released a code completions leaderboard using data from the<b> previous month </b>to answer that question.</p><p class="paragraph" style="text-align:left;"><b>Nine popular models</b> were evaluated, including open-source, code-specific, and commercial models. Top performers were DeepSeek V2.5 and Claude Sonnet 3.5, with Elo ratings of 1074 and 1053, respectively. The evaluation process included randomization of model pairings and positions, and standardized parameters for fair comparison.</p><p class="paragraph" style="text-align:left;">Moreover, a free AI coding assistant called Copilot Arena was launched recently, providing paired responses from different state-of-the-art LLMs. It has been downloaded 2,500 times on the VSCode Marketplace, served over 100,000 completions, and accumulated over 10,000 code completion battles. The tool offers paired <b>code completions </b>and inline editing features.</p><p class="paragraph" style="text-align:left;">A novel prompting technique was also developed to enable chat models to perform code completions, especially for &quot;fill-in-the-middle&quot; (FiM) tasks. The method involves<b> generating code snippets </b>and post-processing them, rather than forcing models to output in FiM format directly. 
This approach drastically reduced formatting errors across various models.</p><h3 class="heading" style="text-align:left;" id="tackling-climate-protein-and-math-c">Tackling Climate, Protein and Math Challenges</h3><p class="paragraph" style="text-align:left;">Google continues to work on unique AI applications, previously introducing a <a class="link" href="https://genai360.beehiiv.com/p/of-whales-and-strawberries?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">whale bioacoustics model.</a> Now, they’ve developed a new <a class="link" href="https://research.google/blog/a-flood-forecasting-ai-model-trained-and-evaluated-globally/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">flood forecasting model </a>with improved reliability and coverage. The model provides a <b>7-day lead time reliability</b> comparable to current best available nowcasts. It expands coverage to 100 countries with verified data and up to 150 countries with data based on virtual gauges.</p><p class="paragraph" style="text-align:left;"><br>The improved model quality and new evaluation approach have expanded coverage. Google Flood Hub now reaches users in <b>over 100 countries</b>, up from 80. This expansion enables Google to provide critical flood information to 700 million people worldwide, up from 460 million previously. </p><p class="paragraph" style="text-align:left;">The model incorporates DeepMind&#39;s medium-range global weather forecasting model as input. Training data has been<b> increased </b>from 5,680 gauges to nearly 16,000 gauges. 
The LSTM-based architecture has been improved to better combine different weather products and increase robustness to missing data.</p><p class="paragraph" style="text-align:left;">We saw an open-source implementation of AlphaFold 3 called <a class="link" href="https://genai360.beehiiv.com/p/of-llamas-and-proteins?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">LIGO</a> a couple of months back. Google DeepMind has now released the software code for <a class="link" href="https://github.com/google-deepmind/alphafold3" target="_blank" rel="noopener noreferrer nofollow">AlphaFold 3</a>, allowing non-commercial use. This comes <b>six months</b> after initially withholding the code, which drew criticism from scientists (as you’d expect).</p><p class="paragraph" style="text-align:left;">AlphaFold 3 can model <b>proteins interacting with other molecules</b>, including potential drugs. The code is now downloadable, but model weights are only available to academics upon request. Several companies have developed AlphaFold 3-inspired models, including Baidu, ByteDance, and Chai Discovery. 
</p><p class="paragraph" style="text-align:left;">AI in math also saw advancements with <a class="link" href="https://x.com/EpochAIResearch/status/1854993676524831046?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">FrontierMath</a> - a new benchmark designed to test the limits of AI systems in <b>mathematical reasoning.</b> The benchmark aims to capture a snapshot of contemporary math and evaluate AI&#39;s progress towards innovative thinking needed for scientific research.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcadX077wxykYYSugzX3xnXXV29hqtOHhy1NmX9sXnwJGVDTvIk9T5X_rN5zrtmM5OUuELkf3chSz8LzRvxfxSizSnsQd10Br1f-ZuhUYS-hUOF1S5ZlTpQx2VxQGi0RSqHXXWwPQ?key=vox0MMs9NtX7u4rAwVZjGEXL"/><div class="image__source"><span class="image__source_text"><p>FrontierMath deals with the saturation issue that other benchmarks face. 
<a class="link" href="https://x.com/EpochAIResearch/status/1854993680115155281/photo/1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">A quick rundown of the new benchmark’s key features:</p><ul><li><p class="paragraph" style="text-align:left;">All problems are new and unpublished to prevent data contamination</p></li><li><p class="paragraph" style="text-align:left;">Solutions are automatically verifiable, enabling efficient evaluation</p></li><li><p class="paragraph" style="text-align:left;">Problems are &quot;guessproof&quot; with a low chance of solving without proper reasoning</p></li><li><p class="paragraph" style="text-align:left;">Each problem demands hours of work from expert mathematicians</p></li></ul><p class="paragraph" style="text-align:left;"><br>The benchmark has even been validated by <b>Fields Medalists</b> like Terence Tao (considered the best mathematician in the world), Timothy Gowers, and Richard Borcherds.</p><h3 class="heading" style="text-align:left;" id="qwen-25-matches-gpt-4-as-x-expands-">Qwen 2.5 Matches GPT-4 as X Expands Grok Access </h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcY3nYHSxZCi-gQr2dgZsRcGh4iBqCgqfOFWjRwlb-tTxsFqPbVxN_878LC7FVlHxU0yRLiNl32K1YiS5rOU9i132Qb09UtBxKhGKnHK5OiAupUjniguwUHOMt3gjcQQNLOuzZ5?key=vox0MMs9NtX7u4rAwVZjGEXL"/><div class="image__source"><span class="image__source_text"><p>Qwen2.5 showed promising results on various coding benchmarks. 
<a class="link" href="https://qwenlm.github.io/blog/qwen2.5-coder-family/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://qwenlm.github.io/blog/qwen2.5-coder-family/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">Qwen2.5-Coder-32B-Instruct</a> caught us off-guard last week, with claims that it is the current SOTA open-source code model, matching GPT-4o&#39;s <b>coding capabilities</b>. It excels in code generation, repair, and reasoning across multiple programming languages.</p><p class="paragraph" style="text-align:left;">The model scored 73.7 on the Aider code repair benchmark, comparable to GPT-4o. It performs well across <b>40+ programming languages</b>, scoring 65.9 on McEval and 75.2 on MdEval.</p><p class="paragraph" style="text-align:left;">The series includes six model sizes: 0.5B, 1.5B, 3B, 7B, 14B, and 32B. Moreover, both Base and Instruct versions are available for each size. As you’d expect, performance scales positively with model size across <b>various benchmarks</b>. The series also outperforms other open-source models across all sizes on core datasets.</p><p class="paragraph" style="text-align:left;">In other news, X began to test a <a class="link" href="https://techcrunch.com/2024/11/10/x-is-testing-a-free-version-of-ai-chatbot-grok/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">free version of Grok</a>. 
Previously, Grok was limited to <b>premium</b>, paying users of the platform.</p><p class="paragraph" style="text-align:left;">Sounds great, but there are some <b>restrictions</b> that free users will need to keep in mind:</p><ul><li><p class="paragraph" style="text-align:left;">10 queries per two hours with the Grok-2 model</p></li><li><p class="paragraph" style="text-align:left;">20 queries per two hours with the Grok-2 mini model</p></li><li><p class="paragraph" style="text-align:left;">3 image analysis questions per day</p></li></ul><p class="paragraph" style="text-align:left;">xAI launched Grok-2 in August with <b>image generation capabilities</b>. The model recently gained the ability to understand images. These features were previously exclusive to Premium and Premium+ users, but are now open to free users as well.</p><h3 class="heading" style="text-align:left;" id="deep-l-voice-translates-as-bria-rmb">DeepL Voice Translates as BRIA RMBG 2.0 Removes Backgrounds</h3><p class="paragraph" style="text-align:left;">DeepL, a German startup valued at $2 billion, has launched <a class="link" href="https://techcrunch.com/2024/11/13/deepl-launches-deepl-voice-real-time-text-based-translations-from-voices-and-videos/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">DeepL Voice</a>, a real-time audio translation service. The service can &quot;hear&quot; 13 languages and provide translated captions in 33 languages supported by DeepL Translator. Currently, DeepL Voice outputs <b>translations</b> as text, not audio, focusing on live conversations and video conferencing.</p><p class="paragraph" style="text-align:left;">Translations can appear as &quot;mirrors&quot; on smartphones for in-person meetings or as side-by-side transcriptions. 
For video conferencing, translations appear as subtitles, with Microsoft Teams being the only integrated platform so far. There&#39;s <b>no API</b> for the voice product yet, as DeepL is working directly with partners and customers.</p><p class="paragraph" style="text-align:left;">In other news, BRIA AI released a new background removal model called <a class="link" href="https://huggingface.co/briaai/RMBG-2.0?utm_campaign=RMBG%202.0&utm_source=linkedin&utm_medium=social&utm_content=Hugging%20Face%20RMBG2.0" target="_blank" rel="noopener noreferrer nofollow">RMBG v2.0</a>. It&#39;s designed for separating foreground from background across various image categories. The model is trained on a <b>diverse dataset</b> including stock images, e-commerce, gaming, and advertising content.</p><p class="paragraph" style="text-align:left;">It aims to rival leading source-available models in accuracy, efficiency, and versatility. The model is particularly suitable for <b>enterprise-scale content creation</b> where content safety, legal compliance, and bias mitigation are crucial.</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">Some interesting papers came to light last week, ranging from the CDXFormer model for remote sensing change detection to the critical evaluation of domain-adaptive pretraining for medical applications. Meta FAIR also released a paper looking at the issue of <b>localized image watermarking.</b></p><h3 class="heading" style="text-align:left;" id="cdx-former-a-new-approach-to-remote">CDXFormer: A New Approach to Remote Sensing Change Detection</h3><p class="paragraph" style="text-align:left;">Researchers from Zhejiang University tackled the critical challenge of effectively identifying changes in <b>remote sensing images</b> across complex and varied environmental conditions. 
They recognized that existing methods such as Convolutional Neural Networks, Transformers, and Mamba-based models struggle to balance performance and computational efficiency when analyzing spatial-temporal contexts.</p><p class="paragraph" style="text-align:left;">To address these limitations, they developed <a class="link" href="https://arxiv.org/pdf/2411.07863v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">CDXFormer,</a> a new approach that uses Extended Long Short-Term Memory (XLSTM) technology. Their method introduces a scale-specific feature enhancement layer with <b>two key components</b>: a Cross-Temporal Global Perceptron for semantic-accurate deep features and a Cross-Temporal Spatial Refiner for detail-rich shallow features. </p><p class="paragraph" style="text-align:left;">Additionally, they implemented a Cross-Scale Interactive Fusion module to progressively integrate spatial information and global semantics. It achieved <b>SOTA performance</b> across three benchmark datasets, improving F1 scores by 0.22%, 1.08%, and 7.46% compared to previous top-performing methods.</p><p class="paragraph" style="text-align:left;">Crucially, the model <b>maintained high efficiency,</b> using only 16.19 million parameters and 3.92 GFLOPs, significantly lower than competing approaches. </p><h3 class="heading" style="text-align:left;" id="medical-models-vs-general-domain-ll">Medical Models vs. 
General-Domain LLMs: New Insights into AI&#39;s Effectiveness in Healthcare</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdBx9ZJY9GBsqDorUzwBJCPy4sQBPnk-UQCkgSYjhg2A3QMNgFvOCqnGZanou4MtQMk_8LyhcWzguHh-3absHLaS_r1v_Gu2QFJEerdgTB9d3AeUMES6AxjoA2u9KDTQaKwXTa0TA?key=vox0MMs9NtX7u4rAwVZjGEXL"/><div class="image__source"><span class="image__source_text"><p>Overview of evaluation approach. <a class="link" href="https://arxiv.org/pdf/2411.04118?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">A recent study from Carnegie Mellon University and Mistral AI looks at the effectiveness of domain-adaptive pretraining (DAPT) for <a class="link" href="https://arxiv.org/pdf/2411.04118?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">medical applications</a> of LLMs and VLMs. They aimed to address the prevailing assumption that DAPT consistently enhances performance on medical tasks, particularly in <b>answering medical licensing</b> exam questions.</p><p class="paragraph" style="text-align:left;">To investigate this, they conducted a <b>comprehensive evaluation </b>comparing seven medical LLMs and two medical VLMs against their corresponding general-domain base models. </p><p class="paragraph" style="text-align:left;">They optimized prompts for each model independently and accounted for statistical uncertainty in their comparisons. The results revealed a <b>surprising trend:</b> most medical models didn’t consistently outperform their general-domain counterparts. 
In fact, medical LLMs only surpassed base models in 12.1% of cases, with ties in 49.8% and underperformance in 38.2%.</p><p class="paragraph" style="text-align:left;">These findings suggest SOTA general-domain models may already <b>possess strong medical knowledge</b> and reasoning capabilities when prompted appropriately. </p><h3 class="heading" style="text-align:left;" id="mit-study-provides-causal-evidence-"><span style="color:rgb(67, 67, 67);">MIT Study Provides Causal Evidence of AI&#39;s Impact on Scientific Discovery and Innovation</span></h3><p class="paragraph" style="text-align:left;">A study from <a class="link" href="https://aidantr.github.io/files/AI_innovation.pdf?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">MIT</a> provides the first causal evidence of AI&#39;s impact on real-world R&D, showing significant boosts to scientific discovery and product innovation. The research <b>exploits the randomized introduction </b>of an AI tool for materials discovery to over 1,000 scientists in a large U.S. firm&#39;s R&D lab.</p><p class="paragraph" style="text-align:left;">The study addresses key questions about AI&#39;s role in innovation, including its effects on the pace and direction of scientific breakthroughs, as well as its impact on scientists themselves. To investigate these issues, the researchers <b>analyzed detailed data</b> on each stage of R&D, from initial material discovery to patent filings and product prototypes.</p><p class="paragraph" style="text-align:left;">Key findings revealed AI-assisted researchers <b>discover 44% more materials</b>, leading to a 39% increase in patent filings and a 17% rise in downstream product innovation. </p><p class="paragraph" style="text-align:left;">Interestingly, the technology had strikingly <b>disparate effects</b> across researchers. 
While top scientists nearly doubled their output, the bottom third saw little benefit. The study found AI automated 57% of &quot;idea-generation&quot; tasks, shifting researchers&#39; focus to evaluating AI-suggested materials. Top scientists leveraged their domain knowledge to prioritize promising AI suggestions more effectively.</p><p class="paragraph" style="text-align:left;">It was reported that 82% of scientists experienced reduced job satisfaction due to decreased creativity and skill underutilization, so there are definitely challenges that still need to be <b>addressed in workforce adaptation to AI</b>.</p><h3 class="heading" style="text-align:left;" id="meta-fair-introduces-wam-redefining">Meta FAIR Introduces WAM: Redefining Watermarking as a Segmentation Task</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/abs/2411.07231?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">Meta FAIR’s paper</a> addresses the challenge of localized image watermarking, which traditional methods struggle to handle effectively. They aimed to solve the problem of watermarking specific areas within an image, allowing for multiple distinct watermarks and <b>improved robustness</b> against image manipulations like splicing and inpainting.</p><p class="paragraph" style="text-align:left;">To tackle this issue, the researchers introduce the Watermark Anything Model (WAM), which redefines watermarking as a segmentation task. WAM consists of an <b>embedder</b> that imperceptibly modifies the input image and an extractor that segments the received image into watermarked and non-watermarked areas while recovering hidden messages. 
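</p><p class="paragraph" style="text-align:left;">To make the &quot;watermarking as segmentation&quot; framing concrete, here is a toy NumPy sketch of the extraction side only: threshold per-pixel detection scores into a mask, then soft-majority-vote the message bits over that mask. All shapes, names, and numbers below are illustrative, not WAM&#39;s actual architecture:</p>

```python
import numpy as np

def extract_watermark(mask_logits, bit_logits, threshold=0.0):
    """Toy segmentation-style watermark extraction.

    mask_logits: (H, W) scores for "this pixel is watermarked".
    bit_logits:  (32, H, W) per-pixel scores for each message bit.
    Returns a boolean mask and the 32-bit message recovered by
    averaging bit scores over the detected region.
    """
    mask = mask_logits > threshold  # segment watermarked vs. clean pixels
    if not mask.any():
        return mask, None           # no watermark detected
    votes = bit_logits[:, mask].mean(axis=1)  # soft majority vote per bit
    message = (votes > 0).astype(int)
    return mask, message

# Toy example: a 4x4 watermarked patch inside an 8x8 image.
rng = np.random.default_rng(0)
true_bits = rng.integers(0, 2, size=32)
mask_logits = np.full((8, 8), -5.0)
mask_logits[:4, :4] = 5.0
bit_logits = np.zeros((32, 8, 8))
bit_logits[:, :4, :4] = (2 * true_bits - 1)[:, None, None]  # +1 / -1 per bit

mask, message = extract_watermark(mask_logits, bit_logits)
```

<p class="paragraph" style="text-align:left;">In the real system, a trained extractor network produces these per-pixel scores; the sketch only shows how segmentation plus voting turns them into localized messages.</p><p class="paragraph" style="text-align:left;">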
</p><p class="paragraph" style="text-align:left;">The model employs a two-stage training process, first focusing on robustness at low resolution without <b>perceptual constraints</b>, then fine-tuning for imperceptibility and multiple watermark handling.</p><p class="paragraph" style="text-align:left;">They used <b>deep learning techniques,</b> including a graph neural network architecture for the embedder and a vision transformer-based extractor. They incorporated a Just-Noticeable-Difference (JND) map to modulate watermark intensity and improve imperceptibility. </p><p class="paragraph" style="text-align:left;">The model demonstrates the ability to locate watermarked areas in spliced images and extract distinct <b>32-bit messages</b> from multiple small regions, even when they occupy as little as 10% of the image surface. Notably, WAM achieves over 85% mIoU in detecting watermarked areas and over 95% bit accuracy when hiding five 32-bit messages in regions covering 10% of an image, even after manipulations like horizontal flipping and contrast adjustment.</p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">Anthropic’s CEO had a 5-hour discussion with Lex Fridman last week and gave us some interesting insights into the <b>scaling hypothesis</b> to think about. 
Meanwhile, another conversation about the evolution of LLMs by Andrew Ng also caught our attention.</p><h3 class="heading" style="text-align:left;" id="evolution-of-ll-ms-and-agentic-work">Evolution of LLMs and Agentic Workflows</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://x.com/AndrewYNg/status/1857117382378164267?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">Andrew Ng</a> shared insights on the evolving landscape of LLMs and their increasing optimization for <b>agentic workflows</b>. Ng highlights a significant shift in LLM development, moving beyond consumer-facing question-answering to more complex, iterative processes that enable AI agents to perform sophisticated tasks.</p><p class="paragraph" style="text-align:left;">Ng observes that while LLMs have been primarily tuned for direct <b>human interaction</b>, there&#39;s a growing trend towards optimizing them for agentic behaviors. This includes capabilities like tool use, function calling, and even computer operation, as demonstrated by Anthropic&#39;s recent release. 
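</p><p class="paragraph" style="text-align:left;">A minimal sketch of the agentic loop Ng describes, with a hard-coded stand-in for a real function-calling LLM (all names here are hypothetical):</p>

```python
# Sketch of a tool-use loop: the model plans tool calls, the runtime
# executes them. Real providers expose this via function-calling APIs;
# the "LLM" below is a hard-coded stand-in for illustration only.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_llm_plan(task):
    """Stand-in for an LLM that maps a task to a list of tool calls."""
    if task == "compute 2 + 3":
        return [("add", (2, 3))]
    return [("upper", (task,))]

def run_agent(task):
    # 1) ask the model for a plan, 2) execute each tool call;
    # a real agent would feed results back to the model for further steps.
    return [TOOLS[name](*args) for name, args in fake_llm_plan(task)]

print(run_agent("compute 2 + 3"))  # [5]
```

<p class="paragraph" style="text-align:left;">The iterative feed-results-back step, omitted here, is exactly what the optimizations Ng describes are meant to speed up.</p><p class="paragraph" style="text-align:left;">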
He emphasizes the potential of these advancements to dramatically boost agentic performance in AI applications.</p><p class="paragraph" style="text-align:left;">The conversation outlines a <b>three-stage progression</b> in agentic LLM development:</p><ul><li><p class="paragraph" style="text-align:left;">Prompting existing LLMs for agentic behaviors</p></li><li><p class="paragraph" style="text-align:left;">Fine-tuning models for specific, high-value applications</p></li><li><p class="paragraph" style="text-align:left;">Major LLM providers integrating agentic capabilities directly into their models</p></li></ul><p class="paragraph" style="text-align:left;">Ng predicts that this trend will lead to<b> significant performance gains</b> in AI agents over the next few years, opening up new possibilities for complex, multi-step AI workflows. </p><h3 class="heading" style="text-align:left;" id="dario-amodei-discusses-scaling-hypo">Dario Amodei Discusses Scaling Hypothesis and Responsible AI Development</h3><p class="paragraph" style="text-align:left;">In a conversation on the Lex Fridman Podcast, <a class="link" href="https://www.shortform.com/podcast/episode/lex-fridman-podcast-2024-11-11-episode-summary-452-dario-amodei-anthropic-ceo-on-claude-agi-the-future-of-ai-humanity?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">Dario Amodei,</a> CEO of Anthropic, offered valuable insights into the state of AI and his company&#39;s approach to <b>responsible AI development.</b></p><p class="paragraph" style="text-align:left;">Amodei delved into the <b>scaling hypothesis</b>, which suggests that increasing the size and computational power of neural network models leads to significant capability growth across diverse tasks. 
He highlighted how larger models like GPT and CLIP have shown dramatic improvements when scaled up with more data and compute power. </p><p class="paragraph" style="text-align:left;">Amodei believes AI systems matching or exceeding human abilities across domains could be achievable as soon as <b>2026 or 2027</b>, though he acknowledges uncertainties remain.</p><p class="paragraph" style="text-align:left;">A key focus of the discussion was Anthropic&#39;s Responsible Scaling Plan (RSP), aimed at mitigating potential risks as AI becomes<b> more powerful</b>. This plan involves testing models for autonomous behavior and potential misuse, with escalating safety precautions as capabilities increase. Amodei emphasized the necessity of regulation in AI to address risks like malicious use and loss of human control.</p><p class="paragraph" style="text-align:left;">Amodei shared insights into Anthropic&#39;s work on the Claude AI model as well, which is being developed with human values in mind through <b>iterative testing and refinement. 
</b></p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention in the last week include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2411.08804?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">FinRobot</a>: Framework for equity research that integrates quantitative and qualitative analysis through a multi-agent Chain of Thought system</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2411.08599?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">XiYan-SQL</a>: Natural language to SQL framework that improves query generation quality and accuracy</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.linkedin.com/posts/sahar-mor_ive-open-sourced-a-key-component-of-one-activity-7262486932645916672-rwpH/?utm_source=share&utm_medium=member_android" target="_blank" rel="noopener noreferrer nofollow">VoiceLab</a>: Open-source framework for testing and optimizing voice agents, providing tools for custom metrics, model migration, and prompt testing. </p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email saying hi :)</p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">Writer saw a successful Series C funding round for <b>$200 million</b>, while two key acquisitions took place last week: Anysphere acquired Supermaven and Red Hat acquired Neural Magic. 
Note that the sums for these acquisitions weren’t revealed.</p><h3 class="heading" style="text-align:left;" id="writer-raises-200-million-in-series">Writer Raises $200 Million in Series C Funding Round</h3><p class="paragraph" style="text-align:left;">Writer, an <b>enterprise generative AI platform</b>, raised <a class="link" href="https://techcrunch.com/2024/11/12/generative-ai-startup-writer-raises-200m-at-a-1-9b-valuation/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">$200 million</a> in a Series C round at a $1.9 billion valuation. The funding will support product development, including AI agents, customizable guardrails, and no-code tools, cementing its position in enterprise AI. Writer&#39;s Palmyra models, tailored for business needs, have attracted clients like Salesforce, Uber, and Intuit, highlighting its success amidst intense competition.</p><h3 class="heading" style="text-align:left;" id="anysphere-acquires-supermaven">Anysphere Acquires Supermaven</h3><p class="paragraph" style="text-align:left;">Anysphere, maker of the AI-powered code editor Cursor, has acquired AI coding assistant <a class="link" href="https://techcrunch.com/2024/11/12/anysphere-acquires-supermaven-to-beef-up-cursor/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">Supermaven</a> for an undisclosed sum. Supermaven&#39;s technology, including its low-latency AI model Babble, will enhance Cursor&#39;s upcoming Tab AI version for context-aware, intelligent coding, <b>especially for long sequences</b>. </p><p class="paragraph" style="text-align:left;">The merger aims to combine <b>advanced model capabilities</b> with a seamless editor UI, accelerating product development and maintaining Supermaven’s plugins. 
</p><h3 class="heading" style="text-align:left;" id="red-hat-acquires-neural-magic">Red Hat Acquires Neural Magic</h3><p class="paragraph" style="text-align:left;">Red Hat has <a class="link" href="https://techcrunch.com/2024/11/12/red-hat-acquires-ai-optimization-startup-neural-magic/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-new-ai-agent-forge-api-launches-frontiermath-benchmark" target="_blank" rel="noopener noreferrer nofollow">acquired Neural Magic</a> (also for an undisclosed sum), a startup focused on optimizing AI models to run efficiently on <b>standard processors</b> and GPUs, enhancing hybrid cloud AI performance. Neural Magic, founded in 2018, offers tools like vLLM for model serving, which Red Hat will integrate into its platforms like OpenShift AI and Red Hat Enterprise Linux AI. </p><p class="paragraph" style="text-align:left;">This acquisition aligns with Red Hat&#39;s goal to expand its<b> AI capabilities in flexible, open-source environments.</b></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=efc5cb8c-1a16-4b36-8c5f-77a6ec8c8718&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>How to Leverage All of the World&#39;s Research for Scientific Discovery with GenAI</title>
  <description>A leading MedTech company aimed to speed up its manual research process for hundreds of scientists while ensuring top accuracy. With Activeloop&#39;s sub-second, multi-modal AI search, they connected all of the world&#39;s research papers to LLMs, cutting research time from weeks to days, advancing medical device development and patient outcomes. Learn how.</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/812f784f-7719-44d1-988a-ab46cc73d236/image_63347755.png" length="443908" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/scientific-research-medtech</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/scientific-research-medtech</guid>
  <pubDate>Wed, 13 Nov 2024 22:48:06 +0000</pubDate>
  <atom:published>2024-11-13T22:48:06Z</atom:published>
    <category><![CDATA[Activeloop]]></category>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><h2 class="heading" style="text-align:left;" id="accelerating-medical-research-with-"><b>Accelerating Medical Research with AI: A MedTech Company’s Story</b></h2><p class="paragraph" style="text-align:left;">Imagine doing scientific research at scale, having to sift through millions of research articles across PubMed, internal research notes, and patient data including MRIs, CT scans, and more. </p><p class="paragraph" style="text-align:left;">The manual search process is slow, and cross-referencing different, rapidly evolving data sources is prone to errors, ultimately delaying progress.</p><p class="paragraph" style="text-align:left;">A leading MedTech company tackled this challenge in collaboration with <a class="link" href="https://www.activeloop.ai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=how-to-leverage-all-of-the-world-s-research-for-scientific-discovery-with-genai" target="_blank" rel="noopener noreferrer nofollow">Activeloop</a> and Intel. Here&#39;s what they were able to achieve (watch a 1.5 min demo):</p><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="true" class="youtube_embed" frameborder="0" height="100%" src="https://youtube.com/embed/M-PtpeSRVY8" width="100%"></iframe><p class="paragraph" style="text-align:left;">The goal was to connect internal data (patient reports, MRIs, CT scans, research notes) and external research (e.g., PubMed articles) to speed up scientific research and obtain fast, accurate answers to complex questions involving multi-modal data from different sources.</p><p class="paragraph" style="text-align:left;">Deep Lake, combined with Intel’s 5th Gen Xeon® processors, provided an efficient way to search, connect research data, and integrate any data type with Large Language Models (LLMs) to enhance insights. 
Tasks that once took weeks could now be completed in days or even hours.</p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://activeloop.ai/usecase/medtech/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=how-to-leverage-all-of-the-world-s-research-for-scientific-discovery-with-genai"><span class="button__text" style="color:#FFFFFF;"> Read online </span></a></div><h2 class="heading" style="text-align:left;" id="the-challenge"><b>The Challenge</b></h2><p class="paragraph" style="text-align:left;">Researchers faced three key hurdles:</p><ul><li><p class="paragraph" style="text-align:left;">Efficiently searching over 40 million documents and multi-modal data.</p></li><li><p class="paragraph" style="text-align:left;">Keyword searches that missed crucial connections.</p></li><li><p class="paragraph" style="text-align:left;">Manual cross-referencing prone to errors.</p></li></ul><h2 class="heading" style="text-align:left;" id="the-solution"><b>The Solution</b></h2><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/812f784f-7719-44d1-988a-ab46cc73d236/image_63347755.png?t=1731533681"/><div class="image__source"><span class="image__source_text"><p>Deep Lake Research UI across radiological data and PubMed articles.</p></span></div></div><p class="paragraph" style="text-align:left;">The solution needed to handle diverse data, deliver quick responses, and support a growing research team. Deep Lake by Activeloop enabled AI-powered, accurate search across multi-modal data. 
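</p><p class="paragraph" style="text-align:left;">Under the hood, this kind of semantic search boils down to embedding everything into vectors and ranking by cosine similarity. A minimal NumPy sketch of the retrieval step (illustrative only, not Deep Lake&#39;s actual API):</p>

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=3):
    """Return indices of the k document embeddings most similar to the query.

    query_vec:  (d,) embedding of the researcher's question.
    doc_matrix: (n, d) precomputed embeddings of papers, notes, scans, etc.
    """
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = docs @ q                 # cosine similarity per document
    return np.argsort(-sims)[:k]    # indices of best matches, best first

# Toy corpus: 1,000 random 64-dim embeddings; the query is a slightly
# perturbed copy of document 42, so it should rank first.
rng = np.random.default_rng(1)
docs = rng.normal(size=(1000, 64))
query = docs[42] + 0.01 * rng.normal(size=64)

print(cosine_top_k(query, docs)[0])  # 42
```

<p class="paragraph" style="text-align:left;">Production systems layer approximate nearest-neighbor indexes and embedding quantization on top of this basic ranking to reach sub-second latency at scale.</p><p class="paragraph" style="text-align:left;">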
Intel’s hardware ensured high speeds for data ingestion, embedding quantization, and querying.</p><p class="paragraph" style="text-align:left;">Researchers could now use a conversational AI assistant that analyzed queries, connected diverse data to LLMs, and delivered precise answers with citations across relevant articles; it could even correlate research findings to patient data.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/b3e6cf40-e030-4509-83b8-0824d09c3bf9/image_63347378.png?t=1731534153"/><div class="image__source"><span class="image__source_text"><p>Cross-referencing data across multiple datatypes and clouds, instantly</p></span></div></div><p class="paragraph" style="text-align:left;"><b>Solution Benefits</b></p><ul><li><p class="paragraph" style="text-align:left;"><b>Faster Insights</b>: Projects that previously took months now took days, accelerating the research process.</p></li><li><p class="paragraph" style="text-align:left;"><b>Improved Accuracy</b>: Researchers could quickly find connections across clinical trials and device blueprints, uncovering insights that were previously missed, with an average 7% improvement in search accuracy.</p></li><li><p class="paragraph" style="text-align:left;"><b>Enhanced Efficiency</b>: The AI assistant streamlined workflows, reducing manual cross-referencing and minimizing errors, while delivering fast AI searches and enabling seamless connection of any data to LLMs for enhanced analysis.</p></li></ul><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/db05a172-325b-4aa4-ba23-28c950071222/IMG.png?t=1731533545"/><div class="image__source"><span class="image__source_text"><p>Multi-Layered Solution for Scientific Discovery with GenAI</p></span></div></div><p 
class="paragraph" style="text-align:left;"><b>Results Achieved with Intel</b></p><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5e5750de-3fe1-47b2-b708-aab4a5da01e7/image.png?t=1725528376"/></div><p class="paragraph" style="text-align:left;">The impact included:</p><ul><li><p class="paragraph" style="text-align:left;"><b>Search times reduced</b> to 0.0243 seconds, achieved using 5th Gen Intel® Xeon® processors for real-time embedding inference.</p></li><li><p class="paragraph" style="text-align:left;"><b>65% faster</b> literature ingestion, made possible by Intel® Gaudi® 2 accelerators for improved batch processing.</p></li><li><p class="paragraph" style="text-align:left;"><b>Up to 4x faster</b> streaming computations, utilizing Intel® oneAPI Math Kernel Library (oneMKL) for enhanced cosine similarity computations.</p></li></ul><p class="paragraph" style="text-align:left;">Working on a similar research process at your company? 
Chat to our GenAI experts today to learn how you can leverage all your data for GenAI-powered search.</p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://www.activeloop.ai/contact/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=how-to-leverage-all-of-the-world-s-research-for-scientific-discovery-with-genai"><span class="button__text" style=""> Book a call </span></a></div><p class="paragraph" style="text-align:left;">Mikayel on behalf of Activeloop</p><p class="paragraph" style="text-align:left;"><span style="font-family:Apple Color Emoji, Segoe UI Emoji, NotoColorEmoji, Noto Color Emoji, Segoe UI Symbol, Android Emoji, EmojiSymbols;font-size:0.6rem;">©</span><span style="font-size:0.6rem;"> Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.</span></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=eb2169b9-f4c8-419d-8725-92036f3a69b5&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>RetrieveX is Here: Important Information and Agenda for The Day</title>
  <description>Doors Open at 10:30, Keynote from Meta Chameleon&#39;s Creator at 11. Be on time :)</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/18ea3c8b-ff75-495a-b090-399c675e908e/RetrieveX_-_Top_Speakers-1200x630-px.png" length="229912" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/retrievex-is-here-important-information-and-agenda-for-the-day</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/retrievex-is-here-important-information-and-agenda-for-the-day</guid>
  <pubDate>Thu, 17 Oct 2024 13:30:00 +0000</pubDate>
  <atom:published>2024-10-17T13:30:00Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">Dear {{ first name | Data Leader }},</p><p class="paragraph" style="text-align:left;">In a few hours, my team and I will welcome you at RetrieveX - the first industry event focused entirely on Retrieval for AI!</p><p class="paragraph" style="text-align:left;">Get ready to join us and over 200 data leaders on October 17th at 10:30 AM at The Midway in San Francisco! We can&#39;t wait to see you there! You can totally bring your laptop to follow along with some of the workshops on our agenda!</p><h3 class="heading" style="text-align:left;" id="how-to-travel-to-the-venue"><b>How to Travel to the Venue:</b></h3><p class="paragraph" style="text-align:left;"><b>The Address: </b><a class="link" href="https://maps.app.goo.gl/JV9DJhE3qKonYAAs8?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=retrievex-is-here-important-information-and-agenda-for-the-day" target="_blank" rel="noopener noreferrer nofollow">900 Marin St, San Francisco, CA 94124</a></p><p class="paragraph" style="text-align:left;"><b>The Parking: </b>While we&#39;re not providing parking, there&#39;s a lot of free parking space available within easy walking distance of the venue. 
</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/e9537817-5667-4e85-a103-5b58ca8b610e/image.png?t=1729159373"/></div><p class="paragraph" style="text-align:left;"><b>Closest light rail stop: </b>3rd St & Marin St</p><p class="paragraph" style="text-align:left;"></p><div class="section" style="background-color:#ff8a00;margin:0.0px 0.0px 0.0px 0.0px;padding:0.0px 0.0px 0.0px 0.0px;"><div class="blockquote"><blockquote class="blockquote__quote"></blockquote></div></div><h3 class="heading" style="text-align:left;" id="at-the-venue"><b>At the Venue:</b></h3><ol start="1"><li><p class="paragraph" style="text-align:left;"><b>Doors open at 10:30</b> - Arrive.</p></li><li><p class="paragraph" style="text-align:left;"><b>Get checked in</b>. Download the Eventify app (see below) for an ultra-fast check-in process. </p></li><li><p class="paragraph" style="text-align:left;">Grab your badge, <b>coffee, and some light snacks</b>. 
</p></li><li><p class="paragraph" style="text-align:left;">Grab a spot and post that you&#39;re attending with the <b>Event Hashtag:</b> <b>#RetrieveX</b></p></li><li><p class="paragraph" style="text-align:left;">Conference Wifi: </p><ul><li><p class="paragraph" style="text-align:left;"><b>Name</b>: RetrieveX-Attendees </p></li><li><p class="paragraph" style="text-align:left;"><b>Pass:</b> pipinstalldeeplake</p></li></ul></li></ol><h3 class="heading" style="text-align:left;" id="the-agenda"><b>The Agenda:</b></h3><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/382ced14-f61e-4ac8-8ff0-1b7452dd3bf5/image.png?t=1729019295"/><div class="image__source"><span class="image__source_text"><p>Quick look at all the speakers</p></span></div></div><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5fb41f4e-233e-418c-a986-1651c9f23ba6/image.png?t=1729160460"/><div class="image__source"><span class="image__source_text"><p>Presenting today…</p></span></div></div><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/904e3f7c-5689-4c9e-9e4f-7635e3d206f1/image.png?t=1729160489"/><div class="image__source"><span class="image__source_text"><p>and also…</p></span></div></div><ul><li><p class="paragraph" style="text-align:left;"><b>11:00 AM:</b> <b>Opening Keynote</b> with Armen Aghajanyan, Multi-Modal AI @Meta AI</p></li><li><p class="paragraph" style="text-align:left;"><b>11:50 AM: Keynote </b>Davit Buniatyan, CEO Activeloop: AI Search on Data Lakes</p></li><li><p class="paragraph" style="text-align:left;"><b>2:40 PM:</b> Rob Ferguson, Head of AI, Microsoft for Startups on Leaders in Retrieval 
Augmentation</p></li><li><p class="paragraph" style="text-align:left;"><b>Afternoon: </b>Talks and workshops from creators of PyTorch, CAFFE, Albumentations, as well as CTOs from Cresta and Hercules, and Founders of Lepton AI, Omneky, and Generative Alpha - on everything from implementing custom GenAI in enterprises to scaling efficiently across marketing, legal, finance, and other industries.</p></li><li><p class="paragraph" style="text-align:left;"><b>5:00 PM:</b> Networking and a catered reception.</p></li></ul><p class="paragraph" style="text-align:left;">The full agenda for the day is posted around the venue, or available <a class="link" href="https://www.retrievex.co/agenda?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=retrievex-is-here-important-information-and-agenda-for-the-day" target="_blank" rel="noopener noreferrer nofollow">here</a>.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/570ce4f9-329d-42e2-895b-f9f6fcae3c9b/image.png?t=1729160422"/></div><h3 class="heading" style="text-align:left;" id="action-item-download-the-event-app-"><b>Action Item: Download the Event App for Faster Check-in and Networking</b></h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/59d5cefc-ad32-4d9b-b5dd-bfda1fb55a84/image.png?t=1728786902"/></div><p class="paragraph" style="text-align:left;">If you have any questions or need assistance, feel free to reply to this email. 
</p><p class="paragraph" style="text-align:left;">See you on October 17,<br>Mikayel on behalf of the RetrieveX Team</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=0d3887ef-1197-4f68-a54d-11efbc421566&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>OpenAI&#39;s Swarm, First Open-Source Multimodal MoE Model, TensorWave’s AMD Challenge</title>
  <description>Plus, OpenAI’s new benchmark could reshape AI agent development</description>
  <link>https://genai360.beehiiv.com/p/of-new-benchmarks</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/of-new-benchmarks</guid>
  <pubDate>Tue, 15 Oct 2024 15:57:50 +0000</pubDate>
  <atom:published>2024-10-15T15:57:50Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><div class="section" style="background-color:#ff8a00;border-color:#ff8a00;border-radius:2px;border-style:solid;border-width:2px;margin:10.0px 10.0px 10.0px 10.0px;padding:10.0px 10.0px 10.0px 10.0px;"><h2 class="heading" style="text-align:left;"><span style="color:rgb(255, 255, 255);">Last Chance: Claim Free Tickets for the In-Person RetrieveX Conference on Oct 17 in San Francisco. </span></h2><p class="paragraph" style="text-align:left;"><span style="color:rgb(255, 255, 255);">Come hear from the creators of PyTorch, Albumentations, Meta Chameleon, Kubeflow, and CAFFE, along with leaders from Microsoft, AWS, Bayer, Flagship Pioneering, Cresta, and Omneky, on how to build the best retrieval for AI. </span></p><p class="paragraph" style="text-align:left;"><span style="color:rgb(255, 255, 255);">If you&#39;re an executive who&#39;s considering or working on GenAI projects, fill in the form below for a complimentary ticket for the conference - hurry up because tickets are limited! 
</span><span style="color:rgb(255, 255, 255);"><b>Please note that the conference is in-person only</b></span><span style="color:rgb(255, 255, 255);">.</span></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#F9FAFB;" href="https://www.retrievex.co/application?utm_source=newsletter&utm_medium=email&utm_campaign=weekly2"><span class="button__text" style=""><span style="color:rgb(34, 34, 34);">Get Tickets Today</span></span></a></div><p class="paragraph" style="text-align:left;"><span style="color:rgb(255, 255, 255);"><b>Date:</b></span><span style="color:rgb(255, 255, 255);"> October 17, 10:30am - 7pm PT</span><br><span style="color:rgb(255, 255, 255);"><b>Venue: </b></span><span style="color:rgb(255, 255, 255);">The Midway, 900 Marin St, San Francisco</span></p></div><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">OpenAI released Swarm, a lightweight library for building multi-agent systems that allows dynamic switching between specialized agents during conversations.</p></li><li><p class="paragraph" style="text-align:left;">Aria, the first open-source multimodal native MoE model, was released, showing state-of-the-art performance on various multimodal and language tasks.</p></li><li><p class="paragraph" style="text-align:left;">BioNTech unveiled several AI initiatives, including the Kyber supercomputer and Bayesian Flow Network models for protein sequence generation.</p></li><li><p class="paragraph" style="text-align:left;">OpenAI&#39;s MLE-bench, evaluating AI agents on machine learning engineering tasks, found that the best setup achieved bronze medal level in 16.9% of Kaggle competitions, improving to 34.1% with multiple attempts.</p></li><li><p class="paragraph" style="text-align:left;">A study introducing Compositional GSM showed significant disparities in LLM performance 
between standard and more complex math problems, highlighting gaps in reasoning capabilities.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge"><span class="button__text" style=""> Subscribe </span></a></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">The model releases seemed to have slowed down last week, but we did get the <b>first </b>open-source, multimodal native MoE model, along with a new library from OpenAI.</p><p class="paragraph" style="text-align:left;">Meanwhile, AI in biotech saw some notable progress with AI being applied to calculating carbon footprints as well as a bunch of updates unveiled at <b>BioNTech’s inaugural AI day.</b></p><h3 class="heading" style="text-align:left;" id="open-a-is-new-tool-for-flexible-and">OpenAI&#39;s New Tool for Flexible and Dynamic AI Agent Interactions</h3><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/388a51a3-6bef-4a8f-bdeb-e02c2aad2e3f/image.png?t=1728946145"/><div class="image__source"><span class="image__source_text"><p>Example of how Swarm works. 
<a class="link" href="https://github.com/openai/swarm?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">OpenAI released <a class="link" href="https://github.com/openai/swarm?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">Swarm</a> - a lightweight library for building multi-agent systems. It’s similar in concept to existing frameworks like <b>CrewAI and LangChain</b> that also help with the creation of multi-agent systems. Unexpected, but certainly welcomed. </p><p class="paragraph" style="text-align:left;">Swarm provides a stateless abstraction to manage interactions between multiple AI agents. It also allows for dynamic switching between specialized agents during conversations. Worth noting that it <b>doesn’t rely</b> on OpenAI&#39;s Assistants API, so it offers more flexibility and control.</p><p class="paragraph" style="text-align:left;">It lets devs create distinct agents, each with specific roles, instructions, and functions. These agents can interact dynamically based on<b> pre-defined handoff </b>logic, allowing for seamless task switching as conversations or workflows progress.</p><p class="paragraph" style="text-align:left;">Since the framework doesn’t maintain internal state between function calls, it lets agents pass control to other agents in real-time based on criteria or conversation flow. The handoff is simple—just return the <b>next agent </b>to engage. </p><p class="paragraph" style="text-align:left;">Notably, Swarm doesn’t use <b>OpenAI’s function calling feature</b>. 
Instead, it opts for a more flexible approach that allows for easier integration with various AI models and custom implementations.</p><p class="paragraph" style="text-align:left;">Swarm uses context variables to maintain and update the <b>state </b>throughout multi-agent interactions. </p><h3 class="heading" style="text-align:left;" id="aria-the-worlds-first-open-source-m">Aria: The World&#39;s First Open-Source Multimodal Native MoE Model Debuts</h3><div class="image"><a class="image__link" href="https://www.rhymes.ai/blog-details/aria-first-open-multimodal-native-moe-model?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" rel="noopener" target="_blank"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/aa0e50a6-0249-4854-a24d-62b29c3de79f/image.png?t=1728946006"/></a><div class="image__source"><span class="image__source_text"><p>Aria showed SOTA results on benchmarks like MathVista and DocVQA. 
<a class="link" href="https://www.rhymes.ai/blog-details/aria-first-open-multimodal-native-moe-model?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">We saw the release of the first <b>open-source, multimodal native MoE</b> model called <a class="link" href="https://www.rhymes.ai/blog-details/aria-first-open-multimodal-native-moe-model?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">Aria</a> by Rhymes AI.</p><p class="paragraph" style="text-align:left;">It&#39;s pre-trained from scratch on a mixture of <b>multimodal and language data.</b> It’s definitely no slouch as it showed SOTA performance on a wide range of multimodal and language tasks like MMMU and LongVideoBench. It’s adept at following instructions across both multimodal and language inputs, excelling in benchmarks like MIA-Bench and MT-Bench.</p><p class="paragraph" style="text-align:left;">Aria processes text, images, video, and code <b>simultaneously </b>- all without needing separate setups for each type. In terms of multimodal capabilities, it has a long context window of 64K tokens. Aria can caption a 256-frame video in just 10 seconds. </p><p class="paragraph" style="text-align:left;">It consists of a vision encoder and a MoE decoder, with the vision encoder operating in three resolution modes: medium, high, and ultra-high. 
The MoE decoder has 66 experts in each layer, with 2 shared experts and <b>6 activated experts per token.</b></p><h3 class="heading" style="text-align:left;" id="world-labs-google-cloud-partnership">World Labs&#39; Google Cloud Partnership, TensorWave&#39;s AMD Challenge, and Intel&#39;s AI-Enabled Processors</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/10/08/fei-fei-li-picks-google-cloud-where-she-led-ai-as-world-labs-main-compute-provider/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">Fei-Fei Li’s World Labs </a>(which came out of stealth last month) has selected Google Cloud as its primary compute provider to train its &quot;spatially intelligent&quot; AI models. The startup is using a significant portion of its <b>$230 million funding round</b> for GPU server licensing.</p><p class="paragraph" style="text-align:left;">World Labs has a big focus on developing multimodal AI models capable of processing, generating, and interacting with video and <b>geospatial data,</b> which are all pretty computationally demanding. </p><p class="paragraph" style="text-align:left;">World Labs’ partnership with <b>Google Cloud</b> is non-exclusive, so the startup can potentially explore other cloud providers in the future. 
For now, Google Cloud hosts the majority of World Labs’ workloads, and Google aims to retain this business long-term.</p><p class="paragraph" style="text-align:left;">Nvidia has seen a lot of <a class="link" href="https://genai360.beehiiv.com/p/the-trillion-dollar-cluster?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">competition</a> step up to the plate in the chip market in recent months, with last week being no exception. <a class="link" href="https://techcrunch.com/2024/10/08/tensorwave-claims-its-amd-powered-cloud-for-ai-will-give-nvidia-a-run-for-its-money/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">TensorWave</a> is going against the grain by launching a cloud platform that only offers access to hardware from Nvidia rival AMD for AI workloads. Their main goal is to democratize AI by offering more <b>affordable compute access</b>.</p><p class="paragraph" style="text-align:left;">TensorWave uses AMD Instinct <b>MI300X GPUs for AI workloads</b>, claiming its MI300X instances outperform Nvidia&#39;s H100 in running (but not training) AI models, particularly text-generating models like Meta&#39;s Llama 2.</p><p class="paragraph" style="text-align:left;">TensorWave rents GPU capacity by the hour with a minimum six-month contract.<br>Pricing ranges from approximately<b> $1 to $10 per hour,</b> depending on workload requirements and GPU configurations. Interestingly enough, the company aims to be more cost-effective than competitors due to the lower cost of AMD GPUs compared to Nvidia&#39;s H100.</p><p class="paragraph" style="text-align:left;">In terms of growth, TensorWave is already generating $3 million in annual recurring revenue. 
The company expects to reach<b> $25 million in revenue</b> by the end of the year. TensorWave plans to scale up to 20,000 MI300X GPUs. The company intends to bring AMD&#39;s next-gen MI325X GPUs online as early as November/December 2024.</p><p class="paragraph" style="text-align:left;">We also saw a new release from Intel called the <a class="link" href="https://www.neowin.net/news/intel-launches-core-ultra-200s-desktop-processors-arrow-lake-with-npus-for-ai-performance/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">Intel Core Ultra 200S series</a>, which features built-in <b>Neural Processing Units (NPUs)</b>. This marks Intel&#39;s first desktop processors with integrated AI capabilities. </p><h3 class="heading" style="text-align:left;" id="from-packages-to-products-amazons-a">From Packages to Products: Amazon&#39;s AI Innovations in Delivery and Shopping</h3><p class="paragraph" style="text-align:left;">Amazon is introducing <a class="link" href="https://techcrunch.com/2024/10/09/amazons-new-ai-powered-vision-tech-tells-drivers-which-packages-to-deliver/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">vision-based technology</a> to its electric vehicle fleet. It’s designed to help drivers prioritize packages by <b>highlighting</b> them with green circles (for delivery) or red lights (incorrect packages).</p><p class="paragraph" style="text-align:left;">The VAPR (Vision-Assisted Package Retrieval) system aims to save drivers from having to stop and manually search for packages at each stop, reducing the time spent per stop from <b>2–5 minutes </b>to under a minute. 
</p><p class="paragraph" style="text-align:left;">It also includes an audio cue to confirm if the driver has selected the <b>correct package</b>, which gets rid of the need for handheld devices that drivers currently use for package scanning and tracking.</p><p class="paragraph" style="text-align:left;">Turns out the VAPR system has been in development since early 2020, with Amazon considering unique delivery challenges like lighting and space constraints inside vans. There are plans to deploy the VAPR system in <b>1,000</b> of its electric Rivian vans by early 2025, after testing the technology in select markets like Boston. It’s important to note that <a class="link" href="https://www.marketwatch.com/story/amazon-lost-nearly-1-billion-on-its-rivian-investment-last-week-as-another-analyst-downgrades-ev-makers-stock-9e77ee1f?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">Amazon is Rivian’s largest shareholder with a stake of 16.6%.</a></p><p class="paragraph" style="text-align:left;">Amazon also released <a class="link" href="https://www.theverge.com/2024/10/9/24266204/amazon-ai-shopping-guides-catalog-feature-availability?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">AI Shopping Guides</a> to help users find products based on specific features, offering <b>tailored suggestions</b> and product information for over 100 product types, including TVs, headphones, and skincare. </p><p class="paragraph" style="text-align:left;">It’s a more visual way to filter products, replacing traditional tick-box menus with interactive options for factors like brand, use case (e.g., sport or gaming), and connectivity type. 
Each guide includes educational content and customer insights to help users better understand product features and make more <b>informed purchasing decisions.</b></p><p class="paragraph" style="text-align:left;">The AI guides appear automatically during <b>searches </b>when relevant, or users can directly explore them through Amazon’s mobile website and apps for iOS and Android.</p><h3 class="heading" style="text-align:left;" id="ai-for-refining-carbon-accounting-a">AI for Refining Carbon Accounting and Updates from BioNTech’s Inaugural AI Day</h3><p class="paragraph" style="text-align:left;">AI saw some <a class="link" href="https://techcrunch.com/2024/10/09/unhappy-with-their-exit-these-ex-planetly-employees-are-using-ai-to-refine-carbon-accounting/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge#" target="_blank" rel="noopener noreferrer nofollow">climate tech</a> applications last week, with Forward Earth focusing on using AI to automate the calculation of complex CO2 footprints. The co-founders, former executives at carbon accounting startup Planetly, launched <b>Forward Earth</b> because they weren’t too happy with Planetly&#39;s acquisition and shutdown by OneTrust.</p><p class="paragraph" style="text-align:left;">BioNTech recently had its <a class="link" href="https://investors.biontech.de/news-releases/news-release-details/biontech-highlights-ai-capabilities-and-rd-use-cases-inaugural?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">inaugural AI day</a> where they unveiled a bunch of key updates. 
</p><p class="paragraph" style="text-align:left;">Here are the main ones you need to know:</p><ul><li><p class="paragraph" style="text-align:left;"><b>AI Scaling Across Immunotherapy Pipeline</b>: BioNTech detailed its strategy to scale AI capabilities throughout its immunotherapy pipeline, using AI to drive innovation in areas like DNA/RNA sequencing, proteomics, protein design, and immunohistochemistry.</p></li><li><p class="paragraph" style="text-align:left;"><b>Launch of Kyber Supercomputer</b>: InstaDeep (BioNTech’s AI subsidiary) unveiled Kyber - a near exascale supercomputer aimed at enabling high-performance computing for large-scale AI and biotechnology research.</p></li><li><p class="paragraph" style="text-align:left;"><b>Bayesian Flow Network (BFN)</b>: BioNTech presented BFN generative models for protein sequence generation, showing the potential for advancements in personalized vaccine development and targeted therapies.</p></li><li><p class="paragraph" style="text-align:left;"><b>DeepChain™ Platform and External Partnerships</b>: The DeepChain™ multiomics design platform was introduced for external partnerships after success in BioNTech’s internal projects, including the mRNA-encoded antibody RiboMab™ platform.</p></li></ul><h3 class="heading" style="text-align:left;" id="adobe-launches-free-web-app-for-con">Adobe Launches Free Web App for Content Credentials, Addressing Past AI Training Controversies</h3><p class="paragraph" style="text-align:left;">Adobe is launching a <a class="link" href="https://www.theverge.com/2024/10/8/24265031/adobe-content-authenticity-web-app-ai-label-availability?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">free web app</a> to improve the process of applying Content Credentials to images, videos, and audio files.</p><p class="paragraph" style="text-align:left;">It’s an 
interesting turn of events since we saw <a class="link" href="https://www.bloomberg.com/news/articles/2024-04-12/adobe-s-ai-firefly-used-ai-generated-images-from-rivals-for-training?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">Adobe use Midjourney-generated images</a> to train their Firefly model in the past, even though they said the model was a “<b>commercially safe</b>” alternative.</p><p class="paragraph" style="text-align:left;">This may be Adobe’s response to the trust it lost around ethical AI practices. They’re <b>introducing</b> tools that let creators protect their work and opt out of AI training datasets, a sign that they’re adapting their practices and policies for a more ethical approach.</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">One paper that really stood out last week was about exploring intelligence emergence in rule-based systems. OpenAI also released a <b>new benchmark</b> for evaluating AI in scientific discovery tasks. Researchers provided a fresh perspective on assessing LLM capabilities as well.</p><h3 class="heading" style="text-align:left;" id="how-rule-complexity-shapes-ai-behav">How Rule Complexity Shapes AI Behavior</h3><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/134d9fbc-2d90-4afb-ad2c-65a8d7906a5a/image.png?t=1728946472"/><div class="image__source"><span class="image__source_text"><p>Framework overview. 
<a class="link" href="https://www.arxiv.org/pdf/2410.02536?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers introduced a <a class="link" href="https://www.arxiv.org/pdf/2410.02536?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">new approach</a> to understanding the emergence of intelligent behavior in artificial systems by investigating how the complexity of rule-based systems influences the capabilities of models trained to predict these rules.</p><p class="paragraph" style="text-align:left;">Previously we saw the release of <a class="link" href="https://genai360.beehiiv.com/p/of-whales-and-strawberries?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">LifeGPT</a> which also progressed work with cellular automata. This paper used elementary cellular automata (ECA) as a framework to generate behaviors ranging from <b>simple to highly complex. </b></p><p class="paragraph" style="text-align:left;">They also trained separate <b>GPT-2 language models</b> on datasets generated by individual ECAs and evaluated the models&#39; &quot;intelligence&quot; through performance on downstream logical reasoning tasks.</p><p class="paragraph" style="text-align:left;">A <b>positive correlation</b> was found between the complexity of ECA rules and the downstream performance of models trained on them. Models trained on Class IV ECA rules, which showed structured yet complex behaviors, performed optimally. 
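As a rough illustration of the setup (our own sketch, not the paper's code), generating training sequences from an elementary cellular automaton takes only a few lines; Rule 110 is a classic example of the structured-yet-complex Class IV behavior:

```python
def eca_step(state, rule):
    """One update of an elementary cellular automaton.

    state: list of 0/1 cells (edges wrap around)
    rule:  integer 0-255; bit k of the rule gives the next value of a
           cell whose (left, self, right) neighborhood encodes the number k.
    """
    n = len(state)
    return [
        (rule >> (4 * state[(i - 1) % n] + 2 * state[i] + state[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Evolve Rule 110 (Class IV) from a single live cell; the rows of
# `history` are the kind of state sequences a model could be trained on.
row = [0] * 11
row[5] = 1
history = [row]
for _ in range(5):
    row = eca_step(row, 110)
    history.append(row)
```

Each of the 256 rules is just a different 8-entry lookup table, which is what makes ECAs a convenient dial for varying training-data complexity.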
</p><p class="paragraph" style="text-align:left;">Interestingly, it showed that models can learn<b> complex solutions</b> even when trained on simple rules, which is likely due to overparameterization. They presented a hypothesis that by learning to incorporate past states, models develop generalizable logic that can be reused across tasks.</p><h3 class="heading" style="text-align:left;" id="from-bronze-to-gold-open-a-is-ml-eb">From Bronze to Gold: OpenAI&#39;s MLE-bench Challenges AI Agents in Real-World ML Tasks</h3><p class="paragraph" style="text-align:left;">In addition to Swarm, OpenAI also introduced <a class="link" href="https://openai.com/index/mle-bench/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">MLE-bench</a> last week. It’s a new benchmark for assessing the <b>capabilities of AI agents</b> in machine learning engineering tasks. 
</p><p class="paragraph" style="text-align:left;">This benchmark aims to provide a <b>rigorous measure</b> of progress in autonomous ML engineering agents, addressing the growing interest in using AI to automate scientific workflows.</p><p class="paragraph" style="text-align:left;">Key aspects of MLE-bench include:</p><ol start="1"><li><p class="paragraph" style="text-align:left;">A diverse set of 75 Kaggle competitions across various domains, including natural language processing, computer vision, and signal processing.</p></li><li><p class="paragraph" style="text-align:left;">Careful curation to ensure tasks are challenging and representative of contemporary ML engineering work.</p></li><li><p class="paragraph" style="text-align:left;">The ability to compare AI agent performance directly with human-level performance using Kaggle leaderboards.</p></li></ol><p class="paragraph" style="text-align:left;">The researchers evaluated several frontier language models on MLE-bench using open-source agent scaffolds. They found the <b>best-performing setup</b> to be OpenAI&#39;s o1-preview with AIDE scaffolding, which achieved at least a bronze medal level in 16.9% of competitions.</p><p class="paragraph" style="text-align:left;">Performance significantly improved when agents were given multiple attempts per competition, with o1-preview&#39;s score doubling from 16.9% to 34.1% when allowed 8 attempts. 
Moreover, agents performed well on competitions solvable with well-known approaches but struggled with <b>debugging and recovering from missteps.</b></p><h3 class="heading" style="text-align:left;" id="science-agent-bench-putting-ai-to-t">ScienceAgentBench: Putting AI to the Test in Scientific Discovery</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2410.05080?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">ScienceAgentBench</a> is a <b>new benchmark</b> for evaluating language agents in data-driven scientific discovery tasks. It addresses the growing interest in using AI to automate scientific workflows, while also highlighting the need for more rigorous assessment of these systems.</p><p class="paragraph" style="text-align:left;">A total of 102 diverse tasks were extracted from <b>44 peer-reviewed publications</b> across four scientific disciplines: Bioinformatics, Computational Chemistry, Geographical Information Science, and Psychology & Cognitive Neuroscience.</p><p class="paragraph" style="text-align:left;">There was a big focus on code generation, requiring agents to produce complete <b>Python programs</b> for data analysis and visualization tasks. They also used careful quality control measures, including expert validation and strategies to mitigate data contamination concerns.</p><p class="paragraph" style="text-align:left;">They evaluated five open-weight and proprietary LLMs using three frameworks: direct prompting, OpenHands CodeAct, and self-debug. 
Surprisingly, they found simpler approaches like self-debug usually <b>outperformed</b> more complex frameworks in both performance and cost-efficiency.</p><p class="paragraph" style="text-align:left;">Results show that even the best-performing agent, <b>Claude-3.5-Sonnet </b>using self-debug, could only solve 34.3% of tasks with expert-provided knowledge. It’s definitely an eye-opener about the challenges that AI agents are having in automating complex scientific workflows.</p><h3 class="heading" style="text-align:left;" id="new-benchmark-reveals-llm-reasoning">New Benchmark Reveals LLM Reasoning Disparities</h3><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/9969144f-49b0-49c0-82e0-3e890b16dfaa/image.png?t=1728946618"/><div class="image__source"><span class="image__source_text"><p>Graph comparing reasoning performance on GSM8K and Compositional GSM accuracy. <a class="link" href="https://arxiv.org/pdf/2410.01748?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers from Mila, Microsoft Research, and Google DeepMind introduced a <a class="link" href="https://arxiv.org/pdf/2410.01748?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">new approach</a> to evaluating the<b> reasoning capabilities </b>of LLMs in grade-school math problems. 
</p><p class="paragraph" style="text-align:left;">They introduced <b>Compositional GSM</b>, a two-hop version of the GSM8K benchmark that challenges LLMs to solve chained math problems.</p><p class="paragraph" style="text-align:left;">They also evaluated <b>various LLMs</b>, including open-source and proprietary models like Gemini, Gemma2, LLAMA3, GPT, Phi, Qwen2.5, and Mistral families.</p><p class="paragraph" style="text-align:left;">There were significant disparities between LLMs&#39; performance on standard GSM8K problems and the more complex Compositional GSM tasks. <b>Smaller, more cost-efficient</b>, and math-specialized models showed larger reasoning gaps.</p><p class="paragraph" style="text-align:left;">For instance, GPT-4o mini, which nearly matches GPT-4o on <b>standard benchmarks</b>, showed a 2-12x worse reasoning gap on Compositional GSM.</p><p class="paragraph" style="text-align:left;">It showed instruction-tuning effects vary across LLM sizes, with smaller models showing more improvement on standard GSM8K but less on Compositional GSM. 
Extensive math specialization didn’t necessarily improve performance on these <b>compositional tasks.</b></p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">A post about a new architecture which might improve on <b>o1’s ability </b>to scale inference-time compute gained a lot of attention - and for good reason.</p><p class="paragraph" style="text-align:left;">Another discussion about <b>LongCite,</b> a new method to improve the trustworthiness of AI outputs, was also one to look at.</p><h3 class="heading" style="text-align:left;" id="pause-tokens-and-parallel-reasoning">Pause Tokens and Parallel Reasoning: Entropix&#39;s New Approach to AI Cognition</h3><p class="paragraph" style="text-align:left;">A discussion about <a class="link" href="https://x.com/tedx_ai/status/1843066379299098937?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">Entropix</a>, an <b>innovative architecture </b>anonymously released by an AI researcher, came up last week. </p><p class="paragraph" style="text-align:left;">It highlighted a new approach to replicating and potentially improving upon OpenAI&#39;s latest o1 model&#39;s ability to scale <b>inference-time compute</b> - essentially improving its capacity to &#39;think&#39; before responding.</p><p class="paragraph" style="text-align:left;">What stood out was the architecture&#39;s use of <b>uncertainty measurement</b> (formally defined as entropy and varentropy) to improve reasoning. </p><p class="paragraph" style="text-align:left;">The model inserts pause tokens like &quot;...wait&quot; when uncertain about the next best tokens or thoughts, prompting it to reflect and produce additional chains of thought. 
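To make that concrete, here is a toy sketch of the idea (our own illustration with made-up thresholds, not the actual Entropix code): measure the entropy and varentropy of the next-token distribution, and inject a pause token when both are high:

```python
import math

def entropy_varentropy(logits):
    """Entropy (in nats) and varentropy of softmax(logits).

    Varentropy is the variance of the surprisal -log p(x),
    i.e. E[(-log p)^2] - H^2.
    """
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    var = sum(p * math.log(p) ** 2 for p in probs if p > 0) - h ** 2
    return h, var

def maybe_pause(logits, h_thresh=2.0, v_thresh=1.0):
    """Toy decision rule; the thresholds are illustrative, not Entropix's."""
    h, v = entropy_varentropy(logits)
    return "...wait" if h > h_thresh and v > v_thresh else None
```

When the distribution is sharply peaked (low entropy), decoding proceeds normally; when the model is torn between several continuations, the injected pause token gives it room to produce another chain of thought before committing.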
This approach allows for <b>dynamic scaling</b> of inference-time compute for more profound and strategic thinking.</p><p class="paragraph" style="text-align:left;">It makes us wonder how this method could potentially unlock more <b>powerful reasoning </b>capabilities from smaller models like Llama 3.1-1B, which can run locally on a laptop. </p><p class="paragraph" style="text-align:left;">Perhaps advancements in the underlying attention mechanism, rather than relying solely on prompt engineering or methods like <b>Monte Carlo Tree Search</b>, might be key to unlocking the true potential of inference-time compute.</p><h3 class="heading" style="text-align:left;" id="boosting-llm-trustworthiness-with-f">Boosting LLM Trustworthiness with Fine-Grained Citations</h3><p class="paragraph" style="text-align:left;">Sebastian Raschka brought an interesting paper called <a class="link" href="https://x.com/rasbt/status/1845468766118850862?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">LongCite</a> into the spotlight, after it had probably slipped under the radar. It describes a new approach to boost the trustworthiness of LLMs in <b>long-context question answering tasks.</b></p><p class="paragraph" style="text-align:left;">What makes this paper important is its focus on making AI-generated content more trustworthy. <b>Misinformation</b> is common in today’s world, so equipping LLMs with the ability to generate precise, sentence-level citations could definitely raise human confidence in AI outputs. Not only does that help address concerns about hallucinations, but it also helps us verify information.</p><p class="paragraph" style="text-align:left;">The method used also stands out, since they leveraged existing LLMs to create a dataset of long-context Q&A instances. 
It’s an example of step-wise refinement that shows how AI can be adapted to deal with <b>pressing issues.</b></p><p class="paragraph" style="text-align:left;">Maybe we’ll see a shift toward models that <b>prioritize citation </b>and accountability as core functionalities in the future.</p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention in the last week include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2410.07171?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">IterComp</a>: Combines the strengths of multiple diffusion models to improve compositional text-to-image generation.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2410.07164?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">AvatarGo</a>: Generates animatable 4D human-object interaction (HOI) scenes directly from textual inputs.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2410.07163?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">SimNPO</a>: A framework for LLMs that aims to remove unwanted data influences and associated model capabilities.</p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email saying hi :)</p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in 
AI</h2><p class="paragraph" style="text-align:left;">AI applications in <b>mineral discovery </b>seem to be making good progress as KoBold Metals secured $491 million after a massive discovery. Meanwhile, Basecamp and Braintrust secured $60 million and $36 million respectively.</p><h3 class="heading" style="text-align:left;" id="ko-bold-metals-secures-491-million">KoBold Metals Secures $491 Million</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/10/07/ai-powered-critical-mineral-startup-kobold-metals-has-raised-491m-filings-reveal/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">KoBold Metals</a>, an AI-powered mineral discovery startup, is close to raising over half a billion dollars, having already secured $491 million of a targeted <b>$527 million round. </b></p><p class="paragraph" style="text-align:left;">This funding comes on the heels of KoBold&#39;s discovery of what might be one of the largest high-grade copper deposits in history, showing the potential of <b>AI in mineral exploration. </b></p><h3 class="heading" style="text-align:left;" id="basecamp-raises-60-million-in-serie">Basecamp Raises $60 Million in Series B Funding Round</h3><p class="paragraph" style="text-align:left;">A London-based startup building an AI agent for <b>biology and biodiversity</b> insights <a class="link" href="https://techcrunch.com/2024/10/09/basecamp-research-taps-60m-to-build-a-gpt-for-biology/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">called Basecamp</a> has raised $60 million in a Series B funding round led by Singular. 
</p><p class="paragraph" style="text-align:left;">The company aims to create an AI that can not only <b>answer questions</b> about biology but also produce new insights beyond human capabilities, with its BaseFold model already claiming to outperform DeepMind&#39;s AlphaFold 2 in certain protein structure predictions.</p><h3 class="heading" style="text-align:left;" id="rad-ai-secures-50-million-in-series">Rad AI Secures $50 Million in Series B Funding Round</h3><p class="paragraph" style="text-align:left;">A startup focusing on generative AI for healthcare called <a class="link" href="https://www.linkedin.com/posts/doktorgurson_i-wanted-to-share-a-key-takeaway-from-rad-activity-7249440062113771520-HJIT/?utm_source=share&utm_medium=member_desktop" target="_blank" rel="noopener noreferrer nofollow">Rad AI</a> secured $50 million in Series B funding led by Khosla Ventures, which brings their total capital raised to over $80 million. The CEO attributed a big part of their success to <b>strong business metrics, </b>including tripling year-over-year revenue and adoption by more than a third of all US health systems.</p><h3 class="heading" style="text-align:left;" id="braintrust-raises-36-million-in-ser">Braintrust Raises $36 Million in Series A Funding Round</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.braintrust.dev/blog/announcing-series-a?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-s-swarm-first-open-source-multimodal-moe-model-tensorwave-s-amd-challenge" target="_blank" rel="noopener noreferrer nofollow">Braintrust</a> focuses on empowering teams to build robust LLM-enabled applications. They announced a $36 million funding round, bringing their total funding to $45 million. 
In particular, they’re working toward building features like <b>being able to share code</b> in dev environments.</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=600f76f3-9be3-4e69-960a-35ee1939b22a&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Attend RetrieveX, the Conference for AI Data Leaders</title>
  <description>Final Batch of Free Tickets Inside. Event is on Oct 17 in SF, CA</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/18ea3c8b-ff75-495a-b090-399c675e908e/RetrieveX_-_Top_Speakers-1200x630-px.png" length="229912" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/retrievex</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/retrievex</guid>
  <pubDate>Mon, 14 Oct 2024 23:08:31 +0000</pubDate>
  <atom:published>2024-10-14T23:08:31Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/3ffd7f7d-5991-478d-8728-66b44fb1f91b/image.png?t=1728946240"/></div><p class="paragraph" style="text-align:left;">Hi there,</p><p class="paragraph" style="text-align:left;">You probably know this already since you&#39;ve been reading our newsletter quite a bit, but as the free tickets for the conference are running out, I wanted to share <b>RetrieveX - the first industry event focused entirely on Retrieval for AI for 200+ Data Leaders in AI</b>! We&#39;re thrilled to have you join us this week, on <b>October 17th from 10:30 AM at The Midway in San Francisco</b>.</p><p class="paragraph" style="text-align:left;"></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://retrievex.co/application?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=attend-retrievex-the-conference-for-ai-data-leaders"><span class="button__text" style="color:#FFFFFF;"> Request a Free Ticket </span></a></div><h2 class="heading" style="text-align:left;" id="event-details"><b>Event Details:</b></h2><ul><li><p class="paragraph" style="text-align:left;"><b>Date:</b> October 17, 2024</p></li><li><p class="paragraph" style="text-align:left;"><b>Doors open & light breakfast:</b> 10:30 AM</p></li><li><p class="paragraph" style="text-align:left;"><b>Location:</b> The Midway, 900 Marin St, San Francisco</p></li></ul><h2 class="heading" style="text-align:left;" id="what-to-expect"><b>What to expect:</b></h2><ul><li><p class="paragraph" style="text-align:left;">Talks from the creators of PyTorch, Meta Chameleon, Albumentations, Kubeflow, and CAFFE.</p></li><li><p class="paragraph" style="text-align:left;">Insights from top executives at <b>Microsoft, 
AWS, Bayer, Y Combinator, and Flagship Pioneering</b></p></li><li><p class="paragraph" style="text-align:left;">Deep dives into building AI search on object storage and highly accurate <b>multi-modal RAG.</b></p></li><li><p class="paragraph" style="text-align:left;"><b>Delivering ROI with LLM-powered apps </b>across Healthcare, Life Sciences, Legal, Customer Support, Finance, Marketing.</p></li></ul><h2 class="heading" style="text-align:left;" id="talk-highlights">Talk Highlights</h2><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/bac6ef6a-b71c-4870-9eef-65e193da35b7/image.png?t=1728787064"/></div><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://retrievex.co/application?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=attend-retrievex-the-conference-for-ai-data-leaders"><span class="button__text" style="color:#FFFFFF;"> Request a Free Ticket </span></a></div><div class="section" style="background-color:#ff8a00;border-color:#FFFFFF;border-radius:5px;border-style:dotted;border-width:1px;margin:20.0px 20.0px 20.0px 20.0px;padding:20.0px 20.0px 20.0px 20.0px;"><h2 class="heading" style="text-align:left;"><b>Elevating Content Creation Through Vector Embeddings & Contextual Search at Spotter</b></h2><p class="paragraph" style="text-align:left;">Learn how the Data team at Spotter, the team behind creators such as Mr Beast, revolutionized contextual search capabilities, enabling YouTube creators with enhanced ideation</p><p class="paragraph" style="text-align:left;">Sazz Rahman, <b>Spotter</b></p></div><div class="section" 
style="background-color:#ff8a00;border-color:#FFFFFF;border-radius:5px;border-style:dotted;border-width:1px;margin:20.0px 20.0px 20.0px 20.0px;padding:20.0px 20.0px 20.0px 20.0px;"><h2 class="heading" style="text-align:left;"><b>Accelerating Generative AI with Deep Lake at Matterport</b></h2><p class="paragraph" style="text-align:left;">How to remove all furniture from your property without ever opening the door.</p><p class="paragraph" style="text-align:left;">Alan Dolhasz, <b>Matterport</b></p></div><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/700d9cde-2d3d-4439-adee-45b8ce70d80d/image.png?t=1728787364"/></div><div class="section" style="background-color:#ff8a00;border-color:#FFFFFF;border-radius:5px;border-style:dotted;border-width:1px;margin:20.0px 20.0px 20.0px 20.0px;padding:20.0px 20.0px 20.0px 20.0px;"><h2 class="heading" style="text-align:left;"><b>From Concept to Care: GenAI-Driven Innovation in Digital Health at Bayer</b></h2><p class="paragraph" style="text-align:left;">This session explores a GenAI-driven architecture designed to accelerate healthcare innovation, balancing speed, safety, and efficiency in high-risk applications.</p><p class="paragraph" style="text-align:left;">Steffen Vogler, <b>Principal Data Scientist, Bayer</b></p></div><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/703ddccd-ef33-4666-a6e5-f1c75539bd94/image.png?t=1728787376"/></div><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" 
class="button__link" style="background-color:#ff8a00;" href="https://retrievex.co/application?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=attend-retrievex-the-conference-for-ai-data-leaders"><span class="button__text" style="color:#FFFFFF;"> Request a Free Ticket </span></a></div><p class="paragraph" style="text-align:left;">Hope to see you there,</p><p class="paragraph" style="text-align:left;">Mikayel</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=9624f5bc-77e4-468f-97b9-094ae747839f&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>OpenAI DevDay 2024, Nvidia Unveils NVLM 1.0, Meta Advances Video Generation</title>
  <description>Plus, Last Remaining FREE Tickets to RetrieveX on Oct 17 for Our Subscribers </description>
  <link>https://genai360.beehiiv.com/p/of-video-gen-models-and-agents</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/of-video-gen-models-and-agents</guid>
  <pubDate>Thu, 10 Oct 2024 15:33:41 +0000</pubDate>
  <atom:published>2024-10-10T15:33:41Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><div class="section" style="background-color:#ff8a00;border-color:#ff8a00;border-radius:2px;border-style:solid;border-width:2px;margin:10.0px 10.0px 10.0px 10.0px;padding:10.0px 10.0px 10.0px 10.0px;"><h2 class="heading" style="text-align:left;"><span style="color:#FFFFFF;">GenAI360 Exclusive: Last Week to Get Free Tickets for RetrieveX Conference on Oct 17 in San Francisco. </span></h2><p class="paragraph" style="text-align:left;"><span style="color:#FFFFFF;">Come hear from the creators of PyTorch, Albumentations, Meta Chameleon, Kubeflow, CAFFE, along with leaders from Microsoft, AWS, Bayer, Flagship Pioneering, Cresta, VoyageAI, Omneky on how to build the best retrieval for AI. </span></p><p class="paragraph" style="text-align:left;"><span style="color:#FFFFFF;">If you&#39;re an executive who&#39;s considering or working on GenAI projects, fill in the form below for a complimentary ticket for the conference - hurry up because tickets are limited! </span><span style="color:#FFFFFF;"><b>Please note that the conference is in-person only</b></span><span style="color:#FFFFFF;">.</span></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#FFFFFF;" href="https://www.retrievex.co/application?utm_source=newsletter&utm_medium=email&utm_campaign=weekly2"><span class="button__text" style="color:#FFFFFF;"><span style="color:#222222;">Get Tickets Today</span></span></a></div><p class="paragraph" style="text-align:left;"><span style="color:#FFFFFF;"><b>Date:</b></span><span style="color:#FFFFFF;"> October 17, 10:30am - 7pm PT</span><br><span style="color:#FFFFFF;"><b>Venue: </b></span><span style="color:#FFFFFF;">The Midway, 900 Marin St, San Francisco</span></p></div><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">John Hopfield and <b>Geoffrey Hinton, the Godfather of 
AI, win Nobel Prize</b>… in Physics.</p></li><li><p class="paragraph" style="text-align:left;">OpenAI announced<b> new API features</b>, including a Realtime API for voice-to-voice conversations and Vision fine-tuning for improved image tasks like object detection.</p></li><li><p class="paragraph" style="text-align:left;">Nvidia released <b>NVLM-D-72B</b>, a large multimodal model rivaling GPT-4o and Claude, with its model weights publicly available.</p></li><li><p class="paragraph" style="text-align:left;">Liquid AI released Liquid Foundation Models (LFMs), delivering SOTA performance with a <b>smaller memory footprint.</b></p></li><li><p class="paragraph" style="text-align:left;"><b>Meta&#39;s Movie Gen</b> lets users create videos and edit personal images using simple text inputs, producing HD media.</p></li><li><p class="paragraph" style="text-align:left;"><b>FlashMask</b> optimizes attention mechanisms by introducing a column-wise sparse representation for attention masks, improving memory efficiency for LLMs handling sequences up to 128K tokens.</p></li><li><p class="paragraph" style="text-align:left;"><b>PlasmidGPT</b> is a transformer-based model for designing plasmid DNA which improves annotation accuracy by 81%.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation"><span class="button__text" style=""> Subscribe </span></a></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">Folks, busy week as always. 
OpenAI took the <b>spotlight</b> in terms of AI news last week yet again. It wasn’t all positive though, as there was yet another OpenAI exodus with four key members leaving (and two of them joining competitor companies already). Still, there were some neat announcements of new API features at OpenAI DevDay.</p><p class="paragraph" style="text-align:left;">Meanwhile, we saw new model releases from the likes of Nvidia, Liquid AI, and Resolve AI, as well as developments in <b>video generation AI.</b></p><h3 class="heading" style="text-align:left;" id="open-ai-unveils-realtime-api-vision">OpenAI Unveils Realtime API, Vision Fine-Tuning, and More at DevDay 2024</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://openai.com/devday/content/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">OpenAI DevDay 2024</a> took place last week and there were some huge <b>feature announcements. Are they worth a </b><a class="link" href="https://www.moomoo.com/news/post/44394161/openai-suggests-in-2026-a-maximum-loss-of-14-billion?level=1&data_ticket=1728574075683991&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow"><b>loss of $44B from 2023 to 2028</b></a><b>? 
We&#39;ll see.</b></p><p class="paragraph" style="text-align:left;">Here’s a quick rundown of what OpenAI revealed in terms of API features:</p><ol start="1"><li><p class="paragraph" style="text-align:left;"><b><a class="link" href="https://openai.com/index/introducing-the-realtime-api/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Realtime API</a></b><b>:</b> Devs looking to quickly add speech-to-speech capabilities to their apps will be glad to hear about this. Voice assistant development is a whole lot easier since the Realtime API combines transcription, text reasoning, and text-to-speech into a single API call. It&#39;s currently available to paid developers, with audio priced at roughly $0.06/minute for input and $0.24/minute for output.</p></li><li><p class="paragraph" style="text-align:left;"><b><a class="link" href="https://openai.com/index/introducing-vision-to-the-fine-tuning-api/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Vision</a></b><b>: </b>The vision update brings further multimodal capabilities: GPT-4o now supports fine-tuning with images. 
Free vision fine-tuning tokens will be available until October 31, 2024, after which training and inference will be priced based on token usage.</p></li><li><p class="paragraph" style="text-align:left;"><b><a class="link" href="https://openai.com/index/api-prompt-caching/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Prompt caching</a></b><b>: </b>Devs can now reduce costs and latency by reusing previously seen input tokens through the new Prompt Caching feature in the GPT-4o models. Prompt Caching applies automatically to prompts over 1,024 tokens, and cached input tokens receive a 50% discount. </p></li><li><p class="paragraph" style="text-align:left;"><b><a class="link" href="https://openai.com/index/api-model-distillation/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Model distillation</a></b><b>: </b>This feature helps improve performance at lower costs, as it allows outputs from larger models like GPT-4o to be used to fine-tune smaller, cost-efficient models such as GPT-4o mini. 
Model Distillation is available for all devs with free training tokens until October 31, 2024, and <a class="link" href="https://openai.com/api/pricing/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">standard pricing thereafter.</a></p></li><li><p class="paragraph" style="text-align:left;"><b><a class="link" href="https://openai.com/index/introducing-canvas/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Canvas</a></b><b>:</b><b> </b>A new interface for writing and coding projects with ChatGPT, allowing collaboration beyond basic conversation. It enables direct editing and feedback - similar to a code reviewer or copy editor. It&#39;s currently in early beta with plans for rapid development based on user feedback.</p></li></ol><h3 class="heading" style="text-align:left;" id="ai-video-generation-heats-up-movie-">AI Video Generation Heats Up: Movie Gen, VidGen-2, and OpenFLUX Make Waves</h3><p class="paragraph" style="text-align:left;">A bunch of model releases were seen in the AI video generation space last week.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfyMqWTWvo_45wNbhY3gdQCNsgjcj5sEKMZofm0rvLgCCbZ_OeaIxN36ojw85beIO6XG4fTXiyIPLEhGxJdw65CXcXkoOq7_QoUFTHODgk7edKxGlBFgIGzfjW3FyjB4e9TDmQWbj0v33M7kxYEtj-Yoes?key=8XMgMByevwbz8SacwQ5JKQ"/><div class="image__source"><span class="image__source_text"><p>Example of a video produced by Meta’s Movie Gen. 
<a class="link" href="https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Meta introduced <a class="link" href="https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Movie Gen</a>, a breakthrough <b>generative AI research project</b> for media.</p><p class="paragraph" style="text-align:left;">It encompasses multiple modalities including image, video, and audio generation and editing.</p><p class="paragraph" style="text-align:left;">The system allows users to produce custom videos and sounds, edit existing videos, and transform personal images into unique videos using simple text inputs. Movie Gen uses a <b>30B parameter transformer model </b>for video generation. </p><p class="paragraph" style="text-align:left;">A quick overview of Movie Gen’s main features:</p><ol start="1"><li><p class="paragraph" style="text-align:left;"><b>Video Generation</b>: Using a 30B parameter transformer model, Movie Gen can create high-quality, high-definition videos up to 16 seconds long at 16 fps from text prompts. </p></li><li><p class="paragraph" style="text-align:left;"><b>Precise Video editing:</b> Movie Gen&#39;s editing capabilities allow for both localized and global changes to existing videos. Users can add, remove, or replace elements, or modify entire backgrounds and styles using text prompts. 
Unlike traditional editing tools, Movie Gen preserves original content while targeting only relevant pixels.</p></li><li><p class="paragraph" style="text-align:left;"><b>Audio Generation:</b> A 13B parameter audio model can generate high-fidelity audio up to 45 seconds long, including ambient sound, sound effects, and background music, all synchronized with video content.</p></li></ol><p class="paragraph" style="text-align:left;">Meanwhile, <a class="link" href="https://www.businesswire.com/news/home/20241001089771/en/Helm.ai-Introduces-VidGen-2-Generative-AI-for-Higher-Resolution-and-Enhanced-Realism-Multi-Camera-Video-for-Autonomous-Driving?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Helm.ai’s VidGen-2</a> offers 2X higher resolution than its predecessor, generating video sequences at <b>696 x 696 resolution</b>. It improves realism and supports frame rates ranging from 5 to 30 fps. It can generate videos without an input prompt or with a single image or input video as the prompt.</p><p class="paragraph" style="text-align:left;">The model generates driving scene videos across multiple geographies, camera types, and vehicle perspectives. It produces highly realistic appearances and temporally consistent object motion. VidGen-2 learns and reproduces <b>human-like driving behaviors</b>, simulating the motions of the ego-vehicle and surrounding agents in accordance with traffic rules.</p><p class="paragraph" style="text-align:left;">Note that VidGen-2 was trained on<b> thousands of hours </b>of diverse driving footage using NVIDIA H100 Tensor Core GPUs. 
</p><p class="paragraph" style="text-align:left;">Midjourney’s competitor <a class="link" href="https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">FLUX</a> by Black Forest Labs also saw a new release called <a class="link" href="https://huggingface.co/Kijai/OpenFLUX-comfy/blob/main/OpenFlux-fp8_e4m3fn.safetensors?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">OpenFLUX.1</a>, which is a fine-tuned version of the <b>FLUX.1-schnell model</b>. The main aim was to train out the distillation, leaving behind an open-source model that can be fine-tuned.</p><h3 class="heading" style="text-align:left;" id="nvidia-unveils-nvlm-10-and-powers-b">Nvidia Unveils NVLM 1.0 and Powers Brave&#39;s Local LLMs</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdyyt8uA90SO8VMdt8Kl_WylVJdBHzDX7EPrAAp7KHis8_SPzpfo_Gb8lu1eH83Uw7SUzx7qXn1NkXvtatpjkTO4vSh5fDx9XZN4ZwOlJuW0VJqfzXvNqsqIm_lnpFfs1-I4GFc9olX-D2te2nRRUf1-qzB?key=8XMgMByevwbz8SacwQ5JKQ"/><div class="image__source"><span class="image__source_text"><p>Nvidia’s new model can respond to prompts containing text and images. 
<a class="link" href="https://bgr.com/tech/nvidia-stunned-the-world-with-a-chatgpt-rival-thats-as-good-as-gpt-4o/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Even though we know Nvidia mainly for providing AI chips, they unexpectedly announced <a class="link" href="https://bgr.com/tech/nvidia-stunned-the-world-with-a-chatgpt-rival-thats-as-good-as-gpt-4o/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">NVLM 1.0</a> - a family of <b>large multimodal language models.</b></p><p class="paragraph" style="text-align:left;">The flagship model, NVLM-D-72B, with 72 billion parameters, is reported to perform on par with or better than leading proprietary models like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro in various tasks. Unlike its competitors, Nvidia is making NVLM 1.0&#39;s <b>model weights </b>and training code publicly available. </p><p class="paragraph" style="text-align:left;">NVLM 1.0 demonstrates strong performance in vision-language tasks, effectively processing and reasoning about both text and images. 
The model shows <b>versatility</b> in tasks requiring OCR, reasoning, localization, common sense, world knowledge, and coding abilities.</p><p class="paragraph" style="text-align:left;">Additionally, after multimodal training, NVLM-D-72B shows improved accuracy on text-only tasks compared to its LLM backbone, indicating enhanced overall language understanding.</p><p class="paragraph" style="text-align:left;">Brave, a privacy-focused web browser, has launched<a class="link" href="https://blogs.nvidia.com/blog/rtx-ai-brave-browser/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow"> Leo AI,</a> a smart AI assistant. Leo AI enhances user experience by summarizing articles and videos, surfacing insights from documents, and answering questions.</p><p class="paragraph" style="text-align:left;">Leo AI runs local models through Ollama, an open-source project that sits on top of llama.cpp and simplifies AI model <b>integration for applications</b>. It handles tasks like downloading and configuring specific AI models, making it easier for applications to access local AI capabilities. NVIDIA optimizes tools like Ollama for their hardware to deliver faster, more responsive AI experiences on RTX GPUs.</p><h3 class="heading" style="text-align:left;" id="open-ai-exodus-episode-which-one-ar">OpenAI Exodus: Episode… (which one are we on now?)</h3><p class="paragraph" style="text-align:left;">This isn’t the first time we’ve seen various <a class="link" href="https://genai360.beehiiv.com/p/io-openai-exodus-14-llms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">key members leave OpenAI</a> in such a short span of time. 
</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/10/01/anthropic-hires-openai-co-founder-durk-kingma/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Durk Kingma</a> isn’t as well-known as some of the other figures, but he&#39;s a <b>co-founder</b> of OpenAI who announced his move to Anthropic. </p><p class="paragraph" style="text-align:left;">Kingma&#39;s hiring is part of a broader pattern of <b>high-profile recruitments by Anthropic</b>. The company has recently brought on board other notable figures from OpenAI, including Jan Leike (OpenAI&#39;s former safety lead) and John Schulman (another OpenAI co-founder).</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://x.com/barret_zoph/status/1839095143397515452?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Barret Zoph</a>, who was part of the post-training team and was around for the entire ChatGPT boom in 2022, also decided to leave OpenAI. Those <b>weren’t </b>the only departures, as <a class="link" href="https://x.com/miramurati/status/1839025700009030027?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Mira Murati</a> left OpenAI after 6 and a half years. 
<a class="link" href="https://x.com/sama/status/1839096160168063488?lang=en&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Altman</a> made a statement about Mira leaving, talking about how taxing it is to be a leader at the company.</p><p class="paragraph" style="text-align:left;">Lastly, Sora co-lead <a class="link" href="https://techcrunch.com/2024/10/03/a-co-lead-on-sora-openais-video-generator-has-left-for-google/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Tim Brooks</a> released a short statement mentioning how he’ll be joining DeepMind to work on <b>video generators</b> and world simulators after spending 2 years at OpenAI working on Sora. </p><h3 class="heading" style="text-align:left;" id="resolve-ai-launches-efficient-ai-mo">Resolve AI Launches Efficient AI Model for DevOps Operations</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://greylock.com/portfolio-news/introducing-resolve/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Resolve AI</a> introduced the world&#39;s first AI production engineer, designed to autonomously troubleshoot and resolve production issues. The company aims to empower engineers to focus on innovation rather than <b>routine maintenance tasks. 
</b></p><p class="paragraph" style="text-align:left;">The goal is to significantly reduce mean time to resolution (MTTR) and <b>accelerate </b>engineering teams&#39; productivity.</p><p class="paragraph" style="text-align:left;">The company built an agentic platform that integrates with tools like AWS, Kubernetes, observability stacks, GitHub, and Slack. It constructs a <b>comprehensive knowledge graph </b>of a company&#39;s production environment.</p><p class="paragraph" style="text-align:left;">The AI agent leverages this knowledge graph to troubleshoot incidents, analyze source code changes, detect anomalies, query logs, and suggest remediation actions. Resolve acts as an always-on partner, handling <b>operational tasks like </b>answering natural language queries and translating them into observability actions.</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">Last time, we saw a completely unexpected release in terms of biotech AI with <a class="link" href="https://genai360.beehiiv.com/p/of-whales-and-strawberries?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Google’s whale bioacoustics model.</a> We’re seeing more biotech developments with PlasmidGPT, which is used to <b>annotate plasmid DNA sequences</b> (as the name suggests).</p><p class="paragraph" style="text-align:left;">Other developments include a new framework for multi-agent exploration and a new extension to the <b>FlashAttention </b>algorithm, as well as new generative AI models that take efficiency to another level.</p><h3 class="heading" style="text-align:left;" id="liquid-foundation-models-a-new-appr">Liquid Foundation Models: A New Approach to Efficient and Powerful AI</h3><p class="paragraph" style="text-align:left;"></p><div 
class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfyJg54qkOpALkLiUIwxzcNfxVuSKAfTWz-0SHawmux9Zppc9vEAk0GdJgcehRyY1gfkZn00klxtWM9ktXxisBFU1rEaHfIV5UL--488w6MaBTOEFrOVmWAKcJ6FoDI5fGhyj4J9e7ltnn6qt-CVICnOgVq?key=8XMgMByevwbz8SacwQ5JKQ"/><div class="image__source"><span class="image__source_text"><p>LFMs performance/size trade-offs are better than other models. <a class="link" href="https://www.liquid.ai/liquid-foundation-models?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers from <a class="link" href="https://www.liquid.ai/liquid-foundation-models?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Liquid AI</a> have introduced Liquid Foundation Models (LFMs), a new generation of generative AI models that aim to achieve <b>SOTA performance </b>while maintaining a smaller memory footprint and more efficient inference.</p><p class="paragraph" style="text-align:left;">LFMs address several challenges in current AI model development, including the trade-off between model size and performance, memory efficiency, and the ability to handle long-context tasks. 
The researchers proposed a <b>new architecture </b>that diverges from the traditional transformer-based models.</p><p class="paragraph" style="text-align:left;">Impressive results were reported across the board:</p><ul><li><p class="paragraph" style="text-align:left;">LFM-1B achieves the highest scores across various benchmarks in the 1B parameter category.</p></li><li><p class="paragraph" style="text-align:left;">LFM-3B outperforms previous generation 7B and 13B models on multiple benchmarks.</p></li><li><p class="paragraph" style="text-align:left;">LFM-40B offers performance comparable to larger models while using only 12B activated parameters.</p></li></ul><p class="paragraph" style="text-align:left;">They also show strong performance in multilingual capabilities and can effectively utilize their full 32k token context length. This <b>efficiency </b>enables long-context tasks on edge devices for the first time, so we might see new applications in areas like document analysis, context-aware chatbots, and improved RAG performance.</p><h3 class="heading" style="text-align:left;" id="plasmid-gpt-ai-powered-plasmid-desi">PlasmidGPT: AI-Powered Plasmid Design and Annotation</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.biorxiv.org/content/10.1101/2024.09.30.615762v1.full?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">PlasmidGPT</a> is a <b>generative framework</b> for designing and annotating plasmid DNA sequences. 
It addresses the challenges in automating plasmid design and leveraging the growing collection of engineered plasmid sequences.</p><p class="paragraph" style="text-align:left;">Some of the main innovations include:</p><ol start="1"><li><p class="paragraph" style="text-align:left;">A decoder-only transformer model pretrained on 153,000 engineered plasmid sequences from Addgene.</p></li><li><p class="paragraph" style="text-align:left;">The ability to generate de novo plasmid sequences that share characteristics with engineered plasmids but have low sequence identity to the training data.</p></li><li><p class="paragraph" style="text-align:left;">Conditional generation capabilities, allowing users to specify starting sequences or fine-tune the model for specific vector types.</p></li><li><p class="paragraph" style="text-align:left;">Effective prediction of various sequence-related attributes for both engineered and natural plasmids.</p></li></ol><p class="paragraph" style="text-align:left;">They showed that PlasmidGPT can generate plasmids with genetic part distributions similar to those in the training sequences. 
The model also <b>outperforms</b> previous approaches in predicting attributes like lab of origin, with a top-1 accuracy of 81% and top-10 accuracy of 92%.</p><h3 class="heading" style="text-align:left;" id="optimizing-attention-for-long-conte">Optimizing Attention for Long-Context LLMs with FlashMask</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2410.01359v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">FlashMask</a> is an extension to the FlashAttention algorithm that significantly improves the efficiency and flexibility of<b> attention mechanisms in LLMs.</b></p><p class="paragraph" style="text-align:left;">It addresses the growing challenge of handling complex masking requirements in various LLM training and inference scenarios. FlashMask introduces a column-wise sparse representation of attention masks, allowing for <b>efficient handling</b> of a wide range of mask types without compromising computational accuracy.</p><p class="paragraph" style="text-align:left;">They introduced optimized kernel implementations that leverage sparsity in the attention mask to skip unnecessary computations. Moreover, extensive evaluations across different attention mask types and models showed significant throughput improvements in fine-tuning and <b>alignment training of LLMs.</b></p><p class="paragraph" style="text-align:left;">They reported end-to-end speedups ranging from 1.65x to 3.22x compared to existing FlashAttention dense methods. 
Additionally, FlashMask outperforms the latest counterpart, FlexAttention, by<b> 12.1% to 60.7% in terms of kernel TFLOPs/s.</b></p><h3 class="heading" style="text-align:left;" id="llm-guided-efficient-multi-agent-ex">LLM-Guided Efficient Multi-Agent Exploration Using LEMAE</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcz2Aw76fGze_RDI3W3nNeqpF43rZO_6_pVHaq8aI3g3Cno8JYMV_BG51pU3AwcF8h5zLa7bQHhztZ2vZu0_Mdz-4l-rmabSJg29IQ9-7Fai1I1G594VE43UBxOBZnZfP1eDndGHAdNKRTE249_1bODJ5Jm?key=8XMgMByevwbz8SacwQ5JKQ"/><div class="image__source"><span class="image__source_text"><p>Map of the task “Pass.” (<a class="link" href="https://arxiv.org/pdf/2410.02511v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2410.02511v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">LLM for Efficient Multi-Agent Exploration (LEMAE)</a> leverages LLMs to enable efficient multi-agent exploration in reinforcement learning. This work addresses the longstanding challenge of efficient exploration in <b>complex multi-agent environments</b> with expansive state-action spaces.</p><p class="paragraph" style="text-align:left;">They used LLMs to ground linguistic knowledge into symbolic key states that are critical for task fulfillment. 
Additionally, they used a Subspace-based Hindsight Intrinsic Reward (SHIR) mechanism to <b>guide agents </b>toward key states by increasing reward density, and a Key State Memory Tree (KSMT) to track transitions between key states and organize exploration.</p><p class="paragraph" style="text-align:left;">They showed LEMAE significantly outperforms existing SOTA approaches on challenging benchmarks like StarCraft Multi-Agent Challenge (SMAC) and Multiple-Particle Environment (MPE). In certain scenarios, LEMAE achieves a <b>10x</b> acceleration in exploration efficiency.</p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">A discussion about a new approach to information retrieval from complex documents took the spotlight: it offers a <b>much faster</b> indexing process than traditional methods and far better accuracy on visually heavy documents.</p><p class="paragraph" style="text-align:left;">The other discussion that caught our attention was about a new paper by Meta finally giving us some insight into the post-training process of the <b>Llama models.</b></p><h3 class="heading" style="text-align:left;" id="how-metas-mixture-of-judges-perfect">How Meta&#39;s Mixture of Judges Perfects LLM Post-Training</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe_7_i-q649jkkpem8DJ6cXu8vWf7RZIaCjMoFtZS57j9RaoYipC5HocI5TFH-qBtcNFO14vk5Hi7bxUR55k8-EEalQnPVXGj-7yDsSkahaRB_uJSqDCzs-7-arFgi_gjriinwdc2G86amDDrsofikINnqK?key=8XMgMByevwbz8SacwQ5JKQ"/><div class="image__source"><span class="image__source_text"><p>Carr brought up Meta’s new paper. 
<a class="link" href="https://x.com/andrew_n_carr/status/1841178577129390553?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">If you’ve been wondering about how Meta has been going about the post-training process for the Llama models, you’re in luck, as they recently released a <a class="link" href="https://x.com/andrew_n_carr/status/1841178577129390553?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">paper</a> on &quot;Constrained Generative Policy Optimization&quot; for <b>post-training LLMs.</b></p><p class="paragraph" style="text-align:left;">In particular, Carr put the spotlight on Meta&#39;s innovative approach to addressing the challenges of aligning LLMs on multiple tasks simultaneously. The introduction of Mixture of Judges models to achieve a balanced blend of<b> RLHF improvements</b> stood out.</p><p class="paragraph" style="text-align:left;">This approach tackles common issues in multi-task alignment like reward hacking, multi-objective conflicts, and contradictory goals. The judges include <b>specialized models </b>for false refusal, precise instruction following, regex math/code reasoning, factuality, and safety.</p><p class="paragraph" style="text-align:left;">The thread emphasized the simplicity and effectiveness of this method, which <b>improves performance </b>across various benchmarks including MATH, Human Eval, ARC, and AlpacaEval. 
</p><h3 class="heading" style="text-align:left;" id="the-promise-of-col-pali-in-informat">The Promise of ColPali in Information Retrieval</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeQ-dsaBhWUc0KHgWdyAoL4E4DQk-1ZcEyqYLhZG-L8LgkXVMsTSDZSLmgHd5OUjhXvIbKBAGWVv1K84kuQYB816cVR-metA00Y7JvBAE3LLdMfGTNRTBsKUUuoHba49-S-FlP8R0UVdeqrjxmUjGyT1fd6?key=8XMgMByevwbz8SacwQ5JKQ"/><div class="image__source"><span class="image__source_text"><p>Leonie’s post introducing ColPali. <a class="link" href="https://x.com/helloiamleonie/status/1839321865195851859?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">An interesting thread discussing <a class="link" href="https://x.com/helloiamleonie/status/1839321865195851859?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">ColPali</a> popped up last week. ColPali is a new approach to information retrieval from <b>complex document types </b>like PDFs. </p><p class="paragraph" style="text-align:left;">What caught our eye was the explanation of how ColPali combines two key technologies: the contextualized <b>late interaction mechanism</b> from ColBERT and the Vision Language Model capabilities of PaliGemma. </p><p class="paragraph" style="text-align:left;">This innovative approach replaces the traditional multi-step PDF parsing process with a simpler method using &quot;screenshots&quot; of <b>PDF pages</b>, which could change how we handle complex documents in information retrieval tasks. 
And on that note…</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeRx5ZL18VfQHz6aCwd9CoT3uL-cj8uUTvreda2DEQDS0YOcTXgwK2gkvIWMDIK7r-yQjx5rDgbgoSFtZoLE1JYvhRISqOSLwyekh1ovKW7NW7Sv-sXlK6nq1tgLOP7ZiIPvOxAuipA2exBQtabESEwxRA?key=8XMgMByevwbz8SacwQ5JKQ"/><div class="image__source"><span class="image__source_text"><p>An interesting perspective on ColPali. <a class="link" href="https://x.com/ManuelFaysse/status/1839397235177722134?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention in the last week include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2410.02761?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">FakeShield</a>: Multimodal framework designed to address challenges in image forgery detection and localization (IFDL) </p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2410.02743?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">MA-RLHF</a>: Incorporates macro actions (sequences of tokens or higher-level language constructs) into the learning process for large language models.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" 
href="https://arxiv.org/pdf/2410.02705?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">ControlAR</a>: Integrates spatial controls into autoregressive image generation models</p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email saying hi :)</p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">There were successful funding rounds for companies developing varied AI applications, including ones we don’t hear about often like <b>liquid cooling solutions</b> for data centers. Poolside, Submer, and Numa all saw successful funding rounds, with Poolside being the biggest winner last week.</p><h3 class="heading" style="text-align:left;" id="poolside-secures-500-million">Poolside Secures $500 Million</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.bloomberg.com/news/articles/2024-10-02/poolside-raises-500-million-with-bain-dst-for-coding-ai?accessToken=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzb3VyY2UiOiJTdWJzY3JpYmVyR2lmdGVkQXJ0aWNsZSIsImlhdCI6MTcyNzg4MzQyNSwiZXhwIjoxNzI4NDg4MjI1LCJhcnRpY2xlSWQiOiJTS080SUlUMVVNMFcwMCIsImJjb25uZWN0SWQiOiI3NkE1RDA0Q0RENTY0QjM1QUI2NTY4RDdBOTM1OUQ4MCJ9.KyGmlFM_3tpLLgj7f1ESRCegAh2xdkMhPepBNeo6cec&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Poolside</a>, a startup developing <b>AI-powered coding software,</b> has secured a massive $500 million investment led by Bain Capital Ventures, valuing the company at $3 billion. 
This significant funding round includes participation from DST Global, StepStone Group, Citi Ventures, and HSBC Ventures.</p><h3 class="heading" style="text-align:left;" id="submer-raises-555-million-in-series">Submer Raises $55.5 Million in Series C Funding Round</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/10/02/as-data-center-usage-heats-up-submer-raises-55-5m-to-cool-things-down/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Submer</a>, a Barcelona-based startup specializing in<b> liquid cooling solutions for data centers,</b> has raised $55.5 million in a Series C round at a $500 million valuation. The company&#39;s technology involves submerging entire server racks in biodegradable, non-conducting coolant, which helps address the growing challenge of heat management in AI-driven data centers.</p><h3 class="heading" style="text-align:left;" id="numa-secures-32-million-in-series-b">Numa Secures $32 Million in Series B Funding Round</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/10/01/numa-is-bringing-ai-and-automation-to-car-dealerships/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Numa</a>, an AI startup specializing in <b>customer service automation for auto dealerships</b>, has secured a $32 million Series B funding round, bringing its total raised to $48 million. The company, which pivoted from a general conversational AI product to focus specifically on the automotive industry, claims to be nearing cash-flow break-even with 600 customers across the U.S. 
and Canada.</p><h3 class="heading" style="text-align:left;" id="cerebras-files-for-ipo">Cerebras Files for IPO</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.cnbc.com/2024/09/30/cerebras-files-for-ipo.html?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=openai-devday-2024-nvidia-unveils-nvlm-1-0-meta-advances-video-generation" target="_blank" rel="noopener noreferrer nofollow">Cerebras Systems</a> filed for an initial public offering on September 30, 2024. The company plans to trade on the Nasdaq under the ticker symbol &quot;CBRS&quot;. Cerebras positions itself as a competitor to Nvidia in the AI chip market, claiming its <b>WSE-3 chip</b> has more cores and memory than Nvidia&#39;s popular H100.</p><p class="paragraph" style="text-align:left;">For the first six months of 2024, Cerebras reported a <b>net loss of $66.6 million</b> on $136.4 million in sales. This represents significant revenue growth compared to the same period in 2023, when it posted a net loss of $77.8 million on $8.7 million in sales. For the full year 2023, the company reported a net loss of $127.2 million on revenue of $78.7 million.</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=a3c88ce3-cac0-4ca2-add9-d3813e60ee30&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>September Roundup: 🦙3.2, GPT Next’s 100x Power, $125B Supercomputers</title>
  <description>Plus, free tickets for data leaders for RetrieveX Conference in SF on Oct 17</description>
  <link>https://genai360.beehiiv.com/p/of-llamas-and-proteins</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/of-llamas-and-proteins</guid>
  <pubDate>Tue, 08 Oct 2024 13:54:43 +0000</pubDate>
  <atom:published>2024-10-08T13:54:43Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><div class="section" style="background-color:#ff8a00;border-color:#F9FAFB;border-radius:5px;border-style:solid;border-width:5px;margin:0.0px 0.0px 0.0px 0.0px;padding:15.0px 15.0px 15.0px 15.0px;"><h2 class="heading" style="text-align:left;"><span style="color:#F9FAFB;">GenAI360 Exclusive: Unlock Free Tickets for RetrieveX Conference on Oct 17 in San Francisco. </span></h2><p class="paragraph" style="text-align:left;"><span style="color:#F9FAFB;">Come hear from the creators of Meta Chameleon, PyTorch, Kubeflow, and CAFFE, along with leaders from Microsoft, AWS, Bayer, Flagship Pioneering, Cresta, VoyageAI, and Omneky, on how to build the best RAG and LLM-powered workflows. </span></p><p class="paragraph" style="text-align:left;"><span style="color:#F9FAFB;">If you&#39;re an engineering leader who&#39;s considering or working on GenAI projects, </span><span style="color:#F9FAFB;"><b>fill in the form below for a complimentary ticket to the conference </b></span><span style="color:#F9FAFB;">- hurry up, tickets are limited!</span></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#F9FAFB;" href="https://www.retrievex.co/application?utm_source=newsletter&utm_medium=email&utm_campaign=weekly1"><span class="button__text" style="color:#000000;"><span style="color:#000000;">Get Tickets Today</span></span></a></div><p class="paragraph" style="text-align:left;"><span style="color:#F9FAFB;">Date: October 17, 10:30am - 7pm PT</span><br><span style="color:#F9FAFB;">Venue: The Midway, 900 Marin St, San Francisco</span></p></div><h2 class="heading" style="text-align:left;" id="key-takeaways">Key takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">Meta introduced <b>Llama 3.2 </b>with vision-capable LLMs (11B and 90B) for document understanding, and lightweight text models (1B, 3B) optimized for mobile devices, supporting on-device processing with 
improved privacy.</p></li><li><p class="paragraph" style="text-align:left;">OpenAI Japan announced &quot;<b>GPT Next,</b>&quot; projected to be 100x more powerful than GPT-4, with a planned release in 2024.</p></li><li><p class="paragraph" style="text-align:left;">DeepMind introduced <b>AlphaProteo</b>, an AI system for designing novel protein binders with 3-300x better binding affinities than existing methods.</p></li><li><p class="paragraph" style="text-align:left;">Google’s <b>NotebookLM is a personalized AI research assistant</b> that helps you chat across all your docs, and even generate audio overviews from uploaded content, creating podcasts and summaries with conversational AI voices.</p></li><li><p class="paragraph" style="text-align:left;">NVIDIA researchers introduced OP-RAG, outperforming long-context LLMs on the ∞Bench dataset using just <b>16K retrieved tokens</b>.</p></li></ul><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">Just when you thought OpenAI had enough on their plate with Project Strawberry, they dropped some major news in Japan that they’re planning to release another model called <b>GPT Next later in the year.  </b>Other major releases included Meta’s Llama 3.2 and Google’s NotebookLM.</p><p class="paragraph" style="text-align:left;">There were a bunch of new releases last week too, including LLMs, AI agents, and VLMs. 
Progress in <b>biotech </b>was also made by DeepMind’s latest model called AlphaProteo.</p><h3 class="heading" style="text-align:left;" id="metas-llama-32-and-ai-updates">Meta’s Llama 3.2 and AI Updates</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXelFFIhr0ycxB-qvwDWb9vQIuPtK4JeBfHZIcCDe3FWrUwSPylQZu8XHTaDicW22FQHbYQ-3YkocmeFVTx8MptFTwkdC3vWTxlR2gx1t4v8o_uEIUu4LlpiTBL-jzx2ETZJT54H_19lgkw-9kD9-kS0UlUU?key=7OifTaVGnpPb3f8EB7B4vg"/><div class="image__source"><span class="image__source_text"><p>Llama 3.2 performs well on instruction-tuned benchmarks. <a class="link" href="https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Llama 3.1 recently reached <a class="link" href="https://genai360.beehiiv.com/p/strawberry-conf?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">350+ million lifetime downloads</a>. With this great achievement, Meta shipped another exciting update. <a class="link" href="https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Meta&#39;s Llama 3.2</a> includes <b>vision-capable LLMs (11B and 90B)</b> for tasks like document understanding and visual reasoning, as well as lightweight text-only models (1B and 3B) optimized for edge and mobile devices with 128K token support. 
</p><p class="paragraph" style="text-align:left;">Llama 3.2&#39;s smaller models are optimized for devices using <b>Qualcomm and MediaTek </b>hardware. As a result, we’ll see faster, private processing for summarization and instruction-following without needing cloud access.</p><p class="paragraph" style="text-align:left;">The 11B and 90B models introduce a <b>new architecture</b> integrating pre-trained image encoders with cross-attention layers for high-performance image-text reasoning, making them drop-in replacements for text models.</p><p class="paragraph" style="text-align:left;">An important update was also the <a class="link" href="https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.md?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Llama Stack</a>, a set of APIs for Agents, data generation, inference, guardrails, evals, and more. 
Aside from that, we saw some <b>updates</b> for <a class="link" href="https://about.fb.com/news/2024/09/metas-ai-product-news-connect/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Meta’s AI products</a>, with new voice, text generation, and photo understanding capabilities within Messenger, Instagram, and Meta for Business.</p><h3 class="heading" style="text-align:left;" id="geminis-upgrade-notebook-lm-as-an-a">Gemini&#39;s Upgrade, NotebookLM as an AI Research Assistant, and Arcade&#39;s Product Platform </h3><p class="paragraph" style="text-align:left;">Google announced<a class="link" href="https://www.tomsguide.com/ai/google-just-dropped-new-versions-of-gemini-here-s-why-its-a-big-deal?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow"> new versions</a> of Gemini called <b>Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002</b> models. Standout benefits include 50% price drop, 2x faster outputs, and 3x lower latency.</p><p class="paragraph" style="text-align:left;">These models show a 20% improvement in <b>math-related tasks </b>and substantial gains in visual understanding and code generation, so they’re ideal for tasks like summarizing large documents and understanding long videos.</p><p class="paragraph" style="text-align:left;">Moreover, Google reduced the output length by <b>5-20%</b>, speeding up responses without compromising quality, with options for more verbose outputs via prompting strategies. 
Google also simplified access to the models through AI Studio and Vertex AI, increasing rate limits and improving scalability for developers.</p><p class="paragraph" style="text-align:left;">Google also released<a class="link" href="https://notebooklm.google/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow"> NotebookLM (only available for English speakers),</a> which lets users upload content, ask questions, and generate podcast-like conversations between AI voices, providing <b>human-like discussions </b>of uploaded materials.</p><p class="paragraph" style="text-align:left;">It <a class="link" href="https://www.forbes.com/sites/rogerdooley/2024/10/04/how-to-create-an-ai-podcast-about-anything-in-seconds-with-notebooklm/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">supports a variety of content</a>, as users can upload PDFs, text files, websites, and videos into &quot;notebooks&quot; to create <b>customized summaries </b>or ask questions about the content. In addition to audio overviews, NotebookLM provides study guides, accurate quotations, FAQs, and detailed summaries.</p><p class="paragraph" style="text-align:left;">Furthermore, the first AI product creation platform was released by <a class="link" href="https://www.arcade.ai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Arcade AI,</a> described as “<b>prompt-to-product</b>” using design-to-manufacturing AI tech. It has a simple three step process where you enter a prompt, edit it, then share it with others. 
Currently, Arcade AI is in beta form.</p><h3 class="heading" style="text-align:left;" id="ai-2-s-multimodal-molmo-challenges-">Ai2&#39;s Multimodal Molmo Challenges Tech Giants</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfHNzucWqiSuTYD1OuHwBmo6-ExOz25r36h78VrO68Ph6h6aolb3VGm7jgUGmkMXyV3elkPBXOu6PQxYn92oIk4deSvs_9o3cmlKbh9i2E9UwiXaz5zdb3hCKEtCT5917EM2PR-8EQ9xrXIFpF5TkWQphJK?key=7OifTaVGnpPb3f8EB7B4vg"/><div class="image__source"><span class="image__source_text"><p>Molmo outperforms GPT-4o on academic benchmarks. <a class="link" href="https://molmo.allenai.org/blog?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Ai2 launched <a class="link" href="https://molmo.allenai.org/blog?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Molmo</a>, a <b>multimodal</b> open-source model that rivals major models like GPT-4o, Gemini 1.5, and Claude-3.5 in performance, but at a fraction of their size.</p><p class="paragraph" style="text-align:left;">Molmo excels in tasks like answering questions about images, recognizing objects, and navigating web interfaces without needing complex infrastructure. 
This is thanks to the fact that it uses high-quality data, with only <b>600,000 annotated images</b>, achieving impressive results compared to models trained on billions of images.</p><h3 class="heading" style="text-align:left;" id="replits-ai-agent-vs-random-labs-ide">Replit&#39;s AI Agent vs Random Labs&#39; IDE Integration</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://docs.replit.com/replitai/agent?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Replit’s new AI agent </a>helps users build software projects from the ground up by using natural language prompts. As a result, it makes software development a lot <b>more accessible</b> to users of all levels.</p><p class="paragraph" style="text-align:left;">It’s available to Replit Core and Teams members at no additional cost during this “<b>early access</b>” period. Pricing details will be available later in 2024, and it can be used through the web interface or mobile app.</p><p class="paragraph" style="text-align:left;">Another agent release we saw was by <a class="link" href="https://www.ycombinator.com/launches/Lnp-random-labs-an-open-source-software-agent-that-lives-in-your-codebase?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Random Labs</a>, who developed an AI agent directly integrated into the <b>IDE</b>. It can make changes across huge codebases and provides more than just autocomplete or chat functionality. 
</p><h3 class="heading" style="text-align:left;" id="alpha-proteos-protein-revolution-an">AlphaProteo&#39;s Protein Revolution and LIGO&#39;s Open-Source Leap</h3><p class="paragraph" style="text-align:left;"></p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeW78XCFCaMNx0ZAeq8yxQIbLc4no5SHvC_oRzU53ZB-VvzfUMqYoI0hwHQShQTbm7GU4nEZxDse9g2iOevmCyMWWUp30I-rDhgG27b8GSa7JeMsMir6wG31U2YEU8Qxk49AySAOKVkUCG5CFwwbaHLfFMT?key=kUQfUwNwF7ajC3NgSN1FFA"/><div class="image__source"><span class="image__source_text"><p>Predicted protein binder structure generated by AlphaProteo, shown in blue. <a class="link" href="https://deepmind.google/discover/blog/alphaproteo-generates-novel-proteins-for-biology-and-health-research/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">(Source)</a> </p></span></div></div><p class="paragraph" style="text-align:left;">DeepMind made some significant advancements <b>in biotech </b>with the release of <a class="link" href="https://genai360.beehiiv.com/p/gpt4o-alphafold-yoco?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">AlphaFold 3 a few months ago.</a> </p><p class="paragraph" style="text-align:left;">Now, they’ve developed a new AI system called <a class="link" href="https://deepmind.google/discover/blog/alphaproteo-generates-novel-proteins-for-biology-and-health-research/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">AlphaProteo</a>, which can design novel, <b>high-strength protein binders </b>to serve as building blocks for biological and health research. 
It aims to accelerate progress in areas like drug development, disease understanding, and biosensors.</p><p class="paragraph" style="text-align:left;">The system generates new protein binders for diverse target proteins, including challenging targets like VEGF-A. It achieves <b>higher experimental success </b>rates and 3 to 300 times better binding affinities than existing methods on seven tested target proteins.  </p><p class="paragraph" style="text-align:left;">Sounds promising, but there’s still work to be done since the system has limitations like being unable to design binders for<b> some challenging targets.</b></p><p class="paragraph" style="text-align:left;">Speaking of AlphaFold 3, we saw an open-source implementation of it called <a class="link" href="https://github.com/Ligo-Biosciences/AlphaFold3?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">LIGO</a>. The current release implements the full AlphaFold 3 model along with the <b>training code</b>, focusing first on single chain prediction capability. Future updates will add ligand, multimer, and nucleic acid prediction capabilities once they are trained.</p><p class="paragraph" style="text-align:left;">LIGO&#39;s implementation includes several core modules reused from the OpenFold project, such as triangular attention and multiplicative update, as well as their data processing pipelines. It also utilizes the ProteinFlow library for its <b>data pipeline.</b></p><p class="paragraph" style="text-align:left;">The project addresses some <b>discrepancies</b> from AlphaFold 3&#39;s published pseudocode, including changes to the MSA module order, loss scaling, and DiT block design. 
</p><h3 class="heading" style="text-align:left;" id="open-a-is-gpt-next-bombshell-100-x-"><b>OpenAI&#39;s GPT Next</b> Bombshell: 100x Power, $2000 Price Tag</h3><p class="paragraph" style="text-align:left;">OpenAI Japan announced “<a class="link" href="https://the-decoder.com/openai-japan-shares-vision-for-much-more-powerful-gpt-next-coming-in-2024/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">GPT Next</a>” - a model planned for release in 2024 that is supposed to be 100x more powerful than GPT-4. This name is just a placeholder though, and we’ve seen some rumors floating around that the model is codenamed “<b>Project Orion</b>” (funny how this has the same name as Meta&#39;s AR glasses project).</p><p class="paragraph" style="text-align:left;">Orion focuses on improving NLP capabilities while expanding into multimodal territory, being able to integrate text, image, and video inputs. 
Notably, Orion is said to leverage training data generated by O1, since high-quality data for training models<b> isn’t always abundant.</b></p><p class="paragraph" style="text-align:left;">It’ll be interesting to see if Orion actually achieves the high performance OpenAI claims, since training AI models on <b>excess training data</b> can actually lead to worse performance.</p><p class="paragraph" style="text-align:left;">In terms of pricing, executives at OpenAI are even thinking about putting subscription prices as high as <a class="link" href="https://www.pymnts.com/artificial-intelligence-2/2024/report-openai-considers-2000-monthly-subscription-prices-for-new-llms/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">$2000</a> for access to models like O1 and Orion, a <b>100x price increase from the standard ChatGPT subscription.</b></p><p class="paragraph" style="text-align:left;">Anthropic also hit a milestone, with <a class="link" href="https://techcrunch.com/2024/09/04/anthropic-launches-claude-enterprise-plan-to-compete-with-openai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Anthropic’s Claude</a> recently reaching $1 million in mobile app revenue. They’re going a step further with the release of <b>Claude Enterprise.</b></p><p class="paragraph" style="text-align:left;">It’s a new subscription plan for its AI chatbot aimed at enterprise customers, offering more administrative controls and increased security. 
This puts Anthropic in <b>direct competition </b>with OpenAI&#39;s ChatGPT Enterprise, which was released about a year ago.</p><h3 class="heading" style="text-align:left;" id="col-pali-and-qwen-2-vl-for-multimod">ColPali and Qwen2-VL for Multimodal RAG and HoneyComb Takes Top Spot on SWE-Agent leaderboard</h3><p class="paragraph" style="text-align:left;">There was a recent discussion on X about ColPali being better for <b>information retrieval </b>than the combination of OCR and LLMs. </p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeCXU9jF0zvHXERScxiodb742LbcYqoM1nQL3WehXm3nCN9NU3a-35wlgPmyg-ipx-cMokydlbECtJbNTHMjf6A1GsStFZfXG4FgF_f8asFrxAbyXcRbHXRZ0aKar3VarDkj8LIpOB0Gfs0DHZ-Sm9IeQlU?key=kUQfUwNwF7ajC3NgSN1FFA"/><div class="image__source"><span class="image__source_text"><p>ColPali and Qwen2-VL were used in combination. <a class="link" href="https://x.com/mervenoyann/status/1831737088468791711?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">ColPali </a>is built on the PaliGemma-3B model, which we saw in an announcement at <a class="link" href="https://genai360.beehiiv.com/p/io-openai-exodus-14-llms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">I/O 2024 by Google.</a> In short, ColPali is a document retrieval model that uses VLMs to index and retrieve information from 
documents based on visual features. </p><p class="paragraph" style="text-align:left;">The main aim is to tackle <b>tricky issues</b> that traditional document retrieval systems can’t deal with, like:</p><ul><li><p class="paragraph" style="text-align:left;">Complex data ingestion pipeline</p></li><li><p class="paragraph" style="text-align:left;">OCR requirements </p></li><li><p class="paragraph" style="text-align:left;">Difficulty in handling visual elements like tables and figures</p></li></ul><p class="paragraph" style="text-align:left;">A key benefit of using ColPali is that OCR or image captioning isn’t necessary. It’s also a more efficient method compared to OCR + LLMs while having higher accuracy. In addition, Qwen2-VL-7B is used for the <b>generation process </b>in RAG.</p><p class="paragraph" style="text-align:left;">The <a class="link" href="https://x.com/dylfreed/status/1831075759747723709?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Qwen2-VL models</a> saw new releases last week with three sizes: 2B, 7B, and 72B, with the 7B instruction-tuned version being open-sourced.</p><p class="paragraph" style="text-align:left;">It showed impressive results, achieving SOTA performance on visual understanding benchmarks like MathVista, DocVQA, and RealWorldQA. 
Moreover, it can understand videos over <b>20 minutes long </b>for high-quality video-based tasks.</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://x.com/snowmaker/status/1831219425007296566?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">HoneyComb</a> was a new model that took the SWE-Agent leaderboard by storm last week by achieving SOTA results, putting it in first place.</p><p class="paragraph" style="text-align:left;">It takes a different approach from other models. Rather than having one jack-of-all-trades AI agent, HoneyComb uses <b>multiple AI agents</b> that are fine-tuned to perform well at individual tasks like bug fixing and code review. </p><p class="paragraph" style="text-align:left;">What’s great about this method is that it creates a potentially infinite cycle of continuous improvement and bug fixing. This makes the lives of software engineers easier, as it can <b>automate any step</b> of the development process.</p><p class="paragraph" style="text-align:left;">A new feature for <a class="link" href="https://x.com/hamiltonulmer/status/1831361779739549887?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">SQL query editing</a> was another neat release. In short, this system executes queries in real-time as the user types them, so it provides <a class="link" href="https://x.com/hamiltonulmer/status/1831361779739549887?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">instant feedback</a> based on the query’s results and performance. 
</p><h3 class="heading" style="text-align:left;" id="us-eu-and-uk-sign-landmark-coe-trea">US, EU, and UK Sign Landmark COE Treaty on AI Safety, Draghi’s Report</h3><p class="paragraph" style="text-align:left;">Previously, the <a class="link" href="https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">EU AI Act came into force</a>. </p><p class="paragraph" style="text-align:left;">The saga continues as the Council of Europe (COE) has introduced the first-ever legally binding international treaty on AI safety, called the &quot;<a class="link" href="https://techcrunch.com/2024/09/05/us-uk-and-eu-sign-on-to-the-council-of-europes-high-level-ai-safety-treaty/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Council of Europe Framework Convention on AI and Human Rights, Democracy, and the Rule of Law.</a>&quot; Major signatories include the U.S., U.K., and European Union, along with <b>several smaller nations.</b></p><p class="paragraph" style="text-align:left;">The treaty focuses on <b>three main areas: </b></p><ol start="1"><li><p class="paragraph" style="text-align:left;">Protecting human rights (including data privacy and anti-discrimination)</p></li><li><p class="paragraph" style="text-align:left;">Safeguarding democracy</p></li><li><p class="paragraph" style="text-align:left;">Upholding the rule of law</p></li></ol><p class="paragraph" style="text-align:left;">Notable absences from the initial signatories include countries from Asia, the Middle East, and Russia. 
The treaty&#39;s goal is to create a technology-neutral framework that can withstand <b>future developments</b> in AI technology.</p><p class="paragraph" style="text-align:left;">It’ll officially enter into force <b>three months</b> after at least five signatories, including three COE member states, have ratified it. </p><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.euronews.com/next/2024/09/09/with-ai-eu-has-opportunity-to-capitalise-on-digital-says-draghi?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Draghi&#39;s report</a> on European competitiveness came at a crucial time for AI regulation in the EU. It follows the recent implementation of the <b>EU AI Act </b>and the signing of the Council of Europe&#39;s landmark AI safety treaty.</p><p class="paragraph" style="text-align:left;">While the treaty, which focuses on protecting human rights, safeguarding democracy, and upholding the rule of law, sets a <b>broad regulatory framework, </b>Draghi&#39;s report highlights the economic opportunities for the EU in the AI sector.</p><p class="paragraph" style="text-align:left;">As the EU works to balance innovation with responsible AI development, Draghi&#39;s recommendations could inform how the EU implements the <b>principles outlined in the COE treaty,</b> especially in areas where Europe still has a competitive edge.</p><h3 class="heading" style="text-align:left;" id="plans-for-two-125-billion-supercomp">Plans for Two $125 Billion Supercomputers in North Dakota</h3><p class="paragraph" style="text-align:left;">We could see the development of <a class="link" 
href="https://www.datacenterdynamics.com/en/news/two-companies-seek-to-develop-125bnai-data-centers-in-north-dakota/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">two massive data centers</a> in North Dakota, with each one costing up to <b>$125 billion.</b> This is currently being considered by two trillion-dollar market cap companies.</p><p class="paragraph" style="text-align:left;">This means it&#39;s the usual suspects like Nvidia, Microsoft, Apple, and Google. Rumor has it that Microsoft might be a likely candidate since they were looking into building a <b>$100 billion supercomputer</b> campus with OpenAI in March 2024, which we talked about in our <a class="link" href="https://genai360.beehiiv.com/p/1-bit-llms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">first newsletter.</a></p><p class="paragraph" style="text-align:left;">The state currently has a small data center market, with only <b>seven facilities listed.</b></p><p class="paragraph" style="text-align:left;">Applied Digital, a cryptomining and AI provider, recently secured <b>$200 million</b> for facility expansion in the state and has a deal with an unnamed hyperscaler. Also worth pointing out that the state produces more energy than it uses.</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">Midjourney’s competitor FLUX saw some interesting developments from the research side of things, with a new paper detailing a generative framework called FluxMusic for text-to-music generation. 
In addition, we got some insights into new approaches for training LLMs with long context windows and improving <b>traditional RAG methods.</b></p><h3 class="heading" style="text-align:left;" id="composing-the-future-of-ai-generate">Composing the Future of AI-Generated Soundscapes With FluxMusic</h3><p class="paragraph" style="text-align:left;">We talked about Black Forest Lab’s FLUX before and noted that it’s a pretty serious <a class="link" href="https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">competitor for Midjourney.</a></p><p class="paragraph" style="text-align:left;">This paper gives us a new approach to <a class="link" href="https://arxiv.org/pdf/2409.00587v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">text-to-music generation</a> built around FluxMusic, a generative framework, while aiming to strengthen <b>multimodal LLM perception</b> and support higher input resolutions.</p><p class="paragraph" style="text-align:left;">It tackles the challenge of creating more versatile and powerful AI systems capable of generating high-quality music from textual descriptions, a task that has long been limited by the complexity of musical structure and the need for <b>domain-specific knowledge.</b></p><p class="paragraph" style="text-align:left;">They implemented a two-tiered LLM chain for content refinement and information extraction, and utilized multiple pre-trained text encoders for conditioned caption feature extraction and<b> inference flexibility.</b></p><p class="paragraph" style="text-align:left;">FLUX models support input resolutions of over 1000 pixels and achieve strong performance on multimodal LLM benchmarks. 
This is especially true for resolution-sensitive tasks like OCR and document understanding. Notably, FLUX outperformed other prominent models like Mini-Gemini-HD and LLaVA-NeXT across <b>various metrics.</b></p><h3 class="heading" style="text-align:left;" id="breaking-the-long-context-barrier">Breaking the Long Context Barrier</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXciqxrZrp5RxbYXEGuMCHckoQuDZ7Rl50XefOkP8ASYbnR1G-j4Li9CzBw9eu5YIT8CRpXLhxfql8e2XS_s43fNbXgAiu4xM9Ju-4k8fTw-WqH8Kq7rZhJ9vPn9c8s4DTt_WV0djIIGUnzN7Us4H2lk7Kw?key=kUQfUwNwF7ajC3NgSN1FFA"/><div class="image__source"><span class="image__source_text"><p>End-to-end model training comparison. <a class="link" href="https://arxiv.org/pdf/2408.16978v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers from Ohio State University and Microsoft have introduced a <a class="link" href="https://arxiv.org/pdf/2408.16978v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">novel approach to training LLMs</a> with <b>extremely long context windows</b>.</p><p class="paragraph" style="text-align:left;">Current LLM training is typically constrained to relatively short context lengths, such as 8K or 32K tokens, limiting their ability to process entire documents or maintain <b>coherent long-term dialogues.</b></p><p class="paragraph" style="text-align:left;">They introduced the Fully Pipelined Distributed Transformer (FPDT) that leverages multiple memory hierarchies in modern GPU clusters to enhance hardware efficiency and cost-effectiveness. 
</p><p class="paragraph" style="text-align:left;">This method uses:</p><ul><li><p class="paragraph" style="text-align:left;">A chunking mechanism</p></li><li><p class="paragraph" style="text-align:left;">Offloading idle tokens to host memory</p></li><li><p class="paragraph" style="text-align:left;">A double buffer strategy to overlap computation with data transfer.</p></li></ul><p class="paragraph" style="text-align:left;">FPDT can train an 8B parameter LLM with a 2 million token context length using only 4 GPUs, achieving over 55% Model FLOPs Utilization (MFU). This represents a 16x increase in sequence length compared to current SOTA solutions while <b>maintaining high efficiency.</b></p><h3 class="heading" style="text-align:left;" id="how-16-k-tokens-of-rag-outperform-1">How 16K Tokens of RAG Outperform 128K Context LLMs</h3><p class="paragraph" style="text-align:left;">As long-context language models (LLMs) become increasingly prevalent, some have questioned the continued relevance of RAG. A new study from NVIDIA researchers challenges this notion by showing that RAG still has a crucial role to play, even in the era of models with <b>100K+ token contexts.</b></p><p class="paragraph" style="text-align:left;">The team introduces <a class="link" href="https://arxiv.org/pdf/2409.01666?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Order-Preserve RAG (OP-RAG)</a>, which significantly improves upon <b>traditional RAG methods</b>. By maintaining the original order of retrieved chunks rather than sorting by relevance, OP-RAG achieves a better balance between information recall and precision. 
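The order-preserving step at the heart of OP-RAG is simple enough to sketch in a few lines of Python. This is an illustrative toy, not the authors' code: the names (`op_rag_context`, `overlap`) are made up for the example, and a real pipeline would score chunks with an embedding model rather than keyword overlap.

```python
# Toy sketch of the order-preserving idea behind OP-RAG: retrieve the
# top-k chunks by relevance, then restore original document order
# before assembling the context.
def op_rag_context(chunks, score_fn, query, k):
    """chunks: list of strings in original document order."""
    scored = [(i, score_fn(query, c)) for i, c in enumerate(chunks)]
    # Keep the k most relevant chunk indices...
    top_k = sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
    # ...but emit them in document order (the "order-preserve" step),
    # instead of sorting by relevance as traditional RAG pipelines do.
    kept = sorted(i for i, _ in top_k)
    return "\n".join(chunks[i] for i in kept)

# A trivial keyword-overlap scorer stands in for a real embedding model.
def overlap(query, chunk):
    return len(set(query.split()) & set(chunk.split()))

docs = ["alpha intro", "beta detail query term", "gamma filler",
        "delta query term again", "epsilon outro"]
ctx = op_rag_context(docs, overlap, "query term", k=2)
# ctx keeps chunks 1 and 3, in their original order.
```

The only difference from a standard RAG assembly loop is the final sort by chunk index rather than by relevance score.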
</p><p class="paragraph" style="text-align:left;">They found that as the number of retrieved chunks increases, answer quality follows an inverted U-shaped curve, with an optimal &quot;sweet spot&quot; that outperforms long-context LLMs using the <b>entire available context</b>.</p><p class="paragraph" style="text-align:left;">Results on the ∞Bench dataset are striking: OP-RAG using Llama3.1-70B with just 16K retrieved tokens achieved a 44.43 F1 score, surpassing the 34.32 score of Llama3.1-70B using its full 128K context. It even bested <b>GPT-4o (32.36)</b> and Gemini-1.5-Pro (43.08). </p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention in the last week include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://github.com/anthropics/anthropic-quickstarts?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">Anthropic Quickstarts</a>: A collection of ready-to-use projects designed to help developers quickly build deployable applications using the Anthropic API and Claude&#39;s capabilities.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2409.03735?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">LLM-CI</a>: Assesses privacy norms encoded in LLMs using a Contextual Integrity-based factorial vignette methodology across different contexts. 
</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://github.com/getomni-ai/zerox?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">GetOmni AI</a>: Open-source OCR tool that uses GPT-4 vision models to convert PDF documents into high-quality markdown text.</p></li></ul><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">One discussion raised the question of what led to<b> public cloud storage prices </b>plateauing in 2017 rather than continuing to decrease like earlier years.</p><h3 class="heading" style="text-align:left;" id="the-great-cloud-storage-plateau">The Great Cloud Storage Plateau</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXda9ZeudL82lsUxRFrkcQik7MH7Z9l1tz62y_r-kidymSQtMKdvn8xyfEDmCjgyNDfdUT5uRYKCfBvEanu2ku9xpWUAwSzenKOvwwTaIW5_mY7apvQ7pfgKoRDL3-Iwf430LAzH0oQI-xXK0_KmplIYdAyN?key=kUQfUwNwF7ajC3NgSN1FFA"/><div class="image__source"><span class="image__source_text"><p>Cloud storage costs from 2007 to 2017. <a class="link" href="https://x.com/vikhyatk/status/1830680511448260635?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">A graph that tells a fascinating story about the evolution of <a class="link" href="https://x.com/vikhyatk/status/1830680511448260635?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">cloud storage pricing</a> caught our eye. 
The image, which tracks the cost per GB/month across various cloud providers from <b>2007 to 2019</b>, reveals that after years of steep declines, prices seemed to hit a floor around 2017.</p><p class="paragraph" style="text-align:left;">This <b>plateau in cloud storage costs </b>makes us wonder about the economics of cloud computing and the broader tech industry. </p><p class="paragraph" style="text-align:left;">Some speculate that we&#39;ve reached the <b>physical limits of storage density</b> improvements, while others point to market consolidation and reduced competition. There&#39;s also the possibility that cloud providers have shifted their focus from raw storage to value-added services.</p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">A couple of huge acquisitions and investment rounds took place last week, with SSI raising $1 billion and Salesforce purchasing Own for $1.9 billion. <a class="link" href="https://You.com?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">You.com</a> also raised <b>$50 million from a series B funding round.</b></p><h3 class="heading" style="text-align:left;" id="ilya-sutskever-raises-1-billion-for">Ilya Sutskever Raises $1 Billion for New AI Safety Startup</h3><p class="paragraph" style="text-align:left;">Just a few months ago, OpenAI disbanded its AI safety team and key members left the company, including Ilya Sutskever. 
Sutskever created his <b>own AI safety startup</b> which raised a staggering <a class="link" href="https://www.reuters.com/technology/artificial-intelligence/openai-co-founder-sutskevers-new-safety-focused-ai-startup-ssi-raises-1-billion-2024-09-04/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">$1 billion at a $5 billion valuation</a>, just three months after its founding. </p><h3 class="heading" style="text-align:left;" id="you-raises-50-million-in-series-b-f">You Raises $50 Million in Series B Funding</h3><p class="paragraph" style="text-align:left;">You.com, an AI search company focusing on <b>complex queries and productivity tools</b>, has raised <a class="link" href="https://techcrunch.com/2024/09/04/you-com-refocuses-from-ai-search-to-deeper-productivity-agents-with-new-50m-round/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" target="_blank" rel="noopener noreferrer nofollow">$50 million in a Series B</a> round led by Georgian, with participation from notable investors including Nvidia and Salesforce Ventures.</p><p class="paragraph" style="text-align:left;">The company is betting on its ability to excel at answering complex questions and providing a &quot;productivity engine&quot; for <b>knowledge workers.</b></p><h3 class="heading" style="text-align:left;" id="salesforce-acquires-own-for-19-bill">Salesforce Acquires Own for $1.9 billion</h3><p class="paragraph" style="text-align:left;">Salesforce made its largest acquisition since Slack, purchasing data management firm Own for <a class="link" href="https://techcrunch.com/2024/09/05/salesforce-acquires-data-management-firm-own-for-1-9b-in-cash/?guccounter=1&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=september-roundup-3-2-gpt-next-s-100x-power-125b-supercomputers" 
target="_blank" rel="noopener noreferrer nofollow">$1.9 billion in cash</a>. The deal highlights the growing importance and value of data protection and management solutions in the enterprise space, with the global data backup and recovery sector worth <b>$12.9 billion in 2023.</b></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=c03dc7ca-5c08-4492-bdff-461c99a81595&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>PyTorch LLM Improvements, Oracle &amp; Salesforce Conference Updates, Google’s 🐳  AI</title>
  <description>Plus, Mistral Debuts Pixtral 12B Multimodal Model</description>
  <link>https://genai360.beehiiv.com/p/of-whales-and-strawberries</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/of-whales-and-strawberries</guid>
  <pubDate>Tue, 01 Oct 2024 17:13:21 +0000</pubDate>
  <atom:published>2024-10-01T17:13:21Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">But seriously, when did we start naming stuff after fruit instead of animals? Looking at you, 🍐 (oof) and the non-profit that decided to morph into a for-profit entity, with the last founding member deciding to depart faster than you can say &quot;I have no equity in OpenAI… I do this because I love it, 💚 &quot;. Anyways, this week is full of news (and there have been many more over the weekend), so let&#39;s dive right in! But first:</p><div class="section" style="background-color:#FFFFFF;border-color:#ff8a00;border-radius:2px;border-style:solid;border-width:2px;margin:10.0px 10.0px 10.0px 10.0px;padding:10.0px 10.0px 10.0px 10.0px;"><h2 class="heading" style="text-align:left;"><span style="color:rgb(34, 34, 34);">GenAI360 Exclusive: Unlock Free Tickets for RetrieveX Conference on Oct 17 in San Francisco. </span></h2><p class="paragraph" style="text-align:left;"><span style="color:rgb(34, 34, 34);">Come hear from the creators of Meta Llama, PyTorch, Kubeflow, CAFFE, along with leaders from Microsoft, AWS, Bayer, Flagship Pioneering, Cresta, VoyageAI, Omneky how to build best retrieval for AI. 
</span></p><p class="paragraph" style="text-align:left;"><span style="color:rgb(34, 34, 34);">If you&#39;re an executive who&#39;s considering or working on GenAI projects, fill in the form below for a complimentary ticket for the conference - hurry up because tickets are limited!</span></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://www.retrievex.co/application?utm_source=newsletter&utm_medium=email&utm_campaign=weekly1"><span class="button__text" style="color:#FFFFFF;"> Get Tickets Today </span></a></div><p class="paragraph" style="text-align:left;"><span style="color:rgb(34, 34, 34);">Date: October 17, 10:30am - 7pm PT</span><br><span style="color:rgb(34, 34, 34);">Venue: The Midway, 900 Marin St, San Francisco</span></p></div><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">Mistral released Pixtral 12B, their <b>first multimodal model</b> capable of processing both images and text, outperforming other top models on various benchmarks.</p></li><li><p class="paragraph" style="text-align:left;">IBM researchers presented improvements to PyTorch, including a high-throughput data loader and enhanced LLM training throughput.</p></li><li><p class="paragraph" style="text-align:left;">Google developed a <b>whale bioacoustics model </b>capable of identifying eight distinct whale species and multiple vocalizations.</p></li><li><p class="paragraph" style="text-align:left;">General OCR Theory employs a<b> unified end-to-end model </b>with a high-compression encoder and long-context decoder, which outperformed existing models on various OCR tasks.</p></li><li><p class="paragraph" style="text-align:left;">OpenAI&#39;s o1 model shows significant improvements over GPT-4 on <b>reasoning-heavy tasks</b>, rivaling human expert performance on many benchmarks.</p></li><li><p 
class="paragraph" style="text-align:left;">A paper drawing parallels to<b> quantum mechanics</b> applies the Universal Approximation Theorem to explain LLM memory mechanisms, and finds some models able to memorize nearly 100% of 2,000 poems after limited exposure.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai"><span class="button__text" style=""> Subscribe </span></a></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">We’ve been hearing about Project Strawberry for a while and it’s finally here, with OpenAI releasing a preview version of the model<b> called o1. </b>There were a bunch of other releases too, including multimodal models, speech-text foundation models, and open-source development frameworks. 
</p><p class="paragraph" style="text-align:left;">Some interesting models for <b>life science applications</b> also came up, which might have gone under the radar amidst the o1 release hype.</p><h3 class="heading" style="text-align:left;" id="mistral-enters-multimodal-arena-and">Mistral Enters Multimodal Arena and Groq’s New Release</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeaJwcI6vrVtx5w5sTkfPeczUHmC8oUigZA9HYyWf54XG6BzD32Lb1c_8VSU7gHCNCelzouoXn2YLSd_3EzKrgXuBcK4nybQuW7dKpwgcCrnf6rHqQycm0U-Faw6d0l6sXx81nrEoSNqfQrnVucnve6kHk2?key=qd_Cna9PprZkngI1BfwrHQ"/><div class="image__source"><span class="image__source_text"><p>Mistral’s first multimodal model showed promising results on various multimodal benchmarks. <a class="link" href="https://mistral.ai/news/pixtral-12b/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/09/11/mistral-releases-pixtral-its-first-multimodal-model/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Pixtral 12B</a> is a new multimodal AI model from Mistral, who previously released <a class="link" href="https://genai360.beehiiv.com/p/of-llamas-and-slms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">three models</a> at around the same time. This new model is capable of processing both images and text. 
It has <b>12 billion parameters</b> and is approximately 24GB in size.</p><p class="paragraph" style="text-align:left;">Built on Mistral&#39;s Nemo 12B text model, it can answer questions about multiple images of any size using URLs or base64-encoded images. The model is expected to perform tasks like <b>image captioning </b>and object counting, similar to other multimodal models like Claude and GPT-4o.</p><p class="paragraph" style="text-align:left;">Pixtral 12B also performed well on various <a class="link" href="https://x.com/rajko_rad/status/1833934297486733487/photo/3?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">multimodal benchmarks</a> like MMMU and ChartQA, <b>outperforming </b>other top models like Claude 3 Haiku and Phi-3 Vision.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcmh2_-DY1vXINjj9ISVq3Qo3voMrHNAOS26-fPIdXGlgFrXILQsQr8tK1iD9VT4ksb_wGZ0B8Fl9VixkFIseL2YCWiwm1-apHNDkJ1XhLLQFYel3DESTCKetJe1qmWU9xOxJ2XitQDSGntBFGpJlPFAK7y?key=qd_Cna9PprZkngI1BfwrHQ"/><div class="image__source"><span class="image__source_text"><p>Don&#39;t hate us for this shot. Pixtral 12B’s results on various multimodal benchmarks. <a class="link" href="https://x.com/rajko_rad/status/1833934297486733487/photo/2?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Instead of another multimodal model release, we saw an open-source development framework pop up. 
<a class="link" href="https://www.youtube.com/watch?v=rQ1ZY0mdFcQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">xRx </a>by Groq lets developers make <b>conversational AI solutions</b> by combining multimodal input and outputs.</p><p class="paragraph" style="text-align:left;">It makes life easier for users wanting to develop projects like <b>voice-based assistants, </b>text-based chatbots, and multimodal applications by giving them all the tools they need in one place. </p><h3 class="heading" style="text-align:left;" id="oracles-multicloud-vision-and-tenso">Oracle’s Multicloud Vision and Tensor Parallelism Breakthroughs</h3><p class="paragraph" style="text-align:left;">The conference season is upon us (while we&#39;re at it - grab your tickets for <a class="link" href="http://retrievex.co?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">RetrieveX</a>). PyTorch Conference, Dreamforce, the <a class="link" href="https://www.oracle.com/cloud/multicloud/larry-ellison-cloudworld-multicloud-strategy/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Oracle CloudWorld</a>…<br><br>Speaking of the latter, Oracle&#39;s CTO Larry Ellison announced a new <b>partnership with AWS</b>, following similar deals with Microsoft and Google. The partnership, called Oracle Database@AWS, will embed Oracle Cloud Infrastructure inside AWS data centers. 
They also announced APEX, a<b> low-code development platform</b> that uses AI to help create secure applications from the start.</p><p class="paragraph" style="text-align:left;">What’s more is that Ellison mentioned that Oracle is already building a <a class="link" href="https://x.com/andrewcurran_/status/1833960831245320317?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">&gt;1GW data center </a>that’s powered by three small modular nuclear reactors, and that Oracle is probably going to have <b>2000 data centers </b>worldwide. </p><p class="paragraph" style="text-align:left;">The <b>PyTorch conference</b> also took place at around the same time. Previously, we saw the release of a <a class="link" href="https://genai360.beehiiv.com/p/the-strawberry-mystery?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">PyTorch API</a>. </p><p class="paragraph" style="text-align:left;">Developments are continuing as we saw the implementation of experimental <a class="link" href="https://discuss.pytorch.org/t/distributed-w-torchtitan-introducing-async-tensor-parallelism-in-pytorch/209487?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">async tensor parallelism support,</a> integrated into TorchTitan. 
It can be used with torch.compile, which automatically detects <b>TP patterns</b> and rewrites them into async-TP ops, or directly in eager mode by calling async-TP ops.</p><p class="paragraph" style="text-align:left;">There were a couple of key performance challenges tackled by this implementation, including communication overhead and <b>magnified wave quantization</b> inefficiencies. Addressing these issues lets communication overlap with ongoing computation instead of stalling it.</p><p class="paragraph" style="text-align:left;">Some notable results include:</p><ul><li><p class="paragraph" style="text-align:left;">For Llama3 8B, async-TP achieved up to <b>~29% forward pass</b> speedup and ~8% E2E speedup.</p></li><li><p class="paragraph" style="text-align:left;">For Llama3 70B, it showed up to <b>~20% forward pass</b> speedup and ~8% E2E speedup.</p></li></ul><h3 class="heading" style="text-align:left;" id="project-strawberry-unveiled-agi-wen">Project Strawberry Benchmarks: &#39;AGI Wen?&#39;</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdllRQXwQsBMNuexEBu9nfJ6xnV5WHmH4ghVOqYcKch1YMKd4pJFaVyfGjEfMKlFv1dmiqIdGIggMsHbds1IPmNIrBGX-6W1WOFrTSQENpk9VpF9LRRjCBccpgNKlAgNQEAmAt5GoJtVRuUrJAg0RFqXKU7?key=qd_Cna9PprZkngI1BfwrHQ"/><div class="image__source"><span class="image__source_text"><p>OpenAI’s latest model shows drastic improvements in competitive benchmarks over GPT-4. 
<a class="link" href="https://openai.com/index/learning-to-reason-with-llms/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">After all the build up, OpenAI finally made an <b>official announcement </b>about <a class="link" href="https://genai360.beehiiv.com/p/strawberry-conf?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Project Strawberry</a>. It’s actually called “<a class="link" href="https://openai.com/index/learning-to-reason-with-llms/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">OpenAI o1</a>” instead.</p><p class="paragraph" style="text-align:left;">In the past, they used a combination of <b>supervised and reinforcement learning</b> for models like GPT-4. This time, OpenAI took a <b>slightly different approach</b> to training their latest model, using a large-scale reinforcement learning algorithm to teach the model productive thinking. The model’s performance consistently improves with more reinforcement learning (train-time compute) and more time spent thinking (test-time compute).</p><p class="paragraph" style="text-align:left;">In terms of results, it seems like OpenAI o1 lives up to the hype by significantly <b>outperforming GPT-4o</b> on most reasoning-heavy tasks. It also shows major improvements on challenging benchmarks like AIME (math), Codeforces (programming), and GPQA Diamond (PhD-level science questions), exceeding human expert performance on reasoning-heavy tasks. 
We&#39;ve yet to see nice results in production (beyond… Devin&#39;s reference), but we are currently using and liking it, too. </p><p class="paragraph" style="text-align:left;">Currently, <b>o1-preview</b> is being released for immediate use in ChatGPT and to trusted API users. <b>OpenAI is also A/B testing a feature where the model selector is disabled</b>, fueling speculation that they&#39;re working on a model router to manage inference costs on consumer plans (and serve a ‘good enough&#39; model). </p><p class="paragraph" style="text-align:left;">Since <b>PhD-level intelligence </b>was a hot topic during the whole speculation phase of o1, it was interesting to see how it stacks up against other models on <a class="link" href="https://arcprize.org/blog/openai-o1-results-arc-prize?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">ARC Prize. </a></p><h3 class="heading" style="text-align:left;" id="salesforce-microsoft-and-amazons-ne">Salesforce, Microsoft, and Amazon’s New Releases</h3><p class="paragraph" style="text-align:left;">After releasing a couple of <a class="link" href="https://genai360.beehiiv.com/p/the-strawberry-mystery?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">autonomous AI agents</a>, Salesforce also released <a class="link" href="https://www.salesforce.com/agentforce/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Agentforce</a> (confidently priced at <b>$2 per conversation,</b> ugh), which offers various <b>pre-built AI agents</b> for specific business functions, including Service Agents, SDRs, Sales Coaches, Personal 
Shoppers, and Campaign Agents.</p><p class="paragraph" style="text-align:left;">These agents are autonomous applications that provide 24/7 specialized support to employees or customers across <b>multiple channels </b>like web, mobile, WhatsApp, and Slack. </p><p class="paragraph" style="text-align:left;">Microsoft is launching the next wave of <a class="link" href="https://www.salesforce.com/agentforce/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Copilot</a>, integrating web, work, and Pages as a new design for knowledge work. There are also a bunch of improvements being made to Copilot in various <b>Microsoft 365 apps.</b></p><p class="paragraph" style="text-align:left;">With GPT-4o and enhanced orchestration, Copilot responses are now more than two times faster on average. Response satisfaction has improved by nearly three times, so the <b>improvements are definitely noticeable.</b></p><p class="paragraph" style="text-align:left;">Copilot Pages (Perplexity called and wants its naming back…) is described as the first new <b>digital artifact</b> for the AI age, which allows users to edit, add to, and share AI-generated content. </p><p class="paragraph" style="text-align:left;">Amazon launched <a class="link" href="https://www.aboutamazon.com/news/innovation-at-amazon/amazon-project-amelia?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Project Amelia</a>, a <b>generative AI assistant for sellers</b>, offering tailored business insights, guidance, and actions using Amazon Bedrock technology. Sellers can consult Amelia for best practices and strategies, receiving tailored responses based on their data and market trends. 
It offers quick access to sales data, customer traffic insights, and product performance analysis to help monitor progress.</p><h3 class="heading" style="text-align:left;" id="new-speech-text-foundation-model-an">New Speech-Text Foundation Model and Turning Ideas Into Apps</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfqPHyG6eOSjYKCHKQWJDBL9dCtAQWM3-g0V1bet3Sk8MPUKUr2yAvQwduU5ijCveX-HraAAT3VO6IXllkzf99q0AhGjwF2OmV7NU4P_MeTXaZOKykEVBn_7lKBDVj1Q2CdWEqXCohLZWcLi-FceWHg5s83?key=qd_Cna9PprZkngI1BfwrHQ"/><div class="image__source"><span class="image__source_text"><p>Moshi overview. <a class="link" href="https://github.com/kyutai-labs/moshi?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Kyutai Labs debuted a new speech-text foundation model called <a class="link" href="https://github.com/kyutai-labs/moshi?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Moshi.</a> It uses Mimi, a SOTA streaming neural audio codec that processes 24 kHz audio down to a 12.5 Hz representation with a <b>bandwidth of 1.1 kbps.</b></p><p class="paragraph" style="text-align:left;">The model consists of two main components:</p><ol start="1"><li><p class="paragraph" style="text-align:left;">A small Depth Transformer for inter-codebook dependencies</p></li><li><p class="paragraph" style="text-align:left;">A large 7B parameter Temporal Transformer for temporal dependencies</p></li></ol><p class="paragraph" style="text-align:left;">Mimi outperforms existing <b>non-streaming codecs</b> like SpeechTokenizer and SemantiCodec in terms of efficiency and quality. Moshi 
achieves a theoretical latency of 160ms (80ms for Mimi&#39;s frame size + 80ms of acoustic delay), with practical latency as low as 200ms on an L4 GPU. </p><p class="paragraph" style="text-align:left;">The trend of accessible coding seems to be continuing with <a class="link" href="https://llamacoder.together.ai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Llamacoder</a>, which lets you enter a simple prompt to turn your idea into a fully developed app within a <b>matter of minutes</b>. Keep in mind that it’s only for smaller apps, though.</p><p class="paragraph" style="text-align:left;">It uses <b>Llama 3.1 405B </b>for code generation and Together AI for LLM inference - a nice combination that allows for efficient processing of prompts and code generation. </p><p class="paragraph" style="text-align:left;">There are also other features in the works, such as adding support for multiple programming languages and plans for more <b>customization options in the future.</b></p><h3 class="heading" style="text-align:left;" id="from-whale-songs-to-cellular-automa"><span style="color:rgb(67, 67, 67);">From Whale Songs to Cellular Automata</span></h3><p class="paragraph" style="text-align:left;">Google decided to go in a different direction with a unique AI application. They developed a <a class="link" href="https://research.google/blog/whistles-songs-boings-and-biotwangs-recognizing-whale-vocalizations-with-ai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">whale bioacoustics model </a>that can identify <b>eight whale species </b>and multiple vocalizations, including the Bryde&#39;s whale &quot;Biotwang&quot; sound. 
This is important for protecting whales that live in remote environments, since it’s hard to find them through other methods.</p><p class="paragraph" style="text-align:left;">The model uses spectrograms to classify <b>twelve whale vocalization</b> classes, offering detailed species identification and recognizing various vocalization types. Results were certainly impressive since it showed high accuracies across different whale species, excelling in identifying complex sounds like &quot;boings&quot; and &quot;gunshot&quot; calls.</p><p class="paragraph" style="text-align:left;">As such, it’s had some notable implications for <b>discoveries</b> about whale populations, movements, and behaviors.</p><p class="paragraph" style="text-align:left;">More AI applications in the life sciences were seen last week with <a class="link" href="https://x.com/ProfBuehlerMIT/status/1836742014215303626?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">LifeGPT,</a> a generative transformer model designed to simulate <b>Conway&#39;s Game of Life (Life). </b>Life is a cellular automaton simulation, which are models of computation that are useful in areas like biology.</p><p class="paragraph" style="text-align:left;">The issue is that algorithms within cellular automata like Life are tough to model and predict due to sensitivity of initial conditions and the fact that prior understanding is needed. LifeGPT helps us solve this issue by simulating Life <b>without needing prior knowledge.</b></p><p class="paragraph" style="text-align:left;">The model is trained on various grid configurations, allowing it to generalize across different grid sizes and periodic boundary conditions effectively. 
LifeGPT captures the <b>complex dynamics </b>of cellular automata with near-perfect accuracy.</p><p class="paragraph" style="text-align:left;">We also saw the introduction of the &quot;autoregressive autoregressor&quot; method, which allows for recursive simulation of Life using LifeGPT for <b>long-term predictions</b>.</p><p class="paragraph" style="text-align:left;">As a result, it could help design models with big implications for areas like <b>bioinspired materials and tissue engineering.</b></p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/2154d500-3e91-4807-b15a-7cabca7f262d/cal_preview.png?t=1727800971"/><div class="image__source"><span class="image__source_text"><p>A note from our partner</p></span></div></div><div class="blockquote"><blockquote class="blockquote__quote"></blockquote></div><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">One paper really stood out since it made an interesting parallel that we thought we’d never see between <b>quantum mechanics and LLMs.</b> Progress was also made in applying AI for scientific research tasks and dealing with limitations that traditional OCR systems had a tough time with.</p><h3 class="heading" style="text-align:left;" id="ai-outperforms-experts-in-scientifi">AI Outperforms Experts in Scientific Research Tasks with PaperQA2</h3><p class="paragraph" style="text-align:left;">Researchers from FutureHouse developed <a class="link" href="https://storage.googleapis.com/fh-public/paperqa/Language_Agents_Science.pdf?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">PaperQA2</a>, which showed SOTA performance in 
synthesizing <b>scientific knowledge</b>. Creating AI systems that can accurately and reliably help with scientific research tasks has been a challenge for quite some time, so this paper addresses a key issue.</p><p class="paragraph" style="text-align:left;">PaperQA2 used a <b>frontier language model </b>optimized for improved factuality, combined with an agentic approach to information retrieval and synthesis.</p><p class="paragraph" style="text-align:left;">The system uses a <b>multi-step process</b> involving paper search, evidence gathering, and answer generation - all orchestrated by an agent model. But what really stands out is the Reranking and Contextual Summarization (RCS) step, which improves the <b>relevance and quality of retrieved information.</b></p><p class="paragraph" style="text-align:left;">The researchers evaluated PaperQA2 on three real-world tasks:</p><ol start="1"><li><p class="paragraph" style="text-align:left;">Answering scientific questions</p></li><li><p class="paragraph" style="text-align:left;">Generating Wikipedia-style summaries</p></li><li><p class="paragraph" style="text-align:left;">Detecting contradictions in scientific literature</p></li></ol><p class="paragraph" style="text-align:left;">PaperQA2 matched or exceeded human expert performance across these tasks. 
For example, it achieved superhuman precision on the <b>LitQA2 question-answering benchmark </b>and produced more accurate and better-cited Wikipedia-style summaries than existing human-written articles.</p><h3 class="heading" style="text-align:left;" id="go-ts-comprehensive-solution-for-vi">GOT&#39;s Comprehensive Solution for Visual Data Extraction</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2409.01704?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">General OCR Theory</a> (GOT) handles the<b> limitations </b>of traditional OCR systems in handling diverse types of artificial optical signals, from plain text to complex formulas and charts. It’s a unified end-to-end model consisting of a high-compression encoder and a long-context decoder. </p><p class="paragraph" style="text-align:left;">The encoder, with about <b>80M parameters</b>, processes 1024x1024 input images and compresses them into 256x1024 dimensional tokens. </p><p class="paragraph" style="text-align:left;">The decoder, with 500M parameters, supports an <b>8K </b>max token length for handling long-context scenarios.</p><p class="paragraph" style="text-align:left;">They developed a <b>multi-stage training strategy,</b> including decoupled pre-training of the encoder, joint training with a new decoder, and further post-training. They also created specialized data engines for synthetic data production to support each training stage.</p><p class="paragraph" style="text-align:left;">Results show that GOT outperforms existing models on <b>various OCR tasks</b>, including plain text recognition, formatted document understanding, and more specialized tasks like sheet music and chart recognition. 
For example, on the ChartQA-SE benchmark, GOT achieved an AP@strict score of 0.747, surpassing other models including GPT-4V and Qwen-VL.</p><h3 class="heading" style="text-align:left;" id="ll-ms-and-quantum-mechanics-the-sur">LLMs and Quantum Mechanics: The Surprising Connection in Memory Models</h3><p class="paragraph" style="text-align:left;">Researchers from The Hong Kong Polytechnic University proposed a <a class="link" href="https://arxiv.org/pdf/2409.10482v2?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">new framework</a> for <b>understanding memory</b> in LLMs in a way that you wouldn’t have thought of.</p><p class="paragraph" style="text-align:left;">They drew some pretty interesting parallels to quantum mechanics to answer the fundamental question of whether LLMs truly possess memory and, if so, how it functions <b>compared to human memory</b>.</p><p class="paragraph" style="text-align:left;">The study leverages the Universal Approximation Theorem (UAT) to explain the memory mechanism in LLMs. They argue that LLM memory operates like &quot;<b>Schrödinger&#39;s memory</b>&quot; - it only becomes observable when a specific memory is queried, and its existence can’t be determined otherwise.</p><p class="paragraph" style="text-align:left;">They demonstrated this concept through experiments on various LLMs, including fine-tuning models on poetry datasets to assess their <b>memory capabilities.</b></p><p class="paragraph" style="text-align:left;">Results show that LLMs can exhibit remarkable memory performance, with some models able to memorize nearly <b>100% of 2,000 poems</b> after limited exposure. 
The study also revealed memory performance decreases as input text length increases, mirroring human memory limitations.</p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention in the last week include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2409.12193?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Vista3D</a>:  Rapid and consistent 3D object generation from a single image, using a two-phase approach.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2409.12165?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">NSSR-DIL</a>: Reformulates image super-resolution as computing the inverse of degradation kernels, rather than directly generating high-resolution images from low-resolution inputs</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2409.12140?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">MoRAG</a>: Text-based human motion generation that enhances motion diffusion models by leveraging additional knowledge from an improved motion retrieval process</p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email saying hi :)</p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">Don&#39;t worry, 
no fruit was harmed (or community noted) in our selection. One of our picks of the week, which went into depth about real-world usage patterns of <b>data warehouses</b>, stood out since it goes against common assumptions about these systems. </p><p class="paragraph" style="text-align:left;">Another post that grabbed our attention looked at <b>cloud pricing </b>for Kafka deployments, showing that there’s a big disparity between cloud costs and on-premise hardware capabilities for data streaming workloads.</p><h3 class="heading" style="text-align:left;" id="rethinking-data-infrastructure-and-">Rethinking Data Infrastructure and Processing</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfb5iP0twYU-4TZrjrEWk_Vafw8lOravklD448iFvxq3xgZDXNuFEOH9PGsjpEstdjlHt4llO9iBFSQvqSAIK9llL1w9kRwf8zFzwHzi1V6dQ5BMXRm2N91GgI9gw8SeQHJ0g8auad0UOx18cRKrtcDhac?key=qd_Cna9PprZkngI1BfwrHQ"/><div class="image__source"><span class="image__source_text"><p>Fraser looked at usage patterns of data warehouses. <a class="link" href="https://x.com/frasergeorgew/status/1836117419540058588?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">There was a deep dive into the real-world usage patterns of <a class="link" href="https://x.com/frasergeorgew/status/1836117419540058588?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">data warehouses</a>, courtesy of George Fraser, CEO of Fivetran. 
His analysis of query samples from <b>Snowflake and Redshift</b> offered some surprising insights that challenge common assumptions about how these systems are used in practice.</p><p class="paragraph" style="text-align:left;">The post highlights that data warehouses are mainly used for ETL (Extract, Transform, Load) tasks, not just for business intelligence. Most queries are small, scanning only about 100 MB of data, which challenges the focus on massive scalability and has important implications for data infrastructure and processing.</p><h3 class="heading" style="text-align:left;" id="the-hidden-costs-of-cloud-kafka">The Hidden Costs of Cloud Kafka</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcLtM9_eLANvQL11zJsfXxV7eeTsyx1oCOBs2tNfNiAnNF5AEHKC-FiTzlbgjCTawICtDesAgp2n2KIImafhT4pjGIbtG97-1U9fizn3S6dzKAgC6_rchv03pr708CbSEVIRN30j_ns-0n-HyXk_dCrzjIL?key=qd_Cna9PprZkngI1BfwrHQ"/><div class="image__source"><span class="image__source_text"><p>Kozlovski’s post about cloud pricing for Kafka deployments. 
<a class="link" href="https://x.com/BdKozlovski/status/1834968571299930527?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Kozlovski’s breakdown of <b>cloud pricing </b>for <a class="link" href="https://x.com/BdKozlovski/status/1834968571299930527?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Kafka</a> deployments gained a lot of attention on X as it exposed the disparity between cloud costs and on-premises hardware capabilities for data streaming workloads.</p><p class="paragraph" style="text-align:left;">He explained how a modest <b>30MB/s Kafka cluster</b> could rack up over $110,000 in annual costs, with network traffic alone accounting for $88,300 of that sum. It shows a fundamental misalignment between cloud pricing structures and the needs of data-intensive applications.</p><p class="paragraph" style="text-align:left;">He also went into the reasons behind this pricing anomaly and explored potential solutions. From optimizations like &quot;<b>fetch from follower</b>&quot; to more radical approaches like WarpStream&#39;s innovative design, the discussion brought the ongoing struggle between cloud convenience and cost-effectiveness to light. Interesting.</p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">While Microsoft launched the next wave of Copilot, they also announced a partnership with BlackRock where they’ll invest <b>$30 billion </b>for AI infrastructure development support. Other news included Nvidia thinking about acquiring OctoAI for $165 million. 
</p><p class="paragraph" style="text-align:left;">In terms of successful funding rounds, Sakana AI and Black Forest Labs raised <b>$210 million and $100 million respectively.</b></p><h3 class="heading" style="text-align:left;" id="open-a-is-150-b-sprint">OpenAI&#39;s $150B Sprint</h3><p class="paragraph" style="text-align:left;">In the next episode full of plot twists worthy of a soap opera, <a class="link" href="https://www.wsj.com/tech/apple-no-longer-in-talks-to-join-openai-investment-round-e3be3e66?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">Apple has decided to sit out OpenAI&#39;s $6.5 billion (allegedly oversubscribed?) funding round</a>, leaving Microsoft and Nvidia to fight over the last dance. Rumor has it, Microsoft is sweetening the pot with an extra $1 billion, probably hoping to impress OpenAI with its deep pockets and irresistible charm. Beyond SoftBank at $500M, here&#39;s the list of <a class="link" href="https://www.yahoo.com/tech/nvidia-6-other-firms-could-090000549.html?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">other firms that are reportedly in talks to invest</a>. 
</p><h3 class="heading" style="text-align:left;" id="microsoft-and-black-rocks-initial-i">Microsoft and BlackRock’s Initial Investment of $30 Billion</h3><p class="paragraph" style="text-align:left;">Microsoft and BlackRock have announced a <a class="link" href="https://www.arise.tv/microsoftblackrock-to-launch-30-billion-ai-infrastructure-fund/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">joint venture</a>, the Global AI Infrastructure Investment Partnership, with an initial investment of over $30 billion to support the development of AI infrastructure, focusing on <b>data centers and energy resources. </b></p><p class="paragraph" style="text-align:left;">The partnership, which includes MGX as a general partner and expertise from Nvidia, aims to address the growing demand for <b>computational power </b>required by AI models while ensuring sustainable development.</p><h3 class="heading" style="text-align:left;" id="sakana-ai-raises-210-million-in-ser">Sakana AI Raises $210 Million in Series A Funding</h3><p class="paragraph" style="text-align:left;">Sakana AI, an NVIDIA-backed startup, has raised over <a class="link" href="https://finance.yahoo.com/news/nvidia-corporation-nvda-sakana-ai-072820030.html?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">$210 million</a> in a Series A funding round, doubling its initial target of $100 million and achieving a valuation exceeding <b>$1.5 billion</b> just a year after launch.</p><p class="paragraph" style="text-align:left;">The company has gained popularity for developing a method to automate the integration of <b>multiple foundational models.</b></p><h3 class="heading" style="text-align:left;" 
id="nvidia-may-acquire-octo-ai-for-165-">Nvidia May Acquire OctoAI for $165 Million</h3><p class="paragraph" style="text-align:left;">Nvidia is reportedly considering acquiring <a class="link" href="https://finance.yahoo.com/news/nvidia-considers-165m-octoai-acquisition-181454346.html?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">OctoAI,</a> a startup that develops <b>software to enhance AI model efficiency,</b> for $165 million.</p><p class="paragraph" style="text-align:left;">This potential acquisition follows OctoAI&#39;s recent collaboration with Nvidia to integrate NIM into its generative AI platform, aiming to serve <b>various enterprise use cases.</b></p><h3 class="heading" style="text-align:left;" id="black-forest-labs-reported-to-raise">Black Forest Labs Reported to Raise $100 Million</h3><p class="paragraph" style="text-align:left;">Black Forest Labs, the startup behind Grok&#39;s image generator (and <a class="link" href="https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">competitor to Midjourney</a>), is reportedly raising <a class="link" href="https://techcrunch.com/2024/09/20/grok-image-generator-black-forest-labs-raising-100m-at-1b-valuation/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=pytorch-llm-improvements-oracle-salesforce-conference-updates-google-s-ai" target="_blank" rel="noopener noreferrer nofollow">$100 million at a $1 billion valuation</a>, just two months after emerging from stealth with <b>$31 million in funding. 
</b></p><p class="paragraph" style="text-align:left;">Previously, the startup’s valuation was around <b>$150 million</b> during the last funding round, so it’s definitely a big increase.</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=1e1903b2-48ec-41f7-81c5-64de54bd3a8c&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>August in AI: Grok-2 &gt; GPT-4 Turbo, New SOTA for Text-to-Image &amp; Video Gen</title>
  <description>Plus, SLMs from NVIDIA, MSFT, Google</description>
  <link>https://genai360.beehiiv.com/p/the-strawberry-mystery</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/the-strawberry-mystery</guid>
  <pubDate>Tue, 10 Sep 2024 15:09:08 +0000</pubDate>
  <atom:published>2024-09-10T15:09:08Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">In a new experiment, we decided to provide a monthly mega-roundup of all things that may have flown under the radar for you in August. Before we start, share last week&#39;s news with a friend or a colleague:</p><div class="section" style="background-color:#FFFFFF;border-color:#ff8a00;border-radius:2px;border-style:solid;border-width:2px;margin:10.0px 10.0px 10.0px 10.0px;padding:10.0px 10.0px 10.0px 10.0px;"><h2 class="heading" style="text-align:left;"><span style="color:rgb(34, 34, 34);">Join RetrieveX Conference on Oct 17 in San Francisco. 30% OFF Before Prices Go Up Tomorrow</span></h2><p class="paragraph" style="text-align:left;"><span style="color:rgb(34, 34, 34);">Join RetrieveX, our flagship conference on retrieval for GenAI, built exclusively for those creating high-accuracy, multimodal workflows and featuring leaders from Meta AI, Microsoft AI, YC, Bayer Radiology, Matterport, Cresta, as well as the co-creators of Meta Llama, PyTorch, Chameleon, and KubeFlow.</span></p><p class="paragraph" style="text-align:left;"><span style="color:rgb(34, 34, 34);">Check out with promo code </span><span style="color:rgb(34, 34, 34);"><b>FINALCALL</b></span><span style="color:rgb(34, 34, 34);"> for 30% off (before the prices increase from </span><span style="color:rgb(34, 34, 34);"><b>$649 to $949 tomorrow</b></span><span style="color:rgb(34, 34, 34);">). 
Prices go up tomorrow, so secure your spot sooner rather than later.</span></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://www.eventbrite.com/e/retrievex-the-genai-data-retrieval-conference-tickets-983939869637?aff=oddtdtcreator&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen"><span class="button__text" style="color:#F9FAFB;"> Get Tickets Today </span></a></div><p class="paragraph" style="text-align:left;"><span style="color:rgb(34, 34, 34);">Date: October 17, 10:30am - 7pm PT</span><br><span style="color:rgb(34, 34, 34);">Venue: The Midway, 900 Marin St, San Francisco</span><br><span style="color:rgb(34, 34, 34);">Attendees: 300 AI executives</span></p></div><h2 class="heading" style="text-align:left;" id="key-takeaways-for-august">Key Takeaways for August</h2><ul><li><p class="paragraph" style="text-align:left;">Hermes 3 by Nous Research showed <b>significant improvements </b>over its predecessor, demonstrating competitive benchmark scores against Llama 3.1 through enhanced training techniques.</p></li><li><p class="paragraph" style="text-align:left;">Ideogram 2.0, a SOTA <b>text-to-image</b> model, is now freely available with enhanced features and styles, improving image quality and text rendering through advanced training methods.</p></li><li><p class="paragraph" style="text-align:left;">Google AI Edge&#39;s <b>MediaPipe</b> enabled running 7B+ parameter language models in browsers using WebAssembly and WebGPU, overcoming memory restrictions through redesigned model-loading code.</p></li><li><p class="paragraph" style="text-align:left;">Researchers developed Pyramid Attention Broadcast (PAB) for real-time video generation, achieving up to <b>20.6 FPS</b> with a 10.5× acceleration by mitigating redundancy in attention 
computations.</p></li><li><p class="paragraph" style="text-align:left;">Google DeepMind open-sourced the <b>Vizier algorithm, </b>outperforming industry baselines in black-box optimization through a Gaussian process bandit approach.</p></li><li><p class="paragraph" style="text-align:left;">Anthropic’s new <b>prompt caching</b> feature dramatically reduces costs and latency for long prompts, set to become an industry norm.</p></li><li><p class="paragraph" style="text-align:left;">Huawei challenges Nvidia with the <b>Ascend 910C AI chip</b>, targeting the Chinese market amid production difficulties due to U.S. sanctions.</p></li><li><p class="paragraph" style="text-align:left;">Grok-2 and Grok-2 mini outperformed <b>GPT-4 Turbo</b> in benchmarks such as GPQA and MMMU, excelling in reasoning and factual accuracy.</p></li><li><p class="paragraph" style="text-align:left;">DeepSeek-Prover-V1.5 is an <b>advanced theorem-proving LLM</b> with improved performance on formal mathematics tasks, showcasing state-of-the-art results on rigorous benchmarks like ProofNet.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen"><span class="button__text" style=""> Subscribe </span></a></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">Things seemed to have taken a strange turn regarding the<b> Strawberry situation. 
</b>Meanwhile, some neat optimization frameworks and libraries were introduced, along with a bunch of language and image generator models.</p><p class="paragraph" style="text-align:left;">There were quite a few model releases, ranging from <b>SLMs to text-to-image</b> models. Moreover, a chatbot arena update saw Grok-2 ranked very highly, which means xAI might take the top spot very soon.</p><h3 class="heading" style="text-align:left;" id="salesforce-jamba-and-hermes-expand-">Salesforce, Jamba, and Hermes Expand the AI Model Landscape</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXecOMFzawN61AwUwGZ1sbap5RVUTaqJffMpAQj9rOQHdnwSwxUy2eOEuYMtnQcjMAlFjzc-rz0_PxWBOxBPK46nJw4CfC4TlI3iOUATkC0kMGjrLuWD6nx1OacKNHLDFYzW6l161Hfta5IH7O5zkHZwP_6H?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Comparison of claimed vs effective context window for the RULER benchmark. <a class="link" href="https://www.ai21.com/blog/announcing-jamba-model-family?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.salesforce.com/news/stories/einstein-sales-agents-announcement/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Salesforce</a> announced the release of Einstein SDR Agent and Einstein Sales Coach Agent. 
Einstein autonomously manages inbound leads by answering questions, handling objections, and scheduling meetings, all while grounded in a company’s <b>CRM and external data</b>. </p><p class="paragraph" style="text-align:left;">On the other hand, Einstein Sales Coach Agent does exactly what you’d expect from the name: coaching sellers through role-plays and<b> providing personalized feedback afterwards.</b></p><p class="paragraph" style="text-align:left;">What’s more, these AI agents can be tailored to a company’s specific needs, including setting engagement guardrails and language preferences, making them highly adaptable to different sales strategies. We’re seeing companies like Accenture use them to <b>improve deal effectiveness </b>and scale support for more complex sales activities.</p><p class="paragraph" style="text-align:left;">Another release included <a class="link" href="https://www.ai21.com/blog/announcing-jamba-model-family?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">AI21 Labs’ Jamba 1.5 Mini and Large models</a>, built on the novel SSM-Transformer architecture. They offer <b>superior long-context </b>handling, speed, and quality, and are the first non-Transformer models to match the performance of leading competitors, featuring a 256K context window—the longest in the market for open models.</p><p class="paragraph" style="text-align:left;">The Jamba 1.5 models are designed for resource efficiency, with Jamba 1.5 Mini capable of handling up to 140K tokens on a <b>single GPU</b>. </p><p class="paragraph" style="text-align:left;">In particular, these models stand out because they maintain high performance across the entire context window, significantly improving the<b> efficiency </b>and accuracy of enterprise-scale applications. 
Independent tests showed Jamba 1.5 Mini as the fastest model on 10K contexts, outpacing other models in its size class.</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://nousresearch.com/hermes3/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Hermes 3 by Nous Research </a>was also a noteworthy release with drastic improvements over its <b>predecessor,</b> Hermes 2, some of which include:</p><ul><li><p class="paragraph" style="text-align:left;">More reliable function calling </p></li><li><p class="paragraph" style="text-align:left;">Better code generation skills</p></li><li><p class="paragraph" style="text-align:left;">Enhanced general assistant capabilities</p></li></ul><p class="paragraph" style="text-align:left;">In terms of benchmark performance, Hermes 3 is certainly no slouch as it showed<b> highly competitive benchmark scores </b>against Llama 3.1.</p><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcXuxb930MLEjpXPplc4FXwbk_I5fIZpRD2TjvGOUlY6FuVyF8yrH-uxNUkOpwu8zWc19QwpSLJZ80cMqu3dkvLCp341Iu-Dy9v_IyfDgC8vW1HiMfgOxyoyZjpIaK4ztFbUtA5xikHvqqNxCsZmkpoa7mY?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Hermes 3 was able to outperform Llama 3.1 on benchmarks like AGIEval and ARC-C. 
<a class="link" href="https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><h3 class="heading" style="text-align:left;" id="ideograms-new-model-and-flux-availa">Ideogram’s New Model and FLUX Available on 3 Platforms</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcEyH2ZMmZRuJOA3XAB0PFPgM_2Oxlhq2DKN4WdbYh1uDwoMcostz7MuPZ4Ae0_dcy81e-g23QEAtgS1_VS10ZY6Y9mfO5JpfEVpfuubjJVlJ-9mHpL1Sc23lKipEAJsViG7cTNZEW42zj7JQUJkzzcj9w?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Sample image produced by Ideogram 2.0. <a class="link" href="https://about.ideogram.ai/2.0?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://about.ideogram.ai/2.0?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Ideogram 2.0</a>, a SOTA text-to-image model, is now accessible to all users via Ideogram.ai and a<b> newly launched iOS app</b>. The platform also introduces premium features through subscription plans, offering enhanced creative control and flexibility.</p><p class="paragraph" style="text-align:left;">The model supports<b> various styles</b> such as realistic and design, so users can create highly detailed and context-specific images. 
These styles boost the realism of textures and accuracy of text in designs.</p><p class="paragraph" style="text-align:left;">Improvements over its predecessor include creative tools like Magic Prompt and Describe, which help users generate and refine prompts for <b>image creation</b>. These tools enhance the iterative creative process, allowing for continuous reimagining of visual concepts.</p><p class="paragraph" style="text-align:left;">We mentioned before that Midjourney had some <b>serious competition </b>with Black Forest Labs’ FLUX-1 models. Turns out these models are available on three platforms: </p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://replicate.com/black-forest-labs/flux-pro?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Replicate</a> (no free credits)</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://fal.ai/models/fal-ai/flux-pro?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Fal</a> ($1 free credit)</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.mystic.ai/black-forest-labs/flux1-pro?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Mystic </a>(available for free)</p></li></ul><h3 class="heading" style="text-align:left;" id="google-nvidia-and-microsoft-advance">Google, Nvidia, and Microsoft Advance SLMs</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" 
src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfE58oL7dwahpS-tCQee-U5yz1ge5RqLkLdp3m-czm5QWE4BiAGiBWuHJIfaV0LnQqS7sg9oXFYO_eELNukr5lrneKAetwUe4LwkhdXEegmCdXJd6lRIIRfwwfcumEJUykDLhBp0P_uAMLIVg-W7RfzW0yH?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Phi 3.5-MoE showed impressive model quality on the MMLU benchmark. <a class="link" href="https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/discover-the-new-multi-lingual-high-quality-phi-3-5-slms/ba-p/4225280?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Given the <b>rising popularity</b> of SLMs, it isn’t too surprising to see more SLM releases.</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Google AI Edge&#39;s MediaPipe </a>redesigned its <b>model-loading code</b> to overcome memory restrictions, which means larger language models (7B+ parameters) can be run in the browser using their cross-platform inference framework.</p><p class="paragraph" style="text-align:left;">The framework compiles <b>C++ </b>code into WebAssembly for efficient browser performance while leveraging WebGPU API for <b>direct GPU access.</b> New strategies, such as asynchronous layer loading and local caching, drastically reduced WebAssembly memory usage, so larger models can run smoothly.</p><p class="paragraph" style="text-align:left;">After Nvidia discussed how to prune <a class="link" 
href="https://genai360.beehiiv.com/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=cryptic-strawberry-teasers-grok-2-gpt-4-turbo-ai-optimization-breakthroughs" target="_blank" rel="noopener noreferrer nofollow">Llama-3.1 8B to Llama-3.1-Minitron-8B, </a>they released <a class="link" href="https://blogs.nvidia.com/blog/mistral-nemo-minitron-8b-small-language-model/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Mistral-NeMo-Minitron 8B</a>. It’s a <b>miniaturized v</b>ersion of the Mistral NeMo 12B model, which delivers SOTA accuracy in a compact 8 billion parameter form factor. </p><p class="paragraph" style="text-align:left;"><b>Mistral-NeMo-Minitron 8B </b>leads on nine popular benchmarks for language models of its size, covering tasks like language understanding, reasoning, summarization, coding, and generating truthful answers. 
</p><p class="paragraph" style="text-align:left;">NVIDIA also announced Nemotron-Mini-4B-Instruct, another <b>SLM</b> optimized for low memory usage and faster response times on NVIDIA GeForce RTX AI PCs and laptops, available as part of NVIDIA ACE technologies.</p><p class="paragraph" style="text-align:left;">Meanwhile, Microsoft introduced <a class="link" href="https://huggingface.co/microsoft/Phi-3.5-MoE-instruct?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Phi-3.5</a>, an <b>updated family of SLMs</b> including:</p><ul><li><p class="paragraph" style="text-align:left;">Phi-3.5-mini (3.8B parameters)</p></li><li><p class="paragraph" style="text-align:left;">Phi-3.5-vision</p></li><li><p class="paragraph" style="text-align:left;">Phi-3.5-MoE (Mixture-of-Experts with 42B total parameters but only 6.6B active).</p></li></ul><p class="paragraph" style="text-align:left;">Phi-3.5-mini enhances <b>multi-lingual support </b>with a 128K context length, showing significant improvements in languages like Arabic, Dutch, Finnish, Polish, Thai and Ukrainian with 25-50% performance boosts.</p><p class="paragraph" style="text-align:left;">Phi-3.5-MoE outperforms <b>larger dense models</b> in quality and performance, supporting over 20 languages and employing robust safety post-training strategies combining Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).</p><h3 class="heading" style="text-align:left;" id="ai-optimization-trifecta-llm-compre">AI Optimization Trifecta: LLM Compressor, Apple’s ToolSandbox, and PyTorch’s FlexAttention</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXegs3GY871JyzMlubMUulC1NKuRXB5Zx_llJ2RNNLRzguYsgbbRkbc07X7C7uAWmksSFfJuh2P6rEkXXWVLtjl8Lfe7A2lP2Vk4zRQMaFO7NQY3lmbFsObbQcLTSyfQmMysUPEBWu2qAZifp31fLKKUero?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>LLM compressor overview. <a class="link" href="https://neuralmagic.com/blog/llm-compressor-is-here-faster-inference-with-vllm/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://neuralmagic.com/blog/llm-compressor-is-here-faster-inference-with-vllm/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">NeuralMagic’s LLM Compressor</a> is a unified library for creating compressed models for faster inference with vLLM. It enables <b>various compression techniques</b> like weight quantization, activation quantization, and pruning.</p><p class="paragraph" style="text-align:left;">We saw that <a class="link" href="https://genai360.beehiiv.com/p/of-strawberries-and-models?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Llama 3 was adapted to multimodality</a>. 
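</p><p class="paragraph" style="text-align:left;">To make the weight-quantization idea above concrete, here is a toy sketch of symmetric INT8 quantization. This is a conceptual illustration only, not LLM Compressor’s actual implementation (which calibrates on data and uses refinements like per-channel scales):</p>

```python
# Conceptual sketch of symmetric INT8 weight quantization: pick one scale
# so the largest weight maps to 127, round every weight to an integer,
# and recover approximate floats by multiplying back by the scale.

def quantize_int8(weights):
    """Map float weights onto the signed 8-bit range [-127, 127] with a single scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integer values."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize_int8(weights)   # integers in [-127, 127]
restored = dequantize(quantized, scale)
# Each weight is recovered to within half a quantization step.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The point is the storage and bandwidth win: each weight shrinks from 16 or 32 bits to 8, at the cost of a bounded rounding error.

<p class="paragraph" style="text-align:left;">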
NeuralMagic also did some work with Llama 3 by using LLM Compressor to create fully <b>quantized </b>versions of <a class="link" href="https://huggingface.co/collections/neuralmagic/llama-31-quantization-66a3f907f48d07feabb8f300?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Llama 3.1.</a></p><p class="paragraph" style="text-align:left;">It also integrates seamlessly with Hugging Face models and vLLM, which makes deployment pretty straightforward. LLM Compressor <b>showed notable performance improvements</b>, with INT8 weight and activation quantized models showing up to 1.6x speedup compared to FP16 baselines at low query rates.</p><p class="paragraph" style="text-align:left;">Apple’s ToolSandbox is a new evaluation framework for assessing tool-use capabilities of LLMs, addressing limitations of <b>previous evaluation methods</b>. It introduces stateful tool execution and implicit state dependencies between tools, moving beyond simple stateless web services or single-turn prompts.</p><p class="paragraph" style="text-align:left;">The framework includes a built-in user simulator that enables on-policy conversational evaluation, allowing for more dynamic and realistic testing scenarios. 
ToolSandbox implements a <b>dynamic evaluation</b> strategy that can assess both intermediate and final milestones over arbitrary interaction trajectories.</p><p class="paragraph" style="text-align:left;">Optimization news continued with <a class="link" href="https://pytorch.org/blog/flexattention/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">FlexAttention</a>, a new<b> PyTorch AP</b>I that allows implementing many attention variants in a few lines of idiomatic PyTorch code, addressing the lack of flexibility in existing optimized attention implementations.</p><p class="paragraph" style="text-align:left;">It introduces a flexible API with a user-defined score_mod function that can modify attention scores prior to softmax, which enables <b>various attention patterns like:</b></p><ul><li><p class="paragraph" style="text-align:left;">Causal masking</p></li><li><p class="paragraph" style="text-align:left;">Relative positional encodings</p></li><li><p class="paragraph" style="text-align:left;">Sliding window attention</p></li></ul><p class="paragraph" style="text-align:left;">FlexAttention uses<b> torch.compile</b> to lower the user-defined functions into a fused FlashAttention kernel, achieving performance competitive with handwritten kernels without materializing extra memory.</p><h3 class="heading" style="text-align:left;" id="llama-3-pruning-and-claudes-caching">Llama 3 Pruning and Claude&#39;s Caching Technique</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdTJ6QwFFyPUNmEcVyyy7m9wCrGPZ_YTmctMT89frvcHPzHbWXxhf5F8SxKoKDWbsvGysmFYW6EbDPPnXrPfxD97ON0qeldeJCxje0aosIngHDEmqqqzzWx17Gp2KucbAZDLT8An7HlolKk9uykNZUQxnIQ?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Pruning and distillation process of a single model. 
<a class="link" href="https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">We’ve seen small language models attract growing attention recently, with releases such as <a class="link" href="https://genai360.beehiiv.com/p/of-llamas-and-slms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">GPT-4o mini</a> and <a class="link" href="https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Gemma 2 2B.</a> Nvidia is continuing the SLM momentum by showing us how larger models can be pruned to obtain a smaller model, using <a class="link" href="https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Llama-3.1-Minitron-4B</a> as an example. This involves structured weight pruning combined with <b>knowledge distillation.</b></p><p class="paragraph" style="text-align:left;">The pruning process includes <b>both </b>depth pruning (removing 16 layers) and width pruning (reducing embedding and MLP dimensions).</p><p class="paragraph" style="text-align:left;">Knowledge distillation is used to retrain the pruned model, with the original 8B model serving as the teacher. 
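</p><p class="paragraph" style="text-align:left;">The core distillation objective can be sketched in a few lines: the student is trained to match the teacher’s softened output distribution. This is a minimal conceptual sketch, not Nvidia’s actual training recipe (which distills at scale and combines additional loss terms):</p>

```python
# Minimal sketch of the knowledge-distillation objective: minimize the KL
# divergence between the teacher's and student's temperature-softened
# output distributions over the vocabulary.
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's to the teacher's softened distribution."""
    p = softmax(teacher_logits, temperature)   # teacher targets
    q = softmax(student_logits, temperature)   # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 0.5, -1.0]
student_close = [1.9, 0.6, -0.9]   # roughly mimics the teacher
student_far = [0.0, 2.0, 0.0]      # disagrees with the teacher
# A student that mimics the teacher incurs a much smaller loss.
assert distillation_loss(teacher, student_close) < distillation_loss(teacher, student_far)
```

The temperature softens both distributions so the student also learns from the teacher’s relative preferences among unlikely tokens, not just its top prediction.

<p class="paragraph" style="text-align:left;">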
The pruned and distilled 4B model <b>outperforms </b>other models of similar size on various benchmarks, while requiring fewer training tokens and compute resources compared to training from scratch. </p><p class="paragraph" style="text-align:left;">Anthropic has introduced <a class="link" href="https://www.anthropic.com/news/prompt-caching?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">prompt caching</a> for its Claude AI models, letting developers store and reuse <b>large amounts of context</b> between API calls. This feature is currently available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon.</p><p class="paragraph" style="text-align:left;">Prompt caching can reduce costs by <b>up to 90%</b> and latency by up to 85% for long prompts. It&#39;s particularly useful for scenarios like conversational agents, coding assistants, large document processing, and agentic search where repeated access to extensive context is needed.</p><p class="paragraph" style="text-align:left;">The pricing model for cached prompts <b>involves a 25%</b> premium over base input token prices for writing to the cache, but only 10% of the base price for reading cached content. This structure makes it appealing to frequently reuse cached prompts. </p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeR2VlumZC30yYrHWzCSHBvYiapVo7pwX0SOK0e-VpdxYbYQ5FjbKgefNjUflqzAlv_NrJyNOdtXrN84to29kJi3n877LbykKlHOryc_TXHEwiFvoOrgqIYH7LXXx8dVIS_xPvNFEN4W3UErj36SG6C1DQ?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Prompt caching prices. 
<a class="link" href="https://www.anthropic.com/news/prompt-caching?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Early adopters have reported significant improvements in both speed and cost across various use cases. For example, chatting with a book using a 100,000 token cached prompt saw a <b>79% reduction </b>in latency and 90% cost reduction.</p><p class="paragraph" style="text-align:left;">Interestingly, it seems like prompt caching will become an <b>industry norm </b>at this rate.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdAoHU08yma-ib-6eUFgnGfNj6l01aiBAVKuBHHHdPIOerz4fwXI7u8De7mjdimESKSxq_Qb_2FKpdylsEaxdJp1lGlFdbRGJWAz-5eWmk42OuccnXdXNKZA_wqyER848jRs5BzIqYbyimyVGZ0bOg57oN1?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Matt Shumer pointed out that Google, DeepSeek, and Anthropic have all incorporated prompt caching. 
<a class="link" href="https://x.com/mattshumer_/status/1823755850353132002?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><h3 class="heading" style="text-align:left;" id="grok-2-gpt-4-o-and-gemini-live-ente">Grok-2, GPT-4o, and Gemini Live Enter the Arena</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeLtjMdkSODlxFww5VXr5tSt5fwZT63e_zZuZALfTn7eMjpl11qBIMKtKaHF8h-6ztnBu_tZcU0DkHbS_wOEprGCdR_20KhHNh2kWjYf4gSbRd3tTlfIOpRG3uFWoAgCnL2oEhsAmd2ngyQXkV2qTf3Jb5t?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Grok-2 and Grok-2 mini show notable improvement in various benchmarks compared to Grok-1.5 and other SOTA models. <a class="link" href="https://x.ai/blog/grok-2?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://x.ai/blog/grok-2?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Grok-2</a> and its smaller version, Grok-2 mini (another SLM), were released and showed significant improvements in chat, coding, and reasoning over Grok-1.5. Grok-2, tested under the name &quot;<b>sus-column-r,</b>&quot; outperforms models like Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard.</p><p class="paragraph" style="text-align:left;">Grok-2 shows superior reasoning, factual accuracy, and tool use, particularly excelling in <b>academic benchmarks</b> such as MATH, MMLU, and DocVQA. 
Grok-2 is integrated into the X app, providing advanced AI assistance, real-time information, and enhanced search functionalities for Premium users.</p><p class="paragraph" style="text-align:left;">It was also added to the <a class="link" href="https://x.com/lmsysorg/status/1827041269534879784?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">LM Arena leaderboard</a>, which compares various LLMs. The leaderboard includes a <b>win-rate heatmap</b>, showing how Grok-2 performs against other models in head-to-head comparisons.</p><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdhGAZ7LBIUIBcEkDpXP1gJv1n6rqL5bJTDS34yoX6N0XUq_HoQDvfB4c3JWQnEdTKM_AEaUiyj9fQIvJHuPVNpFpN24GZPL1vMKAAq_y6_COjDpOMD3SuxXWExcr2nQqFm_oGAB_AhbOYKa_BpUfquxGBl?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Grok-2 has climbed to the top of the leaderboard. <a class="link" href="https://x.com/lmsysorg/status/1827041269534879784?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Grok-2’s performance is comparable to <b>top LLMs</b>, only losing to the latest GPT-4 model and Gemini-1.5 pro. 
Pretty impressive, considering xAI was only founded in March 2023 and has already caught up to the likes of OpenAI.</p><p class="paragraph" style="text-align:left;">Meanwhile, OpenAI launched a new <a class="link" href="https://www.cryptopolitan.com/openai-rolls-out-new-gpt-4o-model-in-chatgpt/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">GPT-4o</a> model in ChatGPT, enhancing multi-step reasoning and detailed explanations, leading to more accurate and logical responses. The new model <b>independently</b> generates images, surpassing DALL-E 3 in both speed and visual accuracy, improving the integration within ChatGPT. </p><p class="paragraph" style="text-align:left;">Users noticed <b>significant improvements</b> in both reasoning and image generation, leading to better quality outputs and a more seamless experience. </p><p class="paragraph" style="text-align:left;">Google didn’t just sit on the sidelines while xAI and OpenAI released new models. They unveiled <a class="link" href="https://techcrunch.com/2024/08/17/google-takes-on-openai-with-gemini-live/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Gemini Live</a>, a <b>conversational AI voice assistant</b>, at its Made by Google event alongside new Pixel phones, AI photo editing tools, and Pixel Buds Pro 2 with Gemini AI. 
The Gemini Live demo had some issues during the presentation.</p><h3 class="heading" style="text-align:left;" id="qs-efficiency-gains-and-jassys-long">Q’s Efficiency Gains and Jassy&#39;s Long-Term Confidence</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeJYm4hv012lg2DJS3Jv2r2aXA-AXbxN_D-KLrajdML5G0VI88Bwf6kGdpussNzl8iSYfa1vLkwh6vRmoDjZwNBfUk4KGdMhqkflBx6QXg2dcxPeg76TsTs6PfPXa42u5GGVGPvny6TeaX5NKs_vK7Rp2M?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Amazon CEO Andy Jassy. <a class="link" href="https://www.crn.com/news/ai/2024/amazon-q2-2024-earnings-ceo-jassy-very-bullish-on-ai-in-long-run?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen#:~:text=Amazon%20AI%20A%20Long%2DTerm,multibillion%20dollar%20revenue%20run%20rate.%E2%80%9D" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Amazon remains &quot;<a class="link" href="https://finance.yahoo.com/news/amazon-ceo-andy-jassy-says-213018283.html?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">very bullish</a>&quot; on AI&#39;s medium to long-term impact across all businesses, with their AI business already at a &quot;<b>multibillion dollar revenue run rate&quot;. 
</b></p><p class="paragraph" style="text-align:left;">We’re seeing Amazon leverage AI in its e-commerce business, including an AI shopping assistant called Rufus, apparel simulation features, and a &quot;Project Private Investigator&quot; using AI and computer vision to <b>detect product defects</b>.</p><p class="paragraph" style="text-align:left;">On AI costs, Jassy stated Amazon has developed expertise in managing capacity for AWS and AI customers. While investing significantly in AI infrastructure, they still see more demand than current capacity. AWS brought in <b>$26.3 billion in Q2</b>, up 19% year-over-year, with an annualized revenue run rate over $105 billion. </p><p class="paragraph" style="text-align:left;">Jassy also mentioned that Amazon’s AI assistant, <a class="link" href="https://www.crn.com/news/ai/2024/amazon-q2-2024-earnings-ceo-jassy-very-bullish-on-ai-in-long-run?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen#:~:text=Amazon%20AI%20A%20Long%2DTerm,multibillion%20dollar%20revenue%20run%20rate.%E2%80%9D" target="_blank" rel="noopener noreferrer nofollow">Amazon Q</a>, has significantly reduced software upgrade times, cutting the average time to upgrade an application to Java 17 from 50 developer days to <b>just a few hours</b>.</p><p class="paragraph" style="text-align:left;">This efficiency has saved Amazon an estimated <b>4,500 developer-years</b> of work, with 79% of AI-generated code reviews being shipped without additional changes. The upgrades have not only saved time but also enhanced security and reduced infrastructure costs, providing an estimated $260 million in annualized efficiency gains.</p><p class="paragraph" style="text-align:left;">Amazon Q&#39;s success comes after initial challenges, including issues with incorrect outputs or &quot;<b>hallucinations</b>&quot;. 
These were addressed by expanding the team of human reviewers to fine-tune the chatbot&#39;s outputs.</p><h3 class="heading" style="text-align:left;" id="ai-chip-race-heats-up-as-huawei-cha">AI Chip Race Heats Up as Huawei Challenges Nvidia While Softbank Pivots from Intel</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdpNxBxenY1C7uxpJCLrlZ4MG7BdZaYiSyHKQu_zsgLcix6RV7mcTeEgZ_pPFAa0VHJzS3C_frKp2hw0y0pk4hRZy2jYwfvkpE-nWWHLilYFSC2MZlkuznKkVJ3Z4w6n_-rAg1rppj2i-zqy_kFfw_vx88?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Huawei’s latest AI chip. <a class="link" href="https://www.huaweicentral.com/huawei-silently-testing-ascend-910c-ai-chip-to-rival-nvidia-report/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">There always seems to be something going on in the AI chip market. Huawei is developing the <a class="link" href="https://www.tomshardware.com/tech-industry/artificial-intelligence/huawei-already-has-a-new-chip-to-rival-nvidia-ai-gpus?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Ascend 910C AI chip </a>to <b>compete </b>with Nvidia&#39;s GPUs in the Chinese market, particularly against the HGX H20 and the rumored Blackwell-based B20.</p><p class="paragraph" style="text-align:left;">Major Chinese companies like Baidu and China Mobile have <b>already</b> tested the chip, with results reportedly on par with Nvidia&#39;s H100. 
</p><p class="paragraph" style="text-align:left;">Expected demand for the Ascend 910C could exceed <b>70,000 units,</b> with shipments targeted to start in October, but production isn’t going as smoothly as expected because of U.S. sanctions.</p><p class="paragraph" style="text-align:left;">The Ascend 910C <b>aims to improve</b> upon its predecessor, the Ascend 910B, by addressing yield issues and enhancing performance.</p><p class="paragraph" style="text-align:left;">Previously, we looked at how Intel’s discussions with<a class="link" href="https://genai360.beehiiv.com/p/of-strawberries-and-models?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow"> OpenAI were a huge turning point </a>in the AI chip market. Intel’s story continues as <a class="link" href="https://www.cryptopolitan.com/softbank-ends-ai-chip-partnership-with-intel/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">SoftBank</a> <b>halted</b> its AI chip development partnership with Intel, since Intel had issues meeting production volume and speed requirements. SoftBank is now negotiating with TSMC, the world&#39;s largest contract chipmaker, for AI chip production.</p><p class="paragraph" style="text-align:left;">SoftBank&#39;s Project Izanagi aims to develop AI processors to rival Nvidia’s GPUs, initially relying on Intel’s capacity but now looking to <b>TSMC</b>. 
SoftBank plans to establish AI data centers globally by 2026 and is developing AI chips with Arm, targeting a prototype by 2025.</p><h3 class="heading" style="text-align:left;" id="ai-image-generation-leaps-forward-w">AI Image Generation Leaps Forward With Google’s Imagen 3, Runway’s Gen-3, and Midjourney&#39;s Unified Editor</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfHOzZh2l3OR3nAu4c8Z-r_J0uTNKxdIBfzJTgLs2Fjlbe4_nUQg2L3LqZPr10ZMGEeR9DTg9C64NCueYQpAhWcmB2OWvBuvdk86TF-Kwu5B60zmOnCZpfN4MYnhapS98schg5f_qfkFfswmJpn8-jjOVAc?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Example of an image produced by Imagen 3. <a class="link" href="https://www.theverge.com/2024/8/15/24221218/google-ai-image-generator-imagen-3-available?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;">After releasing the next generation of <a class="link" href="https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Gemma 2 models</a>, Google released <a class="link" href="https://www.theverge.com/2024/8/15/24221218/google-ai-image-generator-imagen-3-available?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Imagen 3</a>, an advanced AI text-to-image generator. 
This version was first announced during<b> Google I/O in May 2024.</b></p><p class="paragraph" style="text-align:left;">Imagen 3 introduces <b>significant improvements</b> in image generation, producing visuals with better detail, richer lighting, and fewer distracting artifacts compared to previous iterations. It aims to enhance realism and reduce errors in the generated images.</p><p class="paragraph" style="text-align:left;">Users can interact with the generated images by <b>highlighting specific sections </b>and applying changes based on their descriptions, offering a more refined and customizable image creation experience.</p><p class="paragraph" style="text-align:left;">AI image generators continued to see more releases with <a class="link" href="https://venturebeat.com/ai/runways-gen-3-alpha-turbo-is-here-and-can-make-ai-videos-faster-than-you-can-type/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Runway ML </a>officially launching<b> Gen-3 Alpha Turbo</b>, an upgraded AI video generation model, promising seven times faster performance at half the cost compared to its predecessor, Gen-3 Alpha.</p><p class="paragraph" style="text-align:left;">The new model is accessible across <b>all subscription plans</b>, including free trials. It&#39;s priced at 5 credits per second of generated video, making it more affordable and widely available.</p><p class="paragraph" style="text-align:left;">Gen-3 Alpha Turbo prioritizes speed, <b>significantly reducing video generation time. 
</b>This improves workflow efficiency, particularly for users needing quick turnarounds.</p><p class="paragraph" style="text-align:left;">Early users have praised the model&#39;s <b>speed and quality,</b> with some still favoring the original for certain use cases - though the faster version is well-received for simpler tasks.</p><p class="paragraph" style="text-align:left;">Midjourney introduced a <a class="link" href="https://venturebeat.com/ai/midjourney-releases-new-unified-ai-image-editor-on-the-web/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">new web editor</a> that unifies inpainting, outpainting, and other tools into a single interface, making it easier to edit AI-generated images seamlessly. The new editor includes a <b>more precise virtual brush</b> for inpainting, replacing older tools, allowing for finer control over image edits.</p><h3 class="heading" style="text-align:left;" id="harveys-impressive-retention">Harvey’s Impressive Retention</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdSgSaLHh4tNp7AvRI2xDpxoPkMFIHoetvdFfHZ69QrT1I6f8InsCR9Qwrl4TGSk-8_9LUMaCrycDKNI2GLL8dKPJ9m8rvKZrqvf4qrwwV8VXrHDIhblv1xCGbug8KKXj6g4B2pqGqjkPSrAktAvly3V0e8?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Harvey showed a notable increase in user retention. 
<a class="link" href="https://www.harvey.ai/blog/a-new-era-for-technology-adoption-in-professional-services?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.harvey.ai/blog/a-new-era-for-technology-adoption-in-professional-services?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Harvey,</a> an AI platform for professional services, has seen significant growth in utilization, more than <b>doubling</b> from 33% in August 2023 to 69% in August 2024 across all users.</p><p class="paragraph" style="text-align:left;">User retention rates for Harvey have remained <b>consistently high</b>, hovering around or above 70% after one year - exceptional for enterprise SaaS and legal tech.</p><p class="paragraph" style="text-align:left;">In particular, three case studies of BigLaw firms show <b>rapid and substantial adoption of Harvey: </b></p><ul><li><p class="paragraph" style="text-align:left;">Firm #1 reached 93% utilization by month 12</p></li><li><p class="paragraph" style="text-align:left;">Firm #2 jumped from 19% to 97% utilization in one month</p></li><li><p class="paragraph" style="text-align:left;">Firm #3 exceeded 100% utilization from month 4 onwards, peaking at 128% by month 10.</p></li></ul><p class="paragraph" style="text-align:left;">Harvey&#39;s success in rapid onboarding and consistent usage over time highlights its effectiveness in delivering <b>immediate value</b>, ease of integration into existing systems, and potential to provide firms with competitive advantages in service delivery and client satisfaction.</p><p class="paragraph" style="text-align:left;">Not to mention that 
Harvey recently had a <a class="link" href="https://www.forbes.com/sites/aliciapark/2024/08/08/why-openai-and-google-are-betting-on-this-ai-unicorn-with-a-100-million-deal/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">successful funding round</a> involving tech giants like OpenAI and Google. Harvey’s chatbot was able to impress OpenAI executives with an 86% accuracy rate, with the deal making Harvey the<b> highest value startup in OpenAI’s portfolio.</b></p><h3 class="heading" style="text-align:left;" id="claude-hits-1-m-mobile-milestone">Claude Hits $1M Mobile Milestone</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXd9GIn9zGLgUhn9F7tUk27qiM6leZaAJHCBL4o32UHeZNEwgWQPOaWNZ7GRAEdhKUjWA2bFovf8IbP9ikL0h4anrloyFwpiKuhpKCPXrSvuH8u_QOmuUMiOxat19XzY3W4TpmCcDa3URichQElVcg7jVD5j?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Graph of consumer spending in Claude App in recent weeks. 
<a class="link" href="https://techcrunch.com/2024/08/21/anthropics-claude-surpasses-1m-in-mobile-app-revenue/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Anthropic’s Claude has crossed <a class="link" href="https://techcrunch.com/2024/08/21/anthropics-claude-surpasses-1m-in-mobile-app-revenue/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">$1 million</a> in gross mobile app revenue across iOS and Android in 16 weeks since launch. Nearly half (48.4%) of Claude&#39;s mobile revenue has been <b>generated by users in the</b> <b>US.</b></p><p class="paragraph" style="text-align:left;">It’s certainly an impressive milestone, but Claude still<b> ranks far behind ChatGPT,</b> which is No. 1 by overall downloads and No. 26 by revenue in the US on iOS. Claude is only 95th in the Productivity category by downloads and 68th by revenue.</p><p class="paragraph" style="text-align:left;">Claude reached the <b>$1 million revenue mark faster </b>than competitors like Microsoft&#39;s Copilot (19 weeks) and Perplexity (22 weeks), but significantly slower than ChatGPT (3 weeks).</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">Google not only showed that 7B models can be run in browsers, but they also published an interesting paper that made notable progress in <b>black-box optimization</b>. 
Moreover, another paper looking at the impact of code data in LLM pre-training also caught our eye.</p><p class="paragraph" style="text-align:left;">We saw notable advancements in automated theorem proving, efficient model upcycling, and novel frameworks for <b>evaluating AI systems.</b></p><h3 class="heading" style="text-align:left;" id="googles-vizier-algorithm-outperform">Google&#39;s Vizier Algorithm Outperforms Industry Baselines in Black-Box Optimization</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdS-3GsbKUuP6cTrQd6Rq--pjlpv5vLmx6Q5tC1QxSHIlVDjkd27nQXo5pNX93QJ538WxZVsoSMfN58awgoC5EeP70JkDLGB7_qCttuEbuKCWY797DFPhhV_UZqJbWHGNUpOYHogm9oR7W69flfuN4wi2s?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Main components of the algorithm. <a class="link" href="https://arxiv.org/pdf/2408.11527v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers from Google DeepMind have <a class="link" href="https://arxiv.org/pdf/2408.11527v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">formalized and open-sourced</a> the algorithm behind Google Vizier, one of the <b>world&#39;s largest black-box optimization services. 
</b></p><p class="paragraph" style="text-align:left;">The paper looks at the challenge of creating a robust, versatile, and production-grade Bayesian optimization system that can handle a wide range of <b>optimization scenarios</b>, from high-dimensional spaces to categorical parameters and multi-objective optimization.</p><p class="paragraph" style="text-align:left;">The algorithm employs a <b>Gaussian process bandit optimization approach</b> with several key innovations, including sophisticated input and output preprocessing, flexible acquisition functions with trust regions, and a customized Firefly algorithm for acquisition optimization. </p><p class="paragraph" style="text-align:left;">DeepMind’s Vizier algorithm consistently outperforms other industry-wide baselines across multiple axes, including non-continuous parameters, high-dimensional spaces, batched settings, and multi-metric objectives.</p><h3 class="heading" style="text-align:left;" id="code-in-llm-pretraining-improves-na">Code in LLM Pre-training Improves Natural Language Reasoning by 8.2%</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcWo-Vn4C7u6FqZk2fw3bMBEpEuaEiRmBvoD1lGgU_h_boMj4mtoyT0nhNA8C7LZEUT-1cx38yf_DAgqUwAtC5VZIrksZGVHjI19neTi03wDxR9WIkW5aQ8YILDsz99aUfLmAp5FEgR9GpDiERZXJrOw0xE?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Framework overview. 
<a class="link" href="https://arxiv.org/pdf/2408.10914?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Cohere researchers conducted a comprehensive study to understand the impact of code data in <b>pre-training LLMs</b> on a variety of downstream tasks beyond code generation.</p><p class="paragraph" style="text-align:left;">The team employed a <b>systematic approach,</b> conducting extensive ablations across various dimensions:</p><ol start="1"><li><p class="paragraph" style="text-align:left;">Model initialization strategies</p></li><li><p class="paragraph" style="text-align:left;">Different proportions of code data</p></li><li><p class="paragraph" style="text-align:left;">Quality and properties of code data</p></li><li><p class="paragraph" style="text-align:left;">Role of code in pre-training cooldown</p></li></ol><p class="paragraph" style="text-align:left;">Their experiments spanned models ranging from <b>470 million to 2.8 billion </b>parameters, evaluating performance on natural language reasoning, world knowledge, code generation, and open-ended text generation tasks.</p><p class="paragraph" style="text-align:left;">Key findings include:</p><ul><li><p class="paragraph" style="text-align:left;">Compared to text-only pre-training, the best variant with code data showed relative increases of 8.2% in natural language reasoning, 4.2% in world knowledge, and a 6.6% improvement in generative win-rates</p></li><li><p class="paragraph" style="text-align:left;">Code performance saw a dramatic 12x boost</p></li><li><p class="paragraph" style="text-align:left;">Including code during the cooldown phase led to further improvements across all tasks</p></li><li><p class="paragraph" style="text-align:left;">High-quality synthetic code data, even in small 
proportions, had a strong positive impact on both code and non-code task performance</p></li></ul><p class="paragraph" style="text-align:left;">Results make it clear that code is a <b>critical building block</b> for generalization far beyond coding tasks. Investments in code quality and preserving code during pre-training can have positive impacts across a wide range of AI capabilities.</p><h3 class="heading" style="text-align:left;" id="automating-the-full-cycle-of-ml-res">Automating the Full Cycle of ML Research With The AI Scientist</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXd-CNzdq3FK1473ytmoXmSaySa2QGsm_UXLzfROds66TPs8eYo9HhyaPn1tgJFTaVsBzFZvgVHjNzYmj0IMvYvd0YDaDoIuMUVL_npK6G_enmAhVGyk1pgaoeNTZqirmztTs072IxG8E2cd6Zn0YySAZTwQ?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>The AI Scientist overview. <a class="link" href="https://arxiv.org/pdf/2408.06292v2?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Sakana AI developed The AI Scientist, which aims to automate the entire scientific discovery process in ML, from idea generation to paper writing and peer review. It addresses the challenge of scaling up scientific research and democratizing access to cutting-edge AI developments by <b>leveraging LLMs</b> to perform tasks traditionally done by human researchers.</p><p class="paragraph" style="text-align:left;">The AI Scientist uses a <b>combination of LLM-based agents</b> for idea generation, experiment design and execution, paper writing, and peer review. It employs techniques such as chain-of-thought reasoning, self-reflection, and automated code generation to carry out complex research tasks. 
The system was tested on three ML subfields: diffusion modeling, transformer-based language modeling, and learning dynamics.</p><p class="paragraph" style="text-align:left;">Results show that The AI Scientist can generate <b>hundreds of research papers</b> at a surprisingly low cost (approximately $15 per paper), with some papers achieving scores that exceed the acceptance threshold for top ML conferences according to an automated reviewer. The framework demonstrates the potential for AI to significantly accelerate scientific progress and lower barriers to entry in AI research.</p><h3 class="heading" style="text-align:left;" id="boosting-llm-decision-making-in-int">Boosting LLM Decision-Making in Interactive Environments Using AgentQ</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXed4yJqjUqBbxiPH--CgwzGS8BzIvr9BORmoC3WFmPapHNDVCenkXsBW6EHB10DrEennoSBeQE7dmZYtXGW7hGxm8r36FVdeyuF5S6XVFJhla__yMw4HYMGvgTEoMl-Qm2DFx3Rwh8SWej8dg7upm1Tacit?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Example of an input and output to the Agent. 
<a class="link" href="https://arxiv.org/pdf/2408.06292v2?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">MultiOn and Stanford University researchers introduced Agent Q - a framework that aims to improve the reasoning and decision-making capabilities of LLMs in interactive, <b>multi-step environments like web navigation.</b> </p><p class="paragraph" style="text-align:left;">It tackles the issue of generalizing LLMs to <b>agentic tasks</b>, where they need to understand how their actions affect the environment and make complex decisions over multiple steps.</p><p class="paragraph" style="text-align:left;">Agent Q combines guided Monte Carlo Tree Search (MCTS) with a <b>self-critique mechanism</b> and iterative fine-tuning using an off-policy variant of Direct Preference Optimization (DPO). The framework uses AI feedback and self-criticism to guide search steps, and learns from both successful and unsuccessful trajectories through offline reinforcement learning.</p><p class="paragraph" style="text-align:left;">Drastic improvements were seen on the WebShop benchmark and real-world booking scenarios. 
For example, it boosts a Llama-3 70B model&#39;s zero-shot performance from an 18.6% to an 81.7% success rate on a <b>real-world reservations booking website</b> after a single day of data collection, and further to 95.4%.</p><p class="paragraph" style="text-align:left;">It’s worth noting that the approach used by Agent Q was also used by <b>Salesforce </b>to achieve 55% on SWE-Bench Lite - a benchmark used to test how well an AI model can solve GitHub issues, with the Lite version being a slightly easier benchmark than the original SWE-Bench.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXc70Co8UeBe-1ByDFkGgTTz-vmvkiJQtXM1thOmjAbwUcWnwhaAjNvMed0SPJ3VrxAD87m2RK2wSvm0hCbFLmIKd727Mt3JYOakaxjN3hiVwrx7uEjpM1Qpu3T0UWtrzdtQNVT-EJWu6pPIBjzyzDQ2g-A?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Salesforce also used a similar approach. <a class="link" href="https://x.com/rm_rafailov/status/1823892301837746623?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><h3 class="heading" style="text-align:left;" id="deep-seek-advances-automated-math-r">DeepSeek Advances Automated Math Reasoning</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdejNO9sBc7NU3tv94nCXz_1zRa3_Cwxum50s1oNnCu620ybQ0uriOKkFcmTwI7L3Yc3fEkPfJjX8QP7DfnOkn79lP5uUW6n9zYxg3pAtqctezFCkcjjwknuvKT2_75eLVnT5dSemHOdp8JAg3Qm_UvjpEN?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>DeepSeek-Prover-V1.5 framework overview. 
<a class="link" href="https://arxiv.org/pdf/2408.08152?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">DeepSeek-Prover-V1.5 is an open-source language model designed for theorem proving in Lean 4 - an area where automating complex mathematical reasoning remains tricky for language models. The model is a step up from its predecessor, <b>optimizing both training and inference processes</b> to improve performance on formal theorem proving tasks.</p><p class="paragraph" style="text-align:left;">Some of the key components include:</p><ul><li><p class="paragraph" style="text-align:left;">Large-scale mathematical pre-training</p></li><li><p class="paragraph" style="text-align:left;">Formal mathematics corpus construction and augmentation</p></li><li><p class="paragraph" style="text-align:left;">Online reinforcement learning from proof assistant feedback</p></li><li><p class="paragraph" style="text-align:left;">Novel Monte-Carlo tree search methodology for long-term planning in theorem proving </p></li></ul><p class="paragraph" style="text-align:left;">DeepSeek-Prover-V1.5 uses a combination of supervised fine-tuning, reinforcement learning, and a new variant of Monte-Carlo tree search called RMaxTS, which employs an <b>intrinsic-reward-driven exploration strategy</b> to generate diverse proof paths.</p><p class="paragraph" style="text-align:left;">Results show significant improvements over DeepSeek-Prover-V1, achieving SOTA results on the test set of the <b>high school level miniF2F benchmark</b> (63.5%) and the undergraduate level ProofNet benchmark (25.3%).</p><h3 class="heading" style="text-align:left;" id="efficient-upcycling-of-dense-models">Efficient Upcycling of Dense Models Into MoE With BAM</h3><div class="image"><img alt="" 
class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXd_ci7sMX7o1-4DthQD5xXpOVrq4DszXzcde9TX3SghnPmFzppBVx0RfPF9XxpWD0iy51blD8ZPeNj_JybIUZav55ut3MCq7QlhiXvNCemFIpQaCXyoFhmgbWcaVOQx2fSMOL59LAyzg7b97XPFwFxeIJFO?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>The three phases of BAM. <a class="link" href="https://arxiv.org/pdf/2408.08274?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">BAM (Branch-Attend-Mix) is a new approach to efficiently upcycle pre-trained dense models into Mixture of Experts (MoE) models. Initializing MoEs is tough since they’re <b>computationally expensive</b> to train from scratch, so BAM makes full use of specialized dense models&#39; parameters to tackle this problem.</p><p class="paragraph" style="text-align:left;">BAM operates in three phases: </p><ol start="1"><li><p class="paragraph" style="text-align:left;">Branching (creating copies of a pre-trained seed model)</p></li><li><p class="paragraph" style="text-align:left;">Continued pre-training (specializing each copy on different domains)</p></li><li><p class="paragraph" style="text-align:left;">Mixture model training (initializing MoE layers using the specialized models)</p></li></ol><p class="paragraph" style="text-align:left;">It introduces a soft variant of Mixture of Attention (MoA) layers and employs a <b>parallel attention transformer architecture </b>to improve efficiency.</p><p class="paragraph" style="text-align:left;">BAM consistently outperforms baseline methods in both perplexity and downstream task performance across various domains, with experiments conducted on seed models ranging from <b>590 million to 2 billion parameters</b>. 
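</p><p class="paragraph" style="text-align:left;">The three phases above can be sketched with toy integer weights. Everything below (the dict-of-lists “model”, the domain names, the per-domain weight shifts) is an illustrative stand-in, not BAM’s actual training code:</p>

```python
import copy

# Toy "dense seed model": named parameter blocks as integer lists.
seed_model = {"attn": [1, 2], "ffn": [3, 4]}

# Phase 1 - Branching: one full copy of the seed model per target domain.
domains = ["code", "math", "law"]
branches = {d: copy.deepcopy(seed_model) for d in domains}

# Phase 2 - Continued pre-training: specialize each branch on its domain.
# (Stubbed here as a per-domain shift; BAM does real domain pre-training.)
SHIFT = {"code": 10, "math": 20, "law": 30}

def continue_pretraining(model, domain):
    return {name: [w + SHIFT[domain] for w in block]
            for name, block in model.items()}

branches = {d: continue_pretraining(m, d) for d, m in branches.items()}

# Phase 3 - Mixture model training: initialize the MoE's expert FFN *and*
# expert attention layers (the soft MoA idea) from the specialized
# branches, then train the combined model jointly.
moe_init = {
    "ffn_experts": {d: branches[d]["ffn"] for d in domains},
    "attn_experts": {d: branches[d]["attn"] for d in domains},
}
```

<p class="paragraph" style="text-align:left;">Unlike standard upcycling, which typically reuses only the FFN weights, BAM also seeds attention experts from each branch’s attention weights. 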
It’s a big step forward in MoE initialization, so we might see more efficient training of large-scale language models with superior performance in the future.</p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.12601?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">DreamCinema</a>: Simplifies film creation by using generative AI to automate the production of 3D characters and cinematic elements</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.12579?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">RuleAlign</a>: Enhances LLMs like GPT-4 for medical diagnostics by aligning them with specific diagnostic rules</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.12525?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">PCGRL+</a>: Designed to train AI agents to generate game levels based on specific quality metrics.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2405.14755?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">SigLLM</a>: Uses large language models for time series anomaly detection by converting time series 
data to text</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://ewrfcas.github.io/MVInpainter/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">MVInpainter</a>: Reformulates 3D editing as a multi-view 2D inpainting task, enabling novel view synthesis and editing for in-the-wild scenes without relying on explicit camera poses. </p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.08067?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">RAGchecker</a>: A comprehensive evaluation framework for Retrieval-Augmented Generation (RAG) systems that incorporates diagnostic metrics for both retrieval and generation modules. </p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email and say hi :) </p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">Wolfram’s <b>highly detailed discussion</b> about his perspective on ML took the spotlight. Although there was a big focus on the theoretical aspects, it’s still useful to consider as it can provide a different perspective for practical applications of ML. </p><p class="paragraph" style="text-align:left;">While the strange Strawberry account was gaining all that attention, there was a pretty notable advancement that slipped under the radar with rStar. 
Moreover, the Y Combinator CEO let AI startups in on a little secret about how they can quickly build trust with customers using <b>golden magic demos.</b></p><h3 class="heading" style="text-align:left;" id="from-randomness-to-intelligence-wol">From Randomness to Intelligence: Wolfram&#39;s New Perspective on ML</h3><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeK8g8MlU1N6_seKICsLU-jKWvkHqKAG9SRzI6V5t_76zZJvZQVRDamneS7JDisGFhy5lb-67fd_yINK73w4n149dm22izmkdJXrb8q3OqRh0cyEiK2iUkBQIGFxVp3T9dn9cvYn_Zf3ni7LRwGf3lTPgkF?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Wolfram’s discussion about machine learning. <a class="link" href="https://x.com/stephen_wolfram/status/1826692234554875979?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">A recent blog post by Stephen Wolfram caught our attention this week, offering intriguing insights into the <b>fundamental workings of machine learning systems</b>. Wolfram explores minimal models that capture the essence of machine learning, stripping away complexities to reveal core principles.</p><p class="paragraph" style="text-align:left;">Wolfram introduces <b>simple, visualizable</b> models like &quot;rule arrays&quot; that can perform machine learning tasks. These models suggest that machine learning works by &quot;sampling&quot; from the computational universe rather than building structured mechanisms. 
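</p><p class="paragraph" style="text-align:left;">That “sampling” idea can be illustrated with a tiny adaptive-evolution loop in the spirit of rule arrays (the XOR target and the mutate-and-keep rule below are our own illustrative stand-ins, not Wolfram’s actual models): it fits a function purely by keeping random mutations that don’t make things worse, without constructing any structured mechanism:</p>

```python
import random

random.seed(0)

# Toy "rule array": a lookup table from 2-bit inputs to 1-bit outputs.
target = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}  # XOR, for illustration
rules = {k: 0 for k in target}                          # all-zero start

def loss(r):
    # Number of inputs where the rule table disagrees with the target.
    return sum(r[k] != target[k] for k in target)

# Adaptive evolution: flip one random entry and keep the flip only if the
# loss does not get worse. No gradients, no structure - just sampling.
while loss(rules) > 0:
    k = random.choice(list(rules))
    mutated = dict(rules)
    mutated[k] ^= 1
    if loss(mutated) <= loss(rules):
        rules = mutated
```

<p class="paragraph" style="text-align:left;">The trained table works, but nothing in the process built an interpretable mechanism - it simply found a configuration that happens to fit, which is Wolfram’s point. 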
</p><p class="paragraph" style="text-align:left;">He also argues that the power of machine learning comes from leveraging computational irreducibility as a &quot;natural resource.&quot; Moreover, Wolfram mentions that we’re at a point where we can achieve notable results with machine learning techniques like neural networks, but we don’t truly understand how we’re able to get such results. </p><h3 class="heading" style="text-align:left;" id="the-overlooked-ai-reasoning-breakth">The Overlooked AI Reasoning Breakthrough Amid Strawberry Hype</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe5O43rx9OUlj0yHg2GXoBH09yNv6to8-p8x9cwLxFeH7yaQiWHu9R2feHvuxLEplbU5WCMBYCUi1OoEJSXqJDphUq-dj96iVGeCwVo3bm236XNqQ4d3JUTbwgBA1wgzvU5fuiOBhlhji4kxOXqCuKcln2e?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Tekparmak brought attention to an overlooked advancement. <a class="link" href="https://x.com/AtakanTekparmak/status/1823776878747877572?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Amidst all the Strawberry hype, a thread by Tekparmak discussing a new approach called rStar caught our attention. This method uses a <b>generator LLM</b> to create solution trajectories and a discriminator LLM for &quot;peer review,&quot; reminiscent of the GAN architecture.</p><p class="paragraph" style="text-align:left;">rStar has demonstrated superior performance among multi-round self-improving approaches for AI reasoning tasks. 
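</p><p class="paragraph" style="text-align:left;">In rough pseudocode (with trivial stand-ins for both models - the hard-coded candidate trajectories and the fixed discriminator answer are purely illustrative), the generate-then-verify loop looks like this:</p>

```python
def generator_llm(question):
    # Stand-in for the generator LLM, which uses MCTS to propose several
    # candidate reasoning trajectories; here they are simply hard-coded.
    return [("path A", 41), ("path B", 42), ("path C", 42),
            ("path D", 43), ("path E", 42)]

def discriminator_llm(question, partial_path):
    # Stand-in for the discriminator LLM: it completes the partial
    # trajectory independently and returns its own answer.
    return 42

def rstar_answer(question):
    candidates = generator_llm(question)
    # "Peer review": keep trajectories where the discriminator's independent
    # completion agrees with the generator's answer, then majority-vote.
    agreed = [ans for path, ans in candidates
              if discriminator_llm(question, path) == ans]
    return max(set(agreed), key=agreed.count) if agreed else None
```

<p class="paragraph" style="text-align:left;">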
This means that compared to other methods that use <b>multiple iterations</b> or rounds to improve their performance, rStar is currently achieving the best results.</p><p class="paragraph" style="text-align:left;">Interestingly, base model generators perform well with instruct discriminators, and the gap between GPT-4 and <b>smaller open-weight models</b> like Phi-3 mini is not as significant as one might expect.</p><p class="paragraph" style="text-align:left;">While breakthroughs like OpenAI&#39;s Project Strawberry generate buzz, there&#39;s still ample room for innovative approaches using<b> existing models and techniques.</b></p><h3 class="heading" style="text-align:left;" id="how-ai-startups-can-win-with-powerf">How AI Startups Can Win With Powerful Demos</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdiYrQB1zWJ-ZrYL7u7L_g04xAdAWB8K_1HM5cYNqBK3JqHMLzcqSjSZGdgPEMJWdi95rIQS10vKAiBuO_j_GYlwDng7xQepY3kVh2kt7yR-COkvjEXv1B3I6E7FJqCegpP2mqNRRF6h5u03c5ULMywMJs?key=ABXGbecGfkiQdR3IaqjByw"/><div class="image__source"><span class="image__source_text"><p>Tan’s discussion about the power of a golden magic demo. <a class="link" href="https://x.com/garrytan/status/1823437129323868484?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;">Y Combinator CEO Garry Tan raised an interesting point about just how important a golden magic demo is for<b> AI startups</b>. But why is that?</p><p class="paragraph" style="text-align:left;">He mentions that a golden magic demo shows the benefit and value of the product right away, indicating how customers could accomplish days’ worth of work in just 10 minutes. 
Repetitive tasks are a massive pain point customers face, so showing how AI can automate these tasks and <b>boost productivity</b> is a great way for startups to immediately show their solution can solve an important problem.</p><p class="paragraph" style="text-align:left;">Casetext was used to highlight this point, with the example of AI being able to quickly detect nuances in emails for lawyers to use as evidence for potential fraud. Their golden magic demo showed how<b> lawyers could save a ton of time</b> with AI, so they could quickly see the power of Casetext’s solution by the end of the demo.</p><p class="paragraph" style="text-align:left;">Tan also mentioned that a successful golden magic demo is something that Y Combinator sees in a lot of successful LLM-based startups, which makes it pretty <b>encouraging</b> for other startups to do the same.</p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">We saw three successful funding rounds for Story, Cursor, and Defcon - all of which were under <b>$100 million</b>. Interestingly, each company focuses on pretty different applications in AI: Story in blockchain, Cursor in coding, and Defcon in military logistics.</p><p class="paragraph" style="text-align:left;">Opkey and Elise AI also had successful funding rounds, with Opkey raising $47 million and Elise AI raising $75 million. 
On the other hand, AMD decided to challenge Nvidia’s chip dominance by acquiring <b>ZT Systems for $5 billion.</b></p><h3 class="heading" style="text-align:left;" id="amd-acquires-zt-systems-for-5-b-to-">AMD Acquires ZT Systems for $5 Billion to Challenge Nvidia</h3><p class="paragraph" style="text-align:left;">AMD has made a bold move in the AI chip market by agreeing to acquire ZT Systems, a New Jersey-based server maker, for nearly <a class="link" href="https://www.linkedin.com/news/story/amd-deal-takes-aim-at-nvidia-6125252/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">$5 billion</a> in cash and stock. This acquisition, which combines AMD&#39;s silicon and software capabilities with ZT&#39;s systems expertise, aims to accelerate the deployment of AMD-optimized data center AI solutions at <b>scale for cloud and enterprise customers.</b></p><h3 class="heading" style="text-align:left;" id="story-raises-80-million-in-series-b">Story Raises $80 Million in Series B Funding Round</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/08/21/story-raises-83m-at-a-2-25b-valuation-to-build-a-blockchain-for-the-business-of-content-ip-in-the-age-of-ai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Story</a>, a startup building a <b>blockchain-based</b> platform for IP tracking and monetization in the age of AI, has secured $80 million in Series B funding led by Andreessen Horowitz&#39;s crypto division, with participation from Polychain Capital and other notable investors. The round values Story at $2.25 billion post-money and brings its total funding to $143 million. 
</p><h3 class="heading" style="text-align:left;" id="elise-ai-raises-75-million-in-serie">Elise AI Raises $75 Million in Series D Funding</h3><p class="paragraph" style="text-align:left;">EliseAI, a startup developing AI-powered property management tools, has secured a <a class="link" href="https://techcrunch.com/2024/08/14/eliseais-chatbots-for-property-owners-nets-it-75m-in-funding/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">$75 million </a>Series D funding round, valuing the company at $1 billion. Led by Sapphire Ventures with participation from Point72 Private Investments, Divco West, Navitas Capital, and Koch Real Estate Investments, this investment brings EliseAI&#39;s total funding to <b>$140 million.</b></p><h3 class="heading" style="text-align:left;" id="cursor-secures-60-million-in-series">Cursor Secures $60 Million in Series A Funding</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.cursor.com/blog/series-a?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">Cursor</a>, an AI-powered <b>coding</b> tool startup, has raised $60 million in Series A funding from prominent investors including Andreessen Horowitz, Thrive Capital, OpenAI, and notable tech founders. The company, which aims to create a &quot;magical tool&quot; for writing the world&#39;s software, has grown to over 30,000 customers across major enterprises, research labs, and startups. 
</p><p class="paragraph" style="text-align:left;">Cursor has even started to go <b>viral on X</b> off the back of a video showing how an 8-year-old can use it to start coding.</p><div class="image"><img alt="" class="image__image" style="border-radius:0px 0px 0px 0px;border-style:solid;border-width:0px 0px 0px 0px;box-sizing:border-box;border-color:#E5E7EB;" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeFZZtSIoX8TkskhsfArIh7TD6sSayTMXORNV5mISCQyFwQ_yPGpUq8NUBQvGAAMtTURwQzw0m-qlnInkevV4W8gg5u8ly2uFXE-GeKYHTNgSEwD7p0kJr2w_LwX8WRXh8zjjEiOJfbJeG52JHd86YhEeE?key=c8FDumnflB67qhwj04PXMQ"/><div class="image__source"><span class="image__source_text"><p>Cursor is making coding more accessible. <a class="link" href="https://x.com/rickyrobinett/status/1825581674870055189?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><h3 class="heading" style="text-align:left;" id="opkey-raises-47-million-in-series-b">Opkey Raises $47 Million in Series B Funding</h3><p class="paragraph" style="text-align:left;">Opkey, an AI-powered continuous test automation platform for enterprise systems, has secured <a class="link" href="https://www.business-standard.com/companies/news/opkey-raises-47-million-in-series-b-funding-led-by-peakspan-capital-124082200870_1.html?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=august-in-ai-grok-2-gpt-4-turbo-new-sota-for-text-to-image-video-gen" target="_blank" rel="noopener noreferrer nofollow">$47 million in Series B</a> funding led by PeakSpan Capital, with participation from existing investors. 
The funding will fuel Opkey&#39;s mission to accelerate product development and <b>expand its global market presence.</b></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=543fc6de-db2e-4f1c-86c5-2f0461c97949&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>🍓 in Fall, Magic&#39;s 100M Context, Alexa + Claude = ❤️</title>
  <description>Plus, Announcing RetrieveX Conference Tickets &amp; Discount</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/1b3edfa4-c938-4baa-84ee-e7b00b5813b4/mikayelh_3d_isometric_illustration_of_white_and_orange_cubes__f852dc6a-915e-45db-b701-312725ac81c0_1__2_.png" length="967789" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/strawberry-conf</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/strawberry-conf</guid>
  <pubDate>Tue, 03 Sep 2024 16:33:07 +0000</pubDate>
  <atom:published>2024-09-03T16:33:07Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">Before we start, share last week&#39;s news with a friend or a colleague:</p><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">OpenAI&#39;s Project Strawberry, set for potential release in fall 2024, aims to drastically <b>advance AI reasoning capabilities </b>and address current limitations of GPT-4, including complex multi-step problems and hallucinations.</p></li><li><p class="paragraph" style="text-align:left;">Magic developed <b>LTM</b> (Long-Term Memory) models capable of reasoning on up to 100M tokens of context during inference, equivalent to about 10 million lines of code or 750 novels.</p></li><li><p class="paragraph" style="text-align:left;">Jina AI revealed a <b>&quot;modality gap&quot;</b> in multimodal AI models, where text embeddings and image embeddings cluster in separate parts of the semantic space.</p></li><li><p class="paragraph" style="text-align:left;">Nvidia released <b>Eagle</b>, a family of vision-centric high-resolution multimodal LLMs that uses a channel-concatenation-based fusion and showed impressive performance on various multimodal benchmarks like GQA and MMMU.</p></li><li><p class="paragraph" style="text-align:left;">Google DeepMind used <b>weaker but cheaper </b>models for generating synthetic training data, achieving up to 31.6% relative gains compared to strong but expensive (SE) models.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? 
Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude"><span class="button__text" style=""> Subscribe </span></a></div><div class="section" style="background-color:#FFFFFF;border-color:#ff8a00;border-radius:2px;border-style:solid;border-width:2px;margin:10.0px 10.0px 10.0px 10.0px;padding:10.0px 10.0px 10.0px 10.0px;"><h2 class="heading" style="text-align:left;"><span style="color:#222222;">Announcing RetrieveX Conference on Oct 17 in San Francisco. 25% OFF For the Next 3 Days</span></h2><p class="paragraph" style="text-align:left;"><span style="color:#222222;">Executive in GenAI? Join RetrieveX, the top conference in retrieval for GenAI. Exclusively for those building high-accuracy, multimodal workflows, featuring leaders from Microsoft AI, YC, Bayer Radiology, Matterport, Cresta, as well as the creators of PyTorch and KubeFlow.</span></p><p class="paragraph" style="text-align:left;"><span style="color:#222222;">Check out with promo code </span><span style="color:#222222;"><b>LABORDAY25</b></span><span style="color:#222222;"> for 25% off (valid for the next three days). 
Prices are going up by the end of the week, so secure your spot sooner rather than later.</span></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://www.retrievex.co/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude"><span class="button__text" style="color:#F9FAFB;"> Get Tickets Today </span></a></div><p class="paragraph" style="text-align:left;"><span style="color:#222222;">Date: October 17, 10:30am - 7pm PT</span><br><span style="color:#222222;">Venue: The Midway, 900 Marin St, San Francisco</span><br><span style="color:#222222;">Attendees: 300 AI executives</span></p></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">There were quite a few model releases, ranging from <b>SLMs to text-to-image</b> models. Moreover, a chatbot arena update saw Grok-2 ranked very highly, which means xAI might take the top spot very soon.</p><p class="paragraph" style="text-align:left;">Not to mention that we heard some news about Project Strawberry and that OpenAI is struggling to keep up with expenses (despite having <a class="link" href="https://www.reuters.com/technology/artificial-intelligence/openai-says-chatgpts-weekly-users-have-grown-200-million-2024-08-29/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">200 million users)</a> due to inference costs. 
There are currently talks of a new funding round that would put them at a valuation of over <b>$100 billion.</b></p><p class="paragraph" style="text-align:left;">Moreover, OpenAI and Apple are set to collaborate on<a class="link" href="https://openai.com/index/openai-and-apple-announce-partnership/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow"> Siri (but not in Europe),</a> and Amazon is planning a similar play with Anthropic by releasing a (subscription) version of Alexa that is powered by <b>Claude</b>. <b>AI is the new streaming, folks.</b></p><h3 class="heading" style="text-align:left;" id="open-a-is-project-strawberry-might-">OpenAI’s Project Strawberry Might Release in Fall, Struggles With Expenses Despite Potential Funding Round, and California Bill SB 1047 Passed</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfF6rXdb1sAhYT60g0bHFFyxQl1JkVGQZz76_WHzQ1kocTedZxyKCSUyJStvE6WbcuTiIqSOMa9SDfWxKAuq7uJF_ThpFXRaZ-E-ygomNSFIyPJn-UlFCKq-2seym_3PsXzQZMnu3EvmZFlQG9Uvxy60WmX?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>We might see Project Strawberry release relatively soon.<a class="link" href="https://www.newsweek.com/opeanai-project-strawberry-chatgpt-next-gen-ai-model-1945977?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow"> (Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Project Strawberry is reportedly set for a <a class="link" href="https://www.newsweek.com/opeanai-project-strawberry-chatgpt-next-gen-ai-model-1945977?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">potential release in 
fall 2024.</a> As we talked about before, Strawberry has been rumored to drastically advance AI reasoning capabilities and even address the <b>current limitations</b> of <a class="link" href="https://www.reuters.com/technology/artificial-intelligence/openai-says-chatgpts-weekly-users-have-grown-200-million-2024-08-29/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">GPT-4. </a></p><p class="paragraph" style="text-align:left;">This includes issues like complex, multi-step problems and hallucinations that have posed problems for <b>GPT-4 in the past. </b></p><p class="paragraph" style="text-align:left;">OpenAI needs <span style="text-decoration:line-through;">Watermelon</span> Strawberry sugar, badly.<br><br>With 200 million MAUs, OpenAI faces steep inference costs. In 2024, the company is projected to incur expenses of <a class="link" href="https://www.theinformation.com/articles/why-openai-could-lose-5-billion-this-year?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">approximately $8.5 billion</a>. This is largely thanks to Microsoft&#39;s pricing structure, which charges OpenAI about $10.30 per hour for an eight-GPU server, compared to higher public rates (though this still comes to about $4B). Simultaneously, <b>new models are creeping up in terms of accuracy (hello, 350M+ lifetime downloads of Llama 3.1)</b>. 
Hence, OpenAI just needs to put out something impressive that would blow the competition out of the water, OR raise another funding round to cover the costs of compute.<br><br>I guess Sam Altman plans to do both, as the <b>next funding round </b>might put them at <a class="link" href="https://techcrunch.com/2024/08/29/apple-and-nvidia-could-be-openais-next-big-investors/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">$100 billion</a>, with Nvidia and Apple in talks to be a part of it.</p><h2 class="heading" style="text-align:left;" id="california-bill-sb-1047-ab-3211">California Bill SB 1047 & AB 3211</h2><p class="paragraph" style="text-align:left;">Meanwhile, California bill <a class="link" href="https://techcrunch.com/2024/08/15/california-weakens-bill-to-prevent-ai-disasters-before-final-vote-taking-advice-from-anthropic/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">SB 1047 was passed.</a> It’s a bill designed to prevent “disasters” caused by <b>AI systems before they occur</b>. This refers to serious events that would cause global issues, like using AI to cause a cyberattack that would result in more than $500 million in damages.</p><p class="paragraph" style="text-align:left;">It wouldn’t apply to every AI model, though - only the ones that are considered <b>large enough</b>, like GPT-4. 
<a class="link" href="https://techcrunch.com/2024/08/26/elon-musk-unexpectedly-offers-support-for-californias-ai-bill/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">The bill was supported by Musk,</a> who has been vocal about how the potential dangers of AI need to be monitored, despite the fact that xAI would also be affected by the bill’s requirements.</p><p class="paragraph" style="text-align:left;">Additionally, OpenAI, Adobe, and Microsoft have expressed support for another California bill called <a class="link" href="https://techcrunch.com/2024/08/26/openai-adobe-microsoft-support-california-bill-requiring-watermarks-on-ai-content/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">AB 3211</a>. It requires tech companies to <b>label AI-generated </b>content. This support is evidenced by letters from the companies viewed by TechCrunch, marking a shift from their previous opposition.</p><p class="paragraph" style="text-align:left;">AB 3211 mandates watermarks in the metadata of AI-generated photos, videos, and audio clips. 
While many AI companies already implement this practice, the bill goes further by requiring<b> large online platforms</b> like Instagram or X to label AI-generated content in a way that’s easily understandable to average viewers.</p><h3 class="heading" style="text-align:left;" id="magics-100-million-context-window-a">Magic’s 100 Million Context Window and Nvidia’s Eagle</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe9hXp5zgIdl0KUPh-M-SSHxPzXadS9BOtW1rf_7DGGa208rlz4gnQ8tHAsqju_-bxTHTgUxrt2U0VGOqGRDFwLkH6AEAhgMoxKZ5yuRRghYKooz7wolmDY0zaEv4y0R9tkv8Bo61Wnn2t5v6uYmy-3zes?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>Current long context evals like Needle In A Haystack have various limitations. <a class="link" href="https://magic.dev/blog/100m-token-context-windows?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Magic has <a class="link" href="https://magic.dev/blog/100m-token-context-windows?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">developed LTM (Long-Term Memory) </a>models capable of reasoning on up to 100M tokens of context during inference, equivalent to about <b>10 million lines of code or 750 novels</b>. 
Their LTM-2-mini model is significantly more efficient in computation and memory usage compared to traditional models like Llama 3.1 405B.</p><p class="paragraph" style="text-align:left;">They also introduced <b>HashHop</b>, a new evaluation method for long-context models that eliminates semantic hints and requires models to store and retrieve maximum information content, addressing flaws in current evaluation techniques like Needle In A Haystack.</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://github.com/NVlabs/Eagle?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Eagle </a>was another release we saw - a family of vision-centric high-resolution multimodal LLMs that uses channel-concatenation-based fusion. The models support input resolutions over <b>1,000 pixels</b> and perform strongly on multimodal benchmarks, especially resolution-sensitive tasks like OCR and document understanding.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfKM9bOwHAM4vF8DwgEJrOW_z0fBcWK60-cWr2ojIoAU2uWLuWqUqKr9ODaRyw3DcTH-lFe7hiRrCT7XNXfaFHTiEDEZPWzr6_8AUeEtRmX77pNOf2gSAcahEMgsrhqnXOEQkJHwBd5wANTjaGfDAThy8g?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>Eagle showed impressive performance on multimodal benchmarks. 
<a class="link" href="https://github.com/NVlabs/EAGLE/blob/main/assets/fig-teaser.jpg?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;">The Eagle family includes multiple variants such as Eagle-X4 and Eagle-X5 models with 7B and 13B parameters, built on <b>Vicuna language models</b> and LLaVA-v1.5 pretraining. </p><p class="paragraph" style="text-align:left;">The project is actively developing, with plans for models trained on larger and more diverse datasets, evaluation code, and vision encoder model weights with pre-alignment. An <b>online demo </b>of Eagle-X5-13B-Chat is available, and the project has achieved recognition, winning 2nd place in a CVPR24 Challenge on Driving with Language.</p><h3 class="heading" style="text-align:left;" id="new-trl-update-released">New TRL Update Released </h3><p class="paragraph" style="text-align:left;">The <a class="link" href="https://github.com/huggingface/trl/releases/tag/v0.10.1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">TRL v0.10.1</a> update just dropped with some pretty b</p><p class="paragraph" style="text-align:left;">One of which is that models can now be trained with DeepMind’s Online DPO. It’s an alignment method called <b>OnlineDPO</b> that generates data on the fly, eliminates the need for pre-collected preference datasets, and yields better results than traditional DPO.</p><p class="paragraph" style="text-align:left;">It also added support to align <b>vision-language models</b> (LLaVa-1.5, PaliGemma, and Idefics2) with DPO. 
DPO was previously used for text-only language models, so this update means DPO can now be applied to vision-language models as well.</p><p class="paragraph" style="text-align:left;">The update allows for the integration of <b>Liger Triton kernels</b>, which leads to lower memory usage and faster throughput in training. This might allow for training larger models or using smaller hardware. </p><h3 class="heading" style="text-align:left;" id="open-ai-and-patched-collaborate-on-">OpenAI and Patched Collaborate on Static Analysis Evaluation Benchmark</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdJPU-71ZAUU57waCtDfm27kBP9d5jLHJYRS8XaMeLMRiEzsrEcSl_shPa9_p4nwKhmCcdQAQqNWQxFmfJr7Uuly1sutb8vdWPCrUkKbqF1wIacIXFN0H03_SXZzh6ldWpyLdiIulfhvly7SxKLSe6KGSl1?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>How various models performed on the Static Analysis Evaluation benchmark. <a class="link" href="https://www.patched.codes/blog/the-static-analysis-evaluation-benchmark-measuring-llm-performance-in-fixing-software-vulnerabilities?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">OpenAI collaborated with Patched to fine-tune GPT-4o for the <a class="link" href="https://www.patched.codes/blog/the-static-analysis-evaluation-benchmark-measuring-llm-performance-in-fixing-software-vulnerabilities?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Static Analysis Evaluation Benchmark.</a> It’s designed to <b>assess LLMs&#39; performance</b> in fixing software vulnerabilities. 
The new version features more challenging instances with larger sample sizes (512-1024 tokens) and increased difficulty.</p><p class="paragraph" style="text-align:left;">The benchmark methodology involved:</p><ul><li><p class="paragraph" style="text-align:left;">Scanning top Python repositories on GitHub</p></li><li><p class="paragraph" style="text-align:left;">Filtering for file size</p></li><li><p class="paragraph" style="text-align:left;">Verifying vulnerabilities using Semgrep</p></li><li><p class="paragraph" style="text-align:left;">Curating a dataset representing real-world vulnerabilities in popular open-source projects</p></li></ul><p class="paragraph" style="text-align:left;">Results show that <b>combining techniques</b> like few-shot prompting, RAG, and fine-tuning leads to improved performance. Fine-tuned models consistently outperform base models, and larger models generally perform better.</p><h3 class="heading" style="text-align:left;" id="ibm-cloud-becomes-first-cloud-custo">IBM Cloud Becomes First Cloud Customer for Gaudi AI and Cerebras’ New AI Processor</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdY6_HfE5zmMRrrBIYTZSuRC_awFYH3_9av8HpOO1iUwZxp1BPwIyL8oW_SgoNJD4S5ir02yvBnOk38zPpLdpWS0k0vlGpMkd_fFYj4JzEp4wk_hjev6b3oOeYq7u1t4mtrFb3g1MFLetq3we6Ui4xmjLZA?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>IBM Cloud is the first cloud customer for Gaudi AI. 
<a class="link" href="https://techcrunch.com/2024/08/29/ibm-cloud-will-offer-intel-gaudi-3-chips-next-year/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Although Softbank ended its partnership with Intel recently, Intel secured <a class="link" href="https://techcrunch.com/2024/08/29/ibm-cloud-will-offer-intel-gaudi-3-chips-next-year/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">IBM Cloud</a> as its first cloud customer for the <b>Gaudi 3 AI accelerator chip</b>. IBM Cloud will offer Gaudi 3 to customers in early 2025 for both hybrid and on-premise environments, and plans to support Gaudi 3 within its Watsonx AI and data platform.</p><p class="paragraph" style="text-align:left;">Intel&#39;s expectations for Gaudi 3 revenue in 2024 are modest at $500 million, significantly lower than AMD&#39;s projected <b>$4.5 billio</b>n from its Instinct MI300-series GPUs and Nvidia&#39;s expected $40 billion from its data center business. Despite Gaudi 3&#39;s high performance-per-dollar, Intel faces challenges in attracting customers away from Nvidia.</p><p class="paragraph" style="text-align:left;">Nvidia&#39;s upcoming Blackwell chip, set for <b>production ramp-up in Q4</b>, poses a significant threat to Intel&#39;s Gaudi 3. 
Blackwell is expected to offer up to four times the performance of the H100, the chip Gaudi 3 is currently compared against, potentially further challenging Intel&#39;s position in the AI chip market.</p><p class="paragraph" style="text-align:left;">A new AI processor was also introduced by <a class="link" href="https://x.com/llama_index/status/1828484874065584263?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Cerebras</a>. They developed the Wafer-Scale Engine-3 (WSE-3), which they claim is the world&#39;s largest and fastest AI processor. This powers their <b>CS-3 system</b>, a new class of AI supercomputer designed for generative AI training and inference with exceptional performance and scalability.</p><p class="paragraph" style="text-align:left;">CS-3 systems can be quickly clustered to create some of the world&#39;s largest AI supercomputers, simplifying the process of deploying and running very large AI models. This makes it easier for organizations to work with <b>cutting-edge AI </b>at scale.</p><h3 class="heading" style="text-align:left;" id="amazon-hires-covariant-founders-and">Amazon Hires Covariant Founders and Aims to Release New Version of Alexa Powered by Claude</h3><p class="paragraph" style="text-align:left;">Amazon has hired the founders of AI robotics startup <a class="link" href="https://techcrunch.com/2024/08/31/amazon-hires-the-founders-of-robotics-ai-startup-covariant/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Covariant</a> - Pieter Abbeel, Peter Chen, and Rocky Duan. 
They also hired approximately <b>25% of the company&#39;s employees.</b></p><p class="paragraph" style="text-align:left;">As part of the deal, Amazon has secured a non-exclusive license to use Covariant&#39;s robotic foundation models, which are described as &quot;a large language model, but for robot language.&quot; These models focus on enabling robotic arms to perform <b>common warehouse</b> tasks like bin picking.</p><p class="paragraph" style="text-align:left;">Amazon plans to integrate Covariant&#39;s AI technology into its<b> existing robot fleet </b>to improve performance and create value for customers. This aligns with Amazon&#39;s ongoing efforts to enhance its fulfillment and robotics technologies.</p><p class="paragraph" style="text-align:left;">The deal structure seems similar to the ones we saw a <b>few months ago</b> when <a class="link" href="https://genai360.beehiiv.com/p/of-strawberries-and-models?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Amazon hired most of Adept’s top employees.</a></p><p class="paragraph" style="text-align:left;">Amazon also plans to release a revamped version of <a class="link" href="https://www.reuters.com/technology/artificial-intelligence/amazon-turns-anthropics-claude-alexa-ai-revamp-2024-08-30/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Alexa</a> in October, primarily powered by Anthropic&#39;s Claude AI models rather than Amazon&#39;s own AI. This decision was made after initial versions using Amazon&#39;s <b>in-house</b> software struggled with response times and coherence.</p><p class="paragraph" style="text-align:left;">The new &quot;Remarkable&quot; Alexa will be a paid service, costing between <b>$5 and $10 per month</b>, while the current &quot;Classic&quot; Alexa will remain free. 
Amazon hopes this new version will help generate revenue from the currently unprofitable Alexa division.</p><p class="paragraph" style="text-align:left;">The upgraded Alexa is designed to handle <b>more complex queries, </b>carry on conversations that build on prior interactions, provide shopping advice, aggregate news, and perform more complicated tasks like ordering food or drafting emails from a single prompt.</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">While we saw some multimodal releases like Eagle and Phi-3 Vision, researchers also introduced a new family of VLMs in a new paper. DeepMind looked into ways of generating high-quality synthetic data. </p><p class="paragraph" style="text-align:left;">Another paper that caught our eye was about the law of next-token predictions, since the black-box nature of LLMs makes it difficult to understand how the model reached its conclusions.</p><h3 class="heading" style="text-align:left;" id="how-cheaper-models-outperform-more-">How Cheaper Models Outperform More Expensive Ones</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfmo5nrohBeFyNcOh3cD8WHHGq7c2uLeLtgXg7ui3mgqIRu3Umq_7g7rbdyRXlK2f3WjYBNk07m6mhk5tSMz99xtibuMoOEGJo9PKIoEVco44SvU9zxXamfQ_FEBdR859FfiHy3fq77XxCtf0N1DJt00Wk9?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>Language models being fine-tuned with Gemma 2 and Gemini 1.5 data. 
<a class="link" href="https://arxiv.org/pdf/2408.16737?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.16737?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">DeepMind </a>decided to challenge the conventional wisdom of using strong but expensive (SE) language models for generating synthetic training data. They investigated whether using weaker but cheaper (WC) models could be more compute-optimal for <b>training LLM reasoners.</b></p><p class="paragraph" style="text-align:left;">They conducted extensive experiments comparing data generated from WC and SE models across<b> three key metric</b>s: coverage, diversity, and false positive rate. They then fine-tuned models on this data in various setups, including knowledge distillation, self-improvement, and a novel &quot;weak-to-strong improvement&quot; paradigm.</p><p class="paragraph" style="text-align:left;">Models trained on WC-generated data consistently outperformed those trained on SE-generated data across multiple benchmarks, with relative gains of up to 31.6%. 
For example, using Gemma2-9B (WC) data instead of Gemma2-27B (SE) data led to 6% <b>higher performance</b> in knowledge distillation and 5.8% in weak-to-strong improvement for math reasoning tasks.</p><h3 class="heading" style="text-align:left;" id="ll-ms-in-sales-and-negotiation-simu">LLMs in Sales and Negotiation: Simulating Human-Like Persuasion</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe3SYO4ykPkEVquFmQeXjcKyFHqfJfkNoJk3IpefpdvtSmrBg5LCB6ucnwjrferODtV0VzdpG8o_Ajy8Fe3T1KtRi4bZEDNf-3qC4ymx7S6scpra6F7LDaSj5G3KVpsXKB-IwKlY7Gbfi65-o-Zy3egRKK-?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>Example workflow of an insurance bot. <a class="link" href="https://arxiv.org/pdf/2408.15879?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers developed a <a class="link" href="https://arxiv.org/pdf/2408.15879?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">multi-agent framework</a> to study the persuasion capabilities of LLMs in various domains such as insurance, banking, and retail. 
This work addresses the challenge of creating AI systems that can engage in persuasive dialogue while <b>dynamically </b>adapting to user resistance and personality types.</p><p class="paragraph" style="text-align:left;">A <b>collaborative approach</b> using multiple AI agents was used, which included:</p><ul><li><p class="paragraph" style="text-align:left;">A primary conversational agent</p></li><li><p class="paragraph" style="text-align:left;">Auxiliary agents for information retrieval and analysis</p></li><li><p class="paragraph" style="text-align:left;">A fact-checking component</p></li></ul><p class="paragraph" style="text-align:left;">They simulated conversations using <b>LLM-generated personas </b>with varying demographics and emotional states, and measured persuasion effectiveness through pre- and post-interaction surveys, as well as user decisions.</p><p class="paragraph" style="text-align:left;">The paper showed that LLMs are capable of both persuading and resisting persuasion effectively. The <b>AI agents demonstrated the ability </b>to create perspective changes in users and influence purchase decisions. That sounds promising, but further work still needs to be done as conversations were terminated due to inadequate information from the sales agent. </p><h3 class="heading" style="text-align:left;" id="the-universal-law-of-llm-learning">The Universal Law of LLM Learning</h3><p class="paragraph" style="text-align:left;">Researchers from the University of Rochester and the University of Pennsylvania have discovered a <a class="link" href="https://arxiv.org/pdf/2408.13442v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">precise and quantitative law</a> governing how <b>LLMs learn contextualized token embeddings</b> for next-token prediction. 
In particular, this paper looks at the issue of understanding the internal data processing mechanisms of LLMs, which have long been considered black boxes.</p><p class="paragraph" style="text-align:left;">They used a wide range of open-source LLMs, including GPT variants, Llama models, and newer architectures like RWKV and Mamba. They also introduced a metric called &quot;<b>prediction residual&quot;</b> (PR) to quantify an LLM&#39;s next-token prediction capability at each layer.</p><p class="paragraph" style="text-align:left;">What came from the results is a universal &quot;law of equi-learning&quot;, where each layer contributes<b> equally </b>to enhancing prediction accuracy, from the lowest to the highest layer. The law emerged consistently across various model architectures, sizes, and pre-training data.</p><h3 class="heading" style="text-align:left;" id="text-2-sql-unifying-ai-and-database">Text2SQL: Unifying AI and Databases With TAG</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeTBnVjlFtmTA-mjssw7uLEljLivZ6QAQKNfUhXCWb_XchAPhydRwJ4TY27z-dZHBYl4S7hYTWICjIYAHWvGFjgKze38llEj4QvEr458k3loQx1rddixQ1z8r_P3mNfWU65Y5gQyhKHE4BDxvrZXa-jAotu?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>The three stages of the TAG pipeline. <a class="link" href="https://www.arxiv.org/pdf/2408.14717?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers from UC Berkeley and Stanford University have introduced a new paradigm called<b> Table-Augmented Generation (TAG)</b> to address the limitations of current Text2SQL and RAG methods. 
This work tackles the challenge of answering complex natural language questions over databases that require both world knowledge and semantic reasoning.</p><p class="paragraph" style="text-align:left;">They developed a <b>unified framework</b> that combines the strengths of language models (LMs) and database systems. TAG consists of three key steps:</p><ol start="1"><li><p class="paragraph" style="text-align:left;"><b>Query synthesis:</b> Translating natural language requests into executable database queries</p></li><li><p class="paragraph" style="text-align:left;"><b>Query execution</b>: Efficiently computing relevant data using database systems</p></li><li><p class="paragraph" style="text-align:left;"><b>Answer generation:</b> Utilizing LMs to generate final natural language answers</p></li></ol><p class="paragraph" style="text-align:left;">To evaluate TAG, they created a dataset requiring either world knowledge or semantic reasoning. Afterwards, they compared <b>TAG against several baselines</b>, including traditional Text2SQL and RAG approaches.</p><p class="paragraph" style="text-align:left;">Results show that hand-written TAG pipelines consistently achieved <b>40% or better</b> exact match accuracy, significantly outperforming all other baselines which failed to exceed 20% accuracy. 
TAG demonstrated particular strength in comparison queries, with up to 65% accuracy.</p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.16768?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">SAM2POINT</a>: Adapts SAM2 for zero-shot and promptable 3D segmentation by interpreting 3D data as multi-directional videos</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.16760?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">OmniRe</a>: Comprehensive 3D Gaussian Splatting framework that reconstructs high-fidelity dynamic urban scenes from on-device driving logs</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.16700?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">GradBias</a>: Uses a combination of LLMs, Text-to-Image generative models, and Vision Question Answering to detect, quantify, and explain biases in image generation.</p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email and say hi :) </p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">One discussion about the <b>modality gap</b> in multimodal models was certainly one to look at, since we don’t see this topic discussed too often. 
In addition, a post about applying HybridRAG to financial document analysis makes us wonder how this new approach might be applied to other domains.</p><h3 class="heading" style="text-align:left;" id="exploring-the-modality-gap-in-multi">Exploring the Modality Gap in Multimodal AI</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdMZY2E0bMzjCbbnzrC3Bf5cDAIR7kij9BiGhiwr5QLRH82kX7yoz2GIlP26dMSjr4JOVQ_SyQgscIhzPgNV-G0mdZ50ODYUQdRNwDptR6NZ8_cDJDSQ0eh36vOc1YKfBEs5J-WZ0mP_6cwfKIbVh_eHkue?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>Jina AI’s discussion about the modality gap.<a class="link" href="https://jina.ai/news/the-what-and-why-of-text-image-modality-gap-in-clip-models/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow"> (Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Jina AI posted an interesting exploration of the &quot;<a class="link" href="https://jina.ai/news/the-what-and-why-of-text-image-modality-gap-in-clip-models/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">modality gap</a>&quot; in multimodal AI models, particularly those using CLIP (Contrastive Language-Image Pretraining). The discussion, centered around research by Jina AI, reveals an unexpected quirk in how these models <b>process and relate text and images.</b></p><p class="paragraph" style="text-align:left;">At first glance, you might assume that a well-trained AI would treat a picture of an apple and the text &quot;an apple&quot; as nearly identical. But surprisingly, that&#39;s not the case. 
The research shows these models tend to cluster text embeddings and image embeddings in separate parts of their semantic space, creating a &quot;<b>gap</b>&quot; between modalities.</p><p class="paragraph" style="text-align:left;">What&#39;s also interesting is how this gap emerges unintentionally. The models are encoding not just the <b>semantic content</b>, but also the medium itself. This happens even after extensive training, so it might be a fundamental aspect of how these models learn to represent information.</p><h3 class="heading" style="text-align:left;" id="overcoming-the-limitations-of-kg-an">Overcoming the Limitations of KG and Vector-Based RAG</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXd7_HSjZM6mSavW20NhGpB0T_nHHfftjZhQEXfuPt5lkvkyvCnUoz_GH_BmnIQddgu2XgxEyeddXOo13EqMYticL02ebcBOeNdiA2Ek_xph2NfM1heqx1RDZO-zkKctv-Zbv0k-bTLKDnH1iuXddjEW4C4?key=_xEjQoqWJ8ujpFPME8rVWw"/><div class="image__source"><span class="image__source_text"><p>AI engineer Rohan Paul brought up the potential of HybridRAG. 
<a class="link" href="https://x.com/rohanpaul_ai/status/1829528588636283152?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">A new approach called <a class="link" href="https://x.com/rohanpaul_ai/status/1829528588636283152?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">HybridRAG</a> is making waves in the field of financial document analysis, combining <b>KG and vector-based RAG.</b></p><p class="paragraph" style="text-align:left;">Financial documents are notoriously difficult for AI systems to parse because of their specialized terminology and intricate formats. Instead, HybridRAG gets the best of both worlds from Knowledge Graph and Vector-based RAG for more comprehensive and a<b>ccurate information retrieval.</b></p><p class="paragraph" style="text-align:left;">Note that while this paper is focusing on financial document analysis, there’s definitely potential for <b>HybridRAG</b> to be used for other applications because of its ability to excel at both extractive and abstractive questions.</p><p class="paragraph" style="text-align:left;">You might be thinking that this sounds great on paper, but does it actually perform well? The answer is yes - HybridRAG outperformed both VectorRAG and GraphRAG<b> across various metrics.</b> As a result, we might see HybridRAG be applied to other domains with complex information structures like legal documents or medical records. 
</p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">We saw the success of three funding rounds for Story, Cursor, and Defcon - all of which were under <b>$100 million</b>. Interestingly, each company focuses on pretty different applications in AI: Story in blockchain, Cursor in coding, and Defcon in military logistics.</p><p class="paragraph" style="text-align:left;">Aside from Magic’s LTM models, they succeeded in raising $320 million in a funding round. Another AI coding business called Codeium also secured funding through a series C round for<b> $150 million. </b></p><h3 class="heading" style="text-align:left;" id="magic-raises-320-million">Magic Raises $320 Million </h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/08/29/generative-ai-coding-startup-magic-lands-320m-investment-from-eric-schmidt-atlassian-and-others/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Magic,</a> secured a massive $320 million funding round led by<b> ex-Google CEO</b> Eric Schmidt, with participation from Alphabet&#39;s CapitalG and Atlassian.</p><p class="paragraph" style="text-align:left;">They also announced that they will work with Google Cloud to build two supercomputers that will use Nvidia’s <b>H100 GPUs and Blackwell chips.</b></p><h3 class="heading" style="text-align:left;" id="codeium-raises-150-million-in-serie">Codeium raises $150 million in Series C Funding Round</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/08/29/github-copilot-competitor-codeium-raises-150m-at-a-1-25b-valuation/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Codeium</a>, an AI-powered 
coding assistant startup competing with <b>GitHub</b> Copilot, has secured a $150 million Series C round led by General Catalyst, valuing the company at $1.25 billion. </p><p class="paragraph" style="text-align:left;">This latest investment brings Codeium&#39;s total funding to <b>$243 million</b> just three years after its launch, with the company achieving unicorn status and growing its user base to over 700,000 developers and 1,000 enterprise customers. </p><h3 class="heading" style="text-align:left;" id="story-raises-80-million-in-series-b">Story Raises $80 Million in Series B Funding Round</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/08/21/story-raises-83m-at-a-2-25b-valuation-to-build-a-blockchain-for-the-business-of-content-ip-in-the-age-of-ai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=in-fall-magic-s-100m-context-alexa-claude" target="_blank" rel="noopener noreferrer nofollow">Story</a>, a startup building a <b>blockchain-based</b> platform for IP tracking and monetization in the age of AI, has secured $80 million in Series B funding led by Andreessen Horowitz&#39;s crypto division, with participation from Polychain Capital and other notable investors. The round values Story at $2.25 billion post-money and brings its total funding to $143 million.</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=69183571-0ed3-4692-9dcc-3f357d95508d&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>🍓, Multimodal Llamas, the New GPT-4 Model</title>
  <description>Plus, Meta’s new AI character creation tool</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/09ab7431-82b3-4ea4-bd49-9671391a2821/mikayelh_A_3d_isometric_white_and_orange_strawberry_with_whit_83f23798-eac0-4a3e-b90b-8f4f2821793a_2.png" length="1172696" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/of-strawberries-and-models</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/of-strawberries-and-models</guid>
  <pubDate>Tue, 20 Aug 2024 00:05:32 +0000</pubDate>
  <atom:published>2024-08-20T00:05:32Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">Before we start, share last week&#39;s news with a friend or a colleague:</p><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><p class="paragraph" style="text-align:left;">No, seriously, are you keeping up with the AI news? These past two weeks were so packed that we&#39;re splitting this issue into two parts; we&#39;ll send you the second part soon.</p><ul><li><p class="paragraph" style="text-align:left;">It was strawberry season for AI Twitter last week, with Altman&#39;s “<b>Project Strawberry</b>” hints at a new model rumored to have more advanced reasoning abilities than current LLMs (we&#39;re hearing chatter about graduate-level intelligence). We&#39;ll cover this in a special mid-week release in two days, but take a look at <a class="link" href="https://x.com/iruletheworldmo?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">@iruletheworldmo</a> and <a class="link" href="https://x.com/lilyofashwood?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Lily Ashwood</a> on X, who may or may not be real, and may be powered by the new model (the latter even hosted Twitter Spaces). </p></li><li><p class="paragraph" style="text-align:left;"><b>Activeloop</b>, Intel Disruptor Initiative, and Towards AI launched the <a class="link" href="https://learn.activeloop.ai/courses/genaitest?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Impossible GenAI Test</a>, where only 1 in 20 engineers passes the test. 
Take it for free today.</p></li><li><p class="paragraph" style="text-align:left;">GPT-4o-2024-08-06 <b>launched on Azure </b>with structured outputs support, achieving perfect scores in JSON Schema evaluations.</p></li><li><p class="paragraph" style="text-align:left;">Meta introduced <b>AI Studio</b> for creating AI characters, while discontinuing celebrity AI chatbots due to user feedback calling them &quot;creepy&quot;.</p></li><li><p class="paragraph" style="text-align:left;">Idefics3 is a new model that adapts Llama 3 to <b>multimodality </b>and shows drastic improvements in OCR and document understanding over its predecessors.</p></li><li><p class="paragraph" style="text-align:left;">MiniCPM demonstrated <b>comparable performance</b> to larger models with smaller parameter counts (1.3B and 2.7B) through efficient fine-tuning techniques.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model"><span class="button__text" style=""> Subscribe </span></a></div><div class="section" style="background-color:#ff8a00;margin:15.0px 15.0px 15.0px 15.0px;padding:15.0px 15.0px 15.0px 15.0px;"><h2 class="heading" style="text-align:left;"><span style="color:#F9FAFB;">Launching The Impossible GenAI Test</span></h2><p class="paragraph" style="text-align:left;"><span style="color:#F9FAFB;">As a subscriber, you&#39;ve been the first to know we&#39;ve introduced the Impossible GenAI Test. It tests across 6 core GenAI competencies, comprises 25 questions, and is so tough only 1 in 20 passes (based on the preliminary data, we may actually need to update this to… 1 in 40!). 
</span><br><br><span style="color:#F9FAFB;">Learn more about the </span><span style="color:#F9FAFB;"><a class="link" href="https://genai360.beehiiv.com/p/impossible-genai-test?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">test here</a></span><span style="color:#F9FAFB;">. Try it yourself today.</span></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#FFFFFF;" href="https://learn.activeloop.ai/courses/genaitest?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model"><span class="button__text" style="color:#222222;"> Check out my website </span></a></div></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">Safe to say that OpenAI was at full force last week with all kinds of news, ranging from a <b>highly detailed system card </b>for safety to the introduction of structured outputs in the API and a new GPT-4 model. We also saw a new AI tool from Meta that lets users create their own characters, shortly after they shut down their Al celebrity chatbots.</p><h3 class="heading" style="text-align:left;" id="gpt-4-s-system-card-new-model-avail">GPT-4’s System Card, New Model Available on Azure, and Structured Outputs in the API</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe3VTkD8OMf2JYW0rnIVc4JkUzy7cD-2DMqUW83nOM8bPnUK2GZtIClVrl6y1u48DGnj8PN0nhbVPoBZUVU1R6dWZ1iYplgFitAZ6P0ttKNk7o7NFW2OpTHUVTeGr1htshiRN1JSRd0Uvid5257oQpjY-k1?key=fjqdnkYAZeUBbSRmz0FGww"/><div class="image__source"><span class="image__source_text"><p>OpenAI detailed the safety measures taken before releasing GPT-4o. 
<a class="link" href="https://openai.com/index/gpt-4o-system-card/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">The <a class="link" href="https://openai.com/index/gpt-4o-system-card/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">GPT-4 System Card</a> details the safety measures, limitations, and evaluation methodologies implemented to mitigate risks associated with GPT-4&#39;s deployment. It outlines the system&#39;s <b>potential harms</b> and the steps taken to address issues like misinformation, bias, and unintentional harmful outputs.</p><p class="paragraph" style="text-align:left;">The system&#39;s residual risks, such as <b>occasional unauthorized</b> voice generation or over-refusals in non-English languages, are areas of active improvement. The focus remains on refining these aspects to minimize risks while enhancing the model&#39;s utility across diverse contexts.</p><p class="paragraph" style="text-align:left;">But that <b>wasn’t </b>the only move we saw from openAI last week. 
</p><p class="paragraph" style="text-align:left;">The latest model, <b>GPT-4o-2024-08-06</b>, has been <a class="link" href="https://azure.microsoft.com/en-us/blog/announcing-a-new-openai-feature-for-developers-on-azure/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">launched on Azure</a>, focusing on enhancing developer productivity by supporting <a class="link" href="https://openai.com/index/introducing-structured-outputs-in-the-api/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">structured outputs</a>, such as JSON Schemas. It even achieved perfect scores on evaluations with Structured Outputs, which means that the model&#39;s generated outputs consistently and accurately follow the complex JSON schema that was provided.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdPgAIQazHospU3bQSDX56_fQjiqhwUFWQmsg43tVTsJx4i36HZVFv1P-Cl3b9y_1zbEIlR_Qs4CjKgu39jiGyEjKvH6h1B_XHf7IYVb9Id1tngnL00XRE2KGMXztHf6OvyAYTfWr01MnyORt_FUlKym8e4?key=fjqdnkYAZeUBbSRmz0FGww"/><div class="image__source"><span class="image__source_text"><p>Structured outputs achieved a 100% score. 
<a class="link" href="https://openai.com/index/introducing-structured-outputs-in-the-api/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">The new GPT model is also <b>cheaper to use as a reranker</b> than Cohere’s model, with OpenAI’s model offering <a class="link" href="https://openai.com/api/pricing/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">$2.50/1M input tokens and $10/1M output tokens</a> compared to Cohere’s Command R+ <a class="link" href="https://cohere.com/pricing?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">$3/1M input tokens and $15/1M output tokens. </a></p><p class="paragraph" style="text-align:left;">GPT-4 mini is even more cost effective than both models at <a class="link" href="https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">$0.15/1M input tokens and $0.60/1M output tokens</a>. 
It’s also worth mentioning that the new GPT-4o model offers a higher maximum output limit of <a class="link" href="https://platform.openai.com/docs/models/gpt-4o?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">16K tokens.</a></p><h3 class="heading" style="text-align:left;" id="amazon-upgrades-image-gen-tool-meta">Amazon Upgrades Image Gen Tool, Meta Releases New AI Chatbot Tool, and Idefics3 Emerges</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXd9IIbeYkq2WK-DkEGm-vKLPSwRMg7XvHj952oZE1e03YvLp_Hshdh5wXrej9bhEVXW-RmTLEF6mqhCqL_ZuvqrNl7EX0eqYEX-toZ2Kz62ZHtB5iZ5LkNGO0XLs9FUfd9Do-PZp3I3HoJ261BYcN_VOPR-?key=fjqdnkYAZeUBbSRmz0FGww"/><div class="image__source"><span class="image__source_text"><p>Examples of images generated by Titan Image Generator 2. <a class="link" href="https://techcrunch.com/2024/08/06/amazon-upgrades-its-ai-image-generator/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">The last time we heard from Amazon, they were working on a GPT-killer called <a class="link" href="https://genai360.beehiiv.com/p/two-million-vs-rag-and-evolution?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Metis</a>, so it’s been a little while since we last had news from them. 
They’ve released an <b>upgraded version</b> of their image-generating model called <a class="link" href="https://techcrunch.com/2024/08/06/amazon-upgrades-its-ai-image-generator/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Titan Image Generator v2</a>, which can detect and segment multiple objects within the foreground of an image.</p><p class="paragraph" style="text-align:left;">It also introduces <b>improved image conditioning capabilities</b>, so users can focus on specific visual characteristics such as edges, object outlines, and structural elements - all leading to more detailed image generation. It isn’t quite clear, though, what data Amazon used to train this model.</p><p class="paragraph" style="text-align:left;">Meanwhile, Meta’s new tool called AI Studio lets users create AI versions of themselves, or even <b>create an entirely new AI character</b>. Speaking of which, Google acqui-hired <a class="link" href="https://Character.ai?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Character.ai</a><a class="link" href="https://fortune.com/2024/08/02/google-character-ai-founders-microsoft-inflection-amazon-adept/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">’s founders</a>, which also provides a similar type of technology where users can talk to personality-driven AI chatbots, like a mental health helper or an English teacher.</p><p class="paragraph" style="text-align:left;">These types of conversational AI platforms have been pretty popular in the last couple of years, as we’ve seen a number of startups in this space <b>continue to grow rapidly</b> - with character.ai being one of them. 
</p><p class="paragraph" style="text-align:left;">But that wasn’t the case for<b> Meta’s celebrity AI chatbots </b>(which featured celebs like Snoop Dogg or Tom Brady), as users can no longer interact with them. It fell flat with users who even called them “<a class="link" href="https://www.digitaltrends.com/computing/facebook-shelves-celebrity-ai-chatbot-program/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">creepy</a>”.</p><p class="paragraph" style="text-align:left;">In other news, we saw the release of <a class="link" href="https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Idefics3</a>, a model that adapts Llama 3 to <b>multimodality</b>. It’s capable of processing arbitrary sequences of text and image inputs to generate text outputs. It can perform tasks such as visual question answering, image captioning, and story creation based on multiple images.</p><p class="paragraph" style="text-align:left;">It builds upon Idefics1 and Idefics2, <b>significantly improving </b>in areas like OCR (Optical Character Recognition), document understanding, and visual reasoning.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe-jWH88GJ-V5q6zUsjJvH0ciDToSaG19AZ_NsgFEfHua4Gv1L2B9XCeiVKb8FDc8OEqK016Ts9ZvGROtRYo7utkCs1WfKdxEBaJvX40WAR3FNNPRYpULzOSCur31A7cI2dw4mhZXSDvBq27YNB1MlWjXc?key=fjqdnkYAZeUBbSRmz0FGww"/><div class="image__source"><span class="image__source_text"><p>Idefics3 outperforms its predecessor, Idefics2, across multiple benchmarks. 
<a class="link" href="https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><h3 class="heading" style="text-align:left;" id="open-a-is-board-expansion">OpenAI’s Board Expansion</h3><p class="paragraph" style="text-align:left;">In the wake of <a class="link" href="https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">several high-profile departures</a>, OpenAI decided to <a class="link" href="https://techcrunch.com/2024/08/08/openai-adds-a-carnegie-mellon-professor-to-its-board-of-directors/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">appoint Zico Kolter</a>, a prominent professor and director at Carnegie Mellon University&#39;s Machine Learning Department, to its <b>board of directors.</b></p><p class="paragraph" style="text-align:left;">There’s been some concerns about the internal dynamics at OpenAI, especially regarding the <b>allocation of resources</b> for AI safety initiatives. Since Kolter’s research focuses on safety, it’s a smart move by OpenAI.</p><p class="paragraph" style="text-align:left;">Kolter will join OpenAI’s Safety and Security Committee, which includes other directors like Bret Taylor and Adam D’Angelo, as well as technical experts. 
This committee is tasked with overseeing and <b>making recommendations</b> on the safety and security of all OpenAI projects.</p><h3 class="heading" style="text-align:left;" id="xs-eu-data-pause-and-reddits-ai-pow">X’s EU Data Pause and Reddit’s AI-Powered Search Results</h3><p class="paragraph" style="text-align:left;">X has agreed to <a class="link" href="https://techcrunch.com/2024/08/08/elon-musks-x-agrees-to-pause-eu-data-processing-for-training-grok/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">suspend the use of European users&#39; data</a> for <b>training its AI tool,</b> Grok, following legal action from Ireland&#39;s Data Protection Commission (DPC). This suspension covers the period between May 7, 2024, and August 1, 2024, and will remain in effect as the DPC continues to assess the legality of this data processing under the GDPR.</p><p class="paragraph" style="text-align:left;">X has publicly criticized the DPC&#39;s actions, labeling the injunction as &quot;unwarranted&quot; and &quot;overbroad.&quot; The company claims to have implemented privacy settings allowing users to control their data and argues that it has been working with the DPC on these issues <b>since last year.</b></p><p class="paragraph" style="text-align:left;">The DPC, in collaboration with other EU/EEA regulators, is investigating whether X&#39;s data processing practices comply with GDPR requirements. 
This investigation includes examining the <b>potential unlawfulness</b> of AI models trained on data collected without proper consent.</p><p class="paragraph" style="text-align:left;">The other social media giant that had some AI news was<a class="link" href="https://techcrunch.com/2024/08/06/reddit-ai-powered-search-results/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow"> Reddit,</a> as they’ll be testing out <b>AI-powered search pages</b> soon. It builds on Reddit&#39;s recent partnerships with OpenAI and Google, which allow the company to leverage their LLM and AI capabilities.</p><h3 class="heading" style="text-align:left;" id="humanes-pin-faces-more-returns-than">Humane&#39;s Pin Faces More Returns Than Sales</h3><p class="paragraph" style="text-align:left;">Previously, we mentioned that <a class="link" href="https://genai360.beehiiv.com/p/of-new-architectures?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">two executives at Humane left</a> to form their own fact-checking company shortly after the Ai Pin&#39;s launch <b>didn’t exactly go to plan</b> and the device received some harsh criticism. It doesn’t seem like things are getting any better as the Ai Pin was reported to have more <a class="link" href="https://www.theverge.com/2024/8/7/24211339/humane-ai-pin-more-daily-returns-than-sales?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">returns than sales</a> between May and August.  </p><p class="paragraph" style="text-align:left;">To make matters worse, Humane <b>can’t refurbish or resell the returned devices </b>because of T-Mobile limitations on reassigning devices to new users. 
As a result, the returned pins become e-waste and a loss of revenue for Humane.</p><p class="paragraph" style="text-align:left;">Additionally, Humane <b>experienced significant executive turnover,</b> including departures of key engineering leaders and the director of customer experience. The company also laid off 4% of employees in January as a cost-cutting measure.</p><p class="paragraph" style="text-align:left;">Humane is under a lot of <b>financial pressure </b>considering that they raised $200 million in funding while dealing with low sales numbers and a lot of unhappy customers.</p><h3 class="heading" style="text-align:left;" id="is-the-autonomous-driving-market-re">Is the Autonomous Driving Market Ready for a Chinese Challenger?</h3><p class="paragraph" style="text-align:left;">WeRide, a Chinese autonomous driving startup, <a class="link" href="https://www.pymnts.com/news/ipo/2024/chinese-autonomous-driving-firm-weride-plans-us-ipo/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">officially filed for an IPO with the U.S. </a>Securities and Exchange Commission (SEC) on <b>July 26</b>, which marks the company&#39;s intention to go public in the US.</p><p class="paragraph" style="text-align:left;">WeRide reported losses of<b> $268 million</b> in the previous year, with only $55 million in revenue. Despite these losses, the company continues to push forward with its IPO, reflecting the high growth potential it sees in the autonomous driving market.</p><p class="paragraph" style="text-align:left;">For key players in the self-driving industry like Wayve and NIO, this might mean increased competition, as we might see <b>other Chinese autonomous driving companies</b> go public soon as well. 
</p><p class="paragraph" style="text-align:left;">The huge losses of $268 million also points toward the trend that autonomous driving companies are <a class="link" href="https://www.scmp.com/tech/big-tech/article/3259682/baidus-self-driving-project-swerves-profit-chasing-after-burning-cash-years?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">facing issues when it comes to profitability</a>, due to the high initial costs involved and the fact that making profit from this type of product is more a <b>long-game </b>instead of one that provides immediate returns.</p><p class="paragraph" style="text-align:left;">Additionally, Warren Buffett&#39;s Berkshire Hathaway has <a class="link" href="https://www.barrons.com/articles/byd-stock-berkshire-hathaway-warren-buffett-sell-773d9e05?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">sold more shares</a> of <b>Chinese electric vehicle maker BYD</b>, continuing its gradual reduction in holdings. The sale has sparked speculation about Berkshire&#39;s confidence in BYD, given its significant past investments in the company. 
</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">We saw some notable progress in <b>language model research</b> last week, with CODEXGRAPH providing a means for LLMs to interact with code repositories and MiniCPM as a method for deploying GPT-4V level MLLMs on end devices.</p><p class="paragraph" style="text-align:left;">RAGFoundry was another framework that stood out, since it provides a single workflow that <b>combines various aspects </b>to make RAG implementation a little less complex.</p><h3 class="heading" style="text-align:left;" id="codex-graph-bridges-the-gap-between">CodexGraph Bridges the Gap Between LLMs and Complex Codebases</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdrawLS9-IJuirR2jkor9GMxI0xbMMqWp1spXfg-BmXRg_Chqd939JaY0zaWRPZutPL0LszcloMoPgRk-M_TqwsD_B4AMHqvSJgmLXxt3boGJ_rSZEZ0D9O0CjfIrOUZ_YOCSgPSfdAO8tZcBFsxitOfUi2?key=fjqdnkYAZeUBbSRmz0FGww"/><div class="image__source"><span class="image__source_text"><p>CODEXGRAPH overview. <a class="link" href="https://arxiv.org/pdf/2408.03910v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.03910v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">CODEXGRAPH</a> is a system that lets LLMs effectively interact with entire code repositories. 
It addresses the challenge of handling <b>complex, repository-level coding tasks </b>that require understanding cross-file code structures and performing intricate reasoning across large codebases.</p><p class="paragraph" style="text-align:left;">To achieve this, CODEXGRAPH integrates LLM agents with graph database interfaces extracted from code repositories. The system uses<b> static analysis </b>to construct code graphs, where nodes represent code symbols and edges represent relationships between them. </p><p class="paragraph" style="text-align:left;">LLM agents then generate and execute graph queries to navigate the codebase, allowing for <b>precise, code structure-aware context retrieval.</b></p><p class="paragraph" style="text-align:left;">Results were impressive, as CODEXGRAPH achieved competitive performance across three <b>challenging repository-level benchmarks</b>: CrossCodeEval, SWE-bench, and EvoCodeBench. </p><p class="paragraph" style="text-align:left;">When equipped with GPT-4o, CODEXGRAPH <b>outperforms</b> other retrieval-augmented code generation baselines on CrossCodeEval and EvoCodeBench, while matching state-of-the-art performance on SWE-bench.</p><h3 class="heading" style="text-align:left;" id="nac-ls-hybrid-eviction-policy-slash">NACL&#39;s Hybrid Eviction Policy Slashes LLM Memory Usage</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe5s9D_8D5peUra2nn8yh9Zj9oluadLzDFMbcSmuANAaxYZ0_IWYhRdZAr32GomAZwWUGh6z-oPBS9h9F8dsydRoZFvw-w_3ypARKvSU7daujkq1yUXQZTLAeIcZjEzQMsjujaOyJqniHrzLvpqaSsjKKw?key=fjqdnkYAZeUBbSRmz0FGww"/><div class="image__source"><span class="image__source_text"><p><a class="link" href="https://arxiv.org/pdf/2408.03675v2?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">NACL</a> is a much more efficient alternative to traditional eviction algorithms that use 
step-by-step greedy search. <a class="link" href="https://arxiv.org/pdf/2408.03675v2?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers from the Chinese Academy of Sciences and Baidu have introduced NACL, a framework for key-value (KV) cache eviction in LLMs during inference time. This approach addresses the challenge of<b> managing extensive memory consumption in KV caches,</b> particularly for models with extended context windows, which has been a significant bottleneck in deploying LLMs for long-context tasks.</p><p class="paragraph" style="text-align:left;">NACL employs a<b> hybrid eviction policy</b> combining Proxy-Tokens Eviction and Random Eviction. The Proxy-Tokens Eviction utilizes global statistics of attention scores from selected proxy tokens, while Random Eviction incorporates a diversified sampling strategy.</p><p class="paragraph" style="text-align:left;">NACL drastically improves performance on both short- and long-text tasks by <b>80% and 76% respectively</b>, while reducing KV Cache by up to 5× with over 95% performance maintenance. 
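A minimal sketch of what such a hybrid policy can look like (our own toy, not the authors' implementation; the `proxy_frac` split, the seed, and the scores are invented stand-ins):

```python
import random

# Toy hybrid eviction in the spirit of NACL (not the authors' code):
# keep the cache positions scored highest by proxy-token attention, then
# fill the rest of the budget with a random sample for diversity.
def hybrid_evict(scores, budget, proxy_frac=0.7, seed=0):
    """scores: one attention score per cached position (from proxy tokens).
    Returns the sorted positions to keep; len(result) == budget."""
    n_proxy = int(budget * proxy_frac)  # slots given to top-scoring tokens
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_proxy])
    rest = [i for i in range(len(scores)) if i not in keep]
    rng = random.Random(seed)  # deterministic here for illustration
    keep.update(rng.sample(rest, budget - n_proxy))  # diversified sampling
    return sorted(keep)
```

Evicting everything outside the returned positions is what shrinks the KV cache while the proxy-scored tokens preserve most of the useful context.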
It shows that there’s a practical approach to managing memory constraints in LLMs, which might enable efficient deployment of these models for long-context applications.</p><h3 class="heading" style="text-align:left;" id="mini-cp-ms-two-stage-fine-tuning-to">MiniCPM&#39;s Two-Stage Fine-Tuning to Deploy GPT-4V Level Models On-Device</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcoWCb-eN-l4SMeyMcGgm7D-tlBhyMdGQjZW-m5D4tKonKt1FTcHaxZyjeW8mTrjzFZBHm-ltQEH_ewr5FCk9aD6Be0G477dp3Sr123nqU8JzeYZKZNv3wJu8JGx4eBQZsTBSAse2oJuqF1XeV_qWKsA6n5?key=fjqdnkYAZeUBbSRmz0FGww"/><div class="image__source"><span class="image__source_text"><p>Moore’s Law for MLLMs, which shows that deploying GPT-4V level MLLMs on end devices is becoming a reality. <a class="link" href="https://arxiv.org/pdf/2408.01800v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">OpenBMB researchers introduced MiniCPM, a means of improving the performance of multimodal large language models (MLLMs) through efficient fine-tuning techniques. This helps boost their capabilities without having to <b>increase their size or computational requirements</b>, making them more suitable for deployment in resource-constrained environments.</p><p class="paragraph" style="text-align:left;">MiniCPM employs a two-stage fine-tuning process. First, it uses a larger teacher model to generate high-quality synthetic data, then it fine-tunes the smaller target model on this data using techniques like LoRA and QLoRA. 
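For readers unfamiliar with LoRA, the core trick fits in a few lines (a generic sketch of the technique itself, not MiniCPM's code): the pretrained weight matrix W stays frozen, and only a low-rank update B·A is trained, shrinking the trainable parameter count from d² to 2·d·r:

```python
# Generic LoRA sketch: freeze W, train only the low-rank factors A and B.
# The dimensions and values below are invented for illustration.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_apply(W, A, B, scale=1.0):
    """Effective weights W + scale * (B @ A); W itself stays frozen."""
    delta = matmul(B, A)  # (d x r) @ (r x d) -> d x d from only 2*d*r params
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

d, r = 4, 1                           # rank-1 update: 8 params instead of 16
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A = [[0.1, 0.2, 0.3, 0.4]]            # r x d, trainable
B = [[1.0], [0.0], [0.0], [0.0]]      # d x r, trainable
W_eff = lora_apply(W, A, B)
```

At real model sizes (d in the thousands, r in the tens), this reduction in trainable parameters is what makes fine-tuning feasible on constrained hardware.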
The approach also <b>incorporates multi-task learning</b> and careful data curation to maximize the efficiency of the fine-tuning process.</p><p class="paragraph" style="text-align:left;">MiniCPM models with only 1.3B and 2.7B parameters achieve performance comparable to <b>much larger models like GPT-4V</b> in various benchmarks, including common sense reasoning, math problem-solving, and coding tasks. </p><p class="paragraph" style="text-align:left;">For instance, the MiniCPM-2.7B model outperforms Llama2-13B on several metrics despite being a lot smaller. As a result, GPT-4V level MLLMs can be deployed on end devices.</p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention in the last week include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.04482?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">SegXAL</a>: Explainable Active Learning (XAL) model designed for semantic segmentation in driving scenes, which integrates human expertise through an explainable AI module and uncertainty measures.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.04449?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">RiskAwareBench</a>:  Automated framework designed to assess physical risk awareness in LLM-based embodied agents.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2408.04347?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">AggSS</a>: Introduces an Aggregated 
Self-Supervision approach for class-incremental learning, where image rotations are treated as additional classes to enhance robust feature learning. </p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email and say hi :) </p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">OpenAI continued the wave of news with Altman<b> dropping a huge hint </b>about a new model announcement that we might see soon. His cryptic post containing a picture of a strawberry was actually referring to “Project Strawberry”, a highly advanced model with better reasoning capabilities than current models. </p><p class="paragraph" style="text-align:left;">Another interesting discussion that popped up was regarding <b>Intel’s discussions with OpenAI </b>in 2017-2018, and what impact this had on the chip industry.</p><h3 class="heading" style="text-align:left;" id="project-strawberry-announcement-inc">Project Strawberry Announcement Incoming?</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfrugOr7ww3BAnWSgSTsE51o4mwn4hzytEVOt7hpS7lWGKU9d8xAU7Z1WxL5XwjMXgQSVoeM9eO3jNZDMtLK_d1P1CvaJ7LQaz6rNYwI1Pg3k8EJcpSBC-QJTYDJoW-3_NzKRXZOyAdafN23hK3H5dYQ3M?key=fjqdnkYAZeUBbSRmz0FGww"/><div class="image__source"><span class="image__source_text"><p>Altman’s hint for OpenAI’s newest model. 
<a class="link" href="https://x.com/sama/status/1821207141635780938?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://x.com/sama/status/1821207141635780938?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Altman&#39;s cryptic tweet</a> featuring a strawberry sparked intense speculation about &quot;Project Strawberry,&quot; a new AI model reportedly capable of <b>advanced reasoning</b>. This project, an extension of the previously revealed Q* initiative, aims to address one of AI&#39;s biggest challenges: multi-step problem-solving and reasoning.</p><p class="paragraph" style="text-align:left;">Project Strawberry reportedly builds on OpenAI&#39;s existing LLMs, fine-tuning them for <a class="link" href="https://singularityhub.com/2024/07/19/openais-project-strawberry-is-said-to-be-building-ai-that-reasons-and-does-deep-research/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">enhanced reasoning capabilities</a>. The approach is said to be similar to the Self-Taught Reasoner (STaR) method, which uses <b>iterative self-improvement </b>techniques to boost AI&#39;s problem-solving skills.</p><p class="paragraph" style="text-align:left;">Reports suggest impressive capabilities, particularly in math and science – areas that have <b>traditionally been difficult for AI</b>. 
<a class="link" href="https://www.zdnet.com/article/what-is-project-strawberry-openais-mystery-ai-tool-explained/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">An anonymous model</a>, possibly related to Project Strawberry, has already demonstrated reasoning abilities surpassing GPT-4 on the AI testing platform Arena. </p><p class="paragraph" style="text-align:left;">This follows a pattern similar to GPT-4&#39;s pre-release testing, hinting at a potential <b>imminent announcement </b><a class="link" href="https://www.tomsguide.com/ai/chatgpt/openai-could-be-about-to-drop-project-strawberry-in-huge-chatgpt-upgrade?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">as early as this week.</a></p><h3 class="heading" style="text-align:left;" id="open-a-is-billion-dollar-opportunit">OpenAI&#39;s Billion-Dollar Opportunity: How Intel&#39;s Hesitation Reshaped the AI Landscape</h3><p class="paragraph" style="text-align:left;">Although Nvidia currently leads the AI chip market, <a class="link" href="https://www.reuters.com/technology/artificial-intelligence/how-chip-giant-intel-spurned-openai-fell-behind-times-2024-08-07/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Intel </a>was once the dominant player in the chip industry. In 2017-2018, Intel had discussions with OpenAI about potentially acquiring a <b>15% stake for $1 billion.</b></p><p class="paragraph" style="text-align:left;">The deal also included provisions for Intel to provide <b>specialized</b> chips at cost to OpenAI, potentially shaping the future of AI computing. 
However, Intel ultimately decided not to proceed with the investment.</p><p class="paragraph" style="text-align:left;">At the time, the company&#39;s leadership, including then-CEO Bob Swan, had a different perspective on the <b>near-term market potential of generative AI</b>. This decision came during a period when Intel was navigating the transition from CPU to GPU architecture for AI applications.</p><p class="paragraph" style="text-align:left;">Meanwhile, Nvidia&#39;s focus on GPUs for AI workloads helped it gain a <b>significant market share.</b></p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">Investments were plentiful in the AI industry last week, with Recursion Pharmaceuticals being involved in a <b>massive deal to acquire Exscientia</b> for $688 million. Meanwhile, Groq secured $640 million in a successful round and Leonardo.ai was acquired by Canva.</p><h3 class="heading" style="text-align:left;" id="recursion-pharmaceuticals-ready-to-">Recursion Pharmaceuticals Ready to Acquire Exscientia in $688 Million Deal</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://finimize.com/content/recursion-pharmaceuticals-to-acquire-exscientia-in-688-million-deal?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Recursion Pharmaceuticals</a> is set to acquire Exscientia in a $688 million all-stock deal, marking a significant consolidation in the <b>AI-driven drug discovery space.</b> This merger combines Recursion&#39;s focus on rare diseases and cancers with Exscientia&#39;s AI-powered drug discovery platform, aiming to accelerate drug development and reduce costs. 
</p><h3 class="heading" style="text-align:left;" id="groq-secures-640-million-in-series-">Groq Secures $640 Million in Series D Funding</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://wow.groq.com/news_press/groq-raises-640m-to-meet-soaring-demand-for-fast-ai-inference/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Groq</a>, a leader in fast AI inference, has secured a massive $640 million Series D funding round at a<b> $2.8 billion valuation</b>, led by BlackRock Private Equity Partners with participation from notable investors including Neuberger Berman, Cisco Investments, and Samsung Catalyst Fund. </p><h3 class="heading" style="text-align:left;" id="tencent-contributes-to-300-million-">Tencent Contributes to $300 Million Funding Round for Moonshot</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.bloomberg.com/news/articles/2024-08-05/tencent-joins-300-million-financing-for-china-s-ai-unicorn?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Tencent</a> has participated in a $300 million-plus financing round for Chinese AI startup Moonshot, valuing the company at<b> $3.3 billion</b>, with Alibaba and Gaorong Capital also joining the investment. 
This move is part of a larger trend of significant capital inflow into Chinese AI firms, as major tech companies and venture capitalists compete to establish dominance in the AI market and develop alternatives to ChatGPT.</p><h3 class="heading" style="text-align:left;" id="adept-ai-investors-to-be-paid-back">Adept AI Investors to be Paid Back</h3><p class="paragraph" style="text-align:left;">In a complex deal blurring the lines between acquisition and talent poaching, <a class="link" href="https://www.semafor.com/article/08/02/2024/investors-in-adept-ai-will-be-paid-back-after-amazon-hires-startups-top-talent?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Amazon</a> has effectively hired away most of Adept&#39;s top employees while arranging for the AI startup&#39;s investors to recoup their <b>$414 million investment. </b></p><p class="paragraph" style="text-align:left;">This arrangement, which sees Adept retaining about a third of its workforce and receiving <b>$25 million</b>, has caught the attention of regulators, with the FTC probing whether it circumvents merger notification rules. </p><h3 class="heading" style="text-align:left;" id="leonardoai-acquired-by-canva">Leonardo.ai Acquired by Canva</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://techcrunch.com/2024/07/29/canva-acquires-leonardo-ai-to-boost-its-generative-ai-efforts/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=multimodal-llamas-the-new-gpt-4-model" target="_blank" rel="noopener noreferrer nofollow">Canva</a>, the design platform giant, has acquired Leonardo.ai, a generative AI content startup, in a strategic move to enhance its AI capabilities and expand its Magic Studio suite. 
We don’t know the full financial terms, but the deal involves a <b>mix of cash and stock, </b>with all 120 Leonardo.ai employees, including the executive team, joining Canva. </p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=30bf912d-4f17-4d72-9ed2-5577d20e85ba&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>The Impossible GenAI Course: 1 in 20 Passes, Will You?</title>
  <description>The Hardest Challenge You&#39;ve Ever Taken is Inside</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/fe5c6709-8e25-48fa-b2d4-3a01288f60dc/RAG_Course_copy.png" length="462826" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/impossible-genai-test</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/impossible-genai-test</guid>
  <pubDate>Thu, 15 Aug 2024 14:57:50 +0000</pubDate>
  <atom:published>2024-08-15T14:57:50Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">tl;dr - the hardest challenge you&#39;ll ever face - The Impossible GenAI Course is Now Live!</p><p class="paragraph" style="text-align:left;">Today&#39;s a special day. Activeloop, together with our partners at Intel Disruptor and TowardsAI, was one of the first companies to pioneer high-quality, production-oriented GenAI courses.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5e5750de-3fe1-47b2-b708-aab4a5da01e7/image.png?t=1725528376"/></div><p class="paragraph" style="text-align:left;">One year, and tens of thousands of professionals educated later, we&#39;ve noticed one pattern. </p><p class="paragraph" style="text-align:left;">While everyone focuses on providing content, no one focuses on… comprehension of said content! So many people nowadays (42.7K, according to LinkedIn/Apollo data) call themselves &#39;AI Engineers&#39;, but can they build production-ready GenAI solutions that solve actual business problems?</p><p class="paragraph" style="text-align:left;">To address that, we&#39;ve teamed up with top AI minds at the Intel Disruptor Initiative and TowardsAI to craft the Impossible GenAI Test. <b>Only one in 20 test takers succeeds. 
Do you think it can be you?</b></p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://learn.activeloop.ai/courses/genaitest?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=the-impossible-genai-course-1-in-20-passes-will-you"><span class="button__text" style="color:#F9FAFB;"> Take the Test </span></a></div><h2 class="heading" style="text-align:left;" id="whats-in-store">What&#39;s in store?</h2><p class="paragraph" style="text-align:left;">- 30 Questions, 6 Topics, 40 Minutes<br>- Questions do not repeat, vary in difficulty, and get you more points based on complexity<br>- Wrong answers are penalized</p><h2 class="heading" style="text-align:left;" id="what-questions-will-be-asked">What Questions Will Be Asked?</h2><p class="paragraph" style="text-align:left;">The test covers six key areas of Generative AI. It will test everything from your deep understanding of how chunking impacts downstream solutions, to deciding on what would be the most cost-efficient solution in a case study, to what legal ramifications does building GenAI applications have in the US vs EU. More specifically, you&#39;ll be tested on:</p><ol start="1"><li><p class="paragraph" style="text-align:left;"><b>AI Foundations</b></p></li><li><p class="paragraph" style="text-align:left;"><b>Retrieval Augmented Generation</b></p></li><li><p class="paragraph" style="text-align:left;"><b>Model Training & Fine-tuning</b></p></li><li><p class="paragraph" style="text-align:left;"><b>Observability & Evaluation</b></p></li><li><p class="paragraph" style="text-align:left;"><b>Model Inference and Deployment</b></p></li><li><p class="paragraph" style="text-align:left;"><b>Ethics & Compliance</b></p></li></ol><p class="paragraph" style="text-align:left;">Each section will challenge you with five questions selected at random. 
</p><p class="paragraph" style="text-align:left;">Yes, the test is really tough. But we&#39;re not here to trip you up. With GenAI&#39;s growing impact, mastering the basics and understanding production intricacies is crucial before launching any GenAI app into the real world. Challenge yourself, your friends, and colleagues to an Impossible GenAI Test face-off. </p><p class="paragraph" style="text-align:left;">With enough data, we will work to release company leaderboards, so people can compare knowledge across companies!</p><p class="paragraph" style="text-align:left;">The initiative was developed jointly by Activeloop and the Intel Disruptor Initiative, in collaboration with Arijit Bandyopadhyay from Intel Corporation.</p><p class="paragraph" style="text-align:center;">Take the course & test yourself today!</p><div class="button" style="text-align:center;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://learn.activeloop.ai/courses/genaitest?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=the-impossible-genai-course-1-in-20-passes-will-you"><span class="button__text" style="color:#FFFFFF;"> Accept the Challenge </span></a></div><p class="paragraph" style="text-align:left;"><span style="font-family:Apple Color Emoji, Segoe UI Emoji, NotoColorEmoji, Noto Color Emoji, Segoe UI Symbol, Android Emoji, EmojiSymbols;font-size:0.6rem;">©</span><span style="font-size:0.6rem;"> Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. 
Other names and brands may be claimed as the property of others.</span></p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=82e95f6b-03e8-4496-92ca-8e3395d67450&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Gemma 2 2B &gt; GPT-3.5, Open-Source FLUX-1 vs Midjourney, GitHub Takes on Hugging Face</title>
  <description>Plus, EU AI Act comes into force</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/abff904e-6fe1-4d77-b0e4-13e7cc4c29a9/mikayelh_A_3d_isometric_white_and_orange_micropocessor_highli_01c49fe9-1a9c-45ad-a066-4bd1c0885e51_1.png" length="713062" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/on-device-llms-challenge-gpt3-5</guid>
  <pubDate>Tue, 06 Aug 2024 16:08:49 +0000</pubDate>
  <atom:published>2024-08-06T16:08:49Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">Before we start, share this week&#39;s news with a friend or a colleague:</p><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">Google unveiled the new generation of <b>Gemma 2 models</b>, with the 2B model showing strong performance while running on-device along with two other models that focus on privacy and transparency.</p></li><li><p class="paragraph" style="text-align:left;">Apple introduced two <b>new foundation language models</b> (AFM-on-device and AFM-server), with the on-device model outperforming Llama-3 8B and the server model outperforming GPT-3.5.</p></li><li><p class="paragraph" style="text-align:left;">Salesforce AI Research introduced MINT-1T, the<b> first trillion token </b>interleaved dataset that outperformed the previous state-of-the-art interleaved dataset, OBELICS.</p></li><li><p class="paragraph" style="text-align:left;">Black Forest Labs released the<b> FLUX.1 suite of models</b>, which outperforms Midjourney-V6.0 in aspects like visual quality by incorporating rotary positional embeddings and parallel attention layers.</p></li><li><p class="paragraph" style="text-align:left;">Stanford University researchers created a <b>comprehensive benchmark </b>for solving predictive tasks over relational databases using graph neural networks.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? 
Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face"><span class="button__text" style=""> Subscribe </span></a></div><h2 class="heading" style="text-align:left;" id="the-talk-of-the-day-and-then-there-">The Talk of the Day: And Then There Were Three…</h2><p class="paragraph" style="text-align:left;">OpenAI has just lost <a class="link" href="https://techcrunch.com/2024/08/05/openai-co-founder-leaves-for-anthropic/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">more of the original 11 founders</a>. President Greg Brockman, co-founder John Schulman, and product leader Peter Deng have left the ChatGPT developer.</p><p class="paragraph" style="text-align:left;">After AI safety researcher <a class="link" href="https://genai360.beehiiv.com/p/io-openai-exodus-14-llms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Jan Leike left to work at Anthropic</a> on the AI alignment problem, Schulman took over as the leader of OpenAI&#39;s alignment science team, also called the &quot;post-training&quot; team. Now, he&#39;ll continue this mission at Anthropic, following in Leike&#39;s footsteps. </p><p class="paragraph" style="text-align:left;">While it&#39;s not yet clear where Deng will go, Brockman has decided to take a sabbatical, his first chance to relax since co-founding OpenAI 9 years ago. 
</p><p class="paragraph" style="text-align:left;">Only three of OpenAI’s 11 original founders remain: OpenAI CEO Sam Altman, Brockman (who hasn&#39;t quit officially), and Wojciech Zaremba, lead of language and code generation.</p><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">Google’s Gemma 2 2B <b>outperformed GPT-3.5</b> despite being a lot smaller, meaning that competition for the GPT models is only increasing.</p><p class="paragraph" style="text-align:left;">Meanwhile, the <b>hardware race</b> is heating up with AMD challenging NVIDIA&#39;s dominance and Samsung ramping up chip production, so we might see a big shift in the AI infrastructure landscape.</p><h3 class="heading" style="text-align:left;" id="the-next-generation-of-gemma-2-mode">The Next Generation of Gemma 2 Models & Salesforce&#39;s MINT-1T</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXerBANvCJ6fq-izNJzKSJGrQ_EhIfURZ0hHaWSSQqzzffpseC9_e5yxi_jr6oiqSRSFTGvqg_WCbhpOXt5hLmHeelFZNHdyWehQIDoQjui1sL5Hm85p2F_X4kIK3TbcGcz8ejGs8O0RdQvfH0aNdLfz__M?key=i6VOPgWULlhs1KUjQCjlIA"/><div class="image__source"><span class="image__source_text"><p>Gemma 2 outperforms much larger models. 
<a class="link" href="https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Three new models were added to the <a class="link" href="https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Gemma 2 model</a> family by Google last week.</p><p class="paragraph" style="text-align:left;">Gemma 2 2B is a smaller, yet more efficient model that <b>outperforms larger models </b>while still having safety advancements built into it. The 2B model even outperformed every GPT3.5 model on Chatbot Arena.</p><p class="paragraph" style="text-align:left;">In addition to Gemma 2 2B, ShieldGemma and Gemma Scope were also introduced. ShieldGemma<b> boosts the privacy and security </b>of AI models by protecting user data, while Gemma Scope offers tools and techniques to provide a better understanding of how AI decisions were made.</p><p class="paragraph" style="text-align:left;">In other news, Salesforce AI Research presented <a class="link" href="https://blog.salesforceairesearch.com/mint-1t/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">MINT-1T</a> - the first ever <b>trillion token interleaved </b>dataset. 
So what’s the big deal about it?</p><p class="paragraph" style="text-align:left;">Interleaved documents have a mixture of text and images, which means they can be used to train multimodal models for <b>text and visual capabilities</b>. Models like <a class="link" href="https://genai360.beehiiv.com/p/io-openai-exodus-14-llms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Chameleon </a>showed just how effective interleaved data can be in achieving high performance for multimodal models.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfQ_O0BAZB-MSxDFQnHBddNbvdGOtu0N5o8vIVoXUeW6qs1uvgZdWOYeew4nv-rGEqndn_uiFzz9CyMNRxvHHf_jNolEHMg1OZT7kQH8SAAzEWQAC-86zfLdu7Ijqg0x7Z6kBzuMWKEBMG4i4j-FGG6xbNa?key=i6VOPgWULlhs1KUjQCjlIA"/><div class="image__source"><span class="image__source_text"><p>MINT-1T outperforms the previous leading interleaved dataset, OBELICS. <a class="link" href="https://blog.salesforceairesearch.com/mint-1t/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">As a result, the chances of seeing much <b>larger multimodal models </b>in the future have increased drastically. 
In fact, this dataset is already being used to train <a class="link" href="https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-r-v1?ref=blog.salesforceairesearch.com&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">the XGen-MM</a> series.</p><h3 class="heading" style="text-align:left;" id="stability-a-is-double-play-stable-f">Stability AI’s Double Play: Stable Fast 3D and SV4D</h3><p class="paragraph" style="text-align:left;">Stability AI has introduced <a class="link" href="https://stability.ai/news/introducing-stable-fast-3d?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Stable Fast 3D</a>, a new AI model capable of generating 3D assets from a <b>single image in 0.5 seconds</b>. It’s a big leap in the field of AI-driven 3D content creation since it could change the way we look at game development, visual effects, and product design.</p><p class="paragraph" style="text-align:left;">Stable Fast 3D&#39;s ability to quickly generate 3D assets challenges the traditional notion that high-quality 3D generation requires lengthy processing times. This could lead to new paradigms in real-time content creation and interactive design processes.</p><p class="paragraph" style="text-align:left;">The other model Stability AI released was <a class="link" href="https://huggingface.co/stabilityai/sv4d?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">SV4D</a>, which generates a 4D image matrix from a single-view video of an object. It generates 40 frames at 576x576 resolution and outperforms its predecessor, SV3D. 
</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe04KO7rxkGpHevjQxK-HZars-y5CtK_q-c75qqPhXC1-4T-E3zfKLDii29txuaProkoK1-_9ngpA-8LuRFhdsNG60rWsyEds7xpYDKkA3bM7uHSYPM6fmhbLzGZpr_6yKMeQTaJqJs1RwqAykZUkeig5Di?key=i6VOPgWULlhs1KUjQCjlIA"/><div class="image__source"><span class="image__source_text"><p>SV4D offers better video synthesis than SV3D by boosting video frame consistency. <a class="link" href="https://arxiv.org/pdf/2407.17470?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">But we’re a little surprised to see <b>more model releases</b> since <a class="link" href="https://www.neatprompts.com/p/stability-ai-strategic-move-selling-clipdrop-to-jasper?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Jasper bought ClipDrop</a> (an image creation and editing platform) from Stability earlier in the year. </p><h3 class="heading" style="text-align:left;" id="metas-sam-2-runways-gen-3-alpha-and">Meta&#39;s SAM 2, Runway&#39;s Gen-3 Alpha, and FLUX.1: The Next Wave of Important AI Models</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdnyqXoBA5RluNxV_ZEmybYHM7Dn72kx599kMzrp0T6z-9VVwuXTP-rrnkmlfduVZu35sLcOFtuNryDIAj6zIhnltGCK_e6-U_KU3FRm0GfMeuO_bWEZEfKP_ov1eBhvMhFQd34Z3r85mBwl7-fIAf6Ofby?key=i6VOPgWULlhs1KUjQCjlIA"/><div class="image__source"><span class="image__source_text"><p>How SAM 2 works. 
<a class="link" href="https://ai.meta.com/blog/segment-anything-2/?utm_source=twitter&utm_medium=organic_social&utm_content=reel&utm_campaign=sam2" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Meta has introduced <a class="link" href="https://ai.meta.com/blog/segment-anything-2/?utm_source=twitter&utm_medium=organic_social&utm_content=reel&utm_campaign=sam2" target="_blank" rel="noopener noreferrer nofollow">Segment Anything Model 2 (SAM 2)</a>, which builds upon the success of its predecessor by offering improved accuracy, efficiency, and versatility in <b>segmenting objects within images and videos.</b></p><p class="paragraph" style="text-align:left;">The fact that the model enables annotation <b>8.4 times faster</b> per frame than SAM 1 is pretty impressive. It addresses a critical need for real-time applications in fields like autonomous driving and augmented reality. Note that SAM 1 had a massive impact on annotation companies (most of which pivoted into RLHF), so SAM 2 being able to annotate a lot faster is a big deal.</p><p class="paragraph" style="text-align:left;">Yet another model we saw, this time in the AI video space, was <a class="link" href="https://help.runwayml.com/hc/en-us/articles/30266515017875-Creating-with-Gen-3-Alpha?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Gen-3 Alpha by Runway.</a> It demonstrates enhanced capabilities in <b>understanding and executing complex prompts. 
</b></p><p class="paragraph" style="text-align:left;">Meanwhile, Midjourney is facing some <b>serious competition</b> with the release of <a class="link" href="https://blackforestlabs.ai/announcing-black-forest-labs/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">FLUX.1</a> by Black Forest Labs, which offers better performance in aspects like image detail and style diversity.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXepWfj1Xy8pQyESbs1QfF-FLgxZmdfHwMO1x9EOBebAyCIgEedM4WXyGlONEL5jRUm9ZFIuD_Egwk6yLhtolv7FdGzGco3i-aYsQZcxjanqySVQMRAv5Z1vjZH6qBlMHLKNatczUy95J7fuRzqzdFlx2DUL?key=i6VOPgWULlhs1KUjQCjlIA"/><div class="image__source"><span class="image__source_text"><p>FLUX.1 models achieved the highest ELO. <a class="link" href="https://blackforestlabs.ai/announcing-black-forest-labs/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">These models use a <b>hybrid architecture</b> of multimodal and parallel diffusion transformer blocks. 
They build on flow matching and incorporate parallel attention layers to achieve results that drastically outperform previous models.</p><h3 class="heading" style="text-align:left;" id="nvidia-delays-and-amd-vs-samsung-vs">NVIDIA Delays, and AMD vs Samsung vs Apple: The AI Chip Wars Heat Up</h3><p class="paragraph" style="text-align:left;">We previously saw that Nvidia made its move in the AI chip market by developing a new chip <a class="link" href="https://genai360.beehiiv.com/p/of-llms-and-imos?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">just for the Chinese market</a>. The story continues with <a class="link" href="https://siliconangle.com/2024/07/30/amd-steps-rival-nvidia-ai-chip-sector-huge-revenue-gains-sending-stock-higher/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">AMD </a>reporting significant revenue gains and unveiling plans to <b>compete more aggressively</b> with Nvidia in the AI chip market.</p><p class="paragraph" style="text-align:left;">AMD&#39;s focus on both CPUs and GPUs for AI workloads suggests a multi-pronged approach to AI computing, which could <b>influence future hardware architectures</b> for AI systems. 
</p><p class="paragraph" style="text-align:left;">In other news,<a class="link" href="https://www.sammobile.com/news/samsung-hbm-memory-chip-sales-rise-50-percent/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow"> Samsung Electronics</a> reported a <b>remarkable 50% increase </b>in sales of its High Bandwidth Memory (HBM) chips, a critical component in AI accelerators. </p><p class="paragraph" style="text-align:left;">Samsung&#39;s success in this sector could reshape the competitive landscape of the semiconductor industry, with implications for other <b>major players like Nvidia and AMD.</b></p><p class="paragraph" style="text-align:left;">Nvidia’s chip dominance was further challenged by Apple last week (which, as we know, produces its own chips). Apple <a class="link" href="https://www.reuters.com/technology/apple-says-it-uses-no-nvidia-gpus-train-its-ai-models-2024-07-29/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">went with Google</a> TPUs instead of Nvidia to train its models, despite Nvidia controlling a massive 80% of the AI chip market.</p><p class="paragraph" style="text-align:left;">Specifically, Apple used two types of <b>Google TPUs </b>rather than Nvidia GPUs to train the AI models that will power features on iPhones.</p><p class="paragraph" style="text-align:left;">Nvidia also reported <a class="link" href="https://www.theverge.com/2024/8/3/24212518/nvidia-ai-chip-delay-blackwell-b200-microsoft-amazon-google-openai-meta-artificial-intelligence?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">delays in the next AI chip 
</a>(Blackwell B200) because of a “<b>design flaw</b>”.</p><h3 class="heading" style="text-align:left;" id="git-hub-challenges-hugging-face">GitHub Challenges Hugging Face</h3><p class="paragraph" style="text-align:left;">GitHub introduced <a class="link" href="https://github.blog/news-insights/product-news/introducing-github-models/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">GitHub Models</a>, which is designed to enhance software development workflows, giving Hugging Face some serious competition. These models are integrated into GitHub Copilot and other GitHub products to <b>improve coding efficiency and accuracy.</b></p><p class="paragraph" style="text-align:left;">While Hugging Face excels in providing a wide range of models for various machine learning applications, including NLP, its broader focus may not cater as specifically to the coding needs of developers as GitHub Models does. That breadth is reflected in the fact that Hugging Face was reported to host <b>700,000 LLMs</b> last June.  
</p><p class="paragraph" style="text-align:left;">In terms of users, GitHub has <b>100 million users,</b> while Hugging Face was reported to have <a class="link" href="https://www.forbes.com/companies/hugging-face/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">4 million users in August 2023</a> (a number that is likely higher now), so GitHub can distribute models far more widely than Hugging Face.</p><h3 class="heading" style="text-align:left;" id="open-ai-endorses-senate-bills-and-e">OpenAI Endorses Senate Bills and EU AI Act is in Force</h3><p class="paragraph" style="text-align:left;">There wasn’t much from <a class="link" href="https://techcrunch.com/2024/07/30/openai-endorses-senate-bills-that-could-shape-americas-ai-policy/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">OpenAI </a>in terms of model releases in the past couple of weeks aside from GPT-4o mini, but they endorsed <b>three Senate bills</b> last week. 
These bills include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.congress.gov/bill/118th-congress/senate-bill/4178?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Future of AI Innovation Act</a></p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.dataguidance.com/news/usa-senators-introduce-nsf-ai-education-bill?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">NSF AI Education Act</a></p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.congress.gov/bill/118th-congress/senate-bill/2714/titles?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Create AI Act</a></p></li></ul><p class="paragraph" style="text-align:left;">Probably doesn’t come as a surprise knowing that OpenAI is one of the biggest AI companies out there, so they certainly wouldn’t mind having a say on topics like this to get on the <b>good side </b>of lawmakers.</p><p class="paragraph" style="text-align:left;">In other news, the <a class="link" href="https://genai360.beehiiv.com/p/of-sonnets-and-agents?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">EU AI Act</a> we discussed before has now gone into force. 
It’s worth pointing out that it’ll <b>take some time </b>before all the rules apply, though.</p><p class="paragraph" style="text-align:left;">Specifically, it focuses on ethical development and deployment of <a class="link" href="https://www.cryptopolitan.com/eu-ai-act-now-in-force/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">AI within the EU,</a> categorizing AI applications based on their <b>risk level.</b> Higher risk means stricter rules, but it won’t affect the popular chatbots we all know and love like ChatGPT since most of them are considered to be minimal risk.</p><p class="paragraph" style="text-align:left;">It’s one of the <b>first comprehensive legal frameworks</b> for AI, but it’s unlikely to be the last as we move forward into a future with more powerful models.</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">From MindSearch&#39;s innovative approach to complex information retrieval to the practical advancements in video object detection, we&#39;re seeing AI tackle increasingly nuanced and real-world challenges. 
On-device models also continued to advance with Apple’s foundation language model paper, showing <b>impressive results across multiple benchmarks.</b></p><h3 class="heading" style="text-align:left;" id="apples-on-device-ai-how-small-model">Apple&#39;s On-Device AI: How Small Models Achieve Big Results</h3><p class="paragraph" style="text-align:left;">Aside from the news about Apple not using Nvidia chips, they also introduced <b>two</b> new <a class="link" href="https://arxiv.org/abs/2407.21075?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">foundation language models</a> designed to power Apple Intelligence features across iOS, iPadOS, and macOS. </p><p class="paragraph" style="text-align:left;">The models, AFM-on-device (~3 billion parameters) and AFM-server (a larger server-based model), aim to perform a wide range of tasks efficiently, accurately, and responsibly while addressing the challenges of running AI on <b>consumer devices and maintaining user privacy.</b></p><p class="paragraph" style="text-align:left;">To achieve this, Apple employed <b>several innovative techniques:</b></p><ul><li><p class="paragraph" style="text-align:left;"><b>Architecture optimization</b>: The models use a dense decoder-only architecture with improvements like grouped-query attention and RoPE positional embeddings for long-context support.</p></li><li><p class="paragraph" style="text-align:left;"><b>Efficient training</b>: A three-stage pre-training process (core, continued, and context-lengthening) using a diverse, high-quality data mixture and custom optimizer.</p></li><li><p class="paragraph" style="text-align:left;"><b>Adapter-based fine-tuning:</b> LoRA adapters for task-specific optimization without changing the base model.</p></li></ul><p class="paragraph" style="text-align:left;">AFM-on-device outperforms larger 
open-source models <b>like Mistral-7B</b> in instruction following and writing tasks, while AFM-server achieves competitive performance against GPT-3.5 and GPT-4 in various benchmarks. </p><p class="paragraph" style="text-align:left;">This means GPT-3.5 isn’t just facing competition from Google’s Gemma 2 2B, but also from <b>Apple’s AFM-on-device model.</b></p><h3 class="heading" style="text-align:left;" id="can-ai-mimic-human-cognitive-proces">Can AI Mimic Human Cognitive Processes for Complex Web Searches?</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfS2pdmKLyx6lJPJo7ZaKKlC6x5MANFl1Wrmi78tIBeM7-sP-HYSagVhQ8WFnm4BtnLOxMrl91_ltDJQACG4ZpCqcLVG-6YHooy29sO5tUZDQB3hOU0DOFoKCLLcQDn4Us6ZUc07sWGjosWuoVXKmFAhpw?key=i6VOPgWULlhs1KUjQCjlIA"/><div class="image__source"><span class="image__source_text"><p>Framework of MindSearch. <a class="link" href="https://arxiv.org/pdf/2407.20183v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.20183v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">MindSearch</a> aims to overcome the limitations of current AI search methods, which often struggle with accurately retrieving and integrating information for complex queries that require multi-step reasoning and in-depth analysis.</p><p class="paragraph" style="text-align:left;">To achieve this, the team developed a simple yet effective LLM-based multi-agent framework consisting of a <b>WebPlanner and WebSearcher. 
</b>The WebPlanner models the human mind&#39;s multi-step information seeking process as a dynamic graph construction, decomposing user queries into atomic sub-questions.</p><p class="paragraph" style="text-align:left;">Meanwhile, the WebSearcher performs <b>hierarchical information retrieval </b>with search engines and collects valuable information for the WebPlanner. </p><p class="paragraph" style="text-align:left;">Notably, responses from MindSearch based on InternLM2.5-7B are preferred by humans <b>over those from </b>GPT-4o and Perplexity. It might change how we approach complex information retrieval and integration tasks in various domains.</p><h3 class="heading" style="text-align:left;" id="feature-selection-and-aggregation-b">Feature Selection and Aggregation Boost Accuracy and Speed in Video Object Detection</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcl7Rd_JA3RusjAxRSqejLCAS6P1JadKAzVFFUocCcwVvAj0nqMJJvsp4x_LImuy9s6eWpoBfcEPJBIjZtgVMijT82z7gBCW6yck7xOWj57LsqXba7kX4J6M17IjiwUT-2qXeeKUWXfZVjAtpB115WFopOB?key=i6VOPgWULlhs1KUjQCjlIA"/><div class="image__source"><span class="image__source_text"><p>Schematic of the framework. 
<a class="link" href="https://arxiv.org/pdf/2407.19650v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers from Tianjin University have introduced a different approach to <a class="link" href="https://arxiv.org/pdf/2407.19650v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">video object detection (VOD)</a> that addresses the challenges of <b>high across-frame variation</b> in object appearance and diverse deterioration in video frames. </p><p class="paragraph" style="text-align:left;">To achieve this, the team developed <b>two key components: </b></p><ul><li><p class="paragraph" style="text-align:left;">A Feature Selection Module (FSM) to reject low-quality candidates and reduce computational expense</p></li><li><p class="paragraph" style="text-align:left;">A Feature Aggregation Module (FAM) that uses feature similarity measurements to guide the aggregation process.</p></li></ul><p class="paragraph" style="text-align:left;">The approach incorporates an <b>average pooling operator</b> on reference features to alleviate shortcomings of commonly-used cosine similarity.</p><p class="paragraph" style="text-align:left;">Results demonstrate that their model achieves a <b>new record performance</b> of 92.9% AP50 at over 30 FPS on the ImageNet VID dataset using a single 3090 GPU. 
It offers a practical solution for video object detection that balances accuracy and speed, making it suitable for large-scale or real-time applications.</p><h3 class="heading" style="text-align:left;" id="new-benchmark-for-ai-in-relational-">New Benchmark for AI in Relational Databases</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfNeWJL6_6m-pHpvGAJdWXqyKSPP3P5F2tfIs84h_li9l3_Mjnb4NkiToSR9w8pj6nKlTPgBMoP8fYzh7aKlL-3f74A-CvypKnMUuDNSWl1f_C9HEuI-IyTgSKCV11Vpkx1pjuaPD8DeRhD9Oodv9wruRU?key=i6VOPgWULlhs1KUjQCjlIA"/><div class="image__source"><span class="image__source_text"><p>RELBENCH overview. <a class="link" href="https://arxiv.org/pdf/2407.20060v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers from Stanford University and <a class="link" href="https://Kumo.AI?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Kumo.AI</a> have introduced <a class="link" href="https://arxiv.org/pdf/2407.20060v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">RELBENCH</a>, a comprehensive benchmark that aims to address the challenge of doing <b>machine learning over relational databases</b>, where building predictive models has traditionally required extensive manual feature engineering to flatten multi-table data.</p><p class="paragraph" style="text-align:left;">To achieve this, the benchmark is built around Relational Deep Learning (RDL), which represents a relational database as a temporal, heterogeneous graph so that graph neural networks can learn directly across tables. </p><p class="paragraph" style="text-align:left;">The benchmark introduces features such as: </p><ul><li><p class="paragraph" style="text-align:left;">Realistic databases spanning domains like e-commerce, Q&amp;A forums, medicine, and sports</p></li><li><p class="paragraph" style="text-align:left;">Predictive tasks covering entity classification, entity regression, and recommendation</p></li><li><p class="paragraph" style="text-align:left;">Temporal, leakage-free train/validation/test splits</p></li></ul><p class="paragraph" style="text-align:left;">Results showed that RDL models trained on RELBENCH outperform traditional feature engineering approaches, reducing human work hours by <b>96% and lines of code by 94% on average. </b></p><p class="paragraph" style="text-align:left;">It’s important because it provides a <b>foundational infrastructure</b> for future research into RDL, which could speed up the development and validation of machine learning models across the many domains that rely on relational databases.</p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention in the last week include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://pytorch.org/blog/torchchat-local-llm-inference/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Torchchat</a>: A library by Meta AI that enables running large language models like Llama 3 and 3.1 locally on laptops, desktops, and mobile devices with high performance</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" 
href="https://github.com/mckaywrigley/ai-router-chat?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">AI router chat</a>: Personal chatbot arena that adapts to the user</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://github.com/opendatalab/PDF-Extract-Kit?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">PDF Extract kit</a>: Comprehensive toolkit for high-quality content extraction from PDF documents.</p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email and say hi :) </p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">Last week’s discussions gave us plenty to think about when it comes to building and using large-scale AI systems. From GPT-4o&#39;s <b>impressive 64K output </b>leap to a pharma CIO&#39;s candid take on Microsoft&#39;s Copilot, these discussions highlight how the AI world is grappling with both technical advances and real-world applications. </p><h3 class="heading" style="text-align:left;" id="gpt-4-os-64-k-leap">GPT-4o&#39;s 64K Leap</h3><p class="paragraph" style="text-align:left;">OpenAI&#39;s experimental release of GPT-4o with a 64K output capability has <a class="link" href="https://twitter.com/simonw/status/1818181750704804311?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">sparked discussion in the AI community</a> about the <b>future of large-scale</b> language model applications. 
</p><p class="paragraph" style="text-align:left;">The 64K output opens up new possibilities for tasks like full document translation, comprehensive structured data extraction, and long-form content generation. At <b>$6/$18 per million input/output tokens,</b> the new alpha model is slightly more expensive than standard GPT-4o ($5/$15 per million input/output tokens). But it’s a little difficult to tell if increased output capability is worth the additional cost for different use cases.</p><p class="paragraph" style="text-align:left;">It could lead to more <b>efficient workflows </b>in industries like translation, data analysis, and content creation. However, it also raises important questions about the cost-effectiveness of AI solutions and the need for careful consideration of use cases that truly benefit from such extended output capabilities.</p><h3 class="heading" style="text-align:left;" id="why-was-the-copilot-ai-deal-cancell">Why Was the Copilot AI Deal Cancellation a Big Deal?</h3><p class="paragraph" style="text-align:left;">A recent revelation from a <a class="link" href="https://www.businessinsider.com/pharma-cio-cancelled-microsoft-copilot-ai-tool-2024-7?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">pharmaceutical company&#39;s CIO</a> provided some food for thought about the real-world value of enterprise AI tools. 
The CIO&#39;s decision to cancel their <b>Microsoft 365 Copilot subscription</b> after a <b>six-month trial period</b> raises important questions about the gap between AI&#39;s promised potential and its current practical applications.</p><p class="paragraph" style="text-align:left;">The CIO&#39;s comparison of Copilot&#39;s slide-generation capabilities to &quot;middle school presentations&quot; highlights the need for AI tools to deliver <b>tangible, high-quality results </b>that justify their cost.</p><p class="paragraph" style="text-align:left;">With Copilot doubling the cost of Microsoft 365 licenses, there&#39;s a growing debate about how to price AI capabilities in a way that aligns with their perceived value. Microsoft&#39;s massive investments in AI infrastructure raise questions about how tech giants will recoup these costs and what it means for <b>future pricing and product strategies. </b></p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">It’s a little unusual to be mentioning <b>Reddit </b>in an AI newsletter, but they acquired a company called Memorable AI last week. In terms of successful funding rounds, Aisles and Sybill secured $500 million and $11 million, respectively.</p><h3 class="heading" style="text-align:left;" id="aisles-secures-500-million-in-priva">Aisles Secures $500 Million in Private Equity Round and Introduces Aisles Enterprises</h3><p class="paragraph" style="text-align:left;">Aisles has secured <a class="link" href="https://www.accesswire.com/895388/aisles-unveils-aisles-enterprise-new-branch-raises-500m-in-private-equity-round-to-invest-in-tech-startups?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">$500 million</a> in a private equity round to fuel its expansion into tech startup investments. 
The company introduced <b>Aisles Enterprise</b>, a new branch dedicated to identifying and supporting promising AI and tech startups.</p><h3 class="heading" style="text-align:left;" id="sybil-secures-11-million-in-seed-fu">Sybill Secures $11 Million in Seed Funding Round</h3><p class="paragraph" style="text-align:left;">Sybill, a startup developing an AI assistant for salespeople, has secured <a class="link" href="https://techcrunch.com/2024/07/31/sybill-raises-11m-for-its-ai-assistant-that-helps-salespeople-reduce-administrative-burden/?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAACC3ha9owFhbSK5q8jfGifxti0wLrQFxoYArZ9AxOt9tBgKEa0OwuBUKDF3rDsvG34AOZYSqlT9vbqk6II4gZsmdouXb1i5cdJ-up4emvcNNFdOvjU8dfDLlqI8Dw3ualOkkNAbJ4yCEmPA3TrLU_CAiBngQvHlo_CXeGh0xSvGi&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">$11 million </a>in seed funding, led by Khosla Ventures. The company&#39;s AI tool aims to reduce the <b>administrative burden</b> on sales teams by automating tasks like note-taking, data entry, and follow-up scheduling.</p><h3 class="heading" style="text-align:left;" id="reddit-acquires-memorable-ai">Reddit Acquires Memorable AI</h3><p class="paragraph" style="text-align:left;">Reddit has made its first acquisition since going public in March, purchasing ad-optimization company <a class="link" href="https://c.newsnow.co.uk/A/1238698270?-42899%3A29419=&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=gemma-2-2b-gpt-3-5-open-source-flux-1-vs-midjourney-github-takes-on-hugging-face" target="_blank" rel="noopener noreferrer nofollow">Memorable AI </a>for an undisclosed amount. 
Memorable AI uses artificial intelligence to<b> analyze audience reactions</b> to content, helping to determine what resonates with specific groups.</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=1f4e8aec-38c7-4e46-8cde-885a70dcf367&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>Second GPT-4 Level Open Model, International Mathematical Olympiad vs DeepMind&#39;s LLM + RLHF</title>
  <description>Plus, Nvidia Makes Chips for China</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/35982600-d9d3-4ef5-96e8-44756635d06e/mikayelh_A_3d_isometric_white_and_orange_cubes_formin_the_let_5c2e3afc-e64e-4894-9bd6-4722c6e0c26c_2.png" length="974103" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/of-llms-and-imos</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/of-llms-and-imos</guid>
  <pubDate>Tue, 30 Jul 2024 13:33:42 +0000</pubDate>
  <atom:published>2024-07-30T13:33:42Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">Before we start, share this week&#39;s news with a friend or a colleague:</p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/p/of-llms-and-imos?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf"><span class="button__text" style=""> Share the newsletter </span></a></div><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">Meta&#39;s Llama 3.1 demonstrates <b>advanced reasoning capabilities</b>, solving complex math puzzles that have stumped other AI models, including GPT-4.</p></li><li><p class="paragraph" style="text-align:left;">Mistral releases Large 2, a <b>123 billion parameter model</b>, claiming performance on par with offerings from OpenAI and Meta in code generation, mathematics, and reasoning.</p></li><li><p class="paragraph" style="text-align:left;">Google DeepMind unveils AlphaGeometry, an AI system capable of solving International Mathematical Olympiad (IMO) geometry problems at a <b>silver medal level.</b></p></li><li><p class="paragraph" style="text-align:left;">RT-DETRv2 improves upon RT-DETR by offering greater flexibility in multi-scale feature extraction and <b>achieving enhanced performance</b> without speed loss across various detector sizes.</p></li><li><p class="paragraph" style="text-align:left;">LazyLLM is a dynamic token pruning method for<b> efficient long context LLM inference</b>, accelerating the pre-filling stage of Llama 2 7B by 2.34x while maintaining accuracy.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? 
Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf"><span class="button__text" style=""> Subscribe </span></a></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">From Meta&#39;s Llama 3.1 solving complex math puzzles to Mistral&#39;s Large 2 challenging industry giants, we&#39;re seeing model capabilities and applications grow rapidly. Meanwhile, controversies around data usage and the race for market dominance remind us that the path to AI advancement isn’t <b>without its ethical and practical challenges</b>.</p><h3 class="heading" style="text-align:left;" id="llamas-math-prowess-mistrals-leap-a"><span style="color:rgb(67, 67, 67);">Llama&#39;s Math Prowess, Mistral&#39;s Leap, and NVIDIA&#39;s Miniature Marvels</span></h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXf0jvHaARKvu401oKCeilhGooqRCFdY_inXqy2KeLyYwILyWcwj8x8sawsAnsdOiDJxGAMYkmG7lhoWDpXhedO5C1HJdoIuAk3EHy-dnuuKkkn45hQ4lDiup1QfvSytcM4w-C0Rt5RMdps1x5rcnXjozYra?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>Mistral Large 2 outperforms Llama 3.1 at code generation and math. 
<a class="link" href="https://mistral.ai/news/mistral-large-2407/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">We previously covered how <a class="link" href="https://genai360.beehiiv.com/p/of-llamas-and-slms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Llama 3.1</a> is attracting <b>considerable attention</b> due to its impressive benchmark performances and potential to challenge other flagship models.</p><p class="paragraph" style="text-align:left;">Last week, <a class="link" href="https://www.linkedin.com/posts/omarsar_this-is-wild-llama-31-405b-instruct-finally-activity-7221566292485378049-OV2z/?utm_source=share&utm_medium=member_ios" target="_blank" rel="noopener noreferrer nofollow">Llama 3.1</a> continued to gain attention with a notable demonstration. In it, Llama 3.1 successfully solved a <b>complex math puzzle</b> involving candle lengths, displaying advanced logical reasoning that has left other AI models stumped.</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5f4108bf-400a-48c7-afb2-59edac9417a4/image.png?t=1722343745"/><div class="image__source"><span class="image__source_text"><p>A tough math problem that Llama 3.1 could handle while other models were unable to solve it. 
<a class="link" href="https://www.linkedin.com/posts/omarsar_this-is-wild-llama-31-405b-instruct-finally-activity-7221566292485378049-OV2z/?utm_source=share&utm_medium=member_ios" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">The model correctly deduced that the<b> longest remaining candle</b> (option 3 in the image above) was the first to be blown out, a task that even GPT-4 has struggled with in similar tests.</p><p class="paragraph" style="text-align:left;">But Meta wasn’t the only one to recently release a<b> state-of-the-art LLM. </b></p><p class="paragraph" style="text-align:left;">A day after Meta&#39;s Llama 3.1 announcement, Mistral released its new flagship model, Large 2, claiming performance <b>on par</b> with the latest offerings from OpenAI and Meta in code generation, mathematics, and reasoning. </p><p class="paragraph" style="text-align:left;">With 123 billion parameters, <a class="link" href="https://mistral.ai/news/mistral-large-2407/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Large 2</a> reportedly outperforms Meta&#39;s recently released Llama 3.1 405B in <b>code generation and math tasks</b>, despite having less than a third of the parameters. The creators also claim SOTA for function calling.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfmu6dQuqwmBrnkJ5zW2Whx_XioSjbCXeW5r6XXN4FRk5AQh6tDSUeAa-iIXGZ1g1FsPFsWAjTIifZ_AmJQeMD-isV_lxkwu30BOWHv0G-SCeB2qeOzFelJ1RxrR04tTtCNTyHTxhZQhdRh3mw4uBb0ozXy?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>Large 2 and Llama 3.1 are both close to GPT-4o in terms of performance. 
<a class="link" href="https://twitter.com/emollick/status/1816138569292951776?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Mistral emphasizes <b>reduced hallucination issues</b> and improved multilingual support, covering 12 languages and 80 coding languages. The model features a 128,000 token context window and is available on major cloud platforms. </p><p class="paragraph" style="text-align:left;">Of course, you can’t have an AI news roundup without mentioning Nvidia somewhere. They introduced <a class="link" href="https://github.com/NVlabs/Minitron?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Minitron</a>, a new family of small language models (SLMs) derived from their larger Nemotron-4 15B model. The Minitron models, available in<b> 8B and 4B parameter sizes</b>, are created through a combination of pruning and knowledge distillation techniques. </p><p class="paragraph" style="text-align:left;">This approach significantly<b> reduces the computational cost of training</b>, requiring up to 40x fewer tokens and resulting in a 1.8x reduction in overall compute costs for the model family. </p><p class="paragraph" style="text-align:left;">Despite their smaller size, Minitron models demonstrate competitive performance, with up to 16% improvement in MMLU scores compared to models trained from scratch, and <b>comparable results </b>to other community models like Mistral 7B, Gemma 7B, and Llama-3 8B. 
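</p><p class="paragraph" style="text-align:left;">The pruning-plus-distillation recipe behind these models can be sketched in a few lines of Python. This is a generic illustration of the technique, not Nvidia&#39;s actual Minitron code; the keep ratio, temperature, and toy weights below are invented for the example.</p>

```python
import numpy as np

def magnitude_prune(weights, keep_ratio=0.5):
    """Zero out the smallest-magnitude weights, keeping only `keep_ratio` of them."""
    threshold = np.quantile(np.abs(weights), 1.0 - keep_ratio)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9))))

# Toy example: prune half the weights of a tiny layer, then measure how far the
# pruned "student" has drifted from the original "teacher" on one input.
W = np.array([[0.9, -0.05], [0.1, -0.8]])
x = np.array([1.0, 1.0])
student = magnitude_prune(W, keep_ratio=0.5)
loss = distillation_loss(student @ x, W @ x)  # what distillation training minimizes
```

<p class="paragraph" style="text-align:left;">In practice the pruned student is then trained to minimize this loss against the teacher&#39;s outputs, which is why far fewer training tokens are needed than when training a model of the same size from scratch.</p><p class="paragraph" style="text-align:left;">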
</p><h3 class="heading" style="text-align:left;" id="deep-mind-advances-math">DeepMind Advances Math </h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdXwvCd1X5umsqiW5UBY8tR3v5UDZYSTYygLgZsRAxqWUPn_ZZ1MqecjjywvyiqQiFOJ0Yi4RjaINeaVJ1RTlF5yvVNykQDajSXjFv_sF9ZQuqcSNt6BSfxtWKGsO29qT77x2Oxu5TXeDW6cvWTgfayFpXE?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>AlphaProof and AlphaGeometry 2 together earned 28 out of 42 points, putting them on the same level as a silver medalist. <a class="link" href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/?utm_source=x&utm_medium=social&utm_campaign=&utm_content=" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;">Google DeepMind unveiled <a class="link" href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">AlphaProof, a new reinforcement-learning based system for formal math reasoning, and AlphaGeometry 2, an improved version of its geometry-solving system</a>. Together, the two present an AI system capable of solving International Mathematical Olympiad (IMO) problems at a <b>silver medal level (4 out of 6 problems).</b></p><p class="paragraph" style="text-align:left;">Traditional AI systems have struggled with formal mathematical proofs due to limited training data, while language models, despite access to vast data, often produce plausible but incorrect proofs. </p><p class="paragraph" style="text-align:left;">AlphaProof solves this problem by bridging the gap, combining a fine-tuned Gemini model with reinforcement learning techniques similar to AlphaZero.
Using the formal language Lean, AlphaProof generates verifiable proofs and continuously improves by learning from its own verified solutions. Read more on what&#39;s special about AlphaProof and AlphaGeometry 2 <a class="link" href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">here</a>.</p><h3 class="heading" style="text-align:left;" id="the-middle-easts-4-billion-dollar-b">The Middle East&#39;s 4 Billion-Dollar Bet and Nvidia&#39;s Chinese Gambit</h3><p class="paragraph" style="text-align:left;">Abu Dhabi is set to create a <b>major player</b> in the AI and space technology sectors with the merger of Yahsat and Bayanat AI to form <a class="link" href="https://www.zawya.com/en/business/technology-and-telecom/uaes-4bln-space42-set-to-become-menas-biggest-ai-powered-space-tech-firm-wkphlckp?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Space42. </a></p><p class="paragraph" style="text-align:left;">The new entity, <b>valued at $4 billion</b>, aims to become the Middle East and North Africa&#39;s largest AI-powered space technology company. Space42 will integrate satellite communications and business intelligence to position itself for both regional and global opportunities. </p><p class="paragraph" style="text-align:left;">Alongside the new Minitron models, Nvidia also made moves in the AI chip market. They are reportedly developing a new AI chip <b>specifically tailored</b> for the Chinese market, aiming to comply with U.S. export restrictions while maintaining their presence in this crucial market.
A couple of weeks ago, OpenAI was in talks to develop their <a class="link" href="https://genai360.beehiiv.com/p/of-llamas-and-slms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">own AI chip.</a></p><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.cryptopolitan.com/nvidia-developing-version-ai-chip-china/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Nvidia’s newest chip</a>, known internally as the H20, is a <b>scaled-down</b> version of Nvidia&#39;s flagship H100 AI accelerator, designed to meet the U.S. government&#39;s performance thresholds for exports to China. </p><p class="paragraph" style="text-align:left;">This move comes as Nvidia seeks to balance regulatory compliance with its business interests in China, which accounted for <b>21% of its revenue </b>in the most recent fiscal year. 
The H20 is expected to be part of a new product line that includes the L20 and L2 chips, all aimed at addressing the growing demand for AI hardware in China while navigating complex geopolitical challenges.</p><h3 class="heading" style="text-align:left;" id="runway-accused-of-allegedly-using-p">Runway Accused of Allegedly Using Publicly Available YouTube Videos</h3><p class="paragraph" style="text-align:left;">Video generation startup <a class="link" href="https://siliconangle.com/2024/07/25/latest-ai-training-drama-runway-accused-using-publicly-available-youtube-videos/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Runway AI is facing controversy</a> over its AI training practices. According to a report by 404 Media, the company allegedly used thousands of publicly available YouTube videos to train its <b>Gen-3 Alpha model</b>, which generates 10-second videos. 
</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfyektyj37Cb4oqZNJJymNRJfDayfVeI9vU-ag_8GOEABETG4cFo9PmDZ4htrJaOWFJAiiUqe7QvYJZOveaTHzYIJl5iqGTp1Pc2uCtDRySrmtzGqbGlSEyNZBDtCysaxTohn9p11UeJpvO5h8_udT8jWvL?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>Tech reviewer MKBHD pointed out that many of his videos were used by Runway to train its video generator.<a class="link" href="https://twitter.com/MKBHD/status/1816487078265344313?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow"> (Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">The accusation is based on a leaked internal spreadsheet, which suggests Runway scraped content from popular YouTube creators, brands, and even pirated films. While this has sparked debate about the ethics and legality of using<b> publicly available content</b> for AI training, experts note that the legal landscape around such practices remains unclear.</p><p class="paragraph" style="text-align:left;">This isn’t the first time we’ve heard of this type of accusation, as <a class="link" href="https://genai360.beehiiv.com/p/new-chips-and-controversies?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">OpenAI reportedly used over a million hours</a> of YouTube videos to train <b>GPT-4 last April.</b></p><h3 class="heading" style="text-align:left;" id="how-machines-are-calling-the-bluff-">How Machines are Calling the Bluff on Human Poker Champions</h3><div class="image"><img alt="" class="image__image" style=""
src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfDSPm2IX2Ff-Lks-OCoOyRv8esGXkgnxYUHt6WhhkYHO6vO3SAZmW7ajyY3DNfjq0q7KNoiFQO_RtOtGqVpIA-ysfHvniDZWtavxQJxDAt_7WUvrqg_UM0xAF2KWjPdPV2QNalB0epiLTntO_XABG-tRfC?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>How Pluribus’ blueprint strategy improves while training on a 64-core CPU. <a class="link" href="https://ai.meta.com/blog/pluribus-first-ai-to-beat-pros-in-6-player-poker/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Facebook AI and Carnegie Mellon University have developed <a class="link" href="https://ai.meta.com/blog/pluribus-first-ai-to-beat-pros-in-6-player-poker/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Pluribus</a>, the first AI to consistently beat elite human professionals in<b> six-player no-limit Texas Hold&#39;em poker.</b></p><p class="paragraph" style="text-align:left;">We’ve seen AI win against the World Go champion before, but the fact that AI was able to win in poker is pretty impressive. It&#39;s the first time an AI system has outperformed humans in a complex game <b>with more than two players or teams. </b></p><p class="paragraph" style="text-align:left;">Pluribus won decisively against top poker professionals, including World Series of Poker champions, earning an <b>estimated $1,000 per hour</b> against five human players.</p><p class="paragraph" style="text-align:left;">The AI&#39;s success stems from its ability to handle hidden information and multiple players efficiently. 
Pluribus uses self-play to develop its strategy without human input and employs a <b>novel search algorithm </b>that looks only a few moves ahead rather than to the end of the game. </p><p class="paragraph" style="text-align:left;">Pluribus was trained using relatively modest computing resources - <b>less than $150</b> worth of cloud computing. Contrary to popular belief, you don’t need extensive computational resources to pull off impressive feats like this.</p><h3 class="heading" style="text-align:left;" id="open-a-is-search-gpt-met-with-skept">OpenAI&#39;s SearchGPT Met With Skepticism<span style="color:rgb(67, 67, 67);"> </span>and Microsoft’s AI-Powered Feature for Search Results</h3><p class="paragraph" style="text-align:left;">OpenAI has introduced <a class="link" href="https://techcrunch.com/2024/07/25/with-google-in-its-sights-openai-unveils-searchgpt/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">SearchGPT</a>, a<b> new search feature</b> designed to provide &quot;timely answers&quot; to questions using web sources. The prototype, powered by GPT-3.5, GPT-4, and GPT-4o models, is currently available to a limited group of users and publishers. </p><p class="paragraph" style="text-align:left;">While OpenAI positions SearchGPT as a more responsible AI search tool with clear attribution and publisher collaboration, the announcement was met with skepticism from industry observers. 
</p><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/8ab17eb2-826d-4e16-8560-735659afb63a/image.png?t=1722343419"/><div class="image__source"><span class="image__source_text"><p>SearchGPT gets its own demo… wrong</p></span></div></div><p class="paragraph" style="text-align:left;">Funnily, just like <a class="link" href="https://www.theverge.com/2023/2/8/23590864/google-ai-chatbot-bard-mistake-error-exoplanet-demo?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Bard by Google back in the day</a> (RIP), the first product demo shows <a class="link" href="https://www.theverge.com/2024/7/25/24206488/openais-searchgpt-demo-results-arent-actually-that-helpful?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">a factual error</a>. When a mock user types “music festivals in boone north carolina in august,” SearchGPT pulls up a list of festivals, the first being An Appalachian Summer Festival. The tool then tells the user the festival runs on dates when it is actually closed: the real dates are 6/29 to 7/27, but SearchGPT lists them as<b> 7/29 to 8/16</b>.
</p><p class="paragraph" style="text-align:left;">In the meantime, Microsoft also expanded its AI offerings by rolling out a beta of a <a class="link" href="https://www.engadget.com/microsoft-is-adding-ai-powered-summaries-to-bing-search-results-203053790.html?utm_source=tldrai&guccounter=1" target="_blank" rel="noopener noreferrer nofollow">new AI-powered feature </a>for Bing search results, which provides <b>concise summaries</b> of web pages directly in the search results.  </p><p class="paragraph" style="text-align:left;">This feature is powered by GPT-4 and provides users with key information from websites without the need to click through.</p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">Multi-agent research saw further advancements last week, handling issues with scalability. Moreover, <b>autonomous driving</b> saw some important progress with real-time object detection and training data generation.</p><h3 class="heading" style="text-align:left;" id="scaling-multi-agent-simulations-to-">Scaling Multi-Agent Simulations to Millions With AgentScope</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe5JV-jMhri1Rq3qaOoUbIWOwy-nR64esCV9NOC5F9HcjISi_pmwzZN7mEWaThzjQuYsbGboP6HofpCDbsxSNTJlvxFJ7PMnLlxgnHJ8Bqb0UKzYEgM5LYF7-OuUmMEDgzTqaq5pkXfLWmnXt5-Qv4jUWeF?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>The multi-layer environment structure. 
<a class="link" href="https://arxiv.org/pdf/2407.17789v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.17789v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">AgentScope</a> addresses key challenges in<b> conducting large-scale</b> multi-agent simulations, including limited scalability, insufficient agent diversity, and effort-intensive management processes. </p><p class="paragraph" style="text-align:left;">To tackle these issues, the authors:</p><ul><li><p class="paragraph" style="text-align:left;">Developed an actor-based distributed mechanism for improved scalability and efficiency</p></li><li><p class="paragraph" style="text-align:left;">Provided flexible environment support for various real-world scenarios</p></li><li><p class="paragraph" style="text-align:left;">Integrated tools for creating diverse agent backgrounds and managing large numbers of agents across multiple devices
</p></li></ul><p class="paragraph" style="text-align:left;">Their experiments demonstrate the ability to conduct simulations involving 1 million agents using only 4 devices, showing drastic improvements in <b>scalability and efficiency</b> compared to existing approaches.</p><p class="paragraph" style="text-align:left;">By providing a comprehensive framework that addresses<b> both technical and usability challenges</b>, AgentScope lets researchers and developers conduct more realistic and complex simulations involving a massive number of diverse agents.</p><p class="paragraph" style="text-align:left;">This could lead to <b>valuable insights </b>in fields such as social science, economics, and urban planning, where understanding the collective behavior of large populations is key.</p><h3 class="heading" style="text-align:left;" id="enhancing-real-time-object-detectio">Enhancing Real-Time Object Detection Performance with RT-DETRv2</h3><p class="paragraph" style="text-align:left;"></p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcg-JD-g21jC64KjmrtUQFKXdlRLM4SntZgEuzIsGx7z8NlkirpTQXv5u5R6zTLcdbsoBHehrdKorUyd46sy46imjVIdXYM_4jU1daWK14Bq85kL-611d533Wzvy0P-a0rfDJks4ZwIHSI2f6xog-rpJ1Ev?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>RT-DETRv2 shows notable improvements over its predecessor. 
<a class="link" href="https://arxiv.org/pdf/2407.17140v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.17140v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">RT-DETRv2</a> addresses key challenges in<b> real-time object detection</b>, including the need for greater flexibility in multi-scale feature extraction, deployment constraints associated with DETRs, and performance optimization without sacrificing speed.</p><p class="paragraph" style="text-align:left;">Researchers at Peking University proposed setting <b>distinct numbers of sampling points</b> for features at different scales in deformable attention, introducing an optional discrete sampling operator to replace the grid_sample operator, and implementing dynamic data augmentation.</p><p class="paragraph" style="text-align:left;">The results show that RT-DETRv2 provides an improved baseline for RT-DETR with increased flexibility and practicality. It achieves<b> enhanced performance</b> without speed loss across various detector sizes.</p><p class="paragraph" style="text-align:left;">By addressing deployment constraints and optimizing training strategies, RT-DETRv2 pushes the boundaries of what&#39;s possible in real-time object detection. 
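</p><p class="paragraph" style="text-align:left;">To make the deployment point concrete, here is a minimal NumPy sketch (ours, not the authors&#39; implementation) contrasting bilinear interpolation, the expensive part of the grid_sample operator, with the kind of nearest-cell discrete sampling RT-DETRv2 can optionally switch to:</p>

```python
import numpy as np

def bilinear_sample(feature, x, y):
    """Bilinear interpolation at a fractional location (what grid_sample does)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * feature[y0, x0] +
            wx * (1 - wy) * feature[y0, x1] +
            (1 - wx) * wy * feature[y1, x0] +
            wx * wy * feature[y1, x1])

def discrete_sample(feature, x, y):
    """Round to the nearest cell and index directly -- no interpolation,
    so the operator is much easier to support in deployment runtimes."""
    return feature[int(round(y)), int(round(x))]

# A tiny 4x4 feature map: bilinear sampling blends four neighboring cells,
# while discrete sampling just picks one of them.
feat = np.arange(16, dtype=float).reshape(4, 4)
```

<p class="paragraph" style="text-align:left;">Rounding to a discrete cell trades a little sampling precision for an operator that nearly every inference runtime supports out of the box.</p><p class="paragraph" style="text-align:left;">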
It could certainly impact a wide range of applications, <b>from autonomous driving to video surveillance.</b></p><h3 class="heading" style="text-align:left;" id="generating-diverse-training-data-fo">Generating Diverse Training Data for Autonomous Driving</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdNRE_mZnNsbOzVnE9xdGNJmy-c1rpOZ4WhRBp6TQ04LnRSyAzkX6xx8u0YS3IuMrXl-kXMlSytQ3zB_Rl6tiMI8bEIsCyMxJp9Mm_d4ne1aCUTueP21QNEfODt3Mnl7RYd5FW-y0CcNV5FnT9tWrZpqIzp?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>Applications of the software module. <a class="link" href="https://arxiv.org/pdf/2407.17409v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Speaking of autonomous driving, <a class="link" href="https://arxiv.org/pdf/2407.17409v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">this paper</a> addresses the challenge of generating large-scale, <b>high-quality training data</b> for machine learning tasks in autonomous driving - particularly for map perception and related applications.</p><p class="paragraph" style="text-align:left;">The authors propose an extension to the Lanelet2 framework called lanelet2_ml_converter. 
This extension enables the generation of diverse training labels directly from HD maps while maintaining compatibility with <b>existing automated driving functionalities.</b></p><p class="paragraph" style="text-align:left;">The extension introduces features such as:</p><ul><li><p class="paragraph" style="text-align:left;">Compound labels for independence from map annotation artifacts</p></li><li><p class="paragraph" style="text-align:left;">Traceability of labels to original map elements</p></li><li><p class="paragraph" style="text-align:left;">Support for varying local reference frame poses</p></li></ul><p class="paragraph" style="text-align:left;">The results are impressive, demonstrating the framework&#39;s flexibility and effectiveness in generating training data for <b>various map perception tasks</b>, including online HD map construction, topology inference, and map fusion.</p><p class="paragraph" style="text-align:left;">It bridges the gap between HD maps used in automated driving and the growing need for large-scale, <b>standardized training data</b> in AI-based mapping and perception tasks. </p><h3 class="heading" style="text-align:left;" id="using-lazy-llm-to-accelerate-llm-in">Using LazyLLM to Accelerate LLM Inference with Dynamic Token Pruning</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXf4JBRU8BE8i61OzEpx12FL6pCd-w6rlwa1U1B79rYayYifu1YpgoABl-OVg9HDHv1zOfG831mx-koyaROrnzaPwQnP6yigIK2OBVVYCIEKLe2j08kYFt_E9XwBlqLWsAIUFs4-QiWTO4ES469cbDyAVL6b?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>How LazyLLM differs from a standard LLM.
<a class="link" href="https://arxiv.org/pdf/2407.14057?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.14057?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">LazyLLM</a> addresses the challenge of <b>slow first token generation</b> in large language models when processing long prompts, which can significantly impact overall inference speed.</p><p class="paragraph" style="text-align:left;">To solve this, the authors introduce a<b> dynamic token pruning method </b>that selectively computes key-value (KV) caches only for tokens deemed important for next token prediction in both the prefilling and decoding stages, allowing the model to adapt its token selection at each generation step.</p><p class="paragraph" style="text-align:left;">Experiments across various tasks demonstrate that LazyLLM can <b>accelerate the prefilling stage</b> of the Llama 2 7B model by 2.34x while maintaining accuracy on multi-document question-answering tasks.</p><p class="paragraph" style="text-align:left;">Notably, LazyLLM offers<b> a generic method</b> to improve the efficiency of LLM inference, particularly for long-context scenarios, without requiring model fine-tuning.</p><p class="paragraph" style="text-align:left;">By focusing on optimizing the often-overlooked prefilling stage, LazyLLM provides a complementary approach to existing methods that <b>primarily target decoding efficiency, </b>potentially leading to more comprehensive improvements in LLM inference speed across various applications.</p><h2 class="heading" style="text-align:left;"
id="frameworks-we-love">Frameworks We Love</h2><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.18038?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">TiCoSS</a>: Tightens the coupling between semantic segmentation and stereo matching tasks for autonomous driving perception. </p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.17998?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">iNNspector</a>: Comprehensive system for systematic debugging of deep learning models, providing interactive visualizations and tools to explore model architectures.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://pub.towardsai.net/structured-financial-data-extraction-from-unstructured-data-ca2c8d166de6?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Structured financial data extraction:</a> Extracts financial data from unstructured documents like balance sheets and financial statements using a combination of schema-based extraction with Pydantic, LLM, and the Indexify framework.</p></li></ul><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">Last week we saw discussions about the in-depth process that goes into <b>building generative AI platforms</b>, alongside the differences in the technicalities of terms like open-source and free models by using Llama 3 as an example.</p><h3 
class="heading" style="text-align:left;" id="deep-dive-into-building-a-generativ">Deep Dive Into Building a Generative AI Platform</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcC-yBiN5J0ZNbA83zbUwM37vahcHiMo2fvae7DNHyD3v4tcym23bzAVebz2imz4sLnGRx_oR0EheJ3ZgSdMOLPhX7_eBobr_xQA6_2qwCTaTGxrSVgJsKOohkkvQluwACHnPOvjnswoPnmHDOjFr3RtHI?key=y-BK4rgat6gAFF-1TTPy3A"/><div class="image__source"><span class="image__source_text"><p>Huyen’s exploration of how to build a generative AI platform. <a class="link" href="https://huyenchip.com/2024/07/25/genai-platform.html?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Chip Huyen, a prominent figure in machine learning systems, shared a comprehensive overview of building generative AI platforms, <b>outlining common components and their implementations. </b></p><h3 class="heading" style="text-align:left;" id="is-llama-31-really-open-source">Is Llama 3.1 Really Open Source?</h3><p class="paragraph" style="text-align:left;">We don’t really think about it, but there are actually key distinctions between the terms <b>open-source, open weights, and free models</b>. Recently, <a class="link" href="https://www.linkedin.com/posts/eordax_llm-llama-ai-ugcPost-7222564549281910785-A-Xi?utm_source=share&utm_medium=member_desktop" target="_blank" rel="noopener noreferrer nofollow">Eduardo Ordax</a>, Generative AI lead at AWS, brought this issue to light using Llama 3.1 as an example.</p><p class="paragraph" style="text-align:left;">𝗢𝗽𝗲𝗻 𝗦𝗼𝘂𝗿𝗰𝗲: You get the whole shebang—source code, hyperparameters, the original dataset, and all the juicy documentation. 
It&#39;s like getting the keys to a candy store and being told, &quot;Go nuts!&quot;.</p><p class="paragraph" style="text-align:left;">𝗢𝗽𝗲𝗻 𝗪𝗲𝗶𝗴𝗵𝘁𝘀: You can use the pre-trained model and even fine-tune it, but you won&#39;t get the original code or training methods (just like for Llama v3.1 and Mistral Large 2).</p><p class="paragraph" style="text-align:left;">𝗟𝗶𝗰𝗲𝗻𝘀𝗶𝗻𝗴 𝗟𝗶𝗺𝗶𝘁𝗮𝘁𝗶𝗼𝗻𝘀: Llama 3.1 is released under the <a class="link" href="https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">Llama 3.1 Community License Agreement</a>. Commercial use is allowed, but with limitations. Conversely, Mistral Large 2 is allowed only for non-commercial use.</p><p class="paragraph" style="text-align:left;">While downloading the model is free, the post points out the <b>significant costs </b>associated with deploying and running inference, challenging the notion of Llama 3.1 being entirely &quot;free.&quot;</p><p class="paragraph" style="text-align:left;">This highlights how the AI community needs to be more accurate in its use of terms like &quot;open source&quot; and &quot;free,&quot; since these have specific implications <b>for model development and adoption.</b></p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">Cowbell and Harvey had <b>successful Series C funding rounds</b>, raising $60 million and $100 million, respectively. 
Meanwhile, Cohere continues to move forward in its competition with OpenAI by raising $500 million.</p><h3 class="heading" style="text-align:left;" id="cohere-raises-500-million-while-tak">Cohere Raises $500 Million While Taking on OpenAI </h3><p class="paragraph" style="text-align:left;">AI startup Cohere has raised<a class="link" href="https://fortune.com/2024/07/23/after-500-million-funding-ai-startup-cohere-layoffs/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow"> $500 million</a> in a funding round that values the company at <b>over $5 billion</b>, signaling its ambition to compete with industry leaders like OpenAI. However, Cohere cut around 20 employees, about 5% of its workforce, the day after the funding round.</p><h3 class="heading" style="text-align:left;" id="harvey-secures-100-million-in-serie">Harvey Secures $100 Million in Series C Funding</h3><p class="paragraph" style="text-align:left;">Harvey, an AI-powered legal technology startup, has secured <a class="link" href="https://www.harvey.ai/blog/harvey-raises-series-c?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">$100 million</a> in Series C funding at a<b> $1.5 billion valuation,</b> led by Google Ventures.</p><p class="paragraph" style="text-align:left;">Harvey plans to use the new capital to expand its engineering and data capabilities, develop <b>domain-specific models</b>, and deepen partnerships with cloud and model providers to enhance its AI platform. 
We are curious how this will affect the company&#39;s spend with <a class="link" href="https://openai.com/index/harvey/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">OpenAI</a>, although since OpenAI is on the roster of the company&#39;s investors, the impact may be minor.</p><h3 class="heading" style="text-align:left;" id="cowbell-raises-60-million-funding-i">Cowbell Raises $60 Million in Series C Funding</h3><p class="paragraph" style="text-align:left;">Cowbell, a leading AI-powered cyber insurance provider, has raised <a class="link" href="https://www.morningstar.com/news/pr-newswire/20240726ny70414/cowbell-secures-60-million-series-c-funding-from-zurich-insurance-group-to-scale-up-operations-and-advance-global-sme-cyber-adoption?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=second-gpt-4-level-open-model-international-mathematical-olympiad-vs-deepmind-s-llm-rlhf" target="_blank" rel="noopener noreferrer nofollow">$60 million</a> in Series C funding from Zurich Insurance Group, bringing its total funding to <b>$160 million</b>.</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=ac60a834-9fa6-405c-a4f5-4f8ba1bc0d59&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

      <item>
  <title>A Small Language Model Week, GPT-4 Mini, Llama-3 405B Leaked</title>
  <description>Plus, Mistral&#39;s new Trifecta of Models, &amp; Lessons from Google Cloud’s Early Missteps</description>
      <enclosure url="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/01900e5a-3c64-46b1-9e9c-a8d7d8648bb3/mikayelh_A_3d_isometric_white_and_orange_llama_falling_through__b881c2e5-d890-4654-90c6-a2abfc793d841-ezgif.com-optipng.png" length="777621" type="image/png"/>
  <link>https://genai360.beehiiv.com/p/of-llamas-and-slms</link>
  <guid isPermaLink="true">https://genai360.beehiiv.com/p/of-llamas-and-slms</guid>
  <pubDate>Tue, 23 Jul 2024 13:20:51 +0000</pubDate>
  <atom:published>2024-07-23T13:20:51Z</atom:published>
  <content:encoded><![CDATA[
    <div class='beehiiv'><style>
  .bh__table, .bh__table_header, .bh__table_cell { border: 1px solid #C0C0C0; }
  .bh__table_cell { padding: 5px; background-color: #FFFFFF; }
  .bh__table_cell p { color: #2D2D2D; font-family: 'Helvetica',Arial,sans-serif !important; overflow-wrap: break-word; }
  .bh__table_header { padding: 5px; background-color:#F1F1F1; }
  .bh__table_header p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important; overflow-wrap: break-word; }
</style><div class='beehiiv__body'><p class="paragraph" style="text-align:left;">Before we start, share this week&#39;s news with a friend or a colleague:</p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/p/of-llamas-and-slms?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked"><span class="button__text" style=""> Share the newsletter </span></a></div><h2 class="heading" style="text-align:left;" id="key-takeaways">Key Takeaways</h2><ul><li><p class="paragraph" style="text-align:left;">On the theme of accidents, <b>Llama-3 405B was allegedly leaked on HuggingFace</b> (and made available for download on 4Chan) yesterday. Read below for what we know about the model ahead of today&#39;s release.</p></li><li><p class="paragraph" style="text-align:left;">OpenAI unveiled GPT-4o mini, a compact and cost-effective AI model for ChatGPT that outperforms leading small AI models on reasoning tasks while being <b>60% cheaper </b>to operate than GPT-3.5 Turbo.</p></li><li><p class="paragraph" style="text-align:left;">Salesforce released xLAM, a family of models for autonomous task planning and execution, with the <b>7B model achieving 88.24%</b> on the BFCL function calling leaderboard.</p></li><li><p class="paragraph" style="text-align:left;">HuggingFace’s SmolLM is a new line of efficient small language models designed for local devices, available in<b> 135M, 360M, and 1.7B</b> parameter sizes, outperforming similar-sized models like GPT-2 and MobileLM across various benchmarks.</p></li><li><p class="paragraph" style="text-align:left;">FlashAttention-3 achieves up to<b> 2× speedup</b> in attention mechanisms using producer-consumer asynchrony and hardware-accelerated low-precision operations.</p></li><li><p class="paragraph" style="text-align:left;">LMMs-Eval 
<b>proposes a unified evaluation framework</b> for multimodal AI, balancing task diversity, human alignment, and efficiency to enable standardized model comparisons.</p></li></ul><p class="paragraph" style="text-align:left;"><i>Got forwarded this newsletter? Subscribe below👇</i></p><div class="button" style="text-align:left;"><a target="_blank" rel="noopener nofollow noreferrer" class="button__link" style="background-color:#ff8a00;" href="https://genai360.beehiiv.com/subscribe?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked"><span class="button__text" style=""> Subscribe </span></a></div><h2 class="heading" style="text-align:left;" id="the-latest-ai-news">The Latest AI News</h2><p class="paragraph" style="text-align:left;">Sheesh, what a week. As an AI newsletter, we&#39;re glad we can be… unburdened by what has been happening over the past week in the political arena and on the international Blue Screen day. After all, so much has happened in AI, too!<br><br>Last week was big for small language models. AI developments showcased a push towards<b> efficient, compact models</b> like GPT-4o mini, Arcee-Nova, SmolLM, and xLAM, alongside AI titans pivoting to specialized ventures like Fei-Fei Li&#39;s World Labs. </p><p class="paragraph" style="text-align:left;">These advancements are occurring amidst growing regulatory scrutiny, as evidenced by Meta&#39;s EU decision and Altman&#39;s &quot;AI-client privilege&quot; proposal, highlighting the <b>tension</b> between innovation and ethical considerations in AI.</p><p class="paragraph" style="text-align:left;">But first… the talk of the day:</p><h2 class="heading" style="text-align:left;" id="llama-3-405-b-leaked-on-4-chan-what">Llama-3 405B Leaked on 4Chan. 
What Do We Know Ahead of Today&#39;s Release?</h2><p class="paragraph" style="text-align:left;">Weighing in at almost 820GB, the model was ‘accidentally’ leaked in a Hugging Face repository ahead of the release. (UPD: you can read <a class="link" href="https://ai.meta.com/blog/meta-llama-3-1/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Meta&#39;s full announcement here</a>).</p><p class="paragraph" style="text-align:left;">We don&#39;t know where you can download more RAM to run this, but here&#39;s what we do know:</p><ol start="1"><li><p class="paragraph" style="text-align:left;">Outperforms GPT-4o and Claude Sonnet on more than 90% of benchmarks, but may fall short on some text-related tasks. Unclear how it fares against the newly released GPT-4o mini yet.</p></li><li><p class="paragraph" style="text-align:left;">128K-token context window</p></li><li><p class="paragraph" style="text-align:left;">Pre-trained on 15 trillion tokens 😱 (the number floating around for the OG GPT-4 was 13T)</p></li><li><p class="paragraph" style="text-align:left;">Multilingual, but not yet multi-modal</p></li><li><p class="paragraph" style="text-align:left;">Fine-tuned versions and quants available soon after the base release</p></li><li><p class="paragraph" style="text-align:left;">Possibly paywalled at a certain point as per this part of the code:</p></li></ol><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/ce3c45fb-38ba-4062-82e5-cc17375cb6fa/image.png?t=1721738848"/><div class="image__source"><span class="image__source_text"><p>Llama code repo contains upsell prompts</p></span></div></div><p class="paragraph" style="text-align:left;">Here are the Llama 3.1 405B benchmarks:</p><div class="image"><img alt="" class="image__image" style="" 
src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/af214a55-0a9d-4d51-a806-ebbb7c9164fb/image.png?t=1721749854"/><div class="image__source"><span class="image__source_text"><p>Llama 3.1 vs closed-source models</p></span></div></div><div class="image"><img alt="" class="image__image" style="" src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/38ccc6b2-4a1e-423d-815f-b2424f9765e5/image.png?t=1721749880"/><div class="image__source"><span class="image__source_text"><p>Llama 3.1 vs open-source models</p></span></div></div><h3 class="heading" style="text-align:left;" id="new-models-from-open-ai-and-mistral">New Models From OpenAI and Mistral Lead the Efficiency Race</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXdwW38h9L5vFsEFUYYSgbNxvDtQ-gkM1SClZPG33PYT4HUtbZehQYpar39tndU2Js2WIELdAxqqVdevM1POsEM6-xn0GjpK86gFf4q4VtqLBCJnLbNmUMtS04WeK3BpWj_e9Ag8SDq8Iv-WhVyo-91vASs?key=RUEzH523fmtjVYvsTmrKfg"/><div class="image__source"><span class="image__source_text"><p>GPT-4o mini is among the most cost-effective LLMs. 
<a class="link" href="https://techcrunch.com/2024/07/18/openai-unveils-gpt-4o-mini-a-small-ai-model-powering-chatgpt/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;">OpenAI has introduced <a class="link" href="https://techcrunch.com/2024/07/18/openai-unveils-gpt-4o-mini-a-small-ai-model-powering-chatgpt/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">GPT-4o mini</a>, a new small-scale AI model designed to balance performance and efficiency. This model, which will power ChatGPT and be available to developers, <b>outperforms leading small AI models</b> on reasoning tasks while being significantly more cost-effective to run.</p><p class="paragraph" style="text-align:left;">GPT-4o mini scores<b> 82% on the MMLU benchmark</b>, surpassing Gemini 1.5 Flash and Claude 3 Haiku. It&#39;s also over 60% cheaper to operate than its predecessor, GPT-3.5 Turbo.</p><p class="paragraph" style="text-align:left;">Moreover, the model features a <b>128,000 token context window </b>and supports text and vision inputs.</p><p class="paragraph" style="text-align:left;">Meanwhile, the <a class="link" href="https://techcrunch.com/2024/07/17/ttt-models-might-be-the-next-frontier-in-generative-ai/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">TTT (test-time training) models</a> are looking to shake up the generative AI field. 
Researchers suggest these models could offer significant advantages over traditional transformers in terms of <b>efficiency and capabilities.</b></p><p class="paragraph" style="text-align:left;">TTT models use a different mathematical structure that allows for more efficient information processing. Early results are promising, as they show TTT models <b>performing competitively</b> with much larger transformer models while using fewer parameters.</p><p class="paragraph" style="text-align:left;">On the other hand, Mistral released a plethora of <b>new models,</b> including Codestral Mamba, Mathstral, and NeMo.</p><ol start="1"><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://mistral.ai/news/codestral-mamba/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Codestral Mamba:</a> Combines the Mamba architecture with advanced code and reasoning capabilities. </p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://mistral.ai/news/mathstral/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Mathstral:</a><b> </b>Specifically designed to tackle complex mathematical problems requiring multi-step logical reasoning.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://mistral.ai/news/mistral-nemo/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">NeMo:</a><b> </b>A 12B parameter model that offers high performance in reasoning, world knowledge, and coding accuracy for its size category. 
</p></li></ol><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXd_R2trEIUvFnS2-lrhMb5hrU5Gr2vbY0lQHxOm5XxkvjkusJNvQJCMhPv6F2AV9PjhUMFwwLQSjPmEexpwtvrMjiIvXnbCbcOIRyf-6iIsFnfWI5bEmiRFyqccS4M2dacLT1IDh0H9tdg4qAtQVoNS8Ebh?key=RUEzH523fmtjVYvsTmrKfg"/><div class="image__source"><span class="image__source_text"><p>NeMo outperforms similar-sized models like Gemma 2 and Llama 3 across various benchmarks. <a class="link" href="https://mistral.ai/news/mistral-nemo/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><h3 class="heading" style="text-align:left;" id="arcee-merges-sl-ms-to-close-in-on-g">Arcee Merges SLMs to Close in on GPT-4</h3><p class="paragraph" style="text-align:left;">Arcee released Arcee-Nova, a Small Language Model (SLM) developed as a local alternative to GPT-4 and Sonnet 3.5. It scores <b>9.17 on MT-Bench</b>, 0.01 points below GPT-4 (May 2023). </p><p class="paragraph" style="text-align:left;">On OpenLLM leaderboard tasks, it achieves the <b>highest average scores</b> among models tested (Llama-3-70B-Instruct, Tess-Qwen2, Qwen2-72B-Instruct, etc.). Arcee-Nova is designed for coding, mathematics, and creative writing.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXcxvgpkamNEtsMgHJ7yMtfxEDqXZy5OohDH1EXQ_xw8HWP58yEsv6Ws_BLgOmmrjvLxI6zPImXAitYJsicshenGB_BA7kYKAsDT7wXQ_vrtk_ovjc7dDgdFimclSnIsP4Y8uyeiwcZi3dH0Z5WOeVJ9ivxb?key=RUEzH523fmtjVYvsTmrKfg"/></div><p class="paragraph" style="text-align:left;">The model was trained for <b>3 epochs on 1.75 million samples</b> from 10 publicly available datasets. Dataset curation involved a custom reranker for instruction following and safety, with scores averaged with fineweb-edu classifier for educational value. 
</p><p class="paragraph" style="text-align:left;">Arcee has also publicly released the resulting dataset - <a class="link" href="https://huggingface.co/datasets/arcee-ai/The-Tome?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">arcee-ai/The-Tome</a>. Nova was<b> merged </b>with Qwen2-72B-Instruct, primarily using lower layers from Instruct and higher layers from Nova-Premerge. </p><p class="paragraph" style="text-align:left;">The merge underwent <b>further alignment</b> using DPO (Direct Preference Optimization - i.e., the model is optimized directly on human preference data rather than via a separately trained reward model and reinforcement learning).</p><h3 class="heading" style="text-align:left;" id="even-smaller-sl-ms-x-lam-from-sales">Even Smaller SLMs - xLAM From Salesforce and SmolLM From HuggingFace</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://github.com/SalesforceAIResearch/xLAM?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Salesforce</a> has introduced xLAM, a new family of large action models (LAMs) designed to autonomously plan and execute tasks. These models, available in 1.35B and 7B parameter versions, demonstrate <b>impressive capabilities</b> despite their relatively compact size.</p><p class="paragraph" style="text-align:left;">The 7B model achieves<b> 88.24% on the BFCL</b> (Berkeley Function-Calling Leaderboard), while the 1.35B version scores 78.94%, outperforming many larger open-access models. 
xLAM models show competitive performance against GPT-4 and Claude 3.5 on function calling tasks.</p><p class="paragraph" style="text-align:left;">Meanwhile, Hugging Face released <a class="link" href="https://huggingface.co/blog/smollm?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">SmolLM,</a> a new family of state-of-the-art SLMs designed to operate <b>efficiently </b>on local devices. </p><p class="paragraph" style="text-align:left;">The SmolLM models come in <b>three sizes</b>: 135M, 360M, and 1.7B parameters. The models are trained on a meticulously curated high-quality dataset, SmolLM-Corpus, and demonstrate strong performance in common sense reasoning and world knowledge tasks.</p><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXcf5_seIOWaKfeUB9Z9vOpZyAxdIS0h4wwSSddNki5SISr6coMwRLT6e6LhI0vehmgCYOWwfpTvHu29s47hcnBpNePw4ODlz2RilbnrsMoX3HqD5KX7jfN35qqxHgfCR6-yBw9-EimsL8VoQNC_9-QRHgg?key=RUEzH523fmtjVYvsTmrKfg"/><div class="image__source"><span class="image__source_text"><p>SmolLM showed impressive performance across various benchmarks like MMLU and ARC. 
<a class="link" href="https://huggingface.co/blog/smollm?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;">Both releases are a sign that we’re moving toward <a class="link" href="https://genai360.beehiiv.com/p/of-new-architectures?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">more efficient, compact models</a> that can perform complex tasks <b>with fewer parameters.</b></p><h3 class="heading" style="text-align:left;" id="li-and-karpathys-next-big-bets">Li and Karpathy’s Next Big Bets</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://observer.com/2024/07/ai-godmother-fei-fei-li-1b-spatial-intelligence-startup/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Fei-Fei Li</a>, often dubbed the &quot;Godmother of AI,&quot; has launched World Labs, a startup focused on developing spatial intelligence in AI - the ability to understand and reason about <b>3D spaces</b> and physical environments.</p><p class="paragraph" style="text-align:left;">In just four months, the company has achieved a <b>$1 billion valuation</b> and secured backing from major venture capital firms, including Andreessen Horowitz and Radical Ventures.</p><p class="paragraph" style="text-align:left;">But Li wasn’t the only one who introduced a new company. 
Former Tesla AI director and OpenAI researcher <a class="link" href="https://techcrunch.com/2024/07/16/after-tesla-and-openai-andrej-karpathys-startup-aims-to-apply-ai-assistants-to-education/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Andrej Karpathy</a> has unveiled his new venture, Eureka Labs, which aims to <b>completely change education through AI</b>. The startup plans to leverage AI teaching assistants to enhance learning experiences.</p><p class="paragraph" style="text-align:left;">Eureka Labs will use AI to create personalized, interactive learning experiences, with the platform&#39;s first offering being a course on<b> training LLMs from scratch. </b></p><p class="paragraph" style="text-align:left;">These developments might mean we’re in a <b>new phase of AI innovation</b> where industry leaders are targeting specific, high-impact AI applications rather than the broader applications we’re used to seeing.</p><h3 class="heading" style="text-align:left;" id="metas-eu-pause-and-altmans-call-to-">Meta’s EU Pause and Altman’s Call to Privacy</h3><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.silicon.co.uk/ai/meta-refuses-eu-release-of-multimodal-llama-ai-model-572200?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Meta</a> announced it will not release its upcoming multimodal Llama AI model in the European Union, citing the &quot;unpredictable nature of the European regulatory environment.&quot; This decision highlights<b> growing tensions </b>between tech giants and EU regulators over AI governance.</p><p class="paragraph" style="text-align:left;">Meta&#39;s move follows <b>similar actions</b> by other tech 
companies, like Apple&#39;s decision to exclude the EU from certain AI features. The decision comes shortly after the EU finalized compliance deadlines for its <a class="link" href="https://genai360.beehiiv.com/p/of-sonnets-and-agents?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">new AI Act</a>, potentially impacting EU companies&#39; access to cutting-edge AI technologies.</p><p class="paragraph" style="text-align:left;">While Meta was busy making decisions about model releases in the EU, the debate about AI ethics continued. <a class="link" href="https://www.msn.com/en-xl/health/other/sam-altman-says-society-may-decide-we-need-ai-client-privilege-similar-to-confidentiality-with-lawyers-or-doctors/ar-BB1q5GSr?ocid=finance-verthp-feeds&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">OpenAI CEO Sam Altman</a> suggests that society may need to establish an &quot;AI-client privilege&quot; similar to the <b>confidentiality protections</b> that exist between lawyers or doctors and their clients. 
</p><p class="paragraph" style="text-align:left;">Establishing such a privilege could help build trust between users and AI systems, encouraging more <b>open and honest interactions</b> without fear of information misuse.</p><p class="paragraph" style="text-align:left;">However, it&#39;s important to consider that this would require <b>advanced security measures</b> and potentially new technological solutions to ensure the confidentiality of AI interactions.</p><h3 class="heading" style="text-align:left;" id="open-a-is-custom-chip-gambit">OpenAI’s Custom Chip Gambit</h3><p class="paragraph" style="text-align:left;">Since Nvidia has been dominating the AI chip market, OpenAI decided to throw its hat into the ring alongside <a class="link" href="https://genai360.beehiiv.com/p/the-trillion-dollar-cluster?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">its competitors</a>.</p><p class="paragraph" style="text-align:left;"><a class="link" href="https://www.datacenterdynamics.com/en/news/openai-in-talks-with-broadcom-to-develop-custom-ai-chip-altman-looks-to-fund-data-centers-report/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">OpenAI</a> is reportedly in talks with Broadcom to develop a custom AI chip, signaling a <b>potential shift</b> in the AI hardware landscape. This move comes as part of OpenAI&#39;s broader strategy to reduce dependence on existing GPU suppliers and address the increasing demand for AI computation.</p><p class="paragraph" style="text-align:left;">This could give OpenAI more control over its hardware supply chain and <b>reduce costs </b>in the long run. It may also lead to a competitive advantage, as developing proprietary hardware could give OpenAI an edge in the AI race. 
</p><p class="paragraph" style="text-align:left;">OpenAI’s initiative to develop custom chips <b>mirrors similar efforts</b> by other major tech companies to reduce dependence on external suppliers and optimize hardware for their needs. </p><p class="paragraph" style="text-align:left;">Apple introduced the <a class="link" href="https://www.apple.com/uk/newsroom/2024/05/apple-introduces-m4-chip/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">M4 chip</a> a couple of months ago, the latest example of its custom silicon strategy for Macs and other devices. Additionally, Google has been developing <a class="link" href="https://cloud.google.com/tpu?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">TPUs </a>specifically designed to <b>accelerate machine learning workloads. </b></p><h2 class="heading" style="text-align:left;" id="advancements-in-ai-research">Advancements in AI Research</h2><p class="paragraph" style="text-align:left;">Last week’s advancements showcase a convergence of efforts to enhance efficiency, evaluation, and application across <b>diverse domains</b>, from optimizing LLMs to improving multimodal capabilities and 3D perception. 
</p><p class="paragraph" style="text-align:left;">Breakthroughs like FlashAttention-3 and Q-Sparse are pushing the boundaries of model efficiency, while LMMs-Eval addresses the critical need for <b>standardized evaluation methods in multimodal AI.</b></p><h3 class="heading" style="text-align:left;" id="optimizing-control-for-multilingual">Optimizing Control for Multilingual Text-Image Generation</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXclYAxmLMIYHss6sABTP13LRGicrNSyxo5i2TXIb-k-IGXWynFGtJVZdgvuEkBOyEmevCkMNU-bQRX4e_Jr1uQOI9NioW9apkbAuYQ2eAKtUiScx6d3iHw1WspjCxXOF-_16mjDTEdaOBaOfJNN2mFgF5Ha?key=RUEzH523fmtjVYvsTmrKfg"/><div class="image__source"><span class="image__source_text"><p>Impact of control at different stages of denoising. <a class="link" href="https://arxiv.org/pdf/2407.11502v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers at the University of Science and Technology of China introduced <a class="link" href="https://arxiv.org/pdf/2407.11502v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">TextGen</a>, a framework that addresses a critical challenge in visual text generation: effectively utilizing <b>control information</b> throughout the diffusion process. 
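</p><p class="paragraph" style="text-align:left;">The paper’s central question, how much control to inject at each stage of denoising, can be pictured with a toy loop. Everything below is a hypothetical sketch: the U-shaped schedule, function names, and update rule are ours for illustration, not TextGen’s actual architecture.</p>

```python
# Hypothetical sketch of stage-dependent control in a denoising loop.
# The U-shaped schedule below is illustrative, not TextGen's actual
# weighting: it emphasizes early steps (global layout) and late steps
# (character detail) over the middle of the trajectory.

def control_weight(step: int, total_steps: int) -> float:
    """Weight for injecting control features at a given denoising step."""
    progress = step / (total_steps - 1)  # 0.0 = first step, 1.0 = last step
    # High at both ends, lower in the middle.
    return 0.5 + 0.5 * abs(2.0 * progress - 1.0)

def denoise(latent: list[float], control: list[float], total_steps: int = 10) -> list[float]:
    """Toy denoising loop: each step mixes in control scaled by the schedule."""
    for step in range(total_steps):
        w = control_weight(step, total_steps)
        # Stand-in for a U-Net update; a real model predicts noise here.
        latent = [0.9 * x + 0.1 * w * c for x, c in zip(latent, control)]
    return latent

print([round(control_weight(s, 10), 2) for s in range(10)])
# strongest at the first and last steps, weakest mid-trajectory
```

<p class="paragraph" style="text-align:left;">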
</p><p class="paragraph" style="text-align:left;">This work tackles the limitations of <b>current ControlNet-based approaches</b>, which often struggle with fine-grained character details and coherent text placement.</p><p class="paragraph" style="text-align:left;">Their findings reveal that control information has <b>unique characteristics </b>compared to conventional inputs like Canny edges or depth maps. Notably, control at both the early and late stages of denoising plays a crucial role: early control influences global coherence, while late-stage control refines textual details.</p><p class="paragraph" style="text-align:left;">The results were impressive. TextGen achieved state-of-the-art performance on the <b>AnyWord benchmark,</b> with significant gains in both English (73.36% accuracy, up from 64.26%) and Chinese (67.92% accuracy, up from 65.02%) text generation.</p><h3 class="heading" style="text-align:left;" id="navigating-the-evaluation-trilemma-">Navigating the Evaluation Trilemma for Multimodal AI</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXcYIwK4e4R6QGL2bTXBPgrIshdK0Vjb9RbWOsjMF8yhxsfyVtJAvRwmgEvaWdzRCynjspgAIDDXUtIDPuGTcsH_rt7xc2duaDqnfyqLnVZo9-B1N6VoLtbKSETqMgtC2PuvS3s-45PbfbARUtG2m3-AkW9T?key=RUEzH523fmtjVYvsTmrKfg"/><div class="image__source"><span class="image__source_text"><p>The three components developed to deal with the “evaluation trilemma.” <a class="link" href="https://arxiv.org/pdf/2407.12772v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers at the LMMs-Lab Team presented<a class="link" 
href="https://arxiv.org/pdf/2407.12772v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow"> LMMs-EVAL</a>, a framework that addresses the critical challenges in evaluating large multimodal models (LMMs). This work tackles the &quot;<b>evaluation trilemma</b>&quot; - the difficulty in simultaneously achieving wide coverage, low cost, and zero contamination in AI model assessment.</p><p class="paragraph" style="text-align:left;">They developed <b>three</b> key components:</p><ul><li><p class="paragraph" style="text-align:left;"><b>LMMs-EVAL:</b> A unified evaluation suite covering over 50 tasks and more than 10 models, ensuring standardized comparisons.</p></li><li><p class="paragraph" style="text-align:left;"><b>LMMs-EVAL LITE: </b>An efficient subset of benchmarks that maintains reliability while reducing evaluation costs.</p></li><li><p class="paragraph" style="text-align:left;"><b>LIVEBENCH:</b> A dynamic evaluation framework using continuously updated news and online content to assess models&#39; zero-shot generalization abilities.</p></li></ul><p class="paragraph" style="text-align:left;">Results show significant improvements in evaluation efficiency and reliability. LMMs-EVAL LITE achieved correlation scores <b>above 0.9</b> with full benchmark results while substantially reducing computation time. 
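</p><p class="paragraph" style="text-align:left;">The sanity check behind that kind of claim is straightforward: compare per-model scores on the lite subset against the full suite with a Pearson correlation. A minimal sketch with hypothetical scores (the numbers below are ours, not from the paper):</p>

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-model accuracies on the full suite vs. a lite subset.
full_scores = [62.1, 55.4, 70.8, 48.9, 66.3]
lite_scores = [61.5, 56.0, 69.9, 50.2, 65.1]

r = pearson(full_scores, lite_scores)
print(f"correlation: {r:.3f}")  # a trustworthy lite subset should give r > 0.9
```

<p class="paragraph" style="text-align:left;">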
</p><p class="paragraph" style="text-align:left;">LIVEBENCH also revealed performance gaps between open-source and commercial models, highlighting the need for<b> more robust evaluation methods.</b></p><h3 class="heading" style="text-align:left;" id="unlocking-full-activation-sparsity-">Unlocking Full Activation Sparsity in LLMs</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXctSN8SN7BU645oswcEYHACdpEPg_bptGlOADaSl3t1R45MvENmhWA13GjX4AYP8B3QB_AqJeo7_XNL8t2-d_0w3T4n03OG4ldykXrwifL0zDLtcBiX16PEQM-1e19f39VQu1Gq_pqXdQsYInvW9aAYAAtI?key=RUEzH523fmtjVYvsTmrKfg"/><div class="image__source"><span class="image__source_text"><p>How Q-Sparse achieves a better inference-scaling law than the dense models. <a class="link" href="https://arxiv.org/pdf/2407.10969?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;">Researchers at Microsoft have introduced<a class="link" href="https://arxiv.org/pdf/2407.10969?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow"> Q-Sparse</a>, a novel approach to training sparsely-activated LLMs that significantly improves<b> inference efficiency </b>without compromising performance. </p><p class="paragraph" style="text-align:left;">This work addresses the critical challenge of <b>reducing computational costs </b>and memory footprint in LLMs - especially during inference.</p><p class="paragraph" style="text-align:left;">Q-Sparse can match the performance of dense baseline models while achieving up to 40% sparsity. 
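</p><p class="paragraph" style="text-align:left;">The core mechanism is top-K sparsification of activations: only the largest-magnitude entries enter each matrix multiplication. A minimal pure-Python sketch of that selection step (tensor shapes, quantization, and the straight-through estimator used in training are omitted):</p>

```python
def top_k_sparsify(activations: list[float], sparsity: float) -> list[float]:
    """Zero out all but the largest-magnitude activations.

    `sparsity` is the fraction of entries set to zero. Q-Sparse applies
    this top-K selection to the input of each linear projection so only
    the surviving entries take part in the matrix multiplication.
    """
    n = len(activations)
    k = max(1, round(n * (1.0 - sparsity)))  # number of entries to keep
    keep = set(sorted(range(n), key=lambda i: abs(activations[i]), reverse=True)[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]

acts = [0.1, -2.3, 0.05, 1.7, -0.4, 0.9, -0.02, 3.1]
print(top_k_sparsify(acts, sparsity=0.5))
# → [0.0, -2.3, 0.0, 1.7, 0.0, 0.9, 0.0, 3.1]
```

<p class="paragraph" style="text-align:left;">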
Notably, the method works effectively across <b>various settings,</b> including training from scratch, continuing pre-training of existing LLMs, and fine-tuning.</p><p class="paragraph" style="text-align:left;">They also derived an<b> inference-optimal scaling law</b> for sparsely-activated LLMs, finding that models with a sparsity ratio of 45.58% (for full-precision) and 61.25% (for 1.58-bit models) can achieve optimal performance given the same inference budget.</p><h3 class="heading" style="text-align:left;" id="pushing-the-boundaries-of-efficient">Pushing the Boundaries of Efficient Attention</h3><p class="paragraph" style="text-align:left;">Researchers at Colfax Research, Meta, NVIDIA, Princeton University, and Together AI have introduced <a class="link" href="https://arxiv.org/pdf/2407.08608v1?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">FlashAttention-3</a>, a new approach to optimize attention mechanisms in large language models. It addresses the <b>computational bottleneck of attention</b>, which has quadratic scaling in sequence length and limits the ability to handle long contexts.</p><p class="paragraph" style="text-align:left;">Results show significant performance gains with a <b>1.5-2.0×</b> speedup over FlashAttention-2 in the forward pass. 
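</p><p class="paragraph" style="text-align:left;">The algorithmic idea this line of work builds on is computing exact softmax attention incrementally, without ever materializing the full n×n score matrix. Below is a single-query, pure-Python sketch of that online-softmax rescaling trick; it is illustrative only, as the real kernels process tiles of keys and values in fast on-chip GPU memory.</p>

```python
import math

def naive_attention(q, keys, values):
    """Standard softmax attention for one query: materializes all n scores."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [sum(e * v[j] for e, v in zip(exps, values)) / z
            for j in range(len(values[0]))]

def streaming_attention(q, keys, values):
    """Same result via online softmax: one pass, no n-sized score buffer.

    This is the rescaling trick FlashAttention builds on to process
    keys/values block by block instead of all at once.
    """
    d = len(q)
    m = float("-inf")             # running max of scores
    z = 0.0                       # running softmax denominator
    acc = [0.0] * len(values[0])  # running weighted sum of values
    for k, v in zip(keys, values):
        s = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
        m_new = max(m, s)
        scale = math.exp(m - m_new)   # rescale old state to the new max
        w = math.exp(s - m_new)
        z = z * scale + w
        acc = [a * scale + w * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / z for a in acc]

q = [0.3, -1.2, 0.7]
keys = [[0.1, 0.4, -0.2], [1.0, -0.5, 0.3], [-0.7, 0.2, 0.9]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a = naive_attention(q, keys, values)
b = streaming_attention(q, keys, values)
print(max(abs(x - y) for x, y in zip(a, b)))  # ~0: identical up to float error
```

<p class="paragraph" style="text-align:left;">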
The method can scale to much longer sequence lengths than previous approaches, which is crucial for improving the context understanding of large language models.</p><p class="paragraph" style="text-align:left;">This research not only improves the efficiency of <b>attention mechanisms</b> but also shows how leveraging hardware-specific features can lead to substantial performance gains in AI models.</p><h2 class="heading" style="text-align:left;" id="frameworks-we-love">Frameworks We Love</h2><p class="paragraph" style="text-align:left;">Some frameworks that caught our attention in the last week include:</p><ul><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.13766?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Visual Haystacks</a>: A vision-centric &quot;needle-in-a-haystack&quot; benchmark that tests how well large multimodal models retrieve and reason over large collections of unrelated images.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.13759?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">StreetScapes</a>: Generates long, consistent sequences of street-level city views through synthesized urban environments.</p></li><li><p class="paragraph" style="text-align:left;"><a class="link" href="https://arxiv.org/pdf/2407.13761?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">SegPoint</a>: A 3D point cloud segmentation model that leverages a multimodal LLM to produce point-wise segmentation masks for various tasks.</p></li></ul><p class="paragraph" style="text-align:left;">If you want your framework to be featured here, reply to this email and say 
hi :)</p><h2 class="heading" style="text-align:left;" id="conversations-we-loved">Conversations We Loved</h2><p class="paragraph" style="text-align:left;">Renee Shah&#39;s insights on the rising prominence of <b>headless data architecture </b>shed light on a significant shift in how organizations approach data management and analysis. </p><p class="paragraph" style="text-align:left;">Meanwhile, Hemant Mohapatra&#39;s reflections on <b>Google Cloud Platform&#39;s early challenges</b> offer valuable lessons on the complexities of building and scaling cloud services in a highly competitive market.</p><h3 class="heading" style="text-align:left;" id="is-headless-the-future-of-data-syst">Is Headless the Future of Data Systems?</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXesTOTRLNCrXFRuZ-7IfnXpEoSVAo6MAM48ocW7lTmbwfvxp_aW-dCpdNunRKWJKzMnmnE_ztnzs3jZ8EY-qKin8I2SRGyVwzABLbaXQeAo-5-jB1buci6vRz6zURHCfBJv_ouUw98r7UsyfO9deCyVIHQ?key=RUEzH523fmtjVYvsTmrKfg"/><div class="image__source"><span class="image__source_text"><p>Shah’s discussion about how headless data architecture is becoming a common term these days. <a class="link" href="https://x.com/reneeshah123/status/1812944622890725395?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">(Source)</a></p></span></div></div><p class="paragraph" style="text-align:left;">Renee Shah brings up how <a class="link" href="https://x.com/reneeshah123/status/1812944622890725395?s=46&t=r_IyUhjxHPp6D-O3kuDbzQ&utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">headless data architecture</a> has recently gained significant traction in the data engineering world. 
This approach represents a massive shift towards <b>more flexible and scalable data systems</b>.</p><p class="paragraph" style="text-align:left;">One core benefit is that it decouples storage from compute, allowing for independent resource scaling. It’s also cost-effective because users only pay for the compute resources used during query execution. At Activeloop, we&#39;re huge <b>proponents of compute/storage separation</b> and have had this architecture from day one, six years ago!</p><p class="paragraph" style="text-align:left;">However, it presents a <b>challenge in performance optimization</b>, as ensuring efficient query performance across different engines may require additional tuning. Increased complexity is another issue, as managing multiple components can be more challenging than using a single, integrated system.</p><p class="paragraph" style="text-align:left;">The growing popularity of headless data architecture reflects a broader trend towards <b>modular, cloud-native data systems</b> that prioritize flexibility and scalability. </p><h3 class="heading" style="text-align:left;" id="the-untold-story-of-google-clouds-e">The Untold Story of Google Cloud’s Early Days</h3><div class="image"><img alt="" class="image__image" style="" src="https://lh7-us.googleusercontent.com/docsz/AD_4nXfU4Nj6lK4vnluNWix8B6hg-lv-7zrGWGV0qShRg6D37EPNw8yaQAgqA-QWxe382W8Z9qZUS46EOD0SQSaVDCJ6te7IzlCwRdQoLpvu0LVxBVJhoSLRqVdFtJipom69rD3_FtnpWZIvNBHMHyepgcTAL4A?key=RUEzH523fmtjVYvsTmrKfg"/><div class="image__source"><span class="image__source_text"><p>Mohapatra reflects on insights from the journey of GCP. 
<a class="link" href="https://x.com/MohapatraHemant/status/1812018704034496969?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">(Source) </a></p></span></div></div><p class="paragraph" style="text-align:left;">Hemant Mohapatra, a former Google Cloud employee, shared insights on Twitter about <a class="link" href="https://x.com/MohapatraHemant/status/1812018704034496969?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">Google Cloud Platform&#39;s (GCP)</a> <b>early challenges and lessons learned.</b></p><p class="paragraph" style="text-align:left;">While focusing on cloud-native clients like Snapchat, GCP <b>initially overlooked </b>the larger market of <b>hybrid cloud customers</b> like Goldman Sachs. Early problems included salespeople without quotas, inconsistent branding, title inflation, and weak partnership programs.</p><p class="paragraph" style="text-align:left;">It shows how even companies with superior technology <b>can struggle</b> if they don&#39;t have the right go-to-market strategy and organizational focus. 
This early misstep also illustrates the critical need for companies to understand and adapt to their full market potential.</p><p class="paragraph" style="text-align:left;">Regardless, Mohapatra emphasizes that while these observations reflect past challenges, GCP has since evolved into a <b>much stronger platform and team</b>.</p><h2 class="heading" style="text-align:left;" id="money-moving-in-ai">Money Moving in AI</h2><p class="paragraph" style="text-align:left;">Anthropic launched a $100 million fund for AI startups to speed up the <b>pace of AI development</b>, while Arcee and Eden successfully raised $24 million and $10 million respectively.</p><h3 class="heading" style="text-align:left;" id="anthropic-announces-100-million-fun">Anthropic Announces $100 Million Fund for AI Startups</h3><p class="paragraph" style="text-align:left;">Anthropic, in partnership with Menlo Ventures, has announced a <a class="link" href="https://www.itpro.com/technology/artificial-intelligence/anthropic-just-launched-a-100-million-investment-fund-for-ai-startups?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">$100 million fund</a> called the Anthology Fund to invest in early-stage AI startups. This initiative aims to support entrepreneurs in key areas of AI development and application. 
Startups will receive financial backing and <b>$25,000 in credits</b> to access Anthropic&#39;s advanced language models.</p><h3 class="heading" style="text-align:left;" id="arcee-secures-24-million-in-series-">Arcee Secures $24 Million in Series A Funding</h3><p class="paragraph" style="text-align:left;">Arcee AI, a Miami-based startup specializing in small language models (SLMs), has raised <a class="link" href="https://venturebeat.com/ai/small-language-models-rising-as-arcee-ai-lands-24m-series-a/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">$24 million in Series A funding</a> led by Emergence Capital. This investment comes just six months after their <b>$5.5 million seed round</b> in January 2024. </p><p class="paragraph" style="text-align:left;">Previously, <a class="link" href="https://www.activeloop.ai/resources/how-we-finetuned-a-large-language-model-to-search-patents-generate-new-patents/?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">we partnered with</a> them to co-develop PatentGPT, <b>a patent generation and search engine</b>, built on Arcee&#39;s models and our database for AI, so we&#39;re really happy to see them achieve this milestone!</p><h3 class="heading" style="text-align:left;" id="eden-raises-10-million">Eden Raises $10 Million</h3><p class="paragraph" style="text-align:left;">Eden, a Mexico City-based healthtech startup, has raised <a class="link" href="https://www.axios.com/pro/health-tech-deals/2024/07/17/eden-10-million-radiology?utm_source=genai360.beehiiv.com&utm_medium=newsletter&utm_campaign=a-small-language-model-week-gpt-4-mini-llama-3-405b-leaked" target="_blank" rel="noopener noreferrer nofollow">$10 million</a> in a funding round led by Sierra Ventures, with participation 
from Dalus Capital, Ali Capital, Liquid, and Endeavor. The company specializes in <b>generative AI</b> for medical imaging and diagnostics, aiming to improve radiology services across Latin America.</p></div><div class='beehiiv__footer'><br class='beehiiv__footer__break'><hr class='beehiiv__footer__line'><a target="_blank" class="beehiiv__footer_link" style="text-align: center;" href="https://www.beehiiv.com/?utm_campaign=a3312f2a-4056-4cf7-85d5-736cec7fee69&utm_medium=post_rss&utm_source=genai360_weekly_ai_news">Powered by beehiiv</a></div></div>
  ]]></content:encoded>
</item>

  </channel>
</rss>
