AI pulse: last 7 days
Daily AI pulse from YouTube, blogs, Reddit, HN. Ruthlessly filtered.
Sources (41)
- [critical] Andrej Karpathy
Former Tesla AI director, OpenAI cofounder. Every video is gold.
- [critical] Anthropic
Anthropic's official channel. Every Claude release.
- [critical] ComfyUI Blog
Release log for ComfyUI integrations: Luma Uni-1, GPT Image 2, ACE-Step music gen, Seedance. Covers video, image, music, and workflows.
- [critical] OpenAI Blog
OpenAI's official blog. All releases.
- [critical] Simon Willison's Weblog
The best AI 'thinker'. Daily posts, deep insights, low hype rate.
- [high] AI Explained
Deep analysis of papers and benchmarks, low hype rate.
- [high] AI Jason
Practical tutorials on Claude Code, MCP, and vibe-coding workflows.
- [high] Ben's Bites
Daily AI digest with a creator-friendly tone. Codex, model releases, agentic AI.
- [high] Cole Medin
Vibe coding, agentic workflows, and Claude Code MCP integrations.
- [high] Fal AI Blog
Fal hosts most new AI image/video models, so their blog gives early signals of launches.
- [high] HN: 3D & Gaussian Splatting
HN signal for generative 3D: Gaussian Splatting, NeRF, image-to-3D. Threshold of 20 points because the category is niche (historic top: 182 pts).
- [high] HN: AI agents / MCP
HN posts about agents, MCP, and vibe coding with at least 100 points.
- [high] HN: Claude / Anthropic
HN posts mentioning 'Claude' or 'Anthropic' with at least 100 points.
- [high] Hugging Face Blog
Releases for image, video, audio, and 3D models. Some posts are tech-heavy; Gemini relevance scoring filters out the noise. Downgraded from critical: too much volume for 'must-read' status.
- [high] IndyDevDan
Claude Code power user: prompts, hooks.
- [high] Interconnects (Nathan Lambert)
AI policy and research analysis. Low hype rate, opinionated.
- [high] Latent Space
Swyx's podcast and blog: founder interviews and engineering deep dives.
- [high] Matt Wolfe
Comprehensive weekly digest of AI tools. ~700K subs.
- [high] Matthew Berman
AI news, model release reviews, agent demos. High output.
- [high] r/aivideo
The AI video community: Sora, Veo, Runway, Kling, LTX. What genuinely surprises creators.
- [high] r/ClaudeAI
The Claude community: power users, tips, problems.
- [high] r/LocalLLaMA
Open-source LLMs, local inference, benchmarks without the hype.
- [high] r/StableDiffusion
The largest open-source image-gen community (700k+ users). Model launches, LoRAs, ComfyUI workflows.
- [high] Riley Brown
Vibe coding, AI builder workflows, Cursor + Claude tutorials.
- [high] The Decoder
German AI news outlet published in English, good breaking news.
- [high] Theo - t3.gg
TypeScript and AI dev workflows. Hot takes, narrative-driven.
- [high] Yannic Kilcher
Paper reviews and deep dives into AI research.
- [low] AI Weirdness
Janelle Shane: playful AI experiments, image-gen quirks. Low volume, unique perspective.
- [medium] bycloud
AI papers made digestible; sits between Two Minute Papers and Yannic Kilcher.
- [medium] Creative Bloq
The design industry: where AI is encroaching on classic graphic disciplines.
- [medium] Fireship
100-second format, often AI/LLM and tech news.
- [medium] fxguide
The VFX and film industry: more and more AI in the pipeline. A professional perspective.
- [medium] Greg Isenberg
Solo-founder vibe: builds products with AI, podcasts with indie hackers.
- [medium] r/ChatGPTCoding
Vibe-coding tips, IDE setups, prompts. A mix of all models.
- [medium] r/comfyui
ComfyUI workflows: custom nodes, JSON workflows, optimizations.
- [medium] r/midjourney
The Midjourney community: v7+ launches, style references, prompt patterns.
- [medium] r/runwayml
Runway-specific community: feature launches, prompt patterns, comparisons with competitors.
- [medium] r/SunoAI
The Suno music-gen community: new model versions, lyric-prompting techniques. Audio AI has a weak RSS ecosystem.
- [medium] Tina Huang
AI workflows for data science, practical applications.
- [medium] Two Minute Papers
Short summaries of AI papers, great for a quick scan.
- [medium] Wes Roth
AI news with a more clickbaity tone; the Gemini filter weeds out the hype.
Running Qwen3.5 / Qwen3.6 with NextN MTP (Multi-Token Prediction) speculative decode in llama.cpp — single RTX 3090 Ti GPU guide
Speed up Qwen 3.5/3.6 models by nearly 3x on a single GPU using NextN Multi-Token Prediction in llama.cpp with this specific build and quantization guide.
This technical guide details how to implement NextN Multi-Token Prediction (MTP) for the Qwen 3.5 and 3.6 model families using llama.cpp. By leveraging MTP, users can achieve approximately 2.9x faster decoding speeds with zero loss in output quality, as the prediction heads are natively integrated into these models. The process currently requires building llama.cpp from specific pull requests (#22400 and #22673) or using a provided fork. A critical step involves a specific quantization override (--tensor-type nextn=q8_0) to prevent output corruption. Benchmarks show the 35B MoE variant reaching an impressive ~150 tokens per second on a single RTX 3090 Ti.
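The reported ~2.9x speedup is consistent with standard speculative-decoding arithmetic. A minimal sketch below, not taken from the guide itself: the draft length and acceptance probability are assumed values chosen for illustration, but the formula is the usual expected-committed-tokens calculation for a verify-and-accept loop.

```python
# Expected tokens committed per verification step in speculative decoding:
# with k drafted tokens and per-token acceptance probability p, the target
# model commits E = (1 - p^(k+1)) / (1 - p) tokens per forward pass
# (each accepted draft token plus the target's own correction token).

def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected number of tokens committed per target-model pass."""
    if p == 1.0:
        return k + 1.0
    return (1.0 - p ** (k + 1)) / (1.0 - p)

# Assumed values for illustration: 3 MTP draft tokens, 80% acceptance.
speedup = expected_tokens_per_step(p=0.80, k=3)
print(f"~{speedup:.2f}x tokens per target pass")  # ~2.95x
```

Natively trained MTP heads tend to have high acceptance rates on the base model's own distribution, which is why a ~3x figure is plausible without any quality loss: rejected drafts are simply replaced by the target model's token.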
r/LocalLLaMA·tutorial·05/07/2026, 09:56 AM·/u/yes_i_tried_google
Exaggerated PCI-E bandwidth concerns?
PCIe bandwidth concerns for multi-GPU setups are likely exaggerated; even a 4.0 x4 link handles high-speed prefill for mid-range cards using vLLM and Tensor Parallelism.
A user on r/LocalLLaMA conducted benchmarks to test if PCIe bandwidth is a true bottleneck for multi-GPU local LLM setups on consumer hardware. Using two RTX 5060 Ti 16GB cards with vLLM and Tensor Parallelism (TP=2), they found that peak bandwidth during prefill reached only 3-4 GB/s. This represents about 50% of the capacity of a PCIe 4.0 x4 slot, suggesting that even limited chipset-connected slots are sufficient for mid-range cards. The test involved high-speed quants like NVFP4, achieving prefill rates up to 1700 t/s. These findings suggest hobbyists can scale to 3 or 4 GPUs using M.2 adapters without needing expensive workstation-grade motherboards.
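The "about 50%" figure checks out against the theoretical link rate. A quick back-of-the-envelope sketch (theoretical PCIe 4.0 numbers only; real-world protocol overhead reduces usable bandwidth further):

```python
# PCIe 4.0 runs at 16 GT/s per lane with 128b/130b line coding,
# giving roughly 1.97 GB/s of raw bandwidth per lane.
GT_PER_LANE = 16.0      # gigatransfers per second per lane
ENCODING = 128 / 130    # 128b/130b coding efficiency
BITS_PER_BYTE = 8

lane_gbps = GT_PER_LANE * ENCODING / BITS_PER_BYTE  # ~1.97 GB/s
x4_gbps = 4 * lane_gbps                             # ~7.88 GB/s

observed_peak = 4.0  # GB/s, the peak prefill transfer reported in the post
utilization = observed_peak / x4_gbps
print(f"x4 link: {x4_gbps:.2f} GB/s theoretical, peak uses {utilization:.0%}")
```

So a chipset-attached 4.0 x4 slot (the kind an M.2 adapter typically exposes) still has roughly half its headroom left at the observed peak, which is the core of the post's argument.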
r/LocalLLaMA·news·05/06/2026, 07:54 PM·/u/ziphnor
OpenAI built a networking protocol with AMD, Broadcom, Intel, Microsoft, and NVIDIA to fix AI supercomputer bottlenecks
OpenAI and tech giants released MRC, an open-source protocol that makes training massive models faster and cheaper by optimizing how 100,000+ GPUs communicate.
OpenAI, in collaboration with industry leaders like NVIDIA, Microsoft, and AMD, has introduced MRC (Multi-Path Remote Communication), an open-source networking protocol designed for AI supercomputing. The protocol addresses the massive data bottlenecks inherent in training LLMs across tens of thousands of GPUs. By enabling data transmission across hundreds of paths simultaneously, MRC reduces the required network switch layers from four down to just two. This architecture supports clusters of over 100,000 GPUs while significantly lowering power consumption and hardware costs. Currently, the protocol is operational within OpenAI's Stargate supercomputer project, signaling a shift towards more efficient, standardized AI infrastructure.
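The four-to-two switch-layer claim matches standard Clos-fabric math: a flat two-tier leaf-spine network built from radix-r switches can attach on the order of r²/2 hosts, and multipath transport is what lets traffic actually use all of that fabric. The sketch below is generic topology arithmetic, not MRC's actual design; the switch radix is an assumed value for illustration.

```python
# A non-blocking two-tier (leaf-spine) Clos fabric from radix-r switches:
# each leaf splits its r ports half-down (hosts) and half-up (spines),
# so the fabric attaches about r * (r / 2) = r^2 / 2 hosts.

def two_tier_hosts(radix: int) -> int:
    """Max hosts on a two-tier Clos fabric of radix-`radix` switches."""
    return radix * (radix // 2)

# Assumed radix for illustration: 512-port switches.
print(two_tier_hosts(512))  # 131072, i.e. > 100,000 GPUs in two tiers
```

Fewer switch tiers means fewer hops, fewer switches to buy and power, which is where the cost and energy savings in the announcement come from.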
The Decoder·tooling·05/06/2026, 07:13 PM·Matthias Bastian
Anthropic taps SpaceX's Colossus-1 data center for 220,000 GPUs to power Claude
Anthropic is scaling up massively by leasing SpaceX's Colossus-1 data center, which will double Claude Code rate limits and boost API capacity for Opus models.
Anthropic is taking over the full computing capacity of SpaceX's Colossus-1 data center, utilizing over 220,000 NVIDIA GPUs and 300 megawatts of power. The facility is expected to be operational within a month, providing a massive boost to Anthropic's training and inference capabilities. Consequently, the company is doubling rate limits for Claude Code and increasing API limits for its high-end Opus models. This scale of infrastructure suggests that Anthropic is gearing up for the release of significantly more powerful frontier models. The partnership highlights the intensifying competition for massive-scale compute resources in the AI industry.
The Decoder·news·05/06/2026, 06:42 PM·Matthias Bastian
Anthropic Just Secured a Reserve.
Anthropic is massively scaling its training power by securing 220,000+ NVIDIA GPUs through a new partnership with SpaceX.
Anthropic has announced a strategic partnership with SpaceX to utilize the full compute capacity of the Colossus 1 data center. This agreement grants Anthropic access to over 300 megawatts of power and a massive deployment of more than 220,000 NVIDIA GPUs, expected to be online within the month. This scale of infrastructure is significantly larger than most current AI clusters, indicating a massive push for the next generation of Claude models. The move highlights the intensifying arms race for compute resources among top-tier AI labs. By securing this reserve, Anthropic ensures it has the hardware necessary for training and serving increasingly complex frontier models.
r/ClaudeAI·news·05/06/2026, 05:05 PM·/u/DragonflyOk7139
Analysis of the 100 most popular hardware setups on Hugging Face
See which GPUs actually dominate the AI landscape, from enterprise A100s to the consumer RTX 4090s favored for local LLM execution.
Hugging Face CEO Clement Delangue released an analysis of the top 100 hardware configurations used on the platform. The data underscores NVIDIA's market dominance, with the A100 and H100 leading for heavy workloads, while the RTX 3090 and 4090 remain the top choices for local enthusiasts. This report offers a factual look at the compute landscape, moving beyond hype to show what hardware is actually accessible to developers. It highlights the importance of VRAM capacity for running modern LLMs locally. For the creative-tech community, this serves as a benchmark for building and optimizing tools that fit the most common user profiles.
r/LocalLLaMA·news·05/06/2026, 04:35 PM·/u/clem59480
Protip if you want to squeeze most out of your VRAM if you have a CPU with iGPU
Free up hundreds of MBs of VRAM for your models by plugging your monitor into the motherboard and using your iGPU for the OS display.
This practical tip for local LLM enthusiasts explains how to maximize available VRAM on dedicated GPUs by offloading system tasks. By enabling the integrated GPU (iGPU) in the BIOS and connecting the display cable directly to the motherboard, the system uses the iGPU for GUI rendering instead of the primary graphics card. This simple hardware adjustment can reclaim several hundred megabytes of VRAM, which is often critical when trying to fit a specific model or a larger context window into memory. The method is especially effective for users on Windows or Linux distributions with a desktop environment. It offers a straightforward way to optimize hardware resources without needing complex software tweaks.
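An easy way to confirm how much VRAM the switch reclaimed is to compare `nvidia-smi` readings before and after moving the display to the iGPU. A small sketch, assuming `nvidia-smi` is on PATH; the parsing helper is ours, not part of any NVIDIA tool:

```python
import subprocess

def gpu_memory_mib(csv_line: str) -> tuple[int, int]:
    """Parse one 'used, free' line of nvidia-smi CSV output (values in MiB)."""
    used, free = (int(field.strip()) for field in csv_line.split(","))
    return used, free

def query_gpu_memory() -> list[tuple[int, int]]:
    """Return (used, free) MiB for each GPU via nvidia-smi's CSV query mode."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [gpu_memory_mib(line) for line in out.strip().splitlines()]

if __name__ == "__main__":
    for i, (used, free) in enumerate(query_gpu_memory()):
        print(f"GPU {i}: {used} MiB used, {free} MiB free")
```

Run it with the monitor on the dGPU, then again after switching the cable to the motherboard; the drop in "used" on an idle desktop is the VRAM you got back for model weights or context.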
r/LocalLLaMA·tutorial·05/06/2026, 11:35 AM·/u/Th3Sim0n
Relevance auto-scored by LLM (0–10). List shows top 30 from the last 7 days.