AI pulse last 7 days
Daily AI pulse from YouTube, blogs, Reddit, HN. Ruthlessly filtered.
Sources (41)
- critical: Andrej Karpathy
Former director of AI at Tesla, OpenAI cofounder. Every video is gold.
- critical: Anthropic
Official Anthropic channel. Every Claude release.
- critical: ComfyUI Blog
Release log for ComfyUI integrations: Luma Uni-1, GPT Image 2, ACE-Step music gen, Seedance. Covers video, image, music, and workflow.
- critical: OpenAI Blog
Official OpenAI blog. All releases.
- critical: Simon Willison's Weblog
The best AI 'thinker'. Daily posts, deep insights, low hype rate.
- high: AI Explained
In-depth analysis of papers and benchmarks, low hype rate.
- high: AI Jason
Practical tutorials on Claude Code, MCP, and vibe-coding workflows.
- high: Ben's Bites
Daily AI digest, creator-friendly tone. Codex, model releases, agentic AI.
- high: Cole Medin
Vibe coding + agentic workflows + Claude Code MCP integrations.
- high: Fal AI Blog
Fal hosts most new AI image and video models, so its blog gives early signals of launches.
- high: HN: 3D & Gaussian Splatting
HN signal for generative 3D: Gaussian Splatting, NeRF, image-to-3D. Threshold of 20 points because the category is niche (historic top: 182 pts).
- high: HN: AI agents / MCP
HN posts about agents, MCP, and vibe coding with at least 100 points.
- high: HN: Claude / Anthropic
HN posts mentioning 'Claude' or 'Anthropic' with at least 100 points.
- high: Hugging Face Blog
Releases for image, video, audio, and 3D models. Partly tech-heavy; the Gemini relevance filter removes the noise. Downgraded from critical: too much volume for 'must-read' status.
- high: IndyDevDan
Claude Code power user: prompts, hooks.
- high: Interconnects (Nathan Lambert)
AI policy + research analysis. Low hype rate, opinionated.
- high: Latent Space
Swyx's podcast and blog: interviews with founders and engineering deep dives.
- high: Matt Wolfe
Comprehensive AI tools weekly digest. ~700K subs.
- high: Matthew Berman
AI news, model release reviews, agent demos. High output.
- high: r/aivideo
Community AI video: Sora, Veo, Runway, Kling, LTX. What actually surprises creators.
- high: r/ClaudeAI
Claude community: power users, tips, problems.
- high: r/LocalLLaMA
Open-source LLMs, running models locally, benchmarks without the hype.
- high: r/StableDiffusion
The largest open-source image-gen community (700k+ users). Model launches, LoRA, ComfyUI workflows.
- high: Riley Brown
Vibe coding, AI builder workflows, Cursor + Claude tutorials.
- high: The Decoder
German AI news outlet publishing in English, good for breaking news.
- high: Theo - t3.gg
TypeScript + AI dev workflows. Hot takes, narrative-driven.
- high: Yannic Kilcher
Paper reviews and deep dives into AI research.
- low: AI Weirdness
Janelle Shane: playful AI experiments, image gen quirks. Low volume, unique perspective.
- medium: bycloud
Digestible AI papers, somewhere between Two Minute Papers and Yannic Kilcher.
- medium: Creative Bloq
Design industry: where AI is encroaching on classic graphic design disciplines.
- medium: Fireship
100-second format, often AI/LLM + tech news.
- medium: fxguide
VFX and film industry, with more and more AI in the pipeline. Professional perspective.
- medium: Greg Isenberg
Solo-founder vibe: builds products with AI, podcasts with indie hackers.
- medium: r/ChatGPTCoding
Vibe-coding tips, IDE setups, prompts. A mix of all models.
- medium: r/comfyui
ComfyUI workflows: custom nodes, JSON workflows, optimizations.
- medium: r/midjourney
Midjourney community: v7+ launches, style references, prompt patterns.
- medium: r/runwayml
Runway-specific community: feature launches, prompt patterns, comparisons with competitors.
- medium: r/SunoAI
Suno music gen community: new model versions, lyric prompting techniques. Audio AI has a weak RSS ecosystem.
- medium: Tina Huang
AI workflows for data science, practical applications.
- medium: Two Minute Papers
Short summaries of AI papers, great for a quick scan.
- medium: Wes Roth
AI news with a more clickbaity tone; the Gemini filter screens out the hype.
Decoupled Attention from Weights - Gemma 4 26B
Run massive models like Gemma 4 26B by splitting attention and weights across multiple cheap local machines, bypassing single-GPU VRAM limits.
Larql introduces a method to decouple attention mechanisms from model weights, specifically demonstrated with Gemma 4 26B. This approach allows users to split the memory load across multiple local machines, keeping the attention mechanism on a primary device while offloading the massive weight matrices to a secondary, cheaper server like an old Xeon. This effectively bypasses the VRAM bottleneck that typically limits local LLM performance and model size. The repository includes functional code to implement this distributed inference strategy. It represents a significant shift for home lab enthusiasts who want to run large-scale models without investing in high-end enterprise GPUs.
r/LocalLLaMA·tooling·05/06/2026, 11:56 AM·/u/yeah-ok
Gemini Omni, Gemini 3.2 Flash, a 12M Context Window Model, Claude Replaces Analysts, & More! AI NEWS
A massive week of AI updates including a 12M context window model, GPT-5.5 Instant, and Claude's automation of financial analyst roles.
This week saw a flurry of AI announcements ahead of Google I/O, headlined by the leak of Gemini 3.2 Flash and a new Omni model for native video generation. A startup called SubQ introduced a sub-quadratic sparse attention architecture that it says enables a 12-million-token context window with 52x faster processing than traditional methods. OpenAI quietly rolled out GPT-5.5 Instant, a faster, more reliable version of its flagship model optimized for real-time use. Anthropic launched specialized Claude agent templates designed to automate entry-level financial analyst tasks, including valuation and market research. Additionally, Google updated Gemma 4 with multi-token prediction for 3x speed gains and enhanced NotebookLM with advanced mind-mapping features.
AI Jason·news·05/06/2026, 06:30 AM·WorldofAI

Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster.
Gemma 4 31B proves that token efficiency beats raw speed: it completes tasks faster than Qwen 3.6 by being smarter with every token generated.
A performance comparison between Google's Gemma 4 31B and Alibaba's Qwen 3.6/3.5 27B highlights a critical distinction between raw inference speed and task completion time. While Qwen models often achieve higher scores on synthetic benchmarks, Gemma 4 demonstrates superior token efficiency, requiring fewer tokens to generate accurate responses. This creates a 'slower is faster' scenario where Gemma, despite having lower tokens-per-second due to its larger size, finishes complex tasks more quickly than its competitors. The analysis suggests that Qwen may be 'benchmaxxed'—optimized specifically for test scores—whereas Gemma offers higher intelligence density for real-world use. Local LLM enthusiasts are now looking forward to further optimizations like DFlash and MTP to enhance Gemma's perf…
r/LocalLLaMA·news·05/05/2026, 06:12 PM·/u/MiaBchDave
Gemma 4 MTP released
Google released MTP draft models for Gemma 4, enabling up to 2x faster generation through speculative decoding without sacrificing output quality.
Google has officially released Multi-Token Prediction (MTP) draft models for the Gemma 4 family, including the 31B and various MoE variants. MTP pairs the base model with a smaller, faster draft model that predicts multiple tokens ahead. The main model then verifies these predictions in parallel, as in a speculative decoding pipeline. This approach achieves up to a 2x speedup in generation, which is critical for local and on-device deployments. The final output remains identical to standard generation, so supported hardware and software stacks get the performance boost without sacrificing quality.
r/LocalLLaMA·model_release·05/05/2026, 04:01 PM·/u/rerri
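The draft-then-verify loop described in the summary above can be illustrated with a minimal sketch. This is not Google's MTP implementation: `target_model` and `draft_model` are hypothetical toy functions standing in for the main and draft models, decoding is assumed to be greedy, and verification runs as a plain loop where a real stack would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to the next
# token under greedy decoding.
NextToken = Callable[[List[int]], int]


def target_model(tokens: List[int]) -> int:
    """Toy 'main' model: next token is the sum of the context mod 7."""
    return sum(tokens) % 7


def draft_model(tokens: List[int]) -> int:
    """Toy 'draft' model: agrees with the target except when the
    context length is a multiple of 4, where it is deliberately wrong."""
    guess = sum(tokens) % 7
    return (guess + 1) % 7 if len(tokens) % 4 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Standard autoregressive decoding: one model call per new token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft up to k tokens, keep the prefix
    the target agrees with, and fix the first mismatch with the target's
    own token."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1) The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2) The target checks each proposed position in order
        #    (sequential here; batched in a real implementation).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # correct the first mismatch
                break
        tokens.extend(accepted)
    return tokens


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode(target_model, draft_model, prompt, n_new=20)
    slow = greedy_decode(target_model, prompt, n_new=20)
    print(fast == slow)
```

Every token that is kept is one the target model would have produced itself, which is why the output matches plain greedy decoding in this sketch. Any speedup comes from how often the draft's guesses are accepted. With sampling instead of greedy decoding, implementations typically use an accept/reject rule designed to preserve the target model's output distribution.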
Gemma 4 MTP released
Get up to 2x faster inference on Gemma 4 models using the newly released Multi-Token Prediction draft checkpoints for speculative decoding.
Google has officially released Multi-Token Prediction (MTP) draft models for the Gemma 4 family, including variants for the 31B and smaller models. These draft models are designed for speculative decoding, where a smaller model predicts multiple future tokens that the main model then validates in parallel. This technique can make generation up to 2x faster while keeping output identical to standard autoregressive generation. The release includes specialized checkpoints on Hugging Face tuned as assistants for the main Gemma 4 weights. This is a significant update for local LLM users and on-device applications, where inference speed is often the primary bottleneck.
r/LocalLLaMA·model_release·05/05/2026, 04:01 PM·/u/rerri
Relevance auto-scored by LLM (0–10). List shows top 30 from the last 7 days.
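The selection rule stated in the footer (score each item 0–10, keep the top 30 from the last 7 days) could be sketched roughly as follows. The `Item` class, its field names, and `top_items` are hypothetical and only illustrate the rule; the page does not describe how the feed is actually implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Item:
    title: str
    score: float          # relevance score assigned by the LLM, 0-10
    published: datetime


def top_items(items: List[Item], now: datetime,
              days: int = 7, limit: int = 30) -> List[Item]:
    """Keep items published within the last `days` days and return the
    `limit` highest-scored ones, best first."""
    cutoff = now - timedelta(days=days)
    recent = [it for it in items if it.published >= cutoff]
    return sorted(recent, key=lambda it: it.score, reverse=True)[:limit]
```

Calling `top_items(items, datetime.now())` would drop anything older than a week before ranking, so an old high-scoring item never displaces a recent one.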