Curated Claude Code catalog
Updated 07.05.2026 · 19:39 CET
01 / Skill
K-Dense-AI

scientific-agent-skills

Quality
9.0

This repository offers 135 ready-to-use scientific and research skills, transforming any AI agent supporting the Agent Skills standard into a powerful research assistant. It excels in automating complex multi-step scientific workflows across diverse domains like bioinformatics, drug discovery, clinical research, and materials science, integrating with over 100 scientific databases and optimized Python packages.

USP

This comprehensive collection stands out with 135 pre-built skills and 100+ database integrations, now compatible with any Agent Skills-compliant AI agent. Its unique K-Dense BYOK desktop app provides a private, powerful AI co-scientist wo…

Use cases

  • Bioinformatics & Genomics analysis
  • Cheminformatics & Drug Discovery
  • Clinical Research & Precision Medicine
  • Scientific Data Analysis & Visualization
  • Automated Literature Review

Detected files (8)

  • scientific-skills/adaptyv/SKILL.md (7629 bytes)
    ---
    name: adaptyv
    author: "K-Dense, Inc."
    description: "How to use the Adaptyv Bio Foundry API and Python SDK for protein experiment design, submission, and results retrieval. Use this skill whenever the user mentions Adaptyv, Foundry API, protein binding assays, protein screening experiments, BLI/SPR assays, thermostability assays, or wants to submit protein sequences for experimental characterization. Also trigger when code imports `adaptyv`, `adaptyv_sdk`, or `FoundryClient`, or references `foundry-api-public.adaptyvbio.com`."
    ---
    
    # Adaptyv Bio Foundry API
    
    Adaptyv Bio is a cloud lab that turns protein sequences into experimental data. Users submit amino acid sequences via API or UI; Adaptyv's automated lab runs assays (binding, thermostability, expression, fluorescence) and delivers results in ~21 days.
    
    ## Quick Start
    
    **Base URL:** `https://foundry-api-public.adaptyvbio.com/api/v1`
    
    **Authentication:** Bearer token in the `Authorization` header. Tokens are obtained from [foundry.adaptyvbio.com](https://foundry.adaptyvbio.com/) sidebar.
    
    When writing code, always read the API key from the environment variable `ADAPTYV_API_KEY` or from a `.env` file — never hardcode tokens. Check for a `.env` file in the project root first; if one exists, use a library like `python-dotenv` to load it.
    
    ```bash
    export FOUNDRY_API_TOKEN="abs0_..."
    curl https://foundry-api-public.adaptyvbio.com/api/v1/targets?limit=3 \
      -H "Authorization: Bearer $FOUNDRY_API_TOKEN"
    ```
    
    Every request except `GET /openapi.json` requires authentication. Store tokens in environment variables or `.env` files — never commit them to source control.
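
    A minimal sketch of that loading pattern, assuming `python-dotenv` is installed:

    ```python
    import os
    from pathlib import Path

    from dotenv import load_dotenv  # uv pip install python-dotenv

    if Path(".env").exists():
        load_dotenv()  # exposes ADAPTYV_API_KEY / ADAPTYV_API_URL via os.environ

    api_key = os.environ["ADAPTYV_API_KEY"]  # KeyError beats silently running unauthenticated
    ```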
    
    ## Python SDK
    
    Install: `uv add adaptyv-sdk` (or `uv pip install adaptyv-sdk` if the project has no `pyproject.toml`)
    
    **Environment variables** (set in shell or `.env` file):
    ```bash
    ADAPTYV_API_KEY=your_api_key
    ADAPTYV_API_URL=https://foundry-api-public.adaptyvbio.com/api/v1
    ```
    
    ### Decorator Pattern
    
    ```python
    from adaptyv import lab
    
    @lab.experiment(target="PD-L1", experiment_type="screening", method="bli")
    def design_binders():
        return {"design_a": "MVKVGVNG...", "design_b": "MKVLVAG..."}
    
    result = design_binders()
    print(f"Experiment: {result.experiment_url}")
    ```
    
    ### Client Pattern
    
    ```python
    from adaptyv import FoundryClient
    
    client = FoundryClient(api_key="...", base_url="https://foundry-api-public.adaptyvbio.com/api/v1")
    
    # Browse targets
    targets = client.targets.list(search="EGFR", selfservice_only=True)
    
    # Estimate cost
    estimate = client.experiments.cost_estimate({
        "experiment_spec": {
            "experiment_type": "screening",
            "method": "bli",
            "target_id": "target-uuid",
            "sequences": {"seq1": "EVQLVESGGGLVQ..."},
            "n_replicates": 3
        }
    })
    
    # Create and submit
    exp = client.experiments.create({...})
    client.experiments.submit(exp.experiment_id)
    
    # Later: retrieve results
    results = client.experiments.get_results(exp.experiment_id)
    ```
    
    ## Experiment Types
    
    | Type | Method | Measures | Requires Target |
    |---|---|---|---|
    | `affinity` | `bli` or `spr` | KD, kon, koff kinetics | Yes |
    | `screening` | `bli` or `spr` | Yes/no binding | Yes |
    | `thermostability` | — | Melting temperature (Tm) | No |
    | `expression` | — | Expression yield | No |
    | `fluorescence` | — | Fluorescence intensity | No |
    
    ## Experiment Lifecycle
    
    ```
    Draft → WaitingForConfirmation → QuoteSent → WaitingForMaterials → InQueue → InProduction → DataAnalysis → InReview → Done
    ```
    
    | Status | Who Acts | Description |
    |---|---|---|
    | `Draft` | You | Editable, no cost commitment |
    | `WaitingForConfirmation` | Adaptyv | Under review, quote being prepared |
    | `QuoteSent` | You | Review and confirm the quote |
    | `WaitingForMaterials` | Adaptyv | Gene fragments and target ordered |
    | `InQueue` | Adaptyv | Materials arrived, queued for lab |
    | `InProduction` | Adaptyv | Assay running |
    | `DataAnalysis` | Adaptyv | Raw data processing and QC |
    | `InReview` | Adaptyv | Final validation |
    | `Done` | You | Results available |
    | `Canceled` | Either | Experiment canceled |
    
    The `results_status` field on an experiment tracks: `none`, `partial`, or `all`.
    
    ## Common Workflows
    
    ### 1. Submit a Binding Screen (Step by Step)
    
    ```python
    # 1. Find a target
    targets = client.targets.list(search="EGFR", selfservice_only=True)
    target_id = targets.items[0].id
    
    # 2. Preview cost
    estimate = client.experiments.cost_estimate({
        "experiment_spec": {
            "experiment_type": "screening",
            "method": "bli",
            "target_id": target_id,
            "sequences": {"seq1": "EVQLVESGGGLVQ...", "seq2": "MKVLVAG..."},
            "n_replicates": 3
        }
    })
    
    # 3. Create experiment (starts as Draft)
    exp = client.experiments.create({
        "name": "EGFR binder screen batch 1",
        "experiment_spec": {
            "experiment_type": "screening",
            "method": "bli",
            "target_id": target_id,
            "sequences": {"seq1": "EVQLVESGGGLVQ...", "seq2": "MKVLVAG..."},
            "n_replicates": 3
        }
    })
    
    # 4. Submit for review
    client.experiments.submit(exp.experiment_id)
    
    # 5. Poll or use webhooks until Done
    # 6. Retrieve results
    results = client.experiments.get_results(exp.experiment_id)
    ```
    
    ### 2. Automated Pipeline (Skip Draft + Auto-Accept Quote)
    
    ```python
    exp = client.experiments.create({
        "name": "Auto pipeline run",
        "experiment_spec": {...},
        "skip_draft": True,
        "auto_accept_quote": True,
        "webhook_url": "https://my-server.com/webhook"
    })
    # Webhook fires on each status transition; poll or wait for Done
    ```
    
    ### 3. Using Webhooks
    
    Pass `webhook_url` when creating an experiment. Adaptyv POSTs to that URL on every status transition with the experiment ID, previous status, and new status.
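
    A minimal receiver sketch; the payload field names below are illustrative guesses, not a documented schema:

    ```python
    from flask import Flask, request

    app = Flask(__name__)

    @app.post("/webhook")
    def adaptyv_webhook():
        payload = request.get_json(force=True)
        # Field names are illustrative; inspect a real delivery for the exact schema.
        exp_id = payload.get("experiment_id")
        old, new = payload.get("previous_status"), payload.get("new_status")
        print(f"{exp_id}: {old} -> {new}")
        if new == "Done":
            ...  # fetch results with client.experiments.get_results(exp_id)
        return "", 204
    ```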
    
    ## Sequences
    
    - Simple format: `{"seq1": "EVQLVESGGGLVQPGGSLRLSCAAS"}`
    - Rich format: `{"seq1": {"aa_string": "EVQLVESGGGLVQ...", "control": false, "metadata": {"type": "scfv"}}}`
    - Multi-chain: use colon separator — `"MVLS:EVQL"`
    - Valid amino acids: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y (case-insensitive, stored uppercase)
    - Sequences can only be added to experiments in `Draft` status
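
    A quick client-side check against these rules (a helper written for this doc, not part of the SDK):

    ```python
    VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")

    def validate_sequence(seq: str) -> str:
        """Uppercase a (possibly multi-chain) sequence and reject invalid residues."""
        chains = seq.upper().split(":")  # colon separates chains
        for chain in chains:
            bad = set(chain) - VALID_AA
            if bad:
                raise ValueError(f"invalid amino acids: {sorted(bad)}")
        return ":".join(chains)

    assert validate_sequence("mvls:evql") == "MVLS:EVQL"
    ```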
    
    ## Filtering, Sorting, and Pagination
    
    All list endpoints support pagination (`limit` 1-100, default 50; `offset`), search (free-text on name fields), and sorting.
    
    **Filtering** uses s-expression syntax via the `filter` query parameter:
    - Comparison: `eq(field,value)`, `neq`, `gt`, `gte`, `lt`, `lte`, `contains(field,substring)`
    - Range/set: `between(field,lo,hi)`, `in(field,v1,v2,...)`
    - Logic: `and(expr1,expr2,...)`, `or(...)`, `not(expr)`
    - Null: `is_null(field)`, `is_not_null(field)`
    - JSONB: `at(field,key)` — e.g., `eq(at(metadata,score),42)`
    - Cast: `float()`, `int()`, `text()`, `timestamp()`, `date()`
    
    **Sorting** uses `asc(field)` or `desc(field)`, comma-separated (max 8):
    ```
    sort=desc(created_at),asc(name)
    ```
    
    **Example:** `filter=and(gte(created_at,2026-01-01),eq(status,done))`
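
    A sketch of passing these parameters over raw HTTP; the `/experiments` list path is assumed here, so check `references/api-endpoints.md` for exact routes:

    ```python
    import os

    import httpx

    base = "https://foundry-api-public.adaptyvbio.com/api/v1"
    params = {
        # s-expression filter built as a plain string, per the syntax above
        "filter": "and(gte(created_at,2026-01-01),eq(status,done))",
        "sort": "desc(created_at),asc(name)",
        "limit": 25,
    }
    resp = httpx.get(
        f"{base}/experiments",  # assumed list endpoint
        params=params,
        headers={"Authorization": f"Bearer {os.environ['FOUNDRY_API_TOKEN']}"},
    )
    resp.raise_for_status()
    ```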
    
    ## Error Handling
    
    All errors return:
    ```json
    {
      "error": "Human-readable description",
      "request_id": "req_019462a4-b1c2-7def-8901-23456789abcd"
    }
    ```
    The `request_id` is also in the `x-request-id` response header — include it when contacting support.
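
    One way to surface that identifier when a raw HTTP call fails, using the documented error shape:

    ```python
    import os

    import httpx

    resp = httpx.get(
        "https://foundry-api-public.adaptyvbio.com/api/v1/targets",
        headers={"Authorization": f"Bearer {os.environ['FOUNDRY_API_TOKEN']}"},
    )
    if resp.status_code >= 400:
        body = resp.json()
        # request_id appears in both the body and the x-request-id header
        request_id = resp.headers.get("x-request-id", body.get("request_id"))
        raise RuntimeError(f"Foundry API error: {body['error']} (request_id={request_id})")
    ```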
    
    ## Token Management
    
    Tokens use Biscuit-based cryptographic attenuation. You can create restricted tokens scoped by organization, resource type, actions (read/create/update), and expiry via `POST /tokens/attenuate`. Revoking a token (`POST /tokens/revoke`) revokes it and all its descendants.
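
    A sketch of the attenuation call; every body key below is a hypothetical placeholder, so consult `references/api-endpoints.md` for the real schema:

    ```python
    import os

    import httpx

    resp = httpx.post(
        "https://foundry-api-public.adaptyvbio.com/api/v1/tokens/attenuate",
        headers={"Authorization": f"Bearer {os.environ['FOUNDRY_API_TOKEN']}"},
        json={
            "actions": ["read"],                    # hypothetical: restrict to read-only
            "expires_at": "2026-12-31T00:00:00Z",   # hypothetical expiry field
        },
    )
    resp.raise_for_status()
    restricted_token = resp.json()
    ```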
    
    ## Detailed API Reference
    
    For the full list of all 32 endpoints with request/response schemas, read `references/api-endpoints.md`.
    
  • scientific-skills/astropy/SKILL.md (11534 bytes)
    ---
    name: astropy
    description: Comprehensive Python library for astronomy and astrophysics. This skill should be used when working with astronomical data including celestial coordinates, physical units, FITS files, cosmological calculations, time systems, tables, world coordinate systems (WCS), and astronomical data analysis. Use when tasks involve coordinate transformations, unit conversions, FITS file manipulation, cosmological distance calculations, time scale conversions, or astronomical data processing.
    license: BSD-3-Clause license
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # Astropy
    
    ## Overview
    
    Astropy is the core Python package for astronomy, providing essential functionality for astronomical research and data analysis. Use astropy for coordinate transformations, unit and quantity calculations, FITS file operations, cosmological calculations, precise time handling, tabular data manipulation, and astronomical image processing.
    
    ## When to Use This Skill
    
    Use astropy when tasks involve:
    - Converting between celestial coordinate systems (ICRS, Galactic, FK5, AltAz, etc.)
    - Working with physical units and quantities (converting Jy to mJy, parsecs to km, etc.)
    - Reading, writing, or manipulating FITS files (images or tables)
    - Cosmological calculations (luminosity distance, lookback time, Hubble parameter)
    - Precise time handling with different time scales (UTC, TAI, TT, TDB) and formats (JD, MJD, ISO)
    - Table operations (reading catalogs, cross-matching, filtering, joining)
    - WCS transformations between pixel and world coordinates
    - Astronomical constants and calculations
    
    ## Quick Start
    
    ```python
    import astropy.units as u
    from astropy.coordinates import SkyCoord
    from astropy.time import Time
    from astropy.io import fits
    from astropy.table import Table
    from astropy.cosmology import Planck18
    
    # Units and quantities
    distance = 100 * u.pc
    distance_km = distance.to(u.km)
    
    # Coordinates
    coord = SkyCoord(ra=10.5*u.degree, dec=41.2*u.degree, frame='icrs')
    coord_galactic = coord.galactic
    
    # Time
    t = Time('2023-01-15 12:30:00')
    jd = t.jd  # Julian Date
    
    # FITS files
    data = fits.getdata('image.fits')
    header = fits.getheader('image.fits')
    
    # Tables
    table = Table.read('catalog.fits')
    
    # Cosmology
    d_L = Planck18.luminosity_distance(z=1.0)
    ```
    
    ## Core Capabilities
    
    ### 1. Units and Quantities (`astropy.units`)
    
    Handle physical quantities with units, perform unit conversions, and ensure dimensional consistency in calculations.
    
    **Key operations:**
    - Create quantities by multiplying values with units
    - Convert between units using `.to()` method
    - Perform arithmetic with automatic unit handling
    - Use equivalencies for domain-specific conversions (spectral, doppler, parallax)
    - Work with logarithmic units (magnitudes, decibels)
    
    **See:** `references/units.md` for comprehensive documentation, unit systems, equivalencies, performance optimization, and unit arithmetic.
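
    For instance, the spectral and parallax equivalencies mentioned above:

    ```python
    import astropy.units as u

    # Wavelength <-> frequency via the spectral equivalency
    frequency = (500 * u.nm).to(u.GHz, equivalencies=u.spectral())
    print(frequency)  # ~599585 GHz

    # Parallax angle <-> distance
    distance = (10 * u.mas).to(u.pc, equivalencies=u.parallax())  # 100 pc
    ```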
    
    ### 2. Coordinate Systems (`astropy.coordinates`)
    
    Represent celestial positions and transform between different coordinate frames.
    
    **Key operations:**
    - Create coordinates with `SkyCoord` in any frame (ICRS, Galactic, FK5, AltAz, etc.)
    - Transform between coordinate systems
    - Calculate angular separations and position angles
    - Match coordinates to catalogs
    - Include distance for 3D coordinate operations
    - Handle proper motions and radial velocities
    - Query named objects from online databases
    
    **See:** `references/coordinates.md` for detailed coordinate frame descriptions, transformations, observer-dependent frames (AltAz), catalog matching, and performance tips.
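
    For instance, separations and position angles between two coordinates:

    ```python
    import astropy.units as u
    from astropy.coordinates import SkyCoord

    c1 = SkyCoord(ra=10.5 * u.deg, dec=41.2 * u.deg)
    c2 = SkyCoord(ra=11.0 * u.deg, dec=41.0 * u.deg)

    sep = c1.separation(c2)       # on-sky angular separation (Angle)
    pa = c1.position_angle(c2)    # position angle, east of north
    print(sep.arcmin, pa.deg)
    ```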
    
    ### 3. Cosmological Calculations (`astropy.cosmology`)
    
    Perform cosmological calculations using standard cosmological models.
    
    **Key operations:**
    - Use built-in cosmologies (Planck18, WMAP9, etc.)
    - Create custom cosmological models
    - Calculate distances (luminosity, comoving, angular diameter)
    - Compute ages and lookback times
    - Determine Hubble parameter at any redshift
    - Calculate density parameters and volumes
    - Perform inverse calculations (find z for given distance)
    
    **See:** `references/cosmology.md` for available models, distance calculations, time calculations, density parameters, and neutrino effects.
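
    For instance, the inverse calculation mentioned above via `z_at_value`:

    ```python
    import astropy.units as u
    from astropy.cosmology import Planck18, z_at_value

    # Which redshift gives a luminosity distance of 7 Gpc?
    z = z_at_value(Planck18.luminosity_distance, 7 * u.Gpc)
    print(z)
    ```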
    
    ### 4. FITS File Handling (`astropy.io.fits`)
    
    Read, write, and manipulate FITS (Flexible Image Transport System) files.
    
    **Key operations:**
    - Open FITS files with context managers
    - Access HDUs (Header Data Units) by index or name
    - Read and modify headers (keywords, comments, history)
    - Work with image data (NumPy arrays)
    - Handle table data (binary and ASCII tables)
    - Create new FITS files (single or multi-extension)
    - Use memory mapping for large files
    - Access remote FITS files (S3, HTTP)
    
    **See:** `references/fits.md` for comprehensive file operations, header manipulation, image and table handling, multi-extension files, and performance considerations.
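
    For instance, creating a small multi-extension file:

    ```python
    import numpy as np
    from astropy.io import fits

    # Empty primary HDU plus a named image extension
    primary = fits.PrimaryHDU()
    sci = fits.ImageHDU(data=np.zeros((128, 128), dtype=np.float32), name="SCI")
    sci.header["EXPTIME"] = (300.0, "exposure time in seconds")

    fits.HDUList([primary, sci]).writeto("new_image.fits", overwrite=True)
    ```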
    
    ### 5. Table Operations (`astropy.table`)
    
    Work with tabular data with support for units, metadata, and various file formats.
    
    **Key operations:**
    - Create tables from arrays, lists, or dictionaries
    - Read/write tables in multiple formats (FITS, CSV, HDF5, VOTable)
    - Access and modify columns and rows
    - Sort, filter, and index tables
    - Perform database-style operations (join, group, aggregate)
    - Stack and concatenate tables
    - Work with unit-aware columns (QTable)
    - Handle missing data with masking
    
    **See:** `references/tables.md` for table creation, I/O operations, data manipulation, sorting, filtering, joins, grouping, and performance tips.
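
    For instance, a database-style inner join on a shared key:

    ```python
    from astropy.table import Table, join

    obs = Table({"id": [1, 2, 3], "flux": [10.2, 8.7, 5.1]})
    meta = Table({"id": [2, 3, 4], "z": [0.13, 0.45, 0.88]})

    matched = join(obs, meta, keys="id")  # inner join on "id"
    print(len(matched))                   # 2 rows (ids 2 and 3)
    ```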
    
    ### 6. Time Handling (`astropy.time`)
    
    Precise time representation and conversion between time scales and formats.
    
    **Key operations:**
    - Create Time objects in various formats (ISO, JD, MJD, Unix, etc.)
    - Convert between time scales (UTC, TAI, TT, TDB, etc.)
    - Perform time arithmetic with TimeDelta
    - Calculate sidereal time for observers
    - Compute light travel time corrections (barycentric, heliocentric)
    - Work with time arrays efficiently
    - Handle masked (missing) times
    
    **See:** `references/time.md` for time formats, time scales, conversions, arithmetic, observing features, and precision handling.
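
    For instance, converting scales and doing arithmetic with `TimeDelta`:

    ```python
    from astropy.time import Time, TimeDelta

    t = Time("2023-01-15 12:30:00", scale="utc")
    print(t.tt.iso, t.tdb.jd)  # the same instant on other time scales

    t_later = t + TimeDelta(90 * 60, format="sec")  # 90 minutes later
    print((t_later - t).sec)
    ```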
    
    ### 7. World Coordinate System (`astropy.wcs`)
    
    Transform between pixel coordinates in images and world coordinates.
    
    **Key operations:**
    - Read WCS from FITS headers
    - Convert pixel coordinates to world coordinates (and vice versa)
    - Calculate image footprints
    - Access WCS parameters (reference pixel, projection, scale)
    - Create custom WCS objects
    
    **See:** `references/wcs_and_other_modules.md` for WCS operations and transformations.
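
    For instance, a round trip between pixel and world coordinates:

    ```python
    from astropy.io import fits
    from astropy.wcs import WCS

    header = fits.getheader("image.fits", ext=1)
    w = WCS(header)

    sky = w.pixel_to_world(100, 200)  # SkyCoord at pixel (100, 200)
    x, y = w.world_to_pixel(sky)      # and back again
    ```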
    
    ## Additional Capabilities
    
    The `references/wcs_and_other_modules.md` file also covers:
    
    ### NDData and CCDData
    Containers for n-dimensional datasets with metadata, uncertainty, masking, and WCS information.
    
    ### Modeling
    Framework for creating and fitting mathematical models to astronomical data.
    
    ### Visualization
    Tools for astronomical image display with appropriate stretching and scaling.
    
    ### Constants
    Physical and astronomical constants with proper units (speed of light, solar mass, Planck constant, etc.).
    
    ### Convolution
    Image processing kernels for smoothing and filtering.
    
    ### Statistics
    Robust statistical functions including sigma clipping and outlier rejection.
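
    For instance, robust statistics with sigma clipping:

    ```python
    import numpy as np
    from astropy.stats import sigma_clip, sigma_clipped_stats

    data = np.append(np.random.normal(1.0, 0.1, 500), [5.0, 6.0])  # two outliers
    clipped = sigma_clip(data, sigma=3, maxiters=5)          # masked array
    mean, median, std = sigma_clipped_stats(data, sigma=3)   # outlier-resistant stats
    ```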
    
    ## Installation
    
    ```bash
    # Install astropy
    uv pip install astropy
    
    # With optional dependencies for full functionality
    uv pip install astropy[all]
    ```
    
    ## Common Workflows
    
    ### Converting Coordinates Between Systems
    
    ```python
    from astropy.coordinates import SkyCoord
    import astropy.units as u
    
    # Create coordinate
    c = SkyCoord(ra='05h23m34.5s', dec='-69d45m22s', frame='icrs')
    
    # Transform to galactic
    c_gal = c.galactic
    print(f"l={c_gal.l.deg}, b={c_gal.b.deg}")
    
    # Transform to alt-az (requires time and location)
    from astropy.time import Time
    from astropy.coordinates import EarthLocation, AltAz
    
    observing_time = Time('2023-06-15 23:00:00')
    observing_location = EarthLocation(lat=40*u.deg, lon=-120*u.deg)
    aa_frame = AltAz(obstime=observing_time, location=observing_location)
    c_altaz = c.transform_to(aa_frame)
    print(f"Alt={c_altaz.alt.deg}, Az={c_altaz.az.deg}")
    ```
    
    ### Reading and Analyzing FITS Files
    
    ```python
    from astropy.io import fits
    import numpy as np
    
    # Open FITS file
    with fits.open('observation.fits') as hdul:
        # Display structure
        hdul.info()
    
        # Get image data and header
        data = hdul[1].data
        header = hdul[1].header
    
        # Access header values
        exptime = header['EXPTIME']
        filter_name = header['FILTER']
    
        # Analyze data
        mean = np.mean(data)
        median = np.median(data)
        print(f"Mean: {mean}, Median: {median}")
    ```
    
    ### Cosmological Distance Calculations
    
    ```python
    from astropy.cosmology import Planck18
    import astropy.units as u
    import numpy as np
    
    # Calculate distances at z=1.5
    z = 1.5
    d_L = Planck18.luminosity_distance(z)
    d_A = Planck18.angular_diameter_distance(z)
    
    print(f"Luminosity distance: {d_L}")
    print(f"Angular diameter distance: {d_A}")
    
    # Age of universe at that redshift
    age = Planck18.age(z)
    print(f"Age at z={z}: {age.to(u.Gyr)}")
    
    # Lookback time
    t_lookback = Planck18.lookback_time(z)
    print(f"Lookback time: {t_lookback.to(u.Gyr)}")
    ```
    
    ### Cross-Matching Catalogs
    
    ```python
    from astropy.table import Table
    from astropy.coordinates import SkyCoord, match_coordinates_sky
    import astropy.units as u
    
    # Read catalogs
    cat1 = Table.read('catalog1.fits')
    cat2 = Table.read('catalog2.fits')
    
    # Create coordinate objects
    coords1 = SkyCoord(ra=cat1['RA']*u.degree, dec=cat1['DEC']*u.degree)
    coords2 = SkyCoord(ra=cat2['RA']*u.degree, dec=cat2['DEC']*u.degree)
    
    # Find matches
    idx, sep, _ = coords1.match_to_catalog_sky(coords2)
    
    # Filter by separation threshold
    max_sep = 1 * u.arcsec
    matches = sep < max_sep
    
    # Create matched catalogs
    cat1_matched = cat1[matches]
    cat2_matched = cat2[idx[matches]]
    print(f"Found {len(cat1_matched)} matches")
    ```
    
    ## Best Practices
    
    1. **Always use units**: Attach units to quantities to avoid errors and ensure dimensional consistency
    2. **Use context managers for FITS files**: Ensures proper file closing
    3. **Prefer arrays over loops**: Process multiple coordinates/times as arrays for better performance
    4. **Check coordinate frames**: Verify the frame before transformations
    5. **Use appropriate cosmology**: Choose the right cosmological model for your analysis
    6. **Handle missing data**: Use masked columns for tables with missing values
    7. **Specify time scales**: Be explicit about time scales (UTC, TT, TDB) for precise timing
    8. **Use QTable for unit-aware tables**: When table columns have units
    9. **Check WCS validity**: Verify WCS before using transformations
    10. **Cache frequently used values**: Expensive calculations (e.g., cosmological distances) can be cached
    
    ## Documentation and Resources
    
    - Official Astropy Documentation: https://docs.astropy.org/en/stable/
    - Tutorials: https://learn.astropy.org/
    - GitHub: https://github.com/astropy/astropy
    
    ## Reference Files
    
    For detailed information on specific modules:
    - `references/units.md` - Units, quantities, conversions, and equivalencies
    - `references/coordinates.md` - Coordinate systems, transformations, and catalog matching
    - `references/cosmology.md` - Cosmological models and calculations
    - `references/fits.md` - FITS file operations and manipulation
    - `references/tables.md` - Table creation, I/O, and operations
    - `references/time.md` - Time formats, scales, and calculations
    - `references/wcs_and_other_modules.md` - WCS, NDData, modeling, visualization, constants, and utilities
    
    
  • scientific-skills/autoskill/SKILL.md (11480 bytes)
    ---
    name: autoskill
    description: Observe the user's screen via screenpipe, detect repeated research workflows, match them against existing scientific-agent-skills, and draft new skills (or composition recipes that chain existing ones) for the patterns not yet covered. Use when the user asks to analyze their recent work and propose skills based on what they actually do. Requires the screenpipe daemon (https://github.com/screenpipe/screenpipe) running locally on port 3030 — the skill has no other data source and will refuse to run if screenpipe is unreachable. All detection runs locally; only redacted cluster summaries reach the LLM.
    allowed-tools: Read Write Edit Bash
    license: MIT license
    metadata:
        skill-author: K-Dense Inc.
        requires: screenpipe
    ---
    
    # autoskill
    
    > **Requires a running [screenpipe](https://github.com/screenpipe/screenpipe) daemon.** This skill has no alternate data source — it reads exclusively from the local screenpipe HTTP API (default `http://localhost:3030`). If the daemon isn't running, `run()` raises `ScreenpipeUnreachable` with install instructions.
    
    > **Network access & environment variables.** This skill makes authenticated HTTP requests to (a) the user's local screenpipe daemon on loopback, and (b) the user-configured LLM backend — one of `http://localhost:1234/v1` (LM Studio, default), `https://api.anthropic.com` (opt-in Claude), or a user-supplied BYOK Foundry gateway. The skill reads three environment variables — `SCREENPIPE_TOKEN`, `ANTHROPIC_API_KEY`, `FOUNDRY_API_KEY` — and uses each only to authenticate to the single endpoint its name implies. No other network destinations, no telemetry, no data egress to any third party.
    
    ## Overview
    
    Turn the user's own workflow history — captured passively by the local [screenpipe](https://github.com/screenpipe/screenpipe) daemon — into new skills. This skill is on-demand: the user invokes it with a time window, it queries screenpipe's local HTTP API, clusters repeated workflow patterns, compares each pattern against the existing skills in this repo, and produces a staged folder of proposals the user can review, edit, and promote.
    
    ## When to Use This Skill
    
    Invoke this skill when the user asks to:
    - "Analyze my last 4 hours / day / week and propose new skills."
    - "Look at what I've been doing and tell me what's not covered yet."
    - "Draft a skill from my recent workflow."
    - "Find composition recipes for workflows I repeat."
    
    Do **not** invoke it for one-off questions about screenpipe itself, for real-time screen queries, or without an explicit user request — the skill analyzes sensitive local content and must stay explicitly user-triggered.
    
    ## Privacy Posture
    
    - **Screenpipe handles app/window filtering at capture time.** Install a starter deny-list by copying `references/screenpipe-config.yaml` into the user's screenpipe config. Sensitive apps (password managers, messaging, banking) are never OCR'd in the first place.
    - **Raw OCR never leaves the machine.** `scripts/fetch_window.py` pulls data over localhost HTTP. `scripts/cluster.py` reduces the timeline to app/duration/title summaries. `scripts/redact.py` strips emails, API keys, bearer tokens, and phone numbers as defense-in-depth before any cluster summary reaches the LLM.
    - **LLM backend defaults to `local`.** The recommended setup is [LM Studio](https://lmstudio.ai/) running `Gemma-4-31B-it` — strong reasoning at a size that fits on most workstation GPUs, and no data ever leaves your machine. Cloud backends (`claude`, `foundry`) are opt-in and documented in `config.yaml` for users who explicitly want them. Detection and embeddings always run locally regardless of backend choice.
    - **Dry-run mode** (`--plan`) prints the exact timeline that will be analyzed before any LLM call.
    - **TLS for localhost** (optional, for corporate policy): see `references/https-proxy.md` for the Caddy pattern.
    
    ## Prerequisites
    
    ### 1. Screenpipe daemon
    
    Either install the official release or build from source. Either way the daemon binds HTTP on `localhost:3030` by default.
    
    **From source** (recommended if you want the CLI daemon without the desktop GUI):
    
    ```bash
    git clone --depth 1 https://github.com/mediar-ai/screenpipe.git
    cd screenpipe
    # System deps (macOS): cmake + full Xcode.app (not just Command Line Tools).
    #   brew install cmake
    #   # if xcodebuild plug-ins error: sudo xcodebuild -runFirstLaunch
    cargo build -p screenpipe-engine --release
    ./target/release/screenpipe doctor   # confirm permissions + ffmpeg
    ./target/release/screenpipe record --disable-audio --use-pii-removal
    ```
    
    First run will prompt for macOS Screen Recording permission. Grant it and relaunch.
    
    ### 2. Screenpipe API token
    
    The local API now requires bearer auth. Retrieve your token and export it:
    
    ```bash
    export SCREENPIPE_TOKEN=$(screenpipe auth token)
    ```
    
    (Or set `screenpipe.token` directly in `config.yaml` — env var is preferred since it keeps secrets out of version control.)
    
    ### 3. Python environment
    
    Via `pipenv` from the repo root:
    
    ```bash
    pipenv install httpx pyyaml sentence-transformers
    ```
    
    The embedding model (`sentence-transformers/all-MiniLM-L6-v2`, ~80 MB) downloads on first run.
    
    ### 4. Local LLM (default path) — LM Studio
    
    - Install [LM Studio](https://lmstudio.ai/).
    - Download `Gemma-4-31B-it` (or another strong reasoning model; adjust `local.model` in `config.yaml`).
    - Load it via the CLI for headless use (no GUI required):
    
    ```bash
    lms load gemma-4-31b-it --context-length 131072 --gpu max -y
    lms status   # confirm server running on :1234
    ```
    
    ### 5. Cloud LLM backends (optional, opt-in)
    
    Only if you explicitly opt out of local:
    - `claude`: set `ANTHROPIC_API_KEY`, flip `backend: claude` in `config.yaml`.
    - `foundry`: set `FOUNDRY_API_KEY`, flip `backend: foundry`, set `foundry.endpoint` to your corporate gateway URL.
    
    ## Architecture
    
    ```
    screenpipe daemon (user-installed)
            │  HTTP on localhost:3030
            ▼
    scripts/fetch_window.py    → normalized timeline events
    scripts/redact.py          → regex scrub (defense-in-depth)
    scripts/cluster.py         → sessions + clusters (local only)
    scripts/match_skills.py    → top-k vs existing 135 skills (local embeddings)
    scripts/synthesize.py      → LLM judge: reuse / compose / novel
            │
            ▼
    ~/.autoskill/proposed/<timestamp>/        (default; override with --out)
      ├── report.md
      ├── composition-recipes/<name>/SKILL.md
      └── new-skills/<name>/SKILL.md
    
    scripts/promote.py         → user-approved proposal → scientific-skills/<name>/
    ```
    
    ## Workflow
    
    The skill ships a unified CLI at `scripts/autoskill.py` with three subcommands:
    
    ```bash
    python scripts/autoskill.py doctor   --config config.yaml --skills-dir ../
    python scripts/autoskill.py run      --start ... --end ... --config config.yaml
    python scripts/autoskill.py promote  --proposed ~/.autoskill/proposed/<ts> --skills-dir ../ --name <skill>
    ```
    
    ### 0. Preflight with `doctor`
    
    Before a full run, verify every dependency in one shot:
    
    ```bash
    python scripts/autoskill.py doctor \
      --config scientific-skills/autoskill/config.yaml \
      --skills-dir scientific-skills
    ```
    
    The report covers `config` (backend choice valid), `skills_dir` (exists), `screenpipe` (reachable + authed), and `llm` (LM Studio serving or API key present). Non-zero exit on any failure, with the offending line marked `error`.
    
    ### 1. Run the pipeline
    
    ```bash
    export SCREENPIPE_TOKEN=$(screenpipe auth token)
    python scripts/autoskill.py run \
      --start "2026-04-17T00:00:00Z" \
      --end   "2026-04-17T23:59:59Z" \
      --config scientific-skills/autoskill/config.yaml \
      --skills-dir scientific-skills
    ```
    
    Proposals land in `~/.autoskill/proposed/<timestamp>/` by default, keeping experimental output out of the skills repo. Pass `--out PATH` to override.
    
    Internally:
    1. **Fetch** — `fetch_window` paginates screenpipe's `/search` endpoint, normalizes events to `{ts, app, window_title, text, content_type}`.
    2. **Redact** — `redact` scrubs emails, API keys, bearer tokens, phones from OCR text and window titles as defense-in-depth over screenpipe's own PII removal.
    3. **Cluster** — `segment_sessions` splits on idle gaps (default 10 min) and drops short sessions; `cluster_sessions` groups sessions by app-signature and keeps clusters of size `min_cluster_size` (default 2).
    4. **Match** — `load_skill_descriptions` reads frontmatter from every `SKILL.md` in `scientific-skills/`; `top_k_matches` ranks each cluster against all skills using local `sentence-transformers` embeddings (cosine similarity).
    5. **Synthesize** — `synthesize` prompts the configured LLM backend to classify each cluster as `reuse`, `compose`, or `novel` and emit a SKILL.md body where appropriate.
    6. **Report** — writes `<out_dir>/<ts>/report.md`, plus `new-skills/<name>/SKILL.md` or `composition-recipes/<name>/SKILL.md` for each proposal.
    
    Add `--dry-run` to stop after clustering; this skips the LLM (and the sentence-transformers load), writing only `plan.md` for inspection.
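
    For reference, step 2's scrubbing amounts to regex substitution along these lines (illustrative; the real patterns live in `scripts/redact.py`):

    ```python
    import re

    PATTERNS = [
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),                 # emails
        re.compile(r"(?i)bearer\s+[a-z0-9._-]+"),               # bearer tokens
        re.compile(r"\b(?:sk|pk|api)[-_][A-Za-z0-9]{16,}\b"),   # API-key-ish strings
        re.compile(r"\+?\d[\d\s().-]{7,}\d"),                   # phone numbers
    ]

    def redact(text: str) -> str:
        """Scrub sensitive substrings before any summary leaves the machine."""
        for pat in PATTERNS:
            text = pat.sub("[REDACTED]", text)
        return text
    ```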
    
    ### 2. Review and promote
    
    Open `~/.autoskill/proposed/<ts>/report.md`, edit drafts in place, delete anything you don't want. Then:
    
    ```bash
    python scripts/autoskill.py promote \
      --proposed ~/.autoskill/proposed/2026-04-17T14-30-00 \
      --skills-dir scientific-skills \
      --name zotero-pubmed-helper
    ```
    
    `promote` moves the directory into `scientific-skills/<name>/`, refusing to overwrite an existing skill. Exits non-zero with a friendly error if the proposal isn't found or the target already exists.
    
    ## Configuration
    
    See `config.yaml` for the full shape. Default values (local-first):
    
    ```yaml
    backend: local
    local:
      endpoint: http://localhost:1234/v1   # LM Studio's Developer server
      model: Gemma-4-31B-it
    
    screenpipe:
      url: http://localhost:3030           # or https://screenpipe.local via Caddy
    
    cluster:
      min_session_minutes: 5
      idle_gap_minutes: 10
      min_cluster_size: 2
    ```
    
    To opt into a cloud backend:
    
    ```yaml
    backend: claude                         # or foundry
    claude:
      model: claude-opus-4-7
    ```
    
    ## Composition recipes vs new skills
    
    - **compose**: the LLM judged that chaining existing skills covers the workflow. The emitted SKILL.md is intentionally thin — frontmatter + a "Workflow" section that invokes existing skills in order. The same agent runtime that discovered the skill can then invoke it end-to-end.
    - **novel**: no combination of existing skills covers it. A fuller SKILL.md is drafted, still following repo conventions (frontmatter, Overview, When to Use, Workflow). The user should always review new-skill drafts before promoting.
    
    ## Testing
    
    The skill is covered by a small pytest suite at `tests/`. Each script is unit-tested in isolation with dependency injection (mock HTTP transport, stub backend, stub embedder):
    
    ```bash
    cd scientific-skills/autoskill
    python -m pytest tests/ -v
    ```
    
    ## Composition with other skills in this repo
    
    The autoskill's embedding index covers all 135 sibling skills. Workflows that look like scientific writing will match `scientific-writing` / `literature-review` / `citation-management`; figure work will match `scientific-schematics` / `generate-image` / `infographics`; slide prep matches `scientific-slides` / `pptx`; etc. When a cluster scores high against two or three sibling skills the emitted composition recipe names them explicitly, so the user's future agent invocations use the optimized paths already documented in this repo.
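
    The matching step reduces to something like the following sketch (names illustrative; the real logic lives in `scripts/match_skills.py`):

    ```python
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # the same model the skill downloads

    def top_k(cluster_summary: str, skill_descriptions: dict[str, str], k: int = 3):
        """Rank skills by cosine similarity between summary and description embeddings."""
        names = list(skill_descriptions)
        skill_emb = model.encode([skill_descriptions[n] for n in names], convert_to_tensor=True)
        cluster_emb = model.encode(cluster_summary, convert_to_tensor=True)
        scores = util.cos_sim(cluster_emb, skill_emb)[0]
        ranked = sorted(zip(names, scores.tolist()), key=lambda p: p[1], reverse=True)
        return ranked[:k]
    ```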
    
  • scientific-skills/benchling-integration/SKILL.md (13063 bytes)
    ---
    name: benchling-integration
    description: Benchling R&D platform integration. Access registry entities (DNA, proteins), inventory, ELN entries, and workflows via the API; build Benchling Apps; query the Data Warehouse for lab data management automation.
    license: Unknown
    compatibility: Requires a Benchling account and API key
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # Benchling Integration
    
    ## Overview
    
    Benchling is a cloud platform for life sciences R&D. Access registry entities (DNA, proteins), inventory, electronic lab notebooks, and workflows programmatically via Python SDK and REST API.
    
    ## When to Use This Skill
    
    This skill should be used when:
    - Working with Benchling's Python SDK or REST API
    - Managing biological sequences (DNA, RNA, proteins) and registry entities
    - Automating inventory operations (samples, containers, locations, transfers)
    - Creating or querying electronic lab notebook entries
    - Building workflow automations or Benchling Apps
    - Syncing data between Benchling and external systems
    - Querying the Benchling Data Warehouse for analytics
    - Setting up event-driven integrations with AWS EventBridge
    
    ## Core Capabilities
    
    ### 1. Authentication & Setup
    
    **Python SDK Installation:**
    ```bash
    # Stable release
    uv pip install benchling-sdk
    # or with Poetry
    poetry add benchling-sdk
    ```
    
    **Authentication Methods:**
    
    API Key Authentication (recommended for scripts):
    ```python
    from benchling_sdk.benchling import Benchling
    from benchling_sdk.auth.api_key_auth import ApiKeyAuth
    
    benchling = Benchling(
        url="https://your-tenant.benchling.com",
        auth_method=ApiKeyAuth("your_api_key")
    )
    ```
    
    OAuth Client Credentials (for apps):
    ```python
    from benchling_sdk.auth.client_credentials_oauth2 import ClientCredentialsOAuth2
    
    auth_method = ClientCredentialsOAuth2(
        client_id="your_client_id",
        client_secret="your_client_secret"
    )
    benchling = Benchling(
        url="https://your-tenant.benchling.com",
        auth_method=auth_method
    )
    ```
    
    **Key Points:**
    - API keys are obtained from Profile Settings in Benchling
    - Store credentials securely (use environment variables or password managers)
    - All API requests require HTTPS
    - Authentication permissions mirror user permissions in the UI
    
    For detailed authentication information including OIDC and security best practices, refer to `references/authentication.md`.
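
    Putting those points into practice, a sketch that reads credentials from the environment (the variable names are this sketch's convention, not SDK requirements):

    ```python
    import os

    from benchling_sdk.auth.api_key_auth import ApiKeyAuth
    from benchling_sdk.benchling import Benchling

    benchling = Benchling(
        url=os.environ["BENCHLING_TENANT_URL"],  # e.g. https://your-tenant.benchling.com
        auth_method=ApiKeyAuth(os.environ["BENCHLING_API_KEY"]),
    )
    ```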
    
    ### 2. Registry & Entity Management
    
    Registry entities include DNA sequences, RNA sequences, AA sequences, custom entities, and mixtures. The SDK provides typed classes for creating and managing these entities.
    
    **Creating DNA Sequences:**
    ```python
    from benchling_sdk.models import DnaSequenceCreate
    
    sequence = benchling.dna_sequences.create(
        DnaSequenceCreate(
            name="My Plasmid",
            bases="ATCGATCG",
            is_circular=True,
            folder_id="fld_abc123",
            schema_id="ts_abc123",  # optional
            fields=benchling.models.fields({"gene_name": "GFP"})
        )
    )
    ```
    
    **Registry Registration:**
    
    To register an entity directly upon creation:
    ```python
    sequence = benchling.dna_sequences.create(
        DnaSequenceCreate(
            name="My Plasmid",
            bases="ATCGATCG",
            is_circular=True,
            folder_id="fld_abc123",
            entity_registry_id="src_abc123",  # Registry to register in
            naming_strategy="NEW_IDS"  # or "IDS_FROM_NAMES"
        )
    )
    ```
    
    **Important:** When registering on creation, supply `entity_registry_id` together with a `naming_strategy`; to leave the entity unregistered, omit both.
    
    **Updating Entities:**
    ```python
    from benchling_sdk.models import DnaSequenceUpdate
    
    updated = benchling.dna_sequences.update(
        sequence_id="seq_abc123",
        dna_sequence=DnaSequenceUpdate(
            name="Updated Plasmid Name",
            fields=benchling.models.fields({"gene_name": "mCherry"})
        )
    )
    ```
    
    Unspecified fields remain unchanged, allowing partial updates.
    
    **Listing and Pagination:**
    ```python
    # List all DNA sequences (returns a generator)
    sequences = benchling.dna_sequences.list()
    for page in sequences:
        for seq in page:
            print(f"{seq.name} ({seq.id})")
    
    # Check total count
    total = sequences.estimated_count()
    ```
    
    **Key Operations:**
    - Create: `benchling.<entity_type>.create()`
    - Read: `benchling.<entity_type>.get(id)` or `.list()`
    - Update: `benchling.<entity_type>.update(id, update_object)`
    - Archive: `benchling.<entity_type>.archive(id)`
    
    Entity types: `dna_sequences`, `rna_sequences`, `aa_sequences`, `custom_entities`, `mixtures`
    
    For comprehensive SDK reference and advanced patterns, refer to `references/sdk_reference.md`.
    
    ### 3. Inventory Management
    
    Manage physical samples, containers, boxes, and locations within the Benchling inventory system.
    
    **Creating Containers:**
    ```python
    from benchling_sdk.models import ContainerCreate
    
    container = benchling.containers.create(
        ContainerCreate(
            name="Sample Tube 001",
            schema_id="cont_schema_abc123",
            parent_storage_id="box_abc123",  # optional
            fields=benchling.models.fields({"concentration": "100 ng/μL"})
        )
    )
    ```
    
    **Managing Boxes:**
    ```python
    from benchling_sdk.models import BoxCreate
    
    box = benchling.boxes.create(
        BoxCreate(
            name="Freezer Box A1",
            schema_id="box_schema_abc123",
            parent_storage_id="loc_abc123"
        )
    )
    ```
    
    **Transferring Items:**
    ```python
    # Transfer a container to a new location
    transfer = benchling.containers.transfer(
        container_id="cont_abc123",
        destination_id="box_xyz789"
    )
    ```
    
    **Key Inventory Operations:**
    - Create containers, boxes, locations, plates
    - Update inventory item properties
    - Transfer items between locations
    - Check in/out items
    - Batch operations for bulk transfers
    
    ### 4. Notebook & Documentation
    
    Interact with electronic lab notebook (ELN) entries, protocols, and templates.
    
    **Creating Notebook Entries:**
    ```python
    from benchling_sdk.models import EntryCreate
    
    entry = benchling.entries.create(
        EntryCreate(
            name="Experiment 2025-10-20",
            folder_id="fld_abc123",
            schema_id="entry_schema_abc123",
            fields=benchling.models.fields({"objective": "Test gene expression"})
        )
    )
    ```
    
    **Linking Entities to Entries:**
    ```python
    # Add references to entities in an entry
    entry_link = benchling.entry_links.create(
        entry_id="entry_abc123",
        entity_id="seq_xyz789"
    )
    ```
    
    **Key Notebook Operations:**
    - Create and update lab notebook entries
    - Manage entry templates
    - Link entities and results to entries
    - Export entries for documentation
    
    ### 5. Workflows & Automation
    
    Automate laboratory processes using Benchling's workflow system.
    
    **Creating Workflow Tasks:**
    ```python
    from benchling_sdk.models import WorkflowTaskCreate
    
    task = benchling.workflow_tasks.create(
        WorkflowTaskCreate(
            name="PCR Amplification",
            workflow_id="wf_abc123",
            assignee_id="user_abc123",
            fields=benchling.models.fields({"template": "seq_abc123"})
        )
    )
    ```
    
    **Updating Task Status:**
    ```python
    from benchling_sdk.models import WorkflowTaskUpdate
    
    updated_task = benchling.workflow_tasks.update(
        task_id="task_abc123",
        workflow_task=WorkflowTaskUpdate(
            status_id="status_complete_abc123"
        )
    )
    ```
    
    **Asynchronous Operations:**
    
    Some operations are asynchronous and return tasks:
    ```python
    # Wait for task completion
    from benchling_sdk.helpers.tasks import wait_for_task
    
    result = wait_for_task(
        benchling,
        task_id="task_abc123",
        interval_wait_seconds=2,
        max_wait_seconds=300
    )
    ```
    
    **Key Workflow Operations:**
    - Create and manage workflow tasks
    - Update task statuses and assignments
    - Execute bulk operations asynchronously
    - Monitor task progress
    
    ### 6. Events & Integration
    
    Subscribe to Benchling events for real-time integrations using AWS EventBridge.
    
    **Event Types:**
    - Entity creation, update, archive
    - Inventory transfers
    - Workflow task status changes
    - Entry creation and updates
    - Results registration
    
    **Integration Pattern:**
    1. Configure event routing to AWS EventBridge in Benchling settings
    2. Create EventBridge rules to filter events
    3. Route events to Lambda functions or other targets
    4. Process events and update external systems
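
    A minimal Lambda target for such a rule might look like the sketch below; the EventBridge envelope (`detail-type`, `detail`) is standard AWS, but the Benchling-specific keys inside `detail` are hypothetical:

    ```python
    # AWS Lambda handler for a Benchling EventBridge rule (sketch).
    def handler(event, context):
        detail_type = event.get("detail-type", "")  # standard EventBridge field
        detail = event.get("detail", {})

        # Keys inside `detail` are hypothetical; check Benchling's event schemas.
        entity_id = detail.get("entity", {}).get("id")
        print(f"received {detail_type} for {entity_id}")
        return {"ok": True}
    ```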
    
    **Use Cases:**
    - Sync Benchling data to external databases
    - Trigger downstream processes on workflow completion
    - Send notifications on entity changes
    - Audit trail logging
    
    Refer to Benchling's event documentation for event schemas and configuration.
    
    ### 7. Data Warehouse & Analytics
    
    Query historical Benchling data using SQL through the Data Warehouse.
    
    **Access Method:**
    The Benchling Data Warehouse provides SQL access to Benchling data for analytics and reporting. Connect using standard SQL clients with provided credentials.
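
    A connection sketch, assuming a Postgres-compatible client; host, database, and table names below are placeholders for tenant-specific values:

    ```python
    import os

    import psycopg2  # assumes the warehouse accepts Postgres-protocol clients

    conn = psycopg2.connect(
        host="your-warehouse-host.benchling.com",  # placeholder
        dbname="warehouse",                        # placeholder
        user=os.environ["WAREHOUSE_USER"],
        password=os.environ["WAREHOUSE_PASSWORD"],
    )
    with conn.cursor() as cur:
        cur.execute("SELECT name, created_at FROM dna_sequence LIMIT 10")  # table name illustrative
        for name, created_at in cur.fetchall():
            print(name, created_at)
    ```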
    
    **Common Queries:**
    - Aggregate experimental results
    - Analyze inventory trends
    - Generate compliance reports
    - Export data for external analysis
    
    **Integration with Analysis Tools:**
    - Jupyter notebooks for interactive analysis
    - BI tools (Tableau, Looker, PowerBI)
    - Custom dashboards
    
    ## Best Practices
    
    ### Error Handling
    
    The SDK automatically retries failed requests:
    ```python
    # Automatic retry for 429, 502, 503, 504 status codes
    # Up to 5 retries with exponential backoff
    # Customize retry behavior if needed
    from benchling_sdk.retry import RetryStrategy
    
    benchling = Benchling(
        url="https://your-tenant.benchling.com",
        auth_method=ApiKeyAuth("your_api_key"),
        retry_strategy=RetryStrategy(max_retries=3)
    )
    ```
    
    ### Pagination Efficiency
    
    Use generators for memory-efficient pagination:
    ```python
    # Generator-based iteration
    for page in benchling.dna_sequences.list():
        for sequence in page:
            process(sequence)
    
    # Check estimated count without loading all pages
    total = benchling.dna_sequences.list().estimated_count()
    ```
    
    ### Schema Fields Helper
    
    Use the `fields()` helper for custom schema fields:
    ```python
    # Convert dict to Fields object
    custom_fields = benchling.models.fields({
        "concentration": "100 ng/μL",
        "date_prepared": "2025-10-20",
        "notes": "High quality prep"
    })
    ```
    
    ### Forward Compatibility
    
    The SDK handles unknown enum values and types gracefully:
    - Unknown enum values are preserved
    - Unrecognized polymorphic types return `UnknownType`
    - Allows working with newer API versions
    
    ### Security Considerations
    
    - Never commit API keys to version control
    - Use environment variables for credentials
    - Rotate keys if compromised
    - Grant minimal necessary permissions for apps
    - Use OAuth for multi-user scenarios
    
    ## Resources
    
    ### references/
    
    Detailed reference documentation for in-depth information:
    
    - **authentication.md** - Comprehensive authentication guide including OIDC, security best practices, and credential management
    - **sdk_reference.md** - Detailed Python SDK reference with advanced patterns, examples, and all entity types
    - **api_endpoints.md** - REST API endpoint reference for direct HTTP calls without the SDK
    
    Load these references as needed for specific integration requirements.
    
    ### scripts/
    
    This skill currently includes example scripts that can be removed or replaced with custom automation scripts for your specific Benchling workflows.
    
    ## Common Use Cases
    
    **1. Bulk Entity Import:**
    ```python
    # Import multiple sequences from FASTA file
    from Bio import SeqIO
    
    for record in SeqIO.parse("sequences.fasta", "fasta"):
        benchling.dna_sequences.create(
            DnaSequenceCreate(
                name=record.id,
                bases=str(record.seq),
                is_circular=False,
                folder_id="fld_abc123"
            )
        )
    ```
    
    **2. Inventory Audit:**
    ```python
    # List all containers in a specific location
    containers = benchling.containers.list(
        parent_storage_id="box_abc123"
    )
    
    for page in containers:
        for container in page:
            print(f"{container.name}: {container.barcode}")
    ```
    
    **3. Workflow Automation:**
    ```python
    # Update all pending tasks for a workflow
    tasks = benchling.workflow_tasks.list(
        workflow_id="wf_abc123",
        status="pending"
    )
    
    for page in tasks:
        for task in page:
            # Perform automated checks
            if auto_validate(task):
                benchling.workflow_tasks.update(
                    task_id=task.id,
                    workflow_task=WorkflowTaskUpdate(
                        status_id="status_complete"
                    )
                )
    ```
    
    **4. Data Export:**
    ```python
    # Export all sequences with specific properties
    sequences = benchling.dna_sequences.list()
    export_data = []
    
    for page in sequences:
        for seq in page:
            if seq.schema_id == "target_schema_id":
                export_data.append({
                    "id": seq.id,
                    "name": seq.name,
                    "bases": seq.bases,
                    "length": len(seq.bases)
                })
    
    # Save to CSV or database
    import csv
    with open("sequences.csv", "w") as f:
        writer = csv.DictWriter(f, fieldnames=export_data[0].keys())
        writer.writeheader()
        writer.writerows(export_data)
    ```
    
    ## Additional Resources
    
    - **Official Documentation:** https://docs.benchling.com
    - **Python SDK Reference:** https://benchling.com/sdk-docs/
    - **API Reference:** https://benchling.com/api/reference
    - **Support:** [email protected]
    
    
  • scientific-skills/bgpt-paper-search/SKILL.md (2479 bytes)
    ---
    name: bgpt-paper-search
    description: Search scientific papers and retrieve structured experimental data extracted from full-text studies via the BGPT MCP server. Returns 25+ fields per paper including methods, results, sample sizes, quality scores, and conclusions. Use for literature reviews, evidence synthesis, and finding experimental details not available in abstracts alone.
    allowed-tools: Bash
    license: MIT
    metadata:
        skill-author: BGPT
        website: https://bgpt.pro/mcp
        github: https://github.com/connerlambden/bgpt-mcp
    ---
    
    # BGPT Paper Search
    
    ## Overview
    
    BGPT is a remote MCP server that searches a curated database of scientific papers built from raw experimental data extracted from full-text studies. Unlike traditional literature databases that return titles and abstracts, BGPT returns structured data from the actual paper content — methods, quantitative results, sample sizes, quality assessments, and 25+ metadata fields per paper.
    
    ## When to Use This Skill
    
    Use this skill when:
    - Searching for scientific papers with specific experimental details
    - Conducting systematic or scoping literature reviews
    - Finding quantitative results, sample sizes, or effect sizes across studies
    - Comparing methodologies used in different studies
    - Looking for papers with quality scores or evidence grading
    - Needing structured data from full-text papers (not just abstracts)
    - Building evidence tables for meta-analyses or clinical guidelines
    
    ## Setup
    
    BGPT is a remote MCP server — no local installation required.
    
    ### Claude Desktop / Claude Code
    
    Add to your MCP configuration:
    
    ```json
    {
      "mcpServers": {
        "bgpt": {
          "command": "npx",
          "args": ["mcp-remote", "https://bgpt.pro/mcp/sse"]
        }
      }
    }
    ```
    
    ### npm (alternative)
    
    ```bash
    npx bgpt-mcp
    ```
    
    ## Usage
    
    Once configured, use the `search_papers` tool provided by the BGPT MCP server:
    
    ```
    Search for papers about: "CRISPR gene editing efficiency in human cells"
    ```
    
    The server returns structured results including:
    - **Title, authors, journal, year, DOI**
    - **Methods**: Experimental techniques, models, protocols
    - **Results**: Key findings with quantitative data
    - **Sample sizes**: Number of subjects/samples
    - **Quality scores**: Study quality assessments
    - **Conclusions**: Author conclusions and implications
    
    ## Pricing
    
    - **Free tier**: 50 searches per network, no API key required
    - **Paid**: $0.01 per result with an API key from [bgpt.pro/mcp](https://bgpt.pro/mcp)
    
    
  • scientific-skills/aeon/SKILL.md (10587 bytes)
    ---
    name: aeon
    description: This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.
    license: BSD-3-Clause license
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # Aeon Time Series Machine Learning
    
    ## Overview
    
    Aeon is a scikit-learn compatible Python toolkit for time series machine learning. It provides state-of-the-art algorithms for classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search.
    
    ## When to Use This Skill
    
    Apply this skill when:
    - Classifying or predicting from time series data
    - Detecting anomalies or change points in temporal sequences
    - Clustering similar time series patterns
    - Forecasting future values
    - Finding repeated patterns (motifs) or unusual subsequences (discords)
    - Comparing time series with specialized distance metrics
    - Extracting features from temporal data
    
    ## Installation
    
    ```bash
    uv pip install aeon
    ```
    
    ## Core Capabilities
    
    ### 1. Time Series Classification
    
    Categorize time series into predefined classes. See `references/classification.md` for complete algorithm catalog.
    
    **Quick Start:**
    ```python
    from aeon.classification.convolution_based import RocketClassifier
    from aeon.datasets import load_classification
    
    # Load data
    X_train, y_train = load_classification("GunPoint", split="train")
    X_test, y_test = load_classification("GunPoint", split="test")
    
    # Train classifier
    clf = RocketClassifier(n_kernels=10000)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)
    ```
    
    **Algorithm Selection:**
    - **Speed + Performance**: `MiniRocketClassifier`, `Arsenal`
    - **Maximum Accuracy**: `HIVECOTEV2`, `InceptionTimeClassifier`
    - **Interpretability**: `ShapeletTransformClassifier`, `Catch22Classifier`
    - **Small Datasets**: `KNeighborsTimeSeriesClassifier` with DTW distance
    
    ### 2. Time Series Regression
    
    Predict continuous values from time series. See `references/regression.md` for algorithms.
    
    **Quick Start:**
    ```python
    from aeon.regression.convolution_based import RocketRegressor
    from aeon.datasets import load_regression
    
    X_train, y_train = load_regression("Covid3Month", split="train")
    X_test, y_test = load_regression("Covid3Month", split="test")
    
    reg = RocketRegressor()
    reg.fit(X_train, y_train)
    predictions = reg.predict(X_test)
    ```
    
    ### 3. Time Series Clustering
    
    Group similar time series without labels. See `references/clustering.md` for methods.
    
    **Quick Start:**
    ```python
    from aeon.clustering import TimeSeriesKMeans
    
    clusterer = TimeSeriesKMeans(
        n_clusters=3,
        distance="dtw",
        averaging_method="ba"
    )
    labels = clusterer.fit_predict(X_train)
    centers = clusterer.cluster_centers_
    ```
    
    ### 4. Forecasting
    
    Predict future time series values. See `references/forecasting.md` for forecasters.
    
    **Quick Start:**
    ```python
    from aeon.forecasting.arima import ARIMA
    
    forecaster = ARIMA(order=(1, 1, 1))
    forecaster.fit(y_train)
    y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
    ```
    
    ### 5. Anomaly Detection
    
    Identify unusual patterns or outliers. See `references/anomaly_detection.md` for detectors.
    
    **Quick Start:**
    ```python
    import numpy as np
    from aeon.anomaly_detection import STOMP
    
    detector = STOMP(window_size=50)
    anomaly_scores = detector.fit_predict(y)
    
    # Higher scores indicate anomalies
    threshold = np.percentile(anomaly_scores, 95)
    anomalies = anomaly_scores > threshold
    ```
    
    ### 6. Segmentation
    
    Partition time series into regions with change points. See `references/segmentation.md`.
    
    **Quick Start:**
    ```python
    from aeon.segmentation import ClaSPSegmenter
    
    segmenter = ClaSPSegmenter()
    change_points = segmenter.fit_predict(y)
    ```
    
    ### 7. Similarity Search
    
    Find similar patterns within or across time series. See `references/similarity_search.md`.
    
    **Quick Start:**
    ```python
    from aeon.similarity_search import StompMotif
    
    # Find recurring patterns
    motif_finder = StompMotif(window_size=50, k=3)
    motifs = motif_finder.fit_predict(y)
    ```
    
    ## Feature Extraction and Transformations
    
    Transform time series for feature engineering. See `references/transformations.md`.
    
    **ROCKET Features:**
    ```python
    from aeon.transformations.collection.convolution_based import Rocket
    
    rocket = Rocket()
    X_features = rocket.fit_transform(X_train)
    
    # Use features with any sklearn classifier
    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier()
    clf.fit(X_features, y_train)
    ```
    
    **Statistical Features:**
    ```python
    from aeon.transformations.collection.feature_based import Catch22
    
    catch22 = Catch22()
    X_features = catch22.fit_transform(X_train)
    ```
    
    **Preprocessing:**
    ```python
    from aeon.transformations.collection import MinMaxScaler, Normalizer
    
    scaler = Normalizer()  # z-normalization; MinMaxScaler rescales to [0, 1] instead
    X_normalized = scaler.fit_transform(X_train)
    ```
    
    ## Distance Metrics
    
    Specialized temporal distance measures. See `references/distances.md` for the complete catalog.
    
    **Usage:**
    ```python
    import numpy as np
    from aeon.distances import dtw_distance, dtw_pairwise_distance
    
    x = np.random.rand(1, 100)  # one channel, 100 timepoints
    y = np.random.rand(1, 100)
    
    # Single distance
    distance = dtw_distance(x, y, window=0.1)
    
    # Pairwise distances
    distance_matrix = dtw_pairwise_distance(X_train)
    
    # Use with classifiers
    from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
    
    clf = KNeighborsTimeSeriesClassifier(
        n_neighbors=5,
        distance="dtw",
        distance_params={"window": 0.2}
    )
    ```
    
    **Available Distances:**
    - **Elastic**: DTW, DDTW, WDTW, ERP, EDR, LCSS, TWE, MSM
    - **Lock-step**: Euclidean, Manhattan, Minkowski
    - **Shape-based**: Shape DTW, SBD
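    
    A small sketch contrasting a lock-step and an elastic measure on a phase-shifted pair (the signal is illustrative; DTW should report a smaller distance because it warps time to realign the shift):
    
    ```python
    import numpy as np
    from aeon.distances import dtw_distance, euclidean_distance
    
    t = np.linspace(0, 4 * np.pi, 100)
    x = np.sin(t)
    y = np.sin(t + 0.5)  # phase-shifted copy of x
    
    print("euclidean:", euclidean_distance(x, y))  # penalizes the misalignment
    print("dtw:", dtw_distance(x, y))              # realigns, so typically smaller
    ```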
    
    ## Deep Learning Networks
    
    Neural architectures for time series. See `references/networks.md`.
    
    **Architectures:**
    - Convolutional: `FCNClassifier`, `ResNetClassifier`, `InceptionTimeClassifier`
    - Recurrent and temporal-convolutional: `RecurrentNetwork`, `TCNNetwork`
    - Autoencoders: `AEFCNClusterer`, `AEResNetClusterer`
    
    **Usage:**
    ```python
    from aeon.classification.deep_learning import InceptionTimeClassifier
    
    clf = InceptionTimeClassifier(n_epochs=100, batch_size=32)
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    ```
    
    ## Datasets and Benchmarking
    
    Load standard benchmarks and evaluate performance. See `references/datasets_benchmarking.md`.
    
    **Load Datasets:**
    ```python
    from aeon.datasets import load_classification, load_regression
    
    # Classification
    X_train, y_train = load_classification("ArrowHead", split="train")
    
    # Regression
    X_train, y_train = load_regression("Covid3Month", split="train")
    ```
    
    **Benchmarking:**
    ```python
    from aeon.benchmarking import get_estimator_results
    
    # Compare with published results
    published = get_estimator_results("ROCKET", "GunPoint")
    ```
    
    ## Common Workflows
    
    ### Classification Pipeline
    
    ```python
    from aeon.transformations.collection import Normalizer
    from aeon.classification.convolution_based import RocketClassifier
    from sklearn.pipeline import Pipeline
    
    pipeline = Pipeline([
        ('normalize', Normalizer()),
        ('classify', RocketClassifier())
    ])
    
    pipeline.fit(X_train, y_train)
    accuracy = pipeline.score(X_test, y_test)
    ```
    
    ### Feature Extraction + Traditional ML
    
    ```python
    from aeon.transformations.collection.convolution_based import Rocket
    from sklearn.ensemble import GradientBoostingClassifier
    
    # Extract features
    rocket = Rocket()
    X_train_features = rocket.fit_transform(X_train)
    X_test_features = rocket.transform(X_test)
    
    # Train traditional ML
    clf = GradientBoostingClassifier()
    clf.fit(X_train_features, y_train)
    predictions = clf.predict(X_test_features)
    ```
    
    ### Anomaly Detection with Visualization
    
    ```python
    import matplotlib.pyplot as plt
    import numpy as np
    from aeon.anomaly_detection import STOMP
    
    detector = STOMP(window_size=50)
    scores = detector.fit_predict(y)
    
    plt.figure(figsize=(15, 5))
    plt.subplot(2, 1, 1)
    plt.plot(y, label='Time Series')
    plt.subplot(2, 1, 2)
    plt.plot(scores, label='Anomaly Scores', color='red')
    plt.axhline(np.percentile(scores, 95), color='k', linestyle='--')
    plt.show()
    ```
    
    ## Best Practices
    
    ### Data Preparation
    
    1. **Normalize**: Most algorithms benefit from z-normalization
       ```python
       from aeon.transformations.collection import Normalizer
       normalizer = Normalizer()
       X_train = normalizer.fit_transform(X_train)
       X_test = normalizer.transform(X_test)
       ```
    
    2. **Handle Missing Values**: Impute before analysis
       ```python
       from aeon.transformations.collection import SimpleImputer
       imputer = SimpleImputer(strategy='mean')
       X_train = imputer.fit_transform(X_train)
       ```
    
    3. **Check Data Format**: Aeon expects shape `(n_samples, n_channels, n_timepoints)`
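       A minimal reshape sketch (shapes are illustrative), assuming a 2-D array of univariate series:
       ```python
       import numpy as np
       X_2d = np.random.rand(40, 150)   # (n_samples, n_timepoints)
       X_3d = X_2d[:, np.newaxis, :]    # (n_samples, 1, n_timepoints), as aeon expects
       ```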
    
    ### Model Selection
    
    1. **Start Simple**: Begin with ROCKET variants before deep learning
    2. **Use Validation**: Split training data for hyperparameter tuning
    3. **Compare Baselines**: Test against simple methods (1-NN Euclidean, Naive)
    4. **Consider Resources**: ROCKET for speed, deep learning if GPU available
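    
    A small sketch of point 2, holding out part of the training set to tune `n_kernels` (the candidate values are illustrative, not recommendations):
    
    ```python
    from sklearn.model_selection import train_test_split
    from aeon.classification.convolution_based import RocketClassifier
    
    X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)
    
    for n_kernels in (1000, 5000, 10000):
        clf = RocketClassifier(n_kernels=n_kernels)
        clf.fit(X_tr, y_tr)
        print(n_kernels, clf.score(X_val, y_val))
    ```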
    
    ### Algorithm Selection Guide
    
    **For Fast Prototyping:**
    - Classification: `MiniRocketClassifier`
    - Regression: `MiniRocketRegressor`
    - Clustering: `TimeSeriesKMeans` with Euclidean
    
    **For Maximum Accuracy:**
    - Classification: `HIVECOTEV2`, `InceptionTimeClassifier`
    - Regression: `InceptionTimeRegressor`
    - Forecasting: `ARIMA`, `TCNForecaster`
    
    **For Interpretability:**
    - Classification: `ShapeletTransformClassifier`, `Catch22Classifier`
    - Features: `Catch22`, `TSFresh`
    
    **For Small Datasets:**
    - Distance-based: `KNeighborsTimeSeriesClassifier` with DTW
    - Avoid: Deep learning (requires large data)
    
    ## Reference Documentation
    
    Detailed information available in `references/`:
    - `classification.md` - All classification algorithms
    - `regression.md` - Regression methods
    - `clustering.md` - Clustering algorithms
    - `forecasting.md` - Forecasting approaches
    - `anomaly_detection.md` - Anomaly detection methods
    - `segmentation.md` - Segmentation algorithms
    - `similarity_search.md` - Pattern matching and motif discovery
    - `transformations.md` - Feature extraction and preprocessing
    - `distances.md` - Time series distance metrics
    - `networks.md` - Deep learning architectures
    - `datasets_benchmarking.md` - Data loading and evaluation tools
    
    ## Additional Resources
    
    - Documentation: https://www.aeon-toolkit.org/
    - GitHub: https://github.com/aeon-toolkit/aeon
    - Examples: https://www.aeon-toolkit.org/en/stable/examples.html
    - API Reference: https://www.aeon-toolkit.org/en/stable/api_reference.html
    
    
  • scientific-skills/anndata/SKILL.mdskill
    Show content (10214 bytes)
    ---
    name: anndata
    description: Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem. This is the data format skill—for analysis workflows use scanpy; for probabilistic models use scvi-tools; for population-scale queries use cellxgene-census.
    license: BSD-3-Clause
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # AnnData
    
    ## Overview
    
    AnnData is a Python package for handling annotated data matrices, storing experimental measurements (X) alongside observation metadata (obs), variable metadata (var), and multi-dimensional annotations (obsm, varm, obsp, varp, uns). Originally designed for single-cell genomics through Scanpy, it now serves as a general-purpose framework for any annotated data requiring efficient storage, manipulation, and analysis.
    
    ## When to Use This Skill
    
    Use this skill when:
    - Creating, reading, or writing AnnData objects
    - Working with h5ad, zarr, or other genomics data formats
    - Performing single-cell RNA-seq analysis
    - Managing large datasets with sparse matrices or backed mode
    - Concatenating multiple datasets or experimental batches
    - Subsetting, filtering, or transforming annotated data
    - Integrating with scanpy, scvi-tools, or other scverse ecosystem tools
    
    ## Installation
    
    ```bash
    uv pip install anndata
    
    # With optional dependencies
    uv pip install anndata[dev,test,doc]
    ```
    
    ## Quick Start
    
    ### Creating an AnnData object
    ```python
    import anndata as ad
    import numpy as np
    import pandas as pd
    
    # Minimal creation
    X = np.random.rand(100, 2000)  # 100 cells × 2000 genes
    adata = ad.AnnData(X)
    
    # With metadata
    obs = pd.DataFrame({
        'cell_type': ['T cell', 'B cell'] * 50,
        'sample': ['A', 'B'] * 50
    }, index=[f'cell_{i}' for i in range(100)])
    
    var = pd.DataFrame({
        'gene_name': [f'Gene_{i}' for i in range(2000)]
    }, index=[f'ENSG{i:05d}' for i in range(2000)])
    
    adata = ad.AnnData(X=X, obs=obs, var=var)
    ```
    
    ### Reading data
    ```python
    # Read h5ad file
    adata = ad.read_h5ad('data.h5ad')
    
    # Read with backed mode (for large files)
    adata = ad.read_h5ad('large_data.h5ad', backed='r')
    
    # Read other formats
    adata = ad.read_csv('data.csv')
    adata = ad.read_loom('data.loom')
    
    # 10X HDF5 files are read via scanpy (anndata has no read_10x_h5)
    import scanpy as sc
    adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
    ```
    
    ### Writing data
    ```python
    # Write h5ad file
    adata.write_h5ad('output.h5ad')
    
    # Write with compression
    adata.write_h5ad('output.h5ad', compression='gzip')
    
    # Write other formats
    adata.write_zarr('output.zarr')
    adata.write_csvs('output_dir/')
    ```
    
    ### Basic operations
    ```python
    # Subset by conditions
    t_cells = adata[adata.obs['cell_type'] == 'T cell']
    
    # Subset by indices
    subset = adata[0:50, 0:100]
    
    # Add metadata
    adata.obs['quality_score'] = np.random.rand(adata.n_obs)
    adata.var['highly_variable'] = np.random.rand(adata.n_vars) > 0.8
    
    # Access dimensions
    print(f"{adata.n_obs} observations × {adata.n_vars} variables")
    ```
    
    ## Core Capabilities
    
    ### 1. Data Structure
    
    Understand the AnnData object structure including X, obs, var, layers, obsm, varm, obsp, varp, uns, and raw components.
    
    **See**: `references/data_structure.md` for comprehensive information on:
    - Core components (X, obs, var, layers, obsm, varm, obsp, varp, uns, raw)
    - Creating AnnData objects from various sources
    - Accessing and manipulating data components
    - Memory-efficient practices
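    
    A short sketch of how these slots fit together (shapes and key names are illustrative):
    
    ```python
    import anndata as ad
    import numpy as np
    
    adata = ad.AnnData(np.random.rand(100, 2000))
    adata.layers['counts'] = np.random.poisson(1.0, adata.shape)  # same shape as X
    adata.obsm['X_pca'] = np.random.rand(adata.n_obs, 50)         # per-observation matrix
    adata.varm['loadings'] = np.random.rand(adata.n_vars, 50)     # per-variable matrix
    adata.uns['params'] = {'normalized': False}                   # unstructured metadata
    ```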
    
    ### 2. Input/Output Operations
    
    Read and write data in various formats with support for compression, backed mode, and cloud storage.
    
    **See**: `references/io_operations.md` for details on:
    - Native formats (h5ad, zarr)
    - Alternative formats (CSV, MTX, Loom, 10X, Excel)
    - Backed mode for large datasets
    - Remote data access
    - Format conversion
    - Performance optimization
    
    Common commands:
    ```python
    # Read/write h5ad
    adata = ad.read_h5ad('data.h5ad', backed='r')
    adata.write_h5ad('output.h5ad', compression='gzip')
    
    # Read 10X data (via scanpy)
    import scanpy as sc
    adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
    
    # Read MTX format
    adata = ad.read_mtx('matrix.mtx').T
    ```
    
    ### 3. Concatenation
    
    Combine multiple AnnData objects along observations or variables with flexible join strategies.
    
    **See**: `references/concatenation.md` for comprehensive coverage of:
    - Basic concatenation (axis=0 for observations, axis=1 for variables)
    - Join types (inner, outer)
    - Merge strategies (same, unique, first, only)
    - Tracking data sources with labels
    - Lazy concatenation (AnnCollection)
    - On-disk concatenation for large datasets
    
    Common commands:
    ```python
    # Concatenate observations (combine samples)
    adata = ad.concat(
        [adata1, adata2, adata3],
        axis=0,
        join='inner',
        label='batch',
        keys=['batch1', 'batch2', 'batch3']
    )
    
    # Concatenate variables (combine modalities)
    adata = ad.concat([adata_rna, adata_protein], axis=1)
    
    # Lazy concatenation
    from anndata.experimental import AnnCollection
    collection = AnnCollection(
        ['data1.h5ad', 'data2.h5ad'],
        join_obs='outer',
        label='dataset'
    )
    ```
    
    ### 4. Data Manipulation
    
    Transform, subset, filter, and reorganize data efficiently.
    
    **See**: `references/manipulation.md` for detailed guidance on:
    - Subsetting (by indices, names, boolean masks, metadata conditions)
    - Transposition
    - Copying (full copies vs views)
    - Renaming (observations, variables, categories)
    - Type conversions (strings to categoricals, sparse/dense)
    - Adding/removing data components
    - Reordering
    - Quality control filtering
    
    Common commands:
    ```python
    # Subset by metadata
    filtered = adata[adata.obs['quality_score'] > 0.8]
    hv_genes = adata[:, adata.var['highly_variable']]
    
    # Transpose
    adata_T = adata.T
    
    # Copy vs view
    view = adata[0:100, :]  # View (lightweight reference)
    copy = adata[0:100, :].copy()  # Independent copy
    
    # Convert strings to categoricals
    adata.strings_to_categoricals()
    ```
    
    ### 5. Best Practices
    
    Follow recommended patterns for memory efficiency, performance, and reproducibility.
    
    **See**: `references/best_practices.md` for guidelines on:
    - Memory management (sparse matrices, categoricals, backed mode)
    - Views vs copies
    - Data storage optimization
    - Performance optimization
    - Working with raw data
    - Metadata management
    - Reproducibility
    - Error handling
    - Integration with other tools
    - Common pitfalls and solutions
    
    Key recommendations:
    ```python
    # Use sparse matrices for sparse data
    from scipy.sparse import csr_matrix
    adata.X = csr_matrix(adata.X)
    
    # Convert strings to categoricals
    adata.strings_to_categoricals()
    
    # Use backed mode for large files
    adata = ad.read_h5ad('large.h5ad', backed='r')
    
    # Store raw before filtering
    adata.raw = adata.copy()
    adata = adata[:, adata.var['highly_variable']]
    ```
    
    ## Integration with Scverse Ecosystem
    
    AnnData serves as the foundational data structure for the scverse ecosystem:
    
    ### Scanpy (Single-cell analysis)
    ```python
    import scanpy as sc
    
    # Preprocessing
    sc.pp.filter_cells(adata, min_genes=200)
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)
    
    # Dimensionality reduction
    sc.pp.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_neighbors=15)
    sc.tl.umap(adata)
    sc.tl.leiden(adata)
    
    # Visualization
    sc.pl.umap(adata, color=['cell_type', 'leiden'])
    ```
    
    ### Muon (Multimodal data)
    ```python
    import muon as mu
    
    # Combine RNA and protein data
    mdata = mu.MuData({'rna': adata_rna, 'protein': adata_protein})
    ```
    
    ### PyTorch integration
    ```python
    from anndata.experimental import AnnLoader
    
    # Create DataLoader for deep learning
    dataloader = AnnLoader(adata, batch_size=128, shuffle=True)
    
    for batch in dataloader:
        X = batch.X
        # Train model
    ```
    
    ## Common Workflows
    
    ### Single-cell RNA-seq analysis
    ```python
    import anndata as ad
    import numpy as np
    import scanpy as sc
    
    # 1. Load data (10X readers live in scanpy)
    adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
    
    # 2. Quality control (asarray/ravel handles sparse-matrix sums)
    adata.obs['n_genes'] = np.asarray((adata.X > 0).sum(axis=1)).ravel()
    adata.obs['n_counts'] = np.asarray(adata.X.sum(axis=1)).ravel()
    adata = adata[adata.obs['n_genes'] > 200]
    adata = adata[adata.obs['n_counts'] < 50000]
    
    # 3. Store raw
    adata.raw = adata.copy()
    
    # 4. Normalize and filter
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)
    adata = adata[:, adata.var['highly_variable']]
    
    # 5. Save processed data
    adata.write_h5ad('processed.h5ad')
    ```
    
    ### Batch integration
    ```python
    # Load multiple batches
    adata1 = ad.read_h5ad('batch1.h5ad')
    adata2 = ad.read_h5ad('batch2.h5ad')
    adata3 = ad.read_h5ad('batch3.h5ad')
    
    # Concatenate with batch labels
    adata = ad.concat(
        [adata1, adata2, adata3],
        label='batch',
        keys=['batch1', 'batch2', 'batch3'],
        join='inner'
    )
    
    # Apply batch correction
    import scanpy as sc
    sc.pp.combat(adata, key='batch')
    
    # Continue analysis
    sc.pp.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    ```
    
    ### Working with large datasets
    ```python
    # Open in backed mode
    adata = ad.read_h5ad('100GB_dataset.h5ad', backed='r')
    
    # Filter based on metadata (no data loading)
    high_quality = adata[adata.obs['quality_score'] > 0.8]
    
    # Load filtered subset
    adata_subset = high_quality.to_memory()
    
    # Process subset
    process(adata_subset)
    
    # Or process in chunks
    chunk_size = 1000
    for i in range(0, adata.n_obs, chunk_size):
        chunk = adata[i:i+chunk_size, :].to_memory()
        process(chunk)
    ```
    
    ## Troubleshooting
    
    ### Out of memory errors
    Use backed mode or convert to sparse matrices:
    ```python
    # Backed mode
    adata = ad.read_h5ad('file.h5ad', backed='r')
    
    # Sparse matrices
    from scipy.sparse import csr_matrix
    adata.X = csr_matrix(adata.X)
    ```
    
    ### Slow file reading
    Use compression and appropriate formats:
    ```python
    # Optimize for storage
    adata.strings_to_categoricals()
    adata.write_h5ad('file.h5ad', compression='gzip')
    
    # Use Zarr for cloud storage
    adata.write_zarr('file.zarr', chunks=(1000, 1000))
    ```
    
    ### Index alignment issues
    Always align external data on index:
    ```python
    # Wrong
    adata.obs['new_col'] = external_data['values']
    
    # Correct
    adata.obs['new_col'] = external_data.set_index('cell_id').loc[adata.obs_names, 'values']
    ```
    
    ## Additional Resources
    
    - **Official documentation**: https://anndata.readthedocs.io/
    - **Scanpy tutorials**: https://scanpy.readthedocs.io/
    - **Scverse ecosystem**: https://scverse.org/
    - **GitHub repository**: https://github.com/scverse/anndata
    
    
  • scientific-skills/arboreto/SKILL.mdskill
    Show content (6929 bytes)
    ---
    name: arboreto
    description: Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets.
    license: BSD-3-Clause
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # Arboreto
    
    ## Overview
    
    Arboreto is a computational library for inferring gene regulatory networks (GRNs) from gene expression data using parallelized algorithms that scale from single machines to multi-node clusters.
    
    **Core capability**: Identify which transcription factors (TFs) regulate which target genes based on expression patterns across observations (cells, samples, conditions).
    
    ## Quick Start
    
    Install arboreto:
    ```bash
    uv pip install arboreto
    ```
    
    Basic GRN inference:
    ```python
    import pandas as pd
    from arboreto.algo import grnboost2
    
    if __name__ == '__main__':
        # Load expression data (genes as columns)
        expression_matrix = pd.read_csv('expression_data.tsv', sep='\t')
    
        # Infer regulatory network
        network = grnboost2(expression_data=expression_matrix)
    
        # Save results (TF, target, importance)
        network.to_csv('network.tsv', sep='\t', index=False, header=False)
    ```
    
    **Critical**: Always use `if __name__ == '__main__':` guard because Dask spawns new processes.
    
    ## Core Capabilities
    
    ### 1. Basic GRN Inference
    
    For standard GRN inference workflows including:
    - Input data preparation (Pandas DataFrame or NumPy array)
    - Running inference with GRNBoost2 or GENIE3
    - Filtering by transcription factors
    - Output format and interpretation
    
    **See**: `references/basic_inference.md`
    
    **Use the ready-to-run script**: `scripts/basic_grn_inference.py` for standard inference tasks:
    ```bash
    python scripts/basic_grn_inference.py expression_data.tsv output_network.tsv --tf-file tfs.txt --seed 777
    ```
    
    ### 2. Algorithm Selection
    
    Arboreto provides two algorithms:
    
    **GRNBoost2 (Recommended)**:
    - Fast gradient boosting-based inference
    - Optimized for large datasets (10k+ observations)
    - Default choice for most analyses
    
    **GENIE3**:
    - Random Forest-based inference
    - Original multiple regression approach
    - Use for comparison or validation
    
    Quick comparison:
    ```python
    from arboreto.algo import grnboost2, genie3
    
    # Fast, recommended
    network_grnboost = grnboost2(expression_data=matrix)
    
    # Classic algorithm
    network_genie3 = genie3(expression_data=matrix)
    ```
    
    **For detailed algorithm comparison, parameters, and selection guidance**: `references/algorithms.md`
    
    ### 3. Distributed Computing
    
    Scale inference from local multi-core to cluster environments:
    
    **Local (default)** - Uses all available cores automatically:
    ```python
    network = grnboost2(expression_data=matrix)
    ```
    
    **Custom local client** - Control resources:
    ```python
    from distributed import LocalCluster, Client
    
    local_cluster = LocalCluster(n_workers=10, memory_limit='8GB')
    client = Client(local_cluster)
    
    network = grnboost2(expression_data=matrix, client_or_address=client)
    
    client.close()
    local_cluster.close()
    ```
    
    **Cluster computing** - Connect to remote Dask scheduler:
    ```python
    from distributed import Client
    
    client = Client('tcp://scheduler:8786')
    network = grnboost2(expression_data=matrix, client_or_address=client)
    ```
    
    **For cluster setup, performance optimization, and large-scale workflows**: `references/distributed_computing.md`
    
    ## Installation
    
    ```bash
    uv pip install arboreto
    ```
    
    **Dependencies**: scipy, scikit-learn, numpy, pandas, dask, distributed
    
    ## Common Use Cases
    
    ### Single-Cell RNA-seq Analysis
    ```python
    import pandas as pd
    from arboreto.algo import grnboost2
    
    if __name__ == '__main__':
        # Load single-cell expression matrix (cells x genes)
        sc_data = pd.read_csv('scrna_counts.tsv', sep='\t')
    
        # Infer cell-type-specific regulatory network
        network = grnboost2(expression_data=sc_data, seed=42)
    
        # Filter high-confidence links
        high_confidence = network[network['importance'] > 0.5]
        high_confidence.to_csv('grn_high_confidence.tsv', sep='\t', index=False)
    ```
    
    ### Bulk RNA-seq with TF Filtering
    ```python
    import pandas as pd
    from arboreto.utils import load_tf_names
    from arboreto.algo import grnboost2
    
    if __name__ == '__main__':
        # Load data
        expression_data = pd.read_csv('rnaseq_tpm.tsv', sep='\t')
        tf_names = load_tf_names('human_tfs.txt')
    
        # Infer with TF restriction
        network = grnboost2(
            expression_data=expression_data,
            tf_names=tf_names,
            seed=123
        )
    
        network.to_csv('tf_target_network.tsv', sep='\t', index=False)
    ```
    
    ### Comparative Analysis (Multiple Conditions)
    ```python
    import pandas as pd
    from arboreto.algo import grnboost2
    
    if __name__ == '__main__':
        # Infer networks for different conditions
        conditions = ['control', 'treatment_24h', 'treatment_48h']
    
        for condition in conditions:
            data = pd.read_csv(f'{condition}_expression.tsv', sep='\t')
            network = grnboost2(expression_data=data, seed=42)
            network.to_csv(f'{condition}_network.tsv', sep='\t', index=False)
    ```
    
    ## Output Interpretation
    
    Arboreto returns a DataFrame with regulatory links:
    
    | Column | Description |
    |--------|-------------|
    | `TF` | Transcription factor (regulator) |
    | `target` | Target gene |
    | `importance` | Regulatory importance score (higher = stronger) |
    
    **Filtering strategy**:
    - Top N links per target gene
    - Importance threshold (e.g., > 0.5)
    - Statistical significance testing (permutation tests)
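    
    A small pandas sketch of the first two strategies, assuming `network` is the DataFrame returned by `grnboost2` (the threshold and N are illustrative):
    
    ```python
    # Keep the 10 strongest regulators per target gene
    top_links = (network.sort_values('importance', ascending=False)
                        .groupby('target')
                        .head(10))
    
    # Or keep links above an importance threshold
    strong_links = network[network['importance'] > 0.5]
    ```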
    
    ## Integration with pySCENIC
    
    Arboreto is a core component of the SCENIC pipeline for single-cell regulatory network analysis:
    
    ```python
    # Step 1: Use arboreto for GRN inference
    from arboreto.algo import grnboost2
    network = grnboost2(expression_data=sc_data, tf_names=tf_list)
    
    # Step 2: Use pySCENIC for regulon identification and activity scoring
    # (See pySCENIC documentation for downstream analysis)
    ```
    
    ## Reproducibility
    
    Always set a seed for reproducible results:
    ```python
    network = grnboost2(expression_data=matrix, seed=777)
    ```
    
    Run multiple seeds for robustness analysis:
    ```python
    from arboreto.algo import grnboost2
    from distributed import LocalCluster, Client
    
    if __name__ == '__main__':
        client = Client(LocalCluster())
    
        seeds = [42, 123, 777]
        networks = []
    
        for seed in seeds:
            net = grnboost2(expression_data=matrix, client_or_address=client, seed=seed)
            networks.append(net)
    
        # Combine networks and keep consensus links
        # (analyze_consensus is a placeholder for your own aggregation logic;
        # `matrix` is your expression DataFrame from earlier)
        consensus = analyze_consensus(networks)
    ```
    
    ## Troubleshooting
    
    **Memory errors**: Reduce dataset size by filtering low-variance genes or use distributed computing
    
    **Slow performance**: Use GRNBoost2 instead of GENIE3, enable distributed client, filter TF list
    
    **Dask errors**: Ensure `if __name__ == '__main__':` guard is present in scripts
    
    **Empty results**: Check data format (genes as columns), verify TF names match gene names
    
    

README

Scientific Agent Skills

🔔 Claude Scientific Skills is now Scientific Agent Skills. Same skills, broader compatibility — now works with any AI agent that supports the open Agent Skills standard, not just Claude.

New: K-Dense BYOK — A free, open-source AI co-scientist that runs on your desktop, powered by Scientific Agent Skills. Bring your own API keys, pick from 40+ models, and get a full research workspace with web search, file handling, 100+ scientific databases, and access to all 135 skills in this repo. Your data stays on your computer, and you can optionally scale to cloud compute via Modal for heavy workloads. Get started here.

License: MIT

A comprehensive collection of 135 ready-to-use scientific and research skills (covering cancer genomics, drug-target binding, molecular dynamics, RNA velocity, geospatial science, time series forecasting, scientific ML resource discovery via Hugging Science, 78+ scientific databases, and more) for any AI agent that supports the open Agent Skills standard, created by K-Dense. Works with Cursor, Claude Code, Codex, and more. Transform your AI agent into a research assistant capable of executing complex multi-step scientific workflows across biology, chemistry, medicine, and beyond.


These skills enable your AI agent to seamlessly work with specialized scientific libraries, databases, and tools across multiple scientific domains. While the agent can use any Python package or API on its own, these explicitly defined skills provide curated documentation and examples that make it significantly stronger and more reliable for the workflows below:

  • 🧬 Bioinformatics & Genomics - Sequence analysis, single-cell RNA-seq, gene regulatory networks, variant annotation, phylogenetic analysis
  • 🧪 Cheminformatics & Drug Discovery - Molecular property prediction, virtual screening, ADMET analysis, molecular docking, lead optimization
  • 🔬 Proteomics & Mass Spectrometry - LC-MS/MS processing, peptide identification, spectral matching, protein quantification
  • 🏥 Clinical Research & Precision Medicine - Clinical trials, pharmacogenomics, variant interpretation, drug safety, clinical decision support, treatment planning
  • 🧠 Healthcare AI & Clinical ML - EHR analysis, physiological signal processing, medical imaging, clinical prediction models
  • 🖼️ Medical Imaging & Digital Pathology - DICOM processing, whole slide image analysis, computational pathology, radiology workflows
  • 🤖 Machine Learning & AI - Deep learning, reinforcement learning, time series analysis, model interpretability, Bayesian methods
  • 🔮 Materials Science & Chemistry - Crystal structure analysis, phase diagrams, metabolic modeling, computational chemistry
  • 🌌 Physics & Astronomy - Astronomical data analysis, coordinate transformations, cosmological calculations, symbolic mathematics, physics computations
  • ⚙️ Engineering & Simulation - Discrete-event simulation, multi-objective optimization, metabolic engineering, systems modeling, process optimization
  • 📊 Data Analysis & Visualization - Statistical analysis, network analysis, time series, publication-quality figures, large-scale data processing, EDA
  • 🌍 Geospatial Science & Remote Sensing - Satellite imagery processing, GIS analysis, spatial statistics, terrain analysis, machine learning for Earth observation
  • 🧪 Laboratory Automation - Liquid handling protocols, lab equipment control, workflow automation, LIMS integration
  • 📚 Scientific Communication - Literature review, peer review, scientific writing, document processing, posters, slides, schematics, citation management
  • 🔬 Multi-omics & Systems Biology - Multi-modal data integration, pathway analysis, network biology, systems-level insights
  • 🧬 Protein Engineering & Design - Protein language models, structure prediction, sequence design, function annotation
  • 🎓 Research Methodology - Hypothesis generation, scientific brainstorming, critical thinking, grant writing, scholar evaluation

Transform your AI coding agent into an 'AI Scientist' on your desktop!

If you find this repository useful, please consider giving it a star! It helps others discover these tools and encourages us to continue maintaining and expanding this collection.

🎬 New to Scientific Agent Skills? Watch our Getting Started with Scientific Agent Skills video for a quick walkthrough.


📦 What's Included

This repository provides 135 scientific and research skills organized into the following categories:

  • 100+ Scientific & Financial Databases - A unified database-lookup skill provides direct access to 78 public databases (PubChem, ChEMBL, UniProt, COSMIC, ClinicalTrials.gov, FRED, USPTO, and more), plus dedicated skills for DepMap, Imaging Data Commons, PrimeKG, U.S. Treasury Fiscal Data, and Hugging Science (curated catalog of scientific datasets, models, and demos across 17 scientific domains on Hugging Face). Multi-database packages like BioServices (~40 bioinformatics services), BioPython (38 NCBI sub-databases via Entrez), and gget (20+ genomics databases) add further coverage
  • 70+ Optimized Python Package Skills - Explicitly defined skills for RDKit, Scanpy, PyTorch Lightning, scikit-learn, BioPython, pyzotero, BioServices, PennyLane, Qiskit, OpenMM, MDAnalysis, scVelo, TimesFM, and others — with curated documentation, examples, and best practices. Note: the agent can write code using any Python package, not just these; these skills simply provide stronger, more reliable performance for the packages listed
  • 9 Scientific Integration Skills - Explicitly defined skills for Benchling, DNAnexus, LatchBio, OMERO, Protocols.io, Open Notebook, and more. Again, the agent is not limited to these — any API or platform reachable from Python is fair game; these skills are the optimized, pre-documented paths
  • 30+ Analysis & Communication Tools - Literature review, scientific writing, peer review, document processing, posters, slides, schematics, infographics, Mermaid diagrams, and more
  • 10+ Research & Clinical Tools - Hypothesis generation, grant writing, clinical decision support, treatment plans, regulatory compliance, scenario analysis

Each skill includes:

  • ✅ Comprehensive documentation (SKILL.md)
  • ✅ Practical code examples
  • ✅ Use cases and best practices
  • ✅ Integration guides
  • ✅ Reference materials


🚀 Why Use This?

Accelerate Your Research

  • Save Days of Work - Skip API documentation research and integration setup
  • Production-Ready Code - Tested, validated examples following scientific best practices
  • Multi-Step Workflows - Execute complex pipelines with a single prompt

🎯 Comprehensive Coverage

  • 135 Skills - Extensive coverage across all major scientific domains
  • 100+ Databases - Unified access to 78+ databases via database-lookup, plus dedicated data access skills and multi-database packages like BioServices, BioPython, and gget
  • 70+ Optimized Python Package Skills - RDKit, Scanpy, PyTorch Lightning, scikit-learn, BioServices, PennyLane, Qiskit, OpenMM, scVelo, TimesFM, and others (the agent can use any Python package; these are the pre-documented, higher-performing paths)

🔧 Easy Integration

  • Simple Setup - Copy skills to your skills directory and start working
  • Automatic Discovery - Your agent automatically finds and uses relevant skills
  • Well Documented - Each skill includes examples, use cases, and best practices

🌟 Maintained & Supported

  • Regular Updates - Continuously maintained and expanded by K-Dense team
  • Community Driven - Open source with active community contributions
  • Enterprise Ready - Commercial support available for advanced needs

🎯 Getting Started

Option 1: npx (all platforms)

Install Scientific Agent Skills with a single command:

npx skills add K-Dense-AI/scientific-agent-skills

This is the official standard approach for installing Agent Skills across all platforms, including Claude Code, Claude Cowork, Codex, Gemini CLI, Cursor, and any other agent that supports the open Agent Skills standard.

Option 2: GitHub CLI (gh skill)

If you use the GitHub CLI (v2.90.0+), you can install skills with gh skill:

# Browse and install interactively
gh skill install K-Dense-AI/scientific-agent-skills

# Install a specific skill directly
gh skill install K-Dense-AI/scientific-agent-skills scanpy

# Target a specific agent host
gh skill install K-Dense-AI/scientific-agent-skills --agent cursor
gh skill install K-Dense-AI/scientific-agent-skills --agent claude-code
gh skill install K-Dense-AI/scientific-agent-skills --agent codex
gh skill install K-Dense-AI/scientific-agent-skills --agent gemini

gh skill automatically installs to the correct directory for your agent host and records provenance metadata for supply chain integrity.

Version pinning

Pin to a specific release tag or commit SHA for reproducible installs:

# Pin to a release tag
gh skill install K-Dense-AI/scientific-agent-skills --pin v1.0.0

# Pin to a commit SHA
gh skill install K-Dense-AI/scientific-agent-skills --pin abc123def

Keeping skills up to date

# Check for updates interactively
gh skill update

# Update all installed skills
gh skill update --all

That's it! Your AI agent will automatically discover the skills and use them when relevant to your scientific tasks. You can also invoke any skill manually by mentioning the skill name in your prompt.


⚠️ Security Disclaimer

Skills can execute code and influence your coding agent's behavior. Review what you install.

Agent Skills are powerful — they can instruct your AI agent to run arbitrary code, install packages, make network requests, and modify files on your system. A malicious or poorly written skill has the potential to steer your coding agent into harmful behavior.

We take security seriously. All contributions go through a review process, and we run LLM-based security scans (via Cisco AI Defense Skill Scanner) on every skill in this repository. However, as a small team with a growing number of community contributions, we cannot guarantee that every skill has been exhaustively reviewed for all possible risks.

It is ultimately your responsibility to review the skills you install and decide which ones to trust.

We recommend the following:

  • Do not install everything at once. Only install the skills you actually need for your work. While installing the full collection was reasonable when K-Dense created and maintained every skill, the repository now includes many community contributions that we may not have reviewed as thoroughly.
  • Read the SKILL.md before installing. Each skill's documentation describes what it does, what packages it uses, and what external services it connects to. If something looks suspicious, don't install it.
  • Check the contribution history. Skills authored by K-Dense (K-Dense-AI) have been through our internal review process. Community-contributed skills have been reviewed to the best of our ability, but with limited resources.
  • Run the security scanner yourself. Before installing third-party skills, scan them locally:
    uv pip install cisco-ai-skill-scanner
    skill-scanner scan /path/to/skill --use-behavioral
    
  • Report anything suspicious. If you find a skill that looks malicious or behaves unexpectedly, please open an issue immediately so we can investigate.

All skills are scanned on an approximately weekly basis, and SECURITY.md is updated with the latest results. We try to address security gaps as they arise.


❤️ Support the Open Source Community

Scientific Agent Skills is powered by 50+ incredible open source projects maintained by dedicated developers and research communities worldwide. Projects like Biopython, Scanpy, RDKit, scikit-learn, PyTorch Lightning, and many others form the foundation of these skills.

If you find value in this repository, please consider supporting the projects that make it possible:

  • Star their repositories on GitHub
  • 💰 Sponsor maintainers via GitHub Sponsors or NumFOCUS
  • 📝 Cite projects in your publications
  • 💻 Contribute code, docs, or bug reports

👉 View the full list of projects to support


⚙️ Prerequisites

  • Python: 3.11+ (3.12+ recommended for best compatibility)
  • uv: Python package manager (required for installing skill dependencies)
  • Client: Any agent that supports the Agent Skills standard (Cursor, Claude Code, Gemini CLI, Codex, etc.)
  • System: macOS, Linux, or Windows with WSL2
  • Dependencies: Automatically handled by individual skills (check SKILL.md files for specific requirements)

Installing uv

The skills use uv as the package manager for installing Python dependencies. Install it using the instructions for your operating system:

macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Alternative (via pip):

pip install uv

After installation, verify it works by running:

uv --version

For more installation options and details, visit the official uv documentation.


💡 Quick Examples

Once you've installed the skills, you can ask your AI agent to execute complex multi-step scientific workflows. Here are some example prompts:

🧪 Drug Discovery Pipeline

Goal: Find novel EGFR inhibitors for lung cancer treatment

Prompt:

Use available skills you have access to whenever possible. Query ChEMBL for EGFR inhibitors (IC50 < 50nM), analyze structure-activity relationships 
with RDKit, generate improved analogs with datamol, perform virtual screening with DiffDock 
against AlphaFold EGFR structure, search PubMed for resistance mechanisms, check COSMIC for 
mutations, and create visualizations and a comprehensive report.

Skills Used: ChEMBL, RDKit, datamol, DiffDock, AlphaFold DB, PubMed, COSMIC, scientific visualization

Need cloud GPUs and a publication-ready report at the end? Run this on K-Dense Web free.


🔬 Single-Cell RNA-seq Analysis

Goal: Comprehensive analysis of 10X Genomics data with public data integration

Prompt:

Use available skills you have access to whenever possible. Load 10X dataset with Scanpy, perform QC and doublet removal, integrate with Cellxgene 
Census data, identify cell types using NCBI Gene markers, run differential expression with 
PyDESeq2, infer gene regulatory networks with Arboreto, enrich pathways via Reactome/KEGG, 
and identify therapeutic targets with Open Targets.

Skills Used: Scanpy, Cellxgene Census, NCBI Gene, PyDESeq2, Arboreto, Reactome, KEGG, Open Targets

Want zero-setup cloud execution and shareable outputs? Try K-Dense Web free.


🧬 Multi-Omics Biomarker Discovery

Goal: Integrate RNA-seq, proteomics, and metabolomics to predict patient outcomes

Prompt:

Use available skills you have access to whenever possible. Analyze RNA-seq with PyDESeq2, process mass spec with pyOpenMS, integrate metabolites from 
HMDB/Metabolomics Workbench, map proteins to pathways (UniProt/KEGG), find interactions via 
STRING, correlate omics layers with statsmodels, build predictive model with scikit-learn, 
and search ClinicalTrials.gov for relevant trials.

Skills Used: PyDESeq2, pyOpenMS, HMDB, Metabolomics Workbench, UniProt, KEGG, STRING, statsmodels, scikit-learn, ClinicalTrials.gov

This pipeline is heavy on compute. Run it on K-Dense Web with cloud GPUs, free to start.


🎯 Virtual Screening Campaign

Goal: Discover allosteric modulators for protein-protein interactions

Prompt:

Use available skills you have access to whenever possible. Retrieve AlphaFold structures, identify interaction interface with BioPython, search ZINC 
for allosteric candidates (MW 300-500, logP 2-4), filter with RDKit, dock with DiffDock, 
rank with DeepChem, check PubChem suppliers, search USPTO patents, and optimize leads with 
MedChem/molfeat.

Skills Used: AlphaFold DB, BioPython, ZINC, RDKit, DiffDock, DeepChem, PubChem, USPTO, MedChem, molfeat

Skip the local GPU bottleneck. Run virtual screening on K-Dense Web free.


🏥 Clinical Variant Interpretation

Goal: Analyze VCF file for hereditary cancer risk assessment

Prompt:

Use available skills you have access to whenever possible. Parse VCF with pysam, annotate variants with Ensembl VEP, query ClinVar for pathogenicity, 
check COSMIC for cancer mutations, retrieve gene info from NCBI Gene, analyze protein impact 
with UniProt, search PubMed for case reports, check ClinPGx for pharmacogenomics, generate 
clinical report with document processing tools, and find matching trials on ClinicalTrials.gov.

Skills Used: pysam, Ensembl, ClinVar, COSMIC, NCBI Gene, UniProt, PubMed, ClinPGx, Document Skills, ClinicalTrials.gov

Need a polished clinical report at the end, not just code? K-Dense Web delivers publication-ready outputs. Try it free.


🌐 Systems Biology Network Analysis

Goal: Analyze gene regulatory networks from RNA-seq data

Prompt:

Use available skills you have access to whenever possible. Query NCBI Gene for annotations, retrieve sequences from UniProt, identify interactions via 
STRING, map to Reactome/KEGG pathways, analyze topology with Torch Geometric, reconstruct 
GRNs with Arboreto, assess druggability with Open Targets, model with PyMC, visualize 
networks, and search GEO for similar patterns.

Skills Used: NCBI Gene, UniProt, STRING, Reactome, KEGG, Torch Geometric, Arboreto, Open Targets, PyMC, GEO

Want end-to-end pipelines with shareable outputs and no setup? Try K-Dense Web free.

📖 Want more examples? Check out docs/examples.md for comprehensive workflow examples and detailed use cases across all scientific domains.


🚀 Want to Skip the Setup and Just Do the Science?

Recognize any of these?

  • You spent more time configuring environments than running analyses
  • Your workflow needs a GPU your local machine does not have
  • You need a shareable, publication-ready figure or report, not just a script
  • You want to run a complex multi-step pipeline right now, without reading package docs first

If so, K-Dense Web was built for you. It is the full AI co-scientist platform: everything in this repo plus cloud GPUs, 200+ skills, and outputs you can drop directly into a paper or presentation. Zero setup required.

| Feature | This Repo | K-Dense Web |
|---|---|---|
| Scientific Skills | 135 skills | 200+ skills (exclusive access) |
| Setup | Manual installation | Zero setup, works instantly |
| Compute | Your machine | Cloud GPUs and HPC included |
| Workflows | Prompt and code | End-to-end research pipelines |
| Outputs | Code and analysis | Publication-ready figures, reports, and papers |
| Integrations | Local tools | Lab systems, ELNs, and cloud storage |

"K-Dense Web took me from raw sequencing data to a draft figure in one afternoon. What used to take three days of environment setup and scripting now just works." Computational biologist, drug discovery

Try K-Dense Web

k-dense.ai | Read the full comparison


🔬 Use Cases

🧪 Drug Discovery & Medicinal Chemistry

  • Virtual Screening: Screen millions of compounds from PubChem/ZINC against protein targets
  • Lead Optimization: Analyze structure-activity relationships with RDKit, generate analogs with datamol
  • ADMET Prediction: Predict absorption, distribution, metabolism, excretion, and toxicity with DeepChem
  • Molecular Docking: Predict binding poses and affinities with DiffDock
  • Bioactivity Mining: Query ChEMBL for known inhibitors and analyze SAR patterns

🧬 Bioinformatics & Genomics

  • Sequence Analysis: Process DNA/RNA/protein sequences with BioPython and pysam
  • Single-Cell Analysis: Analyze 10X Genomics data with Scanpy, identify cell types, infer GRNs with Arboreto
  • Variant Annotation: Annotate VCF files with Ensembl VEP, query ClinVar for pathogenicity
  • Variant Database Management: Build scalable VCF databases with TileDB-VCF for incremental sample addition, efficient population-scale queries, and compressed storage of genomic variant data
  • Gene Discovery: Query NCBI Gene, UniProt, and Ensembl for comprehensive gene information
  • Network Analysis: Identify protein-protein interactions via STRING, map to pathways (KEGG, Reactome)

🏥 Clinical Research & Precision Medicine

  • Clinical Trials: Search ClinicalTrials.gov for relevant studies, analyze eligibility criteria
  • Variant Interpretation: Annotate variants with ClinVar, COSMIC, and ClinPGx for pharmacogenomics
  • Drug Safety: Query FDA databases for adverse events, drug interactions, and recalls
  • Precision Therapeutics: Match patient variants to targeted therapies and clinical trials

🔬 Multi-Omics & Systems Biology

  • Multi-Omics Integration: Combine RNA-seq, proteomics, and metabolomics data
  • Pathway Analysis: Enrich differentially expressed genes in KEGG/Reactome pathways
  • Network Biology: Reconstruct gene regulatory networks, identify hub genes
  • Biomarker Discovery: Integrate multi-omics layers to predict patient outcomes

📊 Data Analysis & Visualization

  • Statistical Analysis: Perform hypothesis testing, power analysis, and experimental design
  • Publication Figures: Create publication-quality visualizations with matplotlib and seaborn
  • Network Visualization: Visualize biological networks with NetworkX
  • Report Generation: Generate comprehensive PDF reports with Document Skills

🧪 Laboratory Automation

  • Protocol Design: Create Opentrons protocols for automated liquid handling
  • LIMS Integration: Integrate with Benchling and LabArchives for data management
  • Workflow Automation: Automate multi-step laboratory workflows

📚 Available Skills

This repository contains 135 scientific and research skills organized across multiple domains. Each skill provides comprehensive documentation, code examples, and best practices for working with scientific libraries, databases, and tools.

Skill Categories

Note: The Python package and integration skills listed below are explicitly defined skills — curated with documentation, examples, and best practices for stronger, more reliable performance. They are not a ceiling: the agent can install and use any Python package or call any API, even without a dedicated skill. The skills listed simply make common workflows faster and more dependable.

🧬 Bioinformatics & Genomics (21+ skills)

  • Sequence analysis: BioPython, pysam, scikit-bio, BioServices
  • Single-cell analysis: Scanpy, AnnData, scvi-tools, scVelo (RNA velocity), Arboreto, Cellxgene Census
  • Genomic tools: gget, geniml, gtars, deepTools, FlowIO, Polars-Bio, Zarr, TileDB-VCF
  • Differential expression: PyDESeq2
  • Phylogenetics: ETE Toolkit, Phylogenetics (MAFFT, IQ-TREE 2, FastTree)

🧪 Cheminformatics & Drug Discovery (10+ skills)

  • Molecular manipulation: RDKit, Datamol, Molfeat
  • Deep learning: DeepChem, TorchDrug
  • Docking & screening: DiffDock
  • Molecular dynamics: OpenMM + MDAnalysis (MD simulation & trajectory analysis)
  • Cloud quantum chemistry: Rowan (pKa, docking, cofolding)
  • Drug-likeness: MedChem
  • Benchmarks: PyTDC

🔬 Proteomics & Mass Spectrometry (2 skills)

  • Spectral processing: matchms, pyOpenMS

🏥 Clinical Research & Precision Medicine (8+ skills)

  • Clinical databases: via Database Lookup (ClinicalTrials.gov, ClinVar, ClinPGx, COSMIC, FDA, cBioPortal, Monarch, and more)
  • Cancer genomics: DepMap (cancer dependency scores, drug sensitivity)
  • Cancer imaging: Imaging Data Commons (NCI radiology & pathology datasets via idc-index)
  • Healthcare AI: PyHealth, NeuroKit2, Clinical Decision Support
  • Clinical documentation: Clinical Reports, Treatment Plans

🖼️ Medical Imaging & Digital Pathology (3 skills)

  • DICOM processing: pydicom
  • Whole slide imaging: histolab, PathML

🧠 Neuroscience & Electrophysiology (1 skill)

  • Neural recordings: Neuropixels-Analysis (extracellular spikes, silicon probes, spike sorting)

🤖 Machine Learning & AI (16+ skills)

  • Deep learning: PyTorch Lightning, Transformers, Stable Baselines3, PufferLib
  • Classical ML: scikit-learn, scikit-survival, SHAP
  • Time series: aeon, TimesFM (Google's zero-shot foundation model for univariate forecasting)
  • Bayesian methods: PyMC
  • Optimization: PyMOO
  • Graph ML: Torch Geometric
  • Dimensionality reduction: UMAP-learn
  • Statistical modeling: statsmodels

🔮 Materials Science, Chemistry & Physics (7 skills)

  • Materials: Pymatgen
  • Metabolic modeling: COBRApy
  • Astronomy: Astropy
  • Quantum computing: Cirq, PennyLane, Qiskit, QuTiP

⚙️ Engineering & Simulation (4 skills)

  • Numerical computing: MATLAB/Octave
  • Computational fluid dynamics: FluidSim
  • Discrete-event simulation: SimPy
  • Symbolic math: SymPy

📊 Data Analysis & Visualization (16+ skills)

  • Visualization: Matplotlib, Seaborn, Scientific Visualization
  • Geospatial analysis: GeoPandas, GeoMaster (remote sensing, GIS, satellite imagery, spatial ML, 500+ examples)
  • Data processing: Dask, Polars, Vaex
  • Network analysis: NetworkX
  • Document processing: Document Skills (PDF, DOCX, PPTX, XLSX)
  • Infographics: Infographics (AI-powered professional infographic creation)
  • Diagrams: Markdown & Mermaid Writing (text-based diagrams as default documentation standard)
  • Exploratory data analysis: EDA workflows
  • Statistical analysis: Statistical Analysis workflows

🧪 Laboratory Automation (4 skills)

  • Liquid handling: PyLabRobot
  • Cloud lab: Ginkgo Cloud Lab (cell-free protein expression, fluorescent pixel art via autonomous RAC infrastructure)
  • Protocol management: Protocols.io
  • LIMS integration: Benchling, LabArchives

🔬 Multi-omics & Systems Biology (4+ skills)

  • Pathway analysis: via Database Lookup (KEGG, Reactome, STRING) and PrimeKG
  • Multi-omics: HypoGeniC
  • Data management: LaminDB

🧬 Protein Engineering & Design (3 skills)

  • Protein language models: ESM
  • Glycoengineering: Glycoengineering (N/O-glycosylation prediction, therapeutic antibody optimization)
  • Cloud laboratory platform: Adaptyv (automated protein testing and validation)

📚 Scientific Communication (20+ skills)

  • Literature: Paper Lookup (PubMed, PMC, bioRxiv, medRxiv, arXiv, OpenAlex, Crossref, Semantic Scholar, CORE, Unpaywall), Literature Review
  • Advanced paper search: BGPT Paper Search (25+ structured fields per paper — methods, results, sample sizes, quality scores — from full text, not just abstracts)
  • Web search: Parallel Web (synthesized summaries with citations)
  • Research notebooks: Open Notebook (self-hosted NotebookLM alternative — PDFs, videos, audio, web pages; 16+ AI providers; multi-speaker podcast generation)
  • Writing: Scientific Writing, Peer Review
  • Document processing: XLSX, MarkItDown, Document Skills
  • Publishing: Venue Templates
  • Presentations: Scientific Slides, LaTeX Posters, PPTX Posters
  • Diagrams: Scientific Schematics, Markdown & Mermaid Writing
  • Infographics: Infographics (10 types, 8 styles, colorblind-safe palettes)
  • Citations: Citation Management
  • Illustration: Generate Image (AI image generation with FLUX.2 Pro and Gemini 3 Pro (Nano Banana Pro))

🔬 Scientific Databases & Data Access (6 skills → 100+ databases total)

A unified database-lookup skill provides direct REST API access to 78 public databases across all domains. Dedicated skills cover specialized data platforms. Multi-database packages like BioServices (~40 bioinformatics services), BioPython (38 NCBI sub-databases via Entrez), and gget (20+ genomics databases) add further coverage.

  • Unified access: Database Lookup (78 databases spanning chemistry, genomics, clinical, pathways, patents, economics, and more — PubChem, ChEMBL, UniProt, PDB, AlphaFold, KEGG, Reactome, STRING, ClinVar, COSMIC, ClinicalTrials.gov, FDA, FRED, USPTO, SEC EDGAR, and dozens more)
  • Cancer genomics: DepMap (cancer cell line dependencies, drug sensitivity, gene effect profiles)
  • Cancer imaging: Imaging Data Commons (NCI radiology & pathology datasets via idc-index)
  • Knowledge graph: PrimeKG (precision medicine knowledge graph — genes, drugs, diseases, phenotypes)
  • Fiscal data: U.S. Treasury Fiscal Data (national debt, Treasury statements, auctions, exchange rates)
  • Scientific ML resource catalog: Hugging Science (curated index of datasets, models, blog posts, and interactive Spaces across 17 scientific domains — astronomy, biology, chemistry, climate, genomics, materials science, medicine, physics, scientific reasoning, and more — with usage patterns for datasets, transformers, and gradio_client)

🔧 Infrastructure & Platforms (7+ skills)

  • Cloud compute: Modal
  • GPU acceleration: Optimize for GPU (CuPy, Numba CUDA, Warp, cuDF, cuML, cuGraph, KvikIO, cuCIM, cuxfilter, cuVS, cuSpatial, RAFT)
  • Genomics platforms: DNAnexus, LatchBio
  • Microscopy: OMERO
  • Automation: Opentrons
  • Resource detection: Get Available Resources

🎓 Research Methodology & Planning (12+ skills)

  • Ideation: Scientific Brainstorming, Hypothesis Generation
  • Critical analysis: Scientific Critical Thinking, Scholar Evaluation
  • Scenario analysis: What-If Oracle (multi-branch possibility exploration, risk analysis, strategic options)
  • Multi-perspective deliberation: Consciousness Council (diverse expert viewpoints, devil's advocate analysis)
  • Cognitive profiling: DHDNA Profiler (extract thinking patterns and cognitive signatures from any text)
  • Funding: Research Grants
  • Discovery: Research Lookup, Paper Lookup (10 academic databases)
  • Market analysis: Market Research Reports

⚖️ Regulatory & Standards (1 skill)

  • Medical device standards: ISO 13485 Certification

📖 For complete details on all skills, see docs/scientific-skills.md

💡 Looking for practical examples? Check out docs/examples.md for comprehensive workflow examples across all scientific domains.


🤝 Contributing

We welcome contributions to expand and improve this scientific skills repository!

Ways to Contribute

✨ Add New Skills

  • Create skills for additional scientific packages or databases
  • Add integrations for scientific platforms and tools

📚 Improve Existing Skills

  • Enhance documentation with more examples and use cases
  • Add new workflows and reference materials
  • Improve code examples and scripts
  • Fix bugs or update outdated information

🐛 Report Issues

  • Submit bug reports with detailed reproduction steps
  • Suggest improvements or new features

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-skill)
  3. Follow the existing directory structure and documentation patterns
  4. Ensure all new skills include comprehensive SKILL.md files
  5. Test your examples and workflows thoroughly
  6. Commit your changes (git commit -m 'Add amazing skill')
  7. Push to your branch (git push origin feature/amazing-skill)
  8. Submit a pull request with a clear description of your changes
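
The same workflow as one shell session. The repository URL assumes you have already forked the project under <your-username>, and the branch and commit names are the examples from the steps above:

```bash
git clone https://github.com/<your-username>/scientific-agent-skills.git
cd scientific-agent-skills
git checkout -b feature/amazing-skill
# ... add your skill directory, including its SKILL.md ...
git add scientific-skills/<your-skill>/
git commit -m 'Add amazing skill'
git push origin feature/amazing-skill
# then open a pull request from your fork on GitHub
```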

Contribution Guidelines

✅ Adhere to the Agent Skills Specification: every skill must follow the official spec (valid SKILL.md frontmatter, naming conventions, directory structure)
✅ Maintain consistency with existing skill documentation format
✅ Ensure all code examples are tested and functional
✅ Follow scientific best practices in examples and workflows
✅ Update relevant documentation when adding new capabilities
✅ Provide clear comments and docstrings in code
✅ Include references to official documentation

Security Scanning

All skills in this repository are security-scanned using Cisco AI Defense Skill Scanner, an open-source tool that detects prompt injection, data exfiltration, and malicious code patterns in Agent Skills.

If you are contributing a new skill, we recommend running the scanner locally before submitting a pull request:

uv pip install cisco-ai-skill-scanner
skill-scanner scan /path/to/your/skill --use-behavioral

Note: A clean scan result reduces noise in review, but does not guarantee a skill is free of all risk. Contributed skills are also reviewed manually before merging.

Recognition

Contributors are recognized in our community and may be featured in:

  • Repository contributors list
  • Special mentions in release notes
  • K-Dense community highlights

Your contributions help make scientific computing more accessible and enable researchers to leverage AI tools more effectively!

Support Open Source

This project builds on 50+ amazing open source projects. If you find value in these skills, please consider supporting the projects we depend on.


🔧 Troubleshooting

Common Issues

Problem: Skills not loading

  • Solution: Verify skill folders are in the correct directory (see Getting Started)
  • Each skill folder must contain a SKILL.md file
  • Restart your agent/IDE after copying skills
  • In Cursor, check Settings → Rules to confirm skills are discovered

Problem: Missing Python dependencies

  • Solution: Check the specific SKILL.md file for required packages
  • Install dependencies: uv pip install package-name

Problem: API rate limits

  • Solution: Many databases have rate limits. Review the specific database documentation
  • Consider implementing caching or batch requests; a minimal sketch follows
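
One way to implement that, sketched with the standard library and requests. The cache size and backoff schedule are arbitrary choices, not values any skill prescribes:

```python
import functools
import time

import requests


@functools.lru_cache(maxsize=256)    # memoize repeated identical GET queries
def fetch(url: str) -> str:
    for delay in (1, 2, 4, 8):       # exponential backoff on HTTP 429
        resp = requests.get(url, timeout=30)
        if resp.status_code == 429:  # Too Many Requests: wait and retry
            time.sleep(delay)
            continue
        resp.raise_for_status()      # surface other HTTP errors
        return resp.text
    raise RuntimeError(f"still rate-limited after retries: {url}")
```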

Problem: Authentication errors

  • Solution: Some services require API keys. Check the SKILL.md for authentication setup
  • Verify your credentials and permissions; a minimal key-loading pattern is sketched below
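
A common pattern for wiring credentials up without hardcoding them, as a sketch. EXAMPLE_API_KEY is a placeholder name; substitute whatever variable the relevant SKILL.md specifies:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # pick up a .env file in the working directory, if present

api_key = os.environ.get("EXAMPLE_API_KEY")  # placeholder variable name
if not api_key:
    raise SystemExit("Set EXAMPLE_API_KEY in your shell or in a .env file")

# Typical bearer-token usage; the exact header depends on the service
headers = {"Authorization": f"Bearer {api_key}"}
```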

Problem: Outdated examples

  • Solution: Report the issue via GitHub Issues
  • Check the official package documentation for updated syntax

❓ FAQ

General Questions

Q: Is this free to use?
A: Yes! This repository is MIT licensed. However, each individual skill has its own license specified in the license metadata field within its SKILL.md file—be sure to review and comply with those terms.

Q: Why are all skills grouped together instead of separate packages?
A: We believe good science in the age of AI is inherently interdisciplinary. Bundling all skills together makes it trivial for you (and your agent) to bridge across fields—e.g., combining genomics, cheminformatics, clinical data, and machine learning in one workflow—without worrying about which individual skills to install or wire together.

Q: Can I use this for commercial projects?
A: The repository itself is MIT licensed, which allows commercial use. However, individual skills may have different licenses—check the license field in each skill's SKILL.md file to ensure compliance with your intended use.

Q: Do all skills have the same license?
A: No. Each skill has its own license specified in the license metadata field within its SKILL.md file. These licenses may differ from the repository's MIT License. Users are responsible for reviewing and adhering to the license terms of each individual skill they use.

Q: How often is this updated?
A: We regularly update skills to reflect the latest versions of packages and APIs. Major updates are announced in release notes.

Q: Can I use this with other AI models?
A: The skills follow the open Agent Skills standard and work with any compatible agent, including Cursor, Claude Code, and Codex.

Installation & Setup

Q: Do I need all the Python packages installed?
A: No! Only install the packages you need. Each skill specifies its requirements in its SKILL.md file.

Q: What if a skill doesn't work?
A: First check the Troubleshooting section. If the issue persists, file an issue on GitHub with detailed reproduction steps.

Q: Do the skills work offline?
A: Database skills require internet access to query APIs. Package skills work offline once Python dependencies are installed.

Contributing

Q: Can I contribute my own skills?
A: Absolutely! We welcome contributions. See the Contributing section for guidelines and best practices.

Q: How do I report bugs or suggest features?
A: Open an issue on GitHub with a clear description. For bugs, include reproduction steps and expected vs actual behavior.


💬 Support

Need help? Check the Troubleshooting and FAQ sections above, or open an issue on GitHub with a clear description of your problem.


📖 Citation

If you use Scientific Agent Skills in your research or project, please cite it as:

BibTeX

@software{scientific_agent_skills_2026,
  author = {{K-Dense Inc.}},
  title = {Scientific Agent Skills: A Comprehensive Collection of Scientific Tools for AI Agents},
  year = {2026},
  url = {https://github.com/K-Dense-AI/scientific-agent-skills},
  note = {135 skills covering databases, packages, integrations, and analysis tools}
}

APA

K-Dense Inc. (2026). Scientific Agent Skills: A comprehensive collection of scientific tools for AI agents [Computer software]. https://github.com/K-Dense-AI/scientific-agent-skills

MLA

K-Dense Inc. Scientific Agent Skills: A Comprehensive Collection of Scientific Tools for AI Agents. 2026, github.com/K-Dense-AI/scientific-agent-skills.

Plain Text

Scientific Agent Skills by K-Dense Inc. (2026)
Available at: https://github.com/K-Dense-AI/scientific-agent-skills

We appreciate acknowledgment in publications, presentations, or projects that benefit from these skills!


📄 License

This project is licensed under the MIT License.

Copyright © 2026 K-Dense Inc. (k-dense.ai)

Key Points:

  • Free for any use (commercial and noncommercial)
  • Open source - modify, distribute, and use freely
  • Permissive - minimal restrictions on reuse
  • ⚠️ No warranty - provided "as is" without warranty of any kind

See LICENSE.md for full terms.

Individual Skill Licenses

⚠️ Important: Each skill has its own license specified in the license metadata field within its SKILL.md file. These licenses may differ from the repository's MIT License and may include additional terms or restrictions. Users are responsible for reviewing and adhering to the license terms of each individual skill they use.

Star History

Star History Chart