Curated Claude Code catalog
Updated 07.05.2026 · 19:39 CET
01 / Skill
K-Dense-AI

scientific-agent-skills

Quality
9.0

This repository offers 135 ready-to-use scientific and research skills, transforming any AI agent supporting the Agent Skills standard into a powerful research assistant. It excels in automating complex multi-step scientific workflows across diverse domains like bioinformatics, drug discovery, clinical research, and materials science, integrating with over 100 scientific databases and optimized Python packages.

USP

This comprehensive collection stands out with 135 pre-built skills and 100+ database integrations, now compatible with any Agent Skills-compliant AI agent. Its unique K-Dense BYOK desktop app provides a private, powerful AI co-scientist wo…

Use cases

  • Bioinformatics & Genomics analysis
  • Cheminformatics & Drug Discovery
  • Clinical Research & Precision Medicine
  • Scientific Data Analysis & Visualization
  • Automated Literature Review

Detected files (8)

  • scientific-skills/adaptyv/SKILL.md (7629 bytes)
    ---
    name: adaptyv
    author: "K-Dense, Inc."
    description: "How to use the Adaptyv Bio Foundry API and Python SDK for protein experiment design, submission, and results retrieval. Use this skill whenever the user mentions Adaptyv, Foundry API, protein binding assays, protein screening experiments, BLI/SPR assays, thermostability assays, or wants to submit protein sequences for experimental characterization. Also trigger when code imports `adaptyv`, `adaptyv_sdk`, or `FoundryClient`, or references `foundry-api-public.adaptyvbio.com`."
    ---
    
    # Adaptyv Bio Foundry API
    
    Adaptyv Bio is a cloud lab that turns protein sequences into experimental data. Users submit amino acid sequences via API or UI; Adaptyv's automated lab runs assays (binding, thermostability, expression, fluorescence) and delivers results in ~21 days.
    
    ## Quick Start
    
    **Base URL:** `https://foundry-api-public.adaptyvbio.com/api/v1`
    
    **Authentication:** Bearer token in the `Authorization` header. Tokens are obtained from [foundry.adaptyvbio.com](https://foundry.adaptyvbio.com/) sidebar.
    
    When writing code, always read the API key from the environment variable `ADAPTYV_API_KEY` or from a `.env` file — never hardcode tokens. Check for a `.env` file in the project root first; if one exists, use a library like `python-dotenv` to load it.
    
    ```bash
    export FOUNDRY_API_TOKEN="abs0_..."
    curl https://foundry-api-public.adaptyvbio.com/api/v1/targets?limit=3 \
      -H "Authorization: Bearer $FOUNDRY_API_TOKEN"
    ```
    
    Every request except `GET /openapi.json` requires authentication. Store tokens in environment variables or `.env` files — never commit them to source control.
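
    A minimal sketch of that loading pattern, assuming `python-dotenv` is installed:

    ```python
    import os
    from pathlib import Path

    from dotenv import load_dotenv  # uv pip install python-dotenv

    if Path(".env").exists():
        load_dotenv()  # exposes ADAPTYV_API_KEY / ADAPTYV_API_URL via os.environ

    api_key = os.environ["ADAPTYV_API_KEY"]  # KeyError beats silently running unauthenticated
    ```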
    
    ## Python SDK
    
    Install: `uv add adaptyv-sdk` (or `uv pip install adaptyv-sdk` if the project has no `pyproject.toml`)
    
    **Environment variables** (set in shell or `.env` file):
    ```bash
    ADAPTYV_API_KEY=your_api_key
    ADAPTYV_API_URL=https://foundry-api-public.adaptyvbio.com/api/v1
    ```
    
    ### Decorator Pattern
    
    ```python
    from adaptyv import lab
    
    @lab.experiment(target="PD-L1", experiment_type="screening", method="bli")
    def design_binders():
        return {"design_a": "MVKVGVNG...", "design_b": "MKVLVAG..."}
    
    result = design_binders()
    print(f"Experiment: {result.experiment_url}")
    ```
    
    ### Client Pattern
    
    ```python
    from adaptyv import FoundryClient
    
    client = FoundryClient(api_key="...", base_url="https://foundry-api-public.adaptyvbio.com/api/v1")
    
    # Browse targets
    targets = client.targets.list(search="EGFR", selfservice_only=True)
    
    # Estimate cost
    estimate = client.experiments.cost_estimate({
        "experiment_spec": {
            "experiment_type": "screening",
            "method": "bli",
            "target_id": "target-uuid",
            "sequences": {"seq1": "EVQLVESGGGLVQ..."},
            "n_replicates": 3
        }
    })
    
    # Create and submit
    exp = client.experiments.create({...})
    client.experiments.submit(exp.experiment_id)
    
    # Later: retrieve results
    results = client.experiments.get_results(exp.experiment_id)
    ```
    
    ## Experiment Types
    
    | Type | Method | Measures | Requires Target |
    |---|---|---|---|
    | `affinity` | `bli` or `spr` | KD, kon, koff kinetics | Yes |
    | `screening` | `bli` or `spr` | Yes/no binding | Yes |
    | `thermostability` | — | Melting temperature (Tm) | No |
    | `expression` | — | Expression yield | No |
    | `fluorescence` | — | Fluorescence intensity | No |
    
    ## Experiment Lifecycle
    
    ```
    Draft → WaitingForConfirmation → QuoteSent → WaitingForMaterials → InQueue → InProduction → DataAnalysis → InReview → Done
    ```
    
    | Status | Who Acts | Description |
    |---|---|---|
    | `Draft` | You | Editable, no cost commitment |
    | `WaitingForConfirmation` | Adaptyv | Under review, quote being prepared |
    | `QuoteSent` | You | Review and confirm the quote |
    | `WaitingForMaterials` | Adaptyv | Gene fragments and target ordered |
    | `InQueue` | Adaptyv | Materials arrived, queued for lab |
    | `InProduction` | Adaptyv | Assay running |
    | `DataAnalysis` | Adaptyv | Raw data processing and QC |
    | `InReview` | Adaptyv | Final validation |
    | `Done` | You | Results available |
    | `Canceled` | Either | Experiment canceled |
    
    The `results_status` field on an experiment tracks: `none`, `partial`, or `all`.
    
    ## Common Workflows
    
    ### 1. Submit a Binding Screen (Step by Step)
    
    ```python
    # 1. Find a target
    targets = client.targets.list(search="EGFR", selfservice_only=True)
    target_id = targets.items[0].id
    
    # 2. Preview cost
    estimate = client.experiments.cost_estimate({
        "experiment_spec": {
            "experiment_type": "screening",
            "method": "bli",
            "target_id": target_id,
            "sequences": {"seq1": "EVQLVESGGGLVQ...", "seq2": "MKVLVAG..."},
            "n_replicates": 3
        }
    })
    
    # 3. Create experiment (starts as Draft)
    exp = client.experiments.create({
        "name": "EGFR binder screen batch 1",
        "experiment_spec": {
            "experiment_type": "screening",
            "method": "bli",
            "target_id": target_id,
            "sequences": {"seq1": "EVQLVESGGGLVQ...", "seq2": "MKVLVAG..."},
            "n_replicates": 3
        }
    })
    
    # 4. Submit for review
    client.experiments.submit(exp.experiment_id)
    
    # 5. Poll or use webhooks until Done
    # 6. Retrieve results
    results = client.experiments.get_results(exp.experiment_id)
    ```
    
    ### 2. Automated Pipeline (Skip Draft + Auto-Accept Quote)
    
    ```python
    exp = client.experiments.create({
        "name": "Auto pipeline run",
        "experiment_spec": {...},
        "skip_draft": True,
        "auto_accept_quote": True,
        "webhook_url": "https://my-server.com/webhook"
    })
    # Webhook fires on each status transition; poll or wait for Done
    ```
    
    ### 3. Using Webhooks
    
    Pass `webhook_url` when creating an experiment. Adaptyv POSTs to that URL on every status transition with the experiment ID, previous status, and new status.
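
    A minimal receiver sketch; the payload field names below are illustrative guesses, not a documented schema:

    ```python
    from flask import Flask, request

    app = Flask(__name__)

    @app.post("/webhook")
    def adaptyv_webhook():
        payload = request.get_json(force=True)
        # Field names are illustrative; inspect a real delivery for the exact schema.
        exp_id = payload.get("experiment_id")
        old, new = payload.get("previous_status"), payload.get("new_status")
        print(f"{exp_id}: {old} -> {new}")
        if new == "Done":
            ...  # fetch results with client.experiments.get_results(exp_id)
        return "", 204
    ```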
    
    ## Sequences
    
    - Simple format: `{"seq1": "EVQLVESGGGLVQPGGSLRLSCAAS"}`
    - Rich format: `{"seq1": {"aa_string": "EVQLVESGGGLVQ...", "control": false, "metadata": {"type": "scfv"}}}`
    - Multi-chain: use colon separator — `"MVLS:EVQL"`
    - Valid amino acids: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y (case-insensitive, stored uppercase)
    - Sequences can only be added to experiments in `Draft` status
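
    A quick client-side check against these rules (a helper written for this doc, not part of the SDK):

    ```python
    VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")

    def validate_sequence(seq: str) -> str:
        """Uppercase a (possibly multi-chain) sequence and reject invalid residues."""
        chains = seq.upper().split(":")  # colon separates chains
        for chain in chains:
            bad = set(chain) - VALID_AA
            if bad:
                raise ValueError(f"invalid amino acids: {sorted(bad)}")
        return ":".join(chains)

    assert validate_sequence("mvls:evql") == "MVLS:EVQL"
    ```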
    
    ## Filtering, Sorting, and Pagination
    
    All list endpoints support pagination (`limit` 1-100, default 50; `offset`), search (free-text on name fields), and sorting.
    
    **Filtering** uses s-expression syntax via the `filter` query parameter:
    - Comparison: `eq(field,value)`, `neq`, `gt`, `gte`, `lt`, `lte`, `contains(field,substring)`
    - Range/set: `between(field,lo,hi)`, `in(field,v1,v2,...)`
    - Logic: `and(expr1,expr2,...)`, `or(...)`, `not(expr)`
    - Null: `is_null(field)`, `is_not_null(field)`
    - JSONB: `at(field,key)` — e.g., `eq(at(metadata,score),42)`
    - Cast: `float()`, `int()`, `text()`, `timestamp()`, `date()`
    
    **Sorting** uses `asc(field)` or `desc(field)`, comma-separated (max 8):
    ```
    sort=desc(created_at),asc(name)
    ```
    
    **Example:** `filter=and(gte(created_at,2026-01-01),eq(status,done))`
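
    A sketch of passing these parameters over raw HTTP; the `/experiments` list path is assumed here, so check `references/api-endpoints.md` for exact routes:

    ```python
    import os

    import httpx

    base = "https://foundry-api-public.adaptyvbio.com/api/v1"
    params = {
        # s-expression filter built as a plain string, per the syntax above
        "filter": "and(gte(created_at,2026-01-01),eq(status,done))",
        "sort": "desc(created_at),asc(name)",
        "limit": 25,
    }
    resp = httpx.get(
        f"{base}/experiments",  # assumed list endpoint
        params=params,
        headers={"Authorization": f"Bearer {os.environ['FOUNDRY_API_TOKEN']}"},
    )
    resp.raise_for_status()
    ```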
    
    ## Error Handling
    
    All errors return:
    ```json
    {
      "error": "Human-readable description",
      "request_id": "req_019462a4-b1c2-7def-8901-23456789abcd"
    }
    ```
    The `request_id` is also in the `x-request-id` response header — include it when contacting support.
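
    One way to surface that identifier when a raw HTTP call fails, using the documented error shape:

    ```python
    import os

    import httpx

    resp = httpx.get(
        "https://foundry-api-public.adaptyvbio.com/api/v1/targets",
        headers={"Authorization": f"Bearer {os.environ['FOUNDRY_API_TOKEN']}"},
    )
    if resp.status_code >= 400:
        body = resp.json()
        # request_id appears in both the body and the x-request-id header
        request_id = resp.headers.get("x-request-id", body.get("request_id"))
        raise RuntimeError(f"Foundry API error: {body['error']} (request_id={request_id})")
    ```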
    
    ## Token Management
    
    Tokens use Biscuit-based cryptographic attenuation. You can create restricted tokens scoped by organization, resource type, actions (read/create/update), and expiry via `POST /tokens/attenuate`. Revoking a token (`POST /tokens/revoke`) revokes it and all its descendants.
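
    A sketch of the attenuation call; every body key below is a hypothetical placeholder, so consult `references/api-endpoints.md` for the real schema:

    ```python
    import os

    import httpx

    resp = httpx.post(
        "https://foundry-api-public.adaptyvbio.com/api/v1/tokens/attenuate",
        headers={"Authorization": f"Bearer {os.environ['FOUNDRY_API_TOKEN']}"},
        json={
            "actions": ["read"],                    # hypothetical: restrict to read-only
            "expires_at": "2026-12-31T00:00:00Z",   # hypothetical expiry field
        },
    )
    resp.raise_for_status()
    restricted_token = resp.json()
    ```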
    
    ## Detailed API Reference
    
    For the full list of all 32 endpoints with request/response schemas, read `references/api-endpoints.md`.
    
  • scientific-skills/astropy/SKILL.md (11534 bytes)
    ---
    name: astropy
    description: Comprehensive Python library for astronomy and astrophysics. This skill should be used when working with astronomical data including celestial coordinates, physical units, FITS files, cosmological calculations, time systems, tables, world coordinate systems (WCS), and astronomical data analysis. Use when tasks involve coordinate transformations, unit conversions, FITS file manipulation, cosmological distance calculations, time scale conversions, or astronomical data processing.
    license: BSD-3-Clause license
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # Astropy
    
    ## Overview
    
    Astropy is the core Python package for astronomy, providing essential functionality for astronomical research and data analysis. Use astropy for coordinate transformations, unit and quantity calculations, FITS file operations, cosmological calculations, precise time handling, tabular data manipulation, and astronomical image processing.
    
    ## When to Use This Skill
    
    Use astropy when tasks involve:
    - Converting between celestial coordinate systems (ICRS, Galactic, FK5, AltAz, etc.)
    - Working with physical units and quantities (converting Jy to mJy, parsecs to km, etc.)
    - Reading, writing, or manipulating FITS files (images or tables)
    - Cosmological calculations (luminosity distance, lookback time, Hubble parameter)
    - Precise time handling with different time scales (UTC, TAI, TT, TDB) and formats (JD, MJD, ISO)
    - Table operations (reading catalogs, cross-matching, filtering, joining)
    - WCS transformations between pixel and world coordinates
    - Astronomical constants and calculations
    
    ## Quick Start
    
    ```python
    import astropy.units as u
    from astropy.coordinates import SkyCoord
    from astropy.time import Time
    from astropy.io import fits
    from astropy.table import Table
    from astropy.cosmology import Planck18
    
    # Units and quantities
    distance = 100 * u.pc
    distance_km = distance.to(u.km)
    
    # Coordinates
    coord = SkyCoord(ra=10.5*u.degree, dec=41.2*u.degree, frame='icrs')
    coord_galactic = coord.galactic
    
    # Time
    t = Time('2023-01-15 12:30:00')
    jd = t.jd  # Julian Date
    
    # FITS files
    data = fits.getdata('image.fits')
    header = fits.getheader('image.fits')
    
    # Tables
    table = Table.read('catalog.fits')
    
    # Cosmology
    d_L = Planck18.luminosity_distance(z=1.0)
    ```
    
    ## Core Capabilities
    
    ### 1. Units and Quantities (`astropy.units`)
    
    Handle physical quantities with units, perform unit conversions, and ensure dimensional consistency in calculations.
    
    **Key operations:**
    - Create quantities by multiplying values with units
    - Convert between units using `.to()` method
    - Perform arithmetic with automatic unit handling
    - Use equivalencies for domain-specific conversions (spectral, doppler, parallax)
    - Work with logarithmic units (magnitudes, decibels)
    
    **See:** `references/units.md` for comprehensive documentation, unit systems, equivalencies, performance optimization, and unit arithmetic.
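
    For instance, the spectral and parallax equivalencies mentioned above:

    ```python
    import astropy.units as u

    # Wavelength <-> frequency via the spectral equivalency
    frequency = (500 * u.nm).to(u.GHz, equivalencies=u.spectral())
    print(frequency)  # ~599585 GHz

    # Parallax angle <-> distance
    distance = (10 * u.mas).to(u.pc, equivalencies=u.parallax())  # 100 pc
    ```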
    
    ### 2. Coordinate Systems (`astropy.coordinates`)
    
    Represent celestial positions and transform between different coordinate frames.
    
    **Key operations:**
    - Create coordinates with `SkyCoord` in any frame (ICRS, Galactic, FK5, AltAz, etc.)
    - Transform between coordinate systems
    - Calculate angular separations and position angles
    - Match coordinates to catalogs
    - Include distance for 3D coordinate operations
    - Handle proper motions and radial velocities
    - Query named objects from online databases
    
    **See:** `references/coordinates.md` for detailed coordinate frame descriptions, transformations, observer-dependent frames (AltAz), catalog matching, and performance tips.
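
    For instance, separations and position angles between two coordinates:

    ```python
    import astropy.units as u
    from astropy.coordinates import SkyCoord

    c1 = SkyCoord(ra=10.5 * u.deg, dec=41.2 * u.deg)
    c2 = SkyCoord(ra=11.0 * u.deg, dec=41.0 * u.deg)

    sep = c1.separation(c2)       # on-sky angular separation (Angle)
    pa = c1.position_angle(c2)    # position angle, east of north
    print(sep.arcmin, pa.deg)
    ```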
    
    ### 3. Cosmological Calculations (`astropy.cosmology`)
    
    Perform cosmological calculations using standard cosmological models.
    
    **Key operations:**
    - Use built-in cosmologies (Planck18, WMAP9, etc.)
    - Create custom cosmological models
    - Calculate distances (luminosity, comoving, angular diameter)
    - Compute ages and lookback times
    - Determine Hubble parameter at any redshift
    - Calculate density parameters and volumes
    - Perform inverse calculations (find z for given distance)
    
    **See:** `references/cosmology.md` for available models, distance calculations, time calculations, density parameters, and neutrino effects.
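
    For instance, the inverse calculation mentioned above via `z_at_value`:

    ```python
    import astropy.units as u
    from astropy.cosmology import Planck18, z_at_value

    # Which redshift gives a luminosity distance of 7 Gpc?
    z = z_at_value(Planck18.luminosity_distance, 7 * u.Gpc)
    print(z)
    ```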
    
    ### 4. FITS File Handling (`astropy.io.fits`)
    
    Read, write, and manipulate FITS (Flexible Image Transport System) files.
    
    **Key operations:**
    - Open FITS files with context managers
    - Access HDUs (Header Data Units) by index or name
    - Read and modify headers (keywords, comments, history)
    - Work with image data (NumPy arrays)
    - Handle table data (binary and ASCII tables)
    - Create new FITS files (single or multi-extension)
    - Use memory mapping for large files
    - Access remote FITS files (S3, HTTP)
    
    **See:** `references/fits.md` for comprehensive file operations, header manipulation, image and table handling, multi-extension files, and performance considerations.
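
    For instance, creating a small multi-extension file:

    ```python
    import numpy as np
    from astropy.io import fits

    # Empty primary HDU plus a named image extension
    primary = fits.PrimaryHDU()
    sci = fits.ImageHDU(data=np.zeros((128, 128), dtype=np.float32), name="SCI")
    sci.header["EXPTIME"] = (300.0, "exposure time in seconds")

    fits.HDUList([primary, sci]).writeto("new_image.fits", overwrite=True)
    ```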
    
    ### 5. Table Operations (`astropy.table`)
    
    Work with tabular data with support for units, metadata, and various file formats.
    
    **Key operations:**
    - Create tables from arrays, lists, or dictionaries
    - Read/write tables in multiple formats (FITS, CSV, HDF5, VOTable)
    - Access and modify columns and rows
    - Sort, filter, and index tables
    - Perform database-style operations (join, group, aggregate)
    - Stack and concatenate tables
    - Work with unit-aware columns (QTable)
    - Handle missing data with masking
    
    **See:** `references/tables.md` for table creation, I/O operations, data manipulation, sorting, filtering, joins, grouping, and performance tips.
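
    For instance, a database-style inner join on a shared key:

    ```python
    from astropy.table import Table, join

    obs = Table({"id": [1, 2, 3], "flux": [10.2, 8.7, 5.1]})
    meta = Table({"id": [2, 3, 4], "z": [0.13, 0.45, 0.88]})

    matched = join(obs, meta, keys="id")  # inner join on "id"
    print(len(matched))                   # 2 rows (ids 2 and 3)
    ```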
    
    ### 6. Time Handling (`astropy.time`)
    
    Precise time representation and conversion between time scales and formats.
    
    **Key operations:**
    - Create Time objects in various formats (ISO, JD, MJD, Unix, etc.)
    - Convert between time scales (UTC, TAI, TT, TDB, etc.)
    - Perform time arithmetic with TimeDelta
    - Calculate sidereal time for observers
    - Compute light travel time corrections (barycentric, heliocentric)
    - Work with time arrays efficiently
    - Handle masked (missing) times
    
    **See:** `references/time.md` for time formats, time scales, conversions, arithmetic, observing features, and precision handling.
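
    For instance, converting scales and doing arithmetic with `TimeDelta`:

    ```python
    from astropy.time import Time, TimeDelta

    t = Time("2023-01-15 12:30:00", scale="utc")
    print(t.tt.iso, t.tdb.jd)  # the same instant on other time scales

    t_later = t + TimeDelta(90 * 60, format="sec")  # 90 minutes later
    print((t_later - t).sec)
    ```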
    
    ### 7. World Coordinate System (`astropy.wcs`)
    
    Transform between pixel coordinates in images and world coordinates.
    
    **Key operations:**
    - Read WCS from FITS headers
    - Convert pixel coordinates to world coordinates (and vice versa)
    - Calculate image footprints
    - Access WCS parameters (reference pixel, projection, scale)
    - Create custom WCS objects
    
    **See:** `references/wcs_and_other_modules.md` for WCS operations and transformations.
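
    For instance, a round trip between pixel and world coordinates:

    ```python
    from astropy.io import fits
    from astropy.wcs import WCS

    header = fits.getheader("image.fits", ext=1)
    w = WCS(header)

    sky = w.pixel_to_world(100, 200)  # SkyCoord at pixel (100, 200)
    x, y = w.world_to_pixel(sky)      # and back again
    ```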
    
    ## Additional Capabilities
    
    The `references/wcs_and_other_modules.md` file also covers:
    
    ### NDData and CCDData
    Containers for n-dimensional datasets with metadata, uncertainty, masking, and WCS information.
    
    ### Modeling
    Framework for creating and fitting mathematical models to astronomical data.
    
    ### Visualization
    Tools for astronomical image display with appropriate stretching and scaling.
    
    ### Constants
    Physical and astronomical constants with proper units (speed of light, solar mass, Planck constant, etc.).
    
    ### Convolution
    Image processing kernels for smoothing and filtering.
    
    ### Statistics
    Robust statistical functions including sigma clipping and outlier rejection.
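
    For instance, robust statistics with sigma clipping:

    ```python
    import numpy as np
    from astropy.stats import sigma_clip, sigma_clipped_stats

    data = np.append(np.random.normal(1.0, 0.1, 500), [5.0, 6.0])  # two outliers
    clipped = sigma_clip(data, sigma=3, maxiters=5)          # masked array
    mean, median, std = sigma_clipped_stats(data, sigma=3)   # outlier-resistant stats
    ```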
    
    ## Installation
    
    ```bash
    # Install astropy
    uv pip install astropy
    
    # With optional dependencies for full functionality
    uv pip install astropy[all]
    ```
    
    ## Common Workflows
    
    ### Converting Coordinates Between Systems
    
    ```python
    from astropy.coordinates import SkyCoord
    import astropy.units as u
    
    # Create coordinate
    c = SkyCoord(ra='05h23m34.5s', dec='-69d45m22s', frame='icrs')
    
    # Transform to galactic
    c_gal = c.galactic
    print(f"l={c_gal.l.deg}, b={c_gal.b.deg}")
    
    # Transform to alt-az (requires time and location)
    from astropy.time import Time
    from astropy.coordinates import EarthLocation, AltAz
    
    observing_time = Time('2023-06-15 23:00:00')
    observing_location = EarthLocation(lat=40*u.deg, lon=-120*u.deg)
    aa_frame = AltAz(obstime=observing_time, location=observing_location)
    c_altaz = c.transform_to(aa_frame)
    print(f"Alt={c_altaz.alt.deg}, Az={c_altaz.az.deg}")
    ```
    
    ### Reading and Analyzing FITS Files
    
    ```python
    from astropy.io import fits
    import numpy as np
    
    # Open FITS file
    with fits.open('observation.fits') as hdul:
        # Display structure
        hdul.info()
    
        # Get image data and header
        data = hdul[1].data
        header = hdul[1].header
    
        # Access header values
        exptime = header['EXPTIME']
        filter_name = header['FILTER']
    
        # Analyze data
        mean = np.mean(data)
        median = np.median(data)
        print(f"Mean: {mean}, Median: {median}")
    ```
    
    ### Cosmological Distance Calculations
    
    ```python
    from astropy.cosmology import Planck18
    import astropy.units as u
    import numpy as np
    
    # Calculate distances at z=1.5
    z = 1.5
    d_L = Planck18.luminosity_distance(z)
    d_A = Planck18.angular_diameter_distance(z)
    
    print(f"Luminosity distance: {d_L}")
    print(f"Angular diameter distance: {d_A}")
    
    # Age of universe at that redshift
    age = Planck18.age(z)
    print(f"Age at z={z}: {age.to(u.Gyr)}")
    
    # Lookback time
    t_lookback = Planck18.lookback_time(z)
    print(f"Lookback time: {t_lookback.to(u.Gyr)}")
    ```
    
    ### Cross-Matching Catalogs
    
    ```python
    from astropy.table import Table
    from astropy.coordinates import SkyCoord, match_coordinates_sky
    import astropy.units as u
    
    # Read catalogs
    cat1 = Table.read('catalog1.fits')
    cat2 = Table.read('catalog2.fits')
    
    # Create coordinate objects
    coords1 = SkyCoord(ra=cat1['RA']*u.degree, dec=cat1['DEC']*u.degree)
    coords2 = SkyCoord(ra=cat2['RA']*u.degree, dec=cat2['DEC']*u.degree)
    
    # Find matches
    idx, sep, _ = coords1.match_to_catalog_sky(coords2)
    
    # Filter by separation threshold
    max_sep = 1 * u.arcsec
    matches = sep < max_sep
    
    # Create matched catalogs
    cat1_matched = cat1[matches]
    cat2_matched = cat2[idx[matches]]
    print(f"Found {len(cat1_matched)} matches")
    ```
    
    ## Best Practices
    
    1. **Always use units**: Attach units to quantities to avoid errors and ensure dimensional consistency
    2. **Use context managers for FITS files**: Ensures proper file closing
    3. **Prefer arrays over loops**: Process multiple coordinates/times as arrays for better performance
    4. **Check coordinate frames**: Verify the frame before transformations
    5. **Use appropriate cosmology**: Choose the right cosmological model for your analysis
    6. **Handle missing data**: Use masked columns for tables with missing values
    7. **Specify time scales**: Be explicit about time scales (UTC, TT, TDB) for precise timing
    8. **Use QTable for unit-aware tables**: When table columns have units
    9. **Check WCS validity**: Verify WCS before using transformations
    10. **Cache frequently used values**: Expensive calculations (e.g., cosmological distances) can be cached
    
    ## Documentation and Resources
    
    - Official Astropy Documentation: https://docs.astropy.org/en/stable/
    - Tutorials: https://learn.astropy.org/
    - GitHub: https://github.com/astropy/astropy
    
    ## Reference Files
    
    For detailed information on specific modules:
    - `references/units.md` - Units, quantities, conversions, and equivalencies
    - `references/coordinates.md` - Coordinate systems, transformations, and catalog matching
    - `references/cosmology.md` - Cosmological models and calculations
    - `references/fits.md` - FITS file operations and manipulation
    - `references/tables.md` - Table creation, I/O, and operations
    - `references/time.md` - Time formats, scales, and calculations
    - `references/wcs_and_other_modules.md` - WCS, NDData, modeling, visualization, constants, and utilities
    
    
  • scientific-skills/autoskill/SKILL.md (11480 bytes)
    ---
    name: autoskill
    description: Observe the user's screen via screenpipe, detect repeated research workflows, match them against existing scientific-agent-skills, and draft new skills (or composition recipes that chain existing ones) for the patterns not yet covered. Use when the user asks to analyze their recent work and propose skills based on what they actually do. Requires the screenpipe daemon (https://github.com/screenpipe/screenpipe) running locally on port 3030 — the skill has no other data source and will refuse to run if screenpipe is unreachable. All detection runs locally; only redacted cluster summaries reach the LLM.
    allowed-tools: Read Write Edit Bash
    license: MIT license
    metadata:
        skill-author: K-Dense Inc.
        requires: screenpipe
    ---
    
    # autoskill
    
    > **Requires a running [screenpipe](https://github.com/screenpipe/screenpipe) daemon.** This skill has no alternate data source — it reads exclusively from the local screenpipe HTTP API (default `http://localhost:3030`). If the daemon isn't running, `run()` raises `ScreenpipeUnreachable` with install instructions.
    
    > **Network access & environment variables.** This skill makes authenticated HTTP requests to (a) the user's local screenpipe daemon on loopback, and (b) the user-configured LLM backend — one of `http://localhost:1234/v1` (LM Studio, default), `https://api.anthropic.com` (opt-in Claude), or a user-supplied BYOK Foundry gateway. The skill reads three environment variables — `SCREENPIPE_TOKEN`, `ANTHROPIC_API_KEY`, `FOUNDRY_API_KEY` — and uses each only to authenticate to the single endpoint its name implies. No other network destinations, no telemetry, no data egress to any third party.
    
    ## Overview
    
    Turn the user's own workflow history — captured passively by the local [screenpipe](https://github.com/screenpipe/screenpipe) daemon — into new skills. This skill is on-demand: the user invokes it with a time window, it queries screenpipe's local HTTP API, clusters repeated workflow patterns, compares each pattern against the existing skills in this repo, and produces a staged folder of proposals the user can review, edit, and promote.
    
    ## When to Use This Skill
    
    Invoke this skill when the user asks to:
    - "Analyze my last 4 hours / day / week and propose new skills."
    - "Look at what I've been doing and tell me what's not covered yet."
    - "Draft a skill from my recent workflow."
    - "Find composition recipes for workflows I repeat."
    
    Do **not** invoke it for one-off questions about screenpipe itself, for real-time screen queries, or without an explicit user request — the skill analyzes sensitive local content and must stay explicitly user-triggered.
    
    ## Privacy Posture
    
    - **Screenpipe handles app/window filtering at capture time.** Install a starter deny-list by copying `references/screenpipe-config.yaml` into the user's screenpipe config. Sensitive apps (password managers, messaging, banking) are never OCR'd in the first place.
    - **Raw OCR never leaves the machine.** `scripts/fetch_window.py` pulls data over localhost HTTP. `scripts/cluster.py` reduces the timeline to app/duration/title summaries. `scripts/redact.py` strips emails, API keys, bearer tokens, and phone numbers as defense-in-depth before any cluster summary reaches the LLM.
    - **LLM backend defaults to `local`.** The recommended setup is [LM Studio](https://lmstudio.ai/) running `Gemma-4-31B-it` — strong reasoning at a size that fits on most workstation GPUs, and no data ever leaves your machine. Cloud backends (`claude`, `foundry`) are opt-in and documented in `config.yaml` for users who explicitly want them. Detection and embeddings always run locally regardless of backend choice.
    - **Dry-run mode** (`--plan`) prints the exact timeline that will be analyzed before any LLM call.
    - **TLS for localhost** (optional, for corporate policy): see `references/https-proxy.md` for the Caddy pattern.
    
    ## Prerequisites
    
    ### 1. Screenpipe daemon
    
    Either install the official release or build from source. Either way the daemon binds HTTP on `localhost:3030` by default.
    
    **From source** (recommended if you want the CLI daemon without the desktop GUI):
    
    ```bash
    git clone --depth 1 https://github.com/mediar-ai/screenpipe.git
    cd screenpipe
    # System deps (macOS): cmake + full Xcode.app (not just Command Line Tools).
    #   brew install cmake
    #   # if xcodebuild plug-ins error: sudo xcodebuild -runFirstLaunch
    cargo build -p screenpipe-engine --release
    ./target/release/screenpipe doctor   # confirm permissions + ffmpeg
    ./target/release/screenpipe record --disable-audio --use-pii-removal
    ```
    
    First run will prompt for macOS Screen Recording permission. Grant it and relaunch.
    
    ### 2. Screenpipe API token
    
    The local API now requires bearer auth. Retrieve your token and export it:
    
    ```bash
    export SCREENPIPE_TOKEN=$(screenpipe auth token)
    ```
    
    (Or set `screenpipe.token` directly in `config.yaml` — env var is preferred since it keeps secrets out of version control.)
    
    ### 3. Python environment
    
    Via `pipenv` from the repo root:
    
    ```bash
    pipenv install httpx pyyaml sentence-transformers
    ```
    
    The embedding model (`sentence-transformers/all-MiniLM-L6-v2`, ~80 MB) downloads on first run.
    
    ### 4. Local LLM (default path) — LM Studio
    
    - Install [LM Studio](https://lmstudio.ai/).
    - Download `Gemma-4-31B-it` (or another strong reasoning model; adjust `local.model` in `config.yaml`).
    - Load it via the CLI for headless use (no GUI required):
    
    ```bash
    lms load gemma-4-31b-it --context-length 131072 --gpu max -y
    lms status   # confirm server running on :1234
    ```
    
    ### 5. Cloud LLM backends (optional, opt-in)
    
    Only if you explicitly opt out of local:
    - `claude`: set `ANTHROPIC_API_KEY`, flip `backend: claude` in `config.yaml`.
    - `foundry`: set `FOUNDRY_API_KEY`, flip `backend: foundry`, set `foundry.endpoint` to your corporate gateway URL.
    
    ## Architecture
    
    ```
    screenpipe daemon (user-installed)
            │  HTTP on localhost:3030
            ▼
    scripts/fetch_window.py    → normalized timeline events
    scripts/redact.py          → regex scrub (defense-in-depth)
    scripts/cluster.py         → sessions + clusters (local only)
    scripts/match_skills.py    → top-k vs existing 135 skills (local embeddings)
    scripts/synthesize.py      → LLM judge: reuse / compose / novel
            │
            ▼
    ~/.autoskill/proposed/<timestamp>/        (default; override with --out)
      ├── report.md
      ├── composition-recipes/<name>/SKILL.md
      └── new-skills/<name>/SKILL.md
    
    scripts/promote.py         → user-approved proposal → scientific-skills/<name>/
    ```
    
    ## Workflow
    
    The skill ships a unified CLI at `scripts/autoskill.py` with three subcommands:
    
    ```bash
    python scripts/autoskill.py doctor   --config config.yaml --skills-dir ../
    python scripts/autoskill.py run      --start ... --end ... --config config.yaml
    python scripts/autoskill.py promote  --proposed ~/.autoskill/proposed/<ts> --skills-dir ../ --name <skill>
    ```
    
    ### 0. Preflight with `doctor`
    
    Before a full run, verify every dependency in one shot:
    
    ```bash
    python scripts/autoskill.py doctor \
      --config scientific-skills/autoskill/config.yaml \
      --skills-dir scientific-skills
    ```
    
    The report covers `config` (backend choice valid), `skills_dir` (exists), `screenpipe` (reachable + authed), and `llm` (LM Studio serving or API key present). Non-zero exit on any failure, with the offending line marked `error`.
    
    ### 1. Run the pipeline
    
    ```bash
    export SCREENPIPE_TOKEN=$(screenpipe auth token)
    python scripts/autoskill.py run \
      --start "2026-04-17T00:00:00Z" \
      --end   "2026-04-17T23:59:59Z" \
      --config scientific-skills/autoskill/config.yaml \
      --skills-dir scientific-skills
    ```
    
    Proposals land in `~/.autoskill/proposed/<timestamp>/` by default, keeping experimental output out of the skills repo. Pass `--out PATH` to override.
    
    Internally:
    1. **Fetch** — `fetch_window` paginates screenpipe's `/search` endpoint, normalizes events to `{ts, app, window_title, text, content_type}`.
    2. **Redact** — `redact` scrubs emails, API keys, bearer tokens, phones from OCR text and window titles as defense-in-depth over screenpipe's own PII removal.
    3. **Cluster** — `segment_sessions` splits on idle gaps (default 10 min) and drops short sessions; `cluster_sessions` groups sessions by app-signature and keeps clusters of size `min_cluster_size` (default 2).
    4. **Match** — `load_skill_descriptions` reads frontmatter from every `SKILL.md` in `scientific-skills/`; `top_k_matches` ranks each cluster against all skills using local `sentence-transformers` embeddings (cosine similarity).
    5. **Synthesize** — `synthesize` prompts the configured LLM backend to classify each cluster as `reuse`, `compose`, or `novel` and emit a SKILL.md body where appropriate.
    6. **Report** — writes `<out_dir>/<ts>/report.md`, plus `new-skills/<name>/SKILL.md` or `composition-recipes/<name>/SKILL.md` for each proposal.
    
    Add `--dry-run` to stop after clustering; this skips the LLM (and the sentence-transformers load), writing only `plan.md` for inspection.
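
    For reference, step 2's scrubbing amounts to regex substitution along these lines (illustrative; the real patterns live in `scripts/redact.py`):

    ```python
    import re

    PATTERNS = [
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),                 # emails
        re.compile(r"(?i)bearer\s+[a-z0-9._-]+"),               # bearer tokens
        re.compile(r"\b(?:sk|pk|api)[-_][A-Za-z0-9]{16,}\b"),   # API-key-ish strings
        re.compile(r"\+?\d[\d\s().-]{7,}\d"),                   # phone numbers
    ]

    def redact(text: str) -> str:
        """Scrub sensitive substrings before any summary leaves the machine."""
        for pat in PATTERNS:
            text = pat.sub("[REDACTED]", text)
        return text
    ```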
    
    ### 2. Review and promote
    
    Open `~/.autoskill/proposed/<ts>/report.md`, edit drafts in place, delete anything you don't want. Then:
    
    ```bash
    python scripts/autoskill.py promote \
      --proposed ~/.autoskill/proposed/2026-04-17T14-30-00 \
      --skills-dir scientific-skills \
      --name zotero-pubmed-helper
    ```
    
    `promote` moves the directory into `scientific-skills/<name>/`, refusing to overwrite an existing skill. Exits non-zero with a friendly error if the proposal isn't found or the target already exists.
    
    ## Configuration
    
    See `config.yaml` for the full shape. Default values (local-first):
    
    ```yaml
    backend: local
    local:
      endpoint: http://localhost:1234/v1   # LM Studio's Developer server
      model: Gemma-4-31B-it
    
    screenpipe:
      url: http://localhost:3030           # or https://screenpipe.local via Caddy
    
    cluster:
      min_session_minutes: 5
      idle_gap_minutes: 10
      min_cluster_size: 2
    ```
    
    To opt into a cloud backend:
    
    ```yaml
    backend: claude                         # or foundry
    claude:
      model: claude-opus-4-7
    ```
    
    ## Composition recipes vs new skills
    
    - **compose**: the LLM judged that chaining existing skills covers the workflow. The emitted SKILL.md is intentionally thin — frontmatter + a "Workflow" section that invokes existing skills in order. The same agent runtime that discovered the skill can then invoke it end-to-end.
    - **novel**: no combination of existing skills covers it. A fuller SKILL.md is drafted, still following repo conventions (frontmatter, Overview, When to Use, Workflow). The user should always review new-skill drafts before promoting.
    
    ## Testing
    
    The skill is covered by a small pytest suite at `tests/`. Each script is unit-tested in isolation with dependency injection (mock HTTP transport, stub backend, stub embedder):
    
    ```bash
    cd scientific-skills/autoskill
    python -m pytest tests/ -v
    ```
    
    ## Composition with other skills in this repo
    
    The autoskill's embedding index covers all 135 sibling skills. Workflows that look like scientific writing will match `scientific-writing` / `literature-review` / `citation-management`; figure work will match `scientific-schematics` / `generate-image` / `infographics`; slide prep matches `scientific-slides` / `pptx`; etc. When a cluster scores high against two or three sibling skills the emitted composition recipe names them explicitly, so the user's future agent invocations use the optimized paths already documented in this repo.
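
    The matching step reduces to something like the following sketch (names illustrative; the real logic lives in `scripts/match_skills.py`):

    ```python
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # the same model the skill downloads

    def top_k(cluster_summary: str, skill_descriptions: dict[str, str], k: int = 3):
        """Rank skills by cosine similarity between summary and description embeddings."""
        names = list(skill_descriptions)
        skill_emb = model.encode([skill_descriptions[n] for n in names], convert_to_tensor=True)
        cluster_emb = model.encode(cluster_summary, convert_to_tensor=True)
        scores = util.cos_sim(cluster_emb, skill_emb)[0]
        ranked = sorted(zip(names, scores.tolist()), key=lambda p: p[1], reverse=True)
        return ranked[:k]
    ```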
    
  • scientific-skills/benchling-integration/SKILL.md (13063 bytes)
    ---
    name: benchling-integration
    description: Benchling R&D platform integration. Access registry entities (DNA, proteins), inventory, ELN entries, and workflows via the API; build Benchling Apps; query the Data Warehouse for lab data management automation.
    license: Unknown
    compatibility: Requires a Benchling account and API key
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # Benchling Integration
    
    ## Overview
    
    Benchling is a cloud platform for life sciences R&D. Access registry entities (DNA, proteins), inventory, electronic lab notebooks, and workflows programmatically via Python SDK and REST API.
    
    ## When to Use This Skill
    
    This skill should be used when:
    - Working with Benchling's Python SDK or REST API
    - Managing biological sequences (DNA, RNA, proteins) and registry entities
    - Automating inventory operations (samples, containers, locations, transfers)
    - Creating or querying electronic lab notebook entries
    - Building workflow automations or Benchling Apps
    - Syncing data between Benchling and external systems
    - Querying the Benchling Data Warehouse for analytics
    - Setting up event-driven integrations with AWS EventBridge
    
    ## Core Capabilities
    
    ### 1. Authentication & Setup
    
    **Python SDK Installation:**
    ```bash
    # Stable release
    uv pip install benchling-sdk
    # or with Poetry
    poetry add benchling-sdk
    ```
    
    **Authentication Methods:**
    
    API Key Authentication (recommended for scripts):
    ```python
    from benchling_sdk.benchling import Benchling
    from benchling_sdk.auth.api_key_auth import ApiKeyAuth
    
    benchling = Benchling(
        url="https://your-tenant.benchling.com",
        auth_method=ApiKeyAuth("your_api_key")
    )
    ```
    
    OAuth Client Credentials (for apps):
    ```python
    from benchling_sdk.auth.client_credentials_oauth2 import ClientCredentialsOAuth2
    
    auth_method = ClientCredentialsOAuth2(
        client_id="your_client_id",
        client_secret="your_client_secret"
    )
    benchling = Benchling(
        url="https://your-tenant.benchling.com",
        auth_method=auth_method
    )
    ```
    
    **Key Points:**
    - API keys are obtained from Profile Settings in Benchling
    - Store credentials securely (use environment variables or password managers)
    - All API requests require HTTPS
    - Authentication permissions mirror user permissions in the UI
    
    For detailed authentication information including OIDC and security best practices, refer to `references/authentication.md`.
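
    Putting those points into practice, a sketch that reads credentials from the environment (the variable names are this sketch's convention, not SDK requirements):

    ```python
    import os

    from benchling_sdk.auth.api_key_auth import ApiKeyAuth
    from benchling_sdk.benchling import Benchling

    benchling = Benchling(
        url=os.environ["BENCHLING_TENANT_URL"],  # e.g. https://your-tenant.benchling.com
        auth_method=ApiKeyAuth(os.environ["BENCHLING_API_KEY"]),
    )
    ```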
    
    ### 2. Registry & Entity Management
    
    Registry entities include DNA sequences, RNA sequences, AA sequences, custom entities, and mixtures. The SDK provides typed classes for creating and managing these entities.
    
    **Creating DNA Sequences:**
    ```python
    from benchling_sdk.models import DnaSequenceCreate
    
    sequence = benchling.dna_sequences.create(
        DnaSequenceCreate(
            name="My Plasmid",
            bases="ATCGATCG",
            is_circular=True,
            folder_id="fld_abc123",
            schema_id="ts_abc123",  # optional
            fields=benchling.models.fields({"gene_name": "GFP"})
        )
    )
    ```
    
    **Registry Registration:**
    
    To register an entity directly upon creation:
    ```python
    sequence = benchling.dna_sequences.create(
        DnaSequenceCreate(
            name="My Plasmid",
            bases="ATCGATCG",
            is_circular=True,
            folder_id="fld_abc123",
            entity_registry_id="src_abc123",  # Registry to register in
            naming_strategy="NEW_IDS"  # or "IDS_FROM_NAMES"
        )
    )
    ```
    
    **Important:** When registering on creation, supply `entity_registry_id` together with a `naming_strategy`; to leave the entity unregistered, omit both.
    
    **Updating Entities:**
    ```python
    from benchling_sdk.models import DnaSequenceUpdate
    
    updated = benchling.dna_sequences.update(
        sequence_id="seq_abc123",
        dna_sequence=DnaSequenceUpdate(
            name="Updated Plasmid Name",
            fields=benchling.models.fields({"gene_name": "mCherry"})
        )
    )
    ```
    
    Unspecified fields remain unchanged, allowing partial updates.
    
    **Listing and Pagination:**
    ```python
    # List all DNA sequences (returns a generator)
    sequences = benchling.dna_sequences.list()
    for page in sequences:
        for seq in page:
            print(f"{seq.name} ({seq.id})")
    
    # Check total count
    total = sequences.estimated_count()
    ```
    
    **Key Operations:**
    - Create: `benchling.<entity_type>.create()`
    - Read: `benchling.<entity_type>.get(id)` or `.list()`
    - Update: `benchling.<entity_type>.update(id, update_object)`
    - Archive: `benchling.<entity_type>.archive(id)`
    
    Entity types: `dna_sequences`, `rna_sequences`, `aa_sequences`, `custom_entities`, `mixtures`
    
    For comprehensive SDK reference and advanced patterns, refer to `references/sdk_reference.md`.
    
    ### 3. Inventory Management
    
    Manage physical samples, containers, boxes, and locations within the Benchling inventory system.
    
    **Creating Containers:**
    ```python
    from benchling_sdk.models import ContainerCreate
    
    container = benchling.containers.create(
        ContainerCreate(
            name="Sample Tube 001",
            schema_id="cont_schema_abc123",
            parent_storage_id="box_abc123",  # optional
            fields=benchling.models.fields({"concentration": "100 ng/μL"})
        )
    )
    ```
    
    **Managing Boxes:**
    ```python
    from benchling_sdk.models import BoxCreate
    
    box = benchling.boxes.create(
        BoxCreate(
            name="Freezer Box A1",
            schema_id="box_schema_abc123",
            parent_storage_id="loc_abc123"
        )
    )
    ```
    
    **Transferring Items:**
    ```python
    # Transfer a container to a new location
    transfer = benchling.containers.transfer(
        container_id="cont_abc123",
        destination_id="box_xyz789"
    )
    ```
    
    **Key Inventory Operations:**
    - Create containers, boxes, locations, plates
    - Update inventory item properties
    - Transfer items between locations
    - Check in/out items
    - Batch operations for bulk transfers
    
    ### 4. Notebook & Documentation
    
    Interact with electronic lab notebook (ELN) entries, protocols, and templates.
    
    **Creating Notebook Entries:**
    ```python
    from benchling_sdk.models import EntryCreate
    
    entry = benchling.entries.create(
        EntryCreate(
            name="Experiment 2025-10-20",
            folder_id="fld_abc123",
            schema_id="entry_schema_abc123",
            fields=benchling.models.fields({"objective": "Test gene expression"})
        )
    )
    ```
    
    **Linking Entities to Entries:**
    ```python
    # Add references to entities in an entry
    entry_link = benchling.entry_links.create(
        entry_id="entry_abc123",
        entity_id="seq_xyz789"
    )
    ```
    
    **Key Notebook Operations:**
    - Create and update lab notebook entries
    - Manage entry templates
    - Link entities and results to entries
    - Export entries for documentation
    
    ### 5. Workflows & Automation
    
    Automate laboratory processes using Benchling's workflow system.
    
    **Creating Workflow Tasks:**
    ```python
    from benchling_sdk.models import WorkflowTaskCreate
    
    task = benchling.workflow_tasks.create(
        WorkflowTaskCreate(
            name="PCR Amplification",
            workflow_id="wf_abc123",
            assignee_id="user_abc123",
            fields=benchling.models.fields({"template": "seq_abc123"})
        )
    )
    ```
    
    **Updating Task Status:**
    ```python
    from benchling_sdk.models import WorkflowTaskUpdate
    
    updated_task = benchling.workflow_tasks.update(
        task_id="task_abc123",
        workflow_task=WorkflowTaskUpdate(
            status_id="status_complete_abc123"
        )
    )
    ```
    
    **Asynchronous Operations:**
    
    Some operations are asynchronous and return tasks:
    ```python
    # Wait for task completion
    from benchling_sdk.helpers.tasks import wait_for_task
    
    result = wait_for_task(
        benchling,
        task_id="task_abc123",
        interval_wait_seconds=2,
        max_wait_seconds=300
    )
    ```
    
    **Key Workflow Operations:**
    - Create and manage workflow tasks
    - Update task statuses and assignments
    - Execute bulk operations asynchronously
    - Monitor task progress
    
    ### 6. Events & Integration
    
    Subscribe to Benchling events for real-time integrations using AWS EventBridge.
    
    **Event Types:**
    - Entity creation, update, archive
    - Inventory transfers
    - Workflow task status changes
    - Entry creation and updates
    - Results registration
    
    **Integration Pattern:**
    1. Configure event routing to AWS EventBridge in Benchling settings
    2. Create EventBridge rules to filter events
    3. Route events to Lambda functions or other targets
    4. Process events and update external systems
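
    A minimal Lambda target for such a rule might look like the sketch below; the EventBridge envelope (`detail-type`, `detail`) is standard AWS, but the Benchling-specific keys inside `detail` are hypothetical:

    ```python
    # AWS Lambda handler for a Benchling EventBridge rule (sketch).
    def handler(event, context):
        detail_type = event.get("detail-type", "")  # standard EventBridge field
        detail = event.get("detail", {})

        # Keys inside `detail` are hypothetical; check Benchling's event schemas.
        entity_id = detail.get("entity", {}).get("id")
        print(f"received {detail_type} for {entity_id}")
        return {"ok": True}
    ```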
    
    **Use Cases:**
    - Sync Benchling data to external databases
    - Trigger downstream processes on workflow completion
    - Send notifications on entity changes
    - Audit trail logging
    
    Refer to Benchling's event documentation for event schemas and configuration.
    
    ### 7. Data Warehouse & Analytics
    
    Query historical Benchling data using SQL through the Data Warehouse.
    
    **Access Method:**
    The Benchling Data Warehouse provides SQL access to Benchling data for analytics and reporting. Connect using standard SQL clients with provided credentials.
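
    A connection sketch, assuming a Postgres-compatible client; host, database, and table names below are placeholders for tenant-specific values:

    ```python
    import os

    import psycopg2  # assumes the warehouse accepts Postgres-protocol clients

    conn = psycopg2.connect(
        host="your-warehouse-host.benchling.com",  # placeholder
        dbname="warehouse",                        # placeholder
        user=os.environ["WAREHOUSE_USER"],
        password=os.environ["WAREHOUSE_PASSWORD"],
    )
    with conn.cursor() as cur:
        cur.execute("SELECT name, created_at FROM dna_sequence LIMIT 10")  # table name illustrative
        for name, created_at in cur.fetchall():
            print(name, created_at)
    ```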
    
    **Common Queries:**
    - Aggregate experimental results
    - Analyze inventory trends
    - Generate compliance reports
    - Export data for external analysis
    
    **Integration with Analysis Tools:**
    - Jupyter notebooks for interactive analysis
    - BI tools (Tableau, Looker, PowerBI)
    - Custom dashboards
    
    ## Best Practices
    
    ### Error Handling
    
    The SDK automatically retries failed requests:
    ```python
    # Automatic retry for 429, 502, 503, 504 status codes
    # Up to 5 retries with exponential backoff
    # Customize retry behavior if needed
    from benchling_sdk.retry import RetryStrategy
    
    benchling = Benchling(
        url="https://your-tenant.benchling.com",
        auth_method=ApiKeyAuth("your_api_key"),
        retry_strategy=RetryStrategy(max_retries=3)
    )
    ```
    
    ### Pagination Efficiency
    
    Use generators for memory-efficient pagination:
    ```python
    # Generator-based iteration
    for page in benchling.dna_sequences.list():
        for sequence in page:
            process(sequence)
    
    # Check estimated count without loading all pages
    total = benchling.dna_sequences.list().estimated_count()
    ```
    
    ### Schema Fields Helper
    
    Use the `fields()` helper for custom schema fields:
    ```python
    # Convert dict to Fields object
    custom_fields = benchling.models.fields({
        "concentration": "100 ng/μL",
        "date_prepared": "2025-10-20",
        "notes": "High quality prep"
    })
    ```
    
    ### Forward Compatibility
    
    The SDK handles unknown enum values and types gracefully:
    - Unknown enum values are preserved
    - Unrecognized polymorphic types return `UnknownType`
    - Allows working with newer API versions
    
    ### Security Considerations
    
    - Never commit API keys to version control
    - Use environment variables for credentials
    - Rotate keys if compromised
    - Grant minimal necessary permissions for apps
    - Use OAuth for multi-user scenarios
    
    ## Resources
    
    ### references/
    
    Detailed reference documentation for in-depth information:
    
    - **authentication.md** - Comprehensive authentication guide including OIDC, security best practices, and credential management
    - **sdk_reference.md** - Detailed Python SDK reference with advanced patterns, examples, and all entity types
    - **api_endpoints.md** - REST API endpoint reference for direct HTTP calls without the SDK
    
    Load these references as needed for specific integration requirements.
    
    ### scripts/
    
    This skill currently includes example scripts that can be removed or replaced with custom automation scripts for your specific Benchling workflows.
    
    ## Common Use Cases
    
    **1. Bulk Entity Import:**
    ```python
    # Import multiple sequences from FASTA file
    from Bio import SeqIO
    
    for record in SeqIO.parse("sequences.fasta", "fasta"):
        benchling.dna_sequences.create(
            DnaSequenceCreate(
                name=record.id,
                bases=str(record.seq),
                is_circular=False,
                folder_id="fld_abc123"
            )
        )
    ```
    
    **2. Inventory Audit:**
    ```python
    # List all containers in a specific location
    containers = benchling.containers.list(
        parent_storage_id="box_abc123"
    )
    
    for page in containers:
        for container in page:
            print(f"{container.name}: {container.barcode}")
    ```
    
    **3. Workflow Automation:**
    ```python
    # Update all pending tasks for a workflow
    tasks = benchling.workflow_tasks.list(
        workflow_id="wf_abc123",
        status="pending"
    )
    
    for page in tasks:
        for task in page:
            # Perform automated checks
            if auto_validate(task):
                benchling.workflow_tasks.update(
                    task_id=task.id,
                    workflow_task=WorkflowTaskUpdate(
                        status_id="status_complete"
                    )
                )
    ```
    
    **4. Data Export:**
    ```python
    # Export all sequences with specific properties
    sequences = benchling.dna_sequences.list()
    export_data = []
    
    for page in sequences:
        for seq in page:
            if seq.schema_id == "target_schema_id":
                export_data.append({
                    "id": seq.id,
                    "name": seq.name,
                    "bases": seq.bases,
                    "length": len(seq.bases)
                })
    
    # Save to CSV or database
    import csv
    with open("sequences.csv", "w") as f:
        writer = csv.DictWriter(f, fieldnames=export_data[0].keys())
        writer.writeheader()
        writer.writerows(export_data)
    ```
    
    ## Additional Resources
    
    - **Official Documentation:** https://docs.benchling.com
    - **Python SDK Reference:** https://benchling.com/sdk-docs/
    - **API Reference:** https://benchling.com/api/reference
    - **Support:** [email protected]
    
    
  • scientific-skills/bgpt-paper-search/SKILL.md (2479 bytes)
    ---
    name: bgpt-paper-search
    description: Search scientific papers and retrieve structured experimental data extracted from full-text studies via the BGPT MCP server. Returns 25+ fields per paper including methods, results, sample sizes, quality scores, and conclusions. Use for literature reviews, evidence synthesis, and finding experimental details not available in abstracts alone.
    allowed-tools: Bash
    license: MIT
    metadata:
        skill-author: BGPT
        website: https://bgpt.pro/mcp
        github: https://github.com/connerlambden/bgpt-mcp
    ---
    
    # BGPT Paper Search
    
    ## Overview
    
    BGPT is a remote MCP server that searches a curated database of scientific papers built from raw experimental data extracted from full-text studies. Unlike traditional literature databases that return titles and abstracts, BGPT returns structured data from the actual paper content — methods, quantitative results, sample sizes, quality assessments, and 25+ metadata fields per paper.
    
    ## When to Use This Skill
    
    Use this skill when:
    - Searching for scientific papers with specific experimental details
    - Conducting systematic or scoping literature reviews
    - Finding quantitative results, sample sizes, or effect sizes across studies
    - Comparing methodologies used in different studies
    - Looking for papers with quality scores or evidence grading
    - Needing structured data from full-text papers (not just abstracts)
    - Building evidence tables for meta-analyses or clinical guidelines
    
    ## Setup
    
    BGPT is a remote MCP server — no local installation required.
    
    ### Claude Desktop / Claude Code
    
    Add to your MCP configuration:
    
    ```json
    {
      "mcpServers": {
        "bgpt": {
          "command": "npx",
          "args": ["mcp-remote", "https://bgpt.pro/mcp/sse"]
        }
      }
    }
    ```
    
    ### npm (alternative)
    
    ```bash
    npx bgpt-mcp
    ```
    
    ## Usage
    
    Once configured, use the `search_papers` tool provided by the BGPT MCP server:
    
    ```
    Search for papers about: "CRISPR gene editing efficiency in human cells"
    ```
    
    The server returns structured results including:
    - **Title, authors, journal, year, DOI**
    - **Methods**: Experimental techniques, models, protocols
    - **Results**: Key findings with quantitative data
    - **Sample sizes**: Number of subjects/samples
    - **Quality scores**: Study quality assessments
    - **Conclusions**: Author conclusions and implications
    
    ## Pricing
    
    - **Free tier**: 50 searches per network, no API key required
    - **Paid**: $0.01 per result with an API key from [bgpt.pro/mcp](https://bgpt.pro/mcp)
    
    
  • scientific-skills/aeon/SKILL.md (10587 bytes)
    ---
    name: aeon
    description: This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.
    license: BSD-3-Clause license
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # Aeon Time Series Machine Learning
    
    ## Overview
    
    Aeon is a scikit-learn compatible Python toolkit for time series machine learning. It provides state-of-the-art algorithms for classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search.
    
    ## When to Use This Skill
    
    Apply this skill when:
    - Classifying or predicting from time series data
    - Detecting anomalies or change points in temporal sequences
    - Clustering similar time series patterns
    - Forecasting future values
    - Finding repeated patterns (motifs) or unusual subsequences (discords)
    - Comparing time series with specialized distance metrics
    - Extracting features from temporal data
    
    ## Installation
    
    ```bash
    uv pip install aeon
    ```
    
    ## Core Capabilities
    
    ### 1. Time Series Classification
    
    Categorize time series into predefined classes. See `references/classification.md` for complete algorithm catalog.
    
    **Quick Start:**
    ```python
    from aeon.classification.convolution_based import RocketClassifier
    from aeon.datasets import load_classification
    
    # Load data
    X_train, y_train = load_classification("GunPoint", split="train")
    X_test, y_test = load_classification("GunPoint", split="test")
    
    # Train classifier
    clf = RocketClassifier(n_kernels=10000)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)
    ```
    
    **Algorithm Selection:**
    - **Speed + Performance**: `MiniRocketClassifier`, `Arsenal`
    - **Maximum Accuracy**: `HIVECOTEV2`, `InceptionTimeClassifier`
    - **Interpretability**: `ShapeletTransformClassifier`, `Catch22Classifier`
    - **Small Datasets**: `KNeighborsTimeSeriesClassifier` with DTW distance
    
    ### 2. Time Series Regression
    
    Predict continuous values from time series. See `references/regression.md` for algorithms.
    
    **Quick Start:**
    ```python
    from aeon.regression.convolution_based import RocketRegressor
    from aeon.datasets import load_regression
    
    X_train, y_train = load_regression("Covid3Month", split="train")
    X_test, y_test = load_regression("Covid3Month", split="test")
    
    reg = RocketRegressor()
    reg.fit(X_train, y_train)
    predictions = reg.predict(X_test)
    ```
    
    ### 3. Time Series Clustering
    
    Group similar time series without labels. See `references/clustering.md` for methods.
    
    **Quick Start:**
    ```python
    from aeon.clustering import TimeSeriesKMeans
    
    clusterer = TimeSeriesKMeans(
        n_clusters=3,
        distance="dtw",
        averaging_method="ba"
    )
    labels = clusterer.fit_predict(X_train)
    centers = clusterer.cluster_centers_
    ```
    
    ### 4. Forecasting
    
    Predict future time series values. See `references/forecasting.md` for forecasters.
    
    **Quick Start:**
    ```python
    from aeon.forecasting.arima import ARIMA
    
    forecaster = ARIMA(order=(1, 1, 1))
    forecaster.fit(y_train)
    y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
    ```
    
    ### 5. Anomaly Detection
    
    Identify unusual patterns or outliers. See `references/anomaly_detection.md` for detectors.
    
    **Quick Start:**
    ```python
    import numpy as np
    from aeon.anomaly_detection import STOMP
    
    detector = STOMP(window_size=50)
    anomaly_scores = detector.fit_predict(y)
    
    # Higher scores indicate anomalies
    threshold = np.percentile(anomaly_scores, 95)
    anomalies = anomaly_scores > threshold
    ```
    
    ### 6. Segmentation
    
    Partition time series into regions with change points. See `references/segmentation.md`.
    
    **Quick Start:**
    ```python
    from aeon.segmentation import ClaSPSegmenter
    
    segmenter = ClaSPSegmenter()
    change_points = segmenter.fit_predict(y)
    ```
    
    ### 7. Similarity Search
    
    Find similar patterns within or across time series. See `references/similarity_search.md`.
    
    **Quick Start:**
    ```python
    from aeon.similarity_search import StompMotif
    
    # Find recurring patterns
    motif_finder = StompMotif(window_size=50, k=3)
    motifs = motif_finder.fit_predict(y)
    ```
    
    ## Feature Extraction and Transformations
    
    Transform time series for feature engineering. See `references/transformations.md`.
    
    **ROCKET Features:**
    ```python
    from aeon.transformations.collection.convolution_based import Rocket
    
    rocket = Rocket()
    X_features = rocket.fit_transform(X_train)
    
    # Use features with any sklearn classifier
    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier()
    clf.fit(X_features, y_train)
    ```
    
    **Statistical Features:**
    ```python
    from aeon.transformations.collection.feature_based import Catch22
    
    catch22 = Catch22()
    X_features = catch22.fit_transform(X_train)
    ```
    
    **Preprocessing:**
    ```python
    from aeon.transformations.collection import MinMaxScaler, Normalizer
    
    scaler = Normalizer()  # z-normalization; MinMaxScaler rescales to [0, 1] instead
    X_normalized = scaler.fit_transform(X_train)
    ```
    
    ## Distance Metrics
    
    Specialized temporal distance measures. See `references/distances.md` for the complete catalog.
    
    **Usage:**
    ```python
    import numpy as np
    from aeon.distances import dtw_distance, dtw_pairwise_distance
    
    x = np.random.rand(1, 100)  # one channel, 100 timepoints
    y = np.random.rand(1, 100)
    
    # Single distance
    distance = dtw_distance(x, y, window=0.1)
    
    # Pairwise distances
    distance_matrix = dtw_pairwise_distance(X_train)
    
    # Use with classifiers
    from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
    
    clf = KNeighborsTimeSeriesClassifier(
        n_neighbors=5,
        distance="dtw",
        distance_params={"window": 0.2}
    )
    ```
    
    **Available Distances:**
    - **Elastic**: DTW, DDTW, WDTW, ERP, EDR, LCSS, TWE, MSM
    - **Lock-step**: Euclidean, Manhattan, Minkowski
    - **Shape-based**: Shape DTW, SBD
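    
    A small sketch contrasting a lock-step and an elastic measure on a phase-shifted pair (the signal is illustrative; DTW should report a smaller distance because it warps time to realign the shift):
    
    ```python
    import numpy as np
    from aeon.distances import dtw_distance, euclidean_distance
    
    t = np.linspace(0, 4 * np.pi, 100)
    x = np.sin(t)
    y = np.sin(t + 0.5)  # phase-shifted copy of x
    
    print("euclidean:", euclidean_distance(x, y))  # penalizes the misalignment
    print("dtw:", dtw_distance(x, y))              # realigns, so typically smaller
    ```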
    
    ## Deep Learning Networks
    
    Neural architectures for time series. See `references/networks.md`.
    
    **Architectures:**
    - Convolutional: `FCNClassifier`, `ResNetClassifier`, `InceptionTimeClassifier`
    - Recurrent and temporal-convolutional: `RecurrentNetwork`, `TCNNetwork`
    - Autoencoders: `AEFCNClusterer`, `AEResNetClusterer`
    
    **Usage:**
    ```python
    from aeon.classification.deep_learning import InceptionTimeClassifier
    
    clf = InceptionTimeClassifier(n_epochs=100, batch_size=32)
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    ```
    
    ## Datasets and Benchmarking
    
    Load standard benchmarks and evaluate performance. See `references/datasets_benchmarking.md`.
    
    **Load Datasets:**
    ```python
    from aeon.datasets import load_classification, load_regression
    
    # Classification
    X_train, y_train = load_classification("ArrowHead", split="train")
    
    # Regression
    X_train, y_train = load_regression("Covid3Month", split="train")
    ```
    
    **Benchmarking:**
    ```python
    from aeon.benchmarking import get_estimator_results
    
    # Compare with published results
    published = get_estimator_results("ROCKET", "GunPoint")
    ```
    
    ## Common Workflows
    
    ### Classification Pipeline
    
    ```python
    from aeon.transformations.collection import Normalizer
    from aeon.classification.convolution_based import RocketClassifier
    from sklearn.pipeline import Pipeline
    
    pipeline = Pipeline([
        ('normalize', Normalizer()),
        ('classify', RocketClassifier())
    ])
    
    pipeline.fit(X_train, y_train)
    accuracy = pipeline.score(X_test, y_test)
    ```
    
    ### Feature Extraction + Traditional ML
    
    ```python
    from aeon.transformations.collection.convolution_based import Rocket
    from sklearn.ensemble import GradientBoostingClassifier
    
    # Extract features
    rocket = Rocket()
    X_train_features = rocket.fit_transform(X_train)
    X_test_features = rocket.transform(X_test)
    
    # Train traditional ML
    clf = GradientBoostingClassifier()
    clf.fit(X_train_features, y_train)
    predictions = clf.predict(X_test_features)
    ```
    
    ### Anomaly Detection with Visualization
    
    ```python
    import matplotlib.pyplot as plt
    import numpy as np
    from aeon.anomaly_detection import STOMP
    
    detector = STOMP(window_size=50)
    scores = detector.fit_predict(y)
    
    plt.figure(figsize=(15, 5))
    plt.subplot(2, 1, 1)
    plt.plot(y, label='Time Series')
    plt.subplot(2, 1, 2)
    plt.plot(scores, label='Anomaly Scores', color='red')
    plt.axhline(np.percentile(scores, 95), color='k', linestyle='--')
    plt.show()
    ```
    
    ## Best Practices
    
    ### Data Preparation
    
    1. **Normalize**: Most algorithms benefit from z-normalization
       ```python
       from aeon.transformations.collection import Normalizer
       normalizer = Normalizer()
       X_train = normalizer.fit_transform(X_train)
       X_test = normalizer.transform(X_test)
       ```
    
    2. **Handle Missing Values**: Impute before analysis
       ```python
       from aeon.transformations.collection import SimpleImputer
       imputer = SimpleImputer(strategy='mean')
       X_train = imputer.fit_transform(X_train)
       ```
    
    3. **Check Data Format**: Aeon expects shape `(n_samples, n_channels, n_timepoints)`
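       A minimal reshape sketch (shapes are illustrative), assuming a 2-D array of univariate series:
       ```python
       import numpy as np
       X_2d = np.random.rand(40, 150)   # (n_samples, n_timepoints)
       X_3d = X_2d[:, np.newaxis, :]    # (n_samples, 1, n_timepoints), as aeon expects
       ```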
    
    ### Model Selection
    
    1. **Start Simple**: Begin with ROCKET variants before deep learning
    2. **Use Validation**: Split training data for hyperparameter tuning
    3. **Compare Baselines**: Test against simple methods (1-NN Euclidean, Naive)
    4. **Consider Resources**: ROCKET for speed, deep learning if GPU available
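    
    A small sketch of point 2, holding out part of the training set to tune `n_kernels` (the candidate values are illustrative, not recommendations):
    
    ```python
    from sklearn.model_selection import train_test_split
    from aeon.classification.convolution_based import RocketClassifier
    
    X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)
    
    for n_kernels in (1000, 5000, 10000):
        clf = RocketClassifier(n_kernels=n_kernels)
        clf.fit(X_tr, y_tr)
        print(n_kernels, clf.score(X_val, y_val))
    ```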
    
    ### Algorithm Selection Guide
    
    **For Fast Prototyping:**
    - Classification: `MiniRocketClassifier`
    - Regression: `MiniRocketRegressor`
    - Clustering: `TimeSeriesKMeans` with Euclidean
    
    **For Maximum Accuracy:**
    - Classification: `HIVECOTEV2`, `InceptionTimeClassifier`
    - Regression: `InceptionTimeRegressor`
    - Forecasting: `ARIMA`, `TCNForecaster`
    
    **For Interpretability:**
    - Classification: `ShapeletTransformClassifier`, `Catch22Classifier`
    - Features: `Catch22`, `TSFresh`
    
    **For Small Datasets:**
    - Distance-based: `KNeighborsTimeSeriesClassifier` with DTW
    - Avoid: Deep learning (requires large data)
    
    ## Reference Documentation
    
    Detailed information available in `references/`:
    - `classification.md` - All classification algorithms
    - `regression.md` - Regression methods
    - `clustering.md` - Clustering algorithms
    - `forecasting.md` - Forecasting approaches
    - `anomaly_detection.md` - Anomaly detection methods
    - `segmentation.md` - Segmentation algorithms
    - `similarity_search.md` - Pattern matching and motif discovery
    - `transformations.md` - Feature extraction and preprocessing
    - `distances.md` - Time series distance metrics
    - `networks.md` - Deep learning architectures
    - `datasets_benchmarking.md` - Data loading and evaluation tools
    
    ## Additional Resources
    
    - Documentation: https://www.aeon-toolkit.org/
    - GitHub: https://github.com/aeon-toolkit/aeon
    - Examples: https://www.aeon-toolkit.org/en/stable/examples.html
    - API Reference: https://www.aeon-toolkit.org/en/stable/api_reference.html
    
    
  • scientific-skills/anndata/SKILL.mdskill
    Show content (10214 bytes)
    ---
    name: anndata
    description: Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem. This is the data format skill—for analysis workflows use scanpy; for probabilistic models use scvi-tools; for population-scale queries use cellxgene-census.
    license: BSD-3-Clause
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # AnnData
    
    ## Overview
    
    AnnData is a Python package for handling annotated data matrices, storing experimental measurements (X) alongside observation metadata (obs), variable metadata (var), and multi-dimensional annotations (obsm, varm, obsp, varp, uns). Originally designed for single-cell genomics through Scanpy, it now serves as a general-purpose framework for any annotated data requiring efficient storage, manipulation, and analysis.
    
    ## When to Use This Skill
    
    Use this skill when:
    - Creating, reading, or writing AnnData objects
    - Working with h5ad, zarr, or other genomics data formats
    - Performing single-cell RNA-seq analysis
    - Managing large datasets with sparse matrices or backed mode
    - Concatenating multiple datasets or experimental batches
    - Subsetting, filtering, or transforming annotated data
    - Integrating with scanpy, scvi-tools, or other scverse ecosystem tools
    
    ## Installation
    
    ```bash
    uv pip install anndata
    
    # With optional dependencies
    uv pip install anndata[dev,test,doc]
    ```
    
    ## Quick Start
    
    ### Creating an AnnData object
    ```python
    import anndata as ad
    import numpy as np
    import pandas as pd
    
    # Minimal creation
    X = np.random.rand(100, 2000)  # 100 cells × 2000 genes
    adata = ad.AnnData(X)
    
    # With metadata
    obs = pd.DataFrame({
        'cell_type': ['T cell', 'B cell'] * 50,
        'sample': ['A', 'B'] * 50
    }, index=[f'cell_{i}' for i in range(100)])
    
    var = pd.DataFrame({
        'gene_name': [f'Gene_{i}' for i in range(2000)]
    }, index=[f'ENSG{i:05d}' for i in range(2000)])
    
    adata = ad.AnnData(X=X, obs=obs, var=var)
    ```
    
    ### Reading data
    ```python
    # Read h5ad file
    adata = ad.read_h5ad('data.h5ad')
    
    # Read with backed mode (for large files)
    adata = ad.read_h5ad('large_data.h5ad', backed='r')
    
    # Read other formats
    adata = ad.read_csv('data.csv')
    adata = ad.read_loom('data.loom')
    
    # 10X HDF5 files are read via scanpy (anndata has no read_10x_h5)
    import scanpy as sc
    adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
    ```
    
    ### Writing data
    ```python
    # Write h5ad file
    adata.write_h5ad('output.h5ad')
    
    # Write with compression
    adata.write_h5ad('output.h5ad', compression='gzip')
    
    # Write other formats
    adata.write_zarr('output.zarr')
    adata.write_csvs('output_dir/')
    ```
    
    ### Basic operations
    ```python
    # Subset by conditions
    t_cells = adata[adata.obs['cell_type'] == 'T cell']
    
    # Subset by indices
    subset = adata[0:50, 0:100]
    
    # Add metadata
    adata.obs['quality_score'] = np.random.rand(adata.n_obs)
    adata.var['highly_variable'] = np.random.rand(adata.n_vars) > 0.8
    
    # Access dimensions
    print(f"{adata.n_obs} observations × {adata.n_vars} variables")
    ```
    
    ## Core Capabilities
    
    ### 1. Data Structure
    
    Understand the AnnData object structure including X, obs, var, layers, obsm, varm, obsp, varp, uns, and raw components.
    
    **See**: `references/data_structure.md` for comprehensive information on:
    - Core components (X, obs, var, layers, obsm, varm, obsp, varp, uns, raw)
    - Creating AnnData objects from various sources
    - Accessing and manipulating data components
    - Memory-efficient practices
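    
    A short sketch of how these slots fit together (shapes and key names are illustrative):
    
    ```python
    import anndata as ad
    import numpy as np
    
    adata = ad.AnnData(np.random.rand(100, 2000))
    adata.layers['counts'] = np.random.poisson(1.0, adata.shape)  # same shape as X
    adata.obsm['X_pca'] = np.random.rand(adata.n_obs, 50)         # per-observation matrix
    adata.varm['loadings'] = np.random.rand(adata.n_vars, 50)     # per-variable matrix
    adata.uns['params'] = {'normalized': False}                   # unstructured metadata
    ```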
    
    ### 2. Input/Output Operations
    
    Read and write data in various formats with support for compression, backed mode, and cloud storage.
    
    **See**: `references/io_operations.md` for details on:
    - Native formats (h5ad, zarr)
    - Alternative formats (CSV, MTX, Loom, 10X, Excel)
    - Backed mode for large datasets
    - Remote data access
    - Format conversion
    - Performance optimization
    
    Common commands:
    ```python
    # Read/write h5ad
    adata = ad.read_h5ad('data.h5ad', backed='r')
    adata.write_h5ad('output.h5ad', compression='gzip')
    
    # Read 10X data (via scanpy)
    import scanpy as sc
    adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
    
    # Read MTX format
    adata = ad.read_mtx('matrix.mtx').T
    ```
    
    ### 3. Concatenation
    
    Combine multiple AnnData objects along observations or variables with flexible join strategies.
    
    **See**: `references/concatenation.md` for comprehensive coverage of:
    - Basic concatenation (axis=0 for observations, axis=1 for variables)
    - Join types (inner, outer)
    - Merge strategies (same, unique, first, only)
    - Tracking data sources with labels
    - Lazy concatenation (AnnCollection)
    - On-disk concatenation for large datasets
    
    Common commands:
    ```python
    # Concatenate observations (combine samples)
    adata = ad.concat(
        [adata1, adata2, adata3],
        axis=0,
        join='inner',
        label='batch',
        keys=['batch1', 'batch2', 'batch3']
    )
    
    # Concatenate variables (combine modalities)
    adata = ad.concat([adata_rna, adata_protein], axis=1)
    
    # Lazy concatenation
    from anndata.experimental import AnnCollection
    collection = AnnCollection(
        ['data1.h5ad', 'data2.h5ad'],
        join_obs='outer',
        label='dataset'
    )
    ```
    
    ### 4. Data Manipulation
    
    Transform, subset, filter, and reorganize data efficiently.
    
    **See**: `references/manipulation.md` for detailed guidance on:
    - Subsetting (by indices, names, boolean masks, metadata conditions)
    - Transposition
    - Copying (full copies vs views)
    - Renaming (observations, variables, categories)
    - Type conversions (strings to categoricals, sparse/dense)
    - Adding/removing data components
    - Reordering
    - Quality control filtering
    
    Common commands:
    ```python
    # Subset by metadata
    filtered = adata[adata.obs['quality_score'] > 0.8]
    hv_genes = adata[:, adata.var['highly_variable']]
    
    # Transpose
    adata_T = adata.T
    
    # Copy vs view
    view = adata[0:100, :]  # View (lightweight reference)
    copy = adata[0:100, :].copy()  # Independent copy
    
    # Convert strings to categoricals
    adata.strings_to_categoricals()
    ```
    
    ### 5. Best Practices
    
    Follow recommended patterns for memory efficiency, performance, and reproducibility.
    
    **See**: `references/best_practices.md` for guidelines on:
    - Memory management (sparse matrices, categoricals, backed mode)
    - Views vs copies
    - Data storage optimization
    - Performance optimization
    - Working with raw data
    - Metadata management
    - Reproducibility
    - Error handling
    - Integration with other tools
    - Common pitfalls and solutions
    
    Key recommendations:
    ```python
    # Use sparse matrices for sparse data
    from scipy.sparse import csr_matrix
    adata.X = csr_matrix(adata.X)
    
    # Convert strings to categoricals
    adata.strings_to_categoricals()
    
    # Use backed mode for large files
    adata = ad.read_h5ad('large.h5ad', backed='r')
    
    # Store raw before filtering
    adata.raw = adata.copy()
    adata = adata[:, adata.var['highly_variable']]
    ```
    
    ## Integration with Scverse Ecosystem
    
    AnnData serves as the foundational data structure for the scverse ecosystem:
    
    ### Scanpy (Single-cell analysis)
    ```python
    import scanpy as sc
    
    # Preprocessing
    sc.pp.filter_cells(adata, min_genes=200)
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)
    
    # Dimensionality reduction
    sc.pp.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_neighbors=15)
    sc.tl.umap(adata)
    sc.tl.leiden(adata)
    
    # Visualization
    sc.pl.umap(adata, color=['cell_type', 'leiden'])
    ```
    
    ### Muon (Multimodal data)
    ```python
    import muon as mu
    
    # Combine RNA and protein data
    mdata = mu.MuData({'rna': adata_rna, 'protein': adata_protein})
    ```
    
    ### PyTorch integration
    ```python
    from anndata.experimental import AnnLoader
    
    # Create DataLoader for deep learning
    dataloader = AnnLoader(adata, batch_size=128, shuffle=True)
    
    for batch in dataloader:
        X = batch.X
        # Train model
    ```
    
    ## Common Workflows
    
    ### Single-cell RNA-seq analysis
    ```python
    import anndata as ad
    import numpy as np
    import scanpy as sc
    
    # 1. Load data (10X readers live in scanpy)
    adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
    
    # 2. Quality control (asarray/ravel handles sparse-matrix sums)
    adata.obs['n_genes'] = np.asarray((adata.X > 0).sum(axis=1)).ravel()
    adata.obs['n_counts'] = np.asarray(adata.X.sum(axis=1)).ravel()
    adata = adata[adata.obs['n_genes'] > 200]
    adata = adata[adata.obs['n_counts'] < 50000]
    
    # 3. Store raw
    adata.raw = adata.copy()
    
    # 4. Normalize and filter
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)
    adata = adata[:, adata.var['highly_variable']]
    
    # 5. Save processed data
    adata.write_h5ad('processed.h5ad')
    ```
    
    ### Batch integration
    ```python
    # Load multiple batches
    adata1 = ad.read_h5ad('batch1.h5ad')
    adata2 = ad.read_h5ad('batch2.h5ad')
    adata3 = ad.read_h5ad('batch3.h5ad')
    
    # Concatenate with batch labels
    adata = ad.concat(
        [adata1, adata2, adata3],
        label='batch',
        keys=['batch1', 'batch2', 'batch3'],
        join='inner'
    )
    
    # Apply batch correction
    import scanpy as sc
    sc.pp.combat(adata, key='batch')
    
    # Continue analysis
    sc.pp.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    ```
    
    ### Working with large datasets
    ```python
    # Open in backed mode
    adata = ad.read_h5ad('100GB_dataset.h5ad', backed='r')
    
    # Filter based on metadata (no data loading)
    high_quality = adata[adata.obs['quality_score'] > 0.8]
    
    # Load filtered subset
    adata_subset = high_quality.to_memory()
    
    # Process subset
    process(adata_subset)
    
    # Or process in chunks
    chunk_size = 1000
    for i in range(0, adata.n_obs, chunk_size):
        chunk = adata[i:i+chunk_size, :].to_memory()
        process(chunk)
    ```
    
    ## Troubleshooting
    
    ### Out of memory errors
    Use backed mode or convert to sparse matrices:
    ```python
    # Backed mode
    adata = ad.read_h5ad('file.h5ad', backed='r')
    
    # Sparse matrices
    from scipy.sparse import csr_matrix
    adata.X = csr_matrix(adata.X)
    ```
    
    ### Slow file reading
    Use compression and appropriate formats:
    ```python
    # Optimize for storage
    adata.strings_to_categoricals()
    adata.write_h5ad('file.h5ad', compression='gzip')
    
    # Use Zarr for cloud storage
    adata.write_zarr('file.zarr', chunks=(1000, 1000))
    ```
    
    ### Index alignment issues
    Always align external data on index:
    ```python
    # Wrong
    adata.obs['new_col'] = external_data['values']
    
    # Correct
    adata.obs['new_col'] = external_data.set_index('cell_id').loc[adata.obs_names, 'values']
    ```
    
    ## Additional Resources
    
    - **Official documentation**: https://anndata.readthedocs.io/
    - **Scanpy tutorials**: https://scanpy.readthedocs.io/
    - **Scverse ecosystem**: https://scverse.org/
    - **GitHub repository**: https://github.com/scverse/anndata
    
    
  • scientific-skills/arboreto/SKILL.mdskill
    Show content (6929 bytes)
    ---
    name: arboreto
    description: Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets.
    license: BSD-3-Clause
    metadata:
        skill-author: K-Dense Inc.
    ---
    
    # Arboreto
    
    ## Overview
    
    Arboreto is a computational library for inferring gene regulatory networks (GRNs) from gene expression data using parallelized algorithms that scale from single machines to multi-node clusters.
    
    **Core capability**: Identify which transcription factors (TFs) regulate which target genes based on expression patterns across observations (cells, samples, conditions).
    
    ## Quick Start
    
    Install arboreto:
    ```bash
    uv pip install arboreto
    ```
    
    Basic GRN inference:
    ```python
    import pandas as pd
    from arboreto.algo import grnboost2
    
    if __name__ == '__main__':
        # Load expression data (genes as columns)
        expression_matrix = pd.read_csv('expression_data.tsv', sep='\t')
    
        # Infer regulatory network
        network = grnboost2(expression_data=expression_matrix)
    
        # Save results (TF, target, importance)
        network.to_csv('network.tsv', sep='\t', index=False, header=False)
    ```
    
    **Critical**: Always use `if __name__ == '__main__':` guard because Dask spawns new processes.
    
    ## Core Capabilities
    
    ### 1. Basic GRN Inference
    
    For standard GRN inference workflows including:
    - Input data preparation (Pandas DataFrame or NumPy array)
    - Running inference with GRNBoost2 or GENIE3
    - Filtering by transcription factors
    - Output format and interpretation
    
    **See**: `references/basic_inference.md`
    
    **Use the ready-to-run script**: `scripts/basic_grn_inference.py` for standard inference tasks:
    ```bash
    python scripts/basic_grn_inference.py expression_data.tsv output_network.tsv --tf-file tfs.txt --seed 777
    ```
    
    ### 2. Algorithm Selection
    
    Arboreto provides two algorithms:
    
    **GRNBoost2 (Recommended)**:
    - Fast gradient boosting-based inference
    - Optimized for large datasets (10k+ observations)
    - Default choice for most analyses
    
    **GENIE3**:
    - Random Forest-based inference
    - Original multiple regression approach
    - Use for comparison or validation
    
    Quick comparison:
    ```python
    from arboreto.algo import grnboost2, genie3
    
    # Fast, recommended
    network_grnboost = grnboost2(expression_data=matrix)
    
    # Classic algorithm
    network_genie3 = genie3(expression_data=matrix)
    ```
    
    **For detailed algorithm comparison, parameters, and selection guidance**: `references/algorithms.md`
    
    ### 3. Distributed Computing
    
    Scale inference from local multi-core to cluster environments:
    
    **Local (default)** - Uses all available cores automatically:
    ```python
    network = grnboost2(expression_data=matrix)
    ```
    
    **Custom local client** - Control resources:
    ```python
    from distributed import LocalCluster, Client
    
    local_cluster = LocalCluster(n_workers=10, memory_limit='8GB')
    client = Client(local_cluster)
    
    network = grnboost2(expression_data=matrix, client_or_address=client)
    
    client.close()
    local_cluster.close()
    ```
    
    **Cluster computing** - Connect to remote Dask scheduler:
    ```python
    from distributed import Client
    
    client = Client('tcp://scheduler:8786')
    network = grnboost2(expression_data=matrix, client_or_address=client)
    ```
    
    **For cluster setup, performance optimization, and large-scale workflows**: `references/distributed_computing.md`
    
    ## Installation
    
    ```bash
    uv pip install arboreto
    ```
    
    **Dependencies**: scipy, scikit-learn, numpy, pandas, dask, distributed
    
    ## Common Use Cases
    
    ### Single-Cell RNA-seq Analysis
    ```python
    import pandas as pd
    from arboreto.algo import grnboost2
    
    if __name__ == '__main__':
        # Load single-cell expression matrix (cells x genes)
        sc_data = pd.read_csv('scrna_counts.tsv', sep='\t')
    
        # Infer cell-type-specific regulatory network
        network = grnboost2(expression_data=sc_data, seed=42)
    
        # Filter high-confidence links
        high_confidence = network[network['importance'] > 0.5]
        high_confidence.to_csv('grn_high_confidence.tsv', sep='\t', index=False)
    ```
    
    ### Bulk RNA-seq with TF Filtering
    ```python
    import pandas as pd
    from arboreto.utils import load_tf_names
    from arboreto.algo import grnboost2
    
    if __name__ == '__main__':
        # Load data
        expression_data = pd.read_csv('rnaseq_tpm.tsv', sep='\t')
        tf_names = load_tf_names('human_tfs.txt')
    
        # Infer with TF restriction
        network = grnboost2(
            expression_data=expression_data,
            tf_names=tf_names,
            seed=123
        )
    
        network.to_csv('tf_target_network.tsv', sep='\t', index=False)
    ```
    
    ### Comparative Analysis (Multiple Conditions)
    ```python
    import pandas as pd
    from arboreto.algo import grnboost2
    
    if __name__ == '__main__':
        # Infer networks for different conditions
        conditions = ['control', 'treatment_24h', 'treatment_48h']
    
        for condition in conditions:
            data = pd.read_csv(f'{condition}_expression.tsv', sep='\t')
            network = grnboost2(expression_data=data, seed=42)
            network.to_csv(f'{condition}_network.tsv', sep='\t', index=False)
    ```
    
    ## Output Interpretation
    
    Arboreto returns a DataFrame with regulatory links:
    
    | Column | Description |
    |--------|-------------|
    | `TF` | Transcription factor (regulator) |
    | `target` | Target gene |
    | `importance` | Regulatory importance score (higher = stronger) |
    
    **Filtering strategy**:
    - Top N links per target gene
    - Importance threshold (e.g., > 0.5)
    - Statistical significance testing (permutation tests)
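    
    A small pandas sketch of the first two strategies, assuming `network` is the DataFrame returned by `grnboost2` (the threshold and N are illustrative):
    
    ```python
    # Keep the 10 strongest regulators per target gene
    top_links = (network.sort_values('importance', ascending=False)
                        .groupby('target')
                        .head(10))
    
    # Or keep links above an importance threshold
    strong_links = network[network['importance'] > 0.5]
    ```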
    
    ## Integration with pySCENIC
    
    Arboreto is a core component of the SCENIC pipeline for single-cell regulatory network analysis:
    
    ```python
    # Step 1: Use arboreto for GRN inference
    from arboreto.algo import grnboost2
    network = grnboost2(expression_data=sc_data, tf_names=tf_list)
    
    # Step 2: Use pySCENIC for regulon identification and activity scoring
    # (See pySCENIC documentation for downstream analysis)
    ```
    
    ## Reproducibility
    
    Always set a seed for reproducible results:
    ```python
    network = grnboost2(expression_data=matrix, seed=777)
    ```
    
    Run multiple seeds for robustness analysis:
    ```python
    from arboreto.algo import grnboost2
    from distributed import LocalCluster, Client
    
    if __name__ == '__main__':
        client = Client(LocalCluster())
    
        seeds = [42, 123, 777]
        networks = []
    
        for seed in seeds:
            net = grnboost2(expression_data=matrix, client_or_address=client, seed=seed)
            networks.append(net)
    
        # Combine networks and keep consensus links
        # (analyze_consensus is a placeholder for your own aggregation logic;
        # `matrix` is your expression DataFrame from earlier)
        consensus = analyze_consensus(networks)
    ```
    
    ## Troubleshooting
    
    **Memory errors**: Reduce dataset size by filtering low-variance genes or use distributed computing
    
    **Slow performance**: Use GRNBoost2 instead of GENIE3, enable distributed client, filter TF list
    
    **Dask errors**: Ensure `if __name__ == '__main__':` guard is present in scripts
    
    **Empty results**: Check data format (genes as columns), verify TF names match gene names
    
    

README

Scientific Agent Skills

🔔 Claude Scientific Skills is now Scientific Agent Skills. Same skills, broader compatibility — now works with any AI agent that supports the open Agent Skills standard, not just Claude.

New: K-Dense BYOK — A free, open-source AI co-scientist that runs on your desktop, powered by Scientific Agent Skills. Bring your own API keys, pick from 40+ models, and get a full research workspace with web search, file handling, 100+ scientific databases, and access to all 135 skills in this repo. Your data stays on your computer, and you can optionally scale to cloud compute via Modal for heavy workloads. Get started here.

License: MIT

A comprehensive collection of 135 ready-to-use scientific and research skills (covering cancer genomics, drug-target binding, molecular dynamics, RNA velocity, geospatial science, time series forecasting, scientific ML resource discovery via Hugging Science, 78+ scientific databases, and more) for any AI agent that supports the open Agent Skills standard, created by K-Dense. Works with Cursor, Claude Code, Codex, and more. Transform your AI agent into a research assistant capable of executing complex multi-step scientific workflows across biology, chemistry, medicine, and beyond.


These skills enable your AI agent to seamlessly work with specialized scientific libraries, databases, and tools across multiple scientific domains. While the agent can use any Python package or API on its own, these explicitly defined skills provide curated documentation and examples that make it significantly stronger and more reliable for the workflows below:

  • 🧬 Bioinformatics & Genomics - Sequence analysis, single-cell RNA-seq, gene regulatory networks, variant annotation, phylogenetic analysis
  • 🧪 Cheminformatics & Drug Discovery - Molecular property prediction, virtual screening, ADMET analysis, molecular docking, lead optimization
  • 🔬 Proteomics & Mass Spectrometry - LC-MS/MS processing, peptide identification, spectral matching, protein quantification
  • 🏥 Clinical Research & Precision Medicine - Clinical trials, pharmacogenomics, variant interpretation, drug safety, clinical decision support, treatment planning
  • 🧠 Healthcare AI & Clinical ML - EHR analysis, physiological signal processing, medical imaging, clinical prediction models
  • 🖼️ Medical Imaging & Digital Pathology - DICOM processing, whole slide image analysis, computational pathology, radiology workflows
  • 🤖 Machine Learning & AI - Deep learning, reinforcement learning, time series analysis, model interpretability, Bayesian methods
  • 🔮 Materials Science & Chemistry - Crystal structure analysis, phase diagrams, metabolic modeling, computational chemistry
  • 🌌 Physics & Astronomy - Astronomical data analysis, coordinate transformations, cosmological calculations, symbolic mathematics, physics computations
  • ⚙️ Engineering & Simulation - Discrete-event simulation, multi-objective optimization, metabolic engineering, systems modeling, process optimization
  • 📊 Data Analysis & Visualization - Statistical analysis, network analysis, time series, publication-quality figures, large-scale data processing, EDA
  • 🌍 Geospatial Science & Remote Sensing - Satellite imagery processing, GIS analysis, spatial statistics, terrain analysis, machine learning for Earth observation
  • 🧪 Laboratory Automation - Liquid handling protocols, lab equipment control, workflow automation, LIMS integration
  • 📚 Scientific Communication - Literature review, peer review, scientific writing, document processing, posters, slides, schematics, citation management
  • 🔬 Multi-omics & Systems Biology - Multi-modal data integration, pathway analysis, network biology, systems-level insights
  • 🧬 Protein Engineering & Design - Protein language models, structure prediction, sequence design, function annotation
  • 🎓 Research Methodology - Hypothesis generation, scientific brainstorming, critical thinking, grant writing, scholar evaluation

Transform your AI coding agent into an 'AI Scientist' on your desktop!

If you find this repository useful, please consider giving it a star! It helps others discover these tools and encourages us to continue maintaining and expanding this collection.

🎬 New to Scientific Agent Skills? Watch our Getting Started with Scientific Agent Skills video for a quick walkthrough.


📦 What's Included

This repository provides 135 scientific and research skills organized into the following categories:

  • 100+ Scientific & Financial Databases - A unified database-lookup skill provides direct access to 78 public databases (PubChem, ChEMBL, UniProt, COSMIC, ClinicalTrials.gov, FRED, USPTO, and more), plus dedicated skills for DepMap, Imaging Data Commons, PrimeKG, U.S. Treasury Fiscal Data, and Hugging Science (curated catalog of scientific datasets, models, and demos across 17 scientific domains on Hugging Face). Multi-database packages like BioServices (~40 bioinformatics services), BioPython (38 NCBI sub-databases via Entrez), and gget (20+ genomics databases) add further coverage
  • 70+ Optimized Python Package Skills - Explicitly defined skills for RDKit, Scanpy, PyTorch Lightning, scikit-learn, BioPython, pyzotero, BioServices, PennyLane, Qiskit, OpenMM, MDAnalysis, scVelo, TimesFM, and others — with curated documentation, examples, and best practices. Note: the agent can write code using any Python package, not just these; these skills simply provide stronger, more reliable performance for the packages listed
  • 9 Scientific Integration Skills - Explicitly defined skills for Benchling, DNAnexus, LatchBio, OMERO, Protocols.io, Open Notebook, and more. Again, the agent is not limited to these — any API or platform reachable from Python is fair game; these skills are the optimized, pre-documented paths
  • 30+ Analysis & Communication Tools - Literature review, scientific writing, peer review, document processing, posters, slides, schematics, infographics, Mermaid diagrams, and more
  • 10+ Research & Clinical Tools - Hypothesis generation, grant writing, clinical decision support, treatment plans, regulatory compliance, scenario analysis

Each skill includes:

  • ✅ Comprehensive documentation (SKILL.md)
  • ✅ Practical code examples
  • ✅ Use cases and best practices
  • ✅ Integration guides
  • ✅ Reference materials


🚀 Why Use This?

Accelerate Your Research

  • Save Days of Work - Skip API documentation research and integration setup
  • Production-Ready Code - Tested, validated examples following scientific best practices
  • Multi-Step Workflows - Execute complex pipelines with a single prompt

🎯 Comprehensive Coverage

  • 135 Skills - Extensive coverage across all major scientific domains
  • 100+ Databases - Unified access to 78+ databases via database-lookup, plus dedicated data access skills and multi-database packages like BioServices, BioPython, and gget
  • 70+ Optimized Python Package Skills - RDKit, Scanpy, PyTorch Lightning, scikit-learn, BioServices, PennyLane, Qiskit, OpenMM, scVelo, TimesFM, and others (the agent can use any Python package; these are the pre-documented, higher-performing paths)

🔧 Easy Integration

  • Simple Setup - Copy skills to your skills directory and start working
  • Automatic Discovery - Your agent automatically finds and uses relevant skills
  • Well Documented - Each skill includes examples, use cases, and best practices

🌟 Maintained & Supported

  • Regular Updates - Continuously maintained and expanded by K-Dense team
  • Community Driven - Open source with active community contributions
  • Enterprise Ready - Commercial support available for advanced needs

🎯 Getting Started

Option 1: npx (all platforms)

Install Scientific Agent Skills with a single command:

npx skills add K-Dense-AI/scientific-agent-skills

This is the official standard approach for installing Agent Skills across all platforms, including Claude Code, Claude Cowork, Codex, Gemini CLI, Cursor, and any other agent that supports the open Agent Skills standard.

Option 2: GitHub CLI (gh skill)

If you use the GitHub CLI (v2.90.0+), you can install skills with gh skill:

# Browse and install interactively
gh skill install K-Dense-AI/scientific-agent-skills

# Install a specific skill directly
gh skill install K-Dense-AI/scientific-agent-skills scanpy

# Target a specific agent host
gh skill install K-Dense-AI/scientific-agent-skills --agent cursor
gh skill install K-Dense-AI/scientific-agent-skills --agent claude-code
gh skill install K-Dense-AI/scientific-agent-skills --agent codex
gh skill install K-Dense-AI/scientific-agent-skills --agent gemini

gh skill automatically installs to the correct directory for your agent host and records provenance metadata for supply chain integrity.

Version pinning

Pin to a specific release tag or commit SHA for reproducible installs:

# Pin to a release tag
gh skill install K-Dense-AI/scientific-agent-skills --pin v1.0.0

# Pin to a commit SHA
gh skill install K-Dense-AI/scientific-agent-skills --pin abc123def

Keeping skills up to date

# Check for updates interactively
gh skill update

# Update all installed skills
gh skill update --all

That's it! Your AI agent will automatically discover the skills and use them when relevant to your scientific tasks. You can also invoke any skill manually by mentioning the skill name in your prompt.


⚠️ Security Disclaimer

Skills can execute code and influence your coding agent's behavior. Review what you install.

Agent Skills are powerful — they can instruct your AI agent to run arbitrary code, install packages, make network requests, and modify files on your system. A malicious or poorly written skill has the potential to steer your coding agent into harmful behavior.

We take security seriously. All contributions go through a review process, and we run LLM-based security scans (via Cisco AI Defense Skill Scanner) on every skill in this repository. However, as a small team with a growing number of community contributions, we cannot guarantee that every skill has been exhaustively reviewed for all possible risks.

It is ultimately your responsibility to review the skills you install and decide which ones to trust.

We recommend the following:

  • Do not install everything at once. Only install the skills you actually need for your work. While installing the full collection was reasonable when K-Dense created and maintained every skill, the repository now includes many community contributions that we may not have reviewed as thoroughly.
  • Read the SKILL.md before installing. Each skill's documentation describes what it does, what packages it uses, and what external services it connects to. If something looks suspicious, don't install it.
  • Check the contribution history. Skills authored by K-Dense (K-Dense-AI) have been through our internal review process. Community-contributed skills have been reviewed to the best of our ability, but with limited resources.
  • Run the security scanner yourself. Before installing third-party skills, scan them locally:
    uv pip install cisco-ai-skill-scanner
    skill-scanner scan /path/to/skill --use-behavioral
    
  • Report anything suspicious. If you find a skill that looks malicious or behaves unexpectedly, please open an issue immediately so we can investigate.

All skills are scanned on an approximately weekly basis, and SECURITY.md is updated with the latest results. We try to address security gaps as they arise.


❤️ Support the Open Source Community

Scientific Agent Skills is powered by 50+ incredible open source projects maintained by dedicated developers and research communities worldwide. Projects like Biopython, Scanpy, RDKit, scikit-learn, PyTorch Lightning, and many others form the foundation of these skills.

If you find value in this repository, please consider supporting the projects that make it possible:

  • Star their repositories on GitHub
  • 💰 Sponsor maintainers via GitHub Sponsors or NumFOCUS
  • 📝 Cite projects in your publications
  • 💻 Contribute code, docs, or bug reports

👉 View the full list of projects to support


⚙️ Prerequisites

  • Python: 3.11+ (3.12+ recommended for best compatibility)
  • uv: Python package manager (required for installing skill dependencies)
  • Client: Any agent that supports the Agent Skills standard (Cursor, Claude Code, Gemini CLI, Codex, etc.)
  • System: macOS, Linux, or Windows with WSL2
  • Dependencies: Automatically handled by individual skills (check SKILL.md files for specific requirements)

Installing uv

The skills use uv as the package manager for installing Python dependencies. Install it using the instructions for your operating system:

macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Alternative (via pip):

pip install uv

After installation, verify it works by running:

uv --version

For more installation options and details, visit the official uv documentation.


💡 Quick Examples

Once you've installed the skills, you can ask your AI agent to execute complex multi-step scientific workflows. Here are some example prompts:

🧪 Drug Discovery Pipeline

Goal: Find novel EGFR inhibitors for lung cancer treatment

Prompt:

Use available skills you have access to whenever possible. Query ChEMBL for EGFR inhibitors (IC50 < 50nM), analyze structure-activity relationships 
with RDKit, generate improved analogs with datamol, perform virtual screening with DiffDock 
against AlphaFold EGFR structure, search PubMed for resistance mechanisms, check COSMIC for 
mutations, and create visualizations and a comprehensive report.

Skills Used: ChEMBL, RDKit, datamol, DiffDock, AlphaFold DB, PubMed, COSMIC, scientific visualization

Need cloud GPUs and a publication-ready report at the end? Run this on K-Dense Web free.


🔬 Single-Cell RNA-seq Analysis

Goal: Comprehensive analysis of 10X Genomics data with public data integration

Prompt:

Use available skills you have access to whenever possible. Load 10X dataset with Scanpy, perform QC and doublet removal, integrate with Cellxgene 
Census data, identify cell types using NCBI Gene markers, run differential expression with 
PyDESeq2, infer gene regulatory networks with Arboreto, enrich pathways via Reactome/KEGG, 
and identify therapeutic targets with Open Targets.

Skills Used: Scanpy, Cellxgene Census, NCBI Gene, PyDESeq2, Arboreto, Reactome, KEGG, Open Targets

Want zero-setup cloud execution and shareable outputs? Try K-Dense Web free.


🧬 Multi-Omics Biomarker Discovery

Goal: Integrate RNA-seq, proteomics, and metabolomics to predict patient outcomes

Prompt:

Use available skills you have access to whenever possible. Analyze RNA-seq with PyDESeq2, process mass spec with pyOpenMS, integrate metabolites from 
HMDB/Metabolomics Workbench, map proteins to pathways (UniProt/KEGG), find interactions via 
STRING, correlate omics layers with statsmodels, build predictive model with scikit-learn, 
and search ClinicalTrials.gov for relevant trials.

Skills Used: PyDESeq2, pyOpenMS, HMDB, Metabolomics Workbench, UniProt, KEGG, STRING, statsmodels, scikit-learn, ClinicalTrials.gov

This pipeline is heavy on compute. Run it on K-Dense Web with cloud GPUs, free to start.


🎯 Virtual Screening Campaign

Goal: Discover allosteric modulators for protein-protein interactions

Prompt:

Use available skills you have access to whenever possible. Retrieve AlphaFold structures, identify interaction interface with BioPython, search ZINC 
for allosteric candidates (MW 300-500, logP 2-4), filter with RDKit, dock with DiffDock, 
rank with DeepChem, check PubChem suppliers, search USPTO patents, and optimize leads with 
MedChem/molfeat.

Skills Used: AlphaFold DB, BioPython, ZINC, RDKit, DiffDock, DeepChem, PubChem, USPTO, MedChem, molfeat

Skip the local GPU bottleneck. Run virtual screening on K-Dense Web free.


🏥 Clinical Variant Interpretation

Goal: Analyze VCF file for hereditary cancer risk assessment

Prompt:

Use available skills you have access to whenever possible. Parse VCF with pysam, annotate variants with Ensembl VEP, query ClinVar for pathogenicity, 
check COSMIC for cancer mutations, retrieve gene info from NCBI Gene, analyze protein impact 
with UniProt, search PubMed for case reports, check ClinPGx for pharmacogenomics, generate 
clinical report with document processing tools, and find matching trials on ClinicalTrials.gov.

Skills Used: pysam, Ensembl, ClinVar, COSMIC, NCBI Gene, UniProt, PubMed, ClinPGx, Document Skills, ClinicalTrials.gov

Need a polished clinical report at the end, not just code? K-Dense Web delivers publication-ready outputs. Try it free.


🌐 Systems Biology Network Analysis

Goal: Analyze gene regulatory networks from RNA-seq data

Prompt:

Use available skills you have access to whenever possible. Query NCBI Gene for annotations, retrieve sequences from UniProt, identify interactions via 
STRING, map to Reactome/KEGG pathways, analyze topology with Torch Geometric, reconstruct 
GRNs with Arboreto, assess druggability with Open Targets, model with PyMC, visualize 
networks, and search GEO for similar patterns.

Skills Used: NCBI Gene, UniProt, STRING, Reactome, KEGG, Torch Geometric, Arboreto, Open Targets, PyMC, GEO

Want end-to-end pipelines with shareable outputs and no setup? Try K-Dense Web free.

📖 Want more examples? Check out docs/examples.md for comprehensive workflow examples and detailed use cases across all scientific domains.


🚀 Want to Skip the Setup and Just Do the Science?

Recognize any of these?

  • You spent more time configuring environments than running analyses
  • Your workflow needs a GPU your local machine does not have
  • You need a shareable, publication-ready figure or report, not just a script
  • You want to run a complex multi-step pipeline right now, without reading package docs first

If so, K-Dense Web was built for you. It is the full AI co-scientist platform: everything in this repo plus cloud GPUs, 200+ skills, and outputs you can drop directly into a paper or presentation. Zero setup required.

| Feature | This Repo | K-Dense Web |
|---|---|---|
| Scientific Skills | 135 skills | 200+ skills (exclusive access) |
| Setup | Manual installation | Zero setup, works instantly |
| Compute | Your machine | Cloud GPUs and HPC included |
| Workflows | Prompt and code | End-to-end research pipelines |
| Outputs | Code and analysis | Publication-ready figures, reports, and papers |
| Integrations | Local tools | Lab systems, ELNs, and cloud storage |

"K-Dense Web took me from raw sequencing data to a draft figure in one afternoon. What used to take three days of environment setup and scripting now just works." Computational biologist, drug discovery

Try K-Dense Web

k-dense.ai | Read the full comparison


🔬 Use Cases

🧪 Drug Discovery & Medicinal Chemistry

  • Virtual Screening: Screen millions of compounds from PubChem/ZINC against protein targets
  • Lead Optimization: Analyze structure-activity relationships with RDKit, generate analogs with datamol
  • ADMET Prediction: Predict absorption, distribution, metabolism, excretion, and toxicity with DeepChem
  • Molecular Docking: Predict binding poses and affinities with DiffDock
  • Bioactivity Mining: Query ChEMBL for known inhibitors and analyze SAR patterns

🧬 Bioinformatics & Genomics

  • Sequence Analysis: Process DNA/RNA/protein sequences with BioPython and pysam
  • Single-Cell Analysis: Analyze 10X Genomics data with Scanpy, identify cell types, infer GRNs with Arboreto
  • Variant Annotation: Annotate VCF files with Ensembl VEP, query ClinVar for pathogenicity
  • Variant Database Management: Build scalable VCF databases with TileDB-VCF for incremental sample addition, efficient population-scale queries, and compressed storage of genomic variant data
  • Gene Discovery: Query NCBI Gene, UniProt, and Ensembl for comprehensive gene information
  • Network Analysis: Identify protein-protein interactions via STRING, map to pathways (KEGG, Reactome)

🏥 Clinical Research & Precision Medicine

  • Clinical Trials: Search ClinicalTrials.gov for relevant studies, analyze eligibility criteria
  • Variant Interpretation: Annotate variants with ClinVar, COSMIC, and ClinPGx for pharmacogenomics
  • Drug Safety: Query FDA databases for adverse events, drug interactions, and recalls
  • Precision Therapeutics: Match patient variants to targeted therapies and clinical trials

🔬 Multi-Omics & Systems Biology

  • Multi-Omics Integration: Combine RNA-seq, proteomics, and metabolomics data
  • Pathway Analysis: Enrich differentially expressed genes in KEGG/Reactome pathways
  • Network Biology: Reconstruct gene regulatory networks, identify hub genes
  • Biomarker Discovery: Integrate multi-omics layers to predict patient outcomes

📊 Data Analysis & Visualization

  • Statistical Analysis: Perform hypothesis testing, power analysis, and experimental design
  • Publication Figures: Create publication-quality visualizations with matplotlib and seaborn
  • Network Visualization: Visualize biological networks with NetworkX
  • Report Generation: Generate comprehensive PDF reports with Document Skills

🧪 Laboratory Automation

  • Protocol Design: Create Opentrons protocols for automated liquid handling
  • LIMS Integration: Integrate with Benchling and LabArchives for data management
  • Workflow Automation: Automate multi-step laboratory workflows

📚 Available Skills

This repository contains 135 scientific and research skills organized across multiple domains. Each skill provides comprehensive documentation, code examples, and best practices for working with scientific libraries, databases, and tools.

Skill Categories

Note: The Python package and integration skills listed below are explicitly defined skills — curated with documentation, examples, and best practices for stronger, more reliable performance. They are not a ceiling: the agent can install and use any Python package or call any API, even without a dedicated skill. The skills listed simply make common workflows faster and more dependable.

🧬 Bioinformatics & Genomics (21+ skills)

  • Sequence analysis: BioPython, pysam, scikit-bio, BioServices
  • Single-cell analysis: Scanpy, AnnData, scvi-tools, scVelo (RNA velocity), Arboreto, Cellxgene Census
  • Genomic tools: gget, geniml, gtars, deepTools, FlowIO, Polars-Bio, Zarr, TileDB-VCF
  • Differential expression: PyDESeq2
  • Phylogenetics: ETE Toolkit, Phylogenetics (MAFFT, IQ-TREE 2, FastTree)

🧪 Cheminformatics & Drug Discovery (10+ skills)

  • Molecular manipulation: RDKit, Datamol, Molfeat
  • Deep learning: DeepChem, TorchDrug
  • Docking & screening: DiffDock
  • Molecular dynamics: OpenMM + MDAnalysis (MD simulation & trajectory analysis)
  • Cloud quantum chemistry: Rowan (pKa, docking, cofolding)
  • Drug-likeness: MedChem
  • Benchmarks: PyTDC

🔬 Proteomics & Mass Spectrometry (2 skills)

  • Spectral processing: matchms, pyOpenMS

🏥 Clinical Research & Precision Medicine (8+ skills)

  • Clinical databases: via Database Lookup (ClinicalTrials.gov, ClinVar, ClinPGx, COSMIC, FDA, cBioPortal, Monarch, and more)
  • Cancer genomics: DepMap (cancer dependency scores, drug sensitivity)
  • Cancer imaging: Imaging Data Commons (NCI radiology & pathology datasets via idc-index)
  • Healthcare AI: PyHealth, NeuroKit2, Clinical Decision Support
  • Clinical documentation: Clinical Reports, Treatment Plans

🖼️ Medical Imaging & Digital Pathology (3 skills)

  • DICOM processing: pydicom
  • Whole slide imaging: histolab, PathML

🧠 Neuroscience & Electrophysiology (1 skill)

  • Neural recordings: Neuropixels-Analysis (extracellular spikes, silicon probes, spike sorting)

🤖 Machine Learning & AI (16+ skills)

  • Deep learning: PyTorch Lightning, Transformers, Stable Baselines3, PufferLib
  • Classical ML: scikit-learn, scikit-survival, SHAP
  • Time series: aeon, TimesFM (Google's zero-shot foundation model for univariate forecasting)
  • Bayesian methods: PyMC
  • Optimization: PyMOO
  • Graph ML: Torch Geometric
  • Dimensionality reduction: UMAP-learn
  • Statistical modeling: statsmodels

🔮 Materials Science, Chemistry & Physics (7 skills)

  • Materials: Pymatgen
  • Metabolic modeling: COBRApy
  • Astronomy: Astropy
  • Quantum computing: Cirq, PennyLane, Qiskit, QuTiP

⚙️ Engineering & Simulation (4 skills)

  • Numerical computing: MATLAB/Octave
  • Computational fluid dynamics: FluidSim
  • Discrete-event simulation: SimPy
  • Symbolic math: SymPy

📊 Data Analysis & Visualization (16+ skills)

  • Visualization: Matplotlib, Seaborn, Scientific Visualization
  • Geospatial analysis: GeoPandas, GeoMaster (remote sensing, GIS, satellite imagery, spatial ML, 500+ examples)
  • Data processing: Dask, Polars, Vaex
  • Network analysis: NetworkX
  • Document processing: Document Skills (PDF, DOCX, PPTX, XLSX)
  • Infographics: Infographics (AI-powered professional infographic creation)
  • Diagrams: Markdown & Mermaid Writing (text-based diagrams as default documentation standard)
  • Exploratory data analysis: EDA workflows
  • Statistical analysis: Statistical Analysis workflows

🧪 Laboratory Automation (4 skills)

  • Liquid handling: PyLabRobot
  • Cloud lab: Ginkgo Cloud Lab (cell-free protein expression, fluorescent pixel art via autonomous RAC infrastructure)
  • Protocol management: Protocols.io
  • LIMS integration: Benchling, LabArchives

🔬 Multi-omics & Systems Biology (4+ skills)

  • Pathway analysis: via Database Lookup (KEGG, Reactome, STRING) and PrimeKG
  • Multi-omics: HypoGeniC
  • Data management: LaminDB

🧬 Protein Engineering & Design (3 skills)

  • Protein language models: ESM
  • Glycoengineering: Glycoengineering (N/O-glycosylation prediction, therapeutic antibody optimization)
  • Cloud laboratory platform: Adaptyv (automated protein testing and validation)

📚 Scientific Communication (20+ skills)

  • Literature: Paper Lookup (PubMed, PMC, bioRxiv, medRxiv, arXiv, OpenAlex, Crossref, Semantic Scholar, CORE, Unpaywall), Literature Review
  • Advanced paper search: BGPT Paper Search (25+ structured fields per paper — methods, results, sample sizes, quality scores — from full text, not just abstracts)
  • Web search: Parallel Web (synthesized summaries with citations)
  • Research notebooks: Open Notebook (self-hosted NotebookLM alternative — PDFs, videos, audio, web pages; 16+ AI providers; multi-speaker podcast generation)
  • Writing: Scientific Writing, Peer Review
  • Document processing: XLSX, MarkItDown, Document Skills
  • Publishing: Venue Templates
  • Presentations: Scientific Slides, LaTeX Posters, PPTX Posters
  • Diagrams: Scientific Schematics, Markdown & Mermaid Writing
  • Infographics: Infographics (10 types, 8 styles, colorblind-safe palettes)
  • Citations: Citation Management
  • Illustration: Generate Image (AI image generation with FLUX.2 Pro and Gemini 3 Pro (Nano Banana Pro))

🔬 Scientific Databases & Data Access (6 skills → 100+ databases total)

A unified database-lookup skill provides direct REST API access to 78 public databases across all domains. Dedicated skills cover specialized data platforms. Multi-database packages like BioServices (~40 bioinformatics services), BioPython (38 NCBI sub-databases via Entrez), and gget (20+ genomics databases) add further coverage.

  • Unified access: Database Lookup (78 databases spanning chemistry, genomics, clinical, pathways, patents, economics, and more — PubChem, ChEMBL, UniProt, PDB, AlphaFold, KEGG, Reactome, STRING, ClinVar, COSMIC, ClinicalTrials.gov, FDA, FRED, USPTO, SEC EDGAR, and dozens more)
  • Cancer genomics: DepMap (cancer cell line dependencies, drug sensitivity, gene effect profiles)
  • Cancer imaging: Imaging Data Commons (NCI radiology & pathology datasets via idc-index)
  • Knowledge graph: PrimeKG (precision medicine knowledge graph — genes, drugs, diseases, phenotypes)
  • Fiscal data: U.S. Treasury Fiscal Data (national debt, Treasury statements, auctions, exchange rates)
  • Scientific ML resource catalog: Hugging Science (curated index of datasets, models, blog posts, and interactive Spaces across 17 scientific domains — astronomy, biology, chemistry, climate, genomics, materials science, medicine, physics, scientific reasoning, and more — with usage patterns for datasets, transformers, and gradio_client)

🔧 Infrastructure & Platforms (7+ skills)

  • Cloud compute: Modal
  • GPU acceleration: Optimize for GPU (CuPy, Numba CUDA, Warp, cuDF, cuML, cuGraph, KvikIO, cuCIM, cuxfilter, cuVS, cuSpatial, RAFT)
  • Genomics platforms: DNAnexus, LatchBio
  • Microscopy: OMERO
  • Automation: Opentrons
  • Resource detection: Get Available Resources

🎓 Research Methodology & Planning (12+ skills)

  • Ideation: Scientific Brainstorming, Hypothesis Generation
  • Critical analysis: Scientific Critical Thinking, Scholar Evaluation
  • Scenario analysis: What-If Oracle (multi-branch possibility exploration, risk analysis, strategic options)
  • Multi-perspective deliberation: Consciousness Council (diverse expert viewpoints, devil's advocate analysis)
  • Cognitive profiling: DHDNA Profiler (extract thinking patterns and cognitive signatures from any text)
  • Funding: Research Grants
  • Discovery: Research Lookup, Paper Lookup (10 academic databases)
  • Market analysis: Market Research Reports

⚖️ Regulatory & Standards (1 skill)

  • Medical device standards: ISO 13485 Certification

📖 For complete details on all skills, see docs/scientific-skills.md

💡 Looking for practical examples? Check out docs/examples.md for comprehensive workflow examples across all scientific domains.


🤝 Contributing

We welcome contributions to expand and improve this scientific skills repository!

Ways to Contribute

✨ Add New Skills

  • Create skills for additional scientific packages or databases
  • Add integrations for scientific platforms and tools

📚 Improve Existing Skills

  • Enhance documentation with more examples and use cases
  • Add new workflows and reference materials
  • Improve code examples and scripts
  • Fix bugs or update outdated information

🐛 Report Issues

  • Submit bug reports with detailed reproduction steps
  • Suggest improvements or new features

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-skill)
  3. Follow the existing directory structure and documentation patterns
  4. Ensure all new skills include comprehensive SKILL.md files
  5. Test your examples and workflows thoroughly
  6. Commit your changes (git commit -m 'Add amazing skill')
  7. Push to your branch (git push origin feature/amazing-skill)
  8. Submit a pull request with a clear description of your changes
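
The same workflow as one shell session. The repository URL assumes you have already forked the project under <your-username>, and the branch and commit names are the examples from the steps above:

```bash
git clone https://github.com/<your-username>/scientific-agent-skills.git
cd scientific-agent-skills
git checkout -b feature/amazing-skill
# ... add your skill directory, including its SKILL.md ...
git add scientific-skills/<your-skill>/
git commit -m 'Add amazing skill'
git push origin feature/amazing-skill
# then open a pull request from your fork on GitHub
```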

Contribution Guidelines

✅ Adhere to the Agent Skills Specification: every skill must follow the official spec (valid SKILL.md frontmatter, naming conventions, directory structure)
✅ Maintain consistency with existing skill documentation format
✅ Ensure all code examples are tested and functional
✅ Follow scientific best practices in examples and workflows
✅ Update relevant documentation when adding new capabilities
✅ Provide clear comments and docstrings in code
✅ Include references to official documentation

Security Scanning

All skills in this repository are security-scanned using Cisco AI Defense Skill Scanner, an open-source tool that detects prompt injection, data exfiltration, and malicious code patterns in Agent Skills.

If you are contributing a new skill, we recommend running the scanner locally before submitting a pull request:

uv pip install cisco-ai-skill-scanner
skill-scanner scan /path/to/your/skill --use-behavioral

Note: A clean scan result reduces noise in review, but does not guarantee a skill is free of all risk. Contributed skills are also reviewed manually before merging.

Recognition

Contributors are recognized in our community and may be featured in:

  • Repository contributors list
  • Special mentions in release notes
  • K-Dense community highlights

Your contributions help make scientific computing more accessible and enable researchers to leverage AI tools more effectively!

Support Open Source

This project builds on 50+ amazing open source projects. If you find value in these skills, please consider supporting the projects we depend on.


🔧 Troubleshooting

Common Issues

Problem: Skills not loading

  • Solution: Verify skill folders are in the correct directory (see Getting Started)
  • Each skill folder must contain a SKILL.md file
  • Restart your agent/IDE after copying skills
  • In Cursor, check Settings → Rules to confirm skills are discovered

Problem: Missing Python dependencies

  • Solution: Check the specific SKILL.md file for required packages
  • Install dependencies: uv pip install package-name

Problem: API rate limits

  • Solution: Many databases have rate limits. Review the specific database documentation
  • Consider implementing caching or batch requests; a minimal sketch follows
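
One way to implement that, sketched with the standard library and requests. The cache size and backoff schedule are arbitrary choices, not values any skill prescribes:

```python
import functools
import time

import requests


@functools.lru_cache(maxsize=256)    # memoize repeated identical GET queries
def fetch(url: str) -> str:
    for delay in (1, 2, 4, 8):       # exponential backoff on HTTP 429
        resp = requests.get(url, timeout=30)
        if resp.status_code == 429:  # Too Many Requests: wait and retry
            time.sleep(delay)
            continue
        resp.raise_for_status()      # surface other HTTP errors
        return resp.text
    raise RuntimeError(f"still rate-limited after retries: {url}")
```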

Problem: Authentication errors

  • Solution: Some services require API keys. Check the SKILL.md for authentication setup
  • Verify your credentials and permissions; a minimal key-loading pattern is sketched below
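
A common pattern for wiring credentials up without hardcoding them, as a sketch. EXAMPLE_API_KEY is a placeholder name; substitute whatever variable the relevant SKILL.md specifies:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # pick up a .env file in the working directory, if present

api_key = os.environ.get("EXAMPLE_API_KEY")  # placeholder variable name
if not api_key:
    raise SystemExit("Set EXAMPLE_API_KEY in your shell or in a .env file")

# Typical bearer-token usage; the exact header depends on the service
headers = {"Authorization": f"Bearer {api_key}"}
```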

Problem: Outdated examples

  • Solution: Report the issue via GitHub Issues
  • Check the official package documentation for updated syntax

❓ FAQ

General Questions

Q: Is this free to use?
A: Yes! This repository is MIT licensed. However, each individual skill has its own license specified in the license metadata field within its SKILL.md file—be sure to review and comply with those terms.

Q: Why are all skills grouped together instead of separate packages?
A: We believe good science in the age of AI is inherently interdisciplinary. Bundling all skills together makes it trivial for you (and your agent) to bridge across fields—e.g., combining genomics, cheminformatics, clinical data, and machine learning in one workflow—without worrying about which individual skills to install or wire together.

Q: Can I use this for commercial projects?
A: The repository itself is MIT licensed, which allows commercial use. However, individual skills may have different licenses—check the license field in each skill's SKILL.md file to ensure compliance with your intended use.

Q: Do all skills have the same license?
A: No. Each skill has its own license specified in the license metadata field within its SKILL.md file. These licenses may differ from the repository's MIT License. Users are responsible for reviewing and adhering to the license terms of each individual skill they use.

Q: How often is this updated?
A: We regularly update skills to reflect the latest versions of packages and APIs. Major updates are announced in release notes.

Q: Can I use this with other AI models?
A: The skills follow the open Agent Skills standard and work with any compatible agent, including Cursor, Claude Code, and Codex.

Installation & Setup

Q: Do I need all the Python packages installed?
A: No! Only install the packages you need. Each skill specifies its requirements in its SKILL.md file.

Q: What if a skill doesn't work?
A: First check the Troubleshooting section. If the issue persists, file an issue on GitHub with detailed reproduction steps.

Q: Do the skills work offline?
A: Database skills require internet access to query APIs. Package skills work offline once Python dependencies are installed.

Contributing

Q: Can I contribute my own skills?
A: Absolutely! We welcome contributions. See the Contributing section for guidelines and best practices.

Q: How do I report bugs or suggest features?
A: Open an issue on GitHub with a clear description. For bugs, include reproduction steps and expected vs actual behavior.


💬 Support

Need help? Check the Troubleshooting and FAQ sections above, or open an issue on GitHub with a clear description of your problem.


📖 Citation

If you use Scientific Agent Skills in your research or project, please cite it as:

BibTeX

@software{scientific_agent_skills_2026,
  author = {{K-Dense Inc.}},
  title = {Scientific Agent Skills: A Comprehensive Collection of Scientific Tools for AI Agents},
  year = {2026},
  url = {https://github.com/K-Dense-AI/scientific-agent-skills},
  note = {135 skills covering databases, packages, integrations, and analysis tools}
}

APA

K-Dense Inc. (2026). Scientific Agent Skills: A comprehensive collection of scientific tools for AI agents [Computer software]. https://github.com/K-Dense-AI/scientific-agent-skills

MLA

K-Dense Inc. Scientific Agent Skills: A Comprehensive Collection of Scientific Tools for AI Agents. 2026, github.com/K-Dense-AI/scientific-agent-skills.

Plain Text

Scientific Agent Skills by K-Dense Inc. (2026)
Available at: https://github.com/K-Dense-AI/scientific-agent-skills

We appreciate acknowledgment in publications, presentations, or projects that benefit from these skills!


📄 License

This project is licensed under the MIT License.

Copyright © 2026 K-Dense Inc. (k-dense.ai)

Key Points:

  • Free for any use (commercial and noncommercial)
  • Open source - modify, distribute, and use freely
  • Permissive - minimal restrictions on reuse
  • ⚠️ No warranty - provided "as is" without warranty of any kind

See LICENSE.md for full terms.

Individual Skill Licenses

⚠️ Important: Each skill has its own license specified in the license metadata field within its SKILL.md file. These licenses may differ from the repository's MIT License and may include additional terms or restrictions. Users are responsible for reviewing and adhering to the license terms of each individual skill they use.

Star History

Star History Chart